JDK / JDK-6447743

Warn about chroot instability on Linux

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: 6
    • Fix Version/s: 6
    • Component/s: hotspot
    • Resolved In Build: b96
    • CPU: generic
    • OS: linux

        Description

        There were complaints at a customer site that HotSpot crashed while running in a chroot environment on Linux. There are known Linux bugs that are likely to be fixed, but it was agreed at the HotSpot runtime meeting on 5/3/06 that we should at least warn the user when we detect this situation. The email thread from 4/14/06, extracted below, describes the problem and possible solutions in detail.

        --- ###@###.### writes:
        Hello,

        I only caught some of the presentation of the G-S guy at Laurie's all-hands before my webfeed cut out, but one thing he mentioned was a problem with the VM when running in a chroot environment without a proc filesystem on linux. Apparently, the mechanism we use to get the number of processors returns '1' in this case, which means the VM can optimize for one processor and not worry about some SMP issues. In this case, we still are running SMP and don't know it so problems arise.

        Does anyone know if there is an open CR about this, or is anyone looking into it?

        I did a quick test on my Redhat desktop machine (with just a simple C program) and see similar results: the sysconf(_SC_NPROCESSORS_CONF) call returns 1 when there is no /proc filesystem, even when running on a 4-way machine. (It returns the correct value when /proc is present.)
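
A test along the lines described above might look like the following C sketch. The function names are illustrative and are not taken from the original test program. According to the thread, on an affected system the configured count would be expected to drop to 1 when /proc is not mounted inside the chroot.

```c
#include <stdio.h>
#include <unistd.h>

/* Illustrative helper (not from the original report): returns the number
 * of configured processors as reported by sysconf(), or -1 on failure. */
long configured_cpu_count(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Illustrative helper: prints the configured and online processor counts
 * so the values can be compared inside and outside a chroot. */
void print_cpu_counts(void)
{
    printf("configured: %ld\n", configured_cpu_count());
    printf("online:     %ld\n", sysconf(_SC_NPROCESSORS_ONLN));
}
```

Calling print_cpu_counts() once on the host and once inside the chroot should make any discrepancy visible.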

        This looks to me to be a linux bug, though I suppose it might be in our interest to look into a workaround. Perhaps sysconf(_SC_NPROCESSORS_CONF) is not the best way to get the CPU count?

        Then again, I suppose we could just make sure to tell people that if they run on linux in a chroot environment, they better have a /proc filesystem. Maybe we could just release note it or something.

        Any ideas?

        --- ###@###.### writes:
        Hi Keith,

        Toward the end of the G-S talk the presenter seemed to admit the problem
        was actually a linux bug. Unfortunately the linux libraries still
        extract information from /proc.

        One thought would be to probe for /proc. If we can't access the usual
        files we pessimally assume ncpus > 1.

        Dave
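
The probing idea in the message above could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the thread does not name "the usual files", so the path is passed in by the caller, and both function names and the fallback value of 2 are hypothetical choices that simply mean "more than one CPU".

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: true if the given /proc file is readable. */
bool proc_is_visible(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Hypothetical helper: returns a CPU count that errs on the side of
 * "multiprocessor" whenever the OS answer cannot be trusted. */
long conservative_cpu_count(const char *proc_path)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);

    if (n > 1) {
        return n;   /* more than one CPU reported: take it at face value */
    }
    if (n < 1 || !proc_is_visible(proc_path)) {
        return 2;   /* query failed or /proc is missing: assume ncpus > 1 */
    }
    return 1;       /* /proc is readable and reports one CPU */
}
```

For example, conservative_cpu_count("/proc/stat") would return at least 2 whenever /proc/stat is unreadable, where "/proc/stat" is itself an assumed path chosen for illustration.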

        --- ###@###.### writes:

        I found this issue on Redhat's bugzilla:
        https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=151852

        Looks like it might be fixed in a newer version of chroot (9.3.1). Not sure if/when that's going into Redhat (I found it in a fedora rpm).

        Guess I won't worry about it, then.

        --- ###@###.### writes:

        ... at least for the purposes of LOCK: prefixes and membar/fence
        insertion. For GC we might want to assume something different. If
        /proc isn't visible we could also try to probe the number of CPUs with
        sched_getaffinity() or sched_setaffinity(), where available. That's
        sleazy and I'd advise against it, however.

        This is a linux bug. I don't think we should try work-arounds, although
        it's probably worth adding a release note.

        Dave
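
For reference, the sched_getaffinity() probe mentioned above (and advised against) could look roughly like this C sketch. The function name is illustrative. Note that the affinity mask only covers the CPUs the calling thread is allowed to run on, so the result can be smaller than the number of CPUs in the machine.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Illustrative helper: returns the number of CPUs in the calling thread's
 * affinity mask, or -1 if sched_getaffinity() fails. */
int affinity_cpu_count(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}
```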

        --- nikolay@###@###.### writes:

          We can be cool, and on x86 compute the number of CPUs the way my attached program does, although I absolutely agree with Dave that it's a Linux bug and should be fixed by them.

          Nikolay.

        --- ###@###.### writes:

        I think that Paul's point on this issue is, gee, why is it that GS is finding this, and not Sun?

        A release note is definitely warranted, but I don't have an issue with some mechanism (hack) to correct the incorrect data from the OS. Dave is right, this is definitely an OS bug, but that doesn't relieve us of the responsibility to manage it.

        To a certain extent, this relates to the relative success of Netscape in the early days of browser development. Their policy was simple: "There is no such thing as bad HTML". As a result, their browser *always* rendered something reasonably well regardless of what was stuffed into the document. While it would be nice to just say, "Oh, it's an OS problem", that's not really realistic if that is what customers are using.

        (Later on, though, Netscape did pay a price; it was almost impossible to migrate to a new, more rational code base. We need to balance our decisions accordingly, and make sure we carefully control how we build in hacks for problems like this.)

        And, FWIW, *in this case* falsely reporting a processor count other than '1' is better than the reverse. (But I suspect there are other cases where the reverse is true, so we want to be careful about how we address this.)

        -John

                People

                • Assignee: Coleen Phillimore (coleenp)
                • Reporter: Coleen Phillimore (coleenp)
