JDK-8133666

OperatingSystemMXBean reports abnormally high machine CPU consumption on Linux

        Description

        The code in UnixOperatingSystem.c that calculates CPU usage statistics relies on an out-of-date interpretation of the fields in /proc/stat that has not held true since Linux 2.6. This can cause problems such as getSystemCpuLoad() reporting 100% CPU consumption for IO-heavy loads.

        Specifically, we should add irq and softirq to the system CPU time, and we should add iowait, irq and softirq (the three new fields) to the total time. This is similar to what tools like top and sar do on Linux.

            Activity

            dbuck David Buck added a comment (edited)
            The current code assumes the older (pre-2.6 Linux) format, in which the cpu line of /proc/stat looked like this:
            ===
            user nice system idle
            ===

            But since the 2.6 kernels, this line has been expanded to add 3 new fields, like this:
            ===
            user nice system idle iowait irq softirq
            ===

            To be specific, a subset of the time that was reported as "idle" in 2.4 may now be reported only as "iowait".
            Similarly, some of the time that used to be reported as "system" may now be reported only as "irq" or "softirq".

            So now the "true" total time is the sum of all 7 of these numbers, not just the first 4. Also, the true system/kernel time should include both "irq" and "softirq" in addition to "system". I confirmed that this is the correct interpretation of these numbers by examining the source code of tools like vmstat and of the kernel (kernel/sched.c), as well as various documentation.

            Because our code measures CPU load as a ratio of total CPU time, leaving the new fields out of the total can badly skew our calculations. The easiest way to see this is to run something very IO intensive (even copying a file can be enough): the system CPU usage reported by getSystemCpuLoad() will spike. Other tools such as sar and top do not report time spent waiting for IO as CPU consumption, and neither should we.
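
To make the effect concrete, here is a minimal Java sketch (not the actual native code in UnixOperatingSystem.c) that compares the two interpretations over one sampling interval. The class name LoadComparison, its method names, and the sample tick deltas are hypothetical, and the formula "load = busy ticks / total ticks" is an assumption based on the description above.

```java
/** Hypothetical illustration of the four-field vs. seven-field interpretation. */
final class LoadComparison {

    // Index of each tick delta in the array: user nice system idle iowait irq softirq.
    static final int USER = 0, NICE = 1, SYSTEM = 2, IDLE = 3, IOWAIT = 4, IRQ = 5, SOFTIRQ = 6;

    /** Ratio of busy ticks to total ticks, clamped to [0, 1]. */
    static double load(long busy, long total) {
        if (total <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) busy / total));
    }

    /** Pre-2.6 interpretation: only user, nice, system and idle are counted. */
    static double legacyLoad(long[] d) {
        long busy = d[USER] + d[NICE] + d[SYSTEM];
        long total = busy + d[IDLE];
        return load(busy, total);
    }

    /** Interpretation proposed in this issue: irq/softirq are kernel time, iowait joins the total. */
    static double correctedLoad(long[] d) {
        long busy = d[USER] + d[NICE] + d[SYSTEM] + d[IRQ] + d[SOFTIRQ];
        long total = busy + d[IDLE] + d[IOWAIT];
        return load(busy, total);
    }
}
```

For made-up deltas of {10, 0, 10, 0, 80, 0, 0} (an interval spent mostly waiting for IO), legacyLoad gives 20/20 = 1.0, while correctedLoad gives 20/100 = 0.2.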

            To test this fix, I just added a println into the main loop in /jdk9-dev/jdk/test/com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java and ran it with high IO activity.
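
A standalone way to observe the same values, without modifying the test, might look like the following sketch. It is not the contents of GetSystemCpuLoad.java; the class SystemCpuLoadSampler and its parameters are hypothetical. It assumes a HotSpot-based JVM where the platform bean can be cast to com.sun.management.OperatingSystemMXBean (on newer JDKs getSystemCpuLoad() is deprecated in favour of getCpuLoad()).

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that polls and prints the system CPU load. */
final class SystemCpuLoadSampler {

    /**
     * Takes up to {@code count} samples, pausing {@code intervalMillis} between them.
     * Each value is in [0.0, 1.0], or negative if the load is not available.
     */
    @SuppressWarnings("deprecation")
    static List<Double> sample(int count, long intervalMillis) {
        // Assumption: the platform bean implements the com.sun.management extension.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        List<Double> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            double load = os.getSystemCpuLoad();
            System.out.println("system CPU load: " + load);
            samples.add(load);
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop sampling early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return samples;
    }
}
```

Calling SystemCpuLoadSampler.sample(30, 1000) while copying a large file in another terminal should show whether the reported load tracks IO wait.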

            One last note: since the 2.6 kernel release, even more fields have been added to the cpu line of /proc/stat, but after careful review I do not believe any of them should be included in our calculations. For example, the time reported for guest operating systems is already included in the "user" or "nice" fields. Also, it doesn't seem reasonable to try to include ticks "stolen" by a hypervisor or host OS.
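
As an illustration of that policy, a parser can read the first seven counters and ignore whatever follows. The sketch below is a hypothetical Java rendering (the real change is in native C code); the class name ProcStatCpuLine and the sample line in the note after it are made up. It also assumes that fields missing on a pre-2.6 kernel can be treated as 0.

```java
/** Hypothetical parser for the aggregate "cpu" line of /proc/stat. */
final class ProcStatCpuLine {

    /** Fields used in the calculation: user nice system idle iowait irq softirq. */
    static final int FIELDS_USED = 7;

    /**
     * Returns the first seven tick counters of the line.
     * Counters absent on older kernels default to 0; later fields
     * (for example steal and guest time) are deliberately ignored.
     */
    static long[] parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Require the "cpu" label plus at least the four pre-2.6 fields.
        if (tokens.length < 5 || !tokens[0].equals("cpu")) {
            throw new IllegalArgumentException("not an aggregate cpu line: " + line);
        }
        long[] ticks = new long[FIELDS_USED];
        int available = Math.min(FIELDS_USED, tokens.length - 1);
        for (int i = 0; i < available; i++) {
            ticks[i] = Long.parseLong(tokens[i + 1]);
        }
        return ticks;
    }
}
```

For a line such as "cpu  100 5 50 800 30 10 5 7 20 0", parse returns {100, 5, 50, 800, 30, 10, 5}; the trailing three values never enter the total, so guest time already counted under "user" is not counted twice.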
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk9/hs-rt/jdk/rev/d49f4e34e260
            User: dbuck
            Date: 2015-08-18 13:43:03 +0000
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/d49f4e34e260
            User: lana
            Date: 2015-09-09 21:33:27 +0000
            mcastegr Mattis Castegren (Inactive) added a comment -
            Marking this for release-note=no; JDK-8133527, which is very similar, will be release noted.

              People

              • Assignee: dbuck David Buck
              • Reporter: dbuck David Buck