Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8144483

One long Safepoint pause directly after each GC log rotation

    Details

    • Subcomponent:
    • Resolved In Build:
      b03
    • CPU:
      sparc
    • OS:
      solaris_10

      Backports

        Description

        FULL PRODUCT VERSION :
        java version "1.8.0_51"
        Java(TM) SE Runtime Environment (build 1.8.0_51-b16)
        Java HotSpot(TM) 64-Bit Server VM (build 25.51-b03, mixed mode)

        Applies to 1.7.0_80 and above, 1.8.0_20 and above, 1.9.0.

        FULL OS VERSION :
        SunOS 5.10 Generic_142900-07 sun4u sparc SUNW,Sun-Fire-V240

        Due to analysis bug applies to any Solaris version.

        A DESCRIPTION OF THE PROBLEM :
        There's a bug in the Solaris specific HotSpot code which exhibits itself as extremely long pauses directly after GC log rotation.

        The bug can be seen in 1.7.0 patchlevel 80 and above, 1.8.0 patchlevel 20 and above and 1.9.0. It was noticed using -XX:PrintGCApplicationStoppedTime.

        During GC log rotation the function os::print_memory_info() in src/os/solaris/vm/os_solaris.cpp is called, which itself calls check_addr0().

        Now check_addr0() reads the file /proc/self/pmap, but does this entry by entry. Productive Java processes often have 1000 and more map entries (e.g. each thread stack is one entry, often each jar etc.

        Reading one entry takes about 35-40ms in a zone on a T4 system. Reading e.g. 1000 entries takes 35-40 seconds during which the JVM is stopped. We experienced even stop times up until 2400 seconds in extreme cases.

        Note that this doesn't have to do with GC. It only happens directly after GC log file rotation.

        The bug is reading the file entry by entry instead of using pread() to read it in one go. Look for instance at the source code of the Solaris command "pmap" which uses pread() to read the file at once.

        It might also be questioned, whether this map file check must happen during a safepoint or could also be done outside, only synchronizing in case a hit was found and a warning actually needs to be written to the log file.

        The bug was shadowed by another bug until and including 1.7.0_76 and 1.8.0_11: the close() that closes the map file after reading the entries was erroneously put inside the read loop instead of after the read loop. Thus these versions only read the first entry, then closed the file and erred out when trying to read the next entry. So reading was quick.

        The change that removed the shadowing bug was:

        Java 8: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/diff/7d28f4e15b61/src/os/solaris/vm/os_solaris.cpp

        Java 7: http://hg.openjdk.java.net/jdk7u/jdk7u/hotspot/diff/e50eb3195734/src/os/solaris/vm/os_solaris.cpp

        Example code for reading using pread() (going back to OpenSolaris):

        https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/ptools/pmap/pmap.c

        Look e.g. at function xmapping_iter() and there look for pread().

        See also the discussion thread starting at http://mail.openjdk.java.net/pipermail/hotspot-gc-use/2015-November/002351.html.

        The analysis was done using source code review and realtime checks with pstack and truss. It would be nice if you could instruct me how to get this fixed. It is a bit specific because it only happens on Solaris and when GC log rotation is active, but Solaris should be an important platform for Oracle and GC log rotation will be used more and more.

        I will also attach a simple reproduction code as a tarball, that contains a README and a copy of the bug description (if I find out how to attach).


        THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

        REGRESSION. Last worked in version 7u76

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Start any Java process, that creates many entries in /proc/self/map and uses GC log rotation. Add -XX:+PrintGCStoppedTimeto track safepoint pauses. Trigger enough activity such that the GC log rotates. Watch a long safepoint pause directly after log rotation.

        To create a Java process with many entries in /proc/self/map, it can e.g. have many threads and/or open many Jar files with a classloader (both common scenarios on the server).

        The problem should be easily noticable with about 500-1000 entries in /proc/self/map.

        I will attach a self-contained reproduction that implements this recipe (if attachments are allowed).

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        Expected: Short safepoint pauses as long as no GC is involved.

        Actual: One long safepoint pause directly after each GC log rotation. The reason is *not* the GC duration. Pause gets longer as more entries in /proc/self/map exist.
        ERROR MESSAGES/STACK TRACES THAT OCCUR :
        Example:

        2015-11-10T22:32:43.347+0100: 27.582: Total time for which application threads were stopped: 24.1739366 seconds, Stopping threads took: 0.0002938 seconds

        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
        import java.io.File;
        import java.io.IOException;
        import java.net.MalformedURLException;
        import java.net.URL;
        import java.net.URLClassLoader;
        import java.util.jar.JarFile;

        public class GcRotationPause {

        private static final int allocCount = 100000;

        /* Create some threads to generate thread stack maps.
         * Threads will be started and sleep for the given time.
         * No CPU activity is needed.
         */
        public static void startThreads(int num, final long duration) {
            int i;
            for (i = 0; i < num; i++) {
                new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.currentThread().sleep(duration);
                        } catch (InterruptedException ex) {
                            System.err.println(ex);
                        }
                    }
                }).start();
            }
        }

        /* Create class loaders that search a jar file
         * and look up a non-existing resource, so that
         * the jar file gets mapped into memory
         */
        public static URLClassLoader[] openJars(int num, String jarName) {
            int i;
            URLClassLoader[] result = new URLClassLoader[num];
            for (i = 0; i < num; i++) {
                try {
                    String file = i + "-" + jarName;
                    result[i] = new URLClassLoader(new URL[]{(new File(file)).toURI().toURL()});
                    result[i].findResource("does.not.exist");
                } catch (MalformedURLException ex) {
                    System.err.println(ex);
                }
            }
            return result;
        }

        /* Loop for some time and do some allocation
         * to generate GC log activity and rotation.
         * The hashCode() and println() is just some
         * fuzzy code to prevent removal of the code
         * by the JIT
         */
        public static void loopAllocation(long duration) {
            long until = System.currentTimeMillis() + duration;
            long now;
            int i;
            while ((now = System.currentTimeMillis()) < until) {
                for (i = 0; i < allocCount; i++) {
                    Object o = new Object();
                    if (o.hashCode() == 0) {
                        System.out.println("Prevent JIT code removal");
                    }
                }
            }
        }

        public static void main(String[] args) {
            int threadCount = Integer.parseInt(args[0]);
            long duration = Long.parseLong(args[1]);
            int jarCount = Integer.parseInt(args[2]);
            String jarName = args[3];
            System.out.println("Starting " + threadCount + " threads, each keeping them for " + duration + " ms.");
            startThreads(threadCount, duration);
            System.out.println("Opening " + jarCount + " jars based on " + jarName + ".");
            URLClassLoader[] cls = openJars(jarCount, jarName);
            loopAllocation(duration);
            System.out.println("Done.");
        }
        }

        and use the following script to start it:

        #!/bin/sh

        # CONFIG STARTS HERE

        # Choose your path to Java
        JAVA_HOME=/usr/local/jdk1.7.0

        # Number of threads to craete
        THREADS=400

        # Number of Jar files to map
        JARS=400

        # Time in milliseconds that the reproducer tuns
        DURATION=600000

        # JVM start options (sizing, GC log and log rotation)
        FLAGS="
        -server
        -d64
        -Xmx64m -Xms64m
        -XX:NewSize=2m
        -XX:MaxNewSize=2m
        -XX:SurvivorRatio=6
        -Xloggc:gc.log
        -XX:+UseGCLogFileRotation
        -XX:GCLogFileSize=51200
        -XX:NumberOfGCLogFiles=100
        -XX:+PrintGCApplicationConcurrentTime
        -XX:+PrintGCApplicationStoppedTime
        -XX:+PrintGCDateStamps
        -XX:+PrintGCTimeStamps
        -XX:+PrintGCDetails
        -XX:+PrintHeapAtGC
        -XX:+PrintTenuringDistribution
        "

        # CONFIG ENDS HERE

        # Our test class name - do not change
        CLASS=GcRotationPause

        PATH=$JAVA_HOME/bin:$PATH
        export PATH

        echo "Cleaning up old class and jar files..."
        rm -rf work

        echo "Creating work directory..."
        mkdir -p work
        cd work

        echo "Compiling $CLASS.java..."
        javac -d . ../$CLASS.java

        echo "Creating $CLASS.jar"
        jar cf $CLASS.jar ${CLASS}*.class

        echo "Cloning jar file $JARS times"
        n=$JARS
        while [ $n -gt 0 ]
        do
            cp $CLASS.jar ${n}-$CLASS.jar
            n=`expr $n - 1`
        done

        echo "Running $CLASS in background."
        echo "Watch the GC log files in the work directory and attach with truss to follow reading /proc/self/map."
        java $FLAGS $CLASS $THREADS $DURATION $JARS $CLASS.jar &

        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Do not use GC log rotation.
        Do not use Solaris.
        Both not really helpful. No other workaround.

          Activity

          Hide
          aroy Abhijit Roy (Inactive) added a comment - - edited
          Requested the bug submitter to provide us the following information for our quick analysis.
          - The tarball which is mentioned in the reported file
          - The problem description
          - README file (reproducible steps)
          Show
          aroy Abhijit Roy (Inactive) added a comment - - edited Requested the bug submitter to provide us the following information for our quick analysis. - The tarball which is mentioned in the reported file - The problem description - README file (reproducible steps)
          Hide
          aroy Abhijit Roy (Inactive) added a comment - - edited
          According to submitter,
          Please find the tarball attached. It contains the problem description and a separate README for how to use it.
          Remember that the problem only happens on Solaris, so you need to try reproduction on that platform (more precisely I only tried Solaris Sparc).
          Show
          aroy Abhijit Roy (Inactive) added a comment - - edited According to submitter, Please find the tarball attached. It contains the problem description and a separate README for how to use it. Remember that the problem only happens on Solaris, so you need to try reproduction on that platform (more precisely I only tried Solaris Sparc).
          Show
          mgerdin Mikael Gerdin (Inactive) added a comment - Also see e-mail thread at http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2015-November/015369.html
          Hide
          aroy Abhijit Roy (Inactive) added a comment - - edited
          Test Result:
          ##########

          OS: Oracle Solaris 10 u11ga (Sun Sparc)

          JDK:
          ******
          7u76 b38 : Pass
          7u80 b15 : Fail
          8u0 b132 : Pass
          8u20 b31 : Fail
          8u51 b16 : Fail
          8u72ea b12 : Fail
          9-ea b94 : Pass
          ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

          7u76,8
          **********
          gc.log.1
          #######
          Total time for which application threads were stopped: 0.0088600 seconds
          gc.log.99
          ########
          Total time for which application threads were stopped: 0.0083722 seconds
          ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          7u80,8u20,8u51,8u72ea:
          ********************************
          gc.log.1
          #######
          2015-12-14T10:04:11.934+0000: 14.072: Total time for which application threads were stopped: 12.1704150 seconds, Stopping threads took: 0.0000977 seconds

          gc.log.45
          ########
          2015-12-14T10:13:49.975+0000: 592.089: Total time for which application threads were stopped: 11.6838587 seconds, Stopping threads took: 0.0002991 seconds

          ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          9ea
          ******
          gc.log.1
          #######
          Total time for which application threads were stopped: 0.0081028 seconds

          gc.log.99
          ########
          Total time for which application threads were stopped: 0.0095763 seconds
          Show
          aroy Abhijit Roy (Inactive) added a comment - - edited Test Result: ########## OS: Oracle Solaris 10 u11ga (Sun Sparc) JDK: ****** 7u76 b38 : Pass 7u80 b15 : Fail 8u0 b132 : Pass 8u20 b31 : Fail 8u51 b16 : Fail 8u72ea b12 : Fail 9-ea b94 : Pass ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 7u76,8 ********** gc.log.1 ####### Total time for which application threads were stopped: 0.0088600 seconds gc.log.99 ######## Total time for which application threads were stopped: 0.0083722 seconds ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 7u80,8u20,8u51,8u72ea: ******************************** gc.log.1 ####### 2015-12-14T10:04:11.934+0000: 14.072: Total time for which application threads were stopped: 12.1704150 seconds, Stopping threads took: 0.0000977 seconds gc.log.45 ######## 2015-12-14T10:13:49.975+0000: 592.089: Total time for which application threads were stopped: 11.6838587 seconds, Stopping threads took: 0.0002991 seconds ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 9ea ****** gc.log.1 ####### Total time for which application threads were stopped: 0.0081028 seconds gc.log.99 ######## Total time for which application threads were stopped: 0.0095763 seconds
          Hide
          csahu Cheleswer Sahu (Inactive) added a comment -
          JDK9 uses unified logging mechanism as default. Using unified logging this problem is not reproducible , therefore I have added 9-na label to it.
          Show
          csahu Cheleswer Sahu (Inactive) added a comment - JDK9 uses unified logging mechanism as default. Using unified logging this problem is not reproducible , therefore I have added 9-na label to it.
          Hide
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/b40d68340feb
          User: kevinw
          Date: 2016-01-08 16:21:45 +0000
          Show
          hgupdate HG Updates added a comment - URL: http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/b40d68340feb User: kevinw Date: 2016-01-08 16:21:45 +0000
          Hide
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/b40d68340feb
          User: robm
          Date: 2016-01-15 15:34:02 +0000
          Show
          hgupdate HG Updates added a comment - URL: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/b40d68340feb User: robm Date: 2016-01-15 15:34:02 +0000

            People

            • Assignee:
              csahu Cheleswer Sahu (Inactive)
              Reporter:
              webbuggrp Webbug Group
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: