JDK-8137099 (project: JDK)

G1 needs to "upgrade" GC within the safepoint if it can't allocate during that safepoint to avoid OoME

    Details

    • Subcomponent:
      gc
    • Resolved In Build:
      b01

      Description

      We regularly see OutOfMemoryErrors with G1 in our stress tests. Running the same tests with the same heap size under ParallelGC and CMS does not show the problem.

      The stress tests are based on real world application code with a lot of threads.

      Scenario:
      We have an application with many threads that spend time in critical native sections.

      1. An evacuation failure happens during a GC.
      2. After clean-up work, the safepoint is left.
      3. Another thread can't allocate and triggers a new incremental GC.
      4. A thread that can't allocate after an incremental GC triggers a full GC. However, the full GC doesn't start because another thread
          started an incremental GC, the GCLocker is active, or the GCLocker-initiated GC has not yet been performed.
          If an incremental GC doesn't succeed because of the GCLocker, and this happens more often than GCLockerRetryAllocationCount (=2) times, an OOME is thrown.
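      The retry behaviour in step 4 can be illustrated with a small Java model. This is a hypothetical sketch, not HotSpot code: the class GCLockerRetryModel, ScriptedHeap and allocateSlow are invented for illustration, and only the retry limit of 2 (GCLockerRetryAllocationCount) is taken from the report. The model assumes that a collection which is not locked out always frees enough memory, so only locked-out attempts count against the limit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model (not HotSpot code) of an allocation slow path that gives up
 * and throws an OOME once collections have been locked out too often.
 */
public class GCLockerRetryModel {
    // Value quoted in the report for GCLockerRetryAllocationCount.
    static final int GC_LOCKER_RETRY_ALLOCATION_COUNT = 2;

    enum GcResult { COLLECTED, LOCKED_OUT }

    /** Heap whose collection attempts follow a script; an exhausted script means "locked out". */
    static final class ScriptedHeap {
        private final Deque<GcResult> script;
        private boolean spaceAvailable = false;
        int collectCalls = 0;

        ScriptedHeap(List<GcResult> results) {
            this.script = new ArrayDeque<>(results);
        }

        boolean tryAllocate() {
            return spaceAvailable;
        }

        GcResult collect() {
            collectCalls++;
            GcResult result = script.isEmpty() ? GcResult.LOCKED_OUT : script.poll();
            if (result == GcResult.COLLECTED) {
                spaceAvailable = true; // assumption: a completed GC frees enough memory
            }
            return result;
        }
    }

    /** Retries allocation, counting only collections that were locked out. */
    static void allocateSlow(ScriptedHeap heap) {
        int lockedOut = 0;
        while (!heap.tryAllocate()) {
            if (heap.collect() == GcResult.LOCKED_OUT
                    && ++lockedOut > GC_LOCKER_RETRY_ALLOCATION_COUNT) {
                throw new OutOfMemoryError("Java heap space (model)");
            }
        }
    }

    public static void main(String[] args) {
        ScriptedHeap recovers = new ScriptedHeap(
                List.of(GcResult.LOCKED_OUT, GcResult.LOCKED_OUT, GcResult.COLLECTED));
        allocateSlow(recovers);
        System.out.println("recovers: allocated after " + recovers.collectCalls + " GC attempts");

        ScriptedHeap starved = new ScriptedHeap(List.of()); // GCLocker never releases
        try {
            allocateSlow(starved);
        } catch (OutOfMemoryError e) {
            System.out.println("starved: OOME after " + starved.collectCalls + " GC attempts");
        }
    }
}
```

      In this model, two lockouts followed by a real collection still succeed, while a third consecutive lockout exceeds the limit and ends in an OOME, even though a full GC (had it run) might have freed memory.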

      Without critical native code, the thread would keep trying to trigger a full GC until it succeeds. In that case there is only a performance issue, not an OOME.

      The reason is that only G1 splits the "upgrade" of a young GC to a full GC into multiple VM operations. Between those operations, the GCLocker state can change and prevent the full GC.

      The problem can be reproduced with the attached program.
      The parameters might vary depending on the system.

      java -Xmx64m -XX:+UseG1GC -XX:+PrintGC -XX:MaxGCPauseMillis=10 -XX:+UnlockExperimentalVMOptions -XX:-G1ForceFullGCAfterEvacuationFailure -XX:-PrintAdaptiveSizePolicy TestEvacFailureThreaded 10 10000000 10000 10000 10000 10 0.7

      A snippet of the output:

      #2539: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0062519 secs]
      #2540: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0050967 secs]
      #2538: [GC concurrent-mark-end, 0.0193436 secs]
      #2538: [GC remark, 0.0048717 secs]
      #2538: [GC cleanup 62M->62M(64M), 0.0016663 secs]
      #2541: [GC pause (GCLocker Initiated GC) (young) 62M->62M(64M), 0.0061165 secs]
      #2542: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0063998 secs]
      #2543: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0066795 secs]
      #2544: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0082145 secs]
      #2545: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0102476 secs]
      #2546: [GC pause (GCLocker Initiated GC) (mixed)-- 62M->62M(64M), 0.0142916 secs]
      #2547: [GC pause (G1 Evacuation Pause) (mixed)-- 62M->62M(64M), 0.0108066 secs]
      #2548: [GC pause (G1 Evacuation Pause) (young) 62M->62M(64M), 0.0065968 secs]
      #2549: [Full GC (Allocation Failure) 62M->23M(64M), 0.0483837 secs]
      java.lang.OutOfMemoryError: Java heap space
              at TestEvacFailureThreaded.runTest(TestEvacFailureThreaded.java:75)
              at TestEvacFailureThreaded$2.run(TestEvacFailureThreaded.java:138)



      1. TestEvacFailureThreaded.cpp
        0.6 kB
        Axel Siebenborn
      2. TestEvacFailureThreaded.h
        0.6 kB
        Axel Siebenborn
      3. TestEvacFailureThreaded.java
        4 kB
        Axel Siebenborn

          Activity

          mgerdin Mikael Gerdin (Inactive) added a comment -
          Axel, do you still see this OOME after the fix for JDK-8130265?
          I just remembered this bug, and it seems that fix resolves or hides this problem for me.
          asiebenborn Axel Siebenborn added a comment -
          Mikael, I can still reproduce the problem.
          tschatzl Thomas Schatzl added a comment (edited) -
          This looks very similar to JDK-8179226, i.e. the GCLocker in combination with full and young GCs.
          tschatzl Thomas Schatzl added a comment -
          JDK-8179226 is similar but not the same, although this same issue also occurs in JDK-8179226. I am changing JDK-8179226 to be about the other bug (a test bug), and this one to fix the missing "upgrade" of young collections to full ones.
          tschatzl Thomas Schatzl added a comment -
          The reason for the bug (as the suggested fix indicates) is that in G1, and only in G1, full GCs determined by ergonomics are executed using two safepoints. Between the first (the young GC request) and the second (the full GC request), the GCLocker can "lock out" the second so that it never occurs.

          The other collectors issue a full GC after a young GC within the same pause, so that everything possible to obtain memory is done atomically and this lock-out can't happen (if the first GC has not been locked out, the second can't be either, because the GCLocker can't be entered while a safepoint is in progress).
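          The difference between the two-safepoint upgrade and the single-pause upgrade can be sketched as a single-threaded Java model. It is a hypothetical illustration, not the actual fix: UpgradeModel, GcLocker, runSafepoint, mutatorEntersCriticalRegion and the other names are invented, and the young and full GCs are reduced to flags. The only rule taken from the comment is that the GCLocker cannot be entered while a safepoint is in progress; the young GC is assumed to free nothing, as after an evacuation failure.

```java
/**
 * Hypothetical single-threaded model (not HotSpot code) contrasting a young-to-full
 * GC "upgrade" split across two safepoints with one done inside a single safepoint.
 */
public class UpgradeModel {

    /** Stand-in for the GCLocker: tracks threads inside critical native regions. */
    static final class GcLocker {
        int activeCriticalRegions = 0;
        boolean safepointInProgress = false;

        /** A thread cannot enter a critical region while a safepoint is in progress. */
        boolean tryEnter() {
            if (safepointInProgress) {
                return false;
            }
            activeCriticalRegions++;
            return true;
        }

        boolean isActive() {
            return activeCriticalRegions > 0;
        }
    }

    static final class Heap {
        final GcLocker locker = new GcLocker();
        boolean fullGcRan = false;

        /** Runs body as one VM operation, i.e. inside a safepoint. */
        void runSafepoint(Runnable body) {
            locker.safepointInProgress = true;
            try {
                body.run();
            } finally {
                locker.safepointInProgress = false;
            }
        }

        /** Assumed to hit an evacuation failure and free nothing. */
        void youngGc() {
        }

        /** The full GC is skipped ("locked out") while any critical region is active. */
        void fullGc() {
            if (!locker.isActive()) {
                fullGcRan = true;
            }
        }

        /** A mutator thread racing to enter a critical region. */
        boolean mutatorEntersCriticalRegion() {
            return locker.tryEnter();
        }
    }

    /** G1 as described in this report: young and full GC in two separate safepoints. */
    static boolean splitUpgrade(Heap heap) {
        heap.runSafepoint(heap::youngGc);
        heap.mutatorEntersCriticalRegion(); // succeeds: no safepoint in progress
        heap.runSafepoint(heap::fullGc);    // locked out by the active region
        return heap.fullGcRan;
    }

    /** The other collectors: young GC followed by full GC within one safepoint. */
    static boolean atomicUpgrade(Heap heap) {
        heap.runSafepoint(() -> {
            heap.youngGc();
            heap.mutatorEntersCriticalRegion(); // refused: safepoint in progress
            heap.fullGc();                      // runs
        });
        return heap.fullGcRan;
    }

    public static void main(String[] args) {
        System.out.println("split upgrade ran full GC:  " + splitUpgrade(new Heap()));
        System.out.println("atomic upgrade ran full GC: " + atomicUpgrade(new Heap()));
    }
}
```

          Running main prints false for the split upgrade and true for the atomic one: in the split version the mutator enters a critical region in the window between the two safepoints, while in the atomic version the same attempt falls inside the safepoint and is refused.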
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk/hs/rev/862c41cf1c7f
          User: tschatzl
          Date: 2018-01-11 09:54:31 +0000
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk/jdk/rev/862c41cf1c7f
          User: jwilhelm
          Date: 2018-01-19 16:54:42 +0000

            People

            • Assignee:
              tschatzl Thomas Schatzl
              Reporter:
              asiebenborn Axel Siebenborn
            • Votes:
              0
              Watchers:
              6
