JDK-8065402

G1 does not expand marking stack when mark stack overflow happens during concurrent marking

    Details

    • Subcomponent:
      gc
    • Resolved In Build:
      team
    • CPU:
      generic
    • OS:
      generic

      Description

      The attached spreadsheet summarizes Intel's experiment: manually increased MarkStackSize values versus the count of
      '[GC concurrent-mark-reset-for-overflow]' log entries.

      Observation: G1 does not expand MarkStackSize toward MarkStackSizeMax when a concurrent mark overflow happens, so users have to increase it manually.

      The expand flag is set when concurrent-mark-reset-for-overflow happens.
      The problem is that we only try to expand the mark stack in void ConcurrentMark::checkpointRootsFinal(bool clear_all_soft_refs).
      If there is no overflow during remark, we call set_non_marking_state() at the end and only then try to expand the mark stack.

      set_non_marking_state() calls reset_marking_state(), which resets the expand flag based on _cm->has_overflown(). The _cm overflow flag has already been cleared during marking, so by the time we check if (_markStack.should_expand()), the result is always false.
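The ordering problem described above can be illustrated with a small standalone model. This is a sketch only, not HotSpot code: MarkStackModel, MarkerModel, and all of their members are hypothetical names, and the "check before reset" variant is just one way to show the effect of the ordering, not necessarily the fix that was committed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the mark stack; not the HotSpot class.
struct MarkStackModel {
    std::size_t capacity;
    std::size_t max_capacity;
    bool should_expand;

    MarkStackModel(std::size_t cap, std::size_t max_cap)
        : capacity(cap), max_capacity(max_cap), should_expand(false) {
        assert(cap > 0 && cap <= max_cap);
    }

    // Models the reset described in the report: the expand flag is
    // recomputed from the marker's current overflow state.
    void reset_expand(bool has_overflown) { should_expand = has_overflown; }

    // Doubles the capacity (capped at max_capacity) if expansion was requested.
    bool expand_if_requested() {
        if (!should_expand || capacity >= max_capacity) return false;
        capacity = (capacity > max_capacity / 2) ? max_capacity : capacity * 2;
        should_expand = false;
        return true;
    }
};

// Hypothetical stand-in for the concurrent marker.
struct MarkerModel {
    MarkStackModel stack;
    bool has_overflown;

    MarkerModel(std::size_t cap, std::size_t max_cap)
        : stack(cap, max_cap), has_overflown(false) {}

    // Overflow during concurrent marking: request expansion, then clear
    // the overflow state so that marking can restart.
    void on_concurrent_overflow() {
        stack.should_expand = true;
        has_overflown = false;
    }

    // Order described in the report: reset first, then check the flag.
    bool finish_remark_reset_first() {
        stack.reset_expand(has_overflown);
        return stack.expand_if_requested();
    }

    // Illustrative alternative (an assumption): check the flag before the reset.
    bool finish_remark_check_first() {
        bool expanded = stack.expand_if_requested();
        stack.reset_expand(has_overflown);
        return expanded;
    }
};
```

In this model, calling on_concurrent_overflow() followed by finish_remark_reset_first() never grows the stack, because the reset overwrites the request with the already-cleared overflow state. finish_remark_check_first() consumes the request before it is lost.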
      1. b8065402.java
        1 kB
        Alexander Harlap
      2. gclogs.tar.gz
        1.90 MB
        Alexander Harlap
      3. nweight_stackoverflow_summary.xlsx
        64 kB
        Jenny Zhang

          Activity

          yuzhang Jenny Zhang added a comment -
          A related issue is that the marking cycle takes too long. Mixed GCs cannot start, so a full GC kicks in.
          Big data workloads (Intel, Oracle NoSQL) work around this by increasing the number of concurrent mark threads. We probably need a separate bug to track this.
          aharlap Alexander Harlap added a comment -
          I simulated this issue with the following debugging change, which decreases the size of G1CMTaskQueue:


          --- a/src/share/vm/gc/g1/g1ConcurrentMark.hpp Tue Mar 14 22:14:33 2017 -0700
          +++ b/src/share/vm/gc/g1/g1ConcurrentMark.hpp Wed Mar 22 13:30:14 2017 -0400
          @@ -93,7 +93,7 @@
           #pragma warning(pop)
           #endif
           
          -typedef GenericTaskQueue<G1TaskQueueEntry, mtGC> G1CMTaskQueue;
          +typedef GenericTaskQueue<G1TaskQueueEntry, mtGC, 1024> G1CMTaskQueue;
           typedef GenericTaskQueueSet<G1CMTaskQueue, mtGC> G1CMTaskQueueSet;
           
           // Closure used by CM during concurrent reference discovery
          @@ -221,7 +221,7 @@
           class G1CMMarkStack VALUE_OBJ_CLASS_SPEC {
           public:
             // Number of TaskQueueEntries that can fit in a single chunk.
          - static const size_t EntriesPerChunk = 1024 - 1 /* One reference for the next pointer */;
          + static const size_t EntriesPerChunk = 64 - 1 /* One reference for the next pointer */;
           private:
             struct TaskQueueEntryChunk {
               TaskQueueEntryChunk* next;

          After that, GCBasher was used as the test with the flag "-XX:MarkStackSize=1K".
          I ran the test on a machine with 68 cores.

          The attached file gclogs.tar.gz contains two log files: base.log, for the current way of expanding the mark stack (only if the overflow happened in remark), and fix.log, for the proposed fix (expand the mark stack if the overflow happened during concurrent mark).

          base.log has no mark stack expansion, two "Concurrent Mark Abort" entries and two matching Full GCs.

          fix.log has a few "Expanded mark stack" entries, no "Concurrent Mark Abort" and no Full GC.


           
          aharlap Alexander Harlap added a comment -
          Added a test that reproduces the issue.
          aharlap Alexander Harlap added a comment (edited) -
          Constructed a test, b8065402.java, that reproduces the issue without any VM modifications.

          Testing with this file:
          jdk-base/bin/java -Xmx32g -server -XX:+UseG1GC -Xlog:gc*=debug b8065402 150
          Duration: 182.831 sec (with 5 full GCs)

          Proposed fix:
          jdk-fix/bin/java -Xmx32g -server -XX:+UseG1GC -Xlog:gc*=debug b8065402 150
          Duration: 68.267 sec (with 0 full GCs). The mark stack was expanded twice: 4M -> 8M -> 16M
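The 4M -> 8M -> 16M sequence above is consistent with doubling the capacity on each expansion. The sketch below models that growth. It is an illustration under stated assumptions, not HotSpot code: expansion_steps is a hypothetical helper, the doubling policy is inferred from the reported sizes, and the cap stands in for the MarkStackSizeMax limit mentioned in the description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the capacities visited when a stack that
// starts at `initial` is doubled once per overflow, never exceeding
// `max_capacity`. Units are whatever the caller uses (e.g. "M" as in the log).
std::vector<std::size_t> expansion_steps(std::size_t initial,
                                         std::size_t max_capacity,
                                         unsigned overflows) {
    std::vector<std::size_t> steps;
    steps.push_back(initial);
    std::size_t cap = initial;
    for (unsigned i = 0; i < overflows && cap < max_capacity; ++i) {
        // Clamp to the maximum instead of overshooting it.
        cap = (cap > max_capacity / 2) ? max_capacity : cap * 2;
        steps.push_back(cap);
    }
    return steps;
}
```

With an initial capacity of 4, a cap of 16, and two overflows, the helper yields 4, 8, 16, matching the sizes reported for the fixed build. Further overflows add no steps once the cap is reached.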
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/24afa1eef92f
          User: kbarrett
          Date: 2017-05-09 23:24:46 +0000

            People

            • Assignee:
              aharlap Alexander Harlap
              Reporter:
              yuzhang Jenny Zhang
