JDK-8065402

G1 does not expand marking stack when mark stack overflow happens during concurrent marking

    Details

    • Subcomponent:
      gc
    • CPU:
      generic
    • OS:
      generic

      Description

      Attached is a spreadsheet summarizing Intel's experiment: MarkStackSize was increased manually and compared against the count of
      '[GC concurrent-mark-reset-for-overflow]' log entries.

      The observation is that G1 does not expand MarkStackSize to MarkStackSizeMax when a concurrent mark overflow happens. Users have to increase it manually.

      The expand flag is set when concurrent-mark-reset-for-overflow happens.
      The problem is that we only try to expand the mark stack in void ConcurrentMark::checkpointRootsFinal(bool clear_all_soft_refs).
      If there is no overflow there, we call set_non_marking_state() at the end and then try to expand the mark stack.

      set_non_marking_state() calls reset_marking_state(), which resets the expand flag based on _cm->has_overflown() (the _cm overflow flag is cleared during marking). So by the time we check if (_markStack.should_expand()), it is always false.
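The ordering problem described above can be illustrated with a small model. This is only a sketch under stated assumptions: MarkStackModel, ConcurrentMarkModel and all of their members are hypothetical names invented for the illustration, not the HotSpot classes, and the "check before reset" variant is just one way to show that the request survives, not necessarily the fix that was applied.

```cpp
#include <cassert>

// Hypothetical, heavily simplified model of the mark stack's expand request.
struct MarkStackModel {
  bool expand_requested = false;
  int expansions = 0;

  // Models the reset described in the report: the expand request is
  // recomputed from the current overflow state.
  void reset_expand(bool has_overflown) { expand_requested = has_overflown; }

  // "Expands" (here: only counts) if a request is pending.
  bool expand_if_requested() {
    if (!expand_requested) return false;
    expand_requested = false;
    ++expansions;
    return true;
  }
};

// Hypothetical model of the marking driver.
struct ConcurrentMarkModel {
  MarkStackModel stack;
  bool has_overflown = false;

  // Overflow during concurrent marking: the expand request is recorded, then
  // marking restarts and the overflow flag is cleared again.
  void overflow_during_concurrent_mark() {
    has_overflown = true;
    stack.expand_requested = true;
    has_overflown = false;  // cleared when marking restarts
  }

  // Order described in the report: reset first, then check. The reset
  // overwrites the pending request because has_overflown is already false.
  bool remark_reset_then_check() {
    stack.reset_expand(has_overflown);
    return stack.expand_if_requested();
  }

  // Illustrative alternative: check first, then reset.
  bool remark_check_then_reset() {
    bool grew = stack.expand_if_requested();
    stack.reset_expand(has_overflown);
    return grew;
  }
};

// Small self-check of the two orderings.
inline void demo() {
  ConcurrentMarkModel lost;
  lost.overflow_during_concurrent_mark();
  assert(!lost.remark_reset_then_check());  // request was lost

  ConcurrentMarkModel kept;
  kept.overflow_during_concurrent_mark();
  assert(kept.remark_check_then_reset());   // request was honored
}
```

In this model the request raised during concurrent marking is dropped only because the reset runs before the check, which matches the "always false" behavior reported above.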
      1. b8065402.java
        1 kB
        Alexander Harlap
      2. gclogs.tar.gz
        1.90 MB
        Alexander Harlap
      3. nweight_stackoverflow_summary.xlsx
        64 kB
        Jenny Zhang

          Activity

          tschatzl Thomas Schatzl added a comment -
          Costs of the current mechanism:
           - you really need to start over, although the existing marks on the bitmap are kept intact (afair). This will eventually converge to a successful marking of the entire object graph.
           - the above procedure may take so long that you run into a full gc
           - (confusing to the user)

          Costs of expansion
           - using more memory than necessary

          One could compare the length of concurrent markings with and without a big enough mark stack on a per-application basis. In the case of this benchmark I do not expect a big performance hit.

          The full gc caused by not completing concurrent marking is a considerable risk though, and most users, particularly those with really large heaps, will probably not care about a few additional MB of mark stack (assumption).


          yuzhang Jenny Zhang added a comment -
          A related issue is that the marking cycle takes too long: mixed gcs cannot start, so a full gc kicks in.
          Big data workloads (Intel, Oracle NoSQL) work around this by increasing the number of concurrent mark threads. We probably need a separate bug to keep track of this.
          aharlap Alexander Harlap added a comment -
          I simulated this issue with the following debugging change, which decreases the size of G1CMTaskQueue:


          --- a/src/share/vm/gc/g1/g1ConcurrentMark.hpp Tue Mar 14 22:14:33 2017 -0700
          +++ b/src/share/vm/gc/g1/g1ConcurrentMark.hpp Wed Mar 22 13:30:14 2017 -0400
          @@ -93,7 +93,7 @@
           #pragma warning(pop)
           #endif
           
          -typedef GenericTaskQueue<G1TaskQueueEntry, mtGC> G1CMTaskQueue;
          +typedef GenericTaskQueue<G1TaskQueueEntry, mtGC, 1024> G1CMTaskQueue;
           typedef GenericTaskQueueSet<G1CMTaskQueue, mtGC> G1CMTaskQueueSet;
           
           // Closure used by CM during concurrent reference discovery
          @@ -221,7 +221,7 @@
           class G1CMMarkStack VALUE_OBJ_CLASS_SPEC {
           public:
             // Number of TaskQueueEntries that can fit in a single chunk.
          - static const size_t EntriesPerChunk = 1024 - 1 /* One reference for the next pointer */;
          + static const size_t EntriesPerChunk = 64 - 1 /* One reference for the next pointer */;
           private:
             struct TaskQueueEntryChunk {
               TaskQueueEntryChunk* next;

          After that, GCBasher was used as a test with the flag "-XX:MarkStackSize=1K".
          I ran the test on a machine with 68 cores.

          The attached file gclogs.tar.gz contains two log files: base.log, with the current way of expanding the mark stack (only if overflow happened in remark), and fix.log, with the proposed fix (expand the mark stack if overflow happened during concurrent mark).

          base.log has no mark stack expansion, two "Concurrent Mark Abort" entries and two matching Full GCs.

          fix.log has a few "Expanded mark stack" entries, no "Concurrent Mark Abort" and no Full GC.


           
          aharlap Alexander Harlap added a comment -
          Added a test for reproducing the issue.
          aharlap Alexander Harlap added a comment - - edited
          Constructed test b8065402.java, which does not require any VM modifications to reproduce the issue.

          Testing with this file:
          jdk-base/bin/java -Xmx32g -server -XX:+UseG1GC -Xlog:gc*=debug b8065402 150
          Duration - 182.831 sec (with 5 full gc)

          Proposed fix:
          jdk-fix/bin/java -Xmx32g -server -XX:+UseG1GC -Xlog:gc*=debug b8065402 150
          Duration - 68.267 sec (with 0 full gc). The mark stack was expanded twice: 4M -> 8M -> 16M
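The 4M -> 8M -> 16M progression is consistent with doubling the mark stack on each overflow. A minimal sketch of such a policy follows, assuming a doubling step capped at a maximum in the spirit of MarkStackSizeMax. The function names are hypothetical, the 512M cap used in the usage note is an arbitrary example value, and the real expansion logic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: next size under a doubling policy capped at max_size.
inline std::size_t next_mark_stack_size(std::size_t current, std::size_t max_size) {
  if (current >= max_size) return current;  // already at (or above) the cap
  // Compare against max_size / 2 so that current * 2 cannot overflow.
  return (current > max_size / 2) ? max_size : current * 2;
}

// Sizes observed after each of `overflows` overflow events, starting at `initial`.
inline std::vector<std::size_t> expansion_sequence(std::size_t initial,
                                                   std::size_t max_size,
                                                   int overflows) {
  std::vector<std::size_t> sizes{initial};
  for (int i = 0; i < overflows; ++i) {
    sizes.push_back(next_mark_stack_size(sizes.back(), max_size));
  }
  return sizes;
}
```

With an initial size of 4M, an example cap of 512M and two overflow events, expansion_sequence yields 4M, 8M, 16M, the same steps as reported in the run above.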

            People

            • Assignee:
              aharlap Alexander Harlap
              Reporter:
              yuzhang Jenny Zhang