JDK-8065402

G1 does not expand marking stack when mark stack overflow happens during concurrent marking

    Details

    • Type: Bug
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 9
    • Fix Version/s: 10
    • Component/s: hotspot
    • Subcomponent: gc
    • CPU: generic
    • OS: generic

      Description

      The attached spreadsheet summarizes Intel's experiment comparing manually increased MarkStackSize values against the count of
      '[GC concurrent-mark-reset-for-overflow]' messages.

      The observation is that G1 does not expand MarkStackSize toward MarkStackSizeMax when a concurrent-mark overflow happens; users have to increase it manually.

      The expand flag is set when concurrent-mark-reset-for-overflow happens.
      The issue is that we try to expand the mark stack in void ConcurrentMark::checkpointRootsFinal(bool clear_all_soft_refs):
      if there is no overflow, at the end we call set_non_marking_state() and only then try to expand the mark stack.

      set_non_marking_state() calls reset_marking_state(), which resets the expand flag based on _cm->has_overflown() (the _cm overflow flag is cleared during marking). So when we subsequently check if (_markStack.should_expand()), it is always false.
      Attachments:
      1. gclogs.tar.gz (1.90 MB), Alexander Harlap
      2. nweight_stackoverflow_summary.xlsx (64 kB), Jenny Zhang

        Activity

        tschatzl Thomas Schatzl added a comment -
        Given the following log output:

        699.092: #320: [GC concurrent-mark-reset-for-overflow]
        700.875: #320: [GC concurrent-mark-end, 2.2655883 secs]
        700.875: #320: [GC remark 700.875: #320: [Finalize Marking, 0.0004584 secs] 700.876: #320: [GC ref-proc700.876: #320: [SoftReference, 0 refs, 0.0005507 secs]700.876: #320: [WeakReference, 266 refs, 0.0003223 secs]700.877: #320: FinalReference, 31 refs, 0.0002600 secs]700.877: #320: [PhantomReference, 10 refs, 0.0003397 secs]700.877: #320: [JNI Weak Reference, 0.0000309 secs], 0.0017227 secs] 700.877: #320: [Unloading, 0.0026094 secs] 700.880: #320: [GC aggregate-data, 0.0187864 secs]

        What happens here is that during marking there is a mark stack overflow. However, G1 can continue and complete marking fine, as indicated by the "concurrent-mark-end" message. The remark phase does not indicate a mark stack overflow either, so the current implementation simply assumes that there is no need to increase the mark stack. In this case, the concurrent-mark-reset-for-overflow is mostly benign, as marking completes 1 1/2 seconds later.

        It would be different if G1 marking ran into mark stack overflows continuously, so that it could not complete the marking.

        The question is, what is the expected behavior of G1:

        a) when the mark stack overflows during concurrent mark, just continue/restart marking (and print the message) and hope that it completes, and at the following remark pause expand the stack.
        b) when the mark stack overflows during concurrent mark, increase the mark stack and continue, trying to avoid marking restart in the future. (And do nothing special in the remark pause)
        c) when the mark stack overflows during concurrent mark, just continue/restart marking (and print the message) and hope that it completes, and at the following remark pause do nothing special either if there is no overflow during remark (the current behavior)
        d) something else

        If a), then it is true that the overflow flag should probably not be cleared after marking is complete so that remark expands the stack.
        If b), then the stack needs to expand when the message is printed.
        If c), nothing needs to be done further.
        yuzhang Jenny Zhang added a comment -
        The current implementation is confusing for users. When they see that overflow message, they expect the marking stack to expand up to the maximum marking stack size.

        What is the cost of ignoring those overflows and starting over? I do not have data for that.

        What happens if we keep hitting those overflows during concurrent marking and cannot recover?
        tschatzl Thomas Schatzl added a comment -
        Costs of the current mechanism:
         - you really need to start over, though with the existing marks on the bitmap kept intact (afair). This will eventually converge to a successful marking of the entire object graph.
         - the above procedure may take so long that you run into a full GC
         (- confusing to the user)

        Costs of expansion:
         - using more memory than necessary

        One could compare the length of concurrent markings with and without a big enough mark stack on a per-application basis. In the case of this benchmark I do not expect a big performance hit.

        The full GC due to not completing concurrent marking is a considerable risk though, and most users, particularly those with really large heaps, will probably not care about a few additional MB of mark stack (assumption).


        yuzhang Jenny Zhang added a comment -
        A related issue is that the marking cycle takes too long: mixed GCs cannot start, so a full GC kicks in.
        Big data workloads (Intel, Oracle NoSQL) work around this by increasing the number of concurrent mark threads. Probably needs a separate bug to track that.
        aharlap Alexander Harlap added a comment -
        I simulated this issue with the following debugging change, in order to decrease the size of G1CMTaskQueue:


        --- a/src/share/vm/gc/g1/g1ConcurrentMark.hpp Tue Mar 14 22:14:33 2017 -0700
        +++ b/src/share/vm/gc/g1/g1ConcurrentMark.hpp Wed Mar 22 13:30:14 2017 -0400
        @@ -93,7 +93,7 @@
         #pragma warning(pop)
         #endif
         
        -typedef GenericTaskQueue<G1TaskQueueEntry, mtGC> G1CMTaskQueue;
        +typedef GenericTaskQueue<G1TaskQueueEntry, mtGC, 1024> G1CMTaskQueue;
         typedef GenericTaskQueueSet<G1CMTaskQueue, mtGC> G1CMTaskQueueSet;
         
         // Closure used by CM during concurrent reference discovery
        @@ -221,7 +221,7 @@
         class G1CMMarkStack VALUE_OBJ_CLASS_SPEC {
         public:
           // Number of TaskQueueEntries that can fit in a single chunk.
        - static const size_t EntriesPerChunk = 1024 - 1 /* One reference for the next pointer */;
        + static const size_t EntriesPerChunk = 64 - 1 /* One reference for the next pointer */;
         private:
           struct TaskQueueEntryChunk {
             TaskQueueEntryChunk* next;

        After that, GCBasher was used as a test with the flag "-XX:MarkStackSize=1K".
        I ran the test on a machine with 68 cores.

        The attached file gclogs.tar.gz contains two log files: base.log, with the current way of expanding the mark stack (only if overflow happened in remark), and fix.log, with the proposed fix, which expands the mark stack if overflow happens during concurrent mark.

        base.log shows no mark stack expansion, two "Concurrent Mark Abort" events, and two matching full GCs.

        fix.log shows a few "Expanded mark stack" events, no "Concurrent Mark Abort", and no full GC.


         

          People

          • Assignee: Unassigned
          • Reporter: yuzhang Jenny Zhang
          • Votes: 0
          • Watchers: 4
