Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8019342

G1: High "Other" time most likely due to card redirtying

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: hs24, hs25
    • Fix Version/s: 9
    • Component/s: hotspot
    • Subcomponent:
      gc
    • Resolved In Build:
      b12
    • CPU:
      generic
    • OS:
      generic

      Backports

        Description

        We have seen a few instances in recent G1 logs where we have fair amount of unattributed "other" time. An example:

        170.552: [GC pause (G1 Evacuation Pause) (young) 170.552: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 15552, predicted base time: 18.87 ms, remaining time: 581.13 ms, target pause time: 600.00 ms]
         170.552: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 25 regions, survivors: 128 regions, predicted young region time: 730.95 ms]
         170.552: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 25 regions, survivors: 128 regions, old: 0 regions, predicted pause time: 749.82 ms, target pause time: 600.00 ms]
        , 0.2650650 secs]
           [Parallel Time: 251.5 ms, GC Workers: 23]
              [GC Worker Start (ms): Min: 170552.6, Avg: 170553.1, Max: 170553.6, Diff: 1.0]
              [Ext Root Scanning (ms): Min: 0.1, Avg: 0.8, Max: 1.7, Diff: 1.5, Sum: 18.8]
              [Update RS (ms): Min: 0.9, Avg: 1.6, Max: 1.9, Diff: 1.0, Sum: 36.2]
                 [Processed Buffers: Min: 2, Avg: 4.4, Max: 8, Diff: 6, Sum: 101]
              [Scan RS (ms): Min: 32.6, Avg: 32.9, Max: 33.1, Diff: 0.4, Sum: 755.8]
              [Object Copy (ms): Min: 213.8, Avg: 214.2, Max: 215.3, Diff: 1.5, Sum: 4925.9]
              [Termination (ms): Min: 0.0, Avg: 1.1, Max: 1.3, Diff: 1.3, Sum: 26.2]
              [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 1.3]
              [GC Worker Total (ms): Min: 250.1, Avg: 250.6, Max: 251.2, Diff: 1.0, Sum: 5764.2]
              [GC Worker End (ms): Min: 170803.7, Avg: 170803.7, Max: 170803.8, Diff: 0.1]
           [Code Root Fixup: 0.0 ms]
           [Clear CT: 0.6 ms]
           [Other: 13.0 ms]
              [Choose CSet: 0.0 ms]
              [Ref Proc: 0.7 ms]
              [Ref Enq: 0.0 ms]
              [Free CSet: 2.5 ms]
           [Eden: 100.0M(100.0M)->0.0B(2332.0M) Survivors: 512.0M->48.0M Heap: 1617.6M(12.0G)->1581.4M(12.0G)]
         [Times: user=5.77 sys=0.01, real=0.27 secs]

        In the log snippet above the "other" time was 13ms. The attributable part of that came to 3.2ms leaving 10ms unattributed. The entire pause was ~260ms so 10ms is around 4% of the entire pause.

        Another (more extreme) example (from the ECE application):

        2013-05-17T19:27:14.480+0000: 10014.025: [GC pause (young)
        Desired survivor size 117440512 bytes, new threshold 1 (max 1)
        - age 1: 22673072 bytes, 22673072 total
         10014.025: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 7189, predicted base time: 11.81 ms, remaining time: 48.19 ms, target pause time: 60.00 ms]
         10014.025: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 435 regions, survivors: 7 regions, predicted young region time: 17.55 ms]
         10014.025: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 435 regions, survivors: 7 regions, old: 0 regions, predicted pause time: 29.36 ms, target pause time: 60.00 ms]
        , 0.2567180 secs]
           [Parallel Time: 26.5 ms, GC Workers: 7]
              [GC Worker Start (ms): Min: 10014025.9, Avg: 10014026.0, Max: 10014026.0, Diff: 0.1]
              [Ext Root Scanning (ms): Min: 4.2, Avg: 4.9, Max: 6.2, Diff: 2.0, Sum: 34.0]
              [Update RS (ms): Min: 3.5, Avg: 4.7, Max: 5.5, Diff: 2.0, Sum: 33.2]
                 [Processed Buffers: Min: 3, Avg: 10.6, Max: 17, Diff: 14, Sum: 74]
              [Scan RS (ms): Min: 6.5, Avg: 7.1, Max: 7.3, Diff: 0.8, Sum: 49.4]
              [Object Copy (ms): Min: 9.4, Avg: 9.7, Max: 9.9, Diff: 0.5, Sum: 67.6]
              [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
              [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.4]
              [GC Worker Total (ms): Min: 26.3, Avg: 26.4, Max: 26.4, Diff: 0.1, Sum: 184.7]
              [GC Worker End (ms): Min: 10014052.3, Avg: 10014052.4, Max: 10014052.4, Diff: 0.1]
           [Code Root Fixup: 0.0 ms]
           [Clear CT: 0.9 ms]
           [Other: 229.4 ms]
              [Choose CSet: 0.0 ms]
              [Ref Proc: 0.1 ms]
              [Ref Enq: 0.0 ms]
              [Free CSet: 2.0 ms]
           [Eden: 1740.0M(1740.0M)->0.0B(1716.0M) Survivors: 28.0M->28.0M Heap: 12.5G(14.0G)->10.8G(14.0G)]
         [Times: user=0.19 sys=0.00, real=0.26 secs]

        In this the "other" time was ~230ms (or almost 90% of the pause).

        By mapping out the phases of the pause executed by the VMThread, the likely culprit seems to be the card redirtying that takes place near the end of G1CollectedHeap::evacuate_collection_set().

          Attachments

            Issue Links

              Activity

                People

                • Assignee:
                  tschatzl Thomas Schatzl
                  Reporter:
                  johnc John Cuthbertson
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  4 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: