JDK-8226197

Reduce G1’s CPU cost with simplified write post-barrier and disabling concurrent refinement

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 9, 10, 11, 12, 13
    • Fix Version/s: tbd
    • Component/s: hotspot
    • Subcomponent: gc

      Description

      While migrating our production services from CMS to G1, we found that G1’s complicated write post-barrier incurs considerable CPU cost. Currently the post-barrier is the following for a write “p.f = q”:

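      if ((p xor q) >> LOG_REGION_BITS != 0) {     // the write crosses a region boundary
        if (q != null) {                           // a null store creates no reference to track
          card_address = &card_table[addr_to_index(p)];
          if (*card_address != YOUNG) {            // cards in young regions are skipped
            store_load_fence;                      // order the reference store before re-reading the card
            if (*card_address != DIRTY) {          // skip cards that are already dirty
              *card_address = DIRTY;
              T.dirtyCardQueue.enqueue(card_address);
            }
          }
        }
      }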
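
      To make the filtering steps concrete, here is a minimal single-threaded sketch of the barrier above as a toy model. The region size (1 MiB), the card size (512 bytes), the list-based card table, and the returned step names are illustrative assumptions, not HotSpot code; the store-load fence has no counterpart in a single-threaded model and appears only as a comment.

```python
# Toy model of the G1 post-barrier quoted above. All constants are assumptions.
LOG_REGION_BITS = 20            # assumed 1 MiB regions
CARD_SHIFT = 9                  # assumed 512-byte cards
CLEAN, DIRTY, YOUNG = 0, 1, 2   # hypothetical card states


def addr_to_index(addr):
    """Map a heap address to its card table index."""
    return addr >> CARD_SHIFT


def g1_post_barrier(card_table, dirty_card_queue, p, q):
    """Model the barrier for a store p.f = q (q == 0 stands for null).

    Returns the name of the step at which the barrier exited.
    """
    if (p ^ q) >> LOG_REGION_BITS == 0:
        return "same-region"
    if q == 0:
        return "null-store"
    idx = addr_to_index(p)
    if card_table[idx] == YOUNG:
        return "young-card"
    # The real barrier issues a store-load fence here.
    if card_table[idx] == DIRTY:
        return "already-dirty"
    card_table[idx] = DIRTY
    dirty_card_queue.append(idx)
    return "enqueued"
```

      Only the first cross-region store into a clean, non-young card reaches the enqueue step; every other store exits at one of the earlier checks.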

      For CMS, by contrast, the write barrier is only:
      card_address = &card_table[addr_to_index(p)];
      *card_address = DIRTY;

      The complexity of G1’s write post-barrier comes from the need to support concurrent refinement threads. However, even if the user sets -XX:G1ConcRefinementThreads=0, the write post-barrier remains the same. Ideally, the write post-barrier could be much simpler when there is no concurrent refinement.

      This RFE proposes to add a mode to G1 that uses a simplified write post-barrier:
      if ((p xor q) >> LOG_REGION_BITS != 0) {
        if (q != null) {
          card_address = &card_table[addr_to_index(p)];
          *card_address = DIRTY;
        }
      }
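
      The simplified barrier can be modeled the same way. This is a sketch under the same illustrative assumptions (1 MiB regions, 512-byte cards, a list-based card table); it is not the prototype's code.

```python
# Toy model of the proposed simplified post-barrier. Constants are assumptions.
LOG_REGION_BITS = 20   # assumed 1 MiB regions
CARD_SHIFT = 9         # assumed 512-byte cards
CLEAN, DIRTY = 0, 1    # hypothetical card states


def simplified_post_barrier(card_table, p, q):
    """Model the barrier for a store p.f = q (q == 0 stands for null).

    Returns True if the card covering p was marked dirty.
    """
    if (p ^ q) >> LOG_REGION_BITS == 0:
        return False   # same region: nothing to record
    if q == 0:
        return False   # null store: nothing to record
    # Unconditional store: no YOUNG check, no fence, no re-read, no queue.
    card_table[p >> CARD_SHIFT] = DIRTY
    return True
```

      In this model a repeated store to the same card simply rewrites DIRTY. Because the pseudocode has no YOUNG check, a card previously marked YOUNG would also be overwritten; the description does not say how the prototype treats such cards.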

      In this mode, G1 would disable concurrent refinement and the per-Java-thread dirty card queues. It would then have to process all dirty cards during a collection pause, so pauses could become longer. As long as MaxGCPauseMillis is reasonably large relative to the heap size, however, G1’s adaptive heuristics should still be able to adjust the young-gen size to meet the pause time goal.
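
      Without dirty card queues, the pause has to find dirty cards by walking the card table itself. The following is a rough sketch of that step, assuming 512-byte cards and a list-based card table; the function name is hypothetical, and clearing each card as it is collected is an assumption of this model rather than something the description specifies.

```python
# Toy model of pause-time dirty card discovery. Names and constants are assumptions.
CARD_SHIFT = 9        # assumed 512-byte cards
CLEAN, DIRTY = 0, 1   # hypothetical card states


def collect_dirty_cards(card_table):
    """Scan the whole card table and return the heap ranges to examine.

    Each range is (start_address, end_address) of one dirty card.
    Collected cards are reset to CLEAN (an assumption of this sketch).
    """
    ranges = []
    for idx, state in enumerate(card_table):
        if state == DIRTY:
            start = idx << CARD_SHIFT
            ranges.append((start, start + (1 << CARD_SHIFT)))
            card_table[idx] = CLEAN
    return ranges
```

      In this sketch the scan visits every card table entry, so its cost grows with the size of the table as well as with the number of dirty cards, which illustrates why the pause could become longer.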

      This new mode would reduce G1’s CPU usage considerably. It would be particularly helpful for certain types of workloads, e.g.:
       - Workloads that are heavily tuned for CMS to minimize old-gen collections and are sensitive to CPU usage.
       - Workloads that mainly care about throughput and CPU usage.

      I have implemented a prototype for this mode, and attached some preliminary results.

              People

              • Assignee: Man Cao (manc)
              • Reporter: Man Cao (manc)