JDK / JDK-8162929

Enqueuing dirty cards into a single DCQS during GC does not scale

    Details

    • Subcomponent:
      gc
    • Resolved In Build:
      b07

      Description

      While looking at some more demanding, large microbenchmarks (e.g. BigRamtester with a 20g heap and 1M regions), we found that enqueuing dirty cards during GC in G1ParScanThreadState::update_rs incurs a significant amount of wait (idle) time.

      The cause is that enqueuing a completed buffer takes a global lock. The call path basically ends up in PtrQueue::handle_zero_index() and PtrQueueSet::enqueue_complete_buffer(). There is also some odd locking/unlocking going on in PtrQueue::locking_enqueue_completed_buffer().
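      The contended pattern can be sketched in C++ as follows. This is a minimal, hypothetical model, not HotSpot code: `LockedCompletedBufferSet`, `LocalCardQueue` and `run_locked` are illustrative names, and a `std::mutex` stands in for the global lock. Each worker fills a private buffer without synchronization, but every hand-off of a full buffer to the shared set goes through the one mutex, so with many workers those hand-offs serialize.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one shared set of completed dirty-card buffers.
// Every enqueue of a completed buffer takes the same global mutex.
class LockedCompletedBufferSet {
 public:
  void enqueue_completed(std::vector<std::size_t> buffer) {
    std::lock_guard<std::mutex> guard(lock_);  // global lock on every hand-off
    completed_.push_back(std::move(buffer));
  }

  // Total number of cards held in completed buffers (for checking results).
  std::size_t total_cards() {
    std::lock_guard<std::mutex> guard(lock_);
    std::size_t n = 0;
    for (const auto& b : completed_) n += b.size();
    return n;
  }

 private:
  std::mutex lock_;
  std::vector<std::vector<std::size_t>> completed_;
};

// Hypothetical per-thread queue: appending a card needs no lock; only
// handing off a full buffer touches the shared set.
class LocalCardQueue {
 public:
  LocalCardQueue(LockedCompletedBufferSet& set, std::size_t capacity)
      : set_(set), capacity_(capacity) {
    assert(capacity_ > 0);
    buffer_.reserve(capacity_);
  }

  void enqueue(std::size_t card_index) {
    buffer_.push_back(card_index);
    if (buffer_.size() == capacity_) flush();  // buffer full: hand it off
  }

  // Hand off a partially filled buffer (e.g. at the end of a GC phase).
  void flush() {
    if (buffer_.empty()) return;
    set_.enqueue_completed(std::move(buffer_));
    buffer_ = std::vector<std::size_t>();  // reset the moved-from vector
    buffer_.reserve(capacity_);
  }

 private:
  LockedCompletedBufferSet& set_;
  std::size_t capacity_;
  std::vector<std::size_t> buffer_;
};

// Runs `threads` workers that each enqueue `cards_per_thread` cards and
// returns the number of cards that reached the shared set.
std::size_t run_locked(int threads, std::size_t cards_per_thread, std::size_t capacity) {
  LockedCompletedBufferSet set;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&set, cards_per_thread, capacity] {
      LocalCardQueue q(set, capacity);
      for (std::size_t i = 0; i < cards_per_thread; ++i) q.enqueue(i);
      q.flush();
    });
  }
  for (auto& w : workers) w.join();
  return set.total_cards();
}
```

For example, `run_locked(4, 1000, 64)` returns 4000. Smaller buffers mean more frequent hand-offs and therefore more trips through the global lock.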

      With every worker thread funneling its completed buffers through that one lock, this does not scale beyond a few threads.
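      One common way to avoid blocking on a single lock for this kind of hand-off is a push-only lock-free list. The sketch below is illustrative only: `BufferNode`, `LockFreeCompletedList` and `run_lock_free` are hypothetical names, and the report does not say this is how the issue was resolved. Producers publish a buffer with a compare-and-swap on the list head, and the consumer detaches the whole list with one `exchange`, which sidesteps the ABA problem of single-node pops. Threads still contend on the head's cache line, so this reduces but does not remove the cost of sharing one list.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical completed buffer: a batch of card indices plus a link.
struct BufferNode {
  std::vector<std::size_t> cards;
  BufferNode* next = nullptr;
};

// Push-only lock-free list. There is no single-node pop, only take_all(),
// so the classic ABA problem of lock-free stacks does not arise.
class LockFreeCompletedList {
 public:
  ~LockFreeCompletedList() { free_list(take_all()); }

  void push(BufferNode* node) {
    BufferNode* old_head = head_.load(std::memory_order_relaxed);
    do {
      node->next = old_head;  // link to the current head, retry if it moved
    } while (!head_.compare_exchange_weak(old_head, node,
                                          std::memory_order_release,
                                          std::memory_order_relaxed));
  }

  // Detach every completed buffer in one atomic step.
  BufferNode* take_all() { return head_.exchange(nullptr, std::memory_order_acquire); }

  static void free_list(BufferNode* n) {
    while (n != nullptr) {
      BufferNode* next = n->next;
      delete n;
      n = next;
    }
  }

 private:
  std::atomic<BufferNode*> head_{nullptr};
};

// Each worker pushes `buffers_per_thread` full buffers; returns total cards.
std::size_t run_lock_free(int threads, std::size_t buffers_per_thread, std::size_t capacity) {
  LockFreeCompletedList list;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&list, buffers_per_thread, capacity] {
      for (std::size_t b = 0; b < buffers_per_thread; ++b) {
        BufferNode* node = new BufferNode;  // one heap allocation per buffer
        node->cards.assign(capacity, b);    // dummy card indices
        list.push(node);
      }
    });
  }
  for (auto& w : workers) w.join();
  BufferNode* all = list.take_all();
  std::size_t total = 0;
  for (BufferNode* n = all; n != nullptr; n = n->next) {
    assert(n->cards.size() == capacity);  // every buffer arrived intact
    total += n->cards.size();
  }
  LockFreeCompletedList::free_list(all);
  return total;
}
```

Note that the sketch allocates every buffer with `new`; per-buffer allocation like this is the same kind of allocator traffic that the report says becomes the bottleneck once queue-set contention is removed.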

      The problem is harder than it seems: even after providing a per-thread DCQS, performance does not improve much, because the stalling simply moves to the malloc() calls made when allocating new DCQ buffers.
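      A usual way to take malloc() off the hot path is to recycle buffers instead of allocating a fresh one each time. The sketch below assumes a hypothetical per-worker `BufferPool` that keeps released buffers on a private free list; `BufferPool` and `allocations_for` are illustrative names, not HotSpot APIs. In the real collector completed buffers are handed to other threads, so a practical pool would also need a shared, synchronized free list for buffers returned from elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical per-worker buffer pool. Released buffers are kept on a local
// free list and handed out again, so a fresh buffer is only allocated when
// the free list is empty. Not thread-safe: each worker owns one instance.
class BufferPool {
 public:
  explicit BufferPool(std::size_t buffer_capacity) : capacity_(buffer_capacity) {}

  std::unique_ptr<std::vector<std::size_t>> allocate() {
    if (!free_.empty()) {
      auto buf = std::move(free_.back());  // reuse a retired buffer
      free_.pop_back();
      buf->clear();                        // drops contents, keeps capacity
      assert(buf->capacity() >= capacity_);
      return buf;
    }
    ++fresh_allocations_;                  // fall back to the heap allocator
    auto buf = std::make_unique<std::vector<std::size_t>>();
    buf->reserve(capacity_);
    return buf;
  }

  void release(std::unique_ptr<std::vector<std::size_t>> buf) {
    assert(buf != nullptr);
    free_.push_back(std::move(buf));
  }

  std::size_t fresh_allocations() const { return fresh_allocations_; }

 private:
  std::size_t capacity_;
  std::size_t fresh_allocations_ = 0;
  std::vector<std::unique_ptr<std::vector<std::size_t>>> free_;
};

// Simulates a worker that fills and processes `rounds` buffers one at a time
// and returns how many fresh heap allocations the pool had to make.
std::size_t allocations_for(std::size_t rounds, std::size_t capacity) {
  BufferPool pool(capacity);
  for (std::size_t r = 0; r < rounds; ++r) {
    auto buf = pool.allocate();
    for (std::size_t i = 0; i < capacity; ++i) buf->push_back(i);
    pool.release(std::move(buf));  // buffer processed: return it to the pool
  }
  return pool.fresh_allocations();
}
```

With this pattern `allocations_for(1000, 256)` makes a single fresh allocation, because each processed buffer is returned and reused for the next round.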

    People

    • Assignee:
      Kim Barrett (kbarrett)
    • Reporter:
      Thomas Schatzl (tschatzl)