JDK-8268372

ZGC: dynamically select the number of concurrent GC threads used

    Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 17
    • Component/s: hotspot
    • Labels:
    • Subcomponent:
      gc
    • Resolved In Build:
      b26

      Description

      ZGC uses ConcGCThreads GC threads in each GC cycle for concurrent operations such as marking, non-ref processing, and relocation. Currently, the default value of ConcGCThreads is ~12.5% of the total #CPUs. This relatively conservative default avoids "stealing" too much CPU from mutators, which matters especially for latency-sensitive benchmarks.

      However, for some benchmarks (or some phases of benchmarks), 12.5% is not enough for the GC to keep up with mutators, and allocation stalls are observed. Naively increasing the default (to 25%, for instance) solves the allocation stall problem, but also causes large regressions on some latency-sensitive benchmarks (observed in our testing). Therefore, instead of using a static number of GC threads for every GC cycle, we dynamically select the #GC threads used in each cycle, which permits fewer GC threads for good latency and more GC threads to match a higher allocation rate.

      With this change, the default ConcGCThreads will be 25% of the total #CPUs, and for each GC cycle we select the #GC threads to use (a value in the range [1, ConcGCThreads]) based on various metrics (GC cycle duration, free space left, allocation rate, etc.).

      This feature is enabled by default and can be turned off using `-XX:-UseDynamicNumberOfGCThreads`.

      --------------------------------------------------
      # Implementation Overview

      Dynamic selection has to decide two things: how many workers to use and when to initiate a GC cycle. We first cover the basic metrics it relies on.

      ## Metrics

      ### 1. gc duration

      A GC cycle mostly consists of parallel phases (using multiple GC threads) connected by short serial phases (using a single thread). We track these two parts separately (`per_worker_time` for the parallel phases and `serial_time` for the serial ones) to better model how GC duration reacts to a change in the #threads used.

      ### 2. allocation rate

      Periodically sample the #bytes allocated by mutators and keep a history of a certain number of samples, from which the average and standard deviation are computed. These are then used as an estimate of the future allocation rate:

      alloc_rate = avg + sd * sd_factor, where sd_factor is ~3.3
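
      The estimate above can be sketched as follows. This is a minimal illustration, not the HotSpot code: the class name, the history size of 10, and the use of the population standard deviation are assumptions made for the example; only the formula and the ~3.3 factor come from the description.

```python
from collections import deque
from statistics import fmean, pstdev

SD_FACTOR = 3.3  # "~3.3" in the description


class AllocRateEstimator:
    """Bounded history of allocation-rate samples, in bytes per second."""

    def __init__(self, history_size=10):
        # history_size is a hypothetical value; the description only says
        # "a certain number of samples".
        self.samples = deque(maxlen=history_size)

    def sample(self, bytes_allocated, interval_seconds):
        # Record the rate observed over one sampling interval.
        self.samples.append(bytes_allocated / interval_seconds)

    def predicted_rate(self):
        # alloc_rate = avg + sd * sd_factor
        if not self.samples:
            return 0.0
        avg = fmean(self.samples)
        sd = pstdev(self.samples)  # assumption: population standard deviation
        return avg + sd * SD_FACTOR
```

      Padding the average with a multiple of the standard deviation makes the estimate deliberately pessimistic when the allocation rate fluctuates, and leaves it equal to the average when the rate is steady.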

      ### 3. time_till_oom

      The estimated time until the heap runs out of free space, derived from #free_bytes and `alloc_rate`.
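
      The description does not give the exact derivation. A minimal sketch, assuming the simplest reading (free space divided by the predicted allocation rate); the function name and the handling of a zero rate are choices made for the example:

```python
def time_till_oom(free_bytes, alloc_rate):
    """Seconds until free space is exhausted at alloc_rate (bytes/second)."""
    if alloc_rate <= 0:
        # Nothing is being allocated, so the heap never fills up.
        return float("inf")
    return free_bytes / alloc_rate
```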

      ## Main Algorithm

      The algorithm assumes that the total GC CPU time, i.e. serial_time + per_worker_time * #workers, is similar for two consecutive GC cycles.

      Therefore, the number of workers needed to avoid OOM, if a GC cycle were started right now, is at least:

      #workers = ceil((time_till_oom - serial_time) / per_worker_time)

      Then we use #workers to calculate the actual GC duration and check whether we indeed need to start a GC. A GC cycle is initiated when `gc_duration >= time_till_oom`, where `gc_duration = serial_time + per_worker_time * #workers`.
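
      The formulas above do not spell out how `per_worker_time` scales when the worker count changes. The sketch below shows one possible reading, and it is an assumption rather than something the issue states: the parallel CPU time measured in the previous cycle (`per_worker_time * prev_workers`) is spread evenly over the new worker count. The names `prev_workers`, `max_workers`, and all function names are hypothetical, and the sketch leaves out any safety margin or smoothing, which the description does not specify.

```python
import math


def select_workers(serial_time, per_worker_time, prev_workers,
                   time_till_oom, max_workers):
    """Smallest worker count whose predicted cycle finishes before OOM."""
    # Parallel CPU time of the previous cycle, assumed similar next cycle.
    parallel_cpu_time = per_worker_time * prev_workers
    # Wall-clock time left for the parallel phases.
    budget = time_till_oom - serial_time
    if budget <= 0:
        return max_workers
    wanted = math.ceil(parallel_cpu_time / budget)
    # Clamp to the range [1, ConcGCThreads].
    return min(max(wanted, 1), max_workers)


def predicted_duration(serial_time, per_worker_time, prev_workers, workers):
    # Serial phases do not speed up; parallel work is split across workers.
    return serial_time + per_worker_time * prev_workers / workers


def should_start_gc(serial_time, per_worker_time, prev_workers,
                    time_till_oom, max_workers):
    """Return (start_now, workers) for the cycle under consideration."""
    workers = select_workers(serial_time, per_worker_time, prev_workers,
                             time_till_oom, max_workers)
    duration = predicted_duration(serial_time, per_worker_time,
                                  prev_workers, workers)
    return duration >= time_till_oom, workers
```

      For example, with `serial_time = 0.1`, `per_worker_time = 1.0`, `prev_workers = 2`, and `max_workers = 4`, a `time_till_oom` of 10 seconds selects one worker and does not start a cycle, while a `time_till_oom` of 0.5 seconds selects all four workers and starts a cycle.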

              People

              Assignee:
              ayang Albert Yang
              Reporter:
              ayang Albert Yang