JDK-8180450

secondary_super_cache does not scale well

      Description

      On some workloads, updates to the Klass::secondary_super_cache field
      cause excessive cache line invalidation traffic, with noticeable slowdowns.

      Specifically, the cache itself may become unstable, which is a normal corner case for one-element caches. At that point a multi-threaded application may begin "hammering" on the cache line from multiple threads, causing an explosion of coherence traffic.

      One customer reported this as happening when multiple threads were traversing heterogeneous sequences of objects, testing the same classes against more than one interface, with rapid variation between the interfaces.

      In such a case, two interfaces could compete to use the single secondary_super_cache (SSC) slot on each class that occurs in the object sequence. The competition would turn into frequent updating of the SSC slots by multiple threads, causing cache lines to ping-pong between processors.
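
      The competition described above can be illustrated with a small single-threaded model. This is only a sketch: ToyKlass, its fields, and alternate_queries are hypothetical names, not HotSpot code, and interfaces are represented as plain integers. It counts how often the single cache slot is overwritten when two interface checks alternate against the same class.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class descriptor; not HotSpot's Klass.
struct ToyKlass {
    std::vector<int> secondary_supers;  // stable list of interface ids
    int ssc = -1;                       // single-element cache slot
    std::size_t ssc_writes = 0;         // number of stores to the slot

    bool is_subtype_of(int iface) {
        if (ssc == iface) return true;      // cache hit: read only
        for (int s : secondary_supers) {    // cache miss: linear search
            if (s == iface) {
                ssc = iface;                // store to the shared slot
                ++ssc_writes;
                return true;
            }
        }
        return false;
    }
};

// Alternates two interface queries against one class and returns the
// number of stores made to the cache slot.
std::size_t alternate_queries(std::size_t rounds) {
    ToyKlass k;
    k.secondary_supers = {10, 20, 30};
    for (std::size_t i = 0; i < rounds; ++i) {
        k.is_subtype_of(10);
        k.is_subtype_of(20);
    }
    return k.ssc_writes;
}
```

      In this model every alternating query misses and stores to the slot, whereas repeated queries for a single interface store only once. On real hardware with several threads, each such store would also invalidate the cache line in the other processors.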

      To fix this, the SSC has to have some sort of limit on its update rate, or be replaced by a mechanism that scales better.

      The simplest fix is probably to put an "update count" profile counter somewhere, and consult that counter just before updating the SSC. If the counter is too high (evidence of a high contention rate), don't update the SSC. The trade-off is between linear searches of the Klass::secondary_supers array (which is stable and therefore replicated across caches) versus time spent waiting to acquire write access to the SSC (which may be hundreds of cycles). Linear search will easily win in those cases, except of course for very dense dynamic query mixes over very complex interface graphs, which is a corner case we can leave for the future.

      The obvious place to put the update count is next to the SSC, on the same cache line. When the update count passes some selected threshold, the SSC is left unchanged. On balance the extra footprint of a 32-bit field per Klass seems acceptable.
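
      A minimal sketch of the counter-based throttle, under stated assumptions: ThrottledKlass, update_count, and the threshold kMaxUpdates (set to 8 here purely for illustration) are hypothetical, and the model is single-threaded, so it ignores the atomic or racy accesses a real VM would need.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a class descriptor with a throttled one-element cache.
struct ThrottledKlass {
    std::vector<int> secondary_supers;  // stable list of interface ids
    int ssc = -1;                       // single-element cache slot
    std::uint32_t update_count = 0;     // assumed 32-bit field next to the slot
    static constexpr std::uint32_t kMaxUpdates = 8;  // assumed threshold

    bool is_subtype_of(int iface) {
        if (ssc == iface) return true;          // cache hit
        for (int s : secondary_supers) {        // linear search on a miss
            if (s == iface) {
                // Consult the counter just before updating the slot; once
                // the threshold is reached, the slot is left unchanged.
                if (update_count < kMaxUpdates) {
                    ssc = iface;
                    ++update_count;
                }
                return true;
            }
        }
        return false;
    }
};
```

      Once the threshold is reached, queries still return the correct answer by linear search, but they no longer write to the cache line.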

      Such a counter should be allowed to decay, so that temporary bursts in type test complexity do not shut down the SSC forever.
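
      One way to let the counter decay is to halve it periodically. The sketch below assumes that policy; the issue does not say how or when decay should happen, and DecayingCounter, try_record_update, and decay are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical update counter with a threshold and periodic halving.
struct DecayingCounter {
    std::uint32_t value = 0;
    std::uint32_t threshold;  // assumed tuning knob

    explicit DecayingCounter(std::uint32_t t) : threshold(t) {}

    // Returns true if a cache update is still allowed, and records it.
    bool try_record_update() {
        if (value >= threshold) return false;
        ++value;
        return true;
    }

    // Called at some periodic point chosen by the VM (an assumption here);
    // halving lets a past burst of updates fade over time.
    void decay() { value >>= 1; }
};
```

      With a threshold of 4, the fifth update is refused, but after one call to decay() the counter drops to 2 and updates are accepted again.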

      Another possible fix would be a thread-local update counter for the SSC, under JavaThread::current. In that case, only Java code could use the extra fix to avoid cache contention, but that is probably also acceptable. This fix would be significantly more complex, but would have the benefit that only "offending" threads would throttle themselves.
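
      The thread-local variant can be sketched with a C++ thread_local variable standing in for a field reached through JavaThread::current. The names t_ssc_updates, kPerThreadBudget, and thread_may_update_ssc are hypothetical, and the budget of 16 is an arbitrary illustrative value.

```cpp
#include <cstdint>

// Hypothetical per-thread tally of SSC stores; each thread has its own copy.
thread_local std::uint32_t t_ssc_updates = 0;

// Assumed per-thread budget; the issue does not specify a value.
constexpr std::uint32_t kPerThreadBudget = 16;

// Returns true if the calling thread may still store to an SSC slot.
bool thread_may_update_ssc() {
    if (t_ssc_updates >= kPerThreadBudget) return false;
    ++t_ssc_updates;
    return true;
}
```

      Because the counter is per thread, a thread that exhausts its budget stops writing to SSC slots without affecting threads that rarely miss.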

      Similarly, the counter could be placed in the MethodData object which carries the profile of the instruction which is causing the SSC contention. (This instruction could be instanceof, checkcast, aastore, or a call to an intrinsic method that emulates one of those.) This fix would be even more complex than the thread-based fix, and would probably be overkill given the relatively small importance of the problem.

      If the secondary_supers lists ever grow in length to more than a few tens of elements, additional mechanisms may be needed for quickly testing the subtype relation. A tree walk would probably be sufficient. Unified caches (global or thread-local) or unified numbering schemes are sometimes proposed, but those also seem like overkill for this problem.

              People

              • Assignee:
                Unassigned
                Reporter:
                jrose John Rose