JDK / JDK-8237077

C2 fails to optimize certain code shapes with memory access indexed var handles

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: P3
    • Resolution: Unresolved
    • Affects Version/s: 15
    • Fix Version/s: tbd
    • Component/s: hotspot
    • Labels:

      Description

      Note: to reproduce this issue, it is best to use the code in the Panama repository; the relevant code is in the "foreign-memaccess" branch. Consider the following benchmark:

      static final int ELEM_SIZE = 1_000_000;
      static final int CARRIER_SIZE = (int)JAVA_INT.byteSize();
      static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

      static final VarHandle VH_int = MemoryLayout.ofSequence(JAVA_INT).varHandle(int.class, sequenceElement());

      @Benchmark
      public void segment_loop() {
          try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
              for (int i = 0; i < ELEM_SIZE; i++) {
                  // base address read once per iteration, before the branch
                  MemoryAddress address = segment.baseAddress();
                  if (i % 2 == 0) {
                      VH_int.set(address, (long)i, i + 1);
                  } else {
                      VH_int.set(address, (long)i, i - 1);
                  }
              }
          }
      }
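For readers without the Panama branch at hand, the addressing performed by VH_int can be approximated with the standard java.lang.invoke API. The sketch below is an analogue, not the memory access API: it uses MethodHandles.byteArrayViewVarHandle over a plain byte[] and scales the index by hand (element i starts at byte offset i * CARRIER_SIZE), which the sequence var handle does internally from its long index coordinate. The class and method names (IndexedAccessSketch, fill, get) are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class IndexedAccessSketch {
    static final int ELEM_SIZE = 16;               // kept small for illustration
    static final int CARRIER_SIZE = Integer.BYTES; // 4 bytes per int, like JAVA_INT
    static final int ALLOC_SIZE = ELEM_SIZE * CARRIER_SIZE;

    // Views a byte[] as ints; coordinates are (byte[] array, int byteOffset).
    static final VarHandle VH_int =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // Same store pattern as the benchmark, with the index scaled by hand.
    static byte[] fill() {
        byte[] buffer = new byte[ALLOC_SIZE];
        for (int i = 0; i < ELEM_SIZE; i++) {
            int byteOffset = i * CARRIER_SIZE; // element i starts at byte i * 4
            if (i % 2 == 0) {
                VH_int.set(buffer, byteOffset, i + 1);
            } else {
                VH_int.set(buffer, byteOffset, i - 1);
            }
        }
        return buffer;
    }

    // Reads element i back through the same view handle.
    static int get(byte[] buffer, int i) {
        return (int) VH_int.get(buffer, i * CARRIER_SIZE);
    }

    public static void main(String[] args) {
        byte[] buffer = fill();
        System.out.println(get(buffer, 0) + " " + get(buffer, 1) + " " + get(buffer, 2)); // 1 0 3
    }
}
```

Running main prints `1 0 3`: even indices store i + 1 and odd indices store i - 1, exactly as in the benchmark loop.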

      The first benchmark performs well, and profiler traces show that the loop is unrolled as expected. But if we change the benchmark to this:

      @Benchmark
      public void segment_loop() {
          try (MemorySegment segment = MemorySegment.allocateNative(ALLOC_SIZE)) {
              for (int i = 0; i < ELEM_SIZE; i++) {
                  // base address obtained separately inside each branch
                  if (i % 2 == 0) {
                      VH_int.set(segment.baseAddress(), (long)i, i + 1);
                  } else {
                      VH_int.set(segment.baseAddress(), (long)i, i - 1);
                  }
              }
          }
      }
                      
      In this version, segment.baseAddress() is called inside each branch instead of once before the if statement. The loop is not unrolled, and none of the memory access API checks are hoisted out of the loop, which makes it much slower. I suspect a failure in escape analysis or scalarization.
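The two benchmarks are semantically identical, which is what makes the slowdown surprising. The plain-Java model below does not use the Panama API and is not claimed to reproduce the C2 behavior; it only makes the shape difference explicit. It assumes, for illustration, that baseAddress() returns a fresh address object on every call, so the second shape has one address allocation site per branch instead of a single one before the branch. Segment, Address, set, hoisted and inBranch are all hypothetical names.

```java
import java.util.Arrays;

public class CodeShapeSketch {
    // Hypothetical stand-in for MemorySegment: an int buffer with bounds checks.
    static final class Segment {
        final int[] data;

        Segment(int size) {
            data = new int[size];
        }

        // Assumed for illustration: every call allocates a fresh address object.
        Address baseAddress() {
            return new Address(this, 0);
        }
    }

    // Hypothetical stand-in for MemoryAddress: a (segment, offset) pair.
    static final class Address {
        final Segment segment;
        final int offset;

        Address(Segment segment, int offset) {
            this.segment = segment;
            this.offset = offset;
        }
    }

    // Hypothetical stand-in for VH_int.set(address, (long) i, value):
    // bounds-check the element index against the segment, then store.
    static void set(Address address, long index, int value) {
        long pos = address.offset + index;
        if (pos < 0 || pos >= address.segment.data.length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        address.segment.data[(int) pos] = value;
    }

    // Shape 1 (first benchmark): one baseAddress() call per iteration, before the branch.
    static int[] hoisted(int n) {
        Segment segment = new Segment(n);
        for (int i = 0; i < n; i++) {
            Address address = segment.baseAddress();
            if (i % 2 == 0) {
                set(address, i, i + 1);
            } else {
                set(address, i, i - 1);
            }
        }
        return segment.data;
    }

    // Shape 2 (second benchmark): a separate baseAddress() call inside each branch.
    static int[] inBranch(int n) {
        Segment segment = new Segment(n);
        for (int i = 0; i < n; i++) {
            if (i % 2 == 0) {
                set(segment.baseAddress(), i, i + 1);
            } else {
                set(segment.baseAddress(), i, i - 1);
            }
        }
        return segment.data;
    }

    public static void main(String[] args) {
        int[] a = hoisted(8);
        int[] b = inBranch(8);
        System.out.println(Arrays.toString(a)); // [1, 0, 3, 2, 5, 4, 7, 6]
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both methods produce the same array, so in this model the only difference between the shapes is where the address object is created.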


              People

              • Assignee: vlivanov Vladimir Ivanov
              • Reporter: mcimadamore Maurizio Cimadamore
              • Votes: 0
              • Watchers: 3
