Uploaded image for project: 'Code Tools'
  1. Code Tools
  2. CODETOOLS-7901494

JMH yields unstable scores due to Blackhole.consume(<primitive>) codegen

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P5
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: tools
    • Labels:
      None
    • Subcomponent:
      jmh

      Description

      The following benchmark produces significantly different results from launch to launch, while the environment and the input remain the same (note fixed random seed).

          package bench;

          import org.openjdk.jmh.annotations.*;
          import org.openjdk.jmh.infra.Blackhole;

          import java.util.Random;

          @State(Scope.Thread)
          public class RandomBoolean {
              final boolean[] array = new boolean[1000];

              @Setup
              public void setup() {
                  Random random = new Random(0);
                  for (int i = 0; i < array.length; i++) {
                      array[i] = random.nextBoolean();
                  }
              }

              @Benchmark
              public void test(Blackhole bh) {
                  for (boolean b : array) {
                      bh.consume(b);
                  }
              }
          }

      Example output:

      # JMH 1.10.5 (released 9 days ago)
      # VM version: JDK 1.8.0_60, VM 25.60-b23
      # VM invoker: C:\Program Files\Java\jdk1.8.0_60\jre\bin\java.exe
      # VM options: -Dfile.encoding=UTF-8
      # Warmup: 3 iterations, 1 s each
      # Measurement: 5 iterations, 1 s each
      # Timeout: 10 min per iteration
      # Threads: 1 thread, will synchronize iterations
      # Benchmark mode: Average time, time/op
      # Benchmark: bench.RandomBoolean.test

      # Run progress: 0,00% complete, ETA 00:00:40
      # Fork: 1 of 5
      # Warmup Iteration 1: 4130,365 ns/op
      # Warmup Iteration 2: 3490,692 ns/op
      # Warmup Iteration 3: 3748,163 ns/op
      Iteration 1: 3764,941 ns/op
      Iteration 2: 3772,275 ns/op
      Iteration 3: 3740,286 ns/op
      Iteration 4: 3786,077 ns/op
      Iteration 5: 3748,906 ns/op

      # Run progress: 20,00% complete, ETA 00:00:34
      # Fork: 2 of 5
      # Warmup Iteration 1: 7317,436 ns/op
      # Warmup Iteration 2: 7170,205 ns/op
      # Warmup Iteration 3: 7386,139 ns/op
      Iteration 1: 7407,121 ns/op
      Iteration 2: 7364,115 ns/op
      Iteration 3: 7344,951 ns/op
      Iteration 4: 7360,834 ns/op
      Iteration 5: 7379,837 ns/op

      # Run progress: 40,00% complete, ETA 00:00:25
      # Fork: 3 of 5
      # Warmup Iteration 1: 3808,657 ns/op
      # Warmup Iteration 2: 3578,093 ns/op
      # Warmup Iteration 3: 3647,702 ns/op
      Iteration 1: 3682,237 ns/op
      Iteration 2: 3604,381 ns/op
      Iteration 3: 3615,115 ns/op
      Iteration 4: 3696,793 ns/op
      Iteration 5: 3631,282 ns/op

      # Run progress: 60,00% complete, ETA 00:00:16
      # Fork: 4 of 5
      # Warmup Iteration 1: 7325,468 ns/op
      # Warmup Iteration 2: 7189,478 ns/op
      # Warmup Iteration 3: 7371,007 ns/op
      Iteration 1: 7370,501 ns/op
      Iteration 2: 7391,320 ns/op
      Iteration 3: 7348,972 ns/op
      Iteration 4: 7344,491 ns/op
      Iteration 5: 7378,230 ns/op

      # Run progress: 80,00% complete, ETA 00:00:08
      # Fork: 5 of 5
      # Warmup Iteration 1: 7338,641 ns/op
      # Warmup Iteration 2: 7215,131 ns/op
      # Warmup Iteration 3: 7370,513 ns/op
      Iteration 1: 7368,331 ns/op
      Iteration 2: 7351,042 ns/op
      Iteration 3: 7358,692 ns/op
      Iteration 4: 7370,550 ns/op
      Iteration 5: 7339,147 ns/op


      Result "test":
        5900,817 ?(99.9%) 1371,584 ns/op [Average]
        (min, avg, max) = (3604,381, 5900,817, 7407,121), stdev = 1831,025
        CI (99.9%): [4529,234, 7272,401] (assumes normal distribution)


      # Run complete. Total time: 00:00:41

      Benchmark Mode Cnt Score Error Units
      RandomBoolean.test avgt 25 5900,817 ? 1371,584 ns/op

      The results are stable within each fork, but vary between forks.
      Two modes can be observed: a faster one (~3700 ns/op) and a slower one (~7400 ns/op).

      -XX:+PrintAssembly and -XX:+LogCompilation show the difference between two modes in the compiled code of Blackhole.consume. Where in one case a branch is considered taken, in the other it is considered untaken and vice versa.

      Looking back at the sources we see two conditional branches:

          public final void consume(boolean bool) {
              boolean bool1 = this.bool1; // volatile read
              boolean bool2 = this.bool2;
              if (bool == bool1 & bool == bool2) {
                  // SHOULD NEVER HAPPEN
                  nullBait.bool1 = bool; // implicit null pointer exception
              }
          }

      The difference comes from the way how bool1 and bool2 are initialized:
          bool1 = r.nextBoolean(); bool2 = !bool1;

      JIT relies on the statistics, and since the values are random, we have 50% probability
      that JIT will emit "je" instruction and 50% for "jne".

      The score would be stable if consume() performed the same work regardless of the actual values of bool1 and bool2.
      E.g.
          if ((bool ^ bool1) == (bool ^ bool2)) ...
      or simply
          if (bool & volatileFalse) ...

        Attachments

          Activity

            People

            • Assignee:
              shade Aleksey Shipilev
              Reporter:
              apangin Andrei Pangin
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: