JDK-8035561

Optimize the average operation if the pipeline is sized

    Details

      Description

      If a pipeline is of a known size then the average operation need not maintain a count of the elements.

      For example, given the following pipeline:

        double[] d = ...;
        DoubleStream ds = DoubleStream.of(d);
        double a = ds.average().orElse(0.0);

      The size of the pipeline is the size of the array.

      The implementation of average is currently:

              // State layout: ll[0] and ll[1] hold the Kahan-compensated sum and its
              // compensation term, ll[2] the element count, ll[3] the plain sum.
              double[] avg = collect(() -> new double[4],
                                     (ll, d) -> {
                                         ll[2]++;
                                         Collectors.sumWithCompensation(ll, d);
                                         ll[3] += d;
                                     },
                                     (ll, rr) -> {
                                         Collectors.sumWithCompensation(ll, rr[0]);
                                         Collectors.sumWithCompensation(ll, rr[1]);
                                         ll[2] += rr[2];
                                         ll[3] += rr[3];
                                     });

      and it can be implemented as follows when the size is known:

        sum() / pipeline_size

      (Assuming it is easy to surface the size of the pipeline, otherwise a special op is required to get access to that information).
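
      A minimal sketch of the idea, expressed against the public DoubleStream API (illustrative only; the real change would live inside DoublePipeline, where the size is directly available, and averageOf is a hypothetical helper):

        import java.util.OptionalDouble;
        import java.util.stream.DoubleStream;

        class SizedAverage {
            // If the source spliterator reports an exact size, compute sum() / size
            // instead of counting elements in the collector.
            static OptionalDouble averageOf(double[] d) {
                long size = DoubleStream.of(d).spliterator().getExactSizeIfKnown(); // -1 if not SIZED
                if (size < 0)
                    return DoubleStream.of(d).average(); // fall back to the counting collector
                if (size == 0)
                    return OptionalDouble.empty();
                return OptionalDouble.of(DoubleStream.of(d).sum() / size);
            }
        }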
      Attachments

        1. average.patch (17 kB) - Paul Sandoz
        2. AverageTest.java (5 kB) - Paul Sandoz

        Issue Links

          Activity

          psandoz Paul Sandoz added a comment - edited
          Plus the average/sum operation is doubly (no pun intended) inefficient because it maintains both an uncompensated and a compensated sum.
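
          For context, a compensated sum of the kind performed by Collectors.sumWithCompensation looks roughly like this (illustrative sketch, not the JDK code); it does about four floating-point additions per element where a plain sum does one:

            static double kahanSum(double[] values) {
                double sum = 0.0;
                double c = 0.0;              // running compensation for lost low-order bits
                for (double v : values) {
                    double y = v - c;
                    double t = sum + y;
                    c = (t - sum) - y;
                    sum = t;
                }
                return sum;
            }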
          psandoz Paul Sandoz added a comment -
          It appears any such optimization is ineffective: the Kahan (compensated) summation is the dominant cost, and the additional cost of counting and maintaining an uncompensated sum is small in comparison (probably also because those operations are absorbed by clever optimizations when the compensated-sum loop is unrolled).

          Attached is a benchmark (AverageTest.java) comparing various average calculations using compensated and uncompensated summation.
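
          A benchmark of that shape might look like the following JMH sketch (illustrative only; the attached AverageTest.java may differ in the exact variants measured):

            import org.openjdk.jmh.annotations.Benchmark;
            import org.openjdk.jmh.annotations.Scope;
            import org.openjdk.jmh.annotations.Setup;
            import org.openjdk.jmh.annotations.State;

            import java.util.Random;
            import java.util.stream.DoubleStream;

            @State(Scope.Thread)
            public class AverageSketch {
                double[] data;

                @Setup
                public void setup() {
                    Random r = new Random(42);
                    data = new double[10_000_000];
                    for (int i = 0; i < data.length; i++)
                        data[i] = r.nextDouble();
                }

                @Benchmark
                public double average_seq() {
                    // Stream average: compensated sum plus element count.
                    return DoubleStream.of(data).average().orElse(0.0);
                }

                @Benchmark
                public double for_seq() {
                    // Plain loop: uncompensated sum divided by the known length.
                    double sum = 0.0;
                    for (double d : data)
                        sum += d;
                    return sum / data.length;
                }
            }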

          The results on a MacBook Pro are:

          Benchmark                        Mode  Thr  Cnt  Sec    Mean  Mean error  Units
          t.AverageTest.average_par        avgt    1    5    0   6.222       0.203  ms/op
          t.AverageTest.average_seq        avgt    1    5    0  38.373       0.148  ms/op
          t.AverageTest.for_seq            avgt    1    5    0   8.620       0.050  ms/op
          t.AverageTest.for_seq_sumc       avgt    1    5    0  38.351       0.249  ms/op
          t.AverageTest.for_seq_sumc_only  avgt    1    5    0  34.001       0.086  ms/op
          t.AverageTest.reduce_par         avgt    1    5    0   3.716       0.038  ms/op
          t.AverageTest.reduce_seq         avgt    1    5    0   8.609       0.053  ms/op

          A sequentially calculated compensated sum over 10_000_000 elements is ~4x slower than an uncompensated sum, which makes sense since the compensated sum performs roughly four double additions per element versus one for the uncompensated sum.

          The parallel speed-up of the average() operation is ~6x, which is somewhat suspicious since the MacBook Pro has only 4 cores. It is not yet clear to me what difference, if any, between the sequential and parallel generated code might explain the disproportionate gain.


          Applying a patch (average.patch) to optimize the average() operation shows there is no material gain:

          Benchmark                    Mode  Thr  Cnt  Sec    Mean  Mean error  Units
          t.AverageTest.average_par    avgt    1    5    0   6.280       0.165  ms/op
          t.AverageTest.average_seq    avgt    1    5    0  39.031       0.642  ms/op

            People

            • Assignee: Paul Sandoz (psandoz)
            • Reporter: Paul Sandoz (psandoz)
            • Votes: 0
            • Watchers: 2
