JDK-8146801

Allocating short arrays of non-constant size is slow

    Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: P3
    • Resolution: Fixed
    • Affects Version/s: 9
    • Fix Version/s: 9
    • Component/s: hotspot
    • Labels:
    • Subcomponent:
    • Resolved In Build: b112
    • CPU: x86

      Description

      When allocating an array of statically known size, our current hot-path zeroing strategy seems to split the zeroing into individual stores when the size is small. However, this does not happen at all for arrays of non-constant size, which sets us up for a significant penalty when allocating small arrays.
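      To make the two cases concrete, here is a minimal sketch of the allocation shapes being compared. It is not the linked EmptyArrayBench.java: the class name, the method names, and the choice of byte[] as element type are assumptions made for illustration only.

```java
// Hypothetical illustration of the two allocation shapes; not the linked benchmark.
final class ArrayAllocShapes {
    // Compile-time constant: the JIT can see that the array is small.
    static final int CONST_SIZE = 8;

    // Non-final instance field: the size is only known at run time.
    int fieldSize = 8;

    // Constant-size allocation, the case where zeroing can become individual stores.
    byte[] allocConst() {
        return new byte[CONST_SIZE];
    }

    // Non-constant-size allocation, the case this issue is about.
    byte[] allocField() {
        return new byte[fieldSize];
    }

    public static void main(String[] args) {
        ArrayAllocShapes shapes = new ArrayAllocShapes();
        System.out.println(shapes.allocConst().length + " " + shapes.allocField().length);
    }
}
```

      Both methods return an identical zeroed 8-element array; the only difference is whether the compiler can prove the length at compile time.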

      Benchmark:
        http://cr.openjdk.java.net/~shade/8146801/EmptyArrayBench.java
        http://cr.openjdk.java.net/~shade/8146801/benchmarks.jar

      Performance data:
        http://cr.openjdk.java.net/~shade/8146801/notes.txt

      The crux of the issue seems to be the large setup cost of "rep stos" (see [1]). Note that Agner Fog argues [2] that rep instructions are still future-proof, because they let the CPU select the appropriate implementation, so it seems we only need to cater for the setup cost here. In C2, we avoid "rep stos" on small arrays when the size is known statically. In C1, we always emit the looped mov, which is, amusingly, faster on small arrays than C2's "rep stos". It might be worthwhile to check the array size in the zeroing path and use looped initialization for small sizes.
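      The suggested size check can be sketched at the Java level as follows. This is only an analogy: the real zeroing happens in machine code generated by HotSpot, not in Java. Arrays.fill stands in for the bulk "rep stos" path, and SMALL_LIMIT is a hypothetical threshold, not a value taken from this issue or its webrev.

```java
import java.util.Arrays;

// Java-level analogy of "check the size, loop for small arrays"; not HotSpot code.
final class SmallArrayZeroing {
    // Hypothetical cutoff in 8-byte words; an assumption made for this sketch.
    static final int SMALL_LIMIT = 8;

    // Zeroes the first `count` words of `words`.
    static void zero(long[] words, int count) {
        if (count <= SMALL_LIMIT) {
            // Small case: a plain store loop, which pays no bulk-operation setup cost.
            for (int i = 0; i < count; i++) {
                words[i] = 0L;
            }
        } else {
            // Large case: bulk fill, standing in for "rep stos".
            Arrays.fill(words, 0, count, 0L);
        }
    }

    public static void main(String[] args) {
        long[] words = new long[16];
        Arrays.fill(words, -1L);
        zero(words, 4);
        System.out.println(Arrays.toString(words));
    }
}
```

      The point of the branch is that the extra compare is cheap relative to the setup cost it avoids on short arrays, while long arrays still take the bulk path.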

      C2 non-constant size = 8: 12.610 ± 0.193 ns/op
        http://cr.openjdk.java.net/~shade/8146801/c2-field-8.perfasm

      C2 constant size = 8: 4.681 ± 0.135 ns/op
        http://cr.openjdk.java.net/~shade/8146801/c2-const-8.perfasm

      C1 non-constant size = 8: 6.839 ± 0.103 ns/op
        http://cr.openjdk.java.net/~shade/8146801/c1-field-8.perfasm

      C1 constant size = 8: 6.843 ± 0.079 ns/op
        http://cr.openjdk.java.net/~shade/8146801/c1-const-8.perfasm
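      In relative terms, the mean scores above put the C2 non-constant case at roughly 2.7x the C2 constant case and roughly 1.8x the C1 non-constant case. The small sketch below only redoes that arithmetic on the reported means. It ignores the error bars, and the class and method names are made up for illustration.

```java
// Arithmetic on the mean ns/op figures reported above; error bars are ignored.
final class ZeroingSlowdown {
    // How many times slower the first score is than the second.
    static double ratio(double slowNs, double fastNs) {
        return slowNs / fastNs;
    }

    public static void main(String[] args) {
        double c2Field = 12.610; // C2, non-constant size = 8
        double c2Const = 4.681;  // C2, constant size = 8
        double c1Field = 6.839;  // C1, non-constant size = 8

        System.out.printf("C2 field vs C2 const: %.2fx%n", ratio(c2Field, c2Const));
        System.out.printf("C2 field vs C1 field: %.2fx%n", ratio(c2Field, c1Field));
    }
}
```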

      [1] http://www.agner.org/optimize/optimizing_assembly.pdf, 17.9, "Moving blocks of data (All processors)"
      [2] http://www.agner.org/optimize/optimizing_assembly.pdf, 17.9, "Moving data on future processors"

          Activity

          shade Aleksey Shipilev added a comment (edited)
          Candidate webrev that brings the field_* tests almost up to the performance of the const_* tests:
            http://cr.openjdk.java.net/~shade/8146801/webrev.02/
          shade Aleksey Shipilev added a comment - RFR: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-February/021720.html
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk9/hs-comp/hotspot/rev/a66bdd827fcb
          User: shade
          Date: 2016-03-04 00:31:42 +0000
          hgupdate HG Updates added a comment -
          URL: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/a66bdd827fcb
          User: lana
          Date: 2016-03-30 18:38:07 +0000

            People

            • Assignee:
              shade Aleksey Shipilev
              Reporter:
              shade Aleksey Shipilev
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved: