JDK-8050850

Loop unrolling produces unnecessary stack spills

    Details

    • Subcomponent:
    • CPU: x86
    • OS: generic

      Description

      Originally found with a larger benchmark. It reproduces with this targeted microbenchmark:
        http://cr.openjdk.java.net/~shade/8050850/ArrayTest.java
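
      The linked source is not reproduced in this report. Judging only from the disassembly further down (a 4-byte load from an array at offset 0x10, followed by a load at offset 0xc from the referenced object), the hot loop appears to sum an int field over an array of objects. The sketch below is a hypothetical reconstruction under that assumption, not the contents of ArrayTest.java; the names ArraySumSketch, Holder, sum and fill are invented for illustration, and it uses a plain main method rather than a benchmark harness.

```java
// Hypothetical reconstruction for illustration only; NOT the real ArrayTest.java.
final class ArraySumSketch {

    // Assumed element type: an object carrying a single int field.
    static final class Holder {
        final int value;

        Holder(int value) {
            this.value = value;
        }
    }

    // The loop of interest: one array element load plus one field load per iteration.
    static int sum(Holder[] holders) {
        int acc = 0;
        for (int i = 0; i < holders.length; i++) {
            acc += holders[i].value;
        }
        return acc;
    }

    // Builds an array whose i-th element holds the value i.
    static Holder[] fill(int size) {
        Holder[] holders = new Holder[size];
        for (int i = 0; i < size; i++) {
            holders[i] = new Holder(i);
        }
        return holders;
    }

    public static void main(String[] args) {
        Holder[] holders = fill(1_000);
        int result = 0;
        // Call sum() many times so the JIT compiler has a chance to compile it.
        for (int rep = 0; rep < 20_000; rep++) {
            result = sum(holders);
        }
        System.out.println(result);
    }
}
```

      The LoopMaxUnroll values quoted in this report correspond to the HotSpot option -XX:LoopMaxUnroll=N, so a sketch like this could be run once per setting to compare the generated code. Timings from such a hand-rolled loop are not comparable to the harness numbers in the report.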

      default: 259.322 +- 3.997 ns/op
      LoopMaxUnroll=1: 276.164 +- 7.740 ns/op
      LoopMaxUnroll=2: 208.690 +- 2.690 ns/op
      LoopMaxUnroll=3: 207.843 +- 1.946 ns/op
      LoopMaxUnroll=4: 256.309 +- 1.202 ns/op
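
      To put the numbers above side by side, the following sketch computes each score relative to the default and checks whether its error interval overlaps the default's. The scores are copied from the list above; treating "+-" as a simple symmetric interval is an assumption made here, and the class and method names are hypothetical.

```java
// Illustrative arithmetic on the scores quoted in the report (ns/op).
final class UnrollResults {

    static final String[] MODES = {
        "default", "LoopMaxUnroll=1", "LoopMaxUnroll=2", "LoopMaxUnroll=3", "LoopMaxUnroll=4"
    };
    static final double[] MEAN = {259.322, 276.164, 208.690, 207.843, 256.309};
    static final double[] ERROR = {3.997, 7.740, 2.690, 1.946, 1.202};

    // Score of mode i divided by the default score; below 1.0 means less time per op.
    static double relativeToDefault(int i) {
        return MEAN[i] / MEAN[0];
    }

    // Assumption: [mean - error, mean + error] is used as a plain interval.
    static boolean overlaps(int a, int b) {
        return MEAN[a] - ERROR[a] <= MEAN[b] + ERROR[b]
            && MEAN[b] - ERROR[b] <= MEAN[a] + ERROR[a];
    }

    public static void main(String[] args) {
        for (int i = 0; i < MODES.length; i++) {
            System.out.printf("%-16s %.3f x default, overlaps default: %b%n",
                MODES[i], relativeToDefault(i), overlaps(0, i));
        }
    }
}
```

      Under that reading, LoopMaxUnroll=2 and LoopMaxUnroll=3 take roughly 20% less time per operation than the default, while the LoopMaxUnroll=4 interval overlaps the default's.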

      Default mode seems to be very close to LMU=4 (LoopMaxUnroll=4) assembly-wise. Below are the hottest loops for different LMUs:
       (monospaced: http://cr.openjdk.java.net/~shade/8050850/lmu-hots.txt)

      -------------------

      LoopMaxUnroll=1:

      ; working...
        3.12% 3.08% 0x00007fec78107f60: mov 0x10(%rsi,%r10,4),%r11d
        4.22% 4.20% 0x00007fec78107f65: add 0xc(%r12,%r11,8),%edi

      ; index increment + back branch
       80.62% 82.20% 0x00007fec78107f6a: inc %r10d
        2.11% 2.02% 0x00007fec78107f6d: cmp %ecx,%r10d
                          0x00007fec78107f70: jl 0x00007fec78107f60

      -------------------

      LoopMaxUnroll=2:

      ; working...
        1.73% 1.80% 0x00007f966d1a15b0: mov 0x10(%rsi,%r11,4),%r9d
        2.44% 2.23% 0x00007f966d1a15b5: add 0xc(%r12,%r9,8),%edx
       31.49% 43.87% 0x00007f966d1a15ba: movslq %r11d,%r9
        1.04% 1.04% 0x00007f966d1a15bd: mov 0x14(%rsi,%r9,4),%r9d
        1.91% 1.80% 0x00007f966d1a15c2: mov 0xc(%r12,%r9,8),%r9d
       22.86% 23.19% 0x00007f966d1a15c7: add %r9d,%edx

      ; index increment + back branch
       24.86% 15.38% 0x00007f966d1a15ca: add $0x2,%r11d
        0.87% 0.76% 0x00007f966d1a15ce: cmp %r10d,%r11d
                          0x00007f966d1a15d1: jl 0x00007f966d1a15b0

      -------------------

      LoopMaxUnroll=4:

        0.47% 0.17% 0x00007fc25919f5f0: mov %rdx,%rbx

      ; taking three things from stack
        0.11% 0.07% 0x00007fc25919f5f3: mov (%rsp),%r8
        8.02% 9.26% 0x00007fc25919f5f7: mov 0x8(%rsp),%rdx
        2.62% 2.65% 0x00007fc25919f5fc: mov 0x10(%rsp),%r9

      ; working...
        0.46% 0.43% 0x00007fc25919f601: mov 0x10(%rsi,%r10,4),%r11d
        0.13% 0.13% 0x00007fc25919f606: add 0xc(%r12,%r11,8),%edi
       14.20% 15.05% 0x00007fc25919f60b: movslq %r10d,%rax
        0.25% 0.18% 0x00007fc25919f60e: mov 0x14(%rsi,%rax,4),%r11d
        0.04% 0.07% 0x00007fc25919f613: mov 0xc(%r12,%r11,8),%r11d

      ; putting the same three things back on stack, no usages (!!!)
        9.80% 10.28% 0x00007fc25919f618: mov %r9,0x10(%rsp)
        2.80% 2.97% 0x00007fc25919f61d: mov %rdx,0x8(%rsp)
        0.25% 0.32% 0x00007fc25919f622: mov %r8,(%rsp)

      ; working...
        0.13% 0.15% 0x00007fc25919f626: mov %rbx,%rdx
        9.09% 9.92% 0x00007fc25919f629: mov 0x18(%rsi,%rax,4),%r8d
        2.71% 3.00% 0x00007fc25919f62e: mov 0xc(%r12,%r8,8),%r8d
        4.23% 4.05% 0x00007fc25919f633: mov 0x1c(%rsi,%rax,4),%ebx
        0.04% 0.04% 0x00007fc25919f637: mov 0xc(%r12,%rbx,8),%r9d
       13.38% 12.47% 0x00007fc25919f63c: add %r11d,%edi
        1.32% 0.78% 0x00007fc25919f63f: add %r8d,%edi
        4.90% 4.75% 0x00007fc25919f642: add %r9d,%edi

      ; index increment + back branch
       11.02% 11.13% 0x00007fc25919f645: add $0x4,%r10d
        2.59% 2.44% 0x00007fc25919f649: cmp %ecx,%r10d
                          0x00007fc25919f64c: jl 0x00007fc25919f5f0

      -------------------

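      At LMU=4, the loop body loads three values from the stack and stores the same three values back on every iteration, with no usages in between. The spill has no apparent good reason.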
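
      For orientation, here is a source-level picture of what a four-way unrolled body looks like, matching the shape of the LoopMaxUnroll=4 listing (four element loads at offsets 0x10, 0x14, 0x18 and 0x1c, and an index step of 4). It is a hand-written sketch, not compiler output; the element type Item and the method names are hypothetical, and the trailing loop for leftover elements is added here for correctness, since the report does not show how the compiler handles them.

```java
// Hand-unrolled illustration; the JIT compiler performs this transformation internally.
final class ManualUnrollSketch {

    // Hypothetical element type with one int field.
    static final class Item {
        final int value;

        Item(int value) {
            this.value = value;
        }
    }

    // Straightforward loop, one element per iteration.
    static int sumSimple(Item[] items) {
        int acc = 0;
        for (int i = 0; i < items.length; i++) {
            acc += items[i].value;
        }
        return acc;
    }

    // Same computation with the body repeated four times per iteration.
    static int sumUnrolledBy4(Item[] items) {
        int acc = 0;
        int i = 0;
        int limit = items.length - 3; // last index at which four elements remain
        for (; i < limit; i += 4) {
            acc += items[i].value;
            acc += items[i + 1].value;
            acc += items[i + 2].value;
            acc += items[i + 3].value;
        }
        // Leftover elements when the length is not a multiple of four.
        for (; i < items.length; i++) {
            acc += items[i].value;
        }
        return acc;
    }

    static Item[] fill(int size) {
        Item[] items = new Item[size];
        for (int i = 0; i < size; i++) {
            items[i] = new Item(i);
        }
        return items;
    }

    public static void main(String[] args) {
        Item[] items = fill(10);
        System.out.println(sumSimple(items) == sumUnrolledBy4(items));
    }
}
```

      Written this way, the unrolled body works with the array, the index, the loop limit, the accumulator and a few temporaries, which is the context for the observation that the three stack slots are reloaded and stored back without being used.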

            People

            • Assignee: Unassigned
            • Reporter: Aleksey Shipilev (shade)