JDK-6615692

JVM 1.6.0 SPECjvm2008 Scimark FFT & scimark2 FFT performance 30% worse on Solaris than RedHat 5

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P2
    • Resolution: Not an Issue
    • Affects Version/s: 6-pool
    • Fix Version/s: None
    • Component/s: performance
    • CPU: x86
    • OS: solaris

      Description

      The SciMark Fast Fourier Transform (FFT) metric running on Solaris with JVM 1.6.0
      scores roughly 30% lower than on RedHat 5 with the same JVM version.

      Benchmark   snv_74@x3220   RHEL5@x3220    %DIFF  snv_74@x5355   RHEL5@x5355    %DIFF
      MonteCarlo        259.89        246.16    5.58%         285.9         273.5    4.53%
      FFT                51.44         72.63  -29.18%          32.6         48.62  -32.95%
      LU                806.86        520.83   54.92%        871.08        529.52   64.50%
      SOR               904.92        882.01    2.60%        988.03        956.29    3.32%
      Sparse            510.32        500.73    1.92%        466.45        464.61    0.40%
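
The %DIFF columns are consistent with (Solaris score - RHEL score) / RHEL score, which can be checked against the table. The following is a minimal sketch of that arithmetic; the class and method names are hypothetical and are not part of the benchmark harness.

```java
public class PercentDiff {
    /** Relative difference of the Solaris score against the RHEL score, in percent. */
    public static double percentDiff(double solaris, double rhel) {
        return (solaris - rhel) / rhel * 100.0;
    }

    public static void main(String[] args) {
        // FFT scores copied from the table (x3220, then x5355).
        System.out.printf("x3220 FFT: %.2f%%%n", percentDiff(51.44, 72.63));
        System.out.printf("x5355 FFT: %.2f%%%n", percentDiff(32.6, 48.62));
    }
}
```

A negative value means the Solaris score is lower than the RHEL score on the same hardware.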

      These runs were performed on both single-socket (x3220) and dual-socket (x5355) Xeons.
      These results have also been duplicated on s10u4.
      Studio Analyzer Experiment source, bytecode and disassembly outputs for scimark FFT are attached for both RHEL5-U1 and snv_89.

      Both runs use 64-bit JVMs.

      Analyzer output shows the same hot spots in both, with two particular instructions taking significantly longer on snv.

      Bytecode CPU time diffs:

      diff -cw bytecode.snv_89.out bytecode.RHEL5-U1.out
      Source file: /export/bench/java/scimark2a/jnt/scimark2/FFT.java
      Object file: /export/bench/java/scimark2a/jnt/scimark2/FFT.class
      Load Object: /export/bench/java/scimark2a/jnt/scimark2/FFT.class

         Excl. Incl.
         User CPU User CPU
          sec. sec.
      <snip>
      *** 666,672 ****
           0. 0. [135] 00000116: dload 8
           0. 0. [135] 00000118: dload 29
           0. 0. [135] 0000011a: dmul
      ! ## 3.232 3.232 [135] 0000011b: dsub
           0. 0. [135] 0000011c: dstore 31
           0. 0. [136] 0000011e: dload 6
           0. 0. [136] 00000120: dload 29
      --- 666,672 ----
           0. 0. [135] 00000116: dload 8
           0. 0. [135] 00000118: dload 29
           0. 0. [135] 0000011a: dmul
      ! ## 1.518 1.518 [135] 0000011b: dsub
           0. 0. [135] 0000011c: dstore 31
           0. 0. [136] 0000011e: dload 6
           0. 0. [136] 00000120: dload 29
      <snip>
      *** 692,698 ****
           0. 0. [139] 0000013c: iload 25
           0. 0. [139] 0000013e: iconst_1
           0. 0. [139] 0000013f: iadd
      ! ## 2.662 2.662 [139] 00000140: daload
           0. 0. [139] 00000141: dload 33
           0. 0. [139] 00000143: dsub
           0. 0. [139] 00000144: dastore
      --- 692,698 ----
           0. 0. [139] 0000013c: iload 25
           0. 0. [139] 0000013e: iconst_1
           0. 0. [139] 0000013f: iadd
      ! ## 1.947 1.947 [139] 00000140: daload
           0. 0. [139] 00000141: dload 33
           0. 0. [139] 00000143: dsub
           0. 0. [139] 00000144: dastore
      ***************
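
For orientation, the two hot bytecodes have the shape of a radix-2 FFT butterfly over interleaved (real, imaginary) data: a multiply and subtract stored to a local at source line 135, and an array load, subtract and array store at source line 139. The sketch below is reconstructed from that bytecode shape only, not copied from FFT.java; the class name, method name and variable names are assumptions.

```java
public class ButterflySketch {
    /**
     * One radix-2 butterfly over interleaved (real, imag) data.
     * i and j are the even indices of the two complex elements.
     */
    public static void butterfly(double[] data, int i, int j, double wReal, double wImag) {
        double z1Real = data[j];
        double z1Imag = data[j + 1];
        // Same shape as the hot bytecode at source line 135: dmul, dsub, dstore.
        double wdReal = wReal * z1Real - wImag * z1Imag;
        double wdImag = wReal * z1Imag + wImag * z1Real;
        data[j] = data[i] - wdReal;
        // Same shape as the hot bytecode at source line 139: iadd, daload, dsub, dastore.
        data[j + 1] = data[i + 1] - wdImag;
        data[i] += wdReal;
        data[i + 1] += wdImag;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0};
        butterfly(data, 0, 2, 0.0, 1.0); // twiddle factor w = i
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

In both statements the profiled time lands on a floating-point subtract or array load whose operands come from the data array.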


      Disassembly CPU time diffs:
      bash-3.2$ more dissassem.RHEL5-U1.out
      Source file: /export/bench/java/scimark2a/jnt/scimark2/FFT.java
      Object file: jnt.scimark2.FFT
      Load Object: JAVA_COMPILED_METHODS

         Excl. Incl.
         User CPU User CPU
          sec. sec.

      *** 180,207 ****
           0. 0. [ ?] 2ea: testb %al,(%rax)
           0. 0. [ ?] 2ec: addb %al,(%rax)
           0. 0. [ ?] 2ee: addb %al,(%rax)
      ! 0. 0. [ ?] 2f0: shll %r12d
           0. 0. [ ?] 2f3: movl %r12d,%r8d
           0. 0. [ ?] 2f6: addl %r14d,%r8d
      ! 0.010 0.010 [ ?] 2f9: cmpl 0x28(%rsp),%r8d
      ! 0.010 0.010 [ ?] 2fe: jae .+0x433 [ 0x731 ]
           0. 0. [ ?] 304: movl %r8d,%edx
           0. 0. [ ?] 307: incl %edx
           0. 0. [ ?] 309: movslq %r8d,%rax
      ! 0. 0. [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
      ! ## 1.511 1.511 [ ?] 312: cmpl 0x28(%rsp),%edx
           0. 0. [ ?] 316: jae .+0x3b3 [ 0x6c9 ]
           0. 0. [ ?] 31c: movsd 0x20(%rcx,%rax,8),%xmm5
      ! 0. 0. [ ?] 322: movapd %xmm4,%xmm11
           0. 0. [ ?] 327: mulsd %xmm9,%xmm11
      ! 0. 0. [ ?] 32c: movapd %xmm5,%xmm12
           0. 0. [ ?] 331: mulsd %xmm9,%xmm12
      ! 0. 0. [ ?] 336: movapd %xmm5,%xmm1
           0. 0. [ ?] 33a: mulsd %xmm10,%xmm1
           0. 0. [ ?] 33f: movapd %xmm4,%xmm2
           0. 0. [ ?] 343: mulsd %xmm10,%xmm2
           0. 0. [ ?] 348: subsd %xmm1,%xmm11
      ! 0.010 0.010 [ ?] 34d: addsd %xmm2,%xmm12
           0. 0. [ ?] 352: cmpl 0x28(%rsp),%r12d
           0. 0. [ ?] 357: jae .+0x2f6 [ 0x64d ]
           0. 0. [ ?] 35d: movl %ebp,0xc(%rsp)
      --- 180,207 ----
           0. 0. [ ?] 2ea: testb %al,(%rax)
           0. 0. [ ?] 2ec: addb %al,(%rax)
           0. 0. [ ?] 2ee: addb %al,(%rax)
      ! 0.022 0.022 [ ?] 2f0: shll %r12d
           0. 0. [ ?] 2f3: movl %r12d,%r8d
           0. 0. [ ?] 2f6: addl %r14d,%r8d
      ! 0. 0. [ ?] 2f9: cmpl 0x28(%rsp),%r8d
      ! 0.011 0.011 [ ?] 2fe: jae .+0x433 [ 0x731 ]
           0. 0. [ ?] 304: movl %r8d,%edx
           0. 0. [ ?] 307: incl %edx
           0. 0. [ ?] 309: movslq %r8d,%rax
      ! 0.022 0.022 [ ?] 30c: movsd 0x18(%rcx,%rax,8),%xmm4
      ! ## 0.781 0.781 [ ?] 312: cmpl 0x28(%rsp),%edx
           0. 0. [ ?] 316: jae .+0x3b3 [ 0x6c9 ]
           0. 0. [ ?] 31c: movsd 0x20(%rcx,%rax,8),%xmm5
      ! 0.011 0.011 [ ?] 322: movapd %xmm4,%xmm11
           0. 0. [ ?] 327: mulsd %xmm9,%xmm11
      ! 0.044 0.044 [ ?] 32c: movapd %xmm5,%xmm12
           0. 0. [ ?] 331: mulsd %xmm9,%xmm12
      ! 0.055 0.055 [ ?] 336: movapd %xmm5,%xmm1
           0. 0. [ ?] 33a: mulsd %xmm10,%xmm1
           0. 0. [ ?] 33f: movapd %xmm4,%xmm2
           0. 0. [ ?] 343: mulsd %xmm10,%xmm2
           0. 0. [ ?] 348: subsd %xmm1,%xmm11
      ! 0. 0. [ ?] 34d: addsd %xmm2,%xmm12
           0. 0. [ ?] 352: cmpl 0x28(%rsp),%r12d
           0. 0. [ ?] 357: jae .+0x2f6 [ 0x64d ]
           0. 0. [ ?] 35d: movl %ebp,0xc(%rsp)
      ***************
      <snip>
      *** 209,236 ****
           0. 0. [ ?] 366: movq %rcx,%rbp
           0. 0. [ ?] 369: movl %r12d,%ecx
           0. 0. [ ?] 36c: incl %ecx
      ! 0.010 0.010 [ ?] 36e: movslq %r12d,%r9
           0. 0. [ ?] 371: movsd 0x18(%rbp,%r9,8),%xmm1
      ! ## 1.431 1.431 [ ?] 378: subsd %xmm11,%xmm1
      ! 0. 0. [ ?] 37d: movsd %xmm1,0x18(%rbp,%rax,8)
      ! 0. 0. [ ?] 383: cmpl 0x28(%rsp),%ecx
      ! 0.020 0.020 [ ?] 387: jae .+0x214 [ 0x59b ]
           0. 0. [ ?] 38d: movq %rbp,%rcx
           0. 0. [ ?] 390: movsd 0x20(%rcx,%r9,8),%xmm1
           0. 0. [ ?] 397: addl %r10d,%ebx
           0. 0. [ ?] 39a: subsd %xmm12,%xmm1
      ! 0.010 0.010 [ ?] 39f: movsd %xmm1,0x20(%rcx,%rax,8)
      ! 0.010 0.010 [ ?] 3a5: movapd %xmm11,%xmm1
           0. 0. [ ?] 3aa: addsd 0x18(%rcx,%r9,8),%xmm1
      ! 0.040 0.040 [ ?] 3b1: movsd %xmm1,0x18(%rcx,%r9,8)
      ! 0. 0. [ ?] 3b8: movapd %xmm12,%xmm1
           0. 0. [ ?] 3bd: addsd 0x20(%rcx,%r9,8),%xmm1
      ! 0.010 0.010 [ ?] 3c4: movsd %xmm1,0x20(%rcx,%r9,8)
      ! 0. 0. [ ?] 3cb: testl %eax,0x350302f(%rip)
           0. 0. [ ?] 3d1: cmpl 4(%rsp),%ebx
           0. 0. [ ?] 3d5: jge .+0x1b4 [ 0x589 ]
           0. 0. [ ?] 3db: movl %ebx,%r12d
      ! 0.010 0.010 [ ?] 3de: addl %r11d,%r12d
           0. 0. [ ?] 3e1: movl 4(%rsp),%r9d
           0. 0. [ ?] 3e6: movl 0xc(%rsp),%ebp
           0. 0. [ ?] 3ea: jmp .-0xfa [ 0x2f0 ]
      --- 209,236 ----
           0. 0. [ ?] 366: movq %rcx,%rbp
           0. 0. [ ?] 369: movl %r12d,%ecx
           0. 0. [ ?] 36c: incl %ecx
      ! 0. 0. [ ?] 36e: movslq %r12d,%r9
           0. 0. [ ?] 371: movsd 0x18(%rbp,%r9,8),%xmm1
      ! ## 0.924 0.924 [ ?] 378: subsd %xmm11,%xmm1
      ! 0.033 0.033 [ ?] 37d: movsd %xmm1,0x18(%rbp,%rax,8)
      ! 0.011 0.011 [ ?] 383: cmpl 0x28(%rsp),%ecx
      ! 0. 0. [ ?] 387: jae .+0x214 [ 0x59b ]
           0. 0. [ ?] 38d: movq %rbp,%rcx
           0. 0. [ ?] 390: movsd 0x20(%rcx,%r9,8),%xmm1
           0. 0. [ ?] 397: addl %r10d,%ebx
           0. 0. [ ?] 39a: subsd %xmm12,%xmm1
      ! 0.011 0.011 [ ?] 39f: movsd %xmm1,0x20(%rcx,%rax,8)
      ! 0. 0. [ ?] 3a5: movapd %xmm11,%xmm1
           0. 0. [ ?] 3aa: addsd 0x18(%rcx,%r9,8),%xmm1
      ! 0.044 0.044 [ ?] 3b1: movsd %xmm1,0x18(%rcx,%r9,8)
      ! 0.011 0.011 [ ?] 3b8: movapd %xmm12,%xmm1
           0. 0. [ ?] 3bd: addsd 0x20(%rcx,%r9,8),%xmm1
      ! 0.011 0.011 [ ?] 3c4: movsd %xmm1,0x20(%rcx,%r9,8)
      ! 0.044 0.044 [ ?] 3cb: testl %eax,0xffffffffff91ae6f(%rip)
           0. 0. [ ?] 3d1: cmpl 4(%rsp),%ebx
           0. 0. [ ?] 3d5: jge .+0x1b4 [ 0x589 ]
           0. 0. [ ?] 3db: movl %ebx,%r12d
      ! 0. 0. [ ?] 3de: addl %r11d,%r12d
           0. 0. [ ?] 3e1: movl 4(%rsp),%r9d
           0. 0. [ ?] 3e6: movl 0xc(%rsp),%ebp
           0. 0. [ ?] 3ea: jmp .-0xfa [ 0x2f0 ]
      ***************

      The disassembly for both experiments was generated on snv.
      The RHEL experiment is now being rerun through the analyzer on RHEL to confirm that the results are the same.

            People

            • Assignee: charring Colm Harrington
            • Reporter: charring Colm Harrington