JDK-8029302: Performance regression in Math.pow intrinsic

    Details

    • Resolved In Build: b15
    • OS: linux

        Description

        FULL PRODUCT VERSION :
        java version "1.7.0_40"
        Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
        Java HotSpot(TM) 64-Bit Server VM (build 24.0-b56, mixed mode)


        FULL OS VERSION :
        Linux spica 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux
        (CentOS 6)

        EXTRA RELEVANT SYSTEM CONFIGURATION :
        /proc/cpuinfo:
        processor: 0
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 0
        cpu cores: 6
        apicid: 0
        initial apicid: 0
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.75
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 1
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 1
        cpu cores: 6
        apicid: 2
        initial apicid: 2
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.24
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 2
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 2
        cpu cores: 6
        apicid: 4
        initial apicid: 4
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.23
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 3
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 3
        cpu cores: 6
        apicid: 6
        initial apicid: 6
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.24
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 4
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 4
        cpu cores: 6
        apicid: 8
        initial apicid: 8
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.24
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 5
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 0
        siblings: 6
        core id: 5
        cpu cores: 6
        apicid: 10
        initial apicid: 10
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.24
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 6
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 0
        cpu cores: 6
        apicid: 32
        initial apicid: 32
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.28
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 7
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 1
        cpu cores: 6
        apicid: 34
        initial apicid: 34
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.30
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 8
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 2
        cpu cores: 6
        apicid: 36
        initial apicid: 36
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.29
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 9
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 3
        cpu cores: 6
        apicid: 38
        initial apicid: 38
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.29
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 10
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 4
        cpu cores: 6
        apicid: 40
        initial apicid: 40
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.27
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:

        processor: 11
        vendor_id: GenuineIntel
        cpu family: 6
        model: 45
        model name: Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
        stepping: 7
        cpu MHz: 2501.000
        cache size: 15360 KB
        physical id: 1
        siblings: 6
        core id: 5
        cpu cores: 6
        apicid: 42
        initial apicid: 42
        fpu: yes
        fpu_exception: yes
        cpuid level: 13
        wp: yes
        flags: fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 x2apic popcnt aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
        bogomips: 4999.27
        clflush size: 64
        cache_alignment: 64
        address sizes: 46 bits physical, 48 bits virtual
        power management:


        A DESCRIPTION OF THE PROBLEM :
        It seems the Math.pow() implementation changed between 7u25 and 7u40, causing a roughly fivefold slowdown in the attached test case.

        Attached test case shows on my machine:
         - 7u25: ~1700ms
         - 7u40: ~8500ms

        Using "-XX:+UnlockDiagnosticVMOptions -XX:+PrintIntrinsics" shows the intrinsic implementation is used in both cases.


        THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Yes

        THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

        REGRESSION. Last worked in version 7u25

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        - Compile attached code
        - Run with JDK 7u25 and 7u40

        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
        import java.util.Random;

        public class Main {

            public static void main(String[] args) throws Exception {

                while (true) {

                    final Random random = new Random();
                    final double[] values = new double[100_000_000];
                    for (int i = 0; i < values.length; i++)
                        values[i] = random.nextDouble();

                    System.gc();

                    final long start = System.currentTimeMillis();

                    double blackhole = 0;
                    for (int i = 0; i < values.length; i++)
                        blackhole += Math.pow(values[i], 2);

                    final long elapsed = System.currentTimeMillis() - start;

                    System.out.println(elapsed + "ms (" + blackhole + ")");
                }
            }
        }
        ---------- END SOURCE ----------
        1. Main.java
          0.7 kB
          Christian Thalinger

            Activity

            acorn Karen Kinnear added a comment -
            To the best of my understanding, the compiler team owns the vm intrinsics library.
            twisti Christian Thalinger added a comment -
            I can reproduce the regression on one of our machines:

            cthaling@intelsdv03.us.oracle.com:~/ws$ /java/re/jdk/7u25/latest/binaries/linux-x64/bin/java Main
            1752ms (3.333572549821999E7)
            1775ms (3.333630157987734E7)
            1775ms (3.3339959311967324E7)

            cthaling@intelsdv03.us.oracle.com:~/ws$ /java/re/jdk/7u40/latest/binaries/linux-x64/bin/java Main
            8860ms (3.3332716430023532E7)
            8982ms (3.333500694103926E7)
            8872ms (3.3330793038973276E7)
            twisti Christian Thalinger added a comment -
            Vladimir reminded me of a very important point: the C++ implementation has a special case for an exponent of 2. Here are the same numbers for an exponent of 3 instead of 2:

                            blackhole += Math.pow(values[i], 3);

            cthaling@intelsdv03.us.oracle.com:~/ws$ /java/re/jdk/7u25/latest/binaries/linux-x64/bin/java Main
            24041ms (2.4998553978965953E7)
            24084ms (2.50029473511136E7)
            24088ms (2.5001841408912268E7)

            cthaling@intelsdv03.us.oracle.com:~/ws$ /java/re/jdk/7u40/latest/binaries/linux-x64/bin/java Main
            8853ms (2.5002001402242936E7)
            8987ms (2.500130893984287E7)
            9024ms (2.499793506756205E7)

            We might have to enhance the intrinsic to special-case an exponent of 2.
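
To compare the two exponents within a single JVM run, the reporter's loop can be parameterized on the exponent. The following is only a minimal sketch: the class name `PowExponentTiming`, its method names, and the reduced array size are assumptions made for illustration, and hand-rolled timing loops like this are sensitive to JIT warm-up, so the numbers are indicative at best.

```java
import java.util.Random;

// Illustrative harness; class and method names are hypothetical.
final class PowExponentTiming {

    private PowExponentTiming() {}

    /** Sums Math.pow(v, exponent) over all values, as in the reporter's loop. */
    static double sumOfPowers(double[] values, double exponent) {
        double sum = 0;
        for (double v : values) {
            sum += Math.pow(v, exponent);
        }
        return sum;
    }

    /** Times one pass over the array and returns the elapsed milliseconds. */
    static long timeMillis(double[] values, double exponent) {
        long start = System.nanoTime();
        double sum = sumOfPowers(values, exponent);
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        // Printing the sum keeps the JIT from discarding the loop as dead code.
        System.out.println("exponent " + exponent + ": " + elapsed + "ms (" + sum + ")");
        return elapsed;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Assumed size: smaller than the 100_000_000 used in the report.
        double[] values = new double[10_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = random.nextDouble();
        }
        // Repeat so that later rounds run compiled code.
        for (int round = 0; round < 3; round++) {
            timeMillis(values, 2);
            timeMillis(values, 3);
        }
    }
}
```

Running the same class file under two JDK builds gives a side-by-side view of how each build treats the exponent-of-2 case relative to the general case.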
            azeemj Azeem Jiva added a comment -
            ILW=MMM=P3

            Impact: Medium, the performance regression occurs only when the exponent is 2
            Likelihood: Medium, again only when the exponent is 2
            Workaround: Medium, the developer can write code similar to:

            if (y == 2) {
              return x * x;
            }
            return Math.pow(x, y);

            The regression is limited to certain values and in general Math.pow has significant improvements from 7u25 to 7u40 (see Christian's comment above).
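
As a self-contained illustration of that workaround, an application could wrap the call in a small helper. This is a sketch only: the class name `PowWorkaround` and its `pow` method are hypothetical and are not part of the JDK.

```java
// Hypothetical application-side helper; names are illustrative only.
final class PowWorkaround {

    private PowWorkaround() {}

    /** Returns x raised to the power y, short-circuiting the exponent-of-2 case. */
    static double pow(double x, double y) {
        if (y == 2.0) {
            return x * x; // square the base directly instead of calling Math.pow
        }
        return Math.pow(x, y);
    }

    public static void main(String[] args) {
        System.out.println(pow(3.0, 2.0));  // takes the x * x shortcut
        System.out.println(pow(2.0, 10.0)); // falls through to Math.pow
    }
}
```

The extra comparison is cheap next to a full pow computation, so the helper should cost little for other exponents, though that is worth measuring on the target JVM.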
            twisti Christian Thalinger added a comment -
            A rather "easy" fix for this and other special cases would be to add these special cases to Math.pow, like:

                public static double pow(double a, double b) {
                    if (b == 2) {
                        return a * a;
                    }
                    return StrictMath.pow(a, b); // default impl. delegates to StrictMath
                }

            and intrinsify StrictMath.pow in the compiler. Here is the speedup:

            cthaling@macbook:~/ws$ java -Xbootclasspath/p:$HOME/ws/jdk8-tl/jdk/classes Main
            103ms (3.3331021752770673E7)
            92ms (3.332789835208015E7)
            102ms (3.3333771275472376E7)
            twisti Christian Thalinger added a comment - - edited
            On second thought, I doubt we can intrinsify StrictMath.pow, but we could add another private method to Math that we can intrinsify. Something like:

                public static double pow(double a, double b) {
                    if (b == 2) {
                        return a * a;
                    }
                    return powImpl(a, b);
                }

                private static double powImpl(double a, double b) {
                    return StrictMath.pow(a, b); // default impl. is either intrinsified or delegates to StrictMath
                }
            kvn Vladimir Kozlov added a comment -
            Hi Azeem,

            On 04/08/2014 11:53 AM, Azeem Jiva wrote:
            > Joe,
            > There is a Math.pow performance regression where power of 2 inputs run slower than other values. One recommendation was to fix this in the libraries rather than a special case in the JVM. Can we fix this in 8u20 in the libraries? What are your thoughts on this?
            >

            Let's give credit / blame where it is due: the work done under JDK-7133857 introduced a performance regression in some cases along with some correctness bugs (JDK-7174532). Way back when Cliff was still around, I worked with him (and an intern IIRC) to intrinsify pow. (This led to the development of some fruitful pow tests [1]). I don't know how the ultimate x87 instruction sequences used in JDK-7133857 differ from what was done previously, but given the bug tail of JDK-7133857, the results of the effort seem a bit suspect.

            The fdlibm code used for StrictMath.pow does have an explicit up-front check for an exponent of 2.

            I do *not* support adding another check for an exponent of 2 in the JDK libraries. Any bug here is with the HotSpot intrinsification of pow and IIRC that is where the fix / work-around should go. I also recommend reexamining the work done under JDK-7133857 to make sure it meets other correctness properties that we might not have tests for. (I was not asked to review that work before it went back.)

            Cheers,

            -Joe

            [1] https://blogs.oracle.com/darcy/entry/finding_a_bug_in_fdlibm
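
The fdlibm up-front check for an exponent of 2 mentioned in the message above can be observed from Java by comparing StrictMath.pow(x, 2) with x * x bit for bit. The sketch below assumes a JDK whose StrictMath.pow follows fdlibm; the class name `StrictPowSquareCheck`, the seed, and the sample count are arbitrary choices for illustration.

```java
import java.util.Random;

// Illustrative check; class and method names are hypothetical.
final class StrictPowSquareCheck {

    private StrictPowSquareCheck() {}

    /** Counts samples where StrictMath.pow(x, 2) and x * x differ in any bit. */
    static int countMismatches(long seed, int samples) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < samples; i++) {
            double x = random.nextDouble();
            long viaPow = Double.doubleToLongBits(StrictMath.pow(x, 2.0));
            long viaMul = Double.doubleToLongBits(x * x);
            if (viaPow != viaMul) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 100_000));
    }
}
```

If the exponent-of-2 shortcut is in place, every sample should agree exactly, which is the behavior an intrinsic-level special case would be expected to match.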
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk8u/hs-dev/hotspot/rev/400709e275c1
            User: adlertz
            Date: 2014-05-15 13:21:25 +0000
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/rev/400709e275c1
            User: jcoomes
            Date: 2014-05-20 19:27:22 +0000

              People

              • Assignee:
                adlertz Niclas Adlertz (Inactive)
                Reporter:
                webbuggrp Webbug Group