JDK-8247838

PostLoopMultiversioning is broken and generates incorrect result

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P4
    • Resolution: Duplicate
    • Affects Version/s: 14, 15
    • Fix Version/s: 15
    • Component/s: hotspot
    • Labels:
    • Subcomponent:
    • CPU:
      x86
    • OS:
      generic

      Description

      The C2 optimization PostLoopMultiversioning is broken and generates incorrect results with the latest JDK (14/15). This can be reproduced with the program below on x86 with UseAVX=3.

      public class Foo {

        private static final int SIZE = 65536;

        private static void bar(int[] a, int[] b, int[] c, int start, int limit) {
          for (int i = start; i < limit; i += 1) {
            c[i] = a[i] + b[i];
          }
        }

        public static void main(String[] args) {
          int[] a = new int[SIZE];
          int[] b = new int[SIZE];
          int[] c = new int[SIZE];

          for (int i = 0; i < SIZE; i++) {
            a[i] = i;
            b[i] = i;
            c[i] = 0;
          }

          for (int i = 0; i < 20000; i++) {
            bar(a, b, c, 16384, 32768);
          }

          int sum = 0;
          for (int i = 32760; i < 32780; i++) {
            sum += c[i];
          }
          System.out.println(sum);
        }
      }

      $java -XX:+UnlockExperimentalVMOptions -XX:+PostLoopMultiversioning Foo
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # SIGSEGV (0xb) at pc=0x00007f04f181cb15, pid=20589, tid=20611
      #
      # JRE version: OpenJDK Runtime Environment (16.0) (slowdebug build 16-internal+0-adhoc..jdksrc)
      # Java VM: OpenJDK 64-Bit Server VM (slowdebug 16-internal+0-adhoc..jdksrc, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
      # Problematic frame:
      # V [libjvm.so+0x10dbb15] SuperWord::transform_loop(IdealLoopTree*, bool)+0x513
      #
      # Core dump will be written. Default location: /home/ent-user/case/core
      #
      # An error report file with more information is saved as:
      # /home/ent-user/case/hs_err_pid20589.log
      #
      # Compiler replay data is saved as:
      # /home/ent-user/case/replay_pid20589.log
      #
      # If you would like to submit a bug report, please visit:
      # https://bugreport.java.com/bugreport/crash.jsp
      #
      Aborted (core dumped)

      The SIGSEGV occurs in C2, in superword.cpp, here (http://hg.openjdk.java.net/jdk/jdk/file/cc7b6598df7e/src/hotspot/share/opto/superword.cpp#l173).

      The cause is that lpt_next can be null after loop strip mining. This code block checks which post loop (the normal vector post loop or the multi-versioned post loop) follows the main loop. But if the main loop is strip-mined, its _next loop is null. To fix the crash, we can check whether the loop is strip-mined and, if it is, look at _parent->_next instead.

      Patch:
      diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp
      index 0f4da5e8cfa..caf59461164 100644
      --- a/src/hotspot/share/opto/superword.cpp
      +++ b/src/hotspot/share/opto/superword.cpp
      @@ -169,7 +169,7 @@ void SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) {
           SLP_extract();
           if (PostLoopMultiversioning && Matcher::has_predicated_vectors()) {
             if (cl->is_vectorized_loop() && cl->is_main_loop() && !cl->is_reduction_loop()) {
      -        IdealLoopTree *lpt_next = lpt->_next;
      +        IdealLoopTree *lpt_next = cl->is_strip_mined() ? lpt->_parent->_next : lpt->_next;
               CountedLoopNode *cl_next = lpt_next->_head->as_CountedLoop();
               _phase->has_range_checks(lpt_next);
               if (cl_next->is_post_loop() && !cl_next->range_checks_present()) {
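
The lookup changed by the patch can be illustrated with a toy model. This is only a sketch: LoopTree is a hypothetical stand-in for C2's IdealLoopTree, with parent and next modelling _parent and _next, and the tree shape (post loop as the sibling of the outer strip-mined loop) is taken from the explanation above rather than from HotSpot itself.

```java
// Toy model only: LoopTree is a hypothetical stand-in for C2's IdealLoopTree.
final class LoopTree {
    final String name;
    final boolean stripMined; // true for an inner loop wrapped by strip mining
    LoopTree parent;          // enclosing loop (models _parent)
    LoopTree next;            // next sibling at the same nesting level (models _next)

    LoopTree(String name, boolean stripMined) {
        this.name = name;
        this.stripMined = stripMined;
    }

    // Mirrors the patched expression:
    // cl->is_strip_mined() ? lpt->_parent->_next : lpt->_next
    static LoopTree loopAfterMain(LoopTree main) {
        return main.stripMined ? main.parent.next : main.next;
    }
}

final class LoopTreeDemo {
    // Builds: outer strip-mined loop -> (child) main loop, with the post loop
    // as the sibling of the outer loop. The main loop itself has no sibling.
    static LoopTree buildStripMinedMain() {
        LoopTree outer = new LoopTree("outer strip-mined loop", false);
        LoopTree main = new LoopTree("main loop", true);
        LoopTree post = new LoopTree("post loop", false);
        main.parent = outer;
        outer.next = post;
        return main;
    }

    public static void main(String[] args) {
        LoopTree main = buildStripMinedMain();
        // The unpatched lookup: main.next is null, so dereferencing it crashes.
        System.out.println("main.next = " + main.next);
        // The patched lookup goes through the parent and finds the post loop.
        System.out.println("after main: " + LoopTree.loopAfterMain(main).name);
    }
}
```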

      After this fix there is no crash, but C2 still generates an incorrect result.

      $java Foo
      524216

      $java -XX:+UnlockExperimentalVMOptions -XX:+PostLoopMultiversioning Foo
      917462
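
The expected value can be checked independently of the JIT. The sketch below assumes only what the reproducer does: c[i] becomes 2*i for every index that bar() writes and stays 0 elsewhere. The class and method names are hypothetical. With writes limited to [16384, 32768) it gives 524216. Extending the written range by six elements, to index 32773, gives 917462, so the wrong result is arithmetically consistent with writes past the loop limit. That is an observation from the numbers, not something measured here.

```java
// Scalar model of the reproducer's final summation (hypothetical helper).
final class ExpectedSum {
    // Sums c[from..to) assuming c[i] == 2*i for writtenFrom <= i < writtenTo
    // and c[i] == 0 everywhere else.
    static long sum(int from, int to, int writtenFrom, int writtenTo) {
        long s = 0;
        for (int i = from; i < to; i++) {
            if (i >= writtenFrom && i < writtenTo) {
                s += 2L * i;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        // Correct behaviour: bar() writes exactly [16384, 32768).
        System.out.println(sum(32760, 32780, 16384, 32768)); // 524216
        // Six extra elements written past the limit (up to index 32773).
        System.out.println(sum(32760, 32780, 16384, 32774)); // 917462
    }
}
```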

      After looking into the generated assembly, we found that the multi-versioned post loop should generate vectorized loads/stores with predicates, but it doesn't. The original implementation hard-coded the x86 k1 register as the mask register. The k1 register is initialized at the beginning of the multi-versioned post loop based on the remaining trip count and restored when the multi-versioned loop ends (see http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/ccfc68592c92#l6.7 for JDK-8153998). But after several x86 backend code refactorings and the change of the default mask register from k1 to k0 (see http://hg.openjdk.java.net/jdk/jdk/rev/390f529f4f22#l2.29 for JDK-8211251), the x86 assembler can no longer generate the predicated vector instructions.
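
As a rough illustration of what a predicated post loop is meant to do, here is a scalar emulation in Java. It is not HotSpot code: the class, the method names, and the 16-lane width (512-bit vectors of 32-bit ints) are assumptions made for the sketch. The mask is derived from the remaining trip count, and only lanes whose mask bit is set are stored. The unmasked variant shows the failure mode, where every lane is stored regardless of the limit.

```java
// Scalar emulation of one vector iteration of a post loop (illustrative only).
final class MaskedPostLoop {
    static final int LANES = 16; // assumption: 512-bit vector of 32-bit ints

    // Builds a lane mask with one bit per remaining element, capped at LANES bits.
    static int maskFor(int remaining) {
        int active = Math.min(Math.max(remaining, 0), LANES);
        return (1 << active) - 1;
    }

    // Predicated version: stores only lanes whose mask bit is set.
    static void addMasked(int[] a, int[] b, int[] c, int i, int limit) {
        int mask = maskFor(limit - i);
        for (int k = 0; k < LANES; k++) {
            if ((mask & (1 << k)) != 0) {
                c[i + k] = a[i + k] + b[i + k];
            }
        }
    }

    // Unpredicated version: stores all lanes, even those at or past the limit.
    static void addUnmasked(int[] a, int[] b, int[] c, int i) {
        for (int k = 0; k < LANES; k++) {
            c[i + k] = a[i + k] + b[i + k];
        }
    }

    // Returns an array where element i holds the value i.
    static int[] iota(int n) {
        int[] v = new int[n];
        for (int i = 0; i < n; i++) {
            v[i] = i;
        }
        return v;
    }

    public static void main(String[] args) {
        int[] a = iota(64);
        int[] b = iota(64);
        int[] masked = new int[64];
        int[] unmasked = new int[64];
        // Eight elements remain: indices 32..39, limit 40.
        addMasked(a, b, masked, 32, 40);
        addUnmasked(a, b, unmasked, 32);
        System.out.println("masked[39]=" + masked[39] + " masked[40]=" + masked[40]);
        System.out.println("unmasked[39]=" + unmasked[39] + " unmasked[40]=" + unmasked[40]);
    }
}
```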

              People

              • Assignee:
                thartmann Tobias Hartmann
                Reporter:
                pli Pengfei Li