Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-5014723

implement "strip mining" loop optimization

    Details

    • Subcomponent:
    • Understanding:
      Fix Understood
    • CPU:
      generic
    • OS:
      generic

      Description

      These JSR-166 tests (part of the J2SE workspace)

      test/java/util/concurrent/locks/ReentrantLock/TimeoutLockLoops.java
      test/java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.java

      are known to fail on machines with multiple CPUs, both
      solaris-sparc and linux-amd64. It has never been seen to fail on
      uniprocessor machines (tested on linux-i586, windows-i586).
       
      The failures are seen on Tiger b43.

      The failures are not easy to reproduce. I see a failure about 1% of the
      time for both tests on both solaris-sparc and linux-amd64.

      My recipe for reproducing it is to run an infinite loop till a failure
      occurs, eg. for linux-amd64 I did:

      (martin@quoddy) java/util/concurrent/locks/ReentrantLock $ perl -e 'for ($i = 0;; $i++) { print "$i\n"; system("/u/martin/jct-tools/2.1.6/solaris/bin/jtreg -jdk:/u/martin/ws/tiger/build/linux-amd64/j2sdk-image -automatic TimeoutLockLoops.java"); exit 1 if $? != 0; }'

      which failed after 98 iterations.

      Here is an extract from a failing TimeoutLockLoops.jtr file:

      ----------System.out:(6/185)----------
      Threads: 1 0.038286s run time
      Threads: 2 0.039877s run time
      Threads: 3 0.0399s run time
      Threads: 5 0.039472s run time
      Threads: 8 0.059474s run time
      Threads: 12 0.079438s run time
      ----------System.err:(0/0)----------
      result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java' interrupted! (timed out?)


      test result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java' interrupted! (timed out?)

      Here is an extract from a failing CancelledProducerConsumerLoops.jtr file:

      ----------System.out:(14/490)----------
      Pairs:1
      ArrayBlockingQueue 3.0553042s run time
      LinkedBlockingQueue 2.2291932s run time
      SynchronousQueue 2.485516s run time
      Pairs:2
      ArrayBlockingQueue 4.0495974s run time
      LinkedBlockingQueue 3.4787056s run time
      SynchronousQueue 3.558907s run time
      Pairs:3
      ArrayBlockingQueue 4.8086858s run time
      LinkedBlockingQueue 3.1188338s run time
      SynchronousQueue 3.3789728s run time
      Pairs:5
      ArrayBlockingQueue 3.3516484s run time
      ----------System.err:(0/0)----------
      result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/solaris-sparc/j2sdk-image/bin/java' interrupted! (timed out?)


      test result: Failed. Execution failed: Program `/u/martin/ws/tiger/build/solaris-sparc/j2sdk-image/bin/java' interrupted! (timed out?)

      I notice there are lingering processes after the test has stopped
      running, which must be killed manually.

      martin 15329 0.0 2.1 1344824 21604 pts/2 S 15:17 0:00 /u/martin/ws/tiger/build/linux-amd64/j2sdk-image/bin/java -Dtest.src=/export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue -Dtest.classes=/export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue/JTwork/classes/java/util/concurrent/BlockingQueue com.sun.javatest.regtest.MainWrapper /export/martin/ws/canWrite/test/java/util/concurrent/BlockingQueue/JTwork/classes/java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.jta


      ###@###.### 2004-03-16

        Issue Links

          Activity

          Hide
          defectconv Defect Conversion BT2 (Inactive) added a comment -
          BT2:EVALUATION


          ###@###.### 2004-04-09

          This is what I found. I got 2 kinds of hangs.
          One when it hangs without any message on amd64 and I could
          not connect gdb to it. The second with unreachable safepoint:

          # SafepointSynchronize::begin: Fatal error:
          # SafepointSynchronize::begin: Timed out while attempting to reach a safepoint.
          # SafepointSynchronize::begin: Threads which did not reach the safepoint:

          I reproduced the last hang on solaris_x86 (I know dbx better and I was able
          to debug my binaries).

          The loop "do {} while (n-- > 0);" in ReentrantLockLoop::run()
          is countable even if the limit is "ITERS = Integer.MAX_VALUE".
          We don't put safepoints in countable loops assuming it will finish
          in reasonable time.

          The scenario is simple: one thread acquired 'lock' (lock.tryLock() == true)
          and now processing the loop body. Other threads are in barrier.await() or
          waiting lock.tryLock(). At this time VM got a request for an operation and
          forced all threads to go to safepoint state. All threads which are in
          barrier.await() and lock.tryLock() go to safepoint (I have frames traces
          which confirms this). The only thread left is one which processed the loop body.
          And now the 'lock' can be easy acquired by this thread
          since all other threads are in safepoint state and don't look for 'lock'.

          The are only 2 places in a compiled method where we are placing
          safepoints: before 'return' and at the end of uncountable loops or
          loops where at least one call always executed (assuming we hit
          the safepoint on the return from the call).

          As I pointed above we don't have safepoint at the end of the loop.
          Due to optimization the methods compute1() and compute2(x) are fully
          inlined so we don't have safepoints for their 'return's.
          The methods lock.tryLock() and lock.unlock() are inlined partially:
          the code still has some nested calls (calls in tryLock() and unlock()).
          But the execution only of the inlined parts of lock.tryLock() and lock.unlock()
          is enough to acquire the 'lock' since all other threads are in safepoint state.

          I executed the compiled code of ReentrantLockLoop::run() by steps in dbx
          in this state and saw that we indeed never calls any functions.
          This is why we avoid any safepoints until the end of iterations
          and it is relatively long time (2**31-1 iterations).
          On perf-lx2 (2x1.4GHz P3):

          Threads: 18 0.225942996s run time
          Threads: 27 397.664724057s run time
          Exception in thread "main" java.lang.Error: Some thread completed instead of timing out
                  at TimeoutLockLoops$ReentrantLockLoop.test(TimeoutLockLoops.java:69)
                  at TimeoutLockLoops.main(TimeoutLockLoops.java:32)

          The compiler should somehow figure out that a countable loop may take
          long time to execute and place safepoint into it.

          The fast solution (for 1.5.0 beta2) is to add a new flag to force
          safepoint in countable loops. It should be 'false' by default since
          the performance can be hit very hard. But we will get the workaround
          for customers who get this problem.
          And we can use the flag for these failed (timeout) test cases.

          The possible long solution (for next release) is the implementation of
          "strip mining" when you transform a countable loop into two nested
          loops and place safepoint into external loop.
          Show
          defectconv Defect Conversion BT2 (Inactive) added a comment - BT2:EVALUATION ###@###.### 2004-04-09 This is what I found. I got 2 kinds of hangs. One when it hangs without any message on amd64 and I could not connect gdb to it. The second with unreachable safepoint: # SafepointSynchronize::begin: Fatal error: # SafepointSynchronize::begin: Timed out while attempting to reach a safepoint. # SafepointSynchronize::begin: Threads which did not reach the safepoint: I reproduced the last hang on solaris_x86 (I know dbx better and I was able to debug my binaries). The loop "do {} while (n-- > 0);" in ReentrantLockLoop::run() is countable even if the limit is "ITERS = Integer.MAX_VALUE". We don't put safepoints in countable loops assuming it will finish in reasonable time. The scenario is simple: one thread acquired 'lock' (lock.tryLock() == true) and now processing the loop body. Other threads are in barrier.await() or waiting lock.tryLock(). At this time VM got a request for an operation and forced all threads to go to safepoint state. All threads which are in barrier.await() and lock.tryLock() go to safepoint (I have frames traces which confirms this). The only thread left is one which processed the loop body. And now the 'lock' can be easy acquired by this thread since all other threads are in safepoint state and don't look for 'lock'. The are only 2 places in a compiled method where we are placing safepoints: before 'return' and at the end of uncountable loops or loops where at least one call always executed (assuming we hit the safepoint on the return from the call). As I pointed above we don't have safepoint at the end of the loop. Due to optimization the methods compute1() and compute2(x) are fully inlined so we don't have safepoints for their 'return's. The methods lock.tryLock() and lock.unlock() are inlined partially: the code still has some nested calls (calls in tryLock() and unlock()). But the execution only of the inlined parts of lock.tryLock() and lock.unlock() is enough to acquire the 'lock' since all other threads are in safepoint state. I executed the compiled code of ReentrantLockLoop::run() by steps in dbx in this state and saw that we indeed never calls any functions. This is why we avoid any safepoints until the end of iterations and it is relatively long time (2**31-1 iterations). On perf-lx2 (2x1.4GHz P3): Threads: 18 0.225942996s run time Threads: 27 397.664724057s run time Exception in thread "main" java.lang.Error: Some thread completed instead of timing out         at TimeoutLockLoops$ReentrantLockLoop.test(TimeoutLockLoops.java:69)         at TimeoutLockLoops.main(TimeoutLockLoops.java:32) The compiler should somehow figure out that a countable loop may take long time to execute and place safepoint into it. The fast solution (for 1.5.0 beta2) is to add a new flag to force safepoint in countable loops. It should be 'false' by default since the performance can be hit very hard. But we will get the workaround for customers who get this problem. And we can use the flag for these failed (timeout) test cases. The possible long solution (for next release) is the implementation of "strip mining" when you transform a countable loop into two nested loops and place safepoint into external loop.
          Hide
          defectconv Defect Conversion BT2 (Inactive) added a comment -
          BT2:CONVERTED DATA

          BugTraq+ Release Management Values

          COMMIT TO FIX:
          dragon

          Show
          defectconv Defect Conversion BT2 (Inactive) added a comment - BT2:CONVERTED DATA BugTraq+ Release Management Values COMMIT TO FIX: dragon
          Hide
          kvn Vladimir Kozlov added a comment -
          BT2:SUGGESTED FIX

          Copied from Evaluation:

          The possible long solution (for next release) is the implementation of
          "strip mining" when you transform a countable loop into two nested
          loops and place safepoint into external loop.
          Show
          kvn Vladimir Kozlov added a comment - BT2:SUGGESTED FIX Copied from Evaluation: The possible long solution (for next release) is the implementation of "strip mining" when you transform a countable loop into two nested loops and place safepoint into external loop.
          Hide
          vlivanov Vladimir Ivanov added a comment -
          ILW = HLL = P4
          I = H: non-responsive application
          L = L: long running counted loop
          W = L: -XX:+UseCountedLoopSafepoints
          Show
          vlivanov Vladimir Ivanov added a comment - ILW = HLL = P4 I = H: non-responsive application L = L: long running counted loop W = L: -XX:+UseCountedLoopSafepoints
          Hide
          dholmes David Holmes added a comment -
          I would argue that W = M because it may not be possible for the end user to control the command line eg browser based apps.
          Show
          dholmes David Holmes added a comment - I would argue that W = M because it may not be possible for the end user to control the command line eg browser based apps.
          Hide
          iignatyev Igor Ignatyev added a comment -
          JDK-8009993 (P3) was closed as a dup of this, increasing priority to P3
          Show
          iignatyev Igor Ignatyev added a comment - JDK-8009993 (P3) was closed as a dup of this, increasing priority to P3
          Hide
          iignatyev Igor Ignatyev added a comment -
          ILW = HLL = P4 sounds more reasonable for this bug, lowering back to P4.
          Show
          iignatyev Igor Ignatyev added a comment - ILW = HLL = P4 sounds more reasonable for this bug, lowering back to P4.
          Hide
          kvn Vladimir Kozlov added a comment -
          Implementation suggested by Roland in RH is tracked by JDK-8186027. I will closed this bug when/if we push that implementation.
          Show
          kvn Vladimir Kozlov added a comment - Implementation suggested by Roland in RH is tracked by JDK-8186027 . I will closed this bug when/if we push that implementation.

            People

            • Assignee:
              kvn Vladimir Kozlov
              Reporter:
              martin Martin Buchholz
            • Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Imported:
                Indexed: