JDK-8172781

testParkTimesOut_parkUntil(LockSupportTest) hangs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Not an Issue
    • Affects Version/s: 9
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels:
      None
    • Subcomponent:
    • Introduced In Build:
      b151
    • Introduced In Version:
      9
    • CPU:
      generic
    • OS:
      linux

      Description

      JSR166TestCase is failing with `main' threw exception: java.lang.Exception: JUnit test failure

      junit.framework.AssertionFailedError: timed out waiting for thread to terminate

      Logs are attached for more information.
      1. JSR166TestCase.txt
        47 kB
        Abdul Kolarkunnu
      2. JSR166TestCase#id1.txt
        41 kB
        Abdul Kolarkunnu

          Activity

          martin Martin Buchholz added a comment -
          Future versions of these tests will have better handling of large timeout factors and fewer infinite retry loops, but one rogue non-terminating test thread will still be able to time out all the tck tests when they are run as a single jtreg test; a (tolerable) lack of test isolation.
          dholmes David Holmes added a comment -
          The problem is that the Linux system I am using seems to permanently exhibit the Linux Kernel Leap Second Bug:

          http://lkml.iu.edu/hypermail/linux/kernel/1207.1/01408.html
          http://stackoverflow.com/questions/11769687/pthread-cond-timedwait-returns-one-second-early

          I can see calls to pthread_cond_timedwait returning 1 second early, going by the elapsed time:

           /scratch/dh198349/tests > ./check_timedwait
          Timeout in millis: 15
          Elapsed time: 0
          Timeout in millis: 30
          Elapsed time: 0
          Timeout in millis: 60
          Elapsed time: 0
          Timeout in millis: 120
          Elapsed time: 0
          Timeout in millis: 240
          Elapsed time: 0
          Timeout in millis: 480
          Elapsed time: 0
          Timeout in millis: 960
          Elapsed time: 0
          Timeout in millis: 1920
          Elapsed time: 920
          Timeout in millis: 3840
          Elapsed time: 2840
          Timeout in millis: 7680
          Elapsed time: 6680

          but only for 64-bit! For 32-bit all is good:

          /scratch/dh198349/tests > ./check_timedwait32
          Timeout in millis: 15
          Timed-out
          Elapsed time: 15
          Timeout in millis: 30
          Timed-out
          Elapsed time: 30
          Timeout in millis: 60
          Timed-out
          Elapsed time: 60
          Timeout in millis: 120
          Timed-out
          Elapsed time: 120
          Timeout in millis: 240
          Timed-out
          Elapsed time: 240
          ...

          Of course there's no reason any machine today should be exhibiting that exact bug, nor why it should only affect 64-bit binaries, so it has to be something else.
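The check_timedwait program itself is not attached to this issue. As a rough Java-level analogue, the following sketch measures how long LockSupport.parkUntil actually blocks for the same doubling series of timeouts. The class and method names are hypothetical, and this is not the program used above; it only assumes that an early-returning timed wait in the OS would show up as an elapsed time well below the requested timeout.

```java
import java.util.concurrent.locks.LockSupport;

public class CheckParkUntil {
    /** Parks until (now + timeoutMillis) and returns the elapsed time in milliseconds. */
    static long measureParkUntil(long timeoutMillis) {
        long start = System.nanoTime(); // monotonic clock for measuring the wait
        // parkUntil takes an absolute deadline in milliseconds since the epoch.
        LockSupport.parkUntil(System.currentTimeMillis() + timeoutMillis);
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** True if the wait ended before the requested timeout had elapsed. */
    static boolean returnedEarly(long timeoutMillis, long elapsedMillis) {
        return elapsedMillis < timeoutMillis;
    }

    public static void main(String[] args) {
        // Same doubling series as in the output above: 15, 30, ..., 7680.
        for (long timeout = 15; timeout <= 7680; timeout *= 2) {
            long elapsed = measureParkUntil(timeout);
            System.out.println("Timeout in millis: " + timeout);
            System.out.println("Elapsed time: " + elapsed);
            if (returnedEarly(timeout, elapsed)) {
                // parkUntil is allowed to return spuriously, and the two clocks
                // round differently, so one short wait is only a hint.
                System.out.println("Returned early");
            }
        }
    }
}
```

Because parkUntil may return for no reason, a single early return proves nothing; a consistent shortfall of about 1000 ms across the larger timeouts is the pattern to look for.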

          That said I employed the

          date -s "`date`"

          workaround and it is fixed!

          In our test code, parkUntil returns immediately and we get a busy loop. I thought that loop should still terminate after 12ms, but it's not written that way: it resets the 'start time' on each iteration! So eventually awaitTermination times out, unless by chance the busy thread gets descheduled long enough at the right point in the loop, so that we suddenly see >12ms elapse, terminate, and pass.

          So that explains why the test failures were somewhat intermittent.
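The loop behaviour described above can be sketched as follows. This is not the actual JSR166 test code: the class, method names, and the injected clock and park action are hypothetical, introduced so the two loop shapes can be compared deterministically. The park action stands in for a LockSupport.parkUntil call that returns immediately.

```java
import java.util.function.LongSupplier;

public class ParkLoopSketch {
    /** Loop shape described above: the start time is re-read on every pass. */
    static int resettingLoop(LongSupplier clockMillis, Runnable park,
                             long timeoutMillis, int maxPasses) {
        int passes = 0;
        while (passes < maxPasses) {
            long start = clockMillis.getAsLong(); // reset on each iteration
            park.run();                           // stands in for parkUntil(deadline)
            passes++;
            // Only the time spent in this single pass is compared to the timeout.
            if (clockMillis.getAsLong() - start >= timeoutMillis) break;
        }
        return passes;
    }

    /** Alternative shape: the start time is read once, before the loop. */
    static int fixedStartLoop(LongSupplier clockMillis, Runnable park,
                              long timeoutMillis, int maxPasses) {
        long start = clockMillis.getAsLong();
        int passes = 0;
        while (passes < maxPasses) {
            park.run();
            passes++;
            // Total time since the loop began is compared to the timeout.
            if (clockMillis.getAsLong() - start >= timeoutMillis) break;
        }
        return passes;
    }

    public static void main(String[] args) {
        Runnable immediateReturn = () -> { };  // a park that never blocks
        long[] t1 = {0};
        long[] t2 = {0};
        // Fake clock that advances 1 ms on every read.
        System.out.println(resettingLoop(() -> t1[0]++, immediateReturn, 12, 1000));
        System.out.println(fixedStartLoop(() -> t2[0]++, immediateReturn, 12, 1000));
    }
}
```

With a park that never blocks, the resetting loop only stops at the pass limit (standing in for awaitTermination timing out), while the fixed-start loop exits once 12 ms have accumulated. In the resetting shape, a single pass has to span the whole timeout, which matches the observation that the test passes only when the thread happens to be descheduled for long enough.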
          jeff Jeff Dinkins added a comment -
          Since we're almost out of time on 9, I made a best guess that this is not a 9 stopper. If that was the wrong judgement, please feel free to remove tbd_minor, but also please update why this may be more important than I'm assuming.
          dholmes David Holmes added a comment -
          This would appear to be an OS issue, but I can't confirm the situation with OEL 6.0. I think we can close this again.
          dholmes David Holmes added a comment -
          Apparent OS issue.

            People

            • Assignee:
              dholmes David Holmes
              Reporter:
              akolarkunnu Abdul Kolarkunnu
            • Votes:
              0
              Watchers:
              7
