Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6939126

System.nanoTime not increasing

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P2
    • Resolution: Not an Issue
    • Affects Version/s: 6u18-rev
    • Fix Version/s: None
    • Component/s: hotspot
    • Labels:
    • Subcomponent:
    • CPU:
      sparc
    • OS:
      solaris_10

      Description

      Customer noticed a problem where threads that called
      java.util.concurrent.ArrayBlockingQueue.poll never returned, which
      caused severe downstream problems in our server. Customer isolated the
      problem down to System.nanoTime not increasing. The value returned by
      this method has not changed in several hours.

      The files in this package demonstrate that System.nanoTime() is not
      advancing and that calls to ArrayBlockingQueue.poll() do not return.

      The TimeDebugTask class was loaded into our server and the runTask
      method was invoked. The output of this method can be see in the
      TimeDebugTask.stderr file. You'll notice that all six calls to
      System.nanoTime() returned the same value (5187510361303074) even
      though System.currentTimeMillis() advanced by over a second.

      You'll also see that the call to sleeper.sleep(1000L), which
      internally calls ArrayBlockingQueue.poll(), never returns until the
      thread is forceably interrupted. Here was the stack trace of the Task
      thread that was interrupted:

      "Task Thread 1 (R5433)" prio=3 tid=0x00000000054f4800 nid=0xd0 waiting on condition [0xfffffd6937bb9000]
         java.lang.Thread.State: TIMED_WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for <0xfffffd69c79808c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
              at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:342)
              at com.unboundid.util.WakeableSleeper.sleep(WakeableSleeper.java:104)
              at com.unboundid.directory.server.tasks.TimeDebugTask.runTask(TimeDebugTask.java:110)
              at com.unboundid.directory.server.backends.task.Task.execute(Task.java:1179)
              at com.unboundid.directory.server.backends.task.TaskThread.runThread(TaskThread.java:195)
              at com.unboundid.directory.server.api.DirectoryThread.run(DirectoryThread.java:184)

      The JVM is still up and running in this state, and customer has a mechanism
      to load and run arbitrary code, so if you have some code that would
      help diagnose this, please let us know.

      Customer have a standalone Java server process running with JDK 1.6 update
      18 on a Sun x4600 with Solaris 10 (SunOS ospccr17 5.10
      Generic_127128-11 i86pc i386 i86pc). The JDK is using the following
      arguments:

      -d64 -server -Xmx89g -Xms89g -XX:MaxNewSize=2g -XX:NewSize=2g
       -XX:+UseConcMarkSweepGC -XX:+CMSConcurrentMTEnabled
       -XX:+CMSParallelRemarkEnabled -XX:+CMSParallelSurvivorRemarkEnabled
       -XX:ParallelCMSThreads=4 -XX:CMSMaxAbortablePrecleanTime=10000
       -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseParNewGC -XX:+UseMembar
       -XX:+UseBiasedLocking -XX:+UseLargePages -XX:PermSize=64M
       -XX:+UseNUMA -XX:+HeapDumpOnOutOfMemoryError
       -XX:-ReduceInitialCardMarks

      They have seen this problem on multiple x4600s, they believes this issue was introduced with jdk 1.6.0_18 with CR ID

       6863420 os::javaTimeNanos() go backward on Solaris x86

      and problem seems to only affect a single process after running for a long time.
      They have sent the following gc snipped that is very interesting:

        2010-04-10T00:28:45.082-0700: [CMS-concurrent-preclean: 18289640.709/18289642.986 secs] [Times: user=11.97 sys=7.69, real=18289642.99 secs]

      the full GC log is located attached.

      Also, looking at getTimeNanos() method implementation:

      // gethrtime can move backwards if read from one cpu and then a different cpu
      // getTimeNanos is guaranteed to not move backward on Solaris
      inline hrtime_t getTimeNanos() {
        if (VM_Version::supports_cx8()) {
          const hrtime_t now = gethrtime();
          // Use atomic long load since 32-bit x86 uses 2 registers to keep long.
          const hrtime_t prev = Atomic::load((volatile jlong*)&max_hrtime);
          if (now <= prev) return prev; // same or retrograde time;
          const hrtime_t obsv = Atomic::cmpxchg(now, (volatile jlong*)&max_hrtime, prev);
          assert(obsv >= prev, "invariant"); // Monotonicity
          // If the CAS succeeded then we're done and return "now".
          // If the CAS failed and the observed value "obs" is >= now then
          // we should return "obs". If the CAS failed and now > obs > prv then
          // some other thread raced this thread and installed a new value, in which case
          // we could either (a) retry the entire operation, (b) retry trying to install now
          // or (c) just return obs. We use (c). No loop is required although in some cases
          // we might discard a higher "now" value in deference to a slightly lower but freshly
          // installed obs value. That's entirely benign -- it admits no new orderings compared
          // to (a) or (b) -- and greatly reduces coherence traffic.
          // We might also condition (c) on the magnitude of the delta between obs and now.
          // Avoiding excessive CAS operations to hot RW locations is critical.
          // See http://blogs.sun.com/dave/entry/cas_and_cache_trivia_invalidate
          return (prev == obsv) ? now : obsv ;
        } else {
          return oldgetTimeNanos();
        }
      }

      This method will return 'prev' until 'prev > now':

      if (now <= prev) return prev; // same or retrograde time;

      But according to the comment, gethrtime can move backwards. So if it moves backwards and stays behind, then it will continue to return the same value (i.e. prev). This would definitely explain the behavior that they're seeing. Am I missing something?

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                dmeetry Dmeetry Degrave (Inactive)
                Reporter:
                hnassiry Haroon Nassiry (Inactive)
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:
                  Imported:
                  Indexed: