JDK-8191093

Improve behavior when safepoint begin times out

      Description

      We recently hit a failure of the PageArmed == 0 guarantee (see JDK-8155700 and JDK-8038480 for other cases of this). In our case we are fairly sure the underlying cause was a flaky host that caused one or more threads to get stuck and never ack the safepoint. Even so, it would have been nice if the JVM had handled the situation a bit better.

      What I'd like to improve is:

      First, eliminate the unintentional attempt inside the loop to arm the polling page when the {{iterations}} variable overflows.
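      To illustrate the first point, here is a minimal, self-contained sketch of how a wrapping loop counter can trigger a second arming attempt. It is not the HotSpot code: the names SpinState, kDeferLoopCount, spin_once_buggy and spin_once_fixed are hypothetical, and the counter is deliberately only 8 bits wide so that the wrap-around shows up after a few hundred spins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the state of the safepoint spin loop.
struct SpinState {
    std::uint8_t iterations = 0;  // narrow on purpose: wraps after 256 spins
    bool page_armed = false;
    int arm_attempts = 0;         // how often the loop tried to arm the page
};

// Assumed threshold at which the loop arms the polling page.
constexpr std::uint8_t kDeferLoopCount = 4;

// Problematic pattern: the equality test fires again every time the
// counter wraps around and reaches the threshold a second time.
void spin_once_buggy(SpinState& s) {
    if (s.iterations == kDeferLoopCount) {
        ++s.arm_attempts;  // a "page is not armed" guarantee would fail on attempt 2
        s.page_armed = true;
    }
    ++s.iterations;  // unsigned arithmetic: 255 + 1 wraps to 0
}

// One possible fix: arm only if not armed yet, and saturate the counter
// instead of letting it wrap.
void spin_once_fixed(SpinState& s) {
    if (!s.page_armed && s.iterations == kDeferLoopCount) {
        ++s.arm_attempts;
        s.page_armed = true;
    }
    if (s.iterations < UINT8_MAX) {
        ++s.iterations;
    }
}

// Runs the given step function for a number of spins and reports how
// many arming attempts were made.
int count_arm_attempts(void (*step)(SpinState&), int spins) {
    assert(spins >= 0);
    SpinState s;
    for (int i = 0; i < spins; ++i) {
        step(s);
    }
    return s.arm_attempts;
}
```

      With 600 spins the buggy variant attempts to arm three times (at spins 4, 260 and 516), while the fixed variant arms exactly once. Either guard alone (the armed check or the saturating counter) would be enough; the sketch shows both.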

      Second, introduce a time-based heuristic that forces the JVM to abort when it has been stuck in the loop for far too long. I think the best way is probably a new command-line flag that specifies how long to wait before aborting, set to something conservative such as 30 minutes or an hour. We could reuse the SafepointTimeout / DieOnSafepointTimeout flags for this, but I think it's nicer to get a warning early (the current 10-second default for SafepointTimeoutDelay is reasonable IMHO) and to abort much later.
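      The two-threshold idea (warn early, abort much later) could look roughly like the sketch below. This is only an illustration under assumptions: StallPolicy, StallAction and classify_stall are hypothetical names, not existing HotSpot code or flags, and the defaults simply mirror the 10-second warning delay and the 30-minute abort delay suggested above.

```cpp
#include <cassert>
#include <chrono>

// What the spin loop should do after having waited for some time.
enum class StallAction { Continue, Warn, Abort };

// Hypothetical policy: one delay for the warning, a much longer one for
// the abort. In a real JVM these would come from command-line flags.
struct StallPolicy {
    std::chrono::milliseconds warn_after = std::chrono::seconds(10);
    std::chrono::milliseconds abort_after = std::chrono::minutes(30);
};

// Decides what to do given how long the safepoint has been pending.
// `already_warned` keeps the warning from being repeated on every spin.
StallAction classify_stall(std::chrono::milliseconds waited,
                           const StallPolicy& policy,
                           bool already_warned) {
    // The warning is meant to come first, so the abort delay must not be shorter.
    assert(policy.warn_after <= policy.abort_after);

    if (waited >= policy.abort_after) {
        return StallAction::Abort;
    }
    if (waited >= policy.warn_after && !already_warned) {
        return StallAction::Warn;
    }
    return StallAction::Continue;
}
```

      A caller would measure the elapsed time with a monotonic clock such as std::chrono::steady_clock, so that wall-clock adjustments on the host cannot trigger or suppress the abort.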

      Thoughts?

              People

              • Assignee: Robbin Ehn (rehn)
              • Reporter: Tony Printezis (tonyp)