JDK-8274687

JDWP deadlocks if some Java thread reaches wait in blockOnDebuggerSuspend

    Details

    • Resolved In Build:
      b24

        Description

        Case 1: Deadlock on resume by debugger
        ======================================

        The JDWP agent deadlocks the VM if both of the following hold:

        * A thread T is blocked in blockOnDebuggerSuspend because it called
          j.l.Thread.resume() on a thread "resumee" that is currently suspended by the
          debugger.

        * The debugger tries to resume one or all threads.

        The deadlock arises because T owns handlerLock while it waits for the debugger
        to resume the resumee, and the debugger needs handlerLock to perform that resume.
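
The lock pattern described above can be reproduced in miniature with plain pthreads. The following is only a sketch under stated assumptions, not agent code: `handler_lock`, `thread_lock`, `thread_cond`, `thread_t` and `simulate_case1` are hypothetical stand-ins for handlerLock, threadLock, thread T and the debugger's resume attempt. It uses `pthread_mutex_trylock` so that the would-be deadlock is observed rather than entered, and it then clears the suspend flag artificially so the program can terminate.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the agent's handlerLock and threadLock. */
static pthread_mutex_t handler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t thread_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  thread_cond  = PTHREAD_COND_INITIALIZER;

static bool resumee_suspended = true;  /* resumee is suspended by the debugger */
static bool t_is_waiting      = false; /* T has reached its wait */

/* Models thread T: event callback, then the wait in blockOnDebuggerSuspend. */
static void *thread_t(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&handler_lock);      /* acquired as in event_callback */
    pthread_mutex_lock(&thread_lock);
    t_is_waiting = true;
    pthread_cond_broadcast(&thread_cond);   /* let the observer know */
    while (resumee_suspended) {
        /* Releases thread_lock only; handler_lock stays owned by T. */
        pthread_cond_wait(&thread_cond, &thread_lock);
    }
    pthread_mutex_unlock(&thread_lock);
    pthread_mutex_unlock(&handler_lock);
    return NULL;
}

/* Returns 1 if the simulated agent finds handler_lock busy while T waits,
 * 0 if it could take the lock, and -1 if the thread could not be created. */
int simulate_case1(void)
{
    pthread_t t;
    int agent_blocked;

    resumee_suspended = true;
    t_is_waiting = false;
    if (pthread_create(&t, NULL, thread_t, NULL) != 0) {
        return -1;
    }

    /* Wait until T is parked in its condition wait. */
    pthread_mutex_lock(&thread_lock);
    while (!t_is_waiting) {
        pthread_cond_wait(&thread_cond, &thread_lock);
    }
    pthread_mutex_unlock(&thread_lock);

    /* The resume path needs handler_lock first. A blocking lock call would
     * hang here; trylock only reports that the lock is unavailable. */
    agent_blocked = (pthread_mutex_trylock(&handler_lock) == EBUSY);
    if (!agent_blocked) {
        pthread_mutex_unlock(&handler_lock);
    }

    /* Break the cycle artificially so the sketch terminates: clear the
     * suspend flag without going through handler_lock. */
    pthread_mutex_lock(&thread_lock);
    resumee_suspended = false;
    pthread_cond_broadcast(&thread_cond);
    pthread_mutex_unlock(&thread_lock);

    pthread_join(t, NULL);
    return agent_blocked;
}
```

Calling `simulate_case1()` from a small `main` should report that the lock was busy. The point of the sketch is that a condition wait releases only the monitor being waited on, so any other lock held by the waiter stays unavailable for as long as the wait lasts.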

        Stacks on Deadlock
        ------------------

        ### Stack of Thread T

        #0 futex_wait_cancelable
        #1 __pthread_cond_wait_common
        #2 __pthread_cond_wait
        #3 os::PlatformEvent::park
        #4 JvmtiRawMonitor::simple_wait
        #5 JvmtiRawMonitor::raw_wait
        #6 JvmtiEnv::RawMonitorWait
        #7 debugMonitorWait
        #8 blockOnDebuggerSuspend
        #9 handleAppResumeBreakpoint
        #10 event_callback
        #11 cbBreakpoint
        #12 JvmtiExport::post_raw_breakpoint
        #13 InterpreterRuntime::_breakpoint

        ### JDWP Agent Stack

        #0 futex_wait_cancelable
        #1 __pthread_cond_wait_common
        #2 __pthread_cond_wait
        #3 os::PlatformEvent::park
        #4 JvmtiRawMonitor::simple_enter
        #5 JvmtiRawMonitor::raw_enter
        #6 JvmtiEnv::RawMonitorEnter
        #7 debugMonitorEnter
        #8 eventHandler_lock
        #9 threadControl_resumeThread
        #10 resume
        #11 debugLoop_run
        #12 connectionInitiated
        #13 attachThread
        #14 JvmtiAgentThread::call_start_function
        #15 JavaThread::thread_main_inner
        #16 Thread::call_run
        #17 thread_native_entry
        #18 start_thread
        #19 clone

        See attachment for jtreg reproducer.

        Case 2: Deadlock on JDWP Dispose command
        ========================================

        We see sporadic timeouts when running
        test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003 because
        the debuggee main thread and the JDWP agent thread deadlock with the following
        stacks:

        ### Debuggee Main Thread "M"

        #0 futex_wait_cancelable
        #1 __pthread_cond_wait_common
        #2 __pthread_cond_wait
        #3 os::PlatformEvent::park
        #4 JvmtiRawMonitor::simple_wait
        #5 JvmtiRawMonitor::raw_wait
        #6 JvmtiEnv::RawMonitorWait
        #7 debugMonitorWait
        #8 blockOnDebuggerSuspend
        #9 handleAppResumeBreakpoint
        #10 event_callback
        #11 cbBreakpoint
        #12 JvmtiExport::post_raw_breakpoint
        #13 InterpreterRuntime::_breakpoint

        ### JDWP Agent Thread "A"

        #0 futex_wait_cancelable
        #1 __pthread_cond_wait_common
        #2 __pthread_cond_wait
        #3 os::PlatformEvent::park
        #4 JvmtiRawMonitor::simple_enter
        #5 JvmtiRawMonitor::raw_enter
        #6 JvmtiEnv::RawMonitorEnter
        #7 debugMonitorEnter
        #8 eventHandler_free
        #9 threadControl_onDisconnect
        #10 debugLoop_run
        #11 connectionInitiated
        #12 attachThread
        #13 JvmtiAgentThread::call_start_function
        #14 JavaThread::thread_main_inner
        #15 Thread::call_run
        #16 thread_native_entry
        #17 start_thread
        #18 clone

        #### How to reproduce

        The following patch makes the deadlock likely. Apply it and run dispose003.

        --- a/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
        +++ b/src/jdk.jdwp.agent/share/native/libjdwp/debugLoop.c
        @@ -180,6 +180,9 @@ debugLoop_run(void)
                     shouldListen = !lastCommand(cmd);
                 }
             }
        +    /* Sleep to trigger deadlock in test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003 */
        +    fprintf(stderr, "debugLoop: sleep\n");
        +    sleep(1);
             threadControl_onDisconnect();
             standardHandlers_onDisconnect();

        #### Analysis

        M hit the internal breakpoint in j.l.Thread.resume()[1]. The resumee
        "testedThread" (named "thread2" in log output[2]) is currently suspended,
        so M waits on threadLock until the resumee is no longer suspended, all the
        while owning handlerLock (acquired in event_callback)[3].

        A should call threadControl_reset to resume all threads, including
        "testedThread", so that M can continue. However, A is blocked before that
        point in eventHandler_free, trying to enter handlerLock, which is owned by M.

        Note that the vm.dispose() call by the debugger returns immediately. Resuming
        all suspended threads is done asynchronously[4].
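
The analysis above amounts to a cycle in a wait-for graph: M waits for a resume that only A performs, and A waits for a lock that M owns. The following is a minimal sketch of that reasoning, assuming a simplified model in which each thread waits for at most one other thread. The names `THREAD_M`, `THREAD_A`, `WAITS_FOR_NONE`, `has_wait_cycle` and `case2_is_deadlocked` are hypothetical and exist only for this illustration.

```c
/* Hypothetical model: each thread waits for at most one other thread. */
enum { THREAD_M = 0, THREAD_A = 1, THREAD_COUNT = 2, WAITS_FOR_NONE = -1 };

/* Returns 1 if following the wait-for edges from `start` leads back to
 * `start`, and 0 otherwise. Entries must be valid indices or WAITS_FOR_NONE. */
int has_wait_cycle(const int waits_for[], int count, int start)
{
    int current = start;
    int steps;

    /* A cycle through `start` has at most `count` edges. */
    for (steps = 0; steps < count; steps++) {
        current = waits_for[current];
        if (current == WAITS_FOR_NONE) {
            return 0;   /* the chain ends at a thread that is not waiting */
        }
        if (current == start) {
            return 1;   /* came back to where we started */
        }
    }
    return 0;
}

/* Encodes the two wait-for edges from the analysis of Case 2. */
int case2_is_deadlocked(void)
{
    int waits_for[THREAD_COUNT];

    /* M waits until the resumee is resumed, which A would do in
     * threadControl_reset. */
    waits_for[THREAD_M] = THREAD_A;
    /* A waits in eventHandler_free for handlerLock, which M owns. */
    waits_for[THREAD_A] = THREAD_M;

    return has_wait_cycle(waits_for, THREAD_COUNT, THREAD_M);
}
```

In this model, removing either edge breaks the cycle: if M did not own handlerLock during its wait, A would not wait for M, and the walk from M would end at A.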

        [1] M calls j.l.Thread.resume() and hits the internal breakpoint set by the JDWP agent
            https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java#L139

        [2] "testedThread" is named "thread2" in log output.
            https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java#L137

        [3] M calls `blockOnDebuggerSuspend()` when hitting the internal
            breakpoint in j.l.Thread.resume(). There it waits while the resumee is
            suspended by the debugger.
            https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/src/jdk.jdwp.agent/share/native/libjdwp/threadControl.c#L749

        [4] vm.dispose() call by debugger returns immediately. Threads are resumed asynchronously.
            https://github.com/openjdk/jdk/blob/32811026ce5ecb1d27d835eac33de9ccbd51fcbf/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003.java#L228

                People

                Assignee:
                rrich Richard Reingruber
                Reporter:
                rrich Richard Reingruber