Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8284289

Improved way of obtaining call traces asynchronously for profiling



    • Type: JEP
    • Status: Submitted
    • Priority: P4
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: hotspot
    • Labels:
    • Author:
      Johannes Bechberger
    • JEP Type:
    • Exposure:
    • Subcomponent:
    • Scope:
    • Effort:
    • Duration:



      Define an efficient, secure, and supported API for asynchronous stack traces with information on Java and native frames.


      • Provide a supported API for external profilers to obtain information on Java and native frames.
      • Unify stack walking for profiling to improve its maintainability and stability.
      • Support asynchronous usage as well as calling the API from signal handlers.
      • The implementation does not affect the performance of an unprofiled JVM.
      • Memory requirements for the collected data don't significantly increase compared to the existing AsyncGetCallTrace routine.


      • It is not a goal to create a new JVMTI extension.
      • The new API shall not be recommended for production usage.


      The AsyncGetCallTrace routine has seen increasing use in recent years in profilers like async-profiler with almost all available profilers, open-source and commercial, using it. But it is only an internal API, as it is not exported in any header, and the information on frames it returns is pretty limited: Only the method and byte code index for Java frames is captured. Both make implementing profilers and related tooling harder. Tools like async-profiler have to resort to complicated code to at least partially obtain information that the JVM already has. Information that is currently hidden and impossible to get is

      • whether a compiled Java frame is inlined which is currently only obtainable for the topmost compiled frames
      • the compilation level of a Java frame (e.g. C1 or C2 compiled)
      • C/C++ frames that are not at the top of the stack

      Such data can be helpful when profiling and tuning a VM for a given application and also for profiling code that uses JNI heavily.

      There are two stack walking implementations for profiling (in JFR and AsyncGetCallTrace) that benefit from being unified. This improves maintainability and stability by removing redundant code and increasing coverage.


      This JEP proposes an AsyncGetCallTrace2 API which is modeled after AsyncGetCallTrace:

      void AsyncGetCallTrace2(CallTrace *trace, jint depth, void* ucontext,
                              uint32_t options);

      This API can be called by profilers to obtain the call trace for the current thread. Calling this API from a signal-handler is safe and the new implementation will be at least as stable as AsyncGetCallTrace or the JFR stack walking code. The VM fills in information about the frames and the number of frames. The caller of the API should allocate the CallTrace structure with enough memory for the requested stack depth.


      • trace: buffer for structured data to be filled in by the JVM
      • depth: maximum depth of the call stack trace
      • ucontext: optional ucontext_t of the current thread when it was interrupted
      • options: bit set for options, currently only the lowest bit is considered, it enables (1) and disables (0) the inclusion of C/C++ frames, all other bits are considered to be 0

      The trace struct

      typedef struct {
        jint num_frames;                // number of frames in this trace
        CallFrame *frames;              // frames
        void* frame_info;               // more information on frames
      } CallTrace;

      is filled by the VM. Its num_frames field contains the actual number of frames in the frames array or an error code. The frame_info field in that structure can later be used to store more information but is currently supposed to be NULL.

      The error codes are a subset of the error codes for AsyncGetCallTrace, with the addition of THREAD_NOT_JAVA related to calling this procedure for non-Java threads:

      enum Error {
        NO_JAVA_FRAME         =   0,
        NO_CLASS_LOAD         =  -1, 
        GC_ACTIVE             =  -2,    
        UNKNOWN_NOT_JAVA      =  -3,
        NOT_WALKABLE_NOT_JAVA =  -4,
        UNKNOWN_JAVA          =  -5,
        UNKNOWN_STATE         =  -7,
        THREAD_EXIT           =  -8,
        DEOPT                 =  -9,
        THREAD_NOT_JAVA       = -10

      Every CallFrame is the element of a union, as the information stored for Java and non-Java frames differs:

      typedef union {
        FrameTypeId type;     // to distinguish between JavaFrame and NonJavaFrame 
        JavaFrame java_frame;
        NonJavaFrame non_java_frame;
      } CallFrame;

      There a several distinguishable frame types:

      enum FrameTypeId : uint8_t {
        FRAME_JAVA         = 1, // JIT compiled and interpreted
        FRAME_JAVA_INLINED = 2, // inlined JIT compiled
        FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
        FRAME_STUB         = 4, // VM generated stubs
        FRAME_CPP          = 5  // C/C++/... frames

      The first two types are for Java frames for which we store the following information in a struct of type JavaFrame:

      typedef struct {     
        FrameTypeId type;            // frame type
        uint8_t comp_level;      // compilation level, 0 is interpreted
        uint16_t bci;            // 0 < bci < 65536
        jmethodID method_id;
      } JavaFrame;               // used for FRAME_JAVA and FRAME_JAVA_INLINED

      The comp_level states the compilation level of the method related to the frame with higher numbers representing "more" compilation. 0 is defined as interpreted. It is modeled after the CompLevel enum in compiler/compilerDefinitions but is dependent on the used compiler infrastructure.

      Information on all other frames is stored in the NonJavaFrame struct:

      typedef struct {
        FrameTypeId type;  // frame type
        void *pc;          // current program counter inside this frame
      } NonJavaFrame;  

      Although the API provides more information on the frames, the amount of space required per frame (e.g. 16 bytes on x86) is the same as for the original AsyncGetCallTrace API.

      The underlying stack walking code can be unified such that AsyncGetCallTrace, AsyncGetCallTrace2, and the JFR call stack collection become thin wrappers for a single implementation.

      A prototype implementation can be found https://github.com/parttimenerd/jdk/tree/parttimenerd_asgct2 with a demo at https://github.com/parttimenerd/asgct2-demo/.


      Keep AsyncGetCallTrace as is, meaning a lack of maintenance and stability for a widely used de-facto API.

      Risks and Assumptions

      Returning information on C/C++ frames leaks implementation details, but this is also true for the Java frames of AsyncGetCallTrace as they leak details of the implementation of standard library files and include native wrapper frames.


      Unifying the existing profiling-related stack walking code allows for testing it more efficiently by combining the existing tests. The implementation of this JEP will also add new stress tests to find rare stability problems on all supported platforms. The idea is to run the profiling on a set of example programs (for example the dacapo and renaissance benchmark suites) repeatedly with small profiling intervals (<= 0.1ms). A prototype implementation can be found at https://github.com/parttimenerd/jdk-profiling-tester.




            jbechberger Johannes Bechberger
            jbechberger Johannes Bechberger
            Reviewed By:
            Christoph Langer, Jaroslav BachorĂ­k
            0 Vote for this issue
            6 Start watching this issue