JDK-8229391

Improve performance for com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) for current thread

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 8
    • Fix Version/s: tbd
    • Component/s: core-svc
    • Labels:
      None

      Description

      This issue tracks an additional optimization for the com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) method, as suggested by Paul Hohensee in a comment on JDK-8185005.

      "Paul Hohensee added a comment - 2019-05-23 14:09
      Further contribution by Matt Bonner <matthew.bonnar@veeva.com> and Nathan Janken <nathan.janken@veeva.com>.

      Change Overview

      The Corretto-8 JDK includes an optimization for an internal JVM function that looks up threads by ID (see https://github.com/corretto/corretto-8/commit/47886d111152b18428b9a6fc7df2b0d6081e02b9). That change yields significant performance improvements, including in cases where thread-allocated memory needs to be checked. However, there is still room for improvement in scenarios where threads check their own allocated bytes. Our initial benchmarks show that in multi-threaded scenarios the time complexity can be reduced from O(n) to O(1), where n is the number of threads performing concurrent memory allocation checks. Performance is often critical in these scenarios. For instance, the checks may run several times within a single application request in order to profile memory usage.

      With the current Corretto-8 JDK implementation, when a thread's allocated bytes are checked, e.g. by using com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long), the Threads_lock mutex must be acquired in order to call Threads::find_java_thread_from_java_tid. When a thread is checking its own allocated bytes, this lookup is unnecessary because the current thread is already at hand.
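
For context, a thread checking its own allocated bytes from Java might look like the sketch below. The class and method names (SelfAllocationProbe, measure) are hypothetical and used only for illustration; the ThreadMXBean calls are the API discussed here. The sketch assumes a HotSpot-based JVM on which the platform bean can be cast to com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the proposed patch.
final class SelfAllocationProbe {
    // Assumption: the platform ThreadMXBean implements the com.sun.management extension.
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Bytes allocated so far by the calling thread, or -1 if measurement is disabled. */
    static long currentThreadAllocatedBytes() {
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    /** Runs the task on the calling thread and returns the approximate bytes it allocated. */
    static long measure(Runnable task) {
        long before = currentThreadAllocatedBytes();
        task.run();
        return currentThreadAllocatedBytes() - before;
    }

    public static void main(String[] args) {
        long bytes = measure(() -> {
            byte[][] chunks = new byte[64][];
            for (int i = 0; i < chunks.length; i++) {
                chunks[i] = new byte[1024];
            }
        });
        System.out.println("allocated approximately " + bytes + " bytes");
    }
}
```

Every call to measure queries the bean twice for the current thread, which is the single-thread, current-thread case the proposed change targets.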

      The logic for the proposed change is as follows:

      When a request is made to get a thread's allocated bytes, check if there is only one thread being requested and whether that thread is the current thread
              If it is, simply return the allocated bytes for the current thread
              If it isn't, fall back to the lock acquisition/thread look-up
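
The decision logic above can be modeled in plain Java as a rough sketch. This is a toy model, not HotSpot code: the class, the ReentrantLock standing in for Threads_lock, the map standing in for the thread list, and the lockedLookups counter are all hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the proposed fast path. All names here are hypothetical.
final class AllocatedBytesLookupModel {
    /** Stand-in for a thread's allocation counter; written only by its owning thread. */
    static final class Counter { volatile long bytes; }

    private final ReentrantLock threadsLock = new ReentrantLock(); // plays the role of Threads_lock
    private final Map<Long, Counter> byId = new HashMap<>();       // plays the role of the thread list
    private final ThreadLocal<Counter> current = ThreadLocal.withInitial(this::register);
    int lockedLookups; // number of lookups done under the lock (written only while holding it)

    private Counter register() {
        Counter c = new Counter();
        threadsLock.lock();
        try {
            byId.put(Thread.currentThread().getId(), c);
        } finally {
            threadsLock.unlock();
        }
        return c;
    }

    void recordAllocation(long n) { current.get().bytes += n; }

    long[] getAllocatedBytes(long[] ids) {
        long[] sizes = new long[ids.length];
        // Fast path: exactly one ID requested and it belongs to the calling thread.
        if (ids.length == 1 && ids[0] == Thread.currentThread().getId()) {
            sizes[0] = current.get().bytes;
            return sizes;
        }
        // Slow path: take the lock and look up every requested thread.
        threadsLock.lock();
        try {
            for (int i = 0; i < ids.length; i++) {
                lockedLookups++;
                Counter c = byId.get(ids[i]);
                sizes[i] = (c == null) ? -1 : c.bytes; // -1 for unknown threads, as in the Java API
            }
        } finally {
            threadsLock.unlock();
        }
        return sizes;
    }

    public static void main(String[] args) {
        AllocatedBytesLookupModel model = new AllocatedBytesLookupModel();
        model.recordAllocation(128);
        long self = Thread.currentThread().getId();
        System.out.println(model.getAllocatedBytes(new long[] {self})[0]);
        System.out.println("locked lookups: " + model.lockedLookups);
    }
}
```

In this model a self-query never increments lockedLookups, which mirrors the intent of the patch: concurrent self-monitoring threads no longer contend on a shared lock.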

      Git Diff

      diff --git a/src/hotspot/src/share/vm/services/management.cpp b/src/hotspot/src/share/vm/services/management.cpp
      index a8e6b0b2..02aff6d1 100644
      --- a/src/hotspot/src/share/vm/services/management.cpp
      +++ b/src/hotspot/src/share/vm/services/management.cpp
      @@ -2236,6 +2236,17 @@ JVM_ENTRY(void, jmm_GetThreadAllocatedMemory(JNIEnv *env, jlongArray ids,
                    "the given array of thread IDs");
        }
      + if (num_threads == 1 && THREAD->is_Java_thread()) {
      +   JavaThread* current_thread = (JavaThread*)THREAD;
      +
      +   if (ids_ah->long_at(0) == java_lang_Thread::thread_id(current_thread->threadObj())) {
      +     // If the only thread being requested is the current java thread,
      +     // simply return its allocated bytes.
      +     sizeArray_h->long_at_put(0, current_thread->cooked_allocated_bytes());
      +     return;
      +   }
      + }
      +
        MutexLockerEx ml(Threads_lock);
        for (int i = 0; i < num_threads; i++) {
          JavaThread* java_thread = Threads::find_java_thread_from_java_tid(ids_ah->long_at(i));

      Performance Test Results

      To test the performance impact of this change, we set up a few scenarios where we could compare the overall run time. Our approach involved running and timing two simple Java programs: one that makes use of our proposed optimization (Self-Monitoring Threads) and one that does not but hits the affected code path (Global Monitoring Thread).

      Self-Monitoring Threads

      For this case we ran a Java program that sets up multiple "task" threads that each call com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) on themselves in a loop with a fixed number of iterations, and we timed the execution with a varying number of task threads. See attached file self-monitoring-results.png.
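
The original test program is not included in this issue. A minimal sketch of such a self-monitoring test, assuming a HotSpot-based JVM, could look like the following. The class name, the thread counts, and the iteration count are illustrative choices, not the values behind the attached results.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reconstruction of a self-monitoring timing test.
final class SelfMonitoringBenchmark {
    static volatile long sink; // keeps the queried values observable

    /** Starts taskThreads threads that each query their own allocated bytes; returns elapsed ns. */
    static long runNanos(int taskThreads, int iterations) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < taskThreads; t++) {
            threads.add(new Thread(() -> {
                long self = Thread.currentThread().getId();
                long sum = 0;
                for (int i = 0; i < iterations; i++) {
                    sum += bean.getThreadAllocatedBytes(self);
                }
                sink = sum;
            }));
        }
        long start = System.nanoTime();
        for (Thread th : threads) {
            th.start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d threads: %d ms%n", n, runNanos(n, 100_000) / 1_000_000);
        }
    }
}
```

A wall-clock loop like this is only a coarse measurement; it includes thread start-up and JIT warm-up, so a harness such as JMH would give more reliable numbers.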

      Global Monitoring Thread

      We also wanted to ensure that the proposed change does not negatively impact another common use case that this API supports: the case where one global monitoring thread performs memory allocation checks against several other threads.

      To do so, we ran a Java program that sets up multiple "task" threads that perform simple memory allocations (instantiating Strings) in a loop, while the originating thread calls com.sun.management.ThreadMXBean#getThreadAllocatedBytes(long) for each of the task threads in a loop with a fixed number of iterations. As in the previous case, we then timed the execution with a varying number of task threads. See attached file global-monitoring-results.png.
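
As with the previous case, the original program is not attached here. A rough sketch of a global-monitoring test under the same assumptions (HotSpot-based JVM, hypothetical class name and parameters) might be:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical reconstruction of a global-monitoring timing test.
final class GlobalMonitoringBenchmark {
    static volatile Object sink; // keeps allocations and query results observable

    /** One monitoring thread (the caller) polls taskThreads allocating threads; returns elapsed ns. */
    static long runNanos(int taskThreads, int iterations) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        AtomicBoolean running = new AtomicBoolean(true);
        Thread[] tasks = new Thread[taskThreads];
        for (int t = 0; t < taskThreads; t++) {
            tasks[t] = new Thread(() -> {
                int n = 0;
                while (running.get()) {
                    sink = new String("task-" + n++); // simple allocation work
                }
            });
            tasks[t].start();
        }
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (Thread task : tasks) {
                // The requested ID is never the caller's, so this takes the lookup path.
                total += bean.getThreadAllocatedBytes(task.getId());
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        running.set(false);
        for (Thread task : tasks) {
            try {
                task.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("%2d task threads: %d ms%n", n, runNanos(n, 10_000) / 1_000_000);
        }
    }
}
```

Because the monitoring thread never asks about itself, the added fast-path check costs only a comparison per call before falling back to the existing lookup.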

      Unit Testing

      We decided not to add a new unit test because this change is intended solely to optimize performance, so a dedicated test did not seem practical. There should be no change in functional behavior, and the existing functionality is already covered by a unit test (src/jdk/test/com/sun/management/ThreadMXBean/ThreadAllocatedMemory.java), which passes with our proposed patch applied. "

              People

              • Assignee:
                Unassigned
                Reporter:
                dtitov Daniil Titov