Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8230571

[TESTBUG] JFR: Extreme value testing

    XMLWordPrintable

    Details

    • Subcomponent:
      jfr

      Description

      Over the years we have seen several bugs reported by users where JFR has been used, or configured, with values/settings outside normal range. For example, setting the stack depth to 1024 or running an application where thousand new threads are started every second.

      It's expected that there will be more overhead when using extreme values, but the overhead should be proportional to the size of the value. For example, a stack depth of 1024 should not use more than roughly 16 times the memory, safepoint time or CPU usage than of a default stack depth of 64. Sometimes even when the overhead is proportional to the value, the issue is hidden in the noise A 5 ms safepoint may go unnoticed for a typical run, but becomes a real issue when a 100 times larger value is used in production.

      The JFR implementation has been designed to run algorithms in linear time, but it is easy to miss complexity problems when one subsystem calls into another and the method is expected to return in constant time, For small values in nightly testing things seem fine and then the bug is found after release in users production environment.

      To prevent these kind of bugs, I think it may be worth spending a few days trying to provoke issues by setting/using extreme values,

      We should not spend overly amount of time writing unit tests. These issues are typically not caused by regressions, but when code is checked in the first time. A small local test app or manual check on command line can probably find 95-99% of the issues over the lifetime of the product, but to much less cost than a unit test (reviews, wasted time on intermittent failures, test machine resources...). The tests are unstable / slow running by nature and we want the most bang for the buck. Chasing false positives will only steal time from other activities (reducing overall quality and rate of innovation)

      The bugs can be in the JVM, in the jdk.jfr module or the parser API. Here are some things I think could be problematic.

      - Start a recording, create 100 000 event classes and commit an event for each event class, stop the recording and run it through the parser
      - Change 100 000 settings simultaneously using Recording::setSettings(...)
      - Change 100 000 settings incrementally using Recording::enable(...).
      - Create a .jfc file with 100 000 event / setting entries
      - Start 100 000 recordings, run them simultaneously and then stop them.
      - Run with 100 000 classes and class loaders (in the event payload).
      - Run with 100 000 modules.
      - Set -XX:FlightRecorderOptions=stackdepth=2048, emit 100 000 events with a deep stack (not all the same) and verify the stacks with the parser
      - Create 10 000 chunks in the disk repository and try to dump a recording.
      - Start 100 000 threads with one event per thread
      - Run an application with 25 000 live threads.
      - Set -XX:FlightRecordingOptions=old-object-queue-size=100000, create leaking objects that fill the queue and dump a recording with path-to-gc-roots=true. Also try with stackTrace=true
      - Set number of global buffers to 10 000 - 100 000 (may need to reduce size of each buffer to fit RAM):
      - Create 100 000 events where each event has a unique string (more than 32 characters) to find issues with a large string pool.

      To stress the system, testing should be run using profile.jfc where applicable and results verified in the parser. Memory usage, safepoint time and CPU usage should not increase in an unexpected way when recording, dumping or parsing.




      For example, To investigate if JFR can handle many classes or class loaders, I would do something like this:

      /**
       * This is not a unit test, but a testapp that can be
       * used to provoke bugs or performance issues when many
       * classes or class loaders are being used.
       *
       * Example usage:
       *
       * java -XX:StartFlightRecording:settings=profile
       * TestManyClasses <class-loader-count> <classes-per-class-loader>>
       */
      public class TestManyClasses {

          private static List<ClassLoader> loaders = new ArrayList<>();
          
          static class TestEvent {
              Class<> theClass;
          }

          static class ClassGenerator extends ClassLoader {
            // to be implemented
          }
       
          public static void main(String[] args) throws Throwable {
              int classLoaderCount = Integer.parseInt(args[0]);
              int classCount = Integer.parseInt(args[1]);
              for (int i = 0; i <classCount; i++) {
                  ClassGenerator cg = new ClassGenerator();
                  loaders.add(tcl);
                  for (int j = 0; j < classCount; j++) {
                      Class<?> theClass = cg.newClass();
                      theClass.newInstance();
                      TestEvent event = new TestEvent();
                      event.theClass = theClass;
                      event.commit();
                  }
              }
          }
      }

      I would then test it on my local machine to get some feeling for what is happening, i.e

      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 100 100000
      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 10 1000000
      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 1 10000000
      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 100000 100
      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 1000000 10
      $ time java -XX:StartFlightRecording:settings=profile,dumponexit=true TestManyClasses 10000000 1

      Does it take a second or 10 minutes. Does it increase linearly or quadratically? I would use even higher values just to see where things break down (may not be related to JFR). Tests that involves old object sample, I would run on a TB machine.

      I would see how long safepoints caused by JFR are:

      $ jfr print —events ExecuteVMOperation <recording-file>

      and the amount of memory being used:

      $ jfr print —events PhysicalMemory <recording-file>

      I would also check how long time it takes to parse the recording files using another small program

      public class TestParse {
          public static RecordedEvent event;

          public static void main(String... args) {
              RecordingFile rf = new RecordingFile(Paths.get(args[0]));
              while (rf.hasMoreEvents()) {
                  event = rf.readEvent();
              }
          }
      }

      $ time java TestParse -XX:StartFlightRecording:dumponexit=true <recoding-file>

      If things are unexpected slow, I would look at the resulting recording file and see if there is one method that easts up all the CPU. I would look at how much memory is being used. Does it look reasonable?

      I would check how much space in the file that is being used by the checkpoint event. Does it scale linearly?

      $ jfr summary <recording-file>




        Attachments

          Issue Links

            Activity

              People

              Assignee:
              mseledtsov Mikhailo Seledtsov
              Reporter:
              egahlin Erik Gahlin
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: