Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8240497

Optional Usage Logging facility

    Details

    • Type: JEP
    • Status: Submitted
    • Priority: P3
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: core-svc
    • Labels:
      None
    • Author:
      Laurence Cable
    • JEP Type:
      Feature
    • Exposure:
      Open
    • Scope:
      JDK
    • Discussion:
      serviceability dash dev at openjdk dot java dot net
    • Effort:
      S
    • Duration:
      S

      Description

      Summary

      Add the capability to the Java Runtime to optionally enable and capture logging of runtime usage information.

      Goals

      Enable administrators of JDK installations, particularly those at scale, to capture environmental and usage data in a common fashion, in order to provide greater visibility into usage of such installations.

      Motivation

      For many enterprise customers in particular, those that deploy, large, potentially heterogeneous, installations of JDK distributions across their computing environments, both on premise and especially in the cloud, it is highly desirable to have visibility into the usage of those environments.

      With the increasing adoption of various JDK distributions by enterprise customers; providing a common implementation of usage logging across those distributions would allow such customers to capture and retain such usage information from appropriately configured invocations of their various distributions and installations in a consistent, common fashion and format, independent of origin.

      Collection of such usage information, can facilitate transparent application and runtime compliance auditing, version tracking and lifecycle mgmt thereof, and enable the curation and correlation of other debugging and telemetry data emitted by those runtimes.

      Defining such common mechanisms will greatly simplify these administrative tasks in hetrogeneous environments and at scale.

      The Oracle JDK distribution already provides a similar mechanism to facilitate the optional capture of such usage data, described here (for reference only): Oracle Usage Tracker Description

      We propose to make similar, but not identical, functionality openly and generally available.

      Note this proposal is explicitly not constrained by any form of compatibility with the existing Oracle JDK implementation, but it does derive some of its architecture and interfaces from that prior implementation, where those appear to be of utility to this proposal.

      Description

      Usage logging captures useful static data from a JVM instance during its initialization, in a common format and via a common mechanism, externalizing that data to a common set of output "sinks" for subsequent consumption, processing and analysis.

      The static usage data we propose to capture from such suitably configured instances can be used to answer potential questions about runtime usage such as:

      • what version(s) of the JDK and JVM are being invoked, and from which distribution(s) and vendor(s)?
      • on what O.S platforms, versions, and Instruction Set Architectures are these instances executing?
      • on what host systems (and environments) are these instances executing?
      • when are these instances being invoked, and on whose behalf?
      • what is the execution context associated with these instances, i.e. what is being executed, with what options and switches specified that may affect the execution?
      • is there a common, unique identifier (UUID) that may be associated with this particular invocation instance for subsequent correlation of other relevant artifacts?
      • ...

      All of this information is already readily available via existing public APIs within the current implementation, from such sources as:

      • java.lang.System properties
      • various JMX APIs
      • other standard JDK APIs (e.g ProcessHandle.current().pid() or Instant.now() ...)
      • ...

      This proposal describes a mechanism to optionally externalize these values using common format(s) to a variety of external sinks (filesystem, HTTP(S) URL, JFR recording, Unix Domain Socket), and to do so based upon common configuration and invocation mechanism(s).

      The following runtime properties & values are proposed to be extracted, formatted and externalized:

      tr>
      NameValueSource
      timestamp(*)approximate *local* Date & Time of invocationRuntimeMXBean.getStartTime() or Instant.now()
      hostname(*)DNS Host nameInetAddress.getLocalHost().getCanonicalHostName()
      ip.address(*)Host IP ("0.0.0.0" if not available)InetAddress.getLocalHost().getHostAddress()
      java.argumentsapplication command line arguments, if anySystem.getProperty(...)
      java.versionJava versionSystem.getProperty(...)
      java.vendorJava vendor nameSystem.getProperty(...)
      java.vm.versionJVM versionSystem.getProperty(...)
      java.vm.vendorJVM vendor nameSystem.getProperty(...)
      os.nameOS platform nameSystem.getProperty(...)
      os.versionOS versionSystem.getProperty(...)
      os.archInstruction set architectureSystem.getProperty(...)
      jvm.argsJVM command line arguments, if anySystem.getProperty(...)
      classpathClass pathSystem.getProperty(...)
      jdk.module.pathModule path, if presentSystem.getProperty(...)
      jdk.module.upgrade.pathModule upgrade path, if presentSystem.getProperty(...)
      jdk.module.mainMain module name, if presentSystem.getProperty(...)
      jdk.module.main.classMain module class name, if presentSystem.getProperty(...)
      user.nameOS usernameSystem.getProperty(...)
      user.dirCurrent working directorySystem.getProperty(...)
      user.homeUser home directorySystem.getProperty(...)
      java.library.pathnative library pathSystem.getProperty(...)
      jvm.pid(*)OS process idRuntimeMXBean.getPid() or ProcessHandle.current().pid()
      jvm.uuid(*)UUID of JVMUUID.nameUUIDFromBytes(...)
      Additional (system) properties listed in properties file (see later)System.getProperty(...)

      (*) note these are "virtual" properties and are not found in System.Properties

      Implementation

      Usage Logging will occur (a)synchronously, as configured, as a "one time" occurrence during JVM startup.

      The JVM implementation already contains a suitable startup hook implementation: jdk.internal.vm.PostVMInitHook() which is invoked via an "upcall" from the JVM native code, after initialization, but prior to execution of any "user" code.

      This proposal would utilize this hook in order to perform any usage logging.

      Usage logging is enabled by the detecting the presence of a suitable properties file located at particular, predetermined default location(s), or via a property uttered on the invoking command line of the runtime, describing the name & location of a property file.

      The configuration file shall be discovered (in order, if present), in one of (3) possible locations :

      1. located "per invocation", specified by uttering a java command line property option, of the form: -Djdk.usagelogger.config.file=<path> describing the path to the named property file.
      2. located "per invocation", specified by the value of the O.S environment variable JDK_USAGELOGGER_CONFIG_FILE.
      3. located at a "per installation location", enabling usage logging for all invocations of that particular runtime installation in which the default configuration file resides:
        • a usagelogger.properties file is located in the conf/management subdirectory of each configured runtime installation hierarchy.
      4. located at a "global" location, O.S dependent, enabling usage logging for all runtime environments installed, and with read access to that global location:
        • on Linux, and other *NIX systems, the global location of the default config is: /etc/java/conf/management/usagelogger.properties.
        • on Mac OS(tm), this global location is: /Library/Application\ Support/Java/conf/management/usagelogger.properties
        • on Windows(tm) platforms this global location is: %Program Files%\Java\conf\management\usagelogger.properties
        • other environments, these location(s) are undefined by this proposal.

      These locations are searched "in order" by a runtime, and the 1st location searched containing a readable properties file, provides the configuration for that invocation, thus overriding any other, more general configurations that may be present.

      Note that usage logging is enabled, merely by the presence (and detection) of a readable configuration file, loading and parsing the contents of the configuration file itself is deferred; any mis-configuration within a property file is silently ignored by the runtime, and no further usage logging shall occur, in particular the properties file location algorithm is not resumed if an invalid properties file is located and fails to parse.

      The intent is that any such errors shall not cause a failure of the runtime itself, merely a silent failure of the usage logging to occur with minimal cost during initialization.

      In order to cause (any) usage data logging to occur, a (usage logger) property configuration file must contain at least one of the following properties configured:

      property namevaluecomments
      jdk.usagelogger.logToFileabsolute path+filename to log usagecreated if absent, and appended to
      jdk.usagelogger.logToURLHTTP(S) URL to which usage log payload is POST'edapplication/cloudevents+json
      jdk.usagelogger.logToUDSpath & filename to Unix Domain Socketsame formats as file logging

      File-based logging

      The jdk.usagelogger.logToFile property specifies the absolute path and filename of the file to which a usage log record should be appended, if the file does not exist, then it will be created, if the containing directory path is writeable, otherwise the file is opened for appending, again if writeable.

      Note that the filename component value of the jdk.usagelogger.logToFile (path) may be parameterized as follows:

      VariableValue
      ${date}yyyyMMdd
      ${time}HH.mm.SS
      ${jdk.jvm.pid}O.S dependent process id of the JVM
      ${jdk.jvm.uuid}type 3 name based UUID computed from properties logged
      ${java.version}Java version string
      ${java.vendor}Java Vendor string

      The ability to parameterize, and thus (partially) generate the usage log filename for each invocation, enables the administrator to curate the collation of usage log files and facilitates the management of log file lifecycle based upon these various parameters.

      For example parameterizing the filename to include ${date} would allow collation by date (day) of logs, using either ${jdk.jvm.pid} or '${jdk.jvm.uuid}` would uniquely identify every invocation (resulting in a unique file per invocation).

      Note that the path component value of the jdk.usagelogger.logToFile (path) may also be parameterized as follows:

      VariableValue
      ${date}yyyyMMdd
      ${hostname}hostname
      ${java.version}Java version string
      ${java.vendor}Java Vendor string

      It should be noted that in the case of filename variable substitution, if the resulting file does not exist it will be created, however in the case of variable substitution of the path component of the property, if the resulting path is non-existent then no logging will occur (i.e non-existent directories in the resulting path are not created automatically).

      Note that for security reasons, the path constructed by variable substitution must be an absolute and not relative path, if the latter, file logging will not occur.

      It is also possible that some characters that appear in variable substitutions may comprise illegal or special status when appearing within a path expression, e.g. the ':' on Windows, and the '/' on *NIX platforms. Should such substitution occur, the log file may fail to be created or opened as a result of interpretation by the underlying O.S platform.

      If the logging filename ends in .json the usage log will be formatted as an application/json object as per the data component of the HTTP(S) logging mechanism described below, otherwise the format is a CSV format described below.

      The CSV format of the usage log record written by the Usage Logger to the file is as follows:

      • a usage log "line", is a CSV list, terminated by a '\n' character.
      • each 'value' (of a given property) is double-quoted
      • any null values will be indicated by the string null
      • argument lists (e.g. cmd line) will be encoded as quoted, space separated lists: e.g. "prop1=value1 prop2=value2 ..."
      • for the sake of brevity and storage efficiency, especially for shared log files in high frequency execution environments, property 'names' are elided from the log record, only the property values are output, hence the written order of properties in the record must be defined in order to facilitate processing, this order is:
      1. timestamp
      2. hostname
      3. host IP
      4. JVM O.S PID
      5. JVM UUID (type 3, name based)
      6. cmd line arguments as a quoted, space separated list, embedded quotes are double quoted
      7. JAVA_HOME
      8. JDK Version
      9. JDK Vendor
      10. JVM Version
      11. JVM Vendor
      12. O.S Name
      13. O.S Version
      14. O.S ISA
      15. JVM Arguments
      16. Classpath, or null if not specified
      17. module path, or null if not specified
      18. main module name, or null if not specified
      19. main module main class, or null if not specified
      20. user name
      21. current working directory path
      22. user home directory path
      23. java.library.path
      24. jdk.jfr.repository path, or null if not specified
      25. java.io.tmpdir
      26. jdk.usagelogger.config.file
      27. any additional system properties configured from the properties file, as a quoted list of =, space separated

      HTTP(S)-based logging

      The jdk.usagelogger.logToURL property specifies an HTTP or HTTPS URL, to which, a JSON payload describing the usage data is POST'ed.

      Note that the value of the jdk.usagelogger.logToURL can be parameterized (both path and any query param components) as follows:

      VariableValue
      ${user.name}O.S dependent username
      ${hostname}hostname
      ${ip.address}IPv4 or IPv6 address of. host
      ${date}yyyyMMdd
      ${time}HH.mm.SS
      ${jdk.jvm.pid}O.S dependent process id of the JVM
      ${jdk.jvm.uuid}type 3 name based UUID of JVM
      ${os.name}O.S "name"
      ${java.version}Java version
      ${java.vendor}Java Vendor

      The ability to (partially) parameterize the target URL enables the administrator (and the implementor of the service accepting the Usage Log POSTs) to organize the processing and organization of the incoming requests w/o resorting to parsing of the JSON payload in order to determine some useful categorization/collation values that may be expressed in the URL itself.

      Variable substitutions on the URL specified, are appropriately URL encoded.

      The JSON payload describing the usage data captured shall be encoded as a https://cloudevents.io/ (HTTP: https://github.com/cloudevents/spec/blob/v1.0/http-protocol-binding.md) specification event.

      As per the Cloudevents HTTP Protocol binding the following HTTP header fields shall be specified in the POST:

      HeaderValue
      Content-Typeapplication/cloudevents-json
      ce-specversion"1.0"
      ce-typef.q.c.n of UsageLogger
      ce-sourcethe URN UUID of the emitting JVM
      ce-idthe O.S dependent PID of the emitting JVM

      The payload of the POST shall be encoded as per https://github.com/cloudevents/spec/blob/v1.0/json-format.md.

      The envelope attributes shall be specified as follows:

      AttributeValue
      specversion"1.0"
      typef.q.c.n of UsageLogger
      sourcethe URN UUID of the emitting JVM
      idthe O.S dependent PID of the emitting JVM
      timestart time of the emitting JVM
      datacontenttypeapplication/json

      The data element of the payload envelope shall comprise a single top level JSON object containing name/value tuples for each of the properties and values that comprise the usage log data as captured by this proposal. Specifying the order of this objects contents is not specified, since the JSON object can be treated as a dictionary and properties can be discovered therein by a named property search.

      UDS-based logging

      On systems where a Unix Domain Socket implementation is available, the Usage Logger can (also) be configured to write its usage log onto such a socket.

      Unix Domain Sockets are 'named' via a path and filename in the (default) filesystem. Specifying the jdk.usagelogger.logToUDS configuration parameter, which names the socket path and filename, will cause the same CSV format usage record format as logged to a file sink, to be written to the specified UDS.

      Note, if the socket filename ends in .json the usage log will be formatted as an application/cloudevents-json Object as per the format for HTTP(S) logging).

      Using Unix Domain Sockets as a sink of usage logging data is especially useful in container based deployment environments, where a 'sidecar' architecture can be utilized in order to collect usage logger data from containerized runtime invocations.

      Note: this option only exists on systems and JDKs that support Unix Domain Sockets, the property will be ignored otherwise.

      JFR logging

      In addition to logging to either a file, a URL, or a Unix Domain Socket, the Usage Logger may also emit a JFR event in order to capture usage logging information in any JFR recording, enumerating the same information as is encoded in either the file, UDS, or URL based usage logs.

      This capability is enabled indirectly through the enablement of either file, URL or UDS based logging. The configuration of any of these mechanisms will indirectly cause usage logging to be included in any JFR recording(s) that may occur during the lifetime of the runtime. This enables the curation/correlation of any JFR recordings with usage logs.

      Thus when any logging is enabled, the Usage Logger will also register, in addition, a UsageLogEvent with JFR, if available, such that a UsageLogEvent instance will be emitted if, and only if, a recording, or recordings are initiated during the lifetime of the runtime itself.

      Controlling Concurrency of Logging Processing

      Since the Usage Logger will be in the initialization execution path of the runtime, it is important that the act of usage logging does not significantly impact the startup of the runtime itself.

      While the Usage Logger must interrogate its environment in order to detect the presence (or absence) of its properties configuration file, the actual execution of the logging operations themselves may occur concurrently with the execution of the user code.

      This execution concurrency may controlled by uttering the jdk.usagelogger.run.mode property within a usagelogger.properties file.

      This property can assume the following values, controlling the concurrency of usage logging execution in relation to the main user execution thread:

      property valuecomment
      daemonlogging occurs concurrently on a daemon thread - this is the default, note that the runtime may exit before this thread completes
      asynchronouslogging occurs concurrently on a "user" thread, the runtime will not exit normally until this is completed
      synchronouslogging does not occur concurrently, logging will complete prior to any user code being executed

      This capability is particularly useful when attempting to log 'short lived' runtimes, since by default, when executing as a daemon thread, if the main thread executes prematurely then logging may not complete before JVM termination, selecting either asynchronous or synchronous will obviate this loss of usage logging data that would occur otherwise, however it does impact the execution lifetime of the runtimes so configured.

      JMX support

      Much of the information extracted by this Usage Logging proposal is already available through existing individual JMX APIs, with the exception of the proposed JVM UUID, which is computed on demand from some of the data logged by this implementation, and utilized to provide a unique identifier associated with an individual usage invocation, therefore we also propose to modify the existing java.lang.management.RuntimeMXBean API to add a new method: UUID getId() which would return the (type 3 name based) UUID computed by the Usage Logging implementation.

      Note that this value is always computable, regardless of Usage Logging being configured or not, and is thus available via this proposed API regardless of the state of Usage Logging.

      Alternatives to this Proposal

      There are a couple of alternative implementations that were considered:
      1) wrapper 'script'/'launcher' for Java runtime
      2) Java agent implementation of usage logging

      These were rejected; as both alternatives require significant and conscious user intervention in order to opt-in and enable usage logging.

      In the case of a wrapper, through the invocation of the runtime by using the wrapper/launcher as opposed to the existing runtime command.

      In the case of an agent based solution, specifying the agent parameters to the runtime, either by the command line or JAVA_TOOL_OPTIONS environment variable, in order to cause the agent to be loaded and executed.

      Neither of these options meet the 'audit/compliance' requirements of capturing usage independent of the users/invokers (in)action to participate in usage logging and were therefore rejected.

      Testing

      Basic unit tests to demonstrate the functionality, of the mechanism would be developed, including:

      • test global usagelogger.properties file (mis)configuration
      • test per JDK installation usagelogger.properties file configuration (and override of global config if present)
      • test per java command (line) override of usage logger properties
      • test file usage logging (and use of variable substitution)
      • test http URL usage logging
      • test https URL usage logging
      • test Unix Domain Socket logging
      • test presence of UsageLogEvent in JFR recordings (when other logging is enabled)
      • test recording modes (synchronous, asynchronous, daemon)
      • ...

      Dependencies

      This JEP depends upon:

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                mr Mark Reinhold
                Reporter:
                lcable Larry Cable
                Owner:
                Larry Cable
                Reviewed By:
                Brian Goetz, Mikael Vidstedt
              • Votes:
                0 Vote for this issue
                Watchers:
                12 Start watching this issue

                Dates

                • Created:
                  Updated: