JDK-8277163

Value Objects (Preview)

    Details

    • Type: JEP
    • Status: Submitted
    • Priority: P3
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • JEP Type:
      Feature
    • Exposure:
      Open
    • Scope:
      SE
    • Discussion:
      valhalla dash dev at openjdk dot java dot net
    • Effort:
      XL
    • Duration:
      XL

      Description

      Summary

      Enhance the Java object model with value objects, class instances that have only final instance fields and lack object identity. This is a preview language and VM feature.

      Goals

      This JEP provides for the declaration of identity-free value classes and specifies the behavior of their instances, called value objects, with respect to equality, synchronization, and other operations that traditionally depend upon identity.

      At runtime, the HotSpot JVM will prefer inlining value objects where feasible, in particular for JIT-compiled method calls and local operations. An inlined value object is encoded directly with its field values, avoiding any overhead from object headers, indirections, or heap allocation.

      Non-Goals

      Value class types are reference types. The Valhalla project is also developing user-defined primitive types, but these will require additional changes to the Java object model and type system. See "Dependencies" for details.

      Existing value-based classes in the standard libraries will not be affected by this JEP. Once the features of this JEP become final, those classes will be available for migration to value classes as a separate task.

      Motivation

      Java's objects and classes offer powerful abstractions for representing data, including fields, methods, constructors, access control, and nominal subtyping. Every object also comes with identity, enabling features such as field mutation and locking.

      Many classes don't take advantage of all of these features. In particular, a significant subset of classes don't have any use for identity—their field values can be permanently set on instantiation, their instances don't need to act as synchronization locks, and their preferred notion of equality makes no distinction between separately-allocated instances with matching field values.

      At runtime, support for identity can be expensive. It generally requires that an object's data be located at a particular memory location, packaged with metadata to support the full range of object functionality. Fields are accessed with memory loads, which are relatively slow operations. As objects are shared between program components, data structures and garbage collectors end up with tangled, non-local webs of objects created at different times. Sometimes, JVM implementations can optimize around these constraints, but the resulting performance improvements can be unpredictable and unreliable.

      An alternative is to encode program data with primitive types. Primitive values don't have identity, and so can be copied freely and encoded as compact bit sequences. But programs that represent their data with primitive types give up all the other abstractions provided by objects and classes. (For example, if a geographic location is encoded as two floats, there's no way to restrict the valid range of values, keep matching pairs of floats together, prevent re-interpreting the values with the wrong units, or compatibly switch to a double-based encoding.)

      Value classes provide programmers with a mechanism to opt out of object identity, and in return get many of the performance benefits of primitive types, without giving up the other features of Java classes.

      Opting out of identity is an important step towards user-defined primitives, which would fully combine the performance profile of today's primitives with the abstractions of class declarations. JEP 401 will support such types.

      However, many classes will be better served by declaring themselves value classes, carrying on with familiar (and compatible) reference type semantics, and still unlocking many of the same JVM optimizations. This includes many JDK classes, like LocalDate, that are currently designated as "value-based" to discourage users from relying on their instances' identities.

      Description

      The features described below are preview features, enabled with the --enable-preview compile-time and runtime flags.

      Overview

      A value object is a class instance that does not have identity. That is, a value object does not have any particular memory address or any other property to distinguish it from other instances of the same class whose fields have the same values. Value objects cannot mutate their fields or be used for synchronization. The == operator on value objects compares their fields. A value class declaration introduces a class whose instances are value objects.

      An identity object is a class instance or array that does have identity—the traditional behavior of objects in Java. An identity object can mutate its non-final fields and is associated with a synchronization monitor. The == operator on identity objects compares their identities. An identity class declaration—the default for a concrete class—introduces a class whose instances are identity objects.

      Value class declarations

      A class can be declared a value class with the value contextual keyword. If a concrete class is declared without the value contextual keyword, it is an identity class.

      value class Substring implements CharSequence {
          private String str;
          private int start;
          private int end;
      
          public Substring(String str, int start, int end) {
              checkBounds(start, end, str.length());
              this.str = str;
              this.start = start;
              this.end = end;
          }
      
          public int length() {
              return end - start;
          }
      
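          public char charAt(int i) {
              // Valid indices are 0 <= i < length(); checkBounds(0, i, length())
              // would also accept i == length().
              if (i < 0 || i >= length())
                  throw new IndexOutOfBoundsException();
              return str.charAt(start + i);
          }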
      
          public Substring subSequence(int s, int e) {
              checkBounds(s, e, length());
              return new Substring(str, start + s, start + e);
          }
      
          public String toString() {
              return str.substring(start, end);
          }
      
          private static void checkBounds(int start, int end, int length) {
              if (start < 0 || end < start || length < end)
                  throw new IndexOutOfBoundsException();
          }
      }

      A value class declaration is subject to the following restrictions:

      • The class is implicitly final, so cannot be extended. The class may not be declared abstract.

      • All instance fields are implicitly final, so must be assigned exactly once by constructors or initializers, and cannot be assigned outside of a constructor or initializer.

      • The class does not implement—directly or indirectly—IdentityObject (see below). This implies that the superclass is either Object or a stateless abstract class.

      • No constructor makes a super constructor call. Instance creation will occur without executing any superclass initialization code.

      • No instance methods are declared synchronized.

      • (Possibly) The class does not declare a finalize() method.

      • (Possibly) The constructor does not make use of this except to set the fields in the constructor body, or perhaps after all fields are definitely assigned.

      In most other ways, a value class declaration is just like an identity class declaration. It can have superinterfaces, type parameters, enclosing instances, inner classes, overloaded constructors, static members, and the full range of access restrictions on its members.

      A record class may also be declared as a value class:

      value record Name(String first, String last) {
          public String full() { return "%s %s".formatted(first, last); }
      }

      Records are often good candidates to be value classes, because their fields are already required to be final.

      Working with value objects

      Value objects are created and operated on just like normal objects:

      Substring s1 = new Substring("abc", 0, 2);
      Substring s2 = null;
      if (s1.length() == 2)
          s2 = s1.subSequence(1, 2);
      CharSequence cs = s2;
      System.out.println(cs.toString()); // prints "b"

      The == operator compares value objects of the same class in terms of their field values, not object identity. Fields with basic primitive types are compared by their bit patterns. Other field values—both identity and value objects—are recursively compared with ==.

      assert new Substring("abc", 1, 2) == s2;
      assert new Substring("abcd", 1, 2) != s2;
      assert s1.subSequence(0, 2) == s1;
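
      The comparison rule above can be approximated in current Java, which has no value classes. The following is only an emulation under stated assumptions: SubstringLike is an ordinary identity class, and sameValue is a hypothetical helper standing in for what == would do on two value objects of the same class.

```java
// Emulation in current Java; not the preview feature itself.
final class SubstringLike {
    final String str;   // reference field: compared with == (identity, since String has identity)
    final int start;    // primitive fields: compared by value, which for int is the bit pattern
    final int end;

    SubstringLike(String str, int start, int end) {
        this.str = str;
        this.start = start;
        this.end = end;
    }

    // Hypothetical stand-in for == on value objects of the same class.
    static boolean sameValue(SubstringLike a, SubstringLike b) {
        if (a == null || b == null) return a == b;
        return a.str == b.str && a.start == b.start && a.end == b.end;
    }
}
```

      With this helper, two separately allocated instances built from the same String reference and the same indices are reported as the same value, even though == on these identity objects still distinguishes them. For a float or double field, a bit-pattern comparison would presumably need something like Float.floatToRawIntBits rather than the primitive == operator.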

      The equals, hashCode, and toString methods, if inherited from Object, along with System.identityHashCode, behave consistently with this definition of equality.

      Substring s3 = s1.subSequence(0, 2);
      assert s1.equals(s3);
      assert s1.hashCode() == s3.hashCode();
      assert System.identityHashCode(s1) == System.identityHashCode(s3);

      Attempting to synchronize on a value object results in an exception.

      Object obj = s1;
      try { synchronized (obj) { } }
      catch (IllegalMonitorStateException e) { /* expected exception */ }

      The ValueObject and IdentityObject interfaces

      We introduce two new interfaces as essential preview APIs:

      • java.lang.ValueObject
      • java.lang.IdentityObject

      All value classes implicitly implement ValueObject. All identity classes—including all preexisting concrete classes in the Java ecosystem—implicitly implement IdentityObject. Array types are also subtypes of IdentityObject.

      These interfaces facilitate distinguishing between identity objects and value objects in three ways:

      • An instanceof IdentityObject or instanceof ValueObject test can be used to determine whether an object has identity, and similarly for reflection on the Class.

      • A variable of type IdentityObject or ValueObject can hold an arbitrary object with or without identity, respectively.

      • An extends IdentityObject or extends ValueObject type parameter bound can be used to require type arguments that guarantee values with or without identity, respectively.
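
      The three uses listed above can be sketched in current Java with hand-written marker interfaces. IdentityMarker and ValueMarker are hypothetical stand-ins for the proposed java.lang.IdentityObject and java.lang.ValueObject; under this JEP classes would implement the real interfaces implicitly, whereas here they are implemented by hand. Account, Money, and IdentityChecks are illustrative names only.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical stand-ins for the proposed interfaces.
interface IdentityMarker {}
interface ValueMarker {}

final class Account implements IdentityMarker {}      // plays the role of an identity class
record Money(long cents) implements ValueMarker {}    // plays the role of a value class

final class IdentityChecks {
    // Type parameter bound: only types known to have identity are accepted as keys.
    static <T extends IdentityMarker> Map<T, Object> lockTable() {
        return new IdentityHashMap<>();
    }

    // Dynamic test, analogous to `obj instanceof IdentityObject`.
    static boolean hasIdentity(Object obj) {
        return obj instanceof IdentityMarker;
    }
}
```

      In this sketch, IdentityChecks.<Account>lockTable() compiles, while IdentityChecks.<Money>lockTable() is rejected by the compiler because Money does not satisfy the bound.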

      By default, an interface extends neither IdentityObject nor ValueObject, and can be implemented by both kinds of concrete classes. An interface can explicitly extend one of the interfaces if the author determines that all implementing objects are expected to have (or not have) identity. It is an error if a class or interface ends up implementing both interfaces implicitly, explicitly, or by inheritance. (As a special case, an interface may only be considered a functional interface, compatible with lambda expressions, if it extends neither IdentityObject nor ValueObject. This allows for flexibility in the implementation of lambda expressions.)

      An abstract class can similarly be declared to implement either IdentityObject or ValueObject; or, if it declares a field, an instance initializer, a non-empty constructor, or a synchronized method, it implicitly implements IdentityObject (perhaps with a warning). Otherwise, the abstract class extends neither interface and can be extended by both kinds of concrete classes.

      The class Object implements neither IdentityObject nor ValueObject, but is effectively, and perhaps explicitly, abstract. (As described above, concrete classes always implement one or the other.) Calls to new Object() are re-interpreted as instance creations of a new, empty identity subclass of Object (name TBD).

      Migration of existing classes

      If an existing class does not expose its constructors to separately-compiled code, and meets the other requirements of value class declarations, it may be declared as a value class without breaking binary compatibility.

      There are some behavioral changes that users of the class may notice:

      • The == operator may treat two instances as the same, where previously they were considered different.

      • Attempts to synchronize on an instance will fail, either at compile time or run time.

      • The results of toString, equals, and hashCode, if they haven't been overridden, may be different.

      • Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be "created" at two different program points).

      • Performance will generally improve, but its characteristics may differ in surprising ways.
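
      As an illustration of the kind of identity assumption that migration can expose, the sketch below uses a hypothetical Point class and runs on current Java, where Point is an identity class. IdentityHashMap distinguishes keys with == and System.identityHashCode, so two separately created points with equal fields are two keys today.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical migration candidate: an identity class today.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class IdentityAssumption {
    // Counts how many keys an identity-based map sees for two points with equal fields.
    static int distinctKeys() {
        Map<Point, String> seen = new IdentityHashMap<>();
        seen.put(new Point(1, 2), "first");
        seen.put(new Point(1, 2), "second");
        return seen.size();
    }
}
```

      On current Java distinctKeys() returns 2. If Point were later declared a value class, the rules described in this JEP for == and System.identityHashCode suggest that the two keys would no longer be distinguishable, so code written this way should be reviewed before migrating.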

      Some classes in the standard library are designated value-based, and can be expected to become value classes in a future release.

      Developers are encouraged to identify and migrate value class candidates in their own code, where appropriate.

      class file representation & interpretation

      A value class is declared in a class file using the ACC_VALUE modifier (0x0100). At class load time, the class is considered to implement the interface ValueObject; an error occurs if a value class is not final, has a non-final instance field, or implements—directly or indirectly—IdentityObject.

      An abstract class that allows value subclasses declares this capability in its class file using the ACC_PERMITS_VALUE modifier (0x0040). At class load time, an error occurs if the class is not abstract, declares an instance field, declares a synchronized method, or implements—directly or indirectly—IdentityObject.

      At class load time, a class (not an interface) is considered to implement the interface IdentityObject if it is not a value class and does not explicitly permit value subclasses. (An exception may need to be made, in the preview release, for the Object and Record classes.) Every array type is also considered to implement IdentityObject.

      It is a load time error if any class or interface implements or extends—directly or indirectly—both ValueObject and IdentityObject.
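
      The load-time rules above can be summarized as a small check over a class's access flags. This is a rough sketch, not HotSpot code: ACC_VALUE and ACC_PERMITS_VALUE use the values proposed in this JEP, ACC_FINAL and ACC_ABSTRACT are the existing class-file constants, and the boolean parameters are hypothetical summaries of facts a real class loader would derive from the class file and its supertypes.

```java
final class ValueFlagChecks {
    static final int ACC_FINAL         = 0x0010; // existing class-file flag
    static final int ACC_PERMITS_VALUE = 0x0040; // proposed in this JEP
    static final int ACC_VALUE         = 0x0100; // proposed in this JEP
    static final int ACC_ABSTRACT      = 0x0400; // existing class-file flag

    /** Returns null if the class passes these checks, otherwise the reason for the error. */
    static String loadError(int flags,
                            boolean hasInstanceField,
                            boolean hasNonFinalInstanceField,
                            boolean hasSynchronizedMethod,
                            boolean implementsIdentityObject) {
        if ((flags & ACC_VALUE) != 0) {
            if ((flags & ACC_FINAL) == 0) return "value class is not final";
            if (hasNonFinalInstanceField) return "value class has a non-final instance field";
            if (implementsIdentityObject) return "value class implements IdentityObject";
        }
        if ((flags & ACC_PERMITS_VALUE) != 0) {
            if ((flags & ACC_ABSTRACT) == 0) return "ACC_PERMITS_VALUE class is not abstract";
            if (hasInstanceField) return "ACC_PERMITS_VALUE class declares an instance field";
            if (hasSynchronizedMethod) return "ACC_PERMITS_VALUE class declares a synchronized method";
            if (implementsIdentityObject) return "ACC_PERMITS_VALUE class implements IdentityObject";
        }
        return null;
    }

    /** True for a class (not an interface) that is neither a value class nor permits value subclasses. */
    static boolean implicitlyImplementsIdentityObject(int flags) {
        return (flags & (ACC_VALUE | ACC_PERMITS_VALUE)) == 0;
    }
}
```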

      A value class's type is represented using the usual L descriptor (LSubstring;). To facilitate inlining optimizations, a Preload attribute can be provided by any class, communicating to the JVM that a set of referenced CONSTANT_Class entries should be eagerly loaded to locate potentially-useful layout information.

      Preload_attribute {
          u2 attribute_name_index;
          u4 attribute_length;
          u2 number_of_classes;
          u2 classes[number_of_classes];
      }
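
      To make the layout concrete, here is a minimal sketch of decoding such an attribute from raw bytes. PreloadReader is a hypothetical helper, not part of any JDK API; it assumes the byte array holds exactly one attribute in the structure shown above and does not resolve the constant pool indices it returns.

```java
import java.nio.ByteBuffer;

final class PreloadReader {
    /** Returns the constant pool indices listed in one Preload_attribute. */
    static int[] readClassIndices(byte[] attribute) {
        ByteBuffer in = ByteBuffer.wrap(attribute);    // big-endian by default, like class files
        in.getShort();                                  // u2 attribute_name_index (skipped here)
        long length = in.getInt() & 0xFFFFFFFFL;        // u4 attribute_length: bytes that follow
        int count = in.getShort() & 0xFFFF;             // u2 number_of_classes
        if (length != 2L + 2L * count)
            throw new IllegalArgumentException("attribute_length does not match number_of_classes");
        int[] classes = new int[count];
        for (int i = 0; i < count; i++)
            classes[i] = in.getShort() & 0xFFFF;        // u2 index of a CONSTANT_Class entry
        return classes;
    }
}
```

      For example, the bytes 00 07 | 00 00 00 06 | 00 02 | 00 0C | 00 22 describe an attribute naming two classes, at constant pool indices 12 and 34.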

      Two new opcodes facilitate instance creation:

      • aconst_init, with a CONSTANT_Class operand, produces an initial instance of the named value class, with all fields set to their default values. This operation always has private access: a linkage error occurs if anyone other than the value class or its nestmates attempts an aconst_init operation.

      • withfield, with a CONSTANT_Fieldref operand, produces a new value object by using an existing object as a template but replacing the value of one of its fields. This operation also has private access.

      It is a linkage error to use the opcode new with a value class. Instance initialization methods can be declared in a value class, but verification prevents their invocation.

      A new kind of special method, an unnamed factory method, can be declared to return instances of the class. Unnamed factory methods are named <new> (or, alternatively, <init> with a non-void return) and are static. They are invoked with invokestatic.

      The if_acmpeq and if_acmpne operations implement the == test for value objects, as described above. The monitorenter instruction throws an exception if applied to a value object.

      Java language compilation

      Each class file generated by javac includes a Preload attribute naming any value class that appears in one of the class file's declared field or method descriptors.

      Constructors of value classes compile to unnamed factory methods, not instance initialization methods. In the constructor body, the compiler treats this as a mutable local variable, initialized by aconst_init, modified by withfield, and ultimately returned as the method result.
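
      The translation scheme described above can be imitated in current Java to show its shape. This is an emulation under stated assumptions, not javac output: SubstringModel is an ordinary class, initial() stands in for aconst_init, the with* methods stand in for withfield, and create(...) stands in for the unnamed factory method. All of these names are hypothetical, and the bounds check from the earlier Substring example is omitted.

```java
final class SubstringModel {
    final String str;
    final int start;
    final int end;

    private SubstringModel(String str, int start, int end) {
        this.str = str;
        this.start = start;
        this.end = end;
    }

    // Plays the role of aconst_init: every field holds its default value.
    private static SubstringModel initial() {
        return new SubstringModel(null, 0, 0);
    }

    // Play the role of withfield: copy the template, replacing one field.
    private SubstringModel withStr(String v) { return new SubstringModel(v, start, end); }
    private SubstringModel withStart(int v)  { return new SubstringModel(str, v, end); }
    private SubstringModel withEnd(int v)    { return new SubstringModel(str, start, v); }

    // Stand-in for the factory method compiled from a three-argument constructor.
    static SubstringModel create(String str, int start, int end) {
        SubstringModel self = initial();   // `this` treated as a mutable local variable
        self = self.withStr(str);          // this.str = str;
        self = self.withStart(start);      // this.start = start;
        self = self.withEnd(end);          // this.end = end;
        return self;                       // returned as the method result
    }
}
```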

      API support

      A new reflective preview API method, Class.isValue, indicates whether a class object corresponds to a value class. This is matched by a new Modifier.VALUE flag. The method Class.getDeclaredConstructors, and related methods, search for unnamed factory methods rather than instance initialization methods when invoked on a value class.

      java.lang.ref recognizes value objects and treats them specially (details TBD).

      java.lang.invoke provides a mechanism to execute the aconst_init and withfield instructions reflectively. The LambdaMetafactory class checks that neither IdentityObject nor ValueObject is among the requested superinterfaces.

      javax.lang.model recognizes value class declarations.

      Performance model

      Because value objects lack identity, JVMs may freely duplicate and re-encode them in an effort to improve computation time, memory footprint, and garbage collector performance.

      Implementations are free to use different encodings in different contexts, such as stack vs. heap, as long as the values of the objects' fields are preserved. However, these encodings must account for the possibility of a null value, and must ensure that fields and arrays storing value objects are read and written atomically.

      In practice, this means that local variables, method parameters, and expression results can often use inline encodings, while fields and array components are not typically inlined, except perhaps in the case of very small value classes.

      Previously, JVMs have used similar optimization techniques to inline identity objects when the JVM is able to prove that an object's identity is never used. Developers can expect more predictable and widespread optimizations for value objects.

      HotSpot implementation

      This section describes implementation details of this release of the HotSpot virtual machine, for the information of OpenJDK engineers. These details are subject to change in future releases and should not be assumed by users of HotSpot or other JVMs.

      Value objects in HotSpot are encoded as follows:

      • In fields and arrays, value objects are encoded as regular heap objects.

      • In the interpreter and C1, value objects on the stack are also encoded as regular heap objects.

      • In C2, value objects on the stack are typically scalarized when stored or passed with value class types. Scalarization effectively encodes each field as a separate variable, with an additional variable encoding null; no heap allocation is needed. Methods with value-class-typed parameters support both a pointer-based entry point (for interpreter and C1 calls) and a scalarized entry point (for C2-to-C2 calls). Value objects are allocated on the heap when they need to be viewed as values of a supertype of the value class, or when stored in fields or arrays.
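
      The two entry points mentioned above can be pictured at the source level as follows. This is a conceptual sketch in current Java, not a description of generated code: Sub is a record standing in for a value class, and both method names are hypothetical.

```java
final class ScalarizationSketch {
    // Stand-in for a value class with three fields.
    record Sub(String str, int start, int end) {}

    // Pointer-based entry point: the argument is a reference to a heap object.
    static int length(Sub s) {
        return s.end() - s.start();
    }

    // Scalarized entry point: each field travels as its own variable,
    // plus one extra variable that encodes whether the reference was null.
    static int lengthScalarized(boolean isNull, String str, int start, int end) {
        if (isNull) throw new NullPointerException();
        return end - start;
    }
}
```

      Both methods compute the same result for a non-null argument, but the second form needs no heap object to pass the value.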

      C2 relies on the Preload attribute to identify value class types at preparation time. If a value class is not named by Preload (for example, if the class was an identity class at compile time), method calls may end up using a heap object encoding instead. In the case of an overriding mismatch—a method and its super methods disagree about scalarization of a particular type—the overriding method may dynamically force callers to deoptimize and use the pointer-based entry point.

      To facilitate the special behavior of instructions like if_acmpeq, value objects in the heap are identified with a new flag in their object header.

      Alternatives

      JVMs have long performed escape analysis to identify objects that do not rely on identity throughout their lifespan and can be inlined. These optimizations are somewhat unpredictable, and do not help with objects that escape the scope of the optimization.

      Hand-coded optimizations via basic primitive values are possible to improve performance, but as noted in the "Motivation" section, these techniques require giving up valuable abstractions.

      The C language and its relatives support inline storage for structs and similar class-like abstractions. For example, the C# language has value types. Unlike value objects, instances of these abstractions have identity, meaning they support operations such as field mutation. As a result, the semantics of copying on assignment, invocation, etc., must be carefully specified, leading to a more complex user model and less flexibility for runtime implementations. We prefer an approach that leaves these low-level details to the discretion of JVM implementations.

      Risks and Assumptions

      The feature makes significant changes to the Java object model. Developers may be surprised by, or encounter bugs due to, changes in the behavior of operations such as == and synchronized. It will be important to validate that such disruptions are rare and tractable.

      Another possible, but hopefully rare, disruption is the unexpected presence of an additional IdentityObject superinterface when examining a class reflectively.

      Some changes could potentially affect the performance of identity objects. The if_acmpeq instruction, for example, typically only costs one instruction cycle, but will now need an additional check to detect value objects. The identity class case should be optimized as the fast path, and we will need to minimize any performance regressions.

      There is a security risk that == and hashCode can indirectly expose private field values. Further, two large trees of value objects can take unbounded time to compute ==, potentially a DoS attack risk. Developers need to understand these risks.

      Dependencies

      In anticipation of this feature, we already added warnings about potentially incompatible changes to value class candidates in javac and HotSpot, via JEP 390.

      JEP 401 will expand on value objects by allowing for the declaration of primitive types. These types support value class features like fields and methods, and have many of the same semantics. But they do not support null and don't guarantee atomic reads and writes; in exchange, they can be more universally inlined by JVMs.

      JEP 402 will provide class declarations, as allowed by JEP 401, for the basic primitive types (int, boolean, etc.). These declarations will subsume the existing wrapper classes.

      JVM class and method specialization (JEP 218, with revisions) will allow generic classes and methods to specialize field, array, and local variable layouts when parameterized by value class types.

              People

              Assignee:
              Dan Smith (dlsmith)
              Reporter:
              Dan Smith (dlsmith)
              Owner:
              Dan Smith
              Reviewed By:
              Brian Goetz