Details

    • Author:
      Brent Christian
    • JEP Type:
      Feature
    • Exposure:
      Open
    • Subcomponent:
    • Scope:
      Implementation
    • Discussion:
      core dash libs dash dev at openjdk dot java dot net
    • Effort:
      L
    • Duration:
      XL
    • Alert Status:
      Green
    • JEP Number:
      254

      Description

      Summary

      Adopt a more space-efficient internal representation for strings.

      Goals

      Improve the space efficiency of the String class and related classes while maintaining performance in most scenarios and preserving full compatibility for all related Java and native interfaces.

      Non-Goals

      It is not a goal to use alternate encodings such as UTF-8 in the internal representation of strings. A subsequent JEP may explore that approach.

      Motivation

      The current implementation of the String class stores characters in a char array, using two bytes (sixteen bits) for each character. Data gathered from many different applications indicates that strings are a major component of heap usage and, moreover, that most String objects contain only Latin-1 characters. Such characters require only one byte of storage, hence half of the space in the internal char arrays of such String objects is going unused.
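
To make the arithmetic concrete, the sketch below compares the character payload of a few sample strings under a two-bytes-per-character layout and under a layout that uses one byte per character whenever every character fits in Latin-1. The class name PayloadEstimate and the sample strings are invented for this illustration, and object and array headers are deliberately ignored, so the numbers cover character payload only.

```java
// Hypothetical estimate of character payload only; headers and padding are ignored.
final class PayloadEstimate {

    // True if every char fits in one byte (code points U+0000 to U+00FF).
    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Payload when every character takes two bytes.
    static long utf16Bytes(String[] strings) {
        long total = 0;
        for (String s : strings) {
            total += 2L * s.length();
        }
        return total;
    }

    // Payload when Latin-1-only strings take one byte per character.
    static long compactBytes(String[] strings) {
        long total = 0;
        for (String s : strings) {
            total += isLatin1(s) ? s.length() : 2L * s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        String[] sample = {"GET", "Content-Type", "na\u00EFve", "\u65E5\u672C\u8A9E"};
        System.out.println("two bytes per char: " + utf16Bytes(sample));
        System.out.println("compact:            " + compactBytes(sample));
    }
}
```

In this sample three of the four strings are Latin-1 only, so their payload halves, while the fourth still needs two bytes per character.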

      Description

      We propose to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding-flag field. The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string. The encoding flag will indicate which encoding is used.
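
As a rough illustration of the byte-array-plus-flag layout described above, the following sketch picks an encoding from the contents and decodes characters on access. The class name CompactText, its constants and accessors, and the high-byte-first order used in the two-byte case are assumptions made for this example; they are not taken from the JDK sources.

```java
// Hypothetical sketch of a byte[]-plus-coder layout; not the JDK's String class.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    private final byte[] value;
    private final byte coder;

    CompactText(char[] chars) {
        if (fitsLatin1(chars)) {
            byte[] bytes = new byte[chars.length];
            for (int i = 0; i < chars.length; i++) {
                bytes[i] = (byte) chars[i];
            }
            this.value = bytes;
            this.coder = LATIN1;
        } else {
            byte[] bytes = new byte[chars.length * 2];
            for (int i = 0; i < chars.length; i++) {
                bytes[2 * i] = (byte) (chars[i] >> 8); // high byte first (assumed order)
                bytes[2 * i + 1] = (byte) chars[i];    // low byte
            }
            this.value = bytes;
            this.coder = UTF16;
        }
    }

    // True if every char can be stored in a single byte.
    static boolean fitsLatin1(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    int length() {
        return coder == LATIN1 ? value.length : value.length >> 1;
    }

    char charAt(int index) {
        if (index < 0 || index >= length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (coder == LATIN1) {
            return (char) (value[index] & 0xFF);
        }
        return (char) (((value[2 * index] & 0xFF) << 8) | (value[2 * index + 1] & 0xFF));
    }

    byte coder() {
        return coder;
    }

    int storedBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactText latin = new CompactText("hello".toCharArray());
        CompactText wide = new CompactText("h\u20ACllo".toCharArray());
        System.out.println(latin.storedBytes() + " bytes, coder " + latin.coder());
        System.out.println(wide.storedBytes() + " bytes, coder " + wide.coder());
    }
}
```

A single character outside Latin-1 forces the whole string into the two-byte form in this sketch, which matches the proposal's choice of encoding "based upon the contents of the string".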

      String-related classes such as AbstractStringBuilder, StringBuilder, and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.

      This is purely an implementation change, with no changes to existing public interfaces. There are no plans to add any new public APIs or other interfaces.

      The prototyping work done to date confirms the expected reduction in memory footprint, substantial reductions of GC activity, and minor performance regressions in some corner cases.

      For further detail, see:

      Alternatives

      We tried a "compressed strings" feature in JDK 6 update releases, enabled by an -XX flag. When enabled, String.value was changed to an Object reference and would point either to a byte array, for strings containing only 7-bit US-ASCII characters, or else a char array. This implementation was not open-sourced, so it was difficult to maintain and keep in sync with the mainline JDK source. It has since been removed.

      Testing

      Thorough compatibility and regression testing will be essential for a change to such a fundamental part of the platform.

      We will also need to confirm that we have fulfilled the performance goals of this project. Analysis of memory savings will need to be done. Performance testing should be done using a broad range of workloads, ranging from focused microbenchmarks to large-scale server workloads.

      We will encourage the entire Java community to perform early testing with this change in order to identify any remaining issues.

      Risks and Assumptions

      Optimizing character storage for memory may well come with a trade-off in terms of run-time performance. We expect that this will be offset by reduced GC activity and that we will be able to maintain the throughput of typical server benchmarks. If not, we will investigate optimizations that can strike an acceptable balance between memory saving and run-time performance.

      Other recent projects have already reduced the heap space used by strings, in particular JEP 192: String Deduplication in G1. Even with duplicates eliminated, the remaining string data can be made to consume less space if encoded more efficiently. We are assuming that this project will still provide a benefit commensurate with the effort required.

        Issue Links

        1. Basic string intrinsics for x86 (Sub-task, Resolved, Tobias Hartmann)
        2. Adapt C2's string concatenation optimization (Sub-task, Resolved, Tobias Hartmann)
        3. Basic string intrinsics for Sparc (Sub-task, Resolved, Tobias Hartmann)
        4. Improve performance of string compression on Sparc (Sub-task, Resolved, Tobias Hartmann)
        5. Improve performance of string inflation on Sparc (Sub-task, Resolved, Tobias Hartmann)
        6. String.coder should be final (Sub-task, Resolved, Aleksey Shipilev)
        7. Figure out the best code shape for a kill switch (Sub-task, Resolved, Aleksey Shipilev)
        8. StringCoding need to be update/optimized for compact string implementation (Sub-task, Resolved, Xueming Shen)
        9. Investigate performance regressions on Sparc (Sub-task, Resolved, Tobias Hartmann)
        10. C1 and C2 intrinsics for StringUTF16.(get|set)Char (Sub-task, Resolved, Tobias Hartmann)
        11. String.charAt blows the MaxInlineSize limit, penalizes C1 (Sub-task, Closed, Unassigned)
        12. StringUTF16.(get|set)Char intrinsic should use scaled operand (Sub-task, Resolved, Aleksey Shipilev)
        13. StringUTF16 should check for the maximum length (Sub-task, Resolved, Xueming Shen)
        14. CompactStrings flag handling without extending the JVM interface (Sub-task, Resolved, Aleksey Shipilev)
        15. Replace common copying loops with arraycopy/copyOf/copyOfRange (Sub-task, Closed, Aleksey Shipilev)
        16. Backout runtime checks in intrinsics before integration (Sub-task, Resolved, Tobias Hartmann)
        17. Remove StringCharIntrinsics flag after JDK-8138651 is fixed (Sub-task, Resolved, Aleksey Shipilev)
        18. Integration (Sub-task, Resolved, Tobias Hartmann)
        19. Release Note: JEP 254: Compact Strings (Sub-task, Resolved, Xueming Shen)

          Activity

          plevart Peter Levart added a comment -
          Just a note on the code at: http://cr.openjdk.java.net/~sherman/8054307/jdk/ ...
          A reference to a String should be safe to pass to other threads via a data race. The new "byte coder" field is also part of String's final state but is not marked final. The Java code is currently written so that it can't be made final (it is assigned in the initBytes methods). So does safety rely on implementation details, piggy-backing on the final marker of the "value" field to affect the store order of the "coder" field? Should there be an explicit Unsafe.storeFence() at the end of each constructor?
          sherman Xueming Shen added a comment -
          There is a prototype implementation in the JDK Sandbox repository (mainly tested on x86), under JDK-8054307-branch. Here are brief build instructions:

          $ hg clone http://hg.openjdk.java.net/jdk9/sandbox/
          $ cd sandbox
          $ sh ./get_source.sh
          $ sh ./common/bin/hgforest.sh up -r JDK-8054307-branch
          $ make configure
          $ make images
          plevart Peter Levart added a comment -
          There's a substantial improvement in speed if StringUTF16.[get|put]Char are implemented using Unsafe instead of in plain Java:
          http://cr.openjdk.java.net/~plevart/misc/JEP254/CharAtBench.java
          sherman Xueming Shen added a comment -
          There are new intrinsics for StringUTF16.get/putChar in the prototype implementation.
          thartmann Tobias Hartmann added a comment -
          I updated the JEP integration/due date according to the following schedule:

          Putback to hs-comp: Nov 03
          Integrate into hs-main: Nov 12
          Integrate into jdk9-dev: Nov 18
          Integrate into Master: Nov 25 == JEP int date

            People

            • Assignee:
              sherman Xueming Shen
              Reporter:
              bchristi Brent Christian
              Owner:
              Xueming Shen
              Reviewed By:
              Aleksey Shipilev, Brian Goetz, Charlie Hunt
              Endorsed By:
              Brian Goetz
            • Votes:
              0
              Watchers:
              23
