JDK-8144691

JEP 254: Compact Strings: endianness mismatch in Java source code and intrinsic

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Not an Issue
    • Affects Version/s: 9
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels:
    • Subcomponent:
    • Introduced In Build: b93
    • CPU: x86
    • OS: windows_8

      Description

      FULL PRODUCT VERSION :
      java version "1.9.0-ea"
      Java(TM) SE Runtime Environment (build 1.9.0-ea-b93)
      Java HotSpot(TM) 64-Bit Server VM (build 1.9.0-ea-b93, mixed mode)

      A DESCRIPTION OF THE PROBLEM :
      The Java source code suggests that the bytes of UTF-16 characters are stored in the "value" field of java.lang.String in big-endian order:

      // from String.java:
          public char charAt(int index) {
              if (isLatin1()) {
                  return StringLatin1.charAt(value, index);
              } else {
                  return StringUTF16.charAt(value, index);
              }
          }

      // from StringUTF16.java:

          @HotSpotIntrinsicCandidate
          public static char getChar(byte[] val, int index) {
              index <<= 1;
              return (char)(((val[index++] & 0xff) << HI_BYTE_SHIFT) |
                            ((val[index] & 0xff) << LO_BYTE_SHIFT));
          }

      ... however, the JVM in fact applies an intrinsic that uses the target platform's natural endianness. For example, it is little endian on Windows (x86/x64), but big endian on Solaris SPARC.

      I suppose this is done for performance reasons, and it's OK. However, if this is the case, the source code should explicitly reflect this fact.

      Also, tools that open HPROF dumps, and debuggers that display Strings, should be given an explicit way to find out which endianness a particular JVM uses.
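As a rough illustration of what such a check could look like from inside a running JVM, the sketch below queries the platform byte order through the public java.nio.ByteOrder API. The class name NativeOrderProbe and its method are hypothetical, not part of the JDK. The sketch also assumes that the internal String layout follows the native byte order, which is what this report observes rather than a documented guarantee. It says nothing about an HPROF dump produced on a different machine.

```java
import java.nio.ByteOrder;

// Hypothetical helper class for illustration only; not part of the JDK.
class NativeOrderProbe {

    /** Describes the byte order of the platform this JVM is running on. */
    static String describeNativeOrder() {
        // ByteOrder.nativeOrder() is public API and returns either
        // ByteOrder.BIG_ENDIAN or ByteOrder.LITTLE_ENDIAN.
        return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? "big endian"
                : "little endian";
    }

    public static void main(String[] args) {
        System.out.println(describeNativeOrder());
    }
}
```

A tool that reads a dump offline would still need the producing JVM's byte order recorded somewhere, so this only covers the in-process case.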


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Running the following TestCompactStrings on a little endian machine (e.g. Windows) prints "little endian" and on a big endian machine (e.g. Solaris SPARC) prints "big endian".

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      According to the current Java source code, it should print "big endian" always.

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.lang.reflect.Field;

      class TestCompactStrings {
        public static void main(String[] args) throws Exception {
          String s = "\u1234";
          Class<? extends String> aClass = s.getClass();

          Field valueField = aClass.getDeclaredField("value");
          valueField.setAccessible(true);

          byte[] bytes = (byte[])valueField.get(s);

          if (bytes[0] == 0x12 && bytes[1] == 0x34) {
            System.out.println("big endian");
          }
          else if (bytes[0] == 0x34 && bytes[1] == 0x12) {
            System.out.println("little endian");
          }
          else {
            System.out.println("unexpected");
          }
        }
      }

      ---------- END SOURCE ----------
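For code that needs a fixed byte order on every platform, one option is to avoid the private "value" field and ask the public API for an explicit encoding. The sketch below is a minimal example under that assumption; the class name FixedOrderBytes and its methods are hypothetical. It relies only on String.getBytes(Charset) and the StandardCharsets constants UTF_16BE and UTF_16LE, neither of which writes a byte-order mark.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of the JDK.
class FixedOrderBytes {

    /** Encodes s as UTF-16 with the high byte of each code unit first. */
    static byte[] utf16be(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    /** Encodes s as UTF-16 with the low byte of each code unit first. */
    static byte[] utf16le(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String s = String.valueOf((char) 0x1234);
        byte[] be = utf16be(s);
        byte[] le = utf16le(s);
        // The result does not depend on the platform's native byte order.
        System.out.printf("BE: %02x %02x%n", be[0], be[1]);
        System.out.printf("LE: %02x %02x%n", le[0], le[1]);
    }
}
```

Unlike the reflective test above, this output is the same on x86 and SPARC, because the byte order is chosen by the caller rather than by the internal representation.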

          Activity

          psonal Pallavi Sonal added a comment -
          The attached test case is reproducible in JDK 9 ea b93.
          Following is the output:
          java TestCompactStrings
          little endian
          sherman Xueming Shen added a comment -
          The test should, and indeed does, print "little endian" on little-endian hardware. The submitter should either read to the bottom of the source file StringUTF16.java, in which we define HI_BYTE_SHIFT and LO_BYTE_SHIFT based on the platform's native endianness:

              private static native boolean isBigEndian();

              static final int HI_BYTE_SHIFT;
              static final int LO_BYTE_SHIFT;
              static {
                  if (isBigEndian()) {
                      HI_BYTE_SHIFT = 8;
                      LO_BYTE_SHIFT = 0;
                  } else {
                      HI_BYTE_SHIFT = 0;
                      LO_BYTE_SHIFT = 8;
                  }
              }

          or just run his test program on a machine.
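To make the effect of the two constants concrete, here is a small stand-alone model of the shift logic. It is a sketch, not the JDK source: the class ShiftModel and its constructor flag are hypothetical, and the flag stands in for the native isBigEndian() call so that both layouts can be tried on one machine. The getChar body mirrors the shape of the getChar quoted in the description; putChar is an assumed inverse written for this example.

```java
// Hypothetical model for illustration only; not part of the JDK.
class ShiftModel {
    final int hiByteShift; // shift applied to the byte at the lower index
    final int loByteShift; // shift applied to the byte at the higher index

    ShiftModel(boolean bigEndian) {
        // Same values as the static initializer quoted above.
        hiByteShift = bigEndian ? 8 : 0;
        loByteShift = bigEndian ? 0 : 8;
    }

    /** Writes c into val at char position index (assumed inverse of getChar). */
    void putChar(byte[] val, int index, char c) {
        index <<= 1;
        val[index++] = (byte) (c >> hiByteShift);
        val[index] = (byte) (c >> loByteShift);
    }

    /** Reads the char at char position index, using the configured shifts. */
    char getChar(byte[] val, int index) {
        index <<= 1;
        return (char) (((val[index++] & 0xff) << hiByteShift) |
                       ((val[index] & 0xff) << loByteShift));
    }

    public static void main(String[] args) {
        for (boolean bigEndian : new boolean[] {true, false}) {
            ShiftModel model = new ShiftModel(bigEndian);
            byte[] val = new byte[2];
            model.putChar(val, 0, (char) 0x1234);
            System.out.printf("bigEndian=%b bytes=%02x %02x char=%04x%n",
                    bigEndian, val[0], val[1], (int) model.getChar(val, 0));
        }
    }
}
```

In both configurations getChar returns the original char. Only the order of the two bytes in the array changes, which is the difference the reporter's test detects.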

            People

            • Assignee: sherman Xueming Shen
            • Reporter: webbuggrp Webbug Group
            • Votes: 0
            • Watchers: 3
