JDK-8166606

System.getenv() returns corrupted/incorrectly encoded strings when JVM is UTF-16/BE/LE

    Details

    • Subcomponent:
    • CPU:
      x86_64
    • OS:
      linux

      Description

      FULL PRODUCT VERSION :
      java version "1.8.0_102"
      Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      ubuntu 14.04
      mac

      Linux kittingj-covdesktop-lnx 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

      Does not occur on Windows

      EXTRA RELEVANT SYSTEM CONFIGURATION :
      ubuntu locale/LANG = en_US.UTF8
      JVM is launched with -Dfile.encoding=UTF-16

      A DESCRIPTION OF THE PROBLEM :
      On Linux/Mac, but not Windows, when the JVM is configured with -Dfile.encoding=UTF-16
      (or UTF-16BE or UTF-16LE),
      System.getenv() returns garbage strings.
      They seem to be a combination of a BOM (byte order mark) and UTF-8.
      They aren't usable Java Unicode strings.
      There seems to be some kind of mistranslation happening between getting these values from the OS and Java.

      The expectation here is that I should be able to print these.
      I don't see this problem under Windows, but do see it on Linux/Mac.



      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Try printing strings that come from System.getenv() when...
      OS = Linux/Mac, english UTF-8
      JVM is launched with -Dfile.encoding=UTF-16

      import java.nio.charset.Charset;
      import java.util.Arrays;
      import java.util.Map;

      public class Main {

          public static void main(String[] args) {
              Map<String, String> env = System.getenv();
              String defaultCharsetName = Charset.defaultCharset().toString();
              System.out.printf("The default charset = %s%n", defaultCharsetName);

              for (Map.Entry<String, String> entry : env.entrySet()) {
                  System.out.printf("%s = %s%n", entry.getKey(), entry.getValue());

                  // getBytes() with no argument encodes using the default charset
                  byte[] keyBytes = entry.getKey().getBytes();
                  byte[] valueBytes = entry.getValue().getBytes();
                  System.out.printf("%s%n", Arrays.toString(keyBytes));
                  System.out.printf("%s%n", Arrays.toString(valueBytes));
              }
              System.out.println("Done");
          }
      }

      On Linux/Mac:

      Copy the above code to a file named Main.java
      javac Main.java
      java -Dfile.encoding=UTF-16 Main
      Compare the printed environment variables with what you see in a terminal window




      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      I would expect the values returned by System.getenv() to print the same as
      the env variables shown in a terminal window (e.g. by typing set in a terminal window).
      ACTUAL -
      The env variables all print as garbage.
      Also, the byte encodings aren't correct for UTF-16.
      For English characters, you would expect to see '00' in every other byte.

      Instead, it looks like UTF-8 with a BOM.
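
To make the expectation concrete, the sketch below prints the bytes Java produces when a correct string is encoded as UTF-16, next to the bytes produced under one possible explanation of the report: that the UTF-8 bytes from the OS are decoded as if they were UTF-16. That explanation is an assumption and is not confirmed anywhere in this report; the class name, method names and sample strings are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16BytesSketch {

    // Correct case: a proper Java string encoded as UTF-16.
    // Java's UTF-16 encoder writes a big-endian BOM (FE FF) first.
    static byte[] expectedUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Assumed failure mode (not confirmed by the report): the OS supplies
    // UTF-8 bytes, they are decoded as UTF-16, and the resulting string is
    // then encoded back to UTF-16 by getBytes().
    static byte[] misdecodedUtf16(String s) {
        byte[] osBytes = s.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(osBytes, StandardCharsets.UTF_16);
        return garbled.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // Even-length and odd-length ASCII samples
        System.out.println(Arrays.toString(expectedUtf16("HOME")));
        System.out.println(Arrays.toString(misdecodedUtf16("HOME")));
        System.out.println(Arrays.toString(misdecodedUtf16("SHELL")));
    }
}
```

Under this assumption, "HOME" encodes correctly as [-2, -1, 0, 72, 0, 79, 0, 77, 0, 69], while the mis-decoded variant yields [-2, -1, 72, 79, 77, 69], which is the BOM followed by the raw UTF-8 bytes. The odd-length "SHELL" yields [-2, -1, 83, 72, 69, 76, -1, -3]: the unpaired last byte is replaced by U+FFFD, whose UTF-16 bytes are FF FD (-1, -3).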

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      The code compiles and runs, but displays incorrectly on Linux/Mac.
      It does display correctly on Windows (Win 7 Pro - 64Bit/English)

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.nio.charset.Charset;
      import java.util.Arrays;
      import java.util.Map;

      public class Main {

          public static void main(String[] args) {
              Map<String, String> env = System.getenv();
              String defaultCharsetName = Charset.defaultCharset().toString();
              System.out.printf("The default charset = %s%n", defaultCharsetName);

              for (Map.Entry<String, String> entry : env.entrySet()) {
                  System.out.printf("%s = %s%n", entry.getKey(), entry.getValue());

                  // getBytes() with no argument encodes using the default charset
                  byte[] keyBytes = entry.getKey().getBytes();
                  byte[] valueBytes = entry.getValue().getBytes();
                  System.out.printf("%s%n", Arrays.toString(keyBytes));
                  System.out.printf("%s%n", Arrays.toString(valueBytes));
              }
              System.out.println("Done");
          }
      }
      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      There isn't a good way to work around this.

      The best I've come up with is:
      call getBytes() on the string from System.getenv()
      remove the BOM / create a new byte array that doesn't contain a BOM
      create a new string using these bytes and specifying a charset, e.g.
      str = new String(badBytes, "UTF-8");

      This works for the KEYS of getenv().
      It doesn't seem to work for the VALUES of getenv().
      I'm not really clear how the VALUES are encoded. The end of the string seems to be missing the last character and/or contains a byte sequence of "-1", "-3" (I'm not familiar with what that sequence would be).
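
The workaround steps above can be sketched roughly as follows. This is a minimal sketch rather than a verified fix. It assumes the garbled strings are UTF-8 bytes that were decoded as UTF-16, and that a trailing U+FFFD (bytes -1, -3 in UTF-16) marks an unpaired final byte that was lost. Neither assumption is confirmed by this report. The class and method names are hypothetical, and StandardCharsets.UTF_16 stands in for the default charset under -Dfile.encoding=UTF-16.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class EnvRecoverSketch {

    // Hypothetical helper following the reporter's three steps.
    static String recover(String garbled) {
        // Step 1: get the bytes back from the garbled string.
        byte[] bad = garbled.getBytes(StandardCharsets.UTF_16);
        int start = 0;
        int end = bad.length;

        // Step 2: skip the big-endian BOM (FE FF) that the encoder prepends.
        if (end >= 2 && bad[0] == (byte) 0xFE && bad[1] == (byte) 0xFF) {
            start = 2;
        }

        // Assumption: a trailing U+FFFD replaced an unpaired last byte.
        // That byte cannot be recovered, so drop the FF FD pair.
        if (garbled.endsWith("\uFFFD") && end - start >= 2) {
            end -= 2;
        }

        // Step 3: reinterpret the remaining bytes as UTF-8.
        return new String(Arrays.copyOfRange(bad, start, end), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the assumed mis-decoding for an even- and an odd-length value.
        String even = new String("HOME".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_16);
        String odd = new String("SHELL".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_16);
        System.out.println(recover(even));
        System.out.println(recover(odd));
    }
}
```

If the assumption holds, even-length strings round-trip fully ("HOME"), while odd-length strings come back one character short ("SHEL"), which would be consistent with the missing last character described above.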

        Attachments

        1. JI9043723.java
          0.7 kB
        2. outJDK8u92.log
          34 kB
        3. screenshot.jpg
          416 kB

          Activity

            People

            • Assignee:
              rriggs Roger Riggs
              Reporter:
              webbuggrp Webbug Group