JDK-8029584

Please allow \uxxxx unicode-escaping on the jvm command-line arguments

    Details

    • Type: Enhancement
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 7u6
    • Fix Version/s: 10
    • Component/s: tools

      Description

      Enhancement request from a customer. See bugdb for more details on
      justification

      When launching a Java application under Windows using the Tanuki Wrapper, it
      is impossible to properly pass Unicode characters on the command line,
      perhaps at all, perhaps without tightly restricting the system encoding
      configuration. It would really help if we could Unicode-escape characters
      as \uXXXX on the command line and then add a JVM argument to indicate that
      this was done. This would allow passing any Unicode character, even if only
      ASCII is available on the command line.

      It appears that the Tanuki codebase uses the proper Win32 Unicode magic.

      It appears that the JVM command-line arguments under Windows are parsed in
      hotspot.src.os.windows.launcher.java_md.c and that it would be simple to
      pre-parse the command-line to handle Unicode in this way.

      This is a request for enhancement (RFE).

      The IBM JVM has this feature but HotSpot does not. See

      http://publib.boulder.ibm.com/infocenter/javasdk/v5r0/index.jsp?topic=%2Fcom.ibm.java.doc.user.aix64.50%2Fuser%2Fglobalization.html

      for an example.

       


          Activity

          dholmes David Holmes added a comment -
          Command-line parsing is handled in part by the launcher:

          jdk/src/windows/bin/java_md.c

          and in part by hotspot

          hotspot/src/share/vm/runtime/arguments.cpp

          I'm not certain where this escaping would need to be implemented, but I suspect in the launcher.
          iklam Ioi Lam added a comment -
          We need to better specify what we want. For the command line

              java <vmArgs ...> <mainClass> <appArgs ...>

          We probably want to be able to use \uxxxx encoding in all places, including vmArgs. This way, you can do

             java -Dsome.property="ABC\u1234def" ...

          or even

             java -X\u0058+Verbose -v\u0065rsion
             =>
             java -XX+Verbose -version

          Allowing escaping at arbitrary places seems weird, but the uniformity should simplify the implementation (and specification).
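The uniform rule described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the launcher's code: `unescape_arg`, `parse_hex4`, `put_utf8` and `hex_value` are hypothetical names, the output encoding is assumed to be UTF-8, and surrogate pairs are not combined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parses exactly four hex digits at s; returns the value or -1.
 * A NUL byte is not a hex digit, so this never reads past the end. */
static long parse_hex4(const char *s) {
    long v = 0;
    for (int i = 0; i < 4; i++) {
        int d = hex_value(s[i]);
        if (d < 0) return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Writes code point cp (at most 0xFFFF) as UTF-8; returns bytes written. */
static size_t put_utf8(unsigned long cp, char *out) {
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: replaces each \uXXXX in arg with its UTF-8 bytes.
 * Rewriting in place is safe because a 6-byte escape yields at most 3 bytes.
 * Malformed escapes and \u0000 are copied through unchanged. */
static void unescape_arg(char *arg) {
    assert(arg != NULL);
    char *src = arg, *dst = arg;
    while (*src) {
        long cu;
        if (src[0] == '\\' && src[1] == 'u' && (cu = parse_hex4(src + 2)) > 0) {
            dst += put_utf8((unsigned long)cu, dst);
            src += 6;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

Applied to each element of argv, this turns -X\u0058+Verbose into -XX+Verbose regardless of whether the argument is a VM option, the main class, or an application argument.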
          iklam Ioi Lam added a comment -
          Implementing the unescaping for mainClass and appArgs is simpler (see below), but there are two major problems with implementing it for vmArgs:

          [1] The specification of JNI_CreateJavaVM(JavaVM **pvm, void **penv, void *args)

          http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/invocation.html says args is of type JavaVMInitArgs

          typedef struct JavaVMInitArgs {
              jint version;
              jint nOptions;
              JavaVMOption *options;
              jboolean ignoreUnrecognized;
          } JavaVMInitArgs;

          The version field must be set to JNI_VERSION_1_2. (In contrast, the version field in JDK1_1InitArgs must be set to JNI_VERSION_1_1.) The options field is an array of the following type:

          typedef struct JavaVMOption {
              char *optionString; /* the option as a string in the default platform encoding */
              void *extraInfo;
          } JavaVMOption;

          Because optionString is in the "default platform encoding", it may not be able to carry certain Unicode characters (e.g., if the platform encoding is ISO 8859-1). We need to update the spec to change the encoding of optionString to UTF-8.

          [2] Currently, the conversion of "default platform encoding" -> java.lang.String is done via a JNI upcall (java.c NewPlatformString() -> sun/launcher/LauncherHelper.makePlatformString). However, if we want to support unescaping of vmArgs, then the conversion must be done BEFORE the JVM is launched. This means we need to have TWO SETS of encoding conversion code in the VM :-(

          BTW, I am not sure if "default platform encoding" can be overridden by the command-line. Is it controlled via -Dfile.encoding? E.g., what happens if you do:

          export LANG=en_US.ascii
          java -Dmy.prop='\u1234' -Dfile.encoding=UTF8 -Dmy.other.prop='\u5678' ....

          Should my.prop become "?" or "\u1234"?

          ------
          So, if we allow unescaping only for mainClass and appArgs, we can do it with the JVM already running, at which point calling NewPlatformString is possible. But then the inability to specify -Dmy.prop='\u1234' leaves something to be desired.
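To make the mainClass/appArgs boundary concrete, here is a rough sketch of locating the main class in argv. `main_class_index` is a hypothetical name, and the rule is deliberately simplified: every argument starting with '-' is treated as a VM option that takes no separate value, which is not true of the real launcher (for example -cp and -jar).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the main class in argv, or argc if none.
 * argv[0] is the program name. Simplifying assumption: every argument that
 * starts with '-' is a VM option and consumes no following argument. */
static int main_class_index(int argc, char **argv) {
    assert(argc >= 1 && argv != NULL);
    int i = 1;
    while (i < argc && argv[i][0] == '-') {
        i++;
    }
    return i;
}
```

Under this scheme, only arguments at or after the returned index would be unescaped, once the JVM is running; everything before it would be passed to the VM untouched.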
          iklam Ioi Lam added a comment -
          A compromise could be to unescape \uxxxx in ParseArguments in java.c, as long as the default platform encoding supports the character; it would become '?' if the default platform encoding does not support the character.

          On most platforms today, the default platform encoding is some sort of UTF8, so almost everyone would be happy.

          Unfortunately, the only people who would be unhappy (and arguably the people who would otherwise benefit the most from this feature) would be those who use some sort of non-Unicode-compatible encoding (like the original filer of this bug) ....
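The '?' fallback in this compromise can be illustrated with ISO 8859-1 as the platform encoding, since it maps code points U+0001 through U+00FF directly to single bytes. This is a sketch only: `encode_latin1` is a hypothetical name, and a real implementation would have to consult whatever the actual platform encoding is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes n code points as an ISO 8859-1 string.
 * Code points above 0xFF (and 0, which would end the string early)
 * are not representable and become '?'. out must hold n + 1 bytes. */
static void encode_latin1(const unsigned long *cps, size_t n, char *out) {
    assert(cps != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long cp = cps[i];
        out[i] = (char)(unsigned char)((cp != 0 && cp <= 0xFF) ? cp : '?');
    }
    out[n] = '\0';
}
```

With this encoding, \u00e9 survives as a single byte while \u1234 degrades to '?', which is exactly the loss described above for users on non-Unicode encodings.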
          iklam Ioi Lam added a comment -
          The original filer of this bug requested support for the same feature as IBM's -Xargencoding. This can be done completely inside the launcher and does not involve hotspot. In fact, hotspot is not involved in the processing of mainClass and appArgs.

          Transfer (hotspot,runtime) => (tools,launcher).
          ksrini Kumar Srinivasan added a comment -
          As an FYI, this was closed as WNF: JDK-4858889
          ntoda Neil Toda added a comment -
          We have received a patch from IBM for the launcher. The patch was created in 6/2009, so it has been in their code base for a while now. Porting it to the JDK9DEV modular file structure.

            People

            • Assignee: ksrini Kumar Srinivasan
            • Reporter: asaha Abhijit Saha
            • Votes: 0
            • Watchers: 9
