JDK-8023881

IDN.USE_STD3_ASCII_RULES option is too strict to use Unicode in IDN.toASCII

    Details

    • Resolved In Build:
      b108
    • Verification:
      Verified

        Description

        IDN.toASCII("示例.com", IDN.USE_STD3_ASCII_RULES) throws:

            Exception ... java.lang.IllegalArgumentException: Contains non-LDH characters
                at java.net.IDN.toASCIIInternal(IDN.java:275)
                at java.net.IDN.toASCII(IDN.java:118)
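
        The call above can be wrapped in a small reproducer. This is only a sketch:
        the class name IdnRepro and the helper convert() are hypothetical, and the
        domain is written with Unicode escapes (\u793a\u4f8b is "示例") so the result
        does not depend on the source file encoding. On an affected build the helper
        should report the rejection; on a build where the check follows RFC 3490 it
        should return the ACE form instead.

```java
import java.net.IDN;

public class IdnRepro {
    // Hypothetical helper: converts with USE_STD3_ASCII_RULES and reports a
    // rejection as a string instead of letting the exception propagate.
    static String convert(String domain) {
        try {
            return IDN.toASCII(domain, IDN.USE_STD3_ASCII_RULES);
        } catch (IllegalArgumentException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "\u793a\u4f8b.com" is the domain from the report.
        System.out.println(convert("\u793a\u4f8b.com"));
        // An underscore (0x5F) is a non-LDH ASCII code point, so this label
        // is rejected under STD3 rules regardless of the bug.
        System.out.println(convert("a_b.com"));
    }
}
```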


        Per step 3, section 4.1, RFC 3490:

           3. If the UseSTD3ASCIIRules flag is set, then perform these checks:

             (a) Verify the absence of non-LDH ASCII code points; that is, the
                 absence of 0..2C, 2E..2F, 3A..40, 5B..60, and 7B..7F.

             (b) Verify the absence of leading and trailing hyphen-minus; that
                 is, the absence of U+002D at the beginning and end of the
                 sequence.
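
        Read literally, step 3 only restricts ASCII code points and the position of
        hyphen-minus. A minimal sketch of that reading follows; the class and method
        names (Std3Check, isNonLDHAsciiCodePoint, passesStd3) are assumptions made
        for illustration and are not taken from the JDK source.

```java
public class Std3Check {
    // Step 3(a): true only for ASCII code points that are not a letter,
    // digit, or hyphen-minus. Anything above 0x7F is outside these ranges.
    static boolean isNonLDHAsciiCodePoint(int cp) {
        return (cp >= 0x00 && cp <= 0x2C)
            || (cp >= 0x2E && cp <= 0x2F)
            || (cp >= 0x3A && cp <= 0x40)
            || (cp >= 0x5B && cp <= 0x60)
            || (cp >= 0x7B && cp <= 0x7F);
    }

    // Applies steps 3(a) and 3(b) to a single label.
    static boolean passesStd3(String label) {
        if (label.codePoints().anyMatch(Std3Check::isNonLDHAsciiCodePoint)) {
            return false; // 3(a): contains a non-LDH ASCII code point
        }
        // 3(b): no leading or trailing hyphen-minus (U+002D)
        return !(label.startsWith("-") || label.endsWith("-"));
    }

    public static void main(String[] args) {
        System.out.println(passesStd3("\u793a\u4f8b")); // non-ASCII label
        System.out.println(passesStd3("a_b"));          // underscore, 0x5F
        System.out.println(passesStd3("-ab"));          // leading hyphen
    }
}
```

        Under this reading, a label made only of non-ASCII code points passes
        step 3, since none of the listed ranges extend beyond 0x7F.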

        However, the IDN implementation checks far more strictly than that:

        private static String toASCIIInternal(String label, int flag)
            ...
            if (useSTD3ASCIIRules) {
                for (int i = 0; i < dest.length(); i++) {
                    int c = dest.charAt(i);
                    if (!isLDHChar(c)) {
                        throw new IllegalArgumentException(
                            "Contains non-LDH characters");
                    }
                }
            ...
        }

        private static boolean isLDHChar(int ch){
            // high runner case
            if(ch > 0x007A){
                return false;
            }
            ...
        }

        isLDHChar() rejects every code point above 0x007A. For example,
        U+3041 ("あ") is rejected. This is too strict for converting Unicode labels
        with IDN.toASCII().


        I ran a simple test with idn, an Internationalized Domain Names command-line
        tool, on Linux:

        $ idn --usestd3asciirules www.示例.com
        www.xn--fsq092h.com

        This shows that the idn tool accepts Unicode input in the ToASCII conversion
        even when UseSTD3ASCIIRules is set.
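
        For comparison, the same conversion can be run through java.net.IDN without
        the STD3 flag, which skips the check in question. The class name
        IdnDefaultFlags and the helper toAce() are hypothetical; the sketch only
        uses IDN.toASCII(String) and IDN.toUnicode(String).

```java
import java.net.IDN;

public class IdnDefaultFlags {
    // Hypothetical helper: ToASCII with default flags (no STD3 check).
    static String toAce(String domain) {
        return IDN.toASCII(domain);
    }

    public static void main(String[] args) {
        // "www.\u793a\u4f8b.com" is "www.示例.com" written with escapes.
        String ace = toAce("www.\u793a\u4f8b.com");
        System.out.println(ace); // prints "www.xn--fsq092h.com"
        // Converting back should recover the original Unicode form.
        System.out.println(IDN.toUnicode(ace).equals("www.\u793a\u4f8b.com"));
    }
}
```

        The ACE form matches what the idn tool produced, so the difference in
        behavior comes only from how the STD3 check is applied.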

                People

                • Assignee:
                  xuelei Xue-Lei Fan
                  Reporter:
                  xuelei Xue-Lei Fan