JDK-8248516

Some newly added locales cannot parse uppercased date strings.

    Details

    • Type: CSR
    • Status: Closed
    • Priority: P3
    • Resolution: Withdrawn
    • Fix Version/s: tbd
    • Component/s: core-libs
    • Labels:
      None
    • Subcomponent:
    • Compatibility Kind:
      behavioral
    • Compatibility Risk:
      low
    • Compatibility Risk Description:
      Applications that rely on the current behavior could break for supplementary code points that have case mappings. This is how these methods should have behaved when supplementary character support was introduced in the JDK. Thus, even though this is technically an incompatibility, very few applications are expected to be affected by it.
    • Interface Kind:
      Java API
    • Scope:
      SE

      Description

      Summary

      Date/Time names with supplementary characters cannot be parsed in a case-insensitive manner.

      Problem

      JDK 15 added a new locale, "ff-Adlm-LR", whose locale data (such as month and day names) is written in the Adlam script, which is encoded in a supplementary plane. java.text.DateFormat parses those names in a case-insensitive manner, but it throws an exception because the underlying String.regionMatches() call with ignoreCase == true fails for supplementary characters. For example:

      "\ud83a\udd2e".regionMatches(true, 0, "\ud83a\udd0c", 0, 2)

      returns false, where:

      "\ud83a\udd2e" == 'ADLAM SMALL LETTER O' (U+1E92E)
      "\ud83a\udd0c" == 'ADLAM CAPITAL LETTER O' (U+1E90C)

      even though each of the following expressions evaluates to true:

      "\ud83a\udd2e".toUpperCase(Locale.ROOT).equals("\ud83a\udd0c")
      Character.toUpperCase(0x1e92e) == 0x1e90c
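
The expressions above can be checked with a small standalone program. This is only a sketch: the class name `AdlamCaseDemo` and its helper methods are made up for illustration, and it assumes a JDK whose Unicode data includes the Adlam block. The result of the final `regionMatches` call depends on the JDK release in use, so no expected value is given for it.

```java
import java.util.Locale;

// Hypothetical demo class; not part of the JDK or of this CSR.
class AdlamCaseDemo {
    // U+1E92E ADLAM SMALL LETTER O, written as a UTF-16 surrogate pair.
    static final String SMALL_O = "\ud83a\udd2e";
    // U+1E90C ADLAM CAPITAL LETTER O, written as a UTF-16 surrogate pair.
    static final String CAPITAL_O = "\ud83a\udd0c";

    // Whole-string case conversion works on code points.
    static boolean stringUpperCaseMatches() {
        return SMALL_O.toUpperCase(Locale.ROOT).equals(CAPITAL_O);
    }

    // The int overload of Character.toUpperCase also works on code points.
    static boolean codePointUpperCaseMatches() {
        return Character.toUpperCase(0x1e92e) == 0x1e90c;
    }

    // A lone surrogate code unit has no case mapping of its own, which is
    // why a char-by-char comparison cannot see the relation between the two.
    static boolean lowSurrogateUnchangedByUpperCase() {
        return Character.toUpperCase('\udd2e') == '\udd2e';
    }

    public static void main(String[] args) {
        System.out.println("UTF-16 length: " + SMALL_O.length());
        System.out.println("code points:   " + SMALL_O.codePointCount(0, SMALL_O.length()));
        System.out.println("String.toUpperCase matches:    " + stringUpperCaseMatches());
        System.out.println("Character.toUpperCase matches: " + codePointUpperCaseMatches());
        System.out.println("low surrogate unchanged:       " + lowSurrogateUnchangedByUpperCase());
        // The call reported in this issue; the outcome varies by JDK release.
        System.out.println("regionMatches(ignoreCase):     "
                + SMALL_O.regionMatches(true, 0, CAPITAL_O, 0, 2));
    }
}
```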

      Solution

      Change the specifications of String.regionMatches(boolean, ...), String.equalsIgnoreCase(), and String.compareToIgnoreCase() to perform code point comparison for supplementary characters. Characters in the Basic Multilingual Plane (<= \uFFFF) continue to be compared using the code units obtained from the charAt() method.

      Although this change alters how the strings are traversed during comparison, the rationale is that these String methods should behave consistently for all characters (code points), whether or not they are in the Basic Multilingual Plane. There is no reason to exclude supplementary characters from case-insensitive string comparison.

      Specification

      Append the following sentence just after the last list item of conditions in the method description of the String.regionMatches(boolean, ...) method:

      * In case that both <i>toffset+k</i> and <i>ooffset+k</i> point to
      * supplementary characters, that is <i>k</i> point to high surrogates
      * and <i>k+1</i> point to low surrogates, {@code codePointAt()} is
      * used to retrieve the code points in place for {@code charAt()} method,
      * and <i>k+1</i> is excluded from the above condition. If they point
      * to an unpaired high or low surrogates, they are compared using
      * {@code charAt()} method.
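
One way to read the appended sentence is as the loop below. It is a minimal sketch under stated assumptions, not the JDK implementation: the class `CodePointRegionMatch` and its methods are hypothetical, and the case test reuses the `Character.toLowerCase(Character.toUpperCase(...))` idea quoted elsewhere in this specification.

```java
// Hypothetical helper illustrating the proposed wording; not JDK code.
final class CodePointRegionMatch {
    private CodePointRegionMatch() {}

    static boolean regionMatchesIgnoreCase(String a, int toffset,
                                           String b, int ooffset, int len) {
        // Same range conditions as String.regionMatches; long avoids overflow.
        if (toffset < 0 || ooffset < 0
                || toffset > (long) a.length() - len
                || ooffset > (long) b.length() - len) {
            return false;
        }
        int k = 0;
        while (k < len) {
            int i = toffset + k;
            int j = ooffset + k;
            // A supplementary character needs both code units inside the region.
            boolean pairA = k + 1 < len
                    && Character.isSurrogatePair(a.charAt(i), a.charAt(i + 1));
            boolean pairB = k + 1 < len
                    && Character.isSurrogatePair(b.charAt(j), b.charAt(j + 1));
            int c1;
            int c2;
            int step;
            if (pairA && pairB) {
                // Both sides are supplementary: compare code points, skip k+1.
                c1 = a.codePointAt(i);
                c2 = b.codePointAt(j);
                step = 2;
            } else {
                // BMP characters and unpaired surrogates: compare code units.
                c1 = a.charAt(i);
                c2 = b.charAt(j);
                step = 1;
            }
            if (c1 != c2 && !sameIgnoringCase(c1, c2)) {
                return false;
            }
            k += step;
        }
        return true;
    }

    // Uses the int overloads, so supplementary code points are case-mapped too.
    static boolean sameIgnoringCase(int c1, int c2) {
        int u1 = Character.toUpperCase(c1);
        int u2 = Character.toUpperCase(c2);
        return u1 == u2 || Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }
}
```

With this reading, a region holding a lone high or low surrogate is still compared one code unit at a time, so behavior for BMP-only strings is unchanged.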

      Change the following list item of conditions in the method description of the String.equalsIgnoreCase() method from:

      *   <li> Calling {@code Character.toLowerCase(Character.toUpperCase(char))}
      *        on each character produces the same result

      to:

      *   <li> Calling {@code Character.toLowerCase(Character.toUpperCase(int))}
      *        on each code point produces the same result

      Change the following text in the method description of the String.compareToIgnoreCase() method from:

      * {@code Character.toLowerCase(Character.toUpperCase(character))} on
      * each character.

      to:

      * {@code Character.toLowerCase(Character.toUpperCase(int))} on
      * each code point of the character.
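
The revised wording for equalsIgnoreCase() and compareToIgnoreCase() can be illustrated by walking both strings one code point at a time. The following is a rough sketch only: `CodePointCaseCompare`, `fold`, `compareIgnoreCase`, and `equalsIgnoreCase` are hypothetical names, and returning the difference of the folded code points is an assumption made here for simplicity, not something this CSR specifies.

```java
// Hypothetical helper illustrating per-code-point comparison; not JDK code.
final class CodePointCaseCompare {
    private CodePointCaseCompare() {}

    // The normalization named in the specification text, on an int code point.
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    static int compareIgnoreCase(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            // codePointAt returns the code unit itself for an unpaired surrogate.
            int c1 = a.codePointAt(i);
            int c2 = b.codePointAt(j);
            if (c1 != c2) {
                int f1 = fold(c1);
                int f2 = fold(c2);
                if (f1 != f2) {
                    // Assumption: order by folded code point value.
                    return f1 - f2;
                }
            }
            // Advance by one or two code units, depending on the code point.
            i += Character.charCount(c1);
            j += Character.charCount(c2);
        }
        // One string is a prefix of the other: the one with text left is greater.
        return (a.length() - i) - (b.length() - j);
    }

    static boolean equalsIgnoreCase(String a, String b) {
        return a.length() == b.length() && compareIgnoreCase(a, b) == 0;
    }
}
```

Under these assumptions, `CodePointCaseCompare.equalsIgnoreCase("\ud83a\udd2e", "\ud83a\udd0c")` treats the two Adlam letters as equal, while BMP-only input goes through the same fold one code unit at a time.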

              People

              Assignee:
              Naoto Sato
              Reporter:
              Webbug Group