JDK-8041791

String.toLowerCase regression - violates Unicode standard

    Details

    • Resolved In Build: b14
    • CPU: generic
    • OS: generic
    • Verification: Verified

        Description

        The change JDK-8020037 "String.toLowerCase incorrectly increases length, if string contains \u0130 char" seems to be wrong, according to my reading of the Unicode standard.

        The phrase "String.toLowerCase incorrectly increases length" in that title assumes that a length increase is a problem, but it isn't: the String.toLowerCase documentation specifically says "Since case mappings are not always 1:1 char mappings, the resulting String may be a different length than the original String."

        I look at http://www.unicode.org/Public/UNIDATA/SpecialCasing.txt and see:

        # Preserve canonical equivalence for I with dot. Turkic is handled below.

        0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE

        My understanding of this is that in all locales *except* the ones handled specially (which are 'az', 'lt', and 'tr') we should bi-directionally convert "\u0130" <-> "\u0069\u0307".
        I.e. lowercasing "\u0130" should result in "\u0069\u0307";
        converting "\u0069\u0307" to uppercase or titlecase should yield "\u0130".

        Note this allows round-trip conversions, which is why it is specified this way.
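The SpecialCasing.txt entry quoted above can be read field by field. The following is a minimal sketch, assuming the layout documented in that file's header (code point; lowercase; titlecase; uppercase; optional condition list; then a `#` comment). The class name `SpecialCasingEntry` and its methods are hypothetical, introduced only for illustration, and are not part of the JDK.

```java
public class SpecialCasingEntry {
    public final int codePoint;  // the source character
    public final String lower;   // full lowercase mapping
    public final String title;   // full titlecase mapping
    public final String upper;   // full uppercase mapping

    SpecialCasingEntry(int codePoint, String lower, String title, String upper) {
        this.codePoint = codePoint;
        this.lower = lower;
        this.title = title;
        this.upper = upper;
    }

    // Turn a space-separated list of hex code points ("0069 0307") into a String.
    static String decode(String hexList) {
        StringBuilder sb = new StringBuilder();
        for (String hex : hexList.trim().split("\\s+")) {
            if (!hex.isEmpty()) {
                sb.appendCodePoint(Integer.parseInt(hex, 16));
            }
        }
        return sb.toString();
    }

    // Parse one unconditional data line; any condition list (a fifth field) is ignored here.
    static SpecialCasingEntry parse(String line) {
        int hash = line.indexOf('#');
        String data = hash >= 0 ? line.substring(0, hash) : line;
        String[] fields = data.split(";");
        return new SpecialCasingEntry(
                Integer.parseInt(fields[0].trim(), 16),
                decode(fields[1]),
                decode(fields[2]),
                decode(fields[3]));
    }

    public static void main(String[] args) {
        SpecialCasingEntry e =
                parse("0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE");
        // The lowercase mapping has two chars, while title and upper map to the character itself.
        System.out.println(e.lower.length() + " " + e.title.length() + " " + e.upper.length());
    }
}
```

Read this way, the entry gives U+0130 a two-character lowercase mapping (U+0069 U+0307), which is the one-to-many case mapping the String documentation already allows for.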

        Java 7 correctly does the former conversion, but not the latter.
        Java 8 does neither.
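To check which of these conversions a given runtime performs, a small probe along the following lines may help. This is a sketch only: the class name `DottedICaseDemo` and the helper `codePoints` are hypothetical, and no expected output is shown because, as described above, the results differ between JDK releases.

```java
import java.util.Locale;

public class DottedICaseDemo {
    // Render a string as space-separated code points, e.g. "U+0069 U+0307".
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String capitalDottedI = "\u0130";      // LATIN CAPITAL LETTER I WITH DOT ABOVE
        String smallIPlusDot = "\u0069\u0307"; // small i followed by COMBINING DOT ABOVE
        Locale turkish = Locale.forLanguageTag("tr");

        // Lowercasing in a locale without special handling, then in Turkish.
        System.out.println("toLowerCase(ROOT): "
                + codePoints(capitalDottedI.toLowerCase(Locale.ROOT)));
        System.out.println("toLowerCase(tr):   "
                + codePoints(capitalDottedI.toLowerCase(turkish)));

        // The reverse direction discussed in the report.
        System.out.println("toUpperCase(ROOT): "
                + codePoints(smallIPlusDot.toUpperCase(Locale.ROOT)));
    }
}
```

Printing code points rather than the strings themselves avoids confusion from terminals that render U+0069 U+0307 and a plain "i" almost identically.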

                People

                • Assignee: Naoto Sato (naoto)
                • Reporter: Per Bothner (pbothner, Inactive)
