JDK / JDK-6755060

Collator.compare() does not compare correctly for the Thai locale


    Details

    • Subcomponent:
    • Resolved In Build:
      b13
    • CPU:
      sparc
    • OS:
      solaris_2.5.1
    • Verification:
      Verified

      Description

      OPERATING SYSTEM(S):
      All

      FULL JDK VERSION(S):
      All

      DESCRIPTION:

      If we take the following array of Thai characters:

      "\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e01\u0e4f", "\u0e01\u0e5a", "\u0e01\u0e5b", "\u0e01\u0e4e", "\u0e01\u0e4c", "\u0e01\u0e48", "\u0e01\u0e01", "\u0e01\u0e4b\u0e01", "\u0e01\u0e4d", "\u0e01\u0e30", "\u0e01\u0e31\u0e01", "\u0e01\u0e32", "\u0e01\u0e33", "\u0e01\u0e34", "\u0e01\u0e35", "\u0e01\u0e36", "\u0e01\u0e37", "\u0e01\u0e38", "\u0e01\u0e39", "\u0e40\u0e01", "\u0e40\u0e01\u0e48", "\u0e40\u0e01\u0e49", "\u0e40\u0e01\u0e4b", "\u0e41\u0e01", "\u0e42\u0e01", "\u0e43\u0e01", "\u0e44\u0e01", "\u0e01\u0e3a", "\u0e24\u0e32", "\u0e24\u0e45", "\u0e40\u0e25", "\u0e44\u0e26"

      and sort them using a Thai Collator instance, the result, according to the CLDR, should be as follows:

      "\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e01\u0e4f", "\u0e01\u0e5a", "\u0e01\u0e5b", "\u0e01\u0e4e", "\u0e01\u0e48", "\u0e01\u0e4c", "\u0e01\u0e4d", "\u0e01\u0e01", "\u0e01\u0e4b\u0e01", "\u0e01\u0e30", "\u0e01\u0e31\u0e01", "\u0e01\u0e32", "\u0e01\u0e33", "\u0e01\u0e34", "\u0e01\u0e35", "\u0e01\u0e36", "\u0e01\u0e37", "\u0e01\u0e38", "\u0e01\u0e39", "\u0e01\u0e3a", "\u0e40\u0e01", "\u0e40\u0e01\u0e48", "\u0e40\u0e01\u0e49", "\u0e40\u0e01\u0e4b", "\u0e41\u0e01", "\u0e42\u0e01", "\u0e43\u0e01", "\u0e44\u0e01", "\u0e24\u0e32", "\u0e24\u0e45", "\u0e40\u0e25", "\u0e44\u0e26"

      However, the current implementation of the Thai collator does not produce this order.
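      The two lists are easier to compare when each code point is named. Here is a minimal sketch that assumes Java 7 or later, since it relies on `Character.getName`. The `describe` helper is hypothetical. It prints each string as its code points and their Unicode names, for example U+0E01 THAI CHARACTER KO KAI. The sample strings come from the arrays in this report.

```java
public class DescribeThai {
    // Lists each code point of s as "U+XXXX NAME", using the Unicode character
    // names that Character.getName (Java 7 and later) reads from the JDK's data.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(String.format("U+%04X %s", cp, Character.getName(cp)));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A few of the test strings from this report; others can be added freely.
        String[] samples = {"\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e40\u0e01\u0e48"};
        for (String s : samples) {
            System.out.println(describe(s));
        }
    }
}
```

      Iterating by code point rather than by `char` keeps the helper correct for supplementary characters, although every Thai character in this report lies in the Basic Multilingual Plane.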

      The testcase below defines both arrays, sorts the first one with a Thai Collator instance, and compares the result element by element with the order defined by the CLDR. Every comparison should yield "true", but 18 of the 35 yield "false":

      c:\>java TestCollation
      "?".equals("?") yields true
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields true
      "???".equals("???") yields true
      "??".equals("??") yields true
      "???".equals("???") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields false
      "???".equals("??") yields false
      "???".equals("???") yields false
      "???".equals("???") yields false
      "??".equals("???") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields false
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
      "??".equals("??") yields true
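      The question marks above are most likely the console replacing Thai characters it cannot display. As a result, the output does not show which strings ended up out of place. Here is a minimal sketch that makes the output independent of the console. The `escape` helper is hypothetical and prints each string in the same \u notation the testcase source uses. The pair of strings in `main` was picked from the testcase arrays only to show the format.

```java
public class EscapedOutput {
    // Renders each UTF-16 code unit of s as a six-character backslash-u escape,
    // the same notation the testcase source uses for its string literals.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // An arbitrary pair of strings from the testcase arrays, for illustration only.
        String actual = "\u0e01\u0e2f";
        String expected = "\u0e01\u0e46";
        System.out.println("\"" + escape(actual) + "\".equals(\"" + escape(expected)
                + "\") yields " + actual.equals(expected));
    }
}
```

      To make every mismatch readable regardless of the console's code page, wrap `a[i]` and `b[i]` in `escape(...)` inside the testcase's `println`.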


      TESTCASE SOURCE:
      import java.text.Collator;
      import java.util.Arrays;
      import java.util.Locale;

      public class TestCollation {
          public static void main(String[] args) {
              final Locale locale = new Locale("th", "TH");
              // The strings to be sorted, in their original (unsorted) order.
              final String[] a = {"\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e01\u0e4f", "\u0e01\u0e5a", "\u0e01\u0e5b", "\u0e01\u0e4e", "\u0e01\u0e4c", "\u0e01\u0e48", "\u0e01\u0e01", "\u0e01\u0e4b\u0e01", "\u0e01\u0e4d", "\u0e01\u0e30", "\u0e01\u0e31\u0e01", "\u0e01\u0e32", "\u0e01\u0e33", "\u0e01\u0e34", "\u0e01\u0e35", "\u0e01\u0e36", "\u0e01\u0e37", "\u0e01\u0e38", "\u0e01\u0e39", "\u0e40\u0e01", "\u0e40\u0e01\u0e48", "\u0e40\u0e01\u0e49", "\u0e40\u0e01\u0e4b", "\u0e41\u0e01", "\u0e42\u0e01", "\u0e43\u0e01", "\u0e44\u0e01", "\u0e01\u0e3a", "\u0e24\u0e32", "\u0e24\u0e45", "\u0e40\u0e25", "\u0e44\u0e26"};
              // The same strings in the order defined by the CLDR for Thai.
              final String[] b = {"\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e01\u0e4f", "\u0e01\u0e5a", "\u0e01\u0e5b", "\u0e01\u0e4e", "\u0e01\u0e48", "\u0e01\u0e4c", "\u0e01\u0e4d", "\u0e01\u0e01", "\u0e01\u0e4b\u0e01", "\u0e01\u0e30", "\u0e01\u0e31\u0e01", "\u0e01\u0e32", "\u0e01\u0e33", "\u0e01\u0e34", "\u0e01\u0e35", "\u0e01\u0e36", "\u0e01\u0e37", "\u0e01\u0e38", "\u0e01\u0e39", "\u0e01\u0e3a", "\u0e40\u0e01", "\u0e40\u0e01\u0e48", "\u0e40\u0e01\u0e49", "\u0e40\u0e01\u0e4b", "\u0e41\u0e01", "\u0e42\u0e01", "\u0e43\u0e01", "\u0e44\u0e01", "\u0e24\u0e32", "\u0e24\u0e45", "\u0e40\u0e25", "\u0e44\u0e26"};

              // Create the collator once rather than on every comparison. Collator
              // implements Comparator, so it can be passed to Arrays.sort directly.
              final Collator collator = Collator.getInstance(locale);
              Arrays.sort(a, collator);

              // Each line should report "true" if the sorted array matches the CLDR order.
              for (int i = 0; i < b.length; i++) {
                  System.out.println("\"" + a[i] + "\".equals(\"" + b[i] + "\") yields " + a[i].equals(b[i]));
              }
          }
      }
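      An element-by-element comparison of the sorted array tends to cascade. Once one string lands in the wrong slot, later positions can report "false" even when their relative order is correct. A more targeted check asks the collator about each adjacent pair of the CLDR order directly. The sketch below assumes array `b` is the authoritative order and reports each pair the collator does not place strictly first, including pairs it treats as equal. The names `AdjacentOrderCheck`, `inversions`, and `escape` are hypothetical. The result depends on the Thai collation rules of the JDK being tested, so no output is shown.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AdjacentOrderCheck {
    // Returns every index i where the collator does NOT place expected[i]
    // strictly before expected[i + 1]; each index marks one adjacent pair of
    // the expected order that the collator disagrees with or treats as equal.
    static List<Integer> inversions(Collator collator, String[] expected) {
        List<Integer> bad = new ArrayList<Integer>();
        for (int i = 0; i + 1 < expected.length; i++) {
            if (collator.compare(expected[i], expected[i + 1]) >= 0) {
                bad.add(i);
            }
        }
        return bad;
    }

    // Renders each UTF-16 code unit as a six-character backslash-u escape,
    // so the report is readable on consoles that cannot display Thai.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(new Locale("th", "TH"));
        // Array b from the testcase: the CLDR order for these strings.
        String[] expected = {
            "\u0e01", "\u0e01\u0e2f", "\u0e01\u0e46", "\u0e01\u0e4f", "\u0e01\u0e5a",
            "\u0e01\u0e5b", "\u0e01\u0e4e", "\u0e01\u0e48", "\u0e01\u0e4c", "\u0e01\u0e4d",
            "\u0e01\u0e01", "\u0e01\u0e4b\u0e01", "\u0e01\u0e30", "\u0e01\u0e31\u0e01", "\u0e01\u0e32",
            "\u0e01\u0e33", "\u0e01\u0e34", "\u0e01\u0e35", "\u0e01\u0e36", "\u0e01\u0e37",
            "\u0e01\u0e38", "\u0e01\u0e39", "\u0e01\u0e3a", "\u0e40\u0e01", "\u0e40\u0e01\u0e48",
            "\u0e40\u0e01\u0e49", "\u0e40\u0e01\u0e4b", "\u0e41\u0e01", "\u0e42\u0e01", "\u0e43\u0e01",
            "\u0e44\u0e01", "\u0e24\u0e32", "\u0e24\u0e45", "\u0e40\u0e25", "\u0e44\u0e26"
        };
        for (int i : inversions(collator, expected)) {
            System.out.println(escape(expected[i]) + " does not sort before " + escape(expected[i + 1]));
        }
    }
}
```

      Each reported pair corresponds to one ordering rule the collator gets wrong. This makes the list a more direct starting point for fixing the collation rules than the cascading "false" lines of the original testcase.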


            People

            Assignee:
            yhuang Yong Huang (Inactive)
            Reporter:
            dkorbel David Korbel (Inactive)
            Votes:
            0
            Watchers:
            1
