JDK-4821213

(cs) UTF-8: CharsetEncoder.canEncode(char) accepts unpaired surrogates

    Details

    • Subcomponent:
    • Resolved In Build:
      b32
    • CPU:
      generic, sparc
    • OS:
      generic, solaris_8
    • Verification:
      Verified

      Description

      Name: auR10023 Date: 02/20/2003



      CharsetEncoder.canEncode(char) returns true for unpaired surrogate
      characters when the charset is UTF-8, but the javadoc for this method says:

      ...
      This method returns false if the given character is a surrogate
      character; such characters can be interpreted only when they are members
      of a pair consisting of a high surrogate followed by a low surrogate
      ...

      Here is the example:

      ------------- test.java --------------

      import java.nio.charset.*;

      public class test {

          static char [] illChrs = {
              '\ud800', '\ud801', '\udffe', '\udfff'
          };


          public static void main (String[] args) {
              CharsetEncoder en = null;
              try {
                  en = Charset.forName("UTF-8").newEncoder();
              } catch(IllegalArgumentException e) {
                  e.printStackTrace(System.out);
                  System.out.println("Unexpected " + e);
                  return;
              }
       
              for (int j = 0; j < illChrs.length; j++) {
                  // Test the surrogate itself; (char)j would test the chars 0..3 instead.
                  if (en.canEncode(illChrs[j])) {
                      System.out.println("Unexpected value returned with " +
                                         (int)illChrs[j]);
                  }
              }
          }
      }

      #java -version
      java version "1.4.2-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b16)
      Java HotSpot(TM) Client VM (build 1.4.2-beta-b16, mixed mode)

      #java test

      Unexpected value returned with 55296
      Unexpected value returned with 55297
      Unexpected value returned with 57342
      Unexpected value returned with 57343
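
      For comparison, here is a minimal sketch of the behaviour the quoted
      javadoc describes: a lone surrogate is rejected, while a high surrogate
      followed by a low surrogate is accepted when checked as a sequence. The
      class name SurrogateCheck and its two helper methods are hypothetical,
      and StandardCharsets is assumed to be available (it is newer than the
      1.4.2 build reported above). On a build that follows the javadoc, the
      first two checks should report false and the last two true.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class SurrogateCheck {

    // Hypothetical helper: asks a fresh UTF-8 encoder about a single char.
    static boolean canEncodeChar(char c) {
        CharsetEncoder en = StandardCharsets.UTF_8.newEncoder();
        return en.canEncode(c);
    }

    // Hypothetical helper: asks about a whole sequence, so that both halves
    // of a surrogate pair are seen together.
    static boolean canEncodeSequence(CharSequence cs) {
        CharsetEncoder en = StandardCharsets.UTF_8.newEncoder();
        return en.canEncode(cs);
    }

    public static void main(String[] args) {
        char high = '\ud800';   // high (leading) surrogate
        char low = '\udc00';    // low (trailing) surrogate

        System.out.println("lone high surrogate: " + canEncodeChar(high));
        System.out.println("lone low surrogate:  " + canEncodeChar(low));
        System.out.println("high + low pair:     " + canEncodeSequence("" + high + low));
        System.out.println("plain 'A':           " + canEncodeChar('A'));
    }
}
```

      Checking the pair through canEncode(CharSequence) rather than
      canEncode(char) matters here, because a single char can never carry
      both halves of a surrogate pair.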


      ======================================================================

            People

            Assignee:
            mr Mark Reinhold
            Reporter:
            avusunw Avu Avu (Inactive)