JDK-6836089

Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

    Details

    • Subcomponent:
    • Resolved In Build:
      b07
    • CPU:
      generic
    • OS:
      generic
    • Verification:
      Verified

        Description

        The statement

           System.out.println("\ud840\udc00".codePointAt(0));

        prints

           131072 (U+20000), because \ud840 and \udc00 are a high and a low surrogate that together encode a single code point outside Plane 0.
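
        The mapping between the code point and its two UTF-16 code units can be checked in both directions with the standard Character methods. This is a minimal sketch; the class name SurrogateDemo and the helper toUtf16 are hypothetical and not part of the report.

```java
public class SurrogateDemo {
    // Hypothetical helper: split a code point into its UTF-16 code units
    // using the standard Character.toChars method.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = toUtf16(131072);               // U+20000
        System.out.println(units.length);             // number of code units
        System.out.println((int) units[0]);           // high surrogate, 0xD840
        System.out.println((int) units[1]);           // low surrogate, 0xDC00
        // Recombining the pair yields the original code point again.
        System.out.println(Character.toCodePoint(units[0], units[1]));
    }
}
```

        The two units are 55360 and 56320, which are the decimal values that appear in the expected output further down.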

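        If one says

           import javax.swing.JTextPane;
           import javax.swing.text.html.HTMLEditorKit;

           JTextPane htmlPane = new JTextPane();
           htmlPane.setEditorKit(new HTMLEditorKit());

           // &#131072; is the numeric character reference for U+20000
           htmlPane.setText("<html><head></head><body>&#131072;</body></html>");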

        the numeric character reference is not decoded into a surrogate pair.

           System.out.println(htmlPane.getText());

        prints

        <html>
          <head>
            
          </head>
          <body>
            &#0;
          </body>
        </html>

        rather than

        <html>
          <head>
            
          </head>
          <body>
            &#55360;&#56320;
          </body>
        </html>


        or at least

        <html>
          <head>
            
          </head>
          <body>
            &#131072;
          </body>
        </html>
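
        The observed &#0; is consistent with the code point being narrowed to a 16-bit char, since 131072 is 0x20000 and its low 16 bits are all zero. The sketch below contrasts that narrowing with a decode through Character.toChars. It is only an illustration under that assumption, not the parser's actual code; the class NumericEntityDemo and its helpers are hypothetical names.

```java
public class NumericEntityDemo {
    // Hypothetical helper: decode by narrowing to char, which keeps only
    // the low 16 bits and therefore loses supplementary code points.
    static String decodeNarrowing(int codePoint) {
        return String.valueOf((char) codePoint);
    }

    // Hypothetical helper: decode via Character.toChars, which produces
    // a surrogate pair for code points above U+FFFF.
    static String decodeFull(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Hypothetical helper: write each UTF-16 code unit back out as a
    // decimal numeric character reference.
    static String toCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCharRefs(decodeNarrowing(131072))); // &#0;
        System.out.println(toCharRefs(decodeFull(131072)));      // &#55360;&#56320;
    }
}
```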

                People

                Assignee:
                vkarnauk Vladislav Karnaukhov
                Reporter:
                jloefflm Johann Löfflmann (Inactive)
