JDK-6836089

Swing HTML parser can't properly decode codepoints outside the Unicode Plane 0 into a surrogate pair

    Details

    • Subcomponent:
    • Resolved In Build:
      b07
    • CPU:
      generic
    • OS:
      generic
    • Verification:
      Verified

        Description

        The statement

           System.out.println("\ud840\udc00".codePointAt(0));

        prints

           131072

        because \ud840 (a high surrogate) and \udc00 (a low surrogate) form a surrogate pair that encodes the single code point U+20000.
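        As an illustration of the arithmetic behind that value, the following minimal sketch combines the two surrogates by hand and compares the result with the standard library. The class name SurrogatePairDemo and the helper combine are hypothetical names introduced here; they are not part of the report or of the JDK.

```java
public class SurrogatePairDemo {
    // Hypothetical helper (not from the report): combine a high and a low
    // surrogate into one code point using the standard UTF-16 formula.
    public static int combine(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        String s = "\ud840\udc00";
        System.out.println(s.length());                                      // 2 (two UTF-16 code units)
        System.out.println(s.codePointAt(0));                                // 131072
        System.out.println(combine(s.charAt(0), s.charAt(1)));               // 131072
        System.out.println(Character.toCodePoint(s.charAt(0), s.charAt(1))); // 131072
    }
}
```

        Here 0xD840 - 0xD800 = 0x40, shifted left by 10 bits gives 0x10000, and adding the 0x10000 offset yields 0x20000 = 131072.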

        If one writes

           import javax.swing.JTextPane;
           import javax.swing.text.html.HTMLEditorKit;

           JTextPane htmlPane = new JTextPane();
           htmlPane.setEditorKit(new HTMLEditorKit());

           htmlPane.setText("<html><head></head><body>&#131072;</body></html>");

        the numeric character reference &#131072; is not decoded into the corresponding surrogate pair.

           System.out.println(htmlPane.getText());

        prints

        <html>
          <head>
            
          </head>
          <body>
            &#0;
          </body>
        </html>

        rather than

        <html>
          <head>
            
          </head>
          <body>
            &#55360;&#56320;
          </body>
        </html>


        or at least

        <html>
          <head>
            
          </head>
          <body>
            &#131072;
          </body>
        </html>

                People

                • Assignee:
                  vkarnauk Vladislav Karnaukhov
                • Reporter:
                  jloefflm Johann Löfflmann (Inactive)