JDK-8268457

XML Transformer outputs Unicode supplementary character incorrectly to HTML


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: 17
    • Fix Version/s: 18
    • Component/s: xml
    • Labels: None
    • Resolved In Build: b04
    • CPU: generic
    • OS: generic

      Description

      I found the following XML Transformer bug:

      1. The XML Transformer bundled with OpenJDK is used.
      2. The input XML contains a Unicode supplementary character.
      3. The output format is HTML.

      In this case, the generated HTML contains an incorrect character.
      If the output format is text, the generated text is correct.
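      A minimal reproducer for the HTML case might look like the following. It is a sketch, not the attached SurrogateTest.java (whose contents are not shown here): the class name SurrogateRepro, the helper serializeAsHtml, and the input document are assumptions. It uses an identity Transformer from the standard javax.xml.transform API and sets the output method to "html" so that the HTML serializer handles the attribute value.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical reproducer class; not the attached SurrogateTest.java.
public class SurrogateRepro {
    // Copies the input XML unchanged (identity transform, no stylesheet)
    // and serializes it with the "html" output method.
    static String serializeAsHtml(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "html");
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers TransformerConfigurationException from newTransformer() as well
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // U+20B9F is a supplementary character; a Java String stores it as a surrogate pair
        String ch = new String(Character.toChars(0x20B9F));
        String xml = "<html><body><input id=\"tag1\" value=\"" + ch + "\"/></body></html>";
        String html = serializeAsHtml(xml);
        System.out.println(html);
        // True if the output contains the stray reference reported in this issue
        System.out.println("contains &#55362;: " + html.contains("&#55362;"));
    }
}
```

      On an affected build (this report lists 17), the printed value attribute can be compared with the Expected and Actual values given in the reproduction steps.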

      The Transformer has two serializer classes, ToHTMLStream and ToTextStream.
      In ToTextStream, a surrogate pair is converted to &#xxxx; format,
      but ToHTMLStream does not convert it to &#xxxx; format.

      I think ToHTMLStream should handle surrogate pairs the same way ToTextStream does.
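      The conversion proposed here can be sketched as follows. This is a hypothetical helper (escapeSupplementary in an assumed class SurrogateEscaper), not the code of ToTextStream or of the eventual fix, and it assumes decimal references. It reads the string by code point, so a valid high/low surrogate pair is combined into one supplementary code point before anything is written, and that code point is emitted as a single numeric character reference.

```java
// Hypothetical helper; not the JDK's ToTextStream or ToHTMLStream code.
public class SurrogateEscaper {
    // Writes every supplementary code point (above U+FFFF) as &#<decimal>;
    // and copies all other characters through unchanged.
    static String escapeSupplementary(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt joins a valid high/low surrogate pair into one code point;
            // an unpaired surrogate comes back as its own char value
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
            // Advance by 2 for a pair, 1 otherwise
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String value = new String(Character.toChars(0x20B9F)); // U+20B9F
        System.out.println(escapeSupplementary(value)); // prints &#134047;
    }
}
```

      Because the pair is combined before the escaping decision, the high surrogate is never written on its own, which is what the stray &#55362; in the Actual output appears to be.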

      To reproduce:

      1. Compile and run the attached SurrogateTest.java. The result files are stored in the testdata directory.
      2. Check the value of the input tag in case01out.html.
         Expected: <input id="tag1" value="𠮟">
         Actual: <input id="tag1" value="𠮟&#55362;">
      3. The ToTextStream result (case02out.txt) is correct.
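      The number in the Actual output can be checked with a little UTF-16 arithmetic. The character in the test input is U+20B9F, outside the Basic Multilingual Plane, so a Java String stores it as two chars. The sketch below (the class name SurrogateMath and helper toSurrogates are assumptions) computes the pair by hand and cross-checks it with Character.highSurrogate and Character.lowSurrogate. The high surrogate is 0xD842, which is 55362 in decimal, so the stray &#55362; appears to be the first half of the pair written out on its own.

```java
// Hypothetical illustration class for the surrogate arithmetic.
public class SurrogateMath {
    // UTF-16 encoding of a supplementary code point: subtract 0x10000,
    // then put the top 10 bits in the high surrogate and the bottom 10 bits in the low one.
    static int[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;
        return new int[] {0xD800 + (v >>> 10), 0xDC00 + (v & 0x3FF)};
    }

    public static void main(String[] args) {
        int cp = 0x20B9F; // the character used in the test input
        int[] pair = toSurrogates(cp);
        System.out.println(pair[0]); // 55362 (0xD842)
        System.out.println(pair[1]); // 57247 (0xDF9F)
        // Cross-check against the JDK's own surrogate helpers
        System.out.println(pair[0] == Character.highSurrogate(cp)
                && pair[1] == Character.lowSurrogate(cp)); // true
    }
}
```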


              People

              Assignee: Joe Wang (joehw)
              Reporter: Masanori Yano (myano)
              Votes: 0
              Watchers: 3
