  1. JDK
  2. JDK-8145974

XMLStreamWriter produces invalid XML for surrogate pairs on OutputStreamWriter

    Details

    • Subcomponent:
    • Resolved In Build:
      b119
    • CPU:
      x86_64
    • OS:
      windows_7

      Backports

        Description

        FULL PRODUCT VERSION :
        1.7.0_65

        ADDITIONAL OS VERSION INFORMATION :
        Microsoft Windows [Version 6.1.7601]

        A DESCRIPTION OF THE PROBLEM :
        I have narrowed down a problem where our application produced XML that it could not parse back. The XML contained character references with invalid values (XML permits only certain ranges for them). It turned out that these character references are generated specifically for characters outside the BMP, i.e. characters that Java strings encode as a surrogate pair. Further investigation revealed that this happens only when the XMLStreamWriter is constructed with an OutputStreamWriter. When a plain OutputStream is used, the surrogate pairs are encoded as valid UTF-8 multibyte sequences. The error cannot be in the OutputStreamWriter itself, however, since character references are specific to XML, of which the OutputStreamWriter knows nothing.
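
The difference between the invalid and the valid form of the character reference can be illustrated with a small sketch. This is not the JDK's implementation; the class and method names below are hypothetical, and the escaping is deliberately simplistic (it ignores `<`, `&` and so on). It only shows that escaping each UTF-16 `char` separately yields the two surrogate references seen in the report, while escaping per code point yields a single reference.

```java
final class SurrogateRefSketch {

    /** Escapes every non-ASCII UTF-16 char on its own (hypothetical, illustrates the faulty form). */
    static String escapePerChar(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                // A surrogate half ends up as its own reference here.
                out.append("&#x").append(Integer.toHexString(c)).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes every non-ASCII code point, combining surrogate pairs first (hypothetical). */
    static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i); // joins a high/low surrogate pair into one code point
            if (cp > 0x7F) {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // 2 for supplementary characters, otherwise 1
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smiley = "A\uD83D\uDE0AB"; // U+1F60A between two ASCII letters
        System.out.println(escapePerChar(smiley));      // A&#xd83d;&#xde0a;B
        System.out.println(escapePerCodePoint(smiley)); // A&#x1f60a;B
    }
}
```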

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        I am attaching a test program which clearly demonstrates the problem.


        REPRODUCIBILITY :
        This bug can be reproduced always.

        ---------- BEGIN SOURCE ----------
        package com.dramaqueen.exporters;

        import static org.junit.Assert.*;

        import java.io.ByteArrayInputStream;
        import java.io.ByteArrayOutputStream;
        import java.io.InputStream;
        import java.io.OutputStreamWriter;
        import java.io.UnsupportedEncodingException;

        import javax.xml.stream.XMLInputFactory;
        import javax.xml.stream.XMLOutputFactory;
        import javax.xml.stream.XMLStreamException;
        import javax.xml.stream.XMLStreamReader;
        import javax.xml.stream.XMLStreamWriter;

        import org.junit.Test;

        @SuppressWarnings("nls")
        public class StreamVersusWriterTest {

            @Test
            public void streamVersusWriter() {
                String charset = "UTF-8";

                // ByteArrayOutputStream is used here instead of the JDK-internal
                // com.sun.xml.internal.messaging.saaj.util.ByteOutputStream
                ByteArrayOutputStream streamA = new ByteArrayOutputStream();
                ByteArrayOutputStream streamB = new ByteArrayOutputStream();

                XMLOutputFactory factory = XMLOutputFactory.newInstance();
                try {
                    // Writer A: XMLStreamWriter created directly on the OutputStream
                    XMLStreamWriter writerA = factory.createXMLStreamWriter(streamA,
                            charset);
                    generateXML(writerA, charset);

                    // Writer B: XMLStreamWriter created on an OutputStreamWriter
                    OutputStreamWriter streamWriter = new OutputStreamWriter(streamB,
                            charset);
                    XMLStreamWriter writerB = factory.createXMLStreamWriter(
                            streamWriter);
                    generateXML(writerB, charset);

                    // Decode with the same charset rather than the platform default
                    String outputA = streamA.toString(charset);
                    String outputB = streamB.toString(charset);

                    System.out.println("output using OutputStream : " + outputA);
                    System.out.println("output using OutputStreamWriter: " + outputB);

                    // assertEquals(outputA, outputB);

                    readXML(outputA.getBytes(charset), charset);
                    readXML(outputB.getBytes(charset), charset);

                } catch (XMLStreamException e) {
                    e.printStackTrace();
                    // assertTrue(false);
                } catch (UnsupportedEncodingException e) {
                    e.printStackTrace();
                    // assertTrue(false);
                }
            }

            private void generateXML(XMLStreamWriter writer, String charset)
                    throws XMLStreamException {
                // Char sequence containing a smiley which is encoded as a surrogate
                // pair in the Java string
                String sequence = "A😊�Bß";
                writer.writeStartDocument(charset, "1.0");
                writer.writeStartElement("a");
                writer.writeCharacters(sequence);
                writer.writeEndElement();
                writer.writeEndDocument();
                writer.flush();
            }

            private void readXML(byte[] xmlData, String charset)
                    throws XMLStreamException {
                InputStream stream = new ByteArrayInputStream(xmlData);
                XMLInputFactory factory = XMLInputFactory.newInstance();
                XMLStreamReader xmlReader
                        = factory.createXMLStreamReader(stream, charset);
                // Walk through all events; invalid XML raises an XMLStreamException
                while (xmlReader.hasNext()) {
                    xmlReader.next();
                }
            }
        }

        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Use OutputStream, not OutputStreamWriter
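
A minimal sketch of this workaround, assuming the JDK's default StAX implementation: the XMLStreamWriter is created directly on an OutputStream with an explicit encoding, and the result is read back to confirm it parses. The class name and helper methods are hypothetical; only the `javax.xml.stream` calls are standard API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

final class OutputStreamWorkaround {

    /** Writes <a>text</a> through an XMLStreamWriter created on an OutputStream. */
    static byte[] write(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Pass the stream itself plus the encoding; no OutputStreamWriter in between.
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("a");
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
            return out.toByteArray();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the document and returns the concatenated character data. */
    static String readText(byte[] xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "A\uD83D\uDE0AB"; // contains U+1F60A as a surrogate pair
        String roundTripped = readText(write(original));
        System.out.println(original.equals(roundTripped));
    }
}
```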

        1. solution.patch
          3 kB
          Aleksej Efimov
        2. StreamVersusWriterTest.java
          3 kB
          Pallavi Sonal

          Issue Links

            Activity

            psonal Pallavi Sonal added a comment -
            Attached Test case executed on following versions:
            JDK 7u60 - Fail
            JDK 8u66 - Fail
            JDK 8u72 - Fail
            JDK 9ea b93 - Pass

            Here is the output on failed versions:
            output using OutputStream : <?xml version="1.0" encoding="UTF-8"?><a>A😊�Bß</a>
            output using OutputStreamWriter: <?xml version="1.0" encoding="UTF-8"?><a>A&#xd83d;&#xde0a;�Bß</a>
            javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,51]
            Message: Character reference "&#
            at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
            at StreamVersusWriterTest.readXML(StreamVersusWriterTest.java:37)
            at StreamVersusWriterTest.main(StreamVersusWriterTest.java:68)
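
The parse error is consistent with the XML 1.0 rule that a character reference must refer to a character matching the Char production, which excludes the surrogate range U+D800 to U+DFFF. A small sketch of that check follows; the class and method names are hypothetical, the ranges are those of the XML 1.0 Char production.

```java
final class XmlCharCheck {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)       // BMP below the surrogate block
                || (cp >= 0xE000 && cp <= 0xFFFD)     // BMP above the surrogate block
                || (cp >= 0x10000 && cp <= 0x10FFFF); // supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(isXml10Char(0xD83D));  // false: high surrogate
        System.out.println(isXml10Char(0xDE0A));  // false: low surrogate
        System.out.println(isXml10Char(0x1F60A)); // true: the combined code point
    }
}
```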


            Here is the output on JDK 9:
            output using OutputStream : <?xml version="1.0" encoding="UTF-8"?><a>AðŸË?Šï
            ¿½Bß</a>
            output using OutputStreamWriter: <?xml version="1.0" encoding="UTF-8"?><a>AðŸË
            ?Šï¿½Bß</a>
            aefimov Aleksej Efimov added a comment -
            The bug is reproducible on the latest JDK 9 builds.
            aefimov Aleksej Efimov added a comment -
            A patch with a possible solution is attached.
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk9/dev/jaxp/rev/f92e8518bb34
            User: aefimov
            Date: 2016-05-12 22:22:15 +0000
            afomin Alexander Fomin (Inactive) added a comment -
            No issues related to the fix in the recent core-libs nightly. SQE OK to take it in PSU16_03.
            hgupdate HG Updates added a comment -
            URL: http://hg.openjdk.java.net/jdk9/jdk9/jaxp/rev/f92e8518bb34
            User: lana
            Date: 2016-05-18 20:42:16 +0000

              People

              • Assignee:
                aefimov Aleksej Efimov
                Reporter:
                webbuggrp Webbug Group
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved: