JDK-8140747

Data corruption when parsing XML using StAX/Xerces

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P3
    • Resolution: Duplicate
    • Affects Version/s: 7u71
    • Fix Version/s: None
    • Component/s: xml
    • Subcomponent:
    • CPU:
      x86
    • OS:
      linux

      Description

      FULL PRODUCT VERSION :
      java version "1.7.0_71"
      Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

      java version "1.7.0_72"
      Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

      java version "1.8.0_20"
      Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
      Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

      java version "1.8.0_25"
      Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
      Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)


      ADDITIONAL OS VERSION INFORMATION :
      Linux hwd 3.16.3 #10 SMP PREEMPT Sun Sep 28 00:13:58 PDT 2014 x86_64 GNU/Linux

      A DESCRIPTION OF THE PROBLEM :
      When parsing XML with the StAX API, data can be corrupted after each invocation of "read(byte[], int, int)" on the underlying InputStream, depending on how many bytes that call actually returns.
      The Xerces implementation appears to overwrite its own internal buffer, leading to corrupted or inconsistent data. The bug is silent: no exception is thrown.

      The following JRE versions are currently affected:
      - 7u71
      - 7u72
      - 8u20
      - 8u25

      7u67 and 8u11 are not affected.

      REGRESSION. Last worked in version 7u67

      ADDITIONAL REGRESSION INFORMATION:
      java version "1.7.0_71"
      Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 24.71-b01, mixed mode)

      java version "1.7.0_72"
      Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
      Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)

      java version "1.8.0_20"
      Java(TM) SE Runtime Environment (build 1.8.0_20-b26)
      Java HotSpot(TM) 64-Bit Server VM (build 25.20-b23, mixed mode)

      java version "1.8.0_25"
      Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
      Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Execute the provided repro case.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      It should print "rugs"
      ACTUAL -
      It prints "bugs" when using the affected JREs

      REPRODUCIBILITY :
      This bug can always be reproduced.

      ---------- BEGIN SOURCE ----------
      import java.io.ByteArrayInputStream;
      import java.io.FilterInputStream;
      import java.io.IOException;
      import java.io.InputStream;
      import java.nio.charset.Charset;

      import javax.xml.stream.XMLInputFactory;
      import javax.xml.stream.XMLStreamReader;

      /*
       * Correct output (7u67,8u11)
       * rugs
       *
       * Incorrect output (7u71,7u72,8u20,8u25)
       * bugs
       */
      public class XmlReaderBug {

          private static final int BYTES_PER_READ = 6;

          private static final String XML =
              "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
              "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

          public static void main(String[] args) throws Exception {
              final InputStream xmlStream = new ByteArrayInputStream(XML.getBytes(Charset.forName("UTF-8")));
              final InputStream throttledXmlStream = new ThrottledInputStream(xmlStream, BYTES_PER_READ);

              final XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
              final XMLStreamReader xmlStreamReader = xmlFactory.createXMLStreamReader(throttledXmlStream);
              xmlStreamReader.next();

              // bugs or rugs?
              System.out.println(xmlStreamReader.getAttributeValue(null, "likes"));
          }

          // An InputStream implementation that limits the number of bytes read by read(byte[], int, int)
          private static class ThrottledInputStream extends FilterInputStream {
              private final int bytesPerRead;

              public ThrottledInputStream(InputStream stream, int bytesPerRead) {
                  super(stream);
                  this.bytesPerRead = bytesPerRead;
              }

              @Override
              public int read(byte[] b, int off, int len) throws IOException {
                  if (off < 0 || len < 0 || len > b.length - off) {
                      throw new IndexOutOfBoundsException();
                  } else if (len == 0) {
                      return 0;
                  }

                  // Limit bytes read
                  int bytesToRead = Math.min(bytesPerRead, len);

                  // Ensure deterministic behavior (similar to org.apache.commons.io.IOUtils.read)
                  // Useless for this test case, but convenient for consistently reproducing
                  // the bug with other stream implementations
                  int totalBytesRead = 0;
                  int bytesRead = 0;
                  do {
                      bytesRead = Math.max(0, in.read(b, off + totalBytesRead, bytesToRead));
                      bytesToRead -= bytesRead;
                      totalBytesRead += bytesRead;
                  } while (bytesRead > 0);

                  // No more bytes
                  if (totalBytesRead == 0) {
                      return -1;
                  }

                  return totalBytesRead;
              }
          }
      }

      ---------- END SOURCE ----------
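
The repro case fixes the read size at 6 bytes. Since the description says the corruption depends on how many bytes each read returns, a sweep over read sizes may help show which sizes trigger it on a given JRE. The following is only a minimal sketch under stated assumptions: the names ReadSizeSweep, CappedInputStream, parseLikes and findCorruptingReadSizes are hypothetical and not part of the original report, and CappedInputStream is a simplified stand-in for the repro's ThrottledInputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original report.
public class ReadSizeSweep {

    // Caps every bulk read at a fixed number of bytes
    // (a simplified stand-in for the repro's ThrottledInputStream).
    static final class CappedInputStream extends FilterInputStream {
        private final int cap;

        CappedInputStream(InputStream in, int cap) {
            super(in);
            this.cap = cap;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(cap, len));
        }
    }

    // Parses the document with reads capped at 'cap' bytes and returns
    // the "likes" attribute of the first element.
    static String parseLikes(byte[] xml, int cap) throws XMLStreamException {
        InputStream in = new CappedInputStream(new ByteArrayInputStream(xml), cap);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            reader.next(); // advance from START_DOCUMENT to the first element
            return reader.getAttributeValue(null, "likes");
        } finally {
            reader.close();
        }
    }

    // Returns every read size in [1, maxCap] for which the parsed value
    // differs from the expected one. A parser exception is not caught
    // here and would abort the sweep.
    static List<Integer> findCorruptingReadSizes(String xml, String expected, int maxCap)
            throws XMLStreamException {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        List<Integer> corrupting = new ArrayList<>();
        for (int cap = 1; cap <= maxCap; cap++) {
            if (!expected.equals(parseLikes(bytes, cap))) {
                corrupting.add(cap);
            }
        }
        return corrupting;
    }
}
```

Calling findCorruptingReadSizes with the XML string from the repro case and the expected value "rugs" would return the list of read sizes that produce a wrong attribute value. On an unaffected JRE the list should presumably be empty.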

      CUSTOMER SUBMITTED WORKAROUND :
      - Do not use the affected versions of the JRE.
      - Using InputStreams that do not return too few bytes at a time seems to make the issue vanish. However, I am not sure whether this is really the case or whether it just makes the issue less likely to occur.
      Since the bug is silent, and given that most InputStreams make no guarantees about how many bytes are actually read by each invocation of read(), I would only recommend staying on earlier versions of the JDK.
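
For cases where staying on an earlier JDK is not possible, the second workaround could be approximated by draining the source stream into memory first and handing the parser a ByteArrayInputStream, which returns as many bytes as are requested while data remains. This is a hedged sketch, not a verified fix: as noted above, it is unclear whether large reads remove the problem or only make it less likely. The class and method names (BufferedXmlWorkaround, readFully, readLikesAttribute) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original report.
public class BufferedXmlWorkaround {

    // Drains the stream completely, tolerating arbitrarily short reads.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Buffers the whole document before parsing, so the parser never
    // sees the short reads of the original stream. Assumption: this only
    // sidesteps the trigger described in the report; it is not confirmed
    // to eliminate the bug.
    static String readLikesAttribute(InputStream source)
            throws IOException, XMLStreamException {
        InputStream buffered = new ByteArrayInputStream(readFully(source));
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(buffered);
        try {
            reader.next(); // advance from START_DOCUMENT to the first element
            return reader.getAttributeValue(null, "likes");
        } finally {
            reader.close();
        }
    }
}
```

The trade-off is that the entire document is held in memory, which gives up the main benefit of streaming for large inputs.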



              People

              • Assignee:
                aefimov Aleksej Efimov
                Reporter:
                webbuggrp Webbug Group