JDK-8144651

Corruption when parsing large XML files chunked over HTTP

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P3
    • Resolution: Duplicate
    • Affects Version/s: 7u71
    • Fix Version/s: None
    • Component/s: xml
    • Labels:

      Description

      FULL PRODUCT VERSION :
      java version "1.7.0_71"
      Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
      Java HotSpot(TM) Client VM (build 24.71-b01, mixed mode, sharing)

      ADDITIONAL OS VERSION INFORMATION :
      Microsoft Windows [Version 5.2.3790]
      Microsoft Windows [Version 6.1.7601]

      A DESCRIPTION OF THE PROBLEM :
      The problem appeared after upgrading the JRE from Java 7u51 to 7u71 without changing any code. It appears to affect JAXB parsing of large XML files (on the order of 20MB or larger) when the stream is chunked over HTTP. Later investigation showed that the error does not occur in 7u67, so it must have been introduced by the changes in 7u71.

      REGRESSION. Last worked in version 1.0.4

      ADDITIONAL REGRESSION INFORMATION:
      java version "1.7.0_51"
      Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
      Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
      Find or create a web server that generates a 20MB or larger XML file and serves it over a chunked HTTP stream. The only one I've found is the one we built, running Tomcat. Generate JAXB classes from the XML schema (preferably one with both strings and numbers). Then use the source code below to connect, read, and parse the stream, and marshal the result back to a file.

      As far as I can tell, removing any one of the following conditions prevents the error: chunked HTTP stream, Java 7u71 or later, large XML, JAXB/StAX parsing. The exceptions are usually thrown when parsing integers or doubles that have had non-numeric characters inserted from other parts of the XML, always from near the chunk boundaries as logged.
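
For the "find or create a web server" step, a minimal sketch using the JDK's built-in com.sun.net.httpserver package might look like the following. It is an assumption-laden stand-in for the reporter's Tomcat server, not a confirmed reproducer: the class name ChunkedXmlServer, the /trends path, and the <trends> root element are hypothetical, and only the attribute names (index, hours, label, trend-mode) are taken from the wire log in this report. Passing a response length of 0 to sendResponseHeaders makes the JDK server use chunked transfer encoding.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Hypothetical test server: streams a large synthetic XML document with chunked transfer encoding. */
final class ChunkedXmlServer {

    /** Starts the server on the given port (0 picks a free port) and serves the document at /trends. */
    static HttpServer start(int port, final int trendCount) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/trends", new HttpHandler() {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                exchange.getResponseHeaders().add("Content-Type", "application/xml");
                // A response length of 0 selects chunked transfer encoding.
                exchange.sendResponseHeaders(200, 0);
                // Closing the response body stream also completes the exchange.
                try (OutputStream out = new BufferedOutputStream(exchange.getResponseBody(), 8192)) {
                    writeXml(out, trendCount);
                }
            }
        });
        server.start();
        return server;
    }

    /** Writes trendCount <trend> elements that mix string and numeric attributes. */
    static void writeXml(OutputStream out, int trendCount) throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<trends>\n".getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < trendCount; i++) {
            String line = "<trend index=\"" + i + "\" hours=\"" + (i * 0.25)
                    + "\" label=\"N\" trend-mode=\"0\"/>\n";
            out.write(line.getBytes(StandardCharsets.UTF_8));
        }
        out.write("</trends>\n".getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling ChunkedXmlServer.start(8080, 400000) should produce a body in the 20MB range, since each element is roughly 60 bytes. Whether this server triggers the corruption is untested; the chunk sizes it emits may differ from the 8192-byte chunks seen in the log.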

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      Successful parsing into the JAXB object. File written contains the same data as the server's XML (normalization of XML likely needed to compare)
      ACTUAL -
      Exceptions are thrown while parsing. The failure is almost always repeatable, but the specific location of the corruption varies between runs. Typically a numeric value (integer or double) picks up non-numeric characters from elsewhere in the XML, always from near the chunk boundaries as logged. Corruption has also been seen in attribute names, though rarely.
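
To check the expected behavior (the written file carries the same data as the server's XML), the two documents need to be compared after normalization. The sketch below is one possible approach using only the JDK's DOM API; the class name XmlCompare and its method names are hypothetical. It drops whitespace-only text nodes and comments, then relies on Node.isEqualNode, which compares attributes regardless of their order.

```java
import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical helper: compares two XML documents while ignoring formatting differences. */
final class XmlCompare {

    /** Parses a document and removes whitespace-only text nodes and comments. */
    static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setCoalescing(true);
        dbf.setIgnoringComments(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(in);
        stripBlankText(doc.getDocumentElement());
        return doc;
    }

    /** Recursively removes text nodes that contain only whitespace (indentation, line breaks). */
    static void stripBlankText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
            child = next;
        }
    }

    /** Returns true when both documents have equal element trees after normalization. */
    static boolean sameXml(InputStream a, InputStream b) throws Exception {
        // Comparing the root elements sidesteps differences in the XML declaration.
        return parse(a).getDocumentElement().isEqualNode(parse(b).getDocumentElement());
    }
}
```

This loads both documents fully into memory, so a 20MB file needs a correspondingly larger heap. It also compares attribute values as strings, so a marshaller that prints a number differently would be reported as a mismatch.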

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
      17:08:09,705 DEBUG wire:77 - << "trend>....OTHER XML...<trend hours=""
      17:08:09,705 DEBUG wire:77 - << "634.0972777777778" datetime="2013-05-21T00:43:48.350Z" t"
      17:08:09,705 DEBUG wire:63 - << "[\r][\n]"
      17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
      17:08:09,705 DEBUG wire:77 - << "rend-mode="0">
      Exception in thread "main" java.lang.NumberFormatException: t34.0972777777778
        at com.sun.xml.internal.bind.DatatypeConverterImpl._parseDouble(DatatypeConverterImpl.java:213)
        at mypackage.Trend_JaxbXducedAccessor_hours.parse(TransducedAccessor_field_Double.java:48)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
        at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
       
      OR

      17:19:12,563 DEBUG wire:63 - << "2000[\r][\n]"
      17:19:12,563 DEBUG wire:77 - << ...OTHER XML...<trend index="5"
      17:19:12,563 DEBUG wire:77 - << "" label="N"
      17:19:12,563 DEBUG wire:63 - << "[\r][\n]"
      Exception in thread "main" java.lang.NumberFormatException: Not a number: N
        at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInt(DatatypeConverterImpl.java:106)
        at com.sun.xml.internal.bind.DatatypeConverterImpl._parseShort(DatatypeConverterImpl.java:118)
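
The "2000[\r][\n]" entries in the wire logs above are HTTP/1.1 chunk-size lines. Chunk sizes are written in hexadecimal, so each chunk here is 0x2000 = 8192 bytes, which gives the spacing at which the corruption would be expected to show up. A small sketch for decoding such a line follows; the class name ChunkSizeLine is hypothetical.

```java
/** Hypothetical helper: decodes an HTTP/1.1 chunk-size line into a byte count. */
final class ChunkSizeLine {

    /** Parses a line such as "2000" or "2000;ext=1", ignoring any chunk extensions and the trailing CRLF. */
    static int parse(String line) {
        String s = line.trim();
        int semi = s.indexOf(';');
        if (semi >= 0) {
            s = s.substring(0, semi).trim();
        }
        // Chunk sizes are hexadecimal per the HTTP/1.1 chunked transfer coding.
        return Integer.parseInt(s, 16);
    }
}
```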

      REPRODUCIBILITY :
      This bug can be reproduced always.

      ---------- BEGIN SOURCE ----------
      import java.io.FileOutputStream;
      import java.io.IOException;
      import java.io.InputStream;

      import javax.xml.bind.JAXBContext;
      import javax.xml.bind.JAXBElement;
      import javax.xml.bind.JAXBException;
      import javax.xml.bind.Unmarshaller;
      import javax.xml.parsers.ParserConfigurationException;
      import javax.xml.stream.XMLInputFactory;
      import javax.xml.stream.XMLStreamException;
      import javax.xml.stream.XMLStreamReader;

      import org.apache.http.HttpResponse;
      import org.apache.http.client.HttpClient;
      import org.apache.http.client.methods.HttpGet;
      import org.apache.http.conn.scheme.PlainSocketFactory;
      import org.apache.http.conn.scheme.Scheme;
      import org.apache.http.conn.scheme.SchemeRegistry;
      import org.apache.http.impl.client.DefaultHttpClient;
      import org.apache.http.impl.conn.BasicClientConnectionManager;
      import org.xml.sax.SAXException;

      public class XML_FAIL2 {
          public static void main(String[] args) throws IOException, SAXException, ParserConfigurationException, JAXBException, XMLStreamException {
              // TODO: Replace the following with a URL returning large XML over chunked HTTP
              String url = "http://someUrlReturningAlargeChunkedXML";
              // TODO: Replace the following with a JAXB object representing the XML expected.
              Class<?> jaxBObjectOfResponse = JaxBObjectOfResponse.class;
              JAXBContext context = JAXBContext.newInstance(jaxBObjectOfResponse);
              Unmarshaller unmarshaller = context.createUnmarshaller();

              // Default StAX factory and a plain-HTTP Apache HttpClient
              XMLInputFactory factory = XMLInputFactory.newInstance();
              SchemeRegistry registry = new SchemeRegistry();
              registry.register(
                      new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
              HttpClient client = new DefaultHttpClient(new BasicClientConnectionManager(registry));
              FileOutputStream parsedFile = new FileOutputStream("out.xml");

              HttpGet method = new HttpGet(url);
              HttpResponse response = client.execute(method);
              InputStream inputStream = response.getEntity().getContent();

              // Parse directly from the chunked response stream; this is where the corruption occurs
              XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
              JAXBElement<?> unmarshalled = unmarshaller.unmarshal(responseReader, jaxBObjectOfResponse);
              if (unmarshalled == null || unmarshalled.getValue() == null) {
                  System.out.println("Null object");
              } else {
                  System.out.println("Non-null object. ");
                  context.createMarshaller().marshal(unmarshalled.getValue(), parsedFile);
              }
              inputStream.close();
              parsedFile.close();
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :
      Using an XML parser other than the default one seems to work: adding stax2-api-3.1.1.jar and woodstox-core-asl-4.2.0.jar to the classpath worked around the problem. Streaming the response to a file and then parsing that file also seems to work.
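
The second workaround (stream the response to a file, then parse the file) could be sketched as follows. This is an illustration under assumptions rather than the reporter's actual code: the class name SpoolThenParse and its methods are hypothetical, and the element count stands in for the JAXB unmarshalling step so that the example needs nothing beyond the JDK.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: buffers a response on disk before handing it to the StAX parser. */
final class SpoolThenParse {

    /** Copies the whole stream to a temporary file so that parsing no longer reads from the network. */
    static File spool(InputStream in) throws IOException {
        File tmp = File.createTempFile("response", ".xml");
        try (OutputStream out = new FileOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return tmp;
    }

    /** Parses the spooled file with StAX and returns the number of start elements seen. */
    static int countStartElements(File file) throws IOException, XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (InputStream in = new FileInputStream(file)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
            return count;
        }
    }
}
```

In the reproducer above, the stream from response.getEntity().getContent() would be passed to spool, and the XMLStreamReader given to unmarshaller.unmarshal would be created from a FileInputStream over the returned file instead of from the HTTP stream.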

              People

              • Assignee:
                aefimov Aleksej Efimov
                Reporter:
                webbuggrp Webbug Group
              • Votes:
                0
                Watchers:
                2
