JDK-4843787

org.xml.sax.SAXException was thrown when parsing large file

    Details

    • Type: Bug
    • Status: Closed
    • Priority: P4
    • Resolution: Not an Issue
    • Affects Version/s: 1.4.2
    • Fix Version/s: None
    • Component/s: xml

      Description


      Name: gm110360 Date: 04/07/2003


      FULL PRODUCT VERSION :
      java version "1.4.2-beta"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2-beta-b19)
      Java HotSpot(TM) Client VM (build 1.4.2-beta-b19, mixed mode)

      FULL OS VERSION :
      Microsoft Windows XP [Version 5.1.2600]

      (Note: the error also occurs on Windows 98 Second Edition.)

      A DESCRIPTION OF THE PROBLEM :

      Parsing a large file that contains many entity references, using either SAX or DOM, throws an exception: org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.
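
      To gauge whether a given file is likely to reach the limit, one can count its entity references before parsing. The following is a minimal sketch, not part of the original report. It assumes that each named reference such as &lt; or &amp; counts once toward the limit and that numeric character references such as &#183; do not; neither assumption is confirmed by the report. The class name EntityRefCounter is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts named entity references in raw XML text.
final class EntityRefCounter {
    // Matches named references such as &lt; or &nbsp;.
    // Numeric character references (&#183;, &#xB7;) are deliberately skipped (assumption).
    private static final Pattern NAMED_REF =
            Pattern.compile("&[A-Za-z_:][A-Za-z0-9._:-]*;");

    static int countReferences(String xml) {
        Matcher m = NAMED_REF.matcher(xml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // A short stand-in for one escaped ChildNode body.
        String sample = "<ChildNode>&lt;b&gt;con&#183;cin&lt;/b&gt;</ChildNode>";
        int perNode = countReferences(sample);
        System.out.println("named references per ChildNode: " + perNode);
        // Rough estimate of how many copies would exceed a limit of 64,000.
        System.out.println("copies to exceed 64000: " + (64000 / perNode + 1));
    }
}
```

      Multiplying the per-element count by the number of ChildNode copies gives a rough estimate of when the parser would stop.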


      STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :

      Run the source. Please email me for an example test file (testfile.xml).

      If you would rather not email me for the file, here is how to create one:

      1) Create testfile.xml in the directory where you run the code.
      2) Paste the following:

      <?xml version='1.0' encoding='utf-8'?>
      <!--DTD for vocab -->
      <!DOCTYPE FirstNode [
      <!ELEMENT FirstNode (ChildNode)*>
      <!ELEMENT ChildNode (#PCDATA)>
      ]>

      <FirstNode>
      <ChildNode>
      <html><body><a name="1"></a>
      <p><b>concinnity</b></p>
      <blockquote>concinnity was Word of the Day on <a href="http://www.dictionary.com/wordoftheday/archive/2001/08/18.html">August 18, 2001</a>.</blockquote><br>
      <table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&amp;db=wotd" title="Click for more information about this dictionary">Source</a>: <cite>Dictionary.com Word of the Day</cite></td></tr></table>
      <a name="2"></a>

      <TABLE><TR><TD><A NAME="C0548200"><B>con&#183;cin&#183;ni&#183;ty</B></A> &nbsp;&nbsp;<A TITLE="Click for guide to symbols." onClick="ahdpop();return false;" HREF="/help/ahd4/pronkey.html" CLASS="linksrc"><b>Pronunciation Key</b></A>&nbsp;&nbsp;(k<IMG ALT="" SRC="pronkey_files/schwa.gif" height="15" width="6" ALIGN="ABSBOTTOM">n-s<IMG
      ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN="ABSBOTTOM">n<IMG ALT="" SRC="pronkey_files/prime.gif" height="22" width="4" ALIGN="ABSBOTTOM"><IMG ALT="" SRC="pronkey_files/ibreve.gif" height="15" width="7" ALIGN^F
      quot; SRC="pronkey_files/emacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">)<BR>
       <I>n.</I> <I>pl.</I> <B>con&#183;cin&#183;ni&#183;ties </B><OL><LI> Harmony in the arrangement or interarrangement of parts with respect to a whole.</LI>
      <LI> Studied elegance and facility in style of expression: &#147;He has what one character calls &#145;the gifts of concinnity and concision,&#146; that deft swipe with a phrase that can be so
      devastating in children&#148; (Elizabeth Ward).
      </LI>
      <LI>An instance of harmonious arrangement or studied elegance and facility.</LI>
      </OL><BR>
      <HR ALIGN="left" WIDTH="25%">[From Latin<TT> concinnit<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">s</TT>, from<TT> concinn<IMG ALT="" SRC="pronkey_files/amacr.gif" height="15" width="7" ALIGN="ABSBOTTOM">re</TT>, <I>to put in order</I>,
      from<TT> concinnus</TT>, <I>deftly joined</I>.]</TD>
      </TR></TABLE>
      <a name="3"></a>
      <b>concinnity</b><br><br>
       \Con*cin"ni*ty\, n. [L. concinnitas, fr. concinnus
         skillfully put together, beautiful. Of uncertain origin.]
         Internal harmony or fitness; mutual adaptation of parts;
         elegance; -- used chiefly of style of discourse. [R.]
      <br><br>
               An exact concinnit
      ;<table border="0" cellpadding="0" cellspacing="0" width="100%"><tr><td class="src"><a href="/search?q=00-database-info&amp;db=web1913" title="Click for more information about this dictionary">Source</a>: <cite>Webster's Revised Unabridged Dictionary, &copy; 1996, 1998 MICRA, Inc.</cite></td></tr></table>
      </body></html>
      </ChildNode>

      </FirstNode>

      3) Copy and paste the <ChildNode>...</ChildNode> element about 196 times inside <FirstNode>...</FirstNode>.

      When you run the program, the error occurs after about 195 ChildNode elements have been read.
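
      The copy-and-paste steps above can also be scripted. The sketch below is an illustration, not part of the original report. The class name TestFileGenerator and the short CHILD_BODY string are hypothetical; the body is a stand-in for the escaped HTML of one ChildNode, not the submitter's actual content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical generator for a testfile.xml with many ChildNode copies.
final class TestFileGenerator {
    // Stand-in body (assumption): escaped markup, far shorter than the real entry.
    static final String CHILD_BODY = "&lt;p&gt;&lt;b&gt;concinnity&lt;/b&gt;&lt;/p&gt;";

    // Builds the whole document in memory with the given number of ChildNode elements.
    static String build(int copies) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version='1.0' encoding='utf-8'?>\n");
        sb.append("<!DOCTYPE FirstNode [\n");
        sb.append("<!ELEMENT FirstNode (ChildNode)*>\n");
        sb.append("<!ELEMENT ChildNode (#PCDATA)>\n");
        sb.append("]>\n");
        sb.append("<FirstNode>\n");
        for (int i = 0; i < copies; i++) {
            sb.append("<ChildNode>").append(CHILD_BODY).append("</ChildNode>\n");
        }
        sb.append("</FirstNode>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // 196 matches the count in the report; pass another number to override it.
        int copies = args.length > 0 ? Integer.parseInt(args[0]) : 196;
        Files.write(Paths.get("testfile.xml"),
                build(copies).getBytes(StandardCharsets.UTF_8));
    }
}
```

      Because the stand-in body holds far fewer entity references than the real dictionary entry, a much larger copy count is likely needed to reach the limit with this generated file.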

      You can change lines 30 and 31 of the source from:
              test.DOMRead();
              //test.SAXRead();
      to:
              //test.DOMRead();
              test.SAXRead();

      to test the SAX case. In both cases, an exception is thrown.


      EXPECTED VERSUS ACTUAL BEHAVIOR :
      Expected: no error.
      Actual: an exception is thrown when the program runs.

      ERROR MESSAGES/STACK TRACES THAT OCCUR :
      org.xml.sax.SAXException: Fatal Error: URI=null Line=595: Parser has reached the entity expansion limit "64,000" set by the Application.
              at TErrorHandler.fatalError(XMLError.java:198)
              at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3342)
              at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3333)
              at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2667)
              at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2569)
              at org.apache.crimson.parser.Parser2.content(Parser2.java:1980)
              at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
              at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
              at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
              at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:634)
              at org.apache.crimson.parser.Parser2.parse(Parser2.java:333)
              at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
              at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
              at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:76)
              at XMLError.DOMRead(XMLError.java:101)
              at XMLError.main(XMLError.java:30)
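
      The message names an entity expansion limit of 64,000. A possible way to raise it is sketched below; this is not part of the original report. It assumes the JAXP system property entityExpansionLimit (spelled jdk.xml.entityExpansionLimit on newer JDKs) controls this limit and is read when the parser is created. The class name EntityLimitSketch and the value 200000 are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: raise the entity expansion limit, then parse.
final class EntityLimitSketch {
    // Sets both spellings of the property (assumption: the parser in use reads one of them).
    static void setEntityExpansionLimit(String limit) {
        System.setProperty("entityExpansionLimit", limit);         // older name
        System.setProperty("jdk.xml.entityExpansionLimit", limit); // newer name
    }

    // Parses XML text into a DOM tree; failures are rethrown unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Set the property before the factory and parser are created.
        setEntityExpansionLimit("200000");
        Document doc = parse(
                "<FirstNode><ChildNode>&lt;b&gt;x&lt;/b&gt;</ChildNode></FirstNode>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

      Under the same assumption, the property could instead be passed on the command line, for example java -DentityExpansionLimit=200000 XMLError, which leaves the test program unchanged.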


      REPRODUCIBILITY :
      This bug is always reproducible.

      ---------- BEGIN SOURCE ----------
      import java.util.*;
      import org.w3c.dom.*;
      import java.io.*;

      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import javax.xml.parsers.FactoryConfigurationError;
      import javax.xml.parsers.ParserConfigurationException;
      import javax.xml.parsers.*;

      import org.xml.sax.SAXException;
      import org.xml.sax.SAXParseException;
      import org.xml.sax.*;
      import org.xml.sax.helpers.*;
      import org.w3c.dom.*;
      import org.w3c.dom.Document;
      import org.w3c.dom.DOMException;


      public class XMLError {

          private String fname = null;

          public XMLError(String fname) {
              this.fname = fname;
          }
          
          public static void main(String [] argv){
              XMLError test = new XMLError("testfile.xml");
              test.DOMRead();
              //test.SAXRead();
          }

          public void SAXRead(){
              System.out.println("Reading " + fname + "...");
              String data = readFile(fname);
              if(data == null){
                  System.out.println("There is no such file as " + fname);
                  return;
              }
              try{
                  SAXParserFactory factory = SAXParserFactory.newInstance();
                  factory.setValidating(true);
                  SAXParser parser = factory.newSAXParser();
                  //org.xml.sax.helpers.DefaultHandler

                  parser.parse(new ByteArrayInputStream(data.getBytes()), new DefaultHandler(){
                      private CharArrayWriter contents = new CharArrayWriter();
                      private int count;

                      public void characters(char[] ch, int start, int length){
                          contents.write( ch, start, length );
                      }
                      public void endDocument(){
                          System.out.println("Finish: " + count);
                      }
                      public void endElement(String uri, String localName, String qName) {
                          if ( qName.equals( "ChildNode" ) ) {
                              count++;
                              String str = contents.toString();
                              System.out.println("Importing... " + count + " : " + str);
                          }
                      }
                      public void startDocument(){
                          //contents.reset();
                          count = 0;

                      }
                      public void startElement(String uri, String localName, String qName, Attributes attributes){
                          contents.reset();
                          //System.out.println("The name: " + localName + ", qName: " + qName);
                      }

                  });
              }catch(Exception ee){
                  ee.printStackTrace();
              }
          }
          
          public void DOMRead(){
              System.out.println("Reading " + fname + "...");
              String data = readFile(fname);
              if(data == null){
                  System.out.println("There is no such file as " + fname);
                  return;
              }
              int count = 0;
              try {
                  TErrorHandler error = new TErrorHandler();
                  DocumentBuilderFactory factory =
                  DocumentBuilderFactory.newInstance();
                  factory.setValidating(true);
                  factory.setIgnoringElementContentWhitespace(true);

                  //factory.setNamespaceAware(true);
                  //factory.setExpandEntityReferences(false);

                  System.out.println("Parsing xml data...");
                  DocumentBuilder builder = factory.newDocumentBuilder();
                  builder.setErrorHandler(error);
                  Document document = builder.parse(new ByteArrayInputStream(data.getBytes()));
                  Node node;
                  node = document.getFirstChild();
                  if(node == null){
                      return;
                  }
                  System.out.println("Start importing data: ");
                  while(node != null){
                      if(node.getNodeType() == Node.ELEMENT_NODE){
                          if("FirstNode".equalsIgnoreCase(node.getNodeName())) break;
                      }
                      node = node.getNextSibling();
                  }
                  node = node.getFirstChild();
                  String str = null;
                 
                  boolean done = false;
                  while((node != null) && (!done)){
                      str = getValue(node);
                      if(str == null) break;
                      node = node.getNextSibling();
                      count++;
                      if((count % 10) == 0){
                          System.out.print(".");
                      }
                  }
              }catch(Exception e){
                  e.printStackTrace();
              }
              
              System.out.println("\n\nDone: " + count);
          }
          static public String getValue(Node node){
              if(node == null) return null;
              Node node2 = node.getFirstChild();
              if(node2 == null){
                  return "";
              }
              if(node2.getNodeType() != Node.TEXT_NODE) return null;
              return node2.getNodeValue();
          }

          public static String readFile(String fname){
              if((fname == null) || (fname.trim().length() <= 0)){
                  return null;
              }
              BufferedReader in = null;
              String str;
              StringBuffer buf = new StringBuffer();
              try{
                  in = new BufferedReader(new FileReader(fname));
                  while(in.ready()){
                      str = in.readLine();
                      if(str == null) break;
                      buf.append(str + "\n");
                  }
                  in.close();
              }catch(IOException e){
                  //e.printStackTrace();
                  return null;
              }
              return buf.toString();
          }
      }

      class TErrorHandler implements ErrorHandler {
          int errNo = 0;
          String errMessage = "";
          public void resetError(){
              errNo = 0;
              errMessage = "";
          }
          public void setError(String mesg){
              errNo = 1;
              if(mesg == null) return;
              errMessage = errMessage + "\n" + mesg;
          }
          TErrorHandler() {
          }
          private String getParseExceptionInfo(SAXParseException spe) {
              String systemId = spe.getSystemId();
              if (systemId == null) {
                  systemId = "null";
              }
              String info = "URI=" + systemId + " Line=" + spe.getLineNumber() +
              ": " + spe.getMessage();
              return info;
          }
          public void warning(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
              setError("Warning: " + getParseExceptionInfo(sAXParseException));
          }
          public void error(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
              String message = "Error: " + getParseExceptionInfo(sAXParseException);
              throw new SAXException(message);
          }
          public void fatalError(org.xml.sax.SAXParseException sAXParseException) throws org.xml.sax.SAXException {
              String message = "Fatal Error: " + getParseExceptionInfo(sAXParseException);
              throw new SAXException(message);
          }
      }

      ---------- END SOURCE ----------

      CUSTOMER SUBMITTED WORKAROUND :

      None
      (Review ID: 183616)
      ======================================================================
      ###@###.### 2004-07-13

            People

            • Assignee:
              nbajajsunw Neeraj Bajaj (Inactive)
              Reporter:
              gmanwanisunw Girish Manwani (Inactive)