Bug Database
Bug ID: 6536111
Votes: 35
Synopsis: SAX parser throws OutOfMemoryError
Category: jaxp:sax
Reported Against:
Release Fixed: 1.4, 6u14(b03) (Bug ID: 2173432)
State: 11-Closed, Verified, bug
Priority: 2-High
Related Bugs:
Submit Date: 19-MAR-2007
Description
FULL PRODUCT VERSION :
java version "1.6.0"
Java(TM) SE Runtime Environment (build 1.6.0-b105)
Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)


A DESCRIPTION OF THE PROBLEM :
When parsing huge XML files (> 200 MB) with SAX, Java 6 runs out of memory because the whole input file is stored in memory. Java 1.5 and the current Xerces version 2.9.0 work fine.
I assume that there is a bug in XMLDocumentScannerImpl. It has a flag, fReadingDTD, indicating that the DTD is currently being read. While this flag is true, refresh(int) adds characters to a buffer. It seems the end of the DTD is not recognized, so the whole XML file is added to the buffer.
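
The suspected failure mode can be illustrated with a toy model. This is not the Xerces source: the class DtdBufferModel, its methods, and the 8192-character window are assumptions made up for illustration, and only the names fReadingDTD and refresh(int) come from the report. The sketch shows how a buffer that is appended to on every refresh grows with the input size if the flag marking the DTD is never cleared.

```java
// Toy model only: NOT the Xerces implementation. It mimics the pattern the
// reporter suspects: a refresh listener that copies every outgoing read
// window into a buffer for as long as a "reading DTD" flag stays set.
final class DtdBufferModel {
    private final StringBuilder dtdBuffer = new StringBuilder();
    private boolean readingDtd; // stands in for fReadingDTD

    void startDtd() { readingDtd = true; }
    void endDtd()   { readingDtd = false; }

    // Called whenever the (hypothetical) scanner is about to refill its window.
    void refresh(char[] window, int length) {
        if (readingDtd) {
            dtdBuffer.append(window, 0, length);
        }
    }

    int buffered() { return dtdBuffer.length(); }

    // Pushes one window for the DOCTYPE and then `chunks` windows of content.
    static int simulate(int chunks, boolean endOfDtdDetected) {
        DtdBufferModel model = new DtdBufferModel();
        char[] window = new char[8192]; // assumed window size
        model.startDtd();
        model.refresh(window, window.length);
        if (endOfDtdDetected) {
            model.endDtd();
        }
        for (int i = 0; i < chunks; i++) {
            model.refresh(window, window.length);
        }
        return model.buffered();
    }

    public static void main(String[] args) {
        System.out.println("flag cleared: " + simulate(1000, true) + " chars buffered");
        System.out.println("flag stuck:   " + simulate(1000, false) + " chars buffered");
    }
}
```

In the second case the buffered size is proportional to the number of windows read, which would match a heap that fills up in step with the file size.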

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Run the code below, which creates a large XML file in the temp directory (e.g. /var/tmp); the OutOfMemoryError will show.

Parse it with the standard SAXParser using at least an EntityResolver that resolves the SystemId.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Should work without any OutOfMemory errors
ACTUAL -
OutOfMemory error

ERROR MESSAGES/STACK TRACES THAT OCCUR :
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.refresh(XMLDocumentScannerImpl.java:1493)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.invokeListeners(XMLEntityScanner.java:2070)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanLiteral(XMLEntityScanner.java:1063)
        at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:974)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanAttribute(XMLDocumentFragmentScannerImpl.java:1537)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1314)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
        at webbugstestcases.jaxp.sax.inc920008.SAXParserTest.main(SAXParserTest.java:71)
Java Result: 1


---------- BEGIN SOURCE ----------
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SAXParserTest {
    private static final String DTD =
            "<!ELEMENT config  (config*,entry*)*>\n"
                    + "<!ATTLIST config key CDATA #REQUIRED>\n"
                    + "<!ELEMENT entry (#PCDATA)>\n"
                    + "<!ATTLIST entry key CDATA #REQUIRED type CDATA "
                    + "#REQUIRED value CDATA #REQUIRED isnull CDATA #IMPLIED >";

    private static final EntityResolver RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            InputSource is = new InputSource(new StringReader(DTD));
            return is;
        }
    };

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, FileNotFoundException, IOException {
        // create a huge XML file
        File test = File.createTempFile("test", "xml");
        test.deleteOnExit();
        BufferedWriter out = new BufferedWriter(new FileWriter(test));
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<!DOCTYPE config SYSTEM "
                + "\"org/knime/core/node/config/XMLConfig.dtd\">\n");
        out.write("<config key=\"root\">\n");
        for (int i = 0; i < 1000000; i++) {
            out.write("<config key=\"" + i + "\">");
            out.write("<entry key=\"datacell\" type=\"xstring\" "
                    + "value=\"org.knime.core.data.def.IntCell\"/>\n");
            out.write("</config>\n");
        }
        out.write("</config>");
        out.close();
       
        // try to parse it
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();

        XMLReader reader = parser.getXMLReader();
        reader.setEntityResolver(RESOLVER);

        // java.lang.OutOfMemoryError: Java heap space, even with 256MB heap
        reader.parse(new InputSource(new FileInputStream(test)));
    }
}
---------- END SOURCE ----------


REPRODUCIBILITY :
This bug can be reproduced always.
Posted Date : 2007-03-20 00:01:20.0
Work Around
N/A
Evaluation
To answer all the requests to fix this issue, I'm raising the priority to 2. We should investigate it as soon as possible.
Posted Date : 2008-05-28 22:20:16.0

Fix is ready. Needs to get a review and regression test. We will then request an integration into a JDK6 update release as soon as possible.
Posted Date : 2008-07-02 04:48:45.0

Fix is verified in JAXP 1.4 on java.net. Will request integration into a JDK6 update release.
Posted Date : 2008-07-15 06:17:01.0

I appreciate all the concerns and votes for this issue. The fix is now integrated into the workspace for jdk6 update 14, which is scheduled to be released in the mid-May timeframe.

Meanwhile, you may use the endorsed mechanism to override the bundled JAXP implementation with the JAXP jars downloadable from java.net.
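
To check whether such an override actually took effect, it can help to print which SAXParserFactory implementation is loaded and where it came from. A minimal sketch follows; the class name JaxpImplCheck is hypothetical, and it assumes the endorsed directory is supplied through the java.endorsed.dirs system property of JDK 6.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: reports the concrete JAXP factory class and its origin.
final class JaxpImplCheck {
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        // Classes bundled with the JDK typically report no code source.
        String origin = (src == null) ? "the JDK itself" : String.valueOf(src.getLocation());
        return c.getName() + " from " + origin;
    }

    public static void main(String[] args) {
        System.out.println(describe(SAXParserFactory.newInstance().getClass()));
    }
}
```

Running it once normally and once with -Djava.endorsed.dirs pointing at the directory that holds the downloaded jars should print a different class name or location if the override is picked up.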
Posted Date : 2009-02-26 18:56:08.0
Comments
  

Submitted On 17-JUL-2007
This is really a huge blocker in the Sun SAX implementation. I will definitely vote for it. Currently you cannot parse big XML files.


Submitted On 24-SEP-2007
I have written StAX code that, it appears, this bug now completely breaks in JDK6. When I use JDK5 (and the BEA StAX jars) my code works again. This is kind of a major problem, no? This means my StAX code cannot (easily) run under JDK6.


Submitted On 24-SEP-2007
Note: when I say it is completely broken, I mean that it compiles and it parses files, but on big files it crashes with this exact same exception. My XML file is about 266 MB. I am using:

        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader parser = factory
                .createXMLStreamReader(new FileInputStream(filename));

... then the standard parser.hasNext() / parser.next() loop, and the code throws this exception in next().

The XMLStreamReader (StAX) should NOT need to store the file in memory, it should be streaming through the file and thus require almost no memory at all.
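
For reference, the hasNext()/next() loop described above might look like the following sketch. The class and method names are hypothetical, and the input is a tiny in-memory document rather than a 266 MB file; it only shows the pull-style iteration, which keeps no reference to earlier events and is therefore expected to need roughly constant memory.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: counts start tags with a StAX pull loop.
final class StaxElementCounter {
    static int countStartElements(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader parser = factory.createXMLStreamReader(in);
        int count = 0;
        try {
            // Each event is handled and then dropped; nothing is accumulated.
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            parser.close();
        }
        return count;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<config><entry/><entry/></config>";
        System.out.println(countStartElements(new StringReader(xml)));
    }
}
```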


Submitted On 15-OCT-2007
Is there any update on this issue? This is really a blocker in our environment.


Submitted On 19-NOV-2007
bennini
I am also getting the same problem when parsing very large XML files under JRE 1.6.

check the post here:

http://forum.java.sun.com/thread.jspa?messageID=9977989

which has more information about the problem as well as JProfiler screenshots. I am parsing files that are roughly 1 GB. Under Java 1.4 and 1.5, everything works great: memory consumption never goes above 20 MB.


Submitted On 27-NOV-2007
Workaround:

Download xerces from http://xerces.apache.org/xerces2-j/download.cgi , add xml-apis.jar and xercesImpl.jar to the classpath.
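
With the Xerces jars on the classpath, the Apache factory can also be requested by name instead of relying on provider lookup. A hedged sketch: org.apache.xerces.jaxp.SAXParserFactoryImpl is the factory class shipped in xercesImpl.jar, the two-argument SAXParserFactory.newInstance overload requires Java 6 or later, and the names XercesSelector and preferXerces are made up for illustration. The helper falls back to the platform default when Xerces is not available.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: prefers Apache Xerces, falls back to the JDK default.
final class XercesSelector {
    private static final String XERCES_FACTORY =
            "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    static SAXParserFactory preferXerces() {
        try {
            // A null class loader means the context class loader is used.
            return SAXParserFactory.newInstance(XERCES_FACTORY, null);
        } catch (FactoryConfigurationError e) {
            // xercesImpl.jar is not on the classpath; use the bundled parser.
            return SAXParserFactory.newInstance();
        }
    }

    public static void main(String[] args) {
        System.out.println(preferXerces().getClass().getName());
    }
}
```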


Submitted On 28-APR-2008
gorguda
An incredibly huge bug that should be fixed ASAP. It took us a week to discover it (we thought it was a memory leak in our own code).
Going back to 1.5 fixed everything.


Submitted On 29-MAY-2008
ubschmidt
Another workaround:

Use Woodstox (http://woodstox.codehaus.org/).


Submitted On 03-OCT-2008
LeoRR
Does anybody know which jdk6 update includes the fix for this bug?


Submitted On 23-JAN-2009
stephane_aboab
Hello. I need this fix for my customer. When, and in which Java SE revision, will it be released? Thanks.


Submitted On 05-FEB-2009
Hello, we are urgently waiting for a fix in JRE 6 and JDK 6. Why does it take so long? What are the definitions of prio 2? I don't trust java anymore, that's terrible.


Submitted On 04-APR-2009
pifpafpuf
It's my guess that the endorsed mechanism does not work, likely because the problematic class(es) are not in an endorsed package, but rather in com.sun.org.XXX.

Harald.


Submitted On 13-APR-2009
Our developer spent hours trying to figure out what this problem was. Glad it is not on our side, but we would appreciate a fix being released ASAP.


Submitted On 14-APR-2009
joehw12
See the comment in the Evaluation section: the fix will be in update 14. To also answer Harald's comment, the endorsed mechanism does work for the JAXP package.


Submitted On 18-MAY-2009
njoneill
I'm running into what looks like a related problem when trying to parse a document with a very large CDATA section.  This is with JDK 6 release 14 beta 6.  Apparently the parser is attempting to buffer the entire CDATA section as a whole.  Is this a bug, or are we limited to using CDATA sections small enough to fit into memory?


java.lang.OutOfMemoryError: Java heap space
at com.sun.org.apache.xerces.internal.util.XMLStringBuffer.append(XMLStringBuffer.java:205)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.scanData(XMLEntityScanner.java:1380)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanCDATASection(XMLDocumentFragmentScannerImpl.java:1646)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2977)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:395)



PLEASE NOTE: JDK6 is formerly known as Project Mustang