Bug Database
Bug ID: 6690015
Votes 6
Synopsis XML Parse attributes with &gt; in attribute value causes wrong order
Category jaxp:sax
Reported Against
Release Fixed
State 11-Closed, duplicate of 6518733, bug
Priority: 3-Medium
Related Bugs
Submit Date 17-APR-2008
Description
FULL PRODUCT VERSION :
:~$ java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows XP service pack 2
Linux <hostname> 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
The problem depends on at least two factors:

1. The number of attributes in the parsed element
2. The presence of allowed entities in the attribute values, e.g. &gt;

A similar (but not identical) bug, 6567432, turned up in a bug database search, but it was declared fixed for Java 6 update 3, and I am using Java 6 update 5.
===================================================

Problem:

When an XML element is parsed, and that element has:
    1. enough attributes (my tests used 16 attributes)
    2. attributes whose values contain allowed entities, e.g. &gt;
the retrieval of attributes results in:
    1. mixed-up attribute name/attribute value pairs
    2. sometimes attribute values merging with attribute names, resulting in generally confused output
    3. absolutely NO exception or error being thrown; wrong output is the only symptom.

This bug does NOT occur in Java 1.4.2.


STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile and run the provided test application against the provided XML, first with Java 1.4 and then with Java 6, and compare the results. (Save the provided XML as a file and change the filename in the example to point to that file.)
Java 1.4 produces correct output;
Java 6 produces garbage.

package astraia.test;

import java.io.FileInputStream;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
 
public class Example
{
    public static void main(String[] argv)
    {
        try
        {
            FileInputStream fis = new FileInputStream("/home/sean/Desktop/chris/lessNoInternat.xml");

            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(fis));
            Element root = doc.getDocumentElement();
            NodeList textnodes = root.getElementsByTagName("text");
            int len = textnodes.getLength();
            int index = 0;
            int attindex = 0;
            int attrlen = 0;
            NamedNodeMap attrs = null;

            while (index < len)
            {
                Element te = (Element) textnodes.item(index);
                attrs = te.getAttributes();
                attrlen = attrs.getLength();
                attindex = 0;
                Node node = null;

                while (attindex < attrlen)
                {
                    node = attrs.item(attindex);
                    System.out.println("attr: " + node.getNodeName() + " is shown holding value: " + node.getNodeValue());
                    attindex++;
                }
                index++;
                System.out.println("-------------");
            }
            fis.close();
        }
        catch (Exception e)
        {
            System.out.println("we've had an exception, type " + e);
        }
    }
}

xml file:

<?xml version="1.0" encoding="UTF-8"?>
<block>
<lang>
<text dna="8233" ro="hello, and i'll type some normal characters in (&gt;=1.5 mm) ro" it="here to make sure international characters don't play a part(&gt;=1.5mm) it" tr="make sure international characters don't play a part (&gt;=1.5 mm) tr" pt_br="make sure international characters don't play a part (&gt;=1,5 mm) pt_br" de="make sure international characters don't play a part (&gt;=1,5 mm) de" el="make sure international characters don't play a part (&gt;= 1.5 mm) el" zh_cn="make sure international characters don't play a part¿&gt;= 1.5 mm¿ zh_cn" pt="make sure international characters don't play a part (&gt;=1,5 mm) pt" bg="make sure international characters don't play a part (&gt;= 1.5 mm) bg" fr="make sure international characters don't play a part (&gt;= 1,5 mm) fr" en="make sure international characters don't play a part (&gt;= 1.5 mm) en" ru="make sure international characters don't play a part (&gt;=1.5 ¿¿) ru" es="make sure international characters don't play a part (&gt;=1.5 mm) es" ja="make sure international characters don't play a part¿&gt;=1.5mm¿ ja" nl="make sure international characters don't play a part (&gt;= 1,5 mm) nl" />
</lang>
</block>
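
For readers who want to try this without saving a file, the following is a minimal file-free sketch of the same kind of check. It is not part of the original report: the class name AttributeRoundTripCheck, the generated attribute names (a0, a1, ...) and the sample value text are all made up for illustration. It builds an element with many attributes whose values contain &gt;, parses it from memory, and counts the attributes whose parsed value differs from what was written. It is an assumption that this generated input triggers the bug on affected Java 6 builds; on a JDK that contains the fix, the count should be 0.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not from the bug report: checks that attribute
// values containing the &gt; entity survive a parse unchanged.
class AttributeRoundTripCheck {

    // The unescaped value we expect to read back for attribute number i.
    static String expectedValue(int i) {
        return "some normal characters (>= 1.5 mm) a" + i;
    }

    // Builds <block><lang><text a0="..." a1="..." .../></lang></block>
    // with every '>' in a value written as the &gt; entity.
    static String buildXml(int attributeCount) {
        StringBuilder xml = new StringBuilder("<block><lang><text");
        for (int i = 0; i < attributeCount; i++) {
            String escaped = expectedValue(i).replace(">", "&gt;");
            xml.append(" a").append(i).append("=\"").append(escaped).append('"');
        }
        return xml.append("/></lang></block>").toString();
    }

    // Parses the generated document and returns how many attributes
    // came back with a value other than the one that was written.
    static int countMismatches(int attributeCount) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(buildXml(attributeCount))));
            Element text = (Element) doc.getDocumentElement()
                    .getElementsByTagName("text").item(0);
            int mismatches = 0;
            for (int i = 0; i < attributeCount; i++) {
                String actual = text.getAttribute("a" + i);
                if (!expectedValue(i).equals(actual)) {
                    System.out.println("attr a" + i + " is shown holding value: " + actual);
                    mismatches++;
                }
            }
            return mismatches;
        } catch (Exception e) {
            throw new IllegalStateException("parsing the generated XML failed", e);
        }
    }

    public static void main(String[] args) {
        // 16 attributes, matching the count used in the report's XML file.
        System.out.println("mismatched attributes: " + countMismatches(16));
    }
}
```

Looking attributes up by name, rather than by index as the test application does, keeps the comparison independent of the order in which the parser returns them.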

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -

The expected result is that when I iterate through the attributes and print their names and values, they match what I see when I look at the XML file.
Below is a run of the application using Java 1.4.
Each line shows on the left the attribute currently being examined,
followed by the value it is shown holding:

attr: <attribute-name> is shown holding value: <attribute-value>


attr: dna is shown holding value: 8233
attr: ro is shown holding value: hello, and i'll type some normal characters in (>=1.5 mm) ro
attr: it is shown holding value: here to make sure international characters don't play a part(>=1.5mm) it
attr: tr is shown holding value: make sure international characters don't play a part (>=1.5 mm) tr
attr: pt_br is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_br
attr: de is shown holding value: make sure international characters don't play a part (>=1,5 mm) de
attr: el is shown holding value: make sure international characters don't play a part (>= 1.5 mm) el
attr: zh_cn is shown holding value: make sure international characters don't play a part¿>= 1.5 mm¿ zh_cn
attr: pt is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt
attr: bg is shown holding value: make sure international characters don't play a part (>= 1.5 mm) bg
attr: fr is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: en is shown holding value: make sure international characters don't play a part (>= 1.5 mm) en
attr: ru is shown holding value: make sure international characters don't play a part (>=1.5 ¿¿) ru
attr: es is shown holding value: make sure international characters don't play a part (>=1.5 mm) es
attr: ja is shown holding value: make sure international characters don't play a part¿>=1.5mm¿ ja
attr: nl is shown holding value: make sure international characters don't play a part (>= 1,5 mm) nl
-------------

ACTUAL -
The actual results, as seen when this example program is run with Java 6 update 5,
show attribute names and values mixed up and sometimes garbled together. For example, the value reported for attribute 'en' no longer matches the original content; it is instead the value of another attribute with the name of yet another attribute appended at the end.


Each line shows on the left the attribute currently being examined,
followed by the value it is shown holding:

attr: <attribute-name> is shown holding value: <attribute-value>


attr: bg is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: de is shown holding value: make sure international characters don't play a part (>=1,5 mm) de
attr: dna is shown holding value: 8233
attr: el is shown holding value: make sure international characters don't play a part (>= 1.5 mm) el
attr: en is shown holding value: make sure international characters don't play a part (>=1.5 ¿¿) run
attr: es is shown holding value: make sure international characters don't play a part¿>=1.5mm¿ jaes
attr: fr is shown holding value: make sure international characters don't play a part (>= 1,5 mm) fr
attr: it is shown holding value: here to make sure international characters don't play a part(>=1.5mm) it
attr: ja is shown holding value: make sure international characters don't play a part¿>=1.5mm¿ ja
attr: nl is shown holding value: make sure international characters don't play a part (>= 1,5 mm) nl
attr: pt is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt
attr: pt_br is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_br
attr: ro is shown holding value: hello, and i'll type some normal characters in (>=1.5 mm) ro
attr: ru is shown holding value: make sure international characters don't play a part (>=1.5 ¿¿) ru
attr: tr is shown holding value: make sure international characters don't play a part (>=1.5 mm) tr
attr: zh_cn is shown holding value: make sure international characters don't play a part (>=1,5 mm) pt_cn
-------------


ERROR MESSAGES/STACK TRACES THAT OCCUR :
No error message or exception

REPRODUCIBILITY :
This bug can be reproduced always.

CUSTOMER SUBMITTED WORKAROUND :
no workaround known

Release Regression From : 5.0
The above release value was the last known release where this 
bug was not reproducible. Since then there has been a regression.
Posted Date : 2008-04-17 08:49:43.0
Work Around
N/A
Evaluation
Thanks for the comments and votes on this issue.

The suggested code change is the same as that made in the patch for 6518733. Here's the change:
https://jaxp-sources.dev.java.net/source/browse/jaxp-sources/xml-xerces/java/src/com/sun/org/apache/xerces/internal/impl/XMLScanner.java?r1=1.7&r2=1.8

I have verified, using the submitted test and XML file, that the issue has been fixed. Unfortunately, the patch for 6518733 did not get into JDK 6 until update 14. I apologize for the inconvenience.

After JavaOne, we plan to improve the process and bring JDK 7 and 6 in sync with JAXP, to resolve the problem that has affected users quite often, where JAXP fixes were not integrated into the JDK.

Please also note that you may download the latest jaxp build from java.net and use the endorsed mechanism or place the jaxp-ri jar file on bootclasspath to override the jaxp implementation in jdk.
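
When overriding the bundled JAXP implementation this way, it can help to confirm which parser classes are actually being loaded. The following is a minimal sketch, not part of the original evaluation; the class name JaxpImplCheck and its describe method are made up for illustration. It reports the concrete DocumentBuilderFactory class and, where the runtime exposes it, the location that class was loaded from.

```java
import java.security.CodeSource;

import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not from the bug report: reports which JAXP
// DocumentBuilderFactory implementation the running JVM picked up.
class JaxpImplCheck {

    static String describe() {
        Class<?> impl = DocumentBuilderFactory.newInstance().getClass();
        // The code source can be null, typically for classes that
        // were loaded by the bootstrap class loader.
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "(no code source reported)"
                : source.getLocation().toString();
        return impl.getName() + " loaded from " + location;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running this before and after installing the replacement jar should show whether the factory class name or its location changes; if neither does, the override is probably not taking effect.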
Posted Date : 2009-05-20 01:01:38.0
Comments
  

Submitted On 21-APR-2008
svaens
I am the submitter. 
For easier reproduction of the problem, I provide here a link to TWO xml files.
http://sean.freeshell.org/java/xmlfiles.tar.gz

One of these files contains ampersands, and the other does not.
Running the Java test application shown above against each of these files in turn shows the difference in output caused by the bug.
Testing the same files with Java 1.4 shows no difference in output between the two input files.
I will keep this link active as long as possible, or until the bug is resolved.



Submitted On 29-JAN-2009
Markus_Keller
Still broken in jdk6_11.

Note: Could be a duplicate of bug 6567432, since the fix proposed in that bug also fixes this bug:

The fix is trivial: At the end of com.sun.org.apache.xerces.internal.impl.XMLScanner.getStringBuffer(), insert
    fStringBufferIndex++;
before
    return tmpObj;


Submitted On 15-MAY-2009
udittmer
Seems an easy fix; any chance of getting this fixed in an upcoming Java 6 release?


