EVALUATION
Surprisingly, the problem is caused by an incorrectly encoded JVM_CONSTANT_Utf8 entry in the constant pool of one specific classfile (org.apache.xerces.impl.xpath.regex.ParserForXMLSchema). It contains the constant string
private static final String DIGITS = "09\u0660\u0669\u06F0\u06F9\u0966\u096F\u09E6\u09EF\u0A66\u0A6F\u0AE6\u0AEF" +"\u0B66\u0B6F\u0BE7\u0BEF\u0C66\u0C6F\u0CE6\u0CEF\u0D66\u0D6F\u0E50\u0E59\u0ED0\u0ED9" +"\u0F20\u0F29";
which is encoded in the classfile as:
30 39 e0 99 a0 e0 99 a9 e0 9b b0 e0 9b b9 e0 a5 |09..............|
a6 e0 a5 af e0 a7 a6 e0 a7 af e0 a9 a6 e0 a9 af |................|
e0 ab a6 e0 ab af e0 ad a6 e0 ad af e0 af a7 e0 |................|
af af e0 b1 a6 e0 b1 af e0 b3 a6 e0 b3 af e0 b5 |................|
a6 e0 b5 af e0 b9 90 e0 b9 99 e0 bb 90 e0 bb 99 |................|
e0 bc a0 e0 bc a9
Note that the third character (\u0660) in the DIGITS string is encoded using three bytes (e0 99 a0), but it should be encoded using only two bytes (d9 a0).
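To make the difference concrete, here is a minimal Java sketch (not part of the proposed fix) that prints both encodings of \u0660. The class and method names are hypothetical. It only illustrates the byte patterns; it does not reproduce whatever compiler produced the classfile. As a side observation, only characters below \u0800 can be affected this way, since characters from \u0800 upward (such as \u0966 or \u0F29) legitimately need three bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class OverlongUtf8Demo {
    // Shortest-form UTF-8 for a single non-surrogate character,
    // as produced by the standard Java encoder.
    static byte[] shortest(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8);
    }

    // Forces the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx even when
    // the value would fit in fewer bytes (the "overlong" form seen in the
    // broken classfile).
    static byte[] threeByte(char c) {
        return new byte[] {
            (byte) (0xE0 | (c >> 12)),
            (byte) (0x80 | ((c >> 6) & 0x3F)),
            (byte) (0x80 | (c & 0x3F))
        };
    }

    // Formats bytes as lowercase hex separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(shortest('\u0660')));  // d9 a0
        System.out.println(hex(threeByte('\u0660'))); // e0 99 a0
    }
}
```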
When such a classfile is read by the VM, it stores this UTF-8 string in the symbol table via oopFactory::new_symbols() in ClassFileParser::parse_constant_pool_entries(). When the constant string represented by this symbol is accessed, it is converted to an interned String object in constantPoolOopDesc::string_at_impl(). Later, when the profiler requests the classfile for ParserForXMLSchema, JvmtiClassFileReconstituter::write_class_file_format() invokes constantPoolOopDesc::copy_cpool_bytes().
For a JVM_CONSTANT_String entry, this code gets the UTF-8 from the interned String object. That UTF-8 string is encoded correctly and uses two bytes (d9 a0) for the third character. It is then used to look up the corresponding symbol in the SymbolTable, but the symbol table contains the UTF-8 string from the classfile, which does not match the correct UTF-8 string from the interned String object. Therefore
symbolOop sym = SymbolTable::lookup_only(str, (int) strlen(str), hash);
returns NULL and an assertion subsequently fails in
idx1 = tbl->symbol_to_value(sym);
I considered several possible fixes. Since the problem is caused by an incorrectly generated classfile (probably by jikes) and is very rare, I chose to use a brute-force search through 'tbl' when SymbolTable::lookup_only() returns NULL. This way the fix adds no memory or CPU overhead in the case where the classfile is OK.
The proposed fix is attached. Lines with a // DEBUG suffix will be deleted; I used them for debugging the issue.
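The idea of "exact lookup first, brute-force scan only on a miss" can be sketched as follows. This is an illustrative Java model, not the attached HotSpot patch: the class SymbolIndexSketch, its methods, and the choice to compare entries by their decoded characters are all assumptions made for the example. The decoder assumes well-formed one-, two-, or three-byte sequences and deliberately does not reject overlong forms.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the symbol-to-index table ('tbl' in the text).
final class SymbolIndexSketch {
    private final List<byte[]> entries = new ArrayList<>();
    private final Map<String, Integer> exact = new HashMap<>();

    // Maps raw bytes one-to-one onto chars so they can serve as a map key.
    private static String key(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    // Registers the bytes exactly as they appeared in the classfile.
    int add(byte[] utf8) {
        entries.add(utf8);
        exact.put(key(utf8), entries.size() - 1);
        return entries.size() - 1;
    }

    // Lenient decoder: accepts 1-, 2- and 3-byte sequences without checking
    // for shortest form. Assumes the input is otherwise well formed.
    static String decodeLenient(byte[] b) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < b.length) {
            int b0 = b[i] & 0xFF;
            if (b0 < 0x80) {
                sb.append((char) b0);
                i += 1;
            } else if ((b0 & 0xE0) == 0xC0) {
                sb.append((char) (((b0 & 0x1F) << 6) | (b[i + 1] & 0x3F)));
                i += 2;
            } else {
                sb.append((char) (((b0 & 0x0F) << 12)
                        | ((b[i + 1] & 0x3F) << 6)
                        | (b[i + 2] & 0x3F)));
                i += 3;
            }
        }
        return sb.toString();
    }

    // Fast path: exact byte match. Slow path (rare): linear scan comparing
    // decoded characters. Returns -1 if nothing matches.
    int indexOf(byte[] utf8) {
        Integer hit = exact.get(key(utf8));
        if (hit != null) {
            return hit;
        }
        String wanted = decodeLenient(utf8);
        for (int i = 0; i < entries.size(); i++) {
            if (decodeLenient(entries.get(i)).equals(wanted)) {
                return i;
            }
        }
        return -1;
    }
}
```

With this shape, a well-formed classfile never reaches the linear scan, which mirrors the stated goal of adding no overhead to the common case.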
The broken classfile can be found in http://downloads.sourceforge.net/lportal/liferay-portal-tomcat-6.0-5.1.2.zip
Unzip the liferay-portal-tomcat-6.0-5.1.2.zip file; org.apache.xerces.impl.xpath.regex.ParserForXMLSchema.class is in ./lib/ext/xercesImpl.jar