Description
FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)
FULL OS VERSION :
Microsoft Windows XP [Version 5.1.2600]
A DESCRIPTION OF THE PROBLEM :
When an RTF file contains a hexadecimal escape, RTFEditorKit either converts that 8-bit character into a bogus character (one whose value is far above 255) or drops the character entirely. The escape sequences are of the \'XY kind. \'a0 through \'ff seem to work fine; \'80 through \'9f, however, cause problems.
http://www.biblioscape.com/rtf15_spec.htm#Heading46 describes these 8-bit hexadecimal escapes.
Some characters, for example \'97, drop out of the input ENTIRELY, as if they were not in the RTF file at all. Others, such as \'80, appear, but as (char)1026. See below for a full list (in source code form) of these "oddballs".
/** Returns the value that RTFEditorKit should have found. Note that
    you'll never see this return (char)145, (char)147, (char)148,
    (char)150, (char)151, or (char)152, because they simply
    DISAPPEAR.
*/
public static char getTheCorrectValue(char resultOfBug) {
    switch ((int)resultOfBug) {
        case 346: return (char)140;
        case 347: return (char)156;
        case 352: return (char)138;
        case 356: return (char)141;
        case 357: return (char)157;
        case 353: return (char)154;
        case 377: return (char)143;
        case 378: return (char)159;
        case 381: return (char)142;
        case 382: return (char)158;
        case 402: return (char)131;
        case 710: return (char)136;
        case 1026: return (char)128;
        case 1027: return (char)129;
        case 1106: return (char)144;
        case 8117: return (char)146;
        case 8126: return (char)149;
        case 8218: return (char)130;
        case 8222: return (char)132;
        case 8224: return (char)134;
        case 8225: return (char)135;
        case 8230: return (char)133;
        case 8240: return (char)137;
        case 8249: return (char)139;
        case 8250: return (char)155;
        case 8482: return (char)153;
        // Added so the method compiles: any other character is
        // assumed to be unaffected by the bug and is returned as-is.
        default: return resultOfBug;
    }
}
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Open the following RTF file with RTFEditorKit, reading it into a DefaultStyledDocument. Iterate over the characters in the document, and you'll see that only the newline '\n' appears. \'97, i.e. (char)151, is gone entirely.
{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fnil MS Sans Serif;}{\f1\fnil\fprq2 TibetanMachine;}}
\deflang1033\pard\plain\f1\fs48\cf0 \'97\par }
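The steps above could be scripted roughly as follows. This is only a sketch, not the reporter's test program: the class name RtfHexEscapeRepro and the helpers readRtf and describeChars are made up for illustration, and it uses StandardCharsets, so it assumes a newer JDK than the 1.4.1 build named in this report. It reads the RTF text into a DefaultStyledDocument and prints every character as a (char)N code so that missing or altered characters are easy to spot.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

final class RtfHexEscapeRepro {

    /** Reads RTF source with RTFEditorKit and returns the document's plain text. */
    static String readRtf(String rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        DefaultStyledDocument doc = new DefaultStyledDocument();
        // The sample file is pure ASCII, so US_ASCII is enough for the bytes.
        kit.read(new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII)), doc, 0);
        return doc.getText(0, doc.getLength());
    }

    /** Formats each character as "(char)N", separated by spaces. */
    static String describeChars(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append("(char)").append((int) text.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // The RTF file from this report, with backslashes doubled for Java.
        String rtf =
            "{\\rtf1\\ansi\\deff0\\deftab720{\\fonttbl{\\f0\\fnil MS Sans Serif;}"
            + "{\\f1\\fnil\\fprq2 TibetanMachine;}}\n"
            + "\\deflang1033\\pard\\plain\\f1\\fs48\\cf0 \\'97\\par }";
        System.out.println(describeChars(readRtf(rtf)));
    }
}
```

On an affected JDK the printed list should contain no (char)151 entry, matching the behavior described above.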
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
\'80 should become (char)128, \'9f should become (char)159, et cetera.
ACTUAL -
\'80 became (char)1026, \'97 disappeared, et cetera. See description.
ERROR MESSAGES/STACK TRACES THAT OCCUR :
None appear.
REPRODUCIBILITY :
This bug can be reproduced always.
CUSTOMER SUBMITTED WORKAROUND :
This works for MANY documents, but not ALL:
Open the RTF file as a text file. Search for \'XY and replace it with \uRST, where RST is the decimal value of the hexadecimal digits XY. E.g., "\'97" becomes "\u151".
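The search-and-replace above could be automated along these lines. This is a hedged sketch: the class RtfHexEscapeWorkaround and the method replaceHexEscapes are hypothetical names, and limiting the rewrite to \'80 through \'9f (the range reported as broken) is an assumption, not something the workaround specifies. It emits exactly the \uN form shown in the example and adds nothing after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RtfHexEscapeWorkaround {

    // Matches \'XY where XY is in the range 80..9f (assumed scope of the rewrite).
    private static final Pattern HEX_ESCAPE =
        Pattern.compile("\\\\'([89][0-9a-fA-F])");

    /** Rewrites each matching hex escape as a backslash, 'u', and the decimal value. */
    static String replaceHexEscapes(String rtf) {
        Matcher m = HEX_ESCAPE.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps the backslash literal in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement("\\u" + value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, passing the RTF fragment \'97\par through replaceHexEscapes yields \u151\par, while escapes such as \'a0 are left untouched.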
(Incident Review ID: 189587)
======================================================================