Description
FULL PRODUCT VERSION :
C:\Programme\Java\jdk1.6.0_03\bin>java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) Client VM (build 1.6.0_03-b05, mixed mode)
ADDITIONAL OS VERSION INFORMATION :
Windows XP SR-2
A DESCRIPTION OF THE PROBLEM :
RFC 3629 states that "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."
The current UTF-8 decoder does not reject the invalid sequences from "ED A0 80" to "ED BF BF". Instead, it decodes them into surrogate code units, as a CESU-8 decoder would.
This may be by design, but it should at least be documented prominently, and any surrogate pairs the decoder creates should be valid.
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1.) Decode the following byte sequence with the UTF-8 decoder: "ED A0 80 ED BF BF"
2.) Decode the following byte sequence with the UTF-8 decoder: "ED BF BF ED A0 80"
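The two steps above could be reproduced with a small program along these lines. This is only a sketch: the class name Utf8SurrogateCheck and the helper describe are made up for illustration, and the outcome depends on the JDK in use (the report describes 1.6.0_03; other releases may behave differently).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

class Utf8SurrogateCheck {
    // Hypothetical helper: decodes the bytes strictly and returns either
    // "malformed" or the decoded UTF-16 code units as "decoded: U+XXXX ...".
    static String describe(byte[] input) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never yields more UTF-16 code units than input bytes.
        CharBuffer out = CharBuffer.allocate(input.length);
        CoderResult result = decoder.decode(ByteBuffer.wrap(input), out, true);
        if (result.isMalformed()) {
            return "malformed";
        }
        if (result.isError()) {
            return "error";
        }
        out.flip();
        StringBuilder sb = new StringBuilder("decoded:");
        while (out.hasRemaining()) {
            sb.append(String.format(" U+%04X", (int) out.get()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] step1 = {(byte) 0xED, (byte) 0xA0, (byte) 0x80,
                        (byte) 0xED, (byte) 0xBF, (byte) 0xBF};
        byte[] step2 = {(byte) 0xED, (byte) 0xBF, (byte) 0xBF,
                        (byte) 0xED, (byte) 0xA0, (byte) 0x80};
        System.out.println("1.) " + describe(step1));
        System.out.println("2.) " + describe(step2));
    }
}
```

Setting CodingErrorAction.REPORT matters here: with the default REPLACE action used by convenience methods such as new String(bytes, "UTF-8"), a malformed sequence would be replaced silently rather than reported through CoderResult.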
EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
1.) CoderResult.isMalformed()
2.) CoderResult.isMalformed()
ACTUAL -
1.) valid surrogate pair: U+D800 + U+DFFF
2.) invalid surrogate pair: U+DFFF + U+D800
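For reference, the code units listed above follow from the generic three-byte UTF-8 bit layout (1110xxxx 10xxxxxx 10xxxxxx) when no range check is applied. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and this is not the JDK's decoder code.

```java
class ThreeByteArithmetic {
    // Combines the payload bits of a three-byte UTF-8 sequence
    // without validating the resulting range.
    static int decode3(int b1, int b2, int b3) {
        return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
    }

    public static void main(String[] args) {
        int first = decode3(0xED, 0xA0, 0x80);  // 0xD800, a high surrogate
        int second = decode3(0xED, 0xBF, 0xBF); // 0xDFFF, a low surrogate
        System.out.println(String.format("U+%04X U+%04X", first, second));
        // A surrogate pair is valid only in the order high, then low.
        System.out.println(Character.isSurrogatePair((char) first, (char) second));
        System.out.println(Character.isSurrogatePair((char) second, (char) first));
    }
}
```

A decoder that follows RFC 3629 would reject any three-byte sequence whose result falls in the range U+D800 to U+DFFF instead of returning it.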
REPRODUCIBILITY :
This bug can always be reproduced.
Posted Date : 2009-01-28 10:22:12.0
Evaluation
The latest Unicode recommendation regarding this issue is at
http://www.unicode.org/versions/corrigendum1.html
which states:
"To address this issue, the Unicode Technical Committee has modified the definition of UTF-8 to forbid conformant implementations from interpreting non-shortest forms for BMP characters, and clarified some of the conformance clauses."
The "non-shortest forms" of supplementary characters are still "allowed" to be decoded (although they must not be generated when encoding). The UTF-8 charset implementation was recently updated (#4486841) to follow this recommendation.
The decision for now is that we are not going to update the implementation to prohibit the non-shortest forms for supplementary characters. We will reconsider this position should the Standard change or new security concerns arise in the future.
Posted Date : 2009-03-04 18:23:28.0