Bug ID: 6730652 CharsetEncoder.canEncode(char) returns incorrect values for some Charsets

Details
Type: Bug
Submit Date: 2008-07-28
Status: Resolved
Updated Date: 2010-04-02
Project Name: JDK
Resolved Date: 2009-08-28
Component: core-libs
OS: solaris_9
Sub-Component: java.nio.charsets
CPU: sparc
Priority: P3
Resolution: Fixed
Affected Versions: 2.0
Fixed Versions: 7

Related Reports

Sub Tasks

Description
CharsetEncoder.canEncode(ch) != CharsetEncoder.canEncode("" + ch) for some Charsets?

I ran this simple test with JDK 5 and JDK 6 on Mac OS X and Windows XP:

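import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.SortedMap;

public class CanEncodeTest {
    public static void main(String[] args) {
        SortedMap<String, Charset> map = Charset.availableCharsets();
        for (Charset cs : map.values()) {
            try {
                CharsetEncoder encoder = cs.newEncoder();

                char ch = '\u5185';
                // Both overloads should give the same answer for a single char.
                if (encoder.canEncode(ch) != encoder.canEncode("" + ch))
                    System.out.println("Encoder: " + cs.name() + " failed");
            } catch (Exception e) {
                // Charsets that do not support encoding throw here; skip them.
                //System.out.println("Encoder: " + cs.name() + " Error");
            }
        }
    }
}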

It failed for the following encoders:
JDK 5 (1.5.0_15) Windows XP:
Encoder: ISO-2022-KR failed
Encoder: x-IBM1381 failed
Encoder: x-IBM1383 failed
Encoder: x-IBM942 failed
Encoder: x-IBM942C failed
Encoder: x-IBM943 failed
Encoder: x-IBM943C failed

JDK 6 (1.6.0_04) Windows XP:
Encoder: x-ISO-2022-CN-CNS failed

JDK 5 (1.5.0_13) Mac OS X:
Encoder: MacRoman failed
Encoder: x-IBM1381 failed
Encoder: x-IBM1383 failed
Encoder: x-IBM942 failed
Encoder: x-IBM942C failed
Encoder: x-IBM943 failed
Encoder: x-IBM943C failed


Comments
EVALUATION

The failure in the attached test case was fixed a while ago by other changes; however, FindCanEncodeBug still fails. The cause is that ISO2022.canEncode() invokes ISOEncoder.canEncode(), where the ISOEncoder is the EUC_TW encoder in the ISO2022_CN_CNS case. ISO2022_CN_CNS supports only the first 3 planes of EUC_TW, so canEncode() fails when it is given a character from any other plane.
2009-08-14
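
The source of FindCanEncodeBug is not included in this report. As a rough illustration only, a broader scan in the same spirit might compare the two canEncode overloads for every non-surrogate BMP character of one charset. The class name CanEncodeScan and the helper countDisagreements are hypothetical names chosen for this sketch; only public java.nio.charset API is used.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CanEncodeScan {

    /** Counts non-surrogate BMP chars for which the two canEncode overloads disagree. */
    static int countDisagreements(Charset cs) {
        CharsetEncoder encoder = cs.newEncoder();
        int count = 0;
        // Use an int loop variable so the loop terminates after Character.MAX_VALUE.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            if (Character.isSurrogate(ch)) {
                continue; // lone surrogates are malformed input, a separate question
            }
            if (encoder.canEncode(ch) != encoder.canEncode(String.valueOf(ch))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical default: the charset named in this report.
        String name = args.length > 0 ? args[0] : "x-ISO-2022-CN-CNS";
        if (!Charset.isSupported(name) || !Charset.forName(name).canEncode()) {
            System.out.println(name + " is not available for encoding on this runtime");
            return;
        }
        int n = countDisagreements(Charset.forName(name));
        System.out.println(name + ": " + n + " disagreement(s)");
    }
}
```

Passing a charset name as the first argument scans that charset instead. A non-zero count on a given runtime would point at the kind of inconsistency described in this report.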
EVALUATION

The only failure in the latest 6ux is "Encoder: x-ISO-2022-CN-CNS failed".
The root cause is that the x-ISO-2022-CN-CNS encoder fails to encode the CNS p3 character, which the test code point \u5185 is.
2009-05-03
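
To see which of the two canEncode answers matches real behaviour on a given runtime, one option is to compare both against a strict encode of the same character. The sketch below is not part of the original report; the class name CanEncodeCheck and its helpers are hypothetical. It relies on the documented default that a new CharsetEncoder reports unmappable characters, so CharsetEncoder.encode(CharBuffer) throws CharacterCodingException instead of substituting a replacement.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CanEncodeCheck {

    /** True if canEncode(char) and canEncode(CharSequence) agree for ch. */
    static boolean overloadsAgree(Charset cs, char ch) {
        CharsetEncoder encoder = cs.newEncoder();
        return encoder.canEncode(ch) == encoder.canEncode(String.valueOf(ch));
    }

    /** True if a strict encode of ch succeeds without an exception. */
    static boolean actuallyEncodes(Charset cs, char ch) {
        try {
            // A fresh encoder uses CodingErrorAction.REPORT, so unmappable input throws.
            cs.newEncoder().encode(CharBuffer.wrap(String.valueOf(ch)));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: the charset and code point named in this report.
        String name = args.length > 0 ? args[0] : "x-ISO-2022-CN-CNS";
        char ch = '\u5185';
        if (!Charset.isSupported(name) || !Charset.forName(name).canEncode()) {
            System.out.println(name + " is not available for encoding on this runtime");
            return;
        }
        Charset cs = Charset.forName(name);
        CharsetEncoder encoder = cs.newEncoder();
        System.out.println("canEncode(char):         " + encoder.canEncode(ch));
        System.out.println("canEncode(CharSequence): " + encoder.canEncode(String.valueOf(ch)));
        System.out.println("strict encode succeeds:  " + actuallyEncodes(cs, ch));
        System.out.println("overloads agree:         " + overloadsAgree(cs, ch));
    }
}
```

On a runtime that contains the fix, the first three values would be expected to match each other; the actual output depends on the JDK version and the installed charsets.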


