Submitted On 03-SEP-2004
Bernhard.Mandl
This is certainly a good idea but only a small step in the right direction.
I think we need a more complete solution for code point support:
1.) We need a CodePoint class that wraps the code point int.
Java is supposed to be a strongly typed language, and using int for code points is not what I would call "type safe".
All methods of class "Character" that take an "int codePoint" should be available as methods in class CodePoint.
CodePoint objects should be pooled so that using "==" is valid for comparing CodePoint instances.
(Construction with "new" should be disallowed, just like with enums.)
Maybe CodePoint could be implemented as an enum where each valid code point is one enumerated constant?
I don't know how this would perform, but conceptually it would be very elegant.
2.) A method "count()" should be added to String that returns the number of code points in the string.
("codePointCount()" would be easier to read but is quite long to type.)
I even think "length()" should be deprecated and replaced by "charCount()".
All programs that use "length()" to get the number of characters in a string are no longer correct, and
deprecating "length()" would at least produce a compiler warning that tells you so.
3.) Finally it would be nice to have "public Iterable<CodePoint> codePoints()" in class String.
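A minimal sketch of the three proposals above, written against the existing JDK. `CodePoint`, `of`, `count`, and `codePoints` are hypothetical names (no such class exists in the JDK), only a few `Character` methods are mirrored, and the pool is filled lazily so that `==` works without allocating every code point up front.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical CodePoint type; not part of the JDK.
public final class CodePoint {
    // Lazily populated pool: of(x) returns the same instance for the same value,
    // so "==" is a valid comparison. The pool only grows, never shrinks.
    private static final ConcurrentHashMap<Integer, CodePoint> POOL = new ConcurrentHashMap<>();

    private final int value;

    // Private constructor: instances come only from of(), as with enum constants.
    private CodePoint(int value) {
        this.value = value;
    }

    public static CodePoint of(int value) {
        if (!Character.isValidCodePoint(value)) {
            throw new IllegalArgumentException("Invalid code point: " + value);
        }
        return POOL.computeIfAbsent(value, CodePoint::new);
    }

    // A few of the Character methods that take an int code point, mirrored here.
    public int value() { return value; }
    public boolean isLetter() { return Character.isLetter(value); }
    public boolean isSupplementary() { return Character.isSupplementaryCodePoint(value); }
    public int charCount() { return Character.charCount(value); }

    @Override
    public String toString() { return new String(Character.toChars(value)); }

    // Proposal 2: number of code points rather than UTF-16 chars.
    public static int count(String s) {
        return s.codePointCount(0, s.length());
    }

    // Proposal 3: iterate a String by code point, reading it in place.
    public static Iterable<CodePoint> codePoints(String s) {
        return () -> new Iterator<CodePoint>() {
            private int index = 0; // position in UTF-16 units

            @Override
            public boolean hasNext() { return index < s.length(); }

            @Override
            public CodePoint next() {
                if (!hasNext()) throw new NoSuchElementException();
                int cp = s.codePointAt(index);     // combines a surrogate pair if present
                index += Character.charCount(cp);  // advance by 1 or 2 chars
                return of(cp);
            }
        };
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD0Aa"; // 'a', U+1D50A (a surrogate pair), 'a'
        System.out.println(s.length()); // 4 UTF-16 chars
        System.out.println(count(s));   // 3 code points
        for (CodePoint cp : codePoints(s)) {
            System.out.println(Integer.toHexString(cp.value()) + " letter=" + cp.isLetter());
        }
        System.out.println(of(0x1D50A) == of(0x1D50A)); // true: pooled instances
    }
}
```

A literal enum with one constant per code point would likely run into class-file limits (constant-pool indices are 16 bits wide, while there are 1,114,112 code points), which is why this sketch uses a private constructor plus a pooled factory instead.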
Submitted On 24-MAY-2005
alvint
I don't see a need for the 'codePoints' method if you're using the 'foreach' construct. As mentioned in related bug 6275004, this form should iterate over code points:
for (int c : s) { ... }
while this form should iterate over characters:
for (char c : s) { ... }
Submitted On 24-MAY-2005
alvint
(continued)
This way, you wouldn't need to figure out how to add some backing collection for code points to String; the Iterable.iterator method could simply return an iterator backed by the existing char[] data.
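`for (int c : s)` does not compile as written, because String does not implement Iterable. The sketch below approximates the proposal with a hypothetical helper `codePointsOf` whose iterator reads the string in place and holds only an index, so no backing collection is created. Generics cannot take a primitive type, so it yields boxed Integer values that the loop unboxes to int.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class CodePointLoop {
    // Hypothetical helper: an Iterable view over a CharSequence that walks
    // code points in place; nothing is copied.
    static Iterable<Integer> codePointsOf(CharSequence s) {
        return () -> new Iterator<Integer>() {
            private int i = 0; // index into the UTF-16 sequence

            @Override
            public boolean hasNext() { return i < s.length(); }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                int cp = Character.codePointAt(s, i); // joins a surrogate pair if present
                i += Character.charCount(cp);         // advance 1 or 2 chars
                return cp;
            }
        };
    }

    public static void main(String[] args) {
        String s = "x\uD83D\uDE00y"; // 'x', U+1F600, 'y'

        int chars = 0;
        for (char c : s.toCharArray()) chars++;      // iterates UTF-16 chars
        int codePoints = 0;
        for (int c : codePointsOf(s)) codePoints++;  // iterates code points

        System.out.println(chars + " " + codePoints); // prints "4 3"

        // Since Java 8, CharSequence.codePoints() offers the same walk as an IntStream.
        System.out.println(s.codePoints().count());   // prints "3"
    }
}
```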
Submitted On 10-OCT-2007
GarretWilson
I wish, I wish that we could just change char to be 32 bits. As things stand, we take two bytes and concatenate them to make a full char; then we take two of those chars (a surrogate pair) to make a complete Unicode code point; then we do more manipulation to turn those chars back into bytes so that we can store them as UTF-8; and then we percent-encode the result so that we can stick it in a URI. A single Unicode code point has suddenly turned into a long string of characters. And this is what we're going to build the web on for the rest of eternity, all because someone decided back in the dark ages that a byte would forever remain 8 bits and that a char would forever remain 16 bits.
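To make the chain concrete, the sketch below follows one supplementary code point (U+1F600, chosen only as an example) through the layers described: two UTF-16 chars, four UTF-8 bytes, and finally twelve characters of percent-encoding. The helper `percentEncode` is hypothetical and escapes every byte, which is stricter than URIs require for plain ASCII.

```java
import java.nio.charset.StandardCharsets;

public class CodePointLayers {
    // Hypothetical helper: percent-encodes every UTF-8 byte of one code point.
    static String percentEncode(int codePoint) {
        byte[] utf8 = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (byte b : utf8) {
            out.append(String.format("%%%02X", b & 0xFF)); // each byte becomes "%XX"
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // outside the 16-bit range of char

        // UTF-16: Java needs a surrogate pair (two chars) for this code point.
        char[] utf16 = Character.toChars(codePoint);
        System.out.printf("UTF-16: 0x%04X 0x%04X%n", (int) utf16[0], (int) utf16[1]); // 0xD83D 0xDE00

        // UTF-8: the same code point is stored as four bytes.
        byte[] utf8 = new String(utf16).getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + utf8.length); // 4

        // URI: each of those bytes becomes three ASCII characters.
        System.out.println("URI form: " + percentEncode(codePoint)); // %F0%9F%98%80
    }
}
```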
PLEASE NOTE: JDK 6 was formerly known as Project Mustang.