|
|
Bug ID:
|
4093056
|
|
Votes
|
4
|
|
Synopsis
|
RFE: Add facilities for fast character-encoding conversion
|
|
Category
|
java:char_encodings
|
|
Reported Against
|
1.1.4, 1.1.5, 1.2beta2, 1.2beta3
|
|
Release Fixed
|
|
|
State
|
11-Closed, duplicate of 4287465,
request for enhancement
|
|
Priority:
|
5-Very Low
|
|
Related Bugs
|
4131655, 4287465
|
|
Submit Date
|
14-NOV-1997
|
|
Description
|
ByteToCharConverter and CharToByteConverter were removed from java.io
in 1.1 Beta 3 and moved to sun.io.
Not allowing low-level charset conversion causes a 100%
performance hit when using String.getBytes() and String(byte[], encoding).
The low-level CharToByte and ByteToChar converters need to be moved back into
java.io, both for performance and to allow the application to be 100% pure Java.
Below is a small reproduction which does 2 tests:
1) convert bytes to a String
2) convert a String into bytes
The first set uses the String() and String.getBytes() methods.
The second set uses the sun.io.ByteToCharConverter and
sun.io.CharToByteConverter classes.
The output from running this app is:
Test: String(byte[], charset)
Elapsed time: 750
Test: String.getBytes(charset)
Elapsed time: 390
Test: sun.io.ByteToCharConverter.convertAll()
Elapsed time: 438
Test: sun.io.CharToByteConverter.convertAll())
Elapsed time: 110
You can see that the sun.io converters are much faster: roughly 1.7x for
bytes-to-String (750 vs. 438) and 3.5x for String-to-bytes (390 vs. 110).
These converter methods used to be in java.io before JDK 1.1 Beta 3, but were removed.
We want the performance (we are reading from and writing to a network
socket), but we also want to be 100% Java. Using the new "prescribed"
way of doing conversions has a major performance impact.
When can we get these (or similar) methods put back into
the java.io classes?
Thanks for your assistance,
-Carl
--- ConvertPerf.java ---
import java.io.*;
import sun.io.*;

public class ConvertPerf {
    static final int ARRAYSIZE = 500;

    public static void main(String args[])
    {
        int i;
        byte anArrayOfBytes[] = new byte[ARRAYSIZE];
        byte dummyBytes[] = null;
        String aStr = null;
        long startTime;
        long endTime;
        String testString = "This is a string to be converted to bytes";
        String _charsetName = "Cp850";

        // fill up the array of bytes with some data
        for (i = 0; i < ARRAYSIZE; i++)
        {
            anArrayOfBytes[i] = 'A';
        }

        // test using String(byte[], charset)
        System.out.println("Test: String(byte[], charset)");
        startTime = System.currentTimeMillis();
        for (i = 0; i < ARRAYSIZE; i++)
        {
            try
            {
                aStr = new String(anArrayOfBytes, _charsetName);
            }
            catch (UnsupportedEncodingException uee)
            {
                System.out.println("Got UnsupportedEncodingException");
            }
        }
        endTime = System.currentTimeMillis();
        System.out.println("\tElapsed time: " + (endTime - startTime));

        // test using String.getBytes(charset)
        System.out.println("Test: String.getBytes(charset)");
        startTime = System.currentTimeMillis();
        try
        {
            for (i = 0; i < ARRAYSIZE; i++)
            {
                dummyBytes = testString.getBytes(_charsetName);
            }
        }
        catch (UnsupportedEncodingException uee)
        {
            System.out.println("Got UnsupportedEncodingException");
        }
        endTime = System.currentTimeMillis();
        System.out.println("\tElapsed time: " + (endTime - startTime));

        // test using the low-level byte-to-char converter
        // (note: getDefault() returns the converter for the platform default
        // encoding, which is not necessarily _charsetName)
        System.out.println("Test: sun.io.ByteToCharConverter.convertAll()");
        ByteToCharConverter _toUnicode = ByteToCharConverter.getDefault();
        startTime = System.currentTimeMillis();
        try
        {
            for (i = 0; i < ARRAYSIZE; i++)
            {
                aStr = new String(_toUnicode.convertAll(anArrayOfBytes));
            }
        }
        catch (MalformedInputException mie)
        {
            System.out.println("got MalformedInputException");
        }
        endTime = System.currentTimeMillis();
        System.out.println("\tElapsed time: " + (endTime - startTime));

        // test using the low-level char-to-byte converter (platform default encoding)
        System.out.println("Test: sun.io.CharToByteConverter.convertAll())");
        CharToByteConverter _fromUnicode = CharToByteConverter.getDefault();
        startTime = System.currentTimeMillis();
        try
        {
            for (i = 0; i < ARRAYSIZE; i++)
            {
                dummyBytes = _fromUnicode.convertAll(testString.toCharArray());
            }
        }
        catch (MalformedInputException mie)
        {
            System.out.println("got MalformedInputException");
        }
        endTime = System.currentTimeMillis();
        System.out.println("\tElapsed time: " + (endTime - startTime));
    }
}
--- end of ConvertPerf.java ---
(Review ID: 19817)
======================================================================
|
|
Work Around
|
======================================================================
|
|
Evaluation
|
This is an RFE, not a bug. -- xxxxx@xxxxx 11/14/1997
Doing this right will require some significant additions to the java.io package.
It is too late to do this for JDK 1.2. -- xxxxx@xxxxx 1/5/1998
Making character converters public is part of RFE 4287465.
xxxxx@xxxxx 2000-02-25
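This report does not describe RFE 4287465 itself, so the following is only a sketch of what a public, reusable converter can look like on later JDKs, using the java.nio.charset package (available from J2SE 1.4 onward). It is an assumption that this is the kind of facility the evaluation has in mind. The class name ReusableConverters is hypothetical, and ISO-8859-1 stands in for Cp850 because every JVM is required to support it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: holds one decoder and one encoder and reuses them,
// instead of looking up a converter by name on every conversion.
public class ReusableConverters {
    private final CharsetDecoder decoder;
    private final CharsetEncoder encoder;

    public ReusableConverters(Charset charset) {
        this.decoder = charset.newDecoder();
        this.encoder = charset.newEncoder();
    }

    // Bytes to String. decode(ByteBuffer) resets the decoder before each run,
    // so the same instance can be called repeatedly.
    public String toChars(byte[] bytes) throws CharacterCodingException {
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // String to bytes. encode(CharBuffer) likewise resets the encoder first.
    public byte[] toBytes(String text) throws CharacterCodingException {
        ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Assumption: ISO-8859-1 is used here only because it is always available.
        ReusableConverters conv = new ReusableConverters(StandardCharsets.ISO_8859_1);
        byte[] encoded = conv.toBytes("This is a string to be converted to bytes");
        System.out.println(encoded.length);
        System.out.println(conv.toChars(encoded));
    }
}
```

Decoder and encoder instances are not safe for concurrent use, so a sketch like this would need one instance per thread. Whether it is actually faster than the String methods on a given JDK would have to be measured.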
|
|
Comments
|
Submitted On 06-FEB-1998
kleckner
I meant that the other way around - the deprecated
version of getBytes that does "incorrect" conversion
of char to byte runs 10x faster than the version of
getBytes that uses default conversion...
Submitted On 06-FEB-1998
kleckner
I find in one application that the deprecated
getBytes makes my thin application run 10x slower!
Submitted On 02-AUG-1999
monschke
On 1/5/98, Sun said it was too late for JDK 1.2.
Please tell me you got this in for JDK 1.3.
Submitted On 22-OCT-1999
MaximShiryaev
Agree!!!
While porting my search engine from Win32 to Java, I ran into this problem. The only
workaround is:
// need to read a byte from a stream
// and place it into a char
char theChar;
int a = someStream.read();
if (a < 128) {
    theChar = (char) a;
} else {
    byte[] tmpByte = new byte[1];
    tmpByte[0] = (byte) a;
    String tmpString = new String(tmpByte, "Cp1251");
    theChar = tmpString.charAt(0);
}
How do you like it!?!?
For pure English texts it works 10(!) times faster than for Russian ones (in my
case).
I am currently considering a JNI workaround.
Maxim
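One way to avoid allocating a String for every non-ASCII byte, as the workaround above does, is to decode all 256 byte values once and then use a plain array lookup. This is a sketch of that alternative, not the commenter's method. It assumes a single-byte charset, the class and method names (SingleByteTable, buildTable, toChar) are hypothetical, and ISO-8859-1 is used in the demo only because every JVM supports it. A name such as "Cp1251" could be passed instead where the runtime provides that charset.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper: a one-time decode of all 256 byte values into a lookup table.
public class SingleByteTable {

    // Builds a byte-value-to-char table for a single-byte charset.
    public static char[] buildTable(String charsetName) throws UnsupportedEncodingException {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, charsetName);
        // A multi-byte charset would not yield exactly one char per byte.
        if (decoded.length() != 256) {
            throw new IllegalArgumentException("not a single-byte charset: " + charsetName);
        }
        return decoded.toCharArray();
    }

    // Maps one value returned by InputStream.read() (0..255) to a char.
    public static char toChar(char[] table, int unsignedByte) {
        return table[unsignedByte & 0xFF];
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        char[] table = buildTable("ISO-8859-1");
        System.out.println(toChar(table, 0x41));
        System.out.println((int) toChar(table, 0xE9));
    }
}
```

After the table is built, each byte costs one array access regardless of whether the text is English or Russian. Byte values with no mapping in the chosen charset come back as whatever replacement character the String constructor substitutes.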