Bug Database
Bug ID: 4093056
Votes 4
Synopsis RFE: Add facilities for fast character-encoding conversion
Category java:char_encodings
Reported Against 1.1.4 , 1.1.5 , 1.2beta2 , 1.2beta3
Release Fixed
State 11-Closed, duplicate of 4287465, request for enhancement
Priority: 5-Very Low
Related Bugs 4131655 , 4287465
Submit Date 14-NOV-1997
Description




ByteToCharConverter and CharToByteConverter were removed from java.io
in 1.1 Beta 3 and moved to sun.io.

Not allowing low-level charset conversion causes a 100%
performance hit when using String.getBytes() and String(byte[], encoding).

The low-level CharToByte and ByteToChar converters need to be moved back into
java.io, both for performance and to allow applications to be 100% pure Java.


Below is a small reproduction which does 2 tests:

	1) convert bytes to a String
	2) convert a String into bytes

The first set uses the String() and String.getBytes() methods.

The second set uses the sun.io.ByteToCharConverter() and 
sun.io.CharToByteConverter().

The output from running this app is:

   Test: String(byte[], charset)
        Elapsed time: 750
   Test: String.getBytes(charset)
        Elapsed time: 390
   Test: sun.io.ByteToCharConverter.convertAll()
        Elapsed time: 438
   Test: sun.io.CharToByteConverter.convertAll())
        Elapsed time: 110

You can see that the sun.io converters are much faster.  These converter
methods used to be in java.io before JDK 1.1 Beta 3, but were removed.

We want to have the performance (we are reading/writing to a network
socket) but we also want to be 100% Java.  Using the new "prescribed"
way of doing conversions has a major performance impact.

When can we get these (or similar) methods put back into 
the java.io classes?

Thanks for your assistance, 

-Carl

--- ConvertPerf.java ---
import java.io.*;
import sun.io.*;

public class ConvertPerf {

    static final int ARRAYSIZE = 500;

    public static void main(String args[]) 
    {
       int i;
       byte anArrayOfBytes[] = new byte[ARRAYSIZE];
       byte dummyBytes[] = null;
       String aStr = null;
       long startTime;
       long endTime;
       String testString = new String("This is a string to be converted to bytes");
       String _charsetName = new String("Cp850");

       // fill up the array of bytes with some data
       for (i = 0; i < ARRAYSIZE; i++)
       {
           anArrayOfBytes[i] = 'A';
       }

       // test using String(byte[])
       System.out.println("Test: String(byte[], charset)");
       startTime = System.currentTimeMillis();
       for(i = 0; i < ARRAYSIZE; i++)
       {
           try
           {
               aStr = new String(anArrayOfBytes, _charsetName);
           }
           catch (UnsupportedEncodingException uee)
           {
               System.out.println("Got UnsupportedEncodingException");
           }
       }
       endTime = System.currentTimeMillis();
       System.out.println("\tElapsed time: " + (endTime - startTime));
       
       // test using String().getBytes()
       System.out.println("Test: String.getBytes(charset)");
       startTime = System.currentTimeMillis();
       try
       {
          for(i = 0; i < ARRAYSIZE; i++)
          {
                 dummyBytes = testString.getBytes(_charsetName);
          }
       }
       catch (UnsupportedEncodingException uee)
       {
           System.out.println("Got UnsupportedEncodingException");
       }
       endTime = System.currentTimeMillis();
       System.out.println("\tElapsed time: " + (endTime - startTime));

       System.out.println("Test: sun.io.ByteToCharConverter.convertAll()");
       ByteToCharConverter _toUnicode = ByteToCharConverter.getDefault();
       startTime = System.currentTimeMillis();
       try
       {
          for(i = 0; i < ARRAYSIZE; i++)
          {
                aStr = new String(_toUnicode.convertAll(anArrayOfBytes));
          }
       }
       catch (MalformedInputException mie)
       {
          System.out.println("got MalformedInputException");
       }
       endTime = System.currentTimeMillis();
       System.out.println("\tElapsed time: " + (endTime - startTime));
    
       System.out.println("Test: sun.io.CharToByteConverter.convertAll())");
       CharToByteConverter _fromUnicode = CharToByteConverter.getDefault();
       startTime = System.currentTimeMillis();
       try
       {
          for(i = 0; i < ARRAYSIZE; i++)
          {
                dummyBytes = _fromUnicode.convertAll(testString.toCharArray());
          }
       }
       catch (MalformedInputException mie)
       {
          System.out.println("got MalformedInputException");
       }
       endTime = System.currentTimeMillis();
       System.out.println("\tElapsed time: " + (endTime - startTime));
    
    }

}

--- end of ConvertPerf.java ---


(Review ID: 19817)
======================================================================
Work Around





======================================================================
Evaluation
This is an RFE, not a bug.  --  xxxxx@xxxxx  11/14/1997

Doing this right will require some significant additions to the java.io package.
It is too late to do this for JDK 1.2.  --  xxxxx@xxxxx  1/5/1998

Making character converters public is part of RFE 4287465.
 xxxxx@xxxxx  2000-02-25
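
For readers on later JDKs: JDK 1.4 and newer ship public converters in the java.nio.charset package. The sketch below shows, in a hedged way, how one encoder and one decoder can be created once and reused for many conversions, which is the usage pattern the report asks for. The class name ReusableConverters and its method names are made up for illustration, and ISO-8859-1 stands in for Cp850 because it is guaranteed to exist on every Java runtime. No claim is made here about its speed relative to the sun.io classes, and this entry does not state whether this API is what RFE 4287465 delivered.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: holds one encoder and one decoder for a charset
// so that repeated conversions skip the per-call charset lookup.
// Not thread-safe, because CharsetEncoder/CharsetDecoder are stateful.
final class ReusableConverters {
    private final CharsetEncoder encoder;
    private final CharsetDecoder decoder;

    ReusableConverters(Charset charset) {
        this.encoder = charset.newEncoder();
        this.decoder = charset.newDecoder();
    }

    // chars -> bytes; unmappable characters are reported, not replaced.
    byte[] encode(String s) {
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    // bytes -> chars; malformed input is reported, not replaced.
    String decode(byte[] bytes) {
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ReusableConverters conv = new ReusableConverters(StandardCharsets.ISO_8859_1);
        String text = "This is a string to be converted to bytes";
        byte[] bytes = conv.encode(text);
        System.out.println(bytes.length + " bytes, round trip ok: "
                + conv.decode(bytes).equals(text));
    }
}
```

The one-argument encode(CharBuffer) and decode(ByteBuffer) convenience methods reset the converter before each call, so the same instance can be reused in a loop like the one in ConvertPerf.java.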
Comments
  

Submitted On 06-FEB-1998
kleckner
I meant that the other way around - the deprecated
version of getBytes that does "incorrect" conversion
of char to byte runs 10x faster than the version of
getBytes that uses default conversion...


Submitted On 06-FEB-1998
kleckner
I find in one application that the deprecated
getBytes makes my thin application run 10x slower!


Submitted On 02-AUG-1999
monschke
On 1/5/98, Sun says it is too late for JDK 1.2.
Please tell me you got this in for JDK 1.3.


Submitted On 22-OCT-1999
MaximShiryaev
Agree!!!
Porting my Search Engine from win32 to Java I've found this problem. The only
workaround is:
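// need to read a byte from a stream 
// and place it into char
char theChar;
int a = someStream.read();
if ( a < 128 ) {
  // plain ASCII: the byte value is the char value
  theChar = (char) a;
} else {
  // non-ASCII: go through a one-byte String conversion
  // (new String(byte[], String) can throw UnsupportedEncodingException)
  byte[] tmpByte = new byte[1];
  tmpByte[0] = (byte)a;
  String tmpString = new String(tmpByte, "Cp1251");
  theChar = tmpString.charAt(0);
}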
How do you like it!?!?
For pure English texts it works 10(!) times faster than for Russian ones (in my
case).
I am currently considering a JNI workaround.
Maxim
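
One way to avoid the per-byte String allocation in the workaround above, offered only as a sketch and not as anything from the original comment: for a single-byte charset, decode all 256 byte values once and keep the result in a lookup table. The class name SingleByteTable is hypothetical. The technique assumes every byte maps to exactly one char, which the constructor checks; it does not apply to multi-byte encodings. The example uses ISO-8859-1 because it is present on every Java runtime; a name such as "windows-1251" can be passed through Charset.forName where the runtime supports it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: byte -> char lookup table for a single-byte charset.
final class SingleByteTable {
    private final char[] table = new char[256];

    SingleByteTable(Charset charset) {
        // Decode every possible byte value in one call.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String decoded = new String(all, charset);
        // A single-byte charset yields exactly one char per byte
        // (undefined bytes decode to the replacement character U+FFFD).
        if (decoded.length() != 256) {
            throw new IllegalArgumentException(charset + " is not a single-byte charset");
        }
        decoded.getChars(0, 256, table, 0);
    }

    // Accepts the int returned by InputStream.read() (0..255) or a raw byte.
    char toChar(int b) {
        return table[b & 0xFF];
    }

    public static void main(String[] args) {
        SingleByteTable latin1 = new SingleByteTable(StandardCharsets.ISO_8859_1);
        System.out.println((int) latin1.toChar(0xE9)); // code point of the decoded char
    }
}
```

After construction, each conversion is a single array read, so ASCII and non-ASCII bytes cost the same.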


