Bug Database
Bug ID: 4131655
Votes: 1
Synopsis: java.io.InputStreamReader performance: Factor of five speed penalty
Category: java:classes_io
Reported Against: 1.1.5, 1.2beta3
Release Fixed:
State: 11-Closed, duplicate of 4093056, bug
Priority: 4-Low
Related Bugs: 4093056, 4131647
Submit Date: 22-APR-1998
Description
See the attachment for the code that generated these measurements; you may
need to comment out the test for "UTF16Reader" to compile and run it.  The
test data is generated by the "gen.java" file attached to bugid 4131647,
which turned up its own set of bugs ... Note that the custom readers took
about 1/2 hour to write and debug.  Admittedly, things like UTF-8 will be
slower than UTF-16 but that does not justify a FACTOR OF FIVE (or more)
difference in speed. 

---------------------------
From xxxx Sat Apr 18 15:56:36 1998

Subject: Reader performance


You'd asked for numbers when I asked you about performance problems in
the Reader/Writer framework, and here are some ugly ones.

Each of these (single) runs read 1M chars of XML data (basically, this
was randomly generated UNICODE, with some XML framing) from files cached
in memory.  The "read" loop was "read a 1K block, then read 512 characters
one at a time" until the end of the data was reached.
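
For readers who want to reproduce the access pattern, the read loop described
above might look roughly like the sketch below. It is not the code attached to
this report: the class name ReadLoopSketch, the method drain, and the filler
test data are all assumptions made for illustration, and the standard charset
UTF-16LE stands in for the "UnicodeLittle" encoding name used in the
measurements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the benchmark's read pattern; not the attached test code.
class ReadLoopSketch {
    // Alternates one block read of up to 1K chars with 512 single-character
    // reads until end of data, and returns the number of chars consumed.
    static long drain(Reader in) throws IOException {
        char[] block = new char[1024];
        long total = 0;
        while (true) {
            int n = in.read(block, 0, block.length);
            if (n < 0) {
                return total;
            }
            total += n;
            for (int i = 0; i < 512; i++) {
                if (in.read() < 0) {
                    return total;
                }
                total++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: 1M filler chars held in memory, instead of the
        // generated XML data the report used.
        char[] chars = new char[1 << 20];
        Arrays.fill(chars, 'x');
        byte[] data = new String(chars).getBytes(StandardCharsets.UTF_16LE);

        long start = System.nanoTime();
        long count = drain(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_16LE));
        long elapsedMicros = (System.nanoTime() - start) / 1000;
        System.out.println(count + " chars in " + elapsedMicros + " us");
    }
}
```

A single timed pass like this is only a rough indicator; timings on a current
JVM will not be comparable to the JDK 1.1.5 and 1.2 beta figures below.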

    InputStreamReader, "UnicodeLittle"  16.34 ms (JDK 1.1.5)
    InputStreamReader, "UnicodeLittle"  17.94 ms (JDK 1.2 beta4)

    Custom "UnicodeLittleReader"         3.86 ms (JDK 1.1.5)
    Custom "UnicodeLittleReader"         3.77 ms (JDK 1.2 beta4)

    InputStreamReader, "UTF8"           24.82 ms (JDK 1.1.5)
    InputStreamReader, "UTF8"           25.63 ms (JDK 1.2 beta4)

The custom reader does the obvious stuff -- notably not allocating a
garbage character array on each character-at-a-time read, and adding
no superfluous method calling overhead for block reads.  Stuff that
the character converter framework seemingly precludes.
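
As an illustration of the two points above, a reader of that kind could be
sketched as follows. This is a guess at the general shape, not the custom
reader that was measured: the class name UnicodeLittleReaderSketch, the 8K
byte buffer, and the choice to silently drop a dangling odd byte at end of
stream are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;

// Hypothetical little-endian UTF-16 reader; not the code from the bug report.
class UnicodeLittleReaderSketch extends Reader {
    private final InputStream in;
    private final byte[] buf = new byte[8192]; // assumed buffer size
    private int pos;    // index of the next unread byte in buf
    private int limit;  // number of valid bytes in buf

    UnicodeLittleReaderSketch(InputStream in) {
        this.in = in;
    }

    // Ensures at least two unread bytes are buffered; returns false at end of
    // stream. A dangling odd byte at end of stream is dropped (assumption).
    private boolean fill() throws IOException {
        int remaining = limit - pos;
        System.arraycopy(buf, pos, buf, 0, remaining);
        pos = 0;
        limit = remaining;
        while (limit < 2) {
            int n = in.read(buf, limit, buf.length - limit);
            if (n < 0) {
                return false;
            }
            limit += n;
        }
        return true;
    }

    // Single-character read: decodes straight from the byte buffer and
    // allocates nothing per call.
    @Override
    public int read() throws IOException {
        if (limit - pos < 2 && !fill()) {
            return -1;
        }
        int c = (buf[pos] & 0xff) | ((buf[pos + 1] & 0xff) << 8);
        pos += 2;
        return c;
    }

    // Block read: decodes directly into the caller's array, with no
    // intermediate converter call or temporary char buffer.
    @Override
    public int read(char[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (limit - pos < 2 && !fill()) {
            return -1;
        }
        int n = Math.min(len, (limit - pos) / 2);
        for (int i = 0; i < n; i++, pos += 2) {
            dst[off + i] = (char) ((buf[pos] & 0xff) | ((buf[pos + 1] & 0xff) << 8));
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

The sketch does no byte-order-mark handling and no validation of surrogate
pairs, which a general-purpose converter would have to consider.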

If the character-at-a-time reads were removed, the times were roughly five
seconds to read the Unicode via InputStreamReader, eleven for UTF-8, and
about 10% faster for the custom reader.  That is, the custom reader is
still on the order of 25% faster.

For comparison, one XML parser, which doesn't use Readers because
of their performance, read ** AND PARSED ** the two files in only
two seconds more than the JDK's bulk read cases took ...
 
It's no wonder the people designing these APIs are steering away from
using the java.io.Reader classes.  Which is worrisome, since all XML
data is UNICODE.
 
- xxxx

<UPDATE>
<AUTHOR>   xxxxx@xxxxx   1998-06-29 </AUTHOR>

Software REWRITTEN to use the bulk reads can get acceptable
performance even with this speed penalty.  In fact, I've
now done so and outperform the fastest of the third party
XML processing engines.

However, for other applications I still think this is a
pretty severe problem.   Not everyone has complete control
over all of their input data sources.

</UPDATE>
Work Around
All software that needs to do character-at-a-time reads needs to
arrange to buffer the data, perhaps with a BufferedReader or in
application-specific buffers.

I don't call this a "convenient" workaround since it's not possible
in cases where the Reader is handed to a subsystem that may not
have exclusive use thereof:  the buffer would need to be used by
the next subsystem.  Also, InputStreamReader already _has_ a buffer.

  xxxxx@xxxxx   1998-06-29


This is not a workaround -- this is how the InputStreamReader class was designed
to be used!  An application should not pass an instance of InputStreamReader
around to different subsystems; it should pass an instance of BufferedReader
that buffers the InputStreamReader.

--   xxxxx@xxxxx   6/30/1998
Evaluation
Performance is comparable for bulk conversions, and the logic of byte to
character converters is inherently quite complex for single character
conversions. The presented comparison case bypasses the byte to character
conversion mechanism altogether.

  xxxxx@xxxxx   1998-06-29

(To clarify:  the implementation I wrote -- less than 30 minutes -- mostly
benefits from internal APIs that don't force conversion into a buffer.
The java.io.InputStreamReader code allocates a one-character buffer, converts
into it, and then returns the content of that buffer ... the buffer is
garbage immediately.  It CANNOT bypass byte-to-character conversion, that
is part of the problem definition ... I'm assuming Benedict made a typo
above, Readers are byte-to-char, not char-to-byte!!  :-)

  xxxxx@xxxxx   1998-06-29

Yes, that was my error. To further clarify, I meant that it bypasses the
ByteToCharConverter API mechanism, not the conversion per se.
  xxxxx@xxxxx   1998-06-29


The InputStreamReader class was not designed to support efficient
single-character reads.  Due to the inherent complexities of character
encodings, it is impossible to support efficient single-character reads without
an additional level of post-conversion buffering.  This is why the
InputStreamReader specification explicitly suggests that instances should be
wrapped in a BufferedReader.  In applications that must pass the same reader to
different subsystems, a single BufferedReader instance should be passed around.
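
A minimal sketch of that recommendation follows, written against the current
java.io API rather than JDK 1.1. The class SharedBufferedReaderSketch and the
two "subsystem" methods are hypothetical stand-ins; only BufferedReader and
InputStreamReader come from the discussion above.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical example of sharing one BufferedReader between subsystems.
class SharedBufferedReaderSketch {
    // First hypothetical subsystem: reads one line, a character at a time.
    static String readHeader(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) >= 0 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Second hypothetical subsystem: counts the remaining chars in blocks.
    static int countRest(Reader in) throws IOException {
        char[] block = new char[1024];
        int total = 0;
        int n;
        while ((n = in.read(block, 0, block.length)) >= 0) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "header\nbody text".getBytes(StandardCharsets.UTF_8);

        // Wrap once, then hand the same BufferedReader to every subsystem so
        // that characters already decoded into its buffer are not lost.
        Reader shared = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8));

        String header = readHeader(shared);
        int rest = countRest(shared);
        System.out.println(header + " / " + rest);
    }
}
```

If each subsystem instead wrapped the underlying InputStreamReader in its own
BufferedReader, the first wrapper could read ahead and keep characters that
the second subsystem would then never see.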

We are well aware of the need for a more general, and more efficient,
character-conversion API.  The fact that the current internal
Byte/CharConverter API throws exceptions so often is one reason why we did not
make that API public.

I'm closing this as a duplicate of 4093056.

--   xxxxx@xxxxx   6/30/1998