Description
JDK 1.2.2 (Win95)
The Symptoms
------------
+ I am reading in a large object net (~10 MB uncompressed) that I
wrote to disk compressed with GZIPOutputStream.
+ Because my application is large, I start with a large heap
(256m) and must avoid garbage collection (above 200m, garbage
collection time rises very rapidly).
+ Reading in my large object through GZIPInputStream caused the heap
to fill (triggering gc), even though there should have been plenty
of space for the object.
+ Reading in the same object uncompressed caused no problems.
+ Using Runtime.freeMemory(), I determined that GZIPInputStream was
consuming roughly 10 times the uncompressed size of the object in heap.
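A minimal sketch of the kind of measurement described above; it is not the reporter's harness. It assumes a hypothetical 1 MB in-memory payload of zeros, a consumer that calls read() once per byte (as in the report), and approximates heap in use as totalMemory() - freeMemory(). The class and method names are illustrative only. Garbage collection can run at any point, so the printed delta is a rough indication at best, and on later JDKs, whose InflaterInputStream no longer allocates inside read(), it is likely to be much smaller than the report describes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical harness: approximate the heap consumed while gunzipping byte by byte.
class HeapDuringGunzip {

    // Compress a byte array in memory with GZIPOutputStream.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Consume the stream one byte at a time through read() and count the bytes.
    static long drainByteAtATime(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    // Decompress the given GZIP data byte by byte; returns the uncompressed length.
    static long gunzipCount(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return drainByteAtATime(in);
        }
    }

    // Approximate heap in use: total heap minus its free portion.
    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip(new byte[1 << 20]); // hypothetical 1 MB payload of zeros

        System.gc(); // only a request; the JVM may ignore it
        long before = usedHeap();
        long n = gunzipCount(compressed);
        long after = usedHeap();

        // A collection during the loop can make this delta small or even negative.
        System.out.println("read " + n + " bytes, used-heap delta ~" + (after - before) + " bytes");
    }
}
```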
The Cause
---------
+ Digging into GZIPInputStream.java and InflaterInputStream.java,
I found that each has one or two core methods that allocate a new
scratch array every time they are called. In particular,
InflaterInputStream.read() contains the line:
    byte[] b = new byte[1];
This looks innocuous, but
(a) "b" is a full array object, with per-object overhead, not just one byte, and
(b) read() is called once for each byte of *uncompressed* data,
so reading N bytes this way allocates N short-lived arrays.
Suggested Fix
-------------
I implemented the following changes in local copies of
GZIPInputStream.java and InflaterInputStream.java, and reading now
takes a small, fixed amount of heap (~3 KB) regardless of file size.
InflaterInputStream.java
1) add member field: private byte[] b = new byte[512];
2) remove line 106 : read() : byte[] b = new byte[1];
   remove line 177 : skip() : byte[] b = new byte[512];
   (read() and skip() then use the member field b instead)
GZIPInputStream.java
1) add member field: private byte[] skipBuff = new byte[128];
2) remove line 215 : skip() : byte[] buf = new byte[128];
3) line 217 : skip() : change "buf" to "skipBuff"
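Outside a locally patched JDK, the same idea can be sketched as a subclass. This is a hypothetical class (ReusingGZIPInputStream is not a JDK type) and not the patch listed above: it overrides read() and skip() so that their scratch buffers are allocated once per stream, mirroring the member fields in the list (a one-byte buffer for read(), 512 bytes for skip()). It assumes, per the InputStream contract, that a one-byte read(byte[], int, int) returns either 1 or -1.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical subclass: scratch buffers are allocated once per stream, not once per call.
class ReusingGZIPInputStream extends GZIPInputStream {
    private final byte[] oneByte = new byte[1];   // reused by read()
    private final byte[] skipBuf = new byte[512]; // reused by skip()

    ReusingGZIPInputStream(InputStream in) throws IOException {
        super(in);
    }

    @Override
    public int read() throws IOException {
        // A one-byte bulk read returns 1 or -1 (EOF); no new array is created here.
        int n = read(oneByte, 0, 1);
        return n == -1 ? -1 : oneByte[0] & 0xff;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n < 0) {
            throw new IllegalArgumentException("negative skip length");
        }
        long remaining = n;
        while (remaining > 0) {
            // Inflate into the shared buffer and discard the result.
            int r = read(skipBuf, 0, (int) Math.min(remaining, skipBuf.length));
            if (r == -1) {
                break; // end of stream: report only what was actually skipped
            }
            remaining -= r;
        }
        return n - remaining;
    }

    // Small round-trip demo.
    public static void main(String[] args) throws IOException {
        byte[] data = "hello, gzip world".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        try (InputStream in = new ReusingGZIPInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            in.skip(7); // skips "hello, "
            StringBuilder sb = new StringBuilder();
            for (int c; (c = in.read()) != -1; ) {
                sb.append((char) c);
            }
            System.out.println(sb); // prints "gzip world"
        }
    }
}
```

Later JDK releases' InflaterInputStream keeps reusable per-instance buffers for both read() and skip(), so on those releases a subclass like this is redundant; it only matters where the per-call allocation is still present.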
Summary
-------
GZIP compression is most likely to be used with large objects and
large heap sizes. Because garbage collection with large heap sizes
does not currently perform adequately, what should be merely an
inefficiency in the inflater has become a serious liability. I would
therefore recommend that these, or equivalent, changes be made to
the JDK in the earliest possible release.
Note for SQE team: performance improvement, no test case needed