Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 4546734
Votes 4
Synopsis Add method StringBuffer.trimToSize
Category java:classes_lang
Reported Against 1.3 , 1.4 , hopper , kestrel-beta , merlin-beta3
Release Fixed 1.5(tiger)
State 10-Fix Delivered, request for enhancement
Priority: 4-Low
Related Bugs 4310930 , 4524944 , 4724129
Submit Date 05-DEC-2001
Description
When a StringBuffer is turned into a String, the new String
 uses the StringBuffer's backing array. This is a very fast
way to create the String since no allocation is necessary
but it can waste space since the backing array may be much
larger than the resulting String. Reusing StringBuffers can
make this problem worse since the StringBuffer may have grown
its backing array in past usages.

We would like to give users a method to use if they feel that
the efficient use of memory is more important than the speed that
toString currently has. Adding a compact method in StringBuffer
would allow a user to eliminate the wasted memory by causing the
backing array to be resized to fit the length of the StringBuffer.
The idiom buffer.compact().toString() will result in a String
with a backing array of the minimum size required.
Work Around
N/A
Evaluation
Can be done in Tiger.
  xxxxx@xxxxx   2001-12-04

The name trimToSize might be better.
  xxxxx@xxxxx   2002-02-14
Comments
  
  Include a link with my name & email   

Submitted On 18-MAY-2002
raojx
Ok, how come 4637640 is a duplicate of this one? Does that 
mean they'll be fixed together? Because I don't think 
they're related. Please take a closer look at 4637640, it 
has the potential to cause very serious and hard to find 
memory leak, or at least fix the javadoc so that people 
will know substring and its parent share the same 
underlying array.


Submitted On 29-AUG-2002
dlipp
As raojx wrote this is an extremely serious and hard to find 
memory leak. We've been searching for it (in a complex 
application) for several weeks right now. In combination with 
heavy load and the implementation of the HotSpot GC we 
always blew our VM (OutOfMemoryError) although we did not 
keep references on most of this substrings. 
It's unbelievable this bug is still there! Even if HotSpot GC was 
working better under heavy load this would be unacceptable 
as people might want to keep references of substrings in a 
Hashtable or whatever. 
At one point we import 30'000 records from file to db. Keeping 
references on all Ids we already imported (first fews chars of 
a record) gave us a memory waste of > 100 MB compared to 
only keep the mini Strings i.e. created by new String
(unacceptableSubString)  (!!!)

I cannot believe substring is still implemented that way. 
Our workaround is not to use String.substring. We create an 
empty char[] and copy what we want using the getChar(int, 
int, char[], int) method.
By the way. We also replaced StringTokenizer as it uses the 
substring mechanism...


Submitted On 15-NOV-2002
quelgar
I agree that this behaviour is not obvious. Even less
obvious is that you get a similar problem when using
String.substring() on a large String. In both cases, the
orignal backing char array is kept, even though only a small
part of it may be being used.

However, once you know that this is happening, you don't
need a trimToSize() method to fix it, just use the
String(String) constructor. This constructor copies the char
array if the source string has extra capacity - the comments
indicate this is explicitly for the purpose of trimming
baggage. Eg:

StringBuffer sb = new StringBuffer(1000000);
s.append('a');
String s = new String(sb.toString());
sb = null;
// that 1 million byte char array should be available for GC now

In fact, I cannot see any other use for the String(String)
constructor other than trimming the backing array. I think
the correct way to close this bug is for the documentation
for this constructor to explicitly spell out that it should
be used to trim the backing storage if you need to.


Submitted On 05-DEC-2002
bestsss
quelgar, simply use StringBuffer.substring(0); it's
 more a bit more effective


Submitted On 11-FEB-2003
skells
[radical view] I have always thought that trimToSize() should 
be a method on Object with empty implementation. This is 
polymorphic behavior.

In the issue of the stringbuffer and string buffer reuse, i 
believe that it is the job of the string constructor to decide if 
there is too much wasted space, and it there is then copy to 
a new array otherwise share. Something like

public String (StringBuffer buffer) {
        synchronized(buffer) {
            
            // not required  remove, as it is the case anyway 
this.offset = 0;
            value = buffer.getValue();
            count = buffer.length();
            if (value.length > 16 && value.length > 2 * count) {
                  //there is too much wasted space, so copy the 
value - too much to be detenmined by sun, maybe by 
enviroment parameters            
                value = new char[count];
                System.arraycopy(buffer.getValue(),0, this.value, 
0, count);
        } else {
            buffer.setShared();
        }
    }


Submitted On 05-MAR-2003
rnaseef
I agree with markt regarding this feature (see bug 4310930).  
Optimizations that are questionable such as this one must be 
considered options to be explicitly agreed-to by the 
programmer, not default behavior.

We have some java apps that are very quickly using up tons 
of memory, probably because of this issue.  We use 
StringBuffer all over the place so the cost of fixing this may 
be extremely high.

This is a memory leak in the classical sense.  Memory is not 
being freed even after the StringBuffer has been freed 
because there are String objects around using some portion 
of that memory.

Perhaps a more focused optimization for string+string is in 
order here.


Submitted On 21-MAR-2003
verdy_p
For now the workaround is to use:

new String(stringBuffer.toString())
to allocate a String that won't leak unused space in the 
StringBuffer,

and avoid reusing StringBuffers (which will constantly grow in 
memory) because only 1 use really requires the indicated 
length: when expanding StringBuffers, it performs an 
unneeded copy of its existing content and it stresses the CPU.

Its much better to use this code before reusing StringBuffers:
stringBuffer.setLength(0);
stringBuffer.setLength(desiredCapacity);

The setLength(0) call will force reducing the capacity of the 
backing array to 16 characters by allocating a new array.
The setLength(desiredCapacity) will reallocate this small 
buffer and initialize it to 0s.
I just wonder why the StringBuffer initialize these characters 
to 0 when expanding or allocating this array. May be this is 
for security reasons, but I think that this should be the role of 
the native VM allocator which will perform this array erasure 
much faster than in the StringBuffer.java class.
In my opinion, after all calls to "new char[length]" one should 
not need to clear the content of the allocated array and so 
the StringBuffer does not need to perform this initialization 
too: this is waste of time and of CPU ressource.


Submitted On 21-MAR-2003
verdy_p
This bug also affects String+String operations that are 
compiled as:
new StringBuffer().append(str1).append(str2).toString()
I.e. it reallocates the backing store in the StringBuffer for 
each string appended, requiring backing store expansion with 
copies. But this expansion uses a new size which is the 
largest of twice (the older size +1) and the needed size.
When appending small strings to an already large string, this 
expansion wastes memory.
The typical leak occurs when appending characters or lines in 
a large StringBuffer for a whole text: the wasted space can 
be but to 50% in the final String returned by toString().

Partial Workaround:
->  replace "string1+string2" by "new String(string1+string2)"
You won't avoid the StringBuffer expansion with intermediate 
copies (which stresses the CPU), but at least the backing 
store shared in the return of toString() will be deallocated, 
and the new String will fit exactly the string size.

Better workaround:
replace "string1+string2;" by:
new StringBuffer(string1.length()+string2.length())
      .append(string1).append(string2).toString()
This version allocates the backing store only once, removes 
all expansions and additional copies, and return a String that 
wastes no space, by just sharing the backingbuffer used in 
StringBuffer.
This code should be the one compiled for String 
concatenations with the + operator: il will just require 
creating temporary variables in the bytecode to store the 
evaluation of each String operand, then dereferencing them in 
the correct left-to-right order to get their length, and finally 
append them in the StringBuffer as before, before freeing the 
temporary variables from the bytecode operand stack.

Because such optimization does only uses a temporary 
StringBuffer that is not used after calling toString(), there is 
no need to mark the StringBuffer with setShared(): we could 
remove this operation, by making StringBuffer inherit from a 
new internal class "TemporaryStringBuffer" that would be used 
for String Concatenations, so that the String(StringBuffer) 
constructor used by the StringBuffer.toString() method will 
not need to set it to shared state (the setShared() method 
would be a noop for TemporaryStringBuffers).


Submitted On 11-OCT-2006
diroussel2
Note that the fix Java 1.5 is not just to add trimToSize(), but also to change the toString() implementation from 
   return new String(this);
to
   return new String(value, 0, count);

so it does look like the bug is fixed properly



PLEASE NOTE: JDK6 is formerly known as Project Mustang