A fix for the stated problem has been prototyped and tested. A webrev
of the associated changes is attached.
Unfortunately simply reintroducing the sharing between String and the
StringBuffer classes which was present in the 1.4.x release train is
not so simple. The following need to be taken into consideration:
1. Between 1.4.x and 1.5.0 the unsynchronized StringBuilder class
was introduced and javac changed to default to using
StringBuilder rather than StringBuffer for the language's "+"
operator. The old sharing code relied on the synchronization in
the StringBuffer class for correctness.
2. There were problems in earlier JDK releases with Strings created
from StringBuffers with large backing stores causing large
amounts of inaccessible memory to be consumed. This was one of
the original motivations for completely removing the
String/StringBuffer sharing, done by ensuring that the String's
backing storage was sized precisely correctly.
The prototype was specifically designed to address issue (2) above.
Construction from StringBuffers and StringBuilders, as well as
"trimming" of Strings using the copy constructor, all take into
consideration the length of the resulting String as well as the size
of the underlying char being used to construct the String. If the
underlying char is too big, a copy is made.
In general, in the presence of an unsynchronized StringBuilder and the
String/StringBuilder sharing, it is not possible to guarantee 100%
adherence to the specification of the String class. The basic problem
occurs when one thread calls toString() on the StringBuilder at the
same time another thread calls a method which mutates the
StringBuilders' contents. In the absence of synchronization it is
theoretically possible that the String result of the toString() method
may become visible to the first thread before the second thread
notices that the backing char of the StringBuilder is now shared
with another thread, so for a brief period of time the first thread
may see the contents of the String mutate, which is explicitly
disallowed by the specification of that class. In practice, it is very
unlikely that this mutation could really be seen by the first thread,
because it is guaranteed to occur only for a brief and bounded period
of time, and basically impossible that it could be controlled in any
repeatable manner, but the possibility exists nonetheless. In the
prototype, the boolean flag indicating whether the contents of the
StringBuilder / StringBuffer are shared with a String was made
volatile to attempt to provide some timeliness guarantees about when
the presence of sharing would become visible to other threads. One
might argue that the results of String construction using
StringBuilder are undefined anyway in the face of multithreaded access
because StringBuilder is unsynchronized, but that is not really the
point; Strings must be immutable after construction, and if a risk of
any kind is going to be taken in this area, clear and across-the-board
performance improvements need to be shown.
The prototype did address the pathological regression demonstrated by
the microbenchmark in the evaluation, and also solved an
OutOfMemoryError raised by a customer in an escalation who had
transitioned from JDK 1.4.2 to 1.5.
We attempted to find other benchmarks to use to validate the expected
performance improvement. Unfortunately, we were not able to find any.
Sun's standard suite of Java benchmarks used for performance
measurement did not yield any speedup, and in fact showed a regression
on the key SPECjbb2005 benchmark as is discussed below. The
Sun-internal XMLTest suite used to validate XML parsing and handling
performance was run, but no improvement was seen. This contradicts the
claims by the submitter that XML-related codes are adversely impacted
by the lack of String/StringBuffer sharing in 1.5.
It was discovered that the reintroduction of the sharing code caused a
reproducible regression on the order of 4% in SPECjbb2005 scores.
Extensive testing did not yield a root cause of the slowdown. It was
hypothesized that perhaps additional storage in Strings in the heap
was slowing down the garbage collector. Measurements were taken, but
it was not possible that additional time spent in the garbage
collector could have resulted in this magnitude of drop in the
benchmark score; in fact, GC times decreased somewhat due to the
allocation of less storage during the run. It was hypothesized that
the introduction of the volatile boolean test in several places may
have impacted performance, but this was removed to no effect. It was
hypothesized that some refactoring that came along with the sharing
code may have perturbed the inlining performed by the Java Hotspot
compilers. This refactoring was left in place but the sharing removed;
the scores returned to their original values. An experiment was done
whereby the multiplicative factor indicating how large the backing
store of the String could grow was set to zero; the default value had
been 4, indicating that a String could keep at most four times its
length worth of backing store alive. Despite the presence of all of
the other code changes and refactorings, the scores returned to their
original values with the zero multiplicative factor. This indicates
that somehow the performance is impacted simply by the presence of
extra unused data, even though GC times didn't increase significantly.
Examination of profiler output did not point to an obvious reason for
The improvement of behavior of the customer application mentioned
above was analyzed. Due to the characteristics of the reintroduction
of the sharing code, there was only one possibility for why it helped
significantly, which was that the application was doing redundant
work, either substring or toString() operations. Sun suggested to the
customer to instead maintain their own String intern table at the
application level, thereby avoiding holding onto multiple copies of
the same String data. This both solved the customer's OutOfMemoryError
problems as well as increased the performance of the application.
One might consider reintroducing sharing only between String and
StringBuffer and not between String and StringBuilder to avoid the
correctness problems described above, but because StringBuffer and
StringBuilder share most of their implementations, this would result
in a large amount of code duplication and make maintenance much more
difficult. Additionally, given that javac no longer emits StringBuffer
code for the "+" operator, use of StringBuffer is expected to decrease
over time so this would provide a diminishing improvement anyway.
Given the correctness risks of fixing this bug, the fact that so far
only microbenchmarks and pathological application codes seem to show
an improvement with the fix, and that there is a hard-to-track-down
performance regression on a key benchmark with the fix, this bug is
being closed as "will not fix". The following further points should be
1. There have been very few reports of this problem since the time
5.0 was released. Clearly it has not impacted most customers'
2. Internal attempts to characterize StringBuffer performance have
resulted in the conclusion that 5.0 is actually faster than 1.4.2
at String manipulation in almost all cases.
3. Given the current structure of the core libraries' code, the
correct place for this optimization to take place, if it does at
all, is probably in the JVM using escape analysis.
Please reopen this bug if a reasonable, real-world application can be
produced which is significantly and adverseley impacted by the lack of
String / StringBuilder / StringBuffer data sharing.