EVALUATION
This problem could easily be addressed by caching the last few looked-up
charsets.
-- ###@###.### 2004/4/25
A very interesting story wrt. performance.
To my great surprise, I discovered that the penalty for a cache miss,
i.e. creating a new Charset object, is *much* lower after 1.4.2_05.
Consider this program:
----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;
class t1 {
    static final List times = new ArrayList();
    static void time(Runnable job) {
        long t1 = System.currentTimeMillis();
        while (System.currentTimeMillis() - t1 < 2*1000) // warm up
            job.run();
        System.gc();
        System.gc();
        try { Thread.sleep(100); } catch (Exception e) {}
        long t2 = System.currentTimeMillis();
        job.run();
        times.add(new Long(System.currentTimeMillis() - t2));
    }
    public static void main(String[] args) {
        final int iterations = 10000;
        time(new Runnable() { public void run() {
            for (int i = 0; i < iterations; i++)
                Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" : "UTF-8");
        }});
        for (Iterator it = times.iterator(); it.hasNext(); )
            System.out.print(it.next() + " ");
        System.out.println();
    }
}
----------------------------------------------------------------------
When run against all the 1.4.2 update releases and 1.5.0, I get the
following times (in ms, for one post-warm-up run of 10000 alternating
isSupported calls):
1.4.2_01  -server  4590
1.4.2_01  -client  5868
1.4.2_02  -server  4372
1.4.2_02  -client  6244
1.4.2_03  -server  4641
1.4.2_03  -client  5738
1.4.2_04  -server  4668
1.4.2_04  -client  6117
1.4.2_05  -server    30
1.4.2_05  -client    75
1.4.2_06  -server    30
1.4.2_06  -client    76
1.5.0     -server    23
1.5.0     -client    50
So the news is very good. Charset creation is dramatically cheaper as of
1.4.2_05 (roughly 80 times faster with -client and 150 times faster with
-server in this test), and it is cheaper still in 1.5.0. This greatly
reduces the need to cache Charsets, especially since caching has overhead
of its own. Nevertheless, programs that alternate between two Charsets are
common enough to justify caching, and a strictly alternating pattern misses
a one-element cache on every call. Measurements indicate that the simplest
possible increase in caching, i.e. adding a second one-element cache, is
the best engineering decision.
###@###.### 2004-08-02
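The case for a second slot can be checked by simply counting hits. The following is a hypothetical sketch, not code from Charset.java: a tiny most-recently-used cache with a configurable number of slots (the MruSlots name and its methods are illustrative), driven by the same strictly alternating pattern as the t1 program above. With one slot every access misses; with two slots only the first two accesses miss.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a most-recently-used (MRU) cache with a fixed
// number of slots, used only to count hits for an access pattern.
class MruSlots {
    private final int capacity;
    private final List<String> slots = new ArrayList<>(); // index 0 = most recent

    MruSlots(int capacity) { this.capacity = capacity; }

    // Returns true on a hit. On a miss, inserts the key at the front and
    // evicts the least recently used key if the cache is over capacity.
    boolean access(String key) {
        int i = slots.indexOf(key);
        if (i >= 0) {
            slots.remove(i);       // move the hit entry to the front
            slots.add(0, key);
            return true;
        }
        slots.add(0, key);
        if (slots.size() > capacity)
            slots.remove(slots.size() - 1);
        return false;
    }

    // Counts hits when alternating between two charset names n times.
    static int alternatingHits(int capacity, int n) {
        MruSlots cache = new MruSlots(capacity);
        int hits = 0;
        for (int i = 0; i < n; i++)
            if (cache.access((i & 1) == 0 ? "ISO-2022-JP" : "UTF-8"))
                hits++;
        return hits;
    }

    public static void main(String[] args) {
        int n = 10000;
        System.out.println("1 slot:  " + alternatingHits(1, n) + " hits of " + n); // 0 hits
        System.out.println("2 slots: " + alternatingHits(2, n) + " hits of " + n); // 9998 hits
    }
}
```

The second slot turns a 0% hit rate into essentially 100% for this pattern, which is the effect the alternating-Charset measurements below rely on.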
---------------------------------------------------------------
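With the new suggested fix, the results of running the tests on a Windows 2000
machine using the default client JVM are as follows.
CharsetBench test case (ms; columns: ISO-2022-JP only, UTF-8 only,
alternating, then the same three again)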
1.4.2_06 687 687 67656 703 688 67672
1.4.2_06 (with fix) 672 672 1406 671 671 1390
isSupportedTest test case
1.4.2_06 78 64 8046
1.4.2_06 (with fix) 63 32 282
###@###.### 2004-08-04
SUGGESTED FIX
==== Provided suggested fix in Charset.java =====
***************
*** 27,34 ****
import sun.misc.ServiceConfigurationError;
import sun.nio.cs.StandardCharsets;
import sun.nio.cs.ThreadLocalCoders;
-
/**
* A named mapping between sequences of sixteen-bit Unicode characters and
* sequences of bytes. This class defines methods for creating decoders and
--- 27,34 ----
import sun.misc.ServiceConfigurationError;
import sun.nio.cs.StandardCharsets;
import sun.nio.cs.ThreadLocalCoders;
+ import java.util.Hashtable;
/**
* A named mapping between sequences of sixteen-bit Unicode characters and
* sequences of bytes. This class defines methods for creating decoders and
***************
*** 279,287 ****
--- 279,290 ----
// along with the name that was used to find it
//
private static volatile Object[] cache = null;
+ private static Hashtable hash = new Hashtable();
private static Charset cache(String charsetName, Charset cs) {
cache = new Object[] { charsetName, cs };
+
+ hash.put( charsetName, cs );
return cs;
}
***************
*** 373,385 ****
}
}
private static Charset lookup(String charsetName) {
if (charsetName == null)
throw new IllegalArgumentException("Null charset name");
Object[] ca = cache;
if ((ca != null) && ca[0].equals(charsetName))
return (Charset)ca[1];
! Charset cs = standardProvider.charsetForName(charsetName);
if (cs != null)
return cache(charsetName, cs);
cs = lookupViaProviders(charsetName);
--- 376,395 ----
}
}
+
+
private static Charset lookup(String charsetName) {
if (charsetName == null)
throw new IllegalArgumentException("Null charset name");
+
Object[] ca = cache;
if ((ca != null) && ca[0].equals(charsetName))
return (Charset)ca[1];
!
! Charset cs = (Charset)hash.get(charsetName);
! if ( cs != null ) return cs;
!
! cs = standardProvider.charsetForName(charsetName);
if (cs != null)
return cache(charsetName, cs);
cs = lookupViaProviders(charsetName);
###@###.### 2004-02-25
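The idea behind this first patch, a name-to-Charset map consulted after the one-element cache and before the providers, can be tried outside the JDK. The sketch below is a hypothetical stand-in, not JDK code: slowLookup plays the role of standardProvider.charsetForName and simply calls the public Charset.forName, ConcurrentHashMap replaces the patch's Hashtable, and ISO-8859-1 and US-ASCII are used because every Java platform must support them. As in the patch, map entries are never evicted.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical standalone sketch of the first suggested fix: the existing
// one-element cache, backed by a map of every charset found so far.
class MapBackedLookup {
    private static volatile Object[] cache = null;  // most recent {name, charset}
    private static final Map<String, Charset> map = new ConcurrentHashMap<>();
    static int slowLookups = 0;  // demo counter, not thread-safe

    // Stand-in for standardProvider.charsetForName (an assumption, not JDK code).
    private static Charset slowLookup(String name) {
        slowLookups++;
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    static Charset lookup(String name) {
        if (name == null)
            throw new IllegalArgumentException("Null charset name");
        Object[] ca = cache;
        if (ca != null && ca[0].equals(name))
            return (Charset) ca[1];
        Charset cs = map.get(name);  // as in the patch, a map hit leaves cache alone
        if (cs != null)
            return cs;
        cs = slowLookup(name);
        if (cs != null) {
            cache = new Object[] { name, cs };
            map.put(name, cs);       // never evicted, as in the patch
        }
        return cs;
    }

    // Resets all state, rotates through the given names n times and
    // returns how many lookups reached slowLookup.
    static int slowLookupsForRotation(int n, String... names) {
        cache = null;
        map.clear();
        slowLookups = 0;
        for (int i = 0; i < n; i++)
            lookup(names[i % names.length]);
        return slowLookups;
    }

    public static void main(String[] args) {
        System.out.println(slowLookupsForRotation(10000, "ISO-8859-1", "UTF-8"));
        System.out.println(slowLookupsForRotation(9999, "ISO-8859-1", "UTF-8", "US-ASCII"));
    }
}
```

Each distinct name reaches slowLookup exactly once, however many names are in use; the price is a map that grows with every name ever looked up successfully.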
============================================================================
Here is my currently recommended fix for 1.4.2_06, which also works for 1.5:
--- /tmp/geta12651 2004-08-02 18:41:03.992978000 -0700
+++ Charset.java 2004-08-02 13:22:40.782452000 -0700
@@ -271,22 +271,23 @@
throw new IllegalCharsetNameException(s);
}
}
/* The standard set of charsets */
private static CharsetProvider standardProvider = new StandardCharsets();
- // Cache of the most-recently-returned charset,
- // along with the name that was used to find it
+ // Cache of the most-recently-returned charsets,
+ // along with the names that were used to find them
//
- private static volatile Object[] cache = null;
+ private static volatile Object[] cache1 = null; // "Level 1" cache
+ private static volatile Object[] cache2 = null; // "Level 2" cache
- private static Charset cache(String charsetName, Charset cs) {
- cache = new Object[] { charsetName, cs };
- return cs;
+ private static void cache(String charsetName, Charset cs) {
+ cache2 = cache1;
+ cache1 = new Object[] { charsetName, cs };
}
// Creates an iterator that walks over the available providers, ignoring
// those whose lookup or instantiation causes a security exception to be
// thrown. Should be invoked with full privileges.
//
private static Iterator providers() {
@@ -410,26 +411,40 @@
}
return (ecp != null) ? ecp.charsetForName(charsetName) : null;
}
private static Charset lookup(String charsetName) {
if (charsetName == null)
throw new IllegalArgumentException("Null charset name");
- Object[] ca = cache;
- if ((ca != null) && ca[0].equals(charsetName))
- return (Charset)ca[1];
- Charset cs = standardProvider.charsetForName(charsetName);
- if (cs != null)
- return cache(charsetName, cs);
- cs = lookupExtendedCharset(charsetName);
- if (cs != null)
- return cache(charsetName, cs);
- cs = lookupViaProviders(charsetName);
- if (cs != null)
- return cache(charsetName, cs);
+
+ Object[] a;
+ if ((a = cache1) != null && charsetName.equals(a[0]))
+ return (Charset)a[1];
+ // We expect most programs to use one Charset repeatedly.
+ // We convey a hint to this effect to the VM by putting the
+ // level 1 cache miss code in a separate method.
+ return lookup2(charsetName);
+ }
+
+ private static Charset lookup2(String charsetName) {
+ Object[] a;
+ if ((a = cache2) != null && charsetName.equals(a[0])) {
+ cache2 = cache1;
+ cache1 = a;
+ return (Charset)a[1];
+ }
+
+ Charset cs;
+ if ((cs = standardProvider.charsetForName(charsetName)) != null ||
+ (cs = lookupExtendedCharset(charsetName)) != null ||
+ (cs = lookupViaProviders(charsetName)) != null) {
+ cache(charsetName, cs);
+ return cs;
+ }
+
/* Only need to check the name if we didn't find a charset for it */
checkName(charsetName);
return null;
}
/**
* Tells whether the named charset is supported. </p>
----------------------------------------------------------------------
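The recommended patch only compiles inside java.nio.charset.Charset. To watch its two-level cache behave, the hypothetical standalone class below copies the cache1/cache2 logic from the diff. As assumptions, slowLookup stands in for the whole provider chain (standardProvider.charsetForName, lookupExtendedCharset, lookupViaProviders) and simply calls the public Charset.forName, and ISO-8859-1 and US-ASCII are used because every Java platform must support them. It counts how many lookups fall through both cache levels.

```java
import java.nio.charset.Charset;

// Hypothetical standalone version of the two-level cache in the
// recommended patch; not the JDK source.
class TwoLevelLookup {
    private static volatile Object[] cache1 = null; // "Level 1" cache
    private static volatile Object[] cache2 = null; // "Level 2" cache
    static int slowLookups = 0;                     // demo counter, not thread-safe

    // Stand-in for the provider chain (an assumption, not JDK code).
    private static Charset slowLookup(String name) {
        slowLookups++;
        return Charset.isSupported(name) ? Charset.forName(name) : null;
    }

    private static void cache(String name, Charset cs) {
        cache2 = cache1;                       // demote the previous entry
        cache1 = new Object[] { name, cs };
    }

    static Charset lookup(String name) {
        if (name == null)
            throw new IllegalArgumentException("Null charset name");
        Object[] a;
        if ((a = cache1) != null && name.equals(a[0]))
            return (Charset) a[1];             // fast path: level-1 hit
        return lookup2(name);
    }

    private static Charset lookup2(String name) {
        Object[] a;
        if ((a = cache2) != null && name.equals(a[0])) {
            cache2 = cache1;                   // level-2 hit: swap the levels
            cache1 = a;
            return (Charset) a[1];
        }
        Charset cs = slowLookup(name);
        if (cs != null)
            cache(name, cs);
        return cs;
    }

    // Resets the caches, rotates through the given names n times and
    // returns how many lookups reached slowLookup.
    static int slowLookupsForRotation(int n, String... names) {
        cache1 = null;
        cache2 = null;
        slowLookups = 0;
        for (int i = 0; i < n; i++)
            lookup(names[i % names.length]);
        return slowLookups;
    }

    public static void main(String[] args) {
        System.out.println(slowLookupsForRotation(10000, "ISO-8859-1", "UTF-8"));
        System.out.println(slowLookupsForRotation(9999, "ISO-8859-1", "UTF-8", "US-ASCII"));
    }
}
```

Alternating between two names reaches slowLookup only twice, whereas a strict rotation through three names misses both levels on every call. That is the trade-off of keeping the cache this small: the common path stays a single volatile read plus one equals comparison.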
Here is a benchmark program to test the above fix:
----------------------------------------------------------------------
import java.nio.charset.*;
import java.util.*;
class CharsetBench {
    static final List times = new ArrayList();
    static void time(Runnable job) {
        long t1 = System.currentTimeMillis();
        while (System.currentTimeMillis() - t1 < 10*1000) // warm up
            job.run();
        System.gc();
        System.gc();
        try { Thread.sleep(100); } catch (Exception e) {}
        long t2 = System.currentTimeMillis();
        job.run();
        times.add(new Long(System.currentTimeMillis() - t2));
    }
    public static void main(String[] args) {
        final int iterations = 10000000;
        for (int j = 0; j < 2; j++) {
            time(new Runnable() { public void run() {
                for (int i = 0; i < iterations; i++)
                    Charset.isSupported("ISO-2022-JP");
            }});
            time(new Runnable() { public void run() {
                for (int i = 0; i < iterations; i++)
                    Charset.isSupported("UTF-8");
            }});
            time(new Runnable() { public void run() {
                for (int i = 0; i < iterations; i++)
                    Charset.isSupported((i&1) == 0 ? "ISO-2022-JP" : "UTF-8");
            }});
        }
        for (Iterator it = times.iterator(); it.hasNext(); )
            System.out.print(it.next() + " ");
        System.out.println();
    }
}
----------------------------------------------------------------------
With this program, I get the following results:
----------------------------------------------------------------------
javac CharsetBench.java && for w in cc1; do for f in -server -client; do echo $w $f `jws $w java $f CharsetBench`; done; done; for v in 1.5 1.4.2_06 1.4.2; do for f in -server -client; do echo $v $f `jver $v java $f CharsetBench`; done; done
cc1 -server 74 92 413 92 92 409
cc1 -client 355 356 700 356 356 711
1.5 -server 98 116 12807 114 118 12623
1.5 -client 349 341 22302 349 349 22331
1.4.2_06 -server 98 116 17438 102 117 17617
1.4.2_06 -client 371 371 37553 359 371 37618
----------------------------------------------------------------------
(where cc1 contains the results with the proposed fix applied to 1.5).
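These results show that the proposed fix speeds up the alternating-Charset
access pattern by more than an order of magnitude (about 31x with both
-server and -client), and speeds up the repeated-single-Charset pattern by
about 20% with -server while leaving it essentially unchanged with -client.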
Looks good enough to me. Time to stop tweaking.
###@###.### 2004-08-02
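For reference, the speedup ratios quoted above can be recomputed from the 1.5 and cc1 rows of the CharsetBench results. This small sketch only performs that arithmetic on the measured millisecond values; the Speedups class and speedup method are illustrative names.

```java
// Speedup = unpatched time / patched time, using the 1.5 (unpatched)
// and cc1 (patched) rows of the CharsetBench results, in ms.
class Speedups {
    static double speedup(long unpatchedMs, long patchedMs) {
        return (double) unpatchedMs / patchedMs;
    }

    public static void main(String[] args) {
        // Alternating pattern, first run (third column).
        System.out.printf("-server alternating:      %.1fx%n", speedup(12807, 413));
        System.out.printf("-client alternating:      %.1fx%n", speedup(22302, 700));
        // Repeated single Charset, ISO-2022-JP only (first column).
        System.out.printf("-server ISO-2022-JP only: %.2fx%n", speedup(98, 74));
        System.out.printf("-client ISO-2022-JP only: %.2fx%n", speedup(349, 355));
    }
}
```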