Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 6790402
Votes 2
Synopsis Speed-up FastCharsetProvider
Category java:char_encodings
Reported Against
Release Fixed
State 5-Cause Known, bug
Priority: 4-Low
Related Bugs
Submit Date 06-JAN-2009
Description
FULL PRODUCT VERSION :
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)

ADDITIONAL OS VERSION INFORMATION :
Windows XP SR-2

A DESCRIPTION OF THE PROBLEM :
This is a MIXED list of BUGs and RFEs regarding performance of sun.nio.cs.FastCharsetProvider !

  Bug 1:
There is a shortcut to enhance VM startup in line 94:
	if (cln.equals("US_ASCII")) {
	    cs = new US_ASCII();
            ...
As it seems that current charset for startup is UTF-8, the shortcut should be updated to UTF-8.
Also check this in sun.io.Converters, where ISO-8859-1 is yet defined for defaultEncoding.

  Bug 2:
Method canonicalize() is invoked 2 times in sequence after invoking charsetForName() (lines 119 -> 82).
This unnecessarily consumes performance.

  Bug 3:
Method lookup() should be synchronized to insure lock also for invokation from charsets() iteration; synchronization could be saved for method charsetForName().

RFE 4:
After unsuccessful lookup in cache, there is a 2nd lookup in classMap to check support for given charset name:
	// Check cache first
	Charset cs = cache.get(csn);
	if (cs != null)
	    return cs;
	// Do we even support this charset?
	String cln = classMap.get(csn);
	if (cln == null)
	    return null;
This could be enhanced by initializing the cache with a NULL Charset  customer . So there is only need for 1 single map lookup:
        Charset cs = cache.get(csn);
        if (cs == null) // Don't we even support this charset?
            return null;
        if (cs != NULL)
            return cs;
so ...

RFE 5:
Class name lookup can be done after startup charset check, if latter is done by canonical name comparison:
        if (csn.equals("us-ascii"))

RFE 6:
In method toLower() s shouldn't be looped twice after 'toLower' becomes false, and copying of lower chars could be saved:
(copying String's internal char[] shouldn't be slower than new char[], as it also needs to fill initial content with 0's)
    private static String toLower(final String s) {
        boolean allLower = true;
        char[] ca = null;
        for (int i=0; i<s.length(); i++) {
            int c = s.charAt(i);
            if (((c - 'A') | ('Z' - c)) >= 0) {
                if (allLower) {
                    ca = s.toCharArray();
                    allLower = false;
                }
                ca[i] += '\u0020';
            }
        }
        return allLower ? s : new String(ca);
    }


RFE 7:
charsets() iterator must not canonicalize each charset name, so canonicalizing can be moved from lookup() to charsetForName().

See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=552&view=markup


RFE 8:
classMap could be saved. Instead pre-load the cache initially with the class names. This would save 1 HashMap lookup to get the class name of a charset to be instantiated:
    In constructor do ...
        cache = classNames; // type: Map<String,Object>

RFE 9:
Pre-load the VM's startup charset. This would save the check for each lookup:
        private final static Charset STARTUP_CHARSET = new UTF_8();
        cache.put(toLower(STARTUP_CHARSET.name()), STARTUP_CHARSET);

All this would result in very simple lookup:

    private synchronized Charset lookup(String lowCanonical) {
        // Check cache first
        Object o = cache.get(lowCanonical); // TODO: maybe also cache by arbitrary alias
        if (o instanceof String) {
            // Instantiate new charset
            Charset cs = newCharset((String)o);
            // Cache it
            if (cs != null)
                cache.put(lowCanonical, cs);
            return cs;
        }
        return (Charset)o;
    }
    Charset newCharset(String className) {
        try {
            Class c = Class.forName(packagePrefix+"."+className,
                                    true, getClass().getClassLoader());
            return (Charset)c.newInstance();
        }
        catch (ClassNotFoundException x) {}
        catch (IllegalAccessException x) {}
        catch (InstantiationException x) {}
        return null;
    }

See my correction here:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=555&view=markup


REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
For complete source code see:
https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/src/sun/nio/cs/FastCharsetProvider.java?diff_format=s&rev=551&view=markup
---------- END SOURCE ----------
Posted Date : 2009-01-06 11:04:37.0
Work Around
N/A
Evaluation
combine the "cm" and "c" maps is an interesting idea.
Posted Date : 2009-01-06 23:06:13.0
Comments
  
  Include a link with my name & email   

Submitted On 13-JAN-2009
UlfZibis
toLower() could be made little faster:

    static final String toLower(final String s) {
        boolean allLower = true;
        char[] ca = null;
        for (int i=0; i<s.length(); i++) {
            int c = s.charAt(i);
            if (c <= 'Z' && c >= 'A') {
                if (allLower) {
                    ca = s.toCharArray();
                    allLower = false;
                }
                ca[i] += '\u0020';
            }
        }
        return allLower ? s : new String(ca);
    }

Assuming that there is a significant percentage of numeric characters in charset names, only the first compare operation in the if clause must be executed in case of c is a numeric character.


Submitted On 13-JAN-2009
UlfZibis
Oops ...
Correction; It should be:
    if (c >= 'A' && c <= 'Z')


Submitted On 28-JAN-2009
UlfZibis
See also:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6796087


Submitted On 11-JUN-2009
UlfZibis
Additionally I have found out, that generated PreHashMaps are balanced much better, if they only contain uppercased keys.

-> 256 buckets against 1024 (for 209 entries).

So better use instead:
    static final String toUpper(final String s)

toUpper() would too be potentially faster, as the 2nd term of { if (c >= 'a' && c <= 'z') } would only come to process in the rare case of character '_'.


Submitted On 19-JUL-2009
UlfZibis
Fix available here:
https://bugs.openjdk.java.net/show_bug.cgi?id=100092



PLEASE NOTE: JDK6 is formerly known as Project Mustang