Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 6206838
Votes 2
Synopsis Writer.append(CharSequence) should not call toString()
Category java:classes_io
Reported Against
Release Fixed
State 6-Fix Understood, request for enhancement
Priority: 3-Medium
Related Bugs 6218677
Submit Date 10-DEC-2004
Description
A DESCRIPTION OF THE REQUEST :
In the JDK1.5.0 src.zip, it is clear to see that the append(...) methods call .toString() on their CharSequence argument.

This is not required by the Appendable interface, not is it desirable at all (see below).

My recommendation is to:
rewrite append(CharSequence csq, int start, int end) similar to write(String str, int off, int len). That is, fill up a char[] buffer by calling csq.charAt(), and then pass that buffer to write()

[
Alternatively: instead of calling toString() on the whole CharSequence, call it on small "blocks" of subsequences.
]



JUSTIFICATION :
The beautiful thing about CharSequence is that  it can be used in many places where a String can be used (regex for example), without having to use an real immutable java.lang.String.

For example, we have a CharSequence implementation that is backed by the contents of a file (which may be 100's of MB, even gigbytes). We can pass such CharSequence's to regexp, and know that it will run with a fixed memory footprint (several KB), instead of having to suck the whole file into memory.

However, Writer.append(...) isn't party to this benifit. I cannot pass one of these CharSequence's to Writer and have it "stream it out gracefully". Writer will instead load my whole file into memory as a String.
  xxxxx@xxxxx   2004-12-10 00:54:22 GMT
Suggested fix contributed by java.net member leouser:

A DESCRIPTION OF THE FIX :
BUGID 6206838 Writer.append(CharSequence) should not call toString()
FILES AFFECTED: java.io.Writer
JAVA VERSION: jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin

Discussion(embedded in testcase as well):
/**
 * BUGID 6206838 Writer.append(CharSequence) should not call toString()
 * This bug/rfe request a more refined approach to appending.  The 'problem'
 * is that CharSequence is an interface... its data storage could be anything,
 * maybe the data is stored on Mars for all we know.  The problem with
 * toString() in such a flexible situation is that there may very well be
 * a mismatch between trying to turn all of it into one big string as opposed
 * to reading it in nice 1024 sized chunks.  This is what this RFE does, it
 * revises the writing happening in append so that it does not happen all
 * at once.  I don't agree with the bug writers suggestion of doing it
 * one char at a time.  The 1024 sized chunks should be adequate and more
 * efficient while not running into the mass String problem.
 *
 * REIMPLEMENTING APPEND IN Writer IS NOT ENOUGH!:
 * In developing tests for this I discovered that StringWriter, PrintWriter,
 * CharArrayWriter and maybe others are engaging in "CUT AND PASTE" style
 * coding.  Instead of relying upon their abstract Writer superclass' append
 * methods they are cuting and pasting the implementation of the superclass
 * so that they can return a reference to 'this'!  So rewriting this code
 * will not improve at least 3 classes.  Im not certain as to "CUT AND PASTE"
 * is used.  You can easily do things like:
 * super.append(args...);
 * return this;
 * It is precisely at moments like this when "CUT AND PASTE" shows its weaknesses:
 * to have the changes in Writer to take hold we must reimplement the methods
 * in StringWriter, PrintWriter and CharArrayWriter to either call the superclass
 * or engage in the perpetrating the "CUT AND PASTE" further. yucky. :D
 *
 * BENEFITS: Less memory will be consumed in writing very large CharSequences
 * into the Writer.  Though I must admit that I find this questionable if it
 * will actually acheive this.  I believe Bloch in Effective Java has pointed
 * out that nothing is eligable for GC until the method in which it was
 * created has exited.  So we may be trading the creation of one big string for
 * lots of little Strings, whose sum consumption on memory is ==.  But we
 * may have a more efficient mechanism for writing, instead of reading one giant
 * piece of data in a large slurp we probably will be getting reads that are
 * at the optimal size for a filesystem backed CharSequence.  Other types of
 * storage, I can't say.  If the CharSequence was backed by a database it may
 * end up being less efficient... lots of round trips could be the result of
 * a series of subSequence calls... .  The more I think about this the more
 * I sense that this is not a cure-all enhancement.  Then again with any
 * enhancement there are going to be use cases that will not benefit and in
 * fact be hurt by this, maybe my CharSequence/Database example is something
 * that will be damaged.  Though if I had to choose I would pick the chunk
 * implementation over the do-it-all-at-once implementation.
 *
 * ANTI-RATIONALE: see BENEFITS...
 *
 * TESTING STRATEGY:
 * Exercise the append methods with different sets of data.  There doesn't
 * appear to be a failure case since the append method is flexible enough
 * to take a null.  We could create a malicious CharSequence but that is
 * just going to confirm that it was malicious.
 *
 * Files affected: java.io.Writer
 * Writer enhanced with java 6 version:
 * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
 *
 * test ran succesfully on a SUSE 7.3 Linux distribution
 *
 * Brian Harry
 *   xxxxx@xxxxx  
 * Jan 16, 2006
 *
 */

UNIFIED DIFF:
--- /home/nstuff/java6/jdk1.6.0/java/io/Writer.java	Thu Dec 15 02:16:37 2005
+++ /home/javarefs/java/io/Writer.java	Mon Jan 16 09:58:22 2006
@@ -204,11 +204,11 @@
      * @since  1.5
      */
     public Writer append(CharSequence csq) throws IOException {
-	if (csq == null)
+	if(csq == null){
 	    write("null");
-	else
-	    write(csq.toString());
-    	return this;
+	    return this;
+	}
+	return append(csq, 0, csq.length());
     }
 
     /**
@@ -249,7 +249,22 @@
      */
     public Writer append(CharSequence csq, int start, int end) throws IOException {
 	CharSequence cs = (csq == null ? "null" : csq);
-	write(cs.subSequence(start, end).toString());
+	int len = end - start;
+	do{
+	    int amount;
+	    if(writeBufferSize < len){
+		amount = writeBufferSize;
+		len = len - amount;
+	    }
+	    else{
+		amount = len;
+		len = 0;
+	    }
+	    int eindex = start + amount;
+	    write(cs.subSequence(start, eindex).toString());
+	    start += amount;
+	}
+	while(len != 0);
  	return this;
     }
 


JUnit TESTCASE :
import junit.framework.TestCase;
import junit.textui.TestRunner;
import static java.lang.System.out;
import java.io.*;

/**
 * BUGID 6206838 Writer.append(CharSequence) should not call toString()
 * This bug/rfe request a more refined approach to appending.  The 'problem'
 * is that CharSequence is an interface... its data storage could be anything,
 * maybe the data is stored on Mars for all we know.  The problem with
 * toString() in such a flexible situation is that there may very well be
 * a mismatch between trying to turn all of it into one big string as opposed
 * to reading it in nice 1024 sized chunks.  This is what this RFE does, it
 * revises the writing happening in append so that it does not happen all
 * at once.  I don't agree with the bug writers suggestion of doing it
 * one char at a time.  The 1024 sized chunks should be adequate and more
 * efficient while not running into the mass String problem.
 *
 * REIMPLEMENTING APPEND IN Writer IS NOT ENOUGH!:
 * In developing tests for this I discovered that StringWriter, PrintWriter,
 * CharArrayWriter and maybe others are engaging in "CUT AND PASTE" style
 * coding.  Instead of relying upon their abstract Writer superclass' append
 * methods they are cuting and pasting the implementation of the superclass
 * so that they can return a reference to 'this'!  So rewriting this code
 * will not improve at least 3 classes.  Im not certain as to "CUT AND PASTE"
 * is used.  You can easily do things like:
 * super.append(args...);
 * return this;
 * It is precisely at moments like this when "CUT AND PASTE" shows its weaknesses:
 * to have the changes in Writer to take hold we must reimplement the methods
 * in StringWriter, PrintWriter and CharArrayWriter to either call the superclass
 * or engage in the perpetrating the "CUT AND PASTE" further. yucky. :D
 *
 * BENEFITS: Less memory will be consumed in writing very large CharSequences
 * into the Writer.  Though I must admit that I find this questionable if it
 * will actually acheive this.  I believe Bloch in Effective Java has pointed
 * out that nothing is eligable for GC until the method in which it was
 * created has exited.  So we may be trading the creation of one big string for
 * lots of little Strings, whose sum consumption on memory is ==.  But we
 * may have a more efficient mechanism for writing, instead of reading one giant
 * piece of data in a large slurp we probably will be getting reads that are
 * at the optimal size for a filesystem backed CharSequence.  Other types of
 * storage, I can't say.  If the CharSequence was backed by a database it may
 * end up being less efficient... lots of round trips could be the result of
 * a series of subSequence calls... .  The more I think about this the more
 * I sense that this is not a cure-all enhancement.  Then again with any
 * enhancement there are going to be use cases that will not benefit and in
 * fact be hurt by this, maybe my CharSequence/Database example is something
 * that will be damaged.  Though if I had to choose I would pick the chunk
 * implementation over the do-it-all-at-once implementation.
 *
 * ANTI-RATIONALE: see BENEFITS...
 *
 * TESTING STRATEGY:
 * Exercise the append methods with different sets of data.  There doesn't
 * appear to be a failure case since the append method is flexible enough
 * to take a null.  We could create a malicious CharSequence but that is
 * just going to confirm that it was malicious.
 *
 * Files affect: java.io.Writer
 * Writer enhanced with java 6 version:
 * jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
 *
 * test ran succesfully on a SUSE 7.3 Linux distribution
 *
 * Brian Harry
 *   xxxxx@xxxxx  
 * Jan 16, 2006
 *
 */
public class TestWriter extends TestCase{


    public TestWriter(String method){
	super(method);
    }


    public void testWriter1(){
	
	out.println();
	out.println("Testing append(CharSequence)");
	CharSequence test1 = "Java is Cool";
	String test1s = test1.toString();
	StringWriter2 sw = new StringWriter2();
	sw.append(test1);
	String results = sw.toString();
	assertEquals(test1s, results);
	out.println("Testing append(CharSequence, int,int)");
	StringWriter2 sw2 = new StringWriter2();
	sw2.append(test1, 1, test1.length());
	String results2 = sw2.toString();
	assertEquals(test1s.substring(1), results2);
	StringBuilder sb = new StringBuilder();
	for(int i = 0; i < 300; i++)
	    sb.append(test1);

	out.println("Testing String of size " + sb.length());
	String bigtest = sb.toString();
	StringWriter2 sw3 = new StringWriter2();
	sw3.append(bigtest);
	String results3 = sw3.toString();
	assertEquals(bigtest, results3);
	out.println("Testing null...");
	StringWriter2 sw4 = new StringWriter2();
	sw4.append((CharSequence)null);
	assertEquals(sw4.toString(), "null");
	out.println("Testing nothing string...");
	StringWriter2 sw5 = new StringWriter2();
	sw5.append("");
	assertEquals( "", sw5.toString());

    }








    public static void main(String[] args){
	
	TestCase tc = new TestWriter("testWriter1");
	TestRunner.run(tc);

    }

public static class StringWriter2 extends Writer {

    private StringBuffer buf;


    public StringWriter2() {
	buf = new StringBuffer();
	lock = buf;
    }


    public StringWriter2(int initialSize) {
	if (initialSize < 0) {
	    throw new IllegalArgumentException("Negative buffer size");
	}
	buf = new StringBuffer(initialSize);
	lock = buf;
    }


    public void write(int c) {
	buf.append((char) c);
    }


    public void write(char cbuf[], int off, int len) {
        if ((off < 0) || (off > cbuf.length) || (len < 0) ||
            ((off + len) > cbuf.length) || ((off + len) < 0)) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return;
        }
        buf.append(cbuf, off, len);
    }


    public void write(String str) {
	buf.append(str);
    }


    public void write(String str, int off, int len)  {
	buf.append(str.substring(off, off + len));
    }

 
    public StringWriter2 append(CharSequence csq) {
	if (csq == null)
	    write("null");
	else
	    append(csq, 0, csq.length());
    	return this;
    }

 
    public StringWriter2 append(CharSequence csq, int start, int end) {
	CharSequence cs = (csq == null ? "null" : csq);
	try{
	    super.append(cs, start, end);
	}
	catch(IOException io){}
    	return this;
    }

 
    public StringWriter2 append(char c) {
	write(c);
	return this;
    }

 
    public String toString() {
	return buf.toString();
    }

 
    public StringBuffer getBuffer() {
	return buf;
    }

  
    public void flush() {
    }

 
    public void close() throws IOException {
    }

}

}


FIX FOR BUG NUMBER:
6206838
Posted Date : 2006-01-18 19:22:30.0
Work Around
N/A
Evaluation
Yes, it is possible to avoid loading a large CharSequence into a String in one operation.  Since there is no guarantee that append() (or the write() which it is specified to delegate to) is atomic, we can re-write append(CharSequence, int, int) to buffer its processing of the input CharSequence in increments of writeBufferSize.
Posted Date : 2005-10-24 00:09:25.0

Contribution-forum:https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?forumID=1463&messageID=10943
Posted Date : 2006-01-18 19:22:29.0
Comments
  
  Include a link with my name & email   


PLEASE NOTE: JDK6 is formerly known as Project Mustang