|
Description
|
A DESCRIPTION OF THE REQUEST :
In the JDK1.5.0 src.zip, it is clear to see that the append(...) methods call .toString() on their CharSequence argument.
This is not required by the Appendable interface, not is it desirable at all (see below).
My recommendation is to:
rewrite append(CharSequence csq, int start, int end) similar to write(String str, int off, int len). That is, fill up a char[] buffer by calling csq.charAt(), and then pass that buffer to write()
[
Alternatively: instead of calling toString() on the whole CharSequence, call it on small "blocks" of subsequences.
]
JUSTIFICATION :
The beautiful thing about CharSequence is that it can be used in many places where a String can be used (regex for example), without having to use an real immutable java.lang.String.
For example, we have a CharSequence implementation that is backed by the contents of a file (which may be 100's of MB, even gigbytes). We can pass such CharSequence's to regexp, and know that it will run with a fixed memory footprint (several KB), instead of having to suck the whole file into memory.
However, Writer.append(...) isn't party to this benifit. I cannot pass one of these CharSequence's to Writer and have it "stream it out gracefully". Writer will instead load my whole file into memory as a String.
xxxxx@xxxxx 2004-12-10 00:54:22 GMT
Suggested fix contributed by java.net member leouser:
A DESCRIPTION OF THE FIX :
BUGID 6206838 Writer.append(CharSequence) should not call toString()
FILES AFFECTED: java.io.Writer
JAVA VERSION: jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
Discussion(embedded in testcase as well):
/**
* BUGID 6206838 Writer.append(CharSequence) should not call toString()
* This bug/rfe request a more refined approach to appending. The 'problem'
* is that CharSequence is an interface... its data storage could be anything,
* maybe the data is stored on Mars for all we know. The problem with
* toString() in such a flexible situation is that there may very well be
* a mismatch between trying to turn all of it into one big string as opposed
* to reading it in nice 1024 sized chunks. This is what this RFE does, it
* revises the writing happening in append so that it does not happen all
* at once. I don't agree with the bug writers suggestion of doing it
* one char at a time. The 1024 sized chunks should be adequate and more
* efficient while not running into the mass String problem.
*
* REIMPLEMENTING APPEND IN Writer IS NOT ENOUGH!:
* In developing tests for this I discovered that StringWriter, PrintWriter,
* CharArrayWriter and maybe others are engaging in "CUT AND PASTE" style
* coding. Instead of relying upon their abstract Writer superclass' append
* methods they are cuting and pasting the implementation of the superclass
* so that they can return a reference to 'this'! So rewriting this code
* will not improve at least 3 classes. Im not certain as to "CUT AND PASTE"
* is used. You can easily do things like:
* super.append(args...);
* return this;
* It is precisely at moments like this when "CUT AND PASTE" shows its weaknesses:
* to have the changes in Writer to take hold we must reimplement the methods
* in StringWriter, PrintWriter and CharArrayWriter to either call the superclass
* or engage in the perpetrating the "CUT AND PASTE" further. yucky. :D
*
* BENEFITS: Less memory will be consumed in writing very large CharSequences
* into the Writer. Though I must admit that I find this questionable if it
* will actually acheive this. I believe Bloch in Effective Java has pointed
* out that nothing is eligable for GC until the method in which it was
* created has exited. So we may be trading the creation of one big string for
* lots of little Strings, whose sum consumption on memory is ==. But we
* may have a more efficient mechanism for writing, instead of reading one giant
* piece of data in a large slurp we probably will be getting reads that are
* at the optimal size for a filesystem backed CharSequence. Other types of
* storage, I can't say. If the CharSequence was backed by a database it may
* end up being less efficient... lots of round trips could be the result of
* a series of subSequence calls... . The more I think about this the more
* I sense that this is not a cure-all enhancement. Then again with any
* enhancement there are going to be use cases that will not benefit and in
* fact be hurt by this, maybe my CharSequence/Database example is something
* that will be damaged. Though if I had to choose I would pick the chunk
* implementation over the do-it-all-at-once implementation.
*
* ANTI-RATIONALE: see BENEFITS...
*
* TESTING STRATEGY:
* Exercise the append methods with different sets of data. There doesn't
* appear to be a failure case since the append method is flexible enough
* to take a null. We could create a malicious CharSequence but that is
* just going to confirm that it was malicious.
*
* Files affected: java.io.Writer
* Writer enhanced with java 6 version:
* jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
*
* test ran succesfully on a SUSE 7.3 Linux distribution
*
* Brian Harry
* xxxxx@xxxxx
* Jan 16, 2006
*
*/
UNIFIED DIFF:
--- /home/nstuff/java6/jdk1.6.0/java/io/Writer.java Thu Dec 15 02:16:37 2005
+++ /home/javarefs/java/io/Writer.java Mon Jan 16 09:58:22 2006
@@ -204,11 +204,11 @@
* @since 1.5
*/
public Writer append(CharSequence csq) throws IOException {
- if (csq == null)
+ if(csq == null){
write("null");
- else
- write(csq.toString());
- return this;
+ return this;
+ }
+ return append(csq, 0, csq.length());
}
/**
@@ -249,7 +249,22 @@
*/
public Writer append(CharSequence csq, int start, int end) throws IOException {
CharSequence cs = (csq == null ? "null" : csq);
- write(cs.subSequence(start, end).toString());
+ int len = end - start;
+ do{
+ int amount;
+ if(writeBufferSize < len){
+ amount = writeBufferSize;
+ len = len - amount;
+ }
+ else{
+ amount = len;
+ len = 0;
+ }
+ int eindex = start + amount;
+ write(cs.subSequence(start, eindex).toString());
+ start += amount;
+ }
+ while(len != 0);
return this;
}
JUnit TESTCASE :
import junit.framework.TestCase;
import junit.textui.TestRunner;
import static java.lang.System.out;
import java.io.*;
/**
* BUGID 6206838 Writer.append(CharSequence) should not call toString()
* This bug/rfe request a more refined approach to appending. The 'problem'
* is that CharSequence is an interface... its data storage could be anything,
* maybe the data is stored on Mars for all we know. The problem with
* toString() in such a flexible situation is that there may very well be
* a mismatch between trying to turn all of it into one big string as opposed
* to reading it in nice 1024 sized chunks. This is what this RFE does, it
* revises the writing happening in append so that it does not happen all
* at once. I don't agree with the bug writers suggestion of doing it
* one char at a time. The 1024 sized chunks should be adequate and more
* efficient while not running into the mass String problem.
*
* REIMPLEMENTING APPEND IN Writer IS NOT ENOUGH!:
* In developing tests for this I discovered that StringWriter, PrintWriter,
* CharArrayWriter and maybe others are engaging in "CUT AND PASTE" style
* coding. Instead of relying upon their abstract Writer superclass' append
* methods they are cuting and pasting the implementation of the superclass
* so that they can return a reference to 'this'! So rewriting this code
* will not improve at least 3 classes. Im not certain as to "CUT AND PASTE"
* is used. You can easily do things like:
* super.append(args...);
* return this;
* It is precisely at moments like this when "CUT AND PASTE" shows its weaknesses:
* to have the changes in Writer to take hold we must reimplement the methods
* in StringWriter, PrintWriter and CharArrayWriter to either call the superclass
* or engage in the perpetrating the "CUT AND PASTE" further. yucky. :D
*
* BENEFITS: Less memory will be consumed in writing very large CharSequences
* into the Writer. Though I must admit that I find this questionable if it
* will actually acheive this. I believe Bloch in Effective Java has pointed
* out that nothing is eligable for GC until the method in which it was
* created has exited. So we may be trading the creation of one big string for
* lots of little Strings, whose sum consumption on memory is ==. But we
* may have a more efficient mechanism for writing, instead of reading one giant
* piece of data in a large slurp we probably will be getting reads that are
* at the optimal size for a filesystem backed CharSequence. Other types of
* storage, I can't say. If the CharSequence was backed by a database it may
* end up being less efficient... lots of round trips could be the result of
* a series of subSequence calls... . The more I think about this the more
* I sense that this is not a cure-all enhancement. Then again with any
* enhancement there are going to be use cases that will not benefit and in
* fact be hurt by this, maybe my CharSequence/Database example is something
* that will be damaged. Though if I had to choose I would pick the chunk
* implementation over the do-it-all-at-once implementation.
*
* ANTI-RATIONALE: see BENEFITS...
*
* TESTING STRATEGY:
* Exercise the append methods with different sets of data. There doesn't
* appear to be a failure case since the append method is flexible enough
* to take a null. We could create a malicious CharSequence but that is
* just going to confirm that it was malicious.
*
* Files affect: java.io.Writer
* Writer enhanced with java 6 version:
* jdk-6-rc-bin-b64-linux-i586-15_dec_2005.bin
*
* test ran succesfully on a SUSE 7.3 Linux distribution
*
* Brian Harry
* xxxxx@xxxxx
* Jan 16, 2006
*
*/
public class TestWriter extends TestCase{
public TestWriter(String method){
super(method);
}
public void testWriter1(){
out.println();
out.println("Testing append(CharSequence)");
CharSequence test1 = "Java is Cool";
String test1s = test1.toString();
StringWriter2 sw = new StringWriter2();
sw.append(test1);
String results = sw.toString();
assertEquals(test1s, results);
out.println("Testing append(CharSequence, int,int)");
StringWriter2 sw2 = new StringWriter2();
sw2.append(test1, 1, test1.length());
String results2 = sw2.toString();
assertEquals(test1s.substring(1), results2);
StringBuilder sb = new StringBuilder();
for(int i = 0; i < 300; i++)
sb.append(test1);
out.println("Testing String of size " + sb.length());
String bigtest = sb.toString();
StringWriter2 sw3 = new StringWriter2();
sw3.append(bigtest);
String results3 = sw3.toString();
assertEquals(bigtest, results3);
out.println("Testing null...");
StringWriter2 sw4 = new StringWriter2();
sw4.append((CharSequence)null);
assertEquals(sw4.toString(), "null");
out.println("Testing nothing string...");
StringWriter2 sw5 = new StringWriter2();
sw5.append("");
assertEquals( "", sw5.toString());
}
public static void main(String[] args){
TestCase tc = new TestWriter("testWriter1");
TestRunner.run(tc);
}
public static class StringWriter2 extends Writer {
private StringBuffer buf;
public StringWriter2() {
buf = new StringBuffer();
lock = buf;
}
public StringWriter2(int initialSize) {
if (initialSize < 0) {
throw new IllegalArgumentException("Negative buffer size");
}
buf = new StringBuffer(initialSize);
lock = buf;
}
public void write(int c) {
buf.append((char) c);
}
public void write(char cbuf[], int off, int len) {
if ((off < 0) || (off > cbuf.length) || (len < 0) ||
((off + len) > cbuf.length) || ((off + len) < 0)) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return;
}
buf.append(cbuf, off, len);
}
public void write(String str) {
buf.append(str);
}
public void write(String str, int off, int len) {
buf.append(str.substring(off, off + len));
}
public StringWriter2 append(CharSequence csq) {
if (csq == null)
write("null");
else
append(csq, 0, csq.length());
return this;
}
public StringWriter2 append(CharSequence csq, int start, int end) {
CharSequence cs = (csq == null ? "null" : csq);
try{
super.append(cs, start, end);
}
catch(IOException io){}
return this;
}
public StringWriter2 append(char c) {
write(c);
return this;
}
public String toString() {
return buf.toString();
}
public StringBuffer getBuffer() {
return buf;
}
public void flush() {
}
public void close() throws IOException {
}
}
}
FIX FOR BUG NUMBER:
6206838
Posted Date : 2006-01-18 19:22:30.0
|