Bug Database
Bug ID: 4980042
Votes: 0
Synopsis: Cannot use Surrogates in zip file metadata like filenames
Category: java:classes_util_jarzip
Reported Against: tiger-beta
Release Fixed: 7(b57)
State: 10-Fix Delivered, bug
Priority: 4-Low
Related Bugs: 5030283, 4244499
Submit Date: 19-JAN-2004
Description
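java/util/zip/ZipOutputStream.java has a hand-written implementation of UTF-8
encoding that does not take surrogates into account. Each char of a surrogate
pair is encoded on its own as a 3-byte sequence, instead of the pair being
combined into one code point and encoded as a single 4-byte sequence: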

private static byte[] getUTF8Bytes(String s) {
    char[] c = s.toCharArray();
    int len = c.length;
    // Count the number of encoded bytes...
    int count = 0;
    for (int i = 0; i < len; i++) {
        int ch = c[i];
        if (ch <= 0x7f) {
            count++;
        } else if (ch <= 0x7ff) {
            count += 2;
        } else {
            count += 3;
        }
    }
    // Now return the encoded bytes...
    byte[] b = new byte[count];
    int off = 0;
    for (int i = 0; i < len; i++) {
        int ch = c[i];
        if (ch <= 0x7f) {
            b[off++] = (byte)ch;
        } else if (ch <= 0x7ff) {
            b[off++] = (byte)((ch >> 6) | 0xc0);
            b[off++] = (byte)((ch & 0x3f) | 0x80);
        } else {
            b[off++] = (byte)((ch >> 12) | 0xe0);
            b[off++] = (byte)(((ch >> 6) & 0x3f) | 0x80);
            b[off++] = (byte)((ch & 0x3f) | 0x80);
        }
    }
    return b;
}
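
To make the difference concrete, here is a small sketch that runs the same per-char logic as getUTF8Bytes on a name containing U+10400 (a supplementary character, stored as the surrogate pair D801 DC00) and compares it with the standard UTF-8 encoder. The class and method names (SurrogateEncodingDemo, perCharEncode, hex) are made up for illustration and are not part of the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class SurrogateEncodingDemo {
    // Condensed copy of the per-char logic in the quoted getUTF8Bytes:
    // every UTF-16 code unit is encoded independently of its neighbours.
    static byte[] perCharEncode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (char c : s.toCharArray()) {
            int ch = c;
            if (ch <= 0x7f) {
                out.write(ch);
            } else if (ch <= 0x7ff) {
                out.write((ch >> 6) | 0xc0);
                out.write((ch & 0x3f) | 0x80);
            } else {
                // Surrogates (0xD800..0xDFFF) fall into this branch too.
                out.write((ch >> 12) | 0xe0);
                out.write(((ch >> 6) & 0x3f) | 0x80);
                out.write((ch & 0x3f) | 0x80);
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex for easy comparison.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String name = new String(Character.toChars(0x10400));
        // Per-char encoding: two 3-byte sequences, one per surrogate.
        System.out.println(hex(perCharEncode(name)));                    // ED A0 81 ED B0 80
        // Standard UTF-8: one 4-byte sequence for the code point.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));  // F0 90 90 80
    }
}
```

The six-byte form is not valid standard UTF-8, so a tool that decodes entry names strictly as UTF-8 would not recover the original name.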
-----------------------------------------------------------
Also, Norbert Lindenberg noted:

I did notice another thing that looks fishy: 
src/share/native/java/util/zip/ZipFile.c has calls to the JNI routines 
GetStringUTFLength and GetStringUTFRegion, apparently also to handle 
file names. These are probably wrong, because JNI uses modified UTF-8 
and zip/jar files should use standard UTF-8.
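
The two places where modified UTF-8 departs from standard UTF-8 can be seen from pure Java, without JNI. The sketch below uses DataOutputStream.writeUTF, which is documented to write modified UTF-8, as a stand-in for what the JNI string routines produce. That substitution is an assumption made for illustration, and the names ModifiedUtf8Demo, modifiedUtf8 and standardUtf8 are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Encodes s as modified UTF-8 via DataOutputStream.writeUTF and strips
    // the 2-byte length prefix that writeUTF puts in front of the data.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Encodes s with the standard UTF-8 charset.
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nul = "\0";
        String supplementary = new String(Character.toChars(0x10400));
        // U+0000: two bytes (C0 80) in modified UTF-8, one byte (00) in standard UTF-8.
        System.out.println(modifiedUtf8(nul).length + " vs " + standardUtf8(nul).length);                     // 2 vs 1
        // U+10400: two 3-byte surrogate sequences vs one 4-byte sequence.
        System.out.println(modifiedUtf8(supplementary).length + " vs " + standardUtf8(supplementary).length); // 6 vs 4
    }
}
```

For names made only of characters in U+0001..U+FFFF outside the surrogate range the two encodings agree, which is presumably why the mismatch goes unnoticed in most archives.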
Work Around
N/A
Evaluation
Probably jarzip should use standard mechanisms for encoding/decoding UTF8,
instead of doing it by hand.
  xxxxx@xxxxx   2004-01-18
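
As a rough idea of what "standard mechanisms" could look like, the sketch below encodes and decodes names with the java.nio.charset UTF-8 coder, configured to report errors rather than silently substitute. It is not the code that went into the JDK; StandardUtf8Codec and its methods are hypothetical names, and wrapping the checked exception in IllegalArgumentException is a choice made here for brevity.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StandardUtf8Codec {
    // Encodes a name as standard UTF-8; an unpaired surrogate is reported
    // as an error instead of being written out as a 3-byte sequence.
    static byte[] encode(String name) {
        try {
            ByteBuffer bb = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(name));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("name is not encodable as UTF-8", e);
        }
    }

    // Decodes standard UTF-8 bytes back into a name, rejecting malformed input.
    static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("bytes are not valid UTF-8", e);
        }
    }

    public static void main(String[] args) {
        String name = new String(Character.toChars(0x10400)) + ".txt";
        byte[] encoded = encode(name);
        System.out.println(encoded.length);                // 8
        System.out.println(decode(encoded).equals(name));  // true
    }
}
```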
Changing the current encoding to support surrogates could result in JAR files that previous Java releases cannot read. This incompatibility is not acceptable. In fixing 4244499, though, there is a reasonable chance that support can be provided for the current implementation's encoding as well as for standard UTF-8.
Posted Date : 2008-04-09 21:17:53.0

We go with the standard UTF-8 charset as of JDK 7.
Posted Date : 2009-04-16 22:27:50.0
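
A quick way to check the behaviour on Java 7 or later is to round-trip an entry name containing a supplementary character through an in-memory zip. The sketch uses the ZipOutputStream(OutputStream, Charset) and ZipInputStream(InputStream, Charset) constructors, which exist in java.util.zip since 1.7. Whether they were introduced by this particular fix is not stated here, and ZipNameRoundTrip and roundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipNameRoundTrip {
    // Writes a single entry with the given name to an in-memory zip,
    // reads the zip back and returns the name of its first entry.
    static String roundTrip(String entryName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ZipOutputStream zos = new ZipOutputStream(buf, StandardCharsets.UTF_8)) {
                zos.putNextEntry(new ZipEntry(entryName));
                zos.write(new byte[] {1, 2, 3});
                zos.closeEntry();
            }
            try (ZipInputStream zis = new ZipInputStream(
                    new ByteArrayInputStream(buf.toByteArray()), StandardCharsets.UTF_8)) {
                ZipEntry entry = zis.getNextEntry();
                return entry == null ? null : entry.getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "dir/" + new String(Character.toChars(0x10400)) + ".txt";
        System.out.println(name.equals(roundTrip(name)));  // true
    }
}
```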