Bug Database
Bug ID: 4244499
Votes 641
Synopsis ZipEntry() does not convert filenames from Unicode to platform
Category java:classes_util_jarzip
Reported Against 1.3 , 1.4 , 1.2.1 , 1.2.2 , 1.3.0 , 1.3.1 , 1.4.1 , 1.4.2 , 1.4.0_01
Release Fixed 7(b57)
State 10-Fix Delivered, bug
Priority: 2-High
Related Bugs 6245146 , 6272251 , 4412571 , 4508677 , 4532049 , 5030283 , 6739892 , 6827209 , 4700978 , 4980042 , 4820807 , 4415733
Submit Date 07-JUN-1999
Description


I am trying to create a ZIP archive of files whose names are French words (i.e. with accented characters). The filenames are held in Strings, which means they are encoded in Unicode. If I create a File from the String filename, the name is converted correctly to the platform encoding; but if I create a ZipEntry from the same String, it is NOT converted to the platform encoding, so the filename stored in the ZIP archive is the Unicode image (unreadable by various ZIP tools!).

For instance:

String filename = "élève.txt";

// This will create a correct filename on disk
File myFile = new File(filename);
...
// A file élève.txt is created on disk

// This will create a bad (unconverted) filename in ZIP archive
ZipEntry myEntry = new ZipEntry(filename);
...
// An entry Ã©lÃ¨ve.txt is created in ZIP archive

The result is that the generated ZIP entry is not usable for extraction...
(Review ID: 83688) 
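
To make the failure mode concrete, here is a minimal sketch that reproduces the garbling without touching the zip classes at all. It assumes the entry name is stored as UTF-8 bytes (as the evaluation of this bug states) and that the reading tool interprets those bytes as ISO-8859-1; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

final class EntryNameMojibake {
    /**
     * Encodes the name as UTF-8 and decodes the resulting bytes as
     * ISO-8859-1, mimicking a tool that assumes a single-byte encoding.
     */
    static String asSeenByLatin1Tool(String entryName) {
        byte[] utf8 = entryName.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "\u00e9l\u00e8ve.txt"; // "élève.txt" written with escapes
        // Each accented letter comes back as two characters.
        System.out.println(asSeenByLatin1Tool(name));
    }
}
```

A tool that assumes a different single-byte encoding (Cp437, for instance) would show different garbage, but the doubling of each accented character would be the same.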
======================================================================




Solaris VM (build Solaris_JDK_1.2.1_04, native threads, sunwjit)
Classic VM (build JDK-1.2.2-W, green threads, sunwjit)
java version "1.1.6"

Within a ZIP file, pathnames use the forward slash / as separator, as required
by the ZIP spec (ftp://ftp.uu.net/pub/archiving/zip/doc/appnote-970311-iz.zip).
This requires a conversion from or to the local file.separator on systems like
Windows.  The API (ZipEntry) does not take care of the transformation, and the
need for the programmer to deal with it is not documented.  As a result, code
like
  ZipEntry ze;
  File f;
  f = new File( ze.getName());
will be written and fail on the Windows platform, or the reverse
  ze = new ZipEntry( f.getName());
will fail or produce invalid jars on Windows platforms.

Either the docs or the API needs to be fixed.  Preferably a new method and
constructor could be added
  File f = ze.toFile();     ze = new ZipEntry( f);
that would perform the translation between '/' and File.separatorChar, leaving
the existing methods/constructors (perhaps deprecated) for use by existing code.
But if the API is not fixed, then the docs must be fixed to make sure the
programmer deals with the translation explicitly.

Note new methods in java.util.zip.ZipEntry would also need to be reflected
in java.util.jar.JarEntry.
(Review ID: 100505)
======================================================================
Posted Date : 2005-08-15 12:10:22.0
Work Around


ZipEntry ze;
File f;
String s;

// Entry name -> local path: zip entries always use '/'
s = ze.getName();
if ( File.separatorChar != '/' )
  s = s.replace( '/', File.separatorChar);
f = new File( s);

// Local path -> entry name (getPath() keeps the directory part)
s = f.getPath();
if ( File.separatorChar != '/')
  s = s.replace( File.separatorChar, '/');
ze = new ZipEntry( s);
(Review ID: 100505)
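
The same translation can be packaged as a small helper. This is only a sketch: ZipPaths and its method names are hypothetical (they are not the toFile()/ZipEntry(File) API requested in the description), and the separator is passed explicitly so that the Windows behaviour can be exercised on any platform.

```java
import java.io.File;

final class ZipPaths {
    /** Converts a local path to a zip entry name using the given separator. */
    static String toEntryName(String localPath, char separator) {
        return separator == '/' ? localPath : localPath.replace(separator, '/');
    }

    /** Converts a zip entry name to a local path using the given separator. */
    static String toLocalPath(String entryName, char separator) {
        return separator == '/' ? entryName : entryName.replace('/', separator);
    }

    /** Convenience overload for the running platform. */
    static String toEntryName(File file) {
        return toEntryName(file.getPath(), File.separatorChar);
    }

    /** Convenience overload for the running platform. */
    static File toFile(String entryName) {
        return new File(toLocalPath(entryName, File.separatorChar));
    }
}
```

For example, ZipPaths.toEntryName("docs\\fr\\a.txt", '\\') gives "docs/fr/a.txt" regardless of the platform the code runs on.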
======================================================================
Evaluation
There's a lot of additional information in the JDC discussions about this bug and the duplicates 4532049, 4700978, 4415733, 4820807.

The zip specification does not specify the character encoding to be used for file names (essentially, it doesn't consider file names that include non-ASCII characters). We decided that for jar files, which must be portable between different platforms and different locale environments, only UTF-8 makes sense. Therefore the code currently encodes and decodes all file names within jar/zip files using UTF-8.

However, for normal (non-jar) zip files, the convention used by other tools is to use the platform encoding for file names. Applications that use the java.util.zip package to read/write normal zip files therefore fail (or produce unreadable files) if a file name contains a non-ASCII character, unless the platform encoding happens to be UTF-8.

To solve this problem, I think we need to distinguish between jar and zip files, and enable the use of encodings other than UTF-8 for the file names within non-jar zip files.

A possible solution would be to add a ZipFile constructor:

    java.util.zip.ZipFile.ZipFile(File file,  int mode, String encoding)

which lets an application specify the encoding for the file names and zip comments used within the zip file. Document that the encoding used for the other constructors is UTF-8, and that callers of the new constructor can pass in the result of java.nio.charset.Charset.defaultCharset().name() to request the platform encoding.
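
For comparison, Java 7 and later provide Charset-taking constructors such as ZipOutputStream(OutputStream, Charset) and ZipInputStream(InputStream, Charset), rather than the String-based signature sketched above. The following is a minimal in-memory round trip, assuming Java 8 or later (UncheckedIOException is used only to keep the helpers free of checked exceptions); the class and helper names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class CharsetZipRoundTrip {
    /** Writes a one-entry zip whose entry name is encoded with nameCharset. */
    static byte[] writeSingleEntry(String name, byte[] data, Charset nameCharset) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, nameCharset)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(data);
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Reads the first entry name, decoding it with nameCharset. */
    static String readFirstEntryName(byte[] zip, Charset nameCharset) {
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry = zis.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing and reading with the same charset returns the original name. The charset only affects entry names and comments, not the entry contents.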

This lets applications access zip files that use the encoding of the platform they run on, or even generate zip files using the encoding of the platform of the client machine that a zip file is intended for (some of the bug discussion mentions servlets creating zip files for download).

The jar classes would continue to use the constructors that don't take the encoding parameter, and therefore continue to use UTF-8.

The encoding of the contents of the files included in the zip files is not affected - they're just byte streams.

For command line use, the jar command could be enhanced with an option that specifies the file name encoding, using either an encoding name or "default" for the platform encoding. This option should be disabled when creating jar files.
  xxxxx@xxxxx   2005-1-28 18:42:10 GMT
We expect to resolve this in the Dolphin/6.0 release (though our planning for
Dolphin is not complete).  We anticipate a Dolphin source repository sometime
this summer.  Hopefully, we can get this fix into Dolphin very early, to
discover any unintended consequences well before Dolphin's official release.

A contributor to the JDK community has started working on this bug (thanks!)
and you can join/follow the discussion here:

https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?messageID=13115&forumID=1463

We're considering two possibilities for the fix: one is largely that proposed
by several people, namely to add constructors that allow clients to indicate a
zip file's encoding.  The other is to work with providers of zip
implementations to provide the encoding of the entries in a file in the file
itself.  Discussion on the latter has been started at the above URL (see the
entry "Unicode extension for ZIP file specification").

Note that this bug raises two, independent issues: one concerns the character
encoding for the file's entries; the other concerns the kind of path separator
that is used on particular platforms.  The latter has a straightforward fix
(and for now, work around as noted).
Posted Date : 2006-06-13 18:20:56.0

Contribution forum : https://jdk-collaboration.dev.java.net/servlets/ProjectForumMessageView?forumID=1463&messageID=16142
Posted Date : 2006-10-13 08:02:16.0
Comments
  

Submitted On 27-JUL-2000
Fotis
"Within a zip file, all entry names use / as the separator": This seems to be true at the moment and is a very bad idea, because it means that all zip files produced this way cannot be opened by Windows tools.


Submitted On 27-SEP-2000
sukie
Sukie 9/27/00 - The first problem described in this bug report (i.e. corrupted French accented filename in 
ZipEntry) should be reported against JDK 1.1.8 too.  This bug really makes it impossible to zip any 
filenames containing accented or double byte characters.  If you know of any workaround to this problem, I 
would appreciate hearing from you. After all, I suspect Sun is going to wave its hands and say we are not 
fixing it for JDK 1.1.8.  


Submitted On 14-MAY-2001
elm12
Are there any plans to fix this bug at all? It's been known 
for almost two years now!!


Submitted On 29-MAY-2001
skonigsdorfer
(From bug originator)

I fully agree with the previous comment !
I opened this bug two years ago, and I still cannot make a 
ZIP file with french file names !!!
I have just discovered Sun has released the JDK 1.4.0 beta, 
with JRE translation in ten languages, but this bug still 
remains...

Conclusion: please vote for this bug !!!


Submitted On 07-JUN-2001
jeff_robertson
The statement that zip files created by the java.util.zip 
classes cannot be opened by Windows-based unzip programs is 
incorrect.


Submitted On 08-JUN-2001
jeff_robertson
In addition to my earlier comments, I would also like to 
point out that java.io.File *already* translates 
between '/' and '\' when on Windows. Try it:

System.out.println( new File("c:/foo"));

will print:

c:\foo

(it doesn't matter anyway because Windows allows either / 
or \ as a path separator)

I assume that File performs a similar normalization when 
used on OSes whose separator is neither / nor \, but I 
don't have such a system handy for testing.

So, the issue with the UTF8 names is still something that 
needs to be taken care of, but all this worrying about path 
separators is needless.


Submitted On 03-SEP-2001
janilxx
I have Scandinavian letters (like ä and ö, if these are shown 
correctly to you) in a filename. I put that file into a ZipEntry. 
All ok. I'll check the filename using zipEntry.getName() 
and it shows all letters are still ok. But when I send that 
zip from servlet to browser and open that sent zip using 
winzip I can see that the filename is not the same anymore. 
Scandinavian letters have changed :(

This bug is now in 16th place in the top 25 bugs list, but it 
seems not to be fixed. Aren't there any skillful coders 
getting paid by SUN out there? :)

Vote for this bug!


Submitted On 11-SEP-2001
lk1058
a


Submitted On 11-SEP-2001
lk1058
please fix this bug in near future for nations all 
over the world which language uses multibytes~~~


Submitted On 18-SEP-2001
janilxx
I have files with Scandinavian letters (äöå etc.) in their names in a 
zip. 

I have this code:
zipEntry = zipInputStream.getNextEntry()

An exception is thrown when the file with Scandinavian letters 
is accessed:
java.lang.IllegalArgumentException
	at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:291)
	at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:230)
	at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:75)
	at com.zyx.upload.Uploader.createFiles(Uploader.java:461)


Submitted On 15-NOV-2001
___pip___
___pip___

well, better this way!


Submitted On 15-NOV-2001
___pip___
where is my link???


Submitted On 15-NOV-2001
___pip___
well, perhaps someone has discovered any workaround for the 
first problem since the last message of sukie, a year ago...
I'd appreciate to hear about it!


Submitted On 16-JAN-2002
WördehoffH
This bug seems to be duplicated by bug 4415733.
(http://developer.java.sun.com/developer/bugParade/bugs/4415733.html)


Submitted On 22-JAN-2002
666k
Please fix this bug. I have zip files with ñ, ó, etc. and I 
cannot unzip these entries. I think the bug is in the call 
to native code, in zip.dll or libzip.so. The method 
getEntry (I think) converts the Unicode String to UTF-8, and 
the entry is not found.

I hope that you will fix this bug.


Submitted On 30-JAN-2002
tj2000az
Please fix this bug (and 4415733 for that matter) ASAP!!!


Submitted On 30-JAN-2002
tj2000az
Maybe I should be more specific in my posting:

Using JDK/JRE 1.3.1_2

Getting the ZipEntry ze directly from ZipFile zf seems to 
return the correct character with ze.getName(), but in a 
different encoding, when the file was packed with PKZIP - 
it returns garbage instead of the correct character when 
packed with WinZip.

Retrieving the ZipEntry containing the international 
character with zin.getNextEntry() from a ZipInputStream zin 
throws an exception without any specific error message 
(null) to the calling applet.


Submitted On 27-AUG-2002
vojtecho
This bug is really pain. Is there any workaround for this? Any 
tool I can use ... whatever ...


Submitted On 10-OCT-2002
jez12gbr
Well, I realise that patching the standard classes is not an 
option. Here is my latest thought on the problem:
This is how it currently works: we give a Unicode String to 
ZipOutputStream, it then converts it to UTF-8, and places 
that inside the Zip archive.
What we need is a way of supplying a modified String to 
ZipOutputStream, so that when converted to UTF8 it actually 
yields CP437. I have tried it by doing this:
byte[] bytes = fileName.getBytes("Cp437");
String str = new String(bytes, 0);
System.err.println(str);
ZipEntry ze1 = new ZipEntry(str);
However, that gives a dash before each accented character. 
Adding 0 as the high-order byte to each existing byte 
(that's what new String(bytes, 0) does) is not the solution: 
when converted to UTF-8 it does not yield Cp437. What we need 
is a method that does the opposite of converting to UTF-8, so that 
the right encoding comes out when the string is supplied to ZipOutputStream.
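
A look at the raw bytes suggests why no such String can exist. The sketch below (illustrative class name; it assumes the IBM437 charset is available in the running JDK) prints the encodings involved. Cp437 stores é as the single byte 82, whereas UTF-8 never emits a lone byte of 0x80 or above: every non-ASCII character becomes a lead byte followed by continuation bytes. The char U+0082 produced by new String(bytes, 0) therefore turns into C2 82, and the extra C2 may well be the stray character seen before each accented letter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedNameBytes {
    /** Formats bytes as space-separated upper-case hex, e.g. "C3 A9". */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437"); // assumed to be installed
        String eAcute = "\u00e9";
        System.out.println("Cp437 bytes of e-acute: " + hex(eAcute.getBytes(cp437)));
        System.out.println("UTF-8 bytes of e-acute: " + hex(eAcute.getBytes(StandardCharsets.UTF_8)));
        // The char that new String(bytes, 0) builds from the Cp437 byte 0x82:
        System.out.println("UTF-8 bytes of U+0082:  " + hex("\u0082".getBytes(StandardCharsets.UTF_8)));
    }
}
```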


Submitted On 10-OCT-2002
jez12gbr
I have managed to fix this bug by patching the 
java.util.zip.ZipOutputStream class, like so:
    private static byte[] getUTF8Bytes(String s) {
        System.err.println("Patched ZOS"); // to check that it's my version
        try {
            return s.getBytes("Cp437");
        } catch (java.io.UnsupportedEncodingException e) {
            // never happens
        }
        return null;
    }
I have read in one other bug submission that Zip uses Cp437 
character encoding. So, by converting to that char set 
instead of UTF8, everything is fine.
Using this patched version requires an unsupported option (for 
1.3.1): java -Xbootclasspath/p:"c:/jdk1.3.1/jre/lib/ext/java.jar".
However, there might be something I haven't thought of, since I 
believe that class is used as well for generating jars. I'm not 
sure whether jar creation/reading is broken because of that 
patch. Bug 4092784 caused the conversion to UTF-8.


Submitted On 10-JAN-2003
reihtul
This bug hasn't been fixed for more than 3 years and a half... 
Will it be one day?? Does anybody know about any better Zip 
encoding/decoding library in java? :-)


Submitted On 07-FEB-2003
mayuga
I have the same problem too.
But if you make a new zip using this command:
jar cvMf file.zip directory_to_compress

If the directory to compress contains a file with special 
chars, when you Unzip it from Java, the file will be unzipped 
correctly.


Submitted On 09-APR-2003
isleigh
How can a bug this general be allowed to exist for so long?
 The same problem prevents the jar utility from
uncompressing files created with a £ sign in them


Submitted On 05-JUN-2003
olive64
Here is a workaround for this bug. 
- create a new package with 3 classes: ZipOutputStream, 
ZipConstants and ZipEntry
- copy the source code from the corresponding java.util.zip 
package
- in ZipOutputStream, add "private String encoding = "UTF-8";" 
in the data members declaration
- add a new constructor ZipOutputStream(OutputStream out, 
String encoding) that saves the encoding and calls the 
regular constructor
- add the following method
    private byte[] getEncodedBytes(String s) throws UnsupportedEncodingException {
	return s.getBytes(encoding);
    }
- remove the 2 getUTF8... methods
- replace all the calls to getUTF8Bytes() with 
getEncodedBytes() in ZipOutputStream
- edit the new ZipEntry, remove the static initializer that calls 
the native method initIDs()

And voilà, you have a ZipOutputStream that supports multi-byte 
entry names (tested with the "Shift-JIS" encoding 
param for Japanese characters)

The last step in ZipEntry is a bit scary but seems to have no 
effect, and this is the only way I found to write a custom 
ZipOutputStream, because of the native methods calls 
nightmare.

Any input/comments/improvements greatly appreciated, 
olivier@terragrafica.com


Submitted On 17-DEC-2003
armateras
current best solution is org.apache.tools.zip.* of Ant

^^;


Submitted On 17-DEC-2003
dbeutner
Hello out there,

some weeks ago, SUN has evaluated the duplicate 
bug 4415733, sent in by myself some years ago... For 
your convenience, I copy the text in here:

>---cut---cut---cut---cut---cut---cut---<

Java uses UTF-8 to store file names in JAR files.  
Other archivers use different encodings.  Unfortunately, 
there is no portable way to determine or specify the 
encodings of filenames or other data in jar or zip files, 
and so there is limited interoperability between 
different zip implementations when non-ASCII file 
names are used (this is not just a Java issue).

The Java zip implementation could
- extend the Zip standard so that other  
implementations could recognize that a zip file was 
created by Java.
- use heuristics to recognize jar/zip files created by 
other implementations.

But that requires a lot of work and inter-implementation 
compatibility testing that might be broken by an 
upgrade of a competing implementation. Is the 
ongoing maintenance effort worth it?
xxxxx@xxxxx 2003-11-06


Submitted On 14-JAN-2004
skonigsdorfer
(From original poster of this bug)

I have read Sun's evaluation of related bug 4415733.
While I understand that any heuristics may be heavy to code, it
is VERY simple for Sun to at least let the caller define
the encoding to use (currently forced to UTF-8) when
creating a new ZIP file.
Furthermore, there would be no compatibility issue to add a
new constructor for ZipOutputStream with the encoding as an
additional parameter (see above for the 'do it yourself'
from Sun sources).


Submitted On 24-APR-2004
verdyp
I read that the Zip specification includes now some 
internal tags to specify which encoding is used. At 
least there could exist already a tag that indicates that 
this encoding is UTF-8, with the same conventions as 
in URI. If so, we should better use it, and let the various 
zip tools in Windows recognize this tag because it 
would mean interoperable zip files. Then it's up to 
each Zip tool to support the necessary conversion 
when inserting/extracting files into/from the archive.

Having to code all zip entries with CP437 seems 
ill-advised, even on Windows, where this encoding has long 
been deprecated (at least in favor of ISO8859-1 
or Windows 1252)... (CP437 was created for DOS 
filesystems, with limited filename lengths, but almost 
all platforms today support and need long filenames, 
for which the CP437 encoding is already 
inappropriate).
The encoding issue is then not specific to Java, but to 
Zip file formats in general, which use deprecated 
labels for platform-specific encodings rather than 
adopting UTF-8 encoding and URL conventions to 
designate folders.


Submitted On 13-MAY-2004
Hugh_T
I am using the Apache Commons zip classes in our 
internationalized product for just this reason:  it allows 
the encoding of the filenames within the archive to be 
specified.


Submitted On 02-JUL-2004
pablodc
I found on this page a couple of workarounds for this bug; jazzlib is one of them and works for me (Spanish characters in zip files):
http://www.peterbuettner.de/develop/javasnippets/zipOnlyAscii/


Submitted On 08-JUL-2004
HacketiWack
Open the source of your virtual machine, so we can correct the problem ourselves.
Money talks again.


Submitted On 08-DEC-2004
onekilo
I have attempted to use jazzlib and the Apache Ant 1.6.2 zip tools, and neither is able to handle files with non-US-ASCII characters.  I am attempting to use ISO-8859-1 encoding for the files.  
Furthermore, the index file is written correctly using the java.util.zip classes but the filename isn't, so using UNIX zip -x fails as the file does not exist.


Submitted On 03-JAN-2005
Laie_Techie
This bug seems to be a case of bug 4415733, though I would broaden bug 4415733 to include _any_ non-Latin character.


Submitted On 04-JUL-2005
MartinHilpert
Even now with JDK 1.5.0_04, this is still not fixed. I understand the issues of the ZIP file format as described here, but why don't you just offer another constructor/method so we can set our own encoding (as it was suggested above)? This wouldn't break compatibility but help all of us!


Submitted On 04-JUL-2005
MartinHilpert
There are so many other open bugs related to this bug that all together would probably put this bug to the top 1 most wanted bug on the top 25 bugs. And reading all the suggestions here, the fix would be very easy for sun to implement - please, please, please, pleeeeeeeaaaase!


Submitted On 28-JUL-2005
Sylle
Hi,

Here is a workaround I found for the special characters:
(I only tried to write ZIP file using danish characters so far and opened them in WinZip to verify the file names)

1) I made a copy of ZipOutputStream, ZipConstants, DeflaterOutputStream and ZipEntry and modified their imports so that they use each other.

2) I modified some methods in ZipOutputStream:

  private void writeLOC(ZipEntry e) throws IOException {
    writeInt(LOCSIG);     // LOC header signature
    writeShort(512);      // version needed to extract
    ...
  }

  private void writeCEN(ZipEntry e) throws IOException {
    writeInt(CENSIG);     // CEN header signature
    writeShort(512);     // version made by
    ...
  }

As default the version is 0, it is for MS-DOS file format compatibility.
The value 512 for the version is to define that it is a Windows platform (only the upper byte is used for that, the lower byte indicates the ZIP specification version).
Here are the other possible values (from ZIP file specifications):
0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
1 - Amiga                     
2 - OpenVMS
3 - Unix                      
4 - VM/CMS
5 - Atari ST                  
6 - OS/2 H.P.F.S.
7 - Macintosh                 
8 - Z-System
9 - CP/M                     
10 - Windows NTFS
11 - MVS (OS/390 - Z/OS)      
12 - VSE
13 - Acorn Risc               
14 - VFAT
15 - alternate MVS            
16 - BeOS
17 - Tandem                   
18 - OS/400
19 - OS/X (Darwin)            
20 thru 255 - unused
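
The two-byte layout described above (upper byte for the host system, lower byte for the ZIP specification version) can be checked with a little arithmetic. The sketch below uses hypothetical helper names and only illustrates the bit layout of the "version made by" field; it makes no claim about what any particular ZIP tool expects. Under this layout 512 (0x0200) decodes to host value 2 with version byte 0, while host value 10 combined with specification version 2.0 (stored as 20) would be 2580.

```java
final class VersionMadeBy {
    /** Upper byte: host system code from the table above. */
    static int hostSystem(int field) {
        return (field >>> 8) & 0xFF;
    }

    /** Lower byte: ZIP specification version (e.g. 20 means 2.0). */
    static int specVersion(int field) {
        return field & 0xFF;
    }

    /** Packs a host system code and a specification version into one field. */
    static int compose(int hostSystem, int specVersion) {
        return ((hostSystem & 0xFF) << 8) | (specVersion & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(hostSystem(512) + " / " + specVersion(512));
        System.out.println(compose(10, 20));
    }
}
```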

And I modified the UTF8 methods as well:

  static int getUTF8Length(String s) {
      return s.getBytes().length;
   }

  private static byte[] getUTF8Bytes(String s) {
      return s.getBytes();
  }

This is just a test, and I could imagine adding a platform check in order to write the correct version, but I have no idea how compatible the generated files are.
Anyway, I think this problem will be solved soon (a few years ;-) from now), so treat this dirty workaround as a temporary solution.

If someone wants to try it and send me feedback, or wants the 4 modified files out of laziness, my e-mail is spo@opi.dk

Cheers


Submitted On 26-AUG-2005
Marcelo9
It is incredible that this problem is taking Sun so long time. SO SERIOUS BUG LIKE THIS!!!!!

I think I will open the source code and I will fix that!!!!!


Submitted On 06-SEP-2005
papgyo
In the Apache Ant package I believe there is a constructor to which the encoding can be passed.


Submitted On 22-NOV-2005
MartinHilpert
It's the second most top bug - why don't you just fix this damn bugger in a 1.5.0_6 release?


Submitted On 07-DEC-2005
JF_Beaulac
Should be reported against the 1.4.2_xx series too. It still does it on 1.4.2_09.

Cmon Sun, Fix this !!!


Submitted On 19-DEC-2005
Christian_Schlichtherle
Have a look at TrueZIP (http://truezip.dev.java.net). It not only fixes this bug, it also allows fully transparent access to ZIP-compatible files as if they were directories in the pathname, using drop-in replacements for the File* classes. So it makes no difference to your application whether you are addressing a native file or an entry in a ZIP file.

Enjoy, Christian Schlichtherle


Submitted On 19-DEC-2005
Christian_Schlichtherle
Have a look at TrueZIP (http://truezip.dev.java.net). It not only fixes this bug in its low-level API; its high-level API also allows fully transparent access to ZIP-compatible files as if they were directories in the pathname, using drop-in replacements for the File* classes. So it makes no difference to your application whether you are addressing a native file or an entry in a ZIP file.

BTW: Though not specified in PKZIP's application note, the de facto standard encoding for ZIP files is IBM437, aka CP437, the original IBM PC character set in the United States. Not hard to tell if you look at where ZIP originated.

Enjoy, Christian Schlichtherle


Submitted On 06-JAN-2006
MartinHilpert
Happy new year! Can we expect to have this second most reported bug being fixed in the next major 1.6 Java release?


Submitted On 17-MAY-2006
jeff@redcondor.com
7 year old bug with > 500 votes.   What does it take to get Sun's attention??


Submitted On 22-MAY-2006
gagern
I'm planning to attack this issue by contributing to the jdk collaboration project. I started a thread in the dev forum there, but wanted to make the most important announcements here as well.

One open question (at least for me) is whether to use real UTF-8 by default and for JAR, or modified UTF-8 as has been the case so far and as has been reported as bug 5030283.
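
The difference between the two flavours can be seen with a short sketch. It uses DataOutputStream.writeUTF as a convenient source of modified UTF-8; that this matches what the zip code of the time did is an assumption here, and the class name is illustrative. The two encodings agree for characters such as é, but modified UTF-8 writes U+0000 as two bytes and a supplementary character as two three-byte surrogates (six bytes) instead of four.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Flavours {
    /** Standard UTF-8, as produced by String.getBytes. */
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Modified UTF-8, as produced by DataOutputStream.writeUTF. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] withLength = buf.toByteArray();
        // writeUTF prefixes the data with a two-byte length; drop it.
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    public static void main(String[] args) {
        String supplementary = "\uD83D\uDE00"; // one code point outside the BMP
        System.out.println(standardUtf8(supplementary).length);
        System.out.println(modifiedUtf8(supplementary).length);
    }
}
```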

What exactly is the impact with the jar command line tool? It is intended for JAR files which use (modified) UTF-8. So why does this bug apply there?


Submitted On 26-JUL-2006
jschnab@gmx.de
can it be that sun didn't fix the issue because it is not marked as reported against the Java 1.5 & 1.6 versions?
PLEASE FIX IT !!!


Submitted On 28-JUL-2006
fcolavin
please fix it !!!! pleaseeeeeeeeeeeeeeeeee 


Submitted On 19-OCT-2006
Jobin
Yes, This is a big problem in creating zip files with specific names.


Submitted On 29-DEC-2006
A partial fix for this problem would be in ZipInputStream.getUTF8String: if you detect that the given octet stream is not a valid UTF-8 string, you may do something more intelligent than throwing an exception, for example, retrying with the other usual ZIP encoding. I think that such a fix is easy to do, wouldn't hurt anybody, and would at least allow reading third-party zips from Java.
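
The fallback suggested here could look roughly like the following. This is a sketch of the idea, not the JDK's code: EntryNameDecoder is a hypothetical name, and the choice of fallback charset is left to the caller.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EntryNameDecoder {
    /**
     * Decodes raw entry-name bytes as strict UTF-8; if they are not valid
     * UTF-8, decodes them with the fallback charset instead of failing.
     */
    static String decode(byte[] raw, Charset fallback) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException notUtf8) {
            return new String(raw, fallback);
        }
    }
}
```

This remains a heuristic: a name written in a legacy encoding can occasionally happen to be valid UTF-8 as well, in which case it would be decoded as UTF-8.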


Submitted On 05-JAN-2007
Christian_Schlichtherle
Use TrueZIP. It provides a drop-in replacement for java.util.zip which accepts arbitrary encodings as a constructor parameter.

Better yet, its virtual filesystem lets you completely abstract from ZIP files, treating them just like virtual directories.


Submitted On 06-APR-2007
christo
It is generally not recommended to use white space or 
Umlaute or other non ASCII characters in file names.
Therfore I would not regard this as a bug.


Submitted On 06-MAY-2007
Please fix this bug!


Submitted On 06-MAY-2007
Vote for this bug! Please fix this bug!


Submitted On 05-JUN-2007
rminner
Hey, two days to the 8-year anniversary. Should we organize a party? :-)


Submitted On 26-JUN-2007
I'm now using  the apache ant implementation for ZipOutputStream. With setting the encoding to "Cp437" it works for me. German umlauts are working fine :)
at last...


Submitted On 10-JUL-2007
kobevaliant
Is there any difficulty in fixing this bug?


Submitted On 09-MAR-2008
Decoding system-encoded ByteBuffers from a FileChannel is done by

    MappedByteBuffer map = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
    map.order(ByteOrder.nativeOrder());
    StringBuffer strBuffer = new StringBuffer(Charset.defaultCharset().decode(map));

Thus Charset.defaultCharset().decode(ByteBuffer) can solve the problem if your filename contains special characters; e.g. (File).listFiles() can provide the file names, which can then be converted with the system default Charset.


Submitted On 18-JUN-2008
ecki
Is that work (jdk-collaboration) discontinued, or can we find it somewhere in OpenJDK? Right now no workaround for *reading* broken ZIP files exists.


Submitted On 24-NOV-2008
Part of the pain this bug causes, is caused by the following method in java.util.zip.ZipFile:

    public InputStream getInputStream(ZipEntry entry) throws IOException {
	return getInputStream(entry.name);
    }

The method public Enumeration<? extends ZipEntry> entries() can list all entries in the .zip file, even those with non-ASCII characters in the name, and even entries with duplicate names. But because getInputStream falls back to the name when an InputStream is requested for these legally acquired and really existing entries, you cannot access them beyond checking their presence. Can't the InputStream be opened on the ZipEntry itself, instead of on its name?



PLEASE NOTE: JDK6 is formerly known as Project Mustang