Submitted On 27-JUL-2000
Fotis
"Within a zip file, all entry names use / as the separator": This seems to be true at the moment and is a very bad idea, because it means that all zip files produced this way cannot be opened by Windows tools.
Submitted On 27-SEP-2000
sukie
Sukie 9/27/00 - The first problem described in this bug report (i.e. corrupted French accented filename in
ZipEntry) should be reported against JDK 1.1.8 too. This bug really makes it impossible to zip any
filenames containing accented or double-byte characters. If you know of any workaround for this problem, I
would appreciate hearing from you. After all, I suspect Sun is going to wave its hands and say they are not
fixing it for JDK 1.1.8.
Submitted On 14-MAY-2001
elm12
Are there any plans to fix this bug at all? It's been known
for almost two years now!!
Submitted On 29-MAY-2001
skonigsdorfer
(From bug originator)
I fully agree with the previous comment!
I opened this bug two years ago, and I still cannot make a
ZIP file with French file names!!!
I have just discovered Sun has released the JDK 1.4.0 beta,
with JRE translation in ten languages, but this bug still
remains...
Conclusion: please vote for this bug !!!
Submitted On 07-JUN-2001
jeff_robertson
The statement that zip files created by the java.util.zip
classes cannot be opened by Windows-based unzip programs is
incorrect.
Submitted On 08-JUN-2001
jeff_robertson
In addition to my earlier comments, I would also like to
point out that java.io.File *already* translates
between '/' and '\' when on Windows. Try it:
System.out.println( new File("c:/foo"));
will print:
c:\foo
(it doesn't matter anyway because Windows allows either /
or \ as a path separator)
I assume that File performs a similar normalization on
operating systems whose separator is neither / nor \, but I
don't have such a system handy for testing.
So, the issue with the UTF-8 names is still something that
needs to be taken care of, but all this worrying about path
separators is needless.
Submitted On 03-SEP-2001
janilxx
I have Scandinavian letters (like ä and ö, if these are shown
correctly to you) in a filename. I put that file into a ZipEntry.
All OK. I check the filename using zipEntry.getName()
and it shows all letters are still OK. But when I send that
zip from a servlet to a browser and open it using
WinZip, I can see that the filename is not the same anymore.
The Scandinavian letters have changed :(
This bug is now in 16th place in the top 25 bugs list but it
does not seem to get fixed. Aren't there any skillful coders
getting paid by Sun out there? :)
Vote for this bug!
Submitted On 11-SEP-2001
lk1058
Please fix this bug in the near future, for all the nations
around the world whose languages use multibyte characters~~~
Submitted On 18-SEP-2001
janilxx
I have files with Scandinavian letters (äöå etc.) in their names in
a zip.
I have this code:
zipEntry = zipInputStream.getNextEntry();
An exception is thrown when the file with Scandinavian letters
is accessed:
java.lang.IllegalArgumentException
at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:291)
at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:230)
at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:75)
at com.zyx.upload.Uploader.createFiles(Uploader.java:461)
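For readers on newer JDKs: since Java 7, ZipInputStream has a constructor that takes a Charset for decoding entry names, which avoids the forced UTF-8 decoding seen in this stack trace. The following is only a minimal sketch under assumptions: the class and method names (ReadZipNames, makeZip, listNames) are made up for illustration, and IBM437 is assumed to be the encoding the archive was written with.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ReadZipNames {
    // Builds a small in-memory archive whose entry names are stored in the given charset.
    static byte[] makeZip(Charset nameCharset, String... names) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, nameCharset)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(1); // one byte of dummy content
                zos.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // Lists entry names, decoding them with the given charset instead of the UTF-8 default.
    static List<String> listNames(byte[] zip, Charset nameCharset) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin =
                 new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437"); // assumed archive encoding
        byte[] zip = makeZip(cp437, "\u00e4\u00f6\u00e5.txt");
        System.out.println(listNames(zip, cp437));
    }
}
```

The reader has to be told the encoding; nothing in such an archive says which one was used, so the charset passed here is a guess the caller must make.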
Submitted On 15-NOV-2001
___pip___
___pip___
well, better this way!
Submitted On 15-NOV-2001
___pip___
where is my link???
Submitted On 15-NOV-2001
___pip___
well, perhaps someone has discovered any workaround for the
first problem since the last message of sukie, a year ago...
I'd appreciate hearing about it!
Submitted On 16-JAN-2002
WördehoffH
This bug seems to be duplicated by bug 4415733.
(http://developer.java.sun.com/developer/bugParade/bugs/4415733.html)
Submitted On 22-JAN-2002
666k
Please fix this bug. I have zip files with ñ, ó, etc. and I
cannot unzip these entries. I think that the bug is in the call
to native code, in zip.dll or libzip.so. The method
getEntry (I think) converts the Unicode String to UTF-8, and
the entry is not found.
I hope that you fix this bug.
Submitted On 30-JAN-2002
tj2000az
Please fix this bug (and 4415733 for that matter) ASAP!!!
Submitted On 30-JAN-2002
tj2000az
Maybe I should be more specific in my posting:
Using JDK/JRE 1.3.1_2
Getting the ZipEntry ze directly from ZipFile zf seems to
return the correct character with ze.getName(), but in a
different encoding, when the file was packed with PKZIP -
it returns garbage instead of the correct character when
packed with WinZip.
Retrieving the ZipEntry containing the international
character with zin.getNextEntry() from a ZipInputStream zin
throws an exception without any specific error message
(null) to the calling applet.
Submitted On 27-AUG-2002
vojtecho
This bug is a real pain. Is there any workaround for this? Any
tool I can use ... whatever ...
Submitted On 10-OCT-2002
jez12gbr
Well, I realise that patching the standard classes is not an
option. Here is my latest thought on the problem:
This is how it currently works: we give a Unicode String to
ZipOutputStream, it then converts it to UTF-8, and places
that inside the Zip archive.
What we need is a way of supplying a modified String to
ZipOutputStream, so that when converted to UTF8 it actually
yields CP437. I have tried it by doing this:
byte[] bytes = fileName.getBytes("Cp437");
String str = new String(bytes, 0);
System.err.println(str);
ZipEntry ze1 = new ZipEntry(str);
However, that gives a dash before each accented character.
Adding 0 as the high-order byte to each existing byte (that's
what new String(bytes, 0) does) is not the solution: when the
result is converted to UTF-8 it does not yield Cp437. What we need
is a method that does the opposite of converting to UTF-8, so that
the string yields the right encoding when supplied to ZipOutputStream.
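A small sketch of why this trick cannot work, assuming the goal is to get the single Cp437 byte for é into the archive. Decoding with ISO-8859-1 is used here as a stand-in for the deprecated new String(bytes, 0), since both map each byte to the char with the same value; the class name HighByteTrick is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HighByteTrick {
    // Mimics new String(bytes, 0): each byte becomes the char with the same value.
    static String widen(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Formats bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437");
        // The single byte we would like to end up in the archive.
        byte[] wanted = "\u00e9".getBytes(cp437);
        // What a UTF-8 writer actually emits for the widened string.
        byte[] written = widen(wanted).getBytes(StandardCharsets.UTF_8);
        System.out.println("wanted:   " + hex(wanted));
        System.out.println("written:  " + hex(written));
        // How a Cp437-based tool would display the written bytes.
        System.out.println("shown as: " + new String(written, cp437));
    }
}
```

UTF-8 encodes every character above U+007F as at least two bytes, so the extra lead byte (here 0xC2, a box-drawing character in Cp437) is probably the "dash" seen in front of each accented letter. For the same reason no input string can make a UTF-8 encoder emit a lone byte such as 0x82; the encoding step itself has to be replaced.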
Submitted On 10-OCT-2002
jez12gbr
I have managed to fix this bug by patching the
java.util.zip.ZipOutputStream class, like so:
    private static byte[] getUTF8Bytes(String s) {
        System.err.println("Patched ZOS"); // to check that it's my version
        try {
            return s.getBytes("Cp437");
        } catch (java.io.UnsupportedEncodingException e) {
            // never happens
        }
        return null;
    }
I have read in one other bug submission that Zip uses the Cp437
character encoding. So, by converting to that charset
instead of UTF-8, everything is fine.
Using this patched version requires an unsupported option (for
1.3.1): java -Xbootclasspath/p:"c:/jdk1.3.1/jre/lib/ext/java.jar".
However, there might be something I haven't thought of, since I
believe that class is used for generating jars as well. I'm not
sure whether jar creation/reading is broken by that
patch. Bug 4092784 caused the conversion to UTF-8.
Submitted On 10-JAN-2003
reihtul
This bug hasn't been fixed for more than three and a half years...
Will it ever be?? Does anybody know of a better Zip
encoding/decoding library in Java? :-)
Submitted On 07-FEB-2003
mayuga
I have the same problem too.
But, if you make a new zip using this command:
jar cvMf file.zip directory_to_compress
If the directory to compress contains a file with special
chars, when you Unzip it from Java, the file will be unzipped
correctly.
Submitted On 09-APR-2003
isleigh
How can a bug this general be allowed to exist for so long?
The same problem prevents the jar utility from
uncompressing files created with a £ sign in them
Submitted On 05-JUN-2003
olive64
Here is a workaround for this bug.
- create a new package with 3 classes: ZipOutputStream,
ZipConstants and ZipEntry
- copy the source code from the corresponding java.util.zip
package
- in ZipOutputStream, add 'private String encoding = "UTF-8";'
to the data member declarations
- add a new constructor ZipOutputStream(OutputStream out,
String encoding) that saves the encoding and calls the
regular constructor
- add the following method:
    private byte[] getEncodedBytes(String s) throws
            UnsupportedEncodingException {
        return s.getBytes(encoding);
    }
- remove the 2 getUTF8... methods
- replace all the calls to getUTF8Bytes() with
getEncodedBytes() in ZipOutputStream
- edit the new ZipEntry and remove the static initializer that calls
the native method initIDs()
And voilà, you have a ZipOutputStream that supports multi-byte
entry names (tested with the "Shift-JIS" encoding
param for Japanese characters).
The last step in ZipEntry is a bit scary but seems to have no
ill effect, and this is the only way I found to write a custom
ZipOutputStream, because of the nightmare of native method
calls.
Any input/comments/improvements greatly appreciated,
olivier@terragrafica.com
Submitted On 17-DEC-2003
armateras
The current best solution is org.apache.tools.zip.* from Ant
^^;
Submitted On 17-DEC-2003
dbeutner
Hello out there,
some weeks ago, Sun evaluated the duplicate
bug 4415733, which I submitted some years ago. For
your convenience, I copy the text here:
>---cut---cut---cut---cut---cut---cut---<
Java uses UTF-8 to store file names in JAR files.
Other archivers use different encodings. Unfortunately,
there is no portable way to determine or specify the
encodings of filenames or other data in jar or zip files,
and so there is limited interoperability between
different zip implementations when non-ASCII file
names are used (this is not just a Java issue).
The Java zip implementation could
- extend the Zip standard so that other
implementations could recognize that a zip file was
created by Java.
- use heuristics to recognize jar/zip files created by
other implementations.
But that requires a lot of work and inter-implementation
compatibility testing that might be broken by an
upgrade of a competing implementation. Is the
ongoing maintenance effort worth it?
xxxxx@xxxxx 2003-11-06
Submitted On 14-JAN-2004
skonigsdorfer
(From original poster of this bug)
I have read Sun's evaluation of related bug 4415733.
While I understand that heuristics may be hard to code, it
would be VERY simple for Sun to at least let the caller define
the encoding to use (currently forced to UTF-8) when
creating a new ZIP file.
Furthermore, there would be no compatibility issue in adding a
new constructor for ZipOutputStream with the encoding as an
additional parameter (see above for the 'do it yourself'
from Sun sources).
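For reference, a constructor of this kind was eventually added: from Java 7 onward, ZipOutputStream accepts a Charset as a second argument. The sketch below assumes such a JDK and uses IBM437 purely as an example encoding; the class and helper names (WriteZipWithCharset, zipOneEntry, firstEntryNameBytes) are hypothetical. It writes one entry and then reads the raw name bytes back from the local file header to show what was actually stored.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class WriteZipWithCharset {
    // Writes one entry, encoding its name with the given charset, and returns the archive bytes.
    static byte[] zipOneEntry(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("bonjour".getBytes(nameCharset));
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // In a local file header the name length is at offset 26 and the name starts at offset 30.
    static byte[] firstEntryNameBytes(byte[] zip) {
        int len = (zip[26] & 0xFF) | ((zip[27] & 0xFF) << 8);
        byte[] name = new byte[len];
        System.arraycopy(zip, 30, name, 0, len);
        return name;
    }

    public static void main(String[] args) {
        byte[] zip = zipOneEntry("\u00e9t\u00e9.txt", Charset.forName("IBM437"));
        for (byte b : firstEntryNameBytes(zip)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

With IBM437 each é is stored as the single byte 0x82, which is what DOS-era tools expect; whether a given unzip tool interprets it that way still depends on that tool.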
Submitted On 24-APR-2004
verdyp
I read that the Zip specification now includes some
internal tags to specify which encoding is used. At
least there could already be a tag indicating that
the encoding is UTF-8, with the same conventions as
in URIs. If so, we had better use it, and let the various
zip tools on Windows recognize this tag, because it
would mean interoperable zip files. Then it's up to
each Zip tool to support the necessary conversion
when inserting/extracting files into/from the archive.
Having to encode all zip entries with CP437 seems
ill-advised, even on Windows, where this encoding has long
been deprecated (at least in favor of ISO 8859-1
or Windows-1252)... (CP437 was created for DOS
filesystems, with limited filename lengths, but almost
all platforms today support and need long filenames,
for which the CP437 encoding is already
inappropriate.)
The encoding issue is then not specific to Java, but to
Zip file formats in general, which use deprecated
labels for platform-specific encodings rather than
adopting UTF-8 encoding and URL conventions to
designate folders.
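Current versions of the ZIP application note do define such a tag: bit 11 of the general purpose bit flag marks an entry's name and comment as UTF-8. As a rough sketch, assuming a Java 7 or later JDK (where ZipOutputStream takes a Charset), the flag can be inspected in the first local file header like this. The class and helper names (Utf8FlagCheck, zipOneEntry, firstEntryFlags) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Utf8FlagCheck {
    // General purpose bit 11: entry name and comment are encoded in UTF-8.
    static final int UTF8_FLAG = 0x0800;

    // Writes a single empty entry whose name is encoded with the given charset.
    static byte[] zipOneEntry(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    // The general purpose bit flag is a little-endian short at offset 6 of the local header.
    static int firstEntryFlags(byte[] zip) {
        return (zip[6] & 0xFF) | ((zip[7] & 0xFF) << 8);
    }

    public static void main(String[] args) {
        String name = "\u00e9.txt";
        int utf8Flags = firstEntryFlags(zipOneEntry(name, StandardCharsets.UTF_8));
        int cp437Flags = firstEntryFlags(zipOneEntry(name, Charset.forName("IBM437")));
        System.out.println("UTF-8 archive marked:  " + ((utf8Flags & UTF8_FLAG) != 0));
        System.out.println("IBM437 archive marked: " + ((cp437Flags & UTF8_FLAG) != 0));
    }
}
```

A tool that honours the flag can decode marked names as UTF-8 and fall back to its legacy code page for everything else, which is the kind of interoperability asked for here.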
Submitted On 13-MAY-2004
Hugh_T
I am using the Apache Commons zip classes in our
internationalized product for just this reason: it allows
the encoding of the filenames within the archive to be
specified.
Submitted On 02-JUL-2004
pablodc
I found a couple of workarounds for this bug on this page; jazzlib is one of them and it works for me (Spanish characters in zip files):
http://www.peterbuettner.de/develop/javasnippets/zipOnlyAscii/
Submitted On 08-JUL-2004
HacketiWack
Open the source of your virtual machine, so we can correct the problem ourselves.
Money talks again.
Submitted On 08-DEC-2004
onekilo
I have tried jazzlib and the Apache Ant 1.6.2 zip tools, and neither is able to handle files with non-US-ASCII characters. I am attempting to use ISO-8859-1 encoding for the files.
Furthermore, the index file is written correctly using the java.util.zip classes but the filename isn't, so using UNIX zip -x fails as the file does not exist.
Submitted On 03-JAN-2005
Laie_Techie
This bug seems to be a case of bug 4415733, though I would broaden bug 4415733 to include _any_ non-Latin character.
Submitted On 04-JUL-2005
MartinHilpert
Even now with JDK 1.5.0_04, this is still not fixed. I understand the issues of the ZIP file format as described here, but why don't you just offer another constructor/method so we can set our own encoding (as it was suggested above)? This wouldn't break compatibility but help all of us!
Submitted On 04-JUL-2005
MartinHilpert
There are so many other open bugs related to this one that together they would probably make this the number one most wanted bug in the top 25. And reading all the suggestions here, the fix would be very easy for Sun to implement - please, please, please, pleeeeeeeaaaase!
Submitted On 28-JUL-2005
Sylle
Hi,
Here is a workaround I found for the special characters:
(So far I have only tried writing ZIP files using Danish characters, and opened them in WinZip to verify the file names.)
1) I made a copy of ZipOutputStream, ZipConstants, DeflaterOutputStream and ZipEntry and modified their imports so that they use each other.
2) I modified some methods in ZipOutputStream:
    private void writeLOC(ZipEntry e) throws IOException {
        writeInt(LOCSIG);   // LOC header signature
        writeShort(512);    // version needed to extract
        ...
    }
    private void writeCEN(ZipEntry e) throws IOException {
        writeInt(CENSIG);   // CEN header signature
        writeShort(512);    // version made by
        ...
    }
By default the version is 0, which is for MS-DOS file format compatibility.
The value 512 for the version is meant to declare a Windows platform (only the upper byte is used for that; the lower byte indicates the ZIP specification version).
Here are the other possible values (from ZIP file specifications):
0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
1 - Amiga
2 - OpenVMS
3 - Unix
4 - VM/CMS
5 - Atari ST
6 - OS/2 H.P.F.S.
7 - Macintosh
8 - Z-System
9 - CP/M
10 - Windows NTFS
11 - MVS (OS/390 - Z/OS)
12 - VSE
13 - Acorn Risc
14 - VFAT
15 - alternate MVS
16 - BeOS
17 - Tandem
18 - OS/400
19 - OS/X (Darwin)
20 thru 255 - unused
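The byte split described above can be checked with a few lines of arithmetic. This is only an illustrative sketch with a made-up class name (VersionMadeBy); it assumes the layout stated in the comment, with the host system code in the upper byte and the ZIP specification version in the lower byte (stored as major * 10 + minor in the ZIP application note).

```java
public class VersionMadeBy {
    // Upper byte: host system code from the list above.
    static int hostSystem(int versionMadeBy) {
        return (versionMadeBy >> 8) & 0xFF;
    }

    // Lower byte: ZIP specification version, e.g. 20 for version 2.0.
    static int specVersion(int versionMadeBy) {
        return versionMadeBy & 0xFF;
    }

    // Packs a host system code and a specification version into one 16-bit value.
    static int encode(int host, int specVersion) {
        return ((host & 0xFF) << 8) | (specVersion & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println("512 -> host " + hostSystem(512) + ", spec " + specVersion(512));
        System.out.println("host 10, spec 2.0 -> " + encode(10, 20));
    }
}
```

Read literally against the list, 512 (0x0200) carries host code 2 and a lower byte of 0, whereas host code 10 (Windows NTFS) with specification version 2.0 would be written as 2580. Anyone reusing the patch may want to double-check which value they intend.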
And I modified the UTF8 methods as well:
    static int getUTF8Length(String s) {
        return s.getBytes().length;
    }
    private static byte[] getUTF8Bytes(String s) {
        return s.getBytes();
    }
This is just a test. I can imagine adding a platform check in order to write the correct version, but I have no idea how compatible the generated files are.
Anyway, I think this problem will be solved soon (a few years from now ;-)), so treat this dirty workaround as a temporary solution.
If someone wants to try it and send me feedback, or is too lazy to make the 4 modified files themselves, my e-mail is spo@opi.dk
Cheers
Submitted On 26-AUG-2005
Marcelo9
It is incredible that this problem is taking Sun such a long time. SUCH A SERIOUS BUG!!!!!
I think I will open the source code and fix it myself!!!!!
Submitted On 06-SEP-2005
papgyo
In the Apache Ant package, I think there is a constructor to which the encoding can be passed.
Submitted On 22-NOV-2005
MartinHilpert
It's the second most top bug - why don't you just fix this damn bugger in a 1.5.0_6 release?
Submitted On 07-DEC-2005
JF_Beaulac
Should be reported against the 1.4.2_xx series too. It still does it on 1.4.2_09.
C'mon Sun, fix this!!!
Submitted On 19-DEC-2005
Christian_Schlichtherle
Have a look at TrueZIP (http://truezip.dev.java.net). It not only fixes this bug, it also allows fully transparent access to ZIP-compatible files as if they were directories in the pathname, using drop-in replacements for the File* classes. So it makes no difference to your application whether you are addressing a native file or an entry in a ZIP file.
Enjoy, Christian Schlichtherle
Submitted On 19-DEC-2005
Christian_Schlichtherle
Have a look at TrueZIP (http://truezip.dev.java.net). It not only fixes this bug in its low-level API, its high-level API also allows fully transparent access to ZIP-compatible files as if they were directories in the pathname, using drop-in replacements for the File* classes. So it makes no difference to your application whether you are addressing a native file or an entry in a ZIP file.
BTW: Though not specified in PKZIP's application note, the de facto standard encoding for ZIP files is IBM437, aka CP437, the original IBM PC character set in the United States. Not hard to tell if you look at where ZIP originated.
Enjoy, Christian Schlichtherle
Submitted On 06-JAN-2006
MartinHilpert
Happy new year! Can we expect to have this second most reported bug being fixed in the next major 1.6 Java release?
Submitted On 17-MAY-2006
jeff@redcondor.com
7 year old bug with > 500 votes. What does it take to get Sun's attention??
Submitted On 22-MAY-2006
gagern
I'm planning to attack this issue by contributing to the jdk collaboration project. I started a thread in the dev forum there, but wanted to make the most important announcements here as well.
One open question (at least for me) is whether to use real UTF-8 by default and for JAR, or modified UTF-8 as has been the case so far and as has been reported as bug 5030283.
What exactly is the impact with the jar command line tool? It is intended for JAR files which use (modified) UTF-8. So why does this bug apply there?
Submitted On 26-JUL-2006
jschnab@gmx.de
Can it be that Sun didn't fix the issue because it is not marked as reported against the Java 1.5 & 1.6 versions?
PLEASE FIX IT !!!
Submitted On 28-JUL-2006
fcolavin
please fix it !!!! pleaseeeeeeeeeeeeeeeeee
Submitted On 19-OCT-2006
Jobin
Yes, this is a big problem when creating zip files with specific names.
Submitted On 29-DEC-2006
A partial fix of this problem would be in ZipInputStream.getUTF8String: if you detect that the given octet stream is not a valid UTF-8 string, you could do something more intelligent than throwing an exception, for example retrying with the other usual ZIP encoding. I think that such a fix is easy to do, wouldn't hurt anybody, and would at least allow reading third-party zips from Java.
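The fallback idea can be sketched with a strict CharsetDecoder: try UTF-8 with malformed input reported as an error, and only on failure decode with a legacy charset. This is a minimal illustration rather than the JDK's actual behaviour; the class and method names (NameDecoder, decodeName) are hypothetical, and IBM437 is just an assumed choice for "the other usual ZIP encoding".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class NameDecoder {
    // Decodes raw entry-name bytes as UTF-8 if they are valid, otherwise with the fallback charset.
    static String decodeName(byte[] raw, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException ex) {
            // Not valid UTF-8: assume the archive came from a tool using the legacy code page.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        Charset cp437 = Charset.forName("IBM437");
        byte[] legacy = {(byte) 0x82, '.', 't', 'x', 't'};           // "é.txt" in Cp437
        byte[] utf8 = "\u00e9.txt".getBytes(StandardCharsets.UTF_8); // "é.txt" in UTF-8
        System.out.println(decodeName(legacy, cp437));
        System.out.println(decodeName(utf8, cp437));
    }
}
```

The heuristic is not watertight: some legacy byte sequences also happen to be valid UTF-8 and would be decoded as such, so it helps with reading third-party archives but does not remove the underlying ambiguity.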
Submitted On 05-JAN-2007
Christian_Schlichtherle
Use TrueZIP. It provides a drop-in replacement for java.util.zip which accepts arbitrary encodings as a constructor parameter.
Better yet, its virtual filesystem lets you completely abstract from ZIP files, treating them just like virtual directories.
Submitted On 06-APR-2007
christo
It is generally not recommended to use white space or
Umlaute or other non ASCII characters in file names.
Therefore I would not regard this as a bug.
Submitted On 06-MAY-2007
Please fix this bug!
Submitted On 06-MAY-2007
Vote for this bug! Please fix this bug!
Submitted On 05-JUN-2007
rminner
Hey, two days to the 8-year anniversary. Should we organize a party? :-)
Submitted On 26-JUN-2007
I'm now using the apache ant implementation for ZipOutputStream. With setting the encoding to "Cp437" it works for me. German umlauts are working fine :)
at last...
Submitted On 10-JUL-2007
kobevaliant
Is there any difficulty in fixing this bug?
Submitted On 09-MAR-2008
Decoding system-encoded ByteBuffers from a FileChannel can be done like this:
    MappedByteBuffer map = raf.getChannel().map(FileChannel.MapMode.READ_ONLY, 0, raf.length());
    map.order(ByteOrder.nativeOrder());
    StringBuffer strBuffer = new StringBuffer(Charset.defaultCharset().decode(map));
Thus Charset.defaultCharset().decode(ByteBuffer) can solve the problem if your filename contains special characters. For example, File.listFiles() can provide the file names, which can then be converted with the system default Charset.
Submitted On 18-JUN-2008
ecki
Has that work (jdk-collaboration) been discontinued, or can we find it somewhere in OpenJDK? Right now no workaround for *reading* broken ZIP files exists.
Submitted On 24-NOV-2008
Part of the pain this bug causes, is caused by the following method in java.util.zip.ZipFile:
public InputStream getInputStream(ZipEntry entry) throws IOException {
return getInputStream(entry.name);
}
The method public Enumeration<? extends ZipEntry> entries() can enumerate all entries in the .zip file, even those with non-ASCII characters in the name, and even entries with duplicate names. But because the lookup falls back to the name when an InputStream is requested for these legitimately acquired and really existing entries, you cannot access them beyond checking their presence. Can't the InputStream be opened on the ZipEntry itself, instead of on its name?
PLEASE NOTE: JDK6 is formerly known as Project Mustang