Bug Database
Bug ID: 4820807
Votes 18
Synopsis java.util.zip.ZipInputStream cannot extract files with Chinese chars in name
Category java:classes_util_jarzip
Reported Against 1.2.1 , 1.4.1 , tiger
Release Fixed 7(b57)
State 10-Fix Delivered, request for enhancement
Priority: 4-Low
Related Bugs 4885817 , 4244499
Submit Date 19-FEB-2003
Description




FULL PRODUCT VERSION :
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)


FULL OPERATING SYSTEM VERSION :
 customer  Windows 2000 [Version 5.00.2195]
Service Pack 3

A DESCRIPTION OF THE PROBLEM :
If ZipInputStream is used to read a zip file containing one
or more files with Chinese, Japanese or Korean names, the
getNextEntry method throws an IllegalArgumentException.

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
1. Create a zip file containing at least one file with a
Chinese, Japanese or Korean filename.
2. Try to read using a ZipInputStream.

EXPECTED VERSUS ACTUAL BEHAVIOR :
Should return a valid entry with the correct filename
instead of throwing an exception.

ERROR MESSAGES/STACK TRACES THAT OCCUR :
java.lang.IllegalArgumentException
    at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:291)
    at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:230)
    at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:75)

REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public final class TestCase {
    public static void main(String[] args) throws IOException {
        ZipInputStream zis = new ZipInputStream(new FileInputStream("myfile.zip"));
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            System.out.println("found " + entry.getName());
        }
    }
}

---------- END SOURCE ----------

CUSTOMER WORKAROUND :
Do not use CJK filenames in zip files.
(Review ID: 181382) 
======================================================================


  xxxxx@xxxxx   2003-09- customer 

Same problem reported by a CAP member from Germany:

J2SE Version (please include all output from java -version flag):
  java version "1.4.1"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
  Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)

and

  java version "1.5.0-beta"
  Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-beta-b16)
  Java HotSpot(TM) Client VM (build 1.5.0-beta-b16, mixed mode)


Does this problem occur on J2SE 1.3, 1.4 or 1.4.1?  Yes / No (pick one)
  Yes

Operating System Configuration Information (be specific):
  English Linux and German Win2K

Bug Description:
  A ZIP file has entries whose names contain German umlauts. When
  these entries are read using ZipInputStream.getNextEntry(), it throws an
  IllegalArgumentException at:

Exception in thread "main" java.lang.IllegalArgumentException
         at java.util.zip.ZipInputStream.getUTF8String(ZipInputStream.java:298)
         at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:237)
         at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:73)
         at ZipUmlauts.main(ZipUmlauts.java:22)

  It would be better if the getUTF8String() method just ignored
  these "illegal" characters or added them "as-is".

Test Program: (ZipUmlauts.java umlauts.zip)
-------------

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/*
 *  ZipUmlauts.java created on Sep 1, 2003 8:45:08 AM
 */

/**
 * @version ${Id:}
 * @author rs
 * @since pirobase®CB 1.0
 */
public final class ZipUmlauts {

    public static void main(String[] args) throws IOException {
        FileInputStream fis=new FileInputStream("umlauts.zip");
        ZipInputStream zis=new ZipInputStream(fis);
        ZipEntry ze;
        while ((ze=zis.getNextEntry())!=null) {
            System.out.println(ze.getName());
        }
    }

}
Work Around
N/A
Evaluation
Unfortunately, fixing this in a backward-compatible way may be impossible.
At least, for non-ASCII file names, Java should be able to create files
on one system and extract them on a different system, even if the
encodings are different.

The suggestion of adding an encoding attribute is a good one.
That should have been done when the decision to encode file names
in UTF-8 was first made.
  xxxxx@xxxxx   2003-09-04

I have confirmed that, as long as one uses Sun's J2SE zip
implementation consistently, in an environment where file.encoding
supports the character set of interest, one can create, list and
extract jar/zip files containing non-ASCII characters (including
Chinese characters) correctly.   Other zip implementations also have
character encoding interoperability problems, so J2SE's
implementation is not alone.

The suggestion of falling back to file.encoding is an appealing one,
but it's quite dangerous to go down that route.

Encoding "autodetection" is a good interactive feature for users, but
it's not so good for file formats.  To have a file be properly readable
depending fairly randomly on the data bit patterns stored within it
is a reliability disaster.  It's much better to have consistent failure
than intermittent "success".

Re-architecting zip to record the encoding of the file names will
hopefully get done for J2SE 1.6.

  xxxxx@xxxxx   2003-11-25
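
The risk described in the evaluation above can be illustrated with a small sketch. It is not the JDK's code: the class and method names are hypothetical, and ISO-8859-1 stands in for "whatever file.encoding happens to be". A strict UTF-8 decode fails consistently on a lone umlaut byte, while a decode-then-fall-back scheme silently picks a different reading for bytes that happen to be valid in both encodings.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of java.util.zip.
final class StrictNameDecoding {

    /** Decodes raw entry-name bytes as UTF-8 and reports any malformed input. */
    static String decodeStrictUtf8(byte[] raw) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    /** "Autodetection": try UTF-8 first, otherwise assume ISO-8859-1 (an assumption). */
    static String decodeWithFallback(byte[] raw) {
        try {
            return decodeStrictUtf8(raw);
        } catch (CharacterCodingException e) {
            return new String(raw, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // 0xFC is "ü" in ISO-8859-1 and malformed as UTF-8, so the fallback is taken.
        byte[] umlaut = {(byte) 0xFC};
        // 0xC3 0xA9 is "Ã©" in ISO-8859-1 but is also valid UTF-8 for "é".
        byte[] ambiguous = {(byte) 0xC3, (byte) 0xA9};

        System.out.println(decodeWithFallback(umlaut));
        // Decoded as UTF-8 even if the writer meant the ISO-8859-1 reading.
        System.out.println(decodeWithFallback(ambiguous));
    }
}
```

Which reading the second name gets depends only on its byte pattern, which is the "intermittent success" the evaluation warns about.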


I believe this is a duplicate of 4244499. See the evaluation of that bug report for a relatively simple proposed solution.
  xxxxx@xxxxx   2005-1-29 00:28:38 GMT
ZipInputStream(InputStream, Charset) has been introduced in JDK 7 to solve this issue.
Try
    ZipInputStream zis=new ZipInputStream(fis, Charset.forName("ibm437"));
for the umlauts case.

Try using "gbk" for the chinesefilenameInside.zip.
Posted Date : 2009-04-16 22:38:53.0
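
A minimal round-trip sketch of the JDK 7 constructor mentioned above. The class name, helper names and the sample entry name are made up for illustration, and the archive is built in memory so the example does not depend on umlauts.zip. It assumes the IBM437 charset is available in the running JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class for the charset-aware constructors added in JDK 7.
final class ZipNamesWithCharset {

    /** Lists entry names, decoding them with the given charset. */
    static List<String> listNames(InputStream in, Charset charset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(in, charset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a one-entry archive in memory whose name is encoded with the given charset. */
    static byte[] zipWithName(String name, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes, charset)) {
            zos.putNextEntry(new ZipEntry(name));
            zos.write(1); // one byte of dummy content
            zos.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Charset cp437 = Charset.forName("IBM437");
        // "\u00dc" is the umlaut U; the escape avoids source-encoding issues.
        byte[] archive = zipWithName("\u00dcbung.txt", cp437);
        System.out.println(listNames(new ByteArrayInputStream(archive), cp437));
    }
}
```

The charset passed to the reader has to match the one the archive's writer used; nothing in this sketch detects it.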
Comments
  

Submitted On 27-SEP-2003
hhwindspirit
It's an old problem...


Submitted On 26-OCT-2003
breezee26
Here is a workaround for this bug.
- Create a new package with three classes: ZipInputStream,
ZipConstants and ZipEntry.
- Copy the source code from the corresponding classes in the
java.util.zip package.
- Following the workaround of lallinec (THU FEB 20 03:04
A.M. 2003) to bug #4415733, fix the readLOC() method in
ZipInputStream:
     // Old code:
     // ZipEntry e = createZipEntry(getUTF8String(b, 0, len));
     String fileName;
     try {
        fileName = getUTF8String(b, 0, len);
     } catch (Exception ex) {
        // Not valid UTF-8: fall back to the platform default encoding.
        // Decode only the first len bytes; b is a reused buffer.
        fileName = new String(b, 0, len, System.getProperty("file.encoding"));
     }
     ZipEntry e = createZipEntry(fileName);

- Edit the new ZipEntry and remove the static initializer that calls
the native method initIDs().
This step seems a bit scary, but it follows the workaround
to bug #4244499 (the workaround of Olive64, THU JUN 05
01:55 P.M. 2003), which handles a similar bug in ZipOutputStream.

Now you have a ZipInputStream that supports multi-byte
entry names.
To extract the zip file using the fixed code offered above,
create a function that takes an "encoding" string, a "destPath"
string and a "sourceFile" (zipped) and does:

// ZipInputStream and ZipEntry here are the patched copies described above.
byte[] buf = new byte[1024];
ZipInputStream zipinputstream = new ZipInputStream(new FileInputStream(sourceFile), encoding);
ZipEntry zipentry = zipinputstream.getNextEntry();
while (zipentry != null) { // for each entry to be extracted
    String entryName = zipentry.getName();
    int n;
    FileOutputStream fileoutputstream = new FileOutputStream(destPath + entryName);
    while ((n = zipinputstream.read(buf, 0, 1024)) > -1)
        fileoutputstream.write(buf, 0, n);
    fileoutputstream.close();
    zipinputstream.closeEntry();
    zipentry = zipinputstream.getNextEntry();
} // while
zipinputstream.close();
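
For comparison, a sketch of the same extraction loop written against the stock java.util.zip.ZipInputStream(InputStream, Charset) constructor that the evaluation says was introduced in JDK 7, so no patched classes are needed. The class name, method name and the path check are additions for illustration, not part of the workaround above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper; requires JDK 7 or later.
final class CharsetAwareUnzip {

    /** Extracts every entry into destDir, decoding entry names with nameCharset. */
    static void extract(InputStream archive, Path destDir, Charset nameCharset) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        try (ZipInputStream zis = new ZipInputStream(archive, nameCharset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Refuse names such as "../x" that would land outside destDir.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Copies until the end of the current entry; does not close zis.
                    Files.copy(zis, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Usage: java CharsetAwareUnzip archive.zip destDir charsetName */
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            extract(in, Paths.get(args[1]), Charset.forName(args[2]));
        }
    }
}
```

Whether the decoded names can actually be created on disk still depends on the file system and platform encoding of the machine doing the extraction.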


Submitted On 09-DEC-2003
faralla
I am trying that workaround, but I have several
problems.
You say: "create a new package". Do you
mean "create a package with its own name"? Then I
have the problem that, for example, ZipInputStream
cannot access usesDefaultInflater.
When I also call the package "java.util.zip", I have the
problem that it won't be used, at least not when
running in Tomcat, since I have to use it in a servlet.

Any comments on that?

Thanks,
 Philipp


Submitted On 08-APR-2004
scsulliv
Is this fixed in J2SE 1.5.0 ?


Submitted On 20-JAN-2005
dbeutner
Hi,

this is a duplicate of the acknowledged real bugs 4244499 and 4688771, both of which are in the bugs top 25 at the time of writing.

Since the first one is the original and is leading in the top 25, I would suggest moving the votes to it. This one could be closed as a duplicate.

Adding an encoding parameter does not lead to any backwards compatibility issues, what nonsense :-(

No, it is not fixed with 1.5 :-(

Hey, it's such a fresh bug, only six years old, what do you expect... ;-(


Submitted On 04-JUL-2005
MartinHilpert
The workaround of breezee26  doesn't work for me: I get an 

java.util.zip.ZipException: ZIP file must have at least one entry

Exception even though I _have_ entries (verified by reverting to the Java ZipOutputStream, which works again, of course with faulty special characters).


Submitted On 26-AUG-2005
Marcelo9
It is incredible that this problem is taking Sun such a long time. SUCH A SERIOUS BUG!!!!!

I think I will open the source code and I will fix that!!!!!


Submitted On 26-SEP-2005
inzaghi101
Has anyone solved this problem?


Submitted On 22-NOV-2005
MartinHilpert
A working solution so far?


Submitted On 03-JAN-2006
Gravityzhong
I googled this problem and found that the zip library in Apache Ant (org.apache.tools.zip.*) can read zip archives correctly. It adds an "encoding" argument to the constructor of ZipFile.
But when it comes to creating zip archives, there is still a problem. If I use jar / java.util.zip.ZipOutputStream / org.apache.tools.zip.ZipOutputStream to put a file with a Chinese filename in a zip archive, then the unzip program on Linux cannot list or extract the archive correctly. It's strange, since jar, ZipOutputStream and unzip are all using UTF8!
I created another archive using org.apache.tools.zip.ZipOutputStream with the option that Chinese filenames are encoded in GBK. Chinese WindowsXP can correctly extract this archive.
It seems that the unzip (5.51) program on Linux is buggy.


Submitted On 07-JAN-2008
proveindia
I am getting java.lang.IllegalArgumentException while extracting a ZIP file that was zipped using WINZIP 10. Is there any workaround for this issue from SUN, or another utility?

When I tried GB2312 / GBK as the encoding, I no longer got this exception; however, Japanese characters come out as question marks.

This issue makes my life a BIG QUESTION MARK?

Any help will be appreciated...


