Bug Database
Bug ID: 6707044
Votes: 85
Synopsis: uncommon_trap of ifnull bytecode leaves garbage on expression stack
Category: hotspot:compiler2
Reported Against:
Release Fixed: hs14(b03), hs11(b15) (Bug ID: 2165066)
State: 10-Fix Delivered, bug
Priority: 2-High
Related Bugs: 6726504
Submit Date: 26-MAY-2008
Description
FULL PRODUCT VERSION :
java version "1.6.0_06"
Java(TM) SE Runtime Environment (build 1.6.0_06-b02)
Java HotSpot(TM) Server VM (build 10.0-b22, mixed mode)

FULL OS VERSION :
Linux 2.6.22.1 #7 SMP PREEMPT Tue Mar 18 18:22:09 EDT 2008 i686 GNU/Linux

A DESCRIPTION OF THE PROBLEM :
On the  customer  Lucene project, we've now had 4 users hit by an apparent
JRE bug.  When this bug strikes, it silently corrupts the search
index, which is very costly to the user (makes the index unusable).
Details are here:

  https://issues. customer .org/jira/browse/LUCENE-1282

I can reliably reproduce the bug, but only on a very large (19 GB)
search index.  However, I narrowed down one variant of the bug to the
attached test case.

THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: No

THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes

STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Compile the attached code (Crash.java) and run it with -Xbatch; it should fail (i.e., incorrectly
throw the RuntimeException).  It should pass without -Xbatch.


EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: no RuntimeException is thrown.  Actual: the RuntimeException is thrown.
REPRODUCIBILITY :
This bug can be reproduced always.

---------- BEGIN SOURCE ----------
public class Crash {

  public static void main(String[] args) throws Throwable {
    new Crash().crash();
  }

  private Object alwaysNull;

  final void crash() throws Throwable {
    for (int r = 0; r < 3; r++) {
      for (int docNum = 0; docNum < 10000;) {
        if (r < 2) {
          for(int j=0;j<3000;j++)
            docNum++;
        } else {
          docNum++;
          doNothing(getNothing());
          if (alwaysNull != null) {
            throw new RuntimeException("BUG: checkAbort is always null: r=" + r + " of 3; docNum=" + docNum);
          }
        }
      }
    }
  }

  Object getNothing() {
    return this;
  }

  int x;
  void doNothing(Object o) {
    x++;
  }
}


---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
Don't specify -Xbatch.  You can also tweak the code to have it pass the test.  Reducing the 10000
or 3000 low enough makes it pass.  Changing the doNothing(...)  line
to assign the result of getNothing() to an intermediate variable
first, also passes (this is the approach we plan to use for Lucene). Removing the x++ also passes.
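
The intermediate-variable workaround described above might look like the following sketch. The class name CrashWorkaround, the method name run, and the local variable name nothing are illustrative and not taken from the attached test case; the only intended change from Crash.java is that the result of getNothing() is stored in a local before being passed to doNothing(...).

```java
public class CrashWorkaround {

  public static void main(String[] args) {
    new CrashWorkaround().run();
    System.out.println("completed without exception");
  }

  private Object alwaysNull;

  int x;

  final void run() {
    for (int r = 0; r < 3; r++) {
      for (int docNum = 0; docNum < 10000;) {
        if (r < 2) {
          for (int j = 0; j < 3000; j++)
            docNum++;
        } else {
          docNum++;
          // Workaround: assign the call result to a local variable first,
          // instead of writing doNothing(getNothing()) directly.
          Object nothing = getNothing();
          doNothing(nothing);
          if (alwaysNull != null) {
            throw new RuntimeException("BUG: alwaysNull is always null: r=" + r + " of 3; docNum=" + docNum);
          }
        }
      }
    }
  }

  Object getNothing() {
    return this;
  }

  void doNothing(Object o) {
    x++;
  }
}
```

This only sidesteps the one variant captured by the test case; it is not a general fix for the underlying compiler problem.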
Posted Date : 2008-05-26 10:57:14.0
Work Around
N/A
Evaluation
-Xbatch is not the problem; it simply causes the interpreter to pause so that
the incorrectly generated code can be installed and run.

The crashtest.log on the apache.org site implies that the problem occurs with -client and -Xint.  This is incorrect, as the crashtest script has a bug of its own that fails to pass those options to the JVM.  The bug is specific to the server compiler.
Posted Date : 2008-07-16 15:17:03.0

When an ifnull or ifnonnull bytecode that has never been reached is compiled by the JIT, it is turned into an uncommon_trap.  The path along that uncommon_trap mistakenly leaves an extra, dirty element on top of the JVM expression stack, causing havoc down the line.
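
To illustrate why one extra stack slot is harmful, here is a toy model of an operand stack. It is not HotSpot code and does not reproduce the actual deoptimization logic; the class StaleStackSketch, its helper methods, the NULL_REF marker, and the choice of stale value are all assumptions made for illustration. It only shows that a null check which pops the top slot reads the leftover value rather than the real operand when a stale element sits above it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class StaleStackSketch {

  // Stand-in for a null reference, because ArrayDeque rejects null elements.
  static final Object NULL_REF = new Object();

  // Toy "ifnull": pops the top slot and reports whether it held the null marker.
  static boolean ifNull(Deque<Object> stack) {
    return stack.pop() == NULL_REF;
  }

  // Builds the stack that execution resumes with (hypothetical helper).
  // When leaveGarbage is true, one stale slot is left above the real operand.
  static Deque<Object> resumeState(Object operand, Object stale, boolean leaveGarbage) {
    Deque<Object> stack = new ArrayDeque<>();
    stack.push(operand);
    if (leaveGarbage) {
      stack.push(stale);
    }
    return stack;
  }

  public static void main(String[] args) {
    Object stale = "leftover reference";
    // Clean handoff: the null check sees the real operand.
    System.out.println("clean handoff sees null: " + ifNull(resumeState(NULL_REF, stale, false)));
    // Dirty handoff: the null check sees the stale slot instead.
    System.out.println("dirty handoff sees null: " + ifNull(resumeState(NULL_REF, stale, true)));
  }
}
```

In this model the dirty handoff makes a field that is really null appear non-null, which resembles the symptom in the test case, where the alwaysNull check unexpectedly fails.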
Posted Date : 2008-07-16 15:17:03.0

http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/9b66e6287f4a
Posted Date : 2008-07-16 19:30:59.0
Comments
  
  Include a link with my name & email   

Submitted On 27-MAY-2008
MichaelMcCandless
Two additions to the original bug:

  * 1.6.0_03 does NOT show the issue, so apparently it was newly
    introduced in _04.

  * Sometimes the bug also happens when you run without -Xbatch.
    Increasing the number of iterations for docNum and j increases the
    likelihood that it will fail without -Xbatch



Submitted On 27-MAY-2008
hossman
in accepting the bug, the word "Apache" was apparently substituted with " customer " ... both in a description of how the problem was originally discovered, and in the URL to see the issue initially reported to Apache Lucene.  There is no reason for this obfuscation, we have nothing to hide.

Correct URL...
https://issues.apache.org/jira/browse/LUCENE-1282


Submitted On 28-MAY-2008
ijuma82
I am surprised that the priority for this bug is "Low". JIT miscompilation causing index corruption in a very popular open-source application is quite nasty and a test case that shows the problem has been supplied.

In addition, the title is misleading because it makes it sound like it only happens with -Xbatch. In reality, -Xbatch just makes it easier to reproduce. I could reproduce every time _without_ -Xbatch after increasing the number of iterations to 1000000 for docNum and 300000 for j. I tested jdk6u4, openjdk6 in Fedora 9 and jdk 6u10 beta b24. 


Submitted On 02-JUN-2008
dave_sitsky
I have customers using our product with 1.6.0_06 being hit by this bug.  It is incredibly nasty to end up with a corrupted lucene index after a long load time, all due to an introduced bug in the server hotspot compiler.

This bug should be classified as high priority.

In our scenario, we did not run with -Xbatch, so please update the bug summary and priority appropriately.  This occurs in the real world for real applications and is really nasty.




Submitted On 16-JUL-2008
ijuma82
I noticed that a fix for this was committed in:

http://permalink.gmane.org/gmane.comp.java.openjdk.hotspot.compiler.devel/263

Are there plans to backport this to jdk6u10?


Submitted On 16-JUL-2008
rasbold
With a fix now in OpenJDK, we're now looking into 
getting this fix backported into the 6u10 release.


Submitted On 10-AUG-2008
kedartal
Any news?


