NullPointerException just before data loss
1.4.2(mantis-b18) (Bug ID:2117808), 1.5(tiger) (Bug ID:2117809)
java version "1.4.1"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.1-b21)
Java HotSpot(TM) 64-Bit Server VM (build 1.4.1_01-b01, mixed mode)
com.customer.snmp.snmp2.agent.AgentRuntimeException: Failure in opening SnmpSession, errstat: 0, errvalue: 0
at com.customer.snmp.snmp2.agent.SnmpAgent.addClientAndOpenSession(SnmpAgent.java:801)
at com.customer.snmp.snmp2.agent.SnmpAgent.restartSnmpAgent(SnmpAgent.java:939)
at com.customer.snmp.snmp2.agent.SnmpAgent.restartSnmpAgent(SnmpAgent.java:955)
at com.customer.snmp.snmp2.agent.SnmpAgent.setPort(SnmpAgent.java:1034)
CSVLogReader.CreateMapArguments() - Exception !!!
All files mentioned in the description are in the attachments
Problem description:
We have a class, CSVLogReader, that performs CSV file line parsing. After
reading a line from the file, the CreateMapArguments(String line)
method is called in order to put each field into the buffer according to its
type. Inside this method there is a try/catch (Exception E) block that
catches a NullPointerException. In order to understand the reason, we
inserted an E.printStackTrace() call, and we expected to see the full stack
trace of the Exception on the standard output. But only
"java.lang.NullPointerException" was printed. So the first question is: why
didn't we see the full stack trace?
Throwable X = E.fillInStackTrace();
gives the same results.
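The trace can also be printed element by element, bypassing printStackTrace() entirely, to rule out a problem with the printing call itself or with stream redirection. A minimal sketch (the class and method names here are our own):

```java
public class ManualTrace {
    // Print a throwable's trace to stdout by hand, element by element,
    // instead of relying on printStackTrace() (which writes to stderr).
    static int printTrace(Throwable t) {
        StackTraceElement[] frames = t.getStackTrace();
        System.out.println(t);
        for (int i = 0; i < frames.length; i++) {
            System.out.println("\tat " + frames[i]);
        }
        return frames.length;   // number of frames printed
    }

    public static void main(String[] args) {
        try {
            Object o = null;
            o.hashCode();                       // deliberately raise an NPE
        } catch (NullPointerException e) {
            printTrace(e);
        }
    }
}
```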
Then we concluded that the only way to see what caused the
NullPointerException is to produce a thread dump immediately after catching it.
From the program we called a script that sent a kill -QUIT signal to the
process. The resulting thread dumps point to the source of this Exception
being inside Java classes.
Here are some examples from several thread dumps that show the threads which
called the CreateMapArguments() method:
1. Thread dump from gath7_stdout.log
"Thread-24" prio=5 tid=0x1053d0680 nid=0x4a runnable
- locked <fffffffee8b85858> (a ISM.ISM_114_1_0.CSVLogReader)
Here is the ISM.ISM_114_1_0.CSVFieldTokenizer.fieldAsString() method with
line 161 specified:
public String fieldAsString(int FieldNo)
{
    char[] arr = new char[m_FieldLen[FieldNo]]; // line 161
    System.arraycopy(m_Buffer[FieldNo], 0, arr, 0, m_FieldLen[FieldNo]);
    return (new String(arr));
}
As you can see, none of our own objects are referenced on this line.
We can see the same situation in gath9_stdout.log:
"Thread-15" prio=5 tid=0x1055f2180 nid=0x30 runnable
- locked <fffffffee8b86ef8> (a ISM.ISM_114_1_0.CSVLogReader)
2. Thread dump from gath17_stdout.log
"Thread-26" prio=5 tid=0x1002eb170 nid=0x53 runnable
- locked <fffffffee8b85070> (a ISM.ISM_114_1_0.CSVLogReader)
Here is the Util.MapArguments.putString() method with line 889 specified:
public void putString(Integer Key, String Value)
{
    putString(Key.intValue(), Value); // line 889
}
As you can see, the Integer.intValue() method is called just before the Exception was thrown.
3. Thread dump from gath20_stdout.log
"Thread-24" prio=5 tid=0x100301430 nid=0x44 runnable
- locked <fffffffee81867c8> (a ISM.ISM_114_1_0.CSVLogReader)
The relevant line in the ISM.ISM_114_1_0.SuperParser.parseDateTime() is :
int Year = m_CalObj.get(Calendar.YEAR); // line 286
We can see the same situation in gath24_stdout.log:
"Thread-23" prio=5 tid=0x1002fa8e0 nid=0x43 runnable
- locked <fffffffee8181e08> (a ISM.ISM_114_1_0.CSVLogReader)
As you can see, the Calendar object m_CalObj is not null here, and the
problem is inside the Calendar methods.
4. Thread dump from gath28_stdout.log
"Thread-30" prio=5 tid=0x105562140 nid=0x52 runnable
- locked <fffffffee8b814a0> (a ISM.ISM_114_1_0.CSVLogReader)
The relevant line in the Util.MapArguments.putString() is :
m_StringErr[KeyDesc.Index] = NO_ERROR; // line 903
m_StringErr is an array of booleans.
How do you explain this situation? What can cause the
NullPointerException? And why can't we see the stack trace
by using the printStackTrace() call?
Attached with this email :
1. All mentioned thread dumps (gath7_stdout.log, gath9_stdout.log,
gath17_stdout.log, gath20_stdout.log, gath24_stdout.log, gath28_stdout.log)
2. The CSVLogReader source file.
According to the description, the customer is looking for the stack trace
on the standard output. The printStackTrace() method prints the stack trace
on the standard error, not the standard output. However, it does appear that
the log file does capture standard error, because the traceback from the
second call to printStackTrace() does come out (the description implies that
it doesn't). He should modify his program to write his extra debugging
output to System.err instead of System.out, and capture stderr and stdout
separately when running the program. Output combined from the two streams
may not be intermixed in the same order as it was written.
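The stream distinction can be seen in a minimal sketch (class and method names here are ours, not the customer's):

```java
public class StreamDemo {
    // printStackTrace() writes to System.err; if stdout and stderr are
    // captured into a single log, lines from the two streams can interleave
    // out of order, making the trace appear truncated or missing.
    static String catchAndReport() {
        try {
            String s = null;
            return String.valueOf(s.length());  // deliberately raise an NPE
        } catch (NullPointerException e) {
            e.printStackTrace();                // -> System.err, NOT System.out
            System.err.println("debug output belongs on the same stream");
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        System.out.println(catchAndReport());
    }
}
```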
His invocation of the "printstacktrace" script does not do what he wants.
The script will run concurrently with his program, so the thread he
is interested in could have continued execution far beyond his call
to Runtime.exec(). He needs to add a call to MyProcess.waitFor()
after the call to Runtime.getRuntime().exec(). However, this will
show the location in the exception handling code, not where the original
exception occurred. His analysis of where the NullPointerException
occurred from the thread dumps is incorrect; the thread dumps show
where program execution had advanced to by the time the SIGQUIT was delivered.
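The fix described above can be sketched as follows (the helper class is invented, and /bin/true stands in for the real dump script):

```java
import java.io.IOException;

public class DumpHelper {
    // Launch the external dump script and BLOCK until it finishes, so the
    // calling thread cannot race ahead before the SIGQUIT is delivered.
    static int runAndWait(String[] cmd) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(cmd);
        return p.waitFor();   // without this, exec() returns immediately
    }

    public static void main(String[] args) throws Exception {
        System.out.println("exit=" + runAndWait(new String[] { "/bin/true" }));
    }
}
```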
It does seem to be a bug that the printStackTrace() call on the original
exception object does not print the stack trace. This does work correctly
on a small test case I have written. Without more information on how to
reproduce the bug I cannot evaluate it any further.
Initially, I suspected it to be a compiler issue, but after inspecting the two cores at the given location using SA, I could find very few compiled frames.
And the method CreateMapArguments() that is seeing the data loss wasn't compiled in any of the cores.
I suspect the generated interpreter code right now. To make sure it's an interpreter issue and not a compiler problem, I would suggest a run with
java -Xcomp; and if that run fails, do another run with compilation of the above three methods disabled.
New class file was developed as suggested and tested. In summary,
it is getting worse :-(
The class file is under /dumps/CSVLogReader.class, if needed.
There have been several crashes due to Null pointer exception.
For a sample, look in the file /t3-5/gatherer/g5/Log/Gatherer_stdout.log
towards the end of the logfile.
There was also another crash with little information. Pertinent details
are in /t3-1/gatherer/g29/Log/Gatherer_stdout.log and its corresponding
core file is /t3-1/gatherer/g29/Gatherer/core.
There was also a Hashtable hang bug; its details are in
Using -Xcomp is NOT an acceptable workaround, for several reasons:
- We do not ask a customer to run with undocumented flags in production.
- We know very well that using it is detrimental to performance, and
that's probably the reason we don't document it.
- This application's (the ISV's) main competitive feature is performance,
the main reason for which I (MDE) am working with them now; but we are
in the middle of these bugs, so let me stop with this.
The system is on SWAN. When I left I was told that there's some
maintenance work going on with the building router, but that should
not affect this system. However, I was not able to get to my mail
in sfbay domain; but, I can access that E10K from here now.
So, the connectivity may be intermittent but please don't give up.
> Date: Wed, 20 Nov 2002 19:41:53 -0500 (GMT)
> From: xxxxx@xxxxx
> Synopsis: NullPointerException just before data loss
> ** Nov 20 2002 5:56AM R,Anupam **
> Event: The test with -Xcomp doesn't reproduce the problem. So most probably it's the generated interpreter code that is the culprit. I am currently looking at the generated codelets for bytecodes. My suspicion right now is on the iconst_m1 opcode, which puts a -1 (0xffffffffffffffff) on the operand stack.
> Looking in CreateMapArguments() the place where this opcode is coming from is,
> boolean Found = false;
> Can you try a run after making Found an "int" data type, using the values 1 & 2 as false & true respectively? This will avoid iconst_m1.
> But since -Xcomp doesn't reproduce the problem, it can be used as a workaround for the time being.
> I will continue to need the machine and I haven't made any modification to the set up on E10K today, as requested.
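The change suggested in the quoted note can be sketched roughly as follows (the surrounding method and field names are invented; the report only specifies replacing the boolean with an int that uses 1 for false and 2 for true):

```java
public class FoundFlag {
    static final int FALSE = 1;  // suggested encoding: 1 means false
    static final int TRUE  = 2;  // suggested encoding: 2 means true

    // was: boolean Found = false; ... Found = true;
    static int find(int[] fields, int wanted) {
        int found = FALSE;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == wanted) {
                found = TRUE;
                break;
            }
        }
        return found;
    }
}
```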
Looked at every word in the failing method's stack frame using SA. I would expect the local reference which is getting corrupted to be present there, but didn't find any oop of type Util.MapArguments. I did see a null value in the region where other local oops were present. Knowing how interpreter frames are laid out would help get the exact offset for this local. I think an earlier bytecode execution corrupted (nulled) this location, so the next iload_2 loading from that location puts null on the operand stack.
The test doesn't fail when compilation of CreateMapArguments() is disabled, nor in -Xcomp mode.
Looking at TraceDeoptimization output, CreateMapArguments() is compiled and deoptimized twice:
1. The first is a normal compile, deoptimized due to a failed optimized checkcast. The deopt here looks proper and execution proceeds normally.
2. The second compile is an OSR compile, due to the for loop in CreateMapArguments(), and is deoptimized due to "unresolved class: Util/KDSServer". Execution after this deopt gives a NullPointerException.
This points to a problem in setting up of interpreted frames after deopt.
Looking at the .class file and PrintOpto output, it looks like solely an OSR-compile problem, and deopt may not be the issue. Although deopt packing showed map as NULL, the deopt itself happened inside the catch block, and the catch was executed because map was already null. PrintOpto shows that at calls to the parse*() methods in the pseudo-compiled code, L2=#NULL (that's the place where local #2, map, will be kept). Whereas for normal compiles (where there is no data corruption) L2 has good values. The compiler has assumed L2 (map) to be dead in a range where it should have been alive.
Attached file BadNullAssert.java shows a test case for a possible root cause of this bug.
The problem is subtle: An OSR method is compiled using a model of typestates which are
derived from flow analysis over the bytecodes. This flow analysis is necessarily incomplete
if the interpreter has not yet exercised all paths in the code, leaving some classes still
unloaded. In effect the CFG for the method grows over time, as the interpreter exercises
new paths and loads new classes. Exercising a new path must invalidate pre-existing OSR
methods, since they may have compiled into them assumptions about typestates that must
change as new CFG edges come into being.
The problem can manifest (in the test case above, for example) when an unloaded class is
the subject of a checkcast operation. The flow analysis that precedes an OSR compilation
will simply assert that any value cast to an unloaded class must be null, and the
compilation proceeds on that assumption. Later on, if the class loads, the interpreter
can checkcast non-null values successfully, leading to typestates that were previously
proved impossible. The OSR method can be invoked on a typestate created by the interpreter
containing non-null values, and yet the OSR code will assume that they are null, based
on the out-of-date type analysis.
Solution is to have OSR methods register dependencies on unloaded classes, and get flushed
if the classes ever load. This is not an issue with regular (non-OSR) methods, since
a normal method creates all its own typestates, from the method's entry point forward.
There is no way for the interpreter to dump surprising typestates into a non-OSR method.
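The code shape that triggers the problem can be sketched roughly as follows (all names here are invented; this only illustrates the pattern described above, not a guaranteed reproducer):

```java
public class OsrShapeSketch {
    static Object slot;   // may eventually hold a LazyType once it loads

    // A hot loop like this is an OSR compilation candidate. If LazyType is
    // still unloaded at compile time, flow analysis concludes the checkcast
    // below can only ever yield null -- an assumption that is invalidated
    // when the class later loads and the interpreter produces a non-null
    // typestate at the loop's OSR entry point.
    static int hotLoop(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            LazyType t = (LazyType) slot;   // checkcast of possibly-unloaded class
            if (t != null) {
                sum += t.value;             // path "proved" dead by stale analysis
            }
        }
        return sum;
    }
}

class LazyType {
    int value = 1;
}
```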
A binary which fixes this bug is here: /net/lukasiewicz/export/lukasiewicz1/4761344
There are product, optimized, and fastdebug flavors. It is based on the current 1.4.2 baseline.
To gather more information, testing should use the following extra flags:
-XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=hotspot.log
The file "hotspot.log" will be written with diagnostic information.
The name can be adjusted to any pathname.
The libjvm.so binaries containing the fix are under this directory:
The 32-bit VM builds are:
fastdebug/libjvm_g.so asserts enabled, compiled optimized, for java_g
optimized/libjvm.so debugging options compiled in
product/libjvm.so product build
The corresponding 64-bit VM builds are:
All of these were built from the current 1.4.2 baseline, but they
can run inserted into the 1.4.1 JRE.
To verify installation, start a VM with -Xinternalversion, and make
sure the version string includes "jrose in mantis".
If these binaries do not fix the problem, please re-run with the
previously suggested three options including +LogCompilation.
After some intensive analysis, we are more confident the root problem is the compiler
erroneously deciding that a particular class is unloaded, due to the fact that it is
defined in a delegate class loader and not directly visible in the current class loader.
We believe we can fix this cleanly and reliably by using the class loader constraints
recorded by the VM, to correctly report that the class in question is in fact loaded
in the VM, even though it has not yet been fetched from the local class loader.
I am working on a VM that includes this adjustment. Meanwhile, run with logging
turned on, and the log file set to a non-existent file, such as "/no/such/file".
This will cause the VM to choose a unique file in /tmp/, avoiding the problem
with multiple VMs overwriting a single log.
There is an updated VM which fixes the problems mentioned above.
Please test with it.
The 64-bit VM builds are:
Recent JDC comments report that there is some variation of
this bug active in 1.4.2. The particular bug fixed under
this report had to do with the compiler getting confused
by aliasing of classes through different class loaders,
causing it to "know" that a given class was not yet loaded
and therefore all references "must" be null--leading to
embarrassing spurious NPEs.
On top of this problem, up until 1.5 the compiler optimizes
the treatment of NPEs by using a preallocated NPE object,
rather than creating it every time. The preallocated NPE
object works great, except that it lacks a backtrace.
A backtrace-free NPE makes it hard to diagnose any NPE bug,
whether it comes from user error (usually) or a JVM bug (rarely).
We have opened a separate bug 4292742 for the missing backtrace,
and fixed it in 1.5beta2.
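One way to check whether a caught NPE is the preallocated, backtrace-free kind described above is to inspect getStackTrace() directly; a minimal sketch (class name is ours):

```java
public class BacktraceCheck {
    // The preallocated NPE described above carries no backtrace, so its
    // getStackTrace() returns an empty array; a freshly constructed
    // exception always has at least one frame.
    static boolean hasBacktrace(Throwable t) {
        return t.getStackTrace().length > 0;
    }

    public static void main(String[] args) {
        try {
            Object o = null;
            o.toString();                       // deliberately raise an NPE
        } catch (NullPointerException e) {
            System.out.println(hasBacktrace(e));
        }
    }
}
```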
This backtrace problem is probably bothering people on 1.4.2 release.
It is probably making it difficult to debug application errors
in compiled code, forcing them to run -Xint or -client to debug.
If this is true for you, I suggest transferring your JDC vote to bug
4292742, so we can consider backporting a version of the 1.5 fix.
On the other hand, NPE exceptions that are truly spurious, and are
known to be due to JVM errors, should probably be reported as new bugs.
However, if they are clearly cases of the complex syndrome described
in this bug, we would appreciate an explicit note to this effect,
so we can re-evaluate the fix to the class loading logic.
Submitted On 21-OCT-2002
We've been seeing the same symptom... a mysterious
NullPointerException occurs, but when you print the stack
trace it doesn't show anything. [Our stack traces *do* print
correctly for all other exceptions... it's bogus just for these
mysterious NPEs.] These NPEs seem to be related to the
JVM... I have a case where it always occurs with JVM 1.4.1
server hotspot, with or without UseConcMarkSweepGC; it
does NOT occur with client hotspot. We consider this a very
serious issue... glad to see we're not the only ones running into it.
Submitted On 22-OCT-2002
We get the same sort of problem - random
NullPointerExceptions with no stacktrace (just the word "java.lang.NullPointerException").
These happen consistently with the 1.4.1 -server VM. When
changing to the -client, they don't occur.
We're running on Linux (Redhat 7.3), and under Tomcat 4.04.
BTW - we never seem to be able to get a stacktrace with
NullPointerExceptions - although we get them with all other
exceptions. This also happens on 1.3.1.
Submitted On 04-NOV-2002
We get a similar kind of problem from 1.4.1 on, the stack trace points to
the line where the NPE occurs.
Our code is something like:
private HashMap bar()
{
    HashMap map = new HashMap();
    // ...
    return map;
}

private void foo()
{
    HashMap map = bar();
    // .. some code is executed that does not use the map
    String value = (String)map.get("Key"); // here we get a NPE
}
The reference to the map is created in bar() and is only used/known in foo(). From 1.4.1 on,
the map itself is null (as we could see via sys.out) when we access it later.
The problem doesn't occur in
-Java HotSpot(TM) Client VM (build 1.4.0-b92, mixed mode)
-Java HotSpot(TM) Client VM (build 1.4.0_01-b03, mixed mode)
-Java HotSpot(TM) Client VM (build 1.4.1-b21, mixed mode)
-Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed mode)
The problem occurs when setting the -server option.
Using -client instead also fixes the NPE with 1.4.1!!!
We run on Suse Linux, Kernel 2.4.17, glibc 2.2-7
Maybe something has changed in the garbage collection algorithm from 1.4.0_01 to 1.4.1?
Submitted On 04-NOV-2002
Addendum: The NPE-problem occurs randomly!
Submitted On 08-NOV-2002
To expand on Skibu: We have been seeing this NPE in a static
method of a Servlet. It happens randomly. It looks like
multiple threads are accessing the method and are corrupting
the local stack! By inserting print statements, we see that
an object ref is NOT null at first, but somehow gets set to
null later on (NOTE: the method never modifies the object
ref, which is passed in to the static method as a parameter!)
By making the static method synchronized, the problem goes
away. Still, this is only a band-aid. The object ref which
is being corrupted is not a static object reference: it is a
method parameter.
We are seeing this with 1.4.1 and 1.4.1_01 using the -server VM.
Submitted On 08-NOV-2002
We are running into the same NullPointerException problem
without using the UseConcMarkSweepGC option. We were
able to get around the problem by declaring the function to be synchronized.
Submitted On 10-DEC-2002
We get a similar error with HotSpot Server VM (build
1.4.1_01-b01, mixed mode). Our application runs in Tomcat
4.0.6. When we execute a load test, after about 15 minutes,
some local variables in a method suddenly become null. The
NPE that results from this has no stack trace.
This error doesn't occur in 1.4.1_01 in interpreted mode
(-Xint) or with JDK 1.3.1_06 (in mixed mode). Our workaround is
to use JDK 1.3.1_06.
Submitted On 16-JAN-2003
Same Problem with
Java(TM) 2 Runtime Environment, Standard Edition (build
Java HotSpot(TM) Client VM (build 1.4.1_01-b01, mixed
on Redhat 8.0, when running the Server VM. It seems that the
same exception always causes this incorrect empty stack trace.
Submitted On 28-FEB-2003
I probably should have added that we've tried running
Tomcat/Lucene on Windows 2000 using j2sdk1.4.0,
j2sdk1.4.1, and j2sdk1.4.1_01 -- the problem manifests in all
of these environments.
A possibly unrelated problem is that the server VM takes an
unbounded amount of memory until an OutOfMemoryError
occurs (even with -Xmx1536m), whereas the client VM runs
successfully using only about 200MB.
Submitted On 28-FEB-2003
We've seen the same problem (NullPointerException with no
stack trace) running Apache's Lucene under Tomcat 4. By
liberally inserting "print" statements, we were able to track it
down to 2 local variable values being reset to null within
a "for" loop in the method
but there's no Java code to cause the values to change.
Switching Tomcat to use the Client VM instead of the Server
VM seems to have solved the problem for now.
Submitted On 15-JUL-2003
We are seeing a similar problem on the JDK 1.4.2 running
tomcat 4.1.24 and "-server". Some of our
NullPointerExceptions show no stack trace when doing a
printStackTrace(PrintWriter). This is a real pain...
Submitted On 09-SEP-2003
This is NOT FIXED in 1.4.2, please reopen.
Submitted On 21-NOV-2003
With the Windows x86 1.4.2_02 JDK I am still seeing this problem. It
appears to have NOT been fixed.
Submitted On 19-MAR-2004
i can confirm this is not fixed in 1.4.2
our production machines use "Java HotSpot(TM) Server VM
(build 1.4.2_02-b03, mixed mode)", on redhat linux 6.2
here's the output when this bug kicks in:
11:23:03,892 ERROR [TP worker 3] [WorkerThread] an exception
occured while processing the event
and here's a normal output:
14:56:23,405 ERROR [TP worker 4] [WorkerThread] an exception
occured while processing the event
java.lang.NullPointerException: test stack trace
it doesn't happen often, and i wasn't able to consistently
reproduce it, but it's there
Submitted On 28-DEC-2004
We are seeing what I believe is the same problem under 1.4.2_05, and have found a workaround that works for us.
Unfortunately the environment is too complex to distill down to a reproducible case, but here are the details we've gathered:
The problem originally manifested as a stack-trace-free NullPointerException. Through generous use of the old-school "insert many System.out.printlns" debugging style, we were able to determine that the NPE was being generated within the HashMap class, in get(). We determined that before the call, the key was definitely non-null.

Our situation differs from elsen's in that the map reference itself is never null. The map is still accessible after the NPE. The call worked fine for other keys. The key was an inner class that was not itself loaded by another classloader, but in some cases (and in the case that caused the error) referred to another class that was definitely loaded by another classloader.

If we ran the process in the client VM, the problem went away (though this was not a reasonable production workaround). It looks like we've been able to solve the problem by using the toString() of the key object as the key, rather than the object itself. For our class, this guarantees uniqueness and for whatever reason doesn't trigger the error.
PLEASE NOTE: JDK6 is formerly known as Project Mustang