EVALUATION
The suggested fix resolves the failures it was supposed to resolve in the nightlies.
|
|
|
EVALUATION
To summarize. Out of the three distinct failures we see, the following two (setCollectionUsageThreshold) are caused by test issues which do not take into account the fact that max is not constant over time:
[2009-11-25T13:12:56.92] # ERROR: Unexpected java.lang.IllegalArgumentException: Invalid threshold: 932184064 > max (930086912). in pool G1 Old Gen
[2009-11-25T13:12:56.92] # ERROR: IllegalArgumentException is not thrown in pool G1 Eden for threshold 1048577 (init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K))
We will only fix the third type of failure (isCollectionUsageThresholdExceeded):
[2009-11-25T13:11:59.71] # ERROR: isCollectionUsageThresholdExceeded() returned false, while threshold = 0 and used = 0
|
|
|
SUGGESTED FIX
For the failure that we'll fix (isCollectionUsageThresholdExceeded):
[2009-11-25T13:11:59.71] # ERROR: isCollectionUsageThresholdExceeded() returned false, while threshold = 0 and used = 0
it's sufficient to ensure that max / committed return a non-zero value (let's say a minimum of the G1 region size). See Evaluation for more details.
|
|
|
EVALUATION
From an e-mail I sent:
One more thing. If I change the test to get the max from getUsage() instead of getCollectionUsage() the test passes and both failures are eliminated:
[2009-11-25T13:12:56.92] # ERROR: Unexpected java.lang.IllegalArgumentException: Invalid threshold: 932184064 > max (930086912). in pool G1 Old Gen
[2009-11-25T13:12:56.92] # ERROR: IllegalArgumentException is not thrown in pool G1 Eden for threshold 1048577 (init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K))
Thinking about this for a minute, yes, I understand why the test does getCollectionUsage() instead of getUsage(). So, the real cause for both failures is really that max is assumed not to change, right?
|
|
|
EVALUATION
From an e-mail I sent:
> [2009-11-25T13:12:56.92] # ERROR: IllegalArgumentException is not thrown in pool G1 Eden for threshold 1048577 (init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K))
>
> I'm not quite sure why this happens. I'm investigating right now...
I _think_ I have an explanation for this.
The test (nsk/monitoring/MemoryPoolMBean/setCollectionUsageThreshold//setthreshold004) gets max by doing getCollectionUsage() on the pool.
>>>> init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K)
>>>> max = 1048576
Then it tries to do setCollectionUsageThreshold with max + 1 (i.e., 1048577) and it doesn't get an illegal argument error.
# ERROR: IllegalArgumentException is not thrown in pool G1 Eden for threshold 1048577 (init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K))
Now, the setCollectionUsageThreshold(), it gets max from getUsage() (the native method getUsage0() in fact) and that actually returns a different set of values:
>>> init = 1048576(1024K) used = 2097152(2048K) committed = 2097152(2048K) max = 2097152(2048K)
(and I'm pretty sure this is the max that setCollectionUsageThreshold uses to do the bounds check, which is why it succeeds)
So, is the test correct? Should it get the max value from getUsage() instead of getCollectionUsage()?
|
|
|
EVALUATION
Mandy's reply:
> So, is the test correct? Should it get the max value from getUsage() instead of getCollectionUsage()?
This is a test bug. Since max can be changing as specified in the spec, the test has to use a better way to determine a threshold value big enough to cause the IAE. The usage or collection usage is not a reliable source for such test case.
G1 is the first collector that changes the max value dynamically (to be accurate, it's the first collector being tested with varying max value). I am sorry for causing more work for you.
> [2009-11-25T13:12:56.92] # ERROR: Unexpected java.lang.IllegalArgumentException: Invalid threshold: 932184064 > max (930086912). in pool G1 Old Gen
>
> This happens because the max size of the old gen actually decreases over time (as the max size of the eden and survivor increase; remember: old max = overall_max - eden_max - survivor_max). Mandy: is this a test bug?
This is a test bug again although I think the API has to be enhanced to help determining the right threshold to set with variable max value (e.g. setting a percentage instead of a fixed value in bytes). I will file a RFE to add the new API for that.
> Or shall I bail out of the policy of having a dynamically changing max?
I can imagine that it's not easy to set a fixed max value for G1 since their sum should be equal to the max heap size returned from MemoryMXBean. Or set the max to undefined (-1) which may uncover more test bugs. We should check with SQE if the NSK monitoring tests deal with undefined max well (but I doubt it as the test failures you ran into use the max value to determine the threshold).
Mandy
|
|
|
EVALUATION
http://hg.openjdk.java.net/jdk7/hotspot-gc/hotspot/rev/9118860519b6
|
|
|
EVALUATION
Thanks to Mandy for pointing me to 6546089. This was the reason why I couldn't reproduce the isCollectionUsageThresholdExceeded (I was using a JDK 6 release to test my libjvm.so). With a JDK 7 release I can now reproduce the isCollectionUsageThresholdExceeded failures. Setting the max and committed sizes to be at least as large as a region size (i.e., > 0) resolves this failure.
|
|
|
PUBLIC COMMENTS
This (clearly!) looks related to the introduction of the G1 memory pools (see CR 6815790).
|
|
|
EVALUATION
The failing tests seem to fall into two categories: isCollectionUsageThresholdExceeded and setCollectionUsageThreshold. I haven't been able to reproduce the former, I've been able to reproduce some of the latter).
For the former (isCollectionUsageThresholdExceeded tests), here's the typical error:
[2009-11-25T13:11:59.62] 2 pool G1 Survivor
[2009-11-25T13:11:59.62] supports collection usage thresholds
[2009-11-25T13:11:59.62] setting threshold 0 init = 0(0K) used = 0(0K) committed = 0(0K) max = 0(0K)
[2009-11-25T13:11:59.62] threshold 0 is set, (used = 0, isExceeded = false)
[2009-11-25T13:11:59.62] used value (0) crossed the threshold value (0)
[2009-11-25T13:11:59.71] # ERROR: isCollectionUsageThresholdExceeded() returned false, while threshold = 0 and used = 0
What the test does is to set the threshold to used + 1 and notice that the pool is reporting that the threshold has not been crossed. However, it also uses max as a ceiling for used + 1 and in G1, unlike the other GCs, max can be 0 (for the eden and survivor pools). So, the threshold is actually set to 0 and it looks as if isCollectionUsageThresholdExceeded() returns true of used == threshold (which is true in this case: 0 == 0).
from ThresholdSupport class in lowMemoryDetector.cpp:
bool is_high_threshold_crossed(MemoryUsage usage) {
if (_support_high_threshold && _high_threshold > 0) {
return (usage.used() >= _high_threshold);
}
return false;
}
First, should isCollectionUsageThresholdExceeded() return true if used == threshold? Second, is it reasonable for the test to assume that max > 0? I think I can fix that by assuming that the max of each pool is at least region_size.
For the latter (setCollectionUsageThreshold tests), which I can reproduce, these are the two errors I see:
[2009-11-25T13:12:56.92] # ERROR: Unexpected java.lang.IllegalArgumentException: Invalid threshold: 932184064 > max (930086912). in pool G1 Old Gen
[2009-11-25T13:12:56.92] # ERROR: IllegalArgumentException is not thrown in pool G1 Eden for threshold 1048577 (init = 1048576(1024K) used = 0(0K) committed = 0(0K) max = 1048576(1024K))
I'm still trying to figure out what's causing (I _think_ the first one might be caused by the fact that in G1 the max size of the old gen can change, and in fact decrease, over time).
|
|
|