EVALUATION
Regarding the JDC/SDN comment of 2006-02-12, the fix for this bug in the Mustang release can be found in the source snapshots available from the Mustang project on java.net (https://mustang.dev.java.net/). It was roughly the following patch to src/share/classes/sun/rmi/transport/DGCClient.java:
@@ -366,11 +374,12 @@
/*
* Calculate the initial retry delay from the
* average time elapsed for each of the first
- * two failed dirty calls.
+ * two failed dirty calls. The result must be
+ * at least 1000ms, to prevent a tight loop.
*/
dirtyFailureDuration =
- (dirtyFailureDuration +
- (endTime - startTime)) >> 1;
+ Math.max((dirtyFailureDuration +
+ (endTime - startTime)) >> 1, 1000);
}
long newRenewTime =
endTime + (dirtyFailureDuration << n);
|
|
|
EVALUATION
Fixed similar to "Suggested Fix".
|
|
|
EVALUATION
Yes, this bug appears to exist as described, and when it occurs, the effect would indeed seem to be rather unpleasant. At a miniumum, something like the Suggested Fix seems reasonable.
Moreover, I would guess that this bug is most likely to be triggered when the dirty call failure does not involve any network operations, such as when a dirty call fails with a SecurityException because a custom client socket factory does not have permission to connect to the particular remote endpoint. But in such a case, retry seems pointless anyway-- perhaps the retry algorithm should only be attempted when the failure is a RemoteException, or some set of RemoteException subclasses (like ConnectException, ConnectIOException, MarshalException, UnmarshalException, etc.).
###@###.### 2005-2-24 00:11:20 GMT
|
|
|
SUGGESTED FIX
Below is the code for method sun.rmi.transport.DGCClient.makeDirtyCall
with some annotations (lines starting with "->") from licensee:
/**
* Make a DGC dirty call to this entry's endpoint, for the ObjIDs
* corresponding to the given set of refs and with the given
* sequence number.
*
* This method should NOT be called while synchronized on this entry.
*/
private void makeDirtyCall(Set refEntries, long sequenceNum) {
ObjID[] ids;
if (refEntries != null) {
ids = createObjIDArray(refEntries);
} else {
ids = emptyObjIDArray;
}
long startTime = System.currentTimeMillis();
try {
Lease lease =
dgc.dirty(ids, sequenceNum, new Lease(vmid, leaseValue));
long duration = lease.getValue();
long newRenewTime = computeRenewTime(startTime, duration);
long newExpirationTime = startTime + duration;
synchronized (this) {
dirtyFailures = 0;
setRenewTime(newRenewTime);
expirationTime = newExpirationTime;
}
} catch (Exception e) {
long endTime = System.currentTimeMillis();
synchronized (this) {
dirtyFailures++;
if (dirtyFailures == 1) {
/*
* If this was the first recent failed dirty call,
* reschedule another one immediately, in case there
* was just a transient network problem, and remember
* the start time and duration of this attempt for
* future calculations of the delays between retries.
*/
dirtyFailureStartTime = startTime;
dirtyFailureDuration = endTime - startTime;
setRenewTime(endTime);
} else {
/*
* For each successive failed dirty call, wait for a
* (binary) exponentially increasing delay before
* retrying, to avoid network congestion.
*/
int n = dirtyFailures - 2;
if (n == 0) {
/*
* Calculate the initial retry delay from the
* average time elapsed for each of the first
* two failed dirty calls.
*/
dirtyFailureDuration =
(dirtyFailureDuration +
(endTime - startTime)) >> 1;
-> /*
-> * I think something like the following should be
-> * added:
-> * if (dirtyFailureDuration < 1000)
-> * dirtyFailureDuration = 1000;
-> */
}
long newRenewTime =
endTime + (dirtyFailureDuration << n);
/*
* Continue if the last known held lease has not
* expired, or else at least a fixed number of times,
* or at least until we've tried for a fixed amount
* of time (the default lease value we request).
*/
if (newRenewTime < expirationTime ||
dirtyFailures < dirtyFailureRetries ||
newRenewTime < dirtyFailureStartTime + leaseValue)
{
setRenewTime(newRenewTime);
} else {
/*
* Give up: postpone lease renewals until next
* ref is registered for this endpoint.
*/
setRenewTime(Long.MAX_VALUE);
}
}
if (refEntries != null) {
/*
* Add all of these refs to the set of refs for this
* endpoint that may be invalid (this VM may not be in
* the server's referenced set), so that we will
* attempt to explicitly dirty them again in the
* future.
*/
invalidRefs.addAll(refEntries);
/*
* Record that a dirty call has failed for all of these
* refs, so that clean calls for them in the future
* will be strong.
*/
Iterator iter = refEntries.iterator();
while (iter.hasNext()) {
RefEntry refEntry = (RefEntry) iter.next();
refEntry.markDirtyFailed();
}
}
/*
* If the last known held lease will have expired before
* the next renewal, all refs might be invalid.
*/
if (renewTime >= expirationTime) {
invalidRefs.addAll(refTable.values());
}
}
}
}
###@###.### 2005-2-22 03:14:45 GMT
|
|
|
|