The root cause of this problem is in the DirectX runtime or the ATI driver.
I have written a native app that exhibits the same behavior.
The problem is that accessing d3d through a surface that was created prior
to an alt-tab event causes the system to crash in EndScene. There is no warning
or error during the SetRenderTarget, BeginScene, or DrawPrimitive calls,
so there is no way for us to detect a problem prior to crashing.
It is possible for us to detect an alt-tab event via the WM_ACTIVATEAPP
message. However, the multithreaded nature of our library means that one
thread (perhaps the main Java application thread) can be in the middle of
creating or using this obsolete surface when the alt-tab event occurs. Given
the constraints of multithreaded programming, there is no good place for us
to put in a fail-safe check to make sure that we never call EndScene
after the alt-tab event has occurred.
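To illustrate why a flag check before rendering is not sufficient, here is a
minimal sketch. It does not use the real DirectX or Win32 APIs: g_appActive,
onActivateApp, renderFrame, and the callback parameter are all hypothetical
names, and the callback merely stands in for whatever the Windows event thread
does while the rendering thread is between its check and EndScene.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical flag that the window procedure would flip on WM_ACTIVATEAPP.
std::atomic<bool> g_appActive{true};

// Hypothetical handler; the real one would run on the Windows event thread.
void onActivateApp(bool active) { g_appActive.store(active); }

enum class FrameResult { Rendered, Skipped, EndSceneOnLostSurface };

// The callback models other threads running between the check and EndScene.
FrameResult renderFrame(const std::function<void()>& betweenCheckAndEndScene) {
    assert(betweenCheckAndEndScene);  // caller must supply a callable

    // The attempted "fail-safe" check.
    if (!g_appActive.load()) return FrameResult::Skipped;

    // SetRenderTarget / BeginScene / DrawPrimitive would happen here,
    // and none of them would report an error.
    betweenCheckAndEndScene();  // an alt-tab can arrive in this window

    // EndScene would be called here; report whether the state went stale
    // after the check had already passed.
    return g_appActive.load() ? FrameResult::Rendered
                              : FrameResult::EndSceneOnLostSurface;
}
```

Calling renderFrame with a callback that invokes onActivateApp(false) passes
the check and still reaches the EndScene step with the app deactivated, which
is the window described above.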
I am refining the native app to chase down potential solutions and will attach
the app when I have something concise that demonstrates the problem (and
hopefully the fix).
I have attached a native app, d3dRadeonCrashTest2MT, which demonstrates the
problem and fix pretty effectively.
The problem boils down to the following:
If we create a surface after the primary has gone into a lost state, set
this offscreen surface to be the render target of the d3d device (which was
created off of the obsolete primary), and render to that d3d device, then
the system reboots or hangs.
The fix is as follows:
At offscreen surface creation time, check whether the primary is valid. If
it is not (IsLost() returns an error), then restore it. During that
restoration, also recreate the d3d device (if there is already a d3d device).
If that restoration is successful, recreate the offscreen surface.
If anything goes wrong during this process (for example, the primary cannot
simply be restored and must be recreated), fail the offscreen surface
creation and let rendering proceed unaccelerated until the normal surface
recreation process kicks in at some later time. An example of this
kind of failure is when a fullscreen app is minimized; we cannot simply
restore the primary because we do not have control of the screen.
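The steps of the fix can be sketched roughly as follows. This is not the code
from the attached app or from the jdk: Primary, D3DDevice, Offscreen, Context,
restorePrimaryAndDevice, and createOffscreen are hypothetical stand-ins so that
the sketch compiles without the DirectX headers. Only the IsLost()/Restore()
names mirror the real surface methods, and the "restorable" field is an
assumption used to model cases such as a minimized fullscreen app.

```cpp
#include <cassert>
#include <memory>

// Hypothetical result codes, loosely modeled on DD_OK / DDERR_* values.
enum class Result { Ok, SurfaceLost, WrongMode };

// Stand-in for the primary surface.
struct Primary {
    bool lost = false;
    bool restorable = true;  // false models e.g. a minimized fullscreen app
    Result IsLost() const { return lost ? Result::SurfaceLost : Result::Ok; }
    Result Restore() {
        if (!restorable) return Result::WrongMode;
        lost = false;
        return Result::Ok;
    }
};

struct D3DDevice { int generation; };   // stand-in for the d3d device
struct Offscreen { int width, height; };  // stand-in for the offscreen surface

struct Context {
    Primary primary;
    std::unique_ptr<D3DDevice> device;  // null if no device has been created
    int deviceGeneration = 0;
};

// Restore the primary and, if a device already exists, recreate it.
bool restorePrimaryAndDevice(Context& ctx) {
    if (ctx.primary.Restore() != Result::Ok) return false;
    if (ctx.device) {
        ctx.device = std::make_unique<D3DDevice>(D3DDevice{++ctx.deviceGeneration});
    }
    return true;
}

// Returns null on failure; the caller then renders unaccelerated.
std::unique_ptr<Offscreen> createOffscreen(Context& ctx, int w, int h) {
    auto surface = std::make_unique<Offscreen>(Offscreen{w, h});
    if (ctx.primary.IsLost() != Result::Ok) {
        if (!restorePrimaryAndDevice(ctx)) return nullptr;
        // Recreate the surface now that the primary and device are good.
        surface = std::make_unique<Offscreen>(Offscreen{w, h});
    }
    assert(ctx.primary.IsLost() == Result::Ok);
    return surface;
}
```

A null return here corresponds to failing the offscreen surface creation, so
nothing is ever rendered through a device built on a lost primary.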
An important part of this fix is to always recreate the d3d device (if one
exists already) whenever the primary is restored. This way, if the primary
is restored by any caller other than the offscreen surface creation function,
then both primary and d3d device will be in a good state whenever we go to
use them next.
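One way to keep that invariant is to give the primary a single restoration
path that every caller must go through. The sketch below assumes a
hypothetical Display class with stand-in types; the "epoch" counter is an
illustration device of ours for tracking which primary a d3d device was
created from, not something DirectX provides.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the d3d device; remembers which primary epoch it was built on.
struct Device { int primaryEpoch; };

// Hypothetical owner of both the primary state and the device.
class Display {
public:
    void createDevice() { device_ = std::make_unique<Device>(Device{epoch_}); }

    // Models the primary going lost, e.g. after an alt-tab.
    void markLost() { lost_ = true; }

    // The only restoration path: whoever calls it, the device is recreated
    // together with the primary (if a device exists already).
    void restorePrimary() {
        if (!lost_) return;
        lost_ = false;
        ++epoch_;
        if (device_) createDevice();
        assert(!device_ || device_->primaryEpoch == epoch_);
    }

    bool hasDevice() const { return device_ != nullptr; }

    // True only when a device exists and matches a valid primary.
    bool deviceIsCurrent() const {
        return device_ && !lost_ && device_->primaryEpoch == epoch_;
    }

private:
    bool lost_ = false;
    int epoch_ = 0;
    std::unique_ptr<Device> device_;
};
```

Because restorePrimary() is the only way to clear the lost state, no caller
can leave behind a device that was created off of an obsolete primary.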
The attached application should show the problem on many systems, including
some non-Radeon systems. The application also has the fix in it; if you
run the app with the flags:
then we automatically check primary->IsLost() after the offscreen surface
creation and restore the primary along with the d3d device before
Note that the application mimics the multithreading issues of the jdk by
having a separate thread creating and rendering into the offscreen surface.
Prior to writing the application to use multiple threads, I thought it
would be possible to trap the Windows event WM_ACTIVATE and perform any
necessary surface restorations at that time. But in a multithreaded app,
you cannot guarantee that another thread has not already created an invalid
offscreen surface by the time the Windows event thread has received the