The test renders an unaccelerated BufferedImage directly to the screen
in a tight loop, resizing the window from time to time.
After a while the test stops updating the screen,
shows garbage, and eventually crashes.
The problem is in the D3DScreenUpdateManager's handling of
on-screen surfaces. When a window is resized the current surface
gets replaced with a new one, which is initially in the "lost"
state. This is so that we don't waste resources on surfaces which
aren't rendered to.
When a surface is first rendered to it is "restored" -
the native surface gets created.
The problem is that this may happen from two threads
at the same time: the screen updater thread (which checks
the surfaces, flips the ones that were rendered to, and
restores the lost ones) and whatever thread does the
rendering - the main thread in this particular case.
Since the surface data's "lost" state and its restoration
aren't synchronized, the surface may get "restored"
from both threads at the same time.
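The race can be sketched in plain Java with a simplified model (all names here are hypothetical stand-ins, not the real D3DSurfaceData internals). A barrier forces both threads into the unsynchronized check-then-create window, so two native surfaces are created for one Java-level surface and the first one is leaked:

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of the bug: the "lost" flag is checked and the native
// surface recreated without any synchronization between the threads.
class LostSurface {
    static final AtomicInteger nativeSurfacesCreated = new AtomicInteger();
    volatile boolean lost = true;
    long nativeHandle;               // stands in for the real swap chain

    // The barrier models the race window: both threads have already seen
    // lost == true before either of them creates a native surface.
    void restoreIfLost(CyclicBarrier window) throws Exception {
        if (lost) {
            window.await();          // both threads are now past the check
            nativeHandle = nativeSurfacesCreated.incrementAndGet(); // "create"
            lost = false;            // the second write leaks the first handle
        }
    }
}

class RestoreRace {
    public static void main(String[] args) throws Exception {
        LostSurface dst = new LostSurface();
        CyclicBarrier window = new CyclicBarrier(2);
        Runnable restore = () -> {
            try { dst.restoreIfLost(window); }
            catch (Exception e) { throw new RuntimeException(e); }
        };
        Thread renderer = new Thread(restore); // the rendering (main) thread
        Thread updater  = new Thread(restore); // the screen updater thread
        renderer.start(); updater.start();
        renderer.join();  updater.join();
        // Two native surfaces now back one Java-level surface; whichever was
        // created first is leaked.
        System.out.println("native surfaces created: "
                           + LostSurface.nativeSurfacesCreated.get());
    }
}
```

The barrier makes the interleaving deterministic for demonstration purposes; in the real pipeline the window is simply open whenever a resize and a flip overlap.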
Here's an example of what happens:

  T1 (rendering thread)             ScreenUpdater thread
  ---------------------             --------------------
  g = getGraphics()                 run()
    // resize happens: the new dst surface is installed, in lost state
  D3DSD.initSurface()
    // creates new swap chain S1
                                    D3DSD.initSurface()
                                      // dst now has new swap chain S2
                                    dst.flip()
                                      // we flip S2, not S1!
Here's a stack trace illustrating this:
java.lang.Exception: Stack trace
setting RT: sun.java2d.d3d.D3DSurfaceData$D3DWindowSurfaceData@5e176f
java.lang.Exception: Stack trace
Now the BufferedContext thinks that it has
set the render target at the native level (since the
destination surface data object doesn't change), but
in reality the new native surface is never set as the render
target, so nothing gets rendered.
Also, we're leaking the first surface that was restored.
The native resource manager still tracks it, but it will only
be released when a device reset happens. So if we have tons of
resizes we quickly exhaust video memory, and then strange
things start to happen: createSwapChain returns some weird
errors (not the OUT_OF_VIDEO_MEMORY one would expect),
and we sometimes crash when transferring pixels to a locked
managed surface. This looks like a bug in the D3D runtime, since
the lock succeeds and we don't render outside of its memory
(I've verified this with memset(ptr, 0, h*lineStride)).
A better but riskier fix would be to make the surface data's
lost state thread-safe with locking. Since that is prone
to issues of its own, it appears easier to deal
with the consequences of it not being thread-safe.
First of all, this situation can only happen to
our own "on-screen" surfaces, belonging to D3DScreenUpdateManager.
This is because the D3DVolatileSurfaceManager doesn't actually
restore the current accelerated surface; it creates a new one
instead. Also, applications typically don't validate
volatile images and render to/from them on different threads.
Preventing the leaks that cause the crashes is simple: we just
need to release the current resource in the native surface before
allocating a new one.
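The idea can be sketched at the Java level (the actual change belongs in the native D3D code; all names below are hypothetical): before creating a replacement native surface, release the one a previous restore may already have created, so a double restore no longer leaks.

```java
// Sketch of the leak fix, under the assumption of a simple counter standing
// in for video-memory usage. Hypothetical names throughout.
class NativeSurfaceHolder {
    static int liveNativeSurfaces = 0;   // models native surfaces in use
    private static long nextHandle = 0;
    private long nativeHandle = 0;       // 0 means "no native surface yet"

    void initSurface() {
        if (nativeHandle != 0) {
            // A concurrent restore already created a native surface for this
            // Java-level surface; release it instead of leaking it.
            releaseNative(nativeHandle);
            nativeHandle = 0;
        }
        nativeHandle = createNative();
    }

    static long createNative()        { liveNativeSurfaces++; return ++nextHandle; }
    static void releaseNative(long h) { liveNativeSurfaces--; }

    public static void main(String[] args) {
        NativeSurfaceHolder dst = new NativeSurfaceHolder();
        dst.initSurface();   // first restore creates a native surface
        dst.initSurface();   // double restore: the old one is released first
        System.out.println("live native surfaces: " + liveNativeSurfaces);
    }
}
```

With the guard in place, a redundant restore costs an extra create/release cycle but never accumulates orphaned native surfaces.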
But we still have the problem of BufferedContext not resetting
the render target (the cause of the rendering artifacts).
The workaround for that is to invalidate the
BufferedContext after a successful restoration of the
D3DWindowSurfaceData. This makes sure that the
next rendering call sets the proper render target
on the d3d device.
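The caching trap and the workaround can be modeled in a few lines (hypothetical stand-ins for the BufferedContext internals, not the real API): the context tracks the last destination by object identity, so when the native surface under that same Java object is replaced, the cached value must be dropped or the native set-render-target call is skipped.

```java
// Simplified model of the render-target cache and the invalidation
// workaround. All names are hypothetical.
class CachingContext {
    private Object currentTarget;  // last destination set at the native level
    int nativeSetCalls = 0;        // counts simulated SetRenderTarget calls

    void validate(Object dst) {
        // Identity check only: if the same Java object comes back, the
        // native call is skipped even though its native surface may have
        // been replaced by a restore.
        if (dst != currentTarget) {
            nativeSetCalls++;
            currentTarget = dst;
        }
    }

    // The workaround: forget the cached target after a restore so the next
    // rendering call re-sets the render target on the device.
    void invalidateContext() { currentTarget = null; }

    public static void main(String[] args) {
        CachingContext ctx = new CachingContext();
        Object dst = new Object();   // same Java-level surface throughout
        ctx.validate(dst);           // render target set once
        // ... surface lost and restored: new native surface, same object ...
        ctx.validate(dst);           // skipped: nativeSetCalls is still 1
        ctx.invalidateContext();     // the workaround after restoration
        ctx.validate(dst);           // render target is set again
        System.out.println("native SetRenderTarget calls: " + ctx.nativeSetCalls);
    }
}
```

Without the invalidation the second validate() is a no-op and everything renders into a surface that is never presented, which matches the "nothing gets rendered" symptom above.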
With this fix the test runs with no problems.