I have tried a couple of other approaches.
1. Instead of loading the image into the texture and then
drawing that texture to the destination, I tried to use
IDirect3DDevice9::UpdateSurface. This is a method for
uploading pixels into an unlockable surface created in the DEFAULT
pool (vram). Unfortunately this approach didn't yield any
benefits and is in fact slower (at least on my PCIX board).
2. When uploading images larger than 256x256, we
tile the image by uploading it piece by piece into the texture
and rendering each piece. The texture is a DYNAMIC texture (and thus resides
in vram), which can be locked with the DISCARD flag (allowing the hw
to avoid stalling every time the texture is locked).
But we were only locking the texture with this flag if
we were filling the whole texture. If the image is, say, 300x300,
the 3 pieces which didn't fit into the 256x256 temp. texture
were uploaded w/o the DISCARD flag. I've fixed that,
but again, there was not much benefit.
Approach 2) is probably worth integrating anyway, and probably
worth applying to the MaskBlit image upload code as well.
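The tiling in approach 2 amounts to walking the source image in 256x256
blocks, with smaller tiles at the right and bottom edges; per the text,
those partial edge tiles are exactly the ones that previously missed the
DISCARD lock. A minimal sketch (names are hypothetical, not the actual
pipeline code):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical tile descriptor; the real pipeline tracks this
// implicitly while looping over the source image.
struct Tile { int x, y, w, h; };

// Split a w x h image into tiles no larger than the blit texture
// (256x256 here). Interior tiles fill the texture completely; edge
// tiles are smaller, so they only partially fill it.
std::vector<Tile> tileImage(int w, int h, int tileSize = 256) {
    std::vector<Tile> tiles;
    for (int y = 0; y < h; y += tileSize) {
        for (int x = 0; x < w; x += tileSize) {
            tiles.push_back({ x, y,
                              std::min(tileSize, w - x),
                              std::min(tileSize, h - y) });
        }
    }
    return tiles;
}
```

For a 300x300 image this yields four tiles: one full 256x256 tile and
three partial ones (44x256, 256x44, 44x44), matching the "3 pieces which
didn't fit" above.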
Here are the results of an investigation (the same attached
PerfTest was used); the results are in fps.
The code path to get pixels from a BI to the back-buffer
or screen is as follows:
1. the pixels are copied to a texture
2. texture is drawn to the back-buffer
This is because creating lockable render targets (like a back-buffer)
is highly inadvisable, since locking them stalls the gpu.
If the source image is too large to fit in a texture
it is tiled (steps 1-2 repeated for each tile).
Step 1 consists of this call:
D3DBL_CopyImageToIntXrgbSurface - takes care of copying pixels to the blit texture
It locks the destination surface and calls an optimized software
loop which copies the pixels from the src image to the texture,
doing the format conversion on the fly. If no conversion is needed
it is just a memcpy (there is a specialized method for this case).
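The no-conversion path of step 1 can be sketched as follows (a
simplified stand-in for the real loop; the name and signature are
hypothetical). The key detail is that the locked texture reports its
own pitch (bytes per row), which generally differs from the source
stride, so the copy is one memcpy per scanline:

```cpp
#include <cstdint>
#include <cstring>

// Copy w x h XRGB pixels from a source image into a locked surface.
// srcStride is in pixels, dstPitch is in bytes (as D3D reports it);
// because the two row sizes differ, each scanline is its own memcpy.
void copyRowsToLockedSurface(const uint32_t *src, int srcStride,
                             uint8_t *dstBits, int dstPitch,
                             int w, int h) {
    for (int y = 0; y < h; ++y) {
        std::memcpy(dstBits + (size_t)y * dstPitch,
                    src + (size_t)y * srcStride,
                    (size_t)w * sizeof(uint32_t));
    }
}
```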
Given this information, here are the results of the investigation:
(the numbers are frames per second, for 300x400 image on nvidia fx 7800)
6u4 ddraw: 1060
6u4 noddraw: 1823
6u10 nod3d: 2262 (100%)
6u10 d3d default: 881 (38%)
1. 6u10 D3DBL_CopyImageToIntXrgbSurface no-oped: 1968 (87%)
2. 6u10 AnyIntIsomorphicCopy no-oped: 1300 (57%)
3. 6u10 AnyIntIsomorphicCopy no-oped + DYNAMIC disabled: 1050 (46%)
1. means that the whole copying to the texture is no-oped; we just
draw the texture to the destination. This gives us an
approximation of what we could get if copying to the texture were free.
2. the blit loop which copies the pixels to the texture is no-oped, so
we just lock and unlock the surface. This gives us an approximation
of how much time we spend actually copying the pixels.
3. same as 2., but with the use of DYNAMIC textures disabled. We already
use DYNAMIC textures for this purpose to improve performance, so this is
just to illustrate how much we gain by using DYNAMIC textures.
By default, the d3d case is 62% slower than the no-d3d case in this
benchmark. Note that in most cases people will be comparing with the
old pipeline, which had ddraw enabled, so the performance drop for them
will only be around 10%. And as the dimensions of the image increase
(depending on the configuration), the performance difference decreases.
We spend most of the time getting the pixels
into the texture. Just copying the pixels alone takes around 30%
of the time (and that is w/o conversion). We could try to improve there,
though it's not clear how, short of using sse/mmx instructions, which is
out of scope for this bug.
I did some experiments with an ideal case (when the scan stride of the
source and destination are the same - like if we're copying a 256x256
image, which happens to be the same size as our blit texture).
In this case we could just use a single lock, memcpy(), unlock to copy
the pixels to the texture (instead of a memcpy per scan line). The overall
improvement was around 8%. But this case is relatively rare. Creating a blit texture
of the size of the source image seems prone to thrashing, and caused
a significant slowdown in some cases (like bouncing between
two images of different sizes).
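The ideal-case experiment above boils down to collapsing the per-scanline
loop into a single memcpy when the source stride, the destination pitch,
and the copied row width all coincide. A hypothetical sketch of that fast
path (names and signature are mine, not the pipeline's):

```cpp
#include <cstdint>
#include <cstring>

// The ideal case: if the source stride (in bytes) equals the locked
// texture's pitch and we are copying full rows, the image is one
// contiguous block and a single memcpy suffices.
bool canCopyInOneShot(int srcStrideBytes, int dstPitch, int rowBytes) {
    return srcStrideBytes == dstPitch && rowBytes == dstPitch;
}

void copyImage(const uint8_t *src, int srcStrideBytes,
               uint8_t *dst, int dstPitch,
               int rowBytes, int h) {
    if (canCopyInOneShot(srcStrideBytes, dstPitch, rowBytes)) {
        // single contiguous copy (one lock/memcpy/unlock overall)
        std::memcpy(dst, src, (size_t)dstPitch * h);
    } else {
        // general path: one memcpy per scanline
        for (int y = 0; y < h; ++y)
            std::memcpy(dst + (size_t)y * dstPitch,
                        src + (size_t)y * srcStrideBytes,
                        (size_t)rowBytes);
    }
}
```

Both branches produce the same pixels; the one-shot branch just avoids
the per-row loop overhead, which is where the ~8% came from.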
So no clear solution so far.