EVALUATION
The texture filtering mode (e.g. GL_NEAREST) should be set once when the
texture is created (this state is per-texture object). We should update
this state only when we need to use a different filtering mode for that
texture. It appears that this change improves performance of small (20x20)
texture copies by about 3% on most hardware.
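The per-texture filter caching described above could look roughly like the
following. This is only a sketch, not the actual pipeline code: TextureInfo,
texture_init(), texture_use_filter() and set_driver_filter() are hypothetical
names, and set_driver_filter() is a counting placeholder for the pair of
glTexParameteri() calls that would set GL_TEXTURE_MAG_FILTER and
GL_TEXTURE_MIN_FILTER in a real build.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values; a real build would use GL_NEAREST/GL_LINEAR from <GL/gl.h>. */
enum { FILTER_NEAREST = 0x2600, FILTER_LINEAR = 0x2601 };

/* Hypothetical per-texture record; field names are illustrative only. */
typedef struct {
    unsigned int id;   /* texture object name */
    int filter;        /* filter mode last sent to the driver */
} TextureInfo;

/* Counts how often the (simulated) driver call is made. */
static int filter_calls = 0;

/* Placeholder for the glTexParameteri() calls that set the filter mode. */
static void set_driver_filter(unsigned int id, int filter)
{
    (void)id;
    (void)filter;
    filter_calls++;
}

/* Set the filter once, at texture creation time. */
static void texture_init(TextureInfo *tex, unsigned int id, int filter)
{
    assert(tex != NULL);
    tex->id = id;
    tex->filter = filter;
    set_driver_filter(id, filter);
}

/* Touch the driver only when a different filter mode is requested. */
static void texture_use_filter(TextureInfo *tex, int filter)
{
    assert(tex != NULL);
    if (tex->filter != filter) {
        set_driver_filter(tex->id, filter);
        tex->filter = filter;
    }
}
```

With this shape, repeated copies that use the same filter mode cost one
integer comparison instead of a driver call.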
The texture function (e.g. GL_MODULATE) should be set once the first time
a texture is rendered (this state is per-context). We should update this
state only when we need to use a different texture function (e.g. GL_REPLACE,
in the case of OGLMaskBlit). Surprisingly, this single change improves
performance of small (20x20) texture copies by as much as 35%, depending
on the hardware. It also has a modest benefit for other texture-based
operations, like text rendering (drawString() with 8 characters improves
by 10% with this change).
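The same idea applied to the per-context texture function, again as a sketch
under assumptions: ContextInfo, context_use_texfunc() and set_driver_texfunc()
are hypothetical names, and set_driver_texfunc() is a counting placeholder for
glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, mode).

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values; a real build would use GL_MODULATE/GL_REPLACE. */
enum { TEXFUNC_NONE = 0, TEXFUNC_MODULATE = 0x2100, TEXFUNC_REPLACE = 0x1E01 };

/* Hypothetical per-context record; only the cached texture function is shown. */
typedef struct {
    int texFunc;   /* TEXFUNC_NONE until the first texture is rendered */
} ContextInfo;

/* Counts how often the (simulated) driver call is made. */
static int texenv_calls = 0;

/* Placeholder for glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, mode). */
static void set_driver_texfunc(int mode)
{
    (void)mode;
    texenv_calls++;
}

/* Update the texture function only when it differs from the cached one. */
static void context_use_texfunc(ContextInfo *ctx, int mode)
{
    assert(ctx != NULL);
    if (ctx->texFunc != mode) {
        set_driver_texfunc(mode);
        ctx->texFunc = mode;
    }
}
```

A run of ordinary texture copies would then issue a single call for
GL_MODULATE, and a mask blit in the middle of that run would cost two more
(one to switch to GL_REPLACE, one to switch back on the next copy).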
These changes have little impact on larger image copies since the calls
involved are dwarfed by the texture mapping operation itself. For smaller
operations (20x20 drawImage()), the overall improvement from the above
changes varies by hardware:
  Sol9   900MHz USIII   XVR-1200         +11%
  WinXP  2x2.6GHz P4    NV GF FX 5600    +37%
                        NV GF2 MX400     +26%
                        ATI R9500 Pro    +42%
  JDS    2x2.6GHz P4    NV GF FX 5600    +12%
This change yields a modest improvement in SwingMark scores (e.g. on the last
configuration, the score improves from 8360 to 8130).
Note that there are other potential optimizations in this area. For example,
we could avoid calling glBindTexture() when the texture is the same as last
time. This change is fairly complex because it requires invalidating the
"lastTexture" field on all contexts when a texture is deleted (texture object
IDs are frequently reused by drivers). Also, there doesn't appear to be a
big win from this change (maybe 1-2% improvement at best), so it wasn't
worth exploring further.
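To illustrate why the glBindTexture() caching is more involved, here is a
rough sketch of the bookkeeping it would need. Everything in it is assumed for
illustration: the fixed-size contexts array, BindCache, bind_texture(),
delete_texture() and driver_bind() are hypothetical, and driver_bind() is a
counting placeholder for glBindTexture(GL_TEXTURE_2D, id).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CONTEXTS 4

/* Hypothetical per-context cache; 0 means "no texture remembered". */
typedef struct {
    unsigned int lastTexture;
} BindCache;

static BindCache contexts[MAX_CONTEXTS];
static int bind_calls = 0;

/* Placeholder for glBindTexture(GL_TEXTURE_2D, id). */
static void driver_bind(unsigned int id)
{
    (void)id;
    bind_calls++;
}

/* Skip the bind when this context already has the texture bound. */
static void bind_texture(BindCache *ctx, unsigned int id)
{
    assert(ctx != NULL);
    if (ctx->lastTexture != id) {
        driver_bind(id);
        ctx->lastTexture = id;
    }
}

/*
 * Deleting a texture must clear the cached ID in every context, because
 * the driver may hand the same ID out again for a new texture.
 */
static void delete_texture(unsigned int id)
{
    size_t i;
    /* The real glDeleteTextures() call would go here. */
    for (i = 0; i < MAX_CONTEXTS; i++) {
        if (contexts[i].lastTexture == id) {
            contexts[i].lastTexture = 0;
        }
    }
}
```

Without the loop in delete_texture(), a new texture that reuses the deleted
ID would be treated as already bound and the bind would be skipped
incorrectly.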
Another optimization would be to avoid calling glEnable(GL_TEXTURE_2D)
and glDisable(GL_TEXTURE_2D) around every texturing operation. This change
is also complex because it requires tracking each operation and determining
when it is safe to leave texturing enabled. I think I experimented with this
a while back, and it might buy us some more gains, but that will require
more investigation (outside the scope of this fix).
###@###.### 2005-04-14 20:13:42 GMT
I found another simple optimization. In OGLBlitSurfaceToSurface(), we
always call glPixelZoom() before and after the glCopyPixels() call,
even if the scale factors are both 1.0f (the default value). Avoiding
these calls results in the following gains (on JDS, 2x2.6GHz P4,
Nvidia GF FX 5600):
20x20 drawImage() from VI to screen: + 2%
20x20 drawImage() from VI to VI: +20%
20x20 copyArea() (onscreen): +22%
20x20 copyArea() (VI): +22%
The first case doesn't show much improvement because we flush the RQ after
every pbuffer->screen copy, which requires a thread switch per copy. That
immediate flush is necessary to keep Swing responsive, but it keeps the
benchmarks from showing their full potential. For example, if we removed
the rq.flushNow() call after every pbuffer->screen copy, SwingMark would
improve by approximately 10%. (This discussion is getting outside the scope
of this bug report, so I'll leave that investigation for another day.)
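For reference, the glPixelZoom() change amounts to something like the sketch
below. It is not the actual OGLBlitSurfaceToSurface() code: the function name
blit_surface_to_surface(), its parameters, and the two counting placeholders
(set_driver_zoom() for glPixelZoom(), driver_copy_pixels() for glCopyPixels())
are assumptions made for illustration.

```c
#include <assert.h>

static int zoom_calls = 0;
static int copy_calls = 0;

/* Placeholder for glPixelZoom(sx, sy). */
static void set_driver_zoom(float sx, float sy)
{
    (void)sx;
    (void)sy;
    zoom_calls++;
}

/* Placeholder for glCopyPixels(x, y, w, h, GL_COLOR). */
static void driver_copy_pixels(int x, int y, int w, int h)
{
    (void)x;
    (void)y;
    (void)w;
    (void)h;
    copy_calls++;
}

/* Only change the pixel zoom (and restore it) when the copy is scaled. */
static void blit_surface_to_surface(int sx, int sy, int w, int h,
                                    float scalex, float scaley)
{
    int scaled = (scalex != 1.0f || scaley != 1.0f);

    assert(w >= 0 && h >= 0);
    if (scaled) {
        set_driver_zoom(scalex, scaley);
    }
    driver_copy_pixels(sx, sy, w, h);
    if (scaled) {
        set_driver_zoom(1.0f, 1.0f);   /* restore the default */
    }
}
```

An unscaled 20x20 copy then makes one driver call instead of three, which
would be consistent with the gains seen for the VI and copyArea() cases.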
###@###.### 2005-04-14 22:55:47 GMT