EVALUATION
This fix was integrated into Mustang b77, but I forgot to update this bug
report with the exact system properties that enable the new codepath:
-Dsun.java2d.opengl=true -Dsun.java2d.opengl.lcdshader=true
As mentioned above, there were two outstanding driver bugs when this fix
was integrated: artifacts on ATI boards, and poor performance on Nvidia boards.
Since then, both bugs have been fixed by their respective driver teams and
we are currently testing those fixes. The ATI fix should be available in their
8.26 release (for Linux; I'm not sure how that correlates to their Windows
Catalyst driver version) in a couple of months, and the Nvidia fix should be
available in their 85.xx series in a similar timeframe. Once both driver fixes
are publicly available and verified, we should be able to enable the lcdshader
flag by default (again, only if the OGL pipeline has been enabled).
EVALUATION
Just to clarify my last point, if you enable the OGL pipeline, the LCD
acceleration codepath will be disabled by default. You will have to enable
both system properties to enable LCD acceleration for the OGL pipeline.
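The two-flag rule can be written down as a small check. This is only a sketch: the class and method names are hypothetical, and it mirrors the rule as stated here rather than the way the JDK parses these properties internally. The property names themselves are the ones given in this report.

```java
import java.util.Properties;

// Hypothetical helper, not part of the JDK: it only restates the rule that
// LCD acceleration needs BOTH flags, e.g.
//   java -Dsun.java2d.opengl=true -Dsun.java2d.opengl.lcdshader=true MyApp
final class LcdShaderFlagsSketch {
    static final String OGL_PROP = "sun.java2d.opengl";
    static final String LCD_SHADER_PROP = "sun.java2d.opengl.lcdshader";

    /** Returns true only when both properties are set to "true" (any case). */
    static boolean lcdAccelerationRequested(Properties props) {
        // Boolean.parseBoolean(null) is false, so an unset flag counts as off.
        return Boolean.parseBoolean(props.getProperty(OGL_PROP))
            && Boolean.parseBoolean(props.getProperty(LCD_SHADER_PROP));
    }

    public static void main(String[] args) {
        System.out.println("LCD acceleration requested: "
            + lcdAccelerationRequested(System.getProperties()));
    }
}
```

Setting only -Dsun.java2d.opengl=true leaves the check false, which matches the disabled-by-default behavior described above.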
EVALUATION
I've attached a simple performance test (LCDTextTest.java) that I've used to
measure LCD text performance. I ran this test on a number of machines with
shader-level hardware, and the results are all over the map. I've included
these performance numbers below. In these results, the numbers indicate the
number of milliseconds taken to render a particular test string (about 35
characters long, in a 12-point font) 10000 times, so lower is better.
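For readers without the attachment, the shape of such a measurement can be sketched as follows. This is not the attached LCDTextTest.java: the class name, the placeholder string, and the choice to render into a BufferedImage are all assumptions made so the sketch runs anywhere. Rendering into a BufferedImage goes through software loops, so it does not reproduce any of the screen or backbuffer numbers reported here.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical timing sketch; NOT the LCDTextTest.java attached to this report.
final class TextTimingSketch {
    /** Runs body the given number of times; returns elapsed wall-clock ms. */
    static long timeMillis(int iterations, Runnable body) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Offscreen destination (assumption): avoids needing a display.
        BufferedImage img = new BufferedImage(400, 40, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, img.getWidth(), img.getHeight());
        g.setColor(Color.BLACK); // opaque foreground color
        g.setFont(new Font(Font.DIALOG, Font.PLAIN, 12));

        // Placeholder text; the actual test string is not given in this report.
        String text = "The quick brown fox jumps over the dog";
        String[] names = {"mono", "gray", "lcd"};
        Object[] hints = {
            RenderingHints.VALUE_TEXT_ANTIALIAS_OFF,
            RenderingHints.VALUE_TEXT_ANTIALIAS_ON,
            RenderingHints.VALUE_TEXT_ANTIALIAS_LCD_HRGB
        };
        for (int i = 0; i < hints.length; i++) {
            g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING, hints[i]);
            long ms = timeMillis(10000, () -> g.drawString(text, 5, 25));
            System.out.println(names[i] + ": " + ms + " ms");
        }
        g.dispose();
    }
}
```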
def          - "default pipeline" (GDI/DDraw on Windows, X11 on Linux/Solaris)
ogl          - "OGL pipeline" (-Dsun.java2d.opengl=True)
mono         - black/white text (TEXT_ANTIALIAS_OFF)
gray         - grayscale AA text (TEXT_ANTIALIAS_ON)
lcd          - LCD optimized text (TEXT_ANTIALIAS_LCD_HRGB)
awt          - render directly to the screen
swing        - render to the Swing backbuffer (pbuffer in the case of OGL)
(swing-fbo)  - render to the Swing backbuffer, which is an FBO
               (-Dsun.java2d.opengl.fbobject=true); only applicable when
               OGL is enabled
NV GF FX 5600 (AGP), JDS Linux, 2x 2.8GHz P4
             awt   swing   (swing-fbo)
def-mono     225     319
def-gray    1697     218
def-lcd     2219     480
ogl-mono     116     109           109
ogl-gray     116     109           109
ogl-lcd      718     721          3669
NV GF FX 5600 (AGP), Windows XP, 2x 2.8GHz P4
             awt   swing   (swing-fbo)
def-mono    1469     672
def-gray    1750     250
def-lcd     2266     328
ogl-mono     234     234           235
ogl-gray     235     234           234
ogl-lcd      969    7359          6907
NV GF 6800 (PCI-E), Windows XP, 1x 2.2GHz Opteron 148
             awt   swing   (swing-fbo)
def-mono     375     297
def-gray     531     187
def-lcd      750     375
ogl-mono      78      78            62
ogl-gray      78      78            63
ogl-lcd      406     375           344
NV Quadro FX 1100 (AGP), Solaris 10, 2x 2.0GHz Opteron 246
             awt   swing   (swing-fbo)
def-mono     160     215
def-gray    2345     247
def-lcd     2803     515
ogl-mono     117     100           100
ogl-gray     115     101           100
ogl-lcd      519     483          6475
ATI Radeon 9600 (AGP), JDS Linux, 1x 3.2GHz P4
             awt   swing   (swing-fbo)
def-mono     356     245
def-gray   11304     198
def-lcd    13206     422
ogl-mono     118     119           116
ogl-gray     118     118           116
ogl-lcd      606     604           606
ATI Radeon 9500 Pro (AGP), Windows XP, 2x 2.8GHz P4
             awt   swing   (swing-fbo)
def-mono    2188     516
def-gray    2656     250
def-lcd     3516     328
ogl-mono     391     390           390
ogl-gray     391     390           390
ogl-lcd     2125    2125          2125
ATI Radeon x300 SE (PCI-E), Windows XP, 1x 3.4GHz P4
             awt   swing   (swing-fbo)
def-mono     532     297
def-gray     610     156
def-lcd      766     250
ogl-mono      94      94            93
ogl-gray      94      94            94
ogl-lcd     1032     954           953
Points of interest:
- Performance improves significantly on newer boards (those with better
shader support).
- Performance is terrible on Nvidia when rendering to an FBO (on all
platforms) or to a pbuffer/render-to-texture surface (on Windows only);
this is a performance issue with glCopyTexSubImage2D(), which was
filed with Nvidia this morning (#216273). It may be related to some other
glCopyPixels() issues that are discussed in JDK bugid 6298234.
- It is interesting that Nvidia GF 6800 (PCI-Express) does not exhibit
any of those performance issues, presumably because the readback bus
speeds are much faster with PCI-E than AGP, so driver slow paths that
go to sysmem are likely to be less noticeable (I've asked Nvidia about
this).
- ATI does not have those same performance issues with FBO, although their
glCopyTexSubImage2D() performance is generally worse than that of Nvidia
on comparable hardware, so I will follow up with ATI about that.
- ATI has a bug in glCopyTexSubImage2D() that causes LCD text to look
garbled (when rendered to a pbuffer or FBO destination only); that bug
will be filed with ATI shortly.
Out of all of these configurations, there are only two where LCD text rendered
to the Swing backbuffer is faster with the OGL pipeline than with the default
pipeline: Nvidia Quadro FX 1100 on Solaris 10, and Nvidia GeForce 6800 on WinXP.
Due to this fact and the other issues listed above, I suggest we check this code
in now, but disabled by default (it can be enabled by a system property, TBD).
This will give us some time to shake out the driver issues, and then once those
are resolved we can turn this on by default at a later date.
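The count of two can be checked against the tables. The sketch below takes the def-lcd "swing" number as the baseline and compares it with the better of the two ogl-lcd backbuffer numbers ("swing" and "swing-fbo"). That pairing of columns is an assumption about which comparison is meant, and the class name is hypothetical; the millisecond values are copied from the results above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical check of the results tables; column pairing is an assumption.
final class LcdSwingComparisonSketch {
    static final String[] CONFIGS = {
        "NV GF FX 5600, JDS Linux",
        "NV GF FX 5600, Windows XP",
        "NV GF 6800, Windows XP",
        "NV Quadro FX 1100, Solaris 10",
        "ATI Radeon 9600, JDS Linux",
        "ATI Radeon 9500 Pro, Windows XP",
        "ATI Radeon x300 SE, Windows XP"
    };

    // Per configuration: {def-lcd swing, ogl-lcd swing, ogl-lcd swing-fbo} in ms.
    static final int[][] LCD_MS = {
        {480,  721, 3669},
        {328, 7359, 6907},
        {375,  375,  344},
        {515,  483, 6475},
        {422,  604,  606},
        {328, 2125, 2125},
        {250,  954,  953}
    };

    /** Configurations where the best OGL backbuffer time beats the default. */
    static List<String> configsWhereOglIsFaster() {
        List<String> faster = new ArrayList<>();
        for (int i = 0; i < CONFIGS.length; i++) {
            int def = LCD_MS[i][0];
            int bestOgl = Math.min(LCD_MS[i][1], LCD_MS[i][2]);
            if (bestOgl < def) {
                faster.add(CONFIGS[i]);
            }
        }
        return faster;
    }

    public static void main(String[] args) {
        for (String config : configsWhereOglIsFaster()) {
            System.out.println(config);
        }
    }
}
```

Under that reading the GeForce 6800 qualifies only through its FBO number (344 vs 375); its pbuffer number ties the default pipeline at 375.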
EVALUATION
The fix for this has been in progress for at least six months because there
have been a number of technical hurdles along the way (lots of investigation
of fragment shader performance, finding sneaky ways to avoid expensive pow()
calls in the shader, etc). At this point, I have a fairly complete implementation
that uses a GLSL fragment shader, and depending on which video card is used,
it can be about as fast as, or up to 30% faster than, the equivalent operation
using software loops. In fact, shader performance has been improving so much
in recent graphics hardware that I would expect this gap to grow larger and
larger over time.
Due to time constraints, I've only made this work for hardware that supports
the GL_ARB_fragment_shader extension, i.e. there is no software fallback (it
would be really slow anyway, as it would require readbacks from the framebuffer
into system memory), so for older boards we will continue to just use grayscale
AA as we've done up until this point in Mustang. Also, this shader-based
implementation only works when the foreground color is opaque, but this should
be sufficient for 99% of all cases.
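The two restrictions above amount to a small decision rule, sketched below. The class, enum, and method names are hypothetical and this is not the pipeline's actual code. The extension name and the grayscale fallback for older boards come from this report; what happens for a non-opaque foreground color is not stated here, so the sketch labels that case UNSPECIFIED rather than guessing.

```java
import java.util.Arrays;

// Hypothetical decision sketch; names are illustrative, not JDK internals.
final class LcdTextPathSketch {
    enum TextPath { LCD_SHADER, GRAYSCALE_AA, UNSPECIFIED }

    /** Whole-token match against a space-separated GL extension string. */
    static boolean hasExtension(String glExtensions, String name) {
        return Arrays.asList(glExtensions.trim().split("\\s+")).contains(name);
    }

    static TextPath choose(String glExtensions, boolean opaqueForeground) {
        if (!hasExtension(glExtensions, "GL_ARB_fragment_shader")) {
            // Older boards: keep using grayscale AA, as stated in the report.
            return TextPath.GRAYSCALE_AA;
        }
        if (!opaqueForeground) {
            // The report only says the shader path does not handle this case.
            return TextPath.UNSPECIFIED;
        }
        return TextPath.LCD_SHADER;
    }

    public static void main(String[] args) {
        System.out.println(choose("GL_ARB_multitexture GL_ARB_fragment_shader", true));
        System.out.println(choose("GL_ARB_multitexture", true));
    }
}
```

Matching whole tokens rather than using a substring search avoids a false positive on a longer extension name that merely starts with the same text.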