Bug ID:
|
6273431
|
|
Votes
|
0
|
|
Synopsis
|
OGL: improve performance of parameter queuing
|
|
Category
|
java:classes_2d
|
|
Reported Against
|
|
|
Release Fixed
|
mustang(b43)
|
|
State
|
10-Fix Delivered, bug
|
|
Priority:
|
4-Low
|
|
Related Bugs
|
6277756
|
|
Submit Date
|
20-MAY-2005
|
|
Description
|
In our single-threaded OGL pipeline, we enqueue parameters for each rendering
operation onto a java.nio.ByteBuffer. For example, a fillRect() operation
currently looks like this:
    buf.putInt(FILL_RECT);
    buf.putInt(x).putInt(y).putInt(w).putInt(h);
We have a number of operations whose parameters consist entirely of integers
(no floats, longs, etc.). For these operations we can exploit the fact that
we always keep the current buffer position aligned on a 4-byte boundary, and
therefore use an IntBuffer view on the original ByteBuffer to enqueue those
int parameters. (In a microbenchmark, tested on both Solaris/SPARC and Linux,
I found that using IntBuffer.put() can be up to 3x faster than
ByteBuffer.putInt() when the buffer is in the native machine endianness.)
So for the above example, we would instead use:
    IntBuffer ibuf = buf.asIntBuffer();  // view starts at buf's current position
    ibuf.put(FILL_RECT);
    ibuf.put(x).put(y).put(w).put(h);
    buf.position(buf.position() + 20);   // 5 ints x 4 bytes; the view does not advance buf
xxxxx@xxxxx 2005-05-20 00:09:21 GMT
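The two enqueue styles described above can be compared side by side. The following is a minimal, self-contained sketch, not the pipeline's actual code: the class name, the helper methods, the buffer size, and the FILL_RECT value are all hypothetical, and the buffer is assumed to be a direct buffer in native byte order. It only checks that both styles write the same 20 bytes; it makes no claim about speed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

class IntViewQueueSketch {
    // Hypothetical opcode value, for illustration only.
    static final int FILL_RECT = 20;

    // Assumed setup: a small direct buffer in the machine's native byte order.
    static ByteBuffer newQueue() {
        return ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());
    }

    // Original style: every int goes through ByteBuffer.putInt().
    // Returns the number of bytes written.
    static int enqueueWithPutInt(ByteBuffer buf, int x, int y, int w, int h) {
        int start = buf.position();
        buf.putInt(FILL_RECT);
        buf.putInt(x).putInt(y).putInt(w).putInt(h);
        return buf.position() - start;
    }

    // Proposed style: write through an IntBuffer view, then advance the
    // underlying ByteBuffer by hand, since the view has its own position.
    static int enqueueWithIntView(ByteBuffer buf, int x, int y, int w, int h) {
        int start = buf.position();
        IntBuffer ibuf = buf.asIntBuffer(); // view begins at buf's current position
        ibuf.put(FILL_RECT);
        ibuf.put(x).put(y).put(w).put(h);
        buf.position(start + ibuf.position() * Integer.BYTES);
        return buf.position() - start;
    }

    public static void main(String[] args) {
        ByteBuffer a = newQueue();
        ByteBuffer b = newQueue();
        int na = enqueueWithPutInt(a, 1, 2, 30, 40);
        int nb = enqueueWithIntView(b, 1, 2, 30, 40);
        a.flip();
        b.flip();
        // equals() compares the remaining bytes of both buffers.
        System.out.println(na + " " + nb + " " + a.equals(b)); // prints "20 20 true"
    }
}
```

Computing the new position from the view's own position, rather than hard-coding 20, keeps the byte count in step with the number of ints actually written.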
|
|
Work Around
|
N/A
|
|
Evaluation
|
Using the approach described above, it appears that we can improve performance
of a number of common operations. For example, on Solaris/SPARC with XVR-1200:
Operation            Performance
---------            -----------
20x20 fillRect()     +45%
20x20 drawLine()     +52%
20x20 drawImage()    +20%
20x20 copyArea()     +10%
There are a few other operations that will also likely improve using these
techniques (e.g. setClip(), MaskFill, MaskBlit).
xxxxx@xxxxx 2005-05-20 00:07:37 GMT
Similar improvements are seen on Linux as well (JDS, Nvidia GF FX 5600,
7590 drivers, 2x 2.6GHz P4):
Operation            Performance
---------            -----------
20x20 fillRect()     + 3%
20x20 drawLine()     +37%
20x20 drawImage()    +17%
20x20 copyArea()     +26%
xxxxx@xxxxx 2005-05-20 05:45:28 GMT
While the proposed changes certainly improve performance, the approach is
a bit clunky ("round peg, square hole"). If we are getting to the point
where we are using tricks to get better performance out of NIO ByteBuffers,
why not just write a thin Unsafe wrapper that meets our needs? We are
already going out of our way to maintain 4-byte alignment, so only a few
more changes would be required to achieve 8-byte alignment when necessary
(i.e. when adding long and double parameters to the buffer). This approach
has a couple of added benefits:
  - the interface is mostly compatible with the NIO classes
  - performance gains for all existing code, without creating view buffers
    and such, as suggested earlier
  - no temporary object creation (before, we would create one or more
    view buffers for each drawGlyphList() call; while not too expensive,
    it would be nice to avoid this)
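The 8-byte alignment step mentioned above can be illustrated with a small sketch. This is not the wrapper that was written for this fix: the class and method names are hypothetical, and a direct ByteBuffer stands in for the Unsafe-backed memory so that the example stays runnable. It shows only the bookkeeping, namely padding the write position up to the next multiple of 8 before a long is stored. It also assumes that the consumer of the buffer skips the same padding when reading.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class AlignedQueueSketch {
    // Stand-in backing store; the report proposes Unsafe-based access instead.
    private final ByteBuffer buf;

    AlignedQueueSketch(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    // Ints need only 4-byte alignment, which callers are assumed to maintain.
    AlignedQueueSketch putInt(int v) {
        buf.putInt(v);
        return this;
    }

    // Longs are padded up to an 8-byte boundary before being written.
    AlignedQueueSketch putLong(long v) {
        alignTo8();
        buf.putLong(v);
        return this;
    }

    // Round the position up to the next multiple of 8 (no-op if already aligned).
    private void alignTo8() {
        buf.position((buf.position() + 7) & ~7);
    }

    int position() {
        return buf.position();
    }

    // Absolute read, used here only to check where a value landed.
    long getLongAt(int index) {
        return buf.getLong(index);
    }

    public static void main(String[] args) {
        AlignedQueueSketch q = new AlignedQueueSketch(64);
        q.putInt(1).putLong(2L);  // int at 0..3, padding 4..7, long at 8..15
        System.out.println(q.position());   // prints 16
        System.out.println(q.getLongAt(8)); // prints 2
    }
}
```

Returning `this` from each put method mirrors the chaining style of the NIO buffers, which is one way the "mostly compatible" interface could look.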
Here are some updated performance numbers with these changes in place
(on the Solaris/SPARC configuration listed above):
Operation              Performance
---------              -----------
1x1 fillRect()         +64%
20x20 fillRect()       +52%
1x1 drawLine()         +57%
20x20 drawLine()       +64%
100x100 drawLine()     +39%
1x1 drawImage()        +34%
20x20 drawImage()      +32%
100x100 drawImage()    + 6%
1x1 copyArea()         +11%
20x20 copyArea()       +11%
4 ch drawString()      +20%
32 ch drawString()     +17%
And on my Windows XP machine (2x 2.6GHz P4, GF FX 5600):
Operation              Performance
---------              -----------
1x1 fillRect()         +41%
20x20 fillRect()       +33%
1x1 drawLine()         +49%
20x20 drawLine()       +47%
100x100 drawLine()     +46%
1x1 drawImage()        +68%
20x20 drawImage()      +69%
100x100 drawImage()    + 3%
20x20 copyArea()         0% (known driver slowness)
4 ch drawString()      +38%
32 ch drawString()     + 6%
xxxxx@xxxxx 2005-05-25 23:44:30 GMT
|
|
Comments
|
PLEASE NOTE: JDK6 was formerly known as Project Mustang