EVALUATION
Using the approach described above, it appears that we can improve the
performance of a number of common operations. For example, on Solaris/SPARC
with XVR-1200:
  Operation            Performance
  ---------            -----------
  20x20 fillRect()     +45%
  20x20 drawLine()     +52%
  20x20 drawImage()    +20%
  20x20 copyArea()     +10%
A few other operations will likely also improve with these techniques
(e.g. setClip(), MaskFill, MaskBlit).
###@###.### 2005-05-20 00:07:37 GMT
Similar improvements are seen on Linux as well (JDS, Nvidia GF FX 5600,
7590 drivers, 2x 2.6GHz P4):
  Operation            Performance
  ---------            -----------
  20x20 fillRect()     + 3%
  20x20 drawLine()     +37%
  20x20 drawImage()    +17%
  20x20 copyArea()     +26%
###@###.### 2005-05-20 05:45:28 GMT
While the proposed changes certainly improve performance, the approach is
a bit clunky ("round peg, square hole"). If we are getting to the point
where we are using tricks to get better performance out of NIO ByteBuffers,
why not just write a thin Unsafe wrapper that meets our needs? We are
already going out of our way to maintain 4-byte alignment, so only a few
more changes would be required to achieve 8-byte alignment when necessary
(i.e. when adding long and double parameters to the buffer). This approach
has a couple of added benefits:
  - interface is mostly compatible with NIO classes
  - performance gains for all existing code without creating view buffers
    and such, as suggested earlier
  - no temporary object creation (before, we would create one or more
    view buffers for each drawGlyphList() call; while not too expensive,
    it would be nice to avoid this)
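
To make the idea concrete, here is a rough sketch of what such a thin
wrapper might look like. It is not the actual implementation: the class
name RenderBufferSketch and all of its methods are hypothetical, Unsafe is
obtained reflectively only so the sketch runs outside the JDK, and it
assumes that allocateMemory() returns a base address that is at least
8-byte aligned. The relevant part is putLong(), which pads the write
position up to the next 8-byte boundary, while putInt() relies on the
4-byte alignment the callers already maintain.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

/** Hypothetical thin wrapper over native memory (illustrative names only). */
final class RenderBufferSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long base;     // start address of the native block
    private final int capacity;  // size of the block in bytes
    private long cur;            // address of the next write

    RenderBufferSketch(int capacity) {
        this.capacity = capacity;
        // Assumption: the returned address is at least 8-byte aligned.
        this.base = UNSAFE.allocateMemory(capacity);
        this.cur = base;
    }

    // Outside the JDK, Unsafe is only reachable through its private field.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Unsafe not available", e);
        }
    }

    // Minimal bounds check; raw Unsafe writes are not checked at all.
    private void ensure(int bytes) {
        if (cur - base + bytes > capacity) {
            throw new IllegalStateException("buffer full");
        }
    }

    /** Writes a 4-byte value; callers keep the position 4-byte aligned. */
    RenderBufferSketch putInt(int v) {
        ensure(4);
        UNSAFE.putInt(cur, v);
        cur += 4;
        return this;
    }

    /** Pads to the next 8-byte boundary, then writes an 8-byte value. */
    RenderBufferSketch putLong(long v) {
        long offset = cur - base;
        long aligned = (offset + 7) & ~7L;   // round up to a multiple of 8
        ensure((int) (aligned - offset) + 8);
        cur = base + aligned;
        UNSAFE.putLong(cur, v);
        cur += 8;
        return this;
    }

    int position()            { return (int) (cur - base); }
    int getInt(int offset)    { return UNSAFE.getInt(base + offset); }
    long getLong(int offset)  { return UNSAFE.getLong(base + offset); }
    void clear()              { cur = base; }   // reuse without reallocating
    void free()               { UNSAFE.freeMemory(base); }
}
```

With this sketch, a sequence such as putInt(7).putLong(42L).putInt(9) puts
the long at offset 8 rather than 4, and no view buffers or other temporary
objects are created along the way.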
Here are some updated performance numbers with these changes in place
(on the Solaris/SPARC configuration listed above):
  Operation              Performance
  ---------              -----------
  1x1 fillRect()         +64%
  20x20 fillRect()       +52%
  1x1 drawLine()         +57%
  20x20 drawLine()       +64%
  100x100 drawLine()     +39%
  1x1 drawImage()        +34%
  20x20 drawImage()      +32%
  100x100 drawImage()    + 6%
  1x1 copyArea()         +11%
  20x20 copyArea()       +11%
  4 ch drawString()      +20%
  32 ch drawString()     +17%
And on my Windows XP machine (2x 2.6GHz P4, GF FX 5600):
  Operation              Performance
  ---------              -----------
  1x1 fillRect()         +41%
  20x20 fillRect()       +33%
  1x1 drawLine()         +49%
  20x20 drawLine()       +47%
  100x100 drawLine()     +46%
  1x1 drawImage()        +68%
  20x20 drawImage()      +69%
  100x100 drawImage()    + 3%
  20x20 copyArea()         0% (known driver slowness)
  4 ch drawString()      +38%
  32 ch drawString()     + 6%
###@###.### 2005-05-25 23:44:30 GMT