Bug Database
Bug ID: 6273431
Votes 0
Synopsis OGL: improve performance of parameter queuing
Category java:classes_2d
Reported Against
Release Fixed mustang(b43)
State 10-Fix Delivered, bug
Priority: 4-Low
Related Bugs 6277756
Submit Date 20-MAY-2005
Description
In our single-threaded OGL pipeline, we enqueue parameters for each rendering
operation onto a java.nio.ByteBuffer.  For example, a fillRect() operation
currently looks like this:
    buf.putInt(FILL_RECT);
    buf.putInt(x).putInt(y).putInt(w).putInt(h);

We have a number of operations whose parameters consist entirely of integers
(no floats, longs, etc).  For these operations we can exploit the fact that we
always keep the current buffer position aligned on a 4-byte boundary, and
therefore use an IntBuffer view on the original ByteBuffer to enqueue those
int parameters. (In a microbenchmark, tested on both Solaris/SPARC and Linux,
I found that using IntBuffer.put() can be up to 3x faster than
ByteBuffer.putInt() when the buffer is in the native machine endianness.)
So for the above example, we would instead use:
    IntBuffer ibuf = buf.asIntBuffer();  // view starts at buf's current position
    ibuf.put(FILL_RECT);
    ibuf.put(x).put(y).put(w).put(h);
    buf.position(buf.position() + 20);   // 5 ints * 4 bytes; the view does not advance buf
  xxxxx@xxxxx   2005-05-20 00:09:21 GMT
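
For readers who want to try the view-buffer idea in isolation, here is a minimal
self-contained sketch.  The class name, the method name and the FILL_RECT value
are hypothetical (the real opcode constant is not given in this report), and the
explicit alignment check is an added assumption, not part of the pipeline code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

final class IntViewQueueSketch {
    // Hypothetical opcode value, chosen only for illustration.
    static final int FILL_RECT = 1;

    /** Enqueues one fillRect through an IntBuffer view; returns the bytes written. */
    static int enqueueFillRect(ByteBuffer buf, int x, int y, int w, int h) {
        // The view is only safe to use if the byte position is int-aligned.
        if ((buf.position() & 3) != 0) {
            throw new IllegalStateException("position is not 4-byte aligned");
        }
        // The view starts at buf's current position and has its own position (0).
        IntBuffer ibuf = buf.asIntBuffer();
        ibuf.put(FILL_RECT);
        ibuf.put(x).put(y).put(w).put(h);
        // Writes through the view do not move buf's position, so advance it here.
        int bytes = ibuf.position() * Integer.BYTES;
        buf.position(buf.position() + bytes);
        return bytes;
    }

    public static void main(String[] args) {
        // Native byte order is the case the description reports as fastest.
        ByteBuffer buf = ByteBuffer.allocateDirect(64).order(ByteOrder.nativeOrder());
        int written = enqueueFillRect(buf, 10, 20, 30, 40);
        System.out.println(written + " bytes queued, position = " + buf.position());
    }
}
```

Deriving the byte count from the view's position, rather than hard-coding 20,
keeps the two positions in sync if parameters are added later.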
Work Around
N/A
Evaluation
Using the approach described above, it appears that we can improve performance
of a number of common operations.  For example, on Solaris/SPARC with XVR-1200:

Operation           Performance
---------           -----------
20x20 fillRect()       +45%
20x20 drawLine()       +52%
20x20 drawImage()      +20%
20x20 copyArea()       +10%

There are a few other operations that will also likely improve using these
techniques (e.g. setClip(), MaskFill, MaskBlit).
  xxxxx@xxxxx   2005-05-20 00:07:37 GMT

Similar improvements are seen on Linux as well (JDS, Nvidia GF FX 5600,
7590 drivers, 2x 2.6GHz P4):

Operation           Performance
---------           -----------
20x20 fillRect()       + 3%
20x20 drawLine()       +37%
20x20 drawImage()      +17%
20x20 copyArea()       +26%

  xxxxx@xxxxx   2005-05-20 05:45:28 GMT

While the proposed changes certainly improve performance, the approach is
a bit clunky ("round peg, square hole").  If we are getting to the point
where we are using tricks to get better performance out of NIO ByteBuffers,
why not just write a thin Unsafe wrapper that meets our needs?  We are
already going out of our way to maintain 4-byte alignment, so only a few
more changes would be required to achieve 8-byte alignment when necessary
(i.e. when adding long and double parameters to the buffer).  This approach
has a few added benefits:
  - interface is mostly compatible with NIO classes
  - performance gains for all existing code without creating view buffers
    and such, as suggested earlier
  - no temporary object creation (before, we would create one or more
    view buffers for each drawGlyphList() call; while not too expensive,
    it would be nice to avoid this)
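
The 8-byte alignment step mentioned above can be sketched independently of
Unsafe.  In the sketch below a plain ByteBuffer with absolute puts stands in for
the raw memory writes an Unsafe-based wrapper would perform, so that the example
stays portable; the class and method names are hypothetical and are not taken
from the actual fix.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class AlignedQueueSketch {
    // Stand-in storage; the proposal above would write to raw memory instead.
    private final ByteBuffer buf;
    // The wrapper tracks its own position, NIO-style, as a plain int.
    private int pos;

    AlignedQueueSketch(int capacity) {
        buf = ByteBuffer.allocate(capacity).order(ByteOrder.nativeOrder());
    }

    /** Rounds pos up to the next multiple of alignment (a power of two). */
    static int alignUp(int pos, int alignment) {
        return (pos + alignment - 1) & -alignment;
    }

    /** Ints rely on the position already being 4-byte aligned. */
    AlignedQueueSketch putInt(int v) {
        buf.putInt(pos, v);
        pos += 4;
        return this;
    }

    /** Longs first pad the position to an 8-byte boundary. */
    AlignedQueueSketch putLong(long v) {
        pos = alignUp(pos, 8);
        buf.putLong(pos, v);
        pos += 8;
        return this;
    }

    int position() { return pos; }

    long getLong(int index) { return buf.getLong(index); }

    public static void main(String[] args) {
        AlignedQueueSketch q = new AlignedQueueSketch(32);
        q.putInt(7).putLong(99L);   // int at offset 0, 4 pad bytes, long at offset 8
        System.out.println("position = " + q.position());
    }
}
```

Whatever consumes the buffer has to apply the same padding rule before reading
a long or double, since the pad bytes carry no data.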

Here are some updated performance numbers with these changes in place
(on the Solaris/SPARC configuration listed above):

Operation             Performance
---------             -----------
  1x1   fillRect()       +64%
 20x20  fillRect()       +52%
  1x1   drawLine()       +57%
 20x20  drawLine()       +64%
100x100 drawLine()       +39%
  1x1   drawImage()      +34%
 20x20  drawImage()      +32%
100x100 drawImage()      + 6%
  1x1   copyArea()       +11%
 20x20  copyArea()       +11%
  4 ch  drawString()     +20%
 32 ch  drawString()     +17%

And on my Windows XP machine (2x 2.6GHz P4, GF FX 5600):

Operation             Performance
---------             -----------
  1x1   fillRect()       +41%
 20x20  fillRect()       +33%
  1x1   drawLine()       +49%
 20x20  drawLine()       +47%
100x100 drawLine()       +46%
  1x1   drawImage()      +68%
 20x20  drawImage()      +69%
100x100 drawImage()      + 3%
 20x20  copyArea()         0% (known driver slowness)
  4 ch  drawString()     +38%
 32 ch  drawString()     + 6%

  xxxxx@xxxxx   2005-05-25 23:44:30 GMT

PLEASE NOTE: JDK6 is formerly known as Project Mustang