Bug ID: 4276423 drawImage of an offscreen image to the screen much slower in JDK 1.2

Details
Type:
Bug
Submit Date:
1999-09-29
Status:
Resolved
Updated Date:
2001-07-18
Project Name:
JDK
Resolved Date:
2001-07-18
Component:
client-libs
OS:
windows_nt,windows_2000
Sub-Component:
2d
CPU:
x86
Priority:
P3
Resolution:
Fixed
Affected Versions:
1.2.0,1.4.0
Fixed Versions:
1.4.0


Description
The attached test case measures the performance of copying an offscreen
image to the screen.  The performance of this operation is much slower
in JDK1.2 than it was in JDK 1.1.8, by a factor of more than 3x on the
win32 runtime.
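
The attached test case is not reproduced in this report. As a rough illustration of the kind of measurement involved, here is a minimal sketch in modern Java. It is not the original test: the class name BlitBenchmark and its methods are hypothetical, it relies on System.nanoTime() and BufferedImage (neither of which exists in JDK 1.1), and it copies into an in-memory image rather than the screen, so it does not exercise the DirectDraw or GDI paths discussed in the evaluation.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

public class BlitBenchmark {
    /** Converts an image size, a copy count and an elapsed time into pixels per second. */
    static double pixelsPerSecond(int width, int height, int copies, long elapsedNanos) {
        double pixels = (double) width * height * copies;
        return pixels / (elapsedNanos / 1e9);
    }

    /** Copies a size x size offscreen image into dest repeatedly; returns pixels per second. */
    static double measure(BufferedImage dest, int size, int copies) {
        // Assumption: an in-memory source image stands in for the offscreen image.
        BufferedImage src = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dest.createGraphics();
        try {
            long start = System.nanoTime();
            for (int i = 0; i < copies; i++) {
                g.drawImage(src, 0, 0, null);
            }
            // Guard against a zero interval on very coarse timers.
            long elapsed = Math.max(1, System.nanoTime() - start);
            return pixelsPerSecond(size, size, copies, elapsed);
        } finally {
            g.dispose();
        }
    }

    public static void main(String[] args) {
        BufferedImage dest = new BufferedImage(300, 300, BufferedImage.TYPE_INT_RGB);
        // Same three image sizes as the results quoted in the evaluation.
        for (int size : new int[] {20, 100, 300}) {
            System.out.printf("%dx%d\t%,.0f pps%n", size, size, measure(dest, size, 10_000));
        }
    }
}
```

To measure the on-screen case, the destination Graphics would have to come from a visible component instead of a BufferedImage; the timing loop stays the same.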

                                    

Comments
CONVERTED DATA

BugTraq+ Release Management Values

COMMIT TO FIX:
merlin-beta

FIXED IN:
merlin-beta2

INTEGRATED IN:
merlin-beta


                                     
2004-06-14
EVALUATION

The performance is off by a bit on Solaris, especially if there are no DGA
drivers for the video card, but win32 is seeing the majority of the impact.

jim.graham@Eng 1999-09-28

I recently got the following results on my PIII-dual 866 NT4 system (video
card ATI Rage Pro Turbo), at 32 bits per pixel:

jdk1.1:
	20x20 		16,909,090 pps
	100x100		22,268,000 pps
	300x300		24,488,304 pps
jdk1.2:
	20x20		9,440,362 pps
	100x100		21,333,600 pps
	300x300 	24,570,419 pps

jdk1.3:
	20x20		9,106,579 pps
	100x100		21,231,683 pps
	300x300		24,616,363 pps

jdk1.4 (my most recent build):
	20x20		7,495,593
	100x100		22,236,607
	300x300		23,695,652

And on my PIII-500 (single CPU) win98 system with a Matrox G400 running
32 bits per pixel:

jdk1.1:
	20x20		37,724,773
	100x100		36,661,107
	300x300		36,808,163

jdk1.2:
	20x20		3,631,375
	100x100		31,629,213
	300x300		43,824,489
jdk1.3:
	20x20		3,728,680
	100x100		33,635,275
	300x300		42,096,774

jdk1.4:
	20x20		5,750,953
	100x100		108,602,065
	300x300		165,263,578

From these results, it looks like:
	- There are definite differences between OSes and video cards,
	especially when comparing hardware-accelerated images with
	non-accelerated images.

	- The overhead of the small (20x20) images appears to drag down the
	performance of 1.4 offscreen images to nearly the level of the
	1.2/1.3 software-based images.  In fact, on the older ATI video
	card, the hw-based images were even slower than the software-based
	images.

	- NT performance of all images seems gated at some maximum amount.  This
	might be a restriction on NT, or it could be a constraint of the
	older video card.  More investigation would be necessary to figure
	it out.  But all larger image sizes on all releases seem about the
	same.

	- win98 shows the difference between jdk1.1 hw-based images
	(flying at about 36M pps) versus jdk1.2/1.3 sw-based images
	(limited to only about 3M on the smallest image).

	- win98 on this fast video card shows the advantage of DirectDraw
	in the latest jdk1.4 builds; performance of jdk1.1 was gated
	at around 36M pps, but the performance of DirectDraw-based images
	appears much higher, at around 165M pps for the largest image size.

More work is necessary.  We need to make sure that we eliminate any overhead
that might be contributing to the lower scores in jdk1.4 for small image
sizes.  Profiling is necessary...

chet.haase@Eng 2001-04-24

I did a little more debugging/profiling and got the following information:

One of the key pieces of overhead in our Blt processing is due to the
ddraw Clipper object.  When I eliminate the Clipper (i.e., I don't attach
it to the window or set the clipper on the primary), then I more than double 
the performance of the smallest (20x20) image copies.  On my test system
(PIII-866 dual processor, nVidia TNT2), this made the performance go from
11 M pixels per second to over 26 M pixels per second.

Of course, this is a bottleneck that we cannot do much about: drawing
without a Clipper object requires that we do our own clipping to the
window (not too hard) but it also means that we would be subject to
Windows events that could cause rendering artifacts.  For example, if 
our window was obstructed, we would do our Blts over any overlapping
windows, regardless of which window was supposed to be on top (ddraw
draws directly to the screen without regard for Window properties).
And even if our window was on top at the time we issued the Blt call, 
this might not prevent some event (such as the user dragging a window)
from overlapping the window at the time of the actual Blt operation
(there is a delay between our issuing the call and that call actually
being processed by the hardware).  Actually, this situation might be
handled for us through context switching mechanisms of the driver/hardware
(hopefully the hardware would flush the graphics pipe before allowing the
window system to move things around).  But there is still a small hole of
opportunity between our checking for obstruction and actually issuing the call.
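
The "do our own clipping to the window" step mentioned above can be sketched with java.awt.Rectangle. This is only an illustration under stated assumptions, not the actual Java 2D code: the class ManualClip and the method clipBlit are hypothetical names, and the sketch clips against the window bounds only. It does nothing about overlapping windows, which is exactly the part the Clipper object handles and that this comment says we cannot safely drop.

```java
import java.awt.Rectangle;

public class ManualClip {
    /**
     * Clips a destination rectangle to the window bounds and shifts the source
     * origin by the same offset, so the visible part still maps to the right
     * source pixels. Returns {source, destination}, or null if nothing is visible.
     */
    static Rectangle[] clipBlit(Rectangle window, int srcX, int srcY, Rectangle dst) {
        Rectangle clippedDst = dst.intersection(window);
        if (clippedDst.isEmpty()) {
            return null; // entirely outside the window: skip the Blt
        }
        Rectangle clippedSrc = new Rectangle(
                srcX + (clippedDst.x - dst.x),
                srcY + (clippedDst.y - dst.y),
                clippedDst.width,
                clippedDst.height);
        return new Rectangle[] {clippedSrc, clippedDst};
    }

    public static void main(String[] args) {
        Rectangle window = new Rectangle(0, 0, 100, 100);
        // A 20x20 image placed so that it hangs over the top and right edges.
        Rectangle[] r = clipBlit(window, 0, 0, new Rectangle(90, -5, 20, 20));
        System.out.println("src=" + r[0] + " dst=" + r[1]);
    }
}
```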

Anyway, this got our performance up to 26 M pixels per second.  But the jdk1.1
version is still at 44 M, nearly twice the performance of our non-clipped
jdk1.4 version.  I think this difference can be attributed to various
overhead elements in our drawImage() processing.  During a profiling run
(using Compuware's TrueTime product), I found that we are spending
significant amounts of time (on the order of one to five percent) in the
following routines:
	ClipInfo (used to derive the actual src/dst values after clipping
		against sg.getCompBounds())
	Blit.getFromCache() (gets the cache entry for our Blit call)
	DrawImage.blitSurfaceData (spends a couple of percent just dealing
		with setting the CompositeType)
	AcceleratedOffScreenImage.getSourceSurfaceData (gets the accelerated
		surfaceData object for accelerated images)

There are various other methods and simple operations which end up taking
over a percent of the runtime.  Many of these functions are very simple
(like the equals() comparison when retrieving the Blit from the cache), but
when called over 60000 times (in this case), they add up to significant
overhead.

The reason for performance loss due to overhead in this case is that the
primitives in question are so small (20x20) that the more we do between
issuing the call from the application and actually issuing the ddraw call,
the more we suffer from each intermediate step.  For the larger primitives,
the amount of overhead is now insignificant compared to the performance
time of the actual rendering so we see the performance benefits of
ddraw much more clearly.
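
To put the small-image overhead in per-call terms, the scores quoted above can be converted into time per drawImage() call. The following is a small worked calculation, assuming the 11 M, 26 M and 44 M pixels-per-second figures all refer to 20x20 copies and that each call copies one whole image; the class PerCallCost and its method are hypothetical names used only for this illustration.

```java
public class PerCallCost {
    /** Microseconds per drawImage call implied by a pixels-per-second score. */
    static double microsPerCall(int width, int height, double pixelsPerSecond) {
        double callsPerSecond = pixelsPerSecond / ((double) width * height);
        return 1e6 / callsPerSecond;
    }

    public static void main(String[] args) {
        // Scores quoted in this report for the 20x20 case.
        double[] scores = {11e6, 26e6, 44e6};
        String[] labels = {"jdk1.4 with Clipper", "jdk1.4 without Clipper", "jdk1.1"};
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("%-24s %.1f us per call%n",
                    labels[i], microsPerCall(20, 20, scores[i]));
        }
    }
}
```

Under those assumptions a 20x20 copy costs roughly 36 microseconds per call with the Clipper, 15 without it, and 9 in jdk1.1, so a few microseconds of fixed overhead per call accounts for most of the gap at this image size.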

chet.haase@Eng 2001-07-18

I am closing this bug and opening a new bug on just the small-image case.
See bug 4481344 for more details on this problem.  The original reason for
this bug was to fix general image copying performance; we have done that
in jdk1.4 via hardware-accelerated images and performance of most image
sizes is way beyond the performance in any prior jdk.  However, since there
are still issues with small image sizes (such as the 20x20 case quoted in
this bug report), a bug should still be open against this problem.

I am marking this bug as Fixed and Integrated because the original problem
was fixed many releases ago for general images.


chet.haase@Eng 2001-07-18
                                     
2001-07-18


