EVALUATION
The problem is that R300 has only a limited number of vec4 constant registers, and our
ConvolveOp shader currently uses them inefficiently because of the way it declares its
uniform arrays. For a 5x5 ConvolveOp, our current code declares:
uniform vec2 imgMin;
uniform vec2 imgMax;
uniform vec2 offsets[25];
uniform float kernelVals[25];
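To make the cost concrete, here is a small C sketch that tallies the declarations above.
It assumes that each uniform and each array element occupies one whole vec4 constant
register regardless of how many components it actually uses; that assumption is ours,
not something stated in this report. `UniformDecl`, `current_layout`, `registers_used`
and `components_used` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Describes one uniform declaration in the shader. */
typedef struct {
    const char *name;
    int components;  /* per element: 1 = float, 2 = vec2, 3 = vec3, 4 = vec4 */
    int elements;    /* array length; 1 for a non-array uniform */
} UniformDecl;

/* The uniforms declared by the current (unpacked) 5x5 ConvolveOp shader. */
static const UniformDecl current_layout[] = {
    { "imgMin",     2,  1 },
    { "imgMax",     2,  1 },
    { "offsets",    2, 25 },
    { "kernelVals", 1, 25 },
};

/* ASSUMPTION: each element occupies one whole vec4 constant register,
 * no matter how many of its four components carry data. */
static int registers_used(const UniformDecl *decls, size_t n)
{
    int regs = 0;
    for (size_t i = 0; i < n; i++) {
        regs += decls[i].elements;
    }
    return regs;
}

/* Counts the vec4 components that actually carry data. */
static int components_used(const UniformDecl *decls, size_t n)
{
    int comps = 0;
    for (size_t i = 0; i < n; i++) {
        comps += decls[i].components * decls[i].elements;
    }
    return comps;
}
```

Under that assumption, `registers_used(current_layout, 4)` returns 52, while only 79 of
the 208 available components actually carry data.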
ATI's drivers aren't smart enough to pack the offsets and kernelVals arrays into a
single array on their own, so we should take care of that ourselves. We can do the
same for imgMin and imgMax. Ultimately we end up with:
// image edge limits:
// imgEdge.xy = imgMin.xy (anything < will be treated as edge case)
// imgEdge.zw = imgMax.xy (anything > will be treated as edge case)
"uniform vec4 imgEdge;"
// value for each location in the convolution kernel:
// kernelVals[i].x = offsetX[i]
// kernelVals[i].y = offsetY[i]
// kernelVals[i].z = kernel[i]
"uniform vec3 kernelVals[MAX_KERNEL_SIZE];"
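A minimal C sketch of host-side packing that matches the layout above might look like
the following. The function names are hypothetical, `MAX_KERNEL_SIZE` is assumed to be
25 (enough for a 5x5 kernel), and the `glUniform4fvARB`/`glUniform3fvARB` calls (from
GL_ARB_shader_objects) appear only in a comment to show where the packed arrays would be
uploaded; this is an illustration, not the JDK's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 25 entries cover a 5x5 kernel. */
#define MAX_KERNEL_SIZE 25

/* Pack imgMin/imgMax into one vec4: xy = imgMin.xy, zw = imgMax.xy. */
static void pack_img_edge(const float imgMin[2], const float imgMax[2],
                          float imgEdge[4])
{
    imgEdge[0] = imgMin[0];
    imgEdge[1] = imgMin[1];
    imgEdge[2] = imgMax[0];
    imgEdge[3] = imgMax[1];
}

/* Interleave per-location offsets and weights into vec3 triples:
 *   kernelVals[3*i + 0] = offsetX[i]
 *   kernelVals[3*i + 1] = offsetY[i]
 *   kernelVals[3*i + 2] = kernel[i]
 * Upload (illustrative only), e.g.:
 *   glUniform4fvARB(edgeLocation, 1, imgEdge);
 *   glUniform3fvARB(valsLocation, count, kernelVals);
 * with the locations obtained from glGetUniformLocationARB. */
static void pack_kernel_vals(const float *offsetX, const float *offsetY,
                             const float *kernel, size_t count,
                             float kernelVals[3 * MAX_KERNEL_SIZE])
{
    assert(count <= MAX_KERNEL_SIZE);
    for (size_t i = 0; i < count; i++) {
        kernelVals[3 * i + 0] = offsetX[i];
        kernelVals[3 * i + 1] = offsetY[i];
        kernelVals[3 * i + 2] = kernel[i];
    }
}
```

Assuming each uniform and each array element still occupies one vec4 register, the
packed layout needs 1 + 25 = 26 registers, compared with 1 + 1 + 25 + 25 = 52 for the
original declarations.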
After making these changes, the shader compiler no longer complains about exceeding
the number of available constants, but on Catalyst 7.2 and earlier, it now complains
about something else:
Link successful. The GLSL fragment shader will run in software - available number
of texture instructions exceeded.
This problem occurs only when the source texture has non-pow2 dimensions, because in
that case we use the GL_ARB_texture_rectangle extension. We worked with ATI to confirm
that this is indeed a driver issue, which has been fixed for their upcoming Catalyst 7.3
release (and will hopefully be fixed soon on Linux as well).
So in summary, we're making the changes described above in the JDK to work around
the constant register limit, but folks will need to install Catalyst 7.3 or later
for the problem to go away completely. (A workaround for Catalyst 7.2 and earlier is
to use only pow2-sized images, but that's a fairly limiting restriction, so it would
be better to install 7.3 once it becomes available.)