SUGGESTED FIX
Job ID: 20070213174113.never.6511991
Original workspace: smite:/export/ws/6511991
Submitter: never
Archived data: /net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2007/20070213174113.never.6511991/
Webrev: http://prt-web.sfbay.sun.com/net/prt-archiver.sfbay/data/archived_workspaces/main/c2_baseline/2007/20070213174113.never.6511991/workspace/webrevs/webrev-2007.02.13/index.html
Fixed 6511991: add support for real temporaries in adlc
Often when writing complex instruction definitions in an ad file,
temporary registers are needed for code generation. KILLs can be used
in some cases, but this requires having a fixed register for the
temporary, which overly constrains the register allocator and results
in more shuffling of registers than is really necessary. This is
particularly noticeable in i486.ad.
This change adds a new effect called TEMP, which is like a synthetic
USE. USEs represent real inputs to the MachNode and come from the
inputs to the match rule. KILLs don't correspond to inputs to the
MachNode, so they can't be assigned a register. KILLs also don't
interfere with the inputs to the node, so they aren't very useful for
creating temporaries.
TEMP can also be used to modify DEFs, which means that the DEF will
interfere with the inputs, guaranteeing that the output register is
different from any of the inputs.
There are some minor restrictions on the use of TEMPs. TEMPs must come
before any KILLs in the argument list of the instruction. This is
because of the machinery in adlc having to do with the numbering of
inputs. Making it flexible was too complicated, but adlc will
complain when you violate this rule.
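The ordering rule above can be pictured with a small C++ sketch. This is
not the adlc code: EffectKind, Effect and first_misplaced_temp are
hypothetical names made up for illustration, and the sketch only assumes
that each operand in the argument list carries one effect.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an instruction's argument list; not adlc's data structures.
enum class EffectKind { USE, DEF, TEMP, KILL };

struct Effect {
  EffectKind  kind;
  std::string operand;
};

// Returns the index of the first TEMP that follows a KILL in the argument
// list, or -1 when every TEMP comes before every KILL.
int first_misplaced_temp(const std::vector<Effect>& effects) {
  bool seen_kill = false;
  for (std::size_t i = 0; i < effects.size(); ++i) {
    if (effects[i].kind == EffectKind::KILL) {
      seen_kill = true;
    } else if (effects[i].kind == EffectKind::TEMP && seen_kill) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A checker along these lines would accept the order USE, TEMP, KILL and
report index 2 for the order USE, KILL, TEMP.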
I changed all the ad files in the places where it made sense. In
sparc.ad I left alone the uses of O7 as a temp, since O7 isn't part
of the allocatable register sets, so broadening the mask for those
uses wouldn't help register pressure. I also fixed a lot of code to
use operand names instead of hard-coding the names.
I had to work around a bug in the iterator model used in adlc: the
iterators are internal instead of external, so two different pieces
of code can't iterate over the same object simultaneously. I added
some code to preserve the state of the iterator when ComponentLists
use iteration internally, so that queries on a ComponentList won't
break users that are also iterating over the list.
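The save-and-restore idea can be sketched in C++ as follows. This is a
minimal illustration only, not the actual ComponentList code: CursorList,
CursorSaver, position_of and count_while_querying are hypothetical names,
and the sketch assumes the list keeps a single internal cursor.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical list with an internal cursor; a stand-in, not adlc's ComponentList.
class CursorList {
public:
  void add(const std::string& name) { _items.push_back(name); }

  // Internal-iterator interface: the list itself remembers the position.
  void reset() { _cur = 0; }
  const std::string* iter() {
    return _cur < _items.size() ? &_items[_cur++] : nullptr;
  }

  // A query that iterates internally. The saver records the cursor on entry
  // and puts it back on every exit path, so a caller that is in the middle
  // of its own reset()/iter() loop is not disturbed.
  int position_of(const std::string& name) {
    CursorSaver saver(*this);
    reset();
    int index = 0;
    while (const std::string* item = iter()) {
      if (*item == name) return index;
      ++index;
    }
    return -1;
  }

private:
  // Saves the cursor in the constructor and restores it in the destructor.
  class CursorSaver {
  public:
    explicit CursorSaver(CursorList& list) : _list(list), _saved(list._cur) {}
    ~CursorSaver() { _list._cur = _saved; }
  private:
    CursorList& _list;
    std::size_t _saved;
  };

  std::vector<std::string> _items;
  std::size_t _cur = 0;
};

// Walks the whole list while running a query on it inside the loop and
// returns the number of elements visited.
int count_while_querying(CursorList& list, const std::string& probe) {
  list.reset();
  int visited = 0;
  while (list.iter() != nullptr) {
    // Without the saver this call would leave the cursor just past `probe`
    // (or at the end if `probe` is missing) and derail the outer loop.
    list.position_of(probe);
    ++visited;
  }
  return visited;
}
```

With the saver in place the outer loop visits every element exactly once,
regardless of which element the inner query looks up.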
I added field name printing in the opto assembly output when the
ciField is available from the adr_type. So printing now looks like
this:
fd7   B129: # B141 B130 <- B128  Freq: 18860
fd7       MOV    EBX,[EBP + #8] ! Field java/util/HashMap$Entry.key
fda       MOV    EAX,[EBP + #12] ! Field java/util/HashMap$Entry.value
fdd       CMPu   EBX,EDX
fdf       Jeq,us B141  P=0.027625 C=18860.000000
Register allocation time doesn't appear to be affected, and performance
looks pretty much like a wash, though there are tiny regressions and
improvements in some of the subbenchmarks. The current refworkload
data is at the end.
http://javaweb.sfbay/~never/webrev/6511991
Approved by:
Reviewed by:
Fix verified (y/n): y
sunblade 2500 2x1.2G 2G RAM
============================================================================
t1: reference_server
Benchmark    Samples        Mean     Stdev
jetstream         15       66.31      1.99
scimark           15       74.79      0.87
specjbb2000       15    25853.12    180.32
specjbb2005       15     8419.68    243.87
specjvm98         15      140.23      0.71
volano25          15    12443.73    220.08
--------------------------------------------------------------------------
Weighted Geomean         1767.89
============================================================================
t2: reference_server
Benchmark    Samples        Mean     Stdev   %Diff       P  Significant
jetstream         15       65.85      1.37   -0.70   0.463  *
scimark           15       75.20      0.92    0.54   0.226  *
specjbb2000       15    25817.19    151.87   -0.14   0.560  *
specjbb2005       15     8390.93    212.50   -0.34   0.733  *
specjvm98         15      139.67      0.57   -0.39   0.026  *
volano25          15    12514.67    265.19    0.57   0.432  *
--------------------------------------------------------------------------
Weighted Geomean         1767.16             -0.04
============================================================================
hsdev-5 8x2.6G 2G RAM
============================================================================
t1: reference_server
Benchmark    Samples        Mean     Stdev
jetstream         15      149.89      1.98
scimark           15      306.71      0.86
specjbb2000       15   118245.86   2226.14
specjbb2005       15    10552.48    201.69
specjvm98         15      320.76      1.75
volano25          15   113678.13   7787.67
--------------------------------------------------------------------------
Weighted Geomean         5551.44
============================================================================
t2: reference_server
Benchmark    Samples        Mean     Stdev   %Diff       P  Significant
jetstream         15      155.59      1.38    3.80   0.000  Yes
scimark           15      306.32      0.65   -0.13   0.169  *
specjbb2000       15   118370.99   1480.00    0.11   0.858  *
specjbb2005       15    10659.55    101.61    1.01   0.081  *
specjvm98         15      322.91      1.63    0.67   0.002  Yes
volano25          15   113387.80   4217.76   -0.26   0.900  *
--------------------------------------------------------------------------
Weighted Geomean         5588.83              0.67
============================================================================
The jetstream improvement comes from a 13% improvement in Copy, which
just seems like an aberration. The generated code isn't significantly
different.