EVALUATION
http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/93c14e5562c4
|
|
|
EVALUATION
These are the numbers with Vladimir's suggested changes on a Nehalem box:
32-bit (instrinsic):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2317
sum: -2, time: 2190
sum: -2, time: 2560
sum: -2, time: 2560
sum: -2, time: 2561
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 1836
sum: -2147483648, time: 1832
sum: -2147483648, time: 1921
sum: -2147483648, time: 1922
sum: -2147483648, time: 1922
Long.numberOfLeadingZeros:
sum: -34, time: 6986
sum: -34, time: 6981
sum: -34, time: 7082
sum: -34, time: 7071
sum: -34, time: 7074
Long.numberOfTrailingZeros:
sum: -2147483616, time: 5162
sum: -2147483616, time: 5164
sum: -2147483616, time: 5448
sum: -2147483616, time: 5488
sum: -2147483616, time: 5452
64-bit (intrinsic):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2113
sum: -2, time: 2249
sum: -2, time: 2169
sum: -2, time: 2169
sum: -2, time: 2168
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 1882
sum: -2147483648, time: 1876
sum: -2147483648, time: 1830
sum: -2147483648, time: 1831
sum: -2147483648, time: 1830
Long.numberOfLeadingZeros:
sum: -34, time: 3952
sum: -34, time: 3952
sum: -34, time: 4065
sum: -34, time: 4122
sum: -34, time: 4085
Long.numberOfTrailingZeros:
sum: -2147483616, time: 2419
sum: -2147483616, time: 2407
sum: -2147483616, time: 3705
sum: -2147483616, time: 3788
sum: -2147483616, time: 3717
Numbers are sometimes slightly better, but not significantly.
|
|
|
EVALUATION
These numbers are from an AMD Shanghai w/ 2.6Ghz:
32-bit (Java code):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 9649
sum: -2, time: 10700
sum: -2, time: 10140
sum: -2, time: 11156
sum: -2, time: 11156
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 10563
sum: -2147483648, time: 10555
sum: -2147483648, time: 13401
sum: -2147483648, time: 13616
sum: -2147483648, time: 13617
Long.numberOfLeadingZeros:
sum: -34, time: 12758
sum: -34, time: 12750
sum: -34, time: 12128
sum: -34, time: 12123
sum: -34, time: 12402
Long.numberOfTrailingZeros:
sum: -2147483616, time: 15583
sum: -2147483616, time: 16279
sum: -2147483616, time: 16420
sum: -2147483616, time: 18200
sum: -2147483616, time: 18201
32-bit (instrinsic w/o lzcnt instruction):
$ gamma -XX:-UseCountLeadingZerosInstruction bits
Integer.numberOfLeadingZeros:
sum: -2, time: 5749
sum: -2, time: 5292
sum: -2, time: 5229
sum: -2, time: 5230
sum: -2, time: 5236
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 3944
sum: -2147483648, time: 3935
sum: -2147483648, time: 3780
sum: -2147483648, time: 3831
sum: -2147483648, time: 3832
Long.numberOfLeadingZeros:
sum: -34, time: 11610
sum: -34, time: 11598
sum: -34, time: 12427
sum: -34, time: 12426
sum: -34, time: 12426
Long.numberOfTrailingZeros:
sum: -2147483616, time: 6726
sum: -2147483616, time: 6760
sum: -2147483616, time: 7455
sum: -2147483616, time: 7559
sum: -2147483616, time: 7456
Speedup: 1.94, 3.55, 0.98, 2.44
32-bit (intrinsic w/ lzcnt):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2325
sum: -2, time: 2028
sum: -2, time: 2071
sum: -2, time: 2071
sum: -2, time: 2071
...
Long.numberOfLeadingZeros:
sum: -34, time: 6653
sum: -34, time: 6641
sum: -34, time: 8301
sum: -34, time: 8671
sum: -34, time: 8286
...
Speedup: 4.90, 1.46
Overall speedup: 4.90, 3.55, 1.46, 2.44
64-bit (Java code):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 9226
sum: -2, time: 9284
sum: -2, time: 10742
sum: -2, time: 12271
sum: -2, time: 12273
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 11102
sum: -2147483648, time: 11707
sum: -2147483648, time: 10873
sum: -2147483648, time: 10880
sum: -2147483648, time: 10873
Long.numberOfLeadingZeros:
sum: -34, time: 10186
sum: -34, time: 10306
sum: -34, time: 11391
sum: -34, time: 12298
sum: -34, time: 12299
Long.numberOfTrailingZeros:
sum: -2147483616, time: 14097
sum: -2147483616, time: 14780
sum: -2147483616, time: 15025
sum: -2147483616, time: 17561
sum: -2147483616, time: 17553
64-bit (instrinsic w/o lzcnt):
$ gamma -XX:-UseCountLeadingZerosInstruction bits
Integer.numberOfLeadingZeros:
sum: -2, time: 5772
sum: -2, time: 5083
sum: -2, time: 4867
sum: -2, time: 4866
sum: -2, time: 4867
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 3591
sum: -2147483648, time: 3572
sum: -2147483648, time: 3521
sum: -2147483648, time: 3623
sum: -2147483648, time: 3739
Long.numberOfLeadingZeros:
sum: -34, time: 5808
sum: -34, time: 5799
sum: -34, time: 6627
sum: -34, time: 6627
sum: -34, time: 6628
Long.numberOfTrailingZeros:
sum: -2147483616, time: 4150
sum: -2147483616, time: 4142
sum: -2147483616, time: 5382
sum: -2147483616, time: 5383
sum: -2147483616, time: 6063
Speedup: 2.52, 3.09, 1.86, 3.26
64-bit (intrinsic w/ lzcnt):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2320
sum: -2, time: 1942
sum: -2, time: 1812
sum: -2, time: 1812
sum: -2, time: 1812
...
Long.numberOfLeadingZeros:
sum: -34, time: 2499
sum: -34, time: 2487
sum: -34, time: 3313
sum: -34, time: 3313
sum: -34, time: 3468
...
Speedup: 6.77, 3.71
Overall speedup: 6.77, 3.09, 3.71, 3.26
|
|
|
EVALUATION
These numbers are from a T2 w/ 1.165 GHz. The SPARC implementations of the intrinsics need a hardware supported POPC instruction.
32-bit (Java code):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 86712
sum: -2, time: 86898
sum: -2, time: 87017
sum: -2, time: 87455
sum: -2, time: 86938
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 61786
sum: -2147483648, time: 61736
sum: -2147483648, time: 58972
sum: -2147483648, time: 58971
sum: -2147483648, time: 58971
Long.numberOfLeadingZeros:
sum: -34, time: 145686
sum: -34, time: 145644
sum: -34, time: 99939
sum: -34, time: 99939
sum: -34, time: 99939
Long.numberOfTrailingZeros:
sum: -2147483616, time: 121674
sum: -2147483616, time: 121630
sum: -2147483616, time: 81086
sum: -2147483616, time: 81087
sum: -2147483616, time: 81086
32-bit (intrinsic):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 43354
sum: -2, time: 43353
sum: -2, time: 43307
sum: -2, time: 43307
sum: -2, time: 43307
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 26652
sum: -2147483648, time: 26606
sum: -2147483648, time: 26606
sum: -2147483648, time: 26606
sum: -2147483648, time: 26606
Long.numberOfLeadingZeros:
sum: -34, time: 90336
sum: -34, time: 90301
sum: -34, time: 64500
sum: -34, time: 64500
sum: -34, time: 64500
Long.numberOfTrailingZeros:
sum: -2147483616, time: 79282
sum: -2147483616, time: 79244
sum: -2147483616, time: 42386
sum: -2147483616, time: 42386
sum: -2147483616, time: 42386
Speedup: 2.01, 2.22, 1.55, 1.91
64-bit uses the same instructions as 32-bit.
|
|
|
EVALUATION
I used this small micro-benchmark for testing:
public class bits {
public static void main(String[] args) {
Integer.highestOneBit(0);
Integer.lowestOneBit(0);
Integer.numberOfLeadingZeros(0);
Integer.numberOfTrailingZeros(0);
Long.numberOfLeadingZeros(0);
Long.numberOfTrailingZeros(0);
System.out.println("Integer.numberOfLeadingZeros:");
for (int i = 0; i < 5; i++)
doleadi();
System.out.println("Integer.numberOfTrailingZeros:");
for (int i = 0; i < 5; i++)
dotraili();
System.out.println("Long.numberOfLeadingZeros:");
for (int i = 0; i < 5; i++)
doleadl();
System.out.println("Long.numberOfTrailingZeros:");
for (int i = 0; i < 5; i++)
dotraill();
}
static void doleadi() {
long start = System.currentTimeMillis();
int sum = 0;
for (int i = 0; i < Integer.MAX_VALUE; i++) {
sum += lead(i);
}
long end = System.currentTimeMillis();
System.out.println("sum: " + sum + ", time: " + (end - start));
}
static void dotraili() {
long start = System.currentTimeMillis();
int sum = 0;
for (int i = 0; i < Integer.MAX_VALUE; i++) {
sum += trail(i);
}
long end = System.currentTimeMillis();
System.out.println("sum: " + sum + ", time: " + (end - start));
}
static void doleadl() {
long start = System.currentTimeMillis();
int sum = 0;
for (long l = 0; l < Integer.MAX_VALUE; l++) {
sum += lead(l);
}
long end = System.currentTimeMillis();
System.out.println("sum: " + sum + ", time: " + (end - start));
}
static void dotraill() {
long start = System.currentTimeMillis();
int sum = 0;
for (long l = 0; l < Integer.MAX_VALUE; l++) {
sum += trail(l);
}
long end = System.currentTimeMillis();
System.out.println("sum: " + sum + ", time: " + (end - start));
}
static int high(int i) { return Integer.highestOneBit(i); }
static int low(int i) { return Integer.lowestOneBit(i); }
static int lead(int i) { return Integer.numberOfLeadingZeros(i); }
static int trail(int i) { return Integer.numberOfTrailingZeros(i); }
static int lead(long l) { return Long.numberOfLeadingZeros(l); }
static int trail(long l) { return Long.numberOfTrailingZeros(l); }
}
|
|
|
EVALUATION
On an Intel Nehalem w/ 2.93GHz I get these numbers:
32-bit (Java code):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 6347
sum: -2, time: 6674
sum: -2, time: 7442
sum: -2, time: 7429
sum: -2, time: 7471
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 8767
sum: -2147483648, time: 8881
sum: -2147483648, time: 8155
sum: -2147483648, time: 8159
sum: -2147483648, time: 8144
Long.numberOfLeadingZeros:
sum: -34, time: 10080
sum: -34, time: 10075
sum: -34, time: 9703
sum: -34, time: 9702
sum: -34, time: 9702
Long.numberOfTrailingZeros:
sum: -2147483616, time: 10944
sum: -2147483616, time: 10932
sum: -2147483616, time: 10870
sum: -2147483616, time: 10859
sum: -2147483616, time: 10868
32-bit (intrinsic):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2759
sum: -2, time: 2367
sum: -2, time: 2334
sum: -2, time: 2334
sum: -2, time: 2334
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 2066
sum: -2147483648, time: 2059
sum: -2147483648, time: 2059
sum: -2147483648, time: 2059
sum: -2147483648, time: 2060
Long.numberOfLeadingZeros:
sum: -34, time: 6623
sum: -34, time: 6610
sum: -34, time: 7170
sum: -34, time: 7172
sum: -34, time: 7162
Long.numberOfTrailingZeros:
sum: -2147483616, time: 5180
sum: -2147483616, time: 5159
sum: -2147483616, time: 5714
sum: -2147483616, time: 5731
sum: -2147483616, time: 5724
That's a speedup of 3.18 (Integer.numberOfLeadingZeros), 3.96 (Integer.numberOfTrailingZeros), 1.36 (Long.numberOfLeadingZeros), and 1.90 (Long.numberOfTrailingZeros) respectively.
64-bit (Java code):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 6247
sum: -2, time: 7895
sum: -2, time: 7884
sum: -2, time: 7884
sum: -2, time: 7884
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 6642
sum: -2147483648, time: 6720
sum: -2147483648, time: 7033
sum: -2147483648, time: 7022
sum: -2147483648, time: 7022
Long.numberOfLeadingZeros:
sum: -34, time: 8518
sum: -34, time: 8596
sum: -34, time: 8677
sum: -34, time: 8677
sum: -34, time: 8677
Long.numberOfTrailingZeros:
sum: -2147483616, time: 8485
sum: -2147483616, time: 8478
sum: -2147483616, time: 7941
sum: -2147483616, time: 7944
sum: -2147483616, time: 7945
64-bit (intrinsic):
$ gamma bits
Integer.numberOfLeadingZeros:
sum: -2, time: 2306
sum: -2, time: 2250
sum: -2, time: 2059
sum: -2, time: 2059
sum: -2, time: 2059
Integer.numberOfTrailingZeros:
sum: -2147483648, time: 1883
sum: -2147483648, time: 1876
sum: -2147483648, time: 1876
sum: -2147483648, time: 1876
sum: -2147483648, time: 1876
Long.numberOfLeadingZeros:
sum: -34, time: 3569
sum: -34, time: 3697
sum: -34, time: 4375
sum: -34, time: 4343
sum: -34, time: 4292
Long.numberOfTrailingZeros:
sum: -2147483616, time: 2405
sum: -2147483616, time: 2400
sum: -2147483616, time: 3711
sum: -2147483616, time: 3667
sum: -2147483616, time: 3797
Speedup: 3.83, 3.74, 2.02, 2.17
|
|
|
|