Java Solaris Communities Sun Store Join SDN My Profile Why Join?
 
Bug Database
Bug Detail
Quick Lists
Top 25 Bugs
Top 25 RFE's
Recently Closed Bugs
Printable Page Printable Page


Bug Database
Bug ID: 5108893
Votes 0
Synopsis Math.abs() is slow
Category hotspot:compiler2
Reported Against tiger-rc
Release Fixed mustang(b13)
State 10-Fix Delivered, bug
Priority: 4-Low
Related Bugs 6506405 , 5005861
Submit Date 29-SEP-2004
Description


A DESCRIPTION OF THE REQUEST :
Current JVMs implement Math.abs() using compares and branches even though there are far more efficient code sequences for computing the absolute value of  a floating point value that do not require a branches.  These non-branching code sequences are both much shorter and cheaper than the compare and branch code currently emitted by hotspot (in both the client and server JVMs).

For example, x87 has a fabs instruction which is slightly cheaper than an add and SSE uses a boolean & instructions (andsd for doubles andss for floats) which has the same cost as an add instruction.  However as the timing results below show, currently Math.abs() is much more expensive than an add.

JUSTIFICATION :
Math.abs() is currently much slower and more expensive than it should be.  It costs around six times more than an add when it should cost the same or even be slightly cheaper.  The cost of Math.abs() can be significant in some codes (including some of ours) that make heavy use of it.  Using the more efficient code sequences would signifcantly speed up such programs.

EXPECTED VERSUS ACTUAL BEHAVIOR :
EXPECTED -
Expected behaviour is that Math.abs(double) and Math.abs(float) should cost about the same as an add.
ACTUAL -
Actual behaviour is that Math.abs() is much more expensive than an add as show in the timing results for the server and client 1.5.0rc JVMs below.

java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Server VM (build 1.5.0-rc-b63, mixed mode)

avg    2.3 ns   total 4.68E-1 s  for assign             (~ 4.0 cycles)
avg    2.5 ns   total 5.00E-1 s  for add                (~ 4.2 cycles)
avg   17.0 ns   total 3.41E0 s   for Math.abs()         (~ 29.0 cycles)
avg    2.4 ns   total 4.85E-1 s  for assign             (~ 4.1 cycles)
avg    2.5 ns   total 5.00E-1 s  for add                (~ 4.2 cycles)
avg   17.0 ns   total 3.39E0 s   for Math.abs()         (~ 28.8 cycles)

java version "1.5.0-rc"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0-rc-b63)
Java HotSpot(TM) Client VM (build 1.5.0-rc-b63, mixed mode, sharing)

avg    5.0 ns   total 1.00E0 s   for assign             (~ 8.5 cycles)
avg    5.3 ns   total 1.06E0 s   for add                (~ 9.0 cycles)
avg   30.8 ns   total 6.16E0 s   for Math.abs()         (~ 52.3 cycles)
avg    5.0 ns   total 1.00E0 s   for assign             (~ 8.5 cycles)
avg    5.4 ns   total 1.08E0 s   for add                (~ 9.2 cycles)
avg   30.6 ns   total 6.11E0 s   for Math.abs()         (~ 51.9 cycles)


---------- BEGIN SOURCE ----------

import java.text.DecimalFormat;
import java.util.Random;

/** Test that show that Math.abs is much slower than a floating point add
 * even though it should have about the same cost
 *
 * @author  bjw  Bruce Walter, Cornell Program of Computer Graphics 2004
 */
public class AbsTest {
  //target total number of repetitions of the operation
  public static final int opTargetRepetitions = 200000000;
  //size of arrays that are operated on
  public static final int arraySize = 10000;
  //number of times we need to process each array to reach total target count
  public static final int reps = opTargetRepetitions/arraySize;
  //pretty print the output numbers to make them easier to read
  public static final DecimalFormat decForm = new DecimalFormat("###0.0");
  public static final DecimalFormat sciForm = new DecimalFormat("0.00E0");
  //my processor is a 1.7GHz Xeon (actually it is a dual processor, but this test is single threaded)
  public static final double ghzProcSpeed = 1.7; //my processor is 1.7GHz
    
  public static void runTimingTest(TestOp op, double result[], double src[], boolean print) {
    long time = System.currentTimeMillis();
    for(int i=0;i<reps;i++) {
      op.performOp(result,src);
    }
    time = System.currentTimeMillis() - time;
    double denom = 1000000.0/(reps*src.length);
    if (print) {
      String ps = decForm.format(time*denom);
      while (ps.length()<6) ps = " "+ps;
      ps = "avg "+ps+" ns   total "+sciForm.format(time/1000.0)+" s";
      while (ps.length()<32) ps += " ";
      ps = ps+" for "+op.toString();
      while (ps.length()<50) ps += " ";
      System.out.println(ps+"\t(~ "+decForm.format(time*denom*ghzProcSpeed)+" cycles)");
    }
  }
    
  public static void main(String[] args) throws InterruptedException {
    double src[] = new double[arraySize];
    double result[] = new double[arraySize];
    Random ran = new Random(5232482349538L);
    //set the src array to be random values between -1 and 1 (but excluding zero)
    for(int i=0;i<src.length;i++) {
      do {
        src[i] = 2*ran.nextDouble() - 1.0;
      } while (src[i] == 0);
    }
    TestOp tests[] = { new AssignOp(), new AddOp(), new AbsOp()};
    //warm up hotspot
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,false);
    }
    //now run the real tests and print the timings
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
    //do it again to show the timings are reasonably consistent
    for(int i=0;i<tests.length;i++) {
      runTimingTest(tests[i],result,src,true);
    }
  }
  
  public abstract static class TestOp {
    public abstract void performOp(double result[], double src[]);
  }
  
  public static class AssignOp extends TestOp {
    public String toString() { return "assign"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = src[i];
      }
    }
  }
  
  public static class AddOp extends TestOp {
    public String toString() { return "add"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = 0.143+src[i];
      }
    }
  }
  
  public static class AbsOp extends TestOp {
    public String toString() { return "Math.abs()"; }
    public void performOp(double result[], double src[]) {
      for(int i=0;i<src.length;i++) {
        result[i] = Math.abs(src[i]);
      }
    }
  }
 
}

---------- END SOURCE ----------

CUSTOMER SUBMITTED WORKAROUND :
None.  Attempts to force the more efficient code sequence such as:

private static double myAbs(double a) {
   return Double.longBitsToDouble(Double.doubleToRawLongBits(a)&0x7fffffffffffffffL);
}

are not any faster due to the overheads of  calling these methods on Double.
(Incident Review ID: 315868) 
======================================================================
  xxxxx@xxxxx   10/14/04 23:50 GMT
Work Around
N/A
Evaluation
Fix this in Mustang
  xxxxx@xxxxx   2004-09-29

I'm seeing performance of ABS that is roughly 2.5x slower, not 6x as the submitter claims:

@prt-solx86-q1-2$ $jp/bin/java -server AbsTest
avg    2.7 ns   total 5.31E-1 s  for assign             (~ 4.0 cycles)
avg    2.9 ns   total 5.74E-1 s  for add                (~ 4.3 cycles)
avg    7.6 ns   total 1.53E0 s   for Math.abs()         (~ 11.4 cycles)
avg    2.7 ns   total 5.31E-1 s  for assign             (~ 4.0 cycles)
avg    2.8 ns   total 5.50E-1 s  for add                (~ 4.1 cycles)
avg    7.6 ns   total 1.53E0 s   for Math.abs()         (~ 11.5 cycles)

My machine is a dual 1.5Ghz, but even on a P3-700Mhz performance is still roughly 2.5x slower

@foundation$ $jp/bin/java -server AbsTest             
avg    5.8 ns   total 1.17E0 s   for assign             (~ 4.1 cycles)
avg    6.4 ns   total 1.29E0 s   for add                (~ 4.5 cycles)
avg   16.4 ns   total 3.28E0 s   for Math.abs()         (~ 11.5 cycles)
avg    5.8 ns   total 1.17E0 s   for assign             (~ 4.1 cycles)
avg    6.4 ns   total 1.29E0 s   for add                (~ 4.5 cycles)
avg   16.4 ns   total 3.27E0 s   for Math.abs()         (~ 11.5 cycles)

Looking into this...
  xxxxx@xxxxx   10/14/04 23:50 GMT

It seems the SSE2 version of AbsD is SLOW.  If I enable the flags -XX:UseSSE=0 or -XX:UseSSE=1 then ABS is fast again (because we are using FABS).  Looking into a faster instruction for SSE2 machines
  xxxxx@xxxxx   10/20/04 16:38 GMT

Got ABS intrinsified... 4cycles on a PC, 7 cycles on a SPARC (same as an ADD) and working on C1 now.
  xxxxx@xxxxx   10/25/04 19:41 GMT
Comments
  
  Include a link with my name & email   

Submitted On 20-DEC-2004
mksreddy
My code which is  using Math.abs (on double) is about  3 times faster in  latest mustang build:
java version "1.6.0-ea"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.6.0-ea-b16
Java HotSpot(TM) Client VM (build 1.6.0-ea-b16, mixed mode, sharing)

Before mustang the methodn execution used to take 220ms on average, Now it takes about 70ms on average.

Thanks for the change.



PLEASE NOTE: JDK6 is formerly known as Project Mustang