|
Quick Lists
|
|
Bug ID:
|
4213096
|
|
Votes
|
16
|
|
Synopsis
|
A Proposal for User-Defined Lightweight Value-Semantics Objects
|
|
Category
|
java:specification
|
|
Reported Against
|
1.2
|
|
Release Fixed
|
|
|
State
|
6-Fix Understood,
request for enhancement
|
|
Priority:
|
5-Very Low
|
|
Related Bugs
|
4213071
,
4213176
,
4229200
,
4809540
,
4820062
|
|
Submit Date
|
19-FEB-1999
|
|
Description
|
A Proposal for User-Defined Lightweight Value-Semantics Objects
Introduction:
-------------
This is a proposal to allow a Java programmer to define new
Lightweight Value-Semantics Objects [LVSOs] to the Java language.
"Lightweight" means that manipulating the objects is approximately as
efficient as manipulating primitive objects. "Value-Semantics" means
that the "objects" have value semantics as opposed to the reference
semantics of ordinary Java objects. "Objects" means that the
declarations whose addition I propose be added adding to the language
are similar to class declarations, and you get all of the class
options that are reasonable given value semantics.
Please note that I propose to allow the user to declare new classes of
LVSOs, not to create the notion of LVSOs ab initio. Java already has
LVSOs; int, boolean, etc. We should not debate whether we need LVSOs,
only whether we should let the user add new LVSO classes.
I will develop the proposal in three stages:
the basic stage, in which you essentially get to define your
primitive types
the subtypable primitives stage, in which we explore the
implications of inheritance for LVSOs [whose method calls would not
be polymorphic, since there are no pointers in Java and no
references to LVSOs].
the overloaded operators stage. I won't devote too much attention
to this because the Java developers do not plan to introduce
operator overloading in the near future. See the resolution of bug
#4087427.
Some of the ideas for type safety language features come from Ada.
Motivation:
-----------
Consider the Apples and Oranges classes of Appendix A. These classes
have simple behaviors and few instance variables; the motivation for
declaring the classes is type safety; the compiler won't let you
compare Apples and Oranges. In this case the programmer's intent was
to create a new primitive type with int's semantics but with type
safety.
Consider the Money class and its subclasses of Appendix B. In this
case we have a class that implements a non-integral fixed point
representation, and two subclasses that are, again, provided for type
safety.
Lastly, consider the Complex class of Appendix C. The Complex class
is a canonical example of a type that is not a primitive type in many
languages, but which you would like to be implemented efficiently.
The Complex class differs from the other two in the fact that the
instance variable set is more than a single instance of one primitive
type.
These classes could be treated similarly to complicated classes for
which the overhead of customer creation is amortized over a long customer
lifetime, but there are several reasons why the programmer might
prefer a different treatment for these tiny objects:
1: The objects are small. They don't represent the fruits of much
computation, making the high cost of customer creation a problem.
2: You probably want value semantics rather than reference
semantics. In particular, equals() is likely to do the right
thing in more cases than ==
Because all objects are manipulated with references in Java, and
storage for non-primitive objects are managed by the garbage collector
and never stack allocated, we must allocate and free objects to get
type safety or to get the effects of structures. You would like the
compiler to prevent you from comparing Apples and Oranges, but it cost
980 units of time[1] to create an Apple or an Orange, where a simple
assignment costs one time unit. In the real world people will be
inclined to declare integer variables for discrete units such as
Apples and Oranges, and float variables for physical units such as
meters or seconds, and from time to time Java programs will have bugs
that trace to type mistakes. In the case of complex numbers and other
small aggregates of primitive types with simple methods, programmers
will have three unappetizing choices:
1: Create the objects and pay the performance penalty
2: Create the objects and reuse them, managing your own storage and
having the same dangling reference problem you get in languages
like C++ [although there would be no memory leaks]
3: Maintain clusters of local variables; in this example you might
have double ac_signal_real; double ac_signal_imag; where you
would like to have written Complex ac_signal; this is what you
would like the compiler to do for you.
One problem with simply creating the objects is that we are then faced
with a second choice. We can make the small objects immutable, so for
example the sole Complex addition method looks like this
Complex add(Complex addend)
{
return new Complex(real + addend.real, imag + addend.imag);
}
but efficiency considerations would impel us to at least provide the
option of modifying an existing Complex and including the method
Complex add_d(Complex addend)
{
real = real + addend.real;
imag = imag + addend.imag;
return this;
}
and the programmer must always keep in mind whether a Complex
reference has more than one name at a given point in his program in
order to be able to use add_d and avoid the cost of an allocation
every time you add.
The Basic Proposal; extensions of Primitive
-------------------------------------------
When a programmer declares an int in Java [or, for that matter, in
almost any other programming language], he allocates a cell able to
contain a basic integer in whatever frame he's declaring in. For
example, if he declares an int in a method he gets an int in the local
variables area of the stack frame of every invocation of that method.
If the programmer keeps modifying that int, the same cell is reused.
On the other hand, when a programmer declares an customer , including a
boxed instance of a primitive type such as Integer, he allocates a
cell able to hold a reference to an instance. The class designer has
to decide whether to make the customer updatable, which might reduce the
creation of new objects but which will lead to unexpected behavior if
there are multiple pointers to the same customer , or immutable, in which
case the new operator with its performance penalty needs to be
performed repeatedly. The boxed primitive types in java.lang.* are
immutable.
I propose that the programmer be able to declare that a class extends
java.lang.Primitive instead of the usual java.lang.Object or one of
its subclasses. If he does this, then the declared memory contains
the customer itself instead of a reference to the customer . If a class is
derived from Primitive then there is no way to declare a reference to
instances of that class.
I further propose the following conventions:
1: You may use any visible constructor to create a value of a
Primitive extension. You can't apply new to the class name,
since Primitive extensions are not allocated on the heap.
2: If you declare a variable of a Primitive extension but provide no
initial value the instance variables are built with the default
constructor. There must be one, or an uninitialized declaration
is forbidden. However, a consequence of rule 1 is that if you
elect to initialize your variable you could write something like
Complex unit_real_vector = Complex(1.0, 0.0);
Of course the constructor must exist and must be visible per its
access modifier.
3: The equals method inherited from Primitive recursively applies
equal to corresponding instance variables. This can, of course,
be overridden but there is no polymorphism because all value
types can be determined at compile time.
A == B is not allowed if A and B are different Primitive subtypes
of if one operand is a Primitive subtype and the other isn't, and
is determined by applying A.equals(B) if they are of the same
Primitive subtype.
4: Primitive subtypes cannot be declared to implement an interface.
A Primitive subtype cannot contain itself as an instance variable.
There can be no sequence of Primitive subtypes S1, S2, ... Sn such
that S1 contains an S2, S2 contains an customer , ..., S(n-1) contains an Sn
and Sn contains an S1.
proposed implementation
-----------------------
Types come in three flavors: primitive, Primitive subtype, or customer .
Every Primitive subtype reduces to a sequence of types of the form
<Array>^i<plain Java type> where i is a non-negative integer. A
<plain Java type> is a primitive type or an Object. The reduction is
as follows:
1: a non-array Object or a primitive type reduces to itself
2: a Primitive subtype reduces to the concatenation of the reductions
of the types of its instance variables
3: If Type reduces to [ <array>^i1<t1>, <array>^i2<t2>, ... ] then
Type[] reduces to [ <array>^i1+1<t1>, <array>^i2+1<t2>, ... ]
This is guaranteed to terminate by noncircularity of the instance
variable type definitions and the fact that there is a finite number
of Primitive subtypes. In practice I do not expect the reduction of a
primitive subtype to reduce to more than a handful of types. We call
the result of this process the "decomposition" of the Primitive
subtype.
If a Primitive subtype P decomposes to a single type T, then the
compiler shall generate code that allocates a cell that can hold a
value of type T for each variable of type P. The compiler shall
arrange for the methods to be called on implicit arguments of type T.
Byte code for static methods is the way to accomplish this since there
is no this for an instance method call. Therefore, the "methods" of a
Primitive subtype compile as static methods.
If a Primitive subtype P decomposes to a sequence of types T1, T2,
... Tn, then the compiler shall generate code that allocates n cells,
one each to hold a value of each of the Ti types, for each variable of
type P. The compiler shall arrange for the methods to be called on a
sequence of implicit arguments of types Ti. For each of the explicit
arguments of a Primitive subtype several formal parameters, one for
each of that subtype's Ti's, shall be passed instead. For example,
double magnitude(Complex C){ .. real .. imag .. }
compiles into byte code appropriate for
static double magnitude(double real, double imag){ .. real .. imag .. }
Do note that an array of Complex decomposes to two arrays, each of
double.
Because the byte code fundamentally can't return multiple values [as
in the LISP style], return values of Primitive subtypes which
decompose to multiple types must be returned in some other manner. I
propose hidden formal parameters to methods returning such a value.
The compiler shall generate one mutable boxed primitive type instance
for each primitive value returned as a Primitive subtype, except for
the first type T1 which can be returned as the return value. [Owen,
help me word this?] Although these boxed types must be allocated,
they only need to be allocated once per calling method invocation at
worst, and in some cases a method would be able to pass some of the
hidden boxed primitive types to its callees and therefore avoid extra
allocations. Another possibility is to have instance variables on
Thread that maintain per-Thread lists of mutable boxed primitive type
instances for this purpose. So
Complex add(Complex addend){ ... }
compiles as if it were
static double add(boxed_double imag_ret, double real, double imag,
double addend_real, double addend_imag)
{ ... }
There is one other small detail. The == operator must generate a call to
equals() when the operands are compatible LVSOs.
Extending Primitive subtypes:
-----------------------------
I do not propose that type information be kept with LVSOs. You may
therefore not assign a DollarsAndCents to a Money. However,
inheritance is useful even without polymorphism. I propose the Ada
convention that an operation on a supertype be extended to the subtype
by replacing the supertype by the subtype everywhere in the signature,
including the result type. An operation will not be extended if any
part of the signature is already of the subtype, or if the result is
of the supertype and the subtype adds instance variables.
Overloaded Operators:
---------------------
It doesn't look like operator overloading will be added to Java any
time soon. However, if it ever is added, say with C++ syntax, you
might want to be able to engage an operation on its right-hand
operand. The C++ convention of declaring a variant of a method
applicable to a different syntax by adding a bogus int formal
parameter may serve us well here. For example, scalar multiplication
of a Complex might look like this:
Complex operation*(double scalar)
{
return Complex(real * scalar, imag * scalar);
}
Complex operation*(double scalar, int i)
{
return Complex(real * scalar, imag * scalar);
}
...
Complex C(1.0, 2.0)
... C * 3.0 // engages Complex operation*(double scalar)
... 3.0 * C // engages Complex operation*(double scalar, int i)
Appendix A
public final class Apples
{
private int number;
public Apples(int i)
{
number = i;
}
public Apples plus(Apples addend)
{
return Apples(number + addend.number);
}
// you get this one for free; it's placed here for emphasis
public boolean equals(Apples comparand)
{
return number == comparand.number;
}
...
}
public final class Oranges
{
private int number;
public Oranges(int i)
{
number = i;
}
public Oranges plus(Oranges addend)
{
return Oranges(number + addend.number);
}
// you get this one for free; it's placed here for emphasis
public boolean equals(Oranges comparand)
{
return number == comparand.number;
}
...
}
Appendix B
public class Money
{
private long cents;
public Money(double amount)
{
cents = amount * 100 + 0.1;
}
private Money(long amount)
{
cents = amount;
}
public Money add(Money addend)
{
return Money(cents + addend.cents);
}
public boolean equals(Money comparand)
{
return cents = addend.cents;
}
...
}
public final class DollarsAndCents extends Money
...
public final class Euros extends Money
...
Appendix C
public final class Complex
{
private double real;
private double imag;
public Complex(double _real, double _imag)
{
real = _real; imag = _imag;
}
public Complex add(Complex addend)
{
return Complex(real + addend.real, imag + addend.imag);
}
public boolean equals(Complex comparand)
{
return real == comparand.real && imag = comparand.imag;
}
...
}
notes
[1] Bruce Eckels, "Thinking in Java", Prentice Hall, 1998,
Pp. 1060-1061
These timings were performed on Sun's JDK1.1.4 on a Pentium Pro
200MHz.
(Review ID: 48962)
=====================================
Posted Date : 2006-02-17 22:24:17.0
=================================
Posted Date : 2005-07-22 03:26:23.0
|
|
Work Around
|
Since this is a request for an enhancement, this section might
not really be appropriate. The alternatives, however, are:
1: Create objects and pay the performance penalty
2: Create the objects and reuse them, managing your own storage and
having the same dangling reference problem you get in languages
like C++ [although there would be no memory leaks]
3: Maintain clusters of local variables; in this example you might
have double ac_signal_real; double ac_signal_imag; where you
would like to have written Complex ac_signal; this is what you
would like the compiler to do for you.
======================================================================
|
|
Evaluation
|
This is exactly the wrong thing to do. We do NOT want Java to be more like
Ada. There have been several proposals for lightweight objects. I believe that
in the end this will be dealt with via optimization technology (IBM has had
great results in this area). In any case, there are probably lighter weight
proposals for light weight objects.
xxxxx@xxxxx 1999-05-04
|
|
Comments
|
Submitted On 16-MAR-1999
Ray Burns
I agree that it would be great to give the compiler the _option_ to be able to
store such values in registers and on the stack. But I don't like the idea of
making the Java language substantially more complex.
Instead of coming up with a completely new type, why not just provide a
"no identity" syntax to signify to the compiler/runtime that
"==" need not work correctly on objects of this class.
Any class which has no non-final fields is inherently immutable. The only
reason today's compilers can't pass such classes around on the stack and in
registers is that somebody might want to "==" one of them at some
point. Thus, the following code would not work as expected:
class Abc
{
final x;
Abc(int x) { this.x=x; }
}
class Xyz
{
void doSomething()
{
Abc a = new Abc(1);
Abc b = a;
if(a == b) ...
}
}
A possible syntax change to indicate "no identity" would be:
class Abc noidentity
{
...
}
Now the above "==" could be flagged as an error. Or it could simply
return false and the definition of "==" could be suitably modified to
say that: If two objects of a class declared with the "noidentity"
keyword are compared using ==, the result may be false, even if both references
came from the same "new".
The tricky part is when these "no identity" objects are cast to class
Object. Once cast, a bona-fide object must be created for them, so that
references can be treated in the same way. In the following code, an Abc is
cast to Object twice. Any reasonable implementation would end up with two
identical heap-allocated Abc objects, one referernced by a1 and the other by
a2.
Abc a = ...;
Object a1 = a;
if(/*something falsish*/) a=...;
Object a2 = a;
...
vector.add(a1);
if(a1 != a2)
vector.add(a2);
I don't see any way this can be handled well without some modification of the
implementation of ==, which would severely impact run-time speed of much
existing code. My inclination is to ignore the problem and just let == fail in
these situations.
In any case, I think that modifying == would be more palatable than introducing
the language complexities of a totally new way of defining primitive types.
In your system, I see no reason to eliminate the use of the "new"
keyword for constructing primitive types. The concept that "new"
must always allocate on the heap is a C++ concept, and need not carry so
literally to Java.
Submitted On 30-MAR-1999
MartDesruisseaux
I'm very happy to see this proposal. The lack of an efficient way to work with
small mathematical object is one of the worst "Java killer" around
me. Many peoples refuse to take a look to Java ("no way!" they say)
and will continue to use Fortran as long as Java has no efficient way to handle
complex numbers. I know it may bring a new level of complexity, but LVSO are
really important enough for that. I'm sure the Ray Burns's comment above is an
acceptable solution too, but I'm not sure it would satisfied me and my
colleagues. I strongly support this proposal!
Submitted On 10-JUN-1999
kornai
At the risk of getting myself into some hot water,
let me say simply that the real issue is the
contract enjoyed by C/C++ structs that guarantees
efficient layout in memory (modulo word boundry
padding). People in the numbercrunching world need
_something_ that upholds this contract, and this
proposal does as nice a job of reconciling this
requirement with the spirit of Java (where direct
memory control is viewed as evil) as can be hoped
for. No religious flames please.
Submitted On 08-SEP-2000
hwc
It's a shame this proposal hasn't got more votes - few people will see it until it's in the top 25.
Personally I'd be happy with just the first stage, being able to define a data structure within
a class without the overhead of creating a whole new (inner) class.
I wonder if Microsoft's C#, which seems to have retained struct for "light weight" objects, may
influence people's opinions one way or the other?
Submitted On 04-OCT-2000
hwc
Some optimization may be possible without changing the
existing syntax but it's unlikely to be straighforward. Ray
Burns' "no identity" proposal might help with "==" but does
not address the fundamental problem of assignment.
With Java as it stands, any object passed to any function
is passed by reference. This reference can be stored or
passed to any other function. Unless the compiler can be
sure those functions will not store that reference it must
create the object on the heap.
P.S. I cannot comment on Java becoming more like Ada, but
allowing the programmer to pass an object by reference as
well as by value goes back at least as far as Algol 68.
Submitted On 14-AUG-2001
tbreuel
You cannot address this via "optimization technology". The
primary significance of "expanded classes" is not
eliminating a few heap allocations, or manipulating a few
objects inside a function, or putting a few values into
registers. Instead, it is for the choice of representation
of large arrays of small structures. A compiler cannot on
the fly decide that a 500M array might be better represented
after all using expanded classes.
The current representation scheme adds lots of memory and
CPU overhead to data-intensive applications, and this
disparity will simply not go away: you pay at least one
32bit word of overhead per object (the pointer itself), and
in the case of the actual Java runtime, it is more like 3-4
words of overhead (the pointer to the object plus its
header). In fact, on 64bit machines, the overhead will get
even worse.
Again, it is up to Sun to decide where the priorities are.
However, it is important to keep in mind that many high-end
server applications involve statistical decision rules and
statistical string analysis, and these kinds of small data
types occur frequently in such systems. If Java wants to be
a competitive choice in such fields as text and data mining
and bioinformatics, this issue must be addressed.
Submitted On 20-FEB-2002
walter_bruce@hotmail.com
I agree that something like this is needed. The details may
may differ (I don't care much about inheritance or operator
overloading), but the ability to extend the set of light-
weight, directly embeddable objects beyond the set of
primitives that comes with Java by default, is definitely
needed.
The issue is not how to do == (probably should just default
to performing == on all the fields in the lightweight
object) or whether compiler optizations could do this
automatically (not generally possible because it involves
the memory layout of objects that may escape scope), but an
intrinsically more efficient memory layout for small
objects and especially for objects that contain small
objects. Consider, for example, two ways of coding a
triangle class that contains 3 vertices:
class Triangle {
Point3d vertex0 = new Point3d();
Point3d vertex1 = new Point3d();
Point3d vertex2 = new Point3d();
. . . (constructors, methods, etc)
}
class Triangle {
double vertex0X,vertex0Y,vertex0Z;
double vertex1X,vertex1Y,vertex1Z;
double vertex2X,vertex2Y,vertex2Z;
. . . (constructors, methods, etc)
}
The first using embedded heavyweight objects suffers
significant memory and performance penalties. It requires
roughly 36 extra bytes for storage (3*4 for each of the
embedded object references plus 3*8 for the heap overhead
of allocating three additional entities on the heap) and
requires an extra level of (performance sapping)
indirection whenever accessing the vertices.
The second loses the encapsulation of the vertices
resulting in harder to write and maintain code and lots of
code duplication in the methods that use these vertices
because we lose access to the predefined methods of Point3d.
If Point3d could be implemented as a lightweight object, we
could finally get the best of both worlds. Better memory
layout, reduced storage requirements, reduced levels of
indirection, good encapsulation, and ability to reuse the
methods associated with Point3d.
Submitted On 06-MAR-2005
AlexLamSL
against...
Submitted On 07-FEB-2006
arumad
it would be good (in dreams) if hierarchy of java data types will have only one root - type void.
it's annoying what primitive types have no place in common data types hierarchy tree and covariant returns do not working on them.
Submitted On 11-JUN-2007
I'm for. The evaluation is also a bit .. arrogant. Who is writing it and thinking they represent the whole community.
The issue that is being addressed is that of composition vs association, and perhaps that gives an indication of a different way to approach it.
Firstly, my reason for being very positive towards this. I have just finished reducing the memory footprint of a java-based database to half, by implementing each record as a byte array with each stored object broken down and written as it's component float, int etc.
This is HELL. For each object, I have to write how it should efficiently encode and decode itself from a byte array.
Having objects derived from Primitive (or perhaps declared using a primitive keyword) would clarify some otherwise un-documented meaning. That there will never be a second reference to that object.
If we allowed, for example, Complex, to be able to be referenced as a normal object, but also declared as, e.g. 'private primitive Complex a;', then it this would give us the ability to map write assignments to being a field by field copy of the 'normal' Complex instance, and read assignments being that of creating a new object as a clone().
e.g.
class Line
{
private primitive Point start; // object allocated within Line, not separately on heap)
private primitive Point finish;
public Line( Point a, Point b )
{
this.start = a; // will effectively do an arraycopy from &a to this.start
this.end = b; // ditto
}
public Point getStart()
{
return this.a; // returns this.a.clone() to create a new object from the local primitive
}
}
PLEASE NOTE: JDK6 is formerly known as Project Mustang
|
|
|
 |