This is a small but important feature. There are two kinds of use for exotic identifiers:
- Writing Java code to reference artifacts in other languages, e.g. calling a Ruby "+" function
- Generating Java code to represent artifacts of other languages, e.g. generating a Java class whose methods are named for XML tags and attributes, of which "class" is a popular example.
Some spec points:
- An identifier is a SIMPLE identifier or an EXTENDED identifier:
SimpleIdentifierCharacters but not a Keyword or BooleanLiteral or NullLiteral
// JavaLetter and JavaLetterOrDigit are unchanged from JLS3
// Extended identifiers must not be empty, and should permit standalone \ and novel escape sequences
// like \| and \? in support of John's mangling scheme.
// Thus ExtendedIdentifier cannot be #"StringLiteral", because that would allow empty strings,
// disallow a standalone \ (through StringLiteral's use of StringCharacter), and be too restrictive
// about legal escape sequences (only \b, \t, \n et al., as per 3.10.6).
InputCharacter but not / or . or ; or < or > or [ or "
- A simple identifier is an unlimited-length sequence of Java letters and digits...
- An extended identifier is a # ASCII character, then a " ASCII character, then an unlimited-length sequence of Unicode characters (excluding / (\u002F) and . (\u002E) and ; (\u003B) and < (\u003C) and > (\u003E) and [ (\u005B) and " (\u0022)), then a " ASCII character.
- The body of a simple identifier is its sequence of Java letters and digits. The body of an extended identifier is the sequence of Unicode characters between the " tokens. Two identifiers are the same only if their bodies have the same Unicode character at corresponding positions. // See also 6.5.
- The body of an extended identifier can use the character and string escape sequences (3.10.6) to represent certain special characters. Outside those escape sequences, the backslash Unicode character (\u005C) is not treated specially in an extended identifier.
- Unlike a simple identifier, an extended identifier may have the same spelling as a keyword or any literal (with the pedantic exception of the empty string literal).
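The spec points above can be sketched as a small recognizer. This is only an illustration under stated assumptions, not part of any compiler: the class name `ExtendedIdentifiers` and its methods are hypothetical, escape sequences inside the body are not decoded, and a token that does not start with # is simply assumed to be a valid simple identifier.

```java
import java.util.Optional;

// Hypothetical helper illustrating the spec points; not a real javac API.
final class ExtendedIdentifiers {
    // Characters excluded from the body of an extended identifier: / . ; < > [ "
    private static final String EXCLUDED = "/.;<>[\"";

    /** Returns the body of an extended identifier spelled #"...", or empty if it is malformed. */
    static Optional<String> body(String token) {
        // Needs the # and " prefix, at least one body character, and a closing ".
        if (token.length() < 4 || !token.startsWith("#\"") || !token.endsWith("\"")) {
            return Optional.empty();
        }
        String body = token.substring(2, token.length() - 1);
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            // InputCharacter excludes line terminators; the spec excludes the rest.
            if (c == '\n' || c == '\r' || EXCLUDED.indexOf(c) >= 0) {
                return Optional.empty();
            }
        }
        return Optional.of(body);
    }

    /** Body of any identifier; a simple identifier's body is its own spelling (assumed valid). */
    static Optional<String> bodyOf(String token) {
        if (token.startsWith("#")) {
            return body(token);
        }
        return token.isEmpty() ? Optional.empty() : Optional.of(token);
    }

    /** Two identifiers are the same only if their bodies match character for character. */
    static boolean sameIdentifier(String a, String b) {
        Optional<String> left = bodyOf(a);
        Optional<String> right = bodyOf(b);
        return left.isPresent() && left.equals(right);
    }
}
```

Under this sketch, #"class" and #"+" are accepted, #"" and #"a.b" are rejected, and foo and #"foo" count as the same identifier because only their bodies are compared.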
The following character sequences...are reserved for use as keywords and cannot be used as ***simple identifiers***:
3.10.6 Escape Sequences for Character and String Literals
It is a compile-time error if the character following a backslash in an escape ***sequence for a character literal or string literal*** is not an ASCII...
6.2 Names and Identifiers
A simple name is ***the body of*** a single identifier. A qualified name consists of a name, a "." token, and a ***simple name***.
6.5 Determining the Meaning of a Name
// Need to ensure that any reference to an identifier in the rest of the JLS is interpreted to mean "the body of the identifier", e.g. 8.4.1 Formal Parameters "If two formal parameters of the same method or constructor are declared to have the same name (that is, their declarations mention the same Identifier),", e.g. 8.9 Enums "* The string must match exactly an identifier used to declare an enum constant in this type", e.g. 9.7 Annotations "The Identifier in an ElementValuePair must be the simple name of...", e.g. much of section 14.
7.7 Unique Package Names
// This section suggests a convention for mangling domain name characters that are not permitted in simple identifiers. I suggest this problem can be avoided by composing a package name from extended, rather than simple, identifiers.
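As a rough sketch of that suggestion, the following turns a domain name into the text of a package name, wrapping any component that is not a legal simple identifier in the #"..." form instead of mangling it. The class `PackageNames` and its methods are hypothetical names for illustration; the code assumes the domain labels contain none of the characters excluded from extended identifiers, and it relies on `javax.lang.model.SourceVersion` from the JDK to recognize identifiers and keywords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.lang.model.SourceVersion;

// Hypothetical helper; only produces the spelling of a package name as text.
final class PackageNames {
    /** Keeps a legal simple identifier as-is, otherwise spells it as an extended identifier. */
    static String component(String part) {
        boolean simple = SourceVersion.isIdentifier(part) && !SourceVersion.isKeyword(part);
        return simple ? part : "#\"" + part + "\"";
    }

    /** Reverses the domain labels and joins them into a package name. */
    static String fromDomain(String domain) {
        List<String> parts = new ArrayList<>(Arrays.asList(domain.split("\\.")));
        Collections.reverse(parts);
        List<String> spelled = new ArrayList<>();
        for (String part : parts) {
            spelled.add(component(part));
        }
        return String.join(".", spelled);
    }
}
```

For instance, `PackageNames.fromDomain("my-company.example.com")` yields com.example.#"my-company", so the hyphen survives unmangled, and a label that collides with a keyword, such as int, is spelled #"int".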