JDK-8223002

Keyword Management for the Java Language

    Details

    • Type: JEP
    • Status: Draft
    • Priority: P4
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: specification
    • Labels:
      None
    • Author:
      Alex Buckley
    • JEP Type:
      Informational
    • Exposure:
      Open
    • Subcomponent:
    • Scope:
      SE
    • Discussion:
      jdk dash dev at openjdk dot java dot net
    • Effort:
      M
    • Duration:
      M

      Summary

      Evolving the Java language often means new keywords for new features, but new keywords risk breaking existing programs. To balance compatibility and readability, a new kind of keyword may be used: a hyphenated keyword that is a compound of pre-existing keywords and identifiers, such as non-final, break-with, and short-circuit.

      Note: All examples in this JEP are intended solely to illustrate a syntactic form under discussion. They are not intended to suggest that any particular language feature is being considered for inclusion in Java now or in the future.

      Goals

      • Explore the syntactic options open to Java language designers for denoting new features.

      • Solve the perpetual problem of keyword tokens being so scarce and expensive that language designers have to constrain or corrupt the Java programming model to fit the keywords available.

      • Advise language designers on the style of keyword suited to different kinds of features.

      Non-Goals

      • In any proposal for new elements of Java syntax, it is important to avoid being influenced by the (often strawman) syntax of language features presently in development.

      • It is not a goal to optimize new elements of Java syntax for ease of implementation by compiler developers.

      Motivation

      A keyword is a sequence of ASCII letters that cannot be used as an identifier in Java programs. Java uses a small set of keywords to denote the most fundamental features of the language:

      • Primitive types: boolean, byte, char, double, float, int, long, short

      • Reference types and their members: package, class, interface, extends, implements, throws, enum, abstract, final, native, private, protected, public, static, strictfp, synchronized, transient, void, volatile

      • Statements: assert, break, case, catch, continue, default, do, else, for, finally, if, import, return, switch, throw, try, while

      • Expressions: instanceof, new, super, this

      Over time, Java language designers face a challenge: the keywords conceived for the features of Java 1.0 are rarely suitable for denoting new features. There are several obvious techniques for addressing this problem:

      • Eminent domain: Reclassify an identifier as a keyword, such as assert in Java 1.4 and enum in Java 1.5. A similar but more conservative move is to reclassify some unusual set of identifiers as keywords, such as identifiers that begin with two underscores (e.g., __nonnull), a style often seen in feature prototypes and inspired by reserved identifiers in C.

      • Overload: Reuse an existing keyword for a new feature. For example, reuse the default keyword from a switch statement to declare an annotation element and a default method. As another example, reuse the break keyword from a switch statement to yield a value as the result of a switch expression (break <value>; which unfortunately looks like break <label>;).

      • Distort: Find a syntax that doesn't require a new keyword, such as @interface to declare an annotation type.

      • Smoke and mirrors: Create the illusion of new keywords in new contexts through various linguistic heroics, such as treating the identifier var as a type name but only in local variable declarations, or reclassifying the identifier module as a keyword but only in module declarations.

      For most new features, all of these techniques are on the table -- but most of the time, none are very good. Given that all of these techniques are problematic, and there is not even a least-problematic technique that works in all situations, it is desirable to try to expand the set of syntactic forms that serve as keywords. Otherwise, the lack of reasonable techniques for extending the syntax of the language will become a significant impediment to language evolution.
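
      Three of the techniques above can be observed in Java as it exists today. The following is a minimal sketch, assuming Java 10 or later; the names KeywordTechniques, Priority, and Greeter are hypothetical and chosen only for illustration. It shows default serving three unrelated features, @interface standing in for a keyword, and var working as both an identifier and a type name.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Priority
class KeywordTechniques implements Greeter {
    // "Overload": the original use of default, as a switch label.
    static String describe(int level) {
        switch (level) {
            case 1:
                return "urgent";
            default:
                return "routine";
        }
    }

    // "Smoke and mirrors": var remains an ordinary identifier for variables,
    // yet it acts as a type name in a local variable declaration (Java 10+).
    static int contextualVar() {
        int var = 20;
        var doubled = var * 2;
        return doubled;
    }

    public static void main(String[] args) {
        int level = KeywordTechniques.class.getAnnotation(Priority.class).level();
        System.out.println(describe(level));                       // routine
        System.out.println(new KeywordTechniques().greet("Duke")); // Hello, Duke
        System.out.println(contextualVar());                       // 40
    }
}

// "Distort": @interface declares an annotation type without a new keyword.
// "Overload": default supplies the default value of an annotation element.
@Retention(RetentionPolicy.RUNTIME)
@interface Priority {
    int level() default 4;
}

interface Greeter {
    // "Overload": default also marks an interface method that has a body.
    default String greet(String name) {
        return "Hello, " + name;
    }
}
```

      A reader has to use the surrounding context to tell the three meanings of default apart, which is the readability cost that overloading carries.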

      In addition, modifiers like static and final make up a quarter of all keywords, but the set of modifiers is not complete; there is no way to say "not static" or "not final". Consequently, it is not possible to create features where variables or classes are final by default, or members are static by default, because there is no way to denote the opt out of "not static" or "not final". Leaving a feature out of Java for reasons of simplicity is fine; leaving it out because there is no way to denote the obvious semantics is not. This is a constant problem in evolving the language, and an ongoing tax paid by every Java developer.

      Description

      Syntax in feature design

      The best syntax for a new feature -- whether declaration, statement, or expression -- is inherently feature-specific.

      Some features are denoted best with tokens other than keywords: the operator -> for a lambda expression, the separator :: for a method reference expression, and the separator ... for a varargs parameter declaration. Also, features that support built-in types tend to find their own syntactic ground independent of keywords: the literals true, false, and null, the delimiter """ for multi-line string literals, the prefix 0b for binary literals, and the suffixes L, F, D, etc. for numeric literals.

      Most features, though, are denoted best with keywords whose length, alphabet, and tone align with pre-existing keywords. That means 2-20 ASCII letters which spell out a simple noun, verb, or adjective of U.S. English. Traditionally, there were two kinds of keyword that met these constraints:

      • Classic keyword: A sequence of Java letters that is always tokenized as a keyword, never as an identifier.

      • Contextual keyword: A sequence of Java letters that is tokenized as a keyword in certain contexts but as an identifier in all other contexts (e.g., module, a restricted keyword in Java 9). Alternatively, a sequence of Java letters that is always tokenized as an identifier but for which special provision is made in certain contexts (e.g., var, a restricted identifier for local variables in Java 10).

      Each classic and contextual keyword is unitary -- an individual token -- but this JEP opens up new syntactic ground by allowing a keyword to be a compound of multiple tokens, separated by delimiters. The delimiter is a familiar ASCII character that is not a Java letter, namely - (hyphen). This leads to two kinds of hyphenated keyword:

      • Hyphenated classic keyword: A keyword that is formed by using a hyphen to join a (unitary) classic keyword with identifiers, literals, other (unitary) classic keywords, and (unitary) contextual keywords.

      • Hyphenated contextual keyword: A keyword that is formed by using a hyphen to join a (unitary) contextual keyword with identifiers, literals, and other (unitary) contextual keywords.

      Hyphenated keywords

      Hyphenation admits a rich array of phrases relevant to current and potential constructs of the Java language.

      Hyphenated classic keywords

      • non-final (if the default for method parameters was to be made final)
      • break-with (to yield a value from a switch expression)
      • package-private (the default accessibility for class members)
      • public-read (to denote "publicly readable, privately writable")
      • enum-class and annotation-interface (versus enum and @interface)
      • value-class and record-class (versus value class and record)
      • default-value (for elements of an annotation type)
      • this-class (to denote the class literal for the current class)
      • this-return (to mark a setter or builder method as returning its receiver)
      • short-circuit (perhaps useful for fibers)

      Hyphenated contextual keywords

      • non-null
      • read-only
      • lazy-var (to declare a lazy final field)
      • eventually-true (perhaps useful for lazy final fields)

      Hyphenated keywords are terminal symbols of the syntactic grammar of the Java language. This presents a challenge for the lower-level lexical grammar of the Java language, where input characters and line terminators are tokenized into keywords, identifiers, literals, operators, and separators. The easy case is a hyphenated keyword that starts with a classic keyword: after tokenizing the Java letters that make up the classic keyword, the lexer has to tokenize a trailing - character and Java letters not as an operator and an identifier, but rather as part of the hyphenated keyword. The hard case is a hyphenated keyword that starts with something other than a classic keyword, because the lexer has to realize that a sequence of characters which it tokenized as an identifier (e.g., non) is in fact, after further tokenization of an operator (-) and a classic keyword (final), the start of a hyphenated keyword.
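
      The lookahead described above can be sketched with a toy tokenizer. This is an illustration under stated assumptions, not how any Java compiler works: the class HyphenLexer, its HYPHENATED set, and the methods tokenize and wordEnd are hypothetical, the keyword set simply reuses examples from this JEP, and the sketch ignores comments, literals, and multi-character operators. After reading a word, it peeks past each following "-word" segment and merges the run into one token only if the joined text is a known hyphenated keyword.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class HyphenLexer {
    // Hypothetical keyword set, taken from the illustrative examples in the text.
    static final Set<String> HYPHENATED =
            Set.of("non-final", "break-with", "non-null", "read-only");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int end = wordEnd(src, i);
                // Look ahead over "-word" segments and remember the longest
                // prefix that spells a known hyphenated keyword.
                int probe = end;
                int best = -1;
                while (probe + 1 < src.length()
                        && src.charAt(probe) == '-'
                        && Character.isJavaIdentifierStart(src.charAt(probe + 1))) {
                    probe = wordEnd(src, probe + 1);
                    if (HYPHENATED.contains(src.substring(i, probe))) {
                        best = probe;
                    }
                }
                if (best > 0) {
                    end = best; // one token: the hyphenated keyword
                }
                tokens.add(src.substring(i, end));
                i = end;
            } else {
                tokens.add(String.valueOf(c)); // single-character operator or separator
                i++;
            }
        }
        return tokens;
    }

    // Index just past the run of identifier characters starting at 'start'.
    static int wordEnd(String src, int start) {
        int j = start;
        while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) {
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        // prints [non-final, int, x, =, a, -, b, ;]
        System.out.println(tokenize("non-final int x = a-b;"));
    }
}
```

      In this sketch non-final starts with the identifier non, so it exercises the hard case: the lexer cannot commit to an identifier token until it has looked past the hyphen. A production lexer would presumably bound that lookahead by the length of the longest hyphenated keyword.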

      A future version of this JEP may suggest notions of compound keywords other than hyphenated keywords, such as keywords joined with + or : delimiters.

      Keyword management

      The following policy is commended to Java language designers:

      1. Use a hyphenated classic keyword when you want to introduce a keyword in the middle of code, at a place where an identifier may occur.

      2. Use a hyphenated {classic, contextual} keyword when you want a keyword at a declaration site (class, field, method).

      3. Use a unitary {classic, contextual} keyword only in the most extreme cases where no hyphenated keyword is suitable.

      The following subsections provide rationale for this policy.

      Avoid classic keywords

      While it may be legal for language designers to define i as a keyword in a future version of Java, it would likely break every program in the world, since i is used so commonly as an identifier. (When the assert keyword was added in 1.4, it broke every testing framework.) The cost of remediating the effect of such an incompatible change varies as well: invalidating a name choice for a local variable has a local fix, but invalidating the name of a public type or an interface method might well be fatal.

      Additionally, the keywords that language designers are likely to want to reclaim are often those that are popular as identifiers (e.g., value, var, method), making such fatal collisions more likely. In some cases, if the keyword candidate in question is rarely used as an identifier, designers might opt to take the source-compatibility hit -- but candidates that are unlikely to collide (e.g., usually_but_not_always_final) are probably not the keywords anyone is hoping for.

      Realistically, the space of identifiers is unlikely to be a well that language designers can draw from very often to find keywords, and the bar must be very high.

      As a historical note, const and goto have been keywords since Java 1.0, even though they are not used by any language feature. They were defined as keywords not because a future version of Java was expected to use them, but because reserving them supported a broader goal: migration from the then-preeminent C++ to the then-fledgling Java. Per the Java Language Specification in 1996, this allowed "a Java compiler to produce better error messages if these C++ keywords incorrectly appear in programs". (Namely, if const had been an identifier, then const int x = ... would have been flagged by a Java compiler as "Error, 'const' found where a keyword was expected", which is incongruous to a C++ developer who thinks const is a keyword; by making const a keyword in Java, Java compilers were forced to recognize it and flag "Error, 'const' keyword not allowed here", which is more comprehensible to a C++ developer.) Given the vast amount of code now written in Java, and the source incompatibility of a new classic keyword, there would be no justification for eagerly defining a classic keyword to support migration from another language. For example, it would be unacceptable to reclassify function from an identifier to a keyword in order to improve error messages for code copy-pasted from ECMAScript.

      Cautiously consider contextual keywords

      At first glance, unitary contextual keywords (and their friends, reserved type names) appear to be a magic wand: they let language designers create the illusion of new keywords without breaking existing programs. However, the positive track record of unitary contextual keywords hides a great deal of complexity and distortion.

      The process of introducing a unitary contextual keyword is not a simple matter of choosing a word and adding it to the grammar; each one requires an analysis of potential current and future interactions. Each grammar position is its own story: contextual keywords that might be used as modifiers (e.g., readonly) have different ambiguity considerations than those that might be used in code (e.g., a match expression). While a small number of special situations can be managed in a specification or a compiler, the more heavily that unitary contextual keywords are used, the more likely there would be more significant maintenance costs and longer bug tails.

      Beyond specifications and compilers, unitary contextual keywords distort the language for IDEs. IDEs often have to guess whether an identifier is meant to be an identifier or a unitary contextual keyword, and they may not have enough information to make a good guess until they have seen more input. While this is easy to dismiss as “not my problem”, in reality, it results in worse code highlighting, auto-completion, and refactoring abilities for everybody. (IDEs have the same trouble with hyphenated contextual keywords too.)

      Finally, each identifier that is a candidate for dual-purposing as a unitary contextual keyword may have its own special considerations. For example, the use of var as a restricted identifier is justified only because the naming conventions for type names are so broadly adhered to. Using a hyphenated contextual keyword rather than a unitary contextual keyword can sidestep these considerations, since the hyphenated phrase has never been used as an identifier, though the ambiguity issue remains.

      In summary, unitary contextual keywords are a tool in the language design toolbox, but they should be used with care.

      Prefer hyphenated keywords

      Hyphenated {classic, contextual} keywords create less trouble than unitary contextual keywords because the lexer can tell with fixed lookahead whether A-B should become three tokens (identifier, operator, identifier) or one (hyphenated keyword), whereas arbitrary lookahead may be required to tokenize an identifier as a unitary contextual keyword. There is less trouble for parsing as well; for example, non-null cannot be confused for a subtraction expression. In sum, this gives a lot more room for creating new, less-conflicting keywords. Happily, these new keywords are likely to be good names, as many of the missing concepts that might be added to Java can fundamentally be described by their relationship to pre-existing concepts (e.g., non-null).

      There is a technical constraint on the space of hyphenated keywords, because some terms of the form A-B already have semantic meaning as expressions or statements:

      • Expressions that use a classic keyword as their first token and may appear on the RHS of a subtraction. For example, the notional hyphenated keyword lazy-int would clash with pre-existing code that uses the expression int.class in a subtraction, as in int lazy = ...; int x = lazy-int.class.hashCode();. Similarly, the notional hyphenated keyword object-new would clash with pre-existing code that uses a new expression in a subtraction, as in int object = ...; int x = object-new Foo().f;

      • Statements that take an expression as an operand. For example, the notional hyphenated keyword return-never would clash with pre-existing code that returns the negation of the numeric variable never.
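
      The clashes listed above can be checked against a current compiler. The following sketch compiles and runs as ordinary Java today; the class HyphenClashes, the nested class Foo with its field f, and the method negate are hypothetical names filled in around the fragments quoted in the bullets.

```java
class HyphenClashes {
    static class Foo {
        int f = 3;
    }

    // Parses today as "return (-never);", not as a notional keyword return-never.
    static int negate(int never) {
        return-never;
    }

    public static void main(String[] args) {
        int lazy = 10;
        // Parses today as lazy - (int.class.hashCode()).
        int x = lazy-int.class.hashCode();

        int object = 10;
        // Parses today as object - (new Foo().f).
        int y = object-new Foo().f;

        System.out.println(x == lazy - int.class.hashCode()); // true
        System.out.println(y);                                // 7
        System.out.println(negate(5));                        // -5
    }
}
```

      Because each of these forms already has a meaning, reserving lazy-int, object-new, or return-never as a keyword would silently change or break such code.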

      These examples show type-correct expressions and statements, but there are also type-incorrect expressions and statements that would clash with hyphenated keywords. That is, some terms of the form A-B are not semantically meaningful, but they are syntactically valid, and overloading them as hyphenated keywords would make lexing and parsing very difficult. In particular, the terms are:

      • Expressions that use a classic keyword as their last token. For example, consider the reference-typed expressions Foo.class, Foo.this, and Foo::new -- the subtractions Foo.class-day, Foo.this-day, and Foo::new-day are valid Java syntax when day is a numeric variable, but they are not semantically meaningful because subtraction does not accept a reference-typed expression as its left operand. Overloading the syntax by introducing a notional hyphenated keyword class-day, this-day, or new-day would be an unreasonable burden on compiler and IDE vendors.

      • Statements that take an expression as an operand. For example, the statement throw-quickly is valid Java syntax when quickly is a variable in scope, but it is not semantically meaningful (-quickly is not a Throwable regardless of the type of quickly). Overloading the syntax by introducing a notional hyphenated keyword throw-quickly would also roil compiler and IDE vendors.

      Formally, the hyphenated classic keyword A-B would be problematic if A is one of {assert, case, class, new, return, this, throw}, or if B is one of {boolean, byte, char, double, float, int, long, new, short, super, switch, this, void}.
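
      The rule above is mechanical enough to express as a small check. The following is a sketch only: the class HyphenatedKeywordCheck and the method isProblematic are hypothetical, the two sets are copied from the rule as stated, and the check assumes a two-part candidate of the form A-B, splitting at the first hyphen.

```java
import java.util.Set;

class HyphenatedKeywordCheck {
    // First components that already begin an expression or statement.
    static final Set<String> PROBLEM_FIRST =
            Set.of("assert", "case", "class", "new", "return", "this", "throw");

    // Second components that can already follow a '-' operator.
    static final Set<String> PROBLEM_SECOND =
            Set.of("boolean", "byte", "char", "double", "float", "int", "long",
                   "new", "short", "super", "switch", "this", "void");

    // Returns true if the candidate A-B falls under the rule stated in the text.
    static boolean isProblematic(String candidate) {
        int hyphen = candidate.indexOf('-');
        if (hyphen <= 0 || hyphen == candidate.length() - 1) {
            throw new IllegalArgumentException("expected the form A-B: " + candidate);
        }
        String a = candidate.substring(0, hyphen);
        String b = candidate.substring(hyphen + 1);
        return PROBLEM_FIRST.contains(a) || PROBLEM_SECOND.contains(b);
    }

    public static void main(String[] args) {
        System.out.println(isProblematic("return-never")); // true
        System.out.println(isProblematic("lazy-int"));     // true
        System.out.println(isProblematic("non-final"));    // false
    }
}
```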

      Alternatives

      A strategy to mitigate the cost of a new classic keyword would be to have a mechanism that allows the keyword to still be used as an identifier. This would have allowed developers in the Java 1.4 era to fix up their variables called assert so that their programs still compiled. However, any such mechanism would bring its own complexity and interactions with other features, and the idea of asking developers to revisit code in this way is undesirable. As a matter of interest, Kotlin allows a keyword to be used as an identifier by enclosing the keyword in backticks, but the goal is specifically to allow Kotlin code to use Java declarations whose names are identifiers in Java but keywords in Kotlin, such as is and when. General-purpose expansion of the Kotlin keyword space is accomplished with soft keywords, which map to unitary contextual keywords in this JEP.

      Reusing the same classic keyword for different features has ample precedent in Java. For example, final is (ab)used to mean "not mutable" and "not overridable" and "not extensible". Using a pre-existing keyword in a new feature is sometimes natural and sensible, but usually it is not the first choice. Over time, as the range of demands placed on the keyword space expands, this may descend into the ridiculous; no one wants to use null final as a way of negating finality. (While one might think such things are too ridiculous to consider, there were serious-seeming suggestions during JEP 325 to use new switch to describe a switch with different semantics, presumably to be followed by new new switch in ten years.)

      One way to live without making new keywords is to stop evolving Java entirely. While there are some who think this is a fine idea, doing so because of the lack of available tokens would be a silly reason. Java has a long life ahead, and Java developers are excited about new features that enable them to write more expressive and reliable code.

      Risks and Assumptions

      Some Java developers will have a negative reaction to the idea of hyphenated keywords, while others may accept the idea but dislike the hyphenated suggestions that emerge over time for particular language features. However, this risk is likely to diminish over time, because many such reactions are possibly-transient responses to unfamiliarity.

      Java has a long tradition of declarations having default properties (e.g., package accessibility for classes, mutability for fields, and concreteness for methods) and then using keywords to modify the properties of a given declaration (e.g., public, final, abstract). Hyphenated keywords could subvert this tradition by "merging" a modifier and the declaration into a single term, such as value-class D {..} rather than value class D {..}. Similarly, a hyphenated keyword could simulate a modifier on a modifier (public-read, non-final, semi-abstract) when it may be better to find a unitary term that describes the desired concept and introduce it as a contextual or even classic keyword.
