JDK-8216396

Support new Japanese era and new currency code points in java.lang.Character for Java SE 8


    Details

    • Subcomponent:
    • Resolved In Build:
      master
    • CPU:
      generic
    • OS:
      generic


        Description

        The Java SE 8 Platform uses character information from version 6.2 of the Unicode Standard. As a result, the new Japanese era code point (U+32FF), which is expected to be assigned in version 12.1 of the Unicode Standard, and the new currency code points added to the Currency Symbols block after version 6.2, up to and including version 10.0 of the Unicode Standard, are not available for use in Java 8.

        Solution
        ----------------
        Modify the specification of java.lang.Character to allow (though not require) implementations of the Java SE 8 Platform to support the new Japanese era code point and the new currency code points. In effect, the Java SE 8 Platform supports Unicode 6.2 plus these extensions.
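The following minimal sketch is not part of the proposed specification; it simply reports how the running JVM classifies the six code points discussed here. The class name ExtensionProbe is hypothetical. Its output is deliberately left unstated: under this change, a conforming Java SE 8 implementation may report U+32FF and U+20BB..U+20BF either as unassigned or with their newer properties, and later Java SE releases carry newer Unicode data altogether.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical probe: how does this JVM classify the code points named in the CSR? */
public class ExtensionProbe {

    // U+32FF (Japanese era) followed by the currency symbols U+20BB..U+20BF.
    static final int[] EXTENSION_CODE_POINTS = {
        0x32FF, 0x20BB, 0x20BC, 0x20BD, 0x20BE, 0x20BF
    };

    /** Returns one summary line per code point, keyed by "U+XXXX", in the order above. */
    static Map<String, String> probe() {
        Map<String, String> report = new LinkedHashMap<>();
        for (int cp : EXTENSION_CODE_POINTS) {
            // getName returns null for unassigned code points; concatenation prints "null".
            String summary = "defined=" + Character.isDefined(cp)
                    + " type=" + Character.getType(cp)
                    + " name=" + Character.getName(cp)
                    + " identifierStart=" + Character.isJavaIdentifierStart(cp);
            report.put(String.format("U+%04X", cp), summary);
        }
        return report;
    }

    public static void main(String[] args) {
        System.out.println("java.version=" + System.getProperty("java.version"));
        for (Map.Entry<String, String> e : probe().entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running it on different Java 8 update releases (or vendors) is one way to see the permitted variation described below.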
         
        Consequently, the behavior of fields and methods of java.lang.Character may vary across implementations of the Java SE 8 Platform when processing U+32FF and the currency code points U+20BB, U+20BC, U+20BD, U+20BE, and U+20BF, except for the following methods, which define Java identifiers:
         
        isJavaIdentifierStart(int)
        isJavaIdentifierStart(char)
        isJavaIdentifierPart(int)
        isJavaIdentifierPart(char)
         
        Code points in Java identifiers must continue to be drawn from Unicode 6.2, for source compatibility reasons.
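To illustrate the identifier carve-out, here is a hypothetical sketch that keeps the six extension code points out of identifiers even when the runtime assigns them. The class Identifier62 and its methods are invented for this example and are not how the JDK implements the rule. The sketch assumes the runtime's data is otherwise Unicode 6.2, as on a Java SE 8 implementation; on a newer runtime, all other code points still follow that runtime's Unicode version.

```java
/**
 * Hypothetical sketch: Java identifier checks that never admit the code points
 * this CSR allows beyond Unicode 6.2. Assumes the runtime's data is otherwise
 * Unicode 6.2, as on a Java SE 8 implementation.
 */
public final class Identifier62 {

    private Identifier62() {}

    /** U+32FF and U+20BB..U+20BF; U+20BA (Turkish lira) is already in 6.2. */
    static boolean isExtensionCodePoint(int cp) {
        return cp == 0x32FF || (cp >= 0x20BB && cp <= 0x20BF);
    }

    /** Unassigned in 6.2, so an extension code point never starts an identifier. */
    static boolean isJavaIdentifierStart62(int cp) {
        return !isExtensionCodePoint(cp) && Character.isJavaIdentifierStart(cp);
    }

    /** The same carve-out applied to the identifier-part check. */
    static boolean isJavaIdentifierPart62(int cp) {
        return !isExtensionCodePoint(cp) && Character.isJavaIdentifierPart(cp);
    }

    public static void main(String[] args) {
        System.out.println(isJavaIdentifierStart62('$'));    // true
        System.out.println(isJavaIdentifierStart62(0x20BF)); // false
    }
}
```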
        These changes necessitate a Maintenance Review of the Java SE 8 Platform. See the announcement [0] to the OpenJDK community.
        [0] http://mail.openjdk.java.net/pipermail/jdk8u-dev/2018-December/008324.html

        Specification
        The initial portion of the specification of the java.lang.Character class is changed from:

        /**
         * The {@code Character} class wraps a value of the primitive
         * type {@code char} in an object. An object of type
         * {@code Character} contains a single field whose type is
         * {@code char}.
         * <p>
         * In addition, this class provides several methods for determining
         * a character's category (lowercase letter, digit, etc.) and for converting
         * characters from uppercase to lowercase and vice versa.
         * <p>
         * Character information is based on the Unicode Standard, version 6.2.0.
         * <p>
         * The methods and data of class {@code Character} are defined by
         * the information in the <i>UnicodeData</i> file that is part of the
         * Unicode Character Database maintained by the Unicode
         * Consortium. This file specifies various properties including name
         * and general category for every defined Unicode code point or
         * character range.
         * <p>
         * The file and its description are available from the Unicode Consortium at:
         * <ul>
         * <li><a href="http://www.unicode.org">http://www.unicode.org</a>
         * </ul>
         *
         * <h3><a name="unicode">Unicode Character Representations</a></h3>

        to

        /**
         * The {@code Character} class wraps a value of the primitive
         * type {@code char} in an object. An object of class
         * {@code Character} contains a single field whose type is
         * {@code char}.
         * <p>
         * In addition, this class provides a large number of static methods for
         * determining a character's category (lowercase letter, digit, etc.)
         * and for converting characters from uppercase to lowercase and vice
         * versa.
         *
         * <h3><a id="conformance">Unicode Conformance</a></h3>
         * <p>
         * The fields and methods of class {@code Character} are defined in terms
         * of character information from the Unicode Standard, specifically the
         * <i>UnicodeData</i> file that is part of the Unicode Character Database.
         * This file specifies properties including name and category for every
         * assigned Unicode code point or character range. The file is available
         * from the Unicode Consortium at
         * <a href="http://www.unicode.org">http://www.unicode.org</a>.
         * <p>
         * The Java SE 8 Platform uses character information from version 6.2
         * of the Unicode Standard, with two extensions. First, the Java SE 8 Platform
         * allows an implementation of class {@code Character} to use the Japanese Era
         * code point, {@code U+32FF}, from the first version of the Unicode Standard
         * after 6.2 that assigns the code point. Second, in recognition of the fact
         * that new currencies appear frequently, the Java SE 8 Platform allows an
         * implementation of class {@code Character} to use the Currency Symbols
         * block from version 10.0 of the Unicode Standard. Consequently, the
         * behavior of fields and methods of class {@code Character} may vary across
         * implementations of the Java SE 8 Platform when processing the aforementioned
         * code points (outside of version 6.2), except for the following methods
         * that define Java identifiers:
         * {@link #isJavaIdentifierStart(int)}, {@link #isJavaIdentifierStart(char)},
         * {@link #isJavaIdentifierPart(int)}, and {@link #isJavaIdentifierPart(char)}.
         * Code points in Java identifiers must be drawn from version 6.2 of
         * the Unicode Standard.
         *
         * <h3><a name="unicode">Unicode Character Representations</a></h3>
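The amended class specification defines Character in terms of the UnicodeData file. As a rough illustration of that file's layout, the sketch below parses one UnicodeData.txt line, which holds 15 semicolon-separated fields, and keeps only the code point, name, and general category. The class UnicodeDataLine is hypothetical; the sample line for U+20BF is the one found in UnicodeData.txt for version 10.0 and later.

```java
/** Hypothetical sketch: parses one line of UnicodeData.txt. */
public final class UnicodeDataLine {

    final int codePoint;
    final String name;
    final String generalCategory;

    private UnicodeDataLine(int codePoint, String name, String generalCategory) {
        this.codePoint = codePoint;
        this.name = name;
        this.generalCategory = generalCategory;
    }

    /** Splits on ';' keeping trailing empty fields; a well-formed line has 15. */
    static UnicodeDataLine parse(String line) {
        String[] fields = line.split(";", -1);
        if (fields.length != 15) {
            throw new IllegalArgumentException(
                    "expected 15 fields, got " + fields.length + ": " + line);
        }
        // Field 0 is the code point in hex, field 1 the name, field 2 the general category.
        return new UnicodeDataLine(Integer.parseInt(fields[0], 16), fields[1], fields[2]);
    }

    public static void main(String[] args) {
        UnicodeDataLine bitcoin = parse("20BF;BITCOIN SIGN;Sc;0;ET;;;;;N;;;;;");
        // prints: U+20BF BITCOIN SIGN Sc
        System.out.printf("U+%04X %s %s%n", bitcoin.codePoint, bitcoin.name, bitcoin.generalCategory);
    }
}
```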


        The initial portion of the specification of the isJavaLetter(char ch) method is changed from:

        /**
         * Determines if the specified character is permissible as the first
         * character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following is true:
         * <ul>
         * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
         * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
         * <li> {@code ch} is a currency symbol (such as {@code '$'})
         * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
         * </ul>
         *

        to

        /**
         * Determines if the specified character is permissible as the first
         * character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following conditions is true:
         * <ul>
         * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
         * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
         * <li> {@code ch} is a currency symbol (such as {@code '$'})
         * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *
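The four conditions can be restated directly with Character.getType and Character.isLetter, as in the hypothetical StartRule sketch below. Because those methods read whatever Unicode data the runtime carries, such a restatement would drift on a Java SE 8 implementation that supports the extension code points (for example, getType(0x20BF) could report CURRENCY_SYMBOL there), which is why the amended text pins these conditions to version 6.2.

```java
/** Hypothetical sketch restating the identifier-start conditions with getType. */
public final class StartRule {

    private StartRule() {}

    static boolean startsIdentifier(int cp) {
        int type = Character.getType(cp);
        return Character.isLetter(cp)                       // letter
                || type == Character.LETTER_NUMBER          // e.g. Roman numerals
                || type == Character.CURRENCY_SYMBOL        // e.g. '$'
                || type == Character.CONNECTOR_PUNCTUATION; // e.g. '_'
    }

    public static void main(String[] args) {
        // Compare the restatement with the library method on a few BMP characters.
        for (char ch : new char[] {'a', '$', '_', '\u2160', '1', '-'}) {
            System.out.printf("U+%04X rule=%b library=%b%n",
                    (int) ch, startsIdentifier(ch), Character.isJavaIdentifierStart(ch));
        }
    }
}
```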


        The initial portion of the specification of the isJavaLetterOrDigit(char ch) method is changed from:

        /**
         * Determines if the specified character may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if and only if any
         * of the following are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@code isIdentifierIgnorable} returns
         * {@code true} for the character.
         * </ul>
         *

        to

        /**
         * Determines if the specified character may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if and only if any
         * of the following conditions are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@code isIdentifierIgnorable} returns
         * {@code true} for the character.
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *
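Likewise, the isJavaLetterOrDigit conditions (shared with isJavaIdentifierPart) can be sketched with getType. PartRule is a hypothetical name, and mapping "digit" to DECIMAL_DIGIT_NUMBER and "combining mark" to COMBINING_SPACING_MARK is this sketch's reading of the list. The same caveat about the runtime's Unicode version applies.

```java
/** Hypothetical sketch restating the identifier-part conditions with getType. */
public final class PartRule {

    private PartRule() {}

    static boolean partOfIdentifier(int cp) {
        switch (Character.getType(cp)) {
            case Character.CURRENCY_SYMBOL:        // e.g. '$'
            case Character.CONNECTOR_PUNCTUATION:  // e.g. '_'
            case Character.DECIMAL_DIGIT_NUMBER:   // digit
            case Character.LETTER_NUMBER:          // numeric letter
            case Character.COMBINING_SPACING_MARK: // combining mark (assumed Mc)
            case Character.NON_SPACING_MARK:       // non-spacing mark (Mn)
                return true;
            default:
                // Letters, plus ignorable control and format characters.
                return Character.isLetter(cp) || Character.isIdentifierIgnorable(cp);
        }
    }

    public static void main(String[] args) {
        for (char ch : new char[] {'a', '1', '$', '_', '\u0301', '-', ' '}) {
            System.out.printf("U+%04X rule=%b library=%b%n",
                    (int) ch, partOfIdentifier(ch), Character.isJavaIdentifierPart(ch));
        }
    }
}
```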


        The initial portion of the specification of the isJavaIdentifierStart(char ch) method is changed from:

        /**
         * Determines if the specified character is
         * permissible as the first character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following conditions is true:
         * <ul>
         * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
         * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
         * <li> {@code ch} is a currency symbol (such as {@code '$'})
         * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
         * </ul>
         *

        to

        /**
         * Determines if the specified character is
         * permissible as the first character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following conditions is true:
         * <ul>
         * <li> {@link #isLetter(char) isLetter(ch)} returns {@code true}
         * <li> {@link #getType(char) getType(ch)} returns {@code LETTER_NUMBER}
         * <li> {@code ch} is a currency symbol (such as {@code '$'})
         * <li> {@code ch} is a connecting punctuation character (such as {@code '_'}).
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *

        The initial portion of the specification of the isJavaIdentifierStart(int codePoint) method is changed from:

        /**
         * Determines if the character (Unicode code point) is
         * permissible as the first character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following conditions is true:
         * <ul>
         * <li> {@link #isLetter(int) isLetter(codePoint)}
         *      returns {@code true}
         * <li> {@link #getType(int) getType(codePoint)}
         *      returns {@code LETTER_NUMBER}
         * <li> the referenced character is a currency symbol (such as {@code '$'})
         * <li> the referenced character is a connecting punctuation character
         *      (such as {@code '_'}).
         * </ul>
         *

        to

        /**
         * Determines if the character (Unicode code point) is
         * permissible as the first character in a Java identifier.
         * <p>
         * A character may start a Java identifier if and only if
         * one of the following conditions is true:
         * <ul>
         * <li> {@link #isLetter(int) isLetter(codePoint)}
         *      returns {@code true}
         * <li> {@link #getType(int) getType(codePoint)}
         *      returns {@code LETTER_NUMBER}
         * <li> the referenced character is a currency symbol (such as {@code '$'})
         * <li> the referenced character is a connecting punctuation character
         *      (such as {@code '_'}).
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *
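The char and int overloads state the same conditions, but only the int overload can see characters outside the Basic Multilingual Plane. A brief sketch (class and method names hypothetical) using U+1D400 MATHEMATICAL BOLD CAPITAL A, a letter assigned well before version 6.2:

```java
/** Hypothetical sketch: char vs int overloads on a supplementary character. */
public final class CodePointVsChar {

    private CodePointVsChar() {}

    /** Tests the first code point, so a surrogate pair is read as one character. */
    static boolean firstCodePointStarts(String s) {
        return !s.isEmpty() && Character.isJavaIdentifierStart(s.codePointAt(0));
    }

    /** Tests only the first UTF-16 unit, a lone high surrogate for U+1D400. */
    static boolean firstCharStarts(String s) {
        return !s.isEmpty() && Character.isJavaIdentifierStart(s.charAt(0));
    }

    public static void main(String[] args) {
        // U+1D400 (general category Lu) followed by 'x'.
        String s = new String(Character.toChars(0x1D400)) + "x";
        System.out.println(firstCodePointStarts(s)); // true
        System.out.println(firstCharStarts(s));      // false
    }
}
```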


        The initial portion of the specification of the isJavaIdentifierPart(char ch) method is changed from:

        /**
         * Determines if the specified character may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if any of the following
         * are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@code isIdentifierIgnorable} returns
         * {@code true} for the character
         * </ul>
         *

        to

        /**
         * Determines if the specified character may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if any of the following
         * conditions are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@code isIdentifierIgnorable} returns
         * {@code true} for the character
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *


        The initial portion of the specification of the isJavaIdentifierPart(int codePoint) method is changed from:

        /**
         * Determines if the character (Unicode code point) may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if any of the following
         * are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@link #isIdentifierIgnorable(int)
         * isIdentifierIgnorable(codePoint)} returns {@code true} for
         * the character
         * </ul>
         *

        to

        /**
         * Determines if the character (Unicode code point) may be part of a Java
         * identifier as other than the first character.
         * <p>
         * A character may be part of a Java identifier if any of the following
         * conditions are true:
         * <ul>
         * <li>  it is a letter
         * <li>  it is a currency symbol (such as {@code '$'})
         * <li>  it is a connecting punctuation character (such as {@code '_'})
         * <li>  it is a digit
         * <li>  it is a numeric letter (such as a Roman numeral character)
         * <li>  it is a combining mark
         * <li>  it is a non-spacing mark
         * <li> {@link #isIdentifierIgnorable(int)
         * isIdentifierIgnorable(codePoint)} returns {@code true} for
         * the code point
         * </ul>
         *
         * These conditions are tested against the character information from version
         * 6.2 of the Unicode Standard.
         *
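Combining the start and part rules, a whole-identifier shape check over code points might look like the sketch below. IdentifierShape is a hypothetical name, and the check does not reject keywords or the literals true, false, and null. For keyword-aware checks, the standard javax.lang.model.SourceVersion class already provides isIdentifier and isKeyword.

```java
/** Hypothetical sketch: does a string have the shape of a Java identifier? */
public final class IdentifierShape {

    private IdentifierShape() {}

    /**
     * True if s is non-empty, its first code point passes isJavaIdentifierStart(int),
     * and every later code point passes isJavaIdentifierPart(int).
     * Keywords and the literals true, false, and null are NOT rejected.
     */
    static boolean isIdentifierShape(String s) {
        return !s.isEmpty()
                && Character.isJavaIdentifierStart(s.codePointAt(0))
                // codePoints() yields whole code points, so skip(1) skips a surrogate pair too.
                && s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"total", "$x", "_1", "1a", "a-b", "class"}) {
            System.out.println(s + " -> " + isIdentifierShape(s));
        }
    }
}
```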


                People

                • Assignee:
                  dkejriwal Deepak Kejriwal
                  Reporter:
                  naoto Naoto Sato
                • Votes:
                  0
                  Watchers:
                  7

                  Dates

                  • Created:
                    Updated:
                    Resolved: