JDK-8238984

Case insensitive matching doesn't work correctly for some character classes

    Details

    • Type: CSR
    • Status: Closed
    • Priority: P4
    • Resolution: Approved
    • Fix Version/s: 15
    • Component/s: core-libs
    • Labels:
      None
    • Subcomponent:
    • Compatibility Kind:
      behavioral
    • Compatibility Risk:
      medium
    • Compatibility Risk Description:
      While using character classes such as \p{Lower} or \p{Upper} in case-insensitive mode may seem strange, any existing regular expression that happens to use such constructs will start to behave differently.
    • Scope:
      JDK

      Description

      Summary

      Named regex character classes of the forms \p{name} and \P{name} have to be made aware of case-insensitive mode.

      Problem

      In case-insensitive mode, matching against a regular expression must check not only whether a character of the input text belongs to a character class, but also whether its lower-case, upper-case, and title-case forms do. In the current implementation, this holds for single characters and for character classes written in square brackets, but not for named classes of the form \p{name} or \P{name}.
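
      The discrepancy can be observed with a small probe along the following lines. This is only a sketch: the class name CaseInsensitiveClassDemo and the helper matches are made up for illustration, and the result of the last two calls is expected to depend on whether the JDK in use already contains this change, so no output is claimed for them.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the JDK or of the fix itself.
class CaseInsensitiveClassDemo {

    // Returns true if the whole input matches the regex,
    // optionally compiled with Pattern.CASE_INSENSITIVE.
    static boolean matches(String regex, String input, boolean caseInsensitive) {
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Single character and bracketed class: the case counterpart is matched.
        System.out.println(matches("a", "A", true));      // true
        System.out.println(matches("[a-z]", "A", true));  // true

        // Named classes: the outcome depends on the JDK release in use.
        System.out.println(matches("\\p{Lower}", "A", true));
        System.out.println(matches("\\P{Lower}", "A", true));
    }
}
```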

      In particular, this behavior goes against the POSIX standard, which states:

      9.2 Regular Expression General Requirements ... When a standard utility or function that uses regular expressions specifies that pattern matching shall be performed without regard to the case (uppercase or lowercase) of either data or patterns, then when each character in the string is matched against the pattern, not only the character, but also its case counterpart (if any), shall be matched.

      http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

      Solution

      The named character classes will be made aware of case-insensitive mode. In particular, in case-insensitive mode, a range class of the form [a-z] or [A-Z] should match the same set of characters as the class \p{Lower} or \p{Upper}.
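
      One way to check the stated equivalence is to compare two patterns character by character. The sketch below is illustrative only: ClassEquivalenceCheck and disagreements are hypothetical names, and the scan is limited to the ASCII range as a simplifying assumption. On a release that contains this change, both printed lists would be expected to be empty; on earlier releases they would be expected to list the letters of the opposite case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the JDK or of the fix itself.
class ClassEquivalenceCheck {

    // Lists the ASCII characters on which the two regexes disagree
    // when both are compiled with the same flags.
    static List<Character> disagreements(String regexA, String regexB,
                                         boolean caseInsensitive) {
        int flags = caseInsensitive ? Pattern.CASE_INSENSITIVE : 0;
        Pattern a = Pattern.compile(regexA, flags);
        Pattern b = Pattern.compile(regexB, flags);
        List<Character> result = new ArrayList<>();
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                result.add(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Range class versus named class, both in case-insensitive mode.
        System.out.println(disagreements("[a-z]", "\\p{Lower}", true));
        System.out.println(disagreements("[A-Z]", "\\p{Upper}", true));
    }
}
```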

      Specification

      No specification changes are necessary.

              People

              Assignee:
              igerasim Ivan Gerasimov
              Reporter:
              webbuggrp Webbug Group
              Reviewed By:
              Roger Riggs