JDK-8043727

Behavior of regex \b (word boundary) is unclear; Description of \B is wrong


    Details

    • Type: Enhancement
    • Status: Closed
    • Priority: P4
    • Resolution: Duplicate
    • Affects Version/s: 7u45
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels:

      Description

      A DESCRIPTION OF THE PROBLEM :
      The documentation for java.util.regex.Pattern describes \b only as "A word boundary", with no further explanation.

      One can only infer that it behaves equivalently to Perl, since the class describes its syntax as similar to Perl and explains its differences from Perl.

      In Perl, \b is equivalent to (?:(?<!\w)(?=\w)|(?<=\w)(?!\w)). Although some modifiers change the meaning of \w (word character), \b always agrees with \w on what constitutes a word character.
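
      As a rough illustration of that equivalence, the sketch below lists the positions at which \b and the expanded lookaround form match in Java. The class name, helper method, and sample input are hypothetical, and the input is deliberately ASCII-only, where \w and \b are not expected to disagree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK.
final class BoundaryExpansionSketch {
    // The lookaround expansion quoted in the report, as a Java string literal.
    static final String EXPANDED = "(?:(?<!\\w)(?=\\w)|(?<=\\w)(?!\\w))";

    // Returns every index at which the (zero-width) pattern matches.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo_bar baz-42"; // ASCII-only sample input
        System.out.println(positions("\\b", input));    // [0, 7, 8, 11, 12, 14]
        System.out.println(positions(EXPANDED, input)); // [0, 7, 8, 11, 12, 14]
    }
}
```

      Agreement on ASCII input says nothing about non-ASCII characters, which is where the report locates the inconsistency.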

      This property does not seem to hold in Java. \w is equivalent to [a-zA-Z0-9_], but the source shows that \b treats the underscore plus anything matched by Character.isLetterOrDigit as a word character, which includes non-ASCII Unicode letters and digits.

      (If UNICODE_CHARACTER_CLASS is enabled for the Pattern, \w and \b both change, and now use the same definition of a word character.)
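
      One way to check this on a given JDK is a small probe that asks \w and \b separately whether they treat a character as a word character. This is a minimal sketch: the class and method names are hypothetical, and the test character U+00E9 is just one example of a non-ASCII letter. No output is shown for the default-mode \b probe because that result may depend on the JDK release in use.

```java
import java.util.regex.Pattern;

// Hypothetical probe class, not part of the JDK.
final class WordCharProbe {
    // True if \w matches the single character under the given flags.
    static boolean matchedByW(String ch, int flags) {
        return Pattern.compile("\\w", flags).matcher(ch).matches();
    }

    // True if \b finds a boundary on both sides of the lone character,
    // which only happens when \b treats it as a word character.
    static boolean wordCharForB(String ch, int flags) {
        String regex = "\\b" + Pattern.quote(ch) + "\\b";
        return Pattern.compile(regex, flags).matcher(ch).matches();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // example non-ASCII letter
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        System.out.println("default \\w: " + matchedByW(eAcute, 0));
        System.out.println("default \\b: " + wordCharForB(eAcute, 0));
        System.out.println("unicode \\w: " + matchedByW(eAcute, unicode));
        System.out.println("unicode \\b: " + wordCharForB(eAcute, unicode));
    }
}
```

      If the two default-mode lines print different values, \w and \b disagree about word characters on that JDK, as the report describes.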

      If Java's \b behavior is correct, it should at least be documented, as it's currently impossible to reason confidently about what it's supposed to do.

      Also, the description of \B is wrong, or at least open to misinterpretation if you don't already know what it does. It says "A non-word boundary" when it ought to say "Not a word boundary". Think about it: the boundary of a "non-word" is at the same location as the boundary of a word (except at the beginning/end of a string).
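
      The "Not a word boundary" reading can be illustrated with a short sketch that lists where \b and \B match in an ASCII string. The class name, helper method, and sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK.
final class NotABoundarySketch {
    // Returns every index at which the (zero-width) pattern matches.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "ab cd"; // the non-word run (the space) spans index 2 to 3
        System.out.println(positions("\\b", input)); // [0, 2, 3, 5]
        System.out.println(positions("\\B", input)); // [1, 4]
    }
}
```

      The edges of the space, indices 2 and 3, are matched by \b and not by \B. \B matches only the positions where \b does not, here the interiors of "ab" and "cd".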


      URL OF FAULTY DOCUMENTATION :
      http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

              People

              Assignee:
              sherman Xueming Shen
              Reporter:
              webbuggrp Webbug Group
              Votes:
              0
              Watchers:
              1
