JDK-6970904

Character sequence \w in a regex pattern is narrower than defined in the specification

    Details

    • Subcomponent:
    • Resolved In Build:
      1.4
    • CPU:
      generic
    • OS:
      generic
    • Verification:
      Verified

        Description

        The enclosed test case RegexTest_234 contains a valid XML document, RegexTest_234.xml, for the valid schema RegexTest_234.xsd.

        The specification (http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/datatypes.html#regexs) states:

        Character sequence    Equivalent character class
        \w                    [#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}] (all characters except the set of "punctuation", "separator" and "other" characters)

        The character sequence in the XML document is foo#xcab1 bar#xcab1 and the regex pattern is (\w+)\s+(\w+). Validation of the XML document against the schema fails with the exception:
        SAX error: file:~/devel/analysis/RegexTest_234.xml(1,129): cvc-pattern-valid: Value 'foo¿ bar¿' is not facet-valid with respect to pattern '(\w+)\s+(\w+)' for type '#AnonType_valuedoc'.

        The document is valid according to the specification, so validation should succeed.
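
        The gap between the specification's definition of \w and a narrower one can be illustrated outside the schema validator. The following is only a sketch, and it rests on several assumptions: it uses java.util.regex rather than the validator's own regex engine, it reads #xcab1 as the character U+CAB1, and the class and method names are hypothetical. It expands \w by hand into the complement of \p{P}, \p{Z} and \p{C}, and compares that with the default ASCII-only \w of java.util.regex.

```java
import java.util.regex.Pattern;

class XsdWordClass {
    // \w as defined by the XML Schema specification: any character that is not
    // punctuation (P), a separator (Z), or "other" (C).
    static final String XSD_WORD = "[^\\p{P}\\p{Z}\\p{C}]";

    // The report's pattern (\w+)\s+(\w+) with \w expanded per the specification.
    // \s is left as java.util.regex's \s, which covers the plain space used here.
    static final Pattern SPEC_STYLE =
            Pattern.compile("(" + XSD_WORD + "+)\\s+(" + XSD_WORD + "+)");

    // The same pattern with java.util.regex's default \w, i.e. [a-zA-Z_0-9].
    static final Pattern JAVA_DEFAULT = Pattern.compile("(\\w+)\\s+(\\w+)");

    static boolean matchesSpecDefinition(String s) {
        return SPEC_STYLE.matcher(s).matches();
    }

    static boolean matchesJavaDefault(String s) {
        return JAVA_DEFAULT.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Assumption: #xcab1 in the report denotes U+CAB1 (a letter, category Lo).
        String value = "foo\uCAB1 bar\uCAB1";
        System.out.println(matchesSpecDefinition(value)); // true
        System.out.println(matchesJavaDefault(value));    // false
    }
}
```

        Under the specification's definition the value matches, because U+CAB1 is a letter and belongs to none of the excluded categories. A narrower \w rejects it, which is the kind of behaviour the error above describes.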

                People

                • Assignee:
                  joehw Joe Wang
                  Reporter:
                  lkuskov Leonid Kuskov