JDK-8027645

Pattern.split() with positive lookahead

    Details

    • Subcomponent:
    • Resolved In Build:
      b117
    • OS:
      generic
    • Verification:
      Verified

        Description

        FULL PRODUCT VERSION :
        java version "1.7.0_11"
        Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
        Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

        A DESCRIPTION OF THE PROBLEM :
        For the sake of simplicity I will use String.split(regex) in my examples, even though the actual bug is in Pattern.split().

        When the regular expression "(?=\\p{Lu})" is used to split a string that starts with an uppercase letter, split() also splits at position 0, before the first character:
         "FooBar".split("(?=\\p{Lu})") results in ["", "Foo", "Bar"].

        If however this match on character 0 is the only match, the empty string is not included in the array:
         "Foo".split("(?=\\p{Lu})") results in ["Foo"].
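
The two calls above can be tried with a small program such as the following. This is only a sketch: the class name SplitLookaheadRepro and the helper splitBeforeUppercase are made up for illustration. The results quoted in this report were observed on 1.7.0_11; other builds, in particular builds that contain a fix for this issue, may print something different, so the Java version is printed alongside the results.

```java
import java.util.Arrays;

public class SplitLookaheadRepro {
    // Pattern from the report: a zero-width lookahead that matches
    // immediately before any uppercase letter.
    static final String REGEX = "(?=\\p{Lu})";

    // Hypothetical helper, only here to keep the two calls uniform.
    public static String[] splitBeforeUppercase(String input) {
        return input.split(REGEX);
    }

    public static void main(String[] args) {
        // The behaviour depends on the runtime, so show which one is in use.
        System.out.println("java.version = " + System.getProperty("java.version"));
        // Reported on 1.7.0_11: an empty leading element, then Foo and Bar.
        System.out.println(Arrays.toString(splitBeforeUppercase("FooBar")));
        // Reported on 1.7.0_11: only Foo, without the empty leading element.
        System.out.println(Arrays.toString(splitBeforeUppercase("Foo")));
    }
}
```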

        Stepping through the code with these examples shows that the match is correctly detected, but later the code assumes there was no match (because the end-index of the match is 0), and thus returns an array containing the input string.
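
The failure mode described above can be illustrated with a simplified split loop. This is a sketch written for this report, not the actual JDK source: the class SplitSentinelSketch and the method sketchSplit are hypothetical, and limit handling and removal of trailing empty strings are left out. It assumes, as described above, that an end index of 0 after the loop is treated as "no match found".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitSentinelSketch {
    // Hypothetical, simplified split loop; NOT the real Pattern.split().
    public static String[] sketchSplit(Pattern p, String input) {
        List<String> parts = new ArrayList<String>();
        Matcher m = p.matcher(input);
        int index = 0; // end index of the most recent match
        while (m.find()) {
            // Text between the previous match and this one.
            parts.add(input.substring(index, m.start()));
            index = m.end();
        }
        // Assumed flaw: index == 0 is read as "there was no match", but a
        // single zero-width match at position 0 also leaves index at 0.
        if (index == 0) {
            return new String[] { input };
        }
        parts.add(input.substring(index)); // remainder after the last match
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?=\\p{Lu})");
        // Matches at 0 and 3, so index ends at 3: prints [, Foo, Bar]
        System.out.println(Arrays.toString(sketchSplit(p, "FooBar")));
        // Only match is at 0 with end 0, so the input is returned: prints [Foo]
        System.out.println(Arrays.toString(sketchSplit(p, "Foo")));
    }
}
```

Because the sketch does not call split() itself, it shows the same inconsistency on any runtime: the empty leading element appears only when a later match moves the end index past 0.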

        This is clearly wrong. One could argue whether the empty string should be contained in the array or not, but it should either always be there or never.


        REPRODUCIBILITY :
        This bug can always be reproduced.

                People

                • Assignee:
                  sherman Xueming Shen
                  Reporter:
                  igerasim Ivan Gerasimov
                • Votes:
                  1
                  Watchers:
                  4
