Details

    • Type: Enhancement
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9
    • Component/s: core-libs
    • Labels:
      None
    • Resolved In Build:
      b119
    • CPU:
      generic
    • OS:
      generic

      Description

      (1) Pull the "broken" printNodeTree (a debugging aid) out of Pattern; it has not worked as expected for a while. Replace it with a working version in a separate class, j.u.regex.PrintPattern, which can now print the clean and complete node tree of a pattern. For example,

         Pattern: [a-z0-9]+|ABCDEFG
           0: <Start>
           1: <Branch>
           2: <CharPropertyGreedy +>
           3: <Union>
           4: <Range[a-z]>
           5: <Range[0-9]>
               <-branch.separator->
           6: <Slice "ABCDEFG">
           7: </Branch>
           8: <END>
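To illustrate how such a printer can produce a listing like the one above, here is a minimal sketch. It is not PrintPattern itself: the `Node` class, `render` and `preFixShape` are hypothetical names, and the toy models the pattern's node links as plain child lists. The design point it shows is that ids are keyed on object identity, so a node reached from several parents is printed on each visit but always with the same id. The toy tree mimics the pre-optimization shape of [a-zABCDEF] from item (3), where six Union nodes share one Bits node.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class NodeTreePrinter {
    // Toy stand-in for a regex node (hypothetical; not the JDK's internal
    // Pattern.Node): a label plus child nodes.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            Collections.addAll(children, kids);
        }
    }

    // Renders the tree in preorder, one "id: <label>" line per visit.
    // Ids are keyed on object identity, so a shared node keeps its first id.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        visit(root, new IdentityHashMap<>(), out);
        return out.toString();
    }

    private static void visit(Node n, Map<Node, Integer> ids, StringBuilder out) {
        Integer id = ids.get(n);
        if (id == null) {
            id = ids.size();      // next unused id
            ids.put(n, id);
        }
        out.append(id).append(": <").append(n.label).append(">\n");
        for (Node child : n.children) {
            visit(child, ids, out);
        }
    }

    // Mimics the pre-optimization shape of [a-zABCDEF]: six nested Union
    // nodes that all point at one shared Bits node.
    static Node preFixShape() {
        Node bits = new Node("Bits [ A B C D E F]");
        Node tree = new Node("Range[a-z]");
        for (int i = 0; i < 6; i++) {
            tree = new Node("Union", tree, bits);
        }
        return new Node("Start", tree, new Node("END"));
    }

    public static void main(String[] args) {
        System.out.print(render(preFixShape()));
    }
}
```

Running `main` prints ids 0 to 9 in the same order as the unoptimized listing in (3), with "8: <Bits [ A B C D E F]>" appearing six times.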

      (2) Optimize the greedy repetition of a single "CharProperty", such as \p{IsGreek}+ or the commonly used .*, by parsing it into a single, smooth loop node.

      from

          Pattern: \p{IsGreek}+
           0: <Start>
           1: <Curly GREEDY + >
           2: <Script GREEK>
               </Curly>
           3: <END>

      to

           Pattern: \p{IsGreek}+
           0: <Start>
           1: <CharPropertyGreedy Script GREEK+>
           2: <END>

         The simple JMH benchmark [2] indicates that this is about 50%+ faster, especially in the no-match case.
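The idea behind a single greedy loop node can be sketched as follows. This is a simplified illustration, not the JDK's CharPropertyGreedy: `greedy`, `Next` and `GREEK` are hypothetical names, `Next` stands in for "the rest of the pattern", and the loop works on chars rather than code points for brevity. Instead of a generic Curly node that re-enters a child node for every repetition, one loop consumes as many matching chars as possible and then backs off one char at a time until the continuation succeeds.

```java
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

public class GreedyCharPropertySketch {
    // Hypothetical continuation for "the rest of the pattern": reports whether
    // the remainder matches starting at position pos.
    interface Next {
        boolean match(CharSequence in, int pos);
    }

    // One loop node for p{min,}: consume every char that satisfies p, then
    // back off one char at a time until the continuation succeeds.
    // Works on chars, not code points, to keep the sketch short.
    static boolean greedy(IntPredicate p, int min, CharSequence in, int start, Next next) {
        int i = start;
        while (i < in.length() && p.test(in.charAt(i))) {
            i++;
        }
        for (; i >= start + min; i--) {
            if (next.match(in, i)) {
                return true;
            }
        }
        return false;
    }

    // Character property for the Greek script.
    static final IntPredicate GREEK =
            ch -> Character.UnicodeScript.of(ch) == Character.UnicodeScript.GREEK;

    public static void main(String[] args) {
        String s = "\u03B1\u03B2\u03B3"; // Greek alpha, beta, gamma
        // "End of input" as the continuation models a whole-string match.
        Next end = (in, pos) -> pos == in.length();
        System.out.println(greedy(GREEK, 1, s, 0, end));          // true
        System.out.println(Pattern.matches("\\p{IsGreek}+", s));  // true

        // Continuation "gamma, then end of input" forces one step of back-off.
        Next gammaThenEnd = (in, pos) ->
                pos < in.length() && in.charAt(pos) == '\u03B3' && pos + 1 == in.length();
        System.out.println(greedy(GREEK, 1, s, 0, gammaThenEnd)); // true
    }
}
```

In the no-match case the loop stops at the first failing char and the back-off loop is short, which is consistent with the benchmark observation above, though this sketch makes no performance claims of its own.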

      (3) Optimize the "union" of individual chars inside a character class [...], such as [ABCDEF]. Currently, for a regex like [a-zABCDEF], the engine generates nodes like

         Pattern: [a-zABCDEF]
           0: <Start>
           1: <Union>
           2: <Union>
           3: <Union>
           4: <Union>
           5: <Union>
           6: <Union>
           7: <Range[a-z]>
           8: <Bits [ A B C D E F]>
           8: <Bits [ A B C D E F]>
           8: <Bits [ A B C D E F]>
           8: <Bits [ A B C D E F]>
           8: <Bits [ A B C D E F]>
           8: <Bits [ A B C D E F]>
           9: <END>

      Each <Union> above points at the same <Bits> node (hence the repeated id 8). With the optimization, the engine generates the expected tree:

         Pattern: [a-zABCDEF]
           0: <Start>
           1: <Union>
           2: <Range[a-z]>
           3: <Bits [ A B C D E F]>
           4: <END>

         The JMH benchmark [2] also indicates that this is much faster, especially in the no-match case.
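The data-structure idea in (3) can be sketched with public JDK types. This is an illustration under assumptions, not the engine's code: `build` is a hypothetical helper, and it only handles one range plus single ASCII characters. All the singles go into one shared BitSet, and the range is unioned with that bit set exactly once, instead of adding another union layer each time a single character is parsed.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

public class ClassUnionSketch {
    // Builds a matcher for a class like [a-zABCDEF] from one range plus a
    // string of single ASCII characters (hypothetical helper).
    static IntPredicate build(char lo, char hi, String singles) {
        BitSet bits = new BitSet(128);
        for (int i = 0; i < singles.length(); i++) {
            char c = singles.charAt(i);
            if (c >= 128) {
                throw new IllegalArgumentException("sketch only handles ASCII singles");
            }
            bits.set(c); // every single char lands in the same bit set
        }
        IntPredicate range = ch -> ch >= lo && ch <= hi;
        IntPredicate set = ch -> ch >= 0 && ch < 128 && bits.get(ch);
        return range.or(set); // the single union: Range OR Bits
    }

    public static void main(String[] args) {
        IntPredicate cls = build('a', 'z', "ABCDEF");
        Pattern p = Pattern.compile("[a-zABCDEF]");
        int agree = 0;
        for (char c = 0; c < 128; c++) {
            boolean expected = p.matcher(String.valueOf(c)).matches();
            if (cls.test(c) == expected) {
                agree++;
            }
        }
        System.out.println(agree + " of 128 ASCII chars agree with Pattern");
    }
}
```

Testing a char then costs at most one range check and one bit lookup, regardless of how many single characters the class lists.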

      (4) Replace the "constant" CharProperty nodes with a simple functional interface and lambdas. This reduces the total number of classes in the package (including anonymous classes) from 130+ to fewer than 70.
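A minimal sketch of the approach in (4), assuming a hypothetical `CharPredicate` interface (the names here are illustrative and are not claimed to match the JDK's internal ones). Because the interface has a single abstract method, each constant property can be a lambda or method reference, and composition such as union or negation lives in default methods.

```java
public class CharPredicateSketch {
    // Hypothetical functional interface: one abstract method, so each
    // "constant" character property can be a lambda or method reference.
    @FunctionalInterface
    interface CharPredicate {
        boolean is(int ch);

        // Composition helpers are default methods, so the interface stays
        // functional (still exactly one abstract method).
        default CharPredicate union(CharPredicate other) {
            return ch -> is(ch) || other.is(ch);
        }

        default CharPredicate negate() {
            return ch -> !is(ch);
        }
    }

    // Each constant property is a lambda or method reference rather than
    // a separate anonymous node class.
    static final CharPredicate ASCII_LOWER = ch -> ch >= 'a' && ch <= 'z';
    static final CharPredicate DIGIT = Character::isDigit;
    static final CharPredicate LOWER_OR_DIGIT = ASCII_LOWER.union(DIGIT);
    static final CharPredicate NOT_DIGIT = DIGIT.negate();

    public static void main(String[] args) {
        System.out.println(LOWER_OR_DIGIT.is('q')); // true
        System.out.println(LOWER_OR_DIGIT.is('-')); // false
        System.out.println(NOT_DIGIT.is('7'));      // false
    }
}
```

javac compiles lambdas and method references to invokedynamic call sites instead of emitting a separate .class file for each one, which is why this style shrinks the package's class count.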


      (5) Also fix the issue discussed in the "j.u.regex: Negated Character Classes" thread [3].

      [1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2016-March/039269.html
      [2] http://cr.openjdk.java.net/~sherman/regexClosure/MyBenchmark.java
      [3] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-June/006957.html

              People

              • Assignee:
                Xueming Shen (sherman)
              • Reporter:
                Xueming Shen (sherman)