JDK-8168049

Fix Performance of Lexer.isJSWhitespace

    Details

    • Subcomponent:
    • Resolved In Build:
      b145
    • CPU:
      generic
    • OS:
      generic

      Description

      A DESCRIPTION OF THE REQUEST :
      While profiling the script editor in NetBeans, I noticed that one of the performance hot spots comes from the Nashorn parser, particularly two methods:

      http://hg.openjdk.java.net/jdk9/dev/nashorn/file/a46b7d386795/src/jdk.scripting.nashorn/share/classes/jdk/nashorn/internal/parser/Lexer.java#l386
      (the code is the same in JDK 8 and JDK 9)

          public static boolean isJSWhitespace(final char ch) {
              return JAVASCRIPT_WHITESPACE.indexOf(ch) != -1;
          }

          public static boolean isJSEOL(final char ch) {
              return JAVASCRIPT_WHITESPACE_EOL.indexOf(ch) != -1;
          }

      These methods are called very frequently, but even the most typical check (ch == ' ') goes through the rather complex String.indexOf():

          public int indexOf(int ch, int fromIndex) {
              final int max = value.length;
              if (fromIndex < 0) {
                  fromIndex = 0;
              } else if (fromIndex >= max) {
                  // Note: fromIndex might be near -1>>>1.
                  return -1;
              }

              if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
                  // handle most cases here (ch is a BMP code point or a
                  // negative value (invalid code point))
                  final char[] value = this.value;
                  for (int i = fromIndex; i < max; i++) {
                      if (value[i] == ch) {
                          return i;
                      }
                  }
                  return -1;
              } else {
                  return indexOfSupplementary(ch, fromIndex);
              }
          }
       


      JUSTIFICATION :
      To improve the performance of the Nashorn JavaScript parser.

      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      I expect the following to perform better:

          public static boolean isJSWhitespace(final char ch) {
              return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
                  || JAVASCRIPT_OTHER_WHITESPACE.indexOf(ch) != -1;
          }

          public static boolean isJSEOL(final char ch) {
              return ch == '\n' || ch == '\r'
                  || ch == '\u2028' // line separator
                  || ch == '\u2029' // paragraph separator
                  ;
          }
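
      Before swapping in a fast path like the one above, it may be worth checking that it accepts exactly the same characters as the indexOf-based version. The sketch below does that exhaustively over all 65536 char values. The class name WhitespaceCheck, the helper names, and the contents of the string constants are assumptions made for illustration; they are not copied from Lexer.java. Note that in this sketch JAVASCRIPT_OTHER_WHITESPACE has to contain the line and paragraph separators, because the fast path in isJSWhitespace only tests space, tab, line feed and carriage return.

```java
// Sketch only: the constant contents are assumptions, not copied from Lexer.java.
final class WhitespaceCheck {
    // Assumed line terminators: the four characters tested in the proposed isJSEOL.
    static final String EOL = "\n\r\u2028\u2029";

    // Hypothetical stand-in for the less common Unicode whitespace characters.
    static final String RARE = "\u000b\u000c\u00a0\u1680\u180e"
            + "\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a"
            + "\u202f\u205f\u3000\ufeff";

    static final String JAVASCRIPT_WHITESPACE = " \t" + EOL + RARE;
    static final String JAVASCRIPT_WHITESPACE_EOL = EOL;
    // Everything the fast path does not test directly.
    static final String JAVASCRIPT_OTHER_WHITESPACE = "\u2028\u2029" + RARE;

    // Current style: every call scans the string.
    static boolean slowIsJSWhitespace(final char ch) {
        return JAVASCRIPT_WHITESPACE.indexOf(ch) != -1;
    }

    static boolean slowIsJSEOL(final char ch) {
        return JAVASCRIPT_WHITESPACE_EOL.indexOf(ch) != -1;
    }

    // Proposed style: common characters first, string scan only as a fallback.
    static boolean fastIsJSWhitespace(final char ch) {
        return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r'
            || JAVASCRIPT_OTHER_WHITESPACE.indexOf(ch) != -1;
    }

    static boolean fastIsJSEOL(final char ch) {
        return ch == '\n' || ch == '\r' || ch == '\u2028' || ch == '\u2029';
    }

    // Counts the char values on which the fast and slow versions disagree.
    static int countMismatches() {
        int mismatches = 0;
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            final char ch = (char) c;
            if (slowIsJSWhitespace(ch) != fastIsJSWhitespace(ch)) {
                mismatches++;
            }
            if (slowIsJSEOL(ch) != fastIsJSEOL(ch)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(final String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

      A check of this kind only shows that the two versions agree on the assumed character sets; it says nothing about speed, which would need a separate measurement.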

      PS: I'm actually not sure that all the Unicode characters listed in JAVASCRIPT_WHITESPACE make sense. Most of them are meaningful only to text processors, such as spaces of different typographic widths. A JavaScript source file is typically plain text, with special characters occurring only within string literals, but there is no whitespace within string literals.

            People

            • Assignee:
              hannesw Hannes Wallnoefer
              Reporter:
              webbuggrp Webbug Group
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: