JDK / JDK-6826329

(str) Fastpath for new String(bytes..) and String#getBytes(..) for ASCII + ISO-8859-1


    Details

    • Type: Enhancement
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 7
    • Fix Version/s: None
    • Component/s: core-libs
    • CPU: x86
    • OS: windows_xp

      Description

      A DESCRIPTION OF THE REQUEST :
      String#getBytes(..) and new String(bytes..) internally use slow CharsetEncoder/CharsetDecoder instances, which may be newly instantiated on each call.

      Additionally:
      At first glance, a user might assume that String#getBytes(Charset cs) is faster than String#getBytes(String csn), because the latter must first look up the Charset from the name csn. Since that lookup only happens on the first call (the coder for a given charset name is cached afterwards), the JavaDoc should include a *note* comparing the cost of these methods. The JavaDoc of the corresponding String(byte[] ...) constructors should get the same note.
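      A rough way to check the relative cost of the two overloads is a crude timing loop like the one below. It is a sketch, not a rigorous benchmark (a harness such as JMH is needed for trustworthy numbers), and the results depend heavily on the JDK version and on JIT warm-up. The class name GetBytesOverloads and the test string "short" are illustrative choices, not part of the original report.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for a crude comparison of the two getBytes overloads.
public final class GetBytesOverloads {
    private GetBytesOverloads() {}

    // Returns {nanos for n calls of getBytes(Charset), nanos for n calls of getBytes(String)}.
    // Crude wall-clock timing: no warm-up control and no JMH, so treat the numbers as rough.
    public static long[] compare(String s, int n) {
        Charset cs = StandardCharsets.ISO_8859_1;
        String csn = "ISO-8859-1";
        try {
            // Both overloads must produce identical bytes before the timing means anything.
            if (!Arrays.equals(s.getBytes(cs), s.getBytes(csn))) {
                throw new AssertionError("overloads disagree");
            }
            int sink = 0; // consumes the results so the loops do not become dead code
            long t0 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += s.getBytes(cs).length;
            long byCharset = System.nanoTime() - t0;

            t0 = System.nanoTime();
            for (int i = 0; i < n; i++) sink += s.getBytes(csn).length;
            long byName = System.nanoTime() - t0;

            if (sink < 0) System.out.println(sink); // keeps sink observable; never prints
            return new long[] {byCharset, byName};
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // ISO-8859-1 is always supported
        }
    }

    public static void main(String[] args) {
        long[] t = compare("short", 1_000_000);
        System.out.printf("getBytes(Charset): %d us, getBytes(String): %d us%n",
                t[0] / 1_000, t[1] / 1_000);
    }
}
```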


      JUSTIFICATION :
      Assuming that ASCII and ISO-8859-1 account for a high share of calls to these methods, especially in CORBA applications, class String should provide a fast shortcut for them.

        See also:
      http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636319
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6636323



      EXPECTED VERSUS ACTUAL BEHAVIOR :
      EXPECTED -
      A fast path for ASCII and ISO-8859-1 in methods and constructors such as:
      String#getBytes(..) and new String(bytes..)
      Alternatively, dedicated methods such as:
      String#getASCIIBytes(..)
      String#getISO8859_1Bytes(..)
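      The proposed alternatives could be sketched as static helpers outside String. The class name Latin1Ascii is hypothetical, and the methods only borrow the names proposed above; neither exists in the JDK. The sketch follows the replacement behavior of String#getBytes for these charsets: each unmappable char, and each valid surrogate pair, becomes a single '?'.

```java
import java.util.Arrays;

// Hypothetical user-space versions of the proposed String#getASCIIBytes(..)
// and String#getISO8859_1Bytes(..); these methods do not exist in the JDK.
public final class Latin1Ascii {
    private Latin1Ascii() {}

    // A char is copied as-is if it has no bits outside mask
    // (0x7F for ASCII, 0xFF for ISO-8859-1); anything else,
    // including a whole surrogate pair, becomes one '?'.
    private static byte[] encode(String s, int mask) {
        byte[] buf = new byte[s.length()];
        int j = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ((c | mask) == mask) {
                buf[j++] = (byte) c;
                continue;
            }
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++; // consume the low surrogate as well
            }
            buf[j++] = '?';
        }
        // A surrogate pair shrinks the output by one byte, so trim if needed.
        return j == buf.length ? buf : Arrays.copyOf(buf, j);
    }

    public static byte[] getASCIIBytes(String s) {
        return encode(s, 0x7F);
    }

    public static byte[] getISO8859_1Bytes(String s) {
        return encode(s, 0xFF);
    }
}
```

      For example, Latin1Ascii.getASCIIBytes("a\u00e9b") yields the bytes 97, 63, 98, because 'é' is outside ASCII and is replaced by '?'.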

      ACTUAL -
      byte[] getBytes(Charset charset)
      internally instantiates a new CharsetEncoder on each call, which is much slower, especially for short strings.
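      For comparison, the following sketch shows what creating and configuring an encoder per call looks like with the public java.nio.charset API, next to a variant that reuses one encoder per thread. It only illustrates the general idea: the JDK's internal code (for example its StringCoding helper in JDK 7) differs in detail, and the class CachedEncoder is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not JDK code.
public final class CachedEncoder {
    private CachedEncoder() {}

    // One configured ISO-8859-1 encoder per thread (CharsetEncoder is not thread-safe).
    private static final ThreadLocal<CharsetEncoder> LATIN1 =
            ThreadLocal.withInitial(CachedEncoder::newLatin1Encoder);

    private static CharsetEncoder newLatin1Encoder() {
        return StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Slow pattern: a new encoder is created and configured for every call.
    public static byte[] encodeFresh(String s) {
        return encodeWith(newLatin1Encoder(), s);
    }

    // Reuse pattern: the per-thread encoder is reused;
    // CharsetEncoder.encode(CharBuffer) resets it before encoding.
    public static byte[] encodeCached(String s) {
        return encodeWith(LATIN1.get(), s);
    }

    private static byte[] encodeWith(CharsetEncoder enc, String s) {
        try {
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions; rethrow unchecked just in case.
            throw new IllegalStateException(e);
        }
    }
}
```

      Both methods replace unmappable chars with the encoder's default replacement byte '?', so they produce the same bytes as getBytes(StandardCharsets.ISO_8859_1); the only difference is how often an encoder is created.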


      ---------- BEGIN SOURCE ----------
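      A simple example:

      public class String {
          ...
          // Encodes this string into buf (which must hold at least values.length
          // bytes). A char c is copied as-is if it has no bits outside mask,
          // i.e. c <= 0x7F for mask 0x7F (ASCII) or c <= 0xFF for mask 0xFF
          // (ISO-8859-1); every other char, and every valid surrogate pair,
          // becomes one '?'. mask is an int: as a byte, 0xFF would be -1 and
          // would match every char. Returns the number of bytes written.
          int getBytes(byte[] buf, int mask) {
              int j = 0;
              for (int i = 0; i < values.length; i++, j++) {
                  char c = values[i];
                  if ((c | mask) == mask) {   // parentheses needed: == binds tighter than |
                      buf[j] = (byte) c;
                      continue;
                  }
                  if (Character.isHighSurrogate(c) && i + 1 < values.length
                          && Character.isLowSurrogate(values[i + 1]))
                      i++;                    // skip the low surrogate of the pair
                  buf[j] = '?';               // or the default replacement
              }
              return j;
          }
          ...
      }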

      ---------- END SOURCE ----------
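      The source example covers only the encoding direction. A corresponding fast path for new String(bytes..) could look like the following sketch; the class FastDecode and its method names are hypothetical. It mirrors how the JDK decodes these charsets when bytes are passed to the String constructor: ISO-8859-1 maps every byte 0x00 to 0xFF to the char with the same value, while for US-ASCII every byte with the high bit set is malformed and is replaced by U+FFFD.

```java
// Hypothetical helpers (not JDK API): decode fast paths for ISO-8859-1 and US-ASCII.
public final class FastDecode {
    private FastDecode() {}

    // ISO-8859-1: byte value b (0..255) maps directly to the char with code b.
    public static String fromISO8859_1(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xFF); // mask off sign extension
        }
        return new String(chars);
    }

    // US-ASCII: bytes 0x00..0x7F map directly; any byte with the high bit set
    // (negative as a Java byte) is malformed and becomes U+FFFD.
    public static String fromASCII(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            byte b = bytes[i];
            chars[i] = b >= 0 ? (char) b : '\uFFFD';
        }
        return new String(chars);
    }
}
```

      Both loops produce exactly one char per byte, which is what makes a single pass without a CharsetDecoder possible for these two charsets.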

              People

              Assignee:
              sherman Xueming Shen
              Reporter:
              ryeung Roger Yeung (Inactive)
              Votes: 0
              Watchers: 1
