JDK-8264777

Overload optimized FileInputStream::readAllBytes

    Details

    • Subcomponent:
    • Resolved In Build:
      b23
    • Verification:
      Verified

      Description

      ADDITIONAL SYSTEM INFORMATION :
      The following JMH benchmark includes an example implementation and shows a performance gain of about 30% for 10 MB files on Windows. Smaller files improve less, but I did not see any degradation at any file size from 100 bytes to 10 MB:

      package benchmarks;

      import java.io.IOException;
      import java.util.Arrays;

      import org.openjdk.jmh.annotations.Benchmark;
      import org.openjdk.jmh.annotations.BenchmarkMode;
      import org.openjdk.jmh.annotations.Fork;
      import org.openjdk.jmh.annotations.Measurement;
      import org.openjdk.jmh.annotations.OutputTimeUnit;
      import org.openjdk.jmh.annotations.Param;
      import org.openjdk.jmh.annotations.Setup;
      import org.openjdk.jmh.annotations.State;
      import org.openjdk.jmh.annotations.TearDown;
      import org.openjdk.jmh.annotations.Warmup;
      import org.openjdk.jmh.runner.Runner;
      import org.openjdk.jmh.runner.RunnerException;
      import org.openjdk.jmh.runner.options.Options;
      import org.openjdk.jmh.runner.options.OptionsBuilder;

      @BenchmarkMode(org.openjdk.jmh.annotations.Mode.SingleShotTime)
      @OutputTimeUnit(java.util.concurrent.TimeUnit.MILLISECONDS)
      @State(org.openjdk.jmh.annotations.Scope.Thread)
      @Fork(value = 1, jvmArgsAppend = { "-ea" })
      @Warmup(batchSize = 1000)
      @Measurement(batchSize = 1000)
      public class FileRead {
          // XXX put in any directory where the files are located.
          String dirName = "resources/";
          // XXX put in any filenames you like to test.
          @Param({ "100b.txt", "1k.txt", "10k.txt", "100k.txt", "1MB.txt", "10MB.txt" })
          String fileName;

          java.io.File file;
          byte[] result;
          byte[] expected;

          @Setup
          public void setup() throws IOException {
              file = new java.io.File(dirName + fileName).getAbsoluteFile();
              result = null;
              expected = java.nio.file.Files.readAllBytes(file.toPath());
          }

          @TearDown
          public void check() {
              assert Arrays.equals(expected, result) : "Nothing changed?";
          }

          public static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

          @Benchmark
          public void readAllBytesOld() throws IOException {
              try (java.io.InputStream input = new java.io.FileInputStream(file)) {
                  result = input.readAllBytes();
              }
          }

          @Benchmark
          public void readAllBytesNew() throws IOException {
              try (java.io.InputStream input = new java.io.FileInputStream(file) {

                  @Override
                  public byte[] readAllBytes() throws IOException {
                      long length = this.getChannel().size();
                      // use jdk.internal.util.ArraysSupport.newLength(int, int, int)?
                      if (length > MAX_ARRAY_LENGTH)
                          throw new OutOfMemoryError("File too large for array: " + length);
                      return readNBytes(this, (int) length);
                  }

                  byte[] readNBytes(java.io.InputStream input, int byteLength) throws IOException {
                      if (byteLength == 0)
                          return new byte[0];
                      byte[] byteBuf = new byte[byteLength]; // exact buffer size
                      int byteCount = 0;
                      int byteTransferSize = byteBuf.length;
                      int bytesRead;
                      while ((bytesRead = input.read(byteBuf, byteCount, byteTransferSize)) >= 0) {
                          byteCount += bytesRead;
                          byteTransferSize = byteBuf.length - byteCount;
                          if (byteTransferSize <= 0) {
                              break;
                          }
                      }
                      return (byteBuf.length == byteCount) ? byteBuf : Arrays.copyOf(byteBuf, byteCount);
                  }
              }) {
                  result = input.readAllBytes();
              }
          }

          public static void main(String[] args) throws RunnerException, InterruptedException {
              Options opt = new OptionsBuilder().include(FileRead.class.getSimpleName()).shouldFailOnError(true).build();
              new Runner(opt).run();
          }
      }

      My results:

      Benchmark                 (fileName)  Mode  Cnt     Score  Error  Units
      ReadByt3.readAllBytesNew    100b.txt    ss         34,081         ms/op
      ReadByt3.readAllBytesNew      1k.txt    ss         35,951         ms/op
      ReadByt3.readAllBytesNew     10k.txt    ss         40,996         ms/op
      ReadByt3.readAllBytesNew    100k.txt    ss         66,433         ms/op
      ReadByt3.readAllBytesNew     1MB.txt    ss        587,246         ms/op
      ReadByt3.readAllBytesNew    10MB.txt    ss       5361,234         ms/op

      ReadByt3.readAllBytesOld    100b.txt    ss         35,115         ms/op
      ReadByt3.readAllBytesOld      1k.txt    ss         35,951         ms/op
      ReadByt3.readAllBytesOld     10k.txt    ss         45,528         ms/op
      ReadByt3.readAllBytesOld    100k.txt    ss        125,894         ms/op
      ReadByt3.readAllBytesOld     1MB.txt    ss        630,972         ms/op
      ReadByt3.readAllBytesOld    10MB.txt    ss       7538,637         ms/op
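
The relative time saved per file size can be derived from the scores above. The sketch below is only an illustration: the class name `RelativeGain` and the method `reductionPercent` are hypothetical, and the scores are copied from the results with the decimal comma written as a point. Since these are single-shot measurements reported without an error column, the percentages should be treated as indicative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RelativeGain {
    // Percentage by which newMs is lower than oldMs.
    static double reductionPercent(double oldMs, double newMs) {
        return (oldMs - newMs) / oldMs * 100.0;
    }

    public static void main(String[] args) {
        // {old, new} scores in ms/op, copied from the reported results.
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("100b.txt", new double[] { 35.115, 34.081 });
        scores.put("1k.txt", new double[] { 35.951, 35.951 });
        scores.put("10k.txt", new double[] { 45.528, 40.996 });
        scores.put("100k.txt", new double[] { 125.894, 66.433 });
        scores.put("1MB.txt", new double[] { 630.972, 587.246 });
        scores.put("10MB.txt", new double[] { 7538.637, 5361.234 });

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-9s %5.1f%%%n", e.getKey(), reductionPercent(s[0], s[1]));
        }
    }
}
```

For the 10 MB file this works out to a reduction of roughly 29%, which matches the approximate 30% figure quoted in the report.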


      A DESCRIPTION OF THE PROBLEM :
      InputStream::readAllBytes currently reads all bytes through a series of buffers (https://bugs.openjdk.java.net/browse/JDK-8193832). For local files, where the file size is known in advance, this could be optimized by reading all bytes at once from the OS, thus avoiding additional array allocations and array copies.
      Reading all bytes at once is heavily used in products such as the Eclipse IDE.
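
The difference between growing a buffer and allocating it at its final size can be sketched as follows. This is a simplified illustration, not the JDK's actual `InputStream.readAllBytes` code: the class `BufferStrategies`, both helper methods, the 8192-byte starting size, and the doubling policy are assumptions chosen for the example, and an in-memory stream stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class BufferStrategies {
    // Counts the byte[] allocations made by the two helpers below.
    static int allocations;

    // Size unknown: start small, double when full, trim at the end.
    // Hypothetical growth policy; ignores overflow for very large inputs.
    static byte[] readUnknownSize(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        allocations++;
        int count = 0;
        int n;
        while ((n = in.read(buf, count, buf.length - count)) >= 0) {
            count += n;
            if (count == buf.length) {
                buf = Arrays.copyOf(buf, buf.length * 2); // allocate and copy
                allocations++;
            }
        }
        allocations++;
        return Arrays.copyOf(buf, count); // final trim copy
    }

    // Size known in advance: one exact-sized buffer, no copies
    // unless the stream ends earlier than expected.
    static byte[] readKnownSize(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        allocations++;
        int count = 0;
        int n;
        while (count < size && (n = in.read(buf, count, size - count)) >= 0) {
            count += n;
        }
        if (count == size) {
            return buf;
        }
        allocations++;
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];

        allocations = 0;
        readUnknownSize(new ByteArrayInputStream(data));
        System.out.println("growing buffer allocations: " + allocations);

        allocations = 0;
        readKnownSize(new ByteArrayInputStream(data), data.length);
        System.out.println("exact buffer allocations:   " + allocations);
    }
}
```

With a 100,000-byte input, the growing path in this sketch allocates six arrays (the initial buffer, four doublings, and the final trim), while the exact-size path allocates one.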


              People

              Assignee:
              bpb Brian Burkhalter
              Reporter:
              webbuggrp Webbug Group
              Votes:
              0
              Watchers:
              5

                Dates

                Created:
                Updated:
                Resolved: