JDK / JDK-8170831

ZipFile implementation no longer caches the last accessed entry/pos

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 9
    • Component/s: core-libs
    • Labels:
      None

      Description

      In JDK 9, the ZIP format support behind ZipFile has been moved from native C code up to the Java level. This removes the expensive back-and-forth JNI calls and the costly native memory allocation for every ZIP entry lookup/access. Assuming that an entry lookup now costs no more than a simple hash table lookup, the new implementation dropped a "tricky" cache mechanism from the old C implementation. That cache always kept the last accessed native entry (its name and LOC position info), on the assumption that ZIP entries are typically used in a pattern like:

      ZipEntry e = zipfile.getEntry(name);
      InputStream is = zipfile.getInputStream(e);
      ...

      With the cache in place, the implementation can skip a second expensive lookup when the caller comes back to read the bytes of the entry it was just handed.
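      To make the idea concrete, here is a minimal sketch of a one-entry "last accessed" cache in front of a hash-table name lookup. It is a hypothetical model, not the JDK's code: the names CentralDirectory, EntryInfo, CachingReader and openAt are made up for illustration, names are assumed to be UTF-8 encoded, and the sketch is not thread-safe. getEntry(name) does the full encode-and-hash lookup and remembers the result; the follow-up openAt(e), standing in for getInputStream(e), is served from the cache.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a "last accessed entry" cache in front of a name table.
// CentralDirectory, EntryInfo, CachingReader and openAt are illustrative names,
// not the actual java.util.zip.ZipFile internals. Not thread-safe.
public class LastEntryCacheDemo {

    // What the cache remembers: the entry name and its LOC header offset.
    static final class EntryInfo {
        final String name;
        final long locOffset;

        EntryInfo(String name, long locOffset) {
            this.name = name;
            this.locOffset = locOffset;
        }
    }

    // Name table keyed by the encoded name bytes (UTF-8 assumed here).
    static final class CentralDirectory {
        private final Map<ByteBuffer, EntryInfo> table = new HashMap<>();
        int lookups = 0; // number of full encode-and-hash lookups performed

        void add(String name, long locOffset) {
            byte[] key = name.getBytes(StandardCharsets.UTF_8);
            table.put(ByteBuffer.wrap(key), new EntryInfo(name, locOffset));
        }

        EntryInfo lookup(String name) {
            lookups++;
            byte[] key = name.getBytes(StandardCharsets.UTF_8); // String -> byte[] cost
            return table.get(ByteBuffer.wrap(key));
        }
    }

    static final class CachingReader {
        private final CentralDirectory cen;
        private EntryInfo last; // the single cached entry

        CachingReader(CentralDirectory cen) {
            this.cen = cen;
        }

        // Models getEntry(name).
        EntryInfo getEntry(String name) {
            return find(name);
        }

        // Models getInputStream(e): returns where reading would start.
        long openAt(EntryInfo e) {
            EntryInfo found = find(e.name);
            if (found == null) {
                throw new IllegalArgumentException("no entry: " + e.name);
            }
            return found.locOffset;
        }

        private EntryInfo find(String name) {
            if (last != null && last.name.equals(name)) {
                return last; // hit: no encoding, no hash lookup
            }
            EntryInfo e = cen.lookup(name);
            if (e != null) {
                last = e; // remember it for the likely follow-up call
            }
            return e;
        }
    }

    public static void main(String[] args) {
        CentralDirectory cen = new CentralDirectory();
        cen.add("META-INF/MANIFEST.MF", 0L);
        cen.add("a/B.class", 120L);

        CachingReader zf = new CachingReader(cen);
        EntryInfo e = zf.getEntry("a/B.class"); // full lookup
        long pos = zf.openAt(e);                 // served from the cache
        System.out.println(pos + " " + cen.lookups); // prints "120 1"
    }
}
```

      The trade-off in this sketch is that a miss still pays the full lookup plus one field write, so the cache only helps when calls for the same name arrive back to back, which is exactly the getEntry/getInputStream pattern above.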

      Recent analysis of certain use scenarios suggests that such a cache is still desirable, to avoid unnecessary lookup costs in the use pattern above; one example is the cost of encoding the name from String to byte[] for the name table lookup. The cache also helps with the corner case of "entries with the same name" in a ZIP/JAR file.

      While "entries with the same name" in a ZIP/JAR file are not encouraged (our ZipOutputStream throws a ZipException if you try to write one), the ZIP format specification does not really say anything about them. The old ZipFile implementation actually works correctly here, giving you the corresponding bytes for each entry during iteration with a use pattern such as:

      zf.stream().forEach(ze -> zf.getInputStream(ze).readAllBytes() ...)
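      The iteration pattern above can be written as compilable code with the public API; a sketch follows. Because ZipFile.getInputStream throws IOException, the lambda wraps it in an UncheckedIOException, and InputStream.readAllBytes requires JDK 9 or later. The class name ZipIterationDemo, the helper methods and the temp-file name are illustrative choices. Since ZipOutputStream rejects a second entry with the same name, this example cannot build a duplicate-name archive; it only shows that rejection and the stream/getInputStream iteration, not the ZipFile cache itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipIterationDemo {

    // Writes two entries, then tries to add a duplicate name.
    // Returns true if ZipOutputStream rejected the duplicate.
    static boolean writeZip(Path path) throws IOException {
        boolean duplicateRejected = false;
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(path))) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("first".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("second".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            try {
                out.putNextEntry(new ZipEntry("a.txt"));
            } catch (ZipException expected) {
                duplicateRejected = true; // "duplicate entry: a.txt"
            }
        }
        return duplicateRejected;
    }

    // Iterates in central-directory order and reads each entry's bytes
    // through getInputStream(ze), as in the pattern from the description.
    static List<String> readAll(Path path) throws IOException {
        List<String> contents = new ArrayList<>();
        try (ZipFile zf = new ZipFile(path.toFile())) {
            zf.stream().forEach(ze -> {
                try (InputStream in = zf.getInputStream(ze)) {
                    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    contents.add(ze.getName() + "=" + text);
                } catch (IOException ex) {
                    throw new UncheckedIOException(ex);
                }
            });
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        Path zip = Files.createTempFile("demo", ".zip");
        try {
            System.out.println("duplicate rejected: " + writeZip(zip)); // true
            System.out.println(readAll(zip)); // [a.txt=first, b.txt=second]
        } finally {
            Files.deleteIfExists(zip);
        }
    }
}
```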

      So the proposal here is to add this cache mechanism back.


              People

              • Assignee:
                Xueming Shen (sherman)
                Reporter:
                Xueming Shen (sherman)
              • Votes:
                0
                Watchers:
                3
