Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8254284

Refine ZipOutputStream.putNextEntry() to recalculate ZipEntry's compressed size

    XMLWordPrintable

    Details

    • Type: CSR
    • Status: Closed
    • Priority: P4
    • Resolution: Approved
    • Fix Version/s: 16
    • Component/s: core-libs
    • Labels:
      None
    • Subcomponent:
    • Compatibility Kind:
      behavioral
    • Compatibility Risk:
      low
    • Compatibility Risk Description:
      Hide
      The compatibility risk is set to "Low" as there are no obvious compatibility issues with this change. The ZIP area has a long history of behavior changes being reported months or years after the fact so it is probably not "Minimal".

      If we choose to ignore the implicitly recorded compressed size in a `ZipEntry` read from a zip file when writing it to a `ZipOutputStream`, this will lead the re-calculation of the compressed size for that entry while its data is written to the stream. For client code using `ZipOutputStream.putNextEntry()` there will be no observable behavioural changes. For the underlying zip file this will create entries with incomplete information in the LFH and an additional Data Descriptor as described in JDK-8253952. However, the result is still fully compatible to the ZIP file specification. It will also not be unusual, because by default all new zip files created with `ZipOutputStream` already contain LFHs without Compressed and Uncompressed Size and CRC information and an additional Data Descriptor.

      I'm not aware of any tool which can not handle such files and if it exists it would have problems with the majority of Java created zip files anyway (e.g. all jar-files created with the jar tool have zip entries with incomplete LFH data and additional Data Descriptor).

      Ignoring the implicitly set compressed size of a `ZipEntry` has no measurable performance impact and will increase the size of zip archives which used to have the complete file information in the LFH before by 16 bytes per entry.
      Show
      The compatibility risk is set to "Low" as there are no obvious compatibility issues with this change. The ZIP area has a long history of behavior changes being reported months or years after the fact so it is probably not "Minimal". If we choose to ignore the implicitly recorded compressed size in a `ZipEntry` read from a zip file when writing it to a `ZipOutputStream`, this will lead the re-calculation of the compressed size for that entry while its data is written to the stream. For client code using `ZipOutputStream.putNextEntry()` there will be no observable behavioural changes. For the underlying zip file this will create entries with incomplete information in the LFH and an additional Data Descriptor as described in JDK-8253952 . However, the result is still fully compatible to the ZIP file specification. It will also not be unusual, because by default all new zip files created with `ZipOutputStream` already contain LFHs without Compressed and Uncompressed Size and CRC information and an additional Data Descriptor. I'm not aware of any tool which can not handle such files and if it exists it would have problems with the majority of Java created zip files anyway (e.g. all jar-files created with the jar tool have zip entries with incomplete LFH data and additional Data Descriptor). Ignoring the implicitly set compressed size of a `ZipEntry` has no measurable performance impact and will increase the size of zip archives which used to have the complete file information in the LFH before by 16 bytes per entry.
    • Interface Kind:
      Java API
    • Scope:
      SE

      Description

      Summary

      Refine ZipOutputStream.putNextEntry(ZipEntry) to recalculate ZipEntry's compressed size when ZipEntry.setCompressedSize() has not been explicitly called on the ZipEntry.

      Problem

      In general it is not safe to directly write a ZipEntry obtained from ZipInputStream.getNextEntry(), ZipFile.entries(), ZipFile.getEntry() or ZipFile.stream() with ZipOutputStream.putNextEntry() to a ZipOutputStream and then read the entries data from the ZipInputStream and write it to the ZipOutputStream as follows:

       ZipEntry entry;
       ZipInputStream zis = new ZipInputStream(...);
       ZipOutputStream zos = new ZipOutputStream(...);
       while((entry = zis.getNextEntry()) != null) {
           zos.putNextEntry(entry);
           zis.transferTo(zos);
       }

      The problem with this code is that the ZIP file format does not record the compression level used for deflation in its entries. In general, it doesn't even mandate a predefined compression ratio per compression level. Therefore the compressed size recorded in a ZipEntry read from a ZIP file might differ from the new compressed size produced by the receiving ZipOutputStream. Such a difference will result in a ZipException with the following message:

       java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 7 bytes)

      Unfortunately, there are examples where code which follows this coding pattern: Searching for "java.util.zip.ZipException: invalid entry compressed size (expected 12 but got 7 bytes)" gives ~2500 hits (~100 on StackOverflow). It's also no hard to find plenty of instances of this anti-pattern on GitHub when doing a code search for ZipEntry and putNextEntry(). E.g. Gradle 4.x wrapper task is affected as well as the latest version of the mockableAndroidJar task. I've recently fixed two occurrences of this pattern in OpenJDK (see JDK-8240333 and JDK-8240235) but there still exist more of them (e.g. test/jdk/java/util/zip/ZipFile/CopyJar.java which is there since 1999 :).

      Solution

      In ZipOutputStream.putNextEntry(ZipEntry) ignore and recalculate ZipEntry's compressed size when ZipEntry.setCompressedSize() has not been explicitly called on that ZipEntry.

      Specification

      Changes to the JarOutputStream API doc:

      --- a/src/java.base/share/classes/java/util/jar/JarOutputStream.java
      +++ b/src/java.base/share/classes/java/util/jar/JarOutputStream.java
      @@ -76,8 +76,15 @@ public class JarOutputStream extends ZipOutputStream {
           /**
            * Begins writing a new JAR file entry and positions the stream
            * to the start of the entry data. This method will also close
      -     * any previous entry. The default compression method will be
      -     * used if no compression method was specified for the entry.
      +     * any previous entry.
      +     * <p>
      +     * The default compression method will be used if no compression
      +     * method was specified for the entry. When writing a compressed
      +     * (DEFLATED) entry, and the compressed size has not been explicitly
      +     * set with the {@link ZipEntry#setCompressedSize(long)} method,
      +     * then the compressed size will be set to the actual compressed
      +     * size after deflation.
      +     * <p>
            * The current time will be used if the entry has no set modification
            * time.
            *

      Changes to the the ZipOutputStream API doc:

      --- a/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
      +++ b/src/java.base/share/classes/java/util/zip/ZipOutputStream.java
      @@ -179,9 +179,15 @@ public class ZipOutputStream extends DeflaterOutputStream implements ZipConstant
           /**
            * Begins writing a new ZIP file entry and positions the stream to the
            * start of the entry data. Closes the current entry if still active.
      +     * <p>
            * The default compression method will be used if no compression method
      -     * was specified for the entry, and the current time will be used if
      -     * the entry has no set modification time.
      +     * was specified for the entry. When writing a compressed (DEFLATED)
      +     * entry, and the compressed size has not been explicitly set with the
      +     * {@link ZipEntry#setCompressedSize(long)} method, then the compressed
      +     * size will be set to the actual compressed size after deflation.
      +     * <p>
      +     * The current time will be used if the entry has no set modification time.
      +     *
            * @param     e the ZIP entry to be written
            * @throws    ZipException if a ZIP format error has occurred
            * @throws    IOException if an I/O error has occurred

      See https://github.com/openjdk/jdk/pull/520

      The complete changes are attached as 0001-8253952-Refine-ZipOutputStream.putNextEntry-to-recal.patch to this CSR.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              simonis Volker Simonis
              Reporter:
              simonis Volker Simonis
              Reviewed By:
              Alan Bateman, Lance Andersen
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: