Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8167002

JAXP schema validator: Use HashSet instead of ArrayList for tracking XML IDs

    XMLWordPrintable

    Details

    • Subcomponent:
    • Resolved In Build:
      b140
    • Verification:
      Not verified

      Description

      A colleague reported:

      ValidationState is used to validate XML ID and IDREF elements (among other types). To do so, it keeps data structures containing all IDs and IDREFs that have been seen in a document. The only methods ever called on the ID container are add() and contains(), so substituting HashSet for ArrayList makes no difference in behavior, while improving performance by orders of magnitude in large documents. No change is necessary/possible, however, for the IDREF container.This one is only ever iterated over -- never searched -- and order matters, so ArrayList is appropriate.

      On a test document with ~1.5M elements, ~330K IDs, and ~430K IDREFs, this
      change speeds up parsing (with validation enabled) by a factor of 26 (from
      21.4 minutes, ~800ms/element, to 49 seconds, ~31µs/element).

      There are further obvious algorithmic improvements possible for additional
      constant-factor gains, but this is simple, safe, and brings schema validation
      from O(n^2+mn) to O(n+m), where n is the number of IDs and m is the number of
      IDREFs.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              martin Martin Buchholz
              Reporter:
              martin Martin Buchholz
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: