Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6982173

(coll) Performance anomalies in AbstractSet.removeAll

    Details

    • Type: Bug
    • Status: Open
    • Priority: P4
    • Resolution: Unresolved
    • Affects Version/s: 7
    • Fix Version/s: None
    • Component/s: core-libs
    • Labels:

      Description

      As written about in
       http://msmvps.com/blogs/jon_skeet/archive/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza.aspx

      there can be performance anomolies in AbstractSet.removeAll under some conditions; quoting:

      " This implementation [of AbstractSet.removeAll] determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.

      Now that sounds reasonable on the surface of it - iterate through the smaller collection, check for the presence in the bigger collection. However, this is where the abstraction is leaky. Just because we can ask for the presence of an item in a large collection doesn't mean it's going to be fast. In our case, the collections are the same size - but checking for the presence of an item in the HashSet is O(1) whereas checking in the ArrayList is O(N)... whereas the cost of iterating is going to be the same for each collection. Basically by choosing to iterate over the HashSet and check for presence in the ArrayList, we've got an O(M * N) solution overall instead of an O(N) solution. Ouch. The removeAll method is making an "optimization" based on assumptions which just aren't valid in this case."

        Issue Links

          Activity

          Hide
          michaelm Michael McMahon added a comment -
          BT2:EVALUATION

          It's unfortunate that AbstractSet.removeAll() is specified to behave exactly in this way. So, while in one sense it's not a bug, it clearly is still a performance anomaly.

          Maybe, removeAll() could be overridden in HashSet to use some more sophisticated heuristics, based on what we know about the actual types and their performance characteristics. I'm not sure if the spec. or implementation of AbstractSet.removeAll() could be changed in the same way though. If not, then we would probably have to look at the other uses of AbstractSet within java.util.
          Show
          michaelm Michael McMahon added a comment - BT2:EVALUATION It's unfortunate that AbstractSet.removeAll() is specified to behave exactly in this way. So, while in one sense it's not a bug, it clearly is still a performance anomaly. Maybe, removeAll() could be overridden in HashSet to use some more sophisticated heuristics, based on what we know about the actual types and their performance characteristics. I'm not sure if the spec. or implementation of AbstractSet.removeAll() could be changed in the same way though. If not, then we would probably have to look at the other uses of AbstractSet within java.util.
          Hide
          michaelm Michael McMahon added a comment -
          BT2:EVALUATION

          or perhaps something could be done with a marker interface so that collections can identify themselves as having certain performance characteristics and that information could be used by removeAll()
          Show
          michaelm Michael McMahon added a comment - BT2:EVALUATION or perhaps something could be done with a marker interface so that collections can identify themselves as having certain performance characteristics and that information could be used by removeAll()
          Hide
          smarks Stuart Marks added a comment -
          Show
          smarks Stuart Marks added a comment - Jon Skeet's article now appears at this location: https://codeblog.jonskeet.uk/2010/07/29/there-s-a-hole-in-my-abstraction-dear-liza-dear-liza/

            People

            • Assignee:
              michaelm Michael McMahon
              Reporter:
              darcy Joe Darcy
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Imported:
                Indexed: