Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-6456806

Excessively long CMS preclean cycle eventually leads to concurrent mode failure

    Details

    • Subcomponent:
      gc
    • Resolved In Build:
      b03
    • CPU:
      sparc
    • OS:
      solaris_8

      Description

      Customer has a voice over IP application running 1.4.2_11 that after a period of time
      (usually 14-24 hours)
      They have a pre-clean that takes quite some time. We've seen from a few hundred seconds up to 6000+

      GC options: -server -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
      -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
      -XX:MaxNewSize=128m -XX:NewSize=128m -Xms1G -Xmx1G -XX:SurvivorRatio=6
      -XX:MaxTenuringThreshold=4 -XX:CMSInitiatingOccupancyFraction=60
      -Dsun.rmi.dgc.server.gcInterval=0x7FFFFFFFFFFFFFFE
      -Dsun.rmi.dgc.client.gcInterval=0x7FFFFFFFFFFFFFFE
      -XX:+DisableExplicitGC -XX:+UseTLAB -verbose:gc
      -XX:+PrintTenuringDistribution -XX:+PrintGCTimeStamps
      -XX:+PrintGCDetails -XX:+PrintHeapAtGC -Xloggc:gc

      an example of a 6000+ preclean:
      90401.397: [GC {Heap before GC invocations=6436:
      Heap
      par new generation total 114688K, used 107474K [0xb5400000, 0xbd400000, 0xbd400000)
       eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
       from space 16384K, 55% used [0xbb400000, 0xbbcf4b70, 0xbc400000)
       to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
      concurrent mark-sweep generation total 917504K, used 761505K [0xbd400000, 0xf5400000, 0xf5400000)
      concurrent-mark-sweep perm gen total 51336K, used 30768K [0xf5400000, 0xf8622000, 0xf9400000)
      90401.397: [ParNew (promotion failed): 107474K->107474K(114688K),
      3.4757402 secs]90404.873: [CMS95947.013: [CMS-concurrent-preclean: 6187.143/6199.426 secs]
      (concurrent mode failure): 763375K->507358K(917504K), 5560.5453822
      secs] 868980K->507358K(1032192K) Heap after GC invocations=6437:
      Heap
      par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
       eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
       from space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
       to space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
      concurrent mark-sweep generation total 917504K, used 507358K
      [0xbd400000, 0xf5400000, 0xf5400000)
      concurrent-mark-sweep perm gen total 51336K, used 30758K [0xf5400000, 0xf8622000, 0xf9400000) } , 5564.0254377 secs]


      The pattern in one of the gc logs running with -XX:PrintCMSStatistics=2
      shows the re-scan timings spiking.
      At the time of the long prelcean there are 833857 cards rescaned.

      113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)

      Looks like a lot of cards re-scanned and just after it the number drops substantially.The one before it is ~4000.

      I can see the ramp up where pre-clean was taking 2-8 seconds then starts to go up to the high teens and then some 40-60 seconds and then this one.

      113684.644: [GC {Heap before GC invocations=5683:
      Heap
      par new generation total 114688K, used 107519K [0xb5400000, 0xbd400000, 0xbd400000)
       eden space 98304K, 100% used [0xb5400000, 0xbb400000, 0xbb400000)
       from space 16384K, 56% used [0xbc400000, 0xbccffed0, 0xbd400000)
       to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
      concurrent mark-sweep generation total 917504K, used 809255K [0xbd400000, 0xf5400000, 0xf5400000)
      concurrent-mark-sweep perm gen total 52888K, used 31705K [0xf5400000, 0xf87a6000, 0xf9400000)
      113684.645: [ParNew (promotion failed): 107519K->107519K(114688K), 1.0264022 secs]113685.671: [CMS (cardTable: 0 cards, re-scanned 833857 cards, 3 iterations)
      117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
      (CMS-concurrent-preclean yielded 0 times)
      (concurrent mode failure): 815888K->451828K(917504K), 3748.8971119 secs] 916774K->451828K(1032192K) Heap after GC invocations=5684:
      Heap
      par new generation total 114688K, used 0K [0xb5400000, 0xbd400000, 0xbd400000)
       eden space 98304K, 0% used [0xb5400000, 0xb5400000, 0xbb400000)
       from space 16384K, 0% used [0xbc400000, 0xbc400000, 0xbd400000)
       to space 16384K, 0% used [0xbb400000, 0xbb400000, 0xbc400000)
      concurrent mark-sweep generation total 917504K, used 451828K [0xbd400000, 0xf5400000, 0xf5400000)
      concurrent-mark-sweep perm gen total 52888K, used 31693K [0xf5400000, 0xf87a6000, 0xf9400000)} , 3749.9312617 secs]


      There is a jump from ~15 seconds to 57 to 0 then back to 15 then 4482 seconds between the CMS events.
      The cards re-scanned are 156931 for the 57 second one
      Drop to 1209
      then the ramp up to the eventual 833857

      112761.630: [CMS-concurrent-preclean: 14.293/14.561 secs]
      (CMS-concurrent-preclean yielded 0 times)
      112776.001: [CMS-concurrent-preclean-start]
      112834.097: [CMS-concurrent-preclean: 57.002/58.097 secs]
      (CMS-concurrent-preclean yielded 0 times)
      112848.226: [CMS-concurrent-preclean-start]
      112849.126: [CMS-concurrent-preclean: 0.650/0.900 secs]
      (CMS-concurrent-preclean yielded 0 times)
      112863.932: [CMS-concurrent-preclean-start]
      112878.684: [CMS-concurrent-preclean: 14.478/14.751 secs]
      (CMS-concurrent-preclean yielded 0 times)
      112892.655: [CMS-concurrent-preclean-start]
      112908.947: [CMS-concurrent-preclean: 15.788/16.292 secs]
      (CMS-concurrent-preclean yielded 0 times)
      112923.189: [CMS-concurrent-preclean-start]
      117418.012: [CMS-concurrent-preclean: 4482.648/4494.823 secs]
      (CMS-concurrent-preclean yielded 0 times)
      117442.657: [CMS-concurrent-preclean-start]
      117497.231: [CMS-concurrent-preclean: 54.569/54.574 secs]
      (CMS-concurrent-preclean yielded 0 times)
      117509.801: [CMS-concurrent-preclean-start]
      117510.480: [CMS-concurrent-preclean: 0.679/0.680 secs]
      (CMS-concurrent-preclean yielded 0 times)
      117521.990: [CMS-concurrent-preclean-start]
      117522.484: [CMS-concurrent-preclean: 0.493/0.494 secs]
      (CMS-concurrent-preclean yielded 0 times)
      117533.450: [CMS-concurrent-preclean-start]
      117533.943: [CMS-concurrent-preclean: 0.493/0.494 secs]

      The rate of card rescans, there's a big drop
      here. Before that, it's typically (less than) 1 second per
      1000 dirty cards scanned. Then suddenly for this particular
      event it jumps up to about 5 times that value, some 5 seconds
      per 1000 cards scanned.

      Customer has sent in SAR data and there doesn't appear to be
      a burden on the system and paging/swap doesn't seem to
      play a role in the impact of this behavior.

        Attachments

          Activity

            People

            • Assignee:
              kevinw Kevin Walls
              Reporter:
              msusko Mark Susko (Inactive)
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Imported:
                Indexed: