JDK-4832993: PrintWriter.flush() concurrency problem under heavy load

    Details

    • CPU:
      sparc
    • OS:
      solaris_8

      Description

      Customer is running their server under heavy load: 1000 concurrent HTTP
      clients. The server is configured with 1700 client threads and is running
      on a 4-way 700 MHz Pentium III Xeon box with 2 GB RAM under Windows 2000
      Advanced Server; they have also tested on Solaris 8 SPARC and see the
      same problem described below.

      ---------
      After a few minutes of running we start to see (via Ctrl-Break thread
      dumps) threads that are "runnable" but waiting in
      java.io.PrintWriter.flush(). There is a synchronized block in
      PrintWriter.flush(), but each thread has its own PrintWriter instance,
      so there should be no contention for 'lock'. Also, the stack traces show
      that we are waiting at line 121 (in 1.4.1_02), but line 121 is the
      closing brace of the synchronized block.
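For context, flush() in the 1.4-era PrintWriter is essentially synchronized (lock) { ensureOpen(); out.flush(); }, where 'lock' is the protected field inherited from java.io.Writer. A minimal sketch (illustrative, not the customer's code) of why independent instances should never contend: each PrintWriter built over its own Writer synchronizes on a distinct monitor.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class PerInstanceLocks {
    public static void main(String[] args) {
        // Two independent PrintWriters over two independent Writers.
        // A PrintWriter's inherited 'lock' field refers to its wrapped
        // Writer, so these two flush() calls synchronize on different
        // objects and cannot block each other.
        StringWriter sinkA = new StringWriter();
        StringWriter sinkB = new StringWriter();
        PrintWriter a = new PrintWriter(sinkA);
        PrintWriter b = new PrintWriter(sinkB);
        a.print("A");
        b.print("B");
        a.flush();   // internally: synchronized (lock) { ... out.flush(); }
        b.flush();
        System.out.println(sinkA.toString() + sinkB.toString());
    }
}
```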

      Here's an example thread showing the thread waiting in flush():

      "client1500" daemon prio=5 tid=0x283D0EE0 nid=0x12a8 runnable [400ef000..400efdc0]
      at java.io.PrintWriter.flush(PrintWriter.java:121)
      at com.sssw.srv.jsp.AgoJspWriter.flush(AgoJspWriter.java:198)
      at com.sssw.srv.jsp.AgoJspPageContext.release(AgoJspPageContext.java:87)
      at com.sssw.srv.jsp.AgoJspFactory.releasePageContext(AgoJspFactory.java:59)
      at com.sssw.gen.jsp.jsp_include_include_jsp_1016613202._jspService(jsp_include_include_jsp_1016613202.java:104)
      at com.sssw.srv.jsp.AgoHttpJspPage.service(AgoHttpJspPage.java:94)
      at com.sssw.srv.busobj.AgoFilterChain.doServlet(AgoFilterChain.java:394)
      at com.sssw.srv.busobj.AgoFilterChain.doFilter(AgoFilterChain.java:180)
      at com.sssw.srv.resources.AgWarResource.service(AgWarResource.java:629)
      at com.sssw.srv.resources.AgWarURLResource.perform(AgWarURLResource.java:113)
      at com.sssw.srv.http.httpd.perform(httpd.java:5568)
      at com.sssw.srv.http.Client.processRequest(Client.java:905)
      at com.sssw.srv.http.Client.loop(Client.java:1271)
      at com.sssw.srv.http.Client.runConnection(Client.java:1487)
      at com.sssw.srv.http.Client.run(Client.java:1435)
      at java.lang.Thread.run(Thread.java:536)

      If we let the test continue to run we will get hundreds (831 in my last
      test) of threads sitting in this state. About half of these threads are
      in the state above, and about half are doing a flush() from a different
      path:

      "client1499" daemon prio=5 tid=0x283D0288 nid=0xf34 runnable [400af000..400afdc0]
      at java.io.PrintWriter.flush(PrintWriter.java:121)
      at com.sssw.srv.busobj.AgoHttpReplyEvent.completeContent(AgoHttpReplyEvent.java:87)
      at com.sssw.srv.resources.AgWarResource.service(AgWarResource.java:631)
      at com.sssw.srv.resources.AgWarURLResource.perform(AgWarURLResource.java:113)
      at com.sssw.srv.http.httpd.perform(httpd.java:5568)
      at com.sssw.srv.http.Client.processRequest(Client.java:905)
      at com.sssw.srv.http.Client.loop(Client.java:1271)
      at com.sssw.srv.http.Client.runConnection(Client.java:1487)
      at com.sssw.srv.http.Client.run(Client.java:1435)
      at java.lang.Thread.run(Thread.java:536)

      Again, none of these PrintWriter objects are sharing the same 'lock'.
      Looking through all the threads, I don't see any thread that holds the
      lock.

      Thousands and thousands of requests complete successfully, so not all
      threads are getting hung up here, but more and more do as the test runs
      on. If we take successive Ctrl-Break dumps, some of the threads that
      were waiting in flush() in previous traces are not waiting in current
      ones. If I had to guess what is happening, it looks like there is some
      scheduling bottleneck on exit from the synchronized block on an SMP
      system.
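The successive-dump comparison described above can be scripted. A sketch follows; dump.txt stands in for a thread dump captured from the running JVM (kill -QUIT <pid> on Solaris/Linux is the command-line equivalent of Ctrl-Break, and the dump goes to the JVM's stdout). Repeating the count per dump shows whether the set of stuck threads grows.

```shell
# dump.txt is a stand-in for a captured thread dump (kill -QUIT <pid>).
cat > dump.txt <<'EOF'
"client1500" daemon prio=5 tid=0x283D0EE0 nid=0x12a8 runnable
    at java.io.PrintWriter.flush(PrintWriter.java:121)
"client1499" daemon prio=5 tid=0x283D0288 nid=0xf34 runnable
    at java.io.PrintWriter.flush(PrintWriter.java:121)
"idle1" daemon prio=5 tid=0x283D0300 nid=0xf44 waiting on monitor
    at java.lang.Object.wait(Native Method)
EOF
# Count threads currently parked in PrintWriter.flush() in this dump.
grep -c 'at java.io.PrintWriter.flush' dump.txt
```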

      This is really hurting our performance / scalability. We realize that
      having this many threads is not a good design (we are fixing this in the
      release we are working on), but it is surprising that this particular
      synchronized block is causing problems while thousands of others aren't.
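A stripped-down sketch of the workload pattern described above (thread count and loop length are illustrative, not the customer's configuration). Since no two threads share a PrintWriter, no flush() here should ever block on another thread's monitor, which is exactly what makes the observed pile-up surprising.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class FlushUnderLoad {
    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 32;   // illustrative; the customer ran 1700
        Thread[] workers = new Thread[THREADS];
        for (int i = 0; i < THREADS; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    // Each worker owns a distinct PrintWriter, hence a
                    // distinct monitor: flush() calls in different threads
                    // synchronize on different objects.
                    PrintWriter pw = new PrintWriter(new StringWriter());
                    for (int j = 0; j < 10000; j++) {
                        pw.print('x');
                        pw.flush();
                    }
                }
            });
        }
        for (int i = 0; i < THREADS; i++) workers[i].start();
        for (int i = 0; i < THREADS; i++) workers[i].join();
        System.out.println("all workers finished");
    }
}
```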


      ###@###.### 2003-04-18
      We brought the customer's server into the lab and ran a test.
      This time the thread dumps listed many of the same hanging threads
      in a condition-wait state instead of runnable. From the pstack output,
      most of these threads show the following:
      ----------------- lwp# 22 / thread# 21 --------------------
       ff31f008 lwp_cond_wait (3f4160, 3f4148, 0)
       ff31654c _lwp_cond_wait (3f4160, 3f4148, 0, ff186000, 0, 0) + c
       fee698f8 __1cNObjectMonitorEwait6MxipnGThread__v_ (3f6078, 0, 5c00, 5d04, 87fe08, 55ec) + 61c
       fee68c50 __1cSObjectSynchronizerEwait6FnGHandle_xpnGThread__v_ (e43d167c, 0, 0, 87fe08, e43d12b0, 4a4ae8) + 180
       fee68e18 JVM_MonitorWait (87fe94, e43d1774, 0, 0, 87fe08, 0) + 130
       fa00b92c ???????? (e43d1774, 0, 0, ffffffff, 0, 0)
       fa005804 ???????? (e43d17f4, b6, 0, fa015e64, c, e43d1710)
       fa005804 ???????? (e43d1884, b6, 0, fa015e30, 4, e43d1790)
       fa005804 ???????? (e43d1904, 0, 0, fa015e30, 4, e43d1820)
       fa00010c ???????? (e43d1994, e43d1c08, a, f66c6430, 4, e43d18a8)
       fed681f8 __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (e43d1c00, e43d1a60, e43d1b20, 87fe08, 87fe08, e43d1a74) + 25c
       fee65f08 __1cJJavaCallsMcall_virtual6FpnJJavaValue_nLKlassHandle_nMsymbolHandle_4pnRJavaCallArguments_pnGThread__v_ (ff186000, 2f9c50, e43d1b14, e43d1b10, e43d1b20, 87fe08) + 190
       fee78f1c __1cJJavaCallsMcall_virtual6FpnJJavaValue_nGHandle_nLKlassHandle_nMsymbolHandle_5pnGThread__v_ (e43d1c00, e43d1bfc, e43d1bf0, e43d1be8, e43d1be0, 87fe08) + 60
       fee8b210 __1cMthread_entry6FpnKJavaThread_pnGThread__v_ (87fe08, 87fe08, 3f4330, 2f9c50, 301d38, fee841b0) + 1d8
       fee841d8 __1cKJavaThreadDrun6M_v_ (87fe08, 0, 0, 0, 0, 0) + 254
       fee80624 _start (87fe08, fe9ef688, 1, 1, fe9ee000, 0) + 20
       fe9db728 _thread_start (87fe08, 0, 0, 0, 0, 0) + 40

      The entire pstack is located in the attachments.

            People

            • Assignee:
              Jayalaxmi Hangal (jhangalsunw) (Inactive)
            • Reporter:
              Albert Tong-schmidt (atongschsunw) (Inactive)