Details

    • Subcomponent:
    • Resolved In Build:
      04
    • CPU:
      sparc
    • OS:
      solaris_8

        Description


        ============================= Problem ===========================
        Customer's description:

        Java Virtual Machine 1.3.1-b24 running on Sun Solaris 8 on an Enterprise 10000 machine with 16 CPUs.
        Our application uses JNI, CORBA as its communication protocol, and a Sybase JDBC driver.
        We have been using export LD_LIBRARY_PATH=/usr/lib/lwp on all systems for
        a long time.
        Yes, we have mixed classes (1.2.2_05a and 1.3.1).
        The application uses 1800 MB of memory and around 1600 threads.

        Our application server crashed in our production environment. A core file and a pstack file are available. After investigating these files we did not find any information about why the server crashed or what caused it. We also got an error log on the console, generated by the Virtual Machine. This error is our starting point for the moment, and we hope that Sun can help us track down the problem and give us some information about this error ID.

           This is the error from the console.
              #
              # HotSpot Virtual Machine Error, Internal Error
              # Please report this error at
              # http://java.sun.com/cgi-bin/bugreport.cgi
              #
              # Error ID: 455843455054494F4E530E43505000CD 01
              #
              # Problematic Thread: prio=5 tid=0x1c9b7d8 nid=0x53c runnable
              #

        ======================== dbx stacktrace =======================
        Here is a dbx stacktrace:

        Reading java
        core file header read successfully
        Reading ld.so.1
        Reading libthread.so.1
        Reading libdl.so.1
        Reading libc.so.1
        Reading libc_psr.so.1
        Reading libjvm.so
        Reading libCrun.so.1
        Reading libsocket.so.1
        Reading libnsl.so.1
        Reading libm.so.1
        Reading libw.so.1
        Reading libmp.so.2
        Reading libhpi.so
        Reading libverify.so
        Reading libjava.so
        Reading libzip.so
        Reading en_US.UTF-8.so.2
        Reading methods_en_US.UTF-8.so.2
        Reading libUtility.so
        Reading libpthread.so.1
        Reading librt.so.1
        Reading libaio.so.1
        Reading libnet.so
        Reading nss_files.so.1
        Reading libioser12.so
        Reading libGEDWrapper0.so
        Reading libtls7d.so
        Reading libmth7d.so
        Reading libbla7d.so
        Reading libnagc.so.6
        Reading libintl.so.1
        Reading libucb.so.1
        Reading libresolv.so.2
        Reading libelf.so.1
        Reading libGEDWrapper9.so
        Reading libGEDWrapper3.so
        Reading libGEDWrapper5.so
        Reading libGEDWrapper6.so
        Reading libGEDWrapper2.so
        Reading libGEDWrapper1.so
        Reading libGEDWrapper10.so
        Reading libGEDWrapper7.so
        Reading libGEDWrapper4.so
        Reading libGEDWrapper8.so
        Reading libRTWrapperDLL.so
        Reading libRTContribWrapperDLL.so
        detected a multithreaded program
        t@1340 (l@1340) terminated by signal ABRT (Abort)
        0xff3196f8: __lwp_kill+0x0008: bgeu,a __lwp_kill+0x1c
        (/opt/SUNWspro/bin/../WS6U2/bin/sparcv9/dbx) where
        current thread: t@1340
        =>[1] __lwp_kill(0x0, 0x53c, 0x0, 0xff336000, 0x19798, 0xff2c9b00), at 0xff3196f8
          [2] raise(0x6, 0x0, 0x0, 0xffffffff, 0xff33a394, 0xc), at 0xff2c9b08
          [3] abort(0xff336000, 0x5f57e648, 0x0, 0x4, 0x0, 0x5f57e669), at 0xff2b5124
          [4] os::abort(0x1, 0xff092000, 0x1, 0x5f57e, 0xff092000, 0x5f57e664), at 0xfefa019c
          [5] report_error(0xe4, 0x5f57eee4, 0xcd, 0xff028208, 0xff0ffddc, 0xff092000), at 0xfef0f904
          [6] report_fatal(0xcd, 0xff092000, 0xff02874c, 0x5f57f, 0xff092000, 0x5f57f824), at 0xfef0f1d4
          [7] ExceptionMark::ExceptionMark(0x886e2708, 0x5f57f8f8, 0xff092000, 0xf74000c0, 0xff092000, 0x5f57f884), at 0xfecf2540
          [8] constantPoolOopDesc::klass_at_if_loaded(0x0, 0x0, 0x5f57f98c, 0xf740f378, 0xff092000, 0x5f57f914), at 0xfecf0f84
          [9] methodOopDesc::fast_exception_handler_bci_for(0x1c9b7d8, 0xf7e13070, 0xf7e13538, 0x5f57fa68, 0x7b, 0x5f57f99c), at 0xfed5dbb8
          [10] InterpreterRuntime::exception_handler_for_exception(0x1c9b7d8, 0xf74373f8, 0x5f57fc48, 0xff092000, 0x1c9b7d8, 0x109a0), at 0xfed5d670
          [11] 0x123860(0x5f57fbdc, 0x1, 0xff09fa58, 0x12d944, 0x8, 0x5f57fae8), at 0x12385f
          [12] 0xff0f9968(0x5f57fc68, 0x5f57fea0, 0xa, 0xf7e13598, 0x4, 0x5f57fb80), at 0xff0f9967
          [13] JavaCalls::call_helper(0x5f57fe98, 0xff092000, 0x5f57fde4, 0x1c9b7d8, 0x123d78, 0x5f57fea0), at 0xfecc67ec
          [14] JavaCalls::call_virtual(0xf7e136d8, 0x5f57fdd0, 0x5f57fdd4, 0xff092000, 0x5f57fe98, 0x5f57fde4), at 0xfedf0e18
          [15] JavaCalls::call_virtual(0x5f57fe98, 0x5f57fe94, 0x5f57fe90, 0x5f57fe84, 0x5f57fe7c, 0x1c9b7d8), at 0xfedf6dec
          [16] thread_entry(0xf7417e18, 0x1c9b7d8, 0xff092000, 0x5f57ffa0, 0x1e, 0xe), at 0xfee14ba4
          [17] JavaThread::run(0x5f500000, 0xff09cf3c, 0xff092000, 0x80000, 0x1c9b7d8, 0x80000), at 0xfee0f6a4
          [18] _start(0xff092000, 0x5f580000, 0x0, 0x0, 0x0, 0x0), at 0xfee0d410
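
        The console error names the problematic thread as nid=0x53c, while dbx
        reports the aborting thread as t@1340. The following is a minimal sketch
        for checking that the two identifiers refer to the same thread. The helper
        name same_thread is hypothetical and exists only for this illustration.

```python
def same_thread(nid_hex, dbx_thread):
    """Return True if a HotSpot 'nid=0x...' value names the dbx thread 't@N'."""
    nid = int(nid_hex, 16)                   # hexadecimal id from the console error
    dbx_id = int(dbx_thread.split("@")[1])   # decimal id from the dbx output
    return nid == dbx_id


print(same_thread("0x53c", "t@1340"))  # prints True
```

        The same value, 0x53c, also appears as the second argument of __lwp_kill
        in frame [1].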

        A pstack and a pmap are also available:

        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-1/pstack
        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-1/pmap.prod

        There are also a dbx output, pstack, pmap, environment listing, and email from a second crash:

        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-2/dbx-stack-trace-2
        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-2/pstack-2
        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-2/pmap-2
        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-2/environment
        http://cores.germany/cgi/content.pl?file=/cores/CA_36376145/crash-2/email

        To see the complete explorer data, go to:

        http://cores.germany/cgi/view.pl
        and enter case ID 36376145.

        ============================ jvm Options ===========================
        JAVA_OPTIONS=-server -verbose:gc -Xnoclassgc -Xms1800m -Xmx1800m

        ============================= Patches ========================
           patches for 1.3.1 on Solaris 8:

           Required     Description   Installed

           108652-33    Xsun Patch    -
           108921-12    dtwm Patch    108921-07
           108940-24    motif Patch   108940-12

           I don't think they need these patches, as they use the -server option.
           Or is X11 still used with -server?

        ============================= Analysis ==========================

        An analysis from my colleague Kevin Walls:

           Hi Peter,

              The hex string in the error ID decodes, as ASCII, to EXCEPTIONS.CPP
              line 205. That line says:

              205 fatal("ExceptionMark constructor expects no pending exceptions");

              ..which suggests to me that it's handling an exception, and sees that one is
              already pending.
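
              The decoding described above can be sketched as follows. This is
              an illustration inferred from this one error ID, not a documented
              HotSpot format: it assumes the last two bytes are the line number
              and that setting bit 0x20 in each preceding byte recovers the
              lower-case file name (which turns 0x0E into '.'). The trailing
              " 01" field is ignored because its meaning is not stated here.
              The function name decode_error_id is hypothetical.

```python
def decode_error_id(error_id):
    """Decode a HotSpot 1.3.1-style error ID into (file name, line number).

    Assumption: the layout is <file name bytes><2-byte big-endian line number>,
    with bit 0x20 cleared in every file name byte. Characters such as '_'
    would not survive this guess unchanged.
    """
    hex_part = error_id.split()[0]            # drop the trailing " 01" field
    raw = bytes.fromhex(hex_part)
    name_bytes, line_bytes = raw[:-2], raw[-2:]
    name = "".join(chr(b | 0x20) for b in name_bytes)
    line = int.from_bytes(line_bytes, "big")
    return name, line


print(decode_error_id("455843455054494F4E530E43505000CD 01"))
# prints ('exceptions.cpp', 205)
```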

              In the pstack of lwp 1340 which aborts:

              (Incidentally, are they using the alternate thread library in lib/lwp? That
              would be my guess as to why the lwp/thread numbers all seem to match! Not sure
              if it's relevant or if they should be using it. It could be a useful
              good/bad comparison.)

              .
              .
              .
               fed5d670 __1cSInterpreterRuntimebFexception_handler_for_exception6FpnKJavaThread_pnHoopDesc__pC_ (1c9b7d8, f74373f8, 5f57fc48, ff092000, 1c9b7d8, 109a0) + 338
               00123860 ???????? (5f57fbdc, 1, ff09fa58, 12d944, 8, 5f57fae8)
               ff0f9968 __1cMStubRoutinesG_code1_ (5f57fc68, 5f57fea0, a, f7e13598, 4, 5f57fb80) + 3e8
               fecc67ec __1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_ (5f57fe98, ff092000, 5f57fde4, 1c9b7d8, 123d78, 5f57fea0) + 308
              .
              .
              .

              The middle line here ??????? - this is probably in the program text so I guess
              it's their native app? pmap would prove this.

              If it is their program, then it's then calling into: (running
              /opt/SUNWspro/bin/dem on the symbol in the pstack):

              unsigned char* InterpreterRuntime::exception_handler_for_exception(JavaThread*,oopDesc*)

              ..which seems to be what works out the continuation address when an exception
              happens (but from above, one was already pending I suppose, hence the abort).

              If that was true it could be their own fault: however, I think the call_helper
              routine is all java land execution stuff, so the ????? address may be on the
              heap (again, pstack will prove it - pstack on the core should be enough to show
              if it's the heap?).

              So the ?????? may be compiled code:

              java -Xint would run in interpreted, no-hotspot mode. I hope the problem is
              reproducible and they have a test environment!

              Ah - from:

              http://cheesypoof.uk/lxr/source/hotspot1.3.1/src/share/vm/runtime/stubRoutines.hpp?p=Java_1.3.1

               // StubRoutines provides entry points to assembly routines used by
               // compiled code and the run-time system. Platform-specific entry
               // points are defined in the platform-specific inner class.
               //
               
               ..so it IS compiled code!

              Maybe we've got a new problem statement:

              compiled code is generating an exception while one is already pending
              (with it to be confirmed if non-compiled code has the same problem)

              I'm really not sure if that's any help at all, but it was interesting!

              Kevin Walls


              The address 0x00123860 is within the heap. This is the relevant line
              in pmap:
              00026000 39832K read/write/exec [ heap ]

              So it is dynamically compiled code as we thought, i.e. HotSpot-compiled
              code.
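
              The range check behind this conclusion can be sketched as below.
              It assumes the pmap line has the shape "<hex base> <size>K ..."
              shown above. The function names parse_pmap_line and in_mapping
              are hypothetical. The check only shows that the address lies in
              the [ heap ] mapping; it says nothing about what is stored there.

```python
def parse_pmap_line(line):
    """Return (start, end) of a mapping from a line like '00026000 39832K ...'."""
    fields = line.split()
    base = int(fields[0], 16)               # base address is hexadecimal
    size_kb = int(fields[1].rstrip("K"))    # size is decimal kilobytes
    return base, base + size_kb * 1024      # end address is exclusive


def in_mapping(addr, line):
    """Return True if addr falls inside the mapping described by the pmap line."""
    start, end = parse_pmap_line(line)
    return start <= addr < end


heap_line = "00026000 39832K read/write/exec [ heap ]"
print(in_mapping(0x00123860, heap_line))  # prints True
```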


        =========================================================

        I hope that this data is enough for you to find out what happened.
        If you need more information, please ask me.

        ###@###.###

                People

                • Assignee:
                  sundar Sundararajan Athijegannathan
                  Reporter:
                  pmaier Peter Maier (Inactive)
