Uploaded image for project: 'JDK'
  1. JDK
  2. JDK-8271121

ZGC: stack overflow (segv) when -Xlog:gc+start=debug

    XMLWordPrintable

    Details

    • Subcomponent:
      gc
    • Introduced In Build:
      b26
    • Introduced In Version:
      16
    • Resolved In Build:
      b10
    • CPU:
      x86_64
    • OS:
      linux

      Backports

        Description

        ADDITIONAL SYSTEM INFORMATION :
        I was able to reproduce the issue using OpenJDK 16.0.1, 17 (EA) and 18 (EA).

        $ uname -a
        Linux <redacted> 5.12.14-arch1-1 #1 SMP PREEMPT Thu, 01 Jul 2021 07:26:06 +0000 x86_64 GNU/Linux

        $ jdk-16.0.1/bin/java -version
        openjdk version "16.0.1" 2021-04-20
        OpenJDK Runtime Environment (build 16.0.1+9-24)
        OpenJDK 64-Bit Server VM (build 16.0.1+9-24, mixed mode, sharing)

        $ jdk-17/bin/java -version
        openjdk version "17-ea" 2021-09-14
        OpenJDK Runtime Environment (build 17-ea+29-2576)
        OpenJDK 64-Bit Server VM (build 17-ea+29-2576, mixed mode, sharing)

        $ jdk-18/bin/java -version
        openjdk version "18-ea" 2022-03-15
        OpenJDK Runtime Environment (build 18-ea+4-122)
        OpenJDK 64-Bit Server VM (build 18-ea+4-122, mixed mode, sharing)

        A DESCRIPTION OF THE PROBLEM :
        When the JVM is started with -XX:+UseZGC and -Xlog:gc+start=debug, oftentimes (80-90% of the time) it crashes eventually (usually within several seconds to several minutes) with a SIGSEGV. Below is the backtrace obtained from the core dump. Most of the frames are the repetition of #6-10 and omitted from the list.

        (gdb) bt
        #0 0x00007f461bcf605b in ZPageAllocator::alloc_page(unsigned char, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #1 0x00007f461bcdf727 in ZHeap::alloc_page(unsigned char, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #2 0x00007f461bcf2834 in ZObjectAllocator::alloc_object_in_shared_page(ZPage**, unsigned char, unsigned long, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #3 0x00007f461bd00f82 in ZRelocate::relocate_object(ZForwarding*, unsigned long) const () from jdk-16.0.1/lib/server/libjvm.so
        #4 0x00007f461b3c0bd0 in AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<286790ul, ZBarrierSet>, (AccessInternal::BarrierType)3, 286790ul>::oop_access_barrier(oopDesc*, long) () from jdk-16.0.1/lib/server/libjvm.so
        #5 0x00007f461bc1b9b2 in JavaThread::get_thread_name() const () from jdk-16.0.1/lib/server/libjvm.so
        #6 0x00007f461bd09197 in ZStatCriticalPhase::register_start(TimeInstant<CompositeCounterRepresentation, CompositeElapsedCounterSource> const&) const () from jdk-16.0.1/lib/server/libjvm.so
        #7 0x00007f461bcdec17 in ZForwarding::wait_page_released() const () from jdk-16.0.1/lib/server/libjvm.so
        #8 0x00007f461bd00fc8 in ZRelocate::relocate_object(ZForwarding*, unsigned long) const () from jdk-16.0.1/lib/server/libjvm.so
        #9 0x00007f461b3c0bd0 in AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<286790ul, ZBarrierSet>, (AccessInternal::BarrierType)3, 286790ul>::oop_access_barrier(oopDesc*, long) () from jdk-16.0.1/lib/server/libjvm.so
        #10 0x00007f461bc1b9b2 in JavaThread::get_thread_name() const () from jdk-16.0.1/lib/server/libjvm.so
        ... (repeated frames ommited)
        #14590 0x00007f461bc1b9b2 in JavaThread::get_thread_name() const () from jdk-16.0.1/lib/server/libjvm.so
        #14591 0x00007f461bd09197 in ZStatCriticalPhase::register_start(TimeInstant<CompositeCounterRepresentation, CompositeElapsedCounterSource> const&) const () from jdk-16.0.1/lib/server/libjvm.so
        #14592 0x00007f461bcf5ab5 in ZPageAllocator::alloc_page_stall(ZPageAllocation*) () from jdk-16.0.1/lib/server/libjvm.so
        #14593 0x00007f461bcf62b0 in ZPageAllocator::alloc_page(unsigned char, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #14594 0x00007f461bcdf727 in ZHeap::alloc_page(unsigned char, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #14595 0x00007f461bcf2834 in ZObjectAllocator::alloc_object_in_shared_page(ZPage**, unsigned char, unsigned long, unsigned long, ZAllocationFlags) () from jdk-16.0.1/lib/server/libjvm.so
        #14596 0x00007f461bcdb232 in ZCollectedHeap::allocate_new_tlab(unsigned long, unsigned long, unsigned long*) () from jdk-16.0.1/lib/server/libjvm.so
        #14597 0x00007f461b9d03de in MemAllocator::allocate_inside_tlab_slow(MemAllocator::Allocation&) const () from jdk-16.0.1/lib/server/libjvm.so
        #14598 0x00007f461b9d072e in MemAllocator::allocate() const () from jdk-16.0.1/lib/server/libjvm.so
        #14599 0x00007f461b6723b1 in InstanceKlass::allocate_instance(Thread*) () from jdk-16.0.1/lib/server/libjvm.so
        #14600 0x00007f461bb18cc4 in OptoRuntime::new_instance_C(Klass*, JavaThread*) () from jdk-16.0.1/lib/server/libjvm.so
        #14601 0x00007f45fd8578ad in ?? ()
        #14602 0x0000100001200000 in ?? ()
        #14603 0x00007f460530d3c8 in ?? ()
        #14604 0x000186a00000eb78 in ?? ()
        #14605 0x00007f43afa382a8 in ?? ()
        #14606 0x0000000000000000 in ?? ()

        I believe this is a stack overflow issue (and not null pointer dereference or alike) because the fault always seems to happen while the processor is running the instruction like `mov %esi,-0x118(%rbp)` or `push %rbx` to access the stack.

        REGRESSION : Last worked in version 15

        STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
        Run java with -Xlog:gc+start=debug and -XX:+UseZGC:

        $ jdk-16.0.1/bin/java -Xlog:gc+start=debug:stdout:t,l,tg -Xmx128m -Xms128m -XX:+UseZGC Repro.java

        EXPECTED VERSUS ACTUAL BEHAVIOR :
        EXPECTED -
        The JVM should keep running without crashing with a segmentation fault.
        ACTUAL -
        It crashes.

        $ jdk-16.0.1/bin/java -Xlog:gc+start=debug:stdout:t,l,tg -Xmx128m -Xms128m -XX:+UseZGC Repro.java
        [2021-07-09T15:05:48.103+0900][info][gc,start] GC(0) Garbage Collection (Warmup)
        [2021-07-09T15:05:48.203+0900][info][gc,start] GC(1) Garbage Collection (Allocation Rate)
        [2021-07-09T15:05:48.302+0900][info][gc,start] GC(2) Garbage Collection (Allocation Rate)
        [2021-07-09T15:05:48.402+0900][info][gc,start] GC(3) Garbage Collection (Allocation Rate)
        [2021-07-09T15:05:48.502+0900][info][gc,start] GC(4) Garbage Collection (Allocation Rate)
        [2021-07-09T15:05:48.569+0900][debug][gc,start] Allocation Stall (main)
        [2021-07-09T15:05:48.569+0900][info ][gc,start] GC(5) Garbage Collection (Allocation Stall)
        ... (omitted similar logs)
        [2021-07-09T15:05:50.336+0900][info ][gc,start] GC(44) Garbage Collection (Allocation Stall)
        [2021-07-09T15:05:50.397+0900][debug][gc,start] Allocation Stall (main)
        [2021-07-09T15:05:50.397+0900][info ][gc,start] GC(45) Garbage Collection (Allocation Stall)
        [2021-07-09T15:05:50.470+0900][debug][gc,start] Allocation Stall (main)
        [2021-07-09T15:05:50.470+0900][info ][gc,start] GC(46) Garbage Collection (Allocation Stall)
        zsh: segmentation fault (core dumped) jdk-16.0.1/bin/java -Xlog:gc+start=debug:stdout:t,l,tg -Xmx128m -Xms128m -XX:+UseZGC Repro.java

        ---------- BEGIN SOURCE ----------
        // java -Xlog:gc+start=debug:stdout:t,l,tg -Xmx128m -Xms128m -XX:+UseZGC Repro.java
        public class Repro {
        public static void main(String[] args) throws InterruptedException {
        final String[] objs = new String[100000];
        while (true)
        for (int i = 0; i < 100000; ++i)
        objs[i] = new String("abcdefghijklmnopqrstuvwxyz");
        }
        }
        ---------- END SOURCE ----------

        CUSTOMER SUBMITTED WORKAROUND :
        Change -Xlog:gc+start=debug to -Xlog:gc+start=info. Not sure if this is enough to work around the issue, but at least it avoids the problematic code path found in the backtraces and seems to be working fine.

        FREQUENCY : often


          Attachments

            Issue Links

              Activity

                People

                Assignee:
                pliden Per Liden
                Reporter:
                webbuggrp Webbug Group
                Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                  Dates

                  Created:
                  Updated:
                  Resolved: