Detection of memory leak in a high-volume system causes reboots #29

Open
weldpua2008 opened this issue May 11, 2017 · 5 comments

@weldpua2008

weldpua2008 commented May 11, 2017

Hello,
We are using the following script to generate Memory Leak Flame Graphs on our production servers every 30 minutes:

/usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-gc-objs.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` -D MAXACTION=200000


/usr/src/stapxx/stap++ /usr/src/stapxx/samples/sample-bt-leaks.sxx  -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=5 -D STP_NO_OVERLOAD -D MAXMAPENTRIES=10000 > a.bt
/usr/src/FlameGraph/stackcollapse-stap.pl  a.bt >  a.cbt
/usr/src/FlameGraph/flamegraph.pl --countname=bytes --title="Memory Leak Flame Graph" a.cbt > a.svg
cp a.svg  /code/www/
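
Note that the two stap++ invocations above each pick a random worker independently, so they may probe different processes, and `-x` receives no argument at all if no worker matches. A minimal sketch of picking the PID once and validating it before use; `pick_worker_pid` and `is_pid` are hypothetical helper names, not part of stapxx:

```shell
#!/bin/sh
# Read `ps -f` style lines on stdin and print the PID (column 2)
# of one randomly chosen nginx worker, or nothing if none match.
pick_worker_pid() {
    awk '/worker/ {print $2}' | shuf -n 1
}

# Succeed only if the argument is a non-empty string of digits.
is_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Intended usage (commented out so that sourcing this file has no side effects):
#   pid=$(ps --no-headers -fC nginx | pick_worker_pid)
#   is_pid "$pid" || { echo "no nginx worker found" >&2; exit 1; }
#   /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-gc-objs.sxx -x "$pid" -D MAXACTION=200000
```

This only makes the script fail early and keeps both tools pointed at the same worker; it does not address the reboots themselves.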

We are also using the following scripts:

#every 60 minutes
 /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-lua-stacks.sxx --arg time=5 --skip-badvars -x 6372 > /tmp/result.bt
/usr/src/openresty-systemtap-toolkit/fix-lua-bt /tmp/result.bt > /tmp/result-fix.bt
/usr/src/FlameGraph/stackcollapse-stap.pl /tmp/result-fix.bt > /tmp/result.cbt
/usr/src/FlameGraph/flamegraph.pl --encoding="ISO-8859-1" --title="Lua-land on-CPU for (`hostname`) at `date`" /tmp/result.cbt > /tlvmedia/code/www/result.svg

########
# every 10 minutes 
sudo stdbuf -oL /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-vm-states.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=10 &> /tmp/lua-vm-state; echo 1;

After installing the debug symbols (debuginfo-install glibc), we started getting random reboots:

# last reboot
reboot   system boot  2.6.32-642.15.1. Thu May 11 09:51 - 12:17  (02:26)
reboot   system boot  2.6.32-642.15.1. Thu May 11 07:51 - 12:17  (04:26)
reboot   system boot  2.6.32-642.15.1. Thu May 11 07:21 - 12:17  (04:56)
reboot   system boot  2.6.32-642.15.1. Thu May 11 03:21 - 12:17  (08:56)
reboot   system boot  2.6.32-642.15.1. Thu May 11 02:22 - 12:17  (09:54)

We are running CentOS 6.8:

# uname -a
Linux s 2.6.32-642.15.1.el6.x86_64 #1 SMP Fri Feb 24 14:31:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

@weldpua2008 weldpua2008 changed the title Detection of memory leak in a high-volume system cause reboots Detection of memory leak in a high-volume system causes reboots May 11, 2017
@agentzh
Member

agentzh commented May 11, 2017

@weldpua2008 No, you should never use this tool in high-volume systems since this tool uses code instrumentation instead of sampling, unlike the on-CPU flame graph sampling tool.

@weldpua2008
Author

weldpua2008 commented May 11, 2017

@agentzh We have been using the following scripts for months:

#every 60 minutes
# Full code https://gist.github.com/weldpua2008/8b60d336cdd2fee233812dd44cbd50c6
# 
 /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-lua-stacks.sxx --arg time=5 --skip-badvars -x 6372 > /tmp/result.bt
/usr/src/openresty-systemtap-toolkit/fix-lua-bt /tmp/result.bt > /tmp/result-fix.bt
/usr/src/FlameGraph/stackcollapse-stap.pl /tmp/result-fix.bt > /tmp/result.cbt
/usr/src/FlameGraph/flamegraph.pl --encoding="ISO-8859-1" --title="Lua-land on-CPU for (`hostname`) at `date`" /tmp/result.cbt > /tlvmedia/code/www/result.svg

########
# every 10 minutes 
sudo stdbuf -oL /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-vm-states.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=10 &> /tmp/lua-vm-state

But after scheduling the Memory Leak Flame Graph script from my first comment to run every 30 minutes (the full version is at https://gist.github.com/weldpua2008/44e6884ac2bc6d0c129ddf03a9336656), we started experiencing reboots.

@weldpua2008
Author

weldpua2008 commented May 15, 2017

@agentzh,

# openresty -V
nginx version: openresty/1.11.2.3
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
built with OpenSSL 1.0.2k  26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl/include' --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.8 --add-module=../ngx_lua_upstream-0.06 --add-module=../headers-more-nginx-module-0.32 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.18 --add-module=../redis2-nginx-module-0.14 --add-module=../redis-nginx-module-0.3.7 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl/lib' --with-pcre-jit --with-ipv6 --with-stream --with-stream_ssl_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-file-aio --with-dtrace-probes --with-http_ssl_module

I have installed crash and the kernel debug symbols so that I can read the vmcore.
The content of vmcore-dmesg.txt:

<7>stap_5d92dc3bc2adb3726c22e6eda3972f60_31180: systemtap: 2.9/0.164, base: ffffffffa0413000, memory: 3135data/69text/1110ctx/2058net/121062alloc kb, probes: 4
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4>PGD 1b91445067 PUD 1b098f2067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/module/xt_state/sections/__mcount_loc
<4>CPU 2
<4>Modules linked in: stap_5d92dc3bc2adb3726c22e6eda3972f60_31180(U) uprobes(U) dccp_diag dccp tcp_diag inet_diag bonding ipv6 ipt_LOG xt_recent xt_state xt_limit xt_comment iptable_filter iptable_raw iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ext2 microcode ipmi_devintf iTCO_wdt iTCO_vendor_support sg power_meter acpi_ipmi ipmi_si ipmi_msghandler ixgbe ptp pps_core mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core joydev ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif megaraid_sas xhci_hcd ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_694eefb08615576ffe8f8e195c3253fa_29461]
<4>
<4>Pid: 11717, comm: nginx Not tainted 2.6.32-696.1.1.el6.x86_64 #1 Supermicro PIO-618U-T4T+-ST031/X10DRU-i+
<4>RIP: 0010:[<ffffffff812a2f6b>]  [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4>RSP: 0000:ffff8810c0583bc8  EFLAGS: 00010287
<4>RAX: 000000000000002f RBX: ffff880b3c643fe8 RCX: 0000000000002dc5
<4>RDX: 00000000000000d8 RSI: 0000000000000000 RDI: ffff880b3c643fe8
<4>RBP: ffff8810c0583bc8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffa05df9e0
<4>R13: 00007ffc425f5000 R14: ffff88115f0b2ab0 R15: 0000000000001000
<4>FS:  00007f6b89d0d720(0000) GS:ffff880061c80000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000000000 CR3: 0000001a3140f000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
<4>Process nginx (pid: 11717, threadinfo ffff8810c0580000, task ffff88115f0b2ab0)
<4>Stack:
<4> ffff8810c0583c18 ffffffffa0422109 00000000000000d8 ffffffff811bc950
<4><d> ffff8810c0583c58 ffffffffa05deba0 ffffffffa05deb80 ffff88115f0b2ab0
<4><d> ffffffffa05deb90 0000000000001000 ffff8810c0583c88 ffffffffa04196ae
<4>Call Trace:
<4> [<ffffffffa0422109>] _stp_vma_mmap_cb+0xd9/0x290 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffff811bc950>] ? mntput_no_expire+0x30/0x110
<4> [<ffffffffa04196ae>] __stp_call_mmap_callbacks+0x8e/0xf0 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffffa0422afc>] __stp_utrace_task_finder_target_quiesce+0x36c/0x400 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffff810e275a>] utrace_get_signal+0x3da/0x730
<4> [<ffffffff810abb5d>] ? hrtimer_try_to_cancel+0x3d/0xd0
<4> [<ffffffff81097ee6>] get_signal_to_deliver+0x316/0x460
<4> [<ffffffff8100a285>] do_signal+0x75/0x870
<4> [<ffffffff811e3c34>] ? ep_poll+0x314/0x350
<4> [<ffffffff8106c480>] ? default_wake_function+0x0/0x20
<4> [<ffffffff8100ab10>] do_notify_resume+0x90/0xc0
<4> [<ffffffff8100b3a1>] int_signal+0x12/0x17
<4>Code: 84 ff 40 88 39 74 0d 48 83 c1 01 48 83 ea 01 75 e7 c6 01 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 40 00 0f b6 07 <0f> b6 16 48 83 c7 01 48 83 c6 01 38 d0 75 0e 84 c0 75 ea 31 c0
<1>RIP  [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4> RSP <ffff8810c0583bc8>
<4>CR2: 0000000000000000
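
The key facts in the log above are the faulting symbol (the `IP:` line) and the call-trace frames that belong to the SystemTap-generated `stap_...` module. A minimal sketch for pulling both out of a saved log, assuming the log format shown above; `fault_site` and `stap_frames` are hypothetical helper names:

```shell
#!/bin/sh
# Print the symbol+offset from the "<1>IP: [<addr>] symbol+off/len" line.
fault_site() {
    sed -n 's/^<1>IP: \[<[0-9a-f]*>] \(.*\)$/\1/p'
}

# Print the function name of every call-trace frame that is annotated
# with a "[stap_...]" module name, in the order the frames appear.
stap_frames() {
    grep -F '[stap_' | sed -n 's/.*] \([^ +]*\)+0x.*/\1/p'
}

# Intended usage (both helpers read the log on stdin):
#   fault_site  < vmcore-dmesg.txt
#   stap_frames < vmcore-dmesg.txt
```

For this log, the fault is in `strcmp` with a NULL second argument (RSI is 0), called from `_stp_vma_mmap_cb` inside the stap module.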

crash /usr/lib/debug/lib/modules/2.6.32-696.1.1.el6.x86_64/vmlinux ./vmcore

crash> bt
PID: 11717  TASK: ffff88115f0b2ab0  CPU: 2   COMMAND: "nginx"
 #0 [ffff8810c0583790] machine_kexec at ffffffff8103fd6b
 #1 [ffff8810c05837f0] crash_kexec at ffffffff810d1e12
 #2 [ffff8810c05838c0] oops_end at ffffffff8154ee30
 #3 [ffff8810c05838f0] no_context at ffffffff8105186b
 #4 [ffff8810c0583940] __bad_area_nosemaphore at ffffffff81051af5
 #5 [ffff8810c0583990] bad_area at ffffffff81051c1e
 #6 [ffff8810c05839c0] __do_page_fault at ffffffff81052423
 #7 [ffff8810c0583ae0] do_page_fault at ffffffff81550dbe
 #8 [ffff8810c0583b10] page_fault at ffffffff8154e0b5
    [exception RIP: strcmp+11]
    RIP: ffffffff812a2f6b  RSP: ffff8810c0583bc8  RFLAGS: 00010287
    RAX: 000000000000002f  RBX: ffff880b3c643fe8  RCX: 0000000000002dc5
    RDX: 00000000000000d8  RSI: 0000000000000000  RDI: ffff880b3c643fe8
    RBP: ffff8810c0583bc8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: ffffffffa05df9e0
    R13: 00007ffc425f5000  R14: ffff88115f0b2ab0  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff8810c0583bd0] _stp_vma_mmap_cb at ffffffffa0422109 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#10 [ffff8810c0583c20] __stp_call_mmap_callbacks at ffffffffa04196ae [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#11 [ffff8810c0583c90] __stp_utrace_task_finder_target_quiesce at ffffffffa0422afc [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#12 [ffff8810c0583d00] utrace_get_signal at ffffffff810e275a
#13 [ffff8810c0583d90] get_signal_to_deliver at ffffffff81097ee6
#14 [ffff8810c0583e30] do_signal at ffffffff8100a285
#15 [ffff8810c0583f30] do_notify_resume at ffffffff8100ab10
#16 [ffff8810c0583f50] int_signal at ffffffff8100b3a1
    RIP: 0000003188ce91a3  RSP: 00007ffc4255ff88  RFLAGS: 00000246
    RAX: fffffffffffffffc  RBX: 0000000000000007  RCX: ffffffffffffffff
    RDX: 0000000000000200  RSI: 0000000001d83280  RDI: 0000000000000042
    RBP: 0000000000000001   R8: 000000000076f4e0   R9: 00007f6aaa6a0f78
    R10: 0000000000000007  R11: 0000000000000246  R12: 0000000001d734b0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash>

crash> bt -f
PID: 11717  TASK: ffff88115f0b2ab0  CPU: 2   COMMAND: "nginx"
 #0 [ffff8810c0583790] machine_kexec at ffffffff8103fd6b
    ffff8810c0583798: 0000000003091000 ffff880003091000
    ffff8810c05837a8: 0000000003090000 ffff8810c0583b18
    ffff8810c05837b8: 8800000000000000 000000000000ffff
    ffff8810c05837c8: ffff8810c0583b18 ffff8810c05837f8
    ffff8810c05837d8: 0000000000000009 ffff88115f0b2ab0
    ffff8810c05837e8: ffff8810c05838b8 ffffffff810d1e12
 #1 [ffff8810c05837f0] crash_kexec at ffffffff810d1e12
    ffff8810c05837f8: 0000000000001000 ffff88115f0b2ab0
    ffff8810c0583808: 00007ffc425f5000 ffffffffa05df9e0
    ffff8810c0583818: ffff8810c0583bc8 ffff880b3c643fe8
    ffff8810c0583828: 0000000000000246 0000000000000000
    ffff8810c0583838: 0000000000000000 0000000000000000
    ffff8810c0583848: 000000000000002f 0000000000002dc5
    ffff8810c0583858: 00000000000000d8 0000000000000000
    ffff8810c0583868: ffff880b3c643fe8 ffffffffffffffff
    ffff8810c0583878: ffffffff812a2f6b 0000000000000010
    ffff8810c0583888: 0000000000010287 ffff8810c0583bc8
    ffff8810c0583898: 0000000000000000 ffff8810c05838f8
    ffff8810c05838a8: 0000000000000246 ffff8810c0583b18
    ffff8810c05838b8: ffff8810c05838e8 ffffffff8154ee30
 #2 [ffff8810c05838c0] oops_end at ffffffff8154ee30
    ffff8810c05838c8: 0000000000000000 ffff8810c0583b18
    ffff8810c05838d8: 0000000000000000 0000000000000009
    ffff8810c05838e8: ffff8810c0583938 ffffffff8105186b
 #3 [ffff8810c05838f0] no_context at ffffffff8105186b
    ffff8810c05838f8: ffff88106332f020 00000014651ec6d0
    ffff8810c0583908: ffff881092304e1e 0000000000000000
    ffff8810c0583918: 0000000000000000 ffff8810c0583b18
    ffff8810c0583928: ffff88115f0b2ab0 0000000000030001
    ffff8810c0583938: ffff8810c0583988 ffffffff81051af5
 #4 [ffff8810c0583940] __bad_area_nosemaphore at ffffffff81051af5
    ffff8810c0583948: ffff8810c0583968 ffffffffa01f6593
    ffff8810c0583958: ffff8810045843a8 ffff8810c0583b18
    ffff8810c0583968: 0000000000000000 0000000000000000
    ffff8810c0583978: ffff882066af1250 ffff88115f0b2ab0
    ffff8810c0583988: ffff8810c05839b8 ffffffff81051c1e
 #5 [ffff8810c0583990] bad_area at ffffffff81051c1e
    ffff8810c0583998: ffffffff81477808 0000000000000028
    ffff8810c05839a8: 0000000000000000 ffff881cfa6a2b80
    ffff8810c05839b8: ffff8810c0583ad8 ffffffff81052423
 #6 [ffff8810c05839c0] __do_page_fault at ffffffff81052423
    ffff8810c05839c8: ffff8810c0583a18 ffffffff8149de08
    ffff8810c05839d8: ffff8810c0583b18 0000000000000000
    ffff8810c05839e8: ffff881cfa6a2be8 0000000000000000
    ffff8810c05839f8: ffff88205d447600 ffff881067060020
    ffff8810c0583a08: ffff881061123180 0000000000000246
    ffff8810c0583a18: ffff8810c0583a58 ffffffff81480924
    ffff8810c0583a28: ffffffffa02ccd78 ffff88106586d240
    ffff8810c0583a38: ffff8810045843a8 ffff88106332f6e0
    ffff8810c0583a48: 0000000000000002 ffff8810045843a8
    ffff8810c0583a58: ffff8810c0583a78 ffffffffa0398821
    ffff8810c0583a68: ffff8810045843a8 ffff88106586d240
    ffff8810c0583a78: ffff8810c0583aa8 ffffffffa039a343
    ffff8810c0583a88: ffff8810045843a8 ffff88106332f6e0
    ffff8810c0583a98: ffff88106332f020 ffff88106332f6e8
    ffff8810c0583aa8: ffff8810c0583ae8 ffff8810c0583b18
    ffff8810c0583ab8: 0000000000000000 0000000000000000
    ffff8810c0583ac8: ffff88115f0b2ab0 0000000000001000
    ffff8810c0583ad8: ffff8810c0583b08 ffffffff81550dbe
 #7 [ffff8810c0583ae0] do_page_fault at ffffffff81550dbe
    ffff8810c0583ae8: 0000000000000001 ffffffffa05df9e0
    ffff8810c0583af8: 00007ffc425f5000 ffff88115f0b2ab0
    ffff8810c0583b08: ffff8810c0583bc8 ffffffff8154e0b5
 #8 [ffff8810c0583b10] page_fault at ffffffff8154e0b5
    [exception RIP: strcmp+11]
    RIP: ffffffff812a2f6b  RSP: ffff8810c0583bc8  RFLAGS: 00010287
    RAX: 000000000000002f  RBX: ffff880b3c643fe8  RCX: 0000000000002dc5
    RDX: 00000000000000d8  RSI: 0000000000000000  RDI: ffff880b3c643fe8
    RBP: ffff8810c0583bc8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: ffffffffa05df9e0
    R13: 00007ffc425f5000  R14: ffff88115f0b2ab0  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
    ffff8810c0583b18: 0000000000001000 ffff88115f0b2ab0
    ffff8810c0583b28: 00007ffc425f5000 ffffffffa05df9e0
    ffff8810c0583b38: ffff8810c0583bc8 ffff880b3c643fe8
    ffff8810c0583b48: 0000000000000246 0000000000000000
    ffff8810c0583b58: 0000000000000000 0000000000000000
    ffff8810c0583b68: 000000000000002f 0000000000002dc5
    ffff8810c0583b78: 00000000000000d8 0000000000000000
    ffff8810c0583b88: ffff880b3c643fe8 ffffffffffffffff
    ffff8810c0583b98: ffffffff812a2f6b 0000000000000010
    ffff8810c0583ba8: 0000000000010287 ffff8810c0583bc8
    ffff8810c0583bb8: 0000000000000000 0000000000001000
    ffff8810c0583bc8: ffff8810c0583c18 ffffffffa0422109
 #9 [ffff8810c0583bd0] _stp_vma_mmap_cb at ffffffffa0422109 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583bd8: 00000000000000d8 ffffffff811bc950
    ffff8810c0583be8: ffff8810c0583c58 ffffffffa05deba0
    ffff8810c0583bf8: ffffffffa05deb80 ffff88115f0b2ab0
    ffff8810c0583c08: ffffffffa05deb90 0000000000001000
    ffff8810c0583c18: ffff8810c0583c88 ffffffffa04196ae
#10 [ffff8810c0583c20] __stp_call_mmap_callbacks at ffffffffa04196ae [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583c28: 0000000000000000 0000000008040074
    ffff8810c0583c38: ffff8810c0583c68 00007ffc425f5000
    ffff8810c0583c48: ffff8808fc4d7900 ffff880b3c643fe8
    ffff8810c0583c58: ffffffffa05deb80 ffff88115f0b2ab0
    ffff8810c0583c68: ffff880a54e2b3e0 ffffffffa05deb80
    ffff8810c0583c78: 000000000000006b 000000000000006a
    ffff8810c0583c88: ffff8810c0583cf8 ffffffffa0422afc
#11 [ffff8810c0583c90] __stp_utrace_task_finder_target_quiesce at ffffffffa0422afc [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583c98: 0000000000000000 0000000008040074
    ffff8810c0583ca8: ffff881cfa6a2b80 ffff880b3c643fe8
    ffff8810c0583cb8: ffff880a54e2a000 ffff880b3c643000
    ffff8810c0583cc8: 0000000000000002 ffff88131ab524e0
    ffff8810c0583cd8: ffff8820103ed450 ffff88115f0b2ab0
    ffff8810c0583ce8: ffff8810c0583ed8 ffff88131ab524e8
    ffff8810c0583cf8: ffff8810c0583d88 ffffffff810e275a
#12 [ffff8810c0583d00] utrace_get_signal at ffffffff810e275a
    ffff8810c0583d08: 00000060c0583d48 0000000000000000
    ffff8810c0583d18: 0000000000000000 ffff8810c0583f58
    ffff8810c0583d28: ffff8810c0583e58 0000000000000001
    ffff8810c0583d38: 0000000500000060 0000010000000005
    ffff8810c0583d48: ffff8810c0583d88 ffffffff810abb5d
    ffff8810c0583d58: ffff8810c0583de8 ffff8810c0583f58
    ffff8810c0583d68: ffff88115f0b2ab0 ffff8810c0583e58
    ffff8810c0583d78: ffff88114058f5c0 ffff88115f0b2ab0
    ffff8810c0583d88: ffff8810c0583e28 ffffffff81097ee6
#13 [ffff8810c0583d90] get_signal_to_deliver at ffffffff81097ee6
    ffff8810c0583d98: 000000000000c350 ffff8810c0583db8
    ffff8810c0583da8: ffff8810c0583e48 ffff88115f0b3128
    ffff8810c0583db8: ffff8810c0583ed8 ffff8810c0583f58
    ffff8810c0583dc8: ffff88115f0b2ab0 ffff88115f0b2ab0
    ffff8810c0583dd8: ffff88115f0b2ab0 ffff882065225e48
    ffff8810c0583de8: ffff882065225640 ffff88115f0b3228
    ffff8810c0583df8: 0000000000002dc5 ffff8810c0583f58
    ffff8810c0583e08: ffff8810c0583ed8 ffff8810c0583e58
    ffff8810c0583e18: 0000000000000000 ffff88115f0b3228
    ffff8810c0583e28: ffff8810c0583f28 ffffffff8100a285
#14 [ffff8810c0583e30] do_signal at ffffffff8100a285
    ffff8810c0583e38: 0000000000000000 0000000000000286
    ffff8810c0583e48: ffff8810c0583f38 ffffffff811e3c34
    ffff8810c0583e58: 0000000000000286 ffff8811fffffffc
    ffff8810c0583e68: 00000200054b6300 0000000001d83280
    ffff8810c0583e78: 000000000004d17d 000000002267d4b8
    ffff8810c0583e88: ffff881000000001 ffff88115f0b2ab0
    ffff8810c0583e98: ffffffff8106c480 dead000000100100
    ffff8810c0583ea8: dead000000200200 ffff880c054b6300
    ffff8810c0583eb8: 0000000000000000 00000000006acfc0
    ffff8810c0583ec8: 000000000004d17d 000000002267d4b8
    ffff8810c0583ed8: 0000000000000000 0000000000000000
    ffff8810c0583ee8: 0000000000000000 0000000000000000
    ffff8810c0583ef8: 00011f31c3e676b8 0000000000000006
    ffff8810c0583f08: ffff8810c0583f58 0000000000000000
    ffff8810c0583f18: 0000000000000000 0000000000000000
    ffff8810c0583f28: ffff8810c0583f48 ffffffff8100ab10
#15 [ffff8810c0583f30] do_notify_resume at ffffffff8100ab10
    ffff8810c0583f38: 0000000000000007 0000000001d734b0
    ffff8810c0583f48: 0000000000000001 ffffffff8100b3a1
#16 [ffff8810c0583f50] int_signal at ffffffff8100b3a1
    RIP: 0000003188ce91a3  RSP: 00007ffc4255ff88  RFLAGS: 00000246
    RAX: fffffffffffffffc  RBX: 0000000000000007  RCX: ffffffffffffffff
    RDX: 0000000000000200  RSI: 0000000001d83280  RDI: 0000000000000042
    RBP: 0000000000000001   R8: 000000000076f4e0   R9: 00007f6aaa6a0f78
    R10: 0000000000000007  R11: 0000000000000246  R12: 0000000001d734b0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash>


crash> ps output: https://gist.github.com/weldpua2008/5d19c26b80bfbdd0566561a9cbd3cde6
crash> vm output: https://gist.github.com/weldpua2008/e50b3f677016cd6bb523cfbd6389fdd8
crash> files output: https://gist.github.com/weldpua2008/e9068f890bfbb05ac083f6365a99beb0

@hamishforbes

Hi, did you ever work out a solution for this?

I'm having similar issues. I'm not running the tools from cron, just trying to pin down an intermittent hot-loop problem.

I've found that running the ngx-active-reqs script from openresty-systemtap-toolkit and then lj-lua-bt causes a crash almost every time.
This is on an up-to-date CentOS 6.8 system.

@agentzh
Copy link
Member

agentzh commented May 30, 2017

@hamishforbes Ensure you build the latest systemtap from its source release. Do not use the version in the system yum repository! It's ancient and very buggy.
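
The oops log posted earlier in this thread starts with a module banner that records which SystemTap built the module (`systemtap: 2.9/0.164`). A small sketch for extracting that version from a saved log, to confirm whether a crash came from the distribution package or from a newer source build; `stap_runtime_version` is a hypothetical helper name, and the banner format is assumed to match the one shown in this thread:

```shell
#!/bin/sh
# Read kernel log lines on stdin and print the SystemTap version from
# banners such as "stap_<hash>_<pid>: systemtap: 2.9/0.164, base: ...".
# Only the part before the slash is printed (the part after it is,
# as far as I know, the elfutils version).
stap_runtime_version() {
    sed -n 's|.*systemtap: \([0-9][0-9.]*\)/.*|\1|p'
}

# Intended usage:
#   stap_runtime_version < vmcore-dmesg.txt
```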
