为什么我的题目只显示半截(图片只显示半截)
986
2022-05-29
背景说明
问题分析
背景说明
问题分析
背景说明
这个内容只是为了做个记录。
因为项目中有出现 coredump 的情况。
问题分析
先用 GDB 调起来。
[app@主机A bin]$ gdb PROGRAM core.31018
下面是一连串的 GDB 信息。
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-80.el7 Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later
上面这段话的意思是,随便用,没毛病。
Reading symbols from /bin/PROGRAM...done. [New LWP 31018] [New LWP 31027] [New LWP 31022] [New LWP 31036] [New LWP 31038] [New LWP 31041] [New LWP 31044] [New LWP 31047] [New LWP 31042] [New LWP 31032] [New LWP 31033] [New LWP 31034] [New LWP 31035] [New LWP 31037] [New LWP 31020] [New LWP 31026] [New LWP 31031] [New LWP 31030] [New LWP 31040] [New LWP 31039] [New LWP 31046] [New LWP 31045] [New LWP 31043] [New LWP 31019] [New LWP 31025] [New LWP 31024] [New LWP 31023] [New LWP 31021] [New LWP 31029] [New LWP 31028]
上面是 LWP 编号,也就是我们常说的线程号,在 linux 中线程就是 LWP,有人说,LWP 不是线程,而是进程。因为是 light-weight process 嘛,肯定是进程,是的,又不是 thread,确实它是叫做轻量级进程。但是在 linux中,除了它其他的也没有线程了。看一下 WIKI 上说的:
In computer operating systems, a light-weight process (LWP) is a means of achieving multitasking. In the traditional meaning of the term, as used in Unix System V and Solaris, a LWP runs in user space on top of a single kernel thread and shares its address space and system resources with other LWPs within the same process. Multiple user level threads, managed by a thread library, can be placed on top of one or many LWPs - allowing multitasking to be done at the user level, which can have some performance benefits.
看了半天,也不知道所以然是啥对吧。那就对了,不用纠结,来跟我一起说,计较那么多概念干吗,这个东西就是线程!
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1".
上面是说 debug 用的是啥子库。
Core was generated by `PROGRAM -g 1 -i 3006 -u VM_16_46_centos -U /data/app/log/LOG -m 0 -A'. Program terminated with signal 6, Ab
这里列出来了是怎么产生的 core。 这里有信号 6. 中止。 系统有多少信号呢?
大概是下面这么多。
那上面的处理动作是什么意思呢?
_A 缺省的动作是终止进程 _
_B 缺省的动作是忽略此信号 _
_C 缺省的动作是终止进程并进行内核映像转储(dump core) _
_D 缺省的动作是停止进程 _
_E 信号不能被捕获 _
_F 信号不能被忽略 _
#0 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6 Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-19.2.el7.x86_64 elfutils-libelf-0.163-3.el7.x86_64 glibc-2.17-106.el7_2.4.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libcurl-7.29.0-25.el7.centos.x86_64 libgcc-4.8.5-4.el7.x86_64 libidn-1.28-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libssh2-1.4.3-10.el7.x86_64 libstdc++-4.8.5-4.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 nspr-4.10.8-2.el7_1.x86_64 nss-3.19.1-18.el7.x86_64 nss-softokn-freebl-3.16.2.3-13.el7_1.x86_64 nss-util-3.19.1-4.el7_1.x86_64 openldap-2.4.40-8.el7.x86_64 openssl-libs-1.0.1e-42.el7.9.x86_64 pcre-8.32-15.el7.x86_64 readline-6.2-9.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
上面这些是引用了一系列的东西来 debug这个 core 文件。要是换了个机器说不定 core 的。要是换了个机器说不定 core 的内容都看不到了呢(我猜的,我并没有那么闲,真的换个机器试一下)。
查看断点。
(gdb) bt #0 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6 #1 0x00007fa1fef39ce8 in abort () from /lib64/libc.so.6 #2 0x00007fa1fef78317 in __libc_message () from /lib64/libc.so.6 #3 0x00007fa1fef7e184 in malloc_printerr () from /lib64/libc.so.6 #4 0x00007fa1fef818e7 in _int_malloc () from /lib64/libc.so.6 #5 0x00007fa1fef828dc in malloc () from /lib64/libc.so.6 #6 0x000000000043a147 in CMemPool::frealloc (ud=0x0, ptr=0x0, osize=0, nsize=64, p=0x1a8a450) at MemPool.h:266 #7 0x0000000000434898 in luaM_realloc_ (L=0x1b344e0, block=0x0, osize=0, nsize=64) at lmem.cpp:79 #8 0x000000000043b481 in luaH_new (L=0x1b344e0, narray=0, nhash=0) at ltable.cpp:359 #9 0x000000000042cbf8 in lua_createtable (L=0x1b344e0, narray=0, nrec=0) at lapi.cpp:582 #10 0x00007fa1fecf0f76 in getMessage (l=0x1b344e0, pMessage=0x7fa1bc0008c0) at message.h:218 #11 0x00007fa1fecf3af6 in getResponse (l=0x1b344e0, res=0x1b0d6d0) at service.cpp:28 #12 0x00007fa1fecf3d3b in sendM (l=0x1b344e0) at service.cpp:59 #13 0x0000000000430dc0 in luaD_precall (L=0x1b344e0, func=0x1b247b0, nresults=2) at ldo.cpp:319 #14 0x000000000043faad in luaV_execute (L=0x1b344e0, nexeccalls=1) at lvm.cpp:590 #15 0x0000000000431092 in luaD_call (L=0x1b344e0, func=0x1b24740, nResults=-1) at ldo.cpp:377 #16 0x000000000042d420 in f_call (L=0x1b344e0, ud=0x7ffeb1c9db20) at lapi.cpp:801 #17 0x000000000042ffed in luaD_rawrunprotected (L=0x1b344e0, f=0x42d3eb
上面这条就是告诉你这个 core 文件 dump 点是在哪里,调用关系从下到上。这里面看到的问题点基本上都是底层的调用。而这些底层的调用也只是表现,最重要的是上层的变量是怎么传的。
闲着没事,看下所有线程的当前断点。
(gdb) info threads Id Target Id Frame 30 Thread 0x7fa1f5365700 (LWP 31028) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 29 Thread 0x7fa1f4b64700 (LWP 31029) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 28 Thread 0x7fa1f8b6c700 (LWP 31021) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 27 Thread 0x7fa1f7b6a700 (LWP 31023) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 26 Thread 0x7fa1f7369700 (LWP 31024) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 25 Thread 0x7fa1f6b68700 (LWP 31025) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 24 Thread 0x7fa1f9b6e700 (LWP 31019) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 23 Thread 0x7fa1edb56700 (LWP 31043) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 22 Thread 0x7fa1ecb54700 (LWP 31045) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 21 Thread 0x7fa1ec353700 (LWP 31046) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 20 Thread 0x7fa1efb5a700 (LWP 31039) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 19 Thread 0x7fa1ef359700 (LWP 31040) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 18 Thread 0x7fa1f4363700 (LWP 31030) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 17 Thread 0x7fa1f3b62700 (LWP 31031) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 16 Thread 0x7fa1f6367700 (LWP 31026) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 15 Thread 0x7fa1f936d700 (LWP 31020) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 14 Thread 0x7fa1f0b5c700 (LWP 31037) 0x00007fa1feff09b3 in select () from /lib64/libc.so.6 13 Thread 0x7fa1f1b5e700 (LWP 31035) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 12 Thread 0x7fa1f235f700 (LWP 31034) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 11 Thread 0x7fa1f2b60700 (LWP 31033) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 10 Thread 0x7fa1f3361700 (LWP 31032) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 9 Thread 0x7fa1ee357700 (LWP 31042) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 8 Thread 0x7fa1ebb52700 (LWP 31047) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 7 Thread 0x7fa1ed355700 (LWP 31044) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 6 Thread 0x7fa1eeb58700 (LWP 31041) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 5 Thread 0x7fa1f035b700 (LWP 31038) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 4 Thread 0x7fa1f135d700 (LWP 31036) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x7fa1f836b700 (LWP 31022) 0x00007fa1fddaba82 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x7fa1f5b66700 (LWP 31027) 0x00007fa1fddab6d5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 * 1 Thread 0x7fa2009b0740 (LWP 31018) 0x00007fa1fef385f7 in raise () from /lib64/libc.so.6 (gdb)
大部分都在 wait/timewait 之类的,也没啥毛病。
尝试打印下变量:
(gdb) p req No symbol "req" in current context.
怎么没有符号表?
切一下frame。
(gdb) frame 29 #29 0x00000000004268ac in PROGRAM (req=0x7ffeb1c9e340) at srv.cpp:107 (gdb) p req $1 = (SVCINFO *) 0x7ffeb1c9e340
可以看到这个变量的定义和值。有人说,这玩意是地址怎么看?
其实有源码就什么都能看得到的。只是这里没有加载进来。
GDB 默认搜索当前目录,但是也没搜索到。
编译的时候是会记录源码位置的,但是因为这个主机上没有,所以看不到。
如果有兴趣玩的话,可以自己写一段把源码放一起,看看效果。
C++ 任务调度
版权声明:本文内容由网络用户投稿,版权归原作者所有,本站不拥有其著作权,亦不承担相应法律责任。如果您发现本站中有涉嫌抄袭或描述失实的内容,请联系我们jiasou666@gmail.com 处理,核实后本网站将在24小时内删除侵权内容。