Saturday, April 19, 2014

gdb: Debugging core dump of a user space application


What is a core dump?

A core dump is a snapshot of the state of a process at a point in time, such as its memory contents and register values. On Linux, core dump files are written in ELF format.

Triggers that generate core dump:
  • A core dump is automatically triggered (depending on the system's core dump configuration) when a process hits certain fatal errors.
  • A core dump can be generated manually using the gcore command.
  • A core dump can be generated manually using gdb facilities.
  • A core dump can be generated programmatically from inside a process.

Developers can use a core dump offline to debug a fatal error of the process, or to inspect process state even for non-critical software issues.

The offline debugging ability offered by core dumps helps:
  • debug software issues which are infrequently reproducible
  • debug software issues where access to the affected devices is limited or unavailable (such as a customer environment)

Are core dumps generated by default?

It depends on the configuration. The system must be configured to generate core dumps, and the out-of-the-box behavior depends on the default core file settings of the system.

How to configure a system to generate a core dump in the event of fatal errors?

Set the maximum size of core files created using the following command:
 
ulimit -c <max_core_file_size>

When the limit is set to 0, core files are not generated. If a core file would exceed the limit, it is truncated to that size. To remove the limit altogether, use the following command:

ulimit -c unlimited

You can check the current value using the following command:

root@babu-VirtualBox:~/tools/core_dump# ulimit -a | grep core
core file size          (blocks, -c) 0
root@babu-VirtualBox:~/tools/core_dump# 
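
The limit set with ulimit applies to the current shell and to the processes started from it, so it is usually set in the same shell (or startup script) that launches the application. Here is a minimal sketch, assuming a shell whose ulimit builtin accepts -c, -S and -H (bash and dash do). The function name enable_core_dumps is only illustrative:

```shell
# Raise the soft core file size limit to the hard limit for this shell
# and for everything started from it. The soft limit is the value that is
# actually enforced; an unprivileged user cannot go above the hard limit.
enable_core_dumps() {
    hard_limit=$(ulimit -H -c)      # a number of blocks, or "unlimited"
    ulimit -S -c "$hard_limit"
}

enable_core_dumps
echo "core file size limit is now: $(ulimit -S -c)"
```

If the hard limit itself is 0, the soft limit stays at 0 and the hard limit has to be raised by root first.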

The core file location & format can be configured using the following command:

echo "/tmp/core_files/core.%p" > /proc/sys/kernel/core_pattern

If the pattern does not include a directory (unlike /tmp/core_files above), the core file is written to the current working directory of the process. If a directory is given, it must already exist; the kernel does not create it.

%p in the pattern above is a format specifier; it is replaced with the actual value when the core file is named. Here are some of the commonly used format specifiers (see the core(5) man page for the complete list):

%p: pid of the process dumped
%u: uid of the process dumped
%g: gid of the process dumped
%s: number of the signal causing the core dump
%t: time of the core dump, in seconds since 0:00h, 1 Jan 1970
%h: hostname (as reported by uname -n)
%e: executable filename (without the path)
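
To see what file name a given pattern would produce, the substitution can be imitated in the shell. This is only a simplified model of what the kernel does, not the kernel's own code: it handles just %p, %e and %s, and the function name expand_core_pattern and the sample values are made up for illustration.

```shell
# Simplified model of how a core_pattern template is expanded.
# Only %p, %e and %s are handled; all values are passed in by the caller.
expand_core_pattern() {
    pattern=$1   # e.g. /tmp/core_files/core.%e.%p
    pid=$2       # value substituted for %p
    exe=$3       # value substituted for %e (must not contain / or &)
    sig=$4       # value substituted for %s
    printf '%s\n' "$pattern" |
        sed -e "s/%p/$pid/g" -e "s/%e/$exe/g" -e "s/%s/$sig/g"
}

# Prints /tmp/core_files/core.latencytop.2832.11
expand_core_pattern "/tmp/core_files/core.%e.%p.%s" 2832 latencytop 11
```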

The file location & format configured this way are not retained across reboots. To make the setting persistent, add the line "kernel.core_pattern=/tmp/core_files/core.%p" to /etc/sysctl.conf.



The currently configured core file location & format can be viewed using the following command:

root@babu-VirtualBox:~/tools/core_dump# cat  /proc/sys/kernel/core_pattern
core
root@babu-VirtualBox:~/tools/core_dump#

or

root@babu-VirtualBox:~/tools/core_dump# sysctl -a | grep core_
kernel.core_pattern = core
kernel.core_pipe_limit = 0
kernel.core_uses_pid = 1
root@babu-VirtualBox:~/tools/core_dump# 

In addition to the above commands, I found two more core dump related configuration facilities:

root@babu-VirtualBox:~/tools/core_dump# cat /proc/sys/kernel/core_pipe_limit 
0
root@babu-VirtualBox:~/tools/core_dump# cat /proc/sys/kernel/core_uses_pid 
1 <-- With a non-zero value, .PID is appended to the core file name when core_pattern contains no %p, so the effect is similar to the %p format specifier
root@babu-VirtualBox:~/tools/core_dump# 


I am yet to explore the purpose of these configuration facilities. 


How to manually generate a core dump?


  • CLI method:
root@babu-VirtualBox:~/tools/core_dump# ls
latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# pidof latencytop 
2832
root@babu-VirtualBox:~/tools/core_dump# gcore 2832
0xb76e5424 in __kernel_vsyscall ()
Saved corefile core.2832
root@babu-VirtualBox:~/tools/core_dump# ls
core.2832  latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# 

Note: This method generates a core file irrespective of the ulimit configuration.


or


root@babu-VirtualBox:~/tools/core_dump# ls
ex  latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# kill -s SIGSEGV `pidof latencytop`  (or SIGABRT instead of SIGSEGV)
root@babu-VirtualBox:~/tools/core_dump# ls
core.3669  ex  latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# 

Note: This method generates a core file only if the core file size limit (ulimit -c) is set to a non-zero size.
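
When the kill method is used from a script, the exit status reported by wait tells whether the process was really terminated by the signal: the shell reports 128 plus the signal number. Here is a minimal sketch, with sleep as a stand-in for the real application:

```shell
# Start a stand-in process in the background (sleep replaces the real application).
sleep 60 &
victim=$!

# Deliver SIGSEGV; its default action is to terminate the process and dump core.
kill -s SEGV "$victim"

# wait returns 128 + signal number when the child was killed by a signal.
# SIGSEGV is signal 11 on Linux, so 139 is expected here.
status=0
wait "$victim" || status=$?
echo "exit status: $status"

# Whether a core file was actually written still depends on ulimit -c
# and on /proc/sys/kernel/core_pattern.
```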

  • GDB method:
root@babu-VirtualBox:~/tools/core_dump# ls
latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# pidof latencytop 
2832
root@babu-VirtualBox:~/tools/core_dump# gdb -p 2832
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 2832
Reading symbols from /home/babu/tools/core_dump/latencytop...done.
Reading symbols from /lib/i386-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/libc-2.17.so...done.
done.
Loaded symbols for /lib/i386-linux-gnu/libc.so.6
Reading symbols from /lib/ld-linux.so.2...Reading symbols from /usr/lib/debug/lib/i386-linux-gnu/ld-2.17.so...done.
done.
Loaded symbols for /lib/ld-linux.so.2
0xb76e5424 in __kernel_vsyscall ()
(gdb) generate-core-file
Saved corefile core.2832
(gdb) quit
A debugging session is active.
            Inferior 1 [process 2832] will be detached.
Quit anyway? (y or n) y
Detaching from program: /home/babu/tools/core_dump/latencytop, process 2832
root@babu-VirtualBox:~/tools/core_dump# 
root@babu-VirtualBox:~/tools/core_dump# ls
core.2832  latencytop  latencytop.c  latencytop.o  Makefile
root@babu-VirtualBox:~/tools/core_dump# 

Note: This method generates a core file irrespective of the ulimit configuration.


How to programmatically generate a core dump from inside a process?

  • Invoke abort wherever the core dump should be generated. Of course, the process is terminated after the core is generated. Another point to remember: the system must have been configured to generate core dumps. Otherwise, abort will not produce a core file.
  • Another approach, which generates a core dump without terminating the process, is to fork the process, let the child invoke abort and let the parent continue with normal execution.

Note: This method generates a core file only if the core file size limit (ulimit -c) is set to a non-zero size.


Steps to ANALYZE core dump file offline

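gdb is used to analyze a core dump file. Once the core file is loaded into gdb using the steps below, the usual gdb inspection commands (bt, print, info registers, list and so on) can be used on it. Commands that resume execution, such as continue or step, do not apply because there is no live process behind a core file.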

Step1:
Identify the name of the application from the core file.

If %e is configured in /proc/sys/kernel/core_pattern, the core file name itself indicates the application that generated the core. If not, we can use the following command to get the application name:


root@babu-VirtualBox:~/tools/core_dump# file core.5240 
core.5240: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from './latencytop'
root@babu-VirtualBox:~/tools/core_dump# 

or

root@babu-VirtualBox:~/tools/core_dump# strings core.5240 | tail -n 1
./latencytop
root@babu-VirtualBox:~/tools/core_dump# 

Step2:
To debug a core file using gdb, we need binaries with debug symbols. However, in embedded systems, binaries built without debug information or with the debug symbols stripped are used in QA environments and customer deployments. So, it's most likely that we first need to get a debug-enabled build of the application binary and of the shared/static libraries. We also need to ensure that the debug binary is built from the same code as the binaries on which the software issue was observed.

To demo core dump analysis, let me use the application latencytop (https://github.com/babuneelam/gcov_uspace_tests/tree/master/latencytop). I have enabled the debug flag (-g) in the Makefile, and I then generated a core file using the kill command.

Step3:
Feed the core file & debug-enabled/unstripped binary to gdb to debug the core file:
root@babu-VirtualBox:~/tools/core_dump# gdb latencytop 
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/babu/tools/core_dump/latencytop...done.
(gdb) set solib-search-path /lib/i386-linux-gnu/  <--- this can be a list of colon separated PATHs
(gdb) core-file core.3678 
[New LWP 3678]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `./latencytop'.
#0  0xb7725424 in __kernel_vsyscall ()
(gdb) 
(gdb) bt
#0  0xb7725424 in __kernel_vsyscall ()
#1  0xb7612740 in __nanosleep_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
#2  0xb7612563 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#3  0x08048d81 in main (argc=1, argv=0xbfd2a354) at latencytop.c:173
(gdb) list latencytop.c:173
168
169 //abort();
170
171    while ((iterations == 0) || (count++ < iterations)) {
172
173        sleep(delay);
174
175        e = NULL;
176        if (pid) {
177            if (tid) {
(gdb) 


solib-search-path provides the location of the shared libraries the binary uses. Without information about where the shared libraries are, the stack trace doesn't display the symbol names that help narrow down the crashing code path. In this case, I am setting the shared library path to /lib/i386-linux-gnu/. However, it may be different on different systems. For embedded systems, where the development & build machines are different, developers need to spend significant effort in setting up the library path. This is even more challenging if the build system does not place all shared libraries in a common build directory.
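
The same steps can be run non-interactively, which is handy when many core files have to be triaged. The following is a minimal sketch using gdb's -batch and -ex options to replay the session shown above. The function name core_backtrace and its return codes are only illustrative, and the default library path is the one from this system.

```shell
# Print a backtrace for a core file without an interactive gdb session.
# Usage: core_backtrace <binary> <core file> [shared library directory]
core_backtrace() {
    binary=$1
    core=$2
    libdir=${3:-/lib/i386-linux-gnu/}   # default taken from the session above

    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace <binary> <core file> [library dir]" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found in PATH" >&2
        return 3
    fi

    # Same order as the interactive session: load the binary, set the
    # library path, load the core file, then print the backtrace.
    gdb -batch \
        -ex "set solib-search-path $libdir" \
        -ex "core-file $core" \
        -ex "bt" \
        "$binary"
}
```

For the core file used above, this would be invoked as: core_backtrace ./latencytop core.3678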

What does the stack trace look like if we don't give gdb the correct location of the libraries during core dump analysis?

root@babu-VirtualBox:~/tools/core_dump# gdb latencytop
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/babu/tools/core_dump/latencytop...done.
(gdb) set solib-search-path /lib1/   <--- Providing incorrect shared library path for demonstration purpose
(gdb) core-file core.3678 
[New LWP 3678]
warning: Can't read pathname for load map: Input/output error.
warning: Could not load shared library symbols for 2 libraries, e.g. /lib/i386-linux-gnu/libc.so.6.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `./latencytop'.
#0  0xb7725424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7725424 in __kernel_vsyscall ()
#1  0xb7612740 in ?? ()     
#2  0xb7612563 in ?? ()
#3  0xbfd2a0bc in ?? ()
#4  0x00000000 in ?? () <--- symbol names are not displayed!! As we can see, this is not limited to the symbols of the shared library: gdb apparently cannot unwind through the library frames either, so even the frame of the binary itself (main) is lost!!
(gdb) 

What does the stack trace look like if the binary is compiled without the debug (-g) flag, or if its debug symbols are stripped?
root@babu-VirtualBox:~/tools/core_dump# gdb latencytop
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/babu/tools/core_dump/latencytop...(no debugging symbols found)...done.
(gdb) set solib-search-path /lib/i386-linux-gnu/
(gdb) core core.5188 
[New LWP 5188]
warning: Can't read pathname for load map: Input/output error.
Core was generated by `./latencytop'.
Program terminated with signal 11, Segmentation fault.
#0  0xb77b5424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb77b5424 in __kernel_vsyscall ()
#1  0xb76a2740 in __nanosleep_nocancel ()
    at ../sysdeps/unix/syscall-template.S:81
#2  0xb76a2563 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#3  0x08048d81 in main ()  <-- With debug flags enabled, we even got the file & line number ("latencytop.c:173") and the arguments of main. These are missing without debug flags!!
(gdb) 

TBD
solib-absolute-prefix - TBD


Core dump tool - Internals

Following is the sequence of events/code-flows that create a core dump of a process:

  • The user space application encounters a fatal error/exception. This leads the kernel to raise a signal for the process. The signal can also be raised manually, using the kill command or other methods.
  • The kernel sets a flag in the process descriptor indicating that a signal has been raised & that its handling is pending.
  • The kernel then handles the signal. Signal handling is done whenever the process is about to resume execution in user space. The kernel gets this opportunity on the return path from a system call, an interrupt or an exception, which includes the point where the process is scheduled back in. At these points, the kernel checks for pending signals of the process.
  • If a signal is pending and no custom handler is registered for it, the kernel performs the default action of that signal. For signals whose default action is to dump core (SIGSEGV and SIGABRT, for example), the kernel generates the core dump itself; no user space code of the process runs to accomplish this. If a custom signal handler is registered, the custom handler runs instead and no core dump is generated.
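
The difference between the default action and a custom handler can be observed from the shell itself, with trap playing the role of the custom signal handler. This is only a rough illustration using child shells, not a model of the kernel code path:

```shell
# Case 1: no handler registered. The default action for SIGSEGV applies and
# the child shell is killed by the signal (exit status 128 + 11 = 139).
default_status=0
sh -c 'kill -s SEGV $$; echo "not reached"' || default_status=$?

# Case 2: a handler (trap) is registered. The handler runs instead of the
# default action, so the child shell keeps running and no core is dumped.
handled_output=$(sh -c 'trap "echo handler ran" SEGV; kill -s SEGV $$; echo "still running"')

echo "default action status: $default_status"
echo "with handler: $handled_output"
```

This is also why an application that installs its own SIGSEGV handler does not produce a core file unless the handler restores the default action and raises the signal again.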


References:
http://ss64.com/bash/ulimit.html
http://man7.org/linux/man-pages/man2/getrlimit.2.html
http://man7.org/linux/man-pages/man5/core.5.html
