Andrew Burgess fd492bf1e2 gdb: handle main thread exiting during detach
Overview
========

Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped.  The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.

This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.

This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correct for native and remote
targets.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function
linux_nat_target::detach.  In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.

GDB then detaches from everything except the main thread by calling
detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached.  I fix this by changing
the assert to allow for 0 or 1 lwps at this point.  As the 0 case can
only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
	      || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.  In the case
that the main thread has exited though, main_lwp will be nullptr.

However, we only need main_lwp so that GDB can detach from the
thread.  If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.

For Remote Targets
==================

On remote targets there are two problems.

First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach.  The 'set debug remote on' trace
looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote]   Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This
function calls current_process() and then dereferences the result to
find the breakpoint list.

However, in our case, the current process has already exited, and so
the current_process() call returns nullptr.  At the point of failure,
the gdbserver backtrace looks like this:

  #0  0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1  0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2  0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3  0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent.  Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.

The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails.  This was being
reported to the user with a 'Can't detach process' error.  As the test
actually calls detach from Python code, this error was then becoming a
Python exception.

Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.

We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach.  In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call.  The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.

However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call.  And so that is the
approach I've taken for testing this patch.

Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
2023-10-26 18:11:54 +01:00
..
2023-06-03 22:43:57 +02:00
2023-10-12 18:23:13 -06:00
2023-06-03 22:43:57 +02:00
2023-06-03 22:43:57 +02:00
2023-06-03 22:43:57 +02:00
2020-06-12 16:01:35 -04:00
2023-10-04 16:23:40 +01:00
2023-06-03 22:43:57 +02:00

		   README for GDBserver & GDBreplay
		    by Stu Grossman and Fred Fish

Introduction:

This is GDBserver, a remote server for Un*x-like systems.  It can be used to
control the execution of a program on a target system from a GDB on a different
host.  GDB and GDBserver communicate using the standard remote serial protocol.
They communicate via either a serial line or a TCP connection.

For more information about GDBserver, see the GDB manual:

    https://sourceware.org/gdb/current/onlinedocs/gdb/Remote-Protocol.html

Usage (server (target) side):

First, you need to have a copy of the program you want to debug put onto
the target system.  The program can be stripped to save space if needed, as
GDBserver doesn't care about symbols.  All symbol handling is taken care of by
the GDB running on the host system.

To use the server, you log on to the target system, and run the `gdbserver'
program.  You must tell it (a) how to communicate with GDB, (b) the name of
your program, and (c) its arguments.  The general syntax is:

	target> gdbserver COMM PROGRAM [ARGS ...]

For example, using a serial port, you might say:

	target> gdbserver /dev/com1 emacs foo.txt

This tells GDBserver to debug emacs with an argument of foo.txt, and to
communicate with GDB via /dev/com1.  GDBserver now waits patiently for the
host GDB to communicate with it.

To use a TCP connection, you could say:

	target> gdbserver host:2345 emacs foo.txt

This says pretty much the same thing as the last example, except that we are
going to communicate with the host GDB via TCP.  The `host:2345' argument means
that we are expecting to see a TCP connection to local TCP port 2345.
(Currently, the `host' part is ignored.)  You can choose any number you want for
the port number as long as it does not conflict with any existing TCP ports on
the target system.  This same port number must be used in the host GDB's
`target remote' command, which will be described shortly. Note that if you chose
a port number that conflicts with another service, GDBserver will print an error
message and exit.

On some targets, GDBserver can also attach to running programs.  This is
accomplished via the --attach argument.  The syntax is:

	target> gdbserver --attach COMM PID

PID is the process ID of a currently running process.  It isn't necessary
to point GDBserver at a binary for the running process.

Usage (host side):

You need an unstripped copy of the target program on your host system, since
GDB needs to examine it's symbol tables and such.  Start up GDB as you normally
would, with the target program as the first argument.  (You may need to use the
--baud option if the serial line is running at anything except 9600 baud.)
Ie: `gdb TARGET-PROG', or `gdb --baud BAUD TARGET-PROG'.  After that, the only
new command you need to know about is `target remote'.  It's argument is either
a device name (usually a serial device, like `/dev/ttyb'), or a HOST:PORT
descriptor.  For example:

	(gdb) target remote /dev/ttyb

communicates with the server via serial line /dev/ttyb, and:

	(gdb) target remote the-target:2345

communicates via a TCP connection to port 2345 on host `the-target', where
you previously started up GDBserver with the same port number.  Note that for
TCP connections, you must start up GDBserver prior to using the `target remote'
command, otherwise you may get an error that looks something like
`Connection refused'.

Building GDBserver:

See the `configure.srv` file for the list of host triplets you can build
GDBserver for.

Building GDBserver for your host is very straightforward.  If you build
GDB natively on a host which GDBserver supports, it will be built
automatically when you build GDB.  You can also build just GDBserver:

	% mkdir obj
	% cd obj
	% path-to-toplevel-sources/configure --disable-gdb
	% make all-gdbserver

(If you have a combined binutils+gdb tree, you may want to also
disable other directories when configuring, e.g., binutils, gas, gold,
gprof, and ld.)

If you prefer to cross-compile to your target, then you can also build
GDBserver that way.  For example:

	% export CC=your-cross-compiler
	% path-to-topevel-sources/configure --disable-gdb
	% make all-gdbserver

Using GDBreplay:

A special hacked down version of GDBserver can be used to replay remote
debug log files created by GDB.  Before using the GDB "target" command to
initiate a remote debug session, use "set remotelogfile <filename>" to tell
GDB that you want to make a recording of the serial or tcp session.  Note
that when replaying the session, GDB communicates with GDBreplay via tcp,
regardless of whether the original session was via a serial link or tcp.

Once you are done with the remote debug session, start GDBreplay and
tell it the name of the log file and the host and port number that GDB
should connect to (typically the same as the host running GDB):

	$ gdbreplay logfile host:port

Then start GDB (preferably in a different screen or window) and use the
"target" command to connect to GDBreplay:

	(gdb) target remote host:port

Repeat the same sequence of user commands to GDB that you gave in the
original debug session.  GDB should not be able to tell that it is talking
to GDBreplay rather than a real target, all other things being equal.  Note
that GDBreplay echos the command lines to stderr, as well as the contents of
the packets it sends and receives.  The last command echoed by GDBreplay is
the next command that needs to be typed to GDB to continue the session in
sync with the original session.