Commit Graph

328 Commits

Author SHA1 Message Date
Yao Qi
4081c0f122 Simplify gdb.threads/wp-replication.exp on counting HW watchpoints
Nowadays, test gdb.threads/wp-replication.exp uses a while loop to
repeatedly insert HW watchpoint, resume and check no error message
coming out, in order to count HW watchpoints  There are some
drawbacks in this way,

 - the loop could be endless.  I think this is use to making trouble
   to S/390, since we had such comment

      # Some targets (like S/390) behave as though supporting
      # unlimited hardware watchpoints.  In this case we just take a
      # safe exit out of the loop.

   I hit this today too because a GDB internal error is triggered
   on "continue" in the loop, and $done is 0 invariantly, so the loop
   can't end.
 - the code counting hardware watchpoint is too complicated.  We can
   use "set breakpoint always-inserted on" to get the result of inserting
   HW watchpoint without resuming the inferior.  In this way,
   watch_count_done and empty_cycle in c file is no longer needed.

In this patch, I change to use "set breakpoint always-inserted on" trick,
and only iterate $NR_THREADS times, to count the HW watchpoint.  In this
way, the loop can't be endless, and GDB doesn't need to resume the inferior.

gdb/testsuite:

2015-10-30  Yao Qi  <yao.qi@linaro.org>

	* gdb.threads/wp-replication.c (watch_count_done): Remove.
	(empty_cycle): Remove.
	(main): Don't call empty_cycle.  Don't use watch_count_done.
	* gdb.threads/wp-replication.exp: Don't set breakpoint on
	empty_cycle.  Rewrite the code counting HW watchpoints.
2015-10-30 15:54:58 +00:00
Pedro Alves
1ed415e2b9 non-stop-fair-events.exp slower on software single-step && !displ-step targets
On software single-step targets that don't support displaced stepping,
threads keep hitting each other's single-step breakpoints, and then
GDB needs to pause all threads to step past those.  The end result is
that progress in the main thread will be slower and it may take a bit
longer for the signal to be queued.  This patch bumps the timeout on
such targets.

gdb/testsuite/ChangeLog:
2015-09-16  Pedro Alves  <palves@redhat.com>
	    Sandra Loosemore <sandra@codesourcery.com>

	* gdb.threads/non-stop-fair-events.c (timeout): New global.
	(SECONDS): Redefine.
	(main): Call pthread_kill and alarm early.
	* gdb.threads/non-stop-fair-events.exp: Probe displaced stepping
	support.
	(test): If the target can't hardware step and doesn't support
	displaced stepping, increase the timeout.
2015-09-16 15:51:36 +01:00
Pedro Alves
d136eff549 Make it easier to debug non-stop-fair-events.exp
If we enable infrun debug running this test, it quickly fails with a
full expect buffer.  That can be simply handled with a couple
exp_continues.  As it's annoying to hack this every time we need to
debug the test, this patch adds bits to enable debugging support
easily, with a one-line change.

And then, if any iteration of the test fails, we end up with a long
cascade of time outs.  Just bail out when we see the first fail.

gdb/testsuite/
2015-09-16  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.exp (gdb_test_no_anchor)
	(enable_debug): New procedures.
	(test): Use them.  Bail out if waiting for threads fails.
	(top level): Bail out if a test fails.
2015-09-16 15:51:36 +01:00
Philippe Waroquiers
5382cfab61 Fix PR/18564 - regression in showing __thread so extern variable
Ensure tls variable address is not relocated, as the msym addr
is an offset in the thread local storage of the shared library/object.
2015-09-15 21:12:39 +02:00
Pedro Alves
d15dcecdee Fix gdb.threads/non-ldr-exc-3.exp race
gdb.threads/non-ldr-exc-3.exp is sometimes failing like this:

 [Switching to Thread 6831.6832]

 Breakpoint 2, thread_execler (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/non-ldr-exc-3.c:41
 41        if (execl (image, image, argv1, NULL) == -1) /* break-here */
 PASS: gdb.threads/non-ldr-exc-3.exp: lock-sched=on,non-stop=off: continue to breakpoint
 (gdb) set scheduler-locking on
 (gdb) FAIL: gdb.threads/non-ldr-exc-3.exp: lock-sched=on,non-stop=off: set scheduler-locking on

The problem is that the gdb_test_multiple is missing the prompt
anchor.  The problem was introduced by 2fd33e9448.  This reverts the
hunk that introduced the problem, reverting back to
gdb_continue_to_breakpoint.

gdb/testsuite/ChangeLog:
2015-09-15  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-ldr-exc-3.exp (do_test): Use
	gdb_continue_to_breakpoint instead of gdb_test_multiple.
2015-09-15 17:01:59 +01:00
Don Breazeal
2fd33e9448 Extended-remote exec test
This patch updates several exec-related tests and some of the library
functions in order to get them running with extended-remote.  There were
three changes that were required, as follows:

In gdb.base/foll-exec.exp, use 'clean_start' in place of proc 'zap_session'
to reset the state of the debugger between tests.  This sets 'remote
exec-file' to execute the correct binary file in each subsequent test.

In gdb.base/pie-execl.exp, there is an expect statement with an expression
that is used to match output from both gdb and the program under debug.
For the remote target, this had to be split into two expressions, using
$inferior_spawn_id to match the output from the program.

Because I had encountered problems with extended-remote exec events in
non-stop mode in my manual testing, I added non-stop testing to the
non-ldr-exc-[1234].exp tests.  In order to set non-stop mode for remote
targets, it is necessary to 'set non-stop on' after gdb has started, but
before it connects to gdbserver.  This is done using 'save_vars' to set
non-stop mode in GDBFLAGS, so GDB sets non-stop mode on startup.

gdb/testsuite/ChangeLog:

	* gdb.base/foll-exec.c: Add copyright header.  Fix
	formatting issues.
	* gdb.base/foll-exec.exp (zap_session): Delete proc.
	(do_exec_tests): Use clean_restart in place of zap_session,
	and for test initialization.  Fix formatting issues.  Use
	fail in place of perror.
	* gdb.base/pie-execl.exp (main): Use 'inferior_spawn_id' in
	an expect statement to match an expression with output from
	the program under debug.
	* gdb.threads/non-ldr-exc-1.exp (do_test, main): Add
	non-stop tests and pass stop mode argument to clean_restart.
	Use save_vars to enable non-stop in GDBFLAGS.
	* gdb.threads/non-ldr-exc-2.exp: Likewise.
	* gdb.threads/non-ldr-exc-3.exp: Likewise.
	* gdb.threads/non-ldr-exc-4.exp: Likewise.
2015-09-11 11:12:46 -07:00
Sandra Loosemore
c0fa8fbd1c Improve hand-call-in-threads.exp failure handling.
2015-09-08  Sandra Loosemore  <sandra@codesourcery.com>

	gdb/testsuite/
	* gdb.threads/hand-call-in-threads.exp: Make sure the thread
	command actually switches threads.  Give up on remaining
	tests if target fails to stop at breakpoint.
2015-09-08 19:49:04 -07:00
Pedro Alves
d4569d7bc5 Fix step-over-{trips-on-watchpoint|lands-on-breakpoint}.exp race
On a target that is both always in non-stop mode and can do displaced
stepping (such as native x86_64 GNU/Linux, with "maint set
target-non-stop on"), the step-over-trips-on-watchpoint.exp test
sometimes fails like this:

   (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: thread 1
   set scheduler-locking off
   (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
   step
  -[Switching to Thread 0x7ffff7fc0700 (LWP 11782)]
  -Hardware watchpoint 4: watch_me
  -
  -Old value = 0
  -New value = 1
  -child_function (arg=0x0) at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:39
  -39           other = 1; /* set thread-specific breakpoint here */
  -(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
  +wait_threads () at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
  +49       return 1; /* in wait_threads */
  +(gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step

Note "scheduler-locking" was set off.  The problem is that on such
targets, the step-over of thread 2 and the "step" of thread 1 can be
set to run simultaneously (since with displaced stepping the
breakpoint isn't ever removed from the target), and sometimes, the
"step" of thread 1 finishes first, so it'd take another resume to see
the watchpoint trigger.  Fix this by replacing the wait_threads
function with a one-line infinite loop that doesn't call any function,
so that the "step" of thread 1 never finishes.

gdb/testsuite/ChangeLog:
2015-08-07  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-lands-on-breakpoint.c (wait_threads):
	Delete function.
	(main): Add alarm.  Run an infinite loop instead of calling
	wait_threads.
	* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): Change
	comment.
	* gdb.threads/step-over-trips-on-watchpoint.c (wait_threads):
	Delete function.
	(main): Add alarm.  Run an infinite loop instead of calling
	wait_threads.
	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Change
	comment.
2015-08-07 17:26:21 +01:00
Pedro Alves
d55007b583 Fix signal-while-stepping-over-bp-other-thread.exp on targets always in non-stop
With "maint set target-non-stop on" we get:

 -PASS: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step
 +FAIL: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step

The issue is simply that switch_back_to_stepped_thread is not used in
non-stop mode, thus infrun doesn't output the expected "switching back
to stepped thread" log.

gdb/testsuite/ChangeLog:
2015-08-07  Pedro Alves  <palves@redhat.com>

	* signal-while-stepping-over-bp-other-thread.exp: Expect "restart
	threads" as alternative to "switching back to stepped thread".
2015-08-07 17:26:20 +01:00
Pedro Alves
f6a9d9c7db Revert "test slowdown"
That was pushed by mistake.
2015-08-06 12:45:45 +01:00
Pedro Alves
83e97ed023 Test for PR18749: problems if whole process dies while (ptrace-) stopped
This adds a kfailed test that has the whole process exit just while
several threads continuously step over a breakpoint.  Usually, the
process exits just while GDB or GDBserver is handling the breakpoint
hit.  In other words, the process disappears while the event thread is
(ptrace-) stopped.  This exposes several issues in GDB and GDBserver.
Errors, crashes, etc.

I fixed some of these issues recently, but there's a lot more to do.
It's a bit like playing whack-a-mole at the moment.  You fix an issue,
which then exposes several others.

E.g., with the native target, you get (among other errors):

  (...)
  [New Thread 0x7ffff47b9700 (LWP 18077)]
  [New Thread 0x7ffff3fb8700 (LWP 18078)]
  [New Thread 0x7ffff37b7700 (LWP 18079)]
  Cannot find user-level thread for LWP 18076: generic error
  (gdb) KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)

gdb/testsuite/ChangeLog:
2015-08-06  Pedro Alves  <palves@redhat.com>

	PR gdb/18749
	* gdb.threads/process-dies-while-handling-bp.c: New file.
	* gdb.threads/process-dies-while-handling-bp.exp: New file.
2015-08-06 12:33:20 +01:00
Pedro Alves
4807d3f329 test slowdown 2015-08-06 12:33:19 +01:00
Pedro Alves
863d01bde2 gdbserver: Fix non-stop / fork / step-over issues
Ref: https://sourceware.org/ml/gdb-patches/2015-07/msg00868.html

This adds a test that has a multithreaded program have several threads
continuously fork, while another thread continuously steps over a
breakpoint.

This exposes several intertwined issues, which this patch addresses:

 - When we're stopping and suspending threads, some thread may fork,
   and we missed setting its suspend count to 1, like we do when a new
   clone/thread is detected.  When we next unsuspend threads, the fork
   child's suspend count goes below 0, which is bogus and fails an
   assertion.

 - If a step-over is cancelled because a signal arrives, but then gdb
   is not interested in the signal, we pass the signal straight back
   to the inferior.  However, we miss that we need to re-increment the
   suspend counts of all other threads that had been paused for the
   step-over.  As a result, other threads indefinitely end up stuck
   stopped.

 - If a detach request comes in just while gdbserver is handling a
   step-over (in the test at hand, this is GDB detaching the fork
   child), gdbserver internal errors in stabilize_thread's helpers,
   which assert that all thread's suspend counts are 0 (otherwise we
   wouldn't be able to move threads out of the jump pads).  The
   suspend counts aren't 0 while a step-over is in progress, because
   all threads but the one stepping past the breakpoint must remain
   paused until the step-over finishes and the breakpoint can be
   reinserted.

 - Occasionally, we see "BAD - reinserting but not stepping." being
   output (from within linux_resume_one_lwp_throw).  That was because
   GDB pokes memory while gdbserver is busy with a step-over, and that
   suspends threads, and then re-resumes them with proceed_one_lwp,
   which missed another reason to tell linux_resume_one_lwp that the
   thread should be set back to stepping.

 - In a couple places, we were resuming threads that are meant to be
   suspended.  E.g., when a vCont;c/s request for thread B comes in
   just while gdbserver is stepping thread A past a breakpoint.  The
   resume for thread B must be deferred until the step-over finishes.

 - The test runs with both "set detach-on-fork" on and off.  When off,
   it exercises the case of GDB detaching the fork child explicitly.
   When on, it exercises the case of gdb resuming the child
   explicitly.  In the "off" case, gdb seems to exponentially become
   slower as new inferiors are created.  This is _very_ noticeable as
   with only 100 inferiors gdb is crawling already, which makes the
   test take quite a bit to run.  For that reason, I've disabled the
   "off" variant for now.

gdb/ChangeLog:
2015-08-06  Pedro Alves  <palves@redhat.com>

	* target/waitstatus.h (enum target_stop_reason)
	<TARGET_STOPPED_BY_SINGLE_STEP>: New value.

gdb/gdbserver/ChangeLog:
2015-08-06  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the fork child's suspend
	count if stopping and suspending threads.
	(check_stopped_by_breakpoint): If stopped by trace, set the LWP's
	stop reason to TARGET_STOPPED_BY_SINGLE_STEP.
	(linux_detach): Complete an ongoing step-over.
	(lwp_suspended_inc, lwp_suspended_decr): New functions.  Use
	throughout.
	(resume_stopped_resumed_lwps): Don't resume a suspended thread.
	(linux_wait_1): If passing a signal to the inferior after
	finishing a step-over, unsuspend and re-resume all lwps.  If we
	see a single-step event but the thread should be continuing, don't
	pass the trap to gdb.
	(stuck_in_jump_pad_callback, move_out_of_jump_pad_callback): Use
	internal_error instead of gdb_assert.
	(enqueue_pending_signal): New function.
	(check_ptrace_stopped_lwp_gone): Add debug output.
	(start_step_over): Use internal_error instead of gdb_assert.
	(complete_ongoing_step_over): New function.
	(linux_resume_one_thread): Don't resume a suspended thread.
	(proceed_one_lwp): If the LWP is stepping over a breakpoint, reset
	it stepping.

gdb/testsuite/ChangeLog:
2015-08-06  Pedro Alves  <palves@redhat.com>

	* gdb.threads/forking-threads-plus-breakpoint.exp: New file.
	* gdb.threads/forking-threads-plus-breakpoint.c: New file.
2015-08-06 10:30:18 +01:00
Pedro Alves
0a39bb3218 stepping is disturbed by setjmp/longjmp | try/catch in other threads
At https://sourceware.org/ml/gdb-patches/2015-08/msg00097.html, Joel
observed that trying to next/step a program on GNU/Linux sometimes
results in the following failed assertion:

	% gdb -q .obj/gprof/main
    (gdb) start
    (gdb) n
    (gdb) step
    [...]/infrun.c:2391: internal-error:
    resume: Assertion `sig != GDB_SIGNAL_0' failed.

What happened is that, during the "next" operation, GDB hit a
longjmp/exception/step-resume breakpoint but failed to see that this
breakpoint was set for a different thread than the one being stepped.

Joel's detailed analysis follows:

More precisely, at the end of the "start" command, we are stopped at
the start of function Main in main.adb; there are 4 threads in total,
and we are in the main thread (which is thread 1):

    (gdb) info thread
      Id   Target Id         Frame
      4    Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall ()
      3    Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall ()
      2    Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall ()
    * 1    Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57

All the logs below reference Thread ID/LWP, but it'll be easier to
talk about the threads by GDB thread number.  For instance, thread 1
is LWP 28370 while thread 3 is LWP 28378.  So, the explanations below
translate the LWPs into thread numbers.

Back to what happens while we are trying to "next' our program:
    (gdb) n
    infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379))
    infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378))
    infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377))
    infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370))
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8054523

We've resumed thread 1 (LWP 28370), and received in return a signal
that the same thread stopped slightly further.  It's still in the
range of instructions for the line of source we started the "next"
from, as evidenced by the following trace...

    infrun: stepping inside range [0x805451e-0x8054531]

... and thus, we decide to continue stepping the same thread:

    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
    infrun: prepare_to_wait

That's when we get an event from a different thread (thread 3)...

    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80782d0
    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)

... which we find to be at the address where we set a breakpoint on
"the unwinder debug hook" (namely "_Unwind_DebugHook").  But GDB fails
to notice that the breakpoint was inserted for thread 1 only, and so
decides to handle it as...

    infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME

... and inserts a breakpoint at the corresponding resume address, as
evidenced by this the next log:

    infrun: exception resume at 80542a2

That breakpoint seems innocent right now, but will play a role fairly
quickly.  But for now, GDB has inserted the exception-resume
breakpoint, and needs to single-step thread 3 past the breakpoint it
just hit.  Thus, it temporarily disables the exception breakpoint, and
requests a step of that thread:

    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0
    infrun: prepare_to_wait

We then get a notification, still from thread 3, that it's now past
that breakpoint...

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8078424

... so we can resume what we were doing before, which is single-stepping
thread 1 until we get to a new line of code:

    infrun: switching back to stepped thread
    infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370)
    infrun: expected thread still hasn't advanced
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523

The "resume" log above shows that we're resuming thread 1 from where
we left off (0x8054523).  We get one more stop at 0x8054529, which is
still inside our stepping range so we go again.  That's when we get
the following event, from thread 3:

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80542a2

Now the stop_pc address is interesting, because it's the address of
"exception resume" breakpoint...

    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
    infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME

... and since that location is at a different line of code, this is
where it decides the "next" operation should stop:

    infrun: stop_waiting
    [Switching to Thread 0xb7c5aba0 (LWP 28378)]
    0x080542a2 in inte_tache_rt.ttache_rt (
        <_task>=0x80968ec <inte_tache_rt_inst.tache2>)
        at /[...]/inte_tache_rt.adb:54
    54            end loop;

However, what GDB should have noticed earlier that the exception
breakpoint we hit was for a different thread, thus should have
single-stepped that thread out of the breakpoint _without_ inserting
the exception-return breakpoint, and then resumed the single-stepping
of the initial thread (thread 1) until that thread stepped out of its
stepping range.

This is what this patch does, and after applying it, GDB now correctly
stops on the next line of code.

The patch adds a C++ test that exercises this, both for setjmp/longjmp
and exception breakpoints.  With an unpatched GDB it shows:

 (gdb) next
 [Switching to Thread 22445.22455]
 thread_try_catch (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/next-other-thr-longjmp.c:59
 59            catch (...)
 (gdb) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 1
 next
 /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:4865: internal-error: process_event_stop_test: Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' fa
 iled.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 2 (GDB internal error)
 Resyncing due to internal error.
 n

Tested on x86_64-linux, no regressions.

gdb/ChangeLog:
2015-08-05  Pedro Alves  <palves@redhat.com>
	    Joel Brobecker  <brobecker@adacore.com>

        * breakpoint.c (bpstat_what) <bp_longjmp, bp_longjmp_call_dummy>
	<bp_exception, bp_longjmp_resume, bp_exception_resume>: Handle the
	case where BS->STOP is not set.

gdb/testsuite/ChangeLog:
2015-08-05  Pedro Alves  <palves@redhat.com>

	* gdb.threads/next-while-other-thread-longjmps.c: New file.
	* gdb.threads/next-while-other-thread-longjmps.exp: New file.
2015-08-05 20:01:42 +01:00
Pedro Alves
2c8c5d375e testsuite: tcl exec& -> 'kill -9 $pid' is racy (attach-many-short-lived-thread.exp races and others)
The buildbots show that attach-many-short-lived-thread.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel nor the
test's program itself are at fault.

The problem is simply that the testsuite machinery is currently
subject to PID-reuse races.  The attach-many-short-lived-threads.c
test program just happens to be much more susceptible to trigger this
race because threads and processes share the same number space on
Linux, and the test spawns many many short lived threads in
succession, thus enlarging the race window a lot.

Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command) , and then at the end of the
test, to make sure things are cleaned up, issue a 'remote_spawn "kill
-p $testpid"'.  Since with tcl's "exec&", tcl itself is responsible
for reaping the process's exit status, when we go kill the process,
testpid may have already exited _and_ its status may have (and often
has) been reaped already.  Thus it can happen that another process
meanwhile reuses $testpid, and that "kill" command kills the wrong
process...  Frequently, that happens to be
attach-many-short-lived-thread, but this explains other test's races
as well.

In the attach-many-short-lived-threads test, it sometimes manifests
like this:

 (gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
 Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
 (gdb)           Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
 attach 5940
 Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
 warning: process 5940 is a zombie - the process has already terminated
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 ptrace: Operation not permitted.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
 info threads
 No threads.
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
 set breakpoint always-inserted on
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on

Other times the process dies while the test is ongoing (the process is
ptrace-stopped):

 (gdb) print again = 1
 Cannot access memory at address 0x6020cc
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior

(Recall that on Linux, SIGKILL is not interceptable)

And other times it dies just while we're detaching:

 $4 = 319
 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
 detach
 Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
 (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach

GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.

The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process.  That allows
killing the process by PID without being subject to pid-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.

The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that too is caused by the same reason, except that in
that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:

    exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
    ...
    catch "wait -i $shell_id"

Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid...

I sent a fix for that to the DejaGnu list:
 http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html

With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.

Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.

gdb/testsuite/ChangeLog:
2015-07-31  Pedro Alves  <palves@redhat.com>

	* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id.
	Use spawn_id_get_pid.  Wait for spawn id after eof.  Use
	kill_wait_spawned_process instead of explicit "kill -9".
	* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid.  Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/attach-twice.exp: Likewise.
	* gdb.base/attach.exp: Likewise.
	(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
	gdb_test_multiple.
	* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
	returning a spawn id instead of a pid.  Use spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/valgrind-infcall.exp: Likewise.
	* gdb.multi/multi-attach.exp: Likewise.
	* gdb.python/py-prompt.exp: Likewise.
	* gdb.python/py-sync-interp.exp: Likewise.
	* gdb.server/ext-attach.exp: Likewise.
	* gdb.threads/attach-into-signal.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
	spawn_wait_for_attach returning a spawn id instead of a pid.  Use
	spawn_id_get_pid and kill_wait_spawned_process.
	* gdb.threads/attach-stopped.exp (corefunc): Use
	spawn_wait_for_attach, spawn_id_get_pid and
	kill_wait_spawned_process.
	* gdb.base/break-interp.exp: Rename $res to $test_spawn_id.
	Use spawn_id_get_pid.  Wait for spawn id after eof.  Use
	kill_wait_spawned_process instead of explicit "kill -9".
	* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
	(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
	(spawn_wait_for_attach): Use spawn instead of exec to spawn
	processes.  Don't map cygwin/windows pids here.  Now returns a
	spawn id list.
2015-07-31 20:06:24 +01:00
Pedro Alves
998d452ac8 remote follow fork and spurious child stops in non-stop mode
Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] #2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
2015-07-30 18:52:53 +01:00
Pedro Alves
69dde7dcb8 PR threads/18600: Inferiors left around after fork+thread spawn
The new gdb.threads/fork-plus-threads.exp test exposes one more
problem.  When one types "info inferiors" after running the program,
one see's a couple inferior left still, while there should only be
inferior #1 left.  E.g.:

 (gdb) info inferiors
   Num  Description       Executable
   4    process 8393      /home/pedro/bugs/src/test
   2    process 8388      /home/pedro/bugs/src/test
 * 1    <null>            /home/pedro/bugs/src/test
 (gdb) info threads

Calling prune_inferiors() manually at this point (from a top gdb) does
not remove them, because they still have inf->pid != 0 (while they
shouldn't).  This suggests that we never mourned those inferiors.

Enabling logs (master + previous patch) we see:

 ...
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped)
 WL: Handling extended status 0x03057f
 LHEW: Got clone event from LWP 9513, new child is LWP 9579
 [New Thread 0x7ffff37b8700 (LWP 9579)]
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited)
 WL: Thread 0x7ffff7fc2740 (LWP 9508) exited.
			    ^^^^^^^^
 [Thread 0x7ffff7fc2740 (LWP 9508) exited]
 WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited)
 WL: Thread 0x7ffff7fc2740 (LWP 9499) exited.
 [Thread 0x7ffff7fc2740 (LWP 9499) exited]
 RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0
 ...
 (gdb) info inferiors
   Num  Description       Executable
   5    process 9508      /home/pedro/bugs/src/test
		^^^^
   4    process 9503      /home/pedro/bugs/src/test
   3    process 9500      /home/pedro/bugs/src/test
   2    process 9499      /home/pedro/bugs/src/test
 * 1    <null>            /home/pedro/bugs/src/test
 (gdb)
 ...

Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line.
That's this in wait_lwp:

      /* Check if the thread has exited.  */
      if (WIFEXITED (status) || WIFSIGNALED (status))
	{
	  thread_dead = 1;
	  if (debug_linux_nat)
	    fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
				target_pid_to_str (lp->ptid));
	}
    }

That was the leader thread reporting an exit, meaning the whole
process is gone.  So the problem is that this code doesn't understand
that an WIFEXITED status of the leader LWP should be reported to
infrun as process exit.

gdb/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* linux-nat.c (wait_lwp): Report to the core when thread group
	leader exits.

gdb/testsuite/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
	only shows inferior 1.
2015-07-30 18:52:09 +01:00
Pedro Alves
4dd63d488a PR threads/18600: Threads left stopped after fork+thread spawn
When a program forks and another process start threads while gdb is
handling the fork event, newly created threads are left stuck stopped
by gdb, even though gdb presents them as "running", to the user.

This can be seen with the test added by this patch.  The test has the
inferior fork a certain number of times and waits for all children to
exit.  Each fork child spawns a number of threads that do nothing and
joins them immediately.  Normally, the program should run unimpeded
(from the point of view of the user) and exit very quickly.  Without
this fix, it doesn't because of some threads left stopped by gdb, so
inferior 1 never exits.

The program triggers when a new clone thread is found while inside the
linux_stop_and_wait_all_lwps call in linux-thread-db.c:

      linux_stop_and_wait_all_lwps ();

      ALL_LWPS (lp)
	if (ptid_get_pid (lp->ptid) == pid)
	  thread_from_lwp (lp->ptid);

      linux_unstop_all_lwps ();

Within linux_stop_and_wait_all_lwps, we reach
linux_handle_extended_wait with the "stopping" parameter set to 1, and
because of that we don't mark the new lwp as resumed.  As consequence,
the subsequent resume_stopped_resumed_lwps, called from
linux_unstop_all_lwps, never resumes the new LWP.

There's lots of cruft in linux_handle_extended_wait that no longer
makes sense.  On systems with CLONE events support, we don't rely on
libthread_db for thread listing anymore, so the code that preserves
stop_requested and the handling of last_resume_kind is all dead.

So the fix is to remove all that, and simply always mark the new LWP
as resumed, so that resume_stopped_resumed_lwps re-resumes it.

gdb/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>
	    Simon Marchi  <simon.marchi@ericsson.com>

	PR threads/18600
	* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
	mark the new thread as resumed.  Remove STOPPING parameter.
	(wait_lwp): Adjust call to linux_handle_extended_wait.
	(linux_nat_filter_event): Adjust call to
	linux_handle_extended_wait.
	(resume_stopped_resumed_lwps): Add debug output.

gdb/testsuite/ChangeLog:
2015-07-30  Simon Marchi  <simon.marchi@ericsson.com>
	    Pedro Alves  <palves@redhat.com>

	PR threads/18600
	* gdb.threads/fork-plus-threads.c: New file.
	* gdb.threads/fork-plus-threads.exp: New file.
2015-07-30 18:50:29 +01:00
Sergio Durigan Junior
7da5b897c9 Uniquefy gdb.threads/attach-into-signal.exp
Hi,

While examining BuildBot's logs, I noticed:

  <https://sourceware.org/ml/gdb-testers/2015-q3/msg03767.html>

gdb.threads/attach-into-signal.exp has two nested loops and don't use
unique messages.  This commit fixes that.  Pushed under the obvious
rule.

gdb/testsuite/ChangeLog:
2015-07-29  Sergio Durigan Junior  <sergiodj@redhat.com>

	* gdb.threads/attach-into-signal.exp (corefunc): Use
	with_test_prefix on nested loops, uniquefying the test messages.
2015-07-29 11:10:49 -04:00
Pedro Alves
7759842763 PR gdb/18717: internal error if non-leader thread exits process
If a non-leader thread exits the process while all other threads are
ptrace-stopped, native gdb fails an assertion.  The test added by this
commit catches it:

 /home/pedro/gdb/mygit/build/../src/gdb/linux-nat.c:3198: internal-error: linux_nat_filter_event: Assertion `lp->resumed' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n)
 FAIL: gdb.threads/non-leader-exit-process.exp: program exits normally (GDB internal error)

The fix is just to remove the assertion.

With that out of the way, neither GDB not GDBserver handle this
perfectly though, so I'm adding a KFAIL:

 (gdb) continue
 Continuing.
 [Thread 0x7ffff7fc0700 (LWP 15350) exited]
 No unwaited-for children left.
 Couldn't get registers: No such process.
 (gdb) KFAIL: gdb.threads/non-ldr-exit.exp: program exits normally (PRMS: gdb/18717)

gdb/ChangeLog:
2015-07-24  Pedro Alves  <palves@redhat.com>

	PR gdb/18717
	* linux-nat.c (linux_nat_filter_event): Don't assert that the lwp
	is resumed, and extend the debug log.

gdb/testsuite/ChangeLog:
2015-07-24  Pedro Alves  <palves@redhat.com>

	PR gdb/18717
	* gdb.threads/non-ldr-exit.c: New file.
	* gdb.threads/non-ldr-exit.exp: New file.
2015-07-24 17:49:17 +01:00
Pedro Alves
28bf096c62 PR threads/18127 - threads spawned by infcall end up stuck in "running" state
Refs:
 https://sourceware.org/ml/gdb/2015-03/msg00024.html
 https://sourceware.org/ml/gdb/2015-06/msg00005.html

On GNU/Linux, if an infcall spawns a thread, that thread ends up with
stuck running state.  This happens because:

 - when linux-nat.c detects a new thread, it marks them as running,
   and does not report anything to the core.

 - we skip finish_thread_state when the thread that is running the
   infcall stops.

As result, that new thread ends up with stuck "running" state, even
though it really is stopped.

On Windows, _all_ threads end up stuck in running state, not just the
one that was spawned.  That happens because when a new thread is
detected, unlike linux-nat.c, windows-nat.c reports
TARGET_WAITKIND_SPURIOUS to infrun.  It's the fact that that event
does not cause a user-visible stop that triggers the problem.  When
the target is re-resumed, we call set_running with a wildcard ptid,
which marks all thread as running.  That set_running is not suppressed
because the (leader) thread being resumed does not have in_infcall
set.  Later, when the infcall finally finishes successfully, nothing
marks all threads back to stopped.

We can trigger the same problem on all targets by having a thread
other than the one that is running the infcall report a breakpoint hit
to infrun, and then have that breakpoint not cause a stop.  That's
what the included test does.

The fix is to stop GDB from suppressing the set_running calls while
doing an infcall, and then set the threads back to stopped when the
call finishes, iff they were originally stopped before the infcall
started.  (Note the MI *running/*stopped event suppression isn't
affected.)

Tested on x86_64 GNU/Linux.

gdb/ChangeLog:
2015-06-29  Pedro Alves  <palves@redhat.com>

	PR threads/18127
	* infcall.c (run_inferior_call): On infcall success, if the thread
	was marked stopped before, reset it back to stopped.
	* infrun.c (resume): Don't suppress the set_running calls when
	doing an infcall.
	(normal_stop): Only discard the finish_thread_state cleanup if the
	infcall succeeded.

gdb/testsuite/ChangeLog:
2015-06-29  Pedro Alves  <palves@redhat.com>

	PR threads/18127
	* gdb.threads/hand-call-new-thread.c: New file.
	* gdb.threads/hand-call-new-thread.c: New file.
2015-06-29 16:07:57 +01:00
Pedro Alves
9ee417720b Cleanup signal-while-stepping-over-bp-other-thread.exp
gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-while-stepping-over-bp-other-thread.exp: Use
	gdb_test_sequence and gdb_assert.
2015-04-10 19:49:00 +01:00
Pedro Alves
07473109e1 step-over-trips-on-watchpoint.exp: Don't put addresses in test messages
Diffing test results, I noticed:

 -PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x0000000000400811 thread 1
 +PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x00000000004007d1 thread 1

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Use
	test messages that don't include the breakpoint address.
2015-04-10 19:23:24 +01:00
Pedro Alves
c79d856c88 Test step-over-{lands-on-breakpoint|trips-on-watchpoint}.exp with displaced stepping
These tests exercise the infrun.c:proceed code that needs to know to
start new step overs (along with switch_back_to_stepped_thread, etc.).
That code is tricky to get right in the multitude of possible
combinations (at least):

 (native | remote)
  X (all-stop | all-stop-but-target-always-in-non-stop)
  X (displaced-stepping | in-line step-over).

The first two above are properties of the target, but the different
step-over-breakpoint methods should work with any target that supports
them.  This patch makes sure we always test both methods on all
targets.

Tested on x86-64 Fedora 20.

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): New
	procedure, factored out from ...
	(top level): ... here.  Add "set displaced-stepping" testing axis.
	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): New
	parameter "displaced".  Use it.
	(top level): Use foreach and add "set displaced-stepping" testing
	axis.
2015-04-10 13:31:59 +01:00
Pedro Alves
ebc90b50ce Make gdb.threads/step-over-trips-on-watchpoint.exp effective on !x86
This test is currently failing like this on (at least) PPC64 and s390x:

 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: next: next
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: step: step
 FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: next: next

gdb.log:

 (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
 step
 wait_threads () at ../../../src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
 49        return 1; /* in wait_threads */
 (gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step

The problem is that the test assumes that both the "watch_me = 1;" and
the "other = 1;" lines compile to a single instruction each, which
happens to be true on x86, but no necessarily true everywhere else.
The result is that the test doesn't really test what it wants to test.

Fix it by looking for the instruction that triggers the watchpoint.

gdb/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/step-over-trips-on-watchpoint.c (child_function):
	Remove comment.
	* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Find
	both the address of the instruction that triggers the watchpoint
	and the address of the instruction immediately after, and use
	those addresses for the test.  Fix comment.
2015-04-10 13:11:32 +01:00
Pedro Alves
8d707a12ef gdb/18216: displaced step+deliver signal, a thread needs step-over, crash
The problem is that with hardware step targets and displaced stepping,
"signal FOO" when stopped at a breakpoint steps the breakpoint
instruction at the same time it delivers a signal.  This results in
tp->stepped_breakpoint set, but no step-resume breakpoint set.  When
the next stop event arrives, GDB crashes.  Irrespective of whether we
should do something more/different to step past the breakpoint in this
scenario (e.g., PR 18225), it's just wrong to assume there'll be a
step-resume breakpoint set (and was not the original intention).

gdb/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	PR gdb/18216
	* infrun.c (process_event_stop_test): Don't assume a step-resume
	is set if tp->stepped_breakpoint is true.

gdb/testsuite/ChangeLog:
2015-04-10  Pedro Alves  <palves@redhat.com>

	PR gdb/18216
	* gdb.threads/multiple-step-overs.exp: Remove expected eof.
2015-04-10 10:36:23 +01:00
Pedro Alves
f3770638ca Add test for PR18214 and PR18216 - multiple step-overs with queued signals
Both PRs are triggered by the same use case.

PR18214 is about software single-step targets.  On those, the 'resume'
code that detects that we're stepping over a breakpoint and delivering
a signal at the same time:

  /* Currently, our software single-step implementation leads to different
     results than hardware single-stepping in one situation: when stepping
     into delivering a signal which has an associated signal handler,
     hardware single-step will stop at the first instruction of the handler,
     while software single-step will simply skip execution of the handler.
...
     Fortunately, we can at least fix this particular issue.  We detect
     here the case where we are about to deliver a signal while software
     single-stepping with breakpoints removed.  In this situation, we
     revert the decisions to remove all breakpoints and insert single-
     step breakpoints, and instead we install a step-resume breakpoint
     at the current address, deliver the signal without stepping, and
     once we arrive back at the step-resume breakpoint, actually step
     over the breakpoint we originally wanted to step over.  */

doesn't handle the case of _another_ thread also needing to step over
a breakpoint.  Because the other thread is just resumed at the PC
where it had stopped and a breakpoint is still inserted there, the
thread immediately re-traps the same breakpoint.  This test exercises
that.  On software single-step targets, it fails like this:

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler

gdb.log (simplified):

 (gdb) continue
 Continuing.

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) thread 3
 (gdb) queue-signal SIGUSR1
 (gdb) thread 1
 [Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))]
 #0  main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106
 106       wait_threads (); /* set wait-threads breakpoint here */
 (gdb) break sigusr1_handler
 Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31.
 (gdb) continue
 Continuing.
 [Switching to Thread 0x7ffff7fc0700 (LWP 24828)]

 Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
 66            callme (); /* set breakpoint thread 2 here */
 (gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler


For good measure, I made the test try displaced stepping too.  And
then I found it crashes GDB on x86-64 (a hardware step target), but
only when displaced stepping... :

 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216)
 KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216)

 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 4964          if (sr_bp->loc->permanent
 Setting up the environment for debugging gdb.
 Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54.
 Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217.
 (top-gdb) p sr_bp
 $1 = (struct breakpoint *) 0x0
 (top-gdb) bt
 #0  0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
 #1  0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715
 #2  0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165
 #3  0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298
 #4  0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #5  0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658
 #6  0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658

The all-stop-non-stop series fixes this, but meanwhile, this augments
the multiple-step-overs.exp test to cover this, KFAILed.

gdb/testsuite/ChangeLog:
2015-04-08  Pedro Alves  <palves@redhat.com>

	PR gdb/18214
	PR gdb/18216
	* gdb.threads/multiple-step-overs.c (sigusr1_handler): New
	function.
	(main): Install it as SIGUSR1 handler.
	* gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix'
	parameter.  Always use "setup" as prefix.  Toggle "set
	displaced-stepping" off/on depending on global.  Don't switch to
	thread 1 here.
	(top level): Add displaced stepping "off/on" test axis.  Update
	"setup" calls.  Wrap each subtest with with_test_prefix.  Test
	continuing with a queued signal in each thread.
2015-04-08 19:59:03 +01:00
Yao Qi
337532fab1 Properly set alarm value in gdb.threads/non-stop-fair-events.exp
Nowadays, the alarm value is 60, and alarm is generated on some slow
boards.  This patch is to pass DejaGNU timeout value to the program,
and move the alarm call before going to infinite loop.  If any thread
has activities, the alarm is reset.

gdb/testsuite:

2015-04-07  Yao Qi  <yao.qi@linaro.org>

	* gdb.threads/non-stop-fair-events.c (SECONDS): New macro.
	(child_function): Call alarm.
	(main): Move call to alarm into the loop.
	* gdb.threads/non-stop-fair-events.exp: Build program with
	-DTIMEOUT=$timeout.
2015-04-07 11:30:07 +01:00
Yao Qi
cafda5977a kfail two tests in no-unwaited-for-left.exp for remote target
I see these two fails in no-unwaited-for-left.exp in remote testing
for aarch64-linux target.

...
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.

[Thread 1084] #2 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when thread 2 exits

....
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.

[Thread 1081] #1 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits

I checked the gdb.log on buildbot, and find that these two fails also
appear on Debian-i686-native-extended-gdbserver and Fedora-ppc64be-native-gdbserver-m64.
I recall that they are about local/remote parity, and related RSP is missing.
There has been already a PR 14618 about it.  This patch is to kfail them
on remote target.

gdb/testsuite:

2015-04-02  Yao Qi  <yao.qi@linaro.org>

	* gdb.threads/no-unwaited-for-left.exp: Set up kfail if target
	is remote.
2015-04-02 13:51:31 +01:00
Pedro Alves
a14711808e gdb.threads/manythreads.exp: can't read "test": no such variable
If interrupt_and_wait manages to trigger the FAIL path, we get:

  ERROR OCCURED: can't read "test": no such variable

gdb/testsuite/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* gdb.threads/manythreads.exp (interrupt_and_wait): Pass $message
	to fail instead of non-existent $test.
2015-04-01 15:30:13 +01:00
Pedro Alves
4eec2deb06 Crash on thread id wrap around
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:

 (gdb) continue
 Continuing.
 src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)

Here:

 (top-gdb) bt
 #0  internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
     at src/gdb/common/errors.c:54
 #1  0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
 #2  0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
 #3  0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
 #4  0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
 #5  0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
 #6  0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
     at src/gdb/linux-thread-db.c:1525
 #7  0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
 #8  0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
 #9  0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
 #10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
 #11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655

I managed to come up with a test that reliably reproduces this.  It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while.  On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds.  So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* linux-thread-db.c (record_thread): Readd the thread to gdb's
	list if it was marked exited.

gdb/testsuite/ChangeLog:
2015-04-01  Pedro Alves  <palves@redhat.com>

	* gdb.threads/tid-reuse.c: New file.
	* gdb.threads/tid-reuse.exp: New file.
2015-04-01 13:38:06 +01:00
Pedro Alves
a25d8bf9c5 Fix "thread apply all" with exited threads
I noticed that "thread apply all" sometimes crashes.

The problem is that thread_apply_all_command doesn take exited threads
into account, and we qsort and then walk more elements than there
really ever were put in the array.  Valgrind shows:

 The current thread <Thread ID 3> has terminated.  See `help thread'.
 (gdb) thread apply all p 1

 Thread 1 (Thread 0x7ffff7fc2740 (LWP 29579)):
 $1 = 1
 ==29576== Use of uninitialised value of size 8
 ==29576==    at 0x639CA8: set_thread_refcount (thread.c:1337)
 ==29576==    by 0x5C2C7B: do_my_cleanups (cleanups.c:155)
 ==29576==    by 0x5C2CE8: do_cleanups (cleanups.c:177)
 ==29576==    by 0x63A191: thread_apply_all_command (thread.c:1477)
 ==29576==    by 0x50374D: do_cfunc (cli-decode.c:105)
 ==29576==    by 0x506865: cmd_func (cli-decode.c:1893)
 ==29576==    by 0x7562CB: execute_command (top.c:476)
 ==29576==    by 0x647DA4: command_handler (event-top.c:494)
 ==29576==    by 0x648367: command_line_handler (event-top.c:692)
 ==29576==    by 0x7BF7C9: rl_callback_read_char (callback.c:220)
 ==29576==    by 0x64784C: rl_callback_read_char_wrapper (event-top.c:171)
 ==29576==    by 0x647CB5: stdin_event_handler (event-top.c:432)
 ==29576==
 ...

This can happen easily today as linux-nat.c/linux-thread-db.c are
forgetting to purge non-current exited threads.  But even with that
fixed, we can always do "thread apply all" with an exited thread
selected, which won't be deleted until the user switches to another
thread.  That's what the test added by this commit exercises.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* thread.c (thread_apply_all_command): Take exited threads into
	account.

gdb/testsuite/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.threads/no-unwaited-for-left.exp: Test "thread apply all".
2015-03-24 21:01:29 +00:00
Pedro Alves
856e7dd698 Make "set scheduler-locking step" depend on user intention, only
Currently, "set scheduler-locking step" is a bit odd.  The manual
documents it as being optimized for stepping, so that focus of
debugging does not change unexpectedly, but then it says that
sometimes other threads may run, and thus focus may indeed change
unexpectedly...  A user can then be excused to get confused and wonder
why does GDB behave like this.

I don't think a user should have to know about details of how "next"
or whatever other run control command is implemented internally to
understand when does the "scheduler-locking step" setting take effect.

This patch completes a transition that the code has been moving
towards for a while.  It makes "set scheduler-locking step" hold
threads depending on whether the _command_ the user entered was a
stepping command [step/stepi/next/nexti], or not.

Before, GDB could end up locking threads even on "continue" if for
some reason run control decides a thread needs to be single stepped
(e.g., for a software watchpoint).

After, if a "continue" happens to need to single-step for some reason,
we won't lock threads (unless when stepping over a breakpoint,
naturally).  And if a stepping command wants to continue a thread for
bit, like when skipping a function to a step-resume breakpoint, we'll
still lock threads, so focus of debugging doesn't change.

In order to make this work, we need to record in the thread structure
whether what set it running was a stepping command.

(A follow up patch will remove the "step" parameters of 'proceed' and 'resume')

FWIW, Fedora GDB, which defaults to "scheduler-locking step" (mainline
defaults to "off") carries a different patch that goes in this
direction as well.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdbthread.h (struct thread_control_state) <stepping_command>:
	New field.
	* infcmd.c (step_once): Pass step=1 to clear_proceed_status.  Set
	the thread's stepping_command field.
	* infrun.c (resume): Check the thread's stepping_command flag to
	determine which threads should be resumed.  Rename 'entry_step'
	local to user_step.
	(clear_proceed_status_thread): Clear 'stepping_command'.
	(schedlock_applies): Change parameter type to struct thread_info
	pointer.  Adjust.
	(find_thread_needs_step_over): Remove 'step' parameter.  Adjust.
	(switch_back_to_stepped_thread): Adjust calls to
	'schedlock_applies'.
	(_initialize_infrun): Adjust "set scheduler-locking step" help.

gdb/testsuite/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.threads/schedlock.exp (test_step): No longer expect that
	"set scheduler-locking step" with "next" over a function call runs
	threads unlocked.

gdb/doc/ChangeLog:
2015-03-24  Pedro Alves  <palves@redhat.com>

	* gdb.texinfo (test_step) <set scheduler-locking step>: No longer
	mention that threads may sometimes run unlocked.
2015-03-24 17:50:31 +00:00
Pedro Alves
8bf3b159e5 gdbserver/Linux: unbreak thread event randomization
Wanting to make sure the new continue-pending-status.exp test tests
both cases of threads 2 and 3 reporting an event, I added counters to
the test, to make it FAIL if events for both threads aren't seen.
Assuming a well behaved backend, and given a reasonable number of
iterations, it should PASS.

However, running that against GNU/Linux gdbserver, I found that
surprisingly, that FAILed.  GDBserver always reported the breakpoint
hit for the same thread.

Turns out that I broke gdbserver's thread event randomization
recently, with git commit 582511be ([gdbserver] linux-low.c: better
starvation avoidance, handle non-stop mode too).  In that commit I
missed that the thread structure also has a status_pending_p field...
The end result was that count_events_callback always returns 0, and
then if no thread is stepping, select_event_lwp always returns the
event thread.  IOW, no randomization is happening at all.  Quite
curious how all the other changes in that patch were sufficient to fix
non-stop-fair-events.exp anyway even with that broken.

Tested on x86_64 Fedora 20, native and gdbserver.

gdb/gdbserver/ChangeLog:
2015-03-19 Pedro Alves  <palves@redhat.com>

	* linux-low.c (count_events_callback, select_event_lwp_callback):
	Use the lwp's status_pending_p field, not the thread's.

gdb/testsuite/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-status.exp (saw_thread_2)
	(saw_thread_3): New globals.
	(top level): Increment them when an event for the corresponding
	thread is seen.
	(no thread starvation): New test.
2015-03-19 12:38:05 +00:00
Pedro Alves
eb54c8bf08 native/Linux: internal error if resume is short-circuited
If the linux_nat_resume's short-circuits the resume because the
current thread has a pending status, and, a thread with a higher
number was previously stopped for a breakpoint, GDB internal errors,
like:

 /home/pedro/gdb/mygit/src/gdb/linux-nat.c:2590: internal-error: status_callback: Assertion `lp->status != 0' failed.

Fix this by make status_callback bail out earlier.  GDBserver is
already doing the same.

New test added that exercises this.

gdb/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (status_callback): Return early if the LWP has no
	status pending.

gdb/testsuite/ChangeLog:
2015-03-19  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-status.c: New file.
	* gdb.threads/continue-pending-status.exp: New file.
2015-03-19 12:26:49 +00:00
Pedro Alves
be9957b82f Fix gdb.threads/thread-specific-bp.exp race
Gary stumbled on this:

 (gdb) PASS: gdb.threads/thread-specific-bp.exp: all-stop: continue to end
 info threads
   Id   Target Id         Frame
 * 1    Thread 0x7ffff7fdb700 (LWP 13717) "thread-specific" end () at /home/gary/work/archer/startswith/src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29
 (gdb) FAIL: gdb.threads/thread-specific-bp.exp: all-stop: thread start is gone
 info breakpoint

The problem is that "...archer/startswith/src..." has a "start" in it,
which matches the too-lax regex in the test.

Rather than tweaking the regex, we can just remove the whole "info
threads", like we removed similar ones in other files -- GDB nowadays
does this implicitly already, so things should work without it.  Thus
removing this even improves testing here a bit.

gdb/testsuite/ChangeLog:
2015-03-04  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-specific-bp.exp: Delete "info threads" test.
2015-03-04 17:23:55 +00:00
Pedro Alves
511aee7c39 gdb.threads/clone-thread_db.c: Add missing includes and fix pthread_join call
This fixes:

> gdb compile failed, /gdb/testsuite/gdb.threads/clone-thread_db.c: In function 'main':
> /gdb/testsuite/gdb.threads/clone-thread_db.c:67:3: warning: implicit declaration of function 'alarm' [-Wimplicit-function-declaration]
>    alarm (300);
>    ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:69:3: warning: implicit declaration of function 'pthread_create' [-Wimplicit-function-declaration]
>    pthread_create (&child, NULL, thread_fn, NULL);
>    ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:70:3: warning: implicit declaration of function 'pthread_join' [-Wimplicit-function-declaration]
>    pthread_join (child);
>    ^

And then adding the missing headers revealed the pthread_join call was
incorrect.  This probably fixes the crash we see on ppc64be, e.g., at

 https://sourceware.org/ml/gdb-testers/2015-q1/msg04415.html

the logs there show:

 ...
 Program received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x3fffb7ff54a0 (LWP 9275)]
 0x00003fffb7f3ce74 in .pthread_join () from /lib64/libpthread.so.0
 (gdb) FAIL: gdb.threads/clone-thread_db.exp: continue to end
 ...

Tested on x86_64 Fedora 20.

gdb/testsuite/
2015-03-04  Pedro Alves  <palves@redhat.com>

	* gdb.threads/clone-thread_db.c: Include unistd.h and pthread.h.
	(main): Pass missing retval argument to pthread_join call.
2015-03-04 09:13:49 +00:00
Pedro Alves
95e50b2723 follow-exec: delete all non-execing threads
This fixes invalid reads Valgrind first caught when debugging against
a GDBserver patched with a series that adds exec events to the remote
protocol.  Like these, using the gdb.threads/thread-execl.exp test:

$ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl  -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on"
...
Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb) n
Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl
[New Thread 32509.32532]
==32510== Invalid read of size 4
==32510==    at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989)
==32510==    by 0x6285D3: delete_thread_breakpoint (thread.c:100)
==32510==    by 0x628603: delete_step_resume_breakpoint (thread.c:109)
==32510==    by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928)
==32510==    by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958)
==32510==    by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969)
==32510==    by 0x616C96: fetch_inferior_event (infrun.c:3267)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==  Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd
==32510==    at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32510==    by 0x77CB74: xfree (common-utils.c:98)
==32510==    by 0x5AA954: delete_breakpoint (breakpoint.c:14056)
==32510==    by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765)
==32510==    by 0x61360F: follow_exec (infrun.c:1091)
==32510==    by 0x6186FA: handle_inferior_event (infrun.c:4061)
==32510==    by 0x616C55: fetch_inferior_event (infrun.c:3261)
==32510==    by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510==    by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510==    by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510==    by 0x4AF6F0: fd_event (ser-base.c:182)
==32510==    by 0x63806D: handle_file_event (event-loop.c:762)
==32510==
[Switching to Thread 32509.32532]

Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29        if (execl (image, image, NULL) == -1)
(gdb)

The breakpoint in question is the step-resume breakpoint of the
non-main thread, the one that was "next"ed.

The exact same issue can be seen on mainline with native debugging, by
running the thread-execl.exp test in non-stop mode, because the kernel
doesn't report a thread exit event for the execing thread.

Tested on x86_64 Fedora 20.

gdb/ChangeLog:
2015-03-02  Pedro Alves  <palves@redhat.com>

	* infrun.c (follow_exec): Delete all threads of the process except
	the event thread.  Extended comments.

gdb/testsuite/ChangeLog:
2015-03-02  Pedro Alves  <palves@redhat.com>

	* gdb.threads/thread-execl.exp (do_test): Handle non-stop.
	(top level): Call do_test with non-stop as well.
2015-03-03 01:25:17 +00:00
Pedro Alves
a47cd6e95a gdb.threads/multi-create-ns-info-thr.exp and native-extended-remote board
The buildbot shows that the new
gdb.threads/multi-create-ns-info-thr.exp test is timing out when
tested with --target=native-extended-remote.  The reason is:

 No breakpoints or watchpoints.
 (gdb) break main
 Breakpoint 1 at 0x10000b00: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 72.
 (gdb) run
 Starting program: /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-cre
 ate-ns-info-thr
 Process /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-create-ns-inf
 o-thr created; pid = 16266
 Unexpected vCont reply in non-stop mode: T0501:00003fffffffd190;40:00000080560fe290;thread:p3f8a.3f8a;core:0;
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 (gdb) break multi-create.c:45
 Breakpoint 2 at 0x10000994: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 45.
 (gdb) commands
 Type commands for breakpoint(s) 2, one per line.

Non-stop tests don't really work with the
--target_board=native-extended-remote board, because tests toggle
non-stop on after GDB is already connected to gdbserver, while
Currently, non-stop must be enabled before connecting.

This adjusts the test to bail if running to main fails, like all other
non-stop tests.

Note non-stop tests do work with --target_board=native-gdbserver.

gdb/testsuite/ChangeLog:
2015-02-21  Pedro Alves  <palves@redhat.com>

	* gdb.threads/multi-create-ns-info-thr.exp: Return early if
	runto_main fails.
2015-02-21 12:03:23 +00:00
Pedro Alves
2db9a4275c GNU/Linux: Stop using libthread_db/td_ta_thr_iter
TL;DR - GDB can hang if something refreshes the thread list out of the
target while the target is running.  GDB hangs inside td_ta_thr_iter.
The fix is to not use that libthread_db function anymore.

Long version:

Running the testsuite against my all-stop-on-top-of-non-stop series is
still exposing latent non-stop bugs.

I was originally seeing this with the multi-create.exp test, back when
we were still using libthread_db thread event breakpoints.  The
all-stop-on-top-of-non-stop series forces a thread list refresh each
time GDB needs to start stepping over a breakpoint (to pause all
threads).  That test hits the thread event breakpoint often, resulting
in a bunch of step-over operations, thus a bunch of thread list
refreshes while some threads in the target are running.

The commit adds a real non-stop mode test that triggers the issue,
based on multi-create.exp, that does an explicit "info threads" when a
breakpoint is hit.  IOW, it does the same things the as-ns series was
doing when testing multi-create.exp.

The bug is a race, so it unfortunately takes several runs for the test
to trigger it.  In fact, even when setting the test running in a loop,
it sometimes takes several minutes for it to trigger for me.

The race is related to libthread_db's td_ta_thr_iter.  This is
libthread_db's entry point for walking the thread list of the
inferior.

Sometimes, when GDB refreshes the thread list from the target,
libthread_db's td_ta_thr_iter can somehow see glibc's thread list as a
cycle, and get stuck in an infinite loop.

The issue is that when a thread exits, its thread control structure in
glibc is moved from a "used" list to a "cache" list.  These lists are
simply circular linked lists where the "next/prev" pointers are
embedded in the thread control structure itself.  The "next" pointer
of the last element of the list points back to the list's sentinel
"head".  There's only one set of "next/prev" pointers for both lists;
thus a thread can only be in one of the lists at a time, not in both
simultaneously.

So when thread C exits, simplifying, the following happens.  A-C are
threads.  stack_used and stack_cache are the list's heads.

Before:

  stack_used -> A -> B -> C -> (&stack_used)
  stack_cache -> (&stack_cache)

After:

  stack_used -> A -> B -> (&stack_used)
  stack_cache -> C -> (&stack_cache)

td_ta_thr_iter starts by iterating at the list's head's next, and
iterates until it sees a thread whose next pointer points to the
list's head again.  Thus in the before case above, C's next points to
stack_used, indicating end of list.  In the same case, the stack_cache
list is empty.

For each thread being iterated, td_ta_thr_iter reads the whole thread
object out of the inferior.  This includes the thread's "next"
pointer.

In the scenario above, it may happen that td_ta_thr_iter is iterating
thread B and has already read B's thread structure just before thread
C exits and its control structure moves to the cached list.

Now, recall that td_ta_thr_iter is running in the context of GDB, and
there's no locking between GDB and the inferior.  From it's local copy
of B, td_ta_thr_iter believes that the next thread after B is thread
C, so it happilly continues iterating to C, a thread that has already
exited, and is now in the stack cache list.

After iterating C, td_ta_thr_iter finds the stack_cache head, which
because it is not stack_used, td_ta_thr_iter assumes it's just another
thread.  After this, unless the reverse race triggers, GDB gets stuck
in td_ta_thr_iter forever walking the stack_cache list, as no thread
in thatlist has a next pointer that points back to stack_used (the
terminating condition).

Before fully understanding the issue, I tried adding cycle detection
to GDB's td_ta_thr_iter callback.  However, td_ta_thr_iter skips
calling the callback in some cases, which means that it's possible
that the callback isn't called at all, making it impossible for GDB to
break the loop.  I did manage to get GDB stuck in that state more than
once.

Fortunately, we can avoid the issue altogether.  We don't really need
td_ta_thr_iter for live debugging nowadays, given PTRACE_EVENT_CLONE.
We already know how to map and lwp id to a thread id without iterating
(thread_from_lwp), so use that more.

gdb/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (linux_handle_extended_wait): Call
	thread_db_notice_clone whenever a new clone LWP is detected.
	(linux_stop_and_wait_all_lwps, linux_unstop_all_lwps): New
	functions.
	* linux-nat.h (thread_db_attach_lwp): Delete declaration.
	(thread_db_notice_clone, linux_stop_and_wait_all_lwps)
	(linux_unstop_all_lwps): Declare.
	* linux-thread-db.c (struct thread_get_info_inout): Delete.
	(thread_get_info_callback): Delete.
	(thread_from_lwp): Use td_thr_get_info and record_thread.
	(thread_db_attach_lwp): Delete.
	(thread_db_notice_clone): New function.
	(try_thread_db_load_1): If /proc is mounted and shows the
	process'es task list, walk over all LWPs and call thread_from_lwp
	instead of relying on td_ta_thr_iter.
	(attach_thread): Don't call check_thread_signals here.  Split the
	tail part of the function (which adds the thread to the core GDB
	thread list) to ...
	(record_thread): ... this function.  Call check_thread_signals
	here.
	(thread_db_wait): Don't call thread_db_find_new_threads_1.  Always
	call thread_from_lwp.
	(thread_db_update_thread_list): Rename to ...
	(thread_db_update_thread_list_org): ... this.
	(thread_db_update_thread_list): New function.
	(thread_db_find_thread_from_tid): Delete.
	(thread_db_get_ada_task_ptid): Simplify.
	* nat/linux-procfs.c: Include <sys/stat.h>.
	(linux_proc_task_list_dir_exists): New function.
	* nat/linux-procfs.h (linux_proc_task_list_dir_exists): Declare.

gdb/gdbserver/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* thread-db.c: Include "nat/linux-procfs.h".
	(thread_db_init): Skip listing new threads if the kernel supports
	PTRACE_EVENT_CLONE and /proc/PID/task/ is accessible.

gdb/testsuite/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	* gdb.threads/multi-create-ns-info-thr.exp: New file.
2015-02-20 21:40:31 +00:00
Pedro Alves
5c5019c27c PR18006: internal error if threaded program calls clone(CLONE_VM)
On GNU/Linux, if a pthreaded program has a thread call clone(CLONE_VM)
directly, and then that clone LWP hits a debug event (breakpoint,
etc.) GDB internal errors.  Threaded programs shouldn't really be
calling clone directly, but GDB shouldn't crash either.

The crash looks like this:

 (gdb) break clone_fn
 Breakpoint 2 at 0x4007d8: file clone-thread_db.c, line 35.
 (gdb) r
 ...
 [Thread debugging using libthread_db enabled]
 ...
 src/gdb/linux-nat.c:1030: internal-error: lin_lwp_attach_lwp: Assertion `lwpid > 0' failed.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.

The problem is that 'clone' ends up clearing the parent thread's tid
field in glibc's thread data structure.  For x86_64, the glibc code in
question is here:

  sysdeps/unix/sysv/linux/x86_64/clone.S:

   ...
          testq   $CLONE_THREAD, %rdi
          jne     1f
          testq   $CLONE_VM, %rdi
          movl    $-1, %eax            <----
          jne     2f
          movl    $SYS_ify(getpid), %eax
          syscall
  2:      movl    %eax, %fs:PID
          movl    %eax, %fs:TID        <----
  1:

When GDB refreshes the thread list out of libthread_db, it finds a
thread with LWP with pid -1 (the clone's parent), which naturally
isn't yet on the thread list.  GDB then tries to attach to that bogus
LWP id, which is caught by that assertion.

The fix is to detect the bad PID early.

Tested on x86-64 Fedora 20.  GDBserver doesn't need any fix.

gdb/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	PR threads/18006
	* linux-thread-db.c (thread_get_info_callback): Return early if
	the thread's lwp id is -1.

gdb/testsuite/ChangeLog:
2015-02-20  Pedro Alves  <palves@redhat.com>

	PR threads/18006
	* gdb.threads/clone-thread_db.c: New file.
	* gdb.threads/clone-thread_db.exp: New file.
2015-02-20 19:00:21 +00:00
Pedro Alves
0703599a49 Fix adjust_pc_after_break, remove still current thread check
On decr_pc_after_break targets, GDB adjusts the PC incorrectly if a
background single-step stops somewhere where PC-$decr_pc has a
breakpoint, and the thread that finishes the step is not the current
thread, like:

   ADDR1 nop <-- breakpoint here
   ADDR2 jmp PC

IOW, say thread A is stepping ADDR2's line in the background (an
infinite loop), and the user switches focus to thread B.  GDB's
adjust_pc_after_break logic confuses the single-step stop of thread A
for a hit of the breakpoint at ADDR1, and thus adjusts thread A's PC
to point at ADDR1 when it should not, and reports a breakpoint hit,
when thread A did not execute the instruction at ADDR1 at all.

The test added by this patch exercises exactly that.

I can't find any reason we'd need the "thread to be examined is still
the current thread" condition in adjust_pc_after_break, at least
nowadays; it might have made sense in the past.  Best just remove it,
and rely on currently_stepping().

Here's the test's log of a run with an unpatched GDB:

 35        while (1);
 (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next over nop
 next&
 (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next& over inf loop
 thread 1
 [Switching to thread 1 (Thread 0x7ffff7fc2740 (LWP 29027))](running)
 (gdb)
 PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: switch to main thread
 Breakpoint 2, thread_function (arg=0x0) at ...src/gdb/testsuite/gdb.threads/step-bg-decr-pc-switch-thread.c:34
 34        NOP; /* set breakpoint here */
 FAIL: gdb.threads/step-bg-decr-pc-switch-thread.exp: no output while stepping

gdb/ChangeLog:
2015-02-11  Pedro Alves  <pedro@codesourcery.com>

	* infrun.c (adjust_pc_after_break): Don't adjust the PC just
	because the event thread is not the current thread.

gdb/testsuite/ChangeLog:
2015-02-11  Pedro Alves  <pedro@codesourcery.com>

	* gdb.threads/step-bg-decr-pc-switch-thread.c: New file.
	* gdb.threads/step-bg-decr-pc-switch-thread.exp: New file.
2015-02-11 09:45:41 +00:00
Pedro Alves
01b088bc51 Add "signal SIGTRAP" test
Some local changes I was working on related to SIGTRAP handling
resulted in "signal SIGTRAP" no longer passing the SIGTRAP to the
inferior.

Surprisingly, only annota1.exp catches this.  This commit adds a test
that doesn't rely on annotations, so that at the point annotations are
finaly dropped, we still have this use case covered ...

This is a multi-threaded test to also exercise the case of first
needing to do a step-over before delivering the signal.

Tested on x86_64 Fedora 20, native, remote/extended-remote gdbserver.

gdb/testsuite/
2015-02-10  Pedro Alves  <palves@redhat.com>

	* gdb.threads/signal-sigtrap.c: New file.
	* gdb.threads/signal-sigtrap.exp: New file.
2015-02-10 19:30:55 +00:00
Pedro Alves
e584fdbc6a Improve gdb.threads/attach-many-short-lived-threads.exp timeout handling
The buildbot shows that this test is still racy, and occasionally
fails with time outs on some machines.  I'd like to get major issues
with load out of the way.

The test currently exits after 180s, which is just a random number,
that has no relation to what the .exp file considers a time out.  This
commit makes the program wait a bit longer than what the .exp file
considers a time out, and, resets the timer for each iteration.

Tested on x86_64 Fedora 20, native and extended-remote gdbserver.

gdb/testsuite/
2015-02-06  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c (SECONDS): New
	macro.
	(seconds_left, again): New globals.
	(main): Wait seconds_left in a 1-second sleep loop instead of
	sleeping 180 seconds.  If 'again' is set, reset the seconds
	counter.
	* gdb.threads/attach-many-short-lived-threads.exp (test): Set
	'again' in the inferior before detaching.  Print the seconds left.
	(options): New global.
	(top level): Build program with	-DTIMEOUT=$timeout.
2015-02-06 13:24:32 +01:00
Mark Wielaard
37bc665e4e Remove testsuite compile errors with GCC5.
GCC5 defaults to the GNU11 standard for C and warns by default for
implicit function declarations and implicit return types.
https://gcc.gnu.org/gcc-5/porting_to.html

Fixing these issues in the testsuite turns 9 untested and 17 unsupported
testcases into 417 new passes when compiling with GCC5.

gdb/testsuite/ChangeLog:

        * gdb.arch/i386-bp_permanent.c (standard): New declaration.
        * gdb.base/disp-step-fork.c: Include unistd.h.
        * gdb.base/siginfo-obj.c: Include stdio.h.
        * gdb.base/siginfo-thread.c: Likewise.
        * gdb.mi/non-stop.c: Include unistd.h.
        * gdb.mi/nsthrexec.c: Include stdio.h.
        * gdb.mi/pthreads.c: Include unistd.h.
        * gdb.modula2/unbounded1.c (main): Declare returns int.
        * gdb.reverse/consecutive-reverse.c: Likewise.
        * gdb.threads/create-fail.c: Include unistd.h.
        * gdb.threads/killed.c: Likewise.
        * gdb.threads/linux-dp.c: Likewise.
        * gdb.threads/non-ldr-exc-1.c: Include stdio.h and string.h.
        * gdb.threads/non-ldr-exc-2.c: Likewise.
        * gdb.threads/non-ldr-exc-3.c: Likewise.
        * gdb.threads/non-ldr-exc-4.c: Likewise.
        * gdb.threads/pthreads.c: Include unistd.h.
        (main): Declare returns int.
        * gdb.threads/tls-main.c (foo): New declaration.
        * gdb.threads/watchpoint-fork-mt.c: Define _GNU_SOURCE.
2015-01-25 18:50:56 +01:00
Pedro Alves
198297aafb Linux: make target_is_async_p return false when async is off
linux_nat_is_async_p currently always returns true, even when the
target is _not_ async.  That confuses
gdb_readline_wrapper/gdb_readline_wrapper_cleanup, which
force-disables target-async while the secondary prompt is active.  As
a result, when gdb_readline_wrapper returns, the target is left async,
even through it was sync to begin with.

That can result in weird bugs, like the one the test added by this
commit exposes.

Ref: https://sourceware.org/ml/gdb-patches/2015-01/msg00592.html

gdb/ChangeLog:
2015-01-23  Pedro Alves  <palves@redhat.com>

	* linux-nat.c (linux_is_async_p): New macro.
	(linux_nat_is_async_p):
	(linux_nat_terminal_inferior): Check whether the target can async
	instead of whether it is already async.
	(linux_nat_terminal_ours): Don't check whether the target is
	async.
	(linux_async_pipe): Use linux_is_async_p.

gdb/testsuite/ChangeLog:
2015-01-23  Pedro Alves  <palves@redhat.com>

	* gdb.threads/continue-pending-after-query.c: New file.
	* gdb.threads/continue-pending-after-query.exp: New file.
2015-01-23 11:12:39 +00:00
Pedro Alves
ede9f622af add non-stop test that stresses thread starvation issues
This commit adds a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, that exposes the
thread starvation issues fixed by the previous patches.  It sets a set
of threads stepping in parallel, and has one of them get a signal.
Without the previous fixes, this would fail with timeouts.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/non-stop-fair-events.c: New file.
	* gdb.threads/non-stop-fair-events.exp: New file.
2015-01-09 14:44:42 +00:00
Pedro Alves
9665ffdd59 gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB
These three test all spawn a few threads and then send a SIGSTOP to
their parent GDB in order to pause it while the new threads set things
up for the test.  With a GDB patch that changes the inferior thread's
scheduling a bit, I sometimes see:

  FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
  ...
  FAIL: gdb.threads/watchthreads-reorder.exp: reorder1: continue a (timeout)
  ...
  FAIL: gdb.threads/ia64-sigill.exp: continue (timeout)
  ...

The issue is that the test program stops GDB before it had a chance of
processing the new thread's clone event:

  (gdb) PASS: gdb.threads/siginfo-threads.exp: get pid
  continue
  Continuing.
  Stopping GDB PID 21541.
  Waiting till the threads initialize their TIDs.
  FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)

On Linux (at least), new threads start stopped, and the debugger must
resume them.  The fix is to make the test program wait for the new
threads to be running before stopping GDB.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/ia64-sigill.c (threads_started_barrier): New global.
	(thread_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
	* gdb.threads/siginfo-threads.c (threads_started_barrier): New
	global.
	(thread1_func, thread2_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
	* gdb.threads/watchthreads-reorder.c (threads_started_barrier):
	New global.
	(thread1_func, thread2_func): Wait on barrier.
	(main): Wait for all threads to start before stopping GDB.
2015-01-09 13:58:29 +00:00
Pedro Alves
c945a99f01 Test attaching to a program that constantly spawns short-lived threads
Before the previous fixes, on Linux, this would trigger several
different problems, like:

 [New LWP 27106]
 [New LWP 27047]
 warning: unable to open /proc file '/proc/-1/status'
 [New LWP 27813]
 [New LWP 27869]
 warning: Can't attach LWP 11962: No child processes
 Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed
 warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

gdb/testsuite/
2015-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/attach-many-short-lived-threads.c: New file.
	* gdb.threads/attach-many-short-lived-threads.exp: New file.
2015-01-09 11:44:04 +00:00
Pedro Alves
c1a747c109 Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported
[A test I wrote stumbled on a libthread_db issue related to thread
event breakpoints.  See glibc PR17705:
 [nptl_db: stale thread create/death events if debugger detaches]
 https://sourceware.org/bugzilla/show_bug.cgi?id=17705

This patch avoids that whole issue by making GDB stop using thread
event breakpoints in the first place, which is good for other reasons
as well, anyway.]

Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
threads in the inferior (to attach to them) or to learn about thread
exit was to coordinate with the inferior's glibc/runtime, using
libthread_db.  That works by putting a breakpoint at a magic address
which is called when a new thread is spawned, or when a thread is
about to exit.  When that breakpoint is hit, all threads are stopped,
and then GDB coordinates with libthread_db to read data structures out
of the inferior to learn about what happened.  Then the breakpoint is
single-stepped, and then all threads are re-resumed.  This isn't very
efficient (stops all threads) and is more fragile (inferior's thread
list in memory may be corrupt; libthread_db bugs, etc.) than ideal.

When the kernel supports PTRACE_EVENT_CLONE (which we already make use
of), there's really no need to use libthread_db's event reporting
mechanism to learn about new LWPs.  And if the kernel supports that,
then we learn about LWP exits through regular WIFEXITED wait statuses,
so no need for the death event breakpoint either.

GDBserver has been likewise skipping the thread_db events for a long
while:
  https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html

There's one user-visible difference: we'll no longer print about
threads being created and exiting while the program is running, like:

 [Thread 0x7ffff7dbb700 (LWP 30670) exited]
 [New Thread 0x7ffff7db3700 (LWP 30671)]
 [Thread 0x7ffff7dd3700 (LWP 30667) exited]
 [New Thread 0x7ffff7dab700 (LWP 30672)]
 [Thread 0x7ffff7db3700 (LWP 30671) exited]
 [Thread 0x7ffff7dcb700 (LWP 30668) exited]

This is exactly the same behavior as when debugging against remote
targets / gdbserver.  I actually think that's a good thing (and as
such have listed this in the local/remote parity wiki page a while
ago), as the printing slows down the inferior.  It's also a
distraction to keep bothering the user about short-lived threads that
she won't be able to interact with anyway.  Instead, the user (and
frontend) will be informed about new threads that currently exist in
the program when the program next stops:

 (gdb) c
 ...
 * ctrl-c *
 [New Thread 0x7ffff7963700 (LWP 7797)]
 [New Thread 0x7ffff796b700 (LWP 7796)]

 Program received signal SIGINT, Interrupt.
 [Switching to Thread 0x7ffff796b700 (LWP 7796)]
 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
 81              testq   %rax,%rax
 (gdb) info threads

A couple of tests had assumptions on GDB thread numbers that no longer
hold.

Tested on x86_64 Fedora 20.

gdb/
2014-01-09  Pedro Alves  <palves@redhat.com>

	Skip enabling event reporting if the kernel supports
	PTRACE_EVENT_CLONE.
	* linux-thread-db.c: Include "nat/linux-ptrace.h".
	(thread_db_use_events): New function.
	(try_thread_db_load_1): Check thread_db_use_events before enabling
	event reporting.
	(update_thread_state): New function.
	(attach_thread): Use it.  Check thread_db_use_events before
	enabling event reporting.
	(thread_db_detach): Check thread_db_use_events before disabling
	event reporting.
	(find_new_threads_callback): Check thread_db_use_events before
	enabling event reporting.  Update the thread's state if not using
	libthread_db events.

gdb/testsuite/
2014-01-09  Pedro Alves  <palves@redhat.com>

	* gdb.threads/fork-thread-pending.exp: Switch to the main thread
	instead of to thread 2.
	* gdb.threads/signal-command-multiple-signals-pending.c (main):
	Add barrier around each pthread_create call instead of around all
	calls.
	* gdb.threads/signal-command-multiple-signals-pending.exp (test):
	Set a break on thread_function and have the child threads hit it
	one at at a time.
2015-01-09 11:42:57 +00:00