8sa1-binutils-gdb

Author	SHA1	Message	Date
Yao Qi	4081c0f122	Simplify gdb.threads/wp-replication.exp on counting HW watchpoints Nowadays, test gdb.threads/wp-replication.exp uses a while loop to repeatedly insert HW watchpoint, resume and check no error message coming out, in order to count HW watchpoints There are some drawbacks in this way, - the loop could be endless. I think this is use to making trouble to S/390, since we had such comment # Some targets (like S/390) behave as though supporting # unlimited hardware watchpoints. In this case we just take a # safe exit out of the loop. I hit this today too because a GDB internal error is triggered on "continue" in the loop, and $done is 0 invariantly, so the loop can't end. - the code counting hardware watchpoint is too complicated. We can use "set breakpoint always-inserted on" to get the result of inserting HW watchpoint without resuming the inferior. In this way, watch_count_done and empty_cycle in c file is no longer needed. In this patch, I change to use "set breakpoint always-inserted on" trick, and only iterate $NR_THREADS times, to count the HW watchpoint. In this way, the loop can't be endless, and GDB doesn't need to resume the inferior. gdb/testsuite: 2015-10-30 Yao Qi <yao.qi@linaro.org> * gdb.threads/wp-replication.c (watch_count_done): Remove. (empty_cycle): Remove. (main): Don't call empty_cycle. Don't use watch_count_done. * gdb.threads/wp-replication.exp: Don't set breakpoint on empty_cycle. Rewrite the code counting HW watchpoints.	2015-10-30 15:54:58 +00:00
Pedro Alves	1ed415e2b9	non-stop-fair-events.exp slower on software single-step && !displ-step targets On software single-step targets that don't support displaced stepping, threads keep hitting each other's single-step breakpoints, and then GDB needs to pause all threads to step past those. The end result is that progress in the main thread will be slower and it may take a bit longer for the signal to be queued. This patch bumps the timeout on such targets. gdb/testsuite/ChangeLog: 2015-09-16 Pedro Alves <palves@redhat.com> Sandra Loosemore <sandra@codesourcery.com> * gdb.threads/non-stop-fair-events.c (timeout): New global. (SECONDS): Redefine. (main): Call pthread_kill and alarm early. * gdb.threads/non-stop-fair-events.exp: Probe displaced stepping support. (test): If the target can't hardware step and doesn't support displaced stepping, increase the timeout.	2015-09-16 15:51:36 +01:00
Pedro Alves	d136eff549	Make it easier to debug non-stop-fair-events.exp If we enable infrun debug running this test, it quickly fails with a full expect buffer. That can be simply handled with a couple exp_continues. As it's annoying to hack this every time we need to debug the test, this patch adds bits to enable debugging support easily, with a one-line change. And then, if any iteration of the test fails, we end up with a long cascade of time outs. Just bail out when we see the first fail. gdb/testsuite/ 2015-09-16 Pedro Alves <palves@redhat.com> * gdb.threads/non-stop-fair-events.exp (gdb_test_no_anchor) (enable_debug): New procedures. (test): Use them. Bail out if waiting for threads fails. (top level): Bail out if a test fails.	2015-09-16 15:51:36 +01:00
Philippe Waroquiers	5382cfab61	Fix PR/18564 - regression in showing __thread so extern variable Ensure tls variable address is not relocated, as the msym addr is an offset in the thread local storage of the shared library/object.	2015-09-15 21:12:39 +02:00
Pedro Alves	d15dcecdee	Fix gdb.threads/non-ldr-exc-3.exp race gdb.threads/non-ldr-exc-3.exp is sometimes failing like this: [Switching to Thread 6831.6832] Breakpoint 2, thread_execler (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/non-ldr-exc-3.c:41 41 if (execl (image, image, argv1, NULL) == -1) /* break-here / PASS: gdb.threads/non-ldr-exc-3.exp: lock-sched=on,non-stop=off: continue to breakpoint (gdb) set scheduler-locking on (gdb) FAIL: gdb.threads/non-ldr-exc-3.exp: lock-sched=on,non-stop=off: set scheduler-locking on The problem is that the gdb_test_multiple is missing the prompt anchor. The problem was introduced by `2fd33e9448`. This reverts the hunk that introduced the problem, reverting back to gdb_continue_to_breakpoint. gdb/testsuite/ChangeLog: 2015-09-15 Pedro Alves <palves@redhat.com> gdb.threads/non-ldr-exc-3.exp (do_test): Use gdb_continue_to_breakpoint instead of gdb_test_multiple.	2015-09-15 17:01:59 +01:00
Don Breazeal	2fd33e9448	Extended-remote exec test This patch updates several exec-related tests and some of the library functions in order to get them running with extended-remote. There were three changes that were required, as follows: In gdb.base/foll-exec.exp, use 'clean_start' in place of proc 'zap_session' to reset the state of the debugger between tests. This sets 'remote exec-file' to execute the correct binary file in each subsequent test. In gdb.base/pie-execl.exp, there is an expect statement with an expression that is used to match output from both gdb and the program under debug. For the remote target, this had to be split into two expressions, using $inferior_spawn_id to match the output from the program. Because I had encountered problems with extended-remote exec events in non-stop mode in my manual testing, I added non-stop testing to the non-ldr-exc-[1234].exp tests. In order to set non-stop mode for remote targets, it is necessary to 'set non-stop on' after gdb has started, but before it connects to gdbserver. This is done using 'save_vars' to set non-stop mode in GDBFLAGS, so GDB sets non-stop mode on startup. gdb/testsuite/ChangeLog: * gdb.base/foll-exec.c: Add copyright header. Fix formatting issues. * gdb.base/foll-exec.exp (zap_session): Delete proc. (do_exec_tests): Use clean_restart in place of zap_session, and for test initialization. Fix formatting issues. Use fail in place of perror. * gdb.base/pie-execl.exp (main): Use 'inferior_spawn_id' in an expect statement to match an expression with output from the program under debug. * gdb.threads/non-ldr-exc-1.exp (do_test, main): Add non-stop tests and pass stop mode argument to clean_restart. Use save_vars to enable non-stop in GDBFLAGS. * gdb.threads/non-ldr-exc-2.exp: Likewise. * gdb.threads/non-ldr-exc-3.exp: Likewise. * gdb.threads/non-ldr-exc-4.exp: Likewise.	2015-09-11 11:12:46 -07:00
Sandra Loosemore	c0fa8fbd1c	Improve hand-call-in-threads.exp failure handling. 2015-09-08 Sandra Loosemore <sandra@codesourcery.com> gdb/testsuite/ * gdb.threads/hand-call-in-threads.exp: Make sure the thread command actually switches threads. Give up on remaining tests if target fails to stop at breakpoint.	2015-09-08 19:49:04 -07:00
Pedro Alves	d4569d7bc5	Fix step-over-{trips-on-watchpoint\|lands-on-breakpoint}.exp race On a target that is both always in non-stop mode and can do displaced stepping (such as native x86_64 GNU/Linux, with "maint set target-non-stop on"), the step-over-trips-on-watchpoint.exp test sometimes fails like this: (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: thread 1 set scheduler-locking off (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off step -[Switching to Thread 0x7ffff7fc0700 (LWP 11782)] -Hardware watchpoint 4: watch_me - -Old value = 0 -New value = 1 -child_function (arg=0x0) at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:39 -39 other = 1; /* set thread-specific breakpoint here / -(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step +wait_threads () at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49 +49 return 1; / in wait_threads / +(gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step Note "scheduler-locking" was set off. The problem is that on such targets, the step-over of thread 2 and the "step" of thread 1 can be set to run simultaneously (since with displaced stepping the breakpoint isn't ever removed from the target), and sometimes, the "step" of thread 1 finishes first, so it'd take another resume to see the watchpoint trigger. Fix this by replacing the wait_threads function with a one-line infinite loop that doesn't call any function, so that the "step" of thread 1 never finishes. gdb/testsuite/ChangeLog: 2015-08-07 Pedro Alves <palves@redhat.com> gdb.threads/step-over-lands-on-breakpoint.c (wait_threads): Delete function. (main): Add alarm. Run an infinite loop instead of calling wait_threads. * gdb.threads/step-over-lands-on-breakpoint.exp (do_test): Change comment. * gdb.threads/step-over-trips-on-watchpoint.c (wait_threads): Delete function. (main): Add alarm. Run an infinite loop instead of calling wait_threads. * gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Change comment.	2015-08-07 17:26:21 +01:00
Pedro Alves	d55007b583	Fix signal-while-stepping-over-bp-other-thread.exp on targets always in non-stop With "maint set target-non-stop on" we get: -PASS: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step +FAIL: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step The issue is simply that switch_back_to_stepped_thread is not used in non-stop mode, thus infrun doesn't output the expected "switching back to stepped thread" log. gdb/testsuite/ChangeLog: 2015-08-07 Pedro Alves <palves@redhat.com> * signal-while-stepping-over-bp-other-thread.exp: Expect "restart threads" as alternative to "switching back to stepped thread".	2015-08-07 17:26:20 +01:00
Pedro Alves	f6a9d9c7db	Revert "test slowdown" That was pushed by mistake.	2015-08-06 12:45:45 +01:00
Pedro Alves	83e97ed023	Test for PR18749: problems if whole process dies while (ptrace-) stopped This adds a kfailed test that has the whole process exit just while several threads continuously step over a breakpoint. Usually, the process exits just while GDB or GDBserver is handling the breakpoint hit. In other words, the process disappears while the event thread is (ptrace-) stopped. This exposes several issues in GDB and GDBserver. Errors, crashes, etc. I fixed some of these issues recently, but there's a lot more to do. It's a bit like playing whack-a-mole at the moment. You fix an issue, which then exposes several others. E.g., with the native target, you get (among other errors): (...) [New Thread 0x7ffff47b9700 (LWP 18077)] [New Thread 0x7ffff3fb8700 (LWP 18078)] [New Thread 0x7ffff37b7700 (LWP 18079)] Cannot find user-level thread for LWP 18076: generic error (gdb) KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749) gdb/testsuite/ChangeLog: 2015-08-06 Pedro Alves <palves@redhat.com> PR gdb/18749 * gdb.threads/process-dies-while-handling-bp.c: New file. * gdb.threads/process-dies-while-handling-bp.exp: New file.	2015-08-06 12:33:20 +01:00
Pedro Alves	4807d3f329	test slowdown	2015-08-06 12:33:19 +01:00
Pedro Alves	863d01bde2	gdbserver: Fix non-stop / fork / step-over issues Ref: https://sourceware.org/ml/gdb-patches/2015-07/msg00868.html This adds a test that has a multithreaded program have several threads continuously fork, while another thread continuously steps over a breakpoint. This exposes several intertwined issues, which this patch addresses: - When we're stopping and suspending threads, some thread may fork, and we missed setting its suspend count to 1, like we do when a new clone/thread is detected. When we next unsuspend threads, the fork child's suspend count goes below 0, which is bogus and fails an assertion. - If a step-over is cancelled because a signal arrives, but then gdb is not interested in the signal, we pass the signal straight back to the inferior. However, we miss that we need to re-increment the suspend counts of all other threads that had been paused for the step-over. As a result, other threads indefinitely end up stuck stopped. - If a detach request comes in just while gdbserver is handling a step-over (in the test at hand, this is GDB detaching the fork child), gdbserver internal errors in stabilize_thread's helpers, which assert that all thread's suspend counts are 0 (otherwise we wouldn't be able to move threads out of the jump pads). The suspend counts aren't 0 while a step-over is in progress, because all threads but the one stepping past the breakpoint must remain paused until the step-over finishes and the breakpoint can be reinserted. - Occasionally, we see "BAD - reinserting but not stepping." being output (from within linux_resume_one_lwp_throw). That was because GDB pokes memory while gdbserver is busy with a step-over, and that suspends threads, and then re-resumes them with proceed_one_lwp, which missed another reason to tell linux_resume_one_lwp that the thread should be set back to stepping. - In a couple places, we were resuming threads that are meant to be suspended. E.g., when a vCont;c/s request for thread B comes in just while gdbserver is stepping thread A past a breakpoint. The resume for thread B must be deferred until the step-over finishes. - The test runs with both "set detach-on-fork" on and off. When off, it exercises the case of GDB detaching the fork child explicitly. When on, it exercises the case of gdb resuming the child explicitly. In the "off" case, gdb seems to exponentially become slower as new inferiors are created. This is _very_ noticeable as with only 100 inferiors gdb is crawling already, which makes the test take quite a bit to run. For that reason, I've disabled the "off" variant for now. gdb/ChangeLog: 2015-08-06 Pedro Alves <palves@redhat.com> * target/waitstatus.h (enum target_stop_reason) <TARGET_STOPPED_BY_SINGLE_STEP>: New value. gdb/gdbserver/ChangeLog: 2015-08-06 Pedro Alves <palves@redhat.com> * linux-low.c (handle_extended_wait): Set the fork child's suspend count if stopping and suspending threads. (check_stopped_by_breakpoint): If stopped by trace, set the LWP's stop reason to TARGET_STOPPED_BY_SINGLE_STEP. (linux_detach): Complete an ongoing step-over. (lwp_suspended_inc, lwp_suspended_decr): New functions. Use throughout. (resume_stopped_resumed_lwps): Don't resume a suspended thread. (linux_wait_1): If passing a signal to the inferior after finishing a step-over, unsuspend and re-resume all lwps. If we see a single-step event but the thread should be continuing, don't pass the trap to gdb. (stuck_in_jump_pad_callback, move_out_of_jump_pad_callback): Use internal_error instead of gdb_assert. (enqueue_pending_signal): New function. (check_ptrace_stopped_lwp_gone): Add debug output. (start_step_over): Use internal_error instead of gdb_assert. (complete_ongoing_step_over): New function. (linux_resume_one_thread): Don't resume a suspended thread. (proceed_one_lwp): If the LWP is stepping over a breakpoint, reset it stepping. gdb/testsuite/ChangeLog: 2015-08-06 Pedro Alves <palves@redhat.com> * gdb.threads/forking-threads-plus-breakpoint.exp: New file. * gdb.threads/forking-threads-plus-breakpoint.c: New file.	2015-08-06 10:30:18 +01:00
Pedro Alves	0a39bb3218	stepping is disturbed by setjmp/longjmp \| try/catch in other threads At https://sourceware.org/ml/gdb-patches/2015-08/msg00097.html, Joel observed that trying to next/step a program on GNU/Linux sometimes results in the following failed assertion: % gdb -q .obj/gprof/main (gdb) start (gdb) n (gdb) step [...]/infrun.c:2391: internal-error: resume: Assertion `sig != GDB_SIGNAL_0' failed. What happened is that, during the "next" operation, GDB hit a longjmp/exception/step-resume breakpoint but failed to see that this breakpoint was set for a different thread than the one being stepped. Joel's detailed analysis follows: More precisely, at the end of the "start" command, we are stopped at the start of function Main in main.adb; there are 4 threads in total, and we are in the main thread (which is thread 1): (gdb) info thread Id Target Id Frame 4 Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall () 3 Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall () 2 Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall () * 1 Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57 All the logs below reference Thread ID/LWP, but it'll be easier to talk about the threads by GDB thread number. For instance, thread 1 is LWP 28370 while thread 3 is LWP 28378. So, the explanations below translate the LWPs into thread numbers. Back to what happens while we are trying to "next' our program: (gdb) n infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379)) infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378)) infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377)) infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370)) infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT) infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e infrun: target_wait (-1.0.0, status) = infrun: 28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x8054523 We've resumed thread 1 (LWP 28370), and received in return a signal that the same thread stopped slightly further. It's still in the range of instructions for the line of source we started the "next" from, as evidenced by the following trace... infrun: stepping inside range [0x805451e-0x8054531] ... and thus, we decide to continue stepping the same thread: infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523 infrun: prepare_to_wait That's when we get an event from a different thread (thread 3)... infrun: target_wait (-1.0.0, status) = infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x80782d0 infrun: context switch infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378) ... which we find to be at the address where we set a breakpoint on "the unwinder debug hook" (namely "_Unwind_DebugHook"). But GDB fails to notice that the breakpoint was inserted for thread 1 only, and so decides to handle it as... infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME ... and inserts a breakpoint at the corresponding resume address, as evidenced by this the next log: infrun: exception resume at 80542a2 That breakpoint seems innocent right now, but will play a role fairly quickly. But for now, GDB has inserted the exception-resume breakpoint, and needs to single-step thread 3 past the breakpoint it just hit. Thus, it temporarily disables the exception breakpoint, and requests a step of that thread: infrun: skipping breakpoint: stepping past insn at: 0x80782d0 infrun: skipping breakpoint: stepping past insn at: 0x80782d0 infrun: skipping breakpoint: stepping past insn at: 0x80782d0 infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0 infrun: prepare_to_wait We then get a notification, still from thread 3, that it's now past that breakpoint... infrun: prepare_to_wait infrun: target_wait (-1.0.0, status) = infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x8078424 ... so we can resume what we were doing before, which is single-stepping thread 1 until we get to a new line of code: infrun: switching back to stepped thread infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370) infrun: expected thread still hasn't advanced infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523 The "resume" log above shows that we're resuming thread 1 from where we left off (0x8054523). We get one more stop at 0x8054529, which is still inside our stepping range so we go again. That's when we get the following event, from thread 3: infrun: prepare_to_wait infrun: target_wait (-1.0.0, status) = infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)], infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP infrun: TARGET_WAITKIND_STOPPED infrun: stop_pc = 0x80542a2 Now the stop_pc address is interesting, because it's the address of "exception resume" breakpoint... infrun: context switch infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378) infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME ... and since that location is at a different line of code, this is where it decides the "next" operation should stop: infrun: stop_waiting [Switching to Thread 0xb7c5aba0 (LWP 28378)] 0x080542a2 in inte_tache_rt.ttache_rt ( <_task>=0x80968ec <inte_tache_rt_inst.tache2>) at /[...]/inte_tache_rt.adb:54 54 end loop; However, what GDB should have noticed earlier that the exception breakpoint we hit was for a different thread, thus should have single-stepped that thread out of the breakpoint _without_ inserting the exception-return breakpoint, and then resumed the single-stepping of the initial thread (thread 1) until that thread stepped out of its stepping range. This is what this patch does, and after applying it, GDB now correctly stops on the next line of code. The patch adds a C++ test that exercises this, both for setjmp/longjmp and exception breakpoints. With an unpatched GDB it shows: (gdb) next [Switching to Thread 22445.22455] thread_try_catch (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/next-other-thr-longjmp.c:59 59 catch (...) (gdb) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 1 next /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:4865: internal-error: process_event_stop_test: Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' fa iled. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 2 (GDB internal error) Resyncing due to internal error. n Tested on x86_64-linux, no regressions. gdb/ChangeLog: 2015-08-05 Pedro Alves <palves@redhat.com> Joel Brobecker <brobecker@adacore.com> * breakpoint.c (bpstat_what) <bp_longjmp, bp_longjmp_call_dummy> <bp_exception, bp_longjmp_resume, bp_exception_resume>: Handle the case where BS->STOP is not set. gdb/testsuite/ChangeLog: 2015-08-05 Pedro Alves <palves@redhat.com> * gdb.threads/next-while-other-thread-longjmps.c: New file. * gdb.threads/next-while-other-thread-longjmps.exp: New file.	2015-08-05 20:01:42 +01:00
Pedro Alves	2c8c5d375e	testsuite: tcl exec& -> 'kill -9 $pid' is racy (attach-many-short-lived-thread.exp races and others) The buildbots show that attach-many-short-lived-thread.exp is racy. But after staring at debug logs and playing with SystemTap scripts for a (long) while, I figured out that neither GDB, nor the kernel nor the test's program itself are at fault. The problem is simply that the testsuite machinery is currently subject to PID-reuse races. The attach-many-short-lived-threads.c test program just happens to be much more susceptible to trigger this race because threads and processes share the same number space on Linux, and the test spawns many many short lived threads in succession, thus enlarging the race window a lot. Part of the problem is that several tests spawn processes with "exec&" (in order to test the "attach" command) , and then at the end of the test, to make sure things are cleaned up, issue a 'remote_spawn "kill -p $testpid"'. Since with tcl's "exec&", tcl itself is responsible for reaping the process's exit status, when we go kill the process, testpid may have already exited _and_ its status may have (and often has) been reaped already. Thus it can happen that another process meanwhile reuses $testpid, and that "kill" command kills the wrong process... Frequently, that happens to be attach-many-short-lived-thread, but this explains other test's races as well. In the attach-many-short-lived-threads test, it sometimes manifests like this: (gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done. (gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb attach 5940 Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940 warning: process 5940 is a zombie - the process has already terminated ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ptrace: Operation not permitted. (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach info threads No threads. (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads set breakpoint always-inserted on (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on Other times the process dies while the test is ongoing (the process is ptrace-stopped): (gdb) print again = 1 Cannot access memory at address 0x6020cc (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior (Recall that on Linux, SIGKILL is not interceptable) And other times it dies just while we're detaching: $4 = 319 (gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left detach Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process (gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach GDB mishandles the latter (it should ignore ESRCH while detaching just like when continuing), but that's another story. The fix here is to change spawn_wait_for_attach to use Expect's 'spawn' command instead of Tcl's 'exec&' to spawn programs, because with spawn we control when to wait for/reap the process. That allows killing the process by PID without being subject to pid-reuse races, because even if the process is already dead, the kernel won't reuse the process's PID until the zombie is reaped. The other part of the problem lies in DejaGnu itself, unfortunately. I have occasionally seen tests (attach-many-short-lived-threads included, but not only that one) die with a random inexplicable SIGTERM too, and that too is caused by the same reason, except that in that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp: exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid \|\| kill -2 $pid) && sleep 5 && (kill $pgid \|\| kill $pid) && sleep 5 && (kill -9 $pgid \|\| kill -9 $pid) &" ... catch "wait -i $shell_id" Even if the program exits promptly, that whole cascade of kills carries on in the background, thus potentially killing the poor process that manages to reuse $pid... I sent a fix for that to the DejaGnu list: http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html With both patches in place, I haven't seen attach-many-short-lived-threads.exp fail again. Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver. gdb/testsuite/ChangeLog: 2015-07-31 Pedro Alves <palves@redhat.com> * gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id. Use spawn_id_get_pid. Wait for spawn id after eof. Use kill_wait_spawned_process instead of explicit "kill -9". * gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process. * gdb.base/attach-twice.exp: Likewise. * gdb.base/attach.exp: Likewise. (do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and gdb_test_multiple. * gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process. * gdb.base/valgrind-infcall.exp: Likewise. * gdb.multi/multi-attach.exp: Likewise. * gdb.python/py-prompt.exp: Likewise. * gdb.python/py-sync-interp.exp: Likewise. * gdb.server/ext-attach.exp: Likewise. * gdb.threads/attach-into-signal.exp (corefunc): Use spawn_wait_for_attach, spawn_id_get_pid and kill_wait_spawned_process. * gdb.threads/attach-many-short-lived-threads.exp: Adjust to spawn_wait_for_attach returning a spawn id instead of a pid. Use spawn_id_get_pid and kill_wait_spawned_process. * gdb.threads/attach-stopped.exp (corefunc): Use spawn_wait_for_attach, spawn_id_get_pid and kill_wait_spawned_process. * gdb.base/break-interp.exp: Rename $res to $test_spawn_id. Use spawn_id_get_pid. Wait for spawn id after eof. Use kill_wait_spawned_process instead of explicit "kill -9". * lib/gdb.exp (can_spawn_for_attach): Adjust comment. (kill_wait_spawned_process, spawn_id_get_pid): New procedures. (spawn_wait_for_attach): Use spawn instead of exec to spawn processes. Don't map cygwin/windows pids here. Now returns a spawn id list.	2015-07-31 20:06:24 +01:00
Pedro Alves	998d452ac8	remote follow fork and spurious child stops in non-stop mode Running gdb.threads/fork-plus-threads.exp against gdbserver in extended-remote mode, even though the test passes, we still see broken behavior: (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off continue & Continuing. (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue & [New Thread 28092.28092] [Thread 28092.28092] #2 stopped. [New Thread 28094.28094] [Inferior 2 (process 28092) exited normally] [New Thread 28094.28105] [New Thread 28094.28109] ... [Thread 28174.28174] #18 stopped. [New Thread 28185.28185] [Inferior 10 (process 28174) exited normally] [New Thread 28185.28196] [Thread 28185.28185] #20 stopped. Cannot remove breakpoints because program is no longer writable. Further execution is probably impossible. [Inferior 11 (process 28185) exited normally] [Inferior 1 (process 28091) exited normally] PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint info threads No threads. (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left info inferiors Num Description Executable * 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left All the "[Thread FOO] #NN stopped." above are bogus, as well as the "Cannot remove breakpoints because program is no longer writable.", which is a consequence. The problem is that when we intercept a fork event, we should report the event for the parent, only, and leave the child stopped, but not report its stop event. GDB later decides whether to follow the parent or the child. But because handle_extended_wait does not set the child's last_status.kind to TARGET_WAITKIND_STOPPED, a stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access memory) by mistake ends up queueing a SIGSTOP on the child, resuming it, and then when that SIGSTOP is intercepted, because the LWP has last_resume_kind set to resume_stop, gdbserver reports the stop to GDB, as GDB_SIGNAL_0: ... >>>> entering unstop_all_lwps unstopping all lwps proceed_one_lwp: lwp 1600 client wants LWP to remain 1600 stopped proceed_one_lwp: lwp 1828 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Sending sigstop to lwp 1828 pc is 0x3615ebc7cc Resuming lwp 1828 (continue, signal 0, stop expected) continue from pc 0x3615ebc7cc unstop_all_lwps done sigchld_handler <<<< exiting unstop_all_lwps handling possible target event >>>> entering linux_wait_1 linux_wait_1: [<all threads>] my_waitpid (-1, 0x40000001) my_waitpid (-1, 0x1): status(137f), 1828 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK LLW: waitpid 1828 received Stopped (signal) (stopped) pc is 0x3615ebc7cc Expected stop. LLW: resume_stop SIGSTOP caught for LWP 1828.1828. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ... linux_wait_1 ret = LWP 1828.1828, 1, 0 <<<< exiting linux_wait_1 Writing resume reply for LWP 1828.1828:1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Tested on x86_64 Fedora 20, extended-remote. gdb/gdbserver/ChangeLog: 2015-07-30 Pedro Alves <palves@redhat.com> * linux-low.c (handle_extended_wait): Set the child's last reported status to TARGET_WAITKIND_STOPPED.	2015-07-30 18:52:53 +01:00
Pedro Alves	69dde7dcb8	PR threads/18600: Inferiors left around after fork+thread spawn The new gdb.threads/fork-plus-threads.exp test exposes one more problem. When one types "info inferiors" after running the program, one see's a couple inferior left still, while there should only be inferior #1 left. E.g.: (gdb) info inferiors Num Description Executable 4 process 8393 /home/pedro/bugs/src/test 2 process 8388 /home/pedro/bugs/src/test * 1 <null> /home/pedro/bugs/src/test (gdb) info threads Calling prune_inferiors() manually at this point (from a top gdb) does not remove them, because they still have inf->pid != 0 (while they shouldn't). This suggests that we never mourned those inferiors. Enabling logs (master + previous patch) we see: ... WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped) WL: Handling extended status 0x03057f LHEW: Got clone event from LWP 9513, new child is LWP 9579 [New Thread 0x7ffff37b8700 (LWP 9579)] WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited) WL: Thread 0x7ffff7fc2740 (LWP 9508) exited. ^^^^^^^^ [Thread 0x7ffff7fc2740 (LWP 9508) exited] WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited) WL: Thread 0x7ffff7fc2740 (LWP 9499) exited. [Thread 0x7ffff7fc2740 (LWP 9499) exited] RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0 ... (gdb) info inferiors Num Description Executable 5 process 9508 /home/pedro/bugs/src/test ^^^^ 4 process 9503 /home/pedro/bugs/src/test 3 process 9500 /home/pedro/bugs/src/test 2 process 9499 /home/pedro/bugs/src/test * 1 <null> /home/pedro/bugs/src/test (gdb) ... Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line. That's this in wait_lwp: /* Check if the thread has exited. / if (WIFEXITED (status) \|\| WIFSIGNALED (status)) { thread_dead = 1; if (debug_linux_nat) fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n", target_pid_to_str (lp->ptid)); } } That was the leader thread reporting an exit, meaning the whole process is gone. So the problem is that this code doesn't understand that an WIFEXITED status of the leader LWP should be reported to infrun as process exit. gdb/ChangeLog: 2015-07-30 Pedro Alves <palves@redhat.com> PR threads/18600 linux-nat.c (wait_lwp): Report to the core when thread group leader exits. gdb/testsuite/ChangeLog: 2015-07-30 Pedro Alves <palves@redhat.com> PR threads/18600 * gdb.threads/fork-plus-threads.exp: Test that "info inferiors" only shows inferior 1.	2015-07-30 18:52:09 +01:00
Pedro Alves	4dd63d488a	PR threads/18600: Threads left stopped after fork+thread spawn When a program forks and another process start threads while gdb is handling the fork event, newly created threads are left stuck stopped by gdb, even though gdb presents them as "running", to the user. This can be seen with the test added by this patch. The test has the inferior fork a certain number of times and waits for all children to exit. Each fork child spawns a number of threads that do nothing and joins them immediately. Normally, the program should run unimpeded (from the point of view of the user) and exit very quickly. Without this fix, it doesn't because of some threads left stopped by gdb, so inferior 1 never exits. The program triggers when a new clone thread is found while inside the linux_stop_and_wait_all_lwps call in linux-thread-db.c: linux_stop_and_wait_all_lwps (); ALL_LWPS (lp) if (ptid_get_pid (lp->ptid) == pid) thread_from_lwp (lp->ptid); linux_unstop_all_lwps (); Within linux_stop_and_wait_all_lwps, we reach linux_handle_extended_wait with the "stopping" parameter set to 1, and because of that we don't mark the new lwp as resumed. As consequence, the subsequent resume_stopped_resumed_lwps, called from linux_unstop_all_lwps, never resumes the new LWP. There's lots of cruft in linux_handle_extended_wait that no longer makes sense. On systems with CLONE events support, we don't rely on libthread_db for thread listing anymore, so the code that preserves stop_requested and the handling of last_resume_kind is all dead. So the fix is to remove all that, and simply always mark the new LWP as resumed, so that resume_stopped_resumed_lwps re-resumes it. gdb/ChangeLog: 2015-07-30 Pedro Alves <palves@redhat.com> Simon Marchi <simon.marchi@ericsson.com> PR threads/18600 * linux-nat.c (linux_handle_extended_wait): On CLONE event, always mark the new thread as resumed. Remove STOPPING parameter. (wait_lwp): Adjust call to linux_handle_extended_wait. (linux_nat_filter_event): Adjust call to linux_handle_extended_wait. (resume_stopped_resumed_lwps): Add debug output. gdb/testsuite/ChangeLog: 2015-07-30 Simon Marchi <simon.marchi@ericsson.com> Pedro Alves <palves@redhat.com> PR threads/18600 * gdb.threads/fork-plus-threads.c: New file. * gdb.threads/fork-plus-threads.exp: New file.	2015-07-30 18:50:29 +01:00
Sergio Durigan Junior	7da5b897c9	Uniquefy gdb.threads/attach-into-signal.exp Hi, While examining BuildBot's logs, I noticed: <https://sourceware.org/ml/gdb-testers/2015-q3/msg03767.html> gdb.threads/attach-into-signal.exp has two nested loops and don't use unique messages. This commit fixes that. Pushed under the obvious rule. gdb/testsuite/ChangeLog: 2015-07-29 Sergio Durigan Junior <sergiodj@redhat.com> * gdb.threads/attach-into-signal.exp (corefunc): Use with_test_prefix on nested loops, uniquefying the test messages.	2015-07-29 11:10:49 -04:00
Pedro Alves	7759842763	PR gdb/18717: internal error if non-leader thread exits process If a non-leader thread exits the process while all other threads are ptrace-stopped, native gdb fails an assertion. The test added by this commit catches it: /home/pedro/gdb/mygit/build/../src/gdb/linux-nat.c:3198: internal-error: linux_nat_filter_event: Assertion `lp->resumed' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.threads/non-leader-exit-process.exp: program exits normally (GDB internal error) The fix is just to remove the assertion. With that out of the way, neither GDB not GDBserver handle this perfectly though, so I'm adding a KFAIL: (gdb) continue Continuing. [Thread 0x7ffff7fc0700 (LWP 15350) exited] No unwaited-for children left. Couldn't get registers: No such process. (gdb) KFAIL: gdb.threads/non-ldr-exit.exp: program exits normally (PRMS: gdb/18717) gdb/ChangeLog: 2015-07-24 Pedro Alves <palves@redhat.com> PR gdb/18717 * linux-nat.c (linux_nat_filter_event): Don't assert that the lwp is resumed, and extend the debug log. gdb/testsuite/ChangeLog: 2015-07-24 Pedro Alves <palves@redhat.com> PR gdb/18717 * gdb.threads/non-ldr-exit.c: New file. * gdb.threads/non-ldr-exit.exp: New file.	2015-07-24 17:49:17 +01:00
Pedro Alves	28bf096c62	PR threads/18127 - threads spawned by infcall end up stuck in "running" state Refs: https://sourceware.org/ml/gdb/2015-03/msg00024.html https://sourceware.org/ml/gdb/2015-06/msg00005.html On GNU/Linux, if an infcall spawns a thread, that thread ends up with stuck running state. This happens because: - when linux-nat.c detects a new thread, it marks them as running, and does not report anything to the core. - we skip finish_thread_state when the thread that is running the infcall stops. As result, that new thread ends up with stuck "running" state, even though it really is stopped. On Windows, _all_ threads end up stuck in running state, not just the one that was spawned. That happens because when a new thread is detected, unlike linux-nat.c, windows-nat.c reports TARGET_WAITKIND_SPURIOUS to infrun. It's the fact that that event does not cause a user-visible stop that triggers the problem. When the target is re-resumed, we call set_running with a wildcard ptid, which marks all thread as running. That set_running is not suppressed because the (leader) thread being resumed does not have in_infcall set. Later, when the infcall finally finishes successfully, nothing marks all threads back to stopped. We can trigger the same problem on all targets by having a thread other than the one that is running the infcall report a breakpoint hit to infrun, and then have that breakpoint not cause a stop. That's what the included test does. The fix is to stop GDB from suppressing the set_running calls while doing an infcall, and then set the threads back to stopped when the call finishes, iff they were originally stopped before the infcall started. (Note the MI running/stopped event suppression isn't affected.) Tested on x86_64 GNU/Linux. gdb/ChangeLog: 2015-06-29 Pedro Alves <palves@redhat.com> PR threads/18127 * infcall.c (run_inferior_call): On infcall success, if the thread was marked stopped before, reset it back to stopped. * infrun.c (resume): Don't suppress the set_running calls when doing an infcall. (normal_stop): Only discard the finish_thread_state cleanup if the infcall succeeded. gdb/testsuite/ChangeLog: 2015-06-29 Pedro Alves <palves@redhat.com> PR threads/18127 * gdb.threads/hand-call-new-thread.c: New file. * gdb.threads/hand-call-new-thread.c: New file.	2015-06-29 16:07:57 +01:00
Pedro Alves	9ee417720b	Cleanup signal-while-stepping-over-bp-other-thread.exp gdb/testsuite/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> * gdb.threads/signal-while-stepping-over-bp-other-thread.exp: Use gdb_test_sequence and gdb_assert.	2015-04-10 19:49:00 +01:00
Pedro Alves	07473109e1	step-over-trips-on-watchpoint.exp: Don't put addresses in test messages Diffing test results, I noticed: -PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b 0x0000000000400811 thread 1 +PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b 0x00000000004007d1 thread 1 gdb/testsuite/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> * gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Use test messages that don't include the breakpoint address.	2015-04-10 19:23:24 +01:00
Pedro Alves	c79d856c88	Test step-over-{lands-on-breakpoint\|trips-on-watchpoint}.exp with displaced stepping These tests exercise the infrun.c:proceed code that needs to know to start new step overs (along with switch_back_to_stepped_thread, etc.). That code is tricky to get right in the multitude of possible combinations (at least): (native \| remote) X (all-stop \| all-stop-but-target-always-in-non-stop) X (displaced-stepping \| in-line step-over). The first two above are properties of the target, but the different step-over-breakpoint methods should work with any target that supports them. This patch makes sure we always test both methods on all targets. Tested on x86-64 Fedora 20. gdb/testsuite/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> * gdb.threads/step-over-lands-on-breakpoint.exp (do_test): New procedure, factored out from ... (top level): ... here. Add "set displaced-stepping" testing axis. * gdb.threads/step-over-trips-on-watchpoint.exp (do_test): New parameter "displaced". Use it. (top level): Use foreach and add "set displaced-stepping" testing axis.	2015-04-10 13:31:59 +01:00
Pedro Alves	ebc90b50ce	Make gdb.threads/step-over-trips-on-watchpoint.exp effective on !x86 This test is currently failing like this on (at least) PPC64 and s390x: FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: next: next FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: step: step FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: next: next gdb.log: (gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off step wait_threads () at ../../../src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49 49 return 1; /* in wait_threads / (gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step The problem is that the test assumes that both the "watch_me = 1;" and the "other = 1;" lines compile to a single instruction each, which happens to be true on x86, but no necessarily true everywhere else. The result is that the test doesn't really test what it wants to test. Fix it by looking for the instruction that triggers the watchpoint. gdb/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> gdb.threads/step-over-trips-on-watchpoint.c (child_function): Remove comment. * gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Find both the address of the instruction that triggers the watchpoint and the address of the instruction immediately after, and use those addresses for the test. Fix comment.	2015-04-10 13:11:32 +01:00
Pedro Alves	8d707a12ef	gdb/18216: displaced step+deliver signal, a thread needs step-over, crash The problem is that with hardware step targets and displaced stepping, "signal FOO" when stopped at a breakpoint steps the breakpoint instruction at the same time it delivers a signal. This results in tp->stepped_breakpoint set, but no step-resume breakpoint set. When the next stop event arrives, GDB crashes. Irrespective of whether we should do something more/different to step past the breakpoint in this scenario (e.g., PR 18225), it's just wrong to assume there'll be a step-resume breakpoint set (and was not the original intention). gdb/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> PR gdb/18216 * infrun.c (process_event_stop_test): Don't assume a step-resume is set if tp->stepped_breakpoint is true. gdb/testsuite/ChangeLog: 2015-04-10 Pedro Alves <palves@redhat.com> PR gdb/18216 * gdb.threads/multiple-step-overs.exp: Remove expected eof.	2015-04-10 10:36:23 +01:00
Pedro Alves	f3770638ca	Add test for PR18214 and PR18216 - multiple step-overs with queued signals Both PRs are triggered by the same use case. PR18214 is about software single-step targets. On those, the 'resume' code that detects that we're stepping over a breakpoint and delivering a signal at the same time: /* Currently, our software single-step implementation leads to different results than hardware single-stepping in one situation: when stepping into delivering a signal which has an associated signal handler, hardware single-step will stop at the first instruction of the handler, while software single-step will simply skip execution of the handler. ... Fortunately, we can at least fix this particular issue. We detect here the case where we are about to deliver a signal while software single-stepping with breakpoints removed. In this situation, we revert the decisions to remove all breakpoints and insert single- step breakpoints, and instead we install a step-resume breakpoint at the current address, deliver the signal without stepping, and once we arrive back at the step-resume breakpoint, actually step over the breakpoint we originally wanted to step over. / doesn't handle the case of _another_ thread also needing to step over a breakpoint. Because the other thread is just resumed at the PC where it had stopped and a breakpoint is still inserted there, the thread immediately re-traps the same breakpoint. This test exercises that. On software single-step targets, it fails like this: KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler gdb.log (simplified): (gdb) continue Continuing. Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66 66 callme (); / set breakpoint thread 2 here / (gdb) thread 3 (gdb) queue-signal SIGUSR1 (gdb) thread 1 [Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))] #0 main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106 106 wait_threads (); / set wait-threads breakpoint here / (gdb) break sigusr1_handler Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31. (gdb) continue Continuing. [Switching to Thread 0x7ffff7fc0700 (LWP 24828)] Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66 66 callme (); / set breakpoint thread 2 here / (gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler For good measure, I made the test try displaced stepping too. And then I found it crashes GDB on x86-64 (a hardware step target), but only when displaced stepping... : KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216) Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964 4964 if (sr_bp->loc->permanent Setting up the environment for debugging gdb. Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54. Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217. (top-gdb) p sr_bp $1 = (struct breakpoint ) 0x0 (top-gdb) bt #0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964 #1 0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715 #2 0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165 #3 0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298 #4 0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56 #5 0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658 #6 0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658 The all-stop-non-stop series fixes this, but meanwhile, this augments the multiple-step-overs.exp test to cover this, KFAILed. gdb/testsuite/ChangeLog: 2015-04-08 Pedro Alves <palves@redhat.com> PR gdb/18214 PR gdb/18216 * gdb.threads/multiple-step-overs.c (sigusr1_handler): New function. (main): Install it as SIGUSR1 handler. * gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix' parameter. Always use "setup" as prefix. Toggle "set displaced-stepping" off/on depending on global. Don't switch to thread 1 here. (top level): Add displaced stepping "off/on" test axis. Update "setup" calls. Wrap each subtest with with_test_prefix. Test continuing with a queued signal in each thread.	2015-04-08 19:59:03 +01:00
Yao Qi	337532fab1	Properly set alarm value in gdb.threads/non-stop-fair-events.exp Nowadays, the alarm value is 60, and alarm is generated on some slow boards. This patch is to pass DejaGNU timeout value to the program, and move the alarm call before going to infinite loop. If any thread has activities, the alarm is reset. gdb/testsuite: 2015-04-07 Yao Qi <yao.qi@linaro.org> * gdb.threads/non-stop-fair-events.c (SECONDS): New macro. (child_function): Call alarm. (main): Move call to alarm into the loop. * gdb.threads/non-stop-fair-events.exp: Build program with -DTIMEOUT=$timeout.	2015-04-07 11:30:07 +01:00
Yao Qi	cafda5977a	kfail two tests in no-unwaited-for-left.exp for remote target I see these two fails in no-unwaited-for-left.exp in remote testing for aarch64-linux target. ... continue Continuing. warning: Remote failure reply: E.No unwaited-for children left. [Thread 1084] #2 stopped. (gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when thread 2 exits .... continue Continuing. warning: Remote failure reply: E.No unwaited-for children left. [Thread 1081] #1 stopped. (gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits I checked the gdb.log on buildbot, and find that these two fails also appear on Debian-i686-native-extended-gdbserver and Fedora-ppc64be-native-gdbserver-m64. I recall that they are about local/remote parity, and related RSP is missing. There has been already a PR 14618 about it. This patch is to kfail them on remote target. gdb/testsuite: 2015-04-02 Yao Qi <yao.qi@linaro.org> * gdb.threads/no-unwaited-for-left.exp: Set up kfail if target is remote.	2015-04-02 13:51:31 +01:00
Pedro Alves	a14711808e	gdb.threads/manythreads.exp: can't read "test": no such variable If interrupt_and_wait manages to trigger the FAIL path, we get: ERROR OCCURED: can't read "test": no such variable gdb/testsuite/ChangeLog: 2015-04-01 Pedro Alves <palves@redhat.com> * gdb.threads/manythreads.exp (interrupt_and_wait): Pass $message to fail instead of non-existent $test.	2015-04-01 15:30:13 +01:00
Pedro Alves	4eec2deb06	Crash on thread id wrap around On GNU/Linux, if the target reuses the TID of a thread that GDB still has in its list marked as THREAD_EXITED, GDB crashes, like: (gdb) continue Continuing. src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error) Here: (top-gdb) bt #0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.") at src/gdb/common/errors.c:54 #1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789 #2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114 #3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127 #4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478 #5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722 #6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1) at src/gdb/linux-thread-db.c:1525 #7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116 #8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206 #9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275 #10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56 #11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655 I managed to come up with a test that reliably reproduces this. It spawns enough threads for the pid number space to wrap around, so could potentially take a while. On my box that's 4 seconds; on gcc110, a PPC box which has max_pid set to 65536, it's over 10 seconds. So I made the test compute how long that would take, and cap the time waited if it would be unreasonably long. Tested on x86_64 Fedora 20. gdb/ChangeLog: 2015-04-01 Pedro Alves <palves@redhat.com> * linux-thread-db.c (record_thread): Readd the thread to gdb's list if it was marked exited. gdb/testsuite/ChangeLog: 2015-04-01 Pedro Alves <palves@redhat.com> * gdb.threads/tid-reuse.c: New file. * gdb.threads/tid-reuse.exp: New file.	2015-04-01 13:38:06 +01:00
Pedro Alves	a25d8bf9c5	Fix "thread apply all" with exited threads I noticed that "thread apply all" sometimes crashes. The problem is that thread_apply_all_command doesn take exited threads into account, and we qsort and then walk more elements than there really ever were put in the array. Valgrind shows: The current thread <Thread ID 3> has terminated. See `help thread'. (gdb) thread apply all p 1 Thread 1 (Thread 0x7ffff7fc2740 (LWP 29579)): $1 = 1 ==29576== Use of uninitialised value of size 8 ==29576== at 0x639CA8: set_thread_refcount (thread.c:1337) ==29576== by 0x5C2C7B: do_my_cleanups (cleanups.c:155) ==29576== by 0x5C2CE8: do_cleanups (cleanups.c:177) ==29576== by 0x63A191: thread_apply_all_command (thread.c:1477) ==29576== by 0x50374D: do_cfunc (cli-decode.c:105) ==29576== by 0x506865: cmd_func (cli-decode.c:1893) ==29576== by 0x7562CB: execute_command (top.c:476) ==29576== by 0x647DA4: command_handler (event-top.c:494) ==29576== by 0x648367: command_line_handler (event-top.c:692) ==29576== by 0x7BF7C9: rl_callback_read_char (callback.c:220) ==29576== by 0x64784C: rl_callback_read_char_wrapper (event-top.c:171) ==29576== by 0x647CB5: stdin_event_handler (event-top.c:432) ==29576== ... This can happen easily today as linux-nat.c/linux-thread-db.c are forgetting to purge non-current exited threads. But even with that fixed, we can always do "thread apply all" with an exited thread selected, which won't be deleted until the user switches to another thread. That's what the test added by this commit exercises. Tested on x86_64 Fedora 20. gdb/ChangeLog: 2015-03-24 Pedro Alves <palves@redhat.com> * thread.c (thread_apply_all_command): Take exited threads into account. gdb/testsuite/ChangeLog: 2015-03-24 Pedro Alves <palves@redhat.com> * gdb.threads/no-unwaited-for-left.exp: Test "thread apply all".	2015-03-24 21:01:29 +00:00
Pedro Alves	856e7dd698	Make "set scheduler-locking step" depend on user intention, only Currently, "set scheduler-locking step" is a bit odd. The manual documents it as being optimized for stepping, so that focus of debugging does not change unexpectedly, but then it says that sometimes other threads may run, and thus focus may indeed change unexpectedly... A user can then be excused to get confused and wonder why does GDB behave like this. I don't think a user should have to know about details of how "next" or whatever other run control command is implemented internally to understand when does the "scheduler-locking step" setting take effect. This patch completes a transition that the code has been moving towards for a while. It makes "set scheduler-locking step" hold threads depending on whether the _command_ the user entered was a stepping command [step/stepi/next/nexti], or not. Before, GDB could end up locking threads even on "continue" if for some reason run control decides a thread needs to be single stepped (e.g., for a software watchpoint). After, if a "continue" happens to need to single-step for some reason, we won't lock threads (unless when stepping over a breakpoint, naturally). And if a stepping command wants to continue a thread for bit, like when skipping a function to a step-resume breakpoint, we'll still lock threads, so focus of debugging doesn't change. In order to make this work, we need to record in the thread structure whether what set it running was a stepping command. (A follow up patch will remove the "step" parameters of 'proceed' and 'resume') FWIW, Fedora GDB, which defaults to "scheduler-locking step" (mainline defaults to "off") carries a different patch that goes in this direction as well. Tested on x86_64 Fedora 20, native and gdbserver. gdb/ChangeLog: 2015-03-24 Pedro Alves <palves@redhat.com> * gdbthread.h (struct thread_control_state) <stepping_command>: New field. * infcmd.c (step_once): Pass step=1 to clear_proceed_status. Set the thread's stepping_command field. * infrun.c (resume): Check the thread's stepping_command flag to determine which threads should be resumed. Rename 'entry_step' local to user_step. (clear_proceed_status_thread): Clear 'stepping_command'. (schedlock_applies): Change parameter type to struct thread_info pointer. Adjust. (find_thread_needs_step_over): Remove 'step' parameter. Adjust. (switch_back_to_stepped_thread): Adjust calls to 'schedlock_applies'. (_initialize_infrun): Adjust "set scheduler-locking step" help. gdb/testsuite/ChangeLog: 2015-03-24 Pedro Alves <palves@redhat.com> * gdb.threads/schedlock.exp (test_step): No longer expect that "set scheduler-locking step" with "next" over a function call runs threads unlocked. gdb/doc/ChangeLog: 2015-03-24 Pedro Alves <palves@redhat.com> * gdb.texinfo (test_step) <set scheduler-locking step>: No longer mention that threads may sometimes run unlocked.	2015-03-24 17:50:31 +00:00
Pedro Alves	8bf3b159e5	gdbserver/Linux: unbreak thread event randomization Wanting to make sure the new continue-pending-status.exp test tests both cases of threads 2 and 3 reporting an event, I added counters to the test, to make it FAIL if events for both threads aren't seen. Assuming a well behaved backend, and given a reasonable number of iterations, it should PASS. However, running that against GNU/Linux gdbserver, I found that surprisingly, that FAILed. GDBserver always reported the breakpoint hit for the same thread. Turns out that I broke gdbserver's thread event randomization recently, with git commit `582511be` ([gdbserver] linux-low.c: better starvation avoidance, handle non-stop mode too). In that commit I missed that the thread structure also has a status_pending_p field... The end result was that count_events_callback always returns 0, and then if no thread is stepping, select_event_lwp always returns the event thread. IOW, no randomization is happening at all. Quite curious how all the other changes in that patch were sufficient to fix non-stop-fair-events.exp anyway even with that broken. Tested on x86_64 Fedora 20, native and gdbserver. gdb/gdbserver/ChangeLog: 2015-03-19 Pedro Alves <palves@redhat.com> * linux-low.c (count_events_callback, select_event_lwp_callback): Use the lwp's status_pending_p field, not the thread's. gdb/testsuite/ChangeLog: 2015-03-19 Pedro Alves <palves@redhat.com> * gdb.threads/continue-pending-status.exp (saw_thread_2) (saw_thread_3): New globals. (top level): Increment them when an event for the corresponding thread is seen. (no thread starvation): New test.	2015-03-19 12:38:05 +00:00
Pedro Alves	eb54c8bf08	native/Linux: internal error if resume is short-circuited If the linux_nat_resume's short-circuits the resume because the current thread has a pending status, and, a thread with a higher number was previously stopped for a breakpoint, GDB internal errors, like: /home/pedro/gdb/mygit/src/gdb/linux-nat.c:2590: internal-error: status_callback: Assertion `lp->status != 0' failed. Fix this by make status_callback bail out earlier. GDBserver is already doing the same. New test added that exercises this. gdb/ChangeLog: 2015-03-19 Pedro Alves <palves@redhat.com> * linux-nat.c (status_callback): Return early if the LWP has no status pending. gdb/testsuite/ChangeLog: 2015-03-19 Pedro Alves <palves@redhat.com> * gdb.threads/continue-pending-status.c: New file. * gdb.threads/continue-pending-status.exp: New file.	2015-03-19 12:26:49 +00:00
Pedro Alves	be9957b82f	Fix gdb.threads/thread-specific-bp.exp race Gary stumbled on this: (gdb) PASS: gdb.threads/thread-specific-bp.exp: all-stop: continue to end info threads Id Target Id Frame * 1 Thread 0x7ffff7fdb700 (LWP 13717) "thread-specific" end () at /home/gary/work/archer/startswith/src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29 (gdb) FAIL: gdb.threads/thread-specific-bp.exp: all-stop: thread start is gone info breakpoint The problem is that "...archer/startswith/src..." has a "start" in it, which matches the too-lax regex in the test. Rather than tweaking the regex, we can just remove the whole "info threads", like we removed similar ones in other files -- GDB nowadays does this implicitly already, so things should work without it. Thus removing this even improves testing here a bit. gdb/testsuite/ChangeLog: 2015-03-04 Pedro Alves <palves@redhat.com> * gdb.threads/thread-specific-bp.exp: Delete "info threads" test.	2015-03-04 17:23:55 +00:00
Pedro Alves	511aee7c39	gdb.threads/clone-thread_db.c: Add missing includes and fix pthread_join call This fixes: > gdb compile failed, /gdb/testsuite/gdb.threads/clone-thread_db.c: In function 'main': > /gdb/testsuite/gdb.threads/clone-thread_db.c:67:3: warning: implicit declaration of function 'alarm' [-Wimplicit-function-declaration] > alarm (300); > ^ > /gdb/testsuite/gdb.threads/clone-thread_db.c:69:3: warning: implicit declaration of function 'pthread_create' [-Wimplicit-function-declaration] > pthread_create (&child, NULL, thread_fn, NULL); > ^ > /gdb/testsuite/gdb.threads/clone-thread_db.c:70:3: warning: implicit declaration of function 'pthread_join' [-Wimplicit-function-declaration] > pthread_join (child); > ^ And then adding the missing headers revealed the pthread_join call was incorrect. This probably fixes the crash we see on ppc64be, e.g., at https://sourceware.org/ml/gdb-testers/2015-q1/msg04415.html the logs there show: ... Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x3fffb7ff54a0 (LWP 9275)] 0x00003fffb7f3ce74 in .pthread_join () from /lib64/libpthread.so.0 (gdb) FAIL: gdb.threads/clone-thread_db.exp: continue to end ... Tested on x86_64 Fedora 20. gdb/testsuite/ 2015-03-04 Pedro Alves <palves@redhat.com> * gdb.threads/clone-thread_db.c: Include unistd.h and pthread.h. (main): Pass missing retval argument to pthread_join call.	2015-03-04 09:13:49 +00:00
Pedro Alves	95e50b2723	follow-exec: delete all non-execing threads This fixes invalid reads Valgrind first caught when debugging against a GDBserver patched with a series that adds exec events to the remote protocol. Like these, using the gdb.threads/thread-execl.exp test: $ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on" ... Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29 29 if (execl (image, image, NULL) == -1) (gdb) n Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl [New Thread 32509.32532] ==32510== Invalid read of size 4 ==32510== at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989) ==32510== by 0x6285D3: delete_thread_breakpoint (thread.c:100) ==32510== by 0x628603: delete_step_resume_breakpoint (thread.c:109) ==32510== by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928) ==32510== by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958) ==32510== by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969) ==32510== by 0x616C96: fetch_inferior_event (infrun.c:3267) ==32510== by 0x63A2DE: inferior_event_handler (inf-loop.c:57) ==32510== by 0x4E0E56: remote_async_serial_handler (remote.c:11877) ==32510== by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137) ==32510== by 0x4AF6F0: fd_event (ser-base.c:182) ==32510== by 0x63806D: handle_file_event (event-loop.c:762) ==32510== Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd ==32510== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) ==32510== by 0x77CB74: xfree (common-utils.c:98) ==32510== by 0x5AA954: delete_breakpoint (breakpoint.c:14056) ==32510== by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765) ==32510== by 0x61360F: follow_exec (infrun.c:1091) ==32510== by 0x6186FA: handle_inferior_event (infrun.c:4061) ==32510== by 0x616C55: fetch_inferior_event (infrun.c:3261) ==32510== by 0x63A2DE: inferior_event_handler (inf-loop.c:57) ==32510== by 0x4E0E56: remote_async_serial_handler (remote.c:11877) ==32510== by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137) ==32510== by 0x4AF6F0: fd_event (ser-base.c:182) ==32510== by 0x63806D: handle_file_event (event-loop.c:762) ==32510== [Switching to Thread 32509.32532] Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29 29 if (execl (image, image, NULL) == -1) (gdb) The breakpoint in question is the step-resume breakpoint of the non-main thread, the one that was "next"ed. The exact same issue can be seen on mainline with native debugging, by running the thread-execl.exp test in non-stop mode, because the kernel doesn't report a thread exit event for the execing thread. Tested on x86_64 Fedora 20. gdb/ChangeLog: 2015-03-02 Pedro Alves <palves@redhat.com> * infrun.c (follow_exec): Delete all threads of the process except the event thread. Extended comments. gdb/testsuite/ChangeLog: 2015-03-02 Pedro Alves <palves@redhat.com> * gdb.threads/thread-execl.exp (do_test): Handle non-stop. (top level): Call do_test with non-stop as well.	2015-03-03 01:25:17 +00:00
Pedro Alves	a47cd6e95a	gdb.threads/multi-create-ns-info-thr.exp and native-extended-remote board The buildbot shows that the new gdb.threads/multi-create-ns-info-thr.exp test is timing out when tested with --target=native-extended-remote. The reason is: No breakpoints or watchpoints. (gdb) break main Breakpoint 1 at 0x10000b00: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 72. (gdb) run Starting program: /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-cre ate-ns-info-thr Process /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-create-ns-inf o-thr created; pid = 16266 Unexpected vCont reply in non-stop mode: T0501:00003fffffffd190;40:00000080560fe290;thread:p3f8a.3f8a;core:0; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (gdb) break multi-create.c:45 Breakpoint 2 at 0x10000994: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 45. (gdb) commands Type commands for breakpoint(s) 2, one per line. Non-stop tests don't really work with the --target_board=native-extended-remote board, because tests toggle non-stop on after GDB is already connected to gdbserver, while Currently, non-stop must be enabled before connecting. This adjusts the test to bail if running to main fails, like all other non-stop tests. Note non-stop tests do work with --target_board=native-gdbserver. gdb/testsuite/ChangeLog: 2015-02-21 Pedro Alves <palves@redhat.com> * gdb.threads/multi-create-ns-info-thr.exp: Return early if runto_main fails.	2015-02-21 12:03:23 +00:00
Pedro Alves	2db9a4275c	GNU/Linux: Stop using libthread_db/td_ta_thr_iter TL;DR - GDB can hang if something refreshes the thread list out of the target while the target is running. GDB hangs inside td_ta_thr_iter. The fix is to not use that libthread_db function anymore. Long version: Running the testsuite against my all-stop-on-top-of-non-stop series is still exposing latent non-stop bugs. I was originally seeing this with the multi-create.exp test, back when we were still using libthread_db thread event breakpoints. The all-stop-on-top-of-non-stop series forces a thread list refresh each time GDB needs to start stepping over a breakpoint (to pause all threads). That test hits the thread event breakpoint often, resulting in a bunch of step-over operations, thus a bunch of thread list refreshes while some threads in the target are running. The commit adds a real non-stop mode test that triggers the issue, based on multi-create.exp, that does an explicit "info threads" when a breakpoint is hit. IOW, it does the same things the as-ns series was doing when testing multi-create.exp. The bug is a race, so it unfortunately takes several runs for the test to trigger it. In fact, even when setting the test running in a loop, it sometimes takes several minutes for it to trigger for me. The race is related to libthread_db's td_ta_thr_iter. This is libthread_db's entry point for walking the thread list of the inferior. Sometimes, when GDB refreshes the thread list from the target, libthread_db's td_ta_thr_iter can somehow see glibc's thread list as a cycle, and get stuck in an infinite loop. The issue is that when a thread exits, its thread control structure in glibc is moved from a "used" list to a "cache" list. These lists are simply circular linked lists where the "next/prev" pointers are embedded in the thread control structure itself. The "next" pointer of the last element of the list points back to the list's sentinel "head". There's only one set of "next/prev" pointers for both lists; thus a thread can only be in one of the lists at a time, not in both simultaneously. So when thread C exits, simplifying, the following happens. A-C are threads. stack_used and stack_cache are the list's heads. Before: stack_used -> A -> B -> C -> (&stack_used) stack_cache -> (&stack_cache) After: stack_used -> A -> B -> (&stack_used) stack_cache -> C -> (&stack_cache) td_ta_thr_iter starts by iterating at the list's head's next, and iterates until it sees a thread whose next pointer points to the list's head again. Thus in the before case above, C's next points to stack_used, indicating end of list. In the same case, the stack_cache list is empty. For each thread being iterated, td_ta_thr_iter reads the whole thread object out of the inferior. This includes the thread's "next" pointer. In the scenario above, it may happen that td_ta_thr_iter is iterating thread B and has already read B's thread structure just before thread C exits and its control structure moves to the cached list. Now, recall that td_ta_thr_iter is running in the context of GDB, and there's no locking between GDB and the inferior. From it's local copy of B, td_ta_thr_iter believes that the next thread after B is thread C, so it happilly continues iterating to C, a thread that has already exited, and is now in the stack cache list. After iterating C, td_ta_thr_iter finds the stack_cache head, which because it is not stack_used, td_ta_thr_iter assumes it's just another thread. After this, unless the reverse race triggers, GDB gets stuck in td_ta_thr_iter forever walking the stack_cache list, as no thread in thatlist has a next pointer that points back to stack_used (the terminating condition). Before fully understanding the issue, I tried adding cycle detection to GDB's td_ta_thr_iter callback. However, td_ta_thr_iter skips calling the callback in some cases, which means that it's possible that the callback isn't called at all, making it impossible for GDB to break the loop. I did manage to get GDB stuck in that state more than once. Fortunately, we can avoid the issue altogether. We don't really need td_ta_thr_iter for live debugging nowadays, given PTRACE_EVENT_CLONE. We already know how to map and lwp id to a thread id without iterating (thread_from_lwp), so use that more. gdb/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * linux-nat.c (linux_handle_extended_wait): Call thread_db_notice_clone whenever a new clone LWP is detected. (linux_stop_and_wait_all_lwps, linux_unstop_all_lwps): New functions. * linux-nat.h (thread_db_attach_lwp): Delete declaration. (thread_db_notice_clone, linux_stop_and_wait_all_lwps) (linux_unstop_all_lwps): Declare. * linux-thread-db.c (struct thread_get_info_inout): Delete. (thread_get_info_callback): Delete. (thread_from_lwp): Use td_thr_get_info and record_thread. (thread_db_attach_lwp): Delete. (thread_db_notice_clone): New function. (try_thread_db_load_1): If /proc is mounted and shows the process'es task list, walk over all LWPs and call thread_from_lwp instead of relying on td_ta_thr_iter. (attach_thread): Don't call check_thread_signals here. Split the tail part of the function (which adds the thread to the core GDB thread list) to ... (record_thread): ... this function. Call check_thread_signals here. (thread_db_wait): Don't call thread_db_find_new_threads_1. Always call thread_from_lwp. (thread_db_update_thread_list): Rename to ... (thread_db_update_thread_list_org): ... this. (thread_db_update_thread_list): New function. (thread_db_find_thread_from_tid): Delete. (thread_db_get_ada_task_ptid): Simplify. * nat/linux-procfs.c: Include <sys/stat.h>. (linux_proc_task_list_dir_exists): New function. * nat/linux-procfs.h (linux_proc_task_list_dir_exists): Declare. gdb/gdbserver/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * thread-db.c: Include "nat/linux-procfs.h". (thread_db_init): Skip listing new threads if the kernel supports PTRACE_EVENT_CLONE and /proc/PID/task/ is accessible. gdb/testsuite/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * gdb.threads/multi-create-ns-info-thr.exp: New file.	2015-02-20 21:40:31 +00:00
Pedro Alves	5c5019c27c	PR18006: internal error if threaded program calls clone(CLONE_VM) On GNU/Linux, if a pthreaded program has a thread call clone(CLONE_VM) directly, and then that clone LWP hits a debug event (breakpoint, etc.) GDB internal errors. Threaded programs shouldn't really be calling clone directly, but GDB shouldn't crash either. The crash looks like this: (gdb) break clone_fn Breakpoint 2 at 0x4007d8: file clone-thread_db.c, line 35. (gdb) r ... [Thread debugging using libthread_db enabled] ... src/gdb/linux-nat.c:1030: internal-error: lin_lwp_attach_lwp: Assertion `lwpid > 0' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. The problem is that 'clone' ends up clearing the parent thread's tid field in glibc's thread data structure. For x86_64, the glibc code in question is here: sysdeps/unix/sysv/linux/x86_64/clone.S: ... testq $CLONE_THREAD, %rdi jne 1f testq $CLONE_VM, %rdi movl $-1, %eax <---- jne 2f movl $SYS_ify(getpid), %eax syscall 2: movl %eax, %fs:PID movl %eax, %fs:TID <---- 1: When GDB refreshes the thread list out of libthread_db, it finds a thread with LWP with pid -1 (the clone's parent), which naturally isn't yet on the thread list. GDB then tries to attach to that bogus LWP id, which is caught by that assertion. The fix is to detect the bad PID early. Tested on x86-64 Fedora 20. GDBserver doesn't need any fix. gdb/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> PR threads/18006 * linux-thread-db.c (thread_get_info_callback): Return early if the thread's lwp id is -1. gdb/testsuite/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> PR threads/18006 * gdb.threads/clone-thread_db.c: New file. * gdb.threads/clone-thread_db.exp: New file.	2015-02-20 19:00:21 +00:00
Pedro Alves	0703599a49	Fix adjust_pc_after_break, remove still current thread check On decr_pc_after_break targets, GDB adjusts the PC incorrectly if a background single-step stops somewhere where PC-$decr_pc has a breakpoint, and the thread that finishes the step is not the current thread, like: ADDR1 nop <-- breakpoint here ADDR2 jmp PC IOW, say thread A is stepping ADDR2's line in the background (an infinite loop), and the user switches focus to thread B. GDB's adjust_pc_after_break logic confuses the single-step stop of thread A for a hit of the breakpoint at ADDR1, and thus adjusts thread A's PC to point at ADDR1 when it should not, and reports a breakpoint hit, when thread A did not execute the instruction at ADDR1 at all. The test added by this patch exercises exactly that. I can't find any reason we'd need the "thread to be examined is still the current thread" condition in adjust_pc_after_break, at least nowadays; it might have made sense in the past. Best just remove it, and rely on currently_stepping(). Here's the test's log of a run with an unpatched GDB: 35 while (1); (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next over nop next& (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next& over inf loop thread 1 [Switching to thread 1 (Thread 0x7ffff7fc2740 (LWP 29027))](running) (gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: switch to main thread Breakpoint 2, thread_function (arg=0x0) at ...src/gdb/testsuite/gdb.threads/step-bg-decr-pc-switch-thread.c:34 34 NOP; /* set breakpoint here / FAIL: gdb.threads/step-bg-decr-pc-switch-thread.exp: no output while stepping gdb/ChangeLog: 2015-02-11 Pedro Alves <pedro@codesourcery.com> infrun.c (adjust_pc_after_break): Don't adjust the PC just because the event thread is not the current thread. gdb/testsuite/ChangeLog: 2015-02-11 Pedro Alves <pedro@codesourcery.com> * gdb.threads/step-bg-decr-pc-switch-thread.c: New file. * gdb.threads/step-bg-decr-pc-switch-thread.exp: New file.	2015-02-11 09:45:41 +00:00
Pedro Alves	01b088bc51	Add "signal SIGTRAP" test Some local changes I was working on related to SIGTRAP handling resulted in "signal SIGTRAP" no longer passing the SIGTRAP to the inferior. Surprisingly, only annota1.exp catches this. This commit adds a test that doesn't rely on annotations, so that at the point annotations are finaly dropped, we still have this use case covered ... This is a multi-threaded test to also exercise the case of first needing to do a step-over before delivering the signal. Tested on x86_64 Fedora 20, native, remote/extended-remote gdbserver. gdb/testsuite/ 2015-02-10 Pedro Alves <palves@redhat.com> * gdb.threads/signal-sigtrap.c: New file. * gdb.threads/signal-sigtrap.exp: New file.	2015-02-10 19:30:55 +00:00
Pedro Alves	e584fdbc6a	Improve gdb.threads/attach-many-short-lived-threads.exp timeout handling The buildbot shows that this test is still racy, and occasionally fails with time outs on some machines. I'd like to get major issues with load out of the way. The test currently exits after 180s, which is just a random number, that has no relation to what the .exp file considers a time out. This commit makes the program wait a bit longer than what the .exp file considers a time out, and, resets the timer for each iteration. Tested on x86_64 Fedora 20, native and extended-remote gdbserver. gdb/testsuite/ 2015-02-06 Pedro Alves <palves@redhat.com> * gdb.threads/attach-many-short-lived-threads.c (SECONDS): New macro. (seconds_left, again): New globals. (main): Wait seconds_left in a 1-second sleep loop instead of sleeping 180 seconds. If 'again' is set, reset the seconds counter. * gdb.threads/attach-many-short-lived-threads.exp (test): Set 'again' in the inferior before detaching. Print the seconds left. (options): New global. (top level): Build program with -DTIMEOUT=$timeout.	2015-02-06 13:24:32 +01:00
Mark Wielaard	37bc665e4e	Remove testsuite compile errors with GCC5. GCC5 defaults to the GNU11 standard for C and warns by default for implicit function declarations and implicit return types. https://gcc.gnu.org/gcc-5/porting_to.html Fixing these issues in the testsuite turns 9 untested and 17 unsupported testcases into 417 new passes when compiling with GCC5. gdb/testsuite/ChangeLog: * gdb.arch/i386-bp_permanent.c (standard): New declaration. * gdb.base/disp-step-fork.c: Include unistd.h. * gdb.base/siginfo-obj.c: Include stdio.h. * gdb.base/siginfo-thread.c: Likewise. * gdb.mi/non-stop.c: Include unistd.h. * gdb.mi/nsthrexec.c: Include stdio.h. * gdb.mi/pthreads.c: Include unistd.h. * gdb.modula2/unbounded1.c (main): Declare returns int. * gdb.reverse/consecutive-reverse.c: Likewise. * gdb.threads/create-fail.c: Include unistd.h. * gdb.threads/killed.c: Likewise. * gdb.threads/linux-dp.c: Likewise. * gdb.threads/non-ldr-exc-1.c: Include stdio.h and string.h. * gdb.threads/non-ldr-exc-2.c: Likewise. * gdb.threads/non-ldr-exc-3.c: Likewise. * gdb.threads/non-ldr-exc-4.c: Likewise. * gdb.threads/pthreads.c: Include unistd.h. (main): Declare returns int. * gdb.threads/tls-main.c (foo): New declaration. * gdb.threads/watchpoint-fork-mt.c: Define _GNU_SOURCE.	2015-01-25 18:50:56 +01:00
Pedro Alves	198297aafb	Linux: make target_is_async_p return false when async is off linux_nat_is_async_p currently always returns true, even when the target is _not_ async. That confuses gdb_readline_wrapper/gdb_readline_wrapper_cleanup, which force-disables target-async while the secondary prompt is active. As a result, when gdb_readline_wrapper returns, the target is left async, even through it was sync to begin with. That can result in weird bugs, like the one the test added by this commit exposes. Ref: https://sourceware.org/ml/gdb-patches/2015-01/msg00592.html gdb/ChangeLog: 2015-01-23 Pedro Alves <palves@redhat.com> * linux-nat.c (linux_is_async_p): New macro. (linux_nat_is_async_p): (linux_nat_terminal_inferior): Check whether the target can async instead of whether it is already async. (linux_nat_terminal_ours): Don't check whether the target is async. (linux_async_pipe): Use linux_is_async_p. gdb/testsuite/ChangeLog: 2015-01-23 Pedro Alves <palves@redhat.com> * gdb.threads/continue-pending-after-query.c: New file. * gdb.threads/continue-pending-after-query.exp: New file.	2015-01-23 11:12:39 +00:00
Pedro Alves	ede9f622af	add non-stop test that stresses thread starvation issues This commit adds a non-stop mode test originally inspired by signal-while-stepping-over-bp-other-thread.exp, that exposes the thread starvation issues fixed by the previous patches. It sets a set of threads stepping in parallel, and has one of them get a signal. Without the previous fixes, this would fail with timeouts. gdb/testsuite/ 2015-01-09 Pedro Alves <palves@redhat.com> * gdb.threads/non-stop-fair-events.c: New file. * gdb.threads/non-stop-fair-events.exp: New file.	2015-01-09 14:44:42 +00:00
Pedro Alves	9665ffdd59	gdb.threads/{siginfo-thread.c,watchthreads-reorder.c,ia64-sigill.c} races with GDB These three test all spawn a few threads and then send a SIGSTOP to their parent GDB in order to pause it while the new threads set things up for the test. With a GDB patch that changes the inferior thread's scheduling a bit, I sometimes see: FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout) ... FAIL: gdb.threads/watchthreads-reorder.exp: reorder1: continue a (timeout) ... FAIL: gdb.threads/ia64-sigill.exp: continue (timeout) ... The issue is that the test program stops GDB before it had a chance of processing the new thread's clone event: (gdb) PASS: gdb.threads/siginfo-threads.exp: get pid continue Continuing. Stopping GDB PID 21541. Waiting till the threads initialize their TIDs. FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout) On Linux (at least), new threads start stopped, and the debugger must resume them. The fix is to make the test program wait for the new threads to be running before stopping GDB. gdb/testsuite/ 2015-01-09 Pedro Alves <palves@redhat.com> * gdb.threads/ia64-sigill.c (threads_started_barrier): New global. (thread_func): Wait on barrier. (main): Wait for all threads to start before stopping GDB. * gdb.threads/siginfo-threads.c (threads_started_barrier): New global. (thread1_func, thread2_func): Wait on barrier. (main): Wait for all threads to start before stopping GDB. * gdb.threads/watchthreads-reorder.c (threads_started_barrier): New global. (thread1_func, thread2_func): Wait on barrier. (main): Wait for all threads to start before stopping GDB.	2015-01-09 13:58:29 +00:00
Pedro Alves	c945a99f01	Test attaching to a program that constantly spawns short-lived threads Before the previous fixes, on Linux, this would trigger several different problems, like: [New LWP 27106] [New LWP 27047] warning: unable to open /proc file '/proc/-1/status' [New LWP 27813] [New LWP 27869] warning: Can't attach LWP 11962: No child processes Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available. gdb/testsuite/ 2015-01-09 Pedro Alves <palves@redhat.com> * gdb.threads/attach-many-short-lived-threads.c: New file. * gdb.threads/attach-many-short-lived-threads.exp: New file.	2015-01-09 11:44:04 +00:00
Pedro Alves	c1a747c109	Linux: Skip thread_db thread event reporting if PTRACE_EVENT_CLONE is supported [A test I wrote stumbled on a libthread_db issue related to thread event breakpoints. See glibc PR17705: [nptl_db: stale thread create/death events if debugger detaches] https://sourceware.org/bugzilla/show_bug.cgi?id=17705 This patch avoids that whole issue by making GDB stop using thread event breakpoints in the first place, which is good for other reasons as well, anyway.] Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new threads in the inferior (to attach to them) or to learn about thread exit was to coordinate with the inferior's glibc/runtime, using libthread_db. That works by putting a breakpoint at a magic address which is called when a new thread is spawned, or when a thread is about to exit. When that breakpoint is hit, all threads are stopped, and then GDB coordinates with libthread_db to read data structures out of the inferior to learn about what happened. Then the breakpoint is single-stepped, and then all threads are re-resumed. This isn't very efficient (stops all threads) and is more fragile (inferior's thread list in memory may be corrupt; libthread_db bugs, etc.) than ideal. When the kernel supports PTRACE_EVENT_CLONE (which we already make use of), there's really no need to use libthread_db's event reporting mechanism to learn about new LWPs. And if the kernel supports that, then we learn about LWP exits through regular WIFEXITED wait statuses, so no need for the death event breakpoint either. GDBserver has been likewise skipping the thread_db events for a long while: https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html There's one user-visible difference: we'll no longer print about threads being created and exiting while the program is running, like: [Thread 0x7ffff7dbb700 (LWP 30670) exited] [New Thread 0x7ffff7db3700 (LWP 30671)] [Thread 0x7ffff7dd3700 (LWP 30667) exited] [New Thread 0x7ffff7dab700 (LWP 30672)] [Thread 0x7ffff7db3700 (LWP 30671) exited] [Thread 0x7ffff7dcb700 (LWP 30668) exited] This is exactly the same behavior as when debugging against remote targets / gdbserver. I actually think that's a good thing (and as such have listed this in the local/remote parity wiki page a while ago), as the printing slows down the inferior. It's also a distraction to keep bothering the user about short-lived threads that she won't be able to interact with anyway. Instead, the user (and frontend) will be informed about new threads that currently exist in the program when the program next stops: (gdb) c ... * ctrl-c * [New Thread 0x7ffff7963700 (LWP 7797)] [New Thread 0x7ffff796b700 (LWP 7796)] Program received signal SIGINT, Interrupt. [Switching to Thread 0x7ffff796b700 (LWP 7796)] clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81 81 testq %rax,%rax (gdb) info threads A couple of tests had assumptions on GDB thread numbers that no longer hold. Tested on x86_64 Fedora 20. gdb/ 2014-01-09 Pedro Alves <palves@redhat.com> Skip enabling event reporting if the kernel supports PTRACE_EVENT_CLONE. * linux-thread-db.c: Include "nat/linux-ptrace.h". (thread_db_use_events): New function. (try_thread_db_load_1): Check thread_db_use_events before enabling event reporting. (update_thread_state): New function. (attach_thread): Use it. Check thread_db_use_events before enabling event reporting. (thread_db_detach): Check thread_db_use_events before disabling event reporting. (find_new_threads_callback): Check thread_db_use_events before enabling event reporting. Update the thread's state if not using libthread_db events. gdb/testsuite/ 2014-01-09 Pedro Alves <palves@redhat.com> * gdb.threads/fork-thread-pending.exp: Switch to the main thread instead of to thread 2. * gdb.threads/signal-command-multiple-signals-pending.c (main): Add barrier around each pthread_create call instead of around all calls. * gdb.threads/signal-command-multiple-signals-pending.exp (test): Set a break on thread_function and have the child threads hit it one at at a time.	2015-01-09 11:42:57 +00:00

1 2 3 4 5 ...

328 Commits