runtime/cgo: handle signal on non-Go thread #3250

Closed
@nsf

Description

@nsf
For some reason a certain action in C code forces a segfault in Go. It was found in my gtk
bindings; I have stripped it down to one test file that segfaults on my machine. I did hg
bisect, and the problem appears starting from revision 11922:daf22f371d51 (os/signal:
selective signal handling).

Here's the source code:
--------------------------------------------------------------------
package main

/*                                                                              
#include <gtk/gtk.h>                                                            
                                                                                
#cgo pkg-config: gtk+-3.0                                                       
*/
import "C"

func main() {
        C.gtk_init(nil, nil)
        C.gtk_file_chooser_button_new(nil, 0)
}
--------------------------------------------------------------------

It happens on linux 3.2.8 (archlinux distribution), with any go version starting from
rev 11922 (as stated above), x86 architecture, gtk 3.2.3

Here's the backtrace from gdb:


(gdb) bt
#0  runtime.sigtramp (sig=void, info=void, context=void) at
/home/nsf/go/src/pkg/runtime/sys_linux_386.s:176
#1  0x0805842b in runtime.sigtramp (sig=void, info=void, context=void) at
/home/nsf/go/src/pkg/runtime/sys_linux_386.s:195
#2  0x00000011 in ?? ()
#3  0xb47fe99c in ?? ()
#4  0x00000000 in ?? ()

And it seems that app creates a bunch of threads (could be related or not):
(gdb) info threads
  Id   Target Id         Frame 
* 5    Thread 0xb47ffb40 (LWP 1903) "test" runtime.sigtramp (sig=void,
info=void, context=void)
    at /home/nsf/go/src/pkg/runtime/sys_linux_386.s:176
  4    Thread 0xb53e2b40 (LWP 1902) "test" 0xb7fdd424 in __kernel_vsyscall ()
  3    Thread 0xb5be3b40 (LWP 1901) "test" 0xb7fdd424 in __kernel_vsyscall ()
  2    Thread 0xb6f48b40 (LWP 1900) "test" 0xb7fdd424 in __kernel_vsyscall ()
  1    Thread 0xb7089800 (LWP 1897) "test" 0xb747d026 in _int_free () from /lib/libc.so.6


If you have no idea what this is, I can also try to dig into gtk3 and remove
it from the test case (reproducing the bug with simple libraries only, like pthreads).
But I think it will be quite hard to do.

P.S. The same code in C runs fine:
---------------------------------------------------------------------
[nsf @ go-test]$ cat test.c
#include <gtk/gtk.h>

int main(int argc, char **argv)
{
        gtk_init(0, 0);
        gtk_file_chooser_button_new(0, 0);
}

[nsf @ go-test]$ gcc -o test test.c `pkg-config --cflags --libs gtk+-3.0`
[nsf @ go-test]$ ./test
[nsf @ go-test]$ gdb --quiet ./test
Reading symbols from /home/nsf/tmp/go-test/test...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/nsf/tmp/go-test/test 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
[New Thread 0xb6745b40 (LWP 2034)]
[New Thread 0xb5dffb40 (LWP 2035)]
[New Thread 0xb53ffb40 (LWP 2036)]
[Thread 0xb53ffb40 (LWP 2036) exited]
[Thread 0xb5dffb40 (LWP 2035) exited]
[Thread 0xb6745b40 (LWP 2034) exited]
[Inferior 1 (process 2031) exited with code 0240]
(gdb) quit
---------------------------------------------------------------------

Activity

bradfitz

bradfitz commented on Mar 8, 2012

@bradfitz
Contributor

Comment 1:

You may have to runtime.LockOSThread to interact with GTK's event loop?
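
For reference, a minimal sketch of what pinning work to one OS thread looks like in Go. The helper name runLocked is hypothetical and not part of any library; only runtime.LockOSThread and runtime.UnlockOSThread are real runtime calls. This only illustrates the suggestion above and does not imply that it resolves the crash reported here.

```go
package main

import (
	"fmt"
	"runtime"
)

// runLocked (hypothetical helper) runs f on a goroutine that stays pinned
// to a single OS thread for the duration of the call, then waits for it.
func runLocked(f func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		f()
	}()
	<-done
}

func main() {
	ran := false
	runLocked(func() {
		// Calls into a thread-affine C library would go here.
		ran = true
	})
	fmt.Println("ran on locked thread:", ran)
}
```

Locking only controls which thread the Go code runs on; it has no effect on threads that a C library creates on its own.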
nsf

nsf commented on Mar 8, 2012

@nsf
Author

Comment 2:

1. This code doesn't use GTK's event loop. It's explicitly started with gtk_main usually.
2. Other code works fine (20+ demos using different widgets and events).
3. I tried runtime.LockOSThread(). And tried running the code above in the 'init'
function. Doesn't help.
rsc

rsc commented on Mar 8, 2012

@rsc
Contributor

Comment 3:

I believe that gtk is creating some thread and then that thread
gets a signal, and then the Go signal handler is invoked.
I am not sure what to do about this.  We like our signal handlers,
but they can't cope with being invoked on non-Go threads.
We could ignore such signals easily enough, but perhaps
gtk is really trying to handle that signal (or maybe it's a SIGSEGV
or something).
Russ

Labels changed: added priority-go1, removed priority-triage.

Owner changed to builder@golang.org.

Status changed to Accepted.

nsf

nsf commented on Mar 9, 2012

@nsf
Author

Comment 4:

Just want to mention that the issue is most likely related to gtk's DBus usage. On the
client side, it seems that the only signal it touches is SIGPIPE. It has code:
  #if HAVE_DECL_MSG_NOSIGNAL
  static dbus_bool_t _dbus_modify_sigpipe = FALSE;
  #else
  static dbus_bool_t _dbus_modify_sigpipe = TRUE;
  #endif
And then on connection opening it does:
  if (_dbus_modify_sigpipe)
    _dbus_disable_sigpipe ();
Which in turn results in a function call (if true):
  void _dbus_disable_sigpipe (void)
  {
    signal (SIGPIPE, SIG_IGN);
  }
On my machine I know it doesn't run _dbus_disable_sigpipe, maybe that's the issue.
Honestly I'm not an expert on how signals work in linux.
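
To illustrate what ignoring SIGPIPE buys a program, here is a rough Go-side sketch of the same idea as signal(SIGPIPE, SIG_IGN): with the signal ignored, a write to a pipe with no reader is expected to fail with EPIPE rather than kill the process. It uses signal.Ignore from os/signal, which exists only in Go releases later than the ones discussed in this thread. The function name gotEPIPE is hypothetical, and this is not a claim about what dbus or gtk actually do.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// gotEPIPE (hypothetical helper) ignores SIGPIPE, writes to a pipe whose
// read end is already closed, and reports whether the write failed with EPIPE.
func gotEPIPE() bool {
	signal.Ignore(syscall.SIGPIPE)

	r, w, err := os.Pipe()
	if err != nil {
		return false
	}
	defer w.Close()
	r.Close() // no reader left: the next write hits a broken pipe

	_, err = w.Write([]byte("x"))
	return errors.Is(err, syscall.EPIPE)
}

func main() {
	fmt.Println("write failed with EPIPE:", gotEPIPE())
}
```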
rsc

rsc commented on Mar 9, 2012

@rsc
Contributor

Comment 5:

Can you run your program under strace -f to find which signal is being
delivered?
If it is only SIGPIPE, we might be able to do a simple workaround for Go 1.
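
As a complement to strace, a small sketch of observing a signal from inside a Go program with os/signal. It assumes a Unix system and only sees signals that reach the Go runtime on threads it created, so it would not by itself diagnose a signal arriving on a foreign thread. The function name sawSelfSIGWINCH is hypothetical, and SIGWINCH was picked only because its default action is harmless.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// sawSelfSIGWINCH (hypothetical helper) subscribes to SIGWINCH, sends that
// signal to the current process, and reports whether it was observed on
// the channel before a two-second timeout.
func sawSelfSIGWINCH() bool {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGWINCH)
	defer signal.Stop(ch)

	if err := syscall.Kill(syscall.Getpid(), syscall.SIGWINCH); err != nil {
		return false
	}
	select {
	case got := <-ch:
		return got == syscall.SIGWINCH
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("observed SIGWINCH:", sawSelfSIGWINCH())
}
```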
nsf

nsf commented on Mar 9, 2012

@nsf
Author

Comment 6:

The worst part is that it runs fine under strace/ltrace. It did die once, but only
once, and I wasn't able to repeat that under strace; see the second (segfault) log file.
But I'm afraid it won't be very helpful.

Attachments:

  1. strace-log.txt (160911 bytes)
  2. strace-log-segfault.txt (153031 bytes)
nsf

nsf commented on Mar 9, 2012

@nsf
Author

Comment 7:

Hm.. it dies often if I run strace without the "-o" option (which writes output to a
file); here are the two variants of dying:
SIGPIPE:
[pid 11326] read(3,
"\1\10\v\0\22\0\0\0\37\0\0\0\0\0\0\0H\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 104
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
[pid 11326] writev(3, [{"\24\0\6\0\1\0@\1\212\1\0\0\6\0\0\0\0\0\0\0\4\0\0\0", 24},
{NULL, 0}, {"", 0}], 3) = 24
[pid 11326] poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
[pid 11326] read(3, "\1 \f\0\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 36
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] read(3, 0x89d5b08, 4096)    = -1 EAGAIN (Resource temporarily unavailable)
[pid 11326] write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = -1 EPIPE (Broken pipe)
[pid 11326] --- {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11326, si_uid=1000,
si_value={int=3076563288, ptr=0xb760a158}} (Broken pipe) ---
Process 11315 resumed
Process 11326 detached
Process 11315 detached
SIGCHLD? (but here it seems almost finished, we can see the last close calls):
[pid 11678] write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = 73
[pid 11678] write(1, "\370)\0\0", 4)    = 4
[pid 11678] write(1, "\1\0@\1", 4)      = 4
[pid 11678] close(1)                    = 0
[pid 11678] close(2)                    = 0
[pid 11678] exit_group(0)               = ?
Process 11678 detached
[pid 11677] <... select resumed> )      = 1 (in [9])
[pid 11677] --- {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=11678, si_status=0,
si_utime=0, si_stime=0} (Child exited) ---
[pid 11677] --- {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x2c} (Segmentation
fault) ---
Process 11677 detached
[pid 11676] +++ killed by SIGSEGV +++
[pid 11675] +++ killed by SIGSEGV +++
[pid 11674] +++ killed by SIGSEGV +++
+++ killed by SIGSEGV +++
And normally it runs fine:
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{"\24\0\6\0\1\0@\1\212\1\0\0\6\0\0\0\0\0\0\0\4\0\0\0", 24}, {NULL, 0}, {"",
0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
read(3, "\1 \f\0\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 36
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x9842b08, 4096)                = -1 EAGAIN (Resource temporarily unavailable)
write(1, "unix:abstract=/tmp/dbus-LOsdVe90"..., 73) = 73
write(1, "\370)\0\0", 4)                = 4
write(1, "\1\0@\1", 4)                  = 4
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
Process 11746 detached
rsc

rsc commented on Mar 12, 2012

@rsc
Contributor

Comment 8:

I created CL 5797068 to at least diagnose the problem better.
I do not believe we will be able to fix this for Go 1.

Labels changed: added priority-later, removed priority-go1.

nsf

nsf commented on Mar 12, 2012

@nsf
Author

Comment 9:

[nsf @ go-test]$ ./test
runtime: signal received on thread not created by Go.
Segmentation fault
Clearly that's the case.. Will look forward to a fix.
gopherbot

gopherbot commented on Jun 12, 2013

@gopherbot
Contributor

Comment 10 by joshrickmar:

I've run into this issue as well, trying to add a GtkEntry to a container (code
attached) with GTK 3.8 on OpenBSD.  I'm running go tip (changeset 8519983c00e8), and the
process no longer crashes, but produces a more useful error message:
runtime: signal received on thread not created by Go: SIGCHLD: child status has changed
Is there any way with the current source that we can catch this and ignore it, since we
no longer crash?

Attachments:

  1. test.go (297 bytes)
minux

minux commented on Jun 12, 2013

@minux
Member

Comment 11:

as a workaround, you could just add a return statement in function runtime.badsignal in
src/pkg/runtime/os_$GOOS.c to ignore any signals received on foreign threads.
gopherbot

gopherbot commented on Jun 12, 2013

@gopherbot
Contributor

Comment 12 by joshrickmar:

With the exception of processes ignoring my SIGINTs, that's a pretty good fix.
Here's the signals that I'm seeing with GTK and my test, now that it's not quitting
immediately:
runtime: signal received on thread not created by Go: SIGTERM: termination
runtime: signal received on thread not created by Go: SIGWINCH: window size change
Both of these signals, as well as a few others, have the default action of being ignored
(according to signal(3)).  Should this be fixed by listening for these signals, and if
they are sent, to ignore them completely?
gopherbot

gopherbot commented on Jun 13, 2013

@gopherbot
Contributor

Comment 13 by joshrickmar:

Oops, that should be:
runtime: signal received on thread not created by Go: SIGCHLD: child status has changed
runtime: signal received on thread not created by Go: SIGWINCH: window size change
gopherbot

gopherbot commented on Jun 13, 2013

@gopherbot
Contributor

Comment 14 by joshrickmar:

Here's a quick patch I put together, that ignores those signals that have no default
action.  I only modified the OpenBSD files, but the other platforms should have a
similar fix.  I've had no issues making GTK3 calls with cgo with this patch.

Attachments:

  1. ignore-signals.patch (1609 bytes)
minux

minux commented on Jul 11, 2013

@minux
Member

Comment 15:

This issue was closed by revision 2f1ead7.

Status changed to Fixed.

gopherbot

gopherbot commented on Jul 22, 2015

@gopherbot
Contributor

CL https://golang.org/cl/12503 mentions this issue.

locked and limited conversation to collaborators on Aug 5, 2016