Subject: An analysis and solution for segment fault problem on Tcl module (Bug ID 53549 or 21181)
Date: Sun, 23 Apr 2017 12:44:47 +0800
To: bug-Tcl [...] rt.cpan.org
From: SJ Luo <sjaluo [...] gmail.com>
Hi,
After installing the Tcl module (version 1.05), I encountered a segfault
similar to the one reported in bug 21181.
My system is CentOS Linux 6.9 (x86_64) with Perl 5.10.1.
[crystal@great Tcl-1.05-SJ]$ perl -e 'use Tcl;'
Segmentation fault (core dumped)
However, with the environment variable PERL_DL_NONLAZY set, it works fine.
I spent some time stepping through the assembly code in gdb and have some
results. It was quite difficult to debug, so I'd like to share them here.
In short, it is caused by a symbol name conflict: both libtclstub8.4.a and
libtcl8.5.so define a function named Tcl_InitStub(), and the wrong one is
being called.
In NpInitialize() in Tcl.xs, libtcl8.5.so is loaded via dlopen() before the
address of Tcl_InitStub() is resolved; by default that resolution happens
on the first call to the function (see the first note below). The dynamic
linker therefore resolves the symbol to the definition in libtcl8.5.so
instead of the expected one in libtclstub8.4.a, which causes the
segmentation fault in the code that follows.
When PERL_DL_NONLAZY is set to 1, the address of Tcl_InitStub() is
resolved during dlopen("tcl.so", RTLD_NOW), while libtcl8.5.so has not
been loaded yet. Therefore the correct function address is resolved and
called.
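The difference between the two runs comes down to the mode passed to
dlopen(). The following is a minimal sketch, not Perl's actual DynaLoader
code, of choosing that mode from PERL_DL_NONLAZY; choose_dlopen_mode() is a
hypothetical name used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <dlfcn.h>   /* RTLD_LAZY, RTLD_NOW */
#include <stdlib.h>  /* getenv, atoi */

/* Hypothetical helper: pick the dlopen() mode as the report describes.
 * RTLD_NOW binds every function reference that goes through the PLT
 * (including the one to Tcl_InitStub) while the object is being opened,
 * i.e. before libtcl8.5.so is loaded. RTLD_LAZY defers each binding to
 * the first call, by which time libtcl8.5.so is already loaded. */
static int choose_dlopen_mode(void)
{
    const char *v = getenv("PERL_DL_NONLAZY");
    if (v != NULL && atoi(v) != 0)
        return RTLD_NOW;
    return RTLD_LAZY;
}
```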
I think this might be a bug in gcc or the linker rather than in the Tcl
module. Some notes:
- Although libtclstub8.4.a is statically linked, the linker still calls
Tcl_InitStub() through the dynamic-linking mechanism (the GOT/PLT tables).
I don't know why it is done this way. Perhaps there is some way to make it
truly statically bound at compile time. My gcc version is 4.4.7.
- In theory, initstubs is a static function pointer that should be
assigned correctly when Tcl.so is initialized. However, gcc optimized the
code: because it assumed that neither the address of Tcl_InitStub() nor
initstubs changes during the program's lifetime, the initstubs variable
was eliminated at compile time. Where the source calls initstubs(), the
executable simply calls Tcl_InitStub() directly rather than through the
function pointer.
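Regarding the first note, one likely explanation (not verified on this
system): code compiled with -fPIC for a shared object calls
default-visibility global functions through the PLT even when the
definition is linked into the same .so, because ELF lets another library
interpose such symbols. Giving the function hidden visibility should let
the linker bind the call locally; GNU ld's -Bsymbolic or
--exclude-libs,ALL options are possible alternatives, untested here. This
sketch uses hypothetical names (hidden_stub_init, call_stub_init) and a
made-up signature, not Tcl's real declaration.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical stand-in for a function pulled in from a static stub
 * archive. visibility("hidden") keeps it out of the shared object's
 * dynamic symbol table, so calls from inside the same .so are bound at
 * link time instead of through the PLT, and a library loaded later
 * cannot interpose its own definition. */
__attribute__((visibility("hidden")))
const char *hidden_stub_init(void *interp, const char *version, int exact)
{
    (void)interp;
    (void)exact;
    return version;
}

/* Call site inside the same shared object. */
const char *call_stub_init(void)
{
    return hidden_stub_init(NULL, "8.4", 0);
}
```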
I worked around this issue by adding the following line to NpInitialize(),
right after the variable declarations:
if( initstubs == NULL ) initstubs = NULL;
This line makes gcc assume that initstubs might change, so it actually
allocates storage for its value. initstubs then really exists and is set
correctly when Tcl.so is initialized.
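A possibly cleaner way to get the same effect, assuming the goal is only to
stop gcc from folding the pointer away, is to declare the pointer object
itself volatile. This is a sketch with hypothetical names (stub_init,
np_initialize_sketch) standing in for Tcl_InitStub() and NpInitialize();
it has not been tested against Tcl.xs.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical stand-in for Tcl_InitStub(); not Tcl's real signature. */
const char *stub_init(void *interp, const char *version, int exact)
{
    (void)interp;
    (void)exact;
    return version;
}

/* File-scope pointer initialized with the function's address, like
 * initstubs. Its initializer is a data relocation, which the loader fills
 * in when the shared object is loaded rather than lazily at first call.
 * The volatile qualifier applies to the pointer object, so gcc must keep
 * it in memory and read it at every call instead of turning the call into
 * a direct call to stub_init(). */
static const char *(*volatile initstubs)(void *, const char *, int) = stub_init;

/* Hypothetical analogue of the call made in NpInitialize(). */
const char *np_initialize_sketch(void)
{
    return initstubs(NULL, "8.4", 0);
}
```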
Now the Tcl module works perfectly on my system. I hope this helps improve
the module's quality further.
Thanks,
SJ