Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the devel-nytprof CPAN distribution.

Report information
The Basics
Id: 86638
Status: resolved
Priority: 0/
Queue: devel-nytprof

People
Owner: Nobody in particular
Requestors: Tim.Bunce [...] pobox.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: Tim.Bunce [...] pobox.com, bug-devel-nytprof [...] rt.cpan.org
Subject: $SIG{__DIE__} Re: Core dump in NYTProf under mod_perl / Mason
Date: Tue, 2 Jul 2013 15:17:21 +0100
To: Larry Bennett <larry.bennett [...] photobox.com>
From: Tim Bunce <Tim.Bunce [...] pobox.com>
On Tue, Jul 02, 2013 at 02:23:55PM +0100, Larry Bennett wrote: Show quoted text
> Hi Tim, > > I work at PhotoBox alongside Zefram. I'm setting up NYTProf to get some stats for our mod_perl / > Mason-based front-end web servers. When we enable NYTProf, the Apache child processes are segfaulting. > > We've managed to get a decent core dump from one of the crashes in live. I wonder if you could have a > look and see if it rings any bells or suggests avenues to explore? > > The gdb backtrace from the core dump is below. I also printed out a couple key variables below the > trace. The immediate cause of the crash is de-referencing a variable p that is copied from the > subr_entry passed from Perl, which has an apparently unexpected NULL value in called_subpkg_pv. > > We are running NYTProf 5.03. We set the NYTPROF env variable to: > file=/tmp/nytprof_scratch.out:endatexit=1:sigexit=1:libcexit=1:start=no > > (That libcexit is in anticipation of Zefram's patch.) > The crash happens on our live servers but sadly never in our dev evironments. It takes about 30 seconds > in live before the crashes start. > > If I should open an RT ticket or report this problem in some other location let me know.
I've CC'd this to RT - that should create a case. The stack trace looks like die() being called inside a $SIG{__DIE__} hook. So printing $@ might be useful for yourselves. Meanwhile I suspect we'll have to debug this by email round-trips. See below. Show quoted text
> Program terminated with signal 11, Segmentation fault. > #0 0x00007f49d796bc56 in incr_sub_inclusive_time (subr_entry=0x7f49e18dd5e0) at NYTProf.xs:2060 > 2060 while (*p) > (gdb) > (gdb) > (gdb) bt > #0 0x00007f49d796bc56 in incr_sub_inclusive_time (subr_entry=0x7f49e18dd5e0) at NYTProf.xs:2060 > #1 0x00007f49d82ff429 in Perl_leave_scope (base=2872) at scope.c:1134 > #2 0x00007f49d830e58f in Perl_die_unwind (msv=<value optimized out>) at pp_ctl.c:1765 > #3 0x00007f49d82b8db5 in Perl_croak_sv (baseex=<value optimized out>) at util.c:1683 > #4 0x00007f49d82b8dc9 in Perl_die_sv (baseex=0x0) at util.c:1616 > #5 0x00007f49d8321a98 in Perl_pp_die () at pp_sys.c:510 > #6 0x00007f49d82d15a3 in Perl_runops_standard () at run.c:41 > #7 0x00007f49d826e177 in Perl_call_sv (sv=<value optimized out>, flags=6) at perl.c:2690 > #8 0x00007f49d82b74d6 in S_invoke_exception_hook (ex=<value optimized out>, > Perl_warn=<value optimized out>) at util.c:1594 > #9 0x00007f49d82b9620 in Perl_vcroak (pat=<value optimized out>, args=<value optimized out>) > at util.c:1715 > #10 0x00007f49d82bc71d in Perl_die (pat=0x0) at util.c:1650 > #11 0x00007f49d82d2767 in Perl_pp_entersub () at pp_hot.c:2606 > #12 0x00007f49d79729cc in pp_subcall_profiler (is_slowop=0) at NYTProf.xs:2615 > #13 0x00007f49d82d15a3 in Perl_runops_standard () at run.c:41 > #14 0x00007f49d826e523 in Perl_call_sv (sv=<value optimized out>, flags=10) at perl.c:2705 > #15 0x00007f49d85ad8e7 in modperl_callback (handler=0x7f49e105e780, p=0x7f49e0faa8b8, > r=<value optimized out>, s=0x7f49e105f470, args=<value optimized out>) > at modperl_callback.c:101 > #16 0x00007f49d85ae0cb in modperl_callback_run_handlers (idx=<value optimized out>, type=-80, > r=<value optimized out>, c=<value optimized out>, s=<value optimized out>, > pconf=<value optimized out>, plog=0x0, ptemp=0x0, run_mode=MP_HOOK_RUN_FIRST) > at modperl_callback.c:262 > #17 0x00007f49d85ae65f in modperl_callback_per_dir (idx=0, r=<value optimized out>, > run_mode=<value optimized out>) at modperl_callback.c:369 > #18 0x00007f49d85a9bdb in modperl_response_handler_run (r=0x7f49e0faa930) at mod_perl.c:1000 > #19 0x00007f49d85a9d22 in modperl_response_handler_cgi (r=0x7f49e0faa930) at mod_perl.c:1099 > #20 0x00007f49dbf0a200 in ap_run_handler (r=0x7f49e0faa930) at config.c:158 > #21 0x00007f49dbf0dace in ap_invoke_handler (r=0x7f49e0faa930) at config.c:376 > #22 0x00007f49dbf1b3c8 in ap_process_request (r=0x7f49e0faa930) at http_request.c:282 > #23 0x00007f49dbf18278 in ap_process_http_connection (c=0x7f49e124de70) at http_core.c:190 > #24 0x00007f49dbf11d28 in ap_run_process_connection (c=0x7f49e124de70) at connection.c:43 > #25 0x00007f49dbf1fd87 in child_main (child_num_arg=<value optimized out>) at prefork.c:662 > #26 0x00007f49dbf2009a in make_child (s=0x7f49dc89bb18, slot=2) at prefork.c:758 > #27 0x00007f49dbf203cb in startup_children (_pconf=<value optimized out>, > plog=<value optimized out>, s=<value optimized out>) at prefork.c:776 > #28 ap_mpm_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>) > at prefork.c:997 > #29 0x00007f49dbef6410 in main (argc=5, argv=0x7fffd0f77018) at main.c:740 > (gdb) > (gdb) > (gdb) p p > $1 = 0x0 > (gdb) > (gdb) > (gdb) p *subr_entry > $2 = {already_counted = 1, subr_prof_depth = 14, subr_call_seqn = 7797, > prev_subr_entry_ix = 21488, initial_call_timeofday = {tv_sec = 23353010, tv_nsec = 450637351}, > initial_call_cputimes = {tms_utime = 0, tms_stime = 0, tms_cutime = 0, tms_cstime = 0}, > initial_overhead_ticks = 161933, initial_subr_ticks = 3454823, caller_fid = 131, > caller_line = 112, caller_subpkg_pv = 0x7f49dfe5b060 "HTML::Mason::Commands", > caller_subnam_sv = 0x7f49e2024cd0, called_cv = 0x0, called_cv_depth = 0, called_is_xs = 0x0, > called_subpkg_pv = 0x0, called_subnam_sv = 0x7f49e20292f0, hide_subr_call_time = 0} > ================================================================================================
That looks reasonable, except for the null called_subpkg_pv. The place to look is subr_entry_setup(). called_is_xs being null implies it took the if (op_type == OP_ENTERSUB || op_type == OP_GOTO) branch, which is sane. called_cv being null implies that resolve_sub_to_cv() failed. In which case we end up in else { subr_entry->called_subnam_sv = newSV(0); /* see incr_sub_inclusive_time */ } which doesn't set subr_entry->called_subpkg_pv. Suspicious! Per the comment I look in incr_sub_inclusive_time() and see: if (subr_entry->called_subnam_sv == &PL_sv_undef) { if (trace_level) logwarn("Don't know name of called sub, assuming xsub/builtin exited via an exception (which isn't handled yet)\n"); subr_entry->already_counted++; } if (subr_entry->already_counted) { subr_entry_destroy(aTHX_ subr_entry); return; } The warn message seems to fit the stack trace. So I'd *guess* that the right fix is to change the newSV(0) to &PL_sv_undef. Let me know if that works. Tim. p.s. Also, perhaps you or Zefram, could see if resolve_sub_to_cv() could be improved to handle this case. Thanks!
Subject: Re: [rt.cpan.org #86638] $SIG{__DIE__} Re: Core dump in NYTProf under mod_perl / Mason
Date: Tue, 2 Jul 2013 16:37:18 +0100
To: bug-Devel-NYTProf [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
Some more information that I garnered from examining the core dump, and recipe to reproduce below. Not only is called_subpkg_pv null, called_subnam_sv is undef (but not &PL_sv_undef). Looks like it comes from the newSV(0) case in subr_entry_setup(), but I can't confirm that. The caller_* fields of subr_entry point at a piece of code in one of our Mason components that goes like this: # eval block protects Apache dev boxes using threaded MPM (which cannot run Apache2::SizeLimit) eval { ... my ($size, $shared) = $Apache2::SizeLimit::HOW_BIG_IS_IT->(); ... }; if ($@) { Babel::FrontEnd::Log->log('file' => 'ProcessTuning', 'msg' => "ERROR: $@" ); } } $Apache2::SizeLimit::HOW_BIG_IS_IT is never set anywhere: neither in our code nor in Apache2::SizeLimit itself. Our code is buggy on this point, but of course the eval block normally protects us from the the effects. Indeed, the ProcessTuning log shows many entries such as 2013-07-01 13:27:56 pod0 - ERROR: Can't use string ("") as a subroutine ref while "strict refs" in use at /opt/client/var/mason/obj/1978989512/public/autohandler.223805.obj line 112. The caller_* fields of subr_entry point at the same line, line 112 of the munged file. *subr_entry has been created by D:NYTP operating on the entersub op from that line; this corresponds to the pp_entersub() in C stack frame #11. $SIG{__DIE__} (PL_diehook) is set to \&HTML::Mason::Exceptions::rethrow_exception, the source of which is sub rethrow_exception { my ($err) = @_; return unless $err; if ( UNIVERSAL::can($err, 'rethrow') ) { $err->rethrow; } elsif ( ref $err ) { die $err; } HTML::Mason::Exception->throw(error => $err); } C stack frame #7 is calling this sub. The pp_die() in frame #5 must be a die from something called by HTML::Mason::Exception->throw. I can reproduce the crash in a couple of ways. $SIG{__DIE__} can die(), or return (causing resumption of standard die processing), or be entirely absent. The essential feature is the "string as subroutine ref" error generated by the Perl core. "use strict" is necessary to generate this: without it you get a different error ("Undefined subroutine called") and no crash. Here's my minimal reproduction: $ /opt/perl/bin/perl -MDevel::NYTProf -lwe 'use strict; $x::z->();' Name "x::z" used only once: possible typo at -e line 1. Use of uninitialized value in subroutine entry at -e line 1. Use of uninitialized value in subroutine entry at -e line 1. Use of uninitialized value in subroutine entry at -e line 1. Can't use string ("") as a subroutine ref while "strict refs" in use at -e line 1. zsh: segmentation fault (core dumped) /opt/perl/bin/perl -MDevel::NYTProf -lwe 'use strict; $x::z->();' $ gdb /opt/perl/bin/perl core GNU gdb (GDB) 7.4.1-debian Copyright (C) 2012 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /opt/perl-5.16.3/bin/perl...done. [New LWP 11478] warning: Can't read pathname for load map: Input/output error. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/opt/perl/bin/perl -MDevel::NYTProf -lwe use strict; $x::z->();'. Program terminated with signal 11, Segmentation fault. #0 incr_sub_inclusive_time (subr_entry=0xfb2070) at NYTProf.xs:2066 2066 while (*p) (gdb) bt #0 incr_sub_inclusive_time (subr_entry=0xfb2070) at NYTProf.xs:2066 #1 0x00002ad166c65fe0 in Perl_leave_scope (base=base@entry=0) at scope.c:1134 #2 0x00002ad166bd645f in S_my_exit_jump () at perl.c:4951 #3 0x00002ad166bdcb02 in Perl_my_failure_exit () at perl.c:4936 #4 0x00002ad166c6f985 in Perl_die_unwind (msv=msv@entry=0xf912b0) at pp_ctl.c:1801 #5 0x00002ad166c1d438 in Perl_vcroak (pat=<optimized out>, args=args@entry=0x7fffe0f28af8) at util.c:1716 #6 0x00002ad166c1d4fc in Perl_die ( pat=pat@entry=0x2ad166cd2e38 "Can't use string (\"%-32p\"%s) as a subroutine ref while \"strict refs\" in use") at util.c:1650 #7 0x00002ad166c3eabc in Perl_pp_entersub () at pp_hot.c:2606 #8 0x00002ad167d8acaa in pp_subcall_profiler (is_slowop=0) at NYTProf.xs:2621 #9 0x00002ad166c375d3 in Perl_runops_standard () at run.c:41 #10 0x00002ad166bdc9da in S_run_body (oldscope=1) at perl.c:2402 #11 perl_run (my_perl=<optimized out>) at perl.c:2320 #12 0x0000000000400edb in main (argc=4, argv=0x7fffe0f290c8, env=0x7fffe0f290f0) at perlmain.c:120 This should be good enough as a test case. -zefram
Subject: Re: [rt.cpan.org #86638] $SIG{__DIE__} Re: Core dump in NYTProf under mod_perl / Mason
Date: Tue, 2 Jul 2013 16:52:06 +0100
To: bug-Devel-NYTProf [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
Changing called_subnam_sv to &PL_sv_undef looks dubious. pp_subcall_profiler() has various cases of sv_setpv() or sv_setpvf() on called_subnam_sv, which look like they're depending on it being a fresh undef scalar rather than the global immutable undef. Maybe you need to make the opposite change: test SvOK() rather than ==&PL_sv_undef? -zefram
Agreed. Patched, with test case (thanks for that!)
Fixed in 5.05. Thanks!