CC: | Tim.Bunce [...] pobox.com, bug-devel-nytprof [...] rt.cpan.org |
Subject: | $SIG{__DIE__} Re: Core dump in NYTProf under mod_perl / Mason |
Date: | Tue, 2 Jul 2013 15:17:21 +0100 |
To: | Larry Bennett <larry.bennett [...] photobox.com> |
From: | Tim Bunce <Tim.Bunce [...] pobox.com> |
On Tue, Jul 02, 2013 at 02:23:55PM +0100, Larry Bennett wrote:
Show quoted text
> Hi Tim,
>
> I work at PhotoBox alongside Zefram. I'm setting up NYTProf to get some stats for our mod_perl /
> Mason-based front-end web servers. When we enable NYTProf, the Apache child processes are segfaulting.
>
> We've managed to get a decent core dump from one of the crashes in live. I wonder if you could have a
> look and see if it rings any bells or suggests avenues to explore?
>
> The gdb backtrace from the core dump is below. I also printed out a couple key variables below the
> trace. The immediate cause of the crash is de-referencing a variable p that is copied from the
> subr_entry passed from Perl, which has an apparently unexpected NULL value in called_subpkg_pv.
>
> We are running NYTProf 5.03. We set the NYTPROF env variable to:
> file=/tmp/nytprof_scratch.out:endatexit=1:sigexit=1:libcexit=1:start=no
>
> (That libcexit is in anticipation of Zefram's patch.)
> The crash happens on our live servers but sadly never in our dev evironments. It takes about 30 seconds
> in live before the crashes start.
>
> If I should open an RT ticket or report this problem in some other location let me know.
I've CC'd this to RT - that should create a case.
The stack trace looks like die() being called inside a $SIG{__DIE__} hook.
So printing $@ might be useful for yourselves.
Meanwhile I suspect we'll have to debug this by email round-trips.
See below.
Show quoted text> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f49d796bc56 in incr_sub_inclusive_time (subr_entry=0x7f49e18dd5e0) at NYTProf.xs:2060
> 2060 while (*p)
> (gdb)
> (gdb)
> (gdb) bt
> #0 0x00007f49d796bc56 in incr_sub_inclusive_time (subr_entry=0x7f49e18dd5e0) at NYTProf.xs:2060
> #1 0x00007f49d82ff429 in Perl_leave_scope (base=2872) at scope.c:1134
> #2 0x00007f49d830e58f in Perl_die_unwind (msv=<value optimized out>) at pp_ctl.c:1765
> #3 0x00007f49d82b8db5 in Perl_croak_sv (baseex=<value optimized out>) at util.c:1683
> #4 0x00007f49d82b8dc9 in Perl_die_sv (baseex=0x0) at util.c:1616
> #5 0x00007f49d8321a98 in Perl_pp_die () at pp_sys.c:510
> #6 0x00007f49d82d15a3 in Perl_runops_standard () at run.c:41
> #7 0x00007f49d826e177 in Perl_call_sv (sv=<value optimized out>, flags=6) at perl.c:2690
> #8 0x00007f49d82b74d6 in S_invoke_exception_hook (ex=<value optimized out>,
> Perl_warn=<value optimized out>) at util.c:1594
> #9 0x00007f49d82b9620 in Perl_vcroak (pat=<value optimized out>, args=<value optimized out>)
> at util.c:1715
> #10 0x00007f49d82bc71d in Perl_die (pat=0x0) at util.c:1650
> #11 0x00007f49d82d2767 in Perl_pp_entersub () at pp_hot.c:2606
> #12 0x00007f49d79729cc in pp_subcall_profiler (is_slowop=0) at NYTProf.xs:2615
> #13 0x00007f49d82d15a3 in Perl_runops_standard () at run.c:41
> #14 0x00007f49d826e523 in Perl_call_sv (sv=<value optimized out>, flags=10) at perl.c:2705
> #15 0x00007f49d85ad8e7 in modperl_callback (handler=0x7f49e105e780, p=0x7f49e0faa8b8,
> r=<value optimized out>, s=0x7f49e105f470, args=<value optimized out>)
> at modperl_callback.c:101
> #16 0x00007f49d85ae0cb in modperl_callback_run_handlers (idx=<value optimized out>, type=-80,
> r=<value optimized out>, c=<value optimized out>, s=<value optimized out>,
> pconf=<value optimized out>, plog=0x0, ptemp=0x0, run_mode=MP_HOOK_RUN_FIRST)
> at modperl_callback.c:262
> #17 0x00007f49d85ae65f in modperl_callback_per_dir (idx=0, r=<value optimized out>,
> run_mode=<value optimized out>) at modperl_callback.c:369
> #18 0x00007f49d85a9bdb in modperl_response_handler_run (r=0x7f49e0faa930) at mod_perl.c:1000
> #19 0x00007f49d85a9d22 in modperl_response_handler_cgi (r=0x7f49e0faa930) at mod_perl.c:1099
> #20 0x00007f49dbf0a200 in ap_run_handler (r=0x7f49e0faa930) at config.c:158
> #21 0x00007f49dbf0dace in ap_invoke_handler (r=0x7f49e0faa930) at config.c:376
> #22 0x00007f49dbf1b3c8 in ap_process_request (r=0x7f49e0faa930) at http_request.c:282
> #23 0x00007f49dbf18278 in ap_process_http_connection (c=0x7f49e124de70) at http_core.c:190
> #24 0x00007f49dbf11d28 in ap_run_process_connection (c=0x7f49e124de70) at connection.c:43
> #25 0x00007f49dbf1fd87 in child_main (child_num_arg=<value optimized out>) at prefork.c:662
> #26 0x00007f49dbf2009a in make_child (s=0x7f49dc89bb18, slot=2) at prefork.c:758
> #27 0x00007f49dbf203cb in startup_children (_pconf=<value optimized out>,
> plog=<value optimized out>, s=<value optimized out>) at prefork.c:776
> #28 ap_mpm_run (_pconf=<value optimized out>, plog=<value optimized out>, s=<value optimized out>)
> at prefork.c:997
> #29 0x00007f49dbef6410 in main (argc=5, argv=0x7fffd0f77018) at main.c:740
> (gdb)
> (gdb)
> (gdb) p p
> $1 = 0x0
> (gdb)
> (gdb)
> (gdb) p *subr_entry
> $2 = {already_counted = 1, subr_prof_depth = 14, subr_call_seqn = 7797,
> prev_subr_entry_ix = 21488, initial_call_timeofday = {tv_sec = 23353010, tv_nsec = 450637351},
> initial_call_cputimes = {tms_utime = 0, tms_stime = 0, tms_cutime = 0, tms_cstime = 0},
> initial_overhead_ticks = 161933, initial_subr_ticks = 3454823, caller_fid = 131,
> caller_line = 112, caller_subpkg_pv = 0x7f49dfe5b060 "HTML::Mason::Commands",
> caller_subnam_sv = 0x7f49e2024cd0, called_cv = 0x0, called_cv_depth = 0, called_is_xs = 0x0,
> called_subpkg_pv = 0x0, called_subnam_sv = 0x7f49e20292f0, hide_subr_call_time = 0}
> ================================================================================================
That looks reasonable, except for the null called_subpkg_pv.
The place to look is subr_entry_setup().
called_is_xs being null implies it took the
if (op_type == OP_ENTERSUB || op_type == OP_GOTO)
branch, which is sane.
called_cv being null implies that resolve_sub_to_cv() failed.
In which case we end up in
else {
subr_entry->called_subnam_sv = newSV(0); /* see incr_sub_inclusive_time */
}
which doesn't set subr_entry->called_subpkg_pv. Suspicious!
Per the comment I look in incr_sub_inclusive_time() and see:
if (subr_entry->called_subnam_sv == &PL_sv_undef) {
if (trace_level)
logwarn("Don't know name of called sub, assuming xsub/builtin exited via an exception (which isn't handled yet)\n");
subr_entry->already_counted++;
}
if (subr_entry->already_counted) {
subr_entry_destroy(aTHX_ subr_entry);
return;
}
The warn message seems to fit the stack trace.
So I'd *guess* that the right fix is to change the newSV(0) to &PL_sv_undef.
Let me know if that works.
Tim.
p.s. Also, perhaps you or Zefram, could see if resolve_sub_to_cv()
could be improved to handle this case. Thanks!