Bug #37297 for Text-Markdown: coredumps on various inputs in Perl 5.8.5

Tue Jul 01 11:04:37 2008 jflack [...] math.purdue.edu - Ticket created

Subject:	coredumps on various inputs in Perl 5.8.5
Date:	Tue, 01 Jul 2008 11:03:43 -0400
To:	bug-Text-Markdown [...] rt.cpan.org
From:	J Chapman Flack <jflack [...] math.purdue.edu>

This is not a duplicate of 36203, but 36203 could turn out to be a special case of this.

I have an installation of IkiWiki, which you can think of as a wrapper that runs Markdown on a whole bunch of files in sequence within a single Perl instance. It has been rock solid with an older version of Text::Markdown for over a year. I upgraded to IkiWiki 2.50 last week (without changing the Text::Markdown version) - still rock solid. Yesterday, I upgraded to Text::Markdown 1.0.19 to make the MultiMarkdown features available. (Currently, nothing in my wiki /uses/ any MultiMarkdown features, and the multimarkdown option is off in ikiwiki.setup, so the only change is upgrading the module to 1.0.19 and using it on a bunch of existing Markdown files that caused no problems before.) Now I get coredumps when rebuilding the wiki.

I first noticed that the coredump always occurred on the same wiki page, so I looked for something in the content of that page, but nothing I changed made a difference. I then added a shuffle() in IkiWiki::Render just to shake up the order of rendering pages, and now it doesn't dump on the same page every time, but it always dumps on some page a ways into the list. So I don't think any specific page content has anything to do with it. It acts like some small cumulative data structure damage might be happening with every page rendered until the process finally falls on its face.

When I upgraded to Text::Markdown 1.0.19, I noticed that List::MoreUtils got installed as a new prerequisite, and it includes native code - a prime suspect when coredumps are involved. But according to the List::MoreUtils docs, you can set LIST_MOREUTILS_PP in the environment to have it use pure Perl implementations instead, and that doesn't change my results, so the trouble seems to be elsewhere (or the environment variable doesn't really do what the docs say, which I haven't tried very hard to check).
Before I added the shuffle(), the symptom was always the same Segmentation fault on the same file. After the shuffle, not only does the failure occur on different files, but it can be a segfault, a bus error, or even panic messages from Perl (nice because they'll give a location in the Perl source). Here's a sample:

Attempt to free unreferenced scalar: SV 0xa98f18 at .../lib/site_perl/5.8.5/Text/Markdown.pm line 997.
panic: regfree data code '' at .../lib/site_perl/5.8.5/Text/Markdown.pm line 997.
Bus Error(coredump)

(gdb) info stack
#0 0x0008d370 in Perl_pp_entersub ()
#1 0x000843a8 in Perl_runops_standard ()
#2 0x00028ed0 in S_call_body ()
#3 0x00028bc8 in Perl_call_sv ()
#4 0x0002cbf8 in S_call_list_body ()
#5 0x0002c7e4 in Perl_call_list ()
#6 0x00058f58 in Perl_newATTRSUB ()
#7 0x00055724 in Perl_utilize ()
#8 0x0004d544 in Perl_yyparse ()
#9 0x000bb2d0 in S_doeval ()
#10 0x000bd1f0 in Perl_pp_entereval ()
#11 0x000843a8 in Perl_runops_standard ()
#12 0x000283d8 in S_run_body ()
#13 0x00027fe0 in perl_run ()
#14 0x00024c08 in main ()

Segmentation Fault(coredump)

(gdb) info stack
#0 0x0006d0a4 in Perl_pregfree ()
#1 0x000b2a18 in Perl_pp_regcomp ()
#2 0x000843a8 in Perl_runops_standard ()
#3 0x00028ed0 in S_call_body ()
#4 0x00028bc8 in Perl_call_sv ()
#5 0x0002cbf8 in S_call_list_body ()
#6 0x0002c7e4 in Perl_call_list ()
#7 0x00058f58 in Perl_newATTRSUB ()
#8 0x00055724 in Perl_utilize ()
#9 0x0004d544 in Perl_yyparse ()
#10 0x000bb2d0 in S_doeval ()
#11 0x000bd1f0 in Perl_pp_entereval ()
#12 0x000843a8 in Perl_runops_standard ()
#13 0x000283d8 in S_run_body ()
#14 0x00027fe0 in perl_run ()
#15 0x00024c08 in main ()

I haven't found any workaround, so it looks like I'll have to downgrade to get operational again.

Chapman Flack
Dept. of Mathematics
Purdue University

Tue Jul 01 11:53:13 2008 bobtfish [...] bobtfish.net - Correspondence added

Subject:	Re: [rt.cpan.org #37297] coredumps on various inputs in Perl 5.8.5
Date:	Tue, 1 Jul 2008 16:52:40 +0100
To:	bug-Text-Markdown [...] rt.cpan.org
From:	Tomas Doran <bobtfish [...] bobtfish.net>

On 1 Jul 2008, at 16:04, Chapman Flack via RT wrote:

> I haven't found any workaround so it looks like I'll have to downgrade
> to get operational again.

Thanks for the detailed bug report.

I'd guess that the coredumps are due to the regex engine (as that seems to be where most of the other problems lie), and have a feeling that perl 5.10 would fix your issues, as the regex engine has become reentrant. If there is any possibility that you can try with perl 5.10 and see if the issue has gone away for you?

Also, would you be prepared to try some of the older releases:
http://svn.kulp.ch/cpan/text_multimarkdown/tags/

I'd specifically be interested in:
- 1.0.5 (very close to the original MultiMarkdown; may show whether the issue is in Markdown itself or in MultiMarkdown)
- 1.0.5 vs 1.0.6 (merge of Markdown 1.28b)
- 1.0.16 vs 1.0.17 (refactor of the _DeTab regexes)

Cheers
Tom

Tue Jul 01 11:53:14 2008 The RT System itself - Status changed from 'new' to 'open'

Tue Jul 01 14:04:50 2008 jflack [...] math.purdue.edu - Correspondence added

Subject:	Re: [rt.cpan.org #37297] coredumps on various inputs in Perl 5.8.5
Date:	Tue, 01 Jul 2008 12:56:57 -0400
To:	bug-Text-Markdown [...] rt.cpan.org
From:	J Chapman Flack <jflack [...] math.purdue.edu>

Tomas Doran via RT wrote:

> If there is any possibility that you can try with perl 5.10 and see
> if the issue has gone away for you?
>
> Also, would you be prepared to try some of the older releases:
> http://svn.kulp.ch/cpan/text_multimarkdown/tags/

Hi,

I hope you'll forgive my trying to be as helpful as possible within time constraints. :) On the box where I'm testing, our web guy has a carefully tweaked Perl build that only gets changed with great ceremony, and I don't have a handy 5.10 to replicate the setup in.

I did try a very simple hack to Markdown.pm that actually gets my wiki rebuild to complete successfully. I fear it works only by reducing the probability of failure, not by correcting the issue, but I just did 5 complete rebuilds in a row, so the probability might be low enough to limp on for the time being.

I simply duplicated the subroutine _ProcessListItems into two identical copies named _ProcessListItemsUL and _ProcessListItemsOL and added the o (compile once only) option to the s{}{} in each one. Then I changed _DoLists so that the two call sites of _ProcessListItems passing $marker_ul actually call _ProcessListItemsUL, and likewise for the two sites that pass $marker_ol.

This way, the regex engine is still being used reentrantly (which is still like driving the wrong way down the freeway on purpose, when it was in the pre-5.10 docs not to do that, and 5.10 is still new enough that a lot of people, especially on commercial OSes or in production environments, aren't going to have it right away), but at least it's not reentrantly compiling and freeing regex objects on every call (so it's like driving the wrong way on the freeway in a Volvo). That seems to lower the risk a bit.

The Right Thing To Do is probably to rework the reentrant uses of the regex engine so they aren't. For example, _ProcessListItems can use the regex to parse out a list of items, and then in a separate step iterate over the list and recursively process the items. That should be a perfectly safe way to do it in 5.8 and in 5.10. If the reentrant version proves to be a lot faster in 5.10, maybe it can be used conditionally when running on 5.10.

Regards,
Chapman Flack
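The two-step approach suggested above (parse out the items first, then process them) might look roughly like the standalone Perl sketch below. It is hypothetical, not Text::Markdown code: process_list_items, process_item and the two marker patterns are invented for illustration, each item is assumed to fit on one line, and process_item is only a trivial stand-in for the module's recursive per-item processing (_RunSpanGamut, _DoLists).

```perl
use strict;
use warnings;

# Hypothetical marker patterns, defined only for this sketch.
my $marker_ul = qr/[*+-]/;
my $marker_ol = qr/\d+[.]/;

# Hypothetical stand-in for per-item processing. It runs its own regex,
# which is the kind of nested regex use that phase 2 makes safe.
sub process_item {
    my ($text) = @_;
    $text =~ s{\*([^*]+)\*}{<em>$1</em>}g;
    return $text;
}

sub process_list_items {
    my ($list_str, $marker) = @_;

    # Phase 1: collect the raw item text (one item per line, by assumption).
    # The loop body only saves a capture, so nothing re-enters this sub or
    # runs another regex while the outer /g match still depends on its state.
    my @items;
    while ($list_str =~ m{ ^ [ \t]* (?:$marker) [ \t]+ (.*) }gmx) {
        push @items, $1;
    }

    # Phase 2: the match loop has finished, so per-item work (including any
    # recursion into nested lists) can use the regex engine freely.
    return join '', map { "<li>" . process_item($_) . "</li>\n" } @items;
}

print process_list_items("* one\n* *two*\n", $marker_ul);
print process_list_items("1. first\n2. second\n", $marker_ol);
```

The design point is that all the recursive work happens after the outer match has completed, rather than inside an s{}{}e replacement that is still in progress.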

Thu Jul 03 20:05:26 2008 bobtfish [...] bobtfish.net - Correspondence added

Subject:	Re: [rt.cpan.org #37297] coredumps on various inputs in Perl 5.8.5
Date:	Thu, 3 Jul 2008 16:40:42 +0100
To:	bug-Text-Markdown [...] rt.cpan.org
From:	Tomas Doran <bobtfish [...] bobtfish.net>

On 1 Jul 2008, at 19:04, Chapman Flack via RT wrote:

> I hope you'll forgive my trying to be as helpful as possible within
> time constraints. :)

Of course - this is the real world after all, and it's awesome to hear from people who are using my code in production. ;)

> On the box where I'm testing, our web guy has
> a carefully tweaked Perl build that only gets changed with great
> ceremony, and I don't have a handy 5.10 to replicate the setup in.

That's totally fair.

> I did try a very simple hack to Markdown.pm that actually gets my
> wiki rebuild to complete successfully. I fear it works only by
> reducing the probability of failure, not by correcting the issue,
> but I just did 5 complete rebuilds in a row, so the probability
> might be low enough to limp on for the time being.
>
> I simply duplicated the subroutine _ProcessListItems into two
> identical copies named _ProcessListItemsUL and _ProcessListItemsOL
> and added the o (compile once only) option to the s{}{} in each one.
> Then I changed _DoLists so that the two call sites of _ProcessListItems
> passing $marker_ul actually call _ProcessListItemsUL and likewise for
> the two sites that pass $marker_ol.

Can you send me the code for this, even if it's not 'the correct solution'? Given that it still passes the test suite, I'll push out a point release to fix this issue for anyone else who's seeing it.

> This way, the regex engine is still being used reentrantly (which is
> still like driving the wrong way down the freeway on purpose, when
> it was in the pre-5.10 docs not to do that, and 5.10 is still new enough
> that a lot of people, especially on commercial OSes or in production
> environments, aren't going to have it right away), but at least it's
> not reentrantly compiling and freeing regex objects on every call
> (so it's like driving the wrong way on the freeway in a Volvo). That
> seems to lower the risk a bit.
>
> The Right Thing To Do is probably to rework the reentrant uses of the
> regex engine so they aren't. For example, _ProcessListItems can use the
> regex to parse out a list of items, and then in a separate step iterate
> over the list and recursively process the items. That should be a
> perfectly safe way to do it in 5.8 and in 5.10. If the reentrant
> version proves to be a lot faster in 5.10, maybe it can be used
> conditionally when running on 5.10.

I agree that this is a more correct approach to getting round the problem, and I'd like to fix this, but unfortunately Markdown is a spare-time, not a paid-time, activity, so I don't have as much time to hack on it as I'd like.

Cheers
Tom

Mon Jul 07 09:55:49 2008 jflack [...] math.purdue.edu - Correspondence added

Subject:	Re: [rt.cpan.org #37297] coredumps on various inputs in Perl 5.8.5
Date:	Mon, 07 Jul 2008 09:55:13 -0400
To:	bug-Text-Markdown [...] rt.cpan.org
From:	J Chapman Flack <jflack [...] math.purdue.edu>

Tomas Doran via RT wrote:

> Can you send me the code for this, even if it's not 'the correct
> solution', as (given it still passes the test suite), I'll push out a
> point release to fix this issue for anyone else who's seeing it..

Sure, I'll phrase it in the form of a patch against 1.0.19. :)

One caution: it might not fix the issue for anyone else who's seeing it: it's not just 'not the correct solution', it's really not a solution at all. The regex engine is still used re-entrantly, only not as heavily, so the probability of failure is lower. It's low enough that my own application manages to complete now without failing, but anybody else's mileage may still vary.

Regards,
-Chap

--- Markdown.pm_1.0.19 Tue Apr 22 13:39:11 2008
+++ Markdown.pm Mon Jul 7 09:38:17 2008
@@ -930,8 +930,8 @@
         # paragraph for the last item in a list, if necessary:
         $list =~ s/\n{2,}/\n\n\n/g;
         my $result = ( $list_type eq 'ul' ) ?
-            $self->_ProcessListItems($list, $marker_ul)
-          : $self->_ProcessListItems($list, $marker_ol);
+            $self->_ProcessListItemsUL($list, $marker_ul)
+          : $self->_ProcessListItemsOL($list, $marker_ol);
         $result = "<$list_type>\n" . $result . "</$list_type>\n";
         $result;
     }egmx;
@@ -947,8 +947,8 @@
         # paragraph for the last item in a list, if necessary:
         $list =~ s/\n{2,}/\n\n\n/g;
         my $result = ( $list_type eq 'ul' ) ?
-            $self->_ProcessListItems($list, $marker_ul)
-          : $self->_ProcessListItems($list, $marker_ol);
+            $self->_ProcessListItemsUL($list, $marker_ul)
+          : $self->_ProcessListItemsOL($list, $marker_ol);
         $result = "<$list_type>\n" . $result . "</$list_type>\n";
         $result;
     }egmx;
@@ -958,9 +958,9 @@
     return $text;
 }
 
-sub _ProcessListItems {
+sub _ProcessListItemsOL {
 #
-# Process the contents of a single ordered or unordered list, splitting it
+# Process the contents of a single ordered list, splitting it
 # into individual list items.
 #
 
@@ -1017,9 +1017,74 @@
         }
 
         "<li>" . $item . "</li>\n";
-    }egmx;
+    }egmxo;
 
     $self->{_list_level}--;
+    return $list_str;
+}
+
+sub _ProcessListItemsUL {
+#
+# Process the contents of a single unordered list, splitting it
+# into individual list items.
+#
+
+    my ($self, $list_str, $marker_any) = @_;
+
+
+    # The $self->{_list_level} global keeps track of when we're inside a list.
+    # Each time we enter a list, we increment it; when we leave a list,
+    # we decrement. If it's zero, we're not in a list anymore.
+    #
+    # We do this because when we're not inside a list, we want to treat
+    # something like this:
+    #
+    #       I recommend upgrading to version
+    #       8. Oops, now this line is treated
+    #       as a sub-list.
+    #
+    # As a single paragraph, despite the fact that the second line starts
+    # with a digit-period-space sequence.
+    #
+    # Whereas when we're inside a list (or sub-list), that line will be
+    # treated as the start of a sub-list. What a kludge, huh? This is
+    # an aspect of Markdown's syntax that's hard to parse perfectly
+    # without resorting to mind-reading. Perhaps the solution is to
+    # change the syntax rules such that sub-lists must start with a
+    # starting cardinal number; e.g. "1." or "a.".
+
+    $self->{_list_level}++;
+
+    # trim trailing blank lines:
+    $list_str =~ s/\n{2,}\z/\n/;
+
+
+    $list_str =~ s{
+        (\n)?                   # leading line = $1
+        (^[ \t]*)               # leading whitespace = $2
+        ($marker_any) [ \t]+    # list marker = $3
+        ((?s:.+?)               # list item text = $4
+        (\n{1,2}))
+        (?= \n* (\z | \2 ($marker_any) [ \t]+))
+    }{
+        my $item = $4;
+        my $leading_line = $1;
+        my $leading_space = $2;
+
+        if ($leading_line or ($item =~ m/\n{2,}/)) {
+            $item = $self->_RunBlockGamut($self->_Outdent($item));
+        }
+        else {
+            # Recursion for sub-lists:
+            $item = $self->_DoLists($self->_Outdent($item));
+            chomp $item;
+            $item = $self->_RunSpanGamut($item);
+        }
+
+        "<li>" . $item . "</li>\n";
+    }egmxo;
+
+    $self->{_list_level}--;
     return $list_str;
 }
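A likely reason the patch needs two copies rather than just adding /o to the shared sub: with /o, Perl interpolates and compiles a pattern only the first time that match op runs, then reuses the result on every later call even if the interpolated variable has since changed. A single shared sub with /o would therefore keep matching whichever of $marker_ul or $marker_ol it happened to see first. The standalone script below illustrates the effect; count_markers is a hypothetical helper, not part of Text::Markdown.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::Markdown): counts lines that start
# with $marker followed by a space. Because of /o, Perl compiles this pattern
# once, on the first call, and reuses it on every later call.
sub count_markers {
    my ($text, $marker) = @_;
    my $n = () = $text =~ /^$marker /gmo;
    return $n;
}

my $ul = "* a\n* b\n";
my $ol = "1. a\n2. b\n";

print count_markers($ul, qr/[*+-]/), "\n";   # prints 2
# The pattern is still frozen to [*+-], so the ordered-list marker passed
# here is ignored and nothing matches.
print count_markers($ol, qr/\d+[.]/), "\n";  # prints 0
```

Splitting the sub into _ProcessListItemsUL and _ProcessListItemsOL gives each marker its own match op, so each /o pattern freezes on the marker it is always called with.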

Fri Jul 11 18:44:43 2008 bobtfish [...] bobtfish.net - Correspondence added

Subject:	Re: [rt.cpan.org #37297] coredumps on various inputs in Perl 5.8.5
Date:	Fri, 11 Jul 2008 23:44:12 +0100
To:	bug-Text-Markdown [...] rt.cpan.org
From:	Tomas Doran <bobtfish [...] bobtfish.net>

On 7 Jul 2008, at 14:55, Chapman Flack via RT wrote:

> Queue: Text-Markdown
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=37297 >
>
> Tomas Doran via RT wrote:

>> Can you send me the code for this, even if it's not 'the correct
>> solution', as (given it still passes the test suite), I'll push out a
>> point release to fix this issue for anyone else who's seeing it..

>
> Sure, I'll phrase it in the form of a patch against 1.0.19. :)
>
> One caution: it might not fix the issue for anyone else who's seeing
> it: it's not just 'not the correct solution', it's really not a solution
> at all. The regex engine is still used re-entrantly, only not as
> heavily, so the probability of failure is lower. It's low enough that
> my own application manages to complete now without failing, but anybody
> else's mileage may still vary.

Totally agree, and I've presented it as such in the Changelog.

1.0.20 just went to CPAN; many thanks for the patch. It should be on your CPAN mirror in the next day or two. I'd be grateful if you could try it out and see if it works OK for you.

Tom

Sat Jul 12 06:16:14 2008 bobtfish [...] bobtfish.net - Taken

Sat Jul 12 06:16:59 2008 bobtfish [...] bobtfish.net - Status changed from 'open' to 'resolved'

Sat Jul 12 06:16:59 2008 bobtfish [...] bobtfish.net - Fixed in 1.0.20 added

Sat Jul 12 06:16:59 2008 bobtfish [...] bobtfish.net - Severity Critical added

Sat Jul 12 06:17:00 2008 bobtfish [...] bobtfish.net - Broken in 1.0.16 added

Sat Jul 12 06:17:00 2008 bobtfish [...] bobtfish.net - Broken in 1.0.17 added

Sat Jul 12 06:17:00 2008 bobtfish [...] bobtfish.net - Broken in 1.0.18 added

Sat Jul 12 06:17:00 2008 bobtfish [...] bobtfish.net - Broken in 1.0.19 added

Sat Jul 12 06:17:58 2008 bobtfish [...] bobtfish.net - Reference to ticket #36203 added

Sat Jul 12 06:18:29 2008 bobtfish [...] bobtfish.net - Correspondence added

This does also fix RT#32603 :)

Sat Jul 12 06:18:31 2008 The RT System itself - Status changed from 'resolved' to 'open'

Sat Jul 12 06:18:32 2008 bobtfish [...] bobtfish.net - Status changed from 'open' to 'resolved'
