Bug #87462 for Text-Autoformat: overlapping greedy submatches vulnerable to unlucky data

Tue Jul 30 17:32:04 2013 myrrhlin [...] gmail.com - Ticket created

Subject:	overlapping greedy submatches vulnerable to unlucky data
Date:	Tue, 30 Jul 2013 17:31:33 -0400
To:	bug-Text-Autoformat [...] rt.cpan.org
From:	Michael Hamlin <myrrhlin [...] gmail.com>

howdy, I ran into a case of Text::Autoformat behaving badly in production, and tracked it down to this patch (made against latest version 1.669003):

463,464c463,465
< $newtext =~ /\s*([^\n]*)$/;
< $widow_okay = $para->{empty} || length($1) >= $args{widow};
---
> (my $widow) = $newtext =~ /([^\n]*)$/;
> $widow =~ s/^\s+//;
> $widow_okay = $para->{empty} || length($widow) >= $args{widow};

this regex was taking over 9 minutes on a particularly bad email we received with lots of tabs. we're (sadly) still running 5.8.8 on CentOS boxen (eg GNU/Linux 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64).

the regex match m/\s*([^\n]*)$/ is problematic because spaces and tabs can match either of the greedy submatches. this overlap means lots of permutations and backtracking for the regex engine.

doing the two bits of logic separately (get the last line, strip off leading space before determining its length) avoids the issue, at the expense of an extra lexical.

i hope this report is helpful, and thank you for great tools!

michael
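To make the two approaches described above concrete, here is a minimal standalone sketch that runs both on a few short strings and compares the resulting widow lengths. The function names `widow_len_single_regex` and `widow_len_two_step` and the sample strings are hypothetical and do not come from the module. The inputs are deliberately tiny, so this only illustrates that the two forms agree on these samples; it does not reproduce the slowdown.

```perl
use strict;
use warnings;

# Original form: a single regex in which both \s* and [^\n]* can
# consume spaces and tabs, so the engine has many ways to split a
# run of whitespace between them before giving up on a start position.
sub widow_len_single_regex {
    my ($text) = @_;
    $text =~ /\s*([^\n]*)$/;    # always matches (possibly empty)
    return length($1);
}

# Patched form: grab the last line first, then strip its leading
# whitespace in a separate, anchored substitution.
sub widow_len_two_step {
    my ($text) = @_;
    (my $widow) = $text =~ /([^\n]*)$/;
    $widow =~ s/^\s+//;
    return length($widow);
}

# Made-up sample paragraphs, including tab-heavy lines.
my @samples = (
    "first line\n  last line",
    "first line\n\t\t\tx y\n",
    "only whitespace after\n \t ",
    "\t\t\t\tmany tabs\n\t\t\t\tand more tabs\n\t\tend",
);

for my $text (@samples) {
    my $old = widow_len_single_regex($text);
    my $new = widow_len_two_step($text);
    printf "%d %d %s\n", $old, $new, ($old == $new ? "same" : "DIFFERENT");
}
```

To look at the timing difference, one could feed both functions a much longer, tab-heavy multi-line string and time each call; how large the gap is will likely depend on the perl version in use.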

Wed Jul 31 16:46:06 2013 DCONWAY [...] cpan.org - Status changed from 'new' to 'resolved'

Wed Jul 31 16:46:32 2013 damian [...] conway.org - Correspondence added

Subject:	Re: [rt.cpan.org #87462] overlapping greedy submatches vulnerable to unlucky data
Date:	Wed, 31 Jul 2013 13:45:38 -0700
To:	bug-Text-Autoformat [...] rt.cpan.org
From:	Damian Conway <damian [...] conway.org>

Thanks, Michael.

Sorry for leaving in that nasty edge-case. I very much appreciate your tracking the problem down yourself and also providing a patch. That was a great help to me.

I've now applied your patch and re-uploaded the module.

All the very best,

Damian