
This queue is for tickets about the Text-Autoformat CPAN distribution.

Report information
The Basics
Id: 87462
Status: resolved
Priority: 0/
Queue: Text-Autoformat

People
Owner: Nobody in particular
Requestors: myrrhlin [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: overlapping greedy submatches vulnerable to unlucky data
Date: Tue, 30 Jul 2013 17:31:33 -0400
To: bug-Text-Autoformat [...] rt.cpan.org
From: Michael Hamlin <myrrhlin [...] gmail.com>
howdy, I ran into a case of Text::Autoformat behaving badly in production, and tracked it down to this patch (made against latest version 1.669003):

463,464c463,465
<     $newtext =~ /\s*([^\n]*)$/;
<     $widow_okay = $para->{empty} || length($1) >= $args{widow};
---
>     (my $widow) = $newtext =~ /([^\n]*)$/;
>     $widow =~ s/^\s+//;
>     $widow_okay = $para->{empty} || length($widow) >= $args{widow};
this regex was taking over 9 minutes on a particularly bad email we received with lots of tabs. we're (sadly) still running 5.8.8 on CentOS boxen (eg GNU/Linux 2.6.18-194.8.1.el5 #1 SMP Thu Jul 1 19:04:48 EDT 2010 x86_64).

the regex match m/\s*([^\n]*)$/ is problematic because spaces and tabs can match either of the greedy submatches. this overlap means lots of permutations and backtracking for the regex engine.

doing the two bits of logic separately (get the last line, then strip off leading space before determining its length) avoids the issue, at the expense of an extra lexical.

i hope this report is helpful, and thank you for great tools!

michael
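For readers who want to compare the two forms side by side, here is a minimal sketch that wraps each one in a small function and checks that they return the same length for the last line. The function names `widow_len_overlapping` and `widow_len_split` are made up for this illustration and are not part of Text::Autoformat; only the two regexes come from the patch above.

```perl
use strict;
use warnings;

# Original form: \s* and [^\n]* can both consume spaces and tabs,
# so a failed attempt can be retried with many different splits.
# (Hypothetical helper name, for illustration only.)
sub widow_len_overlapping {
    my ($text) = @_;
    $text =~ /\s*([^\n]*)$/;
    return length($1);
}

# Patched form: capture the last line first, then strip leading
# whitespace in a separate step, so no two adjacent quantifiers
# compete for the same characters.
# (Hypothetical helper name, for illustration only.)
sub widow_len_split {
    my ($text) = @_;
    (my $widow) = $text =~ /([^\n]*)$/;
    $widow =~ s/^\s+//;
    return length($widow);
}

# A few small inputs where both forms should agree.
for my $text ("foo bar\n  last words", "abc\n", "a\n \t x", "   ") {
    my $old = widow_len_overlapping($text);
    my $new = widow_len_split($text);
    die "mismatch on input\n" unless $old == $new;
}
```

Both regexes always match (each can match the empty string at the end of the text), so `$1` and `$widow` are always defined here. The sketch only checks equivalence on small inputs; it does not try to reproduce the slow case, whose running time depends on the input and the perl version.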
Subject: Re: [rt.cpan.org #87462] overlapping greedy submatches vulnerable to unlucky data
Date: Wed, 31 Jul 2013 13:45:38 -0700
To: bug-Text-Autoformat [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Thanks, Michael. Sorry for leaving in that nasty edge-case. I very much appreciate your tracking the problem down yourself and also providing a patch. That was a great help to me. I've now applied your patch and re-uploaded the module. All the very best, Damian