
This queue is for tickets about the Regexp-Assemble CPAN distribution.

Report information
The Basics
Id: 36399
Status: resolved
Priority: 0/
Queue: Regexp-Assemble

People
Owner: Nobody in particular
Requestors: cbw3qq202 [...] sneakemail.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.32
Fixed in: 0.33



Subject: Regexp::Assemble can produce invalid regexp
Date: Tue, 3 Jun 2008 14:33:52 +0200
To: bug-Regexp-Assemble [...] rt.cpan.org
From: "Yves BLUSSEAU" <cbw3qq202 [...] sneakemail.com>
Hi,

I have a big problem when using Regexp::Assemble.

The program:

#!/usr/bin/perl -w
use strict;
use Regexp::Assemble;

my $ra = Regexp::Assemble->new;
$ra->add('a|b|[cd]');
print $ra->re,$/;

The result is:

(?-xism:a\|b\|[cd])

That is wrong because Regexp::Assemble has added a \ before each pipe, so the regexp now matches the literal pipe character instead of treating | as the alternation metacharacter.

Regards
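To see concretely why the escaped output is wrong, the sketch below compares the intended pattern with the one 0.32 produced, using only core Perl (Regexp::Assemble is not needed). Both patterns are written out by hand from the report; the input strings are arbitrary illustrative values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern the reporter meant, and the one Regexp::Assemble 0.32 emitted.
my $intended = qr/a|b|[cd]/;
my $produced = qr/a\|b\|[cd]/;

# Anchor both patterns so that only whole-string matches count.
for my $input ('a', 'b', 'c', 'a|b|c') {
    my $want = $input =~ /\A(?:$intended)\z/ ? 'yes' : 'no';
    my $got  = $input =~ /\A(?:$produced)\z/ ? 'yes' : 'no';
    printf "%-6s intended=%-3s produced=%s\n", $input, $want, $got;
}
```

Anchored this way, the produced pattern matches none of a, b or c on their own; it only matches a literal string such as a|b|c (or a|b|d).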
On Tue Jun 03 08:34:26 2008, cbw3qq202@sneakemail.com wrote:
> Hi,
>
> I have a big problem when using Regexp::Assemble
> [...]
> $ra->add('a|b|[cd]');
>
> print $ra->re,$/;
>
> The result is:
>
> (?-xism:a\|b\|[cd])
>
> That is wrong because Regexp::Assemble have added a \ before the pipe, so
> now the regexp match the pipe char not the alternation metacharacter |
Hmm, yes indeed. It seems that alternation and a character class don't play well together: 'a|[cd]' is enough to trip the bug. If you add('a', 'b', '[cd]') as separate patterns, the bug doesn't manifest itself. The philosophy is to add the smallest pieces possible, so that the module can assemble subpatterns efficiently. For instance, if you now add the pattern 'b\d+', it cannot be merged with 'a|b|[cd]' to create a|b(?:\d+)?|[cd]. Also, the initial pattern would have been [abcd] had you added the subpatterns individually. Still, it is definitely a bug and I will fix it in a new release.

Thanks for the report,
David
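A minimal sketch of the approach recommended above, assuming Regexp::Assemble (0.33 or later) is installed from CPAN: each alternative goes in through its own add() call instead of as one 'a|b|[cd]' string. The test inputs are illustrative only, and the exact printed form of the assembled pattern may vary between versions, so no expected output is shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Regexp::Assemble;

# Add each alternative on its own rather than the single string 'a|b|[cd]',
# leaving it to Regexp::Assemble to merge them.
my $ra = Regexp::Assemble->new;
$ra->add($_) for 'a', 'b', '[cd]';

# Per the reply above, this should assemble to a character class like [abcd].
my $re = $ra->re;
print "$re\n";

# Whole-string checks: the letters match, a bare pipe does not.
for my $input ('a', 'b', 'c', 'd', '|') {
    printf "%s: %s\n", $input, ($input =~ /\A(?:$re)\z/ ? 'match' : 'no match');
}
```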
On Tue Jun 03 08:34:26 2008, cbw3qq202@sneakemail.com wrote:
> Hi,
>
> I have a big problem when using Regexp::Assemble
Here is a patch against 0.32 that corrects the problem:

--- Assemble.pm.old Mon Jul 30 19:48:14 2007
+++ Assemble.pm Wed Jun 4 22:40:39 2008
@@ -376,7 +376,7 @@
             $token =~ s/^\\([^\w$()*+.?@\[\\\]^|{}\/])$/$1/;
         }
         else {
-            $token =~ s{\A([][{}*+?@|\\/])\Z}{\\$1};
+            $token =~ s{\A([][{}*+?@\\/])\Z}{\\$1};
         }
         if ($unroll_plus and $qualifier =~ s/\A\+(\?)?\Z/*/) {
             $1 and $qualifier .= $1;

I'll have to explore this further to make sure there aren't similar problems with other metacharacters. The patch doesn't break anything in the test suite, which basically means I wasn't testing for this case...

Thanks again for taking the time to file this report,
David
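To isolate what the one-line patch changes, the sketch below wraps the old and new substitutions in two hypothetical helpers, escape_old and escape_new (they are not functions in Assemble.pm), and runs both on a few single-character tokens. The surrounding lexer logic in Assemble.pm is not reproduced, so this only shows the effect of removing | from the character class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrappers around the two substitutions from the patch.
# Each backslash-escapes a token only when the entire token is a single
# character from the bracketed class.
sub escape_old {
    my ($token) = @_;
    $token =~ s{\A([][{}*+?@|\\/])\Z}{\\$1};    # 0.32: | is in the class
    return $token;
}

sub escape_new {
    my ($token) = @_;
    $token =~ s{\A([][{}*+?@\\/])\Z}{\\$1};     # patched: | removed from the class
    return $token;
}

for my $token ('a', '|', '+', '[') {
    printf "%-2s old=%-3s new=%s\n", $token, escape_old($token), escape_new($token);
}
```

With the old class a bare | token comes back as \|, which is the stray backslash in the report; with the patched class it is returned unchanged, while characters such as + and [ are still escaped.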
Hi, this has been fixed in 0.33, released on CPAN. Thanks, David