
This queue is for tickets about the Regexp-Assemble CPAN distribution.

Report information
The Basics
Id: 18266
Status: resolved
Priority: 0/
Queue: Regexp-Assemble

People
Owner: dland [...] cpan.org
Requestors: book [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: (no value)



Subject: Default regexp when no pattern has been added is misleading
The default pattern returned by Regexp::Assemble when no pattern has been added yet is a pattern that always fails (which is what everyone should expect).

However, I was surprised to see that the following one-liner:

    perl -MRegexp::Assemble -e 'print Regexp::Assemble->new->re'

returned

    (?-xism:^a\bz)

and I scratched my head for a while before realising it was doing exactly what I expected.

Wouldn't (?!) be easier to read as a pattern that always fails?

(?-xism:(?!)) wouldn't have had me wonder what those 'a' and 'z' were.

On Tue Mar 21 06:10:20 2006, BOOK wrote:
> The default pattern returned by Regexp::Assemble when no pattern has
> been added yet is a pattern that always fails (which is what everyone
> should expect).
>
> However, I was surprised to see that the following one-liner:
>
> perl -MRegexp::Assemble -e 'print Regexp::Assemble->new->re'
>
> returned
>
> (?-xism:^a\bz)
>
> and I scratched my head for a while before realising it was doing
> exactly what I expected.
>
> Wouldn't (?!) be easier to read as a pattern that always fails?

I seem to recall that that will scan the entire string. Anchoring to the beginning forces the failure much sooner. I also wanted a more ancient method of creating a pattern that matches nothing. (?!) struck me as "modern" whereas \b has been around forever.

> (?-xism:(?!)) wouldn't have had me wonder what those 'a' and 'z' were.

On Tue Mar 21 06:10:20 2006, BOOK wrote:
> Wouldn't (?!) be easier to read as a pattern that always fails?
>
> (?-xism:(?!)) wouldn't have had me wonder what those 'a' and 'z' were.

It's all coming back to me now... Yes, that reads easier, but I confirm that it will scan the entire string. Try running

    % perl -Mre=debug -le '$x = "x" x 100000;$x =~ /(?:(?!))/'

Anchoring it to the beginning is better:

    % perl -Mre=debug -le '$x = "x" x 100000;$x =~ /(?:^(?!))/'
    Freeing REx: `","'
    Compiling REx `(?:^(?!))'
    size 7 Got 60 bytes for offset annotations.
    first at 2
       1: BOL(2)
       2: UNLESSM[-0](7)
       4:   NOTHING(5)
       5:   SUCCEED(0)
       6:   TAIL(7)
       7: END(0)
    anchored(BOL) minlen 0
    Offsets: [7]
        4[1] 8[0] 7[0] 7[0] 7[0] 7[0] 10[0]
    Matching REx `(?:^(?!))' against `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...'
      Setting an EVAL scope, savestack=3
       0 <> <xxxxxxxxxxxx>    |  1:  BOL
       0 <> <xxxxxxxxxxxx>    |  2:  UNLESSM[-0]
       0 <> <xxxxxxxxxxxx>    |  4:    NOTHING
       0 <> <xxxxxxxxxxxx>    |  5:    SUCCEED
                                    could match...
                                  failed...
    Match failed
    Freeing REx: `"(?:^(?!))"'

But the /^a\bz/ is best of all:

    david@bechet:~% perl -Mre=debug -le '$x = "x" x 100000;$x =~ /^a\bz/'
    Freeing REx: `","'
    Compiling REx `^a\bz'
    size 7 Got 60 bytes for offset annotations.
    first at 2
       1: BOL(2)
       2: EXACT <a>(4)
       4: BOUND(5)
       5: EXACT <z>(7)
       7: END(0)
    anchored `az' at 0 (checking anchored) anchored(BOL) minlen 2
    Offsets: [7]
        1[1] 2[1] 0[0] 3[2] 5[1] 0[0] 6[0]
    Guessing start of match, REx `^a\bz' against `xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...'...
    String not equal...
    Match rejected by optimizer
    Freeing REx: `"^a\\bz"'

The optimizer doesn't even fire up the regexp engine (unless the target string begins with 'a' and ends with 'z'). Theoretically I suppose I should change that to

    /^\x00\x\x00/

but practically it's perhaps even better to change it to

    /^a\bz(?# a pattern that matches nothing)/

David

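The three candidates compared above should all fail on any input; they differ only in how quickly the failure is detected. A minimal sketch to check the "never matches" part, assuming nothing beyond core Perl (the labels and sample strings are made up for illustration and are not taken from Regexp::Assemble):

```perl
use strict;
use warnings;

# Candidate "never matches" patterns discussed in this ticket.
# The hash keys are illustrative labels only.
my %candidates = (
    'lookahead'          => qr/(?!)/,
    'anchored lookahead' => qr/^(?!)/,
    'a-boundary-z'       => qr/^a\bz/,
);

# Sample inputs, including strings that begin with "a" and end with "z".
my @samples = ('', 'x' x 1000, 'az', 'a z', "a\nz", 'abcz');

# Count every (pattern, sample) pair that matches; the expectation is none.
my $matches = 0;
for my $name (sort keys %candidates) {
    for my $sample (@samples) {
        $matches++ if $sample =~ $candidates{$name};
    }
}

print "total matches: $matches\n";
```

/^a\bz/ cannot match because \b after a word character requires a non-word character or the end of the string next, whereas the pattern then demands the word character 'z'.
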
On Tue Mar 21 07:50:29 2006, guest wrote:
> Theoretically I suppose I should change that to
>
> /^\x00\x\x00/

I think you mean /^\0\b\0/, but anyway...

> but practically it's perhaps even better to change it to
>
> /^a\bz(?# a pattern that matches nothing)/

Yes. I'd suggest "a pattern that never matches", since "matching nothing" may be ambiguous (/(?=)/ matches nothing, after all, and always matches).

Thanks for the quick and documented reply.

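The ambiguity between "matches nothing" and "never matches" can be illustrated with a short script. This is a sketch using core Perl only; the variable names are chosen for illustration:

```perl
use strict;
use warnings;

my $text = 'abc';

# (?=) is an empty positive lookahead: it succeeds at the first position
# and the overall match is the empty string ("matches nothing").
my $always         = $text =~ /(?=)/ ? 1 : 0;
my $matched_length = $always ? length($&) : -1;

# (?!) is an empty negative lookahead: it fails at every position,
# so the match as a whole fails ("never matches").
my $never = $text =~ /(?!)/ ? 1 : 0;

print "(?=) matched: $always (length $matched_length)\n";
print "(?!) matched: $never\n";
```
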
Fixed in version 0.25