

This queue is for tickets about the Perl-Critic CPAN distribution.

Report information
The Basics
Id: 64776
Status: open
Priority: 0/
Queue: Perl-Critic

People
Owner: Nobody in particular
Requestors: EDAVIS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.111
Fixed in: (no value)



Subject: Suggested policy: forbid .* at start or end of unanchored regexp
Beginning Perl programmers will sometimes write regexp tests like:

    if ($string =~ /.*(\d+).*/) {
        # do something with $1
    }

The intention is to match a number somewhere in the input string. But because the regexp engine tries all possible start positions anyway, the initial .* is redundant, and since the regexp engine ignores unmatched text at the end, the final .* is also redundant.

I suggest that code like this indicates some confusion about how Perl's regular expressions work, and that a warning explaining the mistake to the programmer is very worthwhile.

A policy should warn about .* at the very start or end of a regexp used for m// matching (but not for s///). The warning text should suggest using either /\A(\d+)\z/ to match the entire string, or /(\d+)/ to search the string and find a match at any point.
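As a rough illustration of the redundancy claim for a plain presence test, the sketch below compares the padded and unpadded patterns on a few sample strings. The helper name matches_both and the sample inputs are made up for this example; they are not part of Perl::Critic.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether the padded and the plain
# pattern each match the given string (1 = match, 0 = no match).
sub matches_both {
    my ($string) = @_;
    my $padded = $string =~ /.*\d+.*/ ? 1 : 0;
    my $plain  = $string =~ /\d+/     ? 1 : 0;
    return ($padded, $plain);
}

for my $string ('abc 123 def', 'no digits here', '42') {
    my ($padded, $plain) = matches_both($string);
    print "'$string': padded=$padded plain=$plain\n";
}
```

For a pure true/false test the two patterns should always agree, which is the sense in which the surrounding .* adds nothing.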
The documentation for this should recognize the subtle difference between /\d+/ and /.*\d+/: the former matches the FIRST occurrence, and the latter matches the LAST occurrence, since .* is greedy. This is not relevant if you are just checking for presence, but it is relevant if you make use of pos() or $+[0].
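A small sketch of that difference, using $+[0] (the offset just past the end of the whole match). The helper match_end and the sample string are assumptions made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the end offset of the whole match
# ($+[0]), or undef if the pattern does not match.
sub match_end {
    my ($string, $regexp) = @_;
    return $string =~ $regexp ? $+[0] : undef;
}

my $string = 'a 12 b 345 c';

# Without the leading .*, the match stops after the first number.
print 'plain:  ', match_end($string, qr/\d+/),   "\n";   # 4

# With a greedy leading .*, the match extends to the last digit.
print 'greedy: ', match_end($string, qr/.*\d+/), "\n";   # 10
```

Code that only tests for a match cannot tell the two apart, but code that reads $+[0] or pos() afterwards can.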
You're right: .* at the beginning of a regexp is not a mistake if that regexp has capturing groups. However, .* at the start still merits a warning for a plain non-capturing regexp, and .* at the end is pretty much always a mistake (unless used with /g, perhaps?).
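To make the two exceptions concrete, here is a hedged sketch of how a leading .* changes what a capture group sees, and how a trailing .* changes the result under /g. The helper names first_capture and all_captures and the sample string are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return $1 from a single match, or undef.
sub first_capture {
    my ($string, $regexp) = @_;
    return $string =~ $regexp ? $1 : undef;
}

# Hypothetical helper: return every capture found by a /g match
# in list context.
sub all_captures {
    my ($string, $regexp) = @_;
    my @found = $string =~ /$regexp/g;
    return @found;
}

my $string = 'a 12 b 345 c';

# A greedy leading .* leaves only the final digit for the group.
print first_capture($string, qr/(\d+)/),   "\n";   # 12
print first_capture($string, qr/.*(\d+)/), "\n";   # 5

# Under /g, a trailing .* swallows the rest of the line, so later
# numbers on the same line are never reached.
print join(',', all_captures($string, qr/(\d+)/)),   "\n";   # 12,345
print join(',', all_captures($string, qr/(\d+).*/)), "\n";   # 12
```

This suggests a policy would need to treat patterns with capture groups, or with the /g modifier, as cases where the .* does change behaviour, whether or not that behaviour was intended.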