Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Perl-Critic CPAN distribution.

Report information
The Basics
Id: 38531
Status: resolved
Priority: 0/
Queue: Perl-Critic

People
Owner: Nobody in particular
Requestors: mschwern [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Exceptions to /x for short regexes and regexes with spaces.
The attached patch adds two configurable exceptions to RegularExpressions::RequireExtendedFormatting. "allow_short_regex" makes an exception for regexes shorter than the given length. This supports the opinion that some regexes are so short that spacing them out won't help. "min_regex_length" is an alternative name. "allow_with_whitespace" makes an exception for regular expressions which contain whitespace. This supports the opinion that: $string =~ m{Basset\ hounds\ got\ long\ ears}x; is less readable than $string =~ m{Basset hounds got long ears};
Subject: require_extended_foramtting.patch
--- lib/Perl/Critic/Policy/RegularExpressions/RequireExtendedFormatting.pm (revision 57939) +++ lib/Perl/Critic/Policy/RegularExpressions/RequireExtendedFormatting.pm (local) @@ -13,7 +13,7 @@ use Readonly; use Perl::Critic::Utils qw{ :severities }; -use Perl::Critic::Utils::PPIRegexp qw{ &get_modifiers }; +use Perl::Critic::Utils::PPIRegexp qw{ &get_modifiers &get_match_string }; use base 'Perl::Critic::Policy'; our $VERSION = '1.090'; @@ -25,7 +25,25 @@ #----------------------------------------------------------------------------- -sub supported_parameters { return () } +sub supported_parameters { + return ( + { + name => 'allow_short_regex', + description => + q[Regexes below a given length are ok.], + behavior => 'integer', + default_string => '0', + integer_minimum => 0, + }, + { + name => 'allow_with_whitespace', + description => + q[Regexes with spaces can be harder to read with /x], + behavior => 'boolean', + }, + ); +} + sub default_severity { return $SEVERITY_MEDIUM } sub default_themes { return qw(core pbp maintenance) } sub applies_to { return qw(PPI::Token::Regexp::Match @@ -37,6 +55,10 @@ sub violates { my ( $self, $elem, undef ) = @_; + my $match = get_match_string($elem); + return if length $match <= $self->{_allow_short_regex}; + return if $self->{_allow_with_whitespace} and $match =~ /\s/; + my %mods = get_modifiers($elem); if ( ! $mods{x} ) { return $self->violation( $DESC, $EXPL, $elem ); @@ -86,7 +108,25 @@ =head1 CONFIGURATION -This Policy is not configurable except for the standard options. +Because using C</x> on a regex which has whitespace in it can make it harder +to read, you have to escape all that innocent whitespace, you can add an +exception by turning on C<allow_with_whitespace>. + + [RegularExpressions::RequireExtendedFormatting] + allow_with_whitespace = 1 + + $string =~ /Basset hounds got long ears/; # ok + +You might find that putting a C</x> on short regexes to be excessive. An +exception can be made for them by setting C<allow_short_regex> to the minimum +match length you'll allow without a C</x>. The length only counts the regular +expression, not the braces or operators. + + [RegularExpressions::RequireExtendedFormatting] + allow_short_regex = 5 + + $num =~ m{(\d+)}; # ok, only 5 characters + $num =~ m{\d\.(\d+)}; # not ok, 9 characters =head1 NOTES --- t/RegularExpressions/RequireExtendedFormatting.run (revision 57939) +++ t/RegularExpressions/RequireExtendedFormatting.run (local) @@ -74,6 +74,58 @@ my $string =~ tr/[A-Z]/[a-z]/cds; my $string =~ y/[A-Z]/[a-z]/cds; + +#----------------------------------------------------------------------------- + +## name allow_short_regex, pass +## failures 0 +## parms { allow_short_regex => 5 } +## cut + +my $string =~ m/foo/; + +my $string =~ s/foo//; +my $string =~ s/foo/bar/; +my $string =~ s/foo/barbarbar/; +my $string =~ s/12345//; + + +#----------------------------------------------------------------------------- + +## name allow_short_regex, fail +## failures 2 +## parms { allow_short_regex => 5 } +## cut + +my $string =~ m/foofoo/; + +my $string =~ s/foofoo//; + + +#----------------------------------------------------------------------------- + +## name allow_with_whitespace, pass +## failures 0 +## parms { allow_with_whitespace => 1 } +## cut + +my $string =~ m/foo bar/; + +my $string =~ s/foo bar//; + + +#----------------------------------------------------------------------------- + +## name allow_with_whitespace, fail +## failures 2 +## parms { allow_with_whitespace => 1 } +## cut + +my $string =~ m/foobar/; + +my $string =~ s/foobar/foo bar/; + + #----------------------------------------------------------------------------- # Local Variables: # mode: cperl
Subject: Re: [rt.cpan.org #38531] Exceptions to /x for short regexes and regexes with spaces.
Date: Sun, 17 Aug 2008 05:25:58 -0500
To: bug-Perl-Critic [...] rt.cpan.org
From: Elliot Shank <perl [...] galumph.com>
Michael G Schwern via RT wrote: Show quoted text
> The attached patch adds two configurable exceptions to > RegularExpressions::RequireExtendedFormatting.
Applied. Show quoted text
> "allow_short_regex" makes an exception for regexes shorter than the > given length. This supports the opinion that some regexes are so short > that spacing them out won't help. "min_regex_length" is an alternative > name.
Option renamed to minimum_regex_length_to_complain_about. Show quoted text
> "allow_with_whitespace" makes an exception for regular expressions which > contain whitespace. This supports the opinion that: > > $string =~ m{Basset\ hounds\ got\ long\ ears}x; > > is less readable than > > $string =~ m{Basset hounds got long ears};
The way this was coded, if a regex contained a single whitespace character, it was exempt, no matter how complicated. Instead, I took this option out, made regexes that contain only word and whitespace characters exempt by default, and added a "strict" option that turns off the exemption. I'm not so sure on this change. Opinion?
Subject: Re: [rt.cpan.org #38531] Exceptions to /x for short regexes and regexes with spaces.
Date: Sun, 17 Aug 2008 15:13:56 -0700
To: bug-Perl-Critic [...] rt.cpan.org
From: Michael G Schwern <schwern [...] pobox.com>
Elliot Shank via RT wrote: Show quoted text
> Michael G Schwern via RT wrote:
>> The attached patch adds two configurable exceptions to >> RegularExpressions::RequireExtendedFormatting.
> > Applied.
Thanks. Show quoted text
>> "allow_short_regex" makes an exception for regexes shorter than the >> given length. This supports the opinion that some regexes are so short >> that spacing them out won't help. "min_regex_length" is an alternative >> name.
> > Option renamed to minimum_regex_length_to_complain_about.
Seems a little long. Maybe that can be tightened up? "min" is a perfectly acceptable abbreviation and "to_complain_about" seems a bit wordy and perhaps redundant? Show quoted text
>> "allow_with_whitespace" makes an exception for regular expressions which >> contain whitespace. This supports the opinion that: >> >> $string =~ m{Basset\ hounds\ got\ long\ ears}x; >> >> is less readable than >> >> $string =~ m{Basset hounds got long ears};
> > The way this was coded, if a regex contained a single whitespace character, it was exempt, no matter how complicated. > > Instead, I took this option out, made regexes that contain only word and whitespace characters exempt by default, and added a "strict" option that turns off the exemption. > > I'm not so sure on this change. Opinion?
I think that's a good lower bound, but I'd still like the configurable whitespace exemption. For example... m{You have (\d+) new items?} is more readable than the /x version... m{You\ have\ (\d+)\ new\ items?}x despite having meta characters. -- Robrt: People can't win Schwern: No, but they can riot after the game.
Subject: Re: [rt.cpan.org #38531] Exceptions to /x for short regexes and regexes with spaces.
Date: Sun, 17 Aug 2008 18:02:49 -0500
To: bug-Perl-Critic [...] rt.cpan.org
From: Elliot Shank <perl [...] galumph.com>
Michael G Schwern via RT wrote: Show quoted text
>>> "allow_with_whitespace" makes an exception for regular >>> expressions which contain whitespace. This supports the opinion >>> that: >>> >>> $string =~ m{Basset\ hounds\ got\ long\ ears}x; >>> >>> is less readable than >>> >>> $string =~ m{Basset hounds got long ears};
>> >> The way this was coded, if a regex contained a single whitespace >> character, it was exempt, no matter how complicated. >> >> Instead, I took this option out, made regexes that contain only >> word and whitespace characters exempt by default, and added a >> "strict" option that turns off the exemption. >> >> I'm not so sure on this change. Opinion?
> > I think that's a good lower bound, but I'd still like the > configurable whitespace exemption. For example... > > m{You have (\d+) new items?} > > is more readable than the /x version... > > m{You\ have\ (\d+)\ new\ items?}x > > despite having meta characters.
I agree that it's easier to read, but I'm not willing to have an option that gives a blanket exemption for anything including whitespace. I think we need a version of ProhibitComplexRegexes that uses a metric other than the number of characters; we could then use part of that implementation to change RequireExtendedFormatting to handle what you want.
Subject: Re: [rt.cpan.org #38531] Exceptions to /x for short regexes and regexes with spaces.
Date: Sun, 17 Aug 2008 18:06:46 -0500
To: bug-Perl-Critic [...] rt.cpan.org
From: Elliot Shank <perl [...] galumph.com>
Michael G Schwern via RT wrote: Show quoted text
> Elliot Shank via RT wrote:
>>> "allow_short_regex" makes an exception for regexes shorter than the >>> given length. This supports the opinion that some regexes are so short >>> that spacing them out won't help. "min_regex_length" is an alternative >>> name.
>> >> Option renamed to minimum_regex_length_to_complain_about.
> > Seems a little long. Maybe that can be tightened up? "min" is a perfectly > acceptable abbreviation and "to_complain_about" seems a bit wordy and perhaps > redundant?
You're talking to someone who forbids abbreviations in code. (Yes, there's lots of abbreviations in the P::C code, but the "expl" and "desc" standards were created before I started on the project.)
Fixed and released in version 1.092. Although we may revisit this if we come up with a better way to measure the "complexity" of a regular expression.