This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
The Basics
Id: 61691
Status: open
Priority: 0/
Queue: Regexp-Grammars

People
Owner: Nobody in particular
Requestors: whatson [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Repetition operator doesn't conform to defined whitespace behaviour
Hi Damian,

It seems that the repetition operator '**' doesn't obey the same whitespace rules as the rest of Regexp::Grammars. Instead, the whitespace surrounding it is always interpreted as though it were inside a rule, e.g.

    qr/
        <val>**<sep>      # no whitespace - so none is matched
        <val> ** <sep>    # whitespace is treated as <ws> rather than ignored
    /x

I believe that in general, in Perl 5, the whitespace behaviour of rules is an exception, not the norm. That being the case, the whitespace around the repetition operator should only demonstrate the above behaviour when used inside a rule. Outside of a rule, the surrounding whitespace should follow the convention of the regular expression in which it is used, i.e. treated literally, or ignored if /x is used.

This issue presented itself when I attempted to use a specific length of whitespace as the separator for a repetition. Despite my encapsulating the repetition operator in a token (a desperate measure), debugging showed that the regular expression engine was backtracking unnecessarily and matching whitespace longer than the specified length. Further investigation led to the aforementioned issue; a distilled test case is attached.

Many thanks,
Andrew Whatson
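For readers unfamiliar with the convention the report appeals to, the following is a minimal sketch in core Perl only (no Regexp::Grammars) of how whitespace in a pattern is treated with and without /x. The variable names are illustrative and are not taken from the attached test case; the target string is the same shape as the one the report describes.

    use strict;
    use warnings;

    # Target: 'a', five spaces, 'z'
    my $text = 'a' . (' ' x 5) . 'z';

    # Under /x, unescaped whitespace in the pattern is ignored, so this
    # pattern means \Aaz\z and cannot match the five spaces.
    my $ignored = ($text =~ / \A a z \z /x) ? 1 : 0;

    # Without /x, a space in the pattern is a literal space.
    my $literal = ($text =~ /\Aa {5}z\z/) ? 1 : 0;

    # Under /x, whitespace has to be requested explicitly, e.g. with \s.
    my $explicit = ($text =~ / \A a \s{5} z \z /x) ? 1 : 0;

    # An explicit separator of the wrong length must not match.
    my $wrong_length = ($text =~ / \A a \s{3} z \z /x) ? 1 : 0;

    print "ignored=$ignored literal=$literal explicit=$explicit wrong_length=$wrong_length\n";

The last two cases correspond to the behaviour the report expects from a token using \s{5} or \s{3} as the separator: only the exact length should match.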
Subject: test_whitespace_seperator.pl
#!/usr/bin/perl

use 5.010;
use strict;
use warnings;

use Data::Dumper;
use Regexp::Grammars;

$Data::Dumper::Sortkeys = 1;
$Data::Dumper::Terse    = 1;

# The text to match against
my $text = 'a' . (' ' x 5) . 'z';

# This should match without backtracking
my $broken1 = qr/
    <logfile: - >
    <debug: on>

    \A<TOP>\Z

    <token: TOP>
        <[val]> ** <sep>

    <token: sep>
        \s{5}

    <token: val>
        \w+
/x;

# This should NOT match
#
# NB: There are 5 spaces in the target, but we're matching for a list
#     separated by 3 whitespace characters
#
my $broken2 = qr/
    <logfile: - >
    <debug: on>

    \A<TOP>\Z

    <token: TOP>
        <[val]> ** <sep>

    <token: sep>
        \s{3}

    <token: val>
        \w+
/x;

# This demonstrates the expected behaviour of $broken1
my $correct1 = qr/
    <logfile: - >
    <debug: on>

    \A<TOP>\Z

    <token: TOP>
        <[val]> (?: <sep> <[val]> )*

    <token: sep>
        \s{5}

    <token: val>
        \w+
/x;

# This demonstrates the expected behaviour of $broken2
my $correct2 = qr/
    <logfile: - >
    <debug: on>

    \A<TOP>\Z

    <token: TOP>
        <[val]> (?: <sep> <[val]> )*

    <token: sep>
        \s{3}

    <token: val>
        \w+
/x;

$text =~ $broken1
    ? print '$broken1 matched: ' . Dumper(\%/)
    : say   '$broken1 did not match';

$text =~ $broken2
    ? print '$broken2 matched: ' . Dumper(\%/)
    : say   '$broken2 did not match';

$text =~ $correct1
    ? print '$correct1 matched: ' . Dumper(\%/)
    : say   '$correct1 did not match';

$text =~ $correct2
    ? print '$correct2 matched: ' . Dumper(\%/)
    : say   '$correct2 did not match';
Subject: Re: [rt.cpan.org #61691] Repetition operator doesn't conform to defined whitespace behaviour
Date: Tue, 28 Sep 2010 08:08:01 +1000
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Hi Andrew,

Thank you for the detailed bug report and the test code.
> It seems that the repetition operator '**' doesn't obey the same
> whitespace rules as the rest of Regexp::Grammars.
Yes, that's definitely a serious bug. I've just uploaded a new release of the module that squashes it.

Let me know if you find any residual problems with **.

All the very best, and thanks again,
Damian
From: whatson [...] gmail.com
Hi Damian,

Everything seems to be working as expected.

Thanks again,
Andrew Whatson