
This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
The Basics
Id: 128741
Status: rejected
Priority: 0/
Queue: Regexp-Grammars

People
Owner: Nobody in particular
Requestors: HAKONH [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Extra space added to array elements in a rule
Thanks for this very useful module! I just want to make you aware of a minor issue I had (looks like a bug to me). Consider the following code:

    use strict;
    use warnings;
    use Regexp::Grammars;

    my $parser = qr{
        <[item]>+
        <rule: item> \w+
    }x;

    my $text = 'itemA itemB itemC';
    if ($text =~ $parser) {
        print "'$_'\n" for (@{ $/{item} });
    }

The output is:

    'itemA'
    ' itemB'
    ' itemC'

Notice the space in front of the second and third item. Expected output (since \w does not match a space):

    'itemA'
    'itemB'
    'itemC'

or the expected output should simply be none/empty (i.e.: parse failed), since I did not explicitly specify the delimiter space. For example, changing the parser to:

    my $parser = qr{
        <[item]>+ % <.ws>
        <rule: item> \w+
    }x;

gives the expected output above.

Have a nice day.

Best regards,
Håkon Hægland
Subject: Re: [rt.cpan.org #128741] Extra space added to array elements in a rule
Date: Thu, 7 Mar 2019 23:08:03 +0000
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Hi Håkon,

Thanks for the report.

However, this is not a bug; it is the expected and documented behaviour, as described in "Tokens vs rules (whitespace handling)" in the module's documentation.

Defining <item> as a <rule:....> means that any leading whitespace (such as the whitespace before the \w+) matches leading whitespace in the input. Which is exactly what you're seeing.

In other words (as the documentation explains):

    <rule: item> \w+

is equivalent to:

    <token: item> <.ws> \w+

You observed:
> Notice the space in front of the second and third item. Expected
> output (since \w does not match a space):
Correct, but the whitespace before the \w DOES (implicitly) match a space because <item> is defined as a rule, not a token.
> or the expected output should simply be none/empty (i.e.: parse
> failed), since I did not explicitly specify the delimiter space.
However, you IMPLICITLY specified the delimiter space, by making <item> a rule. Which is why it succeeds.
> example, changing the parser to:
>
>     my $parser = qr{
>         <[item]>+ % <.ws>
>         <rule: item> \w+
>     }x;
>
> gives the expected output above.
And that's why I would recommend writing the grammar as follows:

    my $parser = qr{
        <[item]>+ % <.ws>
        <token: item> \w+
    }x;

As a general principle: make each named component of a grammar a <token:...>, unless you explicitly need to match intervening whitespace before or inside that component, in which case make it a <rule:...>

And, if you do make a component a rule, then you must expect it to match (and return) that intervening whitespace as well.

Hope this helps,

Damian
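For anyone reading this ticket later, the difference can be checked side by side. The following is only a minimal sketch and not part of the original exchange: it runs the rule-based grammar from the report and the token-based grammar recommended in the reply against the same input. The helper name items_for and the variable names are made up for illustration, and Regexp::Grammars has to be installed from CPAN.

```perl
use strict;
use warnings;
use Regexp::Grammars;    # CPAN module, not part of core Perl

# Grammar from the report: <item> is a rule, so the whitespace written
# before \w+ in the definition also matches whitespace in the input.
my $rule_parser = qr{
    <[item]>+
    <rule: item> \w+
}x;

# Grammar recommended in the reply: <item> is a token, and the
# whitespace between items is matched explicitly by the % <.ws> separator.
my $token_parser = qr{
    <[item]>+ % <.ws>
    <token: item> \w+
}x;

# Hypothetical helper: returns a copy of the captured items,
# or an empty list if the parse fails.
sub items_for {
    my ($parser, $input) = @_;
    return $input =~ $parser ? @{ $/{item} } : ();
}

my $text = 'itemA itemB itemC';

my @rule_items  = items_for($rule_parser,  $text);
my @token_items = items_for($token_parser, $text);

# Quote each item so that any captured whitespace is visible.
print "rule:  ", join(', ', map { "'$_'" } @rule_items),  "\n";
print "token: ", join(', ', map { "'$_'" } @token_items), "\n";
```

With the input from the report, the rule version should reproduce the leading spaces described in the ticket, while the token version should capture only the \w+ text of each item. Copying the items out of %/ straight after each match matters here, because the next successful match overwrites it.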
Subject: Re: [rt.cpan.org #128741] Extra space added to array elements in a rule
Date: Fri, 8 Mar 2019 16:59:47 +0100
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Håkon Hægland <hakon.hagland [...] gmail.com>
Hi Damian,

Thanks for the clear explanation! I see from your answer that I was quite confused about the fine distinction between token and rule. Anyway, this module is awesome!

Have a great weekend,

Best regards
Håkon