
This queue is for tickets about the Regexp-Grammars CPAN distribution.

Report information
The Basics
Id: 124007
Status: rejected
Priority: 0/
Queue: Regexp-Grammars

People
Owner: Nobody in particular
Requestors: se_misc [...] hotmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Whitespace handling for \n with /m
Date: Thu, 4 Jan 2018 15:44:37 +0000
To: "bug-Regexp-Grammars [...] rt.cpan.org" <bug-Regexp-Grammars [...] rt.cpan.org>
From: Stefan Eichenberger <se_misc [...] hotmail.com>
Hi Damian,

Running the code below displays, IMHO, inconsistent handling of \n-whitespace under modifier /m. I initially raised the issue over at StackOverflow (https://stackoverflow.com/questions/48042738/regexpgrammars-handling-n/48084744?noredirect=1#comment83153394_48084744), but believe the problem lies in the engine, not the user. Admittedly, I'm new to Regexp::Grammars, so I hesitate to rule out the user, though ...

Thanks for your help,
Stefan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    # this code version reported to bug-Regexp-Grammars, 2018-01-04
    use Regexp::Grammars;

    my($text, $parser);
    $text = "line_1_1,line_1_2\nline_2_1,line_2_2";

    $i = 1;
    print "Example $i: 2nd line match contains \\n despite '.' not matching \\n with modifier /m\n";
    $parser = qr {
        <data>

        <rule: data>    <[line]>+
        <rule: line>    .+
    }xm;
    if ($text =~ $parser) { print "Matched $i"; } else { print "Not matched $i"; }
    print "\npause $i...\n\n";

    $i++;
    print "Example $i: 2nd line match contains \\n despite explicit exclusion\n";
    $parser = qr {
        <data>

        <rule: data>    <[line]>+
        <rule: line>    [^\n]+
    }xm;
    if ($text =~ $parser) { print "Matched $i"; } else { print "Not matched $i"; }
    print "\npause $i...\n\n";

    $i++;
    print "Example $i: separator \$ seems to consume \\n (using separator \\n also works)\n";
    $parser = qr {
        <data>

        <rule: data>    <[line]>+ % $    # Note: \n also works here
        <rule: line>    .+
    }xm;
    if ($text =~ $parser) { print "Matched $i"; } else { print "Not matched $i"; }
    print "\npause $i...\n\n";

    $i++;
    print "Example $i: contexts of 'line' matches still contain \\n, but fields no longer; so here explicit exclusion of \\n in rule seems to work\n";
    $parser = qr {
        <data>

        <rule: data>    <[line]>+
        <rule: line>    <[field]>+ % ,
        <rule: field>   [^,\n]+
    }xm;
    if ($text =~ $parser) { print "Matched $i"; } else { print "Not matched $i"; }
    print "\npause $i...\n\n";

    $i++;
    print "Example $i: returns 3 fields, where 2nd field contains \\n - probably due to greedy match of 'field'\n";
    $parser = qr {
        <data>

        <rule: data>    <[line]>+ % $
        <rule: line>    <[field]>+ % ,
        <rule: field>   [^,]+
    }xm;
    if ($text =~ $parser) { print "Matched $i"; } else { print "Not matched $i"; }
    print "\npause $i...\n\n";
    $i++;
Subject: Re: [rt.cpan.org #124007]
Date: Thu, 4 Jan 2018 15:51:24 +0000
To: "bug-Regexp-Grammars [...] rt.cpan.org" <bug-Regexp-Grammars [...] rt.cpan.org>
From: Stefan Eichenberger <se_misc [...] hotmail.com>
Sorry, forgot to mention the environment: Regexp::Grammars 1.048 on Strawberry Perl 5.26.1 on Windows 7.
Subject: Re: [rt.cpan.org #124007] Whitespace handling for \n with /m
Date: Fri, 5 Jan 2018 06:12:20 +1100
To: bug-Regexp-Grammars [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Hi Stefan,

I'm afraid the bug is in the user in this case. ;-)

A rule with whitespace within it matches any whitespace (including newlines) in the input at that point. So a rule like:

    <rule: line> .+

is really equivalent to:

    <rule: line><.ws>.+

meaning: match-but-don't-capture any leading whitespace, then match any-characters-except-newline.

And it's the implicit call to <.ws> that "eats" the newlines preceding each line, which is why the first two examples match.

If you want whitespace inside the rule to be ignored (as you seem to want here), then you need to declare the rule as a token instead. Tokens don't have the magical "whitespace-matches-whitespace" behaviour of rules.

Hence you would write:

    <token: line> .+

in which case you will also need to explicitly consume the newlines separating each line, with something like:

    <rule: data> <[line]>+ % \n

or perhaps:

    <rule: data> <[line]>+ % \n+

if you want to allow multiple newlines between lines.

Hope this helps. If not, feel free to ask for further clarification.

Damian
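The rule-versus-token distinction described above can be imitated with core Perl regexes alone, which may help make the effect of the implicit <.ws> visible. The sketch below is only an analogy: it does not use Regexp::Grammars and does not reproduce the regex that module actually generates. It assumes that the implicit whitespace match behaves roughly like a leading \s*, and the variable names (@rule_like, @token_like, @token_sep) are made up for illustration.

```perl
use strict;
use warnings;

my $text = "line_1_1,line_1_2\nline_2_1,line_2_2";

# "Rule-like": optional whitespace is skipped before each line, roughly what
# an implicit <.ws> would do. The \s* swallows the "\n" between the lines,
# even though '.' itself never matches a newline here.
my @rule_like = $text =~ /\G\s*(.+)/g;

# "Token-like": nothing consumes whitespace implicitly, so the repeated
# match cannot get past the first "\n".
my @token_like = $text =~ /\G(.+)/g;

# "Token-like" plus an explicit separator: each line must be followed by a
# newline or by the end of the string (an assumption standing in for "% \n").
my @token_sep = $text =~ /\G(.+)(?:\n|\z)/g;

printf "rule-like:         %d line(s)\n", scalar @rule_like;   # 2
printf "token-like:        %d line(s)\n", scalar @token_like;  # 1
printf "token + separator: %d line(s)\n", scalar @token_sep;   # 2
```

In this analogy the "token-like" pattern stops after one line for the same reason a token needs an explicit separator: once nothing eats the newline implicitly, something in the pattern has to match it.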
Subject: Re: [rt.cpan.org #124007] Whitespace handling for \n with /m
Date: Fri, 5 Jan 2018 18:21:08 +0000
To: "bug-Regexp-Grammars [...] rt.cpan.org" <bug-Regexp-Grammars [...] rt.cpan.org>
From: Stefan Eichenberger <se_misc [...] hotmail.com>
Hi Damian,

Happy to be at fault here ;-) - your explanation is perfectly clear and makes sense. I'll update StackOverflow accordingly, to avoid confusion.

You may consider updating perldoc in chapter 'Tokens vs. rules': The earlier example of a LaTeX matcher makes liberal use of rules with <.ws> inference; since LaTeX is not line oriented, that probably works, but such domain specific knowledge should not be implied. I've read 'Tokens vs. rules' multiple times, but didn't trigger on the critical notion that

    <rule: line>   .+     ==>    <rule: line><.ws>.+

Thanks again for your kind help - which makes my basic learning exercise over the XMas vacation a success then :-)

Stefan