This queue is for tickets about the PPIx-Regexp CPAN distribution.

Report information
The Basics
Id: 107331
Status: resolved
Priority: 0/
Queue: PPIx-Regexp

People
Owner: Nobody in particular
Requestors: k-rindfrey [...] gmx.de
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: 0.041_01



Subject: /x mode end-of-line comments are not recognized
Date: Fri, 25 Sep 2015 21:51:01 +0200
To: bug-PPIx-Regexp [...] rt.cpan.org
From: Klaus Rindfrey <k-rindfrey [...] gmx.de>
Hi Thomas,

I installed version 0.041 of PPIx::Regexp from CPAN. The documentation of PPIx::Regexp::Token::Comment says "This class represents a comment - both parenthesized comments (i.e. (?# this is a comment ) and the /x mode end-of-line comments."

But running this script:

#-----------------------------------
use strict;
use warnings;

use PPIx::Regexp;
use PPIx::Regexp::Dumper;

my $re = PPIx::Regexp->new(<<'EOT');
qr/foo
 bar (?# first)
 baz  # second
/x'
EOT

PPIx::Regexp::Dumper->new( $re )->print();
#-----------------------------------

only the "(?# first)" comment is recognized, not the "# second". The script prints:

#-----------------------------------
PPIx::Regexp failures=0
PPIx::Regexp::Token::Structure 'qr'
PPIx::Regexp::Structure::Regexp / ... '
PPIx::Regexp::Token::Literal 'f'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'r'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Comment '(?# first)'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'z'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal '#'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 's'
PPIx::Regexp::Token::Literal 'e'
PPIx::Regexp::Token::Literal 'c'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'n'
PPIx::Regexp::Token::Literal 'd'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal '/'
PPIx::Regexp::Token::Literal 'x'
PPIx::Regexp::Token::Modifier ''
PPIx::Regexp::Token::Whitespace '
#-----------------------------------

I'm running perl v5.20.1.

Regards,
Klaus

Message body is not shown because sender requested not to inline it.

From: k-rindfrey [...] gmx.de
Well, I just noticed there is an additional unintended character (a single quote) at the end of the regex. After removing it, the output is OK; both comments are correctly recognized. So you may simply close this issue. Evidently PPIx::Regexp was confused by that error.
Thank you very much for your report, and the follow-up. I think the documentation says the parse for invalid code is undefined, and if it does not, it will. But before I close the ticket I will investigate whether there is a not-too-difficult way to get an actual parse failure in this case. Maybe there is none, but at least I can take a look.
Subject: Trailing characters after valid expression cause obscure parse failures
PPIx-Regexp version 0.041_01 went to PAUSE this afternoon. What I have done is beef up the initial scan so that it actually finds the matching brackets in all cases (it already has to for the regex in a s///), and makes any extra non-whitespace into a PPIx::Regexp::Token::Unknown. Your problematic expression now parses like this:

PPIx::Regexp failures=1
PPIx::Regexp::Token::Structure 'qr'
PPIx::Regexp::Structure::Regexp / ... /
PPIx::Regexp::Token::Literal 'f'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'r'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Comment '(?# first)'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'z'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Comment '# second '
PPIx::Regexp::Token::Modifier 'x'
PPIx::Regexp::Token::Unknown '\' ' Trailing characters after expression
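For readers who want to see the idea of that initial scan in isolation, here is a minimal sketch. It is not the PPIx::Regexp implementation: scan_regex_literal is a hypothetical helper that assumes only qr or m with a non-bracketing delimiter, finds the first unescaped closing delimiter, reads the modifiers, and reports any leftover non-whitespace as trailing junk.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of PPIx::Regexp. It splits a qr// or m//
# literal into operator, body, modifiers, and trailing junk. Bracketing
# delimiters such as {} or () are deliberately not handled in this sketch.
sub scan_regex_literal {
    my ( $source ) = @_;

    # Operator, then a single delimiter that is neither word nor space.
    $source =~ m/ \A \s* ( qr | m ) ( [^\w\s] ) /smxgc
        or return;
    my ( $operator, $delimiter ) = ( $1, $2 );

    # Walk forward one (possibly escaped) character at a time until the
    # first unescaped copy of the delimiter.
    my $body   = '';
    my $closed = 0;
    while ( $source =~ m/ \G ( \\ . | . ) /smxgc ) {
        if ( $1 eq $delimiter ) {
            $closed = 1;
            last;
        }
        $body .= $1;
    }
    return unless $closed;

    # Modifiers are the word characters right after the closing delimiter.
    $source =~ m/ \G ( \w* ) /smxgc;
    my $modifiers = $1;

    # Anything left, once surrounding whitespace is trimmed, is junk.
    my $trailing = substr( $source, pos( $source ) );
    $trailing =~ s/ \A \s+ | \s+ \z //smxg;

    return {
        operator  => $operator,
        body      => $body,
        modifiers => $modifiers,
        trailing  => $trailing,
        failures  => length( $trailing ) ? 1 : 0,
    };
}

my $result = scan_regex_literal(
    "qr/foo\n bar (?# first)\n baz  # second\n/x'\n" );
printf "modifiers=%s trailing=%s failures=%d\n",
    @{ $result }{ qw( modifiers trailing failures ) };
```

Run on the expression from this ticket, the sketch prints modifiers=x trailing=' failures=1, which mirrors the single failure and the Unknown token in the dump above. Because the closing delimiter is located before the body is interpreted, the stray quote can no longer be mistaken for the end of the regex.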
From: k-rindfrey [...] gmx.de
Thanks, looks good. Regards, Klaus