Bug #107331 for PPIx-Regexp: Trailing characters after valid expression produce obscure parse failures.

Fri Sep 25 15:51:24 2015 k-rindfrey [...] gmx.de - Ticket created

Subject:	/x mode end-of-line comments are not recognized
Date:	Fri, 25 Sep 2015 21:51:01 +0200
To:	bug-PPIx-Regexp [...] rt.cpan.org
From:	Klaus Rindfrey <k-rindfrey [...] gmx.de>

Hi Thomas,

I installed version 0.041 of PPIx::Regexp from CPAN. The documentation of PPIx::Regexp::Token::Comment says "This class represents a comment - both parenthesized comments (i.e. (?# this is a comment ) and the /x mode end-of-line comments."

But when I run this script:

#-----------------------------------
use strict;
use warnings;

use PPIx::Regexp;
use PPIx::Regexp::Dumper;

my $re = PPIx::Regexp->new(<<'EOT');
qr/foo
 bar (?# first)
 baz  # second
/x'
EOT

PPIx::Regexp::Dumper->new( $re )->print();
#-----------------------------------

only the "(?# first)" comment is recognized, not the "# second". The script prints:

#-----------------------------------
PPIx::Regexp failures=0
PPIx::Regexp::Token::Structure 'qr'
PPIx::Regexp::Structure::Regexp / ... '
PPIx::Regexp::Token::Literal 'f'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'r'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Comment '(?# first)'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'z'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal '#'
PPIx::Regexp::Token::Literal ' '
PPIx::Regexp::Token::Literal 's'
PPIx::Regexp::Token::Literal 'e'
PPIx::Regexp::Token::Literal 'c'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'n'
PPIx::Regexp::Token::Literal 'd'
PPIx::Regexp::Token::Literal '
'
PPIx::Regexp::Token::Literal '/'
PPIx::Regexp::Token::Literal 'x'
PPIx::Regexp::Token::Modifier ''
PPIx::Regexp::Token::Whitespace '
#-----------------------------------

I'm running perl v5.20.1.

Regards,
Klaus


Fri Sep 25 17:03:39 2015 k-rindfrey [...] gmx.de - Correspondence added

From:	k-rindfrey [...] gmx.de

Well, I just noticed there is an additional unintended character (a single quote) at the end of the regex. After removing it, the output is OK; both comments are correctly recognized. So you may simply close this issue. Obviously, PPIx::Regexp was confused by that error.

On Fri 25 Sep 2015, 15:51:24, k-rindfrey@gmx.de wrote:

> Hi Thomas,
>
> i installed version 0.041 of PPIx::Regexp from CPAN. Now, the docu of
> PPIx::Regexp::Token::Comment tells "This class represents a comment -
> both parenthesized comments (i.e. (?# this is a comment ) and the /x
> mode end-of-line comments."
>
> But running this script:
>
> #-----------------------------------
> use strict;
> use warnings;
>
> use PPIx::Regexp;
> use PPIx::Regexp::Dumper;
>
> my $re = PPIx::Regexp->new(<<'EOT');
> qr/foo
>  bar (?# first)
>  baz  # second
> /x'
> EOT
>
> PPIx::Regexp::Dumper->new( $re )->print();
> #-----------------------------------
>
> only the "(?# first)" comment is recognized, not the "# second". The
> script prints:
>
> #-----------------------------------
> PPIx::Regexp failures=0
> PPIx::Regexp::Token::Structure 'qr'
> PPIx::Regexp::Structure::Regexp / ... '
> PPIx::Regexp::Token::Literal 'f'
> PPIx::Regexp::Token::Literal 'o'
> PPIx::Regexp::Token::Literal 'o'
> PPIx::Regexp::Token::Literal '
> '
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Literal 'b'
> PPIx::Regexp::Token::Literal 'a'
> PPIx::Regexp::Token::Literal 'r'
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Comment '(?# first)'
> PPIx::Regexp::Token::Literal '
> '
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Literal 'b'
> PPIx::Regexp::Token::Literal 'a'
> PPIx::Regexp::Token::Literal 'z'
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Literal '#'
> PPIx::Regexp::Token::Literal ' '
> PPIx::Regexp::Token::Literal 's'
> PPIx::Regexp::Token::Literal 'e'
> PPIx::Regexp::Token::Literal 'c'
> PPIx::Regexp::Token::Literal 'o'
> PPIx::Regexp::Token::Literal 'n'
> PPIx::Regexp::Token::Literal 'd'
> PPIx::Regexp::Token::Literal '
> '
> PPIx::Regexp::Token::Literal '/'
> PPIx::Regexp::Token::Literal 'x'
> PPIx::Regexp::Token::Modifier ''
> PPIx::Regexp::Token::Whitespace '
> #-----------------------------------
>
> I'm running perl v5.20.1.
>
>
> Regards,
> Klaus

Sat Sep 26 10:52:07 2015 wyant [...] cpan.org - Correspondence added

Thank you very much for your report, and the follow-up. I think the documentation says the parse for invalid code is undefined, and if it does not it will. But before I close the ticket I will investigate whether there is a not-too-difficult way to actually get a parse failure in this case. Maybe there is none, but at least I can take a look.

Sat Sep 26 10:52:07 2015 The RT System itself - Status changed from 'new' to 'open'

Mon Sep 28 22:18:07 2015 wyant [...] cpan.org - Correspondence added

Subject:	Trailing characters after valid expression cause obscure parse failures

PPIx-Regexp version 0.041_01 went to PAUSE this afternoon. What I have done is beef up the initial scan so that it actually finds the matching brackets in all cases (it already has to for the regex in a s///), and makes any extra non-whitespace into a PPIx::Regexp::Token::Unknown. Your problematic expression now parses like this:

PPIx::Regexp failures=1
PPIx::Regexp::Token::Structure 'qr'
PPIx::Regexp::Structure::Regexp / ... /
PPIx::Regexp::Token::Literal 'f'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Literal 'o'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'r'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Comment '(?# first)'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Literal 'b'
PPIx::Regexp::Token::Literal 'a'
PPIx::Regexp::Token::Literal 'z'
PPIx::Regexp::Token::Whitespace ' '
PPIx::Regexp::Token::Comment '# second '
PPIx::Regexp::Token::Modifier 'x'
PPIx::Regexp::Token::Unknown '\' ' Trailing characters after expression
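The idea of an initial scan that finds the matching delimiter and then flags leftover non-whitespace can be illustrated with a small stand-alone sketch. This is not the code in PPIx::Regexp: the function scan_regexp() and its return structure are hypothetical, it only handles an optional qr or m operator, and it assumes that a backslash escapes the next character.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT PPIx::Regexp's implementation. It splits a
# quote-like regex string into operator, delimited body, modifiers, and
# any non-whitespace left over after the modifiers.
my %CLOSER = ( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );

sub scan_regexp {
    my ( $src ) = @_;

    # Optional operator (only qr and m here), then the opening delimiter.
    $src =~ m/ \A \s* (qr|m)? \s* (\S) /smx
        or return;
    my $op    = defined $1 ? $1 : '';
    my $open  = $2;
    my $close = exists $CLOSER{$open} ? $CLOSER{$open} : $open;
    my $start = $+[2];          # offset just past the opening delimiter

    # Walk forward to the matching closing delimiter, skipping escaped
    # characters and counting nested bracket pairs.
    my $depth = 1;
    my $pos   = $start;
    while ( $pos < length( $src ) ) {
        my $char = substr $src, $pos++, 1;
        if ( $char eq '\\' ) {
            $pos++;             # skip the escaped character
        } elsif ( $char eq $close ) {
            last if --$depth == 0;
        } elsif ( $char eq $open ) {
            $depth++;
        }
    }
    return if $depth;           # never found the closing delimiter

    my $body = substr $src, $start, $pos - $start - 1;

    # Letters directly after the delimiter are modifiers; whatever
    # non-whitespace follows them is trailing garbage.
    my ( $modifiers, $trailing ) =
        substr( $src, $pos ) =~ m/ \A ([[:alpha:]]*) \s* (.*?) \s* \z /smx;

    return {
        operator  => $op,
        body      => $body,
        modifiers => $modifiers,
        trailing  => $trailing,
    };
}

my $scan = scan_regexp( "qr/foo\n bar (?# first)\n baz  # second\n/x'\n" );
print "modifiers: $scan->{modifiers}\n";
print "trailing:  $scan->{trailing}\n" if length $scan->{trailing};
```

For the expression from this ticket the sketch reports the modifier x and a trailing single quote, which is the kind of leftover a parser can then turn into an error token instead of silently mis-parsing the whole expression.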

Mon Sep 28 22:18:13 2015 wyant [...] cpan.org - Status changed from 'open' to 'patched'

Mon Sep 28 22:18:14 2015 wyant [...] cpan.org - Fixed in 0.041_01 added

Mon Sep 28 22:19:38 2015 wyant [...] cpan.org - Subject changed from '/x mode end-of-line comments are not recognized' to 'Trailing characters after valid expression produce obscure parse failures.'

Tue Sep 29 13:08:31 2015 k-rindfrey [...] gmx.de - Correspondence added

From:	k-rindfrey [...] gmx.de

Thanks, looks good. Regards, Klaus

Fri Oct 16 14:32:48 2015 wyant [...] cpan.org - Status changed from 'patched' to 'resolved'

Fri Oct 16 14:32:54 2015 wyant [...] cpan.org - Severity Wishlist added
