This queue is for tickets about the PPR CPAN distribution.

Report information
The Basics
Id: 122824
Status: resolved
Priority: 0/
Queue: PPR

People
Owner: Nobody in particular
Requestors: unobe [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.000011
Fixed in: (no value)



Subject: Recognition failures: UTF-8 BOM, /n regex modifier, ${!}
Thank you so much for creating this module! I hope to make good use of it. I've noticed a few things while testing it against some code.

If a UTF-8 BOM is present, PPR reports a false failure. Wikipedia cites the Unicode standard as saying that a BOM should only appear at the beginning of a text stream, and that if it appears elsewhere it should be treated as a zero-width non-breaking space. I've attached a patch for the common and proper case (beginning of text stream) because that scratches my itch. Long-term, would it be better to change many of the instances of \s in the regexes to \p{Whitespace}?

If a /n regex modifier is present, that also causes a recognition failure. I've included a fix for that in the patch.

Also, some built-in Perl variables weren't being recognized when written in a non-normative but valid form (e.g., ${!}), so I've included a fix for that as well.
Subject: bom_slash_n_dollar_bang.patch
--- a/lib/perl5/site_perl/5.22.1/PPR.pm
+++ b/lib/perl5/site_perl/5.22.1/PPR.pm
@@ -62,7 +62,7 @@ use utf8;
 our $GRAMMAR = qr{
 (?(DEFINE)
 (?<PerlDocument>
- (?>(?&PerlOWS))
+ (\x{feff})?+ (?>(?&PerlOWS))
 (?: (?>(?&PerlStatement)) (?&PerlOWS) )*+
 ) # End of rule
 
@@ -820,6 +820,8 @@ our $GRAMMAR = qr{
 |
 [][!"#\$%&'()*+,.\\/:;<=>?\@\^`|~-]
 |
+ \{ [!"#\$%&'()*+,.\\/:;<=>?\@\^`|~-] \}
+ |
 \{ \w++ \}
 |
 (?&PerlBlock)
@@ -1098,7 +1100,7 @@ our $GRAMMAR = qr{
 (?>(?&PPR_quotelike_body_interpolated_unclosed))
 (?&PPR_quotelike_body_interpolated)
 )
- [msixpodualgcer]*+
+ [msixpodualgcern]*+
 ) # End of rule
 ) # End of rule
 
@@ -1143,7 +1145,7 @@ our $GRAMMAR = qr{
 )
 (?&PPR_quotelike_body_interpolated)
 )
- [msixpodualgc]*+
+ [msixpodualgcn]*+
 ) # End of rule
 ) # End of rule
 (?=
@@ -1160,7 +1162,7 @@ our $GRAMMAR = qr{
 qr \b
 (?> (?= [#] ) | (?! (?>(?&PerlOWS)) => ) )
 (?>(?&PPR_quotelike_body_interpolated))
- [msixpodual]*+
+ [msixpodualn]*+
 ) # End of rule
 
 (?<PerlRegex>
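For readers who want to see why the three constructs named in the report are valid Perl, here is a minimal sketch using core Perl only. It does not load PPR or exercise its grammar. The variable names are illustrative, and the BOM-stripping substitution only mirrors the spirit of the patch's `(\x{feff})?+` prefix rather than reproducing the module's behavior.

```perl
use 5.022;    # the /n regex modifier requires Perl 5.22 or later
use strict;
use warnings;

# 1. A leading BOM (U+FEFF) is not matched by \s, so a rule that only
#    skips optional whitespace at the start of a document will not consume it.
my $source         = "\x{FEFF}print 1;\n";
my $starts_with_ws = $source =~ /\A\s/ ? 1 : 0;

# An optional, possessive BOM prefix (hypothetical stand-alone equivalent
# of the patched PerlDocument rule) removes it when present.
(my $stripped = $source) =~ s/\A\x{FEFF}?+//;

# 2. With /n, plain parentheses group without capturing, so $1 stays undefined.
my $captured_with_n    = 'foobar' =~ /(foo)bar/n ? $1 : undef;
my $captured_without_n = 'foobar' =~ /(foo)bar/  ? $1 : undef;

# 3. ${!} is the braced spelling of $! and refers to the same variable.
$! = 2;    # arbitrary errno value, chosen only for the comparison
my $same_errno = ( ${!} + 0 ) == ( $! + 0 ) ? 1 : 0;

printf "BOM matched by \\s: %d\n", $starts_with_ws;
printf "capture with /n defined: %d\n", defined $captured_with_n ? 1 : 0;
printf "\${!} equals \$!: %d\n", $same_errno;
```

Each of the three checks corresponds to one hunk group in the patch: the optional BOM at the start of the document, the extra `n` in the modifier character classes, and the braced punctuation-variable alternative.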
Subject: Re: [rt.cpan.org #122824] Recognition failures: UTF-8 BOM, /n regex modifier, ${!}
Date: Thu, 17 Aug 2017 07:05:41 +0000
To: bug-PPR [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Hi David,

Thanks for the bug reports...and even more so for the patches! :-)

I've now applied them all for the next release...which may be a few days yet as I'm still on the road, and still working on a particularly nasty corner case with parsing quotelikes.

> Long-term, would it be better to modify many of the instances
> of \s in the regexes to \p{Whitespace}?

I'm not sure that would solve the problem, as I believe the BOM isn't actually included in the Unicode \p{Whitespace} property. Even if it is, even in the latest Perl release:

    "\x{FEFF}" =~ /\p{Whitespace}/

doesn't match.

I might need to look at using [\p{Whitespace}\p{Cf}] instead. I'll need to look at whether that introduces a detectable performance hit though.

And one might argue that non-leading BOMs *ought* to be invalid. :-)

Much appreciated,

Damian
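The property check described in the reply can be reproduced with a short core-Perl sketch. It assumes only that U+FEFF has the Unicode general category Cf (Format) and is not part of the White_Space property. The variable names are illustrative and nothing here measures the performance concern raised above.

```perl
use strict;
use warnings;

my $bom = "\x{FEFF}";

# U+FEFF is not in the Unicode White_Space property...
my $is_whitespace = $bom =~ /\p{Whitespace}/ ? 1 : 0;

# ...but its general category is Cf (Format), so \p{Cf} matches it.
my $is_format = $bom =~ /\p{Cf}/ ? 1 : 0;

# The combined class suggested in the reply therefore accepts both
# ordinary whitespace and the BOM.
my $bom_in_combined   = $bom =~ /\A[\p{Whitespace}\p{Cf}]\z/ ? 1 : 0;
my $space_in_combined = ' '  =~ /\A[\p{Whitespace}\p{Cf}]\z/ ? 1 : 0;

printf "Whitespace: %d, Cf: %d, combined (BOM): %d, combined (space): %d\n",
    $is_whitespace, $is_format, $bom_in_combined, $space_in_combined;
```

Note that \p{Cf} covers other format characters too (for example, zero-width joiners), so the combined class is broader than "whitespace plus BOM".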
Subject: Re: [rt.cpan.org #122824] Recognition failures: UTF-8 BOM, /n regex modifier, ${!}
Date: Mon, 21 Aug 2017 17:43:51 +0000
To: bug-PPR [...] rt.cpan.org
From: Damian Conway <damian [...] conway.org>
Resolved in the latest release (0.000012).

Thanks again,

Damian