Subject: | Recognition failures: UTF-8 BOM, /n regex modifier, ${!} |
Thank you so much for creating this module! I hope to make good use of it. While testing it against some existing code, I noticed three recognition failures.
If a UTF-8 BOM (U+FEFF) is present, the document fails to match even though it is otherwise valid. Wikipedia, citing the Unicode Standard, says the BOM should only appear at the beginning of a text stream; if U+FEFF appears elsewhere, it should be treated as a zero-width non-breaking space. I've attached a patch for the common and proper case (BOM at the beginning of the text stream) because that scratches my itch. Long-term, would it be better to change many of the instances of \s in the regexes to \p{Whitespace}?
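One thing worth checking before acting on the \s question above: as far as I can tell, U+FEFF is not part of the Unicode White_Space property, so neither \s nor \p{White_Space} would match it. The sketch below checks that on a decoded string and shows a workaround that drops a single leading BOM before parsing. It assumes the report's \p{Whitespace} means the White_Space property; strip_leading_bom is a hypothetical helper, not part of PPR.

```perl
use strict;
use warnings;

# U+FEFF (BOM / zero-width non-breaking space) as a decoded character.
my $bom = "\x{FEFF}";

# Check whether either whitespace class matches the BOM character.
my $matches_s  = ($bom =~ /\A\s\z/)              ? 1 : 0;
my $matches_ws = ($bom =~ /\A\p{White_Space}\z/) ? 1 : 0;

# Hypothetical helper: remove one leading BOM, leave everything else alone.
sub strip_leading_bom {
    my ($source) = @_;
    $source =~ s/\A\x{FEFF}//;
    return $source;
}

my $clean = strip_leading_bom("${bom}print 1;\n");

print "\\s matches U+FEFF: $matches_s\n";
print "\\p{White_Space} matches U+FEFF: $matches_ws\n";
print "cleaned source: $clean";
```

If both checks report 0, widening \s would not cover the BOM, and an explicit \x{FEFF} (as in the patch) or a pre-pass like the one above would still be needed.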
A /n regex modifier (new in Perl 5.22) also causes a recognition failure. The patch adds n to the accepted flags for matches, substitutions, and qr//.
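For anyone who has not met /n yet, here is a minimal sketch of what it does, assuming Perl 5.22 or later. Plain parentheses stop capturing, while named groups still capture. The variable names are illustrative only.

```perl
use v5.22;      # /n was introduced in Perl 5.22
use warnings;

my ($plain, $with_n, $named);

# Ordinary match: the parentheses capture into $1.
if ("hello" =~ /(hi|hello)/)  { $plain  = $1 }

# With /n the same parentheses only group, so $1 is not set.
if ("hello" =~ /(hi|hello)/n) { $with_n = $1 }

# Named groups still capture under /n.
if ("hello" =~ /(?<word>hi|hello)/n) { $named = $+{word} }

say "plain:  ", $plain  // "undef";
say "with n: ", $with_n // "undef";
say "named:  ", $named  // "undef";
```

Code like this is what currently fails to be recognized, since the flag character classes in the grammar do not include n.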
Also, some built-in Perl variables weren't being recognized when written in a less common but valid form (e.g., ${!}), so I've included a patch for that as well.
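To illustrate the braced form, the sketch below treats ${!} as another spelling of $!, both in an expression and inside an interpolated string. The errno value 2 is an arbitrary choice for the demonstration.

```perl
use strict;
use warnings;

# Set errno to a known value; no system call happens in between.
$! = 2;

# Numeric value through both spellings.
my $plain_num  = $!   + 0;
my $braced_num = ${!} + 0;

# String (message) value through both spellings, via interpolation.
my $plain_text  = "$!";
my $braced_text = "${!}";

print "numeric: $plain_num vs $braced_num\n";
print "same message text: ", ($plain_text eq $braced_text ? "yes" : "no"), "\n";
```

The braced spelling tends to appear inside strings, where the braces keep the variable separate from the text that follows it.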
Subject: | bom_slash_n_dollar_bang.patch |
--- a/lib/perl5/site_perl/5.22.1/PPR.pm
+++ b/lib/perl5/site_perl/5.22.1/PPR.pm
@@ -62,7 +62,7 @@ use utf8;
our $GRAMMAR = qr{
(?(DEFINE)
(?<PerlDocument>
- (?>(?&PerlOWS))
+ (\x{feff})?+ (?>(?&PerlOWS))
(?: (?>(?&PerlStatement)) (?&PerlOWS) )*+
) # End of rule
@@ -820,6 +820,8 @@ our $GRAMMAR = qr{
|
[][!"#\$%&'()*+,.\\/:;<=>?\@\^`|~-]
|
+ \{ [!"#\$%&'()*+,.\\/:;<=>?\@\^`|~-] \}
+ |
\{ \w++ \}
|
(?&PerlBlock)
@@ -1098,7 +1100,7 @@ our $GRAMMAR = qr{
(?>(?&PPR_quotelike_body_interpolated_unclosed))
(?&PPR_quotelike_body_interpolated)
)
- [msixpodualgcer]*+
+ [msixpodualgcern]*+
) # End of rule
) # End of rule
@@ -1143,7 +1145,7 @@ our $GRAMMAR = qr{
)
(?&PPR_quotelike_body_interpolated)
)
- [msixpodualgc]*+
+ [msixpodualgcn]*+
) # End of rule
) # End of rule
(?=
@@ -1160,7 +1162,7 @@ our $GRAMMAR = qr{
qr \b
(?> (?= [#] ) | (?! (?>(?&PerlOWS)) => ) )
(?>(?&PPR_quotelike_body_interpolated))
- [msixpodual]*+
+ [msixpodualn]*+
) # End of rule
(?<PerlRegex>