This queue is for tickets about the PPIx-Regexp CPAN distribution.

Report information
The Basics
Id: 91798
Status: resolved
Priority: 0/
Queue: PPIx-Regexp

People
Owner: Nobody in particular
Requestors: monmon [...] cpan.org
Cc: lesamoureuses [...] gmail.com
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.033
Fixed in: 0.036



CC: lesamoureuses [...] gmail.com
Subject: perl_version_introduced bug
Japanese Katakana "ム" is represented by the octal code "\343\203\240". When it is used with /x, the perl_version_introduced value is not right.

===
use strict; use warnings;
use PPIx::Regexp;
my $re = PPIx::Regexp->new("qr/\343\203\240/x");
print $re->perl_version_introduced, "\n";    #=> 5.017009
# With PPIx-Regexp-0.032, the result is 5.005.
===

"\343\203\240" is represented in hexadecimal as "\x{E3}\x{83}\x{A0}". The last byte, "\x{A0}", is interpreted as whitespace.

===
# use Data::Dumper
'children' => [
    bless( {
        'content' => "\343"
    }, 'PPIx::Regexp::Token::Literal' ),
    bless( {
        'content' => "\203"
    }, 'PPIx::Regexp::Token::Literal' ),
    bless( {
        'perl_version_introduced' => '5.017009',
        'content' => "\240"
    }, 'PPIx::Regexp::Token::Whitespace' )
],
===
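The byte arithmetic in the report can be checked with a short sketch that uses only the core Encode module. It shows that the three octal escapes are the UTF-8 encoding of a single character, U+30E0 KATAKANA LETTER MU, so "\x{A0}" is only the final continuation byte of that character and not a character in its own right.

===
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\343\203\240";    # the three octets from the report

# Show each octet in hexadecimal: E3 83 A0
my @hex = map { sprintf '%02X', ord } split //, $bytes;
print "@hex\n";

# Decoding the octets as UTF-8 yields exactly one character.
my $char = decode('UTF-8', $bytes);
printf "%d character(s), U+%04X\n", length $char, ord $char;   # 1 character(s), U+30E0
===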
On Thu Jan 02 12:45:33 2014, MONMON wrote: [...]
Thank you for your report. It seems to contain all sorts of interesting things.

The first is the perl_version_introduced thing. I believe the correct response is 5.005, because that is when 'qr{}' was introduced. And that is the result produced by the demonstration program eg/predump, which I rely on heavily for troubleshooting. But when I copy and paste your code into a stand-alone Perl script, I get 5.017009, just as you do, and it is far from obvious to me why. The information that it worked correctly in 0.032 is valuable, because it means I can investigate based on the changes between the two versions.

The second thing is that it looks to me to be desirable for PPIx::Regexp to parse the content of the regexp as a single Unicode character, rather than as three escape sequences. I am not sure how to make that happen, since one of the requirements for the module is that it NOT eval() strings. For the moment it will have to just go on the wish list.
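A minimal sketch of why a single misclassified token can change the answer for the whole regexp. It assumes, as the Data::Dumper output suggests, that the regexp reports the highest version required by any of its elements. The hash and its labels are hypothetical; only the two version numbers (5.005 for qr{}, 5.017009 for the Whitespace token) come from this ticket, and the literal tokens are left out for brevity.

===
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical per-element requirements, using the versions from this ticket.
my %needs = (
    'qr{} operator'                => 5.005,
    'Whitespace token for "\\240"' => 5.017009,
);

# Assumed aggregation rule: the regexp needs the newest version
# required by any of its parts.
my $overall = max values %needs;
print "$overall\n";    # 5.017009

# If "\240" is a Literal instead, only qr{} constrains the result.
delete $needs{'Whitespace token for "\\240"'};
my $fixed = max values %needs;
print "$fixed\n";      # 5.005
===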
It took me a while to understand what was going on, but I think I eventually got it.

The basic problem was that I should not have implemented the change described as "Allow non-ASCII white space under /x." I made that change because I misunderstood what perl5179delta said was happening with non-ASCII white space.

Also, I was using \s to decide whether to bless tokens into PPIx::Regexp::Token::Whitespace rather than PPIx::Regexp::Token::Literal. But \s matches too much: in the code installed in version 0.033, it was the \s that matched "\240". So the \s has been replaced by an explicit character class, whose contents were verified both against the documentation and by actually reading regcomp.c.

These changes are in version 0.036, which has just gone to PAUSE and should appear on CPAN mirrors in a few hours. I will leave the RT ticket open for a week or so, and then close it if there are no further problems.
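A sketch of the difference between \s and an explicit class. The class below is an assumption: it lists the characters perlre documents as ignored under /x (Unicode Pattern_White_Space), and is not necessarily the class actually committed in 0.036. The /u modifier is used so that \s follows Unicode rules regardless of how the string is stored; under the default rules, whether \s matches "\240" depends on the string's internal representation.

===
use strict;
use warnings;

# Assumed explicit class: Pattern_White_Space as listed in perlre,
# i.e. \t \n VT \f \r SPACE, NEL, LRM, RLM, LINE SEP, PARAGRAPH SEP.
my $pattern_ws = qr/[\t\n\x{0B}\f\r \x{85}\x{200E}\x{200F}\x{2028}\x{2029}]/;

my $nbsp = "\240";    # trailing byte of "\343\203\240", U+00A0 NO-BREAK SPACE

# Under Unicode rules, \s matches NO-BREAK SPACE ...
my $by_s = $nbsp =~ /\A\s\z/u ? 1 : 0;

# ... but NO-BREAK SPACE is not Pattern_White_Space.
my $by_class = $nbsp =~ /\A$pattern_ws\z/ ? 1 : 0;

print "\\s: $by_s, explicit class: $by_class\n";    # \s: 1, explicit class: 0
===

Because U+00A0 is outside a class like this one, the last byte of "\343\203\240" stays a literal instead of being blessed into PPIx::Regexp::Token::Whitespace.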
RT-Send-CC: lesamoureuses [...] gmail.com
Thank you so much for looking into this issue! It has been resolved!

On Sat Jan 04 18:22:54 2014, WYANT wrote: [...]