Subject: Escaped multi-character delimiters not recognised in balanced.pm
Date: Mon, 13 Mar 2017 16:56:16 +0100
To: bug-Regexp-Common [...] rt.cpan.org
From: BPJ <bpj [...] melroch.se>
Consider these examples:
    say my $re = $RE{balanced}{-begin=>'<<'}{-end=>'>>'};
    # prints: (?^:((?:\<\<(?:(?>[^\<\>]+)|\<(?!\<)|\>(?!\>)|(?-1))*\>\>)))
    '<< 3 > 1 >>' =~ $re;
    say $1;
    # prints: << 3 > 1 >>
    say $re = $RE{balanced}{-begin=>'\<\<'}{-end=>'\>\>'};
    # prints: (?^:((?:\<(?:(?>[^\<\>]+)|(?-1))*\>)))
    '<< 3 > 1 >>' =~ $re;
    say $1;
    # prints: << 3 > 1 >
The difference is that in the second example the delimiters are
backslash-escaped (in my actual code they were variables passed
through `quotemeta`), and suddenly the delimiters are taken to be
`<` and `>` rather than `<<` and `>>`! If this really is the
intended behavior then the documentation should at least mention
it. I have tracked it down to lines 22 and 24 in balanced.pm.
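If the parentheses in `/([^|\\]+|\\.)+/gs` were made non-capturing,
the behavior would be the same as when the delimiters are not
escaped. As written, the repeated capturing group keeps only its
last repetition, and a `/g` match in list context returns the
captures rather than the whole matches, so `\<\<` comes back as
just `\<`. Without the capture, `/g` still returns a list of all
the pipe-delimited substrings.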
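
The effect can be reproduced with core Perl alone. This is a minimal sketch, not the actual code from balanced.pm: only the regex is taken from the module, and the variable names `$delims`, `@capturing` and `@non_capturing` are made up for the illustration.

```perl
use strict;
use warnings;
use feature 'say';

# The four-character string \<\< , i.e. what quotemeta('<<') returns.
my $delims = '\<\<';

# Capturing group: in list context /g returns $1 for each match,
# and $1 holds only the LAST repetition of the repeated group.
my @capturing = $delims =~ /([^|\\]+|\\.)+/gs;

# Non-capturing group: /g returns each whole match instead.
my @non_capturing = $delims =~ /(?:[^|\\]+|\\.)+/gs;

say "capturing:     @capturing";        # \<
say "non-capturing: @non_capturing";    # \<\<
```

With the unescaped `'<<'` both forms return `<<`, because `[^|\\]+` consumes the whole delimiter in a single repetition. That is why the problem only shows up once the delimiters are escaped.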
I realize that even if this behavior is strange -- it makes it
impossible to use delimiters which contain a backslash or a pipe --
it seems to have always been this way, so it might be too late to
change it. But perhaps a flag such as `-allow_escapes` would be
possible?