
This queue is for tickets about the Regexp-Assemble CPAN distribution.

Report information
The Basics
Id: 24171
Status: resolved
Priority: 1/
Queue: Regexp-Assemble

People
Owner: dland [...] cpan.org
Requestors: book [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.28
Fixed in: 0.31



Subject: \s and \S should not collapse into . (nor should \w / \W or \d / \D)
Hi David,

While reading the Regexp::Assemble documentation, I saw the following paragraph:

    It also knows about meta-characters than can "absorb" regular
    characters. For instance, given "X\d" and "X5", it knows that 5 can
    be represented by "\d" and so the assembly is just "X\d". The
    "absorbent" meta-characters it deals with are ".", "\d", "\s" and
    "\W" and their complements. It will replace "\d"/"\D", "\s"/"\S" and
    "\w"/"\W" by "." (dot), and it will drop "\d" if "\w" is also present
    (as will "\D" in the presence of "\W").

The fact that '\s' and '\S' are merged into '.' looks like a bug to me, as the following test script shows:

    use strict;
    use warnings;
    use Test::More;
    use Regexp::Assemble;

    # given a list of strings
    my @str = ( 'a b', 'awb', 'a1b', 'bar', "a\nb" );

    plan tests => 3 * @str;

    for my $meta (qw( s w d )) {

        # given a list of patterns
        my @re = ( "a\\${meta}b", "a\\@{[uc$meta]}b" );

        # produce an assembled pattern
        my $re = Regexp::Assemble->new()->add(@re)->re();

        # test it against the strings
        for my $str (@str) {

            # any match?
            my $ok = '';
            $str =~ $_ && ( $ok = 1 ) for @re;

            # does the assembled regexp match as well?
            my $ptr = $str;
            $ptr =~ s/\\/\\\\/;
            $ptr =~ s/\n/\\n/;
            is( $str =~ $re, $ok,
                "Assembled regexp behaves as the list for \\$meta ($ptr)" );
        }
    }

The execution produces (under Win32 and Linux):

    1..15
    ok 1 - Assembled regexp behaves as the list for \s (a b)
    ok 2 - Assembled regexp behaves as the list for \s (awb)
    ok 3 - Assembled regexp behaves as the list for \s (a1b)
    ok 4 - Assembled regexp behaves as the list for \s (bar)
    not ok 5 - Assembled regexp behaves as the list for \s (a\nb)
    #     Failed test (ra.pl at line 30)
    #          got: ''
    #     expected: '1'
    ok 6 - Assembled regexp behaves as the list for \w (a b)
    ok 7 - Assembled regexp behaves as the list for \w (awb)
    ok 8 - Assembled regexp behaves as the list for \w (a1b)
    ok 9 - Assembled regexp behaves as the list for \w (bar)
    not ok 10 - Assembled regexp behaves as the list for \w (a\nb)
    #     Failed test (ra.pl at line 30)
    #          got: ''
    #     expected: '1'
    ok 11 - Assembled regexp behaves as the list for \d (a b)
    ok 12 - Assembled regexp behaves as the list for \d (awb)
    ok 13 - Assembled regexp behaves as the list for \d (a1b)
    ok 14 - Assembled regexp behaves as the list for \d (bar)
    not ok 15 - Assembled regexp behaves as the list for \d (a\nb)
    #     Failed test (ra.pl at line 30)
    #          got: ''
    #     expected: '1'
    # Looks like you failed 3 tests of 15.

This simply shows that '.' is not the same as the assembly of \s and \S (nor \w and \W, nor \d and \D), when one is not using the /s flag.

The patch is to produce '(?:.|\n)' instead of '.' when you cannot be sure that the /s flag is enabled in the resulting regexp. In my opinion, the only case when you can replace such a combination with a '.' is when /s is explicitly set with the (?s:) construct. I don't know how it works on different platforms that have different interpretations for "\n".

Regards (and happy new year nonetheless),

-- 
BooK
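The same effect can be seen without Regexp::Assemble at all. The minimal sketch below uses core Perl only and assumes no /s modifier is in effect outside the patterns themselves. It checks the three alternations from the report, plus a few candidate single-pattern replacements, against the string "a\nb". The hash name %matches and the candidate list are illustrative choices, not taken from the module.

```perl
use strict;
use warnings;

my $subject = "a\nb";
my %matches;    # pattern source => 1 if it matches $subject, else 0

# The patterns from the report: each complementary pair covers every
# character, newline included.
my @patterns = map { "a(?:\\$_|\\" . uc($_) . ')b' } qw( s w d );

# Candidate single-pattern replacements for those alternations.
push @patterns,
    'a.b',           # plain dot: no match on "\n" without /s
    'a(?:.|\n)b',    # the replacement suggested in this ticket
    'a[\s\S]b',      # an equivalent character class
    'a(?s:.)b';      # dot with /s enabled locally

for my $re (@patterns) {
    $matches{$re} = $subject =~ /$re/ ? 1 : 0;
    printf "%-12s %s\n", $re, $matches{$re} ? 'match' : 'no match';
}
```

When no /s is in effect, only the plain 'a.b' line should report "no match". That is the discrepancy the test script above exposes.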
On Mon Jan 01 05:16:11 2007, BOOK wrote:
> Hi David,
[...]
> This simply shows that '.' is not the same as the assembly of \s and \S
> (nor \w and \W, nor \d and \D), when one is not using the /s flag.
>
> The patch is to produce '(?:.|\n)' instead of '.' when you cannot be sure
> that the /s flag is enabled in the resulting regexp. In my opinion,
> the only case when you can replace such a combination with a '.' is when
> /s is explicitly set with the (?s:) construct. I don't know how it
> works on different platforms that have different interpretations for
> "\n".
Hi Philippe,

Good catch, you are absolutely correct. I'll fix this up, probably by making the collapse to '.' an option that people can enable if they know their data, off by default.

Thanks,
David
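The planned behaviour, with the collapse to '.' as an opt-in that is off by default, can be sketched roughly as follows. The helper collapse_atoms and its dot_matches_all option are hypothetical names used only for illustration. They are not part of the Regexp::Assemble API. The sketch also handles only a flat list of single-character atoms found at one position.

```perl
use strict;
use warnings;

# Complementary meta-character pairs, mapped in both directions.
my %complement = ( '\d' => '\D', '\s' => '\S', '\w' => '\W' );
$complement{ $complement{$_} } = $_ for keys %complement;

# Hypothetical helper (not Regexp::Assemble's API): collapse the atoms
# found at one position. An atom together with its complement covers
# every character, newline included.
sub collapse_atoms {
    my ( $atoms, %opt ) = @_;
    my %seen = map { $_ => 1 } @$atoms;
    for my $atom (@$atoms) {
        my $other = $complement{$atom};
        next unless defined $other && $seen{$other};
        # "." is only equivalent if the caller vouches that /s applies
        # (or that the data has no newlines); otherwise emit a form that
        # also matches "\n" whatever flags the final pattern uses.
        return $opt{dot_matches_all} ? '.' : '(?:.|\n)';
    }
    return '(?:' . join( '|', @$atoms ) . ')';
}

print collapse_atoms( [ '\s', '\S' ] ), "\n";                        # (?:.|\n)
print collapse_atoms( [ '\s', '\S' ], dot_matches_all => 1 ), "\n";  # .
print collapse_atoms( [ '\d', 'x' ] ), "\n";                         # (?:\d|x)
```

Keeping '(?:.|\n)' as the default is the conservative choice. The result matches exactly what the original alternation matched, with or without /s. The shorter '.' is emitted only when the caller explicitly asks for it.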