Subject: \d matches more than [0-9] (unicode)
Here is a test case:
$ perl -MRegexp::Assemble -C -E 'say Regexp::Assemble->new->add(qw(0 1 2 3 4 5 6 7 8 9))->as_string'
Output of R::A 0.34:
\d
This is wrong because \d matches more than [0-9]: it matches any Unicode
decimal digit, including digits from scripts other than Latin.
For example, \x{0966} (DEVANAGARI DIGIT ZERO) is matched by \d:
$ perl -C -E 'say "Matched! \x{0966}" if "\x{0966}" =~ /^\d$/'
The Java API documentation has a list of ranges of Unicode digits:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Character.html#isDigit%28char%29
--
Olivier Mengué - http://o.mengue.free.fr/