Bug #84547 for MooseX-Types-Common: regexps for upper and lower case are anglocentric

Wed Apr 10 03:50:44 2013 perl [...] toby.ink - Ticket created

Subject:

regexps for upper and lower case are anglocentric

Using \p{Upper} and \p{Lower} would be an improvement. Test case attached.

Subject:

not-just-a-z.t

=pod

=encoding utf-8

=head1 PURPOSE

At the time of writing, MooseX::Types::Common::String uses regular
expressions featuring C<< [a-z] >> and C<< [A-Z] >> to test lower- and
upper-caseness. This is very anglocentric - there are many, many other
lower- and upper-case characters commonly used in other languages.

The current situation is not even sufficient for English text, where many
loan words, and even some native words, include accented characters. These
include I<< café >>, I<< encyclopædia >> and I<< naïve >>.

There's no excuse for this; Perl has very good Unicode support, including
built-in character classes for matching lower- and upper-case characters.

=head1 AUTHOR

Toby Inkster E<lt>tobyink@cpan.orgE<gt>.

=head1 COPYRIGHT AND LICENCE

This software is copyright (c) 2013 by Toby Inkster.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut

use strict;
use warnings;
use utf8;
use Test::More;

use MooseX::Types::Common::String -all;

ok(  is_UpperCaseStr('CAFÉ'), q[CAFÉ is uppercase] );
ok( !is_UpperCaseStr('CAFé'), q[CAFé is not (entirely) uppercase] );

ok(  is_LowerCaseStr('café'), q[café is lowercase] );
ok( !is_LowerCaseStr('cafÉ'), q[cafÉ is not (entirely) lowercase] );

ok(  is_UpperCaseSimpleStr('CAFÉ'), q[CAFÉ is uppercase] );
ok( !is_UpperCaseSimpleStr('CAFé'), q[CAFé is not (entirely) uppercase] );

ok(  is_LowerCaseSimpleStr('café'), q[café is lowercase] );
ok( !is_LowerCaseSimpleStr('cafÉ'), q[cafÉ is not (entirely) lowercase] );

done_testing;

Wed Apr 10 10:47:06 2013 ether [...] cpan.org - Correspondence added

On Wed Apr 10 00:50:44 2013, TOBYINK wrote:

> Using \p{Upper} and \p{Lower} would be an improvement.

Agreed, although this would bump the minimum perl requirement to whatever version added these regexp character classes.

Wed Apr 10 10:47:06 2013 The RT System itself - Status changed from 'new' to 'open'

Wed Apr 10 10:47:06 2013 ether [...] cpan.org - Status changed from 'open' to 'patched'

Wed Apr 10 10:47:06 2013 ether [...] cpan.org - Taken

Wed Apr 10 17:12:01 2013 perl [...] toby.ink - Correspondence added

\p{Upper} and \p{Lower} work in Perl 5.6.2, which Moose already does not support. I'm not claiming they work as well there as they do under more modern releases, but Unicode support seems good enough to correctly detect E with an acute accent as upper- or lower-case.
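The difference discussed above can be illustrated with a minimal sketch. The four helper subs below are hypothetical and are not taken from MooseX::Types::Common::String; they assume that "upper-case" means "contains no lower-case letter" (and vice versa), which may not match how the module defines its types.

```perl
use strict;
use warnings;
use utf8;    # the string literals below contain non-ASCII characters

# Hypothetical helpers for illustration only (not the module's code).
# Assumption: a string is "upper-case" if it contains no lower-case letter.
sub is_upper_ascii   { my ($s) = @_; return $s !~ /[a-z]/      ? 1 : 0 }
sub is_upper_unicode { my ($s) = @_; return $s !~ /\p{Lower}/  ? 1 : 0 }

# Assumption: a string is "lower-case" if it contains no upper-case letter.
sub is_lower_ascii   { my ($s) = @_; return $s !~ /[A-Z]/      ? 1 : 0 }
sub is_lower_unicode { my ($s) = @_; return $s !~ /\p{Upper}/  ? 1 : 0 }

binmode STDOUT, ':encoding(UTF-8)';

for my $s ('CAFÉ', 'CAFé', 'café', 'cafÉ') {
    printf "%s: ascii upper=%d lower=%d, unicode upper=%d lower=%d\n",
        $s,
        is_upper_ascii($s),   is_lower_ascii($s),
        is_upper_unicode($s), is_lower_unicode($s);
}
```

Under these assumptions the ASCII-range checks accept 'CAFé' as upper-case and 'cafÉ' as lower-case, because é and É fall outside [a-z] and [A-Z]. The \p{Lower} and \p{Upper} versions reject both.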

Sat Sep 14 18:08:08 2013 ether [...] cpan.org - Correspondence added

thanks, will be released shortly.

Sat Sep 14 18:08:09 2013 ether [...] cpan.org - Status changed from 'patched' to 'resolved'

Thu Oct 23 19:05:46 2014 ether [...] cpan.org - Broken in 0.001001 deleted

Thu Oct 23 19:05:46 2014 ether [...] cpan.org - Broken in 0.001002 deleted

Thu Oct 23 19:05:46 2014 ether [...] cpan.org - Broken in 0.001003 deleted

Thu Oct 23 19:05:47 2014 ether [...] cpan.org - Broken in 0.001004 deleted

Thu Oct 23 19:05:47 2014 ether [...] cpan.org - Broken in 0.001005 deleted

Thu Oct 23 19:05:47 2014 ether [...] cpan.org - Broken in 0.001006 deleted

Thu Oct 23 19:05:47 2014 ether [...] cpan.org - Broken in 0.001007 deleted

Thu Oct 23 19:05:48 2014 ether [...] cpan.org - Broken in 0.001008 deleted

Thu Oct 23 19:05:48 2014 ether [...] cpan.org - Fixed in 0.001009 added