
This queue is for tickets about the MooseX-Types-Common CPAN distribution.

Report information
The Basics
Id: 84547
Status: resolved
Priority: 0
Queue: MooseX-Types-Common

People
Owner: ether [...] cpan.org
Requestors: perl [...] toby.ink
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.001000
Fixed in: 0.001009



Subject: regexps for upper and lower case are anglocentric
Using \p{Upper} and \p{Lower} would be an improvement. Test case attached.
Subject: not-just-a-z.t
=pod

=encoding utf-8

=head1 PURPOSE

At the time of writing, MooseX::Types::Common::String uses regular
expressions featuring C<< [a-z] >> and C<< [A-Z] >> to test lower- and
upper-caseness. This is very anglocentric - there are many, many other
lower- and upper-case characters commonly used in other languages.

The current situation is not even sufficient for English text where many
loan words, and even some native words include accented characters. These
include I<< café >>, I<< encyclopædia >> and I<< naïve >>.

There's no excuse for this; Perl has very good Unicode support, including
built-in character classes for matching lower- and upper-case characters.

=head1 AUTHOR

Toby Inkster E<lt>tobyink@cpan.orgE<gt>.

=head1 COPYRIGHT AND LICENCE

This software is copyright (c) 2013 by Toby Inkster.

This is free software; you can redistribute it and/or modify it under
the same terms as the Perl 5 programming language system itself.

=cut

use strict;
use warnings;
use utf8;
use Test::More;

use MooseX::Types::Common::String -all;

# Plain string types
ok(  is_UpperCaseStr('CAFÉ'), q[CAFÉ is uppercase] );
ok( !is_UpperCaseStr('CAFé'), q[CAFé is not (entirely) uppercase] );

ok(  is_LowerCaseStr('café'), q[café is lowercase] );
ok( !is_LowerCaseStr('cafÉ'), q[cafÉ is not (entirely) lowercase] );

# "Simple" string types
ok(  is_UpperCaseSimpleStr('CAFÉ'), q[CAFÉ is uppercase] );
ok( !is_UpperCaseSimpleStr('CAFé'), q[CAFé is not (entirely) uppercase] );

ok(  is_LowerCaseSimpleStr('café'), q[café is lowercase] );
ok( !is_LowerCaseSimpleStr('cafÉ'), q[cafÉ is not (entirely) lowercase] );

done_testing;
On Wed Apr 10 00:50:44 2013, TOBYINK wrote:
> Using \p{Upper} and \p{Lower} would be an improvement.
Agreed, although this would bump the minimum perl requirement to whatever version added these regexp character classes.
\p{Upper} and \p{Lower} work in Perl 5.6.2, which Moose already does not support. I'm not claiming they work as well there as they do under more modern releases, but Unicode support seems sufficiently good to detect E with an acute accent as upper- or lower-case correctly.
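For illustration, here is a minimal sketch of the difference under discussion. It assumes "upper case" means "contains no lower-case character"; the helper names is_upper_ascii and is_upper_unicode are hypothetical and are not the distribution's actual code.

```perl
use strict;
use warnings;
use utf8;    # the string literals below contain non-ASCII characters

# Hypothetical helpers for illustration only; not taken from
# MooseX::Types::Common::String.
# Assumption: a string is "upper case" if it has no lower-case character.

# ASCII-only check: only a-z count as lower case.
sub is_upper_ascii {
    my ($s) = @_;
    return $s !~ /[a-z]/ ? 1 : 0;
}

# Unicode-aware check: any character with the Lowercase property counts.
sub is_upper_unicode {
    my ($s) = @_;
    return $s !~ /\p{Lower}/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $word ( 'CAFÉ', 'CAFé' ) {
    printf "%s: ascii=%d unicode=%d\n",
        $word, is_upper_ascii($word), is_upper_unicode($word);
}
# CAFÉ: ascii=1 unicode=1
# CAFé: ascii=1 unicode=0
```

The ASCII-only check accepts 'CAFé' because 'é' falls outside a-z, which is the behaviour the attached test objects to; the \p{Lower} check rejects it.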
thanks, will be released shortly.