Subject: Feature request, titlecase doesn't handle '4th' etc correctly.
Date: Tue, 5 May 2020 13:58:18 +0200
To: bug-Lingua-EN-Titlecase [...] rt.cpan.org
From: John Tweed <rosyth168 [...] gmail.com>
Hi,
Thanks for the work you put into creating titlecase; it's been helpful for
normalising my record collection. The exception is names containing ordinals
such as 1st, 3rd or 4th, which come out as 1St, 3Rd or 4Th rather than the
intended result.
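To illustrate the behaviour, here is a minimal standalone sketch. It does not use Lingua::EN::Titlecase itself: the $wordish and $numeric patterns are simplified stand-ins chosen for this example, and titlecase_sketch is a hypothetical helper, so treat it as a rough model of the lexing order rather than the module's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-ins, NOT the module's real patterns (assumptions).
my $wordish = qr/[[:alpha:]]+/;                   # a run of letters
my $numeric = qr/[[:digit:]]+(?:th|st|nd|rd)/i;   # ordinals such as 1st, 3rd, 4th

# Hypothetical helper: consume the input token by token and capitalise
# each word token. When $ordinals is true, an ordinal is tried first and
# kept lowercase as a single unit.
sub titlecase_sketch {
    my ($text, $ordinals) = @_;
    my $out = '';
    while (length $text) {
        if ($ordinals and $text =~ s/\A($numeric)//) {
            $out .= lc $1;              # "4th" stays "4th"
        }
        elsif ($text =~ s/\A($wordish)//) {
            $out .= ucfirst lc $1;      # "july" becomes "July"
        }
        else {
            $text =~ s/\A(.)//s;        # any other single character
            $out .= $1;
        }
    }
    return $out;
}

print titlecase_sketch("4th of july", 0), "\n";   # prints "4Th Of July"
print titlecase_sketch("4th of july", 1), "\n";   # prints "4th Of July"
```

Without the ordinal rule the digit is consumed on its own, so the following "th" looks like a separate word and gets capitalised. Trying the ordinal pattern before the general word pattern avoids that.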
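Here is my not very rigorous modification to address this issue. (.bak is
the original. Note that the diff below runs from my modified file to the
original, so the lines marked "-" are the ones I added.)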
$ diff -Naur ../Titlecase.pm*
--- ../Titlecase.pm 2020-05-05 13:48:49.030787393 +0200
+++ ../Titlecase.pm.bak 2020-05-05 13:55:01.632107658 +0200
@@ -10,7 +10,6 @@
uc_threshold
mixed_threshold
mixed_rx
- numeric_rx
wordish_rx
allow_mixed
word_punctuation
@@ -108,9 +107,6 @@
|
\G(?<!\A)[[:upper:]]
/x) unless $self->mixed_rx;
- $self->numeric_rx(qr/
- [[:digit:]]+(?:th|st|nd|rd)
- /x) unless $self->numeric_rx;
$self->allow_mixed(undef);
$self->mixed_threshold(0.25) unless $self->mixed_threshold;
@@ -212,11 +208,8 @@
my $wp = $self->word_punctuation;
my $wordish = $self->wordish_rx;
- my $numeric = $self->numeric_rx;
$self->{_lexer} = sub {
- # print("LEXER -> ",$_[0]);
- $_[0] =~ s/\A($numeric)// and return [ "word", "$1" ];
$_[0] =~ s/\A($wordish)// and return [ "word", "$1" ];
$_[0] =~ s/\A(.)//s and return [ undef, "$1" ];
return ();