Subject: Date::Language::* encodings need to be standardized
While it's a neat thing to be able to use localized period names transparently by just plugging a new module into Date::Language, the approach fails completely if plugins don't agree on character encoding:
$ perl -CO -MDate::Language -e'for(qw/ German Greek Chinese /){$t=Date::Language->new($_); print $t->time2str("%B ", 28*86400*$_) for 1..12; print "\n"}'
Januar Februar März April Mai Juni Juli August September Oktober November Dezember
Ιανουαρίου Φεβρουαρίου Μαρτίου Απριλίυ Μαΐου Ιουνίου Ιουλίου Αυγούστου Σεπτεμτου Οκτωβρίου Νοεμβρίου Δεκεμβρου
ä¸æ äºæ ä¸æ åæ äºæ åæ ä¸æ å«æ ä¹æ åæ åä¸æ åäºæ
German and Greek work fine. German uses Latin-1 strings that upgrade transparently, while Greek builds its strings from "\x{03..}" code-point escapes and so returns proper character strings. Chinese has its UTF-8 directly in the source but lacks "use utf8", so it returns UTF-8 as a byte string.
$ perl -CO -MDevel::Peek -MDate::Language -e'for(qw/ German Greek Chinese /){print STDERR "$_\n";Dump($t=Date::Language->new($_)->time2str("%B", 0))}'
German
SV = PVMG(0x85a470) at 0x826058
FLAGS = (POK,IsCOW,pPOK)
PV = 0x8614a0 "Januar"\0
Greek
SV = PVMG(0x85a470) at 0x826058
FLAGS = (POK,IsCOW,pPOK,UTF8)
PV = 0x82b4b0 "\316\231\316\261\316\275\316\277\317\205\316\261\317\201\316\257\316\277\317\205"\0 [UTF8 "\x{399}\x{3b1}\x{3bd}\x{3bf}\x{3c5}\x{3b1}\x{3c1}\x{3af}\x{3bf}\x{3c5}"]
Chinese
SV = PVMG(0x85a470) at 0x826058
FLAGS = (POK,pPOK)
PV = 0x82b4b0 "\344\270\200\346\234\210"\0
[boring lines deleted]
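The Chinese mojibake can be reproduced without Date::Language at all. Below is a minimal sketch using only the core Encode module. The byte values are copied from the Chinese PV in the Dump, and treating -CO as "encode every printed string as UTF-8" is a simplification of what the STDOUT layer does.

```perl
use strict;
use warnings;
use Encode qw(encode);

# What D::L::Chinese returns for January without 'use utf8':
# the six raw UTF-8 bytes of the month name (same as the Chinese PV in the Dump).
my $bytes = "\xe4\xb8\x80\xe6\x9c\x88";

# What it would return with 'use utf8': two characters, U+4E00 U+6708.
my $chars = "\x{4e00}\x{6708}";

# -CO puts a UTF-8 layer on STDOUT, so every printed string is encoded as UTF-8.
# For the byte string, each byte is taken as a Latin-1 character and encoded
# again (double encoding); that is what shows up as the mojibake line above.
my $out_bytes = encode('UTF-8', $bytes);
my $out_chars = encode('UTF-8', $chars);   # the intended output

printf "byte string: length %d, written as %d bytes\n", length $bytes, length $out_bytes;
printf "char string: length %d, written as %d bytes\n", length $chars, length $out_chars;
```

The byte string goes out as 12 bytes (each original byte becomes a two-byte sequence such as C3 A4 for "ä"). The character string goes out as the expected 6 bytes.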
As I see it, that's a pretty hard one to fix without causing incompatibilities, unless you want to do it the PHP way and add *_utf8 versions of everything (ick!).
Perhaps a new constructor option would do, so you could say:
Date::Language->new('Chinese', encoding => 'utf8');
Judging from the way the constructor works, though, this doesn't seem straightforward either.
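Without such an option, each caller has to guess the encoding of every returned string. Below is a rough sketch of that guesswork, covering only the three behaviours shown above (Latin-1 byte strings, \x{...} character strings, raw UTF-8 bytes). `to_chars` is a hypothetical helper and is not part of Date::Language. It uses only the core utf8:: functions.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Date::Language: turn whatever a
# language plugin returned into a character string.
sub to_chars {
    my ($s) = @_;

    # Already flagged as characters (e.g. built from \x{...} escapes): keep it.
    return $s if utf8::is_utf8($s);

    # Try reading the bytes as UTF-8 (e.g. D::L::Chinese without 'use utf8').
    # utf8::decode works in place and returns false on malformed input.
    my $copy = $s;
    return $copy if utf8::decode($copy);

    # Not valid UTF-8: assume Latin-1, which Perl already treats as characters.
    return $s;
}

for my $s ("M\xe4rz", "\xe4\xb8\x80\xe6\x9c\x88", "\x{399}\x{3b1}\x{3bd}") {
    printf "%d characters\n", length to_chars($s);
}
```

This is a heuristic. A genuine Latin-1 string that happens to be valid UTF-8 would be mis-decoded, and that ambiguity is why the plugins themselves should agree on one encoding.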
In any case, in 2016 language plugins should return nothing but Unicode character strings. They should probably all use the utf8 pragma explicitly so the names are readable in a regular editor, unlike those in D::L::Greek.
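For illustration, here is the kind of source this suggests. It is a standalone sketch, not a real Date::Language::* module. `@months` is a hypothetical name and does not reproduce the variables the actual plugins define. With `use utf8`, the UTF-8 literals are decoded to characters at compile time and stay readable in the file.

```perl
use strict;
use warnings;
use utf8;   # source literals below are decoded from UTF-8 into characters

# Month names written directly in the (UTF-8 saved) source file.
my @months = qw(一月 二月 三月 四月 五月 六月 七月 八月 九月 十月 十一月 十二月);

# Encode on output only, at the edge of the program.
binmode STDOUT, ':encoding(UTF-8)';
print "$months[0]: ", length $months[0], " characters\n";
```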
I might contribute a Lao and possibly a Thai module if I don't have to hack my own decoding logic :)