
This queue is for tickets about the TimeDate CPAN distribution.

Report information
The Basics
Id: 113419
Status: open
Priority: 0/
Queue: TimeDate

People
Owner: Nobody in particular
Requestors: MBETHKE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 2.30
Fixed in: (no value)



Subject: Date::Language::* encodings need to be standardized
While it's a neat thing to be able to use localized period names transparently by just plugging a new module into Date::Language, the approach fails completely if plugins don't agree on character encoding:

$ perl -CO -MDate::Language -e'for(qw/ German Greek Chinese /){$t=Date::Language->new($_); print $t->time2str("%B ", 28*86400*$_) for 1..12; print "\n"}'
Januar Februar März April Mai Juni Juli August September Oktober November Dezember
Ιανουαρίου Φεβρουαρίου Μαρτίου Απριλίυ Μαΐου Ιουνίου Ιουλίου Αυγούστου Σεπτεμτου Οκτωβρίου Νοεμβρίου Δεκεμβρου
一月 二月 三月 四月 五月 六月 七月 八月 九月 十月 十一月 十二月

German and Greek work fine; German uses Latin-1 strings that upgrade transparently, while Greek has UTF-8 encoded as "\x{03..}" escapes. Chinese is in UTF-8 directly but without the "use utf8", so it returns UTF-8 as a byte string.

$ perl -CO -MDevel::Peek -MDate::Language -e'for(qw/ German Greek Chinese /){print STDERR "$_\n";Dump($t=Date::Language->new($_)->time2str("%B", 0))}'
German
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x8614a0 "Januar"\0
Greek
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x82b4b0 "\316\231\316\261\316\275\316\277\317\205\316\261\317\201\316\257\316\277\317\205"\0 [UTF8 "\x{399}\x{3b1}\x{3bd}\x{3bf}\x{3c5}\x{3b1}\x{3c1}\x{3af}\x{3bf}\x{3c5}"]
Chinese
SV = PVMG(0x85a470) at 0x826058
  FLAGS = (POK,pPOK)
  PV = 0x82b4b0 "\344\270\200\346\234\210"\0
[boring lines deleted]

As I see it, that's a pretty hard one to fix without causing incompatibilities, unless you want to do it the PHP way and add *_utf8 versions of everything (Ick!). Perhaps a new constructor option would do, so you could say

Date::Language->new('Chinese', encoding => 'utf8');

although from the way the constructor works this doesn't seem straightforward either.

In any case, language plugins should not return anything but UTF-8 text in 2016, and they should probably all use the utf8 pragma explicitly so the text is readable in a regular editor, unlike D::L::Greek.
I might contribute a Lao and possibly Thai module if I don't have to hack my own decoding logic :)
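
For reference, the kind of caller-side decoding logic mentioned above might look roughly like the sketch below. It is only a heuristic workaround, not anything TimeDate provides: to_characters() is a hypothetical helper name, and the three sample strings are hard-coded stand-ins for what the German, Greek and Chinese plugins return, so the snippet runs without Date::Language installed. It assumes that a flagged string is already decoded, that a byte string which is valid UTF-8 was meant as UTF-8, and that anything else is Latin-1. A short Latin-1 string that happens to form valid UTF-8 would be misread, which is one more reason the plugins should agree on a single convention.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of TimeDate): return a character string
# no matter which convention the language plugin followed.
sub to_characters {
    my ($str) = @_;

    # Already a character string (the Greek case): leave it alone.
    return $str if utf8::is_utf8($str);

    # Byte string that is valid UTF-8 (the Chinese case): decode it.
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $str intact.
    my $decoded = eval { decode('UTF-8', $str, FB_CROAK | LEAVE_SRC) };
    return $decoded if defined $decoded;

    # Not valid UTF-8 (the German case): treat as Latin-1, where every
    # byte value already equals its code point.
    return $str;
}

# Stand-ins for the plugin return values shown in the Devel::Peek dumps.
my $german  = "M\xe4rz";                    # Latin-1 bytes, no UTF8 flag
my $greek   = "\x{399}\x{3b1}\x{3bd}";      # character string, UTF8 flag set
my $chinese = "\xe4\xb8\x80\xe6\x9c\x88";   # raw UTF-8 bytes, no UTF8 flag

printf "%d %d %d\n",
    length(to_characters($german)),
    length(to_characters($greek)),
    length(to_characters($chinese));        # prints "4 3 2"
```

Without the helper, length($chinese) is 6 (bytes) rather than 2 (characters), which is the mismatch that -CO turns into mojibake on output.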