Bug #61206 for Date-Manip: Lang files have corrupt escape characters

Thu Sep 09 12:10:59 2010 1012web [...] ostermiller.com - Ticket created

Subject:

Lang files have corrupt escape characters

It appears that the language resources for the Date::Manip project do not have proper character sets. I believe that all data in Perl is interpreted as UTF-8. However, the language resources for Date::Manip are in a variety of character sets, and I do not believe that the library reads and uses them correctly.

A few language resources contain no \x escape sequences. These files are 100% ASCII and are fine:

Lang/dutch.pm
Lang/english.pm
Lang/index.pm
Lang/spanish.pm

The other files are in these character sets:

Lang/russian.pm is in the koi8-r character set.
Lang/swedish.pm is in the ISO-8859-15 character set.
Lang/polish.pm is in an unknown character set. I believe that this file is corrupt and non-recoverable. I suspect that it was at one time in ISO-8859-2 but was incorrectly interpreted as UTF-8, which corrupted it.
The remaining files are in the ISO-8859-1 character set.

I wrote a script called fixlang.sh (attached) that does the following for each file: it unescapes the \x## escape sequences, uses iconv to translate from the identified character set to UTF-8, escapes any high-byte characters with \x## escape sequences, and writes the file to a LangNew directory. Because the polish.pm file is already corrupt, this process corrupts it further. All the other files contain properly escaped UTF-8 text after the process.
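As an illustration of the three steps described above (unescape, convert to UTF-8, re-escape), here is a minimal Python sketch. It is not the attached fixlang.sh, which is not shown in this ticket. The function names are hypothetical, Python's built-in codecs stand in for iconv, and the sketch assumes the escapes are written as a backslash, a lowercase x, and exactly two hex digits.

```python
import re

# Matches a literal backslash, "x", and two hex digits (assumed escape form).
_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex(data: bytes) -> bytes:
    # Replace each literal \xNN sequence with the single raw byte it names.
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def escape_high_bytes(data: bytes) -> bytes:
    # Write every byte >= 0x80 as a literal \xNN sequence; keep ASCII as-is.
    return b"".join(
        b"\\x%02X" % b if b >= 0x80 else bytes([b]) for b in data
    )


def convert(data: bytes, source_charset: str) -> bytes:
    # Step 1: recover the raw bytes in the file's original character set.
    raw = unescape_hex(data)
    # Step 2: transcode to UTF-8 (the ticket's script uses iconv here).
    utf8 = raw.decode(source_charset).encode("utf-8")
    # Step 3: re-escape so the result is pure ASCII again.
    return escape_high_bytes(utf8)
```

For example, `convert(b"f\\xE9vrier", "iso-8859-1")` turns the single ISO-8859-1 byte for "é" into its two-byte UTF-8 form and returns `b"f\\xC3\\xA9vrier"`. A file that was already damaged, as polish.pm is suspected to be, would pass through the same steps without error but come out with the damage preserved or compounded, since the decode step trusts the declared character set.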

Subject:

fixlang.sh

application/x-sh 619b

Message body not shown because it is not plain text.

Subject:

diffs.txt

Message body is not shown because it is too large.

Fri Sep 10 13:20:53 2010 sbeck [...] cpan.org - Correspondence added

I've been working on fixing the language files. I have just finished converting all of them to UTF-8 (I used your script to help with the last couple). I've also fixed the couple of problems you found. All of this will be in the next release.

Fri Sep 10 13:20:54 2010 The RT System itself - Status changed from 'new' to 'open'

Fri Sep 10 13:20:54 2010 sbeck [...] cpan.org - Status changed from 'open' to 'resolved'