Subject: | Lang files have corrupt escape characters |
It appears that the language resources for the Date::Manip project do
not use a consistent character set.
I believe that all data in Perl is interpreted as UTF-8. However the
language resources for Date::Manip are in a variety of character sets.
I do not believe that they are being read in and used by the library
correctly.
There are a few language resources without \x escape sequences. These
files are 100% ASCII and are fine:
Lang/dutch.pm
Lang/english.pm
Lang/index.pm
Lang/spanish.pm
Lang/russian.pm is in the KOI8-R character set.
Lang/swedish.pm is in the ISO-8859-15 character set.
Lang/polish.pm is in an unknown character set. I believe that this file
is corrupt and non-recoverable. I suspect that it was at one time in
ISO-8859-2 but it has been incorrectly interpreted as UTF-8 which
corrupted it.
The remaining files are in the ISO-8859-1 character set.
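To illustrate why I think polish.pm cannot be recovered, here is a minimal Python sketch of the kind of damage I suspect. The sample string is hypothetical (it is not taken from polish.pm), and the use of a lossy "replace" decode is an assumption about how the misreading happened, not something I have confirmed.

```python
# Hypothetical sample text; NOT taken from Lang/polish.pm.
original = "styczeń, październik"

# The bytes as they would be stored in an ISO-8859-2 file.
latin2_bytes = original.encode("iso-8859-2")

# Assumed failure mode: a tool reads those bytes as UTF-8 and
# substitutes U+FFFD for every byte sequence that is not valid UTF-8.
misread = latin2_bytes.decode("utf-8", errors="replace")

# Distinct Polish letters collapse into the same replacement character,
# so the original bytes can no longer be reconstructed from the result.
print(misread)
```

Once different letters have been replaced by the same placeholder, no later charset conversion can tell them apart again.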
I wrote a script called fixlang.sh (attached) that unescapes the \x##
escape sequences in each file, uses iconv to translate from the
identified character set to UTF-8, escapes any high-byte characters with
\x## escape sequences, and writes the files back to a LangNew directory.
Because the polish.pm file is corrupt, this process corrupts it further.
All the other files have proper UTF-8 escaped text in them after the
process.
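For readers who cannot open the attachment, the three steps can be sketched roughly as follows. This is a Python approximation, not the contents of fixlang.sh: the function names are made up for illustration, Python's built-in codecs stand in for iconv, and the sketch makes no attempt to handle a backslash that is itself escaped.

```python
import re

# Matches a literal backslash, "x", and two hex digits, e.g. \xE9.
_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_hex(data: bytes) -> bytes:
    """Replace each literal \\xNN sequence with the single byte it names."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def escape_high_bytes(data: bytes) -> bytes:
    """Write every byte >= 0x80 as a literal \\xNN sequence; keep ASCII as-is."""
    return b"".join(
        b"\\x%02X" % byte if byte >= 0x80 else bytes([byte]) for byte in data
    )


def convert(data: bytes, source_charset: str) -> bytes:
    """Unescape, transcode from source_charset to UTF-8, then re-escape."""
    raw = unescape_hex(data)
    utf8 = raw.decode(source_charset).encode("utf-8")  # stand-in for iconv
    return escape_high_bytes(utf8)


# An ISO-8859-1 "é" (\xE9) becomes the two-byte UTF-8 sequence \xC3\xA9.
print(convert(b"f\\xE9vrier", "iso-8859-1"))
```

Under these assumptions the call above yields the escaped text f\xC3\xA9vrier, and pure ASCII input passes through unchanged.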
Subject: | fixlang.sh |
Message body not shown because it is not plain text.
Subject: | diffs.txt |
Message body is not shown because it is too large.