Subject: Output of fc kills Encode::decode
Hi
I'm processing subcountry names in Estonia, from:
http://en.wikipedia.org/wiki/ISO_3166-2:EE
I got to that page from the list of all countries:
http://en.wikipedia.org/wiki/ISO_3166-2
Code:
for my $element (@$table)
{
    $i++;

    $self -> log(debug => "code: $$element{code}");
    $self -> log(debug => "name: $$element{name}");
    $self -> log(debug => "decode: " . decode('utf8', $$element{name}));
    $self -> log(debug => "decode fc: " . decode('utf8', fc $$element{name}));

    $sth -> execute($country_id, $$element{code}, decode('utf8', fc $$element{name}), decode('utf8', $$element{name}), $i);
}
Output:
debug: code: EE-37.
debug: name: Harjumaa.
debug: decode: Harjumaa.
debug: decode fc: harjumaa.
debug: code: EE-39.
debug: name: Hiiumaa.
debug: decode: Hiiumaa.
debug: decode fc: hiiumaa.
debug: code: EE-44.
debug: name: Ida-Virumaa.
debug: decode: Ida-Virumaa.
debug: decode fc: ida-virumaa.
debug: code: EE-49.
debug: name: Jõgevamaa.
debug: decode: Jõgevamaa.
Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.
So, the call to fc returns something unacceptable to decode, when the
name is Jõgevamaa.
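A minimal sketch that may reproduce the failure in isolation. It assumes the names reach the loop as undecoded UTF-8 bytes (the working decode('utf8', ...) calls suggest this, but it is an assumption), and it uses the core fc from Perl 5.16+ together with the unicode_strings feature as a stand-in for the fc from Unicode::CaseFold. The variable names are made up for illustration.

```perl
use v5.16;    # enables strict, say, the core fc and unicode_strings
use warnings;
use Encode qw(decode);

# Assumption: the name arrives as raw UTF-8 bytes, not decoded characters.
# 'o with tilde' (U+00F5) is the two bytes 0xC3 0xB5 in UTF-8.
my $bytes = "J\xC3\xB5gevamaa";

# Folding the bytes treats each one as a Latin-1 character. Byte 0xB5 is
# MICRO SIGN, whose case fold is GREEK SMALL LETTER MU (U+03BC), a code
# point above 0xFF.
my $folded_bytes = fc $bytes;
my @wide         = grep { ord($_) > 0xFF } split //, $folded_bytes;

printf "wide characters after fc on bytes: %s\n",
    join(', ', map { sprintf('U+%04X', ord($_)) } @wide);

# decode() expects bytes, so a wide character in its input makes it die.
my $decoded = eval { decode('UTF-8', $folded_bytes) };
say 'decode after fc: ', defined $decoded ? 'ok' : 'died';

# Decoding first and folding the resulting characters avoids the problem.
my $folded_chars = fc decode('UTF-8', $bytes);
say 'fc after decode: ', $folded_chars eq "j\x{F5}gevamaa" ? 'ok' : 'unexpected';
```

If this matches what happens in the loop, swapping the order of the two calls, i.e. fc decode('utf8', $$element{name}) instead of decode('utf8', fc $$element{name}), would be the change to try.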
I rigged the code to skip Estonia, and the code works in all other
countries and their subcountries.
I then rigged the code to skip Jõgevamaa, and the next place it dies is:
debug: code: EE-65.
debug: name: Põlvamaa.
debug: decode: Põlvamaa.
Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.
I.e. the names corresponding to the codes EE-51, EE-57 and EE-59 are all
handled OK.
I rigged it to skip Põlvamaa, and the next place it dies is:
debug: code: EE-86.
debug: name: Võrumaa.
debug: decode: Võrumaa.
Cannot decode string with wide characters at /home/ron/perl5/perlbrew/perls/perl-5.14.2/lib/5.14.2/x86_64-linux-thread-multi/Encode.pm line 176.
So, each problem is 'o' with a tilde above it.
When I rigged the code to skip these 3 cases, everything worked.
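One possible reason why only 'o' with a tilde triggers the error, sketched under the same assumptions as the problem itself: the names are UTF-8 bytes when fc sees them, and fc folds each byte as if it were a Latin-1 character. The sketch uses the core fc from Perl 5.16+ with unicode_strings as a stand-in for Unicode::CaseFold, and the hash of letters is a hand-picked illustration, not data from the Wikipedia page.

```perl
use v5.16;    # enables strict, say, the core fc and unicode_strings
use warnings;
use Encode qw(encode);

# A few Estonian letters with diacritics, as code points (illustrative).
my %letter = (
    'a with diaeresis' => 0xE4,
    'o with tilde'     => 0xF5,
    'o with diaeresis' => 0xF6,
    'u with diaeresis' => 0xFC,
);

my %goes_wide;

for my $name (sort keys %letter)
{
    # Encode the character to UTF-8 bytes, then fold the bytes as if each
    # one were a character, which is what fc sees when it runs before decode.
    my $bytes  = encode('UTF-8', chr $letter{$name});
    my $folded = fc $bytes;

    # Record whether folding produced any code point above 0xFF.
    $goes_wide{$name} = (grep { ord($_) > 0xFF } split //, $folded) ? 1 : 0;

    say sprintf('%-17s bytes %s -> folded %s', $name,
        join(' ', map { sprintf('%02X', ord($_)) } split //, $bytes),
        join(' ', map { sprintf('U+%04X', ord($_)) } split //, $folded));
}
```

Only the trail byte 0xB5 from 'o' with a tilde has a case fold above 0xFF, which would explain why the other names do not die. The lead byte 0xC3 folds to 0xE3 in every case though, so the folded names that do get stored may be worth checking in the database.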
This is Debian 6, 64 bit.
Perl V 5.14.2.
Encode V 2.44.
Unicode::CaseFold V 0.02.
Unicode::Normalize V 1.14.
I then installed Perl V 5.15.9.
The versions of Encode, Unicode::CaseFold and Unicode::Normalize are the same.
Same problem :-(.
Cheers
Ron