Subject: decode_entities() auto-converts to UTF-8 or ISO-8859-1 depending on whether a non-ISO-8859-1 entity is present
If your HTML string contains entities that decode to characters outside
ISO-8859-1, decode_entities() automatically converts the string to UTF-8
(Perl's internal UTF8 flag is set on the result). Otherwise the result
stays ISO-8859-1. The encoding of your input value makes no difference.
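The switch described above can be reproduced without HTML::Entities. The following is only a minimal sketch with made-up strings: it assumes the behaviour comes from Perl upgrading a string as soon as it holds a character above U+00FF, which is the condition the workaround further down tests with is_utf8.

```perl
use strict;
use warnings;

# Illustrative strings only; HTML::Entities is not involved here.
my $latin1_only = "\xE4";               # U+00E4 fits into ISO-8859-1
my $with_wide   = "\xE4" . "\x{2212}";  # U+2212 MINUS SIGN does not fit

# utf8::is_utf8() is built in and reports the internal UTF8 flag.
printf "latin1 only: flag=%d length=%d\n",
    (utf8::is_utf8($latin1_only) ? 1 : 0), length $latin1_only;
printf "with wide:   flag=%d length=%d\n",
    (utf8::is_utf8($with_wide) ? 1 : 0), length $with_wide;

# prints:
# latin1 only: flag=0 length=1
# with wide:   flag=1 length=2
```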
Example (all input strings are written in ISO-8859-1):
<code>use HTML::Entities;
my $val = "one &minus; 1 &minus; one = ä + ä";
HTML::Entities::decode_entities($val);
print $val;</code>
This will result in
"Wide character in subroutine entry at /usr/lib/perl5/vendor_perl/5.8.8/
IO/Compress/Adapter/Deflate.pm line 43.\n"
If you convert the value from UTF-8 to ISO-8859-1, everything works:
<code>use HTML::Entities;
use Unicode::String;
my $val = "one &minus; 1 &minus; one = ä + ä";
HTML::Entities::decode_entities($val);
$val = Unicode::String::utf8($val)->latin1();
print $val;</code>
This will result in
"one 1 one = ä + ä"
But if your input value doesn't contain any character outside ISO-8859-1,
the result is already ISO-8859-1. Example:
<code>use HTML::Entities;
use Unicode::String;
my $val = "one = ä + ä";
HTML::Entities::decode_entities($val);
$val = Unicode::String::utf8($val)->latin1();
print $val;</code>
This results in "one = + ", which is wrong. If you leave the encoding
as ISO-8859-1, the output is correct. The example...
<code>use HTML::Entities;
my $val = "one = ä + ä";
HTML::Entities::decode_entities($val);
print $val;</code>
...correctly results in "one = ä + ä"
Workaround:
Check Encode::is_utf8() and convert only when necessary.
If you pass in ISO-8859-1 and want ISO-8859-1 back:
<code>use Encode;  # provides Encode::is_utf8()
$val = Unicode::String::utf8($val)->latin1()
    if Encode::is_utf8($val);</code>
If you pass in UTF-8 and want UTF-8 back:
<code>use Encode;  # provides Encode::is_utf8()
$val = Unicode::String::latin1($val)->utf8()
    if !Encode::is_utf8($val);</code>
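A possible alternative that avoids inspecting the flag at all, sketched here under the assumption that the core Encode module is acceptable in place of Unicode::String: decode the input bytes explicitly, run decode_entities() on the resulting character string, then encode to the output encoding you want. The helper name decode_entities_as is hypothetical and not part of HTML::Entities.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use HTML::Entities qw(decode_entities);

# Hypothetical helper (not part of HTML::Entities): the caller names both
# encodings, so the internal UTF8 flag never has to be checked.
sub decode_entities_as {
    my ($bytes, $in_enc, $out_enc) = @_;
    my $chars = decode($in_enc, $bytes);  # input bytes -> Perl characters
    decode_entities($chars);              # void context: decodes in place
    return encode($out_enc, $chars);      # Perl characters -> output bytes
}

# "\xE4" is an ISO-8859-1 encoded a-umlaut.
my $input = "one &minus; 1 = \xE4 + &auml;";

# ISO-8859-1 in, UTF-8 out.
my $utf8 = decode_entities_as($input, 'ISO-8859-1', 'UTF-8');

# ISO-8859-1 in and out: the minus sign has no ISO-8859-1 code point, so
# Encode substitutes "?" for it.
my $latin1 = decode_entities_as($input, 'ISO-8859-1', 'ISO-8859-1');

binmode STDOUT;  # write the bytes unchanged
print "$utf8\n$latin1\n";
```

With this approach the result no longer depends on whether the input happened to contain an entity outside ISO-8859-1; the output encoding is whatever the caller asked for.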