
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 37757
Status: rejected
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: cpan [...] tobias-tacke.de
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.56
Fixed in: (no value)



Subject: Autoconvert into UTF8 or ISO if an UTF-Entity is given or not
If your HTML string contains entities that decode to characters outside ISO-8859-1, decode_entities() automatically converts the string to UTF-8. Otherwise the result stays in ISO-8859-1. The encoding of the input value does not matter.

Example (all input strings are written in ISO-8859-1):
<code>
use HTML::Entities;
my $val = "one &#8722; 1 &minus; one = &auml; + ä";
HTML::Entities::decode_entities($val);
print $val;
</code>
This results in "Wide character in subroutine entry at /usr/lib/perl5/vendor_perl/5.8.8/IO/Compress/Adapter/Deflate.pm line 43.\n"

If you convert the value from UTF-8 to ISO-8859-1, all is fine:
<code>
use HTML::Entities;
use Unicode::String;
my $val = "one &#8722; 1 &minus; one = &auml; + ä";
HTML::Entities::decode_entities($val);
$val = Unicode::String::utf8($val)->latin1();
print $val;
</code>
This results in "one 1 one = ä + ä"

But if the input value does not contain any character outside ISO-8859-1, the result is already ISO-8859-1. Example:
<code>
use HTML::Entities;
use Unicode::String;
my $val = "one = &auml; + ä";
HTML::Entities::decode_entities($val);
$val = Unicode::String::utf8($val)->latin1();
print $val;
</code>
This results in "one = + ", which is wrong.

If you leave the encoding as ISO-8859-1, the result is correct. The example...
<code>
use HTML::Entities;
my $val = "one = &auml; + ä";
HTML::Entities::decode_entities($val);
print $val;
</code>
...correctly results in "one = ä + ä"

Workaround: check 'is_utf8' and convert only when needed.

If you give ISO-8859-1 and want ISO-8859-1 back:
<code>$val = Unicode::String::utf8($val)->latin1() if (Encode::is_utf8($val));</code>

If you give UTF-8 and want UTF-8 back:
<code>$val = Unicode::String::latin1($val)->utf8() if (! Encode::is_utf8($val));</code>
I can't see that HTML::Entities does anything wrong. Its output is a Perl Unicode string. If you want to feed it to Unicode::String's utf8() constructor, you need to make sure it is UTF-8 encoded bytes, which you can create with Encode. Better is just to avoid Unicode::String these days.
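A minimal sketch of the Encode-based approach suggested above, not taken from the ticket itself. It assumes the input arrives as ISO-8859-1 bytes (the "\xE4" byte stands in for the literal "ä" of the report) and that HTML::Entities is installed. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use HTML::Entities qw(decode_entities);

# Assumed input: raw ISO-8859-1 bytes; "\xE4" is the single byte for "ä".
my $bytes = "one &#8722; 1 &minus; one = &auml; + \xE4";

# 1. Turn the bytes into a Perl character string before touching entities.
my $text = decode('ISO-8859-1', $bytes);

# 2. Expand the entities; decode_entities() modifies its argument in place.
decode_entities($text);

# 3. Encode explicitly at the output boundary.
my $utf8_out = encode('UTF-8', $text);

# U+2212 (MINUS SIGN) does not exist in ISO-8859-1, so with the default
# CHECK value Encode replaces it with the substitution character "?".
my $latin1_out = encode('ISO-8859-1', $text);

binmode STDOUT;    # we are printing already-encoded bytes
print $utf8_out, "\n";
```

Decoding on input and encoding on output keeps the result independent of whether an entity happens to expand to a character above U+00FF, so no is_utf8 check is needed.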
Subject: Autoconvert into UTF8 or ISO if an UTF-Entity is given or not: no module error
You're right. There was some utf8-flag magic involved, and there is no error in this module. My mistake was that none of the strings, whatever the charset of the file, are flagged as UTF-8. And, to add to the confusion, the flag also has to be set explicitly for the output.

The following example works:
<code>
use strict;
use warnings;
use Encode();
use HTML::Entities();
binmode STDOUT, ":utf8"; # force output utf-flag
my $v = "one &#8722; 1 &minus; one = &auml; + ä";
utf8::decode($v); # set utf-flag explicitly
HTML::Entities::decode_entities($v);
print "$v\n";
</code>

Regards
Tobiwan
Why is the status changed to "open" when the form field is set to "unchanged"? :(