Subject: Decoding Entities
Hi Bruno,
As you know, I've struggled quite a bit with entities and character encodings. It seems that I've got many clients whose editors like to use entities.
At any rate, I finally realized tonight that Petal does not appear to have used the Petal::Entities module since v1.06. Instead, the MKDoc::XML::Decode classes are being used. I suppose you may be leaving it in for backwards compatibility.
Despite my efforts, I still have some pages that insist on displaying the U+FFFD replacement character in place of the chr() characters that MKDoc::XML::Decode::XHTML outputs (some pages work and some don't, although the files appear identical). I have found that adding the following line after $decode->process is called in Petal.pm fixes the problem:
$res = Encode::encode('utf8', $res);
I'm not sure this is a permanent solution but it seems to be working in all situations.
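To illustrate what I think that line changes, here is a minimal standalone sketch (my own test script, not code from Petal). The string "a" . chr(0xA0) . "b" is only an assumed stand-in for what the decoder returns for a&nbsp;b. Printed as-is with no output layer, it goes out with a lone 0xA0 byte, which is not valid UTF-8, so a browser reading the page as UTF-8 would presumably substitute U+FFFD. After Encode::encode it goes out as the two-byte sequence 0xC2 0xA0.

```perl
use strict;
use warnings;
use Encode ();

# Assumed stand-in for the decoder's output for "a&nbsp;b":
# a string containing chr(0xA0).
my $res = "a" . chr(0xA0) . "b";

# Hex dump of the string as Perl would write it with no output layer.
my $raw_hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $res;

# Explicitly encode to UTF-8 bytes, as in the one-line fix above.
my $bytes   = Encode::encode('utf8', $res);
my $enc_hex = join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;

print "raw:     $raw_hex\n";   # 61 a0 62
print "encoded: $enc_hex\n";   # 61 c2 a0 62
```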
Furthermore, on the pages which do not work, if I dump the page to the logs, I see the nbsp entities being output as \x{a0}. In a newsgroup posting[1], I found the following code, which shows that these \x characters are stored as plain single bytes rather than being flagged internally as UTF-8:
perl -MDevel::Peek -we "print Dump qq/\xa0/,Dump qq/\x{a0}/"
I'm guessing that the Encode::encode function that I'm using is converting these back to utf8 so that the browser displays them correctly. This doesn't explain why some pages come through ok, but I'm happier with having the extra code when decoding the page than having the bad characters showing. Thoughts?
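The same check can be done without Devel::Peek. This is only a rough sketch using the built-in utf8::is_utf8 function; the variable names are mine. It suggests that both spellings of the escape produce the same unflagged one-character string, while a code point above 0xFF (I picked \x{100} arbitrarily) does get the internal UTF-8 flag.

```perl
use strict;
use warnings;

my $short  = "\xa0";      # two-digit hex escape
my $braced = "\x{a0}";    # braced form of the same code point
my $wide   = "\x{100}";   # arbitrary code point above 0xFF, for comparison

# utf8::is_utf8 reports whether Perl's internal UTF-8 flag is set.
my $short_flag  = utf8::is_utf8($short)  ? 1 : 0;
my $braced_flag = utf8::is_utf8($braced) ? 1 : 0;
my $wide_flag   = utf8::is_utf8($wide)   ? 1 : 0;

print "short:  flag=$short_flag\n";    # flag=0
print "braced: flag=$braced_flag\n";   # flag=0
print "wide:   flag=$wide_flag\n";     # flag=1
print "same string\n" if $short eq $braced;
```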
Thanks,
William
[1] http://groups.google.com/group/perl.perl5.porters/browse_thread/thread/b847e3300a91f71d/1415aa675c860c53?lnk=st&q=%22\x%7Ba0%7D%22&rnum=2&hl=en#1415aa675c860c53