Subject: | Patch for Decode::XHTML |
The attached patch resolves character encoding problems that I have been
having for years. See RT#15152 for more details about the problems. The
patch uses Encode to tell Perl that the decoded values returned by this
module are UTF-8 encoded Unicode characters, instead of leaving Perl to
guess their format.
With this patch, I no longer get garbled characters or "Wide character"
warnings.
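To illustrate what the two added lines do, here is a minimal standalone
sketch. The %entity_2_char table and the decode_entity() name are
hypothetical stand-ins for the module's %ENTITY_2_CHAR and process(), and
the guard for unknown entities is an addition that is not part of the
patch.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical one-entry stand-in for the module's %ENTITY_2_CHAR table.
my %entity_2_char = ( eacute => chr(0xE9) );    # U+00E9

# Mirrors the two conversion steps the patch adds to process().
sub decode_entity {
    my ($name) = @_;
    my $chr = $entity_2_char{$name};

    # Assumption, not in the patch: pass unknown entities through as undef.
    return unless defined $chr;

    $chr = Encode::encode_utf8($chr);           # character -> UTF-8 octets
    $chr = pack("U*", unpack("C*", $chr));      # one character per octet
    return $chr;
}

my $out = decode_entity('eacute');
printf "length=%d ords=%s\n",
    length($out),
    join(' ', map { sprintf '%02X', ord($_) } split //, $out);
```

For U+00E9 the result is a two-character string holding the octets C3 and
A9, so every character stays below 256 and printing it to a handle with no
encoding layer does not trigger a "Wide character" warning.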
Subject: | XHTML.pm.diff |
--- XHTML.pm 2004-10-06 07:31:44.000000000 -0400
+++ /home/william/work/perl/Knowmad-Mailform/perl5/MKDoc/XML/Decode/XHTML.pm 2007-02-27 02:11:32.000000000 -0500
@@ -1,7 +1,7 @@
package MKDoc::XML::Decode::XHTML;
use warnings;
use strict;
-
+use Encode;
# Portions (c) International Organization for Standardization 1986:
# Permission to copy in any form is granted for use with conforming SGML
@@ -317,7 +317,11 @@
(@_ == 2) or warn "MKDoc::XML::Encode::process() should be called with two arguments";
my $class = shift;
my $stuff = shift;
- return $ENTITY_2_CHAR{$stuff};
+
+ my $chr = $ENTITY_2_CHAR{$stuff};
+ $chr = Encode::encode_utf8($chr);
+ $chr = pack("U*", unpack("C*", $chr));
+ return $chr;
}