Subject: XML::Parser fails to parse element with the Euro and Drachma signs
Hi,
I'm having trouble using XML::Parser to parse an ISO-8859-7-encoded
XML element containing the Euro and Drachma signs. Here's my configuration:
Linux Fedora Core 9
perl, v5.10.0
expat_2.0.1
XML::Parser 2.36
And here's a script that reproduces the problem:
--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my $p1 = XML::Parser->new(ProtocolEncoding => 'ISO-8859-7');
my $ed = chr(0xA4) . chr(0xA5);    # Euro and Drachma signs
my $ab = chr(0xE2) . chr(0xE3);    # Greek small beta and gamma
$p1->parse("<p>$ed</p>");          # This fails
# $p1->parse("<p>$ab</p>");        # This doesn't
--------------------------------------
Running the script produces:
not well-formed (invalid token) at line 1, column 3, byte 3 at
/usr/lib/perl5/vendor_perl/5.10.0/i386-linux-thread-multi/XML/Parser.pm
line 187
Any ideas why this happens? In case it helps: according to Wikipedia
<http://en.wikipedia.org/wiki/ISO_8859-7>, the 2003 revision of
ISO-8859-7 added three characters (the Euro sign, the Drachma sign, and
the Greek ypogegrammeni) to the standard.
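To make that point concrete, here is a small Perl sketch that lists the byte-to-code-point assignments the 2003 revision introduced (0xA4, 0xA5 and 0xAA were undefined in the 1987 edition) and reports which bytes of `$ed` fall among them. My guess, which I haven't verified, is that the encoding map XML::Parser loads for ISO-8859-7 was generated from the older table, so those bytes would have no mapping there. `bytes_added_in_2003` is just a hypothetical helper for illustration, not part of XML::Parser.

--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

# Bytes that were undefined in ISO-8859-7:1987 and were assigned
# code points by the 2003 revision.
my %added_2003 = (
    0xA4 => 0x20AC,    # EURO SIGN
    0xA5 => 0x20AF,    # DRACHMA SIGN
    0xAA => 0x037A,    # GREEK YPOGEGRAMMENI
);

# Return the byte values in $bytes that only the 2003 table defines;
# a map built from the 1987 table would have no entry for them.
sub bytes_added_in_2003 {
    my ($bytes) = @_;
    return grep { exists $added_2003{$_} } map { ord($_) } split //, $bytes;
}

my $ed = chr(0xA4) . chr(0xA5);    # Euro and Drachma signs, as in the script
printf "0x%02X -> U+%04X\n", $_, $added_2003{$_} for bytes_added_in_2003($ed);
--------------------------------------

For `$ed` this flags both bytes, while the Greek letters 0xE2 and 0xE3 (present since 1987) are not flagged, which matches the pass/fail pattern of the two `parse` calls.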
Best,
pthespis