Subject: output_filter sometimes fails to encode attributes
Date: Fri, 21 Oct 2011 18:24:51 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
Hello,
It seems that the output_filter option of Twig sometimes fails to
encode attribute values. I think this is a bug.
Here's a simple example that reproduces the problem. The first command
gives the wrong output: the attribute value is not encoded and comes
out as raw UTF-8 bytes (c3 a9 6c c5 91), along with a "Wide character"
warning. Compare this to the second command, which uses the
output_text_filter option instead and gives the correct output.
$ perl -we 'use XML::Twig; my $tw = XML::Twig->new(output_filter =>
XML::Twig::encode_convert("iso-8859-2"));
$tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"},
"\x{e9}l\x{151}")); $tw->flush; print$/;' | od -tx1c -w8
Wide character in print at
/usr/local/perl5.14/lib/site_perl/5.14.2/XML/Twig.pm line 8115.
0000000 3c 64 20 77 3d 22 c3 a9
< d w = " Ă Š
0000010 6c c5 91 22 3e e9 6c f5
l Ĺ 221 " > é l ő
0000020 3c 2f 64 3e 0a
< / d > \n
0000025
$ perl -we 'use XML::Twig; my $tw = XML::Twig->new(output_text_filter =>
XML::Twig::encode_convert("iso-8859-2"));
$tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"},
"\x{e9}l\x{151}")); $tw->flush; print$/;' | od -tx1c -w8
0000000 3c 64 20 77 3d 22 e9 6c
< d w = " é l
0000010 f5 22 3e e9 6c f5 3c 2f
ő " > é l ő < /
0000020 64 3e 0a
d > \n
0000023
$
The bug might depend on whether the Perl scalar holding the attribute
value has the utf8 flag set.
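A minimal sketch for checking that hypothesis, using only core Perl and
the Encode module (no XML::Twig involved). The hex_dump helper and the
variable names are illustrative and not part of any library. It reports
whether the test string carries the utf8 flag and prints the bytes one
would expect for the attribute value in ISO-8859-2 and in UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf "%02x", ord($_) } split //, $bytes;
}

# The attribute value from the report: U+00E9, "l", U+0151.
my $value = "\x{e9}l\x{151}";

# U+0151 is above 0xFF, so Perl has to store this string with the utf8 flag on.
my $flagged = utf8::is_utf8($value) ? "on" : "off";

# What a working filter should emit for the attribute value.
my $latin2 = hex_dump(Encode::encode("iso-8859-2", $value));   # e9 6c f5

# What the string looks like if it is written out as UTF-8 instead.
my $utf8 = hex_dump(Encode::encode("UTF-8", $value));          # c3 a9 6c c5 91

print "utf8 flag:  $flagged\n";
print "iso-8859-2: $latin2\n";
print "UTF-8:      $utf8\n";
```

The UTF-8 bytes match the attribute bytes in the first dump above, and
the ISO-8859-2 bytes match the attribute bytes in the second.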
I am using XML::Twig version 3.39; its configuration information is
attached at the bottom.
Ambrus
-----------
Configuration:
perl: 5.014002
OS: linux - x86_64-linux
required
XML::Parser : 2.41
expat : <no version information found>
Strongly Recommended
Scalar::Util : 1.23 (for improved memory management)
Encode : 2.42_01 (for encoding conversions)
Modules providing additional features
XML::XPathEngine : <not available> (to use XML::Twig::XPath)
XML::XPath : 1.13 (to use XML::Twig::XPath if Tree::XPathEngine not available)
LWP : 6.02 (for the parseurl method)
HTML::TreeBuilder : 4.2 (to use parse_html and parsefile_html)
HTML::Entities::Numbered : <not available> (to allow parsing of HTML containing named entities)
HTML::Tidy : <not available> (to use parse_html and parsefile_html with the use_tidy option)
HTML::Entities : 3.69 (for the html_encode filter)
Tie::IxHash : <not available> (for the keep_atts_order option)
Text::Wrap : 2009.0305 (to use the "wrapped" option for pretty_print)
Modules used only by the auto tests
Test : 1.25_02
Test::Pod : 1.45
XML::Simple : <not available>
XML::Handler::YAWriter : <not available>
XML::SAX::Writer : <not available>
XML::Filter::BufferText : <not available>
IO::Scalar : <not available>
Please add this information to bug reports (you can run
t/zz_dump_config.t to get it)
if you are upgrading the module from a previous version, make sure you read the
Changes file for bug fixes, new features and the occasional
COMPATIBILITY WARNING
1..1
ok 1