
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 71841
Status: open
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: ambrus [...] math.bme.hu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: output_filter sometimes fails to encode attributes
Date: Fri, 21 Oct 2011 18:24:51 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
Hello,

It seems that the output_filter option of Twig sometimes fails to encode attribute values. I think this is a bug.

Here's a simple example producing the problem. The first command gives the wrong output, as the attribute value is not encoded; compare this to the second command which uses the output_text_filter option instead and gives the correct output.

$ perl -we 'use XML::Twig; my $tw = XML::Twig->new(output_filter => XML::Twig::encode_convert("iso-8859-2")); $tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"}, "\x{e9}l\x{151}")); $tw->flush; print$/;' | od -tx1c -w8
Wide character in print at /usr/local/perl5.14/lib/site_perl/5.14.2/XML/Twig.pm line 8115.
0000000 3c 64 20 77 3d 22 c3 a9
        < d w = " Ă Š
0000010 6c c5 91 22 3e e9 6c f5
        l Ĺ 221 " > é l ő
0000020 3c 2f 64 3e 0a
        < / d > \n
0000025
$ perl -we 'use XML::Twig; my $tw = XML::Twig->new(output_text_filter => XML::Twig::encode_convert("iso-8859-2")); $tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"}, "\x{e9}l\x{151}")); $tw->flush; print$/;' | od -tx1c -w8
0000000 3c 64 20 77 3d 22 e9 6c
        < d w = " é l
0000010 f5 22 3e e9 6c f5 3c 2f
        ő " > é l ő < /
0000020 64 3e 0a
        d > \n
0000023
$

The bug might depend on whether the attribute value perl scalar has the utf8 flag.

I am using XML::Twig version 3.39, whose configuration information I attach to the bottom.
Ambrus

-----------

Configuration:

perl: 5.014002
OS: linux - x86_64-linux

required
  XML::Parser              : 2.41
  expat                    : <no version information found>

Strongly Recommended
  Scalar::Util             : 1.23 (for improved memory management)
  Encode                   : 2.42_01 (for encoding conversions)

Modules providing additional features
  XML::XPathEngine         : <not available> (to use XML::Twig::XPath)
  XML::XPath               : 1.13 (to use XML::Twig::XPath if Tree::XPathEngine not available)
  LWP                      : 6.02 (for the parseurl method)
  HTML::TreeBuilder        : 4.2 (to use parse_html and parsefile_html)
  HTML::Entities::Numbered : <not available> (to allow parsing of HTML containing named entities)
  HTML::Tidy               : <not available> (to use parse_html and parsefile_html with the use_tidy option)
  HTML::Entities           : 3.69 (for the html_encode filter)
  Tie::IxHash              : <not available> (for the keep_atts_order option)
  Text::Wrap               : 2009.0305 (to use the "wrapped" option for pretty_print)

Modules used only by the auto tests
  Test                     : 1.25_02
  Test::Pod                : 1.45
  XML::Simple              : <not available>
  XML::Handler::YAWriter   : <not available>
  XML::SAX::Writer         : <not available>
  XML::Filter::BufferText  : <not available>
  IO::Scalar               : <not available>

Please add this information to bug reports (you can run t/zz_dump_config.t to get it)

if you are upgrading the module from a previous version, make sure you read the Changes file for bug fixes, new features and the occasional COMPATIBILITY WARNING

1..1
ok 1
Subject: Re: [rt.cpan.org #71841] output_filter sometimes fails to encode attributes
Date: Fri, 21 Oct 2011 19:34:41 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
Yes, it is a bug; using output_text_filter will work.

Generally, though, the whole input_filter / output_filter / output_text_filter mechanism is more or less obsolete. It dates back to pre-5.8 Perl, before Encode and output layers were available. Wouldn't opening the output file with the appropriate encoding layer work better?

--
mirod
Subject: Re: [rt.cpan.org #71841] output_filter sometimes fails to encode attributes
Date: Fri, 18 Nov 2011 14:19:52 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
On Fri, Oct 21, 2011 at 6:24 PM, Zsbán Ambrus <ambrus@math.bme.hu> wrote:
> It seems that the output_filter option of Twig sometimes fails to
> encode attribute values.  I think this is a bug.
Bad news. It seems this bug occurs with the output_encoding option too.

$ perl -we 'use XML::Twig; my $tw = XML::Twig->new(output_encoding => "iso-8859-2"); $tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"},"\x{e9}l\x{151}")); $tw->flush; print$/;' | od -tx1c -w8
Wide character in print at /usr/local/perl5.14/lib/site_perl/5.14.2/XML/Twig.pm line 8115.
0000000 3c 3f 78 6d 6c 20 76 65
        < ? x m l v e
0000010 72 73 69 6f 6e 3d 22 31
        r s i o n = " 1
0000020 2e 30 22 20 65 6e 63 6f
        . 0 " e n c o
0000030 64 69 6e 67 3d 22 69 73
        d i n g = " i s
0000040 6f 2d 38 38 35 39 2d 32
        o - 8 8 5 9 - 2
0000050 22 3f 3e 3c 64 20 77 3d
        " ? > < d w =
0000060 22 c3 a9 6c c5 91 22 3e
        " Ă Š l Ĺ 221 " >
0000070 e9 6c f5 3c 2f 64 3e 0a
        é l ő < / d > \n
0000100
$

As you can see above, the text content is encoded correctly but the attribute value is not.

Ambrus
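For reference, the byte sequences in the dumps above can be reproduced with Encode alone. The following is a minimal sketch, not part of the original report; the helper name hex_bytes is made up for illustration. It shows that the attribute bytes in the faulty output are the UTF-8 encoding of the string, while the text content bytes are the ISO-8859-2 encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The test string from the report: e-acute, "l", o with double acute.
my $s = "\x{e9}l\x{151}";

# Hypothetical helper: space-separated hex bytes of an encoded string.
sub hex_bytes { join " ", map { sprintf "%02x", ord } split //, $_[0] }

my $latin2 = hex_bytes(encode("iso-8859-2", $s));
my $utf8   = hex_bytes(encode("UTF-8", $s));

print "iso-8859-2: $latin2\n";   # bytes seen in the text content
print "UTF-8:      $utf8\n";     # bytes seen in the attribute value
```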
Subject: Re: [rt.cpan.org #71841] output_filter sometimes fails to encode attributes
Date: Fri, 18 Nov 2011 14:39:26 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
Why don't you use an encoding layer?

I believe that this should work:

perl -we 'use XML::Twig; binmode( STDOUT, ":encoding(iso-8859-2)");my $tw = XML::Twig->new(); $tw->set_root(XML::Twig::Elt->new("d", {"w" => "\x{e9}l\x{151}"},"\x{e9}l\x{151}")); $tw->flush; print$/;'

I will update the docs to reflect the fact that the various XML::Twig encoding options are now obsolete.

--
mirod
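The effect of an encoding layer can be checked without XML::Twig. The sketch below is an illustration only: it assumes the serialized document arrives at the handle as an ordinary Perl character string (the hand-written $xml stands in for the twig output), and it writes through the same layer to an in-memory handle so that the resulting bytes can be inspected. Because the layer works on the whole output stream, the attribute value and the text content are encoded the same way.

```perl
use strict;
use warnings;

# Stand-in for the serialized twig (assumption: a plain character string).
my $xml = qq{<d w="\x{e9}l\x{151}">\x{e9}l\x{151}</d>\n};

# In-memory filehandle carrying the same layer suggested for STDOUT.
my $buf = "";
open my $out, ">:encoding(iso-8859-2)", \$buf
    or die "Cannot open in-memory handle: $!";
print {$out} $xml;
close $out or die "Cannot close in-memory handle: $!";

# $buf now holds raw bytes; dump them as hex for comparison with od.
my $hex = join " ", map { sprintf "%02x", ord } split //, $buf;
print "$hex\n";
```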
Subject: Re: [rt.cpan.org #71841] output_filter sometimes fails to encode attributes
Date: Fri, 18 Nov 2011 16:41:45 +0100
To: bug-XML-Twig [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
On Fri, Nov 18, 2011 at 2:40 PM, xmltwig@gmail.com via RT <bug-XML-Twig@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=71841 >
>
> Why don't you use an encoding layer?
>
> I believe that this should work:
> perl  -we 'use XML::Twig; binmode( STDOUT, ":encoding(iso-8859-2)");my
> $tw = XML::Twig->new(); $tw->set_root(XML::Twig::Elt->new("d", {"w" =>
> "\x{e9}l\x{151}"},"\x{e9}l\x{151}")); $tw->flush; print$/;'
Using an encoding layer can be made to work, but it's quite difficult.

1. You need to add the encoding to the XML declaration, e.g.

   $tw->set_encoding("iso-8859-2");

2. You need to make characters that cannot be represented in that output encoding come out correctly. Supposedly

   use PerlIO::encoding;
   use Encode;
   $PerlIO::encoding::fallback = Encode::FB_XMLCREF();
   binmode STDOUT, "encoding(iso-8859-2)" or die;

   should do this, but I couldn't get it to work (it prints everything multiple times; I'll investigate this, it seems there's already a report about it: "https://rt.perl.org/rt3/Ticket/Display.html?id=29720").

3. It will silently use ampersand escapes for unrepresentable characters in element names, in which case the resulting XML can't be read.

Because of all these complications, I'd prefer if XML::Twig gave a simple way to output the XML in any encoding, which should solve all these issues: add an encoding header, output unrepresentable characters from text and attribute values as ampersand escapes, and die with a meaningful message if element names or attribute names cannot be represented in the output.

Ambrus
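The behaviour requested above can be sketched with Encode's fallback modes. This is not XML::Twig code, and the helper names encode_value and encode_name are hypothetical; it only illustrates the distinction being asked for: unrepresentable characters in text and attribute values become numeric character references, while unrepresentable characters in names are a fatal error.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode text or attribute values, replacing characters
# the target encoding cannot represent with numeric character references.
sub encode_value {
    my ($enc, $str) = @_;
    return encode($enc, $str, Encode::FB_XMLCREF());
}

# Hypothetical helper: element and attribute names cannot contain character
# references, so die with a readable message instead of escaping.
sub encode_name {
    my ($enc, $name) = @_;
    my $copy  = $name;   # FB_CROAK may modify its input, so work on a copy
    my $bytes = eval { encode($enc, $copy, Encode::FB_CROAK()) };
    die "Name cannot be represented in $enc\n" unless defined $bytes;
    return $bytes;
}

# U+20AC (euro sign) is not part of ISO-8859-2; U+0151 is.
my $value = encode_value("iso-8859-2", "\x{151}\x{20ac}");
my $ok    = eval { encode_name("iso-8859-2", "\x{20ac}"); 1 };
print "name rejected\n" unless $ok;
```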