
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 78877
Status: resolved
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: eellis [...] classroom24-7.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Twig causes panic when processing what appears to be a sane file.
Date: Thu, 9 Aug 2012 22:49:54 +0000
To: "bug-XML-Twig [...] rt.cpan.org" <bug-XML-Twig [...] rt.cpan.org>
From: Eric Ellis <eellis [...] classroom24-7.com>
I'm not sure this is a Twig bug, but I'm at a loss as to what causes this, and it appears to be Twig related. I've attached the file in question. It's a job file for Microsoft Expression Encoder that happens to be an XML file with a different name.

I can reliably cause a perl crash (currently using ActivePerl 5.12.4 on Win7 64, using Twig 3.39 from the AS repo) with this code and the attached file:

<code>
use warnings;
use strict;
use XML::Twig;

my $twig = new XML::Twig(pretty_print => 'indented');
open( my $job_file, "<:encoding(utf8)", $ARGV[0] );
eval { $twig->safe_parse($job_file); };

my $metadata = $twig->first_elt('Metadata');
my @items = $metadata->children();
foreach my $i (@items) {
    print $i->att_names;
}
</code>

Removing the encoding appears to sidestep it nicely, but the file appears to be encoded as UTF-8. If I open it with safe_parsefile, it behaves as expected as well. I can print the file with no issues when it is opened with the encoding stanza.

Thanks for your time and effort, Michael. Twig's been in use in my internal software for the last couple of years. I shudder to think about having to use any of the other, less usable tools for XML manipulation.

--
Eric Ellis
eellis@classroom24-7.com
http://www.classroom24-7.com
Download job.xej
application/octet-stream 65.2k


Subject: Re: [rt.cpan.org #78877] Twig causes panic when processing what appears to be a sane file.
Date: Fri, 10 Aug 2012 07:07:08 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
On 08/10/2012 12:50 AM, Eric Ellis via RT wrote:
> my $twig = new XML::Twig(pretty_print => 'indented');
> open( my $job_file, "<:encoding(utf8)", $ARGV[0] );
> eval { $twig->safe_parse($job_file); };
>
> my $metadata = $twig->first_elt('Metadata');
> my @items = $metadata->children();
> foreach my $i (@items) { print $i->att_names; }
> </code>
>
> Removing encoding appears to sidestep it nicely, but the file appears
> to be encoded UTF8. If I open with safe_parsefile, it behaves as
> expected as well.
I can reproduce this, and it looks like it makes sense. The file is read by expat, not by Perl, so the encoding layer interferes with expat's own UTF-8 processing. Removing the encoding(utf8) from the open statement solves the problem.

Actually, if you are parsing a file, you should use parsefile. If for any reason you have to parse a filehandle, then it looks like you should not specify its encoding, and let the parser deal with it.

I will update the docs though.

Thanks for the report.

--
mirod
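For illustration, the two approaches suggested above might look like the sketch below. It is only a minimal example under stated assumptions: the temporary file and its contents are a hypothetical stand-in for the attached job.xej, and the helper metadata_att_names is an illustrative name, not something from the ticket. It has not been run against the original file.

```perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the attached job file: a small UTF-8 document
# with a non-ASCII attribute value, written out as raw bytes.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.xml', UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} qq{<?xml version="1.0" encoding="UTF-8"?>\n},
    qq{<Job><Metadata><Item Name="caf\xc3\xa9" Value="1"/></Metadata></Job>\n};
close($tmp_fh) or die "close $path: $!";

# Illustrative helper: attribute names of every child of the first
# <Metadata> element (empty list if there is no such element).
sub metadata_att_names {
    my ($twig) = @_;
    my $metadata = $twig->first_elt('Metadata') or return;
    return map { $_->att_names } $metadata->children;
}

# Option 1: pass the file name and let the parser read and decode the file.
my $by_name = XML::Twig->new(pretty_print => 'indented');
$by_name->safe_parsefile($path) or die "parse failed: $@";
my @names_from_file = metadata_att_names($by_name);

# Option 2: if a filehandle is needed, open it WITHOUT an :encoding layer
# so the parser receives the raw bytes.
open(my $raw_fh, '<', $path) or die "open $path: $!";
binmode($raw_fh);
my $by_handle = XML::Twig->new(pretty_print => 'indented');
$by_handle->safe_parse($raw_fh) or die "parse failed: $@";
close($raw_fh);
my @names_from_handle = metadata_att_names($by_handle);

print join(',', sort @names_from_file), "\n";
```

In both cases the XML declaration (or the default of UTF-8) tells the parser how to decode the bytes, so no Perl I/O encoding layer is involved.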
Documentation improved in 3.43.

--
mirod