This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 45782
Status: resolved
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: Frederik.Fouvry [...] acrolinx.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 3.33



Subject: xml_split behaves incorrectly when encoding is set in XML declaration and xml_pp is incorrect
Date: Wed, 6 May 2009 16:33:36 +0200
To: "bug-XML-Twig [...] rt.cpan.org" <bug-XML-Twig [...] rt.cpan.org>
From: Frederik Fouvry <Frederik.Fouvry [...] acrolinx.com>

Message body is not shown because sender requested not to inline it.

Hi,

xml_split does not seem to work correctly when the XML declaration contains encoding="utf-8" or encoding="UTF-8": in those cases, it removes all element content.

XML::Twig version: 3.32
Perl version: This is perl, v5.10.0 built for cygwin-thread-multi-64int (with 6 registered patches, see perl -V for more detail)
OS version: CYGWIN_NT-5.1 fenera 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin

Attached are some test data to reproduce the problem: Notok.xml contains the encoding in the XML declaration, and the output files have no content. Ok.xml does not contain the encoding attribute, and the output files are fine.

Expected behaviour: the encoding attribute is respected, or at least does not remove any content ;-)

In char_parser(), the print to $_[0] seems odd after shifting two arguments (according to the XML::Parser documentation, the handler only receives two). Also, $state->{current_fh} never seemed to have a value when the encoding is set.

And while I'm at it: xml_pp does not seem to be syntactically correct in the same version of XML::Twig:

$ xml_pp
Bareword "pod2text" not allowed while "strict subs" in use at /usr/bin/xml_pp line 119.
Execution of /usr/bin/xml_pp aborted due to compilation errors.

Many thanks!

Frederik Fouvry
Senior Linguistic Engineer
--
Telephone +49 (0)30 288 84 83 34 - Facsimile: +49 (0)30 288 84 83 39
acrolinx GmbH, Rosenstraße 2, 10178 Berlin, Germany - WWW: www.acrolinx.com
Managing Director: Andrew Bredenkamp
Registration HRB 84183, Amtsgericht Berlin-Charlottenburg
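The argument-shifting point can be illustrated in plain Perl. XML::Parser documents the Char handler as being called with two arguments, (Expat, String), so once a handler has shifted both off @_, $_[0] is simply undef. The sketch below is hypothetical: the sub names and the empty hash standing in for the Expat object are assumptions for illustration, not xml_split's actual code, and it runs without XML::Parser installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Char handler. XML::Parser calls a Char
# handler with exactly two arguments: (Expat, String).
sub char_handler_buggy {
    my $expat = shift;    # first argument: the Expat object
    my $text  = shift;    # second argument: the character data
    # Both arguments have been consumed, so @_ is now empty and
    # $_[0] is undef: any print to or of $_[0] gets nothing useful.
    return $_[0];
}

sub char_handler_fixed {
    my ($expat, $text) = @_;    # copy the arguments instead of consuming them
    return $text;               # use the named variable rather than $_[0]
}

# Placeholder only; a real handler receives an XML::Parser::Expat object.
my $fake_expat = {};
my $buggy = char_handler_buggy($fake_expat, 'element content');
my $fixed = char_handler_fixed($fake_expat, 'element content');

print(defined $buggy ? "buggy: $buggy\n" : "buggy: undef\n");
print "fixed: $fixed\n";
```

Running it prints "buggy: undef" followed by "fixed: element content", which matches the symptom of the content silently disappearing.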
On Wed May 06 10:34:58 2009, Frederik.Fouvry@acrolinx.com wrote:
Both problems are fixed in the development version.

I still have to figure out why I set that character handler. The tests don't show any problem when I don't set it, but I have to check what happens in the usual annoying cases, like a long CDATA section.

--
mirod
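The concern about long CDATA sections relates to a documented property of XML::Parser: a single contiguous run of character data may generate several calls to the Char handler. A handler that assumes one call per text node can therefore lose or misroute text. Below is a minimal, hypothetical sketch of the usual defensive pattern (append every chunk to a buffer); the handler name and the chunk boundaries are assumptions, simulated here rather than produced by expat.

```perl
use strict;
use warnings;

# Buffer that collects all character data for the current text run.
my $buffer = '';

# Hypothetical Char handler: never assume one call per text node,
# always append the chunk to the buffer.
sub char_handler {
    my ($expat, $string) = @_;
    $buffer .= $string;
    return;
}

# Simulate expat delivering one long text run in three pieces.
my @chunks = ('first part, ', 'second part, ', 'third part');
char_handler(undef, $_) for @chunks;

print "$buffer\n";    # the whole run, reassembled
```

With a real parser, such a handler would typically be registered through XML::Parser->new(Handlers => { Char => \&char_handler }), with the buffer flushed and reset in an End handler.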