Subject: xml_split behaves incorrectly when encoding is set in XML declaration and xml_pp is incorrect
Date: Wed, 6 May 2009 16:33:36 +0200
To: "bug-XML-Twig [...] rt.cpan.org" <bug-XML-Twig [...] rt.cpan.org>
From: Frederik Fouvry <Frederik.Fouvry [...] acrolinx.com>
Message body is not shown because sender requested not to inline it.
Hi,
xml_split does not seem to work correctly when the XML declaration contains encoding="utf-8" or encoding="UTF-8": in those cases, it removes all element content from the output files.
XML::Twig version 3.32
Perl version:
This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 6 registered patches, see perl -V for more detail)
OS version:
CYGWIN_NT-5.1 fenera 1.5.25(0.156/4/2) 2008-06-12 19:34 i686 Cygwin
Attached are some test data to reproduce the problem:
Notok.xml contains the encoding in the XML declaration, and the output files have no content.
Ok.xml does not contain the encoding attribute, and the output files work fine.
Expected behaviour: the encoding attribute is respected, or at least does not remove any content ;-)
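For anyone without access to the attachments, the reproduction can be sketched roughly as follows. This is only a minimal sketch: the file contents below are hypothetical stand-ins for the attached Ok.xml and Notok.xml (the real attachments are not shown here), and it assumes xml_split is on the PATH and is run with its default settings.

```shell
#!/bin/sh
# Hypothetical stand-ins for the attached test files; the real
# attachments are not reproduced here, so the content is an assumption.
set -eu

body='<doc><item>one</item><item>two</item></doc>'

# Ok.xml: XML declaration without an encoding attribute.
printf '%s\n%s\n' '<?xml version="1.0"?>' "$body" > Ok.xml

# Notok.xml: same content, but the declaration names the encoding.
printf '%s\n%s\n' '<?xml version="1.0" encoding="UTF-8"?>' "$body" > Notok.xml

# Run xml_split only if it is installed.
if command -v xml_split >/dev/null 2>&1; then
    for f in Ok.xml Notok.xml; do
        xml_split "$f" || echo "xml_split failed on $f"
    done
    # Count the output pieces that still carry element text.
    for base in Ok Notok; do
        n=$(grep -l -E 'one|two' "$base"-*.xml 2>/dev/null | wc -l)
        echo "$base: $n output file(s) with element content"
    done
else
    echo "xml_split not found; test files written only"
fi
```

If the problem is present, the count reported for Notok should be lower than the one for Ok, even though the two inputs differ only in the XML declaration.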
In char_parser(), the print to $_[0] seems odd after two arguments have already been shifted off @_ (according to the XML::Parser documentation, the handler receives only two). Also, $state->{current_fh} never seemed to have a value when the encoding is set.
And while I'm at it:
xml_pp in the same version of XML::Twig does not seem to be syntactically correct; it fails to compile:
$ xml_pp
Bareword "pod2text" not allowed while "strict subs" in use at /usr/bin/xml_pp line 119.
Execution of /usr/bin/xml_pp aborted due to compilation errors.
Many thanks!
Frederik Fouvry
Senior Linguistic Engineer
--
Telephone +49 (0)30 288 84 83 34 - Facsimile: +49 (0)30 288 84 83 39
acrolinx GmbH, Rosenstraße 2, 10178 Berlin, Germany - WWW: www.acrolinx.com
Geschäftsführer: Andrew Bredenkamp
Registration HRB 84183, Amtsgericht Berlin-Charlottenburg