Subject: Bad newline interpretation by XML-Twig on Windows
L.S.
When I use XML-Twig on Windows on a regular text file (newline = 0x0d 0x0a), the text printed by Twig has an extra 0x0d at the end of each line, i.e., each line ends in 0x0d 0x0d 0x0a instead of the expected 0x0d 0x0a. The print for the <data> tags in the example below includes an explicit newline, which shows up as an ordinary (for Windows) 0x0d 0x0a in the result, so only the newlines from the original file are treated incorrectly. Perhaps Twig at some point interprets the newlines from the input file Unix-style (i.e., as 0x0d followed by Unix-\n = 0x0a).
Files with 0x0d 0x0d 0x0a "newlines" are not accepted as XML files by XML::Parser or XML::Twig (on Windows; they yield a "syntax error at line 2, column 0, byte XX at d:/Perl/site/lib/XML/Parser.pm line 185" error), so if no handler explicitly changes the input, the output from Twig is not acceptable as input to Twig on Windows. I believe this to be a bug. I haven't had similar problems with other Perl scripts, so I believe the bug is in Twig and not in Perl on Windows.
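To illustrate the byte pattern described above, here is a minimal sketch in plain Perl, without XML::Twig. It rests on an assumption, not a confirmed diagnosis of Twig: that text still containing the original 0x0d 0x0a reaches a text-mode filehandle, whose :crlf layer then expands the 0x0a again. The helpers bytes_via_crlf_layer and hex_of are hypothetical names introduced only for this illustration; the :crlf layer is pushed explicitly so the behaviour can also be seen on non-Windows systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through an explicit :crlf layer (the
# translation a text-mode handle applies on Windows) and return the raw
# bytes that end up in the file.
sub bytes_via_crlf_layer {
    my ($text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, ':crlf') or die "binmode failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";

    # Read the file back without any newline translation.
    open my $in, '<:raw', $path or die "Cannot reopen $path: $!";
    local $/;                      # slurp mode
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_of { return join ' ', unpack '(H2)*', $_[0] }

# An explicit "\n" becomes the ordinary Windows newline.
print hex_of(bytes_via_crlf_layer("a\n")), "\n";      # 61 0d 0a

# Text that already carries 0x0d 0x0a gains a second 0x0d.
print hex_of(bytes_via_crlf_layer("a\r\n")), "\n";    # 61 0d 0d 0a

# Normalising 0x0d 0x0a to "\n" before printing avoids the doubling.
(my $clean = "a\r\n") =~ s/\r\n/\n/g;
print hex_of(bytes_via_crlf_layer($clean)), "\n";     # 61 0d 0a
```

If this assumption holds, either normalising the text before it is printed or putting the output handle into binary mode would avoid the extra 0x0d; with binary mode the explicit "\n" in the handler would then be written as a bare 0x0a.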
I have looked at the Twig source code but have not been able to figure out how to solve the problem myself. I have not found the problem mentioned in the tutorials, the manual, the FAQ, or the XML::Twig bug list. None of the changes listed for version 3.16 seems applicable, and replacing Twig.pm and XPath.pm with the versions from XML-Twig 3.16 does not help. If I can work around this problem by defining a suitable input_filter or output_filter, then please tell me in detail how to do this, because I have not been able to figure that out either.
Best regards,
Louis Strous, louis.strous@consul.com
XML-Twig version: XML-Twig-3.15.tar.gz from http://xmltwig.com/xmltwig/ installed using ppm 3.0.1.
First line of Twig.pm: # $Id: Twig.pm.slow,v 1.157 2004/03/17 17:02:31 mrodrigu Exp $
Operating system: Windows 2000 Professional
Perl: This is perl, v5.8.0 built for MSWin32-x86-multi-thread
Binary build 802 provided by ActiveState Corp. http://www.ActiveState.com
Built 00:54:02 Nov 8 2002
Perl file test.pl using XML::Twig:
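#!/bin/perl
use strict;
use XML::Twig;

# Text-mode output handle (no binmode).
open OUT, ">out.xml" or die "Cannot open out.xml: $!";
print OUT "foo1\n";

my $twig = XML::Twig->new(
    twig_roots =>
    {
        # Print the text of each <data> element plus an explicit newline.
        'data' => sub { print OUT $_[1]->text . "\nfoo"; },
    },
    # Everything outside the <data> elements is copied to OUT by Twig.
    twig_print_outside_roots => \*OUT
);
print OUT "foo2\n";
$twig->parsefile("test.xml");
print OUT "foo3\n";
close OUT;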
Input file test.xml:
<document>
  <item>
    <data>alpha</data>
    <target>beta</target>
  </item>
  <item>
    <data>gamma</data>
    <target>delta</target>
  </item>
</document>
hexl test.xml:
00000000: 3c64 6f63 756d 656e 743e 0d0a 2020 3c69 <document>.. <i
00000010: 7465 6d3e 0d0a 2020 2020 3c64 6174 613e tem>.. <data>
00000020: 616c 7068 613c 2f64 6174 613e 0d0a 2020 alpha</data>..
00000030: 2020 3c74 6172 6765 743e 6265 7461 3c2f <target>beta</
00000040: 7461 7267 6574 3e0d 0a20 203c 2f69 7465 target>.. </ite
00000050: 6d3e 0d0a 2020 3c69 7465 6d3e 0d0a 2020 m>.. <item>..
00000060: 2020 3c64 6174 613e 6761 6d6d 613c 2f64 <data>gamma</d
00000070: 6174 613e 0d0a 2020 2020 3c74 6172 6765 ata>.. <targe
00000080: 743e 6465 6c74 613c 2f74 6172 6765 743e t>delta</target>
00000090: 0d0a 2020 3c2f 6974 656d 3e0d 0a3c 2f64 .. </item>..</d
000000a0: 6f63 756d 656e 743e 0d0a ocument>..
Run as follows: perl test.pl
Output file out.xml, as shown by hexl out.xml:
00000000: 666f 6f31 0d0a 666f 6f32 0d0a 3c64 6f63 foo1..foo2..<doc
00000010: 756d 656e 743e 0d0d 0a20 203c 6974 656d ument>... <item
00000020: 3e0d 0d0a 2020 2020 616c 7068 610d 0a66 >... alpha..f
00000030: 6f6f 0d0d 0a20 2020 203c 7461 7267 6574 oo... <target
00000040: 3e62 6574 613c 2f74 6172 6765 743e 0d0d >beta</target>..
00000050: 0a20 203c 2f69 7465 6d3e 0d0d 0a20 203c . </item>... <
00000060: 6974 656d 3e0d 0d0a 2020 2020 6761 6d6d item>... gamm
00000070: 610d 0a66 6f6f 0d0d 0a20 2020 203c 7461 a..foo... <ta
00000080: 7267 6574 3e64 656c 7461 3c2f 7461 7267 rget>delta</targ
00000090: 6574 3e0d 0d0a 2020 3c2f 6974 656d 3e0d et>... </item>.
000000a0: 0d0a 3c2f 646f 6375 6d65 6e74 3e66 6f6f ..</document>foo
000000b0: 330d 0a 3..