
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 11341
Status: open
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: louis.strous [...] consul.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 3.15
Fixed in: (no value)



Subject: Bad newline interpretation by XML-Twig on Windows
L.S.

When I use XML-Twig on Windows on a regular text file (newline = 0x0d 0x0a), the text printed by Twig has an extra 0x0d at the end of each line, i.e., each line ends in 0x0d 0x0d 0x0a instead of the expected 0x0d 0x0a. The print for the <data> tags in the example below includes an explicit newline, which shows up as an ordinary (for Windows) 0x0d 0x0a in the result, so only the newlines from the original file are not treated correctly. Perhaps the newlines from the input file are at some point interpreted Unix-style by Twig (i.e., as 0x0d followed by a Unix \n = 0x0a).

Files with 0x0d 0x0d 0x0a "newlines" are not accepted as XML files by XML::Parser or XML::Twig on Windows; they yield a "syntax error at line 2, column 0, byte XX at d:/Perl/site/lib/XML/Parser.pm line 185" error. So if no explicit changes are made to the input by any handlers, the output from Twig is not acceptable as input to Twig on Windows. I believe this to be a bug. I haven't had similar problems with other Perl scripts, so I believe it is a bug in Twig and not in Perl on Windows.

I have looked at the Twig source code but have not been able to figure out how to solve the problem myself. I have not found the problem mentioned in the tutorials, the manual, the FAQ or the XML::Twig bug list. None of the changes listed for version 3.16 seem applicable, and replacing Twig.pm and XPath.pm with the versions from XML-Twig 3.16 does not help.

If I can work around this problem by defining a suitable input_filter or output_filter, please tell me in detail how to do this, because I have not been able to figure that out either.

Best regards,
Louis Strous, louis.strous@consul.com

XML-Twig version: XML-Twig-3.15.tar.gz from http://xmltwig.com/xmltwig/, installed using ppm 3.0.1.
First line of Twig.pm: # $Id: Twig.pm.slow,v 1.157 2004/03/17 17:02:31 mrodrigu Exp $

Operating system: Windows 2000 Professional

Perl: This is perl, v5.8.0 built for MSWin32-x86-multi-thread
Binary build 802 provided by ActiveState Corp. http://www.ActiveState.com
Built 00:54:02 Nov 8 2002

Perl file test.pl using XML::Twig:

    #!/bin/perl
    use strict;
    use XML::Twig;

    open OUT, ">out.xml";
    print OUT "foo1\n";
    my $twig = XML::Twig->new(
        twig_roots => {
            'data' => sub { print OUT $_[1]->text . "\nfoo"; },
        },
        twig_print_outside_roots => \*OUT
    );
    print OUT "foo2\n";
    $twig->parsefile("test.xml");
    print OUT "foo3\n";
    close OUT;

Input file test.xml:

    <document>
      <item>
        <data>alpha</data>
        <target>beta</target>
      </item>
      <item>
        <data>gamma</data>
        <target>delta</target>
      </item>
    </document>

hexl test.xml:

    00000000: 3c64 6f63 756d 656e 743e 0d0a 2020 3c69  <document>..  <i
    00000010: 7465 6d3e 0d0a 2020 2020 3c64 6174 613e  tem>..    <data>
    00000020: 616c 7068 613c 2f64 6174 613e 0d0a 2020  alpha</data>..
    00000030: 2020 3c74 6172 6765 743e 6265 7461 3c2f    <target>beta</
    00000040: 7461 7267 6574 3e0d 0a20 203c 2f69 7465  target>..  </ite
    00000050: 6d3e 0d0a 2020 3c69 7465 6d3e 0d0a 2020  m>..  <item>..
    00000060: 2020 3c64 6174 613e 6761 6d6d 613c 2f64    <data>gamma</d
    00000070: 6174 613e 0d0a 2020 2020 3c74 6172 6765  ata>..    <targe
    00000080: 743e 6465 6c74 613c 2f74 6172 6765 743e  t>delta</target>
    00000090: 0d0a 2020 3c2f 6974 656d 3e0d 0a3c 2f64  ..  </item>..</d
    000000a0: 6f63 756d 656e 743e 0d0a                 ocument>..

Run as follows:

    perl test.pl

Output file out.xml:

hexl out.xml:

    00000000: 666f 6f31 0d0a 666f 6f32 0d0a 3c64 6f63  foo1..foo2..<doc
    00000010: 756d 656e 743e 0d0d 0a20 203c 6974 656d  ument>...  <item
    00000020: 3e0d 0d0a 2020 2020 616c 7068 610d 0a66  >...    alpha..f
    00000030: 6f6f 0d0d 0a20 2020 203c 7461 7267 6574  oo...    <target
    00000040: 3e62 6574 613c 2f74 6172 6765 743e 0d0d  >beta</target>..
    00000050: 0a20 203c 2f69 7465 6d3e 0d0d 0a20 203c  .  </item>...  <
    00000060: 6974 656d 3e0d 0d0a 2020 2020 6761 6d6d  item>...    gamm
    00000070: 610d 0a66 6f6f 0d0d 0a20 2020 203c 7461  a..foo...    <ta
    00000080: 7267 6574 3e64 656c 7461 3c2f 7461 7267  rget>delta</targ
    00000090: 6574 3e0d 0d0a 2020 3c2f 6974 656d 3e0d  et>...  </item>.
    000000a0: 0d0a 3c2f 646f 6375 6d65 6e74 3e66 6f6f  ..</document>foo
    000000b0: 330d 0a                                  3..
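The hex dumps are consistent with one possible explanation, offered here only as a hypothesis: the text outside the twig roots reaches OUT with its original 0x0d 0x0a still attached, and because OUT is opened in text mode on Windows, Perl's :crlf output layer adds a 0x0d before every 0x0a, so an existing "\r\n" becomes "\r\r\n". The explicit "\n" printed by the handler contains no 0x0d, which would explain why it comes out correctly as 0x0d 0x0a. The minimal sketch below reproduces that doubling on any platform without XML::Twig, by pushing :crlf onto a temporary file, and compares two candidate workarounds: writing through :raw so the bytes pass unchanged, or normalizing CR LF to LF before writing through :crlf. The helper write_through is made up for this illustration, and the sketch does not show where inside XML::Twig or XML::Parser the 0x0d is kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through a file handle carrying the given
# PerlIO layer, then read the file back as raw bytes.
sub write_through {
    my ($layer, $text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh, $layer or die "binmode $layer failed: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";

    open my $in, '<:raw', $path or die "cannot reopen $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Text as it sits in test.xml: every line ends in CR LF (0x0d 0x0a).
my $raw_input = "<document>\r\n  <item>\r\n";

# Text-mode output as on Windows: :crlf adds a CR before every LF,
# so the CR already present ends up as CR CR LF (0x0d 0x0d 0x0a).
my $doubled = write_through(':crlf', $raw_input);

# Candidate workaround A: :raw writes the bytes unchanged.
my $raw_out = write_through(':raw', $raw_input);

# Candidate workaround B: strip the CR first, let :crlf add exactly one.
(my $normalized = $raw_input) =~ s/\r\n/\n/g;
my $crlf_out = write_through(':crlf', $normalized);

for my $case (['crlf', $doubled], ['raw', $raw_out], ['normalized', $crlf_out]) {
    printf "%-10s %s\n", $case->[0], unpack('H*', $case->[1]);
}
```

The crlf line shows 0d0d0a where the other two show 0d0a, the same pattern as in out.xml. Applied to test.pl, workaround A would roughly correspond to calling binmode OUT right after the open; the explicit "\n" prints would then be written as a bare 0x0a, so out.xml would mix 0x0a and 0x0d 0x0a line endings. Whether either approach is the right fix for XML::Twig itself is not established here.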
[guest - Wed Feb 2 12:10:00 2005]:
> When I use XML-Twig on Windows on a regular text file (newline = 0x0d
> 0x0a), then the text printed by Twig has an extra 0x0d at the end
> of each line, i.e., each line ends in 0x0d 0x0d 0x0a instead of the
> expected 0x0d 0x0a.
Weird, I have never had a report of such a problem. I have a few questions, so we can eliminate some possible causes: Does the module pass its tests? Could you send me the output of t/zz_dump_config.t? Did you try with a more recent version of Perl? 5.8.0 often seems to cause Unicode-related problems.

I will try to look into this, but I don't have a working Windows machine around here, so it might take a while. Don't abandon all hope, though: I definitely need to test the module on Windows before releasing the next version (Real Soon Now(tm)).

__
Mirod
[mirod@xmltwig.com - Thu Feb 3 10:54:31 2005]:

Hi Louis,

So far I have managed to install XML::Twig 3.16 on a Win2000 machine with perl-5.8.6. Your first code generated a well-formed XML file, and I could parse it again without any problem once I had gotten rid of the extra "foo" strings printed by the original version. At this point I got so fed up with fighting Windows AND the flickering screen on my laptop that I decided that that was it for today ;--(

I tried your second example (the one that piped the output of one process to the next one) on my Linux machine: no problem. I will try the second example again when I regain my temper.

Thanks
__
mirod
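When rechecking the first example on a different setup, a quick way to confirm whether out.xml still contains doubled carriage returns, without reading a hex dump, is a small counting script. The sketch below is illustrative only: the script and the sub count_line_endings are hypothetical and not part of XML::Twig. It reads the file as raw bytes and counts CR CR LF, CR LF and bare LF endings, and it avoids newer syntax so it should also run on perl 5.8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the line-ending styles in a byte string. CR CR LF is tried first,
# so its trailing CR LF is not counted a second time as an ordinary CR LF.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (crcrlf => 0, crlf => 0, lf => 0);
    while ($bytes =~ /(\r\r\n|\r\n|\n)/g) {
        if    ($1 eq "\r\r\n") { $count{crcrlf}++ }
        elsif ($1 eq "\r\n")   { $count{crlf}++ }
        else                   { $count{lf}++ }
    }
    return \%count;
}

# Usage: perl check_eol.pl out.xml  (the script name is arbitrary)
if (my $file = shift @ARGV) {
    open my $in, '<:raw', $file or die "cannot open $file: $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    my $c = count_line_endings($bytes);
    printf "%s: %d CR CR LF, %d CR LF, %d bare LF\n",
        $file, @{$c}{qw(crcrlf crlf lf)};
    exit($c->{crcrlf} ? 1 : 0);    # non-zero exit if doubled CRs were found
}
```

For a file matching the out.xml hex dump in the original report, this would report 9 CR CR LF, 5 CR LF and 0 bare LF endings; a clean run on a fixed setup should report 0 CR CR LF.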