Skip Menu |

This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 20126
Status: open
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: cjm [...] cpan.org
evanroode [...] sbcglobal.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 0.14
  • 0.96
Fixed in: (no value)



Subject: incorrect parsing of comments
package XML::SAX::PurePerl, sub Comment I tried to produce a small test case that reproduces the problem, but haven't had much luck so far. The attached patch does solve the problem on the data that exposed this problem though. The sub tries matching '-->' to find the end of a comment, but those three characters are not necessarily in the reader's buffer, they could cross buffer boundaries. Eg, the buffer could contain [<!-- nice comment --], with the next chunk having the '>' part of the end of comment sequence. The current code would store everything including the -- in $comment_str, read the next chunk, and append the > and everything following it up till the next -->. One way to solve this is to keep reading more data until we find a --> sequence (or end of file), and only after we find one do we copy the matched string to $comment_str. Another solution would be to not append the entire buffer to comment_str when we don't find -->, but keep the last 2 characters in the buffer. Reading more data will then get the entire --> sequence into the buffer, and it will be properly matched on the next iteration. Erik
Subject: PurePerl.patch
Checking files (this may take a couple minutes) ... Checking perl-XML-SAX ... 2 possible mod(s) m /home/y/lib/perl5/site_perl/5.8/XML/SAX/ParserDetails.ini M /home/y/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm --- yinst.32293.check/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm 2006-06-26 14:57:24.000000000 -0700 +++ /home/y/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm 2006-06-26 17:58:23.000000000 -0700 @@ -586,27 +586,29 @@ sub Comment { my ($self, $reader) = @_; - + my $data = $reader->data(4); if ($data =~ /^<!--/) { - $reader->move_along(4); + $reader->move_along(4); # skip comment start + + $data = $reader->data; + while ($data !~ m!-->!) { + my $n = $reader->read_more; + $self->parser_error("End of data seen while looking for close comment marker", $reader) + unless $n; + $data = $reader->data; + } + my $comment_str = ''; - while (1) { - my $data = $reader->data; - $self->parser_error("End of data seen while looking for close comment marker", $reader) - unless length($data); - if ($data =~ /^(.*?)-->/s) { - $comment_str .= $1; - $self->parser_error("Invalid comment (dash)", $reader) if $comment_str =~ /-$/; - $reader->move_along(length($1) + 3); - last; - } - else { - $comment_str .= $data; - $reader->move_along(length($data)); - } - } - + if ($data =~ /^(.*?)-->/s) { + $comment_str = $1; + $self->parser_error("Invalid comment (dash)", $reader) if $comment_str =~ /-$/; + $reader->move_along(length($1) + 3); + } + else { + return 0; + } + $self->comment({ Data => $comment_str }); return 1;
We are using XML::Simple at work. It was using XML::SAX::PurePerl behind the scenes and we ran into this bug just last week. I cannot provide the customer data or script that exhibits the behavior. I can write a script/data file combo if needed.
Subject: Re: [rt.cpan.org #20126] incorrect parsing of comments
Date: Sun, 06 Dec 2009 10:42:55 +1300
To: bug-XML-SAX [...] rt.cpan.org
From: Grant McLean <grant [...] mclean.net.nz>
On Sat, 2009-12-05 at 14:13 -0500, Matthew Musgrove via RT wrote: Show quoted text
> Queue: XML-SAX > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=20126 > > > We are using XML::Simple at work. It was using XML::SAX::PurePerl behind > the scenes and we ran into this bug just last week. > > I cannot provide the customer data or script that exhibits the behavior. > I can write a script/data file combo if needed.
Yes, I'll need an example of the failing case. You really should consider installing XML::SAX::Expat or XML::SAX::ExpatXS. No change will be required to your code and it will run faster and workaround the PurePerl parser error. Cheers Grant
On Sat Dec 05 16:43:16 2009, grant@mclean.net.nz wrote: Show quoted text
> Yes, I'll need an example of the failing case. > > You really should consider installing XML::SAX::Expat or > XML::SAX::ExpatXS. No change will be required to your code and it will > run faster and workaround the PurePerl parser error. > > Cheers > Grant > >
I started working on a test case this weekend and should have something to submit no later than tomorrow night. I do not know when the XML::Simple change was implemented that moved it from using XML::Parser but we never noticed this problem until recently (past month or so and then at random based on the generated data). Our initial workaround was to tell XML::Simple to make use of XML::Parser. Long term we agree that we should move to XML::SAX::Expat[XS].
On Sat Dec 05 16:43:16 2009, grant@mclean.net.nz wrote: Show quoted text
> On Sat, 2009-12-05 at 14:13 -0500, Matthew Musgrove via RT wrote: > > Yes, I'll need an example of the failing case. > > You really should consider installing XML::SAX::Expat or > XML::SAX::ExpatXS. No change will be required to your code and it will > run faster and workaround the PurePerl parser error. > > Cheers > Grant > >
I have created a fork of XML-SAX and added in a test script with xml file that exhibits the behavior. I went ahead and used the "customer" data (it was from our test group and did not contain any sensitive information). http://github.com/mrmuskrat/XML-SAX
I ran into this problem, too. The error message was "End tag mismatch", but applying PurePerl.patch fixed it (as did switching to XML::SAX::ExpatXS, but XML::SAX::PurePerl really ought to get fixed). Sorry, but I can't provide the file I was parsing.