
This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 97298
Status: open
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: vgrinshp [...] akamai.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 0.99



Subject: XML::SAX::PurePerl incorrectly (and silently) parses CDATA if the end marker is split between buffers.
Date: Wed, 16 Jul 2014 18:22:38 -0400
To: bug-XML-SAX [...] rt.cpan.org
From: Vadim Grinshpun <vgrinshp [...] akamai.com>
Hi,

There's a bug in XML::SAX::PurePerl (XML-SAX-0.99) that causes CDATA sections to be misparsed if the terminating ]]> characters fall on both sides of a buffer boundary; e.g., if buffer ends with "]" or "]]" and the rest of the delimiter has not yet been read in.

It looks like the partial delimiter characters are simply discarded, and the end of CDATA is not detected. If no further CDATA occurs, an exception is thrown. If additional CDATA sections exist, their end marker will likely be seen, and the failure will be silent, but the parsed content will be wrong (i.e., all XML between start of first CDATA and end of next CDATA will be considered part of the first CDATA section).

Here's a quick script to reproduce the behavior (forcing the former case; to make it the latter, add one more CDATA section to $xml):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use lib ( "$ENV{HOME}/downloads/XML-SAX-0.99/" );
    use XML::Simple;

    $ENV{XML_SIMPLE_PREFERRED_PARSER}="XML::SAX::PurePerl";
    my $print = 1;
    for my $buflen ( 1000 .. 5000 ) {
        my $xml = "<x><a><![CDATA[";
        my $remain = $buflen - length($xml);
        my $filler = 'Z' x $remain;
        $xml .= $filler;
        $xml .= "]]></a></x>";
        eval {
            my $x = XMLin( $xml, forcearray => 1, keyattr => [] );
            if ( $print ) # this is just to prove I'm using the expected pureperl version
            {
                print $INC{'XML/SAX/PurePerl.pm'} . "\n";
                $print = 0;
            }
            my $parsed = $x->{a};
            for my $v ( @$parsed ) {
                if ( $v =~ /[^Z]/ ) {
                    die "v: '$v'\n";
                }
            }
        };
        if ( $@ ) {
            die "misparse: buflen: $buflen\n\nxml: '$xml'\n\n$@\n";
        }
    }
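The failure mode described above can be illustrated without XML::SAX installed. The following is only a toy model, not the PurePerl code: scan_naive() and its fixed-size chunking are hypothetical and simply mimic a parser that searches each buffer for ]]> on its own and treats an unmatched buffer as plain character data.

```perl
use strict;
use warnings;

# Toy model (hypothetical helper, not the XML::SAX::PurePerl API):
# look for the CDATA terminator one buffer at a time.
sub scan_naive {
    my ($input, $buflen) = @_;
    my @chunks = $input =~ /(.{1,$buflen})/gs;   # simulated fixed-size buffers
    my $cdata = '';
    for my $chunk (@chunks) {
        if ($chunk =~ /^(.*?)\]\]>/s) {
            return $cdata . $1;                  # terminator found in this buffer
        }
        $cdata .= $chunk;                        # whole buffer treated as CDATA text
    }
    return;                                      # input exhausted: the "exception" case
}

my $one = 'ZZZZZ]]></a>';                        # text following "<![CDATA["
my $two = 'ZZZZZ]]></a><b><![CDATA[YY]]></b>';   # a second CDATA section follows

# Terminator fits inside the first buffer: detected.
print defined scan_naive($one, 8) ? "found\n" : "missed\n";

# First buffer ends right after the first "]": the terminator is split and missed.
print defined scan_naive($one, 6) ? "found\n" : "missed\n";

# Same split with a later CDATA section: its terminator ends the scan instead.
print scan_naive($two, 6), "\n";
```

With a buffer length of 6 the second call reports a miss, and the third returns text that runs through the markup up to the end of the second section, which corresponds to the silent misparse.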
On Wed Jul 16 18:22:48 2014, vgrinshp@akamai.com wrote:
> There's a bug in XML::SAX::PurePerl (XML-SAX-0.99) that causes CDATA
> sections to be misparsed if the terminating ]]> characters fall on both
> sides of a buffer boundary;
Fix:

*** /usr/lib/perl5/vendor_perl/5.24.1/XML/SAX/PurePerl.pm.old Fri Dec 1 14:33:32 2017
--- /usr/lib/perl5/vendor_perl/5.24.1/XML/SAX/PurePerl.pm Fri Dec 1 14:32:21 2017
***************
*** 319,329 ****
              $self->characters({Data => $chars});
              last;
          }
!         else {
!             $self->characters({Data => $data});
!             $reader->move_along(length($data));
!             $data = $reader->data;
!         }
      }
      $self->end_cdata({});
      return 1;
--- 319,326 ----
              $self->characters({Data => $chars});
              last;
          }
!         $reader->read_more;
!         $data = $reader->data;
      }
      $self->end_cdata({});
      return 1;
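The idea behind the patch appears to be: when no terminator is found, discard nothing, pull in more input, and rescan. The sketch below models that strategy in isolation. It assumes that read_more appends to the reader's existing buffer, which the diff itself does not show. scan_accumulating() is a hypothetical name and does not use the PurePerl reader API.

```perl
use strict;
use warnings;

# Toy model of "read more, then rescan" (hypothetical helper, not the
# XML::SAX::PurePerl API): unmatched data stays in the buffer.
sub scan_accumulating {
    my ($input, $buflen) = @_;
    my @chunks = $input =~ /(.{1,$buflen})/gs;   # simulated reads of $buflen chars
    my $data = '';
    while (@chunks) {
        $data .= shift @chunks;                  # grow the buffer, discard nothing
        return $1 if $data =~ /^(.*?)\]\]>/s;    # rescan the accumulated buffer
    }
    return;                                      # input exhausted, no terminator
}

# Every buffer length now yields the same CDATA text, including the
# lengths (6 and 7) that split the terminator.
for my $buflen (1 .. 12) {
    my $got = scan_accumulating('ZZZZZ]]></a>', $buflen);
    die "buflen $buflen failed\n" unless defined $got && $got eq 'ZZZZZ';
}
print "all buffer lengths ok\n";
```

One consequence of this approach, at least in the toy model, is that the whole CDATA section is held in memory and rescanned from the start after each read. A streaming variant could instead emit everything except the last two characters of each buffer and carry those over to the next read.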