Subject: incorrect parsing of comments
package XML::SAX::PurePerl, sub Comment
I tried to produce a small test case that reproduces the problem, but
haven't had much luck so far. The attached patch does fix it on the
data that exposed the problem, though.
The sub matches '-->' to find the end of a comment, but those three
characters are not necessarily all in the reader's buffer; they can
straddle a buffer boundary. E.g., the buffer could contain
[<!-- nice comment --], with the next chunk starting with the '>' that
completes the end-of-comment sequence. The current code stores
everything, including the --, in $comment_str, reads the next chunk,
and appends the > and everything following it up to the next -->.
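A rough illustration of that failure mode, as a sketch only. It uses a toy
list of chunks in place of the real reader (old_comment_scan is a made-up
name, not part of XML::SAX::PurePerl), and mirrors the shape of the current
loop:

```perl
use strict;
use warnings;

# Toy stand-in for the reader: a list of chunks handed over one at a time.
# This is NOT the XML::SAX::PurePerl reader API, just an illustration.
sub old_comment_scan {
    my @chunks = @_;
    my $comment_str = '';
    while (defined(my $data = shift @chunks)) {
        if ($data =~ /^(.*?)-->/s) {
            $comment_str .= $1;
            return $comment_str;
        }
        # No terminator in this chunk: swallow all of it,
        # including a trailing "--" that may start the terminator.
        $comment_str .= $data;
    }
    die "End of data seen while looking for close comment marker\n";
}

# Comment body after "<!--" was skipped; terminator split as "--" + ">".
my $got = old_comment_scan(' nice comment --', '> <a/> <!-- second -->');
print "[$got]\n";   # prints "[ nice comment --> <a/> <!-- second ]"
```

The first comment's terminator is missed, so the element and the start of
the second comment end up inside the comment text.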
One way to solve this is to keep reading more data until we find a -->
sequence (or end of file), and only after we find one do we copy the
matched string to $comment_str.
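A minimal sketch of this first approach, again with a toy chunk list rather
than the real reader (scan_read_until_found is a hypothetical name). Nothing
is copied out until the whole terminator is in the pending buffer:

```perl
use strict;
use warnings;

# Hypothetical helper: grow a pending buffer until it contains "-->",
# then split it into the comment text and whatever follows.
sub scan_read_until_found {
    my @chunks = @_;
    my $buffer = '';
    while ($buffer !~ /-->/) {
        my $more = shift @chunks;
        die "End of data seen while looking for close comment marker\n"
            unless defined $more;
        $buffer .= $more;
    }
    $buffer =~ /^(.*?)-->/s
        or die "terminator vanished from buffer\n";
    my $comment = $1;
    my $rest    = substr($buffer, length($comment) + 3);
    return ($comment, $rest);
}

my ($comment, $rest) = scan_read_until_found(' nice comment --', '> <a/>');
print "[$comment] [$rest]\n";   # prints "[ nice comment ] [ <a/>]"
```

The cost is that a long comment is held in the buffer in full before any of
it is copied.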
Another solution would be to not append the entire buffer to $comment_str
when we don't find -->, but keep the last 2 characters in the buffer.
Reading more data will then get the entire --> sequence into the buffer,
and it will be properly matched on the next iteration.
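A sketch of this second approach under the same assumptions (toy chunk list,
hypothetical name scan_keep_tail). Only the last 2 characters are held back,
since those are the only ones that could be the start of a split -->:

```perl
use strict;
use warnings;

# Hypothetical helper: flush all but the last 2 characters of the buffer
# into the comment whenever no terminator is found, then read more.
sub scan_keep_tail {
    my @chunks = @_;
    my ($comment_str, $buffer) = ('', '');
    while (1) {
        if ($buffer =~ /^(.*?)-->/s) {
            $comment_str .= $1;
            return ($comment_str, substr($buffer, length($1) + 3));
        }
        if (length($buffer) > 2) {
            # 4-argument substr removes the span from $buffer and returns it.
            $comment_str .= substr($buffer, 0, -2, '');
        }
        my $more = shift @chunks;
        die "End of data seen while looking for close comment marker\n"
            unless defined $more;
        $buffer .= $more;
    }
}

my ($comment, $rest) = scan_keep_tail(' nice comment --', '> <a/>');
print "[$comment] [$rest]\n";   # prints "[ nice comment ] [ <a/>]"
```

This keeps memory use bounded by the chunk size, at the price of slightly
more bookkeeping than reading until the terminator is found.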
Erik
Subject: PurePerl.patch
Checking files (this may take a couple minutes) ...
Checking perl-XML-SAX ... 2 possible mod(s)
m /home/y/lib/perl5/site_perl/5.8/XML/SAX/ParserDetails.ini
M /home/y/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm
--- yinst.32293.check/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm 2006-06-26 14:57:24.000000000 -0700
+++ /home/y/lib/perl5/site_perl/5.8/XML/SAX/PurePerl.pm 2006-06-26 17:58:23.000000000 -0700
@@ -586,27 +586,29 @@
sub Comment {
my ($self, $reader) = @_;
-
+
my $data = $reader->data(4);
if ($data =~ /^<!--/) {
- $reader->move_along(4);
+ $reader->move_along(4); # skip comment start
+
+ $data = $reader->data;
+ while ($data !~ m!-->!) {
+ my $n = $reader->read_more;
+ $self->parser_error("End of data seen while looking for close comment marker", $reader)
+ unless $n;
+ $data = $reader->data;
+ }
+
my $comment_str = '';
- while (1) {
- my $data = $reader->data;
- $self->parser_error("End of data seen while looking for close comment marker", $reader)
- unless length($data);
- if ($data =~ /^(.*?)-->/s) {
- $comment_str .= $1;
- $self->parser_error("Invalid comment (dash)", $reader) if $comment_str =~ /-$/;
- $reader->move_along(length($1) + 3);
- last;
- }
- else {
- $comment_str .= $data;
- $reader->move_along(length($data));
- }
- }
-
+ if ($data =~ /^(.*?)-->/s) {
+ $comment_str = $1;
+ $self->parser_error("Invalid comment (dash)", $reader) if $comment_str =~ /-$/;
+ $reader->move_along(length($1) + 3);
+ }
+ else {
+ return 0;
+ }
+
$self->comment({ Data => $comment_str });
return 1;