Skip Menu |

This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 3954
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: nick [...] cleaton.net
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.31
Fixed in: (no value)



Subject: no state reset at eof unless strict_comment is set
This is all under FreeBSD 4.8 on x86. Consider this test script: -----------------------8<----------------------------------8<------ #!perl -w { package Foo; use strict; use base qw(HTML::Parser); use vars qw($AUTOLOAD); sub AUTOLOAD { my ($self, @args) = @_; return if $AUTOLOAD eq 'Foo::DESTROY'; print join ',', $AUTOLOAD, @args; print "\n"; } } my $foo = Foo->new( api_version => 3, start_document_h => ['input_start_document', 'self'], start_h => ['input_start', 'self,text'], end_h => ['input_end', 'self,text'], text_h => ['input_text', 'self,text'], declaration_h => ['input_declaration', 'self,text'], comment_h => ['input_comment', 'self,text'], process_h => ['input_process', 'self,text'], end_document_h => ['input_end_document', 'self'], ); print "==== HTML::Parser $HTML::Parser::VERSION\n"; $foo->parse('<'); $foo->eof; print "==\n"; $foo->parse('>'); $foo->eof; print "====\n"; -----------------------8<----------------------------------8<----- Under perl 5.005.03 and HTML::Parser 3.28, it works as expected: ==== HTML::Parser 3.28 Foo::input_start_document Foo::input_text,< Foo::input_end_document == Foo::input_start_document Foo::input_text,> Foo::input_end_document ==== But with HTML::Parser 3.31 (under both perl 5.8.0 and perl 5.8.1) it produces this output: ==== HTML::Parser 3.31 Foo::input_start_document Foo::input_comment,< Foo::input_end_document == Foo::input_comment,<> Foo::input_end_document ==== Adding strict_comment => 1, to the constructor eliminates the problem. It looks to me like the code that treats an open tag at eof as a comment is interfering with the state reset that's supposed to happen at eof. Removing the end_document hook doesn't fix the problem.
From: Nick Cleaton
Patch.
diff -Nurd HTML-Parser-3.31.orig/MANIFEST HTML-Parser-3.31/MANIFEST --- HTML-Parser-3.31.orig/MANIFEST Fri Aug 15 01:02:25 2003 +++ HTML-Parser-3.31/MANIFEST Wed Oct 8 19:20:52 2003 @@ -59,6 +59,7 @@ t/plaintext.t Test parsing of <plaintext> t/process.t Test process instruction support t/pullparser.t Test HTML::PullParser +t/rt3954.t Test fix for http://rt.cpan.org/NoAuth/Bug.html?id=3954 t/skipped-text.t Test skipped_text argspec t/textarea.t Test handling of <textarea> t/tokeparser.t Test HTML::TokeParser diff -Nurd HTML-Parser-3.31.orig/hparser.c HTML-Parser-3.31/hparser.c --- HTML-Parser-3.31.orig/hparser.c Tue Aug 19 15:45:08 2003 +++ HTML-Parser-3.31/hparser.c Wed Oct 8 19:24:51 2003 @@ -1606,6 +1606,8 @@ token.beg = s + 1; token.end = end; report_event(p_state, E_COMMENT, s, end, &token, 1, self); + SvREFCNT_dec(p_state->buf); + p_state->buf = 0; } else { goto REST_IS_TEXT; diff -Nurd HTML-Parser-3.31.orig/t/rt3954.t HTML-Parser-3.31/t/rt3954.t --- HTML-Parser-3.31.orig/t/rt3954.t Thu Jan 1 01:00:00 1970 +++ HTML-Parser-3.31/t/rt3954.t Wed Oct 8 19:23:52 2003 @@ -0,0 +1,89 @@ + + +require HTML::Parser; + +package P; @ISA = qw(HTML::Parser); + +sub AUTOLOAD { + my ($self,@args) = @_; + + return if $AUTOLOAD =~ /DESTROY/; + $P::buf .= join(',', $AUTOLOAD, @args) . "!"; +} + +package main; + + +my $parser = P->new( + api_version => 3, + start_document_h => ['input_start_document', 'self'], + start_h => ['input_start', 'self,text'], + end_h => ['input_end', 'self,text'], + text_h => ['input_text', 'self,text'], + declaration_h => ['input_declaration', 'self,text'], + comment_h => ['input_comment', 'self,text'], + process_h => ['input_process', 'self,text'], + end_document_h => ['input_end_document', 'self'], +); + + +print "1..4\n"; +$i = 0; + +$P::buf = ''; +$parser->parse('<')->eof; +is( $P::buf, + 'P::input_start_document!P::input_comment,<!P::input_end_document!', + 'no strict_comment, "<" document parsed as comment' + ); + +$P::buf = ''; +$parser->parse('>')->eof; +is( $P::buf, + 'P::input_start_document!P::input_text,>!P::input_end_document!', + 'no strict_comment, ">" document parsed as text' + ); + + +$parser = P->new( + strict_comment => 1, + api_version => 3, + start_document_h => ['input_start_document', 'self'], + start_h => ['input_start', 'self,text'], + end_h => ['input_end', 'self,text'], + text_h => ['input_text', 'self,text'], + declaration_h => ['input_declaration', 'self,text'], + comment_h => ['input_comment', 'self,text'], + process_h => ['input_process', 'self,text'], + end_document_h => ['input_end_document', 'self'], +); + + +$P::buf = ''; +$parser->parse('<')->eof; +is( $P::buf, + 'P::input_start_document!P::input_text,<!P::input_end_document!', + 'strict_comment, "<" document parsed as text' + ); + +$P::buf = ''; +$parser->parse('>')->eof; +is( $P::buf, + 'P::input_start_document!P::input_text,>!P::input_end_document!', + 'strict_comment, ">" document parsed as text' + ); + + +sub is { + my ($got, $expected, $name) = @_; + + ++$i; + print "Test $i: $name\n"; + if ( $expected ne $got) { + print "Expected: $expected\n", + "Got: $got\n"; + print( "not " ); + } + print "ok $i\n"; +} +