
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 60474
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: 3.23_3
Fixed in: (no value)



Subject: Last word may be eaten when parsing
The output of the following script:

    #!/usr/bin/perl
    use HTML::TreeBuilder;
    $tree = HTML::TreeBuilder->new->parse("ABC DEF GHI.");
    warn $tree->dump;
    __END__

looks like this:

    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF"

This means that the last word ("GHI.") is missing from the parsed tree. This can be worked around either by adding a newline to the string or by wrapping the text in some tag.

Regards,
Slaven
HTML::TreeBuilder is a subclass of HTML::Parser, and HTML::Parser requires you to call $p->eof to flush any remaining buffered text once you have finished calling $p->parse().
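Applied to the script from the original report, the fix is a single extra call. The sketch below assumes HTML::TreeBuilder is installed. It also shows `new_from_content`, which, as I read the HTML::TreeBuilder documentation, parses its arguments and then calls `eof` itself, so nothing is left in the buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Option 1: feed text with parse(), then call eof() once all input is in.
my $tree = HTML::TreeBuilder->new;
$tree->parse("ABC DEF GHI.");
$tree->eof;     # flushes the held-back "GHI." into the tree
$tree->dump;    # dump() prints the tree to the selected filehandle

# Option 2: new_from_content() performs the parse() + eof() sequence for you.
my $tree2 = HTML::TreeBuilder->new_from_content("ABC DEF GHI.");
print $tree2->as_text, "\n";

# Free the trees explicitly; older HTML-Tree releases need this to break
# the parent/child reference cycles inside the tree.
$tree->delete;
$tree2->delete;
```

When a document arrives in several chunks, the pattern is the same: call `parse` once per chunk and `eof` exactly once at the end.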
Here are a few examples that show how this works:

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF"

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); $tree->eof(); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF GHI."

    $ perl -MHTML::TreeBuilder -e '$tree = HTML::TreeBuilder->new->parse("ABC DEF GHI."); $tree->parse(" JKL."); $tree->eof(); print $tree->dump;'
    <html> @0 (IMPLICIT)
      <head> @0.0 (IMPLICIT)
      <body> @0.1 (IMPLICIT)
        "ABC DEF GHI. JKL."
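Why the last word in particular? According to the HTML::Parser documentation (see the `unbroken_text` option), text is by default only split at a boundary between whitespace and non-whitespace, so a trailing fragment such as "GHI." stays buffered in case the next chunk continues it. The toy parser below is hypothetical: `ToyChunkParser` is not part of HTML::Parser and its code is not HTML::Parser's. It only illustrates that buffering rule and why `eof` is needed to release the last fragment.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy parser; NOT HTML::Parser's implementation. It only
# mimics the rule of splitting text at whitespace boundaries.
package ToyChunkParser;

sub new {
    my ($class) = @_;
    return bless { buf => '', out => [] }, $class;
}

# Append a chunk, emit everything up to the last whitespace character,
# and hold back the trailing fragment: the next chunk might continue it.
sub parse {
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    if ($self->{buf} =~ s/\A(.*\s)//s) {
        push @{ $self->{out} }, $1;
    }
    return $self;
}

# No more input can arrive, so the held-back fragment is flushed.
sub eof {
    my ($self) = @_;
    push @{ $self->{out} }, $self->{buf} if length $self->{buf};
    $self->{buf} = '';
    return $self;
}

# Text emitted so far.
sub text {
    my ($self) = @_;
    return join '', @{ $self->{out} };
}

package main;

my $p = ToyChunkParser->new->parse("ABC DEF GHI.");
print "before eof: '", $p->text, "'\n";    # prints: before eof: 'ABC DEF '
$p->eof;
print "after eof:  '", $p->text, "'\n";    # prints: after eof:  'ABC DEF GHI.'
```

The same model presumably explains the reporter's workarounds: a trailing newline gives the parser a whitespace boundary after "GHI.", and a closing tag ends the text run, so in both cases the last word is emitted without waiting for `eof`.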