Subject: | Bad parsing of <form><body> |
Yes, I know this is malformed HTML, but HTML::Tree handles it differently from Safari,
Firefox and HTML::HTML5::Parser:
$ perl -MHTML::TreeBuilder -le '$h = new HTML::TreeBuilder; $h->parse("<form><body><input name=foo value=bar>"); print $h->as_HTML'
<html><head></head><body><form></form><input name="foo" value="bar" /></body></html>
Notice how the input ends up outside the form. This is because, when the extraneous
<body> tag is encountered, the current parsing/insertion position is set to the body element.
According to the HTML 5 specification (if you can call it that yet), the current insertion
position (‘stack of open elements’) does not change when an extraneous <body> tag is
encountered. It’s merely the attributes in the tag that get copied to the existing body
element.
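To make the behaviour described above concrete, here is a minimal pure-Perl sketch of that rule as I read it. It is a toy model, not HTML::TreeBuilder's internals: the element hashes and the handle_extra_body_tag name are made up for illustration. One detail that is my understanding of the HTML 5 rule rather than something shown above: an attribute already present on the existing body element is kept, and only missing ones are added.

```perl
use strict;
use warnings;

# Hypothetical toy model: each open element is a hash with a tag name and
# an attribute hash. This is NOT how HTML::TreeBuilder stores its tree.
sub handle_extra_body_tag {
    my ( $open_elements, $attrs ) = @_;

    # Find the body element that is already on the stack of open elements.
    my ($body) = grep { $_->{tag} eq 'body' } @$open_elements;
    return unless $body;    # no body open yet: nothing to merge into

    # Add only the attributes the existing body does not already have.
    for my $name ( keys %$attrs ) {
        $body->{attr}{$name} = $attrs->{$name}
            unless exists $body->{attr}{$name};
    }

    # The stack itself is deliberately left untouched, so the current
    # insertion position stays where it was.
    return;
}

# State after parsing "<form>": html and body are implied, form is current.
my @stack = (
    { tag => 'html', attr => {} },
    { tag => 'body', attr => { class => 'old' } },
    { tag => 'form', attr => {} },
);

# An extraneous <body class="new" id="main"> arrives.
handle_extra_body_tag( \@stack, { class => 'new', id => 'main' } );

print "current node: $stack[-1]{tag}\n";
for my $name ( sort keys %{ $stack[1]{attr} } ) {
    print "body attribute: $name=" . $stack[1]{attr}{$name} . "\n";
}
```

In this model the current node is still form after the stray <body>, so a following <input> would be inserted inside the form.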
The attached patch fixes it, at least for this case (<body> when pos is already inside body). I
made the fix conditional, just to keep the existing behaviour the same for other cases that I
haven’t thought about.
Subject: | open_U1t48IiP.txt |
Only in HTML-Tree-4.2-SxenFd: .DS_Store
diff -rup HTML-Tree-4.2-SxenFd-orig/lib/HTML/TreeBuilder.pm HTML-Tree-4.2-SxenFd/lib/HTML/TreeBuilder.pm
--- HTML-Tree-4.2-SxenFd-orig/lib/HTML/TreeBuilder.pm 2011-04-06 01:37:54.000000000 -0700
+++ HTML-Tree-4.2-SxenFd/lib/HTML/TreeBuilder.pm 2012-03-24 14:29:11.000000000 -0700
@@ -706,7 +706,8 @@ sub warning {
for ( keys %$attr ) {
$body->attr( $_, $attr->{$_} );
}
- return $self->{'_pos'} = $body; # bypass tweaking.
+ $self->{'_pos'} = $body unless $pos->is_inside('body');
+ return $self->{'_pos'}; # bypass tweaking.
#----------------------------------------------------------------------
}
diff -rup HTML-Tree-4.2-SxenFd-orig/t/body.t HTML-Tree-4.2-SxenFd/t/body.t
--- HTML-Tree-4.2-SxenFd-orig/t/body.t 2011-04-06 01:37:54.000000000 -0700
+++ HTML-Tree-4.2-SxenFd/t/body.t 2012-03-24 14:28:53.000000000 -0700
@@ -3,7 +3,7 @@
use warnings;
use strict;
-use Test::More tests => 11;
+use Test::More tests => 12;
BEGIN {
use_ok('HTML::TreeBuilder');
@@ -89,3 +89,10 @@ RT_18571: {
"<html><head></head><body><b>\$self->escape</b></body></html>" )
; # 3.22 compatability
}
+
+{
+ my $root = HTML::TreeBuilder->new;
+ $root->parse('<form><body><input>');
+ ok $root->find('input')->is_inside('form'),
+ '<form><body> leaves <form> as the current parsing position';
+}