Subject: Wrong parsing HTML
Date: Fri, 31 Oct 2014 19:00:35 +0200
To: bug-html-tree [...] rt.cpan.org
From: Victor Porton <porton [...] narod.ru>
File test2.html:
[[[
<html>
<head>
<title>Test</title>
</head>
<body>
<form>
<link></link>
<input name="x" />
</form>
</body>
</html>
]]]
Script:
[[[
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new();
$tree->parse_file("test2.html");
print $tree->as_HTML, "\n";
]]]
Result:
[[[
<html><head><title>Test</title><link /></head><body><form></form><input name="x" /></body></html>
]]]
HTML::TreeBuilder closes the <form> tag in the wrong place, which leaves the <input> outside the form. The <link> tag is also moved out of the form and into <head>.
The example is stripped down from real HTML on a third-party site, so we need it to work. Yes, the <link> tag is misplaced in the source, but we need to handle it anyway.
I will attempt to fix this in HTML::TreeBuilder, but I may need your help.
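The expected structure can also be checked programmatically, which may be useful as a regression test. This is only a minimal sketch: `input_inside_form` is a hypothetical helper name, not part of HTML::TreeBuilder, and it assumes that `parse_content`, `look_down` and `delete` behave as documented for HTML::TreeBuilder and HTML::Element. On an affected version I would expect it to report the input as outside the form for the document above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::TreeBuilder): returns 1 if the
# parsed tree contains an <input> nested inside a <form>, otherwise 0.
sub input_inside_form {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_content($html);    # parse the string and signal end of input

    # Find the first <form>, then look for an <input> among its descendants.
    my $form  = $tree->look_down(_tag => 'form');
    my $input = $form ? $form->look_down(_tag => 'input') : undef;

    my $ok = $input ? 1 : 0;
    $tree->delete;                  # free the tree explicitly
    return $ok;
}

# Same markup as test2.html above.
my $html = <<'HTML';
<html>
<head>
<title>Test</title>
</head>
<body>
<form>
<link></link>
<input name="x" />
</form>
</body>
</html>
HTML

print input_inside_form($html)
    ? "input is inside the form\n"
    : "input is OUTSIDE the form\n";
```

Removing the `<link></link>` line from the heredoc gives a quick comparison against a document that does not trigger the problem.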
--
Victor Porton - http://portonvictor.org