Subject: "FORM" is always a daughter of "body"?
distribution name and version: HTML-Tree-3.18
perl version: "v5.8.6 built for MSWin32-x86-multi-thread" & "v5.8.0 built for i386-linux-thread-multi"
operating system: "windows xp Home Edition version 2002 SP 2" & "linux 2.4.20-8bigmem"
problem: Unless I have made a mistake, it seems to me that the wrong tree is built when a "form" is inside a "span", which is in turn inside a "p". The "form" is mistakenly listed as a sibling of "p" rather than as a descendant of it. Further, the "a" that follows the "form" is also treated as a sibling of "p".
It may be argued that it does no harm for a "form" to be a daughter of "body". That is fine as long as the "form" is not nested inside other non-body tags. In our case, however, treating it this way means the "a" after the "form" becomes a daughter of "body" too, when it should in fact be part of the paragraph marked by "p".
My intention is to extract the text of each paragraph. Because of this behaviour, any text after a "form" is missing from the paragraph in which it should appear.
Please verify whether this is a bug. The HTML code is given below.
<---BEGIN OF HTML--->
<html>
<head>
<title>test
</title>
<body>
<p>
<span class="yqlink">
<form class="yqin" action="http://yq.search.yahoo.com/search" method="post">
<input type="hidden" name="p" value="&quot;Bob Dylan&quot;" />
<input type="hidden" name="sourceOrder" value="c1,i,yn,c3" />
<input type="hidden" name="c1" value="&lt;p style=&quot;font-family:arial,sans-serif;font-weight:bold;font-size:13px;padding:0;margin-top:1em;margin-bottom:.5em;&quot;&gt;Bob Dylan&lt;/p&gt;" />
<input type="hidden" name="c3" value="&lt;p&gt;&lt;strong&gt;SEARCH&lt;/strong&gt;&lt;br /&gt;&lt;a href=&quot;http://search.news.yahoo.com/search/news/?p=%22Bob+Dylan%22&amp;fr=yqovly1&quot;&gt;News&lt;/a&gt; | &lt;a href=&quot;http://search.news.yahoo.com/search/news/?p=%22Bob+Dylan%22&amp;c=news_photos&amp;fr=yqovly2&quot;&gt;News Photos&lt;/a&gt; | &lt;a href=&quot;http://images.search.yahoo.com/search/images?p=%22Bob+Dylan%22&amp;fr=yqovly3&quot;&gt;Images&lt;/a&gt; | &lt;a href=&quot;http://search.yahoo.com/search?p=%22Bob+Dylan%22&amp;fr=yqovly4&quot;&gt;Web&lt;/a&gt;&lt;/p&gt;" />
</form>
<a href="http://search.news.yahoo.com/search/news/?p=Bob+Dylan" onClick="activateYQinl(this);return false;" class="yqimgins" title="Related information on Bob Dylan">Bob Dylan
</a>
</span>
's "Blowin' In the Wind" touched on the tumult of the civil rights era; Marvin Gaye's "What's Going On" spoke to the social upheaval of the wartime 1970s; 1985's star-studded "We Are The World" addressed the heartbreaking starvation of millions in Somalia.
</p>
</body>
</html>
<--- END OF HTML---->
In fact, the HTML can be as simple as this and still show the problem:
<html>
<body>
<p>
<span class="sth">
<form method="post">
</form>
</span>
</p>
</body>
</html>
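
A short script along the following lines could be used to inspect the tree that gets built for the minimal example. It is only a sketch: it assumes the HTML::TreeBuilder module from the HTML-Tree distribution is installed, the variable names are illustrative, and the quote after "post" is closed so that the markup itself is valid. No expected output is shown, since the result may differ between HTML-Tree versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;    # part of the HTML-Tree distribution

# The minimal test case from the report.
my $html = <<'END_HTML';
<html>
<body>
<p>
<span class="sth">
<form method="post">
</form>
</span>
</p>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# Print the whole tree so the nesting can be checked by eye.
$tree->dump;

# Report which element ended up as the parent of the first "form".
my $form = $tree->look_down(_tag => 'form');
if ($form) {
    my $parent = $form->parent;
    print 'parent of form: ', ($parent ? $parent->tag : '(none)'), "\n";
}

# Show the text collected under each "p", which is what the
# per-paragraph extraction described above relies on.
for my $p ($tree->look_down(_tag => 'p')) {
    print 'p text: [', $p->as_text, "]\n";
}

$tree->delete;    # release the tree explicitly
```

If the "form" is reported with "body" as its parent, or the paragraph text comes back without the content that follows the "form", that would match the behaviour described in this report.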