
This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 14966
Status: rejected
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: jianmao [...] iit.edu
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.18
Fixed in: (no value)



Subject: "FORM" is always a daughter of "body"?
distribution name and version: HTML-Tree-3.18
perl version: "v5.8.6 built for MSWin32-x86-multi-thread" & "v5.8.0 built for i386-linux-thread-multi"
operating system: "windows xp Home Edition version 2002 SP 2" & "linux 2.4.20-8bigmem"
problem: Unless I have made a mistake, a wrong tree is built when a "form" is inside a "span", which is in turn inside a "p". The "form" is mistakenly listed as a sibling of "p". Further, "a" is also treated as a sibling of "p".

It may be argued that it does no harm for a "form" to be a daughter of "body". That is fine if the "form" is not nested inside other non-body tags. Otherwise, as in our case, the "a" after the "form" becomes a daughter of "body" too, when it should in fact be part of the paragraph marked by "p". My intention is to extract the text of each paragraph, but because of this, any text after a "form" is missing from the paragraph in which it should appear.

Please verify whether this is a bug. The HTML code is given below.
<---BEGIN OF HTML--->
<html>
<head>
<title>test </title>
<body>
<p>
<span class="yqlink">
<form class="yqin" action="http://yq.search.yahoo.com/search" method="post">
<input type="hidden" name="p" value="&quot;Bob Dylan&quot;" />
<input type="hidden" name="sourceOrder" value="c1,i,yn,c3" />
<input type="hidden" name="c1" value="<p style=&quot;font-family:arial,sans-serif;font-weight:bold;font-size:13px;padding:0;margin-top:1em;margin-bottom:.5em;&quot;>Bob Dylan</p>" />
<input type="hidden" name="c3" value="&lt;p&gt;&lt;strong&gt;SEARCH&lt;/strong&gt;&lt;br /&gt;&lt;a href=&quot;http://search.news.yahoo.com/search/news/?p=%22Bob+Dylan%22&amp;fr=yqovly1&quot;&gt;News&lt;/a&gt; | &lt;a href=&quot;http://search.news.yahoo.com/search/news/?p=%22Bob+Dylan%22&amp;c=news_photos&amp;fr=yqovly2&quot;&gt;News Photos&lt;/a&gt; | &lt;a href=&quot;http://images.search.yahoo.com/search/images?p=%22Bob+Dylan%22&amp;fr=yqovly3&quot;&gt;Images&lt;/a&gt; | &lt;a href=&quot;http://search.yahoo.com/search?p=%22Bob+Dylan%22&amp;fr=yqovly4&quot;&gt;Web&lt;/a&gt;&lt;/p&gt;" />
</form>
<a href="http://search.news.yahoo.com/search/news/?p=Bob+Dylan" onClick="activateYQinl(this);return false;" class="yqimgins" title="Related information on Bob Dylan">Bob Dylan </a>
</span>
's "Blowin' In the Wind" touched on the tumult of the civil rights era; Marvin Gaye's "What's Going On" spoke to the social upheaval of the wartime 1970s; 1985's star-studded "We Are The World" addressed the heartbreaking starvation of millions in Somalia.
</p>
</body>
</html>
<--- END OF HTML---->

In fact, the HTML can be as simple as this to display the problem:

<html>
<body>
<p>
<span class="sth">
<form method="post">
</form>
</span>
</p>
</body>
</html>
> problem: if I didn't make a mistake, it seems to me that a wrong tree
> is built when a "form" is inside of a "span", which is in turn
> inside of a "p". The "form" is mistakenly listed as a sibling of
> "p". Further, "a" is also treated as a sibling of "p" too.
FORM is a block-level element, as are P and DIV. SPAN and A are inline elements. Block-level elements can contain inline elements, but inline elements cannot contain block-level elements in valid HTML, so a FORM inside a SPAN is invalid. P itself may contain only inline content, which is why a FORM inside a P isn't proper either; the FORM start tag implicitly ends the open P. Marking as REJECTED, as this is not a bug according to the spec.
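
To illustrate the rule described above, here is a minimal sketch in Python (standard library only). It is not HTML::TreeBuilder's implementation: `SketchTreeBuilder`, `parent_of`, and the abbreviated `BLOCK_LEVEL` and `VOID` sets are hypothetical names invented for this example. It only assumes that a block-level start tag implicitly closes an open P, which is enough to reproduce the tree shape from the report.

```python
from html.parser import HTMLParser

# Assumption: abbreviated subsets, not the full HTML 4.01 lists.
BLOCK_LEVEL = {"p", "div", "form", "ul", "ol", "table", "blockquote", "pre"}
VOID = {"input", "br", "img", "hr"}


class SketchTreeBuilder(HTMLParser):
    """Toy tree builder: a block-level start tag implicitly closes an open <p>."""

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])      # every node is a (tag, children) pair
        self.stack = [self.root]       # currently open elements, outermost first

    def _close_through(self, tag):
        """Close the innermost open `tag` and everything nested inside it."""
        names = [node[0] for node in self.stack]
        idx = len(names) - 1 - names[::-1].index(tag)
        del self.stack[idx:]

    def handle_starttag(self, tag, attrs):
        names = [node[0] for node in self.stack]
        if tag in BLOCK_LEVEL and "p" in names:
            self._close_through("p")   # <p> may hold only inline content
        node = (tag, [])
        self.stack[-1][1].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in [node[0] for node in self.stack]:
            self._close_through(tag)
        # Otherwise the end tag is stray (its element was already closed): ignore it.


def parent_of(tree, tag, parent=None):
    """Return the parent tag name of the first `tag` node found, depth-first."""
    name, children = tree
    if name == tag:
        return parent
    for child in children:
        found = parent_of(child, tag, name)
        if found is not None:
            return found
    return None


builder = SketchTreeBuilder()
builder.feed('<html><body><p><span class="sth"><form method="post">'
             '</form><a href="x">Bob Dylan</a></span></p></body></html>')
print(parent_of(builder.root, "form"))  # body
print(parent_of(builder.root, "a"))     # body
```

Under these assumptions the FORM start tag closes both the SPAN and the P, so the FORM and the A that follows it end up as children of BODY, and the later `</span>` and `</p>` end tags have nothing left to close. Wrapping the paragraph in a DIV instead of a P, or moving the FORM outside the paragraph, avoids the implicit close.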