This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 47748
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: JMEHNLE [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Handle <unclosed </tags
The other day, I received a spam e-mail with a text/html body part like this:

==============================================================
blah blah<br><br <a href=http://domain/path.html target=_blank>Go!</a><br><p>blah
==============================================================

My spam filter failed to parse the href URL from the message body due to the unclosed "<br" tag. Closing it causes HTML::Parser to correctly parse the URL.

I noticed that http://search.cpan.org/dist/HTML-Parser/Parser.pm#BUGS says:

«Unclosed start or end tags, e.g. "<tt<b>...</b</tt>" are not recognized.»

I don't understand what the implication of this is, however. Is it a conscious decision not to support unclosed tags, or has there just been no use case for a fix?

I tested how various browsers handle the HTML code from the spam message above:

At least the following do render the link despite the preceding broken "<br" tag: Firefox 3, Konqueror from KDE 3.5.9, Safari 3 & 4, Mail.app

At least the following do NOT render the link: IE 6, Opera 9.63

I'd appreciate it if an option could be added to HTML::Parser to recognize unclosed tags.
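
Until such an option exists, one possible workaround is to repair the markup before handing it to the parser. The following is only a minimal sketch, written in Python for illustration; the function name close_unclosed_tags and the regular expression are assumptions of this sketch and are not part of HTML::Parser. It treats any "<name ..." run that reaches another "<" before a ">" as an unclosed tag and closes it.

```python
import re

# "<" plus an optional "/" and a tag name, then any characters other than
# "<" or ">", immediately followed by another "<": the tag was never closed.
# Trailing whitespace before the next "<" is dropped.
_UNCLOSED_TAG = re.compile(r"<(/?[A-Za-z][A-Za-z0-9]*[^<>]*?)\s*(?=<)")


def close_unclosed_tags(html: str) -> str:
    """Insert the missing ">" after tags that run straight into the next "<".

    Hypothetical helper: a heuristic pre-processing pass, not a real parser.
    """
    return _UNCLOSED_TAG.sub(r"<\1>", html)


if __name__ == "__main__":
    sample = ("blah blah<br><br <a href=http://domain/path.html "
              "target=_blank>Go!</a><br><p>blah")
    print(close_unclosed_tags(sample))
    print(close_unclosed_tags("<tt<b>...</b</tt>"))
```

This is a heuristic and can misfire: literal text such as "x <y and <b>", or a quoted attribute value containing "<", would also be rewritten. An option inside the parser itself would be more robust.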