
This queue is for tickets about the HTML-Parser CPAN distribution.

Report information
The Basics
Id: 8763
Status: resolved
Priority: 0/
Queue: HTML-Parser

People
Owner: Nobody in particular
Requestors: stefan.maus [...] smartit.de
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: 'A'-tags not recognized without closing 'A'-tag
On a page like this:

    <html>
    <head>
    <title>page title</title>
    </head>
    <body class="normal">
    <a name="top" />
    <a href="link1">Test1</a>
    <a href="link2">Test2</a>
    <A href="link3">Test3</a>
    </body>
    </html>

WWW::Mechanize will not recognize the link to "link1". The problem seems to be the self-closed 'A' tag (one with no separate closing 'A' tag) above it. I tried to locate the bug, but I couldn't figure it out. Anyway, attached is a workaround patch that changes the '<a />' tag to '<a ></a>'. Not very pretty, but it works.

This occurred with version 1.05_04 as well as with an older version (sorry, I didn't write it down) on SuSE 9.0 with perl v5.8.1.

I hope I did everything correctly, since this is my first patch submission and my English is not very good.
--- Mechanize.pm 2004-11-06 06:33:07.000000000 +0100
+++ patched_Mechanize.pm 2004-12-07 11:11:12.000000000 +0100
@@ -1761,6 +1761,8 @@
 sub _extract_links_and_images {
     my $self = shift;
 
+    $self->{content} =~ s/<([aA])(\s+[^>]+)*\s*\/>/<$1$2><\/$1>/g;
+
     my $parser = HTML::TokeParser->new(\$self->{content});
 
     $self->{links} = [];
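To see what the substitution in the patch does in isolation, here is a minimal sketch using core Perl only. The variable names `$content` and `$patched` and the shortened sample page are illustrative assumptions, not part of WWW::Mechanize; only the regular expression is taken from the patch.

```perl
use strict;
use warnings;

# Shortened version of the page from the report (illustrative only).
my $content = <<'HTML';
<body class="normal">
<a name="top" />
<a href="link1">Test1</a>
<A href="link3">Test3</a>
</body>
HTML

# Same substitution as in the patch, applied to a copy of the string:
# a self-closed <a ... /> becomes an explicit <a ...></a> pair.
(my $patched = $content) =~ s/<([aA])(\s+[^>]+)*\s*\/>/<$1$2><\/$1>/g;

print $patched;
```

Only the `<a name="top" />` line is rewritten (to `<a name="top" ></a>`); the other tags are left alone. One caveat of the pattern as written: a bare `<a/>` with no attributes leaves `$2` undefined, which triggers a warning under `use warnings`.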
I moved this to the HTML-Parser queue, which handles our parsing at this level.

Note that this is invalid HTML. The end tag for <a> tags is required, as documented here:
http://www.blooberry.com/indexdot/html/tagpages/a/a-bookmark.htm

Mark
Is there anything you want to change in HTML::Parser with regard to this? The default behaviour is to treat the "/" as a boolean attribute, but if you enable XML mode (xml_mode), it will generate both a start_tag and an end_tag event. In XML mode, however, the case of the tags must match, and this example was inconsistent (<A href="link3"> is closed by </a>).
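The difference between the two modes can be observed with a small script. This is a sketch only, assuming HTML::Parser's version 3 handler API (`start_h`, `end_h`, the `tagname` argspec) and its `xml_mode` attribute; the helper name `tag_events` is hypothetical and introduced just for this illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: returns the start/end events seen while parsing $html.
sub tag_events {
    my ($html, $xml_mode) = @_;
    my @events;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { push @events, "start:$_[0]" }, 'tagname' ],
        end_h   => [ sub { push @events, "end:$_[0]" },   'tagname' ],
    );
    $p->xml_mode($xml_mode);   # 0 = default HTML behaviour, 1 = XML mode
    $p->parse($html);
    $p->eof;
    return @events;
}

my $html = '<a name="top" />';
print "default:  ", join(' ', tag_events($html, 0)), "\n";
print "xml_mode: ", join(' ', tag_events($html, 1)), "\n";
```

Per the description above, the default run should report only a start event for the tag, while the XML-mode run should report a start event followed by an end event. Because XML mode also makes tag names case-sensitive, a page that opens with `<A ...>` and closes with `</a>` would no longer pair up there.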