Bug #8763 for HTML-Parser: 'A'-tags not recognized without closing 'A'-tag

Tue Dec 07 05:25:12 2004 Guest - Ticket created

Subject:

'A'-tags not recognized without closing 'A'-tag

On a page like this:

    <html>
    <head>
    <title>page title</title>
    </head>
    <body class="normal">
    <a name="top" />
    <a href="link1">Test1</a>
    <a href="link2">Test2</a>
    <A href="link3">Test3</a>
    </body>
    </html>

WWW::Mechanize will not recognize the link to "link1". The problem seems to be the 'A'-tag without a closing 'A'-tag directly above it. I tried to locate the bug, but I can't figure it out.

Anyway, attached is a workaround patch that changes the '<a />'-tag to '<a ></a>'. Not very pretty, but it works.

This occurred with version 1.05_04 as well as with an older version (sorry, I didn't write it down) on SuSE 9.0 with perl v5.8.1.

I hope I did everything correctly, since this is my first patch submission and my English is not very good.

--- Mechanize.pm 2004-11-06 06:33:07.000000000 +0100
+++ patched_Mechanize.pm 2004-12-07 11:11:12.000000000 +0100
@@ -1761,6 +1761,8 @@
 sub _extract_links_and_images {
     my $self = shift;
 
+    $self->{content} =~ s/<([aA])(\s+[^>]+)*\s*\/>/<$1$2><\/$1>/g;
+
     my $parser = HTML::TokeParser->new(\$self->{content});
 
     $self->{links} = [];
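
The effect of the substitution in the patch can be checked in isolation. The following is only a minimal sketch: the variable name $content and the shortened sample string are assumptions made for illustration, while the regular expression is copied unchanged from the patch.

```perl
use strict;
use warnings;

# Shortened sample input, taken from the page in the report (illustrative only).
my $content = '<a name="top" /> <a href="link1">Test1</a>';

# Same substitution as in the patch: rewrite a self-closed <a .../> tag
# into an explicit start tag followed by an end tag.
$content =~ s/<([aA])(\s+[^>]+)*\s*\/>/<$1$2><\/$1>/g;

print "$content\n";   # prints: <a name="top" ></a> <a href="link1">Test1</a>
```

Only the self-closed anchor is rewritten; the regular link is left untouched. Note that a bare '<a/>' without attributes leaves $2 undefined, which would trigger a warning under "use warnings".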

Tue Dec 07 10:07:47 2004 MARKSTOS [...] cpan.org - Queue changed from WWW-Mechanize to HTML-Parser

Tue Dec 07 10:09:24 2004 MARKSTOS [...] cpan.org - Correspondence added

I moved this to the HTML-Parser queue, which handles our parsing at this level.

Note that this is invalid HTML. The end tag for <a> tags is required, as documented here: http://www.blooberry.com/indexdot/html/tagpages/a/a-bookmark.htm

Mark

Mon Oct 24 08:22:50 2005 GAAS [...] cpan.org - Correspondence added

Is there anything you want to change in HTML::Parser with regard to this? The default behaviour is to treat the "/" as a boolean attribute, but if you enable XML mode it will generate both a start_tag and an end_tag event. In XML mode, however, the case of the start and end tags must match, and this example was inconsistent.
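
The difference between the two modes can be observed with a small script. This is a sketch rather than a recommended fix: the helper name tag_events and the shortened snippet are assumptions made for illustration, and it uses only the api_version 3 handler interface and the xml_mode method of HTML::Parser.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the start/end tag events seen for a snippet.
sub tag_events {
    my ($html, $xml_mode) = @_;
    my @events;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { push @events, "start:$_[0]" }, "tagname" ],
        end_h   => [ sub { push @events, "end:$_[0]" },   "tagname" ],
    );
    $p->xml_mode($xml_mode);   # 0 = default HTML mode, 1 = XML mode
    $p->parse($html);
    $p->eof;
    return @events;
}

# Shortened sample, taken from the page in the report (illustrative only).
my $snippet = '<a name="top" /><a href="link1">Test1</a>';

print "default:  ", join(" ", tag_events($snippet, 0)), "\n";
print "xml_mode: ", join(" ", tag_events($snippet, 1)), "\n";
```

In default mode the first anchor should yield only a start event, since the "/" is taken as an attribute; in XML mode it should yield a start event and an end event. XML mode also stops lower-casing tag names, which is why '<A href="link3">Test3</a>' from the report would no longer pair up.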

Tue Nov 22 16:52:08 2005 GAAS [...] cpan.org - Status changed from 'new' to 'resolved'