Subject: WWW::Mechanize Bug
Date: Sat, 13 Jan 2007 09:34:56 -0600
To: <bug-WWW-Mechanize [...] rt.cpan.org>
From: "Randall Belk" <randall.belk [...] bigfoot.com>

I believe I have found a bug in Mechanize when extracting links
from pages.
I came across this page source on Google:
<html>
<head>
<title>
Redirecting
</title>
<meta content="0; url='http://video.google.com/'"
http-equiv="refresh">
</head>
<body alink="#ff0000" text="#000000" vlink="#551a8b" link="#0000cc"
bgcolor="#ffffff">
<script type="text/javascript" language="javascript"><!--
location.replace("http://video.google.com/")
//--> </script>
</body>
</html>
Notice the single quotes around the url= value in the meta tag.
If I do an $agent->get($link); in Mechanize, the link it returns still
has the quotes around it.
After some help from the folks over at comp.lang.perl.misc, I made a
change to the _link_from_token subroutine that seems to fix the problem.
All I did was take the URL that the sub came up with and strip out
any single or double quotes. So now the code looks like
this:
if ( $tag eq "meta" ) {
    my $equiv   = $attrs->{"http-equiv"};
    my $content = $attrs->{"content"};
    return unless $equiv && (lc $equiv eq "refresh") && defined $content;

    if ( $content =~ /^\d+\s*;\s*url\s*=\s*(\S+)/i ) {
        $url = $1;
        # Added line: strip any single or double quotes from the URL.
        $url =~ s/['"]//g;
I'm sure there is a better way to do this, but it works for me.
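For anyone who wants to try the idea outside of Mechanize, here is a minimal standalone sketch. It is an assumption-laden illustration, not the module's actual code: refresh_url is a hypothetical helper name, and it differs from the one-line change above in that it removes only a matching pair of quotes wrapped around the URL rather than every quote character, so a quote inside the URL itself would be left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): extract the URL from
# the "content" value of a meta refresh tag and drop quotes wrapped around it.
sub refresh_url {
    my ($content) = @_;
    return unless defined $content;

    # Same pattern as in the excerpt: delay, semicolon, then url=<value>.
    return unless $content =~ /^\d+\s*;\s*url\s*=\s*(\S+)/i;
    my $url = $1;

    # Strip one matching pair of surrounding single or double quotes.
    $url =~ s/^(['"])(.*)\1$/$2/;
    return $url;
}

# Quoted and unquoted forms should yield the same URL.
print refresh_url(q{0; url='http://video.google.com/'}), "\n";
print refresh_url(q{0; url=http://video.google.com/}), "\n";
```

Whether stripping a wrapping pair or stripping all quotes is the better behavior depends on how lenient the link extraction is meant to be; this sketch only shows the narrower variant.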
Thanks,
Randall