This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 2087
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: Nobody in particular
Requestors:
Cc: MARKSTOS [...] cpan.org
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.36
Fixed in: (no value)



Subject: Allow follow() to match against other link attributes.
From the POD...

    $agent->follow($string|$num)

    Follow a link. If you provide a string, the first link whose text
    matches that string will be followed.

Text associated with links is not always unique, so it would be useful (to me, anyway ;) if follow() could be made to match $string against other link attributes such as the href or name. Something like...

    $agent->follow($string, LINK_HREF);

...would be much preferable to the current workaround of rolling my own version of follow()...

    LINK: foreach my $link ( @{ $agent->links } ) {
        if ( $link->[LINK_HREF] =~ /$string/ ) {
            $agent->get( $link->[LINK_HREF] );
            last LINK;
        }
    }

A proof-of-concept patch against 0.36 is attached, although it is limited to a single attribute. Opting for a hashref instead (e.g. $agent->follow($string|$num|$hashref)) could allow for chained matching, e.g.

    $agent->follow( { text => 'Download', href => 'zip' } );
--- Mechanize.pm.orig Wed Jan 22 18:53:11 2003
+++ Mechanize.pm Tue Feb 11 15:19:05 2003
@@ -34,6 +34,8 @@
 
 =cut
 
+require 5.006; # our() not available < 5.006
+
 use strict;
 use warnings;
 
@@ -45,8 +47,22 @@
 use Carp;
 use URI::URL;
 
-our @ISA = qw( LWP::UserAgent );
+use constant LINK_HREF => 0;
+use constant LINK_TEXT => 1;
+use constant LINK_NAME => 2;
+require Exporter;
+
+our @ISA = qw( Exporter LWP::UserAgent );
+
+our %EXPORT_TAGS = ( 'attributes' => [ qw(
+    LINK_HREF
+    LINK_TEXT
+    LINK_NAME
+) ] );
+our @EXPORT_OK = ( @{ $EXPORT_TAGS{'attributes'} } );
+our @EXPORT = ( );
+
 
 =head1 VERSION
 
 Version 0.35
@@ -206,9 +222,14 @@
 =cut
 
 sub follow {
-    my ($self, $link) = @_;
+    my ($self, $link, $attribute) = @_;
     my @links = @{$self->{links}};
     my $thislink;
+
+    unless ( defined($attribute) && $attribute =~ /^\d+$/ ) {
+        $attribute = LINK_TEXT;
+    }
+
     if ( $link =~ /^\d+$/ ) { # is a number?
         if ($link <= $#links) {
             $thislink = $links[$link];
@@ -219,7 +240,7 @@
         }
     } else { # user provided a regexp
         LINK: foreach my $l (@links) {
-            if ($l->[1] =~ /$link/) {
+            if ($l->[$attribute] =~ /$link/) {
                 $thislink = $l; # grab first match
                 last LINK;
             }
@@ -231,7 +252,7 @@
         }
     }
 
-    $thislink = $thislink->[0]; # we just want the URL, not the text
+    $thislink = $thislink->[LINK_HREF]; # we just want the URL, not the text
 
     $self->_push_page_stack();
     $self->get( $thislink );
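The hashref form suggested in the report is not part of the patch. As a rough illustration only, chained matching could look something like the sketch below. It assumes the [ href, text, name ] link layout implied by the patch's constants; find_link() and the %SLOT table are hypothetical names, not anything in WWW::Mechanize, and the sample links are made up.

```perl
use strict;
use warnings;

# Link layout assumed from the patch above: [ href, text, name ].
use constant { LINK_HREF => 0, LINK_TEXT => 1, LINK_NAME => 2 };

# Hypothetical table mapping hashref keys to array slots.
my %SLOT = ( href => LINK_HREF, text => LINK_TEXT, name => LINK_NAME );

# Hypothetical helper: return the first link for which EVERY given
# attribute matches its pattern, or nothing if no link qualifies.
sub find_link {
    my ($links, $criteria) = @_;
    LINK: foreach my $link (@$links) {
        foreach my $attr (keys %$criteria) {
            my $slot = $SLOT{$attr};
            die "unknown link attribute '$attr'" unless defined $slot;
            my $value = $link->[$slot];
            # A missing attribute or a failed match rules this link out.
            next LINK unless defined $value && $value =~ /$criteria->{$attr}/;
        }
        return $link;
    }
    return;
}

# Made-up sample data: two links share the same text.
my @links = (
    [ 'docs/manual.html',  'Download', undef    ],
    [ 'files/release.zip', 'Download', 'latest' ],
);

my $match = find_link( \@links, { text => 'Download', href => 'zip' } );
print $match->[LINK_HREF], "\n" if $match;
```

Requiring all criteria to match is one possible reading of "chained"; an any-of match would be an equally small change.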
From: Arjen
[guest - Tue Feb 11 17:23:56 2003]:

Websites where one can find electronic versions of (academic) journal articles, e.g., sciencedirect.com, generally present the stuff you want, followed by links to possible *generic* actions. Example of two entries on one page:

    On the theory of reference-dependent preferences, Pages 407-428
    Alistair Munro and Robert Sugden
    Abstract | Full Text + Links | PDF (149 K)

    Melioration learning in games with constant and frequency-dependent pay-offs, Pages 429-448
    Thomas Brenner and Ulrich Witt
    Abstract | Full Text + Links | PDF (114 K)

In this case, doing $agent->follow('PDF') (comparable to the suggested match on an href containing 'zip') is not useful: you do not want to follow just any pdf/zip link, but the link to the PDF that comes right after the correct page numbers are mentioned.

This is a problem that can occur in several other applications, I imagine. For example, screen scraping your inbox from webmail: for each subject line you can choose 'reply', 'read', 'delete', etc., but the links to those actions are not distinguishable by their name (or href/uri!) across the different emails.

I think this problem is ultimately solved by having a function "follow_after(R1, R2)", which matches R1 on the text (i.e. not HTML code or link text), and matches R2 on the links that follow the match of R1. Following the implementation of $agent->follow, we can allow R2 to be an integer (possibly negative), which gives the link number counted from the match of R1.

I have made an implementation already, but that was for 0.32. Expect a patch asap, once I've sorted out whether it works with 0.36.
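No patch for follow_after() appears in this ticket, so the following is only a sketch of how the idea could work, not Arjen's implementation. It assumes the page has already been flattened into a document-order list of [ 'text', $string ] and [ 'link', $href, $link_text ] items. That structure, the find_link_after() name, the placeholder hrefs, and the integer convention (0 is the first link after the R1 match, -1 the nearest link before it) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper. $items is an assumed document-order list of
# [ 'text', $string ] and [ 'link', $href, $link_text ] entries.
sub find_link_after {
    my ($items, $r1, $r2) = @_;

    # Index of the first plain-text item matching R1.
    my ($anchor) = grep {
        $items->[$_][0] eq 'text' && $items->[$_][1] =~ /$r1/
    } 0 .. $#$items;
    return unless defined $anchor;

    if ( $r2 =~ /^-?\d+$/ ) {
        # Integer R2: count links relative to the R1 match (assumed convention).
        my @after  = grep { $_->[0] eq 'link' } @{$items}[ $anchor + 1 .. $#$items ];
        my @before = grep { $_->[0] eq 'link' } @{$items}[ 0 .. $anchor - 1 ];
        return $r2 >= 0 ? $after[$r2] : $before[$r2];
    }

    # Pattern R2: first link after the R1 match whose text matches.
    foreach my $item ( @{$items}[ $anchor + 1 .. $#$items ] ) {
        return $item if $item->[0] eq 'link' && $item->[2] =~ /$r2/;
    }
    return;
}

# The two journal entries from the message above; hrefs are placeholders.
my @page = (
    [ 'text', 'On the theory of reference-dependent preferences, Pages 407-428' ],
    [ 'link', '/a1/abstract', 'Abstract' ],
    [ 'link', '/a1/full',     'Full Text + Links' ],
    [ 'link', '/a1/pdf',      'PDF (149 K)' ],
    [ 'text', 'Melioration learning in games with constant and frequency-dependent pay-offs, Pages 429-448' ],
    [ 'link', '/a2/abstract', 'Abstract' ],
    [ 'link', '/a2/full',     'Full Text + Links' ],
    [ 'link', '/a2/pdf',      'PDF (114 K)' ],
);

my $pdf = find_link_after( \@page, 'Pages 429-448', 'PDF' );
print $pdf->[1], "\n" if $pdf;
```

A real version would need the parser to record where each link sits relative to the surrounding text, which the flat list of links used by follow() does not preserve.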
From: mark [...] summersault.com
I ran into this today myself. I'll find matching against "name" and "href" useful as well.