This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 2989
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: Nobody in particular
Requestors: andy [...] petdance.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Create an image extractor for Mech.
I want to be able to get all the images on a page.
From: markstos [...] cpan.org
[PETDANCE - Sun Jul 20 17:11:15 2003]:
> I want to be able to get all the images on a page.
Some thoughts on this:

- There may be a doc bug related to this. In the docs for find_link's tag_regex attribute, the "img" tag is mentioned. However, from reading the source, it will never be found.
- find_all_images seems to be easy. It would just redefine %urltags and call _extract_links() with the new definition. I might refactor to make %urltags hidden behind an overridable method.
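As a rough illustration of the "overridable method" idea above, here is a minimal sketch. It is not Mech's code: the package names LinkScanner and ImageScanner, the url_tags method, and the regex-based scan are all hypothetical stand-ins so that the example runs with core Perl only. A real implementation would go through a proper HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the link-extracting class (not WWW::Mechanize).
package LinkScanner;

sub new { my ($class) = @_; return bless {}, $class; }

# Tag => URL-bearing attribute. Subclasses override this method instead of
# reaching into a file-scoped hash.
sub url_tags {
    return ( a => 'href', area => 'href', frame => 'src', iframe => 'src' );
}

# Toy scanner: handles only quoted attribute values. Assumption for the
# sketch; real HTML needs a parser.
sub extract {
    my ( $self, $html ) = @_;
    my %attr_for = $self->url_tags;
    my @found;
    while ( $html =~ /<(\w+)\b([^>]*)>/g ) {
        my ( $tag, $attrs ) = ( lc $1, $2 );
        my $want = $attr_for{$tag} or next;    # tag is not in the map
        if ( $attrs =~ /(?:^|\s)\Q$want\E\s*=\s*(?:"([^"]*)"|'([^']*)')/i ) {
            push @found, { tag => $tag, url => defined $1 ? $1 : $2 };
        }
    }
    return @found;
}

# An image extractor only has to swap the tag map.
package ImageScanner;
our @ISA = ('LinkScanner');

sub url_tags { return ( img => 'src' ); }

package main;

my $html = '<a href="/home">Home</a> <img src="/logo.png" alt="logo"> '
         . '<IMG SRC="/banner.gif">';

my @links  = LinkScanner->new->extract($html);
my @images = ImageScanner->new->extract($html);

print "link:  $_->{url}\n" for @links;     # link:  /home
print "image: $_->{url}\n" for @images;    # image: /logo.png, then /banner.gif
```

The point of the design is that the scanning loop never changes; only the map returned by url_tags does.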
From: markstos [...] cpan.org
[guest - Thu Dec 4 15:15:36 2003]:
> [PETDANCE - Sun Jul 20 17:11:15 2003]:
> > I want to be able to get all the images on a page.
I think this /is/ easy. Here's a proof of concept script below. It requires Mech to be patched slightly to expose %urltags. Here's the small patch:

####
--- /usr/local/lib/perl5/site_perl/5.8.0/WWW/Mechanize.pm
+++ /home/mark/tmp/WWW/Mechanize.pm
@@ -1269,7 +1269,8 @@
 
 =cut
 
-my %urltags = (
+use vars qw/%urltags/;
+%urltags = (
     a => "href",
     area => "href",
     frame => "src",
###################

Here's the proof of concept script:

###
#!/usr/bin/perl

# Load the patched copy of Mech, which exposes %urltags as a package variable.
use lib '/home/mark/tmp/';
use strict;
use WWW::Mechanize;
use Data::Dumper;

# Replace the tag map so that only <img src="..."> is treated as a link.
%WWW::Mechanize::urltags = (
    img => 'src',
);

# $mech rather than $a, to avoid shadowing sort's special $a variable.
my $mech = WWW::Mechanize->new();
$mech->get('http://rt.cpan.org/');
print Dumper ( $mech->links );

__END__
######

I'm not suggesting implementing it quite like this, just demonstrating that the framework is already there to make this very easy.
Subject: Create an image extractor for Mech. (or at least fix img-related docs.)
From: mark [...] summersault.com
[guest - Thu Dec 4 15:27:13 2003]:
> I think this /is/ easy. Here's a proof of concept script below. [snip]
Hello,

I just ran into this issue again today. I think the bug status should be elevated to Normal, or even 'Important'. The documentation demonstrates finding img links:

    $mech->find_link( tag_regex => qr/^(a|img)$/ );

However, per the above discussion, the current code will never find any img tags.

I think perhaps there should be some flag to include images in all of the functions that 'find all links'. Or perhaps it would be cleaner to just have some additional img-specific functions.
Subject: [DOC PATCH] Create an image extractor for Mech.
From: mark [...] summersault.com
[guest - Fri Jun 4 12:22:31 2004]:
> Hello, I just ran into this issue again Today. [snip]
Subject: [DOC PATCH] Create an image extractor for Mech.
From: mark [...] summersault.com
Sorry if the last msg was blank. I had a browser spaz.

Below is a doc patch as a quick fix for the current situation.

Another idea: Do 'tag' and 'tag_regex' need to be limited to the same set of tags and attributes? If I search for links with these keys, I shouldn't be surprised if I get exactly what I ask for. All that's needed for img support then is to add some extra mappings somewhere that define any img tags and the attributes that hold their URLs.

This change to the 'tag' and 'tag_regex' attributes should be backwards compatible, and in fact would bring the code into compliance with the docs.

--- /usr/local/lib/perl5/site_perl/5.8.0/WWW/Mechanize.pm  Tue Apr 13 22:44:24 2004
+++ /home/mark/tmp/Mechanize.pm  Fri Jun 4 11:27:15 2004
@@ -1000,6 +1000,17 @@
 
     $mech->find_link( tag_regex => qr/^(a|img)$/;
 
+Currently, the following tags are supported, with Mech looking
+at these particular tag attributes:
+
+    <a href="">
+    <area href="">
+    <frame src="">
+    <iframe src="">
+    <meta content="">
+
+Other tags will be ignored.
+
 =item * C<< n => number >>
 
 Matches against the I<n>th link.
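To make the "extra mappings" idea concrete, here is a small sketch of a tag_regex-style filter running over a tag map. It is only an illustration under stated assumptions: find_all, %documented and %with_img are hypothetical names, the scanner is a toy regex loop that handles quoted attribute values only, and none of it is Mech's actual implementation. The first map holds the five pairs listed in the doc patch above; the second adds img.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The five tag/attribute pairs listed in the doc patch above.
my %documented = (
    a      => 'href',
    area   => 'href',
    frame  => 'src',
    iframe => 'src',
    meta   => 'content',
);

# Hypothetical extended map: the same pairs plus img.
my %with_img = ( %documented, img => 'src' );

# Toy scanner (assumption: quoted attribute values, no real HTML parser).
# Returns [tag, raw attribute value] pairs for tags that are in %$map and
# whose name matches $tag_regex.
sub find_all {
    my ( $html, $map, $tag_regex ) = @_;
    my @hits;
    while ( $html =~ /<(\w+)\b([^>]*)>/g ) {
        my ( $tag, $attrs ) = ( lc $1, $2 );
        my $attr = $map->{$tag} or next;    # tag not in the map: skipped
        next unless $tag =~ $tag_regex;     # the caller's tag filter
        next unless $attrs =~ /(?:^|\s)\Q$attr\E\s*=\s*(?:"([^"]*)"|'([^']*)')/i;
        push @hits, [ $tag, defined $1 ? $1 : $2 ];
    }
    return @hits;
}

my $html = '<a href="/about">About</a> <img src="/logo.png" alt="Logo">';
my $re   = qr/^(a|img)$/;

my @before = find_all( $html, \%documented, $re );
my @after  = find_all( $html, \%with_img,   $re );

# Prints "documented map: a=/about"
print "documented map: ", join( ', ', map { "$_->[0]=$_->[1]" } @before ), "\n";
# Prints "with img: a=/about, img=/logo.png"
print "with img: ", join( ', ', map { "$_->[0]=$_->[1]" } @after ), "\n";
```

With the documented map, the regex qr/^(a|img)$/ can only ever return the a tag, because img is dropped before the filter runs. Adding one entry to the map is enough for the same regex to return both.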