
This queue is for tickets about the WWW-Robot CPAN distribution.

Report information
The Basics
Id: 72826
Status: new
Priority: 0/
Queue: WWW-Robot

People
Owner: Nobody in particular
Requestors: alex [...] framexpeditions.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.026
Fixed in: (no value)



Hello,

Thanks for your WWW::Robot, which we use quite extensively. I love the hook configuration. We have based a system of distributed robots on your module, and it works quite well.

However, in Robot.pm it looks like Robot::UA simple_request is called far too many times, because check_mime_type keeps being called repeatedly on the same file. Each simple_request causes a sleep to be called (and duplicates web traffic).

Suggestions:
* Why not call this line first?
      next if $url_seen{ $link };
* Or also call the add-url-test hook before that point?

Thanks,
alex

It happens around line 939 of Robot.pm:

    unless ( $self->{ 'ANY_URL' } ||
             # only follow html links (.html or .htm or no extension)
             $link =~ /\.s?html?/ ||
             $link =~ m{/$} )
             # lets assume .s?html or "/" type links really are text/html
    {
        # put in some obvious ones here ...
        next if $link =~ /(?:ftp|gopher|mailto|news|telnet|javascript):/ ;
        next if $link =~ /\.(?:gif|jpe?g)/;
        if ( $self->{ 'CHECK_MIME_TYPES' } )
        {
            # grab anchor / area / frame links
            $self->verbose( " check mime type ..." );
            next unless $self->check_mime_type( $link_url_abs, [ 'text/html' ] ) ;
        }
    }
    # only follow links we haven't seen yet ...
    next if $url_seen{ $link };
    $url_seen{ $link }++;
    next if ( exists $self->{ 'HOOKS' }->{ 'add-url-test' } and
              not $self->invoke_hook_functions( 'add-url-test', $link_url_abs ) );
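The first suggestion could look roughly like the sketch below. It is a standalone illustration, not a patch against Robot.pm: `check_mime_type_stub`, `filter_links`, and the example.com URLs are hypothetical names made up for the demonstration, and the stub only counts calls instead of issuing a request. The regular expressions are copied from the excerpt above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for check_mime_type. In the real module that call
# goes through simple_request (and a sleep); here it only counts calls.
my $mime_checks = 0;

sub check_mime_type_stub {
    my ($link) = @_;
    $mime_checks++;
    return $link !~ /\.pdf$/;    # assumption: pretend only .pdf is not text/html
}

# Filter a list of links, consulting %url_seen BEFORE the MIME check.
sub filter_links {
    my (@links) = @_;
    my %url_seen;
    my @followed;

    for my $link (@links) {
        # only follow links we haven't seen yet, before any network access
        next if $url_seen{$link}++;

        # links that look like HTML are accepted without a MIME check
        unless ( $link =~ /\.s?html?/ || $link =~ m{/$} ) {
            next if $link =~ /(?:ftp|gopher|mailto|news|telnet|javascript):/;
            next if $link =~ /\.(?:gif|jpe?g)/;
            next unless check_mime_type_stub($link);
        }
        push @followed, $link;
    }
    return @followed;
}

my @links = (
    'http://example.com/doc',
    'http://example.com/doc',
    'http://example.com/doc',
    'http://example.com/file.pdf',
    'http://example.com/file.pdf',
    'http://example.com/index.html',
);

my @followed = filter_links(@links);
print "followed: @followed\n";
print "mime checks: $mime_checks\n";
```

With the six sample links, the stub is called twice (once for `/doc`, once for `/file.pdf`) rather than five times. One behavioural difference to keep in mind: because the link is recorded before the MIME check, links rejected by that check are also marked as seen and are never rechecked, whereas the original ordering only records links that passed.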