Subject: WWW::RobotRules/LWP::RobotUA Does Not Respect Crawl-delay:
Hi. This is imacat from Taiwan. I was trying LWP::RobotUA and
found that WWW::RobotRules does not respect Crawl-delay:. The test
script (an exact copy of the example in WWW::RobotRules's POD) is:
==========
#! /usr/bin/perl -w
use WWW::RobotRules;
my $rules = WWW::RobotRules->new('MOMspider/1.0');
use LWP::Simple qw(get);
my $url = "http://sourceforge.net/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
==========
The result I got is:
==========
imacat@rinse ~/tmp % ./test.pl
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 10
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 2
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line:
Crawl-delay: 2
imacat@rinse ~/tmp %
==========
Crawl-delay: is a widely used directive, obeyed by the crawlers of
Yahoo!, MSN, and many other robots. Any application built on
LWP::RobotUA triggers this warning every time such a robots.txt is
parsed, which makes LWP::RobotUA close to unusable in practice.
Besides, when a website specifies Crawl-delay:, LWP::RobotUA should
respect that value rather than its own $ua->delay() setting. Could
you please look into this and fix it soon? Thank you.
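
In case it helps, here is a minimal workaround sketch, not a proper
fix. It assumes subclassing WWW::RobotRules::InCore (the class that
WWW::RobotRules->new actually blesses into); the names My::RobotRules,
my_crawl_delay, and crawl_delay() are made up for illustration. The
subclass records the first Crawl-delay: value and strips those lines
before handing the text to the stock parser, so the warning goes away;
the caller then feeds the value to $ua->delay():
==========
#! /usr/bin/perl -w
use strict;

package My::RobotRules;
use WWW::RobotRules;
our @ISA = ('WWW::RobotRules::InCore');  # what WWW::RobotRules->new returns

# Remember the first Crawl-delay: value (in seconds) and strip those
# lines before the stock parser sees them, so it stops warning.
# A real fix would track the value per User-agent record.
sub parse {
    my ($self, $url, $txt, @rest) = @_;
    if (defined $txt) {
        $self->{'my_crawl_delay'} = $1
            if $txt =~ /^Crawl-delay:\s*(\d+(?:\.\d+)?)/mi;
        $txt =~ s/^Crawl-delay:.*\n?//mig;
    }
    return $self->SUPER::parse($url, $txt, @rest);
}

sub crawl_delay { return $_[0]->{'my_crawl_delay'}; }

package main;
use LWP::RobotUA;

my $ua = LWP::RobotUA->new('MOMspider/1.0', 'me@example.com');
$ua->rules(My::RobotRules->new('MOMspider/1.0'));
my $res = $ua->get('http://sourceforge.net/');
# robots.txt is fetched and parsed during the first request; apply
# the site's Crawl-delay afterwards.  $ua->delay() is in minutes,
# while Crawl-delay: is in seconds, hence the conversion.
if (defined(my $d = $ua->rules->crawl_delay)) {
    $ua->delay($d / 60);
}
==========
Of course the proper fix is for WWW::RobotRules itself to parse
Crawl-delay: and for LWP::RobotUA to consult it automatically, but
the above at least silences the warning and honors the site's wish.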