
This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 19539
Status: resolved
Priority: 0
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: imacat [...] mail.imacat.idv.tw
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in:
  • 5.805
  • 5.822
  • 5.823
Fixed in: (no value)

Subject: WWW::RobotRules/LWP::RobotUA Does Not Respect Crawl-delay:
Hi. This is imacat from Taiwan. I was trying LWP::RobotUA and found that WWW::RobotRules does not respect Crawl-delay:. The test script (an exact copy from WWW::RobotRules's POD) is:

==========
#! /usr/bin/perl -w
use WWW::RobotRules;
my $rules = WWW::RobotRules->new('MOMspider/1.0');

use LWP::Simple qw(get);

my $url = "http://sourceforge.net/robots.txt";
my $robots_txt = get $url;
$rules->parse($url, $robots_txt) if defined $robots_txt;
==========

The result I got is:

==========
imacat@rinse ~/tmp % ./test.pl
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 10
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 2
RobotRules <http://sourceforge.net/robots.txt>: Unexpected line: Crawl-delay: 2
imacat@rinse ~/tmp %
==========

Crawl-delay: is a widely used directive that is honored by Yahoo, MSN, and many other robots. A package built on LWP::RobotUA that emits this warning on every parse is effectively unusable, which makes LWP::RobotUA itself much less useful. Moreover, when a website specifies Crawl-delay:, LWP::RobotUA should respect that value rather than its own $ua->delay() setting. Could you look into this and fix this soon? Thank you.
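In the meantime, here is a minimal interim workaround sketch, not an official fix: strip the Crawl-delay: lines out of robots.txt before handing the text to WWW::RobotRules, so parse() never sees the line it does not understand, while remembering the delay value for the robot to use. The extract_crawl_delay() helper below is my own invention, not part of LWP:

==========
#! /usr/bin/perl -w
# Interim workaround sketch: pull Crawl-delay out of robots.txt
# before WWW::RobotRules->parse() sees it.
use strict;
use WWW::RobotRules;
use LWP::Simple qw(get);

# Hypothetical helper: returns the largest Crawl-delay found (in
# seconds) and a copy of robots.txt with those lines removed.
sub extract_crawl_delay {
    my ($txt) = @_;
    my $delay;
    while ($txt =~ /^[ \t]*Crawl-delay:[ \t]*(\d+(?:\.\d+)?)/mig) {
        # A real fix would honor only the User-agent record that
        # matches us; this sketch just keeps the largest value seen.
        $delay = $1 if !defined $delay || $1 > $delay;
    }
    $txt =~ s/^[ \t]*Crawl-delay:.*\n?//mig;
    return ($delay, $txt);
}

my $rules = WWW::RobotRules->new('MOMspider/1.0');
my $url = "http://sourceforge.net/robots.txt";
my $robots_txt = get $url;
if (defined $robots_txt) {
    my ($delay, $clean) = extract_crawl_delay($robots_txt);
    $rules->parse($url, $clean);   # no "Unexpected line" warning now
    # Crawl-delay is given in seconds, while LWP::RobotUA's delay()
    # takes minutes, so a robot would call $ua->delay($delay / 60).
    print "Crawl-delay: $delay seconds\n" if defined $delay;
}
==========

Note the unit conversion: $ua->delay() is documented in minutes, while Crawl-delay: values are conventionally seconds, so a proper fix inside LWP::RobotUA would need to convert as well.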