Subject: WWW-RobotRules parsing rules of robots.txt like googlebot
Date: Sun, 15 May 2011 23:32:37 +0200
To: bug-WWW-RobotRules [...] rt.cpan.org
From: Yannick Simon <yannick.simon [...] gmail.com>
Hello,

Thank you for this great library, WWW-RobotRules.
The is_allowed function is fine for the pure robots.txt rules.
However:
1 - Googlebot supports rules containing the '*' wildcard character,
for instance:
Disallow: /path/*/10
For Googlebot this means, for instance, that
/path/sgsdfg/10 is disallowed and
/path/sdfgsdfgzegz/10222D2 is disallowed.
(Take a look at http://www.google.com/robots.txt for real examples;
a small matching sketch follows after point 2.)
2 - Googlebot supports the "Allow" directive.
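To illustrate point 1, here is a minimal sketch of how such a
wildcard pattern could be matched in Perl; pattern_to_regex is a
name I made up for the sketch, it is not part of WWW-RobotRules:

  use strict;
  use warnings;

  # Hypothetical helper (not in WWW-RobotRules): turn a robots.txt
  # path pattern into a regex. '*' matches any sequence of
  # characters, a trailing '$' anchors the end, everything else is
  # literal, and matching stays a prefix match as in plain robots.txt.
  sub pattern_to_regex {
      my ($pattern) = @_;
      my $anchored = $pattern =~ s/\$\z//;   # strip and remember a trailing '$'
      my $re = join '.*', map { quotemeta } split /\*/, $pattern, -1;
      $re .= '$' if $anchored;
      return qr/^$re/;
  }

  my $re = pattern_to_regex('/path/*/10');
  print "1 disallowed\n" if '/path/sgsdfg/10'            =~ $re;
  print "2 disallowed\n" if '/path/sdfgsdfgzegz/10222D2' =~ $re;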
It would be great if there could be another "is_allowed" function,
for instance is_allowed_extended, that behaves like Googlebot.
If you don't have time, perhaps I could try to develop the
is_allowed_extended function myself? ;)
Thank you,
Regards,
Yannick