Subject: Additional googlebot incompatibility
blekko got flamed by webmasters until we parsed robots.txt the way Google does. There's already a bug, 68219 (https://rt.cpan.org/Public/Bug/Display.html?id=68219), about * and Allow. The additional things we were flamed about are:
1) Blank lines should be ignored. Webmasters frequently have stuff like:
User-agent: googlebot

Disallow: /
and expect the Disallow to apply to googlebot and not to *. Same for:
User-agent: googlebot
# a comment
Disallow: /
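To make the expected behaviour concrete, here is a rough Python sketch of record grouping in which blank lines and comments never end a group. It is not code from blekko's parser or from the CPAN module; the function name parse_groups and the rule that a group only ends when a User-agent line follows an Allow/Disallow line are assumptions made for illustration.

```python
def parse_groups(text):
    """Map each lowercased user-agent to its list of (directive, path) rules."""
    groups = {}
    current_agents = []   # agents named by the group currently being read
    seen_rule = False     # True once the current group has at least one rule
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or ":" not in line:
            continue                          # blank/comment lines do NOT end a group
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:                     # ASSUMPTION: only this starts a new group
                current_agents, seen_rule = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


sample = "User-agent: googlebot\n\nDisallow: /\n\nUser-agent: *\n# a comment\nDisallow: /private\n"
print(parse_groups(sample))
```

With this grouping the Disallow: / lands on googlebot even though a blank line separates it from the User-agent line.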
2) Trailing $
Disallow: .mp3$
should in fact disallow /foo.mp3
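A rough Python sketch of that matching, again only an illustration and not our actual code: the helper name rule_matches is made up, and letting a pattern that does not start with / or * match anywhere in the path is an assumption taken from the .mp3$ example above, not something I am quoting from Google's documentation.

```python
import re


def rule_matches(pattern, path):
    """Return True if a robots.txt path pattern matches the URL path."""
    if not pattern:
        return False                 # an empty Disallow matches nothing
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]       # trailing $ means "path must end here"
    # Escape regex metacharacters in each piece, then rejoin with '.*' for each '*'
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if not pattern.startswith(("/", "*")):
        regex = ".*" + regex         # ASSUMPTION: unrooted patterns match anywhere
    if anchored:
        regex += r"\Z"               # anchor at the very end of the path
    return re.match(regex, path) is not None


print(rule_matches(".mp3$", "/foo.mp3"))
print(rule_matches(".mp3$", "/foo.mp3.txt"))
```

Without the trailing $ a pattern is treated as a prefix match, so /private still matches /private/x.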
I would be happy to donate our testsuite.
I don't think anyone should be using a non-googlebot-compatible robots.txt parser these days. But if you want to keep a useless but standard-compliant mode around, it's easy enough to divide the tests up into the ones that obey the standard and the ones that obey the reality.