Subject: Spidering breaks app
For some reason, whenever search engine spiders hit my dispatch app, I get errors.
I got this error report just now:
----
ON QSR Site
ERROR CODE 500
OCCURRED ON Tue May 2 09:07:24 2006
WHEN THE URL /resources/suppliers/wisconsin_built WAS REQUESTED
FULL PATH
BY A USER AT 65.214.44.145
THE BROWSER WAS Mozilla/2.0 (compatible; Ask Jeeves/Teoma;
+http://sp.ask.com/docs/about/tech_crawling.html)
------------------------------------------------------------------------------
$VAR1 = {
'REDIRECT_UNIQUE_ID' => 'RFdZjM-8S5AAAC-aQFE',
'QUERY_STRING' => '500',
'nokeepalive' => '1',
'REDIRECT_SCRIPT_FILENAME' => '/var/www/qsr/web/resources',
'REMOTE_PORT' => '54776',
'HTTP_ACCEPT' => 'text/html, text/plain, application/x-shockwave-flash',
'HTTP_USER_AGENT' => 'Mozilla/2.0 (compatible; Ask Jeeves/Teoma;
+http://sp.ask.com/docs/about/tech_crawling.html)',
'GATEWAY_INTERFACE' => 'CGI/1.1',
'HTTP_HOST' => 'www.qsrmagazine.com',
'REDIRECT_SERVER_ADDR' => '207.252.75.146',
'REDIRECT_PATH' => '/bin:/usr/bin:/sbin:/usr/sbin',
'SCRIPT_NAME' => '/cgi-bin/errorpage.cgi',
'SERVER_NAME' => 'www.qsrmagazine.com',
'HTTP_ACCEPT_ENCODING' => 'gzip, deflate',
'REDIRECT_SERVER_PROTOCOL' => 'HTTP /1.0',
'REDIRECT_PATH_INFO' => '/suppliers/wisconsin_built',
'REDIRECT_STATUS' => '500',
'REDIRECT_HTTP_ACCEPT_ENCODING' => 'gzip, deflate',
'UNIQUE_ID' => 'RFdZjM-8S5AAAC-aQFE',
'REDIRECT_REMOTE_ADDR' => '65.214.44.145',
'SCRIPT_FILENAME' => '/var/www/qsr/web/cgi-bin/errorpage.cgi',
'REDIRECT_PATH_TRANSLATED' => '/var/www/qsr/web/suppliers/wisconsin_built',
'REDIRECT_SERVER_SOFTWARE' => 'Apache/1.3.26 (Unix) Debian GNU/Linux',
'PATH' => '/bin:/usr/bin:/sbin:/usr/sbin',
'REDIRECT_DOCUMENT_ROOT' => '/var/www/qsr/web',
'REDIRECT_REQUEST_URI' => '/resources/suppliers/wisconsin_built',
'SERVER_ADDR' => '207.252.75.146',
'SERVER_PROTOCOL' => 'HTTP /1.0',
'REDIRECT_SERVER_SIGNATURE' => '<ADDRESS>Apache/1.3.26 Server at
www.qsrmagazine.com Port 80</ADDRESS>
',
'SERVER_SIGNATURE' => '<ADDRESS>Apache/1.3.26 Server at
www.qsrmagazine.com Port 80</ADDRESS>
',
'REDIRECT_SERVER_ADMIN' => 'admin@journalistic.com',
'REDIRECT_SERVER_PORT' => '80',
'SERVER_SOFTWARE' => 'Apache/1.3.26 (Unix) Debian GNU/Linux',
'SERVER_ADMIN' => 'admin@journalistic.com',
'REDIRECT_nokeepalive' => '1',
'REMOTE_ADDR' => '65.214.44.145',
'DOCUMENT_ROOT' => '/var/www/qsr/web',
'REQUEST_URI' => '/resources/suppliers/wisconsin_built',
'REDIRECT_HTTP_HOST' => 'www.qsrmagazine.com',
'REDIRECT_REMOTE_PORT' => '54776',
'REDIRECT_HTTP_ACCEPT' => 'text/html, text/plain,
application/x-shockwave-flash',
'REDIRECT_REQUEST_METHOD' => 'GET',
'REQUEST_METHOD' => 'GET',
'REDIRECT_HTTP_USER_AGENT' => 'Mozilla/2.0 (compatible; Ask
Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)',
'REDIRECT_URL' => '/resources/suppliers/wisconsin_built',
'REDIRECT_SCRIPT_NAME' => '/resources',
'REDIRECT_GATEWAY_INTERFACE' => 'CGI-Perl/1.1',
'REDIRECT_QUERY_STRING' => '',
'REDIRECT_SERVER_NAME' => 'www.qsrmagazine.com',
'SERVER_PORT' => '80'
};
----
I dug into the web error log and found this:
----
[Tue May 2 09:07:24 2006] null: CGI::Application::Dispatch - ERROR
Can't locate object method "_run_app" via package
"QSR::Resources::ResourceDispatch" (perhaps you forgot to load
"QSR::Resources::ResourceDispatch"?) at
/usr/share/perl5/CGI/Application/Dispatch.pm line 298.
----
This is an error report triggered by Googlebot last week:
----
ON QSR Site
ERROR CODE 500
OCCURRED ON Sat Apr 22 05:52:23 2006
WHEN THE URL /resources/suppliers/astute_solutions WAS REQUESTED
FULL PATH
BY A USER AT 66.249.66.36
THE BROWSER WAS Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)
------------------------------------------------------------------------------
$VAR1 = {
'REDIRECT_UNIQUE_ID' => 'REn818-8S5AAAEfiDro',
'QUERY_STRING' => '500',
'REDIRECT_HTTP_FROM' => 'googlebot(at)googlebot.com',
'REDIRECT_SCRIPT_FILENAME' => '/var/www/qsr/web/resources',
'REMOTE_PORT' => '54249',
'HTTP_ACCEPT' => '*/*',
'HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)',
'GATEWAY_INTERFACE' => 'CGI/1.1',
'HTTP_HOST' => 'www.qsrmagazine.com',
'REDIRECT_SERVER_ADDR' => '207.252.75.146',
'REDIRECT_PATH' => '/bin:/usr/bin:/sbin:/usr/sbin',
'SCRIPT_NAME' => '/cgi-bin/errorpage.cgi',
'SERVER_NAME' => 'www.qsrmagazine.com',
'HTTP_ACCEPT_ENCODING' => 'gzip',
'REDIRECT_SERVER_PROTOCOL' => 'HTTP /1.1',
'REDIRECT_PATH_INFO' => '/suppliers/astute_solutions',
'REDIRECT_STATUS' => '500',
'REDIRECT_HTTP_ACCEPT_ENCODING' => 'gzip',
'REDIRECT_HTTP_CONNECTION' => 'Keep-alive',
'UNIQUE_ID' => 'REn818-8S5AAAEfiDro',
'REDIRECT_REMOTE_ADDR' => '66.249.66.36',
'SCRIPT_FILENAME' => '/var/www/qsr/web/cgi-bin/errorpage.cgi',
'REDIRECT_PATH_TRANSLATED' => '/var/www/qsr/web/suppliers/astute_solutions',
'REDIRECT_SERVER_SOFTWARE' => 'Apache/1.3.26 (Unix) Debian GNU/Linux',
'PATH' => '/bin:/usr/bin:/sbin:/usr/sbin',
'HTTP_FROM' => 'googlebot(at)googlebot.com',
'REDIRECT_DOCUMENT_ROOT' => '/var/www/qsr/web',
'REDIRECT_REQUEST_URI' => '/resources/suppliers/astute_solutions',
'SERVER_ADDR' => '207.252.75.146',
'SERVER_PROTOCOL' => 'HTTP /1.1',
'HTTP_CONNECTION' => 'Keep-alive',
'REDIRECT_SERVER_SIGNATURE' => '<ADDRESS>Apache/1.3.26 Server at
www.qsrmagazine.com Port 80</ADDRESS>
',
'SERVER_SIGNATURE' => '<ADDRESS>Apache/1.3.26 Server at
www.qsrmagazine.com Port 80</ADDRESS>
',
'REDIRECT_SERVER_ADMIN' => 'admin@journalistic.com',
'REDIRECT_SERVER_PORT' => '80',
'SERVER_SOFTWARE' => 'Apache/1.3.26 (Unix) Debian GNU/Linux',
'SERVER_ADMIN' => 'admin@journalistic.com',
'REMOTE_ADDR' => '66.249.66.36',
'DOCUMENT_ROOT' => '/var/www/qsr/web',
'REQUEST_URI' => '/resources/suppliers/astute_solutions',
'REDIRECT_HTTP_HOST' => 'www.qsrmagazine.com',
'REDIRECT_REMOTE_PORT' => '54249',
'REDIRECT_HTTP_ACCEPT' => '*/*',
'REDIRECT_REQUEST_METHOD' => 'GET',
'REQUEST_METHOD' => 'GET',
'REDIRECT_HTTP_USER_AGENT' => 'Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)',
'REDIRECT_URL' => '/resources/suppliers/astute_solutions',
'REDIRECT_SCRIPT_NAME' => '/resources',
'REDIRECT_GATEWAY_INTERFACE' => 'CGI-Perl/1.1',
'REDIRECT_QUERY_STRING' => '',
'REDIRECT_ERROR_NOTES' =>
'Can\'t locate object method
"@‰`o»p£zH†‡W“ø¿•À1•`²—x{‹D”X×›Ð2“8¹–
½†À†Œ˜…’˜‡”ˆ#‘ ¿•h”å‘ø…ŽXe‰°rV”@›" via package
"Apache" (perhaps you forgot to load "Apache"?) at
/usr/share/perl5/CGI/Application/Dispatch.pm line 427.
',
'REDIRECT_SERVER_NAME' => 'www.qsrmagazine.com',
'SERVER_PORT' => '80'
};
----
(Note the REDIRECT_ERROR_NOTES entry: the method name it complains about is binary garbage, and the package is "Apache". Very odd!)
This is my dispatch module:
----
package QSR::Resources::ResourceDispatch;
use strict;
use base 'CGI::Application::Dispatch';
use CGI::Carp qw( fatalsToBrowser );
use lib '/var/www/lib';

sub dispatch_args {
    return {
        'prefix' => 'QSR::Resources',
        'table'  => [
            '/equipment/:category?' => {
                'app' => 'ResourceViewer', 'rm' => 'view_category_list' },
            '/suppliers' => {
                'app' => 'ResourceViewer', 'rm' => 'view_company_list' },
            '/suppliers/page/:page' => {
                'app' => 'ResourceViewer', 'rm' => 'view_company_list' },
            '/suppliers/byletter/:letter' => {
                'app' => 'ResourceViewer', 'rm' => 'view_company_list' },
            '/suppliers/search' => {
                'app' => 'ResourceViewer', 'rm' => 'search_companies' },
            '/suppliers/:company' => {
                'app' => 'ResourceViewer', 'rm' => 'view_company' },
            '/suppliers/:company/visit' => {
                'app' => 'ResourceViewer', 'rm' => 'log_clickthrough' },
            '' => {
                'app' => 'ResourceViewer', 'rm' => 'start' },
        ],
    };
}

1;
----
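As a sanity check that the failing URLs are covered by the table: both reports show a PATH_INFO of the form /suppliers/<name> (REDIRECT_PATH_INFO in the dumps), which should land on the '/suppliers/:company' rule. Below is a simplified, hypothetical Perl re-implementation of the matching. It assumes the rule syntax as I read the CGI::Application::Dispatch docs: ':name' captures one path segment, ':name?' makes it optional, and the first matching rule wins. It is not the module's actual code, and match_rule and @table are names made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical copy of the dispatch table: rule => run mode.
# This is a simplified stand-in, NOT CGI::Application::Dispatch's own matcher.
my @table = (
    '/equipment/:category?'       => 'view_category_list',
    '/suppliers'                  => 'view_company_list',
    '/suppliers/page/:page'       => 'view_company_list',
    '/suppliers/byletter/:letter' => 'view_company_list',
    '/suppliers/search'           => 'search_companies',
    '/suppliers/:company'         => 'view_company',
    '/suppliers/:company/visit'   => 'log_clickthrough',
    ''                            => 'start',
);

# Return (run_mode, \%captured_params) for the first rule that matches
# the given PATH_INFO, or an empty list if no rule matches.
sub match_rule {
    my ($path) = @_;
    my @parts = grep { length } split m{/}, $path;

    for (my $i = 0; $i < @table; $i += 2) {
        my ($rule, $rm) = @table[$i, $i + 1];
        my @rule_parts  = grep { length } split m{/}, $rule;
        my %params;
        my $ok = 1;

        for my $j (0 .. $#rule_parts) {
            my $token = $rule_parts[$j];
            if ($token =~ /^:(\w+)(\?)?$/) {
                # Placeholder: capture the segment, or allow it to be
                # missing only when the placeholder ends in '?'.
                my ($name, $optional) = ($1, $2);
                if ($j <= $#parts) { $params{$name} = $parts[$j] }
                elsif (!$optional) { $ok = 0; last }
            }
            elsif (!defined $parts[$j] || $parts[$j] ne $token) {
                # Literal segment must match exactly.
                $ok = 0;
                last;
            }
        }

        # Extra path segments beyond the rule mean this rule does not match.
        $ok = 0 if @parts > @rule_parts;
        return ($rm, \%params) if $ok;
    }
    return;
}

for my $path ('/suppliers/wisconsin_built', '/suppliers/search', '/equipment') {
    my ($rm) = match_rule($path);
    printf "%-28s -> %s\n", $path, $rm // 'NO MATCH';
}
```

If the real matcher behaves like this sketch, /suppliers/wisconsin_built maps to view_company, /suppliers/search maps to search_companies (that rule is listed before the ':company' one), and /equipment maps to view_category_list. That would suggest the table itself should handle the URLs the spiders requested.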
In httpd.conf, I have these relevant lines:
----
<Location /resources>
    SetHandler perl-script
    PerlHandler QSR::Resources::ResourceDispatch
</Location>
----
You can see it in action here:
http://www.qsrmagazine.com/resources
I'm using the mod_perl approach (mod_perl 1 on Apache/1.3.26, per the
reports above). Let me know if I can provide any further information.
My guess is that the spiders hit the app with requests in rapid succession
and something can't keep up.
Thanks,
Jason