
This queue is for tickets about the HTTP-DetectUserAgent CPAN distribution.

Report information
The Basics
Id: 52056
Status: open
Priority: 0/
Queue: HTTP-DetectUserAgent

People
Owner: Nobody in particular
Requestors: ABH [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.02
Fixed in: (no value)



Subject: Unrecognized crawlers/browsers
Below are some user agents that HTTP::DetectUserAgent currently reports as "Unknown" and that we see 500+ times a day in our logs (most of them thousands or tens of thousands of times a day). One per line; hopefully RT won't mess it up. :-)

Apple-PubSub/65
Apple-PubSub/65.11
AppleSyndication/56.1
BlackBerry8100/4.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102 UP.Link/6.3.1.17.0
BlackBerry8310/4.2.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102
BlackBerry8310/4.5.0.110 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102
BlackBerry8320/4.5.0.81 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/100
BlackBerry8330/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/104
BlackBerry8330/4.3.0 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry8330/4.5.0.131 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/104
BlackBerry8330/4.5.0.77 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry8330m/4.5.0.131 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/104
BlackBerry8330m/4.5.0.138 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/189
BlackBerry8350i/4.6.1.204 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/103
BlackBerry8800/4.2.1 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/100
BlackBerry8830/4.2.2 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry8900/4.6.1.114 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/100
BlackBerry8900/4.6.1.231 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/100
BlackBerry9000/4.6.0.167 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102
BlackBerry9000/4.6.0.167 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/102 UP.Link/6.3.1.20.0
BlackBerry9530/4.7.0.148 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry9530/4.7.0.75 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
BlackBerry9530/5.0.0.328 Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/105
BlackBerry9630/4.7.1.40 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/104
BlackBerry9630/4.7.1.40 Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/105
CCBot/1.0 (+http://www.commoncrawl.org/bot.html)
MLBot (www.metadatalabs.com/mlbot)
MOT-A-7F/00.04 UP.Browser/7.0.2.2.c.1.109 (GUI) MMP/2.0 UP.Link/5.1.2.17
MOT-A-A4/00.00 UP.Browser/7.0.2.2.c.1.120 (GUI) MMP/2.0 UP.Link/5.1.2.17
MOT-A-A4/00.01 UP.Browser/7.0.2.2.c.1.120 (GUI) MMP/2.0 UP.Link/5.1.2.17
MOT-A-B7/01.00 UP.Browser/7.2.7.2.520 (GUI) MMP/2.0 Push/PO UP.Link/5.1.2.17
MSIE 7.0
Microsoft URL Control - 6.01.9782
Motorola-w385 Obigo/Q04C1 MMP/2.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
Mozilla
Mozilla/4.0
Mozilla/4.0 (compatible; Google Desktop)
Mozilla/4.0 (compatible; ICS) Mozilla/5.0
Mozilla/4.0 (compatible;)
Mozilla/4.0(compatible; MSIE 5.0; Windows 98; DigExt)
Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko GranParadiso/3.0.11
Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)
Mozilla/5.0 (compatible; Google Desktop)
Mozilla/5.0 (compatible; Google Desktop/5.9.909.30391; http://desktop.google.com/)
Mozilla/5.0 (compatible; Google Desktop/5.9.909.8267; http://desktop.google.com/)
Mozilla/5.0 (compatible; MJ12bot/v1.2.3; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; MJ12bot/v1.2.4; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; MJ12bot/v1.2.5; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; MJ12bot/v1.3.0; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; MJ12bot/v1.3.1; http://www.majestic12.co.uk/bot.php?+)
Mozilla/5.0 (compatible; OpenWeb 5.7.4-09)
Mozilla/5.0 (compatible; ScoutJet; +http://www.scoutjet.com/)
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; http://www.jumptap.com/jumpbot; jumpbot@jumptap.com)
Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
SAMSUNG-SGH-T919/T919UVHL3 SHP/VPP/R5 NetFront/3.5 SMM-MMS/1.2.0 profile/MIDP-2.1 configuration/CLDC-1.1
UniversalFeedParser/4.1 +http://feedparser.org/
Yandex/1.01.001 (compatible; Win16; I)
check_http/v2053 (nagios-plugins 1.4.13)
gonzo1[P] +http://www.suchen.de/faq.html
larbin_2.6.3 larbin2.6.3@unspecified.mail
Attaching the list as a text file too, in case that's easier to work with.
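The report doesn't say how the list was collected. As a minimal sketch of one way to build a similar list, assuming Apache/nginx "combined" log format (where the user agent is the last double-quoted field on each line), the function below counts every user-agent string and keeps those seen at least `$min` times. `frequent_user_agents` and the `$is_unknown` callback are hypothetical names; in practice the callback would wrap whatever check you use to decide that HTTP::DetectUserAgent did not recognise the string.

```perl
use strict;
use warnings;

# Count user-agent strings in combined-format access-log lines and return
# those seen at least $min times, sorted in plain string order (the same
# order as the list in this ticket). $is_unknown is a caller-supplied
# predicate (hypothetical) that says whether a UA is still unrecognised.
sub frequent_user_agents {
    my ($lines, $min, $is_unknown) = @_;
    my %count;
    for my $line (@$lines) {
        # In the combined format the user agent is the last quoted field.
        next unless $line =~ /"([^"]*)"\s*$/;
        $count{$1}++;
    }
    my @frequent = grep { $count{$_} >= $min && $is_unknown->($_) } keys %count;
    return sort @frequent;
}
```

For one day's log, something like `frequent_user_agents([<$fh>], 500, $is_unknown)` would apply the same 500-per-day threshold used in the report.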
Any plans to put this in git? Anyway, attached is a patch that fixes detection for at least some of the crawlers.
From 7b40b50afb2adc196e80aec7a85fd83a64e69816 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ask=20Bj=C3=B8rn=20Hansen?= <ask@develooper.com>
Date: Wed, 25 Nov 2009 12:55:42 -0800
Subject: [PATCH 2/2] Add support for more crawlers and robots
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Ask Bjørn Hansen <ask@develooper.com>
---
 Changes                     |    2 +
 lib/HTTP/DetectUserAgent.pm |   27 +++++++++++++++++++
 t/03.crawler.t              |   60 +++++++++++++++++++++++++++++++++++++++++-
 t/04.robot.t                |    9 ++++++
 4 files changed, 96 insertions(+), 2 deletions(-)

diff --git a/Changes b/Changes
index d4fcfa2..75d6fa2 100755
--- a/Changes
+++ b/Changes
@@ -1,5 +1,7 @@
 Revision history for HTTP-DetectUserAgent
 
+        Add support for more crawlers and robots (Ask Bjørn Hansen)
+
 0.0.2  Sun Nov 01 11:01:00 2009
         Fix Makefile.PL( add Test::Base to require modules )
         Add detection code for iPhone and some Crawlers.
diff --git a/lib/HTTP/DetectUserAgent.pm b/lib/HTTP/DetectUserAgent.pm
index 71da395..d02d78d 100755
--- a/lib/HTTP/DetectUserAgent.pm
+++ b/lib/HTTP/DetectUserAgent.pm
@@ -120,6 +120,9 @@ sub _check_crawler {
         }elsif( index($ua, 'yahoofeedseeker') != -1){
             $self->{name} = 'YahooFeedSeeker';
             $self->{vendor} = 'Yahoo';
+        }elsif( index($ua, 'yahooseeker') != -1){
+            $self->{name} = 'YahooSeeker';
+            $self->{vendor} = 'Yahoo';
         }
     }elsif( index($ua, 'msnbot') != -1){
         # http://search.msn.com/msnbot.htm
@@ -137,6 +140,9 @@ sub _check_crawler {
         # http://help.baidu.jp/system/05.html
         $self->{name} = 'BaiduMobaider';
         $self->{vendor} = 'Baidu';
+    }elsif( index($ua, 'commoncrawl') != -1 ){
+        $self->{name} = 'CCBot';
+        $self->{vendor} = 'CommonCrawl';
     }elsif( index($ua, 'yeti') != -1 && index($ua, 'naver') != -1){
         # http://help.naver.com/robots/
         $self->{name} = 'Yeti';
@@ -145,6 +151,16 @@ sub _check_crawler {
         # http://help.goo.ne.jp/door/crawler.html)
         $self->{name} = 'ichiro';
         $self->{vendor} = 'goo';
+    }elsif( index($ua, 'mj12bot') != -1){
+        # http://www.majestic12.co.uk/bot.php
+        $self->{name} = 'MJ12bot';
+        $self->{vendor} = 'Majestic 12';
+    }elsif( index($ua, 'yandex') == 0){
+        $self->{name} = 'Yandex';
+        $self->{vendor} = 'Yandex';
+    }elsif( index($ua, 'exabot') != -1){
+        $self->{name} = 'Exabot';
+        $self->{vendor} = 'Exalead';
     }elsif( index($ua, 'moba-crawler') != -1){
         # http://crawler.dena.jp/
         $self->{name} = 'moba-crawler';
@@ -153,10 +169,16 @@ sub _check_crawler {
         # http://sagool.jp/
         $self->{name} = 'MaSagool';
         $self->{vendor} = 'Sagool';
+    }elsif( index($ua, 'ask jeeves/teoma') != -1){
+        $self->{name} = 'Teoma';
+        $self->{vendor} = 'Ask Jeeves';
     }elsif( index($ua, 'ia_archiver') != -1){
         # http://www.archive.org/
         $self->{name} = 'Internet Archive';
         $self->{vendor} = 'Internet Archive';
+    }elsif( index($ua, 'jumpbot@jumptap.com') != -1){
+        $self->{name} = 'Jumpbot';
+        $self->{vendor} = 'Jumptap';
     }elsif( index($ua, 'tagoobot') != -1){
         # http://www.tagoo.ru
         $self->{name} = 'Tagoobot';
@@ -169,6 +191,9 @@ sub _check_crawler {
         #http://ws.daum.net/aboutWebSearch.html
         $self->{name} = 'Daumoa';
         $self->{vendor} = 'Daum';
+    }elsif( index($ua, 'voilabot') != -1){
+        $self->{name} = 'VoilaBot';
+        $self->{vendor} = 'Orange';
     }elsif( index($ua, 'spider') != -1 || index($ua, 'crawler') != -1 ){
         $self->{name} = 'Unknown Crawler';
     }
@@ -199,6 +224,8 @@ sub _check_robot {
     }elsif( $block->{curl} ){
         $self->{name} = 'Curl';
         $self->{version} = $block->{curl};
+    }elsif( $block->{check_http} ){
+        $self->{name} = 'check_http';
     }elsif( index( $ua, 'h2tconv' ) != -1 ){
         $self->{name} = 'H2Tconv';
         $self->{version} = 'Unknown';
diff --git a/t/03.crawler.t b/t/03.crawler.t
index b5ce5ec..84a41fd 100755
--- a/t/03.crawler.t
+++ b/t/03.crawler.t
@@ -14,9 +14,9 @@ run {
     my $block = shift;
     my $ua = HTTP::DetectUserAgent->new($block->input);
     my $expected = $block->expected;
-    is $ua->type, "Crawler";
-    is $ua->name, $expected->{name};
+    is $ua->name, $expected->{name}, $block->input;
     is $ua->vendor, $expected->{vendor};
+    is $ua->type, "Crawler";
 }
 
 __END__
@@ -125,3 +125,59 @@ YahooFeedSeeker/1.0 (compatible; Mozilla 4.0; MSIE 5.5; http://my.yahoo.com/s/pu
 --- expected
 name: "YahooFeedSeeker"
 vendor: "Yahoo"
+
+=== YahooSeeker
+--- input
+Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
+--- expected
+name: "YahooSeeker"
+vendor: "Yahoo"
+
+=== CommonCrawl
+--- input
+CCBot/1.0 (+http://www.commoncrawl.org/bot.html)
+--- expected
+name: "CCBot"
+vendor: "CommonCrawl"
+
+=== VoilaBot
+--- input
+Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)
+--- expected
+name: "VoilaBot"
+vendor: "Orange"
+
+=== Ask Jeeves
+--- input
+Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
+--- expected
+name: "Teoma"
+vendor: "Ask Jeeves"
+
+=== Majestic12
+--- input
+Mozilla/5.0 (compatible; MJ12bot/v1.3.1; http://www.majestic12.co.uk/bot.php?+)
+--- expected
+name: "MJ12bot"
+vendor: "Majestic 12"
+
+=== JumpBot
+--- input
+Nokia6820/2.0 (5.88) Profile/MIDP-1.0 Configuration/CLDC-1.0/1.0 (Jumpbot; http://www.jumptap.com/jumpbot; jumpbot@jumptap.com)
+--- expected
+name: "Jumpbot"
+vendor: "Jumptap"
+
+=== Exabot
+--- input
+Mozilla/5.0 (compatible; Exabot/3.0; +http://www.exabot.com/go/robot)
+--- expected
+name: "Exabot"
+vendor: "Exalead"
+
+=== Yandex
+--- input
+Yandex/1.01.001 (compatible; Win16; I)
+--- expected
+name: Yandex
+vendor: Yandex
diff --git a/t/04.robot.t b/t/04.robot.t
index 72e7748..1bf4cae 100755
--- a/t/04.robot.t
+++ b/t/04.robot.t
@@ -27,3 +27,12 @@ Web::Scraper/0.24
 --- expected
 name: "Web::Scraper"
 version: "0.24"
+
+=== Nagios
+--- input
+check_http/v2053 (nagios-plugins 1.4.13)
+--- expected
+name: check_http
+
+
+
-- 
1.6.4.4
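For comparison, the crawler rules this patch adds to `_check_crawler` can be read as an ordered table where the first match wins, which is the same precedence the `elsif` chain gives. The sketch below is illustrative only and is not the module's code: `@CRAWLER_RULES` and `classify_crawler` are hypothetical names, the patch's nested Yahoo branch is flattened into a single rule, and lowercasing the user agent before matching is an assumption based on the lowercase needles the patch searches for.

```perl
use strict;
use warnings;

# Ordered rules mirroring the crawlers added by the patch. Each rule is
# [needle, name, vendor, anchored]; anchored rules must match at the very
# start of the string, like the patch's index($ua, 'yandex') == 0 test.
my @CRAWLER_RULES = (
    [ 'yahooseeker',         'YahooSeeker', 'Yahoo',       0 ],
    [ 'commoncrawl',         'CCBot',       'CommonCrawl', 0 ],
    [ 'mj12bot',             'MJ12bot',     'Majestic 12', 0 ],
    [ 'yandex',              'Yandex',      'Yandex',      1 ],
    [ 'exabot',              'Exabot',      'Exalead',     0 ],
    [ 'ask jeeves/teoma',    'Teoma',       'Ask Jeeves',  0 ],
    [ 'jumpbot@jumptap.com', 'Jumpbot',     'Jumptap',     0 ],
    [ 'voilabot',            'VoilaBot',    'Orange',      0 ],
);

# Hypothetical helper: returns (name, vendor) for the first matching rule,
# ('Unknown Crawler', undef) for generic spider/crawler strings, or an
# empty list when nothing matches.
sub classify_crawler {
    my ($ua) = @_;
    $ua = lc $ua;    # assumption: matching is done on the lowercased UA
    for my $rule (@CRAWLER_RULES) {
        my ($needle, $name, $vendor, $anchored) = @$rule;
        my $pos = index($ua, $needle);
        next if $pos == -1;                 # needle not present
        next if $anchored && $pos != 0;     # must start the string
        return ($name, $vendor);
    }
    return ('Unknown Crawler', undef)
        if index($ua, 'spider') != -1 || index($ua, 'crawler') != -1;
    return;
}
```

With this layout, handling another entry from the list above would be one more table row rather than another `elsif` branch; the trade-off is that anything needing more than a substring test (such as the nested Yahoo checks) no longer fits a single row.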