Category Archive for 'web-robot'

Wikia Grub

Tuesday, February 12th, 2008

Owner of the robot : Wikia Inc.
Country : USA
Robot type : search engine
Description : Wikia combines human collaboration and open source search software. Wikia Search Alpha uses Lucene search technology from the Apache Software Foundation and Grub.
Grub, as created in 2000 by LookSmart, was a proprietary distributed computing platform, crawling the web from a [...]

YodaoBot

Wednesday, December 19th, 2007

Owner of the robot : NetEase
Country : China
Robot type : search engine
Description : NetEase is known for its popular Chinese portal 163.com. It used Google as its search engine until mid-2007, when it replaced Google with yodao, its own search engine.

User Agent transmitted to the visited web server :

Mozilla/5.0 [...]

Google-Sitemaps

Sunday, December 2nd, 2007

Owner of the robot : Google Inc.
Country : USA
Robot type : probe
Description : This Google probe sporadically checks for the existence of the Google Webmaster Tools verification file. It requests the file name written in both upper and lower case letters. The goal is probably to determine whether the server is case sensitive.
The [...]
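
The case check described above can be sketched as follows. This is only an illustration of the idea, not Google's actual method: the file name `google1234abcd.html` is hypothetical, and `exists` is an assumed callable standing in for an HTTP request that reports whether a path resolves.

```python
def case_variants(filename):
    """Return the upper-case and lower-case spellings of a file name."""
    return filename.upper(), filename.lower()


def server_seems_case_sensitive(exists, filename):
    """Guess whether a server distinguishes upper and lower case in paths.

    `exists` is an assumed callable: it takes a path and returns True
    when the server would answer with the file. If exactly one of the
    two spellings resolves, the server probably treats case as significant.
    If both or neither resolve, this returns False (no evidence of
    case sensitivity).
    """
    upper, lower = case_variants(filename)
    return exists(upper) != exists(lower)


# Two simulated servers holding the same (hypothetical) verification file.
stored = {"google1234abcd.html"}
case_sensitive_server = lambda path: path in stored
case_insensitive_server = lambda path: path.lower() in stored
```

With the simulated servers, `server_seems_case_sensitive(case_sensitive_server, "google1234abcd.html")` is true and the same call against `case_insensitive_server` is false.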

Internet Archive

Saturday, October 13th, 2007

Owner of the robot : The Internet Archive
Country : USA
Robot type : search engine
Description : The Internet Archive was founded in 1996 as a non-profit organization. It builds an Internet library of digital collections, for researchers, historians, and scholars. Many old versions of web pages are publicly available from their web site.
ia_archiver-web.archive.org is a web [...]

Microsoft Antispam Bot

Saturday, October 13th, 2007

Owner of the robot : Microsoft Corporation
Country : USA
Robot type : search engine
Description : MSNBot is Microsoft's main crawler for its Live Search (MSN Search) search engine. This crawler is well known, maybe too well known: some webmasters serve this robot different content than they serve regular users. To fight this spam, Microsoft [...]

Speedy Spider

Thursday, August 23rd, 2007

Owner of the robot : WorldLight.com AB
Country : Sweden
Robot type : search engine
Description : Entireweb is a search engine and a supplier of search technology to several meta search engines.

User Agent transmitted to the visited web server :

Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)

IP address range :

from 88.131.153.0 to 88.131.153.255 (entireweb.com, tdcsong.se) (last visit in November 2007)
from 88.131.106.0 to 88.131.106.63 [...]
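
A webmaster who wants to confirm that a visitor claiming to be Speedy Spider really comes from one of the listed ranges could use a check like the one below. It is a minimal sketch: the function names are made up for illustration, and only the two ranges visible in this excerpt are included, so the list is probably incomplete.

```python
import ipaddress

# Ranges copied from the entry above. The original list is truncated,
# so this is not necessarily the complete set.
SPEEDY_RANGES = [
    ("88.131.153.0", "88.131.153.255"),
    ("88.131.106.0", "88.131.106.63"),
]
SPEEDY_UA_PREFIX = "Speedy Spider"


def in_ranges(ip, ranges=SPEEDY_RANGES):
    """Return True if `ip` lies inside any inclusive (first, last) range."""
    addr = ipaddress.ip_address(ip)
    return any(
        ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)
        for first, last in ranges
    )


def looks_like_speedy_spider(user_agent, ip):
    """Both the User Agent prefix and the source address must match."""
    return user_agent.startswith(SPEEDY_UA_PREFIX) and in_ranges(ip)
```

Checking both the User Agent and the address matters because the User Agent string alone is trivial to forge.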

Spock

Wednesday, August 8th, 2007

Owner of the robot : Spock
Country : USA
Robot type : search engine
Description : Spock robots crawl the web to collect information about individuals. The Spock web site talks about indexing every human being on the planet!
Spock infrastructure uses Amazon Web Services: Amazon Simple Storage Service (S3) to store millions of profile photos and Amazon’s Elastic Compute [...]

Cazoodle

Monday, August 6th, 2007

Owner of the robot : Cazoodle Inc.
Country : USA
Robot type : search engine
Description : Cazoodle was established in August 2006, as a startup company from the University of Illinois (UIUC). The goal of the company is to make web-search broader and deeper, by accessing data beyond the reach of current search engines. The company was [...]

ZoomInfo

Monday, June 11th, 2007

Owner of the robot : Zoom Information Inc.
Country : USA
Robot type : search engine
Description : ZoomInfo focuses on finding pages with information about businesses and business professionals.
User Agent transmitted to the visited web server :

NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)

IP address range : from 67.104.0.0 to 67.111.255.255 (xo.net)
URL for more information : http://www.zoominfo.com/About/misc/NextGenSearchBot.aspx
Access control [...]
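
Firewall rules and server configurations usually expect CIDR notation rather than a "from ... to ..." range. The sketch below converts the range given above using Python's standard `ipaddress` module; the helper name `range_to_cidr` is an illustrative choice, not something from the original entry.

```python
import ipaddress


def range_to_cidr(first, last):
    """Convert an inclusive IPv4 range into the CIDR blocks that cover it."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


# Range from the ZoomInfo entry above.
ZOOMINFO_CIDR = range_to_cidr("67.104.0.0", "67.111.255.255")
print(ZOOMINFO_CIDR)
```

Note that the entry attributes this range to xo.net, so it most likely covers far more hosts than the robot itself. Matching on the range alone would be very broad.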

Zeusbot

Monday, June 11th, 2007

Owner of the robot : Ramon Arnella
Country : Spain
Robot type : search engine
Description : Crawler of the Ulysseek search engine (in English).
User Agent transmitted to the visited web server :

Zeusbot/0.07 (Ulysseek's web-crawling robot; http://www.zeusbot.com; agent@zeusbot.com)

IP address range : from 217.113.244.112 to 217.113.244.127
URL for more information : http://www.zeusbot.com/
Access control options understood by the robot [...]
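
The Zeusbot User Agent shown above follows the common "name/version (comment; comment; ...)" layout, which makes it easy to take apart when analysing server logs. The following is a rough sketch under that assumption. The function name and the regular expression are illustrative, and other robots on this page (Speedy Spider, for instance) use different layouts that it will not match.

```python
import re

# Assumed layout: "<name>/<version> (<field>; <field>; ...)"
UA_PATTERN = re.compile(r"^(?P<name>[^/\s]+)/(?P<version>\S+)\s+\((?P<comment>.*)\)$")


def parse_robot_user_agent(user_agent):
    """Split a robot User Agent into name, version and comment fields.

    Returns None when the string does not follow the assumed layout.
    """
    match = UA_PATTERN.match(user_agent.strip())
    if match is None:
        return None
    fields = [part.strip() for part in match.group("comment").split(";")]
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "fields": fields,
    }


ZEUSBOT_UA = (
    "Zeusbot/0.07 (Ulysseek's web-crawling robot; "
    "http://www.zeusbot.com; agent@zeusbot.com)"
)
parsed = parse_robot_user_agent(ZEUSBOT_UA)
```

For the string above, `parsed["name"]` is `"Zeusbot"`, `parsed["version"]` is `"0.07"`, and the three comment fields hold the description, the URL and the contact address.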