Category Archive for 'web-robot'
Tuesday, February 12th, 2008
Owner of the robot: Wikia Inc.
Country: USA
Robot type: search engine
Description: Wikia combines human collaboration and open source search software. Wikia Search Alpha uses Lucene search technology from the Apache Software Foundation together with Grub. Grub, originally created in 2000 and later owned by LookSmart, was a proprietary distributed computing platform, crawling [...]
Posted in web-robot | No Comments »
Wednesday, December 19th, 2007
Owner of the robot: NetEase
Country: China
Robot type: search engine
Description: NetEase is known for its popular Chinese portal 163.com. It used Google as its search engine until the middle of 2007, when it decided to replace Google with yodao, its own search engine.
User Agent transmitted to the [...]
Posted in web-robot | No Comments »
Sunday, December 2nd, 2007
Owner of the robot: Google Inc.
Country: USA
Robot type: probe
Description: This Google probe sporadically checks whether the Google Webmaster Tools verification file exists. It requests the file name written in both upper-case and lower-case letters. The goal is probably to determine whether the server [...]
Posted in web-robot | No Comments »
Saturday, October 13th, 2007
Owner of the robot: The Internet Archive
Country: USA
Robot type: search engine
Description: The Internet Archive was founded in 1996 as a non-profit organization. It builds an Internet library of digital collections for researchers, historians, and scholars. Many old versions of web pages are publicly available on its web site. [...]
Posted in web-robot | No Comments »
Saturday, October 13th, 2007
Owner of the robot: Microsoft Corporation
Country: USA
Robot type: search engine
Description: MSNBot is Microsoft's main crawler for its Live Search (MSN Search) search engine. This crawler is well known, maybe too well known: some webmasters serve this robot different content from what they show regular users. To fight this [...]
Posted in web-robot | No Comments »
Thursday, August 23rd, 2007
Owner of the robot: WorldLight.com AB
Country: Sweden
Robot type: search engine
Description: Entireweb is a search engine and a supplier of search technology to several meta search engines.
User Agent transmitted to the visited web server: Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
IP address range: from 88.131.153.0 to 88.131.153.255 (entireweb.com tdcsong.se) (last [...]
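As a rough illustration of how a webmaster might use the user agent string and IP address range listed for this robot, here is a minimal Python sketch. The function name `looks_like_speedy_spider` and the idea of combining both checks are assumptions made for this example, not something the robot's owner documents, and the range is only as current as the listing above.

```python
import ipaddress

# Values taken from the listing above; they may be out of date.
SPEEDY_FIRST = ipaddress.ip_address("88.131.153.0")
SPEEDY_LAST = ipaddress.ip_address("88.131.153.255")
SPEEDY_UA_TOKEN = "Speedy Spider"


def looks_like_speedy_spider(remote_addr: str, user_agent: str) -> bool:
    """Return True if both the client IP and the user agent match the listing.

    Hypothetical helper: a user agent alone is easy to fake, so the IP
    address is checked against the published range as well.
    """
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Not a valid IP address at all.
        return False
    if addr.version != 4:
        # The listed range is IPv4; IPv6 clients cannot be inside it.
        return False
    in_range = SPEEDY_FIRST <= addr <= SPEEDY_LAST
    return in_range and SPEEDY_UA_TOKEN in user_agent
```

A request from 88.131.153.17 that sends the listed user agent would pass both checks, while the same user agent sent from an address outside the range would not.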
Posted in web-robot | No Comments »
Wednesday, August 8th, 2007
Owner of the robot: Spock
Country: USA
Robot type: search engine
Description: Spock robots crawl the web to collect information about individuals. The Spock web site talks about indexing every human being on the planet! The Spock infrastructure uses Amazon Web Services: Amazon Simple Storage Service (S3) to store millions of profile photos [...]
Posted in web-robot | No Comments »
Monday, August 6th, 2007
Owner of the robot: Cazoodle Inc.
Country: USA
Robot type: search engine
Description: Cazoodle was established in August 2006 as a startup company from the University of Illinois (UIUC). The goal of the company is to make web search broader and deeper by accessing data beyond the reach of current search engines. [...]
Posted in web-robot | No Comments »
Monday, June 11th, 2007
Owner of the robot: Zoom Information Inc.
Country: USA
Robot type: search engine
Description: ZoomInfo focuses on finding pages with information about businesses and business professionals.
User Agent transmitted to the visited web server: NextGenSearchBot 1 (for information visit http://about.zoominfo.com/PublicSite/NextGenSearchBot.asp)
IP address range: from 67.104.0.0 to 67.111.255.255 [...]
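Firewall rules and server configurations usually expect CIDR notation rather than a "from ... to ..." pair. The sketch below, which assumes Python's standard `ipaddress` module, shows one way to convert a listed range; the helper name `range_to_cidr` is made up for this example.

```python
import ipaddress


def range_to_cidr(first: str, last: str) -> list[str]:
    """Convert an inclusive IPv4 range into the smallest list of CIDR blocks.

    Hypothetical helper built on ipaddress.summarize_address_range.
    """
    networks = ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first),
        ipaddress.IPv4Address(last),
    )
    return [str(network) for network in networks]


# The range listed for NextGenSearchBot collapses to a single block.
print(range_to_cidr("67.104.0.0", "67.111.255.255"))  # ['67.104.0.0/13']
```

Ranges that do not fall on a power-of-two boundary come back as several blocks, which is why the helper returns a list.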
Posted in web-robot | 1 Comment »
Monday, June 11th, 2007
Owner of the robot: Ramon Arnella
Country: Spain
Robot type: search engine
Description: Crawler of the Ulysseek search engine (in English).
User Agent transmitted to the visited web server: Zeusbot/0.07 (Ulysseek's web-crawling robot; http://www.zeusbot.com; agent@zeusbot.com)
IP address range: from 217.113.244.112 to 217.113.244.127
URL for more information [...]
Posted in web-robot | No Comments »