Internet Archive


Owner of the robot : The Internet Archive

Country : USA

Robot type : search engine

Description : The Internet Archive was founded in 1996 as a non-profit organization. It builds an Internet library of digital collections for researchers, historians, and scholars. Many old versions of web pages are publicly available from its web site.

ia_archiver-web.archive.org is a web robot managed directly by the Internet Archive. It complements the crawling done by ia_archiver, the Alexa robot. Alexa appears to be the main source of data for the Internet Archive.

The "robots" META tag is not supported: pages that include the line below may still appear in the index of the Internet Archive :
<meta name="robots" content="noindex">

    User Agent transmitted to the visited web server :

    • Mozilla/5.0 (compatible; heritrix/2.0.0-SNAPSHOT-20071129.030306 +http://crawler.archive.org)

    IP address range :

    • from 208.70.24.0 to 208.70.31.255 (archive.org)
      (last visit in November 2007)

    User Agent transmitted to the visited web server :

    • ia_archiver-web.archive.org

    IP address range :

    • from 208.70.24.0 to 208.70.31.255 (archive.org)
      (last visit in March 2008)
    • from 207.241.224.0 to 207.241.239.255 (archive.org)
      (last visit in May 2008)
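
The ranges listed above can be used to check whether a visitor address from a server log falls inside them. The following is a minimal sketch in Python using only the standard `ipaddress` module; the names `RANGES`, `NETWORKS`, and `is_archive_address` are hypothetical and introduced only for illustration. The ranges are the ones observed in 2007 and 2008, and the robot may use other addresses today.

```python
import ipaddress

# Start and end addresses copied from the listing above (observed 2007-2008).
RANGES = [
    ("208.70.24.0", "208.70.31.255"),
    ("207.241.224.0", "207.241.239.255"),
]

# Convert each start/end pair into the CIDR networks that cover it exactly.
NETWORKS = [
    net
    for first, last in RANGES
    for net in ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
]


def is_archive_address(address: str) -> bool:
    """Return True if the address lies in one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in NETWORKS)
```

For example, `is_archive_address("208.70.26.10")` returns `True`, while an address just past the end of a range, such as `208.70.32.0`, returns `False`.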

    Access control options understood by the robot :

    • robots.txt

    User Agent to use in the robots.txt file : ia_archiver
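
As an illustration of the robots.txt option above, the sketch below parses a small robots.txt that excludes the ia_archiver user agent from a whole site and checks the result with Python's standard `urllib.robotparser`. The host `www.example.com` and the name `SomeOtherBot` are placeholders. The check reflects how Python's parser interprets the file, which is assumed here to match how the robot itself reads it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks only the ia_archiver user agent from the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL used only for the check.
url = "http://www.example.com/page.html"

# False: the rule above applies to ia_archiver.
archiver_allowed = parser.can_fetch("ia_archiver", url)

# True: no rule in this file applies to other robots.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(archiver_allowed, other_allowed)
```

To restrict the robot to part of a site only, the `Disallow` line can name a directory prefix instead of `/`.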

    URLs for more information :

    Our pages about other ia_archiver robots :