SoGou


Owner of the robot : SOHU.COM Inc.

Country : China

Robot type : search engine

Description : Chinese search engine (index of more than one billion pages in Chinese).

User Agent transmitted to the visited web server :

  • sogou spider

 


 
IP address range : from 220.181.0.0 to 220.181.255.255
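
The range listed above is the same set of addresses as the CIDR block 220.181.0.0/16. As a minimal sketch, assuming Python's standard `ipaddress` module, a logged client address could be tested against it as follows. The function name `in_listed_range` is made up for illustration, and a match only shows that the address is in this block, not that the request really came from the robot.

```python
import ipaddress

# The listed range 220.181.0.0 - 220.181.255.255, written as one CIDR block.
LISTED_RANGE = ipaddress.ip_network("220.181.0.0/16")

def in_listed_range(addr: str) -> bool:
    """Return True if addr lies inside the listed range (hypothetical helper)."""
    return ipaddress.ip_address(addr) in LISTED_RANGE

print(in_listed_range("220.181.19.72"))  # address seen in the log excerpt in the comments
print(in_listed_range("192.0.2.1"))      # documentation-only address, outside the range
```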

URL for more information : http://corp.sohu.com/s2005/imwp-en.shtml

Access control options understood by the robot :

    User Agent to use in the robots.txt file :

     
    Last visit of this robot logged in May 2007.
    Other information updated on February 11, 2006.

    2 Responses to “SoGou”

    1. Jacco V. says:

      This robot is quite nasty.

      It somehow sniffs internet traffic and then tries to access the URLs it has seen.
      It even tries to pick up a copy of session-bound pages.

       
      Webtravellog User:
      124.115.220.*** - - [12/Sep/2007:14:02:13 +0200] "GET /loginSuccess.php?sessionId=a222d271f54ef1809cc567300fe9ba3f HTTP/1.1" 200 17595 "https://*****.webtravellog.com/login/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" SSLv3 RC4-MD5

       
      And, the robot:
      220.181.19.72 - - [12/Sep/2007:14:11:03 +0200] "GET /loginSuccess.php?sessionId=a222d271f54ef1809cc567300fe9ba3f HTTP/1.1" 200 8093 "-" "Sogou Orion spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)" SSLv3 DHE-RSA-AES256-SHA
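
The pattern in the two log lines above (the same `sessionId` requested from two different client addresses) can be searched for mechanically. The following is only a rough sketch, assuming Apache-style access-log lines and a `sessionId` query parameter as in the excerpt; the function name `shared_session_ids` and both regular expressions are illustrative, not part of any tool mentioned here.

```python
import re
from collections import defaultdict

# Start of an Apache-style access-log line: client address, then the request path.
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (?P<path>\S+)')
# The session parameter as it appears in the excerpt above.
SESSION_RE = re.compile(r'[?&]sessionId=(?P<sid>[0-9a-fA-F]+)')

def shared_session_ids(lines):
    """Return {sessionId: set of client addresses} for IDs seen from more than one address."""
    seen = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a recognisable access-log line
        s = SESSION_RE.search(m.group("path"))
        if s:
            seen[s.group("sid")].add(m.group("ip"))
    return {sid: ips for sid, ips in seen.items() if len(ips) > 1}

# The two log lines quoted above, cut off after the response size.
sample = [
    '124.115.220.*** - - [12/Sep/2007:14:02:13 +0200] "GET /loginSuccess.php?sessionId=a222d271f54ef1809cc567300fe9ba3f HTTP/1.1" 200 17595',
    '220.181.19.72 - - [12/Sep/2007:14:11:03 +0200] "GET /loginSuccess.php?sessionId=a222d271f54ef1809cc567300fe9ba3f HTTP/1.1" 200 8093',
]
for sid, ips in shared_session_ids(sample).items():
    print(sid, sorted(ips))
```

A hit only shows that one session ID was requested from several addresses; it says nothing about how the second requester learned the URL.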

    2. Abel Cheung says:

      Not only is it nasty, it even deliberately ignores my robots.txt (which has not been changed for 1.5 years) and crawls every page explicitly disallowed there.

      So I’m not nice to them either.

      RewriteCond %{HTTP_USER_AGENT} "^Sogou"
      RewriteRule .* http://www.sogou.com/ [L,R=301]

      In addition to this, a firewall rule rate-limits their ACK packets (yes, ACK, not SYN).
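
One detail of the rewrite condition above is that mod_rewrite patterns are case-sensitive unless the `[NC]` flag is given, so `^Sogou` matches the "Sogou Orion spider/3.0" agent from the log excerpt but not the lowercase "sogou spider" string listed at the top of this page. The sketch below mimics that check with Python's `re` module purely for illustration; `is_blocked`, `STRICT` and `RELAXED` are made-up names, and this is not how Apache itself evaluates the rule.

```python
import re

# Same pattern as the rewrite condition; case-sensitive, like mod_rewrite without [NC].
STRICT = re.compile(r"^Sogou")
# Case-insensitive variant, comparable to adding the [NC] flag.
RELAXED = re.compile(r"^Sogou", re.IGNORECASE)

def is_blocked(user_agent: str, pattern=STRICT) -> bool:
    """Return True if the User-Agent string matches the given pattern (hypothetical helper)."""
    return pattern.search(user_agent) is not None

agents = [
    "Sogou Orion spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
    "sogou spider",
]
for ua in agents:
    print(ua, is_blocked(ua, STRICT), is_blocked(ua, RELAXED))
```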