# robots.txt file for DD virtual server
# in principle: allow all robots access - except:
# - AboutUsBot
# - Gulliver of Northern Light
# - InternetSeer
# - PicSearch
# - IBM Almaden crawler
# - NameProtect's bot NPBot
# - Google should not index ctglinks
# explicitly allow the Google AdSense bot
# for the rest: don't follow "counted" external links (/go/), and stay out of /419/ and /cv/

User-agent: AboutUsBot
Disallow: /

User-agent: Gulliver
Disallow: /

User-agent: InternetSeer
Disallow: /

User-agent: InternetSeer.com
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /

# ban Picsearch.org's bot: an image-only search engine - http://www.picsearch.com/menu.cgi?item=Picsearch
# NOTE: If your site has been indexed by Picsearch and you do not wish to be
# included in the Picsearch index, please e-mail Picsearch at remove@picsearch.com
# and provide the full URL you wish to have removed. Picsearch will promptly deal
# with your request and remove your site along with the thumbnail references.
User-agent: psbot
Disallow: /

# ban IBM Almaden's crawler: the information is sold, not used for my benefit
# "For more information please refer to http://www.almaden.ibm.com/WebFountain"
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /

# ban NPBot - see http://www.nameprotect.com/botinfo.html; it does seem to respect robots.txt, but check IPs!
# see also: "carfac" on http://www.webmasterworld.com/forum11/1832.htm and http://weblog.bergersen.net/archives/000540.html
User-agent: NPBot
Disallow: /

# keep Google away from the 'ctglinks' archive
# per request from Adam Honig 2004-04-26 - see mail in Java Woman / Other
# Googlebot obeys only this group (not the "*" group below), so the general rules are repeated here
User-agent: Googlebot
Disallow: /archive/ctglinks/
Disallow: /go/
Disallow: /419/
Disallow: /cv/

# allow the Google bot for AdSense (an empty Disallow permits everything)
User-agent: Mediapartners-Google*
Disallow:

# all (other) robots
User-agent: *
# don't follow "counted" links
Disallow: /go/
# don't go to the 419 site (pity that's a subdirectory - but it won't be for long now!)
Disallow: /419/
# the CV isn't here any more
Disallow: /cv/