RewriteRule Ban Help

Banning bots with a RewriteRule in .htaccess is easy. Here’s a quick primer on the ban types & how to combine them. First, though, try banning bots with robots.txt; well-behaved bots honor it, and .htaccess is for the ones that don’t.

The Basics

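Like robots.txt, .htaccess is a file you create in the top-level HTML directory of your website (the document root). It works on Apache web servers that have mod_rewrite enabled and allow .htaccess overrides.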

Using RewriteRule bans requires these 3 steps:

  1. Turn on the Rewrite engine
  2. Have some conditions (“RewriteCond”)
  3. Take an action (“RewriteRule”)
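
Put together, the three steps make a block like this minimal sketch. It reuses one bot name from the examples in this post and the 403 action explained further down:

# Step 1: turn on the rewrite engine
RewriteEngine On
# Step 2: the condition a request must meet to be banned
RewriteCond %{HTTP_USER_AGENT} Mangoway [NC]
# Step 3: the action; a 403 Forbidden for everything except robots.txt
RewriteRule !^robots\.txt - [F]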

Turn on the Rewrite engine

RewriteEngine On

Put that once near the top of .htaccess & you’re done.

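Next, pick your ban criteria. You can combine any of the criteria below, mixing UserAgents, IP addresses, referrers, etc. Add the OR flag in square brackets at the end of every RewriteCond line except the last one; without it, consecutive conditions are ANDed, so a request would have to match all of them. When a line has more than one flag, separate the flags with commas inside the same brackets, as in [NC,OR]. [NC] (“no case”) makes the match ignore letter case.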
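
Leaving OR off makes the conditions cumulative, which is occasionally useful. In this illustrative sketch (pairing this bot with this address range is made up for the example), a request is banned only if it has the Mangoway UserAgent and also comes from an address starting with 208.78.85.:

# No OR flag: BOTH conditions must match (hypothetical pairing, for illustration)
RewriteCond %{HTTP_USER_AGENT} Mangoway [NC]
RewriteCond %{REMOTE_ADDR} ^208\.78\.85\.
RewriteRule !^robots\.txt - [F]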

RewriteCond: Ban by UserAgent

RewriteCond %{HTTP_USER_AGENT} CrystalSemanticsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrapeshotCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mangoway [NC]

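To ban several UserAgents in one shot, put them in parentheses with a pipe (|) between each name. The pattern is a regular expression, and since it isn’t anchored, each name matches anywhere in the UserAgent string: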

RewriteCond %{HTTP_USER_AGENT} (CrystalSemanticsBot|GrapeshotCrawler|Mangoway) [NC]

RewriteCond: Ban by IP Address

RewriteCond %{REMOTE_ADDR} ^208\.78\.85 [OR]
RewriteCond %{REMOTE_ADDR} ^208\.66\.97 [OR]
RewriteCond %{REMOTE_ADDR} ^208\.66\.100

Or combined into one line:

RewriteCond %{REMOTE_ADDR} ^(208\.78\.85|208\.66\.97|208\.66\.100)
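
One refinement worth considering: these patterns match any address that begins with those characters, so a shorter prefix such as ^208\.66\.10 would also catch 208.66.100.x through 208.66.109.x. The three prefixes above happen to be safe, because no valid octet extends 85, 97 or 100 with another digit, but ending each prefix with an escaped dot makes the intent explicit. A sketch of the combined line written that way, plus an equivalent form with the shared 208. factored out:

# Trailing \. marks where the third octet ends
RewriteCond %{REMOTE_ADDR} ^(208\.78\.85|208\.66\.97|208\.66\.100)\.
# Same three ranges, with the common 208. prefix factored out
RewriteCond %{REMOTE_ADDR} ^208\.(78\.85|66\.(97|100))\.

Use one line or the other, not both; without OR between them, a request would have to match both.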

RewriteCond: Ban by HTTP Referrer

RewriteCond %{HTTP_REFERER} ^http://search\.comodo\.com [NC]

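Without getting too deep into regular expressions: the caret (^) anchors the match to the start of the string, so this pattern only matches referrers that begin with http://search.comodo.com. The backslashes escape the periods so they match a literal dot; an unescaped period matches any single character. (The variable really is spelled HTTP_REFERER, one R short, following the HTTP header’s historical misspelling.)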
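
Most sites now link over HTTPS, and a pattern beginning with ^http:// won’t match a referrer that starts with https://. A sketch that accepts either scheme, where the ? makes the preceding s optional:

# Matches both http:// and https:// referrers from this host
RewriteCond %{HTTP_REFERER} ^https?://search\.comodo\.com [NC]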

RewriteRule: Take an action

RewriteRule !^robots\.txt - [F]

This sends back a bare-bones 403 Forbidden response for everything except robots.txt, so even a banned bot can still read your robots.txt rules. The dash (-) means “no substitution”: the requested URL is left alone and only the flag takes effect. It’s low-bandwidth & should make it pretty clear to whoever matches your RewriteCond criteria that they’ve been banned. The “L” (last) flag isn’t necessary here: the “F” (forbidden) flag implies “last”, so no RewriteRules that follow are evaluated.
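
To tie the pieces together, here’s a sketch of a complete ban block mixing the three kinds of criteria from this post. Every condition except the last carries OR, so matching any one of them is enough to trigger the 403:

RewriteEngine On
# Any one of these conditions is enough (each line but the last ends in OR)
RewriteCond %{HTTP_USER_AGENT} (CrystalSemanticsBot|GrapeshotCrawler|Mangoway) [NC,OR]
RewriteCond %{REMOTE_ADDR} ^(208\.78\.85|208\.66\.97|208\.66\.100) [OR]
RewriteCond %{HTTP_REFERER} ^http://search\.comodo\.com [NC]
# 403 Forbidden for everything except robots.txt
RewriteRule !^robots\.txt - [F]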

Another option is:

RewriteRule !^(robots\.txt|nope\.html) /nope.html [R=301,L]

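That redirects any request matching your RewriteCond criteria to nope.html, but excludes requests for nope.html itself so the redirect doesn’t end up in an endless loop (robots.txt stays reachable, as before). Keep the “L” flag here: unlike “F”, “R” doesn’t imply “last”.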
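
If you’d rather not redirect at all, a variant worth trying is an internal rewrite. This sketch assumes nope.html sits in your top directory: leaving out the R flag makes Apache serve /nope.html in place of the requested page, so the visitor’s URL doesn’t change and the response carries nope.html’s normal 200 status rather than a 403. Excluding nope.html from the pattern still matters, since Apache runs the .htaccess rules again on the rewritten request.

# Serve nope.html internally instead of redirecting (no R flag)
RewriteRule !^(robots\.txt|nope\.html) /nope.html [L]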

Hope that helps. If you have any suggestions, please leave a comment.
