This site is about bad web crawlers, ad bots, & other bandwidth-wasting trash that’s filling up your site access & error logs & wasting your web hosting resources.
The irony here is painful. The business model for these services depends on high bandwidth usage of our websites. Think their priority is being a good guest? Nope.
The culprits are poorly coded web crawlers or ad verification services that mismanage URLs or make page requests at extreme rates.
Some tips for web crawler developers:
Monitor for excessive 404/403 errors, limit your crawl rate, & cache data instead of requesting it again. Use a custom user agent that includes a crawler URL with contact info. Honor robots.txt.
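For crawler developers who want something concrete, here is a rough Python sketch of those tips. It's only an illustration, not anyone's production code: the `PolitePolicy` class, the `ExampleBot` user agent, the example.com URLs, & the thresholds (2 second delay, 25% error ratio, 20-request window) are all made-up assumptions. The robots.txt check uses the standard library's `urllib.robotparser`.

```python
import time
from collections import deque
from urllib import robotparser

# Hypothetical bot name, info URL, and contact address (assumptions for this sketch).
USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot.html; bot@example.com)"


class PolitePolicy:
    """Tracks what a well-behaved crawler should check before each request."""

    def __init__(self, robots_lines, min_delay=2.0, max_error_ratio=0.25, window=20):
        # Parse robots.txt rules that were already downloaded, as a list of lines.
        self.robots = robotparser.RobotFileParser()
        self.robots.parse(robots_lines)
        self.min_delay = min_delay              # seconds between requests
        self.max_error_ratio = max_error_ratio  # tolerated share of 403/404 responses
        self.statuses = deque(maxlen=window)    # most recent response codes
        self.last_request = None
        self.cache = {}                         # url -> body, to avoid refetching

    def allowed(self, url):
        # Honor robots.txt.
        return self.robots.can_fetch(USER_AGENT, url)

    def seconds_to_wait(self, now=None):
        # Limit crawl rate: how long to sleep before the next request.
        now = time.monotonic() if now is None else now
        if self.last_request is None:
            return 0.0
        return max(0.0, self.min_delay - (now - self.last_request))

    def record(self, url, status, body=None, now=None):
        # Remember the response code, and cache successful responses.
        self.last_request = time.monotonic() if now is None else now
        self.statuses.append(status)
        if status == 200 and body is not None:
            self.cache[url] = body

    def should_back_off(self):
        # Monitor for excessive 404/403 errors over a full window of requests.
        if len(self.statuses) < self.statuses.maxlen:
            return False
        errors = sum(1 for s in self.statuses if s in (403, 404))
        return errors / len(self.statuses) > self.max_error_ratio
```

A crawler would call `allowed(url)` & sleep for `seconds_to_wait()` before every request, send `USER_AGENT` in its request headers, call `record(...)` after every response, & stop to figure out what's wrong with its URLs as soon as `should_back_off()` returns True.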
I own CarComplaints.com. We get something like 12 million pageviews a month. Every so often I wade through our error logs. Usually it doesn’t take long to find companies & services that make their business out of wasting our bandwidth.
BadBots.net details what I’ve found on the companies that deserve a site-wide ban.
Please leave comments with your experiences. If you’ve dealt with a bad bot/crawler/service, let me know. I’ll set you up so you can write about it.
Hope this site helps other site admins. Maybe even helps a badly coded crawler bot improve. Until then, please consider banning these crawler bots.