Honey Pot / Spider Trap

Jake getting email at denver_lick at floppedthenuts reminded me that there are email harvesters out there. When you enter your email address to post a comment, it is automatically protected, but an email address posted inside a comment is not. Harvesters (and legitimate programs, like Google’s) are called spiders. They are programs that "crawl" the internet. It works like this:

Googlebot starts with one link. It reads all the code on that web page, indexes and caches it, and follows any links on the page. From there it finds new pages. The more pages that link to your site, the higher you rank in the results of a relevant search (that’s one part of the rankings, anyway). Google finds Jake’s site, BlahStuff.com, because 20+ sites link to him. When it crawls his site, it finds the link to this one and comes here. It is then supposed to read a file called robots.txt, which says where spiders are not supposed to go: for example, the cgi-bin, where all the executables for MT reside. Googlebot obeys that and only follows links into areas where it is allowed.
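Here is a rough sketch of the check a well-behaved spider makes before following a link, using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleBot` name, and the sample paths are made up for illustration; they are not my actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption, not the real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(path, user_agent="ExampleBot"):
    """Return True if the rules above let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # A polite spider asks first and skips anything disallowed.
    for path in ("/index.html", "/cgi-bin/mt.cgi"):
        print(path, "allowed" if allowed(path) else "off limits")
```

Nothing enforces any of this. The file is only a request, which is exactly why a harvester can ignore it.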

Spam harvesters will not follow the robots.txt file. In fact, some of them go there first. So I went here to learn how to set a trap.

What I had to do was set up a CGI file (like a web program) with my info. I then added badboys.html to my robots.txt file. (Regular browsers can see the page, but spiders are not supposed to go there.) I then made an .htaccess file (it tells the web server how to behave at the directory level) to redirect any requests for badboys.html to the CGI file I added earlier. Last step: add a blank link (not shown on the page, only in the code) on my index page to badboys.html. You, the reader, don’t see it. A spider sees it and notes it.
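To show why the blank link works, here is a small sketch of how a spider might pull links out of a page. It reads the code, not the rendered page, so a link with no visible text is collected like any other. The sample HTML and the `LinkCollector` and `find_links` names are hypothetical; this is not my real index page or anyone's actual harvester.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect every <a href>, along with whatever visible text it wraps."""

    def __init__(self):
        super().__init__()
        self.links = []    # list of (href, visible_text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical index page: one normal link, one blank trap link.
PAGE = '<p><a href="/archives.html">Archives</a><a href="/badboys.html"></a></p>'


def find_links(html):
    """Return all (href, visible_text) pairs found in html."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    for href, text in find_links(PAGE):
        print(href, repr(text))
```

The trap link comes back with empty text, which is why a person never clicks it but a program following every href walks right in.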

What happens is, when a spider goes to badboys.html, it gets redirected to the CGI script, its IP address is recorded, and a screen tells it what happened. From that moment on, that IP address is banned from the site. Apache will not let it back on to any part of the website. If no one is there to see the screen, oh well. BANNED. If you trip it accidentally (say, after reading this post you decide to test it and add badboys.html to the address bar), you will see the page with your IP and a link to email me. I get an email notification when the alarm is tripped. I advise you not to test it, because you will NOT be able to get back on to the site, any part of it. Then you will have to email me, and I will taunt you mercilessly. Then I will remove your IP from the ban list.
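For the curious, the banning step could look something like the sketch below. This is not the script I actually installed. It assumes the ban list is a set of older-style Apache `Deny from` lines appended to an .htaccess file, and the `ban_ip` name and file path are made up for illustration. A real CGI script would take the address from the `REMOTE_ADDR` environment variable and would also send the notification email.

```python
import ipaddress
from pathlib import Path


def ban_ip(remote_addr, htaccess_path):
    """Append a deny rule for remote_addr; return True if a new rule was written."""
    # Validate first so junk input never lands in the server config.
    ip = str(ipaddress.ip_address(remote_addr))  # raises ValueError if invalid
    rule = f"Deny from {ip}"  # assumed rule format, see the note above

    path = Path(htaccess_path)
    text = path.read_text() if path.exists() else ""
    if rule in text.splitlines():
        return False  # already banned, nothing to do

    with path.open("a") as fh:
        if text and not text.endswith("\n"):
            fh.write("\n")  # keep the new rule on its own line
        fh.write(rule + "\n")
    return True


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-only address, used here as a stand-in.
    print(ban_ip("203.0.113.7", "htaccess-example.txt"))
```

Taking someone off the ban list is then just a matter of deleting their line from the file.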

~ by kinshay on 2004-01-26.

2 Responses to “Honey Pot / Spider Trap”

  1. I think a change of title is warranted. Henceforth, you are no longer King of the Cheesy Sitcoms, but Mad Hackah! All hail the Mad Hackah!

  2. no, he is the king of cheese sitcoms.

    if he knew as much about unix as he does about scott baio he would have been sucked up by the nsa.


 