Google as a Real-time Blackhole List

By Noam Rathaus, Chief Technology Officer
January 10, 2008

For those not familiar with the term, RBL stands for Real-time Blackhole List; it is mainly used for fighting SPAM. I have recently started playing around with Google as an RBL engine. The idea is that if the search term I use returns too many hits, it is likely to be SPAM.

The danger, of course, is that the term could simply be popular. The trick is that I’m using something very special as the search term: the IP address of the poster.

The IP address shouldn’t be popular. Except for a few rare cases, IP addresses that show up in Google are directly related to SPAM: either they are listed on wiki-like sites as being banned, or they appear as mass-comment posters. Simply put, if your IP is listed in Google, you must be up to no good.

How good is this method? Nothing is bulletproof, but if you suspect something is SPAM, put the IP in Google and see whether there are hits. Almost all the comment SPAM I filtered out this month had more than 100 hits in Google; all non-SPAM had either 0 hits or fewer than 10.
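The rule of thumb above can be written down as a small classifier. This is only a sketch: the two thresholds come from the numbers quoted in this post, while the function name, the verdict labels, and the "undecided" band between 10 and 100 hits are assumptions added for illustration. The hit count is expected to come from wherever you look it up; nothing here talks to Google.

```python
from typing import Literal

# Thresholds taken from the observations in the post; they are not tuned values.
SPAM_THRESHOLD = 100   # comment SPAM had "more than 100 hits"
CLEAN_THRESHOLD = 10   # non-SPAM had 0 hits or fewer than 10

Verdict = Literal["likely_spam", "likely_clean", "undecided"]


def classify_by_hits(hit_count: int) -> Verdict:
    """Map a search hit count for an IP address to a rough verdict.

    The "undecided" band (10 to 100 hits inclusive) is an assumption:
    the post reports nothing about that range.
    """
    if hit_count < 0:
        raise ValueError("hit_count must be non-negative")
    if hit_count > SPAM_THRESHOLD:
        return "likely_spam"
    if hit_count < CLEAN_THRESHOLD:
        return "likely_clean"
    return "undecided"
```

Anything that lands in the undecided band is probably best left to a human, or to other signals, rather than blocked automatically.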

BTW: a big advantage of Google is that it is quick, taking only a few seconds to get a response. A disadvantage is that you cannot just “hammer” them with searches or they will block you. Maybe someone can pick up this idea and build an RBL from IP addresses using Google as a back end.
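One way to avoid hammering the search engine is to cache results and space out the queries that do go through. The sketch below is an assumption about how such a front end might look, not an existing tool: `ThrottledHitCache`, its parameters, and the default interval and lifetime are all hypothetical, and `count_hits` is a placeholder for whatever function you supply to obtain a hit count for an IP address.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ThrottledHitCache:
    """Cache hit counts per IP and enforce a minimum gap between real lookups."""

    def __init__(
        self,
        count_hits: Callable[[str], int],   # hypothetical: caller-supplied lookup
        min_interval: float = 5.0,          # seconds between real lookups (assumed)
        ttl: float = 86400.0,               # how long a cached count stays valid (assumed)
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._count_hits = count_hits
        self._min_interval = min_interval
        self._ttl = ttl
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, int]] = {}
        self._last_query: Optional[float] = None

    def lookup(self, ip: str) -> int:
        now = self._clock()
        cached = self._cache.get(ip)
        # Serve from the cache while the entry is still fresh.
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Otherwise wait until the minimum interval since the last real lookup has passed.
        if self._last_query is not None:
            wait = self._min_interval - (now - self._last_query)
            if wait > 0:
                self._sleep(wait)
        hits = self._count_hits(ip)
        self._last_query = self._clock()
        self._cache[ip] = (self._last_query, hits)
        return hits
```

Repeat posters, which are the common case for comment SPAM, are then answered from the cache, and only previously unseen addresses cost a real search.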


Comments

Suresh Ramasubramanian – Jan 11, 2008 2:27 AM

Not a very bright idea. Certainly one that I wouldn’t automate.

I might possibly look it up to see what it appears to be, reputation-wise, but

1. That’s just one metric in a decision
2. It requires human rather than script intelligence

Richard Golodner – Jan 11, 2008 5:43 AM

I must agree with Suresh on this idea. It does require some human intelligence to be able to make a decision based on what you get from doing a quick Google search.
There are a variety of reasons you might see an IP address posted often. What if it is the address of some site that does not use DNS to reach it, or there is no record in place for it because of an administrative mistake?
In some cases only an IP address is used so that traffic to that site is kept to a minimum.
Perhaps this is not a good example, but I think there are much better and more accurate metrics for determining if a site is a legitimate spam generator. Just my own thoughts. Not trying to be offensive.
