Google as a Real-time Blackhole List

For those not familiar with the term, RBL stands for Real-time Blackhole List, a mechanism used mainly for fighting SPAM. I have recently started playing around with Google as an RBL engine. The idea is that if the search term I use returns too many hits, it is likely to be SPAM.

The danger of course is that the term could be simply popular—but the trick here is that I’m using something very special as the search term—the IP address of the poster.

The IP address shouldn’t be popular; except for a few rare cases, IP addresses listed on Google are directly related to SPAM—either they are listed under wiki-like sites as being banned, or they appear as mass-comment posters. Simply put, if your IP is listed in Google you must be up to no good.

How good is this method? Nothing is bulletproof, but if you suspect something is SPAM, put the IP in Google and see whether there are hits. Almost all the comment SPAM I filtered out this month had more than 100 hits in Google; all non-SPAM had either 0 hits or fewer than 10.
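The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not a tested filter: the function name `classify_by_hits` and the three labels are made up for this example, and the two thresholds are simply the numbers observed this month. The hit count is passed in by the caller; how it is obtained from a search engine is deliberately left out.

```python
def classify_by_hits(hits: int, spam_over: int = 100, ok_under: int = 10) -> str:
    """Label an IP address from the number of search hits it produced.

    spam_over and ok_under are assumptions taken from one month of
    comment-SPAM observations; they are not universal constants.
    """
    if hits < 0:
        raise ValueError("hit count cannot be negative")
    if hits > spam_over:
        return "likely-spam"   # more than 100 hits: matched almost all SPAM seen
    if hits < ok_under:
        return "likely-ok"     # 0 to 9 hits: matched all non-SPAM seen
    return "needs-review"      # 10 to 100 hits: no data either way, ask a human


if __name__ == "__main__":
    for count in (0, 7, 42, 350):
        print(count, classify_by_hits(count))
```

The middle band is kept as a separate label on purpose: the observations say nothing about IPs with between 10 and 100 hits, so a human should look at those.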

BTW: A good advantage of Google is that it is quick, taking only a few seconds to get a response. A disadvantage is that you cannot just “hammer” them with searches or they will block you. Maybe someone can pick up this idea and make an RBL from IP addresses using Google as a back end.
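One way to avoid hammering the search engine is to cache each answer and space out the queries that do go out. The sketch below assumes a caller-supplied function `fetch_hits(ip)` that returns a hit count; that function, the class name `ThrottledLookup`, and the 5-second default spacing are all hypothetical and not part of any real Google interface.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledLookup:
    """Cache hit counts per IP and keep a minimum gap between backend queries."""

    def __init__(
        self,
        fetch_hits: Callable[[str], int],   # hypothetical: returns the hit count for an IP
        min_interval: float = 5.0,          # assumed spacing in seconds, tune to taste
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch_hits = fetch_hits
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, int] = {}
        self._last_query: Optional[float] = None

    def hits(self, ip: str) -> int:
        # Repeat offenders are answered from the cache without a new search.
        if ip in self._cache:
            return self._cache[ip]
        # Wait until at least min_interval has passed since the previous search.
        if self._last_query is not None:
            wait = self._min_interval - (self._clock() - self._last_query)
            if wait > 0:
                self._sleep(wait)
        self._last_query = self._clock()
        count = self._fetch_hits(ip)
        self._cache[ip] = count
        return count
```

Since comment spammers tend to post many times from the same address, the cache alone should remove most of the repeated searches. The cache here never expires, which a real RBL would need to fix.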

By Noam Rathaus, Chief Technology Officer

Comments

Suresh Ramasubramanian  –  Jan 11, 2008 2:27 AM

Not a very bright idea. Certainly one that I wouldn't automate.

I might possibly look it up to see what it appears to be, reputation-wise, but

1. That’s just one metric in a decision
2. It requires human rather than script intelligence

Richard Golodner  –  Jan 11, 2008 5:43 AM

I must agree with Suresh on this idea. It does require some human intelligence to be able to make a decision based on what you get from doing a quick Google search.
    There are a variety of reasons you might see an IP address posted often. What if it is the address of some site that does not use DNS to reach it, or there is no record in place for it because of an administrative mistake?
    In some cases only an IP address is used so that traffic to that site is kept to a minimum.
    Perhaps this is not a good example, but I think there are much better and more accurate metrics for determining if a site is a legitimate spam generator. Just my own thoughts. Not trying to be offensive.
