For those not familiar with the term, RBL stands for Real-time Blackhole List; RBLs are mainly used for fighting SPAM. I have recently started experimenting with Google as an RBL engine. The idea is that if the search term I use returns too many hits, the post is likely to be SPAM.
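For context, conventional RBLs are usually published over DNS, following the DNSBL convention described in RFC 5782. A mail server reverses the octets of the client's IPv4 address, appends the list's zone, and looks up an A record for that name. An answer (typically an address in 127.0.0.0/8) means "listed", and NXDOMAIN means "not listed". The minimal sketch below only builds the query name; the zone `dnsbl.example.org` is a hypothetical placeholder, not a real list.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client would look up for an IPv4 address.

    `zone` is whatever zone the list operator publishes; any value used
    here is a placeholder for illustration.
    """
    # Validates the address; raises ValueError (AddressValueError) if malformed.
    addr = ipaddress.IPv4Address(ip)
    # 192.0.2.1 -> 1.2.0.192
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

With the example inputs this prints `1.2.0.192.dnsbl.example.org`. A Google-backed list, as proposed at the end of this post, could expose its answers through the same kind of lookup.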
The danger, of course, is that the term could simply be popular, but the trick here is that I’m using something very specific as the search term: the IP address of the poster.
An IP address shouldn’t be popular. Except in a few rare cases, IP addresses listed on Google are directly related to SPAM: either they appear on wiki-like sites as banned addresses, or they show up as mass-comment posters. Simply put, if your IP is listed in Google, you must be up to no good.
How good is this method? Nothing is bulletproof, but if you suspect something is SPAM, put the IP in Google and see whether there are hits. Almost all the comment SPAM I filtered out this month had more than 100 hits in Google; all non-SPAM had either 0 hits or fewer than 10.
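The heuristic above can be sketched as a small classifier over a hit count. This is an illustration under stated assumptions: the thresholds are taken directly from this month's observations (more than 100 hits for SPAM, fewer than 10 for legitimate posters), not tuned values. How the hit count is obtained is left out; `search_query` only shows the quoted exact-match query one would type in, and both function names are hypothetical. Counts between the two thresholds are reported as uncertain rather than forced into either bucket.

```python
import ipaddress
from enum import Enum


class Verdict(Enum):
    LIKELY_SPAM = "likely spam"
    LIKELY_HAM = "likely ham"
    UNCERTAIN = "uncertain"


# Thresholds from the observations in the post; assumptions, not tuned constants.
SPAM_MIN_HITS = 100  # SPAM posters' IPs had more than 100 hits
HAM_MAX_HITS = 10    # legitimate posters' IPs had fewer than 10 hits


def search_query(ip: str) -> str:
    """Return the poster's IP as a quoted exact-match search term (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for malformed input
    return f'"{addr}"'


def classify_hit_count(hits: int) -> Verdict:
    """Map a search hit count for an IP address to a rough verdict."""
    if hits < 0:
        raise ValueError("hit count cannot be negative")
    if hits > SPAM_MIN_HITS:
        return Verdict.LIKELY_SPAM
    if hits < HAM_MAX_HITS:
        return Verdict.LIKELY_HAM
    return Verdict.UNCERTAIN  # the gray zone the post has no data for
```

For example, `classify_hit_count(0)` gives `Verdict.LIKELY_HAM` and `classify_hit_count(150)` gives `Verdict.LIKELY_SPAM`.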
BTW: One advantage of Google is that it is quick, taking a few seconds to get a response. A disadvantage is that you cannot just “hammer” it with searches, or it will block you. Maybe someone can pick up this idea and build an RBL of IP addresses using Google as a back end.
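A service built this way would need to avoid hammering the search engine. One straightforward approach is to cache the hit count per IP and enforce a minimum interval between live queries. The sketch below assumes a caller-supplied `search_hits(ip)` function (hypothetical; no real Google API is called here). The 5-second `min_interval` is an arbitrary placeholder, not a known safe rate. The clock and sleep functions are injectable so the throttling can be tested without real delays.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledHitCounter:
    """Caches hit counts per IP and spaces out calls to the search back end."""

    def __init__(
        self,
        search_hits: Callable[[str], int],  # hypothetical back-end query
        min_interval: float = 5.0,          # placeholder seconds between live queries
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._search_hits = search_hits
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, int] = {}
        self._last_call: Optional[float] = None

    def hits_for(self, ip: str) -> int:
        # Repeat lookups for the same IP never reach the back end.
        if ip in self._cache:
            return self._cache[ip]
        # Wait until at least min_interval has passed since the last live query.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        count = self._search_hits(ip)
        self._cache[ip] = count
        return count
```

The cache has no expiry. A real list would probably want one so that addresses can drop off the list once the pages mentioning them disappear.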
Not a very bright idea. Certainly not one that I would automate.
I might possibly look it up to see what it appears to be, reputation-wise, but
1. That’s just one metric in a decision
2. It requires human rather than script intelligence
I must agree with Suresh on this idea. It does take some human intelligence to make a decision based on what you get from a quick Google search.
There are a variety of reasons you might see an IP address posted often. What if it is the address of a site that is not reached through DNS, or one that has no DNS record because of an administrative mistake?
In some cases only an IP address is used so that traffic to that site is kept to a minimum.
Perhaps this is not a good example, but I think there are much better and more accurate metrics for determining whether a site really is a spam generator. Just my own thoughts; not trying to be offensive.