Filtering Spam at the Transport Level


By John Levine, Author, Consultant & Speaker
December 27, 2011

A new paper from the Naval Postgraduate School (paper here, conference slides here) describes what appears to be an interesting twist on spam filtering: looking at the characteristics of the TCP session through which the mail is delivered.

They observe that bots typically live on cable or DSL connections with slow, congested upstream links. TCP sessions from bots turn out to be fairly easy to recognize by their round-trip time (RTT), window size, and retransmits, something that people have known at least since a paper on the topic at the 2008 CEAS conference.
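To make the three signals concrete, here is a minimal sketch of how a passive observer might boil one inbound session down to RTT, window, and retransmit count. It is not SpamFlow's code; the Packet record layout, the field names, and the summarize function are all assumptions made up for illustration, and a real analyzer would have to deal with sequence wraparound, partial overlaps, and much else.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    """One observed packet of an inbound SMTP session (hypothetical layout)."""
    time: float         # capture timestamp, in seconds
    from_client: bool   # True if sent by the remote (sending) host
    flags: str          # TCP flags, e.g. "S", "SA", "A", "PA"
    seq: int            # TCP sequence number
    payload_len: int    # bytes of payload carried
    window: int         # advertised receive window, in bytes


@dataclass
class FlowFeatures:
    rtt: float          # handshake round-trip estimate, in seconds
    min_window: int     # smallest window the client advertised
    retransmits: int    # client data segments seen more than once


def summarize(packets: List[Packet]) -> FlowFeatures:
    """Reduce one session's packets to the three features discussed above."""
    synack_time: Optional[float] = None
    rtt: Optional[float] = None
    seen = set()
    retransmits = 0
    windows = []
    for p in packets:
        if not p.from_client:
            # Remember when the server answered the client's SYN.
            if p.flags == "SA" and synack_time is None:
                synack_time = p.time
            continue
        windows.append(p.window)
        # The first client ACK after the SYN-ACK completes the handshake;
        # the gap approximates the round trip as seen from the server.
        if (rtt is None and synack_time is not None
                and "A" in p.flags and "S" not in p.flags):
            rtt = p.time - synack_time
        # A data segment with an already-seen (seq, length) is a retransmit.
        if p.payload_len > 0:
            key = (p.seq, p.payload_len)
            if key in seen:
                retransmits += 1
            seen.add(key)
    return FlowFeatures(
        rtt=rtt if rtt is not None else 0.0,
        min_window=min(windows) if windows else 0,
        retransmits=retransmits,
    )
```

A session from a congested home connection would presumably show up here as a large rtt and a nonzero retransmits, though where to draw the line is exactly what the training step is for.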

This paper tries to see whether it would be practical to use that information to manage spam in real time. They have a network analyzer called SpamFlow that figures out per-connection characteristics. Then, as a proof of concept, they wrote a SpamAssassin plugin that trains on the data from SpamFlow and tries to filter with it. They do some rather hand-wavy load testing to see whether SpamFlow can keep up with a realistic mail load, and whether it trains fast enough to provide useful data in real time. They claim that their results show that it does both.
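The idea of training on per-connection data as mail arrives can be sketched with a small online learner. The one below is a plain logistic regression updated one message at a time; that choice is mine, as a stand-in, and I am not claiming it is the learner the plugin uses. The class name, the feature ordering, and the learning rate are all assumptions.

```python
import math
from typing import List, Sequence


class OnlineLogistic:
    """Logistic regression updated one message at a time (illustrative only)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x: Sequence[float]) -> float:
        """Estimated probability that a connection with features x is spam."""
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x: Sequence[float], is_spam: bool) -> None:
        """Take one gradient step on a single labeled connection."""
        error = self.predict(x) - (1.0 if is_spam else 0.0)
        self.bias -= self.learning_rate * error
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
```

In use, each delivered message would contribute one update call with its connection's features (for example RTT in seconds, window scaled to a small range, and retransmit count) and whatever verdict the rest of the filter reached. Because each update is a handful of multiplications, keeping up with the mail stream is at least plausible for a model this small.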

It’s not obvious how best to use this in combination with all of the other anti-spam tools we have, most notably blacklists like the CBL that very accurately identify IPs of botted hosts by looking at the characteristics of mail received at large spamtraps. One thing that occurs to me is that this sort of thing might be useful if mail moves to IPv6, since building v6 blacklists will be hard due to the size of the address space, while this lets you estimate the bottiness of each connection directly. Also, rather than accepting or rejecting mail, you might slow down mail reception from hosts that seem to be bots, both to give preference to non-bot senders and because bots tend to be impatient: if you slow down a dubious connection and it gives up, it was probably a bot. The Turntide appliance did something similar five years ago, although it used different heuristics for deciding what to slow down.
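The slow-down idea is simple enough to sketch. Assuming you already have a bottiness score between 0 and 1 for a connection, something like the following would turn it into a per-reply delay and then treat an abandoned slow session as further evidence. The function names, the 0.5 threshold, the 20-second ceiling, and the 0.3 boost are all made-up values for illustration, not numbers from the paper or from any real product.

```python
def reply_delay(bot_score: float, max_delay: float = 20.0,
                threshold: float = 0.5) -> float:
    """Seconds to wait before each SMTP reply for a given bottiness score."""
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be at least 0 and below 1")
    if bot_score <= threshold:
        return 0.0  # senders that look normal are not slowed at all
    # Scale linearly from no delay at the threshold to max_delay at 1.0.
    return max_delay * (bot_score - threshold) / (1.0 - threshold)


def score_after_session(bot_score: float, was_delayed: bool,
                        client_gave_up: bool, boost: float = 0.3) -> float:
    """Raise the score when a slowed-down client abandons the session."""
    if was_delayed and client_gave_up:
        return min(1.0, bot_score + boost)
    return bot_score
```

A linear ramp is only one choice. The point is that a wrong guess costs a legitimate sender some seconds rather than a bounced message.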

This technique looks only at the characteristics of the TCP session, and not at the contents of the session, which means it also doesn’t look at the contents of the messages. It might be useful in contexts where for legal or political reasons the spam filter isn’t allowed to look at the messages, but users want spam filtering anyway. The authors point out that it is in principle applicable to any TCP transaction, so it might be useful against web queries from bots, too.

It’s hardly a FUSSP, but it’s an interesting paper.
