IBM researcher Nathaniel Borenstein has commented that everyone agrees that spam is bad, and that’s a huge impediment to doing anything about it. Having decided that spam is bad, it’s tempting to divide the spam problem into smaller problems and try to solve the smaller problems, then put the solutions to the subproblems together and, voila, no more spam. That would be fine if the combined subproblems were truly equivalent to the spam problem, but that’s rarely the case.
A common approach is to divide the spam problem into the authentication problem and the introduction problem. The authentication problem involves ensuring that whoever claims to have sent an e-mail message really did send it (or, as a minor variant, that the recipient can detect and reject forgeries). Authentication has gotten a lot of attention with systems like PGP, S/MIME, SPF, Sender ID, and DomainKeys. While it’s far from solved, it’s fairly well understood.
The introduction problem involves vetting mail from people who haven’t written before. The idea is that a recipient keeps a list of people who’ve sent good e-mail. When a message arrives from someone not on the list, the sender does something to indicate good faith or non-spamminess, and is then added to the recipient’s list. If the introduction fails, the recipient might put the sender into a bad senders list, or just ignore the message so future mail from the same sender will require another introduction attempt.
The introductory something can be fairly complex and onerous, since each sender only has to introduce himself once to each recipient, and it should be onerous enough that spammers won’t go to the effort to do it. In such a system, we’d expect bad guys to try to circumvent the introduction by forging mail from someone already in the recipient’s list. That’s why the introduction approach is only useful if the authentication is good enough to prevent forgeries.
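The introduction model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names are hypothetical, the `authenticated` flag stands in for whatever authentication check is in use, and lowercasing the whole address is a simplification. A known address that fails authentication is treated like a stranger, which is why the introduction approach depends on authentication.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    DELIVER = "deliver"      # sender is on the good list and authenticated
    CHALLENGE = "challenge"  # sender must complete an introduction
    REJECT = "reject"        # sender is on the bad list


@dataclass
class IntroductionFilter:
    """Hypothetical per-recipient filter; not taken from any real system."""
    good_senders: set[str] = field(default_factory=set)
    bad_senders: set[str] = field(default_factory=set)

    def classify(self, sender: str, authenticated: bool) -> Verdict:
        sender = sender.lower()  # simplification: treat addresses case-insensitively
        if sender in self.bad_senders:
            return Verdict.REJECT
        if sender in self.good_senders and authenticated:
            return Verdict.DELIVER
        # Unknown sender, or a listed address that may be forged.
        return Verdict.CHALLENGE

    def record_introduction(self, sender: str, succeeded: bool,
                            remember_failures: bool = False) -> None:
        sender = sender.lower()
        if succeeded:
            self.good_senders.add(sender)
        elif remember_failures:
            # Otherwise the failure is ignored and future mail from
            # this sender simply triggers another introduction attempt.
            self.bad_senders.add(sender)
```

With `remember_failures` left at its default, a failed introduction changes nothing, which corresponds to the "just ignore the message" option.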
Viewed in this way, a lot of anti-spam proposals turn out really to be introduction proposals. Challenge/response, hashcash, CAPTCHAs (blurry pictures of words that the user has to retype), and refundable e-postage fall into this category. While some of these proposals are quite clever, and some of them are plausible solutions to the introduction problem, none of them solve the spam problem, because the introduction problem is not the spam problem.
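As one concrete example of an "introductory something", here is a minimal proof-of-work sketch in the spirit of hashcash. It is not the real Hashcash stamp format: the `recipient:counter` layout, the function names, and the 12-bit default are assumptions made for illustration. The sender burns CPU time finding a stamp whose SHA-1 hash starts with enough zero bits; the recipient checks it with a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient: str, bits: int = 12) -> str:
    """Sender side: search for a counter that makes the hash small enough."""
    for counter in count():
        stamp = f"{recipient}:{counter}"
        digest = hashlib.sha1(stamp.encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify(stamp: str, recipient: str, bits: int = 12) -> bool:
    """Recipient side: one hash, plus a check that the stamp names this recipient."""
    resource, _, _counter = stamp.rpartition(":")
    if resource != recipient:
        return False
    digest = hashlib.sha1(stamp.encode()).digest()
    return leading_zero_bits(digest) >= bits
```

Each extra bit roughly doubles the sender's expected work while the recipient's cost stays constant. A stamp minted for one recipient does not verify for another, so the cost is paid per introduction.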
For one thing, the introduction approach doesn’t match very well the way that people really use e-mail. Its model is that a stranger will write to you, you’ll decide whether you like the stranger’s mail, and then you’ll add that e-mail address to your accept or reject list. In practice, people visit a vendor’s web site, order something, and get order confirmations and (if they ask for them) newsletters from the vendor. What address will the confirmations and newsletters come from? It’s rarely possible to predict. We can imagine schemes where as part of the ordering transaction the vendor adds its addresses to the user’s good sender list, but even if such schemes could be designed and deployed, they would be a tempting target for bad guys to subvert and stuff their addresses into unwitting users’ lists.
For another, the introduction scheme presumes that senders’ behavior stays the same, that someone who sends good mail will always send good mail and vice versa. That strikes me as extremely optimistic. In the late 1990s, spammers sent spam through other people’s existing mailing lists. They don’t spam that way now since other approaches are easier, but if the fastest way into people’s good sender lists is to piggyback on other mailing lists, they’ll do it again. They’ll join the list, possibly sending out an innocuous message or two, then blast out spam to the list until the list owner notices and cuts them off. (Yes, this has happened.)
The introduction approach presumes both that mail from unknown senders is probably spam, and that legitimate senders are interested enough in getting their message delivered to bear the burden of the introductory something. This may be true, or it may not be. I often see someone ask a question on a mailing list or newsgroup, send them an answer to the question, and get back some sort of introductory challenge. Am I going to jump through their hoops to do them a favor? Probably not.
Finally, the spam problem is unwanted bulk mail, regardless of where it comes from, not mail from strangers. I publish contact e-mail addresses in my books, and readers send me a lot of mail. It’s from people who haven’t written to me before, and it’s not spam. An accreditation system (third parties that vouch for senders) would help manage that problem a lot better than an introduction system.
Introduction systems aren’t inherently bad, but they’re not inherently related to spam, either.
Attacking the spam problem head-on is simply not possible. In its most general form, spam isn’t “unwanted bulk mail” but rather “unwanted mail”. The most significant sources of unwanted mail operate in bulk, but the “bulkness” isn’t the root of the problem, merely an exacerbating factor. The spam problem is solved for subject X when subject X no longer receives unwanted mail: other factors are irrelevant. (I grant that if you eliminate bulk spam, you eliminate the bulk of the problem, but not the whole problem.)
Solving the spam problem directly means producing a system which can determine in advance whether or not a message is going to be desirable or valuable to a given recipient. Trainable filters are about the closest approximation we have to this ideal, but they are resource intensive and don’t scale well. Ideally, we would like unwanted messages to remain unsent, rather than be delivered and deleted. That ideal would involve senders having detailed knowledge of their recipients’ preferences, and intelligently targeting their messages to match. Even the most friendly and diligent senders lack the means to perform this task perfectly. In practice, some senders do a reasonable job, whereas others are bluntly inconsiderate to the point of active hostility. The hard-core “spammers” of the world are found at the latter end of this continuum.
Introduction systems are not inherently related to spam, but they are a valid defensive tool in the face of a certain kind of spam, namely the unsolicited bulk variety, known and loathed almost universally. Introduction systems work because they create a barrier to sending, either manual or monetary, and it’s unlikely that a bulk sender will have the resources or motivation to cross this barrier.
On the other hand, the barrier created by an introduction system is socially inappropriate in situations where communication is actively solicited. Questions asked on mailing lists and newsgroups are one example; mailing list subscriptions are another. Introduction systems are not a general solution, not only because they address a mere sub-class of unwanted mail, but also because they actively impede a sub-class of wanted mail.
I allege that this pattern persists across all known sort-of-anti-spam techniques: they all defeat a certain class of spam, but harm or conflict with other desirable instances of email. This is true of introduction systems, payment systems, accreditation systems, address obfuscation techniques, disposable email addresses, blocklisting, and so on. An email user with a broad range of usage patterns wouldn’t want to rely on any one of these approaches.
But that’s not to say that any of the approaches must be abandoned; rather, they must be used appropriately. Since no one technique is appropriate to all use cases, different use cases must use different techniques. A use case can’t generally be distinguished by the message itself, so individual email addresses need to be allocated to particular use cases. Many of us do this already, creating email addresses for particular uses, although it’s a technique which doesn’t have standardised tools and interfaces at this time.
In my view, most of the problems of spam will be solved by empowering recipients with a broad range of techniques: a toolkit, with each tool appropriate to particular use cases. At the same time, we must allow recipients to distinguish their use cases by address, meaning that they must be able to create and expire addresses easily, and attach tools to addresses as appropriate. Tools in such a toolkit don’t need to be inherently related to spam; they just need to have uses that address a specific need.
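To make the toolkit idea a little more concrete, here is a rough Python sketch of per-address tools. Everything in it is hypothetical: the `Mailbox` class, the `Tool` signature, and the two example tools are illustrative names, not an existing interface. Each address carries its own list of tools and an optional expiry time, and a message is accepted only if the address is still live and every attached tool accepts it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical tool signature: (sender, body) -> True to accept the message.
Tool = Callable[[str, str], bool]


def accept_all(sender: str, body: str) -> bool:
    """Tool for a published contact address that welcomes strangers."""
    return True


def allow_only(*senders: str) -> Tool:
    """Tool for a single-purpose address, e.g. one given to a vendor."""
    allowed = {s.lower() for s in senders}
    return lambda sender, body: sender.lower() in allowed


@dataclass
class Address:
    tools: list[Tool]
    expires: Optional[datetime] = None  # None means the address never expires


class Mailbox:
    def __init__(self) -> None:
        self._addresses: dict[str, Address] = {}

    def create(self, address: str, tools: list[Tool],
               expires: Optional[datetime] = None) -> None:
        self._addresses[address.lower()] = Address(list(tools), expires)

    def expire(self, address: str) -> None:
        self._addresses.pop(address.lower(), None)

    def accepts(self, address: str, sender: str, body: str,
                now: datetime) -> bool:
        entry = self._addresses.get(address.lower())
        if entry is None:
            return False  # never created, or explicitly expired
        if entry.expires is not None and now >= entry.expires:
            return False  # past its lifetime
        return all(tool(sender, body) for tool in entry.tools)
```

An introduction or payment check could be attached to one address and omitted from another in the same way, so a tool that would be a nuisance for one use case never touches the others.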
One of the problems of current anti-spam thinking is that we’re still looking for the Ultimate Weapon Against Spam, rather than viewing our techniques as tools in a toolkit. The same “Ultimate Weapon” thinking means that some fail to appreciate the benefits of positive architectural changes which tip the playing field back in favour of email recipients. Sender identification schemes, for example, aren’t an anti-spam mechanism in and of themselves, but they are the kind of positive architectural improvement that we need.
The combined tools and architectural features may not solve the spam problem generally, but so long as they are close to optimal for their particular use cases, who cares?