Recently, a couple of anti-spam (or at least email security related) bloggers have written some articles about IPv6 and the challenges that the email industry faces regarding it. John Levine, who has written numerous RFCs and a couple of books about spam fighting, writes the following in his article A Politically Incorrect Guide to IPv6, part III:
We will eventually figure out both how people use IPv6 addresses for mail, and how to manage and publish v6 reputation data (I’ve been doing some experiments, which I’ll blog about when I have enough results), but until then, running a mail server on v6 will be a lot harder than running one on v4. And since you’ll be able to handle all the real mail on v4, why bother?
Barry Leiba, another email security writer, writes the following in an article on CircleID entitled IP Blocklists, Email, and IPv6:
John Levine has one approach: leave the email system on IPv4 for the foreseeable future. Even, John points out, when many other services, customer endpoints, mobile and household devices, and the like have been—have to have been—switched to IPv6, we can still run the Internet email infrastructure on IPv4 for a long time, leaving the IP blocklists with v4 addresses, and a system that we’re already managing fine with.
Of course, some day, we’ll want to completely get rid of IPv4 on the Internet, and by then we’ll need to have figured out a replacement for the IP blocklist mechanism. But John’s right that that won’t be happening for many years yet, and he makes a good case for saying that we don’t have to worry about it.
Both writers are saying the same thing, and I have been on discussion threads where the consensus was similar: there is no agreement on how to handle email over IPv6, at least in the short term, but eventually it will probably have to be figured out (some believe mail will never move to IPv6, while others think it will have to get there one of these days). In the meantime, just use IPv4 to send mail.
To expand a bit on what both writers are saying, the biggest reason mail providers are not particularly thrilled about handling email over IPv6 is that there is currently no way to deal with the problem of abuse. Today, spammers make extensive use of botnets. Each day, they compromise new machines and start using them to spew out spam. Each of these bots uses a different IP address, and the IP addresses change all the time. I haven't done an analysis in a while, but if you had 10,000 IP addresses sending spam today, then tomorrow there would again be 10,000, but at least 9,700 of them would be different from the previous day's.
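The churn figure above is a simple set comparison between two days of observed senders. A minimal sketch, assuming each day's spam-sending IPs are available as a set of strings; the function name `daily_turnover` and the synthetic addresses are illustrative only, not measured data:

```python
def daily_turnover(yesterday: set[str], today: set[str]) -> float:
    """Return the fraction of today's sending IPs that were not seen yesterday."""
    if not today:
        return 0.0
    return len(today - yesterday) / len(today)


# Synthetic example (hypothetical data): 10,000 IPs per day, only 300 repeat.
yesterday = {f"10.0.{i // 256}.{i % 256}" for i in range(10_000)}
repeat = {f"10.0.{i // 256}.{i % 256}" for i in range(300)}
fresh = {f"10.1.{i // 256}.{i % 256}" for i in range(9_700)}
today = repeat | fresh

print(f"{daily_turnover(yesterday, today):.0%} of today's IPs are new")  # 97%
```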
There is so much rotation in IP addresses because spam filters today rely on IP blocklists. When a blocklist service detects that an IP is sending spam, it adds the IP to the blocklist, and mail servers using that list reject all mail from it. There are exceptions to this rule, such as a legitimate IP that sends mostly good mail (for example, a Hotmail or Gmail IP address), but in general, mail servers reject all mail from blocklisted IPs. The reason they do this is the following:
Those are the two primary reasons to use IP blocklists, and they are essential for blocking spam. The next question is how blocklists are populated; I'm going to leave that aside because it is covered well elsewhere. Blocklist operators publish their lists in two ways:
In terms of effectiveness, we run the XBL in front of the PBL, and the XBL blocks about 4 times as much mail as the PBL (I don't know how many would be blocked if we ran them in reverse order). The XBL is better at catching individual bots that are sending spam but are not listed anywhere else (they are new IPs), whereas the PBL is better at pre-emptively catching IPs that should never send mail directly (probable bots, but it doesn't matter because they shouldn't be sending mail anyhow). They are designed to be used in tandem. However, if we had to list every single PBL IP individually instead of compressing the list into CIDR ranges, and if we assume the same ratio as the XBL (about 7 million IPs in roughly 100 megs), then the PBL would be 9.4 gigs in total size. That is a large file. It isn't completely unmanageable, but it turns a minor inconvenience into a major one: it takes a long time to download, upload, and process a 9.4 gig file. It's also far easier to store the entries in a database if there are only 500,000 of them (or even 7 million) rather than 650 million; databases that large start to run into problems of scale.
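To see why CIDR compression matters so much, here is a minimal sketch using Python's standard `ipaddress` module. It collapses individually listed addresses into CIDR blocks and applies the rough ratio above (7 million entries per 100 MB, assumed here to mean decimal megabytes) to estimate the size of an uncompressed list. The `10.20.0.0/16` block is an arbitrary example, not real PBL data, and the helper names are hypothetical:

```python
import ipaddress

# Rough ratio from the post: ~7 million single-IP entries take ~100 MB (assumption).
BYTES_PER_ENTRY = 100_000_000 / 7_000_000  # ~14.3 bytes per entry


def flat_file_size_gb(num_ips: int) -> float:
    """Estimate the size of a one-IP-per-line list, in decimal gigabytes."""
    return num_ips * BYTES_PER_ENTRY / 1_000_000_000


def to_cidr(ips):
    """Collapse individual IPv4 addresses into the smallest set of CIDR blocks."""
    single_hosts = (ipaddress.ip_network(f"{ip}/32") for ip in ips)
    return list(ipaddress.collapse_addresses(single_hosts))


# Listing a whole /16 one address at a time takes 65,536 entries...
singles = [str(ip) for ip in ipaddress.ip_network("10.20.0.0/16")]
# ...but it collapses to a single CIDR entry.
print(len(singles), to_cidr(singles))
print(f"{flat_file_size_gb(650_000_000):.1f} GB for 650 million single entries")
```

At that ratio, exactly 650 million single entries come to roughly 9.3 GB, in line with the estimate above.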
The PBL and XBL are prime examples of why different styles of IP blocklists are required. The PBL lists 650 million IPs, and yet there are still over 7 million IPs on the XBL that aren't on the PBL. Clearly, spamming bots can and do move around so that they are not on the lists that block large swaths of address space. Bots are very good at hiding in places that have not yet been called out and blocked. If they could not do this, they would not be in business, and spammers are still in business. Given enough space to hide, spammers will hide in that space. The problem that we in the industry face is that as soon as we find a hiding space, we can block it for a while, but the spammer will vacate it, relocate elsewhere and continue to spam.
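Finding the XBL entries that the PBL misses is a containment check of single addresses against CIDR ranges. A minimal sketch, assuming both lists fit in memory and use one IP version; `uncovered` is a hypothetical helper, and the sample addresses come from the RFC 5737 documentation ranges, not from either real list. Sorting the collapsed ranges and using binary search keeps each lookup at O(log n) instead of scanning every range:

```python
import bisect
import ipaddress


def uncovered(single_ips, ranges):
    """Return the single IPs that fall outside every CIDR range (one IP version)."""
    # Collapse overlapping ranges so they are disjoint, then sort by start address.
    nets = sorted(ipaddress.collapse_addresses(ipaddress.ip_network(r) for r in ranges))
    starts = [int(n.network_address) for n in nets]
    ends = [int(n.broadcast_address) for n in nets]

    missed = []
    for ip in single_ips:
        value = int(ipaddress.ip_address(ip))
        # Index of the last range starting at or below this address, if any.
        i = bisect.bisect_right(starts, value) - 1
        if i < 0 or value > ends[i]:
            missed.append(ip)
    return missed


policy_ranges = ["198.51.100.0/24", "203.0.113.0/25"]  # stand-in for PBL-style blocks
listed_bots = ["198.51.100.20", "203.0.113.10", "203.0.113.200", "192.0.2.55"]
print(uncovered(listed_bots, policy_ranges))  # ['203.0.113.200', '192.0.2.55']
```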
And therein lies the problem of IPv6. An IPv4 address consists of 4 octets, and each octet is a number from 0 to 255. This means that there are 256 × 256 × 256 × 256 = 4,294,967,296 possible IP addresses, or roughly 4.3 billion. In reality, there are far fewer usable addresses, because many ranges are reserved and not available for public use. Still, using the same ratio as above, if you had to list every single IP address individually in a file, the file would be about 61 gigs. That is a very large file, and very few pieces of hardware can hold a file that size in memory (whether you are doing IP blocklist lookups in rbldnsd or in some other in-memory, on-the-box solution). Processing the file and cleaning it up would take a very long time; you simply couldn't do it at the pace IP blocklists require, since they need to be updated frequently (once per hour at a bare minimum).
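The address-space arithmetic is easy to check. A short sketch reusing the same rough ratio (about 7 million entries per 100 MB, decimal units assumed); the IPv6 lines are included only for comparison, using the 128-bit IPv6 address size and the common /64 subnet size:

```python
# Size of the IPv4 address space: four octets, each 0-255.
IPV4_ADDRESSES = 256 ** 4  # 4,294,967,296, i.e. 2**32
# IPv6 addresses are 128 bits; a typical subnet is a /64.
IPV6_ADDRESSES = 2 ** 128
IPV6_SLASH64 = 2 ** 64

# Rough ratio from the XBL example: ~7 million entries per ~100 MB (assumption).
BYTES_PER_ENTRY = 100_000_000 / 7_000_000


def flat_file_size_gb(num_ips: int) -> float:
    """Estimate a one-IP-per-line list size in decimal gigabytes."""
    return num_ips * BYTES_PER_ENTRY / 1_000_000_000


print(f"IPv4 addresses: {IPV4_ADDRESSES:,}")
print(f"Every IPv4 address listed singly: ~{flat_file_size_gb(IPV4_ADDRESSES):.0f} GB")
print(f"IPv6 addresses: {IPV6_ADDRESSES:.2e}")
print(f"One IPv6 /64 = {IPV6_SLASH64 // IPV4_ADDRESSES:,} x the whole IPv4 space")
```

Even a single /64 holds about 4.3 billion times as many addresses as all of IPv4, which is why listing IPv6 addresses one at a time does not scale.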
Listing large blocks is not a sign of excellence. I happened to be blocked by a DNSBL just because someone else spammed from the same /24 that I was using. Collisions cannot be avoided in the rare case of NATted servers, but serious DNSBLs should avoid overusing CIDR: they should list addresses one by one, as they catch them. The PBL is semantically different, as it enforces RFC 4409 by relying on network operators to communicate which blocks they allocate for uses that don't provide for mail relaying. Running a PBL for IPv6 can be done as soon as the caching problem is solved.
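For context on the caching problem: DNSBLs are queried by reversing the address and prepending it to the list's zone (RFC 5782 describes the convention: reversed octets for IPv4, reversed hexadecimal nibbles for IPv6). A minimal sketch of building the query name; `dnsbl.example` is a placeholder zone, and the DNS lookup itself is left out. By convention a listed address resolves to an A record in 127.0.0.0/8, and an unlisted one returns NXDOMAIN:

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IP in a DNS blocklist zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # 192.0.2.99 -> 99.2.0.192
        labels = reversed(str(addr).split("."))
    else:
        # Expand to all 32 hex nibbles, then reverse them one by one.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
# Two neighbouring IPv6 hosts in the same /64 produce distinct cache keys:
a = dnsbl_query_name("2001:db8::1", "dnsbl.example")
b = dnsbl_query_name("2001:db8::2", "dnsbl.example")
print(a == b, len(a.split(".")) - 2)  # False 32
```

Because every address gets its own 32-label name, a resolver cache holds one entry per queried address, and a spammer rotating through a /64 never hits a cached answer.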
For numbers, I reckon I'd need 1 TB of disk to track the whole IPv4 address space using ipq bdb. Because that filter uses Berkeley DB, the number of entries is not a problem at all. (The partition I'm using is actually smaller than 1 TB, but then my local BL contains only 0.008% of all IPv4 addresses.) Admittedly, this is not the same as DNS caching, but it proves that you can run blocklists of any size.
As you say, IPv6 implies 16-byte keys, so the database takes more space on disk for the same number of entries. This problem can be solved. The other problem, that IPv6 allocations are given out in large blocks, cannot be solved. In other words, the block that the B in DNSBL stands for is meant to be a verb, not a block of addresses.
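The key-size point can be illustrated with Python's `ipaddress` module: a packed IPv4 address is 4 bytes and a packed IPv6 address is 16, so the same number of entries needs four times the key space. This is only a sketch; it does not reproduce ipq bdb or Berkeley DB, and the helper names are hypothetical. Because packed addresses are big-endian, sorting the byte keys also sorts the addresses numerically, which lets an ordered key-value store keep them in address order:

```python
import ipaddress


def packed_key(ip: str) -> bytes:
    """Fixed-width, big-endian binary key for an IPv4 or IPv6 address."""
    return ipaddress.ip_address(ip).packed


def total_key_bytes(ips) -> int:
    """Bytes needed just for the keys of the given addresses."""
    return sum(len(packed_key(ip)) for ip in ips)


v4 = ["192.0.2.1", "198.51.100.7", "203.0.113.9"]
v6 = ["2001:db8::1", "2001:db8:1::7", "2001:db8:2::9"]
print(total_key_bytes(v4), total_key_bytes(v6))  # 12 48

# Byte-wise order of packed keys matches numeric address order.
shuffled = ["2001:db8:2::9", "2001:db8::1", "2001:db8:1::7"]
print(sorted(shuffled, key=packed_key))
```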