A long time ago in an Internet far away, nobody paid for DNS services. Not directly at least. We either ran our own servers, or got DNS service as part of our IP transit contract, or traded services with others. In ~1990 I was the operator of one of the largest name servers in existence (UUCP-GW-1.PA.DEC.COM) and I exchanged free DNS secondary service with UUNET. Two thousand zones seemed like a lot of zones back then—little did we dream that there would someday be a billion or so DNS zones worldwide.
Anyway the decades went by and it turned out that almost everything connected in any way to the Internet was something important enough to pay for. I’ve watched various friends start companies for secondary name service, primary name service, even recursive name service. The standard for my incredulity has steadily risen; I like to think I’m ready for anything now, that nobody could find some new way to monetize DNS that could still surprise me.
Ok so that was a setup. I’m surprised by something. Companies who want to sell DNS authoritative name services are out there beating each other up with competitive claims about something I would have said was crazy, which is whether it’s better to serve a DNS zone by pure anycast, pure unicast, or a deliberate and somehow necessary mixture of both anycast and unicast.
Now that I’ve seen it, the appeal of this kind of argument is understandable—it’s a subtle and complicated topic, claims on this topic are unprovable and non-falsifiable, and it’s unlikely that your competitors will be willing to change their approach to beat your claims. It’s the perfect market differentiator!
However, it’s more irritating than amusing, especially since one of the ways my nonprofit employer (Internet Systems Consortium, which you may know as ISC or as “the BIND people”) raises funds for our free software efforts is by selling DNS secondary services (find us on the web at http://sns.isc.org/). Since there’s a good chance that some competitor will try to take a sale away from us on the basis that “ISC SNS uses anycast, and we don’t, so we’re better” I’m going to try to get out ahead of that game and lay down the facts as I see them.
History
In the beginning there was unicast, and it was good enough, because nobody’s revenue stream depended on the resiliency and performance of their DNS. Also because all of the caching recursive servers in the world ran BIND which had even at that time a really cool “server selection” method whereby the best of every DNS zone’s name servers would be found and used by each party doing DNS look-ups.
Later, commerce started. This brought in a lot of other (non-BIND) caching recursive servers who lacked BIND’s server selection methods and thus had the ability to make really bad decisions about which of a DNS zone’s name servers to talk to. This created an opportunity to do something different:
anycast.
Anycast was first used in commercial production by Rodney Joffe in 1997 or so at Genuity/GTE. For the record I started out not liking anycast since I wanted all caching recursive name servers to have a server selection method as good as BIND’s. Anycast also makes it easier to deploy tricky CDN-like features which at the time seemed like really bad ideas to me. I have since grown more sanguine about name servers being both dumber and smarter than I wanted them to be. (The market waits for no man.)
Techie Stuff
To glue itself together the Internet uses distance vector routing. This means the router connected to 192.5.5.0/24 (a block of 256 end-host addresses) tells its buddy routers “hey guys and girls, you can reach 192.5.5.0/24 through me” and they tell their buddy routers and so on until pretty soon everybody knows. In fact (and this is important) most routers hear many different paths to each of these “net-blocks”. Which path each of those far distant routers decides to use depends a lot on their local policy but it usually comes down to using the shortest path they heard or sometimes the first path they heard. What’s really important to note is that they only use one path at a time—fall-backs are fine but multi-path load sharing is just not done unless you’re inside the same network and most likely operated by the same team.
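To make the “one path at a time” behaviour concrete, here is a minimal sketch (Python, with an invented four-router topology; this is a toy illustration of the idea, not real routing protocol code) of how a prefix announcement spreads and how each router ends up keeping exactly one best path:

```python
# Hypothetical topology: router names and who peers with whom.
PEERS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}

def propagate(origin):
    """Spread reachability outward from the origin router; every router
    keeps exactly one best (fewest-hops) path and re-advertises it."""
    best = {origin: [origin]}            # router -> chosen path back to origin
    changed = True
    while changed:
        changed = False
        for router, path in list(best.items()):
            for neighbor in PEERS[router]:
                candidate = path + [neighbor]
                # Only the shortest path heard so far is kept -- one path at a time.
                if neighbor not in best or len(candidate) < len(best[neighbor]):
                    best[neighbor] = candidate
                    changed = True
    return best

if __name__ == "__main__":
    for router, path in propagate("A").items():
        hops = " -> ".join(reversed(path))
        print(f"{router} reaches 192.5.5.0/24 via {hops}")
```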
If you built a really strong name server able to answer all of the queries your zone will receive every second, and if you built a global backbone network full of private fiber and beefy routers at every peering point on every continent, then you could interconnect with every network at every one of those peering points and use the distance vector routing system to tell everybody to send their traffic for your name server into your network.
There are at least three good reasons not to build name service that way.
First, private fiber and beefy routers are expensive. And once you start cheaping out by buying circuits and virtual circuits you’d lose control over your latency and you’d be subject to over-commit. The most unreliable part of the Internet is OPN—Other People’s Networks. Use them only when you have no better options.
Second, the speed of light is at this moment in history still a constant. So the further someone is from your name server the longer it will take their query to evoke a response and for that response to reach them. If you want high DNS performance then you need low round trip latency and that means putting a name server close to… well, close to everybody. No single name server no matter how well constructed or how powerful its network can be close to more than one part of the world.
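For a rough sense of scale, here is a back-of-the-envelope sketch (Python; the distances are illustrative and the two-thirds-of-c figure for light in fiber is an approximation) of the minimum round trip time imposed by distance alone:

```python
SPEED_OF_LIGHT_KM_S = 299_792      # km per second, in vacuum
FIBER_FRACTION = 2 / 3             # rough speed of light in glass relative to vacuum

def min_rtt_ms(distance_km):
    """Lower bound on round trip time from distance alone, ignoring
    queueing, serialization, and routing delays."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION)
    return 2 * one_way_s * 1000    # there and back, in milliseconds

for label, km in [("same metro", 50), ("cross-country", 4_000),
                  ("trans-Pacific", 9_000), ("antipodal", 20_000)]:
    print(f"{label:>13}: at least {min_rtt_ms(km):6.1f} ms before the server even thinks")
```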
Third, there’s an alternative which violates the good, fast, cheap rule (which rule states, you can choose only two.) Anycast: it’s better, it’s faster, and it’s cheaper. Don’t do it for less than all three reasons, but in any case, do it—or buy name service from someone who does it for you.
Anycast uses a loophole in distance vector routing. Instead of having a network (let’s say 192.5.5.0/24 again) with name servers on it (which in this case means F.ROOT-SERVERS.NET) connected to a global backbone with links to every peering point in the world, anycast cuts out the middle part where there’s a server in the middle with a network to the edges.
With anycast you still have to show up at a lot of peering points with a lot of equipment but you don’t need a backbone network and you don’t have one big name server in the middle—you have a lot of littler cheaper name servers installed inside of the peering points themselves. I call this a “loophole” because the networks you are connecting to can’t tell that you don’t have a backbone network. They just see distance vector routing. They don’t know and can’t tell and wouldn’t care that the network you’re advertising reachability for is not a single network in one place but is actually a local copy of a network that exists in many places.
This is faster because the speed of light gets respect and you really are really close to everyone who is sending you queries. This is better because you have a lot of small servers or locally load-balanced clusters of small servers which means you have no single point of failure and few points of visible failure. This is cheaper because you don’t have to pay for a backbone network that you didn’t need and that was just hurting your performance anyway.
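Here is a toy illustration of that “loophole” (Python; the resolver locations, sites, and hop counts are all invented for the example): the same prefix is announced from several places, and each querying network simply routes to whichever instance is closest by its own metric.

```python
ANYCAST_PREFIX = "192.5.5.0/24"

# hops[resolver][site]: hypothetical distance from each recursive resolver's
# network to each peering point where an instance of the prefix is announced.
hops = {
    "resolver-in-tokyo":     {"tokyo": 1,  "frankfurt": 9, "ashburn": 8},
    "resolver-in-munich":    {"tokyo": 9,  "frankfurt": 1, "ashburn": 5},
    "resolver-in-sao-paulo": {"tokyo": 11, "frankfurt": 7, "ashburn": 4},
}

for resolver, distances in hops.items():
    nearest = min(distances, key=distances.get)   # each network picks its one best path
    print(f"{resolver}: queries for {ANYCAST_PREFIX} land at the {nearest} instance")
```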
Not every zone needs anycast. If you can get unicast name service for free or for trade or for much cheaper than anycast name service and if the opportunity cost of downtime is low because you aren’t driving revenue with your uptime then unicast will probably be fine for you. It was good enough for the whole Internet back before commerce came along, so don’t worry about it. But if your DNS responses really have to get there now for reasons related to business continuity or shareholder value then you will have to receive your DNS requests and transmit your DNS responses as close to the end users as possible. That’s what anycast can do for you.
Diversity
Those of us who have been doing DNS since the 1980’s find it hard to let go of certain ideals like infrastructure diversity. It’s theoretically possible to serve a DNS zone with a single well-anycasted name server, but we won’t. Long ago wise men sagely wrote that there should be at least two name servers for every DNS zone and that these name servers should be as diverse as possible. An interpretationist like myself says that this call for diversity means “please don’t put your only two name servers into the same rack on the same power strip”. On the other hand a literalist (also like myself) thinks this call for diversity means “don’t use a single name server even if it’s really hundreds of name servers hiding behind a single name server name.” Because that would just be creepy.
Other forms of diversity are worth considering. For example, for the root zone, which is the parent of all top level domains like COM or NET or UK, we’ve tried for many years to ensure that its servers don’t all run the exact same software. It would be bad if a single bug present in all servers could wipe out the whole Internet. Monoclonal architectures are dust-bowl risks. Back when BIND was all there was, we root name server operators staggered our upgrades so that we would never all be running the same version. (Nowadays some of us run BIND and some run NSD.) I’m not sure this degree of care is warranted outside the root zone and I suspect that most TLD operators, including VeriSign for “COM”, run a single kind of software on all their servers. But it’s an example of how else to think about “diversity”.
Weird Stuff
Lately I’ve heard, indirectly, sales pitches about hybrid anycast and unicast. Apparently it’s now being called risky to put all of your eggs into the anycast basket. I expect it goes something like “...and if something goes wrong with all that new fangled anycast widgetry won’t you be glad you’ve got good old fashioned unicast working for you?” My answer would be: no. And not just because anycast has been working fine since the 1990’s.
In a pure unicast or pure anycast service environment a DNS zone has the best and worst of one world and none of the other world. In a hybrid service environment where both anycast and unicast are in use you’re getting the best and worst of both worlds. I’m fine with getting the best of both worlds, but it’s the part about getting the worst of both worlds that I’d say we should look at more closely.
The worst thing that can happen with anycast is path failure in which case some subset of your customers will think you are down for a short period while the distance vector routing system figures out a new path from them to you. Anycast limits the scope of this risk by limiting the size of the affected subset of customers. Anycast has other risks but this one is the worst since it means you’re losing money. This risk is also present for unicast but it’s not the worst thing that can happen for unicast. Yes, that means the worst thing that can happen with unicast is worse than the worst thing that can happen with anycast.
The worst thing that can happen with unicast is bad server selection which is pretty common among non-BIND recursive name servers although I’m sure that OpenDNS and Google DNS both get it right. Some large subset of your customers are using recursive name servers with really silly server selection methods—never mind why. Whenever these servers pick a unicast server that’s not close to them then your customers will get slower service. This doesn’t sound as bad as failure since they are still getting service, right? It is worse than failure, for two reasons. First, it’s persistent, it happens a little bit all the time, it’s like a slow leak in one of the main fuel lines of your business. Second, there’s no way to detect it or measure it or make investments to limit its effect on your revenue—so, no risk management.
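As a rough numeric sketch of that slow leak (Python; the round trip times and the uniformly random selection model are invented for illustration, not measurements), compare what the nearest server would cost against what naive selection among scattered unicast servers costs on average:

```python
import random

# Invented round trip times, in milliseconds, to a zone's unicast name servers.
rtt_ms = {"near": 10, "medium": 60, "far": 150}

def average_rtt(trials=100_000):
    """Average RTT when the recursive server picks a name server at random."""
    choices = list(rtt_ms.values())
    return sum(random.choice(choices) for _ in range(trials)) / trials

print(f"smart selection (always nearest): {min(rtt_ms.values())} ms")
print(f"naive random selection:           ~{average_rtt():.0f} ms on average, all the time")
```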
Conclusion
When I was in primary school back in 1968 or so I learned from my experience with painting that if you mix enough colours together you always get “brown”. Perhaps that’s why I’m sensitive on the topic of getting the worst of both worlds. I don’t want the worst of both worlds. I think buyers should make a clear choice based on the costs and risks and benefits available to them and then accept only the best and worst of only the one world they choose to step into.
If it doesn’t matter how many milliseconds it takes for people to look up your web site or more generally for any DNS client to look up one of your domain names, then you’re a candidate for the kind of unicast name service that we all used to do for free or for trade and which many ISPs and ASPs still bundle as an extra if you’re buying something else from them. The domain name I use for my friends and family is REDBARN.ORG and I wouldn’t pay even one more nickel than I had to for that domain’s name service. If it takes some people five milliseconds and others fifty (50) milliseconds and a few as much as a hundred (100) milliseconds—that’s a tenth of a second—to look up my personal domain name, it’s all fine by me. There’s no commerce going on here; when my kids sell Girl Scout cookies, they do it web-free.
If speed matters and you’re in no mood to compromise then use DNS anycast. If you want diversity then buy anycast name services from multiple providers each having a slightly different global footprint. DNS allows you to designate up to about a dozen different name servers for each zone and that is enough space to list several anycast name server names from each of several anycast name service providers. Chances are you only need two name servers if each is well anycasted but what I’m saying is you can have a lot more than two name servers if provider and technology diversity matters to you and you’re willing to pay what that kind of diversity costs.
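If you want to see how many name server names a zone currently advertises, a quick sketch like the following works, assuming the third-party dnspython package is installed and using a placeholder zone name:

```python
import dns.resolver   # third-party: pip install dnspython

# "example.com" is a placeholder; substitute the zone you care about.
answer = dns.resolver.resolve("example.com", "NS")
servers = sorted(rr.target.to_text() for rr in answer)

print(f"{len(servers)} NS record(s) published for example.com:")
for name in servers:
    print("  ", name)
```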
There is no risk management case to be made for mixing unicast and anycast. As explained above, anycast isn’t some new fangled widget that you should be wary of and should protect against by also using unicast—anycast is well established technology which is the gold standard in DNS content reachability. Anycast’s benefits to your domains would in fact be undercut by mixing in some unicast name servers since this mixture would open the door for simple minded recursive DNS servers to guess wrong about which of your servers is closest to them and to thereby serve your customers poorly.
This is a rather interesting post following his snide comments yesterday attacking me for having a commercial interest in another issue (I don't but he thought he would make an attack).
The next day he is back admitting that his ‘employer’ competes directly with mine in the provision of outsourced DNS.
Organizing under one or another part of the tax code does not confer moral superiority. Quite the contrary when ‘non-profits’ have turned out to be a mighty profitable vehicle for many employees. When I was setting up Default Deny Security, a distinctly for-profit entity, I was advised that I should incorporate as a non-profit to avoid tax.
I think that Vixie owes me an apology for his previous attack. I won't get it of course but I certainly think it is owed. Perhaps in the future, when he is attacking others over the commercial interests of their employer, he could mention that his own employer is a competitor? If not, I will do so and I will be reminding him of his selective moral compass next time he tries sliming someone else in the same way.
over on http://www.v6.facebook.com where i posted a tickler toward this blog post, there has been some technical quibbling. for example:
my answer to these was:
noting, this circleid thing is a blog, it’s meant for comments, feel free to use it.
I too have been hearing the rumbles of sales pitches claiming diversity between Anycast & Unicast having value. I have to agree with you 100% that the answer should simply be no, or even hell no. While my history only reaches back to the early 90s in the DNS space, diversity should involve connectivity, hosting, or server software, and not fundamental routing methodologies. At UltraDNS we believe as you do, "If speed matters and you're in no mood to compromise then use DNS Anycast." Thank you for taking the time to offer your expert technical opinion on this subject which has too much marketing speak around it.
This could be looked at in two ways. One of them is that this is FUD; the other is that there is confusion on the part of many of the less technical people involved. Back when PKI was a new technology, we had some pretty interesting conversations with customers who had had the misfortune to be informed by the wrong person. There are two issues here: one is reliability and the other is latency. Anycast has an impact on latency, but its first-order impact on robustness is to introduce an additional potential point of failure. If you have your DNS service on one anycast IP address, then your system will be down if either (1) the host your anycast connects to is down or (2) there is no currently reachable host due to misconfiguration of the anycast. Since an anycast domain has more responders, the probability that a part of the network is down will actually increase under anycast. So anycast in and of itself does not eliminate the need for multiple IP addresses to be advertised to resolve the domain. My simulation runs show that the ideal deployment is generally going to be to have multiple IP addresses pointing to multiple independent anycast networks. But there are corner cases and exceptions.
if a system requires N elements in order to function properly then as N increases reliability will decrease. however, that is not how anycast works. for anycast, the system will operate correctly as long as any one (1) instance is reachable, and increases in N actually increase reliability.
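a quick numeric sketch of that point (python; the per-instance outage probability is an assumed figure and the instances are treated as independent): a system that needs all N elements gets worse as N grows, while a system that needs any one reachable instance gets better.

```python
p = 0.01   # assumed probability that any single instance is unreachable

for n in (1, 2, 4, 8, 16):
    needs_all = (1 - p) ** n     # availability if every instance must be up
    needs_any = 1 - p ** n       # availability if any one reachable instance suffices
    print(f"N={n:2d}  all-required: {needs_all:.6f}   any-one-suffices: {needs_any:.10f}")
```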
the interesting argument here, according to me, is whether more anycast instances mean that something going wrong in any one of them can adversely affect overall system reliability. some forms of error such as leaking a low-capacity route to a wide area audience could certainly have that effect. that’s why we who operate anycast DNS infrastructure use stringent change controls and operational monitoring—it’s a risk and should be managed. most forms of error are more like “the sysadmin tripped over the power cord” and in a properly designed anycast system that just reduces the number of instances by one, which temporarily decreases performance but does not have any effect on correctness. since the goal of anycast is to maximize performance over the long run, manageable failures that lead to occasional and isolated reductions in performance are just fine.
true, that. but every one of those addresses should be part of an anycast cloud. sometimes they’ll be a diverse set of anycast clouds representing multiple providers. as i said in the original article, using only a single name server name would be creepy.
do you know of a corner case or exception where a deliberate mix of unicast and anycast DNS is better for a high volume Internet property for whom the availability and performance of DNS is directly linked to revenue?
Yes, I think we are arguing the same case but coming up with slightly different rationales. It all comes down to the exact definition of reliability and how it is measured. That is why I suspect that any noise being introduced in the channel here likely comes from either a salesperson who does not quite understand their product or a customer who did not get what the salesperson said. I don't think we need to accuse folk of FUD. Web browsing is an important Internet application but not the only one. I don't really need more than 5 nines reliability on my Web browser. But I do have other applications that need rather more. Adam Langley at Google recently published some interesting numbers on speed/reliability of DNS resolution of a randomly chosen RR code in DANE. I suspect that the timeout is still pretty long. One of the issues here is that DNS is a very old protocol designed before modern ideas on how to design protocols. If we were going to redesign DNS from scratch, I think it is pretty clear that the client-resolver protocol and the resolver-server protocol have very distinct needs and the one-size-fits-all approach is not optimal. One area that is problematic is that a client has no idea whether the response is coming back late because the resolver never got it, or because the resolver is still waiting for the response. I think we would probably do that in a very different way today.
i wrote, in the original article i tried to make it clear that anycast was not my idea and that i am not claiming inventor or first-use credit:
with all due respect to Mr. Joffe, i’ve now heard privately from others who used anycast commercially earlier than Genuity/GTE. so while Genuity/GTE was first in my experience to talk about anycast at NANOG meetings and to market it as a specific feature, the first commercial use of this technology had already occurred elsewhere. i apologize for not researching this more completely.
Lest we drift too far into tangential discussions, the topic of the article is whether there is a risk management case to be made for mixing in some unicast servers in an otherwise anycasted infrastructure. My response to that so far is no, but yesterday someone pointed out to me that the .COM servers are operated using a mixture of anycast and unicast and this doesn’t seem to be hurting their performance or reliability at all. I had to agree, but with qualifications.
I see a couple of important differences between a TLD name service and an SLD name service. First and foremost, the owner of a web property has very little influence over the service architecture of GTLDs—that’s something ICANN negotiates on behalf of the community, not something a web operator can control. On the other hand, a web operator can directly control the service architecture of their SLD DNS. So to the extent that the owner of a web property cares how their DNS services are delivered, those cares are more directly relevant for their own domain names than for their parent GTLD.
Second and more technically important, the role of the .COM server in DNS is to hand out referrals to SLD servers, whereas the role of most SLD servers is to hand out answers. This means the .COM servers should not see a lot of repeat queries from any single recursive name server. If OpenDNS or Google DNS sees a query for WWW.EXAMPLE.COM and does not know who the EXAMPLE.COM name servers are, they’ll forward the query to the servers for COM, and COM will refer them back to the servers for EXAMPLE.COM. Once that’s been done, all queries handled by the recursive name server for WWW.EXAMPLE.COM or FOO.EXAMPLE.COM or BAR.EXAMPLE.COM will be sent to the EXAMPLE.COM name servers not to the COM name servers. It is thus more vitally important that the EXAMPLE.COM name servers be close by (in terms of the speed of light) EXAMPLE’s customers than for the COM name servers to be close by EXAMPLE’s customers. SLD name servers take more queries for a given SLD than TLD servers do, and every one of those queries is an opportunity to perform well or poorly in terms of round trip time.
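To illustrate why the repeat queries land on the SLD servers, here is a toy sketch (Python; the cache model, names, and query list are invented, and real resolvers are far more involved): once the referral from COM is cached, every later lookup under EXAMPLE.COM goes straight to the EXAMPLE.COM servers.

```python
delegation_cache = {}      # zone cut -> the name servers learned from the referral
queries_to_com = 0
queries_to_sld = 0

def resolve(qname):
    """Crude model: one referral per zone cut, then all answers come from the SLD."""
    global queries_to_com, queries_to_sld
    zone = ".".join(qname.split(".")[-2:])          # crude guess at the zone cut
    if zone not in delegation_cache:
        queries_to_com += 1                         # the one-time referral from COM
        delegation_cache[zone] = [f"ns1.{zone}", f"ns2.{zone}"]
    queries_to_sld += 1                             # the actual answer comes from the SLD

for name in ["www.example.com", "foo.example.com", "bar.example.com",
             "www.example.com", "mail.example.com"]:
    resolve(name)

print(f"queries sent to the COM servers:         {queries_to_com}")
print(f"queries sent to the EXAMPLE.COM servers: {queries_to_sld}")
```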
If we truly thought that simplistic (“bad”) name server selection has become such a small share of the world’s DNS access pattern that we could mix in a couple of unicast name servers without any statistical fear that these will have a performance impact, then this would be an argument in favour of pure unicast, not mixed anycast/unicast. I don’t think anyone is willing to make that argument.
Any time a recursive name server does not get an answer to the first question it asks of the first authority name server it tries, we’re into “retry” territory which means a bad user experience. Mixed anycast/unicast does not change the likelihood of that retry compared to pure anycast. Mixed anycast/unicast does however offer a small chance that a few recursive name servers having simplistic server selection logic will latch onto a unicast name server very far away (as the light flies.)
COM is particularly well run; it’s been many years since it had widely visible downtime. If VeriSign and ICANN have agreed that a mix of unicast and anycast is what’s best, no one can fault the results of that agreement. But that doesn’t make COM a poster child for mixed anycast/unicast in SLD name service, for the reasons given above.
In Anycast DNS, Vincent Bernat has done some elegant lab work to show how to test and evaluate a mixed anycast/unicast DNS service environment. Kudos!