A revolution is taking place on the Internet, with new sites redefining how we interact online. The next-generation Internet is emerging in collaborative and interactive applications and sites with rich, varied media (images, video, music). As with many revolutions, this one is driven by the younger generation, which is adopting social networking sites like MySpace and video sharing sites like Google’s YouTube. But the general shift is not restricted to the young, as more mature consumers and businesses alike are exploring the possibilities of collaborative, media-rich applications.
This major shift in Internet applications has its unintended victims. One of them turns out to be the Domain Name System (DNS). This service is part of nearly every Internet interaction, and operators of major IP networks know that DNS is essential to service performance and availability.
The Internet is poised for its next major burst of growth and usage as billions of telephones, fax machines and PDAs join the desktops, laptops and servers already communicating on the Internet. This is a critical moment for traditional telephone carriers and Internet Service Providers (ISPs) developing next-generation voice, data and multimedia services; missteps or slowdowns can be enormously costly in the race for market and mindshare.
The next-generation Internet applications, as represented by today’s MySpace and YouTube, substantially increase the DNS query load for carriers. As explained below, each visit to a page on one of these sites may require ten times more DNS queries than a visit to a traditional website. As subscribers and traffic to these sites continue to grow, the underlying DNS traffic grows at a much faster rate. And as DNS traffic approaches capacity, the network slows down for all applications and becomes more vulnerable to attacks and malicious traffic.
This issue affects all broadband providers, including DSL providers, cable operators, and wireless/cellular carriers offering IP services. Maintaining service performance and availability in this rapidly-changing world will require re-architecting the DNS infrastructure with much more capacity to sustain the new generation of Internet users and uses.
The Social Bottleneck
There are many traditional reasons why broadband carriers are seeing sustained growth in DNS traffic, including increases in broadband subscribers and increased usage per subscriber. This kind of ‘organic’ growth is to be expected and is part of any carrier’s capacity planning. But broadband operators worldwide are experiencing rapid growth in DNS traffic that cannot be attributed to subscriber growth. And while viruses, worms and attacks cause temporary spikes in DNS traffic, these changes are significant and enduring.
Something else is going on in addition to increases in subscribers and growth in online usage. The next-generation of web applications is one of the key factors, and the story of MySpace is a good example.
The Social Slowdown
MySpace is an extremely popular social networking site. MySpace users post profiles of themselves, to which their friends continually add comments and content in a running blog. Its collaborative nature is representative of the Web 2.0 movement.
A typical MySpace profile page is a rich assortment of images and blog entries posted by friends. Users can post videos and Flash-based content, as well as links to favorite songs in MP3 files. In most cases, each of these content pieces is served from a separate DNS domain; each image posted by a friend, for example, is retrieved from a different hostname, and each distinct hostname requires its own DNS lookup. This means that retrieving and displaying a profile page may require hundreds of DNS lookups in the background, compared to ten or so lookups for a ‘standard’ B-to-C web page.
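As a rough way to see this effect for yourself, the sketch below (plain Python; the URL is a placeholder, and the count is only an approximation of what a browser with a cold cache would do) tallies the distinct hostnames referenced by a page:

```python
# Rough estimate only: count the distinct hostnames referenced by a page's
# src/href attributes, as a proxy for the DNS lookups a browser with a cold
# cache would have to perform.  The URL below is a placeholder.
import re
import urllib.request
from urllib.parse import urlparse

def estimate_dns_lookups(page_url: str) -> int:
    html = urllib.request.urlopen(page_url, timeout=10).read().decode("utf-8", "replace")
    # Pull every absolute URL out of src= / href= attributes.
    refs = re.findall(r'(?:src|href)=["\'](https?://[^"\']+)', html, re.IGNORECASE)
    hosts = {urlparse(u).hostname for u in refs if urlparse(u).hostname}
    hosts.add(urlparse(page_url).hostname)  # the page itself
    return len(hosts)  # one lookup per distinct hostname on a cold cache

if __name__ == "__main__":
    print(estimate_dns_lookups("https://www.example.com/"))
```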
MySpace is one of the most visited sites on the Internet, and each of those page views may generate ten or more times the DNS traffic of a typical web page visit. Here is an important clue to the recent, unusually high increase in DNS traffic. And, alas, there is more to the story than meets the eye.
The Content Distribution Conundrum
The use of content delivery networks (CDNs) puts further strain on the DNS for queries generated by rich-media and social networking sites.
Increasing numbers of websites use content delivery (or content caching) networks to distribute content on the Internet. Content delivery companies use globally distributed servers to deliver rich content and interactive applications most effectively to users around the globe. They do so by caching content locally, and then using the DNS to direct users to the best source for content, based on current performance, load, proximity and availability information. This capability has been critical to helping popular websites deliver a good experience to users across the world.
Content delivery companies want to maintain control over how traffic is directed across their global platforms and to dynamically redirect traffic to the most appropriate servers in real or near-real time. They accomplish this by setting short Time to Live (TTL) values on the DNS records they serve for the embedded links within pages. Caching name servers therefore cannot hold that data long enough to answer subsequent queries for the same content, which puts further strain on the DNS system.
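One way to observe this is simply to look at the TTLs a resolver gets back. A minimal sketch, assuming the third-party dnspython package (version 2.x, where the call is `resolve()`; older releases use `query()`) and placeholder hostnames:

```python
# Sketch: print the TTL a resolver hands back for a few names.  Requires the
# third-party dnspython package (>= 2.0); the hostnames are placeholders.
import dns.resolver

def show_ttl(name: str) -> None:
    answer = dns.resolver.resolve(name, "A")
    print(f"{name}: TTL {answer.rrset.ttl}s, {len(answer.rrset)} address(es)")

for host in ("www.example.com", "images.example.net"):
    try:
        show_ttl(host)
    except Exception as exc:  # NXDOMAIN, timeouts, SERVFAIL...
        print(f"{host}: lookup failed ({exc})")
```

Records served through a CDN often come back with TTLs of seconds or a few minutes, while static sites commonly use hours or days.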
Returning to our MySpace example, assume that twenty kids from the same high school, all using the same Internet provider, want to check out the latest MySpace posting from their favorite musician. Not only does each page download generate hundreds of lookups; in many cases the caching name server cannot reuse cached data and must go back out to the global Internet for fresh answers. The site thus increases DNS traffic, reduces the effectiveness of the DNS server’s cache, and increases the time the server spends waiting for responses from “upstream” authoritative DNS servers.
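A toy model makes the cache effect concrete. If queries for a record arrive at a steady rate, roughly one query per TTL window has to go upstream and the rest can be answered from cache; the figures below are illustrative only:

```python
# Toy model: if queries for one record arrive at `rate_qps` per second and the
# record's TTL is `ttl_s` seconds, roughly one query per TTL window is a cache
# miss and the rest are hits.  The figures are illustrative, not measured.
def cache_hit_ratio(rate_qps: float, ttl_s: float) -> float:
    queries_per_window = rate_qps * ttl_s
    if queries_per_window <= 1:
        return 0.0  # every query misses
    return 1.0 - 1.0 / queries_per_window

# Twenty requests spread over ten minutes (about 0.033 qps):
for ttl in (20, 3600, 86400):  # 20 seconds, 1 hour, 1 day
    print(f"TTL {ttl:>6}s -> hit ratio {cache_hit_ratio(20 / 600, ttl):.0%}")
```

With a 20-second TTL, essentially every one of the twenty requests goes upstream; with a one-day TTL, nearly all of them would be served from cache.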
The improvements that content distribution networks provide to the Internet at large are well worth the trade-off of increased DNS traffic, which is tiny in volume compared to image and video content. By distributing data around the Internet in the most efficient manner (and caching frequently accessed data closer to users), content delivery networks both improve response times for users and reduce the amount of data traveling through core IP networks.
However, this is small comfort to the DNS administrator struggling to handle DNS traffic doubling every year with increasing query latency.
A Sign of Things to Come: The Web 2.0 Bottleneck?
The MySpace example above is not an isolated problem—it is the harbinger of things to come as new uses of the Internet take hold.
The term Web 2.0 refers to this next generation of more collaborative, media-rich websites: podcasts, social networking sites, web applications and wikis among them. YouTube, the popular video sharing site acquired by Google in late 2006, is another example of collaborative, media-rich Internet usage that is wildly popular, relies on content delivery networks, and increases the load on the DNS.
Even if the DNS impact of social networking and Web 2.0 sites only affected users of those sites, the problem would still be serious. Wireless carriers, for example, want to gain subscribers from the demographic that uses MySpace and Facebook. As wireless connectivity increases, wireless users will take advantage of their ‘instant connectivity’ to use location-based social networks. Slow page downloads will reduce the value of the service.
But the ramifications go well beyond the social networking users.
An overloaded DNS server eventually slows down responses to all queries, affecting all other services and users. Carriers have to continuously build out DNS capacity to provide adequate response times. And, as systems run closer to capacity, the network becomes more vulnerable to denial-of-service attacks or to DNS overloads caused by traffic generated by viruses and worms.
For carriers that still rely on general-purpose Berkeley Internet Name Domain (BIND) software for DNS, the growth of DNS load puts them in an impossible situation. A BIND server can only handle so much DNS traffic before it starts behaving erratically, dropping packets, requiring frequent restarts as well as more hardware. And the more servers you add, the greater your ongoing cost of operation. Both CAPEX and OPEX increase—eroding the cost benefits of using open source software.
To make it more complex, none of this is happening in isolation. This server proliferation takes place while carriers are trying to streamline and optimize the IP infrastructure to support network convergence. Triple- and quadruple-play services are adding voice, video and mobility to the IP infrastructure, increasing service level expectations and traffic. And emerging standards like DNSSEC will put further load on the DNS.
A DNS Infrastructure for Web 2.0
Carriers need to shift how they think about the IP naming and addressing infrastructure of their networks, and about DNS in particular. DNS has long been an ‘invisible’ service that performs in the background. Many carriers have relied for years on BIND software, which was designed as a general-purpose DNS server (both authoritative and caching) for enterprises and smaller carriers. In a Web 2.0-enabled world, however, carriers trying to protect the customer experience cannot afford to neglect the DNS.
Instead, they need to make a commitment to building a carrier-grade DNS infrastructure that can handle the rapid growth in traffic without sacrificing performance and resiliency. To do so, they will need to use the most efficient DNS solution possible that maximizes load capacity while minimizing server resources.
A good guideline is to provide enough DNS capacity to handle up to three times current peak load.
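The arithmetic behind that guideline is simple; the peak query rate and per-server throughput below are made-up placeholders, not benchmarks for any particular product:

```python
# Illustrative capacity arithmetic only; the peak query rate and per-server
# throughput are made-up placeholders, not benchmarks for any product.
import math

def servers_needed(peak_qps: float, per_server_qps: float, headroom: float = 3.0) -> int:
    return math.ceil(peak_qps * headroom / per_server_qps)

print(servers_needed(peak_qps=40_000, per_server_qps=30_000))  # -> 4 servers
```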
In addition, the DNS infrastructure must be tightly integrated in the entire network management scheme for optimal availability and reduced operating costs. It needs carrier-grade monitoring and alerting, and robust online configuration and remote management capabilities.
Carriers are not alone in this DNS investment. VeriSign, which operates two of the Internet’s root servers, has recently announced a massive investment in its DNS infrastructure, and is in the midst of increasing capacity 10,000-fold by 2010. The Domain Name System as a whole has proven itself tremendously scalable over the past two decades, during a period of enormous change in IP networks. It will continue to scale and grow to handle the next-generation of Internet and converging IP networks, as long as we give it the attention and care it deserves.
Have you actually correlated the frequency of MySpace / social networking site DNS queries from a particular IP with subsequent explosions in query traffic to other sites? And since quite a lot of the content on the Internet gets served up from very few sites indeed (even, e&oe, Zipf’s law / the 80-20 rule, etc.), is the extent of this traffic perhaps a bit overestimated?
And how about bot / trojan generated dns query traffic / DNS amplification attacks?
Web 2.0 has shown developers some of the power of dynamic usage of the internet’s infrastructure and there are already teams looking at leveraging similar techniques inside DNS.
By creating single points of identity it is possible to publish far richer data than current practices would suggest and this will lead to a substantial increase in DNS traffic as increasing levels of indirection become the norm.
Add to this the increasing prevalence of broadband network access in the developed world and it’s not unreasonable to suggest that the internet is about to undergo a fundamental change in character.
Mr. Tovar wrote: “A BIND server can only handle so much DNS traffic before it starts behaving erratically, dropping packets, requiring frequent restarts as well as more hardware.” I know of no way to get the behaviour Tom describes. There is no erraticness, no packet drops. At ISC we beat the hell out of BIND, both in production on our root, TLD, and enterprise name servers; and in the QA lab; and on developer desktops. If there was a way to get BIND to act like Tom claims here, we’d know about it. ISC also provides commercial support for BIND to a growing fraction of the world’s business community, and if there had ever been a case where one of our Fortune 1200 customers had experienced erraticness and packet drops, you’d ALL know about it.
Mr. Tovar continues: “And the more servers you add, the greater your ongoing cost of operation. Both CAPEX and OPEX increase, eroding the cost benefits of using open source software.” While it’s true that BIND works best inside organizations that have IT departments, it’s also true that the costs of additional cookie-cutter name servers scale very well compared to the costs of additional file, web, and compute servers, which are also part of a successful business’s growth curve. And every customer knows that the heaviest element of TCO is “vendor lock-in”, not staff or equipment. The reason F/OSS software like BIND drives down TCO is because there is no vendor and there is no lock to be in. ISC’s enterprise support customers for BIND and DHCP have told us that they love being in control of their own destiny, and that with ISC’s commercial support now available for BIND, they’ve got a better handle on their TCO than ever before.
In conclusion, Mr. Tovar wrote that “VeriSign, which operates two of the Internet’s root servers, has recently announced a massive investment in its DNS infrastructure,” but does Mr. Tovar know that VeriSign’s two root name servers run BIND? I agree with one conclusion: “The Domain Name System as a whole has proven itself tremendously scalable over the past two decades, during a period of enormous change in IP networks. It will continue to scale and grow to handle the next-generation of Internet and converging IP networks, as long as we give it the attention and care it deserves.” This is why ISC, in partnership with Internet Engines and Nominum, spent three years and five million dollars replacing BIND4/BIND8 with BIND9. This is why ISC has poured four more years and millions more dollars into bringing out new versions of BIND9, of which BIND 9.4.0 is the latest and best (and very much the fastest ever). And it’s why ISC will shortly begin work on BIND10, which will have all of the clustering, embedding, and O&M features our customers tell us are missing in BIND9. The result will be another total refresh of the F/OSS foundation of the DNS industry—a tide that will raise all boats, even Mr. Tovar’s.
Nobody who’s bothered to read the source doubts that BIND 9 is a solid piece of engineering and capable of coping with high loads.
However the problem with next-gen DNS usage is going to be a combination of short-TTL zones, increasing Web 2.0 interactivity, and unforeseen consequences of using NAPTR and SRV resource records creatively in such an environment.
Whilst the net is still server-oriented this effect will be muted, but 100Mb pervasive broadband isn’t that many years down the road and if the majority of broadband providers continue to use dynamic IPs for their customers’ connections the short-TTL zone will become increasingly prevalent.
Someone produce some proper figures quick please.
My guess is DNS traffic growth is down to;
Deployment of broadband.
Peer to peer (if/where they use DNS).
Spam
Antispam measures
Email is a great way of generating DNS requests: for every spambot connecting to our email server, we validate the domain it claims to send email from, we reverse look up its IP address, and we look it up in an online blacklist, pretty much as a minimum.
Since each bot tends to be novel (they rotate them on some botnets, so you never see the same bot from the same network twice in a spam run), you might see several thousand novel bot IP addresses an hour, with three requests each, per mail server. Some people do statistical filtering, and thus look the address up in multiple lists, and weight the results. So 5 or 6 blacklist lookups per email attempted is not uncommon.
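For concreteness, here is a rough sketch of the three per-connection lookups described above, using the third-party dnspython package; the blacklist zone and addresses are only examples, and a real mail server would do this with more care (and caching):

```python
# Rough sketch of the three per-connection lookups described above, using the
# third-party dnspython package.  The blacklist zone is only an example.
import dns.resolver
import dns.reversename

def sender_checks(client_ip: str, claimed_domain: str,
                  dnsbl_zone: str = "zen.spamhaus.org") -> dict:
    results = {}
    # 1. Does the claimed sender domain have MX records?
    try:
        results["mx"] = [r.to_text() for r in dns.resolver.resolve(claimed_domain, "MX")]
    except Exception:
        results["mx"] = []
    # 2. Reverse lookup of the connecting IP address.
    try:
        ptr = dns.resolver.resolve(dns.reversename.from_address(client_ip), "PTR")
        results["ptr"] = ptr[0].to_text()
    except Exception:
        results["ptr"] = None
    # 3. DNSBL query: reversed octets prepended to the blacklist zone;
    #    an answer (typically 127.0.0.x) means the address is listed.
    reversed_ip = ".".join(reversed(client_ip.split(".")))
    try:
        dns.resolver.resolve(f"{reversed_ip}.{dnsbl_zone}", "A")
        results["listed"] = True
    except dns.resolver.NXDOMAIN:
        results["listed"] = False
    return results
```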
Content distribution networks tend to be in the business of fast performance, which means avoiding unneeded DNS round trips. That is probably why I see two-day TTLs on the biggest CDN name servers, when most people are happy with one day.
Maybe I don’t read enough MySpace, but most of that sort of stuff is popular and repetitive, which means it is readily cached and thus low-cost for DNS. Many items per page, from different domains, isn’t a new “Web 2.0” thing; web designers have been that messy for years.
Oh, one other thing: does product placement (like the CEO of a DNS server vendor making statements that knock a competing product) really have a place on CircleID?
Quote from the original article below, with a lot of the fat trimmed off
Unless I’m missing something, that article isn’t about Web 2.0 or its effect on carriers’ DNS resolver traffic, any more than most vendor “white papers” are; it is a straightforward “BIND sucks, buy Nominum CNS instead” sales pitch.
Mr. Tovar, if you’re Nominum’s president and COO, you do need to stop behaving like you are still in your former role at Nominum: VP of worldwide sales and business development.
Amongst the various root server operators, several provide MRTG graphs, e.g. H, K, and M. If more of them did this, providing graphs that went back further in time, and if TLD operators did the same, it would provide some useful data on DNS performance (for security, though, it might be desirable to publish the data with a lag, or smoothed, so as not to give attackers too much real-time information that can be exploited).
The root servers only measure a very specific type of traffic (mostly lost traffic).
The volume of genuine queries is largely unrelated to traffic to the root servers. Most surveys of root server traffic show a small number of abusive users generating a lot of traffic, and a larger number of dispersed broken systems. Genuine traffic is only a small fraction of the total.
Measurement would have to be done by the providers of recursive DNS services. My experience is that most barely monitor beyond “is it working”, let alone produce statistics by origin of traffic.
However, it is easy enough to switch on query logging in BIND on recursive servers. I logged seven seconds’ worth of traffic on ours just now; it was all spam-email-related traffic, but then everyone has gone home from the office, so I’d be surprised if there was much else for it to be doing.
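For anyone who wants to go beyond eyeballing the log (assuming query logging has been turned on, for example with rndc querylog), here is a rough Python sketch of the kind of tally I mean; the log path is a placeholder, and the exact “query:” line format varies between BIND 9 releases, so the pattern is deliberately loose:

```python
# Sketch: tally a BIND query log by record type and flag queries that look
# mail- or blacklist-related.  The log path is a placeholder, and the exact
# "query:" line format varies between BIND 9 releases, so the regex is loose.
import re
from collections import Counter

QUERY_RE = re.compile(r"query: (\S+) IN (\S+)")
MAIL_HINTS = ("dnsbl", "spamhaus", "spamcop", "in-addr.arpa")

def summarise(path: str = "/var/log/named/query.log") -> None:
    by_type, mailish = Counter(), 0
    with open(path) as fh:
        for line in fh:
            m = QUERY_RE.search(line)
            if not m:
                continue
            name, rtype = m.group(1).lower(), m.group(2).upper()
            by_type[rtype] += 1
            if rtype in ("MX", "PTR") or any(h in name for h in MAIL_HINTS):
                mailish += 1
    total = sum(by_type.values())
    print(f"{total} queries, {mailish} look mail/blacklist-related")
    print(by_type.most_common(10))

if __name__ == "__main__":
    summarise()
```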