House of Cards

	By Earl Zmijewski VP and General Manager, Internet Data Services
	August 28, 2010

Time flies. Although it was over 18 months ago, it seems just like yesterday that a small Czech provider, SuproNet, caused global Internet mayhem by making a perfectly valid (but extremely long) routing announcement. Since Internet routing is trust-based, within seconds every router in the world saw this announcement and tried to pass it on. Unfortunately, due to the size of this single message, quite a few routers choked—resulting in widespread Internet instability. Today, over a year later, we were treated to a somewhat different version of the exact same story.

First, let’s review the Czech incident from February 2009. There were many positives to take away:

It was precipitated by an honest mistake.
It was an extremely unlikely event, as many stars had to be in exact alignment.
Most of the Internet’s core survived.
The response from operators was fast and efficient, with the damage largely contained within an hour.

The complete technical details can be found here.

Deja vu all over again

Fast forward to today: Friday, 27 August 2010. What do you think would happen if another large and unusual routing announcement were made on the Internet? Do you think all the router vendors have perfected their code in the past 18 months? Do you think the entire planet has upgraded to this new, improved and perfect code base? Do you think it makes sense to use the Internet as your testbed? I doubt you answered “yes” to any of these questions.

We’ll begin to describe what happened today with a snippet from a private mailing list. We’ll purposely leave out the technical details so that we don’t inadvertently contribute to the building of a Cybernuke.

On Friday 27 August, from 08:41 to 09:08 UTC, the RIPE NCC Routing Information Service (RIS) announced a route with an experimental BGP attribute. During this announcement, some Internet Service Providers reported problems with their networking infrastructure.

Immediately after discovering this, we stopped the announcement and started investigating the problem. Our investigation has shown that the problem was likely to have been caused by certain router types incorrectly modifying the experimental attribute and then further announcing the malformed route to their peers. The announcements sent out by the RIS were correct and complied to all standards.

Um, while standards compliance is nice, it is foolhardy to assume that all BGP implementations are perfectly compliant, especially given recent history. Over 3,500 prefixes (announced blocks of IP addresses) became unstable at the exact moment this “experiment” started. Not surprisingly, they were located all over the world: 832 in the US, 336 in Russia, 277 in Argentina, 256 in Romania and so forth. We saw over 60 countries impacted by a “correct” announcement that “complied with all standards”. The following graph shows the timeline of the event, followed by a map of the impacted countries by prefix count. Notice that it takes a bit for the Internet to stabilize after RIPE claims to have withdrawn the announcement at 09:08 UTC.
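For readers wondering what “complying with the standards” involves here, the following is a minimal Python sketch of how RFC 4271 expects a BGP speaker to treat an optional path attribute it does not recognize: a transitive one is passed along with its value untouched and the Partial bit set, while a non-transitive one is quietly dropped. This is an illustration only. It is not the code of any router and it does not reproduce the announcement in question; the function names and the type code 200 are hypothetical.

```python
from typing import Optional, Tuple

# Attribute flag bits as defined in RFC 4271, section 4.3.
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def parse_attribute(buf: bytes) -> Tuple[int, int, bytes, int]:
    """Decode one path attribute: (flags, type_code, value, bytes_consumed)."""
    if len(buf) < 3:
        raise ValueError("truncated attribute header")
    flags, type_code = buf[0], buf[1]
    if flags & EXTENDED_LENGTH:
        # Two-octet length field.
        if len(buf) < 4:
            raise ValueError("truncated extended-length header")
        length = int.from_bytes(buf[2:4], "big")
        header = 4
    else:
        # One-octet length field.
        length = buf[2]
        header = 3
    if len(buf) < header + length:
        raise ValueError("attribute length exceeds available data")
    return flags, type_code, buf[header:header + length], header + length


def encode_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Re-encode an attribute so the length field always matches the value."""
    if len(value) > 255:
        flags |= EXTENDED_LENGTH
        length = len(value).to_bytes(2, "big")
    else:
        flags &= ~EXTENDED_LENGTH & 0xFF
        length = bytes([len(value)])
    return bytes([flags, type_code]) + length + value


def forward_unrecognized(buf: bytes) -> Optional[bytes]:
    """Return what should be sent to peers for an attribute we do not recognize."""
    flags, type_code, value, _ = parse_attribute(buf)
    if not flags & OPTIONAL:
        # An unrecognized well-known attribute is an error, not something to forward.
        raise ValueError("unrecognized well-known attribute")
    if not flags & TRANSITIVE:
        return None  # optional non-transitive: ignored and not passed along
    # Optional transitive: same value, Partial bit set.
    return encode_attribute(flags | PARTIAL, type_code, value)


# Hypothetical attribute: optional + transitive, type code 200, 300-octet value.
original = encode_attribute(OPTIONAL | TRANSITIVE, 200, bytes(300))
forwarded = forward_unrecognized(original)
```

The point of the sketch is the invariant, not the details: whatever a router does with an attribute it does not understand, the bytes it sends onward must still parse. A receiver that gets an attribute whose length field and contents disagree has little choice under the base specification but to treat the UPDATE as an error.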

Conclusions

On the positive side, the incident was very brief, the damage was limited to under 2% of the Internet and the responsible parties quickly fessed up, aborting their “experiment”. On the negative side, the Internet remains a very fragile place, even if that fragility is highly localized and different in different places. Standards aren’t followed, code isn’t tested and people make mistakes. That’s life with any complex system and, while we can certainly do a better job, we will continue to see these types of events no matter what safeguards we might take. What puzzles me is how anyone thought it might be a good idea to test fate in this way. The end result was completely predictable.

Comments

RE: House of Cards Fergie – Aug 28, 2010 1:05 AM

Well, the announcement was re-posted to the NANOG list earlier today:

http://mailman.nanog.org/pipermail/nanog/2010-August/024837.html

And as an aside, I see that Cisco posted a security advisory regarding this late this afternoon:

http://www.cisco.com/warp/public/707/cisco-sa-20100827-bgp.shtml

- ferg

Internet Robustness John Curran – Aug 29, 2010 2:09 AM

On the negative side, the Internet remains a very fragile place, even if that fragility is highly localized and different in different places.

Not true. The fact that the impact was very small, unnoticed by the vast majority of users, and quickly fixed means that “The Internet” was just fine, and as robust as ever. Perhaps the point was that “Global Internet routing remains one of the most complex systems ever built, and subject to degradation just as any other highly complex system”?

/John

The bright colors don’t tell the whole story Suresh Ramasubramanian – Aug 30, 2010 3:10 PM

All due respect to Renesys but some weightage for the percentage of significant ASNs affected should have been built in.

> 500 ASNs in the USA is a drop in the bucket. > 20 ASNs in another country with fewer networks might be most of the country.

Some study along those lines might have painted the bright colors elsewhere on the map.
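The weighting suggested in this comment can be sketched in a few lines of Python: divide each country's affected count by that country's total, then rank by the resulting share instead of by the raw count. The country names and all four numbers below are made up for illustration; they are not measurements from this event, and `rank_by_share` is a hypothetical helper.

```python
def rank_by_share(affected, totals):
    """Return (country, share) pairs, largest affected share first."""
    shares = {
        country: count / totals[country]
        for country, count in affected.items()
        if totals.get(country)  # skip countries with no known (or zero) total
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers for illustration only.
affected = {"Country A": 500, "Country B": 20}
totals = {"Country A": 14000, "Country B": 30}

for country, share in rank_by_share(affected, totals):
    print(f"{country}: {share:.1%}")
```

With these placeholder figures, Country B ranks first even though its raw count is 25 times smaller, which is the effect the comment describes.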
