768K day - the day the Internet breaks (again)?
Article by ThousandEyes VP of product marketing Alex Henthorn-Iwane.
Nearly five years ago, on the 12th August 2014, the Internet ‘broke’ when Verizon’s introduction of thousands more routes pushed the global routing table for the fourth version of the Internet Protocol (IPv4) to its full capacity of 512,000 entries. IPv4 comprises a group of rules that define how data is sent and received over a network, and any disruption to it has major consequences. In this case, as routers had only been prepared for 512,000 entries, they crashed, which in turn created significant packet loss and traffic outages across the whole Internet. The incident became known as “512k day”.
Despite this lesson, it might be about to happen again.
Following 512k day, a new upper limit was set at 768,000 entries (colloquially, 768k), although RIPE NCC, the Regional Internet Registry for Europe, states the ‘k’ should be interpreted as 1024, so the boundary is estimated to be at 786,432. With the current number of IPv4 prefixes believed to be around 770,767, this new capacity could, theoretically, be reached at any time.
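To make the arithmetic concrete, here is a minimal sketch that compares the prefix count quoted above with both readings of "768k". It takes the binary 'k' as 1024, which is the value that yields 786,432. The function name headroom is a hypothetical helper introduced for illustration only, and the sketch says nothing about how any particular router allocates its memory.

```python
# Illustrative only: two readings of "768k" and the headroom left under each.
DECIMAL_LIMIT = 768 * 1000  # 768,000 entries
BINARY_LIMIT = 768 * 1024   # 786,432 entries


def headroom(prefix_count: int, limit: int) -> int:
    """Return how many more entries fit below the limit (negative if exceeded)."""
    return limit - prefix_count


current_prefixes = 770_767  # figure quoted in the article

print(headroom(current_prefixes, DECIMAL_LIMIT))  # -2767
print(headroom(current_prefixes, BINARY_LIMIT))   # 15665
```

Under the decimal reading the quoted count is already past the limit, while under the binary reading roughly 15,000 entries of headroom remain, which is why the boundary is best treated as an estimate.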
Moving back to 2014, large companies such as Verizon, AT&T, Comcast, Sprint and BT all experienced significant outages, drastically hampering their business and affecting their customers. The ripple effect from these outages highlighted the outdated equipment and lack of safety precautions among network providers, causing Internet service providers (ISPs) and Internet organisations to jump into action. Network engineers put damage control measures in place, increasing the upper limit.
However, the potential significance of 768k day depends on the additional steps these engineers took to protect their network infrastructure. In fact, there is disagreement over whether it will have the same effect this time around. Some network engineers believe larger providers will have taken the necessary precautions, such as replacing old routers with new gear or making firmware tweaks that allow devices to handle global Border Gateway Protocol (BGP) routing tables exceeding even 768,000 routes. Many, however, still expect significant outages.
The main risk comes from smaller ISPs, data centers and other providers who make up the fabric of the Internet. If these experience outages then we will see a ripple effect once again. Organisations likely to be hardest hit include corporate end-user companies who may be caught unaware and wouldn’t have had the resources to prepare.
With so many aspects making up the fabric of the Internet, these smaller companies still carry a substantial amount of service traffic across ‘soft spots’ where maintenance may be neglected. With this in mind, it’s very likely that enterprises will see some issues or outages due to 768k Day, along with the many outages that occur across the Internet on a day-to-day basis.
As digital experience has emerged as a key competitive business factor, and by 2020 is predicted to overtake price and product, the consequences of an outage are even greater than in 2014. The need for agility in today’s digital world means most companies can only realistically provide a reliable, seamless digital experience to their customers and employees by leveraging cloud and third-party technologies.
Despite how critical the digital experience is becoming, by relying on a wide variety of third parties, companies are creating a digital footprint with a host of security issues and vulnerabilities that they may simply be unaware of, through no fault of their own. By widening their digital footprint, companies are increasing the chance of their network and services being affected by an outage at any one of their many providers.
Outages we’d expect to see from a 768k day issue would resemble outages in which monitoring tests crossing various router interfaces show total packet loss. A recent outage of this nature saw packet loss on several interfaces in the Cogent (AS 174) network in the San Francisco Bay area, which affected peer ISPs like Comcast, as well as services like Amazon, Verisign, and 8×8.
In order to ensure that routers can process paths over the 768k limit, it’s essential to carry out preventative measures, not only for business but for customers too.
Having visibility into the whole network, whether on premises or in the cloud, allows network engineers to identify weak spots, see potential outages as they appear, and re-route traffic so that services and customers are not affected.