Last night a major partner of ours, CloudFlare, suffered an upstream network issue that caused serious performance problems for our Business hosting service and for millions of sites around North America. The issue lasted less than an hour, but the problem itself wasn’t the interesting part. What was interesting was the way the network adapted and kept access open for as many of our clients as possible during the outage.
Our Business hosting platform relies on a small number of partners for different services. DNS is one of the core services, though it is invisible to most users. DNS is the service that translates a name like ‘www.nerdsonsite.com’ into the actual IP address that the website currently lives on. If a DNS service goes down, your computer cannot convert the domain name to an IP address, and the site will fail to load for you. Another important service our team relies on is a CDN (Content Delivery Network – http://bit.ly/PyflBh), which speeds up our clients’ websites even if you are on the other side of the world from our hosting servers.
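To make the DNS step a little more concrete, here is a minimal Python sketch of that name-to-address translation. It is only an illustration, not how our platform or CloudFlare actually performs lookups. The function name `resolve` and its `lookup` parameter are made up for this example; the real work is done by the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, unique IP addresses for hostname.

    `resolve` and `lookup` are hypothetical names used for this sketch.
    By default the lookup is the standard library's socket.getaddrinfo.
    An empty list means the name could not be translated, which is what
    a visitor experiences when a DNS service is unavailable.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The resolver could not turn the name into an address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the first element of sockaddr is the IP address.
    return sorted({info[4][0] for info in results})
```

Calling `resolve("www.nerdsonsite.com")` on a machine with working DNS should return one or more IP addresses. When the lookup fails, the empty list is the programmatic version of a site that will not load.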
Last night, CloudFlare suffered what they termed a ‘route leak upstream’ that affected several of their data centres. (No, I’m not sure what a ‘route leak’ means either.) Because our hosting services rely on such a distributed network (we have support servers all around the globe), our clients were only affected in localized areas. Most hosting providers run their own localized networks around their own servers, which means that when they have an outage their websites are down globally. Last night’s performance issue with CloudFlare only caused outages for our clients in North America and parts of Europe. While this was a sizeable outage to be sure (though short-lived), we noticed that our clients’ sites were loading properly in all other parts of the world. For example, visitors in the Asia Pacific Rim had no issues accessing our clients’ sites during the performance lag.
Our team believes in the distributed nature of partners such as CloudFlare and Amazon, as they help us keep our services up in as many places as possible even during major events. In this case, the distributed nature of our networks meant that clients on the other side of the globe (during their working hours, no less!) didn’t notice the outage on our side of the world.