What once was the “World Wide Web” is now turning into more of a “World Wide Logical Star”, where practically all of our infrastructure is controlled by a select few corporations driven by greed and control, working not for the greater good of the internet but for the good of their own pockets.
The Beauty of Decentralisation
Way back at the origins of the Internet, one of the fundamental decisions was to take control away from any one organisation. The Internet is a library of the people who uphold it, and is available at all times as long as both you and the person you are trying to connect to are online.
Do you remember the wild-west of the internet? Visiting strange .com websites hosted by some random propeller head in another country. 1,232 people have visited this site! This is the glory of someone who could spin up some HTTPD server in their garage (much like rajchert.net), and you could visit that site at no extra cost.
What we’ve seen in the last decade is the rise of cloud computing, whereby you would pay someone else with a big fancy data-centre to host your services with a fancy five-nines SLA golden sticker.
Now, I don’t necessarily have an issue with cloud computing, though events we’ve seen recently have proven that developers and operators need to be reminded about the fundamental flaws of centralisation.
Taking Centralisation into Account
Centralisation isn’t a bad thing, though we as developers must be careful about how we utilise it. We can learn from community-operated and relatively autonomous systems when developing not only the services that we operate ourselves but also collectively hosted services.
A good example of this is Mastodon. Mastodon is a social media platform comparable to Facebook and Twitter, except that users can host and customise their own page or profile. You aren’t just accessing mastodon.social like you would with facebook.com but instead navigating between multiple, completely independent servers. By doing this, you create a highly resilient social media platform by giving the operation of that platform to the users who use it.
Decentralisation is also integrated into the fundamental protocols that make up what we know as the internet. BGP (Border Gateway Protocol, super relevant lol), for instance, is a protocol that connects the Internet’s big iron routers and networks to each other by advertising routes. Simply, it’s a small message that says to the router’s neighbours “Hey, I can reach that network! Go this way, and here are the networks you’ll pass through to get there”.
While BGP relies on AS numbers distributed by IANA, the operation of BGP is done by the community that makes up the internet. By design, no one actor can accidentally interrupt BGP (mostly).
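To make the route advertisement idea a little more concrete, here is a minimal sketch in Python of path-vector routing, the general technique BGP is built on. It is not real BGP: the `Router` class and its methods are made up for illustration, the AS numbers and the prefix come from ranges reserved for documentation, and “fewest networks crossed wins” is a big simplification of the policies real operators apply.

```python
class Router:
    """A toy autonomous system (AS) speaking a BGP-like path-vector protocol."""

    def __init__(self, asn: int) -> None:
        self.asn = asn
        self.neighbours: list["Router"] = []
        # prefix -> list of AS numbers to cross, for the best route known so far
        self.routes: dict[str, list[int]] = {}

    def peer_with(self, other: "Router") -> None:
        """Set up a two-way peering session."""
        self.neighbours.append(other)
        other.neighbours.append(self)

    def originate(self, prefix: str) -> None:
        """Announce a prefix that this AS owns (empty path: it is right here)."""
        self.routes[prefix] = []
        self._advertise(prefix)

    def receive(self, prefix: str, path: list[int]) -> None:
        # Loop prevention: ignore any path that already passes through us.
        if self.asn in path:
            return
        best = self.routes.get(prefix)
        # Simplified policy: prefer the path that crosses the fewest ASes.
        if best is None or len(path) < len(best):
            self.routes[prefix] = path
            self._advertise(prefix)

    def _advertise(self, prefix: str) -> None:
        # Prepend our own AS number so neighbours can see the way through us.
        path = [self.asn] + self.routes[prefix]
        for neighbour in self.neighbours:
            neighbour.receive(prefix, path)


# Three networks in a line: a <-> b <-> c
a, b, c = Router(64496), Router(64497), Router(64498)
a.peer_with(b)
b.peer_with(c)
a.originate("203.0.113.0/24")
print(c.routes["203.0.113.0/24"])  # [64497, 64496]
```

Nobody in this sketch holds a master copy of the routing table. Each router only knows what its neighbours have told it, which is exactly why no central operator is needed.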
Even some video games take decentralisation into account! Many Source engine games are well known for their community-hosted game servers and mods. While there is a centralised registry that lists running game servers to simplify the process of joining them as a gamer, it is still absolutely possible to connect to these services without the help of the developers. This is all thanks to well-founded and agreed-upon protocols shared among a community of gamers.
Facebook’s recent outage has already been well described by many sources. The best one I have seen yet is the blog post by Cloudflare’s Celso Martinho and Tom Strickx. I have also found the description by Steve Gibson of GRC, given during Security Now podcast #839, of great value. Facebook has also released a description of what went wrong.
Long story short, due to a misconfiguration error on Facebook’s side, a catastrophic domino-style outage occurred worldwide for approximately 6 hours. Facebook’s BGP routes were withdrawn, which very quickly took their servers (logically) offline. Not only did this take facebook.com down, but also all of their subsidiaries and their internal network. Sysadmins could not access their servers remotely, and so only those with physical access could resolve the issue. There are reports going around that admins had to physically break into datacentres because their security passes would not open doors.
Telstra’s DNS Woes
Telstra, Australia’s leading ISP, last year (2020-08-02) suffered a DNS fault that took out internet service for millions of Aussies. While DNS itself isn’t exactly centralised, the servers that do serve these requests can be.
When you connect your home router to the internet, it makes a couple of requests to pull the configuration it needs to connect you, including which DNS servers to use. (This is a privacy concern in my opinion, but I’ll get into that later.) Most internet users don’t know that DNS servers can be changed, so they tend to stick with whatever is automatically configured. This group of people numbers in the many millions. When Telstra’s DNS servers went down (due to a misconfiguration error, again), so did internet access for a very large portion of Telstra’s clients: everyone who had not configured their DNS to use some other server.
I wouldn’t say that this is a design downside of DNS but rather an implementation issue. DNS has a wonderful tree-like organisation whereby if a server doesn’t know an address, it works through the hierarchy, starting from the root of the directory, until it reaches a server that does. Unfortunately in Telstra’s case, both their primary and secondary DNS servers died at the same time, and fail-over (if it was configured in the first place) failed to save them.
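The tree-like lookup can be sketched in a few lines of Python. This is a toy model under heavy assumptions, not a real resolver: the `ROOT` structure and the `resolve` function are hypothetical, each nested dictionary stands in for a separate name server, and the address comes from a range reserved for documentation.

```python
# Hypothetical zone data: every "server" either answers or refers us onwards.
ROOT = {
    "delegations": {
        "com": {
            "delegations": {
                "example.com": {
                    "records": {"www.example.com": "192.0.2.10"},
                },
            },
        },
    },
}


def resolve(name: str, server: dict = ROOT, hops: int = 0) -> tuple[str, int]:
    """Walk from the root towards the server that is authoritative for `name`.

    Returns the address and the number of referrals that were followed.
    Raises LookupError if no server in the chain knows the name.
    """
    records = server.get("records", {})
    if name in records:
        return records[name], hops
    for zone, child in server.get("delegations", {}).items():
        # Follow the referral if the name sits inside the delegated zone.
        if name == zone or name.endswith("." + zone):
            return resolve(name, child, hops + 1)
    raise LookupError(f"no server knows {name}")


print(resolve("www.example.com"))  # ('192.0.2.10', 2)
```

No single server needs to know every name. Each one only has to know who to ask next, so the design itself has no central point of failure; the trouble starts when everybody relies on the same resolver to do the walking.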
The takeaway from this is to host your own DNS forwarder within your network, and to specify more than two DNS servers that are not all within the same organisation.
A large majority of routers will allow for more than two DNS servers, though if yours doesn’t, just use one address each from two different providers!
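As a rough illustration of why spreading resolvers across organisations helps, here is a minimal Python sketch of fail-over between providers. The three addresses are the well-known public resolvers of Cloudflare, Google and Quad9, chosen purely as examples. The `lookup_with_failover` function and the `query` callback are hypothetical: `query` stands in for a real DNS lookup (which would need a DNS library), so that the fail-over logic can be shown on its own.

```python
from typing import Callable

# Public resolvers run by three different organisations (example choices).
RESOLVERS = {
    "Cloudflare": "1.1.1.1",
    "Google": "8.8.8.8",
    "Quad9": "9.9.9.9",
}


def lookup_with_failover(
    name: str,
    query: Callable[[str, str], str],
    resolvers: dict[str, str] = RESOLVERS,
) -> tuple[str, str]:
    """Try each resolver in turn; return (provider, address) from the first that answers.

    `query(resolver_ip, name)` is a stand-in for a real DNS lookup and is
    expected to raise OSError when the resolver cannot be reached.
    """
    errors = []
    for provider, resolver_ip in resolvers.items():
        try:
            return provider, query(resolver_ip, name)
        except OSError as exc:
            errors.append(f"{provider} ({resolver_ip}): {exc}")
    raise OSError("all resolvers failed: " + "; ".join(errors))


def fake_query(resolver_ip: str, name: str) -> str:
    """Simulated lookup in which the first provider is having an outage."""
    if resolver_ip == "1.1.1.1":
        raise TimeoutError("simulated outage")
    return "192.0.2.10"  # documentation-range address


print(lookup_with_failover("www.example.com", fake_query))  # ('Google', '192.0.2.10')
```

If every entry in the list belonged to the same organisation, a single misconfiguration on their side could make every attempt fail at once, which is the situation Telstra’s customers found themselves in.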
The Beauty Behind Email
Isn’t it just nice when a protocol is designed so well that its developers can leave it alone to operate and just put it out there for free? God bless SMTP.
Someone running their own mail server doesn’t rely on any third-party service to transport their messages. Google could be offline (like Facebook), but you would still receive emails!
When we design protocols to operate completely decentralised, we save greatly on the operational cost of running that service and improve service resilience.