
Cloud Failures: It Happens

Jan. 15, 2021

It was the day before Thanksgiving, and I was having visions of pecan pie, turkey, and all the fixings, but first there were chores to do. The house needed vacuuming, but that is easy-peasy with a robotic vacuum cleaner, or so I thought. I pulled out my smartphone and opened the robot’s app. Instead of the robot whirring to life, nothing happened except an error message on my screen: “Not Connecting.” What kind of an error message was that? My first thought was that my Wi-Fi was down. Wrong. Probably a glitch in the robot, so I rebooted it. Wrong again.

The bot came back to life just long enough to tell me to check the help screen. The app’s help screen was no help, and neither was the manufacturer’s website. The help line didn’t help either, but I learned a long time ago that when dealing with cranky technology, the best source of technical support is the tech chatrooms. A little digging revealed an internet interruption caused by the Amazon Web Services (AWS) platform.

For the uninitiated, AWS is Amazon’s cloud-computing and internet infrastructure service, which many refer to as an internet backbone. It earned this title because so many websites, services, and apps depend on it to keep the data flowing. Several sources confirmed the interruption was more serious than a minor inconvenience. It was actually a massive infrastructure outage that caused a huge headache for thousands of online services across the internet.

I started poking around some of the real techno-geek sites and found the outage had affected the online services of major tech companies such as Adobe and Autodesk. The power grid was not immune either; Southern California Edison, for example, took a hit. Amazingly, mainstream news outlets were pretty quiet about the outage and all the affected companies. An AWS statement said the interruption was caused by human error while adding new capacity to Amazon Kinesis, its real-time data processing service.

AWS also reported the outage was limited to only one of its 23 regions and appeared to affect only North America. North America is a pretty big chunk of geography, and that is a little disconcerting in the COVID-19 age, when so many of us rely heavily on internet connectivity for work, play, shopping, and so much more.

Cloud Outages

This lack of evening news coverage about something so important to our society got me thinking about how little most of us understand about what is behind the digital technologies we take for granted. It came into sharper focus when I visited the websites of a couple of the big-tech companies impacted by the outage. There was page after page of customer comments about how badly the companies’ products were performing, but nothing about a cloud outage.

It also made me think about cloud-based technologies and how they are becoming a major player in the dynamic smart grid. Grid components, devices, and other elements are increasingly infused with technology that produces real-time data, and that data must be analyzed to yield meaningful information. Cloud computing is popping up wherever smart management platforms need data analytics to improve the flexibility, reliability, and resiliency of the grid. So, understanding the impact of cloud outages is critical to the digital grid.

The causes of cloud outages can be as simple as a power outage or as complex as a cyberattack, but the most probable cause is human error, as in the November 25 outage. There are also maintenance issues, equipment failures, network problems, and so on. The takeaway here is that cloud outages have been happening and will continue to happen despite the best efforts of everyone involved.

Several experts say not all clouds are equal, and the more redundancy built into a system, the better it will perform. Others recommend that developers run their systems on several clouds, but there is another issue: cloud services are provided by third parties, not by the developers of the applications. Given the sheer size of modern data centers, it is easy to see why everyone relies on these third parties.

As more utilities adopt platforms that merge IT (information technology) and OT (operational technology), they are relying on the cloud for on-demand computing services. Let’s face it, the smart grid’s big data is here to stay, and the shift toward cloud-based computing is already happening. Despite the potential problems, the cloud’s benefits are too valuable to ignore. It’s going to be challenging, but it’s worth it!
