Levi, Ray & Shoup, Inc.

Datacenter outages are still “on”

5/13/2021 by Patrick Schmidt


Samuel Langhorne Clemens, known to most of us as Mark Twain, is one of the most quoted, and misquoted, authors in history. Twain supposedly told a reporter, “Reports of my death have been greatly exaggerated.” The quote may fall somewhere between true and fabricated. The website Snopes might help you decide.

While exactly what he said may be lost to history, it is still a great sentiment that might be coming from the lips of datacenters worldwide – that is, if large rooms filled with high-powered computing equipment could speak for themselves.

In a blog post last October, I wrote about how, according to the Uptime Institute, the datacenter was neither dead nor dying. But you may think to yourself, “We don’t really have a datacenter. We have migrated to a cloud-first model, so that really doesn’t concern us.”

Ironically, your relationship with datacenters may be more important than ever. COVID-19 forced many of us into work-at-home situations and made colocation datacenters and public cloud providers crucial to delivering IT services. For this reason, keeping these facilities up and running is a major, growing concern among senior IT leaders.

So, where do we stand in the middle of 2021? The Uptime Institute is one of the industry leaders in datacenter certification and analysis. It’s like E.F. Hutton was in the 1980s: When they talk, people listen.

What are they saying about datacenter outages? Plenty, and we should take note.

In the Institute’s 2021 Annual Outage Analysis (AOA), there were several key findings:

  • In spite of improving technology and better management of availability, outages remain a major concern for the industry — and increasingly, for customers and regulators. The impact and cost of outages is growing.
  • The causes of outages are changing. Software and IT configuration and network issues are becoming more common, while power issues are less likely to cause a major IT service outage.
  • Human error continues to cause problems. Many outages could be prevented by improving management processes and training staff to follow them correctly.
  • There were fewer serious and severe outages reported in 2020 than in the previous year. While progress in improving reliability and availability is always a factor, this decrease may, in part, be due to changes in IT use and management as a result of COVID-19.

Note that Uptime’s research shows the number of serious and severe outages decreased last year. These are events that result in a disruption or complete failure of service. However, the study also says the actual impact and cost of the reported outages are increasing. How can that be?

During a webinar and roundtable discussion on the report, Andy Lawrence, Uptime Institute Executive Director of Research, addressed the paradox. He remarked, “…the impact and the cost of outages are definitely growing, even if the amount of outages per kilowatt of IT load is dropping. That’s because of growing dependency on IT.”

Even without a worldwide pandemic, our dependency on IT services was growing. The events of early 2020 simply accelerated a process already in motion. Because we depend on a resilient infrastructure to power everything from ATMs to Zoom conferences, a focus on preventing outages needs to begin with the root causes the report identified.

In the past, power supply issues were, by a wide margin, the primary factor in serious or severe outages. Now, software, IT configuration, and network issues play a large role. Much of this shift may be summed up in the third key finding above: “Human error continues to cause problems.”

We all should have confidence in our IT staff. But no one can be an expert at everything, and new technologies often add to confusion and delay in the datacenter. Therefore, there are times when bringing in another subject matter expert is the right thing to do.

Getting back to Mark Twain, he is also reported to have said, “The secret of getting ahead is getting started.” He was right, and it’s easy to get started with LRS IT Solutions. We have a 25-year history of helping our clients stay up and running to meet their business goals.

If you are struggling to implement a particular project or are just starting to evaluate what your next step should be, contact us by filling out the form below and we will match you with one of our seasoned experts.

About the author

Patrick Schmidt is a Technology Lifecycle Management Specialist with LRS IT Solutions. For more than 20 years, he has been helping customers get a firm grasp on their asset and contract management with a combination of comprehensive service level analysis and lifecycle management best practices.