Removing human error from your data center
In our blog series exploring the top three reasons for data center downtime, you may have noticed a humbling thread underlying many of the top culprits: most data center outages boil down to simple human error.
Whether an outage stems from a lack of structure, a quick change that skipped testing or simply sloppy keying, the human factor has long been recognized as a top contributor to enterprise data center downtime. A recent Uptime Institute survey [1] found that more than 70% of data center and service outages are caused by human error.
Within any system, there is a built-in probability of failure. It’s our goal as operators to do whatever we can to prevent it from ever occurring in a live environment.
Focus leads to data center outage improvements
Fortunately, there is momentum toward industry improvement. We’re seeing slight decreases in errors as operators become more knowledgeable, technology becomes more sophisticated and market consequences become more punishing. This means teams that are focusing their attention on blunting the impacts of errors are making progress.
If you’re an organization that manages its critical infrastructure, eliminating the human factor should be at the very top of your data center outage mitigation efforts.
Three simple steps to minimize human error
- Introduce straightforward, comprehensive procedures such as standard operating procedures (SOPs), methods of procedure (MOPs) and emergency operating procedures (EOPs).
- Invest in a simple yet intelligent systems architecture that can manage critical facilities and self-heal without human intervention.
- Implement disciplined and scalable training practices for teams and individuals who will interact with your intelligent systems.
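To make the first two steps a little more concrete, here is a minimal sketch of what a MOP could look like when it is written as code rather than as a paper checklist. It is an illustration only, not a description of any particular product: the `Step` class, the `run_mop` function and the feed and breaker names are all hypothetical. The idea is that every step is verified before the next one starts, and a failed verification automatically rolls back the steps already taken, without waiting for a person to intervene.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of a hypothetical method of procedure (MOP)."""
    name: str
    action: Callable[[], None]    # performs the change
    verify: Callable[[], bool]    # confirms the change had the intended effect
    rollback: Callable[[], None]  # undoes the change


def run_mop(steps: List[Step]) -> List[str]:
    """Run steps in order. On the first failed verification, roll back
    every step attempted so far (newest first) and stop.
    Returns a log of what happened."""
    log: List[str] = []
    attempted: List[Step] = []
    for step in steps:
        step.action()
        attempted.append(step)
        if not step.verify():
            log.append(f"FAILED {step.name}")
            for prev in reversed(attempted):
                prev.rollback()
                log.append(f"ROLLED BACK {prev.name}")
            return log
        log.append(f"OK {step.name}")
    return log


# Hypothetical example: move load to a second feed, then open the first breaker.
# The dictionary stands in for real facility state, which is an assumption here.
state = {"load_on": "feed_a", "feed_a_breaker": "closed"}
steps = [
    Step("transfer load to feed B",
         action=lambda: state.update(load_on="feed_b"),
         verify=lambda: state["load_on"] == "feed_b",
         rollback=lambda: state.update(load_on="feed_a")),
    Step("open feed A breaker",
         action=lambda: state.update(feed_a_breaker="open"),
         verify=lambda: state["feed_a_breaker"] == "open",
         rollback=lambda: state.update(feed_a_breaker="closed")),
]
log = run_mop(steps)
for line in log:
    print(line)
```

In a real facility the actions and checks would call into monitoring and control systems instead of a dictionary, but the structure stays the same: no step is assumed to have worked, and recovery does not depend on someone remembering the right sequence under pressure.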