There’s a saying, “Murphy’s Law: What can go wrong, will. Bell’s Law: Murphy was an optimist.”
Yeah, we all feel that way on days when the planets align for great misfortune, but it’s also an apt saying for all things regarding your computer systems and even your data center maintainability. Things are going to go wrong. Sooner or later, they will.
Change happens. New equipment, increased power requirements, cooling demands, changes in safety and security regulations, consolidations, expansions. All of these change events can trigger a failure. They demand that you have flexible maintainability, because with each change event, there’s a potential for misfortune.
The good news is that you can mitigate your risk of a Bell’s Law type of SNAFU by taking precautions. Experts recommend a few steps you can take to avoid downtime, starting with choosing a data center partner that fits your budget and needs.
Number one on your to-do list is to avoid densely packing racks with energy hogs. Next, trade density for space: energy costs run four to five times the cost of floor space, so spread the load out and aim for about 4 kilowatts per rack. But after you’ve done your part to ensure the best continuity for your own servers, what do you know about the data center you choose to put your servers in? How do you reduce your chances of performance interruptions? Choosing a data center with a level of uptime consistent with your needs is a start.
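To make the density guideline concrete, here’s a minimal sketch of how a per-rack power budget translates into rack count. The 200 kW total load and the 10 kW/rack “dense” comparison figure are hypothetical examples, not numbers from any standard:

```python
import math

def racks_needed(total_load_kw: float, kw_per_rack: float) -> int:
    """Racks required to host a total IT load at a given per-rack power budget."""
    return math.ceil(total_load_kw / kw_per_rack)

# Hypothetical 200 kW IT load:
print(racks_needed(200, 4))   # 50 racks at the recommended ~4 kW/rack
print(racks_needed(200, 10))  # 20 racks if packed densely
```

The same load needs two and a half times as many racks at 4 kW each, but since space is the cheaper resource, that trade usually favors your budget and your cooling headroom.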
The aptly named “Uptime Institute” is a non-profit, unbiased, third-party research, education, and consulting organization focused on improving data center performance and efficiency through collaboration and innovation. Its members are corporations with heavy data center utilization. These corporate members share best practices for high-performance data centers, and through their discussions they’ve defined four “tiers” of fault tolerance (the Bell’s Law thing again), where Tier 1 is the lowest and Tier 4 the highest in terms of data center uptime. Below are their definitions of Tiers 1 through 4, the typical annual outage time for each, and the basic design criteria:
| Tier | Typical annual outage | Basic design criteria | Relative cost |
|------|----------------------|-----------------------|---------------|
| Tier 1 | 29 hours | One path for power and cooling. No redundant components (e.g., spare air-conditioning units). | $ |
| Tier 2 | 22 hours | One path for power and cooling, but with redundant components for both. | $$ |
| Tier 3 | 1.6 hours | Multiple power and cooling distribution paths, but only one active path. If the active path fails, the data center can switch over to the redundant path. | $$$ |
| Tier 4 | 24 minutes (essentially 99.995% availability) | Redundant paths plus “fault tolerance”: if one path fails, the other takes over automatically, covering everything from the electrical power distribution system to the uninterruptible power supply (UPS), backup diesel generation, etc. | $$$$ |
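The downtime figures in the table follow directly from an availability percentage: annual downtime is (1 − availability) × hours per year. Here’s a quick sketch of that arithmetic (assuming an 8,760-hour year, and using the commonly cited Tier 3 availability of 99.982% as the worked example):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; ignores leap years

def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year for a given availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

# Tier 3 is often quoted at 99.982% availability:
print(round(annual_downtime_hours(99.982), 1))  # ~1.6 hours per year
```

Run the same function against your own uptime requirement to see which tier you actually need before paying for the next dollar sign.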
Bell’s Law happens. Do your part to ensure you aren’t set up for colossal failure, and choose a data center partner that fits your needs and budget. Knowing your parameters makes it much easier to plan for disasters and to build contingency plans for recovery.