Data Center Exposure and Recovery in New York City

Hurricane Sandy provided a fascinating opportunity to study both the level of disaster planning and the resilience of New York City data centers. This article examines a) what actually happened, b) what the risk was, and c) what lessons were learned.

What Actually Happened?

Simply put, data centers in New York were caught off guard. Consider these incidents.

Internap and Peer 1, located at 75 Broad Street, suffered basement-level flooding which knocked out diesel fuel pumps.

 

Datagram, located at 33 Whitehall, experienced the exact same problem – 5 feet of water in the basement. As a result, several high-profile blogs and numerous websites went dark.

Both of these facilities are located in a Zone A flood zone. Zone A is FEMA’s second highest risk category.

Then there were fuel supply issues. Fog Creek, which makes and hosts Trello, Copilot and other popular platforms, is hosted at Peer 1 and had to assemble a bucket brigade to carry diesel fuel up 17 stories to refuel a generator there. As a precaution, Trello was moved to Amazon Web Services and seems to have suffered limited downtime, but the bucket brigade was still required.

Shoretel, the VoIP provider, had 3 data centers, all in lower Manhattan, including one at 75 Broad St. That site did successfully switch over to generator power, but due to “city restrictions” the generators were shut down. 700 customers went down.

Fortunately, things did not get worse for Fog Creek, but carrying 5-gallon buckets of diesel fuel up 17 stories in a building with power problems strikes us as a recipe for something truly horrible.

 

Photo: Teams from Squarespace fill buckets with diesel fuel to haul them up 17 stories to the generator keeping the data center online. Staff from Peer 1, Squarespace and Fog Creek Software have formed this unusual Internet bucket brigade. (Photo via Squarespace)

A typical rack of servers requires 5 to 10 kW of power, including cooling/HVAC. Typical data centers range in size from 5,000 to 40,000 square feet. A mid-sized facility at 20,000 sq ft would house about 600 racks. At 5 to 10 kW per rack, that equates to 3 to 6 megawatts (MW), or roughly 5 MW of power. A reasonably efficient diesel generator would require roughly 200 gallons of diesel per hour to push out 5 megawatts, which is a bit over 3 gallons per minute.

Typically, data centers tell us they have 1 week of diesel onsite and a resupply contract. A full week for a 20,000 sq ft data center is about 34,000 gallons (200 gallons per hour for 168 hours). We suspect that in lower Manhattan, the standard was more like 1 day. Then resupply problems hit because of street flooding and road and bridge closures.
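The fuel arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the rack count, per-rack draw and 200 gallons per hour burn rate are the article's own estimates, while the function and constant names are made up for illustration.

```python
# Back-of-the-envelope standby fuel estimate built from the article's figures.
# The function and constant names are illustrative, not from any standard tool.

RACKS = 600                # racks in a 20,000 sq ft facility (article's figure)
KW_PER_RACK = (5, 10)      # low/high draw per rack incl. cooling (article's figure)
BURN_GAL_PER_HOUR = 200    # article's estimate for a roughly 5 MW load


def load_range_mw(racks, kw_per_rack):
    """Return the (low, high) facility load in megawatts."""
    low_kw, high_kw = kw_per_rack
    return racks * low_kw / 1000, racks * high_kw / 1000


def fuel_gallons(burn_gal_per_hour, hours):
    """Fuel needed to run for the given number of hours."""
    return burn_gal_per_hour * hours


def runtime_hours(tank_gallons, burn_gal_per_hour):
    """How long a given amount of onsite fuel lasts."""
    return tank_gallons / burn_gal_per_hour


if __name__ == "__main__":
    low_mw, high_mw = load_range_mw(RACKS, KW_PER_RACK)
    print(f"Facility load: {low_mw:.0f} to {high_mw:.0f} MW")
    print(f"Burn rate: {BURN_GAL_PER_HOUR / 60:.1f} gallons per minute")
    print(f"One week onsite: {fuel_gallons(BURN_GAL_PER_HOUR, 7 * 24):,} gallons")
    print(f"One day onsite: {fuel_gallons(BURN_GAL_PER_HOUR, 24):,} gallons")
```

Swapping in a facility's real burn rate and tank size for the assumed constants gives a quick check on how long it could run if resupply were cut off.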

 

What was the Risk?

The Mid-Atlantic States do not see nearly as many hurricanes as the Southeast and the Gulf Coast of the United States. The average return period for hurricanes within 50 miles of New York City is 18 to 19 years.
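To put the return period in more familiar terms, it can be converted into a rough chance per year and a chance over a planning horizon. The sketch below assumes each year is independent with the same probability, which is a simplification we are adding for illustration; it is not part of the underlying hurricane statistics, and the function names are hypothetical.

```python
# Rough conversion of a return period into probabilities.
# Assumption (ours, for illustration): every year is independent and has the
# same chance of a hurricane passing within 50 miles.


def annual_probability(return_period_years):
    """Approximate chance of at least one event in any single year."""
    return 1.0 / return_period_years


def probability_within(return_period_years, horizon_years):
    """Chance of at least one event over a multi-year horizon."""
    p = annual_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years


if __name__ == "__main__":
    for period in (18, 19):
        yearly = annual_probability(period)
        decade = probability_within(period, 10)
        print(f"{period}-year return period: {yearly:.1%} per year, "
              f"{decade:.0%} over 10 years")
```

Under that assumption, an 18 to 19 year return period works out to between 5% and 6% in any given year, which is far from negligible over the life of a data center lease.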

For most of hurricane season, the typical hurricane tracks observed by NOAA take these storms out to sea at the more northern latitudes of the NYC area.

Here are the July, August and September typical tracks:

[Figure: typical hurricane tracks, July]

[Figure: typical hurricane tracks, August]

[Figure: typical hurricane tracks, September]

But look at how this changes in October:

[Figure: typical hurricane tracks, October]

 

And notice how closely Hurricane Sandy lined up with the typical October track.

[Figure: Hurricane Sandy track]

 

Finally, what about the frequency of storm origin in October? Compare the frequency map for storms originating August 21 – 31, the peak of hurricane season, with the October 11 – 20 origin map:

[Figure: storm origin frequency, August 21 – 31]

[Figure: storm origin frequency, October]

You can see that activity is lower in October, but it is hardly dormant, as it is a few weeks later:

[Figure: storm origin frequency, November 21 – 30]

Just as August and September are the periods of greatest risk in the Southeast and the Gulf Coast, October clearly presents the greatest risk of hurricanes in NYC.

What is the solution?

If these providers had built to the following standards, downtime would have been minimized:

  • One week of fuel for standby power onsite
  • A resupply plan for fuel in place – or
  • A redundant or backup site several hundred miles away

For any disaster recovery, hosting or colocation solution, we would look to the Uptime Institute, which publishes the Data Center Site Infrastructure Tier Standard for Operational Sustainability.

Based on their standard, we’d offer the following. Red indicates higher risk profile of Lower Manhattan.

Disaster Risk Component   | Higher Risk                   | Lower Risk
Flooding and Tsunami      | < 100 Year Flood Plain        | > 100 Year Flood Plain
Hurricanes and Tornadoes  | High                          | Medium
Seismic Activity          | Zone 3 or 4                   | Zone 2A or 2B
Airport/Military Airfield | < 3 miles from active runway  | > 3 miles from active runway
Adjacent Properties       | Chemical plant, etc.          | Office buildings, land
Transportation Corridors  | < 1 mile                      | > 1 mile
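As a rough illustration, the table can be turned into a simple screening checklist. The sketch below is our own simplification, not part of the Uptime Institute standard: the `SiteProfile` fields and the `higher_risk_components` function are hypothetical names, "< 100 Year Flood Plain" is read as "inside the 100-year flood plain", and the example site is invented.

```python
from dataclasses import dataclass


@dataclass
class SiteProfile:
    """Hypothetical description of a candidate data center site."""
    in_100_year_flood_plain: bool
    hurricane_tornado_risk: str        # "high" or "medium"
    seismic_zone: str                  # e.g. "2A", "2B", "3", "4"
    miles_to_active_runway: float
    hazardous_neighbor: bool           # chemical plant or similar next door
    miles_to_transport_corridor: float


def higher_risk_components(site):
    """Return the table rows where the site falls in the Higher Risk column."""
    checks = [
        ("Flooding and Tsunami", site.in_100_year_flood_plain),
        ("Hurricanes and Tornadoes", site.hurricane_tornado_risk.lower() == "high"),
        ("Seismic Activity", site.seismic_zone in {"3", "4"}),
        ("Airport/Military Airfield", site.miles_to_active_runway < 3),
        ("Adjacent Properties", site.hazardous_neighbor),
        ("Transportation Corridors", site.miles_to_transport_corridor < 1),
    ]
    return [name for name, is_higher in checks if is_higher]


if __name__ == "__main__":
    # Invented example values, not measurements of any real facility.
    example = SiteProfile(
        in_100_year_flood_plain=True,
        hurricane_tornado_risk="high",
        seismic_zone="2A",
        miles_to_active_runway=5.0,
        hazardous_neighbor=False,
        miles_to_transport_corridor=0.5,
    )
    for component in higher_risk_components(example):
        print("Higher risk:", component)
```

A checklist like this only flags which rows deserve a closer look; the standard itself and a proper site survey remain the authority.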

 

To review your site’s risk of various natural disasters, see our Natural Disaster Risk Maps.

 

Disaster Recovery as a Service

mJobTime Systems: Server Backup enabled production through Hurricane Rita

In September 2005, less than one month after Katrina, Hurricane Rita loomed on the horizon, threatening an already tattered Gulf Coast economy. But well before knowing when his Business Continuity Plan would be put to the test, Mike Soniat, President of mJobTime Systems in Beaumont, Texas, had implemented a robust GDV solution to protect his data and operations. When the storm hit, Global Data Vault’s Server Backup enabled production through Hurricane Rita without interruption, safely away from the extreme weather that inundated the company’s home location in Beaumont, Texas.

[Image: Hurricane Rita]

“The challenge was finding a place with an internet connection for my laptop. Ultimately, I found myself in a crowded coffee shop in Baton Rouge…a town already overrun with Katrina victims,” Mike said. “I was relieved to be able to restore server data to my laptop. That was one of the few happy moments in an otherwise trying time.”

mJobTime Systems

mJobTime Systems is located in Beaumont, Texas, just 30 miles from the coastal town of Sabine Pass. mJobTime Systems develops accounting and job cost software for contractors. Using real-life scenarios, mJobTime Systems’ team of experienced developers and accountants brings to market solutions and services that help those businesses manage their jobs or projects with ease, accuracy, and efficiency. To preserve the continuity of its revenue, mJobTime Systems must safeguard the software it develops for its customers.

Evacuation

On September 24, 2005, Rita made landfall between Sabine Pass, Texas and Johnsons Bayou, Louisiana, a region not unfamiliar with natural disasters. After Hurricane Audrey in 1957, many people and businesses migrated north to what is known today as Beaumont. In the hours leading up to Rita, the people of Beaumont evacuated. Mike arrived in Baton Rouge an evacuee, adding himself to the list of already displaced hurricane victims who had sought shelter there just a few weeks earlier.

“It was difficult just to find a place to sleep. All the hotels were booked. The traffic made it impossible to get anywhere. There were so many people packed into the city that all grocery store parking lots were full. I had spent hours trying to find a place to work. Finally, I found a coffee shop that just happened to have an internet connection I could access.”

Results

Because he used Global Data Vault’s Server solution, Mike was able to download the files he needed to continue with operations and production, and he met all necessary deadlines, despite the obstacles and unusual circumstances. mJobTime Systems is a small business that relies on its ability to produce and maintain software for its customers continually, day to day.

“We care about our customers. We need to be there for them. We stick to our promises. Our jobs and their jobs depend on it.”

Few disasters reach this scale, but mJobTime’s preparedness and Global Data Vault’s Server Backup solution serve as an excellent model for Business Continuity Planning for any organization that relies on Information Technology.