Rackspace Data Center Outage: Cause for Concern or Time for a Change in Strategy?

Rackspace, a popular hosting provider in the cloud, suffered a significant outage on June 29, 2009. Apparently, a power interruption caused their Dallas (DFW-Grapevine) data center to go offline. Rackspace has posted a copy of the incident report here:

http://www.rackspace.com/downloads/pdfs/DFWIncidentReport6-29-2009.pdf

As a consequence, Rackspace expects to issue service credits to customers in the range of $2.5m-$3.5m. In response, Rackspace filed a Form 8-K with the SEC:

http://www.sec.gov/Archives/edgar/data/1107694/000118143109032728/rrd247155.htm

The Rackspace outage is bound to bring questions about the stability of services in the cloud. But should it? The outage that Rackspace (and their customers) experienced could have happened to any data center owner. So, why is Rackspace being held to a different standard?

Whenever a provider fails to deliver a service, any business that relies on that service can be affected. Just as a traditional IT organization would not rely on a single data center, we should not rely on a single data center for the services we leverage in the cloud.

When working in the cloud, the traditional approach to redundancy needs to change. Cloud providers could potentially provide geo-diversity for customers, but the customer should also consider how to provide redundancy across providers. That way, if one provider fails, a second provider is there to pick up the demand.
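
As a rough illustration of redundancy across providers, here is a minimal Python sketch that picks the first provider passing a health check. The endpoint names and the is_healthy callback are hypothetical placeholders, not part of any real provider's API; a production setup would more likely rely on DNS failover or a load balancer.

```python
from typing import Callable, Optional, Sequence


def first_healthy(
    endpoints: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None.

    Endpoints are tried in priority order: primary provider first,
    then each fallback provider.
    """
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated as an unhealthy provider.
            continue
    return None


# Hypothetical endpoints, listed in priority order.
PROVIDERS = ["https://app.provider-a.example", "https://app.provider-b.example"]

# Simulate an outage at the primary provider (assumption for illustration).
down = {"https://app.provider-a.example"}
active = first_healthy(PROVIDERS, lambda url: url not in down)
print(active)
```

With the primary provider marked as down, the call falls through to the second provider. If every provider fails the check, the function returns None and the caller has to decide what to do next.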

In some ways, this potentially reduces the value of an SLA (Service Level Agreement). I will discuss SLA value further in a future blog post.

This redundancy does come at a cost, whether in a cloud-based or a traditional model. A risk assessment and cost-benefit analysis should be performed to better understand the options and the path to take.
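
To make the cost-benefit comparison concrete, here is a simple Python sketch. Every figure in it is a made-up assumption for illustration, not data from the Rackspace incident, and the model is deliberately crude: expected outage cost is hours of downtime per year times cost per hour, and redundancy is assumed to avoid a fixed fraction of that cost.

```python
def expected_outage_cost(outage_hours_per_year: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime with no redundancy in place."""
    return outage_hours_per_year * cost_per_hour


def redundancy_net_benefit(
    outage_hours_per_year: float,
    cost_per_hour: float,
    redundancy_cost_per_year: float,
    risk_reduction: float,
) -> float:
    """Avoided outage cost minus the yearly cost of redundancy.

    risk_reduction is the fraction (0.0 to 1.0) of outage cost that the
    second provider is assumed to avoid. A positive result suggests the
    redundancy pays for itself under these assumptions.
    """
    avoided = expected_outage_cost(outage_hours_per_year, cost_per_hour) * risk_reduction
    return avoided - redundancy_cost_per_year


# Hypothetical inputs: 8 hours of downtime a year at $10,000 per hour,
# $50,000 a year for a second provider, 75% of outage cost avoided.
net = redundancy_net_benefit(8, 10_000, 50_000, 0.75)
print(net)  # 10000.0
```

Under these assumed numbers the second provider comes out slightly ahead. Halve the cost of downtime per hour and it no longer does, which is why the risk assessment has to come first.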