Whether they are natural, like Hurricanes Sandy and Irene, or man-made, like the recent data center outages, disasters happen. The question isn’t whether they will happen. The question is: What can be done to avoid the next one? Cloud computing provides a significant advantage in avoiding disaster. However, simply leveraging cloud-based services is not enough. First, a tiered approach to leveraging cloud-based services is needed. Second, a new architectural paradigm is needed. Third, organizations need to consider the full range of issues they will contend with.
Technology Clouds Help Natural Clouds
If used correctly, cloud computing can significantly limit or completely avoid outages. Cloud offers a physical abstraction layer and allows applications to be located outside of disaster zones where services, staff and recovery efforts do not conflict.
- Leverage commercial data centers and Infrastructure as a Service (IaaS). Commercial data centers are designed to be more robust and resilient. Prior to a disaster, IaaS provides the ability to move applications to alternative facilities out of harm’s way.
- Leverage core application and platform services. This may come in the form of PaaS or SaaS. These service providers often architect solutions that are able to withstand single data center outages. That is not true in every case, but combined with other changes, this approach mitigates the risks.
In all cases, it is important to ‘trust but verify’ when evaluating providers. Neither tier provides a silver bullet. The key is to take a multi-faceted approach that architects services with the assumption of failure.
Changes in Application Resiliency
Historically, application resiliency relied heavily on redundant infrastructure. Judging from the responses to Amazon’s recent outages, users still make this assumption. The paradigm needs to change. Applications need to take more responsibility for resiliency. By doing so, applications ensure service availability in times of infrastructure failure.
In a recent blog post, I discussed how cloud computing relates to greenfield and legacy applications. Legacy applications are challenging to move into cloud-based services. They can (and eventually should) be moved into the cloud. However, it will require a bit of work to take advantage of what cloud offers.
Greenfield applications, on the other hand, present a unique opportunity to take full advantage of cloud-based services…if used correctly. With Hurricane Sandy, we saw greenfield applications still using the old paradigm of relying heavily on redundant infrastructure. The consequence was significant application outages due to infrastructure failures. In contrast, greenfield applications that rely on the new paradigm (e.g., Netflix) experienced no downtime due to Sandy. Netflix not only avoided disaster, but saw a 20% increase in streaming viewers.
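To make the idea of an application taking responsibility for its own resiliency a bit more concrete, here is a minimal sketch in Python. It assumes a service deployed in several regions and shows the application itself, rather than the infrastructure, deciding where to send a request when one region fails. The region names, the `fetch_catalog` function and the `DOWN` set are all hypothetical and only simulate an outage; this is an illustration of the pattern, not a description of how Netflix or any provider implements it.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllRegionsFailed(Exception):
    """Raised when no region could serve the request."""


def call_with_failover(regions: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each region in order and return the first successful result."""
    errors = {}
    for region in regions:
        try:
            return request(region)
        except ConnectionError as exc:
            # Infrastructure failure in this region: record it and move on.
            errors[region] = exc
    raise AllRegionsFailed(f"no region available: {sorted(errors)}")


# Hypothetical set of regions that are currently unreachable.
DOWN = {"us-east"}


def fetch_catalog(region: str) -> str:
    """Simulated service call; a real one would make a network request."""
    if region in DOWN:
        raise ConnectionError(f"{region} unreachable")
    return f"catalog served from {region}"


if __name__ == "__main__":
    print(call_with_failover(["us-east", "us-west", "eu-west"], fetch_catalog))
```

With "us-east" simulated as down, the request is served from "us-west" and the caller never sees the failure. A production version would likely add timeouts, health checks and data replication across regions, which this sketch leaves out.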
Moving Beyond Technology
Leveraging cloud-based services requires more than a technology change. Organizational impact, process changes and governance are just a few of the things to consider. Organizations need to consider the changes to access, skill sets and roles. Are staff in other regions able to assist if local staff are impacted by the disaster? Processes ranging from change management to application design will change too. At what point are services preemptively moved to avoid disaster? Lastly, how do governance models change if the core players are unavailable due to disaster? Without considering these changes, the risks increase significantly.
So, where do you get started? First, determine where you are today. All good maps start with a “You Are Here” label. Consider how to best leverage cloud services and build a plan. Take into account your disaster recovery and business continuity planning. Then put the plan in motion. Test your disaster scenarios to improve your ability to withstand outages. Hopefully by the time the next disaster hits (and it will), you will be in a better place to weather the storm.