More Than A Plan: Establishing A Disaster Recovery Program
Many organizations think having a disaster recovery plan is all the protection they need from disasters. However, there is so much more to disaster recovery than just a plan! That’s why most industry professionals see disaster recovery as an ongoing program or process that contains a number of distinct elements. Key process activities include:
- Business engagement and establishment of business requirements (through business impact analyses and risk assessments), resulting in the definition of recovery time objectives, recovery point objectives, and downtime procedures (manual workarounds)
- Identification, evaluation, and selection of appropriate recovery approaches to achieve business requirements, including defined ongoing budget commitments and staff allocations
- Development of plans for technical recovery and coordination of the recovery effort
- Execution of ongoing exercising and training
In addition to process elements, the following governance activities are also typically performed:
- Management engagement through recurring steering committee meetings (management reviews)
- Formalized, recurring planning activities documented in governance documentation
- Corrective action tracking and prioritization, as well as post-incident reporting and analysis
These process and governance activities, taken together, represent leading practices for a disaster recovery capability. However, it’s important to note that every organization is different and may require different levels of maturity and formality of the above activities.
In addition, the ideal IT disaster recovery capability supports a broader business continuity program that addresses aspects of recovery beyond IT, such as the loss of a facility, personnel, or a supplier. While most organizations start with IT disaster recovery, the goal is ultimately to address business continuity as well. As a result, building your IT disaster recovery program so it will align effectively with an eventual business continuity capability is another key consideration.
There are multiple standards and methodologies that provide guidance for establishing an IT disaster recovery program. Two of the more comprehensive are ISO 27031 and ITIL IT Service Continuity Management (ITSCM). This article will explore the components of ISO 27031 and ITSCM, their similarities and differences, and their relationship to business continuity management.
ISO 27031:2011 – Information and communications technology (ICT) continuity management, developed originally by the British Standards Institution (BSI), was accepted as an ISO standard in 2011 and represents a management systems-based implementation of an IT disaster recovery program. It has six key principles:
- Protecting the ICT environment from incidents, failures and disruptions
- Detecting incidents at the earliest possible time
- Reacting to incidents as efficiently as possible
- Recovering by identifying and implementing appropriate recovery strategies
- Operating in disaster recovery mode
- Returning to normal operations
While ISO 27031 is intended for use in the larger context of a business continuity program, organizations have successfully implemented this standard and then later grown into business continuity.
Structured as a management systems-based standard, ISO 27031 has two main components: the management system and the process. The management system is intended to ensure that an organization has a documented process to execute ICT continuity management. It utilizes the plan-do-check-act (PDCA) cycle, consistent with ISO and other management systems-based standards. The process details the necessary components to provide the recovery capability. While the management system described in ISO 27031 can be established solely for IT disaster recovery, there are elements of the process that assume the existence of an overall business continuity program. In particular, ICT requirements are established by business continuity requirements, which are typically determined during a business impact analysis.
The process of developing, maintaining, and improving an ICT capability is defined as five high-level components:
- Understanding the ICT requirements for business continuity – with the purpose of determining the ICT continuity services needed to support the business continuity requirements. The process requires understanding the components of critical services in production, their current continuity capability and the gap between current capabilities and business continuity requirements. The analysis should also focus on actions that can be taken to improve the resiliency of the production environment
- Determining ICT continuity strategies – with the purpose of developing both an overall ICT continuity management strategy and strategies for each critical ICT service that close the gaps identified during the previous phase
- Developing and implementing ICT strategies – with the purpose of implementing the chosen strategies, including establishing the necessary organizational structure, plans and procedures
- Exercising and testing – with the purpose of ensuring that the strategies and plans work as intended
- Maintenance, review and improvement – with the purpose of ensuring that ICT continuity strategy remains current and appropriate
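The gap analysis described in the first component can be illustrated with a minimal sketch. It assumes each critical ICT service has a required recovery time objective (RTO) and recovery point objective (RPO) from the business impact analysis, plus an assessed current capability. The `ServiceContinuity` record, the `continuity_gaps` function, the service names, and the figures are all hypothetical and are not defined by ISO 27031.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceContinuity:
    """Hypothetical record for one critical ICT service (not part of ISO 27031)."""
    name: str
    required_rto: timedelta  # recovery time objective set by the business
    required_rpo: timedelta  # recovery point objective set by the business
    current_rto: timedelta   # recovery time the current capability is assessed to achieve
    current_rpo: timedelta   # data-loss window the current capability is assessed to allow


def continuity_gaps(services):
    """Map each service that misses a target to its RTO and/or RPO shortfall."""
    gaps = {}
    for svc in services:
        shortfalls = {}
        if svc.current_rto > svc.required_rto:
            shortfalls["rto"] = svc.current_rto - svc.required_rto
        if svc.current_rpo > svc.required_rpo:
            shortfalls["rpo"] = svc.current_rpo - svc.required_rpo
        if shortfalls:
            gaps[svc.name] = shortfalls
    return gaps


# Illustrative data only; real values come from the business impact analysis.
services = [
    ServiceContinuity("order-entry", timedelta(hours=4), timedelta(minutes=15),
                      timedelta(hours=12), timedelta(hours=24)),
    ServiceContinuity("intranet-wiki", timedelta(days=3), timedelta(hours=24),
                      timedelta(days=1), timedelta(hours=24)),
]
gaps = continuity_gaps(services)
```

In this sketch only `order-entry` is reported, because its assessed recovery time and data-loss window both exceed what the business requires. Those shortfalls are what the strategy-determination component would then aim to close.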
For those familiar with BS 25999-2:2007, Business continuity management, the structure above is consistent with sections four through six of that standard.
Given the similarities to BS 25999, ISO 27031 is the logical choice for implementing a disaster recovery capability in organizations that either utilize BS 25999 for business continuity or have other management systems-based programs. It also provides solid guidance for organizations that have no business continuity or other structure in place to serve as a basis for disaster recovery development. Establishing a management system as part of an ISO 27031 implementation will provide the necessary governance and provide a platform for the development of a more comprehensive business continuity program.
Many organizations have adopted ITIL IT Service Management (ITSM) as the model for the operation of their IT function. Within ITSM are multiple processes that set standards for the design and operation of IT services. At the core of ITIL is Service Design, which includes multiple disciplines, one of which is IT Service Continuity Management (ITSCM).
ITSCM, much like ISO 27031, is composed of multiple processes intended to ensure that disaster recovery is established, implemented, and maintained over time. While not a management system, it provides similar requirements, especially around testing, review, and continuous improvement.
As with ISO 27031, ITSCM assumes the existence of a business continuity capability as the source of business continuity requirements. The core elements of ITSCM are described below:
- Design Services for Continuity
Process Objective: To design appropriate and cost-justifiable continuity mechanisms and procedures to meet the agreed business continuity targets. This includes the design of risk reduction measures and recovery plans.
- ITSCM Support
Process Objective: To make sure that all members of IT staff with disaster response responsibilities are aware of their exact duties, and to make sure that all relevant information is readily available when a disaster occurs.
- ITSCM Training and Testing
Process Objective: To make sure that all preventive measures and recovery mechanisms for the case of disaster events are subject to regular testing.
- ITSCM Review
Process Objective: To review whether disaster prevention measures are still in line with risk perceptions from the business side, and to verify that continuity measures and procedures are regularly maintained and tested.
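The testing and review objectives above both depend on knowing when each recovery mechanism was last exercised. A minimal sketch of that check follows, assuming the organization keeps a simple record of last test dates and has agreed a maximum interval between tests. The `overdue_tests` function, the mechanism names, the dates, and the 365-day interval are hypothetical and are not prescribed by ITIL.

```python
from datetime import date, timedelta


def overdue_tests(last_tested, max_interval, today):
    """Return, sorted by name, mechanisms never tested or not tested within max_interval."""
    overdue = []
    for name, tested_on in last_tested.items():
        if tested_on is None or today - tested_on > max_interval:
            overdue.append(name)
    return sorted(overdue)


# Illustrative records only; None marks a mechanism that has never been tested.
last_tested = {
    "database-failover": date(2023, 1, 10),
    "backup-restore": date(2023, 11, 2),
    "alternate-site-network": None,
}
result = overdue_tests(last_tested, timedelta(days=365), today=date(2024, 3, 1))
```

A report like this could feed the ITSCM Review activity, where overdue items become corrective actions to track and prioritize.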
The two approaches identified above represent the most common methodologies for building a strong program. If based on legitimate business requirements, adequately funded and staffed, and updated on a regular basis, either ISO 27031 or ITSCM will serve as a solid disaster recovery model. Choosing which to use should be based on the culture of the organization and existing processes that can be leveraged. If management systems exist in other disciplines, or no formal structure exists, ISO 27031 would be a good choice because it includes a governance model as part of the standard. If an organization has adopted ITIL to guide overall service management, ITSCM seems the natural choice. Caution should be taken, however, to ensure that the program has visibility and support outside of the IT organization. Without it, disconnects between business requirements and IT strategies often occur.