Associate Managing Consultant
In business continuity and IT disaster recovery, the terms RTO and RPO are both often used in conversations about recovery requirements.
While the terms are closely related, they have distinct meanings and uses.
This blog will define RTOs and RPOs with a closer look at how these terms are used in business continuity and IT disaster recovery programs.
According to ISO 22300:2021, a Recovery Time Objective (RTO) is the “period of time following an incident within which a product or service or an activity is resumed, or resources are recovered.”
In business continuity and operational resilience conversations, RTO defines a specific period of time and is generally stated as a set number of hours, days, weeks, etc.
In simple terms, an RTO sets the clock on how long your organization can tolerate downtime before a function or resource must be back in operation. You can use the RTO term for both business functions and resources. This is important to note because the same is not true for RPO.
RTOs vary from organization to organization, but here are some of the factors that might influence your RTO:
When discussing business continuity, your resources may cover a range of categories, including applications, vendors, facilities, people, and equipment. Each resource can have a unique RTO denoting the period of time within which that specific resource should resume operations after a disruption.
According to the Federal Emergency Management Agency (FEMA), about 25% of businesses do not re-open after disasters. And, according to another report, about 16% of small-to-midsize business (SMB) executives don’t know their organization’s RTOs. Another quarter say if they experience a disaster, they think they can recover data within 10 minutes or less, with closer to 30% saying it could be done in less than an hour.
Unfortunately, conceptualizing and identifying RTOs is tricky. That’s why it’s important to take a closer look at the scope of RTOs to ensure you always set appropriate RTO timeframes for your resources—especially those that are most critical for your core operations.
The reality is that an RTO depends on many factors and, for some, could range from hours to weeks or months. To identify appropriate RTOs for your business functions, consider downtime impact, including impact types such as financial, legal, operational, and reputational.
To identify an RTO, consider these two questions:
Answers to these questions can help align the “significant, negative impacts” with your organization’s risk appetite.
It’s important to note that if your executive leadership team accepts a certain risk tolerance, your RTOs should align with those tolerance levels.
To help you further identify resource RTOs, align your resources back to the business functions they support.
Think of it like this: If a business function can be down for a period of time, the resources required to perform that function can also be down for that amount of time. (Of course, there are always exceptions, such as information security tools that should always run.)
You should also consider any manual workarounds your team can use. If there are viable manual workarounds, the resource itself may have a longer RTO because downtime would not have as great an impact on operational resilience.
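One way to picture this alignment is a small Python sketch. It assumes a resource inherits the tightest RTO among the business functions it supports, and that a viable manual workaround adds a fixed allowance on top. The function names, resource names, time values, and the additive workaround rule are all hypothetical illustrations, not a prescribed method.

```python
from datetime import timedelta

# Hypothetical business functions and their RTOs (illustrative values only).
function_rtos = {
    "payroll": timedelta(days=3),
    "customer_support": timedelta(hours=8),
    "order_processing": timedelta(hours=4),
}

# Hypothetical mapping of each resource to the functions that depend on it.
resource_supports = {
    "hr_system": ["payroll"],
    "crm": ["customer_support", "order_processing"],
}

# Assumption: a viable manual workaround buys a fixed amount of extra downtime.
workaround_extension = {
    "crm": timedelta(hours=12),
}


def resource_rto(resource: str) -> timedelta:
    """Tightest RTO among supported functions, plus any workaround allowance."""
    tightest = min(function_rtos[f] for f in resource_supports[resource])
    return tightest + workaround_extension.get(resource, timedelta(0))


for name in resource_supports:
    print(f"{name}: RTO {resource_rto(name)}")
```

Exceptions such as always-on information security tools would need to be handled outside a simple rule like this.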
RTOs vary from organization to organization. However, here are some RTO examples. What would these same RTOs look like for your organization?
According to ISO 22300:2021, a Recovery Point Objective (RPO) is the “point to which information used by an activity is restored to enable the activity to operate on resumption; can also be referred to as ‘maximum data loss.’”
The term RPO generally applies to a system or application that stores data. RPOs are metrics for determining how much data you’re willing to lose (or how much data you’re willing to re-enter) between your last backup and a disruption.
There are different ways to think about data loss. For example, you can think holistically (losing an entire database), or you can think about data loss from a transaction standpoint (losing the previous [period of time] of updates, files, transactions, etc.).
When you talk about RPOs, you should refer to transactional data instead of archived data. For example, for a legal contracts repository, your RPO would apply to new or updated files, not historical data.
In other words, a one-day RPO or data loss tolerance means you could lose one day’s worth of updates or uploads to the system.
When determining RPOs, consider alternate sources of the data, including recreating lost data or work. If you have a backup source for the data—or if you can easily recreate the data—there may be more data loss tolerance.
Here are some factors that might influence your RPO:
When talking about RPOs, it’s worth noting that your organization’s backup frequency may have the biggest impact on your RPO, but that frequency is sometimes overlooked in resiliency planning.
While many (hopefully most) organizations have routine data and system backups, those backups may not occur at the frequency an organization actually needs, something often not discovered until after a disaster or significant disruption.
So if these backups are so important, why don’t organizations do them more frequently?
In many cases, frequent backups are cost-prohibitive. The more data your organization has, and the more frequently it’s replicated and stored, the more storage space you need, which quickly adds up in costs.
Why is this important?
Because, if you experience a disaster or disruption, you can anticipate the possibility of data loss. Your RPO helps determine how much data you can risk losing, based on the amount of time between your most recent backup and the disruption.
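To make the link between backup frequency and RPO concrete, here is a minimal Python sketch. It assumes the worst case is a disruption that hits just before the next scheduled backup, so the worst-case data loss equals the backup interval. It ignores how long a backup takes to run and whether it succeeds. The system names and values are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumed worst case: the disruption hits just before the next backup runs."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A schedule meets the RPO if its worst-case loss does not exceed it."""
    return worst_case_data_loss(backup_interval) <= rpo


# Hypothetical systems: (backup interval, RPO)
systems = {
    "legal_contracts_repository": (timedelta(hours=24), timedelta(days=1)),
    "order_database": (timedelta(hours=24), timedelta(hours=1)),
}

for name, (interval, rpo) in systems.items():
    status = "meets RPO" if meets_rpo(interval, rpo) else "backups too infrequent"
    print(f"{name}: worst-case loss {worst_case_data_loss(interval)}, RPO {rpo} -> {status}")
```

Running a check like this before a disruption is one way to surface a mismatch between the backups you have and the backups you need.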
Though RTOs and RPOs are related, there are differences. While RTOs look forward in time (the amount of time you have to recover), RPOs look backward (when was your last good data backup, and how much has changed since then?).
RPOs range in time from no data loss (0 hours) to a few days, depending on a variety of factors.
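The forward and backward views can be sketched on a single timeline. The Python example below uses a hypothetical incident time, last-backup time, RTO, and RPO. It computes the recovery deadline by looking forward from the incident, and the data loss window by looking backward to the last backup. All values are illustrative assumptions.

```python
from datetime import datetime, timedelta


def recovery_deadline(incident: datetime, rto: timedelta) -> datetime:
    """Looking forward: the latest time by which operations should resume."""
    return incident + rto


def data_loss_window(incident: datetime, last_backup: datetime) -> timedelta:
    """Looking backward: how much recent data is not covered by the last backup."""
    return incident - last_backup


# Hypothetical scenario (illustrative values only).
incident = datetime(2024, 3, 5, 14, 30)
last_backup = datetime(2024, 3, 5, 2, 0)
rto = timedelta(hours=8)
rpo = timedelta(hours=24)

deadline = recovery_deadline(incident, rto)
lost = data_loss_window(incident, last_backup)

print(f"Resume operations by: {deadline}")
print(f"Data at risk: {lost} of updates")
print(f"Within RPO: {lost <= rpo}")
```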
RTOs and RPOs are important factors in disaster recovery and business continuity planning. If you don’t know your RTO and RPO metrics, the bulk of your resiliency planning could be for naught—particularly if you underestimate how long your organization can survive downtime or the volume of data your organization can lose and still survive.
And while some organizations know their own time-related metrics, industry feedback suggests that as a whole, we don’t do a great job exploring RTOs for our vendors and suppliers. Supplier and vendor RTOs can directly impact and skew your own recovery metrics.
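As a rough illustration, the Python sketch below compares the RTO each vendor commits to against the RTO of the business function that relies on that vendor, and flags any shortfall. The vendor names, function names, and time values are hypothetical. Real vendor commitments would come from contracts or service-level agreements.

```python
from datetime import timedelta

# Hypothetical RTOs for your own business functions.
function_rtos = {
    "order_processing": timedelta(hours=4),
    "payroll": timedelta(days=3),
}

# Hypothetical vendor commitments: (vendor, function it supports, vendor's RTO).
vendor_commitments = [
    ("payment_gateway", "order_processing", timedelta(hours=24)),
    ("payroll_provider", "payroll", timedelta(days=1)),
]


def vendor_gaps(function_rtos, vendors):
    """Return (vendor, function, shortfall) where the vendor's RTO exceeds yours."""
    gaps = []
    for vendor, function, vendor_rto in vendors:
        required = function_rtos[function]
        if vendor_rto > required:
            gaps.append((vendor, function, vendor_rto - required))
    return gaps


for vendor, function, shortfall in vendor_gaps(function_rtos, vendor_commitments):
    print(f"{vendor} recovers {shortfall} later than {function} can tolerate")
```

A gap like this suggests either renegotiating the vendor commitment, finding a workaround, or revisiting your own RTO.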
RTOs and RPOs are commonly used:
RTOs and RPOs set the foundational requirements for your business continuity and IT disaster recovery programs. It is important to understand and define these terms early in your program and routinely re-evaluate your RTOs and RPOs as your organization evolves.
Do you need help better understanding RTOs and RPOs and the role they play in your organization’s resilience? Contact a Castellan advisor today and we’ll be happy to help work through all of your questions.