Last month I posted a video that gave a brief overview of how data replication is used for continuous availability. I was then asked why I used the term continuous availability instead of high availability or disaster recovery. "What's the difference?" Well, while these solutions have a lot of overlap and can usually be built from some of the same components, they can have different objectives. Different objectives can mean different implementations. So, I'm going to use this post to define how I use these terms when talking about replication and why I chose continuous availability.
Disaster Recovery (DR)
The goal of disaster recovery is to restore the business after an unplanned outage. It does this by providing a standby of a primary database and keeping it current through replication of changes from the primary. Changes can be replicated synchronously or asynchronously. If done asynchronously and an outage happens, changes may be lost (if the primary is down permanently) or stranded (if the primary is down temporarily). The amount of change data lost or stranded is dependent on the latency of replicated changes. If done synchronously, no committed changes are lost or stranded in an outage.
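To make the synchronous versus asynchronous distinction concrete, here is a minimal sketch in Python. It is a toy model, not how any particular replication product works: the class name `ReplicatedPair` and its methods are hypothetical, and the "databases" are just lists of changes. It shows that with asynchronous replication the changes at risk in an outage are whatever the standby has not yet received, while with synchronous replication there are none.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedPair:
    """Toy primary/standby pair (hypothetical model, for illustration only)."""
    synchronous: bool
    primary: list = field(default_factory=list)
    standby: list = field(default_factory=list)

    def commit(self, change):
        """Commit a change on the primary."""
        self.primary.append(change)
        if self.synchronous:
            # Synchronous: the commit is not complete until the standby has it.
            self.standby.append(change)

    def replicate(self, max_changes):
        """Asynchronous apply: ship up to max_changes pending changes."""
        pending = self.primary[len(self.standby):]
        self.standby.extend(pending[:max_changes])

    def at_risk(self):
        """Committed changes that would be lost or stranded if the primary failed now."""
        return self.primary[len(self.standby):]


async_pair = ReplicatedPair(synchronous=False)
sync_pair = ReplicatedPair(synchronous=True)
for change in ["c1", "c2", "c3", "c4", "c5"]:
    async_pair.commit(change)
    sync_pair.commit(change)

async_pair.replicate(max_changes=3)  # the standby is lagging by two changes

print(async_pair.at_risk())  # ['c4', 'c5']
print(sync_pair.at_risk())   # []
```

The more the standby lags (the higher the replication latency), the longer the list returned by `at_risk()` in the asynchronous case.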
This leads to two other terms - recovery point objective and recovery time objective.
Recovery point objective (RPO) is an industry term used to describe the amount of changed data a business is willing to lose in an outage. You'll see the term "RPO=0" used when a company says its business cannot afford to lose any data in an outage. "RPO>0" says the company can go back to some prior point in time (i.e., strand or lose a defined time's worth of data) to restart its business.
When a company is willing to accept a loss of data (RPO>0), they are generally driven by a desire to limit recovery time to an acceptable level. This is called the recovery time objective (RTO). With most disaster recovery solutions, there's a tradeoff between RPO and RTO. For example, if you want no data loss (RPO=0), some solutions require time to complete recovery before coming back on-line (RTO>0).
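One way to picture RPO and RTO together is as a pair of thresholds that an outage either meets or misses. The sketch below is only an illustration: the names `RecoveryObjectives` and `meets_objectives` are made up for this post, and I'm assuming both objectives are expressed in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryObjectives:
    rpo_seconds: float  # most change data (measured as time) the business accepts losing
    rto_seconds: float  # longest acceptable time to restore the business


def meets_objectives(objectives, data_loss_seconds, recovery_seconds):
    """Compare what actually happened in an outage against the objectives."""
    return {
        "rpo_met": data_loss_seconds <= objectives.rpo_seconds,
        "rto_met": recovery_seconds <= objectives.rto_seconds,
    }


# Assumed example: no data loss allowed (RPO=0), back on-line within 15 minutes.
strict = RecoveryObjectives(rpo_seconds=0, rto_seconds=15 * 60)

# An outage where nothing was lost, but recovery took 30 minutes.
result = meets_objectives(strict, data_loss_seconds=0, recovery_seconds=30 * 60)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

This is the tradeoff in miniature: the example outage meets RPO=0, but only by taking longer to recover than the RTO allows.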
However, an important point here is that the definition of disaster recovery does not include a statement about the availability of the data either during a disaster or during recovery. That's where the terms high and continuous availability come in.
High Availability (HA)
The goals of high availability (HA) are to (1) make data available during defined periods and (2) meet availability objectives during those periods. HA solutions account for both planned and unplanned outages, but allow for something less than 100% availability. To meet objectives, an HA solution typically includes a window for planned outages such as maintenance.
A company might have HA objectives like one of the following, which are usually formalized in a service level agreement (SLA):
- Make data available 99.9% of the time for the year.
- Data can be unavailable for no more than 10 hours a year during defined business hours.
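The two styles of objective above are really the same arithmetic viewed from opposite directions. Here is a small Python sketch that converts between them. The function names are mine, and the figures for a year (365 days, ignoring leap years) and for "defined business hours" (8 hours a day, 5 days a week, 52 weeks) are assumptions for the example, not part of any SLA.

```python
HOURS_PER_YEAR = 365 * 24          # 8760, ignoring leap years
BUSINESS_HOURS_PER_YEAR = 8 * 5 * 52  # 2080, an assumed business-hours window


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted by an availability percentage over a period."""
    return period_hours * (1 - availability_pct / 100)


def availability_pct(downtime_hours, period_hours):
    """Availability percentage implied by a downtime allowance over a period."""
    return 100 * (1 - downtime_hours / period_hours)


# 99.9% availability over a full year allows a little under 9 hours of downtime.
print(round(allowed_downtime_hours(99.9), 2))  # 8.76

# 10 hours of downtime within the assumed business-hours window.
print(round(availability_pct(10, BUSINESS_HOURS_PER_YEAR), 2))  # 99.52
```

Note how much the defined period matters: 10 hours is a larger share of a business-hours window than of a full year, so the same downtime allowance gives a lower percentage.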
So what's the difference between disaster recovery and HA? Key points are:
- As stated previously, disaster recovery focuses more on unplanned outages and maximizing recovery of data, while HA focuses more on making data available and accounting for both planned and unplanned outages.
- Building on the previous point - HA doesn't offer a guarantee that no data is lost in a disaster.
- Disaster recovery solutions tend to be more of a single-site solution, with primary and standby being relatively close to one another, while HA is often used between sites that are separated by geographic distances spanning time zones.
The last two points are why disaster recovery and HA are used together - two sites can each have a disaster recovery solution for their local copy of a database and then use an HA solution to keep data highly available between sites.
However, many global businesses now need systems and data available without interruption. In other words, they want 100% availability. That's where continuous availability comes in.
Continuous Availability
The goal of continuous availability is to ensure data is always available for business needs. In other words, the goal is 100% data availability. Planned and unplanned outages of systems or software should have no effect on availability. This is usually achieved by using replication to maintain multiple active copies of the data. Like HA, the solution can also be enhanced with a disaster recovery component for unplanned outages.
To be clear, an 'active' solution means that any copy of the data could be read or modified. It also means that at least one site is always fully active and ready to accept workload.
Most continuous availability solutions today are for two databases and allow for workload balancing across sites. The terms Active-Active Databases and Active-Active or Dual Warehousing are used for the most common variations. However, there is also a trend towards availability solutions that involve three or more sites.
So, why did I use continuous availability with last month's video? It's because most modern data replication technologies are being designed to be either a solution or a part of solutions that target 100% availability of data. Naturally, if you have a different view, feel free to use the comment section below :)