Many people confuse the terms Disaster Recovery and Business Continuity. Whilst it’s true that in both cases you’re recovering from a disaster, they’re not the same.
According to ISO standards, Disaster Recovery ensures the recovery of IT systems and is data-centric, while Business Continuity ensures the continuity of the business in case of a disaster (or, more properly, a disruptive incident); its focus is on the business.
In today’s world, however, the two are often intertwined, especially with digital services.
Take Netflix, for example: it relies heavily on servers and other cloud services to provide all-year-round streaming. However, that’s not the only aspect of a “disaster”. One of the many aspects of business continuity is making sure that people are safe and that another location or office is available for staff to work from in case the main office has turned to ashes.
Now take a retail store whose entire POS system and DB server are down, so customers are unable to make any purchases. Do staff members tell customers they can’t buy the items, or should the business have a process for using pen and paper instead: recording every purchase, price, etc., and then entering them into the systems when they’re back online?
This is disaster recovery if only the IT systems are down, and would be considered business continuity if the head office is on fire.
Our focus in this paper is not business continuity, however; it is Disaster Recovery.
What is Disaster Recovery?
As mentioned above, disaster recovery is the recovery of IT systems (hardware, virtual machines, applications, DBs and so on), and it’s not all about the Recovery Point Objective (RPO), the Recovery Time Objective (RTO), or defining business-critical applications.
Although the three elements above (RPO, RTO, and business-critical applications) are quite important, what’s even more important is a Disaster Recovery Plan.
Before going into the Disaster Recovery plan and strategy, however, let us first define RPO and RTO in a little more detail.
- The Recovery Point Objective (RPO) defines how much data the business can afford to lose, measured as a window of time before the disaster. In other words, how far back in time can a business go and still be able to resume its work? Take a database server, for example: can the business afford to lose 24 hours of data, 8 hours of data, or 2 hours of data? This is where an RPO gets defined.
- The Recovery Time Objective (RTO) defines the amount of time a business can operate without its IT systems being available. Can a DB server or an Exchange server be down for 2 hours before the panic button is hit? This is where we define how soon a system should be back online and running again.
- Business-critical applications need to be defined to determine the order in which systems are recovered. Should a business recover its print server first or the Domain Controller? Should it recover its SCCM server or its SQL server first?
Many organisations think these three elements are the holy trinity of Disaster Recovery, but they aren’t.
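To make the three definitions concrete, here is a minimal Python sketch. The class and function names, the systems, and every figure in it are hypothetical and chosen only for illustration; a real organisation would take these values from its own business impact analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SystemObjectives:
    name: str
    rpo: timedelta   # maximum tolerable data loss, expressed as time
    rto: timedelta   # maximum tolerable downtime
    priority: int    # 1 = recover first


def rpo_met(last_good_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """Data written after the last good backup is lost; that window must fit the RPO."""
    return incident - last_good_backup <= rpo


def rto_met(incident: datetime, restored: datetime, rto: timedelta) -> bool:
    """Downtime runs from the incident until the service is back online."""
    return restored - incident <= rto


def recovery_order(systems):
    """Most critical systems (lowest priority number) are recovered first."""
    return [s.name for s in sorted(systems, key=lambda s: s.priority)]


# Hypothetical figures, for illustration only
systems = [
    SystemObjectives("Print server", timedelta(hours=24), timedelta(hours=48), 3),
    SystemObjectives("Domain Controller", timedelta(hours=8), timedelta(hours=2), 1),
    SystemObjectives("SQL server", timedelta(hours=2), timedelta(hours=4), 2),
]
incident = datetime(2024, 1, 10, 14, 0)
last_backup = datetime(2024, 1, 10, 11, 0)  # 3 hours before the incident

for s in systems:
    print(s.name, "RPO met:", rpo_met(last_backup, incident, s.rpo))
print("Recovery order:", recovery_order(systems))
```

With a backup that is 3 hours old, the sketch reports the RPO as missed for any system whose RPO is shorter than 3 hours, which is exactly the conversation the business needs to have before a disaster rather than during one.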
Disaster Recovery Plan and Strategy
Equally important to the RPO, RTO and business-critical applications is having a Disaster Recovery plan and a strategy.
The important bit of any plan is a drill. Every now and then, the IT department should run a Disaster Recovery drill to exercise the plan and, after it, the strategy or process.
This is important for many reasons. Imagine, for instance, that you’re recovering a VM, or even an entire DB, from a given date, but that backup is actually corrupt. You need to make sure that backups are healthy and that the people responsible for backups know the pre-defined process and plan to follow. You don’t want to execute the wrong plan; otherwise, not only will the business suffer for longer, but you may also miss the RPO and RTO and eventually end up in a non-recoverable situation.
Determine a DR Plan
A DR plan has multiple elements to it. To name a few:
- Role assignment; no business or department should rely on one person to do it all. Since Disaster Recovery is the recovery of IT systems, each person within the team should have a role assigned to them. Note, however, that in business continuity the roles are quite different. But back to DR: you couldn’t expect a business analyst to perform a restore of files, applications or VMs (unless trained).
- Inventory; you will need to determine what the business requires while performing a recovery, and this is part of the DR plan (we will discuss the plan later in the post). There are always physical devices in the IT server room (routers, switches, hypervisors, etc.). Defining your inventory for each scenario helps you determine what you need to recover. Recovering a VM is entirely different from recovering from your hypervisor catching fire, which means getting a new server and installing the hypervisor; delivery of the server alone can take time.
- Backup/Restore check; performing a backup and restore check every now and then is important to make sure that the backed-up data is usable; otherwise you will miss the RPO. I can’t imagine any business surviving a 2TB loss of data.
There are other elements to this of course, such as vendor communication, having updated documentation of the environment and so on.
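As one possible way to automate part of the backup/restore check, the sketch below compares the SHA-256 checksum of a restored file against the checksum recorded when the backup was taken. The function names are hypothetical, and the sketch assumes your backup process stores such a checksum alongside each backup; it uses only the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_is_healthy(restored_file: Path, expected_sha256: str) -> bool:
    """True only if the restored file exists and matches the recorded checksum."""
    return restored_file.is_file() and sha256_of(restored_file) == expected_sha256
```

A matching checksum only shows that the file came back intact. It does not prove that the application or database can actually load it, so a drill should still restore into a test environment and open the data.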
Disaster Recovery Plan and Strategy
The more effective your plan and your strategy or process for responding to a disaster, and the more you have exercised them beforehand, the better you will recover. There’s a simple yet very effective recovery plan for DR that organisations usually follow, outlining:
- Critical System; Define your critical systems so you know how to respond to each
- RTO/RPO (hours); Define both RPO and RTO in hours, or minutes
- Threat; Define the threat, e.g. AD object deletion
- Prevention Strategy; an example would be protecting objects against accidental deletion
- Response Strategy; Restore deleted object using restore solution
- Recovery Strategy; How to approach the recovery, who needs to be involved etc.
Then that plan is transformed into a process, or plan strategy, on how to respond. This maps out the steps that need to be taken in case of a disaster, with the following outlined:
- Critical System; Your defined critical systems
- Threat; What was/is the threat
- Response Strategy; the defined response strategy
- Response Action Steps; document step by step how to respond
- Recovery Strategy; the defined recovery strategy
- Recovery Strategy Steps; map out step by step the recovery strategy and future prevention
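The two outlines above can be kept as structured data rather than free text, which makes them easier to review and to print during a drill. The following is a minimal sketch only: the class names, the RTO/RPO figures and the individual steps are hypothetical, while the AD object deletion scenario is the example used above.

```python
from dataclasses import dataclass, field


@dataclass
class PlanEntry:
    critical_system: str
    rto_hours: float
    rpo_hours: float
    threat: str
    prevention_strategy: str
    response_strategy: str
    recovery_strategy: str


@dataclass
class Runbook:
    """The plan entry turned into step-by-step response and recovery actions."""
    entry: PlanEntry
    response_steps: list[str] = field(default_factory=list)
    recovery_steps: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"Critical System: {self.entry.critical_system}",
            f"Threat: {self.entry.threat}",
            f"Response Strategy: {self.entry.response_strategy}",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.response_steps, 1)]
        lines.append(f"Recovery Strategy: {self.entry.recovery_strategy}")
        lines += [f"  {i}. {step}" for i, step in enumerate(self.recovery_steps, 1)]
        return "\n".join(lines)


# Hypothetical values, for illustration only
entry = PlanEntry(
    critical_system="Active Directory",
    rto_hours=2,
    rpo_hours=8,
    threat="AD object deletion",
    prevention_strategy="Protect objects against accidental deletion",
    response_strategy="Restore deleted object using restore solution",
    recovery_strategy="Identity team lead coordinates; service desk informs users",
)
runbook = Runbook(
    entry,
    response_steps=["Confirm which object was deleted", "Restore the object"],
    recovery_steps=["Verify access for affected users", "Review why deletion was possible"],
)
print(runbook.render())
```

Keeping the plan entry and the runbook as separate types mirrors the two stages described above: first agree on the strategy, then write down the steps.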
High Availability is not Disaster Recovery
There’s a misconception that high availability and disaster recovery are the same thing; they’re not. Take SQL Server 2012 Enterprise or later with an AlwaysOn Availability Group enabled: the DB is available on two different SQL Server instances, and your application is configured to talk to the availability group “listener”. This means that if SQL A goes down, your application will continue operating, as SQL B is still online and the secondary DB becomes active on SQL B.
However, if an object is deleted on the primary DB, that deletion will replicate to the secondary DB on SQL B.
Therefore, you will still need to recover that object from a previous backup.
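The point can be illustrated with a toy model in Python. This is not how SQL Server implements replication; the class below is a hypothetical, deliberately simplified stand-in that only shows the behaviour that matters here: replication copies deletions as faithfully as it copies writes.

```python
import copy


class ReplicatedDatabase:
    """Toy model: every change on the primary is applied to the secondary too."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        self.primary[key] = value
        self.secondary[key] = value

    def delete(self, key):
        # The deletion is replicated just like any other change
        self.primary.pop(key, None)
        self.secondary.pop(key, None)

    def failover(self):
        # The secondary takes over as primary
        self.primary, self.secondary = self.secondary, self.primary


db = ReplicatedDatabase()
db.write("customers", ["Alice", "Bob"])
backup = copy.deepcopy(db.primary)  # point-in-time backup, taken before the mistake

db.delete("customers")              # accidental deletion on the primary
db.failover()                       # failing over does not bring the data back
found_after_failover = "customers" in db.primary

db.write("customers", backup["customers"])  # only the backup can restore it
print(found_after_failover, db.primary["customers"])
```

High availability keeps the service running when a server fails; only a backup lets you go back to a point in time before the data was damaged.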
How can Dilignet help?
Dilignet has the skills, experience, and resources to help you plan, execute and recover from a disaster with minimum business disruption. We work closely with you to help identify and manage your environment whilst utilising the latest cloud solutions.
Why Cloud solutions?
Disaster Recovery as a Service (DRaaS) helps businesses recover quickly from a disaster. Cloud services and solutions are redundant, so backups stored in the cloud of your choice are replicated to multiple locations; think of it like disk mirroring.
Cloud backups are a better option than traditional backup solutions. We can either help you adopt a new backup solution or use your existing one to take advantage of the cloud. Depending on the backup solution your company uses, we can help you leverage cloud backups in multiple ways.
Traditional on-premises backups and tape backups are not good enough, for many reasons. For example, if your IT server room is on fire, or your storage solution has failed, the time and cost it takes to recover could be catastrophic to the company. With cloud solutions, by contrast, systems are almost instantly available for you to restore VMs, objects or applications.
Whether your cloud of choice is Azure or AWS, we can help you utilise either vendor for a complete, end-to-end backup solution and a disaster recovery plan. We can also help you leverage your own backup solution, without the need for additional licenses.
Talk to us to find out more.