What is disaster recovery and how do you plan for it?
Let’s try a deviously simple thought experiment: What kind of disaster would it take to completely shut down your business? The amount of time you’re shut down doesn’t matter here, just the actual act of being completely shut down, with all activity grinding to a painful halt. What might cause this to happen to you? A virus? DDoS attack? Fire? Ragnarok? Your answer to the question matters.
Far too many businesses are in one of two situations: either they have no disaster recovery plan at all, or they have a data backup and nothing more. Having no recovery plan is dangerous, sometimes irreparably so. Simply having a data backup, whether on a cloud server, on-site, or a hybrid of both, is a good first step, but it is also badly inadequate. Every business needs a solid disaster recovery plan, yet far too many don’t quite understand what one entails (or how simple it is to put one together).
What is disaster recovery?
In short, disaster recovery planning can be summed up this way:
- Create a Business Impact Analysis
  - Determine at-risk functions
  - Evaluate recovery times
  - Determine potential costs
- Plan for specific events
  - Identify specific risks
  - Determine immediate, intermediate and final recovery steps
  - Identify key individuals to handle specific tasks
At a high level, disaster recovery is what occurs immediately after you’ve suffered a major data loss that impacts your business operations. There are any number of events that could cause you to lose data or lose access to your data. However, disaster recovery is necessary regardless of why you’ve suffered such a loss.
Most events that result in the need for disaster recovery can be mitigated through technological means. However, some events require physical solutions and will take more time to resolve. In these cases, a solid disaster recovery plan is all the more necessary, as it can reduce the cost of a disaster and get your business back up and running faster.
Keep this in mind: a good disaster recovery plan can easily turn a full day of downtime into one hour, or one hour of downtime into 10 minutes. How you plan your disaster recovery can significantly affect how long your operations are disrupted.
Bring on the scare tactics
You’ve probably seen a few scary facts and figures about loss events. While they may seem implausible at times, they are based on real data and should be taken seriously:
- IT downtime, regardless of the reason, can cost over $8,000 per hour for a small business, or $700,000 per hour for a large enterprise.
- 40 percent of businesses never reopen after a disaster.
Those two stats alone should give you pause, especially if you don’t have a disaster recovery plan in place. The average downtime is only around 90 minutes, but that can be one very, very expensive hour and a half for any business. Living life on the edge may appeal to some, but most COOs and accountants will run those numbers with more than a few beads of sweat running down their backs.
The Disaster Recovery Preparedness Council has also published a significant amount of data on the issue, with findings that are equally troubling. In its 2014 annual report, it found that:
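To make the arithmetic concrete, here is a minimal Python sketch that assumes downtime cost scales linearly with hours, using the $8,000-per-hour small-business figure quoted above. Real losses rarely grow this neatly, so treat the function and its inputs as illustrative assumptions.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Estimate outage cost, assuming losses grow linearly with time (a simplification)."""
    if hours_down < 0 or cost_per_hour < 0:
        raise ValueError("hours_down and cost_per_hour must be non-negative")
    return hours_down * cost_per_hour


SMALL_BUSINESS_PER_HOUR = 8_000  # small-business figure quoted above

average_outage = downtime_cost(1.5, SMALL_BUSINESS_PER_HOUR)  # ~90-minute average outage
full_day = downtime_cost(24, SMALL_BUSINESS_PER_HOUR)         # a day of downtime, no plan
one_hour = downtime_cost(1, SMALL_BUSINESS_PER_HOUR)          # the same outage cut to an hour

print(f"90-minute outage: ${average_outage:,.0f}")                # $12,000
print(f"Full day: ${full_day:,.0f}; one hour: ${one_hour:,.0f}")  # $192,000; $8,000
```

Even under this simplified model, the gap between a day-long outage and a one-hour outage is the kind of difference a plan is meant to buy.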
- 1 in 5 companies lost critical applications over a period of several days
- 36 percent of companies lost critical apps, VMs and data for several hours
- 60 percent of respondents did not have a well-documented disaster recovery plan
- 40 percent admitted that their current plan was ineffective when it was actually needed
- Over 65 percent found their plans failed when put to the test
- 12 percent found that their data backups were not recoverable for hours, while 2.5 percent found those backups were never recoverable at all
- Over 30 percent never fully recovered from the loss
- 50 percent of disasters were attributed to software or network failures
- Over 40 percent of failures were human errors (not malicious)
The Disaster Recovery Preparedness Council report goes into far more detail than what’s above, covering a large number of trouble areas related to disaster recovery plans. It is perhaps one of the best primers you’ll find on why a disaster recovery plan is so important.
Far too many businesses ignore disaster recovery or undervalue its importance until it’s too late. We’ll dive into developing disaster recovery plans, but first, let’s clear up a bit of confusion.
‘Data backup/restoration’ and ‘disaster recovery’ are not the same thing
If you constantly get these two confused, don’t worry. You’re in good company. But while they do go hand in hand, they are not the same thing.
Data backup and data restoration are, respectively, the actions you take before and after a loss event.
Disaster recovery, meanwhile, is the entire recovery plan, encompassing everything from replacing hardware to contacting your insurance company. To put it simply, data restoration is just one small piece of the disaster recovery puzzle.
Think of it like this: If the Samsung Galaxy Note 7 you use for business spontaneously bursts into flames, you’ll need more than just data restoration. You’ll need to file a claim, get a new device, restore your lost files from backup and perhaps even have your IT professional(s) reimage the device to work with company servers and virtual machines. You’ll also have to restore user settings, restore things such as email forwarding and reinstall any necessary applications. That goes far beyond restoring files. In fact, most of the process doesn’t involve file restoration at all, which, quite frankly, is the easiest and perhaps quickest part.
Now imagine you purchased your whole team Samsung Galaxy Note 7s and they all burst into flames. Your one IT professional might be able to set up your own replacement device in a few hours while you switch over to a laptop or desktop to continue business operations, but he or she is going to need several days to handle the entire team’s new devices. You’re looking at a lot of lost productivity, something a simple data backup won’t solve. If you have no in-house IT professionals and rely on a managed services company, you may be in even more trouble.
‘Business continuity’ and ‘disaster recovery’ are not the same thing
This goes right alongside the idea expressed above. Just as data restoration is not a catch-all for disaster recovery, business continuity does not encompass the entire concept of disaster recovery either. Instead, business continuity is better understood as the end goal of disaster recovery.
Immediately following a disaster that threatens to keep your business from actually doing business, your first question is going to be “How long will it take us to get back up and running?” In reality, the proper question should be “How long will it take us to resume normal operations?” The word “normal” is key here.
Following a disaster, a proper disaster recovery plan should allow your business to keep operating under contingency plans while you simultaneously work to restore what was lost. In other words, the question is not how long it will take to get back up and running, but how long it will take to return to normal, smoothly running operations. Immediately following a disaster, you should take a tiered approach.
Your business should be able to operate without all of the regular systems you rely on for some time until everything is back to normal. If you have no plan in place for doing that, you’re taking on a fair amount of risk.
What does disaster recovery entail?
It’s best not to think of disaster recovery as a “one size fits all” plan. In fact, every company’s disaster recovery plan is going to be different. What goes into your own plan can be affected by many factors, such as company size, industry, number of employees, organizational and management structure, insurance policies held, federal regulations, technology ownership, physical location, and more. This can make disaster recovery seem overly complex. However, a solid disaster recovery plan only needs a heavy time investment at the beginning. Afterward, you can update it as needed.
A disaster recovery plan can be split into the following sections:
- Types of disasters
- Delineation of responsibilities
- Immediate responses (disaster specific)
- Intermediate recovery actions
- Long-term recovery actions
Additionally, when developing your disaster recovery plan, you can follow these steps:
- Run a Business Impact Analysis (BIA)
- Compile a list of your vulnerabilities through a more detailed risk assessment
- Delineate responsibilities for immediate steps
- Decide on immediate actions to take for each vulnerability (if some vulnerabilities have similar results, you can group them)
- Evaluate feasibility of immediate actions
- Develop plans for intermediate actions (new responsibility delineations as necessary)
- Evaluate feasibility of intermediate actions
- Develop plans for long-term actions (new responsibility delineations as necessary)
- Evaluate feasibility of long-term actions
- Test each broad vulnerability category
- Run cost projections for individual steps and for the plan as a whole
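The final step, running cost projections, is simple arithmetic once each step has an estimate. The Python sketch below totals per-step estimates by phase and for the plan overall. The `PlanStep` type, the phase labels and every dollar figure are hypothetical, for illustration only, not benchmarks.

```python
from dataclasses import dataclass


@dataclass
class PlanStep:
    name: str
    phase: str             # "immediate", "intermediate" or "long-term"
    estimated_cost: float  # hypothetical estimate in dollars


def project_costs(steps: list[PlanStep]) -> tuple[dict[str, float], float]:
    """Return per-phase subtotals and the total projected cost of the plan."""
    subtotals: dict[str, float] = {}
    for step in steps:
        subtotals[step.phase] = subtotals.get(step.phase, 0.0) + step.estimated_cost
    return subtotals, sum(subtotals.values())


steps = [
    PlanStep("Notify staff to work from home", "immediate", 0),
    PlanStep("Activate mirror site", "immediate", 500),
    PlanStep("Buy replacement server", "intermediate", 6_000),
    PlanStep("Bare metal restore", "long-term", 1_200),
]

subtotals, total = project_costs(steps)
for step in steps:
    print(f"{step.name}: ${step.estimated_cost:,.0f}")
for phase, cost in subtotals.items():
    print(f"{phase} subtotal: ${cost:,.0f}")
print(f"Plan total: ${total:,.0f}")  # $7,700
```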
Let’s take a look at each of the steps you might take while developing a disaster recovery plan.
Run a Business Impact Analysis
A Business Impact Analysis (BIA) should occur before you create any other part of your disaster recovery plan. Through your BIA, you’ll run projections for:
- Which business units are at risk
- Who within those business units is most responsible for handling emergencies
- What role/function each unit primarily performs
- What functions within those units are most critical or that “control” the operations of that unit
- The amount of time you’ll need to recover if those critical functions go down
- The point in time to which data must be restored (e.g., which files need restoration, based on the last backup date)
- Which functions interoperate, and which additional functions need restoration for the whole system to get back to running as normal
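The interoperation question lends itself to a small amount of code. If you record which functions depend on which, a topological sort produces an order in which nothing is restored before the things it relies on. The Python sketch below uses the standard library's `graphlib` (Python 3.9+). Among functions whose dependencies are already restored, it picks the one with the shortest tolerable recovery time (often called a recovery time objective, or RTO) first. The `BusinessFunction` fields, the function names and the hour values are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class BusinessFunction:
    name: str
    unit: str          # business unit the function belongs to
    owner: str         # person most responsible during an emergency
    rto_hours: float   # longest outage the business can tolerate
    depends_on: set[str] = field(default_factory=set)


def restoration_order(functions: list[BusinessFunction]) -> list[str]:
    """Order functions so each is restored after everything it depends on.
    Among functions that are ready, the shortest rto_hours goes first.
    Every name in depends_on must also appear in `functions`."""
    by_name = {f.name: f for f in functions}
    sorter = TopologicalSorter({f.name: f.depends_on for f in functions})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    ready = [(by_name[n].rto_hours, n) for n in sorter.get_ready()]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)  # may unlock functions that depended on this one
        for n in sorter.get_ready():
            heapq.heappush(ready, (by_name[n].rto_hours, n))
    return order


functions = [
    BusinessFunction("Online store", "Sales", "Store manager", 2, {"Web server", "Payments"}),
    BusinessFunction("Payments", "Finance", "Finance lead", 4, {"Network"}),
    BusinessFunction("Web server", "IT", "IT lead", 1, {"Network"}),
    BusinessFunction("Accounting", "Finance", "Finance lead", 48, {"Network"}),
    BusinessFunction("Network", "IT", "IT lead", 1),
]
print(restoration_order(functions))
# ['Network', 'Web server', 'Payments', 'Online store', 'Accounting']
```

The priority queue ensures that the online store, with its tight recovery target, is restored as soon as its dependencies are, rather than waiting behind lower-priority functions such as accounting.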
This is only the top level of your BIA. Additionally, you will need to consider certain vulnerabilities that exist for the primary functions, such as:
- Any processes that also rely on those primary processes and functions
- A priority ranking of important sub-processes, to determine order in which they are restored
- The point in time to which sub-process data must be restored (e.g., which files need restoration, based on the last backup date)
- Which sub-processes interoperate, and which additional functions need restoration for the whole system to get back to running as normal (as with above)
- A cost analysis for the primary process, such as the annual revenue associated with that function (to help project how much a disaster event will cost, down to the hour)
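The cost analysis in the last item usually starts by spreading a function's annual revenue over the hours it actually operates. Below is a minimal Python sketch with hypothetical revenue and schedule figures. It counts only lost revenue, not recovery expenses or lost customer goodwill, so it likely understates the true cost of an event.

```python
def hourly_revenue(annual_revenue: float, operating_hours_per_year: float) -> float:
    """Revenue a function generates per operating hour (revenue only)."""
    if operating_hours_per_year <= 0:
        raise ValueError("operating_hours_per_year must be positive")
    return annual_revenue / operating_hours_per_year


def outage_revenue_loss(annual_revenue: float, operating_hours_per_year: float,
                        outage_hours: float) -> float:
    """Revenue lost if the function is down for outage_hours of operating time."""
    return hourly_revenue(annual_revenue, operating_hours_per_year) * outage_hours


# Hypothetical online store that takes orders around the clock
store_hourly = hourly_revenue(876_000, 24 * 365)      # 100.0 per hour
# Hypothetical billing function staffed 8 h/day, 5 days/week, 52 weeks/year
billing_hourly = hourly_revenue(416_000, 8 * 5 * 52)  # 200.0 per hour

print(outage_revenue_loss(876_000, 24 * 365, 6))  # 600.0
```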
What this looks like will invariably be different for different businesses. However, here’s a potential scenario:
Let’s say you run a very small online marketplace that sells your home-baked goods: cakes, pies, etc. Delicious! As the business is just you, you don’t have to worry about anyone else; you are the primary contact person in charge of each “department”. For you, the most critical assets are going to be your physical property, your online store, and your computer equipment. If a fire breaks out in your home (an unwelcome but reasonable scenario), you could lose your home or more. But let’s say all that was lost was your computer equipment.
For you, that computer may have held critical data you need, such as customer information, personal business information, and accounting records. It’s also your direct line to your customers. How long would it take you to replace that lost data? One day? A week? Your Business Impact Analysis should examine how long it will take to resume some semblance of activity while you work on recovering the rest of your resources. You’ll need to purchase new computer equipment, as well as do a full restore of any lost files (assuming you had a recent backup). You’ll also need to consider how much that downtime is going to cost you.
It’s unlikely that you, as a home baker, would have your own company servers. More likely than not, you’ll keep valuable data on cloud servers, although you may have necessary data on your physical machine. Access to your online store is integral, however, as this is how you receive new orders and process important customer data. Running a BIA should help you quickly evaluate how fast you can regain access, and how quickly you can regain any lost data from your physical machine.
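How much data the baker would lose comes down to the gap between the last good backup and the moment of the fire, which is often formalized as a recovery point objective (RPO). A minimal Python sketch follows. The nightly 2:00 a.m. backup, the afternoon fire and the 24-hour RPO are all hypothetical values.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Everything written after the last good backup is at risk."""
    if incident < last_backup:
        raise ValueError("incident cannot precede the last backup")
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True if the data lost in this incident stays within the recovery point objective."""
    return data_loss_window(last_backup, incident) <= rpo


last_backup = datetime(2024, 3, 1, 2, 0)  # hypothetical nightly backup at 2:00 a.m.
fire = datetime(2024, 3, 1, 15, 30)       # hypothetical fire that afternoon

print(data_loss_window(last_backup, fire))                 # 13:30:00
print(meets_rpo(last_backup, fire, timedelta(hours=24)))  # True
```

If 13 and a half hours of lost orders is too much, the fix is more frequent backups, not a faster restore.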
Every business will have a different BIA. The larger your business, the more time you’ll have to invest in performing your BIA. However, consider this an absolutely necessary first step to creating your disaster recovery plan, as the information you discover within your BIA will essentially guide your entire plan.
Consider your specific vulnerabilities
A BIA deliberately avoids this level of specific detail. Next, however, consider where your business’s specific vulnerabilities lie. This may actually be the fun part of disaster recovery planning, as you can consider pretty much anything at first, then whittle down your list or lump your risks together into broader categories.
Lists could easily include:
- DDoS attacks
- Stolen/Lost company laptops
- Viruses or malware infections
- Terrorist attack
- Hacked and stolen customer data
- Server downtime for various reasons
Listing out your actual threats is important. It helps you understand the extent of your business, as well as who will be needed during the disaster recovery process. As stated, some threats are similar in nature and can easily be grouped into broader categories. For example, a fire, flood or earthquake may have a similar effect and could be grouped into one category: “Natural Disaster”. Meanwhile, a DDoS attack and generic server downtime could be considered equivalent, as the end result is going to be the same. On a similar note, hacked and stolen data may well be considered similar to a stolen or lost company laptop, although the end result here is not always going to be the same.
With this knowledge in mind, you can more easily connect your actual threats to your BIA. Which threats are going to affect your higher-level processes? Which ones are going to require more recovery time? Consider this first before you begin to take the next step.
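Grouping threats and connecting them to your BIA amounts to two lookup tables. The Python sketch below is one way to keep them. The threat names, categories and affected functions are hypothetical examples drawn loosely from the list above, and you would substitute your own.

```python
from collections import defaultdict

# Hypothetical mapping of specific threats to broader categories with a similar end result
THREAT_CATEGORIES = {
    "Fire": "Natural disaster",
    "Flood": "Natural disaster",
    "Earthquake": "Natural disaster",
    "DDoS attack": "Server downtime",
    "Server hardware failure": "Server downtime",
    "Hacked customer data": "Data exposure",
    # Grouped with hacked data here, though the end result is not always the same
    "Stolen company laptop": "Data exposure",
}

# Hypothetical mapping of each category to the BIA functions it would disrupt
CATEGORY_IMPACT = {
    "Natural disaster": {"Online store", "Accounting", "Office access"},
    "Server downtime": {"Online store"},
    "Data exposure": {"Customer records"},
}


def group_threats(threats: list[str]) -> dict[str, list[str]]:
    """Group threats by category; unknown threats land in 'Uncategorized'."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for threat in threats:
        grouped[THREAT_CATEGORIES.get(threat, "Uncategorized")].append(threat)
    return dict(grouped)


def categories_affecting(function: str) -> list[str]:
    """List the threat categories that would disrupt a given BIA function."""
    return sorted(cat for cat, funcs in CATEGORY_IMPACT.items() if function in funcs)


print(group_threats(["Fire", "DDoS attack", "Flood", "Ragnarok"]))
print(categories_affecting("Online store"))  # ['Natural disaster', 'Server downtime']
```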
Delineate responsibilities and decide on immediate steps
Now that you’ve identified specific threats, it’s time to decide who will be immediately responsible for handling a response. Important: your IT person should not always be the first contact for all data loss events. Yes, your IT person or department is going to handle a good amount of the data recovery. However, many events require immediate actions that are best handled by department heads, the CEO, or everyone within the company. For the most part, you’ll need to make a contact list for each type of event. Who should be contacted first, and what are the first steps those individuals should take? The answers may differ depending on the actual event.
For example, if a flood hits your physical premises and damages your on-site servers, there are some immediate steps you’ll need to take to keep operations running. Everyone who works at the physical location will need to be told immediately to work from home (this prevents employees from showing up and floundering without getting any work done). If you have a secondary mirror site, your IT professionals will need to activate it so your secondary server is online and working for your off-site workers as well as anyone visiting the site.
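A per-event contact list can live in a spreadsheet. If you keep it alongside other tooling instead, a structure like the Python sketch below works. The event types, contact names, roles, phone numbers (taken from the fictional 555-01xx range) and first steps are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contact:
    name: str
    role: str
    phone: str
    first_step: str


# Hypothetical contact lists: who is called, in order, for each event type
CONTACT_LISTS: dict[str, list[Contact]] = {
    "Natural disaster": [
        Contact("Office manager", "Facilities", "555-0100", "Tell all staff to work from home"),
        Contact("IT lead", "IT", "555-0101", "Activate the mirror site"),
    ],
    "Data exposure": [
        Contact("CEO", "Executive", "555-0102", "Convene the incident team"),
        Contact("IT lead", "IT", "555-0101", "Isolate affected systems"),
    ],
}


def call_sheet(event_type: str) -> list[str]:
    """Return the numbered call order and first steps for an event type."""
    contacts = CONTACT_LISTS.get(event_type)
    if not contacts:
        raise KeyError(f"No contact list defined for {event_type!r}")
    return [
        f"{i}. {c.name} ({c.role}, {c.phone}): {c.first_step}"
        for i, c in enumerate(contacts, start=1)
    ]


for line in call_sheet("Natural disaster"):
    print(line)
```

Raising an error for an unknown event type makes gaps in the plan visible during testing instead of during a real emergency.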
Note that these two steps can constitute the first response in a disaster recovery, and can help ensure you have very little downtime in the case of a flood at your physical premises. Also note that this requires two things: a secondary mirror site for your servers and all workers understanding the protocol that should be followed in this instance.
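The mirror-site step hinges on one decision: keep serving from the primary while it is healthy, otherwise switch to the mirror. The Python sketch below captures that rule with a pluggable health check. `http_probe`, the site names and the health-check URLs are assumptions. The actual switch (for example, a DNS or load-balancer change) depends on your hosting and is outside this sketch.

```python
import http.client
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical health check: True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError and timeouts are OSErrors; bad URLs raise ValueError
        return False


def choose_active_site(primary: str, mirror: str,
                       is_healthy: Callable[[str], bool]) -> str:
    """Serve from the primary while healthy; otherwise fail over to the mirror."""
    if is_healthy(primary):
        return primary
    if is_healthy(mirror):
        return mirror
    raise RuntimeError("Primary and mirror are both down; escalate per the plan")


# Usage with a stub health check (a real setup might pass http_probe instead)
flooded = {"https://primary.example.com/health"}
active = choose_active_site(
    "https://primary.example.com/health",
    "https://mirror.example.com/health",
    is_healthy=lambda url: url not in flooded,
)
print(active)  # https://mirror.example.com/health
```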
Without a disaster recovery plan for this scenario (no mirror server site and no instruction for employees to work from home), your site would have been down for hours on end, and employees would have wasted time commuting back home, where they may or may not be able to perform their normal duties. A simple recovery plan in this situation could save tens of thousands of dollars.
Repeat the process with intermediate and final recovery steps
Once you have immediate steps in place, including who should be contacted first in different events and what steps those individuals should take upon contact, decide how to proceed in the intermediate and long term. Purchasing new equipment would be considered an intermediate step, for instance, while full bare metal recovery would be considered a long-term step.
In the case of stolen data, you may treat reporting the breach to the proper authorities and customers as an intermediate or long-term step, depending on your industry and legal requirements. Your first step, however, should always be to “stop the bleeding”, so to speak, and regain some semblance of normal operations while you work on fully restoring the business.
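Once steps are tagged by phase, producing a runbook in the right order is a stable sort. The Python sketch below uses the stolen-data scenario. The owners, actions and phase assignments are hypothetical. Python's `sorted` keeps steps in their listed order within each phase.

```python
from dataclasses import dataclass
from enum import IntEnum


class Phase(IntEnum):
    IMMEDIATE = 1     # stop the bleeding, keep operating
    INTERMEDIATE = 2  # e.g., purchasing new equipment
    LONG_TERM = 3     # e.g., full bare metal recovery


@dataclass
class RecoveryStep:
    phase: Phase
    owner: str
    action: str


def runbook(steps: list[RecoveryStep]) -> list[RecoveryStep]:
    """Order steps by phase; sorted() is stable, so ties keep their listed order."""
    return sorted(steps, key=lambda step: step.phase)


steps = [
    RecoveryStep(Phase.LONG_TERM, "IT lead", "Rebuild affected servers from bare metal backups"),
    RecoveryStep(Phase.IMMEDIATE, "IT lead", "Disconnect compromised systems"),
    RecoveryStep(Phase.INTERMEDIATE, "CEO", "Report the breach to authorities and customers as required"),
    RecoveryStep(Phase.IMMEDIATE, "Office manager", "Move staff to backup workflows"),
]

for step in runbook(steps):
    print(f"[{step.phase.name}] {step.owner}: {step.action}")
```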
Put your plan through the wringer with rigorous testing
It’s great to have a disaster recovery plan, as this can provide some feeling of security. However, that plan is going to be pointless if you don’t test it before an actual event occurs. Put every aspect of your plan to the test, including testing for every potential disaster or disaster type. You’ll want to test:
- How long the plan takes to enact
- Whether your immediate, intermediate and long-term steps are viable
- Whether your first contacts for each event are well-versed on their responsibilities
- How much downtime you’ll experience with your plan, versus without one
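Timing a rehearsal gives you real numbers for the first and last of these questions. The Python sketch below runs drill steps in order, times each with a monotonic clock, and compares the total against a target recovery time. The step names, the stub actions and the baseline figure are hypothetical. A manual step could be represented by a function that waits for someone to confirm it is done.

```python
import time
from typing import Callable, Optional


def run_drill(steps: list[tuple[str, Callable[[], None]]],
              target_seconds: float,
              baseline_seconds: Optional[float] = None) -> dict:
    """Run each drill step in order and report per-step and total durations.

    baseline_seconds is an estimate of downtime without the plan; if given,
    the report includes how much time the plan saved.
    """
    timings: dict[str, float] = {}
    drill_start = time.monotonic()
    for name, action in steps:
        step_start = time.monotonic()
        action()
        timings[name] = time.monotonic() - step_start
    total = time.monotonic() - drill_start
    return {
        "timings": timings,
        "total_seconds": total,
        "within_target": total <= target_seconds,
        "saved_seconds": None if baseline_seconds is None else baseline_seconds - total,
    }


# Stub steps for illustration; real ones might trigger a failover or a test restore
report = run_drill(
    [("Notify staff", lambda: time.sleep(0.01)),
     ("Activate mirror site", lambda: time.sleep(0.02))],
    target_seconds=60,
    baseline_seconds=4 * 3600,  # hypothetical: four hours of downtime without a plan
)
print(f"Total: {report['total_seconds']:.2f}s, within target: {report['within_target']}")
```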
Additional things to consider for your disaster recovery
Before you begin developing your disaster recovery plan, consider the following essentials for any business:
- Have a solid data backup plan in place, and include bare metal backup in that plan. Regular data backup typically includes only files. However, should you need to purchase or build an entirely new on-site server, you’ll need more than just files. A bare metal backup lets you restore the entire virtual machine, operating system, system settings, user settings and profiles, customer data and more. We like iDrive’s cloud backup solution for this purpose. iDrive provides a high-functioning, valuable service that will aid in this effort.
- Have a mirror site with a separate server at a separate location. The separate location is important, as keeping your main server and backup server on the same premises could easily result in you losing both at the same time. Ensure that your backup server directly mirrors your main one, so that you can immediately switch to the mirror site.
- Invest in malware/antivirus software that can be applied to company servers and computers quickly and efficiently. Viruses that attack servers can bring the house down, while infected computers can spread them across a network. Should a virus get in, have a plan in place to deal with it immediately.
- Keep physical copies of your disaster recovery plan. This one should be obvious: if all of your computers go down and your disaster recovery plan exists only in digital form, you have a problem.
- Ensure all employees have an emergency contact sheet to use should they notice a problem.
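The survey above found that some backups were not recoverable for hours, or at all, so it is worth checking restores rather than assuming them. The Python sketch below records SHA-256 checksums of a directory at backup time and later compares a test restore against them. It covers file-level data only, not a bare metal image.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """Return files from the manifest that are missing or changed after a test restore."""
    problems: list[str] = []
    for rel_path, expected in manifest.items():
        candidate = restored_root / rel_path
        if not candidate.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(candidate) != expected:
            problems.append(f"changed: {rel_path}")
    return problems
```

For example, you might call `build_manifest(Path("/srv/data"))` when a backup runs and store the result with the backup, then run `verify_restore(manifest, Path("/mnt/restore-test"))` after each test restore (both paths are hypothetical). An empty list means every recorded file came back intact.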