Regardless of whether your business is big or small, there’s always a chance of disaster. The disaster can be natural or manmade, but shrewd preparation is your best bet for warding off sheer calamity. To ensure you have every conceivable situation covered, it’s prudent that you set up a thoroughly vetted network disaster recovery plan. A network disaster recovery plan ensures that IT services can be backed up and put back online as quickly as possible. Every good organization has an IT recovery plan, and a network disaster recovery plan is just one element of this plan, albeit a crucial one.
There are various natural reasons for a disaster—such as fire, flooding, and earthquakes—but a major reason for network disasters beyond the scope of our environment is the nature of hardware itself: namely, technology failure. But technology failure can also occur as a result of malicious attacks from malware or hackers, and sometimes it can even be a result of naivety or incompetence on the part of the network administrator. There can be many reasons for network disaster, but every business needs a plan to recover from a worst-case scenario. In this article, we will discuss how to recover when your business is hit by a network disaster as well as make sure that the plan itself is watertight in the first place.
Setting up a plan
A major mistake made by many companies is not having a network disaster recovery plan in the first place. There are many reasons why they opt not to have one. The plan costs money, time, and effort. Company executives who only have their eyes on the bottom line don’t think far enough ahead and decide it is not worth the investment of time and money to protect the company from disaster. Tight-fisted CEOs will happily cut corners by gambling that such a catastrophe won’t happen to them, but this is like driving a car without insurance: risky and ill-advised. No one knows when a disaster may occur; consequently, if there is no recovery plan for such a disaster, the business will suffer immensely once such a disaster does occur. And given enough time, it will.
Before we take a look at how the plan should be constructed, we want to make an important note. This subject is full of initialisms, such as BCP (business continuity plan) and BIA (business impact analysis). Generally speaking, such initialisms are fine, or even welcome, but it is my opinion that the sheer number of them that would have to be remembered while reading this article would be a source of frustration, not convenience. Hence I have opted to write out all terms fully rather than shorten them to initialisms or acronyms.
The first step is to have a business continuity plan. But before a business continuity plan is drafted, one should carry out a business impact analysis. This analysis differentiates between critical and non-critical functions and estimates the income that would be lost in a disaster.
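As a rough illustration of the arithmetic behind a business impact analysis, the sketch below multiplies an hourly income figure by the length of an outage for each business function. The class name, field names, and all figures are hypothetical; a real analysis would use numbers supplied by the business and would likely account for more than direct income.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    critical: bool
    income_per_hour: float  # hypothetical figure supplied by the business


def estimated_lost_income(functions, outage_hours):
    """Return the estimated income lost per function for an outage of the given length."""
    return {f.name: f.income_per_hour * outage_hours for f in functions}


# Illustrative data only; the names and amounts are made up.
functions = [
    BusinessFunction("online orders", critical=True, income_per_hour=1200.0),
    BusinessFunction("staff newsletter", critical=False, income_per_hour=0.0),
]

losses = estimated_lost_income(functions, outage_hours=8)
critical_names = [f.name for f in functions if f.critical]
print(losses)
print(critical_names)
```

Separating the critical flag from the income figure keeps the two questions of the analysis (what matters most, and what it costs) visible side by side.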
After the impact analysis is ready, it’s time for risk and threat analysis. There could be many risks and threats both before and after a failure. Nobody is sure that only a certain type of impact will happen after a disaster. As such, the plan needs to analyze every type of impact that can occur before, after, or during a disaster. Each type of risk and threat needs to be carefully analyzed before going ahead.
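One common way to compare risks, which is an assumption here rather than something this article prescribes, is to score each one as likelihood multiplied by impact and rank the results. The 1-to-5 scales and the example risks below are purely illustrative.

```python
def rank_risks(risks):
    """Sort (name, likelihood, impact) tuples by likelihood * impact, highest first."""
    return sorted(risks, key=lambda risk: risk[1] * risk[2], reverse=True)


# Hypothetical scores on a 1 (low) to 5 (high) scale.
risks = [
    ("hardware failure", 4, 3),
    ("flood", 1, 5),
    ("malware", 3, 5),
]

ranked = rank_risks(risks)
for name, likelihood, impact in ranked:
    print(name, likelihood * impact)
```

A ranking like this is only a starting point for discussion; rare but catastrophic events may deserve attention even when their score is low.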
Every recovery plan should have a budget. The cost of failure gives an approximation for this budget. Two important goals need to be set while weighing the costs and benefits of your options: the recovery point objective and the recovery time objective.
A recovery point objective is the maximum acceptable level of data loss, expressed as a period of time: for example, “one day of data.” A recovery time objective is the maximum length of time for which stakeholders are likely to accept a loss of service.
The leaders of a business are responsible for setting the company’s recovery point objective and recovery time objective. Both goals are necessary for a recovery plan. The recovery point objective helps in estimating how much data might not be recoverable. Data is always important for a business, and it is essential to take stock of how much data was recovered and how much was lost. After a disaster occurs, a company needs a certain amount of time to recover from the loss. The recovery time objective sets the target for how long that recovery may take. Hopefully, with careful backup and planning, the loss under many scenarios will be minimal.
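To make the two objectives concrete, here is a minimal sketch that checks a backup interval against a recovery point objective and an estimated recovery time against a recovery time objective. The function names and all durations are hypothetical, and the check assumes the worst case in which a disaster strikes just before the next scheduled backup.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Worst case, everything since the last backup is lost, so the
    interval between backups must not exceed the recovery point objective."""
    return backup_interval <= rpo


def meets_rto(estimated_recovery_time, rto):
    """The estimated time to restore service must fit within the recovery time objective."""
    return estimated_recovery_time <= rto


# Hypothetical objectives: lose at most one day of data, be back within four hours.
rpo = timedelta(days=1)
rto = timedelta(hours=4)

print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups
print(meets_rpo(timedelta(days=30), rpo))   # monthly backups
print(meets_rto(timedelta(hours=6), rto))   # six-hour restore estimate
```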
Another important task is to create a list of impact scenarios. Each service in the company has a specific list of impact scenarios. You need to analyze all of the possible aftermaths from impact scenarios that can arise from varying types of disasters. This list deals with such needs.
Starting phase: recovery strategy
After the business continuity plan, you are ready to start the network recovery plan. The very first phase is the recovery strategy. Follow these steps carefully to ensure that you develop a solid recovery strategy.
- First of all, you need a list that contains the staff, software, hardware, and/or third-party services that you may need. This worksheet is known as your business continuity resource requirements. This list should have everything that is necessary for your plan to be fully realized.
- You now have the recovery time objective and must now prioritize each service requirement according to it.
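Prioritizing by recovery time objective can be sketched as a simple sort: the service that can tolerate the least downtime is restored first. The service names and durations below are invented for illustration.

```python
from datetime import timedelta


def recovery_order(rto_by_service):
    """Return service names ordered from the shortest recovery time objective to the longest."""
    return sorted(rto_by_service, key=rto_by_service.get)


# Hypothetical services and objectives.
rto_by_service = {
    "email": timedelta(hours=4),
    "payment processing": timedelta(hours=1),
    "file archive": timedelta(days=2),
}

order = recovery_order(rto_by_service)
print(order)
```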
As noted earlier, you should have created a list of impact scenarios. Along with the goals set in the business impact analysis, you can set various strategies, including the following:
- During a network disaster, the primary objective should be to bring back the network as soon as possible. Try as many different options as practicable to bring back the network so long as it is compatible with recovery time objectives.
- For each option, get the cost estimate and present it to the budget holder for approval.
- Analyze the cost and effectiveness of the option you choose. Select the option that has the better efficiency and budget compatibility.
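The selection step above might be sketched as follows: discard any option that misses the recovery time objective or exceeds the approved budget, then take the cheapest of what remains. The selection rule, the class, and the example options are assumptions made for illustration; a business may weigh effectiveness differently.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class RecoveryOption:
    name: str
    cost: float
    recovery_time: timedelta


def choose_option(options, rto, budget) -> Optional[RecoveryOption]:
    """Return the cheapest option that meets the recovery time objective
    and fits the budget, or None if no option qualifies."""
    viable = [o for o in options if o.recovery_time <= rto and o.cost <= budget]
    if not viable:
        return None
    # Ties on cost are broken in favor of the faster option.
    return min(viable, key=lambda o: (o.cost, o.recovery_time))


# Hypothetical options and figures.
options = [
    RecoveryOption("work from home", 2000.0, timedelta(hours=4)),
    RecoveryOption("standby site", 15000.0, timedelta(hours=1)),
    RecoveryOption("rebuild on site", 5000.0, timedelta(hours=72)),
]

chosen = choose_option(options, rto=timedelta(hours=8), budget=10000.0)
print(chosen.name if chosen else "no viable option")
```

Returning None rather than the least-bad option makes it explicit when the budget or the objective itself needs to be revisited.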
There could be many options for the recovery strategy. It solely depends upon the nature of your business. For example, if your office is not available because of a natural or manmade disaster, switch the staff to a home-office or offsite configuration. Use the internet for a network until everything is recovered. This could be a suitable strategy, but obviously it depends on the requirements of your business. If your business has multiple locations and one of them is out because of a disaster, you can switch to another location for the time being, or even spread out the affected employees to minimize impact at the other locations. There could be many viable options, but these options heavily depend upon the nature of the business.
Develop the network recovery plan
We have already gone over the business continuity plan and recovery strategy, but now it’s time to develop the network recovery plan. To put things concisely: modern businesses usually have a lot of data. In some cases, entire companies can be laid to waste if their recovery plan is obsolete or non-existent. Whenever a network disaster occurs, you need to be ready to handle the data. Data backup is the best you can do to save as much data as possible before everything is lost. However, it is also crucial to stress how important regular backups are. For example, it does little good doing only monthly backups for a daily newspaper. You need to select a backup schedule that makes sense for your business. In some cases, hourly backups might not suffice; instead, real-time cloud backups are needed to completely eliminate the risk of any data whatsoever being lost.
There are almost always multiple departments in a business. While you are saving the data, other departments should be saving receipts, paper invoices, and other physical records that might keep some records archived without needing the network. Some disasters (such as a fire) would rule out this possibility, which is why it’s important to be thorough in your plan. You might already have a pre-existing plan containing recovery point objectives and recovery time objectives, but these are always subject to change and should be checked routinely, not just after a disaster when it may be too late.
Testing the plan is important. A disaster can occur at any time, or it may never occur during the operation of a business. But just setting out the plans is not enough: you also need to test the continuity plan to make sure that, in the time of a disaster, everything goes smoothly. You might need additional resources, or the staff might need some training before they can handle everything properly during a disaster.
There is a variety of methods that can be used to test the business continuity plan. Let’s discuss a few of them.
- Plan review: Here, the testing is done by high-level officials such as department heads. The main goal is to review the plan developed earlier. They can review if the plan is effective or not, or if it needs improvements. There are plenty of matters to review, including making sure the contact details of the staff are up to date and that the budget for recovery is sufficient. For a business with a large number of employees, a plan review may include training managers, who in turn can pass the knowledge on to their team.
- Tabletop exercise (structured walkthrough test): This type of exercise is done at the core of the business. It usually targets a single aspect of the business continuity plan. It makes sure all the personnel involved in the plan are aware of and familiar with the relevant portions of the business continuity plan as well as their role in a disaster/event.
Typically, there is a discussion of one or more disaster scenarios, during which the responsibilities are outlined, response procedures are reviewed, and necessary improvements are uncovered.
- Walkthrough drill (simulation test): Plan reviews and tabletop exercises are general discussions, while a walkthrough drill is a hands-on version. Here, the drills are created with a small team in mind, or perhaps multiple small teams that are expected to work closely at the time of disaster recovery. Many actions, such as restoring backups, live testing of redundant systems, and other relevant processes, are performed during the walkthrough drill. A walkthrough drill may also include a simulated response at alternate locations, validation of response processes/systems, and varying degrees of notification and resource mobilization. This walkthrough is done to ensure a team (and, more crucially, each individual within the team) knows their role in the recovery process.
- Full recovery test (functional test): This is perhaps the most complex part of the operation. In a functional test, complex activities such as deploying your backup systems and processing transactions or data are performed. These tests are performed as though a real disaster has occurred. It generally involves the diligent cooperation of almost every department in a business.
Those are the common testing methods. Just as important is how often a business should perform these tests. There are no set rules about this; it depends on the size of the business, its industry, and the time, staff, and resources available. As a general rule of thumb, however, tabletop and walkthrough exercises should be performed annually and should cover multiple scenarios, with higher-risk scenarios given priority. The full recovery test is a big undertaking and should be done every other year. It can cost a significant amount of money, time, and human resources, but it should not be avoided altogether.
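The rule of thumb above can be turned into a small reminder script. This is only a sketch: the test names, the dictionary of intervals, and the dates are hypothetical, and the intervals simply encode "annually" and "every other year" as stated.

```python
from datetime import date

# Intervals in years, following the rule of thumb in the text.
INTERVAL_YEARS = {"tabletop": 1, "walkthrough": 1, "full_recovery": 2}


def next_due(kind, last_run):
    """Return the date on which the given kind of test is next due."""
    target_year = last_run.year + INTERVAL_YEARS[kind]
    try:
        return last_run.replace(year=target_year)
    except ValueError:
        # last_run was 29 February and the target year is not a leap year.
        return last_run.replace(year=target_year, day=28)


def overdue(last_runs, today):
    """Return the kinds of test whose next due date has already been reached."""
    return sorted(kind for kind, ran in last_runs.items() if next_due(kind, ran) <= today)


# Hypothetical history of test runs.
last_runs = {"tabletop": date(2023, 1, 10), "full_recovery": date(2023, 1, 10)}
print(overdue(last_runs, today=date(2024, 6, 1)))
```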
Remember to involve vendor partners in the testing processes as much as you can. This not only provides better accuracy and usability, but the feedback received from vendors can also help in making improvements. Finally, make sure all testing processes are documented. The documentation will be useful the next time you test.
Plan for each scenario
We’ve now established the methods to test the business continuity plan, but there is more than one scenario for a given disaster. There could be an unlimited number of scenarios, but looking at the most likely ones is a good start. It could be a fire that completely destroys the building, or it could be a hacker attack. Of course, the plan for the fire will differ from the plan for the hacker’s attack. Therefore, there should be a plan for each of the most likely scenarios. Hardware could fail in many ways: mechanical failure, an electromagnetic pulse (EMP) attack, demagnetization, fire, and so on. The important aspect is how you handle the loss of data.
There will be variations in each plan, but there will also be some common elements in each plan. Here are some important steps that should be considered in each plan:
- Each scenario has a plan, but it should be carefully determined when the plan itself is triggered.
- If a disaster occurs, the key staff are to be informed. What you need here is a contact list with the contact numbers of all key staff.
- There is always a recovery leader. The leader’s contact details (and the details of the deputies) should be provided. It should be known who will take charge when the recovery leader is not available.
- A task sheet for each task. It should name the person responsible for delivering the task and the deadline for doing so.
- A checklist containing all the hardware requirements attached to each task description.
- A list containing the contact details of preferred suppliers for each piece of equipment. It also contains the number of each item.
- Temporary workarounds and their descriptions.
- Details of the recovery service host. If there is any cloud backup server or other agreement with the managed service providers, their contact details would be there; these usually include phone numbers, email addresses, account numbers, and so on.
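The point about knowing who takes charge when the recovery leader is unavailable can be expressed as an ordered chain of command. The sketch below is a minimal illustration; the role names are placeholders, and how availability is determined in practice is left open.

```python
def person_in_charge(chain_of_command, available):
    """Return the first available person in the chain (leader first,
    then deputies in order), or None if nobody can be reached."""
    for person in chain_of_command:
        if person in available:
            return person
    return None


# Placeholder roles; a real plan would list names and contact details.
chain_of_command = ["recovery leader", "first deputy", "second deputy"]

print(person_in_charge(chain_of_command, available={"first deputy", "second deputy"}))
```

Keeping the chain as an explicit ordered list in the plan removes any doubt, during an incident, about who is next in line.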
Multiple copies of the plan should be stored in digital format and spread over several sites. If you only have one site, then it should be stored at the backup server, which may be self-managed on a cloud service or managed by a third party as part of a storage and maintenance package.
There should also be multiple copies of the plan in hard copy, and they should be spread over several sites as well. If there is only one site, the plan should be stored far away from the primary location.
Maintaining the plan
As alluded to earlier, a common mistake that many businesses make is not maintaining the plan. Once it is created, it does not mean it will work fine forever. It should never be neglected because the organization and its network can change over time. As such, the plan needs to be updated regularly so that it can work properly in accordance with the changes in personnel, services, equipment, sites, and business processes.
There is no fixed schedule for reviewing the plan, but a review every six months is recommended. If any key staff are replaced, the new members should be trained as part of the onboarding process, and other members of staff should be notified of the replacements when they occur. If the plan changes, it should be retested without delay. Overall, the plan should be maintained properly, and no aspect should be neglected when it changes.
The plan is made and everything is where it should be. But what if, say, some kind of malware attacked your digital copies of the recovery plan during a network disaster? Or if the plan was written on paper placed somewhere in the building and that part of the building is not accessible because of some kind of natural disaster? To overcome such situations, you should make sure that the plan is suitably secured in multiple mediums.
It might seem wise to have the plan shared among the people within an organization, but make sure none of them has a copy of it on their desktops or in any paper format where it can fall into the wrong hands. The plans should always be kept secure, as a malicious individual could compromise the plan or exploit weaknesses in the plan to wreak havoc on a business (aka industrial espionage). A number of testing exercises will be performed periodically, but do not just expose the entire plan to everyone. To prevent a full leak, only hand out plans to personnel on a need-to-know basis. A single copy of the plan should be placed on a secure onsite server, and a carefully secured backup should be kept offsite for redundancy. At most, give access to three key people with a vested interest in the continued success of the company.
Backing up data is one of the most important tasks at the time of a network disaster (as well as before it). Data is frequently generated in large volumes, and its complexity and scope can change drastically throughout the workday. There is always a risk that data becomes lost, corrupted, overwritten, stolen, or damaged through hardware failure, human error, malware, or hacking. To avoid such situations, you need to make an effective plan for data backup.
Data backup strategy
Earlier, we discussed how important it is to perform data backup on a regular basis. The data backup strategy should be included in the business continuity plan. Here are three integral steps for the strategy to be effective:
- Identify the data to back up.
- Select and implement hardware and software backup procedures.
- Schedule and conduct backups and periodically validate that the data has been accurately backed up.
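One simple way to validate that a file has been backed up accurately is to compare a cryptographic hash of the original with a hash of the copy. The sketch below uses SHA-256 from Python's standard library; the file names and contents are made up, and a real backup product would normally provide its own verification.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(source, backup):
    """Return True if the backup has exactly the same contents as the source."""
    return sha256_of(source) == sha256_of(backup)


# Demonstration with throwaway files; names and contents are illustrative.
with tempfile.TemporaryDirectory() as folder:
    source = Path(folder) / "ledger.csv"
    good_copy = Path(folder) / "ledger.backup.csv"
    bad_copy = Path(folder) / "ledger.damaged.csv"
    source.write_bytes(b"invoice,amount\n1001,250.00\n")
    good_copy.write_bytes(source.read_bytes())
    bad_copy.write_bytes(b"invoice,amount\n1001,25")  # truncated copy
    intact = backup_is_intact(source, good_copy)
    damaged_detected = not backup_is_intact(source, bad_copy)

print(intact, damaged_detected)
```

A matching hash shows the copy is identical to the source at the time of the check; it does not replace a periodic test restore.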
Developing the data backup plan
Let’s look at the key steps for crafting the perfect data backup plan. Identify the data on network servers, laptop computers, desktop computers, and wireless devices that needs to be backed up. Do not forget to make backups of vital hard copy records; this could include property deeds or license certificates (among other records and documents). This so-called digitization can be accomplished by scanning paper records into digital formats. These scans should be backed up along with the existing digital data. The plan should include regularly scheduled backups from wireless devices, laptop computers, and desktop computers to a network server. Regular backups will be critical when a disaster occurs.
Options for data backup
After the plan and strategy have been formulated, you need to choose where to store the backups. There are many options, of course, but currently, tapes, cartridges, and large capacity USB drives are common choices. These options can be supplemented with data backup software and encrypted cloud backup through a third-party service. The security level of the backup should be the same as the security level of the original data. There should be no compromise.
Sometimes the disaster can be so severe that all the network devices may get destroyed. The business would need a plan for restarting the network and acquiring replacement equipment. Track all the settings of the switches and routers so it would be easy to set them up again from scratch. Try not to change these settings, or at least monitor changes if they occur. You could run an audit of the settings at regular intervals or as part of a recovery test.
An efficient option is to use a configuration management tool to standardize the setup of all the devices. It is always good to have a similar configuration for all the devices. Having different configurations will only increase problems during recovery.
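Auditing device settings against a stored baseline can be illustrated with a plain text comparison. The sketch below uses Python's difflib to report drift between two configuration texts; the configuration lines are invented, and it does not show how a configuration would be fetched from a real switch or router.

```python
import difflib


def config_drift(baseline, current):
    """Return unified-diff lines between two configuration texts.
    An empty list means the current configuration matches the baseline."""
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            current.splitlines(),
            fromfile="baseline",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical configuration snippets.
baseline = "hostname sw-01\nvlan 10\nspanning-tree mode rapid"
current = "hostname sw-01\nvlan 20\nspanning-tree mode rapid"

drift = config_drift(baseline, current)
for line in drift:
    print(line)
```

Running a check like this at regular intervals, or as part of a recovery test, gives a record of exactly which settings changed since the baseline was saved.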
We discussed how important and necessary backup and network configuration management are. There are various tools available. One of the most popular vendors is SolarWinds.
SolarWinds MSP Backup is a cloud-based subscription solution that operates data centers globally. It provides quick and secure data recovery. Data compression is used to speed up transfers, and strong AES encryption is used for communications. All stored data is encrypted as well, which is by no means true of all backup systems. That means even the data center staff cannot read it, whether they want to or are compelled to by a third party. In addition to being an excellent option for data backup, it offers a web-based console where all backup and recovery tasks can be controlled.
Another tool, the SolarWinds Network Configuration Manager, can be downloaded and run on-site. It is great for tightening up device security on the network as well as preparing to restore the system as part of the network recovery. It is not a cloud-based tool; instead, it runs on Windows Server. The standardized device configurations of the business can be stored in this tool, and it reloads them if any unauthorized change is detected. A 30-day trial version is available for download.
Network disaster recovery
It finally happened: a network disaster! Thankfully, though, your business has done everything it reasonably can to mitigate the damage. Let’s discuss the most important points for effective recovery.
- All individuals should know their role in the recovery.
- The business continuity plan should be followed properly.
- If a disaster affects the facility, try to relocate as soon as possible. If there is no alternative location available, the staff can work from home.
- Perform data backup processes as soon as possible. Saving data should be the highest priority aside from protecting the well-being of people during a disaster.
- Recover the equipment as soon as possible. Try your best to save the equipment during a disaster except when doing so would put lives at risk.
- Waste no time if the devices and equipment need to be replaced.
As mentioned earlier, a network disaster recovery plan should be there for any business—regardless of the size. Yes, the recovery plan costs money, time, and effort, but it is worth it because no one knows when a disaster will occur. Without a recovery plan, everything can be lost. Don’t leave things to chance!