Why data backup is important
In information technology, data backup is the practice of copying data from a primary to a secondary location so that it can be recovered if the original is lost or corrupted, whether through a disaster or by accident.
The verb form, referring to the process of doing so, is “back up”, whereas the noun and adjective form is “backup”. Backups can be used to recover data after deletion or corruption, or to restore data as it existed at an earlier point in time, including older copies of files you have since deleted from your system.
Today, companies and individuals are very dependent on data. Many businesses and organizations protect their critical data with backup, making it one of the key components of a company’s Disaster Recovery and Business Continuity Plan. Organizations that do not have proper disaster recovery plans in place will struggle to survive when disaster strikes. The goal of disaster recovery is to minimize the effects of a disaster or disruption. It means taking the necessary steps to ensure that resources, personnel, and business processes can resume operation promptly. A backup strategy is an integral part of a disaster recovery plan, and organizations should designate a Backup Administrator to drive the company-wide backup strategy.
Data backup strategy has become so important that compliance with many data protection regulations is hard to demonstrate without evidence of an effective backup system in place. The available types of backup operations have changed over the years to offer different balances of speed, security, and resource use. There is no single type of backup, and certainly no one-size-fits-all backup method for every situation. The advent of cloud computing has added further options: backups can now be kept on-site, in the cloud, or in a hybrid of the two.
The most common types of backup methodologies are full backup, mirror backup, incremental backup, and differential backup. A full backup copies everything and is usually done the first time you back up a system. Restoring from a full backup takes the least time, which helps in meeting the recovery time objective (RTO). The disadvantages are that a full backup takes longer to perform and requires more storage space, which is why cloud storage is often preferred for it. Mirror backup is usually used on-premises and often involves external hard drives or disks. Incremental backups are generally a better fit for cloud backups because they use fewer resources. We will now focus our attention on the differential backup method.
What is Differential backup?
Differential backup is a cumulative backup of all files that have changed since the last full backup. For example, if you did a full backup yesterday and run a differential backup today, only the files that changed since yesterday are backed up. If you did the full backup on Sunday, then on Monday you back up only the files that changed since Sunday; on Tuesday you again back up the files that changed since Sunday, which includes Monday’s changes; on Wednesday you do the same, and so on, until the next full backup. The idea behind differential backups is to save storage space compared with repeated full backups and restoration time compared with long chains of incremental backups. Since changes to data are generally few compared to the entire data repository, a differential backup completes in less time than a full backup of the same repository would.
Day | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
---|---|---|---|---|---|---|---|---|
Backup Type | Full | Differential | Differential | Differential | Differential | Differential | Differential | Full |
Effect | N/A | Changes since Sunday | Changes since Sunday | Changes since Sunday | Changes since Sunday | Changes since Sunday | Changes since Sunday | N/A |
Table 1.0 | Differential backup
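As a rough illustration of how a differential run decides what to copy, the sketch below selects every file modified after the last full backup. It assumes that file modification times are a reliable change signal, which is a simplification; real backup products may rely on archive flags, snapshots, or block-level tracking instead. The function name `files_changed_since` and the demo file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def files_changed_since(root: Path, last_full_backup: float) -> list[Path]:
    """Return files under `root` modified after `last_full_backup` (a Unix timestamp)."""
    changed = [
        path
        for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime > last_full_backup
    ]
    return sorted(changed)


# Demo in a throwaway directory: one file older than the full backup, one newer.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "unchanged.txt").write_text("old")
    (root / "edited.txt").write_text("new")
    os.utime(root / "unchanged.txt", (1_000, 1_000))  # modified before the full backup
    os.utime(root / "edited.txt", (3_000, 3_000))     # modified after the full backup
    selected = files_changed_since(root, last_full_backup=2_000)
    print([p.name for p in selected])  # only edited.txt is selected
```

Because the comparison point is always the last full backup rather than the previous run, a file edited on Monday keeps being selected on Tuesday, Wednesday, and so on until a new full backup resets the timestamp.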
The advantage is that differential backups are more flexible and take less time than full backups because much less data is being backed up. They also restore more quickly than incremental backups, requiring only the full backup and the last differential backup to restore the entire data repository. The disadvantage is that the amount of data being backed up grows with each differential backup until the next full backup. For each day elapsed since the last full backup, more data needs to be backed up, especially if a significant proportion of the data has changed, so backup time is longer than with the incremental backup method.
Advantages | Disadvantages |
---|---|
Faster backup compared to a full backup | Each differential backup stores more data than the one before it, so the backed-up data becomes progressively larger over the course of a full backup cycle. |
Faster recovery time, as only the full backup and most recent differential backup are needed to restore the whole repository. | Restoring individual files may be difficult or time-consuming, as they must be located within a larger backup. |
Since only data changed after the last full backup is targeted, less storage space is consumed than with repeated full backups. | A full backup is required before the differential backup process can begin. |
At most two backup media sets are needed to restore data. | If the full backup or the differential backup you need to restore from fails, the data recovery process cannot be completed. |
Table 2.0 | Advantages and disadvantages of differential backup
How differential backup and recovery works
The differential backup process begins with a full backup, in which a copy of everything is made. The full backup upon which a differential backup is based is known as the base of the differential. Once the full backup has been performed, differential backups use it as the base for comparison. Then, as files are changed or created before the next full backup, they are marked for differential backup. Usually, the Backup Administrator puts the differential backup schedule in place, and each time it runs, the cumulative changes are copied.
For instance, if a full backup is done on Sunday, Monday’s differential backup backs up all the files changed or created since Sunday’s full backup. Tuesday’s differential backup again backs up all the files changed since Sunday’s full backup, including the files changed on Monday, and the cycle continues daily. If a disaster strikes and you need a complete restore, only the last full backup and the latest differential backup are required. Every organization must run a full backup at least once. After that, you have to decide which backup method to adopt going forward: incremental or differential.
The first partial backup performed after the initial full backup will back up the same data, whether it’s incremental or differential. The difference appears from the second partial backup onward, which is the third backup operation overall. An incremental backs up only the changes since the previous backup, whether that was the full backup or an earlier incremental, whereas a differential backs up all changes since the last full backup. Choosing the optimal backup strategy usually involves making tradeoffs between performance, data protection levels, the total amount of data retained, recovery time, and cost. Running a weekly full backup plus daily differential backups offers a good compromise. More backup media sets are required to restore than with a daily full policy, but fewer than with a daily incremental policy. To restore data from any particular day, at most two media sets are required, which reduces both the time needed to recover and the risk of hitting an unreadable backup set.
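The claim about media sets can be checked with a small sketch that lists which backups are needed to restore a given day. It assumes a simple schedule in which each cycle uses one partial method only (all differential or all incremental) and that backups are listed in chronological order. The function name `restore_chain` and the day labels are hypothetical.

```python
def restore_chain(backups: list[tuple[str, str]], target: int) -> list[str]:
    """Return the labels of the backups needed to restore the state at index `target`.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "differential", or "incremental".
    """
    full_indexes = [i for i in range(target + 1) if backups[i][1] == "full"]
    if not full_indexes:
        raise ValueError("no full backup at or before the target")
    base = full_indexes[-1]  # most recent full backup
    label, kind = backups[target]
    if kind == "full":
        return [label]
    if kind == "differential":
        # A differential already contains every change since the full backup.
        return [backups[base][0], label]
    # Incremental: the full backup plus every incremental up to the target.
    return [backups[i][0] for i in range(base, target + 1)]


days = ["Sun", "Mon", "Tue", "Wed", "Thu"]
differential_week = [(d, "full" if d == "Sun" else "differential") for d in days]
incremental_week = [(d, "full" if d == "Sun" else "incremental") for d in days]

print(restore_chain(differential_week, 4))  # ['Sun', 'Thu']
print(restore_chain(incremental_week, 4))   # ['Sun', 'Mon', 'Tue', 'Wed', 'Thu']
```

Restoring Thursday needs two sets under the differential policy and five under the incremental one, and the incremental chain grows by one set for every further day in the cycle.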
Difference between differential and incremental backups
Generally, differential and incremental backups share something in common: both require at least one full backup copy in storage. Only differential backups are cumulative, however. The first time either method runs after a full backup, it copies the same data, namely everything changed since that full backup. Both also aim for faster backups and lower storage use than full backups. Nonetheless, differential backup differs from incremental backup in several ways.
Day | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday |
---|---|---|---|---|---|---|---|---|
Backup type | Full | Incremental | Incremental | Incremental | Incremental | Incremental | Incremental | Full |
Effect | N/A | Changes since Sunday | Changes since Monday | Changes since Tuesday | Changes since Wednesday | Changes since Thursday | Changes since Friday | N/A |
Table 3.0 | Incremental backup
Firstly, differential backups take up more storage space, which makes them less attractive for companies that cannot afford extensive storage. However, they require less recovery time, so organizations that cannot tolerate downtime will find differential backup more attractive, since it shortens the outage after a disaster or an attack. Secondly, unlike incremental backups, differential backups continue to copy all data changed since the previous full backup. This means that each differential backup stores more data than the corresponding incremental backup, but still far less than a full backup. Overall, the space and time required for differential backups fall somewhere between incremental and full backups. The table below highlights some of the key differences between differential and incremental backups:
Key Features | Differential | Incremental |
---|---|---|
Storage space | Requires more storage space than incremental. The longer the interval between full backups, the larger each differential becomes. | Requires less storage space than differential backup. |
Backup speed | In general, differential backups take more time to complete than incremental ones, and the gap widens as changes accumulate. Both are faster than a full backup. | Incremental backups are faster than differential backups, making them the fastest of the three. |
Recovery speed | Recovery from a differential backup is faster than from incremental backups, as fewer steps and fewer backup sets are needed to complete it. Actual recovery time also depends on how much data you need to restore, your network speed, and your choice of storage (local, cloud, or hybrid). | Restoring from incremental backups is more time-consuming because multiple backup copies must be restored in sequence. If any incremental copy since the last full backup is missing or corrupted, recovery takes longer and may be incomplete. Nonetheless, incremental backup recovery is the preferred choice if you only need partial restoration of recently added data. |
Cloud Suitability | Cloud providers bill customers based on the resources they consume, such as storage space and network bandwidth. So it makes sense to choose the backup type with the smallest data footprint, and differential backup isn't the best choice in that respect. | If you're performing cloud backup, incremental backups are generally better suited because they consume fewer resources. You might start with a full backup in the cloud and then shift to incremental backups. |
Cost Efficiency | As noted earlier, full backups are the most costly, as they require the most storage space, followed by differential backups. | Incremental backups are the most cost-efficient. However, the longer the interval between full backups, the more incremental copies accumulate, which takes up more space and raises cost. |
Table 4.0 | Comparison of differential and incremental backup
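To put rough numbers on the storage comparison, the sketch below computes the size of each day’s backup under both methods from a list of daily change volumes. The figures are invented for illustration, and the model assumes that each day changes different data, so a differential is simply the running total of the daily changes; if the same files change repeatedly, real differentials would be smaller than this.

```python
from itertools import accumulate


def backup_sizes(daily_changes_gb: list[int]) -> tuple[list[int], list[int]]:
    """Return (differential_sizes, incremental_sizes) for each day after a full backup."""
    incremental = list(daily_changes_gb)              # only that day's changes
    differential = list(accumulate(daily_changes_gb)) # everything since the full backup
    return differential, incremental


# Hypothetical change volumes in GB for Monday through Saturday.
changes = [2, 1, 3, 2, 1, 4]
differential, incremental = backup_sizes(changes)

print(differential)       # [2, 3, 6, 8, 9, 13]
print(incremental)        # [2, 1, 3, 2, 1, 4]
print(sum(differential))  # 41 GB stored if every differential is kept
print(sum(incremental))   # 13 GB stored across all incrementals
```

Under these assumptions the last differential of the week is as large as all the incrementals combined, which is the storage price paid for needing only two backup sets at restore time.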
Choosing the right backup method
Now that we have discussed differential backup in detail, you have the information you need to formulate the right backup strategy for your organization. The real question IT managers must ask is when to use each method, and how the methods should be combined to meet the business’s overall cost, performance, and uptime goals. Deciding between the various backup methods shouldn’t be about which is better, but about which best meets your business needs, especially your business continuity and disaster recovery plan. It’s important to remember that the purpose of backups is not actually to store files but to be able to recover them. If files can’t be restored after a disaster, then what’s the point of the backup?
So here are some things to keep in mind when deciding for your organization:
- Backup speed: how long a backup run takes is one of the most important factors, because each run has to finish in the time available for it.
- Recovery time objective (RTO): the maximum tolerable length of time between a failure or disaster and the point where operations resume.
- Recovery point objective (RPO): the maximum tolerable age of the data you restore, measured back from the moment of failure. It determines how much recent data you can afford to lose and therefore how often you must back up.
- Flexibility: your backup type must be able to scale from restoring a single file to restoring a whole server.
- Cloud backup: if you intend to back up to the cloud, consider a backup method that takes fewer resources, such as storage space and bandwidth.
- Cost efficiency: you don’t want a backup method that costs more to implement and maintain than necessary.
- Schedule: your backups must be able to run on a schedule and capture both frequently changing data and data that rarely changes.
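The recovery point objective in the list above can be tested against a backup schedule with a few lines of code. The sketch below reads RPO as the maximum tolerable gap between the last completed backup and the moment of failure; the function names, the schedule, and the failure time are all hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times: list[datetime], failure_time: datetime) -> timedelta:
    """Return how much recent work is lost: time since the last backup before the failure."""
    earlier = [t for t in backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no backup completed before the failure")
    return failure_time - max(earlier)


def meets_rpo(backup_times: list[datetime], failure_time: datetime, rpo: timedelta) -> bool:
    """Check whether the data lost at `failure_time` stays within the RPO."""
    return data_loss_window(backup_times, failure_time) <= rpo


# Hypothetical schedule: a backup completes at 01:00 every day for a week.
nightly = [datetime(2024, 1, day, 1, 0) for day in range(1, 8)]
failure = datetime(2024, 1, 3, 19, 0)  # failure on the third day at 19:00

print(data_loss_window(nightly, failure))                # 18:00:00
print(meets_rpo(nightly, failure, timedelta(hours=24)))  # True
print(meets_rpo(nightly, failure, timedelta(hours=12)))  # False
```

A nightly schedule can lose up to a full day of changes, so an RPO tighter than 24 hours calls for more frequent partial backups rather than a different restore procedure.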
For organizations with small amounts of data, running a daily full backup is usually the natural thing to do. Daily full backup provides a high level of protection without much additional storage cost. Larger organizations, or those with more data or server volume, rarely go this route because of the resources and time it requires.
Those with large amounts of data typically go for either daily incremental or daily differential backups. Compared with incremental backup, differential backup provides a higher level of data protection and a shorter restore time without requiring a great deal more storage space. A weekly full backup with daily differential backups is a good option for many organizations because it gives the best of both worlds.