Backup rotation scheme
Encyclopedia
A backup rotation scheme is a method for effectively backing up
data where multiple media (such as tapes
) are used in the backup process. The scheme determines how and when each piece of removable storage is used for a backup job and how long it is retained once it has backup data stored on it. Different techniques have evolved over time to balance data retention and restoration needs with the cost of extra data storage media. Such a scheme can be quite complicated if it takes incremental backups, multiple retention periods, and off-site storage into consideration.
A First In, First Out
(FIFO) backup scheme saves new or modified files onto the oldest media in the set. Performing a daily backup onto a set of 14 media, the backup depth would be 14 days. Each day, the oldest media would be inserted when performing the backup.
This is the simplest rotation scheme, and is usually the first to come to mind. It was commonly used when people regularly backed up to floppy disks.
This approach, however, is naive and can lead to data loss. To understand why, consider a file in which an an unsuspected error is introduced. Several generations of backups and revisions have since occurred. The error is then detected. At this time, it would be pointless to have all of the most recent generations because all of them have the error. It would instead be beneficial to have at least one of the older generations, as it would not have the error.
puzzle, with what is essentially a recursive method. It is a 'smart' way of archiving an effective number of backups as well as the ability to go back over time, but it is more complex to understand. Basically, every tape is associated with a disk in the puzzle, and every disk movement to a different peg corresponds with a backup to that tape. So the first tape is used every other day (1, 3, 5, 7, 9,...), the second tape is used every fourth day (2, 6, 10, ...), the third tape is used every eighth day (4, 12, 20, ...).
A set of n tapes
(or tapes sets) will allow backups for 2 n-1 days before the last set is recycled. So, three tapes will give four days worth of backups and on the fifth day Set C will be overwritten; four tapes will give eight days, and Set D is overwritten on the ninth day; five tapes will give 16 days, etc. Files can be restored from 1, 2, 4, 8, 16, ..., 2 n - 1 days ago. Mathematically, you can look at the sequence of the binary notation of the day number. In each step, the number of zeros on the (right) end of the number determines the tape number to use.
The following tables show which tapes are used on which days of various cycles. Note that the Towers of Hanoi rotation method has the drawback of overwriting the very first backup (day 1 of the cycle) after only two days. However, this can easily be overcome by starting on the last day of a cycle (marked in red in the tables).
is a constant exponent
(possibly the square) of the multiplicative inverse
of the duration (possibly expressed in the number of days) between the date of the generation and the generation available before it.
Using a larger exponent leads to a more uniform distribution of generations, whereas a smaller exponent lead to a distribution with more recent and fewer older generations. This technique probabilistically ensures that past generations are always distributed across all points in time as desired.
Backup
In information technology, a backup or the process of backing up is making copies of data which may be used to restore the original after a data loss event. The verb form is back up in two words, whereas the noun is backup....
data where multiple media (such as tapes
Magnetic tape data storage
Magnetic tape data storage uses digital recording on to magnetic tape to store digital information. Modern magnetic tape is most commonly packaged in cartridges and cassettes. The device that performs actual writing or reading of data is a tape drive...
) are used in the backup process. The scheme determines how and when each piece of removable storage is used for a backup job and how long it is retained once it has backup data stored on it. Different techniques have evolved over time to balance data retention and restoration needs with the cost of extra data storage media. Such a scheme can be quite complicated if it takes incremental backups, multiple retention periods, and off-site storage into consideration.
FIFO method
- Used to keep the longest possible tail of daily backups
- Used when archived backups are not as important (i.e. no need to go back one year)
- Useful when data before the rotation period is irrelevant
A First In, First Out
FIFO
FIFO is an acronym for First In, First Out, an abstraction related to ways of organizing and manipulation of data relative to time and prioritization...
(FIFO) backup scheme saves new or modified files onto the oldest media in the set. Performing a daily backup onto a set of 14 media, the backup depth would be 14 days. Each day, the oldest media would be inserted when performing the backup.
This is the simplest rotation scheme, and is usually the first to come to mind. It was commonly used when people regularly backed up to floppy disks.
This approach, however, is naive and can lead to data loss. To understand why, consider a file in which an an unsuspected error is introduced. Several generations of backups and revisions have since occurred. The error is then detected. At this time, it would be pointless to have all of the most recent generations because all of them have the error. It would instead be beneficial to have at least one of the older generations, as it would not have the error.
Grandfather-father-son backup
Grandfather-father-son backup refers to the most common rotation scheme for rotating backup media. Originally designed for tape backup, it works well for any hierarchical backup strategy. The basic method is to define three sets of backups, such as daily, weekly and monthly. The daily, or son, backups are rotated on a daily basis with one graduating to father status each week. The weekly or father backups are rotated on a weekly basis with one graduating to grandfather status each month. In addition, quarterly, biannual, and/or annual backups can also be separately retained. Often one or more of the graduated backups is removed from the site for safekeeping and disaster recovery purposes.Towers of Hanoi
The Towers of Hanoi rotation method is more complex. It is based on the mathematics of the Tower of HanoiTower of Hanoi
The Tower of Hanoi or Towers of Hanoi, also called the Tower of Brahma or Towers of Brahma, is a mathematical game or puzzle. It consists of three rods, and a number of disks of different sizes which can slide onto any rod...
puzzle, with what is essentially a recursive method. It is a 'smart' way of archiving an effective number of backups as well as the ability to go back over time, but it is more complex to understand. Basically, every tape is associated with a disk in the puzzle, and every disk movement to a different peg corresponds with a backup to that tape. So the first tape is used every other day (1, 3, 5, 7, 9,...), the second tape is used every fourth day (2, 6, 10, ...), the third tape is used every eighth day (4, 12, 20, ...).
A set of n tapes
Magnetic tape data storage
Magnetic tape data storage uses digital recording on to magnetic tape to store digital information. Modern magnetic tape is most commonly packaged in cartridges and cassettes. The device that performs actual writing or reading of data is a tape drive...
(or tapes sets) will allow backups for 2 n-1 days before the last set is recycled. So, three tapes will give four days worth of backups and on the fifth day Set C will be overwritten; four tapes will give eight days, and Set D is overwritten on the ninth day; five tapes will give 16 days, etc. Files can be restored from 1, 2, 4, 8, 16, ..., 2 n - 1 days ago. Mathematically, you can look at the sequence of the binary notation of the day number. In each step, the number of zeros on the (right) end of the number determines the tape number to use.
The following tables show which tapes are used on which days of various cycles. Note that the Towers of Hanoi rotation method has the drawback of overwriting the very first backup (day 1 of the cycle) after only two days. However, this can easily be overcome by starting on the last day of a cycle (marked in red in the tables).
Three-Tape Hanoi Schedule
Day of the Cycle | ||||||||
---|---|---|---|---|---|---|---|---|
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | |
Set | A | A | A | A | ||||
B | B | |||||||
C | C |
Four-Tape Hanoi Schedule
Day of the Cycle | ||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | |
Set | A | A | A | A | A | A | A | A | ||||||||
B | B | B | B | |||||||||||||
C | C | |||||||||||||||
D | D |
Five-Tape Hanoi Schedule
Day of the Cycle | ||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
01 | 02 | 03 | 04 | 05 | 06 | 07 | 08 | 09 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | |
Set | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | A | ||||||||||||||||
B | B | B | B | B | B | B | B | |||||||||||||||||||||||||
C | C | C | C | |||||||||||||||||||||||||||||
D | D | |||||||||||||||||||||||||||||||
E | E |
Weighted random approach
An alternative approach to keeping generations distributed across all points in time is to delete (or overwrite), past generations (except the oldest and the most-recent-n generations) when necessary in a weighted-random fashion. For each deletion, the weight assigned to each of the deletable generations is the probability of it being deleted. One acceptable weightWeight function
A weight function is a mathematical device used when performing a sum, integral, or average in order to give some elements more "weight" or influence on the result than other elements in the same set. They occur frequently in statistics and analysis, and are closely related to the concept of a...
is a constant exponent
Exponentiation
Exponentiation is a mathematical operation, written as an, involving two numbers, the base a and the exponent n...
(possibly the square) of the multiplicative inverse
Multiplicative inverse
In mathematics, a multiplicative inverse or reciprocal for a number x, denoted by 1/x or x−1, is a number which when multiplied by x yields the multiplicative identity, 1. The multiplicative inverse of a fraction a/b is b/a. For the multiplicative inverse of a real number, divide 1 by the...
of the duration (possibly expressed in the number of days) between the date of the generation and the generation available before it.
Using a larger exponent leads to a more uniform distribution of generations, whereas a smaller exponent lead to a distribution with more recent and fewer older generations. This technique probabilistically ensures that past generations are always distributed across all points in time as desired.