BackupPC
BackupPC is a free disk-to-disk backup software suite with a web-based frontend. The cross-platform server runs on any Linux, Solaris, or Unix-based server. No client software is necessary: the server is itself a client for several protocols that are handled by services native to the client OS. In 2007, BackupPC was mentioned as one of the three best-known open-source backup programs, even though it is one of the tools that are "so amazing, but unfortunately, if no one ever talks about them, many folks never hear of them".

Data deduplication reduces the disk space needed to store the backups in the disk pool. BackupPC can serve as a D2D2T (disk-to-disk-to-tape) solution if its archive function is used to back up the disk pool to tape. BackupPC is not a block-level backup system such as Ghost4Linux; it performs file-based backup and restore, so it is not suitable for backing up disk images or raw disk partitions.

BackupPC incorporates a Server Message Block (SMB) client that can be used to back up network shares of computers running Windows. Paradoxically, under such a setup the BackupPC server can be located behind a NAT'd firewall while the Windows machine operates on a public IP address. While this may not be advisable for SMB traffic, it is more useful for web servers running SSH with GNU tar and rsync available, as it allows the BackupPC server to be kept in a subnet separate from the web server's DMZ.

BackupPC is published under the GNU General Public License.
Protocols Supported
BackupPC supports NFS, SSH, SMB, and rsync. It can back up Unix-like systems with native ssh and tar or rsync support, such as Linux, BSD, and Mac OS X, as well as Microsoft Windows shares with minimal configuration.
On Windows, third-party implementations of tar, rsync, and SSH (such as Cygwin) are required to use those protocols.
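The agentless model can be sketched with a short server-side script: the backup server opens an SSH session, runs the client's own tar, and captures the archive it streams back. This is only an illustration of the idea, not BackupPC's actual transfer command; the host name and paths are hypothetical, and it assumes key-based SSH login and GNU tar on the client.

```python
import subprocess

def tar_over_ssh_cmd(host, share):
    # ssh runs GNU tar on the remote client and streams the archive
    # back over the SSH channel to our stdout; the client needs only
    # sshd and tar, no backup agent.
    return ["ssh", host, "tar", "-c", "-C", share, "."]

def pull_share(host, share, out_path):
    # Capture the remote tar stream into a file on the backup server.
    with open(out_path, "wb") as out:
        subprocess.run(tar_over_ssh_cmd(host, share), stdout=out, check=True)

# Hypothetical usage:
# pull_share("backup@web1", "/var/www", "/srv/backups/web1.tar")
```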
Protocol Choice
The choice between tar and rsync is dictated by the hardware and bandwidth available to the client. Clients backed up with rsync use considerably more CPU time than clients using tar or SMB, while clients using SMB or tar use considerably more bandwidth than clients using rsync. These trade-offs are inherent in the differences between the protocols: tar and SMB transfer each file in its entirety, using little CPU but maximum bandwidth, whereas rsync calculates checksums for each file on both the client and the server in a way that allows transferring just the differences between the two files, using more CPU but minimizing bandwidth.
Data Storage
BackupPC uses a combination of hard links and compression to reduce the total disk space used for files. At the first full backup, all files are transferred to the backend, optionally compressed, and then compared. Files that are identical are hard-linked, which uses only one additional directory entry. The upshot is that an astute system administrator could potentially back up ten Windows XP laptops with 10 GB of data each; if 8 GB were repeated on each machine (Office and Windows binaries), it would appear that 100 GB of storage is needed, yet only 28 GB (10 × 2 GB + 8 GB) would be used. Compression of the data on the back-end further reduces that requirement.
When browsing the backups, incremental backups are automatically filled back to the previous full backup, so every backup appears to be a full and complete dump of data.
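The pooling scheme described above can be sketched in a few lines: hash each incoming file's contents, keep one copy per unique hash in a pool directory, and hard-link every later occurrence to it. This is a minimal illustration only; BackupPC's real pool also compresses files and resolves hash collisions by comparing full contents, and the pool layout here is invented for the example.

```python
import hashlib
import os

def store_file(path, pool):
    """Store `path` in a content-addressed pool, hard-linking duplicates.

    Identical contents seen across any number of backups end up as a
    single copy on disk plus one directory entry per occurrence.
    """
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    pooled = os.path.join(pool, digest)
    if not os.path.exists(pooled):
        os.link(path, pooled)  # first occurrence: add it to the pool
        return "new"
    os.unlink(path)            # duplicate: swap our copy for a hard link
    os.link(pooled, path)
    return "deduplicated"
```

In such a scheme, the 8 GB of files repeated across the ten laptops in the example above would occupy the pool exactly once, which is where the 28 GB figure comes from.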
Performance
When backing up a remote SMB share, speeds of 3–4 Mbit/s are typical. A local disk used as a backup destination yields speeds of 10+ Mbit/s, depending on CPU performance, since a faster CPU helps with compression and md5sum generation. Speeds of over 13 MB/s are attainable on a gigabit LAN when backing up a Linux client using rsync over SSH, even when the backup destination is non-local.
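The CPU-for-bandwidth trade-off of the rsync method described under Protocol Choice can be illustrated with a deliberately simplified delta routine: checksum fixed-size blocks on both sides and ship only the blocks that differ. Real rsync is more sophisticated (a rolling weak checksum plus a stronger hash lets matches survive insertions and deletions), so treat this as a sketch of the principle only.

```python
import hashlib

BLOCK = 4  # tiny block size for illustration; real tools use far larger blocks

def block_sums(data):
    # Checksumming every block on both sides is where the extra CPU time goes.
    return [hashlib.md5(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

def delta(old, new):
    """Return (offset, bytes) pairs for blocks of `new` that differ from `old`.

    Only these pairs would cross the network; unchanged blocks cost a
    checksum comparison but no bandwidth.
    """
    old_sums = block_sums(old)
    changed = []
    for i in range(0, len(new), BLOCK):
        idx = i // BLOCK
        chunk = new[i:i + BLOCK]
        if idx >= len(old_sums) or hashlib.md5(chunk).digest() != old_sums[idx]:
            changed.append((i, chunk))
    return changed

# One block changed out of three: only 4 of 12 bytes need transferring.
# delta(b"aaaabbbbcccc", b"aaaaXXXXcccc") returns [(4, b"XXXX")]
```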
Forks and related projects
- BackupPC4AFS is a version of BackupPC patched to back up AFS or OpenAFS volumes to a backup server's local disk or attached RAID. It supports all BackupPC features, including full and multi-level incremental dumps, exponential expiry, and configuration via conf files or a web interface. When performing full backups of multi-gigabyte AFS volumes, speeds of 24–35 megabytes per second are not uncommon over gigabit Ethernet.
- BackupPC SME Contrib is an add-on to SME Server that allows integration of BackupPC into the SME templated UI.
- Zmanda's community edition of BackupPC adds the ability to use FTP, as well as other patches that are part of the 3.2.0 version of mainline.