Bacula
Bacula is an open source, enterprise-level computer backup system for heterogeneous networks. It is designed to automate backup tasks that had often required intervention from a systems administrator or computer operator.
Bacula supports Linux, UNIX, Windows, and Mac OS X backup clients, and a range of professional backup devices including tape libraries. Administrators and operators can configure the system via a command line console, GUI, or web interface; its back end is a catalog of information stored by MySQL, PostgreSQL, or SQLite.
Introduction
Bacula is a set of computer programs for managing backup, recovery, and verification of computer data across a network. These programs work together to provide a robust, easily managed, and complete backup solution for mixed operating system environments.
Bacula is the collective work of many developers, including Kern Sibbald, and its current release builds on ten years of development. It is open source and available without fees for both commercial and non-commercial use under the GPL version 2 license, with exceptions to permit linking with OpenSSL and distributing Windows binaries. Bacula is a registered trademark of Kern Sibbald.
According to project information published on SourceForge, Bacula has been downloaded over 1.3 million times since April 2002, four times more than any other open source backup program during the same period. By download statistics, this makes it the most downloaded open source backup program.
Network options
- TCP/IP - client–server communication uses standard ports and services instead of RPC for NFS, CIFS, etc.; this eases firewall administration and network security
- CRAM-MD5 - configurable client–server authentication
- GZIP - client-side compression to reduce network bandwidth consumption; this runs separately from hardware compression done by the backup device
- TLS - network communication encryption
- MD5/SHA - verify file integrity
- CRC - verify data block integrity
- PKI - backup data encryption
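The two integrity checks in the list above work at different granularities: a digest covers a whole file, while a CRC covers a single data block. The following is a minimal sketch of that distinction using only the Python standard library. It is not Bacula's implementation; the function names and the fixed block size are assumptions made for illustration.

```python
import hashlib
import zlib
from typing import List, Optional


def file_digest(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of a whole file's contents (file-level check)."""
    return hashlib.new(algorithm, data).hexdigest()


def block_crcs(data: bytes, block_size: int) -> List[int]:
    """Split data into fixed-size blocks and return one CRC-32 per block."""
    return [
        zlib.crc32(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def first_bad_block(data: bytes, expected: List[int], block_size: int) -> Optional[int]:
    """Return the index of the first block whose CRC differs, or None if all match."""
    actual = block_crcs(data, block_size)
    for index, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return index
    if len(actual) != len(expected):
        # Data was truncated or extended: the first missing/extra block is bad.
        return min(len(actual), len(expected))
    return None
```

A digest mismatch only says that the file changed somewhere; the per-block CRCs narrow the damage down to a specific block, which is why a backup format can usefully carry both.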
Client options
- POSIX ACL - needed to restore Windows NT ACEs and Samba servers
- Unicode/UTF-8 - cross-platform filenames
- VSS - calls Microsoft's snapshot service
- LVM - pre-script setup for Linux/UNIX snapshot
- LFS - back up files larger than 2 GiB
- raw - back up devices without a filesystem
Backup devices
- pooling - allocates backup volumes according to job needs and retention configuration
- spooling - writes backup data to spool until target backup medium is allocated so jobs can continue uninterrupted
- media-spanning - such as spanning tapes
- multi-streaming - write multiple, simultaneous data streams to the same medium
- ANSI & EBCDIC - IBM compatibility
- Barcodes - reading tape barcodes in libraries
- autoloaders - virtually every tape autoloader available (called autochangers in Bacula)
- most tape drives, including DDS, DLT, SDLT, LTO-1,2,3,4
Client OS
The client software, executed by a "file daemon" running on a Bacula client, is supported on many operating systems, including:
- Linux - most major distributions, including CentOS, Debian, Fedora, Gentoo, Mandriva, OpenSUSE, Red Hat and Ubuntu
- Solaris
- FreeBSD - all released versions
- NetBSD
- Windows (File Daemon supported on all 32 and 64 bit Windows OSes)
- Mac OS X
- OpenBSD
- HP-UX
- Tru64
- IRIX
Structure
Bacula is designed to be modular so that it can scale to the needs of its operator(s). Any installation contains three kinds of daemons to execute backup and restore functionality:
- Director Daemon: manages other daemons, queries and updates the catalog, interfaces with operator front-ends, automates backup schedules
- Storage Daemon: makes system calls to drive backup media, responds to read/write requests from the Director, and receives backup/restore data from the File Daemon
- File Daemon: negotiates client-side communication, encryption and compression, opens file handles to access a client's data
The following operator front-ends are also available:
- Bacula Console: the control interface from which the user can enter commands to operate Bacula tasks. The console is a command line interface.
- Bat (Bacula Administrative Tool) Console: a GUI from which the user can enter commands to operate Bacula tasks.
- Tray Monitor: a GUI that can be installed on any desktop to monitor Bacula operations.
- Bweb: a web interface that provides systems management views of all the Bacula backups. It also permits almost all operations that can be done with the console.
These daemons can run on independent hosts, but typical installations consist of three kinds of Bacula hosts:
- Client machines: the machines that contain the files to be backed up
- Storage machines: the machines that contain the media used to store the backups
- Backup servers: the machines that orchestrate the backup processes
The Director manages everything, so its host is always called a "backup server"; the client and storage daemons run as its subordinates and have no direct control of the backup process. While this structure suggests that the three daemons run on three different machines, an equally valid setup is to run all three daemons on the machine that controls the backup process and back up additional machines that have just a File Daemon installed. It is also possible to mount remote files and storage resources into that machine's filesystem over SMB or NFS; however, the Bacula developers discourage this in favor of having a File Daemon installed on each machine to be backed up. In practice, the Director and Storage Daemon are often run on one machine (often referred to as the Bacula server). The File Daemon is then run on each machine to be backed up (including the Bacula server, because its catalog is dumped as SQL).
Backup data can be stored on various media, including tape, optical media, and disk.
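The division of labour described in this section can be sketched as a toy, in-memory model. This is only an illustration of the roles (the Director coordinates and keeps the catalog, the File Daemon is the only component that touches client files, the Storage Daemon owns the media). It is not Bacula's protocol or API, and every class, method, and field name here is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageDaemon:
    """Owns the backup media; here a volume is just a dict held in memory."""
    volumes: dict = field(default_factory=dict)

    def write(self, volume: str, client: str, path: str, data: bytes) -> None:
        self.volumes.setdefault(volume, {})[(client, path)] = data

    def read(self, volume: str, client: str, path: str) -> bytes:
        return self.volumes[volume][(client, path)]


@dataclass
class FileDaemon:
    """Runs on a client; the only component that opens the client's files."""
    name: str
    files: dict

    def send(self, paths, storage: StorageDaemon, volume: str) -> list:
        # Backup data flows from the File Daemon to the Storage Daemon.
        sent = []
        for path in paths:
            if path in self.files:
                storage.write(volume, self.name, path, self.files[path])
                sent.append(path)
        return sent

    def receive(self, path: str, data: bytes) -> None:
        self.files[path] = data


@dataclass
class Director:
    """Coordinates jobs and records what was stored where (the catalog)."""
    storage: StorageDaemon
    clients: dict = field(default_factory=dict)
    catalog: list = field(default_factory=list)  # (client, path, volume) rows

    def register(self, client: FileDaemon) -> None:
        self.clients[client.name] = client

    def run_backup(self, client_name: str, paths, volume: str) -> int:
        sent = self.clients[client_name].send(paths, self.storage, volume)
        self.catalog.extend((client_name, path, volume) for path in sent)
        return len(sent)

    def run_restore(self, client_name: str, path: str) -> bool:
        # Look up the most recent catalog row to find the right volume.
        for name, cat_path, volume in reversed(self.catalog):
            if name == client_name and cat_path == path:
                data = self.storage.read(volume, name, path)
                self.clients[name].receive(path, data)
                return True
        return False
```

In this model the Director never handles file contents itself during a backup, which mirrors the description above: it issues instructions and updates the catalog while data moves between the other two daemons.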
Limitations
Bacula stores backup data in an open and documented yet unique format. Standalone Bacula tools exist to read and write the backup data (bls, bcopy, bscan, bextract), but the format is not compatible with other Unix backup utilities such as tar or dump. Bacula developers and users do not consider this a limitation, because it is an extensible, machine-independent format that far surpasses the capabilities of the tar and dump formats.
By default, and as is the case for all other open source backup software, Bacula's Differential and Incremental backups are based on system time stamps. Consequently, if you move files into an existing directory or move a whole directory into the backup FileSet after a Full backup, those files may not be backed up by an Incremental save because they may have old dates. You must explicitly update the date/time stamp on all moved files. Bacula versions 3.0 and later support Accurate backup, an option that addresses this issue. Windows NTBackup, which is not as feature rich as Bacula, does not have this problem because it does not rely on time stamps; it uses the archive bit attribute instead, which has its own set of problems.
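The moved-file problem is easy to reproduce in a small model. The sketch below contrasts selecting files by modification time with selecting them by comparison against a list of what the last backup already holds. It illustrates the general idea only and is not how Bacula's Accurate option is implemented; the function names and the shape of the `catalog` mapping (path to recorded modification time) are assumptions.

```python
from typing import Dict, Set


def select_by_timestamp(current: Dict[str, float], last_backup_time: float) -> Set[str]:
    """Pick files modified after the last backup (time-stamp based selection)."""
    return {path for path, mtime in current.items() if mtime > last_backup_time}


def select_by_catalog(current: Dict[str, float], catalog: Dict[str, float]) -> Set[str]:
    """Pick files that are new to the catalog or whose recorded mtime changed."""
    return {
        path
        for path, mtime in current.items()
        if path not in catalog or catalog[path] != mtime
    }


# State recorded by a Full backup taken at time 200.
catalog = {"/data/a.txt": 100.0}
last_full = 200.0

# Afterwards, an old file is moved into the directory (its mtime stays 50)
# and a genuinely new file is created at time 250.
current = {
    "/data/a.txt": 100.0,
    "/data/old_report.txt": 50.0,
    "/data/new.txt": 250.0,
}

missed = select_by_catalog(current, catalog) - select_by_timestamp(current, last_full)
print(sorted(missed))
```

Here the time-stamp approach selects only `/data/new.txt`, while the catalog comparison also picks up `/data/old_report.txt`, the moved file with the old date.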
History
{|
! align="left" | Date
! align="left" | Event
|-
| January 2000
| Project started
|-
| April 14, 2002
| First release to SourceForge.net (version 1.16)
|-
| June 29, 2006
| Release 1.38.11 (Final version 1 release)
|-
| January 2007
| Release 2.0.0
|-
| September 2007
| Release 2.2.3
|-
| June 2008
| Release 2.4.0
|-
| April 2009
| Release 3.0.0
|-
| January 2010
| Release 5.0.0
|-
| September 2010
| Release 5.0.3
|}