Application checkpointing
Encyclopedia
Checkpointing is a technique for inserting fault tolerance into computing
systems. It basically consists of storing a snapshot of the current application
state, and later on, use it for restarting the execution in case of failure
.
Each design
decision made affects the properties and efficiency of the final product. For instance, deciding to store the entire application state will allow for a more straightforward implementation
, since no analysis of the application will be needed, but it will deny the portability of the generated state files, due to a number of non-portable structures (such as application stack
or heap) being stored along with application data.
, checkpointing is a technique that helps tolerate the errors leading to losing the effect of work of long-running applications. The main property which should be induced by checkpointing techniques in such systems is in preserving system consistency
in case of failure. There are two main approaches to checkpointing in such systems: coordinated checkpointing, in which all cooperating processes work together to establish coherent checkpoint; and communication induced (called also dependency induced) independent checkpointing.
It must be stressed that simply forcing processes to checkpoint their state at fixed time intervals is not sufficient to ensure global consistency. Even if we postulate the existence of global clock, the checkpoints made by different processes still may not form a consistent state. The need for establishing a consistent state may force other process to roll back to their checkpoints, which in turn may cause other processes to roll back to even earlier checkpoints, which in the most extreme case may mean that the only consistent state found is the initial state (the so called domino effect
).
In the coordinated checkpointing approach, processes must ensure that their checkpoints are consistent. This is usually achieved by some kind of two-phase commit protocol
algorithm. In communication induced checkpointing, each process checkpoints its own state independently whenever this state is exposed to other processes (that is, for example whenever a remote process reads the page written to by the local process).
The system state may be saved either locally, in stable storage, or in a distant node's memory.
FreeBSD, OpenBSD, Darwin etc). In contrast, kernel based checkpointing packages such as Chpox and the checkpointing algorithms developed for the MOSIX cluster computing environment tend to be highly operating system dependent. Most kernel based checkpointing packages developed to date run under either the 2.4 or 2.6 subfamilies of the Linux kernel on i686 architectures.
Cryopid is now maintained under the SourceForge project Cryopid2. This version of Cryopid will compile on all Linux kernels up to 2.6.27 for 32-bit kernels. Work is in hand to get Cryopid2 working on 64-bit kernels. The cryopid2 package extends Benard Blackhams original Cryopid package in a number of significant ways. For example, it allows the state of Linux real time signals to be preserved when a checkpoint is taken and also is capable of inter-operating with ssh
via a portal daemon in order to implement full process migration (and of any associated pod processes) between Linux hosts. Cryopid2 also has the capability to roll up its environment (e.g. the bodies of open files) into the checkpoints it produces. This facilitates the migration of processes onto foreign hosts which present an arbitrary file system environment to an inbound migrating process. Pipes are also preserved in a similar manner: their contents are sucked into the migrating process prior to migration (so they form part of the checkopoint) and spat out into the kernel of the new host when execution is resumed. Cryopid2 is inter-operable with the P3 Organic computing
environment which uses its services for both persistence
and process migration
Among the applications supported by DMTCP are Open MPI
, Python
, Perl
, and many programming language
s and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X Window applications, as long as they do not use extensions (e.g. no OpenGL or video). Among the Linux features supported by DMTCP are open file descriptor
s, pipes, sockets, signal handlers, process id and thread id virtualization (ensure old pids and tids continue to work upon restart), ptys, fifos, process group ids, session ids, terminal attributes, and mmap/mprotect (including mmap-based shared memory). See the QUICK-START file of the distribution for further details.
DMTCP is also the basis for URDB, the Universal Reversible Debugger. URDB is still experimental. Nevertheless, it currently adds reversibility to gdb, Python (pdb), and Perl (perl -d). It also supports reverse expression watchpoints, a form of temporal search within a process lifetime.
kernel has an ability to checkpoint and restart a virtual private server
(VPS), i.e. a set of processes and all the data structures associated with those processes (opened files, sockets, IPC objects, network connections, etc.). The primary use of checkpointing is "live migration", a move of a VPS
from one physical server to another without a need to shut down and restart it. OpenVZ supports checkpointing on x86, x86-64
and IA-64 architectures.
Their goal is to provide a robust, production quality implementation that checkpoints a wide range of applications, without requiring changes to be made to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. This work is broken down into 4 main areas:
Computing
Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...
systems. It basically consists of storing a snapshot of the current application
Application software
Application software, also known as an application or an "app", is computer software designed to help the user to perform specific tasks. Examples include enterprise software, accounting software, office suites, graphics software and media players. Many application programs deal principally with...
state, and later on, use it for restarting the execution in case of failure
Failure
Failure refers to the state or condition of not meeting a desirable or intended objective, and may be viewed as the opposite of success. Product failure ranges from failure to sell the product to fracture of the product, in the worst cases leading to personal injury, the province of forensic...
.
Technique properties
There are many different points of view and techniques for achieving application checkpointing. Depending on the specific implementation, a tool can be classified as having several properties:- Amount of state saved: This property refers to the abstractionAbstraction (computer science)In computer science, abstraction is the process by which data and programs are defined with a representation similar to its pictorial meaning as rooted in the more complex realm of human life and language with their higher need of summarization and categorization , while hiding away the...
level used by the technique to analyze an application. It can range from seeing each application as a black boxBlack boxA black box is a device, object, or system whose inner workings are unknown; only the input, transfer, and output are known characteristics.The term black box can also refer to:-In science and technology:*Black box theory, a philosophical theory...
, hence storing all application data, to selecting specific relevant cores of data in order to achieve a more efficient and portable operation.
- Automatization level: Depending on the effort needed to achieve fault tolerance through the use of a specific checkpointing solution.
- PortabilityPortingIn computer science, porting is the process of adapting software so that an executable program can be created for a computing environment that is different from the one for which it was originally designed...
: Whether or not the saved state can be used on different machines to restart the application.
- System architecture: How is the checkpointing technique implemented: inside a libraryLibrary (computer science)In computer science, a library is a collection of resources used to develop software. These may include pre-written code and subroutines, classes, values or type specifications....
, by the compilerCompilerA compiler is a computer program that transforms source code written in a programming language into another computer language...
or at operating systemOperating systemAn operating system is a set of programs that manage computer hardware resources and provide common services for application software. The operating system is the most important type of system software in a computer system...
level.
Each design
Engineering
Engineering is the discipline, art, skill and profession of acquiring and applying scientific, mathematical, economic, social, and practical knowledge, in order to design and build structures, machines, devices, systems, materials and processes that safely realize improvements to the lives of...
decision made affects the properties and efficiency of the final product. For instance, deciding to store the entire application state will allow for a more straightforward implementation
Implementation
Implementation is the realization of an application, or execution of a plan, idea, model, design, specification, standard, algorithm, or policy.-Computer Science:...
, since no analysis of the application will be needed, but it will deny the portability of the generated state files, due to a number of non-portable structures (such as application stack
Stack-based memory allocation
Stacks in computing architectures are regions of memory where data is added or removed in a last-in-first-out manner.In most modern computer systems, each thread has a reserved region of memory referred to as its stack. When a function executes, it may add some of its state data to the top of the...
or heap) being stored along with application data.
Use in distributed shared memory systems
In distributed shared memoryDistributed shared memory
Distributed Shared Memory , in Computer Architecture is a form of memory architecture where the memories can be addressed as one address space...
, checkpointing is a technique that helps tolerate the errors leading to losing the effect of work of long-running applications. The main property which should be induced by checkpointing techniques in such systems is in preserving system consistency
Consistency model
In computer science, consistency models are used in distributed systems like distributed shared memory systems or distributed data stores . The system supports a given model, if operations on memory follow specific rules...
in case of failure. There are two main approaches to checkpointing in such systems: coordinated checkpointing, in which all cooperating processes work together to establish coherent checkpoint; and communication induced (called also dependency induced) independent checkpointing.
It must be stressed that simply forcing processes to checkpoint their state at fixed time intervals is not sufficient to ensure global consistency. Even if we postulate the existence of global clock, the checkpoints made by different processes still may not form a consistent state. The need for establishing a consistent state may force other process to roll back to their checkpoints, which in turn may cause other processes to roll back to even earlier checkpoints, which in the most extreme case may mean that the only consistent state found is the initial state (the so called domino effect
Domino effect
The domino effect is a chain reaction that occurs when a small change causes a similar change nearby, which then will cause another similar change, and so on in linear sequence. The term is best known as a mechanical effect, and is used as an analogy to a falling row of dominoes...
).
In the coordinated checkpointing approach, processes must ensure that their checkpoints are consistent. This is usually achieved by some kind of two-phase commit protocol
Two-phase commit protocol
In transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol . It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction...
algorithm. In communication induced checkpointing, each process checkpoints its own state independently whenever this state is exposed to other processes (that is, for example whenever a remote process reads the page written to by the local process).
The system state may be saved either locally, in stable storage, or in a distant node's memory.
Practical implementations for Linux/Unix
A number of practical checkpointing packages have been developed for the Linux/Unix family of operating systems. These checkpointing packages may be divided into two classes, those which operate in user space, examples of which include the checkpointing package used by Condor and the portable checkpointing library developed by The University of Tennessee. User space checkpointing packages are highly portable and can typically be compiled and run on any modern Unix (e.g. Linux,FreeBSD, OpenBSD, Darwin etc). In contrast, kernel based checkpointing packages such as Chpox and the checkpointing algorithms developed for the MOSIX cluster computing environment tend to be highly operating system dependent. Most kernel based checkpointing packages developed to date run under either the 2.4 or 2.6 subfamilies of the Linux kernel on i686 architectures.
Cryopid
Modern checkpointing packages such as Cryopid are capable of checkpointing a process pod, that is a parent process and all its associated children, and of dealing with file system abstractions such as sockets and pipes (FIFO's) in addition to regular files. In the case of Cryopid, there is also provision to roll all dynamic libraries, open files, sockets and FIFO's associated with the process into the checkpoint. This is very useful when the checkpointed process is to be restarted in a heterogeneous environment (e.g. the machine on which the checkpoint is restarted has libraries and file system which differ from the host on which the process was checkpointed).Cryopid is now maintained under the SourceForge project Cryopid2. This version of Cryopid will compile on all Linux kernels up to 2.6.27 for 32-bit kernels. Work is in hand to get Cryopid2 working on 64-bit kernels. The cryopid2 package extends Benard Blackhams original Cryopid package in a number of significant ways. For example, it allows the state of Linux real time signals to be preserved when a checkpoint is taken and also is capable of inter-operating with ssh
Secure Shell
Secure Shell is a network protocol for secure data communication, remote shell services or command execution and other secure network services between two networked computers that it connects via a secure channel over an insecure network: a server and a client...
via a portal daemon in order to implement full process migration (and of any associated pod processes) between Linux hosts. Cryopid2 also has the capability to roll up its environment (e.g. the bodies of open files) into the checkpoints it produces. This facilitates the migration of processes onto foreign hosts which present an arbitrary file system environment to an inbound migrating process. Pipes are also preserved in a similar manner: their contents are sucked into the migrating process prior to migration (so they form part of the checkopoint) and spat out into the kernel of the new host when execution is resumed. Cryopid2 is inter-operable with the P3 Organic computing
Organic computing
Organic computing is a form of biologically-inspired computing with organic properties. It has emerged recently as a challenging vision for future information processing systems...
environment which uses its services for both persistence
Persistence (computer science)
Persistence in computer science refers to the characteristic of state that outlives the process that created it. Without this capability, state would only exist in RAM, and would be lost when this RAM loses power, such as a computer shutdown....
and process migration
Process migration
Process migration is when processes in computer clusters are able to move from machine to machine. Process migration is implemented in, among others, OpenMosix....
DMTCP
DMTCP (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an arbitrary group of programs spread across many machines and connected by sockets. It does not modify the user's program or the operating system.Among the applications supported by DMTCP are Open MPI
Open MPI
Open MPI is a Message Passing Interface library project combining technologies and resources from several other projects . It is used by many TOP500 supercomputers including Roadrunner, which was the world's fastest supercomputer from June 2008 to November 2009, and K computer, the fastest...
, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
, Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...
, and many programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X Window applications, as long as they do not use extensions (e.g. no OpenGL or video). Among the Linux features supported by DMTCP are open file descriptor
File descriptor
In computer programming, a file descriptor is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems...
s, pipes, sockets, signal handlers, process id and thread id virtualization (ensure old pids and tids continue to work upon restart), ptys, fifos, process group ids, session ids, terminal attributes, and mmap/mprotect (including mmap-based shared memory). See the QUICK-START file of the distribution for further details.
DMTCP is also the basis for URDB, the Universal Reversible Debugger. URDB is still experimental. Nevertheless, it currently adds reversibility to gdb, Python (pdb), and Perl (perl -d). It also supports reverse expression watchpoints, a form of temporal search within a process lifetime.
OpenVZ
OpenVZOpenVZ
OpenVZ is an operating system-level virtualization technology based on the Linux kernel and operating system. OpenVZ allows a physical server to run multiple isolated operating system instances, known as containers, Virtual Private Servers , or Virtual Environments...
kernel has an ability to checkpoint and restart a virtual private server
Virtual private server
Virtual private server is a term used by internet hosting services to refer to a virtual machine. The term is used for emphasizing that the virtual machine, although running in software on the same physical computer as other customers' virtual machines, is functionally equivalent to a separate...
(VPS), i.e. a set of processes and all the data structures associated with those processes (opened files, sockets, IPC objects, network connections, etc.). The primary use of checkpointing is "live migration", a move of a VPS
Virtual private server
Virtual private server is a term used by internet hosting services to refer to a virtual machine. The term is used for emphasizing that the virtual machine, although running in software on the same physical computer as other customers' virtual machines, is functionally equivalent to a separate...
from one physical server to another without a need to shut down and restart it. OpenVZ supports checkpointing on x86, x86-64
X86-64
x86-64 is an extension of the x86 instruction set. It supports vastly larger virtual and physical address spaces than are possible on x86, thereby allowing programmers to conveniently work with much larger data sets. x86-64 also provides 64-bit general purpose registers and numerous other...
and IA-64 architectures.
Berkeley Lab Checkpoint/Restart (BLCR)
The Future Technologies Group at the Lawrence National Laboratories are developing a hybrid kernel/user implementation of checkpoint/restart called BLCRTheir goal is to provide a robust, production quality implementation that checkpoints a wide range of applications, without requiring changes to be made to application code. This work focuses on checkpointing parallel applications that communicate through MPI, and on compatibility with the software suite produced by the SciDAC Scalable Systems Software ISIC. This work is broken down into 4 main areas:
- Checkpoint/Restart for Linux (CR)
- Checkpointable MPI Libraries
- Resource Management Interface to Checkpoint/Restart
- Development of Process Management Interfaces