Single-system image
In distributed computing, a single system image (SSI) cluster is a cluster of machines that appears to be one single system. The concept is often considered synonymous with that of a distributed operating system, but a single image may be presented for more limited purposes, such as job scheduling alone, which may be achieved by means of an additional layer of software over the conventional operating system images running on each node. The interest in SSI clusters is based on the perception that they may be simpler to use and administer than more specialized clusters.
Different SSI systems may provide a more or less complete illusion of a single system.
Features of SSI clustering systems
Different SSI systems may, depending on their intended usage, provide some subset of these features.
Process migration
Many SSI systems provide process migration. Processes may start on one node and be moved to another node, possibly for resource balancing or for administrative reasons (for example, it may be necessary to move long-running processes off a node that is to be shut down for maintenance). As processes are moved from one node to another, other associated resources (for example IPC resources) may be moved with them.
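
The maintenance case can be illustrated with a toy placement model. This is only a sketch: the `drain_node` helper, the node names and the "fewest processes" rule are assumptions made for illustration, and no real SSI system's scheduler is implied.

```python
from typing import Dict, List


def drain_node(placement: Dict[str, List[str]], node: str) -> Dict[str, List[str]]:
    """Return a new placement with every process moved off `node`.

    Each process goes to whichever remaining node currently hosts the
    fewest processes (a crude stand-in for "least loaded"); ties are
    broken by node name so the result is deterministic.
    """
    remaining = {n: list(procs) for n, procs in placement.items() if n != node}
    if not remaining:
        raise ValueError("cannot drain the only node in the cluster")
    for proc in placement.get(node, []):
        target = min(remaining, key=lambda n: (len(remaining[n]), n))
        remaining[target].append(proc)
    remaining[node] = []  # the node stays in the cluster but is now empty
    return remaining


# Hypothetical cluster: node1 is about to be shut down for maintenance.
cluster = {"node1": ["sim-a", "sim-b"], "node2": ["db"], "node3": []}
after = drain_node(cluster, "node1")
print(after)
```

A real system would also have to move the associated resources mentioned above (open files, IPC objects), which this model ignores.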
Process checkpointing
Some SSI systems allow checkpointing of running processes, allowing their current state to be saved and reloaded at a later date. Checkpointing is particularly useful in clusters used for high-performance computing, as it avoids lost work in case of a cluster or node restart.
Checkpointing can be seen as related to migration, as migrating a process from one node to another can be implemented by first checkpointing the process, then restarting it on another node. Alternatively, checkpointing can be considered as migration to disk.
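
The relationship between checkpointing and migration can be sketched with a toy "process" whose entire state is plain data. The `Counter` class and the JSON image format are illustrative assumptions; checkpointing a real operating system process involves far more state (memory, file descriptors, signals).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Counter:
    """Toy 'process': keeps a running sum of the step numbers it has executed."""
    step: int = 0
    total: int = 0

    def run(self, steps: int) -> None:
        for _ in range(steps):
            self.step += 1
            self.total += self.step


def checkpoint(proc: Counter) -> str:
    """Serialize the process state; writing this string to a file is 'migration to disk'."""
    return json.dumps(asdict(proc))


def restart(image: str) -> Counter:
    """Rebuild a process from a checkpoint image, possibly on another node."""
    return Counter(**json.loads(image))


def migrate(proc: Counter) -> Counter:
    """Migration implemented as checkpoint on the source, then restart on the destination."""
    return restart(checkpoint(proc))


p = Counter()
p.run(3)            # work done on the source node
moved = migrate(p)  # state travels as a checkpoint image
moved.run(2)        # work continues on the destination node
print(moved)
```

The migrated counter ends in the same state as one that ran all five steps without interruption, which is the property a checkpoint-based migration needs to preserve.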
Single process space
Some SSI systems provide the illusion that all processes are running on the same machine: the process management tools (e.g. "ps", "kill" on Unix-like systems) operate on all processes in the cluster.
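
A minimal model of this idea is a cluster-wide process table that aggregates the per-node tables. The `Node` and `Cluster` classes below are hypothetical, and the cluster-wide unique PID counter is an assumption of the sketch rather than a description of any particular system.

```python
from typing import Dict, List, Tuple


class Node:
    """One machine with its own local process table (pid -> command)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.procs: Dict[int, str] = {}


class Cluster:
    """Presents the per-node tables as a single process space."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self._next_pid = 1  # assumption: PIDs are unique across the cluster

    def spawn(self, node_name: str, command: str) -> int:
        pid = self._next_pid
        self._next_pid += 1
        self.nodes[node_name].procs[pid] = command
        return pid

    def ps(self) -> List[Tuple[int, str, str]]:
        """List (pid, node, command) for every process on every node."""
        rows = [
            (pid, node.name, cmd)
            for node in self.nodes.values()
            for pid, cmd in node.procs.items()
        ]
        return sorted(rows)

    def kill(self, pid: int) -> bool:
        """Remove a process wherever it runs; the caller never names the node."""
        for node in self.nodes.values():
            if pid in node.procs:
                del node.procs[pid]
                return True
        return False


c = Cluster([Node("node1"), Node("node2")])
a = c.spawn("node1", "solver")
b = c.spawn("node2", "logger")
print(c.ps())
c.kill(b)
print(c.ps())
```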
Single root
Most SSI systems provide a single view of the file system. This may be achieved by a simple NFS server, shared disk devices or even file replication.
The advantage of a single root view is that processes may be run on any available node and access needed files with no special precautions. If the cluster implements process migration, a single root view enables direct access to the files from the node where the process is currently running.
Some SSI systems provide a way of "breaking the illusion", having some node-specific files even in a single root, e.g. HP TruCluster provides a "context dependent symbolic link" (CDSL) which points to different files depending on the node that accesses it. This may be necessary to deal with heterogeneous clusters, where not all nodes have the same configuration.
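
The effect of a context-dependent link can be sketched as a path that contains a per-node placeholder. The `{memb}` placeholder, the directory layout and the in-memory `FILES` and `LINKS` tables below are illustrative assumptions for this sketch, not a description of how TruCluster implements CDSLs.

```python
# Hypothetical shared file system: every node sees the same set of paths.
FILES = {
    "/cluster/members/member1/etc/hostname": "alpha",
    "/cluster/members/member2/etc/hostname": "beta",
    "/etc/motd": "welcome to the cluster",
}

# Hypothetical link table: one shared path whose target depends on the node.
LINKS = {"/etc/hostname": "/cluster/members/{memb}/etc/hostname"}


def resolve_cdsl(link_target: str, member_id: int) -> str:
    """Expand the per-node placeholder for the node doing the lookup."""
    return link_target.replace("{memb}", f"member{member_id}")


def read(path: str, member_id: int) -> str:
    """Read `path` as seen from the node with the given member id."""
    target = LINKS.get(path, path)  # ordinary files resolve to themselves
    return FILES[resolve_cdsl(target, member_id)]


print(read("/etc/hostname", 1))
print(read("/etc/hostname", 2))
print(read("/etc/motd", 2))
```

Both nodes open the same path, `/etc/hostname`, yet each reads its own file, while ordinary files such as `/etc/motd` remain identical everywhere.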
Single I/O space
Some SSI systems allow all nodes to access the I/O devices (e.g. tapes, disks, serial lines and so on) of other nodes. There may be some restrictions on the kinds of accesses allowed (for example, OpenSSI can't mount disk devices from one node on another node).
Single IPC space
Some SSI systems allow processes on different nodes to communicate using inter-process communication mechanisms as if they were running on the same machine. On some SSI systems this can even include shared memory (which can be emulated with software distributed shared memory).
In most cases inter-node IPC will be slower than IPC on the same machine, possibly drastically slower for shared memory. Some SSI clusters include special hardware to reduce this slowdown.
Cluster IP address
Some SSI systems provide a "cluster address", a single address visible from outside the cluster that can be used to contact the cluster as if it were one machine. This can be used for load balancing inbound calls to the cluster, directing them to lightly loaded nodes, or for redundancy, moving the cluster address from one machine to another as nodes join or leave the cluster ("leaving a cluster" is often a euphemism for crashing).
Some example SSI clustering systems
{|
|+SSI properties of different clustering systems
|-
!Name
!Process migration
!Process checkpoint
!Single process space
!Single root
!Single I/O space
!Single IPC space
!Cluster IP address
|-
|Amoeba (development is carried forward by Dr. Stefan Bosse at BSS Lab)
|
|
|
|
|
|
|
|-
|AIX TCF (was available in AIX 1; currently inactive)
|
|
|
|
|
|
|
|-
|Genesis
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
|Inferno
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
| align="center" |
|-
|Kerrighed
|
|
|
|
|
|
|
|-
|LinuxPMI (a successor to openMosix)
|
|
|
|
|
|
|
|-
|LOCUS (currently inactive)
|
|
|
|
|
|
|
|-
|MOSIX
|
|
|
|
|
|
|
|-
|openMosix (a fork of MOSIX, now inactive)
|
|
|
|
|
|
|
|-
|Open-Sharedroot (a shared root cluster from ATIX)
|
|
|
|
|
|
|
|-
|OpenSSI
|
|
|
|
|
|
|
|-
|VMScluster
|
|
|
|
|
|
|
|-
|Plan 9
|
|
|
|
|
|
|
|-
|Sprite (inactive)
|
|
|
|
|
|
|
|-
|TruCluster (part of the Tru64 operating system from Hewlett-Packard)
|
|
|
|
|
|
|
|}
Note on the "Cluster IP address" column: many of the Linux-based SSI clusters can use the Linux Virtual Server to implement a single cluster IP address.