Distributed lock manager
Encyclopedia
A distributed lock manager (DLM) provides distributed software applications with a means to synchronize their accesses to shared resource
s.
DLMs have been used as the foundation for several successful clustered file system
s, in which the machines in a cluster can use each other's storage via a unified file system
, with significant advantages for performance and availability. The main performance benefit comes from solving the problem of disk cache coherency
between participating computers. The DLM is used not only for file locking
but also for coordination of all disk access. VMScluster
, the first clustering system to come into widespread use, relies on the OpenVMS
DLM in just this way.
designer chooses. A hierarchy of resources may be defined, so that a number of levels of locking can be implemented. For instance, a hypothetical database
might define a resource hierarchy as follows:
A process
can then acquire locks on the database as a whole, and then on particular parts of the database. A lock must be obtained on a parent resource before a subordinate resource can be locked.
The following truth table
shows the compatibility of each lock mode with the others:
technique that is used to perform I/O. The enqueue lock request can either complete synchronously, in which case the process waits until the lock is granted, or asynchronously, in which case an AST
occurs when the lock has been obtained.
It is also possible to establish a blocking AST, which is triggered when a process has obtained a lock that is preventing access to the resource by another process. The original process can then optionally take action to allow the other access (e.g. by demoting or releasing the lock).
It can be used to hold any information about the resource that the application designer chooses. A typical use is to hold a version number of the resource. Each time the associated entity (e.g. a database record) is updated, the holder of the lock increments the lock value block. When another process wishes to read the resource, it obtains the appropriate lock and compares the current lock value with the value it had last time the process locked the resource. If the value is the same, the process knows that the associated entity has not been updated since last time it read it, and therefore it is unnecessary to read it again. Hence, this technique can be used to implement various types of cache
in a database or similar application.
.
A simple example is when Process 1 has obtained an exclusive lock on Resource A, and Process 2 has obtained an exclusive lock on Resource B. If Process 1 then tries to lock Resource B, it will have to wait for Process 2 to release it. But if Process 2 then tries to lock Resource A, both processes will wait forever for each other.
The OpenVMS DLM periodically checks for deadlock situations. In the example above, the second lock enqueue request of one of the processes would return with a deadlock status. It would then be up to this process to take action to resolve the deadlock — in this case by releasing the first lock it obtained.
and Oracle
have developed clustering software for Linux
.
OCFS2
, the Oracle Cluster File System was added to the official Linux kernel
with version 2.6.16, in January 2006. The alpha-quality code warning on OCFS2 was removed in 2.6.19.
Red Hat's cluster software, including their DLM and Global File System
was officially added to the Linux kernel with version 2.6.19, in November 2006.
Both systems use a DLM modeled on the venerable VMS DLM. Oracle's DLM has a simpler API. (the core function,
has developed Chubby, a lock service for loosely-coupled distributed systems. It is designed for coarse-grained locking and also provides a limited but reliable distributed file system. Key parts of Google's infrastructure, including Google File System
, BigTable
, and MapReduce
, use Chubby to synchronize accesses to shared resources. Though Chubby was designed as a lock service, it is now heavily used inside Google as a name server
, supplanting DNS
.
projects such as OpenSSI
.
Shared resource
In computing, a shared resource or network share is a device or piece of information on a computer that can be remotely accessed from another computer, typically via a local area network or an enterprise Intranet, transparently as if it were a resource in the local machine.Examples are shared file...
s.
DLMs have been used as the foundation for several successful clustered file system
Clustered file system
A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system...
s, in which the machines in a cluster can use each other's storage via a unified file system
File system
A file system is a means to organize data expected to be retained after a program terminates by providing procedures to store, retrieve and update data, as well as manage the available space on the device which contain it. A file system organizes data in an efficient manner and is tuned to the...
, with significant advantages for performance and availability. The main performance benefit comes from solving the problem of disk cache coherency
Cache coherency
In computing, cache coherence refers to the consistency of data stored in local caches of a shared resource.When clients in a system maintain caches of a common memory resource, problems may arise with inconsistent data. This is particularly true of CPUs in a multiprocessing system...
between participating computers. The DLM is used not only for file locking
File locking
File locking is a mechanism that restricts access to a computer file by allowing only one user or process access at any specific time. Systems implement locking to prevent the classic interceding update scenario ....
but also for coordination of all disk access. VMScluster
VMScluster
A VMScluster is a computer cluster involving a group of computers running the OpenVMS operating system. Whereas tightly coupled multiprocessor systems run a single copy of the operating system, a VMScluster is loosely coupled: each machine runs its own copy of OpenVMS, but the disk storage, lock...
, the first clustering system to come into widespread use, relies on the OpenVMS
OpenVMS
OpenVMS , previously known as VAX-11/VMS, VAX/VMS or VMS, is a computer server operating system that runs on VAX, Alpha and Itanium-based families of computers. Contrary to what its name suggests, OpenVMS is not open source software; however, the source listings are available for purchase...
DLM in just this way.
VMS implementation
VMS was the first widely-available operating system to implement a DLM. This became available in Version 4, although the user interface was the same as the single-processor lock manager that was first implemented in Version 3.Resources
The DLM uses a generalised concept of a resource, which is some entity to which shared access must be controlled. This can relate to a file, a record, an area of shared memory, or anything else that the applicationApplication software
Application software, also known as an application or an "app", is computer software designed to help the user to perform specific tasks. Examples include enterprise software, accounting software, office suites, graphics software and media players. Many application programs deal principally with...
designer chooses. A hierarchy of resources may be defined, so that a number of levels of locking can be implemented. For instance, a hypothetical database
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
might define a resource hierarchy as follows:
- Database
- Table
- Record
- Field
A process
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...
can then acquire locks on the database as a whole, and then on particular parts of the database. A lock must be obtained on a parent resource before a subordinate resource can be locked.
Lock modes
A process running within a VMSCluster may obtain a lock on a resource. There are six lock modes that can be granted, and these determine the level of exclusivity of access to the resource. Once a lock has been granted, it is possible to convert the lock to a higher or lower level of lock mode. When all processes have unlocked a resource, the system's information about the resource is destroyed.- Null Lock (NL). Indicates interest in the resource, but does not prevent other processes from locking it. It has the advantage that the resource and its lock value block are preserved, even when no processes are locking it.
- Concurrent Read (CR). Indicates a desire to read (but not update) the resource. It allows other processes to read or update the resource, but prevents others from having exclusive access to it. This is usually employed on high-level resources, in order that more restrictive locks can be obtained on subordinate resources.
- Concurrent Write (CW). Indicates a desire to read and update the resource. It also allows other processes to read or update the resource, but prevents others from having exclusive access to it. This is also usually employed on high-level resources, in order that more restrictive locks can be obtained on subordinate resources.
- Protected Read (PR). This is the traditional share lock, which indicates a desire to read the resource but prevents other from updating it. Others can however also read the resource.
- Protected Write (PW). This is the traditional update lock, which indicates a desire to read and update the resource and prevents others from updating it. Others with Concurrent Read access can however read the resource.
- Exclusive (EX). This is the traditional exclusive lock which allows read and update access to the resource, and prevents others from having any access to it.
The following truth table
Truth table
A truth table is a mathematical table used in logic—specifically in connection with Boolean algebra, boolean functions, and propositional calculus—to compute the functional values of logical expressions on each of their functional arguments, that is, on each combination of values taken by their...
shows the compatibility of each lock mode with the others:
Mode | NL | CR | CW | PR | PW | EX |
---|---|---|---|---|---|---|
NL | ||||||
CR | ||||||
CW | ||||||
PR | ||||||
PW | ||||||
EX |
Obtaining a lock
A process can obtain a lock on a resource by enqueueing a lock request. This is similar to the QIOQIO
QIO is a term used in several computer operating systems designed by the former Digital Equipment Corporation of Maynard, Massachusetts.I/O operations on these systems are initiated by issuing a QIO call to the kernel...
technique that is used to perform I/O. The enqueue lock request can either complete synchronously, in which case the process waits until the lock is granted, or asynchronously, in which case an AST
Asynchronous System Trap
Asynchronous system trap refers to a mechanism used in several computer operating systems designed by the former Digital Equipment Corporation of Maynard, Massachusetts....
occurs when the lock has been obtained.
It is also possible to establish a blocking AST, which is triggered when a process has obtained a lock that is preventing access to the resource by another process. The original process can then optionally take action to allow the other access (e.g. by demoting or releasing the lock).
Lock value block
A lock value block is associated with each resource. This can be read by any process that has obtained a lock on the resource (other than a null lock) and can be updated by a process that has obtained a protected update or exclusive lock on it.It can be used to hold any information about the resource that the application designer chooses. A typical use is to hold a version number of the resource. Each time the associated entity (e.g. a database record) is updated, the holder of the lock increments the lock value block. When another process wishes to read the resource, it obtains the appropriate lock and compares the current lock value with the value it had last time the process locked the resource. If the value is the same, the process knows that the associated entity has not been updated since last time it read it, and therefore it is unnecessary to read it again. Hence, this technique can be used to implement various types of cache
Cache
In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere...
in a database or similar application.
Deadlock detection
When one or more processes have obtained locks on resources, it is possible to produce a situation where each is preventing another from obtaining a lock, and none of them can proceed. This is known as a deadly embrace or deadlockDeadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...
.
A simple example is when Process 1 has obtained an exclusive lock on Resource A, and Process 2 has obtained an exclusive lock on Resource B. If Process 1 then tries to lock Resource B, it will have to wait for Process 2 to release it. But if Process 2 then tries to lock Resource A, both processes will wait forever for each other.
The OpenVMS DLM periodically checks for deadlock situations. In the example above, the second lock enqueue request of one of the processes would return with a deadlock status. It would then be up to this process to take action to resolve the deadlock — in this case by releasing the first lock it obtained.
Linux clustering
Both Red HatRed Hat
Red Hat, Inc. is an S&P 500 company in the free and open source software sector, and a major Linux distribution vendor. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North Carolina with satellite offices worldwide....
and Oracle
Oracle Corporation
Oracle Corporation is an American multinational computer technology corporation that specializes in developing and marketing hardware systems and enterprise software products – particularly database management systems...
have developed clustering software for Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
.
OCFS2
OCFS
OCFS is a shared disk file system developed by Oracle Corporation and released under the GNU General Public License....
, the Oracle Cluster File System was added to the official Linux kernel
Linux kernel
The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software....
with version 2.6.16, in January 2006. The alpha-quality code warning on OCFS2 was removed in 2.6.19.
Red Hat's cluster software, including their DLM and Global File System
Global File System
In computing, the Global File System is a shared disk file system for Linux computer clusters. This is not to be confused with the Google File System, a proprietary distributed filesystem developed by Google....
was officially added to the Linux kernel with version 2.6.19, in November 2006.
Both systems use a DLM modeled on the venerable VMS DLM. Oracle's DLM has a simpler API. (the core function,
dlmlock
, has eight parameters, whereas the VMS SYS$ENQ
service and Red Hat's dlm_lock both have 11.)Google's Chubby lock service
GoogleGoogle
Google Inc. is an American multinational public corporation invested in Internet search, cloud computing, and advertising technologies. Google hosts and develops a number of Internet-based services and products, and generates profit primarily from advertising through its AdWords program...
has developed Chubby, a lock service for loosely-coupled distributed systems. It is designed for coarse-grained locking and also provides a limited but reliable distributed file system. Key parts of Google's infrastructure, including Google File System
Google File System
Google File System is a proprietary distributed file system developed by Google Inc. for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware...
, BigTable
BigTable
BigTable is a compressed, high performance, and proprietary database system built on Google File System , Chubby Lock Service, SSTable and a few other Google technologies; it is currently not distributed nor is it used outside of Google, although Google offers access to it as part of their Google...
, and MapReduce
MapReduce
MapReduce is a software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers. Parts of the framework are patented in some countries....
, use Chubby to synchronize accesses to shared resources. Though Chubby was designed as a lock service, it is now heavily used inside Google as a name server
Name server
In computing, a name server is a program or computer server that implements a name-service protocol. It maps a human-recognizable identifier to a system-internal, often numeric, identification or addressing component....
, supplanting DNS
Domain name system
The Domain Name System is a hierarchical distributed naming system for computers, services, or any resource connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities...
.
SSI Systems
A DLM is also a key component of more ambitious single system imageSingle-system image
In distributed computing, a single system image cluster is a cluster of machines that appears to be one single system. The concept is often considered synonymous with that of a distributed operating system, but a single image may be presented for more limited purposes, just job scheduling for...
projects such as OpenSSI
OpenSSI
OpenSSI is an open source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster....
.