Process substitution
In computing, process substitution is a form of inter-process communication
that allows the input or output of a command to appear as a file. The command is substituted in-line, where a file name would normally occur, by the command shell. This allows programs that normally only accept files to directly read from or write to another program.
Example
The following examples use Bash syntax. The Unix diff command normally accepts the names of two files to compare, or one file name and standard input. Process substitution allows you to compare the output of two programs directly:
$ diff <(sort file1) <(sort file2)
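The <(command) expression tells the command interpreter to run command and make its output appear as a file. The command can be any arbitrarily complex shell command. Without process substitution, the alternatives are to save the output of a command to a temporary file and then read that file, or to create a named pipe, start one command writing to it in the background, and run the other command with the named pipe as input. Both alternatives are more cumbersome.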
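For comparison, the two alternatives (a temporary file and a named pipe) might be written as in the following sketch. The sample file contents, the file names under the scratch directory, and the use of mktemp -d are assumptions made for illustration; all three forms should produce the same diff output.

```shell
#!/usr/bin/env bash
# Scratch directory and sample inputs (hypothetical, for illustration only).
tmp=$(mktemp -d)
printf '%s\n' pear apple fig > "$tmp/file1"
printf '%s\n' fig apple kiwi > "$tmp/file2"

# Process substitution: both sorts are given in-line.
# diff exits with status 1 when the inputs differ, hence "|| true".
with_procsub=$(diff <(sort "$tmp/file1") <(sort "$tmp/file2")) || true

# Alternative 1: save one sorted output to a temporary file first.
sort "$tmp/file2" > "$tmp/file2.sorted"
with_tempfile=$(sort "$tmp/file1" | diff - "$tmp/file2.sorted") || true
rm "$tmp/file2.sorted"

# Alternative 2: create a named pipe and feed it from a background job.
mkfifo "$tmp/sort2.fifo"
sort "$tmp/file2" > "$tmp/sort2.fifo" &
with_fifo=$(sort "$tmp/file1" | diff - "$tmp/sort2.fifo") || true
wait
rm "$tmp/sort2.fifo"

printf '%s\n' "$with_procsub"
rm -r "$tmp"
```

Both alternatives need explicit setup and cleanup steps that the process substitution form leaves to the shell.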
Process substitution can also be used to capture output that would normally go to a file, and redirect it to the input of a process. The Bash syntax for writing to a process is >(command). Here is an example that counts the lines in a file with wc -l and compresses it with gzip in one pass:
$ tee >(wc -l >&2) < bigfile | gzip > bigfile.gz
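A self-contained variant of this command may be useful for experimentation. The 1000-line input generated with seq and the scratch directory are assumptions for illustration. Because Bash typically does not wait for the process inside >(...), the line count may appear on standard error slightly after the pipeline itself has finished.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
seq 1 1000 > "$tmp/bigfile"   # hypothetical sample input

# tee copies its input to the wc process and to standard output (gzip).
# wc writes the count to stderr so it does not end up in the gzip stream.
tee >(wc -l >&2) < "$tmp/bigfile" | gzip > "$tmp/bigfile.gz"

# Check that the compressed copy decompresses to the same number of lines.
restored_lines=$(gzip -dc "$tmp/bigfile.gz" | wc -l)
echo "lines in decompressed copy: $restored_lines"
```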
Advantages
The main advantages of process substitution over its alternatives are:
- Simplicity: The commands can be given in-line; there is no need to save temporary files or create named pipes first.
- Performance: Reading directly from another process is often faster than having to write a temporary file to disk, then read it back in. This also saves disk space.
- Parallelism: The substituted process can run concurrently with the command reading its output or writing its input, taking advantage of multiprocessing to reduce the total time for the computation.
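The parallelism point can be observed with a small timing sketch. The two sleep 2 commands stand in for hypothetical slow producers. Since both substitutions are started before cat runs, the whole command should take about two seconds rather than four.

```shell
#!/usr/bin/env bash
start=$SECONDS   # Bash's built-in counter of elapsed whole seconds

# Both substituted commands start running at once, in the background.
output=$(cat <(sleep 2; echo first) <(sleep 2; echo second))

elapsed=$(( SECONDS - start ))
printf '%s\n' "$output"
echo "elapsed: about ${elapsed}s"
```

SECONDS has only whole-second resolution, so the reported figure is approximate.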
Mechanism
Under the hood, process substitution works by creating a named pipe, and then substituting its name on the command line. (Because of this, process substitution is sometimes known as "anonymous named pipes.") On systems that provide /dev/fd, Bash can instead use an ordinary pipe and substitute the /dev/fd name of its file descriptor. To illustrate the steps involved, consider the following simple command:
diff file1 <(sort file2)
The steps the shell performs are:
- Create a new pipe that can be referred to by a file name. On Unix-like systems that provide /dev/fd, the name is often something like /dev/fd/63, which refers to an open file descriptor of an ordinary pipe; elsewhere the shell creates a temporary named pipe. You can see the name with a command like echo <(true).
- Execute the substituted command in the background (sort file2 in this case), piping its output to the pipe.
- Execute the primary command, replacing the substituted command with the name of the pipe. In this case, the full command might expand to something like diff file1 /dev/fd/63.
- When execution is finished, close the pipe, removing the named pipe if one was created.
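The substitution step can be made visible by passing a process substitution to a command that simply prints its arguments. In this sketch, show_args is a hypothetical helper and sort /dev/null is a stand-in for a real producer; the exact name printed (for example /dev/fd/63) depends on the system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument on its own line.
show_args() { printf '%s\n' "$@"; }

# The command sees an ordinary path where <(...) was written.
args=$(show_args file1 <(sort /dev/null))
first_arg=$(printf '%s\n' "$args" | sed -n 1p)
pipe_name=$(printf '%s\n' "$args" | sed -n 2p)

echo "first argument:  $first_arg"
echo "second argument: $pipe_name"
```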
Limitations
Process substitution has some limitations. The "files" created are not seekable, which means the process reading or writing to the file cannot perform random access; it must read or write once from start to finish. Programs that explicitly check the type of a file before opening it may refuse to work with process substitution, because the "file" resulting from process substitution is not a regular file. "It is not possible to obtain the exit code of a process substitution command from the shell that created the process substitution."
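Two of these limitations can be checked directly, as in the following sketch. The results of the file-type tests are what one would expect on a typical Linux system and are not guaranteed elsewhere.

```shell
#!/usr/bin/env bash
# The substituted "file" is not a regular file; typically it is a pipe.
if [ -f <(true) ]; then is_regular=yes; else is_regular=no; fi
if [ -p <(true) ]; then is_pipe=yes; else is_pipe=no; fi
echo "regular file: $is_regular, pipe: $is_pipe"

# The failing status of the substituted command is not reported:
# $? reflects only cat, which succeeds on the (empty) input.
cat <(false)
status=$?
echo "exit status seen by the shell: $status"
```

A program that insists on a regular file, or a script that relies on $? to detect a failure inside the substitution, would need one of the temporary-file approaches instead.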
See also
- Pipeline (Unix)
- Named pipe
- Command substitution
- Comparison of command shells
- Anonymous pipe