Fork (operating system)
In computing, when a process forks, it creates a copy of itself. More generally, a fork in a multithreading environment means that a thread of execution is duplicated, creating a child thread from the parent thread.
Under Unix and Unix-like operating systems, the parent and the child processes can tell each other apart by examining the return value of the fork system call. In the child process, the return value of fork is 0, whereas the return value in the parent process is the PID of the newly created child process.
The fork operation creates a separate address space for the child. The child process has an exact copy of all the memory segments of the parent process, though if copy-on-write semantics are implemented, actual physical memory may not be assigned (i.e., both processes may share the same physical memory segments for a while). Both the parent and child processes possess the same code segments, but execute independently of each other.
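The separation of address spaces can be observed from C. The following is a minimal sketch, assuming a POSIX system; the function name is hypothetical and chosen only for illustration. The child overwrites a local variable, and the parent checks its own copy after the child has exited.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite `value`, and
 * returns what the parent sees once the child has exited.
 * Returns -1 if fork or waitpid fails. */
int value_seen_by_parent_after_child_write(void)
{
    int value = 42;

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed: no child exists */

    if (pid == 0) {
        value = 99;                 /* changes only the child's copy */
        _exit(value == 99 ? 0 : 1); /* leave without touching stdio */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return value;                   /* the parent's copy is untouched */
}
```

Called from the parent, the function should return 42, since the assignment in the child affects only the child's address space.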
Importance of forking in Unix
Forking is an important part of Unix, critical to the support of its design philosophy, which encourages the development of filters. In Unix, a filter is a (usually small) program that reads its input from stdin and writes its output to stdout. A pipeline of these commands can be strung together by a shell to create new commands. For example, one can connect the output of the find(1) command to the input of the wc(1) command to create a new command that prints a count of the files ending in ".cpp" found in the current directory and any subdirectories.
To accomplish this, the shell forks itself and uses pipes, a form of interprocess communication, to tie the output of the find command to the input of the wc command. Two child processes are created, one for each command (find and wc). These child processes are overlaid with the code associated with the programs they are intended to execute, using the exec(3) family of system calls: find overlays the first child process, wc overlays the second, and the pipe carries the output of find to the input of wc.
More generally, forking is also performed by the shell each time a user issues a command. A child process is created by forking the shell, and the child process is overlaid, once again by exec, with the code associated with the program to be executed.
Process Address Space
Whenever an executable file is executed, its contents are loaded into the address space of a process. An executable file contains binary code and data grouped into a number of blocks called sections (often loosely called segments). Each section stores a particular type of data. A few section names of a typical ELF executable file are listed below.
- .text: Section containing executable code
- .bss: Section containing data initialized to zero
- .data: Section containing initialized data
- .symtab: Section containing the program symbols (e.g., function names, variable names, etc.)
- .interp: Section containing the name of the interpreter to be used
The readelf command can provide further details of the ELF file. When such a file is loaded into memory for execution, the entire executable need not occupy contiguous memory locations. Virtual memory is divided into equal-sized units called pages (typically 4 KB), each of which is mapped to a physical page frame of the same size. Hence, when the executable is loaded, different parts of it are placed in different frames, which might not be contiguous. Consider an ELF executable file of size 10 KB. If the page size supported by the OS is 4 KB, then the file is split into three pages holding 4 KB, 4 KB, and 2 KB respectively. These three pages can be placed in any three free page frames in memory.
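The arithmetic in the 10K example can be checked with a short C sketch. The function names below are illustrative only, and the page size is passed in as a parameter rather than queried from the operating system.

```c
#include <assert.h>
#include <stddef.h>

/* Number of page-sized pieces needed to hold `file_size` bytes,
 * rounding up so that a partial final piece still gets its own page. */
size_t pieces_needed(size_t file_size, size_t page_size)
{
    assert(page_size > 0);
    return (file_size + page_size - 1) / page_size;
}

/* Size in bytes of the final piece; all earlier pieces are full pages.
 * Returns 0 for an empty file. */
size_t last_piece_size(size_t file_size, size_t page_size)
{
    assert(page_size > 0);
    size_t remainder = file_size % page_size;
    return (file_size > 0 && remainder == 0) ? page_size : remainder;
}
```

For a 10240-byte file and 4096-byte pages, pieces_needed gives 3 and last_piece_size gives 2048, matching the 4K, 4K, and 2K split described above.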
Fork and page sharing
When a fork system call is issued, the simplest implementation creates a copy of all the pages belonging to the parent process and loads them into a separate memory location for the child process. But this is not needed in certain cases. Consider the case when a child executes an "exec" system call (which is used to execute any executable file from within a C program) or exits very soon after the fork. When the child is needed just to execute a command for the parent process, there is no need to copy the parent process's pages, since exec replaces the address space of the process that invoked it with the command to be executed.
In such cases, a technique called copy-on-write (COW) is used. With this technique, when a fork occurs, the parent process's pages are not copied for the child process. Instead, the pages are shared between the child and the parent process. Whenever a process (parent or child) modifies a page, a separate copy of that particular page alone is made for the process that performed the modification. This process then uses the newly copied page rather than the shared one in all future references. The other process (the one that did not modify the shared page) continues to use the original copy of the page (which is now no longer shared). This technique is called copy-on-write since the page is copied when some process writes to it.
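The case described above, in which the child calls exec almost immediately after fork, might look like the following in C. This is a minimal sketch assuming a POSIX system; run_command is a hypothetical helper name, not part of any standard API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program argv[0] with arguments argv in
 * a child process and returns its exit status, or -1 on failure. */
int run_command(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);  /* replaces the child's address space */
        _exit(127);             /* reached only if exec itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* Report the exit status only if the child exited normally. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child touches almost none of the parent's pages before execvp discards its address space, copying those pages up front would be wasted work, which is the situation copy-on-write is meant to avoid.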
Vfork and page sharing
vfork is another UNIX system call used to create a new process. When a vfork system call is issued, the parent process is suspended until the child process has either completed execution or been replaced with a new executable image via one of the exec family of system calls. With vfork, too, the pages are shared between the parent and child process, but vfork does not mandate copy-on-write. Hence, if the child process modifies any of the shared pages, no new page is created and the modifications are visible to the parent process too. Since no page copying is involved (which would consume additional memory), this technique is highly efficient when a process needs to execute a blocking command using the child process.
On some systems, vfork is the same as fork.
The vfork function differs from fork only in that the child process can share code and data with the calling process (parent process). This speeds cloning activity significantly, at a risk to the integrity of the parent process if vfork is misused.
The use of vfork for any purpose except as a prelude to an immediate call to a function from the exec family, or to _exit, is not advised. In particular, the Linux man page for vfork strongly discourages its use.
The vfork function can be used to create new processes without fully copying the address space of the old process. If a forked process is simply going to call exec, the data space copied from the parent to the child by fork is not used. This is particularly inefficient in a paged environment, making vfork particularly useful. Depending upon the size of the parent's data space, vfork can give a significant performance improvement over fork.
The vfork function can normally be used just like fork. It does not work, however, to return while running in the child's context from the caller of vfork, since the eventual return from vfork would then return to a stack frame that no longer exists. Care must also be taken to call _exit rather than exit if exec cannot be called, since exit flushes and closes standard I/O channels, thereby damaging the parent process's standard I/O data structures. (Even with fork, it is still incorrect to call exit, since buffered data would then be flushed twice.)
If signal handlers are invoked in the child process after vfork, they must follow the same rules as other code in the child process.
MMUless systems
On several embedded devices, there is no memory management unit (MMU), which is a requirement for implementing the copy-on-write semantics prescribed by fork. If the system has some other mechanism for per-process address spaces, such as a segment register, copying the entire process memory to the new process achieves the desired effect. However, this is a costly operation and most likely unneeded, given that the new process almost immediately replaces the process image in most instances.
If all processes share a single address space, then the only way fork could be implemented would be to swap the memory pages along with the rest of the task context switch. Rather than doing that, embedded operating systems such as uClinux usually omit fork and only implement vfork; part of the work of porting to such a platform involves rewriting code to use the latter.
Forking in other operating systems
The fork mechanism (1969) in Unix and Linux makes implicit assumptions about the underlying hardware: linear memory and a paging mechanism that enables an efficient memory copy operation of a contiguous address range. In the original design of the VMS (now OpenVMS) operating system (1977), a copy operation with subsequent mutation of the content of a few specific addresses for the new process, as in forking, was considered risky, because errors in the current process state may be copied to a child process. Here, the metaphor of process spawning is used: each component of the memory layout of the new process is newly constructed from scratch. From a software-engineering viewpoint this latter approach would be considered cleaner and safer, but the fork mechanism is still predominant due to its efficiency. The spawn metaphor was later adopted in Microsoft operating systems (1993).
Application usage
The fork system call takes no arguments and returns a process ID. The returned process ID is of the type pid_t, a signed integer type defined in the header file sys/types.h.
The purpose of the fork system call is to create a new process, which becomes the child process of the caller. After the call, both the parent and the child process execute the code following the fork system call. Hence, it is important to distinguish between the parent and the child process. This can be done by testing the return value of the fork system call.
- If fork returns a negative value (-1), the creation of the child process was unsuccessful.
- fork returns a zero to the newly created child process.
- fork returns a positive value, the process ID of the child process, to the parent.
Example in C
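The following is a minimal sketch of testing the return value of fork, assuming a POSIX system. The function name fork_demo is hypothetical and chosen only for illustration; it is written as a function so that it can be called from any main.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks once, reports which branch each process
 * took, and returns the child's exit status (or -1 on error). */
int fork_demo(void)
{
    fflush(stdout);             /* avoid duplicating buffered output */

    pid_t pid = fork();
    if (pid < 0) {              /* negative: no child was created */
        perror("fork");
        return -1;
    }

    if (pid == 0) {             /* zero: this is the child */
        printf("child:  pid=%ld\n", (long)getpid());
        fflush(stdout);
        _exit(0);               /* _exit, so stdio is not flushed twice */
    }

    /* positive: this is the parent, and pid is the child's process ID */
    printf("parent: child pid=%ld\n", (long)pid);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling fork_demo() prints one line from each process. The order of the two lines is not guaranteed, since the parent and child are scheduled independently.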
See also
- Child process
- Parent process
- Fork bomb
- Fork-exec
- Exec
- Exit
- Wait
External links
- Lightwolf, A library that implements thread forking for Java
- NetBSD: Why implement traditional vfork