Gillespie algorithm
In probability theory, the Gillespie algorithm (or occasionally the Doob-Gillespie algorithm) generates a statistically correct trajectory (possible solution) of a stochastic equation. It was created by Joseph L. Doob and others (circa 1945) and popularized by Dan Gillespie in 1977 in a paper where he uses it to simulate chemical or biochemical systems of reactions efficiently and accurately using limited computational power (see Stochastic simulation). As computers have become faster, the algorithm has been used to simulate increasingly complex systems. The algorithm is particularly useful for simulating reactions within cells, where the number of reagents is typically in the tens of molecules (or fewer). Mathematically, it is a variant of a dynamic Monte Carlo method and similar to kinetic Monte Carlo methods. It is used heavily in computational systems biology.
History
The process that led to the algorithm involves several important steps. In 1931, Andrei Kolmogorov introduced the differential equations corresponding to the time evolution of stochastic processes that proceed by jumps, today known as Kolmogorov equations (Markov jump process); a simplified version is known as the master equation in the natural sciences. It was William Feller, in 1940, who found the conditions under which the Kolmogorov equations admitted (proper) probabilities as solutions. In his Theorem I (1940) he established that the time to the next jump is exponentially distributed and that the probability of the next event is proportional to the rate. As such, he established the relation of Kolmogorov's equations to stochastic processes.
Later, Doob (1942, 1945) extended Feller's solutions beyond the case of pure-jump processes. The method was implemented on computers by David George Kendall (1950) using the Manchester Mark 1 computer and later used by M. S. Bartlett (1953) in his studies of epidemic outbreaks.
Gillespie (1977) developed the algorithm without reference to this history, writing: "It should be emphasized, though, that the master equation itself plays no role whatsoever in either the derivation or the implementation of the stochastic simulation algorithm". He then proceeds through a heuristic argument to introduce the algorithm.
Idea behind the algorithm
Traditional continuous and deterministic biochemical rate equations do not accurately predict cellular reactions, since they rely on bulk reactions that require the interactions of millions of molecules. They are typically modeled as a set of coupled ordinary differential equations. In contrast, the Gillespie algorithm allows a discrete and stochastic simulation of a system with few reactants, because every reaction is explicitly simulated. A Gillespie realization represents a random walk that exactly reproduces the distribution of the master equation.
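To make the contrast concrete, consider the irreversible decay reaction A → ∅ with rate constant c (a standard textbook example, not drawn from Gillespie's paper). The deterministic rate equation describes only the mean copy number, whereas the master equation, whose distribution the Gillespie algorithm samples exactly, describes the full probability distribution over integer copy numbers:

```latex
% Deterministic rate equation for the mean copy number n(t):
\frac{\mathrm{d}n}{\mathrm{d}t} = -c\,n, \qquad n(t) = n_0\, e^{-ct}

% Master equation for the probability P(n,t) of having exactly n molecules:
\frac{\mathrm{d}P(n,t)}{\mathrm{d}t} = c\,(n+1)\,P(n+1,t) - c\,n\,P(n,t)
```

A single Gillespie realization of this system is a decreasing integer staircase; only the average over many realizations follows the smooth exponential of the rate equation, and when n is small the two descriptions can differ markedly.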
The physical basis of the algorithm is the collision of molecules within a reaction vessel. It is assumed that collisions are frequent, but collisions with the proper orientation and energy are infrequent. Therefore, all reactions within the Gillespie framework must involve at most two molecules. Reactions involving three molecules are assumed to be extremely rare and are modeled as a sequence of binary reactions. It is also assumed that the reaction environment is well mixed.
Algorithm
Gillespie developed two different, but equivalent, formulations: the direct method and the first reaction method. Below is a summary of the steps to run the algorithm (math omitted); a minimal code sketch of the direct method follows the list.
- Initialization: Initialize the number of molecules in the system, the reaction constants, and the random number generators.
- Monte Carlo step: Generate random numbers to determine the next reaction to occur as well as the time interval until it occurs. The probability of a given reaction being chosen is proportional to the number of substrate molecules.
- Update: Advance the simulation time by the randomly generated time from the Monte Carlo step and update the molecule count according to the reaction that occurred.
- Iterate: Go back to the Monte Carlo step unless the number of reactants is zero or the simulation time has been exceeded.
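The following is a minimal sketch of the direct method in Python for a hypothetical birth-death system (∅ → A at rate k_birth, A → ∅ at rate k_death per molecule). The reaction system, parameter values, and function names are illustrative assumptions chosen for brevity, not taken from Gillespie's paper.

```python
import math
import random

def gillespie_direct(x0, k_birth, k_death, t_end, seed=None):
    """Direct-method SSA sketch for the birth-death system 0 -> A, A -> 0."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        # Propensities of the two reaction channels.
        a_birth = k_birth            # 0 -> A
        a_death = k_death * x        # A -> 0, proportional to the number of A molecules
        a_total = a_birth + a_death
        if a_total == 0.0:           # no reaction can fire any more
            break
        # Monte Carlo step: exponentially distributed waiting time ...
        t += -math.log(1.0 - rng.random()) / a_total
        # ... and choice of reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1                   # birth fired
        else:
            x -= 1                   # death fired
        # Update step: record the new state.
        times.append(t)
        counts.append(x)
    return times, counts

# One realization starting from 10 molecules of A.
if __name__ == "__main__":
    ts, xs = gillespie_direct(x0=10, k_birth=1.0, k_death=0.1, t_end=50.0, seed=1)
    print(len(ts), xs[-1])
```

For a general network the same loop applies, with one propensity per reaction channel and a state-change vector applied to the molecule counts whenever that channel fires.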
The algorithm is computationally expensive and thus many modifications and adaptations exist, including the next reaction method (Gibson & Bruck), tau-leaping, as well as hybrid techniques where abundant reactants are modeled with deterministic behavior. Adapted techniques generally compromise the exactitude of the theory behind the algorithm as it connects to the master equation, but offer reasonable realizations for greatly improved timescales. The computational cost of exact versions of the algorithm is determined by the coupling class of the reaction network. In weakly coupled networks, the number of reactions that are influenced by any other reaction is bounded by a small constant. In strongly coupled networks, a single reaction firing can in principle affect all other reactions. An exact version of the algorithm with constant-time scaling for weakly coupled networks has been developed, enabling efficient simulation of systems with very large numbers of reaction channels (Slepoy 2008). A generalized Gillespie algorithm that accounts for the non-Markovian properties of random biochemical events with delay has been developed by Bratsun et al. (2005) and independently by Barrio et al. (2006), as well as by Cai (2007). See the articles cited below for details.
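As an illustration of the tau-leaping idea, over a fixed interval tau each reaction channel fires a Poisson-distributed number of times with mean equal to its propensity multiplied by tau. The sketch below reuses the hypothetical birth-death system from above with a fixed tau; real implementations choose tau adaptively and guard more carefully against negative populations.

```python
import math
import random

def tau_leap(x0, k_birth, k_death, t_end, tau, seed=None):
    """Fixed-step tau-leaping sketch for the birth-death system 0 -> A, A -> 0."""
    rng = random.Random(seed)

    def poisson(mean):
        # Knuth's method; adequate for the small means that occur here.
        limit = math.exp(-mean)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        # Number of firings of each channel during the leap of length tau.
        n_birth = poisson(k_birth * tau)
        n_death = poisson(k_death * x * tau)
        x = max(x + n_birth - n_death, 0)   # crude guard against negative counts
        t += tau
        times.append(t)
        counts.append(x)
    return times, counts
```

Because many reaction firings are bundled into a single step, the cost per unit of simulated time drops, at the price of the exactness that the direct method guarantees.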
Partial-propensity formulations, as developed by Ramaswamy et al. (Ramaswamy 2009, 2010) and later independently rediscovered by Indurkhya (Indurkhya 2010), can be used to construct a family of exact versions of the algorithm whose computational cost is proportional to the number of chemical species in the network, rather than the (larger) number of reactions. These formulations can reduce the computational cost to constant-time scaling for weakly coupled networks and to scale at most linearly with the number of species for strongly coupled networks. A partial-propensity variant of the generalized Gillespie algorithm for reactions with delays has also been proposed (Ramaswamy 2011). The use of partial-propensity methods is limited to elementary chemical reactions, i.e., reactions with at most two different reactants. Every non-elementary chemical reaction can be equivalently decomposed into a set of elementary ones, at the expense of a linear (in the order of the reaction) increase in network size.
Further reading
- (Slepoy 2008):
- (Bratsun 2005):
- (Barrio 2005):
- (Cai 2007):
- (Barnes 2010):
- (Ramaswamy 2009):
- (Ramaswamy 2010):
- (Indurkhya 2010):
- (Ramaswamy 2011):
External links
Software
- Cain - Stochastic simulation of chemical kinetics. Direct, next reaction, tau-leaping, hybrid, etc.
- StochPy - Stochastic modelling in Python
- SynBioSS - Stochastic simulation of chemical kinetics using the exact SSA as well as an SSA/Langevin hybrid. Both MPI-parallel (supercomputer) and GUI (desktop) versions are provided.
- GillespieSSA - R package for Gillespie algorithm
- http://demonstrations.wolfram.com/DeterministicVersusStochasticChemicalKinetics/ - Mathematica code and applet for stochastic simulation of chemical kinetics.
- PDM - C++ implementations of all partial-propensity methods.