Computer experiment
In the scientific context, a computer experiment refers to mathematical modelling using computer simulation. It has become common to call such experiments in silico. This area includes computational physics, computational chemistry, computational biology and other similar disciplines.

Computer simulation as a building block of a computer experiment

In a computer simulation, a computer model typically replaces a traditional mathematical model. Whereas a mathematical model is traditionally solved analytically, a computer model can be solved numerically: this is what a computer simulation of a system (typically a physical system) is about. (Sometimes an analytical solution to a mathematical model is not known, but a computer simulation can find an approximate solution; this typically happens with differential equations.)
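
As a minimal illustration of solving a model numerically, the sketch below (in Python; the equation, step size and all names are illustrative assumptions, not part of this article) integrates a simple differential equation with the forward Euler method, the kind of approximation a simulation falls back on when no analytical solution is available.

  # Minimal sketch: forward-Euler integration of dy/dt = r*y*(1 - y/K).
  # The equation, parameter values and names are illustrative assumptions.

  def euler_solve(f, y0, t0, t1, dt):
      """Approximate y(t) on [t0, t1] with fixed-step forward Euler."""
      t, y = t0, y0
      ts, ys = [t], [y]
      while t < t1:
          y = y + dt * f(t, y)   # one Euler step
          t = t + dt
          ts.append(t)
          ys.append(y)
      return ts, ys

  # Logistic growth: its solution happens to be known analytically, but the
  # same loop applies to models where no closed form is available.
  r, K = 0.5, 10.0
  ts, ys = euler_solve(lambda t, y: r * y * (1.0 - y / K),
                       y0=0.1, t0=0.0, t1=20.0, dt=0.01)
  print(ys[-1])   # approaches the carrying capacity K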

In a computer experiment a computer model is used to make inferences about some underlying system. The idea is that the computer model takes the place of an experiment we cannot do: the phrase in silico experiment is also used. At the moment, for example, the debate on climate change is being informed largely by evaluations of climate simulators running on some of the largest computers in the world, which are being used to investigate the impact of a substantial increase in the atmospheric concentration of greenhouse gases like carbon dioxide. In this case, the accumulation of many simulations on different initial conditions forms an experiment.
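
A computer experiment of this kind can be pictured as a loop over perturbed initial conditions, with the collection of outputs forming the data that are then analysed. The sketch below uses a toy one-variable simulator; the perturbation size, ensemble size and every name are assumptions for illustration only.

  import random

  # Sketch: accumulate many runs of a toy "simulator" over perturbed initial
  # conditions; the collection of outputs is what the experiment analyses.
  # The simulator and every value here are illustrative assumptions.

  def toy_simulator(y0, steps=2000, dt=0.01, r=0.5, K=10.0):
      """Forward-Euler logistic growth, returning only the final state."""
      y = y0
      for _ in range(steps):
          y += dt * r * y * (1.0 - y / K)
      return y

  random.seed(0)
  outputs = [toy_simulator(0.1 + random.gauss(0.0, 0.02)) for _ in range(100)]
  print(sum(outputs) / len(outputs))   # ensemble mean of the output of interest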

Computer experiments and statistics

Computer experiments can be seen as a branch of applied statistics, because the user must account for three sources of uncertainty. First, the models often contain parameters whose values are not certain; second, the models themselves are imperfect representations of the underlying system; and third, data collected from the system that might be used to calibrate the models are imperfectly measured. However, it is fair to say that most practitioners of computer experiments do not see themselves as statisticians.

History

The first computer experiments were probably conducted at Los Alamos National Laboratory to study the behaviour of nuclear weapons. Since then, the use of computer models has branched out into large parts of the physical and environmental sciences (where they are sometimes referred to as process models) and into medicine. Because computer experiments have developed in such a wide range of applications, there is little standardisation of the terminology.

Preliminary remark

As a general guide, in this article learning about the model parameters using data from the system is referred to as (model) calibration, while learning about the system behaviour itself is referred to as (system) prediction. Combining both of these, e.g. using the model and system data to make predictions about the system, is referred to as calibrated prediction. Other terminology is discussed below.

Constructing a simulator

The simulator is the computer code that we actually evaluate: the outputs of the simulator correspond, usually directly, to measurable aspects of the system. It is important to understand the process of creating a simulator, because this allows us to make judgements about how similar two or more simulators of the same system are. Without this information it is difficult to combine information from different simulators, because we do not know to what extent we can treat them as independent sources of information. See also computer simulation.

In most applications there are typically three parts to a simulator: the model, the treatment and the solver. Thus we might write

Simulator = Model + Treatment + Solver

Differences in each of these three components give rise to different simulators.
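
As a loose illustration of this decomposition (all class and attribute names below are invented placeholders, not an established interface), a simulator can be coded as three separate pieces that are only combined at run time:

  from dataclasses import dataclass
  from typing import Callable, Dict

  State = Dict[str, float]

  @dataclass
  class Model:
      # d(state)/dt as a function of time, state, forcing and model parameters
      rhs: Callable[[float, State, State, Dict[str, float]], State]
      parameters: Dict[str, float]         # coefficients in the underlying model

  @dataclass
  class Treatment:
      initial_state: State                 # initial conditions
      forcing: Callable[[float], State]    # external influences over time

  @dataclass
  class Simulator:
      model: Model
      treatment: Treatment
      dt: float = 0.01                     # the solver here: fixed-step Euler

      def run(self, t_end: float) -> State:
          t, state = 0.0, dict(self.treatment.initial_state)
          while t < t_end:
              deriv = self.model.rhs(t, state, self.treatment.forcing(t),
                                     self.model.parameters)
              state = {k: state[k] + self.dt * deriv[k] for k in state}
              t += self.dt
          return state

  # Example use with a trivial one-variable "model" (purely illustrative):
  model = Model(rhs=lambda t, s, f, p: {"T": -p["k"] * (s["T"] - f["T_air"])},
                parameters={"k": 0.1})
  treatment = Treatment(initial_state={"T": 10.0}, forcing=lambda t: {"T_air": 15.0})
  print(Simulator(model, treatment).run(t_end=50.0))

Swapping the solver (for example, replacing the Euler step with a higher-order scheme) or the treatment (different initial conditions or forcing) changes the simulator without touching the model, which is the point of the decomposition.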

The model

The subject of models is a large one: see Model (abstract) for an introduction and further links. Our starting point is a mathematical model for the system of interest. In the physical sciences a model typically describes the state variables, plus fundamental laws and equations of state governing how those variables exist and evolve in space and time.

For example, suppose you were interested in building an ocean model. Then you might proceed as follows:
  • State variables: velocity (in each of three directions), pressure, temperature, salinity, density

  • Fundamental laws: the Navier-Stokes equations for conservation of momentum, the continuity equation (conservation of mass), and conservation of temperature and salinity

  • Equations of state: the relationship of density to temperature, salinity and pressure, and perhaps also a model for the formation of sea-ice


The state variables for the ocean model are expressed as a continuum in space and time, and the fundamental laws as partial differential equations. Even at this stage, though, simplifications may be made. For example, it is common to treat seawater as incompressible. Furthermore, equations of state are often specified by empirical relationships based on laboratory experiments.
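
For instance, the conservation-of-mass (continuity) equation takes a particularly simple form once seawater is treated as incompressible; in standard notation (written out here for illustration, not reproduced from the article):

  \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0
  \quad\longrightarrow\quad
  \nabla \cdot \mathbf{u} = 0 \qquad \text{(incompressible flow)}

where \rho is the density and \mathbf{u} the velocity field.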

In the "less-physical" sciences the models tend to be constructed with respect to the key processes, using what are sometimes referred to as compartmental models. (Example to be supplied here.) This also happens with some physical models. For example, a simple model of an ocean such as the Atlantic might divide the ocean into four compartments: 'south', 'tropical', 'north' and 'deep'.

The treatment

The treatment is what makes the model applicable to a particular instance. For example, it is what makes our model of the ocean applicable to the earth during the period 1750-2100. The treatment in this case comprises boundary conditions that describe the ocean margins and topography, initial conditions that quantify the state vector (velocity, pressure, etc.) at every location at the start of 1750, and forcing functions that describe external influences on the oceans over the period 1750-2100. These forcings mainly describe events at the surface of the ocean, such as temperature, winds, and exchanges of freshwater through evaporation and precipitation. 'Historic' values of forcing can be inferred from data, while 'future' values, i.e. those from today to 2100, will be specified according to a particular scenario.
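
A rough sketch of how such a treatment might be written down, separate from the model itself, is given below; every field name and value is an invented placeholder rather than the content of any real ocean simulator.

  # Rough sketch of a treatment: boundary conditions, initial conditions and
  # forcing functions for one particular instance (an invented 1750-2100 run).
  # All keys and values are placeholders for illustration.
  treatment = {
      "boundary_conditions": {"margins": "basin_outline", "topography": "bathymetry_grid"},
      "initial_conditions":  {"time": 1750, "state": "full_state_vector_at_1750"},
      "forcing": {
          "surface_temperature": "historic_then_scenario",   # inferred from data, then a scenario
          "winds":               "historic_then_scenario",
          "freshwater_flux":     "historic_then_scenario",   # evaporation and precipitation
      },
      "period": (1750, 2100),
  }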

Note that in large-scale climate modelling we couple an ocean model with an atmosphere model, so that the forcing at the margin between the ocean and the atmosphere does not have to be prescribed, but can be inferred. At the moment this type of coupling is a bit of a black art. The forcing in these coupled models tends to be on the atmosphere: things like orbital effects, and atmospheric concentrations of greenhouse gases like CO2.

The solver

Finally, the solver turns the model and the treatment into a calculation that approximates the evolution of the state vector. At this point it is usually necessary to discretise the problem, which involves replacing the continuum with a lattice of discrete points. For an ocean simulator, the Earth's surface might be divided into rectangles, and the ocean itself into a number of layers. This division is typically fixed for a given simulator, and the number of cells is referred to as the simulator's resolution. Time is also discretised, although it is often possible to treat the step-size between adjacent time-points in quite a sophisticated manner, so that it adapts to the needs of the calculation.

Discretisation allows us to approximate the solution of the model for our given treatment, but it introduces problems that can necessitate further adjustment. There may be processes with characteristic scales that are smaller than a grid cell, or a time-step. These do not get picked up by the simulator, which behaves as though the state vector is constant over each cell and time-step. These so-called sub-grid-scale processes need to be put back in if they are thought to be a large component of the model. So the solver also includes an approximation for these processes: the impact of this approximation should go to zero as the simulator resolution becomes large.
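
The sketch below illustrates discretisation on the smallest possible example: an explicit finite-difference solution of a one-dimensional diffusion equation on a lattice of points. The grid size, diffusivity and time step are arbitrary assumptions; the point is only that the continuum is replaced by a fixed number of cells (the resolution) and that the time step is constrained by the calculation.

  # Sketch: explicit finite differences for du/dt = D * d2u/dx2 on a 1-D lattice.
  # Grid size, diffusivity and time step are arbitrary assumptions.
  N = 100                       # spatial resolution: number of lattice cells
  dx = 1.0 / N
  D = 0.01                      # diffusivity
  dt = 0.4 * dx * dx / D        # time step chosen inside the explicit stability limit

  u = [0.0] * N
  u[N // 2] = 1.0               # initial condition: a single hot cell

  for _ in range(1000):         # time stepping
      new = u[:]
      for i in range(1, N - 1):
          new[i] = u[i] + D * dt / (dx * dx) * (u[i + 1] - 2 * u[i] + u[i - 1])
      u = new

  print(max(u))                 # the initial peak spreads out over time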

The simulator as a function

The simulator is implemented as a piece of computer code that can be evaluated to produce a collection of outputs, normally written to file. In the code itself, or in files that are read by the code, we find all of the numbers that are required before the code will run. Typically these comprise (a) coefficients in the underlying model; and possibly also (b) initial conditions and (c) forcing functions. It is natural to see the simulator as a deterministic function that maps these inputs into a collection of outputs. As an aside, this notion can be extended to stochastic simulators, if we think of one of the inputs as being the seed to a random number generator.

On the basis of seeing our simulator this way, it is common to refer to the collection of inputs as x, the simulator itself as f, and the resulting output as f(x). Both x and f(x) are vector quantities, and they can be very large collections of values, often indexed by space, or by time, or by both space and time.
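
The following sketch shows only the shape of such a function; the names and the toy computation inside are assumptions, not any particular simulator's code.

  import random
  from typing import List, Sequence

  def simulator(x: Sequence[float], seed: int = 0) -> List[float]:
      """Toy f: maps an input vector x (coefficients, initial conditions,
      forcings) to an output vector f(x). Treating the seed as one more
      input makes a stochastic simulator deterministic in (x, seed)."""
      rng = random.Random(seed)
      # Placeholder computation: a fixed (x, seed) always gives the same output.
      return [sum(x) * (1.0 + 0.01 * rng.random()), max(x), min(x)]

  print(simulator([0.3, 1.2, 4.5], seed=42))   # same call, same answer, every time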

It is worth noting here that the simulator itself is a worthy object of inference. Many simulators embody, both formally and informally, the expertise of large sections of the relevant scientific community and as such are interesting objects in their own right.

Although f is known in principle, in practice this is not the case. Many simulators comprise tens of thousands of lines of high-level computer code, which is not accessible to intuition. As discussed in the next section, without actually running the code it is impossible to predict exactly what the outputs will be.

Such a view lends itself to a Bayesian analysis, in which f is treated as a random function, and the set of simulator runs as observations. The next section implicitly treats f as a random function about which inferences may be made using the Bayesian paradigm.
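
One common concrete version of this view (the article itself does not name a particular model) is a Gaussian-process emulator: the simulator runs become training data, and the fitted process gives a prediction of f(x), with uncertainty, at inputs that have never been run. The sketch below uses an assumed squared-exponential covariance and made-up "runs"; it requires NumPy.

  import numpy as np

  # Sketch of treating f as a random function: a Gaussian-process emulator
  # fitted to a handful of runs. The kernel, its settings and the "runs"
  # themselves are assumptions made for illustration.

  def rbf(a, b, length=0.3, var=1.0):
      """Squared-exponential covariance between two sets of 1-D inputs."""
      d = a[:, None] - b[None, :]
      return var * np.exp(-0.5 * (d / length) ** 2)

  x_train = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
  y_train = np.sin(2 * np.pi * x_train)        # pretend these were expensive runs

  x_new = np.linspace(0.1, 0.9, 5)             # inputs where f has not been run
  K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))   # jitter for stability
  K_s = rbf(x_new, x_train)
  mean = K_s @ np.linalg.solve(K, y_train)
  cov = rbf(x_new, x_new) - K_s @ np.linalg.solve(K, K_s.T)

  print(mean)            # predicted f(x) at the new inputs
  print(np.diag(cov))    # predictive variance, smallest near the observed runs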

Sources of uncertainty

There are four different sources of uncertainty in a computer experiment, which are discussed in turn.

Uncertainty about the simulator behaviour

We are uncertain about what would happen if we evaluated the simulator at a particular input value x; that is, until we actually make this evaluation, the outcome f(x) is uncertain. This is because the simulator is usually sufficiently complicated that we cannot know f(x) in the same way that we know x1 + x2 for given values of x1 and x2. Where the simulator is cheap to evaluate and the number of components in x is quite small this is not usually a problem, as we can either evaluate the simulator at little cost, or find an input close by at which we already have an evaluation. In this case our uncertainty about f(x) for any given x is not really an issue. But where the simulator is expensive to evaluate or there are lots of components in x, our typical state is to be uncertain about f(x), and this uncertainty can be quite large.

Uncertainty about the 'correct' simulator input

There are two reasons why we might be uncertain about the 'correct' simulator input. First, we may not have a precise measurement: this applies to components of x which have an analogue in the system. These measurable inputs would typically include initial conditions, since these are starting values of the state vector and the state vector is typically a quantity with an analogue in the system, like water temperature. Many forcing functions are also measurable. But just being measurable does not mean that we actually have the measurements, or that the measurements were made without error.

Second, some components of x may not correspond to any measurable quantity in the system. These tuning inputs are typically found in the model parameters, in places where the model has been simplified, or where processes that are not completely understood have been represented by flexible but non-physical sub-models. These tuning inputs tend to represent quite general concepts, often highly aggregated.

For example, in a hydrocarbon reservoir it is common to parameterise each fault with a "transmissibility". We know that the nature of a fault varies at the micro-scale, so a single number is obviously a gross simplification, but a reasonable one if the treatment has 100 faults. But what is the right value for "transmissibility" in this case? Exactly the same problems apply to rock permeability when averaged over regions to give an aggregated "permeability". These are two examples where simplifying the model can lead to problems with the definitions of some of the input components, resulting in uncertainty about the 'correct' value. We cannot even be sure that there is a correct value, hence the quotation marks around 'correct'.
