Survival analysis
Survival analysis is a branch of statistics which deals with death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics or sociology. More generally, survival analysis involves the modeling of time to event data; in this context, death or failure is considered an "event" in the survival analysis literature. Traditionally only a single event occurs, after which the organism or mechanism is dead or broken.

More recently, many concepts in survival analysis have been explained by counting process theory, which adds flexibility in that it allows modeling multiple (or recurrent) events. This type of modeling suits situations in which the event is significant but does not end the lifespan of the subject: for example, people can go to jail multiple times, alcoholics can start and stop drinking multiple times, and people can get married and divorced multiple times.

Survival analysis attempts to answer questions such as: what is the fraction of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?

To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or not otherwise localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.

The theory of survival presented here also assumes that death or failure happens just once for each subject. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability, and in many areas of social sciences and medical research.

This article is phrased primarily in terms of biological survival, but this is just for convenience. An equivalent formulation in terms of mechanical failure can be made by replacing every occurrence of death with failure.

Survival function

The object of primary interest is the survival function, conventionally denoted S, which is defined as

$$S(t) = \Pr(T > t)$$

where t is some time, T is a random variable denoting the time of death, and "Pr" stands for probability. That is, the survival function is the probability that the time of death is later than some specified time t.
The survival function is also called the survivor function or survivorship function in problems of biological survival, and the reliability function in mechanical survival problems. In the latter case, the reliability function is denoted R(t).

Usually one assumes S(0) = 1, although it could be less than 1 if there is the possibility of immediate death or failure.

The survival function must be non-increasing: S(u) ≤ S(t) if u ≥ t. This property follows directly from F(t) = 1 − S(t) being the integral of a non-negative function. This reflects the notion that survival to a later age is only possible if all younger ages are attained. Given this property, the lifetime distribution function and event density (F and f below) are well-defined.

The survival function is usually assumed to approach zero as age increases without bound, i.e., S(t) → 0 as t → ∞, although the limit could be greater than zero if eternal life is possible. For instance, we could apply survival analysis to a mixture of stable and unstable carbon isotopes; unstable isotopes would decay sooner or later, but the stable isotopes would last indefinitely.
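
Both limiting behaviours of S(t) can be checked numerically. The sketch below is a minimal Python illustration, assuming a Weibull survival function S(t) = exp(−(t/scale)^shape) for an ordinary population and a hypothetical stable/unstable mixture S(t) = p + (1 − p)·exp(−r·t) for the isotope example; the function names and parameter values are illustrative, not taken from the article.

```python
import math

def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Weibull survival function S(t) = exp(-(t/scale)**shape), t >= 0."""
    return math.exp(-((t / scale) ** shape))

def mixture_survival(t: float, stable_fraction: float, decay_rate: float) -> float:
    """Hypothetical stable/unstable mixture: a fraction never decays, the rest
    decays exponentially, so S(t) -> stable_fraction rather than 0."""
    return stable_fraction + (1.0 - stable_fraction) * math.exp(-decay_rate * t)

ts = [0.0, 0.5, 1.0, 2.0, 5.0, 50.0]
w = [weibull_survival(t, shape=1.5, scale=2.0) for t in ts]
m = [mixture_survival(t, stable_fraction=0.3, decay_rate=1.0) for t in ts]

# Both start at S(0) = 1 and never increase.
for values in (w, m):
    assert math.isclose(values[0], 1.0)
    assert all(a >= b for a, b in zip(values, values[1:]))

print(w[-1])  # essentially 0: S(t) -> 0
print(m[-1])  # 0.3: the limit is the stable fraction
```

Both functions start at 1 and never increase; only the mixture levels off above zero, which is the situation the isotope example describes.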

Lifetime distribution function and event density

Related quantities are defined in terms of the survival function.

The lifetime distribution function, conventionally denoted F, is defined as the complement of the survival function,

$$F(t) = \Pr(T \le t) = 1 - S(t),$$

and the derivative of F, which is the density function of the lifetime distribution, is conventionally denoted f,

$$f(t) = F'(t) = \frac{d}{dt} F(t).$$

The function f is sometimes called the event density; it is the rate of death or failure events per unit time.

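The survival function can also be expressed in terms of the distribution and density functions:

$$S(t) = \Pr(T > t) = \int_t^{\infty} f(u)\,du = 1 - F(t).$$

Similarly, a survival event density function can be defined as

$$s(t) = S'(t) = \frac{d}{dt} S(t) = \frac{d}{dt} \int_t^{\infty} f(u)\,du = \frac{d}{dt}\left[1 - F(t)\right] = -f(t).$$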
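
A short numerical check of these relationships, as a sketch under an assumed lifetime model S(t) = exp(−t²) (a Weibull distribution with shape 2 and scale 1, chosen only for illustration). The helpers `derivative` and `integrate` are hypothetical, using a central difference and the trapezoid rule, and the infinite upper limit of the integral is truncated where S is negligible.

```python
import math

# Assumed lifetime model for illustration: S(t) = exp(-t**2) (Weibull, shape 2, scale 1).
def S(t):
    return math.exp(-t * t)

def F(t):
    return 1.0 - S(t)                  # lifetime distribution function

def f(t):
    return 2.0 * t * math.exp(-t * t)  # event density, the derivative of F

def derivative(g, t, h=1e-5):
    """Central finite-difference approximation of g'(t)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

def integrate(g, a, b, n=20_000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    return step * (0.5 * (g(a) + g(b)) + sum(g(a + i * step) for i in range(1, n)))

t = 0.8
print(derivative(F, t), f(t))           # f = F'
print(integrate(f, t, t + 10.0), S(t))  # S(t) = integral of f from t to infinity (tail cut at t + 10)
print(derivative(S, t), -f(t))          # s(t) = S'(t) = -f(t)
```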

Hazard function and cumulative hazard function

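The hazard function, conventionally denoted λ, is defined as the event rate at time t conditional on survival until time t or later (that is, T ≥ t),

$$\lambda(t) = \lim_{dt \to 0} \frac{\Pr(t \le T < t + dt \mid T \ge t)}{dt} = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)}.$$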

Force of mortality is a synonym of hazard function which is used particularly in demography and actuarial science, where it is denoted by μ. The term hazard rate is another synonym.
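
The limit in the definition of λ(t) can be approximated directly. This minimal sketch assumes a continuous lifetime with an illustrative Weibull survival function (shape 1.5, scale 2), so that Pr(t ≤ T < t + dt | T ≥ t) = (S(t) − S(t + dt))/S(t); as dt shrinks, the ratio should approach f(t)/S(t).

```python
import math

# Illustrative Weibull model (an assumption): S(t) = exp(-(t/SCALE)**SHAPE).
SHAPE, SCALE = 1.5, 2.0

def S(t):
    return math.exp(-((t / SCALE) ** SHAPE))

def f(t):
    # Event density f = -S'.
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * S(t)

def hazard(t):
    """lambda(t) = f(t) / S(t)."""
    return f(t) / S(t)

def hazard_from_limit(t, dt):
    """Pr(t <= T < t + dt | T >= t) / dt for a continuous T; tends to lambda(t) as dt -> 0."""
    return (S(t) - S(t + dt)) / (S(t) * dt)

t = 1.0
for dt in (0.1, 0.01, 0.001):
    print(dt, hazard_from_limit(t, dt))
print("f/S =", hazard(t))  # 0.75 * sqrt(0.5), about 0.5303
```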

The hazard function must be non-negative, λ(t) ≥ 0, and its integral over [0, ∞) must be infinite (when S(t) → 0 as t → ∞), but it is not otherwise constrained; it may be increasing or decreasing, non-monotonic, or discontinuous.
An example is the bathtub curve hazard function, which is large for small values of t, decreases to some minimum, and thereafter increases again; this can model the tendency of some mechanical systems to fail either soon after they are put into operation or much later, as the system ages.
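
One common way to build a bathtub-shaped hazard, offered here as an illustrative assumption rather than the article's model, is to add a decreasing Weibull hazard (shape below 1, early failures), a constant term (random failures) and an increasing Weibull hazard (shape above 1, wear-out). All parameter values below are hypothetical.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (shape/scale) * (t/scale)**(shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    """Hypothetical bathtub hazard: early failures + constant rate + wear-out."""
    return (weibull_hazard(t, shape=0.5, scale=1.0)     # decreasing part
            + 0.05                                      # flat part
            + weibull_hazard(t, shape=4.0, scale=10.0))  # increasing part

ts = [0.1 * k for k in range(1, 151)]   # t from 0.1 to 15.0
hs = [bathtub_hazard(t) for t in ts]
t_min = ts[hs.index(min(hs))]
print(f"hazard is smallest near t = {t_min:.1f}")
```

With these parameters the hazard falls from large values near t = 0, bottoms out a little below t = 5, and then rises again.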

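The hazard function can alternatively be represented in terms of the cumulative hazard function, conventionally denoted Λ:

$$\Lambda(t) = -\log S(t)$$

so transposing signs and exponentiating

$$S(t) = \exp(-\Lambda(t))$$

or differentiating (with the chain rule)

$$\frac{d}{dt}\Lambda(t) = -\frac{S'(t)}{S(t)} = \lambda(t).$$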

The name "cumulative hazard function" is derived from the fact that, when S(0) = 1,

$$\Lambda(t) = \int_0^t \lambda(u)\,du$$

which is the "accumulation" of the hazard over time.
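
These identities can be verified numerically. The sketch assumes a Gompertz hazard λ(t) = A·exp(B·t), a classical mortality model whose cumulative hazard has the closed form Λ(t) = (A/B)(exp(B·t) − 1); the constants A and B and the trapezoid helper are illustrative choices.

```python
import math

# Gompertz hazard lambda(t) = A * exp(B * t), used here only as an illustration;
# A and B are arbitrary.
A, B = 0.01, 0.1

def hazard(t):
    return A * math.exp(B * t)

def cumulative_hazard(t):
    # Closed form of the integral of the hazard from 0 to t.
    return (A / B) * (math.exp(B * t) - 1.0)

def survival(t):
    # S(t) = exp(-Lambda(t)); note S(0) = 1.
    return math.exp(-cumulative_hazard(t))

def integrate(g, a, b, n=10_000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    return step * (0.5 * (g(a) + g(b)) + sum(g(a + i * step) for i in range(1, n)))

t, h = 30.0, 1e-5
print(integrate(hazard, 0.0, t), cumulative_hazard(t))   # accumulated hazard
print(-math.log(survival(t)), cumulative_hazard(t))      # Lambda = -log S
print((cumulative_hazard(t + h) - cumulative_hazard(t - h)) / (2 * h), hazard(t))  # Lambda' = lambda
```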

From the definition of Λ(t), we see that it increases without bound as t tends to infinity (assuming that S(t) tends to zero). This implies that λ(t) must not decrease too quickly, since, by definition, the cumulative hazard has to diverge. For example, exp(−t) is not the hazard function of any survival distribution, because its integral converges to 1.
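
A two-line check of that example, assuming the hazard were λ(t) = exp(−t): its cumulative hazard is Λ(t) = 1 − exp(−t), which approaches 1, so S(t) = exp(−Λ(t)) would level off at exp(−1) ≈ 0.368 instead of tending to zero.

```python
import math

def cumulative_hazard(t):
    """Integral of the candidate hazard exp(-u) from 0 to t."""
    return 1.0 - math.exp(-t)

for t in (1.0, 10.0, 100.0):
    print(t, cumulative_hazard(t), math.exp(-cumulative_hazard(t)))
# Lambda(t) -> 1, so S(t) = exp(-Lambda(t)) -> exp(-1), about 0.368, not 0.
```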

Quantities derived from the survival distribution

Future lifetime at a given time t₀ is the time remaining until death, given survival to age t₀. Thus, it is T − t₀ in the present notation. The expected future lifetime is the expected value of future lifetime. The probability of death at or before age t₀ + t, given survival until age t₀, is just

$$\Pr(T \le t_0 + t \mid T > t_0) = \frac{\Pr(t_0 < T \le t_0 + t)}{\Pr(T > t_0)} = \frac{F(t_0 + t) - F(t_0)}{S(t_0)}.$$

Therefore the probability density of future lifetime is

$$\frac{d}{dt}\,\frac{F(t_0 + t) - F(t_0)}{S(t_0)} = \frac{f(t_0 + t)}{S(t_0)}$$

and the expected future lifetime is

$$\frac{1}{S(t_0)} \int_0^{\infty} t\, f(t_0 + t)\,dt = \frac{1}{S(t_0)} \int_{t_0}^{\infty} S(t)\,dt,$$

where the second expression is obtained using integration by parts.

For t₀ = 0, that is, at birth, this reduces to the expected lifetime.
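
The integral formula for expected future lifetime lends itself to direct numerical evaluation. In this sketch (illustrative models and a truncated upper limit, both assumptions), exponential lifetimes give the same expected future lifetime 1/rate at every age, reflecting the memoryless property, and at t₀ = 0 a Weibull model recovers its known mean lifetime, scale × Γ(1 + 1/shape).

```python
import math

def integrate(g, a, b, n=20_000):
    """Composite trapezoid rule on [a, b]."""
    step = (b - a) / n
    return step * (0.5 * (g(a) + g(b)) + sum(g(a + i * step) for i in range(1, n)))

def expected_future_lifetime(S, t0, horizon=60.0):
    """(1 / S(t0)) * integral of S(t) from t0 to infinity.

    The infinite upper limit is truncated at t0 + horizon, which assumes S is
    negligible beyond that point.
    """
    return integrate(S, t0, t0 + horizon) / S(t0)

RATE = 0.5

def exp_S(t):
    return math.exp(-RATE * t)          # exponential lifetimes

def weib_S(t):
    return math.exp(-((t / 3.0) ** 2))  # Weibull, shape 2, scale 3

for t0 in (0.0, 2.0, 5.0):
    print(t0, expected_future_lifetime(exp_S, t0))  # memoryless: about 1 / RATE = 2 every time
print(expected_future_lifetime(weib_S, 0.0), 3.0 * math.gamma(1.5))  # t0 = 0: mean lifetime
```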

In reliability problems, the expected lifetime is called the mean time to failure, and the expected future lifetime is called the mean residual lifetime.

As the probability of an individual surviving until age t or later is S(t), by definition, the expected number of survivors at age t out of an initial population of n newborns is n × S(t), assuming the same survival function for all individuals. Thus the expected proportion of survivors is S(t).
If the survival of different individuals is independent, the number of survivors at age t has a binomial distribution with parameters n and S(t), and the variance of the proportion of survivors is S(t) × (1-S(t))/n.
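
The binomial claim can be confirmed by summing over the binomial probability mass function exactly, without simulation. The cohort size and survival probability below are hypothetical.

```python
import math

def proportion_moments(n, s):
    """Exact mean and variance of the surviving proportion K/n when K ~ Binomial(n, s)."""
    pmf = [math.comb(n, k) * s**k * (1.0 - s) ** (n - k) for k in range(n + 1)]
    mean = sum(p * k / n for k, p in enumerate(pmf))
    var = sum(p * (k / n - mean) ** 2 for k, p in enumerate(pmf))
    return mean, var

n, s = 50, 0.7          # hypothetical cohort size and survival probability S(t)
mean, var = proportion_moments(n, s)
print(n * mean)                 # expected number of survivors n * S(t), about 35
print(var, s * (1.0 - s) / n)   # both about 0.0042
```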

The age at which a specified proportion of survivors remain can be found by solving the equation S(t) = q for t, where q is the quantile in question. Typically one is interested in the median lifetime, for which q = 1/2, or other quantiles such as q = 0.90 or q = 0.99.
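
A small sketch of solving S(t) = q, assuming S is continuous and non-increasing: bisection brackets the answer and halves the bracket repeatedly. The exponential model and its rate are illustrative; for that model the median has the closed form ln 2 / rate, which serves as a check.

```python
import math

def survival_quantile(S, q, lo=0.0, hi=1.0, iters=100):
    """Solve S(t) = q for t by bisection.

    Assumes S is continuous and non-increasing with S(lo) >= q, and that S
    eventually falls to q (otherwise no solution exists).
    """
    while S(hi) > q:
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("S(t) never reaches q; the quantile does not exist")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if S(mid) > q:
            lo = mid   # more than a fraction q still alive: the answer lies later
        else:
            hi = mid
    return 0.5 * (lo + hi)

rate = 0.25

def S(t):
    return math.exp(-rate * t)   # exponential lifetimes (illustrative)

median = survival_quantile(S, 0.5)
print(median, math.log(2) / rate)   # median lifetime = ln 2 / rate
print(survival_quantile(S, 0.99))   # age at which 99% are still alive
```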

One can also make more complex inferences from the survival distribution. In mechanical reliability problems, one can bring cost (or, more generally, utility) into consideration, and thus solve problems concerning repair or replacement. This leads to the study of renewal theory and reliability theory of aging and longevity.

Censoring

Censoring is a form of missing data problem which is common in survival analysis. Ideally, both the birth and death dates of a subject are known, in which case the lifetime is known.

If it is known only that the date of death is after some date, this is called right censoring. Right censoring will occur for those subjects whose birth date is known but who are still alive when they are lost to follow-up or when the study ends.

If a subject's lifetime is known to be less than a certain duration, the lifetime is said to be left-censored.

It may also happen that subjects with a lifetime less than some threshold may not be observed at all: this is called truncation. Note that truncation is different from left censoring, since for a left censored datum, we know the subject exists, but for a truncated datum, we may be completely unaware of the subject. Truncation is also common. In a so-called delayed entry study, subjects are not observed at all until they have reached a certain age. For example, people may not be observed until they have reached the age to enter school. Any deceased subjects in the pre-school age group would be unknown. Left-truncated data is common in actuarial work for life insurance and pensions (Richards, 2010).

Right-censored data are the kind most commonly encountered. Left-censored data can occur when a person's survival time is incomplete on the left side of his or her follow-up period. For example, a patient with an infectious disease may be followed from the time he or she tests positive for the infection, while the exact time of exposure to the infectious agent may never be known.

Fitting parameters to data

Survival models can be usefully viewed as ordinary regression models in which the response variable is time. However, computing the likelihood function (needed for fitting parameters or making other kinds of inferences) is complicated by the censoring. The likelihood function for a survival model, in the presence of censored data, is formulated as follows. By definition the likelihood function is the conditional probability of the data given the parameters of the model.
It is customary to assume that the data are independent given the parameters. Then the likelihood function is the product of the likelihood of each datum. It is convenient to partition the data into four categories: uncensored, left censored, right censored, and interval censored. These are denoted "unc.", "l.c.", "r.c.", and "i.c." in the equation below, where θ denotes the parameters of the model.

$$L(\theta) = \prod_{i \in \text{unc.}} \Pr(T = T_i \mid \theta) \prod_{i \in \text{l.c.}} \Pr(T < T_i \mid \theta) \prod_{i \in \text{r.c.}} \Pr(T > T_i \mid \theta) \prod_{i \in \text{i.c.}} \Pr(T_{i,l} < T < T_{i,r} \mid \theta)$$

For an uncensored datum, with Tᵢ equal to the age at death, we have

$$\Pr(T = T_i \mid \theta) = f(T_i \mid \theta).$$

For a left censored datum, such that the age at death is known to be less than Tᵢ, we have

$$\Pr(T < T_i \mid \theta) = F(T_i \mid \theta) = 1 - S(T_i \mid \theta).$$

For a right censored datum, such that the age at death is known to be greater than Tᵢ, we have

$$\Pr(T > T_i \mid \theta) = 1 - F(T_i \mid \theta) = S(T_i \mid \theta).$$

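For an interval censored datum, such that the age at death is known to be less than T_{i,r} and greater than T_{i,l}, we have

$$\Pr(T_{i,l} < T < T_{i,r} \mid \theta) = S(T_{i,l} \mid \theta) - S(T_{i,r} \mid \theta).$$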

An important application where interval censored data arise is current status data, where it is known only that the event had not occurred by one observation time and had occurred by the next.
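
The four likelihood contributions can be written out for a concrete model. The sketch below assumes exponential lifetimes, S(t) = exp(−rate·t), and a hypothetical (kind, lower, upper) encoding of each observation; it maximizes the log-likelihood with a golden-section search, which is adequate here because each contribution is concave in the rate. With only uncensored and right-censored data, the maximum should match the closed form for this model: number of deaths divided by total observed time.

```python
import math

# Hypothetical encoding: each observation is (kind, lower, upper).
#   "unc": death observed at time `lower`          -> contributes f(T_i)
#   "rc" : death known to occur after `lower`      -> contributes S(T_i)
#   "lc" : death known to occur before `upper`     -> contributes F(T_i) = 1 - S(T_i)
#   "ic" : death known to occur in (lower, upper)  -> contributes S(lower) - S(upper)

def log_likelihood(rate, data):
    """Censored-data log-likelihood for an exponential model S(t) = exp(-rate * t)."""
    def S(t):
        return math.exp(-rate * t)

    total = 0.0
    for kind, lower, upper in data:
        if kind == "unc":
            total += math.log(rate) + math.log(S(lower))   # log f = log(rate * S)
        elif kind == "rc":
            total += math.log(S(lower))
        elif kind == "lc":
            total += math.log(1.0 - S(upper))
        elif kind == "ic":
            total += math.log(S(lower) - S(upper))
        else:
            raise ValueError(f"unknown observation kind: {kind!r}")
    return total

def maximize(g, lo, hi, iters=200):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if g(c) < g(d):
            a = c
        else:
            b = d
    return 0.5 * (a + b)

obs = [("unc", 2.0, None), ("unc", 3.5, None), ("unc", 1.2, None),
       ("rc", 5.0, None), ("rc", 4.0, None)]
rate_hat = maximize(lambda r: log_likelihood(r, obs), 1e-6, 10.0)
print(rate_hat, 3 / 15.7)   # deaths / total observed time

# Adding a left-censored and an interval-censored observation changes the fit.
mixed = obs + [("lc", None, 0.5), ("ic", 1.0, 3.0)]
print(maximize(lambda r: log_likelihood(r, mixed), 1e-6, 10.0))
```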

Non-parametric estimation

The Nelson–Aalen estimator can be used to provide a non-parametric estimate of the cumulative hazard rate function.
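
A minimal Python sketch of the estimator for right-censored data, assuming the usual form Λ̂(t) = Σ dⱼ/nⱼ over death times tⱼ ≤ t, where dⱼ is the number of deaths at tⱼ and nⱼ the number of subjects still at risk just before tⱼ; subjects censored at tⱼ are counted as at risk at that time. The function name and the data are illustrative.

```python
from collections import Counter

def nelson_aalen(times, events):
    """Nelson–Aalen cumulative hazard estimate from right-censored data.

    times[i]  : observed time for subject i (death or censoring)
    events[i] : 1 if subject i died at times[i], 0 if right-censored
    Returns [(t_j, H(t_j)), ...] at each distinct death time t_j.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)        # deaths and censorings both leave the risk set
    at_risk = len(times)
    cumulative, curve = 0.0, []
    for t in sorted(exits):
        if deaths[t]:
            cumulative += deaths[t] / at_risk   # d_j / n_j
            curve.append((t, cumulative))
        at_risk -= exits[t]                     # remove everyone whose time is t
    return curve

# Hypothetical data: six subjects, two of them right-censored (event = 0).
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
for t, H in nelson_aalen(times, events):
    print(t, round(H, 4))
# 2 0.1667
# 3 0.3667
# 5 0.7
# 8 1.7
```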

Distributions used in survival analysis

  • Exponential distribution
  • Weibull distribution
  • Exponential-logarithmic distribution
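
For reference, a sketch of the survival functions of the listed distributions in common parameterizations (the parameter names are assumptions), together with a numerical hazard λ(t) = −S′(t)/S(t). The exponential-logarithmic form used here, S(t) = ln(1 − (1 − p)·exp(−β·t)) / ln p with 0 < p < 1 and β > 0, should be checked against a reference before use. The printed hazards illustrate the usual contrast: constant for the exponential, increasing for a Weibull with shape above 1, and decreasing for the exponential-logarithmic distribution.

```python
import math

def exponential_S(t, rate):
    return math.exp(-rate * t)

def weibull_S(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def exp_log_S(t, p, beta):
    """Exponential-logarithmic survival function, 0 < p < 1, beta > 0:
    S(t) = log(1 - (1 - p) * exp(-beta * t)) / log(p)."""
    return math.log(1.0 - (1.0 - p) * math.exp(-beta * t)) / math.log(p)

def hazard(S, t, h=1e-6):
    """lambda(t) = -S'(t) / S(t), using a central difference for S' (needs t > h)."""
    return -(S(t + h) - S(t - h)) / (2.0 * h) / S(t)

ts = [0.5, 1.0, 2.0, 5.0]
models = {
    "exponential (rate 0.3)": lambda t: exponential_S(t, 0.3),
    "Weibull (shape 2, scale 3)": lambda t: weibull_S(t, 2.0, 3.0),
    "exp-log (p 0.2, beta 1)": lambda t: exp_log_S(t, 0.2, 1.0),
}
for name, S in models.items():
    print(name, [round(hazard(S, t), 4) for t in ts])
# Expected pattern: constant, increasing, and decreasing hazard respectively.
```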

See also

  • Kaplan-Meier estimator
  • Survival rate
  • Reliability theory
  • Proportional hazards models
  • Accelerated failure time model
  • Failure rate
  • Logrank test
  • Survival function
  • MTBF
  • Censoring (statistics)
  • Maximum likelihood

External links

  • SOCR Survival analysis applet and interactive learning activity.
  • Survival/Failure Time Analysis @ Statistics' Textbook Page
The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 