Concentration parameter
In probability theory and statistics, a concentration parameter is a special kind of numerical parameter of a parametric family of probability distributions. Concentration parameters occur in conjunction with distributions whose domain is a probability distribution, such as the symmetric Dirichlet distribution and the Dirichlet process.

The larger the value of the concentration parameter, the more evenly distributed the resulting distribution is (the more it tends towards the uniform distribution). The smaller the value of the concentration parameter, the more sparsely distributed the resulting distribution is, with all but a few components having a probability near zero (in other words, the more it tends towards a distribution concentrated on a single point, the degenerate distribution defined by the Dirac delta function).

In the case of multivariate Dirichlet distributions, there is some confusion over how to define the concentration parameter. In the topic modelling literature, it is often defined as the sum of the individual Dirichlet parameters, whereas when discussing symmetric Dirichlet distributions (where the parameters are the same for all dimensions) it is often defined as the value of the single Dirichlet parameter used in all dimensions. This second definition is smaller by a factor of the dimension of the distribution.
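The standard Dirichlet density makes the two conventions explicit. The symbols α_i, α, α_0 and k are notation chosen for this sketch; the density itself is the usual one.

```latex
\begin{aligned}
\operatorname{Dir}(\mathbf{p}\mid\alpha_1,\dots,\alpha_k)
  &= \frac{\Gamma(\alpha_0)}{\prod_{i=1}^{k}\Gamma(\alpha_i)}\prod_{i=1}^{k} p_i^{\alpha_i-1},
  \qquad \alpha_0 = \sum_{i=1}^{k}\alpha_i,\quad p_i \ge 0,\quad \sum_{i=1}^{k} p_i = 1,\\
\text{symmetric case } (\alpha_i = \alpha):\quad \alpha_0 &= k\,\alpha,\\
\alpha = 1:\quad \operatorname{Dir}(\mathbf{p}\mid 1,\dots,1) &= \frac{\Gamma(k)}{\Gamma(1)^{k}} = (k-1)!.
\end{aligned}
```

Here α_0 is the "sum" convention and α the single-parameter convention, so α_0 = kα. With α = 1 every exponent α_i − 1 vanishes and the density no longer depends on p, which is exactly the uniform distribution over the (k−1)-dimensional simplex (whose volume in the first k−1 coordinates is 1/(k−1)!).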

A concentration parameter of 1 (or k, the dimension of the Dirichlet distribution, under the definition used in the topic modelling literature) makes all sets of probabilities equally likely; in this case the Dirichlet distribution of dimension k is equivalent to a uniform distribution over a (k−1)-dimensional simplex. This is not the same as what happens when the concentration parameter tends towards infinity. In the former case, all resulting distributions are equally likely (the distribution over distributions is uniform). In the latter case, only near-uniform distributions are likely (the distribution over distributions is highly peaked around the uniform distribution). Conversely, as the concentration parameter tends towards zero, only distributions with nearly all their mass concentrated on one component are likely (the distribution over distributions is highly peaked around the k possible Dirac delta distributions, each centered on one of the components; in terms of the (k−1)-dimensional simplex, it is highly peaked at the corners of the simplex).
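The three regimes can be checked numerically. The sketch below assumes the standard construction of a Dirichlet draw (independent Gamma(α, 1) variables normalized to sum to one) and uses only Python's standard library; `sample_symmetric_dirichlet` and `mean_largest_component` are hypothetical helpers written for this illustration. The average size of the largest component should sit close to 1/k for a large concentration parameter and approach 1 for a small one.

```python
import math
import random


def sample_symmetric_dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Draw one probability vector from a symmetric Dirichlet(alpha, ..., alpha) of dimension k."""
    # Each Gamma(alpha, 1) variable is drawn as Gamma(alpha + 1, 1) * U**(1/alpha) with
    # U ~ Uniform(0, 1], and kept as a logarithm so that tiny alphas do not underflow to 0.
    log_g = [
        math.log(rng.gammavariate(alpha + 1.0, 1.0)) + math.log(1.0 - rng.random()) / alpha
        for _ in range(k)
    ]
    # Normalize the Gamma variables to sum to one (log-sum-exp style for stability).
    m = max(log_g)
    weights = [math.exp(x - m) for x in log_g]
    total = sum(weights)
    return [w / total for w in weights]


def mean_largest_component(alpha: float, k: int, trials: int, seed: int = 0) -> float:
    """Average size of the largest probability over `trials` independent draws."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, k, rng)) for _ in range(trials)) / trials


if __name__ == "__main__":
    k = 10
    for alpha in (0.01, 1.0, 100.0):
        print(f"alpha={alpha:>6}: mean largest component = "
              f"{mean_largest_component(alpha, k, trials=2000):.3f} (1/k = {1 / k:.3f})")
```

Drawing Gamma(α) as Gamma(α + 1)·U^(1/α) and normalizing in log space is a design choice for this sketch: for very small α a direct Gamma(α) draw can underflow to 0.0 in double precision, which would break the normalization.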

As an example of where a sparse prior (concentration parameter much less than 1) is called for, consider a topic model, which is used to learn the topics discussed in a set of documents, where each "topic" is described by a categorical distribution over a vocabulary of words. A typical vocabulary might have 100,000 words, leading to a 100,000-dimensional categorical distribution. The prior distribution for the parameters of the categorical distribution would likely be a symmetric Dirichlet distribution. However, a coherent topic might only have a few hundred words with any significant probability mass. Accordingly, a reasonable setting for the (per-component) concentration parameter might be 0.01 or 0.001. With a larger vocabulary of around 1,000,000 words, an even smaller value, e.g. 0.0001, might be appropriate.
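To see why such small values suit a large vocabulary, the sketch below draws one "topic" from a symmetric Dirichlet over 100,000 words (via normalized Gamma variables) and counts how many words are needed to cover 90% of its probability mass. The 90% cut-off, the seed and the helper names `sample_symmetric_dirichlet` and `words_covering` are arbitrary choices for this illustration, and a single draw is only indicative; the count should nonetheless come out far smaller for 0.01 and 0.001 than for 1.

```python
import math
import random


def sample_symmetric_dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Draw one probability vector from a symmetric Dirichlet(alpha, ..., alpha) of dimension k."""
    # Gamma(alpha) drawn as Gamma(alpha + 1) * U**(1/alpha), kept in log space because
    # alpha = 0.001 would otherwise underflow many components to exactly 0.
    log_g = [
        math.log(rng.gammavariate(alpha + 1.0, 1.0)) + math.log(1.0 - rng.random()) / alpha
        for _ in range(k)
    ]
    m = max(log_g)
    weights = [math.exp(x - m) for x in log_g]
    total = sum(weights)
    return [w / total for w in weights]


def words_covering(probs: list[float], mass: float = 0.9) -> int:
    """Smallest number of words whose combined probability reaches `mass`."""
    covered = 0.0
    for n, p in enumerate(sorted(probs, reverse=True), start=1):
        covered += p
        if covered >= mass:
            return n
    return len(probs)  # only reached if rounding keeps the total just below `mass`


if __name__ == "__main__":
    vocab_size = 100_000
    rng = random.Random(42)
    for alpha in (1.0, 0.01, 0.001):
        topic = sample_symmetric_dirichlet(alpha, vocab_size, rng)
        print(f"alpha={alpha}: {words_covering(topic)} words hold 90% of the topic's mass")
```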

See also

  • Dirichlet distribution
  • Dirichlet process
  • Pitman–Yor process
  • Location parameter
  • Scale parameter

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 