Haplotype
A haplotype in genetics is a combination of alleles (DNA sequences) at adjacent locations (loci) on the chromosome that are transmitted together. A haplotype may be one locus, several loci, or an entire chromosome depending on the number of recombination events that have occurred between a given set of loci.
In a second meaning, haplotype is a set of single-nucleotide polymorphisms (SNPs) on a single chromosome of a chromosome pair that are statistically associated. It is thought that these associations, and the identification of a few alleles of a haplotype block, can unambiguously identify all other polymorphic sites in its region. Such information is very valuable for investigating the genetics behind common diseases, and has been investigated in the human species by the International HapMap Project.
Many genetic testing companies use the term 'haplotype' to refer to an individual collection of short tandem repeat (STR) allele mutations within a genetic segment, while using the term 'haplogroup' to refer to the SNP/unique-event polymorphism (UEP) mutations which represent the clade to which a collection of potential haplotypes belongs.
Haplotype resolution
An organism's genotype may not uniquely define its haplotype. For example, consider a diploid organism and two bi-allelic loci on the same chromosome such as single-nucleotide polymorphisms (SNPs). The first locus has alleles A and T with three possible genotypes AA, AT, and TT, the second locus having G and C, again giving three possible genotypes GG, GC, and CC. For a given individual, there are therefore nine possible configurations for the genotypes at these two loci, as shown in the Punnett square below, which shows the possible genotypes that an individual may carry and the corresponding haplotypes that these resolve to. For individuals that are homozygous at one or both loci, it is clear what the haplotypes are; it is only when an individual is heterozygous at both loci that the gametic phase is ambiguous.
| | AA | AT | TT |
|---|---|---|---|
| GG | AG AG | AG TG | TG TG |
| GC | AG AC | AG TC or AC TG | TG TC |
| CC | AC AC | AC TC | TC TC |
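The ambiguity in the table can be reproduced by brute-force enumeration. The following Python sketch is illustrative only: the function name `haplotype_resolutions` and the two-character genotype strings are assumptions made for this example, not part of any standard tool. For an unphased genotype it lists every distinct pair of haplotypes consistent with it.

```python
from itertools import product


def haplotype_resolutions(genotype):
    """Return all distinct unordered haplotype pairs consistent with a genotype.

    `genotype` is a sequence of two-character strings, one per locus,
    e.g. ["AT", "GC"] (hypothetical encoding chosen for this sketch).
    """
    resolutions = set()
    # At each locus, either of the two alleles may sit on the first chromosome copy.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = "".join(g[c] for g, c in zip(genotype, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        # Sort so that (hap1, hap2) and (hap2, hap1) count as the same pair.
        resolutions.add(tuple(sorted((hap1, hap2))))
    return sorted(resolutions)


# Walk through the nine genotype configurations of the table.
for locus1 in ("AA", "AT", "TT"):
    for locus2 in ("GG", "GC", "CC"):
        print(locus1, locus2, haplotype_resolutions([locus1, locus2]))
```

Only the doubly heterozygous genotype (AT at the first locus, GC at the second) yields more than one pair, which is the phase ambiguity described above.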
The only unequivocal method of resolving phase ambiguity is by sequencing. However, it is possible to estimate the probability of a particular haplotype when phase is ambiguous using a sample of individuals.
Given the genotypes for a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods work by applying the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. The specifics of these methods vary: some are based on combinatorial approaches (e.g., parsimony), whereas others use likelihood functions based on different models and assumptions such as the Hardy-Weinberg principle, the coalescent theory model, or perfect phylogeny. These models are combined with algorithms such as the expectation-maximization (EM) algorithm, Markov chain Monte Carlo (MCMC), or hidden Markov models (HMM).
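As a rough illustration of the likelihood-based approach, the sketch below applies an EM iteration to estimate haplotype frequencies from unphased genotypes, assuming Hardy-Weinberg proportions. It is a minimal toy, not the method of any particular package; the function names, the genotype encoding, and the sample data are all invented for this example.

```python
from collections import Counter
from itertools import product


def resolutions(genotype):
    """All distinct unordered haplotype pairs compatible with an unphased genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg proportions."""
    candidates = [resolutions(g) for g in genotypes]
    haps = sorted({h for pairs in candidates for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # start from uniform frequencies
    for _ in range(n_iter):
        counts = Counter()
        for pairs in candidates:
            # E-step: weight each compatible pair by its Hardy-Weinberg probability
            # (a pair of two different haplotypes can arise in two orders).
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: each individual carries two haplotypes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
    return freq


# Invented sample: eight unambiguous individuals and two double heterozygotes.
sample = [("AA", "GG")] * 4 + [("TT", "CC")] * 4 + [("AT", "GC")] * 2
for hap, f in em_haplotype_frequencies(sample).items():
    print(hap, round(f, 4))
```

In this sample the unambiguous individuals carry only AG and TC, so the iteration assigns the double heterozygotes almost entirely to the AG/TC resolution, matching the preference for resolutions that reuse common haplotypes.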
Microfluidic whole genome haplotyping is a technique for the physical separation of individual chromosomes from a metaphase cell followed by direct resolution of the haplotype for each allele.
Y-DNA haplotypes from genealogical DNA tests
Unlike other chromosomes, Y chromosomes do not come in pairs. Every human male has only one copy of that chromosome. This means that there is no lottery as to which copy to inherit, and also (for most of the chromosome) no shuffling between copies by recombination; so, unlike autosomal haplotypes, there is effectively no randomisation of the Y-chromosome haplotype between generations, and a human male should largely share the same Y chromosome as his father, give or take a few mutations.
In particular, the numbered results of a Y-DNA genealogical DNA test should match between father and son, barring mutations. Within genealogical and popular discussion, this is sometimes referred to as the "DNA signature" of a particular male human, or of his paternal bloodline.
UEP results (SNP results)
Unique-event polymorphisms (UEPs) such as SNPs represent haplogroups. STRs represent haplotypes. The results that make up the full Y-DNA haplotype from the Y chromosome DNA test can be divided into two parts: the results for UEPs, sometimes loosely called the SNP results as most UEPs are single-nucleotide polymorphisms, and the results for microsatellite short tandem repeat sequences (Y-STRs).
The UEP results reflect the inheritance of events that are believed to have happened only once in all human history. These can be used to directly identify the individual's Y-DNA haplogroup, his place on the broad family tree of the whole of humanity. Different Y-DNA haplogroups identify genetic populations which are often strongly associated with particular geographic regions, reflecting the migrations of current individuals' direct patrilineal ancestors tens of thousands of years ago.
Y-STR haplotypes
The other possible part of the genetic results is the Y-STR haplotype, the set of results from the Y-STR markers tested.
Unlike the UEPs, the Y-STRs mutate much more easily, which gives them much more resolution to distinguish recent genealogy. But it also means that, rather than the population of descendants of a genetic event all sharing the same result, the Y-STR haplotypes are likely to have spread apart, to form a cluster of more or less similar results. Typically, this cluster will have a definite most probable center, the modal haplotype (presumably close to the haplotype of the original founding event), and also a haplotype diversity, the degree to which it has become spread out. The further in the past the defining event occurred, and the earlier the subsequent population growth occurred, the greater the haplotype diversity for a particular number of descendants will be. On the other hand, if the haplotype diversity is smaller for a particular number of descendants, this may indicate a more recent common ancestor, or that a population expansion has occurred more recently.
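The two summary quantities described above can be sketched in a few lines of Python. This is only an illustration: the repeat counts are invented, the function names are hypothetical, and the spread measure used here (mean total repeat-count difference from the modal haplotype) is one simple choice assumed for the example, not a definition taken from the text.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most common repeat count at each marker, taken marker by marker."""
    markers = haplotypes[0].keys()
    return {m: Counter(h[m] for h in haplotypes).most_common(1)[0][0] for m in markers}


def mean_step_distance(haplotypes, reference):
    """Average total repeat-count difference from `reference` (a crude spread measure)."""
    total = sum(sum(abs(h[m] - reference[m]) for m in reference) for h in haplotypes)
    return total / len(haplotypes)


# Invented Y-STR results for five men at three markers (values are illustrative only).
cluster = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    {"DYS393": 12, "DYS390": 24, "DYS19": 14},
    {"DYS393": 13, "DYS390": 24, "DYS19": 15},
]

modal = modal_haplotype(cluster)
print(modal)
print(mean_step_distance(cluster, modal))
```

A tighter cluster gives a smaller spread value, which under the reasoning above would be consistent with a more recent common ancestor or a more recent expansion.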
It is important to note that, unlike for UEPs, there is no guarantee that two individuals with a similar Y-STR haplotype will necessarily share a similar ancestry. There is no uniqueness about Y-STR events. Instead, the clusters of Y-STR haplotype results inheriting from different events and different histories all tend to overlap.
Thus, although a Y-STR haplotype may sometimes be directly indicative of a particular Y-DNA haplogroup, in most cases a long time has passed since the haplogroups' defining events. The cluster of Y-STR haplotype results associated with descendants of such an event has therefore typically become rather broad, and will tend to overlap significantly with the (similarly broad) clusters of Y-STR haplotypes associated with other haplogroups. This makes it impossible to predict with absolute certainty to which Y-DNA haplogroup a Y-STR haplotype would point. If the UEPs are not actually tested, all that can be done from the Y-STRs is to predict probabilities for haplogroup ancestry (as online haplogroup prediction programs do), not certainties.
A similar scenario exists for surnames. A cluster of similar Y-STR haplotypes may indicate a shared common ancestor, with an identifiable modal haplotype, but only if the cluster is sufficiently distinct from what may have arisen by chance from different individuals historically having adopted the same name independently. This may require the typing of quite an extensive haplotype to establish, which has led DNA testing companies to offer ever-larger sets of markers: 12, then 24, then 37, then 67, and now 111.
Plausibly establishing relatedness between different surnames data-mined from a database is significantly harder. It must now be established not that a randomly selected member of the population is unlikely to have such a close match by accident, but rather that the very nearest member of the population in question, chosen purposely for that very reason, would even under those circumstances be unlikely to match by accident. For the foreseeable future this is likely to be impossible, except in special cases where further information drastically limits the size of the population of candidates under consideration.
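The difference between testing one randomly chosen individual and picking the nearest match out of a whole database can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that each candidate matches by chance independently with the same probability; the function name and the numbers are hypothetical.

```python
def prob_any_chance_match(p_single, n_candidates):
    """Probability that at least one of n independent candidates matches by chance."""
    return 1.0 - (1.0 - p_single) ** n_candidates


# Illustrative numbers only: a 1-in-1000 chance match for a single random individual.
for n in (1, 100, 10_000):
    print(n, prob_any_chance_match(0.001, n))
```

Under these assumptions a match that would be surprising for one random individual becomes close to certain once the best match is selected from thousands of candidates, which is why limiting the candidate population matters.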
See also
- International HapMap Project
- Genealogical DNA test
- Haplogroup
- Y-STR
Software
- FAMHAP: software for single-marker analysis and, in particular, joint analysis of unphased genotype data from tightly linked markers (haplotype analysis).
- Fugue: EM-based haplotype estimation and association tests in unrelated individuals and nuclear families.
- HPlus: a software package for imputation and testing of haplotypes in association studies using a modified method that incorporates the expectation-maximization algorithm and a Bayesian method called progressive ligation.
- HaploBlockFinder: a software package for analyses of haplotype block structure.
- Haploview: visualisation of linkage disequilibrium, haplotype estimation and haplotype tagging.
- HelixTree: haplotype analysis software offering Haplotype Trend Regression (HTR), haplotypic association tests, and haplotype frequency estimation using both the expectation-maximization (EM) algorithm and the composite haplotype method (CHM).
- PHASE: software for haplotype reconstruction and recombination rate estimation from population data.
- SNPHAP: EM-based software for estimating haplotype frequencies from unphased genotypes.
- WHAP: haplotype-based association analysis.
External links
- HaploGroups.com — Comprehensive resource for DNA testing.
- HapMap — homepage for the International HapMap Project.
- Haplotype versus Haplogroup — the difference between haplogroup & haplotype explained.