Microsatellite
Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are repeating sequences of 2-6 base pairs of DNA.

Microsatellites are typically co-dominant. They are used as molecular markers in genetics, for kinship, population and other studies. They can also be used to study gene duplication or deletion. Microsatellites are also known to be causative agents in human disease, especially neurodegenerative disorders and cancer.

Introduction

One common example of a microsatellite is a (CA)n repeat, where n varies between alleles. These markers often present high levels of inter- and intra-specific polymorphism, particularly when the number of repetitions is 10 or greater. The repeated sequence is often simple, consisting of two, three or four nucleotides (di-, tri-, and tetranucleotide repeats respectively), and can be repeated 3 to 100 times, with the longer loci generally having more alleles due to the greater potential for slippage (see below). CA dinucleotide repeats are very frequent in human and other genomes, and are present every few thousand base pairs. As there are often many alleles present at a microsatellite locus, genotypes within pedigrees are often fully informative, in that the progenitor of a particular allele can often be identified. In this way, microsatellites are ideal for determining paternity, population genetic studies and recombination mapping. They are also the only molecular markers to provide clues about which alleles are more closely related. Microsatellites are also predictors of SNP density and human–chimpanzee divergence, which differ from the genome-wide average in regions extending thousands of nucleotides.
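As a rough illustration of what a (CA)n repeat looks like in sequence data, the following Python sketch scans a DNA string for perfect tandem repeats. The function name, the default thresholds and the decision to skip single-base runs are assumptions made for this example, not part of any standard tool.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=3):
    """Return (start, motif, repeat_count) for each perfect tandem repeat found."""
    seq = seq.upper()
    # Group 1 lazily captures the shortest candidate motif;
    # \1{n,} then requires at least n further adjacent copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # assumption: ignore runs of a single base such as AAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# A (CA)6 repeat embedded in made-up flanking sequence.
example = "GGATC" + "CA" * 6 + "TTAGC"
print(find_microsatellites(example))
```

Only uninterrupted repeats are reported; loci with variant repeats inside the tract would need a more tolerant search.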

The variability of microsatellites is due to a higher rate of mutation compared to other neutral regions of DNA. These high rates of mutation can be explained most frequently by slipped strand mispairing (slippage) during DNA replication on a single DNA strand. Mutation may also occur during recombination during meiosis. Some errors in slippage are rectified by proofreading mechanisms within the nucleus, but some mutations can escape repair. The size of the repeat unit, the number of repeats and the presence of variant repeats all influence the mutation rate, as does the frequency of transcription in the area of the DNA repeat. Interruption of microsatellites, perhaps due to mutation, can result in reduced polymorphism. However, slippage can also occasionally lead to incorrect amplification of microsatellites; if it occurs early on during PCR, microsatellites of incorrect lengths can be amplified.

Amplification

Microsatellites can be amplified for identification by the polymerase chain reaction (PCR) process, using the unique sequences of flanking regions as primers. DNA is repeatedly denatured at a high temperature to separate the double strand, then cooled to allow annealing of primers and the extension of nucleotide sequences through the microsatellite. This process results in production of enough DNA to be visible on agarose or polyacrylamide gels; only small amounts of DNA are needed for amplification because thermocycling creates an exponential increase in the replicated segment. With the abundance of PCR technology, primers that flank microsatellite loci are simple and quick to use, but the development of correctly functioning primers is often a tedious and costly process.
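The exponential increase can be made concrete with a small Python sketch of the idealised case, in which every template is copied once per cycle. The `efficiency` parameter is an assumption added for this illustration; it models cycles in which only a fraction of templates is copied.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealised copy number after thermocycling: N = N0 * (1 + E) ** cycles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles

# Ten template molecules after 30 perfect doubling cycles.
print(pcr_copies(10, 30))
```

Real reactions plateau as reagents are consumed, so this formula is only a guide to the early cycles.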

Creation of microsatellite primers

If searching for microsatellite markers in specific regions of a genome, for example within a particular exon of a gene, primers can be designed manually. This involves searching the genomic DNA sequence for microsatellite repeats, which can be done by eye or by using automated tools such as RepeatMasker. Once the potentially useful microsatellites are determined (removing non-useful ones such as those with random inserts within the repeat region), the flanking sequences can be used to design oligonucleotide primers which will amplify the specific microsatellite repeat in a PCR.
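The manual search described above can be sketched in Python as follows. This is a minimal example under stated assumptions: the function name, the 20-base flank length and the minimum of five repeats are hypothetical choices, and the code only extracts candidate flanking sequences. It does not perform primer design itself (melting temperature, self-complementarity and similar checks are left out).

```python
import re

def flanking_regions(seq, motif, min_repeats=5, flank=20):
    """Return (left_flank, repeat_tract, right_flank) for each run of `motif`."""
    seq = seq.upper()
    pattern = re.compile("(?:%s){%d,}" % (re.escape(motif.upper()), min_repeats))
    results = []
    for match in pattern.finditer(seq):
        left = seq[max(0, match.start() - flank):match.start()]
        right = seq[match.end():match.end() + flank]
        results.append((left, match.group(0), right))
    return results

# Made-up sequence containing a (GA)6 repeat.
locus = "ATGGCTAGCT" + "GA" * 6 + "TTCAGGCATC"
for left, tract, right in flanking_regions(locus, "GA", flank=10):
    print(left, tract, right)
```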

Random microsatellite primers can be developed by cloning random segments of DNA from the focal species. These random segments are inserted into a plasmid or bacteriophage vector, which is in turn introduced into Escherichia coli bacteria. Colonies are then developed, and screened with fluorescently-labelled oligonucleotide sequences that will hybridize to a microsatellite repeat, if present on the DNA segment. If positive clones can be obtained from this procedure, the DNA is sequenced and PCR primers are chosen from sequences flanking such regions to determine a specific locus. This process involves significant trial and error on the part of researchers, as microsatellite repeat sequences must be predicted and primers that are randomly isolated may not display significant polymorphism. Microsatellite loci are widely distributed throughout the genome and can be isolated from semi-degraded DNA of older specimens, as all that is needed is a suitable substrate for amplification through PCR.

More recent techniques involve using oligonucleotide sequences consisting of repeats complementary to repeats in the microsatellite to "enrich" the DNA extracted (microsatellite enrichment). The oligonucleotide probe hybridizes with the repeat in the microsatellite, and the probe/microsatellite complex is then pulled out of solution. The enriched DNA is then cloned as normal, but the proportion of successes will now be much higher, drastically reducing the time required to develop the regions for use. However, which probes to use can be a trial and error process in itself.
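To illustrate what "repeats complementary to repeats in the microsatellite" means in sequence terms, the Python sketch below builds the reverse complement of a repeated motif. The function name and the default of eight copies are assumptions for this example; real probe lengths and labels depend on the protocol used.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def enrichment_probe(motif, copies=8):
    """Return a probe (written 5' to 3') complementary to `copies` repeats of `motif`."""
    target = motif.upper() * copies
    # Complement each base, then reverse so the probe reads 5' to 3'.
    return target.translate(_COMPLEMENT)[::-1]

# A probe that would pair with a (CA)n tract is a (TG)n oligonucleotide.
print(enrichment_probe("CA", copies=3))
```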

ISSR-PCR

ISSR (for inter-simple sequence repeat) is a general term for a genome region between microsatellite loci. The complementary sequences to two neighboring microsatellites are used as PCR primers; the variable region between them gets amplified. The limited length of amplification cycles during PCR prevents excessive replication of overly long contiguous DNA sequences, so the result will be a mix of a variety of amplified DNA strands which are generally short but vary much in length.
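The idea of amplifying only the stretch between two neighboring microsatellites can be sketched in Python as a simple interval calculation. This is a deliberately simplified model: the function name and the `max_length` cut-off (standing in for the limit imposed by cycle length) are assumptions, and primer orientation and strand are ignored.

```python
def issr_fragments(repeat_spans, max_length=2000):
    """Return (start, end) of regions between adjacent repeats, up to max_length bp.

    repeat_spans is a list of (start, end) coordinates of microsatellites
    on one sequence, with end exclusive.
    """
    spans = sorted(repeat_spans)
    fragments = []
    for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
        gap = right_start - left_end
        if 0 < gap <= max_length:
            fragments.append((left_end, right_start))
    return fragments

# Three made-up repeat loci; only the first pair is close enough to amplify.
print(issr_fragments([(100, 120), (500, 530), (5000, 5020)], max_length=1000))
```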

Sequences amplified by ISSR-PCR can be used for DNA fingerprinting. Since an ISSR may be a conserved or nonconserved region, this technique is not useful for distinguishing individuals, but rather for phylogeography analyses or possibly for delimiting species; sequence diversity is lower than in SSR-PCR, but still higher than in actual gene sequences. In addition, microsatellite sequencing and ISSR sequencing are mutually assisting, as one produces primers for the other.

Global Microsatellite Content with microarrays

Using a CGH-style array manufactured by NimbleGen/Roche, the entire microsatellite content of a genome can be measured quickly, inexpensively and en masse. This approach does not evaluate the genotype of any particular locus, but instead sums the contributions for a given repeated motif from the many positions in which that motif exists across the genome. The array evaluates all 1- to 6-mer repeats (and their cyclic permutations and complements). The approach has been used to place species, sequenced or not, onto a taxonomic tree, and that tree matched the currently accepted phylogenetic relationships precisely. With this platform it is possible to study the genomic variations within an individual for the genomic features that are most variable: microsatellites.
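Because a motif, its cyclic permutations and its complement all describe the same repeat tract, motif counts are usually pooled into one class per family. The Python sketch below shows one possible way to do this by picking the alphabetically smallest form as a representative; the function names and this choice of representative are assumptions for illustration and do not describe how the array itself is designed.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_motif(motif):
    """Return one representative for a motif, its rotations and reverse complement."""
    motif = motif.upper()
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    rotations = []
    for strand in (motif, revcomp):
        rotations.extend(strand[i:] + strand[:i] for i in range(len(strand)))
    return min(rotations)

def motif_class_counts(motifs):
    """Sum observations per motif class, ignoring position in the genome."""
    return Counter(canonical_motif(m) for m in motifs)

# CA, AC, TG and GT all describe the same dinucleotide repeat.
print(motif_class_counts(["CA", "AC", "TG", "GT", "AAAG"]))
```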

Using this global microsatellite content array approach, studies indicate that there are major new genomic destabilization mechanisms that globally modify microsatellites, thus potentially altering very large numbers of genes. These global scale variations in both the tumor and germline patient samples may have important roles in the cancer process, and may be of value in diagnosis, prognosis and therapy judgments. The Global Microsatellite Content array revealed that the cancers studied, especially breast cancer, had elevated amounts of AT-rich motifs. Pursuit of these AT-rich motifs identified an AAAG motif that was variable in the region immediately upstream of the start site of the Estrogen Related Receptor Gamma gene, a gene that had previously been implicated in breast cancer and tamoxifen resistance. This locus was found to be a promoter for the gene. A long allele was found to be approximately 3 times more prevalent in breast cancer patients (germline) than in cancer-free patients (p<0.01) and thus may be a risk marker.

Limitations

Microsatellites have proved to be versatile molecular markers, particularly for population analysis, but they are not without limitations. Microsatellites developed for particular species can often be applied to closely related species, but the percentage of loci that successfully amplify may decrease with increasing genetic distance. Point mutations in the primer annealing sites in such species may lead to the occurrence of ‘null alleles’, where microsatellites fail to amplify in PCR assays. Null alleles can be attributed to several phenomena. Sequence divergence in flanking regions can lead to poor primer annealing, especially at the 3’ end, where extension commences; preferential amplification of particular size alleles due to the competitive nature of PCR can lead to heterozygous individuals being scored as homozygous (partial null). PCR failure may result when particular loci fail to amplify, whereas others amplify more efficiently and may appear homozygous on a gel assay when they are in reality heterozygous in the genome. Null alleles complicate the interpretation of microsatellite allele frequencies and thus make estimates of relatedness faulty. Furthermore, stochastic effects of sampling that occur during mating may change allele frequencies in a way that is very similar to the effect of null alleles: an excessive frequency of homozygotes, causing deviations from Hardy-Weinberg equilibrium expectations. Since null alleles are a technical problem and sampling effects that occur during mating are a real biological property of a population, it is often very important to distinguish between them if excess homozygotes are observed.
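An excess of homozygotes can be detected by comparing observed heterozygosity with the value expected under Hardy-Weinberg equilibrium. The Python sketch below does this for a single locus; the function name and the input format (one pair of allele sizes per individual) are assumptions for this example. The inbreeding-style coefficient it returns, 1 - Ho/He, flags a deficit of heterozygotes but cannot by itself say whether null alleles or a biological process caused it.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected, f_is) for one locus.

    genotypes is a list of (allele_a, allele_b) pairs, one per individual.
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    # Expected heterozygosity under Hardy-Weinberg: 1 minus the sum of squared allele frequencies.
    expected = 1.0 - sum((count / total) ** 2 for count in allele_counts.values())
    f_is = 1.0 - observed / expected if expected > 0 else 0.0
    return observed, expected, f_is

# Made-up allele sizes: two alleles at equal frequency, no heterozygotes scored.
print(heterozygosity([(10, 10), (12, 12)]))
```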

When using microsatellites to compare species, homologous loci may be easily amplified in related species, but the number of loci that amplify successfully during PCR may decrease with increased genetic distance between the species in question. Mutation in microsatellite alleles is biased in the sense that larger alleles contain more bases, and are therefore more likely to be miscopied during DNA replication. Smaller alleles also tend to increase in size, whereas larger alleles tend to decrease in size, as they may be subject to an upper size limit; the existence of this constraint has been established, but its value has not yet been specified. If there is a large size difference between individual alleles, then there may be increased instability during recombination at meiosis. In tumour cells, where controls on replication may be damaged, microsatellites may be gained or lost at an especially high frequency during each round of mitosis. Hence a tumour cell line might show a different genetic fingerprint from that of the host tissue.

Mechanisms for change

The most common cause of length changes in short sequence repeats is replication slippage, caused by mismatches between DNA strands while being replicated during meiosis (Tautz 1994). Typically, slippage in each microsatellite occurs about once per 1,000 generations (Weber 1993). Slippage changes in repetitive DNA are orders of magnitude more common than point mutations in other parts of the genome (Jarne 1996). Most slippage results in a change of just one repeat unit, and slippage rates vary for different repeat unit sizes, and within different species (Kruglyak 1998).
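The figures above (roughly one slippage event per 1,000 generations, usually changing the tract by a single repeat unit) can be turned into a toy simulation. The Python sketch below is a minimal stepwise model under stated assumptions: gains and losses are taken to be equally likely, the rate is constant regardless of allele length, and the function and parameter names are hypothetical.

```python
import random

def simulate_slippage(repeats, generations, rate=1e-3, min_repeats=1, rng=None):
    """Follow one allele's repeat count through `generations` of possible slippage."""
    if rng is None:
        rng = random.Random()
    for _ in range(generations):
        if rng.random() < rate:
            # Assumption: each slippage event adds or removes exactly one repeat unit.
            repeats = max(min_repeats, repeats + rng.choice((-1, 1)))
    return repeats

# One possible outcome for a (CA)15 allele after 10,000 generations.
print(simulate_slippage(15, 10_000, rng=random.Random(1)))
```

Passing a seeded `random.Random` makes a run repeatable; different seeds give different allele lengths.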

Short sequence repeats are distributed throughout the genome (King 1997). Presumably, their most probable means of expression will vary, depending on their location.

In proteins

In mammals, 20% to 40% of proteins contain repeating sequences of amino acids caused by short sequence repeats (Marcotte 1998). Most of the short sequence repeats within protein-coding portions of the genome have a repeating unit of three nucleotides, since that length will not cause frame-shift mutations (Sutherland 1995). Each trinucleotide repeating sequence is translated into a repeating series of the same amino acid. In yeasts, the most common repeated amino acids are glutamine, glutamic acid, asparagine, aspartic acid and serine. These repeating segments can affect the physical and chemical properties of proteins, with the potential for producing gradual and predictable changes in protein action (Hancock 2005).
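A short Python sketch shows why a trinucleotide repeat yields a run of one amino acid in whichever reading frame it falls. Only the six codons needed for the example are included in the table (the standard genetic code is assumed); the function name and the use of "X" for codons outside this partial table are choices made for illustration.

```python
# Partial codon table (standard genetic code), limited to the codons used below.
CODONS = {"CAG": "Q", "AGC": "S", "GCA": "A", "GAA": "E", "AAG": "K", "AGA": "R"}

def translate_repeat(motif, copies, frame=0):
    """Translate `copies` tandem copies of a 3-base motif, starting at `frame`."""
    dna = (motif.upper() * copies)[frame:]
    usable = len(dna) - len(dna) % 3  # drop any incomplete trailing codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODONS.get(codon, "X") for codon in codons)

# (CAG)5 read in each of the three frames: glutamine, serine or alanine runs.
for frame in range(3):
    print(frame, translate_repeat("CAG", 5, frame))
```

Adding or removing whole CAG units lengthens or shortens the amino acid run without disturbing the downstream reading frame.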

For example, length changes in tandemly repeating regions in the Runx2 gene lead to differences in facial length in domesticated dogs (Canis familiaris), with an association between longer sequence lengths and longer faces (Fondon 2004). This association also applies to a wider range of Carnivora species (Sears 2007). Length changes in polyalanine tracts within the HoxA13 gene are linked to hand-foot-genital syndrome, a developmental disorder in humans (Utsch 2002). Length changes in other triplet repeats are linked to more than 40 neurological diseases in humans (Pearson 2005).

Evolutionary changes from replication slippage also occur in simpler organisms. For example, microsatellite length changes are common within surface membrane proteins in yeast, providing rapid evolution in cell properties (Bowen 2006). Specifically, length changes in the FLO1 gene control the level of adhesion to substrates (Verstrepen 2005). Short sequence repeats also provide rapid evolutionary change to surface proteins in pathogenic bacteria, perhaps so they can keep up with immunological changes in their hosts (Moxon 1994). This is known as the Red Queen hypothesis (Van Valen 1973). Length changes in short sequence repeats in a fungus (Neurospora crassa) control the duration of its circadian clock cycles (Michael 2007).

Gene regulation

Length changes of microsatellites within promoters and other cis-regulatory regions can also change gene expression quickly, between generations. The human genome contains many (>16,000) short sequence repeats in regulatory regions, which provide ‘tuning knobs’ on the expression of many genes (Rockman 2002).

Length changes in bacterial SSRs can affect fimbriae formation in Haemophilus influenzae, by altering promoter spacing (Moxon 1994). Minisatellites are also linked to abundant variations in cis-regulatory control regions in the human genome (Rockman 2002). Microsatellites in control regions of the vasopressin 1a receptor gene in voles influence their social behavior and level of monogamy (Hammock 2005).

Within introns

Microsatellites within introns also influence phenotype, through means that are not currently understood. For example, a GAA triplet expansion in the first intron of the X25 gene appears to interfere with transcription, and causes Friedreich's ataxia (Bidichandani 1998). Tandem repeats in the first intron of the asparagine synthetase gene are linked to acute lymphoblastic leukemia (Akagi 2008). A repeat polymorphism in the fourth intron of the NOS3 gene is linked to hypertension in a Tunisian population (Jemaa 2008). Reduced repeat lengths in the EGFR gene are linked with osteosarcomas (Kersting 2008).

Within transposons

Microsatellites are distributed throughout the genome (Richard 2008). Almost 50% of the human genome is contained in various types of transposable elements (also called transposons, or ‘jumping genes’), and many of them contain repetitive DNA (Scherer 2008). It is probable that short sequence repeats in those locations are also involved in the regulation of gene expression (Tomilin 2008).

See also

  • minisatellite
  • Satellite DNA
  • genetic marker
  • mobile element
  • transposon
  • short interspersed repetitive elements
  • long interspersed repetitive element
  • junk DNA
  • variable number tandem repeats
  • short tandem repeats
  • Trinucleotide repeat disorders
  • microsatellite instability
  • Simple sequence length polymorphism (SSLP)
  • SNPSTR

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 