Multilocus sequence typing
Multilocus sequence typing (MLST) is a technique in molecular biology for the typing of multiple loci. The procedure characterizes isolates of bacterial species using the DNA sequences of internal fragments of multiple housekeeping genes. Internal fragments of approximately 450-500 bp of each gene are used, as these can be accurately sequenced on both strands using an automated DNA sequencer. For each housekeeping gene, the different sequences present within a bacterial species are assigned as distinct alleles and, for each isolate, the alleles at each of the loci define the allelic profile or sequence type (ST).
The first MLST scheme to be developed was for Neisseria meningitidis, the causative agent of meningococcal meningitis and septicaemia.
Principle of MLST
MLST directly measures the DNA sequence variations in a set of housekeeping genes and characterizes strains by their unique allelic profiles. The principle of MLST is simple: the technique involves PCR amplification followed by DNA sequencing. Nucleotide differences between strains can be checked at a variable number of genes depending on the degree of discrimination desired.
The workflow of MLST involves three stages: 1) data collection, 2) data analysis and 3) multilocus sequence analysis. In the data collection stage, definitive identification of variation is obtained by determining the nucleotide sequences of the gene fragments. In the data analysis stage, every unique sequence is assigned an allele number, and the allele numbers are combined into an allelic profile, which is assigned a sequence type (ST). If new alleles or STs are found, they are stored in the database after verification. In the final stage, the relatedness of isolates is assessed by comparing allelic profiles. Researchers carry out epidemiological and phylogenetic studies by comparing the STs of different clonal complexes. Sequencing and identification produce large data sets, so bioinformatic techniques are used to arrange, manage, analyze and merge the biological data.
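The data analysis stage can be illustrated with a minimal Python sketch. It is not the software used by any real MLST database: the class name MlstScheme, the three locus names and the six-base "fragments" are all hypothetical, and the verification step that real databases apply before accepting a new allele or ST is skipped.

```python
class MlstScheme:
    """Toy in-memory stand-in for a curated MLST database (hypothetical)."""

    def __init__(self, loci):
        self.loci = list(loci)
        # For each locus: sequence -> allele number, in order of first appearance.
        self.alleles = {locus: {} for locus in self.loci}
        # Allelic profile (tuple of allele numbers) -> sequence type (ST).
        self.sequence_types = {}

    def allele_number(self, locus, sequence):
        """Return the allele number for a sequence, registering it if new."""
        known = self.alleles[locus]
        seq = sequence.upper()
        if seq not in known:
            known[seq] = len(known) + 1
        return known[seq]

    def type_isolate(self, fragments):
        """Map {locus: sequence} to (allelic profile, ST)."""
        profile = tuple(
            self.allele_number(locus, fragments[locus]) for locus in self.loci
        )
        if profile not in self.sequence_types:
            self.sequence_types[profile] = len(self.sequence_types) + 1
        return profile, self.sequence_types[profile]


# Real fragments are roughly 450-500 bp; short strings keep the example readable.
scheme = MlstScheme(["locusA", "locusB", "locusC"])
isolate_1 = {"locusA": "ATGGCA", "locusB": "TTGACC", "locusC": "GGCTAA"}
isolate_2 = {"locusA": "ATGGCA", "locusB": "TTGACT", "locusC": "GGCTAA"}  # one base differs at locusB
isolate_3 = dict(isolate_1)  # identical to isolate_1

result_1 = scheme.type_isolate(isolate_1)
result_2 = scheme.type_isolate(isolate_2)
result_3 = scheme.type_isolate(isolate_3)
print(result_1, result_2, result_3)
```

In this sketch a single-base difference at one locus is enough to give isolate_2 a new allele number and therefore a new ST, while isolate_3 receives the same profile and ST as isolate_1.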
To strike a balance between acceptable identification power, time and cost, about seven to eight housekeeping genes are commonly used for strain typing. In Staphylococcus aureus, for example, seven housekeeping genes are used in MLST typing. These genes are carbamate kinase (arcC), shikimate dehydrogenase (aroE), glycerol kinase (glpF), guanylate kinase (gmk), phosphate acetyltransferase (pta), triosephosphate isomerase (tpi) and acetyl coenzyme A acetyltransferase (yqiL), as specified by the MLST website. However, it is not uncommon for up to ten housekeeping genes to be used. For Vibrio vulnificus, the housekeeping genes used are glucose-6-phosphate isomerase (glp), DNA gyrase, subunit B (gyrB), malate-lactate dehydrogenase (mdh), methionyl-tRNA synthetase (metG), phosphoribosylaminoimidazole synthetase (purM), threonine dehydrogenase (dtdS), diaminopimelate decarboxylase (lysA), transhydrogenase alpha subunit (pntA), dihydroorotase (pyrC) and tryptophanase (tnaA). Thus both the number and type of housekeeping genes interrogated by MLST may differ from species to species.
For each of these housekeeping genes, the different sequences are assigned as alleles, and the alleles at the loci provide an allelic profile. The allelic profile then serves as the identification marker for strain typing. Sequences that differ at even a single nucleotide are assigned as different alleles, and no weighting is given to the number of nucleotide differences between alleles, because it is not possible to distinguish whether differences at multiple nucleotide sites result from multiple point mutations or from a single recombinational exchange. The large number of potential alleles at each of the loci provides the ability to distinguish billions of different allelic profiles, and a strain with the most common allele at each locus would only be expected to occur by chance approximately once in 10,000 isolates. Although MLST provides high discriminatory power, the accumulation of nucleotide changes in housekeeping genes is a relatively slow process, and the allelic profile of a bacterial isolate is sufficiently stable over time for the method to be ideal for global epidemiology.
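The orders of magnitude quoted above can be checked with a short calculation. The figures below (seven loci, 30 alleles per locus, and a frequency of 0.27 for the most common allele at every locus) are illustrative assumptions, not values taken from any particular scheme, and the loci are treated as statistically independent.

```python
# Hypothetical figures chosen only to illustrate the arithmetic.
n_loci = 7
alleles_per_locus = 30
most_common_allele_freq = 0.27

# Number of distinct allelic profiles the scheme could in principle resolve.
possible_profiles = alleles_per_locus ** n_loci

# Probability of carrying the most common allele at every locus by chance,
# assuming the loci are independent.
chance_of_all_common = most_common_allele_freq ** n_loci
one_in = round(1 / chance_of_all_common)

print(f"{possible_profiles:,} possible profiles")
print(f"about 1 in {one_in:,} isolates")
```

Under these assumptions the scheme could resolve about 2.2 × 10^10 profiles, and the all-common-alleles profile would arise by chance in roughly one isolate in 10,000.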
The relatedness of isolates is displayed as a dendrogram constructed using the matrix of pairwise differences between their allelic profiles. The dendrogram is only a convenient way of displaying those isolates that have identical or very similar allelic profiles that can be assumed to be derived from a common ancestor; the relationships between isolates that differ at more than three out of seven loci are likely to be unreliable and should not be taken to infer their phylogeny.
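The matrix of pairwise differences that underlies such a dendrogram can be sketched as follows. The isolate names and allele numbers are invented for illustration, and the clustering step that turns the matrix into a tree is left out.

```python
from itertools import combinations

# Hypothetical seven-locus allelic profiles.
profiles = {
    "isolate_A": (1, 4, 2, 7, 3, 1, 5),
    "isolate_B": (1, 4, 2, 7, 3, 1, 9),  # differs from A at one locus
    "isolate_C": (2, 4, 6, 7, 8, 3, 9),  # differs from A at five loci
}


def locus_differences(p, q):
    """Count the loci at which two allelic profiles carry different alleles."""
    return sum(a != b for a, b in zip(p, q))


def difference_matrix(profiles):
    """Return {(name_1, name_2): number of differing loci} for every pair."""
    names = sorted(profiles)
    return {
        (m, n): locus_differences(profiles[m], profiles[n])
        for m, n in combinations(names, 2)
    }


matrix = difference_matrix(profiles)

# Pairs differing at more than three of seven loci: their placement in a
# dendrogram should not be read as phylogeny.
unreliable = sorted(pair for pair, d in matrix.items() if d > 3)
print(matrix)
print(unreliable)
```

Each mismatch counts as one difference regardless of how many nucleotides separate the two alleles, which matches the unweighted treatment of alleles described earlier.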
Comparison with other techniques
Earlier serological typing approaches had been established for differentiating bacterial isolates, but immunological typing has drawbacks such as reliance on a few antigenic loci and unpredictable reactivities of antibodies with different antigenic variants. Several molecular typing schemes have been proposed to determine the relatedness of pathogens, such as pulsed-field gel electrophoresis (PFGE), ribotyping and PCR-based fingerprinting, but these DNA banding-based subtyping methods do not provide meaningful evolutionary analyses. Although PFGE is considered by many researchers to be the "gold standard", many strains are not typable by this technique because the DNA degrades during the process (gel smears).
The approach of MLST is distinct from multilocus enzyme electrophoresis (MLEE), which is based on the different electrophoretic mobilities (EM) of multiple core metabolic enzymes. The alleles at each locus define the EM of their products, as different amino acid sequences between enzymes result in different mobilities and distinct bands when run on a gel. The relatedness of isolates can then be visualized with a dendrogram generated from the matrix of pairwise differences between the electrophoretic types. This method has a lower resolution than MLST for several reasons, all arising from the fact that enzymatic phenotype diversity is merely a proxy for DNA sequence diversity. First, enzymes may have different amino acid sequences without having sufficiently different EM to give distinct bands. Second, "silent mutations" may alter the DNA sequence of a gene without altering the encoded amino acids. Third, the phenotype of the enzyme can easily be altered in response to environmental conditions, which badly affects the reproducibility of MLEE results; common modifications of enzymes are phosphorylation, cofactor binding and cleavage of transport sequences. This also limits the comparability of MLEE data obtained by different laboratories, whereas MLST provides portable and comparable DNA sequence data and has great potential for automation and standardization.
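The second point, silent mutations, can be made concrete with a small sketch. The two nine-base "alleles" are invented, and the codon table is only the subset of the standard genetic code needed for this example.

```python
# Subset of the standard genetic code (one-letter amino acid codes).
CODON_TABLE = {"GCT": "A", "GCC": "A", "GAA": "E", "AAA": "K"}


def translate(dna):
    """Translate an in-frame DNA string using the small table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


allele_1 = "GCTGAAAAA"  # Ala-Glu-Lys
allele_2 = "GCCGAAAAA"  # GCT -> GCC, a synonymous (silent) change

# Sequence-based typing sees two alleles; a protein-based method sees one product.
dna_differs = allele_1 != allele_2
protein_differs = translate(allele_1) != translate(allele_2)
print(dna_differs, protein_differs)
```

Because both alleles encode the same peptide, a method that only examines the enzyme would group them together, whereas comparing the DNA sequences assigns them different allele numbers.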
MLST should not be confused with DNA barcoding. The latter is a taxonomic method that uses short genetic markers in mitochondrial DNA to recognize particular species of eukaryotes. It is based on the fact that mitochondrial DNA (mtDNA) has a relatively fast mutation rate, which gives significant variation in mtDNA sequences between species. This approach is only possible in eukaryotes (as prokaryotes lack mitochondria), whereas MLST, although initially developed for prokaryotes, is now finding application in eukaryotes and in principle could be applied to any kingdom.
Advantages and Applications
MLST is unambiguous and portable. Materials required for ST determination can be exchanged between laboratories, and primer sequences and protocols can be accessed electronically. The method is reproducible and scalable. MLST can be automated, and it combines advances in high-throughput sequencing and bioinformatics with established population genetics techniques. MLST data can be used to investigate evolutionary relationships among bacteria, and MLST provides good discriminatory power to differentiate isolates.
MLST has a wide range of applications and provides a resource for the scientific, public health and veterinary communities as well as the food industry. The following are examples of MLST applications.
Campylobacter
Campylobacter is the common causative agent of bacterial infectious intestinal diseases, usually arising from undercooked poultry or unpasteurised milk. However, its epidemiology is poorly understood because outbreaks are rarely detected, so the sources and transmission routes of outbreaks are not easily traced. In addition, Campylobacter genomes are genetically diverse and unstable, with frequent inter- and intragenomic recombination together with phase variation, which complicates the interpretation of data from many typing methods. With the application of MLST, Campylobacter typing has become considerably more successful, and the resulting data have been added to the MLST database. As of 1 May 2008, the Campylobacter MLST database contained 3516 isolates and listed about 30 publications that use or mention MLST in research on Campylobacter (http://pubmlst.org/campylobacter/).
Neisseria meningitidis
MLST has provided a more richly textured picture of bacteria within human populations and of strain variants that may be pathogenic to humans, plants and animals. MLST was first used by Maiden et al. (1) to characterize Neisseria meningitidis using six loci. The application of MLST has clearly resolved the major meningococcal lineages known to be responsible for invasive disease around the world. To improve the discriminatory power between the major invasive lineages, seven loci are now used, and this scheme has been accepted by many laboratories as the method of choice for characterizing meningococcal isolates. Recombinational exchanges are known to occur commonly in N. meningitidis, leading to rapid diversification of meningococcal clones. MLST has also provided a reliable method for the characterization of clones within other bacterial species in which the rates of clonal diversification are generally lower.
Staphylococcus aureus
S. aureus causes a number of diseases. Methicillin-resistant S. aureus (MRSA) has generated growing concerns over its resistance to almost all antibiotics except vancomycin. However, most serious S. aureus infections in the community, and many in hospitals, are caused by methicillin-susceptible isolates (MSSA), and there have been few attempts to identify the hypervirulent MSSA clones associated with serious disease. MLST was therefore developed to provide an unambiguous method of characterizing MRSA clones and of identifying the MSSA clones associated with serious disease.
Streptococcus pyogenes
S. pyogenes causes diseases ranging from pharyngitis and impetigo to life-threatening invasive disease, including necrotizing fasciitis. An MLST scheme for S. pyogenes has been developed. At present, the database contains the allelic profiles of isolates that represent the worldwide diversity of the organism and of isolates from serious invasive disease.
Candida albicans
C. albicans is a fungal pathogen of humans and is responsible for hospital-acquired bloodstream infections. MLST has been used to characterize C. albicans isolates. Combination of the alleles at the different loci results in unique diploid sequence types that can be used to discriminate strains. MLST has been successfully applied to study the epidemiology of C. albicans in the hospital as well as the diversity of C. albicans isolates obtained from diverse ecological niches, including human and animal hosts.
Limitations
MLST is well suited to population genetic studies, but it is expensive. Because of the sequence conservation in housekeeping genes, MLST sometimes lacks the discriminatory power to differentiate bacterial strains, which limits its use in epidemiological investigations. To improve the discriminatory power of MLST, a multi-virulence-locus sequence typing (MVLST) approach has been developed using Listeria monocytogenes. MVLST broadens the benefits of MLST but targets virulence genes, which may be more polymorphic than housekeeping genes. Population genetics is not the only relevant factor in an epidemic. Virulence factors are also important in causing disease, and population genetic studies struggle to monitor them, because the genes involved are often highly recombining and mobile between strains in comparison with the population genetic framework. Thus, for example, in Escherichia coli, identifying strains carrying toxin genes is more important than having a population genetics-based evaluation of prevalent strains.
MLST databases
MLST databases contain the reference allele sequences and sequence types for each organism, and also isolate epidemiological data. The websites contain interrogation and analysis software that allows users to query their allele sequences and sequence types. MLST is widely used as a tool by researchers and public healthcare workers.
The majority of MLST databases are hosted on two web servers, currently located at Imperial College, London (mlst.net, http://www.mlst.net) and at Oxford University (pubmlst.org, http://www.pubmlst.org). The databases hosted at each site are different and hold the organism-specific reference allele sequences and lists of STs for individual organisms.
External links
- http://www.mlst.net mlst.net - Imperial College London
- http://pubmlst.org/ PubMLST - Oxford University
- http://mlst.ucc.ie/ databases hosted at University College Cork
- http://www.pasteur.fr/recherche/genopole/PF8/mlst/ databases held at Pasteur Institute.