Microbial phylogenetics
Microbial phylogenetics is the study of the evolutionary relatedness among various groups of microorganisms. The molecular approach to microbial phylogenetic analysis, pioneered by Carl Woese in the 1970s and leading to the three-domain model (Archaea, Bacteria, Eucaryota), revolutionized our thinking about evolution in the microbial world. Phylogenetic analysis plays a central role in microbiology, and the emerging fields of comparative genomics and phylogenomics require substantial knowledge and understanding of phylogenetic analysis and computational methods.
Historical overview
When, at the end of the 19th century, information began to accumulate about the diversity within the bacterial world, scientists started to include the bacteria in phylogenetic schemes to explain how life on Earth may have developed. Some of the early phylogenetic trees of the prokaryote world were morphology-based; others were based on then-current ideas about the conditions presumed to have prevailed on our planet when life first developed. Around 1950 many leading microbiologists, including Roger Stanier and C. B. van Niel, had become pessimistic about the possibility of ever reconstructing bacterial phylogeny. The concept of the prokaryote-eukaryote dichotomy did little to clarify phylogenetic relationships. The developing technology of nucleic acid sequencing, together with the recognition that the sequences of building blocks in informational macromolecules (nucleic acids, proteins) can serve as 'molecular clocks' that contain historical information, led in the late 1970s to the recognition of three primary lineages, later formalized as the three-domain model (Archaea, Bacteria, Eucaryota). This work was primarily based on small subunit ribosomal RNA sequence comparisons pioneered by Carl Woese and George Fox. The information now accumulating from the complete genome sequences of an ever-increasing number of prokaryotes is leading to further modifications of our views on microbial phylogeny. As more genome sequences become available, scientists have found that determining these relationships is complicated by the prevalence of horizontal gene transfer (HGT) among archaea and bacteria.
Methods and programs
The purpose of phylogenetic analysis is to understand the past evolutionary path of organisms. Even though the true phylogeny of any organism can never be known with certainty, phylogenetic analysis provides best estimates and thereby a framework for various disciplines in microbiology. Owing to technological innovation in modern molecular biology and rapid advances in computational science, accurate inference of the phylogeny of a gene or organism seems attainable in the near future. A flood of nucleic acid sequence data, bioinformatic tools and phylogenetic inference methods has become available in public databases, in the literature and on the World Wide Web. Phylogenetic analysis has long played a central role in basic microbiology, for example in taxonomy and ecology. In addition, more recently emerging fields of microbiology, including comparative genomics and phylogenomics, require substantial knowledge of phylogenetic analysis and the computational skills to handle the large-scale data involved. Methods of phylogenetic analysis and the associated software tools make this task more accurate, efficient and accessible.
There are four steps in general phylogenetic analysis of molecular sequences: (i) selection of a suitable molecule or molecules (phylogenetic marker), (ii) acquisition of molecular sequences, (iii) multiple sequence alignment (MSA) and (iv) phylogenetic treeing and evaluation. The first step is to choose a suitable homologous part of the genomes to be compared. Mechanisms of molecular evolution include mutations, gene duplication, genome reorganization, and genetic exchanges such as recombination, reassortment and lateral gene transfer. Although all of this information can be used to infer phylogenetic relationships of genes or organisms, information on mutations, including substitutions, insertions and deletions, is most frequently used in phylogeny reconstruction. The aim is to infer the correct organismal phylogeny from orthologous genetic loci, whose common ancestry can be traced back to a speciation event. Phylogenies built from homologous loci that arose by gene duplication (paralogy) or that are related through lateral gene transfer (xenology) do not necessarily reflect the evolutionary history of the organisms.
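Substitution information is most often summarized as a distance between each pair of sequences. The following is a minimal sketch, assuming two nucleotide sequences that are already aligned (gaps written as '-') and the Jukes-Cantor (JC69) model, which is one common choice of substitution model rather than one prescribed here. The observed proportion of differing sites p is corrected for multiple substitutions at the same site as d = -(3/4) ln(1 - 4p/3). The function names are illustrative.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (pairwise
    deletion), so only substitutions contribute to the distance.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor (JC69) distance: d = -3/4 * ln(1 - 4p/3).

    Corrects the observed p-distance for multiple substitutions at the
    same site; returns infinity when p >= 0.75 (saturation).
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    print(p_distance("ACGTACGTAC", "ACGTACGTTC"))
    print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTTC"), 4))
```

For the example pair, one site in ten differs, so p is 0.1, while the corrected distance is slightly larger (about 0.107) because it allows for substitutions that are no longer visible. A matrix of such pairwise distances is the input to distance-matrix treeing methods such as neighbor joining.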
Once DNA sequence data are generated, they are subjected to multiple sequence alignment. This involves identifying homologous sites, that is, positions in the molecules under study that derive from the same position in a common ancestral sequence. Sequences are aligned with one another by introducing alignment gaps (or simply "gaps") that account for insertions and deletions. In general, multiple sequence alignment starts by aligning pairs of sequences (pairwise alignment) and is then extended to multiple sequences using various algorithms.
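The pairwise step that multiple alignment builds on can be illustrated with the classic Needleman-Wunsch dynamic-programming algorithm for global alignment. This is a minimal sketch: the scores (match +1, mismatch -1, gap -2) and the linear gap penalty are arbitrary assumptions chosen for readability, whereas production aligners typically use substitution matrices and separate gap-opening and gap-extension penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Global pairwise alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns the two gapped sequences and the optimal alignment score.
    """
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap inserted in b
                score[i][j - 1] + gap,  # gap inserted in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


if __name__ == "__main__":
    top, bottom, best = needleman_wunsch("GATTACA", "GATACA")
    print(top)
    print(bottom)
    print("score:", best)
```

Aligning GATTACA with GATACA under these scores gives an optimal score of 4 (six matches, one gap); two gap placements tie, and which one is reported depends on the tie-breaking order of the traceback. Progressive MSA programs repeat pairwise steps like this one, typically following a guide tree.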
Many algorithms and computer programs have been developed in the last few decades for multiple sequence alignment, but the Clustal series of programs (such as ClustalW and ClustalX) is still the most widely used and produces alignments of reasonably good quality for small data sets. For large data sets, such as large collections of pyrosequencing reads, the MUSCLE program offers a good compromise between accuracy and speed. The MAFFT program implements several different algorithmic approaches and can be used for both small and very large data sets. Other programs have also been developed for general multiple sequence alignment, but these three have been the most popular and are routinely used in publications across microbiological disciplines.
Multilocus sequence analysis
Multilocus sequence analysis (MLSA) represents a new standard in microbial molecular systematics. In this context, MLSA is implemented in a relatively straightforward way: it consists essentially of concatenating several sequence partitions for the same set of organisms into a "supermatrix", from which a phylogeny is inferred by distance-matrix or optimality-criterion-based methods. This approach is expected to have increased resolving power, owing to the large number of characters analyzed, and a lower sensitivity to conflicting signals (i.e. phylogenetic incongruence) that may result from horizontal gene transfer events.
The strategies used to deal with multiple partitions can be grouped in three broad categories: the total evidence, separate analysis and combination approaches. The concatenation approach that dominates MLSA in the microbial molecular systematics literature is known to systematists working with plants and animals as the "total molecular evidence" approach, and it has been used to address difficult phylogenetic questions such as the relationships among the major groups of cetaceans, the relationship between microsporidia and fungi, and the phylogeny of major plant lineages. The total molecular evidence approach has been criticized because directly concatenating all available sequence alignments discards the evidence of conflicting phylogenetic signals in the different data partitions, along with the possibility of uncovering the evolutionary processes that gave rise to such contradictory signals. The nature of these conflicts varies, but in the microbial world the strongest conflicting signals often derive from horizontal gene transfer events in the dataset. If xenologous sequences are not identified and removed from the supermatrix before phylogeny inference, the resulting hypothesis may be strongly distorted, since standard treeing methods assume a single underlying evolutionary history.
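Based on these arguments, the conditional data combination strategy, in which partitions are concatenated only after they have been tested for congruence and conflicting partitions are excluded or analyzed separately, is generally to be preferred in bacterial MLSA.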
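The concatenation step at the heart of MLSA can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: each locus is assumed to be aligned already, a taxon missing a locus is padded with '?' (assumed here as the missing-data symbol), and the function name build_supermatrix and the toy gyrB/rpoB data are hypothetical. The partition boundaries are returned alongside the matrix because a conditional combination strategy needs to analyze or test each locus on its own before deciding what to concatenate.

```python
def build_supermatrix(
    loci: dict[str, dict[str, str]], missing: str = "?"
) -> tuple[dict[str, str], dict[str, tuple[int, int]]]:
    """Concatenate per-locus alignments into a supermatrix.

    loci maps locus name -> {taxon name -> aligned sequence}. A taxon that
    lacks a locus is padded with `missing` characters for that partition.
    Returns (supermatrix, partitions); partitions maps each locus to its
    1-based, inclusive column range so loci can still be examined separately.
    """
    taxa = sorted({taxon for alignment in loci.values() for taxon in alignment})
    rows: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    partitions: dict[str, tuple[int, int]] = {}
    start = 1
    for locus, alignment in loci.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {locus!r} is empty or its sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            # Pad absent taxa so every row keeps the same column layout.
            rows[taxon].append(alignment.get(taxon, missing * length))
        partitions[locus] = (start, start + length - 1)
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in rows.items()}
    return supermatrix, partitions


if __name__ == "__main__":
    # Toy alignments for two loci; taxon C was not sequenced for rpoB.
    toy = {
        "gyrB": {"A": "ACGT", "B": "ACGA", "C": "TCGA"},
        "rpoB": {"A": "GG-T", "B": "GGCT"},
    }
    matrix, parts = build_supermatrix(toy)
    for taxon, row in matrix.items():
        print(taxon, row)
    print(parts)
```

The congruence test that the conditional strategy would apply before concatenating is deliberately left out; it is a separate step, for example a comparison of the trees inferred from each partition.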
rRNA and other global markers
The introduction of comparative rRNA sequence analysis represents a major milestone in the history of microbiology. The current taxonomy of prokaryotes, as well as modern probe- and chip-based identification methods, rests mainly on rRNA-derived phylogenetic conclusions. Single-gene-based phylogenetic inference using alternative global markers is also important; such markers include elongation and initiation factors, RNA polymerase subunits, DNA gyrases, heat shock proteins and RecA. Although these comparative analyses are hampered by the generally low phylogenetic information content and differing resolving power of the individual markers, and by the presence of multiple copies of some of them, they globally support the concepts of domains and prokaryotic phyla.
Conserved insertions or deletions (indels) in protein sequences provide a particularly useful means of identifying different groups of microbes in clear molecular terms and of understanding how they have branched off from a common ancestor. Conserved indels and lineage-specific proteins can be useful for understanding microbial phylogeny at different phylogenetic depths.
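A conserved indel shows up in an alignment as a stretch of columns where all members of a group share residues (or gaps) and all other taxa show the opposite pattern. The sketch below is a hypothetical helper, not taken from any published tool, that scans an aligned set of sequences for such stretches. It makes simplifying assumptions: it checks only the gap pattern, ignores whether the indel is flanked by conserved regions (which is usually required before an indel is treated as a reliable marker), and labels hits "insertion" or "deletion" only relative to the other taxa, since the direction of the change needs an outgroup to establish.

```python
def find_conserved_indels(
    alignment: dict[str, str], group: set[str], min_length: int = 1
) -> list[tuple[int, int, str]]:
    """Find alignment stretches whose gap pattern separates `group` from the rest.

    Returns (start, end, kind) tuples with 1-based inclusive columns.
    kind is "insertion" when every group member has residues and every other
    sequence has gaps, and "deletion" for the reverse pattern.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment is empty or rows differ in length")
    length = lengths.pop()
    in_group = [seq for taxon, seq in alignment.items() if taxon in group]
    out_group = [seq for taxon, seq in alignment.items() if taxon not in group]
    if not in_group or not out_group:
        raise ValueError("need at least one sequence inside and outside the group")

    # Classify every column by its gap pattern.
    states = []
    for col in range(length):
        group_gap = [seq[col] == "-" for seq in in_group]
        other_gap = [seq[col] == "-" for seq in out_group]
        if not any(group_gap) and all(other_gap):
            states.append("insertion")
        elif all(group_gap) and not any(other_gap):
            states.append("deletion")
        else:
            states.append(None)

    # Merge runs of identically classified columns into candidate indels.
    hits = []
    col = 0
    while col < length:
        state = states[col]
        if state is None:
            col += 1
            continue
        start = col
        while col < length and states[col] == state:
            col += 1
        if col - start >= min_length:
            hits.append((start + 1, col, state))
    return hits


if __name__ == "__main__":
    # Toy protein alignment: g1 and g2 share a three-residue insert.
    toy = {"g1": "MKTGSDAYIA", "g2": "MKSGNDAYIA",
           "o1": "MKT---AYLA", "o2": "MRT---AFIA"}
    print(find_conserved_indels(toy, {"g1", "g2"}))
```

On the toy alignment the helper reports columns 4 to 6 as an insertion shared by g1 and g2; the sequences are invented for illustration.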
The phyla of prokaryotes
There is no official classification of prokaryotes. For the higher taxa there is not even an official nomenclature: the rules of the International Code of Nomenclature of Prokaryotes do not cover taxa above the rank of class. The most commonly accepted division of the prokaryotes into two "subkingdoms" or "domains" (Bacteria and Archaea), and the classification of their species with validly published names into 27 and 2 "phyla" or "divisions" respectively (as of November 2009), is primarily based on 16S rRNA sequence comparisons. This type of classification was adopted in the latest edition of Bergey's Manual of Systematic Bacteriology. Alternative classifications have also been proposed, based for example on the structure of the cell wall. Some 16S rRNA sequence-based phyla unite prokaryotes with similar physiological properties (for example Cyanobacteria, Chlorobi, Thermotogae); others (Euryarchaeota, Proteobacteria, Flavobacteria) contain organisms with highly disparate lifestyles. Some phyla based on deep 16S rRNA lineages are currently represented by only one or a few species. Environmental genomics and metagenomics approaches suggest the existence of many more phyla, based on the deep lineages of the 16S rRNA gene sequences recovered. Obtaining the organisms that harbor these sequences and studying their properties is a major challenge for microbiology today.
Horizontal gene transfer
Efforts to construct the tree of life take their conceptual motivation from Charles Darwin's theory of evolution. Until the advent of molecular biology, however, a universal tree of life was well beyond the scope of the data and methods of traditional organismal phylogeny. The rapid development of molecular methods and the accumulation of genetic sequence data from the 1970s onwards resulted in major reclassifications of life and revived ambitions to represent all organismal lineages by one true tree of life. The subsequent realization of the significance of lateral gene transfer and other non-vertical processes has subtly reconceptualized and reoriented attempts to construct this universal phylogeny, leading microbiologists such as Carl Woese and W. Ford Doolittle to question whether the tree of life is an accurate paradigm for prokaryotic evolution.
Horizontal gene transfer has affected the formation of groups of organisms. Gene transfer can make relationships more difficult to define and determine. Where many genes have been transferred between preferred partners, the majority of genes in a genome may reflect gene acquisition; consequently, even if a coherent signal is detected, one cannot be sure that it reflects shared organismal ancestry. However, the presence of a particular transferred gene has been shown, in several cases, to constitute a shared derived character useful in classification. Gene transfer can also assemble new metabolic pathways that open up new ecological niches, so the transfer of an adaptive gene might give rise to a new group of organisms.
See also
- Phylogenetics
- Molecular phylogenetics
- Computational phylogenetics
- History of molecular evolution