Genome
In modern molecular biology and genetics, the genome is the entirety of an organism's hereditary information. It is encoded either in DNA or, for many types of virus, in RNA. The genome includes both the genes and the non-coding sequences of the DNA/RNA.
Note: If the DNA of the 46 chromosomes in a single (diploid) human cell were connected end-to-end and straightened, it would have a length of ~2 m and a width of ~2.4 nanometers.
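The ~2 m figure can be checked with a rough calculation. The sketch below assumes about 3.2 billion base pairs per haploid set (the figure used in this article's book analogy) and a rise of about 0.34 nm per base pair, the usual textbook value for B-form DNA; the function and constant names are illustrative only.

```python
# Rough estimate of the end-to-end length of the DNA in one diploid human cell.
# Assumptions (not measured values): 3.2e9 base pairs per haploid set and
# 0.34 nm of helix length per base pair (typical for B-form DNA).

HAPLOID_BASE_PAIRS = 3_200_000_000
RISE_PER_BASE_PAIR_NM = 0.34


def dna_length_m(base_pairs, rise_nm=RISE_PER_BASE_PAIR_NM):
    """Return the stretched-out length in metres of a DNA molecule."""
    return base_pairs * rise_nm * 1e-9  # convert nanometres to metres


# A diploid cell carries two haploid sets.
diploid_length_m = dna_length_m(2 * HAPLOID_BASE_PAIRS)
print(f"Approximate length: {diploid_length_m:.2f} m")
```

Under these assumptions the estimate comes out slightly above 2 m, consistent with the rounded value in the note.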
Since genomes and their organisms are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum and still have the organism in question survive. There is experimental work being done on minimal genomes for single-cell organisms as well as minimal genomes for multicellular organisms (see Developmental biology). The work is both in vivo and in silico.
Genomes have traits that can be measured and studied without reference to the details of any particular genes and their products. Researchers compare traits such as chromosome number (karyotype), genome size, gene order, codon usage bias, and GC-content to determine what mechanisms could have produced the great variety of genomes that exist today (for recent overviews, see Brown 2002; Saccone and Pesole 2003; Benfey and Protopapas 2004; Gibson and Muse 2004; Reese 2004; Gregory 2005).
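As an illustration of one such genome-wide trait, GC-content is simply the fraction of bases in a sequence that are G or C. The following is a minimal sketch, not a reference implementation; the function name `gc_content` and the choice to skip ambiguous symbols such as N are assumptions made here for the example.

```python
# Minimal sketch: GC-content of a nucleotide sequence.
# Assumption: only A, C, G, T and U are counted; any other symbol
# (for example the ambiguity code N) is ignored.

def gc_content(sequence):
    """Return the fraction of counted bases that are G or C (0.0 to 1.0)."""
    counted = [base for base in sequence.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no recognised bases")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


print(gc_content("ATGCGC"))  # 4 of the 6 bases are G or C
```

Comparing this value across whole genomes, rather than across individual genes, is the kind of measurement the paragraph above describes.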
Duplications play a major role in shaping the genome. Duplications may range from extension of short tandem repeats, to duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes. Such duplications are probably fundamental to the creation of genetic novelty.
Horizontal gene transfer is invoked to explain how there is often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Also, eukaryotic cells seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.
Origin of term
The term was coined in 1920 by Hans Winkler, Professor of Botany at the University of Hamburg, Germany. In Greek, the related verb γίνομαι means "I become, I am born, I come into being". The Oxford English Dictionary suggests the name to be a blend of the words gene and chromosome. A few related -ome words already existed, such as biome and rhizome, forming a vocabulary into which genome fits systematically.
Overview
Some organisms have multiple copies of each chromosome and are described as diploid, triploid, tetraploid and so on. In classical genetics, in a sexually reproducing organism (typically a eukaryote) the gamete has half the number of chromosomes of the somatic cell, and the genome is a full set of chromosomes in a gamete. In haploid organisms, including bacteria and archaea, as well as in organelles such as mitochondria and chloroplasts and in viruses, all of which similarly contain genes, the single chain or set of circular and/or linear chains of DNA (or RNA for some viruses) likewise constitutes the genome. The term genome can be applied specifically to the information stored on a complete set of nuclear DNA (i.e., the "nuclear genome") but can also be applied to the information stored within organelles that contain their own DNA, as with the "mitochondrial genome" or the "chloroplast genome". Additionally, the genome can comprise nonchromosomal genetic elements such as viruses, plasmids, and transposable elements.
When people say that the genome of a sexually reproducing species has been "sequenced", typically they are referring to a determination of the sequences of one set of autosomes and one of each type of sex chromosome, which together represent both of the possible sexes. Even in species that exist in only one sex, what is described as "a genome sequence" may be a composite read from the chromosomes of various individuals. In general use, the phrase "genetic makeup" is sometimes used conversationally to mean the genome of a particular individual or organism. The study of the global properties of genomes of related organisms is usually referred to as genomics, which distinguishes it from genetics, which generally studies the properties of single genes or groups of genes.
Both the number of base pairs and the number of genes vary widely from one species to another, and there is only a rough correlation between the two (an observation known as the C-value paradox). At present, the highest known number of genes is around 60,000, for Trichomonas vaginalis, the protozoan causing trichomoniasis (see List of sequenced eukaryotic genomes), almost three times as many as in the human genome.
An analogy to the human genome stored on DNA is that of instructions stored in a book:
- The book (genome) contains 23 chapters (chromosomes);
- Each chapter contains 48 to 250 million letters (A, C, G, T) without spaces;
- Hence, the book contains over 3.2 billion letters in total;
- The book fits into a cell nucleus the size of a pinpoint;
- At least one copy of the book (all 23 chapters) is contained in most cells of our body. The only exception in humans is found in mature red blood cells which become enucleated during development and therefore lack a genome.
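To make the size of this "book" concrete, the sketch below estimates how much computer storage 3.2 billion letters would need if each of the four letters were packed into 2 bits. This is an illustrative calculation only: the 2-bit scheme, the letter-to-number mapping and the function names are assumptions chosen for the example, not something the analogy itself specifies.

```python
# Illustrative estimate: storage needed for the "book" at 2 bits per letter.
# Assumptions: exactly four letters (A, C, G, T), a hypothetical mapping of
# letters to 2-bit codes, and the article's round figure of 3.2 billion letters.

GENOME_LETTERS = 3_200_000_000
BITS_PER_LETTER = 2  # four possible letters fit in two bits
ENCODING = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_size_bytes(n_letters, bits_per_letter=BITS_PER_LETTER):
    """Return the number of bytes needed, rounding up to a whole byte."""
    return (n_letters * bits_per_letter + 7) // 8


def pack(sequence):
    """Pack a short sequence into one integer, two bits per letter."""
    value = 0
    for letter in sequence:
        value = (value << 2) | ENCODING[letter]
    return value


print(packed_size_bytes(GENOME_LETTERS) / 1e6, "MB")
print(bin(pack("ACGT")))
```

Under these assumptions the whole book occupies roughly 800 megabytes, before any compression and ignoring the second copy present in diploid cells.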
Types
Most biological entities that are more complex than a virus sometimes or always carry additional genetic material besides that which resides in their chromosomes. In some contexts, such as sequencing the genome of a pathogenic microbe, "genome" is meant to include information stored on this auxiliary material, which is carried in plasmids. In such circumstances then, "genome" describes all of the genes and information on non-coding DNA that have the potential to be present.
In eukaryotes such as plants, protozoa and animals, however, "genome" typically carries the connotation of only the information on chromosomal DNA. So although these organisms contain chloroplasts and/or mitochondria that have their own DNA, the genetic information contained in the DNA within these organelles is not considered part of the genome. In fact, mitochondria are sometimes said to have their own genome, often referred to as the "mitochondrial genome". The DNA found within the chloroplast may be referred to as the "plastome".
Genomes and genetic variation
A genome does not capture the genetic diversity or the genetic polymorphism of a species. For example, the human genome sequence in principle could be determined from just half the information on the DNA of one cell from one individual. To learn what variations in genetic information underlie particular traits or diseases requires comparisons across individuals. This point explains the common usage of "genome" (which parallels a common usage of "gene") to refer not to the information in any particular DNA sequence, but to a whole family of sequences that share a biological context.
Although this concept may seem counterintuitive, it is the same concept that says there is no particular shape that is the shape of a cheetah. Cheetahs vary, and so do the sequences of their genomes. Yet both the individual animals and their sequences share commonalities, so one can learn something about cheetahs and "cheetah-ness" from a single example of either.
Sequencing and mapping
The Human Genome Project was organized to map and to sequence the human genome. Other genome projects include mouse, rice, the plant Arabidopsis thaliana, the puffer fish, and bacteria like E. coli. In 1976, Walter Fiers at the University of Ghent (Belgium) was the first to establish the complete nucleotide sequence of a viral RNA-genome (bacteriophage MS2). The first DNA-genome project to be completed was the Phage Φ-X174, with only 5386 base pairs, which was sequenced by Fred Sanger in 1977. The first bacterial genome to be completed was that of Haemophilus influenzae, completed by a team at The Institute for Genomic Research in 1995. A few months later, the first eukaryotic genome was completed, with the 16 chromosomes of budding yeast Saccharomyces cerevisiae being released as a result of a European-led effort begun in the mid-1980s.
The development of new technologies has made sequencing dramatically easier and cheaper, and the number of complete genome sequences is growing rapidly. Among the many genome databases, the one maintained by the US National Institutes of Health is inclusive.
These new technologies open up the prospect of personal genome sequencing as an important diagnostic tool. A major step toward that goal was the completion of the decipherment of the full genome of DNA pioneer James D. Watson in 2007.
Whereas a genome sequence lists the order of every DNA base in a genome, a genome map identifies the landmarks. A genome map is less detailed than a genome sequence and aids in navigating around the genome. A fundamental step in the Human Genome Project was the release of a detailed genomic map by Jean Weissenbach and his team at the Genoscope in Paris.
Comparison of different genome sizes
Organism type | Organism | Genome size (base pair Base pair In molecular biology and genetics, the linking between two nitrogenous bases on opposite complementary DNA or certain types of RNA strands that are connected via hydrogen bonds is called a base pair... s) |
Genome size (in human-readable format) | mass - in pg | Note |
---|---|---|---|---|---|
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
Bacteriophage MS2 Bacteriophage MS2 The bacteriophage MS2 is an icosahedral, positive-sense single-stranded RNA virus that infects the bacterium Escherichia coli.-History:... |
3,569 | 3.5kb | 0.000002 | First sequenced RNA-genome |
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
SV40 SV40 SV40 is an abbreviation for Simian vacuolating virus 40 or Simian virus 40, a polyomavirus that is found in both monkeys and humans... |
5,224 | 5.2kb | ||
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
Phage Φ-X174 Phi-X174 phage The phi X 174 bacteriophage was the first DNA-based genome to be sequenced. This work was completed by Fred Sanger and his team in 1977. In 1962, Walter Fiers had already demonstrated the physical, covalently closed circularity of phi X 174 DNA.In 2003, it was reported that the whole genome of... |
5,386 | 5.4kb | First sequenced DNA-genome | |
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
HIV HIV Human immunodeficiency virus is a lentivirus that causes acquired immunodeficiency syndrome , a condition in humans in which progressive failure of the immune system allows life-threatening opportunistic infections and cancers to thrive... |
9,749 | 9.7kb | ||
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
Phage λ Lambda phage Enterobacteria phage λ is a temperate bacteriophage that infects Escherichia coli.Lambda phage is a virus particle consisting of a head, containing double-stranded linear DNA as its genetic material, and a tail that can have tail fibers. The phage particle recognizes and binds to its host, E... |
48,502 | 48kb | ||
Virus Virus A virus is a small infectious agent that can replicate only inside the living cells of organisms. Viruses infect all types of organisms, from animals and plants to bacteria and archaea... |
Mimivirus Mimivirus Mimivirus is a viral genus containing a single identified species named Acanthamoeba polyphaga mimivirus , or is a group of phylogenetically related large viruses . In colloquial speech, APMV is more commonly referred to as just “mimivirus”... |
1,181,404 | 1.2Mb | Largest known viral genome | |
Bacterium | Haemophilus influenzae Haemophilus influenzae Haemophilus influenzae, formerly called Pfeiffer's bacillus or Bacillus influenzae, Gram-negative, rod-shaped bacterium first described in 1892 by Richard Pfeiffer during an influenza pandemic. A member of the Pasteurellaceae family, it is generally aerobic, but can grow as a facultative anaerobe. H... |
1,830,000 | 1.8Mb | First genome of a living organism sequenced, July 1995 | |
Bacterium | Carsonella ruddii | 159,662 | 160kb | Smallest non-viral genome. | |
Bacterium | Buchnera aphidicola | 600,000 | 600kb | ||
Bacterium | Wigglesworthia glossinidia | 700,000 | 700kb | ||
Bacterium | Escherichia coli | 4,600,000 | 4.6Mb | ||
Bacterium | Solibacter usitatus (strain Ellin 6076) | 9,970,000 | 10Mb | Largest known bacterial genome | |
Amoeboid | Polychaos dubium ("Amoeba" dubia) | 670,000,000,000 | 670Gb | 737 | Largest known genome (disputed) |
Plant | Arabidopsis thaliana | 157,000,000 | 157Mb | First plant genome sequenced, December 2000. | |
Plant | Genlisea margaretae | 63,400,000 | 63Mb | Smallest recorded flowering plant genome, 2006. | |
Plant | Fritillaria assyrica | 130,000,000,000 | 130Gb | ||
Plant | Populus trichocarpa | 480,000,000 | 480Mb | First tree genome sequenced, September 2006 | |
Plant | Paris japonica (Japanese-native, pale-petal) | 150,000,000,000 | 150Gb | 152.23 | Largest plant genome known |
Moss | Physcomitrella patens | 480,000,000 | 480Mb | First genome of a bryophyte sequenced, January 2008. | |
Yeast | Saccharomyces cerevisiae | 12,100,000 | 12.1Mb | First eukaryotic genome sequenced, 1996 | |
Fungus | Aspergillus nidulans | 30,000,000 | 30Mb | ||
Nematode | Caenorhabditis elegans | 100,300,000 | 100Mb | First multicellular animal genome sequenced, December 1998 | |
Nematode | Pratylenchus coffeae | 20,000,000 | 20Mb | Smallest animal genome known | |
Insect | Drosophila melanogaster (fruit fly) | 130,000,000 | 130Mb | ||
Insect | Bombyx mori (silk moth) | 530,000,000 | 530Mb | ||
Insect | Apis mellifera (honey bee) | 236,000,000 | 236Mb | ||
Insect | Solenopsis invicta (fire ant) | 480,000,000 | 480Mb | ||
Fish | Tetraodon nigroviridis (type of puffer fish) | 385,000,000 | 390Mb | Smallest vertebrate genome known | |
Mammal | Homo sapiens | 3,200,000,000 | 3.2Gb | 3 | |
Fish | Protopterus aethiopicus (marbled lungfish) | 130,000,000,000 | 130Gb | 143 | Largest vertebrate genome known |
Note: If the 46 chromosomes of a single (diploid) human cell were connected end-to-end and straightened, the DNA would have a length of ~2 m and a width of ~2.4 nanometers.
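The ~2 m figure in the note can be checked with a short calculation. The sketch below is illustrative only: it takes the 3,200,000,000 base-pair entry for Homo sapiens from the table, doubles it for a diploid cell, and assumes an axial rise of about 0.34 nm per base pair (a commonly quoted value for B-form DNA that is not stated in this article). The function and constant names are hypothetical.

```python
# Rough check of the "~2 m" estimate for the DNA of one diploid human cell.
# Assumption (not from the article): each base pair adds about 0.34 nm of
# length along the double helix, the usual figure quoted for B-form DNA.

HAPLOID_BASE_PAIRS = 3_200_000_000   # Homo sapiens entry in the table
RISE_PER_BASE_PAIR_NM = 0.34         # assumed axial rise per base pair


def stretched_length_m(base_pairs: int, copies: int = 2) -> float:
    """Length in metres of `copies` genome copies laid end to end."""
    nanometres = base_pairs * copies * RISE_PER_BASE_PAIR_NM
    return nanometres * 1e-9  # convert nanometres to metres


if __name__ == "__main__":
    length = stretched_length_m(HAPLOID_BASE_PAIRS)
    print(f"Approximate stretched length: {length:.2f} m")
```

Under these assumptions the result comes out slightly above 2 m, which is consistent with the rounded figure in the note.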
Because genomes and the organisms that carry them are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum that still allows the organism to survive. Experimental work is under way on minimal genomes for single-celled organisms as well as for multicellular organisms (see Developmental biology). The work is carried out both in vivo and in silico.
Genome evolution
Genomes are more than the sum of an organism's genes and have traits that may be measured and studied without reference to the details of any particular genes and their products. Researchers compare traits such as chromosome number (karyotype), genome size, gene order, codon usage bias, and GC-content to determine what mechanisms could have produced the great variety of genomes that exist today (for recent overviews, see Brown 2002; Saccone and Pesole 2003; Benfey and Protopapas 2004; Gibson and Muse 2004; Reese 2004; Gregory 2005).
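As an illustration of how one such genome-wide trait is measured, GC-content is simply the percentage of bases in a sequence that are guanine or cytosine. The following is a minimal sketch, not a method taken from the cited overviews; the function name is hypothetical, and the choice to ignore ambiguous symbols such as N is an assumption made here for simplicity.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the A, C, G, T and U bases of `sequence`.

    Assumption: characters other than A, C, G, T and U (for example N for
    an unknown base) are ignored rather than counted.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G, T or U bases")
    return 100.0 * gc / total


if __name__ == "__main__":
    # A made-up fragment, used only to exercise the function.
    print(gc_content("ATGCGCTA"))
```

Applied to whole genomes rather than short fragments, the same count gives a single number per organism that can be compared across species without reference to any individual gene.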
Duplications play a major role in shaping the genome. Duplications may range from extension of short tandem repeats, to duplication of a cluster of genes, and all the way to duplications of entire chromosomes or even entire genomes (polyploidy). Such duplications are probably fundamental to the creation of genetic novelty.
Horizontal gene transfer is invoked to explain the often extreme similarity between small portions of the genomes of two organisms that are otherwise very distantly related. Horizontal gene transfer seems to be common among many microbes. Eukaryotic cells also seem to have experienced a transfer of some genetic material from their chloroplast and mitochondrial genomes to their nuclear chromosomes.
External links
- Build a DNA Molecule
- Some comparative genome sizes
- DNA Interactive: The History of DNA Science
- DNA From The Beginning
- All About The Human Genome Project — from Genome.gov
- Animal genome size database
- Plant genome size database
- GOLD:Genomes OnLine Database
- The Genome News Network
- NCBI Entrez Genome Project database
- NCBI Genome Primer
- GeneCards — an integrated database of human genes
- BBC News - Final genome 'chapter' published
- IMG (The Integrated Microbial Genomes system) — for genome analysis by the DOE-JGI
- GeKnome Technologies Next-Gen Sequencing Data Analysis: next-generation sequencing data analysis for Illumina and 454, a service from GeKnome Technologies.