Gene duplication
Gene duplication is any duplication of a region of DNA that contains a gene; it may occur as an error in homologous recombination, a retrotransposition event, or duplication of an entire chromosome. The second copy of the gene is often free from selective pressure; that is, mutations of it have no deleterious effects on its host organism. Thus it accumulates mutations faster than a functional single-copy gene over generations of organisms.
A duplication is the opposite of a deletion. Duplications arise from an event termed unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The chance of this happening is a function of the degree of sharing of repetitive elements between two chromosomes. The products of this recombination are a duplication at the site of the exchange and a reciprocal deletion.
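The reciprocal outcome of unequal crossing-over can be illustrated with a toy model. The sketch below is only an illustration, assuming chromosomes can be represented as ordered lists of gene labels; the function name `unequal_crossover` and the labels A to D are hypothetical and not taken from any library.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the right-hand arms of two homologs at the given breakpoints.

    When break_a != break_b the homologs are misaligned, so one product
    gains material and the other loses the same material.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two identical homologs carrying genes A, B, C, D (hypothetical labels).
homolog = ["A", "B", "C", "D"]

# Misaligned exchange: the break falls after C on one homolog
# and after B on the other.
duplicated, deleted = unequal_crossover(homolog, homolog, 3, 2)

print(duplicated)  # ['A', 'B', 'C', 'C', 'D']  gene C is duplicated
print(deleted)     # ['A', 'B', 'D']            gene C is deleted
```

With equal breakpoints the same function returns two unchanged chromosomes, which corresponds to ordinary, properly aligned crossing-over.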
Gene duplication as an evolutionary event
Gene duplication is believed to play a major role in evolution; this stance has been held by members of the scientific community for over 100 years. Susumu Ohno was one of the most famous developers of this theory in his classic book Evolution by gene duplication (1970). Ohno argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.
Major genome duplication events are not uncommon. It is believed that the entire yeast genome underwent duplication about 100 million years ago. Plants are the most prolific genome duplicators. For example, wheat is hexaploid (a kind of polyploid), meaning that it has six copies of its genome.
The duplication of a gene results in an additional copy that is free from selective pressure. One view is that this allows the new copy of the gene to mutate without deleterious consequence to the organism. This freedom from consequences allows for the mutation of novel genes that could potentially increase the fitness of the organism or code for a new function. An example of this is the apparent mutation of a duplicated digestive gene in a family of ice fish into an antifreeze gene.
Another view is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy. This leads to a neutral "subfunctionalization" or DDC (duplication-degeneration-complementation) model, in which the functionality of the original gene is distributed among the two copies.
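The logic of the DDC model can be sketched as a small simulation. This is a deliberately simplified illustration, not a population-genetic model: it assumes a gene can be described as a set of independent subfunctions, and that a degenerative mutation is tolerated only while the other copy still provides the affected subfunction. The function name `ddc` and the subfunction labels are hypothetical.

```python
import random


def ddc(subfunctions, rng):
    """Let two duplicate copies degenerate while they complement each other."""
    copy_a = set(subfunctions)
    copy_b = set(subfunctions)
    # A subfunction present in both copies is redundant, so losing it
    # from one copy is neutral. Repeat until no redundancy remains.
    while copy_a & copy_b:
        redundant = sorted(copy_a & copy_b)
        lost = rng.choice(redundant)
        target = rng.choice([copy_a, copy_b])
        target.discard(lost)
    return copy_a, copy_b


rng = random.Random(0)
ancestral = {"brain_expression", "liver_expression", "muscle_expression"}
copy_a, copy_b = ddc(ancestral, rng)

# Every ancestral subfunction survives in exactly one of the two copies.
print(sorted(copy_a), sorted(copy_b))
```

In this toy version one copy can, by chance, lose every subfunction, in which case the outcome resembles the loss of one duplicate rather than subfunctionalization.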
The two genes that exist after a gene duplication event are called paralogs and usually code for proteins with a similar function and/or structure. By contrast, orthologous genes are ones which code for proteins with similar functions but exist in different species, and are created from a speciation event. (See Homology of sequences in genetics.)
It is important (but often difficult) to differentiate between paralogs and orthologs in biological research. Experiments on human gene function can often be carried out on other species if a homolog to a human gene can be found in the genome of that species, but only if the homolog is orthologous. If they are paralogs and resulted from a gene duplication event, their functions are likely to be too different.
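One common way to make the distinction concrete is to look at the event at which two genes last shared an ancestor in a gene tree: a speciation event gives orthologs, a duplication event gives paralogs. The sketch below assumes the tree is already known and labelled, which is the hard part in practice; the nested-tuple representation, the function names and the gene names are all hypothetical.

```python
def leaves(node):
    """Return the set of gene names below a node."""
    if isinstance(node, str):
        return {node}
    _event, left, right = node
    return leaves(left) | leaves(right)


def relationship(tree, gene_x, gene_y):
    """Classify two distinct genes by the event at their last common ancestor."""
    if gene_x == gene_y:
        raise ValueError("expected two distinct genes")
    if not {gene_x, gene_y} <= leaves(tree):
        raise ValueError("both genes must be in the tree")
    event, left, right = tree
    for child in (left, right):
        members = leaves(child)
        if gene_x in members and gene_y in members:
            # Both genes sit below this child, so descend further.
            return relationship(child, gene_x, gene_y)
    # The genes split at this node.
    return "orthologs" if event == "speciation" else "paralogs"


# An ancestral gene is duplicated, then each copy is inherited
# by both species after they diverge.
gene_tree = (
    "duplication",
    ("speciation", "human_alpha", "mouse_alpha"),
    ("speciation", "human_beta", "mouse_beta"),
)

print(relationship(gene_tree, "human_alpha", "mouse_alpha"))  # orthologs
print(relationship(gene_tree, "human_alpha", "human_beta"))   # paralogs
print(relationship(gene_tree, "human_alpha", "mouse_beta"))   # paralogs
```

The last call shows why a homolog found in another species is not automatically an ortholog: the two genes are in different species but still trace back to the duplication.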
The paralogous segments can be repeat sequences with more than 90% sequence similarity. In such cases, they are known as low copy repeats (LCRs), though they are not highly repetitive sequences. They are mostly found in pericentromeric, subtelomeric and interstitial regions of a chromosome. The LCRs, due to their size (>1 kb), similarity, and orientation, are highly susceptible to duplications and deletions. These genomic rearrangements are caused by the mechanism of non-allelic homologous recombination. The resulting genomic variation leads to gene-dosage-dependent neurological disorders such as Rett-like syndrome and Pelizaeus-Merzbacher disease.
Gene duplication as amplification
Gene duplication doesn't necessarily constitute a lasting change in a species' genome. In fact, such changes often don't last past the initial host organism. From the perspective of molecular genetics, amplification is one of many ways in which a gene can be overexpressed. Genetic amplification can occur artificially, as with the use of the polymerase chain reaction technique to amplify short strands of DNA in vitro using enzymes, or it can occur naturally, as described above. If it's a natural duplication, it can still take place in a somatic cell, rather than a germline cell (which would be necessary for a lasting evolutionary change).
Genomic microarrays detect duplications
Technologies such as genomic microarrays, also called array comparative genomic hybridization (array CGH), are used to detect chromosomal abnormalities, such as microduplications, in a high-throughput fashion from genomic DNA samples. In particular, DNA microarray technology can simultaneously monitor the expression levels of thousands of genes across many treatments or experimental conditions, greatly facilitating the evolutionary studies of gene regulation after gene duplication or speciation.
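As a rough illustration of how a duplication might show up in array CGH data, the sketch below compares test and reference signal intensities probe by probe and reports runs of consecutive probes whose log2 ratio is elevated. It is a minimal sketch under stated assumptions: the function name `call_gains`, the threshold of 0.3, the minimum run of three probes and the intensity values are all illustrative choices, not values from the article or from any analysis package.

```python
import math


def call_gains(test, reference, threshold=0.3, min_probes=3):
    """Return (first_probe, last_probe) index pairs for candidate gains."""
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    segments = []
    start = None
    # A trailing sentinel closes any run that reaches the last probe.
    for i, ratio in enumerate(ratios + [float("-inf")]):
        if ratio > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_probes:
                segments.append((start, i - 1))
            start = None
    return segments


# Hypothetical intensities for ten probes along a chromosome.
# Three copies against two gives a ratio of 1.5, i.e. log2 of about 0.58.
reference = [100.0] * 10
test = [100.0, 100.0, 100.0, 150.0, 150.0, 150.0, 150.0, 100.0, 100.0, 100.0]

print(call_gains(test, reference))  # [(3, 6)]
```

Requiring several consecutive probes is one simple way to avoid calling a gain from a single noisy probe; real data would also need normalization, which is omitted here.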
Role in cancer
Duplications of oncogenes are a common cause of many types of cancer, as is the case with P70-S6 Kinase 1 amplification and breast cancer. In such cases the genetic duplication occurs in a somatic cell and affects only the genome of the cancer cells themselves, not the entire organism, much less any subsequent offspring.
Cancer type | Associated gene amplifications | Prevalence of amplification in cancer type (percent) |
---|---|---|
Breast cancer | MYC | 20 |
Breast cancer | ERBB2 (HER2) | 20 |
Breast cancer | CCND1 (Cyclin D1) | 15-20 |
Breast cancer | FGFR1 | 12 |
Breast cancer | FGFR2 | 12 |
Cervical cancer | MYC | 25-50 |
Cervical cancer | ERBB2 | 20 |
Colorectal cancer | HRAS | 30 |
Colorectal cancer | KRAS | 20 |
Colorectal cancer | MYB | 15-20 |
Esophageal cancer | MYC | 40 |
Esophageal cancer | CCND1 | 25 |
Esophageal cancer | MDM2 | 13 |
Gastric cancer | CCNE (Cyclin E) | 15 |
Gastric cancer | KRAS | 10 |
Gastric cancer | MET | 10 |
Glioblastoma | ERBB1 (EGFR) | 33-50 |
Glioblastoma | CDK4 | 15 |
Head and neck cancer | CCND1 | 50 |
Head and neck cancer | ERBB1 | 10 |
Head and neck cancer | MYC | 7-10 |
Hepatocellular cancer | CCND1 | 13 |
Neuroblastoma | MYCN | 20-25 |
Ovarian cancer | MYC | 20-30 |
Ovarian cancer | ERBB2 | 15-30 |
Ovarian cancer | AKT2 | 12 |
Sarcoma | MDM2 | 10-30 |
Sarcoma | CDK4 | 10 |
Small cell lung cancer | MYC | 15-20 |
See also
- Pseudogenes
- Molecular evolution
- Unequal crossing over
- Human genome
- Comparative genomics
- Inparanoid
- Tandem exon duplication