Noncoding DNA
In genetics, noncoding DNA describes the components of an organism's DNA that do not encode protein sequences. In many eukaryotes, a large percentage of an organism's total genome is noncoding DNA, although the amount of noncoding DNA, and the proportion of coding to noncoding DNA, varies greatly between species.

Much of this DNA has no known biological function and is sometimes referred to as "junk DNA". However, many types of noncoding DNA sequences do have known biological functions, including the transcriptional and translational regulation of protein-coding sequences. Other noncoding sequences have likely, but as-yet undetermined, functions (this is inferred from high levels of homology and conservation seen in sequences that do not encode proteins but, nonetheless, appear to be under heavy selective pressure). While this indicates that noncoding DNA should not be indiscriminately referred to as junk DNA, the absence of known function and of sequence conservation in a majority of noncoding DNA suggests that much of it may indeed be without function.

Fraction of noncoding genomic DNA

The amount of total genomic DNA varies widely between organisms, and the proportion of coding and noncoding DNA within these genomes varies greatly as well. More than 98% of the human genome does not encode protein sequences, including most sequences within introns and most intergenic DNA.

While overall genome size, and by extension the amount of noncoding DNA, are correlated with organism complexity, there are many exceptions. For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain more than 200 times the amount of DNA in humans. The genome of the pufferfish Takifugu rubripes is only about one eighth the size of the human genome, yet seems to have a comparable number of genes; approximately 90% of the Takifugu genome is noncoding DNA, and most of the genome size difference appears to lie in the noncoding DNA. The extensive variation in nuclear genome size among eukaryotic species is known as the C-value enigma or C-value paradox.
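
The statement that most of the size difference lies in noncoding DNA can be checked with rough arithmetic. The sketch below uses only the rounded figures quoted in this section (a human genome that is more than 98% noncoding, and a Takifugu genome about one eighth that size and about 90% noncoding). All sizes are measured in units of one human genome, and the variable and function names are illustrative.

```python
# Sizes are expressed as fractions of one human genome (human = 1.0).
# Inputs are the rounded figures quoted in the text, so the result is approximate.
HUMAN_SIZE = 1.0
HUMAN_NONCODING_FRACTION = 0.98      # "more than 98%", taken here as a lower bound
TAKIFUGU_SIZE = 1.0 / 8.0            # "about one eighth the size of the human genome"
TAKIFUGU_NONCODING_FRACTION = 0.90   # "approximately 90%"


def split_genome(size, noncoding_fraction):
    """Return (coding, noncoding) amounts for a genome of the given size."""
    noncoding = size * noncoding_fraction
    return size - noncoding, noncoding


human_coding, human_noncoding = split_genome(HUMAN_SIZE, HUMAN_NONCODING_FRACTION)
takifugu_coding, takifugu_noncoding = split_genome(TAKIFUGU_SIZE, TAKIFUGU_NONCODING_FRACTION)

size_gap = HUMAN_SIZE - TAKIFUGU_SIZE
noncoding_gap = human_noncoding - takifugu_noncoding
coding_gap = human_coding - takifugu_coding

# Share of the total size difference that is explained by noncoding DNA.
noncoding_share_of_gap = noncoding_gap / size_gap

print(f"coding gap:    {coding_gap:.4f} human genomes")
print(f"noncoding gap: {noncoding_gap:.4f} human genomes")
print(f"noncoding share of the size difference: {noncoding_share_of_gap:.1%}")
```

With these inputs, noncoding DNA accounts for roughly 99% of the difference in genome size, while the coding portions of the two genomes differ by less than 1% of a human genome.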

About 80 percent of the nucleotide bases in the human genome may be transcribed, but transcription does not necessarily imply function.

Noncoding functional RNA

Noncoding RNAs are functional RNA molecules that are not translated into protein. Examples of noncoding RNA include ribosomal RNA, transfer RNA, Piwi-interacting RNA and microRNA.

MicroRNAs are predicted to control the translational activity of approximately 30% of all protein-coding genes in mammals and may be vital components in the progression or treatment of various diseases including cancer, cardiovascular disease, and the immune system response to infection.

Cis- and trans-regulatory elements

Cis-regulatory elements are sequences that control the transcription of a nearby gene. Cis-elements may be located in 5' or 3' untranslated regions or within introns. Trans-regulatory elements control the transcription of a distant gene.

Promoters facilitate the transcription of a particular gene and are typically upstream of the coding region. Enhancer sequences may also exert very distant effects on the transcription levels of genes.

Introns

Introns are non-coding sections of a gene, transcribed into the precursor mRNA sequence, but ultimately removed by RNA splicing during the processing to mature messenger RNA. Many introns appear to be mobile genetic elements.

Studies of group I introns from Tetrahymena indicate that some introns appear to be selfish genetic elements, neutral to the host because they remove themselves from flanking exons during RNA processing and do not produce an expression bias between alleles with and without the intron. Some introns appear to have significant biological function, possibly through ribozyme functionality that may regulate tRNA and rRNA activity as well as protein-coding gene expression. This is evident in hosts that have become dependent on such introns over long periods of time; for example, the trnL-intron is found in all green plants and appears to have been vertically inherited for several billions of years, including more than a billion years within chloroplasts and an additional 2–3 billion years prior in the cyanobacterial ancestors of chloroplasts.

Pseudogenes

Pseudogenes are DNA sequences, related to known genes, that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Pseudogenes arise from retrotransposition or genomic duplication of functional genes, and become "genomic fossils" that are nonfunctional due to mutations that prevent the transcription of the gene, such as within the gene promoter region, or fatally alter the translation of the gene, such as premature stop codons or frameshifts. Pseudogenes resulting from the retrotransposition of an RNA intermediate are known as processed pseudogenes; pseudogenes that arise from the genomic remains of duplicated genes or residues of inactivated genes are nonprocessed pseudogenes.
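
The two translation-disrupting mutations named above, premature stop codons and frameshifts, can be illustrated on a toy reading frame. This is a minimal sketch, assuming the standard genetic code's three stop codons and a made-up 21-base sequence; the sequences and the helper names `codons` and `first_stop` are hypothetical and do not come from any real gene or library.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code (DNA form)


def codons(seq):
    """Split a coding-strand DNA sequence into in-frame triplets,
    dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop(seq):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


# Hypothetical toy reading frame: ATG GCT TGG AAA CGT CTG TAA
functional = "ATGGCTTGGAAACGTCTGTAA"

# Point mutation in the third codon (TGG -> TGA) creates a premature stop codon.
nonsense = functional[:8] + "A" + functional[9:]

# Deleting one base (the first T of the third codon) shifts the reading frame.
frameshift = functional[:6] + functional[7:]

print("functional:", codons(functional), "stop at codon", first_stop(functional))
print("nonsense:  ", codons(nonsense), "stop at codon", first_stop(nonsense))
print("frameshift:", codons(frameshift), "stop at codon", first_stop(frameshift))
```

In the toy frame the point mutation moves the first stop from the last codon to the third, truncating the product, while the single-base deletion changes every downstream codon and removes the original in-frame stop.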

While Dollo's Law suggests that the loss of function in pseudogenes is likely permanent, silenced genes may actually retain function for several million years and can be "reactivated" into protein-coding sequences, and a substantial number of pseudogenes are actively transcribed. Because pseudogenes are presumed to change without evolutionary constraint, they can serve as a useful model of the type and frequencies of various spontaneous genetic mutations.

Repeat sequences, transposons and viral elements

Transposons and retrotransposons are mobile genetic elements. Retrotransposon repeated sequences, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), account for a large proportion of the genomic sequences in many species. Alu sequences, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes.

Endogenous retrovirus sequences are the product of reverse transcription of retrovirus genomes into the genomes of germ cells. Mutation within these retro-transcribed sequences can inactivate the viral genome.

Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the over 42% fraction that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of DNA transposons. Much of the remaining half of the genome that is currently without an explained origin is expected to have originated in transposable elements that were active so long ago (> 200 million years) that random mutations have rendered them unrecognizable. Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences.

Telomeres

Telomeres are regions of repetitive DNA at the end of a chromosome, which provide protection from chromosomal deterioration during DNA replication.

Functions of noncoding DNA

Many noncoding DNA sequences have important biological functions, as indicated by comparative genomics studies that report some regions of noncoding DNA that are highly conserved, sometimes on time-scales representing hundreds of millions of years, implying that these noncoding regions are under strong evolutionary pressure and purifying selection. For example, in the genomes of humans and mice, which diverged from a common ancestor 65–75 million years ago, protein-coding DNA sequences account for only about 20% of conserved DNA, with the remaining majority of conserved DNA represented in noncoding regions. Linkage mapping often identifies chromosomal regions associated with a disease with no evidence of functional coding variants of genes within the region, suggesting that disease-causing genetic variants lie in the noncoding DNA.

Some specific sequences of noncoding DNA may be features essential to chromosome structure, centromere function and homolog recognition in meiosis.

According to a comparative study of over 300 prokaryotic and over 30 eukaryotic genomes, eukaryotes appear to require a minimum amount of non-coding DNA. This minimum amount can be predicted using a growth model for regulatory genetic networks, implying that it is required for regulatory purposes. In humans the predicted minimum is about 5% of the total genome.

Genetic switches

Some noncoding DNA sequences are genetic "switches" that regulate when and where genes are expressed.

Regulation of gene expression

Some noncoding DNA sequences determine the expression levels of various genes.

Transcription factors

Some noncoding DNA sequences determine where transcription factors attach. A transcription factor is a protein that binds to specific non-coding DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to mRNA. Transcription factors act at very different locations on the genomes of different people.

Operators

An operator is a segment of DNA to which a repressor binds. A repressor is a DNA-binding protein that regulates the expression of one or more genes by binding to the operator and blocking the attachment of RNA polymerase to the promoter, thus preventing transcription of the genes. This blocking of expression is called repression.

Enhancers

An enhancer is a short region of DNA that can be bound by proteins (trans-acting factors, much like a set of transcription factors) to enhance transcription levels of genes in a gene cluster.

Promoters

A promoter is a region of DNA that facilitates transcription of a particular gene. Promoters are typically located near the genes they regulate.

Noncoding DNA and evolution

Shared sequences of apparently non-functional DNA are a major line of evidence for common descent.

Pseudogene sequences appear to accumulate mutations more rapidly than coding sequences due to a loss of selective pressure. This allows for the creation of mutant alleles that incorporate new functions that may be favored by natural selection; thus, pseudogenes can serve as raw material for evolution and can be considered "protogenes".

Junk DNA

Junk DNA, a term that was introduced in 1972 by Susumu Ohno
Susumu Ohno
was an Asian American geneticist and evolutionary biologist, and seminal researcher in the field of molecular evolution.- Biography :Susumu Ohno was born of Japanese parents in Seoul, Korea, on February 1, 1928. The second of five children, he was the son of the minister of education of the...

. Ohno noted that the mutational load from deleterious mutations placed an upper limit on the number of functional loci that could be expected given a typical mutation rate. He predicted that mammal genomes could not have more than 30,000 loci under selection before the "cost" from the mutational load would cause an inescapable decline in fitness, and eventually extinction. This prediction remains robust, with the human genome containing approximately 20,000 protein-coding genes. Junk DNA remains a label for the portions of a genome sequence for which no discernible function has been identified. According to a 1980 review in Nature by Leslie Orgel and Francis Crick, junk DNA has "little specificity and conveys little or no selective advantage to the organism". The term is used mainly in popular science and in a colloquial way in scientific publications, and its connotations may have slowed research into the biological functions of noncoding DNA. Several lines of evidence indicate that some "junk DNA" sequences are likely to have unidentified functional activity, and other sequences may have had functions in the past.
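
The mutational-load argument attributed to Ohno above can be illustrated with a small calculation. The sketch below is not Ohno's own derivation: it assumes the standard textbook approximation that, at mutation-selection balance with multiplicative fitness effects, mean fitness is exp(-U), where U is the total deleterious mutation rate per genome per generation. The per-locus rate of 1e-5 and the function names are illustrative assumptions, not values taken from this article.

```python
import math

# Assumed per-locus deleterious mutation rate per generation (illustrative only).
MU_PER_LOCUS = 1e-5


def genomic_deleterious_rate(n_loci, mu_per_locus=MU_PER_LOCUS):
    """Total deleterious mutation rate U, summed over all loci under selection."""
    return n_loci * mu_per_locus


def mutational_load(n_loci, mu_per_locus=MU_PER_LOCUS):
    """Fractional reduction in mean fitness, 1 - exp(-U), under the assumed model."""
    u = genomic_deleterious_rate(n_loci, mu_per_locus)
    return 1.0 - math.exp(-u)


# Compare a genome with 30,000 selected loci against hypothetical larger ones.
for n_loci in (30_000, 300_000, 3_000_000):
    print(f"{n_loci:>9} loci -> load {mutational_load(n_loci):.3f}")
```

Under these assumptions the load grows quickly with the number of loci under selection, which is the qualitative point of the argument: a genome in which every base pair belonged to a selected locus would carry an unsustainable load.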

Still, a significant fraction of the genome sequence of eukaryotic organisms currently appears to fall under no existing classification other than "junk". For example, one experiment removed 0.1% of the mouse genome with no detectable effect on the phenotype. This result suggests that the removed DNA was largely nonfunctional. In addition, these sequences are enriched for the heterochromatic histone modification H3K9me3.

Noncoding DNA and long-range correlations

A statistical distinction between coding and noncoding DNA sequences has been found. It has been observed that nucleotides in noncoding DNA sequences display long-range power-law correlations, while coding sequences do not.
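
One common way to probe for such correlations is a "DNA walk" fluctuation analysis. The article does not say which method was used, so the following is only a minimal sketch under stated assumptions: purines (A, G) are mapped to a step of -1 and pyrimidines (C, T) to +1, the fluctuation F(l) is the standard deviation of walk displacements over windows of length l, and the exponent alpha is the slope of log F(l) against log l. The function names and window sizes are hypothetical choices. An uncorrelated sequence should give alpha close to 0.5; values that differ markedly from 0.5 would indicate long-range correlations.

```python
import math
import random

PURINES = {"A", "G"}


def dna_walk(seq):
    """Cumulative walk: step -1 for a purine (A/G), +1 for any other symbol.

    Assumption: the input contains only A, C, G, T, so "other" means C or T.
    """
    walk = [0]
    for base in seq.upper():
        step = -1 if base in PURINES else 1
        walk.append(walk[-1] + step)
    return walk


def fluctuation(walk, window):
    """Standard deviation of walk displacements over all windows of one length."""
    deltas = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(deltas) / len(deltas)
    var = sum((d - mean) ** 2 for d in deltas) / len(deltas)
    return math.sqrt(var)


def scaling_exponent(seq, windows=(4, 8, 16, 32, 64, 128, 256)):
    """Least-squares slope of log F(l) versus log l.

    Raises ValueError for degenerate input (e.g. a single repeated base),
    where F(l) is zero and its logarithm is undefined.
    """
    walk = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(walk, w)) for w in windows]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


# Baseline: a synthetic, uncorrelated sequence (not real genomic data).
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
alpha = scaling_exponent(random_seq)
print(f"alpha for an uncorrelated random sequence: {alpha:.2f}")
```

To compare coding and noncoding regions, the same function would be applied to real sequences of each type and the resulting exponents compared against this uncorrelated baseline.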

See also

  • Conserved non-coding sequence

  • Eukaryotic chromosome fine structure

  • Phylogenetic footprinting

  • Transcriptome

  • Intergenic region

  • Gene regulatory network


External links

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 