Open reading frame
In molecular genetics, an open reading frame (ORF) is a DNA sequence that does not contain a stop codon in a given reading frame.
Normally, insertions that interrupt the reading frame downstream of the start codon cause a frameshift mutation, which changes where stop codons fall in the sequence that follows.
Significance
One common use of open reading frames is as one piece of evidence to assist in gene prediction. Long ORFs are often used, along with other evidence, to initially identify candidate protein-coding regions in a DNA sequence. The presence of an ORF does not necessarily mean that the region is ever translated. For example, in a randomly generated DNA sequence with an equal percentage of each nucleotide, 3 of the 64 possible codons are stop codons, so a stop codon would be expected on average about once every 21 codons, and short stretches without a stop codon arise by chance alone. A simple gene prediction algorithm for prokaryotes might look for a start codon followed by an open reading frame that is long enough to encode a typical protein, where the codon usage of that region matches the frequency characteristic of the given organism's coding regions. By itself, even a long open reading frame is not conclusive evidence for the presence of a gene.
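The two quantitative ideas in the paragraph above can be sketched in Python. This is a minimal illustration, not a real gene finder. The function names, the use of ATG as the only start codon, and the `min_codons` threshold are assumptions made for the example. It scans only the three forward frames and ignores codon usage entirely.

```python
from fractions import Fraction

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: only ATG is treated as a start codon


def expected_codons_per_stop():
    """Mean spacing of stop codons in uniformly random DNA (3 of 64 codons)."""
    return Fraction(64, len(STOP_CODONS))


def find_orfs(seq, min_codons=100):
    """Return (start, end, offset) for ATG-initiated stretches ending at a stop.

    `start` is the index of the ATG, `end` is exclusive and includes the stop
    codon, and `offset` is the frame (0, 1 or 2). The length test counts the
    codons from the ATG up to, but not including, the stop codon. Stretches
    that run off the end of the sequence without a stop are not reported.
    `min_codons=100` is an arbitrary illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for offset in range(3):
        start = None
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, offset))
                start = None  # look for the next start codon in this frame
    return orfs


print(float(expected_codons_per_stop()))
print(find_orfs("CCATGGCTGCTGCTGCTGCTTAAGG", min_codons=5))
```

A more realistic predictor would also scan the reverse strand and score codon usage against the organism's known coding regions, as the text describes.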
Example
If a portion of a genome has been sequenced (e.g. 5'-ATCTAAAATGGGTGCC-3'), ORFs can be located by examining each of the three possible reading frames on each strand. On the strand shown, two out of three possible reading frames are entirely open, meaning that they do not contain a stop codon:
- ...A TCT AAA ATG GGT GCC...
- ...AT CTA AAA TGG GTG CC...
- ...ATC TAA AAT GGG TGC C...
Possible stop codons in DNA are "TGA", "TAA" and "TAG". Thus, the last reading frame in this example contains a stop codon (TAA), unlike the first two.
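The frame-by-frame check in this example can be reproduced with a short Python sketch. The helper names (`reverse_complement`, `reading_frames`, `report`) are hypothetical and chosen for illustration. Incomplete codons at the ends of the sequence are simply dropped, and the input is assumed to contain only the uppercase letters A, C, G and T.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def reading_frames(seq):
    """Split one strand into complete codons for offsets 0, 1 and 2."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


def report(seq):
    """Yield (strand, offset, codons, stops_found) for all six frames."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset, codons in enumerate(reading_frames(s)):
            stops = [c for c in codons if c in STOP_CODONS]
            yield strand, offset, codons, stops


for strand, offset, codons, stops in report("ATCTAAAATGGGTGCC"):
    status = "closed by " + ", ".join(stops) if stops else "open"
    print(f"{strand} strand, offset {offset}: {' '.join(codons)} -> {status}")
```

On the forward strand, offset 0 corresponds to the third frame listed above (the one containing TAA), and offsets 1 and 2 correspond to the first two. The sketch also reports the three frames of the reverse strand, which the list above does not show.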
See also
- Sequerome - A sequence profiling tool that links each BLAST record to the NCBI ORF enabling complete ORF analysis of a BLAST report.
External links
- Translation and Open Reading Frames
- NCBI ORF finder - A web-based interactive tool for predicting and analysing ORFs from nucleotide sequences.
- ORF finder - A web-based interactive tool for predicting and analysing ORFs from nucleotide sequences, hosted at bioinformatics.org
- hORFeome V5.1 - A web-based interactive tool for the CCSB Human ORFeome Collection