Kozak consensus sequence
The Kozak consensus sequence (also called the Kozak consensus or Kozak sequence) is a sequence that occurs on eukaryotic mRNA and has the consensus (gcc)gccRccAUGG, where R is a purine (adenine or guanine) three bases upstream of the start codon (AUG), which is followed by another G. The Kozak consensus sequence plays a major role in the initiation of translation. It is named after Marilyn Kozak, who discovered it.
Introduction
This sequence on an mRNA molecule is recognized by the ribosome as the translational start site, from which the protein encoded by that mRNA is synthesized. The ribosome requires this sequence, or a variation of it (see below), to initiate translation. The Kozak sequence is not to be confused with the ribosomal binding site (RBS), which is either the 5' cap of a messenger RNA or an internal ribosome entry site (IRES).
In vivo, this site is often not matched exactly on different mRNAs, and the amount of protein synthesized from a given mRNA depends on the strength of its Kozak sequence. Some nucleotides in the sequence are more important than others. The AUG is the most important because it is the actual initiation codon, encoding a methionine at the N-terminus of the protein. (Rarely, CUG, written CTG in DNA notation, is used as an initiation codon, encoding a leucine instead of the usual methionine.) The A of the AUG is numbered +1, and there is no position 0, so the base immediately upstream of it is -1. For a 'strong' consensus, the nucleotides at position +4 (G in the consensus) and position -3 (A or G in the consensus) must both match the consensus. An 'adequate' consensus matches at only one of these two positions, while a 'weak' consensus matches at neither. The cc at -1 and -2 is less conserved but contributes to the overall strength. There is also evidence that a G at position -6 is important for the initiation of translation.
There are examples in vivo of each of these types of Kozak consensus, and they probably evolved as yet another mechanism of gene regulation. Lmx1b is an example of a gene with a weak Kozak consensus sequence. For initiation of translation from such a site, other features are required in the mRNA sequence in order for the ribosome to recognize the initiation codon.
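The strong, adequate and weak categories described above can be sketched in a few lines of Python. This is only an illustration of the -3 and +4 rule as stated in this section, not a published scoring method: the function name `kozak_strength` and its parameters are hypothetical, the caller is assumed to already know where the start codon is, and the contributions of positions -1, -2 and -6 are ignored.

```python
def kozak_strength(mrna: str, aug_index: int) -> str:
    """Classify the context around an AUG as 'strong', 'adequate' or 'weak'.

    mrna      -- RNA or DNA sequence (T is treated as U)
    aug_index -- 0-based index of the A of the start codon (position +1)
    """
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")

    # There is no position 0, so position -3 is three bases before the A (+1)
    # and position +4 is the base immediately after the AUG.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else ""
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else ""

    # Count how many of the two key positions match the consensus.
    matches = (minus3 in ("A", "G")) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[matches]


if __name__ == "__main__":
    # The full consensus gccgccRccAUGG with R = A; the AUG starts at index 9.
    print(kozak_strength("GCCGCCACCAUGG", 9))
```

A missing base (an AUG too close to either end of the sequence) is simply counted as a mismatch in this sketch.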
Mutations
Research has shown that a G→C mutation at the -6 position of the human β-globin gene (β+45) affected the haematological and biosynthetic phenotype. This was the first mutation found in the Kozak sequence. It was found in a family from southeastern Italy whose members suffered from thalassaemia intermedia.
Variations in the consensus sequence
(gcc)gccRccAUGG
AGNNAUGN
ANNAUGG
ACCAUGG
GACACCAUGG
Biota | Phylum | Consensus sequences |
---|---|---|
Vertebrate | | gccRccATGG |
Fruit fly (Drosophila spp.) | Arthropoda | cAAaATG |
Budding yeast (Saccharomyces cerevisiae) | Ascomycota | aAaAaAATGTCt |
Slime mold (Dictyostelium discoideum) | Amoebozoa | aaaAAAATGRna |
Ciliate | Ciliophora | nTaAAAATGRct |
Malarial protozoa (Plasmodium spp.) | Apicomplexa | taaAAAATGAan |
Toxoplasma (Toxoplasma gondii) | Apicomplexa | gncAaaATGg |
Trypanosomatidae | Euglenozoa | nnnAnnATGnC |
Terrestrial plants | | AACAATGGC |
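
The consensus strings in the table can be turned into search patterns with a short Python sketch. It rests on assumptions that are not stated in the table itself: R is read as a purine (A or G) and n as any base, following the usual IUPAC nucleotide codes, and letter case is ignored, so every position is treated as required. That is stricter than a real consensus, where some positions are only the most common base. The function names are hypothetical.

```python
import re

# IUPAC nucleotide codes that appear in the table (R = purine, N = any base).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Compile a consensus such as 'gccRccATGG' into a regular expression."""
    letters = consensus.upper().replace("U", "T")
    return re.compile("".join(IUPAC[ch] for ch in letters))


def find_sites(consensus: str, sequence: str) -> list:
    """Return the 0-based start offsets of every match, including overlaps."""
    dna = sequence.upper().replace("U", "T")
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits.
    pattern = re.compile("(?=" + consensus_to_regex(consensus).pattern + ")")
    return [m.start() for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Vertebrate consensus against a short made-up test sequence.
    print(find_sites("gccRccATGG", "TTGCCACCATGGCG"))
```

Because both RNA and DNA input are normalized to DNA letters, the same function accepts the AUG-style variants listed above as well as the ATG-style entries in the table.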