Gene synthesis
Artificial gene synthesis is the process of synthesizing a gene in vitro without the need for initial template DNA samples. The main method is currently oligonucleotide synthesis (also used for other applications) from digital genetic sequences, followed by annealing of the resulting fragments. In contrast, natural DNA replication requires existing DNA templates for synthesizing new DNA.
Synthesis of the first complete gene, a yeast tRNA, was demonstrated by Har Gobind Khorana and coworkers in 1972. Synthesis of the first peptide- and protein-coding genes was performed in the laboratories of Herbert Boyer and Alexander Markham, respectively.
Commercial gene synthesis services are now available from numerous companies worldwide, some of which have built their business model around this task. Current gene synthesis approaches are most often based on a combination of organic chemistry and molecular biological techniques, and entire genes may be synthesized "de novo", without the need for precursor template DNA. Gene synthesis has become an important tool in many fields of recombinant DNA technology, including heterologous gene expression, vaccine development, gene therapy and molecular engineering. The synthesis of nucleic acid sequences is often more economical than classical cloning and mutagenesis procedures. The market for gene synthesis grew steadily in the years up to 2007; experts estimated that its volume would reach US$40 million by the end of that year.
Gene Optimization
While the ability to make increasingly long stretches of DNA efficiently and at lower prices is a technological driver of this field, attention is increasingly being focused on improving the design of genes for specific purposes. Early in the genome sequencing era, gene synthesis was used as an (expensive) source of cDNAs that were predicted by genomic or partial cDNA information but were difficult to clone. As higher-quality sources of sequence-verified cloned cDNA have become available, this practice has become less urgent.
Producing large amounts of protein from gene sequences (or at least from the protein-coding regions of genes, the open reading frames) found in nature can sometimes prove difficult, and the problem has sufficient impact that scientific conferences have been devoted to the topic. Many of the most interesting proteins sought by molecular biologists are normally regulated to be expressed in very low amounts in wild-type cells. Redesigning these genes offers a means to improve gene expression in many cases. Rewriting the open reading frame is possible because of the degeneracy of the genetic code: up to about a third of the nucleotides in an open reading frame can be changed while still producing the same protein. The number of alternative designs possible for a given protein is astronomical. For a typical protein sequence of 300 amino acids there are over 10^150 codon combinations that will encode an identical protein. Optimization methods such as replacing rarely used codons with more common codons sometimes have dramatic effects. Further optimizations, such as removing RNA secondary structures, can also be included. At least in the case of E. coli, protein expression is maximized by predominantly using codons corresponding to tRNAs that retain amino acid charging during starvation. Computer programs written to perform these and other optimizations simultaneously are used to handle the enormous complexity of the task. A well-optimized gene can improve protein expression 2- to 10-fold, and in some cases improvements of more than 100-fold have been reported. Because of the large number of nucleotide changes made to the original DNA sequence, the only practical way to create the newly designed genes is to use gene synthesis.
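The size of this design space can be explored with a few lines of Python. The sketch below builds the standard genetic code table, computes the logarithm of the number of synonymous DNA sequences for a protein, and recodes an open reading frame from a caller-supplied table of preferred codons. The function names and the `preferred` mapping are illustrative assumptions, not taken from any particular optimization program, and real tools weigh many more criteria than codon choice.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (the degeneracy of the code).
DEGENERACY = {}
for aa in CODON_TABLE.values():
    DEGENERACY[aa] = DEGENERACY.get(aa, 0) + 1


def translate(orf):
    """Translate a DNA open reading frame (length divisible by 3) to protein."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def log10_synonymous_designs(protein):
    """Return log10 of the number of DNA sequences that encode `protein`."""
    return sum(math.log10(DEGENERACY[aa]) for aa in protein)


def recode(orf, preferred):
    """Swap each codon for the caller's preferred synonymous codon, if any.

    `preferred` maps an amino acid letter to a codon. It is a hypothetical
    input that would normally be derived from a host codon-usage table.
    """
    out = []
    for i in range(0, len(orf), 3):
        codon = orf[i:i + 3]
        choice = preferred.get(CODON_TABLE[codon], codon)
        # Guard against a table entry that would change the protein.
        if CODON_TABLE[choice] != CODON_TABLE[codon]:
            raise ValueError(f"{choice} is not synonymous with {codon}")
        out.append(choice)
    return "".join(out)
```

For example, `recode("ATGTTATAA", {"L": "CTG"})` replaces the leucine codon TTA with CTG and leaves the encoded protein unchanged; the choice of CTG here is only a placeholder, not a recommendation for any host.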
Chemical synthesis of oligonucleotides
Oligonucleotides are chemically synthesized from building blocks called phosphoramidites: nucleotide derivatives carrying protecting groups that prevent their amine, hydroxyl and phosphate groups from reacting incorrectly. One phosphoramidite is added at a time; the 5' hydroxyl group of the growing chain is then deprotected and the next base is added, so the chain grows in the 3' to 5' direction, the reverse of biological synthesis. At the end, all the protecting groups are removed. Nevertheless, because this is a chemical process, some incorrect reactions occur and lead to defective products. The longer the oligonucleotide being synthesized, the more defects there are, so the process is only practical for producing short sequences of nucleotides. The current practical limit is about 200 nucleotides for an oligonucleotide of sufficient quality to be used directly in a biological application. HPLC can be used to isolate products with the proper sequence. Large numbers of oligos can also be synthesized in parallel on gene chips, but for optimal performance in subsequent gene synthesis procedures they should be prepared individually and at larger scales.
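The length limit follows from simple arithmetic. If each coupling step succeeds with probability p, an oligonucleotide of n bases needs n - 1 couplings, so the fraction of full-length product is roughly p^(n-1). The sketch below assumes a uniform, independent coupling efficiency and ignores other side reactions; the function names and the 99% figure used in the note that follows are illustrative assumptions, not values from this article.

```python
def full_length_fraction(n_bases, coupling_efficiency):
    """Approximate fraction of chains that reach full length.

    Assumes every coupling succeeds independently with the same
    probability. The first base is already attached to the support,
    so an n-base oligo needs n - 1 couplings.
    """
    if n_bases < 1:
        raise ValueError("n_bases must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (n_bases - 1)


def max_length_for_yield(min_fraction, coupling_efficiency):
    """Longest oligo whose full-length fraction stays at or above min_fraction."""
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    if not 0.0 <= coupling_efficiency < 1.0:
        raise ValueError("coupling_efficiency must be in [0, 1)")
    n = 1
    while full_length_fraction(n + 1, coupling_efficiency) >= min_fraction:
        n += 1
    return n
```

With an assumed coupling efficiency of 99%, `full_length_fraction(200, 0.99)` comes out below 15%, while a 50-base oligo retains roughly 60% full-length product, which illustrates why quality falls off so quickly with length.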
Annealing based connection of oligonucleotides
Usually, a set of individually designed oligonucleotides is made on automated solid-phase synthesizers, purified and then connected by specific annealing and standard ligation or polymerase reactions. To improve the specificity of oligonucleotide annealing, the synthesis step relies on a set of thermostable DNA ligase and polymerase enzymes. To date, several methods for gene synthesis have been described, such as the ligation of phosphorylated overlapping oligonucleotides, the Fok I method and a modified form of ligase chain reaction. Additionally, several PCR assembly approaches have been described. They usually employ overlapping oligonucleotides 40-50 nt long. These oligonucleotides are designed to cover most of the sequence of both strands, and the full-length molecule is generated progressively by overlap extension (OE) PCR, thermodynamically balanced inside-out (TBIO) PCR or combined approaches. The most commonly synthesized genes range in size from 600 to 1,200 bp, although much longer genes have been made by connecting previously assembled fragments of under 1,000 bp. In this size range it is necessary to test several candidate clones, confirming the sequence of the cloned synthetic gene by automated sequencing methods.
Limitations
Because the assembly of the full-length gene product relies on the efficient and specific alignment of long single-stranded oligonucleotides, sequence features that are critical for synthesis success include extended regions that form secondary structures caused by inverted repeats, extraordinarily high or low GC content, and repetitive structures. Usually such segments of a gene can only be synthesized by splitting the procedure into several consecutive steps followed by a final assembly of shorter sub-sequences, which in turn significantly increases the time and labor needed for production.
The result of a gene synthesis experiment depends strongly on the quality of the oligonucleotides used. For annealing-based gene synthesis protocols, the quality of the product depends directly and exponentially on the correctness of the oligonucleotides employed. Alternatively, if gene synthesis is performed with oligos of lower quality, more effort must be made in downstream quality assurance during clone analysis, which is usually done by time-consuming standard cloning and sequencing procedures.
Another problem associated with all current gene synthesis methods is the high frequency of sequence errors caused by the use of chemically synthesized oligonucleotides. The error frequency increases with oligonucleotide length, and as a consequence the percentage of correct product decreases dramatically as more oligonucleotides are used.
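The exponential dependence on oligonucleotide correctness can be made concrete. Assuming errors occur independently at a fixed per-base rate e, the chance that a gene of L bases is entirely correct is (1 - e)^L, and the number of clones to sequence for a given chance of finding at least one correct clone follows from that. This is a simplified model with hypothetical function names; the error rate in the note that follows is chosen only for illustration.

```python
import math


def error_free_probability(length_bp, per_base_error):
    """Probability that a synthetic gene of length_bp bases has no errors,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - per_base_error) ** length_bp


def clones_to_screen(length_bp, per_base_error, confidence=0.95):
    """Smallest number of clones giving at least `confidence` probability
    that one or more of them is error-free."""
    p_ok = error_free_probability(length_bp, per_base_error)
    if p_ok <= 0.0:
        raise ValueError("no clone is expected to be correct")
    if p_ok >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_ok))
```

Under a hypothetical rate of one error per 500 bases, only about 13% of 1,000 bp clones would be correct, so around 20 clones would have to be sequenced to be reasonably sure of finding a good one. Doubling the gene length increases that number several times over.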
The mutation problem could be reduced by using shorter oligonucleotides to assemble the gene. However, all annealing-based assembly methods require the primers to be mixed together in one tube. In this case, shorter overlaps do not always allow precise and specific annealing of complementary primers, which inhibits formation of the full-length product.
Manual design of oligonucleotides is a laborious procedure and does not guarantee the successful synthesis of the desired gene. For optimal performance of almost all annealing-based methods, the melting temperatures of the overlapping regions should be similar for all oligonucleotides. The necessary primer optimization should be performed using specialized oligonucleotide design programs, and several solutions for automated primer design for gene synthesis have been presented.
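A minimal sketch of what such a design program does is given here: it splits a target sequence into overlapping oligonucleotides on alternating strands and reports an estimated melting temperature for each overlap. The Wallace rule (2 °C per A or T, 4 °C per G or C) is a rough approximation intended for short oligonucleotides; dedicated design tools generally use more detailed thermodynamic models and shift the overlap boundaries to equalize the values. The function names, oligo length and overlap length are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees Celsius."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tile_oligos(target, oligo_len=40, overlap=20):
    """Split `target` into oligos that overlap neighbours by `overlap` bases.

    Even-numbered oligos are taken from the top strand and odd-numbered
    ones from the bottom strand, so adjacent oligos can anneal.
    Returns (oligos, overlap_tms).
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos, overlap_tms = [], []
    start, index = 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        piece = target[start:end]
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
        if end == len(target):
            break
        # Region shared with the next oligo.
        overlap_tms.append(wallace_tm(target[end - overlap:end]))
        start += step
        index += 1
    return oligos, overlap_tms
```

A wide spread in the returned `overlap_tms` flags junctions whose boundaries should be shifted before the oligonucleotides are ordered.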
Error correction procedures
To overcome problems associated with oligonucleotide quality, several elaborate strategies have been developed, employing either separately prepared fishing oligonucleotides, mismatch-binding enzymes of the mutS family, or specific endonucleases from bacteria or phages. Nevertheless, all these strategies increase the time and cost of gene synthesis based on the annealing of chemically synthesized oligonucleotides.
Increasingly, genes are ordered in sets that include functionally related genes or multiple sequence variants of a single gene. Virtually all of the therapeutic proteins in development, such as monoclonal antibodies, are optimized by testing many gene variants for improved function or expression.
Applications
Major applications of synthetic genes include the synthesis of DNA sequences identified by high-throughput sequencing but never cloned into plasmids, and the ability to safely obtain genes for vaccine research without the need to grow the full pathogens. Manipulating the digital genetic code before it is synthesized into DNA can be used to optimize protein expression in a particular host, or to remove non-functional segments in order to facilitate further replication of the DNA.
Synthia and Mycoplasma laboratorium
On June 28, 2007, a team at the J. Craig Venter Institute published an article in Science Express, saying that they had successfully transplanted the natural DNA from a Mycoplasma mycoides bacterium into a Mycoplasma capricolum cell, creating a bacterium which behaved like M. mycoides.
On Oct 6, 2007, Craig Venter announced in an interview with the UK's The Guardian newspaper that the same team had chemically synthesized a modified version of the single chromosome of Mycoplasma genitalium. The chromosome was modified to eliminate all genes which tests in live bacteria had shown to be unnecessary. The next planned step in this minimal genome project is to transplant the synthesized minimal genome into a bacterial cell with its old DNA removed; the resulting bacterium will be called Mycoplasma laboratorium. The next day the Canadian bioethics group ETC Group issued a statement through its representative, Pat Mooney, saying Venter's "creation" was "a chassis on which you could build almost anything". The synthesized genome had not yet been transplanted into a working cell.
On May 21, 2010, Science reported that the Venter group had successfully synthesized the genome of the bacterium Mycoplasma mycoides from a computer record and transplanted the synthesized genome into an existing cell of a Mycoplasma capricolum bacterium that had had its DNA removed. The "synthetic" bacterium was viable, i.e. capable of replicating billions of times. The team had originally planned to use the M. genitalium bacterium they had previously been working with, but switched to M. mycoides because the latter bacterium grows much faster, which translated into quicker experiments. Venter describes it as "the first species.... to have its parents be a computer". The transformed bacterium is dubbed "Synthia" by ETC. A Venter spokesperson has declined to confirm any breakthrough at the time of this writing, likely because similar genetic introduction techniques such as transfection, transformation, transduction and protofection have been a standard research practice for many years.
Now that the technique has been proven to work with the M. mycoides genome, the next project is presumably to go back to the minimized M. genitalium and transplant it into a cell to create the previously mentioned M. laboratorium.
External links
- GeneSpace.net - a directory of commercial gene synthesis providers
- Craig Venter: On the Verge of Creating Synthetic Life - TED (Technology Entertainment Design) conference (video)