Massive parallel sequencing
Massive parallel sequencing is a term used to describe several revolutionary approaches to DNA sequencing, the so-called next-generation sequencing (NGS) or second-generation sequencing technologies. These technologies emerged in late 1996 and have been commercially available since 2005. They use miniaturized and parallelized platforms that allow the simultaneous sequencing of one million to several hundred million typically short reads (50-400 bases) from amplified DNA clones.
Although the commercially available NGS platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm: sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. This design is a paradigm shift from Sanger sequencing, also known as capillary sequencing or first-generation sequencing, which is based on the electrophoretic separation of chain-termination products produced in individual sequencing reactions.
NGS Platforms
The commercially available next-generation sequencing platforms differ from traditional Sanger sequencing technology in a number of ways. First, the DNA sequencing libraries are clonally amplified in vitro, obviating the need for time-consuming and laborious cloning of the DNA library into bacteria. Second, the DNA is sequenced by synthesis, such that the DNA sequence is determined by the addition of nucleotides to the complementary strand rather than through chain-termination chemistry. Finally, the spatially segregated, amplified DNA templates are sequenced simultaneously in a massively parallel fashion without the requirement for a physical separation step. While these advances are shared across all commercially available high-throughput sequencing platforms, each uses a slightly different strategy.
As a massively parallel process, NGS generates hundreds of megabases to gigabases of nucleotide sequence output in a single instrument run, depending on the platform. The time and cost needed to determine nucleic acid composition decrease substantially as a result, enabling changes in genome sequencing approaches that are essential to many of the biological sciences.
New technologies are on the horizon, and instruments expected to reach the market soon should decrease sequencing costs further and eventually achieve the ultimate goal of a $1000 genome.
Currently, five massively parallel sequencing platforms are commercially available (their features are summarized in the table below). As this field is advancing rapidly, readers are referred to the manufacturers’ websites for the most current information regarding technical specifications and pricing.
Platform | Template preparation | Chemistry | Read length (bases) | Run time (days)* | Gb per run† |
---|---|---|---|---|---|
Roche 454 | Clonal-emPCR | Pyrosequencing | 400‡ | 0.42 | 0.40-0.60 |
GS FLX Titanium | Clonal-emPCR | Pyrosequencing | 400‡ | 0.42 | 0.035 |
Illumina | Clonal bridge amplification | Reversible dye terminator | 35-100 | 2-4 | 30-100 |
HiSeq 2000 | Clonal bridge amplification | Reversible dye terminator | 35-100 | 2-4 | 9-25 |
Genome Analyzer IIx, IIe | Clonal bridge amplification | Reversible dye terminator | 35-100 | 2-5 | 3.5-10 |
iScanSQ | Clonal bridge amplification | Reversible dye terminator | 35-75 | 2.5-5 | 4-10 |
Life Technologies SOLiD 4 | Clonal-emPCR | Oligonucleotide probe ligation | 35-50 | 4-7 | 35-50 |
Helicos BioSciences HeliScope | Single molecule | Reversible dye terminator | 35‡ | 8 | 25 |
Pacific Biosciences SMRT | Single molecule | Phospholinked fluorescent nucleotides | 800-1000 | 0.2 | Pending |
*Run times and †gigabase (Gb) output per run are given for single-end sequencing.
Run times and outputs approximately double when performing paired-end sequencing.
‡Average read lengths for the Roche 454 and Helicos BioSciences platforms.
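The relationship between read count, read length and run output can be illustrated with a little arithmetic. The following is a minimal sketch, not a vendor formula: the function names are hypothetical, and the 3 Gb genome size used in the example is an assumed round figure rather than a value taken from the table.

```python
def run_output_gb(n_reads: int, read_length: int, paired_end: bool = False) -> float:
    """Approximate output of one run in gigabases (1 Gb = 1e9 bases)."""
    bases = n_reads * read_length
    if paired_end:
        # Per the table footnote, output roughly doubles for paired-end runs.
        bases *= 2
    return bases / 1e9


def mean_coverage(output_gb: float, genome_size_bases: float) -> float:
    """Average number of times each genome position is sequenced."""
    return output_gb * 1e9 / genome_size_bases


# One million reads of 400 bases give 0.4 Gb, the lower end of the
# output range listed for pyrosequencing in the table.
print(run_output_gb(1_000_000, 400))

# Assumed example: a 30 Gb run over a 3 Gb genome (hypothetical round number).
print(mean_coverage(30.0, 3e9))
```

Coverage computed this way is an average only; it ignores reads that fail quality filters or cannot be aligned.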
Template Preparation Methods used by NGS technologies
Two kinds of template are used in NGS reactions: clonally amplified templates originating from single DNA molecules, and single DNA molecule templates.
Clonally amplified templates: most imaging systems have not been designed to detect single fluorescent events, so amplified templates are required. The two most common methods are emulsion PCR (emPCR) and solid-phase amplification.
Emulsion PCR
In emulsion PCR, single-stranded DNA fragments (templates) are attached to the surface of beads using adaptors or linkers, such that each bead carries a single DNA fragment from the DNA library. The DNA library is generated through random fragmentation of the genomic DNA. The surface of the beads carries oligonucleotide probes with sequences complementary to the adaptors on the DNA fragments.
The beads are then compartmentalized into separate droplets of a water-in-oil emulsion. Each aqueous droplet that captures one bead serves as a PCR microreactor in which amplification produces clonally amplified copies of the DNA fragment.
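Why the one-fragment-per-bead condition matters can be explored with a simple model. The sketch below assumes that fragments bind to beads independently and at random, so that the number of fragments per bead follows a Poisson distribution; this assumption, the function names and the example ratios are illustrative and do not come from the text above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def bead_loading(fragments_per_bead: float) -> dict:
    """Fractions of beads that are empty, clonal (one fragment) or mixed."""
    empty = poisson_pmf(0, fragments_per_bead)
    clonal = poisson_pmf(1, fragments_per_bead)
    return {"empty": empty, "clonal": clonal, "mixed": 1.0 - empty - clonal}


# Hypothetical mean fragment-to-bead ratios.
for ratio in (0.1, 0.5, 1.0):
    fractions = bead_loading(ratio)
    print(ratio, {name: round(value, 3) for name, value in fractions.items()})
```

Under this model a low fragment-to-bead ratio keeps mixed beads rare at the cost of many empty beads, which suggests why dilute libraries are used.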
Bridge amplification on solid surface
High-density forward and reverse primers are covalently attached to the slide in a flow cell. The ratio of the primers to the template on the support defines the surface density of the amplified clusters. The flow cell is exposed to reagents for polymerase-based extension, and priming occurs as the free (distal) end of a ligated fragment "bridges" to a complementary oligonucleotide on the surface. Repeated denaturation and extension result in localized amplification of DNA fragments in millions of unique locations across the flow cell surface.
Solid-phase amplification can produce 100–200 million spatially separated template clusters (Illumina/Solexa), providing free ends to which a universal sequencing primer can be hybridized to initiate the NGS reaction.
Single-molecule templates
Some protocols for clonally amplified templates are cumbersome to implement and require a large amount of genomic DNA (3–20 μg).
The preparation of single-molecule templates is more straightforward and requires less starting material (<1 μg). More importantly, these methods do not require PCR, which creates mutations in clonally amplified templates that masquerade as sequence variants. AT-rich and GC-rich target sequences may also show amplification bias in product yield, which results in their underrepresentation in genome alignments and assemblies.
Single-molecule templates are usually immobilized on solid supports using one of at least three different approaches. In the first approach, spatially distributed individual primer molecules are covalently attached to the solid support. The template, which is prepared by randomly fragmenting the starting material into small sizes (for example, ~200–250 bp) and adding common adaptors to the fragment ends, is then hybridized to the immobilized primer. In the second approach, spatially distributed single-molecule templates are covalently attached to the solid support by priming and extending single-stranded, single-molecule templates from immobilized primers. A common primer is then hybridized to the template.
In either approach, DNA polymerase can bind to the immobilized primed template configuration to initiate the NGS reaction. Both of these approaches are used by Helicos BioSciences. In a third approach, spatially distributed single polymerase molecules are attached to the solid support, to which a primed template molecule is bound. This approach is used by Pacific Biosciences. Larger DNA molecules (up to tens of thousands of base pairs) can be used with this technique and, unlike the first two approaches, the third approach can be used with real-time methods, resulting in potentially longer read lengths.
Pyrosequencing
Pyrosequencing is a non-electrophoretic, bioluminescence method that measures the release of inorganic pyrophosphate by proportionally converting it into visible light using a series of enzymatic reactions. Unlike other sequencing approaches that use modified nucleotides to terminate DNA synthesis, the pyrosequencing method manipulates DNA polymerase by the single addition of a dNTP in limiting amounts.
Upon incorporation of the complementary dNTP, DNA polymerase extends the primer and pauses. DNA synthesis is reinitiated following the addition of the next complementary dNTP in the dispensing cycle.
The order and intensity of the light peaks are recorded as flowgrams, which reveal the underlying DNA sequence.
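How a flowgram maps back to a sequence can be sketched in a few lines. This is a simplified illustration, not the instrument's base-calling algorithm: it assumes that the light signal is normalised so that one incorporated base gives an intensity near 1.0, and the flow order "TACG" and the intensity values are made-up example inputs.

```python
def decode_flowgram(flow_order: str, intensities: list) -> str:
    """Turn per-flow light intensities into the synthesised strand.

    Each flow dispenses one dNTP; a signal near n means that n identical
    bases (a homopolymer of length n) were incorporated in that flow.
    """
    bases = []
    for i, signal in enumerate(intensities):
        nucleotide = flow_order[i % len(flow_order)]  # dNTP dispensed in this flow
        count = max(0, round(signal))                 # nearest whole number of bases
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical flowgram covering two rounds of the T, A, C, G dispensing cycle.
flows = [1.02, 0.05, 2.10, 0.00, 0.00, 0.97, 0.00, 1.10]
print(decode_flowgram("TACG", flows))  # TCCAG
```

The decoded string is the newly synthesised strand, which is complementary to the template. Rounding also hints at why long homopolymers are hard to call: the difference between signals for, say, 7 and 8 bases is small relative to noise.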
Sequencing by reversible terminator chemistry
This approach uses reversible terminator-bound dNTPs in a cyclic method that comprises nucleotide incorporation, fluorescence imaging and cleavage. A fluorescently labelled terminator is imaged as each dNTP is added and then cleaved to allow incorporation of the next base.
These nucleotides are chemically blocked such that each incorporation is a unique event. An imaging step follows each base incorporation step, then the blocking group is chemically removed to prepare each strand for the next incorporation by DNA polymerase. This series of steps continues for a specific number of cycles, as determined by user-defined instrument settings.
Sequencing by reversible terminator chemistry can be a four-colour cycle, as used by Illumina/Solexa, or a one-colour cycle, as used by Helicos BioSciences.
Helicos BioSciences uses “virtual terminators”, which are unblocked terminators with a second nucleoside analogue that acts as an inhibitor. These terminators carry the appropriate terminating or inhibiting groups so that DNA synthesis is terminated after a single base addition.
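For the four-colour case, the imaging step of each cycle can be pictured as choosing the brightest of four channels. The sketch below is a deliberately naive illustration under that assumption; real base callers also correct for channel cross-talk and phasing, and the function name, channel order and intensity values here are hypothetical.

```python
CHANNELS = "ACGT"  # assumed order of the four colour channels


def call_bases(cycle_intensities, n_cycles=None):
    """Call one base per cycle for a single cluster.

    cycle_intensities: one (A, C, G, T) intensity tuple per imaging step.
    n_cycles: optional cap, mirroring the user-defined number of cycles.
    """
    cycles = cycle_intensities if n_cycles is None else cycle_intensities[:n_cycles]
    read = []
    for signal in cycles:
        # Only one blocked nucleotide is incorporated per cycle, so the
        # brightest channel identifies it.
        brightest = max(range(len(CHANNELS)), key=lambda i: signal[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Made-up intensities for four cycles of one cluster.
cluster = [
    (0.91, 0.03, 0.04, 0.02),
    (0.05, 0.02, 0.88, 0.05),
    (0.02, 0.04, 0.03, 0.93),
    (0.06, 0.85, 0.05, 0.04),
]
print(call_bases(cluster))              # AGTC
print(call_bases(cluster, n_cycles=2))  # AG
```

Because exactly one base is added per cycle, read length equals the number of cycles run.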
Sequencing-by-ligation mediated by ligase enzymes
In this approach, the sequence extension reaction is carried out not by polymerases but by DNA ligase and either one-base-encoded probes or two-base-encoded probes. In its simplest form, a fluorescently labelled probe hybridizes to its complementary sequence adjacent to the primed template. DNA ligase is then added to join the dye-labelled probe to the primer. Non-ligated probes are washed away, followed by fluorescence imaging to determine the identity of the ligated probe.
The cycle can be repeated either by using cleavable probes to remove the fluorescent dye and regenerate a 5′-PO4 group for subsequent ligation cycles, or by removing the extended primer and hybridizing a new primer to the template.
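Two-base encoding can be made concrete with a small example. The sketch below assumes the commonly described four-colour scheme in which each colour reports a pair of adjacent bases and equals the XOR of the two base indices (A=0, C=1, G=2, T=3); the text above does not spell out the scheme, so this mapping and the function names are assumptions made for illustration.

```python
BASES = "ACGT"  # index of each base is used in the XOR colour code


def encode_colors(sequence: str) -> list:
    """Convert a base sequence into colours, one per adjacent base pair."""
    indices = [BASES.index(base) for base in sequence]
    return [a ^ b for a, b in zip(indices, indices[1:])]


def decode_colors(first_base: str, colors: list) -> str:
    """Rebuild the base sequence from a known first base and its colours."""
    current = BASES.index(first_base)
    decoded = [first_base]
    for color in colors:
        current ^= color  # each colour gives the step to the next base
        decoded.append(BASES[current])
    return "".join(decoded)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because every base after the first is decoded from the one before it, a single miscalled colour alters all downstream bases, whereas a true single-base variant changes two adjacent colours; this property is often used to tell sequencing errors from real variants.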
Phospholinked Fluorescent Nucleotides or Real-time sequencing
Pacific Biosciences is currently leading this method. Real-time sequencing involves imaging the continuous incorporation of dye-labelled nucleotides during DNA synthesis: single DNA polymerase molecules are attached to the bottom surface of individual zero-mode waveguide (ZMW) detectors that can obtain sequence information while phospholinked nucleotides are being incorporated into the growing primer strand.
Pacific Biosciences uses a unique DNA polymerase that incorporates phospholinked nucleotides more efficiently and enables the resequencing of closed circular templates.
Currently this method needs further improvement because of its high error rate, consisting of deletions, insertions and mismatches.