DNA Encoded Chemical Library
DNA-encoded chemical libraries (DEL) are a new technology for the synthesis and screening of collections of chemical compounds of unprecedented size and quality. DEL represents an advance in medicinal chemistry which bridges the fields of combinatorial chemistry and molecular biology. The driving force for the development of DEL technology is to improve and streamline the drug discovery process, in particular early-phase discovery activities such as target validation and hit discovery.
DEL technology involves the conjugation of chemical compounds or building blocks to short DNA fragments that serve as identification bar codes and, in some cases, also direct and control the chemical synthesis. The technique, in principle, enables the mass creation and interrogation of libraries via affinity selection, typically on an immobilized protein target. In contrast to conventional screening procedures such as high-throughput screening, biochemical assays are not required for binder identification, in principle allowing the isolation of binders to a wide range of proteins historically difficult to tackle with conventional screening technologies. So, in addition to the general discovery of target-specific molecular compounds, the availability of binders to pharmacologically important but so far “undruggable” target proteins opens new possibilities to develop novel drugs for diseases that could not be treated so far. Because the activity of hits does not have to be assessed initially, it is hoped and expected that many of the high-affinity binders identified will prove active in independent analysis of selected hits, thereby offering an efficient method to identify high-quality hits and pharmaceutical leads.
Preferential binders isolated from an affinity-based selection can be PCR-amplified and decoded on complementary oligonucleotide microarrays or by concatenation of the codes, subcloning and sequencing. The individual building blocks can eventually be conjugated using suitable linkers to yield a drug-like high-affinity compound. The characteristics of the linker (e.g. length, flexibility, geometry, chemical nature and solubility) influence the binding affinity and the chemical properties of the resulting binder (Fig. 3).
Bio-panning experiments on human serum albumin (HSA) with a 600-member ESAC library allowed the isolation of the 4-(p-iodophenyl)butanoic moiety. This moiety represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation.
ESAC technology has been used for the isolation of potent inhibitors of bovine trypsin and for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes, as well as in disease processes such as arthritis and metastasis.
technology. The DNA-routing machinery consists of a series of connected columns bearing resin-bound anticodons, which sequence-specifically separate a population of DNA templates into spatially distinct locations by hybridization. Using this split-and-pool protocol, a DNA-encoded combinatorial peptide library of 10^6 members was generated.
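The routing step described above can be illustrated with a minimal Python sketch. It assumes that hybridization is modelled as exact reverse-complementarity between a codon on the template and the anticodon on a column; the function names, column names and sequences are hypothetical and chosen only for illustration, and real hybridization depends on conditions that are ignored here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def route(templates, anticodons, start, length):
    """Sort templates into columns by the codon found at one coding position.

    templates  : DNA template sequences (5'->3')
    anticodons : {column_name: anticodon sequence bound to that column's resin}
    start, length : location of the codon that is read in this routing step
    """
    columns = {name: [] for name in anticodons}
    unbound = []
    for template in templates:
        codon = template[start:start + length]
        for name, anticodon in anticodons.items():
            # Simplifying assumption: a template is captured only by a
            # perfectly complementary anticodon.
            if reverse_complement(codon) == anticodon:
                columns[name].append(template)
                break
        else:
            unbound.append(template)
    return columns, unbound


# Hypothetical 4-base codons flanked by constant bases.
anticodons = {
    "column_1": reverse_complement("ACCA"),
    "column_2": reverse_complement("GTTG"),
}
templates = ["TTACCACC", "TTGTTGCC", "TTACCAGG", "TTCCCCCC"]
columns, unbound = route(templates, anticodons, start=2, length=4)
print(columns)
print(unbound)
```

After such a split, each column would receive its own building block before the templates are pooled again and routed on the next coding position.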
, which do not efficiently take place in solution at low concentration. A DNA heteroduplex was used to accelerate the reaction between chemical moieties displayed at the extremities of the two DNA strands. Furthermore, the "proximity effect", which accelerates the bimolecular reaction, was shown to be distance-independent (at least within a distance of 30 nucleotides). In a sequence-programmed fashion, oligonucleotides carrying one reactive chemical group were hybridized to complementary oligonucleotide derivatives carrying a different reactive chemical group. The proximity conferred by DNA hybridization drastically increases the effective molarity of the reagents attached to the oligonucleotides, enabling the desired reaction to occur even in an aqueous environment at concentrations several orders of magnitude lower than those needed for the corresponding conventional (non-DNA-templated) organic reaction. Using a DNA-templated set-up and sequence-programmed synthesis, Liu and co-workers generated a 64-member DNA-encoded library of macrocycles.
The center of the DNA junction constitutes a volume on the order of a yoctoliter, hence the name YoctoReactor. This volume contains a single-molecule reaction, yielding reaction concentrations in the high mM range. The effective concentration facilitated by the DNA greatly accelerates chemical reactions that would otherwise not take place at the actual concentration, which is several orders of magnitude lower.
The YoctoReactor (yR) design approach provides an unvarying reaction site with regard to both (a) the distance between reactants and (b) the sequence environment surrounding the reaction site. Furthermore, the intimate connection between the code and the building block (BB) on the oligo-BB moieties, which are mixed combinatorially in a single pot, confers high fidelity on the encoding of the library. Moreover, the code of the synthesized products is not preset but is assembled combinatorially, in synchrony with the product itself.
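The concentration quoted for a single molecule confined to a yoctoliter-scale volume can be checked with a short calculation. The Python sketch below is only an illustration: the example volumes of 1, 5 and 10 yoctoliters are assumptions, not measured dimensions of the DNA junction.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
YOCTOLITRE = 1e-24        # litres


def molar_concentration(molecules, volume_litres):
    """Concentration in mol/L of `molecules` confined to `volume_litres`."""
    return molecules / (AVOGADRO * volume_litres)


# Illustrative volumes only; the true reaction volume is not given in the text.
for volume_yl in (1, 5, 10):
    c = molar_concentration(1, volume_yl * YOCTOLITRE)
    print(f"{volume_yl:>2} yL -> {c * 1e3:.0f} mM")
```

Under these assumptions a single molecule corresponds to roughly 1.7 M in one yoctoliter and to about 170 mM in ten yoctoliters, which is in line with the "high mM" figure given above.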
-based methodology and high-throughput sequencing techniques represented the main methodologies for the decoding of DNA-encoded library selections.
into a vector. Subsequent Sanger sequencing of a representative number of the resulting colonies revealed the frequencies of the codes present in the DNA-encoded library sample before and after selection.
A DNA microarray is a device for high-throughput investigations widely used in molecular biology and in medicine. It consists of an arrayed series of microscopic spots (‘features’ or ‘locations’), each containing a few picomoles of oligonucleotides carrying a specific DNA sequence. This can be a short section of a gene or another DNA element, used as a probe to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of the target nucleic acid sequences. Microarrays have been used for the successful decoding of ESAC DNA-encoded libraries. The coding oligonucleotides representing the individual chemical compounds in the library are spotted and chemically linked onto the microarray slides using a BioChip Arrayer robot. Subsequently, the oligonucleotide tags of the binding compounds isolated from the selection are PCR-amplified using a fluorescent primer and hybridized onto the DNA microarray slide. Afterwards, the microarrays are analyzed by laser scanning, and the spot intensities are detected and quantified. The enrichment of the preferential binding compounds is revealed by comparing the spot intensities of the DNA microarray slide before and after selection.
technologies exploited strategies that parallelize the sequencing process, displacing the use of capillary electrophoresis and producing thousands or millions of sequences at once. In 2008, the first application of a high-throughput sequencing technique originally developed for genome sequencing (i.e. "454 technology") to the fast and efficient decoding of a DNA-encoded chemical library comprising 4000 compounds was described. This study led to the identification of novel chemical compounds with submicromolar dissociation constants towards streptavidin and demonstrated the feasibility of constructing, selecting and decoding DNA-encoded libraries containing millions of chemical compounds.
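Whichever read-out is used, decoding comes down to comparing how often each code is observed before and after selection. The following Python sketch shows one simple way to do this for sequencing reads. The pseudocount, the function name and the three-letter codes are assumptions made for illustration and are not taken from the studies described above.

```python
from collections import Counter


def enrichment(before_reads, after_reads, pseudocount=1):
    """Fold change in relative code frequency after selection.

    before_reads / after_reads : iterables of decoded code sequences
    pseudocount : added to every count so that unseen codes do not cause
                  a division by zero (an assumption of this sketch)
    """
    before, after = Counter(before_reads), Counter(after_reads)
    codes = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(codes)
    total_after = sum(after.values()) + pseudocount * len(codes)
    result = {}
    for code in codes:
        freq_before = (before[code] + pseudocount) / total_before
        freq_after = (after[code] + pseudocount) / total_after
        result[code] = freq_after / freq_before
    return result


# Hypothetical reads: three codes, equally represented before selection.
before = ["AAC"] * 10 + ["CGT"] * 10 + ["GGA"] * 10
after = ["AAC"] * 2 + ["CGT"] * 26 + ["GGA"] * 2
scores = enrichment(before, after)
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

Codes with a fold change well above 1 are candidates for binders. In practice the counts would come from many thousands of reads, and sampling noise would have to be taken into account.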
DNA-encoded chemical libraries and display technologies
Until recently, the application of molecular evolution in the laboratory had been limited to display technologies involving biological molecules, and small-molecule lead discovery was considered beyond the reach of this biological approach. DEL has opened the field of display technology to include non-natural compounds such as small molecules, extending the application of molecular evolution and natural selection to the identification of small-molecule compounds of desired activity and function.
DNA-encoded chemical libraries bear resemblance to biological display technologies such as antibody phage display, yeast display, mRNA display and aptamer SELEX. In antibody phage display, antibodies are physically linked to phage particles that bear the gene coding for the attached antibody, which is equivalent to a physical linkage of a “phenotype” (the protein) and a “genotype” (the gene encoding the protein). Phage-displayed antibodies can be isolated from large antibody libraries by mimicking molecular evolution: through rounds of selection (on an immobilized protein target), amplification and translation.
In DEL, the linkage of a small molecule to an identifier DNA code allows the facile identification of binding molecules. DEL libraries are subjected to affinity selection procedures on an immobilized target protein of choice, after which non-binders are removed by washing steps, and binders can subsequently be amplified by polymerase chain reaction (PCR) and identified by virtue of their DNA code (e.g. by DNA sequencing). In evolution-based DEL technologies (see below), hits can be further enriched by performing rounds of selection, PCR amplification and translation, in analogy to biological display systems such as antibody phage display. This makes it possible to work with much larger libraries.
History
The concept of DNA-encoding was first described in a theoretical paper by Brenner and Lerner in 1992, which proposed linking each molecule of a chemically synthesized entity to a particular oligonucleotide sequence constructed in parallel, and using this encoding genetic tag to identify and enrich active compounds. In 1993 the first practical implementation of this approach was presented by S. Brenner and K. Janda, and similarly by the group of M.A. Gallop. Brenner and Janda suggested generating individual encoded library members by an alternating parallel combinatorial synthesis of the heteropolymeric chemical compound and the appropriate oligonucleotide sequence on the same bead in a “split-&-pool”-based fashion (see below).
Since unprotected DNA is restricted to a narrow window of conventional reaction conditions, a number of alternative encoding strategies were envisaged until the end of the 1990s (i.e. MS-based compound tagging, peptide encoding, haloaromatic tagging, encoding by secondary amines, and semiconductor devices), mainly to avoid inconvenient solid-phase DNA synthesis and to create easily screenable combinatorial libraries in a high-throughput fashion. However, the selective amplifiability of DNA greatly facilitates library screening, and it becomes indispensable for the encoding of organic compound libraries of such unprecedented size. Consequently, at the beginning of the 2000s DNA-combinatorial chemistry experienced a revival.
The beginning of the millennium saw the introduction of several independent developments in DEL technology. These technologies can be classified under two general categories: non-evolution-based technologies and evolution-based DEL technologies capable of molecular evolution. The first category benefits from the ability to use off-the-shelf reagents and therefore enables rather straightforward library generation. Hits can be identified by DNA sequencing; however, DNA translation, and therefore molecular evolution, is not feasible with these methods. The split-and-pool approaches developed by researchers at Praecis Pharmaceuticals (now owned by GlaxoSmithKline) and Nuevolution (Copenhagen, Denmark), and the ESAC technology developed in the laboratory of Prof. D. Neri (Institute of Pharmaceutical Science, Zurich, Switzerland), fall under this category. ESAC technology sets itself apart as a combinatorial self-assembling approach which resembles fragment-based hit discovery (Fig. 1b). Here DNA annealing enables discrete building block combinations to be sampled, but no chemical reaction takes place between them.
Examples of evolution-based DEL technologies are DNA-routing, developed by Prof. D.R. Halpin and Prof. P.B. Harbury (Stanford University, Stanford, CA); DNA-templated synthesis, developed by Prof. D. Liu (Harvard University, Cambridge, MA) and commercialized by Ensemble Discovery (Cambridge, MA); and YoctoReactor technology, developed and commercialized by Vipergen (Copenhagen, Denmark). These technologies are described in further detail below. DNA-templated synthesis and YoctoReactor technology require the conjugation of chemical building blocks (BB) to a DNA oligonucleotide tag before library assembly, so more upfront work is required. Furthermore, the DNA-tagged BBs enable the generation of a genetic code for the synthesized compounds, and artificial translation of the genetic code is possible: the BBs can be recalled by the PCR-amplified genetic code, and the library compounds can be regenerated. This, in turn, enables the principle of Darwinian natural selection and evolution to be applied to small-molecule selection in direct analogy to biological display systems, through rounds of selection, amplification and translation.
Split-&-Pool DNA Encoding
In order to apply combinatorial chemistry for the synthesis of DNA-encoded chemical libraries, a split-&-pool approach was pursued. Initially, a set of unique DNA oligonucleotides (n), each containing a specific coding sequence, is chemically conjugated to a corresponding set of small organic molecules. Subsequently, the oligonucleotide-conjugated compounds are mixed (“pool”) and divided (“split”) into a number of groups (m). Under appropriate conditions, a second set of building blocks (m) is coupled to the first one, and a further oligonucleotide coding for the second modification is enzymatically introduced before mixing again. These “split-&-pool” steps can be iterated a number of times (r), increasing the library size at each round in a combinatorial manner (i.e. to n × m^r when each of the r rounds uses m building blocks).
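The combinatorial growth of a split-and-pool library can be made concrete with a small enumeration. The Python sketch below is a toy model under stated assumptions: building blocks and tags are represented as plain strings, every coupling is assumed to go to completion, and all names and sequences are hypothetical.

```python
from itertools import product


def split_and_pool(first_set, later_sets):
    """Enumerate every (compound, DNA code) pair of a split-and-pool library.

    first_set  : list of (building_block, tag) pairs conjugated in the first step
    later_sets : one list of (building_block, tag) pairs per later round
    """
    library = [([bb], tag) for bb, tag in first_set]
    for round_set in later_sets:
        next_library = []
        # "Split": the pooled conjugates are distributed over one vessel per
        # building block, so every conjugate meets every building block.
        for (blocks, code), (bb, tag) in product(library, round_set):
            # Couple the building block, then append the tag that records it.
            next_library.append((blocks + [bb], code + tag))
        # "Pool": the vessels are recombined before the next round.
        library = next_library
    return library


first = [("A1", "ACGT"), ("A2", "TGCA"), ("A3", "GATC")]  # 3 starting conjugates
round2 = [("B1", "AAAC"), ("B2", "CCCA")]                 # 2 building blocks
round3 = [("C1", "GGTT"), ("C2", "TTGG")]                 # 2 building blocks
library = split_and_pool(first, [round2, round3])
codes = {code for _, code in library}
print(len(library), len(codes))
```

In this toy model the library size is the product of the sizes of all building-block sets (here 3 × 2 × 2), and every compound carries a distinct concatenated code.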
Stepwise coupling of coding DNA fragments to nascent organic molecules
A promising strategy for the construction of DNA-encoded libraries is represented by the use of multifunctional building blocks covalently conjugate to an oligonucleotideOligonucleotide
An oligonucleotide is a short nucleic acid polymer, typically with fifty or fewer bases. Although they can be formed by bond cleavage of longer segments, they are now more commonly synthesized, in a sequence-specific manner, from individual nucleoside phosphoramidites...
serving as a “core structure” for library synthesis. In a ‘pool-and-split’ fashion a set of multifunctional scaffolds undergo orthogonal reactions with series of suitable reactive partners. Following each reaction step, the identity of the modification is encoded by an enzymatic addition of DNA segment to the original DNA “core structure”. The use of N-protected amino acid
Amino acid
Amino acids are molecules containing an amine group, a carboxylic acid group and a side-chain that varies between different amino acids. The key elements of an amino acid are carbon, hydrogen, oxygen, and nitrogen...
s covalently attached to a DNA fragment allow, after a suitable deprotection step, a further amide bond
Amide
In chemistry, an amide is an organic compound that contains the functional group consisting of a carbonyl group linked to a nitrogen atom . The term refers both to a class of compounds and a functional group within those compounds. The term amide also refers to deprotonated form of ammonia or an...
formation with a series of carboxylic acid
Carboxylic acid
s or a reductive amination with aldehydes. Similarly, diene carboxylic acids used as scaffolds for library construction at the 5’-end of an amino-modified oligonucleotide could be subjected to a Diels-Alder reaction with a variety of maleimide derivatives. After completion of the desired reaction step, the identity of the chemical moiety added to the oligonucleotide is established by annealing a partially complementary oligonucleotide and by a subsequent Klenow fill-in DNA polymerization, yielding a double-stranded DNA fragment. The synthetic and encoding strategies described above enable the facile construction of DNA-encoded libraries of up to 10^4 member compounds carrying two sets of “building blocks”. However, the stepwise addition of at least three independent sets of chemical moieties to a tri-functional core building block for the construction and encoding of a very large DNA-encoded library (comprising up to 10^6 compounds) can also be envisaged (Fig. 2).
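The library sizes quoted above follow from simple multiplication: one building block is drawn from each independent set, so the number of members is the product of the set sizes. The following minimal sketch illustrates this. The figure of 100 building blocks per set is an assumption chosen only to reproduce the 10^4 and 10^6 orders of magnitude, and `library_size` is a hypothetical helper name.

```python
from math import prod


def library_size(set_sizes):
    """Number of distinct compounds when one block is taken from each set."""
    return prod(set_sizes)


# Assumed, illustrative set sizes (100 building blocks per set).
two_sets = library_size([100, 100])         # two sets of building blocks
three_sets = library_size([100, 100, 100])  # three sets on a tri-functional core

print(two_sets, three_sets)
```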
Encoded Self-Assembling Chemical libraries
Encoded Self-Assembling Chemical (ESAC) libraries rely on the principle that two sublibraries of x members each (e.g. 10^3), containing a constant complementary hybridization domain, can yield after hybridization a combinatorial DNA-duplex library with a complexity of x^2 uniformly represented members (e.g. 10^6). Each sublibrary member consists of an oligonucleotide containing a variable coding region flanked by a constant DNA sequence and carrying a suitable chemical modification at the oligonucleotide extremity. The ESAC sublibraries can be used in at least four different embodiments.
- A sub-library can be paired with a complementary oligonucleotide and used as a DNA encoded library displaying a single covalently linked compound for affinity-based selection experiments.
- A sub-library can be paired with an oligonucleotide displaying a known binder to the target, thus enabling affinity maturation strategies.
- Two individual sublibraries can be assembled combinatorially and used for the de novo identification of bidentate binding molecules.
- Three different sublibraries can be assembled to form a combinatorial triplex library.
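The combinatorial pairing of two sublibraries can be sketched as a Cartesian product: every member of one sublibrary can hybridize with every member of the other through the constant domain. The labels `A0`, `B0` and so on, and the function name `assemble_duplexes`, are hypothetical placeholders rather than real compound or code identifiers.

```python
from itertools import product


def assemble_duplexes(sublibrary_a, sublibrary_b):
    """Lazily yield every (a, b) pair formed by hybridizing two sublibraries."""
    return product(sublibrary_a, sublibrary_b)


# Two hypothetical sublibraries of x = 10^3 members each.
sub_a = [f"A{i}" for i in range(1000)]
sub_b = [f"B{j}" for j in range(1000)]

# Counting the pairs confirms the x^2 complexity of the duplex library.
n_duplexes = sum(1 for _ in assemble_duplexes(sub_a, sub_b))
print(n_duplexes)
```

The same idea extends to the triplex embodiment by passing a third sublibrary to `product`.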
Preferential binders isolated from an affinity-based selection can be PCR-amplified and decoded on complementary oligonucleotide microarrays or by concatenation of the codes, subcloning and sequencing. The individual building blocks can eventually be conjugated using suitable linkers to yield a drug-like high-affinity compound. The characteristics of the linker (e.g. length, flexibility, geometry, chemical nature and solubility) influence the binding affinity and the chemical properties of the resulting binder (Fig. 3).
Bio-panning experiments on human serum albumin (HSA) with a 600-member ESAC library allowed the isolation of the 4-(p-iodophenyl)butanoic moiety. The compound represents the core structure of a series of portable albumin-binding molecules and of Albufluor™, a recently developed fluorescein angiographic contrast agent currently under clinical evaluation.
ESAC technology has been used for the isolation of potent inhibitors of bovine trypsin and for the identification of novel inhibitors of stromelysin-1 (MMP-3), a matrix metalloproteinase involved in both physiological and pathological tissue remodeling processes, as well as in disease processes such as arthritis and metastasis.
DNA-routing
In 2004, D.R. Halpin and P.B. Harbury presented a novel method for the construction of DNA-encoded libraries. For the first time, the DNA-conjugated templates served both to encode the library components and to program their “split-and-pool” synthesis. The design of Halpin and Harbury enabled alternating rounds of selection, PCR amplification and diversification with small organic molecules, in complete analogy to phage display technology. The DNA-routing machinery consists of a series of connected columns bearing resin-bound anticodons, which sequence-specifically separate a population of DNA templates into spatially distinct locations by hybridization. Using this split-and-pool protocol, a DNA-encoded combinatorial peptide library of 10^6 members was generated.
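The routing principle can be illustrated with a toy model: at each synthesis step, templates are split into "columns" according to the codon at the current position, each column adds its own building block, and the columns are pooled again. The three-letter codons, the codon-to-block table and the function names below are assumptions made for illustration only, and this is not a description of the actual experimental design.

```python
from collections import defaultdict


def route(templates, position, codon_length=3):
    """Split templates into columns keyed by the codon found at `position`."""
    columns = defaultdict(list)
    for template in templates:
        start = position * codon_length
        codon = template["dna"][start:start + codon_length]
        columns[codon].append(template)
    return columns


def split_and_pool(templates, codon_to_block, n_positions, codon_length=3):
    """Run one routed coupling step per codon position, pooling after each."""
    for position in range(n_positions):
        pooled = []
        for codon, members in route(templates, position, codon_length).items():
            for template in members:
                # Each column couples the block assigned to its anticodon.
                template["product"].append(codon_to_block[codon])
            pooled.extend(members)
        templates = pooled
    return templates


# Hypothetical codons and building blocks.
codon_to_block = {"AAA": "Gly", "CCC": "Ala"}
templates = [
    {"dna": "AAACCC", "product": []},
    {"dna": "CCCAAA", "product": []},
]
result = {t["dna"]: t["product"] for t in split_and_pool(templates, codon_to_block, 2)}
print(result)
```

Because the product is dictated by the template sequence, the selected templates can be amplified and run through the same routing again, which is what allows repeated rounds of selection and resynthesis.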
DNA-templated synthesis
In 2001, David Liu and co-workers showed that complementary DNA oligonucleotides can be used to assist certain synthetic reactions which do not take place efficiently in solution at low concentration. A DNA heteroduplex was used to accelerate the reaction between chemical moieties displayed at the extremities of the two DNA strands. Furthermore, the "proximity effect", which accelerates the bimolecular reaction, was shown to be distance-independent (at least within a distance of 30 nucleotides). In a sequence-programmed fashion, oligonucleotides carrying one chemical reactant group were hybridized to complementary oligonucleotide derivatives carrying a different reactive chemical group. The proximity conferred by DNA hybridization drastically increases the effective molarity of the reagents attached to the oligonucleotides, enabling the desired reaction to occur even in an aqueous environment at concentrations several orders of magnitude lower than those needed for the corresponding conventional (non-DNA-templated) organic reaction. Using a DNA-templated set-up and sequence-programmed synthesis, Liu and co-workers generated a 64-member DNA-encoded library of macrocycles.
3-Dimensional proximity-based technology (YoctoReactor technology)
The YoctoReactor (yR) is a 3D proximity-driven approach which exploits the self-assembly of DNA oligonucleotides into 3-, 4- or 5-way junctions to direct small-molecule synthesis at the center of the junction. Figure 5 illustrates the basic concept with a 4-way DNA junction. The center of the DNA junction constitutes a volume on the order of a yoctoliter, hence the name YoctoReactor. This volume contains a single-molecule reaction, yielding reaction concentrations in the high mM range. The effective concentration facilitated by the DNA greatly accelerates chemical reactions that would otherwise not take place at the actual concentration, which is several orders of magnitude lower.
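The link between a yoctoliter-scale volume and the quoted concentration range can be checked with a back-of-the-envelope calculation: a single molecule confined to a volume V corresponds to a molarity of 1/(N_A · V). The sketch below evaluates this for a few assumed volumes. The specific volumes are illustrative, not measured values for the junction.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
YOCTOLITER = 1e-24        # liters


def single_molecule_molarity(volume_liters):
    """Molar concentration of one molecule confined to the given volume."""
    return 1.0 / (AVOGADRO * volume_liters)


# Assumed confinement volumes of 1, 10 and 100 yoctoliters.
for n in (1, 10, 100):
    molarity = single_molecule_molarity(n * YOCTOLITER)
    print(f"{n:>3} yL -> {molarity * 1e3:.1f} mM")
```

One molecule in exactly one yoctoliter corresponds to roughly 1.66 M, so confinement volumes of a few to some tens of yoctoliters give concentrations in the high millimolar range.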
Building a yR library
Figure 6 illustrates the generation of a yR library using a 3-way DNA junction. In summary, chemical building blocks (BB) are attached via cleavable or non-cleavable linkers to three types of bispecific DNA oligonucleotides (oligo-BBs) representing each arm of the yR. To facilitate synthesis in a combinatorial manner, the oligo-BBs are designed such that the DNA contains (a) the code for the attached BB at the distal end of the oligo (colored lines) and (b) areas of constant DNA sequence (black lines) that bring about the self-assembly of the DNA into a 3-way junction (independently of the BB) and the subsequent chemical reaction. Chemical reactions are performed in a stepwise procedure; after each step the DNA is ligated and the product purified by polyacrylamide gel electrophoresis. Cleavable linkers (BB-DNA) are used for all but one position, yielding a library of small molecules with a single covalent link to the DNA code. Table 1 outlines how libraries of different sizes can be generated using yR technology.
The yR design approach provides an unvarying reaction site with regard to both (a) the distance between reactants and (b) the sequence environment surrounding the reaction site. Furthermore, the intimate connection between the code and the BB on the oligo-BB moieties, which are mixed combinatorially in a single pot, confers high fidelity on the encoding of the library. The code of the synthesized products, moreover, is not preset but is assembled combinatorially and synthesized in synchrony with the product itself.
Rolling Translation
The latter two features of yR technology described above are important in that they enable a translational process analogous to biological translation. The ability to translate is essential to yR technology and to the manipulation of very large libraries; biological translation, however, is limited to biological molecules, namely the translation of DNA-encoded information into protein. With the yR, an artificial translation system, termed Rolling Translation, has been established: the information for the synthesis of the small molecule is contained in the attached DNA (genotype), and the molecule-genetic code (phenotype-genotype) linkage can be regenerated via Rolling Translation.
Molecular evolution for hit identification
Using principles of Darwinian selection, high-affinity hits are identified from the yR library via affinity selection on an immobilized target in rounds of selection, PCR amplification and translation. Affinity selection, described in detail elsewhere[ref], consists of incubating the immobilized target (typically a protein immobilized via a tag to a magnetic bead) with the library, during which time library compounds have the opportunity to bind to the target. Subsequent washing steps remove non-binders, reduce the level of unspecific binders and thereby remove the vast majority of the library members. Elution of the retained compounds under denaturing conditions yields a mixture which ideally is enriched for binders to the target of interest. The mixture is subjected to PCR amplification followed by artificial translation, after which a second round of selection can be performed. Typically, if very large libraries are being interrogated (10^6 to 10^9 members), 2-3 rounds of selection are required to enrich binders above background. Workers at Vipergen (Copenhagen, Denmark), the company commercializing the yR, have reported a selective enrichment of a known peptide binder to anti-Enkephalin antibody from a frequency of 1 in 100 million in a model library to a final frequency of 1.7% in only two rounds of selection. The creation and manipulation of libraries of up to 1 x 10^12 members is therefore conceptually possible with the YoctoReactor technology.
Decoding of DNA-encoded chemical libraries
Following selection from DNA-encoded chemical libraries, a decoding strategy for the fast and efficient identification of the specific binding compounds is crucial for the further development of DEL technology. So far, Sanger sequencing-based decoding, microarray-based methodology and high-throughput sequencing techniques have been the main approaches for decoding DNA-encoded library selections.
Sanger sequencing-based decoding
Although many authors implicitly envisaged traditional Sanger sequencing-based decoding, sequencing a number of codes commensurate with the complexity of the library is an unrealistic task for a traditional Sanger sequencing approach. Nevertheless, the implementation of Sanger sequencing for decoding DNA-encoded chemical libraries in high-throughput fashion was the first to be described. After selection and PCR amplification of the DNA tags of the library compounds, concatamers containing multiple coding sequences were generated and ligated into a vector. Sanger sequencing of a representative number of the resulting colonies then revealed the frequencies of the codes present in the DNA-encoded library sample before and after selection.
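Comparing code frequencies before and after selection amounts to a simple ratio per code. The sketch below assumes the sequenced codes are already available as lists of identifiers. The code names `c1` to `c4` and the function names are hypothetical, and a real analysis would also need to account for sampling noise at low counts.

```python
from collections import Counter


def code_frequencies(codes):
    """Relative frequency of each code in a list of sequenced codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def enrichment(before, after):
    """Fold change in frequency (after / before) for codes seen before selection."""
    f_before = code_frequencies(before)
    f_after = code_frequencies(after)
    return {code: f_after.get(code, 0.0) / f_before[code] for code in f_before}


# Hypothetical codes read from colonies before and after selection.
before = ["c1", "c2", "c3", "c4"]
after = ["c1", "c1", "c1", "c2"]
fold = enrichment(before, after)
print(fold)
```

In this toy data set `c1` rises from one quarter to three quarters of the reads, a threefold enrichment, while `c3` and `c4` drop out entirely.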
Microarray-based decoding
A DNA microarray is a device for high-throughput investigations widely used in molecular biology and in medicine. It consists of an arrayed series of microscopic spots (‘features’ or ‘locations’), each containing a few picomoles of oligonucleotides carrying a specific DNA sequence. This can be a short section of a gene or another DNA element used as a probe to hybridize a DNA or RNA sample under suitable conditions. Probe-target hybridization is usually detected and quantified by fluorescence-based detection of fluorophore-labeled targets to determine the relative abundance of the target nucleic acid sequences. Microarrays have been used for the successful decoding of ESAC DNA-encoded libraries. The coding oligonucleotides representing the individual chemical compounds in the library are spotted and chemically linked onto the microarray slides using a BioChip Arrayer robot. Subsequently, the oligonucleotide tags of the binding compounds isolated from the selection are PCR-amplified using a fluorescent primer and hybridized onto the DNA microarray slide. Afterwards, the microarrays are analyzed using a laser scan, and spot intensities are detected and quantified. The enrichment of the preferential binding compounds is revealed by comparing the spot intensities of the DNA microarray slide before and after selection.
Decoding by high throughput sequencing
Given the complexity of DNA-encoded chemical libraries (typically between 10^3 and 10^6 members), conventional Sanger sequencing-based decoding is unlikely to be usable in practice, owing both to the high cost per base and to the tedious procedure involved. High-throughput sequencing technologies parallelize the sequencing process, displacing capillary electrophoresis and producing thousands or millions of sequences at once. In 2008, the first application of a high-throughput sequencing technique originally developed for genome sequencing (i.e. "454 technology") to the fast and efficient decoding of a DNA-encoded chemical library comprising 4000 compounds was described. This study led to the identification of novel chemical compounds with submicromolar dissociation constants towards streptavidin and demonstrated the feasibility of constructing, selecting and decoding DNA-encoded libraries containing millions of chemical compounds.
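Decoding from high-throughput sequencing output essentially means locating the variable code within each read and tallying how often each code occurs. The following minimal sketch assumes that each code is flanked by constant sequences. The flanking sequences, the code length and the example reads are invented for illustration and do not correspond to any published library design.

```python
import re
from collections import Counter

# Hypothetical constant regions flanking a hypothetical 6-nucleotide code.
LEFT_FLANK = "GGAGCT"
RIGHT_FLANK = "CCTGAC"
CODE_LENGTH = 6

CODE_PATTERN = re.compile(
    f"{LEFT_FLANK}([ACGT]{{{CODE_LENGTH}}}){RIGHT_FLANK}"
)


def count_codes(reads):
    """Tally codes found between the constant flanks; skip unmatched reads."""
    counts = Counter()
    for read in reads:
        match = CODE_PATTERN.search(read)
        if match:
            counts[match.group(1)] += 1
    return counts


reads = [
    "TTGGAGCTAAACCCCCTGACAA",  # code AAACCC with extra bases at both ends
    "GGAGCTAAACCCCCTGAC",      # code AAACCC
    "GGAGCTGGGTTTCCTGAC",      # code GGGTTT
    "GGAGCTNNNNNNCCTGAC",      # ambiguous bases, discarded
]
code_counts = count_codes(reads)
print(code_counts.most_common())
```

Reads containing ambiguous bases or lacking the flanks are discarded here; a production pipeline would typically also tolerate sequencing errors in the flanks and codes.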
See also
- Drug discovery
- High-throughput screening
- Combinatorial chemistry
- DNA sequencing
- Phage display