Virtual screening
Virtual screening (VS) is a computational technique used in drug discovery research. It rapidly searches large libraries of chemical structures to identify those most likely to bind to a drug target, typically a protein receptor or enzyme.
Virtual screening has become an integral part of the drug discovery process. Although it is related to the more general and long-pursued concept of database searching, the term "virtual screening" is relatively new. Walters et al. define virtual screening as "automatically evaluating very large libraries of compounds" using computer programs. As this definition suggests, VS has largely been a numbers game, focusing on questions such as how to filter the enormous chemical space of more than 10^60 conceivable compounds down to a manageable number that can be synthesized, purchased, and tested. Although filtering the entire chemical universe might be a fascinating question, more practical VS scenarios focus on designing and optimizing targeted combinatorial libraries and on enriching libraries of available compounds from in-house compound repositories or vendor offerings.
The aim of virtual screening is to identify molecules of novel chemical structure that bind to the macromolecular target of interest. Success of a virtual screen is therefore defined in terms of finding interesting new scaffolds rather than a large number of hits. Reported virtual screening accuracy should accordingly be interpreted with caution: a low hit rate of interesting scaffolds is clearly preferable to a high hit rate of already known scaffolds.
Most virtual screening studies in the literature are retrospective. In these studies, the performance of a VS technique is measured by its ability to retrieve a small set of previously known molecules with affinity for the target of interest (active molecules, or simply actives) from a library containing a much higher proportion of assumed inactives, or decoys. By contrast, in prospective applications of virtual screening, the resulting hits are subjected to experimental confirmation (e.g. IC50 measurements). There is consensus that retrospective benchmarks are not good predictors of prospective performance, and consequently only prospective studies constitute conclusive proof of the suitability of a technique for a particular target.
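The retrieval of known actives from a ranked library can be summarized numerically. One widely used summary, not named in the text above, is the enrichment factor: the fraction of actives found in the top-ranked slice of the library divided by the fraction of actives in the library as a whole. The following is a minimal sketch; the function name, the boolean-list input, and the 1% default cut-off are assumptions made for illustration.

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked library.

    ranked_labels: sequence of booleans ordered best score first;
    True marks a known active, False an assumed inactive (decoy).
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")

    # Size of the top slice; always inspect at least one compound.
    n_top = max(1, round(n_total * fraction))
    hits_in_top = sum(ranked_labels[:n_top])

    # Ratio of the hit rate in the top slice to the hit rate overall.
    return (hits_in_top / n_top) / (n_actives / n_total)
```

A value near 1 means the ranking does no better than random selection. As the paragraph above cautions, a high value on a retrospective benchmark does not by itself show that a method will perform well prospectively.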
Method
There are two broad categories of screening techniques: ligand-based and structure-based.
Ligand-based
Given a set of structurally diverse ligands that bind to a receptor, a model of the receptor can be built by exploiting the collective information contained in that set of ligands. Such models are known as pharmacophore models. A candidate ligand can then be compared to the pharmacophore model to determine whether it is compatible with it and therefore likely to bind.
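The comparison of a candidate against a pharmacophore model can be sketched as a geometric check. The example below assumes, purely for illustration, that both the model and the candidate are given as lists of typed feature points for a single rigid conformer, and that a match means the candidate contains features of the same types whose pairwise distances agree with the model within a tolerance. The function name, the feature labels, and the tolerance are hypothetical. Real tools also sample conformers, and a distance-only test of this kind cannot distinguish mirror-image arrangements.

```python
from itertools import permutations
from math import dist


def matches_pharmacophore(model, candidate, tol=1.0):
    """Return True if `candidate` can satisfy `model`.

    model, candidate: lists of (feature_type, (x, y, z)) tuples.
    A match is an assignment of distinct candidate features to the model
    features with identical types and with every inter-feature distance
    within `tol` (same length unit as the coordinates) of the model's.
    """
    k = len(model)
    # Brute force over ordered selections; fine for a handful of features.
    for combo in permutations(candidate, k):
        if any(c[0] != m[0] for c, m in zip(combo, model)):
            continue
        distances_agree = all(
            abs(dist(combo[i][1], combo[j][1]) - dist(model[i][1], model[j][1])) <= tol
            for i in range(k)
            for j in range(i + 1, k)
        )
        if distances_agree:
            return True
    return False
```

Comparing inter-feature distances rather than raw coordinates makes the check independent of how the candidate happens to be positioned and rotated in space.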
Another approach to ligand-based virtual screening is to use 2D chemical similarity analysis methods to scan a database of molecules against one or more active ligand structures.
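A 2D similarity scan of this kind is often implemented by comparing molecular fingerprints with the Tanimoto coefficient. The sketch below assumes that each molecule has already been reduced to a set of "on" fingerprint bit indices by some cheminformatics toolkit; the function names, the dictionary-based library, and the 0.7 threshold are illustrative assumptions rather than part of any particular program.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_screen(query_fps, library, threshold=0.7):
    """Scan `library` (name -> fingerprint) against one or more query actives.

    Each library compound is scored by its best similarity to any query.
    Compounds at or above `threshold` are returned, most similar first.
    """
    hits = []
    for name, fp in library.items():
        best = max(tanimoto(fp, q) for q in query_fps)
        if best >= threshold:
            hits.append((name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Taking the maximum over the queries is one of several possible ways to combine multiple actives; averaging the similarities is another.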
A popular approach to ligand-based virtual screening is to search for molecules whose shape is similar to that of known actives, on the assumption that such molecules will fit the target's binding site and are therefore likely to bind the target. There are a number of prospective applications of this class of techniques in the literature.
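Shape similarity can be illustrated with a volume-overlap measure. The sketch below assumes that the two molecules have already been superimposed in a common frame and that each is described as a list of atom-centred spheres. It approximates each molecular volume by the grid points falling inside any sphere and reports the overlap as a Tanimoto ratio. The function names, the grid spacing, and the sphere representation are assumptions for illustration. Practical shape-based methods also optimize the overlay, which is not attempted here.

```python
from itertools import product


def occupied_cells(atoms, spacing=0.5):
    """Grid points (as integer index triples) lying inside any atom sphere.

    atoms: list of ((x, y, z), radius) tuples.
    """
    cells = set()
    for (x, y, z), r in atoms:
        # Index range of grid points that could fall inside this sphere.
        lo = [int((c - r) // spacing) for c in (x, y, z)]
        hi = [int((c + r) // spacing) + 1 for c in (x, y, z)]
        for i, j, k in product(*(range(l, h + 1) for l, h in zip(lo, hi))):
            px, py, pz = i * spacing, j * spacing, k * spacing
            if (px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2 <= r * r:
                cells.add((i, j, k))
    return cells


def shape_tanimoto(atoms_a, atoms_b, spacing=0.5):
    """Overlap volume divided by union volume for two pre-aligned molecules."""
    a = occupied_cells(atoms_a, spacing)
    b = occupied_cells(atoms_b, spacing)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)
```

A finer grid spacing gives a better volume estimate at a higher computational cost.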
Structure-based
Structure-based virtual screening involves docking candidate ligands into a protein target and then applying a scoring function to estimate the likelihood that the ligand will bind to the protein with high affinity.
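To make the idea of a scoring function concrete, the sketch below sums a 12-6 Lennard-Jones-style term over all ligand-protein atom pairs for one docked pose. It is a toy under stated assumptions: atoms are bare coordinates without types or charges, the parameters `epsilon`, `r_min`, and `cutoff` are arbitrary illustrative values, and lower scores are taken to be better. It is not the scoring function of any particular docking program, which would typically be far more elaborate and calibrated against experimental data.

```python
from math import dist


def toy_score(ligand, protein, epsilon=0.2, r_min=3.8, cutoff=8.0):
    """Toy pose score: sum of 12-6 terms over ligand-protein atom pairs.

    ligand, protein: lists of (x, y, z) coordinates.
    Each pair contributes -epsilon at separation r_min, a steep positive
    penalty when atoms clash, and nothing beyond `cutoff`.
    """
    total = 0.0
    for l_atom in ligand:
        for p_atom in protein:
            r = dist(l_atom, p_atom)
            if r > cutoff:
                continue
            r = max(r, 0.5)  # floor the distance to avoid division by zero
            ratio6 = (r_min / r) ** 6
            total += epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
    return total


def best_pose(poses, protein):
    """Pick the pose (a list of ligand atom coordinates) with the lowest score."""
    return min(poses, key=lambda pose: toy_score(pose, protein))
```

The nested loop visits every ligand-protein atom pair, which is the main reason the cost of scoring grows quickly with system size.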
Computing Infrastructure
The computation of pairwise interactions between atoms, which is a prerequisite for the operation of many virtual screening programs, is of O(N^2) computational complexity, where N is the number of atoms in the system. Because of this quadratic scaling with the number of atoms, the computing infrastructure may vary from a laptop computer for a ligand-based method to a mainframe for a structure-based method.
Ligand-based
Ligand-based methods typically require a fraction of a second for a single structure comparison operation. A single CPU is enough to perform a large screening within hours. However, several comparisons can be made in parallel to expedite the processing of a large database of compounds.
Structure-based
The size of the task requires a parallel computing infrastructure, such as a cluster of Linux systems running a batch queue processor, such as Sun Grid Engine or Torque PBS, to handle the work.
A means of handling the input from large compound libraries is also needed. This requires a form of compound database that can be queried by the parallel cluster, delivering compounds in parallel to the various compute nodes. Commercial database engines may be too ponderous, and a high-speed indexing engine, such as Berkeley DB, may be a better choice. Furthermore, it may not be efficient to run one comparison per job, because the ramp-up time of the cluster nodes could easily exceed the time spent on useful work. To work around this, batches of compounds should be processed in each cluster job, with the results aggregated into some kind of log file. A secondary process that mines the log files and extracts high-scoring candidates can then be run after the whole experiment has finished.
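The batching and log-mining scheme described above can be sketched as follows. The sketch leaves out job submission to the scheduler and the compound database, and assumes a tab-separated log format, a caller-supplied `score_fn`, and a convention that higher scores are better. All function names and the file layout are hypothetical.

```python
import heapq
from itertools import islice


def batches(compounds, size):
    """Yield successive lists of up to `size` compounds from any iterable."""
    iterator = iter(compounds)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_batch(batch, score_fn, log_path):
    """Work done by one cluster job: score a whole batch and append to a log.

    batch: list of (compound_id, structure) pairs.
    """
    with open(log_path, "a") as log:
        for compound_id, structure in batch:
            log.write(f"{compound_id}\t{score_fn(structure)}\n")


def top_hits(log_paths, n=10):
    """Secondary pass: scan every log and return the n highest (score, id) pairs."""
    def records():
        for path in log_paths:
            with open(path) as log:
                for line in log:
                    compound_id, score = line.rstrip("\n").split("\t")
                    yield float(score), compound_id

    # nlargest keeps only n records in memory while streaming the logs.
    return heapq.nlargest(n, records())
```

Writing one log file per job avoids contention between compute nodes, and the batch size can be tuned so that each job runs much longer than the node start-up overhead.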
See also
- High-throughput screening
- Drug discovery
- Docking (molecular)
- Scoring functions for docking
- ZINC database
External links
- ZINC — a free database of commercially-available compounds for virtual screening.
- Virtual Screening Methods
- Free service to screen for GPCR ligands, ion channel blockers and kinase inhibitors
- Brutus — a similarity analysis tool for ligand-based virtual screening.
- NovaMechanics Cheminformatics Research: combined structure- and ligand-based, chemistry-driven virtual screening.