Scoring functions for docking
In the fields of computational chemistry and molecular modelling, scoring functions are fast approximate mathematical methods used to predict the strength of the non-covalent interaction (also referred to as binding affinity) between two molecules after they have been docked. Most commonly one of the molecules is a small organic compound such as a drug and the second is the drug's biological target such as a protein receptor. Scoring functions have also been developed to predict the strength of other types of intermolecular interactions, for example between two proteins or between protein and DNA.
Utility
Scoring functions are widely used in drug discovery and other molecular modelling applications. These include:
- Virtual screening of small molecule databases of candidate ligands to identify novel small molecules that bind to a protein target of interest and therefore are useful starting points for drug discovery
- De novo design (design "from scratch") of novel small molecules that bind to a protein target
- Lead optimization of screening hits to optimize their affinity and selectivity
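As a rough illustration of how a scoring function drives virtual screening, the sketch below ranks a small library of already-docked candidate ligands by predicted score and keeps the best few. The `toy_score` function, the ligand names and the descriptor values are hypothetical placeholders, not part of any real docking package, and the sketch assumes that lower (more negative) scores mean stronger predicted binding.

```python
from typing import Callable, Dict, List, Tuple

Pose = Dict[str, float]  # hypothetical per-pose descriptors of a docked ligand


def rank_library(
    library: Dict[str, Pose],
    score: Callable[[Pose], float],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Score every docked ligand and return the top_n lowest-scoring ones."""
    scored = [(name, score(pose)) for name, pose in library.items()]
    scored.sort(key=lambda item: item[1])  # most negative (best) first
    return scored[:top_n]


def toy_score(pose: Pose) -> float:
    # Assumed toy model: contacts and hydrogen bonds help, rotatable bonds hurt.
    return -0.2 * pose["contacts"] - 1.0 * pose["hbonds"] + 0.3 * pose["rotors"]


# Made-up library of three docked candidates.
library = {
    "ligand_A": {"contacts": 20, "hbonds": 3, "rotors": 4},
    "ligand_B": {"contacts": 12, "hbonds": 1, "rotors": 8},
    "ligand_C": {"contacts": 25, "hbonds": 2, "rotors": 2},
}
hits = rank_library(library, toy_score, top_n=2)
```

In a real screen the library would hold many thousands of compounds, which is why the scoring function has to be fast.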
A potentially more reliable but much more computationally demanding alternative to scoring functions is the use of free energy perturbation calculations.
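For orientation, free energy perturbation rests on the Zwanzig relation, given here in its standard textbook form rather than quoted from this article. The symbols are assumptions of this note: $E_A$ and $E_B$ are the potential energies of the two states being compared (for example, two ligands bound to the same protein), $k_B$ is Boltzmann's constant, $T$ is the temperature, and $\langle\cdot\rangle_A$ denotes an ensemble average sampled in state $A$.

```latex
\Delta F(A \to B) = F_B - F_A
  = -k_B T \,\ln \left\langle \exp\!\left( -\frac{E_B - E_A}{k_B T} \right) \right\rangle_A
```

The ensemble average has to be converged by molecular dynamics or Monte Carlo sampling, which is why the approach costs far more than evaluating a scoring function on a single docked structure.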
Prerequisites
Scoring functions are normally parameterized (or trained) against a data set consisting of experimentally determined binding affinities between molecular species similar to the species that one wishes to predict. For currently used methods aiming to predict affinities of ligands for proteins, the following must first be known or predicted:
- Protein tertiary structure – arrangement of the protein atoms in three dimensional space. Protein structures may be determined by experimental techniques such as X-ray crystallography or solution phase NMR methods or predicted by homology modelling.
- Ligand active conformation – three dimensional shape of the ligand when bound to the protein
- Binding-mode – orientation of the two binding partners relative to each other in the complex
The above information yields the three-dimensional structure of the complex. Based on this structure, the scoring function can then estimate the strength of the association between the two molecules in the complex using one of the methods outlined below. The scoring function itself may also be used to help predict both the binding mode and the active conformation of the small molecule in the complex; alternatively, a simpler and computationally faster function may be utilised within the docking run.
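The parameterization (training) step mentioned at the start of this section can be illustrated with a small regression. The sketch below assumes a scoring function that is a weighted sum of counted interaction terms and fits the weights by multiple linear regression on a training set. All numbers are synthetic toy values generated from made-up weights, not experimental affinities, and the names `fit_coefficients` and `score` are illustrative rather than taken from any docking package.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_coefficients(
    terms: Sequence[Sequence[float]], affinities: Sequence[float]
) -> List[float]:
    """Multiple linear regression via the normal equations (X^T X) w = X^T y."""
    k = len(terms[0])
    xtx = [[sum(row[i] * row[j] for row in terms) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(terms, affinities)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def score(weights: Sequence[float], terms: Sequence[float]) -> float:
    """Weighted sum of interaction counts; negative means favorable (assumed)."""
    return sum(w * t for w, t in zip(weights, terms))


# Columns (assumed): hydrophobic contacts, hydrogen bonds,
# hydrogen bond mismatches, rotatable bonds immobilized on binding.
training_counts = [
    [10, 2, 0, 3],
    [15, 1, 1, 5],
    [8, 4, 0, 2],
    [20, 3, 0, 6],
    [12, 0, 2, 4],
    [5, 1, 0, 1],
]
true_weights = [-0.1, -1.0, 2.0, 0.3]  # made-up values, kcal/mol per count
# Synthetic, noise-free "affinities" so the fit can be checked against true_weights.
training_affinities = [score(true_weights, row) for row in training_counts]

fitted = fit_coefficients(training_counts, training_affinities)
```

Because the toy data are noise-free, `fitted` reproduces `true_weights` to within rounding error; with real experimental affinities the fit would only be approximate, and its quality would depend on how similar the training complexes are to the ones being predicted.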
Classes
There are three general classes of scoring functions:
- Force field – affinities are estimated by summing the strength of intermolecular van der Waals and electrostatic interactions between all atoms of the two molecules in the complex. The intramolecular energies (also referred to as strain energy) of the two binding partners are also frequently included. Finally, since binding normally takes place in the presence of water, the desolvation energies of the ligand and of the protein are sometimes taken into account using implicit solvation methods such as GBSA or PBSA.
- Empirical – based on counting the number of various types of interactions between the two binding partners. Counting may be based on the number of ligand and receptor atoms in contact with each other or on the change in solvent accessible surface area (ΔSASA) in the complex compared to the uncomplexed ligand and protein. The coefficients of the scoring function are usually fit using multiple linear regression methods. The interaction terms of the function may include, for example:
  - hydrophobic–hydrophobic contacts (favorable),
  - hydrophobic–hydrophilic contacts (unfavorable),
  - hydrophilic–hydrophilic contacts (no contribution to affinity except for the following special cases):
    - number of hydrogen bonds (favorable electrostatic contribution to affinity, especially if shielded from solvent; no contribution if solvent exposed),
    - number of hydrogen bond "mismatches" or other types of electrostatic repulsion (very unfavorable and rarely seen in stable complexes),
  - number of rotatable bonds immobilized in complex formation (unfavorable entropic contribution).
- Knowledge-based – based on statistical observations of intermolecular close contacts in large 3D databases (such as the Cambridge Structural Database or Protein Data Bank) which are used to derive "potentials of mean force". This method is founded on the assumption that close intermolecular interactions between certain types of atoms or functional groups that occur more frequently than one would expect by a random distribution are likely to be energetically favorable and therefore contribute favorably to binding affinity.
Finally, hybrid scoring functions have also been developed in which components from two or more of the above classes of scoring function are combined into one function.
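A hybrid function of the kind just described can be sketched by adding a force-field term to a knowledge-based term. The code below is a minimal illustration under stated assumptions, not the form used by any particular program: the force-field term uses a Lennard-Jones 12-6 potential with one shared `epsilon` and `sigma` for all atom pairs plus a Coulomb term with a constant dielectric, the knowledge-based term uses an inverse Boltzmann relation on made-up contact counts, and the mixing weights `w_ff` and `w_kb` are arbitrary.

```python
import math
from typing import List, Sequence, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z in angstroms; partial charge in e

COULOMB_K = 332.06  # kcal*angstrom/(mol*e^2), commonly used conversion factor
KT = 0.593          # kcal/mol, approximately k_B*T at 298 K


def force_field_term(
    ligand: Sequence[Atom],
    protein: Sequence[Atom],
    epsilon: float = 0.1,
    sigma: float = 3.5,
    dielectric: float = 4.0,
) -> float:
    """Sum van der Waals and electrostatic energies over all intermolecular pairs."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for px, py, pz, pq in protein:
            r = math.dist((lx, ly, lz), (px, py, pz))
            sr6 = (sigma / r) ** 6
            total += 4.0 * epsilon * (sr6 * sr6 - sr6)       # Lennard-Jones 12-6
            total += COULOMB_K * lq * pq / (dielectric * r)  # Coulomb
    return total


def pmf_from_counts(observed: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Inverse Boltzmann per distance bin: w = -kT * ln(observed / reference).

    Assumes every bin has nonzero observed and reference counts.
    """
    return [-KT * math.log(o / e) for o, e in zip(observed, reference)]


def knowledge_term(
    ligand: Sequence[Atom],
    protein: Sequence[Atom],
    pmf: Sequence[float],
    bin_width: float = 1.0,
) -> float:
    """Sum the statistical potential over all pairs that fall inside the table."""
    total = 0.0
    for lx, ly, lz, _ in ligand:
        for px, py, pz, _ in protein:
            index = int(math.dist((lx, ly, lz), (px, py, pz)) / bin_width)
            if index < len(pmf):
                total += pmf[index]
    return total


def hybrid_score(ligand, protein, pmf, w_ff: float = 0.5, w_kb: float = 0.5) -> float:
    """Weighted combination of the two terms (weights are arbitrary here)."""
    return w_ff * force_field_term(ligand, protein) + w_kb * knowledge_term(ligand, protein, pmf)


# Made-up contact counts per 1-angstrom bin (0-1, 1-2, ...), not database statistics.
pmf = pmf_from_counts(observed=[1, 2, 10, 40, 30], reference=[5, 10, 20, 20, 30])
ligand = [(0.0, 0.0, 0.0, -0.5)]
protein = [(3.0, 0.0, 0.0, 0.5)]
total = hybrid_score(ligand, protein, pmf)
```

Bins where contacts are observed more often than the reference predicts receive a negative (favorable) potential, which is the assumption the knowledge-based class rests on. In practice the mixing weights would themselves be fitted against experimental affinities rather than fixed by hand.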