
COMPUTATIONAL BIOLOGY

MOLECULAR DOCKING
SHARMISHTHA SHEKHAR IMT/08/9060 Integrated M. Tech Biotechnology SEMESTER- VII SECTION- I

INTRODUCTION

Molecular docking is a computer simulation procedure that predicts the conformation of a receptor-ligand complex, where the receptor is usually a protein or a nucleic acid molecule (DNA or RNA) and the ligand is either a small molecule or another protein. It can also be defined as a simulation process in which the position of a ligand is estimated in a predicted or pre-defined binding site. Drug activity arises from the binding of one molecule (the ligand) to the pocket of another, usually larger, molecule (the receptor), which is commonly a protein. In their binding conformations, the molecules exhibit geometric and chemical complementarity, both of which are essential for successful drug activity. The computational process of searching for a ligand that fits the binding site of a protein both geometrically and energetically is called molecular docking.

Fig 1: Therapeutic drug molecule (small docked molecule towards the center of the figure) bound to a protein receptor (HIV-1 protease). The drug molecule fits tightly in the binding site and blocks the normal protein function.

The goal of ligand-protein docking is to predict the predominant binding mode(s) of a ligand with a protein of known three-dimensional structure. Molecular docking simulations may be used to reproduce experimental data through docking validation, where protein-ligand or protein-protein conformations are obtained in silico and compared to structures obtained from X-ray crystallography or nuclear magnetic resonance. Furthermore, docking is one of the main tools for virtual screening, where a library of compounds is docked against one drug target and the best-scoring hits are returned.

For drug design, there are two main tasks:

1) identification of new compounds showing some activity against a target biological receptor, and
2) progressive optimization of these leads to yield a compound with improved potency and physicochemical properties in vitro and, eventually, improved efficacy, pharmacokinetic, and toxicological profiles in vivo.

Identification is done either by random screening or by a directed design approach. The directed approach needs a rational starting point for medicinal chemists and molecular modeling scientists to exploit. Examples include the design of analogs of a drug known to be active against a target receptor and mimics of the natural substrate of an enzyme. Increasingly, the three-dimensional structures of many biological targets are being revealed by X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, opening the way to the design of novel molecules that directly exploit the structural characteristics of the receptor binding site.

VIRTUAL SCREENING

Virtual screening uses computer-based methods to discover new ligands on the basis of biological structure. Its basic goal is to reduce the enormous virtual chemical space of small organic molecules that could be synthesized and/or screened against a specific target protein to a manageable number of compounds that have the highest chance of leading to a drug candidate. Docking protocols can be described as a combination of two components: a search strategy and a scoring function. The search algorithm should generate an optimum number of configurations that include the experimentally determined binding mode.

MOLECULAR MODELLING

A molecule is characterized by a pair (A, B), in which A represents a collection of atoms and B represents a collection of bonds between pairs of atoms. Information used for kinematic and energy computations is associated with each of the atoms and bonds. Each atom carries standard information, such as its van der Waals radius. Three pieces of information are associated with each bond:
(i) the bond length, which is the distance between atom centers;
(ii) the bond angle, which is the angle between two consecutive bonds;
(iii) whether the bond is rotatable or not.

Since bond lengths and angles do not significantly affect the shape of a molecule, they are considered fixed. Thus the degrees of freedom of the molecule arise from the rotatable bonds. The three-dimensional embedding of a molecule that results when values are assigned to its rotatable bonds is called the conformation of the molecule. Ligands typically have 3-15 rotatable bonds, while receptors have 1,000-2,000 rotatable bonds. The dimension of the combined search space makes the docking problem computationally intractable.
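To give a feel for how quickly the search space grows with the number of rotatable bonds, the following minimal sketch counts the torsion combinations on a regular grid. The 30-degree sampling step and the function name are assumptions made for illustration; they are not taken from any docking program.

```python
def count_conformations(n_rotatable_bonds: int, step_degrees: float = 30.0) -> int:
    """Count torsion-angle combinations when every rotatable bond is
    sampled on a regular grid (step_degrees is an assumed resolution)."""
    states_per_bond = round(360.0 / step_degrees)
    return states_per_bond ** n_rotatable_bonds


# Ligand sizes quoted in the text: 3 to 15 rotatable bonds.
for n_bonds in (3, 15):
    print(n_bonds, "rotatable bonds:", count_conformations(n_bonds), "conformations")
```

The count grows exponentially with the number of bonds, which is why exhaustive enumeration is only practical for small, nearly rigid ligands.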

Fig 2: A drug molecule. Spheres represent atoms, and the bonds connecting them are represented by sticks. Curved arrows represent the rotatable degrees of freedom around bonds.

In docking, energy evaluations are usually carried out with the help of a scoring function. The two main characteristics of a good scoring function are selectivity and efficiency. Selectivity enables the function to distinguish between correctly and incorrectly docked structures, and efficiency enables the docking program to run in a reasonable amount of time. Current docking methods use scoring functions in one of two ways. The first approach uses the full scoring function to rank a protein-ligand conformation. The system is then modified by the search algorithm, and the same scoring function is applied again to rank the new structure. In the alternative approach, a two-stage scoring function is used: a reduced function directs the search, and a more rigorous one is then used to rank the resulting structures. Some common scoring functions are:
- Force-field methods
- Empirical free energy scoring functions
- Knowledge-based potentials of mean force

FORCE FIELD MODELS

Molecular mechanics stems from the idea that the electrons of an atom can be thought of as fixed. The geometry of a molecule can then be approximated effectively by taking all the interacting forces into account. Bonded interactions are described by spring forces, and non-bonded interactions are usually approximated by potentials resembling the van der Waals interaction. The required parameters are determined from experimental observations. The geometry is further optimized by finding the energy minimum. The total energy is represented by a set of potential energy functions. In addition to these functions, a set of parameters is needed to compute the total energy. The force field parameters have no meaning unless they are considered together with the potential energy functions, so a comparison between force field models is very difficult. In addition to these two parts, information about atom types and atom charges is also required. A set of rules is usually needed as well to type atoms, to generate parameters that are not explicitly defined, and to assign functional forms and parameters. Together, these components form a force field:
- Potential energy functions
- Parameters for the function terms
- List of atoms and atom charges
- Rules for atom typing, parameter generation and functional form assignment
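The way potential energy functions, parameters and atom charges combine into a total energy can be sketched with a toy example. This is only an illustration under stated assumptions: the functional forms (harmonic bond, Lennard-Jones, Coulomb) are common textbook choices, and every numeric parameter except the Coulomb conversion constant is hypothetical and not taken from AMBER, CHARMM or any other real force field.

```python
import math

# Hypothetical parameters for illustration only.
BOND_K = 300.0          # harmonic force constant, kcal/(mol*A^2)
BOND_R0 = 1.5           # equilibrium bond length, A
LJ_EPSILON = 0.1        # Lennard-Jones well depth, kcal/mol
LJ_SIGMA = 3.4          # distance at which the LJ energy is zero, A
COULOMB_CONST = 332.06  # converts e^2/A to kcal/mol


def bond_energy(r):
    """Bonded term: a spring around the equilibrium length."""
    return BOND_K * (r - BOND_R0) ** 2


def lennard_jones(r):
    """Non-bonded van der Waals-like term."""
    sr6 = (LJ_SIGMA / r) ** 6
    return 4.0 * LJ_EPSILON * (sr6 * sr6 - sr6)


def coulomb(q_i, q_j, r):
    """Non-bonded electrostatic term between two point charges."""
    return COULOMB_CONST * q_i * q_j / r


def total_energy(coords, charges, bonds):
    """Sum bonded terms over bonds and non-bonded terms over all other pairs."""
    bonded_pairs = {frozenset(pair) for pair in bonds}
    energy = 0.0
    for i, j in bonds:
        energy += bond_energy(math.dist(coords[i], coords[j]))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if frozenset((i, j)) in bonded_pairs:
                continue  # directly bonded atoms are handled by the bond term
            r = math.dist(coords[i], coords[j])
            energy += lennard_jones(r) + coulomb(charges[i], charges[j], r)
    return energy


# Three atoms: 0 and 1 are bonded, atom 2 is a neutral non-bonded neighbour.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (5.0, 0.0, 0.0)]
charges = [0.2, -0.2, 0.0]
bonds = [(0, 1)]
print("total energy (kcal/mol):", total_energy(coords, charges, bonds))
```

Changing a functional form (for example the bond term) would require refitting its parameters, which is the practical reason the parameters cannot be compared across force fields in isolation.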

Classical force field models

Examples of classical force field models include AMBER, CHARMM and CVFF. They are used mainly in biochemistry.

AMBER (Assisted Model Building with Energy Refinement)

AMBER refers to two things: a set of molecular mechanics force fields used for the simulation of biomolecules, and a package of molecular simulation programs. AMBER's set of parameters is experimentally derived. AMBER force fields are probably the most widespread ones and are designed especially for biological macromolecules.

CHARMM (Chemistry at HARvard Macromolecular Mechanics)

CHARMM is a program for macromolecular dynamics. In addition to performing molecular dynamics (MD) using algorithms for time-stepping, long-range force calculation and periodic images, it can be used for energy minimization, normal mode analysis and crystal optimization. There are several potential energy functions parameterized for protein, lipid and nucleic acid simulations. CHARMM also incorporates free energy methods for chemical and conformational free energy calculations.

CVFF (Consistent Valence Force Field)

CVFF has parameters obtained by fitting to crystal and gas-phase structures of small organic molecules. CVFF is designed mainly for organic materials, and it is commonly used to predict structures and compute binding energies.

Second generation force field models

Examples of second generation force fields include CFF and COMPASS.

CFF (Consistent Force Field)

CFF is somewhat more complex than AMBER. The potential energy functions in CFF are expanded in order to avoid problems concerning the complexity of potential energy surfaces. CFF also uses quantum calculations to determine the parameters for the energy functions. This approach gives a great advantage over classical models, since parameters can be determined much more accurately. Other advantages include the possibility of covering a larger number of compounds with the force field model, and the fact that all parameters are determined in the same way (which makes the model more consistent).

COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies)

COMPASS is another ab initio (from the beginning) force field model. Like CFF, it has parameters defined by quantum mechanical calculations and validated by empirical data.

Generalized force field models

Generalized force fields are not as accurate as the ones presented above, but they have their uses. They can be applied to systems that are not covered by more accurate force field models. Generalized force field models are based on atomic parameters and rules that determine the explicit form of the parameters. Examples include ESFF and UFF.

FORCES

It is very common to describe the interactions between particles as the consequence of forces between the molecules they contain. These forces are often divided into four categories:
- Forces with electrostatic origin
- Forces with electrodynamic origin
- Steric forces
- Solvent-related forces

Forces with electrostatic origin are due to the charges residing in matter. The most common interactions are charge-charge, charge-dipole and dipole-dipole. These forces can be computed from Coulomb's law. The corresponding interaction energies depend on the distance r as follows:
- charge-charge: 1/r
- charge-dipole: 1/r^2
- dipole-dipole: 1/r^3

In addition to purely electrostatic forces, there are also forces with an electrodynamic background. The most widely known is probably the van der Waals interaction. Atoms that are normally electrically neutral may develop an induced dipole moment when an external electric field is applied. The van der Waals interaction is the force between two induced dipoles, and it has a very short range. There are also forces between existing charges and induced dipoles. The distance dependences are the following:
- charge-induced dipole: 1/r^4
- van der Waals: 1/r^6
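To make the difference in range concrete, the short sketch below compares how much each interaction weakens when the separation is doubled. Only the exponents listed above are used; the function name and the choice of a doubling are assumptions for illustration, and all prefactors (charges, dipole moments, polarizabilities) are deliberately left out.

```python
# Exponent n in the 1/r^n distance dependence of each interaction.
EXPONENTS = {
    "charge-charge": 1,
    "charge-dipole": 2,
    "dipole-dipole": 3,
    "charge-induced dipole": 4,
    "van der Waals": 6,
}


def relative_strength(exponent, r, r_ref=1.0):
    """Ratio of a 1/r^n interaction at distance r to its value at r_ref."""
    return (r_ref / r) ** exponent


# Fraction of the interaction that remains after doubling the distance.
for name, n in EXPONENTS.items():
    print(f"{name:22s} {relative_strength(n, 2.0):.4f}")
```

The higher the exponent, the faster the interaction dies away, which is why the van der Waals term is usually described as very short-ranged.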

Steric forces are caused by entropy. For example, when the entropy of a system is restricted, entropic forces may arise that act to minimize the free energy of the system. Solvent-related forces are due to structural changes of the solvent. These changes are generated when ions, colloids, proteins, etc. are added to the solvent. For example, when water acts as the solvent, the polar nature of water molecules must be taken into account. Water molecules form hydrogen bonds, and the water around the studied protein may, for example, turn into a highly organized structure. Solvent-related interactions are very hard to determine, and their modeling depends very much on the way the solvent itself is modeled. What all these forces have in common is their electromagnetic origin.

The rapid generation of quality lead compounds is a major hurdle in the design of therapeutics, so accurate automated procedures would be of tremendous value to pharmaceutical and other biotechnology companies. However, designing a drug based on knowledge of the target receptor structure as determined by current experimental techniques is a process prone to error. The two major reasons for failure are:
1) inaccuracies in the energy models used to score potential ligand/receptor complexes, and
2) the inability of current methods to account for the conformational changes that occur during the binding process, not only in the ligand but also in the receptor.

Although the second problem has been partially solved by incorporating ligand flexibility in search methods, predicting receptor structural rearrangements is a very complex problem that has not been solved.

The search algorithm for molecular docking should create an optimum number of configurations that include the experimentally determined binding modes. These configurations are evaluated using scoring functions to distinguish the experimental binding modes from all other modes explored by the search algorithm. Some common search algorithms include:
- Molecular dynamics
- Monte Carlo methods
- Genetic algorithms
- Fragment-based methods
- Point complementarity methods
- Distance geometry methods
- Tabu searches
- Systematic searches

MOLECULAR DYNAMICS

These methods involve the calculation of solutions to Newton's equations of motion. Finding the global energy minimum of a docked complex is difficult, since traversing the rugged hypersurface of a biological system is problematic. The problem is approached using standard optimization algorithms, including:
- direct searches, which use only the potential function; impractical for large molecules and suitable only for crude optimization of small molecules far away from the minimum, e.g. simplex
- gradient methods, which involve the first derivative of the potential function; slow convergence near the minimum, recommended for initial optimization, e.g. steepest descent
- conjugate-gradient methods, in which the history of the search influences the search direction; high computational effort but better convergence, e.g. Fletcher-Reeves
- second derivative methods, with very efficient convergence, e.g. Newton-Raphson
- least squares methods, with good convergence but often computationally too expensive, e.g. Marquardt
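As an illustration of the gradient methods in the list above, here is a minimal steepest descent sketch. It minimizes a one-dimensional Lennard-Jones pair energy as a stand-in for a real potential energy surface; the reduced units, the fixed step size and the function names are assumptions chosen for this example.

```python
EPSILON = 1.0  # well depth (reduced units, assumed)
SIGMA = 1.0    # distance at which the energy is zero (reduced units, assumed)


def energy(r):
    """Lennard-Jones pair energy at separation r."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (sr6 * sr6 - sr6)


def gradient(r):
    """First derivative dE/dr of the Lennard-Jones energy."""
    sr6 = (SIGMA / r) ** 6
    return 4.0 * EPSILON * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r


def steepest_descent(r_start, step=0.01, tol=1e-6, max_iter=10000):
    """Move downhill along the negative gradient with a fixed step size."""
    r = r_start
    for _ in range(max_iter):
        g = gradient(r)
        if abs(g) < tol:
            break  # gradient is (almost) zero: a stationary point was reached
        r -= step * g
    return r


r_min = steepest_descent(1.5)
print("minimum found at r =", r_min, "with energy", energy(r_min))
```

The fixed step size keeps the sketch short; production minimizers normally add a line search, and the method only finds the local minimum nearest to the starting point.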

Often a combination of the methods mentioned above is used, for example a gradient method for initial optimization followed by a conjugate-gradient method when nearing the minimum.

MONTE CARLO METHOD

The Monte Carlo method was the technique used to perform the first computer simulation of a molecular system. The expression Monte Carlo usually refers to importance sampling, or the Metropolis method. The Metropolis method, which is actually a Markov chain Monte Carlo method, applies random moves to the system and then accepts or rejects each move based on a Boltzmann probability.

GENETIC ALGORITHMS

Genetic algorithms and evolutionary programming are well suited to docking problems because of their usefulness in solving complex optimization problems. The essential idea of genetic algorithms is the evolution of a population of possible solutions via genetic operators (mutation, crossovers and migrations) to a final population, optimizing a predefined fitness function. The process of applying genetic algorithms starts with encoding the variables, in this case the degrees of freedom, into a "genetic code", e.g. binary strings. Then a random initial population of solutions is created. Genetic operators are applied to this population, leading to a new population. This new population is then scored and ranked, and, following "the survival of the fittest", the probability that a solution reaches the next iteration round depends on its score. If the size of the population is kept constant, good solutions will come to dominate the population. It should be noted that genetic algorithms are well suited to parallel computing. Some programs using genetic algorithms are GOLD, AutoDock, DIVALI and DARWIN.

FRAGMENT-BASED METHOD

Fragment-based methods can be described as dividing the ligand into separate portions or fragments, docking the fragments, and finally linking these fragments together. These methods require subjective decisions on the importance of the various functional groups in the ligand, because a good choice of base fragment is essential. A poor choice can significantly affect the quality of the results. The base fragment must contain the predominant interactions with the receptor. Some well-known programs using fragment-based methods are FlexX and DOCK.

POINT COMPLEMENTARITY METHOD

These methods are based on evaluating the shape and/or chemical complementarity between interacting molecules. The interacting molecules are usually modeled in a simple way, for example using spheres or cubes as atoms. The ligand description is then rotated and translated to obtain the maximum number of matches between ligand and protein surfaces, minus the number of volume overlaps. Additional constraints may be present, for example a requirement that interacting surface normals point in approximately opposite directions. Some algorithms use a 3D grid, which is placed over the protein and over the ligand. Each grid point is labeled as either open space or inside the ligand or protein. A correlation function is then constructed and optimized using rigid-body translation and rotation. This often involves traditional shape recognition techniques such as the Fast Fourier Transform (FFT) with Fourier correlation theory. A high correlation score denotes good surface complementarity between the molecules. Because many of the methods were originally created for protein-protein docking, the rigid-body assumption is usually made. This is a limitation in ligand-protein docking. However, some algorithms are aimed at ligand-protein docking, and these allow some flexibility.
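The grid correlation idea can be sketched on a tiny two-dimensional grid. The example below evaluates the correlation score by brute force for every translation; FFT-based programs obtain the same kind of score for all translations at once, and also scan rotations, which is omitted here. The labeling scheme (a positive value for the open cells in contact with the protein, a negative penalty for the protein interior), the weights and the grids themselves are assumptions chosen for illustration.

```python
CONTACT = 1    # open grid cell next to the protein (rewarded), assumed weight
CORE = -10     # cell inside the protein (overlap penalty), assumed weight

# Hypothetical receptor grid with a two-cell-wide pocket in the middle.
receptor = [
    [0,       0,       0,       0,       0,    0],
    [CONTACT, CONTACT, 0,       0,       CONTACT, CONTACT],
    [CORE,    CORE,    CONTACT, CONTACT, CORE, CORE],
    [CORE,    CORE,    CONTACT, CONTACT, CORE, CORE],
    [CORE,    CORE,    CORE,    CORE,    CORE, CORE],
]

# Hypothetical rigid ligand: a 2 x 2 block of occupied cells.
ligand = [
    [1, 1],
    [1, 1],
]


def correlation_score(receptor, ligand, dx, dy):
    """Sum of receptor * ligand values with the ligand shifted by (dx, dy)."""
    score = 0
    for y, row in enumerate(ligand):
        for x, value in enumerate(row):
            score += value * receptor[y + dy][x + dx]
    return score


def best_translation(receptor, ligand):
    """Try every translation that keeps the ligand on the grid."""
    shifts = [
        (dx, dy)
        for dy in range(len(receptor) - len(ligand) + 1)
        for dx in range(len(receptor[0]) - len(ligand[0]) + 1)
    ]
    return max(shifts, key=lambda s: correlation_score(receptor, ligand, *s))


dx, dy = best_translation(receptor, ligand)
print("best shift:", (dx, dy), "score:", correlation_score(receptor, ligand, dx, dy))
```

Placements that push the ligand into the protein core are strongly penalized, so the highest score corresponds to the ligand sitting in the pocket with the most surface contacts and no overlap.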

Examples of programs using point complementarity methods are FTDOCK, SANDOCK, FLOG and the Soft Docking algorithm.

DISTANCE GEOMETRY METHOD

Many types of structural information can be expressed as intra- or intermolecular distances. The distance geometry formalism allows these distances to be assembled and three-dimensional structures consistent with them to be calculated. The crucial feature is that it is not possible to assign arbitrary values to the inter-atomic distances in a molecule and always obtain a low-energy conformation. Rather, the inter-atomic distances are closely interrelated, and many combinations of distances are geometrically impossible. This enables fast sampling of the conformational space, though it does not always give good results. An example of a program using distance geometry for the docking problem is DockIt.

TABU SEARCHES

These methods are based on stochastic processes, in which new states are randomly generated from an initial state (referred to as the current solution). These new solutions are then scored and ranked in ascending order. The best new solution is chosen as the new current solution, and the same process is repeated. To avoid loops and to ensure diversity of the current solution, a tabu list is used. This list acts as a memory. It contains information about previous current solutions, and a new solution is rejected if it resembles a previous solution too closely. An example of a docking algorithm using tabu search is PRO_LEADS.
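The tabu search loop described above can be sketched as follows. This is a toy version under stated assumptions: the "pose" is a list of three torsion angles, the score is a hypothetical function with its minimum at TARGET, and the move size, tabu radius and list length are arbitrary illustrative values rather than settings from PRO_LEADS or any other program.

```python
import math
import random

TARGET = [0.5, 1.0, -1.5]  # hypothetical optimal torsion angles (radians)


def score(angles):
    """Toy scoring function: lower is better, zero at TARGET."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, TARGET))


def is_tabu(candidate, tabu_list, radius=0.05):
    """A candidate is tabu if it is too close to a remembered solution.
    Angle periodicity is ignored to keep the sketch short."""
    return any(math.dist(candidate, old) < radius for old in tabu_list)


def tabu_search(start, n_iter=200, n_neighbors=20, tabu_size=25, step=0.3, seed=0):
    rng = random.Random(seed)
    current = list(start)
    best, best_score = current, score(current)
    tabu_list = [current]
    for _ in range(n_iter):
        # Randomly generate new states from the current solution.
        neighbors = [
            [a + rng.gauss(0.0, step) for a in current] for _ in range(n_neighbors)
        ]
        neighbors.sort(key=score)  # rank in ascending order of score
        allowed = [n for n in neighbors if not is_tabu(n, tabu_list)]
        if not allowed:
            continue
        current = allowed[0]  # best non-tabu neighbor, even if it is worse
        tabu_list.append(current)
        del tabu_list[:-tabu_size]  # the memory keeps only recent solutions
        if score(current) < best_score:
            best, best_score = current, score(current)
    return best, best_score


start = [3.0, -2.0, 1.0]
best, best_score = tabu_search(start)
print("start score:", score(start), "best score:", best_score)
```

Because the best non-tabu neighbor is accepted even when it scores worse than the current solution, the search can leave a local minimum, while the tabu list stops it from immediately returning to it.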

SOFTWARE USED FOR MOLECULAR DOCKING

AutoDock

AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure. Current distributions of AutoDock consist of two generations of software: AutoDock 4 and AutoDock Vina. AutoDock 4 consists of two main programs: autodock, which performs the docking of the ligand to a set of grids describing the target protein, and autogrid, which pre-calculates these grids. In addition to being used for docking, the atomic affinity grids can be visualized. This can help, for example, to guide synthetic organic chemists in designing better binders. AutoDock Vina does not require choosing atom types and pre-calculating grid maps for them. Instead, it calculates the grids internally for the atom types that are needed, and it does this virtually instantly.

DOCK

DOCK is one of the oldest and best-known ligand-protein docking programs. The initial version used rigid ligands; flexibility was later incorporated via incremental construction of the ligand in the binding pocket. DOCK is a fragment-based method that uses shape and chemical complementarity methods to create possible orientations for the ligand. These orientations can be scored using three different scoring functions; however, none of them contains explicit hydrogen-bonding terms, solvation/desolvation terms, or hydrophobicity terms, which limits serious use. DOCK seems to handle apolar binding sites well and is useful for fast docking, but it is not the most accurate software available.

FlexX

FlexX is another fragment-based method, using flexible ligands and rigid proteins. It uses the MIMUMBA torsion angle database for the creation of conformers. The MIMUMBA is an interaction geometry database used to exactly describe intermolecular interaction patterns. For scoring, the Boehm function (with minor adaptations necessary for docking) is applied. FlexX is introduced here to emphasize the importance of scoring functions. Although FlexX and DOCK are both fragment-based methods, they produce quite different results. In contrast to DOCK, which performs well with apolar binding sites, FlexX shows the opposite behavior. It has a slightly lower hit rate than DOCK but provides better Root Mean Square Deviation (RMSD) values for compounds with a correctly predicted binding mode. There is an extension of FlexX called FlexE, with flexible receptors, which has been shown to produce better results with significantly lower running times.
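The RMSD used to compare a predicted binding mode with a reference structure can be computed as in the minimal sketch below. It assumes that both poses list the same atoms in the same order and are already expressed in the same receptor coordinate frame, so no superposition is performed; the coordinates are made up for illustration.

```python
import math


def rmsd(pose, reference):
    """Root mean square deviation between two poses with matching atom order."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose, reference))
    return math.sqrt(squared / len(pose))


# Hypothetical three-atom ligand: the docked pose is the reference
# shifted by 1 Angstrom along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(x + 1.0, y, z) for x, y, z in reference]
print("RMSD (Angstrom):", rmsd(docked, reference))
```

A lower value means the predicted pose lies closer to the reference, for example a crystallographic ligand position.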

GOLD

GOLD has won many new users during the last few years because of its good results in impartial tests. It has a good hit rate overall; however, it suffers somewhat when dealing with hydrophobic binding pockets. GOLD uses a genetic algorithm to dock a flexible ligand to a protein with flexible hydroxyl groups. Otherwise the protein is considered to be rigid. This makes it a good choice when the binding pocket contains amino acids that form hydrogen bonds with the ligand. GOLD uses a scoring function that is based on favorable conformations found in the Cambridge Structural Database and on empirical results on weak chemical interactions. The development of GOLD is currently focused on improving the computational algorithm and adding support for parallel processing.
