Dockres documentation

http://inka.mssm.edu/~mezei/dockres/dockres.html

Description of the program Dockres: Summary of the results of docking a library to a target by Autodock-4, Autodock-Vina, eHiTS, GOLD or DOCK

Mihaly Mezei
Department of Structural and Chemical Biology, Mount Sinai School of Medicine, New York, NY 10029
Mihaly.Mezei@mssm.edu
Jan. 19, 2012

The program Dockres scans the results of Autodock (Version 4), Autodock-Vina, eHiTS, GOLD or DOCK docking runs with a series of ligands. It gathers the top binders and displays a variety of statistics, both on the ligand set and on the top binding poses.

Input of the program

Besides the structure file for the target macromolecule (of the form macro.pdb*, or, for GOLD, macro.mol2), Dockres assumes that the following files are available (macro stands for the macromolecule file's name without the .pdbqs, .pdbqt, .pdb or .mol2 extension):

Except for DOCK, a file called macro_<sw>.dir listing the docking result files, where <sw> is a one-letter code for the screening software used:

A: Autodock-4
V: Autodock-Vina
E: eHiTS
G: GOLD
D: DOCK

It can be created with the script getdir.csh (or by the user with a text editor) before running Dockres. Format of the file macro_<sw>.dir:

- First record (optional): the name of the Autodock grid-parameter file or the eHiTS clip file (including the path relative to the current directory). Omit it for Autodock-Vina and GOLD.
- Second through last records: the sequence number of the ligand, followed by the name of the ligand's docking result file (.dlg for Autodock-4, .pdbqt for Autodock-Vina, .sdf for eHiTS or .mol2 for GOLD, again including the path); one record for each ligand docked.

For example, docking the ligands ligx.mol2, ligy.mol2 and ligz.mol2 with Autodock-4 to the macromolecule mm.pdbqt results in the files ligx.mm.dlg, ligy.mm.dlg and ligz.mm.dlg. The user therefore has to prepare a file called mm_A.dir with the following content:
mm.gpf
1 ligx.mm.dlg
2 ligy.mm.dlg
3 ligz.mm.dlg
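The .dir layout above is simple enough to generate or check with a short script. The Python sketch below is a hypothetical stand-in for getdir.csh (whose actual contents are not described here): `make_dir_text` writes the optional first record followed by one `<sequence number> <result file>` record per ligand, and `parse_dir_text` reads such a file back. All function names are illustrative. The sketch assumes that fields are whitespace separated, that file names contain no spaces, and that the optional first record does not itself start with an integer field.

```python
"""Hypothetical helpers for the macro_<sw>.dir file read by Dockres.

Assumptions (not taken from the Dockres source): records are whitespace
separated, file names contain no spaces, and the optional first record
(grid-parameter or clip file) is not an integer.
"""

# One-letter codes <sw> for the screening software, as listed above.
SOFTWARE_CODES = {
    "Autodock-4": "A",
    "Autodock-Vina": "V",
    "eHiTS": "E",
    "GOLD": "G",
    "DOCK": "D",
}


def dir_file_name(macro: str, software: str) -> str:
    """Build the name macro_<sw>.dir for the given docking software."""
    return f"{macro}_{SOFTWARE_CODES[software]}.dir"


def make_dir_text(result_files, first_record=None):
    """Return the file contents: optional first record, then 'seq file' lines."""
    lines = [] if first_record is None else [first_record]
    for seq, path in enumerate(result_files, start=1):
        lines.append(f"{seq} {path}")
    return "\n".join(lines) + "\n"


def parse_dir_text(text):
    """Return (first_record or None, [(seq, file), ...])."""
    first_record = None
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) >= 2 and fields[0].isdigit():
            records.append((int(fields[0]), fields[1]))
        elif not records and first_record is None:
            first_record = fields[0]  # grid-parameter or clip file
        else:
            raise ValueError(f"unexpected record: {line!r}")
    return first_record, records


if __name__ == "__main__":
    # The Autodock-4 example from the text: three ligands docked to mm.pdbqt.
    dlg_files = [f"{lig}.mm.dlg" for lig in ("ligx", "ligy", "ligz")]
    print(dir_file_name("mm", "Autodock-4"))
    print(make_dir_text(dlg_files, first_record="mm.gpf"), end="")
```

Run as a script, it prints the file name built from the macro_<sw>.dir rule followed by the four records of the example. A parse-after-write round trip is a cheap way to catch a missing sequence number before a long Dockres scan.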

Note that for GOLD, Dockres assumes that each pose is in a separate file in the result directory, with names of the form gold_soln_<structure>#l_#p.mol2, where #l is the ligand number and #p is the pose number.

In addition, Dockres needs:

- For Autodock-4: the grid-parameter file (the one with the .gpf extension, used as the input to the Autogrid run) and, optionally, a file with the flexible part of the target with .pdbqt extension (default name: macro_flex.pdbqt).
- For Autodock-Vina: optionally, a file with the flexible part of the target with .pdbqt extension (default name: macro_flex.pdbqt).
- For eHiTS runs: the clip file (in .pdb format).

Dockres can be run either interactively from a terminal or in batch mode, specifying the run parameters as command-line options. When compiled with the parallel code included, it has to be run in batch mode.

In interactive mode it starts by asking for (possibly a subset of) the following information:

- The docking software used.
- The macromolecule file name (without the .pdb* or .mol2 extension), macro. For Autodock-4 it will ask whether there is a flexible part; if the answer is positive, it will ask for the name of the flexible .pdbqt file.
- If the checkpoint file macro_<sw>.ckp is present, whether it is to be used or deleted. If use is requested, the program reads its contents and skips to the result summary segment.
- The number of top poses per ligand to consider as candidates for the top-scoring list (to 'extract').
- Whether to use or ignore the clustering of poses done by Autodock-4.
- The amount of information per ligand to be printed on the output file macro_<sw>.res.
- Optionally, the atom number of the center of the binding site. When a binding site is specified, the user has the option to select a ligand atom type to use for the distance calculation (otherwise the ligand atom nearest to the binding-site atom is used).
- The size of the region around each ligand pose where target atoms will be searched for contact.
  This region is defined by finding the smallest rectangular box around the ligand (aligned with the coordinate axes) and extending it in every direction ('padding') by a user-defined amount (default: 7 Å).
- Optionally, a target ligand file to which the poses can be compared.

Once this information is given, the docking result files are read and the data are extracted from each. Besides the coordinates of the pose, the program extracts two 'scores': for Autodock, the energy and free energy estimates, and for eHiTS, the values labeled energy and score. This may take some time; for larger libraries the program periodically prints a progress report. Once the data are gathered, a checkpoint file is written and the result summary starts.

The result summary starts by printing on the terminal the list of the top-scoring poses, the number of poses in the top-score ranges, and a plot showing the distribution of the location of poses over the macromolecule's residues. The program then gives the user the option to:

- For Autodock, rescore the free energies based on the multiplicity of each pose.
- Limit the number of poses listed for the same ligand.

- Calculate the RMSD between different top-scoring poses of the same ligand.
- Extract docked poses. The default is to extract a PDB file with the macromolecule and the selected top-scoring poses, but the user can also specify the list of poses to include. For a rigid macromolecule a single file is generated, with the different poses added to the macromolecule as additional residues (residue name LIG, chain id L). For a flexible macromolecule, each pose results in a complete file with the macromolecule and the ligand; the file name is a combination of the name of the macromolecule, the ligand and the pose number.
- Generate the same distribution restricted to a set of ligands specified by the user.
- Repeat the calculation of the various statistics. At this point, additional filtering criteria can be added:
  - a new value of the number of poses/ligand to extract;
  - a list and/or range of residue numbers of the macromolecule atom nearest to the docked pose;
  - the minimum and maximum charge of the ligand;
  - if a target ligand file was specified at the outset of the run, the minimum number of contacts between the ligand and the target and the maximum average contact distance between the ligand and the target.

In batch mode the following information can be specified:

-sw software: the docking software, one of ATD4, eHiTS or GOLD
-mm macro: the macromolecule name (without the .pdb* extension)
-ne number of poses to extract
-ic 'yes' or 'no' {yes}: use or don't use the Autodock clustering
-ol output level {2}: a number between 0 and 3; a higher number results in more detailed output from the result scan
-ib binding site information: the solute atom index from which ligand distances will be gathered for later filtering
-cb binding site center: the x, y and z coordinates of the putative binding site
-lt binding site type: the ligand atom type for which ligand-target distances will be gathered for later filtering. If not specified, the nearest ligand atom will be used.
  Ligand type number list:

  H :H-C   = 1     N :>N<   =21     O :C-O-C =41
  H :H-N   = 2     N :-N<   =22     O :C-O-N =42
  H :H-O   = 3     N :-N=   =23     O :C-O-O =43
  H :H-P   = 4     N :-N-=C =24     O :C-O-P =44
  H :H-S   = 5     N :*N*   =25     O :C-O-S =45
  H :H*    = 6     O :C-O-H =31     O :*OX*  =46
  C :>C<   =11     O :N-O-H =32     O-:C-On  =51
  C :>C=   =12     O :O-O-H =33     O-:P-On  =52
  C :C=-C  =13     O :P-O-H =34     O-:S-On  =53
  C :C=-N  =14     O :S-O-H =35     O-:*On*  =54
  C :Carom =15     O :C=O   =36     P :P*    =58
  C :*C*   =16     O :P=O   =37     S :S*    =59
                   O :S=O   =38     **:*     =60

-rb binding site distance limit {100000}: ligand poses farther than this limit will be filtered out
-bb minimum backbone length: peptide ligands shorter than this number (in Å) will be dropped
-we ligand padding {7.0}: the value to add to the ligand coordinates when searching for nearby macromolecule atoms
-nx number of poses {0} to save in the PDB file
-py 0 or 1 {0}: create a PyMOL run file to accompany the PDB file (when the input is 1)
-nl number of top poses {20} to list with detailed info
-nd maximum number of poses per ligand to list
-sd 0 or 1 {1}: use the name in the .sdf file in the detailed listing (when the input is 1)
-fu free energy units {kcal/mol}, either kcal/mol or Ki: the free energy scores can be printed either in kcal/mol or as the inhibition constant Ki (a concentration); Ki = exp(FE/(RT)), with FE in kcal/mol and R the gas constant.

-tl number of top-scoring ligand ids to list in a file called macro_<sw>.lst
-rc 1: repeat the run starting from the checkpoint file macro_<sw>.ckp
-sm 0 or 1 {1}: combine the isomers/tautomers into a single entry in the file macro_<sw>.lst
-tg targetfile: read the structure in the file targetfile; the top-scoring ligand poses can be compared to it
-td number of top DOCK ligands to consider

A possible batch run call is

> dockres -mm hemoglobin -sw eHiTS -np 20 -ol 2 -ib 99

For the rest of the input that can be specified in interactive mode, defaults are used. Batch runs with a flexible macromolecule have not yet been implemented. Batch runs use default values for several filtering and output options. To use a non-default value for an option that has no command-line input, an interactive run is required; it can be started from the checkpoint file and will not be CPU intensive, since the time-consuming data gathering has already been completed.

Output of the program

Dockres creates the following files:

- A file called macro_<sw>.res, where all results are printed. If it is already present, the output is written instead to macro_<sw>_N.res, where N is the smallest integer such that no file with that number exists.
- A file called macro_<sw>.ckp, containing all the information gathered; it allows the repeated extraction of data with different filtering criteria without having to repeat the time-consuming scan of the .dlg files.
- If requested, PDB file(s) containing the extracted ligand poses with the macromolecule.

The file macro_<sw>.res contains:

- The docking grid description.
- The information requested about the docking of each ligand.
- The distribution of a number of molecular properties in the ligand set:
  - number of hydrogen-bond donors
  - number of hydrogen-bond acceptors
  - number of rotatable bonds
  - number of NO2 groups
  - number of rings
  - molecular weight
  - charge
  - for each chemical element occurring in the ligand set, the number of such atoms
  A typical example of such output (for the number of rotatable bonds in a ligand) is
  Distribution of number of rotatable bonds over 11 ligands   Average= 4.1818 S.D.= 1.6414

        .00  .00  .18  .18  .18  .36  .00  .00  .09  .00
       +----+----+----+----+----+----+----+----+----+----+
  1.00 |    |    |    |    |    |    |    |    |    |    |
   .90 |    |    |    |    |    |    |    |    |    |    |
   .80 |    |    |    |    |    |    |    |    |    |    |
   .70 |    |    |    |    |    |    |    |    |    |    |
   .60 |    |    |    |    |    |    |    |    |    |    |
   .50 |    |    |    |    |    |    |    |    |    |    |
   .40 |    |    |    |    |    |****|    |    |    |    |
   .30 |    |    |    |    |    |****|    |    |    |    |
   .20 |    |    |****|****|****|****|    |    |    |    |
   .10 |    |    |****|****|****|****|    |    |****|    |
       +----+----+----+----+----+----+----+----+----+----+
          0    1    2    3    4    5    6    7    8    9

  Here the X axis is the value of the property for which the distribution is calculated; the Y axis is the fraction of ligands having a particular value of the property; the numbers on the top give the actual fractions. In this example, the highest column is for 5, meaning that the most common number of rotatable bonds in this library is 5. The column reaches 0.4, meaning that between 30% and 40% of the ligands have 5 rotatable bonds; the number on top (0.36) shows that the actual fraction is 36%.
- For both score types extracted (energy and free energy, or energy and score), the number of poses having a docking score in 0.5 kcal/mol intervals within 5 kcal/mol of the best score. This list helps in deciding the number of top-scoring poses to extract; this number has to be specified at this point.
- The list of top-scoring ligands, giving:
  - the score (energy, free energy or inhibition constant for Autodock; energy or score for eHiTS; fitness or free energy for GOLD)
  - the number of times this pose was found (multiplicity)
  - the ligand identifier, as deduced from the file name read from the file macro_<sw>.dir
  - the ligand sequence number in the file macro_<sw>.dir
  - the ligand molecular weight
  - the number of hydrogen-bond donors of the ligand molecule
  - the number of hydrogen-bond acceptors of the ligand molecule
  - the number of rotatable bonds of the ligand molecule
  - the number of NO2 groups in the ligand molecule
  - the number of rings in the ligand molecule
  - the number of ligand-macromolecule hydrogen bonds
  - the total charge of the ligand molecule
  - the macromolecule atom (index, name, residue number) nearest to the ligand
  - the chemical formula of the ligand
  - the list of ligand-target contact atom pairs.
    Contact is defined as a pair of target and ligand atoms that are mutually nearest (i.e., if the ligand atom nearest to target atom iat is ial, then iat is the target atom nearest to ial).
- If requested (whenever more than one pose per ligand was allowed to enter the top-scoring list), the RMSD between pairs of poses of the same ligand that are on the top-scoring list.
- The distribution of the number of poses with respect to the macromolecule residues. A typical example (shown only for residues 51-150) looks like this:
  [Plot: number of poses (Y axis, 1 to 10) docked at each macromolecule residue (X axis), in two panels for residues 51-100 and 101-150.]

  The height of the column is proportional to the number of ligands docked to the residue represented on the X axis. The residues to which a ligand is docked can be assigned either by using the residue of the closest contact or by using all residues that include atoms on the contact list (vide supra).
- The distribution of the docking free energies or eHiTS scores over the macromolecule residues. This is represented by a plot as shown below (from the same run as the example above):
  Largest count= 7
  [Plot: docking free energy (Y axis, from -4.40 down to -7.15) versus residue number (X axis), in two panels for residues 51-100 and 101-150.]

  Here again the X axis represents the residue number and the Y axis the docking energy or free energy. Wherever a number appears, it indicates that ligands were docked to the corresponding residue with the corresponding docking (free) energy. The number (between 0 and 9) is proportional to the number of ligands in that category. M (instead of a digit) marks the residue/energy combination with the largest number of members (the value is given before the plots); the digits 0-9 represent proportionally smaller numbers of members.

Compilation of the program

The program is written in Fortran 77. Its size is governed by the following parameters, established in the first line of the program (the number between the braces is the value set in the source code):

MAXMACA {20000} - maximum number of atoms in the macromolecule
MAXMACR {5000} - maximum number of residues in the macromolecule
MAXMOL {15000} - maximum number of ligands
MAXPOSE {200} - maximum number of poses per ligand
MAXLIG {1000} - maximum number of atoms per ligand
MAXSYN {3} - maximum number of docking syntaxes allowed

It should be compiled at the highest optimization level for maximum speed. For example, using the g77 compiler the compilation can be executed by

> g77 -O4 -o dockres.exe dockres.f

The optional parallelization uses the MPI library. To compile Dockres with the parallel code included, first remove the 'C@DM' string from the source code:

> sed 's/C@DM//' dockres.f > dockres_mpi.f
> f77mpi -o dockres -O4 dockres_mpi.f

The name of the MPI-enabled compiler may be different on your system, and additional libraries may also need to be invoked. For parallelized runs, the parameter MAXMOL can be set to less than the total number of ligands; it should be just large enough to hold the data for Nmolec/NCPU ligands. In this case, however, the program stops after writing the checkpoint file, and a separate single-CPU run, compiled with MAXMOL set large enough to hold all ligands, should be used to print/save the results. This option is useful for distributed-memory systems where the majority of nodes have relatively small memory. Note that if Fortran 90 is used for one compilation, it should be used for the other as well; otherwise the binary files will be incompatible.
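The sizing rule for parallel runs amounts to a ceiling division. The Python sketch below assumes (this is not stated in the Dockres documentation) that the Nmolec ligands are split into nearly equal contiguous blocks, one per CPU; under that assumption the largest block, and hence the smallest usable MAXMOL, is ceil(Nmolec/NCPU) rather than the truncated quotient. The function names and the 15001-ligand/16-CPU figures are illustrative only.

```python
def block_sizes(n_molec: int, n_cpu: int) -> list[int]:
    """Ligands per process when n_molec ligands are split as evenly as possible.

    Assumption: an even block split; the actual Dockres distribution may differ.
    """
    base, extra = divmod(n_molec, n_cpu)
    # The first `extra` processes take one additional ligand each.
    return [base + 1 if rank < extra else base for rank in range(n_cpu)]


def min_maxmol_parallel(n_molec: int, n_cpu: int) -> int:
    """Smallest MAXMOL that holds the largest per-process block."""
    if n_molec < 1 or n_cpu < 1:
        raise ValueError("n_molec and n_cpu must be positive")
    return -(-n_molec // n_cpu)  # integer ceiling of n_molec / n_cpu


if __name__ == "__main__":
    n_molec, n_cpu = 15001, 16
    sizes = block_sizes(n_molec, n_cpu)
    assert sum(sizes) == n_molec
    print(min_maxmol_parallel(n_molec, n_cpu), max(sizes))  # 938 938
    # The separate single-CPU run that prints/saves the results still needs
    # MAXMOL >= n_molec, i.e. at least 15001 here.
```

Using the truncated quotient (937 in this case) would leave the processes that receive one extra ligand without room for it, which is why the ceiling is used.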
