EXP NO: 15
DATE:
DOCKING
TYPES OF DOCKING:
i. RIGID BODY DOCKING: The receptor and ligand are treated as rigid. Bond
lengths, bond angles and torsion angles of the components are not modified at any stage of
complex generation. Rigid body docking is inadequate when substantial
conformational change occurs within the components during complex formation.
ii. FLEXIBLE LIGAND DOCKING: The receptor is kept rigid, whereas the ligand is treated
as flexible.
iii. FLEXIBLE DOCKING: Both receptor and ligand are treated as flexible, which permits
substantial conformational change in both.
The receptor's molecular surface is described in terms of its solvent-accessible surface
area.
The ligand's molecular surface is described in terms of its matching surface
description; complementarity between the two surfaces helps in finding the complementary
pose of the ligand on the target.
SIMULATION PROCESS:
2. The ligand finds its position in the protein active site after a certain number of moves in
its conformational space.
3. Conformational space consists of all possible orientations and conformations of the
protein paired with the ligand.
4. Each snapshot of the pair is referred to as a “pose”.
5. A move incorporates rigid-body transformations and torsional rotations.
6. Each conformation contributes to the energy of the system.
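The rigid-body part of a move can be sketched as below. This is only an illustration, not the algorithm of any particular docking program: the function name, the choice of the z-axis as rotation axis and the three-atom "ligand" are all assumptions made for the example.

```python
import math

def rigid_body_move(coords, angle_deg, translation):
    """Rotate each (x, y, z) point about the z-axis, then translate it."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    tx, ty, tz = translation
    moved = []
    for x, y, z in coords:
        # Rotation about z leaves the z coordinate unchanged.
        xr = x * cos_t - y * sin_t
        yr = x * sin_t + y * cos_t
        moved.append((xr + tx, yr + ty, z + tz))
    return moved

# Hypothetical three-atom ligand; coordinates in angstroms.
ligand = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# One "move": rotate by 90 degrees and shift by (1, 2, 3). The result is a new pose.
pose = rigid_body_move(ligand, 90.0, (1.0, 2.0, 3.0))
```

Because the transformation is rigid, all interatomic distances within the ligand are unchanged; only its position and orientation relative to the receptor differ between poses.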
General steps:
1. Preparation of ligand
Energy minimization
Geometry optimization
Charge calculations
2. Preparation of proteins
3. Docking calculations
4. Protein ligand complex representations
DOCKING SERVER:
Docking server is a web-based molecular docking program useful in high-throughput
screening. It allows efficient and robust docking calculations by integrating several software packages.
FEATURES:
STEPS:
1. Preparation of ligand
Geometry optimization
Energy minimization
Charge calculation
Preparation of proteins
2. Docking calculation
3. Protein-ligand complex representation
I. PREPARATION OF LIGAND:
Ligands can be drawn using a Java applet or uploaded in an appropriate file format (MDL
mol, Sybyl mol2, PDB, HyperChem hin, SMILES format, SDF format).
Parameters Selection:
Desired pH
Molecular mechanics / semi-empirical quantum chemical calculation
parameters.
Rotatable bonds and atom types.
Draw a ligand / Upload a ligand / Upload multiple ligands
Setup parameter
Parameter Selection:
Protein chain, heteroatom, Ligands and water selection.
Simulation box set-up.
Set up
Parameters Selection:
Protein, ligand, simulation types, number of and number of evaluation selection.
Start docking
PROCEDURE:
http://www.dockingserver.com/web/docking/
Step 2: click MY PROTEINS link and use upload or download option to load protein.
Step 3: enter PDB ID (3QMO) or protein name to download protein from PDB
WA. An efficient shape algorithm is used and flexible ligand docking is possible,
where the ligand is described as a torsion tree and grids are constructed that overlay the
binding site.
1) Receptor structures are complicated; they frequently change shape and solvent
structure upon binding to a ligand.
2) The number of possible conformations rises exponentially with the number of rotatable
bonds.
3) Calculating the differential affinity between two related ligands using thermodynamic
methods is time-consuming.
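The exponential growth in point 2 can be made concrete with a small calculation. The sketch assumes that every rotatable bond is sampled at the same fixed number of torsion angles; the default of 6 states per bond (60 degree steps) is an assumption chosen only for illustration.

```python
def conformer_count(rotatable_bonds, states_per_bond=6):
    """Number of torsion combinations when each bond has a fixed set of states."""
    return states_per_bond ** rotatable_bonds

# The search space grows very quickly with the number of rotatable bonds.
for n in (2, 5, 10):
    print(n, "rotatable bonds:", conformer_count(n), "conformations")
```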
RESULT: The best-fit orientation of the ligand in the protein receptor was determined using
docking.
EXP NO: 16
DATE:
Deoxyribonucleic acid (DNA) is the double-stranded genetic material present in most organisms.
Genomic DNA differs between prokaryotes and eukaryotes.
Prokaryotes include bacteria and archaea, which have relatively small genomes with sizes
ranging from 0.5 to 10 Mbp. The gene density in these genomes is very high, since very few
repetitive sequences are present.
In bacteria, the majority of genes have the start codon ATG, which codes for methionine. Other
codons such as GTG and TTG, along with ATG, form the initiation codons that start the
process of translation. The initiation codon is identified with the help of the Shine-Dalgarno
sequence, a purine-rich stretch complementary to the 3' end of the 16S rRNA in the
ribosome. At the end of a gene there is a stop codon, and a poly-T stretch is present. Many
prokaryotic genes are transcribed together as one operon.
Eukaryotic genomes are much larger than prokaryotic ones, with sizes ranging from 10 Mbp to
670 Gbp. They tend to have very low gene density, since the space between genes is often very
large and rich in repetitive sequences. Most importantly, eukaryotic genomic DNA is characterized
by a mosaic organization in which a gene is split into pieces called ‘exons’ by intervening
non-coding sequences called ‘introns’.
The nascent transcript from a eukaryotic gene is modified in three ways before
undergoing translation: 5' capping, splicing and 3' polyadenylation. Eukaryotic
genes carry a Kozak sequence around the start codon and a poly-A signal at the termination,
which help locate the final coding sequence. Splicing of introns and exons follows the GT-AG
rule, in which an intron begins with GT and ends with AG; the 5' splice site has GTAAGT as a
consensus motif. The
CG island is a short stretch of DNA in which the frequency of the CG sequence is higher than
in other regions. It is also called the CpG island, where “p” simply indicates that “C” and “G”
are connected by a phosphodiester bond. A hidden Markov model (HMM) can be used to decide
whether a given short sequence comes from a CpG island or not. An HMM can also be trained to find
the CpG islands in a long sequence.
140 DECCAN SCHOOL OF PHARMACY
PHARMACOINFORMATICS PRACTICAL RECORD 2018-2019
CpG islands (CGIs) are often located around the promoters of housekeeping genes (which are
essential for general cell functions) or other genes frequently expressed in a cell. At these
locations, the CG sequence is not methylated. By contrast, the CG sequences in inactive
genes are usually methylated to suppress their expression. Methylation of promoter-associated
CGIs plays an important role in gene regulation and carcinogenesis. Because of
this functional importance, multiple algorithms are available for identifying CGIs in a
sequence.
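One simple way to test whether a stretch of DNA looks like a CpG island is the observed/expected CpG ratio combined with the GC content. This is not the HMM approach mentioned above but a simpler sliding-criterion check; the thresholds used as defaults (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6) are commonly quoted values and are treated here as assumptions, as are the function names.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # occurrences of the CG dinucleotide
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C count * G count) / length.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the length, GC-content and observed/expected thresholds."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp
```

In practice such a check is applied to windows sliding along a long sequence, and neighbouring windows that pass are merged into one island.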
Functional units of DNA (genes) can be identified by different gene-finding programs, such
as GeneMark, GlimmerM, GRAIL, GenScan, and Fgenes. RepeatMasker is a program that
screens DNA sequences for interspersed repeats and low complexity DNA sequences.
DATE:
AIM: To predict the exon and intron regions of the given DNA sequence (Homo sapiens
leptin, LEP, 3444 bp, NM_000230)
PROCEDURE:
Retrieve the query DNA sequence in FASTA format from the nucleotide
database.
Open http://genes.mit.edu/GENSCAN.html and paste the sequence into the input box.
Click the run button.
As soon as the run button is clicked, prediction of the gene
structure begins.
The results are obtained after the analysis.
RESULT: Two exons have been predicted from regions 84 to 587 and 1676 to 1681.
DATE:
SPLICE PREDICTOR
AIM: To predict the splice sites of the given DNA sequence (Homo sapiens leptin, LEP,
3444 bp, NM_000230)
PROCEDURE:
Retrieve the query DNA sequence in FASTA format from the nucleotide
database.
Open http://deepc2.psi.iastate.edu/cgi-bin/sp.cgi and paste the sequence into the input
box.
Click the run button.
As soon as the run button is clicked, prediction of the splice
sites begins.
The results are obtained after the analysis.
RESULT: 16 splice sites have been predicted using splice site prediction by neural
networks.
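The GT-AG rule on which splice site prediction rests can be illustrated with a naive scan. This sketch only lists every GT and AG dinucleotide as a candidate; it is not the neural-network method used by the server, and real predictors score the surrounding sequence to reject the many false candidates. The function names are hypothetical.

```python
def candidate_splice_sites(seq):
    """Return 0-based positions of GT (donor) and AG (acceptor) dinucleotides."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors

def obeys_gt_ag(intron):
    """True if a candidate intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")

donors, acceptors = candidate_splice_sites("AAGTCCAGTT")
```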
EXP NO: 17
DATE:
Protein structure prediction is the prediction of the three-dimensional structure of a protein
from its amino acid sequence, that is, the prediction of a protein's tertiary structure from its
primary structure. It is one of the most important goals pursued by bioinformatics and
theoretical chemistry.
With no homologue of known structure from which to make a 3D model, a logical next step
is to predict secondary structure. Although the methods differ, the aim of secondary
structure prediction is to provide the location of alpha helices and beta strands within a
protein or protein family.
There are now many web servers for structure prediction; here is a quick summary:
Secondary structure prediction has been around for almost a quarter of a century. The early
methods suffered from a lack of data. Predictions were performed on single sequences rather
than families of homologous sequences, and there were relatively few known 3D structures from
which to derive parameters. Probably the most famous early methods are those of Chou and
Fasman, Garnier, Osguthorpe and Robson (GOR), and Lim. Although the authors
originally claimed quite high accuracies (70-80%), under careful examination the methods
were shown to be only between 56 and 60% accurate (see Kabsch and Sander, 1984, given
below). An early problem in secondary structure prediction had been the inclusion of the
structures used to derive parameters in the set of structures used to assess the accuracy of the
method.
DATE:
PROCEDURE:
REPORT: Results were obtained representing secondary structure elements with their
single-letter codes: helix (h), beta sheet (e), coil (c).
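A prediction reported as a string of h, e and c letters can be summarised as percentages of each state. The sketch below assumes the output has already been copied into a plain string; the function name and the example string are hypothetical and do not come from any server.

```python
from collections import Counter

# Single-letter codes as used in the report above.
STATE_NAMES = {"h": "helix", "e": "beta sheet", "c": "coil"}

def composition(prediction):
    """Percentage of residues assigned to each secondary structure state."""
    counts = Counter(prediction.lower())
    total = sum(counts[state] for state in STATE_NAMES)
    if total == 0:
        return {}
    return {name: 100.0 * counts[state] / total
            for state, name in STATE_NAMES.items()}

# Hypothetical ten-residue prediction string.
summary = composition("hhhheecccc")
```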
DATE:
JPred
PROCEDURE:
REPORT: Results were obtained representing secondary structure elements with their
single-letter codes: helix (h), beta sheet (e).
DATE:
SOPMA
PROCEDURE:
REPORT: Results were obtained representing secondary structure elements with their
single-letter codes: helix (h), beta sheet (e), coil (c).
EXP NO: 18
DATE:
PRINCIPLE:
HOMOLOGY MODELING:
The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment gaps
(commonly called indels) that indicate a structural region present in the target but not in the
template, and by structure gaps in the template that arise from poor resolution in the
experimental procedure (usually X-ray crystallography) used to solve the structure. Model
quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean
square deviation between the matched Cα atoms at 70% sequence identity but only 2-4 Å
agreement at 25% sequence identity. However, the errors are significantly higher in the loop
regions, where the amino acid sequences of the target and template proteins may be
completely different.
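The root mean square deviation quoted above can be computed from two lists of matched Cα coordinates. The sketch assumes the two structures have already been superimposed and that the atoms are listed in matching order; it does not perform the superposition itself, and the coordinates shown are made up for the example.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical matched C-alpha positions (angstroms) from a model and a reference.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(model, reference)
```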
Homology modeling can produce high-quality structural models when the target and template
are closely related, which has inspired the formation of a structural genomics consortium
dedicated to the production of representative experimental structures for all classes of protein
folds.
The homology modeling procedure can be broken down into four sequential steps: template
selection, target-template alignment, model construction, and model assessment. The first two
steps are often essentially performed together, as the most common methods of identifying
templates rely on the production of sequence alignments; however, these alignments may not
be of sufficient quality because database search techniques prioritize speed over alignment
quality. These processes can be performed iteratively to improve the quality of the final
model, although quality assessments that are not dependent on the true target structure are
still under development.
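Since model quality depends on target-template sequence identity, it is useful to compute that identity from the alignment. The sketch below assumes two already aligned sequences of equal length with "-" as the gap character, and counts identity over positions where neither sequence has a gap; other definitions of the denominator exist, so this choice is an assumption.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over aligned (non-gap) positions."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # indel position: not counted as aligned
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0

# Hypothetical alignment fragment with one indel.
identity = percent_identity("ACDEFG", "ACD-FA")
```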
RAMACHANDRAN PLOT:
Phi (Φ)    Psi (Ψ)
DATE:
REQUIREMENTS:
Websites: NCBI- http://www.ncbi.nlm.nih.gov/protein/
http://www.expasy.org/swissmod/SWISS-MODEL.html
PROCEDURE:
REPORT: The three-dimensional structure of the given protein sequence was predicted
using homology modeling.
EXP NO: 19
DATE:
Structured Query Language (SQL) is a computer language designed for managing data in
relational database management systems. It was originally based upon relational algebra and
calculus. In common usage SQL also encompasses the data manipulation language (DML), used
for inserting, updating and deleting data, and the data definition language (DDL),
used for creating and modifying tables and other database structures.
PROPERTIES:
7) A column containing cost value is a foreign key, which defines how tables relate to
each other.
8) A field may have no value in it; this is called a null value.
9) A field is found at the intersection of a row and a column; there can be only one value in it.
SQL FEATURES:
SQL can be used by a range of users, including those with little or no programming
knowledge.
It is a non-procedural language.
It reduces the amount of time required for creating and maintaining systems.
SQL RULES:
A statement starts with a verb; each verb is followed by a number of clauses, and a space ( )
separates clauses.
A comma (,) separates parameters within a clause.
A semicolon (;) is used to end an SQL statement.
Statements may be split across lines, but keywords may not.
Lexical units such as identifiers, operator names, and literals are separated by one or
more spaces or other delimiters that will not be confused with the lexical units.
Reserved words may not be used as identifiers unless enclosed in double quotes.
Identifiers can contain up to 30 characters and must start with an alphabetical
character.
Character and date literals must be enclosed within single quotes.
SQL Delimiters:
Delimiters are symbols or compound symbols which have a special meaning within SQL
statements.
Arithmetic operators: +, -, *, %
Relational operators: >, <, >=, <=, =, <>
Logical operators: AND, OR, NOT
Special operators: BETWEEN ... AND, NOT BETWEEN
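The rules and operators above can be tried out with a small in-memory database. The sketch uses Python's built-in sqlite3 module only as a convenient way to run SQL; the table name, column names and rows are hypothetical, and SQLite's dialect differs in details (for example identifier length limits) from other database systems.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: the statement starts with a verb (CREATE) and ends with a semicolon.
cur.execute(
    "CREATE TABLE compound ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " mol_weight REAL);"
)

# DML: character literals go in single quotes; NULL marks a field with no value.
cur.execute("INSERT INTO compound (name, mol_weight) VALUES ('benzene', 78.11);")
cur.execute("INSERT INTO compound (name, mol_weight) VALUES ('aspirin', 180.16);")
cur.execute("INSERT INTO compound (name, mol_weight) VALUES ('unknown', NULL);")

# Special operator BETWEEN ... AND restricts the rows to a range.
in_range = cur.execute(
    "SELECT name FROM compound"
    " WHERE mol_weight BETWEEN 100 AND 200"
    " ORDER BY name;"
).fetchall()

# A null value is tested with IS NULL, not with a relational operator.
missing = cur.execute(
    "SELECT name FROM compound WHERE mol_weight IS NULL;"
).fetchall()

conn.close()
```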
EXP NO: 20
DATE:
BIO PERL
INTRODUCTION:
PERL (Practical Extraction and Report Language) is a programming language available for
most operating systems. Using Perl, a series of complex tasks can be reduced to a single
statement. Perl is preferred for sequence analysis and database management. A
Perl program consists of a text file containing a series of Perl statements. Statements look
like an amalgam of C, UNIX shell script and English.
BioPerl is a collection of Perl modules that facilitates the development of Perl scripts for
bioinformatics applications. It has played an integral role in the Human Genome Project. It is a
popular toolkit developed as a collection of integrated Perl modules for transforming and
manipulating sequence data and annotations, accessing remote databases and parsing output
from programs such as BLAST, FASTA etc. BioPerl also facilitates local execution of
programs from the EMBOSS suite. BioPerl saves time and effort.
In order to take advantage of BioPerl, the user needs a basic understanding of the Perl
programming language, including an understanding of how to use Perl references, modules,
objects and methods. BioPerl is open source software that is still under active
development. The advantages of open source software include the ability to freely examine
and modify source code and exemption from software licensing fees.
FEATURES:
BioPerl provides software modules for many of the typical tasks of bioinformatics
programming. These include:
Accessing nucleotide and peptide sequence data from local and remote databases
Transforming formats of database/file records
#!/usr/bin/perl
use strict;
Each statement ends with a semicolon (;); this is the way that the programmer tells Perl that a
statement is complete.
The first statement, the use strict; pragma, enforces the strictest possible rules on the code. It is
imperative that beginners use this pragma, as it will help them understand the errors in their
code easily.
The second line is a simple print function, which takes a string as its argument and sends that
string to the standard output. The next line introduces the variable $username, which starts with
a $ and is therefore said to be a “scalar variable”. Scalar variables can hold strings.
The chomp function removes any newline character at the end of a variable.
The final statement is another print statement; it uses the value of the $username variable to
greet the user by name.
Perl has three data types: scalar, array and hash.
1) SCALAR TYPE: A scalar is a single number or a string. The most basic kind of
variable in Perl is the scalar variable. Scalars are remarkable in that strings and
numbers are completely interchangeable.
For example,
$priority = 9;
This statement sets the scalar variable $priority to 9, but you can also assign a string to
exactly the same variable.
$priority = "high";
2) ARRAY TYPE: An array is a list of scalar values.
Example
@my_ary = ("A", "C", "G", "T");
One element of a list is a scalar. We should write $my_ary[1] instead of @my_ary[1]
to get access to the second element of the corresponding list.
3) HASH TYPE: A hash is basically an array in which the elements are addressed by
keywords.
A hash can, for example, represent a translation table from base triplets to amino acids; to find
out which amino acid “AAT” is translated to, we look up the hash element whose key is “AAT”.
BIO-PERL OBJECTS: BioPerl objects are all packed into modules; each module
consists of the necessary subroutines. A module is a reusable piece or collection of Perl code
that can be called from your program to carry out specific tasks.
use Bio::Seq;
a) Sequence objects: Seq is the central sequence object in BioPerl. When in doubt,
this is probably the object that you want to use to describe a DNA, RNA or protein
sequence in BioPerl. Most common sequence manipulations can be performed with
Seq.
There are seven different sequence objects: Seq, PrimarySeq,
LocatableSeq, LiveSeq, LargeSeq, SeqI and
SeqWithQuality.
b) Alignment objects (SimpleAlign): this module allows the user to convert between
alignment formats as well as to perform more sophisticated operations, like extracting specific
regions of the alignment and generating consensus sequences.
c) Location objects: a location object is designed to be associated with a sequence feature
object to indicate where on a larger structure the feature can be found.
The reason why this simple concept has evolved into a collection of rather complicated
objects is that some objects have multiple locations or sub-locations.
d) Interface objects and implementation objects: an interface is solely the definition of what
methods one can call on an object, without any knowledge of how it is implemented.
An implementation is an actual, working implementation of an object. In BioPerl
the interface objects usually have names like Bio::MyObjectI,
with the trailing I indicating that it is an interface object. The interface objects mainly
provide documentation on what the interface is and how to use it, without any
implementation.
BioPerl provides software modules for many of the typical tasks of bioinformatics
programming. These include:
APPLICATIONS OF BIO-PERL:
PROCEDURE:
REPORT: The given DNA sequence was converted into RNA using BIO-PERL.
EXP NO: 21
DATE:
WEBSITE: www.pubchem.ncbi.nlm.nih.gov
PubChem is a database of chemical molecules and their activities against biological assays.
The system is maintained by the National Center for Biotechnology Information (NCBI), a
component of the National Library of Medicine, which is part of the United States National
Institutes of Health (NIH). PubChem can be accessed for free through the Web user interface.
Millions of compound structures and descriptive datasets can be easily downloaded via FTP.
PubChem contains substance descriptions and small molecules with fewer than 1000 atoms
and 1000 bonds. The American Chemical Society tried to get the U.S. Congress to restrict the
operation of PubChem, because they claim that it competes with their Chemical Abstracts
Service. More than 80 database vendors contribute to the growing PubChem database.
PUBCHEM DATABASES:
stored within PubChem. Compounds are pre-clustered and cross-referenced by identity and
similarity groups. PubChem Compound includes over 5M compounds.
Molecular Name Searches (e.g., Tylenol, Benzene) allow searching with a variety of chemical
synonyms.
Chemical Property Range Searches (e.g., molecular weight between 100 and 200,
hydrogen bond acceptor count between 3 and 5) allow searching for compounds with a
variety of physical / chemical properties and descriptors.
Simple Elemental Searches (e.g., all compounds containing Gallium) allow searching with
specific restrictions.
Molecule Synonym Searches (e.g., all substances with “deoxythymidine” as a name fragment,
or substances that contain 3’-azido-3’-deoxythymidine).
Biological Links Searches (e.g., substances with tested, active or inactive bioassays).
Combined Searches (e.g., substances that are “Active in any BioAssay” and contain the
element Ruthenium).
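The logic of a chemical property range search can be sketched on a small local list. This does not query PubChem and is not its interface: the record layout, the function name and the property values (typed in by hand for illustration) are all assumptions.

```python
# Hand-entered illustrative records; "hba" is the hydrogen bond acceptor count.
compounds = [
    {"name": "benzene", "mol_weight": 78.11, "hba": 0},
    {"name": "aspirin", "mol_weight": 180.16, "hba": 4},
    {"name": "caffeine", "mol_weight": 194.19, "hba": 3},
    {"name": "glucose", "mol_weight": 180.16, "hba": 6},
]

def range_search(records, mw_range, hba_range):
    """Names of records whose properties fall inside both inclusive ranges."""
    mw_low, mw_high = mw_range
    hba_low, hba_high = hba_range
    return [
        rec["name"]
        for rec in records
        if mw_low <= rec["mol_weight"] <= mw_high
        and hba_low <= rec["hba"] <= hba_high
    ]

# Molecular weight between 100 and 200, acceptor count between 3 and 5.
hits = range_search(compounds, (100, 200), (3, 5))
```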
PUBCHEM BIOASSAY:
To browse or download PubChem BioAssay Results (NCI AIDS Antiviral Assay).
SEARCHING PUBCHEM
PubChem Text Search is for searching by compound name, synonym or ID and defaults to
PubChem Compound. The search result page offers a pull-down “databases” menu that
allows searching in PubChem Substance, PubChem BioAssay and a variety of other
Entrez databases.
PubChem Chemical Structure Search (7) has the following options: Search SMILES
(including SMARTS or InChI) or Formula, which includes a “sketch” link to a drawing
program that converts structural diagrams to SMILES (exact), SMARTS (structure) or
InChI (exact) strings for searching.
Clicking “DONE” in the “structure editor” converts the structural diagram to the appropriate
string and transfers it to the search box.
Select Structure File allows importing of standard and common chemical file formats.
Specify Search Type allows restriction to same compound, similar compounds, formula or
substructure.
PubChem Indexes and Index Search allow fielded/range searching from either the PubChem
homepage or the Entrez search page. An extensive list of field aliases and examples of range
searching is provided.
PubChem Compound
PubChem Compound results are derived from PubChem Substance records that provide
structures. Since compounds are structurally unique, one compound may link to multiple
substances. The default display is a compound summary with thumbnails and crosslinks to
each PubChem database, other NCBI databases and depositors’ databases.
Clicking either the structure or SID link gives the full display which includes the compound’s
property data, description, related substances information, neighboring structures and cross
links.
PubChem Substances
PubChem Substance has unique records even if the structures are not known or supplied, for
example sulfated polymannuroguluronate, a novel anti-acquired immune deficiency syndrome
(AIDS) drug candidate, and other natural products.
The PubChem Substance Summary Record is linked to the full record by clicking on the SID
number (PubChem’s Substance identifier). This displays the full record, which includes links to
PubMed and the source; the Medical Subject Annotation (MeSH Substance Name) and a
MeSH PubMed search link; and depositor-supplied synonyms and comments.
PROCEDURE: