
PHARMACOINFORMATICS PRACTICAL RECORD 2018-2019

EXP NO: 15

DATE:

DOCKING

DEFINITION: Docking is a molecular modeling technique that describes the ‘best-fit’
orientation of a ligand bound to a receptor protein. It is the process of fitting a molecule
into a model receptor-binding site to predict possible ligand-receptor interactions.

TYPES OF DOCKING:

i. RIGID BODY DOCKING: The receptor and ligand are both treated as rigid. Bond
lengths, bond angles and torsion angles of the components are not modified at any stage of
complex generation. Rigid body docking is inadequate when substantial
conformational change occurs within the components during complex formation.
ii. FLEXIBLE LIGAND DOCKING: The receptor is kept rigid whereas the ligand is treated
as flexible.
iii. FLEXIBLE DOCKING: A fully flexible procedure that permits substantial
conformational change in both receptor and ligand.

MOLECULAR DOCKING APPROACHES:

GEOMETRY MATCHING OR SHAPE COMPLEMENTARITY METHOD: It describes the
protein and ligand in terms of features such as the molecular surface.

• The receptor molecular surface is described in terms of its solvent-accessible surface
area.
• The ligand molecular surface is described in terms of its matching surface
description. Complementarity between the two surfaces helps in finding the
complementary pose of the ligand on the target.

SIMULATION PROCESS:

1. Protein and ligand are separated by some physical distance.

127 DECCAN SCHOOL OF PHARMACY



2. The ligand finds its position in the protein active site after a certain number of moves
in its conformational space.
3. Conformational space consists of all possible orientations and conformations of the
protein paired with the ligand.
4. Each snapshot of the pair is referred to as a “pose”.
5. Each move incorporates rigid body transformations (translation and rotation) and
torsional rotations.
6. Each conformation has an associated energy of the system.
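
The simulation process above can be sketched as a small Monte Carlo search. This is only an illustrative toy, not the method of any particular docking program: the scoring function `toy_energy`, the step sizes and the function names are all assumptions introduced here, and only rigid-body moves (no torsional rotations) are shown.

```python
import math
import random


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(point, shift):
    """Translate a 3D point by the vector `shift`."""
    return tuple(p + d for p, d in zip(point, shift))


def toy_energy(ligand, site):
    """Hypothetical score: sum of squared distances of ligand atoms to the site centre."""
    return sum(sum((a - b) ** 2 for a, b in zip(atom, site)) for atom in ligand)


def metropolis_accept(e_old, e_new, kT, rng):
    """Metropolis criterion: always accept downhill moves, sometimes accept uphill ones."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)


def dock(ligand, site, steps=2000, kT=0.5, seed=0):
    """Random rigid-body moves of the ligand; returns the best pose and its energy."""
    rng = random.Random(seed)
    pose, energy = list(ligand), toy_energy(ligand, site)
    best_pose, best_energy = pose, energy
    for _ in range(steps):
        angle = rng.uniform(-0.2, 0.2)                       # small rotation
        shift = tuple(rng.uniform(-0.5, 0.5) for _ in range(3))  # small translation
        trial = [translate(rotate_z(atom, angle), shift) for atom in pose]
        e_trial = toy_energy(trial, site)
        if metropolis_accept(energy, e_trial, kT, rng):
            pose, energy = trial, e_trial                    # each accepted snapshot is a pose
            if energy < best_energy:
                best_pose, best_energy = pose, energy
    return best_pose, best_energy
```

Calling `dock([(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)], (0.0, 0.0, 0.0))` starts the ligand away from the site and returns the lowest-energy pose visited during the walk.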

PROTEIN LIGAND DOCKING:

General steps:

1. Preparation of ligand
• Energy minimization
• Geometry optimization
• Charge calculations
2. Preparation of proteins
3. Docking calculations
4. Protein ligand complex representations

MOLECULAR DOCKING APPLICATIONS:

Docking helps in the generation of leads for the design of therapeutics.

1. LEAD IDENTIFICATION: Screening libraries of database molecules in silico to
identify lead molecules for further development into potential drug candidates.
2. LEAD OPTIMIZATION: Structure optimization of lead molecules to enhance their
potency.

DOCKING SERVER:

DockingServer is a web-based molecular docking program useful in high-throughput
screening. It allows efficient and robust docking calculations by integrating several software
packages.


FEATURES:

• High-throughput screening
• Docking for a known binding site
• Calculation of inhibition constant, binding geometry and secondary interactions
• Determination of binding site
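
The inhibition constant in the list above is commonly estimated from the predicted binding free energy through the thermodynamic relation Ki = exp(ΔG / RT). The sketch below shows that arithmetic; the function name, the default temperature and the assumption that the server uses exactly this relation are illustrative, not taken from the record.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def inhibition_constant(delta_g, temperature=298.15):
    """Estimate Ki (mol/L) from a binding free energy in kcal/mol.

    Uses Ki = exp(dG / (R * T)); more negative dG gives a smaller Ki.
    """
    return math.exp(delta_g / (GAS_CONSTANT * temperature))


# Example: a predicted binding energy of -8 kcal/mol at room temperature
ki = inhibition_constant(-8.0)
print(f"Ki = {ki * 1e6:.2f} micromolar")
```

A binding energy of about -8 kcal/mol corresponds to a Ki in the low micromolar range, which is why docking energies are often reported alongside an estimated Ki.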

STEPS:

1. Preparation of ligand
• Geometry optimization
• Energy minimization
• Charge calculation
2. Preparation of proteins
3. Docking calculation
4. Protein-ligand complex representation
I. PREPARATION OF LIGAND:
Ligands can be drawn using a Java applet or uploaded in an appropriate file format (MDL
mol, Sybyl mol2, PDB, HyperChem hin, SMILES format, SDF format).
Parameters Selection:

• Desired pH
• Molecular mechanics / semi-empirical quantum chemical calculation
parameters.
• Rotatable bonds and atom types.

Workflow: Draw a ligand / Upload a ligand / Upload multiple ligands → Set up parameters →
Ligands ready for docking


II. PREPARATION OF PROTEINS:

The 3D structure of the protein can be uploaded, or downloaded directly to DockingServer
from the Protein Data Bank by providing the PDB ID.

Parameter Selection:
• Protein chain, heteroatom, ligand and water selection.
• Simulation box set-up.

Workflow: Upload PDB / Download PDB file → Set up → Set up simulation box →
Protein ready for docking


III. LIGAND PROTEIN DOCKING CALCULATION:

The protein, the ligand and the required parameters should be selected.

Parameters Selection:
• Protein, ligand, simulation type, number of runs and number of evaluations.

Workflow: Set up single docking calculation / Set up multiple docking calculation →
Set up docking parameters → Start docking → Results of docking calculation

IV. RESULTS EVALUATION:

Secondary ligand-protein interactions can be analyzed to know
how good the docking is.


PROCEDURE:

http://www.dockingserver.com/web/docking/

Step 1: Click the docking link on the main page.

Step 2: Click the MY PROTEINS link and use the upload or download option to load the protein.

Step 3: Enter the PDB ID (3QMO) or the protein name to download the protein from PDB.

Step 4: Select and download the protein from the list.

Step 5: Carry out the protein preparation step.

Step 6: Use the MY LIGANDS link to save the ligand molecule.

Step 7: Select the ligand from MY LIGANDS.

Step 8: Click the START DOCKING link to proceed with docking.

Step 9: The result page of the docking procedure appears.

SOFTWARE TOOLS FOR DOCKING:

1. SWISSDOCK: SwissDock is a web-based molecular modeling tool developed by the
Molecular Modeling Group at the Swiss Institute of Bioinformatics. Modeling is based on
the software EADock DSS, which generates various binding modes for the ligand and
target. The binding modes with the most favorable energies can be visualized.
2. AUTODOCK: AutoDock is a computational tool that aids the development of bioactive
agents by predicting the interaction of a small molecule with a macromolecule. It was
developed by the Molecular Graphics Laboratory, Scripps Research Institute, La Jolla,
USA. It uses a Monte Carlo simulated annealing technique (Metropolis method) for
configuration exploration. It calculates the energy of the molecular complex using grids
of molecular affinity potentials. AutoDock consists of two main components:
AUTOGRID: pre-calculates the grids describing the protein (binding site).
AUTODOCK: performs the docking of ligands on the pre-calculated grids of the protein.
3. ARGUSLAB: ArgusLab is a free molecular modeling, graphics and drug design
software tool. It was developed by Mark A. Thompson, Planaria Software LLC, Seattle,
WA. An efficient shape algorithm is used and flexible ligand docking is possible,
where the ligand is described as a torsion tree and grids are constructed that overlay the
binding site.

MAJOR PROBLEMS ASSOCIATED WITH DOCKING:

1) Receptor structures are complicated; they frequently change shape and solvent
structure upon binding to a ligand.
2) The number of possible conformations rises exponentially with the number of rotatable
bonds.
3) Calculating the differential affinity between two related ligands using thermodynamic
methods is time consuming.
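
The exponential growth mentioned in point 2 can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that each rotatable bond is sampled at three discrete torsion states; the function name and the choice of three states are not from the record.

```python
def conformer_count(rotatable_bonds, states_per_bond=3):
    """Number of discrete conformers if every rotatable bond has the same number of states."""
    return states_per_bond ** rotatable_bonds


for bonds in (2, 5, 10, 15):
    print(bonds, "rotatable bonds ->", conformer_count(bonds), "conformers")
```

Under this assumption a ligand with 10 rotatable bonds already has 59049 conformers to consider, before any rigid-body placement in the binding site.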


RESULT: Best fit orientation of a ligand and protein receptor was determined using
docking.


EXP NO: 16

DATE:

NUCLEIC ACID FEATURE IDENTIFICATION

NUCLEIC ACID FEATURES PREDICTION:

Deoxyribonucleic acid (DNA) is the double-stranded genetic material present in most organisms.
Genomic DNA differs between prokaryotes and eukaryotes.

a) Genomic DNA features of prokaryotes:

Prokaryotes include bacteria and archaea, which have relatively small genomes with sizes
ranging from 0.5 to 10 Mbp. The gene density in these genomes is very high, since very few
repetitive sequences are present.

In bacteria, the majority of genes have the start codon ATG, which codes for methionine. Other
codons such as GTG and TTG, along with ATG, form the initiation codons that start the
process of translation. The initiation codon is identified with the help of the Shine-Dalgarno
sequence, a stretch of purine-rich sequence upstream of the start codon that is complementary
to the 3' end of the 16S rRNA in the ribosome. At the end of the gene there is a stop codon,
followed by a poly-T stretch. Many prokaryotic genes are transcribed together as one operon.
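
The start and stop codon signals described above can be illustrated with a minimal open reading frame scan. This is a simplified sketch, not the algorithm of any gene-finding program named in this record: it searches only the forward strand, ignores the Shine-Dalgarno sequence, and the function name and `min_codons` parameter are hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}   # bacterial initiation codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of open reading frames on the forward strand.

    `end` is exclusive and includes the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF in this frame
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG"))
```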

b) Genomic DNA features in eukaryotes:

These genomes are much larger than prokaryotic ones, with sizes ranging from 10 Mbp to
670 Gbp. They tend to have a very low gene density, since the space between genes is often
very large and rich in repetitive sequences. Most importantly, eukaryotic genomic DNA is
characterized by a mosaic organization in which a gene is split into pieces called ‘exons’ by
intervening non-coding sequences called ‘introns’.

The nascent transcript from a eukaryotic gene is modified in three different ways before
undergoing translation: 5' capping, splicing and 3' polyadenylation. Eukaryotic genes have
a Kozak sequence around the start codon and a poly-A signal at the termination, which help
locate the final coding sequence. The 5' splice site has GTAAGT as a consensus motif, since
the splice junctions of introns and exons follow the GT-AG rule for splicing. The
CG island is a short stretch of DNA in which the frequency of the CG sequence is higher than
in other regions. It is also called the CpG island, where “p” simply indicates that “C” and “G”
are connected by a phosphodiester bond. A hidden Markov model (HMM) can be used to decide
whether a given short sequence comes from a CpG island or not. The HMM can also be trained
to find the CpG islands in a long sequence.
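
The GT-AG rule mentioned above can be illustrated by listing every position where an intron could begin (GT) or end (AG). This is only a naive pattern scan written for this record, with a hypothetical function name; real splice site predictors score the surrounding sequence context as well.

```python
def candidate_splice_sites(dna):
    """Return (donors, acceptors): 0-based positions of GT and AG dinucleotides."""
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return donors, acceptors


donors, acceptors = candidate_splice_sites("AAGGTAAGTCCCAGGA")
print("possible donor sites:", donors)
print("possible acceptor sites:", acceptors)
```

Most GT and AG dinucleotides in a genome are not real splice sites, which is why the consensus motif around the junction matters.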

CpG islands are often located around the promoters of housekeeping genes (which are
essential for general cell functions) or other genes frequently expressed in a cell. At these
locations, the CG sequence is not methylated. By contrast, the CG sequences in inactive
genes are usually methylated to suppress their expression. Methylation of promoter-
associated CGIs plays an important role in gene regulation and carcinogenesis. Because of
this functional importance, multiple algorithms are available for identifying CGIs in a
sequence.
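
A simpler alternative to the HMM approach is to compute the GC content and the observed/expected CpG ratio of a sequence window. The sketch below shows that calculation; the function name is hypothetical, and the commonly cited thresholds (window of at least 200 bp, GC content above 50%, ratio above 0.6) are mentioned only as a rule of thumb, not as the criterion of any tool in this record.

```python
def cpg_stats(dna):
    """Return (gc_content, observed/expected CpG ratio) for a DNA string."""
    dna = dna.upper()
    n = len(dna)
    c, g = dna.count("C"), dna.count("G")
    cpg = sum(1 for i in range(n - 1) if dna[i:i + 2] == "CG")
    gc_content = (c + g) / n if n else 0.0
    expected = c * g / n if n else 0.0      # CpG count expected from base composition
    ratio = cpg / expected if expected else 0.0
    return gc_content, ratio


gc, ratio = cpg_stats("CGCGCGCG")
print(f"GC content = {gc:.2f}, CpG observed/expected = {ratio:.2f}")
```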

Functional units of DNA (genes) can be identified by different gene-finding programs, such
as GeneMark, GlimmerM, GRAIL, GenScan, and Fgenes. RepeatMasker is a program that
screens DNA sequences for interspersed repeats and low complexity DNA sequences.


EXP NO: 16 (a)

DATE:

NUCLEIC ACID FEATURE IDENTIFICATION USING GENSCAN

AIM: To predict the exon and intron regions of the given DNA sequence (Homo sapiens
leptin, LEP, 3444 bp, NM_000230).

PROCEDURE:

• Retrieve the query DNA sequence in FASTA format by accessing the nucleotide
database.
• Open http://genes.mit.edu/GENSCAN.html and paste the sequence in the input box.
• Click the run button.
• As soon as the run button is clicked, the prediction of the gene structure begins.
• The results are obtained after the analysis.


RESULT: Two exons have been predicted from regions 84 to 587 and 1676 to 1681.


EXP NO: 16 (b)

DATE:

SPLICE PREDICTOR

AIM: To predict the splice sites of the given DNA sequence (Homo sapiens leptin, LEP,
3444 bp, NM_000230).

PROCEDURE:

• Retrieve the query DNA sequence in FASTA format by accessing the nucleotide
database.
• Open http://deepc2.psi.iastate.edu/cgi-bin/sp.cgi and paste the sequence in the input
box.
• Click the run button.
• As soon as the run button is clicked, the prediction of the splice sites begins.
• The results are obtained after the analysis.


RESULT: 16 splice sites have been predicted using splice site prediction by neural
networks.


EXP NO: 17

DATE:

PROTEIN STRUCTURE PREDICTION

Protein structure prediction is the prediction of the three-dimensional structure of a protein
from its amino acid sequence, that is, the prediction of a protein's tertiary structure from its
primary structure. It is one of the most important goals pursued by bioinformatics and
theoretical chemistry.

With no homologue of known structure from which to make a 3D model, a logical next step
is to predict secondary structure. Although the methods differ, the aim of secondary
structure prediction is to provide the location of alpha helices and beta strands within a
protein or protein family.

There are now many web servers for structure prediction; here is a quick summary:

1. PSIPRED (PSI-BLAST profiles used for prediction; David Jones, Warwick)
2. JPred consensus prediction (includes many of the methods given below; Cuff and
Barton, EBI)
3. DSC (King and Sternberg)
4. PREDATOR (Frishman and Argos, EMBL)
5. PHD home page (Rost and Sander, EMBL, Germany)
6. ZPRED server (Zvelebil et al., Ludwig, U.K.)
7. nnPredict (Cohen et al., UCSF, USA)
8. BMERC PSA Server (Boston University, USA)
9. SSP (nearest-neighbor; Solovyev and Salamov, Baylor College, USA)

METHODS FOR SINGLE SEQUENCES:

Secondary structure prediction has been around for almost a quarter of a century. The early
methods suffered from a lack of data. Predictions were performed on single sequences rather
than families of homologous sequences, and there were relatively few known 3D structures from
which to derive parameters. Probably the most famous early methods are those of Chou and
Fasman; Garnier, Osguthorpe and Robson (GOR); and Lim. Although the authors
originally claimed quite high accuracies (70-80%), under careful examination the methods
were shown to be only between 56 and 60% accurate (see Kabsch and Sander, 1984). An early
problem in secondary structure prediction had been the inclusion of the
structures used to derive parameters in the set of structures used to assess the accuracy of the
method.
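
The accuracy figures quoted above are usually three-state per-residue accuracies (often called Q3): the percentage of residues whose predicted state (helix, strand or coil) matches the experimentally observed state. A minimal sketch of that calculation follows; the function name and the example strings are illustrative, not data from this record.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed secondary structure agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# h = helix, e = strand, c = coil (made-up ten-residue example)
print(q3_accuracy("hhhheeeccc", "hhhhceeccc"))
```

To avoid the problem described above, the proteins used for this comparison should not be the ones from which the method's parameters were derived.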

Comparative protein modeling uses previously solved structures as starting points or
templates. This is effective because it appears that although the number of actual proteins is
vast, there is a limited set of tertiary structural motifs to which most proteins belong. It has
been suggested that there are only around 2000 distinct protein folds in nature, though there
are many millions of different proteins.

These methods may also be split into two groups.

• HOMOLOGY MODELING: It is based on the reasonable assumption that two
homologous proteins will share very similar structures. Because a protein's fold is more
evolutionarily conserved than its amino acid sequence, a target sequence (a protein whose
structure is not yet solved) can be modeled with reasonable accuracy on a very distantly
related template (a protein whose structure has been solved experimentally by X-ray
crystallography or NMR), provided that the relationship between target and template can
be discerned through sequence alignment. It has been suggested that the primary
bottleneck in comparative modelling arises from difficulties in alignment rather than
from errors in structure prediction given a known-good alignment.

• PROTEIN THREADING: Scans the amino acid sequence of an unknown structure
against a database of solved structures. In each case, a scoring function is used to assess
the compatibility of the sequence to the structure, thus yielding possible three-
dimensional models. This type of method is also known as 3D fold recognition due to its
compatibility analysis between three-dimensional structures and linear protein
sequences. This method has also given rise to methods performing an inverse folding
search by evaluating the compatibility of a given structure with a large database of
sequences, thus predicting which sequences have the potential to produce a given fold.


EXP NO: 17 (a)

DATE:

SECONDARY STRUCTURE PREDICTION OF A PROTEIN

GOR IV ALGORITHM (GOR - Garnier-Osguthorpe-Robson Method)

AIM: To predict secondary structure of human Cox 2 using GOR- IV Algorithm.

PROCEDURE:

1. Retrieve the protein sequence from the NCBI database in FASTA format.
2. Open https://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html to
perform secondary structure prediction.
3. Paste the protein sequence in the input box.
4. Submit the job by clicking on the SUBMIT button.
5. The result will be obtained after the completion of the task.


REPORT: Results were obtained representing secondary structure elements with their
single letter codes such as Helix-h, Beta sheets-e, Coils-c.
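
A prediction reported in single-letter codes can be summarised as the percentage of residues in each state. The sketch below assumes the output is available as a plain string of h, e and c characters; the function name and example string are hypothetical.

```python
from collections import Counter


def structure_composition(prediction):
    """Percentage of helix (h), strand (e) and coil (c) in a prediction string."""
    counts = Counter(prediction.lower())
    total = sum(counts[state] for state in "hec")
    if total == 0:
        return {"h": 0.0, "e": 0.0, "c": 0.0}
    return {state: 100.0 * counts[state] / total for state in "hec"}


print(structure_composition("hhhheecccc"))
```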

EXP NO: 17 (b)

DATE:

JPred

AIM: To predict secondary structure of human Cox 2 using JPred software.

PROCEDURE:

• Retrieve the protein sequence from the NCBI database in FASTA format.
• Open http://www.compbio.dundee.ac.uk/~www-jpred/ to perform secondary
structure prediction.
• Paste the protein sequence in the input box.
• Submit the job by clicking on the MAKE PREDICTION button.
• The result will be obtained after the completion of the task.


REPORT: Results were obtained representing secondary structure elements with their
Single letter codes such as Helix-h, Beta sheets-e.

EXP NO: 17 (c)

DATE:

SOPMA

AIM: To predict secondary structure of human Cox 2 using SOPMA.

PROCEDURE:

• Retrieve the protein sequence from the NCBI database in FASTA format.
• Open
http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
to perform secondary structure prediction.
• Paste the protein sequence in the input box.
• Submit the job by clicking on the SUBMIT button.
• The result will be obtained after the completion of the task.


REPORT: Results were obtained representing secondary structure elements with their
single letter codes such as Helix-h, Beta sheets-e, Coils-c.

EXP NO: 18

DATE:

3D-STRUCTURE VISUALISATION AND HOMOLOGY MODELING

PRINCIPLE:

HOMOLOGY MODELING:

Homology modeling, also known as comparative modeling, refers to constructing an atomic-
resolution model of the “target” protein from its amino acid sequence and an experimental
three-dimensional structure of a related homologous protein (the “template”). Homology
modeling relies on the identification of one or more protein structures likely to resemble the
structure of the query sequence, and on the production of an alignment that maps residues in
the query sequence to residues in the template sequence. The sequence alignment and
template structure are then used to produce a structural model of the target. Because protein
structures are more conserved than DNA sequences, detectable levels of sequence similarity
usually imply significant structural similarity.

The quality of the homology model is dependent on the quality of the sequence alignment
and template structure. The approach can be complicated by the presence of alignment gaps
(commonly called indels) that indicate a structural region present in the target but not in the
template, and by structure gaps in the template that arise from poor resolution in the
experimental procedure (usually X-ray crystallography) used to solve the structure. Model
quality declines with decreasing sequence identity; a typical model has ~1-2 Å root mean
square deviation between the matched Cα atoms at 70% sequence identity but only 2-4 Å
agreement at 25% sequence identity. However, the errors are significantly higher in the loop
regions, where the amino acid sequences of the target and template proteins may be
completely different.
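
The root mean square deviation quoted above can be computed from two matched lists of Cα coordinates. The sketch below assumes the two structures have already been superimposed and that atoms are paired in the same order; it does not perform the superposition itself, and the function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input) between matched 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model, reference))
```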

Homology modeling can produce high-quality structural models when the target and template
are closely related, which has inspired the formation of a structural genomics consortium
dedicated to the production of representative experimental structures for all classes of protein
folds.

The homology modeling procedure can be broken down into four sequential steps: template
selection, target-template alignment, model construction, and model assessment. The first two
steps are often essentially performed together, as the most common methods of identifying
templates rely on the production of sequence alignments; however, these alignments may not
be of sufficient quality because database search techniques prioritize speed over alignment
quality. These processes can be performed iteratively to improve the quality of the final
model, although quality assessments that are not dependent on the true target structure are
still under development.

RAMACHANDRAN PLOT:

• It is used to visualize the backbone dihedral angles of amino acid residues in a protein
structure.
• It is used for structural validation and to show the possible phi (Φ) and psi (Ψ) angles
of the amino acid residues.
• It is generated by several software tools, such as WHATIF and RAMPAGE.
• Conformations deemed possible are those that involve little or no steric interference,
based on calculations using van der Waals radii and bond angles.
• The areas shaded dark blue reflect conformations that involve no steric overlap and
thus are fully allowed.
• Medium blue indicates conformations allowed at the extreme limits for unfavorable
atomic contacts.
• The lightest blue area reflects conformations that are permissible if a little flexibility is
allowed in the bond angles.
• The unshaded portion indicates sterically disallowed conformations.
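
Each point on a Ramachandran plot is a pair of dihedral angles, and a dihedral angle is defined by four consecutive atoms. For residue i, phi uses C(i-1), N(i), Cα(i), C(i) and psi uses N(i), Cα(i), C(i), N(i+1). The sketch below computes one such angle from four 3D points in plain Python; the helper names and test coordinates are illustrative and are not taken from any of the tools named above.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180 to 180) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# Four points in a plane with the end points on opposite sides: a trans arrangement
print(abs(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))
```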


EXP NO: 18 (a)

DATE:

3D-STRUCTURE VISUALISATION AND HOMOLOGY MODELING

AIM: To predict the three-dimensional structure of a given protein sequence using
homology modeling.

REQUIREMENTS:
Websites: NCBI - http://www.ncbi.nlm.nih.gov/protein/

Molecular visualisation tool: Swiss-PdbViewer

http://www.expasy.org/swissmod/SWISS-MODEL.html

PROCEDURE:

• Go to NCBI or UniProt (www.uniprot.org) and download a protein sequence.
• Copy the sequence and paste it into Notepad.
• Open SWISS-MODEL (swissmodel.expasy.org) for homology modeling.
• Click on the “Build Model” option and wait for a while; the model and its
template will be obtained.
• Using the downloaded model, generate the Ramachandran plot with RAMPAGE.
• Check the quality of the homology model using the SAVES (Structure Analysis and
Verification Server) servers (https://services.mbi.ucla.edu/SAVES/).
• Analyze the result and report.


REPORT: The three-dimensional structure of the given protein sequence was predicted
using homology modeling.


EXP NO: 19

DATE:

BASIC PROGRAMMING IN SQL

Structured Query Language (SQL) is a computer language designed for managing data in a
relational database management system. It was originally based upon relational algebra and
calculus. In common usage, SQL also encompasses a data definition language (DDL),
used for creating and modifying tables and other database structures.

A relational database uses relations, or two-dimensional tables, to store information,
and nothing more. All operations on data are done on the tables
themselves or produce other tables as a result. A relational database contains one or many
tables; the table is the basic storage structure of a relational database management system
(RDBMS). Each row is a set of columns with only one value for each. All rows from the same
table have the same set of columns. A row of a relational table is analogous to a record, and
a column to a field.

A relational database serves data in two ways:

1) Retrieving a subset of its columns
2) Retrieving a subset of its rows

PROPERTIES:

1) It can be accessed and modified by executing structured query language (SQL)
statements.
2) It contains a collection of tables with no physical pointers and uses a set of operators.
3) A single row of a table represents all the data required for a particular medicine. Each
row in a table should be identified by a primary key, which allows no duplicate
key values.
4) A column, or attribute, contains one kind of data, for example the medicine name.
5) The serial number identifies a medicine in the table. In this example, the serial
number column is designated as the primary key.
6) A primary key must contain a value, and the value must be unique.
7) A column such as the one containing the cost value can act as a foreign key, which
defines how tables relate to each other.
8) A field may have no value in it; this is called a null value.
9) A field is found at the intersection of a row and a column; there can be only one value
in it.

GENERAL GUIDELINES FOR EXECUTING SQL COMMANDS:

• SQL commands may be written on one or many lines.
• Clauses are usually placed on separate lines.
• Tabulation can be used.
• Command words cannot be split across lines.
• SQL commands are not case sensitive.
• Place a semicolon (;) at the end of the last clause.

SQL FEATURES:

• SQL can be used by a range of users, including those with little or no programming
knowledge.
• It is a non-procedural language.
• It reduces the amount of time required for creating and maintaining systems.

SQL RULES:

• A statement starts with a verb; each verb is followed by a number of clauses, and a
space separates clauses.
• A comma (,) separates parameters within a clause.
• A semicolon (;) is used to end an SQL statement.
• A statement may be split across lines, but keywords may not.
• Lexical units such as identifiers, operator names and literals are separated by one or
more spaces or other delimiters that will not be confused with the lexical units.
• Reserved words may not be used as identifiers unless enclosed in double quotes.
• An identifier can contain up to 30 characters and must start with an alphabetical
character.
• Character and date literals must be enclosed within single quotes.
• Numeric literals can be represented by simple values or in scientific notation,
such as 2E5 for 2 x 10 to the power 5.
• Comments may be enclosed between /* and */ symbols and may be multiline.
• Single-line comments may be prefixed with the -- symbol.

SQL IS DIVIDED INTO 4 TYPES:

1. DATA DEFINITION LANGUAGE:


Data definition language is a set of SQL commands used to create, modify and delete
database structures, but not data. It uses the following commands.
a) Create Table: This command is used to create database objects.
Syntax: create table table_name (fieldname1 datatype(size), fieldname2
datatype(size), fieldname3 datatype(size));
Eg: create table employee (eno number(10), ename varchar2(20), address
varchar2(15));
Output: Table created
b) Alter Table: This command is used to alter the structure of database objects. Three
types of alter commands are available. They are:
• ADD:
Syntax: alter table table_name add (fieldname datatype(size));
Eg: alter table employee add (phone number(2));
• MODIFY:
Syntax: alter table table_name modify (fieldname new_datatype(size));
Eg: alter table employee modify (ename varchar2(15));
• RENAME:
Syntax: rename old_table_name to new_table_name;
Eg: rename employee to emp;
2. DATA MANIPULATION LANGUAGE:


Data manipulation language uses the following commands:
• INSERT: This command is used to insert new rows into the table.
Syntax: insert into table_name values (&fieldname1, &fieldname2,
&fieldname3);
Eg: insert into emp values (&eno, '&ename', '&address');
• UPDATE: This command is used to change values for a specified column in
the table.
Syntax: update table_name set column_name = expression where condition;
Eg: update emp set ename='survana' where eno=1;
• DELETE: This command is used to remove rows from the table.
Syntax: delete from table_name where condition;
Eg: delete from emp where ename='survana';
Note: If the condition is not given, all rows will be deleted.
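
The three DML commands can be exercised the same way with Python's built-in SQLite module. The sketch below is illustrative only: the sample rows are made up, and `?` placeholders stand in for the `&` substitution variables of the Oracle-style examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, address TEXT)")

# INSERT: add new rows (hypothetical sample data)
cur.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "anil", "hyderabad"), (2, "kiran", "warangal")],
)

# UPDATE: change a column value in the rows matching the condition
cur.execute("UPDATE emp SET ename = ? WHERE eno = ?", ("survana", 1))
after_update = cur.execute("SELECT ename FROM emp WHERE eno = 1").fetchone()[0]

# DELETE: remove the rows matching the condition
cur.execute("DELETE FROM emp WHERE ename = ?", ("survana",))
remaining = cur.execute("SELECT * FROM emp").fetchall()

print(after_update)
print(remaining)
```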

3. DATA QUERY LANGUAGE STATEMENT:


It is the component of SQL that retrieves data from the database and
imposes an ordering upon it. It consists of the select statement, which is the heart of SQL.
i) Syntax: select fieldname from table_name where condition;
Eg: select * from emp;
Note: '*' represents the data of all columns.
The select command also accepts optional clauses and keywords.

ORDER BY CLAUSE: It is used to display rows in a specific order, either
ascending or descending. By default the order is ascending.
Syntax: select column_name from table_name order by column_name sort_order;
Eg: select * from emp order by ename;
select * from emp order by ename desc;
Group Functions: Group functions operate on a set of rows and return a
single value.
Eg: sum( ), avg( ), max( ), min( ), count( ).
Syntax: select group_function(column_name) from table_name;
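
The ordering clause and the group functions can be checked with a short sketch. It assumes Python's standard sqlite3 module as a stand-in for Oracle, and the three rows in it are invented sample data.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table emp (eno number(10), ename varchar2(20), address varchar2(15))")
cur.executemany(
    "insert into emp values (?, ?, ?)",
    [(1, "survana", "hyderabad"), (2, "anil", "delhi"), (3, "meena", "chennai")],
)

# ORDER BY: ascending is the default, desc reverses the order.
ascending = [row[0] for row in cur.execute("select ename from emp order by ename")]
descending = [row[0] for row in cur.execute("select ename from emp order by ename desc")]

# Group functions collapse all selected rows into a single value each.
total, highest, lowest = cur.execute(
    "select count(*), max(eno), min(eno) from emp"
).fetchone()

print(ascending, descending, total, highest, lowest)
```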

4. DATA CONTROL LANGUAGE STATEMENT:


It is the component of SQL that controls access to data and to the database.
Occasionally DCL statements are grouped with DML statements.

i) COMMIT: Saves the work done.
ii) SAVEPOINT: Identifies a point in a transaction to which you can later roll back.
iii) ROLLBACK: Restores the database to its state as of the last COMMIT.
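
How the three statements interact can be seen in a minimal sketch. It assumes Python's standard sqlite3 module in place of Oracle; the savepoint name after_first and the two rows are hypothetical.

```python
import sqlite3

# isolation_level=None stops the module from opening transactions implicitly,
# so begin, savepoint, rollback and commit are all issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("create table emp (eno number(10), ename varchar2(20))")

cur.execute("begin")
cur.execute("insert into emp values (1, 'survana')")
cur.execute("savepoint after_first")               # mark a point inside the transaction
cur.execute("insert into emp values (2, 'anil')")
cur.execute("rollback to savepoint after_first")   # undo only the second insert
cur.execute("commit")                              # make the remaining work permanent

saved = cur.execute("select eno, ename from emp").fetchall()
print(saved)
```

Only the row inserted before the savepoint survives the commit.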

SQL Delimiters:

Delimiters are symbols or compound symbols which have a special meaning within SQL
statements.

• Arithmetic operators: +, -, *, /
• Relational operators: >, <, >=, <=, =, <>
• Logical operators: AND, OR, NOT
• Special operators: between ... and, not between


REPORT: Basic programming in SQL has been performed.


EXP NO: 20

DATE:

BIO PERL

AIM: To convert the DNA sequence into RNA.

INTRODUCTION:

Perl (Practical Extraction and Report Language) is a programming language available for
most operating systems. Using Perl, a series of complex tasks can often be reduced to a single
statement. Perl is preferred for sequence analysis and database management. A
Perl program consists of a text file containing a series of Perl statements. The statements look
like an amalgam of C, UNIX shell script and English.

BioPerl is a collection of Perl modules that facilitates the development of Perl scripts for
bioinformatics applications. It has played an integral role in the Human Genome Project. It is a
popular toolkit developed as a collection of integrated Perl modules for transforming and
manipulating sequence data and annotations, accessing remote databases and parsing output
from programs such as BLAST, FASTA etc. BioPerl also facilitates local execution of
programs from the EMBOSS suite. BioPerl saves time and effort.

In order to take advantage of BioPerl, the user needs a basic understanding of the Perl
programming language, including an understanding of how to use Perl references, modules,
objects and methods. BioPerl is open source software that is still under active
development. The advantages of open source software include the ability to freely examine
and modify source code and exemption from software licensing fees.

FEATURES:

BioPerl provides software modules for many of the typical tasks of bioinformatics
programming. These include:

• Accessing nucleotide and peptide sequence data from local and remote databases
• Transforming formats of database/file records
• Searching for similar sequences
• Manipulating individual sequences
• Creating and manipulating sequence alignments
• Searching for genes and other structures on genomic DNA

STARTING A PERL PROGRAM:

Perl can be studied by considering a small Perl program.

#!/usr/bin/perl
use strict;                         # important pragma
print "What is your username? ";    # print out the question
my $username;                       # "declare" the variable
$username = <STDIN>;                # ask for the username
chomp($username);                   # cut off the newline
print "Hello, $username.\n";        # print out the greeting

Everything from the # character to the end of the line is considered a comment.

Each statement ends with a semicolon (;); this is how the programmer tells Perl that a
statement is complete.

The first statement, use strict;, is a pragma that enforces the strictest possible rules on the
code. It is imperative that beginners use this pragma, as it helps them understand the errors
in their code easily.

The second statement is a simple print function, which takes a string as its argument and
sends that string to the standard output. The next statement declares the variable $username;
its name starts with a $, which marks it as a "scalar variable". Scalar variables can hold strings.

$username = <STDIN> is an assignment statement that uses a special Perl construct; it reads
a line of data from the program's standard input device.

The chomp function removes any newline character from the end of a variable.

The final statement is another print statement; it uses the value of the $username variable to
greet the user by name.

PERL DATA TYPES:

Perl has three data types: scalar, array and hash.

1) SCALAR TYPE: A scalar is a single number or a string. The most basic kind of
variable in Perl is the scalar variable. Scalars are remarkable in that strings and
numbers are completely interchangeable.
For example,
$priority = 9;
This statement sets the scalar variable $priority to 9, but you can also assign a string to
exactly the same variable.
$priority = "high";
2) ARRAY TYPE: An array is a list of scalar values.
Example
@my_ary = ("A", "C", "G", "T");
One element of a list is a scalar, so we should write $my_ary[1] instead of @my_ary[1]
to access the second element of the list.
3) HASH TYPE: A hash is basically an array in which the elements are addressed by
keywords (keys).

%translate = ("AAT", "Asn", "TAT", "Tyr", "TTT", "Phe");

The above hash represents a translation table from base triplets (codons) to amino acids. If we
want to know which amino acid the sequence "AAT" is translated to, we just write

$translate{"AAT"} and it will return "Asn".
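
The same lookup idea can be sketched outside Perl. The example below uses a Python dictionary as the counterpart of the Perl hash. The helper name translate_codons and the fallback symbol "???" are illustrative assumptions, and the table holds only the three codons listed above.

```python
# Codon-to-amino-acid table with the three entries from the hash above.
translate = {"AAT": "Asn", "TAT": "Tyr", "TTT": "Phe"}


def translate_codons(dna):
    """Split a DNA string into triplets and look each one up in the table.

    Hypothetical helper: codons missing from the small table, and any
    trailing partial codon, are reported as "???".
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return [translate.get(codon, "???") for codon in codons]


print(translate["AAT"])
print(translate_codons("AATTATTTT"))
```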

INSTALLATION: BioPerl is a large collection of complex interacting software objects,
which includes Perl modules from CPAN, BioPerl Perl extensions, a BioPerl XS extension
and several compiled bioinformatics programs.

BIO-PERL OBJECTS: BioPerl objects are all packed into modules; each module
consists of the necessary subroutines. A module is a reusable piece or collection of Perl code
that can be called from your program to carry out specific tasks.

A module is loaded by name as shown below:

use Bio::Seq;

There are four types of BIO-PERL objects:

a) Sequence objects: Seq is the central sequence object in BioPerl. When in doubt,
this is probably the object that you want to use to describe a DNA, RNA or protein
sequence in BioPerl. Most common sequence manipulations can be performed with
Seq.
There are seven different sequence objects: Seq, PrimarySeq,
LocatableSeq, LiveSeq, LargeSeq, SeqI and SeqWithQuality.
b) Alignment objects (SimpleAlign): This module allows the user to convert between
alignment formats as well as to perform more sophisticated operations, like extracting specific
regions of the alignment and generating consensus sequences.
c) Location objects: A location object is designed to be associated with a sequence feature
object to indicate where on a larger structure the feature can be found.
The reason why this simple concept has evolved into a collection of rather complicated
objects is that some objects have multiple locations or sub-locations.
d) Interface objects and implementation objects: An interface is solely the definition of what
methods one can call on an object, without any knowledge of how it is implemented.
An implementation is an actual, working implementation of an object. In BioPerl,
the interface objects usually have names like Bio::MyObjectI,
with the trailing I indicating that it is an interface object. The interface objects mainly
provide documentation on what the interface is and how to use it, without any
implementation.


PROGRAMMING USING BIO-PERL:

BioPerl provides software modules for many of the typical tasks of bioinformatics
programming. These include:

1. Accessing sequence data from local and remote databases
2. Transforming formats of database/file records
3. Manipulating individual sequences
4. Searching for "similar" sequences
5. Creating and manipulating sequence alignments
6. Developing machine-readable sequence annotations

APPLICATIONS OF BIO-PERL:

• BioPerl provides access to sequence data and transforms database formats.
• BioPerl assists in sequence similarity searches.
• BioPerl creates and manipulates sequence alignments.
• BioPerl is useful in searching for structures in a genome.
• BioPerl develops machine-readable annotations.

PROCEDURE:

• All the interacting software objects of BioPerl were installed.
• A Notepad file was created on the desktop, the program for converting a DNA sequence
to an RNA sequence was written in it, and the file was saved with the extension .pl.
• Another text document, named "dna.txt", was created and the DNA sequence was written in it.
• The previously created program file was right-clicked and BioPerl was selected.
• A window was displayed, asking for the file name of the text document
containing the DNA sequence.
• The DNA sequence, converted to an RNA sequence, was displayed.
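
The conversion carried out in the procedure can be sketched as follows. The sketch is written in Python rather than Perl purely for illustration. It assumes that the conversion means replacing each T of the coding strand with U and that dna.txt holds a plain DNA string; the function names dna_to_rna and convert_file are hypothetical.

```python
def dna_to_rna(dna):
    """Return the RNA form of a coding-strand DNA string (T replaced by U)."""
    # Drop whitespace and line breaks, and normalise to upper case.
    cleaned = "".join(dna.split()).upper()
    # Reject anything that is not one of the four DNA bases.
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        raise ValueError("not a DNA sequence, found: " + ", ".join(sorted(unexpected)))
    return cleaned.replace("T", "U")


def convert_file(path):
    """Read a DNA sequence from a text file such as dna.txt and convert it."""
    with open(path) as handle:
        return dna_to_rna(handle.read())


print(dna_to_rna("ATGCTTAG"))
```

Calling convert_file("dna.txt") would apply the same conversion to the sequence stored in the text document.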

REPORT: The given DNA sequence was converted into RNA using BIO-PERL.


EXP NO: 21

DATE:

RETRIEVAL OF CHEMICAL COMPOUNDS INFORMATION FROM PUBCHEM

WEBSITE: www.pubchem.ncbi.nlm.nih.gov

PubChem is a database of chemical molecules and their activities against biological assays.
The system is maintained by the National Center for Biotechnology Information (NCBI), a
component of the National Library of Medicine, which is part of the United States National
Institutes of Health (NIH). PubChem can be accessed for free through a Web user interface.
Millions of compound structures and descriptive datasets can be downloaded via FTP.
PubChem contains substance descriptions and small molecules with fewer than 1000 atoms
and 1000 bonds. The American Chemical Society tried to get the U.S. Congress to restrict the
operation of PubChem, because it claimed that PubChem competes with its Chemical Abstracts
Service. More than 80 database vendors contribute to the growing PubChem database.

PubChem is designed to provide information on the biological activities of
small molecules, generally those with a molecular weight of less than 500 Daltons. PubChem's
integration with NCBI's Entrez information retrieval system provides substructure and
similarity-structure searching and bioactivity data, as well as links to biological property
information in PubMed and NCBI's Protein 3D Structure resource.

PUBCHEM DATABASES:

PubChem comprises three linked databases: PubChem Compound, PubChem
Substance and PubChem BioAssay.

PUBCHEM COMPOUND (UNIQUE STRUCTURES WITH COMPUTED PROPERTIES):

PubChem Compound is a searchable database of chemical structures with validated chemical
depiction information, provided to describe the substances in PubChem Substance. Structures
stored within PubChem Compound are pre-clustered and cross-referenced by identity and
similarity groups. PubChem Compound includes over 5M compounds.

Molecular Name Searches (e.g., Tylenol, Benzene) allow searching with a variety of chemical
synonyms.

Chemical Property Range Searches (e.g., molecular weight between 100 and 200,
hydrogen bond acceptor count between 3 and 5) allow searching for compounds with a
variety of physical/chemical properties and descriptors.

Simple Elemental Searches (e.g., all compounds containing Gallium) allow searching with
specific restrictions.

PUBCHEM SUBSTANCE (DEPOSITED STRUCTURES):

PubChem Substance is a searchable database containing descriptions of chemical samples
from a variety of sources, with links to PubMed citations, protein 3D structures and biological
screening results available in PubChem BioAssay. PubChem Substance includes over 8M
records. Substances with known content are linked to PubChem Compound.

Molecule Synonym Searches (e.g., all substances with "deoxythymidine" as a name fragment,
or substances that contain 3'-azido-3'-deoxythymidine).

Biological Links Searches (e.g., substances tested, active or inactive in a bioassay).

Combined Searches (e.g., substances that are "Active in any BioAssay" and contain the
element Ruthenium).

PUBCHEM BIOASSAY:

PubChem BioAssay is a searchable database containing bioactivity screens of the chemical
substances described in PubChem Substance. PubChem BioAssay includes over 180 bioassays.
A searchable description of each bioassay is provided, which includes descriptions of the
screening procedure, conditions and readouts.

It can be used to search for BioAssay data sets (e.g., HIV growth inhibition) and
to browse or download PubChem BioAssay results (e.g., NCI AIDS Antiviral Assay).


SEARCHING PUBCHEM

• PubChem Text Search:

PubChem Text Search is for searching by compound name, synonym or ID, and defaults to
PubChem Compound. The search result page offers a pull-down "databases" menu that
allows searching in PubChem Substance, PubChem BioAssay and a variety of other
Entrez databases.

• PubChem Chemical Structure Search:

PubChem Chemical Structure Search (7) has the following options. Search SMILES
(including SMARTS or InChI) or Formula includes a "sketch" link to a drawing
program that converts structural diagrams to SMILES (exact), SMARTS (substructure) or
InChI (exact) strings for searching.

Clicking "DONE" in the structure editor converts the structural diagram to the appropriate
string and transfers it to the search box.

Select Structure File allows importing standard and common chemical file formats.
Specify Search Type allows restricting the search to the same compound, similar compounds,
formula or substructure.

• PubChem Indexes and Index Search:

PubChem Indexes and Index Search allows fielded/range searching from either the PubChem
homepage or the Entrez search page. An extensive list of field aliases and examples of range
searching is provided.

PUBCHEM SEARCH RESULTS:

• PubChem Compound

PubChem Compound results are derived from PubChem Substance records that provide
structures. Since compounds are structurally unique, one compound may link to multiple
substances. The default display is a compound summary with thumbnails and cross-links to
each PubChem database, other NCBI databases and depositors' databases.

Clicking either the structure or the CID link gives the full display, which includes the compound's
property data, description, related substance information, neighboring structures and
cross-links.

• PubChem Substance

PubChem Substance has unique records if the structures are not known or not supplied, for
example sulfated polymannuroguluronate, a novel anti-acquired immune deficiency syndrome
(AIDS) drug candidate, and other natural products.

The PubChem Substance summary record is linked to the full record by clicking on the SID
number (PubChem's Substance identifier). This displays the full record, which includes links to
PubMed and the source; the Medical Subject Annotation (MeSH Substance Name) and a
MeSH PubMed search link; and depositor-supplied synonyms and comments.

PROCEDURE:

1. Access the PubChem Web page: www.pubchem.ncbi.nlm.nih.gov
2. Select BioAssay, Compound or Substance to search for the desired information related
to drugs and chemical compounds.
3. The desired information is displayed on the screen.
4. Check the results and explore all options.
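
The same kind of lookup can also be scripted. The sketch below, in Python, only builds a request URL for PubChem's PUG REST service. That service is not part of the procedure above and is brought in here as an assumption about how the search might be automated. The function name build_property_url is hypothetical, and aspirin is just a sample compound.

```python
from urllib.parse import quote

# Base address of the PUG REST service.
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(compound_name, properties):
    """Build a PUG REST URL asking for computed properties of a compound by name."""
    name = quote(compound_name, safe="")  # escape spaces and other special characters
    return f"{PUG_REST_BASE}/compound/name/{name}/property/{','.join(properties)}/JSON"


url = build_property_url("aspirin", ["MolecularFormula", "MolecularWeight"])
print(url)
```

Opening the printed URL in a browser, or fetching it with urllib.request.urlopen, should return a JSON document. Network access is left out of the sketch so that it runs offline.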


REPORT: Retrieval of chemical compounds information from PubChem was done.
