
INTRODUCTION TO

BIOINFORMATICS

LECTURE 1
IPMS
What is Bioinformatics?...

INTRODUCTION TO BIOINFORMATICS SPRING 2005


Dr. N AYDIN
Computational biology
Bioinformatics
Genomics
Functional genomics
Proteomics
Structural bioinformatics

...What is Bioinformatics?

Bioinformatics: collection and storage of biological information

Computational biology: development of algorithms and statistical models to analyze biological data

Why is Bioinformatics Important?

Application areas include


Medicine

Pharmaceutical drug design

Toxicology

Molecular evolution

Biosensors

Biomaterials

Biological computing models

DNA computing

Why should I care?

SmartMoney ranks Bioinformatics as #1 among next HotJobs

Business Week 50 Masters of Innovation

Jobs available, exciting research potential

Important information waiting to be decoded!

Why is bioinformatics emerging?

Few people are adequately trained in both biology and computer science

Genome sequencing, microarrays, etc. lead to large amounts of data to be analyzed

Leads to important discoveries

Saves time and money

Fighting Human Disease

Genetic / Inherited
Diabetes

Viral
Flu, common cold

Bacterial
Meningitis, Strep throat

Drug Development Life Cycle

Discovery (2 to 10 years)

Preclinical Testing (lab and animal testing)

Phase I (20-30 healthy volunteers, used to check for safety and dosage)

Phase II (100-300 patient volunteers, used to check for efficacy and side effects)

Phase III (1000-5000 patient volunteers, used to monitor reactions to long-term drug use)

FDA Review & Approval

Post-Marketing Testing

Timeline axis: 0 to 16 years

Overall: 7 to 15 years, $600-700 million!

Skills Requirement

Computer science

Molecular biology

Statistics

Learning

ISCB: http://www.iscb.org/

NCBI: http://ncbi.nlm.nih.gov/

http://www.bioinformatics.org/

Journals

Conferences

Overview of Molecular Biology

Cells
Chromosomes
DNA
RNA
Amino Acids
Proteins
Genome/Transcriptome/Proteome

Cells

Complex system enclosed in a membrane

Organisms are unicellular (bacteria, baker's yeast) or multicellular

Humans:
60 trillion cells
320 cell types

Image: Example Animal Cell (www.ebi.ac.uk/microarray/biology_intro.htm)

Organisms

Classified into two types:

Eukaryotes: contain a membrane-bound nucleus and organelles (plants, animals, fungi)

Prokaryotes: lack a true membrane-bound nucleus and organelles (single-celled, includes bacteria)

Chromosomes

In eukaryotes, the nucleus contains one or several double-stranded DNA molecules organized as chromosomes

Humans:
22 pairs of autosomes
1 pair of sex chromosomes

Image: Human Karyotype (http://avery.rutgers.edu/WSSP/StudentScholars/Session8/Session8.html)

Chromosomes

Image source: www.biotec.or.th/Genome/whatGenome.html

DNA is the blueprint for life

DNA: Deoxyribonucleic Acid

Nearly every cell in your body has 23 pairs of chromosomes (46 in total) in the nucleus
The genes in these chromosomes determine all of your physical attributes.
A single strand (oligomer, polynucleotide) is a chain of nucleotides

4 different nucleotides, named for their bases:
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)


Mapping the Genome

The Human Genome Project has provided us with a draft of the entire human genome.
Four bases: A, T, C, G
3.12 billion base pairs
More than 99% of these are the same between individuals
Polymorphisms = where they differ
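
To make the idea of a polymorphism concrete, here is a minimal Python sketch that compares two sequences of equal length and reports the positions where they differ. The function name `find_polymorphisms` and both short sequences are made up for illustration; comparing real genomes first requires sequence alignment, which this sketch assumes has already been done.

```python
def find_polymorphisms(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Hypothetical example sequences (not real genomic data)
person_1 = "GTAAAGTCCCGTTAGC"
person_2 = "GTAAAGTCTCGTTAGC"

differences = find_polymorphisms(person_1, person_2)
# Fraction of positions that are identical between the two sequences
identity = 1 - len(differences) / len(person_1)

print(differences)
print(f"identity: {identity:.1%}")
```

Positions are 0-based, following the usual Python convention; add 1 if 1-based coordinates are preferred.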

Nucleotide Bases

Purines (A and G)
Pyrimidines (C and T)
Difference is in base structure: purines have two fused rings, pyrimidines have one

Image source: www.ebi.ac.uk/microarray/biology_intro.htm

DNA polynucleotides (oligomers)

Different nucleotides are strung together to form polynucleotides

Ends of the polynucleotide are different

A directionality is present

Convention is to label the coding strand from 5' to 3'

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html

Single Strand Polynucleotide

Example polynucleotide:

5' GTAAAGTCCCGTTAGC 3'

Double Stranded DNA

DNA can be single-stranded or double-stranded

Double-stranded DNA: second strand is the reverse complement strand

Reverse complement runs in the opposite direction and bases are complementary

Complementary bases:
A, T
C, G

Double Stranded Sequence

Example double stranded polynucleotide:

5' GTAAAGTCCCGTTAGC 3'
   ||||||||||||||||
3' CATTTCAGGGCAATCG 5'
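
The pairing rules (A with T, C with G) and the opposite direction of the second strand can be sketched in a few lines of Python. The helper name `reverse_complement` is chosen here for illustration; the sequence is the one from the slide.

```python
# Watson-Crick base pairing for DNA
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


top = "GTAAAGTCCCGTTAGC"
bottom = reverse_complement(top)

print(bottom)        # second strand read 5' to 3'
print(bottom[::-1])  # same strand read 3' to 5', as drawn under the top strand
```

Reading the result backwards gives the lower strand exactly as it is drawn in the example, and applying the function twice returns the original strand.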

http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookDNAMOLGEN.html

Double Helix

Proteins: Molecular machinery

Proteins in your muscles allow you to move: myosin and actin

Proteins: Molecular machinery

Enzymes
(digestion, catalysis)

Structure (collagen)

Proteins: Molecular machinery

Signaling (hormones, kinases)
Transport (energy, oxygen)

Image source: Crane digital, http://www.cranedigital.com/

Drug Discovery

What protein can we attack to stop the disease from progressing?

What sort of molecule will bind to this protein?

Does it kill the patient?

Does it have side effects?

Does it get to the problem spots?

Finding drug leads

Once we have a target, how do we find some compounds that might bind to it?
The old way: exhaustive screening
The new way: computational screening!

Drug Lead Screening & Docking

Complementarity
Shape
Chemical
Electrostatic

Problems in Bioinformatics

Genomics
  Gene finding
  Annotation
  Sequence alignment and database search
Functional genomics
  Microarray expression, gene chips
Proteomics
  Structure prediction
  Comparative modeling
  Function prediction
Structural bioinformatics
  Molecular docking, screening, etc.

RNA

Ribonucleic Acid

Similar to DNA

Thymine (T) is replaced by uracil (U)

RNA can be:


Single stranded
Double stranded
Hybridized with DNA

RNA

RNA is generally single stranded

Forms secondary or tertiary structures

RNA folding will be discussed later

Important in a variety of ways, including protein synthesis

RNA secondary structure

E. coli RNase P RNA secondary structure

mRNA

Messenger RNA

Linear molecule encoding genetic information copied from DNA molecules

Transcription: process in which DNA is copied into an RNA molecule
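
As a rough illustration of transcription, the Python sketch below produces an mRNA string in two equivalent ways: by rewriting the coding strand with uracil (U) in place of thymine (T), and by pairing bases against the template strand. The function names and the short sequence are hypothetical, and the sketch ignores promoters, terminators and everything else the cell needs to decide where transcription starts and stops.

```python
def transcribe(coding_strand):
    """mRNA for a coding strand given 5' to 3': same sequence with T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_from_template(template_strand):
    """Build mRNA by pairing RNA bases against a template strand given 3' to 5'."""
    pairing = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairing[base] for base in template_strand)


# Hypothetical example sequences
coding = "ATGGTAAAGTCC"    # coding strand, 5' to 3'
template = "TACCATTTCAGG"  # its complement, written 3' to 5'

mrna = transcribe(coding)
print(mrna)
print(transcribe_from_template(template))
```

Both calls give the same mRNA, which is why sequence databases can report genes using the coding strand only.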

mRNA processing

Eukaryotic genes can be pieced together
Exons: coding regions
Introns: non-coding regions

mRNA processing removes introns, splices exons together

Processed mRNA can be translated into a protein sequence
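
Splicing can be pictured as cutting the exon segments out of the primary transcript and joining them in order. The sketch below assumes the exon boundaries are already known and given as index pairs; the transcript, the coordinates and the function name `splice` are invented for illustration, and finding real exon boundaries is itself a gene-finding problem.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as (start, end) half-open index pairs, in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: uppercase = exon, lowercase = intron (for readability only)
pre_mrna = "AUGGCC" + "guaaguuuucag" + "AAAUGA"
exons = [(0, 6), (18, 24)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)
```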

mRNA Processing

Image source: http://departments.oxy.edu/biology/Stillman/bi221/111300/processing_of_hnrnas.htm

tRNA

Transfer RNA

Well-defined three-dimensional structure

Critical for creation of proteins

tRNA structure

tRNA

Amino acid attached to each tRNA

Determined by 3-base anticodon sequence (complementary to mRNA)

Translation: process in which the nucleotide sequence of the processed mRNA is used to join amino acids together into a protein with the help of ribosomes and tRNA

Genetic Code

4 possible bases (A, C, G, U)
3 bases in the codon
4 * 4 * 4 = 64 possible codon sequences
Start codon: AUG
Stop codons: UAA, UAG, UGA
61 codons code for amino acids (including AUG)
20 amino acids: redundancy in the genetic code
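
The counts above can be checked, and a short mRNA translated, with a small Python sketch. It builds the standard genetic code from a 64-letter string of one-letter amino acid codes listed in U, C, A, G order, with `*` marking a stop codon. The function name `translate` and the example mRNA are assumptions made for illustration; the sketch simply starts at the first AUG and ignores reading-frame ambiguity and alternative genetic codes.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG up to the first stop codon (or the end of the sequence)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
coding_codons = len(CODON_TABLE) - len(stop_codons)
distinct_amino_acids = len(set(CODON_TABLE.values()) - {"*"})

print(len(CODON_TABLE), stop_codons, coding_codons, distinct_amino_acids)
print(translate("AUGGUAAAGUCCUAA"))
```

Because 61 codons map onto only 20 amino acids, several codons share the same amino acid, which is the redundancy noted on the slide.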

20 Amino Acids
Glycine (G, GLY)
Alanine (A, ALA)
Valine (V, VAL)
Leucine (L, LEU)
Isoleucine (I, ILE)
Phenylalanine (F, PHE)
Proline (P, PRO)
Serine (S, SER)
Threonine (T, THR)
Cysteine (C, CYS)
Methionine (M, MET)
Tryptophan (W, TRP)
Tyrosine (Y, TYR)
Asparagine (N, ASN)
Glutamine (Q, GLN)
Aspartic acid (D, ASP)
Glutamic Acid (E, GLU)
Lysine (K, LYS)
Arginine (R, ARG)
Histidine (H, HIS)
START: AUG
STOP: UAA, UAG, UGA

Amino Acids

Building blocks for proteins (20 different)

Vary by side chain groups

Hydrophilic amino acids are water soluble

Hydrophobic are not

Linked via a single chemical bond (peptide bond)

Peptide: short linear chain of amino acids (< 30)

Polypeptide: long chain of amino acids (which can be upwards of 4000 residues long).

Proteins

Polypeptides having a three-dimensional structure.

Primary: sequence of amino acids constituting the polypeptide chain
Secondary: local organization into secondary structures such as helices and sheets
Tertiary: three-dimensional arrangement of the amino acids as they react to one another due to the polarity and resulting interactions between their side chains
Quaternary: number and relative positions of the protein subunits

Protein Structure

Central Dogma

DNA → RNA → PROTEIN

Central Dogma

What is a Gene?

The physical and functional unit of heredity that carries information from one generation to the next

DNA sequence necessary for the synthesis of a functional protein or RNA molecule

Genome

Chromosomal DNA of an organism

Number of chromosomes and genome size vary quite significantly from one organism to another

Genome size and number of genes do not necessarily determine organism complexity

Genome Comparison

ORGANISM                              CHROMOSOMES  GENOME SIZE (bp)  GENES
Homo sapiens (Humans)                 23           3,200,000,000     ~30,000
Mus musculus (Mouse)                  20           2,600,000,000     ~30,000
Drosophila melanogaster (Fruit Fly)   4            180,000,000       ~18,000
Saccharomyces cerevisiae (Yeast)      16           14,000,000        ~6,000
Zea mays (Corn)                       10           2,400,000,000     ???
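
One way to see that genome size and gene count do not track each other is to compute gene density from the figures in the table. The Python sketch below uses the approximate slide values as given (they are rough 2005-era estimates, not current annotations) and leaves out Zea mays because its gene count is listed as unknown. The function name `genes_per_megabase` is chosen here for illustration.

```python
# (genome size in base pairs, approximate gene count), taken from the table above
genomes = {
    "Homo sapiens": (3_200_000_000, 30_000),
    "Mus musculus": (2_600_000_000, 30_000),
    "Drosophila melanogaster": (180_000_000, 18_000),
    "Saccharomyces cerevisiae": (14_000_000, 6_000),
}


def genes_per_megabase(size_bp, gene_count):
    """Approximate number of genes per million base pairs."""
    return gene_count / (size_bp / 1_000_000)


density = {name: genes_per_megabase(*values) for name, values in genomes.items()}

# Print organisms from the most to the least gene-dense genome
for name, value in sorted(density.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {value:8.1f} genes/Mb")
```

With these numbers, yeast packs far more genes into each megabase than human or mouse, even though its genome is much smaller.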

Transcriptome

Complete collection of all possible mRNAs (including splice variants) of an organism.

Regions of an organism's genome that get transcribed into messenger RNA.

Transcriptome can be extended to include all transcribed elements, including non-coding RNAs used for structural and regulatory purposes.

Proteome

The complete collection of proteins that can be produced by an organism.

Can be studied either as a static (sum of all proteins possible) or dynamic (all proteins found at a specific time point) entity
