
Fundamental Molecular Biology

Computational Genomics
January 27, 2004
Genomes
Genome sizes (in base pairs: Gb = 10^9 bp, Mb = 10^6 bp)
Crested newt (Triturus cristatus) ≈ 18 Gb
Corn ≈ 5 Gb
Human ≈ 3.2 Gb
Frog (X. laevis) ≈ 3 Gb
Fruit fly (D. melanogaster) ≈ 137 Mb
Nematode (C. elegans) ≈ 80 Mb
Yeast (S. cerevisiae) ≈ 13 Mb
Bacteria (M. tuberculosis) ≈ 4.4 Mb
HIV ≈ 9,181 nt (RNA genome)
Human Genome
• 3.2 Gb (3,200 Mb)
• 46 chromosomes (diploid; N = 23)
• 22 autosome pairs + sex chromosomes (XX or XY)
• 25,000 – 100,000 genes (estimated)
• ≈ 10^14 cells
Genome Packaging
Many levels of packaging
Total human genome length ≈ 1 meter (haploid)
Metaphase chromosomes are approximately 5,000-10,000 times shorter.
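The ≈ 1 meter figure can be sanity-checked with rough arithmetic. The sketch below assumes B-form DNA with a rise of about 0.34 nm per base pair. That is a textbook value, not one stated on the slides, and the function name is illustrative.

```python
# Rough check of the stretched-out haploid genome length.
# Assumption (not from the slides): B-form DNA rises ~0.34 nm per base pair.
HAPLOID_BP = 3.2e9        # human haploid genome size, in base pairs
RISE_NM_PER_BP = 0.34     # assumed axial rise per base pair, in nanometers


def contour_length_m(n_bp, rise_nm=RISE_NM_PER_BP):
    """Return the end-to-end length of n_bp base pairs, in meters."""
    return n_bp * rise_nm * 1e-9


length_m = contour_length_m(HAPLOID_BP)
print(f"haploid genome length is roughly {length_m:.2f} m")
```

Under that assumption the haploid genome comes out at a little over one meter, consistent with the slide.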
Genome made up of chromosomes
Humans have 23 pairs of chromosomes
Flies (D. melanogaster) have 4 pairs of chromosomes
Bacteria typically have 1 chromosome
Autosomes vs. "sex" chromosomes
(Credit: D. McDonald, Seattle Laboratory of Pathology)
Chromosome Anatomy
Structures
• Centromeres
• Telomeres
Euchromatin
• Gene-dense
• Loosely packed
• Histones are acetylated
Heterochromatin
• Gene-poor
• Tightly packed
• Histones are methylated
Chromosomes have genes
The density of genes on a chromosome can be quite variable
Chromosome 19: 23 genes per million bp
Chromosome 13: 5 genes per million bp
The average human gene size (spliced) is around 3,000 bp
"non-functional" regions
Only 1-3% of the human genome may actually code for protein
Remaining DNA is either
1. regulatory
2. repetitive (≈ 50%)
3. junk (spacer DNA)
Repetitive Elements
Repeats
Simple
AT-rich, CA-repeats, longer
Complex
LINEs - Long Interspersed Nuclear Elements
SINEs - Short Interspersed Nuclear Elements
Alu, Mariner elements
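Simple repeats such as CA-repeats are easy to locate computationally. The following is a minimal sketch using a regular expression. The function name, the `min_units` threshold, and the toy sequence are illustrative assumptions, not part of the lecture.

```python
import re


def find_ca_repeats(seq, min_units=4):
    """Return (start, end, unit_count) for each run of >= min_units 'CA' units.

    Coordinates are 0-based and half-open.
    """
    pattern = re.compile(r"(?:CA){%d,}" % min_units)
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 2)
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence: one run of six CA units and one run of two.
example = "GGT" + "CA" * 6 + "TTG" + "CA" * 2 + "GG"
print(find_ca_repeats(example))
```

Only the six-unit run is reported, because the two-unit run falls below the threshold. Complex repeats such as LINEs and SINEs cannot be found this way and are normally detected by comparison against libraries of known repeat sequences.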
Replication of DNA

One could envision several methods of replication: conservative or semi-conservative ...
Important considerations: errors, start and stop points, mechanics ...
DNA replication
From genes to proteins
Central Dogma

Genome --(transcription)--> mRNA transcript --(translation)--> protein
Transcription
Produces intermediate molecules from which proteins are produced.
mRNA is a "copy" of DNA
• Single stranded
• Condensed: all of the "filler" is taken out
• Translated into proteins
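The idea that mRNA is a "copy" of DNA can be made concrete with a small sketch. It assumes sequences are written 5' to 3', and it ignores splicing and every other processing step. Both function names are illustrative.

```python
# Complement DNA bases directly into RNA bases (A->U, C->G, G->C, T->A).
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_strand):
    """mRNA matching a DNA coding (sense) strand: same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand):
    """mRNA made from a DNA template strand given 5'->3'.

    The RNA is complementary and antiparallel to the template, so the
    template is reversed and then complemented.
    """
    return template_strand.upper()[::-1].translate(DNA_TO_RNA_COMPLEMENT)


coding = "ATGGCCTAA"
template = "TTAGGCCAT"  # reverse complement of the coding strand
print(transcribe_coding(coding))
print(transcribe_template(template))
```

Both calls give the same mRNA, because the coding strand and the transcript are both complementary to the template strand.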
Introns and Exons
Other mRNA modifications
Splicing
removes introns, splices exons together
Capping
Adds a modified 'G' (7-methylguanosine) in reverse orientation to the 5’ end
Polyadenylation
Not coded in the DNA: a tail of A's (tens to a few hundred) is added to the 3’ end of the mRNA
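Splicing and polyadenylation can be sketched as simple string operations. This is a toy model under stated assumptions: exon coordinates are given in advance, the sequences are hypothetical, and the tail length is an arbitrary illustrative value. The 5’ cap is a chemical modification and is not modeled.

```python
def splice(pre_mrna, exons):
    """Join exon segments, given as 0-based half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def polyadenylate(mrna, tail_length=20):
    """Append a poly(A) tail; tail_length is an arbitrary illustrative value."""
    return mrna + "A" * tail_length


# Hypothetical toy transcript: exon 1, an intron (GU...AG), exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UGGUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, len(exon1)), (len(exon1) + len(intron), len(pre_mrna))]

mature = polyadenylate(splice(pre_mrna, exons), tail_length=10)
print(mature)
```

In a real cell the splice sites are recognized from sequence signals rather than supplied as coordinates. Locating them computationally is a separate problem.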
Translation
Changes alphabets
DNA/RNA: 4 letters
Amino acids: 20 letters
3rd base wobble
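The change of alphabets can be illustrated with a lookup table. The sketch below builds the standard genetic code from a compact 64-letter string (bases ordered U, C, A, G) and translates an mRNA that is assumed to start at the first base of the reading frame. The names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
# '*' marks a stop codon.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA read from its first base, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCUGGUAA"))
# Third-base wobble: all four GGN codons encode glycine (G).
print({codon: CODON_TABLE[codon] for codon in ("GGU", "GGC", "GGA", "GGG")})
```

The second print shows why the third base often does not matter: several codons that differ only at that position map to the same amino acid.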


What makes a protein
Amino acids are linked together to make a peptide, which is extended and folded into a functional protein.
Several classes of amino acids, based upon chemical and electrical properties.
Some substitutions can be tolerated in the overall structure of the protein (sometimes).
From sequence to structure
So a protein is a chain of amino acids...
How is the structure formed?
Due to the interaction of amino acids with adjacent
amino acids as well as interaction of groups of
amino acids with the surrounding environment.
Computer modeling software DOES exist to predict
this structure
Summary of information flow
DNA to RNA to Amino acid
Complications:
How does DNA or RNA get made?
How is the replication of DNA from generation to generation handled?
What controls which gene gets expressed first, or how many copies of a protein are made?
Questions about translation
Punctuation? Words?
20 amino acids, 4 different bases: need more than one base to encode each amino acid
4^1 = 4
4^2 = 16
4^3 = 64
M. Nirenberg and H. Matthaei broke the code ~10 years after the structure of DNA was discovered
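The counting argument above can be written as a short loop that finds the smallest word length able to encode 20 amino acids with a 4-letter alphabet. The function name is illustrative.

```python
def min_codon_length(n_amino_acids=20, alphabet_size=4):
    """Smallest n such that alphabet_size ** n >= n_amino_acids."""
    n = 1
    while alphabet_size ** n < n_amino_acids:
        n += 1
    return n


for n in (1, 2, 3):
    print(f"4^{n} = {4 ** n}")
print("minimum codon length:", min_codon_length())
```

Two bases give only 16 combinations, so three is the minimum. The 64 possible triplets leave room for redundancy and stop signals.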
Codons?
tRNA
Proteins
Structure
primary -> secondary -> tertiary -> quaternary
There are several common protein motifs
Beta sheet
Alpha helix
Cysteine (disulfide) bridges
Transcription Regulation
RNA polymerase is the engine driving transcription
~1.6 × 10^-8 meters long
1.8 meters of DNA in a human nucleus
Polymerase needs to cover a huge area, quickly and accurately
Not all DNA is equally accessible
How to find the genes that need to be transcribed quickly?
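The scale of the search problem follows from the two lengths quoted above. The sketch below simply divides them. The variable and function names are illustrative.

```python
POLYMERASE_LENGTH_M = 1.6e-8   # RNA polymerase length, from the slide
DNA_PER_NUCLEUS_M = 1.8        # DNA per human nucleus, from the slide


def polymerase_lengths(dna_m=DNA_PER_NUCLEUS_M, polymerase_m=POLYMERASE_LENGTH_M):
    """How many polymerase lengths fit along the DNA in one nucleus."""
    return dna_m / polymerase_m


print(f"{polymerase_lengths():.3g} polymerase lengths of DNA per nucleus")
```

The DNA is on the order of 10^8 times longer than the enzyme, which is why regulation is needed to direct the polymerase to the right genes.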
Transcription regulation
Usually positive regulation
Usually a single transcribed gene

Modes:
1) recruits or enhances polymerase
2) makes DNA more available
Context
Transcription - the conversion of DNA information to RNA information.
Happens before splicing
Mechanism for transcription is RNA polymerase.
Followed by translation - the conversion of RNA information to amino acid sequence
RNA polymerase
>10 subunits in the “holoenzyme”
Several types: I-III
Type II is involved in "gene" transcription.
Types I and III transcribe ribosomal and transfer RNA
How transcription elements are discovered
Use of a reporter system
Procedural, makes the enzyme assay routine
Chop off the DNA until you get a change in the reporter enzyme levels (promoter bashing)
Mutate a potential site (single bp changes)
Consensus sequences
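A consensus sequence can be derived from a set of aligned binding sites by taking the most common base at each position. The sketch below assumes the sites are already aligned and of equal length. The example sites are hypothetical, chosen only to illustrate the calculation.

```python
from collections import Counter


def consensus(sites):
    """Most common base at each column of equal-length, pre-aligned sites."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("all sites must have the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sites)
    )


# Hypothetical aligned sites (not taken from the lecture).
sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
print(consensus(sites))
```

A plain majority vote discards how strongly each position is conserved. Position weight matrices keep the per-column counts instead and are the more common representation in practice.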
[Figure; values shown: 1.00, 0.57, 0.56, 0.18, 0.88]
Promoters vs Enhancers
Characteristics of each
Promoters: < 200 bp from the transcription start
Enhancers: within 'several kb'
May end up closer due to DNA bending
Typically more global regulatory elements, tissue- or time-specific.
Both consist of < 10 bp elements
Enhancers
Can be either upstream or downstream.

Modes of operation:
Could change density of supercoiling.
Provide an entry site for the polymerase.
Anchor the DNA at a place within the cell for access.
Mutation
Common themes in mutation
point mutation
recombination
transposition
deletion
Severity of mutation
No effect
point mutations in particular can be silent (no change to the amino acid)
Minimal effect
not critical to protein function
Major effect
Insertions which inactivate a protein
Frame-shifts
Deletions
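The range of severities can be illustrated on a toy coding sequence. The sketch below classifies a single-base substitution as silent, missense, or nonsense using the standard genetic code, and shows how a one-base deletion shifts the reading frame. The sequence and function names are hypothetical, and the classification is deliberately simplified: a change that removes a stop codon is reported as missense.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for DNA codons ordered TTT, TTC, ... GGG ('*' = stop).
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate every complete codon of dna, keeping '*' for stop codons."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_point_mutation(dna, pos, new_base):
    """Classify a substitution at 0-based position pos."""
    mutant = dna[:pos] + new_base + dna[pos + 1:]
    before, after = translate(dna), translate(mutant)
    if before == after:
        return "silent"
    return "nonsense" if after[pos // 3] == "*" else "missense"


cds = "ATGGGCTGGTAA"                          # translates to M G W stop
print(classify_point_mutation(cds, 5, "A"))   # GGC -> GGA, still glycine
print(classify_point_mutation(cds, 3, "C"))   # GGC -> CGC, glycine to arginine
print(classify_point_mutation(cds, 8, "A"))   # TGG -> TGA, premature stop
print(translate(cds), "->", translate(cds[:3] + cds[4:]))  # one-base deletion
```

The deletion changes every codon downstream of it, which is why frame-shifts are usually more damaging than single substitutions.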
[Figure labels: functional domain; conserved homology]
SNPs
Single nucleotide polymorphisms
The human genomic sequence, even without errors, is an approximation
Normal variation in sequence is several million differences per person
Can be important in drug tolerance and disease prevalence questions
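Given a reference sequence and an individual's sequence that are already aligned, candidate single-nucleotide differences can be listed by direct comparison. This is a minimal sketch that assumes equal-length sequences with no insertions or deletions. The sequences and the function name are hypothetical.

```python
def snp_positions(reference, sample):
    """List (position, reference_base, sample_base) where the sequences differ.

    Positions are 0-based. Both sequences must be aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
print(snp_positions(reference, sample))
```

Real variant calling works from many overlapping sequencing reads and has to account for sequencing errors, so a direct comparison like this only illustrates the concept.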
Types of sequences
genomic

mRNA

EST

Protein sequences
Finding your way on molecules
5’ end is the start of DNA
3’ end is the end of DNA
N (amino) terminal is the start of a protein
C (carboxy) terminal is the end of a protein

5’ UTR | Start codon ... CDS ... Stop codon | 3’ UTR
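The layout above (5’ UTR, CDS, 3’ UTR) can be recovered from an mRNA string with a simple scan. The sketch below assumes the first AUG is the start codon and reads in frame until the first stop codon. Real start-site selection depends on sequence context, so this is a simplification, and the sequence is hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Return (five_prime_utr, cds, three_prime_utr), or None if no ORF is found.

    Simplifying assumption: translation starts at the first AUG.
    The returned CDS includes both the start and the stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    return None


# Hypothetical transcript: 5' UTR + CDS + 3' UTR.
mrna = "GGCACC" + "AUGGCCUGGUAA" + "CUUAAAA"
print(split_mrna(mrna))
```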


Anti-parallel
[Figure: chemical structures of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)]
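Because the two DNA strands are anti-parallel, the partner of a strand written 5’ to 3’ is its reverse complement, not just its complement. A minimal sketch, with an illustrative function name:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand of a DNA sequence, also written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))
```

Applying the function twice returns the original sequence, which is a quick way to check an implementation.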
Sequences
Genome sequences (1,000,000’s of bp)
mRNA sequences (1,000’s of bp)
ESTs (100’s of bp)
5’ UTR | Start codon ... CDS ... Stop codon | 3’ UTR
5’ EST (from the 5’ end)          3’ EST (from the 3’ end)
END
(go to UCSC web site intro)
