CMMB-Fund Mol Bio

Fundamental Molecular Biology
Computational Genomics
January 27, 2004
Genomes
Genome sizes
Triturus cristatus ≈ 18 Gb (crested newt)
Corn ≈ 5 Gb
Human ≈ 3.2 Gb
Frog (X. laevis) ≈ 3 Gb
Fruit fly (D. melanogaster) ≈ 137 Mb
Nematode (C. elegans) = 80 Mb
Yeast (S. cerevisiae) = 13 Mb
Bacteria (M. tuberculosis) = 4.4 Mb
HIV = 9,181 nt (RNA)
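The sizes above span more than six orders of magnitude. One simple way to compare them is to convert each to base pairs and express it as a multiple of the human genome. The dictionary below only copies the slide's approximate values (the HIV entry is a count of RNA nucleotides), and the function name is illustrative.

```python
# Approximate genome sizes copied from the slide, converted to base pairs
# (the HIV value is a count of RNA nucleotides, not base pairs).
GENOME_SIZES_BP = {
    "Triturus cristatus (crested newt)": 18e9,
    "Corn": 5e9,
    "Human": 3.2e9,
    "Frog (X. laevis)": 3e9,
    "Fruit fly (D. melanogaster)": 137e6,
    "Nematode (C. elegans)": 80e6,
    "Yeast (S. cerevisiae)": 13e6,
    "Bacteria (M. tuberculosis)": 4.4e6,
    "HIV": 9181,
}


def fold_vs_human(sizes):
    """Return each genome size as a multiple of the human genome size."""
    human = sizes["Human"]
    return {name: size / human for name, size in sizes.items()}


if __name__ == "__main__":
    # Largest genome first.
    ranked = sorted(fold_vs_human(GENOME_SIZES_BP).items(), key=lambda kv: -kv[1])
    for name, fold in ranked:
        print(f"{name:35s} {fold:.2e} x human")
```

For example, the newt genome works out to 18 / 3.2 = 5.625 times the human genome.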
Human Genome
3.2 Gb (3,200 Mb)
46 chromosomes (2N, where N = 23)
22 autosome pairs + sex chromosomes (XX or XY)
25,000 – 100,000 genes
≈ 10^14 cells
Genome Packaging
Many levels of packaging
Total human genome length (haploid) ≈ 1 meter
Metaphase chromosomes are approximately 5,000-10,000 times shorter.
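The "≈ 1 meter" figure follows from the genome size if one assumes the textbook B-DNA rise of about 0.34 nm per base pair (a value not stated on the slide). A minimal sketch of that arithmetic, with illustrative names:

```python
# Assumption (not from the slide): B-form DNA rises ~0.34 nm per base pair.
RISE_NM_PER_BP = 0.34


def dna_length_m(n_bp):
    """Contour length, in meters, of n_bp base pairs of stretched-out double helix."""
    return n_bp * RISE_NM_PER_BP * 1e-9


haploid_length = dna_length_m(3.2e9)  # about 1.09 m, consistent with "≈ 1 meter"

# Summed length of all metaphase chromosomes if they are 5,000-10,000x shorter.
metaphase_range_m = (haploid_length / 10_000, haploid_length / 5_000)

if __name__ == "__main__":
    low_um, high_um = (x * 1e6 for x in metaphase_range_m)
    print(f"haploid DNA: {haploid_length:.2f} m")
    print(f"metaphase total: {low_um:.0f}-{high_um:.0f} micrometers")
```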
Genome made up of chromosomes
Humans have 23 chromosomes (haploid set)
Flies have 4 chromosomes
Bacteria typically have 1 chromosome
Autosomes vs “Sex” chromosomes
(Figure: D. McDonald, Seattle Laboratory of Pathology)
Chromosome Anatomy
Structures
Centromeres
Telomeres
Euchromatin
Gene-dense
Loosely packed
Histones are acetylated
Heterochromatin
Gene-poor
Tightly packed
Histones are methylated
Chromosomes have genes
The density of genes on a chromosome can be quite variable:
Chromosome 19: 23 genes per million bp
Chromosome 13: 5 genes per million bp
An average human gene is around 3,000 bp long after splicing
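The two densities can be related to the ~3,000 bp spliced gene size with simple arithmetic: genes per Mb multiplied by spliced length gives a rough fraction of the region that ends up in mature mRNA. This sketch assumes every gene has the average spliced length, and the function names are illustrative.

```python
def genes_per_mb(gene_count, region_length_bp):
    """Gene density: number of genes per million base pairs of sequence."""
    return gene_count / (region_length_bp / 1e6)


def spliced_fraction(density_per_mb, mean_spliced_bp=3000):
    """Rough fraction of a region that ends up in spliced mRNA,
    assuming every gene has the average spliced length."""
    return density_per_mb * mean_spliced_bp / 1e6


if __name__ == "__main__":
    for chrom, density in [("19", 23), ("13", 5)]:
        print(f"chr{chrom}: ~{spliced_fraction(density):.1%} of the sequence is spliced mRNA")
```

Even on gene-dense chromosome 19 this comes to only about 7% (23 × 3,000 / 1,000,000), and about 1.5% on chromosome 13.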
"non-functional" regions
Only 1-3% of the human genome may actually code for protein
The remaining DNA is:
1. regulatory
2. repetitive (≈ 50%)
3. junk (spacer DNA)
Repetitive Elements
Repeats
Simple
AT-rich, CA-repeats, longer
Complex
LINEs - Long Interspersed Nuclear Elements
SINEs - Short Interspersed Nuclear Elements
Alu, Mariner elements
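Simple repeats such as CA-repeats can be located with a regular expression. This is a minimal sketch: the function name and the five-copy threshold are arbitrary choices, and the greedy, non-overlapping match may miss a repeat that is out of phase with the unit. Annotating complex repeats (LINEs, SINEs, Alu) is normally done with dedicated tools such as RepeatMasker and curated repeat libraries, not a single pattern.

```python
import re


def find_tandem_repeats(seq, unit="CA", min_copies=5):
    """Return half-open (start, end) spans where `unit` occurs at least
    `min_copies` times in a row. Matching is greedy and non-overlapping."""
    # Builds a pattern such as (?:CA){5,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_tandem_repeats("ggcacacacacacatt"))  # [(2, 14)]
```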
Replication of DNA
One could envision several methods of replication: conservative, semi-conservative, ...
Important considerations: errors, start and stop points, mechanics...
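Under the semi-conservative model, each parental strand serves as the template for one new strand. The sketch below captures only that bookkeeping and deliberately ignores the other considerations listed above (errors, start and stop points, mechanics); the function names are illustrative.

```python
# Watson-Crick pairing. Strands are written 5'->3', so the partner of a strand
# is its reverse complement (the two strands are anti-parallel).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand):
    return strand.upper().translate(COMPLEMENT)[::-1]


def replicate_semiconservative(duplex):
    """Each parental strand templates one new strand, so every daughter
    duplex is (old parental strand, newly synthesized strand)."""
    return [(parent, reverse_complement(parent)) for parent in duplex]


top = "ATGCGTAC"
parent_duplex = (top, reverse_complement(top))   # ("ATGCGTAC", "GTACGCAT")
daughters = replicate_semiconservative(parent_duplex)
# Both daughters carry the parental sequence, but each keeps exactly one old strand.
```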
DNA replication
From genes to proteins
Central Dogma
Genome -> (transcription) -> mRNA transcript -> (translation) -> protein
Transcription
Produces intermediate molecules from which proteins are made.
mRNA is a "copy" of DNA:
Single stranded
Condensed: all of the "filler" (introns) is taken out
Translated into proteins
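Because an mRNA is a single-stranded copy of the coding (sense) strand, transcription can be sketched as a T-to-U substitution. Equivalently, starting from the template strand that RNA polymerase reads, the transcript is the template's reverse complement in the RNA alphabet. Function names are illustrative; splicing is not modeled here.

```python
def transcribe(coding_strand):
    """mRNA has the coding-strand sequence, with U in place of T (both 5'->3')."""
    return coding_strand.upper().replace("T", "U")


def transcribe_from_template(template_strand):
    """Template strand given 5'->3'; the RNA is its reverse complement (A pairs with U)."""
    to_rna_complement = str.maketrans("ACGT", "UGCA")
    return template_strand.upper().translate(to_rna_complement)[::-1]


if __name__ == "__main__":
    print(transcribe("ATGGCC"))                # AUGGCC
    print(transcribe_from_template("GGCCAT"))  # AUGGCC (same transcript)
```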
Introns and Exons
Other mRNA modifications
Splicing
removes introns, splices exons together
Capping
Adds a 'G' in reverse orientation to the 5’ end
Polyadenylation
not coded in the DNA; 100’s of A's are added
to the 3’ end of the mRNA
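The three modifications can be mimicked on a string. In this hypothetical sketch, exon coordinates are 0-based and half-open, the cap is shown only as the text label "m7G-" (the cap is a 7-methylguanosine), and the poly(A) tail length is an arbitrary parameter rather than a biological constant.

```python
def process_pre_mrna(pre_mrna, exons, tail_length=200):
    """Splice, cap, and polyadenylate a pre-mRNA string.

    exons: (start, end) pairs, 0-based and half-open, in pre-mRNA coordinates.
    """
    mature = "".join(pre_mrna[start:end] for start, end in sorted(exons))  # splicing
    capped = "m7G-" + mature                                               # 5' cap label
    return capped + "A" * tail_length                                      # 3' poly(A) tail


if __name__ == "__main__":
    pre = "AUGCC" + "GUAAGUAG" + "UUUGA"   # exon 1 + intron + exon 2
    print(process_pre_mrna(pre, [(0, 5), (13, 18)], tail_length=4))  # m7G-AUGCCUUUGAAAAA
```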
Translation
Changes alphabets:
DNA/RNA: 4 letters
Amino acids: 20 letters
3rd base wobble
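Translation can be sketched as a lookup in the standard genetic code. The compact 64-letter string lists the amino acid for each codon in U, C, A, G order; third-base wobble shows up as runs of identical letters (all four GCN codons give alanine, A). This sketch reads from the first base and stops at the first stop codon; it does not search for a start codon.

```python
from itertools import product

# Standard genetic code: amino acids for codons in U, C, A, G order,
# first base varying slowest. "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(mrna):
    """Translate an mRNA from its first base until the first stop codon."""
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGGCCUUUUAAGG"))  # MAF
```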

What makes a protein
Amino acids are linked together to make a peptide,
which is extended and folded into a functional
protein.
There are several classes of amino acids, based on their chemical and electrical properties.
Some substitutions can (sometimes) be tolerated in the overall structure of the protein.
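One common textbook grouping of the 20 amino acids by side-chain properties is encoded below, with a helper that calls a substitution "conservative" when both residues fall in the same group. This is an assumption for illustration: groupings differ between textbooks (some give the aromatic residues F, W, Y their own class), and a same-class substitution is only a rough hint that it might be tolerated, not a prediction.

```python
# One common grouping (an assumption; textbooks differ).
AA_CLASS = {
    **dict.fromkeys("GAVLIMFWP", "nonpolar"),
    **dict.fromkeys("STCYNQ", "polar, uncharged"),
    **dict.fromkeys("KRH", "positively charged"),
    **dict.fromkeys("DE", "negatively charged"),
}


def is_conservative(original, replacement):
    """True if both one-letter amino acid codes fall in the same class."""
    return AA_CLASS[original] == AA_CLASS[replacement]


if __name__ == "__main__":
    print(is_conservative("L", "I"))  # True: both nonpolar
    print(is_conservative("D", "K"))  # False: charge reversal
```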
From sequence to structure
So a protein is a chain of amino acids...
How is the structure formed?
Due to the interaction of amino acids with adjacent
amino acids as well as interaction of groups of
amino acids with the surrounding environment.
Computer modeling software DOES exist to predict
this structure
Summary of information flow
DNA to RNA to Amino acid
Complications:
How does DNA or RNA get
made?
How is the replication of
DNA from generation to
generation handled?
What controls which gene
gets made first or how
many copies of a protein
are made?
Questions about translation
Punctuation? Words?
20 amino acids, 4 different bases: more than one base is needed to encode each amino acid
4^1 = 4
4^2 = 16
4^3 = 64
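The argument above is a counting bound: a code of words of length L over a 4-letter alphabet has 4^L possible words, so the shortest length that can name 20 amino acids (plus a stop signal) is 3. A small sketch, with an illustrative function name:

```python
def min_codon_length(n_meanings, alphabet_size=4):
    """Smallest word length L such that alphabet_size ** L >= n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"4^{n} = {4 ** n}")
    print(min_codon_length(20))  # 3: twenty amino acids
    print(min_codon_length(21))  # 3: twenty amino acids plus a stop signal
```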
M. Nirenberg and H. Matthaei broke the code ~10
years after the structure of DNA was discovered
Codons?
tRNA
Proteins
Structure
primary -> secondary -> tertiary -> quaternary
There are several common protein motifs

Beta sheet
alpha helix
Cysteine-bridges
Transcription Regulation
RNA polymerase is the engine driving transcription
~1.6 × 10^-8 meters long
1.8 meters of DNA in a human nucleus
Polymerase needs to cover a huge area, quickly and
accurately
Not all of the DNA is equally accessible
How to find the genes that need to be transcribed
quickly?
Transcription regulation
Usually positive regulation
Usually a single transcribed gene
Modes:
1) recruits or enhances polymerase
2) makes DNA more available
Context
Transcription - the conversion of DNA information
to RNA information.
Happens before splicing.
The enzyme that carries out transcription is RNA polymerase.
Followed by translation - the conversion of RNA
information to amino acid sequence.
RNA polymerase
>10 subunits in the “holoenzyme”
Several types: I-III
Type II is involved in "gene" (protein-coding mRNA) transcription.
Types I and III transcribe ribosomal and transfer RNA.
How transcription elements are
discovered
Use of a reporter system
Procedural, makes the enzyme assay routine
Chop off the DNA until you get a change in the reporter enzyme levels (promoter bashing)
Mutate a potential site (Single bp changes)
Consensus sequences
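A consensus sequence is built by aligning many known binding sites and taking the most frequent base at each position. The sketch below assumes the sites are already aligned and of equal length; the example sites are made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def position_frequencies(sites):
    """Per-position base frequencies for equal-length, pre-aligned sites."""
    if len({len(s) for s in sites}) != 1:
        raise ValueError("sites must be non-empty and all the same length")
    n = len(sites)
    return [{base: count / n for base, count in Counter(column).items()}
            for column in zip(*sites)]


def consensus(sites):
    """Most frequent base at each position (ties go to the base seen first)."""
    return "".join(max(freqs, key=freqs.get) for freqs in position_frequencies(sites))


if __name__ == "__main__":
    made_up_sites = ["TATAAT", "TATAAT", "TAGAAT", "TATGAT"]
    print(consensus(made_up_sites))                # TATAAT
    print(position_frequencies(made_up_sites)[2])  # {'T': 0.75, 'G': 0.25}
```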
Promoters vs Enhancers
Characteristics of each
Promoters: < 200 bp from the start site
Enhancers: up to 'several kb' away
May end up closer due to DNA bending
Typically more global regulatory elements; tissue- or time-specific.
Both consist of < 10 bp elements

Enhancers
Can be either upstream or downstream.
Modes of operation:
Could change density of supercoiling.
Provide an entry site for the polymerase.
Anchor the DNA at a place within the cell for access.
Mutation
Common themes in mutation
point mutation
recombination
transposition
deletion
Severity of mutation
No effect
point mutants in particular can be silent (no change to the amino acid)
Minimal effect
not critical to protein function
Major effect
Insertions which inactivate a protein
Frame-shifts
Deletions
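The codon-level categories behind the "no effect" vs "major effect" distinction can be sketched as follows. Whether a missense change has a minimal or a major effect depends on where it falls in the protein, which this code cannot judge. The category names are standard terms; the helper names are illustrative, and the sequence is assumed to be an in-frame coding sequence in the DNA alphabet.

```python
from itertools import product

# Standard genetic code (DNA alphabet, T, C, A, G order); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(cds, pos, new_base):
    """Classify a single-base change at 0-based `pos` of an in-frame coding sequence."""
    start = pos - pos % 3                     # first base of the affected codon
    old_codon = cds[start:start + 3]
    offset = pos - start
    new_codon = old_codon[:offset] + new_base + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    if old_aa == "*":
        return "stop-loss"
    return "missense"


def is_frameshift(indel_length):
    """Insertions or deletions that are not a multiple of 3 shift the reading frame."""
    return indel_length % 3 != 0


if __name__ == "__main__":
    cds = "ATGGCCTGG"                            # Met-Ala-Trp
    print(classify_substitution(cds, 5, "T"))   # silent   (GCC -> GCT, both Ala)
    print(classify_substitution(cds, 3, "A"))   # missense (GCC -> ACC, Ala -> Thr)
    print(classify_substitution(cds, 8, "A"))   # nonsense (TGG -> TGA, Trp -> stop)
```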
[Figure: severity of mutation, showing a functional domain and conserved homology]
SNPs
Single nucleotide polymorphisms
Even without errors, the human genomic sequence is an approximation
Normal sequence variation amounts to several million differences per person
Can be important in drug tolerance and disease prevalence questions
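At its simplest, finding candidate SNPs means listing the positions where an individual's sequence differs from the reference. This sketch assumes the two sequences are already aligned with no insertions or deletions; in practice a mismatch may also be a sequencing error rather than a true polymorphism. The function name is illustrative.

```python
def snp_positions(reference, individual):
    """List (position, reference_base, individual_base) for every mismatch
    between two equal-length, already-aligned sequences."""
    if len(reference) != len(individual):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, ref, alt)
            for i, (ref, alt) in enumerate(zip(reference.upper(), individual.upper()))
            if ref != alt]


if __name__ == "__main__":
    print(snp_positions("ACGTACGT", "ACGTTCGA"))  # [(4, 'A', 'T'), (7, 'T', 'A')]
```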
Types of sequences
genomic
mRNA
EST
Protein sequences
Finding your way on molecules
5’ end is the start of DNA (sequences are written 5’ to 3’)
3’ end is the end of DNA
N (amino) terminal is the start of a protein
C (carboxy) terminal is the end of a protein
mRNA: 5’ UTR | start codon ... CDS ... stop codon | 3’ UTR
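Given a transcript written 5’ to 3’, the layout above can be recovered by locating a start codon and the first in-frame stop codon. This is a simplified sketch: it assumes the CDS begins at the first ATG (real start-codon choice depends on sequence context) and includes the stop codon in the CDS; the function name is illustrative.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_transcript(seq: str) -> Optional[Tuple[str, str, str]]:
    """Split a transcript (DNA alphabet, 5'->3') into (5' UTR, CDS, 3' UTR).

    Simplification: the CDS starts at the first ATG and ends at the first
    in-frame stop codon, which is included in the CDS.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            end = i + 3
            return seq[:start], seq[start:end], seq[end:]
    return None  # no in-frame stop codon


if __name__ == "__main__":
    print(split_transcript("GGCATGAAATTTTAGCCA"))  # ('GGC', 'ATGAAATTTTAG', 'CCA')
```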

Anti-parallel
[Figure: chemical structures of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)]
Sequences
Genome sequences (1,000,000’s of bp)
mRNA sequences (1,000’s of bp)
ESTs (100’s of bp)
[Figure: mRNA (5’ UTR, start codon, CDS, stop codon, 3’ UTR) with a 5’ EST and a 3’ EST read from its ends]
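ESTs are short single-pass reads taken from the ends of a transcript (via cDNA). The sketch below assumes that 3’ ESTs are read back from the poly(A) end and therefore come out as the reverse complement of the transcript's last bases; the 400-base default read length is an arbitrary illustrative value, real ESTs also contain sequencing errors, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def simulated_ests(mrna, read_length=400):
    """Return (5' EST, 3' EST) for a transcript written 5'->3' in the DNA alphabet.

    read_length should be positive. The 5' EST is the first read_length bases
    as written; the 3' EST is read from the 3' end, so it is the reverse
    complement of the last read_length bases.
    """
    mrna = mrna.upper()
    five_prime = mrna[:read_length]
    three_prime = mrna[-read_length:].translate(COMPLEMENT)[::-1]
    return five_prime, three_prime


if __name__ == "__main__":
    print(simulated_ests("ATGCCCGGGAAT", read_length=4))  # ('ATGC', 'ATTC')
```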
END
(goto UCSC web-site intro)
