the nucleotide sequence of DNA into a complementary sequence in RNA, a process called transcription.
- During transcription, RNA polymerase binds to DNA and separates the DNA strands. RNA polymerase then uses one strand of DNA as a template from which nucleotides are assembled into a strand of mRNA.
as a promoter, where it binds and begins transcription.
- The RNA strand is then edited: some parts (introns) are removed and are not expressed, while the parts that remain are called exons (expressed sequences).
Translation
- During translation, the cell uses information from messenger RNA to produce proteins.
- A: Transcription occurs in the nucleus.
- B: mRNA moves into the cytoplasm and then to the ribosomes. tRNA reads the mRNA and brings the amino acid coded for.
- C: Ribosomes join the amino acids together, forming a polypeptide chain.
- D: The polypeptide chain keeps growing until a stop codon is reached.
starts at DNA, which replicates to form more DNA. Information is then transcribed into RNA, and then it is translated into protein. The proteins do most of the work in the cell.
- Information does not flow in the other direction. This is a molecular explanation of why acquired characteristics are not inherited: changes in proteins do not affect the DNA in a systematic manner (although they can cause random changes in DNA).
Transcription
- Transcription is the process of making an RNA copy of a single gene. Genes are specific regions of the DNA of a chromosome.
- The enzyme used in transcription is RNA polymerase. There are several forms of RNA polymerase. In eukaryotes, most genes are transcribed by RNA polymerase II.
- The raw materials for the new RNA are the 4 ribonucleoside triphosphates: ATP, CTP, GTP, and UTP. It's the same ATP as is used for energy in the cell.
- As with DNA replication, transcription proceeds 5′ to 3′: new bases are added to the free 3′-OH group.
- Unlike replication, transcription does not need to build on a primer. Instead, transcription starts at a region of DNA called a promoter. For protein-coding genes, the promoter is located a few bases 5′ to (upstream from) the first base that is transcribed into RNA.
- Promoter sequences are very similar to each other, but not identical. If many promoters are compared, a consensus sequence can be derived. All promoters would be similar to this consensus sequence, but not necessarily identical.
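A consensus sequence can be derived by taking the most common base at each position of a set of aligned promoters. The sketch below makes two assumptions. First, the promoters are already aligned to equal length. Second, the four short sequences are made-up toy data, not real promoters. `consensus` is a hypothetical helper name.

```python
from collections import Counter

def consensus(aligned: list[str]) -> str:
    """Return the most frequent base in each column of equal-length,
    pre-aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) yields one tuple of bases per alignment column
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

# Toy "promoters": each differs from the consensus at one position
toy_promoters = ["TATAAA", "TAGAAT", "TATGAT", "CATAAT"]
print(consensus(toy_promoters))  # TATAAT
```

None of the toy sequences matches the consensus exactly. This mirrors the point above that real promoters resemble the consensus without being identical to it.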
After Transcription
- In prokaryotes, the RNA copy of a gene is messenger RNA, ready to be translated into protein. In fact, translation starts even before transcription is finished.
- In eukaryotes, the primary RNA transcript of a gene needs further processing before it can be translated. This step is called RNA processing. The RNA also needs to be transported out of the nucleus into the cytoplasm.
- Steps in RNA processing:
1. Add a cap to the 5′ end.
2. Add a poly-A tail to the 3′ end.
3. Splice out introns.
Introns
- Introns are regions within a gene that don't code for protein and don't appear in the final mRNA molecule. Protein-coding sections of a gene (called exons) are interrupted by introns.
- The function of introns remains unclear. They may help in RNA transport or in the control of gene expression in some cases, and they may make it easier for sections of genes to be shuffled in evolution. However, there is no generally accepted explanation for the existence of introns.
- There are a few prokaryotic examples, but most introns are found in eukaryotes.
- Some genes have many long introns: the dystrophin gene (mutants cause muscular dystrophy) has more than 70 introns that make up more than 99% of the gene's sequence. However, not all eukaryotic genes have introns: histone genes, for example, lack introns.
In eukaryotes, RNA polymerase produces a primary transcript, an exact RNA copy of the gene. A cap is put on the 5′ end. The RNA is terminated and a poly-A tail is added to the 3′ end. All introns are spliced out. At this point, the RNA can be called messenger RNA. It is then transported out of the nucleus into the cytoplasm, where it is translated.
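The three processing steps can be illustrated on a toy string model. All of the following are assumptions made for illustration:

- Intron positions are supplied as half-open index ranges. Real splicing recognizes signals at the intron boundaries, typically GU at the 5′ end and AG at the 3′ end.
- The cap is shown as the text label "m7G-", standing for 7-methylguanosine.
- The poly-A tail length is an arbitrary parameter.

`process_transcript` is a hypothetical name.

```python
def process_transcript(primary: str, introns: list[tuple[int, int]],
                       tail_length: int = 20) -> str:
    """Toy model of eukaryotic RNA processing: cap, splice, poly-A tail.

    `introns` holds half-open (start, end) index ranges into `primary`.
    The 5' cap is represented by a text label, not a real nucleotide.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(primary[pos:start])  # keep the exon before this intron
        pos = end                         # skip over (splice out) the intron
    exons.append(primary[pos:])           # keep the final exon
    mature = "".join(exons)
    return "m7G-" + mature + "A" * tail_length

# Toy transcript: exon "AUGCC", intron "GUAAGUAG", exon "GAUAA"
print(process_transcript("AUGCCGUAAGUAGGAUAA", [(5, 13)], tail_length=3))
# m7G-AUGCCGAUAAAAA
```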
Translation
- Translation of mRNA into protein is accomplished by the ribosome, an RNA/protein hybrid. Ribosomes are composed of 2 subunits, large and small.
- Ribosomes bind to the translation initiation sequence on the mRNA, then move down the RNA in a 5′ to 3′ direction, creating a new polypeptide. The first amino acid of the polypeptide has a free amino group, so it is called the N-terminal. The last amino acid has a free carboxyl (acid) group, so it is called the C-terminal.
- Each group of 3 nucleotides in the mRNA is a codon, which codes for 1 amino acid. Transfer RNA is the adapter between the 3 bases of the codon and the corresponding amino acid.
initiation sites. There can be several different initiation sites on a messenger RNA: a prokaryotic mRNA can code for several different proteins. Translation begins at an AUG codon, or sometimes a GUG. The modified amino acid N-formylmethionine is always the first amino acid of the new polypeptide.
- In eukaryotes, ribosomes bind to the 5′ cap, then move down the mRNA until they reach the first AUG, the codon for methionine. Translation starts from this point. Eukaryotic mRNAs code for only a single gene (there are a few exceptions, mainly among the eukaryotic viruses).
- Note that translation does not start at the first base of the mRNA. There is an untranslated region at the beginning of the mRNA, the 5′ untranslated region (5′ UTR).
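The eukaryotic scanning rule can be sketched in a few lines: skip the 5′ UTR up to the first AUG, then read codons 5′ to 3′ until a stop codon. This is a simplified model that ignores the sequence context around the AUG, the cap and the ribosome itself. The codon table is the standard genetic code, packed into a 64-character string in U, C, A, G order. The example mRNA is a made-up toy sequence.

```python
BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG x UCAG x UCAG order;
# "*" marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate_mrna(mrna: str) -> str:
    """Scan from the 5' end to the first AUG, then translate codon by
    codon until a stop codon (or the end of the message)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3]]
        if aa == "*":
            break  # stop codon ends the polypeptide
        protein.append(aa)
    return "".join(protein)

# "GGC" plays the role of a 5' UTR; UAG is the stop codon
print(translate_mrna("GGCAUGAGGUCAGGCAUAUAGCC"))  # MRSGI
```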
More Initiation
- The initiation process first joins the mRNA, the initiator methionine-tRNA and the small ribosomal subunit. Several initiation factors (additional proteins) are also involved. The large ribosomal subunit then joins the complex.
Elongation
- The ribosome has 2 sites for tRNAs, called P and A. The initial tRNA with its attached amino acid is in the P site. A new tRNA, corresponding to the next codon on the mRNA, binds to the A site. The ribosome catalyzes the transfer of the amino acid from the P site onto the amino acid at the A site, forming a new peptide bond.
- The ribosome then moves down one codon. The now-empty tRNA at the P site is displaced from the ribosome, and the tRNA carrying the growing peptide chain is moved from the A site to the P site.
- The process is then repeated:
  - the tRNA at the P site holds the peptide chain, and a new tRNA binds to the A site;
  - the peptide chain is transferred onto the amino acid attached to the A-site tRNA;
  - the ribosome moves down one codon, displacing the empty P-site tRNA and moving the tRNA with the peptide chain from the A site to the P site.
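The repeated P-site/A-site cycle can be modeled as a loop in which the growing chain is handed from the P-site tRNA to the A-site tRNA at each step. This toy sketch makes three assumptions:

- Translation starts at index 0, with the initiator tRNA already in the P site.
- The codon table is deliberately tiny and covers only the codons in the example.
- The model tracks amino acids rather than real tRNA molecules.

`elongate` is a hypothetical name.

```python
# Tiny codon table covering only the codons used in the example below
# (a real table has 64 entries); None marks a stop codon.
CODONS = {"AUG": "Met", "AGG": "Arg", "UCA": "Ser", "GGC": "Gly", "UAG": None}

def elongate(mrna: str) -> list[str]:
    """Toy P-site/A-site cycle, assuming the initiator codon is at index 0."""
    p_site = [CODONS[mrna[0:3]]]   # chain held by the P-site tRNA
    pos = 3                        # index of the codon in the A site
    while pos + 3 <= len(mrna):
        incoming = CODONS[mrna[pos:pos + 3]]
        if incoming is None:       # stop codon: no tRNA binds, chain is released
            break
        # Peptide bond: the chain on the P-site tRNA is transferred onto
        # the amino acid carried by the newly bound A-site tRNA.
        a_site = p_site + [incoming]
        # Translocation: the empty tRNA leaves the P site and the A-site
        # tRNA, now carrying the chain, moves into the P site.
        p_site = a_site
        pos += 3                   # the ribosome moves down one codon
    return p_site

print(elongate("AUGAGGUCAGGCUAG"))  # ['Met', 'Arg', 'Ser', 'Gly']
```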
Elongation
into their active conformation. However, some proteins are helped and guided in the folding process by chaperone proteins.
- Many proteins have sugars, phosphate groups, fatty acids and other molecules covalently attached to certain amino acids. Most of this is done in the endoplasmic reticulum.
- Many proteins are targeted to specific organelles within the cell. Targeting is accomplished through signal sequences on the polypeptide. For proteins that go into the endoplasmic reticulum, the signal sequence is a group of amino acids at the N-terminal end of the polypeptide, which is cleaved off and is not part of the final protein.
CS 177
Questions about the genome in an organism: How much DNA, how many nucleotides? How many genes are there? What types of proteins appear to be coded by these genes?
Questions about the proteome: What proteins are present? Where are they?
DNA RNA Mutations Amino acids, protein structure
Lecture 2
* DNA and its components
* RNA and its components
* Mutations
* Amino acids, review of protein structure
Linking nucleotides
The 3′-OH of one nucleotide is linked to the 5′-phosphate of the next nucleotide.
Base pairing
[Figure: antiparallel strands (5′→3′ and 3′→5′) paired A–T and C–G through hydrogen bonds (N-H···N, N-H···O); thymine, adenine, cytosine and guanine structures shown; helix width 2 nm]
DNA conventions
1. DNA is a right-handed helix.
2. By convention, the 5′ end is written on the left:
5′-ATCGCAATCAGCTAGGTT-3′ sense (forward)
3′-TAGCGTTAGTCGATCCAA-5′ antisense (reverse)
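Writing the antiparallel partner strand is a mechanical base-pairing step. The helper names in this short sketch are illustrative:

- `complement` gives the partner strand read 3′ to 5′, aligned under the sense strand as in the example above.
- `reverse_complement` rewrites that strand 5′ to 3′, following the convention that the 5′ end goes on the left.

```python
# Watson-Crick pairing: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Partner strand, written 3'->5' underneath the input strand."""
    return seq.translate(COMPLEMENT)

def reverse_complement(seq: str) -> str:
    """Partner strand rewritten 5'->3' (5' end on the left)."""
    return complement(seq)[::-1]

sense = "ATCGCAATCAGCTAGGTT"
print(complement(sense))          # TAGCGTTAGTCGATCCAA
print(reverse_complement(sense))  # AACCTAGCTGATTGCGAT
```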
DNA overview
DNA: deoxyribonucleic acid
4 bases: A = adenine, T = thymine, C = cytosine, G = guanine
Nucleotide = base + sugar + phosphate
[Figure: a nucleotide built from a pyrimidine (C4N2H4) or purine (C5N4H4) base, a deoxyribose sugar with carbons numbered 1′ to 5′ (3′-OH, H at 2′) and a phosphate on the 5′ CH2]
DNA structure
Some more facts:
1. Forces stabilizing DNA structure: Watson-Crick hydrogen bonding and base stacking (planar aromatic bases overlap geometrically and electronically → energy gain).
2. Genomic DNAs are large molecules:
Escherichia coli: 4.7 × 10^6 bp; ~1 mm contour length
Human: 3.2 × 10^9 bp; ~1 m contour length
3. Some DNA molecules are circular and have no free ends: plasmids, mtDNA, bacterial DNA (only one circular chromosome).
4. An average gene of 1000 bp can code for an average protein of about 330 amino acids.
5. The percentage of non-coding DNA varies greatly among organisms:

Organism | # Base pairs | # Genes | Non-coding DNA
small virus | 4 × 10^3 | 3 | very little
typical virus | 3 × 10^5 | 200 | very little
bacterium | 5 × 10^6 | 3000 | 10-20%
yeast | 1 × 10^7 | 6000 | > 50%
human | 3.2 × 10^9 | 30,000? | 99%
amphibians | < 80 × 10^9 | ? | > 99%
plants | < 900 × 10^9 | 23,000 - > 50,000 | ?
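The contour lengths and the gene-to-protein figure follow from simple arithmetic. The worked numbers assume a rise of about 0.34 nm per base pair for B-form DNA, a standard value that is not stated on the slide. The 1000 bp gene is assumed to be read in triplet codons.

```latex
\begin{aligned}
L_{\text{human}} &\approx 3.2\times10^{9}\ \text{bp}\times 0.34\ \text{nm/bp} \approx 1.1\times10^{9}\ \text{nm} \approx 1.1\ \text{m}\\
L_{\textit{E. coli}} &\approx 4.7\times10^{6}\ \text{bp}\times 0.34\ \text{nm/bp} \approx 1.6\times10^{6}\ \text{nm} \approx 1.6\ \text{mm}\\
\text{codons per gene} &\approx \frac{1000\ \text{bp}}{3\ \text{bp/codon}} \approx 333 \;\Rightarrow\; \text{about } 330\ \text{amino acids}
\end{aligned}
```

Both lengths agree with the slide's "~1 m" and "~1 mm" at the order-of-magnitude level.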
RNA structure
RNA: ribonucleic acid. 3 major types of RNA:
- messenger RNA (mRNA): template for protein synthesis
- transfer RNA (tRNA): adaptor molecules that decode the genetic code
- ribosomal RNA (rRNA): catalyzes the synthesis of proteins
[Figure: a ribonucleotide built from a pyrimidine (C4N2H4) or purine (C5N4H4) base, a ribose sugar (OH at both 2′ and 3′) and a phosphate on the 5′ CH2]
RNA base composition: A+G ≠ U+C in general. Chargaff's rule does not apply, because RNA is usually single-stranded.
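Chargaff's rule follows from base pairing. In double-stranded DNA every purine is paired with a pyrimidine, so the totals over both strands match. A single RNA strand has no partner to enforce this. The small sketch below counts purines (A, G) and pyrimidines (C, T, U). The DNA strands are the example from the conventions slide; the RNA string is a made-up example.

```python
def purine_pyrimidine_counts(seq: str) -> tuple[int, int]:
    """Return (purines, pyrimidines) for a DNA or RNA sequence."""
    purines = sum(seq.count(b) for b in "AG")
    pyrimidines = sum(seq.count(b) for b in "CTU")
    return purines, pyrimidines

# Double-stranded DNA: count both strands together
sense = "ATCGCAATCAGCTAGGTT"
antisense = "TAGCGTTAGTCGATCCAA"
print(purine_pyrimidine_counts(sense + antisense))  # (18, 18)

# A single RNA strand: no pairing constraint, counts need not match
print(purine_pyrimidine_counts("AUGAGGUCAGGC"))     # (8, 4)
```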
RNA structure:
- usually single-stranded
- many self-complementary regions → RNA commonly exhibits an intricate secondary structure (relatively short, double-helical segments alternating with single-stranded regions)
- complex tertiary interactions fold the RNA into its final three-dimensional form
- the folded RNA molecule is stabilized by interactions (e.g. hydrogen bonds and base stacking)
RNA structure
[Figure: RNA primary structure and secondary structure elements, panels A-G]
B) duplex: double-helical RNA (A-form, 11 bp per turn)
C) hairpin: duplex bridged by a loop of unpaired nucleotides
D) internal loop
E) bulge loop: unpaired nucleotides in one strand; the other strand has contiguous base pairing
F) junction: three or more duplexes separated by single-stranded regions
G) pseudoknot: tertiary interaction between bases of a hairpin loop and bases outside it
RNA structure
[Figure: RNA primary, secondary and tertiary structure]
RNA structure
How to predict RNA secondary/tertiary structure?
Probing RNA structure experimentally:
- physical methods (single-crystal X-ray diffraction, electron microscopy)
- chemical and enzymatic methods
- mutational analysis (introduction of specific mutations to test changes in some function or in a protein-RNA interaction)
Thermodynamic prediction of RNA structure:
- RNA molecules obey the laws of thermodynamics, so it should be possible to deduce RNA structure from its sequence by finding the conformation with the lowest free energy
- Pros: only one sequence required; no difficult experiments; does not rely on alignments
- Cons: thermodynamic parameters are determined experimentally but are not always accurate; interactions of the RNA with solvent, ions and proteins are not taken into account
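Full free-energy minimization needs experimentally measured stacking and loop parameters, which are beyond a short example. The sketch below instead uses the simpler Nussinov base-pair maximization algorithm. It is a dynamic program with the same recursive structure, but it maximizes the number of pairs rather than minimizing free energy, so it is a stand-in rather than the thermodynamic method itself. Its assumptions:

- Watson-Crick and G-U pairs are allowed.
- Hairpin loops need at least `min_loop` = 3 unpaired bases.
- Pseudoknots are not allowed.
- The output is in dot-bracket notation.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3) -> str:
    """Maximize nested base pairs; return a dot-bracket structure."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)

print(nussinov("GGGAAAUCC"))  # (((...)))  a hairpin with a 3-base loop
```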
Comparative determination of RNA structure:
- basic assumption: the secondary structure of a functional RNA will be conserved during the evolution of the molecule (at least more conserved than the primary structure); when a set of homologous sequences shares a common structure, this structure can be deduced by comparing the structures possible for each sequence
- Pros: very powerful in finding secondary structure; relatively easy to use; only sequences required; not affected by interactions of the RNA with other molecules
- Cons: a large number of sequences is preferred; structural constraints in fully conserved regions cannot be inferred; extremely variable regions cause alignment problems
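The comparative approach looks for compensatory changes. Two alignment columns that both vary, yet always remain able to pair, are strong evidence for a base pair. The sketch below applies that check to a given column pair. The toy alignments and the function name are hypothetical, and G-U pairs are assumed to count as pairing. A fully conserved column pair returns False. This mirrors the "Cons" point above: conservation alone cannot reveal the structural constraint.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def covariation_support(alignment: list[str], i: int, j: int) -> bool:
    """True if columns i and j can pair in every aligned sequence AND
    both columns vary (compensatory changes)."""
    column_pairs = [(seq[i], seq[j]) for seq in alignment]
    all_pair = all(p in PAIRS for p in column_pairs)
    both_vary = (len({a for a, _ in column_pairs}) > 1
                 and len({b for _, b in column_pairs}) > 1)
    return all_pair and both_vary

print(covariation_support(["GAAAC", "CAAAG", "AAAAU"], 0, 4))  # True
print(covariation_support(["GAAAC", "GAAAC"], 0, 4))           # False: conserved
```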
Amino acids/proteins
- All 64 codons are used in protein synthesis:
  - 61 codons specify the 20 amino acids
  - 3 are stop codons
  - AUG (methionine) is the start codon (also used internally)
- The code is non-overlapping and punctuation-free
- The code is degenerate (but NOT ambiguous): most amino acids are specified by more than one codon, but each codon specifies only one amino acid
- The code is universal (virtually all organisms use the same code)
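Codon table (DNA letters; rows = base 1, columns = base 2; N = any base)
Base 1 | Base 2 = T | Base 2 = C | Base 2 = A | Base 2 = G
T | TTT, TTC Phenylalanine F; TTA, TTG Leucine L | TCN Serine S | TAT, TAC Tyrosine Y; TAA, TAG STOP | TGT, TGC Cysteine C; TGA STOP; TGG Tryptophan W
C | CTN Leucine L | CCN Proline P | CAT, CAC Histidine H; CAA, CAG Glutamine Q | CGN Arginine R
A | ATT, ATC, ATA Isoleucine I; ATG Methionine M | ACN Threonine T | AAT, AAC Asparagine N; AAA, AAG Lysine K | AGT, AGC Serine S; AGA, AGG Arginine R
G | GTN Valine V | GCN Alanine A | GAT, GAC Aspartate D; GAA, GAG Glutamate E | GGN Glycine G

In-class exercise
1. Which amino acids are specified by single codons? Methionine and tryptophan.
2. How many amino acids are specified by the first two nucleotides alone (all four third-position bases give the same amino acid, and no other codons do)? Five: proline, threonine, valine, alanine, glycine.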
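Both exercise answers can be checked mechanically against the standard code. The sketch packs the table above into a 64-character string, using the standard genetic code with codons ordered T, C, A, G at each position and "*" for stop. It then groups codons by amino acid and applies the two criteria.

```python
from collections import defaultdict

BASES = "TCAG"  # DNA letters, matching the table above
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Group codons by the amino acid they encode; "*" collects the stops
codons_for = defaultdict(list)
for codon, aa in CODE.items():
    codons_for[aa].append(codon)
stops = codons_for.pop("*")

single_codon = sorted(aa for aa, cs in codons_for.items() if len(cs) == 1)
# Exactly four codons sharing the first two bases: base 3 is irrelevant
first_two_only = sorted(
    aa for aa, cs in codons_for.items()
    if len(cs) == 4 and len({c[:2] for c in cs}) == 1
)

print(len(CODE), len(codons_for), sorted(stops))  # 64 20 ['TAA', 'TAG', 'TGA']
print(single_codon)    # ['M', 'W']
print(first_two_only)  # ['A', 'G', 'P', 'T', 'V']
```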
Amino acids
[Figure: hydrophobic amino acids]
[Figure: hydrophilic amino acids]
Primary structure
Proteins are chains of amino acids joined by peptide bonds (a polypeptide chain).
The N-Cα-C sequence is repeated throughout the protein, forming the backbone. The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds (φ and ψ) determine the conformation of the protein backbone. The R side chains also play an important structural role.
[Figure: α helix and β sheet]
Reading frames
Reading frame (also open reading frame): the stretch of triplet sequence of DNA that potentially encodes a protein. The reading frame is designated by the initiation (start) codon and is terminated by a stop codon.
- a reading frame is not always easily recognizable
- each strand of RNA/DNA has three possible starting points (position 1, 2 or 3):
Position 1: CAG AUG AGG UCA GGC AUA → gln met arg ser gly ile
Position 2: C AGA UGA GGU CAG GCA UA → arg STOP gly gln ala
Position 3: CA GAU GAG GUC AGG CAU A → asp glu val arg his
- mutations within an open reading frame that delete or add nucleotides can disrupt the reading frame (frameshift mutation):
Wildtype: CAG AUG AGG UCA GGC AUA GAG → gln met arg ser gly ile glu
Mutant: CAG AUG AGU CAG GCA UAG AG → gln met ser gln ala STOP
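The three-frame translation and the frameshift example above can be reproduced in a few lines. The sketch writes stop codons as "*" instead of halting, so frame effects stay visible. It uses the standard genetic code (in which UGA is a stop codon) and one-letter amino acid codes. `translate_frame` is a hypothetical name.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate_frame(rna: str, frame: int) -> str:
    """Translate every complete codon starting at offset `frame` (0, 1 or 2).
    Stop codons are written as '*' instead of ending translation."""
    return "".join(CODON_TABLE[rna[p:p + 3]] for p in range(frame, len(rna) - 2, 3))

rna = "CAGAUGAGGUCAGGCAUA"
for frame in range(3):
    print(f"Position {frame + 1}:", translate_frame(rna, frame))
# Position 1: QMRSGI
# Position 2: R*GQA
# Position 3: DEVRH

wildtype = "CAGAUGAGGUCAGGCAUAGAG"
mutant = wildtype[:8] + wildtype[9:]  # delete the G at index 8 (AGG -> AG)
print(translate_frame(wildtype, 0))   # QMRSGIE
print(translate_frame(mutant, 0))     # QMSQA*  (premature stop)
```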
Up to 30% of mutations causing human disease are due to premature termination of translation (nonsense or frameshift mutations).
Mutations
Mutation: any heritable change in DNA
Sources of mutation:
- Spontaneous mutations: mutations occur for unknown reasons
- Induced mutations: exposure to a substance or agent (mutagen) known to cause mutations, e.g. X-rays, UV light, free radicals
Mutations may involve one or several base pairs:
a) Nucleotide substitutions (point mutations; Pu = purine, Py = pyrimidine)
1) Transitions (Pu ↔ Pu; Py ↔ Py)
2) Transversions (Pu ↔ Py)
In-class exercise: How many transition and transversion events are possible?
- 2 transitions: T ↔ C; A ↔ G
- 4 transversions: T ↔ A; T ↔ G; C ↔ A; C ↔ G
b) Insertions or deletions (indels)
- one to many bases can be involved
- frequently associated with repeated sequences (hot spots)
- lead to a frameshift in protein-coding genes, except when the number of bases N = 3X (a multiple of 3)
- also caused by insertion of transposable elements into genes
Weighting of mutation events plays an important role in phylogenetic analyses (model of sequence evolution).
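The transition/transversion distinction, and one simple way of weighting it, can be sketched as follows. The function names and the `tv_weight` of 2 are illustrative assumptions, not values from the slide. Real models of sequence evolution estimate such parameters from data. The last lines reproduce the in-class exercise answer.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(a: str, b: str) -> str:
    """Classify a single-base DNA substitution a -> b."""
    if a == b:
        raise ValueError("identical bases: not a substitution")
    pair = {a, b}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"    # Pu <-> Pu or Py <-> Py
    return "transversion"      # Pu <-> Py

def ts_tv_counts(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a != b:
            if substitution_type(a, b) == "transition":
                ts += 1
            else:
                tv += 1
    return ts, tv

def weighted_differences(seq1: str, seq2: str, tv_weight: float = 2.0) -> float:
    """Hypothetical weighting: transitions cost 1, transversions `tv_weight`."""
    ts, tv = ts_tv_counts(seq1, seq2)
    return ts + tv_weight * tv

# In-class exercise: classify every unordered pair of distinct bases
kinds = [substitution_type(a, b) for a, b in combinations("ACGT", 2)]
print(kinds.count("transition"), kinds.count("transversion"))  # 2 4
```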
Mutations
Mutations may influence phenotype:
a) Silent (or synonymous) substitution
- nucleotide substitution without amino acid change
- no effect on phenotype
- mostly at the third codon position
- other possible silent substitutions: changes in non-coding DNA
b) Replacement substitution
- causes an amino acid change
- neutral: protein still functions normally
- missense: protein loses some functions (e.g. sickle cell anemia: mutation in β-globin)
c) Sense/nonsense substitution
- sense: involves a change from a termination codon to one that codes for an amino acid
- nonsense: creates a premature termination codon
Mutation rate = a measure of the frequency of a given mutation per generation
- mutation rates are usually given for specific loci (e.g. sickle cell anemia)
- the rate of nucleotide substitutions in humans is on the order of 1 per 100,000,000
- the range varies from 1 in 10,000 to 1 in 10,000,000,000
- every human has about 30 new mutations involving nucleotide substitutions
- the mutation rate is about twice as high in male as in female meiosis
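One way to see where a figure of about 30 new substitutions could come from is the calculation below. It rests on two assumptions. First, the "1 per 100,000,000" rate is read as per base pair per generation. Second, that rate is applied to a haploid genome of 3.2 × 10^9 bp, the human figure from the DNA structure slide.

```latex
\text{new substitutions} \approx \mu \times G
  = 10^{-8}\ \tfrac{1}{\text{bp}\cdot\text{generation}} \times 3.2\times10^{9}\ \text{bp}
  \approx 32\ \text{per generation}
```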
Secondary structure
Other secondary structure elements (no standardized classification):
- random coil
- loop
- In addition to secondary structure elements that apply to all proteins (e.g. helix, sheet), there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function
Secondary structure
[Figure: SCOP class 3 (alpha/beta) proteins; alternative SCOP view]
Q: If we have all the psi and phi angles in a protein, do we then have enough information to describe the 3-D structure?
A: No, because the detailed packing of the amino acid side chains is not revealed by this information. However, the psi and phi angles do determine the entire backbone conformation, and hence the secondary structure, of a protein.
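To illustrate how backbone angles relate to secondary structure, the sketch labels a residue by whichever idealized (phi, psi) region it falls near. The reference angles and the 40° tolerance are rough assumptions. They are about (-57°, -47°) for an α helix and about (-120°, +120°) for a β sheet. This is a crude per-residue rule; dedicated assignment tools such as DSSP rely on hydrogen-bond patterns instead.

```python
import math

# Approximate ideal backbone angles in degrees (assumed reference values;
# real helices and sheets scatter around these)
IDEAL = {"helix": (-57.0, -47.0), "sheet": (-120.0, 120.0)}

def angle_diff(a: float, b: float) -> float:
    """Smallest difference between two angles on a circle, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def classify(phi: float, psi: float, tolerance: float = 40.0) -> str:
    """Return 'helix', 'sheet' or 'other' for one residue's (phi, psi)."""
    best, best_dist = "other", tolerance
    for label, (phi0, psi0) in IDEAL.items():
        dist = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if dist <= best_dist:
            best, best_dist = label, dist
    return best

print([classify(phi, psi) for phi, psi in [(-60, -45), (-120, 130), (60, 60)]])
# ['helix', 'sheet', 'other']
```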
Tertiary structure
The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
The tertiary structure is determined by a combination of different types of bonding between the side chains (covalent bonds, ionic bonds, hydrogen bonds, hydrophobic interactions, van der Waals forces).
Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability.
If a protein consists of only one polypeptide chain, this level then describes the complete structure
Tertiary structure
Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have an elongated structure, with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
Quaternary structure
The quaternary structure defines the conformation assumed by a multimeric protein. The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, hydrogen-bond and hydrophobic interactions. Example: haemoglobin (4 subunits).
Structure displays
[Figure: the same protein shown in cartoon, spacefill and backbone representations]
Secondary structure: interactions (hydrogen bonds) that occur between the backbone C=O and N-H groups of amino acids
Tertiary structure: organization in three dimensions of all the atoms in the polypeptide
Quaternary structure: conformation assumed by a multimeric protein (arrangement of its subunits)
The four levels of protein structure are hierarchical: each level of the build process is dependent upon the one below it
Next week