
1.

PROTEIN STRUCTURE
Note: this document represents a more detailed discussion of information given in the syllabus.

I. CHARACTERISTICS OF PROTEINS

In conjunction with their diversity in function, proteins have many structures. Nevertheless, several general features are shared.

A. Composition and Size

Proteins are complex macromolecules of high molecular weight, ranging from 5,000 to several million Daltons. They are composed of carbon, hydrogen, nitrogen, oxygen, and usually sulfur; some contain small amounts of other elements. Proteins may be simple or may be conjugated with other non-protein substances (for example, with lipids to form lipoproteins).

B. The Amino Acids

Proteins are polymers; the monomeric units are amino acids. Most proteins are formed from the same set of 20 amino acids. A few proteins contain non-standard amino acids that are derived from the standard set. Proteins contain from about 50 to several thousand amino acid residues. The characteristics and structures of amino acids are important to understanding protein function and many disease processes.

+H3N-CH(R)-COO-

All the amino acids except glycine are chiral and therefore proteins are chiral. The chiral amino acids contain one or more chiral carbons, in which four different substituents are attached to the chiral center. Amino acids are normally present in the L configuration; some amino acids of the D configuration are found in the cell walls of some bacteria and other sources. Note that the D or L refers to configuration in relation to reference compounds such as glyceraldehyde. By another convention, chiral compounds are called R (Latin: rectus, right) or S (Latin: sinister, left) compounds. Again, R and S refer to the spatial configuration. A chiral compound is optically active in that a solution containing the compound can rotate the plane of polarized light.

Fig. 1 The α-amino acid. R refers to the side chain that distinguishes the amino acids.

The carbons are labeled using Greek letters starting at the first or α carbon. Attached to the α carbon is an amino group and an acid group; hence the name α-amino acid. The carbons in the side-chain, R, are labeled serially with Greek letters (β, γ, etc.). Sometimes amino acids are written in the un-ionized form (for example, the NH3+ is shown as NH2, and COO- is shown as COOH).

Fig. 2 The chiral character of amino acids: an S (or L) amino acid and an R (or D) amino acid.
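The R/S convention, whose priority rules are spelled out in the text that follows, can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation of the priority rules: it compares only the atom bonded to the chiral carbon and the atoms one bond further out, and the coordinates for alanine are hypothetical values chosen to place the chiral carbon at the origin with H pointing away from the viewer. The names `Substituent`, `priority_key`, and `assign_r_or_s` are invented for this example.

```python
from typing import List, NamedTuple, Tuple

Vec = Tuple[float, float, float]


class Substituent(NamedTuple):
    label: str
    atomic_number: int     # atom bonded directly to the chiral carbon
    next_atoms: List[int]  # atomic numbers one bond further out;
                           # a doubly bonded atom is listed twice
    position: Vec          # hypothetical coordinates, chiral carbon at origin


def priority_key(s: Substituent):
    # Higher atomic number wins; ties are broken by the next atoms out.
    return (s.atomic_number, sorted(s.next_atoms, reverse=True))


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def assign_r_or_s(substituents: List[Substituent]) -> str:
    if len(substituents) != 4:
        raise ValueError("a chiral carbon needs four substituents")
    ranked = sorted(substituents, key=priority_key, reverse=True)
    a, b, c = (s.position for s in ranked[:3])
    # With the lowest-priority group pointing away from the viewer,
    # a clockwise 1 -> 2 -> 3 path gives a negative triple product.
    handedness = dot(a, cross(b, c))
    if abs(handedness) < 1e-9:
        raise ValueError("substituents are coplanar; no handedness")
    return "R" if handedness < 0 else "S"


# Alanine (R = CH3) with H at the back; coordinates are illustrative only.
L_ALANINE = [
    Substituent("NH2", 7, [1, 1], (0.0, 1.0, 0.33)),
    Substituent("COOH", 6, [8, 8, 8], (-0.866, -0.5, 0.33)),
    Substituent("CH3", 6, [1, 1, 1], (0.866, -0.5, 0.33)),
    Substituent("H", 1, [], (0.0, 0.0, -1.0)),
]

print(assign_r_or_s(L_ALANINE))
```

Reflecting the coordinates through a mirror plane (for example, negating every z value) flips the result, which is the computational counterpart of turning the steering wheel the other way.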

Rules for assigning the value to substituents for a chiral molecule are as follows: Of the four atoms attached to an asymmetric carbon, the atom of higher atomic number is assigned a higher value. If two atoms are the same, comparison of the next atoms attached to each is made. A double (or triple) bond counts as two (or three) copies of the attached atom.

To determine the configuration of the amino acid alanine (R=CH3), for example, values are assigned to the atoms that are attached to the asymmetric carbon: NH2-1, COOH-2, R-3, and H-4. Envision the molecule as a steering wheel with H-4 in back. Go from 1 to 2 to 3. If the steering wheel turns to the right, it is an R amino acid; if it turns to the left, it is an S amino acid.

The key to deciding the polar character of the amino acids is to examine their atomic structure. The parts of amino acids with hydrocarbons (CH, CH2, CH3) are hydrophobic; the parts with oxygens and nitrogens are hydrophilic.

There are different ways to classify the amino acids. Here they are classified by their chemistry. They are sometimes grouped by their partition coefficient between an organic solvent (chloroform) and water, with more hydrophilic amino acids dissolving more readily in water. They can also be classified by their appearance in globular proteins, with more polar amino acids appearing at the surface.

1. Amino acids with mostly nonpolar side-chains

Alanine
-OOC-CH(NH3+)-CH3

Valine
-OOC-CH(NH3+)-CH(CH3)2

Leucine
-OOC-CH(NH3+)-CH2-CH(CH3)2

Isoleucine
-OOC-CH(NH3+)-CH(CH3)-CH2-CH3

Methionine
-OOC-CH(NH3+)-CH2-CH2-S-CH3

Phenylalanine
-OOC-CH(NH3+)-CH2-C6H5 (phenyl ring)

Tryptophan
-OOC-CH(NH3+)-CH2-C8H6N (indole ring, containing an NH)

Tyrosine
-OOC-CH(NH3+)-CH2-C6H4-OH

Proline
-OOC-CH(-CH2-CH2-CH2-)NH2+ (the three CH2 groups bridge the α carbon and the nitrogen)

Note that proline has a five-membered ring.

Note that tryptophan and tyrosine have polar portions of their side-chains and that the nonpolar portions of glycine and alanine are small.

2. Polar with neutral side-chains (at pH 7)

Threonine
-OOC-CH(NH3+)-CH(OH)-CH3

Serine
-OOC-CH(NH3+)-CH2-OH

Asparagine
-OOC-CH(NH3+)-CH2-CO-NH2

Glutamine
-OOC-CH(NH3+)-CH2-CH2-CO-NH2

3. Polar with charged side-chains (at pH 7)

Aspartate (Aspartic acid)
-OOC-CH(NH3+)-CH2-COO-

Glutamate (Glutamic acid)
-OOC-CH(NH3+)-CH2-CH2-COO-

Note the relationship of aspartate to asparagine and that of glutamate to glutamine.

Lysine
-OOC-CH(NH3+)-CH2-CH2-CH2-CH2-NH3+

Arginine
-OOC-CH(NH3+)-CH2-CH2-CH2-NH-C(=NH2+)-NH2

Histidine
-OOC-CH(NH3+)-CH2-C3H4N2+ (imidazole ring, shown protonated)

4. Cysteine
-OOC-CH(NH3+)-CH2-SH

Cysteine behaves as nonpolar when uncharged, and polar when charged (usually above pH 8). Under oxidizing conditions (typically outside the cell), cysteine residues form covalent bonds through disulfide bridges. The disulfide can cross-link different chains or positions within the same chain, and imparts increased stability to protein structures.

Fig. 3 Cysteine residues can form disulfide bonds between polypeptides. Each SS indicates the linkage between the sulfurs of two cysteines.

5. Glycine
-OOC-CH2-NH3+ (side chain: H)

Glycine has a small side-chain that allows it to get into tight spaces.

Reference: Three-letter abbreviations and one-letter symbols of the amino acids


Amino Acid       3 letter   1 letter
Alanine          Ala        A
Asparagine       Asn        N
Aspartate        Asp        D
Arginine         Arg        R
Cysteine         Cys        C
Glutamate        Glu        E
Glutamine        Gln        Q
Glycine          Gly        G
Histidine        His        H
Isoleucine       Ile        I
Leucine          Leu        L
Lysine           Lys        K
Methionine       Met        M
Phenylalanine    Phe        F
Proline          Pro        P
Serine           Ser        S
Threonine        Thr        T
Valine           Val        V
Tryptophan       Trp        W
Tyrosine         Tyr        Y
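For readers who work with sequences on a computer, the reference table maps directly onto a lookup dictionary. The sketch below is only an illustration; the function names `to_one_letter` and `to_three_letter` are made up for this example, and hyphen-separated three-letter input is an assumed convention.

```python
# Three-letter abbreviations and one-letter symbols of the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Asn": "N", "Asp": "D", "Arg": "R", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Val": "V", "Trp": "W", "Tyr": "Y",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(peptide: str) -> str:
    """Convert 'Ala-Gly-His-Leu' style input to one-letter symbols."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in peptide.split("-"))


def to_three_letter(sequence: str) -> str:
    """Convert one-letter symbols to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[symbol] for symbol in sequence.upper())


print(to_one_letter("Ala-Gly-His-Leu"))
print(to_three_letter("AGHL"))
```

An unknown code raises a `KeyError`, which is usually preferable to silently skipping a residue.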

C. The Peptide Bond

The amino acids of a protein are linked by peptide (amide) bonds in a head-to-tail fashion to form one or more chains of amino acids. A chain of amino acids is called a polypeptide. The amino acid sequence, determined by the mRNA sequence, is specific for each protein. Note that the formation of the peptide bond between an amino and a carboxyl group eliminates their charges. Only the most amino-terminal amine and the most carboxyl-terminal carboxyl groups are still present:

+H3N-CH(R1)-COO-  +  +H3N-CH(R2)-COO-  →  +H3N-CH(R1)-CO-NH-CH(R2)-COO-  +  H2O
(the CO-NH linkage is the peptide bond)

Fig. 4 Formation of the peptide bond. The carboxyl group of one amino acid combines with the amino group of another.
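Because each peptide bond forms with the loss of one water molecule, the molecular formula of a peptide is the sum of the formulas of its free amino acids minus one H2O per bond. The following sketch does that bookkeeping for a few amino acids; the helper names `peptide_formula` and `format_formula` are hypothetical, and only four amino acids are included to keep the example short.

```python
from collections import Counter
from typing import List

# Molecular formulas of the free (un-ionized) amino acids.
FREE_AMINO_ACID_FORMULAS = {
    "Gly": Counter(C=2, H=5, N=1, O=2),
    "Ala": Counter(C=3, H=7, N=1, O=2),
    "His": Counter(C=6, H=9, N=3, O=2),
    "Leu": Counter(C=6, H=13, N=1, O=2),
}
WATER = Counter(H=2, O=1)


def peptide_formula(residues: List[str]) -> Counter:
    """Sum the amino acid formulas and remove one water per peptide bond."""
    total = Counter()
    for name in residues:
        total.update(FREE_AMINO_ACID_FORMULAS[name])
    peptide_bonds = max(len(residues) - 1, 0)
    for _ in range(peptide_bonds):
        total.subtract(WATER)
    return total


def format_formula(formula: Counter) -> str:
    """Write the formula with C and H first, then the remaining elements."""
    parts = []
    for element in ("C", "H", "N", "O"):
        count = formula[element]
        if count > 0:
            parts.append(element if count == 1 else f"{element}{count}")
    return "".join(parts)


print(format_formula(peptide_formula(["Ala", "Gly"])))
print(format_formula(peptide_formula(["Ala", "Gly", "His", "Leu"])))
```

A chain of n amino acids has n - 1 peptide bonds, so a tetrapeptide has lost three water molecules relative to its four free amino acids.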

The functional properties existing in a protein are contained mainly in the R-groups or side-chains. The three-dimensional structure or conformation of proteins is also determined by the properties of the amino acid residues. Each protein has a unique amino acid sequence and a distinct three-dimensional structure.

Some chains of amino acids will form a protein domain that is globular. These proteins, which include enzymes and transport proteins, are usually water-soluble. Non-polar amino acids predominate on the inside of globular water-soluble proteins; polar ones predominate on the outside. Other chains of amino acids can form aggregates and are relatively insoluble in many aqueous solutions. These fibrous proteins are often elongated and include proteins such as collagen, which is abundant in skin.

The polarity of the side-chain of an amino acid can dictate its role within a protein. A water-soluble protein has mostly amino acids with hydrophilic (water-loving) side-chains on the surface interacting with water. The interior of such proteins is hydrophobic (water-fearing), with the non-polar side-chains interacting with each other. On the other hand, the part of a protein that penetrates a membrane is composed mostly of amino acids with hydrophobic side-chains (membranes are hydrophobic). Likewise, regions of proteins found associated with negatively charged molecules (such as DNA) have surface amino acids that are positively charged.

Note that the atoms N, O, and S can accept hydrogen bonds, whereas hydrogens attached to these atoms (NH, OH, SH) can donate hydrogen bonds. Proteins are rich in hydrogen bonds; both the side-chains of polar amino acids and the peptide bond are involved. Hydrogen bonds form when a hydrogen atom is shared between two electronegative atoms. The strength of a hydrogen bond is distance dependent, and maximum strength occurs at short distances. The strongest hydrogen bonds contain atoms that are collinear (in the same line). The local environment may also influence the strength of a hydrogen bond; nonpolar regions favor hydrogen bond formation.

X(δ-)-H(δ+) ... X (where X is nitrogen, oxygen, or sulfur)

Fig. 5 Model of insulin, showing the A-chain and the B-chain. The hydrogen bonds are shown as dotted lines.

D. Proteins sometimes contain several domains.

The role of proteins in disease is often detected through genetic analysis. For example, consider the tumor suppressor protein p53, which plays a critical role in protecting cells against cancer. This protein and its gene have been studied extensively because the loss or malfunction of p53 contributes to the development of about half of all cancers, including common ones such as skin, breast, and colon cancer. DNA sequencing has shown frequent missense mutations that cluster near the center of the gene encoding the protein (Fig. 6).

p53 gene (5' to 3'): gene activation | sequence-specific DNA binding | p53-p53 interaction
(transcription, translation)
p53 protein (NH2 to COOH): gene activation | sequence-specific DNA binding (mutational hot spots, x) | p53-p53 interaction

Figure 6 The p53 gene and protein. Mutational hot spots, which result in changed amino acids, are shown as 'x' on the protein sequence. They cluster in the central DNA-binding region of the protein. The protein has regions with other biochemical functions: gene activation and self-association.

p53, a protein of about 400 amino acid residues, can be isolated and analyzed biochemically. The three functional domains are organized into three distinct and independent folding units of 50 to 200 amino acids each (Fig. 7). Each domain behaves much like a separate protein, and a separate biochemical function is associated with each domain. One domain binds DNA in a sequence-specific manner, another activates certain genes, and another associates with itself. The DNA-binding domain contacts specific base sequences on DNA. The transcriptional activation domain signals assembly of the basal transcription factors for the transcription of genes located near the p53 DNA-binding sites. The oligomerization domain enhances the strength and specificity of DNA binding because four binding domains are available instead of just one.

Transcriptional activation (4x) | DNA-binding (4x) | Oligomerization

Figure 7 The folded units (domains) of the p53 protein.

Small proteins, containing fewer than 200 amino acids, usually have only one domain and one function. Protein domains are the fundamental functional and three-dimensional structural units of a protein. The amino acid chains of each domain are folded into specific three-dimensional shapes. The domains are connected by unstructured stretches of amino acids. As is typical of multi-domain proteins, each domain is encoded by groups of one or more exons.

Various segments or functional domains of the p53 protein have been studied individually. The central domain is responsible for forming a complex with a specific sequence of DNA. The N-terminal region is responsible for transactivation and coordinates with basal transcription factors, whereas the C-terminus is responsible for p53-p53 interactions. The spatial arrangement of the atoms in the domains, the three-dimensional structure, has been determined by X-ray crystallographic methods or by nuclear magnetic resonance (NMR) methods.

E. The amino acids of a protein form a specific structure with a specific function.

A protein is very specific in its function. The function and specificity of a protein are dependent on the amino acid sequence and the three-dimensional structure. Consider, for example, DNA recognition. Different DNA sequences present different patterns of hydrogen bond acceptors (oxygens, nitrogens), hydrogen bond donors (NH groups), and hydrophobic contacts (CH3 groups). For example, the part of the GC base pair that is accessible through the major groove of DNA shows two hydrogen bond acceptors from the guanine (Figure 8). In an AT base pair, the pattern is a hydrogen bond donor from the adenine, a hydrogen bond acceptor from the thymine, and a methyl group from the thymine.

An essential feature of most DNA-binding proteins, such as p53, is a segment of protruding protein that fits into the major groove of DNA. Analogous to the hydrogen bonding that occurs between base-pairs of DNA, amino acid side-chains of the protein segment can form hydrogen bonds to the edge of the base-pairs. The amino acids of a specific DNA-binding protein are positioned to interact most strongly with one particular DNA sequence. Usually between ten and twenty hydrogen bonds form between a DNA-binding protein and DNA. The correct positioning for the DNA-reading segment is stabilized by electrostatic interactions (positive charges attract negative ones) between a neighboring segment of the protein and the sugar-phosphate backbone of the DNA. Figure 8 shows how the specificity of function derives from the specific three-dimensional structure of the protein in the case of p53.

Motion, or dynamics, is an important aspect of protein structure. For example, p53 may bind to slightly different DNA sites with similar affinity, but will adopt slightly different structures. In the absence of the target, proteins typically are more mobile and show a range of structures that include the target-bound conformation.

Fig. 8 Contacts to DNA made by p53. Left: A segment of the protein reaches into the major groove of the DNA. This segment is stabilized in part by binding to a zinc ion (shown as a sphere). A part of the protein-DNA interface is circled and expanded on the right: here one can clearly see the hydrogen bonds made by arginine 280 to one of the base-pairs. Note that hydrogen bonds by the neighboring residue, aspartate 281, position the side-chain in the correct orientation for reading the DNA base-pair, and that aspartate 281 is in turn held in place by an arginine residue (residue 273). On the lower right of the figure, the positive charge of the arginine side-chain can be seen to make a favorable electrostatic interaction with the phosphate of the DNA backbone. For clarity, only some of the side-chains are shown.

III. CLASSIFICATION OF PROTEIN STRUCTURE

The structure of proteins can be considered at four levels.

A. Primary structure. The sequence of amino acids in a polypeptide (or chain, or protein).
B. Secondary structure. The steric relationships involving hydrogen bonds and adjacent amino acids in a polypeptide. These conformations of the polypeptide may be coiled or extended. Their structure is frequently repeating (α-helix, β-pleated sheet, collagen helix).
C. Tertiary structure. The steric relationship of non-adjacent amino acids in a polypeptide. These conformations may involve the folding of globular proteins into compact structures utilizing interactions of charged amino acid groups as well as van der Waals interactions.
D. Quaternary structure. The structure formed by the interaction and relationship of separate identical or non-identical polypeptides.

A. Primary Structure

The primary structure of proteins refers to the sequence of amino acid residues. By convention, they are named from the N-terminal to the C-terminal end. For example, this tetrapeptide is named alanyl-glycyl-histidyl-leucine (Ala-Gly-His-Leu):
+H3N-CH(CH3)-CO-NH-CH2-CO-NH-CH(CH2-imidazole)-CO-NH-CH(CH2-CH(CH3)2)-COO-

Fig. 9 Structure of a tetrapeptide, alanyl-glycyl-histidyl-leucine.
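The naming convention can be expressed as a small routine: every residue except the last takes its "-yl" form, and the C-terminal residue keeps its full name. This is a sketch for illustration; the function name `peptide_name` is invented here, and "tryptophyl" is used as the acyl form of tryptophan.

```python
from typing import List

# three-letter code -> (acyl "-yl" form, full name)
RESIDUE_NAMES = {
    "Ala": ("alanyl", "alanine"),
    "Arg": ("arginyl", "arginine"),
    "Asn": ("asparaginyl", "asparagine"),
    "Asp": ("aspartyl", "aspartate"),
    "Cys": ("cysteinyl", "cysteine"),
    "Gln": ("glutaminyl", "glutamine"),
    "Glu": ("glutamyl", "glutamate"),
    "Gly": ("glycyl", "glycine"),
    "His": ("histidyl", "histidine"),
    "Ile": ("isoleucyl", "isoleucine"),
    "Leu": ("leucyl", "leucine"),
    "Lys": ("lysyl", "lysine"),
    "Met": ("methionyl", "methionine"),
    "Phe": ("phenylalanyl", "phenylalanine"),
    "Pro": ("prolyl", "proline"),
    "Ser": ("seryl", "serine"),
    "Thr": ("threonyl", "threonine"),
    "Trp": ("tryptophyl", "tryptophan"),
    "Tyr": ("tyrosyl", "tyrosine"),
    "Val": ("valyl", "valine"),
}


def peptide_name(residues: List[str]) -> str:
    """Name a peptide from its N-terminal to its C-terminal residue."""
    if not residues:
        raise ValueError("a peptide needs at least one residue")
    *leading, c_terminal = residues
    parts = [RESIDUE_NAMES[code][0] for code in leading]
    parts.append(RESIDUE_NAMES[c_terminal][1])
    return "-".join(parts)


print(peptide_name(["Ala", "Gly", "His", "Leu"]))
```

Reversing the input list names a different molecule, which is why the N-to-C direction matters: Leu-His-Gly-Ala is not the same peptide as Ala-Gly-His-Leu.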

Simple peptides (dipeptides, tripeptides, tetrapeptides, etc.) are short amino acid chains linked by peptide bonds. Peptides can be produced by partial hydrolysis of the peptide chains of proteins. Alternatively, some are synthesized as such in the body. Except for peptide bonds, the peptides have essentially free rotation about their bonds. Peptide bonds undergo resonance and are planar:

R-C(=O)-NH-R  <->  R-C(-O-)=NH(+)-R

Fig. 10 Resonance of the peptide bond keeps the atoms coplanar. The C=O and the C-N bonds have both single and double bond characteristics. The partial double bond character in the C-N bond prevents rotation about this bond and restricts the amide group to a planar and almost always trans configuration. Thus, only other bonds permit rotation in a polypeptide.

Although peptides usually have little defined global conformation, some, such as peptide hormones, are physiologically active. Larger proteins fold to maximize weak, non-covalent forces such as hydrogen bonds, hydrophobic interactions, electrostatic bonds, and van der Waals forces.

Hydrogen bonds. Hydrogen bonds form when a hydrogen atom is shared between two electronegative atoms. The local environment influences the strength of a hydrogen bond; those in nonpolar regions are stronger than those at the surface of a protein.

Hydrophobic interactions. Each amino acid residue in a polypeptide, especially the nonpolar ones, induces the formation of a solvation shell by water, and thus imparts some order to the water. When two nonpolar groups of an unstructured polypeptide come together, the surface area exposed to water is reduced, and therefore the ordering of the water is reduced. This is called the hydrophobic effect. In thermodynamic terms, this is an increase in the entropy of the water, which leads to a favorable, stable system. The interaction of (mostly) nonpolar residues leads to a small loss of entropy; however, the entropy gained from disordering of the water solvation shell is relatively large. Thus, the gain in protein stability from hydrophobic forces is not from the intrinsic attraction of hydrophobic residues, but mostly from a loss of ordering of the water (solvent) shell. The net result is the sequestering of nonpolar residues of the protein from the aqueous environment.

Electrostatic bonds. Electrostatic bonds are interactions between atoms of opposite charges. The strongest electrostatic bonds occur in regions of low polarity, such as the interior of the protein. They are also known as ionic bonds or salt bridges.

Van der Waals forces. Two atoms are optimally separated by the sum of their van der Waals radii. Atoms have fluctuating electrical charges and therefore local regions of partial positive and partial negative charge. A weak bonding interaction exists when two atoms approach, due to the attraction between the transient partial charges. This interaction is strongly dependent on distance; at the closest distances there is repulsion between electron clouds. At long distances, the van der Waals force becomes negligible.

Note that these different forces are related to each other. For example, the hydrogen bond is really the sum of a van der Waals force and the electrostatic force between the electronegative atom (usually N or O) and the partially positive hydrogen. The hydrophobic effect is the most poorly understood force, but it is clear that the solvation shell for a protein involves hydrogen bond forces between water molecules and between water and the protein.

B. Secondary Structure
Panels: cis-amide in uracil; trans-amide in peptide; planar amide group (labels: peptide bond, rotate here).

Fig. 11 The configuration of the peptide bond and conformational mobility in peptides.

The secondary structure refers to a local structural element that stretches of amino acids can fold into. There are usually 10 to 40 amino acids in a secondary element. The chemistry of the amino acids, such as planarity of the peptide bond, suggests that only a few conformations can be adopted by a polypeptide. The distinct conformations are governed by a number of different atomic forces, including electrostatic interactions and hydrogen bonds. The different secondary elements are characterized by differences in hydrogen bonding pattern. Proposing that a polypeptide will fold to maximize the number of hydrogen bonds, Linus Pauling predicted that polypeptides will contain the α-helix and the β-sheet (also known as the pleated sheet). Three-dimensional structural determination of protein structures has experimentally verified these predictions.


Fig. 12 Model of the α-helix. Left: Note that the oxygen of the carbonyl group is hydrogen-bonded (shown as a dashed line) to the HN of the amino acid four residues down the chain. Below: The amino acid side-chains stick out radially from the core. Residues labeled in the lower view: ARG+, THR, GLU, ASP-, GLU-, ARG+, GLY, GLU-, PRO, ARG+.

1. The α-helix

In α-helical structures the amino acid residues are arranged in a spiral with ~3.6 amino acids per turn. The helix rises 1.5 Å per residue and has 5.4 Å between each turn. Note that the atoms in hydrogen bonds are collinear and the hydrogen bonds are strong. The α-helix is right-handed with respect to the direction of the spiral. Note that the side-chains of amino acids adjacent in sequence are spatially proximate. Therefore, if two or more amino acids adjacent in sequence have ionized side-chains of like charge, for example, the α-helix is not energetically favorable. This sequence of amino acids would adopt another secondary structure. The five-membered ring of proline also cannot adopt the bond conformations needed to fit into the middle of a helix; it is known as a helix breaker.
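The helix dimensions and the two helix-disfavoring features described above lend themselves to a short calculation. The sketch below derives the residues per turn from the rise and the pitch, lists the backbone hydrogen bonds between residue i and residue i+4, and flags adjacent like charges and internal prolines. It is a rough heuristic for illustration, not a secondary-structure predictor; the function names are hypothetical, and treating Lys, Arg, and His as positive follows the classification at pH 7 used in these notes.

```python
from typing import List, Tuple

RISE_PER_RESIDUE = 1.5   # angstroms along the helix axis per residue
PITCH = 5.4              # angstroms between successive turns
RESIDUES_PER_TURN = PITCH / RISE_PER_RESIDUE     # about 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # about 100

POSITIVE = set("KRH")  # Lys, Arg, His
NEGATIVE = set("DE")   # Asp, Glu


def helix_length(n_residues: int) -> float:
    """Length in angstroms of an ideal helix of n residues."""
    return n_residues * RISE_PER_RESIDUE


def backbone_hbonds(n_residues: int) -> List[Tuple[int, int]]:
    """Pairs (i, i + 4): C=O of residue i bonded to N-H of residue i + 4."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]


def helix_disfavoring_features(sequence: str) -> List[Tuple[str, int]]:
    """Flag features by 1-based position in a one-letter sequence."""
    sequence = sequence.upper()
    findings = []
    for i, residue in enumerate(sequence):
        if residue == "P" and 0 < i < len(sequence) - 1:
            findings.append(("internal proline", i + 1))
        if i + 1 < len(sequence):
            following = sequence[i + 1]
            same_positive = residue in POSITIVE and following in POSITIVE
            same_negative = residue in NEGATIVE and following in NEGATIVE
            if same_positive or same_negative:
                findings.append(("adjacent like charges", i + 1))
    return findings


print(round(RESIDUES_PER_TURN, 2), round(DEGREES_PER_RESIDUE, 1))
print(helix_length(10))
print(backbone_hbonds(10))
print(helix_disfavoring_features("AEEAPAKKA"))
```

Dividing the pitch by the rise is also a quick consistency check on the quoted dimensions: 5.4 Å divided by 1.5 Å gives 3.6 residues per turn.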

In close-up views of protein structure where all individual bonds between the atoms are shown, the path of the polypeptide is indicated by a trace through the backbone atoms (the nitrogen, the α carbon, and the carbonyl carbon). The helix, for example, is shown as a ribbon in the shape of a cylinder (Fig. 12). In overall views of protein structure, the helix can also be shown by a cylinder itself.

2. The β-sheet

In the β conformation or pleated sheet, the polypeptides are much more extended than in the α-helix. Hydrogen bonding is between stretches of amino acid residues rather than within a continuous stretch of amino acid residues.

Fig. 13 Model of an (antiparallel) β-sheet.


The chains may be parallel (i.e., with chains running from N-terminal to C-terminal in one direction) or antiparallel (with chains running in opposite directions). Again, in close-up views of protein structure as in Fig. 13, the bonds between the atoms are shown. In overviews of the protein, the individual strands of the sheet are shown by broad arrows, with the arrowhead on the C-terminal amino acid of the strand (pointing away from the N-terminal amino acid). A typical protein will contain these elements of secondary structure connected by a series of bends (known as hairpin bends or β-turns), non-regular loops, and disordered peptide regions.
Some groups of secondary elements, called super-secondary elements or protein structural motifs, occur in many proteins. For example, the core of a protein may contain two sheets that constitute a sandwich.

C. Tertiary Structure

The tertiary structure is the description of the conformational fold of a protein, or how the secondary elements of a protein interact with one another. It is best described by showing a picture (Fig. 14). Many water-soluble proteins approximate a sphere. Not only is a protein chiral at the amino acid level, but it also adopts stereospecificity globally in its three-dimensional shape. Many drugs that bind to proteins are also chiral compounds. The synthesis of these drugs often results in a mixture of the R and S forms (a racemic mixture), with one form being active, and the other form either inactive or dangerous.
Fig. 14 The DNA-binding domain of p53 bound to DNA. The major part of the domain comprises two sheets that stack face to face across an extended hydrophobic core. The Zn2+ ion is below the left end of the helix at the top of the protein. The zinc is liganded by cysteine 176, histidine 179, cysteine 238, and cysteine 242 and thus holds together parts of the protein chain that are distant (about 60 residues) in sequence. Note the complementary fit between the protrusion of the protein and the major groove of the DNA.

D. Quaternary Structure

The quaternary structure is the interaction and relationship of separate (identical or non-identical) polypeptides. Only proteins that have more than one polypeptide chain have quaternary structure. p53, for example, exists as a tetramer in nature, i.e., has four polypeptides. The four DNA-binding domains are connected to the oligomerization domain by flexible, unstructured loops. Although Figures 8 and 14 show only one subunit binding to a pentamer DNA sequence, full-length p53 binds as a tetramer to four closely spaced DNA sequences of 5 base-pairs each (i.e., 20 base-pairs are contacted in total).

IV. THE EFFECT OF MUTATIONS ON PROTEIN STRUCTURE

The changing of one amino acid to another often has little consequence for the function of a protein when the substitution conserves the nature of the amino acid. When a polar amino acid is substituted by a non-polar amino acid, however, or vice versa, or the charge of the amino acid changes, there can be a dramatic effect on function.

Mutations in the p53 DNA-binding domain have been studied in detail. One class of mutations involves residues that contact the DNA. Failure to bind DNA by these mutants can be attributed to a loss of a critical DNA interaction. These mutants have little change in three-dimensional (secondary, tertiary) structure. Another class of mutations involves residues that are important for the correct fold of the DNA-binding domain. Loss of DNA binding by these mutants can be attributed to structural defects. Certain mutant p53 proteins have been shown to be stabilized with small molecules (i.e., potential new drugs).
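The distinction between substitutions that conserve the nature of an amino acid and those that change its polarity or charge can be sketched with the side-chain classes used earlier in these notes. The example below is a first-pass heuristic only: a substitution within a class can still be damaging if the residue makes a critical contact, as the DNA-contact mutants of p53 illustrate. The name `classify_substitution` and the example substitutions are hypothetical.

```python
# Side-chain classes at pH 7, following the grouping used in these notes.
SIDE_CHAIN_CLASS = {
    "A": "nonpolar", "V": "nonpolar", "L": "nonpolar", "I": "nonpolar",
    "M": "nonpolar", "F": "nonpolar", "W": "nonpolar", "Y": "nonpolar",
    "P": "nonpolar",
    "T": "polar neutral", "S": "polar neutral",
    "N": "polar neutral", "Q": "polar neutral",
    "D": "negative", "E": "negative",
    "K": "positive", "R": "positive", "H": "positive",
    "C": "cysteine",
    "G": "glycine",
}
CHARGE = {"negative": -1, "positive": +1}


def classify_substitution(original: str, replacement: str) -> str:
    """Rough label for replacing one residue (one-letter symbol) by another."""
    original, replacement = original.upper(), replacement.upper()
    if original == replacement:
        return "identical"
    class_before = SIDE_CHAIN_CLASS[original]
    class_after = SIDE_CHAIN_CLASS[replacement]
    if class_before == class_after:
        return "conservative"
    if CHARGE.get(class_before, 0) != CHARGE.get(class_after, 0):
        return "charge change"
    return "class change"


for before, after in [("D", "E"), ("L", "I"), ("L", "D"), ("K", "E"), ("S", "L")]:
    print(before, "->", after, ":", classify_substitution(before, after))
```

A "charge change" or "class change" label marks a substitution worth examining more closely; it does not by itself predict loss of function.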
