
Protein Structure

Introduction:
One of the major goals of bioinformatics is to understand the relationship
between amino acid sequence and three-dimensional structure in proteins. If
this relationship were known, then the structure of a protein could be reliably
predicted from the amino acid sequence.
Proteins are chains of amino acids joined by peptide bonds. The common
amino acids share a general structure containing a central alpha (α) carbon atom (Cα) to which a carboxylic acid group, an amino group and a
hydrogen atom are covalently bonded. In addition, the α-carbon atom binds
a side chain group, designated R, that uniquely defines each of the 20
common amino acids. The R side chains also play an important structural role.
Table 1: Properties of Amino acids
Name            3-letter  Single-letter  Relative abundance  MW    pK     VdW volume  Charged (C), Polar (P),
(Residue)       code      code           (%) E.C.                         (Å³)        Apolar (H)
Alanine         ALA       A              13.0                71    -      67          H
Arginine        ARG       R              5.3                 157   12.5   148         C+
Asparagine      ASN       N              9.9                 114   -      96          P
Aspartate       ASP       D              9.9                 114   3.9    91          C-
Cysteine        CYS       C              1.8                 103   -      86          P
Glutamate       GLU       E              10.8                128   4.3    109         C-
Glutamine       GLN       Q              10.8                128   -      114         P
Glycine         GLY       G              7.8                 57    -      48          -
Histidine       HIS       H              0.7                 137   6.0    118         P,C+
Isoleucine      ILE       I              4.4                 113   -      124         H
Leucine         LEU       L              7.8                 113   -      124         H
Lysine          LYS       K              7.0                 129   10.5   135         C+
Methionine      MET       M              3.8                 131   -      124         H
Phenylalanine   PHE       F              3.3                 147   -      135         H
Proline         PRO       P              4.6                 97    -      90          H
Serine          SER       S              6.0                 87    -      73          P
Threonine       THR       T              4.6                 101   -      93          P
Tryptophan      TRP       W              1.0                 186   -      163         P
Tyrosine        TYR       Y              2.2                 163   10.1   141         P
Valine          VAL       V              6.0                 99    -      105         H

Fig 1: Basic structure of different amino acids


CORN is an acronym for -COOH (the
main chain carboxylic acid group), the -R
group (representing the side chain) and
-NH2 (the nitrogen of the main chain
amine group). If the acronym CORN is
read with a clockwise eye movement,
then the molecule is the L-form. The L- comes from the fact that the amino group is
on the left side in the Fischer projection of
the compound. (Strictly speaking, the L- and D- notation has been superseded by the
more modern R- and S- notation for
referring to stereoisomers.) Glycine is the only one of
the twenty amino acids that has no L-form: with R = H, its α carbon is not chiral.

Many conformations of the chain are possible due to rotation of the chain
about each Cα atom. It is these conformational variations that are responsible
for differences in the three-dimensional structures of proteins. Each amino
acid in the chain is polar, i.e., it has separated positively and negatively
charged regions, with a free C=O group, which can act as a
hydrogen bond acceptor, and an NH group, which can act as a hydrogen
bond donor. These groups interact in protein structures.
Fig 2: The structure of two amino acids in a polypeptide chain. Labels: amino acid 1, amino acid 2, the two Cα atoms, and the peptide bond (covalent bond).
Neighbouring amino acids are joined by a peptide bond between the C=O
and NH groups. The N-Cα-C sequence is repeated throughout the protein,
forming the backbone of the three-dimensional structure. The amino acid at
one end of the chain has a free NH2 group (chain beginning) and the amino
acid at the other end has a free COOH group (chain end). The bonds on each
side of the Cα atom are quite free to rotate, but many combinations of angles
are not possible for most amino acids due to spatial constraints from the R
group and neighboring positions in the chain. The conformation of the
protein backbone in space is determined by the angles of these bonds:
φ (phi) for the bond between the N and Cα atoms and ψ (psi) for the bond
between the Cα and the C of the C=O group, also named C'. All possible
conformations of a polypeptide chain can be described in terms of their φ, ψ
conformational angles, a description that automatically takes account of the
fixed geometric features of the polypeptide backbone. The distribution of
these two angles for the amino acids in a particular protein is often plotted
on a graph called a Ramachandran plot. The angle of the peptide bond
joining the C=O and NH groups (not shown) is nearly always 180°.
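The backbone angles described above are dihedral (torsion) angles, each defined by four consecutive backbone atoms. The following is a minimal sketch of how such an angle can be computed from atomic coordinates. It assumes the usual convention that φ is measured over C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1); the function names and the example coordinates are illustrative and are not taken from the text.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) about the p2-p3 bond."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)   # normal to the plane p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """phi of residue i: C(i-1), N(i), CA(i), C(i) (assumed convention)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C(i), N(i+1) (assumed convention)."""
    return dihedral(n, ca, c, n_next)

# Illustrative points only (not real atomic coordinates).
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Applying `phi` and `psi` to every residue of a structure gives the pairs of angles that are plotted on a Ramachandran plot.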
The side chains define the structures of the different amino acids:
1. Amino acids having uncharged side chain R groups at physiological
pH:
a. Category of alkyl amino acids:
Glycine has the simplest structure, with R = H. It does not have a side chain
and can therefore increase local flexibility in structures.
Alanine contains a methyl (CH3-) R group.
Valine has an isopropyl R group.
Leucine and isoleucine have R groups that are butyl alkyl chains and are
structural isomers of each other. In leucine, the branching methyl group in
the isobutyl side chain occurs on the gamma (γ) carbon of the amino acid.
In isoleucine, the butyl side chain is branched at the β carbon.
b. Category of aromatic amino acids:
Phenylalanine has an R group that contains a benzene ring.
Tyrosine contains a phenol group in the side chain.
Tryptophan has an R group that contains a heterocyclic structure known
as an indole.
In the three aromatic amino acids the aromatic moiety is attached to the α carbon through a methylene (-CH2-) carbon.
c. Sulphur-containing common amino acids category:
Cysteine has an R group that is a thiolmethyl (HSCH2-). Cysteine can react
with another cysteine to form a cross-link that can stabilize the protein
structure.
Methionine has a side chain that contains a methyl-ethyl-thiol ether
(CH3SCH2CH2-).
d. Hydroxy (alcohol)-containing common amino acids:
Serine has a side chain that contains a hydroxymethyl moiety
(HOCH2-).
Threonine contains an ethanol structure (CH3CHOH-) as its R group, which is connected to the
α carbon of the amino acid structure.
e. α-Imino-containing amino acid:
Proline has a side chain, R, that is unique in that it incorporates the α-amine in the side chain. Thus proline is more accurately classified as an
α-imino acid, since its α-amine is a secondary amine rather than a primary
amine.
2. Amino acids having charged side chain R groups:
a. The next category of amino acids, the dicarboxylic monoamino acids,
contains a negatively charged carboxylate R group at pH 7.
Aspartate: In aspartate the side chain carboxylic acid group is separated by
a single methylene (-CH2-) from the α carbon. This differs from glutamate,
in which the γ-carboxylic acid group is separated by two methylene (-CH2-CH2-) carbon atoms from the α carbon of the generalized structure.
b. Dibasic monocarboxylic acid structures are present in lysine, arginine,
and histidine. In these structures, the R group contains a nitrogen or nitrogen
atoms that may be protonated to form a positively charged side chain.
Lysine: In lysine the side chain is simply N-butyl amine.
Arginine: In arginine the side chain contains a guanidino group. The guanidino group of
arginine and the amino group of lysine are predominantly protonated at
physiological pH (pH ~ 7.0) and are in their charged forms.
Histidine: In histidine the side chain R group contains a five-membered
heterocyclic structure known as an imidazole group. The pKa of the imidazole
group of histidine is approximately 6.0 in water, and physiological solutions
will contain relatively high concentrations of both the basic (imidazole) and
acidic (imidazolium) forms of the histidine side chain at physiological pH
(pH ~7.0).
3. Amino acids having an amide moiety in their side chain R group:
Glutamine and asparagine may be considered structural analogs of
glutamic acid and aspartic acid, respectively, with their side chain carboxylic
acids amidated. The amide side chains of glutamine and asparagine cannot
be protonated and are uncharged in the range of physiological pH.
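The statements above about which side chains are protonated at physiological pH can be checked with the Henderson-Hasselbalch relation. The sketch below assumes ideal behaviour (a single titratable group, no influence from the protein environment) and uses the side chain pK values listed in Table 1; the function name is illustrative.

```python
def fraction_protonated(pka, ph):
    """Fraction of a titratable group in its protonated (acid) form.

    Follows from Henderson-Hasselbalch: pH = pKa + log10([base]/[acid]).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

# Side chain pK values as listed in Table 1.
SIDE_CHAIN_PK = {"His": 6.0, "Lys": 10.5, "Arg": 12.5}

for name, pka in SIDE_CHAIN_PK.items():
    print(f"{name}: {fraction_protonated(pka, 7.0):.4f} protonated at pH 7.0")
```

Under these assumptions roughly one histidine side chain in eleven is in the imidazolium form at pH 7.0, so both forms are present in appreciable amounts, whereas lysine and arginine are almost entirely protonated.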

The hydrophobic properties of the amino acids:


Hydrogen bonding and van der Waals forces are of great importance in
determining the secondary structures formed by fibrous proteins. To
understand the complex folded structures found in globular proteins,
additional types of interactions between amino acid side chains and water
are of great importance. The so-called hydrophobic forces that lead to
interaction of hydrophobic groups in proteins are the hardest type of
noncovalent interaction, and the relative hydrophobicity of an amino
acid side chain is considered to play an important role in defining a specific
amino acid's contribution to protein structure and function. Table 2
contains values of the relative hydrophobicity of the 20 common amino acids
based on their partition between water and a nonpolar solvent.
Table 2: Relative Hydrophobic Properties of Amino Acids
Amino acid      Relative hydrophobicity (a)   Fraction of time found buried
                (kcal mol-1)                  in protein structure (b)
Alanine         0.5                           0.38
Arginine        (-11.2)                       0.01
Asparagine      -0.2                          0.12
Aspartic acid   (-7.4)                        0.15
Cysteine        (-2.8)                        0.40 (c)
Glycine         0                             0.36
Glutamine       -0.3                          0.07
Glutamic acid   (-9.9)                        0.18
Histidine       0.5                           0.17
Isoleucine      2.5                           0.60
Leucine         1.8                           0.45
Lysine          (-4.2)                        0.03
Methionine      1.8                           0.40
Phenylalanine   2.5                           0.50
Proline         (-3.3)                        0.18
Serine          -0.3                          0.22
Threonine       0.4                           0.23
Tryptophan      3.4                           0.27
Tyrosine        2.3                           0.15
Valine          1.5                           0.54
(a) Hydrophobicities measured by the distribution of the amino acid between a nonpolar solvent, either ethanol or dioxane, and water. Negative values indicate a preference for water.
(b) An amino acid residue must be at least 95% buried within the interior of the protein structure in order to be counted as buried.
(c) Value is for Cys in the thiol form. The value for the disulfide form is 0.50.

The scale is based on glycine being zero. Amino acid side chains that
preferentially dissolve in the nonpolar solvent have a positive
hydrophobicity value; the more positive the value, the
greater the preference for the nonpolar solvent (leucine, isoleucine,
phenylalanine, tryptophan, tyrosine, and valine are among the most positive).
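As a quick check of this ranking, the hydrophobicity column of Table 2 can be sorted directly. The sketch below simply transcribes the Table 2 values into a dictionary keyed by three-letter code (parenthesised values are entered as plain numbers); the function name is illustrative.

```python
# Relative hydrophobicity (kcal mol-1), transcribed from Table 2.
HYDROPHOBICITY = {
    "Ala": 0.5, "Arg": -11.2, "Asn": -0.2, "Asp": -7.4, "Cys": -2.8,
    "Gly": 0.0, "Gln": -0.3, "Glu": -9.9, "His": 0.5, "Ile": 2.5,
    "Leu": 1.8, "Lys": -4.2, "Met": 1.8, "Phe": 2.5, "Pro": -3.3,
    "Ser": -0.3, "Thr": 0.4, "Trp": 3.4, "Tyr": 2.3, "Val": 1.5,
}

def rank_by_hydrophobicity(table=HYDROPHOBICITY):
    """Return (residue, value) pairs, most hydrophobic first."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

for name, value in rank_by_hydrophobicity()[:7]:
    print(f"{name}: {value:+.1f}")
```

With the values as listed, methionine ties with leucine at 1.8, so the seven most positive entries are Trp, Ile, Phe, Tyr, Leu, Met and Val.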
The last column of Table 2 gives the fraction of times an amino
acid is found buried within the interior of a protein structure. The data are
based on the observed three-dimensional structures of 12 proteins as
determined by X-ray diffraction analysis. To be counted as buried, at least 95% of the volume of an amino
acid must lie within the interior of the protein structure.
Those residues preferring nonpolar solvents (i.e., with positive hydrophobicity)
are also those amino acids found buried in folded protein structures. This
correlation is not exact, however, due to the amphipathic nature of many of
the hydrophobic amino acids, which place the more polar portions of their
structure near the surface. In addition, contrary to expectation, a significant
number of nonpolar residues are not buried and are exposed to the water
solvent as a result of the three-dimensional folding of the protein. However,
these hydrophobic side chain groups on the surface generally appear
dispersed and isolated among polar and ionized surface residues.
Clusters of two or more nonpolar side chain groups may occur in small
regions of the protein surface and are usually associated with a function of
the protein, such as providing a hydrophobic binding site for substrate or
ligand molecules. Amino acids of intermediate polarity
include glycine, tyrosine with its polar OH group, threonine, serine, and
proline. These amino acids are found both in the interior and at the solvent-protein
interface in significant proportions. In contrast, the polar amino
acids glutamine and asparagine and the amino acids containing charged
R groups at pH 7 (lysine, arginine, histidine, glutamate, and aspartate)
are predominantly found on the surface of globular proteins, where the
charge is stabilized by the water solvent. As a result of the complex nature of
protein folding, specific interactions can stabilize charged
residues in the nonpolar interior of a protein. The rare positioning of a
charged side chain in the interior of a globular protein is usually correlated
with an essential structural or functional role for the buried charged side
chain group within the nonpolar interior of the protein.
The hierarchical nature of protein structure:
The Danish protein chemist K.U. Linderstrøm-Lang described the
following levels of protein structure:
1. Primary structure: The primary structure of a protein refers to the
covalent structure of a protein. The covalent structure of a protein
includes all the covalent bonds between amino acids and is normally
defined by the sequence of peptide-bonded amino acids and the
locations of disulphide bonds.
Higher levels of protein organization refer to noncovalently
generated conformational properties of the primary structure. These
higher levels of protein conformation and organization are
customarily defined as the secondary, tertiary, and quaternary
structures of a protein.
2. Secondary structure: It refers to the local conformations of the
polypeptide chain in the protein. The polypeptide chain in this
context refers to the covalently interconnected atoms of the peptide
bonds and α-carbon linkages that string the amino acid residues of
the protein together. Side chain (R) groups are not included at the
level of secondary structure. For example, secondary structures of
polypeptide chains may form noncovalently generated conformations
that are helical (i.e., the α-helix).
The secondary structure arises from the interactions of neighboring
amino acids. Because the DNA-coded primary sequence dictates
which amino acids are near each other, secondary structure often
forms as the peptide chain comes off the ribosome during protein
synthesis.
An important characteristic of secondary structure is the
formation of hydrogen bonds (H bonds) between the CO group of
one peptide bond and the NH group of another nearby peptide
bond. If the H bonds form between peptide bonds in the same
chain, either helical structures such as the α-helix develop or turns
such as β-turns are formed. If H bonds form between peptide bonds
in different chains, extended structures form, such as the β-pleated
sheet.
3. Tertiary structure: It refers to the total three-dimensional structure
of the polypeptide units of the protein. It includes the
conformational relationships in space of the side chain groups to
the polypeptide chain and the geometric relationship of distant
regions of the polypeptide chain to each other.
The secondarily ordered polypeptide chains of soluble proteins
tend to fold into globular structures with the hydrophobic side
chains in the interior of the structure, away from the
water, and the hydrophilic side chains on the outside in contact
with water. This folding is due to associations between segments of
α-helix, extended β-chains, or other secondary structures and
represents a state of lowest energy (i.e., of greatest stability for the
protein).
The conformation results from:
a. Hydrogen bonding within a chain or between chains.
b. The flexibility of the chain at points of instability, allowing
water to obtain maximum entropy and thus govern the structure
to some extent.
c. The formation of other noncovalent bonds between side-chain
groups, such as salt linkages or π-electron interactions of
aromatic rings.
d. The sites and number of disulphide bridges between Cys
residues within the chain.
4. Quaternary structure: It refers to the noncovalent association and
interactions of discrete polypeptide subunits in a
multisubunit protein. Not all proteins have a quaternary structure.

Quaternary structure of Protein Kinase C Interacting Protein.

Some representations of the same helix: ball and stick, backbone, and the secondary structure.

Chain B of Protein Kinase C Interacting Protein. Helices are visualized as ribbons and extended strands of beta sheets as broad arrows.

α-Helix:
Fig 3: The α-helix and β-sheets of protein secondary structure. (A) In the α-helix, note that each C=O group at amino acid position n in the sequence is hydrogen-bonded with the NH group at position n+4. There are 3.6 residues per turn. The helix is usually right-handed, but short sections of 3-5 amino acids of left-handed helix occur occasionally. The average φ and ψ angles of amino acids in the right-handed helix are approximately -60° and -40°, respectively. (B) The β-sheet is made up of strands that are portions of the protein chain. The strands may run in the same (parallel) or opposite (antiparallel) directions.

The α-helix is the most abundant type of secondary structure in proteins.
Characteristics of the α-helical conformation are 3.6 amino acid residues
per 360° turn (n = 3.6), with an H bond formed between every fourth
residue; the average length is 10 amino acids (3 turns), or about 15 Å at a rise of 1.5 Å per residue, but
varies from 5 to 40 residues (1.5 to 11 turns). The alignment of the H bonds creates
a dipole moment for the helix with a resulting partial positive charge at the
amino end of the helix. Because this region has free NH groups, it will
interact with negatively charged groups such as phosphates. The
commonest location of α-helices is at the surface of protein cores, where
they provide an interface with the aqueous environment. The inner-facing
side of the helix tends to have hydrophobic amino acids and the outer-facing side hydrophilic amino acids. Thus, every third or fourth amino
acid along the chain will tend to be hydrophobic, a pattern that can be
readily detected. In the leucine zipper motif, a repeating pattern of leucines
on the facing sides of two adjacent helices is highly predictive of the motif.
A helical-wheel plot can be used to show this repeated pattern. Other α-helices,
buried in the protein core or in cellular membranes, have a higher and
more regular distribution of hydrophobic amino acids, which is highly
predictive of such structures. Helices exposed on the surface have a lower
proportion of hydrophobic amino acids. Amino acid content can be
predictive of an α-helical region. Regions richer in alanine (A),
glutamate (E), leucine (L), and methionine (M) and poorer in proline
(P), glycine (G), tyrosine (Y), and serine (S) tend to form an α-helix.
Proline destabilizes or breaks an α-helix but can be present in longer helices,
forming a bend. There are computer programs (such as the garnier program of EMBOSS)
for predicting quite reliably the general location of α-helices in a protein
sequence.
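The helical-wheel idea mentioned above follows directly from the figure of 3.6 residues per turn: each residue is rotated 100° around the helix axis relative to the previous one. The sketch below is a minimal illustration of that arithmetic; the function names, the 60° half-width used to define a "face", and the example sequence are assumptions made for illustration only.

```python
# 360 degrees / 3.6 residues per turn = 100 degrees per residue.
DEGREES_PER_RESIDUE = 100.0

def helical_wheel(sequence):
    """Return (position, residue, angle) with the angle around the helix axis."""
    return [(i + 1, aa, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, aa in enumerate(sequence)]

def same_face_positions(sequence, center=0.0, half_width=60.0):
    """1-based positions whose wheel angle lies within half_width of center."""
    positions = []
    for position, _, angle in helical_wheel(sequence):
        # Smallest angular separation, taking wrap-around at 360 into account.
        separation = abs((angle - center + 180.0) % 360.0 - 180.0)
        if separation <= half_width:
            positions.append(position)
    return positions

# Hypothetical 8-residue segment used only to show the spacing.
print(same_face_positions("LKELLKKL"))
```

For an eight-residue segment the positions sharing a face with residue 1 are 1, 4, 5 and 8, which is the "every third or fourth residue" spacing described in the text.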
β-Sheet:
Besides alpha helices, beta sheets are another major structural element in
globular proteins, containing 20-28% of all residues. The backbone of the
polypeptide chain is extended into a zigzag rather than a helical structure. In
fibroin the zigzag polypeptide chains are arranged side by side to form a
structure resembling a series of pleats, which is called a β-pleated sheet.
The basic unit of a beta sheet is a beta strand (which can be thought of as a
helix with n = 2 residues/turn) with approximate backbone dihedral angles
phi = -120° and psi = +120°, producing a translation of 3.2 to 3.4
Å/residue for residues in antiparallel and parallel strands,
respectively. The beta strand is then, like the alpha helix, a repeating
secondary structure.
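The characteristic backbone angles quoted so far (about -60°, -40° for the right-handed α-helix and about -120°, +120° for the β strand) can be used to label a residue by its nearest canonical conformation. The following is a rough sketch only: the 45° cutoff and the function names are arbitrary assumptions, and real assignments rely on hydrogen-bonding patterns rather than on angles alone.

```python
import math

# Approximate (phi, psi) values quoted in the text, in degrees.
CANONICAL = {
    "alpha-helix": (-60.0, -40.0),
    "beta-strand": (-120.0, 120.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def nearest_conformation(phi, psi, max_distance=45.0):
    """Name of the closest canonical conformation, or 'other' if none is near.

    max_distance is an arbitrary cutoff chosen for illustration.
    """
    best_name, best_distance = "other", float("inf")
    for name, (phi0, psi0) in CANONICAL.items():
        distance = math.hypot(angular_difference(phi, phi0),
                              angular_difference(psi, psi0))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else "other"

print(nearest_conformation(-139.0, 135.0))
```

The example call uses the antiparallel strand values of -139° and +135°, which fall close to the generic β strand angles.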

Fig 4: Two polypeptide chains in a β-structure conformation. The hydrogen bond network is seen in a 2-stranded, antiparallel β-sheet. The side chains stick out above or below the plane of the picture. The beta sheet can be extended indefinitely because the H-bonding pattern repeats on either side of a strand.

However, since there are no intrasegment hydrogen bonds, and van der Waals
interactions between atoms of neighboring residues are not significant due to
the extended nature of the chain, this extended conformation is only stable
as part of a beta sheet, where hydrogen bonds and van der
Waals interactions between aligned strands exert a stabilizing influence.
The beta sheet is sometimes called the beta "pleated" sheet because
sequentially neighboring Cα atoms lie alternately above and below the
plane of the sheet, giving a "pleated" appearance.
Beta sheets are found in two forms, designated antiparallel or parallel,
based on the relative directions of two interacting beta strands. The average
length of a strand in a beta sheet is about 6 residues and most beta sheets contain fewer
than 6 strands. Side chains from adjacent residues of a strand in a beta sheet
are found on opposite sides of the sheet and do not interact with one another.
Therefore, like α-helices, β-sheets have the potential for amphiphilicity, with
one face polar and the other apolar.
However, unlike α-helices, which are composed of residues from a
continuous polypeptide segment (i.e., hydrogen bonds between CO of
residue i and NH of residue i+4), β-sheets are formed from strands that are
very often from distant portions of the polypeptide sequence. Hydrogen
bonds in β-sheets are on average 0.1 Å shorter than those found in
alpha helices.

Fig 5: The protein thioredoxin (2TRX.PDB) contains a five-stranded beta sheet comprised of three parallel strands and three antiparallel strands. The entire protein is shown as a cartoon with the beta strands (three parallel strands and three antiparallel strands) colored red and alpha helices colored yellow.

Half of the backbone hydrogen bond donors and acceptors in strands at the
edges of a β-sheet are uninvolved in the sheet interactions. Unlike the
situation with alpha helices (helix capping), capping residues in beta sheets have not
yet been characterized. Sheets can fold onto themselves, forming a barrel or
cylinder, thus eliminating the non-hydrogen-bonded "ends" (see the TIM
barrel in Fig 6).

Fig 6: Pleated sheet structure between two polypeptide chains.

Parallel sheets
The hydrogen bonds in a parallel β-sheet are not perpendicular to the
individual strands, resulting in a component (~1/3 of the peptide dipole)
parallel to the strand. The parallel sheet then has an overall macrodipole,
leaving an effective charge of ~ +1/15 unit elemental charge at the N-terminus and -1/15 at the C-terminus of each strand of average
length (~5 times less than an average α-helix macrodipole).
An example of a mixed β-sheet containing a parallel component is given
using the protein thioredoxin.

Fig 7: The three-stranded parallel beta sheet in thioredoxin (2TRX.PDB). The three parallel strands are shown in both cartoon format (left) and in stick form containing backbone atoms N, CA, C, and O' (right). Hydrogen bonds are identified by arrows connecting the donor nitrogen and acceptor oxygens. Strands are numbered according to their relative position in the polypeptide sequence.

Antiparallel sheets
A residue in an antiparallel beta strand has values of -139 and +135 degrees
for the backbone dihedral angles phi and psi, respectively. Antiparallel beta
sheets are thought to be intrinsically more stable than parallel sheets for two reasons:
the interstrand hydrogen bonds are more optimally oriented, and the
peptide bond dipoles of nearest neighbors within a strand cancel, whereas in
the parallel sheet components of the dipoles parallel to the strands align and
may interact unfavorably.
However, in an exhaustive survey of hydrogen bonds in protein 3D
structures, no significant difference in the linearity of the hydrogen bonds in
parallel and antiparallel sheets was found. An example of a mixed beta sheet
containing an antiparallel component is shown using the protein thioredoxin
(Fig 8).
Fig 8: The three-stranded antiparallel beta sheet in the protein thioredoxin (2TRX.PDB). The three antiparallel strands are shown in both cartoon format (left) and in stick form containing backbone atoms N, CA, C, and O' (right). Hydrogen bonds are identified by arrows connecting the donor nitrogen and acceptor oxygens. Strands are numbered according to their relative position in the polypeptide sequence.

Twists
The classical beta sheets originally proposed are planar but most sheets
observed in globular proteins are twisted (0 to 30 degrees per residue).
Antiparallel beta sheets are more often twisted than parallel sheets. This
twist is always of the same handedness, but unfortunately, it has been
described using two conflicting conventions in the literature. If defined in
terms of the angle of strand crossings, the twist is left-handed and if defined
in terms of the progressive twist of the hydrogen-bonding direction, the twist
is right-handed. Two-stranded beta sheets show the largest twists. At least
two separate explanations have been offered as reasons for the observed
twist.
One, based on statistical reasoning, suggests that the greater amount of
allowed phi, psi space to the "right" of the classical beta strand region would
produce a right-handed twist.
Another suggests a systematic distortion of the tetrahedral nitrogen, seen in
X-ray crystal structures, would favor a right-handed twist. A classical
example is the antiparallel beta strand in the bovine pancreatic trypsin
inhibitor.
Fig 9: A severely twisted antiparallel beta hairpin from the pancreatic trypsin inhibitor (5PTI).

The pleated nature of the sheet becomes distinctly visible in the right panels of
this figure, which also show the side chains sticking out above and below the sheet
plane. If you look carefully, you will also notice that the sheet has a left twist
(centre panel).

Bulges
Another irregularity found in antiparallel beta sheets is the hydrogen bonding of two residues from one strand with one residue from the other,
called a beta bulge. Bulges are most often found in antiparallel sheets, with
~5% of bulges occurring in parallel strands. Bulges, as well as turns, affect
the directionality of the polypeptide chain, and classical ones, like the one
shown in Fig 10, also accentuate the right-handed twist of the sheet.
Fig 10: A classical beta-bulge in chymotrypsin (1CHG.PDB) involving residues 33 and 41-42. Hydrogen bonds are indicated with dotted lines connecting the donor and acceptor atoms.

Strand connections
There are two basic categories of connections between the individual strands
of a beta sheet. When the backbone enters the same end of the sheet that it
left, it is called a hairpin connection, and when the backbone enters the
opposite end it is called a crossover connection. Crossover connections can
be thought of as a type of helical connection of the strand ends. In globular
proteins, right-handed crossovers are the rule, although a few examples of
left-handed crossovers are available (e.g., subtilisin and glucose phosphate
isomerase).
Super-secondary structure
Secondary structure elements are observed to combine in specific geometric
arrangements known as motifs or super-secondary structures. In this
section we will look at motifs consisting of no more than three secondary
structure elements. Larger motifs such as the Greek key will be examined in
the sections on tertiary structure and protein folds.
a) β-hairpins:
β-hairpins are one of the simplest super-secondary structures and are
widespread in globular proteins.

They occur as the short loop regions between antiparallel hydrogen-bonded
β-strands. In general a reverse turn (or β-turn, as it is sometimes
called) is any region of a protein where there is a hydrogen bond involving
the carbonyl of residue i and the NH group of residue i+3. An alternative
definition states that the α-carbons of residues i and i+3 must be within 7.0
Å. The structures of reverse turns are outlined in section 1.4. Sibanda and
Thornton have devised a system for classifying β-hairpins, which is based
on two conventions for defining loop regions.
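The distance-based definition of a reverse turn lends itself to a simple scan along the chain. The sketch below flags every position i whose Cα atom lies within 7.0 Å of the Cα atom of residue i+3. The function name and the toy coordinates are illustrative assumptions; helical segments also satisfy this distance test, so a real search would need to exclude them separately.

```python
import math

def find_turn_candidates(ca_coords, cutoff=7.0):
    """Indices i (0-based) where CA(i) and CA(i+3) are closer than cutoff (in Å)."""
    hits = []
    for i in range(len(ca_coords) - 3):
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff:
            hits.append(i)
    return hits

# Toy CA trace (not real coordinates): the chain doubles back after residue 1.
toy_trace = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
    (-3.8, 3.8, 0.0),
]
print(find_turn_candidates(toy_trace))
```

In the toy trace only the first residue qualifies, because the chain reverses direction between residues 0 and 3 and then runs straight.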
β-hairpin loops adopt specific conformations which depend on their lengths
and sequences. Sibanda and Thornton have shown that 70% of β-hairpins
are less than 7 residues in length, with the two-residue turns forming the
most noticeable component. These two-residue β-hairpins all adopt one of
the classical reverse turn conformations, with an obvious preference for types
I' and II'. Type I 2-residue hairpins also occur but with lower abundance.
This contrasts with reverse turns, where types I and II tend to dominate. In β-hairpins the type I' turn has the correct twist to match the twist of the β-sheet,
and modelling studies indicated that if either type I or type II turns were to
connect the antiparallel β-strands, they would diverge within a short
distance from the turn.
b) Two-residue β-hairpins
Type I'. The first residue in this turn adopts the left-handed α-helical
conformation and therefore shows a preference for glycine, asparagine or
aspartate. These residues can adopt conformations with positive φ angles due
to the absence of a side chain in glycine and because of hydrogen bonds
between the side chain and main chain in the case of asparagine or aspartate.
The second residue of a type I' turn is nearly always glycine, as the required
φ and ψ angles are well outside the allowed regions of the Ramachandran
plot for amino acids with side chains. Were another type of amino acid to
occur here, there would be steric hindrance between its side chain and the
carbonyl oxygen of the preceding residue.
Type II'. The first residue of these turns has a conformation that can only
be adopted by glycine (see the Ramachandran plot below). The second residue
shows a preference for polar amino acids such as serine and threonine.

Type I. Both residues of these turns adopt α-helical conformations.

c) Three-residue β-hairpins
Normally, the residues at the ends of the two β-strands make only one
hydrogen bond, as shown below. The intervening three residues have distinct
conformational preferences, as shown in the Ramachandran plot. The first
residue adopts the right-handed α-helical conformation and the second amino
acid lies in the bridging region between α-helix and β-sheet.
Glycine, asparagine or aspartate are frequently found at the last residue
position, as this adopts φ and ψ angles close to the left-handed helical
conformation.

d) Four-residue β-hairpins
These are also quite common, with the first two residues adopting the α-helical conformation. The third residue has φ and ψ angles that lie in the
bridging region between α-helix and β-sheet, and the final residue
adopts the left-handed α-helical conformation and is therefore usually
glycine, aspartate or asparagine.

The torsion angles for the residues (i+1) and (i+2) in the two types of turn lie in distinct regions of the Ramachandran plot.

Reverse Turns
A reverse turn is a region of the polypeptide having a hydrogen bond from one
main chain carbonyl oxygen to the main chain N-H group 3 residues along
the chain (i.e., O(i) to N(i+3)). Helical regions are excluded from this
definition, and turns between beta-strands form a special class of turn known
as the beta-hairpin (described above). Reverse turns are very abundant in globular
proteins and generally occur at the surface of the molecule. It has been
suggested that turn regions act as nucleation centres during protein folding.
Reverse turns are divided into classes based on the phi and psi angles of the
residues at positions i+1 and i+2. Types I and II, shown in the figure below,
are the most common reverse turns, the essential difference between them
being the orientation of the peptide bond between residues at (i+1) and (i+2).
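Because turn classes are defined by the phi and psi angles at positions i+1 and i+2, a turn can be assigned a type by comparing those four angles with idealized values. The sketch below uses commonly quoted idealized angles for types I, II, I' and II'; these numbers are not given in the text and are an assumption here, as are the 30° tolerance and the function names.

```python
# Commonly quoted idealized angles (phi(i+1), psi(i+1), phi(i+2), psi(i+2)),
# in degrees. Assumed values, not taken from the text.
IDEAL_TURNS = {
    "I": (-60.0, -30.0, -90.0, 0.0),
    "II": (-60.0, 120.0, 80.0, 0.0),
    "I'": (60.0, 30.0, 90.0, 0.0),
    "II'": (60.0, -120.0, -80.0, 0.0),
}

def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def classify_turn(phi1, psi1, phi2, psi2, tolerance=30.0):
    """Return the turn type whose ideal angles all lie within tolerance."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal in IDEAL_TURNS.items():
        if all(angular_difference(o, t) <= tolerance
               for o, t in zip(observed, ideal)):
            return name
    return "unclassified"

print(classify_turn(-65.0, -25.0, -95.0, 5.0))
print(classify_turn(-60.0, 130.0, 85.0, -5.0))
```

With these assumed values, types I and II share the same phi at i+1 and differ mainly in psi(i+1) and phi(i+2), which corresponds to the flipped orientation of the peptide bond between residues (i+1) and (i+2).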

Note that the (i+2) residue of the type II turn lies in a region of the Ramachandran plot which can only be occupied by glycine. From the diagram of this turn it can be seen that were the (i+2) residue to have a side chain, there would be steric hindrance with the carbonyl oxygen of the preceding residue. Hence, the (i+2) residue of type II reverse turns is nearly always glycine.

Longer loops
For these, a wide range of conformations is
observed and the general term 'random coil' is
sometimes used. Consecutive antiparallel β-strands, when linked by hairpins, form a supersecondary structure known as the β-meander.
β-strands have a slight right-handed twist such that
when they pack side-by-side to form a β-sheet, the
sheet has an overall left-handed curvature. Antiparallel β-strands forming a β-hairpin can
accommodate a 90 degree change in direction known as a β-corner. The
strand on the inside of the bend often has a glycine at this position while the
other strand can have a β-bulge. The latter involves a single residue in the
right-handed α-helical conformation, which breaks the hydrogen bonding
pattern of the β-sheet. This residue can also be in the left-handed helical or
bridging regions of the Ramachandran plot.
β-corners are observed to have a right-handed twist when viewed from the
concave side.

Helix hairpins
A helix hairpin or αα-hairpin refers to the loop connecting two anti-parallel
α-helical segments. Clearly, the longer the loop, the greater the number of
possible conformations. However, for short connections there are a limited
number of conformations and for the shortest loops of two or three residues,
there is only one allowed conformation. Anti-parallel α-helices will interact
generally by hydrophobic interactions between side chains at the interface.
Therefore, hydrophobic amino acids have to be appropriately positioned in the
amino acid sequence (one per turn of each helix) to generate a hydrophobic
core. Efimov has analyzed the conformations of αα-hairpins and some of his
results are summarized below.
The shortest α-helical connections involve two residues, which are oriented
approximately perpendicular to the axes of the helices. Analysis of known
structures reveals that the first of these two residues adopts phi and psi
angles in the bridging or α-helical regions of the Ramachandran plot. The
second residue is always glycine and is in a region of the Ramachandran plot
with positive phi, which is not available to other amino acids.
Three-residue loops are also observed to have conformational preferences.
The first residue occupies the bridging region of the Ramachandran plot, the
second adopts the left-handed helical conformation and the last residue is in
a β-strand conformation.
Four-residue loops adopt one of two possible conformations. One is similar
to the three-residue loop conformation described above except that there is
an additional residue in the β-strand conformation at the fourth position. The
other conformation involves the four residues adopting bridging, α, bridging,
and β conformations, respectively.
The αα-corner
Short loop regions connecting helices, which are roughly perpendicular to
one another, are referred to as αα-corners. Efimov has shown that the
shortest αα-corner has its first residue in the left-handed α-helical
conformation and the next two residues in β-strand conformations. This
conformation can only be adopted when the two helices form a right-handed
corner. Indeed, if the helices were linked to form a left-handed corner there
would be steric hindrance. This may explain the scarcity of left-handed
corners in protein X-ray structures.
The C-terminal residue of the first helix, which is in the left-handed
helical conformation, must have a short side chain to avoid steric hindrance
and is observed commonly to be glycine. The first residue of the second
helix, which is in the β-conformation, frequently has a small polar side chain
such as serine or aspartate, which can form hydrogen bonds with the free
NH groups at the amino-terminal end of the second helix. The central
residue of the αα-corner is almost always hydrophobic as it is buried and
interacts with other non-polar side chains buried where the ends of the two
helices contact each other.

Helix-turn-helix
The loop regions connecting α-helical segments can have important
functions. For example, in parvalbumin there is a helix-turn-helix motif,
which appears three times in the structure. Two of these motifs are involved
in binding calcium by virtue of carboxyl side chains and main chain
carbonyl groups. This motif has been called the EF hand as one is located
between the E and F helices of parvalbumin. It now appears to be a
ubiquitous calcium-binding motif present in several other calcium-sensing
proteins such as calmodulin and troponin C.
EF hands are made up from a loop of around 12 residues, which has polar
and hydrophobic amino acids at conserved positions. These are crucial for
ligating the metal ion and forming a stable hydrophobic core. Glycine is
invariant at the sixth position in the loop for structural reasons. The
calcium ion is octahedrally coordinated by carboxyl side chains, main chain
groups and bound solvent.

A different helix-turn-helix motif is also common to certain DNA binding
proteins. This motif was first observed in prokaryotic DNA binding proteins
such as the cro repressor from phage lambda. This protein is a homo-dimer
with each subunit being 66 amino acids in length. Each subunit consists of
an all-anti-parallel three stranded β-sheet with three helical segments
inserted sequentially between the first and second β-strands. The two
subunits of cro associate by virtue of the third β-strands, which interact
forming a six-stranded β-sheet in the centre of the molecule. Mutagenesis
and biochemical work had indicated that residues in the second helix of each
cro monomer interacted with DNA. Accordingly, model-building studies
indicated that both these helices in the dimeric protein would fit into the
major groove of B-DNA. These proteins recognise base sequences which are
palindromic, i.e. possess an internal two-fold symmetry axis. The two
recognition helices of the cro protein are also related by a two-fold axis
passing through the central β-sheet region of the dimer. Therefore, the
recognition helices of the cro dimer fit into the major groove of the DNA
and interact with each identical half of the palindrome. Hence, the second
helix of the helix-turn-helix motif has an important role in recognising the
DNA while the remainder of the structure serves to keep the two helices in
the correct relative position for fitting in the major groove of DNA. Many
other helix-turn-helix proteins with different folds exhibit essentially the
same mode of binding to DNA.
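
The two-fold symmetry of a palindromic recognition site means that one strand reads the same as the reverse complement of the other. A minimal check of this property is sketched below; the function names and the example sequence in the usage note are illustrative assumptions, not real operator sites.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """True when the sequence equals its own reverse complement,
    i.e. it has an internal two-fold symmetry axis."""
    return seq.upper() == reverse_complement(seq)
```

A call such as `is_palindromic("GAATTC")` returns `True`, since each half of that site is the reverse complement of the other. Natural operator sites are often only approximately symmetric, so a practical check might count mismatches between the two halves instead of demanding exact identity.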
β-α-β Motifs
Anti-parallel β-strands can be linked by short lengths of polypeptide forming
β-hairpin structures. In contrast, parallel β-strands are connected by longer
regions of chain, which cross the β-sheet and frequently contain α-helical
segments. This motif is called the β-α-β motif and is found in most proteins
that have a parallel β-sheet. The loop regions linking the strands to the
helical segments can vary greatly in length. The helix axis is roughly parallel
with the β-strands and all three elements of secondary structure interact
forming a hydrophobic core. In certain proteins the loop linking the
carboxy-terminal end of the first β-strand to the amino-terminal end of the
helix is involved in binding of ligands or substrates. The β-α-β motif almost
always has a right-handed fold as demonstrated in the figure.

Motifs Are Regular Combinations of Secondary Structures


Many proteins contain one or more motifs built from particular
combinations of secondary structures. A motif is defined by a specific
combination of secondary structures that has a particular topology and is
organized into a characteristic three-dimensional structure.
The coiled-coil motif comprises two, three, or four amphipathic helices
wrapped around one another. In this motif, hydrophobic side chains project
like knobs from one helix and interdigitate into the gaps, or holes,
between the hydrophobic side chains of the other helix along the contact
surface. The subunits in some multimeric proteins and in rodlike fibers are
held together by coiled-coil interactions. The Ca2+-binding helix-loop-helix
motif is marked by the presence of certain hydrophilic residues at invariant
positions in the loop. Oxygen atoms in the invariant residues bind a calcium
ion through hydrogen bonds. In another common motif, the zinc finger, three
secondary structures (an α helix and two β strands with an antiparallel
orientation) form a fingerlike bundle held together by a zinc ion. This motif
is most commonly found in proteins that bind RNA or DNA.
Additional motifs will be examined in discussions of other proteins. The
presence of the same motif in different proteins with similar functions
clearly indicates that during evolution these useful combinations of
secondary structures have been conserved.

Figure: A typical small motif is the calcium-binding EF hand in calmodulin,
a ubiquitous molecule undergoing Ca-dependent conformational changes. It
contains 4 Ca++ ions, which are coordinated in a typical fashion in a
helix-turn-helix motif called the EF hand.

Structural and Functional Domains Are Modules of Tertiary Structure


The tertiary structure of large proteins is often subdivided into distinct
globular or fibrous regions called domains. Structurally, a domain is a
compactly folded region of polypeptide. For large proteins, domains can be
recognized in structures determined by x-ray crystallography or in images
captured by electron microscopy. These discrete regions are well
distinguished or physically separated from other parts of the protein, but
connected by the polypeptide chain. Hemagglutinin, for example, contains a
globular domain and a fibrous domain.

Figure 3-4. Four levels of structure in hemagglutinin, which is a long
multimeric molecule whose three identical subunits are each composed
of two chains, HA1 and HA2. (a) Primary structure is illustrated by the
amino acid sequence of residues 68-195 of HA1. This region is used by
influenza virus to bind to animal cells. The one-letter amino acid code is
used. Secondary structure is represented diagrammatically beneath the
sequence, showing regions of the polypeptide chain that are folded into
helices (light blue cylinders), strands (light green arrows), and random
coils (white strands). (b) Tertiary structure constitutes the folding of the
helices and strands in each HA subunit into a compact structure that is 13.5
nm long and divided into two domains. The membrane-distal domain is
folded into a globular conformation. The blue and green segments in this
domain correspond to the sequence shown in part (a). The proximal domain,
which lies adjacent to the viral membrane, has a stemlike conformation due
to alignment of two long helices of HA2 (dark blue) with strands in HA1.
Short turns and longer loops, which usually lie at the surface of the
molecule, connect the helices and strands in a given chain. (c) The
quaternary structure comprises the three subunits of HA; the structure is
stabilized by lateral interactions among the long helices (dark blue) in the
subunit stems, forming a triple-stranded coiled-coil stalk. Each of the distal
globular domains in trimeric hemagglutinin has a site (red) for binding sialic
acid molecules on the surface of target cells. Like many membrane proteins,
HA has several covalently bound carbohydrate (CHO) chains.
A structural domain consists of 100 to 200 residues in various combinations
of α helices, β sheets, turns, and random coils. Often a domain is
characterized by some interesting structural feature, for example, an
unusual abundance of a particular amino acid (a proline-rich domain, an
acidic domain, a glycine-rich domain), sequences common to (conserved in)
many proteins (SH3, or Src homology region 3), or a particular
secondary-structure motif (zinc-finger motif in kringle domain).
Domains sometimes are defined in functional terms based on observations
that the activity of a protein is localized to a small region along its length.
For instance, a particular region or regions of a protein may be responsible
for its catalytic activity (e.g., a kinase domain) or binding ability (e.g., a
DNA-binding domain, membrane-binding domain). Functional domains
often are identified experimentally by whittling down a protein to its
smallest active fragment with the aid of proteases, enzymes that cleave the
polypeptide backbone. Alternatively, the DNA encoding a protein can be
subjected to mutagenesis, so that segments of the protein's backbone are
removed or changed. The activity of the truncated or altered protein product
synthesized from the mutated gene is then monitored.
The functional definition of a domain is less rigorous than a structural
definition. However, if the three-dimensional structure of a protein has not
been determined, identification of functional domains can provide useful
information about the protein. Because the activity of a protein usually
depends on a proper three-dimensional structure, a functional domain
consists of at least one and often several structural domains.
The organization of tertiary structure into domains further illustrates the
principle that complex molecules are built from simpler components. Like
secondary-structure motifs, tertiary-structure domains are incorporated as
modules into different proteins, thereby modifying their functional activities.
The modular approach to protein architecture is particularly easy to
recognize in large proteins, which tend to be a mosaic of different domains
and thus can perform different functions simultaneously.
The epidermal growth factor (EGF) domain is one example of a module that
is present in several proteins. EGF is a small soluble peptide hormone that
binds to cells in the skin and connective tissue, causing them to divide. It is
generated by proteolytic cleavage between repeated EGF domains in the
EGF precursor protein, which is anchored in the cell membrane by a
membrane-spanning domain. Six conserved cysteine residues form three
pairs of disulfide bonds that hold EGF in its native conformation. The EGF
domain also occurs in other proteins, including tissue plasminogen activator
(TPA), a protease that is used to dissolve blood clots in heart attack victims;
Neu protein, which is involved in embryonic differentiation; and Notch
protein, a cell-adhesion molecule that glues cells together. Besides the EGF
domain, these proteins contain additional domains found in other proteins.
For example, TPA possesses a chymotryptic domain, a common feature in
proteins that catalyze proteolysis.

Figure: Epidermal growth factor (EGF) is generated by proteolytic cleavage
of a precursor protein containing multiple EGF domains (orange). The EGF
domain also occurs in Neu protein and in tissue plasminogen activator
(TPA). Other domains, or modules, in these proteins include a chymotryptic
domain (purple), an immunoglobulin domain (green), a fibronectin domain
(yellow), a membrane-spanning domain (pink), and a kringle domain (blue).
[Adapted from I. D. Campbell and P. Bork, 1993, Curr. Opin. Struc. Biol.
3:385.]
Classification of Protein Structure:
From the work of Levitt and Chothia (1976), four principal classes of protein
structure were recognized based on the types and arrangements of secondary
structural elements. These classes are described as follows:
1. Class α comprises a bundle of α helices connected by loops on the
surface of proteins.
2. Class β comprises antiparallel β sheets, usually two sheets in close
contact forming a sandwich. Alternatively, a sheet can twist into a barrel
with the first and last strands touching. Examples are enzymes, transport
proteins, antibodies, and virus coat proteins such as neuraminidase.
3. Class α/β comprises mainly parallel β sheets with intervening
α helices, but may also have mixed β sheets. In addition to forming a
sheet in some proteins in this class, parallel β strands in others may
form into a barrel structure that is surrounded by α helices. This class
of proteins includes many metabolic enzymes.
4. Class α+β comprises mainly segregated α helices and antiparallel
β sheets.
5. Multidomain (α and β) proteins comprise domains representing more
than one of the above four classes.
6. Membrane and cell-surface proteins and peptides, excluding proteins
of the immune system, comprise this class.
Within these broad categories, protein structures show a variety of folding
patterns. Among proteins with similar folding patterns, there are families
that share enough features of structure, sequence and function to suggest
evolutionary relationship. However, unrelated proteins often show similar
structural themes.
Classification of protein structures occupies a key position in
bioinformatics, not least as a bridge between sequence and function.

Figure: Example folds. Panel labels: helix-turn-helix motif; alpha + beta
class of folds; a single helix; beta sheet arrangement; alpha/beta barrels;
small disulphide-rich proteins, which have few helices and sheets.

Weak interactions stabilize a protein conformation:


The native conformation of a protein is only marginally stable. The
difference in free energy between the folded and unfolded states of typical
proteins under physiological conditions is in the range of only 20 to 65
kJ/mol. A given polypeptide chain can theoretically assume countless
different conformations, and as a result the unfolded state of a protein is
characterized by a high degree of conformational entropy. This entropy, and
the hydrogen-bonding interactions of many groups in the polypeptide chain
with solvent (water), tend to maintain the unfolded state. The chemical
interactions that counteract these effects and stabilize the native
conformation include disulfide bonds and the weak (noncovalent)
interactions: hydrogen bonds and hydrophobic, ionic and van der Waals
interactions.
Every time a bond is formed between two atoms, some free energy is
released in the form of heat or entropy. In other words, the formation of
bonds is accompanied by a favorable (negative) change in free energy. The
ΔG (free-energy change) for covalent bond formation is generally in the
range of -200 to -460 kJ/mol. For weak interactions, ΔG = -4 to -30 kJ/mol.
Although covalent bonds are clearly much stronger, weak interactions
predominate as a stabilizing force in protein structure because of their
number. In general, the protein conformation with the lowest free energy
(i.e., the most stable) is the one with the maximum number of weak
interactions.
Every hydrogen-bonding group in a polypeptide chain was hydrogen
bonded to water prior to folding. For every hydrogen bond formed in a
protein, hydrogen bonds (of similar strength) between the same groups and
water were broken. The net stability contributed by a given weak interaction,
or the difference in free energies of the folded and unfolded state, is close to
zero. The contribution of weak interactions to protein stability can be
understood in terms of the properties of water. Pure water contains a network
of hydrogen-bonded water molecules. Optimizing the hydrogen bonding of
water around a hydrophobic molecule results in the formation of a highly
structured shell, or solvation layer, of water in the immediate vicinity,
resulting in an unfavorable decrease in the entropy of the water. The
association among hydrophobic or nonpolar groups results in a decrease in this
structured solvation layer, or a favorable increase in entropy. This entropy
term is the major thermodynamic driving force for the association of
hydrophobic groups in aqueous solution, and hydrophobic amino acid side chains
therefore tend to be clustered in a protein's interior, away from water.

The formation of hydrogen bonds and ionic interactions in a protein is also
driven largely by this same entropic effect. Polar groups can generally form
hydrogen bonds with water and hence are soluble in water. Therefore, a
solvation shell of structured water will also form to some extent around
polar molecules. Even though the energy of formation of an intramolecular
interaction between two polar groups in a macromolecule is largely canceled
out by the elimination of such interactions between the same groups and
water, the release of structured water when the intramolecular interaction is
formed provides an entropic driving force for folding. Most of the net
change in free energy that occurs when weak interactions are formed within
a protein is therefore derived from the increase in entropy in the surrounding
aqueous solution.
Of the different types of weak interactions, hydrophobic interactions are
particularly important in stabilizing a protein conformation; the interior of
a protein is generally a densely packed core of hydrophobic amino acid side
chains. It is also important that any polar or charged groups in the protein
interior have suitable partners for hydrogen bonding or ionic interactions.
One hydrogen bond makes only a small apparent contribution to the stability
of a native structure, but the presence of a single hydrogen-bonding group
without a partner in the hydrophobic core of a protein can be so destabilizing
that conformations containing such a group are often thermodynamically
untenable.
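Protein Stability:
The term protein stability primarily means the thermodynamic stability
of a protein that unfolds and refolds rapidly, reversibly, cooperatively, and
with a simple, two-state mechanism:
$$\mathrm{F} \rightleftharpoons \mathrm{U}, \qquad K_u = \frac{[\mathrm{U}]}{[\mathrm{F}]}$$
where F is the folded state, U is the unfolded state and Ku is the
equilibrium constant for unfolding.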
The easiest proteins in which to study folding and stability are those that
exhibit this sort of rapid reversibility. Both experimental design and also
theoretical treatment of data are simplified by reversible systems. Thus, it is
no surprise that most of the literature reports about stability discuss this type
of reversible system. The bulk of this dissertation will also focus on
thermodynamic stability.

In these cases, the stability of the protein is simply the difference in Gibbs
free energy, ΔG, between the folded and the unfolded states. The only factors
affecting stability are the relative free energies of the folded (Gf) and the
unfolded (Gu) states. The larger and more positive ΔGu = Gu - Gf, the more
stable is the protein to denaturation.
The Gibbs free energy, G, is made up of the two terms enthalpy (H) and
entropy (S), related by the equation:
$$\Delta G = \Delta H - T\,\Delta S$$
where T is the temperature in Kelvin.
The folding free energy difference, ΔGu, is typically small, of the order of
5-15 kcal/mol for a globular protein (compared to e.g. ~30-100 kcal/mol for a
covalent bond).
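
To give a feel for what a folding free energy of this size means, the sketch below converts an unfolding free energy into an equilibrium constant and a fraction of unfolded molecules for a two-state system, using the standard relation K = exp(-ΔG/RT). The function names and the choice of 298.15 K are assumptions made for the illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def unfolding_constant(delta_g_u, temperature=298.15):
    """Equilibrium constant for unfolding, K_u = [U]/[F] = exp(-dG_u / RT).

    delta_g_u is the unfolding free energy in kcal/mol (positive when the
    folded state is favoured); temperature is in kelvin.
    """
    return math.exp(-delta_g_u / (R_KCAL * temperature))


def fraction_unfolded(delta_g_u, temperature=298.15):
    """Fraction of molecules unfolded at equilibrium in a two-state system."""
    k_u = unfolding_constant(delta_g_u, temperature)
    return k_u / (1.0 + k_u)


if __name__ == "__main__":
    # Values spanning the 5-15 kcal/mol range quoted in the text.
    for dg in (5.0, 10.0, 15.0):
        print(f"dG_u = {dg:4.1f} kcal/mol -> "
              f"fraction unfolded = {fraction_unfolded(dg):.2e}")
```

Even at the low end of the quoted range only a very small fraction of molecules is unfolded at room temperature, yet the free energy involved is several times smaller than that of a single covalent bond, which is the sense in which the native state is only marginally stable.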
In the case of irreversible or slowly unfolding proteins, it is kinetic
stability, or the rate of unfolding, that is important. A protein that is
kinetically stable will unfold more slowly than a kinetically unstable
protein. In a kinetically stable protein, a large free energy barrier to
unfolding is required and the factors affecting stability are the relative
free energies of the folded (Gf) and the transition state (Gts) for the first
committed step on the unfolding pathway. Kinetic stability is discussed in
more detail in its own section; see Kinetic Stability. Irreversible loss of
protein folded structure is represented by:
$$\mathrm{F} \rightleftharpoons \mathrm{U} \xrightarrow{k_i} \mathrm{I}$$
where F, U and I are the folded, unfolded and irreversibly inactivated forms,
and ki is the rate constant for some irreversible inactivation process.


The free energy profile for a rapidly inactivating protein is shown below.
Note that once the Unfolded form is reached, the energy barrier to
inactivation is lower than that to refolding.
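
One way to make the idea of kinetic stability concrete is to treat the scheme folded, unfolded, inactivated with a steady-state approximation on the unfolded form. The sketch below does this. The steady-state treatment itself, the rate constant names `k_unfold` and `k_refold`, and the assumption that the unfolded form never accumulates are all additions for illustration and are not taken from the text.

```python
import math


def observed_inactivation_rate(k_unfold, k_refold, k_i):
    """Apparent first-order rate of irreversible loss of folded protein.

    Steady-state approximation for F <=> U -> I, assuming U stays scarce:
        k_obs = k_unfold * k_i / (k_refold + k_i)
    All rate constants share the same units (for example 1/s).
    """
    return k_unfold * k_i / (k_refold + k_i)


def fraction_active(t, k_unfold, k_refold, k_i):
    """Fraction of protein not yet irreversibly inactivated at time t."""
    return math.exp(-observed_inactivation_rate(k_unfold, k_refold, k_i) * t)


def half_life(k_unfold, k_refold, k_i):
    """Time at which half of the protein has been inactivated."""
    return math.log(2.0) / observed_inactivation_rate(k_unfold, k_refold, k_i)
```

When inactivation is much faster than refolding (the case where the barrier from the unfolded form to inactivation is the lower one), `k_obs` approaches `k_unfold`, so the lifetime of the protein is set by the barrier to unfolding alone. When refolding is much faster, `k_obs` approaches `(k_unfold / k_refold) * k_i` and the equilibrium between folded and unfolded forms matters again.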

The Hydrophobic Effect


The hydrophobic effect is considered to be the major driving force for the
folding of globular proteins. It results in the burial of the hydrophobic
residues in the core of the protein.
The thermodynamic factors which give rise to the hydrophobic effect are
complex and still incompletely understood. The free energy of transfer of a
non-polar compound from some reference state, such as an organic solution,
into water, ΔGtr, is made up of an enthalpy term, ΔH, and an entropy term,
-TΔS:
$$\Delta G_{tr} = \Delta H - T\,\Delta S$$
At room temperature, the enthalpy of transfer from organic solution into
aqueous solution is negligible; the interaction enthalpies are the same in both
cases.
The entropy, however, is negative. Water tends to form ordered cages around
the non-polar molecule and this leads to a decrease in entropy. At high
temperatures (~110 °C) these cages are no longer any stronger than bulk
water, and the entropy contribution tends to zero. The enthalpy of transfer,
however, is now positive (unfavourable). Because the temperature
dependences of entropy and enthalpy are not the same, there is some
temperature at which the hydrophobic effect is strongest, and the effect
decreases at temperatures above and below this temperature. The decrease in
the strength of the hydrophobic effect with decreasing temperature is
probably the major cause of cold-denaturation in proteins.
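
The temperature dependence described above can be reproduced with a very simple model in which the heat-capacity change on transfer is taken to be constant. That assumption, the numerical value of `delta_cp`, and the two reference temperatures (295 K for zero enthalpy, 385 K for zero entropy, roughly matching the room temperature and ~110 °C figures in the text) are illustrative choices and not data from these notes.

```python
import math


def transfer_thermodynamics(T, delta_cp=0.3, t_h=295.0, t_s=385.0):
    """Return (dH, -T*dS, dG) in kJ/mol for transfer into water at T kelvin.

    Assumes a constant heat-capacity change delta_cp in kJ/(mol*K), with
    the enthalpy of transfer vanishing at t_h and the entropy of transfer
    vanishing at t_s:
        dH(T) = delta_cp * (T - t_h)
        dS(T) = delta_cp * ln(T / t_s)
    """
    d_h = delta_cp * (T - t_h)
    d_s = delta_cp * math.log(T / t_s)
    minus_t_ds = -T * d_s
    return d_h, minus_t_ds, d_h + minus_t_ds


def strongest_temperature(temps):
    """Temperature (from an iterable, in kelvin) at which dG of transfer
    into water is largest, i.e. where the hydrophobic effect is strongest."""
    return max(temps, key=lambda T: transfer_thermodynamics(T)[2])
```

Scanning `range(250, 451, 5)` with `strongest_temperature` shows the free energy of transfer rising to a maximum and falling on either side. In this simplified model the maximum lies where the entropy term passes through zero, and the weakening toward low temperature is the behaviour linked above to cold denaturation.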
The contribution of the hydrophobic effect to globular protein stability has
been estimated empirically by measuring the thermodynamics of
transfer of model compounds (e.g. blocked amino acids, cyclic peptides...)
from organic solvents to water.
Hydrogen Bonds
A hydrogen bond occurs when two electronegative atoms, such as nitrogen
and oxygen, interact with the same hydrogen. The hydrogen is normally
covalently attached to one atom, the donor, but interacts electrostatically
with the other, the acceptor. This interaction is due to the dipole between the
electronegative atoms and the proton.
There is a geometric component involved in hydrogen bonds, and for single
donor acceptor systems, such as N-H---O, the strongest hydrogen bonds are
collinear (Creighton, 1993 and references therein). Electrostatic calculations
suggest that deviation of 20° from linearity leads to a decrease in binding
energy of approximately 10% (Pimentel & McClellan, 1960).
In double acceptor systems, bifurcated hydrogen bonds with non-linear
angles are preferred. The occurrence of hydrogen bonds in protein structure
has been extensively reviewed by Baker & Hubbard (1984), albeit before the
PDB database was as large as it is today. They found that 90% of N-H---O
bonds in proteins lie between 140° and 180°, and that they are centred around
158°. For C=O---H, the range is more broadly distributed between 90° and
160° and centred around 129°.
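
The N-H---O angle referred to here is simply the angle at the hydrogen between the donor nitrogen and the acceptor oxygen, so it can be computed directly from atomic coordinates. The following is a minimal sketch; the function names are hypothetical and the coordinates in the usage note are made up for illustration.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees for three (x, y, z) points, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.sqrt(sum(p * p for p in v1)) * math.sqrt(sum(q * q for q in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))


def nho_angle_in_common_range(n, h, o, low=140.0, high=180.0):
    """True when the N-H---O angle lies in the 140 to 180 degree range
    reported for most N-H---O hydrogen bonds in proteins."""
    return low <= angle_deg(n, h, o) <= high
```

With a perfectly collinear arrangement such as `n=(0, 0, 0)`, `h=(1, 0, 0)`, `o=(2.9, 0, 0)` the angle is 180 degrees, the geometry described above as giving the strongest bond for a single donor and acceptor.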
The strength of a hydrogen bond is between 2 and 10 kcal/mol, and one
might think that this is the amount of energy one hydrogen bond contributes
towards stabilization of a folded protein. However, in the unfolded state, all
potential hydrogen bonding partners in the extended polypeptide chain are
satisfied by hydrogen bonds to water. When the protein folds, these
protein-to-water H-bonds are broken, and only some are replaced by (often
sub-optimal) intra-protein H-bonds. McDonald & Thornton (1994) showed that
while only 1.3% of backbone amino groups and 1.8% of carbonyl groups in
proteins fail to H-bond (without any obviously compensating interactions),
80% of main chain carbonyls fail to form a second hydrogen bond. Thus, if
one considers enthalpy terms alone, it would appear that hydrogen bonding
is destabilizing to folded protein structure.
However, one must also consider entropy. When a protein folds, and those
hydrogen bonds that the protein made to bulk water are broken, the entropy
of the solvent increases. The balance between the entropy and enthalpy
terms is close, and in the recent past it was considered that H-bonds made
no contribution overall to protein stability. However, it is now generally
accepted that H-bonds make a positive contribution to protein stabilisation
(reviewed in Pace et al., 1996).
Despite the small contribution made to protein stability by hydrogen bonds,
it should be kept in mind that if an intramolecular hydrogen bond is broken
or deleted in a protein without the possibility of forming a compensating
H-bond to solvent, that protein will undergo destabilization. In globular
proteins, much of the H-bonding potential of the backbone amide and
carbonyl groups is satisfied by the formation of regular structure such as
alpha helix and beta sheet; regular structure comprises 80-90% of globular
protein structure.
Example of stabilized protein structure conformation:
Alpha-helix:
The alpha helix is the most abundant helical conformation found in globular
proteins accounting for 32-38% of all residues (Kabsch & Sander, 1983;
Creighton, 1993). The average length of an alpha helix is 10 residues as
taken from surveys of this structural database. The average dihedral angles
phi and psi (-64 +/- 7, -41 +/- 7) also obtained from these surveys are found
to differ slightly from the geometrically pure alpha helix (-57.8, -47.0). The
abundance of this particular form of secondary structure stems from the
following properties:
the phi and psi angles of the alpha helix lie in the center of an
allowed, minimum energy region of the Ramachandran (phi, psi) map;
the dipoles of hydrogen bonding backbone atoms are in near perfect
alignment;
the radius of the helix allows for favorable van der Waals interactions
across the helical axis;
side chains are well staggered, minimizing steric interference.

Looking at the helix along the helical axis from the C-terminus (top), you
can see the four carbonyl oxygens of the last turn of the helix and the
dispersion of sidechains. Residues in positions (i, i+3) and (i, i+4) are
positioned in such a way as to force interaction of their sidechains. This can
have a stabilizing effect if the residues are of opposite charge or are both
hydrophobic. Interaction between aromatic rings (Phe) at position (i) and His
at position (i+4) appears to have a stabilizing effect on the helical
conformation of the C-peptide of ribonuclease in solution (Armstrong et al.,
1993).

Helical wheel (A) and helical net (B) representations for locating
amphiphilic helices and other intra-helical interactions. Arrows in (B)
indicate the (i, i+3) and the (i, i+4) sidechain interactions enforced by
the helical conformation.
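
A helical wheel places successive residues 100 degrees apart around the helix axis (360 degrees divided by 3.6 residues per turn), which is why residues i, i+3 and i+4 end up on the same face. The sketch below computes wheel positions and a crude amphiphilicity score. The hydrophobic set is taken from the residues marked H in Table 1, while the scoring function and its name are assumptions made for illustration.

```python
import math

# Residues marked H (hydrophobic) in Table 1: Ala, Ile, Leu, Met, Phe, Pro, Val.
HYDROPHOBIC = set("AILMFPV")


def wheel_angles(sequence, step_deg=100.0):
    """Angular position (degrees) of each residue on a helical wheel,
    assuming an ideal alpha helix with 3.6 residues per turn."""
    return [(i * step_deg) % 360.0 for i in range(len(sequence))]


def hydrophobic_moment(sequence, step_deg=100.0):
    """Length of the vector sum of unit vectors pointing at each hydrophobic
    residue on the wheel, divided by the sequence length.

    Values near zero mean hydrophobic residues are spread evenly around the
    helix (or absent); larger values mean they cluster on one face.
    """
    x = y = 0.0
    for residue, angle in zip(sequence.upper(), wheel_angles(sequence, step_deg)):
        if residue in HYDROPHOBIC:
            x += math.cos(math.radians(angle))
            y += math.sin(math.radians(angle))
    return math.hypot(x, y) / len(sequence)
```

For a made-up peptide such as `"LKKLLKKLLKKL"`, where leucines recur at i, i+3 and i+4 spacings, the score is clearly higher than for a uniformly hydrophobic peptide of the same length, which is the pattern a helical wheel is used to spot by eye.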

Beta Sheets:
Besides the alpha helix, beta sheets are the other major structural
element in globular proteins, containing 20-28% of all residues (Kabsch &
Sander, 1983; Creighton, 1993). The extended conformation of the
polypeptide strands composing a beta sheet was already proposed in the
1930's from diffraction data (1.0) but researchers had to wait until the X-ray
crystal structure of lysozyme was solved before getting a look at one in a
globular protein. The basic unit of a beta sheet is a beta strand (which can be
thought of as a helix with n = 2 residues/turn) with approximate backbone
dihedral angles phi = -120 and psi = +120 producing a translation of 3.2 to
3.4 Angstroms/residue for residues in antiparallel and parallel strands,
respectively. The beta strand is then, like the alpha helix, a repeating
secondary structure. However, since there are no intrasegment hydrogen
bonds and van der Waals interactions between atoms of neighboring residues
are not significant due to the extended nature of the chain, this extended
conformation is only stable as part of a beta sheet where contributions from
hydrogen bonds and van der Waals interactions between aligned strands
exert a stabilizing influence. The beta sheet is sometimes called the beta
"pleated" sheet since sequentially neighboring CA atoms are alternately
above and below the plane of the sheet giving a "pleated" appearance.
Beta sheets are found in two forms designated as "Antiparallel" or "Parallel"
based on the relative directions of two interacting beta strands. The average
length of a beta sheet is about 6 residues and most beta sheets contain less
than 6 strands. Side chains from adjacent residues of a strand in a beta sheet
are found on opposite sides of the sheet and do not interact with one another.
Therefore, like alpha-helices, beta-sheets have the potential for
amphiphilicity with one face polar and the other apolar. However, unlike
alpha-helices, which are composed of residues from a continuous
polypeptide segment (i.e., hydrogen bonds between CO of residue i and NH
of residue i+4), beta sheets are formed from strands that are very often from
distant portions of the polypeptide sequence. Hydrogen bonds in beta sheets
are on average 0.1 Angstrom shorter than those found in alpha helices
(Baker & Hubbard, 1984).
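Because side chains of adjacent residues point to opposite sides of the sheet, the amphiphilicity of a strand can be checked by splitting its residues into alternating positions. The sketch below is a minimal illustration, assuming the apolar (H) class from Table 1 and a made-up seven-residue strand; the function name and the sequence are hypothetical, and which face is actually buried depends on the structure.

```python
# One-letter codes of residues classed as apolar (H) in Table 1.
APOLAR = set("AILMFPV")


def face_apolar_fractions(strand):
    """Return the apolar fraction of each face of a beta strand.

    Residues at positions 0, 2, 4, ... point to one face and residues at
    positions 1, 3, 5, ... point to the other. Hypothetical helper.
    """
    strand = strand.upper()
    faces = (strand[0::2], strand[1::2])
    fractions = []
    for face in faces:
        if not face:
            fractions.append(0.0)
        else:
            fractions.append(sum(res in APOLAR for res in face) / len(face))
    return tuple(fractions)


# Made-up strand with apolar residues at every other position.
print(face_apolar_fractions("VKVTVEV"))
```

A large difference between the two fractions suggests one apolar and one polar face, the pattern expected for a strand at the protein surface.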
Conformational Entropy of Unfolding:
The factor that makes the greatest contribution to stabilization of the
unfolded state is its conformational entropy. It has been proposed that
decreasing the conformational flexibility of the unfolded chain (by
substitution with proline, or by replacement of glycine) should lead to an
increase in the stability of the folded relative to the unfolded protein
(Matthews et al., 1987). (See also disulphide bonds, below.)
Watanabe et al. (1994) have found a correlation between the number of
proline residues in oligo-1,6-glucosidases from a number of bacterial species
and their thermostability. Structural analysis suggests that the optimal
placement of proline is at the N-cap of alpha helices and at the second
position of beta type I and beta type II turns.
Chemical Degradation:

It should be mentioned that however stable a protein can be made by
stabilizing the folded state, the ultimate limit of protein stability must come
from covalent degradation. At high temperatures (80-120 °C) Asn and Gln
are susceptible to deamidation, Asp-Xaa peptide bonds are susceptible to
hydrolysis, disulphide bonds rupture, and Xaa-Pro peptide bonds undergo
cis-trans isomerisation (where Xaa is any amino acid).
Interestingly, the upper limit for protein chemical thermostability may be
higher than one would calculate from studies involving model mesophilic
enzymes. Apart from the trivial response of just avoiding these residues
(disulphides are absent and Asn/Gln content is reduced in
hyperthermophiles), it is observed that the deamidation rate of Gln and Asn
residues is reduced, presumably by steric constraint, in fully folded
hyperthermophilic structures (Vieille & Zeikus, 1996 and references therein).
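The sequence features listed above can be located with a simple scan. The following is a minimal sketch, assuming a one-letter amino acid sequence; the function name and the test peptide are hypothetical. It flags only candidate sites from sequence alone, so it says nothing about the steric protection in the folded structure, and it cannot identify disulphide bonds, which require structural information.

```python
def find_labile_sites(seq):
    """List candidate sites of covalent degradation at high temperature.

    Returns (position, residues, reaction) tuples with 1-based positions.
    Hypothetical helper covering the sequence-level features named in the text.
    """
    seq = seq.upper()
    sites = []
    for i, res in enumerate(seq):
        pos = i + 1
        # Asn and Gln side chains are susceptible to deamidation.
        if res in "NQ":
            sites.append((pos, res, "deamidation"))
        if i + 1 < len(seq):
            nxt = seq[i + 1]
            # The peptide bond following Asp is susceptible to hydrolysis.
            if res == "D":
                sites.append((pos, res + nxt, "Asp-Xaa hydrolysis"))
            # The peptide bond preceding Pro can isomerise.
            if nxt == "P":
                sites.append((pos, res + nxt, "Xaa-Pro cis-trans isomerisation"))
    return sites


# Made-up peptide for illustration.
for site in find_labile_sites("ANDPG"):
    print(site)
```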
Evolution and Protein Stability:
The evolutionary pressure is for a protein (or a car) to be strong enough to
perform its function, but no stronger. If a protein accumulates a stabilizing
mutation (which has no other effects), without evolutionary pressure to
retain the added stability, it will likely soon accumulate a destabilizing mutation.
However, if this protein accumulates a destabilizing mutation which
compromises its ability to function, natural selection will rapidly remove it.
One example suggestive of this comes from the mutation of barnase to a close
homolog called binase (Serrano et al., 1993). These proteins have 17
differences out of 110 amino acids, and highly superimposable structures.
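To put the 17 differences in 110 residues in more familiar terms, the sketch below converts a difference count into percent identity and shows how such a count would be obtained from two aligned sequences. The function names and the short toy sequences are hypothetical; they are not the barnase or binase sequences.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(n_differences, length):
    """Percent of positions that are identical."""
    return 100.0 * (length - n_differences) / length


# Toy aligned sequences, for illustration only.
print(count_differences("ACDEF", "ACDKF"))

# Figures quoted in the text: 17 differences out of 110 residues.
print(f"{percent_identity(17, 110):.1f}% identity")
```

With the figures from the text, this gives roughly 84.5% sequence identity.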
Conclusions:
It is the sum of these various stabilising and destabilising interactions that
gives rise to the final stability of a protein. The total destabilising and the
total stabilising energies are both large, and their difference is small. This is
one of the reasons that current computational methods struggle to predict
protein stability from structure.
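The difficulty can be illustrated with simple arithmetic. In the sketch below, the energy totals and the 5% relative error are invented numbers in arbitrary energy units, chosen only to show how a small error in two large opposing terms swamps their small difference; the function name is hypothetical.

```python
def net_stability_range(stabilising, destabilising, rel_error):
    """Return (net, lowest, highest) net energy given a relative error per term.

    stabilising is negative and destabilising is positive, in arbitrary units.
    Each total is assumed to be known only to within +/- rel_error.
    """
    net = stabilising + destabilising
    # Most favourable case: stabilisation overestimated, destabilisation underestimated.
    lowest = stabilising * (1 + rel_error) + destabilising * (1 - rel_error)
    # Least favourable case: the opposite.
    highest = stabilising * (1 - rel_error) + destabilising * (1 + rel_error)
    return net, lowest, highest


# Invented totals: -200 stabilising, +190 destabilising, each known to 5%.
net, lowest, highest = net_stability_range(-200.0, 190.0, 0.05)
print(f"net = {net:.1f}, range = {lowest:.1f} to {highest:.1f}")
```

With these invented values the nominal net energy is -10 units, but the plausible range runs from about -29.5 to +9.5, so even the sign of the predicted stability is uncertain.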
Furthermore, our understanding of these many forces is incomplete; as often
as not, mutations have the opposite effect to that which had been predicted.
In addition, the activation energy for folding is an important determinant of
both kinetic stability and whether a protein will fold to a global minimum.
