Introduction:
One of the major goals of bioinformatics is to understand the relationship
between amino acid sequence and three-dimensional structure in proteins. If
this relationship were known, then the structure of a protein could be
reliably predicted from the amino acid sequence.
Proteins are chains of amino acids joined by peptide bonds. The common
amino acids share a general structure built around a central alpha (α)
carbon atom (Cα), to which a carboxylic acid group, an amino group and a
hydrogen atom are covalently bonded. In addition, the α-carbon atom binds
a side chain group, designated R, that uniquely defines each of the 20
standard amino acids. The R side chains also play an important structural
role.
Table 1: Properties of amino acids
Name             3-letter  Single-letter  Relative abundance  MW    pK     VdW volume  Charged (C), Polar (P),
(Residue)        code      code           (%) E.C.                         (Å³)        Apolar (H)
Alanine          ALA       A              13.0                71           67          H
Arginine         ARG       R              5.3                 157   12.5   148         C+
Asparagine       ASN       N              9.9                 114          96          P
Aspartate        ASP       D              9.9                 114   3.9    91          C-
Cysteine         CYS       C              1.8                 103          86          P
Glutamate        GLU       E              10.8                128   4.3    109         C-
Glutamine        GLN       Q              10.8                128          114         P
Glycine          GLY       G              7.8                 57           48
Histidine        HIS       H              0.7                 137   6.0    118         P,C+
Isoleucine       ILE       I              4.4                 113          124         H
Leucine          LEU       L              7.8                 113          124         H
Lysine           LYS       K              7.0                 129   10.5   135         C+
Methionine       MET       M              3.8                 131          124         H
Phenylalanine    PHE       F              3.3                 147          135         H
Proline          PRO       P              4.6                 97           90          H
Serine           SER       S              6.0                 87           73          P
Threonine        THR       T              4.6                 101          93          P
Tryptophan       TRP       W              1.0                 186          163         P
Tyrosine         TYR       Y              2.2                 163   10.1   141         P
Valine           VAL       V              6.0                 99           105         H
Many conformations of the chain are possible due to rotation of the chain
about each Cα atom. It is these conformational variations that are
responsible for differences in the three-dimensional structures of proteins.
Each amino acid in the chain is polar, i.e., it has separated positively and
negatively charged regions, with a chemically free C=O group, which can act
as a hydrogen bond acceptor, and an NH group, which can act as a hydrogen
bond donor. These groups interact in protein structures.
Fig 2: The structure of two amino acids in a polypeptide chain.
The bonds on either side of the Cα atom are quite free to rotate, but many
combinations of angles are not possible for most amino acids due to spatial
constraints from the R group and neighboring positions in the chain. The
conformation of the protein backbone in space is determined by the angles of
these bonds: φ (phi) of the bond between the N and Cα atoms, and ψ (psi) of
the bond between the Cα and the C of the C=O group, also named C'. All
possible conformations of a polypeptide chain can be described in terms of
their φ, ψ conformational angles, a description that automatically takes
account of the fixed geometric features of the polypeptide backbone. The
distribution of these two angles for the amino acids in a particular protein
is often plotted on a graph called a Ramachandran plot. The angle ω of the
peptide bond joining the C=O and NH groups (not shown) is nearly always 180°.
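As an illustration of how the φ and ψ angles described above can be obtained from atomic coordinates, the following is a minimal sketch in Python. It assumes that each atom is given as an (x, y, z) tuple in Å; the function names `dihedral`, `phi` and `psi` are hypothetical helpers introduced here, not part of any particular library. The sign follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2))
    x = b2_len * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def phi(c_prev, n, ca, c):
    """phi of residue i: C'(i-1), N(i), CA(i), C'(i)."""
    return dihedral(c_prev, n, ca, c)

def psi(n, ca, c, n_next):
    """psi of residue i: N(i), CA(i), C'(i), N(i+1)."""
    return dihedral(n, ca, c, n_next)
```

Computing `phi` and `psi` for every residue of a structure and plotting one against the other gives the Ramachandran plot mentioned in the text.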
The side chains define the structures of the different amino acids:
1. Amino acids having uncharged side chain R groups at physiological
pH:
a. Category of alkyl amino acids:
Glycine has the simplest structure, with R=H. It does not have a side chain
and can therefore increase local flexibility in structures.
Alanine contains a methyl (CH3-) R group.
Valine has an isopropyl R group.
Leucine and isoleucine have R groups that are butyl alkyl chains and are
structural isomers of each other. In leucine, the branching methyl group in
the isobutyl side chain occurs on the gamma (γ)-carbon of the amino acid.
In isoleucine, the butyl side chain is branched at the β-carbon.
b. Category of aromatic amino acids:
Phenylalanine has an R group that contains a benzene ring.
Tyrosine contains a phenol group in the side chain.
Tryptophan has an R group that contains a heterocyclic structure known
as an indole.
In the three aromatic amino acids the aromatic moiety is attached to the
α-carbon through a methylene (-CH2-) carbon.
c. Sulphur-containing common amino acids category:
Cysteine has an R group that is a thiolmethyl group (HSCH2-). Cysteine can
react with another cysteine to form a cross-link that can stabilize the
protein structure.
Amino acid      Relative hydrophobicity(a)  Fraction of time found buried
                (kcal mol-1)                in protein str.(b)
Alanine         0.5                         0.38
Arginine        (-11.2)                     0.01
Asparagine      -0.2                        0.12
Aspartic acid   (-7.4)                      0.15
Cysteine        (-2.8)                      0.40(c)
Glycine         0                           0.36
Glutamine       -0.3                        0.07
Glutamic acid   (-9.9)                      0.18
Histidine       0.5                         0.17
Isoleucine      2.5                         0.60
Leucine         1.8                         0.45
Lysine          (-4.2)                      0.03
Methionine      1.8                         0.40
Phenylalanine   2.5                         0.50
Proline         (-3.3)                      0.18
Serine          -0.3                        0.22
Threonine       0.4                         0.23
Tryptophan      3.4                         0.27
Tyrosine        2.3                         0.15
Valine          1.5                         0.54
(a) Hydrophobicities measured by the distribution of the amino acid between
a nonpolar solvent, either ethanol or dioxane, and water. Positive values
indicate a preference for the nonpolar solvent.
(b) An amino acid residue must be at least 95% buried within the interior of
the protein structure in order to be counted as buried.
(c) Value is for Cys in thiol form. Value for disulfide form is 0.50.
The scale is based on glycine being zero. The amino acid side chains that
preferentially dissolve in the nonpolar solvent have a positive
hydrophobicity value and it is found ---- with the more positive value, the
Quaternary structure of Protein Kinase C Interacting Protein.
Chain B of Protein Kinase C Interacting Protein. Helices are visualized as
ribbons and extended strands of beta sheets as broad arrows.
α-Helix:
Fig 3: The α-helix and β-sheets of protein secondary structure. (A) In the
α-helix, note that each C=O group at amino acid position n in the sequence
is hydrogen-bonded with the NH group at position n+4. There are 3.6 residues
per turn. The helix is usually right-handed, but short sections of 3-5 amino
acids of left-handed helix occur occasionally. The average φ and ψ angles of
amino acids in the right-handed helix are approximately -60° and -40°,
respectively. (B) The β-sheet is made up of strands that are portions of the
protein chain. The strands may run in the same (parallel) or opposite
(antiparallel) directions.
Helices buried in the protein core or in cellular membranes have a higher
and more regular distribution of hydrophobic amino acids, and this
distribution is highly predictive of such structures. Helices exposed on the
surface have a lower proportion of hydrophobic amino acids. Amino acid
content can be predictive of an α-helical region. Regions richer in alanine
(A), glutamic acid (E), leucine (L), and methionine (M) and poorer in
proline (P), glycine (G), tyrosine (Y), and serine (S) tend to form an
α-helix. Proline destabilizes or breaks an α-helix but can be present in
longer helices, forming a bend. There are computer programs (for example,
the garnier program of EMBOSS) that predict quite reliably the general
location of α-helices in a protein sequence.
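The composition rule in the paragraph above can be turned into a very rough sliding-window score. The sketch below is only an illustration of that rule and is not the algorithm used by the garnier program: it counts residues from the helix-favoring group (A, E, L, M) and subtracts residues from the helix-disfavoring group (P, G, Y, S), using the one-letter codes given in the text. The function name and the default window length of 9 are assumptions made for this example.

```python
HELIX_RICH = set("AELM")  # residues the text lists as enriched in helices
HELIX_POOR = set("PGYS")  # residues the text lists as depleted in helices

def helix_composition_score(sequence, window=9):
    """Return one score per window: (# helix-rich) - (# helix-poor)."""
    sequence = sequence.upper()
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        rich = sum(1 for aa in segment if aa in HELIX_RICH)
        poor = sum(1 for aa in segment if aa in HELIX_POOR)
        scores.append(rich - poor)
    return scores
```

Windows with strongly positive scores are candidates for helical regions; a real predictor would weight each residue type and take its neighbors into account.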
β-Sheet:
Besides alpha helices, beta sheets are another major structural element in
globular proteins, containing 20-28% of all residues. The backbone of the
polypeptide chain is extended into a zigzag rather than a helical structure.
In fibroin the zigzag polypeptide chains are arranged side by side to form a
structure resembling a series of pleats, which is called a pleated sheet.
The basic unit of a beta sheet is a beta strand (which can be thought of as a
helix with n = 2 residues/turn) with approximate backbone dihedral angles
phi = -120° and psi = +120°, producing a translation of 3.2 to 3.4
Angstroms/residue for residues in antiparallel and parallel strands,
respectively. The beta strand is then, like the alpha helix, a repeating
secondary structure.
However, since there are no intrasegment hydrogen bonds and van der Waals
interactions between atoms of neighboring residues are not significant due to
the extended nature of the chain, this extended conformation is only stable
as part of a beta sheet, where contributions from hydrogen bonds and van der
Waals interactions between aligned strands exert a stabilizing influence.
The reason why the beta sheet is sometimes called the beta "pleated" sheet is
that sequentially neighboring Cα atoms are alternately above and below the
plane of the sheet, giving a "pleated" appearance.
Beta sheets are found in two forms, designated as antiparallel or parallel,
based on the relative directions of two interacting beta strands. The average
length of a strand in a beta sheet is about 6 residues and most beta sheets
contain fewer than 6 strands. Side chains from adjacent residues of a strand
in a beta sheet are found on opposite sides of the sheet and do not interact
with one another. Therefore, like α-helices, β-sheets have the potential for
amphiphilicity, with one face polar and the other apolar.
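Because successive side chains of a strand point to opposite faces of the sheet, an amphiphilic strand shows up as an alternation of apolar and polar residues. The following sketch makes that idea concrete. It is an assumption-laden illustration: the apolar set is taken from the residues marked H in Table 1, and the function names and the 0.5 threshold are hypothetical choices for this example.

```python
APOLAR = set("AILMFPV")  # residues marked H (apolar) in Table 1

def strand_face_apolar_fractions(strand):
    """Fraction of apolar residues on each face (even and odd positions)."""
    strand = strand.upper()
    faces = (strand[0::2], strand[1::2])
    return tuple(
        sum(1 for aa in face if aa in APOLAR) / len(face) if face else 0.0
        for face in faces
    )

def looks_amphiphilic(strand, min_difference=0.5):
    """True when one face is much richer in apolar residues than the other."""
    face_a, face_b = strand_face_apolar_fractions(strand)
    return abs(face_a - face_b) >= min_difference
```

For example, a strand such as VTVSVK places all of its valines on one face and its polar or charged residues on the other, so it is flagged as amphiphilic.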
However, unlike α-helices, which are comprised of residues from a
continuous polypeptide segment (i.e., hydrogen bonds between CO of
residue i and NH of residue i+4), β-sheets are formed from strands that are
very often from distant portions of the polypeptide sequence. Hydrogen
bonds in β-sheets are on average 0.1 Angstrom shorter than those found in
alpha helices.
Half of the backbone hydrogen bond donors and acceptors in strands at the
edges of a β-sheet are uninvolved in the sheet interactions. Unlike the
situation with alpha helices (helix capping), capping residues in beta sheets
have not yet been characterized. Sheets can fold onto themselves forming a
barrel or cylinder, thus eliminating the non-hydrogen bonded "ends" (see the
TIM barrel in Fig 6).
Parallel sheets
The hydrogen bonds in a parallel β-sheet are not perpendicular to the
individual strands, resulting in a component (~1/3 of the peptide dipole)
parallel to the strand. The parallel sheet then has an overall macrodipole,
leaving an effective charge of ~ +1/15 unit elemental charge at the
N-terminus and -1/15 charge at the C-terminus of each strand of average
length (~5 times less than an average α-helix macrodipole).
An example of a mixed β-sheet containing a parallel component is given
using the protein thioredoxin.
Antiparallel sheets
A residue in an antiparallel beta strand has values of -139 and +135 degrees
for the backbone dihedral angles phi and psi, respectively. Antiparallel beta
sheets are thought to be intrinsically more stable than parallel sheets,
because the interstrand hydrogen bonds are more optimally oriented and
because the peptide bond dipoles of nearest neighbors within a strand cancel,
whereas in the parallel sheet, components of the dipoles parallel to the
strands align and may interact unfavorably.
However, in an exhaustive survey of hydrogen bonds in protein 3D
structures, no significant difference in the linearity of the hydrogen bonds in
parallel and antiparallel sheets was found. An example of a mixed beta sheet
containing an antiparallel component is shown using the protein thioredoxin
(Fig 8).
Fig 8: The three-stranded antiparallel beta sheet in the protein thioredoxin
(2TRX.PDB). The three antiparallel strands are shown in both cartoon format
(left) and in stick form containing backbone atoms N, CA, C, and O' (right).
Hydrogen bonds are identified by arrows connecting the donor nitrogen and
acceptor oxygens. Strands are numbered according to their relative position
in the polypeptide sequence.
Twists
The classical beta sheets originally proposed are planar but most sheets
observed in globular proteins are twisted (0 to 30 degrees per residue).
Antiparallel beta sheets are more often twisted than parallel sheets. This
twist is always of the same handedness, but unfortunately, it has been
described using two conflicting conventions in the literature. If defined in
terms of the angle of strand crossings, the twist is left-handed and if defined
in terms of the progressive twist of the hydrogen-bonding direction, the twist
is right-handed. Two-stranded beta sheets show the largest twists. At least
two separate explanations have been offered as reasons for the observed
twist.
One, based on statistical reasoning, suggests that the greater amount of
allowed phi, psi space to the "right" of the classical beta strand region would
produce a right-handed twist.
Another suggests a systematic distortion of the tetrahedral nitrogen, seen in
X-ray crystal structures, would favor a right-handed twist. A classical
example is the antiparallel beta strand in the bovine pancreatic trypsin
inhibitor.
Fig 9: A severely twisted
antiparallel beta hairpin from
the pancreatic trypsin
inhibitor (5PTI).
The pleated nature of the sheet becomes distinctly visible in the right panels
of this figure, which also show the side chains sticking out above and below
the sheet plane. If you look carefully, you will also notice that the sheet has
a left twist (centre panel).
Bulges
Another irregularity found in antiparallel beta sheets is the hydrogen
bonding of two residues from one strand with one residue from the other,
called a beta bulge. Bulges are most often found in antiparallel sheets, with
~5% of bulges occurring in parallel strands. Bulges, as well as turns, affect
the directionality of the polypeptide chain, and classical ones, like the one
shown in Fig 10, also accentuate the right-handed twist of the sheet.
Fig 10: A classical beta-bulge in
chymotrypsin (1CHG.PDB) involving
residues 33 and 41-42. Hydrogen bonds are
indicated with dotted lines connecting the
donor and acceptor atoms.
Strand connections
There are two basic categories of connections between the individual strands
of a beta sheet. When the backbone enters the same end of the sheet that it
left it is called a hairpin connection and when the backbone enters the
opposite end it is called a crossover connection. Crossover connections can
be thought of as a type of helical connection of the strand ends. In globular
They occur as the short loop regions between anti-parallel hydrogen bonded
β-strands. In general a reverse turn (or β-turn, as they are sometimes
called) is any region of a protein where there is a hydrogen bond involving
the carbonyl of residue i and the NH group of residue i+3. An alternative
definition states that the α-carbons of residues i and i+3 must be within
7.0 Å. The structures of reverse turns are outlined in section 1.4. Sibanda
and Thornton have devised a system for classifying β-hairpins, which is based
on two conventions for defining loop regions.
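The distance-based definition of a reverse turn given above is easy to apply to a list of Cα coordinates. The sketch below is a minimal illustration, assuming coordinates in Å as (x, y, z) tuples in sequence order; the function name is hypothetical. Note that this test alone does not exclude helical regions, whose Cα(i) to Cα(i+3) distances are also short, so a real analysis would filter those out separately.

```python
import math

def find_turn_candidates(ca_coords, cutoff=7.0):
    """Indices i where CA(i) and CA(i+3) lie within `cutoff` angstroms."""
    hits = []
    for i in range(len(ca_coords) - 3):
        if math.dist(ca_coords[i], ca_coords[i + 3]) <= cutoff:
            hits.append(i)
    return hits
```

`math.dist` requires Python 3.8 or later.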
β-hairpin loops adopt specific conformations which depend on their lengths
and sequences. Sibanda and Thornton have shown that 70% of β-hairpins
are less than 7 residues in length, with the two-residue turns forming the
most noticeable component. These two-residue β-hairpins all adopt one of
the classical reverse turn conformations with an obvious preference for types
I' and II'. Type I 2-residue hairpins also occur but with lower abundance.
This contrasts with reverse turns, where types I and II tend to dominate. In
β-hairpins the type I' turn has the correct twist to match the twist of the
β-sheet and modelling studies indicated that if either type I or type II turns were to
c) Three-residue β-hairpins
Normally, the residues at the ends of the two β-strands only make one
hydrogen bond as shown below. The intervening three residues have distinct
conformational preferences as shown in the Ramachandran plot. The first
residue adopts the right-handed α-helical conformation and the second amino
acid lies in the bridging region between α-helix and β-sheet.
Glycine, asparagine or aspartate are frequently found at the last residue
position as this adopts φ and ψ angles close to the left-handed helical
conformation.
d) Four-residue β-hairpins
These are also quite common, with the first two residues adopting the
α-helical conformation. The third residue has φ and ψ angles which lie in the
bridging region between α-helix and β-sheet, and the final residue
adopts the left-handed α-helical conformation and is therefore usually
glycine, aspartate or asparagine.
Reverse Turns
A reverse turn is a region of the polypeptide having a hydrogen bond from one
main chain carbonyl oxygen to the main chain N-H group 3 residues along
the chain (i.e. O(i) to N(i+3)). Helical regions are excluded from this
definition and turns between beta-strands form a special class of turn known
as the beta-hairpin (see later). Reverse turns are very abundant in globular
proteins and generally occur at the surface of the molecule. It has been
suggested that turn regions act as nucleation centres during protein folding.
Reverse turns are divided into classes based on the phi and psi angles of the
residues at positions i+1 and i+2. Types I and II shown in the figure below
are the most common reverse turns, the essential difference between them
being the orientation of the peptide bond between residues at (i+1) and (i+2).
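The classification by phi and psi angles at positions i+1 and i+2 can be sketched as a nearest-match lookup. The idealized angles used below are commonly quoted textbook values for types I, II, I' and II'; they are not given in this document and should be treated as assumptions, as should the 40 degree tolerance and the function names.

```python
# Idealized (phi(i+1), psi(i+1), phi(i+2), psi(i+2)) in degrees.
# Assumed textbook values, not taken from this document.
IDEAL_TURNS = {
    "I":   (-60.0,  -30.0, -90.0, 0.0),
    "II":  (-60.0,  120.0,  80.0, 0.0),
    "I'":  ( 60.0,   30.0,  90.0, 0.0),
    "II'": ( 60.0, -120.0, -80.0, 0.0),
}

def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def classify_turn(phi1, psi1, phi2, psi2, tolerance=40.0):
    """Return the best-matching turn type, or None if nothing is close."""
    observed = (phi1, psi1, phi2, psi2)
    best_type, best_dev = None, None
    for name, ideal in IDEAL_TURNS.items():
        # Worst single-angle deviation from the idealized turn.
        dev = max(angle_diff(o, i) for o, i in zip(observed, ideal))
        if best_dev is None or dev < best_dev:
            best_type, best_dev = name, dev
    return best_type if best_dev <= tolerance else None
```

The only difference between the type I and type II entries is the psi(i+1), phi(i+2) pair, which corresponds to the flipped peptide bond between residues i+1 and i+2 described in the text.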
Longer loops
For these, a wide range of conformations is observed and the general term
'random coil' is sometimes used. Consecutive anti-parallel β-strands when
linked by hairpins form a supersecondary structure known as the β-meander.
β-strands have a slight right-handed twist such that when they pack
side-by-side to form a β-sheet, the sheet has an overall left-handed
curvature. Antiparallel β-strands forming a β-hairpin can
accommodate a 90 degree change in direction known as a β-corner. The
strand on the inside of the bend often has a glycine at this position while the
other strand can have a β-bulge. The latter involves a single residue in the
right-handed α-helical conformation, which breaks the hydrogen bonding
pattern of the β-sheet. This residue can also be in the left-handed helical or
bridging regions of the Ramachandran plot.
β-corners are observed to have a right-handed twist when viewed from the
concave side.
Helix hairpins
A helix hairpin or αα-hairpin refers to the loop connecting two anti-parallel
α-helical segments. Clearly, the longer the loop, the greater the number of
possible conformations. However, for short connections there are a limited
number of conformations and for the shortest loops of two or three residues,
there is only one allowed conformation. Antiparallel α-helices will generally
interact by hydrophobic interactions between side chains at the interface.
Therefore, hydrophobic amino acids have to be appropriately positioned in the
amino acid sequence (one per turn of each helix) to generate a hydrophobic
core. Efimov has analyzed the conformations of αα-hairpins and some of his
results are summarized below.
The shortest α-helical connections involve two residues, which are oriented
approximately perpendicular to the axes of the helices. Analysis of known
structures reveals that the first of these two residues adopts φ and ψ angles
in the bridging or α-helical regions of the Ramachandran plot. The second
residue is always glycine and is in a region of the Ramachandran plot with
positive phi, which is not available to other amino acids.
Three-residue loops are also observed to have conformational preferences.
The first residue occupies the bridging region of the Ramachandran plot, the
second adopts the left-handed helical conformation and the last residue is in
a β-strand conformation.
Four-residue loops adopt one of two possible conformations. One is similar
to the three-residue loop conformation described above except that there is
an additional residue in the β-strand conformation at the fourth position. The
other conformation involves the four residues adopting bridging, , bridging,
and conformations, respectively.
The αα-corner
Short loop regions connecting helices, which are roughly perpendicular to
one another, are referred to as αα-corners. Efimov has shown that the
shortest αα-corner has its first residue in the left-handed α-helical
conformation and the next two residues in β-strand conformations. This
conformation can only be adopted when the two helices form a right-handed
corner. Indeed, if the helices were linked to form a left-handed corner there
would be steric hindrance. This may explain the scarcity of left-handed
corners in protein X-ray structures.
The C-terminal residue of the first helix, which is in the left-handed
helical conformation, must have a short side chain to avoid steric hindrance
and is observed commonly to be glycine. The first residue of the second
helix, which is in the β-conformation, frequently has a small polar side chain
such as serine or aspartate, which can form hydrogen bonds with the free
NH groups at the amino-terminal end of the second helix. The central
residue of the αα-corner is almost always hydrophobic as it is buried and
interacts with other non-polar side chains buried where the ends of the two
helices contact each other.
Helix-turn-helix
The loop regions connecting α-helical segments can have important
functions. For example, in parvalbumin there is a helix-turn-helix motif,
which appears three times in the structure. Two of these motifs are involved
in binding calcium by virtue of carboxyl side chains and main chain
carbonyl groups. This motif has been called the EF hand as one is located
between the E and F helices of parvalbumin. It now appears to be a
ubiquitous calcium-binding motif present in several other calcium-sensing
proteins such as calmodulin and troponin C.
EF hands are made up from a loop of around 12 residues, which has polar
and hydrophobic amino acids at conserved positions. These are crucial for
ligating the metal ion and forming a stable hydrophobic core. Glycine is
invariant at the sixth position in the loop for structural reasons. Carboxyl
side chains, main chain groups and bound solvent octahedrally coordinate
the calcium ion.
ion through hydrogen bonds. In another common motif, the zinc finger, three
secondary structures (an α helix and two β strands with an antiparallel
orientation) form a fingerlike bundle held together by a zinc ion. This motif
is most commonly found in proteins that bind RNA or DNA.
Additional motifs will be examined in discussions of other proteins. The
presence of the same motif in different proteins with similar functions
clearly indicates that during evolution these useful combinations of
secondary structures have been conserved.
binds to cells in the skin and connective tissue, causing them to divide. It is
generated by proteolytic cleavage between repeated EGF domains in the
EGF precursor protein, which is anchored in the cell membrane by a
membrane-spanning domain. Six conserved cysteine residues form three
pairs of disulfide bonds that hold EGF in its native conformation. The EGF
domain also occurs in other proteins, including tissue plasminogen activator
(TPA), a protease that is used to dissolve blood clots in heart attack victims;
Neu protein, which is involved in embryonic differentiation; and Notch
protein, a cell-adhesion molecule that glues cells together. Besides the EGF
domain, these proteins contain additional domains found in other proteins.
For example, TPA possesses a chymotryptic domain, a common feature in
proteins that catalyze proteolysis.
the first and last strands touching. Examples are enzymes, transport
proteins, antibodies, and virus coat proteins such as neuraminidase.
3. Class α/β comprises mainly parallel β sheets with intervening
α helices, but may also have mixed β sheets. In addition to forming a
sheet in some proteins in this class, parallel β strands in others may
form into a barrel structure that is surrounded by α helices. This class
of proteins includes many metabolic enzymes.
4. Class α+β comprises mainly segregated α helices and antiparallel
β sheets.
5. Multidomain (α and β) proteins comprise domains representing more
than one of the above four classes.
6. Membrane and cell-surface proteins and peptides, excluding proteins
of the immune system, comprise this class.
Within these broad categories, protein structures show a variety of folding
patterns. Among proteins with similar folding patterns, there are families
that share enough features of structure, sequence and function to suggest
an evolutionary relationship. However, unrelated proteins often show similar
structural themes.
Classification of protein structures occupies a key position in
bioinformatics, not least as a bridge between sequence and function.
[Figure panels: helix-turn-helix motif; a class of folds referred to as
alpha + beta; a single helix and beta sheet arrangement; alpha/beta barrels;
small disulphide-rich proteins, which have few helices and sheets.]
In these cases, the stability of the protein is simply the difference in Gibbs
free energy, ΔG, between the folded and the unfolded states. The only factors
affecting stability are the relative free energies of the folded (Gf) and the
unfolded (Gu) states, so that ΔGu = Gu - Gf. The larger and more positive
ΔGu, the more stable is the protein to denaturation.
The Gibbs free energy, G, is made up of the two terms enthalpy (H) and
entropy (S), related by the equation:
G = H - TS
where T is the temperature in Kelvin.
The folding free energy difference, ΔGu, is typically small, of the order of
5-15 kcal/mol for a globular protein (compared to e.g. ~30 - 100 kcal/mol for
a covalent bond).
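To give a feeling for what a free energy difference of 5-15 kcal/mol means, the sketch below converts the unfolding free energy into the fraction of molecules that are folded at equilibrium. It assumes a simple two-state (folded/unfolded) equilibrium with K = exp(-ΔGu/RT), which is a standard textbook relation rather than something derived in this text; the function names and the default temperature of 298.15 K are choices made for this example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def unfolding_equilibrium_constant(delta_g_u, temperature=298.15):
    """K_u = [unfolded]/[folded] for a two-state model (delta_g_u in kcal/mol)."""
    return math.exp(-delta_g_u / (R_KCAL * temperature))

def fraction_folded(delta_g_u, temperature=298.15):
    """Equilibrium fraction of molecules in the folded state."""
    k_u = unfolding_equilibrium_constant(delta_g_u, temperature)
    return 1.0 / (1.0 + k_u)

if __name__ == "__main__":
    for dg in (0.0, 5.0, 10.0, 15.0):
        print(f"dGu = {dg:5.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.6f}")
```

Under this model a ΔGu of zero corresponds to half of the molecules being folded, and even the low end of the quoted range already leaves only a very small unfolded population at room temperature.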
The easiest proteins in which to study folding and stability are those that
exhibit this sort of rapid but reversible unfolding. Both experimental design
and theoretical treatment of data are simplified for such systems. Thus, it is
no surprise that most of the literature about stability discusses this type
of reversible system. The bulk of this dissertation will also focus on
thermodynamic stability.
In the case of irreversible or slowly unfolding proteins, it is kinetic stability
or the rate of unfolding that is important. A protein that is kinetically stable
will unfold more slowly than a kinetically unstable protein. In a kinetically
stable protein, a large free energy barrier to unfolding is required and the
factors affecting stability are the relative free energies of the folded (Gf) and
the transition state (Gts) for the first committed step on the unfolding
pathway. Kinetic stability is discussed in more detail in its own section; see
Kinetic Stability. Irreversible loss of protein folded structure is represented
by:
the radius of the helix allows for favorable van der Waals interactions
across the helical axis.
side chains are well staggered minimizing steric interference.
Looking at the helix along the helical axis from the C-terminus (top), you
can see the four carbonyl oxygens of the last turn of the helix and the
dispersion of sidechains. Residues in positions (i, i+3) and (i, i+4) are
positioned in such a way as to force interaction of their sidechains. This can
have a stabilizing effect if the residues are of opposite charge or are both
hydrophobic. Interaction between aromatic rings (Phe) at position (i) and His
at position (i+4) appears to have a stabilizing effect on the helical
conformation of the C-peptide of ribonuclease in solution (Armstrong et al.,
1993).
Helical wheel (A) and helical net (B) representations for locating
amphiphilic helices and other intra-helical interactions. Arrows in (B)
indicate the (i, i+3) and the (i, i+4) sidechain interactions enforced by
the helical conformation.
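The helical wheel representation follows directly from the 3.6 residues per turn of the α-helix: each residue is rotated by 360/3.6 = 100 degrees relative to the previous one. The sketch below, with hypothetical function names, computes these wheel positions and shows why residues at (i, i+3) and (i, i+4) end up close together on the same side of the helix.

```python
RESIDUES_PER_TURN = 3.6
DEGREES_PER_RESIDUE = 360.0 / RESIDUES_PER_TURN  # 100 degrees per residue

def wheel_angle(index):
    """Angular position (degrees, 0-360) of residue `index` on the wheel."""
    return (index * DEGREES_PER_RESIDUE) % 360.0

def angular_separation(offset):
    """Smallest angle between residue i and residue i+offset on the wheel."""
    angle = wheel_angle(offset)
    return min(angle, 360.0 - angle)

def helical_wheel(sequence):
    """List of (residue, wheel angle) pairs for a helical sequence."""
    return [(aa, wheel_angle(i)) for i, aa in enumerate(sequence)]
```

With these definitions the offsets 3 and 4 give separations of about 60 and 40 degrees, whereas offsets 1 and 2 give 100 and 160 degrees, which is why the (i, i+3) and (i, i+4) pairs are the ones brought into contact.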
Beta Sheets:
Besides the alpha helix, beta sheets are another major structural
element in globular proteins, containing 20-28% of all residues (Kabsch &
Sander, 1983; Creighton, 1993). The extended conformation of the
polypeptide strands composing a beta sheet was already proposed in the
1930's from diffraction data (1.0) but researchers had to wait until the X-ray
crystal structure of lysozyme was solved before getting a look at one in a
globular protein. The basic unit of a beta sheet is a beta strand (which can be
thought of as a helix with n = 2 residues/turn) with approximate backbone
dihedral angles phi = -120 and psi = +120 producing a translation of 3.2 to