Zhang Wenbin
Fulfilled by student: Do Thi Van Thanh ( 杜氏去清 )
ID number: 6160102907
Homework 1
Novel Methods For Enzyme Structure And Function Relationship Study
Enzymes, as biological catalysts, are critical for life, and a significant proportion (approximately 45%)
of gene products are annotated as having an enzyme function. Without enzyme catalysis, most reactions
would be too slow to be useful for life, although not all reactions in nature require catalysis (Zhu and Lai
2009). Moreover, enzymes are often the targets for pharmaceutical drug development, with a large number
of approved drugs acting to modify the behavior of enzymes implicated in human disease as well as in
disease-causing pathogens. Nature has produced a diverse array of enzymes, all of which have the same
overall goal: to catalyze a particular reaction. Although, in a general sense, the function is conserved,
enzymes have developed a myriad of structural intricacies to accomplish this task, often referred to as
structure-function relationships (Ouzounis, Coulson et al. 2003). Understanding these relationships
requires a comprehensive knowledge of both the catalytic mechanism and the structural features that make
the reaction possible.
Pegg et al. (2006) have developed a resource, the Structure-Function Linkage Database (SFLD), to
analyze these structure-function relationships. Unique to the SFLD is its hierarchical classification scheme
based on linking the specific partial reactions (or other chemical capabilities) that are conserved at the
superfamily, subgroup, and family levels with the conserved structural elements that mediate them. They
present the results of analyses using the SFLD in correcting misannotations, guiding protein engineering
experiments, and elucidating the function of recently solved enzyme structures from the structural
genomics initiative. A full understanding of enzyme structure-function relationships requires a mapping
of specific structural features to specific aspects of chemical mechanisms. Grouping enzymes that share
conserved structural features that perform a common aspect of a chemical mechanism is a step in this
direction. It lets us observe how the overall functions have diverged and which structural elements may
be responsible for the less conserved aspects of the reactions, as well as identify those responsible for
shared chemical capabilities at the subgroup or superfamily level. The SFLD provides not only structure-
function information but is organized around a functional fold superfamily paradigm that uses precisely
this grouping. Table 1 shows all of the structural genomics initiative targets that matched at least one
hidden Markov model (Pearson and Eddy 2011) in the SFLD and the level(s) of granularity (superfamily,
subgroup, or family) at which the target's function can be described.
Table 1. Structures Solved by the Structural Genomics Initiative that Match Hidden Markov Models of the SFLD (Pegg et al.
2006)
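The idea of describing a target at the finest level of the hierarchy for which a model matches can be sketched in a few lines of Python. This is only an illustration of the superfamily, subgroup and family levels described above, not the SFLD's own implementation: the `HmmHit` class, the `finest_level` function, the group names and the E-value cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass

# Levels ordered from broadest to most specific, following the hierarchy in the text.
LEVELS = ("superfamily", "subgroup", "family")


@dataclass(frozen=True)
class HmmHit:
    level: str       # one of LEVELS
    name: str        # hypothetical name of the group the model represents
    e_value: float   # hypothetical match score; smaller means a better match


def finest_level(hits, max_e_value=1e-10):
    """Return the most specific hit that passes the (assumed) cutoff, or None."""
    significant = [h for h in hits if h.e_value <= max_e_value]
    if not significant:
        return None
    return max(significant, key=lambda h: LEVELS.index(h.level))


# Invented hits for a single target; the names are placeholders, not SFLD entries.
hits = [
    HmmHit("superfamily", "superfamily_A", 1e-40),
    HmmHit("subgroup", "subgroup_A1", 1e-25),
    HmmHit("family", "family_A1a", 1e-3),   # too weak to count under the cutoff
]
best = finest_level(hits)
print(best.level, best.name)
```

In this invented case the family-level match is too weak, so the target's function would be described only at the subgroup level, which is the kind of information the table summarizes for each target.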
FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and
mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of
data into a single resource allows the investigation of how novel enzyme functions have evolved within a
structurally defined superfamily as well as providing a means to analyse trends across many
superfamilies (Furnham, Sillitoe et al. 2012). This is done within the context not only of an enzyme's
sequence and structure but also of the relationships between enzyme reactions. Developed in tandem with
the CATH database, it currently comprises 276 superfamilies covering 1800 (70%) of sequence-assigned enzyme
reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple
sequence alignments using both domain structural alignments supplemented with domain sequences and
whole sequence alignments based on commonality of multi-domain architectures. These trees are
decorated with functional annotations such as metabolite similarity as well as annotations from manually
curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms.
Protein domains, structurally defined by CATH, that are identified as having an enzyme function are
selected using the MACiE database. The workflow by which data are collected, processed and presented
is shown in Figure 1.
Fig. 1. The FunTree pipeline. (A) An overview of the workflow for collecting and processing sequence, structure and functional
information in FunTree. (B) A detailed schematic representation of the various steps in data collection, processing and
visualization in FunTree (Furnham, Sillitoe et al. 2012).
The structures are shown as protein cartoons, colored according to the colors assigned to the E.C. codes in
the tree, and the active site residues are highlighted as space-filled atoms colored red (Figure 2). The
active site information is derived from the Catalytic Site Atlas (CSA). By beginning to gather, catalogue
and classify the emergence of catalytic reactions, the resource lets users analyse shifts in functionality
across and within enzyme superfamilies, which may help in designing new enzymes as well as aid function
prediction.
CATH version 3.5 contains 173 536 domains, 2626 homologous superfamilies and 1313 fold groups.
When focusing on structural genomics (SG) structures, Sillitoe, Cuff et al. (2012) observe that the number
of new folds for CATH v3.5 is slightly lower than for previous releases, which suggests that the majority
of folds that are easily accessible to structure determination may now be known. They have improved the
accuracy of their functional family (FunFam) sub-classification method, and the CATH sequence domain
search facility has been extended to provide FunFam annotations for each domain. The CATH website has
also been redesigned, improving the display of functional data and of conserved sequence features
associated with FunFams within each CATH superfamily.
Fig. 2. An example of the phylogenetic tree as visualized in FunTree. A single branch of the tree is highlighted to show the
range of information held for each branch. Branches with annotation in black have structural information associated with them,
while those in blue have just sequence information. The tree image is embedded in the web page using the GoogleMaps API to
navigate around the tree. Annotation on each branch is hyperlinked to the underlying data source. Relationships between
metabolite data are shown as colored boxes, with the coloring based on a rainbow scale in which similar metabolites have similar
colors. At each node in the tree, a bootstrap value is provided in blue as well as a link to a JalView window showing the
superimposition of any structures in the clade rooted at the node. The structures are colored by the same colors given to the
E.C. numbers in the tree, and any catalytic residue information from the CSA is highlighted as red filled
spheres (Furnham, Sillitoe et al. 2012).
Sequence diversity across a superfamily correlates with the structural diversity of relatives and also with
functional diversity (Figure 3). For CATH enzyme superfamilies, Figure 3 shows that the majority of
superfamilies in CATH (90%) have <10 sequence subfamilies (at 30% sequence identity) and 10 enzyme
functions, whereas some of the remaining superfamilies (<5%, corresponding to <100 superfamilies) can
diverge significantly in sequence, structure and function (Sillitoe, Cuff et al. 2012). The new superfamily
pages have been designed to improve the presentation of information on this diversity, particularly by
capturing more informative data for functional families within each superfamily (Furnham, Sillitoe et al.
2012).
During evolution, most changes of enzyme function stay within the same EC (Enzyme Commission) class
(60% of all EC changes; Figure 4B); for example, one hydrolase will evolve a new hydrolase function.
However, the remaining 40% of changes are between enzymes catalyzing different overall chemistry
(Figure 4C). Remarkably, all possible changes between EC classes are observed. There are some
preferences, such as transferases (EC 2) becoming oxidoreductases (EC 1), hydrolases (EC 3), and lyases
(EC 4). Isomerases (EC 5) are exceptional and evolve new overall chemistry more often than they conserve
the chemistry of isomerization (Martínez Cuesta, Rahman et al. 2015).
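The bookkeeping behind these percentages can be illustrated with a short Python sketch. It tallies changes between EC primary classes without regard to direction and reports the fraction that stays within one class. The function names and the list of EC pairs are invented for the illustration; they are not data or code from the cited study.

```python
from collections import Counter


def primary_class(ec_number):
    """Return the primary class of an EC number such as '3.1.1.1' as an int."""
    return int(ec_number.split(".")[0])


def exchange_matrix(changes):
    """Count changes between EC primary classes, ignoring direction.

    Each key is a sorted pair (class_a, class_b), so X->Y and Y->X share one cell.
    """
    counts = Counter()
    for ec_a, ec_b in changes:
        pair = tuple(sorted((primary_class(ec_a), primary_class(ec_b))))
        counts[pair] += 1
    return counts


def within_class_fraction(counts):
    """Fraction of all changes that fall on the diagonal (same primary class)."""
    total = sum(counts.values())
    within = sum(n for (a, b), n in counts.items() if a == b)
    return within / total if total else 0.0


# Invented example changes; the EC numbers serve only as labels here.
changes = [
    ("3.1.1.1", "3.1.1.3"),    # hydrolase -> hydrolase
    ("3.4.21.4", "3.4.21.5"),  # hydrolase -> hydrolase
    ("2.7.1.1", "1.1.1.1"),    # transferase -> oxidoreductase
    ("1.1.1.1", "2.7.1.2"),    # oxidoreductase -> transferase (same cell as above)
    ("5.3.1.1", "4.2.1.1"),    # isomerase -> lyase
]
counts = exchange_matrix(changes)
print(dict(counts))
print(within_class_fraction(counts))
```

Sorting each pair makes the matrix symmetric by construction, so a change from one class to another is counted in the same cell as the reverse change.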
Fig. 3. Plot showing, for each enzyme superfamily in CATH, the number of unique EC terms, FunFams and SCs (Sillitoe, Cuff
et al. 2012)
Fig. 4. Exploring the evolution of enzyme function within 283 multifunctional CATH superfamilies. (A) FunTree approach:
first, structural clusters of CATH domains involved in enzyme function are created, populated with sequence relatives, and
structurally informed multiple sequence alignments are generated. Second, using alignments as the starting point, species-guided
phylogenetic trees are created. Finally, functional annotations are retrieved from protein data resources and the frequency of
all possible exchanges between different EC numbers within each superfamily is added to an EC exchange matrix. To visualize
this matrix, circular diagrams are shown with ribbons representing the frequency of EC changes observed during evolution
(bandwidth) (B) within EC classes (diagonal of EC exchange matrix) and (C) between EC classes (off-diagonal). Although the
ribbons were colored according to the lowest EC primary class, they are bidirectional and hence the frequency of changes
ECX->ECY is the same as that of ECY->ECX. Data were obtained from CATH version 3.5 and graphics were generated using
Circos (Martínez Cuesta, Rahman et al. 2015).
Redfern, Dessailly et al. (2009) presented a novel method (FLORA) that automatically generates structural
motifs associated with different functional sub-families (FSGs) within functionally diverse domain
superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the
method makes no prior prediction of functional sites, nor does it assume specific physico-chemical
properties of residues. FLORA is able to accurately discriminate between homologous domains with
different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates)
popular structure comparison methods and a leading function prediction method. The authors benchmarked
FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and
demonstrated the functional relevance of the motifs it identifies. FLORA can be applied to any other
functional or superfamily classification (both enzyme and non-enzyme) where there are sufficient
structural data. The authors also provide novel predictions of enzymatic activity for a large number of
structures solved by the Protein Structure Initiative.
Understanding how enzymes have evolved offers clues about their structure-function relationships and
mechanisms. Brown and Babbitt (2014) described the evolution of functionally diverse enzyme
superfamilies, each representing a large set of sequences that evolved from a common ancestor and that
retain conserved features of their structures and active sites. Using several examples, they describe the
different structural strategies nature has used to evolve new reaction and substrate specificities in each
unique superfamily. The results provide insight into enzyme evolution that is not easily obtained from
studies of one or only a few enzymes.
Kingsley and Lill (2015) have outlined several features of tunnels, including tunnel architecture, dynamics,
and physicochemical properties, and have also provided several examples of how these features influence
substrate specificity and enzymatic function. They summarized several of the computational methods that
have been critical in furthering our understanding of tunnel location, composition, function, and dynamics.
Understanding tunnels and how they function provides a more complete picture of enzymatic processes
and opens the door to novel methods of interfering with or altering enzyme function. There is a growing
interest in trying to understand the underlying structure–function relationships in tunnel-containing
enzymes.
Fig. 5. The evolution of kinetic models to explain ligand binding to enzymes with buried active sites (Kingsley and Lill 2015)
As the demand grows for a more holistic understanding of enzymatic processes, it is likely that
computational methods of tunnel prediction and evaluation will become increasingly useful to gain new
insight into structure–function relationships in proteins with buried active sites.
References
1.Almonacid, D. E. and P. C. Babbitt (2011). "Toward mechanistic classification of enzyme functions." Current
Opinion in Chemical Biology 15(3): 435-442.
2.Todd, A. E., C. A. Orengo and J. M. Thornton (2002). "Sequence and Structural Differences between Enzyme
and Nonenzyme Homologs." Structure 10: 1435-1451.
7.Furnham, N., et al. (2012). "Exploring the evolution of novel enzyme functions within structurally defined protein
superfamilies." PLoS Comput Biol 8(3): e1002403.
8.Furnham, N., et al. (2012). "FunTree: a resource for exploring the functional evolution of structurally defined
enzyme superfamilies." Nucleic Acids Res 40(Database issue): D776-782.
9.Bartlett, G. J., N. Borkakoti and J. M. Thornton (2003). "Catalysing New Reactions during Evolution: Economy of Residues and
Mechanism." J. Mol. Biol. 331: 829-860.
10.Glasner, M. E., et al. (2006). "Evolution of enzyme superfamilies." Curr Opin Chem Biol 10(5): 492-497.
11.Kingsley, L. J. and M. A. Lill (2015). "Substrate tunnels in enzymes: Structure-function relationships and
computational methodology." Proteins: Structure, Function, and Bioinformatics 83(4): 599-611.
12. Martínez Cuesta, S., et al. (2015). "The Classification and Evolution of Enzyme Function." Biophysical Journal
109(6): 1082-1086.
13.Ouzounis, C. A., et al. (2003). "Classification schemes for protein structure and function." Nature Reviews
Genetics 4(7): 508-519.
14.Pearson, W. R. and S. R. Eddy (2011). "Accelerated Profile HMM Searches." PLoS Comput Biol 7(10): e1002195.
15.Redfern, O. C., et al. (2009). "FLORA: a novel method to predict protein function from structure in diverse
superfamilies." PLoS Comput Biol 5(8): e1000485.
16. Pegg, S. C.-H., et al. (2005). "Representing Structure-Function Relationships in Mechanistically Diverse
Enzyme Superfamilies." Pacific Symposium on Biocomputing 10: 358-369.
17.Pegg, S. C.-H., et al. (2006). "Leveraging Enzyme Structure-Function Relationships for Functional Inference
and Experimental Design: The Structure-Function Linkage Database." Biochemistry 45: 2545-2555.
18.Sillitoe, I., et al. (2012). "New functional families (FunFams) in CATH to improve the mapping of conserved
functional sites to 3D structures." Nucleic Acids Res 41(D1): D490-D498.
19.Zhu, X. and L. Lai (2009). "A novel method for enzyme design." Journal of Computational Chemistry 30(2):
256-267.