Documente Academic
Documente Profesional
Documente Cultură
2
2r
2
x
_ _
exp
y
j
2
2r
2
y
_ _
exp
z
j
2
2r
2
z
_ _
: 2
The molecule is oriented according to the following
procedure: the longest distance between two effective atoms
determines the z-axis, and the longest distance between
projections on the xy plane determines the x-axis.
For this orientation of molecules in the coordinate system,
the values of r
x
, r
y
, r
z
parameters are calculated as one-third
of the highest x, y, or z coordinates of the effective atom
increased by 9 A
N
i1
~
H
r
i
1
1
2
7
r
ij
c
_ _
2
9
r
ij
c
_ _
4
5
r
ij
c
_ _
6
r
ij
c
_ _
8
_ _ _ _
for r
ij
c
otherwise 0
_
_
_
3
where
~
Ho
j
denotes the empirical hydrophobicity value
characteristic for the j-th point, N is the number of residues
in a protein,
~
H
r
i
represents the hydrophobicity characteristic
for the i-th amino acid, r
ij
is the distance between the j-th
point and the geometrical center of the i-th residue, and c
expresses the cutoff distance, which has a xed value of 9.0 A
,
following the original paper [55]. The observed hydro-
phobicity distribution
~
Ho is also standardized. More details
concerning the FOD force eld are given in recently
published papers [2327].
The similarity of the FOD-based hydrophobic scale with
others commonly used for calculations (e.g., the Eisenberg
[56] or Doolittle [40] scales) has been shown and discussed in
[57]. The differences between these scales seem to be
negligible with respect to the problem under consideration.
Use of these scales does not change the
~
Ho distribution
signicantly (Equation 3) [57]. The introduction of the FOD-
based hydrophobic scale unies the system for proteins
(amino acids) and molecules interacting with proteins,
creating stable complexes (ligands).
Scoring Function. Since both theoretical
~
Ht and observed
~
Ho distributions of hydrophobicity are standardized to 1.0
and were calculated for the same set of points (geometrical
centers of all residues in a protein), the comparison of these
two characteristics is possible. The difference between
theoretical and empirical distributions D
~
H expresses the
irregularity of hydrophobic core construction. For the i-th
residue, D
~
H
i
is calculated as follows:
D
~
H
i
~
Ht
i
~
Ho
i
4
where
~
Ht
i
and
~
Ho
i
are the theoretical and observed values of
hydrophobicity for the geometric center of the i-th residue,
respectively.
The maxima of D
~
H recognize the residues representing the
hydrophobicity deciency, which points out the structural
irregularity, usually in a function-related area.
Comparative Analysis. The SuMo and ProFunc methods
(both available on the Web, see urls below) were selected to
perform the comparative analysis as to functional site
recognition.
SuMo. SuMo is a Web tool [58] (http://sumo-pbil.ibcp.fr/
cgi-bin/sumo-welcome) that predicts the function of proteins
based on the chemistry of the bound ligand. Each ligand and
macromolecule part is divided into sets of arbitrary prede-
ned chemical groups. The active site is recognized by a
comparison of a minimum of three chemical groups in both
compared molecules. SuMo produces a list of probable active
sites on default ranked by the number of SuMo groups
involved in each given prediction. The active site is described
by a set of amino acids and corresponding chemical groups
[59].
ProFunc. ProFunc [60] is a Web server (http://www.ebi.ac.uk/
thornton-srv/databases/ProFunc) devoted to predicting the
function of proteins of known 3-D structure and unknown
function. The server provides both sequence- and structure-
based methods, which may be used in the analysis of proteins.
From the group of structure-based methods available on the
server, the reverse templates 3-D templatebased method
[61] was chosen and applied to validate the method presented
in this study. According to the reverse-template method, the
structure itself is broken up into a large number of templates
(each containing three residues) that are scanned against a
representative set of structures in the PDB [61]. All the hits
obtained are scored and ranked. Other homology/sequence-
based tools were not taken into account; only methods of
similar (structure-based) methodologies were included.
The coordinates of all protein structures under study were
submitted to the server in PDB format. The top reverse
templatematching structures of known and unknown
functions were used in our comparative analysis.
Result Verification
The residues annotated in CSA as those playing roles in
catalytic activity were used as the gold standard to verify the
reliability of the results received according to the FOD model.
To indicate the most meaningful amino acids considered
by the FOD model to be located in the functional site, the
calculation of percentiles was used to identify the threshold
for selection of D
~
H maxima, which are distinguished as
belonging to the functional site. It is possible to do so,
because the quantitative results expressing the level of D
~
H
can be taken as the criteria for discrimination. For a set of
measurements arranged in order of magnitude, the p-th
percentile is the value that has p percent of the measurements
below it and (100 p) percent above it. In this analysis, the
95th percentile was used. In other words, among the analyzed
data, 95% of values were below the 95th percentile threshold,
and only the 5% above the threshold was taken into
consideration.
The same validation method cannot be used in the SuMo or
ProFunc methods because of their different types of output
data. They produce only the numbers of amino acids that
potentially belong to functional sites and total scores (based
on which given set of amino acid residues is assessed and what
functional site is proposed). This is why the percentage of
commonly classied residues was calculated for each protein
molecule by taking the best hit by ProFunc (according to the
score value) and the solution most relevant to the FOD-based
results by SuMo.
Results
Functional Site Recognition in Proteins of Differentiated
Biological Activity
The proteins of known biological activity (Table 1) and
protein structures of unknown function that resulted from
structural genomics projects (Table 2) were examined for the
locations of their functional sites. Table 3 summarizes the
results of the method application and comparison with
experimental observations (CSA classication). The rst
column presents the protein under consideration and the
list of residues recognized by CSA. For two proteins (rat
annexin V and ButF), residues that are in direct contact with
ligand [62,63] and/or are part of the functional site are given
[64].
PLoS Computational Biology | www.ploscompbiol.org May 2007 | Volume 3 | Issue 5 | e94 0912
Active Site Recognition In Silico
T
a
b
l
e
3
.
B
i
o
l
o
g
i
c
a
l
A
c
t
i
v
i
t
y
R
e
l
a
t
e
d
R
e
s
i
d
u
e
s
a
s
R
e
c
o
g
n
i
z
e
d
i
n
P
r
o
t
e
i
n
s
o
f
K
n
o
w
n
B
i
o
l
o
g
i
c
a
l
F
u
n
c
t
i
o
n
U
s
i
n
g
M
e
t
h
o
d
s
D
i
s
c
u
s
s
e
d
i
n
T
h
i
s
P
a
p
e
r
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
,
N
a
m
e
o
f
A
c
t
i
v
e
S
i
t
e
F
u
z
z
y
O
i
l
D
r
o
p
M
o
d
e
l
S
u
M
o
M
e
t
h
o
d
P
r
o
F
u
n
c
M
e
t
h
o
d
N
u
m
b
e
r
o
f
A
A
s
A
A
N
a
m
e
s
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
F
O
D
/
S
u
M
o
R
a
t
i
o
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
S
c
o
r
e
1
A
3
0
/
2
5
A
S
P
,
2
6
T
H
R
2
5
,
2
8
,
2
9
,
3
0
,
3
1
A
S
P
,
A
L
A
,
A
S
P
,
A
S
P
,
T
H
R
1
S
T
C
/
2
9
,
3
0
0
.
4
2
F
8
0
/
3
1
,
7
5
,
8
6
8
7
0
.
3
1
2
0
6
L
/
1
1
G
L
U
,
2
0
A
S
P
1
1
,
3
0
,
1
2
,
1
0
5
,
1
0
,
1
4
5
,
1
0
4
G
L
U
,
G
L
Y
,
G
L
Y
,
G
L
N
,
A
S
P
,
A
R
G
,
P
H
E
1
O
W
Z
/
3
0
,
1
0
4
0
.
5
1
7
6
L
/
8
8
T
Y
R
,
1
0
0
I
L
E
,
1
0
1
A
S
N
8
6
0
1
R
B
N
/
1
2
H
I
S
,
1
2
0
P
H
E
,
1
1
9
H
I
S
,
4
1
L
Y
S
1
2
,
1
2
0
,
1
1
,
4
5
,
1
1
9
,
7
,
4
1
H
I
S
,
P
H
E
,
G
L
N
,
T
H
R
,
H
I
S
,
L
Y
S
,
L
Y
S
1
R
P
G
/
1
1
,
1
2
,
4
5
,
1
2
0
0
.
2
2
1
I
Z
R
/
1
2
H
I
S
,
4
4
A
S
N
,
4
7
V
A
L
8
4
0
1
B
H
2
/
4
3
G
L
U
,
1
7
8
A
R
G
,
2
0
4
G
L
N
,
1
8
1
T
H
R
1
8
0
,
4
3
,
1
7
8
,
4
4
,
4
5
,
1
5
0
,
2
7
0
L
Y
S
,
G
L
U
,
A
R
G
,
S
E
R
,
G
L
Y
,
A
S
P
,
L
Y
S
1
B
H
2
/
4
3
,
4
4
,
4
5
,
1
7
8
0
.
2
1
B
H
2
/
2
9
S
E
R
,
3
0
P
R
O
,
2
0
9
T
R
P
9
4
0
1
K
Z
L
/
9
7
H
I
S
,
1
4
6
S
E
R
,
4
8
C
Y
S
,
1
8
5
A
S
P
,
4
1
S
E
R
1
8
3
,
1
4
4
,
1
4
1
,
1
8
6
,
1
8
5
,
1
4
0
,
1
8
4
,
1
0
4
,
1
8
9
,
1
3
3
,
1
3
2
G
L
U
,
G
L
Y
,
A
L
A
,
G
L
N
,
A
S
P
,
I
L
E
,
V
A
L
,
A
S
P
,
L
Y
S
,
T
Y
R
,
L
Y
S
2
A
C
V
/
1
8
5
,
1
8
6
0
.
2
1
K
Z
L
/
4
5
A
S
N
,
4
7
T
H
R
,
7
3
L
E
U
1
,
0
2
0
1
B
G
0
/
1
2
6
A
R
G
,
2
8
0
A
R
G
,
2
2
A
R
G
,
2
2
5
G
L
U
,
3
0
9
A
R
G
1
2
6
,
2
8
0
,
2
7
4
,
2
2
5
,
3
1
4
,
3
3
0
,
2
2
6
,
2
7
3
A
R
G
,
A
R
G
,
A
S
N
,
G
L
U
,
G
L
U
,
A
R
G
,
A
S
P
,
T
H
R
1
S
D
0
/
1
2
6
,
2
8
0
,
3
1
4
0
.
2
5
1
M
1
5
/
1
2
9
A
R
G
,
2
7
8
T
H
R
,
3
3
0
A
R
G
1
,
0
0
0
1
J
A
G
/
1
4
2
A
R
G
,
7
0
G
L
U
1
4
2
,
7
0
,
1
4
1
,
1
1
8
,
5
1
,
4
4
A
R
G
,
G
L
U
,
G
L
U
,
A
R
G
,
L
Y
S
,
G
L
U
1
B
X
R
/
1
1
8
,
1
4
1
,
1
4
2
0
.
4
1
J
A
G
/
1
4
3
S
E
R
,
2
2
9
T
R
P
,
2
4
2
L
E
U
9
8
0
1
M
H
L
/
9
1
G
L
N
,
9
5
H
I
S
9
4
,
9
8
,
1
0
0
,
1
0
1
,
2
9
,
2
8
,
3
1
A
S
P
,
A
S
P
,
T
H
R
,
P
R
O
,
P
H
E
,
A
L
A
,
A
R
G
N
O
H
I
T
1
D
2
V
/
9
T
H
R
,
1
4
C
Y
S
,
2
0
P
R
O
4
8
0
1
A
0
J
/
5
7
H
I
S
,
1
0
2
A
S
P
,
1
9
5
S
E
R
1
0
2
,
1
9
5
,
1
9
6
,
2
1
3
,
2
2
9
,
4
3
,
1
9
3
,
1
9
4
,
4
4
,
5
3
,
1
9
8
,
3
1
,
3
2
,
1
3
9
,
3
0
A
S
P
,
S
E
R
,
G
L
Y
,
V
A
L
,
T
H
R
,
G
L
Y
,
G
L
Y
,
A
S
P
,
G
L
Y
,
V
A
L
,
P
R
O
,
A
L
A
,
S
E
R
,
S
E
R
,
G
L
N
1
G
J
B
/
1
9
3
,
1
9
5
0
.
2
8
5
1
A
0
J
/
5
5
A
L
A
,
1
9
6
G
L
Y
,
1
9
7
G
L
Y
1
,
0
4
0
1
D
V
E
/
1
3
9
G
L
Y
,
1
3
6
A
R
G
,
1
4
0
A
S
P
,
1
3
5
T
H
R
,
1
4
3
G
L
Y
,
2
5
H
I
S
,
5
8
T
Y
R
1
3
9
,
1
3
6
,
1
4
0
,
1
3
7
,
5
8
,
8
5
,
6
2
G
L
Y
,
A
R
G
,
A
S
P
,
T
Y
R
,
T
Y
R
,
A
R
G
,
G
L
U
1
J
2
C
/
1
3
6
,
1
3
9
0
.
1
6
1
J
0
2
/
7
4
T
Y
R
,
1
2
9
L
E
U
,
1
3
2
H
I
S
9
2
0
1
B
5
T
/
2
8
G
L
U
,
1
2
0
A
S
P
2
8
,
1
2
0
,
2
7
5
,
1
8
3
G
L
U
,
A
S
P
,
T
Y
R
,
G
L
N
1
S
L
Y
/
1
8
3
,
2
7
5
0
.
1
8
1
B
5
T
/
2
7
P
H
E
,
2
8
3
S
E
R
,
2
8
7
C
Y
S
9
6
0
1
B
7
3
/
7
0
C
Y
S
,
7
A
S
P
,
8
S
E
R
,
1
7
8
C
Y
S
7
0
,
7
,
8
,
6
9
,
1
1
,
7
1
,
7
2
,
1
7
8
,
4
0
,
1
1
4
,
1
7
7
,
1
1
7
,
1
7
9
,
2
0
0
C
Y
S
,
A
S
P
,
S
E
R
,
A
L
A
,
G
L
Y
,
A
S
N
,
T
H
R
,
C
Y
S
,
G
L
Y
,
T
H
R
,
G
L
Y
,
T
H
R
,
T
H
R
,
S
E
R
1
Z
U
W
/
7
0
,
7
1
,
1
7
7
0
.
3
1
1
B
7
4
/
1
1
1
V
A
L
,
1
2
3
T
Y
R
,
1
7
7
G
L
Y
9
6
0
1
B
O
L
/
1
0
5
G
L
U
,
1
0
9
H
I
S
,
4
6
H
I
S
1
0
5
,
4
7
,
4
6
,
4
8
,
3
0
,
1
3
7
,
1
3
6
,
2
0
0
G
L
U
,
G
L
Y
,
H
I
S
,
L
E
U
,
A
S
N
,
A
L
A
,
L
Y
S
,
A
S
P
1
U
C
D
/
4
6
,
1
0
5
0
.
2
2
1
B
O
L
/
1
0
9
H
I
S
,
1
1
0
G
L
Y
,
1
3
3
T
Y
R
9
4
0
2
T
S
I
/
8
2
L
Y
S
,
2
3
0
L
Y
S
,
2
3
3
L
Y
S
,
8
6
A
R
G
2
3
0
,
4
7
,
8
0
,
1
9
4
,
7
8
,
4
8
,
1
9
3
,
1
9
2
,
3
8
,
1
9
5
,
2
2
1
,
5
0
,
5
1
,
2
6
9
,
1
7
3
,
1
0
L
Y
S
,
G
L
Y
,
S
E
R
,
A
S
P
,
A
S
P
,
H
I
S
,
S
E
R
,
G
L
Y
,
A
S
P
,
G
L
N
,
P
R
O
,
A
L
A
,
T
H
R
,
T
Y
R
,
G
L
N
,
A
R
G
1
J
I
L
/
3
8
,
7
8
,
1
7
3
,
1
9
2
,
1
9
4
,
1
9
5
0
.
2
7
2
T
S
I
/
3
4
T
Y
R
,
1
8
6
C
Y
S
,
1
8
9
G
L
N
9
6
0
1
G
O
X
/
2
5
4
H
I
S
,
1
5
7
A
S
P
,
2
5
7
A
R
G
,
1
2
9
T
Y
R
2
3
0
,
2
5
5
,
7
7
,
7
8
,
2
8
5
,
7
6
,
3
0
9
,
2
8
6
,
3
0
8
,
3
8
,
2
8
7
,
2
8
8
,
2
8
9
,
3
9
,
2
9
0
,
2
9
3
L
Y
S
,
G
L
Y
,
P
R
O
,
T
H
R
,
A
S
P
,
A
L
A
,
A
R
G
,
G
L
Y
,
G
L
Y
,
A
S
N
,
G
L
Y
,
V
A
L
,
A
R
G
,
A
R
G
,
A
R
G
,
A
R
G
1
G
O
X
/
2
8
5
,
2
8
6
,
2
8
7
,
2
8
8
,
2
8
9
0
.
5
5
1
G
O
X
/
8
1
G
L
N
,
9
2
T
H
R
,
9
5
A
L
A
1
,
0
8
0
1
D
H
F
/
2
2
L
E
U
,
3
0
G
L
U
1
1
5
,
3
5
,
5
6
,
3
8
,
5
5
,
1
3
2
,
4
8
,
1
2
2
V
A
L
,
G
L
N
,
T
H
R
,
T
H
R
,
L
Y
S
,
L
Y
S
,
A
S
N
,
L
Y
S
1
J
X
2
/
3
8
,
4
8
0
.
1
8
1
K
M
V
/
8
V
A
L
,
1
2
1
T
Y
R
,
1
3
6
T
H
R
8
4
0
1
E
S
O
/
6
1
H
I
S
4
4
,
4
2
,
4
3
,
7
2
,
8
3
,
8
4
,
1
4
0
,
4
0
,
1
2
2
H
I
S
,
G
L
Y
,
P
H
E
,
P
R
O
,
P
R
O
,
A
L
A
,
G
L
U
,
G
L
U
,
A
S
P
1
B
Z
O
/
4
4
0
.
0
9
1
E
S
O
/
4
6
H
I
S
,
6
1
H
I
S
,
1
1
8
H
I
S
9
6
0
5
C
P
A
/
7
1
A
R
G
,
1
2
7
A
R
G
,
2
7
0
G
L
U
7
2
,
6
9
,
1
2
7
,
1
9
6
,
6
8
,
6
7
,
1
4
4
,
1
1
2
,
1
9
4
,
1
1
1
,
6
5
,
1
0
8
,
1
7
5
G
L
U
,
H
I
S
,
A
R
G
,
H
I
S
,
I
L
E
,
G
L
Y
,
A
S
N
,
A
S
N
,
S
E
R
,
T
H
R
,
A
S
P
,
G
L
U
,
G
L
U
1
A
R
M
/
6
9
,
7
2
,
1
2
7
,
1
9
6
0
.
4
5
C
P
A
/
1
0
8
G
L
U
,
1
1
1
T
H
R
,
1
1
2
A
S
N
1
,
0
0
0
1
Q
L
H
/
4
8
S
E
R
,
5
1
H
I
S
4
8
,
4
7
,
4
6
,
1
7
4
,
6
8
,
1
7
5
,
3
6
9
,
2
0
3
,
2
9
2
,
1
7
8
,
1
7
3
,
3
1
9
,
1
7
7
,
1
7
9
,
9
1
,
9
2
,
2
0
6
,
3
2
4
S
E
R
,
A
R
G
,
C
Y
S
,
C
Y
S
,
G
L
U
,
G
L
Y
,
A
R
G
,
V
A
L
,
V
A
L
,
T
H
R
,
G
L
Y
,
P
H
E
,
S
E
R
,
G
L
Y
,
P
R
O
,
L
E
U
,
S
E
R
,
S
E
R
1
H
T
B
/
4
6
,
4
8
,
1
7
4
,
3
2
4
0
.
3
1
1
N
8
K
/
1
4
5
T
H
R
,
1
5
0
T
H
R
,
1
5
1
V
A
L
1
,
1
4
0
1
2
A
S
/
1
0
0
A
R
G
,
4
6
A
S
P
,
1
1
6
G
L
N
7
2
,
7
1
,
1
0
0
,
1
1
4
,
4
6
,
1
1
6
,
7
4
,
2
9
4
,
1
1
5
,
4
5
,
9
7
,
4
7
,
2
5
1
,
2
1
4
,
2
4
8
S
E
R
,
H
I
S
,
A
R
G
,
V
A
L
,
A
S
P
,
G
L
N
,
A
L
A
,
G
L
Y
,
A
S
P
,
G
L
N
,
L
Y
S
,
A
S
N
,
S
E
R
,
A
R
G
,
G
L
U
1
2
A
S
/
1
0
0
,
1
1
6
,
2
5
1
0
.
7
5
1
2
A
S
/
1
1
9
T
R
P
,
1
2
1
A
R
G
,
2
9
4
G
L
Y
9
8
0
1
B
D
0
/
3
1
1
C
Y
S
,
3
9
L
Y
S
,
2
6
5
T
Y
R
3
5
5
,
3
5
7
,
4
0
,
3
5
4
,
3
5
3
,
4
3
,
3
4
2
,
3
9
,
3
4
3
,
3
4
1
,
2
4
3
G
L
U
,
P
R
O
,
A
L
A
,
T
Y
R
,
A
S
N
,
T
Y
R
,
I
L
E
,
L
Y
S
,
A
S
P
,
S
E
R
,
G
L
U
1
B
D
0
/
4
3
,
3
5
4
0
.
1
1
B
D
0
/
2
9
3
G
L
Y
,
2
8
8
T
R
P
,
3
3
2
I
L
E
1
,
0
0
0
1
E
F
8
/
1
1
0
G
L
Y
,
6
6
H
I
S
,
1
4
0
T
Y
R
1
1
3
,
1
1
4
,
8
9
,
1
4
7
,
1
1
7
,
9
3
G
L
U
,
M
E
T
,
T
H
R
,
A
S
N
,
S
E
R
,
G
L
N
1
J
D
F
/
9
3
0
.
1
7
1
E
F
8
/
1
0
1
S
E
R
,
1
0
3
V
A
L
,
1
2
7
S
E
R
1
,
0
0
0
1
M
E
K
/
3
7
G
L
Y
,
3
9
C
Y
S
,
3
6
C
Y
S
,
3
8
H
I
S
6
4
,
8
6
,
2
4
,
2
5
,
2
6
,
2
7
,
5
9
,
6
1
,
8
7
,
8
9
,
9
0
,
9
7
,
1
1
5
L
Y
S
,
L
Y
S
,
H
I
S
,
L
Y
S
,
T
Y
R
,
L
E
U
,
G
L
U
,
A
R
G
,
P
H
E
,
A
R
G
,
A
S
N
,
L
Y
S
,
A
R
G
N
O
H
I
T
1
M
E
K
/
4
6
T
Y
R
,
6
2
L
E
U
,
6
4
L
Y
S
8
6
0
1
P
1
X
/
1
6
7
L
Y
S
,
2
0
1
L
Y
S
,
1
0
2
A
S
P
1
6
7
,
2
0
1
,
1
3
7
,
1
0
2
,
1
6
L
Y
S
,
L
Y
S
,
L
Y
S
,
A
S
P
,
A
S
P
N
O
H
I
T
1
P
1
X
/
1
6
A
S
P
,
4
7
C
Y
S
,
2
0
1
L
Y
S
1
,
0
4
0
2
A
B
K
/
1
3
8
A
S
P
,
1
2
0
L
Y
S
1
3
8
,
1
2
0
,
1
3
6
,
1
8
0
,
1
1
9
,
3
9
,
1
8
4
,
4
4
A
S
P
,
L
Y
S
,
A
L
A
,
I
L
E
,
A
R
G
,
S
E
R
,
A
R
G
,
A
S
P
N
O
H
I
T
2
A
B
K
/
1
4
4
V
A
L
,
1
4
8
T
H
R
,
1
8
2
H
I
S
9
0
0
2
D
H
N
/
2
2
G
L
U
,
1
0
0
L
Y
S
9
7
,
3
2
,
1
1
5
,
3
4
,
9
5
A
R
G
,
A
S
P
,
G
L
U
,
T
H
R
,
L
Y
S
N
O
H
I
T
1
D
H
N
/
8
4
I
L
E
,
9
4
T
H
R
,
1
1
8
A
R
G
9
0
0
1
P
M
I
/
1
1
1
G
L
N
,
3
0
4
A
R
G
,
2
9
4
G
L
U
1
1
1
,
2
8
5
,
2
8
7
,
1
3
8
,
2
7
9
,
2
9
4
,
2
8
4
,
2
8
3
,
1
0
2
,
1
4
0
,
1
0
0
,
2
7
7
,
4
8
G
L
N
,
H
I
S
,
T
Y
R
,
G
L
U
,
L
E
U
,
G
L
U
,
P
R
O
,
A
S
P
,
L
E
U
,
A
L
A
,
L
Y
S
,
M
E
T
,
G
L
U
1
P
M
I
/
1
1
1
,
1
3
8
,
2
8
4
,
2
8
5
0
.
2
2
1
P
M
I
/
2
0
6
P
H
E
,
2
6
1
C
Y
S
,
3
2
3
T
Y
R
9
8
0
1
T
P
H
/
1
1
A
S
N
,
9
5
H
I
S
,
1
3
L
Y
S
,
1
6
5
G
L
U
,
1
7
1
G
L
Y
9
7
,
1
6
5
,
6
4
,
9
8
,
6
5
,
1
1
,
9
5
,
1
3
A
S
N
,
H
I
S
,
L
Y
S
,
G
L
U
,
G
L
U
,
G
L
N
,
A
R
G
,
A
S
N
1
T
P
B
/
9
5
,
1
6
5
0
.
1
5
1
S
W
0
/
1
6
4
T
Y
R
,
2
0
8
T
Y
R
,
2
2
6
V
A
L
9
8
0
1
A
M
6
/
1
9
9
T
H
R
,
1
0
6
G
L
U
1
9
9
,
1
0
6
,
9
6
,
2
0
0
,
1
1
7
,
1
1
9
,
9
4
,
9
2
,
6
7
T
H
R
,
G
L
U
,
H
I
S
,
T
H
R
,
G
L
U
,
H
I
S
,
H
I
S
,
G
L
N
,
A
S
N
1
B
N
V
/
9
4
,
9
6
,
1
0
6
,
1
1
9
,
1
9
9
,
2
0
0
0
.
4
3
1
L
U
G
/
3
0
P
R
O
,
1
0
7
H
I
S
,
2
0
9
T
R
P
1
,
0
0
0
PLoS Computational Biology | www.ploscompbiol.org May 2007 | Volume 3 | Issue 5 | e94 0913
Active Site Recognition In Silico
T
a
b
l
e
3
.
C
o
n
t
i
n
u
e
d
.
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
,
N
a
m
e
o
f
A
c
t
i
v
e
S
i
t
e
F
u
z
z
y
O
i
l
D
r
o
p
M
o
d
e
l
S
u
M
o
M
e
t
h
o
d
P
r
o
F
u
n
c
M
e
t
h
o
d
N
u
m
b
e
r
o
f
A
A
s
A
A
N
a
m
e
s
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
F
O
D
/
S
u
M
o
R
a
t
i
o
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
S
c
o
r
e
1
V
D
1
/
9
3
G
L
U
,
9
7
H
I
S
,
3
9
H
I
S
9
3
,
3
9
,
9
6
,
1
0
,
1
2
,
1
1
,
1
7
1
,
3
7
,
1
5
8
G
L
U
,
H
I
S
,
L
Y
S
,
V
A
L
,
G
L
N
,
G
L
N
,
G
L
N
,
S
E
R
,
G
L
U
1
V
D
1
/
3
9
,
9
3
0
.
1
8
1
V
D
3
/
9
P
H
E
,
1
1
9
S
E
R
,
1
7
5
C
Y
S
9
2
0
2
A
H
J
/
1
1
3
S
E
R
1
1
3
,
1
2
8
,
1
1
5
,
1
2
3
,
1
3
3
,
9
0
S
E
R
,
L
Y
S
,
T
H
R
,
P
R
O
,
A
R
G
,
G
L
N
2
A
H
J
/
1
1
3
0
.
0
8
2
C
Z
1
/
5
6
G
L
Y
,
6
0
V
A
L
,
6
4
T
R
P
8
8
0
1
A
8
A
/
C
a
l
c
i
u
m
-
b
i
n
d
i
n
g
s
i
t
e
;
2
8
G
L
Y
,
3
0
G
L
Y
,
3
1
T
H
R
,
7
0
G
L
U
,
1
0
0
G
L
Y
,
1
0
2
G
L
Y
,
1
0
3
T
H
R
,
1
4
2
A
S
P
,
1
8
1
G
L
Y
,
1
8
2
G
L
U
,
1
8
3
L
E
U
,
1
8
4
L
Y
S
,
1
8
5
T
R
P
,
1
8
6
G
L
Y
,
1
8
7
T
H
R
,
2
5
9
G
L
Y
,
2
6
1
G
L
Y
,
2
6
2
T
H
R
,
3
0
1
A
S
P
;
i
o
n
c
h
a
n
n
e
l
:
1
1
2
G
L
U
,
2
7
1
A
R
G
,
9
8
A
S
P
,
1
1
7
A
R
G
,
1
7
G
L
U
,
7
8
G
L
U
,
9
5
G
L
U
8
6
,
8
9
,
9
0
,
9
3
,
1
0
6
,
1
1
0
,
1
1
1
,
1
1
4
,
1
1
5
,
1
1
9
,
2
6
9
,
2
7
3
,
2
7
4
S
E
R
,
T
Y
R
,
A
S
P
,
G
L
U
,
L
Y
S
,
G
L
U
,
I
L
E
,
S
E
R
,
A
R
G
,
G
L
U
,
A
R
G
,
S
E
R
,
A
R
G
2
R
A
N
/
1
1
1
0
.
0
8
2
R
A
N
/
1
0
9
T
H
R
,
1
1
1
I
L
E
,
1
4
7
T
Y
R
8
8
0
1
N
2
Z
/
3
0
S
E
R
,
3
1
P
R
O
,
3
2
A
L
A
,
5
0
T
Y
R
,
8
5
T
R
P
,
8
7
G
L
Y
,
1
9
6
T
R
P
,
2
4
1
S
E
R
,
2
4
2
A
S
P
,
2
4
5
G
L
U
,
2
4
6
A
R
G
3
2
,
1
0
8
,
1
1
0
,
1
7
4
,
1
7
6
,
2
4
5
,
2
4
6
,
2
4
7
A
L
A
,
A
S
P
,
T
H
R
,
S
E
R
,
G
L
N
,
G
L
U
,
A
R
G
,
A
L
A
1
N
2
Z
/
2
4
6
0
.
0
4
1
N
2
Z
/
2
8
T
H
R
,
2
9
L
E
U
,
4
8
S
E
R
9
6
0
C
o
l
u
m
n
1
r
e
s
i
d
u
e
s
a
r
e
r
e
c
o
g
n
i
z
e
d
a
s
a
c
t
i
v
e
s
i
t
e
(
C
S
A
d
a
t
a
b
a
s
e
)
i
n
t
h
e
o
r
d
e
r
a
c
c
o
r
d
i
n
g
t
o
i
n
c
r
e
a
s
e
d
d
i
s
t
a
n
c
e
v
e
r
s
u
s
t
h
e
l
i
g
a
n
d
p
o
s
i
t
i
o
n
(
g
e
o
m
e
t
r
i
c
a
l
c
e
n
t
e
r
o
f
l
i
g
a
n
d
o
r
t
h
e
a
v
e
r
a
g
e
d
p
o
s
i
t
i
o
n
o
f
a
m
i
n
o
a
c
i
d
s
r
e
s
p
o
n
s
i
b
l
e
f
o
r
b
i
o
l
o
g
i
c
a
l
f
u
n
c
t
i
o
n
a
s
f
o
u
n
d
i
n
l
i
t
e
r
a
t
u
r
e
)
.
B
o
l
d
n
u
m
b
e
r
s
a
r
e
g
i
v
e
n
f
o
r
r
e
s
i
d
u
e
s
r
e
c
o
g
n
i
z
e
d
b
y
F
O
D
a
n
d
a
t
l
e
a
s
t
o
n
e
o
f
t
w
o
a
n
a
l
y
z
e
d
m
e
t
h
o
d
s
(
S
u
M
o
,
P
r
o
F
u
n
c
)
.
U
n
d
e
r
l
i
n
e
d
n
u
m
b
e
r
s
r
e
p
r
e
s
e
n
t
r
e
s
i
d
u
e
s
f
o
u
n
d
t
o
b
e
f
u
n
c
t
i
o
n
-
r
e
l
a
t
e
d
(
a
c
c
o
r
d
i
n
g
t
o
C
S
A
d
a
t
a
b
a
s
e
)
.
I
t
a
l
i
c
i
z
e
d
n
u
m
b
e
r
s
a
r
e
s
h
o
w
n
w
h
e
n
t
h
e
p
o
s
i
t
i
o
n
o
f
t
h
e
a
m
i
n
o
a
c
i
d
s
d
i
f
f
e
r
b
y
1
(
c
l
o
s
e
s
t
n
e
i
g
h
b
o
r
s
)
v
e
r
s
u
s
t
h
e
C
S
A
c
l
a
s
s
i
f
i
c
a
t
i
o
n
o
r
v
e
r
s
u
s
t
h
e
p
o
s
i
t
i
o
n
f
o
u
n
d
b
y
F
O
D
.
F
o
r
F
O
D
v
e
r
s
u
s
S
u
M
o
c
o
m
p
a
r
i
s
o
n
,
a
m
i
n
o
a
c
i
d
s
c
o
m
m
o
n
f
o
r
b
o
t
h
m
e
t
h
o
d
s
a
r
e
p
o
i
n
t
e
d
o
u
t
w
i
t
h
t
h
e
P
D
B
p
r
o
t
e
i
n
c
o
d
e
,
w
h
i
c
h
f
u
n
c
t
i
o
n
a
l
s
i
t
e
w
a
s
f
o
u
n
d
t
o
b
e
r
e
l
a
t
e
d
w
i
t
h
t
h
e
p
r
o
t
e
i
n
u
n
d
e
r
c
o
n
s
i
d
e
r
a
t
i
o
n
.
F
O
D
/
S
u
M
o
r
a
t
i
o
e
x
p
r
e
s
s
e
s
t
h
e
p
a
r
t
o
f
a
m
i
n
o
a
c
i
d
s
c
o
m
m
o
n
f
o
r
F
O
D
v
e
r
s
u
s
t
h
e
t
o
t
a
l
n
u
m
b
e
r
o
f
r
e
s
i
d
u
e
s
p
o
i
n
t
e
d
o
u
t
b
y
S
u
M
o
.
P
r
o
F
u
n
c
s
c
o
r
e
v
a
l
u
e
s
a
r
e
g
i
v
e
n
a
c
c
o
r
d
i
n
g
t
o
t
h
e
p
r
o
g
r
a
m
c
l
a
s
s
i
f
i
c
a
t
i
o
n
.
d
o
i
:
1
0
.
1
3
7
1
/
j
o
u
r
n
a
l
.
p
c
b
i
.
0
0
3
0
0
9
4
.
t
0
0
3
PLoS Computational Biology | www.ploscompbiol.org May 2007 | Volume 3 | Issue 5 | e94 0914
Active Site Recognition In Silico
T
a
b
l
e
4
.
B
i
o
l
o
g
i
c
a
l
A
c
t
i
v
i
t
y
R
e
l
a
t
e
d
R
e
s
i
d
u
e
s
a
s
R
e
c
o
g
n
i
z
e
d
i
n
P
r
o
t
e
i
n
s
o
f
U
n
k
n
o
w
n
B
i
o
l
o
g
i
c
a
l
F
u
n
c
t
i
o
n
U
s
i
n
g
M
e
t
h
o
d
s
D
i
s
c
u
s
s
e
d
i
n
T
h
i
s
P
a
p
e
r
F
u
z
z
y
O
i
l
D
r
o
p
M
o
d
e
l
S
u
M
o
M
e
t
h
o
d
P
r
o
F
u
n
c
M
e
t
h
o
d
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
A
A
N
a
m
e
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
F
O
D
/
S
u
M
o
R
a
t
i
o
S
c
o
r
e
P
D
B
I
D
/
N
u
m
b
e
r
o
f
A
A
s
S
c
o
r
e
1
Z
H
C
/
3
0
,
5
3
,
5
7
A
S
P
,
L
Y
S
,
L
Y
S
1
K
Q
S
/
3
0
0
.
2
5
2
.
9
9
/
3
.
0
2
7
1
Z
H
C
/
1
3
L
Y
S
,
1
8
H
I
S
,
6
1
H
I
S
6
8
0
2
A
Z
P
/
1
4
,
1
5
,
1
6
,
1
7
,
5
7
,
6
0
,
6
1
,
8
0
,
8
1
G
L
U
,
P
R
O
,
T
H
R
,
A
R
G
,
A
S
P
,
V
A
L
,
G
L
Y
,
A
S
N
,
A
S
N
4
K
B
P
/
1
5
,
1
6
,
8
0
0
.
7
5
2
.
1
/
3
.
1
5
2
A
Z
P
/
9
2
T
H
R
,
9
4
G
L
Y
,
9
8
S
E
R
1
,
1
4
0
2
C
V
9
/
8
,
3
5
,
3
7
,
6
5
,
1
1
1
,
1
4
3
,
1
7
0
A
S
P
,
A
S
N
,
G
L
U
,
A
S
N
,
G
L
N
,
G
L
U
,
H
I
S
1
H
2
G
/
3
5
,
3
7
,
1
1
1
,
1
4
3
0
.
8
2
.
9
9
9
/
3
.
7
5
2
C
V
B
/
7
G
L
Y
,
9
V
A
L
,
1
9
4
T
H
R
;
1
T
7
0
/
1
6
8
G
L
Y
,
1
6
9
T
H
R
,
1
7
0
H
I
S
1
,
1
0
0
2
C
V
B
/
6
,
1
1
8
,
1
1
9
,
1
3
2
,
1
3
3
,
1
6
7
,
1
6
9
,
1
7
0
G
L
U
,
P
R
O
,
G
L
U
,
H
I
S
,
G
L
Y
,
G
L
U
,
P
R
O
,
A
L
A
1
O
A
1
/
6
,
1
3
2
,
1
6
7
0
.
6
2
.
7
/
3
.
3
5
2
C
V
B
/
4
2
C
Y
S
,
7
2
A
S
N
,
1
0
3
A
S
P
1
,
0
2
0
2
C
W
4
/
3
1
,
5
4
,
5
5
,
5
6
,
5
7
G
L
N
,
G
L
U
,
A
S
N
,
L
E
U
,
L
Y
S
1
T
S
Y
/
5
4
,
5
5
0
.
6
7
2
.
1
/
2
.
7
7
2
C
V
L
/
6
0
L
E
U
,
6
6
G
L
Y
,
1
1
9
C
Y
S
;
2
B
3
3
/
1
0
P
R
O
,
3
1
G
L
N
,
1
1
6
G
L
U
8
8
0
2
C
W
5
/
4
4
,
4
6
,
4
7
,
4
8
,
4
9
,
5
0
,
5
1
,
5
4
A
S
P
,
A
R
G
,
A
R
G
,
A
L
A
,
A
L
A
,
T
Y
R
,
A
L
A
,
G
L
U
1
Z
I
9
/
5
0
,
5
4
1
2
.
4
3
/
3
.
4
5
2
C
W
5
/
1
8
9
T
H
R
,
2
3
2
G
L
U
,
1
1
9
C
Y
S
1
,
0
0
0
2
C
W
Y
/
4
1
,
4
2
,
4
5
,
8
4
G
L
Y
,
V
A
L
,
L
E
U
,
L
E
U
1
B
Q
Q
/
4
1
0
.
3
3
2
.
4
3
/
3
.
0
2
C
W
Y
/
5
T
R
P
,
3
9
L
E
U
,
4
0
G
L
N
9
2
0
2
C
X
0
/
7
7
,
7
8
,
9
1
,
1
4
1
P
R
O
,
T
H
R
,
L
Y
S
,
V
A
L
1
U
0
F
/
1
4
1
0
.
2
5
3
.
6
0
/
5
.
5
0
2
C
X
1
/
6
6
C
Y
S
,
7
8
T
H
R
,
1
6
8
H
I
S
9
0
0
2
C
X
F
/
1
5
2
,
1
6
5
,
1
6
6
,
1
6
7
,
1
6
9
,
1
7
0
,
1
8
3
,
1
8
6
,
1
8
7
,
2
4
2
G
L
U
,
T
H
R
,
A
L
A
,
S
E
R
,
L
Y
S
,
A
S
P
,
A
L
A
,
A
R
G
,
L
E
U
,
M
E
T
1
Y
I
7
/
1
6
6
,
1
6
7
,
1
7
0
0
.
7
5
2
.
1
/
3
.
3
2
C
X
F
/
1
5
4
V
A
L
,
1
8
4
T
R
P
,
1
9
7
T
Y
R
9
2
0
2
C
X
L
/
1
8
3
,
1
8
4
,
1
8
6
,
1
8
7
,
1
8
8
,
1
9
1
,
1
9
3
A
L
A
,
T
R
P
,
A
R
G
,
L
E
U
,
A
L
A
,
G
L
N
,
L
Y
S
1
N
2
C
/
1
8
3
,
1
8
4
,
1
8
6
0
.
7
5
2
.
1
/
3
.
0
5
2
D
W
K
/
1
5
4
V
A
L
,
1
8
4
T
R
P
,
1
9
7
T
Y
R
9
2
0
2
D
4
R
/
4
9
,
5
1
,
6
4
,
6
6
,
8
7
,
9
1
,
1
3
0
S
E
R
,
T
R
P
,
G
L
U
,
G
L
U
,
T
Y
R
,
T
R
P
,
A
S
N
1
N
S
I
/
5
1
,
6
6
,
8
7
,
1
3
0
0
.
6
7
2
.
4
/
3
.
6
2
D
4
R
/
3
1
L
E
U
,
5
1
T
R
P
,
6
6
G
L
U
9
2
0
2
E
S
H
/
8
,
9
,
1
1
,
1
2
,
9
6
,
1
0
0
,
1
0
3
G
L
Y
,
P
H
E
,
G
L
Y
,
T
R
P
,
S
E
R
,
M
E
T
,
A
R
G
1
L
U
2
/
1
2
,
9
6
,
1
0
0
0
.
6
2
.
1
5
/
3
.
0
3
2
D
4
R
/
3
1
L
E
U
,
5
1
T
R
P
,
6
6
G
L
U
9
2
0
2
E
U
I
/
6
5
,
8
2
,
8
3
G
L
N
,
A
S
N
,
A
S
P
1
T
K
K
/
6
5
,
8
2
,
8
3
0
.
7
5
2
.
1
/
3
.
4
3
2
E
U
I
/
5
1
L
E
U
,
6
2
G
L
Y
,
6
4
C
Y
S
8
6
0
2
E
V
V
/
5
5
,
5
6
,
7
2
,
7
3
,
9
5
G
L
U
,
L
E
U
,
T
R
P
,
V
A
L
,
G
L
N
1
J
X
6
/
7
2
,
7
3
0
.
6
6
2
.
1
/
3
.
2
2
2
E
V
V
/
4
1
I
L
E
,
7
2
T
R
P
,
8
7
S
E
R
9
8
0
2
E
W
0
/
3
8
,
3
9
,
4
0
G
L
N
,
G
L
Y
,
I
L
E
1
V
R
Q
/
3
8
,
3
9
0
.
6
7
2
.
1
/
4
.
2
2
E
W
0
/
2
5
T
H
R
,
2
7
I
L
E
,
2
8
T
Y
R
9
8
0
2
E
W
C
/
2
7
,
2
8
,
2
9
,
1
1
5
L
E
U
,
A
S
N
,
T
Y
R
,
A
S
P
1
C
F
4
/
2
8
,
2
9
0
.
6
7
2
.
1
5
/
3
.
1
5
2
E
W
C
/
4
9
M
E
T
,
5
0
G
L
U
,
1
1
6
G
L
Y
6
9
4
2
E
W
R
/
2
7
,
4
5
,
4
7
,
9
1
,
9
8
T
H
R
,
A
S
P
,
G
L
N
,
G
L
U
,
L
Y
S
1
S
B
E
/
2
7
,
4
5
0
.
6
7
2
.
4
3
/
3
.
9
2
F
C
L
/
2
8
G
L
Y
,
3
1
S
E
R
,
1
3
0
L
E
U
8
0
0
2
F
0
6
/
5
,
4
6
,
6
9
,
9
9
,
1
1
2
G
L
N
,
A
R
G
,
T
H
R
,
T
Y
R
,
V
A
L
1
K
5
6
/
5
,
4
6
,
6
9
,
9
9
1
2
.
7
3
/
3
.
6
2
F
O
6
/
7
3
G
L
Y
,
1
1
3
I
L
E
,
1
2
1
C
Y
S
8
8
0
2
F
0
9
/
5
,
1
3
,
1
4
,
1
6
,
1
7
,
1
8
,
2
3
,
5
8
,
5
9
,
6
1
,
8
0
T
H
R
,
L
Y
S
,
P
R
O
,
T
H
R
,
V
A
L
,
L
Y
S
,
A
R
G
,
L
Y
S
,
G
L
Y
,
G
L
U
,
P
R
O
2
B
M
E
/
5
,
1
6
,
1
7
0
.
6
2
.
4
2
/
2
.
7
5
2
F
0
9
/
5
T
H
R
,
1
9
L
E
U
,
4
8
T
Y
R
8
4
0
2
F
2
2
/
4
2
,
4
3
,
4
5
,
4
6
,
4
8
H
I
S
,
I
L
E
,
A
R
G
,
V
A
L
,
A
S
P
1
D
Q
O
/
4
3
,
4
6
0
.
6
7
2
.
1
/
3
.
6
7
2
F
2
2
/
4
7
A
R
G
,
8
3
L
E
U
,
1
1
5
G
L
U
9
4
0
2
F
4
0
/
5
,
5
3
,
5
7
,
6
4
L
Y
S
,
G
L
U
,
L
Y
S
,
G
L
U
1
S
7
6
/
5
,
6
4
0
.
4
2
.
3
9
/
3
.
0
2
F
4
0
/
4
4
I
L
E
,
5
3
G
L
U
,
5
6
G
L
U
8
8
0
2
F
4
L
/
1
4
8
,
1
4
9
,
1
5
0
,
1
5
1
,
2
6
4
,
2
7
4
A
S
P
,
T
H
R
,
L
Y
S
,
G
L
U
,
S
E
R
,
L
Y
S
1
T
L
L
/
1
4
8
,
1
4
9
,
1
5
0
,
2
6
4
1
2
.
4
3
/
4
.
9
4
2
F
4
L
/
1
2
4
G
L
Y
,
1
2
6
I
L
E
,
1
7
5
H
I
S
1
,
1
0
0
2
F
4
N
/
6
1
,
6
3
,
6
4
,
6
5
,
1
1
4
T
H
R
,
G
L
Y
,
A
L
A
,
T
Y
R
,
T
Y
R
2
A
9
9
/
6
5
,
1
1
4
1
2
.
4
5
/
3
.
3
5
2
F
4
N
/
1
0
6
G
L
Y
,
1
4
4
T
Y
R
,
1
4
5
A
L
A
9
6
0
2
F
4
Z
/
7
0
,
9
6
,
1
1
1
,
1
1
2
,
1
4
4
,
1
5
1
G
L
U
,
L
E
U
,
M
E
T
,
L
Y
S
,
I
L
E
,
I
L
E
N
O
H
I
T
2
F
4
Z
/
8
9
T
Y
R
,
1
1
9
H
I
S
,
1
8
7
T
H
R
9
4
0
2
F
9
C
/
1
1
9
,
1
3
6
,
1
3
7
,
1
3
8
,
1
6
3
,
1
6
4
,
1
6
5
,
1
8
3
S
E
R
,
S
E
R
,
G
L
U
,
I
L
E
,
S
E
R
,
A
R
G
,
I
L
E
,
G
L
U
1
V
J
J
/
1
3
7
,
1
6
4
1
2
.
2
/
3
.
8
3
2
F
9
C
/
5
9
C
Y
S
,
6
0
T
R
P
,
6
1
I
L
E
9
6
0
2
F
B
7
/
2
1
,
2
3
,
3
3
L
Y
S
,
S
E
R
,
G
L
U
1
S
6
J
/
3
3
0
.
2
5
2
.
8
7
/
2
.
8
7
2
F
B
7
/
3
2
T
Y
R
,
7
7
P
H
E
,
7
9
G
L
Y
7
0
0
2
F
B
L
/
7
,
2
5
,
8
4
,
9
8
,
1
1
3
,
1
4
3
L
Y
S
,
G
L
N
,
L
Y
S
,
G
L
U
,
G
L
U
,
A
S
N
1
A
X
D
/
8
4
,
1
4
3
0
.
6
7
2
.
1
/
4
.
2
2
F
B
L
/
9
8
G
L
N
,
1
1
2
V
A
L
,
1
3
0
T
R
P
8
2
0
2
F
B
M
/
2
9
9
,
3
0
6
,
3
3
4
,
3
9
6
,
3
9
7
,
3
9
8
,
4
2
5
L
E
U
,
L
Y
S
,
P
H
E
,
A
L
A
,
S
E
R
,
I
L
E
,
A
S
P
1
E
X
M
/
3
9
7
,
3
9
8
,
4
2
5
1
2
.
4
3
/
4
.
2
3
2
F
B
M
/
3
9
4
L
E
U
,
4
2
6
G
L
Y
,
5
1
2
G
L
U
1
,
0
0
0
2
F
D
S
/
5
,
9
,
9
7
,
1
0
4
,
1
3
4
,
1
3
5
,
2
8
8
L
Y
S
,
A
R
G
,
L
Y
S
,
L
Y
S
,
P
R
O
,
T
H
R
,
G
L
U
1
S
5
D
/
9
7
,
1
3
4
,
1
3
5
1
2
.
1
/
3
.
3
2
F
D
S
/
1
0
3
T
Y
R
,
1
0
4
L
Y
S
,
1
0
6
A
S
N
9
6
0
2
F
E
1
/
6
,
7
,
8
,
4
2
A
S
P
,
A
L
A
,
S
E
R
,
G
L
U
1
N
T
4
/
6
,
7
,
8
0
.
7
5
2
.
4
5
/
2
.
7
7
2
F
E
1
/
3
5
T
H
R
,
1
0
3
T
Y
R
,
1
1
4
L
E
U
1
,
0
0
0
2
F
F
G
/
3
9
,
4
0
,
4
9
V
A
L
,
L
Y
S
,
G
L
U
5
E
A
T
/
3
9
,
4
0
0
.
6
7
2
.
1
/
2
.
7
2
F
F
G
/
3
7
C
Y
S
,
5
0
L
E
U
,
7
3
G
L
U
7
2
0
2
F
F
I
/
1
7
,
1
8
,
1
1
0
,
1
4
1
,
1
6
6
,
1
9
7
,
2
3
6
S
E
R
,
H
I
S
,
A
R
G
,
G
L
U
,
A
S
P
,
L
Y
S
,
A
S
P
1
B
H
5
/
1
8
,
1
4
1
,
1
6
6
1
2
.
4
/
1
1
.
0
4
2
F
F
I
/
1
7
S
E
R
,
1
8
H
I
S
,
5
8
V
A
L
1
,
0
6
0
2
F
F
M
/
3
9
,
4
4
,
4
7
,
5
2
,
6
5
,
7
7
P
R
O
,
A
S
P
,
L
Y
S
,
L
Y
S
,
L
Y
S
,
L
Y
S
1
U
E
J
/
4
4
,
5
2
0
.
6
7
2
.
1
5
/
3
.
0
3
2
F
F
M
/
6
2
S
E
R
,
6
3
V
A
L
,
7
1
T
R
P
7
8
0
B
o
l
d
n
u
m
b
e
r
s
a
r
e
g
i
v
e
n
f
o
r
r
e
s
i
d
u
e
s
r
e
c
o
g
n
i
z
e
d
b
y
F
O
D
a
n
d
a
t
l
e
a
s
t
o
n
e
o
f
t
w
o
a
n
a
l
y
z
e
d
m
e
t
h
o
d
s
(
S
u
M
o
,
P
r
o
F
u
n
c
)
.
I
t
a
l
i
c
i
z
e
d
n
u
m
b
e
r
s
i
n
d
i
c
a
t
e
w
h
e
n
t
h
e
p
o
s
i
t
i
o
n
s
o
f
a
m
i
n
o
a
c
i
d
s
d
i
f
f
e
r
b
y
1
(
c
l
o
s
e
s
t
n
e
i
g
h
b
o
r
s
)
v
e
r
s
u
s
t
h
e
C
S
A
c
l
a
s
s
i
f
i
c
a
t
i
o
n
o
r
v
e
r
s
u
s
t
h
e
p
o
s
i
t
i
o
n
f
o
u
n
d
b
y
F
O
D
.
S
u
M
o
r
e
s
u
l
t
s
a
r
e
c
h
a
r
a
c
t
e
r
i
z
e
d
b
y
:
n
u
m
b
e
r
s
o
f
a
m
i
n
o
a
c
i
d
c
o
m
m
o
n
f
o
r
F
O
D
a
n
d
S
u
M
o
.
F
O
D
/
S
u
M
o
r
a
t
i
o
e
x
p
r
e
s
s
e
s
t
h
e
p
a
r
t
o
f
a
m
i
n
o
a
c
i
d
s
c
o
m
m
o
n
f
o
r
F
O
D
v
e
r
s
u
s
t
h
e
t
o
t
a
l
n
u
m
b
e
r
o
f
r
e
s
i
d
u
e
s
p
o
i
n
t
e
d
o
u
t
b
y
S
u
M
o
a
n
d
a
d
d
i
t
i
o
n
a
l
l
y
b
y
t
h
e
r
e
l
a
t
i
o
n
b
e
t
w
e
e
n
t
h
e
S
u
M
o
s
c
o
r
e
o
f
t
h
e
s
o
l
u
t
i
o
n
c
l
o
s
e
s
t
t
o
t
h
e
o
n
e
b
a
s
e
d
o
n
t
h
e
F
O
D
m
o
d
e
l
(
h
i
g
h
e
s
t
n
u
m
b
e
r
o
f
c
o
m
m
o
n
p
o
s
i
t
i
o
n
s
)
a
n
d
t
h
e
s
c
o
r
e
v
a
l
u
e
o
f
b
e
s
t
h
i
t
,
a
s
e
s
t
i
m
a
t
e
d
b
y
S
u
M
o
.
P
r
o
F
u
n
c
s
c
o
r
e
v
a
l
u
e
s
a
r
e
g
i
v
e
n
a
c
c
o
r
d
i
n
g
t
o
t
h
e
p
r
o
g
r
a
m
c
l
a
s
s
i
f
i
c
a
t
i
o
n
.
d
o
i
:
1
0
.
1
3
7
1
/
j
o
u
r
n
a
l
.
p
c
b
i
.
0
0
3
0
0
9
4
.
t
0
0
4
PLoS Computational Biology | www.ploscompbiol.org May 2007 | Volume 3 | Issue 5 | e94 0915
Active Site Recognition In Silico
In Table 3, the columns representing FOD results show the
numbers of residues recognized by this method: agreement
with CSA classication (underlined), and residues dened by
two methodsFOD and at least one of two other methods
(SuMo, ProFunc) as biological activity-related residues (in
bold). Where the position of the amino acids differed by 1
(closest neighbors) versus the CSA classication or versus the
position found by SuMo or ProFunc, the numbers are in
italics in Table 3. The description of the SuMo and ProFunc
columns in Table 3 is given below (Comparative Analysis).
The residues recognized as potentially responsible for
binding site creation in proteins of unrecognized biological
function are given in Table 4.
Prole plots of D
~
H were used to identify the positions
recognized by the FOD model as related to functional sites.
The prole plots of D
~
H were examined for proteins of known
and unknown biological activity (Figures 1, S1, and S2; and
Figures 2, S3, and S4; respectively). The residues with the
highest D
~
H appeared as peaks in the prole plots and were
predicted to be functionally important. The values of D
~
H
indicate the level of hydrophobicity irregularity. It is inter-
preted that the higher the D
~
Hvalue, the higher the deciency
of hydrophobicity with respect to its idealized distribution
according to Gauss function. Thus, the D
~
H maxima identied
as being represented by a particular amino acid point out the
residues in the surrounding area where the hydrophobicity
deciency is signicant. In most cases, this deciency is caused
simply by the presence of a cavity or by the highly irregular
distribution of side chains. The D
~
H prole together with the
color scale visualizes the magnitude of the irregularity. The
same scale applied to the 3-D presentation of the protein
molecule is able to visualize the location of high D
~
H values,
particularly in the protein structure. It can be seen that the
residues with high D
~
H values appear to be placed in close
mutual vicinity, often creating a cleft, which can be responsible
for ligand (substrate) binding.
The 3-D representations for selected proteins of known
function are shown in Figure 3, and for selected proteins of
unknown biological function in Figure 4. Other proteins
under consideration are presented in Figures S5S7 and
Figures S8S10.
The color scale expressing the magnitude of D
~
H is as
follows: red, high D
~
H; yellow, average D
~
H; green, low and
negative D
~
H. The white color denotes the experimentally
veried amino acids as responsible for catalytic activity
(according to the CSA database). In most cases, the set of
amino acids selected according to the FOD model is larger
than the set of residues classied by CSA. This is because the
D
~
H prole also selects amino acids that are close in space,
which create well-dened putative cavities that accompany
the residues responsible for enzymatic activity. Amino acids
indicated by FOD as belonging to the binding cavity are in
space lling form.
The molecules presented in Figure 3A and 3B are selected
to show the best results; the molecules presented in Figure 3C
and 3D demonstrate the cases of low accordance. Some of the
protein molecules with high D
~
H values shown in Figure 3A
and 3B appeared to be highly accordant to the active site
location. Other proteins with high D
~
H values (Figure 3C and
3D) are not exactly located in the positions of the amino acids
that make up the catalytic site. Nevertheless, the analysis of
the larger set of proteins may suggest that the specicity of
Figure 1. Profile Plots of Hydrophobicity Deviation DH
; yellow, middle DH
.
doi:10.1371/journal.pcbi.0030094.g001
PLoS Computational Biology | www.ploscompbiol.org May 2007 | Volume 3 | Issue 5 | e94 0916
Active Site Recognition In Silico
the mutual location of the residues represented by high D
~
H
values versus the position of the enzymatic site may be
classied according to enzyme specicity.
One hypothesis is that the residues responsible for complex
xation (protein and ligand or substrate) were selected by the
FOD model. Another explanation for the mismatch between
experimentally identied and automatically identied resi-
dues is simply that for multimeric chains, only the rst chain
was present in the analysis.
Comparative Analysis
The results summarize the comparison of the model
applied to identify the ligand-binding site and two other
methods dedicated to the same purpose: ProFunc and SuMo
are given in Table 3 for proteins of known biological function
and in Table 4 for proteins of unknown biological function.
Table 3 presents the list of proteins (the PDB accession
numbers are given) accompanied by the amino acids
identied as function-related according to CSA classication.
SuMo results (for each SuMo search in question) show the
comparison with the FOD model for only one example of a
functional site found by SuMo and present the residue
numbers, which appeared to be common for these two
methods (column 4 of Table 3). The limitation to compare
only one SuMo result for one search is caused by the
specicity of output generated by the SuMo procedure, which
produces an enormous number of possible solutions for one
particular protein molecule (in most cases, thousands of
variants). Each solution is presented with regard to another
protein (PDB number given), the functional site of which
seems to be related to that found in the molecule under
analysis. This procedure proposes a list of functional sites
that sometimes represent changed functionality (e.g., ligands
of different structure/characteristics are bound). One func-
tional site with a functional site of the same/closest properties
is selected. The presentation of all results is impossible to
present here in complete form.
In column 5 of Table 3, the ratio of commonly recognized
residues to the number of all residues recognized by SuMo for
that hit is shown. As we see, the total number of amino acids
classied by SuMo in most cases is the same or exceeds the
number identied by the FOD model.
The numbers given in the last two columns (ProFunc) of
Table 3 represent positions of amino acids recognized by
ProFunc by its best hit and method score. This is why the
number of commonly recognized residues (given in bold) is
lower than in the SuMo comparison.
The results describing the analysis of proteins of unknown
biological function are shown in Table 4. The presentation is
similar to that for proteins of known biological function with
an obvious lack of underlined positions (no CSA classication
available). The SuMo results are additionally characterized by
the relation between the SuMo score of the solution closest to
that based on the FOD model (highest number of common
positions) and the score value of best hit, as estimated by
SuMo.
The comparison of the methods selected for analysis is
generally very difcult. The SuMo and ProFunc methods
represent the methodology of the stochastic nature. The FOD
seems to be a more heuristic method. SuMo and ProFunc
produce very large outputs with long lists of possible
Figure 2. Profile Plots of Hydrophobicity Deviation DH
. The protein originated in the Thermus thermophilus genome (C) and the protein originated the
Staphylococcus aureus genome (D) are examples of dispersed localization of residues representing high DH
. The protein originated in the Thermus thermophilus genome (C) and the protein originated in the
Staphylococcus aureus genome (D) are examples of dispersed localization of residues representing high DH