
Structural Classification of Proteins database

The Structural Classification of Proteins
(SCOP) database is a largely manual
classification of protein structural
domains based on similarities of their
structures and amino acid sequences. A
motivation for this classification is to
determine the evolutionary relationship
between proteins. Proteins with the same
shapes but having little sequence or
functional similarity are placed in
different superfamilies, and are assumed
to have only a very distant common
ancestor. Proteins having the same shape
and some similarity of sequence and/or
function are placed in "families", and are
assumed to have a closer common
ancestor.

SCOP

Content
Description: Protein Structure Classification

Contact
Research center: Laboratory of Molecular Biology
Authors: Alexey G. Murzin, Steven E. Brenner, Tim J. P. Hubbard, and Cyrus Chothia
Primary citation: PMID 7723011
Release date: 1994

Access
Website: http://scop.mrc-lmb.cam.ac.uk/scop/

Miscellaneous
Version: 1.75 (June 2009; 110,800 domains in 38,221 structures classed as 3,902 families)[1]
Curation policy: manual

SCOPe

Content
Description: SCOP - extended

Contact
Authors: Naomi K. Fox, Steven E. Brenner, and John-Marc Chandonia
Primary citation: PMID 24304899

Access
Website: https://scop.berkeley.edu

Miscellaneous
Version: 2.07 (March 2018; 276,231 domains in 87,224 structures classed as 4,919 families)[1]
Curation policy: manual (new classifications) and automated (new structures, BLAST)

The SCOP database is freely accessible on
the internet. SCOP was created in 1994 in
the Centre for Protein Engineering and
the Laboratory of Molecular Biology.[2] It
was maintained by Alexey G. Murzin and
his colleagues in the Centre for Protein
Engineering until its closure in 2010 and
subsequently at the Laboratory of
Molecular Biology in Cambridge,
England.[3][4][5]
Similar to CATH and Pfam databases,
SCOP provides a classification of
individual structural domains of proteins,
rather than a classification of the entire
proteins which may include a significant
number of different domains.

As of January 2014, the work on SCOP has
been discontinued and the last official
version of SCOP is 1.75 (released June
2009). Since then SCOPe from UC
Berkeley has been responsible for
updating the database in a compatible
manner, with a combination of
automated and manual methods. As of
April 2019, the latest release is SCOPe
2.07 (March 2018).[6] The prototype of a
new Structural Classification of Proteins 2
(SCOP2) database from Cambridge has
been made publicly available. SCOP2
replaces the SCOP hierarchy with a
directed acyclic graph for more flexibility
and retains its best features. It is
incompatible with SCOP and is yet to be
populated.[6]

Hierarchical organisation
The source of protein structures is the
Protein Data Bank. The unit of
classification of structure in SCOP is the
protein domain. What the SCOP authors
mean by "domain" is suggested by their
statement that small proteins and most
medium-sized ones have just one
domain,[7] and by the observation that
human hemoglobin,[8] which has an α2β2
structure, is assigned two SCOP domains,
one for the α and one for the β subunit.

The shapes of domains are called "folds"
in SCOP. Domains belonging to the same
fold have the same major secondary
structures in the same arrangement with
the same topological connections. 1195
folds are given in SCOP version 1.75.
Short descriptions of each fold are given.
For example, the "globin-like" fold is
described as core: 6 helices; folded leaf,
partly opened. The fold to which a
domain belongs is determined by
inspection, rather than by software.

The levels of SCOP are as follows.

1. Class: Types of folds, e.g., beta
sheets.
2. Fold: The different shapes of
domains within a class.
3. Superfamily: The domains in a fold
are grouped into superfamilies,
which have at least a distant
common ancestor.
4. Family: The domains in a
superfamily are grouped into
families, which have a more recent
common ancestor.
5. Protein domain: The domains in
families are grouped into protein
domains, which are essentially the
same protein.
6. Species: The domains in "protein
domains" are grouped according to
species.
7. Domain: part of a protein. For simple
proteins, it can be the entire protein.
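
The ordering of the seven levels above can be sketched as a small enumeration. This is only an illustration: the names `ScopLevel` and `is_broader` are hypothetical and do not come from SCOP or any SCOP software.

```python
from enum import IntEnum


class ScopLevel(IntEnum):
    """SCOP levels as listed above, from broadest (1) to most specific (7).

    The identifiers are illustrative; SCOP itself does not define this enum.
    """
    CLASS = 1
    FOLD = 2
    SUPERFAMILY = 3
    FAMILY = 4
    PROTEIN = 5
    SPECIES = 6
    DOMAIN = 7


def is_broader(a: ScopLevel, b: ScopLevel) -> bool:
    """Return True if level `a` sits above level `b` in the hierarchy."""
    return a < b


if __name__ == "__main__":
    # A fold groups several superfamilies, so it is the broader level.
    print(is_broader(ScopLevel.FOLD, ScopLevel.SUPERFAMILY))
    print([level.name for level in ScopLevel])
```

Using an integer-backed enumeration keeps the levels comparable, so "broader than" reduces to a numeric comparison.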

Classes
The broadest groups in SCOP are the
protein fold classes. These classes group
structures with similar secondary
structure composition, but different
overall tertiary structures and
evolutionary origins. This is the top level
"root" of the SCOP hierarchical
classification.

1. All alpha proteins [46456] (284):
Domains consisting of α-helices
2. All beta proteins [48724] (174):
Domains consisting of β-sheets
3. Alpha and beta proteins (a/b)
[51349] (147): Mainly parallel beta
sheets (beta-alpha-beta units)
4. Alpha and beta proteins (a+b)
[53931] (376): Mainly antiparallel
beta sheets (segregated alpha and
beta regions)
5. Multi-domain proteins (alpha and
beta) [56572] (66): Folds consisting of
two or more domains belonging to
different classes
6. Membrane and cell surface proteins
and peptides [56835] (58): Does not
include proteins in the immune
system
7. Small proteins [56992] (90): Usually
dominated by metal ligand, cofactor,
and/or disulfide bridges
8. Coiled-coil proteins [57942] (7): Not a
true class
9. Low resolution protein structures
[58117] (26): Peptides and
fragments. Not a true class
10. Peptides [58231] (121): Peptides and
fragments. Not a true class
11. Designed proteins [58788] (44):
Experimental structures of proteins
with essentially non-natural
sequences. Not a true class

The number in brackets, called a "sunid",
is a SCOP unique integer identifier for
each node in the SCOP hierarchy. The
number in parentheses indicates how
many elements are in each category. For
example, there are 284 folds in the "All
alpha proteins" class. Each member of the
hierarchy is a link to the next level of the
hierarchy.
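
As an illustration of the notation just described, the following sketch splits an entry such as `All alpha proteins [46456] (284)` into its name, sunid, and member count. The regular expression and the names `ClassEntry` and `parse_entry` are assumptions made for this example; they are not part of any SCOP distribution or file format specification.

```python
import re
from typing import NamedTuple


class ClassEntry(NamedTuple):
    name: str    # text before the bracketed sunid
    sunid: int   # SCOP unique integer identifier
    count: int   # number of elements in the category


# Hypothetical pattern for lines written as "Name [sunid] (count)".
_ENTRY = re.compile(r"^(?P<name>.+?)\s*\[(?P<sunid>\d+)\]\s*\((?P<count>\d+)\)")


def parse_entry(text: str) -> ClassEntry:
    """Parse one listing entry; raise ValueError if it does not match."""
    match = _ENTRY.match(text.strip())
    if match is None:
        raise ValueError(f"not a 'name [sunid] (count)' entry: {text!r}")
    return ClassEntry(match["name"], int(match["sunid"]), int(match["count"]))


if __name__ == "__main__":
    print(parse_entry("All alpha proteins [46456] (284): Domains consisting of α-helices"))
    print(parse_entry("Alpha and beta proteins (a/b) [51349] (147)"))
```

The name group is matched lazily so that a parenthesised suffix such as `(a/b)` stays part of the name instead of being mistaken for the count.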

Folds

Each class contains a number of distinct
folds. This classification level indicates
similar tertiary structure, but not
necessarily evolutionary relatedness. For
example, the "All-α proteins" class
contains >280 distinct folds, including:
Globin-like (core: 6 helices; folded leaf,
partly opened), long alpha-hairpin (2
helices; antiparallel hairpin, left-handed
twist) and Type I dockerin domains
(tandem repeat of two calcium-binding
loop-helix motifs, distinct from the EF-
hand).

Superfamilies

Domains within a fold are further
classified into superfamilies. This is the
largest grouping of proteins for which
structural similarity is sufficient to
indicate evolutionary relatedness and
therefore a shared common ancestor.
However, this ancestor is presumed to be
distant, because the different members of
a superfamily have low sequence
identities. For example, the two
superfamilies of the "Globin-like" fold are:
the Globin superfamily and alpha-helical
ferredoxin superfamily (contains two Fe4-
S4 clusters).

Families

Protein families are more closely related
than superfamilies. Domains are placed
in the same family if they have either:

1. >30% sequence identity
2. some sequence identity (e.g., 15%)
and perform the same function
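
The two criteria above can be written as a simple predicate. This is a minimal sketch, not SCOP's actual procedure, which relies on manual curation. The function name `same_family` is hypothetical, and the 15% floor is only the illustrative figure quoted above, exposed as a parameter for that reason.

```python
def same_family(identity_pct: float,
                same_function: bool,
                low_identity_floor: float = 15.0) -> bool:
    """Apply the two family criteria listed above.

    identity_pct       -- pairwise sequence identity, in percent
    same_function      -- whether the two domains perform the same function
    low_identity_floor -- assumed cut-off for "some sequence identity";
                          the text gives 15% only as an example
    """
    # Criterion 1: more than 30% sequence identity is sufficient on its own.
    if identity_pct > 30.0:
        return True
    # Criterion 2: lower identity must be backed by a shared function.
    return same_function and identity_pct >= low_identity_floor


if __name__ == "__main__":
    print(same_family(42.0, same_function=False))
    print(same_family(18.0, same_function=True))
    print(same_family(18.0, same_function=False))
```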
The similarity in sequence and structure is
evidence that these proteins have a closer
evolutionary relationship than do
proteins in the same superfamily.
Sequence tools, such as BLAST, are used
to assist in placing domains into
superfamilies and families. For example,
the four families in the "globin-like"
superfamily of the "globin-like" fold are
truncated hemoglobins (which lack the first
helix), nerve tissue mini-hemoglobins (which
lack the first helix but are otherwise more
similar to conventional globins than to the
truncated ones), globins (heme-binding
proteins), and phycocyanin-like
phycobilisome proteins (oligomers of two
different types of globin-like subunits
that contain two extra helices at the
N-terminus and bind a bilin chromophore).
Families in SCOP are each assigned a
concise classification string, sccs, where
the letter identifies the class to which the
domain belongs; the following integers
identify the fold, superfamily, and family,
respectively (e.g., a.1.1.2 for the "Globin"
family).[9]
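
To make the sccs layout concrete, the sketch below splits a string such as `a.1.1.2` into its class letter and the fold, superfamily, and family integers. The names `Sccs`, `parse_sccs`, and `same_superfamily` are hypothetical helpers written for this example, not an existing library API.

```python
from typing import NamedTuple


class Sccs(NamedTuple):
    scop_class: str   # single letter identifying the class
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> Sccs:
    """Split a family-level sccs string like 'a.1.1.2' into its four parts."""
    parts = sccs.strip().split(".")
    if len(parts) != 4 or len(parts[0]) != 1 or not parts[0].isalpha():
        raise ValueError(f"expected 'letter.fold.superfamily.family', got {sccs!r}")
    # int() raises ValueError itself if a component is not numeric.
    fold, superfamily, family = (int(p) for p in parts[1:])
    return Sccs(parts[0], fold, superfamily, family)


def same_superfamily(a: str, b: str) -> bool:
    """Two families share a superfamily when class, fold and superfamily agree."""
    return parse_sccs(a)[:3] == parse_sccs(b)[:3]


if __name__ == "__main__":
    print(parse_sccs("a.1.1.2"))
    print(same_superfamily("a.1.1.2", "a.1.1.1"))
```

Because each component of the string is nested inside the one to its left, comparing prefixes of the parsed tuple is enough to test membership at any level.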

PDB entry domains

A "TaxId" is the taxonomy ID number and
links to the NCBI taxonomy browser,
which provides more information about
the species to which the protein belongs.
Clicking on a species or isoform brings up
a list of domains. For example, the
"Hemoglobin, alpha-chain from Human
(Homo sapiens)" protein has >190 solved
protein structures, such as 2dn3
(complexed with cmo), and 2dn1
(complexed with hem, mbn, oxy). Clicking
on the PDB numbers is supposed to
display the structure of the molecule, but
the links are currently broken (links work
in pre-SCOP).

Example
Most pages in SCOP contain a search box.
Entering "trypsin +human" retrieves
several proteins, including the protein
trypsinogen from humans. Selecting that
entry displays a page that includes the
"lineage", which is at the top of most
SCOP pages.

Human trypsinogen lineage

1. Root: scop
2. Class: All beta proteins [48724]
3. Fold: Trypsin-like serine proteases
[50493]
barrel, closed; n=6, S=8; greek-key
duplication: consists of two
domains of the same fold
4. Superfamily: Trypsin-like serine
proteases [50494]
5. Family: Eukaryotic proteases [50514]
6. Protein: Trypsin(ogen) [50515]
7. Species: Human (Homo sapiens)
[TaxId: 9606] [50519]

Searching for "Subtilisin" returns the
protein, "Subtilisin from Bacillus subtilis,
carlsberg", with the following lineage.

Subtilisin from Bacillus subtilis,
carlsberg lineage
1. Root: scop
2. Class: Alpha and beta proteins (a/b)
[51349]
Mainly parallel beta sheets (beta-
alpha-beta units)
3. Fold: Subtilisin-like [52742]
3 layers: a/b/a, parallel beta-sheet
of 7 strands, order 2314567; left-
handed crossover connection
between strands 2 & 3
4. Superfamily: Subtilisin-like [52743]
5. Family: Subtilases [52744]
6. Protein: Subtilisin [52745]
7. Species: Bacillus subtilis, carlsberg
[TaxId: 1423] [52746]

Although both of these proteins are
proteases, they do not even belong to the
same fold, which is consistent with them
being an example of convergent
evolution.
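
The comparison of the two lineages above can be sketched in code by walking both from the root and stopping at the first level where they differ. The sunids are copied from the lineages listed above; the list layout and the function name `deepest_shared_level` are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple

Node = Tuple[str, object]  # (level name, identifier)

# Lineages as listed above; the root has no sunid, so its label is used.
TRYPSINOGEN: List[Node] = [
    ("Root", "scop"), ("Class", 48724), ("Fold", 50493),
    ("Superfamily", 50494), ("Family", 50514),
    ("Protein", 50515), ("Species", 50519),
]
SUBTILISIN: List[Node] = [
    ("Root", "scop"), ("Class", 51349), ("Fold", 52742),
    ("Superfamily", 52743), ("Family", 52744),
    ("Protein", 52745), ("Species", 52746),
]


def deepest_shared_level(a: List[Node], b: List[Node]) -> Optional[str]:
    """Return the name of the last level at which both lineages agree."""
    shared = None
    for node_a, node_b in zip(a, b):
        if node_a != node_b:
            break
        shared = node_a[0]
    return shared


if __name__ == "__main__":
    # The two proteases diverge immediately below the root.
    print(deepest_shared_level(TRYPSINOGEN, SUBTILISIN))
```

A result of "Root" means the two domains share no class, fold, or superfamily, which matches the observation that they do not belong to the same fold.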

Comparison to other
classification systems
SCOP classification is more dependent on
manual decisions than the semi-
automatic classification by CATH, its chief
rival. Human expertise is used to decide
whether certain proteins are evolutionarily
related and therefore should be assigned
to the same superfamily, or their
similarity is a result of structural
constraints and therefore they belong to
the same fold. Another database, FSSP, is
purely automatically generated (including
regular automatic updates) but offers no
classification, allowing the user to draw
their own conclusion as to the
significance of structural relationships
based on the pairwise comparisons of
individual protein structures.

SCOP successors

By 2009, the original SCOP database
manually classified 38,000 PDB entries
into a strictly hierarchical structure. With
the accelerating pace of protein structure
publications, the limited automation of
classification could not keep up, leading
to a non-comprehensive dataset. The
Structural Classification of Proteins
extended (SCOPe) database was released
in 2012 with far greater automation of the
same hierarchical system and is fully
backward compatible with SCOP. In
2014, manual curation was reintroduced
into SCOPe to maintain accurate structure
assignment. As of February 2015, SCOPe
2.05 classified 71,000 of the 110,000 total
PDB entries.[6]

The Evolutionary Classification of Protein
Domains (ECOD) database released in
2014 is a similar expansion of SCOP.
Unlike the compatible SCOPe, it renames
the class-fold-superfamily-family
hierarchy into an architecture-X-
homology-topology-family (A-XHTF)
grouping, with the last level mostly
defined by Pfam and supplemented by
HHsearch clustering for uncategorized
sequences.[10] ECOD has the best PDB
coverage of all three successors: it covers
every PDB structure, and is updated
biweekly.[11] The direct mapping to Pfam
has proven useful to Pfam curators who
use the homology-level category to
supplement their "clan" grouping.[12]

SCOP2 is a prototype classification system
that aims to better represent the evolutionary
complexity inherent in protein structure
evolution. It is therefore not a simple
hierarchy, but a directed acyclic graph
network connecting protein superfamilies
representing structural and evolutionary
relationships such as circular
permutations, domain fusion and domain
decay. Consequently, domains are not
separated by strict fixed boundaries, but
rather are defined by their relationships
to the most similar other structures. As of
March 2018, the SCOP2 prototype
classifies 995 PDB entries.[6]
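
The difference between a strict hierarchy and a directed acyclic graph can be illustrated with a toy example in which one node has two parents, as could happen for a domain produced by fusion. All node names below are hypothetical and the structure is not taken from SCOP2; it only shows the general idea that a node in such a graph may have more than one parent.

```python
from typing import Dict, List, Set

# Toy graph: each node maps to its parents. In a tree every node would have
# at most one parent; here "fused_domain" has two. Names are invented.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "superfamily_A": ["root"],
    "superfamily_B": ["root"],
    "fused_domain": ["superfamily_A", "superfamily_B"],
}


def ancestors(node: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Collect every node reachable by following parent links upward."""
    seen: Set[str] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    # Both superfamilies appear among the ancestors of the fused domain.
    print(sorted(ancestors("fused_domain", PARENTS)))
```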

See also
Structural alignment
CATH
FSSP
SUPERFAMILY
Pfam

References
1. Chandonia JM, Fox NK, Brenner SE
(January 2019). "SCOPe: classification
of large macromolecular structures
in the structural classification of
proteins-extended database".
Nucleic Acids Research. 47 (D1):
D475–D481.
