
Protein Structure Alignments and Classifications

Protein Analysis Workshop 2006

Alain Schenkel, Liisa Holm

Institute of Biotechnology, University of Helsinki

Overview

Introduction:
What is it? Why?
Sequence vs structural alignment

Structural Alignments:
Alignment algorithms: Dali, ...
Applications: DaliLite, Dali Database

Structure Classification:
The fold space,
SCOP, CATH,
The Dali Fold Index.

Structural Alignments

The goal is to compare protein structures and provide a measure of their level of structural similarity.

One way of phrasing the question is:

Given two structures (PDB), what is the “best” way of superimposing them?

Structural Alignments

More generally: a structural alignment provides a list of correspondences between the residues of two proteins, based on their 3D structures:

DSSP  .ellleEEEEEEEEEEEEeelllllhhhhhhhhlllLLLEEEEEEEEEeelllleeEEEE
Query .lvsgiKYILQVEIGRTTcpkssgdlqscefhdepeMAKYTTCTFVVYsipwlnqiKLLE  104
ident              |                                           |
Sbjct qggplpPRLAYYVILEAG                  KPGVKEGLVDLA        SLSV   87
DSSP  llllllLLEEEEEEEELL                  LLLEEEEEEELL        LLEE

We will need a measure of structural similarity.

Not to be confused with a sequence alignment!
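As an illustration, the correspondences in the Query/Sbjct example above can be held as a list of aligned residue pairs. The sketch below assumes that the three uppercase segments of the Query and Sbjct lines pair up block by block and column by column (an assumption read off the letter casing), and counts how many aligned pairs are identical amino acids. The names `aligned_blocks`, `aligned_pairs` and `sequence_identity` are illustrative, not DaliLite output fields.

```python
# Uppercase segments of the Query and Sbjct lines, assumed to pair up
# block by block and column by column (illustrative assumption).
aligned_blocks = [
    ("KYILQVEIGRTT", "PRLAYYVILEAG"),
    ("MAKYTTCTFVVY", "KPGVKEGLVDLA"),
    ("KLLE", "SLSV"),
]


def aligned_pairs(blocks):
    """Flatten aligned blocks into a list of (query_residue, subject_residue)."""
    pairs = []
    for query_block, subject_block in blocks:
        if len(query_block) != len(subject_block):
            raise ValueError("aligned blocks must have equal length")
        pairs.extend(zip(query_block, subject_block))
    return pairs


def sequence_identity(pairs):
    """Fraction of aligned pairs whose two residues are the same amino acid."""
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)


pairs = aligned_pairs(aligned_blocks)
identical = sum(1 for a, b in pairs if a == b)
print(len(pairs), identical, round(sequence_identity(pairs), 3))
```

Under this reading, 2 of the 28 structurally aligned positions are identical, which agrees with the two `|` marks on the ident line: the structures match although the sequences barely do.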

Sequence vs structural alignments

Different problems:

Meaning of “similarity” for sequence:
biological information enforced, between pairs of aa only.

Meaning of “similarity” for structure:
purely structural, must consider structural environment (eg, aa in helix or sheet).

Sequence and structural alignments give different types of information.

Sequence vs structural alignments

Different objectives:

Sequence:
Evolutionary model,
Function is conserved (catalytic residues, ...),
Detection of homology.

Structure:
Physical model (stability),
Structure is conserved,
Detection of structural similarity.

Motivations

Main:

Structure classification.
Detection of remote homology.
Function assignment.

Also:

Used in structure prediction: evaluation of predicted structure against various template structures.

Protein evolution

Evolutionary pressure is on function.

Structure plays a key role in protein function.

General response to mutation is structural change, but many mutations will not (or only slightly) change the structure.

Structure is more conserved than sequence in distantly related proteins.

In homologous proteins with similar function: active site residues are conserved, non-functional residues may vary considerably.

Caveat

There are cases where a good sequence alignment is not observed in the structure.

Eg: 1hmpA (phosphoribosyltransferase, human) and 1piv1 (coat protein, virus). Sequence similarity is 40% over red regions, but:


Using sequence and structural alignments

Low sequence similarity (below twilight zone 20-30%):

Structural alignment guides sequence alignment.

If conserved residues are clustered in active site: strong evidence of homology.

Convergent evolution: structural similarity alone does not rule out convergent evolution.

Sequence alignment helps to distinguish between convergent and divergent evolution.

Example

Consider the two proteins:

sperm whale myoglobin (PDB: 1jw8),
cyanobacterial C-phycocyanin (PDB: 1cpc).

We know:

sequence identity: 15%,
functionally different (oxygen transport/photosynthesis), but common features (binding, transport),
homology is questionable.

Structural comparison:

similarity is evident,
Dali Z-score 8.5 (it’s good),
same fold.

Stronger case for homology.

Last common ancestor: 3 billion years ago.

The Structure Alignment Problem

Pairwise alignment
Database searches
Demo Dali

Structural alignment algorithms

Different problems:

Pairwise alignment
Multiple alignment
Database searches

Different types:

Global alignment
Local alignment

Structural Similarity

How can we measure the level of structural similarity between two proteins?


Measures of Similarity

Local measures:

Root mean square deviation
Intramolecular distance
Secondary structure orientation

Overall measures (less sensitive):

Secondary structure content
Histogram of distances

Root Mean Square Deviation

Find an alignment which minimizes the root mean square deviation:

\[ \mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} d_i^2} \]

where $d_i$ is the distance between the two residues in the $i$-th pair of the alignment, and $N$ is the number of aligned pairs.

Involves rigid body motions (translation and rotation) and deciding which residues should be aligned.
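The RMSD of a given list of aligned pairs can be computed directly. This is a minimal sketch, assuming the two structures are already in the same reference frame and that `coords_a[i]` is paired with `coords_b[i]`. It does not search for the rotation, the translation or the alignment. The function name `rmsd` and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD over aligned pairs; coords_a[i] is paired with coords_b[i]."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared distance d_i^2 between the two residues of pair i.
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(coords_a))


# Two toy "structures": the second is the first shifted by 1 Å along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(toy_a, toy_b))
```

Here every pair is 1 Å apart, so the RMSD is 1 Å, although the two shapes are identical: the value depends on how the structures are placed, which is why a superimposition step is needed.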

Structure Superimposition

The root mean square deviation is best used for refining a structure superimposition starting from an alignment:

Given a structural alignment, find the superimposition that minimizes the RMSD.

This problem has an exact solution.

Not everything that is aligned can be superimposed at the same time.
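One way to obtain the exact solution is Horn's quaternion method: after centering both point sets, the best achievable overlap is the largest eigenvalue of a symmetric 4x4 matrix built from the cross-covariance of the paired coordinates. The slides do not say which exact method is meant, so this is only one possible choice. The sketch below returns the minimal RMSD only (not the rotation), and finds the eigenvalue by shifted power iteration to stay dependency-free. Real code would more likely call a linear algebra library. All names are illustrative.

```python
import math


def centered(coords):
    """Translate coordinates so that their centroid is at the origin."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    return [[p[k] - centroid[k] for k in range(3)] for p in coords]


def min_rmsd(coords_a, coords_b, iterations=500):
    """Minimal RMSD over all rotations and translations for paired points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = centered(coords_a), centered(coords_b)
    n = len(a)
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for p in b for x in p)
    # Cross-covariance: s[j][k] = sum_i a_i[j] * b_i[k].
    s = [[sum(p[j] * q[k] for p, q in zip(a, b)) for k in range(3)]
         for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Horn's symmetric 4x4 matrix; its largest eigenvalue is the best overlap.
    m = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # All eigenvalues lie in [-shift, +shift], so m + shift*I has no
    # negative eigenvalue and power iteration finds the largest one.
    shift = math.sqrt(ga * gb)
    if shift == 0.0:
        return math.sqrt((ga + gb) / n)
    q = [1.0, 0.5, 0.25, 0.125]  # arbitrary start vector
    for _ in range(iterations):
        w = [sum(m[r][c] * q[c] for c in range(4)) + shift * q[r]
             for r in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        q = [x / norm for x in w]
    largest = sum(q[r] * m[r][c] * q[c] for r in range(4) for c in range(4))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * largest) / n))


shape = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
# Same shape rotated by 90 degrees about z and shifted by (5, 5, 5).
moved = [(-y + 5.0, x + 5.0, z + 5.0) for x, y, z in shape]
print(round(min_rmsd(shape, moved), 6))
```

The rotated and shifted copy superimposes essentially perfectly on the original, even though its raw coordinates are far away.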

Intramolecular Distance

Compute the intramolecular distance between all residues: one distance matrix per protein.

Find the alignment for which the distances between corresponding pairs of aligned residues match best.

Eg: d(A104, A132) = 6.7Å and d(B56, B202) = 6.3Å.

Aligned pairs: (A104, B56), (A132, B202)

Advantage: no translation/rotation needed.
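The idea can be sketched as follows: build one distance matrix per chain, then measure how well the intramolecular distances agree for a proposed list of aligned pairs. This is an illustration under stated assumptions (one point per residue, eg the C-alpha atom, and a plain mean absolute difference as the mismatch measure). The names `distance_matrix` and `distance_mismatch` are hypothetical.

```python
import math


def distance_matrix(coords):
    """All-against-all intramolecular distances for one chain."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def distance_mismatch(dist_a, dist_b, pairs):
    """Mean |dA(i, j) - dB(i2, j2)| over all pairs of aligned pairs
    (i, i2) and (j, j2); smaller means better matching distances."""
    total, count = 0.0, 0
    for x in range(len(pairs)):
        for y in range(x + 1, len(pairs)):
            (i, i2), (j, j2) = pairs[x], pairs[y]
            total += abs(dist_a[i][j] - dist_b[i2][j2])
            count += 1
    return total / count if count else 0.0


# Toy chains: chain_b is chain_a rotated and translated.
chain_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
chain_b = [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0), (6.0, 13.0, 10.0)]
da, db = distance_matrix(chain_a), distance_matrix(chain_b)

good = [(0, 0), (1, 1), (2, 2)]
bad = [(0, 0), (1, 2), (2, 1)]
print(distance_mismatch(da, db, good), distance_mismatch(da, db, bad))
```

The correct alignment gives a mismatch of zero without any superimposition, while the scrambled one does not. For the slide's example, the single pair of distances gives |6.7 - 6.3| = 0.4Å.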

Structural Alignment Algorithms

A structure alignment algorithm involves:

Choosing which parts of structure should be compared. A typical choice is: structural domain (ie, independent folding unit).

Choice of structure representation and similarity measure. Eg: all atoms or backbone, with intramolecular distance, secondary structure orientation.

Computing alignments. The hard part: cannot proceed sequentially (as for sequence), no dynamic programming: heuristic only.

Statistical significance of alignment.

The Dali Algorithm

Dali: Distance Matrix Alignment

Compares structural domains
Structure representation: backbone
Similarity measure: distance matrix
Algorithm: seeks maximal matching submatrices
Significance: Z-score

Distance Matrix

Faithful representation of 3D structure.
In particular, conserved if homology.
No reference frame needed.

Example: contact matrix for 1A34-A (cutoff=6Å)
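A contact matrix is a thresholded distance matrix. The sketch below uses a toy straight chain, not the real 1A34-A coordinates, with one point per residue. The function name `contact_matrix` is illustrative, and the 6Å default follows the cutoff quoted on the slide.

```python
import math


def contact_matrix(coords, cutoff=6.0):
    """1 where two residues are closer than `cutoff` (in Å), else 0."""
    return [[1 if math.dist(p, q) < cutoff else 0 for q in coords]
            for p in coords]


# Toy trace: four points 3.8 Å apart along a straight line.
toy_chain = [(3.8 * k, 0.0, 0.0) for k in range(4)]
# The same chain moved elsewhere in space.
shifted_chain = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in toy_chain]

for row in contact_matrix(toy_chain):
    print(row)
print(contact_matrix(toy_chain) == contact_matrix(shifted_chain))
```

Only neighbouring points are in contact, and moving the chain leaves the matrix unchanged, which is the "no reference frame needed" property.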

The Dali Algorithm

Find good matching submatrices according to some scoring scheme.

Scoring scheme: down-weight pairs involving distant residues (eg, with a cutoff: contact matrix).

Extend a matching submatrix as long as it is beneficial.

Eg:

1A34-A (147 aa long)
1B35-A (260 aa long)
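A scoring scheme of this kind can be sketched after the elastic similarity score of Holm and Sander (1993). The constants used here (a 0.2 relative tolerance and a 20Å envelope) are assumptions taken to follow that paper and should be checked against it. The function names are illustrative, and the search for matching submatrices itself is not shown.

```python
import math

THETA = 0.2      # assumed tolerance on the relative distance deviation
ENVELOPE = 20.0  # assumed length scale (Å) of the distance envelope


def pair_score(d_a, d_b):
    """Similarity of one pair of matched intramolecular distances."""
    mean = 0.5 * (d_a + d_b)
    if mean == 0.0:
        return THETA  # diagonal term: a residue pair matched with itself
    weight = math.exp(-(mean / ENVELOPE) ** 2)  # down-weights distant pairs
    return (THETA - abs(d_a - d_b) / mean) * weight


def alignment_score(dist_a, dist_b, pairs):
    """Sum of pair scores over all pairs of aligned residue pairs."""
    return sum(
        pair_score(dist_a[i][j], dist_b[i2][j2])
        for i, i2 in pairs
        for j, j2 in pairs
    )


print(pair_score(6.7, 6.3) > 0)    # distances agree within the tolerance
print(pair_score(6.0, 12.0) < 0)   # distances disagree: penalty
print(pair_score(5.0, 5.0) > pair_score(30.0, 30.0))  # far pairs count less
```

With such a score, extending a match "as long as it is beneficial" means adding aligned pairs only while `alignment_score` keeps increasing.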

Statistical significance: Z-score

As for sequence alignment, we need a normalized score in order to compare different structural alignments.

What similarity can we expect from chance?

Align unrelated pairs -> scores from random matches -> a distribution of scores.

Z-score: number of standard deviations above the mean.

Z-score scale: very good alignment, medium range, limit of alignment detection.
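The generic definition can be written in a few lines. This is a sketch only: the background scores are made-up numbers, and the normalization used by the Dali program itself may differ (for instance by accounting for protein size). The function name `z_score` is illustrative.

```python
import statistics


def z_score(score, background_scores):
    """Number of standard deviations by which `score` exceeds the mean
    of a background distribution (scores of unrelated pairs)."""
    mean = statistics.mean(background_scores)
    stdev = statistics.stdev(background_scores)  # sample standard deviation
    if stdev == 0.0:
        raise ValueError("background scores have zero spread")
    return (score - mean) / stdev


# Made-up raw scores from aligning unrelated pairs.
background = [10.0, 12.0, 14.0, 16.0, 18.0]
print(round(z_score(30.0, background), 2))
```

Because the raw score is expressed relative to what chance alone produces, Z-scores of different alignments can be compared with each other.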

Tools built on Dali

Pairwise structural alignment: DaliLite
Structural domain assignment
Database searches
Database of structural alignments: Dali Database
Structure classification: Dali Fold Index

Structural Alignment on the Web

DaliLite server
http://www.ebi.ac.uk/DaliLite/

SSAP server
http://cathwww.biochem.ucl.ac.uk/cgi-bin/cath/SsapServer.pl

VAST server
http://www.ncbi.nlm.nih.gov/Structure/VAST/vastsearch.html

CE server
http://cl.sdsc.edu/

and many others: FUGUE, 3DCoffee, ...

Demo DaliLite

Main server:

http://www.ebi.ac.uk/DaliLite/

Let us submit to DaliLite the PDBs:

1A34-A (satellite tobacco mosaic virus)
1B35-A (cricket paralysis virus)

[Figure: best structural alignment of 1a34-A and 1b35-A]

[Figure: structure superimposition of 1a34-A and 1b35-A]

Database Searches

As for sequence alignment against sequence databases, we hit a speed bottleneck:

Must perform many pairwise comparisons: there are 39969 structures in the PDB (7 Nov 2006). At seconds per comparison, one search takes hours.

Solution:

Build and use representative databases,
Align only against representatives,
Speed up the alignment algorithm (by making approximations).

Representative Database

PDB database contains much redundancy:

Mutation studies,
Same structure solved by different groups.

Remove all structures with similar sequences (eg, PDB90: structures sharing not more than 90% identity).
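Redundancy removal of this kind can be sketched as a greedy selection: go through the entries and keep one only if it is not more than 90% identical to every representative already kept. The slides do not describe how PDB90 is actually built, so this is an illustration. The identity function below is a toy that compares equal-length strings position by position, where a real pipeline would use a proper sequence alignment. All names and sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Toy identity for equal-length, pre-aligned sequences (illustrative)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("toy identity needs non-empty, equal-length sequences")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


def select_representatives(entries, threshold=90.0, identity=percent_identity):
    """Greedy selection: keep an entry only if it shares at most
    `threshold` percent identity with every representative kept so far."""
    representatives = []
    for name, seq in entries:
        if all(identity(seq, rep_seq) <= threshold
               for _, rep_seq in representatives):
            representatives.append((name, seq))
    return representatives


entries = [
    ("wild_type", "MKTAYIAKQRQISFVKSHFS"),
    ("solved_again", "MKTAYIAKQRQISFVKSHFS"),  # same protein, other group
    ("point_mutant", "MKTAYIAKQRQISFVKSHFA"),  # 95% identical to wild_type
    ("unrelated", "GSHWLLPEDVNNRTCGYDWE"),
]
kept = [name for name, _ in select_representatives(entries)]
print(kept)
```

The duplicate and the point mutant are dropped, so later structure comparisons only need to run against the two remaining representatives.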

Speed up the Algorithm

Filtering:
Use fast but less accurate algorithms to remove everything that is very dissimilar. Dali: quick look-up based on alignment of secondary structure elements.

Hierarchical alignment.

Dali Database

A database of precomputed structural alignments

Database search is slow and the number of structures in the PDB database is growing.

Dali is fast enough to carry out regular updates of the alignment database to account for newly released structures, and all-against-all comparisons leading to a classification of protein domain structures.

Demo Dali Database

Consider the Haemophilus influenzae protein Y065_HAEIN:

Function unknown,
No well annotated homologs,
Structure (1htw, 1fl9) solved by the Structural Genomics Project (http://www.structuralgenomics.org/).

Let us use the Dali Database to check whether there are some structures in the PDB which are structurally closely related to 1htw:

Dali Database server:
http://ekhidna.biocenter.helsinki.fi/dali/start

Dali server at EBI (to upload your own structure):
http://www.ebi.ac.uk/dali/Interactive.html

Demo Dali Database

[Screenshot labels: structural domain; list of all domains represented by 1htwA_1]

Demo Dali Database

Structural similarity found:

list of representatives in neighbourhood of 1htwA

Demo Dali Database

Structure superimposition of 1htwA and best hits 1iqpA, 1nsf:

Demo Dali Database

Weak residue conservation found:

Demo Dali Database

So, we have found for 1htw a medium quality hit with 1iqp. Annotation for 1iqp (RFCS_PYRFU):

Replication factor C small subunit
DNA replication

This gives hints for the function of 1htw. The stacked multiple structure alignment emphasizes some residue conservation. Whether Y065_HAEIN does indeed interact with DNA must be confirmed experimentally.

Structure Classification

The Fold Space
SCOP, CATH
Dali Fold Index

The Fold Space

Protein folds occupy a small portion of the space of all possible stable polypeptide chain structures.

Biophysical constraints:

fast folding,
appropriate stability properties, ...

Question: What is the fold space covered by protein structures?

PDB Content Growth

As of 7 Nov 2006: 39969 structures.

Protein Data Bank: http://www.rcsb.org/pdb

PDB Statistics

The Fold Space

About 40 thousand structures in the PDB. But:

Many structures solved several times.
Many structures share the same basic fold.

Surprisingly, the number of basic folds observed so far is small.

Current estimates of the total number of folds in nature: between 1000 and 10’000.

Possible reasons:

All folds derived from a small group of shared common ancestors.
Small set of biologically favored folds (convergent evolution).

(Orengo et al., 1997)

Inspecting the Fold Space

Evolutionary constraints
Biophysical constraints

There is some organization. (Bourne et al., 2003)

How can we map the fold space? Choice of:

resolution level,
distance: evolutionary or structural.

Structure Classifications

The major players:

CATH:
semi-automated (uses SSAP), structure (and sequence) based.

SCOP:
manual, homology and structure based.

Dali Fold Index:
fully automated, structure based.

All: hierarchical classification.

CATH

Class: α, β, α-β.
Architecture: 3D packing of secondary structure elements (SSEs).
Topology: succession of SSEs in the chain.
Homology: structural/sequence similarity.

CATH Status

Release 3.0.0 (4 May 2006):

URL: http://cathwww.biochem.ucl.ac.uk/latest/

SCOP

SCOP: Structure Classification Of Proteins

1. Classes: α, β, α/β, α+β, membrane, coiled-coils, small, ...
2. Folds: structural similarity
3. Superfamilies: some evolutionary relationship
4. Families: clear homology
5. Domains: independent folding unit

SCOP Status

Release 1.69 (1 Oct 2004): 25973 PDB Entries

URL: http://scop.mrc-lmb.cam.ac.uk/scop/

Dali Fold Classification

The classification is also hierarchical, but established in an automated way:

all-against-all pairwise comparison using DALI within a set of representative structures.
hierarchical clustering of results by Z-score.

(Dietmann & Holm, 2001)
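The clustering step can be illustrated with a small agglomerative procedure on a table of pairwise Z-scores. This is a sketch, not the published procedure: average linkage and the stopping cutoff of 2 are assumptions made here for the example, and the domain names and Z-scores are invented.

```python
def average_z(cluster_a, cluster_b, z):
    """Mean pairwise Z-score between the members of two clusters."""
    total = sum(z[frozenset((a, b))] for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def cluster_by_z(names, z, z_cutoff=2.0):
    """Average-linkage clustering: repeatedly merge the two most similar
    clusters until no pair has a mean Z-score of at least `z_cutoff`."""
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                value = average_z(clusters[i], clusters[j], z)
                if best is None or value > best[0]:
                    best = (value, i, j)
        value, i, j = best
        if value < z_cutoff:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Invented all-against-all Z-scores for four hypothetical domains.
names = ["dom_a", "dom_b", "dom_c", "dom_d"]
z = {frozenset((a, b)): 1.0 for a in names for b in names if a < b}
z[frozenset(("dom_a", "dom_b"))] = 12.0
z[frozenset(("dom_c", "dom_d"))] = 9.0

print(cluster_by_z(names, z))
```

Recording the Z-score at which each merge happens, instead of stopping at a cutoff, would give the full hierarchy that a dendrogram displays.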

[Figure: dendrogram of representative domains (1nu3B_1, 1sgoA_1, 1g96A_1, ..., 1v61A_1); axis: distance in Z-score, with ticks at 2, 4, 8, 16, 32, 64]

Dali Fold status:

3107 folds on top of hierarchy.

Highly inhomogeneous: few folds used many times, many folds used few times.

This inhomogeneity is observed in other classifications (eg, CATH) and seems to be a property of the fold space.

Domain/Fold Distribution

Population of fold types:

40% of domains covered by 10 fold types,
Each remaining fold: covers less than 1%.

(Holm and Sander, 1998)

A projection of the fold space onto the plane:

clear uneven distribution.
Colours (blue, red, green) mark structural classes.

(Holm and Sander, 1998)

Demo: Dali Fold Index

Let us browse the Dali fold map at:

http://ekhidna.biocenter.helsinki.fi/dali/start (Fold Index).

Eg: both 1a34 and 1b35 (a previous example) are in fold 1923.

References

Articles:

L. Holm and C. Sander: Protein Structure Comparison by Alignment of Distance Matrices. J. Mol. Biol. (1993) 233, 123–138.

L. Holm and C. Sander: Mapping the Protein Universe. Science (1996) 273, 595–603.

L. Holm and C. Sander: Dictionary of Recurrent Domains in Protein Structures. PROTEINS (1998) 33, 88–96.

Books:

Structural Bioinformatics. Edited by P. E. Bourne and H. Weissig, Wiley-Liss, 2003.