
2D Gel Databases

www.expasy.ch - Swiss-2DPAGE

http://www.anl.gov/BIO/PMG/ - Mouse liver, human breast cell lines, Pyrococcus. Argonne Protein Mapping Group.

http://www.harefield.nthames.nhs.uk/nhli/protein/index.html - HSC-2DPAGE, Heart Science Centre, Harefield Hospital

http://oto.wustl.edu/thc/peri-gels.htm - Washington Univ. Inner Ear Protein Database

http://ca.expasy.org/ch2d/2d-index.html - World 2DPAGE, Index of 2D gel databases

Federated 2D PAGE database

• Described by Appel et al. (1996)
• Aimed to tackle (then) emerging problems with 2D gel databases:
– non-uniformity of data-encoding conventions
– robustness
– consistency
– commitment of groups to maintain the databases and data quality

• Rules:
– Rule 1 – Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
– Rule 2 – The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must be at least linked to the main index.
– Rule 3 – A main index has to be supplied that provides a means of querying all databases through one unique query point. Currently, the main index is the SWISS-PROT database.
– Rule 4 – Individual protein entries must be available through clickable images.
– Rule 5 – 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.

http://ca.expasy.org/ch2d/fed-rules.html

Swiss 2DPAGE

• Established in 1993
• Maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics.
• Entries highly annotated
– containing textual data on proteins including:
   • mapping procedure
   • physiological and pathological information
   • experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses)
   • bibliographical references
• Entries are linked to images showing the experimentally determined and theoretical protein locations.
• Cross-references are provided to other federated 2D-PAGE database entries, Medline and SWISS-PROT
• Search via
– clickable images
– keywords

Make2DDB

• Software package provided by ExPASy
• Allows a 2D-PAGE database to be produced on the user's own server.
• The resulting database can be queried by description, by accession or by clicking on spots.
• Provides links to Swiss-Prot.

Make2DDB databases

A sample of 2D-PAGE databases created with make2ddb:

http://semele.anu.edu.au/2d/2d.html - ANU 2D-PAGE, Australian National University 2D-PAGE database

http://babbage.csc.ucm.es/2d/2d.html - COMPLUYEAST 2DPAGE, Saccharomyces cerevisiae 2D-PAGE database at Universidad Complutense Madrid, Spain

http://www.gram.au.dk/ - PHCI-2DPAGE, Parasite host cell interaction 2D-PAGE database

http://www.bio-mol.unisi.it/2d/2d.html - Siena 2D PAGE

2D Gel Databases

• Limitations of current databases:
– Do not contain strict/detailed descriptions of protocol (buffers, sample volume and staining techniques are all important information for gel comparisons).
– Designed as 2D (and not proteomics) databases and therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
– Designed for reference gels, not on-going projects.

Proteomics Database Schema

• What should it encompass?
– Proteomics methods (e.g. protein sample prep, electrophoresis buffers, staining techniques, digestion for MS etc.).
– Results from each stage of the experiment (e.g. gel images, MS data).
– Parameters used for MS data analysis/statistical results
– All stored in strict format.
Database querying

• Interact via web interface using Perl/CGI
• Clickable gel images
• Text querying – for keywords, gel/spot name, author, sequence etc.
• XML used for data exchange
Introduction to databases

• Flat file – simplest database type, an ordered collection of data entries, analogous to how files would be stored in a filing cabinet.
• Relational – more sophisticated, storing data in inter-related tables. Allows flexible querying using Structured Query Language (SQL).
• Object-oriented – database consistent with object-oriented principles, allowing for storage of complex data types (e.g. multimedia) and querying beyond that defined by a rigidly defined query language.

DBMS choice

• A flat file database would contain many redundancies in storing complex data types.
• An object-oriented database could intrinsically store complex data types, e.g. large images; however, a relational database could contain links to images stored elsewhere.
• SQL would provide a fast and easy way of querying and updating the database.
• A relational database would provide a platform that is easily expandable to accommodate additional forms of data.
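The relational option can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names, column names, accession codes and image path below are all hypothetical and are not taken from any of the databases named in these slides; the point is only that a gel image can stay on disk while the table holds a link to it, and that SQL joins the related tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical gel and spot tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gel (
            gel_id     INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            image_path TEXT NOT NULL      -- link to an image stored elsewhere
        );
        CREATE TABLE spot (
            spot_id   INTEGER PRIMARY KEY,
            gel_id    INTEGER NOT NULL REFERENCES gel(gel_id),
            accession TEXT,
            pi        REAL,               -- isoelectric point
            mw_kda    REAL                -- molecular weight in kDa
        );
        """
    )
    # Made-up rows, for illustration only.
    conn.execute("INSERT INTO gel VALUES (1, 'demo gel', 'images/demo_gel.png')")
    conn.executemany(
        "INSERT INTO spot VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "ACC0001", 5.4, 42.0), (2, 1, "ACC0002", 7.1, 18.5)],
    )
    return conn


def spots_for_gel(conn, gel_name):
    """Return (accession, pI, MW, image path) for every spot on one gel."""
    rows = conn.execute(
        "SELECT s.accession, s.pi, s.mw_kda, g.image_path "
        "FROM spot AS s JOIN gel AS g ON g.gel_id = s.gel_id "
        "WHERE g.name = ? ORDER BY s.spot_id",
        (gel_name,),
    )
    return rows.fetchall()
```

Adding a further kind of data (for example MS results) would mean adding another table keyed on spot_id, which is the expandability argument made above.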

Future

• Standard database schema for proteomics and mark-up language for data exchange.
• Improved spot detection, quantification and gel warping algorithms.
• Improved sample preparation techniques.
• More automation (linkage of robots!).
• Protein array technologies.

Computer Analysis of Mass Spectrometry Data
Protein Sequencing and Identification

Introduction

• Software and computational techniques for the identification of proteins and residue modifications using both MALDI and MS/MS data.

[Flow chart: Gel → Sample Preparation → MALDI (matrix-assisted laser desorption ionisation; peptide mass fingerprinting) → Protein Identified, or Protein Not Identified → Further Sample Processing → MS/MS (peptide fragmentation)]

Software for protein identification

1. Peptide fingerprint (PMF)

software: PepSea, PeptIdent/MultiIdent, MS-Fit

1) MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse)
2) ProFound (http://prowl.rockefeller.edu)
3) Mascot (http://www.matrixscience.com/search_form_select.html)
4) PeptIdent2 (http://us.expasy.org/tools/peptident.html)
5) PeptideSearch (http://www.mann.embl-heidelberg.de
6) MS-Fit (http://prospector.ucsf.edu)

2. Peptide fragmentation

1) PepSea (http://pepsea.protana.com/PA_PeptidePatternForm.html)
2) SEQUEST
3) PepFrag (http://www.proteometrics.com/prowl/pepfragch.html)
   MS-Tag (http://prospector.ucsf.edu/ucsfhtm13.2/mstagfd.html)
4) Mascot

[Fig. 3: Simulation of a peptide mass fingerprint]

Peptide Mass Fingerprinting

[Figure: the MS spectrum (intensity against peptide mass, with peaks at 422.25, 692.35, 1096.59 and 1451.75) is compared with the spectra of proteins A, B and C in a protein database to give the protein id.]

Mass spectrum vs. database: peptide mass match

[Figure: the observed mass spectrum is compared with the spectrum of each protein in the database (Protein A, Protein B, Protein C). For the database sequence ….FNSTPKYIKSEGYGPREKYQSRPKFNSTPKDYN… the peptides YIK (422.25), FNSTPK (692.35), FNSTPKYIK (1096.59) and YQSRPKFNSTPK (1451.75) are matched to the peaks within a mass tolerance.]


Peptide Mass Fingerprinting

A mass spectrum of the peptide mixture resulting from the digestion of a protein by a proteolytic enzyme

• Choice of Enzyme
• Missed Cleavages
• Search Masses
• Constraining the Protein Molecular Weight
• Which masses to include in a search
• Autolysis products
• Modifications
Enzymatic Cleavage

[Figure: an enzyme cleaves the native protein into peptide fragments]

Choice of Enzyme

• Enzymes of low specificity are next to useless as they produce a complex mixture of similar masses
• For MALDI, peptides of masses less than 500 Da should be avoided
Enzyme Specificity

Enzyme         Cleave At   Don't Cleave   N or C term
Trypsin        KR          P              C
Lys-C          K           P              C
Lys-C/P        K           -              C
Arg-C          R           P              C
V8-E           E           P              C
V8-DE          DE          P              C
Chymotrypsin   FYWLIVM     P              C

Missed Cleavages

• Digests are usually not perfect
• Cleavage sites may be missed by an enzyme
• These partially cleaved peptides are known as partials
• Partials reduce the discrimination of a search
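The cleavage rules in the enzyme specificity table, together with missed cleavages, can be sketched as a small in-silico digest. This is an illustrative sketch only: the function name and its parameters are made up for this example, every enzyme is treated as cutting on the C-terminal side of the listed residues (as in the table), and "Don't Cleave" is read as "do not cut when the next residue is one of these".

```python
def digest(sequence, cleave_at="KR", block="P", missed=0):
    """Cut `sequence` after any residue in `cleave_at` unless the next
    residue is in `block`; allow up to `missed` missed cleavages."""
    # Positions where the chain is cut (0 and len(sequence) are the ends).
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_at and sequence[i + 1] not in block:
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for i in range(len(cuts) - 1):
        # j = i + 1 gives a fully cleaved peptide; larger j gives "partials".
        for j in range(i + 1, min(i + 2 + missed, len(cuts))):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


# Trypsin (defaults) on the example sequence used in the earlier figure.
EXAMPLE = "FNSTPKYIKSEGYGPREKYQSRPKFNSTPKDYN"
complete = digest(EXAMPLE)
partials = digest(EXAMPLE, missed=1)
```

Other rows of the table map onto the same function, for example `digest(seq, cleave_at="K", block="")` for Lys-C/P. Allowing one missed cleavage roughly doubles the number of candidate peptides, which is why partials reduce the discrimination of a search.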

Search Masses

• Select masses which are large enough to provide discrimination
• Larger masses are more likely to be partials
• With trypsin, a mass range of 1000 to 3000 Da is good
• Mass tolerance is important in obtaining good discrimination

Constraining Protein Mass

• To increase discrimination, the mass of the intact protein can be used in a search
• This is dangerous, since the sample may be just a fragment of an entire protein
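How a peptide mass is calculated and compared against an observed peak within a tolerance can be sketched as follows. The residue masses are standard monoisotopic values; with them, the four masses shown in the earlier peptide mass match figure come out as neutral (uncharged) peptide masses. The function names and the default tolerance of 0.1 Da are assumptions made for this example, not settings from any of the programs listed.

```python
# Monoisotopic residue masses in Da (amino acid minus one water).
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N and C termini


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[residue] for residue in peptide) + WATER


def within_tolerance(observed, theoretical, tol_da=0.1):
    """True if the two masses differ by no more than `tol_da` daltons."""
    return abs(observed - theoretical) <= tol_da


def match_peaks(observed_masses, peptides, tol_da=0.1):
    """Pair every observed mass with every peptide it could be."""
    return [
        (observed, peptide)
        for observed in observed_masses
        for peptide in peptides
        if within_tolerance(observed, peptide_mass(peptide), tol_da)
    ]
```

Widening `tol_da` lets more theoretical peptides match each peak, so a tighter tolerance (when the instrument supports it) gives better discrimination.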

Which Masses to Include?

The optimum dataset for a peptide mass fingerprint is all the correct peptides and none of the wrong ones! By correct, we mean that the textbook cleavage rules were followed. In practice, this rarely (if ever) happens.

• Enzymatic cleavage not perfect
• Sequence coverage may be poor
• Noise

Autolysis Products

• Some digests may be dominated by the autolysis peaks of the enzyme used
• In these cases, the known masses of these products may be filtered

Residue Modifications

• Some residues may be modified during the sample preparation procedure
• This introduces discrepancies between the expected and observed masses
• For example, Met residues are often oxidised
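The mass discrepancy from a variable modification can be made concrete with a short sketch. Oxidation of methionine adds one oxygen atom, about 15.9949 Da monoisotopic, per oxidised residue. The function name is hypothetical, and the sketch simply lists every mass a peptide could show if anywhere from none to all of its Met residues were oxidised.

```python
OXIDATION = 15.9949  # monoisotopic mass of one oxygen atom, in Da


def oxidation_variants(peptide, unmodified_mass):
    """Masses the peptide could show with 0, 1, ... all Met oxidised."""
    n_met = peptide.count("M")
    return [unmodified_mass + k * OXIDATION for k in range(n_met + 1)]
```

A peptide with two Met residues therefore contributes three candidate masses instead of one, which is why allowing many variable modifications enlarges the search.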

Sample Preparation for MALDI

• Excise band from gel
• Tryptic digestion of gel fragment
• Supernatant transferred to fresh Eppendorf tube
• Sample transferred to target plate

Sample Preparation Robot

[Figure: sample preparation robot]

MALDI Mass Spectrometer

• Ions are generated by a laser firing at the target plate
• Because the time of firing of the laser and the arrival time of the ions at the detector are both known, the relative masses can be calculated
• Only singly charged ions are generated; other types of spectrometer may generate multiply charged ions

$E z = \tfrac{1}{2} m v^{2}$
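The energy relation above can be turned into a mass calculation once the velocity is written as drift length over flight time. The sketch below assumes that E is the accelerating potential in volts, that z counts elementary charges, and that ions drift a length L at constant speed after acceleration, so that v = L / t and m/z = 2 E t² / L². The slides do not define the symbols, so this reading is an assumption, and the voltage and tube length used in the example are illustrative values, not instrument settings from the slides.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge in coulombs
DALTON = 1.66053906660e-27    # one dalton in kilograms


def flight_time(mz, volts, length_m):
    """Time (s) for an ion of mass-to-charge `mz` (Da per charge) to
    drift `length_m` metres after acceleration through `volts`."""
    kg_per_charge = mz * DALTON
    velocity = math.sqrt(2.0 * E_CHARGE * volts / kg_per_charge)
    return length_m / velocity


def mz_from_time(time_s, volts, length_m):
    """Invert the relation: m/z = 2 E t^2 / L^2, expressed in Da."""
    return 2.0 * E_CHARGE * volts * time_s ** 2 / (length_m ** 2 * DALTON)


# Illustrative numbers: 20 kV acceleration and a 1 m drift tube.
t_1000 = flight_time(1000.0, 20000.0, 1.0)
t_2000 = flight_time(2000.0, 20000.0, 1.0)
```

Flight time grows with the square root of m/z, so an ion of twice the mass arrives later by a factor of √2 rather than 2.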

[Figures: MALDI Internals; Micromass MALDI; Isotopic Cluster; Typical Fingerprint Spectrum; Poorly Resolved Peak]

Database Searching with Peptide Mass Fingerprints

• Produce a theoretical digest of all the proteins in a database with a specific enzyme
• Compare these theoretical masses with experimentally observed masses
• Assign a score to matching peptides/proteins

[Figure: the mass spectrum is compared with database entries Protein A, Protein B and Protein C]

Problems

• Mixtures and contamination
• Partial cleavage
• Identifying real peaks
• Residue modifications
• Mass accuracy
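The compare-and-score steps of a fingerprint search can be sketched with the simplest possible score: the number of observed masses that match a theoretical peptide mass. This is deliberately not the scoring used by MOWSE, MASCOT or any other program named in these slides. The protein names and the theoretical mass lists are made up for illustration; in a real search they would come from an in-silico digest of each database entry.

```python
def count_matches(observed, theoretical, tol_da=0.1):
    """Number of observed masses with a theoretical mass within tolerance."""
    return sum(
        any(abs(obs - theo) <= tol_da for theo in theoretical)
        for obs in observed
    )


def rank_proteins(observed, database, tol_da=0.1):
    """Return (protein, score) pairs, best score first, ties by name."""
    scores = {
        name: count_matches(observed, masses, tol_da)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical theoretical digests, for illustration only.
TOY_DATABASE = {
    "Protein A": [422.25, 692.35, 1096.59],
    "Protein B": [692.35, 900.40],
    "Protein C": [500.30],
}
OBSERVED = [422.25, 692.35, 1096.59, 1451.75]
ranking = rank_proteins(OBSERVED, TOY_DATABASE)
```

A raw match count favours large proteins, which produce many peptides and so match by chance more often; practical scoring schemes correct for this in ways not covered here.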

MOWSE

• One of the first programs for identifying proteins by peptide mass fingerprinting
• Developed by Darryl Pappin and Alan Bleasby
• Developed alongside the OWL non-redundant protein database

Problems with MOWSE

• Databases had to be pre-indexed; these indexes are large and slow to build
• Does not handle variable modifications
• Indexing means that databases can't easily be updated regularly
• Limited functionality

MASCOT

• Takes advantage of multi-processor systems
• Totally web based
• No pre-indexing of databases
• Increased functionality
• Copes with multiple modifications
• Easily expandable
• Increased speed

Search Speed

Search speed is very important as databases increase in size and automation leads to a high throughput of samples. Also, if the algorithms are efficient, more elaborate searches may be undertaken, for instance with large numbers of variable residue modifications and different mass tolerances, to attempt to make more sense of data derived from mixtures or with contamination.

• Ability to use multiple processors when available
• Very efficient I/O; databases may also be mapped to memory
• Efficient cleavage site and mass calculation

Thread Models

Threads are a standardized model for dividing a program into subtasks whose execution can be interleaved or run in parallel.

• Boss/Worker
• Peer
• Pipeline
• MASCOT is based on the Boss/Worker model
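The Boss/Worker model can be sketched with Python's standard threading and queue modules. This is a generic illustration of the pattern and says nothing about how MASCOT itself is implemented; the function names are made up. Note also that in CPython threads interleave rather than run CPU-bound work truly in parallel, so the sketch shows the structure, not a speed-up.

```python
import queue
import threading


def boss_worker(tasks, work, n_workers=3):
    """The boss puts tasks on a queue; workers take them and run `work`."""
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:          # sentinel from the boss: no more work
                break
            index, task = item
            outcome = work(task)
            with lock:                # results dict is shared by all workers
                results[index] = outcome

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    # The boss accepts the input and distributes it.
    for item in enumerate(tasks):
        todo.put(item)
    for _ in threads:
        todo.put(None)                # one stop signal per worker
    for thread in threads:
        thread.join()

    # Return results in the order the tasks were given.
    return [results[i] for i in range(len(results))]
```

For example, `boss_worker([1, 2, 3, 4], lambda x: x * x)` returns the squares in input order, regardless of which worker handled which task.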

Boss/Worker Model

[Figure: within the program, a "Boss" thread (main()) takes the input stream and hands tasks (taskX, taskY, taskZ) to Worker Threads A, B and C. The workers share the program's resources (files, databases, disks, special devices) and produce the output.]

The "Boss" accepts input and then distributes the work to other threads

Peer Model

[Figure: within the program, Threads A, B and C each take their own (static) input, run a task (taskX, taskY, taskZ) using the shared resources (files, databases, disks, special devices), and each produce their own output.]

Each thread is responsible for its own input

Pipeline Model

[Figure: within the program, the input stream passes through Stage 1 (Thread A), Stage 2 (Thread B) and Stage 3 (Thread C) to the output. Each stage has its own resources (files, databases, disks, special devices).]

A single thread accepts input, passing the data on to the next thread for further processing
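For comparison with the Boss/Worker model, the pipeline model can be sketched with one thread per stage and a queue between neighbouring stages. As before, this is a generic illustration with made-up function names, not a description of any particular search engine.

```python
import queue
import threading

_DONE = object()  # marker passed down the pipeline after the last item


def _stage(func, inbox, outbox):
    """Apply `func` to each item from `inbox` and pass the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)         # tell the next stage to stop too
            break
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run `items` through `funcs`, one thread per stage, in order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()

    # Only the first stage ever sees the raw input.
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    output = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        output.append(item)
    for thread in threads:
        thread.join()
    return output
```

Because each stage is a single thread reading from a first-in, first-out queue, items leave the pipeline in the order they entered it.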

Related Search Methods

• Masses may be combined with sequence information: 1234.5 seq(c-ABCD) seq(EF)
• These searches are very valuable, as even small amounts of sequence information may be very discriminating
• Sequence information is derived from the partial interpretation of an MS/MS spectrum
• Known as the "sequence tag" method

Composition Queries

• Composition information may also be used with mass information to refine queries
• Chemical or enzymatic analysis, such as N-terminal analysis with Edman, may give composition information
• A typical query would be: 1234.5 comp(2[H]0[M])
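The extra constraints in the two example queries can be sketched as a filter applied to candidate peptides that have already passed the mass match. The sketch does not parse the query syntax, and the interpretation is an assumption: `seq(c-ABCD)` is read as "the peptide ends with this tag", `seq(EF)` as "the peptide contains this tag somewhere", and `comp(2[H]0[M])` as "exactly two His and no Met". The function name and the candidate peptides are hypothetical.

```python
from collections import Counter


def satisfies(peptide, c_term=None, contains=None, composition=None):
    """Check sequence-tag and composition constraints on one peptide."""
    if c_term and not peptide.endswith(c_term):
        return False
    if contains and contains not in peptide:
        return False
    if composition:
        counts = Counter(peptide)     # missing residues count as zero
        for residue, required in composition.items():
            if counts[residue] != required:
                return False
    return True


# Hypothetical candidates that all matched the query mass.
candidates = ["HAEFHGYGPR", "HAEFMGYGPR", "HAHEFGYGPK"]
kept = [
    peptide for peptide in candidates
    if satisfies(peptide, c_term="GYGPR", contains="EF",
                 composition={"H": 2, "M": 0})
]
```

Here only the first candidate survives: the second contains Met and only one His, and the third has the wrong C-terminal tag. This shows how a small amount of sequence or composition information removes candidates that mass alone cannot separate.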

MASCOT Queries

• One of the most powerful features of MASCOT is the ability to mix all the types of query in one search
• MASCOT allows the user to specify a particular species to further increase search discrimination

Databases Searched with Peptide Mass Fingerprint Data

• Non-identical protein databases are the ideal
• EST sequences are too short to contain meaningful information for these searches
• Non-redundant databases may be problematic
• MASCOT translates nucleic acid databases on the fly
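Translating a nucleic acid entry at search time can be sketched with the standard genetic code. Translation in all six reading frames is a common way of doing this; the slides do not say how MASCOT does it, so the six-frame approach and the function names below are assumptions. Codons containing anything other than A, C, G or T are written as X.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Three forward frames followed by three reverse-complement frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (forward, reverse)
        for offset in range(3)
    ]
```

Each translated frame can then be digested and matched like any protein entry, at the cost of six times as many sequences to search.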

Local Mascot database

(http://www.matrixscience.com/search_form_select.html)

Establish a Mascot database in your own lab

[Figures: screenshots of a local Mascot installation, labelled "P4"]
MSDB

ftp://ftp.ncbi.nih.gov/repository/MSDB/msdb.nam

• A non-identical protein sequence database designed for mass spectrometry searches
• Additional information, such as multiple species lines, in the textual information
• De-convolution of SWISSPROT and other sequences
• Nightly updates
• Links to source databases
Is The Protein Identified?

• Most samples are identified using just peptide mass fingerprinting
• With the growth of databases, this trend will continue
• Some samples do not have representatives in any of the databases; to sequence these proteins, more analysis is required
