http://www.harefield.nthames.nhs.uk/nhli/protein/index.html - HSC-2DPAGE, Heart Science Centre, Harefield Hospital
Swiss 2DPAGE
Make2DDB
Make2DDB databases
http://semele.anu.edu.au/2d/2d.html - ANU 2D-PAGE, Australian National University 2D-PAGE database
http://babbage.csc.ucm.es/2d/2d.html - COMPLUYEAST 2DPAGE, Saccharomyces cerevisiae 2D-PAGE database at Universidad Complutense Madrid, Spain
http://www.gram.au.dk/ - PHCI-2DPAGE, Parasite host cell interaction 2D-PAGE database
http://www.bio-mol.unisi.it/2d/2d.html - Siena 2D-PAGE
Database querying
• Interactive, via a web interface using Perl/CGI
• Clickable gel images
• Text querying for keywords, gel/spot name, author, sequence etc.
• XML used for data exchange
Proteomics Database Schema
Future
• Standard database schema for proteomics and mark-up language for data exchange.
• Improved spot detection, quantification and gel warping algorithms.
• Improved sample preparation techniques.
• More automation (linkage of robots!).
• Protein array technologies.
Computer Analysis of Mass Spectrometry Data
Protein Sequencing and Identification
Introduction
Figure labels: Gel; MS/MS
Peptide Mass Fingerprinting
1) MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse)
2) ProFound (http://prowl.rockefeller.edu)
3) Mascot (http://www.matrixscience.com/search_form_select.html)
4) PeptIdent2 (http://us.expasy.org/tools/peptident.html)
5) PeptideSearch (http://www.mann.embl-heidelberg.de)
6) MS-Fit (http://prospector.ucsf.edu)
Peptide Fragmentation
1) PepSea (http://pepsea.protana.com/PA_PeptidePatternForm.html)
2) SEQUEST
3) PepFrag (http://www.proteometrics.com/prowl/pepfragch.html)
   MS-Tag (http://prospector.ucsf.edu/ucsfhtm13.2/mstagfd.html)
4) Mascot
Fig.3 Simulation
Peptide Mass Fingerprinting
Figure: the mass spectrum (peptide mass fingerprint; axis label: MS intensity) is matched against a protein database within a tolerance.
Peptide masses: 422.25, 692.35, 1096.59, 1451.75
Protein ids: A, B, C
Peptide sequences: YIK, YQSRPKFNSTPK, FNSTPKYIK
Labels: Protein B; Tolerance ?
1) MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse)
2) ProFound (http://prowl.rockefeller.edu)
3) Mascot (http://www.matrixscience.com/search_form_select.html)
4) PeptIdent2 (http://us.expasy.org/tools/peptident.html)
5) PeptideSearch (http://www.mann.embl-heidelberg.de)
6) MS-Fit (http://prospector.ucsf.edu)
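The matching step behind a peptide mass fingerprint can be sketched as follows. This is a minimal illustration, not how any of the tools listed above is implemented: the residue masses are standard monoisotopic values, the 0.05 Da tolerance is an assumed figure, and the candidate list (including the split of the slide's sequences into FNSTPK and YQSRPK) is an assumption made for the example.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O per peptide (free N- and C-termini)


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def match_masses(observed, peptides, tolerance=0.05):
    """Map each observed mass to the peptides within +/- tolerance Da."""
    hits = {}
    for mass in observed:
        hits[mass] = [p for p in peptides
                      if abs(peptide_mass(p) - mass) <= tolerance]
    return hits


# Masses from the slide; the candidate peptides are an assumed digest.
observed = [422.25, 692.35, 1096.59, 1451.75]
candidates = ["YIK", "FNSTPK", "FNSTPKYIK", "YQSRPKFNSTPK", "YQSRPK"]
for mass, peptides in match_masses(observed, candidates).items():
    print(mass, peptides)
```

Widening the tolerance admits more candidate peptides per peak, which is why the tolerance setting matters so much to search discrimination.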
MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse)
ProFound (http://prowl.rockefeller.edu)
PeptIdent2 (http://us.expasy.org/tools/peptident.html)
PeptideSearch (http://www.mann.embl-heidelberg.de)
MS-Fit (http://prospector.ucsf.edu)
• Choice of Enzyme
• Missed Cleavages
• Search Masses
• Constraining the Protein Molecular Weight
• Which masses to include in a search
• Autolysis products
• Modifications
Enzymatic Cleavage
Figure labels: Native Protein; Enzyme; Peptide Fragments
Choice of Enzyme
• Enzymes of low specificity are next to useless as they produce a complex mixture of similar masses
• For MALDI, peptides of masses less than 500 Da should be avoided
Enzyme: Chymotrypsin; cleaves at: FYWLIVM; except before: P; terminus: C
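An in-silico digest with missed cleavages can be sketched as below. The function and its parameters are hypothetical names chosen for illustration. The chymotrypsin rule (cleave C-terminal to F, Y, W, L, I, V, M, but not before P) is one reading of the table row above, and the K/R rule used in the second call is an assumption that does not come from the slides. The test sequence is assembled from the peptides shown on the fingerprinting slide.

```python
def digest(sequence, cleave_after, not_before="P", missed_cleavages=0):
    """Cleave C-terminal to residues in cleave_after, unless the next
    residue is in not_before. Peptides spanning up to missed_cleavages
    uncut sites are returned as well."""
    # Cut positions: index just after each residue where cleavage is allowed.
    cuts = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in cleave_after and sequence[i + 1] not in not_before:
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        last = min(start + 1 + missed_cleavages, len(cuts) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


protein = "YQSRPKFNSTPKYIK"  # assumed sequence, built from the slide's peptides

# Low-specificity rule: many short fragments.
print(digest(protein, "FYWLIVM"))

# Assumed K/R rule with one missed cleavage allowed.
print(digest(protein, "KR", missed_cleavages=1))
```

Each allowed missed cleavage adds longer peptides to the theoretical list, so the setting trades completeness against the chance of random matches.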
Which Masses to Include?
The optimum dataset for a peptide mass fingerprint is all the correct peptides and none of the wrong ones! By correct, we mean that the textbook cleavage rules were followed. In practice, this rarely (if ever) happens.
• Enzymatic cleavage not perfect
• Sequence coverage may be poor
• Noise
Autolysis Products
• Some digests may be dominated by the autolysis peaks of the enzyme used
• In these cases, the known masses of these products may be filtered
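Filtering autolysis peaks out of a peak list before searching could look like the sketch below. The function name, the 0.1 Da tolerance and the two autolysis masses are placeholders for illustration; the real values depend on the enzyme used and should be taken from its documentation.

```python
def remove_autolysis_peaks(observed, autolysis_masses, tolerance=0.1):
    """Drop observed masses that lie within tolerance (Da) of a known
    autolysis product of the enzyme."""
    return [m for m in observed
            if all(abs(m - a) > tolerance for a in autolysis_masses)]


# Hypothetical autolysis masses; substitute the values for the enzyme used.
AUTOLYSIS = [842.51, 2211.10]
peaks = [422.25, 842.50, 1096.59, 2211.12]
print(remove_autolysis_peaks(peaks, AUTOLYSIS))
```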
Mascot (http://www.matrixscience.com/search_form_select.html)
Residue Modifications
• Some residues may be modified during the sample preparation procedure
• This introduces discrepancies in the expected and observed masses
• For example, Met residues are often oxidised
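The effect of Met oxidation on expected masses can be sketched as follows. Each oxidised methionine adds one oxygen atom (about 15.995 Da), so a peptide with n methionines has n + 1 possible masses. The function name and the example peptide AMK are hypothetical.

```python
OXIDATION = 15.99491  # Da, mass of one added oxygen atom


def oxidation_variants(unmodified_mass, sequence):
    """Expected masses with 0..n methionines oxidised (n = count of M)."""
    n_met = sequence.count("M")
    return [unmodified_mass + k * OXIDATION for k in range(n_met + 1)]


# Hypothetical peptide AMK with an unmodified monoisotopic mass of 348.18312 Da.
for mass in oxidation_variants(348.18312, "AMK"):
    print(round(mass, 4))
```

A search engine that treats oxidation as a variable modification has to test every such variant, which is one reason multiple modifications increase search time.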
MALDI Mass Spectrometer
Sample Preparation Robot
• Ions are generated by a laser firing at the target plate
• The time of firing of the laser and the arrival time of the ions at the detector are known; the relative masses can then be calculated
• Only singly charged ions are generated; other types of spectrometer may generate multiply charged ions
$Ez = \frac{1}{2} m v^2$
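The relation between flight time and mass can be sketched numerically. The sketch reads E in the equation above as the energy gained per unit charge, that is, the elementary charge times an accelerating voltage U, which is an interpretation rather than something the slide states. The 20 kV voltage and 1 m drift length are made-up values, and effects such as delayed extraction and reflectrons are ignored.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C
DALTON = 1.66053906660e-27           # kg


def flight_time(mass_da, charge, voltage, length):
    """Drift time in seconds, from z*e*U = (1/2) m v^2 and t = L / v."""
    mass_kg = mass_da * DALTON
    velocity = (2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg) ** 0.5
    return length / velocity


def mass_to_charge(time_s, voltage, length):
    """Invert the same relation: m/z in Da per unit charge."""
    return 2 * ELEMENTARY_CHARGE * voltage * time_s ** 2 / (length ** 2 * DALTON)


# Assumed instrument settings: 20 kV acceleration, 1 m field-free drift.
t = flight_time(1096.59, 1, voltage=20_000.0, length=1.0)
print(t, mass_to_charge(t, 20_000.0, 1.0))
```

Because the time scales with the square root of m/z, heavier ions arrive later, and in practice the constants are fitted from calibrant peaks of known mass rather than taken from nominal instrument settings.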
Isotopic Cluster
Typical Fingerprint Spectrum
Poorly Resolved Peak
Protein C
MASCOT
• Totally web based
• No pre-indexing of databases
• Increased functionality
• Copes with multiple modifications
• Easily expandable
• Increased speed
Search Speed
• Take advantage of multi-processor systems
• Boss/Worker
• Peer
• Pipeline
• MASCOT is based on the Boss/Worker model
Boss/Worker Model
Figure labels: Program ("Boss" main(); Worker Thread A, Worker Thread B, Worker Thread C); Input (Stream); Workers (taskX, taskY, taskZ); Resources (Files, Databases, Disks, Special Devices); Input Data; Output
The "Boss" accepts input and then distributes the work to other threads
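The Boss/Worker arrangement can be sketched with a shared queue and a few threads. This is a generic illustration of the threading model, not MASCOT's code; the function name, the sentinel scheme and the use of `str.lower` as the "work" are all assumptions made for the example.

```python
import queue
import threading


def run_boss_worker(tasks, handler, n_workers=3):
    """Boss puts tasks on a queue; worker threads take and process them.
    Tasks must be hashable and must not be None (None is the sentinel)."""
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            task = todo.get()
            if task is None:          # sentinel: no more work
                break
            outcome = handler(task)
            with lock:                # protect the shared results dict
                results[task] = outcome

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for task in tasks:                # the boss distributes the work
        todo.put(task)
    for _ in workers:                 # one sentinel per worker
        todo.put(None)
    for w in workers:
        w.join()
    return results


print(run_boss_worker(["taskX", "taskY", "taskZ"], str.lower))
```

The boss never processes a task itself, so adding processors only requires raising the number of workers.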
Peer Model
Figure labels: Program; Workers (taskX, taskY); Resources (Files, Special Devices); Output
Pipeline Model
Figure labels: Program; Thread; Input (Stream); Stage1, Stage2, Stage3
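For comparison, a pipeline of three stages can be sketched with one thread per stage and a queue between neighbouring stages. The stage functions here (strip, upper-case, length) are arbitrary stand-ins, and all names are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the input stream


def stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)          # propagate shutdown to the next stage
            break
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run items through funcs in order, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    output = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        output.append(item)
    for t in threads:
        t.join()
    return output


# Three hypothetical stages: strip whitespace, upper-case, measure length.
print(run_pipeline([" yik ", "fnstpk"], [str.strip, str.upper, len]))
```

Unlike the Boss/Worker model, every item passes through every thread, so throughput is limited by the slowest stage.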
Related Search Methods
• Masses may be combined with sequence information: 1234.5 seq(c-ABCD) seq(EF)
• These searches are very valuable, as even small amounts of sequence information may be very discriminating
• Sequence information is derived from the partial interpretation of an MS/MS spectrum
• Known as the "sequence tag" method
Composition Queries
• Composition information may also be used with mass information to refine queries
• Chemical or enzymatic analysis, such as N-terminal analysis with Edman, may give composition information
• A typical query would be: 1234.5 comp(2[H]0[M])
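How a composition qualifier narrows the candidate list can be sketched as below. The sketch assumes that comp(2[H]0[M]) means exactly two His and no Met residues, which is a reading of the notation rather than a documented parser. The function names and test peptides are hypothetical.

```python
import re


def parse_comp(query):
    """Parse the comp(...) part of a query, e.g.
    '1234.5 comp(2[H]0[M])' -> {'H': 2, 'M': 0}."""
    inner = re.search(r"comp\(([^)]*)\)", query)
    if inner is None:
        return {}
    return {aa: int(n)
            for n, aa in re.findall(r"(\d+)\[([A-Z])\]", inner.group(1))}


def satisfies(peptide, composition):
    """True if the peptide has exactly the required count of each residue."""
    return all(peptide.count(aa) == n for aa, n in composition.items())


required = parse_comp("1234.5 comp(2[H]0[M])")
print(required)
for peptide in ["AHKHLR", "AHKMLR"]:  # hypothetical candidates
    print(peptide, satisfies(peptide, required))
```

In a real search this filter would be applied only to peptides that already match the query mass within tolerance.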
MASCOT Queries
• One of the most powerful features of MASCOT is the ability to mix all the types of query in one search
• MASCOT allows the user to specify a particular species to further increase search discrimination
Databases Searched with Peptide Mass Fingerprint Data
• Non-identical protein databases are the ideal
• EST sequences are too short to contain meaningful information for these searches
• Non-redundant databases may be problematic
• MASCOT translates nucleic acid databases on the fly
Local Mascot database
ftp://ftp.ncbi.nih.gov/repository/MSDB/msdb.nam
MSDB
• A non-identical protein sequence database designed for mass spectrometry searches
• Additional information, such as multiple species lines, in the textual information
• De-convolution of SWISSPROT and other sequences
• Nightly updates
• Links to source databases