2D Gel Databases

www.expasy.ch - Swiss-2DPAGE
http://www.anl.gov/BIO/PMG/ - Mouse liver, human breast cell lines, Pyrococcus. Argonne Protein Mapping Group.
http://www.harefield.nthames.nhs.uk/nhli/protein/index.html - HSC-2DPAGE, Heart Science Centre, Harefield Hospital
http://oto.wustl.edu/thc/peri-gels.htm - Washington Univ. Inner Ear Protein Database
http://ca.expasy.org/ch2d/2d-index.html - World 2DPAGE, Index of 2D gel databases

Federated 2D PAGE database
• Described by Appel et al. (1996)
• Aimed to tackle (then) emerging problems with 2D gel databases:
– non-uniformity of data-encoding conventions
– robustness
– consistency
– commitment of groups to maintain the databases and data quality

Federated 2D PAGE database
• Rules:
– Rule 1 – Individual entries in the database must be accessible by a keyword search. Other methods are possible but not required.
– Rule 2 – The database must be linked to other databases by active hypertext cross-references, linking together all related databases. Database entries must be at least linked to the main index.
– Rule 3 – A main index has to be supplied that provides a means of querying all databases through one unique query point. Currently, the main index is the SWISS-PROT database.
– Rule 4 – Individual protein entries must be available through clickable images.
– Rule 5 – 2DE analysis software designed for use with federated databases must be able to access individual entries in any federated 2DE database.
http://ca.expasy.org/ch2d/fed-rules.html

Swiss 2DPAGE
• Established in 1993
• Maintained by the Central Clinical Chemistry Laboratory of the Geneva University Hospital and the Swiss Institute of Bioinformatics.
• Entries highly annotated
– containing textual data on proteins including:
  • mapping procedure
  • physiological and pathological information
  • experimental data (isoelectric point, molecular weight, amino acid composition, peptide masses)
  • bibliographical references.

Swiss 2DPAGE
• Entries are linked to images showing the experimentally determined and theoretical protein locations.
• Cross-references are provided to other federated 2D-PAGE database entries, Medline and SWISS-PROT.
• Search via
– clickable images
– keywords

Make2DDB
• Software package provided by ExPASy
• Allows a 2D-PAGE database to be produced on the user's own server.
• The database created can be queried by description, by accession or by clicking on a spot.
• Provides links to Swiss-Prot.

Make2DDB databases
http://semele.anu.edu.au/2d/2d.html - ANU 2D-PAGE, Australian National University 2D-PAGE database
http://babbage.csc.ucm.es/2d/2d.html - COMPLUYEAST 2DPAGE, Saccharomyces cerevisiae 2D-PAGE database at Universidad Complutense, Madrid, Spain
http://www.gram.au.dk/ - PHCI-2DPAGE, Parasite host cell interaction 2D-PAGE database
http://www.bio-mol.unisi.it/2d/2d.html - Siena 2D-PAGE
A sample of 2D-PAGE databases created with make2ddb.

2D Gel Databases
• Limitations of current databases:
  • Do not contain strict/detailed descriptions of protocol (buffers, sample volume and staining techniques are all important information for gel comparisons).
  • Designed as 2D (and not proteomics) databases and therefore not readily expandable to incorporate other proteomics data, e.g. MS, MDLC.
  • Designed for reference gels, not on-going projects.

Proteomics Database Schema
• What should it encompass?
– Proteomics methods (e.g. protein sample prep, electrophoresis buffers, staining techniques, digestion for MS etc.).
– Results from each stage of the experiment (e.g. gel images, MS data).
– Parameters used for MS data analysis/statistical results
– All stored in strict format.

Database querying
• Interact via web interface using Perl/CGI
• Clickable gel images
• Text querying – for keywords, gel/spot name, author, sequence etc.
• XML used for data exchange

Introduction to databases
• Flat file – simplest database type, an ordered collection of data entries, analogous to how files would be stored in a filing cabinet.
• Relational – more sophisticated, storing data in inter-related tables. Allows flexible querying using Structured Query Language (SQL).
• Object-oriented – database consistent with object-oriented principles, allowing for storage of complex data types (e.g. multimedia) and querying beyond that defined by a rigidly defined query language.

DBMS choice
• A flat file database would contain many redundancies in storing complex data types.
• An object-oriented database could intrinsically store complex data types, e.g. large images; however, a relational database could contain links to images stored elsewhere.
• SQL would provide a fast and easy way of querying and updating the database.
• A relational database would provide a platform that is easily expandable to accommodate additional forms of data.
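
The relational option can be sketched with Python's built-in sqlite3 module. This is only an illustration of inter-related tables queried with SQL, with the gel image held as a link to a file stored elsewhere. The table names, column names, accessions and file path are all hypothetical and are not taken from any real 2D-PAGE database.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions only.
SCHEMA = """
CREATE TABLE gel (
    gel_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    image_path TEXT NOT NULL          -- link to an image file stored elsewhere
);
CREATE TABLE spot (
    spot_id   INTEGER PRIMARY KEY,
    gel_id    INTEGER NOT NULL REFERENCES gel(gel_id),
    accession TEXT,                   -- cross-reference to a sequence database
    pi        REAL,                   -- isoelectric point
    mw_da     REAL                    -- molecular weight in Da
);
"""


def build_demo_db():
    """Create an in-memory database holding one gel and two made-up spots."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gel VALUES (1, 'demo_gel', 'images/demo_gel.png')")
    conn.executemany(
        "INSERT INTO spot VALUES (?, ?, ?, ?, ?)",
        [(1, 1, "ACC0001", 5.4, 42000.0), (2, 1, "ACC0002", 6.8, 15500.0)],
    )
    conn.commit()
    return conn


def spots_for_gel(conn, gel_name):
    """Return (accession, pI, MW, image path) for every spot on a named gel."""
    rows = conn.execute(
        "SELECT s.accession, s.pi, s.mw_da, g.image_path "
        "FROM spot AS s JOIN gel AS g ON g.gel_id = s.gel_id "
        "WHERE g.name = ? ORDER BY s.spot_id",
        (gel_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for row in spots_for_gel(build_demo_db(), "demo_gel"):
        print(row)
```

Adding a further table (for example one holding MS peak lists keyed on spot_id) would not require changing the existing tables, which is the expandability argument made for the relational model.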

Future
• Standard database schema for proteomics and mark-up language for data exchange.
• Improved spot detection, quantification and gel warping algorithms.
• Improved sample preparation techniques.
• More automation (linkage of robots!).
• Protein array technologies.

Computer Analysis of Mass Spectrometry Data

Protein Sequencing and Identification

Introduction
• Software and computational techniques for the identification of proteins and residue modifications using both MALDI and MS/MS data.

[Flowchart: Gel → Sample Preparation → MALDI (matrix-assisted laser desorption ionisation; peptide mass fingerprinting) → Protein Identified, or Protein Not Identified → Further Sample Processing → MS/MS (peptide fragmentation)]

Software for protein identification
1. Peptide fingerprint (PMF)
Software: PepSea, PeptIdent/MultiIdent, MS-Fit
1) MOWSE (http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse)
2) ProFound (http://prowl.rockefeller.edu)
3) Mascot (http://www.matrixscience.com/search_form_select.html)
4) PeptIdent2 (http://us.expasy.org/tools/peptident.html)
5) PeptideSearch (http://www.mann.embl-heidelberg.de)
6) MS-Fit (http://prospector.ucsf.edu)

Software for protein identification
2. Peptide fragmentation
1) PepSea (http://pepsea.protana.com/PA_PeptidePatternForm.html)
2) SEQUEST
3) PepFrag (http://www.proteometrics.com/prowl/pepfragch.html)
   MS-Tag (http://prospector.ucsf.edu/ucsfhtm13.2/mstagfd.html)
4) Mascot

[Figure: Peptide mass fingerprint (Fig. 3, Simulation), with numbered callouts ① to ⑧]

Peptide Mass Fingerprinting
[Figure: the MS spectrum (peptide mass fingerprint, intensity against peptide mass, with peaks at 422.25, 692.35, 1096.59 and 1451.75) is compared with the peptide masses of Proteins A, B and C in a protein database to give a protein id.]

Mass spectrum vs. database: peptide mass match
[Figure: the database sequence ….FNSTPKYIKSEGYGPREKYQSRPKFNSTPKDYN… yields the peptides YIK (422.25), FNSTPK (692.35), FNSTPKYIK (1096.59) and YQSRPKFNSTPK (1451.75), which are matched against the spectrum peaks within a mass tolerance.]

Peptide Mass Fingerprinting
A mass spectrum of the peptide mixture resulting from the digestion of a protein by a proteolytic enzyme.
• Choice of Enzyme
• Missed Cleavages
• Search Masses
• Constraining the Protein Molecular Weight
• Which masses to include in a search
• Autolysis products
• Modifications

Enzymatic Cleavage
[Figure: Native Protein + Enzyme → Peptide Fragments]

Choice of Enzyme
• Enzymes of low specificity are next to useless as they produce a complex mixture of similar masses
• For MALDI, peptides of masses less than 500 Da should be avoided

Enzyme Specificity

Enzyme         Cleave At   Don't Cleave Before   N or C term
Trypsin        KR          P                     C
Lys-C          K           P                     C
Lys-C/P        K           (none)                C
Arg-C          R           P                     C
V8-E           E           P                     C
V8-DE          DE          P                     C
Chymotrypsin   FYWLIVM     P                     C

Missed Cleavages
• Digests are usually not perfect
• Cleavage sites may be missed by an enzyme
• These partially cleaved peptides are known as partials
• Partials reduce the discrimination of a search
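
The cleavage rules and the idea of partials can be sketched as a small in-silico digest. This is a minimal illustration, not the code of any of the search programs listed earlier. The rule table is copied from the enzyme slide, and the function names are assumptions made for the example. The test sequence is the fragment shown on the peptide mass match slide, so its ends are not true protein termini.

```python
# Rules from the enzyme table: (residues cleaved after, next residue that blocks).
# Every enzyme listed cleaves on the C-terminal side of the residue.
ENZYMES = {
    "Trypsin": ("KR", "P"),
    "Lys-C": ("K", "P"),
    "Lys-C/P": ("K", ""),
    "Arg-C": ("R", "P"),
    "V8-E": ("E", "P"),
    "V8-DE": ("DE", "P"),
    "Chymotrypsin": ("FYWLIVM", "P"),
}


def cleavage_sites(sequence, enzyme):
    """Return the positions at which the enzyme would cut the sequence."""
    cleave_at, blocked_by = ENZYMES[enzyme]
    sites = []
    for i in range(len(sequence) - 1):
        if sequence[i] in cleave_at and sequence[i + 1] not in blocked_by:
            sites.append(i + 1)
    return sites


def digest(sequence, enzyme="Trypsin", missed_cleavages=0):
    """List the peptides, including partials with up to N missed cleavages."""
    bounds = [0] + cleavage_sites(sequence, enzyme) + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end < len(bounds):
                peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    fragment = "FNSTPKYIKSEGYGPREKYQSRPKFNSTPKDYN"
    print(digest(fragment))                      # perfect digest
    print(digest(fragment, missed_cleavages=1))  # adds the partials
```

With one missed cleavage allowed, the output includes FNSTPKYIK and YQSRPKFNSTPK, the two partials drawn on the peptide mass match slide. Each extra missed cleavage lengthens the candidate list, which is why partials reduce the discrimination of a search.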

Search Masses
• Select masses which are large enough to provide discrimination
• Larger masses are more likely to be partials
• With trypsin, a mass range of 1000 to 3000 Da is good
• Mass tolerance is important in obtaining good discrimination

Constraining Protein Mass
• To increase discrimination, the mass of the intact protein can be used in a search
• This is dangerous, since the sample may be just a fragment of an entire protein

Which Masses to Include?
The optimum dataset for a peptide mass fingerprint is all the correct peptides and none of the wrong ones! By correct, we mean that the textbook cleavage rules were followed. In practice, this rarely (if ever) happens.
• Enzymatic cleavage not perfect
• Sequence coverage may be poor
• Noise

Autolysis Products
• Some digests may be dominated by the autolysis peaks of the enzyme used
• In these cases, the known masses of these products may be filtered
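
Filtering a peak list before a search might look like the sketch below. The two autolysis masses are values often quoted for porcine trypsin as [M+H]+ ions and are given here only as illustrative assumptions; the real list depends on the enzyme and supplier. The 1000 to 3000 Da window is the range suggested for trypsin on the Search Masses slide. The function names and the default tolerance are hypothetical.

```python
# Assumed, illustrative autolysis masses; replace with those of the enzyme used.
AUTOLYSIS_MASSES = (842.51, 2211.10)


def remove_autolysis(peaks, contaminants=AUTOLYSIS_MASSES, tolerance=0.2):
    """Drop every peak lying within `tolerance` Da of a known autolysis mass."""
    return [
        mass for mass in peaks
        if not any(abs(mass - known) <= tolerance for known in contaminants)
    ]


def in_search_window(peaks, low=1000.0, high=3000.0):
    """Keep only the peaks inside the mass range used for the search."""
    return [mass for mass in peaks if low <= mass <= high]


if __name__ == "__main__":
    # Made-up peak list for demonstration.
    peaks = [842.50, 1096.59, 1451.75, 2211.12, 3500.0, 422.25]
    cleaned = remove_autolysis(peaks)
    print(cleaned)
    print(in_search_window(cleaned))
```

The two filters are kept separate so that the window can be widened or dropped without affecting the autolysis filter.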

Residue Modifications
• Some residues may be modified during the sample preparation procedure
• This introduces discrepancies between the expected and observed masses
• For example, Met residues are often oxidised
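
The effect of a variable modification on a search can be sketched for the Met oxidation example. Each oxidised Met adds one oxygen atom (about 15.9949 Da monoisotopic), so a peptide containing n Met residues has n + 1 candidate masses. The function names and the default tolerance are assumptions made for this illustration.

```python
OXIDATION = 15.99491  # monoisotopic mass of one added oxygen atom, in Da


def masses_with_met_oxidation(unmodified_mass, peptide):
    """Return the candidate masses for 0..n oxidised Met residues."""
    n_met = peptide.count("M")
    return [unmodified_mass + k * OXIDATION for k in range(n_met + 1)]


def explains(observed, unmodified_mass, peptide, tolerance=0.05):
    """True if the observed mass matches any oxidation state of the peptide."""
    return any(
        abs(observed - candidate) <= tolerance
        for candidate in masses_with_met_oxidation(unmodified_mass, peptide)
    )


if __name__ == "__main__":
    # Hypothetical peptide with two Met residues and a made-up base mass.
    print(masses_with_met_oxidation(1000.0, "AMGMK"))
    print(explains(1016.0, 1000.0, "AMGMK"))
```

Every variable modification multiplies the number of candidate masses per peptide, so allowing many of them makes a search slower and less discriminating.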

Sample Preparation for MALDI
• Excise band from gel
• Tryptic digestion of gel fragment
• Supernatant transferred to fresh Eppendorf tube
• Sample transferred to target plate

[Figure: Sample Preparation Robot]

MALDI Mass Spectrometer
• Ions are generated by a laser firing at the target plate
• The time of firing of the laser and the arrival time of the ions at the detector are known, so the relative masses can be calculated
• Only singly charged ions are generated; other types of spectrometer may generate multiply charged ions

$Ez = \frac{1}{2} m v^{2}$
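
The step from flight time to mass can be written out from the energy relation on this slide. The derivation below reads E as the accelerating potential, so that Ez is the kinetic energy given to an ion of charge z. It also introduces two symbols that are not on the slide and are assumptions of this sketch: L for the length of the flight path and t for the measured flight time.

```latex
\begin{align}
Ez &= \tfrac{1}{2} m v^{2}, \qquad v = \frac{L}{t} \\
\Rightarrow\quad \frac{m}{z} &= \frac{2E\,t^{2}}{L^{2}}, \qquad
t = L\sqrt{\frac{m}{2Ez}}
\end{align}
```

For singly charged ions the flight time therefore scales with the square root of the mass, which is why heavier peptides reach the detector later.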

[Figures: MALDI Internals; Micromass MALDI; Isotopic Cluster; Typical Fingerprint Spectrum; Poorly Resolved Peak]

Database Searching with Peptide Mass Fingerprints
• Produce a theoretical digest of all the proteins in a database with a specific enzyme
• Compare these theoretical masses with experimentally observed masses
• Assign a score to matching peptides/proteins
[Figure: mass spectrum compared with database entries Protein A, Protein B and Protein C]

Problems
• Mixtures and contamination
• Partial cleavage
• Identifying real peaks
• Residue modifications
• Mass accuracy
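
The three search steps (theoretical digest, mass comparison, scoring) can be sketched end to end. The score used here is a plain count of matched peaks, chosen for simplicity; it is not the MOWSE or MASCOT scoring scheme. Residue masses are standard monoisotopic values. "Protein A" is the sequence fragment from the peptide mass match slide, while "Protein B" and "Protein C" are made-up sequences, and all function names are assumptions.

```python
# Standard monoisotopic residue masses in Da, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Protein A is the slide's fragment; B and C are hypothetical sequences.
DATABASE = {
    "Protein A": "FNSTPKYIKSEGYGPREKYQSRPKFNSTPKDYN",
    "Protein B": "MKWVTFISLLR",
    "Protein C": "AAKYIKLLR",
}


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def tryptic_peptides(sequence, missed_cleavages=1):
    """Cleave after K or R unless the next residue is P; include partials."""
    bounds = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            bounds.append(i + 1)
    bounds.append(len(sequence))
    peptides = set()
    for start in range(len(bounds) - 1):
        last = min(start + 2 + missed_cleavages, len(bounds))
        for end in range(start + 1, last):
            peptides.add(sequence[bounds[start]:bounds[end]])
    return peptides


def count_matches(observed, sequence, tolerance=0.05):
    """Count the observed masses explained by the protein's peptides."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed
    )


def rank_proteins(observed, database, tolerance=0.05):
    """Return (name, score) pairs, best score first."""
    scores = {
        name: count_matches(observed, seq, tolerance)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    spectrum = [422.25, 692.35, 1096.59, 1451.75]  # masses from the slide
    print(rank_proteins(spectrum, DATABASE))
    print(rank_proteins(spectrum, DATABASE, tolerance=0.1))
```

Loosening the tolerance from 0.05 to 0.1 Da lets the hypothetical Protein C pick up a second, chance match (its partial AAKYIK at about 692.42 against the 692.35 peak), a small demonstration of why mass accuracy matters for discrimination.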

MOWSE
• One of the first programs for identifying proteins by peptide mass fingerprinting
• Developed by Darryl Pappin and Alan Bleasby
• Developed alongside the OWL non-redundant protein database

Problems with MOWSE
• Databases had to be pre-indexed; these indexes are large and slow to build
• Does not handle variable modifications
• Indexing means that databases can't easily be updated regularly
• Limited functionality

MASCOT
• Takes advantage of multi-processor systems
• Totally web based
• No pre-indexing of databases
• Increased functionality
• Copes with multiple modifications
• Easily expandable
• Increased speed

Search Speed
Search speed is very important as databases increase in size and automation leads to a high throughput of samples. Also, if the algorithms are efficient, more elaborate searches may be undertaken, for instance with large numbers of variable residue modifications and different mass tolerances, to attempt to make more sense of data derived from mixtures or with contamination.

Search Speed
• Ability to use multiple processors when available
• Very efficient I/O; databases may also be mapped to memory
• Efficient cleavage site and mass calculation

Thread Models
Threads are a standardized model for dividing a program into subtasks whose execution can be interleaved or run in parallel.
• Boss/Worker
• Peer
• Pipeline
• MASCOT is based on the Boss/Worker model

Boss/Worker Model
[Figure: within the program, the "Boss" thread in main() receives the input stream and hands taskX, taskY and taskZ to Worker Threads A, B and C; the workers share the resources (files, databases, disks, special devices).]
The "Boss" accepts input and then distributes the work to other threads.
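
The Boss/Worker arrangement can be sketched with Python's standard threading and queue modules. This illustrates the thread model only and says nothing about how MASCOT itself is implemented. The function name boss_worker and its parameters are hypothetical.

```python
import queue
import threading


def boss_worker(tasks, work, n_workers=3):
    """The boss queues the tasks; worker threads pull them and process them."""
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:              # sentinel: no more work for this worker
                break
            index, task = item
            outcome = work(task)
            with lock:                    # protect the shared results table
                results[index] = outcome

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()
    for index, task in enumerate(tasks):  # the boss distributes the work
        todo.put((index, task))
    for _ in workers:                     # one sentinel per worker
        todo.put(None)
    for thread in workers:
        thread.join()
    return [results[i] for i in range(len(results))]


if __name__ == "__main__":
    print(boss_worker([1, 2, 3, 4], lambda x: x * x))
```

Results are stored by task index so that the output order does not depend on which worker finishes first. In CPython, threads mainly help with I/O-bound work; a CPU-bound search would more likely use processes, but the division of labour is the same.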

Peer Model
[Figure: within the program, Threads A, B and C each read their own (static) input, run taskX, taskY and taskZ and write their own output; they share the resources (files, databases, disks, special devices).]
Each thread is responsible for its own input.

Pipeline Model
[Figure: within the program, an input stream passes through Stage 1 (Thread A), Stage 2 (Thread B) and Stage 3 (Thread C) to the output; each stage has its own resources (files, databases, disks, special devices).]
A single thread accepts input, passing the data on to the next thread for further processing.
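
For comparison, the pipeline model can be sketched in the same way, with one thread per stage and a queue between neighbouring stages. The names run_pipeline and _stage are assumptions made for this illustration.

```python
import queue
import threading

_DONE = object()  # marker telling each stage that the input stream has ended


def _stage(func, inbox, outbox):
    """Run one pipeline stage: read an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)   # forward the end-of-stream marker
            break
        outbox.put(func(item))


def run_pipeline(items, stages):
    """Push items through the stage functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for thread in threads:
        thread.start()
    for item in items:          # feed the input stream into the first stage
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:                 # drain the output of the last stage
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10, str]))
```

Because each stage is a single thread reading from a first-in, first-out queue, items leave the pipeline in the order they entered.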

Related Search Methods
• Masses may be combined with sequence information: 1234.5 seq(c-ABCD) seq(EF)
• These searches are very valuable as even small amounts of sequence information may be very discriminating
• Sequence information is derived from the partial interpretation of an MS/MS spectrum
• Known as the "sequence tag" method

Composition Queries
• Composition information may also be used with mass information to refine queries
• Chemical or enzymatic analysis, such as N-terminal analysis with Edman, may give composition information
• A typical query would be: 1234.5 comp(2[H]0[M])
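
How a composition term narrows a candidate list can be sketched as follows. The parser handles only the count[residue] form shown in the example query and reads each count as "exactly this many"; both points are simplifying assumptions, and the real query syntax may allow more. The function names are hypothetical.

```python
import re


def parse_comp(query):
    """Parse a term such as 'comp(2[H]0[M])' into {'H': 2, 'M': 0}."""
    match = re.fullmatch(r"comp\((.*)\)", query.strip())
    if match is None:
        raise ValueError(f"not a composition term: {query!r}")
    pairs = re.findall(r"(\d+)\[([A-Z])\]", match.group(1))
    return {residue: int(count) for count, residue in pairs}


def satisfies(peptide, requirements):
    """True if the peptide holds exactly the required number of each residue."""
    return all(
        peptide.count(residue) == count
        for residue, count in requirements.items()
    )


if __name__ == "__main__":
    wanted = parse_comp("comp(2[H]0[M])")
    # Made-up candidate peptides that are assumed to share the queried mass.
    for candidate in ["AHGHK", "AHGHMK", "AHK"]:
        print(candidate, satisfies(candidate, wanted))
```

Only the first candidate has two His and no Met, so the composition term alone removes the other two.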

MASCOT Queries
• One of the most powerful features of MASCOT is the ability to mix all the types of query in one search
• MASCOT allows the user to specify a particular species to further increase search discrimination

Databases Searched with Peptide Mass Fingerprint Data
• Non-identical protein databases are the ideal
• EST sequences are too short to contain meaningful information for these searches
• Non-redundant databases may be problematic
• MASCOT translates nucleic acid databases on the fly

Local Mascot database
(http://www.matrixscience.com/search_form_select.html)
Establish a Mascot database in your own lab

MSDB
ftp://ftp.ncbi.nih.gov/repository/MSDB/msdb.nam
• A non-identical protein sequence database designed for mass spectrometry searches
• Additional information, such as multiple species lines, in the textual information
• De-convolution of SWISSPROT and other sequences
• Nightly updates
• Links to source databases

Is the Protein Identified?
• Most samples are identified using just peptide mass fingerprinting
• With the growth of databases, this trend will continue
• Some samples do not have representatives in any of the databases; more analysis is required to sequence these proteins