
Interactome

How to prepare a topic


Proteome-scale map of the human
protein–protein interaction network

Rual et al. Nature 437, 1173-1178 (20 October 2005)


Guided reading
• Objective: Develop an academic topic on which practically no
prior information is available.

• Method: guided reading of the articles, aiming to
– Answer the initial questions
– Identify new relevant questions

• Initial questions:
– What is the interactome?
– What scientific problem does it address?
– How is the interactome studied?
1. Information search
Selected bibliography
2. Processing the information
What is the interactome?
What scientific problem does it address?
How is it studied?
• Interactome:
– systematic identification of
protein interactions
within an organism

• First example: phage T7

• Experimental procedures:
– Y2H system (yeast two-hybrid)
– Detection of complexes by
mass spectrometry

• Technical problems:
– Incomplete coverage
– Detection of the interactions

• Technical challenges:
– Generation of clone sets
– Method for mating the yeast
strains
• Full interactome network:
– The complete collection of all
physical protein–protein interactions
that can take place within a cell.

• Requires
– Construction of comprehensive sets
of protein–protein interactions
– Creation of genome-scale resource
collections of open reading frames
(ORFeomes) cloned so as to
facilitate protein
– Capture all expressed isoforms
(splice variants and
polymorphisms).
• Generation of comprehensive
network maps, generally depicted as
nodes (e.g. proteins, RNAs, DNA
binding sites or metabolites) linked by
edges corresponding to molecular
interactions (e.g. protein–protein
interactions, enzymatic reactions,
DNA–protein, etc.).

• As biological systems are highly
dynamic and fluid, information needs
to be obtained on where and when
nodes appear or disappear, on where
and when edges take place, and on
the rewiring of the network as
sub-networks appear or disappear
during developmental and cell cycle
stages.
Biological problem

• Complete surveys of genes and
proteins of model organisms
revealed a surprisingly small
number of genes even in
complex multicellular organisms,
supporting the view that a
comprehensive understanding of
cellular functions may not be
achieved by the characterization
of single genes or proteins alone
• The one-gene/one-protein at-a-
time approach of the last thirty
years has provided some
indication of function for only 5–
10% of all predicted proteins so
far.
• Recent findings suggest that
species differences cannot be
accounted for by the individual
properties of their component
genes, but rather by the
relationships between them.

• Set of experimental techniques
developed for the systematic
analysis of protein interactions
– Yeast two-hybrid-based methods
– Identification by mass
spectrometry of isolated protein
complexes
– Protein chips
– Hybrid approaches
– Computational methods
How is the interactome studied?

YEAST TWO-HYBRID SYSTEM (Y2H)

What is Y2H?
How is it done?
How is it interpreted?
Yeast two-hybrid system
• The Y2H system was originally
described by Fields and Song in 1989
(Nature 340, 245–246).

• The canonical Y2H system consists of
a separable DNA-binding domain (DB)
from a transcriptional activator protein
(yeast Gal4 or bacterial LexA) fused to
protein ‘X’, generally referred to as the
‘bait’, and a separable transcriptional
activation domain (AD) fused to protein
‘Y’, termed the ‘prey’.

• When DB-X and AD-Y are co-expressed
in the nucleus of yeast cells,
X-Y protein–protein interactions
reconstitute a functional transcription
factor that activates one or more
reporter genes.
Defining the ORFeome
• Using cDNAs as the source for Y2H biases the results towards highly
expressed genes.
• The ORFeome of an organism corresponds to its complete set of protein-
encoding genes, cloned as full-length open reading frames (ORFs).
• An ORF consists of the entire coding sequence between the initiation and
termination codons, excluding the 5’ and 3’ mRNA untranslated regions
(UTRs).
• A cloned ORFeome resource should ideally include all variants of all genes
expressed in all cells at all stages of development.

• Cloning thousands of ORFs in dozens of different vectors would be virtually
impossible using conventional restriction enzymes and DNA ligase
technology.
• A robust, standardized, and flexible methodology is required for ORFeome-scale
cloning, a need satisfied by the Gateway recombinational cloning
(RC) system.
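The ORF definition above (the coding sequence from the initiation codon to the termination codon, without the UTRs) can be illustrated with a minimal Python sketch. It assumes the first ATG is the true start codon, which is a simplification, and the function name `extract_orf` and the toy transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def extract_orf(mrna: str) -> str:
    """Return the sequence from the first ATG through the first in-frame stop codon.

    Assumption: the first ATG is the initiation codon (real annotation is more involved).
    Returns an empty string if no start codon or no in-frame stop codon is found.
    """
    seq = mrna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return ""

# Hypothetical toy transcript: 5' UTR + ORF + 3' UTR
toy_mrna = "GGCACC" + "ATGGCTTGGAAATAA" + "CCGTTT"
orf = extract_orf(toy_mrna)
print(orf)
```

The 5' and 3' flanking bases are dropped, which mirrors why ORF clones, unlike cDNA clones, carry no UTR sequence.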
Gateway recombinational cloning (RC) system
Gateway mimics the site-specific recombination events of bacteriophage λ
integration into and excision from the Escherichia coli chromosome. The
method does not involve standard nucleic acid digestion and ligation.
Yeast two-hybrid system
Y2H
• Quality of the data in high-throughput datasets (Y2H, AP-MS).
– The low overlap between the results of the two genome-wide yeast two-hybrid projects by
Uetz et al. (2000) and Ito et al. (2000) and, similarly, between the two high-throughput
AP-MS approaches analysing the yeast proteome (Gavin et al. 2002; Ho et al.
2002) has raised concerns about “noisiness” and false-negative or false-positive
results.

• The systems used to screen the results are still fairly rudimentary and do not
“saturate” the system.
Overlapping of Y2H interactomes

High-throughput two-hybrid screens are subsaturating. Venn diagrams showing overlap among independent high-
throughput two-hybrid screens (a) for Drosophila proteins or (b) for human proteins. In each case the data represent the
entire two-hybrid dataset rather than just the set judged to be high confidence in each study. Numbers indicate unique
interactions based on gene locus (i.e. detection of an interaction between protein A and two splice variants of protein B
would be counted as one interaction). Data for Drosophila were obtained from Giot et al. [13], Stanyon et al. [14], and
Formstecher et al. [15], and compiled to remove redundancy in the Drosophila Interactions Database [23]. Human data
were obtained from Rual et al. [16] and Stelzl et al. [17].
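The counts behind such a Venn diagram can be sketched in a few lines of Python. The sketch assumes each screen is already reduced to pairs of gene-locus identifiers (so splice variants have been collapsed beforehand); the gene names and the helper names `normalize` and `overlap_counts` are hypothetical.

```python
def normalize(pairs):
    """Collapse interactions to unordered pairs so that A-B and B-A count once."""
    return {frozenset(pair) for pair in pairs}

def overlap_counts(screen_a, screen_b):
    """Count interactions unique to each screen and shared by both."""
    a, b = normalize(screen_a), normalize(screen_b)
    return {
        "only_a": len(a - b),
        "only_b": len(b - a),
        "shared": len(a & b),
    }

# Hypothetical toy screens, not real data
screen_1 = [("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")]
screen_2 = [("geneB", "geneA"), ("geneD", "geneE")]
counts = overlap_counts(screen_1, screen_2)
print(counts)
```

A small "shared" count relative to the two "only" counts is what the caption describes as subsaturating screens.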
Yeast two-hybrid system
Y2H
• Standard Y2H generally underestimates the number of interactions
(inherent false-negatives), because:
– The forced subcellular localization of bait and prey in the yeast nucleus may
preclude certain interactions from taking place, a particular instance being
interactions involving integral membrane proteins.
– For interactions that require specific post-translational modifications, unless the
enzymes responsible for such modifications happen to be present in the yeast
nucleus, the interaction may not be detectable by Y2H.
– It is difficult to analyze transcription factors, which might by themselves activate
transcription of the reporter genes.
FALSE-POSITIVES IN Y2H
• Biological false-positives
– The interaction can be confirmed by multiple, different methods, but the two
proteins are never present in the same cell or subcellular compartment at the
same time.
– These false-positives are nearly impossible to unequivocally identify using
interaction assays alone.

• Technical false-positives
– In HT-Y2H experiments the technical false-positive rate is substantially reduced
by
• incorporating multiple reporter genes to measure transcription activation
• employing different DNA sequences for binding by DB in the promoters of the reporter
genes
• using low copy number vectors
• retesting interacting pairs in fresh yeast
Yeast two-hybrid system
HT-Y2H
Stringent Y2H screening strategy.
Through use of multiple, single-copy
reporter genes, low copy plasmids for
expression of bait and prey in yeast,
and retesting of all positives, Y2H
achieves increased stringency leading
to reproducibly real interactions.
TECHNICAL FALSE-POSITIVES IN
Y2H
• Auto-activators
– the DB-X construct activates gene expression in the absence of any AD-Y
• Strong auto-activators can be removed directly before any AD-Y is added
• Additional auto-activators arise owing to acquisition of mutations in the bait during propagation.
• These latent auto-activators are much harder to identify, as the presence of AD-Y gives the
appearance of an interaction when in fact it is the DB-X construct alone that auto-activates the Y2H
reporter genes, irrespective of any AD-Y that is present
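The filtering logic described above (discard known auto-activating baits, require several reporters, retest positives) can be sketched as follows. This is only an illustration: the record layout, the names `Y2HResult` and `filter_candidates`, and the threshold of two reporters are assumptions, not parameters taken from any published protocol.

```python
from dataclasses import dataclass

@dataclass
class Y2HResult:
    bait: str              # protein X fused to DB
    prey: str              # protein Y fused to AD
    reporters_on: int      # number of reporter genes activated
    retest_positive: bool  # positive again when retested in fresh yeast

def filter_candidates(results, auto_activators, min_reporters=2):
    """Keep pairs that pass all three stringency checks (assumed thresholds)."""
    kept = []
    for r in results:
        if r.bait in auto_activators:
            continue  # DB-X alone activates the reporters
        if r.reporters_on >= min_reporters and r.retest_positive:
            kept.append((r.bait, r.prey))
    return kept

# Hypothetical toy results
results = [
    Y2HResult("X1", "Y1", reporters_on=3, retest_positive=True),
    Y2HResult("X2", "Y2", reporters_on=3, retest_positive=True),   # X2 auto-activates
    Y2HResult("X3", "Y3", reporters_on=1, retest_positive=True),   # single reporter
    Y2HResult("X4", "Y4", reporters_on=2, retest_positive=False),  # not reproducible
]
kept = filter_candidates(results, auto_activators={"X2"})
print(kept)
```

Latent auto-activators would have to be added to `auto_activators` after the parallel test, since they cannot be recognized from the screen result alone.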
TECHNICAL FALSE-POSITIVES IN
Y2H

Removal of auto-activators. Auto-activating baits are the major source of false-positives. Strong auto-activators can be removed prior to addition of prey plasmids. Latent ones can arise during manipulation of yeast; testing for latent auto-activators is performed in parallel with Y2H screening.

The initial genome-wide Y2H studies contained significant technical false-positives, most likely because the influence of auto-activation was not recognized.
Modifications of the Y2H system
Detection of interactions by
affinity chromatography and
mass spectrometry

AP-MS
Examples of studies using Y2H
• Bacteria
– Helicobacter pylori
• Eukaryotes
– S. cerevisiae
• 5,600 interactions involving 69% of the proteins
– Drosophila melanogaster
• 24,000 interactions involving 54% of the genes
– Plasmodium falciparum
– Caenorhabditis elegans
• 5,400 interactions involving 12% of the proteins
– Homo sapiens
• 10,000 cloned ORFs
IDENTIFICATION OF INTERACTING PROTEINS
BY MASS SPECTROMETRY
• Two basic strategies:
– direct (purification of a stable complex and elucidation of the components of the complex by
mass spectrometry)
• Faces the difficult task of achieving sufficient purification of target complexes without loss of
components and with minimal contamination.
– co-AP (purification of a complex by virtue of an affinity tag placed on one of its components,
then elucidation of the components of the complex by mass spectrometry).

• The identification of increasing numbers of shared components involving complexes
of different function highlights the challenge of assigning function based on
co-purification strategies.

• Because many complexes involve very transient interactions and/or individual components are
not readily detectable owing to low expression, AP–MS will underestimate the extent
of complex co-membership.

• A systematic analysis suggests that a majority of novel and shared components are
likely to be biologically relevant (108), which means that AP–MS is a reliable method
for identifying novel components of complexes.
COMPLEMENTARITY OF AP–MS AND Y2H

Y2H                                      AP-MS
• Detects weak binding                   • Does not detect weak binding
• Heterologous environment               • Native environment
• Does not allow post-translational      • Allows post-translational
  modifications                            modifications
• Allows mutagenesis                     • Does not allow mutagenesis

• The multifunctionality of some proteins can produce unexpected
associations revealed by both techniques

VALIDATION OF THE INTERACTIONS


Validation of interactomic results
• The quality of the interactions must be validated by detecting them with
complementary procedures:
– Y2H and AP-MS
– Use of knock-outs or inhibition of gene expression by siRNA
• Validation criteria
– Phylogenetic conservation
– Subcellular localization
– Coexpression
– Network topology

• Computational validation
– Assigns a value to the probability of the interaction in vivo, based on
the characteristics of the proteins and the type of interaction
– Advantages
• The data are stored in databases, where they remain available
• It is possible to work at different confidence levels
• Different interactomes can be compared

• Factors that increase confidence in the interaction
– Y2H signal with more than two reporters
– Appearance of the same signal in several clusters
– The participating proteins have similar GO annotations
– The interaction has been detected in other systems (interologs)

• These criteria are very restrictive and bias the validation towards what is
already published, but they increase the robustness of the analysis
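As a rough illustration of computational validation, the sketch below turns a set of yes/no criteria into a single confidence value. The unweighted fraction used here is purely an assumption for illustration; it is not the scoring scheme of any particular database, and the criterion names and the function `confidence_score` are hypothetical.

```python
# Hypothetical criterion names, loosely following the list above
CRITERIA = (
    "phylogenetic_conservation",
    "same_subcellular_localization",
    "coexpression",
    "similar_go_annotation",
    "interolog_support",
)

def confidence_score(evidence: dict) -> float:
    """Fraction of criteria met (assumed equal weights; missing keys count as not met)."""
    met = sum(1 for criterion in CRITERIA if evidence.get(criterion, False))
    return met / len(CRITERIA)

# Hypothetical evidence for one candidate interaction
evidence = {
    "phylogenetic_conservation": True,
    "same_subcellular_localization": True,
    "coexpression": False,
    "similar_go_annotation": True,
}
score = confidence_score(evidence)
print(score)
```

Storing such a score alongside each interaction is what allows the same dataset to be analysed at different confidence levels by changing a cutoff.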
Protein interaction networks
• Biological networks are graphical
visualizations of elements such as
proteins or genes, depicted in the graph
as nodes or vertices, connected to each
other by links or edges representing their
functional interactions.

The most basic characteristic of a node in a network is its degree k, which is defined
as the number of links it has to other nodes.

In protein interaction networks, links usually are undirected. In other complex
networks, like for example gene regulatory networks, links can be directional; here
the degree of a node is divided into incoming degree, comprising the links that point
towards that node, and outgoing degree, denoting links pointing away from it.

• An elementary measure to characterize a network’s topology is the degree distribution
P(k), obtained by counting the number of nodes having the same degree N(k)
divided by the total number of nodes (N).

• P(k) gives the probability of a node having exactly the degree k.

• P(k) can be used to classify networks

[Figure: example network with nodes of degree k = 1, k = 2 and k = 4, for which
P(1) = 0.33, P(2) = 0.50, P(3) = 0.00, P(4) = 0.17]
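The calculation of P(k) = N(k)/N can be sketched in Python as follows. The six-node edge list is a hypothetical topology chosen so that it reproduces the values shown in the slide's figure (P(1) = 0.33, P(2) = 0.50, P(3) = 0.00, P(4) = 0.17); the figure's actual wiring is not recoverable from the text.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)} for an undirected network given as a list of edges.

    Nodes without any edge do not appear in the edge list and are ignored.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n_nodes = len(degree)
    counts = Counter(degree.values())  # N(k): number of nodes with degree k
    return {k: counts.get(k, 0) / n_nodes for k in range(1, max(counts) + 1)}

# Hypothetical network: one node of degree 4, three of degree 2, two of degree 1
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "E"), ("B", "E")]
p = degree_distribution(edges)
for k, value in p.items():
    print(f"P({k}) = {value:.2f}")
```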
Protein interaction networks
• A Poisson distribution of P(k) values is indicative of random networks.

• In a number of biological networks, P(k) values follow a power law
\[ P(k) \approx k^{-\gamma} \]
where the exponent (γ) ranges between 2 and 3.

• This means that the large majority of nodes have only one or very few links, while a
small but significant number of nodes, the so-called “hubs”, are connected
to many other nodes.
– Power law topology contributes to robustness against random perturbations
– Knockouts of genes encoding hubs are approximately three times more likely to
confer lethality than those of non-hubs.
– Furthermore, the dynamics of interactions mediated by hub proteins points to a
modular organization of the yeast proteome.
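One simple way to check whether a degree distribution looks like a power law is to fit a straight line to log P(k) against log k; the negative slope is an estimate of γ. The sketch below uses plain least squares on an idealized, noise-free distribution. This is an illustrative choice: on real, noisy data a log-log regression is known to be a crude estimator, and the function name `fit_power_law_exponent` is hypothetical.

```python
import math

def fit_power_law_exponent(p_of_k):
    """Estimate gamma as minus the least-squares slope of log P(k) versus log k."""
    points = [(math.log(k), math.log(p)) for k, p in p_of_k.items() if k > 0 and p > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return -sxy / sxx

# Idealized distribution with gamma = 2.5 (normalization does not affect the slope)
ideal = {k: k ** -2.5 for k in range(1, 51)}
gamma = fit_power_law_exponent(ideal)
print(round(gamma, 3))
```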
Protein interaction networks
• Another feature to describe and classify network architecture and the relative position
of particular nodes in the network is the path length
– The number of steps that have to be taken to reach from one node to another.
• The shortest path and the mean path length are measures of the diameter of a
network.
• Scale-free networks have ultra-short mean path lengths and therefore have so-called
“small-world” properties, a characteristic of random networks.
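Path lengths in an unweighted network can be computed with a breadth-first search from every node. The sketch below returns the mean shortest-path length and the longest shortest path (the diameter) for a small hypothetical network; it assumes the network is connected, and the function names are illustrative.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def path_statistics(edges):
    """Return (mean shortest-path length, diameter) over all reachable node pairs."""
    adj = build_adjacency(edges)
    lengths = []
    for source in adj:
        dist = bfs_distances(adj, source)
        lengths.extend(d for node, d in dist.items() if node != source)
    return sum(lengths) / len(lengths), max(lengths)

# Hypothetical six-node network with one hub (H)
edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "E"), ("B", "E")]
mean_length, diameter = path_statistics(edges)
print(round(mean_length, 2), diameter)
```

Because most routes pass through the hub, the mean path length stays short even though the network has few edges.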
Protein interaction networks
• Biological and other complex networks revealed a high degree of clustering which is
not found in random networks but rather is an attribute of regular networks.
• Clustering coefficient Ci:
– Number of links existing between the neighbours of a node i divided by the maximum
number of links possible between these neighbours:
\[ C_i = \frac{2 n_i}{k_i (k_i - 1)} \]
where n_i is the number of links connecting the k_i neighbours of node i.
– A high clustering coefficient means that, if for example a node A is connected to B and C,
there is a high probability that B has a direct link to C or, in other words, A, B and C form a
triangle.
• A high Clustering coefficient (a high density of triangles) indicates a “community
structure” or a modular organization, which is another general property of complex
networks.
• In biological networks, functional annotation of these separable subgraphs supports
the view that these structures reflect the modularity of cellular functions.
– Rationale for annotation:
• Guilt by association
• Majority rule
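Both ideas above, the clustering coefficient and annotation by majority rule, can be sketched on a small hypothetical network. The clustering function follows the formula given above (links among a node's neighbours divided by k(k-1)/2); the majority-rule function assigns an unannotated protein the most frequent annotation among its direct neighbours. Function names, node names and annotations are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """C_i = 2 n_i / (k_i (k_i - 1)); defined here as 0.0 when k_i < 2."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    n_links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * n_links / (k * (k - 1))

def majority_rule(adj, node, annotations):
    """Most common annotation among annotated neighbours, or None if there are none."""
    votes = Counter(annotations[n] for n in adj[node] if n in annotations)
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical network: triangle A-B-C plus a pendant node D attached to A
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
adj = build_adjacency(edges)
c_a = clustering_coefficient(adj, "A")

# Hypothetical functional annotations for the neighbours of A
annotations = {"B": "DNA repair", "C": "DNA repair", "D": "transport"}
predicted = majority_rule(adj, "A", annotations)
print(round(c_a, 2), predicted)
```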
Protein interaction network decomposition reveals functional modules and motifs.
a Graphical representation of a nonredundant set of yeast interactome data compiled from the
GRID and DIP databases results in a highly complex network. b As an example, k-core
decomposition, a method that is based on the recursive removal of the least connected nodes
from the yeast interactome network, is shown. Depicted is the 8-core, a subgraph with all nodes
connected by at least eight edges. Colouring indicates functional categorization according to GO
annotation.

Joachim F. Uhrig. Protein interaction networks in plants. Planta (2006) 224: 771–781
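The recursive removal described in the caption can be sketched as a short Python function: nodes with fewer than k links are removed repeatedly until every remaining node has at least k links within the remaining subgraph. The toy network and the function name `k_core` are hypothetical.

```python
def k_core(edges, k):
    """Return the set of nodes in the k-core of an undirected network."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        # Nodes that currently have fewer than k neighbours left
        low = [node for node, neighbours in adj.items() if len(neighbours) < k]
        changed = bool(low)
        for node in low:
            for neighbour in adj.pop(node):
                if neighbour in adj:
                    adj[neighbour].discard(node)
    return set(adj)

# Hypothetical network: triangle A-B-C with a two-node tail D-E attached to A
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "D"), ("D", "E")]
core = k_core(edges, 2)
print(sorted(core))
```

Removing E first lowers the degree of D below 2, so D is removed in the next round; this cascading effect is why the removal has to be recursive.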
Comparative interactomics
• The interaction map of one species is useful for predicting the
interactions in another.

• INTEROLOG: a protein interaction predicted in one organism because the
orthologs of those proteins have been shown to interact with each other
in another organism.
– Example: of 2727 genes associated with human diseases, 1716 have
orthologs in Drosophila melanogaster and, of these, 914 (56%) interact with each
other in Y2H assays

• Questions arising from the comparison of interactomes
– How dynamic is it?
– How does it evolve?
– How does it influence evolution?
– Which features are species-specific?
– How does it relate to the phenotype?
– How does it explain diseases and the response to pathogens?
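The ortholog-based prediction defined above can be sketched as a simple mapping step: an interaction observed in a source organism is transferred to the target organism only when both partners have an ortholog there. The sketch assumes one-to-one orthology and uses hypothetical identifiers; the function name `predict_interologs` is illustrative.

```python
def predict_interologs(source_interactions, orthologs):
    """Map interactions through an ortholog table (assumed one-to-one).

    source_interactions: pairs of proteins that interact in the source organism
    orthologs: dict from source-organism protein to target-organism protein
    """
    predicted = set()
    for a, b in source_interactions:
        if a in orthologs and b in orthologs:
            predicted.add(frozenset((orthologs[a], orthologs[b])))
    return predicted

# Hypothetical data: fly interactions and fly-to-human orthologs
fly_interactions = [("flyA", "flyB"), ("flyB", "flyC")]
fly_to_human = {"flyA": "humanA", "flyB": "humanB"}
predicted = predict_interologs(fly_interactions, fly_to_human)
print([sorted(pair) for pair in predicted])
```

The second fly interaction is not transferred because flyC has no ortholog in the table, which shows how ortholog coverage limits this kind of prediction.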
Generation of a proteome-
scale human Y2H map.
[Figure panels: whole interactome; disease-associated proteins]

Using a stringent, high-throughput yeast two-hybrid system, we tested pairwise
interactions among the products of around 8,100 currently available Gateway-cloned
open reading frames and detected circa 2,800 interactions. This data set, called
CCSB-HI1, has a verification rate of nearly 78% as revealed by an independent
co-affinity purification assay, and correlates significantly with other biological attributes.
The CCSB-HI1 data set increases by around 70% the set of available binary
interactions within the tested space and reveals more than 300 new connections to
over 100 disease-associated proteins.

Towards a proteome-scale map of the human protein–protein interaction network. Rual et al. 2005. Nature 437, 1173-1178
