Outline
Why protein-protein interactions?
Experimental methods for discovering PPIs:
• Yeast-two-hybrid
• AP-MS
PPIs databases:
• DIP
• MIPS
Computational prediction of PPIs
• Phylogeny-based methods
• Expression-correlation-based methods
• STRING (EMBL)
3.
Why protein-protein interactions (PPIs)?
A gene is the basic unit of heredity.
Genomes are available.
Genome → Proteome → Interactome
Proteins, the working molecules of a cell, carry out many biological activities.
Proteins function by interacting with other proteins.
4.
Why protein-protein interactions (PPIs)?
PPIs are involved in many biological processes:
Signal transduction
Protein complexes or molecular machinery
Protein carrier
Protein modifications (phosphorylation)
…
PPIs help to decipher the molecular mechanisms underlying biological
functions and to improve approaches to drug discovery.
5.
High-throughput experimental methods for discovering PPIs
Yeast two-hybrid (Y2H)
Ito T. et al., 2001
Uetz P. et al., 2000
Affinity purification followed by mass spectrometry (AP-MS)
Gavin AC et al., 2002, 2006
Ho Y. et al., 2002
Krogan NJ et al., 2006
6.
Y2H experiments
Idea:
The bait (prey) protein is fused to the DNA-binding domain (activation domain) of a transcription factor.
If the bait and prey proteins interact, transcription of the reporter gene is initiated.
High-throughput screening tests the bait against a library of prey proteins.
In yeast nucleus
7.
AP-MS experiments
Fuse a TAP tag to the target protein. The tag consists of protein A (protA, an IgG-binding peptide) and a calmodulin-binding peptide (CBP), separated by a TEV protease cleavage site.
The first AP step, on an IgG matrix, eliminates many contaminants.
In the second AP step, CBP binds tightly to calmodulin-coated beads. Washing removes the remaining contaminants and the TEV protease, and the bound material is then released under mild conditions with EGTA.
Proteins are identified by mass spectrometry
8.
PPI databases
DIP: Database of Interacting Proteins
(http://dip.doe-mbi.ucla.edu/)
MIPS: Munich Information Center for Protein Sequences
(http://mips.gsf.de/)
9.
DIP
Protein function
Protein-protein relationship
Evolution of protein-protein interaction
The network of interacting proteins
Unknown protein-protein interaction
The best interaction conditions
10.
DIP statistics
Number of proteins: 20731
Number of organisms: 274
Number of interactions: 57687
Number of distinct experiments describing an interaction: 65735
Number of data sources (articles): 3915
Graph of PPIs around DIP:1143N
Nodes are proteins
Edges are PPIs
The center node is DIP:1143N
Edge width encodes the number of independent experiments identifying the interaction.
Green (red) is used to draw core (unverified) interactions.
Click on a node (edge) to learn more about the protein (interaction).
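The graph described above (proteins as nodes, PPIs as edges, edge width as the number of independent experiments) can be sketched as a small data structure. This is only an illustration: the record layout and the identifiers `P_A`, `P_B` and `exp1` to `exp4` are made up, and only `DIP:1143N` comes from the slide.

```python
from collections import defaultdict

# Hypothetical interaction records: (protein_a, protein_b, experiment_id).
# Only "DIP:1143N" appears on the slide; everything else is invented.
records = [
    ("DIP:1143N", "P_A", "exp1"),
    ("DIP:1143N", "P_A", "exp2"),
    ("P_A", "DIP:1143N", "exp3"),   # same edge, reported in the other order
    ("DIP:1143N", "P_B", "exp4"),
]


def build_graph(records):
    """Map each undirected edge to the set of experiments supporting it."""
    edges = defaultdict(set)
    for a, b, experiment in records:
        # frozenset makes (a, b) and (b, a) the same edge.
        edges[frozenset((a, b))].add(experiment)
    return edges


def neighbors(edges, center):
    """Return {partner: number of supporting experiments} for one protein."""
    result = {}
    for pair, experiments in edges.items():
        # Skip self-interactions, whose frozenset has a single member.
        if center in pair and len(pair) == 2:
            (partner,) = pair - {center}
            result[partner] = len(experiments)
    return result


graph = build_graph(records)
print(neighbors(graph, "DIP:1143N"))
```

The experiment count per edge plays the role of the edge width in the DIP viewer; a core/unverified flag could be stored next to it in the same way.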
MIPS
Services:
Genomes
Databank retrieval systems
Analysis tools
Expression analysis
Protein protein interactions
MPact: the MIPS protein interaction resource on yeast.
MPPI: the MIPS Mammalian Protein-Protein Interaction Database.
Protein complexes
Mammalian protein complexes at MIPS
17.
MPact: the MIPS protein interaction resource on yeast
Query all PPIs of a yeast protein
MPPI: the MIPS Mammalian Protein-Protein Interaction Database
Query the PPIs of a mammalian protein. You can use a cross-reference (x-ref), for example a UniProt accession number.
Assessment of large-scale data sets of PPIs
The overlap between the individual methods is
surprisingly small
The methods may not have reached saturation.
Many of the methods may produce a significant
fraction of false positives.
Some methods may have difficulties with certain
types of interactions.
Von Mering C, et al. Nature, (2002) 417 : 399–403
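One way to quantify the overlap between two interaction data sets is sketched below. The pair lists are invented toy data, not results from Von Mering et al., and the Jaccard index is only one possible overlap measure, chosen here for illustration.

```python
def normalize(pairs):
    """Treat interactions as unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(pair) for pair in pairs}


def overlap(pairs_a, pairs_b):
    """Return (number of shared interactions, Jaccard index)."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Toy data sets standing in for two high-throughput screens (hypothetical).
y2h = [("A", "B"), ("A", "C"), ("D", "E")]
apms = [("B", "A"), ("F", "G"), ("H", "I"), ("C", "D")]

shared, jaccard = overlap(y2h, apms)
print(f"shared interactions: {shared}, Jaccard index: {jaccard:.3f}")
```

A low index on real data is compatible with each of the explanations listed above (no saturation, false positives, method-specific blind spots), so the number alone does not tell them apart.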
27.
Functional biases
AP-MS discovers few PPIs involved in transport and sensing.
Y2H detects few PPIs involved in translation.
Different methods complement each other
Von Mering C, et al. Nature, (2002) 417 : 399–403
28.
Computational methods of PPI prediction
Current approaches:
Genomic methods
Biological context methods
Structure-based methods
29.
Genomic methods
Proteins a and b whose genes are close together in several different genomes are
predicted to interact.
Proteins a and b are predicted to interact if they are combined (fused) into
one protein in another organism.
Proteins a and c are predicted to interact if they have similar
phylogenetic profiles.
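The phylogenetic-profile idea can be illustrated with a minimal sketch. The profiles are made-up presence/absence vectors over six hypothetical genomes, and the similarity measure (fraction of genomes in which two profiles agree) and the 0.9 threshold are assumptions for illustration; published methods use a variety of scores.

```python
from itertools import combinations


def profile_similarity(p, q):
    """Fraction of genomes in which two presence/absence profiles agree."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(p, q) if x == y)
    return matches / len(p)


def candidate_pairs(profiles, threshold):
    """Return protein pairs whose profile similarity reaches the threshold."""
    pairs = []
    for name_1, name_2 in combinations(sorted(profiles), 2):
        if profile_similarity(profiles[name_1], profiles[name_2]) >= threshold:
            pairs.append((name_1, name_2))
    return pairs


# 1 = a homolog is present in that genome, 0 = absent (invented data).
profiles = {
    "a": (1, 0, 1, 1, 0, 1),
    "b": (0, 1, 1, 0, 0, 1),
    "c": (1, 0, 1, 1, 0, 1),
}

print(candidate_pairs(profiles, threshold=0.9))
```

Here a and c share an identical profile and are returned as a candidate pair, while b agrees with them in only half of the genomes.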
30.
Biological context methods
Gene expression: Two proteins whose genes exhibit very similar patterns of expression across multiple states or experiments may be considered candidates for functional association and possibly direct physical interaction.
GO annotations: Two interacting proteins are likely to share GO term annotations.
Machine learning techniques are used to classify PPIs by integrating all known information.
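The expression-correlation idea can be sketched with a plain Pearson correlation. The expression values and gene names below are invented, and Pearson correlation is an assumption made for illustration; the slides do not say which similarity measure is used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels across four conditions.
expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],
    "gene3": [4.0, 3.0, 2.0, 1.0],
}

r12 = pearson(expression["gene1"], expression["gene2"])
r13 = pearson(expression["gene1"], expression["gene3"])
print(f"gene1 vs gene2: {r12:.2f}, gene1 vs gene3: {r13:.2f}")
```

A high positive coefficient marks a pair as a candidate for functional association only; as stated above, a direct physical interaction is merely possible and needs further evidence.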
31.
STRING: Search Tool for the Retrieval of Interacting Genes/Proteins
A database of known and predicted protein interactions
Direct (physical) and indirect (functional) associations
The database currently covers 2,483,276 proteins from 630
organisms
Derived from these sources:
Supported by
Graph of PPIs
Nodes are proteins
Each colored line is a piece of evidence for an interaction between two proteins.
The color encodes the method used to detect the interaction.
Click on a node to get information about the corresponding protein.
Click on an edge to get information about the interaction between the two proteins.
34.
List of predicted partners
Partners are listed with a description and a confidence score.
Choose different types of views to see more detail.
35.
Neighborhood View
The red block is the queried protein; the others are its neighbors in each organism. Click on a block to obtain information about the corresponding protein.
Closely related organisms show similar protein neighborhood patterns.
This view helps to find genes/proteins that lie close together in a genomic region.
36.
Occurrence View
Represents the phylogenetic profiles of proteins.
The color of each box indicates the sequence similarity between the protein and its homologous protein in that organism.
The size of the box shows how many members of the family share the reported sequence similarity.
Click on each box to see the sequence alignment.
37.
Gene Fusion View
This view shows the individual gene fusion events per species
Two different colored boxes next to each other indicate a fusion
event.
Hovering above a region in a gene gives the gene name; clicking on
a gene gives more detailed information
38.
References
Ito T et al.: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA 2001, 98:4569-4574.
Uetz P et al.: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403:623-627.
Gavin AC et al.: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002, 415:141-147.
Gavin AC et al.: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440:631-636.
Ho Y et al.: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415:180-183.
Von Mering C et al.: Comparative assessment of large-scale data sets of protein-protein interactions. Nature 2002, 417:399-403.
Editor's Notes
#4 For example, signals from the exterior of a cell are relayed to the inside of that cell by protein-protein interactions of the signaling molecules.
In a protein complex, members are linked by non-covalent interactions; they often activate or inhibit other members.
A protein may carry another protein, for example between the cytoplasm and the nucleus. Nuclear pore importins are a type of protein that moves other protein molecules into the nucleus by binding to a specific recognition sequence.
A protein kinase adds a phosphate group to a target protein.
#27 Each technique produces a unique distribution of interactions with respect to the functional categories of the interacting proteins.
For TAP, this is possibly because these categories are enriched in transmembrane proteins, which are more difficult to purify.
#35 This view shows runs of genes that occur repeatedly in close neighborhood in (prokaryotic) genomes. Genes located together in a run are linked with a black line (maximum allowed intergenic distance is 300 bp). Note that if there are multiple runs for a given species, these are separated by white space. If there are other genes in the run that are below the current score threshold, they are drawn as small white triangles. Gene fusion occurrences are also drawn, but only if they are present in a run (see also the Fusion section below for more details).
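The notion of a "run" in the note above can be sketched as follows, using the stated maximum intergenic distance of 300 bp. The gene names and coordinates are invented, and the grouping rule (sort by start position, ignore strand, start a new run when the gap exceeds the limit) is an assumption for illustration rather than STRING's actual procedure.

```python
def group_into_runs(genes, max_gap=300):
    """Group genes on one chromosome into runs of close neighbors.

    genes: iterable of (name, start, end) tuples, coordinates in bp.
    A gene joins the current run if it starts at most max_gap bp after
    the furthest end position seen so far in that run.
    """
    runs = []
    run_end = None
    for name, start, end in sorted(genes, key=lambda gene: gene[1]):
        if runs and start - run_end <= max_gap:
            runs[-1].append(name)
            run_end = max(run_end, end)
        else:
            runs.append([name])
            run_end = end
    return runs


# Hypothetical gene coordinates (bp).
genes = [
    ("g1", 100, 900),
    ("g2", 1000, 1800),   # 100 bp after g1
    ("g3", 2050, 2900),   # 250 bp after g2
    ("g4", 4000, 4800),   # 1100 bp after g3, so it starts a new run
]

print(group_into_runs(genes))
```

With these coordinates g1, g2 and g3 form one run and g4 forms a second one, which corresponds to the white space the viewer places between separate runs.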
#36 This view shows the presence or absence of linked proteins across species. Proteins are listed across the top of the page and a phylogenetic tree with species names is listed down the left-hand side. In the subsequent grid, the presence of the protein in a species is marked with a red square and absence with a white space. The color of the red square can be more or less intense to reflect the degree of conservation of the homologous protein in the species.