Cytoscape: Gene coexpression and PPI networks

First presentation slides of the BITS training session "Visualising biological networks with Cytoscape"

See http://www.bits.vib.be/training


  1. Visualising Biological Networks with Cytoscape (gene coexpression & protein-protein interaction; steps to the integration of networks & pathways: genes/proteins in networks and genes/proteins in pathways). Practical Course, Bioinformatics Training BITS - VIB, Brussels (BE), 2 September 2011. Dr. Javier De Las Rivas, Cancer Research Center (CiC-IBMCC), CSIC and University of Salamanca (CSIC/USAL), Salamanca, Spain. Dr J De Las Rivas - 2011
  2. Networks & Pathways: comparison and combination of these types of complex data (genes/proteins in networks and genes/proteins in pathways).
  3. Biomolecular complexity of living systems. GENOME; TRANSCRIPTOME (genes/RNAs: actions & relations); PROTEOME (proteins: actions & relations); METABOLOME; INTERACTOME (molecular interactions: structural, metabolic or signaling; protein-ligand actions & relations). Example shown: Citrate Cycle.
  4. Genomes: is there a simple "genome factor"?
     Organism | Genes | Genome factor
     Bacteria | 3,000 |
     Yeast | 6,000 | x2
     Worm | 18,000 | x3
     Human | 36,000 | x2
     From the mere genome numbers, HUMAN is only about 12 times BACTERIA (x2 · x3 · x2 = x12).
     BIOLOGY includes two other key factors:
     - Cellular factor: 1 bacterium is 1 cell; 1 human is 10^9 cells (and more than 300 cell types).
     - Relational factor: through interactions and relations the number of possible outputs grows exponentially.
  5. Omics era: unraveling biological complexity. The paradox of the "genome alone". [Diagram: the genome (stable legacy) and the phenome (facing reality), connected by an expression track and an activation track; genome vs. living system.]
  6. Omics era: unraveling biological complexity. Proteins constitute the keystones of the cellular machinery. [Diagram: genome (stable legacy), interactome (working-moving machinery) and phenome (facing reality), connected by an expression track and an active track; genome (genes) + cellular machinery (proteins) = living system.]
  7. Omics era: unraveling biological complexity. Interactions (gene2gene, prot2prot) ... cellular machinery dynamics. [Diagram: genome, interactome (working-moving machinery) and phenome, connected by an expression track and an active track; genome + cellular machinery = living system.]
  9. Networks: two major types of networks derived from large-scale experimental (omic) data. (1) Gene Coexpression Networks: derived from gene expression profiling and transcriptomic studies. (2) Protein-Protein Interaction Networks: derived from proteomic studies.
  10. Networks tool = Cytoscape: the most powerful tool to build, visualize and analyse networks. Cytoscape is an open-source bioinformatics package for biological network visualization and data integration (a desktop Java application released under the GNU LGPL license).
  11. Networks tool = Cytoscape: the most powerful tool to build, visualize and analyse networks. Cytoscape is an open-source bioinformatics package for biological network visualization and data integration (a desktop Java application released under the GNU LGPL license). Important publications: Nature Protocols (2007); Bioinformatics (2011).
  12. Networks tool = Cytoscape: the most powerful tool to build, visualize and analyse networks.
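As a minimal sketch of how interaction data can be prepared for loading into Cytoscape, the snippet below writes a list of edges in the simple interaction format (SIF), with one `source relationship target` triple per line. The slides do not mention SIF; the helper name `to_sif`, the toy gene names and the `pp` relationship label are illustrative assumptions.

```python
def to_sif(edges):
    """Return SIF text: one 'source<TAB>relationship<TAB>target' line per edge."""
    lines = []
    for source, relationship, target in edges:
        lines.append(f"{source}\t{relationship}\t{target}")
    return "\n".join(lines) + "\n"


# Hypothetical toy interactions ("pp" is used here as a protein-protein label).
toy_edges = [("TP53", "pp", "MDM2"), ("MDM2", "pp", "UBE2D1")]
sif_text = to_sif(toy_edges)
print(sif_text, end="")
```

The resulting text can be saved with a `.sif` extension and imported as a network file.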
  13. Networks & Pathways: comparison and combination of these types of complex data. The data: databases and data sources (genes/proteins in networks and genes/proteins in pathways).
  14. Pathways databases: KEGG (http://www.genome.jp/kegg/) and Reactome (http://www.reactome.org/).
  15. Network databases: GeneMANIA (http://www.genemania.org/) and STRING (http://string-db.org/).
  16. Networks & Pathways: comparison and combination of these types of complex data. Pathways: http://www.genome.jp/kegg/ and http://www.reactome.org/. Networks: http://string-db.org/ and http://www.genemania.org/.
  18. Networks: two major types of networks derived from large-scale experimental (omic) data. (1) Gene Coexpression Networks: derived from gene expression profiling and transcriptomic studies. (2) Protein-Protein Interaction Networks: derived from proteomic studies.
  19. Human coexpression studies: Stuart et al. (2003) Science; Lee et al. (2004) Genome Research; Griffith et al. (2005) Genomics.
  20. Human coexpression, Stuart et al. (2003) Science: low signal & high noise.
  21. Experimental dataset selection and sample bias: Lee et al. (2004) Genome Research.
  22. Sample bias, "malignant data": Lee et al. (2004) Genome Research. About 80% of these datasets correspond to "cancer" samples. How "normal" is this? Do we take into account that "tumor cells" usually have a totally aberrant genome with many altered chromosomes?
  23. Sample bias: microarray datasets in "malignant data", Lee et al. (2004) Genome Research. Datasets listed (three columns):
     lymphoma | breast cancer | NCI-60 tumor cell lines
     GIST sarcoma | breast cancer | lymphoma
     leukemia | obesity | prefrontal cortex
     lung cancer | breast cancer | prostate cancer
     melanoma | thyroid papillary tumors | breast cancer
     leprosy | breast cancer | breast cancer
     NCI-60 tumor cell lines | blue cell tumors | asthma
     fibroblasts | astrocytoma | NCI-60 tumor cell lines
     parasite response | gastric cancer | diverse tissues
     liver cancer | prostate cancer | dermatomyositis
     leukemia | breast cancer | viral infection
     breast cancer | medulloblastoma | breast cancer
     prostate cancer | sarcoma | leukemia
     T-cells | breast cancer | muscle
     bladder tumors | breast cancer | ovarian cancer
     bladder tumors | brain tumors | prostate cancer
     lung cancer | tumor and normal | breast cancer
     leukemia | glioma | cell cycle, tumors
     inflammatory myopathy | leukemia | leukemia
     breast cancer | breast cancer | colorectal cancer
  24. Human transcriptomic network of normal tissues: a global map without malignant data. Key questions:
     • Can we use global human gene expression data (i.e. transcriptomic genome-wide microarray data) to derive gene coexpression networks?
     • Is it a reliable way to find coexpression (given the large noise and background in microarrays and the adverse effect of outliers on correlation)?
     • How reliable are the data coming from microarrays? Can we calculate and improve the reliability of microarray data?
     • Which algorithm is good enough to provide a sensible, reliable expression signal: MAS5, RMA, dCHIP, PLIER, FARMS ...?
  25. Experimental dataset selection, normal samples: Prieto et al. (2008) PLoS ONE.
  26. Experimental dataset selection. 136 microarrays (hgu133a, Gene Expression Atlas); 22 microarrays from "hematopoietic" samples; 34 microarrays from "brain" samples. To achieve a global transcriptomic view it is critical to avoid "sample bias": adequate sample selection.
  27. Sample selection: normal healthy tissues representing the body "evenly" (pvclust algorithm: uncertainty in hierarchical clustering via multiscale bootstrap resampling).
  28. Experimental dataset selection: 48 microarrays of whole tissues / organs, normal healthy samples (hgu133a, Gene Expression Atlas). Panels: (A) MAS5 - Spearman; (B) RMA - Pearson.
  29. Human coexpression, comparative study using the Stuart et al. approach: Stuart et al. (2003) Science; Lee et al. (2004) Genome Research; Griffith et al. (2005) Genomics.
  30. Human coexpression studies: mapping coexpressing genes into KEGG pathways to check functional coherence. Done as in Stuart et al. (2003) Science, i.e. detection of the number of genes within each pathway that coexpress ... but still noisy data!
     This work (2008):
     Pathway name (KEGG ID number) | nº genes | genes coexp / genes | fraction coexp | mean r
     Proteasome (3050) | 31 | 28 / 28 | 1.00 | 0.69
     Ribosome (3010) | 120 | 52 / 55 | 0.95 | 0.75
     Oxidative phosphorylation (190) | 129 | 88 / 95 | 0.93 | 0.73
     Focal adhesion (4510) | 194 | 154 / 168 | 0.92 | 0.68
     Antigen processing and presentation (4612) | 86 | 71 / 78 | 0.91 | 0.75
     Glycan structures - degradation (1032) | 30 | 20 / 22 | 0.91 | 0.65
     Neuroactive ligand-receptor interaction (4080) | 299 | 227 / 255 | 0.89 | 0.68
     Cell cycle (4110) | 114 | 90 / 102 | 0.88 | 0.66
     Regulation of actin cytoskeleton (4810) | 208 | 141 / 161 | 0.88 | 0.66
     Cytokine-cytokine receptor interaction (4060) | 256 | 196 / 223 | 0.88 | 0.69
     Lee et al. (2004):
     Pathway name (KEGG ID number) | nº genes | genes coexp / genes | fraction coexp
     Ribosome (3010) | 120 | 43 / 44 | 0.98
     Proteasome (3050) | 31 | 19 / 22 | 0.86
     Oxidative phosphorylation (190) | 129 | 31 / 44 | 0.70
     Cell cycle (4110) | 114 | 33 / 47 | 0.70
     ECM-receptor interaction (4512) | 87 | 16 / 23 | 0.70
     Gap junction (4540) | 92 | 9 / 13 | 0.69
     Pathogenic Escherichia coli infection (5130) | 49 | 11 / 16 | 0.69
     Pathogenic Escherichia coli infection (5131) | 49 | 11 / 16 | 0.69
     T cell receptor signaling pathway (4660) | 93 | 15 / 22 | 0.68
     Metabolism of xenobiotics by cytP450 (980) | 70 | 7 / 11 | 0.64
     Griffith et al. (2005):
     Pathway name (KEGG ID number) | nº genes | genes coexp / genes | fraction coexp
     Ribosome (3010) | 120 | 36 / 38 | 0.95
     Proteasome (3050) | 31 | 20 / 24 | 0.83
     Oxidative phosphorylation (190) | 129 | 55 / 67 | 0.82
     Val, Leu and isoleucine degradation (280) | 50 | 15 / 19 | 0.79
     ECM-receptor interaction (4512) | 87 | 16 / 22 | 0.73
     Cell cycle (4110) | 114 | 36 / 51 | 0.71
     Propanoate metabolism (640) | 34 | 9 / 14 | 0.64
     Butanoate metabolism (650) | 44 | 9 / 14 | 0.64
     Hematopoietic cell lineage (4640) | 88 | 18 / 28 | 0.64
     beta-Alanine metabolism (410) | 24 | 7 / 11 | 0.64
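The coexpressing-fraction column in this slide appears to be the ratio given in the "genes coexp / genes" column. A minimal sketch reproducing three of the values for "This work (2008)" follows; the function name is illustrative, and reading the denominator as the number of pathway genes found in the network is an assumption.

```python
def coexpressed_fraction(n_coexpressing, n_genes):
    """Fraction of a pathway's mapped genes that coexpress with another pathway gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return n_coexpressing / n_genes


# (genes coexp, genes) pairs copied from the slide.
pathways = {
    "Proteasome (3050)": (28, 28),
    "Ribosome (3010)": (52, 55),
    "Oxidative phosphorylation (190)": (88, 95),
}
for name, (n_coexp, n_genes) in pathways.items():
    print(f"{name}: {coexpressed_fraction(n_coexp, n_genes):.2f}")
```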
  31. Gene2gene coexpression method (based on the combination of correlation r and cross-validation N). rN-plot: r (correlation factor) against N (number of positives in the random cross-validation). Reliable region (≠ random signal): high signal/noise ratio, r > 0.68 and N > 220 (approx.). Noise region (≈ random signal): low signal/noise ratio.
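One way to read the r/N combination is sketched below: r is the correlation of two expression profiles over all samples, and N counts how many random subsamples still give a correlation above a cutoff. The slide does not give the resampling scheme, so the 1000 rounds, the 80% subsample size, the use of Pearson correlation and all function names are assumptions made only for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def rn_score(x, y, r_cut=0.68, rounds=1000, fraction=0.8, seed=0):
    """Return (r, N): full-sample correlation and number of positive subsamples."""
    rng = random.Random(seed)
    n = len(x)
    k = min(n, max(3, int(fraction * n)))  # assumed subsample size
    positives = 0
    for _ in range(rounds):
        idx = rng.sample(range(n), k)
        if pearson([x[i] for i in idx], [y[i] for i in idx]) >= r_cut:
            positives += 1
    return pearson(x, y), positives


# Hypothetical profiles over 48 samples: perfectly correlated by construction.
gene_a = list(range(48))
gene_b = [2 * v + 1 for v in gene_a]
r, n_positive = rn_score(gene_a, gene_b)
```

A pair would then be kept only if both values exceed the chosen thresholds; a rank-based correlation could be swapped in for `pearson` where a Spearman variant is wanted.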
  32. Gene2gene coexpression method: mapping coexpression of housekeeping genes and tissue-specific genes (based on KEGG pathways). [rN-plots of distribution density, r (correlation factor) against N (nº of positives in the random cross-validation), for (A) MAS5 - Spearman and (B) RMA - Pearson. Housekeeping pathways: Ribosome, Oxidative phosphorylation, Proteasome. Tissue-specific pathways: Cytokine-cytokine receptor interaction, Neuroactive ligand-receptor interaction, Complement and coagulation cascades.]
  33. Hi-Fi gene2gene coexpression network (based on the combination of correlation r and cross-validation N): precision obtained for 3 reliable networks at high r and N. [Plots: precision and specificity for RMA - Pearson - Filtered and MAS5 - Spearman - All.]
     Method | Precision (1) | N | r | Number of nodes (2) | Number of links (2)
     RMA-Pearson (pre-filtered) | 0.60 | 765 | 0.85 | 1,672 | 5,945
     RMA-Pearson (pre-filtered) | 0.70 | 835 | 0.87 | 1,215 | 3,273
     RMA-Pearson (pre-filtered) | 0.80 | 925 | 0.84 | 983 | 2,423
     MAS5-Spearman (non-filtered) | 0.60 | 605 | 0.77 | 3,052 | 12,669
     MAS5-Spearman (non-filtered) | 0.70 | 645 | 0.79 | 2,295 | 7,874
     MAS5-Spearman (non-filtered) | 0.80 | 695 | 0.81 | 1,762 | 4,910
     (1) Corresponds to the networks derived for KEGG annotated genes.
     (2) Corresponds to the full networks including all genes.
  34. Hi-Fi human coexpression network: network = intersection of the 2 methods with precision ≥ 0.60 (r ≥ 0.77, N ≥ 605).
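The intersection step can be sketched as a set operation on undirected gene pairs. The example assumes each method yields (gene, gene, r, N) tuples and applies the precision 0.60 thresholds listed for each method in slide 33; the gene identifiers, scores and function name are hypothetical.

```python
def reliable_edges(scored_pairs, r_min, n_min):
    """Keep undirected gene pairs whose r and N both reach the thresholds."""
    return {
        frozenset((gene_a, gene_b))
        for gene_a, gene_b, r, n in scored_pairs
        if r >= r_min and n >= n_min
    }


# Hypothetical scored pairs from the two processing/correlation methods.
mas5_spearman = [("G1", "G2", 0.90, 800), ("G2", "G3", 0.80, 650), ("G1", "G4", 0.60, 900)]
rma_pearson = [("G2", "G1", 0.91, 900), ("G2", "G3", 0.80, 900), ("G3", "G4", 0.95, 990)]

# Only pairs that pass in both methods are kept.
hifi_network = (
    reliable_edges(mas5_spearman, r_min=0.77, n_min=605)
    & reliable_edges(rma_pearson, r_min=0.85, n_min=765)
)
print(sorted(sorted(edge) for edge in hifi_network))
```

Using `frozenset` makes the pair G1-G2 equal to G2-G1, so the order in which each method reports a pair does not matter.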
  35. Hi-Fi human coexpression network. Analysis done with 2 algorithms: MCODE and MCL. Modules shown: nuclear-related metabolism; ribosomal and translation; cytoskeleton.
  36. Hi-Fi human coexpression network. Analysis done with 2 algorithms: MCODE and MCL. Module shown: mitochondrial metabolism and redox homeostasis (most genes of the COX family, the NDUF family and the UQCR family).
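The slides only name the two clustering algorithms. To give an idea of how MCL (Markov clustering) finds modules, here is a heavily simplified sketch that alternates expansion (matrix squaring) and inflation (element-wise power followed by column normalisation) on a toy graph. It omits the pruning and convergence checks of real implementations, and the function names, parameters and toy adjacency matrix are assumptions.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def mcl(adjacency, inflation=2.0, iterations=30, eps=1e-6):
    """Simplified Markov clustering; returns clusters as sorted tuples of node indices."""
    n = len(adjacency)
    # Self-loops are added so that every column has a non-zero sum.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: one more step of the random walk (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        m = normalize_columns([[value ** inflation for value in row] for row in m])
    # Each row that still carries flow lists the members of one cluster.
    clusters = {tuple(j for j in range(n) if row[j] > eps) for row in m}
    clusters.discard(())
    return sorted(clusters)


# Toy graph: two separate triangles (nodes 0-2 and nodes 3-5).
toy_adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
modules = mcl(toy_adjacency)
```

On a real coexpression network the inflation parameter controls how fine-grained the modules are, and a tested implementation would normally be used instead of this sketch.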
  37. Hi-Fi human coexpression network (modules coherent in terms of transcription factor, TF, regulation).
     Coexp module | Search in | TF found | p-value | TransFac_db | TF gene | Name
     Module 1 (10 genes) | PAP | MTF-1 | 0.001 | T02354 | MTF1 | metal-regulatory transcription factor 1
     Module 1 (10 genes) | Factory | - | - | | |
     Module 2 (4 genes) | PAP | CRE-BP1 | 0.0172 | T00167 | ATF2 | activating transcription factor 2
     Module 2 (4 genes) | Factory | CRE-BP1 | 0.0033 | | |
     Module 3 (15 genes) | PAP | Sp1 | 0.13 | T00759 | SP1 | Sp1 transcription factor
     Module 3 (15 genes) | Factory | Sp1 | 0.017 | | |
  38. Human transcriptomic network of normal tissues: a global map without malignant data. We achieved: (1st) reliable calculation of human genome-wide (global) expression data; (2nd) reliable calculation of human gene2gene (global) co-expression data.
  39. Networks: two major types of networks derived from large-scale experimental (omic) data. (1) Gene Coexpression Networks: derived from gene expression profiling and transcriptomic studies. (2) Protein-Protein Interaction Networks: derived from proteomic studies.
  40. Protein-Protein Interactions (PPIs): biological networks. The review shows that PPI data are, at present, a major part of the new systematic approaches to large-scale experimental determination of biomolecular networks. From Zhu et al. (2007) Genes Dev.
  41. Protein-Protein Interactions (PPIs): international consortiums. Our group participates actively in HUPO PSI-MI (Molecular Interactions Workgroup).
  42. Protein-Protein Interactions (PPIs): international consortiums. There are several primary PPI databases, but at present there is little integration among them. PPIs: proteins; MIs: biomolecules. EU project PSIMEx (FP7-HEALTH-2007-223411).
  43. Protein-Protein Interactions (PPIs): review of some essential concepts on PPIs. PLoS Comp. Bio. (2010).
  44. Protein-Protein Interactions (PPIs): definition. The advancement of genome- and proteome-wide experimental technologies has introduced modern biology to the high complexity of living cells, where thousands of biomolecules work together with much cross-talk and cross-regulation. To achieve a first level of understanding of such cellular complexity we need to unravel the interactions that occur between all the proteins that make up a living cell. BUT, what do we mean by protein-protein interaction? Just physical contact, or another level of biomolecular relation / association? From De Las Rivas et al. (2004) Comp. Funct. Genomics.
  45. Protein-Protein Interactions (PPIs): definition. The advancement of genome- and proteome-wide experimental technologies has introduced modern biology to the high complexity of living cells, where thousands of biomolecules work together with much cross-talk and cross-regulation. To achieve a first level of understanding of such cellular complexity we need to unravel the interactions that occur between all the proteins that make up a living cell. From De Las Rivas et al. (2004) Comp. Funct. Genomics.
  46. Protein-Protein Interactions (PPIs): definition. It is important to define the different types of associations between proteins in order to make clear what PPIs are. I.- PPIs are proper physical interactions (and these can be direct or indirect). Example: pApBpCpD3pE is a complex (components pA, pB, pC, pD x 3, pE; complex = stable molecular machine). Physical direct: pA with pB; pD with pE. Physical indirect: pA with pD; pB with pE. From De Las Rivas et al. (2004) Comp. Funct. Genomics.
  47. Protein-Protein Interactions (PPIs): definition. It is important to define the different types of associations between proteins in order to make clear what PPIs are. II.- PPIs can be stable (i.e. complexes) or transient (i.e. in signaling pathways). Example: pApBpCpD3pE is a complex (complex = stable molecular machine) that interacts with other proteins; pF interacts with the complex in transient mode. Physical direct: pF with pApB. Physical indirect: pF with pE. From De Las Rivas et al. (2004) Comp. Funct. Genomics.
  48. Protein-Protein Interactions (PPIs): definition. It is important to define the different types of associations between proteins in order to make clear what PPIs are. III.- Just associations but not PPIs (because there are no physical interactions). Genetic: gA and gX are co-regulated (pA, pX). Metabolic: pD and pY are involved in the same pathway (pD x 2, pY). No physical interaction. From De Las Rivas et al. (2004) Comp. Funct. Genomics.
  49. Protein-Protein Interactions (PPIs): definition. What do we mean by protein-protein interaction? Protein-to-protein interactions (PPIs) are specific physical contacts between protein pairs that occur by selective molecular docking in a particular biological context. Looking forward, two main challenges remain in the field: (i) better filtering of false positives in the PPI collections; (ii) an adequate distinction of the biological context that specifies and determines whether or not a given PPI exists in a given biological situation. From De Las Rivas & Fontanillo (2010).
  50. Protein-Protein Interactions (PPIs): review of some essential concepts on PPIs. PLoS Comp. Bio. (2010).
  51. Protein-Protein Interactions (PPIs): types of experimental methods. In recent years a large amount of data on protein-protein interactions in cellular systems has been obtained by both high-throughput and small-scale technologies. A list of the most relevant methods is presented:
     Complex-oriented methods (find multimeric PPIs)
     - Co-Immunoprecipitation (Co-IP)
     - Pull-Down Assays
     - Tandem Affinity Purification + Mass Spectrometry (TAP-MS)
     Binary-oriented methods (find dimeric PPIs)
     - Two-Hybrid systems (Y2H)
     - Protein Arrays / Protein Chips
     3D-structure-based methods (find specific PPI interfaces)
     - X-ray Crystallography (X-ray)
     - Electron Microscopy (EM)
     - Nuclear Magnetic Resonance (NMR)
  52. Protein-Protein Interactions (PPIs): types of experimental methods. Data about the YEAST interactome. Two main high-throughput proteomic techniques have been applied to determine PPIs: TAP-MS & Y2H. From Reguly et al. (2006) Journal of Biology.
  53. Protein-Protein Interactions (PPIs): major high-throughput experimental methods. In recent years two main high-throughput proteomic techniques have been applied to determine PPIs: Tandem-Affinity Purification and Mass Spectrometry (TAP-MS), which provides multimer interactions (complexes); and high-throughput Two-Hybrid systems (Y2H), which provide binary interactions.
  54. Protein Interactions (PPIs): TAP-MS. Tandem-Affinity Purification and Mass Spectrometry (TAP-MS) provides multimer interactions (complexes). Bait and prey system: the "bait proteins" are prepared with tags in order to fish the "prey proteins". The co-purified partners are identified several times. From Wodak et al. (2008) Mol Cell Proteomics.
  55. Protein Interactions (PPIs): TAP-MS. Tandem-Affinity Purification and Mass Spectrometry (TAP-MS) provides multimer interactions (complexes). Once the tables of co-purified partners are produced, the spoke model is applied to estimate the binary interactions. From Wodak et al. (2008) Mol Cell Proteomics.
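The spoke model can be sketched in a few lines: every co-purified prey is paired with the bait only, and prey-prey pairs are not inferred. For contrast, the snippet also shows the alternative "matrix" expansion (all pairs within a purification), which the slides do not mention. Protein names and function names are hypothetical.

```python
from itertools import combinations


def spoke_pairs(bait, preys):
    """Spoke model: the bait is paired with each co-purified prey."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_pairs(bait, preys):
    """Matrix model (shown for contrast): all pairs among bait and preys."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical purification: bait pA pulls down pB, pC and pD.
bait, preys = "pA", ["pB", "pC", "pD"]
spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs(bait, preys)
print(len(spoke), len(matrix))
```

The spoke expansion is the more conservative of the two, since it avoids asserting prey-prey contacts that the experiment did not test directly.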
  56. Protein Interactions (PPIs): Y2H. High-throughput Two-Hybrid systems provide binary interactions: (a) Y2H (yeast two-hybrid) system, in yeast cells; (b) LUMIER system (luciferase), in mammalian cells. From Stelzl & Wanker (2006) Curr Opin Chem Biol.
  57. Protein Interactions (PPIs): Y2H. High-throughput Two-Hybrid systems provide binary interactions. Y2H classical system: coding sequences for a protein X and a protein Y are fused to a DNA binding domain (DBD, i.e. bait plasmid) and a transcription activation domain (AD, i.e. prey plasmid). Upon interaction of protein X and protein Y, the transcriptional activity of the DBD and AD domains is reconstituted, leading to reporter gene activation. LUMIER system: coding sequences for a protein X and a protein Y are fused to a 6xFLAG tag sequence and to Renilla luciferase, and cotransfected into mammalian cells. Upon interaction of protein X and protein Y, the luciferase fusion protein remains bound during the procedure and is detected via light emission.
  58. Protein Interactions (PPIs): Y2H. High-throughput Two-Hybrid systems provide binary interactions. Systems shown: membrane Y2H (Cub-Nub); mammalian PPI trap (JAK-STAT3); classical nuclear Y2H; split-TEV 2H system (scissors). From Suter et al. (2008) Curr Opin Biotechnology.
  59. Protein-Protein Interactions (PPIs): review of some essential concepts on PPIs.
  60. Protein-Protein Interactions (PPIs): experimental methods. In recent years a large amount of data on protein-protein interactions in cellular systems has been obtained by both high-throughput and small-scale technologies. A list of the most relevant methods is presented:
     Complex-oriented methods (find multimeric PPIs)
     - Co-Immunoprecipitation (Co-IP)
     - Pull-Down Assays
     - Tandem Affinity Purification + Mass Spectrometry (TAP-MS)
     Binary-oriented methods (find dimeric PPIs)
     - Two-Hybrid systems (Y2H)
     - Protein Arrays / Protein Chips
     3D-structure-based methods (find specific PPI interfaces)
     - X-ray Crystallography (X-ray)
     - Electron Microscopy (EM)
     - Nuclear Magnetic Resonance (NMR)
  61. Protein Interactions (PIs): protein arrays/chips. Multiple technologies to find protein interactions.
  62. Protein Interactions (PIs): protein arrays/chips. Multiple technologies to find protein interactions. Multiple types of protein arrays (≈ protein chips) are designed to find different types of protein interactions:
      - protein-ligand interactions (ligands ≈ metabolites, drugs, chemicals, ...)
      - protein-antibody interactions (the protein is the antigen)
      - protein-DNA/RNA interactions (many proteins bind nucleic acids)
      - protein-protein interactions (many proteins bind specifically to other proteins in a stable or transient way)
      From Hall, Ptacek & Snyder (2007).
  63. Protein Interactions (PIs): protein arrays. The first data analysis step is signal quantification. Signal quantification is a technical problem that has to be resolved by each platform with maximum precision and accuracy. Figure panels: antibody arrays; proteome arrays; reverse arrays; peptide arrays.
  64. Protein-Protein Interactions (PPIs): protein arrays to detect protein-protein interactions. The list of experimental methods from slide 60 is repeated on this slide; protein arrays / protein chips belong to the binary-oriented methods.
  65. Protein-Protein Interactions (PPIs): data sources: databases. Counts of proteins (n prot.) and interactions (n interact.) are as of Dec. 2009. From De Las Rivas & Fontanillo (2010) PLoS Comp. Bio.

      Name | DB full name | PPI sources | Type of MI | Species | n prot. | n interact.

      Primary databases: PPI experimental data (curated from specific SSc & LSc published studies)
      BIND | Biomolecular Interaction Network Database | SSc & LSc published studies (literature-curated) | PPIs & others | all | [31972] | [58266]
      BioGRID | General Repository for Interaction Datasets | SSc & LSc published studies (literature-curated) | PPIs & others | all | [28717] | [108691]
      DIP | Database of Interacting Proteins | SSc & LSc published studies (literature-curated) | only PPIs | all | 20728 | 57683
      HPRD | Human Protein Reference Database | SSc & LSc published studies (literature-curated) | only PPIs | human | 27081 | 38806
      IntAct | Database of protein InterAction data | SSc & LSc published studies (literature-curated) | PPIs & others | all | [60504] | [202826]
      MINT | Molecular INTeractions database | SSc & LSc published studies (literature-curated) | only PPIs | all | 30089 | 83744
      MIPS-MPact | MIPS protein interaction resource on yeast | derived from CYGD | only PPIs | yeast | 1500 | 4300
      MIPS-MPPI | MIPS mammalian protein-protein interaction db | SSc published studies (literature-curated) | only PPIs | mammalian | 982 | 937

      Meta databases: PPI experimental data (integrated and unified from different public repositories)
      APID | Agile Protein Interaction DataAnalyzer | BIND, BioGRID, DIP, HPRD, IntAct, MINT | only PPIs | all | 56460 | 322579
      MPIDB | The microbial protein interaction database | BIND, DIP, IntAct, MINT, other sets (exp & litcur) | only PPIs | microbial | 7810 | 24295
      PINA | Protein Interaction Network Analysis platform | BioGRID, DIP, HPRD, IntAct, MINT, MPact | only PPIs | all | [?] | 188823

      Prediction databases: PPI experimental & predicted data ("functional interactions", i.e. interactions lato sensu derived from different types of data)
      MiMI | Michigan Molecular Interactions | BIND, BioGRID, DIP, HPRD, IntAct & nonPPI dt | PPIs & others | all | [45452] | [391386]
      PIPs | Human protein-protein interactions prediction db | BIND, DIP, HPRD, OPHID & nonPPI dt | PPIs & others | human | [?] | [37606]
      OPHID | Online Predicted Human Interaction Database | BIND, BioGRID, HPRD, IntAct, MINT, MPact & nonPPI dt | PPIs & others | human | [?] | [424066]
      STRING | Known and Predicted Protein-Protein Interactions | BIND, BioGRID, DIP, HPRD, IntAct, MINT & nonPPI dt | PPIs & others | all | [2590259] | [88633860]
      UniHI | Unified Human Interactome | BIND, BioGRID, DIP, HPRD, IntAct, MINT & nonPPI dt | PPIs & others | human | [22307] | [200473]
  66. Protein-Protein Interactions (PPIs): experimental vs computational. For a proper study of protein-protein interactions it is very important to distinguish and separate the data that come from experimental methods (which provide PPIs validated in the lab by some technique) from the data that come from computational methods (which provide PPIs that are inferred but not actually proven). Many databases and repositories of PPIs include both experimentally and computationally determined interactions, and this mix may produce confusion or false expectations in analyses done on the combined data. From Aloy and Russell (2006) Nature Reviews.
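The separation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular database: the `Interaction` record, the two evidence labels and the toy protein names are all hypothetical, and real repositories use their own evidence vocabularies.

```python
from dataclasses import dataclass

# Hypothetical evidence labels; real databases use their own vocabularies.
EXPERIMENTAL = "experimental"
PREDICTED = "predicted"


@dataclass(frozen=True)
class Interaction:
    protein_a: str
    protein_b: str
    evidence: str  # EXPERIMENTAL or PREDICTED


def split_by_evidence(interactions):
    """Return (experimental, predicted) lists, keeping the input order."""
    experimental = [i for i in interactions if i.evidence == EXPERIMENTAL]
    predicted = [i for i in interactions if i.evidence == PREDICTED]
    return experimental, predicted


# Toy records, invented for illustration only.
toy = [
    Interaction("P1", "P2", EXPERIMENTAL),
    Interaction("P2", "P3", PREDICTED),
    Interaction("P1", "P4", EXPERIMENTAL),
]
experimental, predicted = split_by_evidence(toy)
print(len(experimental), "experimental;", len(predicted), "predicted")
```

Keeping the two sets apart from the start makes it possible to run an analysis on the experimental set alone and to treat the predicted set as a separate, lower-confidence layer.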
  67. Protein-Protein Interactions (PPIs): types of databases. There are several types of PPI databases: primary databases, meta-databases and prediction databases. The table of databases from slide 65 is repeated on this slide. From De Las Rivas & Fontanillo (2010).
  68. From protein interactions to protein networks: integration & unification of protein interaction data. We have developed a database that integrates and unifies PPIs: APID & APID2NET. Experimental data are unified in APID.
  69. Protein-Protein Interactions (PPIs): integration & unification of PPI data. APID (Agile Protein Interaction DataAnalyzer) http://bioinfow.dep.usal.es/apid
      At present 6 source PPI databases are unified:
      • BIND (Biomolecular Interaction Network DB)
      • BioGRID (Biological General Repository for Interaction Datasets)
      • DIP (Database of Interacting Proteins)
      • HPRD (Human Protein Reference Database)
      • IntAct (Database system & analysis tools for PI data)
      • MINT (Molecular Interactions Database)
      Data integration & unification by: sequence, UniProt_ID, PubMed_ID.
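One way to picture identifier-based unification is the sketch below. It is an assumption-laden illustration, not APID's actual pipeline: it assumes every record already carries UniProt-style accessions and a PubMed identifier, it ignores the sequence-based step named on the slide, and the function name `unify` and all accessions and identifiers are placeholders.

```python
from collections import defaultdict


def unify(records):
    """Merge records that describe the same unordered protein pair.

    Each record is (accession_a, accession_b, source_db, pubmed_id).
    """
    merged = defaultdict(lambda: {"sources": set(), "pubmed_ids": set()})
    for a, b, source, pmid in records:
        key = tuple(sorted((a, b)))  # A-B and B-A are the same interaction
        merged[key]["sources"].add(source)
        merged[key]["pubmed_ids"].add(pmid)
    return dict(merged)


# Placeholder records, invented for illustration only.
toy_records = [
    ("ACC_A", "ACC_B", "DIP", "PMID_1"),
    ("ACC_B", "ACC_A", "MINT", "PMID_1"),
    ("ACC_A", "ACC_C", "IntAct", "PMID_2"),
]
unified = unify(toy_records)
for pair, info in sorted(unified.items()):
    print(pair, sorted(info["sources"]), sorted(info["pubmed_ids"]))
```

Sorting the two accessions gives one canonical key per pair, so the same interaction reported by two source databases collapses into a single entry that remembers both sources and the supporting publications.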
  70. Protein-Protein Interactions (PPIs): integration & unification of PPI data. APID (Agile Protein Interaction DataAnalyzer) http://bioinfow.dep.usal.es/apid We are developing a new APID database that will integrate PDB and sDDI (3D) data.
  71. Protein-Protein Interactions (PPIs): integration & unification of PPI data: hsPPIs in APID. There are several primary PPI databases, but at present there is little integration among them. Human data in APID in 2010: 11998 proteins and 80032 interactions. Growth in human interactions: 38832 in 2007 → 80032 in 2010.
  72. Networks: two major types of networks derived from experimental data. Two major types of networks are derived from large-scale omic data: 1. Gene Coexpression Networks, derived from gene expression profiling and transcriptomic studies; 2. Protein-Protein Interaction Networks, derived from proteomic studies.
  73. Omics era: unraveling biological complexity: mapping biological networks. How can we characterize biomolecular networks and measure parameters that allow us to understand the role of different nodes & edges in a given network? (graph & network theory) From Zhu et al. (2007) Genes Dev.
  74. Protein-Protein Interactions (PPIs): representation and visualization of networks. From Merico, Gfeller & Bader (2010) Nature Biotechnology.
  75. Omics era: unraveling biological complexity: mapping biological networks. The interactome is a complex biomolecular network. Biomolecular networks are scale-free. A scale-free network has more high-degree nodes than a random network and a power-law degree distribution, which gives a straight line when the number of nodes with a particular degree is plotted against that degree on log-log scales. From Zhu et al. (2007) Genes Dev.
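The log-log relationship can be checked numerically. The sketch below counts how many nodes have each degree and fits a straight line to the log-log points by ordinary least squares; for a power law N(k) ∝ k^(-γ) the slope is -γ. The function names and the toy data are assumptions made for illustration, and a least-squares fit on log-log counts is only a rough estimate of the exponent.

```python
import math
from collections import Counter


def degree_counts(edges):
    """Map each degree k to the number of nodes that have degree k."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def loglog_slope(counts):
    """Least-squares slope of log10(node count) versus log10(degree)."""
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(n) for n in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy star network: one hub linked to four leaves.
star_edges = [("hub", "n1"), ("hub", "n2"), ("hub", "n3"), ("hub", "n4")]
star_counts = degree_counts(star_edges)

# Synthetic counts built to follow N(k) = 1000 * k**-2 exactly.
synthetic_counts = {1: 1000, 2: 250, 5: 40, 10: 10}
slope = loglog_slope(synthetic_counts)
print(dict(star_counts), round(slope, 6))
```

On real interaction data the points scatter around the line, especially at high degree where counts are small, so the fitted slope should be read as indicative only.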
  76. Omics era: unraveling biological complexity: mapping biological networks. Biological networks derived from PPIs are not randomly organized but rather have a scale-free format, containing a small number of nodes (hubs) with many connections (Barabasi and Oltvai 2004). This organization was originally discovered in World Wide Web (WWW) interactions and later found to exist in biological networks (Barabasi and Albert 1999; Jeong et al. 2000, 2001; Guelzim et al. 2002; Tong et al. 2004). Compared with the bell-shaped degree distribution of random networks, scale-free networks have a typical power-law distribution: a fat-tailed distribution in which there are vertices with high degree, termed hubs. The advantage of this type of organization is that the system is more robust: random loss of individual non-hub vertices is less disruptive in scale-free networks than in random networks. Network topology plays a vital role in understanding network architecture and performance. It is important to know the most important and commonly used topological parameters that can be calculated in a network. From Zhu et al. (2007) Genes Dev.
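The contrast between losing a hub and losing a non-hub vertex can be illustrated on a deliberately extreme toy network: one hub connected to five leaves. The sketch measures the size of the largest connected component after a node is deleted. The network, the node names and the helper functions are assumptions for illustration; this is not a simulation of a real interactome.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def largest_component_size(adjacency, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            node = queue.popleft()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


# Toy hub-and-spoke network, invented for illustration only.
toy_edges = [("H", leaf) for leaf in ("a", "b", "c", "d", "e")]
network = build_adjacency(toy_edges)

intact = largest_component_size(network)
after_leaf_loss = largest_component_size(network, removed={"a"})
after_hub_loss = largest_component_size(network, removed={"H"})
print(intact, after_leaf_loss, after_hub_loss)
```

Deleting a leaf leaves the rest of the toy network connected, while deleting the hub breaks it into isolated nodes. Because most vertices in a scale-free network are not hubs, a randomly chosen vertex is far more likely to be a leaf-like node than a hub.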
  77. Protein-Protein Interactions (PPIs): topological parameters. The interactome is a biomolecular network. Network topology plays a vital role in understanding network architecture and performance. Several of the most important and commonly used topological parameters include:
      - degree: number of links connected to one vertex
      - distance: shortest path length
      - diameter: maximum distance between any two nodes
      - clustering coefficient: number of links between the vertices within a vertex's neighborhood divided by the number of possible links between them
      - betweenness: fraction of the shortest paths between all pairs of vertices that pass through one vertex or link
      From Zhu et al. (2007) Genes Dev.
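The parameters listed above can be computed directly on a small graph. The sketch below uses only the Python standard library and a four-node toy network (edges A-B, A-C, B-C, C-D); all function names are assumptions for illustration. Betweenness is computed for vertices only and is reported as the average, over all pairs of other vertices, of the fraction of shortest paths that pass through the vertex. The clustering coefficient of a vertex with fewer than two neighbors is taken as 0 by convention here.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_paths(adjacency, source):
    """BFS from `source`: distances and number of shortest paths to each node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma


def clustering(adjacency, v):
    """Links among the neighbors of v divided by the possible links among them."""
    neighbors = list(adjacency[v])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


def betweenness(adjacency, v, paths):
    """Average fraction of shortest s-t paths that pass through v."""
    others = [n for n in adjacency if n != v]
    dist_v, sigma_v = paths[v]
    total, pairs = 0.0, 0
    for s, t in combinations(others, 2):
        pairs += 1
        dist_s, sigma_s = paths[s]
        reachable = t in dist_s and v in dist_s and t in dist_v
        if reachable and dist_s[v] + dist_v[t] == dist_s[t]:
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total / pairs if pairs else 0.0


toy = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
paths = {node: shortest_paths(toy, node) for node in toy}

degree = {node: len(neighbors) for node, neighbors in toy.items()}
distance_a_d = paths["A"][0]["D"]
diameter = max(max(dist.values()) for dist, _ in paths.values())
clustering_c = clustering(toy, "C")
betweenness_c = betweenness(toy, "C", paths)
print(degree, distance_a_d, diameter, clustering_c, betweenness_c)
```

In this toy graph vertex C has the highest degree and the only non-zero betweenness, because every shortest path to D runs through it. The pairwise approach is simple to read but slow on large networks, where dedicated graph libraries are the practical choice.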
  78. Protein-Protein Interactions (PPIs). From Zhu et al. (2007) Genes Dev.
  79. Protein-Protein Interactions (PPIs): build networks from experimental data: Cytoscape. Challenge: improve data integration and analytic methods to understand networks. http://www.cytoscape.org/
  80. Protein-Protein Interactions (PPIs): build networks from experimental data: Cytoscape. http://www.cytoscape.org/ Comparison of network analysis platforms. From Cline et al. (2007) Nature Protocols.
  81. Protein-Protein Interactions (PPIs): build networks from experimental data: Cytoscape.
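A simple way to get a list of interactions into Cytoscape is the Simple Interaction Format (SIF), in which each line holds a source node, an interaction type and a target node. The sketch below builds SIF text from protein pairs. The helper name `to_sif` and the protein names are placeholders, and `pp` is used as the interaction type for protein-protein edges.

```python
def to_sif(interactions, interaction_type="pp"):
    """Build SIF text: one 'source<TAB>type<TAB>target' line per interaction."""
    lines = [f"{a}\t{interaction_type}\t{b}" for a, b in interactions]
    return "\n".join(lines) + "\n"


# Placeholder protein pairs, invented for illustration only.
toy_pairs = [("P1", "P2"), ("P2", "P3")]
sif_text = to_sif(toy_pairs)
print(sif_text, end="")
```

Saving `sif_text` to a file with the `.sif` extension gives a network file that can be loaded through Cytoscape's network import.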
  82. Protein-Protein Interactions (PPIs): build networks from experimental data: APID2NET. Using Cytoscape and the plugin APID2NET you can build a PPI network by direct query and retrieval from APID.
  83. Protein-Protein Interactions (PPIs): build networks from experimental data: APID2NET. Download activities for APID2NET: 6772 (August 2011).
  84. Protein-Protein Interactions (PPIs): two publications.
  85. Protein-Protein Interactions (PPIs): PSICQUIC, new tool and service. Aranda et al. (2011) Nature Methods.
  86. Protein-Protein Interactions (PPIs): PSICQUIC, new tool and service. Aranda et al. (2011) Nature Methods.
  87. Protein-Protein Interactions (PPIs): PSICQUIC & PSISCORE. Aranda et al. (2011) Nature Methods.
  88. Protein-Protein Interactions (PPIs): References (Javier De Las Rivas)
      • Aranda et al. (2011) PSICQUIC and PSISCORE: accessing and scoring molecular interactions. Nature Methods 8: 528-529.
      • De Las Rivas J, Fontanillo C. (2010) Protein-Protein Interactions essentials: key concepts to building and analyzing Interactome Networks. PLoS Computational Biology 6(6): e1000807.
      • Prieto C, De Las Rivas J. (2010) Structural domain-domain interactions: assessment and comparison with protein-protein interaction data to improve the interactome. Proteins 78: 109-117.
      • Hernandez-Toro J, Prieto C, De Las Rivas J. (2007) APID2NET: unified interactome graphic analyzer. Bioinformatics 23: 2495-2497.
      • Prieto C, De Las Rivas J. (2006) APID: Agile Protein Interaction DataAnalyzer. Nucleic Acids Research 34: W298-302.
      Web references: http://bioinfow.dep.usal.es/apid http://ubioinfo.cicancer.org/en/index-en.html
  90. THANKS. Bioinformatics and Functional Genomics Research Group, Cancer Research Center (CiC, CSIC/USAL), Salamanca, Spain. http://bioinfow.dep.usal.es University of Salamanca: founded in 1130, universal chartered in 1216.
