Computational tools for going from molecules to interactions ...and back
Benno Schwikowski
Systems Biology Lab
Institut Pasteur, Paris
Phenotype
Adapted from E. Zerhouni’s talk
Kohn, 1999
20,000 molecules, 200,000,000 interactions
Nature News, 18 July 2012
Reactome, 18 July 2013
From molecules to networks
Network inference in Cytoscape 3
From networks to molecules
How networks can help to identify proteins
Cytoscape
Open-source platform for biological network data integration, analysis, and visualization
– Free & Open-source (LGPL)
– Developed and maintained by universities, companies, and research institutions
– Expandable by Apps/Plugins
Cytoscape Workflow
[Workflow diagram. Labels: Data import; Data export; Filtering; Selection; Computational analysis; Cytoscape Apps; Visualization; VizMapper; Layouts; Human analysis; Show the results]
Core Concepts - Integration
• Networks & Data Tables (Attributes)
[Figure: Annotated Network]
Core Concepts - Visual mapping (VizMapper)
• Use specific line types to indicate different types of interactions
• Browse extremely dense networks by controlling the opacity of nodes
• Expression data mapping
• Set node sizes based on the degree of connectivity of the nodes
• Encode specific physical entities as different node shapes
[Figure label: Data Table]
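The degree-to-size mapping mentioned above can be sketched outside Cytoscape as follows. This is a minimal illustration, not VizMapper code: the function names, the size range (20 to 80), and the toy edge list are assumptions made for the example.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count how many edge endpoints touch each node."""
    degree = defaultdict(int)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


def map_degree_to_size(degree, min_size=20.0, max_size=80.0):
    """Linearly rescale degrees into the range [min_size, max_size]."""
    low, high = min(degree.values()), max(degree.values())
    if low == high:
        # All nodes have the same degree: give them all a middle size.
        return {node: (min_size + max_size) / 2 for node in degree}
    scale = (max_size - min_size) / (high - low)
    return {node: min_size + (d - low) * scale for node, d in degree.items()}


# Toy network (hypothetical node names).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degrees = node_degrees(edges)
sizes = map_degree_to_size(degrees)
```

In this toy network the hub node A receives the largest size and the leaf node D the smallest, which is the visual effect the slide describes.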
Core Concepts - Analysis
Apps/Plugins: Expanding Cytoscape Functionality
Berlin, July 18, 2013
Import Networks
• Network Data Formats
– SIF
– GML
– XGMML
– GraphML
– BioPAX
– PSI-MI
– SBML
– KGML (KEGG)
– Excel
– Delimited Text Table (CSV, Tab)
• Network Databases
– Protein - Protein: STRING, IntAct
– Genetic: BioGRID
– Protein - Compound: ChEMBL
– Human-Curated Pathways: KEGG, Reactome, PathwayCommons
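As an illustration of the simplest format in the list above, the following sketch reads SIF text, where each line names a source node, an interaction type, and one or more target nodes. It assumes that tabs are the delimiter when a line contains any and whitespace otherwise, and that a line with a single name declares an isolated node. The function name and the sample interactions are made up for the example; this is not Cytoscape's own importer.

```python
def parse_sif(text):
    """Return (nodes, edges) from SIF text; edges are (source, type, target)."""
    nodes, edges = set(), []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: tabs delimit fields if present, otherwise any whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, interaction, target))
    return nodes, edges


# Toy input: two interaction lines and one isolated node.
sample = "BRCA1 pp BARD1 RAD51\nTP53 pp MDM2\nACAP1\n"
nodes, edges = parse_sif(sample)
```

One line can list several targets, so the first sample line yields two edges.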
Import Data Table (Attributes)
• Data Table: Any data that describes or provides details about nodes, edges, and networks
• Anything saved as a table can be loaded into Cytoscape
– Excel
– Tab Delimited Document
– CSV
• As long as a proper mapping key is available, Cytoscape can map the table data to your networks
Public Data Sources (example: BRCA1)
– GO Terms: DNA Repair, Cell Cycle, DNA Binding
– NCBI Gene ID: 672
– Chromosome: 17
– Ensembl ID: ENSG00000012048
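The role of the mapping key can be sketched as a simple join between node names and table rows. This is a conceptual illustration only, not Cytoscape's table import: the column names, the function names, and the unmatched node GENE_X are hypothetical, while the BRCA1 identifiers are the ones shown on the slide.

```python
import csv
import io


def load_table(csv_text, key_column):
    """Index the rows of a CSV table by the value in key_column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_column]: row for row in reader}


def annotate_nodes(node_names, table):
    """Attach table rows to nodes whose name matches a key; report the rest."""
    annotated = {name: table[name] for name in node_names if name in table}
    unmatched = [name for name in node_names if name not in table]
    return annotated, unmatched


# One-row table keyed by gene symbol (column names are assumptions).
table_text = (
    "symbol,ncbi_gene_id,ensembl_id\n"
    "BRCA1,672,ENSG00000012048\n"
)
table = load_table(table_text, key_column="symbol")
annotated, unmatched = annotate_nodes(["BRCA1", "GENE_X"], table)
```

Nodes without a matching key stay unannotated, which is why the choice of key column matters when importing a table.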
What’s new in 3.0
Cytoscape 3 – Reasons for the rewrite
• 2.x done without explicit design guidelines or standards
• No well-defined API
• Hard to maintain and improve (plugins breaking)
• Plugins could not share functionality
Cytoscape 3.0 – A complete rewrite
• New modular architecture based on OSGi
• Compatibility with 3.0 guarantees compatibility with 3.x
• Clear and simplified API (implementation separate)
• RootNetwork/SubNetwork design
• Attributes are replaced by Tables (‘first-class citizens’)
– CyRow and CyColumn interfaces
• Apps can talk to each other now, much less likely to break
• All plugins need to be converted to Apps
Status of apps/plugins
• 140+ plugins for version 2.x series
• 16 apps for 3.x series
3.0 Apps: jActiveModules, MCODE, Agilent Literature Search, VennDiagramGenerator, ClusterONE, Centiscape, GeneMANIA
Integrated in 3.0 Core: EnhancedSearch, BiomartClient, NetworkAnalyzer
Plugins being ported: ClusterMaker, Genoscape, MiMiplugin, ...
What’s new in 3.0
• http://apps.cytoscape.org
[Architecture diagram: Cytoscape 3.x with the Cyni Toolbox, made up of a GUI and the Cyni API (Cyni Interfaces, Cyni Data Structure, Utility Methods). Cyni Apps: Data Imputation, Network Inference, Data Discretization, Metrics. User 1: Biologists. User 2: Method Developer (New Network Inference Method).]
Cyni network inference
Load Data → Estimate Data → Discretize Data → Infer Network
Cyni Network inference toolbox
• Cyni provides
– A few built-in algorithms
– Data imputation and discretization techniques
– Several known metrics (correlation, Bayesian, ...)
– Documented API
– Tutorials and sample code
• First 3.0 app that exports functionality
• Additional implementations underway (ARACNe)
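To make the idea of imputation followed by metric-based inference concrete, here is a minimal sketch in plain Python. It does not use the Cyni API: row-mean imputation, Pearson correlation, the 0.9 threshold, and the toy expression table are all assumptions chosen for illustration. The discretization step is skipped because a correlation metric works on continuous values.

```python
from itertools import combinations
from math import sqrt


def impute_row_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def infer_network(table, threshold=0.9):
    """Connect two rows when |correlation| reaches the threshold."""
    complete = {name: impute_row_mean(vals) for name, vals in table.items()}
    edges = []
    for a, b in combinations(sorted(complete), 2):
        r = pearson(complete[a], complete[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Toy expression table (made-up values); None marks a missing measurement.
table = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, None, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, -1.0, 1.0, -1.0],
}
edges = infer_network(table)
```

Swapping `pearson` for another metric changes the inferred network without touching the rest of the pipeline, which mirrors the modular split between imputation, discretization, and metrics that the slide lists.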
From molecules to networks
Network inference in Cytoscape 3
From networks to molecules
How networks can help to identify proteins
Motivation
• Study of 24 smooth muscle cell (SMC) protein extracts over many years
• Proteomic analysis of many samples revealed systematic differences between two groups
• Close analysis revealed that the causative factor is the use of bovine DNAse I in the protein extraction protocol
[Figure: DIGE gels of affected and unaffected SMC protein extracts; panels labeled "DIGE without DNAse I treatment" and "DIGE with DNAse I treatment". Axis ticks: 3 to 10, and 130, 95, 55, 43, 34, 26, 17, 11.]
Acosta-Martin, Gwinner, Pinet, Schwikowski, unpublished
First bioinformatic analysis
• 11 unaffected and 13 affected SMC protein extracts (as identified by absence of 3 large spots)
• 569 out of 853 spots differentially expressed, 408 with FC>2, 135 significant (62 down, 73 up)
• Identification of 41 proteins from 102 spots
• GO analysis: >50% in apoptosis, cell motion, actin cytoskeleton reorganization
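The fold-change (FC) filter can be sketched as below. The slide does not state how FC or significance was computed, so this assumes a signed ratio of group means and leaves out the statistical test entirely. The spot names and intensities are made-up toy values.

```python
def signed_fold_change(mean_a, mean_b):
    """Ratio of the larger to the smaller mean; negative when group A is lower."""
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("intensities must be positive")
    if mean_a >= mean_b:
        return mean_a / mean_b
    return -(mean_b / mean_a)


def classify_spots(spot_means, min_fc=2.0):
    """Split spots into up- and down-regulated lists using an FC cutoff."""
    up, down = [], []
    for spot, (mean_a, mean_b) in spot_means.items():
        fc = signed_fold_change(mean_a, mean_b)
        if fc > min_fc:
            up.append(spot)
        elif fc < -min_fc:
            down.append(spot)
    return up, down


# Toy data: (mean intensity in affected, mean intensity in unaffected).
spot_means = {
    "spot1": (400.0, 100.0),
    "spot2": (50.0, 200.0),
    "spot3": (120.0, 100.0),
}
up, down = classify_spots(spot_means)
```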
The Steiner tree approach
• “Explanation” = connected network
• Parsimony principle: Use the minimum number of additional proteins
Steiner PPI analysis
• Started with the 41 original proteins plus DNAse I, minus ACAP1 (unconnected)
• Use BIND and IntAct databases:
– 51,975 interactions among 21,022 proteins
• Weight edges with inverse functional similarity score (between 0 and 10)
• Use Steiner heuristic implemented in the GOBLIN tool (Univ. Augsburg)
Schlicker (2007), Nucleic Acids Research
Mehlhorn (1988), Information Processing Letters
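The idea of connecting the input proteins with as little added weight as possible can be sketched with a common shortest-path approximation: compute shortest paths between the input proteins (terminals), grow a minimum spanning tree over those distances, and expand each tree edge back into its path. This is neither the GOBLIN implementation nor necessarily the algorithm of the cited paper. The final pruning step of a full Steiner heuristic is omitted, all terminals are assumed to be connected, and the graph and weights are toy values where a lower weight stands for higher functional similarity.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances and predecessor links from source."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def steiner_subnetwork(graph, terminals):
    """Connect all terminals via shortest paths chosen in Prim-like order."""
    terminals = list(terminals)
    paths = {t: dijkstra(graph, t) for t in terminals}
    in_tree, edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        # Closest terminal not yet connected (raises if none is reachable).
        _, a, b = min(
            (paths[a][0][b], a, b)
            for a in in_tree
            for b in terminals
            if b not in in_tree and b in paths[a][0]
        )
        prev, node = paths[a][1], b
        while node != a:  # walk the shortest path from b back to a
            edges.add(frozenset((node, prev[node])))
            node = prev[node]
        in_tree.add(b)
    nodes = set().union(*edges) if edges else set(terminals)
    return edges, nodes - set(terminals)


def add_edge(graph, u, v, weight):
    """Insert an undirected weighted edge."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


# Toy PPI graph: X is not an input protein but links all three cheaply.
graph = {}
for u, v, w in [("P1", "X", 1), ("P2", "X", 1), ("P3", "X", 1),
                ("P1", "P2", 3), ("P2", "P3", 3)]:
    add_edge(graph, u, v, w)
edges, steiner_nodes = steiner_subnetwork(graph, ["P1", "P2", "P3"])
```

In the toy graph the only added (Steiner) node is X, because routing through it is cheaper than using the direct edges between input proteins.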
Sanity check: Is the resulting network better than chance?
[Figure panels: Network length; Number of Steiner nodes]
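One way to phrase "better than chance" is an empirical p-value: compare the observed network length with lengths obtained for randomly drawn input sets of the same size. The null model used in the talk is not stated, so the random-set sampling here is an assumption. The `score` function is a stand-in (a simple sum over toy numbers); in practice it would compute the Steiner network length for a sampled protein set.

```python
import random


def empirical_p_value(observed, null_values):
    """Share of null values at least as small as observed (+1 correction)."""
    as_good = sum(1 for v in null_values if v <= observed)
    return (as_good + 1) / (len(null_values) + 1)


def null_distribution(score, candidates, set_size, n_samples, seed=0):
    """Score n_samples random candidate sets of the given size."""
    rng = random.Random(seed)
    return [score(rng.sample(candidates, set_size)) for _ in range(n_samples)]


# Toy stand-in: the "network length" of a set is the sum of its members.
candidates = list(range(100))
null_values = null_distribution(sum, candidates, set_size=5, n_samples=200)
p_value = empirical_p_value(5, null_values)
```

Because smaller network length is better, the p-value counts random sets that do at least as well as the observed one. The same function applies to the number of Steiner nodes.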
Resulting Steiner network
Gwinner et al. (2013), Proteomics
Resulting list of Steiner nodes
• Focus on Steiner nodes with meaningful connections to input proteins: sort by score sum over all interactions to input proteins
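The ranking described above can be sketched as follows. It assumes that each interaction carries a numeric score where higher means a more meaningful connection, and that only interactions between a Steiner node and an input protein count. The node names and scores are made up.

```python
def rank_steiner_nodes(scored_edges, input_proteins, steiner_nodes):
    """Sort Steiner nodes by the summed score of their edges to input proteins."""
    totals = {node: 0.0 for node in steiner_nodes}
    for a, b, score in scored_edges:
        if a in totals and b in input_proteins:
            totals[a] += score
        if b in totals and a in input_proteins:
            totals[b] += score
    # Highest total first (assumption: higher score = more meaningful).
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Toy data: S* are Steiner nodes, P* are input proteins.
scored_edges = [
    ("S1", "P1", 8.0),
    ("S1", "P2", 6.0),
    ("S2", "P1", 9.0),
    ("S2", "S1", 10.0),  # Steiner-to-Steiner edge, ignored by the ranking
    ("P3", "S3", 1.0),
]
ranking = rank_steiner_nodes(scored_edges, {"P1", "P2", "P3"}, {"S1", "S2", "S3"})
```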
Experimental validation
[Figure: panels for 55 kDa and 43 kDa; axes in Arbitrary Units/1000 and Arbitrary Units/10]
Gwinner et al., Proteomics (2013)
From molecules to networks
Network inference in Cytoscape 3
From networks to molecules
How networks can help to identify proteins
Galagan et al., Nature 499 (11 July 2013)
[Diagram: cycle between Biology and Computation with large-scale measurement: Manipulate, Measure, Mine, Model]
Ideker/Lauffenburger 2006
Questions beyond ‘the best network’
• Which parts of a given network are consistent with the data?
• Which parts of the network are we sure of, given the data?
• Which interactions could be added (removed) to make the data compatible with the model?
• Which experiment could be done to better distinguish different possible models?
Systems Biology Lab
Postdocs, Ph.D. students, Senior Software Engineer, Master students:
Xiaoyi Chen, Oriol Guitart, Freddy Cliquet, Frederik Gwinner, Robin Friedman, Iryna Nikolayeva, Leif Blaese
Collaborators
Steiner approach
Adelina Acosta-Martin, Florence Pinet (Inst. Pasteur Lille)
Cytoscape/Cyni
Part of
Gary Bader & Co. (U. Toronto)
Alexander Pico & Co. (Gladstone SFO)
Trey Ideker & Co. (UC San Diego)
Chris Sander & Co. (MSKCC NYC)
Piet Molenaar
Agilent
Leroy Hood & Co. (ISB Seattle)
Cytoscape Retreat 2013
Pasteur Institute, Paris
Oct 9: Symposium on Network Biology
Oct 10: Cytoscape User and Developer Tutorials
http://nrnb.org/cyretreat/

NetBioSIG2013-KEYNOTE Benno Schwikowski
