Introduction to Biological Network Analysis and Visualization with Cytoscape Part1

  1. Introduction to Biological Network Analysis and Visualization with Cytoscape Keiichiro Ono Cytoscape Core Developer Team UC, San Diego Trey Ideker Lab / National Resource for Network Biology 5/10/2016 The Scripps Research Institute Lecture 1: Basics
  2. Keiichiro Ono Cytoscape Core Developer since 2005 @UCSD Trey Ideker Lab Area of Interest: Biological Data Integration & Visualization
  3. Agenda • Lecture 1 (Today): Introduction to Biological Network Analysis and Visualization • What are the benefits of biological network analysis and visualization? • Introduction to Cytoscape • Preview of Lecture 2: cyREST • Lecture 2: Reproducible Analysis & Visualization • Introduction to Jupyter Notebook • Create reproducible network visualization workflows with Python
  4. All documents, data, and code are available here: https://github.com/idekerlab/tsri-lecture
  5. Why Network Analysis?
  6. Networks?
  7. EP300 PPARG SMARCD3 STMN1 SMARCA4 OPTN ATP6V1C1 PSMD1 HTT PRNP HNRNPUL1 CCDC88A CLU HSP90AB1 SMARCD3 MAP4K4 MIF4GD USP11 MARCH6 TUBB EDF1 CHD8 Protein-Protein Interactions
  8. Human-Curated Pathways KEGG Pathway (TCA Cycle) visualized by Cytoscape KGMLReader
  9. Interactomes Human Interactome data from BioGRID visualized with Cytoscape
  10. Social Networks
  11. Network extracted from Panama Papers data set
  12. Node 1 - Edge Type - Node 2
  13. Protein 1 - Y2H - Protein 2
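The "Node 1 - Edge Type - Node 2" pattern can be written down directly as data. Below is a minimal sketch, assuming placeholder protein names (not real interactions), that stores each interaction as a triple and renders it in the tab-separated SIF style that Cytoscape can import.

```python
# Each interaction is a (source, interaction type, target) triple.
# The protein names are placeholders for illustration, not real data.
interactions = [
    ("Protein1", "Y2H", "Protein2"),
    ("Protein2", "Y2H", "Protein3"),
]


def to_sif(triples):
    """Render triples as SIF-style text: one tab-separated interaction per line."""
    return "\n".join("\t".join(triple) for triple in triples)


sif_text = to_sif(interactions)
print(sif_text)
```

Saving `sif_text` to a file with a `.sif` extension should give something Cytoscape can load as a network, with the middle column becoming the edge's interaction type.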
  14. Networks vs Pathways
  15. Networks Pathways
  16. Networks Pathways Collection of binary interactions Human-curated / detailed Large Scale Small Scale Generated from omics-data Constructed from literature
  17. Networks / Pathways = Graphs
  18. Benefits of Network Analysis
  19. Benefits of Network Analysis - You have a list of N genes from your screening - Now you want to know: - Relationships among those genes - Functions - etc. Screening 1 PPARG TCF7L2 RETN IRS1 HNF1A HNF4A KCNJ11 GCK LIPC PTPN1 ABCC8 ENPP1 HNF1B
  20. [Network figure; node labels (gene symbols and Ensembl, EBI, and ChEBI identifiers) omitted]
  21. Benefits of Network Analysis - You can see the relationships among the group of biological entities - Find drug targets - Overrepresented functions AND their connections
  22. Gene List to Network to Biological Insight - How? - You need to search, integrate, and visualize multiple data sources This is what you will learn in this lecture
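One small piece of the "gene list to network" step can be sketched in a few lines: given interactions from some source and a list of hits, keep only the interactions among the hits. The gene names and edges below are placeholders invented for illustration; a real workflow would load interactions from a public database.

```python
def induced_subnetwork(edges, genes):
    """Keep only the edges whose two endpoints are both in the gene list."""
    wanted = set(genes)
    return [(a, b) for a, b in edges if a in wanted and b in wanted]


# Placeholder interaction data (hypothetical, NOT curated interactions).
all_edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_B", "GENE_E"),
]
screen_hits = ["GENE_A", "GENE_C", "GENE_D"]

subnetwork = induced_subnetwork(all_edges, screen_hits)
print(subnetwork)
```

This only covers filtering; searching and integrating several data sources is the harder part that the lecture goes on to address.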
  23. What is Cytoscape?
  24. An Open Source Platform for Biological Network Data Integration, Analysis and Visualization Cytoscape
  25. Cytoscape 3.4.0 (Latest Release)
  26. Cytoscape - Open Source (LGPL) - Free for both commercial and academic use - Developed and maintained by universities, companies, and research institutions - UC, San Diego - University of Toronto - UC, San Francisco - ISB - And collaborators world-wide
  27. Cytoscape - De-facto standard software in biological network research community - Large User and Developer Community - Expandable by Apps - This is why Cytoscape is a Platform, not a simple desktop application
  28. C. elegans Interactome from BioGRID Database ?
  29. Biological Networks - Tell us nothing by themselves - Just a big hairball…
  30. Module 1 Module 2
  31. In other words…
  32. Module 1 Need a tool to extract meaningful biological modules
  33. Basic Use Case
  34. Networks Public Interaction Databases
  35. List of Genes
  36. Other Data
  37. Network Data Analysis Analysis Graph Analysis NetworkX igraph Cytoscape Python Pandas NumPy SciPy Excel Visualization Desktop Gephi Cytoscape matplotlib Web Cytoscape.js sigma.js d3 NVD3 d3.chart Google Charts Data Storage Graph Neo4j GraphX Document MongoDB Relational MySQL IPython 3rd Party Apps NetworkAnalyzer
  38. Network Data Analysis Analysis Visualization Desktop Gephi Cytoscape matplotlib Web Cytoscape.js sigma.js d3 NVD3 d3.chart Google Charts Data Storage
  39. Network Data Analysis Analysis Graph Analysis NetworkX igraph Cytoscape Python Pandas NumPy SciPy Excel Visualization IPython 3rd Party Apps NetworkAnalyzer
  40. Three Basic Steps for Data Visualization with Cytoscape
  41. Data Integration
 <?xml version="1.0" encoding="UTF-8"?>
 <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
   <!-- Created by igraph -->
   <key id="degree" for="node" attr.name="degree" attr.type="double"/>
   <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
   <graph id="G" edgedefault="directed">
     <node id="n0"> <data key="degree">79</data> <data key="betweenness">0</data> </node>
     <node id="n1"> <data key="degree">9</data> <data key="betweenness">167</data> </node>
     <node id="n2"> <data key="degree">18</data> <data key="betweenness">75</data> </node>
     <node id="n3"> <data key="degree">8</data> <data key="betweenness">12</data> </node>
     <node id="n4"> <data key="degree">26</data> <data key="betweenness">210</data> </node>
     <node id="n5"> <data key="degree">29</data> <data key="betweenness">320</data> </node>
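To make the GraphML excerpt on this slide concrete, here is a minimal sketch that reads node attributes with Python's standard library. The embedded document reuses the first two nodes from the slide; the closing `</graph>` and `</graphml>` tags are added here (an assumption, since the slide excerpt is cut off) so the text parses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's GraphML; closing tags added so it is well-formed.
GRAPHML = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="degree" for="node" attr.name="degree" attr.type="double"/>
  <key id="betweenness" for="node" attr.name="betweenness" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="degree">79</data><data key="betweenness">0</data></node>
    <node id="n1"><data key="degree">9</data><data key="betweenness">167</data></node>
  </graph>
</graphml>"""

# GraphML elements live in this XML namespace, so lookups must be qualified.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def node_table(graphml_text):
    """Return {node id: {attribute key: numeric value}} for every node."""
    root = ET.fromstring(graphml_text.encode("utf-8"))
    table = {}
    for node in root.iterfind(".//g:node", NS):
        table[node.get("id")] = {
            data.get("key"): float(data.text)
            for data in node.iterfind("g:data", NS)
        }
    return table


table = node_table(GRAPHML)
for node_id, attributes in table.items():
    print(node_id, attributes)
```

The sketch converts every value to `float` because both keys are declared as `double`; a more general reader would look up each key's `attr.type` first.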
  42. Analysis
  43. Visualization
  44. Drawing Biological Networks VS
  45. Drawing Tools You need to specify color of each node, width of each edge, shape of nodes, etc.
  46. There is one huge difference between Cytoscape and Illustrator…
  47. In Cytoscape, Your Data Controls View
  48. Creating Visualizations in Cytoscape Name Type BRCA1 gene MAP2K1 gene C05981 compound • Mapping from Type to Node Shape • Mapping from Type to Node Color C05981 BRCA1 MAP2K1 Creating mappings from data points to Visual Properties
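The idea of mapping a data column to visual properties can be sketched as plain dictionary lookups. The node table is the one from the slide; the specific shapes and colors are illustrative choices, not values taken from the lecture or from Cytoscape's defaults.

```python
# Node table from the slide: Name and Type columns.
nodes = [
    {"name": "BRCA1", "type": "gene"},
    {"name": "MAP2K1", "type": "gene"},
    {"name": "C05981", "type": "compound"},
]

# Discrete mappings: one visual value per category (illustrative values).
SHAPE_BY_TYPE = {"gene": "ellipse", "compound": "rectangle"}
COLOR_BY_TYPE = {"gene": "#1f78b4", "compound": "#33a02c"}

# Defaults apply when a category has no mapping entry.
DEFAULT_SHAPE = "ellipse"
DEFAULT_COLOR = "#cccccc"


def style_node(node):
    """Derive a node's visual properties from its data, not by hand."""
    node_type = node["type"]
    return {
        "name": node["name"],
        "shape": SHAPE_BY_TYPE.get(node_type, DEFAULT_SHAPE),
        "color": COLOR_BY_TYPE.get(node_type, DEFAULT_COLOR),
    }


styled = [style_node(node) for node in nodes]
for entry in styled:
    print(entry)
```

Changing one entry in `SHAPE_BY_TYPE` restyles every node of that type at once, which is the practical difference from editing each object in a drawing tool.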
  49. Network Data Annotated Networks Attributes Analyzed Data
  50. Apps
  51. Cytoscape Apps - Extension programs to add new features to Cytoscape - formerly called Plugins - Large App developer/ user community - This is why Cytoscape is so successful in life science community!
  52. Example Apps
  53. ClueGO Creates and visualizes a functionally grouped network of terms/pathways
  54. ReactomeFIPlugIn Explore Reactome pathways and search for diseases related pathways and network patterns using the Reactome functional interaction network
  55. KEGGScape KEGG Pathway Importer for Cytoscape
  56. clusterMaker2 Multi-algorithm clustering app for Cytoscape
  57. cyREST (Now Part of The Core!) RESTful API for Cytoscape
  58. cyREST RESTful API for Cytoscape
  59. APPS.CYTOSCAPE.ORG
  60. Overview of App Ecosystem A travel guide to Cytoscape plugins Rintaro Saito, Michael E Smoot, Keiichiro Ono, Johannes Ruscheinski, Peng-Liang Wang, Samad Lotia, Alexander R Pico, Gary D Bader, Trey Ideker (2012) Nature Methods 9 (11) p. 1069-1076
  61. Tips for Learning New Tools
  62. Choose the Right Tool
  63. Choose the Right Tool Analysis Visualization Data Preparation
  64. Data Visualization Tools http://selection.datavisualization.ch/
  65. Data Visualization Tools http://selection.datavisualization.ch/
  66. Data Visualization Tools http://selection.datavisualization.ch/
  67. Tools • In some cases, you can get exactly the same result using different tools • Example: Data preparation (cleansing) • But if you choose the right tool, you can do it 100x faster than with the others • ex: Re-formatting complex data sets • Excel vs Python Script • Some recommendations: • R/Bioconductor, Python/Pandas, Git/GitHub/Gist
  68. Learning Tools = Saving Your Time
  69. Hands-on: Introduction to Network Visualization with Cytoscape
  70. Sample Data Files: http://cl.ly/VTJs Same data is available in lecture1/dataset
  71. Data Visualization
  72. Data Visualization - Goal: Help others to understand your data - Emphasize what you want to tell - Use color, shape, size of objects effectively! - Excellent resource for data visualization - Tamara Munzner’s Web Site: http://www.cs.ubc.ca/~tmm/
  73. Today’s Goal
  74. Story: I want to show gene expression changes over time as a network diagram
  75. What is Great Visualization…? [Network figure; node labels (yeast ORF names such as YPL201C, YPL211W) omitted]
  76. Design is complicated, because humans are complicated. Design is a process to avoid bad designs. Mike Bostock (New York Times Visualization Team. Creator of D3.js)
  77. It is hard to generalize the design process, but we can avoid pitfalls by following some basic rules.
  78. Every pixel should carry information. Edward Tufte
  79. Avoid Data Overload • Mapping too many attributes makes your visualization awful! • It is hard to see the overall trend of your data sets if too many channels are used in an image
  80. “Great Artists Steal…”
  81. [Network figure; node labels (gene names such as GAL1, GAL4, GAL80) omitted]
  82. [Same network figure] Map gene expression values to color. Avoid using more colors in other components (edge/label). If necessary, map other data into non-overlapping visual properties (edge score to width).
  83. Part 1: Session File and Basic Navigation
  84. Cytoscape 3.4 Desktop Toolbar Network Panel Bird’s Eye View Table Browser Network Views
  85. Local Column Table Tabs List Data (Values in [ ]) Shared Column
  86. Session File - Snapshot of your workspace - Networks - Tables - Visual Styles - System Properties
  87. Open a Session - Click folder icon - Or, File → Open
  88. Exercise 1: Loading a session
  89. Navigation - Pan: Drag - Zoom - IN: Mouse Wheel UP - OUT: Mouse Wheel DOWN - Selection: Shift + Drag - Fit to Window - Selected region - Entire network
  90. First Neighbor of Nodes Ctrl+6
  91. Create New Sub-Network From Selection Ctrl+N
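What a first-neighbor selection computes can be sketched on a tiny graph. This is an illustration of the concept under the assumption that edges are treated as undirected; it is not Cytoscape's implementation, and the node names are placeholders.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def first_neighbors(adjacency, selected):
    """Return nodes directly connected to the selection, excluding the selection."""
    selected = set(selected)
    neighbors = set()
    for node in selected:
        neighbors |= adjacency.get(node, set())
    return neighbors - selected


# Placeholder path graph: A - B - C - D
adjacency = build_adjacency([("A", "B"), ("B", "C"), ("C", "D")])
print(sorted(first_neighbors(adjacency, ["B"])))
```

Taking the union of the selection and its first neighbors, then keeping only the edges among them, gives the kind of node set one would turn into a sub-network.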
  92. - Grid View
  93. - Detached View
  94. Part 2: Data Import
  95. Network Data Formats - SIF - GML - XGMML - GraphML - BioPAX - PSI-MI - SBML - KGML (KEGG) - Excel - Text Table - CSV - Tab
  96. NCBI Gene ID 672 On Chromosome 17 GO Terms DNA Repair Cell Cycle DNA Binding Ensembl ID ENSG00000012048 BRCA1
  97. Data Tables for Cytoscape - Example: - Numeric - Gene expression profiles - Network statistics calculated in other applications, such as R - Confidence scores for edges - Text (or categorical) - GO annotation for genes - List of genes related to disease X - Targets for FDA approved drugs - Genes on KEGG Pathway Y - Clusters / group / community calculated in external programs - …
  98. Your Data Sets - Anything saved as a table can be loaded into Cytoscape - Excel - Tab Delimited Document - CSV - As long as proper mapping key is available, Cytoscape can map them to your networks.
  99. Mapping Key in the Network Mapping Key in the Table
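The role of the mapping key can be sketched as a join between a table and the nodes of a network. The column names, gene identifiers, and values below are hypothetical; the point is only that rows attach to nodes when the key in the table equals the key in the network.

```python
import csv
import io

# Hypothetical tab-delimited table; "gene_id" plays the role of the mapping key.
TABLE_TEXT = (
    "gene_id\tlog2_fold_change\n"
    "GENE_A\t1.5\n"
    "GENE_C\t-0.8\n"
    "GENE_X\t2.0\n"
)

# Hypothetical network: node name -> attribute dictionary.
network_nodes = {"GENE_A": {}, "GENE_B": {}, "GENE_C": {}}


def attach_table(nodes, table_text, key_column):
    """Copy each table row onto the node with the same key; report misses."""
    unmatched = []
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        key = row.pop(key_column)
        if key in nodes:
            nodes[key].update(row)
        else:
            unmatched.append(key)
    return unmatched


unmatched = attach_table(network_nodes, TABLE_TEXT, "gene_id")
print(network_nodes)
print("rows without a matching node:", unmatched)
```

Rows whose key matches no node are reported rather than silently dropped, since mismatched identifier types (for example gene symbols versus Ensembl IDs) are a common reason an import appears to do nothing.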
  100. Exercise 2: Loading network and tables
  101. Part 3: Visualization
  102. Layouts
  103. Force-Directed + Edge Bundling
  104. Stacked-Node Layout + Default Edge Bend
  105. Circular + Edge Bend
  106. Automatic Layout - Choose proper algorithm - Tree-like data - Hierarchical Layout - Scale-Free Network - Force-directed - Circular process - Circular Layout - Tweak parameters if necessary
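As a rough idea of what an automatic layout does, the simplest case (a circular layout) just spaces nodes evenly on a circle. The sketch below is a generic illustration with an assumed radius parameter, not the algorithm Cytoscape uses.

```python
import math


def circular_layout(node_ids, radius=100.0):
    """Place nodes at equal angular steps on a circle centered at the origin."""
    count = len(node_ids)
    positions = {}
    for index, node_id in enumerate(node_ids):
        angle = 2 * math.pi * index / count
        positions[node_id] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = circular_layout(["a", "b", "c", "d"])
for node_id, (x, y) in positions.items():
    print(f"{node_id}: ({x:.1f}, {y:.1f})")
```

Force-directed and hierarchical layouts follow the same contract (node list in, coordinates out) but compute positions from the edges as well, which is why their parameters often need tweaking.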
  107. Manual Layout - Tweak result from automatic layout - Scale - Align - Rotate
  108. Exercise 3: Apply layouts
  109. Visual Style - Collection of mappings from Attributes to Visual Properties
  110. Visual Styles - Defaults + Mappings - Expression values to node color - Gene function to node shape - Interaction detection method to edge line type - Confidence score to edge width
  111. Core Idea: Data Controls The View
  112. Data Controls The View • Photoshop / Illustrator • You control the pixels and objects on the display • Data Visualization Tools (including Cytoscape) • Data points are mapped to visual properties • Color • Size
  113. Data Controls The View
  114. Expression Values To Node Colors
  115. Discrete Mapping Editor Continuous Mapping Editor
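A continuous mapping from expression values to node color can be sketched as linear interpolation between anchor colors. The value range (-2 to 2) and the blue-white-red palette are assumptions chosen for illustration, not settings from the lecture.

```python
def lerp_color(start_rgb, end_rgb, fraction):
    """Linearly interpolate between two RGB tuples (fraction in [0, 1])."""
    return tuple(
        round(start + (end - start) * fraction)
        for start, end in zip(start_rgb, end_rgb)
    )


def expression_to_color(value, vmin=-2.0, vmax=2.0,
                        low=(0, 0, 255), middle=(255, 255, 255), high=(255, 0, 0)):
    """Map a value to a hex color: low -> middle -> high, clamped to [vmin, vmax]."""
    value = max(vmin, min(vmax, value))
    center = (vmin + vmax) / 2
    if value <= center:
        rgb = lerp_color(low, middle, (value - vmin) / (center - vmin))
    else:
        rgb = lerp_color(middle, high, (value - center) / (vmax - center))
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for sample in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(sample, expression_to_color(sample))
```

A discrete mapping, by contrast, is a plain lookup from category to value; the continuous form is what makes a numeric column such as expression change visible as a gradient.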
  116. Exercise 4: Create New Visual Style
  117. Preview of Lecture 2: Reproducible Workflow with Jupyter Notebook and cyREST
  118. Cline, Melissa S., et al. "Integration of biological networks and gene expression data using Cytoscape." Nature protocols 2.10 (2007): 2366-2382.
  119. Cline, Melissa S., et al. "Integration of biological networks and gene expression data using Cytoscape." Nature protocols 2.10 (2007): 2366-2382.
  120. Results
  121. Sharing Results 😐
  122. Sharing Results and Process 😃
  123. Point & Click Operation is Easy, but not Reproducible…
  124. Goal: Reproducible Science
  125. Goal: Reproducible Science REST
  126. Tools You Need
  127. REST - GitHub - For source code sharing - IPython (Jupyter) Notebook - Your electronic lab notebook - cyREST - RESTful API module for Cytoscape
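As a preview of how a script might talk to Cytoscape through cyREST, the sketch below builds a small network in Cytoscape.js-style JSON and defines a function that would POST it. The base URL (port 1234, `/v1`), the `/networks` path, and the payload shape are stated here as assumptions about a default local cyREST setup and should be checked against the cyREST documentation; the node names are placeholders. The request function is defined but not called, because it needs a running Cytoscape instance.

```python
import json
import urllib.request

# Assumption: Cytoscape runs locally with cyREST listening on its default port.
BASE_URL = "http://localhost:1234/v1"


def to_cyjs(name, edges):
    """Build a Cytoscape.js-style JSON document from (source, target) pairs."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": node_id}} for node_id in node_ids],
            "edges": [
                {"data": {"source": source, "target": target}}
                for source, target in edges
            ],
        },
    }


def post_network(network, base_url=BASE_URL):
    """POST the network to cyREST (assumed endpoint); requires Cytoscape running."""
    request = urllib.request.Request(
        base_url + "/networks",
        data=json.dumps(network).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


network = to_cyjs("demo network", [("a", "b"), ("b", "c")])
print(json.dumps(network, indent=2))
```

With Cytoscape open, calling `post_network(network)` from a notebook cell would be the scripted counterpart of importing a file by hand, and the notebook itself then records every step.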
  128. Getting Help - Two Google Groups - cytoscape-discuss@googlegroups.com - cytoscape-helpdesk@googlegroups.com - ANY question is OK!
  129. Further Readings
  130. Further Readings • My presentation slides • http://www.slideshare.net/keiono • This deck will be uploaded today
  131. Further Readings 1 - Introduction to Network Biology - Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases
 Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part I. Experimental Techniques and Databases. PLoS Comput Biol 3(3): e42. doi:10.1371/journal.pcbi.0030042 - Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners
 Shoemaker BA, Panchenko AR (2007) Deciphering Protein–Protein Interactions. Part II. Computational Methods to Predict Protein and Domain Interaction Partners. PLoS Comput Biol 3(4): e43. doi:10.1371/journal.pcbi.0030043
  132. Further Readings 2 - Overview of Cytoscape Apps (Plugins) - A travel guide to Cytoscape plugins
 
 Rintaro Saito, Michael E Smoot, Keiichiro Ono, Johannes Ruscheinski, Peng-Liang Wang, Samad Lotia, Alexander R Pico, Gary D Bader, Trey Ideker (2012) Nature Methods 9 (11) p. 1069-1076 - Sample Protocol (based on 2.x) − Integration of biological networks and gene expression data using Cytoscape
 
 Cline, et al. Nature Protocols, 2, 2366-2382 (2007).
  133. Further Readings 3 - Cytoscape Tutorial Booklet:
 
 Analysis and Visualization of Biological Networks with Cytoscape - http://www.rbvi.ucsf.edu/Outreach/Workshops/ISMBTutorial.pdf
  134. 2016 Keiichiro Ono kono@ucsd.edu