Reproducible Workflow with Cytoscape and Jupyter Notebook

Lecture slides for the SDCSB advanced Cytoscape tutorial session on 5/19/2016. They contain an introduction to Jupyter Notebook and cyREST.


Reproducible Workflow with Cytoscape and Jupyter Notebook

  1. Reproducible Workflows with Jupyter Notebook and Cytoscape. Keiichiro Ono, Cytoscape Core Developer Team, UC San Diego, Trey Ideker Lab / National Resource for Network Biology. 5/19/2016, Advanced Cytoscape Workshop
  2. Course Materials: Clone/Fork/Download this repository!
      - https://github.com/idekerlab/tsri-lecture
      - Setup Guide: https://github.com/idekerlab/tsri-lecture/blob/master/documents/Setup%20Guide.pdf
      - Cytoscape 3.4.0: http://www.cytoscape.org/download.php
  3. Keiichiro Ono: Cytoscape Core Developer since 2005, @UCSD Trey Ideker Lab. Area of Interest: Biological Data Integration & Visualization
  4. Agenda
      • Reproducible Analysis & Visualization
      • Introduction to Jupyter Notebook
      • Create reproducible network visualization workflows with Python
  5. Review: Cytoscape Core Features
  6. Review
      - Network analysis / visualization is a powerful method for getting biological insights from your screening results
      - Cytoscape is the de facto standard tool for this type of analysis
  7. Review
      - Core features of Cytoscape
      - Navigation (Pan/Zoom/Select)
      - Network / Table Data Import
      - Automatic Layout
      - Visual Style
  8. Drawing Biological Networks VS
  9. Drawing Tools: you need to specify the color of each node, the width of each edge, the shape of nodes, etc.
  10. There is one huge difference between Cytoscape and Illustrator…
  11. In Cytoscape, Your Data Controls View
  12. Creating Visualizations in Cytoscape: creating mappings from data points to Visual Properties
      Name    Type
      BRCA1   gene
      MAP2K1  gene
      C05981  compound
      • Mapping from Type to Node Shape
      • Mapping from Type to Node Color
      [Figure: the nodes C05981, BRCA1, and MAP2K1 drawn with the mapped shapes and colors]
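The idea on slide 12, that the data table rather than manual drawing decides each node's appearance, can be sketched in plain Python. This is only an illustration of a discrete mapping, not Cytoscape's own API; the shape names, colors, and the `apply_mappings` helper are assumptions made up for this sketch.

```python
# Node table copied from the slide's example rows.
nodes = [
    {"name": "BRCA1", "type": "gene"},
    {"name": "MAP2K1", "type": "gene"},
    {"name": "C05981", "type": "compound"},
]

# Discrete mappings: data value -> visual property value.
# The concrete shapes and colors are illustrative assumptions.
SHAPE_BY_TYPE = {"gene": "ellipse", "compound": "diamond"}
COLOR_BY_TYPE = {"gene": "#1f78b4", "compound": "#e31a1c"}


def apply_mappings(node, default_shape="rectangle", default_color="#999999"):
    """Return the visual properties implied by a node's data (hypothetical helper)."""
    node_type = node["type"]
    return {
        "shape": SHAPE_BY_TYPE.get(node_type, default_shape),
        "color": COLOR_BY_TYPE.get(node_type, default_color),
    }


# The view is derived from the data: change the table, and the drawing follows.
styled = {node["name"]: apply_mappings(node) for node in nodes}
for name, props in styled.items():
    print(name, props)
```

Because the style is a function of the table, adding a new row of type "gene" needs no extra drawing work, which is the contrast with a drawing tool made on slides 9 to 11.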
  13. Reproducibility
  14. Recap: Cytoscape Session File, for sharing results. But what about process?
  15. Reproducibility: …it’s a known issue
      - http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/
      - https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938
      - http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
  16. Problems
      - Reproducibility of biological research, especially for in vivo / in vitro experiments, is a hard problem
      - But this is true even for in silico analysis!
          - OS version
          - Revision of scripts
          - Data analysis software versions
          - Version of data files
          - Command line parameters written on a paper napkin
          - “Black magic” only a grad student knows
      - This is something we need to fix, using the latest technologies and best practices
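Several of the items on slide 16 (OS version, software versions, command-line parameters) can be captured automatically at the top of a notebook instead of on a napkin. The following is a minimal sketch using only the Python standard library; the `environment_snapshot` helper and the default package list are assumptions for illustration.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=("numpy", "pandas", "scipy")):
    """Collect OS, Python, and package versions for the current run (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "argv": sys.argv,  # command-line parameters of this run
        "packages": versions,
    }


snapshot = environment_snapshot()
print(json.dumps(snapshot, indent=2))
```

Saving this output next to the results gives a later reader a record of the environment in which the analysis was run.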
  17. Typical Workflow
  18. Data Preparation → Analysis → Visualization
  19. Data Preparation
  20. Data Preparation
      - Cleansing
      - Normalization
      - Missing values
      - Corrupted values
      - Reformat
      - Conversion
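As a rough illustration of the data preparation steps listed on slide 20, the sketch below drops rows with missing or corrupted values and then normalizes the remaining scores. It uses only the standard library so it runs anywhere; the input table, its column names, and both helper functions are made up for this example, and in practice Pandas would be the more usual choice.

```python
import csv
import io

# Made-up input: one missing value and one corrupted value.
RAW = """gene,score
BRCA1,2.0
MAP2K1,
KRAS,n/a
TP53,4.0
EGFR,3.0
"""


def load_clean(text):
    """Parse CSV text, dropping rows whose score is missing or not numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            score = float(record["score"])
        except (TypeError, ValueError):
            continue  # cleansing: skip missing / corrupted values
        rows.append((record["gene"].strip(), score))
    return rows


def min_max_normalize(rows):
    """Rescale scores linearly into the range [0, 1]."""
    scores = [score for _, score in rows]
    low, high = min(scores), max(scores)
    span = (high - low) or 1.0  # avoid division by zero when all scores are equal
    return {gene: (score - low) / span for gene, score in rows}


clean = load_clean(RAW)
normalized = min_max_normalize(clean)
print(normalized)
```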
  21. Data Preparation → Analysis → Visualization
  22. Analysis
  23. Analysis
      - Filtering
      - Standard graph statistics
      - Density
      - Betweenness
      - Centrality
      - Clustering
      - Community Detection
      - GO enrichment analysis
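Two of the simpler statistics from slide 23 can be written out directly, which helps make clear what they measure. This is a standard-library sketch on a made-up toy graph; a real analysis would more likely use NetworkX or igraph, and the function names here are assumptions.

```python
from collections import defaultdict

# Made-up undirected toy network.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adjacency = defaultdict(set)
    for source, target in edge_list:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def density(adjacency):
    """Fraction of possible undirected edges that are present: 2m / (n(n-1))."""
    n = len(adjacency)
    m = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adjacency.items()}


adjacency = build_adjacency(edges)
print("density:", density(adjacency))
print("degree centrality:", degree_centrality(adjacency))
```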
  24. Data Preparation → Analysis → Visualization
  25. Visualization
  26. Visualization
      - Mapping
          - Data points to visual variables
      - Layout
          - For graphs:
              - Force-directed
              - Tree
  27. Data Preparation → Analysis → Visualization
  28. Data Preparation → Analysis → Visualization
  29. Data Preparation → Analysis → Visualization
  30. Cytoscape for Interactive Visualization, Python for Data Manipulation / Analysis
  31. Lab Notebook for in silico Experiments
  32. Interactive Command-Line + Markdown-based Documents
  33. IPython Notebook? Jupyter?
  34. IPython Notebook = Notebook UI + Python Kernel. Jupyter = Notebook UI + Language Kernel (R/Julia/etc.)
  35. Language-Agnostic
      - From the next version (4.x), IPython Notebook will be an implementation of Jupyter
      - You can switch to other language kernels
      - In this lecture we will use Python, but you can use the language of your choice to control Cytoscape
  36. Question
      • Cytoscape is a desktop application
      • Point & click GUI operation
      • Easy to use, but how can we make our workflow reproducible?
  37. REST
  38. What is cyREST?
      - Platform-independent, RESTful API module for Cytoscape
      - This means you can access basic Cytoscape data objects programmatically
      - Now it’s a Cytoscape Core feature!
  39. [Diagram: cyREST connecting Cytoscape to other tools. Labels: Your Workstation; Cytoscape Desktop (Core, Apps, REST); Cytoscape App Store; Interactive Data Analysis Environments: IPython Notebook, RStudio; libraries: NumPy, SciPy, Pandas, NetworkX, igraph, rCurl; Command Line Tools: sed, awk, grep, curl; Web Browsers; Data Bus (Internet); In-House Databases; External Computing Resources: Graph Layout, Statistical Analysis, Data Pre-processing; File / Code Hosting Services; Public Data Repository; PSICQUIC Services; EBI RDF Platform; Other Bioinformatics Web Applications / Services; Data Repository & Collaboration Service]
  40. REST API?
  41. curl http://mygene.info/v2/query?q=kras
      {
        "hits": [
          { "taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845", "name": "Kirsten rat sarcoma viral oncogene homolog" },
          { "taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653", "name": "Kirsten rat sarcoma viral oncogene homolog" },
          { "taxid": 10116, "entrezgene": 24525, "symbol": "Kras", "_id": "24525", "name": "Kirsten rat sarcoma viral oncogene" },
          { "taxid": 10090, "entrezgene": 110836, "symbol": "Kras2-rs2", "_id": "110836", "name": "Kirsten rat sarcoma oncogene 2, related sequence 2" },
          { "taxid": 10090, "entrezgene": 110832, "symbol": "Kras2-rs1", "_id": "110832", "name": "Kirsten rat sarcoma oncogene 2, related sequence 1" },
          { "taxid": 10090, "entrezgene": 111117, "symbol": "Kras1-ps", "_id": "111117", "name": "Kirsten rat sarcoma oncogene 1, pseudogene" }
        ],
        "max_score": 391.5175,
        "took": 4,
        "total": 6
      }
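The same kind of REST call can be made from a notebook cell instead of `curl`. The sketch below uses only the standard library and the URL from slide 41; whether that v2 endpoint still responds today is not something this example assumes, so the final check runs offline on two hits copied from the slide. The helper names (`build_query_url`, `human_hits`, `query`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mygene.info/v2/query"  # endpoint shown on the slide


def build_query_url(term):
    """Build the query URL for a search term."""
    return BASE_URL + "?" + urlencode({"q": term})


def query(term, timeout=10):
    """Send the GET request and decode the JSON body (needs network access)."""
    with urlopen(build_query_url(term), timeout=timeout) as response:
        return json.load(response)


def human_hits(payload, taxid=9606):
    """Keep only the symbols of hits for one taxonomy ID (9606 = human)."""
    return [hit["symbol"] for hit in payload["hits"] if hit.get("taxid") == taxid]


# Offline check on two hits copied from the response shown on the slide.
sample = {
    "hits": [
        {"taxid": 9606, "entrezgene": 3845, "symbol": "KRAS", "_id": "3845"},
        {"taxid": 10090, "entrezgene": 16653, "symbol": "Kras", "_id": "16653"},
    ]
}
print(build_query_url("kras"))
print(human_hits(sample))
```

With network access, `human_hits(query("kras"))` would run the same filter on a live response.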
  42. How cyREST Works: clients send POST, PUT, DELETE, and GET requests to the REST module in Cytoscape 3.1+
  43. Mapping Cytoscape API to HTTP Methods
      Cytoscape Operation   HTTP Method
      Create                POST
      Read                  GET
      Update                PUT
      Delete                DELETE
  44. Get full network with unique ID 52 as JSON: GET http://localhost:1234/v1/networks/52
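The mapping on slides 43 and 44 can be expressed as a small lookup table that turns a Cytoscape operation into an HTTP request. This sketch only builds the request object; it does not contact Cytoscape, so it runs without a cyREST server. The base URL and the network ID 52 come from the slide, while `make_request` and the operation names are assumptions for illustration.

```python
import json
from urllib.request import Request

BASE = "http://localhost:1234/v1"  # cyREST base URL as shown on the slide

# Cytoscape operation -> HTTP method, as listed on the slide.
CRUD_TO_HTTP = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def make_request(operation, path, body=None):
    """Build (but do not send) an HTTP request for an operation (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE + path, data=data, headers=headers,
                   method=CRUD_TO_HTTP[operation])


# "Read" the network with unique ID 52, as on slide 44.
read_network = make_request("read", "/networks/52")
print(read_network.get_method(), read_network.full_url)
```

With Cytoscape and cyREST running locally, passing `read_network` to `urllib.request.urlopen` would send the request; the response format is whatever the installed cyREST version returns.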
  45. http://localhost:1234/v1/networks/52
  46. Language-Specific Shims: for Python, for R
  47. REST
  48. Putting it together
      - Lab notebook to record your workflow
      - Make Cytoscape controllable via scripts (REST)
      - Manage multiple versions of your notebooks and other scripts
  49. Hands-On: Using Cytoscape from Jupyter Notebook
  50. Where should we go from here?
  51. Putting it together
      - Lab notebook to record your workflow
      - Make Cytoscape controllable via scripts (REST)
      - Manage multiple versions of your notebooks and other scripts
      - Missing: Environment to execute your workflow
  52. Python 3.5.0, Ubuntu 15.04, Pandas, numpy, scipy, jupyter…
  53. Docker as Portable Data Analysis Environment
  54. [Diagram, stack from bottom to top: Bare Metal Machine, OS, Virtual Machine, Frameworks, Your App]
  55. [Diagram, stack from bottom to top: Bare Metal Machine, OS (Linux), Docker, then five side-by-side containers, each with Frameworks and Application]
  56. What is Docker?
      - Container to run applications in an isolated environment
      - Application = Layer of images
      - Sharable Environments
      - Environments as code
  57. Docker Hub
      - Sharing environments as code!
      - Dockerfile: definition of your container
      - “GitHub of Images”
  58. Jupyter Official Images
  59. Resources
      - https://www.dataquest.io/blog/docker-data-science/
      - https://try.jupyter.org/
  60. Getting Help
      - Two Google Groups
          - cytoscape-discuss@googlegroups.com
          - cytoscape-helpdesk@googlegroups.com
      - ANY question is OK!
  61. Further Readings
  62. Further Readings
      • My presentation slides
          • http://www.slideshare.net/keiono
      • cyREST web sites
          • http://apps.cytoscape.org/apps/cyrest
          • https://github.com/idekerlab/cyREST/wiki
      • py2cytoscape: https://github.com/idekerlab/py2cytoscape
  63. 2016 Keiichiro Ono, kono@ucsd.edu
