Overview of Modern Graph Analysis Tools
Keiichiro Ono

Cytoscape Core Developer Team

UC San Diego, Trey Ideker Lab / National Resource for Network Biology

5/24/2016 Ideker Lab Meeting
Recap
Cytoscape Session File — for sharing results
But what about process?
http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/
https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
Reproducibility
…it’s a known issue
Data Preparation → Analysis → Visualization
Advanced Users:
Cytoscape for Interactive Visualization
R/Python for Data Manipulation / Analysis
Lab Notebook for in silico Experiments
Interactive Command-Line + Markdown-based Documents
Question
• Cytoscape is a desktop application
• Point & click GUI operation
• Easy to use, but how can we make our workflow reproducible?
What is cyREST?
- Platform-independent, RESTful API module for Cytoscape
- Lets you access basic Cytoscape data objects programmatically
- Now it’s a Cytoscape Core feature!
REST
Get the full network with unique ID (SUID) 52 as JSON
GET http://localhost:1234/v1/networks/52
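A minimal Python sketch of the same call using only the standard library, assuming Cytoscape is running locally with cyREST on its default port 1234 (as in the URL above). The helper names `network_url` and `fetch_network` are hypothetical; only the URL builder runs without a live Cytoscape instance.

```python
import json
import urllib.request

# Assumption: cyREST is listening on localhost:1234 (its default port).
BASE_URL = "http://localhost:1234/v1"


def network_url(suid: int, base_url: str = BASE_URL) -> str:
    """Build the cyREST URL for one network, addressed by its SUID."""
    return f"{base_url}/networks/{suid}"


def fetch_network(suid: int, base_url: str = BASE_URL) -> dict:
    """GET the full network and decode the JSON reply into a dict.

    Needs a running Cytoscape; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(network_url(suid, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the URL; no request is sent here.
    print(network_url(52))
```

Even this small call needs URL handling, HTTP and JSON decoding, which is the boilerplate the language-specific wrappers hide.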
But don’t use cyREST directly!
Language-Specific Shims
For R and for Python:
RCy3
• R wrapper for cyREST
• Now a part of Bioconductor
• Easy to install
• Natural API for R users
py2cytoscape
• Python wrapper for cyREST
• Provides a high-level API
• Cytoscape.js viewer included
• Supports iOS/Android
Example
Creating an empty network with the raw cyREST API
…and with py2cytoscape
http://nbviewer.jupyter.org/gist/keiono/73da21846b6f73de70122bdb545c1c14
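For comparison, a standard-library sketch of what the raw call might look like: a POST to `/v1/networks` with an empty Cytoscape.js-style body. The payload shape (empty `nodes` and `edges` lists under `elements`) is an assumption, and `empty_network_request`/`create_empty_network` are hypothetical names; the linked notebook shows the calls actually used.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # assumed local cyREST endpoint


def empty_network_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking cyREST to create a network.

    Assumption: the body uses the Cytoscape.js JSON layout, with empty
    node and edge lists for an empty network.
    """
    body = {"data": {"name": name}, "elements": {"nodes": [], "edges": []}}
    return urllib.request.Request(
        f"{base_url}/networks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_empty_network(name: str, base_url: str = BASE_URL) -> dict:
    """Send the request (needs a running Cytoscape) and decode the JSON reply."""
    with urllib.request.urlopen(empty_network_request(name, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload inspectable without Cytoscape running; py2cytoscape collapses both steps into a single higher-level call.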
https://github.com/cytoscape/cyREST/wiki/Running-your-workflow-in-the-clouds
Now you have…
• Programmatic access to Cytoscape functions
• Notebooks to run your workflows
• Remote machines (clusters/clouds) for CPU-intensive tasks
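One way these pieces might fit together, sketched under assumptions: run the analysis in Python (here just a degree count, standing in for heavier work on a notebook or a remote machine), then package the result as Cytoscape.js-style JSON that could be POSTed to cyREST. The function `to_cyjs`, the `degree` attribute and the `e0`, `e1`, … edge ids are hypothetical choices.

```python
import json
from collections import Counter


def to_cyjs(name, edges):
    """Convert an edge list into Cytoscape.js-style JSON with a computed attribute.

    Each node gets a 'degree' value computed in Python; any other analysis
    result could be attached to the node 'data' dicts the same way.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    nodes = [{"data": {"id": str(n), "degree": d}} for n, d in sorted(degree.items())]
    edge_elems = [
        {"data": {"id": f"e{i}", "source": str(s), "target": str(t)}}
        for i, (s, t) in enumerate(edges)
    ]
    return {"data": {"name": name}, "elements": {"nodes": nodes, "edges": edge_elems}}


if __name__ == "__main__":
    print(json.dumps(to_cyjs("toy", [("a", "b"), ("b", "c")]), indent=2))
```

Keeping computation and visualization apart this way means the analysis step can move to a bigger machine while Cytoscape only receives the finished attributes.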
Graph Libraries as Analytic Engines for Cytoscape
In-Memory Graph Analysis
N < millions
NetworkX
Pros:
- Easy to install
- Covers most basic graph operations
Cons:
- Slow!
igraph
Pros:
- Has a lot of analysis features: standard graph statistics, community detection, label propagation, etc.
- Fast (compared to NetworkX)
Cons:
- Weird API (for Python users)
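Label propagation, one of the igraph features listed above, is simple enough to sketch from scratch. This is an illustration of the technique, not igraph's implementation (python-igraph provides its own as `Graph.community_label_propagation()`). Assumptions: the graph is an undirected adjacency dict, nodes are visited in sorted order, and ties go to the smallest label, whereas the classic method breaks ties at random. `label_propagation` and `communities` are hypothetical names.

```python
from collections import Counter


def label_propagation(adjacency, max_iter=100):
    """Asynchronous label propagation on an undirected graph.

    adjacency: dict mapping each node to an iterable of its neighbours.
    Returns a dict mapping node -> community label.
    """
    labels = {node: node for node in adjacency}  # every node starts in its own community
    order = sorted(adjacency)  # fixed visiting order keeps this sketch deterministic
    for _ in range(max_iter):
        changed = False
        for node in order:
            counts = Counter(labels[nbr] for nbr in adjacency[node])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            # Adopt the most common neighbour label; ties go to the smallest label.
            new_label = min(lbl for lbl, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no node switched labels this sweep
            break
    return labels


def communities(labels):
    """Group nodes by label, ordered by each group's smallest node."""
    groups = {}
    for node, lbl in labels.items():
        groups.setdefault(lbl, set()).add(node)
    return sorted(groups.values(), key=min)
```

On two disconnected triangles plus an isolated node, this yields three communities: {0, 1, 2}, {3, 4, 5} and {6}.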
graph-tool
Pros:
- Fast (optimized with C++)
- Nice visualization features
Cons:
- Hard to install
Parallel Graph Analytics (PGX)
- Oracle’s experimental project
- Many unknowns remain at this early experimental stage, but it already has lots of features, much like igraph
Don’t use NetworkX for large data sets…
FYI: GPU-Based Layouts
~100x faster (vs. CPU-based layouts)
Out-of-Core Graph Analysis
N > billions
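Out-of-core systems avoid loading the whole graph at once. The sketch below shows only the core streaming idea in pure Python: it reads a whitespace-separated edge list one line at a time and counts degrees, so the edges are never all held in memory (the per-node counts still are). Systems such as GraphX go further by partitioning the graph across a cluster. The file format and the names `stream_edges`/`streaming_degrees` are assumptions for illustration.

```python
import io
from collections import Counter


def stream_edges(handle):
    """Yield (source, target) pairs lazily from a whitespace-separated edge list.

    Blank lines and lines starting with '#' are skipped; extra columns are ignored.
    """
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()[:2]
        yield source, target


def streaming_degrees(edges):
    """Count node degrees in a single pass over an edge iterator."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


if __name__ == "__main__":
    # An in-memory stand-in for a large edge-list file.
    sample = io.StringIO("# toy edge list\na b\nb c\nc a\n")
    print(streaming_degrees(stream_edges(sample)))
```

With a real file, one would pass an open handle instead, e.g. `with open(path) as fh: streaming_degrees(stream_edges(fh))`, where `path` is a placeholder.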
GraphX
• Part of the Apache Spark project

• Industry standard

• Lots of documentation and community support

• You can use Spark from Python and R, but in the Spark world, Scala is still the first-class citizen (GraphX itself exposes a Scala API)…
Figure: End-to-end PageRank performance (20 iterations, 3.7B edges)
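The benchmark above times PageRank at a scale (3.7B edges) that needs a distributed engine. What each of those 20 iterations computes can be sketched in memory with plain power iteration. The damping factor 0.85 is the conventional default and an assumption here, as is spreading the rank of dangling nodes uniformly; `pagerank` is a hypothetical helper, not the GraphX or GraphLab API.

```python
def pagerank(adjacency, damping=0.85, iterations=20):
    """Power-iteration PageRank on a directed graph.

    adjacency: dict node -> list of out-neighbours; every node must be a key.
    Dangling nodes (no out-links) spread their rank uniformly over all nodes,
    so the ranks keep summing to 1.
    """
    nodes = list(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        # Teleport term plus redistributed dangling mass, shared by every node.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = adjacency[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new_rank[w] += share
        rank = new_rank
    return rank
```

On a directed 3-cycle every node keeps rank 1/3; in a star where two leaves point at a hub, the hub ends up ranked highest. Distributed engines run the same per-iteration update, but partition the rank vector and edges across machines.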
GraphLab Create
• Commercial service by Dato

• High-level API and data structures

• SFrame/SGraph: their version of scalable DataFrames

• (Semi-)automatic parallel processing
Neo4j v3
- Focuses on storing arbitrarily large graph data (billions of nodes/edges)

- Has some analysis features

- Now natively supports Python
Summary
• Don’t use NetworkX unless it’s necessary!
• Don’t use the raw cyREST API if you are a Python/R user
• There are lots of new graph analysis tools
• Some of them are a bit hard to install/set up
• Candidates for CI services (?)
• We deploy them to servers, and you access them through a simple API
2016 Keiichiro Ono
kono@ucsd.edu
