This document provides an overview of large scale graph analytics and JanusGraph. It discusses graph databases and their use cases. JanusGraph is presented as an open source graph database that can scale to billions of vertices and edges across multiple storage backends like HBase, Cassandra and Bigtable. It uses the TinkerPop framework and Gremlin query language. JanusGraph supports ACID transactions, external indices, and evolving schemas. Example graph queries are demonstrated using the Gremlin console.
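These are the slides for my report on attending ICDE 2014, presented at the 56th meeting of the ACM SIGMOD Japan Chapter. The deck has the following six parts and runs to 190 pages in total.
- An overview of ICDE 2014 (from p. 5)
- A new idea for the big data era: stop storing data (from p. 32)
Keynote, Running with Scissors: Fast Queries on Just-in-Time Databases
- Collaborating with partners you cannot see: data aggregation on sensor networks (from p. 64)
10 Year Most Influential Paper, Approximate Aggregation Techniques for Sensor Databases
- What happens when a main-memory database uses hardware transactional memory... (from p. 96)
Best Paper, Exploiting Hardware Transactional Memory in Main-Memory Databases
- Reusing past results: pattern discovery in large graphs using views (from p. 126)
Best Paper Runner-up, Answering Graph Pattern Queries Using Views
- Grinding it out with algorithms: finding exactly the similar pairs among a large number of vectors (from p. 155)
Paper of interest, L2AP: Fast Cosine Similarity Search With Prefix L-2 Norm Bounds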
Cytoscape is an open-source software ecosystem for complex network analysis and visualization. It began in 2003 as a Java desktop application but has since expanded to include a REST API, JavaScript library, and Jupyter Notebook/Lab extensions. As an ecosystem, Cytoscape provides domain-independent tools for visualizing and analyzing networks across various fields and computing environments.
- Keiichiro Ono presented on his experience developing and maintaining bioinformatics visualization applications over 15 years, including Cytoscape.
- Expanding applications to support new technologies like web, Python, and JavaScript is important to attract developers and connect to popular tools, but breaking API changes are difficult.
- A loosely coupled approach integrating existing applications, Jupyter notebooks, and new web apps may be better than rewriting applications from scratch.
Overview of Modern Graph Analysis Tools (Keiichiro Ono)
This document discusses modern tools for graph analysis and for making graph workflows reproducible. It introduces cyREST, a RESTful API for programmatic access to Cytoscape, and language-specific wrappers such as RCy3 and py2cytoscape that provide natural APIs. These tools allow Cytoscape workflows to run in notebooks and on remote machines. It also covers analysis libraries for smaller graphs (NetworkX, igraph, graph-tool, and PGX) and distributed frameworks for extremely large graphs with billions of nodes (GraphX, GraphLab Create, and Neo4j). The document recommends against using NetworkX for large data and suggests cloud-based options for difficult-to-install tools.
Presentation slides for the SDCSB Cytoscape Workshop on 5/19/2016. The presentation covers the current status of the Cytoscape project and gives an overview of the Cytoscape ecosystem. It briefly mentions the Cytoscape Cyberinfrastructure.
Introduction to Biological Network Analysis and Visualization with Cytoscape ... (Keiichiro Ono)
Introduction to biological network analysis and visualization with Cytoscape (using the latest version 3.4).
This is the first half of the Applied Bioinformatics lecture at TSRI.
Building Reproducible Network Data Analysis / Visualization Workflows (Keiichiro Ono)
The document discusses building reproducible network data analysis and visualization workflows using REST APIs and containerization. It aims to solve problems with complex software stacks that are difficult to set up and not reproducible. The goal is to create reproducible and scalable "dry experiments" using Docker containers, GitHub for source code sharing, Jupyter notebooks as electronic lab notebooks, and the cyREST module for the Cytoscape network analysis software. Examples of scenarios using local workstations and cloud computing are presented, as well as a demo and future plans.
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc... (Keiichiro Ono)
This document provides an overview of a tutorial on building reproducible network data visualization workflows using Cytoscape and IPython Notebook. The tutorial will cover integrating data, analyzing networks, visualizing results, and preparing outputs for publication. It will demonstrate setting up a portable data analysis environment using Docker and sharing work through GitHub. The bulk of the tutorial will focus on using IPython Notebook as an electronic lab notebook for interactive and reproducible experiments with Cytoscape.
SDCSB CYTOSCAPE AND NETWORK ANALYSIS WORKSHOP at Sanford Consortium (Keiichiro Ono)
This document provides an overview and update on Cytoscape, an open source platform for biological network analysis and visualization. Key points discussed include:
- Cytoscape 3.2.1 is the latest desktop application release with new features like a chart editor and exporting visualizations as web applications.
- Cytoscape.js is a JavaScript library for building web applications that visualize networks, and there are examples of web apps built with it.
- Cytoscape's cyberinfrastructure initiative aims to make the software more accessible and integratable for computational biologists through services, apps, and repositories.
VIZBI 2015 Tutorial: Cytoscape, IPython, Docker, and Reproducible Network Dat... (Keiichiro Ono)
This document summarizes a tutorial presentation on reproducible network data visualization workflows using Cytoscape, IPython, Docker, and other tools. The presentation introduces Cytoscape 3.2 features like exporting visualizations as web applications and using chart editors. It discusses challenges in bioinformatics like complexity of data analysis pipelines and reproducibility. The goal of reproducible science is explained. Modern computing resources like virtual machines and frameworks are reviewed. Basic workflows for data preparation, analysis, and visualization are outlined. Technologies for enabling reproducibility like Docker, source code versioning with Git/GitHub, and Jupyter Notebooks are presented.
cyREST provides platform-independent access to Cytoscape's data models and functions via REST. This allows different tools like RStudio, IPython notebooks, command line utilities, and web apps to interact with Cytoscape. The goal is for all bioinformatics tools to work seamlessly together. cyREST demonstrates controlling Cytoscape from an IPython notebook to enable interactive data analysis across environments and computing resources.
GraphLab Conference 2014 Cytoscape Flyer (Keiichiro Ono)
Cytoscape is an open source platform for network analysis and visualization. It supports standard data formats and has flexible visualization capabilities through visual styles and layout algorithms. The Cytoscape ecosystem includes apps that expand its functionality and Cytoscape.js, a JavaScript library for graph visualization compatible with browsers.
Cytoscape Untangles the Web: a first step towards Cytoscape Cyberinfrastructu... (Keiichiro Ono)
Cytoscape is a standard desktop application for biological network analysis and visualization, but it faces emerging problems: network datasets too large for a desktop, demand for collaborative data sharing, and the need to self-publish networks without web programming skills. CyNetShare is a first step towards a Cytoscape cyberinfrastructure. It is an interactive web application built on Cytoscape.js that visualizes public network data files, shares visualizations via URL, and runs on both desktops and tablets.
NeXO Web Poster for ISMB 2014 BioVis SIG (Keiichiro Ono)
NeXO Web is an integrated ontology visualization application that uses modern web technologies to provide a new way to visualize ontology data sets such as the Gene Ontology and NeXO. It integrates multi-omics data into an HTML5-based single-page application and is open for future expansion. NeXO Web allows users to visualize ontologies, search and enrich terms, and view analysis windows, ontology trees, raw interactions, and term details.
Towards the Cytoscape Cyberinfrastructure (Keiichiro Ono)
1) The document discusses ongoing projects of the Cytoscape Core Developer Team to integrate Cytoscape into larger computational workflows by sharing data and computing resources over the network.
2) These workflows will use standard tools such as RStudio and IPython Notebook as the primary workbenches for advanced users.
3) One project is a simple web application called CyNetShare that allows users to share network visualizations rendered with Cytoscape.js in a web browser.
8. Keiichiro Ono
Background
Bioinformatics
Computer Science
Work
Research
Bioinformatics workflow
Visualization pipeline
Data
Visualization
Networks
Other Biological Data
Integration
Molecular Interactions
Pathways
Annotations
Software Development
Cytoscape
NeXO
Cyberinfrastructure
All kinds of small tools
Like
Art
Kandinsky
Mondrian
Music
Electronica
Techno
Minimal
Detroit
Jazz
Sci-fi
Movie
Novel
Life
US
San Diego
San Francisco Bay Area
Los Angeles
Orange County
Japan
Gifu
Tokyo
38. –Tamara Munzner
Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.
Visualization Analysis and Design. A K Peters/CRC Press, 10/2014.
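Visualization is appropriate not when computational methods (such as machine learning) are used to replace people, but precisely when human capabilities need to be augmented to make decisions.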
106. Grammar of graphics: a general scheme for data visualization that breaks up graphs into semantic components such as scales and layers.
en.wikipedia.org/wiki/Ggplot2
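The decomposition into scales and layers can be illustrated with a small sketch. This is not ggplot2 or any real plotting library: the names `LinearScale`, `Layer`, and `Plot` are hypothetical, invented here for illustration, and the "rendering" only emits abstract marks rather than drawing anything. It is meant to show how one dataset and one set of scales can be shared by several independent layers.

```python
from dataclasses import dataclass, field


@dataclass
class LinearScale:
    """Hypothetical minimal scale: maps a data range onto a pixel range."""
    domain: tuple
    range_: tuple

    def __call__(self, value):
        d0, d1 = self.domain
        r0, r1 = self.range_
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)


@dataclass
class Layer:
    """One geometric layer: the mark type and the columns mapped to x and y."""
    geom: str
    x: str
    y: str


@dataclass
class Plot:
    data: list    # rows as dicts
    scales: dict  # one scale per aesthetic, shared by all layers
    layers: list = field(default_factory=list)

    def __add__(self, layer):
        # "plot + layer" composition; returns a new Plot, leaving self unchanged.
        return Plot(self.data, self.scales, self.layers + [layer])

    def render(self):
        # Emit abstract marks (geom, x_pixel, y_pixel) instead of drawing.
        marks = []
        for layer in self.layers:
            for row in self.data:
                marks.append((layer.geom,
                              self.scales["x"](row[layer.x]),
                              self.scales["y"](row[layer.y])))
        return marks


rows = [{"t": 0, "v": 10}, {"t": 5, "v": 20}, {"t": 10, "v": 30}]
scales = {"x": LinearScale((0, 10), (0, 100)),
          "y": LinearScale((10, 30), (0, 200))}

# The same data and scales feed two layers: points and a line.
plot = Plot(rows, scales) + Layer("point", "t", "v") + Layer("line", "t", "v")
marks = plot.render()
for mark in marks:
    print(mark)
```

Because the scales live on the plot rather than on each layer, changing a scale (for example, widening the pixel range) repositions every layer consistently, which is the practical benefit of treating scales and layers as separate components.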
149. References
• Tufte, Edward R., and P. R. Graves-Morris. The visual display of quantitative information. Vol. 2. Cheshire, CT: Graphics Press, 1983.
• Wilkinson, Leland, et al. The grammar of graphics. Springer Science & Business Media, 2006.
• Shen, Helen. "Interactive notebooks: Sharing the code." Nature 515.7525 (2014): 151-152.
• Munzner, Tamara. Visualization Analysis and Design. A K Peters Visualization Series, CRC Press, 2014.