1. By Priyabrata Dash
email: bobquest33@gmail.com
Graph Analytics for Big Data
(Current Trends & IBM System G)
2. Agenda
Graph Analytics Frameworks & Trends
Graph Databases & Trends
IBM System G
IBM Graph Store
IBM System G & Graph Store in Bluemix
Demo
Q & A
3. Graph Analytics Frameworks
Processing extremely large graphs has been and
remains a challenge, but recent advances in Big
Data technologies have made this task more
practical.
There are two classes of systems to consider:
− Graph databases for OLTP workloads, which
need quick, low-latency access to small
portions of graph data.
− Graph processing engines for OLAP
workloads, which batch-process large
portions of a graph.
4. Graph Processing Engines
Graph problems have become mainstream, and
in response to the growing popularity of
graph analysis, a large number of specialized
graph engines have emerged.
A key advantage of these specialized graph
engines is that they provide a vertex-centric
model of graph programming, which is
intuitive for the graph analytics application
developer.
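As a rough illustration of the vertex-centric model, the sketch below runs a Pregel-style loop in plain Python: each vertex holds a value, receives messages from its neighbors, and sends its updated value onward in synchronized supersteps. The names `run_supersteps` and `min_label_compute` and the sample graph are hypothetical and do not correspond to any particular engine's API. The example labels connected components by propagating the smallest vertex id.

```python
from collections import defaultdict

def min_label_compute(value, messages):
    """Per-vertex logic: adopt the smallest label seen so far."""
    return min([value] + messages)

def run_supersteps(adjacency, compute, max_steps=50):
    """Run a synchronous vertex-centric loop until no vertex changes.

    adjacency: dict mapping vertex id -> list of neighbor ids (undirected
    edges must be listed in both directions).
    """
    # Each vertex starts with its own id as its label.
    values = {v: v for v in adjacency}
    # In the first superstep every vertex is active and sends its label.
    active = set(adjacency)
    for _ in range(max_steps):
        if not active:
            break
        # Message passing: active vertices send their value to neighbors.
        inbox = defaultdict(list)
        for v in active:
            for n in adjacency[v]:
                inbox[n].append(values[v])
        # Compute phase: each vertex with mail updates its own value.
        active = set()
        for v, messages in inbox.items():
            new_value = compute(values[v], messages)
            if new_value != values[v]:
                values[v] = new_value
                active.add(v)
    return values

graph = {
    1: [2], 2: [1, 3], 3: [2],   # component {1, 2, 3}
    4: [5], 5: [4],              # component {4, 5}
    6: [],                       # isolated vertex
}
labels = run_supersteps(graph, min_label_compute)
print(labels)
```

The developer only writes the per-vertex function; the loop that delivers messages and detects convergence is what a graph engine would provide and distribute across machines.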
5. Graph Processing Engines
Apache Giraph
Apache Hama
GraphX for Apache Spark
Faunus
Apache TinkerPop
Gelly for Apache Flink
Dendrite
IBM System G and many more
6. Graph Databases
In computing, a graph database is a database
that uses graph structures for semantic queries
with nodes, edges and properties to represent
and store data.
Graph databases tend to be optimized for
graph-based traversal algorithms.
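To make the nodes, edges and properties model concrete, here is a minimal in-memory sketch in Python. The `PropertyGraph` class, its method names and the sample data are hypothetical, chosen only for illustration; a real graph database adds indexing, persistence and a query language on top of the same idea. The `traverse` method shows the kind of hop-by-hop access that such databases are optimized for.

```python
from collections import deque

class PropertyGraph:
    """Toy in-memory property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}      # node id -> property dict
        self.out_edges = {}  # node id -> list of (label, target id, property dict)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def traverse(self, start, label, max_depth):
        """Breadth-first traversal following edges with the given label.

        Returns node ids reachable within max_depth hops, in visit order,
        excluding the start node.
        """
        seen = {start}
        order = []
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for edge_label, target, _props in self.out_edges[node]:
                if edge_label == label and target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append((target, depth + 1))
        return order

g = PropertyGraph()
g.add_node("alice", kind="person", age=34)
g.add_node("bob", kind="person", age=29)
g.add_node("carol", kind="person", age=41)
g.add_node("acme", kind="company")
g.add_edge("alice", "knows", "bob", since=2012)
g.add_edge("bob", "knows", "carol", since=2015)
g.add_edge("alice", "works_at", "acme")

friends_of_friends = g.traverse("alice", "knows", max_depth=2)
print(friends_of_friends)
```

Because each node keeps direct references to its outgoing edges, a traversal touches only the neighborhood it visits rather than scanning the whole data set.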
11. Apache TinkerPop
The TinkerPop stack
provides a foundation
for building high-
performance graph
applications of any size
It supports applications
ranging from simple
single-machine programs
to trillion-edge graphs
scaled across a cluster of
computers.
18. System G Native Store
System G Native Store represents graphs both
in memory and on disk by:
− Organizing graph data so that the store holds
the graph structure along with vertex and edge
properties
− Caching graph data in memory, either in batch
mode or on demand, from the on-disk streaming
graph data
− Persisting graph updates, along with their
timestamps, from the in-memory graph to the
on-disk graph
− Performing graph queries by loading graph
structure and/or property data
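The four responsibilities listed on this slide can be sketched in a few lines of Python. This is only an illustration of the general idea, not System G's actual design or API: the `TinyGraphStore` class, its method names and the JSON-lines log format are all assumptions made for the example. Structure and properties are kept in separate maps, every update is appended to an on-disk log with a timestamp, and `load` replays the log in batch mode.

```python
import json
import os
import tempfile
import time

class TinyGraphStore:
    """Hypothetical store: in-memory graph plus an append-only on-disk log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.adjacency = {}     # structure: vertex id -> set of neighbor ids
        self.vertex_props = {}  # vertex id -> property dict
        self.edge_props = {}    # (source, target) -> property dict

    def _apply(self, record):
        """Apply one update record to the in-memory graph."""
        if record["op"] == "add_vertex":
            self.adjacency.setdefault(record["id"], set())
            self.vertex_props[record["id"]] = record["props"]
        elif record["op"] == "add_edge":
            src, dst = record["src"], record["dst"]
            self.adjacency.setdefault(src, set()).add(dst)
            self.adjacency.setdefault(dst, set())
            self.edge_props[(src, dst)] = record["props"]

    def _persist(self, record):
        """Append the update, stamped with the current time, to the log."""
        record["ts"] = time.time()
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")

    def add_vertex(self, vertex_id, **props):
        record = {"op": "add_vertex", "id": vertex_id, "props": props}
        self._apply(record)
        self._persist(record)

    def add_edge(self, src, dst, **props):
        record = {"op": "add_edge", "src": src, "dst": dst, "props": props}
        self._apply(record)
        self._persist(record)

    def load(self):
        """Batch-mode caching: replay the whole on-disk log into memory."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def neighbors(self, vertex_id):
        """Structure-only query: needs no property data."""
        return sorted(self.adjacency[vertex_id])

log_file = os.path.join(tempfile.mkdtemp(), "graph.log")
store = TinyGraphStore(log_file)
store.add_vertex("a", name="Alice")
store.add_vertex("b", name="Bob")
store.add_edge("a", "b", weight=0.5)

# A fresh instance rebuilds the same graph from the on-disk log.
reloaded = TinyGraphStore(log_file)
reloaded.load()
print(reloaded.neighbors("a"), reloaded.vertex_props["a"])
```

Keeping structure apart from properties means a query such as `neighbors` can run without touching property data, which mirrors the "structure and/or property" loading mentioned above.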
20. System G Native Store Overview
Native Store offers not only
persistent graph storage but
also sequential, concurrent
and distributed graph runtimes.
It comes with a set of C++ graph
programming APIs, a CLI
command set (gShell),
a socket client, a socket
client GUI, and a
visualization toolkit.
21. TinkerPop & SPARQL Over Native Store
IBM System G has a JNI layer that translates
the Native Store graph APIs into the
TinkerPop APIs.
Therefore, Java graph applications built
on top of TinkerPop Blueprints can be
ported onto the IBM System G Native
Store, and various open source tools can
be integrated into IBM System G.
System G provides TinkerPop Blueprints
interfaces to both its high-performance C++
implementations and its HBase-based
GBase graphs.
Since Native Store provides the TinkerPop
Blueprints interface via JNI, Gremlin can
run on Native Store.
A Jena-based SPARQL query engine is
installed on top of the System G Native
Store.
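The translation layer described on this slide is an instance of the adapter pattern. The Python sketch below shows the general shape under heavy assumptions: `NativeStore` is a made-up stand-in that works with integer handles, and `BlueprintsStyleGraph` is a hypothetical wrapper whose methods are only loosely modeled on Blueprints-style vertex and edge calls. Neither class reflects the real System G, JNI or TinkerPop signatures.

```python
class NativeStore:
    """Hypothetical stand-in for a native graph API using integer handles."""

    def __init__(self):
        self._ids = []    # handle -> external vertex id
        self._edges = []  # list of (source handle, target handle, label)

    def insert_node(self, external_id):
        self._ids.append(external_id)
        return len(self._ids) - 1

    def insert_link(self, src_handle, dst_handle, label):
        self._edges.append((src_handle, dst_handle, label))

    def links_from(self, src_handle):
        return [(dst, label) for src, dst, label in self._edges
                if src == src_handle]

    def external_id(self, handle):
        return self._ids[handle]


class BlueprintsStyleGraph:
    """Adapter exposing vertex/edge calls on top of the native handles."""

    def __init__(self, native):
        self._native = native
        self._handles = {}  # external vertex id -> native handle

    def add_vertex(self, vertex_id):
        self._handles[vertex_id] = self._native.insert_node(vertex_id)
        return vertex_id

    def add_edge(self, out_vertex, in_vertex, label):
        self._native.insert_link(
            self._handles[out_vertex], self._handles[in_vertex], label
        )

    def out(self, vertex_id, label):
        """Ids of vertices reached by outgoing edges with the given label."""
        handle = self._handles[vertex_id]
        return [
            self._native.external_id(dst)
            for dst, edge_label in self._native.links_from(handle)
            if edge_label == label
        ]

graph = BlueprintsStyleGraph(NativeStore())
for name in ("marko", "josh", "lop"):
    graph.add_vertex(name)
graph.add_edge("marko", "josh", "knows")
graph.add_edge("marko", "lop", "created")
known = graph.out("marko", "knows")
print(known)
```

Application code written against the wrapper never sees the native handles, which is why applications and tools built on the common interface can be moved onto a different backing store.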
22. IBM System G on Bluemix
IBM System G on Bluemix, the
cloud version of IBM System G,
aims to help users get started with
IBM System G graph database
technologies, analytics,
visualizations and solutions by
interacting with the system in an
online setting.
http://systemg.mybluemix.net/
IBM® Graph Data Store enables
you to build and work with
powerful applications using a
fully managed graph database
service, accessible through a
REST-based HTTP API.
IBM Graph Data Store is an
experimental service. Data held in
the Graph Data Store is not
necessarily backed up. In
particular, it should not currently
be used for high volume, high
performance, or production
applications.
https://graph-data-store-docs.ng.bluemix.net/index.html
https://graph-data-store-docs.ng.bluemix.net/gettingstarted.html