GraphTech Ecosystem - part 2: Graph Analytics

The GraphTech Ecosystem 2019
Part 2/3 - Graph analytics

The three layers of graph technology
Graph visualization tools: Visualize (frontend)
Graph computing frameworks: Analyze
Graph databases: Store (backend)

The graph analytics ecosystem
Query languages
Graph analytics libraries and toolkits
Graph processing frameworks / engines

Graph processing frameworks / engines

Apache Giraph
Distributed
Apache 2.0 License
https://giraph.apache.org/
About
Apache Giraph is an iterative graph processing system built for high scalability. For example, it is
currently used at Facebook to analyze the social graph formed by users and their connections. Giraph
originated as the open-source counterpart to Pregel, the graph processing architecture developed at
Google and described in a 2010 paper. Both systems are inspired by the Bulk Synchronous Parallel
model of distributed computation introduced by Leslie Valiant. Giraph adds several features beyond the
basic Pregel model, including master computation, sharded aggregators, edge-oriented input, out-of-core
computation, and more.

Apache Spark
Distributed
Apache License 2.0
http://spark.apache.org/
About
Spark is an Apache Software Foundation project focused on general-purpose OLAP data processing.
Spark provides a hybrid in-memory/disk-based distributed computing model that is similar to Hadoop’s
MapReduce model.

Apache Hama
Distributed
Apache License 2.0
https://hama.apache.org/
About
Apache Hama™ is a framework for Big Data analytics that uses the Bulk Synchronous Parallel (BSP)
computing model. It became a Top-Level Project of The Apache Software Foundation in 2012 and targets
advanced analytics beyond MapReduce.

Cassovary
Single system
Apache License 2.0
https://github.com/twitter/cassovary
About
Cassovary is an in-memory graph engine for the Java Virtual Machine (JVM) written in Scala. Cassovary is
designed from the ground up to efficiently handle graphs with billions of edges. It comes with some
common node and graph data structures and traversal algorithms. A typical usage is to do large-scale
graph mining and analysis.

Digree
Distributed
Research project
https://sigmodrecord.org/publications/sigmodRecord/1712/pdfs/05_systems_Spyropoulos.pdf
About
Digree is a system prototype that enables distributed execution of graph pattern matching queries in a
cloud of interconnected graph databases.

Faunus
Distributed
Apache License 2.0
https://github.com/thinkaurelius/faunus
About
Faunus is a distributed analytics engine for processing property graphs with Hadoop. A breadth-first
version of the graph traversal language Gremlin operates on a vertex-centric property graph data
structure. Faunus provides adaptors to the distributed graph database Titan, any Rexster-fronted graph
database, and to text and binary graphs stored in HDFS. The provided Gremlin operations and Hadoop
graph tools can be extended using MapReduce and Blueprints.

FlashGraph
Single system
Apache License 2.0
https://github.com/Smerity/FlashGraph
About
FlashGraph is a semi-external memory graph processing engine, optimized for a high-speed SSD array.
FlashGraph provides a flexible programming interface to help users implement graph algorithms. In
FlashGraph, users write serial code that reads data in memory, and FlashGraph executes that code in
parallel and out of core. It can process a billion-node graph on a single machine, with performance
comparable to or exceeding that of in-memory graph engines such as PowerGraph.

Galois
Distributed
BSD License
http://iss.ices.utexas.edu/?p=projects/galois
About
The Galois system permits application programmers to exploit amorphous data-parallelism in irregular
algorithms without having to write explicitly parallel code. The Galois library provides concurrent data
structures, schedulers, and memory allocators. The Galois runtime executes these programs in parallel,
using parallelization strategies such as optimistic and round-based execution. Galois runs on
shared-memory NUMA platforms and NVIDIA GPUs. A subset of the Galois programming model is
supported on distributed-memory machines.

Gelly
Distributed
Apache 2.0 License
https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
About
Gelly is Apache Flink’s graph-processing API and library. Flink’s native support for iterations makes it a
suitable platform for large-scale graph analytics. By leveraging delta iterations, Gelly is able to map
various graph processing models such as vertex-centric or gather-sum-apply to Flink dataflows. Gelly
allows Flink users to perform end-to-end data analysis in a single system. Gelly can be seamlessly used
with Flink’s DataSet API, which means that pre-processing, graph creation, analysis, and post-processing
can be done in the same application.
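
The delta-iteration idea mentioned above can be illustrated outside of Flink. The following is a minimal plain-Python sketch, not Gelly's API: the function name `connected_components` and the choice of min-label propagation are assumptions made for illustration. Only the vertices whose label changed in the previous round (the workset) are processed in the next one.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Label each vertex with the smallest vertex id in its component.

    Hypothetical helper: it mimics a delta iteration with a solution set
    (label) and a workset (vertices changed in the previous round).
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    label = {v: v for v in vertices}   # solution set
    workset = set(vertices)            # everything is "changed" at the start
    iterations = 0
    while workset:
        # Changed vertices propose their label to their neighbours.
        candidates = {}
        for v in workset:
            for n in neighbours[v]:
                if label[v] < candidates.get(n, label[n]):
                    candidates[n] = label[v]
        # Apply the proposals; only vertices that improved stay in the workset.
        workset = set()
        for n, new_label in candidates.items():
            if new_label < label[n]:
                label[n] = new_label
                workset.add(n)
        iterations += 1
    return label, iterations


labels, rounds = connected_components(
    [1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]
)
```

Because the workset shrinks as labels stabilise, later rounds touch only a small part of the graph, which is the property that makes delta iterations attractive for graph algorithms.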

GPS
Distributed
BSD License
http://infolab.stanford.edu/gps/
About
GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms
on extremely large graphs. GPS is similar to Google’s proprietary Pregel system and to Apache Giraph. GPS
is a distributed system designed to run on a cluster of machines, such as Amazon's EC2.

Gradoop
Distributed
Apache 2.0 License
https://dbs.uni-leipzig.de/en/research/projects/gradoop
About
Gradoop is a research framework for scalable graph analytics built on top of Apache Flink™. It offers a
graph data model which extends the widespread property graph model by the concept of logical graphs
and further provides operators that can be applied on single logical graphs and collections of logical
graphs. The combination of these operators allows the flexible, declarative definition of graph analytical
workflows. Gradoop can be easily integrated in a workflow which already uses Flink™ operators and
Flink™ libraries (e.g. Gelly, ML and Table).

GraphChi
Single System
Apache 2.0 License
https://github.com/GraphChi/graphchi-cpp
About
GraphChi is a disk-based system for computing efficiently on graphs with billions of edges. By using a
well-known method to break large graphs into small parts, and a novel parallel sliding windows method,
GraphChi is able to execute several advanced data mining, graph mining, and machine learning
algorithms on very large graphs, using just a single consumer-level computer. GraphChi is a spin-off
project separate from the GraphLab PowerGraph project.

GraphLab PowerGraph
Distributed
Apache 2.0 License
https://github.com/jegonzal/PowerGraph
About
GraphLab PowerGraph is a graph-based, high performance, distributed computation framework written
in C++. The GraphLab PowerGraph academic project was started in 2009 at Carnegie Mellon University
to develop a new parallel computation abstraction tailored to machine learning. GraphLab PowerGraph
is no longer in active development by the founding team. GraphLab PowerGraph is now supported by the
community. The lessons from the GraphLab PowerGraph and GraphChi projects have culminated in
GraphLab Create.

GraphX
Distributed
Apache 2.0 License
https://spark.apache.org/graphx/
About
GraphX is Apache Spark's API for graphs and graph-parallel computation. GraphX unifies ETL,
exploratory analysis, and iterative graph computation within a single system. You can view the same
data as both graphs and collections, transform and join graphs with RDDs efficiently, and write custom
iterative graph algorithms using the Pregel API.

Hadoop MapReduce
Distributed
Apache 2.0 License
https://mapr.com/products/product-overview/mapreduce/
About
MapReduce is Hadoop's native batch processing engine. Apache MapReduce is a powerful framework
for processing large, distributed sets of structured or unstructured data on a Hadoop cluster. The key
feature of MapReduce is its ability to perform processing across an entire cluster of nodes, with each
node processing its local data.
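
To make the map, shuffle and reduce steps concrete, here is a minimal single-process sketch in plain Python that computes vertex degrees from an edge list. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `vertex_degrees`) and the list-of-partitions input are assumptions chosen for illustration, with each partition standing in for the data local to one node.

```python
from collections import defaultdict


def map_phase(edge):
    """Emit (vertex, 1) for both endpoints of an undirected edge."""
    u, v = edge
    return [(u, 1), (v, 1)]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one vertex."""
    return key, sum(values)


def vertex_degrees(partitions):
    """partitions: list of edge lists, one per simulated cluster node."""
    emitted = []
    for local_edges in partitions:      # each node maps over its local data
        for edge in local_edges:
            emitted.extend(map_phase(edge))
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


degrees = vertex_degrees([
    [("a", "b"), ("b", "c")],   # edges held by node 1
    [("c", "a"), ("c", "d")],   # edges held by node 2
])
```

Iterative graph algorithms need one such job per iteration, which is one reason the frameworks listed in this section offer alternatives to plain MapReduce.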

Microsoft Graph Engine
Distributed
MIT License
https://www.graphengine.io/
About
Microsoft Graph Engine is a distributed in-memory data processing engine, underpinned by a
strongly-typed in-memory key-value store and a general distributed computation engine.

Mizan
Distributed
Research project
https://thegraphsblog.wordpress.com/the-graph-blog/mizan/
About
Mizan is an advanced clone of Google’s graph processing system Pregel that uses online graph vertex
migrations to dynamically optimize the execution of graph algorithms. You can use Mizan to develop any
vertex-centric graph algorithm and run it in parallel over a local cluster or over cloud
infrastructure. Mizan is compatible with Pregel’s API, is written in C++, and uses MPICH2 for
communication.

PGX
Distributed
OTN License
https://www.oracle.com/technetwork/oracle-labs/parallel-graph-analytix/overview/index.html
About
PGX is a toolkit for graph analysis - both running algorithms such as PageRank against graphs, and
performing SQL-like pattern-matching against graphs, using the results of algorithmic analysis.
Algorithms are parallelized for extreme performance. The PGX toolkit includes both a single-node
in-memory engine, and a distributed engine for extremely large graphs. Graphs can be loaded from a
variety of sources including flat files, SQL and NoSQL databases and Apache Spark and Hadoop;
incremental updates are supported.

Pregel
Distributed
Proprietary (Google)
http://web.cs.ucdavis.edu/~amenta/f15/pregel.pdf
About
Pregel is a distributed programming framework, focused on providing users with a natural API for
programming graph algorithms while managing the details of distribution invisibly, including messaging
and fault tolerance. It is similar in concept to MapReduce, but with a natural graph API and much more
efficient support for iterative computations over the graph. The high-level organization of Pregel
programs is inspired by Valiant’s Bulk Synchronous Parallel model.
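
The vertex-centric, superstep-based style described above can be sketched in a few lines of plain Python. This is a single-process illustration under stated assumptions, not Pregel's actual API: the function name `pregel_max`, the message format, and the example algorithm (propagating the largest vertex value) are chosen for illustration only.

```python
from collections import defaultdict


def pregel_max(edges, values, max_supersteps=30):
    """Propagate the maximum vertex value along directed edges.

    Hypothetical helper. Every vertex named in `edges` must have an entry
    in `values`. Returns the final values and the number of supersteps run.
    """
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)

    values = dict(values)
    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox = defaultdict(list)
    for v, val in values.items():
        for n in out[v]:
            inbox[n].append(val)

    supersteps = 1
    while inbox and supersteps < max_supersteps:
        next_inbox = defaultdict(list)
        # Only vertices that received messages are woken up in this superstep.
        for v, messages in inbox.items():
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for n in out[v]:
                    next_inbox[n].append(best)
            # Otherwise the vertex "votes to halt" by sending nothing.
        inbox = next_inbox   # barrier: messages become visible in the next superstep
        supersteps += 1
    return values, supersteps


final_values, steps = pregel_max(
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")],
    {"a": 3, "b": 6, "c": 2, "d": 1},
)
```

The computation stops when no messages are in flight, which corresponds to all vertices having voted to halt. In a distributed engine the inner loop would run in parallel across workers, with the barrier separating supersteps.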

Ringo
Single System
BSD License
http://snap.stanford.edu/ringo/
About
Ringo is a system for the construction and analysis of large graphs on a single large-memory multicore
machine that combines high-productivity analysis with fast and scalable execution times.
It offers an interactive easy-to-use Python interface, a rich set of over 200 advanced graph operations
and algorithms (based on the SNAP graph library), integration of table and graph processing, and
support for efficient graph construction and transformations between tables and graphs.

Signal/Collect
Distributed
Apache 2.0 License
https://uzh.github.io/signal-collect/
About
Signal/Collect is a framework for computations on large graphs. The model makes it possible to
concisely express many iterated and data-flow algorithms, while the framework parallelizes and
distributes the computation.

ThingSpan
Distributed
Commercial
https://www.objectivity.com/products/thingspan/
About
ThingSpan is a purpose-built, massively scalable graph software platform, powered by Objectivity/DB,
that leverages the open source stack by natively integrating with Apache Spark and the Hadoop
Distributed File System (HDFS). It provides ultra-fast navigation and pathfinding queries against huge
distributed graphs. ThingSpan also supports parallel pattern-finding and predictive analytics in
combination with Spark components, such as MLlib, GraphX, and Spark SQL.

Graph analytics libraries and toolkits

Brainnets
C++
MIT License
https://github.com/makism/brainnets
About
Brainnets is a cross-platform C++ graph analysis library for brain functional connectivity.

Combinatorial BLAS
Linear Algebra
BSD License
https://people.eecs.berkeley.edu/~aydin/CombBLAS/html/
About
The Combinatorial BLAS (CombBLAS) is an extensible distributed-memory parallel graph library offering
a small but powerful set of linear algebra primitives specifically targeting graph analytics.
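
As an illustration of how a graph algorithm can be phrased with linear algebra primitives, the sketch below expresses breadth-first search as repeated sparse matrix-vector products over a boolean (OR, AND) semiring. It is plain Python, not the CombBLAS API; `spmv_bool` and `bfs_levels` are hypothetical names, and the adjacency matrix is stored as a dict of neighbour sets (one sparse row per vertex).

```python
def spmv_bool(adj, frontier):
    """Multiply the transposed adjacency matrix by a sparse boolean vector.

    Over the (OR, AND) semiring the result contains every vertex that has
    at least one in-neighbour in `frontier`.
    """
    result = set()
    for v in frontier:                 # non-zero entries of the vector
        result |= adj.get(v, set())    # OR in the non-zeros of row v
    return result


def bfs_levels(adj, source):
    """Return the BFS level of every vertex reachable from `source`."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Multiply, then mask out vertices that were already visited.
        frontier = {v for v in spmv_bool(adj, frontier) if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


graph = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "d": set()}
levels = bfs_levels(graph, "a")
```

Swapping the semiring changes the algorithm while the loop structure stays the same, for example (min, +) yields shortest-path relaxations. This is the kind of reuse a small set of primitives is meant to enable.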

Directed Graph Library (DGLib)
C
GNU General Public License
https://grass.osgeo.org/dglib/
About
The Directed Graph Library provides functionality for vector network analysis. The original design idea
behind DGLib was to support middle-sized graphs in RAM with a near-static structure that doesn't need
to be dynamically modified by the user program, and to read graphs from input streams and process
them with no need to rebuild internal trees. DGLib defines a serializable graph as being in FLAT state
and an editable graph as being in TREE state.

Dracula Graph Library
JavaScript
MIT License
https://www.graphdracula.net/
About
Dracula.js is a set of tools to display and layout interactive connected graphs and networks, along with
various related algorithms from the field of graph theory.

Graphinius JS
JavaScript
Apache 2.0
https://github.com/cassinius/GraphiniusJS
About
Graphinius JS is a generic graph analysis library written in TypeScript. It is used in the GRAPHINIUS
project, an interactive graph research framework with an open-access, web-based machine learning
platform that allows experts and end users alike to visually compose state-of-the-art processing pipelines.

GraphJet
Java
Apache 2.0
https://github.com/twitter/GraphJet
About
GraphJet is a real-time graph processing library written in Java that maintains a full graph index over a
sliding time window in memory on a single server. This index supports a variety of graph algorithms
including personalized recommendation algorithms based on collaborative filtering. These algorithms
power a variety of real-time recommendation services within Twitter, notably content (tweets/URLs)
recommendations that require collaborative filtering over a heterogeneous, rapidly evolving graph.
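
The combination of a sliding time window and collaborative filtering can be sketched as follows. This is a deliberately naive plain-Python illustration, not GraphJet's data structure or API: the class `SlidingWindowGraph`, its methods, and the co-occurrence scoring are assumptions, and it scans a deque instead of maintaining a real index. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class SlidingWindowGraph:
    """Hypothetical user-item interaction graph limited to a recent time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()   # (timestamp, user, item), oldest first

    def add(self, timestamp, user, item):
        self.events.append((timestamp, user, item))
        self._expire(timestamp)

    def _expire(self, now):
        # Drop interactions that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()

    def recommend(self, user, k=3):
        """Items engaged with by users who share at least one item with `user`."""
        mine = {i for _, u, i in self.events if u == user}
        similar = {u for _, u, i in self.events if i in mine and u != user}
        scores = Counter(
            i for _, u, i in self.events if u in similar and i not in mine
        )
        return [item for item, _ in scores.most_common(k)]


g = SlidingWindowGraph(window_seconds=100)
g.add(0, "u1", "t1")
g.add(10, "u2", "t1")
g.add(20, "u2", "t2")
g.add(30, "u3", "t3")
suggestions = g.recommend("u1")
```

Once an interaction ages out of the window it no longer influences recommendations, so results track the recent state of a rapidly evolving graph.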

Graphology
JavaScript
MIT License
https://graphology.github.io/
About
Graphology is a specification and reference implementation for a robust & multipurpose JavaScript
Graph object. It aims at supporting various kinds of graphs with the same unified interface. Along with
those specifications, one will also find a standard library full of graph theory algorithms and common
utilities such as graph generators, layouts etc.

GraphStream
Java
CeCILL-C (French version) and LGPL v3
http://graphstream-project.org/
About
GraphStream is a Java library for the modeling and analysis of dynamic graphs. You can generate,
import, export, measure, layout and visualize them.

Grph
Java
Apache 2.0
http://www.i3s.unice.fr/~hogie/software/index.php
About
Grph is a high-performance Java library for the manipulation of graphs. Its main design objectives are to
make it simple to use and extend, efficient, and, in line with its initial motivation, useful in the context
of graph experimentation and network simulation. Grph also comes with tools like
an evolutionary computation engine, a bridge to linear solvers, a framework for distributed computing,
etc.

igraph
C, R, Python, Mathematica, C++
GNU General Public License
https://igraph.org/
About
igraph is a collection of network analysis tools with the emphasis on efficiency, portability and ease of
use. igraph is open source and free. igraph can be programmed in R, Python, Mathematica and C/C++.

JGraphT
Java
LGPL 2.1 and EPL 2.0
https://jgrapht.org/
About
JGraphT is a Java library of graph theory data structures and algorithms designed for performance, with
near-native speed in many cases and adapters for memory-optimized fastutil representations. JGraphT has
specialized iterators for graph traversal (DFS, BFS, etc.) and algorithms for path finding, clique detection,
isomorphism detection, coloring, common ancestors, tours, connectivity, matching, cycle detection,
partitions, cuts, flows, centrality, spanning, etc.

Java Universal Network/Graph Framework (JUNG)
Java
BSD License
http://jung.sourceforge.net/
About
JUNG is a software library that provides a common and extendible language for the modeling, analysis,
and visualization of data that can be represented as a graph or network. The current distribution of
JUNG includes implementations of a number of algorithms from graph theory, data mining, and social
network analysis, such as routines for clustering, decomposition, optimization, random graph generation,
statistical analysis, and calculation of network distances, flows, and importance measures (centrality,
PageRank, HITS, etc.). JUNG also provides a visualization framework.

NetworKit
Python, C++
MIT License
https://networkit.github.io/
About
NetworKit is a growing open-source toolkit for large-scale network analysis. Its aim is to provide tools for
the analysis of large networks in the size range from thousands to billions of edges. For this purpose, it
implements efficient graph algorithms, many of them parallel to utilize multicore architectures. These
are meant to compute standard measures of network analysis, such as degree sequences, clustering
coefficients, and centrality measures.
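
For readers unfamiliar with the measures named above, here is a small plain-Python sketch of a degree sequence and a local clustering coefficient on an undirected graph. It does not use NetworKit; the helper names (`build_adjacency`, `degree_sequence`, `local_clustering`) are assumptions for illustration, and the code is sequential where a toolkit of this kind would run in parallel.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_sequence(adj):
    """Vertex degrees sorted from largest to smallest."""
    return sorted((len(ns) for ns in adj.values()), reverse=True)


def local_clustering(adj, v):
    """Fraction of pairs of neighbours of v that are themselves connected."""
    ns = adj[v]
    k = len(ns)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(ns, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
sequence = degree_sequence(adj)
coefficient_a = local_clustering(adj, "a")
```

Both measures depend only on a vertex and its immediate neighbourhood, which is why they parallelise well across cores.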

NetworkX
Python
BSD License
https://networkx.github.io/
About
NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and
functions of complex networks.

NVIDIA Graph Analytics library (nvGRAPH)
nvGRAPH
CUDA-C
https://developer.nvidia.com/nvgraph
About
The NVIDIA Graph Analytics library (nvGRAPH) comprises parallel algorithms for high performance
analytics on graphs with up to 2 billion edges. nvGRAPH makes it possible to build interactive and high
throughput graph analytics applications.

RDFLib
Python
BSD License
https://github.com/RDFLib/rdflib
About
RDFLib is a Python package for working with RDF. RDFLib contains parsers and serializers for RDF/XML, N3,
NTriples, N-Quads, Turtle, TriX, RDFa and Microdata; a Graph interface which can be backed by any one
of a number of Store implementations; store implementations for in-memory storage and persistent
storage on top of the Berkeley DB; and a SPARQL 1.1 implementation supporting SPARQL 1.1 Queries and
Update statements.

ScaleGraph
X10
Eclipse Public License v1.0
http://scalegraph.sourceforge.net/web/
About
ScaleGraph is a graph library based on the highly productive X10 programming language. The goal of
ScaleGraph is to provide large-scale graph analysis algorithms and an efficient distributed computing
framework for graph analysts and for algorithm developers, respectively.

Stanford Network Analysis Project (SNAP)
SNAP
C++
BSD License
http://snap.stanford.edu/
About
Stanford Network Analysis Platform (SNAP) is a general purpose network analysis and graph mining
library. It scales to massive networks with hundreds of millions of nodes, and billions of edges. It
efficiently manipulates large graphs, calculates structural properties, generates regular and random
graphs, and supports attributes on nodes and edges. SNAP is also available through NodeXL, which
is a graphical front-end that integrates network analysis into Microsoft Office and Excel.

Tink
Java
Apache 2.0
https://github.com/otherwise777/Tink
About
Tink is a temporal graph analytics library for the data stream engine Flink. One of the more important
aspects of this library is a temporal shortest path. It relies extensively on Gelly, Flink's graph API.
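
One common notion of a temporal shortest path is the earliest-arrival path, where an edge can only be taken at its departure time. The plain-Python sketch below illustrates that notion in general; it is not Tink's implementation or API. The function name `earliest_arrival` and the edge format `(u, v, departure, duration)` are assumptions, and durations are assumed to be strictly positive.

```python
import math


def earliest_arrival(temporal_edges, source, start_time=0):
    """Earliest time at which each vertex can be reached from `source`.

    temporal_edges: iterable of (u, v, departure, duration), duration > 0.
    """
    arrival = {source: start_time}
    # With positive durations, one pass in departure order is enough: every
    # edge that could bring us to u before `departure` has been seen already.
    for u, v, departure, duration in sorted(temporal_edges, key=lambda e: e[2]):
        if arrival.get(u, math.inf) <= departure:
            reach = departure + duration
            if reach < arrival.get(v, math.inf):
                arrival[v] = reach
    return arrival


schedule = [
    ("a", "b", 1, 2),   # leaves a at t=1, reaches b at t=3
    ("b", "c", 2, 1),   # leaves b at t=2: too early, b is reached at t=3
    ("b", "c", 5, 1),   # leaves b at t=5, reaches c at t=6
    ("a", "c", 4, 5),   # direct but slow: reaches c at t=9
]
arrivals = earliest_arrival(schedule, "a")
```

In the example the two-hop route through b arrives earlier than the direct edge, and the b to c edge departing at t=2 cannot be used. A static shortest-path algorithm that ignores timestamps would miss both effects.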

Query languages

ArangoDB Query Language (AQL)
ArangoDB
https://docs.arangodb.com/3.3/AQL/
About
AQL is the SQL-like query language used in the ArangoDB database management system. It supports
CRUD operations for both documents (nodes) and edges, but it is not a data definition language (DDL).
AQL does support geospatial queries. It is JSON-oriented.

Cypher
Neo4j
https://neo4j.com/developer/cypher/
About
Cypher is Neo4j’s graph query language that allows users to store and retrieve data from the graph
database. Cypher’s syntax provides a visual and logical way to match patterns of nodes and
relationships in the graph. It is a declarative, SQL-inspired language for describing visual patterns in
graphs using ASCII-Art syntax.

G-CORE
https://arxiv.org/pdf/1712.01550.pdf
About
G-CORE is a research language proposal from the Linked Data Benchmark Council. It is a closed query
language where paths are first class citizens. The data model used in G-CORE is an extension of property
graphs with paths. It is an expressive query language.

Graph Query Language (GQL)
Cypher (Neo4j & the openCypher community), PGQL (Oracle) and G-CORE
https://www.gqlstandards.org/
About
GQL is a proposed new international standard language for property graph querying. The idea of a
standalone graph query language to complement SQL was raised by ISO SC32/WG3 members in early
2017, and is echoed in the GQL manifesto of May 2018.

GraphGrep
About
GraphGrep is an application-independent method for querying graphs, finding all the occurrences of a
subgraph in a database of graphs. The interface to GraphGrep is a regular expression graph query
language Glide that combines features from XPath and Smart.

GraphQL
Dgraph
https://graphql.org/
About
GraphQL is a query language for APIs providing a complete and understandable description of the data
in APIs. It is debatable whether GraphQL should be called a graph query language, but GraphQL can be
used to query data modeled as a graph, and various data systems use it as such.

GSQL
TigerGraph
https://www.tigergraph.com/tag/gsql/
About
The GSQL® Query Language is a language for the exploration and analysis of large-scale graphs. The
high-level language makes it easy to perform powerful graph traversal queries in the TigerGraph system.

Gremlin
Amazon Neptune, Cosmos DB, DataStax Enterprise Graph, Hadoop (Giraph), Hadoop (Spark), InfiniteGraph,
JanusGraph, Neo4j, Ontotext, OrientDB
https://tinkerpop.apache.org/gremlin.html
About
Gremlin is the graph traversal language of Apache TinkerPop. Gremlin is a functional, data-flow language
that enables users to succinctly express complex traversals on (or queries of) their application's
property graph.
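
The functional, data-flow style described above, where each step transforms a stream of graph elements, can be mimicked in plain Python. The sketch below is a toy under stated assumptions and not the TinkerPop API: the `Traversal` class, the `V` helper, and the dict-based graph layout are hypothetical. Only the step names `has`, `out` and `values` are borrowed from Gremlin for familiarity.

```python
class Traversal:
    """A toy, eagerly evaluated traversal over an in-memory property graph."""

    def __init__(self, graph, vertices):
        self.graph = graph
        self.vertices = list(vertices)

    def has(self, key, value):
        """Keep vertices whose property `key` equals `value`."""
        keep = [v for v in self.vertices
                if self.graph["props"][v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        """Move to the targets of outgoing edges carrying `label`."""
        nxt = [dst for v in self.vertices
               for (src, lbl, dst) in self.graph["edges"]
               if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        """End the traversal by reading one property from each vertex."""
        return [self.graph["props"][v][key] for v in self.vertices]


def V(graph):
    """Start a traversal at every vertex of the graph."""
    return Traversal(graph, graph["props"])


graph = {
    "props": {1: {"name": "marko"}, 2: {"name": "vadas"},
              3: {"name": "lop"}, 4: {"name": "josh"}},
    "edges": [(1, "knows", 2), (1, "knows", 4),
              (1, "created", 3), (4, "created", 3)],
}
friends = V(graph).has("name", "marko").out("knows").values("name")
```

Each step returns a new traversal, so queries compose by chaining. A real engine would evaluate such a chain lazily and could push steps down to the underlying store.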

N3QL
RDF triplestores
https://www.w3.org/DesignIssues/N3QL.html
About
N3QL is an implementation of an N3-based query language for RDF. It treats RDF as data and provides
query with triple patterns and constraints over a single RDF model. The target usage is for scripting and
for experimentation in information modelling languages. The language is derived from Notation3 and
RDQL.

openCypher
SAP HANA Graph, Neo4j, Agens Graph,
RedisGraph, Memgraph, Apache Spark,
Apache TinkerPop, Gradoop, Ruruki,
Graphflow
https://www.opencypher.org/about
About
Neo4j started the openCypher Project in 2015 to create the industry-standard language for querying
graph databases. The project aims to deliver a full and open specification of the graph database query
language: Cypher.

Property Graph Query Language (PGQL)
PGQL
Oracle Big Data Spatial and Graph
http://pgql-lang.org/
About
PGQL is a graph pattern-matching query language for the property graph data model, inspired by SQL,
openCypher, G-CORE, GSQL, and SPARQL. PGQL combines graph pattern matching with familiar
constructs from SQL, such as SELECT, FROM and WHERE.

RDF Data Query Language (RDQL)
RDQL
https://www.w3.org/Submission/RDQL/
About
RDQL is a query language for RDF based on SquishQL. It queries RDF documents using a SQL-like
syntax. An RDQL query consists of a graph pattern, expressed as a list of triple patterns. Each triple
pattern is comprised of named variables and RDF values (URIs and literals).
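
Evaluating a list of triple patterns amounts to finding variable bindings that satisfy every pattern at once. The plain-Python sketch below illustrates the idea. It is not an RDQL parser or any library's API: the helper names (`is_var`, `match_pattern`, `query`) and the convention that strings starting with `?` are variables are assumptions for illustration.

```python
def is_var(term):
    """Treat strings starting with '?' as named variables (an assumption)."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.setdefault(p, t) != t:
                return None      # variable already bound to another value
        elif p != t:
            return None          # constant term does not match
    return extended


def query(triples, patterns):
    """All bindings under which every triple pattern matches some triple."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
]
results = query(data, [("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?n")])
```

The shared variable `?y` joins the two patterns, so only the person known by someone and carrying a name is returned. Real engines use indexes rather than the nested scan shown here.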

Sesame RDF Query Language (SeRQL)
SeRQL
RDF triplestores
http://archive.rdf4j.org/users/ch11.html
About
SeRQL ("Sesame RDF Query Language", pronounced "circle") is an RDF query language that is very
similar to SPARQL, but with a different syntax. SeRQL was originally developed as a better alternative to the
query languages RQL and RDQL. A lot of SeRQL's features can now be found in SPARQL and SeRQL has
adopted some of SPARQL's features in return.

SociaLite
Hadoop
https://github.com/socialite-lang/socialite
About
SociaLite is a high-level query language for distributed graph analysis. In SociaLite, analysis programs
are implemented as high-level queries that are compiled to parallel/distributed code. SociaLite is
Hadoop compatible, so SociaLite queries can read data on HDFS (Hadoop Distributed File System).

SPARQL
RDF triplestores, Jena, OpenLink Virtuoso
https://www.w3.org/TR/sparql11-query/
About
SPARQL is a query language and a protocol for accessing RDF designed by the W3C RDF Data Access
Working Group. It is a declarative query language for performing data manipulation and data definition
operations on data represented as a collection of RDF Language sentences/statements.

SquishQL
RDF triplestores
http://www.hpl.hp.com/techreports/2002/HPL-2002-110.html
About
SquishQL is an RDF query language with SQL notation based on Guha's rdfDB query language.

TriQL
RDF triplestores
http://wifo5-03.informatik.uni-mannheim.de/bizer/triql/
About
TriQL is an SQL-based query language for extracting information from Named Graphs. TriQL is based on
RDQL. The basic idea of TriQL is using graph patterns for querying sets of named graphs.
