3. Graph Analytics (Network Science)
Leveraging the connections between entities in a network to gain
insight about those entities and/or the network itself, using graph
algorithms.
4. 1) Why graph analytics?
2) How are graph analytics done currently?
3) What are most people dealing with?
4) Bolt-on graph analytics with GraphGen
5) The GraphGen Language
5. Graphs Across Domains
● Protein-protein interaction networks
● Financial transaction networks
● Stock trading networks
● Social networks
● Federal funds networks
● Knowledge graph
● World Wide Web
● Communication networks
● Citation networks
● ...
http://go.umd.edu/graphs
6. Example Use cases
● Financial crimes (e.g. money laundering)
● Fraudulent transactions
● Cybercrime
● Counterterrorism
● Key players in a network
● Ranking entities (web pages, PageRank)
● Providing connection recommendations to users
● Optimizing transportation routes
● Identifying weaknesses in power grids, water grids etc.
● Computer networks
● Medical research
● Disease pathology
● DNA sequencing
8. Types of Graph Analytics
● Graph “queries”: subgraph pattern matching, shortest paths, temporal queries
● Real-time analytics: anomaly/event detection, online prediction
● Batch analytics (network science): centrality analysis, community detection, network evolution
● Machine learning: matrix factorization, logistic regression modeled as message passing in specially structured graphs
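As a rough illustration of two of the task types listed above, the following sketch runs a shortest-path query (breadth-first search) and a simple centrality measure (degree centrality) on a tiny, made-up undirected graph. The graph, the function names, and the adjacency-dict representation are assumptions chosen for illustration; they do not come from any particular system.

```python
from collections import deque

def shortest_path(adj, source, target):
    """Unweighted shortest path via breadth-first search.

    Returns the list of nodes from source to target, or None if unreachable.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

# Toy undirected graph (hypothetical data), stored as adjacency lists.
adj = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

path = shortest_path(adj, "a", "e")
centrality = degree_centrality(adj)
print(path)
print(max(centrality, key=centrality.get))
```

The first function is a graph "query" (it answers a question about two specific nodes), while the second is a batch analytic (it scores every node in the graph).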
9. State of the art
● Graph Analytics tasks are too widely varied
● There is no one-size-fits-all solution
○ RDBMS/Hadoop/Spark have their tradeoffs
● Fragmented area with little consensus
❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley, Dgraph)
❖ RDF stores (AllegroGraph, Jena)
❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine, Oracle)
❖ Distributed batch processing systems (Giraph, GraphX, GraphLab): lots of ETL required!
❖ Many more research prototypes...
11. What should I use then??
● What fraction of the overall workload is graph-oriented?
● How often does some sort of graph analytics need to run?
● Do you need to do graph updates?
● What types of analytics are required?
● How large would the graphs be?
● Are you starting from scratch, or do you already have a deployed DBMS?
13. Where’s the Data?
● Most business analytics (querying, reporting, OLAP) happen in SQL
● Organizations typically model their data according to their needs
● Graph databases fit if you have strictly graph-centric workloads
14. Where’s the Data?
● Most likely organized in some type of database schema
● A collection of tables related to each other through common attributes or primary-key/foreign-key constraints
We need to extract connections between entities
16. Lots of “hidden” graphs
● Let’s take TPC-H (tables as drawn on the slide):
  ○ Part(part_key, supplier_key, ...)
  ○ Customer(customer_key, customer_name, ...)
  ○ Orders(order_key, part_key, customer_key, ...)
  ○ Supplier(supplier_key, supplier_name, ...)
● We could create edges between two customers if they’ve:
  ○ Bought the same item
  ○ Bought the same item on the same day
  ○ Bought from the same supplier
  ○ Etc.
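To make the idea of a hidden graph concrete, here is a minimal sketch that derives two of the customer-to-customer edge sets listed above with plain SQL self-joins. It uses an in-memory SQLite database and a cut-down version of the tables shown on the slide; the sample rows are made up, and the assumption that supplier_key lives on the Part table follows the slide's simplified diagram rather than the full TPC-H specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Part (part_key INTEGER PRIMARY KEY, supplier_key INTEGER);
CREATE TABLE Orders (order_key INTEGER PRIMARY KEY,
                     part_key INTEGER, customer_key INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO Part VALUES (10, 100), (11, 100), (12, 200);
INSERT INTO Orders VALUES (1, 10, 1), (2, 10, 2), (3, 11, 3), (4, 12, 4);
""")

# Edge between two customers if they bought the same item.
# The "<" comparison keeps one copy of each undirected pair and drops self-pairs.
same_item = conn.execute("""
SELECT DISTINCT o1.customer_key AS c1, o2.customer_key AS c2
FROM Orders o1
JOIN Orders o2 ON o1.part_key = o2.part_key
WHERE o1.customer_key < o2.customer_key
ORDER BY c1, c2
""").fetchall()

# Edge between two customers if they bought from the same supplier.
same_supplier = conn.execute("""
SELECT DISTINCT o1.customer_key AS c1, o2.customer_key AS c2
FROM Orders o1
JOIN Part p1 ON o1.part_key = p1.part_key
JOIN Orders o2 ON o1.customer_key < o2.customer_key
JOIN Part p2 ON o2.part_key = p2.part_key
WHERE p1.supplier_key = p2.supplier_key
ORDER BY c1, c2
""").fetchall()

print(same_item)
print(same_supplier)
```

The same tables yield different graphs depending on which join defines an edge: here the "same supplier" graph is denser than the "same item" graph because one supplier provides several parts.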
20. GraphGen
● Extract and analyze many different kinds of graphs
● Simple, intuitive, declarative language; no ETL required
● Full graph API & vertex-centric framework
23. GraphGen Explorer Web App
● Explores the database schema to detect different types of hidden graphs.
● Allows users to visually explore potential graphs.
● Simple statistics and on-the-fly analysis
Not all graphs will be useful!
26. GraphGen from Python (graphgenpy + NetworkX)
from graphgenpy import GraphGenerator
import networkx as nx

# Define GraphGen query
datalogQuery = """
Nodes(ID, Name) :- Author(ID, Name).
Edges(ID1, ID2) :- AuthorPublication(ID1, PubID), AuthorPublication(ID2, PubID).
"""

# Database credentials: host, port, database name, username, password
gg = GraphGenerator("localhost", "5432", "testgraphgen", "kostasx", "password")

# Generate and serialize the graph in GML format
fname = gg.generateGraph(datalogQuery, "extracted_graph", GraphGenerator.GML)

# Load the graph into NetworkX
G = nx.read_gml(fname, 'id')
print("Graph Loaded into NetworkX! Running PageRank...")

# Run any algorithm on the graph using NetworkX
print(nx.pagerank(G))
print("Done!")
28. GraphGen from Java: vertex-centric program
// Database credentials: establish connection to database
GraphGenerator ggen = new GraphGenerator("host", "port", "dbName",
        "username", "password");

// Define GraphGen query: define and evaluate a single graph extraction query
String datalog_query = "...";
// Extract and load graph
Graph g = ggen.generateGraph(datalog_query).get(0);

// Initialize vertex-centric object
VertexCentric p = new VertexCentric(g);

// Define vertex-centric program (compute function)
Executor program = new Executor("result_value_name") {
    @Override
    public void compute(Vertex v, VertexCentric p) {
        // implementation of compute function
    }
};

// Run program: begin execution
p.run(program);
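For readers unfamiliar with the vertex-centric model, the sketch below shows the general idea in plain Python: a user-supplied compute function is applied to every vertex in synchronized rounds (supersteps) until no value changes. This is not GraphGen's execution engine; the driver `run_vertex_centric`, the `min_label` compute function, and the toy graph are all hypothetical names and data chosen for illustration. The example computes connected components by propagating the minimum vertex ID.

```python
def run_vertex_centric(adj, compute, init, max_supersteps=50):
    """Apply `compute` to every vertex in synchronous supersteps.

    adj:     dict mapping each vertex to a list of its neighbors
    compute: function (vertex, own_value, neighbor_values) -> new value
    init:    function vertex -> initial value
    """
    values = {v: init(v) for v in adj}
    for _ in range(max_supersteps):
        # Every vertex reads only the values from the previous superstep.
        new_values = {
            v: compute(v, values[v], [values[n] for n in adj[v]])
            for v in adj
        }
        if new_values == values:  # converged: nothing changed this round
            break
        values = new_values
    return values

def min_label(vertex, value, neighbor_values):
    """Adopt the smallest label seen so far among self and neighbors."""
    return min([value] + neighbor_values)

# Toy undirected graph with three components (hypothetical data).
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}

components = run_vertex_centric(adj, min_label, init=lambda v: v)
print(components)
```

The user writes only the per-vertex logic; iteration, scheduling, and convergence checks belong to the framework, which is what lets such programs be parallelized.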
29. GraphGen from Java: full graph API
// Database credentials: establish connection to database
GraphGenerator ggen = new GraphGenerator("host", "port", "dbName",
        "username", "password");

// Define GraphGen query: define and evaluate a single graph extraction query
String datalog_query = "...";
// Extract and load graph
Graph g = ggen.generateGraph(datalog_query).get(0);

// Use the full API to access the graph
for (Vertex v : g.getVertices()) {
    // For each neighbor
    for (Vertex neighbor : v.getVertices(Direction.OUT)) {
        // Do something
    }
}
32. GraphGen DSL
● Intuitive domain-specific language based on Datalog
● User needs to specify:
  ○ How the nodes are defined
  ○ How the edges are defined
● The query is executed, and the user gets a Graph object to operate upon.
● Very expressive: allows for homogeneous and heterogeneous graphs with various types of nodes and edges.
34. GraphGen DSL Example
Nodes(ID, Name) :- Customer(ID, Name).
● Creates a node out of each row in the Customer table
  ○ Customer ID and Name as properties
Edges(ID1, ID2) :- Orders(_, partKey, ID1), Orders(_, partKey, ID2).
● Connects ID1 -> ID2 if both customers have ordered the same part
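One way to read the two rules above is as a projection and a self-join. The sketch below evaluates them literally over small in-memory tuples under plain Datalog semantics; the sample rows and names are made up, and this says nothing about how GraphGen itself executes or optimizes the query. Note that a literal reading also derives self-pairs such as (1, 1); whether GraphGen keeps or drops those is not stated here, so the filtering step is an assumption.

```python
# Hypothetical sample data. Orders tuples are (order_key, part_key, customer_key).
customer = [(1, "Alice"), (2, "Bob"), (3, "Carol")]
orders = [(100, "p1", 1), (101, "p1", 2), (102, "p2", 3)]

# Nodes(ID, Name) :- Customer(ID, Name).
# One node per Customer row, with Name kept as a property.
nodes = {cid: {"Name": name} for cid, name in customer}

# Edges(ID1, ID2) :- Orders(_, partKey, ID1), Orders(_, partKey, ID2).
# The shared variable partKey acts as the join condition between the
# two occurrences of Orders; "_" marks a column that is ignored.
edges = {
    (id1, id2)
    for _, part1, id1 in orders
    for _, part2, id2 in orders
    if part1 == part2
}

# Assumption: drop self-pairs, which the literal rule also produces.
edges_no_loops = sorted(e for e in edges if e[0] != e[1])

print(nodes)
print(edges_no_loops)
```

Because the rule is symmetric in ID1 and ID2, each co-purchase shows up in both directions, which is why the result contains both (1, 2) and (2, 1).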
35. GraphGen
● Enables extraction of different types of hidden graphs
● Independent of where the data is stored (as long as it is accessible through SQL)
● Enables complex analytics over the extracted graphs
● Efficient extraction through various in-memory representations
● Efficient analysis through a parallel execution engine
● Effortless through a declarative language
● Eliminates the need for complex ETL
● Intuitive and swift analysis of any graph that exists in your data!