SlideShare a Scribd company logo
1 of 37
Download to read offline
GraphGen: Conducting
Graph Analytics over
Relational Databases
Konstantinos Xirogiannopoulos
Amol Deshpande
collaborated
Name:
Konstantinos
Name:
Amol
Name: University of MarylandName: PyData DC
Year: 2016
gave_talk
works_at
works_at
Graph Analytics:
(Network Science)
Leveraging of connections between
entities in a network towards gaining
insight about said entities and/or the
network via the use of graph
algorithms.
1) Why graph analytics?
2) How are graph analytics done currently?
3) What are most people dealing with?
4) Bolt-on graph analytics with GraphGen
5) The GraphGen Language
Graphs Across Domains
Protein-protein
interaction networks
Financial transaction
networks
Stock Trading Networks
Social Networks
Federal Funds Networks
Knowledge Graph
World Wide Web
Communication
Networks
Citation Networks
…...
http://go.umd.edu/graphs
Example Use cases
● Financial crimes
(e.g. money
laundering)
● Fraudulent
transactions
● Cybercrime
● Counterterrorism
● Key players in a
network
● Ranking entities (web
pages, PageRank)
● Providing connection
recommendations to
users
● Optimizing
transportation
routes
● Identifying
weaknesses in
power grids, water
grids etc.
● Computer networks
● Medical Research
● Disease pathology
● DNA Sequencing
1) Why graph analytics?
2) How are graph analytics
done currently?
3) What are most people dealing with?
4) Bolt-on graph analytics with GraphGen
5) The GraphGen Language
Types of Graph Analytics
● Graph “queries”: Subgraph pattern matching, shortest
paths, temporal queries
● Real Time Analytics: Anomaly/Event detection, online
prediction
● Batch Analytics (Network Science): Centrality analysis,
community detection, network evolution
● Machine Learning: Matrix factorization, logistic
regression modeled as message passing in specially
structured graphs.
http://go.umd.edu/graphs
State of the art
● Graph Analytics tasks are too widely varied
http://go.umd.edu/graphs
● There is no one-size-fits-all solution
○ RDBMS/Hadoop/Spark have their tradeoffs
● Fragmented area with little consensus
❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph)
❖ RDF stores (Allegrograph, Jena)
❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine,
Oracle)
❖ Distributed batch processing systems (Giraph, GraphX,
GraphLab) Lots of ETL required!
❖ Many more research prototypes...
Different Analytics Flows
Other SystemsGraph Databases Bolt-On Solutions
What should I use then??
● What fraction of the overall workload is
graph-oriented?
● How often are some sort of graph analytics
required to run?
● Do you need to do graph updates?
● What types of analytics are required?
● How large would the graphs be?
● Are you starting from scratch or do you have an
already deployed DBMS?
1) Why graph analytics?
2) How are graph analytics done currently?
3) What are most people
dealing with?
4) Bolt-on graph analytics with GraphGen
5) The GraphGen Language
● Most business analytics (querying, reporting,
OLAP) happen in SQL
● Organizations typically model their data
according to their needs
● Graph databases if you have strictly
graph-centric workloads
Where’s the Data?
Where’s the Data?
● Most likely organized in some type of database schema
● Collection of tables related to each-other through
common attributes, or primary, foreign-key constraints.
We need to extract connections between entities
Most Likely...
Lots of “hidden” graphs
● Let’s take TPC-H.
part_key
Part
supplier_key
...
customer_key
Customer
customer_name
...
order_key
Orders
part_key
customer_key
...
supplier_key
Supplier
supplier_name
...
● We could create edges
between two customers if
they’ve:
○ Bought the same item
○ Bought the same item on
the same day
○ Bought from the same
supplier
○ Etc.
State of the art
● Graph Analytics tasks are too widely varied
http://go.umd.edu/graphs
● There is no one-size-fits-all solution
○ RDBMS/Hadoop/Spark have their tradeoffs
● Fragmented area with little consensus
❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph)
❖ RDF stores (Allegrograph, Jena)
❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine,
Oracle)
❖ Distributed batch processing systems (Giraph, GraphX,
GraphLab) Lots of ETL required!
❖ Many more research prototypes...
State of the art
● Graph Analytics tasks are too widely varied
http://go.umd.edu/graphs
● There is no one-size-fits-all solution
○ RDBMS/Hadoop/Spark have their tradeoffs
● Fragmented area with little consensus
❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph)
❖ RDF stores (Allegrograph, Jena)
❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine,
Oracle)
❖ Distributed batch processing systems (Giraph, GraphX,
GraphLab) Lots of ETL required!
❖ Many more research prototypes...
1) Why graph analytics?
2) How are graph analytics done currently?
3) What are most people dealing with?
4) Bolt-on graph analytics
with GraphGen
5) The GraphGen Language
GraphGen
Extract and analyze
many different kinds
of graphs
Simple, Intuitive,
Declarative Language,
No ETL required
Full Graph API & Vertex
Centric Framework
GraphGen Interfaces
Native Java LibraryPython wrapper LibraryGraphGen Explorer: UI
Web Application
Graphgen Explorer Web App
● Exploration of database schema to detect
different types of hidden graphs.
● Allows users to visually explore potential
graphs.
● Simple statistic and on-the-fly analysis
Not all graphs will be useful!
GraphGen Explorer Web App
GraphgenPy in Python
from graphgenpy import GraphGenerator
import networkx as nx
datalogQuery = """
Nodes(ID, Name) :- Author(ID, Name).
Edges(ID1, ID2) :- AuthorPublication(ID1, PubID), AuthorPublication(ID2, PubID).
"""
# Credentials for connecting to the database
gg = GraphGenerator("localhost","5432","testgraphgen","kostasx","password")
fname = gg.generateGraph(datalogQuery,"extracted_graph",GraphGenerator.GML)
G = nx.read_gml(fname,'id')
print "Graph Loaded into NetworkX! Running PageRank..."
# Run any algorithm on the graph using NetworkX
print nx.pagerank(G)
print "Done!"
Define GraphGen Query
Database Credentials
Generate and
Serialize Graph
Load Graph into
NetworkX
Run Any Algorithm
Native GraphGen in Java
// Establish Connection to Database
GraphGenerator ggen = new GraphGenerator("host", "port", "dbName",
"username", "password");
// Define and evaluate a single graph extraction query
String datalog_query = "...";
Graph g = ggen.generateGraph(datalog_query).get(0);
// Initialize vertec-centric object
VertexCentric p = new VertexCentric(g);
// Define vertex-centric compute function
Executor program = new Executor("result_value_name") {
@Override
public void compute(Vertex v, VertexCentric p) {
// implementation of compute function
}
};
// Begin execution
p.run(program);
Define GraphGen Query
Database Credentials
Extract and Load
Graph
Define Vertex
Centric Program
Run Program
// Establish Connection to Database
GraphGenerator ggen = new GraphGenerator("host", "port", "dbName",
"username", "password");
// Define and evaluate a single graph extraction query
String datalog_query = "...";
Graph g = ggen.generateGraph(datalog_query).get(0);
for (Vertex v : g.getVertices()) {
// For each neighbor
for (Vertex neighbor : v.getVertices(Direction.OUT)) {
// Do something
}
}
Define GraphGen Query
Database Credentials
Extract and Load
Graph
Use Full API to
access the Graph
GraphGen Back-End Architecture
1) Why graph analytics?
2) How are graph analytics done currently?
3) What are most people dealing with?
4) Bolt-on graph analytics with GraphGen
5) The GraphGen Language
GraphGen DSL
● Intuitive Domain Specific Language based on Datalog
● User needs to specify:
○ How the nodes are defined
○ How the edges are defined
● The query is executed, and the user gets a Graph object
to operate upon.
● Very expressive: Allows for homogeneous and
heterogeneous graphs with various types of nodes and
edges.
TPC-H Database
partKey
Part
supplierKey
...
customerKey
Customer
customerName
...
● We want to explore a
graph of customers!
● Using the GraphGen
Language:
○ Which tables do
we need to
combine to extract
the nodes and
edges
orderKey
Orders
partKey
customerKey
...
supplierKey
Supplier
supplierName
...
GraphGen DSL Example
Nodes(ID, Name) :- Customer(ID, Name).
● Creates a node out of each row in the Customer table
■ Customer ID and Name as properties
Edges(ID1, ID2) :-
Orders(_,partKey, ID1), Orders(_,partKey, ID2).
● Connect ID1 -> ID2 if they have both ordered the same part
GraphGen
● Enable extraction of
different types of hidden
graphs
● Independent of where the
data is stored (given SQL)
● Enable complex analytics
over the extracted graphs
● Efficient extraction
through various
in-memory
representations
● Efficient analysis
through a parallel
execution engine
● Effortless through a
Declarative Language
● Eliminates the need
for complex ETL
● Intuitive and swift
analysis of any graph
that exists in your
data!
Download GraphGen at:
konstantinosx.github.io/graphgen-project/
DDL Blog Post at:
blog.districtdatalabs.com/graph-analytics-over-relational-datasets
Email: kostasx@cs.umd.edu
Twitter: @kxirog
Download GraphGen at:
konstantinosx.github.io/graphgen-project/
Thank you!

More Related Content

What's hot

GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and GiraphDoug Needham
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis PatternsMikio L. Braun
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big dataSigmoid
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Congressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4jCongressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4jWilliam Lyon
 
GraphQL Schema Stitching with Prisma & Contentful
GraphQL Schema Stitching with Prisma & ContentfulGraphQL Schema Stitching with Prisma & Contentful
GraphQL Schema Stitching with Prisma & ContentfulNikolas Burk
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Next-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and PrismaNext-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and PrismaNikolas Burk
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in SparkPaco Nathan
 
A general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JA general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JFlorent Biville
 
Interpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsInterpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsNeo4j
 
Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Taro L. Saito
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncFranz Inc. - AllegroGraph
 
Python網站框架絕技: Django 完全攻略班
Python網站框架絕技: Django 完全攻略班Python網站框架絕技: Django 完全攻略班
Python網站框架絕技: Django 完全攻略班Paul Chao
 
Www Search Engine But Not In Perl
Www Search Engine But Not In PerlWww Search Engine But Not In Perl
Www Search Engine But Not In PerlKonstantin Ivinsky
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupMárton Kodok
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and SparkArtem Chebotko
 

What's hot (20)

GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
Realtime Data Analysis Patterns
Realtime Data Analysis PatternsRealtime Data Analysis Patterns
Realtime Data Analysis Patterns
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
 
Congressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4jCongressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4j
 
GraphQL Schema Stitching with Prisma & Contentful
GraphQL Schema Stitching with Prisma & ContentfulGraphQL Schema Stitching with Prisma & Contentful
GraphQL Schema Stitching with Prisma & Contentful
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Next-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and PrismaNext-generation API Development with GraphQL and Prisma
Next-generation API Development with GraphQL and Prisma
 
Graph Analytics in Spark
Graph Analytics in SparkGraph Analytics in Spark
Graph Analytics in Spark
 
A general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4JA general introduction to Spring Data / Neo4J
A general introduction to Spring Data / Neo4J
 
Interpreting Relational Schema to Graphs
Interpreting Relational Schema to GraphsInterpreting Relational Schema to Graphs
Interpreting Relational Schema to Graphs
 
Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz Inc
 
Python網站框架絕技: Django 完全攻略班
Python網站框架絕技: Django 完全攻略班Python網站框架絕技: Django 完全攻略班
Python網站框架絕技: Django 完全攻略班
 
Www Search Engine But Not In Perl
Www Search Engine But Not In PerlWww Search Engine But Not In Perl
Www Search Engine But Not In Perl
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Big Data-Driven Applications with Cassandra and Spark
Big Data-Driven Applications  with Cassandra and SparkBig Data-Driven Applications  with Cassandra and Spark
Big Data-Driven Applications with Cassandra and Spark
 

Viewers also liked

20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetupRik Van Bruggen
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Spark Summit
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypetPyData
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014PyData
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...PyData
 
Nipype
NipypeNipype
NipypePyData
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"PyData
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014PyData
 
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...PyData
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...PyData
 
Interactive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves HilpischInteractive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves HilpischPyData
 
How to apply graph analytics for bank loan fraud detection?
How to apply graph analytics for bank loan fraud detection?How to apply graph analytics for bank loan fraud detection?
How to apply graph analytics for bank loan fraud detection?Linkurious
 
Github in a Graph
Github in a GraphGithub in a Graph
Github in a Graphakollegger
 
10-15 511 genetic algorithms and machine learning (alan nochenson)
10-15 511 genetic algorithms and machine learning (alan nochenson)10-15 511 genetic algorithms and machine learning (alan nochenson)
10-15 511 genetic algorithms and machine learning (alan nochenson)Alan Nochenson
 
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...butest
 
How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...PyData
 
Finding the insights hidden in your graph data
Finding the insights hidden in your graph dataFinding the insights hidden in your graph data
Finding the insights hidden in your graph dataDataStax
 
Python resampling
Python resamplingPython resampling
Python resamplingPyData
 
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerFaster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerPyData
 
Doing frequentist statistics with scipy
Doing frequentist statistics with scipyDoing frequentist statistics with scipy
Doing frequentist statistics with scipyPyData
 

Viewers also liked (20)

20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup20141216 graph database prototyping ams meetup
20141216 graph database prototyping ams meetup
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
 
Nipype
NipypeNipype
Nipype
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
 
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
Dark Data: A Data Scientists Exploration of the Unknown by Rob Witoff PyData ...
 
Numba: Flexible analytics written in Python with machine-code speeds and avo...
Numba:  Flexible analytics written in Python with machine-code speeds and avo...Numba:  Flexible analytics written in Python with machine-code speeds and avo...
Numba: Flexible analytics written in Python with machine-code speeds and avo...
 
Interactive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves HilpischInteractive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
Interactive Financial Analytics with Python & Ipython by Dr Yves Hilpisch
 
How to apply graph analytics for bank loan fraud detection?
How to apply graph analytics for bank loan fraud detection?How to apply graph analytics for bank loan fraud detection?
How to apply graph analytics for bank loan fraud detection?
 
Github in a Graph
Github in a GraphGithub in a Graph
Github in a Graph
 
10-15 511 genetic algorithms and machine learning (alan nochenson)
10-15 511 genetic algorithms and machine learning (alan nochenson)10-15 511 genetic algorithms and machine learning (alan nochenson)
10-15 511 genetic algorithms and machine learning (alan nochenson)
 
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...Machine Learning, Data Mining, Genetic Algorithms, Neural ...
Machine Learning, Data Mining, Genetic Algorithms, Neural ...
 
How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...
 
Finding the insights hidden in your graph data
Finding the insights hidden in your graph dataFinding the insights hidden in your graph data
Finding the insights hidden in your graph data
 
Python resampling
Python resamplingPython resampling
Python resampling
 
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike MullerFaster Python Programs Through Optimization by Dr.-Ing Mike Muller
Faster Python Programs Through Optimization by Dr.-Ing Mike Muller
 
Doing frequentist statistics with scipy
Doing frequentist statistics with scipyDoing frequentist statistics with scipy
Doing frequentist statistics with scipy
 

Similar to GraphGen: Conducting Graph Analytics over Relational Databases

Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataTrieu Nguyen
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patengeKarin Patenge
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 CareerBuilder.com
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...TigerGraph
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphTrey Grainger
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022ArangoDB Database
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ArangoDB Database
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?Samet KILICTAS
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGene Leybzon
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detectionRik Van Bruggen
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RGraphRM
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Demi Ben-Ari
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Neo4j
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucFraugster
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONScscpconf
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions csandit
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022ArangoDB Database
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsChristopherWoodward16
 

Similar to GraphGen: Conducting Graph Analytics over Relational Databases (20)

Lambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big dataLambda Architecture and open source technology stack for real time big data
Lambda Architecture and open source technology stack for real time big data
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge
 
Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'Handout: 'Open Source Tools & Resources'
Handout: 'Open Source Tools & Resources'
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018 SDSC18 and DSATL Meetup March 2018
SDSC18 and DSATL Meetup March 2018
 
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
Hardware Accelerated Machine Learning Solution for Detecting Fraud and Money ...
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
 
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
Machine Learning + Graph Databases for Better Recommendations V2 08/20/2022
 
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
ATO 2022 - Machine Learning + Graph Databases for Better Recommendations (3)....
 
How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
Graph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d appsGraph protocol for accessing information about blockchains and d apps
Graph protocol for accessing information about blockchains and d apps
 
201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection201411203 goto night on graphs for fraud detection
201411203 goto night on graphs for fraud detection
 
aRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con RaRangodb, un package per l'utilizzo di ArangoDB con R
aRangodb, un package per l'utilizzo di ArangoDB con R
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONSBIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
BIG GRAPH: TOOLS, TECHNIQUES, ISSUES, CHALLENGES AND FUTURE DIRECTIONS
 
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
Big Graph : Tools, Techniques, Issues, Challenges and Future Directions
 
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
Machine Learning + Graph Databases for Better Recommendations V1 08/06/2022
 
Machine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better RecommendationsMachine Learning + Graph Databases for Better Recommendations
Machine Learning + Graph Databases for Better Recommendations
 

More from PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

More from PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

GraphGen: Conducting Graph Analytics over Relational Databases

  • 1. GraphGen: Conducting Graph Analytics over Relational Databases Konstantinos Xirogiannopoulos Amol Deshpande
  • 2. collaborated Name: Konstantinos Name: Amol Name: University of MarylandName: PyData DC Year: 2016 gave_talk works_at works_at
  • 3. Graph Analytics: (Network Science) Leveraging of connections between entities in a network towards gaining insight about said entities and/or the network via the use of graph algorithms.
  • 4. 1) Why graph analytics? 2) How are graph analytics done currently? 3) What are most people dealing with? 4) Bolt-on graph analytics with GraphGen 5) The GraphGen Language
  • 5. Graphs Across Domains Protein-protein interaction networks Financial transaction networks Stock Trading Networks Social Networks Federal Funds Networks Knowledge Graph World Wide Web Communication Networks Citation Networks …... http://go.umd.edu/graphs
  • 6. Example Use cases ● Financial crimes (e.g. money laundering) ● Fraudulent transactions ● Cybercrime ● Counterterrorism ● Key players in a network ● Ranking entities (web pages, PageRank) ● Providing connection recommendations to users ● Optimizing transportation routes ● Identifying weaknesses in power grids, water grids etc. ● Computer networks ● Medical Research ● Disease pathology ● DNA Sequencing
  • 7. 1) Why graph analytics? 2) How are graph analytics done currently? 3) What are most people dealing with? 4) Bolt-on graph analytics with GraphGen 5) The GraphGen Language
  • 8. Types of Graph Analytics ● Graph “queries”: Subgraph pattern matching, shortest paths, temporal queries ● Real Time Analytics: Anomaly/Event detection, online prediction ● Batch Analytics (Network Science): Centrality analysis, community detection, network evolution ● Machine Learning: Matrix factorization, logistic regression modeled as message passing in specially structured graphs. http://go.umd.edu/graphs
  • 9. State of the art ● Graph Analytics tasks are too widely varied http://go.umd.edu/graphs ● There is no one-size-fits-all solution ○ RDBMS/Hadoop/Spark have their tradeoffs ● Fragmented area with little consensus ❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph) ❖ RDF stores (Allegrograph, Jena) ❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine, Oracle) ❖ Distributed batch processing systems (Giraph, GraphX, GraphLab) Lots of ETL required! ❖ Many more research prototypes...
  • 10. Different Analytics Flows Other SystemsGraph Databases Bolt-On Solutions
  • 11. What should I use then?? ● What fraction of the overall workload is graph-oriented? ● How often are some sort of graph analytics required to run? ● Do you need to do graph updates? ● What types of analytics are required? ● How large would the graphs be? ● Are you starting from scratch or do you have an already deployed DBMS?
  • 12. 1) Why graph analytics? 2) How are graph analytics done currently? 3) What are most people dealing with? 4) Bolt-on graph analytics with GraphGen 5) The GraphGen Language
  • 13. ● Most business analytics (querying, reporting, OLAP) happen in SQL ● Organizations typically model their data according to their needs ● Graph databases if you have strictly graph-centric workloads Where’s the Data?
  • 14. Where’s the Data? ● Most likely organized in some type of database schema ● Collection of tables related to each-other through common attributes, or primary, foreign-key constraints. We need to extract connections between entities
  • 16. Lots of “hidden” graphs ● Let’s take TPC-H. part_key Part supplier_key ... customer_key Customer customer_name ... order_key Orders part_key customer_key ... supplier_key Supplier supplier_name ... ● We could create edges between two customers if they’ve: ○ Bought the same item ○ Bought the same item on the same day ○ Bought from the same supplier ○ Etc.
  • 17. State of the art ● Graph Analytics tasks are too widely varied http://go.umd.edu/graphs ● There is no one-size-fits-all solution ○ RDBMS/Hadoop/Spark have their tradeoffs ● Fragmented area with little consensus ❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph) ❖ RDF stores (Allegrograph, Jena) ❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine, Oracle) ❖ Distributed batch processing systems (Giraph, GraphX, GraphLab) Lots of ETL required! ❖ Many more research prototypes...
  • 18. State of the art ● Graph Analytics tasks are too widely varied http://go.umd.edu/graphs ● There is no one-size-fits-all solution ○ RDBMS/Hadoop/Spark have their tradeoffs ● Fragmented area with little consensus ❖ Specialized graph databases (Neo4j, Titan, Blazegraph, Cayley,Dgraph) ❖ RDF stores (Allegrograph, Jena) ❖ Bolt-on solutions (Teradata SQL-Graph, SAP Graph Engine, Oracle) ❖ Distributed batch processing systems (Giraph, GraphX, GraphLab) Lots of ETL required! ❖ Many more research prototypes...
  • 19. 1) Why graph analytics? 2) How are graph analytics done currently? 3) What are most people dealing with? 4) Bolt-on graph analytics with GraphGen 5) The GraphGen Language
  • 20. GraphGen Extract and analyze many different kinds of graphs Simple, Intuitive, Declarative Language, No ETL required Full Graph API & Vertex Centric Framework
  • 21. GraphGen Interfaces Native Java LibraryPython wrapper LibraryGraphGen Explorer: UI Web Application
  • 23. ● Exploration of database schema to detect different types of hidden graphs. ● Allows users to visually explore potential graphs. ● Simple statistic and on-the-fly analysis Not all graphs will be useful! GraphGen Explorer Web App
  • 24.
  • 26. from graphgenpy import GraphGenerator import networkx as nx datalogQuery = """ Nodes(ID, Name) :- Author(ID, Name). Edges(ID1, ID2) :- AuthorPublication(ID1, PubID), AuthorPublication(ID2, PubID). """ # Credentials for connecting to the database gg = GraphGenerator("localhost","5432","testgraphgen","kostasx","password") fname = gg.generateGraph(datalogQuery,"extracted_graph",GraphGenerator.GML) G = nx.read_gml(fname,'id') print "Graph Loaded into NetworkX! Running PageRank..." # Run any algorithm on the graph using NetworkX print nx.pagerank(G) print "Done!" Define GraphGen Query Database Credentials Generate and Serialize Graph Load Graph into NetworkX Run Any Algorithm
  • 28. // Establish Connection to Database GraphGenerator ggen = new GraphGenerator("host", "port", "dbName", "username", "password"); // Define and evaluate a single graph extraction query String datalog_query = "..."; Graph g = ggen.generateGraph(datalog_query).get(0); // Initialize vertec-centric object VertexCentric p = new VertexCentric(g); // Define vertex-centric compute function Executor program = new Executor("result_value_name") { @Override public void compute(Vertex v, VertexCentric p) { // implementation of compute function } }; // Begin execution p.run(program); Define GraphGen Query Database Credentials Extract and Load Graph Define Vertex Centric Program Run Program
  • 29. // Establish Connection to Database GraphGenerator ggen = new GraphGenerator("host", "port", "dbName", "username", "password"); // Define and evaluate a single graph extraction query String datalog_query = "..."; Graph g = ggen.generateGraph(datalog_query).get(0); for (Vertex v : g.getVertices()) { // For each neighbor for (Vertex neighbor : v.getVertices(Direction.OUT)) { // Do something } } Define GraphGen Query Database Credentials Extract and Load Graph Use Full API to access the Graph
  • 31. 1) Why graph analytics? 2) How are graph analytics done currently? 3) What are most people dealing with? 4) Bolt-on graph analytics with GraphGen 5) The GraphGen Language
  • 32. GraphGen DSL ● Intuitive Domain Specific Language based on Datalog ● User needs to specify: ○ How the nodes are defined ○ How the edges are defined ● The query is executed, and the user gets a Graph object to operate upon. ● Very expressive: Allows for homogeneous and heterogeneous graphs with various types of nodes and edges.
  • 33. TPC-H Database partKey Part supplierKey ... customerKey Customer customerName ... ● We want to explore a graph of customers! ● Using the GraphGen Language: ○ Which tables do we need to combine to extract the nodes and edges orderKey Orders partKey customerKey ... supplierKey Supplier supplierName ...
  • 34. GraphGen DSL Example Nodes(ID, Name) :- Customer(ID, Name). ● Creates a node out of each row in the Customer table ■ Customer ID and Name as properties Edges(ID1, ID2) :- Orders(_,partKey, ID1), Orders(_,partKey, ID2). ● Connect ID1 -> ID2 if they have both ordered the same part
  • 35. GraphGen ● Enable extraction of different types of hidden graphs ● Independent of where the data is stored (given SQL) ● Enable complex analytics over the extracted graphs ● Efficient extraction through various in-memory representations ● Efficient analysis through a parallel execution engine ● Effortless through a Declarative Language ● Eliminates the need for complex ETL ● Intuitive and swift analysis of any graph that exists in your data!
  • 36. Download GraphGen at: konstantinosx.github.io/graphgen-project/ DDL Blog Post at: blog.districtdatalabs.com/graph-analytics-over-relational-datasets
  • 37. Email: kostasx@cs.umd.edu Twitter: @kxirog Download GraphGen at: konstantinosx.github.io/graphgen-project/ Thank you!