Introduction to
Graph Databases
For Melbourne Data Engineering
by Timothy Findlay
What is a graph store?
– Tool for storing and retrieving data
– Optimized for highly related data, where many things are connected to many other things
– There are many implementations, such as Neo4j, Dgraph, ArangoDB, OrientDB, Titan/JanusGraph and DSE
Why use a graph?
• High performance queries at scale
• Many-to-many relationships
Why not?
• Slow range scans / initial seek
• Fixed depth, short scans
• Super nodes, e.g. everything is connected to everything
[Figure: example graph with vertices Stuart, Bob and Kevin]
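The super node problem above can be made concrete by counting how many edges touch each vertex. The sketch below is a minimal illustration, not part of any graph database API; the edge list, the shared "Melbourne" vertex and the threshold are all made-up assumptions.

```python
from collections import Counter

def find_super_nodes(edges, threshold):
    """Return vertices whose degree (in + out) is at least `threshold`."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return sorted(v for v, d in degree.items() if d >= threshold)

# Hypothetical data: every person is linked to one shared "Melbourne" vertex.
edges = [
    ("Stuart", "Bob"),
    ("Stuart", "Melbourne"),
    ("Bob", "Melbourne"),
    ("Kevin", "Melbourne"),
]

print(find_super_nodes(edges, threshold=3))
```

A vertex that shows up here is one that every traversal tends to pass through, which is where fan-out (and query cost) grows fastest.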
Data model problems
– Relational databases are great at modelling relationships, but can struggle at scale when a query needs a high number of joins
– Some application databases are not designed for analytical workloads which require joins
– Some warehouses are not designed to provide easy access to a variety of information without slow and complex joining
[Figure: graph of Person, Address and Credit Card vertices]
Vertices can be labelled to form virtual layers that partition the data, e.g.
– Find the address for each IP
– Find the buildings affected by a cable
[Figure: vertices labelled by layer]
– Layer: TCP/IP: 10.1.2.3, 10.1.2.4
– Layer: Physical: Cable 00154672, 08:00:27:3d:90:82, 08:00:27:73:4b:89
– Layer: Building: Desk: WRK12:05, CableDuct: W775-12
– Unlabelled: Mobile, Mobile, Credit Card
Examples of traversing relationships
[Figure: Layer: Address vertices 1 Someplace Ave, 1-2 Someplace Ave and 2 Someplace Ave]
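A question such as "find buildings affected by a cable" amounts to a traversal from one vertex, followed by a filter on the layer label. Below is a minimal Python sketch of that idea. The vertex names and layers come from the slide, but the edges between them are assumptions, since the slide text does not say which vertices are connected.

```python
from collections import deque

# Each vertex carries a layer label, as on the slide.
layers = {
    "Cable 00154672": "Physical",
    "08:00:27:3d:90:82": "Physical",
    "08:00:27:73:4b:89": "Physical",
    "10.1.2.3": "TCP/IP",
    "10.1.2.4": "TCP/IP",
    "Desk: WRK12:05": "Building",
    "CableDuct: W775-12": "Building",
}

# Assumed edges (not stated in the source), treated as undirected here.
edges = [
    ("Cable 00154672", "CableDuct: W775-12"),
    ("Cable 00154672", "08:00:27:3d:90:82"),
    ("Cable 00154672", "08:00:27:73:4b:89"),
    ("08:00:27:3d:90:82", "10.1.2.3"),
    ("08:00:27:73:4b:89", "10.1.2.4"),
    ("08:00:27:3d:90:82", "Desk: WRK12:05"),
]

def reachable_in_layer(start, target_layer):
    """Breadth-first traversal from `start`, keeping vertices in `target_layer`."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(v for v in seen if v != start and layers[v] == target_layer)

print(reachable_in_layer("Cable 00154672", "Building"))
```

In a real graph store the layer filter would normally be part of the query itself, so that the engine can prune the traversal instead of filtering afterwards.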
Directionality
[Figure: three House vertices in Suburb: Kensington, with material vertices Bricks, Wood and Asbestos, and a Train Station vertex]
A material may exist in a house and a house may exist in a suburb, but you may never know both sides of the equation unless you search by house and traverse outward.
For directed graphs, direction is a key design consideration and needs to be considered carefully up front.
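The effect of edge direction can be sketched in a few lines of Python. The example assumes edges are stored only as outbound adjacency (house to material, house to suburb). The house names and the pairing of materials to houses are invented for illustration.

```python
from collections import defaultdict

# Directed edges, stored outbound only: house -> material, house -> suburb.
outbound = {
    "House 1": ["Bricks", "Kensington"],
    "House 2": ["Wood", "Kensington"],
    "House 3": ["Asbestos", "Kensington"],
}

def traverse_out(vertex):
    """Follow outbound edges only, as a strictly directed traversal would."""
    return outbound.get(vertex, [])

def build_inbound(outbound_edges):
    """Derive the reverse index needed to answer 'which houses contain X?'."""
    inbound = defaultdict(list)
    for source, targets in outbound_edges.items():
        for target in targets:
            inbound[target].append(source)
    return dict(inbound)

print(traverse_out("House 3"))   # starting from the house works
print(traverse_out("Asbestos"))  # starting from the material finds nothing
print(build_inbound(outbound)["Asbestos"])
```

Whether the reverse lookup is cheap, expensive or unavailable depends on the database, which is why the direction of each edge type is worth deciding before loading data.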
Fancy features
– Vertex revisions
[Figure: Person and Computer vertices shown at different revisions]
Some graph databases support versioning of vertices.
This can be used as a form of version control for audit, or to see what traversals looked like at a different point in time.
It could also be used to compare points in time to understand changes in traversals, e.g. changes to the route of a network, plumbing or transport system.
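One way to picture point-in-time traversal is effective dating on edges: each edge records when it was valid, and a traversal only follows edges valid at the requested date. The sketch below is an assumption-laden illustration in plain Python, not how any particular database implements revisions. The vertex names and dates are made up.

```python
from datetime import date

# (source, target, valid_from, valid_to); valid_to=None means "still current".
edges = [
    ("Person", "Computer A", date(2018, 1, 1), date(2018, 6, 30)),
    ("Person", "Computer B", date(2018, 7, 1), None),
]

def neighbours_at(vertex, when):
    """Return the targets reachable from `vertex` via edges valid on `when`."""
    result = []
    for source, target, valid_from, valid_to in edges:
        if source != vertex:
            continue
        if valid_from <= when and (valid_to is None or when <= valid_to):
            result.append(target)
    return result

def changed_neighbours(vertex, before, after):
    """Compare two points in time and report what the traversal gained or lost."""
    old = set(neighbours_at(vertex, before))
    new = set(neighbours_at(vertex, after))
    return {"removed": sorted(old - new), "added": sorted(new - old)}

print(neighbours_at("Person", date(2018, 3, 1)))
print(changed_neighbours("Person", date(2018, 3, 1), date(2018, 12, 15)))
```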
• Event Sourcing & Triggers
  – Consider integrating Flume/Flink/NiFi pipelines into a graph
  – Event sourcing can be a powerful way of establishing edges on the fly from a Kafka topic or JMS message
[Figure: Company vertex linked to Event1, Event2, Event3 and Event4]
Rule: if Company has Event2 + Event4, then …
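The rule above (a company that has both Event2 and Event4 triggers some action) can be sketched as a small stream consumer. In this illustration a Python list stands in for the Kafka topic or JMS queue. The slide does not say what the action is, so the "Flagged" vertex and the company names are hypothetical.

```python
def apply_events(events, required=frozenset({"Event2", "Event4"})):
    """Consume (company, event) pairs and emit an edge once a company
    has seen every event in `required`."""
    seen = {}   # company -> set of events observed so far
    edges = []  # edges created on the fly
    for company, event in events:
        company_events = seen.setdefault(company, set())
        company_events.add(event)
        edge = (company, "Flagged")  # hypothetical target vertex
        if required <= company_events and edge not in edges:
            edges.append(edge)
    return edges

# Hypothetical stream; in practice these would arrive from a topic or queue.
stream = [
    ("Acme", "Event1"),
    ("Acme", "Event2"),
    ("Globex", "Event4"),
    ("Acme", "Event4"),
    ("Acme", "Event3"),
]

print(apply_events(stream))
```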
Variations between implementations
– Storage layers – RocksDB, Cassandra, BerkeleyDB, in-memory
– Types of query languages – Gremlin, GraphQL+, openCypher
– Different cluster technologies (Partitioning/Sharding)
– Enforced Schema and Schema-less support
– Different data types (document stores vs key/value pairs)
– Geospatial support
– Support for database functions
– Record versioning / effective dating
Terminology
RDBMS / SQL  | GRAPH / GRAPHQL
database     | database
table        | collection
row          | document / vertex
column       | attribute
table joins  | collection joins (graphs call these edges)
primary key  | primary key (automatically present on the _key attribute)
index        | index
How to use a graph store?
– Starting
  – docker run -p 8529:8529 arangodb/arangodb:3.0.10
– Putting data in
  – echo '{ "name" : "Timothy" }' | curl --basic --user "root:openSesame" -X POST --data-binary @- --dump - http://localhost:8529/_api/gharial/MyPeople/vertex/people
– Pulling data out
  – curl --basic --user "root:openSesame" -X GET http://arangodb:8529/_api/document/people/timothyfindlay/
  – curl -X POST --dump - --data-binary '{ "query" : "FOR x IN people FILTER IS_IN_POLYGON( x.loc, [ 153.090667 , -27.247108 ] , true ) == true RETURN [ x.name ]" }' http://arangodb:8529/_api/cursor
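When a query takes parameters, ArangoDB's HTTP cursor endpoint accepts them in a `bindVars` object alongside the `query` string, and every bind variable supplied should be referenced in the query as `@name`. A minimal sketch of building such a request body in Python follows. The helper name `build_cursor_request` and the example query are illustrative assumptions, and nothing is sent over the network.

```python
import json

def build_cursor_request(query, bind_vars=None):
    """Build the JSON body for a POST to /_api/cursor."""
    body = {"query": query}
    if bind_vars:
        # Only include bindVars when the query actually uses them.
        body["bindVars"] = bind_vars
    return json.dumps(body)

payload = build_cursor_request(
    "FOR x IN people FILTER x.name == @name RETURN x.name",
    {"name": "Timothy"},
)
print(payload)
```

The resulting string can be passed to curl with `--data-binary`, or to any HTTP client.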
This presentation is NOT sponsored by Arango GMBH
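Python (e.g. Airflow operator)
from pyArango.connection import Connection
import names  # third-party package that generates random names

conn = Connection(arangoURL="http://10.1.20.6:8529", username="root", password="openSesame")
conn.createDatabase(name="MelbourneDEM")
db = conn["MelbourneDEM"]
table = db.createCollection(name="people")
for i in range(1000):
    doc = table.createDocument()
    # ArangoDB document keys cannot contain spaces, so join the name parts with "_"
    doc["_key"] = names.get_full_name().replace(" ", "_")
    # Persist the document; this can fail if a generated key already exists
    doc.save()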
CLI (e.g. cron batch processing)
arangoimp --on-duplicate ignore \
    --log.level warn \
    --server.endpoint http+tcp://10.1.20.6:8529 \
    --server.authentication 'true' \
    --server.username root \
    --server.password openSesame \
    --server.database MelbourneDEM \
    --type csv \
    --create-collection true \
    --create-collection-type edge \
    --file edges.csv \
    --collection people_links
Java (e.g. Flume connector, Kafka Connect sink, NiFi processor)
<dependency>
    <groupId>com.arangodb</groupId>
    <artifactId>arangodb-java-driver</artifactId>
    <version>4.1.12</version>
</dependency>

ArangoDB a = new ArangoDB.Builder()
    .host(arangoDBHost, arangoDBPort)
    .user(arangoDBUser)
    .password(arangoDBPass)
    .maxConnections(maxConnections)
    .registerModule(new VPackDriverModule())
    .build();
// The create calls return status/metadata, so fetch the handles separately
a.createDatabase(dbName);
ArangoDatabase b = a.db(dbName);
b.createCollection(collectionName);
ArangoCollection c = b.collection(collectionName);
MultiDocumentEntity d = c.insertDocuments( .... );
JavaScript (e.g. Node.js, browser applications)
'use strict';
const express = require('express');
// Constants
const PORT = 8888;
const db = require('arangojs')();
db.useDatabase('pni');
// App
const app = express();
app.get('/', function (req, res) {
    db.query('FOR x IN people FILTER IS_IN_POLYGON( x.loc, [ 153.090667 , -27.247108 ] , true ) == true RETURN [ x.name ]').then(
        cursor => cursor.all()
    ).then(
        // Send a single string; each row is a one-element array holding a name
        keys => res.send('All keys: ' + keys.join('<BR/> ')),
        err => {
            console.error('Failed to execute query:', err);
            res.status(500).send('Query failed');
        }
    );
});
app.listen(PORT);
What's next…
– Query languages like GraphQL
– Clustering and scaling
– More ingestion pipelines, e.g. NiFi, Kafka Connect
– Embedded functions / events

20181215 introduction to graph databases

  • 1. Introduction to Graph Databases For Melbourne Data Engineering by Timothy Findlay
  • 2. What is a graph store ? – Tool for storing and retrieving data – Optimized for highly related data; where many things are connected to many other things – There are many implementations such as Neo4J, Dgraph, ArangoDB, OrientDB, Titan/JanusGraph, DSE Why use a graph ? • High performance queries at scale • Many-to-many relationships Why not ? • Slow range scans / initial seek • Fixed depth, short scans • Super Nodes eg. Everything is connected to everything Stuart Bob Kevin
  • 3. Data model problems
       Relational databases are great at this, but can struggle at scale with a high number of joins.
       Some application databases are not designed for analytical workloads which require joins.
       Some warehouses are not designed to provide easy access to a variety of information without slow and complex joining.
       [Diagram: Person, Address, Credit Card and Mobile vertices]
       Vertices can be labelled to form virtual layers to partition data, e.g.
       – Find address for each IP
       – Find buildings affected by a cable
       Examples of traversing relationships [diagram labels, grouped by layer]
       – Layer: TCP/IP: 10.1.2.3, 10.1.2.4
       – Layer: Physical: Cable 00154672, 08:00:27:3d:90:82, 08:00:27:73:4b:89
       – Layer: Building: Desk: WRK12:05, CableDuct: W775-12
       – Layer: Address: 1 Someplace Ave, 1-2 Someplace Ave, 2 Someplace Ave
  • 4. Directionality
       A material may exist in a house and a house may exist in a suburb, but you may never know both sides of the equation unless you search by house and traverse outward.
       For directed graphs, the direction is a key design consideration and needs to be considered carefully up front.
       [Diagram: three House vertices; materials Bricks, Wood, Asbestos; Suburb: Kensington; Train Station]
  • 5. Fancy features
       – Vertex revisions
         Some graph databases support versioning of vertices. This can be used as a form of version control for audit, or to see what traversals were like at a different point in time. It could also be used to compare points in time to understand changes in traversals, e.g. changes to the route of a network, plumbing or transport system.
         [Diagram: Person and Computer vertices]
       • Event sourcing & triggers
       • Consider integrating Flume/Flink/NiFi pipelines into a graph
       • Event sourcing can be a powerful way of establishing edges on the fly from a Kafka topic or JMS message
       [Diagram: Company linked to Event1, Event2, Event3, Event4. If Company has Event2 + Event4 then …]
  • 6. Variations between implementations
       – Storage layers: RocksDB, Cassandra, BerkeleyDB, in-memory
       – Types of query languages: Gremlin, GraphQL+, openCypher
       – Different cluster technologies (partitioning/sharding)
       – Enforced schema and schema-less support
       – Different data types (document stores vs key/value pairs)
       – Geospatial support
       – Support for database functions
       – Record versioning / effective dating
       Terminology
       RDBMS / SQL   | GRAPH / GRAPHQL
       database      | database
       table         | collection
       row           | document / vertex
       column        | attribute
       table joins   | collection joins (graphs call these edges)
       primary key   | primary key (automatically present on _key attribute)
       index         | index
  • 7. How to use a graph store?
       – Starting
           docker run -p 8529:8529 arangodb/arangodb:3.0.10
       – Putting data in
           echo '{ "name" : "Timothy" }' | curl --basic --user "root:openSesame" -X POST --data-binary @- --dump - http://localhost:8529/_api/gharial/MyPeople/vertex/people
       – Pulling data out
           curl --basic --user "root:openSesame" -X GET http://arangodb:8529/_api/document/people/timothyfindlay/
           curl --basic --user "root:openSesame" -X POST --dump - http://arangodb:8529/_api/cursor --data-binary '{ "query" : "FOR x in people FILTER IS_IN_POLYGON( x.loc, [ 153.090667 , -27.247108 ] , true ) == true RETURN [ x.name ]" }'
       This presentation is NOT sponsored by ArangoDB GmbH
  • 8. Python (e.g. Airflow operator)
           from pyArango.connection import Connection
           import names

           # Connect, then create the database and a document collection
           conn = Connection(arangoURL="http://10.1.20.6:8529", username="root", password="openSesame")
           conn.createDatabase(name="MelbourneDEM")
           db = conn["MelbourneDEM"]
           table = db.createCollection(name="people")
           for i in range(1000):
               doc = table.createDocument()
               # _key may not contain spaces, so join the name parts with an underscore
               doc["_key"] = names.get_full_name().replace(" ", "_")
               doc.save()  # write the document to the collection
       CLI (e.g. cron batch processing)
           arangoimp --on-duplicate ignore --log.level warn \
             --server.endpoint http+tcp://10.1.20.6:8529 \
             --server.authentication 'true' \
             --server.username root --server.password openSesame \
             --server.database MelbourneDEM \
             --type csv --create-collection true --create-collection-type edge \
             --file edges.csv --collection people_links
  • 9. Java (e.g. Flume connector, Kafka Connect sink, NiFi processor)
           <dependency>
             <groupId>com.arangodb</groupId>
             <artifactId>arangodb-java-driver</artifactId>
             <version>4.1.12</version>
           </dependency>

           ArangoDB a = new ArangoDB.Builder()
               .host(arangoDBHost, arangoDBPort)
               .user(arangoDBUser)
               .password(arangoDBPass)
               .maxConnections(maxConnections)
               .registerModule(new VPackDriverModule())
               .build();
           // In the 4.x driver the create calls return status objects, so fetch the handles separately
           a.createDatabase(dbName);
           ArangoDatabase b = a.db(dbName);
           b.createCollection(collectionName);
           ArangoCollection c = b.collection(collectionName);
           MultiDocumentEntity d = c.insertDocuments( .... )
  • 10. JavaScript (e.g. Node.js, browser applications)
           'use strict';
           const express = require('express');
           // Constants
           const PORT = 8888;
           // Declared with const: strict mode rejects assignment to an undeclared variable
           const db = require('arangojs')();
           db.useDatabase('pni');
           // App
           const app = express();
           app.get('/', function (req, res) {
             db.query('FOR x in people FILTER IS_IN_POLYGON( x.loc, [ 153.090667 , -27.247108 ] , true ) == true RETURN [ x.name ]').then(
               cursor => cursor.all()
             ).then(
               // res.send takes a single body argument, so build one string
               keys => res.send('All keys: ' + keys.join('<BR/> ')),
               err => console.error('Failed to execute query:', err)
             );
           });
           app.listen(PORT);
  • 11. What's next…
       – Query languages like GraphQL
       – Clustering and scaling
       – More ingestion pipelines, e.g. NiFi, Kafka Connect
       – Embedded functions / events
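
The layering and directionality ideas from slides 3 and 4 can be sketched in a few lines of plain Python. This is only an illustration, not how any particular graph database implements traversal: the vertex labels are taken from the slide 3 diagram, but the edges between them and the helper name `reachable_in_layer` are assumptions made for the example.

```python
from collections import deque

# Hypothetical directed edges (source -> targets). The labels come from slide 3;
# which vertex connects to which is assumed for illustration.
EDGES = {
    "Cable 00154672": ["08:00:27:3d:90:82", "08:00:27:73:4b:89", "CableDuct: W775-12"],
    "08:00:27:3d:90:82": ["10.1.2.3"],
    "08:00:27:73:4b:89": ["10.1.2.4"],
    "10.1.2.3": ["Desk: WRK12:05"],
}

# Each vertex carries a layer label, as on the slide.
LAYERS = {
    "Cable 00154672": "Physical",
    "08:00:27:3d:90:82": "Physical",
    "08:00:27:73:4b:89": "Physical",
    "10.1.2.3": "TCP/IP",
    "10.1.2.4": "TCP/IP",
    "Desk: WRK12:05": "Building",
    "CableDuct: W775-12": "Building",
}


def reachable_in_layer(start, layer):
    """Breadth-first traversal along outbound edges; return reached vertices in `layer`."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        vertex = queue.popleft()
        if vertex != start and LAYERS.get(vertex) == layer:
            found.append(vertex)
        for neighbour in EDGES.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return sorted(found)


# "Find buildings affected by a cable": traverse outward from the cable.
affected = reachable_in_layer("Cable 00154672", "Building")

# The reverse question fails here because the desk has no outbound edges.
nothing = reachable_in_layer("Desk: WRK12:05", "Physical")
```

With these assumed edges, the traversal from the cable reaches both Building-layer vertices, while starting from the desk reaches nothing. That is the directionality point from slide 4: if both questions matter, the edge direction (or a reverse index) has to be planned up front.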

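The event-sourcing rule on slide 5 ("If Company has Event2 + Event4 then …") can be sketched as follows. The message list stands in for a Kafka topic or JMS queue, and the company names, the `REQUIRED` set and the function `apply_events` are all hypothetical; a real pipeline would write the edges to the graph store rather than to a dictionary.

```python
from collections import defaultdict

# Stand-in for messages consumed from a Kafka topic or JMS queue (made-up data).
MESSAGES = [
    ("CompanyA", "Event1"),
    ("CompanyA", "Event2"),
    ("CompanyB", "Event4"),
    ("CompanyA", "Event4"),
    ("CompanyB", "Event3"),
]

# The rule from slide 5: act once a company has both of these events.
REQUIRED = {"Event2", "Event4"}


def apply_events(messages):
    """Add a company -> event edge per message and record when the rule first matches."""
    edges = defaultdict(set)
    triggered = []
    for company, event in messages:
        edges[company].add(event)  # edge established on the fly
        if REQUIRED <= edges[company] and company not in triggered:
            triggered.append(company)
    return dict(edges), triggered


edges, triggered = apply_events(MESSAGES)
```

Here only `CompanyA` ends up with both required events, so it is the only entry in `triggered`.
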
Editor's Notes

  1. Highlight use case for layered data, e.g. how physical elements and virtual elements relate. Summarise graph theory and the degree / adjacency / Laplacian matrices in mathematical graph theory.
  2. Many-to-many: compare a traditional SQL query scanning many people to many cards to many other people to many addresses, etc. Layering can follow ITU-T G.805 / ISO / semantic layer.
  3. 1. Revisions can be used for planning (future-state analysis). 2. Batch loading via Storm and traditional methods is OK, but real-time loading makes the graph more usable.
  4. Note: XML documents such as KMZ/KML
  5. There are components available for tools like Flume, Kafka and Nifi to load directly into some engines.
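
Editor's note 1 mentions the degree, adjacency and Laplacian matrices. As a small worked example, the sketch below builds all three for the Stuart, Bob and Kevin vertices from slide 2. The two undirected edges are an assumption (the slide does not list them), and the function name `matrices` is made up for the example. The Laplacian is computed as L = D - A.

```python
VERTICES = ["Stuart", "Bob", "Kevin"]
# Assumed undirected edges; the slide only names the vertices.
EDGES = [("Stuart", "Bob"), ("Bob", "Kevin")]


def matrices(vertices, edges):
    """Return (adjacency, degree, laplacian) as nested lists for an undirected graph."""
    index = {name: i for i, name in enumerate(vertices)}
    n = len(vertices)
    adjacency = [[0] * n for _ in range(n)]
    for a, b in edges:
        adjacency[index[a]][index[b]] = 1
        adjacency[index[b]][index[a]] = 1
    # Degree matrix: number of edges per vertex on the diagonal, zero elsewhere.
    degree = [[sum(adjacency[i]) if i == j else 0 for j in range(n)] for i in range(n)]
    # Laplacian: L = D - A, element by element.
    laplacian = [[degree[i][j] - adjacency[i][j] for j in range(n)] for i in range(n)]
    return adjacency, degree, laplacian


adjacency, degree, laplacian = matrices(VERTICES, EDGES)
```

Every row of the Laplacian sums to zero, which is a quick sanity check when building one by hand.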