LDBC
Social Network Benchmark (SNB)
Graph Generator
graph benchmarking to the next level
Peter Boncz
Why Benchmarking?
• make competing products comparable
• accelerate progress, make technology viable

© Jim Gray, 2005
What is the LDBC?
Linked Data Benchmark Council = LDBC
• Industry entity similar to TPC (www.tpc.org)
• Focusing on graph and RDF store benchmarking

• Kick-started by an EU project
• Runs from September 2012 – March 2015
• 9 project partners
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
SNB Scenario: Social Network Analysis
• Intuitive: everybody knows what a SN is
– Facebook, Twitter, LinkedIn, …

• SNs can be easily represented as a graph

– Entities are the nodes (Person, Group, Tag, Post, ...)
– Relationships are the edges (Friend, Likes, Follows, …)

• Different scales: from small to very large SNs
– Up to billions of nodes and edges

• Multiple query needs:

– interactive, analytical, transactional

• Multiple types of uses:

– marketing, recommendation, social interactions, fraud
detection, ...

Audience
• For developers facing graph processing tasks
– recognizable scenario to compare merits of different
products and technologies

• For vendors of graph database technology
– checklist of features and performance characteristics

• For researchers, both industrial and academic
– challenges in multiple choke-point areas such as
graph query optimization and (distributed) graph
analysis

What was developed?
• Four main elements:
– data schema: defines the structure of the data
– workloads: define the set of operations to perform
– performance metrics: used to measure
(quantitatively) the performance of the systems
– execution rules: defined to assure that the results
from different executions of the benchmark are
valid and comparable

• Software as Open Source (GitHub)
– data generator, query drivers, validation tools, ...
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
Data Schema
• Specified in UML for portability
– Classes
– Associations between classes
– Attributes for classes and associations

• Some of the relationships represent dimensions
– Time (Y,QT,Month,Day)
– Geography (Continent,Country,Place)

• Data Formats

– CSV
– RDF (Turtle + N3)
Data Schema
Data Generation Process
• Produce synthetic data that mimics the
characteristics of real SN data
• Based on SIB–S3G2 Social Graph Generator

– www.w3.org/wiki/Social_Network_Intelligence_BenchMark

• Graph model:

– property (directed labeled) graph

• Building blocks
– Correlated data dictionaries
– Sequential subtree generation
– Graph generation among correlation dimensions
Choke Points
• Hidden challenges in a benchmark
influence database system design, e.g. TPC-H
– Functional Dependency Analysis in aggregation
– Bloom Filters for sparse joins
– Subquery predicate propagation

• LDBC explicitly designs benchmarks looking at
choke-point “coverage”
– requires access to database kernel architects

Data Correlations between attributes
SELECT personID FROM person
WHERE firstName = 'Joachim' AND addressCountry = 'Germany'

Anti-Correlation

SELECT personID FROM person
WHERE firstName = 'Cesare' AND addressCountry = 'Italy'

• Query optimizers may underestimate or overestimate the result size of
conjunctive predicates

[Figure: name examples Joachim Loew and Cesare Prandelli]
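
The effect on a result-size estimate can be illustrated with a small sketch. It assumes a toy person table in which 90% of people carry the first name typical for their country; the table, the 90% figure, and the helper names build_people and selectivity are made up for illustration and are not part of the benchmark. The sketch compares the estimate obtained under the common attribute-independence assumption with the actual fractions.

```python
import random


def build_people(n=10_000, seed=42):
    """Toy table of (first_name, country) rows with correlated attributes."""
    rng = random.Random(seed)
    typical = {"Germany": "Joachim", "Italy": "Cesare"}
    people = []
    for _ in range(n):
        country = rng.choice(["Germany", "Italy"])
        other = "Italy" if country == "Germany" else "Germany"
        # Assumption: 90% of people carry the name typical for their country.
        name = typical[country] if rng.random() < 0.9 else typical[other]
        people.append((name, country))
    return people


def selectivity(rows, predicate):
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


people = build_people()
sel_name = selectivity(people, lambda r: r[0] == "Joachim")
sel_country = selectivity(people, lambda r: r[1] == "Germany")

# Estimate under the independence assumption: multiply single-column selectivities.
independent_estimate = sel_name * sel_country
# Actual selectivities of the conjunctive predicates.
actual_correlated = selectivity(people, lambda r: r == ("Joachim", "Germany"))
actual_anticorrelated = selectivity(people, lambda r: r == ("Joachim", "Italy"))

print(f"independence estimate:  {independent_estimate:.3f}")
print(f"Joachim AND Germany:    {actual_correlated:.3f}")
print(f"Joachim AND Italy:      {actual_anticorrelated:.3f}")
```

On data like this, the independence estimate is too low for the correlated pair and too high for the anti-correlated pair, which is the kind of misestimate the slide refers to.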
Property Dictionaries
• Property values for each literal property
defined by:
– a dictionary D, a fixed set of values (e.g., terms
from DBPedia or GeoNames)
– a ranking function R (between 1 and |D|)
– a probability density function F that specifies
how the data generator chooses values from
dictionary D using the rank R for each term in
the dictionary
Compact Correlated Property Value Generation
Using a geometric distribution for function F()
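
A minimal sketch of how D, R, and F could fit together, assuming a five-name toy dictionary and one hypothetical ranking per country (the names NAMES, RANKING, geometric_rank, and pick_first_name are illustrative, not taken from the generator). The dictionary and the distribution are shared; only the ranking depends on the correlated attribute, which is presumably what keeps the representation compact.

```python
import random
from collections import Counter

# Dictionary D: a fixed set of values (toy example).
NAMES = ["Joachim", "Cesare", "Anna", "Laura", "Peter"]

# Ranking function R: one ordering of D per country (hypothetical rankings).
RANKING = {
    "Germany": ["Joachim", "Anna", "Laura", "Peter", "Cesare"],
    "Italy": ["Cesare", "Laura", "Anna", "Peter", "Joachim"],
}


def geometric_rank(rng, size, p=0.5):
    """F: draw a 0-based rank from a geometric distribution truncated to size."""
    while True:
        rank = 0
        while rng.random() >= p:
            rank += 1
        if rank < size:  # redraw if the rank falls outside the dictionary
            return rank


def pick_first_name(rng, country):
    """Choose a value from D using the country-specific ranking and F."""
    ranked = RANKING[country]
    return ranked[geometric_rank(rng, len(ranked))]


def sample_names(country, n, seed):
    """Draw n first names for one country and count them."""
    rng = random.Random(seed)
    return Counter(pick_first_name(rng, country) for _ in range(n))


print(sample_names("Germany", 1000, seed=1).most_common(3))
print(sample_names("Italy", 1000, seed=1).most_common(3))
```

Low ranks are drawn most often, so the name ranked first for a country dominates its sample while every dictionary entry remains possible.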
Correlated Edge Generation
[Figure: example property graph of persons P1 to P5 with <firstname>, <birthYear>, <studyAt>, <liveAt>, <like>, and <knows> edges; values include “Laura”, “Anna”, “1990”, “University of Leipzig”, “University of Amsterdam”, “Germany”, “Netherlands”, and <Britney Spears>]
How to Generate a Correlated Graph?
• Compute similarity of two nodes based
on their (correlated) properties.
• Use a probability density function w.r.t.
this similarity for connecting nodes

Danger: this is very expensive to compute on a large
graph! (quadratic, random access)

[Figure: the example graph of persons P1 to P5 with candidate <knows> edges marked “?”, next to a plot of connection probability falling from highly similar to less similar nodes]
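
The naive approach can be sketched as follows. The distance function, the probability function, and the five toy persons are assumptions chosen for illustration; the point is only that every pair of nodes is examined, so the number of comparisons grows quadratically with the number of nodes.

```python
import random

# Toy persons with two correlated properties (hypothetical values).
PERSONS = [
    {"university": "Leipzig", "birth_year": 1990},
    {"university": "Leipzig", "birth_year": 1990},
    {"university": "Leipzig", "birth_year": 1991},
    {"university": "Amsterdam", "birth_year": 1990},
    {"university": "Amsterdam", "birth_year": 1985},
]


def similarity_distance(a, b):
    """Assumed metric: number of properties on which two persons differ (0..2)."""
    return int(a["university"] != b["university"]) + int(a["birth_year"] != b["birth_year"])


def connection_probability(distance, base=0.8, decay=0.25):
    """Assumed probability function: highest for similar nodes, decaying with distance."""
    return base * decay ** distance


def naive_edges(persons, rng):
    """Compare every pair of persons: n * (n - 1) / 2 similarity evaluations."""
    edges, comparisons = [], 0
    for i in range(len(persons)):
        for j in range(i + 1, len(persons)):
            comparisons += 1
            distance = similarity_distance(persons[i], persons[j])
            if rng.random() < connection_probability(distance):
                edges.append((i, j))
    return edges, comparisons


edges, comparisons = naive_edges(PERSONS, random.Random(0))
print(f"{comparisons} comparisons for {len(PERSONS)} persons, {len(edges)} edges")
```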
Window Optimization
Trick: disregard nodes with too large similarity distance
(only connect nodes in a similarity window)

Probability that two nodes are connected is skewed w.r.t. the
similarity between the nodes (due to the probability distr.)

[Figure: the example graph of persons P1 to P5 with <knows> edges inside a window, next to a plot of connection probability falling from highly similar to less similar nodes]
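
A minimal sketch of the window idea, assuming that nodes are first sorted on a similarity key and that the connection probability decays with the position offset inside the window (the function name window_edges, the key, and the probability parameters are illustrative assumptions). Each node is compared with at most `window` successors, so the work is roughly n times the window size instead of quadratic, and the sorted order allows sequential rather than random access.

```python
import random


def window_edges(persons, key, window, rng, p_adjacent=0.6, decay=0.5):
    """Connect each node only to nodes within `window` positions in sorted order."""
    # Sort node indices so that similar nodes end up close to each other.
    order = sorted(range(len(persons)), key=lambda i: key(persons[i]))
    edges, comparisons = [], 0
    for pos, i in enumerate(order):
        for offset in range(1, window + 1):
            if pos + offset >= len(order):
                break
            j = order[pos + offset]
            comparisons += 1
            # Skewed probability: near neighbours are connected more often.
            if rng.random() < p_adjacent * decay ** (offset - 1):
                edges.append((i, j))
    return edges, comparisons


rng = random.Random(1)
people = [
    {"university": rng.choice(["Leipzig", "Amsterdam", "Barcelona"]),
     "birth_year": rng.randint(1980, 1995)}
    for _ in range(1000)
]
edges, comparisons = window_edges(
    people, key=lambda p: (p["university"], p["birth_year"]), window=10, rng=rng
)
print(f"{comparisons} comparisons (all pairs would need {1000 * 999 // 2})")
```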
Graph generation
• Create new edges and nodes in one pass: start
from an existing node and create the new
nodes to which it will be connected
• Uses degree distributions that follow a power law
• Generates correlated and highly connected graph
data by creating edges after generating many
nodes
– Similarity metric
– Random dimension
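
As an illustration of a power-law degree distribution, the sketch below draws a target degree for each node with probability proportional to degree^(-exponent). The exponent, the maximum degree, and the function name power_law_degrees are assumptions for the example and are not parameters of the actual generator.

```python
import random
from collections import Counter


def power_law_degrees(n, exponent=2.0, max_degree=1000, seed=7):
    """Draw n target degrees with P(degree = d) proportional to d ** -exponent."""
    rng = random.Random(seed)
    degrees = list(range(1, max_degree + 1))
    weights = [d ** -exponent for d in degrees]
    return rng.choices(degrees, weights=weights, k=n)


degrees = power_law_degrees(10_000)
histogram = Counter(degrees)
# Most nodes get a small degree, while a few nodes get a very large one.
print("nodes with degree 1:", histogram[1])
print("largest degree drawn:", max(degrees))
```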
Implementation
• MapReduce

– A Map function runs on different parts of the input data, in
parallel on many cluster nodes
– Main Reduce function:
  • Sort the correlation dimensions
  • Generate edges based on sliding windows (similarity metric)

– Successive MapReduce phases for different correlation
dimensions

• Scalable, e.g.:

– Generate 1.2TB of data in 30 minutes (16 4-core nodes)
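
The phase structure can be mimicked in plain Python without a MapReduce framework. In this sketch, which is an assumption-laden simplification and not the generator's code, the map step emits a sort key per person, the reduce step sorts on that key and links every pair inside the sliding window (a real generator would draw from a probability distribution instead), and one phase is run per correlation dimension.

```python
# Toy input: person id -> properties (hypothetical values).
PERSONS = {
    0: {"university": "Leipzig", "interest": "pop"},
    1: {"university": "Leipzig", "interest": "rock"},
    2: {"university": "Amsterdam", "interest": "pop"},
    3: {"university": "Amsterdam", "interest": "rock"},
}


def map_phase(persons, dimension):
    """Map: emit one (sort key, person id) record per person."""
    return [(person[dimension], pid) for pid, person in persons.items()]


def reduce_phase(records, window):
    """Reduce: sort on the correlation dimension, then link ids within the window."""
    ordered = [pid for _, pid in sorted(records)]
    edges = set()
    for pos, pid in enumerate(ordered):
        for other in ordered[pos + 1 : pos + 1 + window]:
            edges.add((min(pid, other), max(pid, other)))
    return edges


def generate(persons, dimensions, window=1):
    """Run one map/reduce phase per correlation dimension and merge the edges."""
    edges = set()
    for dimension in dimensions:
        edges |= reduce_phase(map_phase(persons, dimension), window)
    return edges


print(sorted(generate(PERSONS, ["university", "interest"])))
```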
Validation: Metrics
• Largest Connected Component
• Average Clustering Coefficient
• Diameter
• Average Path Length
• Hop-plot User-Knows
• Attribute distributions
• Degree distributions
• Time evolution
• Statistics (100K users / 1 year)
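
Two of these metrics can be computed on a small undirected graph as in the sketch below. The adjacency-set representation, the toy graph, and the function names are assumptions for illustration; the actual validation tools may compute the metrics differently.

```python
from collections import deque

# Toy undirected graph: a triangle 0-1-2, a pendant node 3, and a separate pair 4-5.
GRAPH = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
    4: {5},
    5: {4},
}


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        best = max(best, size)
    return best


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of neighbours.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


print("largest connected component:", largest_component_size(GRAPH))
print("average clustering coefficient:", round(average_clustering(GRAPH), 4))
```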
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
Workloads
• On-Line: tests a system's throughput with relatively
simple queries with concurrent updates

– Show all photos posted by my friends that I was tagged in

• Business Intelligence: consists of complex structured
queries for analyzing online behavior
– Who are the influential people on the topic of open source development?

• Graph Analytics: tests the functionality and scalability
on most of the data as a single operation
– PageRank, Shortest Path(s), Community Detection

Workloads by system
Interactive Query Set
• Tests system throughput with relatively simple queries and
concurrent updates
• Current set: 12 read-only queries
• For each query:
– Name and detailed description in plain English
– List of input parameters
– Expected result: content and format
– Textual functional description
– Relevance:

• textual description (plain English) of the reasoning for including this
query in the workload
• discussion about the technical challenges (Choke Points) targeted

– Validation parameters and validation results
– SPARQL query
Some SNB Interactive Choke Points
• Graph Traversals. Query execution time heavily depends
on the ability to quickly traverse the friends graph.
• Plan Variability. Each query has many different best
plans depending on parameter choices (e.g. hash- vs
index-based joins).
• Top-k and distinct. Many queries return the first results
in a specific order: late projection, pushing conditions
from the sort into the query.
• Repetitive short queries, differing only in literals:
opportunity for query plan recycling.
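
Late projection, as the term is commonly understood, means selecting the top-k rows on the narrow sort columns first and fetching the remaining (wide) attributes only for the rows that survive. The sketch below illustrates that idea on in-memory data; the function and variable names are hypothetical and it does not describe how any particular system implements it.

```python
import heapq


def top_k_late_projection(post_counts, fetch_person, k=20):
    """Pick the top-k ids on (count desc, id asc), then fetch details for only those k."""
    best = heapq.nsmallest(k, post_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Projection of the wide attributes happens after the top-k cut.
    return [(pid, count, fetch_person(pid)) for pid, count in best]


# Toy data: person id -> number of posts, plus a (hypothetical) expensive lookup.
POST_COUNTS = {pid: (pid * 37) % 101 for pid in range(1, 1001)}
lookups = []


def fetch_person(pid):
    lookups.append(pid)  # record how many detail lookups were needed
    return {"firstName": f"first{pid}", "lastName": f"last{pid}"}


rows = top_k_late_projection(POST_COUNTS, fetch_person, k=20)
print(f"{len(rows)} rows returned, {len(lookups)} detail lookups for {len(POST_COUNTS)} persons")
```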
Choke Point Coverage
Example: Q3

Name: Friends within 2 hops that have been in two countries
Description:
Find Friends and Friends of Friends of the user A that have made a post in
the foreign countries X and Y within a specified period. We count only posts
that are made in a country that is different from the country of the friend. The
result should be sorted descending by total number of posts, and then by person
URI. Top 20 should be shown. The user A (as friend of his friend) should not be
in the result.
Parameters:
- Person
- CountryX
- CountryY
- startDate - the beginning of the requested period
- Duration - requested period in days
Result:
- Person.id, Person.firstName, Person.lastName
- Number of posts in each country and the sum of all posts
Relevance:
- Choke Points: CP3.3
- If one country is large but anticorrelated with the country of self, then
processing this before a smaller but positively correlated country can be
beneficial
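
One possible reading of the Q3 description, written as a reference sketch over in-memory toy data. The data layout (a knows dictionary, a home-country dictionary, and a list of posts), the function name q3, and the choice to return only the person id instead of first and last name are assumptions made for brevity; this is not the benchmark's reference implementation.

```python
from collections import defaultdict
from datetime import date, timedelta


def q3(person, country_x, country_y, start_date, duration_days,
       knows, home_country, posts, limit=20):
    """Friends within 2 hops with posts in both X and Y during the period."""
    end_date = start_date + timedelta(days=duration_days)

    # Friends and friends of friends, without the start person.
    friends = set(knows.get(person, ()))
    candidates = set(friends)
    for friend in friends:
        candidates.update(knows.get(friend, ()))
    candidates.discard(person)

    counts = defaultdict(lambda: [0, 0])  # creator -> [posts in X, posts in Y]
    for creator, country, created in posts:
        if creator not in candidates or not (start_date <= created < end_date):
            continue
        if country == home_country[creator]:
            continue  # only posts made outside the creator's own country count
        if country == country_x:
            counts[creator][0] += 1
        elif country == country_y:
            counts[creator][1] += 1

    rows = [(pid, x, y, x + y) for pid, (x, y) in counts.items() if x > 0 and y > 0]
    rows.sort(key=lambda row: (-row[3], row[0]))  # total descending, then person id
    return rows[:limit]


KNOWS = {"A": ["B"], "B": ["A", "C"], "C": ["B"], "D": []}
HOME = {"A": "NL", "B": "DE", "C": "NL", "D": "DE"}
POSTS = [
    ("B", "IT", date(2013, 1, 10)),
    ("B", "FR", date(2013, 1, 12)),
    ("B", "IT", date(2013, 1, 15)),
    ("C", "IT", date(2013, 1, 11)),
    ("C", "FR", date(2013, 6, 1)),   # outside the requested period
    ("A", "IT", date(2013, 1, 5)),   # the start person is excluded
    ("A", "FR", date(2013, 1, 6)),
]

result = q3("A", "IT", "FR", date(2013, 1, 1), 30, KNOWS, HOME, POSTS)
print(result)
```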
Example: Q5 - SPARQL
select ?group count (*)
where {
{select distinct ?fr
where {
{%Person% snvoc:knows ?fr.} union
{%Person% snvoc:knows ?fr2.
?fr2 snvoc:knows ?fr. filter (?fr != %Person%)}
}
} .
?group snvoc:hasMember ?mem . ?mem snvoc:hasPerson ?fr .
?mem snvoc:joinDate ?date . filter (?date >=
"%Date0%"^^xsd:date) .
?post snvoc:hasCreator ?fr . ?group snvoc:containerOf ?post
}
group by ?group
order by desc(2) ?group
limit 20
Example: Q5 - Cypher
MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
WHERE person.id={person_id}
MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate>{join_date}
MATCH (friend)<-[:HAS_CREATOR]-(comment:Comment)
WHERE (comment)-[:REPLY_OF*0..]->(:Comment)-[:REPLY_OF]->(:Post)<-[:CONTAINER_OF]-(forum)
RETURN forum.title AS forum, count(comment) AS commentCount
ORDER BY commentCount DESC
MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
WHERE person.id={person_id}
MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate>{join_date}
MATCH (friend)<-[:HAS_CREATOR]-(post:Post)<-[:CONTAINER_OF]-(forum)
RETURN forum.title AS forum, count(post) AS postCount
ORDER BY postCount DESC
Example: Q5 - DEX
// Look up the start person by its id attribute
v.setLongVoid(personId);
long personOID = graph.findObject(personId, v);
// Friends and friends of friends, excluding the start person
Objects friends = graph.neighbors(personOID, knows, EdgesDirection.Outgoing);
Objects allFriends = graph.neighbors(friends, knows, EdgesDirection.Outgoing);
allFriends.union(friends);
allFriends.remove(personOID);
friends.close();
// Membership edges of these persons with joinDate >= date; their tails are the forums
Objects members = graph.explode(allFriends, hasMember, EdgesDirection.Ingoing);
v.setTimestampVoid(date);
Objects candidate = graph.select(joinDate, Condition.GreaterEqual, v, members);
Objects finalSelection = graph.tails(candidate);
candidate.close();
members.close();
// Posts created by these persons
Objects posts = graph.neighbors(allFriends, hasCreator, EdgesDirection.Ingoing);
ObjectsIterator iterator = finalSelection.iterator();
while (iterator.hasNext()) {
    long oid = iterator.next();
    Container c = new Container();
    Objects postsGroup = graph.neighbors(oid, containerOf, EdgesDirection.Outgoing);
    Objects moderators = graph.neighbors(oid, hasModerator, EdgesDirection.Outgoing);
    long moderatorOid = moderators.any();
    moderators.close();
    // Drop the moderator's posts, then keep only the posts created by the friends
    Objects postsModerator = graph.neighbors(moderatorOid, hasCreator, EdgesDirection.Ingoing);
    postsGroup.difference(postsModerator);
    postsModerator.close();
    postsGroup.intersection(posts);
    long count = postsGroup.size();
    if (count > 0) {
        graph.getAttribute(oid, forumId, v);
        c.row[0] = db.getForumURI(v.getLong());
        c.compare2 = String.valueOf(v.getLong());
        c.row[1] = String.valueOf(count);
        c.compare = count;
        results.add(c);
    }
    postsGroup.close();
}
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
SNB Software
• SNB data
– Dataset generator
– Dataset Validator
– SN Graph Analyzer

• SNB Query drivers
– BSBM based
– YCSB based
Some Experiments
• Virtuoso (RDF)
– 100k users over a 3-year period (3.3 billion triples,
60GB)
– Ten SPARQL query mixes
– 4 x Intel Xeon 2.30GHz CPU, 193 GB of RAM

• DEX (Graph Database)
– Validation setup: 10k users over 3 years (19GB)
– Validation query set and parameters (API-based)
– 2 x Intel Xeon 2.40GHz CPU, 128 GB of RAM
Virtuoso Interactive Workload
• Some queries could not be considered as truly interactive
– e.g. Q4, Q5 and Q9
– … still all queries are very interesting challenges

• “Irregular” data distribution reflecting the reality of the SN
– … but complicates the selection of query parameters
Exploration in Scale
• 3.3 bn RDF triples per 100K users, 24G in triples,
36G in literals
• 2/3 of data in interactive working set, 1/4 in BI
working set
• scale out becomes increasingly necessary after 1M users
• 10-100M users are data center scales
– as in real social networks
– larger scales will favor space efficient data models,
e.g. column store with a schema, but
– larger scales also have greater need for schema-last
features
DEX Interactive Workload
• Query validation (no SPARQL)
• Identified some implementation choke points
• New optimizations implemented and tested
SNB Roadmap
• Motivation
• Dataset Generator
• Workloads
• Results
• Future
Ongoing Work
• Finishing Interactive workload
– updates (transactional)
– substitution parameters

• New BI and Graph Analytical Workloads
• Data Generator Improvements

– improve dictionaries and distributions for BI
– Scale factors and dataset (SN graph) validation

• Query Drivers

– Parallel update generator

• Auditing Rules for SNB
Pointers
• Code&Queries: github.com/ldbc
– ldbc_socialnet_bm

• ldbc_socialnet_dbgen
• ldbc_socialnet_qgen

• Wiki: ldbc.eu:8090/display/TUC

– Background & Discussions + Detailed report:
ldbc.eu:8090/download/attachments/4325436/LDBC_S

• LDBC Technical User Community (TUC) meeting:
– Thursday April 3, CWI Amsterdam (see wiki – next
week)

Thank you!

FOSDEM 2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
Ioan Toma
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolution
Ioan Toma
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
Ioan Toma
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
Ioan Toma
 

More from Ioan Toma (17)

LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
 
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 
Social Network Benchmark Interactive Workload
Social Network Benchmark Interactive WorkloadSocial Network Benchmark Interactive Workload
Social Network Benchmark Interactive Workload
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolution
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 

Recently uploaded

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
DianaGray10
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
christinelarrosa
 

Recently uploaded (20)

Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz

  • 1. LDBC Social Network Benchmark (SNB) Graph Generator: graph benchmarking to the next level. Peter Boncz
  • 2. Why Benchmarking? • make competing products comparable © Jim Gray, 2005 • accelerate progress, make technology viable
  • 3. What is the LDBC? Linked Data Benchmark Council = LDBC • Industry entity similar to TPC (www.tpc.org) • Focusing on graph and RDF store benchmarking • Kick-started by an EU project • Runs from September 2012 – March 2015 • 9 project partners:
  • 5. SNB Scenario: Social Network Analysis • Intuitive: everybody knows what an SN is – Facebook, Twitter, LinkedIn, … • SNs can be easily represented as a graph – Entities are the nodes (Person, Group, Tag, Post, ...) – Relationships are the edges (Friend, Likes, Follows, …) • Different scales: from small to very large SNs – Up to billions of nodes and edges • Multiple query needs: – interactive, analytical, transactional • Multiple types of uses: – marketing, recommendation, social interactions, fraud detection, ...
  • 6. Audience • For developers facing graph processing tasks – recognizable scenario to compare merits of different products and technologies • For vendors of graph database technology – checklist of features and performance characteristics • For researchers, both industrial and academic – challenges in multiple choke-point areas such as graph query optimization and (distributed) graph analysis
  • 7. What was developed? • Four main elements: – data schema: defines the structure of the data – workloads: define the set of operations to perform – performance metrics: used to measure (quantitatively) the performance of the systems – execution rules: defined to assure that the results from different executions of the benchmark are valid and comparable • Software as Open Source (GitHub) – data generator, query drivers, validation tools, ...
  • 9. Data Schema • Specified in UML for portability – Classes – Associations between classes – Attributes for classes and associations • Some of the relationships represent dimensions – Time (Y, QT, Month, Day) – Geography (Continent, Country, Place) • Data Formats – CSV – RDF (Turtle + N3)
  • 11. Data Generation Process • Produce synthetic data that mimics the characteristics of real SN data • Based on SIB–S3G2 Social Graph Generator – www.w3.org/wiki/Social_Network_Intelligence_BenchMark • Graph model: – property (directed labeled) graph • Building blocks – Correlated data dictionaries – Sequential subtree generation – Graph generation among correlation dimensions
  • 12. Choke Points • Hidden challenges in a benchmark influence database system design, e.g. TPC-H – Functional Dependency Analysis in aggregation – Bloom Filters for sparse joins – Subquery predicate propagation • LDBC explicitly designs benchmarks looking at choke-point “coverage” – requires access to database kernel architects
  • 13. Data Correlations between attributes
      SELECT personID FROM person WHERE firstName = 'Joachim' AND addressCountry = 'Germany'
    Anti-Correlation
      SELECT personID FROM person WHERE firstName = 'Cesare' AND addressCountry = 'Italy'
    • Query optimizers may underestimate or overestimate the result size of conjunctive predicates
    [Figure labels: Joachim Loew, Cesare Prandelli]
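The effect described on this slide can be illustrated with a minimal sketch. The toy table below and its row counts are invented for illustration and are not SNB data; the point is only that an estimate built on the independence assumption can be far off in both directions when two attributes are correlated.

```python
from collections import Counter

# Hypothetical toy table of (firstName, addressCountry) rows.
# The counts are invented purely for illustration.
PERSONS = (
    [("Joachim", "Germany")] * 40
    + [("Cesare", "Italy")] * 40
    + [("Joachim", "Italy")] * 2
    + [("Cesare", "Germany")] * 2
    + [("Anna", "Germany")] * 8
    + [("Anna", "Italy")] * 8
)


def true_count(first_name, country):
    """Exact number of rows matching both predicates."""
    return sum(1 for f, c in PERSONS if f == first_name and c == country)


def independence_estimate(first_name, country):
    """Estimated row count if the two predicates were statistically independent."""
    n = len(PERSONS)
    names = Counter(f for f, _ in PERSONS)
    countries = Counter(c for _, c in PERSONS)
    return n * (names[first_name] / n) * (countries[country] / n)


if __name__ == "__main__":
    for name, country in [("Joachim", "Germany"), ("Cesare", "Germany")]:
        print(name, country,
              "estimate:", round(independence_estimate(name, country), 1),
              "actual:", true_count(name, country))
```

In this toy table the independence estimate is the same for both name and country pairs, while the actual counts differ by a factor of twenty: the common combination is underestimated and the rare one is overestimated.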
  • 14. Property Dictionaries • Property values for each literal property defined by: – a dictionary D, a fixed set of values (e.g., terms from DBPedia or GeoNames) – a ranking function R (between 1 and |D|) – a probability density function F that specifies how the data generator chooses values from dictionary D using the rank R for each term in the dictionary
  • 15. Compact Correlated Property Value Generation Using geometric distribution for function F()
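The interplay of D, R and F can be sketched as follows. This is an illustrative sketch, not the generator's actual code: the function names, the parameter `p`, and the idea of deriving the ranking from a seeded shuffle keyed on the correlated attribute are all assumptions made here for the example.

```python
import random


def make_ranking(dictionary, correlation_key):
    """R: order the dictionary for one value of the correlated attribute.

    Assumption for this sketch: the ranking is a deterministic shuffle seeded
    by the correlated attribute (e.g. a country), so different keys prefer
    different dictionary values.
    """
    ranked = list(dictionary)
    random.Random(correlation_key).shuffle(ranked)
    return ranked


def sample_rank(rng, size, p=0.3):
    """F: draw a rank in [1, size] from a geometric distribution.

    Ranks beyond the dictionary size are rejected and redrawn.
    """
    while True:
        rank = 1
        while rng.random() >= p:
            rank += 1
        if rank <= size:
            return rank


def generate_value(rng, dictionary, correlation_key, p=0.3):
    """Pick a value from D using the key-specific ranking R and distribution F."""
    ranked = make_ranking(dictionary, correlation_key)
    return ranked[sample_rank(rng, len(ranked), p) - 1]


if __name__ == "__main__":
    first_names = ["Anna", "Laura", "Karl", "Hans", "Peter"]  # hypothetical dictionary
    rng = random.Random(42)
    print([generate_value(rng, first_names, "Germany") for _ in range(10)])
    print([generate_value(rng, first_names, "Italy") for _ in range(10)])
```

Because low ranks are much more likely than high ones, each key ends up with a few dominant values, which is what makes the property value correlated with the key.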
  • 16. Correlated Edge Generation [Figure: example property graph of persons P1–P5 connected by <knows> edges, with properties <firstname> (“Laura”, “Anna”), <birthYear> (“1990”), <studyAt> (“University of Leipzig”, “University of Amsterdam”), <liveAt> (“Germany”, “Netherlands”), <like> (<Britney Spears>) and <is> (Student)]
  • 17. How to Generate a Correlated Graph? • Compute similarity of two nodes based on their (correlated) properties • Use a probability density function w.r.t. this similarity for connecting nodes • Danger: this is very expensive to compute on a large graph! (quadratic, random access) [Figure: the example graph of persons P1–P5 with undecided edges marked “?”; plot of connection probability against similarity, from highly similar to less similar]
  • 18. Window Optimization • Trick: disregard nodes with too large similarity distance (only connect nodes in a similarity window) • Probability that two nodes are connected is skewed w.r.t. the similarity between the nodes (due to the probability distribution) [Figure: the example graph with <knows> edges between persons P1–P5; plot of connection probability against similarity, from highly similar to less similar, with the window marked]
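The window trick can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the generator's implementation: the function name `window_edges`, the `window` and `p` parameters, and the geometric decay of the connection probability with window distance are choices made here for the example.

```python
import random


def window_edges(nodes, similarity_key, window=4, p=0.5, seed=0):
    """Generate undirected edges between nodes that are close in one correlation dimension.

    nodes: list of dicts with an "id" entry.
    similarity_key: function mapping a node to a sortable value (the correlation dimension).
    Only pairs at most `window` positions apart in the sorted order are considered,
    so the work is O(n * window) instead of the quadratic all-pairs comparison.
    """
    rng = random.Random(seed)
    ordered = sorted(nodes, key=similarity_key)
    edges = []
    for i, node in enumerate(ordered):
        for distance in range(1, window + 1):
            j = i + distance
            if j >= len(ordered):
                break
            # Assumed decay: connection probability shrinks geometrically with distance.
            if rng.random() < p ** distance:
                edges.append((node["id"], ordered[j]["id"]))
    return edges


if __name__ == "__main__":
    # Hypothetical persons; the correlation dimension is (university, birth year).
    persons = [
        {"id": "P1", "university": "Leipzig", "birth_year": 1990},
        {"id": "P2", "university": "Amsterdam", "birth_year": 1990},
        {"id": "P3", "university": "Leipzig", "birth_year": 1990},
        {"id": "P4", "university": "Leipzig", "birth_year": 1991},
        {"id": "P5", "university": "Amsterdam", "birth_year": 1989},
    ]
    key = lambda person: (person["university"], person["birth_year"])
    print(window_edges(persons, key, window=2))
```

Running the same routine once per correlation dimension, each time with a different sort key, corresponds to the successive passes the slides describe.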
  • 19. Graph generation • Create new edges and nodes in one pass, starting from an existing node and creating the new nodes to which it will be connected • Uses degree distributions that follow a power law • Generates correlated and highly connected graph data by creating edges after generating many nodes – Similarity metric – Random dimension
  • 20. Implementation • MapReduce – A Map function runs on different parts of the input data, in parallel and on many clusters – Main Reduce function: • Sort the correlation dimensions • Generate edges based on sliding windows (similarity metric) – Successive MapReduce phases for different correlation dimensions • Scalable, e.g.: – Generate 1.2TB of data in 30 minutes (16 4-core nodes)
  • 21. Validation: Metrics • Largest Connected Component • Average Clustering Coefficient • Diameter • Average Path Length • Hop-plot • User-Knows • Attribute distributions • Degree distributions • Time evolution
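Three of these metrics are simple enough to sketch directly. The code below is an illustrative pure-Python version for small undirected graphs; the function names are chosen here for the example, and it is not the validation tooling from the project.

```python
from collections import Counter, deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def largest_component_size(adj):
    """Size of the largest connected component, found by breadth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj) if adj else 0.0


def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())


if __name__ == "__main__":
    # Hypothetical knows-graph: a triangle with a pendant node, plus a separate pair.
    knows = [(1, 2), (2, 3), (1, 3), (3, 4), (5, 6)]
    adj = build_adjacency(knows)
    print(largest_component_size(adj), average_clustering(adj), degree_distribution(adj))
```

For graphs at benchmark scale a dedicated graph library or a distributed job would be needed; the sketch only pins down what each metric measures.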
  • 24. Workloads • On-Line: tests a system's throughput with relatively simple queries with concurrent updates – Show all photos posted by my friends that I was tagged in • Business Intelligence: consists of complex structured queries for analyzing online behavior – Influential people on the topic of open source development? • Graph Analytics: tests the functionality and scalability on most of the data as a single operation – PageRank, Shortest Path(s), Community Detection
  • 26. Interactive Query Set • Tests system throughput with relatively simple queries and concurrent updates • Current set: 12 read-only queries • For each query: – Name and detailed description in plain English – List of input parameters – Expected result: content and format – Textual functional description – Relevance: • textual description (plain English) of the reasoning for including this query in the workload • discussion about the technical challenges (Choke Points) targeted – Validation parameters and validation results – SPARQL query
  • 27. Some SNB Interactive Choke Points • Graph Traversals: Query execution time heavily depends on the ability to quickly traverse the friends graph. • Plan Variability: Each query has many different best plans depending on parameter choices (e.g. hash- vs. index-based joins). • Top k and distinct: Many queries return the first results in a specific order: late projection, pushing conditions from the sort into the query. • Repetitive short queries, differing only in literals: opportunity for query plan recycling.
  • 29. Example: Q3 • Name: Friends within 2 hops that have been in two countries • Description: Find Friends and Friends of Friends of user A that have made a post in the foreign countries X and Y within a specified period. Only posts made in a country different from the friend's own country are counted. The result should be sorted descending by total number of posts, and then by person URI. The top 20 should be shown. User A (as a friend of his friend) should not be in the result. • Parameters: – Person – CountryX – CountryY – startDate: the beginning of the requested period – Duration: requested period in days • Result: – Person.id, Person.firstname, Person.lastName – Number of posts in each country and the sum of all posts • Relevance: – Choke Points: CP3.3 – If one country is large but anticorrelated with the country of self then processing this before a smaller but positively correlated country can be beneficial
  • 30. Example: Q5 - SPARQL
      select ?group count (*)
      where {
        { select distinct ?fr
          where {
            { %Person% snvoc:knows ?fr. }
            union
            { %Person% snvoc:knows ?fr2.
              ?fr2 snvoc:knows ?fr.
              filter (?fr != %Person%) }
          }
        } .
        ?group snvoc:hasMember ?mem .
        ?mem snvoc:hasPerson ?fr .
        ?mem snvoc:joinDate ?date .
        filter (?date >= "%Date0%"^^xsd:date) .
        ?post snvoc:hasCreator ?fr .
        ?group snvoc:containerOf ?post
      }
      group by ?group
      order by desc(2) ?group
      limit 20
  • 31. Example: Q5 - Cypher
      Comment counts per forum:
      MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
      WHERE person.id={person_id}
      MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
      WHERE membership.joinDate>{join_date}
      MATCH (friend)<-[:HAS_CREATOR]-(comment:Comment)
      WHERE (comment)-[:REPLY_OF*0..]->(:Comment)-[:REPLY_OF]->(:Post)<-[:CONTAINER_OF]-(forum)
      RETURN forum.title AS forum, count(comment) AS commentCount
      ORDER BY commentCount DESC

      Post counts per forum:
      MATCH (person:Person)-[:KNOWS*1..2]-(friend:Person)
      WHERE person.id={person_id}
      MATCH (friend)<-[membership:HAS_MEMBER]-(forum:Forum)
      WHERE membership.joinDate>{join_date}
      MATCH (friend)<-[:HAS_CREATOR]-(post:Post)<-[:CONTAINER_OF]-(forum)
      RETURN forum.title AS forum, count(post) AS postCount
      ORDER BY postCount DESC
  • 32. Example: Q5 - DEX
      // Look up the start person.
      v.setLongVoid(personId);
      long personOID = graph.findObject(personId, v);
      // Friends and friends of friends, excluding the start person.
      Objects friends = graph.neighbors(personOID, knows, EdgesDirection.Outgoing);
      Objects allFriends = graph.neighbors(friends, knows, EdgesDirection.Outgoing);
      allFriends.union(friends);
      allFriends.remove(personOID);
      friends.close();
      // Memberships of those friends with a join date on or after the given date.
      Objects members = graph.explode(allFriends, hasMember, EdgesDirection.Ingoing);
      v.setTimestampVoid(date);
      Objects candidate = graph.select(joinDate, Condition.GreaterEqual, v, members);
      Objects finalSelection = graph.tails(candidate);
      candidate.close();
      members.close();
      // Posts created by the friends.
      Objects posts = graph.neighbors(allFriends, hasCreator, EdgesDirection.Ingoing);
      ObjectsIterator iterator = finalSelection.iterator();
      while (iterator.hasNext()) {
        long oid = iterator.next();
        Container c = new Container();
        Objects postsGroup = graph.neighbors(oid, containerOf, EdgesDirection.Outgoing);
        Objects moderators = graph.neighbors(oid, hasModerator, EdgesDirection.Outgoing);
        long moderatorOid = moderators.any();
        moderators.close();
        // Drop the moderator's posts, then keep only posts created by the friends.
        Objects postsModerator = graph.neighbors(moderatorOid, hasCreator, EdgesDirection.Ingoing);
        postsGroup.difference(postsModerator);
        postsModerator.close();
        postsGroup.intersection(posts);
        long count = postsGroup.size();
        if (count > 0) {
          graph.getAttribute(oid, forumId, v);
          c.row[0] = db.getForumURI(v.getLong());
          c.compare2 = String.valueOf(v.getLong());
          c.row[1] = String.valueOf(count);
          c.compare = count;
          results.add(c);
        }
        postsGroup.close();
      }
  • 34. SNB Software • SNB data – Dataset generator – Dataset Validator – SN Graph Analyzer • SNB Query drivers – BSBM based – YCSB based
  • 35. Some Experiments • Virtuoso (RDF) – 100k users over a 3-year period (3.3 billion triples, 60GB) – Ten SPARQL query mixes – 4 x Intel Xeon 2.30GHz CPU, 193 GB of RAM • DEX (Graph Database) – Validation setup: 10k users over 3 years (19GB) – Validation query set and parameters (API-based) – 2 x Intel Xeon 2.40GHz CPU, 128 GB of RAM
  • 36. Virtuoso Interactive Workload • Some queries could not be considered as truly interactive – e.g. Q4, Q5 and Q9 – … still all queries are very interesting challenges • “Irregular” data distribution reflecting the reality of the SN – … but complicates the selection of query parameters
  • 37. Exploration in Scale • 3.3 bn RDF triples per 100K users, 24G in triples, 36G in literals • 2/3 of data in interactive working set, 1/4 in BI working set • scale out becomes increasingly necessary after 1M • 10-100M users are data center scales – as in real social networks – larger scales will favor space efficient data models, e.g. column store with a schema, but – larger scales also have greater need for schema-last features
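A back-of-the-envelope projection makes the scale claims concrete. The sketch below takes the two figures from the slides (3.3 billion triples and 24G + 36G = 60G of data per 100K users) and extrapolates linearly; linear growth with the number of users is an assumption of this sketch, not something the slides state.

```python
# Figures taken from the slides, for a dataset of 100K users.
TRIPLES_PER_100K_USERS = 3_300_000_000
BYTES_PER_100K_USERS = 60 * 10**9  # 24G in triples + 36G in literals

BASE_USERS = 100_000


def projected_size(users):
    """Return (triples, bytes) for `users`, assuming linear scaling (an assumption)."""
    triples = TRIPLES_PER_100K_USERS * users // BASE_USERS
    size_bytes = BYTES_PER_100K_USERS * users // BASE_USERS
    return triples, size_bytes


if __name__ == "__main__":
    for users in (100_000, 1_000_000, 10_000_000, 100_000_000):
        triples, size_bytes = projected_size(users)
        print(f"{users:>11,} users: {triples / 1e9:>8,.1f} bn triples, "
              f"{size_bytes / 1e12:>6,.2f} TB")
```

Under that assumption each user accounts for roughly 33,000 triples and 600 KB, so 1M users already means about 600 GB, beyond the 193 GB of RAM used in the Virtuoso experiment.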
  • 38. DEX Interactive Workload • Query validation (no SPARQL) • Identified some implementation choke points • New optimizations implemented and tested
  • 40. Ongoing Work • Finishing Interactive workload – updates (transactional) – substitution parameters • New BI and Graph Analytical Workloads • Data Generator Improvements – improve dictionaries and distributions for BI – Scale factors and dataset (SN graph) validation • Query Drivers – Parallel update generator • Auditing Rules for SNB
  • 41. Pointers • Code & Queries: github.com/ldbc – ldbc_socialnet_bm • ldbc_socialnet_dbgen • ldbc_socialnet_qgen • Wiki: ldbc.eu:8090/display/TUC – Background & Discussions + Detailed report: ldbc.eu:8090/download/attachments/4325436/LDBC_S • LDBC Technical User Community (TUC) meeting: – Thursday April 3, CWI Amsterdam (see wiki – next week)

Editor's Notes

  1. Sources for generating edges correlated with property values