SlideShare a Scribd company logo
1 of 24
Download to read offline
LDBC Social Network
Benchmark
Interactive Workload
Arnau Prat
DAMA – UPC
Summary of SNB-Interactive
● Simple but challenging interactive queries on top of a social
network site
– Interactive queries
– Flexible: Declarative and API based systems
– Latency and throughput are both important
– Easy to use
● All software and docs at https://github.com/ldbc
Table of Contents
● LDBC SNB Datage
● LDBC SNB Interactive Queries
● LDBC Workload Driver
● Conclusions
LDBC SNB Datagen
● Generates a realistic social network with the Facebook
degree distribution (persons, groups, posts, likes, etc.)
– Correlated graph → Similar people have a larger probability to
connected, correlated attributes, etc.
LDBC SNB Datagen
● Generates a realistic social network with the Facebook
degree distribution (persons, groups, posts, likes, etc.)
– Correlated graph → Similar people have a larger probability to
connected, correlated attributes, etc.
– Event driven activity volume
LDBC SNB Datagen
● Generates a realistic social network with the Facebook
degree distribution (persons, groups, posts, likes, etc.)
– Correlated graph → Similar people have a larger probability to
connected, correlated attributes, etc.
– Event driven activity volume
– Scalable
LDBC SNB Datagen
● Generates a realistic social network with the Facebook
degree distribution (persons, groups, posts, likes, etc.)
– Correlated graph → Similar people have a larger probability to
connected, correlated attributes, etc.
– Event driven activity volume
– Scalable
– Deterministic → Allows a fair comparison between SUTs and
reproducibility of benchmark executions
LDBC SNB Datagen
● Scale Factors
– 1,3,10,30,100,300,1000
– Based on the size of the dataset on dist in CSV format
SF Relations Persons Messages Activity Size
SF1 20M 11K 3M 3 years 1GB
SF10 200M 73K 30M 3 years 10GB
SF100 2000M 499K 300M 3 years 100GB
SF1000 20000M 3600K 3000M 3 years 1000GB
* approximated numbers
LDBC SNB Datagen
● 90% of the network is output as CSV to be bulk loaded
● The rest 10% is output as update streams
● Substitution parameters for each complex read query type
– Parameter binding to reduce variability between queries
LDBC SNB Interactive queries
● 14 Complex reads ( interactive yet complex, target choke-
points ):
– Query 6: Given a start Person and some Tag, find the other Tags
that occur together with this Tag on Posts that were created by
start Person’s friends and friends of friends
– Query 14: Given two Persons, find all (unweighted) shortest
paths between these two Persons, in the subgraph induced by
the Knows relationship. Then, for each path calculate a weight.
The nodes in the path are Persons, and the weight of a path is the
sum of weights between every pair of consecutive Person nodes
in the path. The weight for a pair of Persons is calculated such
that every reply (by one of the Persons) to a Post (by the other
Person) contributes 1.0, and every reply (by ones of the Persons)
to a Comment (by the other Person) contributes 0.5.
LDBC SNB Interactive queries
● 7 Short reads (balance read/write ratio of workload. mimic
user behavior):
– Given a start Person, retrieve their first name, last name,
birthday, IP address, browser, and city of residence
– Given a start Person, retrieve all of their friends, and the date at
which they became friends
– Given a Message (Post or Comment), retrieve the (1-hop)
Comments that reply to it. In addition, return a boolean flag
indicating if the author of the reply knows the author of the
original message. If author is same as original author, return
false for "knows" flag
LDBC SNB Interactive queries
● 8 Updates:
– Add Person
– Add Knows
– Add Post
– Add Post Like
– Add Comment
– Add Comment Like
– Add Group
– Add Group Membership
LDBC Workload Driver
● Responsible of generating the Workload = Stream of
operations
– scheduled start time (real time)
– type (e.g. ComplexQuery1)
– parameters (e.g. Person ID)
LDBC Workload Driver
● Updates
– substitution parameters read from datagen update streams
– time stamps ("simulation time") read from datagen update
streams
LDBC Workload Driver
● Complex Reads
– substitution parameters read from datagen files
– scheduled start times assigned by driver as multiples of update
frequency
● e.g. for every 132 Updates the driver generates 1
ComplexQuery1
LDBC Workload Driver
●
two groups of Short Reads: "person centric" & "message centric"
●
after each Complex Read a sequence of Short Reads is executed
– sequence appoximates walk through network
– at each step there is a probability of taking another step, which
decreases at each step
– steps consist of either all "person centric" or all "message centric"
operations
● e.g., (person centric operations)->(flip coin)->(message centric
operations)->(flip coin)...
– mimics user "following links"/Facebook-stalking :-)
– substitution parameters read taken from results of recent Complex
Reads and Short reads
LDBC Workload Driver - Execution
● Driver schedules operations as close to their scheduled start times
as possible
● "Time Compression Ratio" used to configure target throughput
● Vendor provides callbacks that driver use to execute operations
● Number of worker threads configurable
● For every executed operation, driver logs the following (used for
auditing)
– operation type
– scheduled start time
– actual start time
– runtime
LDBC Workload Driver – Validation Mode
● Given a vendor implementation & workload, driver generates
validation datasets
● Stream of operations + their results
● Validation datasets can then be used to validate other
vendor implementations (e.g. compare results)
● Official validation datasets are provided by the LDBC SNB
LDBC Workload Driver - Example
● SF10, tcr 0.5, 1000k
Query Count
Q1 36
Q2 25
Q3 10
Q4 26
Q5 14
Q6 4
Q7 16
Query Count
Q8 63
Q9 3
Q10 27
Q11 49
Q12 21
Q13 49
Q14 19
Query Count
S1 451
S2 451
S3 451
S4 444
S5 444
S6 444
S7 444
Query Count
U1 1
U2 124
U3 153
U4 3
Query Count
U5 244
U6 18
U7 74
U8 22
LDBC Workload Driver - Example
Query Frequency
Q1 26
Q2 37
Q3 106
Q4 36
Q5 72
Q6 316
Q7 48
Q8 9
Q9 384
Q10 37
Q11 20
Q12 44
Q13 19
Q14 49
● Query mix
for SF10
Query Frequency
Q1 26
Q2 37
Q3 142
Q4 46
Q5 84
Q6 580
Q7 32
Q8 3
Q9 705
Q10 44
Q11 24
Q12 44
Q13 19
Q14 49
● Query mix
for SF300
LDBC Workload Driver - Example
SF1 Mean SF3Mean SF10Mean SF30Mean
0
200
400
600
800
1000
1200
1400
1600
1800
Q1
Q2
Q3
Q4
Q5
Q6
Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14
Milliseconds
LDBC Workload Driver - Rules
● Benchmark executions must meet the following rules to be
valid:
– queries must pass validation datasets
– at most 5% of the queries actual start time can be one second
greater than scheduled start time
– must comprise at least 2 hours of simulation time
– at any point, the test machine is disconnected and those
commited must be persistent
● Performance metrics are:
– latencies for each query
– throughput
Conclusions
● SNB Interactive on top of synthetic Social Network data
● 3 Types of queries:
– Complex Reads
– Short Reads
– Updates
● The driver builds a query wich mimics a user behavior
● Both latency and throughput are important. Persistence is
mandatory
● All software is open source. We are open for contributions!
Thank you

More Related Content

Viewers also liked

E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...Ioan Toma
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolutionIoan Toma
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczIoan Toma
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in productionIoan Toma
 
LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczIoan Toma
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyIoan Toma
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLDBC council
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesIoan Toma
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingIoan Toma
 

Viewers also liked (9)

E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
E-Commerce and Graph-driven Applications: Experiences and Optimizations while...
 
Ldbc spb 2.0 evolution
Ldbc spb 2.0 evolutionLdbc spb 2.0 evolution
Ldbc spb 2.0 evolution
 
Keynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter BonczKeynote IDEAS2013 - Peter Boncz
Keynote IDEAS2013 - Peter Boncz
 
20 billion triples in production
20 billion triples in production20 billion triples in production
20 billion triples in production
 
LDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter BonczLDBC 6th TUC Meeting conclusions by Peter Boncz
LDBC 6th TUC Meeting conclusions by Peter Boncz
 
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba PeyGRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
GRAPH-TA 2013 - RDF and Graph benchmarking - Jose Lluis Larriba Pey
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web ServicesSADI: A design-pattern for “native” Linked-Data Semantic Web Services
SADI: A design-pattern for “native” Linked-Data Semantic Web Services
 
LDBC SNB Benchmark Auditing
LDBC SNB Benchmark AuditingLDBC SNB Benchmark Auditing
LDBC SNB Benchmark Auditing
 

Similar to Social Network Benchmark Interactive Workload

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015Ioan Toma
 
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...Gábor Szárnyas
 
Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)Viktor Turskyi
 
Clickstream data with spark
Clickstream data with sparkClickstream data with spark
Clickstream data with sparkMarissa Saunders
 
SEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSemLib Project
 
Payola ESWC 2014 demo poster
Payola ESWC 2014 demo posterPayola ESWC 2014 demo poster
Payola ESWC 2014 demo posterJiří Helmich
 
Joint search by social and spatial proximity
Joint search by social and spatial proximityJoint search by social and spatial proximity
Joint search by social and spatial proximityieeepondy
 
Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Preferred Networks
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Fabian Hueske
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsMike Broberg
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to ProductionMostafa Majidpour
 
Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...openCypher
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC council
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaDataWorks Summit
 
Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Kira
 
Network Test Automation 2015-04-23 #npstudy
Network Test Automation 2015-04-23 #npstudyNetwork Test Automation 2015-04-23 #npstudy
Network Test Automation 2015-04-23 #npstudyHiroshi Ota
 
IT6701 Information Management - Unit I
IT6701 Information Management - Unit I  IT6701 Information Management - Unit I
IT6701 Information Management - Unit I pkaviya
 

Similar to Social Network Benchmark Interactive Workload (20)

FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
The LDBC Social Network Benchmark Interactive Workload - SIGMOD 2015
 
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
An early look at the LDBC Social Network Benchmark's Business Intelligence wo...
 
Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)Yet another json rpc library (mole rpc)
Yet another json rpc library (mole rpc)
 
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge BasesLOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
LOD2: State of Play WP2 - Storing and Querying Very Large Knowledge Bases
 
Clickstream data with spark
Clickstream data with sparkClickstream data with spark
Clickstream data with spark
 
SEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentationSEMLIB Final Conference | DERI presentation
SEMLIB Final Conference | DERI presentation
 
Payola ESWC 2014 demo poster
Payola ESWC 2014 demo posterPayola ESWC 2014 demo poster
Payola ESWC 2014 demo poster
 
Joint search by social and spatial proximity
Joint search by social and spatial proximityJoint search by social and spatial proximity
Joint search by social and spatial proximity
 
Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013Jubatus talk at HadoopSummit 2013
Jubatus talk at HadoopSummit 2013
 
cyclades eswc2016
cyclades eswc2016cyclades eswc2016
cyclades eswc2016
 
Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...Streaming SQL to unify batch and stream processing: Theory and practice with ...
Streaming SQL to unify batch and stream processing: Theory and practice with ...
 
SQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 QuestionsSQL to NoSQL: Top 6 Questions
SQL to NoSQL: Top 6 Questions
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...Academic research on graph processing: connecting recent findings to industri...
Academic research on graph processing: connecting recent findings to industri...
 
LDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status updateLDBC 8th TUC Meeting: Introduction and status update
LDBC 8th TUC Meeting: Introduction and status update
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Tutorial 8 (web graph models)
Tutorial 8 (web graph models)Tutorial 8 (web graph models)
Tutorial 8 (web graph models)
 
Network Test Automation 2015-04-23 #npstudy
Network Test Automation 2015-04-23 #npstudyNetwork Test Automation 2015-04-23 #npstudy
Network Test Automation 2015-04-23 #npstudy
 
IT6701 Information Management - Unit I
IT6701 Information Management - Unit I  IT6701 Information Management - Unit I
IT6701 Information Management - Unit I
 

More from Ioan Toma

Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXIoan Toma
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionIoan Toma
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesIoan Toma
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsIoan Toma
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphIoan Toma
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphIoan Toma
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)Ioan Toma
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...Ioan Toma
 

More from Ioan Toma (8)

Parallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOXParallel and incremental materialisation of RDF/DATALOG in RDFOX
Parallel and incremental materialisation of RDF/DATALOG in RDFOX
 
MODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service SelectionMODAClouds Decision Support System for Cloud Service Selection
MODAClouds Decision Support System for Cloud Service Selection
 
MarkLogic Overview and Use Cases
MarkLogic Overview and Use CasesMarkLogic Overview and Use Cases
MarkLogic Overview and Use Cases
 
Towards Temporal Graph Management and Analytics
Towards Temporal Graph Management and AnalyticsTowards Temporal Graph Management and Analytics
Towards Temporal Graph Management and Analytics
 
Querying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge GraphQuerying the Wikidata Knowledge Graph
Querying the Wikidata Knowledge Graph
 
Lighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on GiraphLighthouse: Large-scale graph pattern matching on Giraph
Lighthouse: Large-scale graph pattern matching on Giraph
 
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
HP Labs: Titan DB on LDBC SNB interactive by Tomer Sagi (HP)
 
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
SPIMBENCH: A scalable, Schema-Aware Instance Matching Benchmark for the Seman...
 

Recently uploaded

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 

Social Network Benchmark Interactive Workload

  • 1. LDBC Social Network Benchmark Interactive Workload Arnau Prat DAMA – UPC
  • 2. Summary of SNB-Interactive ● Simple but challenging interactive queries on top of a social network site – Interactive queries – Flexible: Declarative and API based systems – Latency and throughput are both important – Easy to use ● All software and docs at https://github.com/ldbc
  • 3. Table of Contents ● LDBC SNB Datage ● LDBC SNB Interactive Queries ● LDBC Workload Driver ● Conclusions
  • 4. LDBC SNB Datagen ● Generates a realistic social network with the Facebook degree distribution (persons, groups, posts, likes, etc.) – Correlated graph → Similar people have a larger probability to connected, correlated attributes, etc.
  • 5. LDBC SNB Datagen ● Generates a realistic social network with the Facebook degree distribution (persons, groups, posts, likes, etc.) – Correlated graph → Similar people have a larger probability to connected, correlated attributes, etc. – Event driven activity volume
  • 6. LDBC SNB Datagen ● Generates a realistic social network with the Facebook degree distribution (persons, groups, posts, likes, etc.) – Correlated graph → Similar people have a larger probability to connected, correlated attributes, etc. – Event driven activity volume – Scalable
  • 7. LDBC SNB Datagen ● Generates a realistic social network with the Facebook degree distribution (persons, groups, posts, likes, etc.) – Correlated graph → Similar people have a larger probability to connected, correlated attributes, etc. – Event driven activity volume – Scalable – Deterministic → Allows a fair comparison between SUTs and reproducibility of benchmark executions
  • 8. LDBC SNB Datagen ● Scale Factors – 1,3,10,30,100,300,1000 – Based on the size of the dataset on dist in CSV format SF Relations Persons Messages Activity Size SF1 20M 11K 3M 3 years 1GB SF10 200M 73K 30M 3 years 10GB SF100 2000M 499K 300M 3 years 100GB SF1000 20000M 3600K 3000M 3 years 1000GB * approximated numbers
  • 9. LDBC SNB Datagen ● 90% of the network is output as CSV to be bulk loaded ● The rest 10% is output as update streams ● Substitution parameters for each complex read query type – Parameter binding to reduce variability between queries
  • 10. LDBC SNB Interactive queries ● 14 Complex reads ( interactive yet complex, target choke- points ): – Query 6: Given a start Person and some Tag, find the other Tags that occur together with this Tag on Posts that were created by start Person’s friends and friends of friends – Query 14: Given two Persons, find all (unweighted) shortest paths between these two Persons, in the subgraph induced by the Knows relationship. Then, for each path calculate a weight. The nodes in the path are Persons, and the weight of a path is the sum of weights between every pair of consecutive Person nodes in the path. The weight for a pair of Persons is calculated such that every reply (by one of the Persons) to a Post (by the other Person) contributes 1.0, and every reply (by ones of the Persons) to a Comment (by the other Person) contributes 0.5.
  • 11. LDBC SNB Interactive queries ● 7 Short reads (balance read/write ratio of workload. mimic user behavior): – Given a start Person, retrieve their first name, last name, birthday, IP address, browser, and city of residence – Given a start Person, retrieve all of their friends, and the date at which they became friends – Given a Message (Post or Comment), retrieve the (1-hop) Comments that reply to it. In addition, return a boolean flag indicating if the author of the reply knows the author of the original message. If author is same as original author, return false for "knows" flag
  • 12. LDBC SNB Interactive queries ● 8 Updates: – Add Person – Add Knows – Add Post – Add Post Like – Add Comment – Add Comment Like – Add Group – Add Group Membership
  • 13. LDBC Workload Driver ● Responsible of generating the Workload = Stream of operations – scheduled start time (real time) – type (e.g. ComplexQuery1) – parameters (e.g. Person ID)
  • 14. LDBC Workload Driver ● Updates – substitution parameters read from datagen update streams – time stamps ("simulation time") read from datagen update streams
  • 15. LDBC Workload Driver ● Complex Reads – substitution parameters read from datagen files – scheduled start times assigned by driver as multiples of update frequency ● e.g. for every 132 Updates the driver generates 1 ComplexQuery1
  • 16. LDBC Workload Driver ● two groups of Short Reads: "person centric" & "message centric" ● after each Complex Read a sequence of Short Reads is executed – sequence appoximates walk through network – at each step there is a probability of taking another step, which decreases at each step – steps consist of either all "person centric" or all "message centric" operations ● e.g., (person centric operations)->(flip coin)->(message centric operations)->(flip coin)... – mimics user "following links"/Facebook-stalking :-) – substitution parameters read taken from results of recent Complex Reads and Short reads
  • 17. LDBC Workload Driver - Execution ● Driver schedules operations as close to their scheduled start times as possible ● "Time Compression Ratio" used to configure target throughput ● Vendor provides callbacks that driver use to execute operations ● Number of worker threads configurable ● For every executed operation, driver logs the following (used for auditing) – operation type – scheduled start time – actual start time – runtime
  • 18. LDBC Workload Driver – Validation Mode ● Given a vendor implementation & workload, driver generates validation datasets ● Stream of operations + their results ● Validation datasets can then be used to validate other vendor implementations (e.g. compare results) ● Official validation datasets are provided by the LDBC SNB
  • 19. LDBC Workload Driver - Example ● SF10, tcr 0.5, 1000k Query Count Q1 36 Q2 25 Q3 10 Q4 26 Q5 14 Q6 4 Q7 16 Query Count Q8 63 Q9 3 Q10 27 Q11 49 Q12 21 Q13 49 Q14 19 Query Count S1 451 S2 451 S3 451 S4 444 S5 444 S6 444 S7 444 Query Count U1 1 U2 124 U3 153 U4 3 Query Count U5 244 U6 18 U7 74 U8 22
  • 20. LDBC Workload Driver - Example Query Frequency Q1 26 Q2 37 Q3 106 Q4 36 Q5 72 Q6 316 Q7 48 Q8 9 Q9 384 Q10 37 Q11 20 Q12 44 Q13 19 Q14 49 ● Query mix for SF10 Query Frequency Q1 26 Q2 37 Q3 142 Q4 46 Q5 84 Q6 580 Q7 32 Q8 3 Q9 705 Q10 44 Q11 24 Q12 44 Q13 19 Q14 49 ● Query mix for SF300
  • 21. LDBC Workload Driver - Example SF1 Mean SF3Mean SF10Mean SF30Mean 0 200 400 600 800 1000 1200 1400 1600 1800 Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Milliseconds
  • 22. LDBC Workload Driver - Rules ● Benchmark executions must meet the following rules to be valid: – queries must pass validation datasets – at most 5% of the queries actual start time can be one second greater than scheduled start time – must comprise at least 2 hours of simulation time – at any point, the test machine is disconnected and those commited must be persistent ● Performance metrics are: – latencies for each query – throughput
  • 23. Conclusions ● SNB Interactive on top of synthetic Social Network data ● 3 Types of queries: – Complex Reads – Short Reads – Updates ● The driver builds a query wich mimics a user behavior ● Both latency and throughput are important. Persistence is mandatory ● All software is open source. We are open for contributions!