Uploaded by DBOnto

Querying Distributed RDF Graphs: The Effects of Partitioning Poster

Abstract: Web-scale RDF datasets are increasingly processed using distributed RDF data stores built on top of a cluster of shared-nothing servers. Such systems critically rely on their data partitioning scheme and query answering scheme, the goal of which is to facilitate correct and efficient query processing. Existing data partitioning schemes are commonly based on hashing or graph partitioning techniques. The latter techniques split a dataset in a way that minimises the number of connections between the resulting subsets, thus reducing the need for communication between servers; however, to facilitate efficient query answering, considerable duplication of data at the intersection between subsets is often needed. Building upon the known graph partitioning approaches, in this paper we present a novel data partitioning scheme that employs minimal duplication and keeps track of the connections between partition elements; moreover, we propose a query answering scheme that uses this additional information to correctly answer all queries. We show experimentally that, on certain well-known RDF benchmarks, our data partitioning scheme often allows more answers to be retrieved without distributed computation than the known schemes, and we show that our query answering scheme can efficiently answer many queries.
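To make the hashing-based baseline mentioned in the abstract concrete, the following is a minimal sketch of subject hashing, assuming triples are plain (subject, predicate, object) tuples of strings. The function names (`server_for`, `subject_hash_partition`, `is_local`), the use of CRC32 as the hash, and the sample data are illustrative assumptions, not taken from the paper. It shows why an answer built from triples that share a subject (a star query) always sits on one server, while an answer that follows a link from one subject to another may not.

```python
import zlib

def server_for(resource, num_servers):
    """Map a resource name to a server index with a deterministic hash (assumed: CRC32)."""
    return zlib.crc32(resource.encode("utf-8")) % num_servers

def subject_hash_partition(triples, num_servers):
    """Store each triple on the server selected by hashing its subject."""
    parts = [set() for _ in range(num_servers)]
    for s, p, o in triples:
        parts[server_for(s, num_servers)].add((s, p, o))
    return parts

def is_local(answer_triples, parts):
    """True if a single server holds every triple the answer was built from."""
    needed = set(answer_triples)
    return any(needed <= part for part in parts)

# Hypothetical sample data, for illustration only.
triples = [
    ("alice", "memberOf", "dept1"),
    ("alice", "advisor", "bob"),
    ("bob", "memberOf", "dept1"),
    ("bob", "teaches", "course1"),
]
parts = subject_hash_partition(triples, num_servers=4)

star = [triples[0], triples[1]]  # both triples have subject "alice"
path = [triples[1], triples[3]]  # alice -> bob -> course1

print(is_local(star, parts))  # same subject, so always on one server
print(is_local(path, parts))  # depends on where "alice" and "bob" hash
```

Each triple is stored exactly once, so this scheme duplicates no data; the price is that any answer spanning two subjects is local only when both subjects happen to hash to the same server.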

Querying Distributed RDF Graphs: The Effects of Partitioning
Anthony Potter, Boris Motik, Ian Horrocks
Challenges
Create a distributed, cloud-based DBMS for large
RDF graphs. The two main challenges are:
• How to partition data across a cluster
• How to answer queries over a cluster
Aims
• Maximise number of local answers to common queries
• Recognise local answers
• Minimise storage overhead

Local Answer
A local answer can be evaluated fully on a single machine:
• No network communication
• Fast evaluation

RDF Graph Partitioning
• Many approaches to partitioning:
  § Subject hashing
  § Graph-based
  § Semantic hashing
• Partitioning scheme affects the number of local answers
• We propose a novel graph-based partitioning scheme:
  § Minimal storage overhead
  § Common (star) queries fully local
[Figure: Subject Hashing vs. Graph-based; labels: Partition element 1, Partition element 2]

Wildcard *
Introduction of a new wildcard resource:
• Represents all external resources
• Tracks connections between partition elements
Used in our novel query answering scheme:
• Reduces the number of intermediate answers
• Recognises local answers

Evaluation
RDFox: Our approach
SHAPE: Semantic hash partitioning with n-hop duplication
Hash: Subject hashing

LUBM: Percentage of Local Answers
System  Q2       Q8       Q9       Q11      Q12      Qc
RDFox   100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
SHAPE   100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
Hash    0.44%    4.96%    0.23%    5.80%    0.00%    0.04%

SP2B: Percentage of Local Answers
System  Q4      Q5      Q6       Q7      Q8
RDFox   95.95%  73.00%  99.90%   92.41%  91.45%
SHAPE   95.23%  9.72%   100.00%  41.97%  73.72%
Hash    0.01%   0.77%   0.25%    0.08%   0.26%

Storage Overhead
Dataset  RDFox  SHAPE   Hash
LUBM     3.60%  84.23%  0.00%
SP2B     0.60%  38.63%  0.00%

Conclusions
Our approach:
• Greater percentage of local answers compared to competitors
• Minimal storage overhead
• Recognises local answers

Future Work
• Develop efficient query answering scheme with wildcards
• Implement distributed DBMS on top of RDFox
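The poster does not spell out how the wildcard resource is stored, so the following is only one possible reading, sketched under stated assumptions. It assumes a mapping `owner` from each resource to a partition element is already given (for example by a graph partitioner), that each triple is kept in full in its subject's element, and that a triple crossing into another element leaves a marker triple `("*", p, o)` in the object's element. The names `WILDCARD`, `partition_with_wildcard`, `storage_overhead` and `is_local_answer`, the overhead formula, and the sample data are all hypothetical.

```python
WILDCARD = "*"  # assumed marker standing for "some resource outside this element"

def partition_with_wildcard(triples, owner):
    """Build one triple set per partition element from a resource -> element map.

    Assumption: a triple lives in its subject's element; if the object belongs
    to another element, that element records (WILDCARD, p, o) as a connection.
    """
    elements = [set() for _ in range(max(owner.values()) + 1)]
    for s, p, o in triples:
        i, j = owner[s], owner[o]
        elements[i].add((s, p, o))
        if i != j:
            # All external subjects collapse into one wildcard triple per (p, o).
            elements[j].add((WILDCARD, p, o))
    return elements

def storage_overhead(triples, elements):
    """Extra stored triples relative to the original dataset (assumed definition)."""
    original = len(set(triples))
    return (sum(len(e) for e in elements) - original) / original

def is_local_answer(matched_triples):
    """An answer that touches no wildcard triple needed no external resource."""
    return all(WILDCARD not in (s, o) for s, _, o in matched_triples)

# Hypothetical sample data: resources a, b in element 0; c, d in element 1.
owner = {"a": 0, "b": 0, "c": 1, "d": 1}
triples = [
    ("a", "knows", "b"),
    ("b", "knows", "c"),
    ("c", "worksAt", "d"),
    ("a", "worksAt", "d"),
    ("b", "worksAt", "d"),
]
elements = partition_with_wildcard(triples, owner)
print(sorted(elements[1]))
print(storage_overhead(triples, elements))
```

In this reading, a match inside element 1 that binds a variable to `*` signals that the full answer involves another element and needs distributed evaluation, whereas a match without `*` can be reported as local straight away. Collapsing every external subject into a single wildcard triple per predicate and object is what keeps the added storage small in the sketch.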

