Querying Distributed RDF Graphs: The Effects of Partitioning 
Anthony Potter, Boris Motik, Ian Horrocks
Challenges 
Create a distributed, cloud-based DBMS for large 
RDF graphs. The two main challenges are: 
• How to partition data across a cluster 
• How to answer queries over a cluster 
LUBM: Percentage of Local Answers 
System  Q2       Q8       Q9       Q11      Q12      Qc
RDFox   100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
SHAPE   100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
Hash      0.44%    4.96%    0.23%    5.80%    0.00%    0.04%
SP2B: Percentage of Local Answers 
System  Q4      Q5      Q6       Q7      Q8
RDFox   95.95%  73.00%   99.90%  92.41%  91.45%
SHAPE   95.23%   9.72%  100.00%  41.97%  73.72%
Hash     0.01%   0.77%    0.25%   0.08%   0.26%
Storage Overhead 
Benchmark  RDFox  SHAPE   Hash
LUBM       3.60%  84.23%  0.00%
SP2B       0.60%  38.63%  0.00%
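The poster does not spell out how storage overhead is measured. A minimal sketch, assuming it is the percentage of extra triples stored across all partition elements relative to the original graph (the function name and the toy data are hypothetical):

```python
def storage_overhead(original_triples, partition_elements):
    """Percentage of extra triples stored once the graph is partitioned.

    Assumption: overhead = (stored - original) / original * 100, where
    `stored` counts every triple once per partition element that holds it.
    """
    original = len(set(original_triples))
    stored = sum(len(set(element)) for element in partition_elements)
    return 100.0 * (stored - original) / original


# Toy graph of four triples; one triple is duplicated across two elements.
graph = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d"), ("d", "p", "a")]
with_duplication = [graph[:3], graph[2:]]
without_duplication = [graph[:2], graph[2:]]

print(storage_overhead(graph, with_duplication))     # 25.0
print(storage_overhead(graph, without_duplication))  # 0.0
```

Under this reading, a scheme that stores each triple exactly once, such as plain subject hashing, has zero overhead, which matches the Hash column above.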
RDF Graph Partitioning 
Evaluation 
Wildcard * 
Conclusions 
Our approach:
• Greater percentage of local answers compared to competitors
• Minimal storage overhead
• Recognises local answers
Future Work 
• Develop an efficient query answering scheme with wildcards
• Implement a distributed DBMS on top of RDFox
Local Answer 
A local answer can be evaluated fully on a single machine:
• No network communication
• Fast evaluation
[Figure: subject hashing versus graph-based partitioning, showing partition elements 1 and 2]
RDFox: Our approach
SHAPE: Semantic hash partitioning with n-hop duplication
Hash: Subject hashing
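To make the Hash baseline and the notion of a local answer concrete, here is a small sketch. It is an illustration only, not the RDFox or SHAPE implementation: the function names are hypothetical, and `zlib.crc32` stands in for whatever hash function a real system would use.

```python
import zlib
from collections import defaultdict


def subject_hash_partition(triples, num_machines):
    """Assign each triple to a machine by hashing its subject."""
    parts = defaultdict(set)
    for s, p, o in triples:
        machine = zlib.crc32(s.encode("utf-8")) % num_machines
        parts[machine].add((s, p, o))
    return dict(parts)


def is_local_answer(answer_triples, parts):
    """An answer is local if a single machine holds every triple it uses."""
    needed = set(answer_triples)
    return any(needed <= held for held in parts.values())


triples = [
    ("alice", "type", "Student"),
    ("alice", "memberOf", "dept1"),
    ("dept1", "subOrganizationOf", "univ1"),
]
parts = subject_hash_partition(triples, num_machines=4)

# Both triples share the subject "alice", so they land on the same machine.
star_answer = triples[:2]
print(is_local_answer(star_answer, parts))  # True

# This answer joins triples with different subjects; whether it is local
# depends on where the hash happens to place "alice" and "dept1".
path_answer = triples[1:]
print(is_local_answer(path_answer, parts))
```

Answers that join triples with different subjects are local under subject hashing only by chance, which is consistent with the low Hash percentages reported in the evaluation tables.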
Aims 
• Maximise the number of local answers to common queries
• Recognise local answers
• Minimise storage overhead
• Many approaches to partitioning:
  • Subject hashing
  • Graph-based
  • Semantic hashing
• Partitioning scheme affects the number of local answers
• We propose a novel graph-based partitioning scheme:
  • Minimal storage overhead
  • Common (star) queries fully local
Introduction of a new wildcard resource:
• Represents all external resources
• Tracks connections between partition elements
Used in our novel query answering scheme:
• Reduces the number of intermediate answers
• Recognises local answers
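One possible reading of the wildcard idea can be sketched as follows. This is an assumption-laden illustration, not the authors' exact construction: the names `WILDCARD`, `build_partition_elements` and `touches_wildcard` are hypothetical, `owner` is an assumed mapping from each resource to its partition element, and the sketch only shows the bookkeeping, not where the original cross-element triples are stored.

```python
WILDCARD = "*"


def build_partition_elements(triples, owner):
    """Split triples into partition elements, marking cross-element edges.

    A triple whose subject and object live in the same element is stored
    there unchanged. Otherwise each side records the edge with the
    external resource replaced by the wildcard.
    """
    elements = {index: set() for index in set(owner.values())}
    for s, p, o in triples:
        i, j = owner[s], owner[o]
        if i == j:
            elements[i].add((s, p, o))
        else:
            elements[i].add((s, p, WILDCARD))
            elements[j].add((WILDCARD, p, o))
    return elements


def touches_wildcard(element, resource):
    """True if `resource` has an edge to or from an external resource."""
    return any(
        (s == resource and o == WILDCARD) or (o == resource and s == WILDCARD)
        for s, _, o in element
    )


triples = [("a", "p", "b"), ("b", "q", "c")]
owner = {"a": 0, "b": 0, "c": 1}
elements = build_partition_elements(triples, owner)

print(touches_wildcard(elements[0], "a"))  # False
print(touches_wildcard(elements[0], "b"))  # True
```

In this sketch, a match that only involves resources with no wildcard edges cannot extend into another partition element, which hints at how local answers might be recognised without network communication.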
