STI Summit 2011 - DB vs RDF
Published in: Education

Transcript

  • 1. DB vs RDF: structure vs correlation
       + comments on data integration & SW
       Peter Boncz
       Senior Research Scientist @ CWI
       Lecturer @ Vrije Universiteit Amsterdam
       Architect & Co-founder, MonetDB
       Architect & Co-founder, VectorWise
       STI Summit, July 6, 2011, Riga
  • 2. Correlations
       Real data is highly correlated
         • (gender, location) ⇔ firstname
         • (profession, age) ⇔ income
       But… database systems assume attribute independence
         • wrong assumption leads to suboptimal query plan
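To make the independence assumption concrete, here is a minimal sketch of how an estimate built by multiplying per-attribute selectivities can drift from the truth on correlated data. The toy table, its column values, and the `selectivity` helper are all invented for illustration and do not come from the talk.

```python
# Toy table of (profession, age_band) rows; the values are invented for illustration.
rows = (
    [("student", "18-25")] * 40
    + [("student", "26-60")] * 10
    + [("engineer", "18-25")] * 5
    + [("engineer", "26-60")] * 45
)


def selectivity(table, predicate):
    """Fraction of rows in the table that satisfy the predicate."""
    return sum(1 for r in table if predicate(r)) / len(table)


sel_prof = selectivity(rows, lambda r: r[0] == "student")  # 50 of 100 rows
sel_age = selectivity(rows, lambda r: r[1] == "18-25")     # 45 of 100 rows

# Estimate under the attribute-independence assumption: multiply the two.
estimated = sel_prof * sel_age
# True selectivity of the conjunction, measured directly on the data.
actual = selectivity(rows, lambda r: r[0] == "student" and r[1] == "18-25")

print(f"independence estimate: {estimated:.3f}")  # 0.225
print(f"actual selectivity:    {actual:.3f}")     # 0.400
```

On this toy data the estimate is off by almost a factor of two; errors of this kind compound when several correlated predicates and joins are combined.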
  • 3. Example: co-authorship query
       DBLP: “find authors who published in VLDB, SIGMOD and Bioinformatics”
         SELECT p1.author
         FROM publications p1, publications p2, publications p3
         WHERE p1.venue = 'VLDB'
           AND p2.venue = 'SIGMOD'
           AND p3.venue = 'Bioinformatics'
           AND p1.author = p2.author
           AND p1.author = p3.author
           AND p2.author = p3.author
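The co-authorship query on this slide can be tried out with Python's built-in `sqlite3` module. This is only a sketch: the `publications(author, venue)` layout, the sample rows, and the `authors_in_both` helper are assumptions made for the example, and `DISTINCT` is added so each author is reported once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (author, venue) pair.
conn.execute("CREATE TABLE publications (author TEXT, venue TEXT)")
conn.executemany(
    "INSERT INTO publications VALUES (?, ?)",
    [
        ("alice", "VLDB"), ("alice", "SIGMOD"), ("alice", "Bioinformatics"),
        ("bob", "VLDB"), ("bob", "SIGMOD"),
        ("carol", "Bioinformatics"),
    ],
)

# Three-way self-join: one alias of the same table per venue.
query = """
SELECT DISTINCT p1.author
FROM publications p1, publications p2, publications p3
WHERE p1.venue = 'VLDB'
  AND p2.venue = 'SIGMOD'
  AND p3.venue = 'Bioinformatics'
  AND p1.author = p2.author
  AND p1.author = p3.author
"""
authors = [row[0] for row in conn.execute(query)]


def authors_in_both(venue_a, venue_b):
    """Count distinct authors that appear in both venues (hypothetical helper)."""
    sql = """
    SELECT COUNT(DISTINCT a.author)
    FROM publications a, publications b
    WHERE a.venue = ? AND b.venue = ? AND a.author = b.author
    """
    return conn.execute(sql, (venue_a, venue_b)).fetchone()[0]


print(authors)                                    # ['alice']
print(authors_in_both("VLDB", "SIGMOD"))          # 2
print(authors_in_both("VLDB", "Bioinformatics"))  # 1
```

The two pairwise counts differ even though both joins have the same shape. That difference in hit ratio is what an optimizer assuming independence cannot see.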
  • 4. Correlations: RDBMS vs RDF
       Schematic structure in RDF data is hidden in correlations, which are everywhere
       SPARQL leads to plans with
         • many self-joins
         • whose hit-ratio is correlated (with e.g. selections)
       Relational query optimizers do not have a clue
         → all self-joins look the same to them
         → random join order, bad query plans ☹
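As a rough illustration of why SPARQL plans are full of self-joins, the sketch below translates a star-shaped pattern (several triple patterns sharing one subject) into SQL over a single triple table. The `triples(s, p, o)` layout and the `star_to_sql` function are assumptions for this example, not how any particular RDF engine works.

```python
import sqlite3


def star_to_sql(patterns):
    """Translate a non-empty list of (predicate, object_or_None) patterns that
    share one subject into SQL over a hypothetical triples(s, p, o) table.
    Each pattern becomes one more alias of the same table, i.e. a self-join."""
    aliases = [f"t{i}" for i in range(len(patterns))]
    from_clause = ", ".join(f"triples {a}" for a in aliases)
    conds, params = [], []
    for alias, (pred, obj) in zip(aliases, patterns):
        conds.append(f"{alias}.p = ?")
        params.append(pred)
        if obj is not None:  # None means the object is an unbound variable
            conds.append(f"{alias}.o = ?")
            params.append(obj)
    # All patterns share the subject variable, so join every alias on s.
    conds += [f"{aliases[0]}.s = {a}.s" for a in aliases[1:]]
    sql = (
        f"SELECT DISTINCT {aliases[0]}.s FROM {from_clause} WHERE "
        + " AND ".join(conds)
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("pub1", "venue", "VLDB"), ("pub1", "author", "alice"),
        ("pub2", "venue", "SIGMOD"), ("pub2", "author", "bob"),
    ],
)

# Roughly: ?pub venue "VLDB" . ?pub author ?who
sql, params = star_to_sql([("venue", "VLDB"), ("author", None)])
subjects = [row[0] for row in conn.execute(sql, params)]
print(sql)
print(subjects)  # ['pub1']
```

Every additional triple pattern adds another alias of the same table, so the optimizer sees a chain of structurally identical joins unless it has statistics that distinguish them.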
  • 5. RDF Engines to the Next Level
       Challenge: solving the correlation problem. Ideas?
         • Interleave optimization and execution
         • Run-time sampling to detect true selectivities
         • Run-time query (re-)planning
         • Tackling when there are long join chains
         • Creating partial path indexes
         • Graph “cracking”: index build as side-effect of query
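One of the ideas listed here, run-time sampling to detect true selectivities, might look roughly like the sketch below: before committing to a join order, probe each candidate join with a small sample and run the most selective one first. The function names, the sample size, and the author-id ranges are illustrative assumptions, not a description of an actual engine.

```python
import random


def sampled_hit_ratio(left_keys, right_keys, sample_size, rng):
    """Estimate the fraction of left keys that find a join partner on the right.
    Assumes left_keys is a non-empty list."""
    sample = rng.sample(left_keys, min(sample_size, len(left_keys)))
    right = set(right_keys)
    return sum(1 for key in sample if key in right) / len(sample)


def order_joins(left_keys, candidates, sample_size=20, seed=0):
    """Return candidate names sorted by ascending sampled hit ratio
    (most selective join first), plus the measured ratios."""
    rng = random.Random(seed)
    ratios = {
        name: sampled_hit_ratio(left_keys, keys, sample_size, rng)
        for name, keys in candidates.items()
    }
    return sorted(ratios, key=ratios.get), ratios


# Invented author ids: most VLDB authors also publish in SIGMOD,
# very few of them publish in Bioinformatics.
vldb_authors = list(range(0, 100))
candidates = {
    "SIGMOD": list(range(0, 90)),
    "Bioinformatics": list(range(95, 200)),
}

order, ratios = order_joins(vldb_authors, candidates)
print(order)   # ['Bioinformatics', 'SIGMOD']
print(ratios)
```

Sampling costs a few probes per candidate but replaces a guessed selectivity with a measured one, which is the point of interleaving optimization with execution.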
  • 6. Stratos Idreos: database cracking
       Challenge: solving the correlation problem. Ideas?
         • Interleave optimization and execution
         • Run-time sampling to detect true selectivities
         • Run-time query (re-)planning
         • Tackling when there are long join chains
         • Creating partial path indexes
         • Graph “cracking”: index build as side-effect of query
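The slide only names database cracking, so the following is a simplified sketch of the general idea on a single column: each range query partitions the piece of the column it touches, and the index therefore builds up as a side effect of querying. The `CrackedColumn` class is hypothetical, and it partitions with list comprehensions for readability where a real implementation would swap elements in place.

```python
import bisect


class CrackedColumn:
    """Simplified cracker column: every query bound becomes a partition point."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column that queries reorder
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: index of first value >= pivots[i]

    def crack(self, pivot):
        """Reorder so values < pivot precede values >= pivot; return the split index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece between the neighbouring pivots needs partitioning.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.positions) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def select_range(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self.crack(low)
        end = self.crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 18])
first = column.select_range(5, 30)
second = column.select_range(10, 20)
print(sorted(first))   # [9, 13, 18, 27]
print(sorted(second))  # [13, 18]
print(column.pivots)   # [5, 10, 20, 30]
```

The second query only has to partition the four values between the bounds 5 and 30 left by the first query, so later queries touch ever smaller pieces.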
  • 7. RDF Engines to the Next Level
       Challenge: solving the correlation problem. Ideas?
         • Interleave optimization and execution
         • Run-time sampling to detect true selectivities
         • Run-time query (re-)planning
         • Tackling when there are long join chains
         • Creating partial path indexes
         • Graph “cracking”: index build as side-effect of query
         • “recycling” intermediate results
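“Recycling” intermediate results can be pictured as caching the output of each sub-plan so that a later query sharing that sub-plan reuses it. The sketch below assumes a tiny plan language of nested tuples (`scan`, `select`, `semijoin`) invented for this example. It ignores cache eviction and invalidation on updates, which a real system would need.

```python
class Recycler:
    """Caches the result of every evaluated sub-plan, keyed by the plan itself."""

    def __init__(self):
        self.cache = {}
        self.hits = 0

    def evaluate(self, plan, tables):
        """Evaluate a plan given as a nested tuple:
        ('scan', table_name)
        ('select', column, value, child_plan)
        ('semijoin', column, left_plan, right_plan)
        Cached result lists are shared, so callers must not mutate them."""
        if plan in self.cache:
            self.hits += 1
            return self.cache[plan]
        op = plan[0]
        if op == "scan":
            result = list(tables[plan[1]])
        elif op == "select":
            _, column, value, child = plan
            result = [r for r in self.evaluate(child, tables) if r[column] == value]
        elif op == "semijoin":
            _, column, left, right = plan
            left_rows = self.evaluate(left, tables)
            keys = {r[column] for r in self.evaluate(right, tables)}
            result = [r for r in left_rows if r[column] in keys]
        else:
            raise ValueError(f"unknown operator: {op}")
        self.cache[plan] = result
        return result


tables = {
    "publications": [
        {"author": "alice", "venue": "VLDB"},
        {"author": "alice", "venue": "SIGMOD"},
        {"author": "alice", "venue": "Bioinformatics"},
        {"author": "bob", "venue": "VLDB"},
        {"author": "bob", "venue": "SIGMOD"},
    ]
}

scan = ("scan", "publications")
vldb = ("select", "venue", "VLDB", scan)
q1 = ("semijoin", "author", vldb, ("select", "venue", "SIGMOD", scan))
q2 = ("semijoin", "author", vldb, ("select", "venue", "Bioinformatics", scan))

recycler = Recycler()
r1 = recycler.evaluate(q1, tables)
r2 = recycler.evaluate(q2, tables)
print([r["author"] for r in r1])  # ['alice', 'bob']
print([r["author"] for r in r2])  # ['alice']
print(recycler.hits)              # 3
```

The second query reuses both the scan and the VLDB selection computed for the first one, which is where the cache hits come from.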
  • 8. RDF Engines to the Next Level
       Benchmarking in LOD2:
         • Attempting to engage vendors to collaborate on “a TPC for RDF”
         • New and more challenging benchmarks
       “Suitably designed benchmarks drive progress”
       Benchmarks with:
         • complex query patterns on large data
         • geo and text data and queries
         • outside Linked Open Datasets that get joined to synthetic data
         • highly interlinked graph structures and queries on them
         • correlated query predicates
  • 9. Social Intelligence Benchmark (SIB)
       RDF-friendly benchmark simulating a huge social network
         • Social Graphs have understandable, interesting scenarios
         • Social Graphs are highly connected
         • LOD in the wild not yet
       + Exploiting knowledge bases from interlinked RDF datasets
         → not only synthetic data, also linking out to DBpedia
       conversation topics, real world concepts, geographical information, connectivity, social network analysis
       Data correlations galore
  • 10. Thoughts on… Information Integration
       Data integration and Schema Integration
         • Different applications, organizations, time, motives: hard problem, tens of B$/y
         • Has been on the DB R&D menu for 20+ years
         • AI complete, immature tech
         • hard to achieve high precision automatically
         • Semantic Web does not solve this issue
         • In fact, it is its major hurdle to success
         • Information integration != {Reasoning, Inference, Logic}
  • 11. The schema.org Approach
       Choose
         • web-addressable, machine-readable schema+data
       over
         • ragged, graph, schema-last RDF data model
       Approach:
         • schema-first, centralized, controlled
         • well-defined use case (web search = ad money)
  • 12. The Watson Approach
       Winning Jeopardy! is pretty cool
       No central role for reasoning, inference there
       Recipe:
         • Finding statistical evidence in Big Data
         • Using semantics for focused sub-tasks (only)
         • Intelligently combining multiple approaches
         • Focusing on the Jeopardy! problem at hand
  • 13. DBMS Architecture for new HW
       “hardware-conscious database architectures”
         • focus on exploiting modern hardware
         • platform independent, flexible
         • highest performance per core
         • green
         • scalability to ~10 TB, currently
       examples: