OpenHPI 3.5 - How to Store RDF(S) Data? - Triple Stores

888
-1

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
888
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
74
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

OpenHPI 3.5 - How to Store RDF(S) Data? - Triple Stores

  1. 1. Semantic Web TechnologiesLecture 3: Semantic Web - Basic Architecture II 05: How to Store RDF(S) Data? Dr. Harald Sack Hasso Plattner Institute for IT Systems Engineering University of Potsdam Spring 2013 This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0)
  2. 2. 2Lecture 3: Semantic Web - Basic Architecture II Open HPI - Course: Semantic Web Technologies Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  3. 3. 3 How in RDBMS RDF to store RDF Data? • RDF Graph can be represented by a set of Triples (s,p,o) • Triples can simply be stored in a relational database system • Idea: Use a specific relational schema for RDF data and benefit from 40 years of research in the DB community • 3 steps for SPARQL query processing: (1) Convert SPARQL query to SQL query (w.r.t. the schema) (2) Use RDBMS to answer SQL query (3) Generate SPARQL query result from SQL query result 05 How to store RDF(S) Data? - Triple StoresOpen HPI - Course: SemanticHarald Sack, Hasso-Plattner-Institut, Universität Potsdam Semantic Web Technologies , Dr. Web Technologies - Lecture 3: Semantic Web Basic Architecture
  4. 4. RDF in RDBMS4 • RDF Graph can be represented by a set of Triples (s,p,o) • Triples can simply be stored in a relational database management system (RDBMS) • Motivation: Use a specific relational schema for RDF data and benefit from 40 years of research in the DB community • 3 steps for SPARQL query processing: (1) Convert SPARQL query to SQL query (w.r.t. the schema) (2) Use RDBMS to answer SQL query (3) Generate SPARQL query result from SQL query result Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  5. 5. Problem5 • How to store RDF triples to carry out efficient SPARQL queries? • 4 alternatives: 1.Giant Triple Storage 2.Property Tables 3.Vertically Partitioned Tables (Binary Tables) 4.Hexastore Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  6. 6. Giant Triple Storage • Basic idea:6 • Store all RDF triples in a single table • performance depends on efficient indexing Pros: • easy to implement • works for huge numbers of properties, if indexes are chosen with care Cons: • many self joins Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  7. 7. Giant Triple Storage • Basic idea:7 • Store all RDF triples in a single table • performance depends on efficient indexing SPARQL SELECT ?university WHERE { ?v rdf:type :GradStudent; :bachelorsFrom ?university . } SELECT t2.o AS university SQL FROM triples AS t1, triples AS t2 WHERE t1.p=‘type‘ AND t1.o=‘GradStudent‘ AND t2.p=‘bachelorsFrom‘ AND t1.s=t2.s Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  8. 8. ID Based Triple Storage8 • Use numerical identifier for each RDF term in the dataset • saves space and enhances efficiency RDF Term ID s p o :ID1 1 1 2 3 rdf:type 2 1 4 5 :FullProfessor 3 1 6 7 :teacherOf 4 1 8 9 “AI“ 5 1 10 11 :bachelorsFrom 6 “MIT“ 7 12 13 14 :mastersFrom 8 12 15 7 “Cambridge 9 12 4 16 Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  9. 9. Quad Tables9 • Storing multiple RDF graphs • used for provenance, versioning, contexts, etc. RDF Term ID g s p o :ID1 1 100 1 2 3 rdf:type 2 100 1 4 5 :FullProfessor 3 101 12 16 7 :teacherOf 4 100 1 8 9 “AI“ 5 102 23 21 11 :bachelorsFrom 6 “MIT“ 7 100 1 6 7 :mastersFrom 8 102 23 22 8 “Cambridge 9 101 12 4 16 Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  10. 10. Property Tables10 • Combining all (or some) properties of similar subjects in n-ary tables • Use ID based encoding for efficiency Pros: Cons: • Fewer joins • Potentially a lot of NULLs • If the data is structured, we • Clustering is not trivial have a relational DB • Multi-value properties are complicated Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  11. 11. Vertically Partitioned Tables (Binary Tables)11 • for each unique property create a two column table • Use ID based encoding for efficiency Pros: • Supports multi-value properties • No NULLs • Read only needed attributes (i.e. less I/O) • No clustering • Excellent performance (if number of properties is small, queries with bounded properties) Cons: • Expensive inserts • Bad performance (large number of properties, queries with unbounded properties) Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  12. 12. Vertically Partitioned Tables (Binary Tables)!"#$%&"($)*+$,"-./0$%&"( relational DBs12 • Different physical storage models for • Row Based Storage 1233((0&$456*278-$*&"89($/":(-*$3"$(-8&2"08-$1;* • Tuples (i.e. DB records) are stored consecutively !"#$<8*(:$*&"89(= row needs to be read even if few • Entire attributes are projected ! !"#$%&()*%*+,-%./-0&1 2-%&3/-%0./4&%."3)5%$6 ! 743)-%-/84%%0&3/9% -%20%5%4)::%8233-)9"3%& Storage • Column Based 2-%#-/;%.3%0 • Read columns relevant to the query → projection is free ,"-./0$<8*(:$*&"89(= • Inserts are expensive ! <%20./$"=4&-%$%52433/3>% C D C D C D ?"%-6@#-/;%.3)/4)&:-%% E E E ! A4&%-3&2-%%B#%4&)5%Sack, Hasso-Plattner-Institut, Universität Potsdam Semantic Web Technologies , Dr. Harald
  13. 13. Hexastores13 • Create an index for every possible combination to enable efficient processing • spo, pos, osp, sop, pso, ops Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  14. 14. Hexastores14 • Create an index for every possible combination to enable efficient processing • spo, pos, osp, sop, pso, ops Pros: • fast joins (in the beginning) Cons: • 5 times more storage • weak performance, when disk access is necessary Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  15. 15. Triple Stores15 • Apache Jena - http://jena.apache.org • Open Link Virtuoso - http://virtuoso.openlinksw.com • Sesame - http://www.openrdf.org • SwiftOWLIM - http://www.ontotext.com/owlim • AllegroGraph - http://www.franz.com/agraph/allegrograph/ • and many more... • W3C maintains a list of triple stores: http://www.w3.org/wiki/SemanticWebTools#RDF_Triple_Store_Systems Semantic Web Technologies , Dr. Harald Sack, Hasso-Plattner-Institut, Universität Potsdam
  16. 16. Is t Web Basisarchitektur her 2. Semantic e m 2.7 Warum RDF noch nicht ausreicht Wor ore16 ld tha in n R the DF(S )? 06 The Limits of RDF(S)Open HPI - Course: Semantic Web Technologies - Lecture 3:Universität Potsdam Basic Architecture Vorlesung Semantic Web Technologien, Dr. Harald Sack, Hasso-Plattner-Institut, Semantic Web

×