‫أكاديمية الحكومة اإللكترونية الفلسطينية‬              The Palestinian eGovernment Academy                         www.ego...
AboutThis tutorial is part of the PalGov project, funded by the TEMPUS IV program of theCommission of the European Communi...
© Copyright NotesEveryone is encouraged to use this material, or part of it, but shouldproperly cite the project (logo and...
Tutorial Map                                                                                                           Top...
Module ILOsAfter completing this module students will beable to:   - Demonstrate knowledge about querying techniques   for...
Data Integration and Open Information Systems (Tutorial II)                                             The Palestinian e-...
Module OutlinePart I: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Queryi...
RDF as a Data Model• RDF (Resource Description Framework) is the Data Model  behind the Semantic/Data Web.• RDF is a graph...
RDF as a Data Model    • RDF’s way of representing Data as triples is more      elementary than Databases or XML.       – ...
Persistent Storage of RDF Graphs                                                                                          ...
How to Query RDF data stored in one table?                                             S        P                O• How do...
How to Query RDF data stored in one table?   S          P              O                     2007  M1    year            2...
How to Query RDF data stored in one table?                                                 S        P                O• Ex...
How to Query RDF data stored in one table?                                                 S        P                O• Ex...
How to Query RDF data stored in one table?                                                    S        P                O•...
How to Query RDF data stored in one table?• Exercise 4: Star query.   – List all the names of the directors from Lebanon w...
Data Integration and Open Information Systems (Tutorial II)                                             The Palestinian e-...
Module OutlinePart I: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Queryi...
Practical Session• Given the RDF graph in the next slide, do the following:   (1) Insert the data graph as <S,P,O> triples...
Practical Session                                                                    Palestine                            ...
Practical Session• Each student should work alone.• The student is encouraged to first answer each query mentally from the...
Data Integration and Open Information Systems (Tutorial II)                                             The Palestinian e-...
Module OutlinePart 1: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Queryi...
Complexity of querying graph-shaped data• Where does the complexity of querying graph-shaped  data stored in a relational ...
Solution: Indexes and Dictionaries• What solutions do you propose to optimized queries on graph-  shaped data?• Build Inde...
Solution: Store RDF in a Different SchemaProperty Tables: Clustered Property Tables             Abadi et al, Scalable Sema...
Solution: Store RDF in a Different SchemaProperty Tables: Property-Class Tables            Abadi et al, Scalable Semantic ...
Solution: Store RDF in a Different SchemaVertically Partitioned Approach            Abadi et al, Scalable Semantic Web Dat...
Discussion• Advantages and Disadvantages of storing RDF in other  schemas?• What do you prefer?• Any other suggestions to ...
optional        MashQL        (under development at Birzeit University, started at the univeristy of Cyprus)• What about s...
optional       MashQL       (under development at Birzeit University, started at the univeristy of Cyprus)• A graphical qu...
optional      MashQL ExampleCeline Dion’sAlbums after                                                           www.site1....
optional       MashQL ExampleCeline Dion’sAlbums after                                                           www.site1...
optional       MashQL ExampleCeline Dion’sAlbums after                                                                www....
optional       MashQL ExampleCeline Dion’sAlbums after                                                              www.si...
optional       MashQL ExampleCeline Dion’sAlbums after                                                                    ...
optional       MashQL ExampleCeline Dion’sAlbums after                                                           www.site1...
optional       MashQL ExampleCeline Dion’sAlbums after    2003?          RDF Input           From:              http://www...
optional       MashQL Query Optimization•   The major challenge in MashQL is interactivity!•   User interaction must be wi...
optional     MashQL Query Optimization: Graph Signature‘Graph Signature’ optimizes RDF queries by :• Summarizing the RDF d...
optional       Experiments • Experiments performed on 15M SPO triples (rows) of the Yago   Dataset. • Graph Signature Inde...
References• Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query  Optimization for the Data Web - Two Disk-B...
Thank you!   PalGov © 2011   43
Upcoming SlideShare
Loading in …5
×

Pal gov.tutorial2.session9.lab rdf-stores

782 views

Published on

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
782
On SlideShare
0
From Embeds
0
Number of Embeds
85
Actions
Shares
0
Downloads
81
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Pal gov.tutorial2.session9.lab rdf-stores

  1. 1. ‫أكاديمية الحكومة اإللكترونية الفلسطينية‬ The Palestinian eGovernment Academy www.egovacademy.psTutorial II: Data Integration and Open Information Systems Session 9 RDF Stores: Challenges and Solutions Dr. Mustafa Jarrar University of Birzeit mjarrar@birzeit.edu www.jarrar.info PalGov © 2011 1
  2. 2. AboutThis tutorial is part of the PalGov project, funded by the TEMPUS IV program of theCommission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.psProject Consortium: Birzeit University, Palestine University of Trento, Italy (Coordinator ) Palestine Polytechnic University, Palestine Vrije Universiteit Brussel, Belgium Palestine Technical University, Palestine Université de Savoie, France Ministry of Telecom and IT, Palestine University of Namur, Belgium Ministry of Interior, Palestine TrueTrust, UK Ministry of Local Government, PalestineCoordinator:Dr. Mustafa JarrarBirzeit University, P.O.Box 14- Birzeit, PalestineTelfax:+972 2 2982935 mjarrar@birzeit.eduPalGov © 2011 2
  3. 3. © Copyright NotesEveryone is encouraged to use this material, or part of it, but shouldproperly cite the project (logo and website), and the author of that part.No part of this tutorial may be reproduced or modified in any form or byany means, without prior written permission from the project, who havethe full copyrights on the material. Attribution-NonCommercial-ShareAlike CC-BY-NC-SAThis license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creationsunder the identical terms. PalGov © 2011 3
  4. 4. Tutorial Map Topic h Intended Learning Objectives Session 1: XML Basics and Namespaces 3A: Knowledge and Understanding Session 2: XML DTD’s 3 2a1: Describe tree and graph data models. Session 3: XML Schemas 3 2a2: Understand the notation of XML, RDF, RDFS, and OWL. Session 4: Lab-XML Schemas 3 2a3: Demonstrate knowledge about querying techniques for data models as SPARQL and XPath. Session 5: RDF and RDFs 3 2a4: Explain the concepts of identity management and Linked data. Session 6: Lab-RDF and RDFs 3 2a5: Demonstrate knowledge about Integration &fusion of Session 7: OWL (Ontology Web Language) 3 heterogeneous data. Session 8: Lab-OWL 3B: Intellectual Skills Session 9: Lab-RDF Stores -Challenges and Solutions 3 2b1: Represent data using tree and graph data models (XML & Session 10: Lab-SPARQL 3 RDF). Session 11: Lab-Oracle Semantic Technology 3 2b2: Describe data semantics using RDFS and OWL. Session 12_1: The problem of Data Integration 1.5 2b3: Manage and query data represented in RDF, XML, OWL. Session 12_2: Architectural Solutions for the Integration Issues 1.5 2b4: Integrate and fuse heterogeneous data. Session 13_1: Data Schema Integration 1C: Professional and Practical Skills Session 13_2: GAV and LAV Integration 1 2c1: Using Oracle Semantic Technology and/or Virtuoso to store Session 13_3: Data Integration and Fusion using RDF 1 and query RDF stores. Session 14: Lab-Data Integration and Fusion using RDF 3D: General and Transferable Skills 2d1: Working with team. Session 15_1: Data Web and Linked Data 1.5 2d2: Presenting and defending ideas. Session 15_2: RDFa 1.5 2d3: Use of creativity and innovation in problem solving. 2d4: Develop communication skills and logical reasoning abilities. Session 16: Lab-RDFa 3 PalGov © 2011 4
  5. 5. Module ILOsAfter completing this module students will beable to: - Demonstrate knowledge about querying techniques for the graph data model. - Manage and query data represented in RDF. PalGov © 2011 5
  6. 6. Data Integration and Open Information Systems (Tutorial II) The Palestinian e-Government Academy January, 2012Tutorial II: Data Integration and Open Information Systems Part1: Querying SPO Tables PalGov © 2011 6
  7. 7. Module OutlinePart I: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Querying GraphShaped Data PalGov © 2011 7
  8. 8. RDF as a Data Model• RDF (Resource Description Framework) is the Data Model behind the Semantic/Data Web.• RDF is a graph-based data model.• In RDF, Data is represented as triples: <Subject, Predicate, Object>, or in short: <S,P,O>. P O S PalGov © 2011 8
  9. 9. RDF as a Data Model • RDF’s way of representing Data as triples is more elementary than Databases or XML. – This enables easy data integration and interoperability between systems (as will be demonstrated later). • EXAMPLE: the fact that the movie Sicko (2007) is directed by Michael Moore from the USA can be represented by the following triples which form a directed labeled graph:< M1, name, Sicko > 2007< M1, year, 2007 >< M1, directedBy, D1 > Sicko< D1, name, Michael Moore > M1< D1, country, C1 > name Michael D1< C1, name, USA> Moore PalGov © 2011 9
  10. 10. Persistent Storage of RDF Graphs S P O• Typically, an RDF Graph is stored as M1 year 2007 one <S,P,O> Table in a Relational M1 M1 Name directedBy Sicko D1 Database Management System. M2 directedBy D1 M2 Year 2009 2007 Sicko M2 Name Capitalism Michael Moore M3 Year 1995 directedBy M1 D1 Emmy Awards M3 directedBy D2 P1 1995 M3 Name Brave Heart M2 2009 M4 Year 2007 Capitalism USA M4 directedBy D3 C1 1995 M4 Name Caramel Washington DC directedBy D1 Name Michael Moore M3 D2 actedIn P2 Oscars D1 hasWonPrizeIn P1 Brave Heart Mel Gibson D1 Country C1 1996 2007 D2 Counrty C1 Nadine Labaki directedBy D2 hasWonPrizeIn P2 M4 D3 C2 Lebanon actedIn D2 Name Mel Gibson Beirut D2 actedIn M3 Caramel location name P3 Stockholm Festival D3 Name Nadine Labaki C3 Sweden 2007 D3 Country C2 Stockholm D3 hasWonPrizeIn P3 D3 actedIn M4 … … … PalGov © 2011 10
  11. 11. How to Query RDF data stored in one table? S P O• How do we query graph-shaped M1 M1 year Name 2007 Sicko data stored in one relational table? M1 directedBy D1 M2 directedBy D1 M2 Year 2009 M2 Name Capitalism• How do we traverse a graph stored M3 Year 1995 M3 directedBy D2 in one relational table? M3 Name Brave Heart M4 Year 2007 M4 directedBy D3 M4 Name Caramel Self Joins! D1 D1 Name hasWonPrizeIn Michael Moore P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … PalGov © 2011 11
  12. 12. How to Query RDF data stored in one table? S P O 2007 M1 year 2007 Sicko Michael Moore M1 Name Sicko directedBy M1 D1 Emmy Awards M1 directedBy D1 M2 directedBy D1 P1 1995 M2 2009 M2 Year 2009 M2 Name Capitalism Capitalism C1 USA M3 Year 1995 1995 M3 directedBy D2 Washington DC directedBy M3 Name Brave Heart M3 D2 actedIn P2 M4 Year 2007 Oscars M4 directedBy D3 Brave Heart Mel Gibson 1996 M4 Name Caramel 2007 Nadine Labaki D1 Name Michael Moore directedBy M4 D3 C2 Lebanon D1 hasWonPrizeIn P1 actedIn … … … Beirut Caramel location name P3 Stockholm Festival C3Exercises: Sweden 2007(1) What is the name of director D3? Stockholm(2) What is the name of the director of the movie M1?(3) List all the movies who have directors from the USA and their directors.(4) List all the names of the directors from Lebanon who have won prizes and the prizes they have won. PalGov © 2011 12
  13. 13. How to Query RDF data stored in one table? S P O• Exercise 1: A simple query. M1 M1 year Name 2007 Sicko – What is the name of director D3? M1 directedBy D1 M2 directedBy D1 – Consider the table MoviesTable (s,p,o). M2 Year 2009 – SQL: M2 Name Capitalism M3 Year 1995 M3 directedBy D2Select o M3 Name Brave HeartFrom MoviesTable T M4 Year 2007 M4 directedBy D3Where T.s = „D3‟ AND T.p = „Name‟; M4 Name Caramel D1 Name Michael Moore D1 hasWonPrizeIn P1 D1 Country C1 D2 Counrty C1 D2 hasWonPrizeIn P2 D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … PalGov © 2011 Ans: Nadine Labaki 13
  14. 14. How to Query RDF data stored in one table? S P O• Exercise 2: A path query with 1 join. M1 M1 year Name 2007 Sicko – What is the name of the director of the M1 directedBy D1 M2 directedBy D1 movie M1. M2 Year 2009 – Consider the table MoviesTable (s,p,o). M2 Name Capitalism M3 Year 1995 – SQL: M3 directedBy D2 M3 Name Brave HeartSelect T2.o M4 Year 2007 M4 directedBy D3From MoviesTable T1, MoviesTable T2 M4 Name CaramelWhere {T1.s = „M1‟ AND D1 Name Michael Moore D1 hasWonPrizeIn P1 T1.p = „directedBy‟ AND D1 Country C1 D2 Counrty C1 T1.o = T2.s AND D2 hasWonPrizeIn P2 T2.p = „name‟}; D2 Name Mel Gibson D2 actedIn M3 D3 Name Nadine Labaki D3 Country C2 D3 hasWonPrizeIn P3 D3 actedIn M4 … … … PalGov © 2011 Ans: Michael Moore 14
  15. 15. How to Query RDF data stored in one table? S P O• Exercise 3: A path query with 2 M1 M1 year Name 2007 Sicko joins. M1 directedBy D1 M2 directedBy D1 – List all the movies who have directors from M2 Year 2009 the USA and their directors. M2 Name Capitalism M3 Year 1995 – Consider the table MoviesTable(s,p,o). M3 directedBy D2 – SQL: M3 Name Brave Heart … … … D1 Name Michael MooreSelect T1.s, T1.o D1 hasWonPrizeIn P1From MoviesTable T1, MoviesTable T2, D1 Country C1 D2 Counrty C1 MoviesTable T3 D2 hasWonPrizeIn P2 D2 Name Mel GibsonWhere {T1.o = T2.s AND D2 actedIn M3 T2.o = T3.s AND … … … C1 Name USA T1.p = „directedBy‟ AND C1 Capital Washington DC T2.p = „country‟ AND C2 Name Lebanon C2 Capital Beirut T3.p = „name‟ AND … … … T3.o = „USA‟}; Ans: M1 D1; M2 D1; M3 D2 PalGov © 2011 15
  16. 16. How to Query RDF data stored in one table?• Exercise 4: Star query. – List all the names of the directors from Lebanon who have won prizes and the prizes they have won. S P O – Consider the table MoviesTable (S,P,O). D1 Name Michael Moore D1 hasWonPrizeIn P1Select T1.o, T4.o D1 Country C1From MoviesTable T1, MoviesTable T2, D2 Counrty C1 D2 hasWonPrizeIn P2 MoviesTable T3, MoviesTable T4 D2 Name Mel GibsonWhere {T1.p = „name‟ AND D2 actedIn M3 D3 Name Nadine Labaki T2.s = T1.s AND D3 Country C2 T2.p = „country‟ AND D3 hasWonPrizeIn P3 D3 actedIn M4 T2.o = T3.s AND … … … T3.p = „name‟ AND C1 Name USA C1 Capital Washington DC T3.o = „Lebanon‟ AND C2 Name Lebanon T4.s = T1.s AND C2 Capital Beirut … … … T4.p = „hasWonPrizeIn‟}; Ans: ‘Nadine Labaki’ , P3 PalGov © 2011 16
  17. 17. Data Integration and Open Information Systems (Tutorial II) The Palestinian e-Government Academy January, 2012Tutorial II: Data Integration and Open Information Systems Part 2: Querying SPO Tables using SQL PalGov © 2011 17
  18. 18. Module OutlinePart I: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Querying GraphShaped Data PalGov © 2011 18
  19. 19. Practical Session• Given the RDF graph in the next slide, do the following: (1) Insert the data graph as <S,P,O> triples in a 3-column table in any DBMS. Schema of the table: Books (Subject, Predicate, Object) (2) Write the following queries in SQL and execute them over the newly created table: • List all the authors born in a country which has the name Palestine. • List the names of all authors with the name of their affiliation who are born in a country whose capital’s population is14M. Note that the author must have an affiliation. • List the names of all books whose authors are born in Lebanon along with the name of the author. PalGov © 2011 19
  20. 20. Practical Session Palestine 7.6K Said Capital Name CN1 CA1 Jerusalem AU1 BK1 CU Colombia University Author Viswanathan 14.0M BK2 AU2 India Capital Name New Delhi Wamadat CN2 CA2 Name Author Naima BK3 AU3 Lebanon 2.0M The Prophet Gibran CN3 CA3 BK4 Beirut Author AU4This data graph is about books. It talks about four books (BK1-BK4).Information recorded about a book includes data such as; its author,affiliation, country of birth including its capital and the population ofits capital. PalGov © 2011 20
  21. 21. Practical Session• Each student should work alone.• The student is encouraged to first answer each query mentally from the graph before writing and executing the SQL query, and to compare the results of the query execution with his/her expected results.• The student is strongly recommended to write two additional queries, execute them on the data graph, and hand them along with the required queries.• The student must highlight problems and challenges of executing queries on graph-shaped data and propose solutions to the problem.• Each student must expect to present his/her queries at class and, more importantly, he/she must be ready to discuss the problems of querying graph- shaped data and his/her proposed solutions.• The final delivery should include (i) a snapshot of the built table, (ii) the SQL queries, (iii) The results of the queries, and (iv) a one-page discussion of the problems of querying graph-shaped data and the proposed possible solutions. These must be handed in a report form in PDF Format. PalGov © 2011 21
  22. 22. Data Integration and Open Information Systems (Tutorial II) The Palestinian e-Government Academy January, 2012Tutorial II: Data Integration and Open Information Systems Part 3 Complexities and Solutions of Querying Graph Shaped Data PalGov © 2011 22
  23. 23. Module OutlinePart 1: Querying SPO TablesPart 2: Querying SPO Tables using SQLPart 3: Complexities and Solutions of Querying GraphShaped Data PalGov © 2011 23
  24. 24. Complexity of querying graph-shaped data• Where does the complexity of querying graph-shaped data stored in a relational table lie?• The complexity of querying a data graph lies in the many self-joins that are to be performed on the table.• Accessing multiple properties for a resource requires subject-subject joins (Star-shaped queries).• Path expressions require subject-object joins.• In general, a query with n edges on an SPO table requires n-1 self-joins of that table. PalGov © 2011 24
  25. 25. Solution: Indexes and Dictionaries• What solutions do you propose to optimized queries on graph- shaped data?• Build Indexes. – On each column: S, P, O – On Permutations of two columns: SP, SO, PS, PO, … – On Permutation of three columns: SPO, SOP, PSO, POS, OSP, OPS, …• Dictionary Encoding String Data? – Replacing all literals by ids. – Triple stores are compressed. – Allows for faster comparison and matching operations.  RDF3X Solution PalGov © 2011 25
  26. 26. Solution: Store RDF in a Different SchemaProperty Tables: Clustered Property Tables Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. Clustered Property Table: • A clustered property table, contains clusters of properties that tend to be defined together. • Multiple property tables with different clusters of properties may be created to optimize queries. • A key requirement for this type of property table is that a particular property may only appear in at most one property table. • Advantage: Some queries can be answered directly from the property table with no joins. - Ex: retrieve the title of the book authored in 2001. PalGov © 2011 26
  27. 27. Solution: Store RDF in a Different SchemaProperty Tables: Property-Class Tables Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. Property-Class Tables: • A property-class table exploits the Type property of subjects to cluster similar sets of subjects together in the same table. • Unlike the first type of property table, a property may exist in multiple property-class tables. • This solution is found in Jena2. • Oracle adopts a property table-like data structure (they call it a “subject-property matrix”). However, they are not used for primary storage, but rather as an auxiliary data structure. PalGov © 2011 27
  28. 28. Solution: Store RDF in a Different SchemaVertically Partitioned Approach Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. Six Vertically Partitioned Tables: • The triples table is rewritten into n two column tables where n is the number of unique properties in the data. • In each of these tables, the first column contains the subjects that define that property and the second column contains the object values for those subjects. • This method is called Vertical Partitioning. • An experimental implementation of this approach was performed using an open source column-oriented database system (C-Store). PalGov © 2011 28
  29. 29. Discussion• Advantages and Disadvantages of storing RDF in other schemas?• What do you prefer?• Any other suggestions to optimize queries over graph- shaped data? PalGov © 2011 29
  30. 30. optional MashQL (under development at Birzeit University, started at the univeristy of Cyprus)• What about summarizing the data graph, and instead of querying the original data, we query the summary and get the same results?• What about summarizing 15M SPO triples into less then 500,000?• This is called Graph Signature Indexing. It is currently in research at Birzeit University. Initial comparison between GS performance and Oracle’s Semantic Technologies PalGov © 2011 30
  31. 31. optional MashQL (under development at Birzeit University, started at the univeristy of Cyprus)• A graphical query formulation language for the Data Web• A general structured-data retrieval language (not merely an interface) RDF Input From: http:www.site1.com/rdf http:www.site2.comrdf MashQL Article Title ArticleTitle Author “^Hacker” YearPubYear > 2000Addresses all of the following challenges:  The user does not know the schema.  There is no offline or inline schemaontology.  The query may involve multiple sources.  The query language is expressive (not a single-purpose interface) PalGov © 2011 31
  32. 32. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album :B1 :Title “The prayer” Artist “^Dion” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” YearProdYear > 2003 :B1 :Year 2008 Title AlbumTitle :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004  Interactive Query Formulation.  MashQL queries are translated into and executed as SPARQL. PalGov © 2011 32
  33. 33. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album :B1 :Title “The prayer” :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” :B1 :Year 2008 :B2 rdf:Type bibo:Album :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 PalGov © 2011 33
  34. 34. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album Everything :B1 :Title “The prayer” Album :B1 :Artist “Celine Dion” A1 :B1 :Artist “Josh Groban” A2 :B1 :Year 2008 B1 :B2 rdf:Type bibo:Album Types Individuals :B2 :Title “Miracle” :B2 :Author “Celine Dion” :B2 :Year 2004 Background queries SELECT X WHERE {?X ?P ?O} Union SELECT X WHERE {?S ?P ?X} SELECT X WHERE {?S rdf:Type ?X} PalGov © 2011 34
  35. 35. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album :B1 :Title “The prayer” Artist Contains Dion :B1 :Artist “Celine Dion” Artist Author Equals Equals :B1 :Artist “Josh Groban” Contains Contains Title OneOf :B1 :Year 2008 ProdYear Not :B2 rdf:Type bibo:Album Type Between :B2 :Title “Miracle” LessThan Year MoreThan :B2 :Author “Celine Dion” :B2 :Year 2004 Background query SELECT P WHERE {?Everything ?P ?O} PalGov © 2011 35
  36. 36. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album :B1 :Title “The prayer” Artist “^Dion :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” Year ProdYear MoreTh 2003 :B1 :Year 2008 Artist Equals Equals Author Contains :B2 rdf:Type bibo:Album Title OneOf :B2 :Title “Miracle” ProdYear Not Between :B2 :Author “Celine Dion” Type LessThan :B2 :Year 2004 Year MoreThan MoreThan Background query SELECT P WHERE {?Everything ?P ?O} PalGov © 2011 36
  37. 37. optional MashQL ExampleCeline Dion’sAlbums after www.site1.com/rdf 2003? RDF Input :A1 :Title “Taking Chances” :A1 :Artist “Dion C.” From: :A1 :ProdYear 2007 http://www.site1.com/rdf :A1 :Producer “Peer Astrom” :A2 :Title “Monodose” http://www.site2.com/rdf :A2 :Artist “Ziad Rahbani” Query www.site2.com/rdf Everything :B1 rdf:Type bibo:Album :B1 :Title “The prayer” Artist “^Dion :B1 :Artist “Celine Dion” :B1 :Artist “Josh Groban” YearProdYear > 2003 :B1 :Year 2008 :B2 rdf:Type bibo:Album Title AlbumTitle :B2 :Title “Miracle” Artist Author :B2 :Author “Celine Dion” Title Title :B2 :Year 2004 ProdYear Type Year Background query SELECT P WHERE {?Everything ?P ?O} PalGov © 2011 37
  38. 38. optional MashQL ExampleCeline Dion’sAlbums after 2003? RDF Input From: http://www.site1.com/rdf http://www.site2.com/rdf Query Everything Artist “^Dion” YearProdYear > 2003 Title AlbumTitle PREFIX S1: <http://site1.com/rdf> PREFIX S2: <http://site1.com/rdf> SELECT ?AlbumTitle FROM <http://site1.com/rdf> FROM <http://site2.com/rdf> WHERE { {?X S1:Artist ?X1} UNION {?X S2:Artist ?X1} {?X S1:ProdYear ?X2} UNION {?X S2:Year ?X2} {{?X S1:Title ?AlbumTitle} UNION {?X S2:Title ?AlbumTitle}} FILTER regex(?X1, “^Dion”) FILTER (?X2 > 2003)} PalGov © 2011 38
  39. 39. optional MashQL Query Optimization• The major challenge in MashQL is interactivity!• User interaction must be within 100ms.• However, querying RDF is typically slow.• How about dealing with huge datasets!?• Implemented Solutions (i.e., Oracle) proved to perform weakly specially in MashQL background queries. A query optimization solution was developed. It is called Graph Signature Indexing. PalGov © 2011 39
  40. 40. optional MashQL Query Optimization: Graph Signature‘Graph Signature’ optimizes RDF queries by :• Summarizing the RDF dataset (Algorithms were developed)• Then, executing the queries on the summary instead of the original dataset (A new execution plan was developed) PalGov © 2011 40
  41. 41. optional Experiments • Experiments performed on 15M SPO triples (rows) of the Yago Dataset. • Graph Signature Index of less than 500K was built on the data. • Two sets of queries were performed, and the results were as follows: Extreme-case linear queries of depth 1-5: Query A1 A2 A3 A4 A5 GS 0.344 1.953 5.250 10.234 24.672 Oracle 110.390 302.672 525.844 702.969 >20min Queries that cover all MashQL Background queries some queries that might occur as the final queries generated in MashQL:Query Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 GS 0.441 0.578 0.587 0.556 0.290 0.291 0.587 0.785 2.469Oracle 25.640 29.875 29.593 29.641 19.922 21.922 62.734 64.813 32.703 PalGov © 2011 41
  42. 42. References• Anton Deik, Bilal Faraj, Ala Hawash, Mustafa Jarrar: Towards Query Optimization for the Data Web - Two Disk-Based algorithms: Trace Equivalence and Bisimilarity.• Mustafa Jarrar and Marios D. Dikaiakos: A Query Formulation Language for the Data Web. IEEE Transactions on Knowledge and Data Engineering.IEEE Computer Society.• Abadi et al, Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB 2007. PalGov © 2011 42
  43. 43. Thank you! PalGov © 2011 43

×