May 14, 2014
RDF Analytics
Lenses over Semantic Graphs
Dario Colazzo 3,1
François Goasdoué 4,1
Ioana Manolescu 1,2
Alexa...
RDF Analytics: Lenses over Semantic Graphs May 14, 2014 – 2
RDF data warehousing scenario
Alice
software engineer
IT comp...
RDF data warehousing
Application needs:
(i) support of heterog...
Summary
1. RDF Graphs & BGP Queries
2. RDF Graph Analysis
3. On-Line Analytical Processing
4. Empirical Evaluation
5. Sum ...
RDF Graphs & BGP Queries
– recall –
The Resource Description Framework (RDF)
RDF graph – set of triples
Assertion Triple Relational notation
Class s rdf:type ...
RDF Schema (RDFS)
– declare semantic constraints between classes and properties
Constraint Triple Relational notation
Subc...
Open-world assumption and RDF entailment
RDF data model – based on the open-world assumption.
→ deductive constraints – im...
Basic Graph Pattern (BGP) queries
→ subset of SPARQL; BGP – conjunctions of triple patterns
q(y) :- x rdf:type Person, x h...
RDF Graph Analysis
– formal framework for warehousing RDF dat...
Analytical schema (AnS) and instance (I)
RDF graph:
Person
user1
user2
rdf:type
rdf:type
Bill
hasName
post1
post2
wrote
wro...
Analytical query (AnQ)
Analytical schema: Instance:
n1 : Blog...
Analytical query answering
through analytical schema material...
On-Line Analytical Processing
– applying OLAP operations –
Slice, dice, drill-in and drill-out
Query: Find the number of...
Roll-up and drill-down
Query: Find the number of sites where ...
Empirical Evaluation
– experiments and demo –
Experiments
Settings: kdb+ v3.0 (64 bits) – highly efficient in...
Analytical query answering
12 patterns c number of triple patterns in the classifier query
1,097 queries v number of dimens...
Java GUI using the Prefuse toolkit
(collaboration with Tushar Ghosh)
Sum Up
Related works
Graph cube: on warehousing and OLAP multidimens...
Sum up and perspectives
Sum up:
Approach for specifying and e...
Questions?
I
You Attention
Question
:b1
:b2
:b3
thank
payed
ask
ask
ask
rdf:type
rdf:type
rdf:type
alexandra.roatis@inria....

RDF Analytics: Lenses over Semantic Graphs

  1. May 14, 2014. RDF Analytics: Lenses over Semantic Graphs
     Dario Colazzo (3,1), François Goasdoué (4,1), Ioana Manolescu (1,2), Alexandra Roatiş (2,1)
     1 OAK – Inria, France
     2 LRI – Université Paris-Sud, France
     3 LAMSADE – Université Paris Dauphine, France
     4 PILGRIM – Université Rennes 1, France
  2. RDF data warehousing scenario
     Alice: software engineer, worksFor IT company; builds user applications; open RDF data (Grenoble)
     DS: Restaurants; (i) heterogeneous data
     App: clickable map; #restaurants, region & average rating, type of cuisine
     build RDW: relational data warehouse; extract tabular data (SPARQL queries)
     merge; (ii) new central concepts; DS2: Shops, DS3: Museums; RDW2, RDW3
     (iii) other missing relationships? Bug: landmarks, museums; find; redesign
     Feature: query relationships; region, famous people; (iv) query schema; add
     Feature: new type of aggregation; for each landmark, show how many restaurants are nearby; (v) impossible! (separate star schema; restaurants and landmarks – central entities); add
  12. RDF data warehousing
      Application needs:
      (i) support of heterogeneous data
      (ii) multiple central concepts
      (iii) support for RDF semantics when querying
      (iv) possibility to query the relationships between entities (the schema)
      (v) flexible choice of aggregation dimensions
      This work: redesign the core data analytics concepts and tools for RDF
      formal framework for warehouse-style analytics on RDF data, suited to heterogeneous, semantic-rich corpora of Linked Data
  13. Summary
      1. RDF Graphs & BGP Queries
      2. RDF Graph Analysis
      3. On-Line Analytical Processing
      4. Empirical Evaluation
      5. Sum Up
  14. RDF Graphs & BGP Queries – recall –
  15. The Resource Description Framework (RDF)
      RDF graph – set of triples
      Assertion    Triple          Relational notation
      Class        s rdf:type o    o(s)
      Property     s p o           p(s, o)
      Figure (example RDF graph): nodes user1, user2, Bill, 28, Madrid, Student, :b1, blog1; edge labels worksWith, hasName, hasAge, inCity, rdf:type, wrote, inBlog. Legend: resource (URI), blank node, literal (string), property.
  16. RDF Schema (RDFS) – declare semantic constraints between classes and properties
      Constraint       Triple                    Relational notation
      Subclass         s rdfs:subClassOf o       s ⊆ o
      Subproperty      s rdfs:subPropertyOf o    s ⊆ o
      Domain typing    s rdfs:domain o           Π_domain(s) ⊆ o
      Range typing     s rdfs:range o            Π_range(s) ⊆ o
      Figure (example schema): classes Person, Student; properties knows, worksWith; edge labels rdfs:subClassOf, rdfs:range, rdfs:domain, rdfs:subPropertyOf.
  17. Open-world assumption and RDF entailment
      RDF data model – based on the open-world assumption.
      → deductive constraints – implicitly propagate tuples
      Entailment – reasoning mechanism: set of explicit triples + some entailment rules → derive implicit triples
      Exhaustive application of entailment → saturation (closure)
      The semantics of an RDF graph is its saturation.
      Figure: nodes user1, Student, Person; edge labels rdfs:subClassOf, rdf:type, rdf:type.
  18. Basic Graph Pattern (BGP) queries
      → subset of SPARQL; BGP – conjunctions of triple patterns
      q(y) :- x rdf:type Person, x hasName y
      query evaluation vs. query answering:
      the evaluation of a query only uses the graph’s explicit triples
      (complete) answer set – evaluate q against the graph’s saturation
      Figure: nodes user1, Student, Person, Bill; edge labels rdfs:subClassOf, rdf:type, rdf:type, hasName.
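The difference between query evaluation and query answering can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation. It assumes the figure's graph consists of the explicit triples `user1 rdf:type Student`, `Student rdfs:subClassOf Person` and `user1 hasName Bill`. Variables are written with a leading `?`, and `saturate` applies just two RDFS rules (subclass transitivity and type propagation), not the full rule set. The names `saturate`, `match` and `evaluate` are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def saturate(graph):
    """Close `graph` under two RDFS rules: subclass transitivity and type propagation."""
    closed = set(graph)
    while True:
        new = set()
        for (c1, p, c2) in closed:
            if p != SUBCLASS:
                continue
            for (s, q, o) in closed:
                if q == SUBCLASS and s == c2:   # c1 ⊆ c2 and c2 ⊆ o entail c1 ⊆ o
                    new.add((c1, SUBCLASS, o))
                if q == RDF_TYPE and o == c1:   # s type c1 and c1 ⊆ c2 entail s type c2
                    new.add((s, RDF_TYPE, c2))
        if new <= closed:                       # fixpoint reached
            return closed
        closed |= new

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on mismatch."""
    out = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out

def evaluate(head, body, graph):
    """Evaluate a BGP query using only the triples present in `graph`."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return {tuple(b[v] for v in head) for b in bindings}

# Assumed content of the example graph (explicit triples only).
graph = {
    ("user1", RDF_TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("user1", "hasName", "Bill"),
}
# q(y) :- x rdf:type Person, x hasName y
head = ("?y",)
body = [("?x", RDF_TYPE, "Person"), ("?x", "hasName", "?y")]

print(evaluate(head, body, graph))            # evaluation: set()
print(evaluate(head, body, saturate(graph)))  # answering: {('Bill',)}
```

Evaluation misses Bill because `user1 rdf:type Person` is only implicit; answering finds it once the graph is saturated.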
  19. RDF Graph Analysis – formal framework for warehousing RDF data –
  20. Analytical schema (AnS) and instance (I)
      RDF graph (figure): nodes Person, user1, user2, Bill, post1, post2, blog1, Code Blog; edge labels rdf:type (×2), hasName (×2), wrote (×2), inBlog (×2).
      Analytical schema: → labeled directed graph
        n1: λ(n1) ← Blogger; δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z
        n2: λ(n2) ← Name; δ(n2) ← q(x) :- y hasName x
        e2: λ(e2) ← identifiedBy; δ(e2) ← q(x, y) :- x rdf:type Person, x hasName y
      ! data heterogeneity preserved !
      ! easy to extend !
      Instance of the analytical schema w.r.t. the graph:
        x rdf:type λ(n1): user1 rdf:type Blogger; user2 rdf:type Blogger
        x λ(e2) y: user1 identifiedBy Bill
        x rdf:type λ(n2): Bill rdf:type Name; Code Blog rdf:type Name
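The way an instance follows from the node and edge queries δ can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. The flattened figure does not say which user wrote which post, so the triples in `graph` are an assumed reading that is consistent with the instance listed on the slide. The helper names `match`, `evaluate` and `instance` are hypothetical, and the graph is taken as already saturated.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on mismatch."""
    out = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out

def evaluate(head, body, graph):
    """Evaluate a BGP query (variables start with '?') over `graph`."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return {tuple(b[v] for v in head) for b in bindings}

def instance(nodes, edges, graph):
    """One `x rdf:type λ(n)` triple per node answer, one `x λ(e) y` triple per edge answer."""
    triples = set()
    for label, (head, body) in nodes.items():
        triples |= {(x, "rdf:type", label) for (x,) in evaluate(head, body, graph)}
    for label, (head, body) in edges.items():
        triples |= {(x, label, y) for (x, y) in evaluate(head, body, graph)}
    return triples

# Assumed reading of the figure's RDF graph.
graph = {
    ("user1", "rdf:type", "Person"), ("user2", "rdf:type", "Person"),
    ("user1", "hasName", "Bill"), ("blog1", "hasName", "Code Blog"),
    ("user1", "wrote", "post1"), ("user2", "wrote", "post2"),
    ("post1", "inBlog", "blog1"), ("post2", "inBlog", "blog1"),
}
# λ is the dictionary key, δ the (head, body) pair.
nodes = {
    "Blogger": (("?x",), [("?x", "rdf:type", "Person"), ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")]),
    "Name": (("?x",), [("?y", "hasName", "?x")]),
}
edges = {
    "identifiedBy": (("?x", "?y"), [("?x", "rdf:type", "Person"), ("?x", "hasName", "?y")]),
}

for triple in sorted(instance(nodes, edges, graph)):
    print(triple)
```

The output is itself a set of triples, so the instance is again an RDF graph. Adding a node or an edge to the schema only adds an entry to one of the two dictionaries.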
  26. Analytical query (AnQ)
      Analytical schema: n1 : Blogger; n2 : City; n3 : Value; n4 : BlogPost; n5 : Site; e2 : from; e3 : age; e4 : posted; e5 : on
      Instance (figure): nodes user1, user2, user3, 28, 40, 35, Madrid, New York, post1, post2, post3, post4, blog1, blog2; edge labels age (×3), from (×2), posted (×4), on (×4).
      Query: Find the number of sites where each blogger posts, classified by the blogger’s age and city.
      Classifier: c(x, d1, d2) :- x age d1, x from d2
        { ⟨user1, “28”, “Madrid”⟩, ⟨user3, “35”, “New York”⟩ }
      Measure: m(x, v) :- x posted y, y on v
        { ⟨user1, blog1⟩, ⟨user1, blog2⟩, ⟨user2, blog2⟩, ⟨user3, blog2⟩ }
      Aggregate: count
        { ⟨“28”, “Madrid”, 2⟩, ⟨“35”, “New York”, 1⟩ }
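How the classifier and measure answers combine into the final cube can be sketched as follows. This is an illustrative reading of the slide, not the authors' algorithm. The input sets are the classifier and measure answers shown on the slide, the function name `answer` is hypothetical, and the measure is treated as a set (an assumption), so distinct sites are counted.

```python
from collections import defaultdict

def answer(classifier, measure, aggregate=len):
    """Group measure values by the dimension values of the fact they belong to, then aggregate."""
    values_of = defaultdict(list)
    for x, v in measure:
        values_of[x].append(v)
    groups = defaultdict(list)
    for x, *dims in classifier:
        groups[tuple(dims)].extend(values_of.get(x, []))
    # Dimension tuples with no measure value produce no cell.
    return {dims + (aggregate(vs),) for dims, vs in groups.items() if vs}

# Answers to c(x, d1, d2) and m(x, v), as listed on the slide.
classifier = {("user1", "28", "Madrid"), ("user3", "35", "New York")}
measure = {("user1", "blog1"), ("user1", "blog2"), ("user2", "blog2"), ("user3", "blog2")}

print(sorted(answer(classifier, measure)))
# [('28', 'Madrid', 2), ('35', 'New York', 1)]
```

user2 has a measure value but no city, so it is not classified and contributes to no cell.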
  30. Analytical query answering
      through analytical schema materialization
      through analytical query reformulation
      Analytical schema:
        n1: λ(n1) ← Blogger; δ(n1) ← q(x) :- x rdf:type Person, x wrote y, y inBlog z
        e1: λ(e1) ← acquaintedWith; δ(e1) ← q(x, y) :- z rdfs:subPropertyOf knows, x z y
      Query: c(x, d) :- x rdf:type Blogger, x acquaintedWith d
      Reformulated query: c′(x, d) :- x rdf:type Person, x wrote y1, y1 inBlog y2, z1 rdfs:subPropertyOf knows, x z1 d
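The reformulation on the slide reads as an unfolding of each query atom by the definition δ of the schema node or edge it mentions. A minimal sketch of that reading follows; it is an assumption about the mechanism, not the authors' procedure. The function name `unfold` and the fresh-variable scheme `?v1, ?v2, ...` are hypothetical, and the sketch assumes the query itself uses no variable of that form.

```python
from itertools import count

def unfold(body, nodes, edges):
    """Replace each atom over the analytical schema by the body of its defining query."""
    fresh = count(1)
    out = []
    for s, p, o in body:
        if p == "rdf:type" and o in nodes:
            head, definition = nodes[o]
            renaming = {head[0]: s}               # node query head variable -> atom subject
        elif p in edges:
            head, definition = edges[p]
            renaming = {head[0]: s, head[1]: o}   # edge query head variables -> subject, object
        else:
            out.append((s, p, o))                 # atom not defined by the schema: keep as is
            continue
        for pattern in definition:
            renamed = []
            for term in pattern:
                if term.startswith("?") and term not in renaming:
                    renaming[term] = f"?v{next(fresh)}"   # non-head variables get fresh names
                renamed.append(renaming.get(term, term))
            out.append(tuple(renamed))
    return out

nodes = {"Blogger": (("?x",), [("?x", "rdf:type", "Person"), ("?x", "wrote", "?y"), ("?y", "inBlog", "?z")])}
edges = {"acquaintedWith": (("?x", "?y"), [("?z", "rdfs:subPropertyOf", "knows"), ("?x", "?z", "?y")])}

# c(x, d) :- x rdf:type Blogger, x acquaintedWith d
query = [("?x", "rdf:type", "Blogger"), ("?x", "acquaintedWith", "?d")]
for atom in unfold(query, nodes, edges):
    print(atom)
```

Up to variable names (`?v1`, `?v2`, `?v3` instead of y1, y2, z1), the result matches the reformulated query c′ on the slide, which can then be evaluated directly on the original RDF graph without materializing the instance.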
  34. On-Line Analytical Processing – applying OLAP operations –
  35. Slice, dice, drill-in and drill-out
      Query: Find the number of sites where each blogger posts, classified by the blogger’s age and city.
        c(x, d1, d2) :- x age d1, x from d2
        m(x, v) :- x posted y, y on v
        count
      Slice: bind an aggregation dimension to a single value
        c_Σ(x, d1, d2) :- x age d1, x from d2, with Σ = { d1 ← “35” }
      Dice: bind several aggregation dimensions to sets of values
        c_Σ(x, d1, d2) :- x age d1, x from d2, with Σ = { d1 ← {“28”}, d2 ← {“Madrid”, “Kyoto”} }
      Drill-in: remove a dimension from the classifier
        c′(x, d2) :- x from d2
      Drill-out: add a dimension to the classifier
        c′(x, d1, d2, d3) :- x age d1, x from d2, x acquaintedWith d3
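The four operations above can be read as rewritings of the classifier query. The sketch below is one possible encoding, not the authors' implementation. The `Classifier` class, its `sigma` field (standing for Σ) and all function names are hypothetical. `drill_in` drops every triple pattern that mentions the removed dimension, which matches the slide's example but is a simplification. The sample triples are the age and city edges of the instance from the analytical query slide.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Classifier:
    head: tuple                                  # e.g. ("?x", "?d1", "?d2")
    body: tuple                                  # triple patterns
    sigma: dict = field(default_factory=dict)    # dimension -> set of allowed values

def dice(c, restrictions):
    """Bind several dimensions to sets of values."""
    return replace(c, sigma={**c.sigma, **{d: set(vs) for d, vs in restrictions.items()}})

def slice_(c, dim, value):
    """Bind one dimension to a single value (a dice with a singleton set)."""
    return dice(c, {dim: {value}})

def drill_in(c, dim):
    """Remove a dimension, together with the patterns and restrictions that mention it."""
    return Classifier(
        head=tuple(v for v in c.head if v != dim),
        body=tuple(p for p in c.body if dim not in p),
        sigma={d: vs for d, vs in c.sigma.items() if d != dim},
    )

def drill_out(c, dim, pattern):
    """Add a dimension and the triple pattern that binds it."""
    return replace(c, head=c.head + (dim,), body=c.body + (pattern,))

def match(pattern, triple, binding):
    out = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out

def answers(c, graph):
    """Evaluate the classifier body, then keep the bindings allowed by Σ."""
    bindings = [{}]
    for pattern in c.body:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return {tuple(b[v] for v in c.head) for b in bindings
            if all(b[d] in allowed for d, allowed in c.sigma.items())}

graph = {
    ("user1", "age", "28"), ("user1", "from", "Madrid"),
    ("user2", "age", "40"),
    ("user3", "age", "35"), ("user3", "from", "New York"),
}
c = Classifier(("?x", "?d1", "?d2"), (("?x", "age", "?d1"), ("?x", "from", "?d2")))

print(answers(slice_(c, "?d1", "35"), graph))
print(answers(dice(c, {"?d1": {"28"}, "?d2": {"Madrid", "Kyoto"}}), graph))
print(answers(drill_in(c, "?d1"), graph))
```

Each operation returns a new classifier and leaves the measure and the aggregate untouched, so the transformed query can be answered in the same way as the original one.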
  40. Roll-up and drill-down
      Query: Find the number of sites where each blogger posts, classified by the blogger’s age and city.
        c(x, d1, d2) :- x age d1, x from d2
        m(x, v) :- x posted y, y on v
        count
      nextLevel relationship – hierarchies among nodes or edges
      Analytical schema: n1 : Blogger; n2 : City; n6 : State; n3 : Value; n4 : BlogPost; n5 : Site; e2 : from; e6 : nextLevel; e3 : age; e4 : posted; e5 : on
      Roll-up: along the City dimension to the State level
        c′(x, d1, d3) :- x age d1, x from d2, d2 nextLevel d3
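The roll-up above can likewise be sketched as a rewriting of the classifier: the dimension variable is replaced in the head by a variable for the coarser level, and a `nextLevel` pattern links the two. The function name `roll_up` and its parameters are hypothetical; the property name `nextLevel` is the one shown on the slide.

```python
def roll_up(head, body, dim, upper, level_property="nextLevel"):
    """Replace dimension `dim` by `upper` in the head and link them in the body."""
    new_head = tuple(upper if v == dim else v for v in head)
    new_body = tuple(body) + ((dim, level_property, upper),)
    return new_head, new_body

# c(x, d1, d2) :- x age d1, x from d2
head = ("?x", "?d1", "?d2")
body = (("?x", "age", "?d1"), ("?x", "from", "?d2"))

# Roll up the City dimension (?d2) to the State level (?d3).
new_head, new_body = roll_up(head, body, "?d2", "?d3")
print(new_head)
print(new_body)
```

The rewritten classifier corresponds to c′(x, d1, d3) on the slide. Under this reading, drill-down would be the inverse rewriting, which removes the `nextLevel` pattern and restores the finer dimension.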
  41. Empirical Evaluation – experiments and demo –
  42. Experiments
      Settings: kdb+ v3.0 (64 bits) – highly efficient in-memory column store; q interpreted programming language
      Dataset: DBpedia Download 3.8, Ontology and Ontology Infobox datasets
      Hardware: 8-core DELL server at 2.13 GHz, 16 GB of RAM, running Linux 2.6.31.14
      Results: linear scale-up w.r.t. the data size for instance materialization and query answering
  43. Analytical query answering
      12 patterns, 1,097 queries
      Pattern names: c = number of triple patterns in the classifier query; v = number of dimension variables in the classifier query; m = number of triple patterns in the measure query
      Queries per pattern: c1v1m1 (73), c1v1m2 (53), c1v1m3 (62), c2v1m3 (71), c3v2m3 (76), c4v3m3 (130), c5v1m3 (144), c5v2m3 (216), c5v3m3 (144), c5v4m1 (28), c5v4m2 (64), c5v4m3 (36)
      Figure (plots over the 12 patterns, logarithmic scales): evaluation time (s) and number of results; legend: average, minimum, maximum.
  44. Java GUI using the Prefuse toolkit (collaboration with Tushar Ghosh)
  47. Sum Up
  48. Related works
      Graph cube: on warehousing and OLAP multidimensional networks [SIGMOD 2011]
        → do not handle heterogeneous graphs, nor data semantics, both central in RDF
        → only focus on counting edges, in contrast with our flexible analytical queries
      Business intelligence on complex graph data [EDBT/ICDT 2012 Workshops]
        → graph data aggregated in a spatial fashion (group connected nodes into regions)
        → our framework – RDF-specific + more general aggregation
      No Size Fits All – Running the Star Schema Benchmark with SPARQL and RDF Aggregate Views [ESWC 2013]
        → techniques for transforming OLAP queries into SPARQL
        → could be used to further optimize analytical query answering in our framework
      The MD-join: An Operator for Complex OLAP [ICDE 2001]
        → separation between grouping and aggregation present in our analytical queries is similar to the MD-join operator for RDWs
      W3C’s SPARQL 1.1 Query Language
        → features SQL-style grouping and aggregation
        → efficient SPARQL 1.1 platforms – ideal for deploying our framework
  49. Sum up and perspectives
      Sum up:
      Approach for specifying and exploiting an RDF data warehouse:
        define an analytical schema that captures the information of interest
        formalize analytical queries (or cubes) over the analytical schema
      Instances of analytical schemas are RDF graphs themselves, which makes it possible to exploit their rich semantics and heterogeneous structure.
      Perspectives:
        semi-automatic analytical schema design
        optimized OLAP operations on analytical query results
        efficient methods for deploying analytical schemas and analytical queries in parallel contexts
  50. Questions?
      Figure (RDF graph): nodes I, You, Attention, Question, :b1, :b2, :b3; edge labels thank, paid, ask (×3), rdf:type (×3).
      alexandra.roatis@inria.fr
      https://team.inria.fr/oak/warg/
