DL'12 mastro at work

A presentation on my early work on the Mastro system. Some of this research is now part of the ontop system, some evolved into more optimised forms (also in ontop).

Published in: Education, Technology

Transcript

  • 1. Mastro at Work: Experiences on Ontology-based Data Access. Domenico Fabio Savo¹, Domenico Lembo¹, Maurizio Lenzerini¹, Antonella Poggi¹, Mariano Rodriguez-Muro², Vittorio Romagnoli³, Marco Ruzzi¹, Gabriele Stella³. ¹ Sapienza Università di Roma (lastname@dis.uniroma1.it); ² Free University of Bozen-Bolzano (rodriguez@inf.unibz.it); ³ Banca Monte dei Paschi di Siena (firstname.lastname@banca.mps.it). May 2010. Mastro at Work, Savo et al.
  • 2. Motivations: OBDA. Ontology-based data access gives an integrated view of the data: a semantically rich description at the conceptual level, with mappings between the conceptual level and the data sources. Reasoning is exploited to overcome incompleteness. [Diagram: queries are posed over the ontology (semantic layer), which is linked by mappings to the data sources (data layer).]
  • 3. Motivations: the DL-Lite framework for OBDA. Components: a family of ontology languages (DL-Lite) and a mapping technique for relational databases (virtual ABoxes). A promising proposal that, however, had never been evaluated 'in the field'.
  • 4. Motivations: the domain. A joint project on OBDA by Banca Monte dei Paschi di Siena (MPS), Free University of Bozen-Bolzano, and Sapienza Università di Roma. The domain is Clusters of Connected Customers (CCCs). The data is used for risk estimation in the process of granting credit to bank customers.
  • 6. Motivations: problems and solutions. Management is now completely entrusted to application experts rather than to domain experts. OBDA was then used to answer queries posed over the CCC ontology, not only to extract relevant information easily but also to locate inconsistencies and incompleteness in the data, and to devise new data governance tasks.
  • 7. Systems
  • 9. Mastro: the Mastro-OBDA plugin. A DL-Lite reasoner for the OBDA context that takes an ontology with mappings to a relational database (defining a 'virtual ABox') and provides the following services: conjunctive query answering, epistemic query answering (EQL), identification constraints, and epistemic constraints.
  • 10. Protege, OBDA and Mastro plugins: Protege 4 and the OBDA plugin. Features: ontology definition; data source and mapping definition; interaction with the OBDA reasoner (CQs, epistemic queries, etc.).
  • 11. Case Study
  • 12. MPS methodology. The ontology was developed independently from the sources, over a period of 6 months. Tools used: interviews, questionnaires, and existing documentation.
  • 13. Ontology: excerpt of the ontology (79 concepts, 33 roles, 37 concept attributes, 600 DL-Lite_A,Id axioms):
      ∃inGrouping ⊑ Customer
      ∃inGrouping⁻ ⊑ Grouping
      ∃relativeTo ⊑ Grouping
      ∃relativeTo⁻ ⊑ CCC
      Grouping ⊑ ∃inGrouping⁻
      Grouping ⊑ ∃relativeTo
      (functional relativeTo)
      (functional inGrouping⁻)
      Grouping ⊑ δ(timestamp)
      JuridicalCCC ⊑ CCC
      JuridicalCCC ⊑ δ(timestamp)
      ∃inMembership ⊑ Customer
      ∃inMembership⁻ ⊑ Membership
      ∃hasMembership ⊑ CompanyGroup
      ∃hasMembership⁻ ⊑ Membership
      Membership ⊑ ∃inMembership⁻
      Membership ⊑ ∃hasMembership⁻
      (functional inMembership⁻)
      (functional hasMembership)
      Holding ⊑ Membership
      Membership ⊑ δ(timestamp)
      CompanyGroup ⊑ δ(id_code)
  • 15. Constraints: identification constraints (IDCs) impose complex business constraints. Example:
      (id JuridicalCCC timestamp, relativeTo⁻ ◦ inGrouping⁻ ◦ inMembership ◦ ?Holding ◦ hasMembership⁻)
    That is, no two juridical CCCs can, at the same time, comprise customers that are lead members (that is, holdings) of the same company group. The ontology contains a total of 30 identification constraints.
  • 17. Constraints: EQL constraints (EQLCs) impose complex business constraints. Example:
      EQLC(
        verify not exists (
          SELECT jurCCC.jccc
          FROM sparqltable(SELECT ?jccc WHERE { ?jccc rdf:type 'JuridicalCCC' }) jurCCC
          WHERE jurCCC.jccc NOT IN (
            SELECT withGroupLeader.jccc
            FROM sparqltable(SELECT ?jccc, ?mem WHERE {
                ?cus rdf:type 'Customer'. ?cus :inMembership ?mem. ?mem rdf:type 'Holding'.
                ?cus :inGrouping ?gr. ?gr :relativeTo ?jccc. ?jccc rdf:type 'JuridicalCCC' }) withGroupLeader
          )
        )
      )
    That is, there is no juridical CCC that does not comprise a customer who is the holding member of a company group. The ontology contains a total of 27 epistemic constraints.
  • 18. OBDA mappings: the data source. Currently, the MPS applications that manage CCCs rely on a database of about 15 million tuples, stored in 12 relational tables under the IBM DB2 RDBMS.
      Source name   Source description                                     Source size
      GZ0001        Data on customers                                      3,463,083
      GZ0002        Data on juridical connections between customers          157,280
      GZ0003        Data on guarantee connections between customers        1,270,333
      GZ0004        Data on economical connections between customers         104,033
      GZ0005        Data on corporation connections between customers      1,021,779
      GZ0006        Data on patrimonial connections between customers        809,321
      GZ0007        Data on company groups                                    55,362
      GZ0012        Customers loan information                             5,966,948
      GZ0015        Data on monitoring and reporting procedures                1,243
      GZ0101        Data on membership of customers into CCCs              2,225,466
      GZ0102        Information on CCCs                                      663,656
      GZ0104        Data on bank credit coordinators for juridical CCCs       38,457
  • 20. OBDA mappings: example. The mapping body is the SQL query
      SELECT id_cluster, timestamp_val
      FROM GZ0102, GZ0007
      WHERE GZ0102.validity_code = 'T' AND GZ0102.id_cluster <> 0
        AND GZ0007.validity_code = 'T' AND GZ0007.id_group <> 0
        AND GZ0102.id_cluster = GZ0007.id_group
    and the mapping head is
      JuridicalCCC(ccc(id_cluster, timestamp_val)), timestamp(ccc(id_cluster, timestamp_val), timestamp_val)
    If the tuple (243, 24052009112341) is in ans(body), then we have the following virtual ABox assertions:
      JuridicalCCC(ccc(243, 24052009112341))
      timestamp(ccc(243, 24052009112341), 24052009112341)
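How a mapping turns answer tuples into virtual ABox assertions can be sketched in a few lines of Python. This is only an illustration, not Mastro's code: the helper names `ccc_term` and `apply_mapping` are hypothetical, and the assertions are built as plain strings rather than kept virtual as a real OBDA system would.

```python
from typing import Iterable, List, Tuple


def ccc_term(id_cluster: int, timestamp_val: int) -> str:
    """Build the object term ccc(id_cluster, timestamp_val) as a string."""
    return f"ccc({id_cluster}, {timestamp_val})"


def apply_mapping(rows: Iterable[Tuple[int, int]]) -> List[str]:
    """Instantiate the mapping head once per answer tuple of the mapping body."""
    assertions = []
    for id_cluster, timestamp_val in rows:
        term = ccc_term(id_cluster, timestamp_val)
        assertions.append(f"JuridicalCCC({term})")
        assertions.append(f"timestamp({term}, {timestamp_val})")
    return assertions


# Hypothetical answer set of the SQL body: the single tuple used on the slide.
assertions = apply_mapping([(243, 24052009112341)])
for assertion in assertions:
    print(assertion)
```

In an actual OBDA setting these assertions are never materialized; the sketch only shows what the mapping means.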
  • 21. Experimentation: ontology usage
  • 22. Ontology usage: verifying incompleteness in the data through query answering. Querying the ontology provides more answers than querying the database directly. Example: to retrieve the identification codes of all company groups, database operations use id_code from GZ0007. Asking Mastro for q(y) ← CompanyGroup(x), id_code(x, y) shows that GZ0007 is not the only relevant table.
  • 23. Ontology usage: verifying inconsistencies in the data through query answering. Epistemic query answering is used to locate inconsistent tuples. For the assertion (functional inGrouping⁻), the violating tuples can be detected with:
      SELECT testview.l, testview.c1, testview.c2
      FROM sparqltable(SELECT ?l ?c1 ?c2 WHERE { ?c1 :inGrouping ?l. ?c2 :inGrouping ?l }) testview
      WHERE testview.c1 <> testview.c2
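The same check can be tried on a toy relational table. The sketch below is an assumption-laden stand-in: it uses SQLite instead of DB2, a hypothetical table `in_grouping(customer, grp)` in place of the `sparqltable(...)` view, and invented data. It only illustrates the self-join that exposes violations of (functional inGrouping⁻).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flat encoding of the inGrouping role: customer -> grouping.
conn.execute("CREATE TABLE in_grouping (customer TEXT, grp TEXT)")
conn.executemany(
    "INSERT INTO in_grouping VALUES (?, ?)",
    [("c1", "g1"), ("c2", "g1"), ("c3", "g2")],  # g1 has two customers
)

# Two distinct customers pointing to the same grouping violate
# the functionality of the inverse role.
violations = conn.execute(
    """
    SELECT a.grp, a.customer, b.customer
    FROM in_grouping AS a, in_grouping AS b
    WHERE a.grp = b.grp AND a.customer <> b.customer
    ORDER BY a.customer, b.customer
    """
).fetchall()
print(violations)
```

Each violation is reported twice, once per ordering of the pair, exactly as the `c1 <> c2` condition on the slide would do.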
  • 24. Query structure: Evaluation, Performance
  • 28. Query performance: query answering in DL-Lite for OBDA in a nutshell. (1) Reformulate the query w.r.t. the TBox T; (2) unfold it w.r.t. the mappings M; (3) evaluate it over the sources. Sources of complexity: for reformulation, the size of the reformulation; for unfolding, the size of the unfolding and the query structure. The most critical aspect in the MPS scenario is query structure.
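The three steps can be illustrated with a deliberately tiny sketch. It is not the DL-Lite reformulation algorithm: it handles only atomic queries and atomic concept inclusions, and the TBox, mappings, table names, and data are all hypothetical. SQLite stands in for the source DBMS.

```python
import sqlite3

# Hypothetical TBox: (subconcept, superconcept) atomic inclusions only.
TBOX = [("JuridicalCCC", "CCC")]

# Hypothetical mappings: concept name -> SQL returning one column x.
MAPPINGS = {
    "CCC": "SELECT id AS x FROM ccc_table",
    "JuridicalCCC": "SELECT id AS x FROM jur_table",
}


def reformulate(concept):
    """Step 1: collect the concept and everything the TBox places below it."""
    result, frontier = [concept], [concept]
    while frontier:
        current = frontier.pop()
        for sub, sup in TBOX:
            if sup == current and sub not in result:
                result.append(sub)
                frontier.append(sub)
    return result


def unfold(concepts):
    """Step 2: replace each concept by its mapping and take the union."""
    return " UNION ".join(MAPPINGS[c] for c in concepts if c in MAPPINGS)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ccc_table (id INTEGER);
    CREATE TABLE jur_table (id INTEGER);
    INSERT INTO ccc_table VALUES (1);
    INSERT INTO jur_table VALUES (2);
    """
)

# Step 3: evaluate the unfolded SQL over the source.
sql = unfold(reformulate("CCC"))
answers = sorted(row[0] for row in conn.execute(sql))
direct = sorted(row[0] for row in conn.execute(MAPPINGS["CCC"]))
print(answers, direct)
```

The query over the ontology returns both identifiers, while the direct query over the single table returns only one, which is the kind of incompleteness the deck reports finding through query answering.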
  • 33. Query structure. In Mastro, query unfolding is done by means of partial evaluation and SQL views. Given a virtual ABox defined by a database DB and the mappings M, and a query Q to be evaluated, we: (1) define a set of auxiliary predicates and SQL views; (2) associate these to T by means of a logic program P; (3) compute the partial evaluation (PE) of Q with respect to P; (4) translate the PE into SQL by means of the views.
  • 35. Query structure: T-views. Example mappings:
      m1: SELECT .... WHERE cd_tp = 503 ⇝ linkedTo(cus(idcus), link(linkid))
      m2: SELECT .... WHERE cd_tp = 501 ⇝ linkedTo(cus(idcus), link(linkid))
    The view for AuxlinkedTo:
      SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
      FROM (SELECT .... WHERE cd_tp = 503) view_m1
      UNION
      SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
      FROM (SELECT .... WHERE cd_tp = 501) view_m2
  • 36. Query structure: T-views, unfolding. Program: linkedTo(x, y) ← AuxlinkedTo(x, y). Query: q(x, y) ← linkedTo(x, z), linkedTo(y, z). Partial evaluation: q(x, y) ← AuxlinkedTo(x, z), AuxlinkedTo(y, z).
  • 37. Query structure: T-views, unfolding. The resulting SQL:
      SELECT leadsto1.term1, leadsto2.term1
      FROM (
        SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
        FROM (SELECT .... WHERE cd_tp = 503) view_m1
        UNION
        SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
        FROM (SELECT .... WHERE cd_tp = 501) view_m2
      ) AS leadsto1, (
        SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
        FROM (SELECT .... WHERE cd_tp = 503) view_m1
        UNION
        SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
        FROM (SELECT .... WHERE cd_tp = 501) view_m2
      ) AS leadsto2
      WHERE leadsto1.term2 = leadsto2.term2
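A runnable miniature of the T-view approach, under stated assumptions: SQLite replaces DB2, the elided mapping bodies are filled with a hypothetical table `links(idcus, linkid, cd_tp)`, and the data is invented. The point to notice is that the join is performed on the constructed strings `term2`, over a union, which is the query shape the deck goes on to blame for poor performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source table standing in for the elided mapping bodies.
    CREATE TABLE links (idcus INTEGER, linkid INTEGER, cd_tp INTEGER);
    INSERT INTO links VALUES (1, 10, 503), (2, 10, 501), (3, 20, 501);

    -- One view per ontology predicate (T-view): union of all its mappings,
    -- with object terms already built as strings.
    CREATE VIEW aux_linked_to AS
      SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
      FROM (SELECT idcus, linkid FROM links WHERE cd_tp = 503)
      UNION
      SELECT 'cus(' || idcus || ')' AS term1, 'link(' || linkid || ')' AS term2
      FROM (SELECT idcus, linkid FROM links WHERE cd_tp = 501);
    """
)

# q(x, y) <- linkedTo(x, z), linkedTo(y, z), unfolded over the T-view.
t_view_answers = conn.execute(
    """
    SELECT l1.term1, l2.term1
    FROM aux_linked_to AS l1, aux_linked_to AS l2
    WHERE l1.term2 = l2.term2
    ORDER BY l1.term1, l2.term1
    """
).fetchall()
print(t_view_answers)
```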
  • 40. Query structure: performance of T-views. Performance was poor, in the order of hours, even for trivial queries. Culprit: materialization of partial results in the DBMS query plans. Solution: for relational DBMS queries, simpler is better.
  • 41. Query structure: M-views. Example mappings:
      m1: SELECT .... WHERE cd_tp = 503 ⇝ linkedTo(cus(idcus), link(linkid))
      m2: SELECT .... WHERE cd_tp = 501 ⇝ linkedTo(cus(idcus), link(linkid))
    The views (one per mapping):
      Auxm1 = SELECT .... WHERE cd_tp = 503
      Auxm2 = SELECT .... WHERE cd_tp = 501
  • 44. Query structure: M-views, unfolding. Program:
      linkedTo(cus(idcus), link(linkid)) ← Auxm1(idcus, linkid)
      linkedTo(cus(idcus), link(linkid)) ← Auxm2(idcus, linkid)
    The query:
      q(x, y) ← linkedTo(x, z), linkedTo(y, z)
    The partial evaluation:
      q(cus(idcus1), cus(idcus2)) ← Auxm1(idcus1, linkid1), Auxm1(idcus2, linkid1)
      q(cus(idcus1), cus(idcus2)) ← Auxm1(idcus1, linkid1), Auxm2(idcus2, linkid1)
      q(cus(idcus1), cus(idcus2)) ← Auxm2(idcus1, linkid1), Auxm2(idcus2, linkid1)
  • 45. Query structure: M-views, unfolding. The resulting SQL:
      SELECT 'cus(' || auxm11.idcus || ')' AS x, 'cus(' || auxm12.idcus || ')' AS y
      FROM (SELECT .... WHERE cd_tp = 503) AS auxm11,
           (SELECT .... WHERE cd_tp = 503) AS auxm12
      WHERE auxm11.linkid = auxm12.linkid
      UNION
      SELECT 'cus(' || auxm11.idcus || ')' AS x, 'cus(' || auxm21.idcus || ')' AS y
      FROM (SELECT .... WHERE cd_tp = 503) AS auxm11,
           (SELECT .... WHERE cd_tp = 501) AS auxm21
      WHERE auxm11.linkid = auxm21.linkid
      UNION
      SELECT 'cus(' || auxm21.idcus || ')' AS x, 'cus(' || auxm22.idcus || ')' AS y
      FROM (SELECT .... WHERE cd_tp = 501) AS auxm21,
           (SELECT .... WHERE cd_tp = 501) AS auxm22
      WHERE auxm21.linkid = auxm22.linkid
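The M-view unfolding can be generated mechanically. The sketch below is an illustration under assumptions, not Mastro's implementation: the function `unfold_m_views`, the table `links(idcus, linkid, cd_tp)` used to fill the elided mapping bodies, and the data are hypothetical, and SQLite stands in for DB2. It enumerates every pairing of mappings for the two linkedTo atoms, so it emits four disjuncts, including the (m2, m1) pairing that the slide does not list.

```python
import itertools
import sqlite3

# Hypothetical mapping bodies; the slides elide the real ones.
MAPPING_BODIES = {
    "m1": "SELECT idcus, linkid FROM links WHERE cd_tp = 503",
    "m2": "SELECT idcus, linkid FROM links WHERE cd_tp = 501",
}


def unfold_m_views(bodies):
    """Unfold q(x, y) <- linkedTo(x, z), linkedTo(y, z) into a union of
    simple joins, one per choice of mapping for each atom."""
    disjuncts = []
    for left, right in itertools.product(bodies, repeat=2):
        disjuncts.append(
            "SELECT 'cus(' || a.idcus || ')' AS x, "
            "'cus(' || b.idcus || ')' AS y "
            f"FROM ({bodies[left]}) AS a, ({bodies[right]}) AS b "
            "WHERE a.linkid = b.linkid"  # join on raw columns, not on terms
        )
    return disjuncts


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE links (idcus INTEGER, linkid INTEGER, cd_tp INTEGER);
    INSERT INTO links VALUES (1, 10, 503), (2, 10, 501), (3, 20, 501);
    """
)

disjuncts = unfold_m_views(MAPPING_BODIES)
m_view_sql = " UNION ".join(disjuncts)
m_view_answers = sorted(conn.execute(m_view_sql).fetchall())
print(len(disjuncts), m_view_answers)
```

Each disjunct joins on the raw `linkid` column and builds the `cus(...)` terms only in the projection, which keeps every branch a plain select-project-join query of the kind relational optimizers handle well.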
  • 46. Query structure: performance comparison. [Figure: performance comparison (figures/performances4.pdf); the chart is not included in the transcript.]
  • 47. Conclusions
  • 48. Conclusions: MPS feedback. Useful results from the MPS point of view: data integration, data quality, and knowledge sharing. From the technical point of view: DBMS-level performance for on-the-fly OBDA is possible; query tuning is mandatory; the experiments pinpointed the query features needed for good performance and those that trigger bad performance.
  • 49. Conclusions: current and future work. Experiment with live access to the sources. Extend the current experimentation to other data domains in MPS. A preview of the Mastro OBDA plugin and the OBDA plugin for Protege 4.0 is available at http://www.dis.uniroma1.it/quonto/ and http://obda.inf.unibz.it