A DATABASE APPROACH TO MONITORING THE
                    QUALITY OF INFORMATION IN RDF STORES
                        Alexandre Rademaker and Edward Hermann




Wednesday, November 30, 11
NOTES



                  This is not a research report; this is a research
                  proposal!

                  Let us start by looking at results from database
                  researchers.




WHAT IS DATA QUALITY (AND HOW DO WE ENSURE IT)?




                  Semantic properties of databases can be represented
                  by integrity constraints!

                  Integrity enforcement means maintaining the correctness
                  of the database. Truth maintenance!



                                                         Hendrik, 2011

HENDRIK DECKER


             http://web.iti.upv.es/~hendrik/
             Universidad Politécnica de Valencia




EXAMPLE




                  A marriage is between one man and one woman only.
                  How can we model such a constraint in a relational
                  DB?

                  We are talking about more than check constraints,
                  foreign keys and primary keys.
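One way to read the question above is to state the constraint as a set of denials, that is, queries that must return nothing. The following is a minimal sketch in Python, not something taken from the talk: the `marriages` relation of (husband, wife) pairs and the `sex` lookup are hypothetical schemas chosen only for illustration.

```python
from collections import Counter


def marriage_violations(marriages, sex):
    """Return violations of 'a marriage is between one man and one woman only'.

    marriages: iterable of (husband, wife) pairs (hypothetical schema).
    sex: dict mapping a person to 'M' or 'F' (hypothetical schema).
    """
    marriages = list(marriages)
    violations = []
    # Denial 1: the first component must be a man, the second a woman.
    for husband, wife in marriages:
        if sex.get(husband) != "M" or sex.get(wife) != "F":
            violations.append(("wrong-sex", husband, wife))
    # Denial 2: nobody may appear in more than one marriage.
    counts = Counter(person for pair in marriages for person in pair)
    for person, n in sorted(counts.items()):
        if n > 1:
            violations.append(("multiple-marriages", person, n))
    return violations


sex = {"adam": "M", "eve": "F", "bob": "M", "ann": "F"}
ok = [("adam", "eve"), ("bob", "ann")]
bad = [("adam", "eve"), ("adam", "ann")]
print(marriage_violations(ok, sex))
print(marriage_violations(bad, sex))
```

The second denial involves counting across rows, which is why it goes beyond a per-row check constraint or a simple key declaration on one column.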




DB THEORY USES DATALOG




                  Datalog is more expressive than SQL without recursion
                  (it can express transitive closure)

                  SQL is FOL (decidable over finite models)

                  SELECT X WHERE Y (give me the bindings that satisfy
                  the clauses)
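To make the transitive-closure remark concrete, here is a small Python sketch of naive bottom-up evaluation of the usual two-rule Datalog program for reachability. The relation names `edge` and `path` are illustrative assumptions, not taken from the slides.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the Datalog program

        path(X, Y) :- edge(X, Y).
        path(X, Z) :- path(X, Y), edge(Y, Z).

    Rules are applied repeatedly until no new fact is derived (a fixpoint).
    """
    edges = set(edges)
    path = set(edges)
    while True:
        derived = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if derived <= path:  # nothing new: fixpoint reached
            return path
        path |= derived


edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

The number of iterations depends on the data, which is what a fixed first-order query cannot express.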




TWO WAYS TO ENFORCE INTEGRITY




                 On each update, check whether any integrity constraint
                 is violated. (Not always checked rigorously, due to the
                 performance penalty.)

                 Repair extant violations of constraints. (Accumulation
                 of inconsistency is inevitable.)



                                                           Hendrik, 2011

INCONSISTENCY-TOLERANT METHODS




                 The rigorous way is to eliminate all inconsistency:
                 repair the whole database.

                 Relaxation... partial (flexible) repairs!


                             Absolute consistency is out of the question
                                     due to its intractability!

                                                                Hendrik, 2011

FLEXIBILITY OF PARTIAL INCONSISTENCY


            Flexibility is served in two ways:
                 Integrity enforcement is more flexible. It doesn’t have
                 to be done all at once. (Constraint violations can be
                 tolerated and resolved at an appropriate moment.)

                 Some inconsistency may be unknown at update time.
                 A total approach would fail in such a situation.

                 But...


                                                         Hendrik, 2011

PARTIAL REPAIRS


                 Absolute consistency is out of the question due to its intractability.

                 But naive inconsistency-tolerant repairs can be
                 data-destructive.

                 For a rational, flexible repair strategy, one needs criteria
                 (expressed in terms of metrics).

                 Only admit repairs that are integrity-preserving! That is, the total
                 amount of integrity violation must not increase after the repair.


                                                                   Hendrik, 2011

FORMAL DEFINITIONS

           For an update U (inserts, deletes) of a database D, we
           denote by $D^U$ the updated database.


           D      =    database
           IC     =    integrity theory
           I      =    constraint
           U      =    update

           D(F)   = true if F evaluates to true in D
           D(I)   = true if I is satisfied in D
           D(IC)  = true if every I in IC is satisfied in D
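The notation above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions: a database is modelled as a set of fact tuples, a constraint as a Boolean function of the database, and the names `satisfies`, `satisfies_all` and `apply_update` are hypothetical.

```python
def satisfies(db, constraint):
    """D(I): True if constraint I is satisfied in database D."""
    return constraint(db)


def satisfies_all(db, theory):
    """D(IC): True if every constraint I in the integrity theory IC holds in D."""
    return all(satisfies(db, constraint) for constraint in theory)


def apply_update(db, inserts=(), deletes=()):
    """D^U: the database obtained from D by the deletes and inserts of U."""
    return (set(db) - set(deletes)) | set(inserts)


# Toy facts of the form ("age", person, value); purely illustrative.
def non_negative(db):
    return all(value >= 0 for (_, _, value) in db)


def one_age_per_person(db):
    return len({name for (_, name, _) in db}) == len(db)


theory = [non_negative, one_age_per_person]
db = {("age", "ann", 30), ("age", "bob", 25)}
updated = apply_update(db, inserts={("age", "bob", -1)})
print(satisfies_all(db, theory), satisfies_all(updated, theory))
```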




                                                                  Hendrik, 2011

FORMAL DEFINITIONS


            Let $\preceq$ be an ordering that is antisymmetric, reflexive and transitive.
            For two elements A and B of a lattice, $A \oplus B$ is their least upper bound.




                                                                    Hendrik, 2011

FORMAL DEFINITIONS

            We say that $(\mu, \preceq)$ is an inconsistency metric if
            $\mu$ maps tuples (D, IC) to some lattice that is partially ordered by $\preceq$.

            A simple example of a metric is given by $\mu(D, IC) = D(IC)$,
            with the natural order $\mathit{true} \prec \mathit{false}$ on the range of $\mu$.

            That is, integrity satisfaction, D(IC) = true,
            means lower inconsistency than integrity violation,
            D(IC) = false.


            Non-trivial examples are given by comparing or
            counting violated constraints.
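The two kinds of metric mentioned above might be sketched as follows. This is an illustration, not the formulation in the cited work: the database is a toy set of integers, constraints are Boolean functions, and encoding true as 0 and false as 1 is an assumption made so that Python's `<=` can stand in for the ordering.

```python
def binary_metric(db, theory):
    """The simple metric: 0 if D(IC) is true, 1 if it is false.

    Encoding true as 0 and false as 1 (an assumption of this sketch)
    places integrity satisfaction below integrity violation.
    """
    return 0 if all(constraint(db) for constraint in theory) else 1


def count_metric(db, theory):
    """A non-trivial metric: the number of violated constraints."""
    return sum(1 for constraint in theory if not constraint(db))


# Toy integrity theory over a database that is just a set of integers.
def non_negative(db):
    return all(x >= 0 for x in db)


def below_ten(db):
    return all(x < 10 for x in db)


theory = [non_negative, below_ten]
clean = {1, 2, 3}
dirty = {-1, 2, 12}

print(binary_metric(clean, theory), count_metric(clean, theory))
print(binary_metric(dirty, theory), count_metric(dirty, theory))
```

The binary metric cannot distinguish one violated constraint from two, while the counting metric can, which is what makes partial repairs measurable.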



                                                                               Hendrik, 2011

INCONSISTENCY METRICS


                 Inconsistency metrics are used to decide whether an update preserves
                 integrity, that is, whether it avoids creating an integrity violation
                 that did not exist before the update.

                 Intuitively, an update preserves integrity if it doesn’t increase
                 the measured inconsistency.

                             For a metric $(\mu, \preceq)$, an update U of a database D
                             with integrity theory IC is integrity-preserving with
                             regard to $(\mu, \preceq)$ if $\mu(D^U, IC) \preceq \mu(D, IC)$.
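The definition can be tried out on a toy example. The sketch below assumes a database that is a set of integers and two illustrative metrics: the set of violated constraints ordered by inclusion, and their count ordered numerically. All names are hypothetical; Python's `<=` serves as the ordering in both cases, since it means subset for frozensets.

```python
import operator


def violated(db, theory):
    """Metric 1: the set of indexes of violated constraints (ordered by subset)."""
    return frozenset(i for i, c in enumerate(theory) if not c(db))


def count_metric(db, theory):
    """Metric 2: the number of violated constraints (ordered numerically)."""
    return len(violated(db, theory))


def preserves_integrity(db, updated, theory, metric, leq=operator.le):
    """True if metric(updated) is below or equal to metric(db) under leq."""
    return leq(metric(updated, theory), metric(db, theory))


def non_negative(db):
    return all(x >= 0 for x in db)


def below_ten(db):
    return all(x < 10 for x in db)


theory = [non_negative, below_ten]
db = {3, -1}                   # already violates non_negative
harmless = db | {5}            # adds no new violation
swap = (db - {-1}) | {12}      # fixes one violation, creates another
worse = db | {12}              # adds a second violation

for name, upd in [("harmless", harmless), ("swap", swap), ("worse", worse)]:
    print(name,
          preserves_integrity(db, upd, theory, count_metric),
          preserves_integrity(db, upd, theory, violated))
```

The `swap` update shows that the verdict depends on the chosen metric: the count does not increase, but the set of violated constraints is not contained in the old one.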

                                                                        Hendrik, 2011

AND MORE...




                 Inconsistency-tolerant integrity checking

                 Repairs

                 Computing and checking partial repairs

                 Computing integrity-preserving repairs




                                                             Hendrik, 2011

WHY ARE WE TALKING ABOUT IT?



                 Lattes@FGV Project (a unified KB of FGV research
                 publications, researchers, skills, etc.), http://dck092.fgv.br/

                 The Semantic Web brings RDF, description logics, linked data, etc.

                 Our research topics include logics and knowledge
                 representation.

                 RDF is the key concept of the Semantic Web

                 A relational database has a fixed model (like the TBox of an ontology)



TOPOS: THEORETICAL PART
                 (Scratching the surface!)
                 A topos (plural topoi or toposes) is a category with a quite expressive internal logic.

                 The category of graphs and graph homomorphisms can be viewed as a topos.

                 This topos already has a Heyting algebra that is used as the truth basis of its internal logic.

                 A Heyting algebra is a lattice with additional properties. This topos-theoretic view of RDF
                 stores can be investigated as a natural foundation for
                 partial repairs in RDF stores.

                 Besides that, if we view traditional DBs as finite first-order logical structures, the category
                 of (finite) first-order structures and homomorphisms between them has its own internal
                 logic. This internal logic can also be investigated with regard to partial repairs.




LATTES@FGV: THE RDF KB




                              http://dck092.fgv.br:10035/repositories/fgv (800k triples)



LATTES@FGV



                 480 Lattes CVs and data collected from other sources (Qualis,
                 Digital Library, etc.) in one triple store

                 Lots of errors (inconsistencies) for different reasons: poor user
                 interface for data input, misinterpretation, etc.

                 How to identify the errors? (in a non-ad-hoc manner)

                 How to fix what can be fixed automatically?




INTEGRITY CONSTRAINTS IN RDF




                 We can consider extending what was discussed so far to
                 non-SQL stores

                 A KR/DB can be viewed as a graph

                 The query language of RDF-based stores, SPARQL, can be
                 used to provide semantics to the store.




EXAMPLES




                                  An article referenced by a CV
                                  must have the author of this CV as
                                  one of its authors!
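As a rough illustration of how such a constraint could be checked, the sketch below treats the store as a plain set of (subject, predicate, object) triples in Python. The predicates `ex:owner` and `ex:references` are hypothetical names invented for this example; they are not claimed to be the vocabulary of the Lattes@FGV store.

```python
def cv_author_violations(triples):
    """Return (cv, article) pairs where the CV's owner is not an author of the article.

    Assumed (hypothetical) vocabulary:
      (cv, "ex:owner", person)        the person the CV belongs to
      (cv, "ex:references", article)  an article listed in the CV
      (article, "dc:creator", person) an author of the article
    """
    triples = set(triples)
    owner = {s: o for (s, p, o) in triples if p == "ex:owner"}
    violations = []
    for (cv, p, article) in sorted(triples):
        if p != "ex:references":
            continue
        if (article, "dc:creator", owner.get(cv)) not in triples:
            violations.append((cv, article))
    return violations


store = {
    ("cv1", "ex:owner", "alice"),
    ("cv1", "ex:references", "art1"),
    ("cv1", "ex:references", "art2"),
    ("art1", "dc:creator", "alice"),
    ("art2", "dc:creator", "bob"),
}
print(cv_author_violations(store))
```

Returning the offending pairs, rather than a single Boolean, gives something that can be counted, which fits the idea of a non-trivial inconsistency metric.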




EXAMPLES




                                  If two resources were identified by
                                  reference to the same article, every
                                  author of the first one should also
                                  be related to the second one!




IN THE LAST EXAMPLE

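           Of course, two publications cannot be considered
           the same by comparing only their titles!

           We need entity alignment, a similarity checker...

           Suppose we have identified all resources that
           represent the same real “entity” using
           owl:sameAs; then ...

               # True if some author ?c of ?p1 is not related to ?p2.
               # The dc: namespace IRI below is an assumption.
               PREFIX owl: <http://www.w3.org/2002/07/owl#>
               PREFIX dc:  <http://purl.org/dc/elements/1.1/>
               ASK {
                 ?p1 owl:sameAs ?p2 ;
                     dc:creator ?c .
                 OPTIONAL {
                   ?p2 ?rel ?c .
                 }
                 FILTER( !bound(?rel) )
               }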
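To spell out what the query on this slide asks for, here is a plain-Python reading of it over a set of triples. It is a sketch of the intended semantics only, not of how a SPARQL engine evaluates the query; the function name and the sample data are hypothetical.

```python
def ask_missing_author(triples):
    """Mimic the ASK query: is there a pair p1 owl:sameAs p2 and an author c
    of p1 such that no triple (p2, any predicate, c) exists?"""
    triples = set(triples)
    for (p1, pred, p2) in triples:
        if pred != "owl:sameAs":
            continue
        for (s, q, c) in triples:
            if s != p1 or q != "dc:creator":
                continue
            # OPTIONAL { ?p2 ?rel ?c } FILTER(!bound(?rel)):
            # succeed only when p2 has no relation at all to c.
            if not any(s2 == p2 and o2 == c for (s2, _, o2) in triples):
                return True
    return False


consistent = {
    ("a", "owl:sameAs", "b"),
    ("a", "dc:creator", "alice"),
    ("b", "dc:creator", "alice"),
}
inconsistent = consistent | {("a", "dc:creator", "bob")}
print(ask_missing_author(consistent), ask_missing_author(inconsistent))
```

A result of True signals a violation, so the constraint holds exactly when the query answers False.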



A LITTLE BIT ABOUT THE
                   IDENTIFICATION OF SIMILARITY

           ;; Assert owl:sameAs for each pair in LIST. A pair whose first
           ;; element is not a blank node is asserted in reverse order.
           (defun assert-same-list (list)
             (let ((new nil))
               (dolist (pair list)
                 (if (not (blank-node-p (first pair)))
                     (push (reverse pair) new)
                     (push pair new)))
               (dolist (pair new)
                 (add-triple (first pair) !owl:sameAs (second pair)))))


           ;; For every ordered pair of distinct agents sharing a name,
           ;; call insert-same-as (its definition is not shown here).
           (select0/callback (?x ?y) #'insert-same-as
                (q- ?x !rdf:type !foaf:Agent)
                (q- ?y !rdf:type !foaf:Agent)
                (q- ?x !foaf:name ?n)
                (q- ?y !foaf:name ?n)
                (lispp (upi< ?x ?y)))




                                                  Naive approach: Shaking hands!
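If "shaking hands" is read as the handshake problem (every pair is compared once), the naive approach can be sketched in Python as below. The dictionary of agent identifiers and names is hypothetical sample data, and exact name equality stands in for whatever matching the real store uses.

```python
from itertools import combinations


def same_name_pairs(agents):
    """Compare every unordered pair of agents once; return pairs with equal names.

    agents: dict mapping an agent identifier to its name.
    Sorting the identifiers plays the role of the ordering test in the
    Lisp query: each pair is reported once, smaller identifier first.
    """
    return [(x, y)
            for x, y in combinations(sorted(agents), 2)
            if agents[x] == agents[y]]


agents = {"a1": "Ann Lee", "a2": "Bob Ray", "a3": "Ann Lee"}
print(same_name_pairs(agents))
```

With n agents this performs n(n-1)/2 comparisons, which is why it only scales to small stores.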


A LITTLE BIT ABOUT THE
                   IDENTIFICATION OF SIMILARITY

           ;; Partition VERTICES into groups: repeatedly take the ego group
           ;; (depth N, via GENERATOR) of the first remaining vertex.
           (defun components (vertices n generator)
             (do ((res nil)
                  (vtx vertices
                       (set-difference vtx (car res) :test #'upi=)))
                 ((null vtx) res)
               (push (ego-group (car vtx) n generator) res)))


           ;; Neighbors of NODE: journals with the same valid ISSN and a
           ;; similar title (Jaro-Winkler score above 0.7).
           (defsna-generator same-journal (node)
             (select0 (?j)
               (q- (?? node) !bibo:issn ?i)
               (q- ?j !bibo:issn ?i)
               (lispp (utils::check-issn (part->value ?i)))
               (lispp (upi< node ?j))
               (q- ?j !dc:title ?t2)
               (q- (?? node) !dc:title ?t1)
               (lispp (> (utils::jaro-winkler-distance (part->value ?t1) (part->value ?t2)) 0.7))))


           ;; Merge each group of journal nodes found above.
           (let ((nodes (mapcar #'subject (get-triples-list :p !bibo:issn :limit nil))))
             (dolist (g (components nodes 2 'same-journal))
               (merge-nodes g)))


                    An ad-hoc solution: breadth-first search of connected components!
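For readers without the graph store at hand, the idea of grouping similar nodes by breadth-first search can be sketched in plain Python. Unlike the Lisp code, which expands each group only to a fixed depth, this sketch follows neighbours without a depth limit; the adjacency dictionary is hypothetical sample data standing in for the similarity generator.

```python
from collections import deque


def components(vertices, neighbors):
    """Group vertices into connected components using breadth-first search.

    neighbors: function mapping a vertex to an iterable of adjacent vertices
    (here it plays the role of the similarity generator).
    """
    seen = set()
    result = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:
            v = queue.popleft()
            group.append(v)
            for w in neighbors(v):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        result.append(group)
    return result


# Hypothetical symmetric "similar journal" relation.
similar = {"j1": ["j2"], "j2": ["j1", "j3"], "j3": ["j2"], "j4": []}
groups = components(["j1", "j2", "j3", "j4"], lambda v: similar[v])
print(groups)
```

Each resulting group is a candidate set of nodes to merge; whether merging is safe still depends on the quality of the similarity test.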

