
Using Semantic Web Resources for Data Quality Management


Transcript of "Using Semantic Web Resources for Data Quality Management"

  1. Using Semantic Web Resources for Data Quality Management
      Christian Fürber and Martin Hepp (christian@fuerber.com, mhepp@computer.org)
      Presentation at the 17th International Conference on Knowledge Engineering and Knowledge Management, October 10-15, 2010, Lisbon, Portugal
  2. Purpose of Data
      [Figure: DATA and its uses: Measurement, Information & Knowledge, Automation, Decisions]
      C. Fürber, M. Hepp: Using SemWeb Resources for DQM
  3. Data Quality in Practice
      Reference: http://www.heise.de/newsticker/meldung/Comdirect-Bank-macht-Kunden-zu-Billiardaeren-996088.html
  4. The Web of Messy Data?
      Retrieved from http://dbpedia.org/sparql on July 20th
      Which one is the correct population?
  5. The Web of Messy Data?
      Retrieved from http://dbpedia.org/sparql on July 20th
      Places with negative population?!?
  6. Risk of Failure
      [Figure: DATA and its uses: Measurement, Information & Knowledge, Automation, Decisions]
  7. Data Quality Problem Types
      Inconsistent duplicates, invalid characters, missing classification, incorrect reference, approximate duplicates, character alignment violation, word transpositions, invalid substrings, mistyping / misspelling errors, cardinality violation, missing values, referential integrity violation, misfielded values, unique value violation, false values, functional dependency violation, out of range values, imprecise values, existence of homonyms, meaningless values, incorrect classification, existence of synonyms, contradictory relationships, outdated conceptual elements, untyped literals, outdated values
      Reference: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  8. Goals
      • Use Semantic Web data to identify data quality problems on the instance level
      • Support the Data Quality Management (DQM) process
  9. Total Data Quality Management for and based on the Semantic Web
      TDQM cycle: Define → Measure → Analyze → Improve (Reference: Richard Wang (1998))
      • Define: define what's good and/or what's poor data quality (DQ definition)
      • Measure: develop and apply SPARQL queries based on the DQ definition
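The Define and Measure steps of this cycle can be made concrete: a definition states what counts as good data, and a measurement reports how much of the data meets it. The Python below is a minimal sketch, assuming instances are plain dictionaries. The `measure` helper and the non-negative-population rule are hypothetical illustrations. In the approach presented here, the measuring itself is done by SPARQL queries, not Python.

```python
from typing import Callable, Dict, List

Instance = Dict[str, object]
Constraint = Callable[[Instance], bool]  # True means the instance passes


def measure(instances: List[Instance], constraint: Constraint) -> float:
    """Return the share of instances that satisfy the constraint (1.0 if there are none)."""
    if not instances:
        return 1.0
    passed = sum(1 for inst in instances if constraint(inst))
    return passed / len(instances)


# Define (hypothetical rule): a population value is "good" if present and non-negative.
def non_negative_population(inst: Instance) -> bool:
    value = inst.get("population")
    return isinstance(value, int) and value >= 0


# Hypothetical places: one negative and one missing population.
places = [
    {"name": "A", "population": 1200},
    {"name": "B", "population": -5},
    {"name": "C"},
    {"name": "D", "population": 300},
]

print(measure(places, non_negative_population))  # 0.5
```

A single ratio like this is the simplest kind of indicator. The Analyze and Improve steps would then look at *which* instances failed.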
  10. How can the Semantic Web support Data Quality Management?
      Availability of FREE data quality knowledge, e.g. for the identification of:
      • Legal value violations
      • Functional dependency violations
  11. Using Trusted References
      [Figure: DQ-Constraints compare the tested knowledge base, where a local:Location reads Las Vegas / France, with a trusted reference, where a tref:Location reads Las Vegas / USA]
  12. Basic Architecture
  13. Basic Characteristics of SPIN (http://spinrdf.org/)
      • Allows definition of generalized SPARQL query templates
      • Constraint checking based on SPARQL
      • Definition of inferencing rules via SPARQL
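A generalized query template fixes the shape of a check and leaves some terms open as arguments, which are filled in for each concrete use. The Python below is a minimal sketch that mimics this idea with `string.Template`. The `LEGAL_VALUE_TEMPLATE` text, its placeholder names, and the `instantiate` helper are hypothetical. They are not SPIN's own template syntax and not the definitions in the dq-constraints library.

```python
from string import Template

# Hypothetical generalized legal-value query. $cls, $prop, $refCls and $refProp
# are the arguments a user would supply for one concrete check.
LEGAL_VALUE_TEMPLATE = Template(
    "SELECT ?s WHERE {\n"
    "  ?s a $cls .\n"
    "  ?s $prop ?value .\n"
    "  FILTER NOT EXISTS {\n"
    "    ?ref a $refCls .\n"
    "    ?ref $refProp ?refValue .\n"
    "    FILTER (str(?refValue) = str(?value))\n"
    "  }\n"
    "}"
)


def instantiate(cls: str, prop: str, ref_cls: str, ref_prop: str) -> str:
    """Fill in the template arguments and return a concrete SPARQL query string."""
    return LEGAL_VALUE_TEMPLATE.substitute(
        cls=cls, prop=prop, refCls=ref_cls, refProp=ref_prop
    )


query = instantiate("vcard:Address", "vcard:country-name", "tref:Location", "tref:country")
print(query)
```

One template therefore serves every class/property pair that should be checked against a reference. This reuse is what makes a shared constraint library practical.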
  14. Generic Data Quality Constraints Library for Easy DQ Definition
      • Mandatory properties & literals
      • Legal values*
      • Legal value ranges
      • Functional dependencies*
      • Legal syntaxes
      • Uniqueness
      * Designed to use trusted references
      Available @ http://semwebquality.org/ontologies/dq-constraints#
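Four of the constraint types in this list need no trusted reference: mandatory properties, legal value ranges, legal syntaxes, and uniqueness. The Python below is a minimal sketch of those four checks. It runs over a hypothetical list of (subject, property, value) tuples with made-up `ex:` names. It is not the dq-constraints library, which is built on SPIN.

```python
import re
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, object]  # (subject, property, value)


def missing_mandatory(triples: List[Triple], cls: str, prop: str) -> Set[str]:
    """Instances of cls that have no value at all for prop."""
    instances = {s for s, p, o in triples if p == "rdf:type" and o == cls}
    with_prop = {s for s, p, _ in triples if p == prop}
    return instances - with_prop


def out_of_range(triples: List[Triple], prop: str, low: float, high: float) -> Set[str]:
    """Subjects whose numeric value for prop lies outside [low, high]."""
    return {s for s, p, o in triples
            if p == prop and isinstance(o, (int, float)) and not low <= o <= high}


def bad_syntax(triples: List[Triple], prop: str, pattern: str) -> Set[str]:
    """Subjects whose string value for prop does not fully match the regex."""
    rx = re.compile(pattern)
    return {s for s, p, o in triples
            if p == prop and isinstance(o, str) and rx.fullmatch(o) is None}


def not_unique(triples: List[Triple], prop: str) -> Set[str]:
    """Subjects that share a value for prop with at least one other subject."""
    owners: Dict[object, Set[str]] = {}
    for s, p, o in triples:
        if p == prop:
            owners.setdefault(o, set()).add(s)
    return {s for subjects in owners.values() if len(subjects) > 1 for s in subjects}


data: List[Triple] = [
    ("ex:p1", "rdf:type", "ex:Place"), ("ex:p1", "ex:population", -5),
    ("ex:p1", "ex:zip", "85577"), ("ex:p1", "ex:id", "A1"),
    ("ex:p2", "rdf:type", "ex:Place"), ("ex:p2", "ex:zip", "8557"),
    ("ex:p2", "ex:id", "A1"),
    ("ex:p3", "rdf:type", "ex:Place"), ("ex:p3", "ex:population", 1200),
    ("ex:p3", "ex:zip", "85577"), ("ex:p3", "ex:id", "B2"),
]

print(missing_mandatory(data, "ex:Place", "ex:population"))  # {'ex:p2'}
print(out_of_range(data, "ex:population", 0, 1e10))          # {'ex:p1'}
print(bad_syntax(data, "ex:zip", r"\d{5}"))                   # {'ex:p2'}
print(sorted(not_unique(data, "ex:id")))                      # ['ex:p1', 'ex:p2']
```

The two starred types, legal values and functional dependencies, differ from these four. They compare against external data instead of a fixed bound or pattern.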
  15. Definition of Data Quality Constraints based on SPIN
  16. Constraint Checking in Practice
  17. Legal Value Constraints
      Return all instances of class vcard:Address whose vcard:country-name value matches no tref:country value in the trusted reference:
        SELECT ?s WHERE {
          ?s a vcard:Address .
          ?s vcard:country-name ?value .
          FILTER NOT EXISTS {
            ?s2 a tref:Location .
            ?s2 tref:country ?value1 .
            FILTER (str(?value1) = str(?value))
          }
        }
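The legal-value check depends on negation. An address is illegal only if *no* trusted location carries its country name. A per-row inequality test, which compares the value against each reference row separately, matches as soon as any single reference country differs. Once the reference holds more than one country, that test therefore flags nearly every address. The Python below is a minimal sketch that contrasts the two readings. The `ex:` identifiers, the country values, and both helper functions are hypothetical.

```python
from typing import Dict, Set

# Hypothetical tested data: address -> vcard:country-name value
addresses: Dict[str, str] = {
    "ex:addr1": "Germany",
    "ex:addr2": "Germnay",  # misspelled
    "ex:addr3": "France",
}
# Hypothetical trusted reference: the set of tref:country values
trusted_countries: Set[str] = {"Germany", "France", "USA"}


def illegal_absence_test(addrs: Dict[str, str], legal: Set[str]) -> Set[str]:
    """Flag an address only if no reference value matches it (NOT EXISTS reading)."""
    return {a for a, v in addrs.items() if v not in legal}


def illegal_naive_inequality(addrs: Dict[str, str], legal: Set[str]) -> Set[str]:
    """Flag an address if any single reference value differs (per-row != reading)."""
    return {a for a, v in addrs.items() if any(ref != v for ref in legal)}


print(sorted(illegal_absence_test(addresses, trusted_countries)))      # ['ex:addr2']
print(sorted(illegal_naive_inequality(addresses, trusted_countries)))  # ['ex:addr1', 'ex:addr2', 'ex:addr3']
```

Only the absence test isolates the misspelled value.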
  18. Functional Dependency Constraints
      Return all instances whose vcard:ADR city-country combination has no matching pair in the instances of gn:Location:
        SELECT ?s WHERE {
          ?s a gr:LocationOfSalesOrServiceProvisioning .
          ?s vcard:ADR ?node .
          ?node vcard:city ?value1 .
          ?node vcard:country ?value2 .
          FILTER NOT EXISTS {
            ?s2 a gn:Location .
            ?s2 gn:asciiname ?value1 .
            ?s2 gn:country ?value2 .
          }
        }
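The functional-dependency check reads the city and the country from each address node. It then looks for that same pair in the trusted reference, as in the Las Vegas / France example on the trusted-references slide. The Python below is a minimal sketch over hypothetical triples. The `ex:` and `_:` identifiers are invented, and the property names follow the query on this slide. The GeoNames-style reference is reduced to a hypothetical set of (name, country) pairs.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
LOCATION_CLASS = "gr:LocationOfSalesOrServiceProvisioning"

# Hypothetical tested knowledge base: two shops, each with an address node.
tested: List[Triple] = [
    ("ex:shop1", "rdf:type", LOCATION_CLASS),
    ("ex:shop1", "vcard:ADR", "_:a1"),
    ("_:a1", "vcard:city", "Las Vegas"),
    ("_:a1", "vcard:country", "France"),
    ("ex:shop2", "rdf:type", LOCATION_CLASS),
    ("ex:shop2", "vcard:ADR", "_:a2"),
    ("_:a2", "vcard:city", "Las Vegas"),
    ("_:a2", "vcard:country", "USA"),
]
# Hypothetical trusted reference reduced to (asciiname, country) pairs.
trusted_pairs: Set[Tuple[str, str]] = {("Las Vegas", "USA"), ("Paris", "France")}


def objects(triples: List[Triple], subject: str, prop: str) -> List[str]:
    """All values o of triples (subject, prop, o)."""
    return [o for s, p, o in triples if s == subject and p == prop]


def fd_violations(triples: List[Triple], pairs: Set[Tuple[str, str]]) -> Set[str]:
    """Locations whose (city, country) address pair is absent from the reference."""
    flagged: Set[str] = set()
    for s, p, o in triples:
        if p == "rdf:type" and o == LOCATION_CLASS:
            for node in objects(triples, s, "vcard:ADR"):
                for city in objects(triples, node, "vcard:city"):
                    for country in objects(triples, node, "vcard:country"):
                        if (city, country) not in pairs:
                            flagged.add(s)
    return flagged


print(fd_violations(tested, trusted_pairs))  # {'ex:shop1'}
```

Each value is legal on its own here. Only the combination (Las Vegas, France) is inconsistent, which is why this check needs pairs and not single values.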
  19. Acquisition of Semantic Web Sources for DQM
      (1) Replication of relevant knowledge bases
      (2) On the fly via federated SPARQL queries:
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT * WHERE {
          ?s1 :location_CITY ?city .
          OPTIONAL {
            SERVICE <http://dbpedia.org/sparql> {
              ?s2 a dbo:City .
              ?s2 rdfs:label ?city .
              FILTER (lang(?city) = "en")
            }
          }
          FILTER (!bound(?s2))
        }
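The federated query uses the OPTIONAL plus `FILTER(!bound(?s2))` idiom. Each local row is left-joined with the matching remote rows, and only the rows where nothing matched are kept. The result is an anti-join that lists local cities unknown to the reference. The Python below is a minimal sketch of that evaluation on hypothetical rows. It does not contact DBpedia, and the `ex:`/`dbr:` names and city lists are invented.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """OPTIONAL: keep every left row and extend it with each compatible right row, if any."""
    out: List[Row] = []
    for row in left:
        matches = [r for r in right if r[key] == row[key]]
        if matches:
            out.extend({**row, **r} for r in matches)
        else:
            out.append({**row, "s2": None})  # ?s2 stays unbound
    return out


local_rows: List[Row] = [{"s1": "ex:r1", "city": "Lisbon"}, {"s1": "ex:r2", "city": "Lisbonn"}]
remote_rows: List[Row] = [{"s2": "dbr:Lisbon", "city": "Lisbon"}, {"s2": "dbr:Porto", "city": "Porto"}]

joined = left_join(local_rows, remote_rows, "city")
unmatched = [row for row in joined if row["s2"] is None]  # FILTER(!bound(?s2))
print(unmatched)  # [{'s1': 'ex:r2', 'city': 'Lisbonn', 's2': None}]
```

Matching is on exact values, as in the query. In RDF, a language tag is part of the literal, so a local city name without the `@en` tag would not equal an English DBpedia label.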
  20. Limitations
      • High degree of uncertainty about the quality of Semantic Web resources
      • Risk of data quality problem proliferation
      • Lack of Semantic Web resources for certain domains
      • The flexible design of RDF and structural heterogeneity complicate the definition of generic DQ constraints
      • Scalability on large data sets
      • DQ constraints close the world (they imply a closed-world assumption)
  21. Contributions
      • Data quality control for Semantic Web data
      • Identification of potential inconsistencies between Semantic Web resources
      • Reduction of effort for the definition of functional dependency rules and legal value rules
      • Reuse of shared data quality rules on a Web scale
  22. Future Work
      • Semantic Web information quality assessment framework (SWIQA) with computation of KPIs
      • Analysis and identification of useful "trusted references" based on SWIQA
      • Application to multi-source master data of information systems
      • Evaluation on large data sets
  23. Data Quality Constraints Library for SPIN @ http://semwebquality.org/ontologies/dq-constraints#
      Christian Fürber, Researcher, E-Business & Web Science Research Group
      Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
      Skype: c.fuerber | Email: christian@fuerber.com
      Web: http://www.unibw.de/ebusiness | Homepage: http://www.fuerber.com | Twitter: http://www.twitter.com/cfuerber
      Paper available at http://bit.ly/c5v6TM