Data and Knowledge Evolution

by Giorgos Flouris
at Open Data Tutorials, May 2013

Published in: Technology, Education

  • Mention FORTH and logo.
    Mention project funding.
    Acknowledgement: Michalis Chortis for support in creating a couple of slides, and Vassilis Christophides for providing some useful references.
    Slide availability also on dropbox.
  • [BLHL01] T. Berners-Lee, J. Hendler, O. Lassila, The Semantic Web. Scientific American, 2001.
  • [Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5 (2), 1993.
  • [MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and Information Systems, 46(2):157-181, 2012.
  • [FMK+08] G. Flouris, D. Manakanatas, H. Kondylakis, D. Plexousakis, G. Antoniou. Ontology Change: Classification and Survey. Knowledge Engineering Review, 23(2):117-152, 2008.
    [ZAA+13] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou. Ontology Evolution: A Process Centric Survey. Knowledge Engineering Review (to appear).
  • [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
  • [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
  • Please interrupt for questions. The duration of the talk is actually less.
    [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
    [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
  • OWL language tries to balance trade-offs between usefulness, simplicity, expressiveness and computational complexity.
  • [HHM+10] H. Halpin, P.J. Hayes, J.P. McCusker, D.L. McGuinness, H.S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. ISWC-10, 2010.
  • [RPH+12] A. Rula, M. Palmonari, A. Harth, S. Stadtmüller, A. Maurino. On the Diversity and Availability of Temporal Information in Linked Open Data. ISWC-12, 2012.
  • When a change happens, then it must be communicated to other “consuming” datasets.
    Push-based: the owner of the changed data either sends notifications to those who use the data (which requires registration), or maintains information on the change (monitoring, versioning) that data users can then request (passive propagation, which is closer to pull-based).
    [OK02] D. Ognyanov, A. Kiryakov. Tracking Changes in RDF(S) Repositories. EKAW-02, 2002.
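The two propagation styles described in this note can be sketched as follows. This is only an illustration under stated assumptions: the class `PublishedDataset`, its methods, and the change tuples are hypothetical names invented here and are not taken from [OK02] or any other cited system.

```python
class PublishedDataset:
    """Hypothetical publisher that supports both propagation styles."""

    def __init__(self):
        self.change_log = []    # kept so that consumers can ask later (passive)
        self.subscribers = []   # consumers that registered for notifications

    def register(self, callback):
        # Push-based propagation requires consumers to register first.
        self.subscribers.append(callback)

    def apply_change(self, change):
        # Record the change, then notify every registered consumer.
        self.change_log.append(change)
        for notify in self.subscribers:
            notify(change)

    def changes_since(self, position):
        # Passive propagation: the consumer asks for what it has not seen yet.
        return self.change_log[position:]


received = []
remote = PublishedDataset()
remote.register(received.append)
remote.apply_change(("add", ("King", "rdf:type", "Black")))
remote.apply_change(("delete", ("King", "rdf:type", "Black")))
missed = remote.changes_since(1)
```

A consumer that registered receives both changes immediately, while a consumer that only polls uses `changes_since` with the last position it processed.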
  • [HH00] J. Heflin, J. Hendler. Dynamic Ontologies on the Web. AAAI-00, 2000.
    [SSN+10] H. Van de Sompel, R. Sanderson, M.L. Nelson, L.L. Balakireva, H. Shankar, S. Ainsworth. An HTTP-Based Versioning Mechanism for Linked Data. LDOW-10, 2010.
  • [HP04] J. Heflin, J.Z. Pan. A Model Theoretic Semantics for Ontology Versioning. ISWC-04, 2004.
    [HS05] Z. Huang, H. Stuckenschmidt. Reasoning with Multi-version Ontologies: A Temporal Logic Approach. ISWC-05, 2005.
    [PTC05] P. Plessers, O. de Troyer, S. Casteleyn. Event-based Modeling of Evolution for Semantic-driven Systems. CAiSE-05, 2005.
    [KLGE07] N. Keberle, Y. Litvinenko, Y. Gordeyev, V. Ermolayev. Ontology Evolution Analysis with OWL-MeT. IWOD-07, 2007.
    [RSDT08] T. Redmond, M. Smith, N. Drummond, T. Tudorache. Managing Change: An Ontology Version Control System. OWLED-08, 2008.
  • [AAM09] C. Allocca, M. d'Aquin, E. Motta. Detecting Different Versions of Ontologies in Large Ontology Repositories. IWOD-09, 2009.
    [CQ13] G. Cheng, Y. Qu. Relatedness Between Vocabularies on the Web of Data: A Taxonomy and an Empirical Study. Web Semantics: Science, Services and Agents on the World Wide Web, 2013. Available at: http://dx.doi.org/10.1016/j.websem.2013.02.001
    [TTA08] Y. Tzitzikas, Y. Theoharis, D. Andreou. On Storage Policies for the Semantic Web Repositories that Support Version. ESWC-08, 2008.
  • [HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.
    [JAP09] M. Javed, Y. Abgaz, C. Pahl. A Pattern-based Framework of Change Operators for Ontology Evolution. OTM-09, 2009.
    [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
    [SK03] H. Stuckenschmidt, M. Klein. Integrity and Change in Modular Ontologies. IJCAI-03, 2003.
    [AH06] S. Auer, H. Herre. A Versioning and Evolution Framework for RDF Knowledge Bases. PSI-06, Revised Papers, 2006.
    [DA09] R. Djedidi, M. Aufaure. Change Management Patterns (CMP) for Ontology Evolution Process. IWOD-09, 2009.
    [PTC07] P. Plessers, O. de Troyer, S. Casteleyn. Understanding Ontology Evolution: A Change Detection Approach. Web Semantics: Science, Services and Agents on the WWW, 2007.
  • Once you have a change language, representing deltas is easy.
    [NCLM06] N. Noy, A. Chugh, W. Liu, M. Musen. A Framework for Ontology Evolution in Collaborative Environments. ISWC-06, 2006.
    [KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.
    [KN03] M. Klein, N. Noy. A Component-based Framework for Ontology Evolution. IJCAI-03 Workshop on Ontologies and Distributed Systems, CEUR-WS, vol. 71, 2003.
    [PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.
  • [PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.
    [ZZL+03] Z. Zhang, L. Zhang, C.X. Lin, Y. Zhao, Y. Yu. Data Migration for Ontology Evolution. Poster ISWC-03, 2003.
    [MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and Information Systems, 46(2):157-181, 2012.
  • [VWS+05] M. Volkel, W. Winkler, Y. Sure, S. Kruk, M. Synak. SemVersion: A Versioning system for RDF and Ontologies. ESWC-05, 2005.
    [ILK12] D.H. Im, S.W. Lee, H.J. Kim. A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering. 22(1):85-106, 2012.
    [ZTC11] D. Zeginis, Y. Tzitzikas, V. Christophides. On Computing Deltas of RDF/S Knowledge Bases. ACM Transactions on the Web (TWEB) 5, 3, 2011.
  • [KWZ08] R. Kontchakov, F. Wolter, M. Zakharyaschev. Can you Tell the Difference Between DL-Lite Ontologies? KR-08, 2008.
    [KWW08] B. Konev, D. Walther, F. Wolter. The Logical Difference Problem for Description Logic Terminologies. IJCAR-08, 2008.
    [FMV10] E. Franconi, T. Meyer, I. Varzinczak. Semantic Diff as the Basis for Knowledge Base Versioning. NMR-10, 2010.
  • PromptDiff->Protégé
    Ontoview->OntoStudio
    [NKKM04] N. Noy, S. Kunnatur, M. Klein, M. Musen. Tracking Changes During Ontology Evolution. ISWC-04, 2004.
    [KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.
    [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
    [HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.
  • PRISM can also determine when changes are information-preserving.
    [SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.
    [MMS+03] A. Maedche, B. Motik, L. Stojanovic, R. Studer, R. Volz. An Infrastructure for Searching, Reusing and Evolving Distributed Ontologies. WWW-03, 2003.
    [PM10] A. Passant, P.N. Mendes. SparqlPuSH: Proactive Notification of Data Updates in RDF Stores Using PubSubHubbub. SFSW-10, 2010.
    [CMDZ10] C.A. Curino, H.J. Moon, A. Deutsch, C. Zaniolo. Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2):117-128, 2010.
    [CMZ08] C.A. Curino, H.J. Moon, C. Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB 1(1):761-772, 2008.
  • [SP10] Y. Stavrakas, G. Papastefanatos. Supporting Complex Changes in Evolving Interrelated Web Databanks. CoopIS-10, 2010.
  • [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
  • This is the example that I will use for this part of the talk.
    The question is what changed between V1 and V2.
  • Explain the 4 changes and how a human would understand them.
    Now let’s compare the low-level delta (produced by set difference) and the high-level delta (produced by the visual analysis).
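The low-level delta produced by set difference can be sketched in a few lines. The two versions below are made-up triples used only for illustration (they are not the V1 and V2 from the slides), and `low_level_delta` is a hypothetical helper name.

```python
def low_level_delta(v1, v2):
    """Return (added, deleted) triples between two versions of a dataset."""
    added = v2 - v1      # triples present only in the new version
    deleted = v1 - v2    # triples present only in the old version
    return added, deleted


# Hypothetical versions: the class "Student" is renamed to "Pupil".
v1 = {
    ("Student", "rdf:type", "rdfs:Class"),
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdf:type", "rdfs:Class"),
}
v2 = {
    ("Pupil", "rdf:type", "rdfs:Class"),
    ("Pupil", "rdfs:subClassOf", "Person"),
    ("Person", "rdf:type", "rdfs:Class"),
}

added, deleted = low_level_delta(v1, v2)
```

In this made-up case the low-level delta reports two added and two deleted triples, whereas a human reader would describe the same difference as a single rename. That gap is what a high-level change language aims to close.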
  • Low-level changes are bundled together to form high-level ones.
  • The superiority of high-level languages comes with a price.
  • CIDOC: ontology for cultural heritage
    GO: Gene ontology (gene products and their functions)
    The CIDOC ontology is manually and carefully curated, yet some changes were still not logged.
  • [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
  • [HHP+10] A. Hogan, A. Harth, A. Passant, S. Decker, A. Polleres. Weaving the Pedantic Web. LDOW-10, 2010.
  • First set is generic causes.
    Second set is LOD-specific.
  • [HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.
  • The PROV Data Model, PROV-DM, is a conceptual data model for provenance.
    There are four kinds of constraints in PROV-DM: uniqueness constraints, event ordering constraints, impossibility constraints, and type constraints.
    In LOD constraints on properties are very popular.
    [TSBM10] J. Tao, E. Sirin, J. Bao, D.L. McGuinness, Integrity Constraints in OWL. AAAI-10, 2010.
    [MHS09] B. Motik, I. Horrocks, U. Sattler. Bridging the Gap Between OWL and Relational Databases. Journal of Web Semantics, 7(2):74-89, 2009.
  • [Jur74] J.M. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
    [BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1–10, 2009.
    [FH10] C. Furber, M. Hepp. Using Semantic Web Resources for Data Quality Management. EKAW-10, 2010.
  • The user can choose from several aggregation strategies, e.g. ANY, ALL, BEST, CONCAT, LATEST, AVG, MEDIAN, MIN (instead of dropping one of the two values).
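A few of the aggregation strategies named above can be sketched as plain functions over the list of conflicting values. The semantics given to each name here are assumptions made for illustration and may differ from the tool the slide refers to. BEST and LATEST are left out because they need quality scores or timestamps that this sketch does not model.

```python
from statistics import mean, median

# Hypothetical semantics for some of the strategy names on the slide.
STRATEGIES = {
    "ANY": lambda values: values[0],                            # keep one value
    "ALL": lambda values: list(values),                         # keep every value
    "CONCAT": lambda values: ", ".join(str(v) for v in values),
    "AVG": mean,
    "MEDIAN": median,
    "MIN": min,
}


def fuse(values, strategy):
    """Resolve conflicting values for one property with the chosen strategy."""
    return STRATEGIES[strategy](values)


populations = [10, 20, 30]  # made-up conflicting values from three sources
```

For example, `fuse(populations, "MIN")` keeps the smallest reported value, while `fuse(populations, "ALL")` keeps all three instead of dropping any.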
  • [KHS12] M. Knuth, J. Hercher, H. Sack. Collaboratively Patching Linked Data. USEWOD-12, 2012.
  • Manual approaches: user manually resolves
    Semi-automatic: based on user input, possibly interactively
    Automatic: either ad-hoc, or based on user input provided as input to the process
  • [SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.
    [MLBP06] T. Meyer, K. Lee, R. Booth, J.Z. Pan. Finding Maximally Satisfiable Terminologies for the Description Logic ALC. AAAI-06, 2006.
    [PT06] P. Plessers, O. de Troyer. Resolving Inconsistencies in Evolving Ontologies. ESWC-06, 2006.
    [WHR+05] H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg. Debugging OWL-DL Ontologies: A Heuristic Approach. ISWC-05, 2005.
    [KPS+06] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. ESWC-06, 2006.
    [LPSV06] S.C. Lam, J. Pan, D. Sleeman, W. Vasconcelos. A Fine-grained Approach to Resolving Unsatisfiable Ontologies. WI-06, 2006.
    [MWK00] P. Mitra, G. Wiederhold, M.L. Kersten. A Graph-oriented Model for Articulation of Ontology Interdependencies. EDBT-00, 2000.
    [NM00] N.F. Noy, M.A. Musen. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. In AAAI/IAAI-00, 2000.
    [MFRW00] D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. KR-00, 2000.
    [LB10] J. Lehmann, L. Buhmann. ORE - A Tool for Repairing and Enriching Knowledge Bases. ISWC-10, 2010.
    [QP07] G. Qi, J. Pan. A Stratification-based Approach for Inconsistency Handling in Description Logics. IWOD-07, 2007.
    [MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.
  • Pretty much the same as debugging (recall slide 79).
  • In principle, debugging and repair use similar intuitions.
    But they are still not the same (custom constraints are not the same as inconsistency/incoherency).
    [Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms. Springer, 2004.
    [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
    [FRPV+12] G. Flouris, Y. Roussakis, M. Poveda-Villalon, P.N. Mendes, I. Fundulaki. Using Provenance for Quality Assessment and Repair in Linked Open Data. EvoDyn-12, 2012.
    [MMB12] P. Mendes, H. Muhleisen, C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. LWDM-12, 2012.
  • [BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1–10, 2009.
  • [RH09] T. Ravn, M. Hoedbolt. How to Measure and Monitor the Quality of Master Data. 2009. Available at: http://www.information-management.com/issues/2007_58/master_data_management_mdm_quality-10015358-1.html
  • [ADA98] M.L. Abate, K.V. Diegert, H.W. Allen. A Hierarchical Approach to Improving Data Quality. Data Quality Journal, 4(1), 1998.
  • [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
  • We create one table (predicate) for each “useful” relation between resources (e.g., subsumption, instantiation, etc).
    We also have one “spo” table for the remaining triples (basically spo for p a custom property).
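The table layout described in this note can be sketched as a partition of the triples. The table names `subclass_of` and `type_of`, and the choice of which relations count as "useful", are assumptions made for this illustration.

```python
# Hypothetical mapping from "useful" RDF/S properties to dedicated tables.
RELATION_TABLES = {
    "rdfs:subClassOf": "subclass_of",   # subsumption
    "rdf:type": "type_of",              # instantiation
}


def partition(triples):
    """Split triples into one table per useful relation plus a generic spo table."""
    tables = {name: set() for name in RELATION_TABLES.values()}
    tables["spo"] = set()
    for s, p, o in triples:
        if p in RELATION_TABLES:
            tables[RELATION_TABLES[p]].add((s, o))   # the property is implied by the table
        else:
            tables["spo"].add((s, p, o))             # custom property, kept as a full triple
    return tables


tables = partition({
    ("Black", "rdfs:subClassOf", "Wooden"),
    ("King", "rdf:type", "Black"),
    ("King", "movesTo", "e2"),
})
```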
  • [Deu09] A. Deutsch. FOL Modeling of Integrity Constraints (Dependencies). Encyclopedia of Database Systems, 2009.
  • Using DEDs and the aforementioned schema, it is easy to express constraints, such as acyclicity etc.
    It should be noted that we chose CWA for the semantics of our constraints.
    Once you have this machinery, it’s easy to do both diagnosis and repair, using syntactical manipulations over the constraints.
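As an illustration of diagnosing one such constraint, the sketch below checks acyclicity of subsumption over the explicit facts only, in line with the closed-world reading mentioned above. It is a plain procedural check and not the DED-based machinery of the cited work; the function name and data are hypothetical.

```python
def classes_on_cycles(subclass_of):
    """Return the classes that violate acyclicity of the subsumption relation."""
    closure = set(subclass_of)
    changed = True
    while changed:                       # naive transitive closure
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # A class that reaches itself lies on a cycle.
    return {a for a, b in closure if a == b}


edges = {("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")}
violations = classes_on_cycles(edges)
```

Each reported class points at a cycle. A repair would then have to remove at least one subsumption edge on that cycle, and which one to remove is a policy decision.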
  • [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
  • [SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.
    [PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.
  • Explain the ontology. Arrows represent subsumptions, dashed arrows represent instantiations. Chess dataset, representing chess pieces. All Black pieces are Wooden etc.
  • Here suppose that you know that the King is Black. As a consequence, you have the inferred information that the King is Wooden.
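The inference in this note (the King is Black, all Black pieces are Wooden, so the King is Wooden) can be reproduced with a small sketch. Only the two facts stated in the notes are encoded. The function name and data layout are assumptions.

```python
def inferred_types(instance, type_of, subclass_of):
    """Return the explicit and inferred classes of an instance."""
    types = {cls for inst, cls in type_of if inst == instance}
    frontier = list(types)
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in types:
                types.add(sup)          # instance of a subclass is an instance of the superclass
                frontier.append(sup)
    return types


type_of = {("King", "Black")}           # explicit: the King is Black
subclass_of = {("Black", "Wooden")}     # all Black pieces are Wooden

king_types = inferred_types("King", type_of, subclass_of)
```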
  • Two main consequences from this analysis. First, the evolution result is not obvious, even in simple cases. Second, you need side-effects.
  • What can we do to avoid this naivety? My proposal is: look at belief change.
    [FH11] E. Ferme, S.O. Hansson. AGM 25 Years: Twenty-five Years of Research in Belief Change. Journal of Philosophical Logic 40:295-331, 2011.
  • [Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.
    [FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.
    [FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.
  • I will present 7 challenges.
  • Challenge #1
  • There are two philosophical viewpoints here.
  • Challenge #2 (related to the first)
  • Challenge #3: it will be explained with an example.
  • I now have to make Black an explicit subclass of Chess_Piece.
  • [KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.
  • Thus, we need to define more than one operation. What operations are needed?
    This is challenge #4.
  • After I add [King rdf:type NotBlack] I have to delete [King rdf:type Black], but this is just a side-effect, not imposed directly by the change itself.
  • Challenge #5: how should I express the change?
    Also, what happens with multiple (bulk) changes?
  • Challenge #6: What are the properties that a “rational” operator should comply with?
  • Adding the first triple satisfies Success.
    Then we need to satisfy Validity.
    Then we must choose one of the options (Minimal Change).
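The three steps can be sketched on the King example from the earlier note, where adding [King rdf:type NotBlack] forces the deletion of [King rdf:type Black]. This is a brute-force illustration only. The disjointness constraint, the function names, and the reading of Minimal Change as "fewest deletions" are all assumptions.

```python
from itertools import combinations

# Hypothetical validity constraint: no instance may belong to both classes.
DISJOINT = {frozenset({"Black", "NotBlack"})}


def is_valid(types):
    by_instance = {}
    for inst, cls in types:
        by_instance.setdefault(inst, set()).add(cls)
    return not any(pair <= classes
                   for classes in by_instance.values()
                   for pair in DISJOINT)


def add_with_side_effects(types, new_fact):
    result = set(types) | {new_fact}               # Success: the new fact is kept
    removable = sorted(result - {new_fact})
    for k in range(len(removable) + 1):            # Minimal Change: fewest deletions first
        for drop in combinations(removable, k):
            candidate = result - set(drop)
            if is_valid(candidate):                # Validity: constraints hold again
                return candidate, set(drop)
    raise ValueError("no valid result")            # not reached for pairwise disjointness


before = {("King", "Black"), ("Queen", "Black")}
after, side_effects = add_with_side_effects(before, ("King", "NotBlack"))
```

Here the only side-effect is the deletion of `("King", "Black")`. With richer constraints several equally small options may exist, and the operator must choose among them.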
  • Challenge #7: the principles do not solve the problem of evolution. Even with the principles, the result of evolution (and its side-effects) is not obvious, especially for more expressive languages.
  • I will start with an overview of some important results from the belief change community, before I go to evolution.
    Sometimes the different sets of proposed postulates are equivalent to each other, or are adaptations for different settings, so they provide a different viewpoint or are intended for different languages.
    [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
    [KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.
    [Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.
    [FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.
    [WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.
    [QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.
    [QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.
  • [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
    [KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.
  • Now, I will start discussing evolution approaches.
  • [MSCK05] M. Magiridou, S. Sahtouris, V. Christophides, M. Koubarakis. RUL: A Declarative Update Language for RDF. ISWC-05, 2005.
    [RHTA10] C. Riess, N. Heino, S. Tramp, S. Auer. EvoPat - Pattern-based Evolution and Refactoring of RDF Knowledge Bases. ISWC-10, 2010.
    [LRV09] U. Lusch, S. Rudolph, D. Vrandecic. Tempus Fugit: Towards an Ontology Update Language. ESWC-09, 2009.
  • [SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.
  • [LM04] K. Lee, T. Meyer. A Classification of Ontology Modification. AI-04, 2004.
    [QD09] G. Qi, J. Du. Model-based Revision Operators for Terminologies in Description Logics. IJCAI-09, 2009.
    [GQW12] S. Gao, G. Qi, H. Wang. A New Operator for ABox Revision in DL-Lite. AAAI-12, 2012.
    [GHV06] C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF Under the Katsuno-Mendelzon Approach. WebDB-06, 2006.
    [GHV11] C. Gutierrez, C. Hurtado, A. Vaisman. RDFS Update: From Theory to Practice. ESWC-11, 2011.
  • [MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.
    [QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.
    [QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.
    [Han94] S.O. Hansson. Kernel Contraction. Journal of Symbolic Logic, 59(3):845-859, 1994.
    [SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.
    [HWK06] C. Halaschek-Wiener, Y. Katz. Belief Base Revision for Expressive Description Logics. OWLED-06, 2006.
    [QHHP08] G. Qi, P. Haase, Z. Huang, J.Z. Pan. A Kernel Revision Operator for Terminologies. DL-08, 2008.
    [RW07] M.M. Ribeiro, R. Wassermann. Base Revision in Description Logics – Preliminary Results. IWOD-07, 2007.
  • The work that basically started the field of belief change. Fundamental work.
    [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
  • [Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.
    [FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.
    [FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.
    [RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.
    [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
    [Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.
  • [FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.
    [FHP+06] G. Flouris, Z. Huang, J.Z. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. AAAI-06, 2006.
    [Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.
    [RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.
  • It is sometimes the case that all operators that satisfy such requirements lead to results in richer formalisms than the one you started from, thus violating the Principle. Implicitly, this was the reason for the negative results presented in the previous slides.
    CGKZ tried to understand this effect and broke the computation into two stages.
    [CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.
  • CGKZ went further to study this issue.
    First, they categorized contraction methods in some general classes.
    [CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.
  • [CKNZ10] D. Calvanese, E. Kharlamov, W. Nutt, D. Zheleznyakov. Evolution of DL-Lite Knowledge Bases. ISWC-10, 2010.
    [LLMW06] H. Liu, C. Lutz, M. Milicic, F. Wolter. Updating Description Logic ABoxes. KR-06, 2006.
    [GLPR07] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On the Approximation of Instance Level Update and Erasure in Description Logics. AAAI-07, 2007.
    [GLPR09] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On Instance-level Update and Erasure in Description Logic Ontologies. Journal of Logic and Computation 19(5):745-770, 2009.
    [WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.
  • [MRF08] M. Moguillansky, N. Rotstein, M. Falappa. A Theoretical Model to Handle Ontology Debugging and Change through Argumentation. IWOD-08, 2008.
    [HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.
  • [KFAC07] G. Konstantinidis, G. Flouris, G. Antoniou, V. Christophides. Ontology Evolution: A Framework and its Application to RDF. SWDB-ODBIS-07, 2007.
    [FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.
    [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
  • The next 18 slides contain the references mentioned in this presentation.
  • Data and Knowledge Evolution

    1. Data and Knowledge Evolution
       Giorgos Flouris, fgeo@ics.forth.gr
       Open Data Tutorials, May 2013
       Slides available at: http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt
    2. World Wide Web
       • WWW (and HTML) focus on human readability
         • Page presentation (fonts, colors, images, …)
         • Human understanding
       • Presentation ≠ semantical content
         • Content is not formally described (for a machine to understand)
       • WWW contains documents, not data
    3. Problems with the Current Web
       • Search and access become difficult
         • Software is ignorant of the semantical content of a web page
       • Keyword search
         • High recall, low precision
       • Terminological issues
         • Synonyms (heart disease = cardiac disease)
         • Hyponyms/hypernyms (parliament members are politicians)
       • Queries on the semantical content cannot be made
         • Fetch articles that support B. Obama’s foreign policy
         • Fetch the home pages of all members of the Greek Parliament
    4. 4. Semantic Web The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [BLHL01] The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries http://www.w3.org/2001/sw/ [Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners http://www.w3.org/2001/sw/ Giorgos Flouris Open Data Tutorials, May 2013 4
    5. Semantic Web in Practice
       • Web of data, rather than documents
         • HTML for presentation
         • Semantical languages for semantical content
         • Readable and understandable by humans and machines
       • Semantic Web languages, protocols, etc.
         • Web page annotation (metadata descriptions etc.)
         • Publication of data on the Internet
         • Efficient communication and manipulation of data over the Internet
       • Different applications
         • Efficient searching
         • Sharing of data (e-science, e-government, remote learning, …)
         • Linked Open Data (more on that later)
    6. Ontologies and Data (Datasets)
       • An ontology is an explicit specification of a shared conceptualization of a domain [Gru93]
         • Precise, logical account of the intended meaning of terms
         • Common (shared) interpretation of terms
         • Formal vocabulary for information exchange (humans/machines)
       • Ontologies (vocabularies) allow the description of data
       • Terminology
         • Ontology = vocabulary = schema
         • Data = instances
         • Dataset = data and the related ontology (i.e., a dataset may contain schema and/or data)
    7. Dataset Dynamics
       • Datasets change constantly
         • World changes (dynamic models)
         • View on the world changes (new knowledge, measurements, etc.)
         • Perspective and usage changes
       • Examples
         • Gene Ontology (information about gene products): daily versions
         • DBpedia: 1.4 updates/second (http://live.dbpedia.org/LiveStats/) [MLA+12]
       • Need methodologies to cope with the problems related to dynamicity
         • Evolution (modify a dataset in response to a change)
         • Versioning (keep track of versions and their relations)
         • Debugging, cleaning, repairing, quality (maintain consistency and quality in a dynamic environment)
         • Change monitoring, detection and propagation (identify changes and use them to synchronize remote datasets)
         • …
    8. Linked (Open) Data
       • Datasets can be interlinked
         • Sharing knowledge
         • Reusing knowledge
         • Modular development
         • Reuse of schemas
       • Linked Open Data (LOD) movement
         • Constantly growing
         • 31 billion triples and 295 datasets as of September 2011
    9. Linked Open Data Cloud Diagram
       [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
    10. Linked Open Data Challenges
        • Both a blessing and a curse
        • Added-value benefits
          • Discovery of unknown correlations, connections, relationships
          • Vast amount of interrelated knowledge
        • No central control, everyone can publish and relate to others
          • Quality of datasets depends on different providers
          • A change in one dataset affects all related ones
        • Several new problems related to dynamics
          • Propagation of changes among interrelated datasets
          • Maintaining the quality of local datasets
          • Co-evolution
    11. Scope: Dynamic Linked Datasets
        [Figure: diagram of Dynamic Datasets and Linked Datasets, with a “You are here” marker]
    12. Purpose of This Talk
        • To survey different research areas related to dynamic LOD
          • Remote Change Management
          • Repair
          • Data and Knowledge Evolution
        • Categorize and classify works in each field
          • Broad but shallow description
          • Several references for more in-depth study
          • No claims of completeness (references are just indicative)
          • Two relevant surveys: [FMK+08, ZAA+13]
        • Emphasis on some related work done in FORTH
        • Will avoid technical discussion
          • References will be given for further details
    13. Defining Remote Change Management
        • Managing the effects of remote changes on interlinked datasets
          • Remote changes have profound effects on local datasets
          • Good practices are important: proper versioning, change logging, adaptation to remote changes, …
          • Attention exploded after the success of the LOD paradigm
        • Related research questions
          • How should I version my data?
          • How can I efficiently monitor changes in my dataset?
          • How can I detect changes in remote datasets?
          • How does the evolution of remote datasets affect my data?
          • How can I efficiently propagate changes from one dataset to another?
    14. Remote Change Management: Visualization
        [Figure: a remote site with dataset versions RD0 and RD1 and a local site with dataset versions LD0 and LD1; labels: Versioning, Change Monitoring, Change Detection, Change Propagation]
    15. Remote Change Management: Structure
        • Three subfields
          • Versioning
          • Change monitoring and detection
          • Change propagation
        • Structure
          • Introduction, definition of subfields
          • Literature review
          • An approach for change detection [PFF+13]
    16. Defining Repair
        • Assessing and improving the quality and the semantical or structural integrity of the data
          • Maintaining consistency, coherency, validity
          • Restoring consistency, coherency, validity, when violated
          • Assessing and improving quality
          • Preserve quality/integrity in the face of remote changes
        • Related research questions
          • How can I preserve the integrity and quality of my data in a dynamic and interlinked environment?
          • How can I guarantee consistency and validity?
          • How can I restore consistency and validity, if violated?
    17. Repair: Visualization
        [Figure: datasets D0 and D1; Assessment Module (Diagnosis, Quality Assessment); Repair Process (Cleaning, Debugging, Repairing, Quality Enhancement)]
    18. Repair: Structure
        - Four subfields
            - Cleaning
            - Debugging
            - Validity repair
            - Quality enhancement
        - Structure
            - Introduction, definition of subfields
            - Literature review
            - An approach for validity repair [RFC11]
    19. Defining Evolution
        - Modifying a dataset in response to a change in the domain or its conceptualization
            - Identify the result of applying new information on the dataset
            - Determine the result of change propagation from remote datasets
            - Understand the process of change
        - Related research questions
            - What is the semantics of evolution and change?
            - How can I efficiently compute the ideal evolution result?
    20. Evolution: Visualization
        [Diagram: the Real World and the dataset D0 feed an Evolution Algorithm that produces D1; example changes: Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]
    21. Evolution: Summary
        - Evolution topics
            - Understanding the evolution challenges
            - Understanding the process of change
                - Balancing between philosophical and practical considerations
            - Cross-fertilization with belief change
        - Structure
            - Introduction, connection with belief change
            - Understanding the process of change
            - Literature review
    22. General Structure of this Talk
        - Part I (2 hours)
            A. Introduction to RDF/S, DLs, OWL
            B. Remote change management
                1. Introduction, definition of subfields
                2. Literature review
                3. An approach for change detection [PFF+13]
            C. Repair
                1. Introduction, definition of subfields
                2. Literature review
                3. An approach for validity repair [RFC11]
        - Part II (1 hour)
            D. Data and Knowledge Evolution
                1. Introduction, connection with belief change
                2. Understanding the process of change
                3. Literature review
        - The final few slides contain citations for the references in this talk
    23. Talk Structure (A)
        - A. Introduction to RDF/S, DLs, OWL
    24. Datasets
        - Basic structures
            - Classes (or concepts): collections of objects (e.g., Actor, Politician)
            - Properties (or roles): binary relationships between objects (e.g., started_on, member_of)
            - Instances (or individuals): objects (e.g., Giorgos, B. Obama)
        - Relations between them
            - Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …
            - The allowed relations and their semantics depend on the language
        - Different representation languages for LOD
            - RDF/S, OWL
    25. Visualization, Triples, Serialization
        - Visualization
            [Diagram: graph with classes Period, Event, Onset, Birth, Actor, Existing, Stuff; properties participants and started_on; individuals Giorgos and G_Birth; legend: instantiation, subsumption]
        - Triple representation
            - Define classes: [Period type Class]
            - Define properties: [participants type Property], [participants domain Onset], [participants range Actor]
            - Instantiate/define individuals: [G_Birth type Birth], [Giorgos type Actor], [G_Birth participants Giorgos]
            - Define hierarchies: [Event subClass Period]
        - Serialization (RDF/XML)
            <rdfs:Class rdf:ID="Period">
            </rdfs:Class>
            <rdf:Property rdf:ID="participants">
              <rdfs:domain rdf:resource="#Onset"/>
              <rdfs:range rdf:resource="#Actor"/>
            </rdf:Property>
            <Birth rdf:ID="G_Birth">
              <participants>
                <Actor rdf:ID="Giorgos"/>
              </participants>
            </Birth>
            <rdfs:Class rdf:ID="Event">
              <rdfs:subClassOf rdf:resource="#Period"/>
            </rdfs:Class>
    26. RDF and RDFS
        - An RDF dataset consists of triples
        - RDFS adds semantics
            - Subsumption hierarchies (classes and properties)
                - Transitive
            - Instantiation
                - Inheritance, implicit instantiation
        - Sometimes more than subsumption/instantiation is needed
            - Combining concepts, roles to form more complex relations
                - Concept definitions: a mother is a female who has a child
                - Other knowledge: all items stored in warehouse X are flammable
            - Constraints on data
                - Each person must have one mother
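The two RDFS inferences named on this slide (transitive subsumption and implicit instantiation) can be sketched in a few lines. This is only an illustration under stated assumptions: triples are plain Python tuples, the predicate names `type` and `subClass` follow the informal notation of the previous slide, and the function name `rdfs_closure` is hypothetical. It does not cover the rest of the RDFS entailment rules.

```python
def rdfs_closure(triples):
    """Return the input triples plus those implied by two RDFS-style rules.

    Assumption: triples are (subject, predicate, object) tuples that use the
    informal predicate names "type" and "subClass".
    """
    closed = set(triples)
    while True:
        new = set()
        for (a, p, b) in closed:
            if p != "subClass":
                continue
            for (s, q, o) in closed:
                if q == "subClass" and s == b:
                    # Rule 1: a subClass b, b subClass o  =>  a subClass o
                    new.add((a, "subClass", o))
                elif q == "type" and o == a:
                    # Rule 2: s type a, a subClass b  =>  s type b
                    new.add((s, "type", b))
        if new <= closed:
            return closed  # fixpoint reached, nothing left to infer
        closed |= new


sample = {
    ("Event", "subClass", "Period"),
    ("Onset", "subClass", "Event"),
    ("Birth", "subClass", "Onset"),
    ("G_Birth", "type", "Birth"),
}
inferred = rdfs_closure(sample)
```

With the sample hierarchy, `inferred` additionally contains, for instance, `("Birth", "subClass", "Period")` and `("G_Birth", "type", "Period")`.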
    27. Extensions of RDF/S: DLs (1/2)
        - Description Logics (DLs), http://dl.kr.org/
            - Formal underpinning of web representation languages
            - Family of logical formalisms
                - Well-defined semantics
                - Model-theoretic reasoning based on interpretations
            - Formally studied
                - Expressiveness, reasoning tools, computational complexity, …
        - Components
            - Individuals: specific objects (instances), e.g., Giorgos
            - Concepts: sets of individuals (classes), e.g., Parent
            - Roles: sets of pairs of individuals (properties), e.g., has_child
            - Operators: ¬, ⊓, ∃, {.}, ⊤, …
            - Connectives: ⊑, ≡, …
    28. Extensions of RDF/S: DLs (2/2)
        - Definitions, partial definitions, constraints, subsumptions, …
            - A mother is a female who has a child
                - Mother ≡ ∃has_child ⊓ Female
            - Each person must have one mother
                - Person ⊑ ∃has_child⁻¹.Mother (has_child⁻¹ is the inverse of the role has_child)
        - A great variety of DLs (trade-off involved)
            - Different properties
            - Different expressive power
            - Different reasoning complexity
    29. Extensions of RDF/S: OWL
        - OWL (Web Ontology Language), http://www.w3.org/2004/OWL/
            - General-purpose representation language
            - Compatible with the architecture of the Semantic Web
        - A family of languages
            - Flavors: OWL-Lite, OWL-DL, OWL Full
            - Profiles: OWL 2 EL, OWL 2 QL, OWL 2 RL
            - Different expressiveness (and complexity)
            - Each corresponds to a specific DL
        - Useful from a modeling perspective
            - Expressive but not too complex
            - Appealing computationally
    30. Representation Languages in LOD
        - Mostly RDF
            - With RDFS semantics
                - Instantiations
                - Class subsumption
                - Property subsumption is rare
        - Some OWL
            - Mostly OWL Lite
            - Extensive use of owl:sameAs
                - Often abusing it [HHM+10]
            - OWL 2 profiles are gaining ground
    31. Talk Structure (B1)
        - B. Remote change management: 1. Introduction, definition of subfields
    32. Motivation for Remote Change Management
        - Crucial problem for dynamic linked datasets
            - Linking: datasets linked to other datasets (e.g., vocabularies)
            - Dynamics: changes cause problems to linked datasets
            - No central curation or control
                - No control over (or knowledge of) other datasets’ evolution process
            - Curators don’t bother annotating and logging changes
                - Temporal and versioning information is usually missing [RPH+12]
        - Remote change management seeks solutions to allow:
            - Keeping track of versions
            - Restoring previous versions
            - Assessing compatibility of versions
            - Monitoring and detecting changes
            - Tracing back the evolution history (of datasets, concepts, …)
                - For visualization and understanding
            - Propagating changes to synchronize linked datasets
        [Diagram: datasets DR and DL connected by a "uses" link]
    33. Subfields of Remote Change Management
        - Versioning
            - Keep track of versions
        - Change monitoring and detection
            - Monitoring: record changes as they happen
            - Detection: identify changes after they happen
        - Change propagation
            - Propagate changes across linked datasets for synchronization purposes
    34. Versioning
        - Versioning
            - Keep track of versions
            - Identify different versions of a dataset
            - Enable transparent access to the “correct” version (smooth interoperation)
        - Issues involved
            - Identification
                - Determine which versions to store and how to identify them
                - Manually or automatically (syntactic, semantic considerations)
                - Packaging of changes
            - Relation between versions
                - A sequence or a tree
            - Compatibility information
                - Backwards/forwards compatibility and how to determine it (often manually)
                - Dataset-wide compatibility or fine-grained compatibility (e.g., at resource level)
                - Metadata on the different versions
            - Transparent access
                - Relate versions with (compatible) data sources, applications, etc.
    35. Change Monitoring and Detection
        - Change monitoring
            - Record changes as they happen
                - Manual (error-prone and often incorrect)
                - Automatic (not used in practice)
            - Depends on the goodwill of the dataset owner
            - Sometimes change logs are inaccessible
        - Change detection
            - Identify changes after they happen
            - Based on the previous and current versions
        - In both cases, a change language is required
            - Supported set of changes, along with their semantics
            - Can be low-level or high-level
        [Diagram: datasets DR and DL connected by a "uses" link]
    36. Change Propagation
        - Change propagation
            - Communicate changes to linked datasets for synchronization
        - Push-based or pull-based propagation
            - Push-based: locally-initiated, via “registration” or via monitoring and versioning
            - Pull-based: consumer-initiated
        - Communication based on deltas (rather than versions)
            - Reduce communication overhead
            - Reduce storage requirements
            - On average, 2-3% of a dataset changes between versions [OK02]
            - Deltas are based on a language of changes
    37. Talk Structure (B2)
        - B. Remote change management: 2. Literature review
    38. Versioning Approaches (1/3)
        - Capture different aspects of versioning, such as:
            - Detecting versions
            - Storing versions efficiently
            - Allowing cross-snapshot queries
                - Find gene products whose functions have not changed in the last 50 versions
                - Determine price fluctuation for x along different versions of the product catalog
        - Early versioning approaches inspired by SVN
            - Good for files, not directly adaptable to semantic languages
        - SHOE language [HH00]
            - Machine-readable version information (e.g., compatibility)
            - Provided by curator as SHOE statements
        - Memento [SSN+10]
            - Fine-grained versioning at URI level (resources, web pages)
            - Machine-readable version information, in the HTTP header
                - Timestamps, traversal information (prior/current versions), etc.
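A cross-snapshot query such as "find gene products whose functions have not changed in the last 50 versions" can be illustrated with a naive sketch. It assumes every version is fully materialized as a set of `(subject, predicate, object)` tuples, which is not how the cited systems store versions; the function name, the predicate `function` and the sample identifiers are all hypothetical.

```python
def unchanged_subjects(versions, predicate, last_n):
    """Subjects whose values for `predicate` are identical in each of the
    last `last_n` versions (versions are ordered oldest to newest)."""
    if last_n < 1:
        raise ValueError("last_n must be at least 1")
    recent = versions[-last_n:]
    if not recent:
        return set()

    def values_by_subject(version):
        # Map each subject to the set of objects it has for `predicate`.
        result = {}
        for (s, p, o) in version:
            if p == predicate:
                result.setdefault(s, set()).add(o)
        return result

    per_version = [values_by_subject(v) for v in recent]
    first = per_version[0]
    return {
        s for s, objs in first.items()
        if all(snapshot.get(s) == objs for snapshot in per_version)
    }


history = [
    {("g1", "function", "binding"), ("g2", "function", "transport")},
    {("g1", "function", "binding"), ("g2", "function", "catalysis")},
]
stable = unchanged_subjects(history, "function", last_n=2)
```

Here only `g1` keeps the same function across both snapshots. Scanning every snapshot is linear in the total number of stored triples, which is one reason the approaches on these slides look at temporal indexing and delta-based storage instead.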
    39. Versioning Approaches (2/3)
        - Theoretical foundations for versioning [HP04]
            - Formal definitions to capture notions such as:
                - Compatibility (between versions)
                - Commitment (resources committing to a certain ontology)
                - Ontology perspectives (the part of the web committing to an ontology)
        - Temporal approaches [HS05, PTC05, KLGE07]
            - For capturing temporal relations between versions
            - For allowing cross-snapshot queries
        - Versioning in multi-editor environments [RSDT08]
            - Via change monitoring
    40. Versioning Approaches (3/3)
        - Automatically detecting version relationships [AAM09]
            - Using heuristics based on URIs
        - Study of “relatedness” between versions [CQ13]
            - A model of “relatedness” between vocabularies from various sources
            - Similar to links in web pages
        - POI: Partial Order Index [TTA08]
            - Efficient method for storing versions and their differences
            - Stores several versions, exploiting their common triples for efficient storage
    41. Change Languages (1/2)
        - Change languages are necessary for monitoring, detection, propagation
        - Granularity
            - Low-level (or atomic, or elementary)
                - Simple add/remove operations
                - Add(s,p,o), Delete(s,p,o)
                - Simple to detect and define
                - Focus on machine-readability: determinism, well-defined semantics
            - High-level (or complex, or composite)
                - More coarse-grained, compact, closer to editor’s perception and intuition
                - Generalize_Domain(P,A), Delete_Class(A)
                - More interesting; harder to detect and define
                - Focus on human-understandability: often unclear and/or informal semantics
    42. Change Languages (2/2)
        - Many different high-level languages (no standard)
            - [HGR12, JAP09, PFF+13, SK03, AH06, DA09, PTC07, …]
        - Some are domain-specific (e.g., [HGR12])
        - Some are dynamic (e.g., [AH06, DA09, PTC07])
            - Allow custom, user-defined changes
        - Some allow terminological changes (e.g., [PFF+13])
            - Rename, merge, split
            - Common, but tough to detect (easily confused with add/delete)
    43. Representation Issues
        - Deltas are just sets of changes from the change language
        - Changes usually represented using a change ontology
            - Ontology represents changes
            - A specific change is an instance of such an ontology
            - Deltas associated with sets of such instances
            - Different proposals [NCLM06, KFKO02, KN03, PT05]
        - Allows the manipulation and communication of deltas/changes using standard Semantic Web technologies
    44. Change Monitoring Approaches
        - Using a version log [PT05]
            - Logging actions on the dataset
            - Use it for change detection, as well as proper versioning
            - Good quality, high-level change monitoring
            - Based on a dynamic language of changes
        - Using migration specifications [ZZL+03]
            - Similar to logs, but with a more formal structure
        - DBpedia change monitoring [MLA+12]
            - http://live.dbpedia.org/
            - Live versions, as opposed to “standard” versions
    45. Low-Level Change Detection (1/2)
        - SemVersion [VWS+05]
            - Developed in Karlsruhe (FZI, AIFB)
            - Low-level change detection tool for RDF
            - Also provides versioning functionalities
            - Allows cross-snapshot queries
        - For RDF [ILK12]
            - Low-level change detection based on set difference
            - Aggregating and compressing deltas
            - Also dealing with versioning issues
        - For RDF/S [ZTC11]
            - Takes into account semantics (RDFS inference)
            - Four different methods to compute deltas (all based on set difference)
            - Formal analysis of these methods’ properties and semantics
            - Extension: effect of blank nodes on change detection [TLZ12]
    46. Low-Level Change Detection (2/2)
        - Bubastis (http://www.ebi.ac.uk/fgpt/sw/bubastis/index.html)
            - Simple diff tool (triple-based comparison)
            - Basically RDF, but also supports OWL
        - For DL-Lite [KWZ08]
            - Formal, semantic approach
        - For EL [KWW08]
            - Uses a concept-based description of changes
        - For propositional knowledge bases [FMV10]
            - Propositional, but generic; it can be applied to DLs
            - Formal analysis of the problem
            - Also dealing with propagation semantics
    47. High-Level Change Detection (1/2)
        - For OWL: PromptDiff [NKKM04], OntoView [KFKO02]
            - Employ heuristics and probabilistic methods
            - Evaluation using precision/recall metrics against a gold standard
            - Integrated into tools that also provide versioning functionalities
        - For RDF/S [PFF+13]
            - Dealing with both machine-readability and human-understandability
            - Also dealing with propagation (applying changes)
            - To be discussed in detail later
        - COnto-Diff [HGR12]
            - Rule-based approach
            - Also dealing with propagation
    48. Change Propagation Approaches
        - Usually part of other tools [SMMS02, MMS+03]
            - Versioning, monitoring tools (push-based propagation)
            - Detection tools (pull-based propagation)
            - Evolution and repair tools (pull-based propagation)
                - Adapt your data to be “compatible” with the new remote version
        - SparqlPush [PM10]
            - Push-based propagation of changes on SPARQL “views”
        - PRISM, PRISM++ [CMZ08, CMDZ10]
            - High-level language of schema changes for relational data
                - Also supports changes on the integrity constraints
            - Identifies and propagates the changes required in the data to abide by the new schema
            - Query and update rewriting
                - For applications that try to access the old schema
    49. Other Change Management Approaches
        - Complete approach for XML [SP10]
            - Representing changes inline with the data using a graph (“evograph”)
            - Supports different change representation languages (both low-level and high-level)
            - Timestamps changes
            - Monitoring: evograph can be used to log the changes
            - Propagation: changes can be accessed and propagated
            - Versioning: timestamps in changes can be used to generate snapshots (versions) at different times
            - Allows cross-snapshot queries
            - Fairly generic, can be adapted for RDF
    50. Talk Structure (B3)
        - B. Remote change management: 3. An approach for change detection [PFF+13]
    51. Our Approach on Change Detection
        - Purpose of this work: change detection [PFF+13]
            - Detect a posteriori the differences (delta or diff) between versions in a concise, intuitive and correct way
        - Main design choices
            - Change detection based on a general-purpose high-level language
            - Human-understandable, but also machine-readable
            - Clear, formal semantics
            - Provable formal properties and functionality guarantees
            - Detection and application (propagation) semantics
        [Diagram: chain of versions V1 to V5, with deltas C1 to C4 between consecutive versions]
    52. Sample Evolution
        [Diagram: Version 1 (V1) evolves into Version 2 (V2). V1 shows classes Period, Event, Onset, Birth, Actor, Existing, Stuff; V2 shows classes Event, Onset, Birth, Actor, Persistent, Stuff. Both show properties participants and started_on and individuals Giorgos and G_Birth. Legend: instantiation, subsumption]
    53. Analyzing the Evolution (Using Triples)
        - Triples in V1 (partial list)
            [Event type Class]
            [Period type Class]
            [Event subclass Period]
            [participants type Property]
            [participants domain Onset]
            [participants range Actor]
            [Giorgos type Actor]
            [Existing type Class]
            [Stuff subclass Existing]
            [started_on domain Existing]
            [Onset subclass Event]
            [Birth subclass Onset]
            …
        - Triples in V2 (partial list)
            [Event type Class]
            [participants type Property]
            [participants domain Event]
            [participants range Actor]
            [Giorgos type Actor]
            [Persistent type Class]
            [Stuff subclass Persistent]
            [started_on domain Persistent]
            [Onset subclass Event]
            [Birth subclass Event]
            …
    54. Low-Level Delta
        - Triples in V2 but not in V1 (added triples)
            [participants domain Event]
            [Persistent type Class]
            [Stuff subclass Persistent]
            [started_on domain Persistent]
            [Birth subclass Event]
        - Triples in V1 but not in V2 (deleted triples)
            [Period type Class]
            [Event subclass Period]
            [participants domain Onset]
            [Existing type Class]
            [Stuff subclass Existing]
            [started_on domain Existing]
            [Birth subclass Onset]
        - Low-level delta
            Add([participants domain Event])
            Add([Persistent type Class])
            …
            Del([Period type Class])
            …
        [Diagram: the V1 and V2 graphs of the sample evolution]
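The low-level delta on this slide is plain set difference, which a short sketch can reproduce on the sample. The assumptions are that triples are Python tuples and that the partial triple lists from the previous slide are the complete versions, with the domain triple of V2 written as `participants domain Event`, as in the delta on the "Comparing the Deltas" slide. The function name `low_level_delta` is hypothetical.

```python
def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2, and triples only in v1."""
    return v2 - v1, v1 - v2


V1 = {
    ("Event", "type", "Class"),
    ("Period", "type", "Class"),
    ("Event", "subclass", "Period"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Event"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Event"),
}

added, deleted = low_level_delta(V1, V2)
for triple in sorted(added):
    print("Add", triple)
for triple in sorted(deleted):
    print("Del", triple)
```

On this input the sketch yields the five added and seven deleted triples listed on the slide.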
    55. Analyzing the Evolution (Visually)
        [Diagram: the V1 and V2 graphs of the sample evolution]
        - High-level delta
            Generalize_Domain(participants, Onset, Event)
            Pull_Up_Class(Birth, Onset, Event)
            Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
            Rename_Class(Existing, Persistent)
    56. Comparing the Deltas
        [Diagram: the V1 and V2 graphs of the sample evolution]
        - Low-level delta
            Del([participants domain Onset])
            Add([participants domain Event])
            Del([Period type Class])
            Del([Event subclass Period])
            Del([Birth subclass Onset])
            Add([Birth subclass Event])
        - High-level delta
            Generalize_Domain(participants, Onset, Event)
            Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
            Pull_Up_Class(Birth, Onset, Event)
    57. Associations (Partitioning)
        Each high-level change is listed with its associated low-level changes:
        - Generalize_Domain(participants, Onset, Event)
            Del([participants domain Onset])
            Add([participants domain Event])
        - Pull_Up_Class(Birth, Onset, Event)
            Del([Birth subclass Onset])
            Add([Birth subclass Event])
        - Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
            Del([Period type Class])
            Del([Event subclass Period])
        - Rename_Class(Existing, Persistent)
            Del([Existing type Class])
            Del([Stuff subclass Existing])
            Del([started_on domain Existing])
            Add([Persistent type Class])
            Add([Stuff subclass Persistent])
            Add([started_on domain Persistent])
    58. Challenges for High-Level Languages
        - High-level deltas are superior
            - More concise (e.g., Rename_Class)
            - More intuitive (e.g., Pull_Up_Class)
            - Carry additional information (e.g., Generalize_Domain)
        - Challenges for high-level languages
            - Must be deterministic (exactly one high-level delta)
            - Must be fine-grained enough to capture subtle changes
            - Must be coarse-grained enough to be concise
            - Must be intuitive and close to editor’s perception of the changes
            - Compatible detection and application algorithms
                - Intuitive results
                - Efficient
    59. Proposed Language L
        - The formal definition of a change consists of:
            - Changes required in the low-level delta (added/deleted triples)
            - Conditions that should hold in V1 and/or V2
        - Generalize_Domain(P, X, Y)
            - Required low-level changes: Del([P domain X]), Add([P domain Y])
            - Conditions:
                - P existing property in both V1, V2
                - X, Y existing classes in both V1, V2
                - X subclass of Y in both V1, V2
        - Generalize_Domain(participants, Onset, Event): detectable
        - Similarly for the other changes in L (132 high-level ones)
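The definition of Generalize_Domain(P, X, Y) above can be turned into a small detection sketch: look for a deleted and an added `domain` triple on the same property, then check the listed conditions in both versions. This is a simplified illustration, not the detection algorithm of [PFF+13]; it assumes tuple triples with the informal predicates `type`, `subclass` and `domain`, checks subsumption by following explicit `subclass` triples only, and uses hypothetical function names.

```python
def is_subclass(version, sub, sup):
    """True if `sup` is reachable from `sub` via one or more subclass triples."""
    frontier, seen = [sub], set()
    while frontier:
        current = frontier.pop()
        if current in seen:
            continue
        seen.add(current)
        for (s, p, o) in version:
            if s == current and p == "subclass":
                if o == sup:
                    return True
                frontier.append(o)
    return False


def detect_generalize_domain(v1, v2):
    """Return every Generalize_Domain(P, X, Y) supported by the versions."""
    added, deleted = v2 - v1, v1 - v2
    found = []
    for (p, pred, x) in deleted:
        if pred != "domain":
            continue
        for (p2, pred2, y) in added:
            if pred2 != "domain" or p2 != p:
                continue
            # The conditions must hold in both V1 and V2.
            conditions_hold = all(
                (p, "type", "Property") in v
                and (x, "type", "Class") in v
                and (y, "type", "Class") in v
                and is_subclass(v, x, y)
                for v in (v1, v2)
            )
            if conditions_hold:
                found.append(("Generalize_Domain", p, x, y))
    return found


# Hypothetical minimal versions; the type triples for Onset and Event are
# added here because the conditions need them.
common = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
before = common | {("participants", "domain", "Onset")}
after = common | {("participants", "domain", "Event")}
changes = detect_generalize_domain(before, after)
```

Swapping the two versions detects nothing, because Event is not a subclass of Onset; that reverse evolution would be a different change in the language.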
    60. Types and Number of Defined Changes
        - Changes (134)
            - Low-Level (2)
                - Add, Del
            - High-Level (132)
                - Basic (54), e.g., Delete_Subclass, Delete_Domain
                - Composite (51), e.g., Pull_Up_Class, Change_Domain
                - Heuristic (27), e.g., Rename_Class, Split_Class
    61. Results on L: Granularity
        - Granularity problem: solved by defining levels of changes
            - Basic Changes: fine-grained, roughly correspond to low-level
            - Composite Changes: coarse-grained, group several basic changes together
            - Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split, etc.; require mappings between URIs
        - Problems with determinism
            - One evolution could correspond to different sets of basic/composite changes
            - Priorities in detection: Heuristic → Composite → Basic
    62. Results on L: Determinism
        - Each low-level change is associated with exactly one detectable high-level change
            - Full partitioning of low-level changes into high-level ones
        - Each pair of versions (V1, V2) is associated with:
            - Exactly one low-level delta
            - Exactly one high-level delta
        - Determinism is necessary
            - More than one delta would lead to ambiguities
            - Fewer than one would make some inputs (V1, V2) irresolvable
    63. Results on L: Propagation
        [Diagram: the V1 and V2 graphs of the sample evolution, with arrows labeled "Detect C", "Apply C" and "Apply C⁻¹" between them]
    64. Results on L: Deltas Keep Version History
        - Can reproduce all versions as long as you keep (any) one version and the deltas
        - Deltas are more concise than the versions themselves
            - Storage and communication efficiency
        [Diagram: chain of versions V1 to V5, with deltas C1 to C4 between consecutive versions]
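The claim that any single stored version plus the deltas reproduces the whole history can be sketched with low-level deltas. The sketch assumes each delta is an exact `(added, deleted)` set difference between consecutive versions, so that swapping the two components inverts it; the high-level deltas of the talk carry more structure than this. All function names are hypothetical, and versions are indexed from 0.

```python
def delta(old, new):
    """Low-level delta from `old` to `new` as (added, deleted)."""
    return new - old, old - new


def apply_delta(version, change):
    """Apply a delta: remove the deleted triples, then add the added ones."""
    added, deleted = change
    return (version - deleted) | added


def invert(change):
    """Inverse delta: what was added is deleted, and vice versa."""
    added, deleted = change
    return deleted, added


def reconstruct(kept_index, kept_version, deltas):
    """Rebuild all versions from one kept version; deltas[i] maps Vi to Vi+1."""
    versions = {kept_index: kept_version}
    for i in range(kept_index, len(deltas)):           # walk forwards
        versions[i + 1] = apply_delta(versions[i], deltas[i])
    for i in range(kept_index - 1, -1, -1):            # walk backwards
        versions[i] = apply_delta(versions[i + 1], invert(deltas[i]))
    return [versions[i] for i in range(len(deltas) + 1)]


original = [
    {("Stuff", "subclass", "Existing")},
    {("Stuff", "subclass", "Persistent")},
    {("Stuff", "subclass", "Persistent"), ("Birth", "subclass", "Event")},
]
deltas = [delta(original[i], original[i + 1]) for i in range(len(original) - 1)]
rebuilt = reconstruct(1, original[1], deltas)
```

Keeping only the middle version and the two deltas is enough here: `rebuilt` equals `original`.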
    65. Change Detection: Evaluation
        - Detection and application algorithms implemented for evaluation
        - Performance
            - Complexity: O(max{N1,N2,N2})
            - Performance depends on the detected changes (type, number)
            - Bottleneck: calculating the low-level delta (>80% of total time)
        - Intuitiveness
            - Changes in our language are used in practice
            - Results confirmed by literature/editor notes (CIDOC, GO)
            - Better than CIDOC’s manually recorded changes (18 changes missed)
        - Conciseness
            - Basic ≈ Low-Level
            - Basic + Composite + Heuristic << Low-Level
            - Up to 80% reduction, depending on the case
    66. Summary and Conclusions: RCM
        - Remote change management is at the heart of LOD
            - Uncontrolled character of LOD makes it critical
        - Various related fields
            - Versioning, change monitoring and detection, change propagation
            - Unfortunately, not used in practice in LOD
            - Presented a formal approach for change detection [PFF+13]
        - Other possible directions (related to LOD)
            - Best practices should be studied and promoted
                - Automated versioning and monitoring mechanisms embedded in evolution tools/editors
                - Understand and use temporal and provenance metadata on versions
            - Improved change monitoring and detection
                - A standard change language?
    67. Talk Structure (C1)
        - C. Repair: 1. Introduction, definition of subfields
    68. Motivation for Repair
        - Published data is usually problematic
            - Several different types of problems in LOD [HHP+10]
        - Pedantic web initiative (http://pedantic-web.org/)
            - Advice for data owners on how to prevent common problems in their data
    69. Causes of Data Problems
        - Several reasons for data problems
            - Erroneous data (faulty sensors, human mistakes, etc.)
            - Different symbolisms and terminology
            - Modeling errors (e.g., all birds fly)
            - Requirements (constraints) on the data may change
                - E.g., when applications’ needs change
            - Reuse of data from different providers (no quality guarantees)
            - Quality jeopardized by re-use and open evolution
            - Integration/merging of datasets
    70. Generic Approaches
        - Four ways to deal with problems in data [HHH+05]
            - Prevent it (careful evolution, merging, etc.)
                - Can only prevent problems caused by changes in the local dataset
            - Correct it (repair)
                - Actively address the problem (after it appears)
            - Ignore it (consistent query answering, non-monotonic reasoning)
                - CQA: popular in database community; prevents user from noticing the problem by rewriting queries (common denominator approach)
                - NMR: popular in AI community; avoids trivialization of reasoning (paraconsistent reasoning, defeasible reasoning, default reasoning, …)
            - Use versions (versioning)
                - Make sure you refer to the correct (compatible) version
                - Only when the problem is due to a remote change
    71. Subfields of Repair
        - Cleaning
            - Mainly related to literal quality
            - Terminology, symbols, metric units, etc.
        - Debugging
            - Consistency (at least one model)
            - Coherency (no unsatisfiable classes)
            - Relevant for DL/OWL only
        - Validity repair
            - Satisfaction of custom integrity constraints (e.g., business rules)
            - Expressed in OWL, DL, Datalog or predicate logic
        - Quality enhancement
            - Assessing and improving the quality of data
            - Different dimensions (timeliness, completeness, reputation, …)
    72. Cleaning
        - Literals in LOD are often messy, and have to be “cleaned up”
        - Different formats for names, dates, etc.
            - &gf name “Giorgos Flouris” vs. &gf name “Flouris, Giorgos”
            - &gf birth_date 03/05/76 vs. &gf birth_date 05/03/76
            - &gf birthplace “Hellas” vs. &gf birthplace “Greece”
        - Different symbols
            - Paris land_area 105,4 vs. Paris land_area 105.4
            - Paris population 2.234.105 vs. Paris population 2,234,105
        - Different metric units
            - Paris land_area 105,4 vs. Paris land_area 40,7
            - &x price 30 vs. &x price 39
        - Inconsistent values
            - &x price 0 vs. &x price “free”
        - Data is not in the desired form (data transformation)
            - LIP6 addr “4, P. Jussieu” vs. LIP6 street “P. Jussieu”, LIP6 streetno 4
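Two of the cleaning cases above (differing number symbols and differing name formats) can be sketched as small normalization helpers. Both are illustrations under explicit assumptions: the caller must state which decimal separator a source uses, since a literal such as 105,4 is ambiguous on its own, and names are assumed to be either "First Last" or "Last, First". The function names are hypothetical and unrelated to the tools discussed later.

```python
def normalize_number(literal, decimal_sep):
    """Parse a numeric literal given the source's decimal separator.

    Assumption: the other one of "," and "." is only a thousands separator.
    """
    if decimal_sep not in (",", "."):
        raise ValueError("decimal_sep must be ',' or '.'")
    thousands_sep = "." if decimal_sep == "," else ","
    cleaned = literal.replace(thousands_sep, "").replace(decimal_sep, ".")
    return float(cleaned)


def normalize_name(name):
    """Rewrite "Last, First" as "First Last"; leave other names unchanged."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        return f"{first} {last}"
    return name.strip()


# The slide's pairs, read with the separator convention of each source.
area_a = normalize_number("105,4", decimal_sep=",")
area_b = normalize_number("105.4", decimal_sep=".")
population_a = normalize_number("2.234.105", decimal_sep=",")
population_b = normalize_number("2,234,105", decimal_sep=".")
```

Differences in metric units or terminology (Hellas vs. Greece) need conversion tables or reference vocabularies and are not covered by this sketch.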
    73. Debugging
        - Coherency
            - No unsatisfiable classes
            - Indicates good modeling
        - Consistency
            - At least one model
            - Avoids reasoning triviality
        - Relevant for DL/OWL only
        [Diagram: example with classes Horse, Unicorn, Bird, Penguin and individual Pengo, annotated with the conflicting labels hasHorns/¬hasHorns and canFly/¬canFly]
    74. Validity Repair
        - Validity repair
            - Satisfaction of custom integrity constraints (e.g., business rules)
            - Encode context- or application-specific requirements
                - PROV-DM: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
            - Applications may be useless over invalid data
        - Expressed in OWL, DL, Datalog, Datalog±, predicate logic, …
            - Different expressive power
            - Different semantics (OWA/CWA, UNA) [TSBM10, MHS09]
        - Various types of constraints
            - Functional, inverse functional, transitivity, cardinality constraints
            - Disjointness constraints
            - Primary key, foreign key, inclusion constraints
            - Tuple-generating dependencies (tgd), equality-generating dependencies (egd)
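As an illustration of the simplest constraint type listed above, the following sketch diagnoses violations of functional-property constraints. It assumes tuple triples and treats distinct values as genuinely different (a closed-world reading with the unique name assumption); under OWL's open-world semantics two values of a functional property would instead be inferred to be equal. The function name and the sample data are hypothetical, and deciding which triple to drop is left open.

```python
from collections import defaultdict


def functional_violations(triples, functional_properties):
    """Map (subject, property) to its values wherever a property declared
    functional has more than one value for the same subject."""
    values = defaultdict(set)
    for (s, p, o) in triples:
        if p in functional_properties:
            values[(s, p)].add(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}


data = {
    ("x", "price", "30"),
    ("x", "price", "39"),
    ("y", "price", "12"),
    ("x", "label", "item x"),
}
violations = functional_violations(data, functional_properties={"price"})
```

Here `violations` reports the two conflicting prices of `x`; a repair step would then remove enough triples to make every reported set a singleton.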
    75. Quality Enhancement
        - Quality is defined as “fitness for use” [Jur74]
            - Multi-faceted (timeliness, completeness, reputation, …)
            - Task-dependent
            - Subjective
        - Assessing quality
            - Via assessment functions (e.g., [BC09]) or SPARQL queries (e.g., [FH10])
            - Some kind of combined scoring over the relevant dimensions
        - Improving (enhancing) quality
            - Usually manual
            - Tries to improve the assessment score
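One simple form of "combined scoring over the relevant dimensions" is a weighted average, sketched below. The slide does not prescribe this formula; the choice of a weighted mean, the [0, 1] score range, the dimension names and the function name are all assumptions made for illustration. Since quality is task-dependent, the weights would differ from one application to another.

```python
def quality_score(scores, weights):
    """Weighted average of per-dimension scores, each assumed to be in [0, 1].

    Dimensions that have a weight but no score count as 0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * scores.get(dim, 0.0) for dim, w in weights.items())
    return weighted / total_weight


# Hypothetical scores for one dataset and weights for one task.
scores = {"timeliness": 0.5, "completeness": 1.0}
weights = {"timeliness": 1.0, "completeness": 1.0}
overall = quality_score(scores, weights)
```

With equal weights the overall score is the plain mean, 0.75 in this example.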
    76. Talk Structure (C2)
        - C. Repair: 2. Literature review
    77. Cleaning Tool: OpenRefine
        - Open source
            - Originally developed by Google (Google Refine)
            - http://openrefine.org/
        - Applies to various representations of the input data
            - CSV/TSV, Excel, JSON, XML, RDF as XML, etc.
            - RDF extension
        - Functionalities (related to this talk)
            - Data exploration and cleaning
                - Both automated and manual (interface assists in manual cleaning)
            - Data transformation (format conversion)
                - Uses GREL (Google Refine Expression Language) and regular expressions
    78. 78. Cleaning Tool: ODCleanStore Web application, written in Java Developed by Charles University (Prague) http://www.ksi.mff.cuni.cz/~knap/odcs/sections/odcs.html Functionalities (related to this talk) Cleaning —Via “transformers” (policies for cleaning) —Expressed using SPARQL or regular expressions Quality assessment —Transformer assigns a score to data Validity repair —Supports conflict resolution for functional properties —Decides what to drop based on the quality of the data items involved —Supports aggregation functionalities based on “aggregation policies” Giorgos Flouris Open Data Tutorials, May 2013 78
    79. 79. Other Cleaning Approaches Involve users in the loop [KHS12] Manual requests for improvements (cleaning, quality, …) Patch Request Ontology (PRO) Use a GWAP (Game With A Purpose) for identifying data problems Giorgos Flouris Open Data Tutorials, May 2013 79
    80. 80. Debugging: Literature Overview Identify and resolve inconsistency/incoherency Two phases Diagnosis: identify inconsistency/incoherency Repair: remove inconsistency/incoherency Literature mostly dealing with diagnosis Repair requires additional user input Diagnosis is more than reasoning Pinpoint the causes of inconsistency/incoherency Repair User input required (manual or semi-automatic approaches) Automatic approaches also require user input or domain knowledge (ad-hoc solutions) Giorgos Flouris Open Data Tutorials, May 2013 80
    81. 81. Debugging Approaches Diagnosis using tableau-based algorithms for various DLs Identify minimal sets of responsible axioms —[SC03, MLBP06, PT06, WHR+05] Identify responsible parts of axioms (more fine-grained) —[KPS+06, LPSV06] Repair Manual: editors and related tools —Onion [MWK00], PROMPT [NM00], Chimaera [MFRW00] Semi-automatic —Interactive approach via suggestions: ORE tool [LB10] Automatic: —Using external information, e.g., for stratified datasets [QP07, MLB05] Giorgos Flouris Open Data Tutorials, May 2013 81
    82. 82. Validity Repair: Literature Overview Identify and resolve invalidity (custom constraints) Two phases Diagnosis: identify invalidity Repair: remove invalidity Literature mostly dealing with diagnosis Repair requires additional user input Diagnosis is more than validation Pinpoint the causes of invalidity Repair User input required (manual or semi-automatic approaches) Automatic approaches also require user input or domain knowledge (ad-hoc solutions) Giorgos Flouris Open Data Tutorials, May 2013 82
    83. 83. Validity Repair Approaches Not much work in repairing custom constraints in LOD A large body of related work for the relational setting —For various constraint types and repair methodologies Existing tools Stardog (http://www.stardog.com/docs/) —Commercial RDF database that supports validation of custom constraints Rondo (relational/XML) [Mel04] —Repair based on a fixed “importance” of data items Declarative repairing based on preferences [RFC11] —To be discussed in detail later Repairing functional properties ([FRPV+12], Sieve [MMB12]) Giorgos Flouris Open Data Tutorials, May 2013 83
    84. 84. Data Quality Frameworks (1/4) Many different quality assessment methodologies and frameworks Several different quality dimensions Different works consider different dimensions Different proposals for their classification and organization There is no single, generally accepted data quality framework Cannot be one Different applications have different needs Giorgos Flouris Open Data Tutorials, May 2013 84
    85. 85. Data Quality Frameworks (2/4) Quality dimensions, quality indicators, scoring functions and assessment metrics [BC09] Different quality dimensions: timeliness, completeness, reputation, … Each dimension is associated with different indicators (timeliness: last modification date, creation date, …) Each indicator is associated with different scoring functions (e.g., days since last update) Scoring functions from relevant indicators are combined using assessment metrics (e.g., Reputation_value*0.6 + days_since_update*0.4)
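The chain from indicator to scoring function to assessment metric can be sketched in Python. This is only an illustration: the class `Indicators`, the function names, the 365-day horizon and the normalisation of "days since update" into a [0, 1] timeliness score are assumptions made here, not part of [BC09]; only the 0.6/0.4 weights come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    reputation: float        # assumed to be normalised to [0, 1] already
    days_since_update: int   # raw indicator value


def timeliness_score(days_since_update: int, horizon_days: int = 365) -> float:
    """Scoring function: 1.0 for data updated today, 0.0 at or beyond the horizon."""
    return max(0.0, 1.0 - days_since_update / horizon_days)


def assessment_metric(ind: Indicators,
                      w_reputation: float = 0.6,
                      w_timeliness: float = 0.4) -> float:
    """Assessment metric: weighted combination of the per-dimension scores."""
    return (w_reputation * ind.reputation
            + w_timeliness * timeliness_score(ind.days_since_update))
```

Normalising the raw indicator first is a design choice of this sketch: plugging the raw number of days into the weighted sum would reward stale data, so a scoring function that decreases with age is assumed.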
    86. 86. Data Quality Frameworks (3/4) [RH09] Giorgos Flouris Open Data Tutorials, May 2013 86
    87. 87. Data Quality Frameworks (4/4) [ADA98] Giorgos Flouris Open Data Tutorials, May 2013 87
    88. 88. Talk Structure (C3) A. Introduction to RDF/S, DLs, OWL B. Remote change management: 1. Introduction, definition of subfields 2. Literature review 3. An approach for change detection [PFF+13] C. Repair: 1. Introduction, definition of subfields 2. Literature review 3. An approach for validity repair [RFC11] D. Data and Knowledge Evolution: 1. Introduction, connection with belief change 2. Understanding the process of change 3. Literature review
    89. 89. Our Approach on Validity Repair Declarative approach for validity repair [RFC11] Main design choices Both diagnosis and repair Applicable for RDF/S Adopted relational semantics (CWA) for the constraints Generality on the supported constraints (DEDs) Minimal user interaction (all info provided at input) Automatic diagnosis Automatic repair using preferences (provided by the user at input) Giorgos Flouris Open Data Tutorials, May 2013 89
    90. 90. RDF/S Representation Model Express RDF/S over an adequate relational schema Hybrid method —C_IsA(A,B): A is a subclass of B —C_Inst(x,A): x is an instance of A —Domain(P,A): the domain of P is A —… Alternatives Schema-specific —One table/predicate for each class/property (A(x), B(x), P(x,y), …) —Not amenable to changes (e.g., delete class) Schema-agnostic (triple-store) —One table with three columns (spo) —Harder to define constraints, less intuitive Giorgos Flouris Open Data Tutorials, May 2013 90
    91. 91. Allowed Constraints Considered a very general class of constraints Disjunctive Embedded Dependencies (DEDs) [Deu09] Very general class Functional, inverse functional, transitivity, cardinality constraints Disjointness constraints Primary key, foreign key, inclusion constraints Tuple-generating dependencies (tgd), equality-generating dependencies (egd) Giorgos Flouris Open Data Tutorials, May 2013 91
    92. 92. Constraints Express validity constraints over the aforementioned schema: Class subsumption must be acyclic ∀x,y C_IsA(x,y) ∧ C_IsA(y,x)→ ⊥ Correct classification in property instances ∀x,y,p,a P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a) ∀x,y,p,a P_Inst(x,y,p) ∧ Range(p,a) → C_Inst(y,a) Closed World Assumption (CWA) Failure to prove something, is a proof for its negation Syntactical manipulations on constraints allow Diagnosis —Finding violated constraints Repair —Identifying repairing options per violation Giorgos Flouris Open Data Tutorials, May 2013 92
    93. 93. Repairing Example Dataset D0: Class(Sensor), Class(SpatialThing), Class(Observation); Prop(geo:location); Domain(geo:location,Sensor); Range(geo:location,SpatialThing); Inst(Item1), Inst(ST1); P_Inst(Item1,ST1,geo:location); C_Inst(Item1,Observation), C_Inst(ST1,SpatialThing) Constraint (correct classification in property instances): ∀x,y,p,a P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a) Violation: Item1 geo:location ST1, i.e., P_Inst(Item1,ST1,geo:location)∈D0; Sensor is the domain of geo:location, i.e., Domain(geo:location,Sensor)∈D0; Item1 is not a Sensor, i.e., C_Inst(Item1,Sensor)∉D0 Repairing options: Remove P_Inst(Item1,ST1,geo:location); Remove Domain(geo:location,Sensor); Add C_Inst(Item1,Sensor) [Figure: schema level with Sensor, Observation and SpatialThing linked by geo:location; data level with Item1 geo:location ST1]
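The three repairing options follow a simple pattern for a violated constraint of the form body → head: delete one of the facts that make the body true, or add the missing head. The Python sketch below illustrates that pattern on the example; the delta representation `(removed, added)` and all function names are assumptions of this sketch.

```python
D0 = frozenset({
    ("Domain", "geo:location", "Sensor"),
    ("Range", "geo:location", "SpatialThing"),
    ("P_Inst", "Item1", "ST1", "geo:location"),
    ("C_Inst", "Item1", "Observation"),
    ("C_Inst", "ST1", "SpatialThing"),
})

# The violated instance of: P_Inst(x,y,p) and Domain(p,a) imply C_Inst(x,a)
BODY = [("P_Inst", "Item1", "ST1", "geo:location"),
        ("Domain", "geo:location", "Sensor")]
HEAD = ("C_Inst", "Item1", "Sensor")


def is_violated(dataset, body_facts, head_fact):
    """The body holds in the dataset but the head is missing (CWA)."""
    return all(f in dataset for f in body_facts) and head_fact not in dataset


def repair_options(body_facts, head_fact):
    """One delta (removed, added) per option: drop a body fact or add the head."""
    options = [(frozenset([fact]), frozenset()) for fact in body_facts]
    options.append((frozenset(), frozenset([head_fact])))
    return options


def apply_delta(dataset, delta):
    removed, added = delta
    return (frozenset(dataset) - removed) | added
```

Each of the three deltas returned by `repair_options(BODY, HEAD)` resolves this particular violation when applied to `D0`; which one to prefer is left to the preferences.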
    94. 94. Preferences for Repair Which repairing option is best? Data owner determines that via preferences Preferences Specified beforehand High-level “specifications” for the ideal repair Serve as “instructions” to determine the preferred (optimal) solution Giorgos Flouris Open Data Tutorials, May 2013 94
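    95. 95. Preferences (On Datasets) [Figure: the original dataset D0 and its three candidate repairs D1, D2 and D3; each candidate dataset is annotated with a preference score (scores shown: 3, 4, 6)]
    96. 96. Preferences (On Deltas) [Figure: the original dataset D0 and its three candidate repairs D1, D2 and D3, reached by the deltas -P_Inst(Item1,ST1,geo:location), -Dom(geo:location,Sensor) and +C_Inst(Item1,Sensor); each delta is annotated with a preference score (scores shown: 2, 1, 5)]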
    97. 97. More Details on Preferences Preferences on datasets are result-oriented Consider the quality of the repair result Ignore the impact of repair Popular options: prefer newest/trustable information, prefer a specific schema structure Preferences on deltas are impact-oriented Consider the impact of repair Ignore the quality of the repair result Popular options: minimize schema changes, minimize addition/deletion of information, minimize delta size Properties of preferences Quality metrics can be used for stating preferences Metadata on the data can be used (e.g., provenance) Can be qualitative or quantitative Giorgos Flouris Open Data Tutorials, May 2013 97
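An impact-oriented (delta-based) preference can be sketched as a scoring function over deltas, with the lowest score winning. The Python below is an illustration only: the split into schema predicates, the delta names and the three scoring functions are assumptions made here, not the preference language of [RFC11]. A result-oriented preference would score the repaired dataset instead of the delta.

```python
SCHEMA_PREDICATES = {"Class", "Prop", "Domain", "Range", "C_IsA"}  # assumed split

# The three candidate deltas of the repairing example, as (removed, added) pairs.
DELTAS = {
    "remove_property_instance": (frozenset({("P_Inst", "Item1", "ST1", "geo:location")}), frozenset()),
    "remove_domain": (frozenset({("Domain", "geo:location", "Sensor")}), frozenset()),
    "add_classification": (frozenset(), frozenset({("C_Inst", "Item1", "Sensor")})),
}


def delta_size(delta):
    removed, added = delta
    return len(removed) + len(added)


def schema_changes(delta):
    removed, added = delta
    return sum(1 for fact in removed | added if fact[0] in SCHEMA_PREDICATES)


def deletions(delta):
    return len(delta[0])


def preferred(deltas, key):
    """Names of the deltas with the lowest score under the given preference."""
    best = min(key(d) for d in deltas.values())
    return sorted(name for name, d in deltas.items() if key(d) == best)
```

With "minimize delta size" all three options tie. "Minimize schema changes" rules out removing the domain, and breaking the remaining tie by "minimize deletions" (`key=lambda d: (schema_changes(d), deletions(d))`) leaves only the added classification.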
    98. 98. Generalizing the Approach For one violated constraint: 1. Diagnose invalidity 2. Determine minimal ways to resolve it 3. Determine and return the preferred solution based on the preference For many violated constraints: The problem becomes more complicated More than one resolution step is required Issues: 1. Resolution order 2. When and how to filter non-optimal solutions? 3. Constraint (and resolution) interdependencies
    99. 99. Constraint Interdependencies A given resolution may: cause other violations (bad); resolve other violations (good) Optimal resolution unknown ‘a priori’: cannot predict a resolution’s ramifications; exhaustive, recursive search required (resolution tree) Two ways to create the resolution tree: globally-optimal (GO) / locally-optimal (LO) When and how to filter non-optimal solutions?
    100. 100. Resolution Tree Creation (GO) Globally-optimal (GO): find all minimal resolutions for all the violated constraints, then find the optimal ones Steps: find all minimal resolutions for one violation; explore them all; repeat recursively until valid; return the optimal leaves [Figure: resolution tree whose optimal leaves are marked "Optimal repairs (returned)"]
    101. 101. Resolution Tree Creation (LO) Locally-optimal (LO): find the minimal and optimal resolutions for one violated constraint, then repeat for the next Steps: find all minimal resolutions for one violation; explore the optimal one(s); repeat recursively until valid; return all remaining leaves [Figure: resolution tree with a single remaining leaf marked "Optimal repair (returned)"]
    102. 102. Comparison (GO versus LO) Characteristics of GO: exhaustive; less efficient (large resolution trees); always returns optimal repairs; insensitive to constraint syntax; deterministic (result does not depend on resolution order) Characteristics of LO: greedy; more efficient (small resolution trees); may return sub-optimal repairs; sensitive to constraint syntax; non-deterministic (result may depend on resolution order)
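The difference between the two strategies can be sketched on a toy propositional example in Python. Everything here is an assumption made for illustration: constraints are `(body, head)` pairs over atomic facts (with `head=None` for a denial constraint), the cost is a weighted symmetric difference to the original dataset, and the `seen` set that prevents undoing an earlier resolution is an addition of this sketch. It is not the algorithm of [RFC11].

```python
def violations(dataset, constraints):
    """(body, head) is violated if body is in the dataset and head is missing.
    head=None encodes a denial constraint: the body must not hold at all."""
    return [(body, head) for body, head in constraints
            if body <= dataset and (head is None or head not in dataset)]


def resolutions(dataset, violation):
    """Minimal resolutions: delete one body fact, or add the head if there is one."""
    body, head = violation
    options = [dataset - {fact} for fact in body]
    if head is not None:
        options.append(dataset | {head})
    return options


def repair(dataset, constraints, cost, strategy="GO"):
    start = frozenset(dataset)
    seen, leaves = {start}, []

    def explore(current):
        violated = violations(current, constraints)
        if not violated:
            leaves.append(current)           # a valid dataset is a leaf
            return
        children = [c for c in resolutions(current, violated[0]) if c not in seen]
        if strategy == "LO" and children:    # LO: keep only the locally best children
            best = min(cost(start, c) for c in children)
            children = [c for c in children if cost(start, c) == best]
        for child in children:
            seen.add(child)
            explore(child)

    explore(start)
    if strategy == "GO" and leaves:          # GO: filter only at the end
        best = min(cost(start, leaf) for leaf in leaves)
        leaves = [leaf for leaf in leaves if cost(start, leaf) == best]
    return leaves


# Toy instance: "a requires x" and "x and b must not hold together".
CONSTRAINTS = [(frozenset({"a"}), "x"), (frozenset({"x", "b"}), None)]
WEIGHTS = {"a": 2, "x": 1, "b": 3}


def cost(original, candidate):
    return sum(WEIGHTS[f] for f in original ^ candidate)
```

On the dataset `{"a", "b"}`, GO explores both branches and returns `{"b"}` (cost 2). LO greedily adds `x` first because that step is cheaper, then has to drop `b`, and ends with `{"a", "x"}` (cost 4), which illustrates how LO may return a sub-optimal repair.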
    103. 103. Repair: Generality Results The approach is very general Thanks to the generality/flexibility of preferences Repair approaches can be captured using adequately designed preferences Using either the LO or the GO strategy All the current approaches that we checked Practically all future ones —This has been proved, under some general conditions regarding the behavior of the repair approach Our model can be viewed as a general approach engulfing other repair approaches Giorgos Flouris Open Data Tutorials, May 2013 103
    104. 104. Repair: Algorithms and Complexity Implemented both algorithms Detailed complexity analysis for GO/LO and various different types of constraints and preferences Inherently difficult problem Exponential complexity (in general) Main exception: LO is polynomial (in special cases) Theoretical complexity is misleading as to the actual performance of the algorithms Giorgos Flouris Open Data Tutorials, May 2013 104
    105. 105. Performance in Practice Performance in practice Linear with respect to dataset size Linear with respect to tree size —Types of violated constraints (tree width) —Number of violations (tree height) – causes the exponential blowup —Constraint interdependencies (tree height) —Preference (for LO): affects pruning (tree width) Further performance improvement Use optimizations Use LO with restrictive preference Currently considering a redesign for further improvement Giorgos Flouris Open Data Tutorials, May 2013 105
    106. 106. Summary and Conclusions: Repair Data usually problematic Different types of problems Repair is done using different approaches depending on the type of the problem Cleaning, debugging, repairing, quality assessment and enrichment Presented a formal approach for validity repair [RFC11] Other possible directions (related to LOD) Most approaches detect problems, but don’t resolve them Efficiency problems (for repairing algorithms) Exploit external knowledge on the cause of the problem (e.g., propagation of invalidity by a linked dataset) Giorgos Flouris Open Data Tutorials, May 2013 106
    107. 107. Talk Structure (D1) A. Introduction to RDF/S, DLs, OWL B. Remote change management: 1. Introduction, definition of subfields 2. Literature review 3. An approach for change detection [PFF+13] C. Repair: 1. Introduction, definition of subfields 2. Literature review 3. An approach for validity repair [RFC11] D. Data and Knowledge Evolution: 1. Introduction, connection with belief change 2. Understanding the process of change 3. Literature review
    108. 108. Motivation for Evolution Reasons for evolution New observations or experiments Change in the viewpoint or usage of the dataset Newly gained access to information (previously classified, unknown or otherwise unavailable) Incomplete or inaccurate conceptualization Changes in the world itself Repairing Change propagation (cascading evolution in LOD) Not an LOD-specific problem But critical for LOD as well Giorgos Flouris Open Data Tutorials, May 2013 108
    109. 109. Definition of Evolution The process of modifying a dataset in response to a change in the domain or its conceptualization Dealing with both data and schema changes Original Dataset Evolution Algorithm Modified Dataset New Data/Knowledge Giorgos Flouris Open Data Tutorials, May 2013 109
    110. 110. Evolution: Setting the Scope Evolution is an overloaded term Phases of evolution Six phases in [SMMS02], five phases in [PT05] Detecting the need for evolution, change propagation, logging changes, versioning etc Scope: apply the change and compute the new dataset Out of scope: deciding on the change, evaluating the result, managing versions, logging changes etc Giorgos Flouris Open Data Tutorials, May 2013 110
    111. 111. Explaining Evolution (1/4) Chess Dataset Representation Language: RDF Chess Piece Wooden Plastic Change: Add([King rdf:type Red]) Red White Black Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 111
    112. 112. Explaining Evolution (2/4) Chess Dataset Representation Language: RDF Chess Piece Wooden Plastic Is the King Wooden? Red White Change: Del([King rdf:type Black]) Black Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 112
    113. 113. Explaining Evolution (3/4) Chess Dataset Representation Language: RDF Chess Piece Wooden Plastic Red White Black Change: Del([King rdf:type Wooden]) Some domain knowledge required (extra-logical considerations) Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 113
    114. 114. Explaining Evolution (4/4) Chess Dataset Representation Language: OWL Wooden and Plastic are disjoint: [Wooden owl:disjointWith Plastic] Change: Add([King rdf:type Plastic]) Is the King Black? Is the King Wooden? [Figure: Chess Piece, Wooden, Plastic, Red, White and Black at the schema level, with Wooden and Plastic marked as disjoint; King at the data level]
    115. 115. Side-effects in Evolution Changes should not undermine the “quality” of the dataset Side-effects: additional changes that need to be applied along with the original change to maintain knowledge integrity and quality Consistency, coherency, custom constraints, quality metrics, … Main challenge in determining the evolution result Determining side-effects Giorgos Flouris Open Data Tutorials, May 2013 115
    116. 116. Determining Side-effects Challenges in determining side-effects Evolution result not always obvious (even for humans) —Understand the process of change —Various philosophical considerations involved Selection involved (extra-logical considerations) —Domain expertise —Preferences (trust, provenance, axiom “strength” or “entrenchment”) Early evolution approaches rather naïve in this respect Ignored such issues or addressed them in an ad-hoc manner Giorgos Flouris Open Data Tutorials, May 2013 116
    117. 117. Belief Change Belief change (often referred to as belief revision) The process of modifying a knowledge base in the face of new, possibly contradictory knowledge Mature, well-established field Focuses for logical formalisms (propositional, first-order logic) Recent survey on belief change [FH11] Aims to understand the process of change The philosophical/logical counterpart of dataset evolution Can provide solutions and inspiration Giorgos Flouris Open Data Tutorials, May 2013 117
    118. 118. Cross-Fertilization with Belief Change Cross-fertilization beneficial [Flo06, FPA05, FPA06] Benefits Similar problems Differences on the underlying intuitions are minimal Belief change field more mature Frame problems and provide inspiration towards a solution Protect from pitfalls Avoid “reinventing the wheel” Problems Representation languages and formalisms are different Assumptions regarding the underlying representation language —These assumptions do not hold for LOD representation languages Can reuse the ideas, not the results themselves Giorgos Flouris Open Data Tutorials, May 2013 118
    119. 119. Talk Structure (D2) A. Introduction to RDF/S, DLs, OWL B. Remote change management: 1. Introduction, definition of subfields 2. Literature review 3. An approach for change detection [PFF+13] C. Repair: 1. Introduction, definition of subfields 2. Literature review 3. An approach for validity repair [RFC11] D. Data and Knowledge Evolution: 1. Introduction, connection with belief change 2. Understanding the process of change 3. Literature review
    120. 120. Challenges and Considerations List of challenges and problems related to evolution As well as some answers from the belief change field Challenges and the complexity of formalisms Some of the problems do not appear in simpler formalisms (RDF) Some of the problems are only relevant in the presence of schema —Data changes are simpler (on a fixed schema) Part of the discussion only relevant for DL, OWL Giorgos Flouris Open Data Tutorials, May 2013 120
    121. 121. Importance of Implicit Data (Example) Chess Dataset Representation Language: RDF Chess Piece Wooden Plastic Is the King Wooden? Red White Change: Del([King rdf:type Black]) Black Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 121
    122. 122. Importance of Implicit Data Explicit and implicit data equally important Explicit data more important than implicit The coherence viewpoint King is Wooden The closure of the dataset is The foundational viewpoint King is not Wooden Only explicit knowledge is considered during changes considered during changes —Belief set semantics —Belief base semantics Implicit data persistent Implicit data volatile —Explicit support not necessary for —Retained only as long as there is implicit data No discrimination explicit support Discrimination —No need to distinguish explicit data from implicit —Redundant data can be deleted Giorgos Flouris Open Data Tutorials, May 2013 —Explicit data should be explicitly marked as such —Redundant data should persist 122
    123. 123. Redundant Data Chess Dataset Representation Language: RDF Chess Piece Wooden Plastic Change: Add([King rdf:type Black]) Red White Black Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 123
    124. 124. The King Is Black Chess Dataset Representation Language: RDF Chess Piece Observation: the King is Black Wooden Plastic Change: Add([King rdf:type Black]) Red White Black Is the King Wooden? Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 124
    125. 125. Paint It Black Chess Dataset Representation Language: RDF Chess Piece Action: King is painted Black Wooden Plastic Change: Add([King rdf:type Black]) Red White Black Is the King Wooden? Schema Level Data Level King Giorgos Flouris Open Data Tutorials, May 2013 125
    126. 126. Static and Dynamic Worlds Same dataset, same change, but different expected result Different semantics between the two cases [KM91] Different operations Static world change semantics The world does not change, but our perception of it changes Modeling or conceptualization problems, new observation etc Dynamic world change semantics The world changes, and we need to keep ourselves up-to-date No problems with the original conceptualization Giorgos Flouris Open Data Tutorials, May 2013 126
    127. 127. Types of Operations Static world Revision (add) Contraction (delete) Static Addition Revision Dynamic world Dynamic Update Deletion Contraction Erasure Update (add) Erasure (delete) Plus some more (forget, expansion, …) Less well-studied Ignored for this talk Irrelevant for LOD or trivial Giorgos Flouris Open Data Tutorials, May 2013 127
    128. 128. Example: Revision and Contraction Chess Dataset Representation Language: OWL Chess Piece Wooden Plastic Change #1 I believe that the King is not Black Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black]) Red White Black NotBlack Schema Level Data Level Change #2 I do not believe that the King is Black Del([King rdf:type Black]) King Giorgos Flouris Open Data Tutorials, May 2013 128
    129. 129. Expressing the Change Different paradigms for expressing the change Modification-based —“Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])” —The exact modifications that should be applied to accommodate the new knowledge —Must know the conceptualization —Closer to the ontology expert Fact-based —“I believe that the King is not Black” —A new fact that should be accommodated in the dataset —Extra layer of abstraction (extra step required to determine modifications) —Closer to the domain expert Handling multiple changes Iterated belief change Package versus choice semantics (contraction and erasure) Merging Giorgos Flouris Open Data Tutorials, May 2013 129
    130. 130. Evolution Principles (Partial List) Principle of Success (Primacy of New Information) New information is unconditionally accepted Non-prioritized belief change Principle of Validity (Consistency Maintenance) Belief change: usually logical consistency LOD evolution: consistency, coherency, custom constraints, … Principle of Minimal Change Determine the side-effects that have minimal impact —But satisfying the other principles Corresponds to the selection process Minimality depends on the task, context, user, application, … Different postulates and intuitions (recovery, relevance etc) Different metrics (model-based, formula-based, cardinality etc) Giorgos Flouris Open Data Tutorials, May 2013 130
    131. 131. Understanding the Principles Chess Dataset Representation Language: OWL Wooden and Plastic are disjoint: [Wooden owl:disjointWith Plastic] Change: Add([King rdf:type Plastic]) Invalidity (basically, inconsistency): the King is both Wooden and Plastic Three options (Minimal Change) [Figure: Chess Piece, Wooden, Plastic, Red, White and Black at the schema level, with Wooden and Plastic marked as disjoint; King at the data level]
    132. 132. Non-obvious Side-effects Chess Dataset Representation Language: ALC DL Change: I don’t believe that all White items are Chess_Pieces Replace subsumptions with: White ⊓ Chess_Piece ⊑ Plastic; Plastic ⊑ White ⊔ Chess_Piece [Figure: subsumptions among Chess_Piece, Plastic and White, next to the chess schema (Chess Piece, Wooden, Plastic, Red, White, Black) at the schema level; King at the data level]
    133. 133. Talk Structure (D3) A. Introduction to RDF/S, DLs, OWL B. Remote change management: 1. Introduction, definition of subfields 2. Literature review 3. An approach for change detection [PFF+13] C. Repair: 1. Introduction, definition of subfields 2. Literature review 3. An approach for validity repair [RFC11] D. Data and Knowledge Evolution: 1. Introduction, connection with belief change 2. Understanding the process of change 3. Literature review
    134. 134. Classes of Belief Change Approaches (1/2) Postulates (one set for each operation) Formalize the principles, using logical conditions Essentially define the properties of a rational change operator —Some principles not considered or given varying semantics —Principle of Minimal Change is the most controversial Do not uniquely define an operator —A class of operators (expected rational results) —Extra-logical considerations would determine the actual result —Operator-specific (preferences, axiom strength, hard-coded semantics, …) Belief change context [AGM85, KM91, Han91] Evolution context [FKAC13, WWT10, QLB06a, QLB06b] Giorgos Flouris Open Data Tutorials, May 2013 134
    135. 135. Classes of Belief Change Approaches (2/2) Construction methods Intuitive constructions for a family of operators of a certain type Representation theorems —Proof that the constructed family corresponds exactly to the class of operators that satisfy a certain set of postulates Can be used as “templates” to construct rational change operators Parameterized selection process —Preferences, axiom strength, etc Popular in belief change, not so much in evolution Explicit algorithms Implement a specific operator that satisfies some of the postulates Hard-coded or parameterized selection process Popular in evolution context, not so much in belief change Giorgos Flouris Open Data Tutorials, May 2013 135
    136. 136. Discussion on the Operator Types (1/2) Connections between the various operators Static: revision/contraction interdefinable [AGM85] Dynamic: update/erasure interdefinable [KM91] Model-theoretic characterization of the connection between static/dynamic worlds (revision-update, contraction-erasure) [KM91] Postulates critical for establishing those results Revision and update more useful in practice Contraction/erasure only used to express agnosticism Contraction and erasure more interesting from a theoretical perspective More fundamental operations Giorgos Flouris Open Data Tutorials, May 2013 136
    137. 137. Discussion on the Operator Types (2/2) Revise with φ (in belief change) Contract ¬φ —This resolves, a priori, any potential inconsistency problems Add φ (without side-effects) Revise with φ (in LOD) Contract data that could potentially cause problems —Inconsistency, incoherency, … Add φ (without side-effects) Contraction is the basis for revision Simpler operation Basically, if you know how to contract, you know how to revise Most of the focus in belief change and also in LOD evolution Same for update/erasure Giorgos Flouris Open Data Tutorials, May 2013 137
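The two-step recipe for revision in belief change (contract ¬φ, then add φ) is usually written as the Levi identity, and the converse direction as the Harper identity. The notation below is the standard one from the belief change literature rather than the slides' own: K is the knowledge base, ∗ is revision, ∸ is contraction and + is expansion (addition without side-effects).

```latex
% Levi identity: revision defined through contraction and expansion
K \ast \varphi \;=\; (K \mathbin{\dot{-}} \neg\varphi) + \varphi

% Harper identity: contraction defined through revision
K \mathbin{\dot{-}} \varphi \;=\; K \cap (K \ast \neg\varphi)
```

Read together, the two identities express the interdefinability of revision and contraction in the static setting; the LOD variant on the slide replaces the contraction of ¬φ with the contraction of whatever data could cause inconsistency or incoherency once φ is added.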
    138. 138. Evolution via Editors Features Intuitive interfaces Easy to add/delete triples (but not facts) Some help for determining the side-effects of a change —Embedded reasoners and/or debugging/repair tools to propose side-effects Additional facilities —Versioning, monitoring, undo/redo, … Main problems User should be both ontology and domain expert Not applicable in some cases —Examples: automated agents, time-critical applications, massive streaming input No formal properties Examples Protégé (http://protege.stanford.edu/) NeOn toolkit (http://neon-toolkit.org/wiki/Main_Page) OntoStudio (http://www.semafora-systems.com/en/products/ontostudio/) KAON2 (http://kaon2.semanticweb.org/) Giorgos Flouris Open Data Tutorials, May 2013 138
    139. 139. Declarative Approaches SPARQL Update (http://www.w3.org/TR/sparql11-update/) For RDF Fixed semantics, no side-effects Data and schema operations (also bulk changes) RUL [MSCK05] For RDF/S, taking into account RDFS semantics Fixed semantics, predefined set of side-effects per operation Only for data operations (also bulk changes) EvoPat [RHTA10] Declaratively associate changes with side-effects (using SPARQL) SPARQL queries determine whether side-effects should be applied SPARQL update statements represent such side-effects Tempus fugit [LRV09] Event-driven, declarative specification of the operators’ semantics Giorgos Flouris Open Data Tutorials, May 2013 139
    140. 140. Fixed-Operations Approach Standard approach in the early days (e.g., [SMMS02]) Set of supported operations (Add_Class, Add_Domain, …) Identify potential problems and side-effects per operation —Decision is hard-coded or user-defined (from a set of options) —Example: when deleting a subsumption, how about implicit subsumptions? Automatic or semi-automatic Problems No consensus on the language of changes No limit on the number of operations —What about unknown/unsupported operators? No exhaustive formal analysis of potential side-effects No formal properties or other guarantees Incomplete understanding of the change process Giorgos Flouris Open Data Tutorials, May 2013 140
    141. 141. Approaches Inspired by Belief Change (1/2) Revision in ALU DL [LM04] Using preferences among axioms Inspired by “epistemic entrenchment” Revision in generic DLs [QD09] Three model-based revision operators for DLs Emphasis on the Principle of Irrelevance of Syntax —Semantical, rather than syntactical, considerations should drive the result Revision in DL-Lite [GQW12] Using a graph-based algorithm For data changes only (Abox) Update and erasure in RDF/S [GHV06, GHV11] Taking into account RDFS inference Update is trivial, erasure is challenging (due to RDFS inference) Giorgos Flouris Open Data Tutorials, May 2013 141
    142. 142. Approaches Inspired by Belief Change (2/2) Using the maxi-adjustment algorithm [MLB05, QLB06a, QLB06b] Used to repair inconsistencies in propositional knowledge bases Requires a stratification in the knowledge Adapted for disjunctive DLs Using kernel operators [Han94] Kernels: minimal sets of formulas leading to inconsistency —Minimal Inconsistency Preserving Sub-Tboxes (MIPS) [SC03] OWL [HWK06] DLs [QHHP08] Generic formalisms with no negation (such as RDF) [RW07] Giorgos Flouris Open Data Tutorials, May 2013 142
    143. 143. Postulation Approaches in Evolution (1/3) AGM: dominating paradigm in belief change [AGM85] The single most influential work in the field of belief change Contributions AGM postulates: two sets of 6 basic and 2 supplementary postulates —One set for each operator (revision and contraction) Plus various related results —Partial meet contraction —Representation theorems —Connections between operators Only for classical logics (satisfying certain assumptions) Propositional, first-order, modal logics, … Not for LOD formalisms (RDF/S, DLs, OWL) Giorgos Flouris Open Data Tutorials, May 2013 143
    144. 144. Postulation Approaches in Evolution (2/3) AGM contraction postulates adapted for monotonic logics [Flo06, FPA05, FPA06] Includes all LOD formalisms But: no satisfying contraction operator exists for many such logics Cannot find a proper result in certain cases Necessary and sufficient conditions for the existence of such an operator [FPA06, Flo06] Negative results for RDF/S, OWL, most DLs [FPA05, RWFA13] Problem stems from the postulate of recovery [AGM85] Captures the Principle of Minimal Change Controversial [Han91] Giorgos Flouris Open Data Tutorials, May 2013 144
    145. Postulation Approaches in Evolution (3/3)
        - Replacing recovery with optimal recovery [FPA06, FHP+06]
            - Equivalent to recovery for classical logics
            - But weaker in general
            - Not particularly successful either
        - Replacing recovery with relevance [Han91]
            - An intuitive, well-established alternative to recovery
            - Equivalent to recovery for classical logics
            - Applicable under quite general conditions [RWFA13]
                - Applicable for all compact logics
                - Includes RDF/S, practically all DLs and OWL flavors and profiles
            - Adequate for expressing the principles of contraction in LOD languages
            - Connections with recovery established for non-classical logics
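The relevance postulate is usually stated along the following lines. The symbols are the standard ones from the belief change literature and are assumed here, as the slide gives no formula: $K$ is the set being contracted, $\alpha$ the contracted sentence, $Cn$ the consequence operator.

```latex
\text{If } \beta \in K \text{ and } \beta \notin K - \alpha,
\text{ then there is some } K' \text{ with }
K - \alpha \subseteq K' \subseteq K,\quad
\alpha \notin Cn(K'),\quad
\alpha \in Cn(K' \cup \{\beta\}).
```

In words: a sentence $\beta$ may be discarded only if it actually contributes to deriving $\alpha$. This is one way of capturing minimal change without requiring that the original $K$ be recoverable.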
    146. Principle of Adequacy of Representation
        - Principle of Adequacy of Representation
            - The evolution result should be expressible in the same formalism as the original dataset
            - Obvious and trivial
        - Not always compatible with our requirements for the evolution result
            - Postulates (e.g., AGM postulates)
            - Specific incarnations of the Principle of Minimal Change
            - Specific computational methods or classes of operators
        - Two stages for the computation [CGKZ12]
            - Find the “optimal” evolution result according to the requirements
            - Express it in the target language (not always possible)
                - Inexpressibility results
    147. Inexpressibility for Classes of Operators
        - Generic contraction methods [CGKZ12]
            - Syntactic: remove a minimal set of explicit axioms
            - Formula-based: remove a minimal set of axioms from the closure
                - Three different semantics for minimality
            - Model-based: modify the model in a minimal manner
                - Eight different methods to find the “minimal” distance between models
        - Existing contraction algorithms can be categorized along these generic classes of methods
        - Different contraction methods not compatible in general (for DLs)
            - Model-based and formula-based are compatible in classical logics
        - Inexpressibility results for DL-Lite, EL (i.e., OWL2 QL, OWL2 EL) [CGKZ12]
        - Proposal: a “hybrid” operator combining ideas from syntactic and formula-based approaches [CGKZ12]
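The syntactic class of methods can be illustrated with a small sketch. It assumes a toy formalism in which axioms are atomic subsumptions (pairs of class names) and entailment is transitivity only; it also reads "minimal" as minimal in cardinality. The function names are hypothetical, and the sketch does not reproduce any operator from [CGKZ12].

```python
from itertools import combinations


def entails(axioms, goal):
    """True if goal = (sub, sup) follows from the axioms by transitivity."""
    sub, sup = goal
    reached, frontier = {sub}, [sub]
    while frontier:
        node = frontier.pop()
        for (a, b) in axioms:
            if a == node and b not in reached:
                reached.add(b)
                frontier.append(b)
    return sup in reached


def syntactic_contractions(axioms, goal):
    """All results of removing a smallest set of explicit axioms so that goal is not entailed."""
    axioms = frozenset(axioms)
    for k in range(len(axioms) + 1):
        results = [axioms - frozenset(removed)
                   for removed in combinations(sorted(axioms), k)
                   if not entails(axioms - frozenset(removed), goal)]
        if results:
            return results
    return []


tbox = {
    ("Cat", "Mammal"), ("Mammal", "Animal"),
    ("Cat", "Pet"), ("Pet", "Animal"),
}
goal = ("Cat", "Animal")
options = syntactic_contractions(tbox, goal)
```

"Cat is an Animal" has two independent derivations, so one axiom from each must be removed. That gives four equally small results, and choosing among them is where preferences or further postulates come in.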
    148. More Inexpressibility Results
        - DL-Lite evolution [CKNZ10]
            - Focusing on model-based and formula-based approaches for contraction
            - Inexpressibility results
            - Propose a formula-based approach
        - DL revision [LLMW06]
            - Model-based approach, limited to ABox only (data level)
            - Inexpressibility results
            - Propose a new DL that supports model-based evolution
        - Approximations
            - DL-LiteF [GLPR07, GLPR09]
                - Update and erasure approximation algorithms for data-level changes only
                - Alternative: extend DL-LiteF to make sure that the result is expressible
            - DL-Lite [WWT10]
                - Provide postulates and approximation algorithms for revision
    149. Other Approaches
        - Evolution using ideas from argumentation frameworks [MRF08]
            - ALC DL
            - Inconsistency in a dataset is an “attack” between arguments
            - Acceptability semantics used to resolve such attacks and eliminate inconsistencies
            - Useful for both debugging and evolution
        - Evolution can be reduced to debugging/repair [HHH+05]
            - Apply the change
            - Then repair the result to resolve problems (Principle of Validity)
                - Making sure the change is not “undone” during repair (Principle of Success)
    150. Evolution Under Custom Constraints
        - Evolution in the presence of custom validity constraints [KFAC07, FKAC13]
        - Methodology
            - Apply the change (Principle of Success)
            - Guarantee satisfaction of constraints (Principle of Validity)
            - Use a preference to determine minimality (Principle of Minimal Change)
        - Features
            - Generic method, applied for RDF/S evolution
            - A formal expression of the principles for the proposed setting
            - Exhaustive method to determine all possible side-effects and identify the “best” (according to the preference)
            - Constrain allowed preferences for rationality and performance
            - Based on similar ideas as the repairing approach of [RFC11]
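The three-step methodology can be sketched as follows. Everything concrete is an assumption made for illustration: the only validity constraint is that the subsumption graph stays acyclic, the preference is a caller-supplied `cost` function on removed edges, and the names `has_cycle` and `evolve` are hypothetical. The actual framework of [KFAC07, FKAC13] is considerably more general.

```python
from itertools import combinations


def has_cycle(edges):
    """Assumed validity constraint: the subsumption graph must be acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def reaches(start, goal, seen):
        for nxt in graph.get(start, ()):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, goal, seen):
                    return True
        return False

    return any(reaches(node, node, set()) for node in graph)


def evolve(kb, addition, cost):
    """Add `addition`, then pick the preferred set of old edges to drop."""
    kb = frozenset(kb)
    candidates = []
    for k in range(len(kb) + 1):
        for removed in combinations(sorted(kb), k):
            removed = frozenset(removed)
            # Supersets of a valid removal carry unnecessary side-effects.
            if any(r <= removed for r, _ in candidates):
                continue
            result = (kb - removed) | {addition}  # Success: the change is kept
            if not has_cycle(result):             # Validity: constraint holds
                candidates.append((removed, result))
    if not candidates:
        raise ValueError("the change itself violates the constraint")
    # Minimal Change: the preference selects among the valid alternatives.
    return min(candidates, key=lambda pair: sum(cost(e) for e in pair[0]))


kb = {("Student", "Person"), ("Person", "Agent")}
addition = ("Agent", "Student")  # would close a cycle


def cost(edge):
    # Hypothetical preference: dropping Person -> Agent is more expensive.
    return 2 if edge == ("Person", "Agent") else 1


removed, result = evolve(kb, addition, cost)
```

With this preference the side-effect chosen is the removal of Student -> Person. Swapping the costs would select the other alternative, which shows how the preference, not the algorithm, fixes the outcome.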
    151. Summary and Conclusions: Evolution
        - The problem of evolution is very challenging
            - Several issues need to be considered
                - Not obvious to a newcomer
                - Often ignored
        - Evolution approaches
            - Direct: manual, based on fixed operators, declarative
            - Indirect: postulation attempts
            - Adapted: adapting belief change algorithms or methods
        - Other possible directions (related to LOD)
            - Adapt for the “linked” character of LOD
                - Evolution during propagation or after change detection
                - Extra knowledge that can be exploited for adapting preferences, fine-tuning of automated algorithms, etc.
    152. Thank You!
    153. References (1/18)
        [AAM09] C. Allocca, M. d'Aquin, E. Motta. Detecting Different Versions of Ontologies in Large Ontology Repositories. IWOD-09, 2009.
        [ADA98] M.L. Abate, K.V. Diegert, H.W. Allen. A Hierarchical Approach to Improving Data Quality. Data Quality Journal, 4(1), 1998.
        [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.
        [AH06] S. Auer, H. Herre. A Versioning and Evolution Framework for RDF Knowledge Bases. PSI-06, Revised Papers, 2006.
        [BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1-10, 2009.
        [BLHL01] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, 2001.
    154. References (2/18)
        [CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.
        [CKNZ10] D. Calvanese, E. Kharlamov, W. Nutt, D. Zheleznyakov. Evolution of DL-Lite Knowledge Bases. ISWC-10, 2010.
        [CMDZ10] C.A. Curino, H.J. Moon, A. Deutsch, C. Zaniolo. Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2):117-128, 2010.
        [CMZ08] C.A. Curino, H.J. Moon, C. Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB 1(1):761-772, 2008.
        [CQ13] G. Cheng, Y. Qu. Relatedness Between Vocabularies on the Web of Data: A Taxonomy and an Empirical Study. Web Semantics: Science, Services and Agents on the World Wide Web, 2013. Available at: http://dx.doi.org/10.1016/j.websem.2013.02.001
    155. References (3/18)
        [Deu09] A. Deutsch. FOL Modeling of Integrity Constraints (Dependencies). Encyclopedia of Database Systems, 2009.
        [DA09] R. Djedidi, M. Aufaure. Change Management Patterns (CMP) for Ontology Evolution Process. IWOD-09, 2009.
    156. References (4/18)
        [FH10] C. Furber, M. Hepp. Using Semantic Web Resources for Data Quality Management. EKAW-10, 2010.
        [FH11] E. Ferme, S.O. Hansson. AGM 25 Years: Twenty-five Years of Research in Belief Change. Journal of Philosophical Logic 40:295-331, 2011.
        [FHP+06] G. Flouris, Z. Huang, J.Z. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. AAAI-06, 2006.
        [FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.
        [Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.
    157. References (5/18)
        [FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.
        [FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.
        [FMK+08] G. Flouris, D. Manakanatas, H. Kondylakis, D. Plexousakis, G. Antoniou. Ontology Change: Classification and Survey. Knowledge Engineering Review, 23(2):117-152, 2008.
        [FMV10] E. Franconi, T. Meyer, I. Varzinczak. Semantic Diff as the Basis for Knowledge Base Versioning. NMR-10, 2010.
        [FRPV+12] G. Flouris, Y. Roussakis, M. Poveda-Villalon, P.N. Mendes, I. Fundulaki. Using Provenance for Quality Assessment and Repair in Linked Open Data. EvoDyn-12, 2012.
    158. References (6/18)
        [GHV06] C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF Under the Katsuno-Mendelzon Approach. WebDB-06, 2006.
        [GHV11] C. Gutierrez, C. Hurtado, A. Vaisman. RDFS Update: From Theory to Practice. ESWC-11, 2011.
        [GLPR07] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On the Approximation of Instance Level Update and Erasure in Description Logics. AAAI-07, 2007.
        [GLPR09] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On Instance-level Update and Erasure in Description Logic Ontologies. Journal of Logic and Computation 19(5):745-770, 2009.
        [GQW12] S. Gao, G. Qi, H. Wang. A New Operator for ABox Revision in DL-Lite. AAAI-12, 2012.
        [Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2), 1993.
    159. References (7/18)
        [Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.
        [Han94] S.O. Hansson. Kernel Contraction. Journal of Symbolic Logic, 59(3):845-859, 1994.
        [HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.
        [HH00] J. Heflin, J. Hendler. Dynamic Ontologies on the Web. AAAI-00, 2000.
        [HHM+10] H. Halpin, P.J. Hayes, J.P. McCusker, D.L. McGuinness, H.S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. ISWC-10, 2010.
        [HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.
        [HHP+10] A. Hogan, A. Harth, A. Passant, S. Decker, A. Polleres. Weaving the Pedantic Web. LDOW-10, 2010.
        [HP04] J. Heflin, J.Z. Pan. A Model Theoretic Semantics for Ontology Versioning. ISWC-04, 2004.
        [HS05] Z. Huang, H. Stuckenschmidt. Reasoning with Multi-version Ontologies: A Temporal Logic Approach. ISWC-05, 2005.
        [HWK06] C. Halaschek-Wiener, Y. Katz. Belief Base Revision for Expressive Description Logics. OWLED-06, 2006.
    160. References (8/18)
        [ILK12] D.H. Im, S.W. Lee, H.J. Kim. A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering, 22(1):85-106, 2012.
        [JAP09] M. Javed, Y. Abgaz, C. Pahl. A Pattern-based Framework of Change Operators for Ontology Evolution. OTM-09, 2009.
        [Jur74] J.M. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
    161. References (9/18)
        [KFAC07] G. Konstantinidis, G. Flouris, G. Antoniou, V. Christophides. Ontology Evolution: A Framework and its Application to RDF. SWDB-ODBIS-07, 2007.
        [KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.
        [KHS12] M. Knuth, J. Hercher, H. Sack. Collaboratively Patching Linked Data. USEWOD-12, 2012.
        [KLGE07] N. Keberle, Y. Litvinenko, Y. Gordeyev, V. Ermolayev. Ontology Evolution Analysis with OWL-MeT. IWOD-07, 2007.
        [KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.
        [KN03] M. Klein, N. Noy. A Component-based Framework for Ontology Evolution. IJCAI-03 Workshop on Ontologies and Distributed Systems, CEUR-WS, vol. 71, 2003.
        [KPS+06] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. ESWC-06, 2006.
        [KWW08] B. Konev, D. Walther, F. Wolter. The Logical Difference Problem for Description Logic Terminologies. IJCAR-08, 2008.
        [KWZ08] R. Kontchakov, F. Wolter, M. Zakharyaschev. Can you Tell the Difference Between DL-Lite Ontologies? KR-08, 2008.
    162. References (10/18)
        [LB10] J. Lehmann, L. Buhmann. ORE - A Tool for Repairing and Enriching Knowledge Bases. ISWC-10, 2010.
        [LLMW06] H. Liu, C. Lutz, M. Milicic, F. Wolter. Updating Description Logic ABoxes. KR-06, 2006.
        [LM04] K. Lee, T. Meyer. A Classification of Ontology Modification. AI-04, 2004.
        [LPSV06] S.C. Lam, J. Pan, D. Sleeman, W. Vasconcelos. A Fine-grained Approach to Resolving Unsatisfiable Ontologies. WI-06, 2006.
        [LRV09] U. Lusch, S. Rudolph, D. Vrandecic. Tempus Fugit: Towards an Ontology Update Language. ESWC-09, 2009.
    163. References (11/18)
        [Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms. Springer, 2004.
        [MFRW00] D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. KR-00, 2000.
        [MHS09] B. Motik, I. Horrocks, U. Sattler. Bridging the Gap Between OWL and Relational Databases. Journal of Web Semantics, 7(2):74-89, 2009.
        [MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and Information Systems, 46(2):157-181, 2012.
        [MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.
        [MLBP06] T. Meyer, K. Lee, R. Booth, J.Z. Pan. Finding Maximally Satisfiable Terminologies for the Description Logic ALC. AAAI-06, 2006.
    164. References (12/18)
        [MMB12] P. Mendes, H. Muhleisen, C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. LWDM-12, 2012.
        [MMS+03] A. Maedche, B. Motik, L. Stojanovic, R. Studer, R. Volz. An Infrastructure for Searching, Reusing and Evolving Distributed Ontologies. WWW-03, 2003.
        [MRF08] M. Moguillansky, N. Rotstein, M. Falappa. A Theoretical Model to Handle Ontology Debugging and Change through Argumentation. IWOD-08, 2008.
        [MSCK05] M. Magiridou, S. Sahtouris, V. Christophides, M. Koubarakis. RUL: A Declarative Update Language for RDF. ISWC-05, 2005.
        [MWK00] P. Mitra, G. Wiederhold, M.L. Kersten. A Graph-oriented Model for Articulation of Ontology Interdependencies. EDBT-00, 2000.
    165. References (13/18)
        [NCLM06] N. Noy, A. Chugh, W. Liu, M. Musen. A Framework for Ontology Evolution in Collaborative Environments. ISWC-06, 2006.
        [NKKM04] N. Noy, S. Kunnatur, M. Klein, M. Musen. Tracking Changes During Ontology Evolution. ISWC-04, 2004.
        [NM00] N.F. Noy, M.A. Musen. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. AAAI/IAAI-00, 2000.
        [OK02] D. Ognyanov, A. Kiryakov. Tracking Changes in RDF(S) Repositories. EKAW-02, 2002.
    166. References (14/18)
        [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.
        [PM10] A. Passant, P.N. Mendes. SparqlPuSH: Proactive Notification of Data Updates in RDF Stores Using PubSubHubbub. SFSW-10, 2010.
        [PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.
        [PT06] P. Plessers, O. de Troyer. Resolving Inconsistencies in Evolving Ontologies. ESWC-06, 2006.
        [PTC05] P. Plessers, O. de Troyer, S. Casteleyn. Event-based Modeling of Evolution for Semantic-driven Systems. CAiSE-05, 2005.
        [PTC07] P. Plessers, O. de Troyer, S. Casteleyn. Understanding Ontology Evolution: A Change Detection Approach. Web Semantics: Science, Services and Agents on the WWW, 2007.
    167. References (15/18)
        [QD09] G. Qi, J. Du. Model-based Revision Operators for Terminologies in Description Logics. IJCAI-09, 2009.
        [QHHP08] G. Qi, P. Haase, Z. Huang, J.Z. Pan. A Kernel Revision Operator for Terminologies. DL-08, 2008.
        [QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.
        [QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.
        [QP07] G. Qi, J. Pan. A Stratification-based Approach for Inconsistency Handling in Description Logics. IWOD-07, 2007.
    168. References (16/18)
        [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.
        [RH09] T. Ravn, M. Hoedbolt. How to Measure and Monitor the Quality of Master Data. 2009. Available at: http://www.informationmanagement.com/issues/2007_58/master_data_management_mdm_quality-100153581.html
        [RHTA10] C. Riess, N. Heino, S. Tramp, S. Auer. EvoPat - Pattern-based Evolution and Refactoring of RDF Knowledge Bases. ISWC-10, 2010.
        [RPH+12] A. Rula, M. Palmonari, A. Harth, S. Stadtmüller, A. Maurino. On the Diversity and Availability of Temporal Information in Linked Open Data. ISWC-12, 2012.
        [RSDT08] T. Redmond, M. Smith, N. Drummond, T. Tudorache. Managing Change: An Ontology Version Control System. OWLED-08, 2008.
        [RW07] M.M. Ribeiro, R. Wassermann. Base Revision in Description Logics - Preliminary Results. IWOD-07, 2007.
        [RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.
    169. References (17/18)
        [SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.
        [SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.
        [SK03] H. Stuckenschmidt, M. Klein. Integrity and Change in Modular Ontologies. IJCAI-03, 2003.
        [SP10] Y. Stavrakas, G. Papastefanatos. Supporting Complex Changes in Evolving Interrelated Web Databanks. CoopIS-10, 2010.
        [SSN+10] H. Van de Sompel, R. Sanderson, M.L. Nelson, L.L. Balakireva, H. Shankar, S. Ainsworth. An HTTP-Based Versioning Mechanism for Linked Data. LDOW-10, 2010.
        [TSBM10] J. Tao, E. Sirin, J. Bao, D.L. McGuinness. Integrity Constraints in OWL. AAAI-10, 2010.
        [TTA08] Y. Tzitzikas, Y. Theoharis, D. Andreou. On Storage Policies for the Semantic Web Repositories that Support Version. ESWC-08, 2008.
        [TLZ12] Y. Tzitzikas, C. Lantzaki, D. Zeginis. Blank Node Matching and RDF/S Comparison Functions. ISWC-12, 2012.
    170. References (18/18)
        [VWS+05] M. Volkel, W. Winkler, Y. Sure, S. Kruk, M. Synak. SemVersion: A Versioning System for RDF and Ontologies. ESWC-05, 2005.
        [WHR+05] H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg. Debugging OWL-DL Ontologies: A Heuristic Approach. ISWC-05, 2005.
        [WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.
        [ZAA+13] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou. Ontology Evolution: A Process Centric Survey. Knowledge Engineering Review (to appear).
        [ZTC11] D. Zeginis, Y. Tzitzikas, V. Christophides. On Computing Deltas of RDF/S Knowledge Bases. ACM Transactions on the Web (TWEB) 5(3), 2011.
        [ZZL+03] Z. Zhang, L. Zhang, C.X. Lin, Y. Zhao, Y. Yu. Data Migration for Ontology Evolution. Poster ISWC-03, 2003.
