Information Quality in the Web Era
The tutorial was presented at CAISE 2010. It discusses the state of the art in research addressing the quality of data at the conceptual level (conceptual schemas) and of ontologies.

Information Quality in the Web Era Presentation Transcript

  • 1. Tutorial at CAISE 2010: Information Quality in the Web Era. C. Batini & Matteo Palmonari, Department of Computer Science, Communication and Systems, University of Milano Bicocca, [batini;palmonari]@disco.unimib.it
  • 2. Outline
      • Motivation [Palmonari]
          – the Web era / information quality meeting ontologies / the ontology landscape
      • Quality of data (conceptual level) [Batini]
          – frameworks / metamodels / dimensions / metrics / groups of schemas
      • Quality of ontologies [Palmonari]
          – frameworks / metamodels / dimensions / metrics
      • Conclusions [Palmonari+Batini]
  • 3. Outline
      • Motivation
          – the Web era / information quality meeting ontologies / the ontology landscape
      • Quality of data (conceptual level)
          – frameworks / metamodels / dimensions / metrics / groups of schemas
      • Quality of ontologies
          – frameworks / metamodels / dimensions / metrics
      • Conclusions
  • 4. The Web era is characterized by… the “Big Data” phenomenon
  • 5. How to make sense of all these data? 5
  • 6. Documents’ and duplicates’ size over time
  • 7. How to make sense of all these data? Data management needs data quality 7
  • 8. How to make sense of all these data? Data management needs data quality 8
  • 9. Data/information heterogeneity in Information Systems
      Information is available in different formats and is represented according to different models:
      • Structured data: Place: Portofino; Country: Italy; Population: 7.000; Main economic activity: Tourism
      • Image
      • Map (Portofino)
      • Text: “Dear Laure, I try to describe the wonderful harbour of Portofino as I have seen this morning a boat is going in, other boats are along the wharf. Small pretty buildings and villas are looking on to the harbour.”
      Need to consider information quality for heterogeneous information sources
  • 10. Tutorial Background – Data Quality (Structured Data). 23rd International Conference on Conceptual Modeling (ER 2004), Shanghai: “A Survey of Data Quality Issues in Cooperative Information Systems”. Carlo Batini, Università di Milano “Bicocca”, batini@disco.unimib.it; Monica Scannapieco, Università di Roma “La Sapienza”, monscan@dis.uniroma1.it
  • 11. Tutorial Background – Towards Information Quality (Heterogeneous Data). Tutorial at ER 08, Barcelona, Spain: “Quality of Data, Textual Information and Images: a comparative survey”. Speaker: C. Batini. Other authors: F. Cabitza, G. Pasi, R. Schettini. Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano Bicocca, Milano, Italy, batini@disco.unimib.it
  • 12. How to make sense of all these data?
      • Together with automatic techniques for information extraction, processing & integration, we also need automatic techniques for assessing the quality of information
      • Information quality for information shared, consumed and delivered on the Web
      • Increasing attention to information semantics
  • 13. Of course, the “Semantic Web” perspective
      • Make the semantics of information explicit with Web-compliant ontologies* by
          – sharing conceptualizations/terminologies on the Web
          – sharing data on the Web
      • Models, languages & technologies
          – E.g. RDF, RDFS, OWL, SKOS
      * For now, let’s consider a very broad definition: “An ontology is a specification of a conceptualization.” T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5(2):199-220, 1993.
  • 14. Ontologies out of the Semantic Web
      • Even for those who are skeptical about the Semantic Web, ontologies (e.g. OWL ontologies, linked data, thesauri) can be considered useful external resources to use in
          – Conceptual modeling
          – Data integration
          – Document management
          – Service Oriented Computing
          – Information retrieval
          – …
          – Software Engineering
          – Information System Design
  • 15. Ontology + “Information Systems” 15
  • 16. Ontology + “Software Engineering” 16
  • 17. Ontologies & Semantic Resources
      • KB – Axiomatic ontologies (e.g. SUMO)
          – Terminological (intensional/schema) level: concepts, relationships, axioms specifying logical constraints
          – Assertional (extensional/data) level: instances, typing, relations between instances
      • LD – Linked data on the Web (e.g. DBpedia)
          – RDF data, usually light-weight KBs
      • Th – Thesauri (e.g. WordNet)
          – Lexical ontologies: terms, no schema vs. instances distinction
      • In synthesis, the ontology landscape includes:
          – Shared vocabulary (KB, LD, Th)
          – Modeling principles (KB)
          – Logical theories supporting reasoning (KB)
          – Web-compliant representations of models and data (KB, LD, Th)
  • 18. Need for ontology evaluation
      • Ontology “Quality” → Ontology Evaluation
      • Quality of ontologies matters!
          – In particular, when ontologies:
              • are built to support specific applications (their quality impacts the application effectiveness)
              • are searched on the Web, reused, extended
          – Many ontologies to choose from!
          – E.g. suppose that you need an ontology describing customers and the business domain
  • 19. Searching for “Customer” with Sindice 19
  • 20. Searching for “Customer” with Watson 20
  • 21. Searching for “Customer” on Swoogle 21
  • 22. Searching for “Customer” on Swoogle (refined search) 22
  • 23. Ontologies and semantic resources should be considered in comprehensive studies about information quality in the Web era. Tough work! Let’s start from the beginning: ontologies and structured data
  • 24. Structured data and ontologies [diagram]
      • Structured data: instances, logical schemas, conceptual schemas; diagrammatic models (ER, UML, ORM)
      • Ontologies (KB): instances, schema; logical models supporting reasoning
      • Tight vs loose instance-schema coupling
      • In common: conceptual level representations; externalized models (semiotic objects); constraints on domain (data)
  • 25. Ontologies and their grandparents [diagram]
      • Structured data: instances, logical schemas, conceptual schemas; diagrammatic models (ER, UML, ORM)
      • Ontologies (KB): instances, schema / terminologies; logical models supporting reasoning
      • In common: conceptual level representations; externalized models (semiotic objects); constraints on domain (data)
      In this (mini) tutorial we will:
          – focus on the modeling level: “Quality of Conceptual Schemas and Ontologies”
          – provide a guided tour on the topic by discussing only part of the material (soon available online)
          – focus on common aspects and, in particular, differences
  • 26. Outline
      • Motivation
          – the Web era / information quality meeting ontologies / the ontology landscape
      • Quality of data (conceptual level)
          – frameworks / metamodels / dimensions / metrics / groups of schemas
      • Quality of ontologies
          – frameworks / metamodels / dimensions / metrics
      • Conclusions
  • 27. Outline
      • Motivation
          – the Web era / information quality meeting ontologies / the ontology landscape
      • Quality of Conceptual Schemas
          – frameworks / metamodels / dimensions / metrics / groups of schemas
      • Quality of ontologies
          – frameworks / metamodels / dimensions / metrics
      • Conclusions
  • 28. # of slides
      • About 130 → 30
      • I will mainly provide a guided introduction to the slides
  • 29. In a database, quality can be investigated…
      • At model (language) level
      • At schema (model) level
      • At instance (value/data) level
  • 30. Data quality dimensions 30
  • 31. Data quality dimensions by framework (Acronym: Data Quality Dimensions)
      – TDQM: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of manipulation, Value added, Free of error, Interpretability, Objectivity, Relevance, Reputation, Security, Timeliness, Understandability
      – DWQ: Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Accessibility (System, Transactional, Security), Usefulness (Interpretability), Timeliness (Currency, Volatility), Responsiveness, Completeness, Credibility, Accuracy, Consistency, Interpretability
      – TIQM: Inherent dimensions: Definition conformance (consistency), Completeness, Business rules conformance, Accuracy (to surrogate source), Accuracy (to reality), Precision, Nonduplication, Equivalence of redundant data, Concurrency of redundant data. Pragmatic dimensions: Accessibility, Timeliness, Contextual clarity, Derivation integrity, Usability, Rightness (fact completeness), Cost
      – AIMQ: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of operation, Freedom from errors, Interpretability, Objectivity, Relevancy, Reputation, Security, Timeliness, Understandability
      – CIHI: Dimensions: Accuracy, Timeliness, Comparability, Usability, Relevance. Characteristics: Over-coverage, Under-coverage, Simple/correlated response variance, Reliability, Collection and capture, Unit/Item non response, Edit and imputation, Processing, Estimation, Timeliness, Comprehensiveness, Integration, Standardization, Equivalence, Linkage ability, Product/Historical comparability, Accessibility, Documentation, Interpretability, Adaptability, Value
      – DQA: Accessibility, Appropriate amount of data, Believability, Completeness, Freedom from errors, Consistency, Concise Representation, Relevance, Ease of manipulation, Interpretability, Objectivity, Reputation, Security, Timeliness, Understandability, Value added
      – IQM: Accessibility, Consistency, Timeliness, Conciseness, Maintainability, Currency, Applicability, Convenience, Speed, Comprehensiveness, Clarity, Accuracy, Traceability, Security, Correctness, Interactivity
      – ISTAT: Accuracy, Completeness, Consistency
      – AMEQ: Consistent representation, Interpretability, Ease of understanding, Concise representation, Timeliness, Completeness, Value added, Relevance, Appropriateness, Meaningfulness, Lack of confusion, Arrangement, Readable, Reasonability, Precision, Reliability, Freedom from bias, Data Deficiency, Design Deficiency, Operation Deficiencies, Accuracy, Cost, Objectivity, Believability, Reputation, Accessibility, Correctness, Unambiguity, Consistency
      – COLDQ: Schema: Clarity of definition, Comprehensiveness, Flexibility, Robustness, Essentialness, Attribute granularity, Precision of domains, Homogeneity, Identifiability, Obtainability, Relevance, Simplicity/Complexity, Semantic consistency, Syntactic consistency. Data: Accuracy, Null Values, Completeness, Consistency, Currency, Timeliness, Agreement of Usage, Stewardship, Ubiquity. Presentation: Appropriateness, Correct Interpretation, Flexibility, Format precision, Portability, Consistency, Use of storage. Information policy: Accessibility, Metadata, Privacy, Security, Redundancy, Cost
      – DaQuinCIS: Accuracy, Completeness, Consistency, Currency, Trustworthiness
      – QAFD: Syntactic/Semantic accuracy, Internal/External consistency, Completeness, Currency, Uniqueness
      – CDQ: Schema: Correctness with respect to the model, Correctness with respect to Requirements, Completeness, Pertinence, Readability, Normalization. Data: Syntactic/Semantic Accuracy, Semantic Accuracy, Completeness, Consistency, Currency, Timeliness, Volatility, Completability, Reputation, Accessibility, Cost
  • 32. Reference for quality of data in databases, 2006
  • 33. Here we focus on
      • Model level
      • Schema level
      • Data level
  • 34. Quality of Conceptual Schemas: contents
      • Frameworks and Metamodels proposed
      • Quality of Schemas
          – Classifications, Dimensions & Metrics: main proposals
          – Comparison of proposals
          – Improving the quality of schemas
      • Quality of groups of schemas
          – Quality of Data Integration Architectures
          – Quality of the documentation for large related groups of schemas
  • 35. Quality of schemas 35
  • 36. Some figures on proposed approaches in the literature (from Mehmood 2009, citing Moody 2005). Frameworks and metamodels:

                                  Research   Practice   Mixed
      # of proposals              29         8          2
      % of total                  74%        21%        5%
      Empirically validated       6          0          1
      %                           20%        0%         50%
      Generalizable               5          0          0
      %                           17%        0%         0%
      Not generalizable           24         8          2
      %                           83%        100%       100%

      Generalizable means that the proposal can be applied to conceptual models in general and is not specific to, e.g., ER
  • 37. Metaschema of approaches [diagram]
      • Formal framework: concepts and paradigms involved in a formally grounded approach to quality
      • Metaschema: concepts and paradigms involved in the life cycle of quality, namely in the production, assessment and improvement activities
      • Classification: one/two or three level taxonomies
      • Quality dimension
      • Quality subdimension
      • Metrics
      • Examples
      • Experiments
  • 38. Proposals [placed on the metaschema diagram: Formal framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
      • Formal framework: Krogstie & Solvberg (the Scandinavians)
      • Metaschema: Shanks; Arab French; Vassiliadis
      • Quality dimensions: the origins – Batini et al.; Scandinavians; Arab French; Moody; Genero et al.; Herden; Poels
  • 39. Proposals [metaschema diagram: Formal framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 40. Frameworks for schema quality 40
  • 41. Krogstie and Solvberg framework [diagram]. Elements: Goal of modeling, Participant knowledge, Social actor interpretation, Modeling domain, Model externalization, Language extension, Technical actor interpretation. Quality types linking them: Physical, Empirical, Syntactic, Semantic, Perceived semantic, Pragmatic (social and technical), Social and Organizational quality
  • 42. Krogstie and Solvberg framework [same diagram], semantic quality: correspondence between the conceptual model and the domain
  • 43. Krogstie and Solvberg framework [same diagram], perceived semantic quality: correspondence between participant knowledge and individual interpretation
  • 44. Krogstie and Solvberg framework [same diagram], syntactic quality: correspondence between the conceptual model and the language
  • 45. Krogstie and Solvberg framework [same diagram], pragmatic quality: correspondence between the conceptual model and the audience’s interpretation of it
  • 46. Krogstie and Solvberg framework [same diagram], physical quality: correspondence between participant knowledge and the externalized conceptual model
      • Externalization: the knowledge of social actors has been externalized in the model
      • Internalizability: the model is persistent
  • 47. Krogstie and Solvberg framework [same diagram], empirical quality: it is reflected by the error frequency when a model is read or written, so by readability and clarity
  • 48. Krogstie and Solvberg framework [same diagram], social quality: agreement on participant knowledge and individual interpretation
  • 49. More formally
      • G, the goals of the modeling task.
      • L, the language extension, i.e., the set of all statements that are possible to make according to the graphemes, vocabulary, and syntax of the modeling languages used.
      • D, the domain, i.e., the set of all statements that can be stated about the situation at hand.
      • M, the model (schema) itself.
      • Ks, the relevant explicit knowledge of those being involved in modeling. A subset of these is actively involved in modeling, and their explicit knowledge is indicated by KM.
      • I, the social actor interpretation, i.e., the set of all statements that the audience thinks that an externalized model consists of.
      • T, the technical actor interpretation, i.e., the statements in the model as interpreted by modeling tools.
  • 50. Main quality types
      • Physical quality: the basic quality goal is that the model M is available for the audience.
      • Empirical quality deals with predictable error frequencies when a model is read or written by different users, coding (e.g. shapes of boxes) and HCI-ergonomics for documentation and modeling tools. For instance, graph layout to avoid crossing lines in a model is a means to address the empirical quality of a model.
      • Syntactic quality is the correspondence between the model M and the language extension L.
      • Semantic quality is the correspondence between the model M and the domain D. This includes validity and completeness.
      • Perceived semantic quality is the similar correspondence between the audience interpretation I of a model M and their current knowledge K of the domain D.
      • Pragmatic quality is the correspondence between the model M and the audience’s interpretation and application of it (I).
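
The set-based definitions on slides 49 and 50 can be illustrated with a few lines of Python. This is only a sketch under stated assumptions, not the authors' formalization: statements are modeled as plain strings, the three function names are hypothetical, and the toy sets are invented for the example.

```python
# Hypothetical sketch: M, L and D are modeled as Python sets of statements.

def syntactic_errors(model, language):
    """Statements in M that are not part of the language extension L."""
    return model - language

def invalid_statements(model, domain):
    """Validity check: statements in M that do not belong to the domain D."""
    return model - domain

def missing_statements(model, domain):
    """Completeness check: statements of the domain D that M does not contain."""
    return domain - model

# Invented toy data, for illustration only.
L = {"entity Student", "entity Course", "Student attends Course", "entity Car"}
D = {"entity Student", "entity Course", "Student attends Course"}
M = {"entity Student", "entity Car"}

print(syntactic_errors(M, L))    # empty set: every statement of M is expressible in L
print(invalid_statements(M, D))  # the statement about Car is not in the domain
print(missing_statements(M, D))  # the domain statements that M lacks
```

In this reading, syntactic quality is perfect when the first set is empty, while semantic quality is perfect when the second (validity) and third (completeness) sets are both empty.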
  • 51. Framework for language (model) quality 51
  • 52. Framework for language (model) quality [diagram]. Elements: Goal of modeling, Participant knowledge, Social actor interpretation, Modeling domain, Model externalization, Language extension, Technical actor interpretation. Appropriateness types linking them: Domain, Participant, Modeler, Comprehensibility, Tool and Organizational appropriateness
  • 53. Main quality types
      • Domain appropriateness: this relates the language and the domain. Ideally, the conceptual basis must be powerful enough to express anything in the domain, not having what is termed construct deficit. On the other hand, you should not be able to express things that are not in the domain, i.e. what is termed construct excess. Domain appropriateness is primarily a means to achieve semantic quality.
      • Participant appropriateness relates the social actors’ explicit knowledge to the language. Participant appropriateness is primarily a means to achieve pragmatic quality for comprehension, learning and action.
      • Modeler appropriateness: this area relates the language extension to the participant knowledge. The goal is that there are no statements in the explicit knowledge of the modeler that cannot be expressed in the language. Modeler appropriateness is primarily a means to achieve semantic quality.
  • 54. Main quality types
      • Comprehensibility appropriateness relates the language to the social actor interpretation. The goal is that the participants in the modeling effort using the language understand all the possible statements of the language. Comprehensibility appropriateness is primarily a means to achieve empirical and pragmatic quality.
      • Tool appropriateness relates the language to the technical audience interpretations. For tool interpretation, it is especially important that the language lend itself to automatic reasoning. This requires formality (i.e. both formal syntax and semantics being operational and/or logical), but formality is not necessarily enough, since the reasoning must also be efficient to be of practical use. This is covered by what we term analyzability (to exploit any mathematical semantics) and executability (to exploit any operational semantics). Different aspects of tool appropriateness are means to achieve syntactic, semantic and pragmatic quality (through formal syntax, mathematical semantics, and operational semantics).
      • Organizational appropriateness relates the language to standards and other organizational needs within the organizational context of modeling. These are means to support organizational quality.
  • 55. Metamodels 55
  • 56. Shanks et al. composite model [diagram spanning theory-based and practice-based concepts: Domain, Quality type, Means, Language, Goal, Property, Prqa, Model, Activity, Audience, Weighting, Quality factor, Rating, Evaluation method]
  • 57. Metamodels – Arab/French. Mehmood, Chefri et al. 2009, based on goals, question, metrics [diagram: Quality goal, Q. Dimension, Q. Attribute, Q. Metric, Model element, Transformation step, Transformation rule]
  • 58. Metamodel instantiation
      • Quality goal: Ease of change
      • Dimension: Complexity, Maintainability
      • Quality attribute: Simplicity, Structural complexity, Modularity, Understandability, Modifiability
      • Quality metric: # of associations, # of dependencies
      • Transformation: Merge entities, Divide the model
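
The instantiation on slide 58 can be pictured as a small goal/dimension/attribute/metric tree. The following Python sketch is illustrative only: the class names are hypothetical, and the way attributes are grouped under dimensions and metrics under attributes is an assumption, since the flattened slide does not show which element is linked to which.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str

@dataclass
class Attribute:
    name: str
    metrics: list = field(default_factory=list)

@dataclass
class Dimension:
    name: str
    attributes: list = field(default_factory=list)

@dataclass
class Goal:
    name: str
    dimensions: list = field(default_factory=list)

# Names are taken from slide 58; the nesting is an ASSUMPTION made for illustration.
goal = Goal("Ease of change", [
    Dimension("Complexity", [
        Attribute("Simplicity"),
        Attribute("Structural complexity",
                  [Metric("# of associations"), Metric("# of dependencies")]),
    ]),
    Dimension("Maintainability", [
        Attribute("Modularity"),
        Attribute("Understandability"),
        Attribute("Modifiability"),
    ]),
])

def metrics_for(goal):
    """Collect the names of all metrics reachable from a quality goal."""
    return [m.name
            for d in goal.dimensions
            for a in d.attributes
            for m in a.metrics]

print(metrics_for(goal))
```

Walking the tree from the goal down to the metrics mirrors the goal, question, metric style mentioned on slide 57.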
  • 59. Metamodels – Vassiliadis et al. For DWs [diagram: Quality goal, Q. Dimension, Factor, Improvement process, Interaction, Measurement method, Q. Metric, Transformation, Information System object, Measurement value, Date, Data object, Process object, Model object]
  • 60. Comparison [the two metamodels side by side]
      • Vassiliadis: Quality goal, Q. Dimension, Factor, Improvement process, Interaction, Measurement method, Q. Metric, Transformation, Information System object, Measurement value, Date, Data object, Process object, Model object
      • Mehmood: Quality goal, Q. Dimension, Q. Attribute, Q. Metric, Model element, Transformation step, Transformation rule
  • 61. Schema Quality Dimensions 61
  • 62. The origins… Batini, Ceri, Navathe 1991 [metaschema diagram: Formal framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 63. Batini, Ceri, Navathe 1991 (Q. Dimension: Definition)
      – Completeness / Pertinence: Represents all (only) relevant features of requirements
      – Correctness (syntactic): Concepts are properly defined in the schema
      – Correctness (semantic): Concepts are used according to their definitions
      – Minimality: Every aspect of the requirements appears only once in the schema
      – Expressiveness: Can be easily understood
      – Readability: Diagram respects aesthetic criteria
      – Self-explanation: Other formalisms and languages not needed
      – Extensibility: Easily adapted to changing requirements
      – Normality: From the theory of normalization
  • 64. Completeness
      Completeness measures the extent to which a conceptual schema includes all the conceptual elements necessary to meet some specified requirements. It is possible that the designer has not included in the schema certain characteristics present in the requirements, e.g., attributes related to an entity Person; in this case, the schema is incomplete.
      Example: the requirements state “Students have a code, a name, a place of birth.”; the schema contains the entity Student with only the attributes Code and Name.
  • 65. Pertinence
      Pertinence measures how many unnecessary conceptual elements are included in the conceptual schema. In the case of a schema that is not pertinent, the designer has gone too far in modeling the requirements, and has included too many concepts.
      Example: the requirements state “Students have a code and a name.”; the schema contains the entity Student with the attributes Code, Name and Place_of_Birth.
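
The Student examples on slides 64 and 65 can be turned into a small computation. The slides give no formulas, so the ratio and the set difference below are assumptions chosen for illustration, and the function names and the `Entity.Attribute` naming of schema elements are hypothetical.

```python
def completeness(required, schema):
    """Share of the required elements that the schema actually contains."""
    return len(required & schema) / len(required)

def unnecessary(required, schema):
    """Schema elements that no requirement asks for (pertinence violations)."""
    return schema - required

# Slide 64: the requirements mention code, name and place of birth.
req_64 = {"Student.Code", "Student.Name", "Student.Place_of_Birth"}
schema_64 = {"Student.Code", "Student.Name"}

# Slide 65: the requirements mention only code and name.
req_65 = {"Student.Code", "Student.Name"}
schema_65 = {"Student.Code", "Student.Name", "Student.Place_of_Birth"}

print(round(completeness(req_64, schema_64), 2))  # 0.67: one required attribute is missing
print(unnecessary(req_65, schema_65))             # {'Student.Place_of_Birth'}
```

The first schema is incomplete but pertinent; the second is complete but not pertinent.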
  • 66. Correctness (syntactic)
      Concerns the correct use of the categories of the model in representing requirements.
      Example: in the Entity Relationship model we may represent the logical link between persons and their first names using the two entities Person and FirstName and a relationship between them. The schema is not correct wrt the model, since an entity should be used only when the concept has a unique existence in the real world and has an identifier.
      [ER diagram: Student (1,n) has (1,1) First Name]
  • 67. Correctness (semantic)
      Correctness with respect to requirements concerns the correct representation of the requirements in terms of the model categories.
      Example: in an organization each department is headed by exactly one manager and each manager may head exactly one department. If we represent Manager and Department as entities, the relationship between them should be one-to-one; in this case, the schema is correct wrt requirements. If we use a one-to-many relationship, the schema is incorrect.
      [ER diagram: Manager (1,n) heads (1,1) Department]
  • 68. Minimality/Redundancy
      A schema is minimal if every part of the requirements is represented only once in the schema. In other words, it is not possible to eliminate some element from the schema without compromising the information content.
      [ER diagram: entities Student, Course and Instructor; relationships Attends, Assigned to and Teaches; cardinalities 1,n and 1,?]
  • 69. Expressiveness/Readability
      Intuitively, a schema is readable whenever it represents the meaning of the reality represented by the schema in a clear way for its intended use. This simple, qualitative definition is not easy to translate in a more formal way, since the evaluation expressed by the word clearly conveys some elements of subjectivity. In models, such as the Entity Relationship model, that provide a graphical representation of the schema, called a diagram, readability concerns both the diagram and the schema itself.
  • 70. Diagrammatic readability
      With regard to the diagrammatic representation, readability can be expressed objectively by a number of aesthetic criteria that human beings adopt in drawing diagrams:
      1. Crossings between lines should be minimized.
      2. Graphic symbols should be embedded in a grid.
      3. Lines should be made of horizontal or vertical segments.
      4. The number of bends in lines should be minimized.
      5. The total area of the diagram should be minimized.
      6. Parents in generalization hierarchies should be positioned at a higher level in the diagram with respect to children.
      7. The children entities in the generalization hierarchy should be symmetrical with respect to the parent entity.
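
Two of the aesthetic criteria above, crossings and bends, are easy to measure on a drawn diagram. The Python sketch below is a minimal illustration, assuming connecting lines are given as straight segments or polylines with integer coordinates; the function names and the sample coordinates are hypothetical and not taken from the tutorial.

```python
from itertools import combinations

def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1, or 0 if collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def crosses(s1, s2):
    """True if two segments properly cross; segments sharing an endpoint do not count."""
    a, b = s1
    c, d = s2
    if {a, b} & {c, d}:
        return False
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def count_crossings(segments):
    """Criterion 1: number of pairs of segments that cross."""
    return sum(crosses(s, t) for s, t in combinations(segments, 2))

def count_bends(polyline):
    """Criterion 4: number of interior points where the line changes direction."""
    return sum(orient(polyline[i - 1], polyline[i], polyline[i + 1]) != 0
               for i in range(1, len(polyline) - 1))

# Invented layout: two diagonals that cross, plus one vertical line apart from them.
segments = [((0, 0), (2, 2)), ((0, 2), (2, 0)), ((3, 0), (3, 2))]
print(count_crossings(segments))                     # 1
print(count_bends([(0, 0), (1, 0), (1, 1), (2, 1)])) # 2
```

Lower counts indicate a layout that better satisfies criteria 1 and 4; the other criteria (grid embedding, area, symmetry) would need additional layout information.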
  • 71. Unreadable schema [ER diagram over the concepts Department, Employee, Vendor, Worker, Engineer, Floor, City, Warehouse, Item, Order, Purchase, Warranty, Type and the relationships Works, Manages, Head, Located, Born, Produces, Acquires, In, Of] © C. Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 72. A readable schema [the same ER diagram, redrawn] © C. Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 73. Is diagrammatic readability objective? [the two layouts of the schema, annotated with syntactic (SYNT) and semantic (SEM) criteria]
      • SEM: Place entities in generalizations close together
      • SYNT: Minimize bends
      • SYNT: Minimize crossings
      • SEM: Place most important concept in the middle
      • SYNT: Use only horizontal
      • Minimize crossings… Don’t change at all!
      © C. Batini, 2009
  • 74. But… personal experience in China, Beda University, about 1985
      Question to Chinese professors: which one of the two diagrams do you like more? [the two layouts of the schema, side by side]
      Answer: definitely the left one, we like asymmetry and movement…
      © C. Batini, 2009
  • 75. Expressiveness
      The second issue addressed by readability is the compactness of schema representation. Among the different conceptual schemas that equivalently represent a certain reality, we prefer the one or the ones that are more compact, because compactness favors readability.
  • 76. Transformation that preserves information content and enhances compactness/expressiveness [diagram: the three Born relationships linking Vendor, Worker and Engineer to City are replaced by a single Born relationship between their parent entity Employee and City]
  • 77. Normalization
      • Unnormalized ER schema: entity Employee-Project with attributes Employee #, Salary, Project #, Budget, Role
      • Normalized ER schema: Employee (Employee #, Salary) (1,n) Assigned to (Role) (1,n) Project (Project #, Budget)
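
The decomposition on slide 77 can be sketched on sample rows. This is an illustration under assumptions: the dependencies Employee # → Salary, Project # → Budget and (Employee #, Project #) → Role are inferred from the normalized schema rather than stated on the slide, and the function name, field names and data values are invented.

```python
def normalize(rows):
    """Split unnormalized Employee-Project rows into Employee, Project and Assigned to."""
    employees = {r["employee_no"]: r["salary"] for r in rows}
    projects = {r["project_no"]: r["budget"] for r in rows}
    assigned_to = {(r["employee_no"], r["project_no"]): r["role"] for r in rows}
    return employees, projects, assigned_to

# Invented sample data: salary and budget are repeated in the unnormalized rows.
rows = [
    {"employee_no": 1, "salary": 50000, "project_no": 10, "budget": 200000, "role": "Designer"},
    {"employee_no": 1, "salary": 50000, "project_no": 20, "budget": 80000, "role": "Tester"},
    {"employee_no": 2, "salary": 60000, "project_no": 10, "budget": 200000, "role": "Manager"},
]

employees, projects, assigned_to = normalize(rows)
print(employees)    # each salary now stored once per employee
print(projects)     # each budget now stored once per project
print(assigned_to)  # role depends on the (employee, project) pair
```

After the split, changing a salary or a budget touches a single entry instead of every row in which it was repeated.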
  • 78. Scandinavians (1994–) [metaschema diagram: Formal framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 79. Scandinavians (1994–) [metaschema diagram: Formal framework, Metaschema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 80. Main model (schema) quality dimensions
      • Physical quality
          – Externalization: number of statements on the domain not yet stated in the model / total # of statements
          – Internalizability
              • Persistence: protection against loss or damage
              • Availability: usual meaning
      • Empirical quality deals with readability by the audience. Expressed in terms of graph aesthetics and graph layout criteria
      • Syntactic quality: correspondence between the model (schema) and the language (model), where errors are due to
          – Syntactic invalidity: words or graphemes not part of the language are used
          – Syntactic incompleteness: the model lacks constructs to obey the language’s grammar (e.g. using only one cardinality to express minimum and maximum cardinalities)
      • Semantic quality
          – (Feasible) Validity: the statements in the model are correct and relevant for the problem
          – (Feasible) Completeness: the model contains all the statements which would be correct and relevant
      • Perceived semantic quality: the correspondence between the actor interpretation of the model and her current knowledge of the domain
          – Validity
          – Completeness
      • Pragmatic quality: the correspondence between the model and the audience interpretation of it
          – (Feasible) Comprehension: the actors understand the model, or else individual actors understand the part of the model relevant to them
      • Social quality: agreement in knowledge, agreement in model interpretation
      • Knowledge quality, which is perfect when the audience knew everything about the domain at a given time
          – Validity
          – Completeness
  • 81. Language quality dimensions - 1
      May refer (a) to the language or else (b) to the relationship between the language and other issues.
      In the first case may refer to:
          – the constructs of the language
          – the external visual representation
      For both:
          – Perceptibility: how easy language comprehension is for persons
          – Expressive power: what it is possible to express in the language
          – Expressive economy: how effectively things can be expressed in the language
          – Method/tool potential: how easily the language lends itself to proper method or tool support
          – Reducibility: what features are provided by the language to deal with large and complex models
  • 82. Language quality dimensions - 2. Referring to the relationship between the language and other issues. Domain appropriateness: there are no statements in the domain that cannot be expressed in the language. Participant knowledge appropriateness: the statements in the language models are part of the explicit knowledge of the participants. Knowledge externalizability appropriateness: there are no statements in the explicit knowledge of the participants that cannot be expressed in the language. Comprehensibility appropriateness. Technical actor interpretation appropriateness.
  • 83. More on pragmatic quality •  Social pragmatic quality (to what extent people understand and are able to use the models) and technical pragmatic quality (to what extent tools can be built that interpret the models).
  • 84. Arab-French approach (2002-). [Roadmap table: Framework, Formal schema, Meta classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 85. Cherfi et al. classification •  Specification –  Legibility •  Clarity •  Minimality –  Non-redundancy –  Factorization degree –  Aggregation degree –  Expressiveness •  Concept expressiveness •  Schema expressiveness –  Simplicity –  Correctness •  Usage –  Understandability •  Documentation degree •  User vocabulary •  Concept independence degree –  Completeness •  Requirements coverage degree •  Cross-modeling completeness •  Implementation –  Implementability –  Maintainability •  Modifiability •  Cohesion •  Coupling
  • 86. Definitions - 1 (quality dimension: definition). Clarity: an aesthetic criterion, based on the graphical arrangement. Minimality: every aspect of the requirements appears only once. Minimality, non-redundancy: no concept can be removed without decreasing the information content. Minimality, factorization degree: measures the effectiveness of the inheritance hierarchies of the schema. Minimality, aggregation degree: measures the efficient use of aggregate attributes in the schema. Expressiveness: the schema can be easily understood without additional explanation. Expressiveness, concept and schema expressiveness: compactness. Simplicity: the schema contains the minimum possible constructs. Correctness (syntactical): concepts are properly defined in the schema. Understandability (model): the ease with which the data model can be interpreted by the user. Understandability (schema): how far modeling features are made explicit. Understandability, documentation degree: presence of additional documentation for concepts. Understandability, user vocabulary rate: users can easily map between the schema and the requirements. Understandability, concept independence degree: “short paths” for semantic interconnections (e.g. A ISA B).
  • 87. Definitions - 2 (quality dimension: definition). Completeness: the schema represents all relevant features in the requirements. Completeness, requirements coverage: correspondence between concepts in the schema and relevant terms in the requirements. Completeness, cross-modeling completeness: presence in a schema S1 of all concepts in the schemas of a set. Implementability: amount of effort needed to implement the schema. Implementability (metric): overall semantic distance between concepts in the source model and concepts in the target model. Maintainability: ease with which the schema can evolve. Maintainability, modifiability: # of modifications related to a concept modification, deriving from dependencies. Maintainability, cohesion: existence of clusters with a high # of internal links compared with external links. Maintainability, coupling: existence of clusters with a low # of links between clusters.
  • 88. Cherfi et al. classification: metrics (examples). Specification: Legibility: –  Clarity: # of concepts – number of crossings in the diagram. –  Minimality: •  Non-redundancy: (# weighted concepts - # redundant concepts) / total # weighted concepts. •  Factorization degree. •  Aggregation degree. Expressiveness: –  Concept expressiveness. –  Schema expressiveness. Simplicity. Correctness.
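The non-redundancy ratio on the slide above can be sketched in a few lines. This is only an illustration: the slide does not say how concept weights are assigned, so the sketch assumes the weighted counts are already available as numbers, and the function name `non_redundancy` is hypothetical.

```python
def non_redundancy(weighted_concepts: float, redundant_concepts: float) -> float:
    """(# weighted concepts - # redundant concepts) / total # weighted concepts.

    Both arguments are assumed to be pre-computed weighted counts;
    how the weights are derived is not specified on the slide.
    """
    if weighted_concepts <= 0:
        raise ValueError("the schema must contain at least one weighted concept")
    if not 0 <= redundant_concepts <= weighted_concepts:
        raise ValueError("redundant concepts must be between 0 and the total")
    return (weighted_concepts - redundant_concepts) / weighted_concepts


if __name__ == "__main__":
    # Hypothetical schema: 20 weighted concepts, 5 of them redundant.
    print(non_redundancy(20, 5))
```

A value of 1 means no redundant concept was found; lower values indicate more redundancy.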
  • 89. Metrics for structural complexity •  # of associations •  # of dependencies •  # of aggregations •  Depth of inheritance tree: longest path from the root of a hierarchy to the leaves
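As an illustration of the last metric, the depth of an inheritance tree can be computed recursively. The sketch below assumes the hierarchy is an acyclic tree given as a mapping from each class to its direct subclasses; the representation and the name `depth_of_inheritance` are assumptions, not taken from the tutorial.

```python
from typing import Dict, List


def depth_of_inheritance(children: Dict[str, List[str]], root: str) -> int:
    """Length (in edges) of the longest path from `root` down to a leaf.

    `children` maps a class name to its direct subclasses; the hierarchy
    is assumed to be acyclic.
    """
    subclasses = children.get(root, [])
    if not subclasses:
        return 0  # a leaf has depth 0
    return 1 + max(depth_of_inheritance(children, c) for c in subclasses)


if __name__ == "__main__":
    # Hypothetical hierarchy: Person <- Employee <- Engineer, Person <- Customer
    hierarchy = {"Person": ["Employee", "Customer"], "Employee": ["Engineer"]}
    print(depth_of_inheritance(hierarchy, "Person"))
```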
  • 90. Moody 1998-. [Roadmap table: Framework, Formal schema, Meta classification, Method for quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 91. Moody’s classification •  Completeness •  Integrity •  Flexibility •  Understandability •  Correctness •  Simplicity •  Implementability •  Integration: quality of related groups of schemas (see later)
  • 92. Moody’s classification of quality dimensions and metrics - 1 (dimension/metric: definition). Completeness: the schema contains all the information required to meet the requirements. Completeness M1: # of items that do not correspond to requirements. Completeness M2: # of requirements not represented in the schema. Completeness M3: # of items that inaccurately represent requirements. Completeness M4: # of inconsistencies in the schema. Integrity: extent to which the business rules on data are enforced by the schema. Integrity M1: # of business rules not enforced by the schema. Integrity M2: # of integrity constraints in the schema that do not accurately represent business rules. Flexibility: the ease with which the schema can cope with business change. Flexibility M1: # of elements in the schema that are subject to change in the future. Flexibility M2: estimated cost of changes. Flexibility M3: strategic importance of change.
  • 93. Moody’s classification of quality dimensions and metrics - 2 (dimension/metric: definition). Understandability: ease with which the schema can be understood. Understandability M1: user rating. Understandability M2: ability of users to interpret the model correctly. Understandability M3: application developer rating. Correctness: the schema conforms to the rules of the conceptual model. Correctness M1: # of violations of model conventions. Correctness M2: intra-entity redundancy, # of normal form violations. Correctness M3.a: inter-entity redundancy, # of redundant concepts in the schema.
  • 94. Moody’s classification of quality dimensions and metrics - 3 (dimension/metric: definition). Simplicity: the schema contains the minimum possible constructs. Simplicity M1: # of entities. Simplicity M2: # of entities + relationships. Simplicity M3: # of entities + relationships + attributes. Implementability: ease with which the schema can be implemented within time, budget and technology constraints. Implementability M1: technical risk rating. Implementability M2: schedule risk rating. Implementability M3: development cost estimate.
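The three simplicity metrics are plain counts, so they are easy to sketch. The schema representation below (three lists of names) and the function name `simplicity_metrics` are assumptions made for illustration only.

```python
from typing import Dict, Sequence


def simplicity_metrics(
    entities: Sequence[str],
    relationships: Sequence[str],
    attributes: Sequence[str],
) -> Dict[str, int]:
    """Simplicity M1-M3 as listed above, computed as cumulative counts."""
    m1 = len(entities)
    m2 = m1 + len(relationships)
    m3 = m2 + len(attributes)
    return {"M1": m1, "M2": m2, "M3": m3}


if __name__ == "__main__":
    # Hypothetical schema with two entities, one relationship, five attributes.
    print(simplicity_metrics(
        entities=["Person", "Address"],
        relationships=["ResidenceAddress"],
        attributes=["Name", "Surname", "StreetName", "Number", "City"],
    ))
```

Lower values indicate a simpler schema, so these counts are mainly useful for comparing alternative schemas for the same requirements.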
  • 95. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009). Semiotic clarity: there should be a 1:1 correspondence between semantic constructs and graphical symbols (symbol redundancy, symbol overload, symbol excess, symbol deficit). Perceptual discriminability: different symbols should be clearly distinguishable from each other (visual distance, discriminability threshold). Semantic transparency: use visual representations whose appearance suggests their meaning, where symbols can be semantically immediate, semantically opaque, semantically perverse, or semantically translucent.
  • 96. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009). Complexity management: include explicit mechanisms for dealing with complexity (modularization, abstraction). Cognitive integration: include explicit mechanisms to support the integration of information from different diagrams (conceptual integration, contextualization, perceptual integration, wayfinding).
  • 97. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009). Visual expressiveness: use the full range and capacities of visual variables (degree of visual freedom, saturation). Dual coding: use text to complement graphics. Graphic economy: the number of different graphical symbols should be cognitively manageable (symbol deficit). Cognitive fit: use different visual dialects for different tasks and audiences (visual mono/plurilinguism).
  • 98. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009): interactions among principles. Semiotic Clarity can affect Graphic Economy either positively or negatively: symbol excess and symbol redundancy increase graphic complexity, while symbol overload and symbol deficit reduce it. Perceptual Discriminability increases Visual Expressiveness as it involves using more visual variables and a wider range of values (a side effect of increasing visual distance); similarly, Visual Expressiveness is one of the primary ways of improving Perceptual Discriminability. Increasing Visual Expressiveness reduces the effects of graphic complexity, while Graphic Economy defines limits on Visual Expressiveness (how much information can be effectively encoded graphically). Increasing the number of symbols (Graphic Economy) makes it more difficult to discriminate between them (Perceptual Discriminability). Perceptual Discriminability, Complexity Management, Semantic Transparency, Graphic Economy, and Dual Coding improve effectiveness for novices, though Semantic Transparency can reduce effectiveness for experts (Cognitive Fit). Semantic Transparency and Visual Expressiveness can make hand drawing more difficult (Cognitive Fit).
  • 99. Others…
  • 100. Genero et al. 2005. [Roadmap table: Framework, Formal schema, Meta classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 101. Genero et al. classification. Maintainability is influenced by the following subcharacteristics: •  Understandability: the ease with which the conceptual data model can be understood. •  Legibility: the ease with which the conceptual data model can be read, with respect to certain aesthetic criteria [13]. •  Simplicity: the conceptual data model contains the minimum number of constructs possible. •  Analysability: the capability of the conceptual data model to be diagnosed for deficiencies or for parts to be modified. •  Modifiability: the capability of the conceptual data model to enable a specified modification to be implemented. •  Stability: the capability of the conceptual data model to avoid unexpected effects from modifications. •  Testability: the capability of the conceptual data model to enable modifications to be validated.
  • 102. Herden. [Roadmap table: Framework, Formal schema, Meta classification, Quality dimension, Metadata, Quality subdimension, Metrics, Examples, Experiments]
  • 103. Herden classification •  Correctness •  Consistency •  Scope •  Level of detail •  Completeness •  Minimality •  Ability of integration (see later) •  Readability
  • 104. Herden (dimension: definition). (Technical) correctness: correctness of concepts with respect to requirements. (Technical) consistency: absence of contradiction. Scope: comprehensiveness with respect to general user acceptance. Level of detail: adequacy of detail with respect to user acceptance. Completeness: completeness with respect to requirements. Minimality: compactness and absence of redundancies. Readability: completeness of documentation.
  • 105. Metadata in Herden’s classification •  Description •  Relevance •  Measuring •  Metric •  Degree of automation •  Objectivity
  • 106. Poels et al. [Roadmap table: Framework, Formal schema, Meta classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 107. Poels et al. Interested in: •  Perceived semantic quality •  Perceived pragmatic quality. The aim is to understand their relationship with: 1.  Perceived ease of use (efficiency) 2.  Perceived usefulness (effectiveness) and 3.  User information satisfaction
  • 108. Poels et al. classification (quality #: definition). PSQ1: the schema represents the business process correctly. PSQ2: the schema is a realistic representation of the business process. PSQ3: the schema contains contradicting elements. PSQ4: the schema contains redundant elements. PSQ5: elements must be added to faithfully represent the business process. PSQ6: all the elements in the conceptual schema are relevant for the representation of the business process. PSQ7: the schema gives a complete representation of the business process. @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 109. Poels et al. classification (quality #, quality dimension: definition). PSQ1, correctness/validity: the schema represents the business process correctly. PSQ2, feasible correctness/validity: the schema is a realistic representation of the business process. PSQ3, coherence: the schema contains contradicting elements. PSQ4, non-redundancy: the schema contains redundant elements. PSQ5, ???: elements must be added to faithfully represent the business process. PSQ6, relevance: all the elements in the conceptual schema are relevant for the representation of the business process. PSQ7, completeness: the schema gives a complete representation of the business process.
  • 110. Poels et al. general findings. [Figure: path model relating perceived semantic quality, perceived ease of use, perceived usefulness and user information satisfaction; path coefficients shown: 0.1, 0.38, 0.58, 0.35, 0.29]
  • 111. Comparison among proposals
  • 112. Physical and empirical quality. Columns (author(s) / types of qualities): Batini et al. 91; Scandinavian 94-; Moody 98-; Arab-French 02-; Genero et al. 2005; Herden; Poels. Physical quality: Externalization x; Persistence x; Availability x. Empirical quality: Minimality x x x; Readability/legibility x x x x x; Expressiveness x x x; Simplicity/self-explanation x x x x; Graph aesthetics/readability/clarity x x x; Understandability X-3 x x.
  • 113. Syntactic and semantic quality. Columns (author(s) / types of qualities): Batini et al. 91; Scandinavian 94-; Moody 98-; Arab-French 02-; Genero et al. 2005; Herden; Poels. Syntactic quality: Invalidity x x x x; Incompleteness x. Semantic quality: Validity/correctness x x X-1 x x; Feasible validity x x; Normality x; Integrity X-2 x x; Completeness x x X-4 x x; Level of detail x; Scope x; Relevance/pertinence x x x; Perceived semantic quality x; Analyzability x; Testability x.
  • 114. Pragmatic, knowledge and process quality. Columns (author(s) / types of qualities): Batini et al. 91-; Scandinavian 94-; Moody 98-; Arab-French 02-; Genero et al. 05; Herden; Poels. Pragmatic quality: Comprehension x. Social quality x: Agreement in knowledge x; Agreement in model interpretation. Knowledge quality: Completeness x; Validity x. Process quality: Implementability x; Stability x; Maintainability/flexibility/extensibility x X-3 x.
  • 115. Specific dimensions
  • 116. Sheldon classification for inheritance hierarchies. Viewpoints: •  (1) The deeper a class is in the hierarchy, the higher the degree of method inheritance, making it more complex to predict its behavior. •  (2) Deeper trees constitute greater design complexity, since more methods and classes are involved. •  (3) The deeper a particular class is in the hierarchy, the greater the potential reuse of inherited methods.
  • 117. Sheldon classification for inheritance hierarchies •  Maintainability •  Understandability •  Modifiability
  • 118. Schema and Data Quality together
  • 119. When a schema is defined, quality can be achieved working both on the schema and on the instance. (a) Person (ID, Name, Surname, Address): 1, John, Smith, 113 Sunset Avenue 60601 Chicago; 2, Mark, Bauer, 113 Sunset Avenue 60601 Chicago; 3, Ann, Swenson, 4 Heroes Street Denver. (b) Person (ID, Name, Surname): 1, John, Smith; 2, Mark, Bauer; 3, Ann, Swenson. Address (ID, StreetPrefix, StreetName, Number, City): A11, Avenue, Sunset, 113, Chicago; A12, Street, 4 Heroes, null, Denver. ResidenceAddress (PersonID, AddressID): 1, A11; 2, A11; 3, A12.
  • 120. Experimentally investigated by the Arab-French approach: impact of quality at schema level on quality at data level, and their interdependencies
  • 121. Improving the quality of schemas
  • 122. Methods •  Origins: achieving normal form through decomposition techniques •  Scandinavian: derived from the framework •  Through schema transformations
  • 123. Derived from the framework: syntactic quality •  Error prevention through syntax-directed editors •  Error detection through syntactic checks
  • 124. Derived from the framework: semantic quality •  Consistency checking –  Based on a logical description –  Based on constructivity, namely through properties of the generation process (Langefors et al.) –  Use of driving questions to improve completeness
  • 125. Derived from the framework: pragmatic quality •  Audience training •  Inspection and walkthroughs •  Transformations (see also later) –  Rephrasing –  Filtering •  Translation –  Explanation generation –  Model execution •  Documentation •  Prototyping
  • 126. Derived from the framework: social quality •  Integration –  Intra-project –  Inter-project –  Inter-organizational •  Integration process –  Pre-integration –  Viewpoint comparison –  Viewpoint conforming –  Merging and restructuring
  • 127. Through schema transformation
  • 128. The Assenova-Johannesson approach: dimensions considered (dimension: definition). Explicitness: requirements are represented at the schema level, not at the instance level. Size: # of entities + relationships + attributes. Rule simplicity: a high # of business rules are represented by simple types of constraints. Rule uniformity: cardinality constraints are uniform. Query simplicity: simple retrieval requirements correspond to simple queries on the schema. Stability: small changes in requirements result in small changes in the schema.
  • 129. Dimensions and transformations. Columns: Explicitness; Size; Rule simplicity; Rule uniformity; Query simplicity; Stability. Partial attributes: - + +. Non-surjective attributes: - + +. Partial attributes which are total in union: + + - +. Non-surjective attributes surjective in union: + + - +. M-N attributes: - + - +. Lexical attributes: - + - +. Attributes with fixed ranges: + - = +. Two non-disjoint entities: + - + +. Non-unary “overloaded” attributes: +/- +.
  • 130. Example transformation: partial attribute. - The size of the schema increases (-). - Introducing the entity EMPLOYEE results in: - increased rule uniformity (+) (all attributes are total); - increased stability (+).
  • 131. Example of increased stability. Introducing different categories of employees can be done in the new schema without violating rule simplicity. The same cannot be done in the old schema. [Figure: old schema vs new schema]
  • 132. Quality in data integration architectures
  • 133. The approach of Akoka et al. (2007). General statement: in DI architectures, quality of data and quality of schemas have to be considered together. •  Qualities at schema level –  Completeness –  Understandability –  Minimality –  Expressiveness •  Qualities at data level –  Completeness •  Coverage •  Density –  Uniqueness –  Consistency –  Freshness •  Currency •  Timeliness –  Accuracy •  Semantic •  Syntactic •  Precision
  • 134. The approach of da Conceição Moraes Batista et al. (2007): relevant qualities to be evaluated in DI architectures. Given a DI architecture defined in terms of •  [Data, Local schemas, Global schema, Data sources], the IQ criteria per DI element are: Data sources: reputation, verifiability, availability, response time. Schema: schema completeness, minimality, type consistency. Data: data completeness, timeliness, accuracy.
  • 135. The approach of da Conceição Moraes Batista et al. (2007): relevant qualities to be evaluated in DI architectures. Given a DI architecture defined in terms of •  [Data, Local schemas, Global schema, Data sources]. Table (quality dimension: definition; refers to; metric; detailed in terms of). Completeness: coverage of global schema concepts with respect to the application domain; refers to the global schema; metric 1 – (# of incomplete items / # total items). Minimality: extent to which the schema is compactly modeled and without redundancy; refers to the global schema; metric 1 – (# redundant schema elements / # total items); detailed in terms of attributes in an entity, attributes in different entities, entity redundancy degree, redundant relationship, entity redundancy of a schema, relationship redundancy of a schema, schema minimality. Type consistency: data type uniformity across the schemas; refers to the global schema + local schemas; metric 1 – (# of inconsistent schema elements / # total schema elements); detailed in terms of data type consistency, attribute type consistency, schema data type consistency.
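All three metrics on the slide above share the shape 1 – (# defective elements / # total elements). A minimal sketch of that shared shape follows; the function name `quality_ratio` and the example counts are hypothetical, and how defective elements are detected is outside the scope of the sketch.

```python
def quality_ratio(defective: int, total: int) -> float:
    """Generic score of the form 1 - (# defective elements / # total elements).

    `defective` stands for incomplete items (completeness), redundant
    elements (minimality) or inconsistent elements (type consistency).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= defective <= total:
        raise ValueError("defective must be between 0 and total")
    return 1 - defective / total


if __name__ == "__main__":
    # Hypothetical global schema: 20 elements, 5 of them redundant.
    print(quality_ratio(defective=5, total=20))
```

A score of 1 means no defective element was counted; a score of 0 means every element was.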
  • 136. The H. Dai et al. approach (2006) •  Focus on column heterogeneity. [Figure: four example columns (a-d) containing e-mail addresses, phone numbers and social security numbers in different proportions.] b is more heterogeneous than a; b is more heterogeneous than c; b is more heterogeneous than d.
  • 137. The H. Dai et al. approach (2006). Focus on column heterogeneity. Heterogeneity dimensions: –  Number of semantic types resulting in different clusters –  Cluster entropy –  Probabilistic soft clustering
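To give an intuition for cluster entropy, the sketch below computes plain Shannon entropy over hard cluster labels assigned to the values of a column. This is a simplification chosen for illustration: the slide also mentions probabilistic soft clustering, which this sketch does not implement, and the function name `cluster_entropy` and the labels are assumptions.

```python
from collections import Counter
from math import log2
from typing import Iterable


def cluster_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of a hard assignment of values to clusters."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one label is required")
    return sum(-(n / total) * log2(n / total) for n in counts.values())


if __name__ == "__main__":
    # Hypothetical columns: one holds only e-mails, one mixes two types evenly.
    print(cluster_entropy(["email"] * 4))
    print(cluster_entropy(["email", "email", "phone", "phone"]))
```

Under this simplification, a column whose values all fall in one cluster scores 0, and the score grows as values spread more evenly over more clusters.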
  • 138. Moody’s approach: classification of schemas related by integration (quality category: definition). Integration: level of consistency of the schema with the rest of the organization’s data. Integration M1: # of data conflicts with the corporate schema. Integration M1.a: # of entity conflicts. Integration M1.b: # of data element conflicts, namely definitions and domains. Integration M1.c: # of naming conflicts (synonyms + homonyms). Integration M2: # of data conflicts with existing systems (ES). Integration M2.a: # of data element conflicts, namely definitions and domains, with ES. Integration M2.b: # of key conflicts, namely definitions and domains, with ES. Integration M2.c: # of naming conflicts (synonyms + homonyms) with ES. Integration M3: # of data elements with duplicate data elements in ES. Integration M4: rating by representatives of other business areas.
  • 139. The Chai approach: matchability of schemas •  Focus on the evolution of a data integration system, and the cost of maintaining the mediated schema S •  Quality observed: the matchability of S with respect to a matching tool M, defined as •  the average of the accuracies of matching S with future schemas F1, F2, …, Fn (assumed to be known at least to some extent) using M
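The definition above (an average of matching accuracies) can be sketched as follows. Everything concrete here is an assumption made for illustration: the toy matcher that pairs attributes with identical names, the accuracy measure (fraction of gold matches recovered), and all function and attribute names. The slide does not specify either the tool M or how accuracy is measured.

```python
from typing import Callable, List, Set, Tuple

Match = Tuple[str, str]  # (attribute in S, attribute in a future schema)


def name_matcher(mediated: Set[str], other: Set[str]) -> Set[Match]:
    """Toy stand-in for the matching tool M: pairs identically named attributes."""
    return {(a, a) for a in mediated & other}


def accuracy(predicted: Set[Match], gold: Set[Match]) -> float:
    """Assumed accuracy measure: fraction of the gold matches that were predicted."""
    return len(predicted & gold) / len(gold) if gold else 1.0


def matchability(
    mediated: Set[str],
    futures: List[Tuple[Set[str], Set[Match]]],
    matcher: Callable[[Set[str], Set[str]], Set[Match]] = name_matcher,
) -> float:
    """Average accuracy of matching S against each future schema with its gold matches."""
    if not futures:
        raise ValueError("at least one future schema is required")
    scores = [accuracy(matcher(mediated, schema), gold) for schema, gold in futures]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    s = {"name", "phone"}
    futures = [
        ({"name", "tel"}, {("name", "name"), ("phone", "tel")}),
        ({"name", "phone"}, {("name", "name"), ("phone", "phone")}),
    ]
    print(matchability(s, futures))
```

In this toy setting, renaming `phone` in S to something the matcher handles better would raise the score, which is the kind of schema revision the approach is concerned with.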
  • 140. Cases for matching mistakes•  Predict a spurious match•  Miss a match•  Predict a wrong match•  Strategy to improve matchability –  Change concepts in M using rules that minimize error probability
  • 141. Batini et al. 2010: potential information content
  • 142. The data architecture of a set of databases is the allocation of concepts and tables across the DB data schemas. Example of a change of data architecture aimed at improving access efficiency. [Figure: centralized DB vs distributed DB, with tables Employee (Employee #, Salary), Assigned-to (Employee #, Project #, Role), Project (Project #, Budget)]
  • 143. Data integration technologies•  Virtual data integration•  Data Warehouses•  Application integration•  Consolidation
  • 144. Potential information content. Global schema: Tax payer has Boat; Tax payer declares Income. Sources: Tax payer declares Income; Tax payer has Boat. Query: find CF and Name of the tax payers that declare <= 30.000 € and have >= 1 boat.
  • 145. Potential information content •  Given a global schema I resulting from the virtual integration of schemas S1, S2, …, Sn, the potential information content of I is the set of queries that can be performed on I and cannot be performed on S1, S2, …, Sn.
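The definition above can be made concrete with a small sketch. It assumes a deliberately crude model: a schema is a set of concept names, a query is the set of concepts it needs, and a query can be performed on a schema when all its concepts are present. That model, the finite list of candidate queries, and all names are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def answerable(query: FrozenSet[str], schema: Set[str]) -> bool:
    """Assumed criterion: every concept the query needs is in the schema."""
    return query <= schema


def potential_information_content(
    queries: Dict[str, FrozenSet[str]],
    global_schema: Set[str],
    sources: List[Set[str]],
) -> Set[str]:
    """Names of the candidate queries answerable on I but on none of the sources."""
    return {
        name
        for name, concepts in queries.items()
        if answerable(concepts, global_schema)
        and not any(answerable(concepts, s) for s in sources)
    }


if __name__ == "__main__":
    s1 = {"TaxPayer", "Income"}
    s2 = {"TaxPayer", "Boat"}
    candidates = {
        "low_income_boat_owners": frozenset({"TaxPayer", "Income", "Boat"}),
        "declared_incomes": frozenset({"TaxPayer", "Income"}),
    }
    print(potential_information_content(candidates, s1 | s2, [s1, s2]))
```

Only the query that needs both Income and Boat is in the result, since each source alone can already answer the other one.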
  • 146. Example. [Figure: source schemas S1 and S2 with entities E1-E6 and queries Q11, Q12, Q21]
  • 147. Quality of the documentation for large related groups of schemas
  • 148. Why is integration alone not enough? Hundreds of schemas
  • 149. Relationships investigated •  Integration •  Abstraction •  Abstraction/Integration
  • 150. Abstraction
  • 151. Abstraction. [Figure: the same schema at increasing levels of detail, with concepts Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty, Warehouse]
  • 152. Integration + abstraction
  • 153. First case: integration + abstraction. [Figure: Company schema integrating the Production, Sales and Department structure schemas, each shown at several abstraction levels, with concepts Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty, Warehouse]
  • 154. Second case: integration-abstraction. [Figure: Company schema integrating the Production, Sales and Department structure schemas, each shown at several abstraction levels, with concepts Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty, Warehouse]
  • 155. The approach of Batini et al. (1993): the balancing quality dimension category •  For a set of abstractions –  Abstraction balancing •  For a repository of integrated-abstract schemas –  Global balancing –  Local balancing
  • 156. Abstraction balancing. [Figure: low abstraction balancing vs high abstraction balancing]
  • 157. Integration/abstraction balancing - 1. [Figure: high global balancing, low local balancing]
  • 158. Integration/abstraction balancing - 2. [Figure: low global balancing, high local balancing]
  • 159. Conclusions (issue: state of the art). Framework: mature stage. Metamodel: proliferation of proposals. Language: few well-grounded proposals. Quality dimensions: proliferation of proposals, good taxonomic convergence. Quality metrics: few proposals, empirically validated. Objective vs subjective: few investigations, empirically validated. Improvement methods: several heterogeneous proposals. Schema quality vs process quality: needs more investigation. Convergence to ISO standards: slow.
  • 160. References for conceptual models
  • 161. References: Batini et al. •  C. Batini, L. Furlani, E. Nardelli - What is a "good" diagram? A pragmatic approach - Proc. Fourth International Conference on the Entity Relationship Approach, Chicago, October 1985. •  C. Batini, M. Lenzerini, S.B. Navathe - A Comparative Analysis of Methodologies for Database Schema Integration - ACM Computing Surveys, 1986. •  R. Tamassia, G. Di Battista, C. Batini - Automatic graph drawing and readability of diagrams - IEEE Transactions on Systems, Man, and Cybernetics, 18(1):61-79, January 1988. •  C. Batini, S. Ceri, S.B. Navathe - Logical data base design using the Entity Relationship model - Benjamin and Cummings/Addison Wesley, Palo Alto, California, USA, 1991. •  C. Batini, G. Di Battista, G. Santucci - Representation structures for data dictionaries - IEEE Transactions on Software Engineering, 1993. •  C. Batini, M. Scannapieco - Data Quality: Concepts, Methodologies and Techniques - Springer Verlag, 2006.
  • 162. References: Quality of Conceptual Schemas - The Arab-French approach •  S. Cherfi et al. - Conceptual Modeling Quality, ER Conference 2002. •  S. Cherfi et al. - A Framework for Conceptual Modeling Quality Evaluation, 2007. •  J. Akoka, L. Berti-Equille, et al. - A Framework for Quality Evaluation in Data Integration Systems, ICEIS (3) 2007: 170-175. •  J. Akoka et al. - Quality of Conceptual Schemas: An Experimental Comparison, ER Conference 2007. •  A. Cherfi et al. - Perceived vs Measured Quality of Conceptual Schemas: An Experimental Evaluation, ER Conference 2007. •  S. Cherfi et al. - Conceptual Modeling Quality - From EER to UML Schema Evaluation. •  S. Cherfi et al. - Quality Patterns for Conceptual Modelling, ER Conference 2008. •  J. Akoka et al. - Quality of Conceptual Schemas: An Experimental Comparison, RCIS Conference 2008. •  K. Mehmood, S. Cherfi, et al. - Data Quality Through Conceptual Model Quality - Reconciling Researchers and Practitioners through a Customizable Quality Model - Proc. of the 14th Conf. on Information Quality, 2009.
  • 163. References: Moody •  D. Moody - Metrics for evaluating the quality of ER models, ER98. •  D. Moody, G. Shanks, P. Darke - Improving the quality of ER models: experience in research and practice, ER98. •  D. Moody - Measuring the Quality of Data Models: an empirical evaluation of the use of quality metrics in practice, ECIS 2003, Naples, Italy. •  D. Moody - Theoretical and practical issues in evaluating the quality of conceptual models: current state and future directions, Data & Knowledge Engineering, 55 (2005), pp. 243-276. •  D. Moody - The "Physics" of Notations: Toward a Scientific Basis for Constructing Visual Notations in Software Engineering, IEEE Trans. on Software Engineering, Vol. 35, N. 6, 2009.
  • 164. References: Quality of Conceptual Schemas - The Scandinavian approach •  O. Lindland et al. - Understanding Quality in Conceptual Modeling, IEEE Software, 1994. •  G. Shanks et al. - Quality in Conceptual Modeling: linking theory and practice, in: Proceedings of the Pacific Asia Conference on Information Systems, PACIS97, Brisbane, Australia, 1997. •  J. Krogstie - Defining quality aspects for conceptual models, Proceedings of the IFIP international working conference on Information system concepts: Towards a consolidation of views, IFIP Conference Proceedings, Vol. 26, 1995. •  J. Krogstie, A. Solvberg - Information Systems Engineering: Conceptual Modeling in a Quality Perspective, Springer, 2000. •  F. Lillehagen, J. Krogstie - Active Knowledge Modeling of Enterprises, Springer, 2008.
  • 165. References - Quality in heterogeneous architectures •  B.T. Dai, N. Koudas et al. - Column heterogeneity as a measure of data quality, CleanDB, Seoul, Korea, 2006. •  M. da Conceição Moraes Batista et al. - Information Quality Measurement in Data Integration Schemas, VLDB '07, Vienna, Austria. •  C. Naiman et al. - A classification of semantic conflicts in heterogeneous database systems, University of Illinois. •  J. Akoka et al. - A Framework for Quality Evaluation in Data Integration Systems, Proc. ICEIS 2007. •  V. Peralta et al. - A Framework for Data Quality Evaluation in a Data Integration System. •  H. Do et al. - Comparison of Schema Matching Evaluations, NODe 2002 Web and Database-Related Workshops on Web, Web-Services, and Database Systems. •  X. Chai et al. - Analyzing and Revising Data Integration Schemas to Improve their Matchability, PVLDB 2008, Auckland, New Zealand.
  • 166. References: Quality of Conceptual Schemas - Others 1 •  P. Assenova, P. Johannesson - Improving quality of conceptual modelling by the use of schema transformations, ER Conference, 1996. •  C. Batini, L. Furlani, E. Nardelli - What is a Good Diagram? A Pragmatic Approach, Proceedings of the Fourth International Conference on Entity Relationship Approach, pp. 312-319, 1985. •  C. Batini, M. Scannapieco - Data Quality: Concepts, Methodologies and Techniques, Springer Verlag, 2006. •  C. Batini, C. Cappiello, C. Francalanci, A. Maurino - Methodologies for data quality assessment and improvement, Computing Surveys, 2009. •  J. Hoxmeier - Typology of database quality factors, Software Quality Journal, 1998.
  • 167. Others •  K. Siau, X. Tan - Improving the quality of conceptual modeling using cognitive mapping techniques, Data and Knowledge Engineering, 2005. •  F. T. Sheldon, K. Jerath, H. Chung - Metrics for maintainability of class inheritance hierarchies, Journal of Software Maintenance and Evolution, 2002. •  P. Vassiliadis et al. - Towards Quality Oriented Data Warehouse Usage and Evolution, Information Systems Journal, 25, pp. 89-115, 2000.
  • 168. Outline •  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape •  Quality of Conceptual Schemas –  frameworks / metamodels / dimensions / metrics / groups of schemas •  Quality of Ontologies –  frameworks / metamodels / dimensions / metrics •  Conclusions
  • 169. FRAMEWORKS
  • 170. Ontologies vs Graphs •  Can ontologies be considered graphs? –  Often expected to be –  LD, Th, and KB assertions (instance level) can be straightforwardly represented by graphs –  KB terminologies (schema level) consist of axioms •  Represented by graphs at the implementation level (RDF/XML) •  At the conceptual level, mapping to graphs is trivial only for specific aspects (i.e. hierarchical relations, domain and range restrictions)
  • 171. Ontologies vs Graphs •  Can ontologies be considered graphs? –  Often expected to be –  LD, Th, and KB assertions (instance level) can be straightforwardly represented by graphs –  KB terminologies (schema level) consist of axioms •  Represented by graphs at the implementation level (RDF/XML) •  At the conceptual level, mapping to graphs is trivial only for specific aspects (i.e. hierarchical relations, domain and range restrictions). Open questions: “relationships between concepts”? Anonymous classes?
  • 172. Krogstie and Solvberg framework. [Figure: framework relating goal of modeling, modeling domain, participant knowledge, social actor interpretation, technical actor interpretation, model externalization and language extension through physical, empirical, syntactic, semantic, perceived semantic, social pragmatic, technical pragmatic, social and organizational quality]
  • 173. Krogstie and Solvberg framework. [Figure: the same framework annotated with later proposals: Burton-Jones et al. 2004; process-driven quality (process/task, usually within an application, e.g. semantic search): Strasunskas & Tomassen 2008, Yu et al. 2007; cognitive quality (mental representation): Evermann & Fang 2010]
  • 174. Types of qualities revisited (type of quality: description)•  Physical: correspondence between participant knowledge and the externalized conceptual model•  Empirical: reflected by the error frequency when a model is read or written•  Syntactic: correspondence between the conceptual model and the language•  Semantic: correspondence between the conceptual model and the domain•  Pragmatic: correspondence between the conceptual model and the audience’s interpretation of it•  Process/task driven: usefulness of the conceptual model wrt a process/task it is used in (often a computational task)•  Perceived semantic: correspondence between participant knowledge and individual interpretation•  Cognitive: correspondence between the conceptual model and a cognitive (mental) model•  Social: agreement on participant knowledge and individual interpretation 174 @C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  • 175. Burton-Jones et al. 2004•  A semiotic metrics suite for assessing the quality of ontologies•  Semiotic approach –  Dimensions at different levels with dependencies (e.g. A → B = qualities at level B depend on qualities at level A)•  From –  R. Stamper, K. Liu, M. Hafkamp, Y. Ades, Understanding the role of signs and norms in organizations - a semiotic approach to information systems design, Behaviour & Information Technology 19 (1) (2000) 15–27. –  Same as Krogstie and Solvberg, but introduces dependencies 175
  • 176. Strasunskas & Tomassen 2008 [EvOQS] Framework•  EvOQS: Evaluation of Ontologies for Search 176
  • 177. Evermann & Fang 2010•  Evaluating ontologies: Towards a cognitive measure of quality•  Cognitive perspective: cognitive model of the relationship between ontologies as formal (externalized) specifications, (mental) conceptualization and “real world” 177
  • 178. Gangemi et al. 2006•  O2: an ontology representing ontologies as semiotic objects and their context (e.g. creation, annotation, and so on) [Figure labels: ontologies as graphs; “cognitive” interpretation of conceptualization] 178
  • 179. METAMODELS 179
  • 180. ADDITIONAL CLASSIFICATIONS 180
  • 181. Brank et al. 2005•  Classification based on: –  levels of quality evaluation –  different evaluation strategies [Slide callouts: e.g. measuring similarity between ontologies; e.g. semantic search; e.g. considering semantic annotation of documents that need to be covered; e.g. considering understandability] 181
  • 182. EvOQS - Strasunskas & Tomassen 2008 182
  • 183. DIMENSIONS 183
  • 184. Burton-Jones et al. 2004•  Semiotic metric suite (DAML ontologies) –  From software engineering –  Design principles: •  In general, not limited to a particular type of user (humans and software) and independent of the capabilities of a specific person or machine. •  In general, not limited to particular types of ontologies (e.g. domain, application etc) •  Designed to be comprehensive, yet parsimonious. –  Assumptions: •  the ontology is written in a known language •  there is an independent body of semantics that can be used to assess an ontology’s semantics. To the extent that the ontology is the only known description of some domain, only a portion of the metrics can be used. 184
  • 185. Burton-Jones 2004 [Figure labels: Formal framework, Meta schema, Classification, Quality dimension, Metrics, Examples, Experiments]
  • 186. Burton-Jones 2004 [Figure labels: Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 187. Burton-Jones et al. 2004•  Semiotic metric suite (DAML ontologies)•  Quality attributes and their definition by type 187
  • 188. Gangemi et al. 2006 [oQual] [Figure labels: Formal framework, Meta schema, Classification, Quality dimension, Quality subdimension, Metrics, Examples, Experiments]
  • 189. Gangemi et al. 2006 [oQual] Semiotic-level macro dimensions•  Structural dimension –  The structural dimension of ontologies focuses on syntax and formal semantics, i.e. on ontologies represented as graphs (Context free metrics) •  Ontology as an (information) object;•  Functional dimension –  The functional dimension is related to the intended use of a given ontology and of its components, i.e. their function in a context. The focus is on the conceptualization specified by an ontology. •  Ontology as a language (information object + intended conceptualization)•  Usability-profiling dimension –  The usability-profiling dimension focuses on the ontology profile (annotations), which typically addresses the communication context of an ontology (i.e. its pragmatics). 189
  • 190. Gangemi et al. 2006•  O2: an ontology representing ontologies as semiotic objects and their context (e.g. creation, annotation, and so on) [Figure labels: structural dimension; functional dimension; user-profiling dimension] 190
  • 191. Gangemi et al. 2006 (oQual) Evaluation-level Dimensions (QooD)•  Cognitive ergonomics•  Transparency (explicitness of organizing principles)•  Computational integrity and efficiency•  Meta-level integrity•  Flexibility (context-boundedness)•  Compliance to expertise•  Compliance to procedures for extension, integration, adaptation, etc.•  Generic accessibility (computational as well as commercial)•  Organizational fitness EXAMPLE (metrics for cognitive ergonomics): - Depth, - Breadth, - Tangledness, + class/property ratio, + annotations (esp. lexical, glosses, topic), - Anonymous classes, + interfacing, + patterns (dense areas) 191
  • 192. Tartir et al. 2005 (OntoQA) - dimensions•  Schema –  Relationship richness –  Attribute richness –  Inheritance richness•  Knowledge base – Instance –  Class richness –  Avg. Population –  Cohesion•  Knowledge base – Class –  Importance –  Fullness –  Inheritance Richness (c) –  Relationship Richness (c) –  Connectivity –  Readability [Slide callout: more a classification of metrics than dimensions] 192
  • 193. Strasunskas & Tomassen 2008 [EvOQS]•  EvOQS ontology fitness for Ontology-based Information Retrieval•  Quality Dimensions: –  Search task fitness •  Fact-Finding search task Fitness (FFF) •  EXploratory search task Fitness (EXF) •  Comprehensive search task Fitness (COF) –  Search enhancement capability •  Recall Enhancement Capability (REC) •  Precision Enhancement Capability (PEC) 193
  • 194. Evermann & Fang 2010•  Cognitive evaluation of ontologies•  Cognitive assumptions –  Knowledge is stored in our memory as a set of cognitive concepts organized into hierarchies•  Dimension: –  cognitive adequacy •  How much an ontology corresponds to the actual conceptualization 194
  • 195. Guarino & Welty 2000 - OntoClean•  Metaphysical quality –  Meta-level consistency •  Identity // When you conceive of a class, ask “What makes each instance unique?” •  Rigidity // essentiality wrt instances of the class •  … 195
  • 196. METRICS 196
  • 197. Burton-Jones et al. 2004•  Metrics: –  Absolute (A) vs relative (R) metrics •  Relative metrics: the values for a given ontology will depend on an external benchmark such as the average value across all the ontologies in the ontology library in which the ontology exists –  Automatic vs. human-driven measurement [Table of per-metric A/R markers and the automatic/human-driven icons not preserved] 197
  • 198. Burton-Jones et al. 2004 198
  • 199. Tartir et al. 2005 (OntoQA)•  Relationship Richness –  An ontology that contains many relations other than class-subclass relations is richer than a taxonomy with only class-subclass relationships. –  A percentage representing how many of the connections between classes are rich relationships, compared to all of the possible connections 199
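The slide's formula did not survive extraction. A minimal sketch of relationship richness follows, assuming "all of the possible connections" means non-inheritance relationships plus class-subclass links, which is how the metric is usually stated for OntoQA. The function name and the counts are illustrative only.

```python
def relationship_richness(num_relationships: int, num_subclass_links: int) -> float:
    """Share of schema connections that are not class-subclass links.

    Assumption: the denominator is relationships plus subclass links.
    """
    total = num_relationships + num_subclass_links
    if total == 0:
        return 0.0
    return num_relationships / total


# A pure taxonomy scores 0; a schema with 3 relationships and 1 subclass link scores 0.75.
taxonomy_score = relationship_richness(0, 5)
mixed_score = relationship_richness(3, 1)
```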
  • 200. OntoQA•  Attribute Richness –  The more slots that are defined the more knowledge the ontology conveys –  Average number of attributes per class 200
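Attribute richness as described here is a plain average, sketched below. The schema dictionary and the function name are hypothetical.

```python
def attribute_richness(attributes_by_class):
    """Average number of attributes (slots) per class."""
    if not attributes_by_class:
        return 0.0
    total = sum(len(attrs) for attrs in attributes_by_class.values())
    return total / len(attributes_by_class)


# Hypothetical schema: class name -> list of attribute names.
SCHEMA = {"Person": ["name", "email"], "Paper": ["title"], "Venue": []}
schema_score = attribute_richness(SCHEMA)
```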
  • 201. OntoQA•  Inheritance Richness –  A horizontal (or flat) ontology: small number of inheritance levels, and each class has a relatively large number of subclasses –  A vertical ontology contains a large number of inheritance levels where classes have a small number of subclasses. –  Average number of subclasses per class•  Also refined as average number of subclasses per subtree 201
  • 202. OntoQA•  Readability –  both at schema and instance level –  the sum of the number of attributes that are comments and the number of attributes that are labels 202
  • 203. OntoQA•  Instance metrics - Knowledge Base –  Class richness •  The number of classes that have instances is compared with the total number of classes –  Average population •  Number of instances compared to the number of classes –  Cohesion •  Number of separate connected components in the instances 203
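The three knowledge-base metrics can be sketched as follows. The sketch assumes each instance belongs to exactly one class and that links between instances are undirected and only mention known instances; the data and function names are illustrative.

```python
def class_richness(instances_by_class):
    """Share of classes that have at least one instance."""
    if not instances_by_class:
        return 0.0
    populated = sum(1 for members in instances_by_class.values() if members)
    return populated / len(instances_by_class)


def average_population(instances_by_class):
    """Number of instances divided by the number of classes."""
    if not instances_by_class:
        return 0.0
    total = sum(len(members) for members in instances_by_class.values())
    return total / len(instances_by_class)


def cohesion(instances, links):
    """Number of connected components among instances (links read as undirected)."""
    neighbours = {instance: set() for instance in instances}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), 0
    for start in instances:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return components


# Hypothetical knowledge base.
KB = {"Person": ["anna", "bob"], "Paper": ["p1"], "Venue": []}
LINKS = [("anna", "p1")]
```

With this data two of three classes are populated, the average population is one instance per class, and the instances fall into two components (`anna` with `p1`, and `bob` alone).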
  • 204. OntoQA•  Instance metrics - Class –  Importance (of a class) •  percentage of instances that belong to classes at the subtree rooted at the current class with respect to the total number of instances –  Fullness •  actual number of instances that belong to the subtree of the class compared to the expected number of instances 204
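Both class-level metrics depend on the subtree rooted at a class. A minimal sketch, assuming instance counts and expected counts are given per class; how expected counts are obtained is not stated on the slide, so `EXPECTED` below is purely hypothetical.

```python
def subtree(cls, subclasses_by_class):
    """Return cls together with all of its direct and indirect subclasses."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(subclasses_by_class.get(current, []))
    return found


def importance(cls, subclasses_by_class, instance_count):
    """Instances in the subtree of cls relative to all instances."""
    total = sum(instance_count.values())
    if total == 0:
        return 0.0
    members = subtree(cls, subclasses_by_class)
    return sum(instance_count.get(c, 0) for c in members) / total


def fullness(cls, subclasses_by_class, instance_count, expected_count):
    """Actual instances in the subtree of cls relative to the expected number."""
    members = subtree(cls, subclasses_by_class)
    expected = sum(expected_count.get(c, 0) for c in members)
    if expected == 0:
        return 0.0
    return sum(instance_count.get(c, 0) for c in members) / expected


# Hypothetical hierarchy and counts.
HIERARCHY = {"Publication": ["Article", "Book"]}
COUNTS = {"Publication": 0, "Article": 6, "Book": 2, "Person": 2}
EXPECTED = {"Article": 12, "Book": 4}
```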
  • 205. OntoQA•  Instance metrics – Class –  Relationship Richness (instance level) •  number of relationships that are being used by instances of a class C compared to the number of relationships that are defined for C at the schema level –  Connectivity •  the number of instances of other classes that are connected to instances of that class 205
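A sketch of the two metrics under simplifying assumptions: relationships are identified by name, and links between instances are read as undirected when counting connectivity. All names and data are illustrative.

```python
def instance_relationship_richness(defined_relations, used_relations):
    """Share of the relationships defined for a class that its instances actually use."""
    defined = set(defined_relations)
    if not defined:
        return 0.0
    return len(defined & set(used_relations)) / len(defined)


def connectivity(cls, class_of, links):
    """Count distinct instances of other classes linked to an instance of cls."""
    connected = set()
    for source, target in links:
        if class_of[source] == cls and class_of[target] != cls:
            connected.add(target)
        if class_of[target] == cls and class_of[source] != cls:
            connected.add(source)
    return len(connected)


# Hypothetical data.
DEFINED = ["authorOf", "memberOf", "cites", "reviews"]
USED = ["authorOf", "authorOf", "memberOf"]
CLASS_OF = {"anna": "Person", "bob": "Person", "p1": "Paper", "p2": "Paper"}
LINKS = [("anna", "p1"), ("bob", "p1"), ("anna", "bob")]
```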
  • 206. Gangemi et al. 2006 (oQual) - metrics STRUCTURAL DIMENSION:•  Classes & Instances –  No. Classes –  No. Leaf Classes –  Unique No. Instances –  Avg. Instances per class –  Max. Instances per class•  Breadth –  Absolute, Avg. & Max.•  Depth –  Absolute, Avg. & Max.•  Parents & Children –  No. Parent Classes –  No. Children Classes –  Avg. Children per Parent –  Max. Parents for any given child –  Fanout factor ≈OntoQA (inher. Richness) –  Tangledness –  Density –  Degree distribution ≈OntoQA –  Meta & Logical adequacy, Logical Consistency FUNCTIONAL DIMENSION (ontologies w.r.t. tasks):•  Precision, recall, accuracy wrt expert judgments or datasets•  Black-box (rational agents) –  Agreement –  User-satisfaction•  Glass-box –  Task –  Topic –  Modularity ≈OntoQA (cohesion) USABILITY PROFILE (ontology annotations); three levels of usability: –  Recognition (documentation on use) –  Efficiency (organizational annotation) –  Interfacing (interfacing operations) 206
  • 207. oQual - Metrics•  Average Depth (subclasses)•  Average Breadth 207
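The formulas on this slide were images and are not in the transcript. One common reading, sketched here under the assumption that the subclass hierarchy is a tree: average depth is the mean number of classes on a root-to-leaf path, and average breadth is the mean number of classes per level. The hierarchy and function names are hypothetical.

```python
def root_to_leaf_paths(root, children):
    """Enumerate every path from the root down to a leaf class."""
    paths, stack = [], [[root]]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)
        for kid in kids:
            stack.append(path + [kid])
    return paths


def average_depth(root, children):
    """Mean number of classes on a root-to-leaf path."""
    paths = root_to_leaf_paths(root, children)
    return sum(len(path) for path in paths) / len(paths)


def average_breadth(root, children):
    """Mean number of classes per level of the hierarchy."""
    levels, current = [], [root]
    while current:
        levels.append(len(current))
        current = [kid for node in current for kid in children.get(node, [])]
    return sum(levels) / len(levels)


# Hypothetical tree-shaped subclass hierarchy: class -> direct subclasses.
CHILDREN = {"Thing": ["Animal", "Plant"], "Animal": ["Dog", "Cat"]}
```

For this hierarchy the three paths contain 3, 3 and 2 classes, and the three levels contain 1, 2 and 2 classes.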
  • 208. oQual - Metrics•  Tangledness –  How much concepts are related to different subtrees in the ontology –  The more tangled the ontology, the more the branches of the taxonomy are interdependent 208
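The exact oQual formula is not preserved in the transcript. As a rough sketch, tangledness can be approximated by the share of classes with more than one direct superclass; this particular ratio is an assumption, and published definitions differ in how they normalize it.

```python
def tangledness(parents_by_class):
    """Share of classes that have more than one direct superclass."""
    if not parents_by_class:
        return 0.0
    tangled = sum(1 for parents in parents_by_class.values() if len(parents) > 1)
    return tangled / len(parents_by_class)


# Hypothetical hierarchy: class -> direct superclasses.
PARENTS = {
    "Thing": [],
    "Pet": ["Thing"],
    "Mammal": ["Thing"],
    "Dog": ["Pet", "Mammal"],  # multiple inheritance ties two branches together
}
tangled_share = tangledness(PARENTS)
```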
  • 209. oQual - Metrics•  Leaf Fan-Outness –  Similar to inheritance richness in OntoQA but computed considering leaves –  Measures how much the taxonomy expands into subclasses 209
  • 210. oQual - Metrics•  Modularity –  Similar to cohesion in OntoQA but considered also at the schema level –  Measures how much the concepts are interconnected into modules –  Based on ontology design patterns 210
  • 211. oQual - Metrics•  Consistency ratio –  Measures the degree of inconsistency within an ontology –  Based on •  Number of inconsistent concepts •  Reasoning techniques 211
  • 213. oQual - dimensions STRUCTURAL DIMENSION:•  Classes & Instances –  No. Classes –  No. Leaf Classes –  Unique No. Instances –  Avg. Instances per class –  Max. Instances per class•  Breadth –  Absolute, Avg. & Max.•  Depth –  Absolute, Avg. & Max.•  Parents & Children –  No. Parent Classes –  No. Children Classes –  Avg. Children per Parent –  Max. Parents for any given child –  Fanout factor ≈OntoQA (inher. Richness) –  Tangledness –  Density –  Degree distribution ≈OntoQA –  Meta & Logical adequacy, Logical Consistency FUNCTIONAL DIMENSION (ontologies w.r.t. tasks):•  Precision, recall, accuracy wrt expert judgments or datasets•  Black-box (rational agents) –  Agreement –  User-satisfaction•  Glass-box –  Task –  Topic –  Modularity ≈OntoQA (cohesion) USABILITY PROFILE (ontology annotations); three levels of usability: –  Recognition (documentation on use) –  Efficiency (organizational annotation) –  Interfacing (interfacing operations) [Slide callout: Proposed NLP techniques] 213
  • 214. Dasgupta et al. 2007 (Pan-Onto-Eval)•  An ontology is meaningful when there are many diverse relationships•  Focus on Non Is-A relations –  Triple Centricity: •  Analysis of concepts that are domain and range of relationships (in the summaries) –  Theme Centricity •  Seven broad thematic categories for classification of non-IS-A relations inspired by UMLS (Compositional, Attributive, Spatial, Functional, Temporal, Comparative and Conceptual) –  Structure Centricity •  Placement of Non Is-A relations within the hierarchy –  Domain Centricity •  Richness of Non Is-A relations in the different hierarchies (representing different “domains”) 214
  • 215. Pan-Onto-Eval - Metrics•  Metrics are computed independently on each hierarchy that exists under the root of the ontology, and the results are then combined –  Information Content of a subclass hierarchy –  Relational richness –  Inheritance richness •  Refines OntoQA –  Dimensional Richness –  Domain Importance •  Refines OntoQA•  Global score 215
  • 216. Pan-Onto-Eval - Metrics•  Information Content (IC) of a subclass hierarchy•  Measures how well information involving relations R is distributed over a subclass hierarchy H•  Hypothesis: –  a well spread distribution of important relations with respect to domain concepts DC in H indicates richness of information•  Based on entropy theory 216
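The Pan-Onto-Eval formula itself is not in the transcript. As an illustration of the entropy idea only, the sketch below computes the Shannon entropy of how non-IS-A relations are spread over the concepts of one hierarchy; treating each concept's share of relations as a probability is an assumption, and the counts are invented.

```python
import math


def information_content(relation_counts):
    """Shannon entropy (in bits) of the relation distribution over concepts."""
    total = sum(relation_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in relation_counts.values():
        if count > 0:
            share = count / total
            entropy -= share * math.log2(share)
    return entropy


# Hypothetical hierarchies: concept -> number of non-IS-A relations it takes part in.
SPREAD = {"Gene": 2, "Protein": 2, "Disease": 2, "Drug": 2}
SKEWED = {"Gene": 8, "Protein": 0, "Disease": 0, "Drug": 0}
```

Under this reading the evenly spread hierarchy scores 2 bits and the skewed one scores 0, in line with the hypothesis that a well spread distribution indicates richer information.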
  • 217. More… Strasunskas & Tomassen 2008 [EvOQS]•  Empirical Insights on a Value of Ontology Quality in Ontology-Driven Web Search•  EvOQS ontology fitness in information retrieval using Ontology-based Information Retrieval (ObIR) tools –  Evaluation of ontologies wrt search tasks •  Fact-finding •  Exploratory •  Comprehensive –  Evaluation of search enhancement capabilities –  Empirical Evaluation: •  shows that ontology quality improvement (by specifying equivalent and disjoint classes, adding instances and properties) can significantly improve Web search results. 217
  • 218. EvOQS principles•  Quality Dimensions & Metrics –  Fact-Finding search task Fitness (FFF) –  EXploratory search task Fitness (EXF) –  Comprehensive search task Fitness (COF) 218
  • 219. EvOQS principles•  Quality Dimensions & Metrics –  Recall Enhancement Capability (REC) –  Precision Enhancement Capability (PEC) 219
  • 220. More… Yu et al. 2007 - Quality wrt browsing categories in Wikipedia•  Ontology Evaluation Using Wikipedia Categories for Browsing –  Empirical evaluation of: •  Tangledness •  Breadth •  Depth •  Fanout –  Conclusion: •  A highly tangled ontology would not be desirable for structured ontologies. However, in the context of Wikipedia, it is beneficial and a requirement as it allows for greater intersectedness of the domain. 220
  • 221. More… Lei et al. 2007 - Quality of semantic annotation for a document corpus•  Detecting Quality Problems in Semantic Metadata without the Presence of a Gold Standard –  In general data-level, useful e.g. for linked data although it requires reasoning –  Automatically detect data deficiencies in semantic metadata without constructing gold standard data sets by detecting: •  Incomplete annotations •  Inconsistencies (+ explanation) •  Duplicate problems •  Ambiguous and inaccurate problems 221
  • 222. More… Cognitive Quality Evermann & Fang 2010•  Cognitive evaluation of ontologies•  Cognitive assumptions –  Knowledge is stored in our memory as a set of cognitive concepts organized into hierarchies•  Evaluation of cognitive adequacy –  How much an ontology corresponds to the actual conceptualization•  Two cognitive effects on information retrieval from memory: –  Semantic Distance –  Category size•  Measure of these effects via sentence verification task –  subjects verify category subsumption statements, e.g. ‘‘A terrier is an animal’’, and the response time and correctness of the answer are measured. 222
  • 223. Evermann & Fang 2010•  Semantic Distance Effect on information retrieval –  The time to retrieve a concept D given a concept C should be proportional to their distance in a semantic network •  retrieving ‘‘animal’’ when given ‘‘dog’’ should take twice as long as retrieving ‘‘canine’’ when given ‘‘dog’’.•  Category size –  Every class is described by prototypical instances –  This suggests that the larger a category, the longer the time that is required to search the category for specific instances •  For example, determining whether ‘‘A Terrier is a Dog’’ should be quicker than determining whether ‘‘A Terrier is an Animal’’•  Measuring verification times for subsumption statements based on the formal specification [Table residue, column labels: All, BWW, Sumo, Wordnet] 223
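The semantic distance in the slide's example can be sketched as the number of is-a links between two concepts. The sketch assumes a single-parent, acyclic hierarchy; the `PARENT_OF` mapping mirrors the dog/canine/animal example and is otherwise hypothetical.

```python
def semantic_distance(concept, ancestor, parent_of):
    """Number of is-a links from concept up to ancestor, or None if unreachable."""
    steps, current = 0, concept
    while current != ancestor:
        if current not in parent_of:
            return None
        current = parent_of[current]
        steps += 1
    return steps


# Hypothetical single-parent hierarchy: concept -> direct superconcept.
PARENT_OF = {"terrier": "dog", "dog": "canine", "canine": "animal"}

to_canine = semantic_distance("dog", "canine", PARENT_OF)
to_animal = semantic_distance("dog", "animal", PARENT_OF)
```

Here `animal` is two links away from `dog` and `canine` is one, which is the ratio behind the slide's "twice as long" prediction.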
  • 224. More… Metaphysical qualities (OntoClean)•  Relation to reality•  Making meaning clear•  Meta-level consistency –  Identity // When you conceive of a class, ask “What makes each instance unique?” –  Unity –  Rigidity // essentiality wrt instances of the class –  Dependence –  Actuality –  Permanence•  Captures the invariant structure of the domain 224
  • 225. Ontoclean: Which one is better? [Figure: two alternative models over the classes Computer, Disk Drive, Memory, Computer Part, Disk Part and Memory Part, with has-part relations; each class is tagged with OntoClean meta-property labels such as +I+R+U, +I+R~U, -I~R-U, +I~R-U and +I~R~U.] Due to: Guizzardi et al., 2004. 225
  • 226. More… Naturalness (of terms) - Chun & Geller 2008•  Evaluating Ontologies based on the Naturalness of their Preferred Terms•  Quality w.r.t. naturalness of preferred terms –  Preferred terms are those that have the highest rating for acceptability in the relevant user community.•  For Th (but extensible to KB) –  Given a set of synonyms, select the preferred one based on its naturalness •  UMLS // the Metathesaurus (MESH, SNOMED, NCI, etc.) has grown to over 1.4 million concepts corresponding to over 7.2 million terms.•  Metrics: compute the average number of corpus-confirmed preferred terms relative to the total number of concepts in the ontology 226
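The metric in the last bullet reduces to a ratio, sketched below. How a corpus "confirms" a preferred term is not described in the transcript, so the sketch assumes a simple case-insensitive membership test against a set of terms observed in the corpus; concept identifiers and terms are invented.

```python
def naturalness(preferred_term_by_concept, corpus_terms):
    """Share of concepts whose preferred term is confirmed by the corpus."""
    if not preferred_term_by_concept:
        return 0.0
    observed = {term.lower() for term in corpus_terms}
    confirmed = sum(
        1 for term in preferred_term_by_concept.values() if term.lower() in observed
    )
    return confirmed / len(preferred_term_by_concept)


# Hypothetical ontology fragment and corpus vocabulary.
PREFERRED = {
    "C1": "Heart attack",
    "C2": "Cephalalgia",
    "C3": "Fever",
    "C4": "Pyrexia of unknown origin",
}
CORPUS_TERMS = {"heart attack", "headache", "fever"}
score = naturalness(PREFERRED, CORPUS_TERMS)
```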
  • 227. References – Quality of Ontologies•  J. Brank, M. Grobelnik, D. Mladenic, A survey of ontology evaluation techniques, in: Proceedings of SIKDD 2005 at multiconference IS 2005, Ljubljana, Slovenia, 2005.•  A. Burton-Jones, V. C. Storey, V. Sugumaran, P. Ahluwalia, A semiotic metrics suite for assessing the quality of ontologies, Data Knowl. Eng. 55 (1) (2005) 84–102.•  S. A. Chun, J. Geller, Evaluating ontologies based on the naturalness of their preferred terms, in: HICSS ’08: Proceedings of the 41st Annual Hawaii International Conference on System Sciences, IEEE Computer Society, Washington, DC, USA, 2008, p. 238.•  S. Dasgupta, D. Dinakarpandian, Y. Lee, A panoramic approach to integrated evaluation of ontologies in the semantic web, in: Garcia-Castro et al. [16], pp. 31–40.•  L. Ding, T. Finin, Characterizing the semantic web on the web, in: I. F. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, L. Aroyo (Eds.), International Semantic Web Conference, Vol. 4273 of Lecture Notes in Computer Science, Springer, 2006, pp. 242–257.•  J. Evermann, J. Fang, Evaluating ontologies: Towards a cognitive measure of quality, Inf. Syst. 35 (4) (2010) 391–403.•  A. Gangemi, C. Catenacci, M. Ciaramita, J. Lehmann, Modelling ontology evaluation and validation, in: Y. Sure, J. Domingue (Eds.), ESWC, Vol. 4011 of Lecture Notes in Computer Science, Springer, 2006, pp. 140–154. 227
  • 228. References – Quality of Ontologies•  R. Garcia-Castro, D. Vrandecic, A. Gomez-Perez, Y. Sure, Z. Huang (Eds.), Proceedings of the 5th International Workshop on Evaluation of Ontologies and Ontology-based Tools, EON2007, Co-located with the ISWC2007, Busan, Korea, November 11th, 2007, Vol. 329 of CEUR Workshop Proceedings, CEUR-WS.org, 2008.•  Y. Lei, V. S. Uren, E. Motta, A framework for evaluating semantic metadata, in: D. H. Sleeman, K. Barker (Eds.), K-CAP, ACM, 2007, pp. 135–142.•  Y. Lei, A. Nikolov, Detecting quality problems in semantic metadata without the presence of a gold standard, in: Garcia-Castro et al. [16], pp. 51–60.•  F. Mostowfi, F. Fotouhi, Improving quality of ontology: An ontology transformation approach, in: R. S. Barga, X. Zhou (Eds.), ICDE Workshops, IEEE Computer Society, 2006, p. 61.•  D. Strasunskas, S. L. Tomassen, Empirical insights on a value of ontology quality in ontology-driven web search, in: R. Meersman, Z. Tari (Eds.), OTM Conferences (2), Vol. 5332 of Lecture Notes in Computer Science, Springer, 2008, pp. 1319–1337.•  S. Tartir, I. B. Arpinar, M. Moore, A. P. Sheth, B. Aleman-Meza, OntoQA: Metric-based ontology quality analysis, in: Proceedings of IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, 2005.•  Y. Theoharis, Y. Tzitzikas, D. Kotzinos, V. Christophides, On graph features of semantic web schemas, IEEE Trans. Knowl. Data Eng. 20 (5) (2008) 692–702. 228
  • 229. References – Quality of Ontologies•  K. Verspoor, D. Dvorkin, K. B. Cohen, L. Hunter, Ontology quality assurance through analysis of term transformations, Bioinformatics 25 (12).•  D. Vrandecic, Y. Sure, How to design better ontology metrics, in: E. Franconi, M. Kifer, W. May (Eds.), ESWC, Vol. 4519 of Lecture Notes in Computer Science, Springer, 2007, pp. 311–325.•  J. Yu, J. A. Thom, A. Tam, Ontology evaluation using Wikipedia categories for browsing, in: CIKM ’07: Proceedings of the sixteenth ACM Conference on Information and Knowledge Management, ACM, New York, NY, USA, 2007, pp. 223–232. 229
  • 230. Outline•  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape /•  Quality of Conceptual Schemas –  frameworks / metamodels / dimensions / metrics / groups of schemas•  Quality of Ontologies –  frameworks / metamodels / dimensions / metrics•  Conclusions 230
  • 231. Scope of the Research•  Research on ontology quality (ontology evaluation) is more fragmented than research on conceptual schema quality –  Explanation •  Different perspectives on ontologies •  Different intended usages of ontologies •  Quite recent research (although several approaches have been proposed so far) •  More interest in measurability •  After earlier works (Burton-Jones 2004, OntoQA, oQual), more interest in specific dimensions and evaluation 231
  • 232. Frameworks, Metamodels, Dimensions and Metrics in Ontologies•  More attention to metrics for model characterization and evaluation than to comprehensive frameworks and metamodels –  When a quality metamodel is considered, the semiotic approach is adopted –  The conceptual distinction between quality dimensions and metrics is blurry –  Explanation: •  Conceptual schemas: attention to CSs as models of Information Systems prevails •  (Web) ontologies: attention to ontologies as supporting representational objects for a given process/task prevails 232
  • 233. Task/Process-driven Quality•  Low relevance for conceptual schemas•  High relevance for ontologies due to the exploitation of ontologies in different applications [Figure: Krogstie and Solvberg framework diagram extended with cognitive quality (mental representation) and process-driven quality (process/task); annotated with Burton-Jones et al. 2004] 233
  • 234. Metrics, Lexical & Logical Analysis•  Conceptual schemas –  Diagrammatic nature + graphical display (e.g. crossing, etc) - consistency•  Ontologies –  Linguistic & logical nature + consistency by reasoning + lexicon + logical structure - graphical display 234
  • 235. Metrics & Structural Analysis•  Conceptual schemas –  Flat structure + concept-relationship-graph properties - structural analysis of concept hierarchical properties•  Ontologies –  Hierarchical structure + many analytics for concept taxonomies - concept-relationship-graph properties 235
  • 236. THANK YOU FOR YOUR ATTENTION. This presentation will be available online soon 236