Tutorial at CAISE 2010: Information Quality in the Web Era   C. Batini & Matteo Palmonari        Department of Computer Scie...
Outline•  Motivation [Palmonari]  –  the Web era / information quality meeting     ontologies / the ontology landscape /• ...
Outline•  Motivation:  –  the Web era / information quality meeting     ontologies / the ontology landscape /•  Quality of...
The Web era is characterized by… the “Big Data” phenomenon
How to make sense of all these data?                                   5
Documents’ and duplicates’ size over time                                         6
How to make sense of all these data?                    Data management needs                         data quality        ...
How to make sense of all these data?                    Data management needs                         data quality        ...
Data/information heterogeneity in Information SystemsInformation is available in different formats and is represented   ac...
Tutorial Background - Data Quality (Structured Data)23rd International Conference on Conceptual Modeling (ER 2004), Shanga...
Tutorial Background – Towards Information Quality (Heterogeneous Data) Tutorial at ER 08, Barcelona, Spain   ...
How to make sense of all these data?                Together with automatic techniques for                 information ext...
Of course, the “Semantic Web” perspective•  Make the semantics of                                                         ...
Ontologies out of the Semantic Web •  But also for those who are skeptical about the Semantic Web, •  Ontologies (e.g. OWL...
Ontology + “Information Systems”                                   15
Ontology + “Software Engineering”                                    16
Ontologies & Semantic Resources •  KB - Axiomatic ontologies (e.g. SUMO)   –  Terminological (intensional/schema) level:    ...
Need for ontology evaluation•  Ontology “Quality”  Ontology Evaluation•  Quality of ontologies matters!  –  In particular...
Searching for “Customer” with Sindice                                        19
Searching for “Customer” with Watson                                       20
Searching for “Customer” on Swoogle                                      21
Searching for “Customer” on Swoogle          (refined search)                                      22
Ontologies and semantic resources should be considered in comprehensive studies about information quality in the Web era  ...
Structured data and ontologies•  Structured data                   •  Ontologies (KB)  Instances                          ...
Ontologies and their grandparents•  Structured data                  •  Ontologies (KB)  Instances                        ...
Outline•  Motivation:  –  the Web era / information quality meeting     ontologies / the ontology landscape /•  Quality of...
Outline•  Motivation:  –  the Web era / information quality meeting     ontologies / the ontology landscape /•  Quality of...
# of slides •  About 130 → 30 •  I will mainly provide a guided introduction to the slides
In a database, quality can be investigated… •  At model (language) level •  At schema (model) level •  At instance (val...
Data quality dimensions                          30
Acronym     Data Quality DimensionTDQM        Accessibility, Appropriateness, Believability, Completeness, Concise/Consist...
Reference forquality of data in databases    2006                  32
Here we focus on•  Model level• Schema level•  Data level                                33
Quality of Conceptual Schemas - contents•  Frameworks and Metamodels proposed•  Quality of Schemas  –  Classifications, Di...
Quality of schemas                     35
Some figures on proposed approaches in the literature       (from Mehmood 2009, citing Moody 2005)                      Re...
Metaschema of approaches              Formal          Meta       Classification            Framework        schema        ...
Krogstie & Solvberg (the Scandinavians)         Proposals                                Meta     Classification          ...
Proposals  Formal          Meta       ClassificationFramework        schema                Quality               dimension...
Frameworks for schema quality                                40
Krogstie and Solvberg framework                                                                 Social           Participa...
Krogstie and Solvberg framework                                                                  Social           Particip...
Krogstie and Solvberg framework                        Correspondence between                                      partici...
Krogstie and Solvberg framework                                                                 Social           Participa...
Krogstie and Solvberg framework                                                                   Social             Parti...
Correspondence between participant knowledge and       Krogstie and Solvberg framework  the externalized conceptual model ...
Krogstie and Solvberg framework                                                                  SocialIt is reflected by ...
Krogstie and Solvberg framework                                                              Social           Participant ...
More formally•  G, the goals of the modeling task.•  L, the language extension, i.e., the set of all statements that are  ...
Main quality types•  Physical quality: The basic quality goal is that the model M is  available for the audience.•  Empiri...
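The set-based reading of the framework can be made concrete with a small sketch. It assumes, as in the Lindland, Sindre and Sølvberg formulation that this framework builds on, that the model M, the language extension L and the domain D are all sets of statements; M and D, the function names and the sample statements are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch (assumption): M, L and D are modelled as Python sets of statements.

def syntactic_errors(model: set, language: set) -> set:
    """Statements in the model that the language does not allow (M \\ L)."""
    return model - language


def invalid_statements(model: set, domain: set) -> set:
    """Statements in the model that do not hold in the domain (M \\ D)."""
    return model - domain


def missing_statements(model: set, domain: set) -> set:
    """Domain statements that the model fails to state (D \\ M)."""
    return domain - model


# Hypothetical example statements.
L = {"Student attends Course", "Course held in Room",
     "Student has Wings", "Professor teaches Course"}
D = {"Student attends Course", "Course held in Room", "Professor teaches Course"}
M = {"Student attends Course", "Student has Wings"}

print("syntactic errors:", syntactic_errors(M, L))
print("invalid:", invalid_statements(M, D))
print("missing:", missing_statements(M, D))
```

Under these assumptions, an empty `syntactic_errors` result corresponds to syntactic correctness, while empty `invalid_statements` and `missing_statements` results correspond to validity and completeness, the two facets of semantic quality.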
Framework for language (model) quality                                         51
Framework for language (model) quality           Participant                                 Social actor           knowle...
Main quality typesDomain appropriateness. This relates the language and the domain.  Ideally, the conceptual basis must be...
Main quality typesComprehensibility appropriateness relates the language to the  social actor interpretation. The goal is ...
Metamodels             55
Shanks et al. composite modelTheory based       Domain                Quality type                       Means            ...
Metamodels – Arab/French: Mehmood, Cherfi et al. 2009, based on goals, questions, metrics Quality goal      Q. D...
Metamodel instantiation Quality goal              Ease of change Dimension         Complexity                Maintainability ...
Metamodels – Vassiliadis et al. For DWs            Quality goal              Q. Dimension Improvement                     ...
Quality goal                Q. Dimension Comparison                  Improvement                              Factor      ...
Schema Quality Dimensions                            61
The origins…Batini, Ceri, Navathe 1991                    Formal Meta Classifica                    Frame schema tion     ...
Batini, Ceri, Navathe 1991Q. Dimension        DefinitionCompleteness        Represents all (only) relevant features ofPert...
CompletenessCompleteness measures theextent to which a conceptual           Students have aSchema includes all the        ...
PertinencePertinence measures how manyunnecessary conceptual            Students have a                                  c...
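Completeness and pertinence as described above can be sketched as two simple set computations. This is only an illustration under the assumption that both the requirements and the schema can be reduced to sets of concept names; the function names and the sample concepts are hypothetical, and the slides do not prescribe this exact formula.

```python
# Minimal sketch (assumption): requirements and schema are sets of concept names.

def completeness(schema: set, required: set) -> float:
    """Fraction of required concepts that the schema represents."""
    if not required:
        return 1.0  # nothing is required, so nothing is missing
    return len(schema & required) / len(required)


def unnecessary_concepts(schema: set, required: set) -> set:
    """Concepts in the schema that no requirement asks for (low pertinence)."""
    return schema - required


# Hypothetical requirements and schema.
required = {"Student", "Course", "Exam", "Professor"}
schema = {"Student", "Course", "Parking"}

print("completeness:", completeness(schema, required))
print("unnecessary:", unnecessary_concepts(schema, required))
```

Here the schema covers two of the four required concepts and carries one concept that the requirements never mention, so it is both incomplete and not fully pertinent.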
Correctness - syntacticConcerns the correct use of thecategories of the model in representingrequirements.                ...
Correctness - semanticCorrectness with respect to requirementsconcerns the correct representation ofThe requirements in te...
Minimality/Redundancy                                              1,nA schema is minimal if every      Studentpart of the...
Expressiveness/ReadabilityIntuitively, a schema is readable whenever it representsthe meaning of the reality represented b...
Diagrammatic readabilityWith regard to the diagrammatic representation,readability can be expressed objectively by anumber...
Unreadable schema              Works                                  Manages  Head                                       ...
A Readable schema   Floor   Located                                  Manages                     Head                     ...
Is diagrammatic readability objective?                                                                                    ...
But… personal experience in China, Beda University, about 1985. Question to Chinese professors: W...
ExpressivenessThe second issue addressed by readability is thecompactness of schema representation. Among thedifferent con...
Transformation that preserves information content and enhances compactness/expressiveness         Employee ...
Normalization             Unnormalized ER schema                Employee-Project                Employee #                ...
Scandinavians (1994-    Formal        Meta Classification  Framework      schema                Quality               dime...
Scandinavians (1994-    Formal        Meta Classification  Framework      schema                Quality               dime...
Main model (schema) quality dimensionsPhysical quality•   Externalization, number of statements on the domain not yet stat...
Language quality dimensions - 1 May refer   a. to the language or else   b. to the relationship between language and other iss...
Language quality dimensions - 2 Referring to the relationship between the language and other issues Domain appropriateness, th...
More of Pragmatic quality•  Social pragmatic quality (to what extent people understand and are able to use the models) and...
Arab French (2002-  Formal        Meta ClassificationFramework      schema              Quality             dimension     ...
Cherfi et al. classification•    Specification      –    Legibility             •    Clarity             •    Minimality  ...
Definitions – 1Q. Dimension                    DefinitionClarity                         is an aesthetic criterion, based ...
Definitions - 1Q. Dimension             DefinitionCompleteness             The schema represents all relevant features in ...
Cherfi et al. classification – metrics                   (examples)Specification  Legibility  –  Clarity    # of concepts ...
Metrics for structural complexity•    # of associations•    # of dependencies•    # of aggregations•    Depth inheritance ...
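The counting metrics listed above are straightforward to compute once a schema is available as data. The sketch below assumes a hypothetical representation in which a schema is a list of `(kind, source, target)` triples and inheritance is single and acyclic; neither the representation nor the sample classes come from the slides.

```python
from collections import Counter

# Hypothetical schema: (relationship kind, source class, target class).
relationships = [
    ("association", "Student", "Course"),
    ("association", "Professor", "Course"),
    ("aggregation", "Department", "Professor"),
    ("dependency", "Enrollment", "Course"),
    ("generalization", "GraduateStudent", "Student"),
    ("generalization", "Student", "Person"),
    ("generalization", "Professor", "Person"),
]


def count_by_kind(rels):
    """Number of relationships of each kind (# of associations, etc.)."""
    return Counter(kind for kind, _, _ in rels)


def inheritance_depth(rels):
    """Longest chain of generalizations (assumes single, acyclic inheritance)."""
    parent = {child: sup for kind, child, sup in rels if kind == "generalization"}

    def depth(cls):
        steps = 0
        while cls in parent:  # walk up until a root class is reached
            cls = parent[cls]
            steps += 1
        return steps

    classes = set(parent) | set(parent.values())
    return max((depth(c) for c in classes), default=0)


print(count_by_kind(relationships))
print("depth of inheritance:", inheritance_depth(relationships))
```

Multiple inheritance would need a graph traversal instead of the single `parent` map used here.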
Moody 1998 -                        Meta Classification               Formal                       schemaMethod for   Fram...
Moody’s classification•    Completeness•    Integrity•    Flexibility•    Understandability•    Correctness•    Simplicity•...
Moody’s classific. of Quality dim. and metrics - 1Dimension        DefinitionCompleteness     The schema contains all the ...
Moody’s classific. of Quality dim. and metrics - 2Dimension           DefinitionUnderstandability   Ease with which the sc...
Moody’s classific. of Quality dim. and metrics - 3Dimension        DefinitionSimplicity       The schema contains the mini...
Moody’s monumental contribution to empirical quality/      quality of diagrammatic notations (TSE 2009)Semiotic clarity – ...
Moody’s monumental contribution to empirical quality/     quality of diagrammatic notations (TSE 2009)Complexity managemen...
Moody’s monumental contribution to empirical quality/     quality of diagrammatic notations (TSE 2009)Visual expressivenes...
Moody’s monumental contribution to empirical quality/      quality of diagrammatic notations (TSE 2009)               Inte...
Others…
Genero et al. 2005 -  Formal      Meta         ClassificaFramework    schema           tion              Quality          ...
Genero et al classificationMaintainability is influenced by the following subcharacteristics:•  Understandability: the eas...
Herden           Formal      Meta           Frame                Classification                      schema            wor...
Herden classification•    Correctness•    Consistency•    Scope•    Level of detail•    Completeness•    Minimality•    Ab...
HerdenDimension                 Definition(Technical) Correctness   Correctness of concepts w.r.t reqs.(Technical) Consist...
Metadata in Herden’s classification•    Description•    Relevance•    Measuring•    Metric•    Degree of automation•    Ob...
Poels et alFormal      Meta     ClassificaFrame      schema       tion work          Quality         dimension        Qual...
Poels et alInterested in•  Perceived semantic quality•  Perceived pragmatic qualityTo understand their relationship with1....
Poels et al. classificationQuality#   Quality dimension     DefinitionPSQ1                             The schema represen...
Poels et al. classificationQuality#   Quality dimension       DefinitionPSQ1       Correctness/            The schema repr...
Poels et al. general findings                     Perceived                    usefulness            0,1                  ...
Comparisonamong proposals
Physical and empirical qualityAuthor(s)/Types of       Batini     Scand.   Moody   ArabFrench   Genero et   Herden   Poels...
Syntactic and semantic qualityAuthor(s)/Types of           Batini     Scand.   Moody   ArabFrench   Genero    Herden   Poe...
Pragmatic, knowledge and process qualityAuthor(s)/Types of       Batini et   Scand.   Moody   ArabFren   Genero et   Herde...
Specific dimensions
Sheldon classification for Inheritance hierarchiesViewpoints.•  (1) The deeper a class is in the hierarchy, the higher the...
Sheldon classification for Inheritance hierarchies•  Maintainability•  Understandability•  Modifiability                  ...
Schema and Data Quality together
PersonWhen a schema is defined, quality                                                               ID   Name    Surname...
Experimentally investigated     by Arab French     Quality at schema level             Impact     Quality at data level   ...
Improving the quality    of schemas                        121
Methods•  Origins: achieving normal form    Decomposition techniques•  Scandinavian: derived from the framework•  Through...
Derived from framework            Syntactic quality•  Error prevention through syntax directed   editors•  Error detection...
Derived from framework              Semantic quality•  Consistency checking  –  Based on a logical description  –  Based o...
Derived from framework               Pragmatic quality•  Audience training•  Inspection and walkthroughs•  Transformations...
Derived from framework                 Social quality•  Integration  –  Intra project  –  Inter project  –  Inter organiza...
Through schema transformation                                127
The Assenova Johannesons approach              Dimensions consideredDimension         DefinitionExplicitness      Requirem...
Dimensions and transformations                                   Explicit   Size   Rule sim-   Rule uni-   Query      Stab...
Example transformation                 Partial attribute- The size of the schema increases (-)- Introducing the entity EMP...
Example of increased stabilityIntroducing different categories of employees can be done  in the new schema without violati...
Quality in data integration      architectures
The approach of Akoka et al. (2007)General statement: In DI Architectures quality of data  and quality of schemas have to ...
The approach of De Conseicao et al (2007)   Relevant qualities to be evaluated in DI arch.Given a DI Architecture defined ...
The approach of De Conseicao et al (2007)      Relevant qualities to be evaluated in DI arch. Given a DI Architecture defi...
The H. Dai et al. approach (2006)•  Focus on Column Heterogeneity                      e-mail, phone n.   Many e-mail and ...
The H. Dai et al. approach (2006)Focus on Column HeterogeneityHeterogeneity dimensions  –  Number of semantic types result...
Moody’s approach        Classification of schemas related by integration Quality categ.   Definition Integration      Le...
The Chai approach           Matchability of schemas•  Focus on the evolution of a Data   Integration system, and the cost ...
Cases for matching mistakes•  Predict a spurious match•  Miss a match•  Predict a wrong match•  Strategy to improve matcha...
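The three kinds of matching mistakes can be told apart by comparing a matcher's predictions with a reference mapping. The sketch below is an illustration only: it assumes one-to-one matches stored as dictionaries from source attribute to target attribute, and the function name, the interpretation of each case and the sample attributes are assumptions rather than part of the original approach.

```python
def classify_matches(predicted: dict, truth: dict, source_attributes: list):
    """Split source attributes into spurious, missed and wrong matches.

    spurious: a match is predicted although no true match exists
    missed:   a true match exists but none is predicted
    wrong:    a match is predicted but it differs from the true one
    """
    spurious, missed, wrong = set(), set(), set()
    for attr in source_attributes:
        p = predicted.get(attr)
        t = truth.get(attr)
        if p is None and t is not None:
            missed.add(attr)
        elif p is not None and t is None:
            spurious.add(attr)
        elif p is not None and p != t:
            wrong.add(attr)
    return spurious, missed, wrong


# Hypothetical source attributes, reference mapping and matcher output.
source_attributes = ["name", "phone", "zip", "notes"]
truth = {"name": "full_name", "phone": "telephone", "zip": "postal_code"}
predicted = {"name": "full_name", "phone": "fax", "notes": "comments"}

spurious, missed, wrong = classify_matches(predicted, truth, source_attributes)
print("spurious:", spurious, "missed:", missed, "wrong:", wrong)
```

Counting the attributes in each set gives a rough picture of how matchable a schema is for a given matcher.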
Batini et al. 2010Potential information content
The data architecture of a set of databases is the           allocation of concepts and tables              across the DB ...
Data integration technologies•    Virtual data integration•    Data Warehouses•    Application integration•    Consolidation
Potential information content                                                  Global                                  Boa...
Potential information content•  Given a schema I, global schemas resulting   from virtual integration of schemas S1,   S2,...
Example     E1	              E2	              E6	                                     Q11                Q12S1            ...
Quality of the documentation      for large related     groups of schemas                               147
Why is integration alone not enough?             ?                                Hundreds                               o...
Relationships investigated•  Integration•  Abstraction•  Abstraction/Integration                                      149
Abstraction              150
AbstractionDepartment       Employee          City                                                  Department            ...
Integration + abstraction                            152
Information Quality in the Web Era

The tutorial was presented at CAISE 2010. It discusses the state of the art in research addressing the quality of data at the conceptual level (conceptual schemas) and of ontologies.

Published in: Technology, Education

Information Quality in the Web Era

  1. 1. Tutorial at CAISE 2010: Information Quality in the Web Era. C. Batini & Matteo Palmonari, Department of Computer Science, Communication and Systems, University of Milano Bicocca, [batini;palmonari]@disco.unimib.it
  2. 2. Outline•  Motivation [Palmonari] –  the Web era / information quality meeting ontologies / the ontology landscape /•  Quality of data (conceptual level) [Batini] –  frameworks / metamodels / dimensions / metrics / groups of schemas•  Quality of ontologies [Palmonari] –  frameworks / metamodels / dimensions / metrics•  Conclusions [Palmonari+Batini] 2
  3. 3. Outline•  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape /•  Quality of data (conceptual level) –  frameworks / metamodels / dimensions / metrics / groups of schemas•  Quality of ontologies –  frameworks / metamodels / dimensions / metrics•  Conclusions 3
  4. 4. The Web era is characterized by… the “Big Data” phenomenon
  5. 5. How to make sense of all these data? 5
  6. 6. Documents’ and duplicates’ size over time
  7. 7. How to make sense of all these data? Data management needs data quality 7
  8. 8. How to make sense of all these data? Data management needs data quality 8
  9. 9. Data/information heterogeneity in Information Systems. Information is available in different formats and is represented according to different models. Need to consider information quality for heterogeneous information sources.
     Structured data: Place | Country | Population | Main economic activity: Portofino | Italy | 7.000 | Tourism
     Image: [image]
     Map: Portofino
     Text: Dear Laure, I try to describe the wonderful harbour of Portofino as I have seen this morning a boat is going in, other boats are along the wharf. Small pretty buildings and villas are looking on to the harbour.
  10. 10. Tutorial Background - Data Quality (Structured Data). 23rd International Conference on Conceptual Modeling (ER 2004), Shanghai. A Survey of Data Quality Issues in Cooperative Information Systems. Carlo Batini, Università di Milano “Bicocca”, batini@disco.unimib.it; Monica Scannapieco, Università di Roma “La Sapienza”, monscan@dis.uniroma1.it
  11. 11. Tutorial Background – Towards Information Quality (Heterogeneous Data). Tutorial at ER 08, Barcelona, Spain. Quality of Data, Textual Information and Images: a comparative survey. Speaker: C. Batini. Other authors: F. Cabitza, G. Pasi, R. Schettini. Dipartimento di Informatica, Sistemistica e Comunicazione, Università di Milano Bicocca, Milano, Italy, batini@disco.unimib.it
  12. 12. How to make sense of all these data? Together with automatic techniques for information extraction, processing & integration, we also need automatic techniques for assessing the quality of information. Information quality for information shared, consumed and delivered on the Web. Increasing attention to information semantics.
  13. 13. Of course, the “Semantic Web” perspective •  Make the semantics of information explicit with Web-compliant ontologies* by –  sharing conceptualizations/terminologies on the Web –  sharing data on the Web •  Models, languages & technologies –  E.g. RDF, RDFS, OWL, SKOS [Figures labelled 1998 and 2006] * For now, let’s consider a very broad definition: An ontology is a specification of a conceptualization. T. R. Gruber. A translation approach to portable ontologies. Knowledge Acquisition, 5(2):199-220, 1993.
  14. 14. Ontologies out of the Semantic Web •  But also for those who are skeptical about the Semantic Web, •  ontologies (e.g. OWL ontologies, linked data, thesauri) can be considered useful external resources to use in –  Conceptual modeling –  Data integration –  Document management –  Service Oriented Computing –  Information retrieval –  … –  Software Engineering –  Information System Design
  15. 15. Ontology + “Information Systems” 15
  16. 16. Ontology + “Software Engineering” 16
  17. 17. Ontologies & Semantic Resources •  KB - Axiomatic ontologies (e.g. SUMO) –  Terminological (intensional/schema) level: concepts, relationships, axioms specifying logical constraints –  Assertional (extensional/data) level: instances, typing, relations between instances •  LD - Linked data on the Web (e.g. DBpedia) –  RDF data, usually light-weight KBs •  Th – Thesauri (e.g. WordNet) –  Lexical ontologies: terms, no schema vs. instances •  In synthesis, the ontology landscape includes: –  Shared Vocabulary (KB,LD,Th) –  Modeling principles (KB) –  Logical theories supporting reasoning (KB) –  Web-compliant representations of models and data (KB,LD,Th)
  18. 18. Need for ontology evaluation •  Ontology “Quality” → Ontology Evaluation •  Quality of ontologies matters! –  In particular, when ontologies: •  are built to support specific applications (their quality impacts the application's effectiveness) •  are searched on the Web, reused, extended –  Many ontologies to choose from! –  E.g. suppose that you need an ontology describing customers and the business domain
  19. 19. Searching for “Customer” with Sindice 19
  20. 20. Searching for “Customer” with Watson 20
  21. 21. Searching for “Customer” on Swoogle 21
  22. 22. Searching for “Customer” on Swoogle (refined search) 22
  23. 23. Ontologies and semantic resources should be considered in comprehensive studies about information quality in the Web era Tough work! Let’s start from the beginning: ontologies and structured data 23
  24. 24. Structured data and ontologies •  Structured data: Instances; Logical Schemas; Conceptual schemas •  Ontologies (KB): Instances; Schema •  Tight vs loose instance-schema coupling •  A - Conceptual level representations - Externalized models (semiotic objects) - Constraints on domain (data) •  Diagrammatic models (ER, UML, ORM) vs. logical models supporting reasoning
  25. 25. Ontologies and their grandparents •  Structured data: Instances; Logical Schemas; Conceptual schemas •  Ontologies (KB): Instances; Schema / Terminologies •  A - Conceptual level representations - Externalized models (semiotic objects) - Constraints on domain (data) •  Diagrammatic models (ER, UML, ORM) vs. logical models supporting reasoning •  In this (mini) tutorial we will: - focus on the modeling level: “Quality of Conceptual Schemas and Ontologies” - provide a guided tour on the topic by discussing only part of the material (soon available online) - focus on common aspects and, in particular, differences
  26. 26. Outline•  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape /•  Quality of data (conceptual level) –  frameworks / metamodels / dimensions / metrics / groups of schemas•  Quality of ontologies –  frameworks / metamodels / dimensions / metrics•  Conclusions 26
  27. 27. Outline•  Motivation: –  the Web era / information quality meeting ontologies / the ontology landscape /•  Quality of Conceptual Schemas –  frameworks / metamodels / dimensions / metrics / groups of schemas•  Quality of ontologies –  frameworks / metamodels / dimensions / metrics•  Conclusions 27
  28. 28. # of slides •  About 130 → 30 •  I will mainly provide a guided introduction to the slides
  29. 29. In a database, quality can be investigated… •  At model (language) level •  At schema (model) level •  At instance (value/data) level
  30. 30. Data quality dimensions 30
  31. 31. Acronym: Data Quality Dimensions
     TDQM: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of manipulation, Value added, Free of error, Interpretability, Objectivity, Relevance, Reputation, Security, Timeliness, Understandability
     DWQ: Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Accessibility (System, Transactional, Security), Usefulness (Interpretability), Timeliness (Currency, Volatility), Responsiveness, Completeness, Credibility, Accuracy, Consistency, Interpretability
     TIQM: Inherent dimensions: Definition conformance (consistency), Completeness, Business rules conformance, Accuracy (to surrogate source), Accuracy (to reality), Precision, Nonduplication, Equivalence of redundant data, Concurrency of redundant data. Pragmatic dimensions: Accessibility, Timeliness, Contextual clarity, Derivation integrity, Usability, Rightness (fact completeness), Cost
     AIMQ: Accessibility, Appropriateness, Believability, Completeness, Concise/Consistent representation, Ease of operation, Freedom from errors, Interpretability, Objectivity, Relevancy, Reputation, Security, Timeliness, Understandability
     CIHI: Dimensions: Accuracy, Timeliness, Comparability, Usability, Relevance. Characteristics: Over-coverage, Under-coverage, Simple/correlated response variance, Reliability, Collection and capture, Unit/Item non response, Edit and imputation, Processing, Estimation, Timeliness, Comprehensiveness, Integration, Standardization, Equivalence, Linkage ability, Product/Historical comparability, Accessibility, Documentation, Interpretability, Adaptability, Value
     DQA: Accessibility, Appropriate amount of data, Believability, Completeness, Freedom from errors, Consistency, Concise Representation, Relevance, Ease of manipulation, Interpretability, Objectivity, Reputation, Security, Timeliness, Understandability, Value added
     IQM: Accessibility, Consistency, Timeliness, Conciseness, Maintainability, Currency, Applicability, Convenience, Speed, Comprehensiveness, Clarity, Accuracy, Traceability, Security, Correctness, Interactivity
     ISTAT: Accuracy, Completeness, Consistency
     AMEQ: Consistent representation, Interpretability, Ease of understanding, Concise representation, Timeliness, Completeness, Value added, Relevance, Appropriateness, Meaningfulness, Lack of confusion, Arrangement, Readable, Reasonability, Precision, Reliability, Freedom from bias, Data Deficiency, Design Deficiency, Operation Deficiencies, Accuracy, Cost, Objectivity, Believability, Reputation, Accessibility, Correctness, Unambiguity, Consistency
     COLDQ: Schema: Clarity of definition, Comprehensiveness, Flexibility, Robustness, Essentialness, Attribute granularity, Precision of domains, Homogeneity, Identifiability, Obtainability, Relevance, Simplicity/Complexity, Semantic consistency, Syntactic consistency. Data: Accuracy, Null Values, Completeness, Consistency, Currency, Timeliness, Agreement of Usage, Stewardship, Ubiquity. Presentation: Appropriateness, Correct Interpretation, Flexibility, Format precision, Portability, Consistency, Use of storage. Information policy: Accessibility, Metadata, Privacy, Security, Redundancy, Cost
     DaQuinCIS: Accuracy, Completeness, Consistency, Currency, Trustworthiness
     QAFD: Syntactic/Semantic accuracy, Internal/External consistency, Completeness, Currency, Uniqueness
     CDQ: Schema: Correctness with respect to the model, Correctness with respect to Requirements, Completeness, Pertinence, Readability, Normalization. Data: Syntactic/Semantic Accuracy, Semantic Accuracy, Completeness, Consistency, Currency, Timeliness, Volatility, Completability, Reputation, Accessibility, Cost
  32. 32. Reference forquality of data in databases 2006 32
  33. 33. Here we focus on•  Model level• Schema level•  Data level 33
  34. 34. Quality of Conceptual Schemas - contents•  Frameworks and Metamodels proposed•  Quality of Schemas –  Classifications, Dimensions & Metrics: main proposals –  Comparison of proposals –  Improving the quality of schemas•  Quality of groups of schemas –  Quality of Data Integration Architectures –  Quality of the documentation for large related groups of schemas 34
  35. 35. Quality of schemas 35
  36. 36. Some figures on proposed approaches in the literature: frameworks and metamodels (from Mehmood 2009, citing Moody 2005)
                          Research   Practice   Mixed
# of proposals               29          8         2
% of total                   74%        21%        5%
Empirically validated         6          0         1
%                            20%         0%       50%
Generalizable                 5          0         0
%                            17%         0%        0%
Not generalizable            24          8         2
%                            83%       100%      100%
Generalizable means that the proposal can be applied to conceptual models in general and is not specific to, e.g., ER.
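The percentages on this slide can be recomputed from the raw counts. The sketch below is only an illustrative check: the dictionary layout and the helper name `pct` are assumptions, and the counts are copied from the slide. The recomputed values agree with the reported ones to within one percentage point; the slide's rounding convention is not stated.

```python
# Counts copied from the slide; the dictionary layout and names are illustrative.
counts = {
    "Research": {"proposals": 29, "validated": 6, "generalizable": 5},
    "Practice": {"proposals": 8, "validated": 0, "generalizable": 0},
    "Mixed": {"proposals": 2, "validated": 1, "generalizable": 0},
}


def pct(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


total = sum(row["proposals"] for row in counts.values())

# Share of each kind of proposal among all proposals.
share_of_total = {k: pct(v["proposals"], total) for k, v in counts.items()}
# Within each kind: share that is empirically validated / generalizable.
validated = {k: pct(v["validated"], v["proposals"]) for k, v in counts.items()}
generalizable = {k: pct(v["generalizable"], v["proposals"]) for k, v in counts.items()}
not_generalizable = {k: 100.0 - generalizable[k] for k in counts}

for kind in counts:
    print(
        f"{kind:9s} share={share_of_total[kind]:5.1f}%  "
        f"validated={validated[kind]:5.1f}%  "
        f"generalizable={generalizable[kind]:5.1f}%"
    )
```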
  37. 37. Metaschema of approaches
[Figure: three kinds of contribution]
•  Formal framework: concepts and paradigms involved in a formally grounded approach to quality
•  Metaschema: concepts and paradigms involved in the life cycle of quality, namely in the production, assessment and improvement activities
•  Classification: one-, two- or three-level taxonomies
   –  Quality dimension
   –  Quality subdimension
   –  Metrics
   –  Examples
   –  Experiments
  38. 38. Proposals
[Figure: the proposals placed on the metaschema of approaches]
•  Formal framework: Krogstie & Solvberg (the Scandinavians)
•  Metaschema: Shanks; Arab French; Vassiliadis
•  Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments): the origins – Batini et al.; Scandinavians; Arab French; Moody; Genero et al.; Herden; Poels
  39. 39. Proposals
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  40. 40. Frameworks for schema quality 40
  41. 41. Krogstie and Solvberg framework
[Figure: framework diagram. Nodes: Goal of modeling, Modeling domain, Model externalization, Language extension, Participant knowledge, Social actor interpretation, Technical actor interpretation. Quality types labelling the links: Physical quality, Empirical quality, Syntactic quality, Semantic quality, Perceived semantic quality, Social pragmatic quality, Technical pragmatic quality, Social quality, Organizational quality.]
  42. 42. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (semantic quality): correspondence between the conceptual model and the domain.
  43. 43. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (perceived semantic quality): correspondence between participant knowledge and individual interpretation.
  44. 44. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (syntactic quality): correspondence between the conceptual model and the language.
  45. 45. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (pragmatic quality): correspondence between the conceptual model and the audience’s interpretation of it.
  46. 46. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (physical quality): correspondence between participant knowledge and the externalized conceptual model.
   –  Externalization: the knowledge of social actors has been externalized in the model
   –  Internalizability: the model is persistent
  47. 47. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (empirical quality): it is reflected by the error frequency when a model is read or written, so by readability and clarity.
  48. 48. Krogstie and Solvberg framework
[Figure: the framework diagram with a callout]
Callout (social quality): agreement on participant knowledge and individual interpretation.
  49. 49. More formally
•  G, the goals of the modeling task.
•  L, the language extension, i.e., the set of all statements that are possible to make according to the graphemes, vocabulary, and syntax of the modeling languages used.
•  D, the domain, i.e., the set of all statements that can be stated about the situation at hand.
•  M, the model (schema) itself.
•  Ks, the relevant explicit knowledge of those involved in modeling. A subset of these is actively involved in modeling, and their explicit knowledge is indicated by KM.
•  I, the social actor interpretation, i.e., the set of all statements that the audience thinks that an externalized model consists of.
•  T, the technical actor interpretation, i.e., the statements in the model as interpreted by modeling tools.
  50. 50. Main quality types
•  Physical quality: the basic quality goal is that the model M is available to the audience.
•  Empirical quality deals with predictable error frequencies when a model is read or written by different users, with coding (e.g. shapes of boxes), and with HCI ergonomics for documentation and modeling tools. For instance, laying out a graph so that lines do not cross is a means to address the empirical quality of a model.
•  Syntactic quality is the correspondence between the model M and the language extension L.
•  Semantic quality is the correspondence between the model M and the domain D. This includes validity and completeness.
•  Perceived semantic quality is the analogous correspondence between the audience interpretation I of a model M and the audience's current knowledge K of the domain D.
•  Pragmatic quality is the correspondence between the model M and the audience’s interpretation and application of it (I).
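Since M, L and D are all defined as sets of statements, one way to read "correspondence" is as set difference. The sketch below assumes that reading: statements are modeled as plain strings, and the function names and the toy statements are hypothetical, not part of the framework's own notation.

```python
def syntactic_errors(M, L):
    """Statements in the model that the language cannot express (M minus L)."""
    return M - L


def invalid_statements(M, D):
    """Statements in the model that are not true of the domain (M minus D)."""
    return M - D


def missing_statements(M, D):
    """Domain statements the model does not yet contain (D minus M)."""
    return D - M


# Toy domain: what is actually true of students in the situation at hand.
D = {"Student has Code", "Student has Name", "Student has PlaceOfBirth"}
# Toy language extension: every well-formed statement (a superset of D here).
L = D | {"Student has Salary"}
# A model with one invalid statement and one ill-formed statement.
M = {"Student has Code", "Student has Name", "Student has Salary", "Student Code has"}

print("syntactic errors:", syntactic_errors(M, L))
print("invalid:", invalid_statements(M, D))
print("missing:", missing_statements(M, D))
```

Under this reading, syntactic quality is perfect when `syntactic_errors` is empty, and semantic quality (validity plus completeness) is perfect when both `invalid_statements` and `missing_statements` are empty.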
  51. 51. Framework for language (model) quality 51
  52. 52. Framework for language (model) quality
[Figure: framework diagram. Nodes: Goal of modeling, Modeling domain, Model externalization, Language extension, Participant knowledge, Social actor interpretation, Technical actor interpretation. Appropriateness types labelling the links to the language: Domain appropriateness, Participant appropriateness, Modeler appropriateness, Comprehensibility appropriateness, Tool appropriateness, Organizational appropriateness.]
  53. 53. Main quality types
Domain appropriateness relates the language to the domain. Ideally, the conceptual basis must be powerful enough to express anything in the domain, i.e. it should not have what is termed construct deficit. On the other hand, it should not be possible to express things that are not in the domain, i.e. what is termed construct excess. Domain appropriateness is primarily a means to achieve semantic quality.
Participant appropriateness relates the social actors’ explicit knowledge to the language. Participant appropriateness is primarily a means to achieve pragmatic quality for comprehension, learning and action.
Modeler appropriateness relates the language extension to the participant knowledge. The goal is that there are no statements in the explicit knowledge of the modeler that cannot be expressed in the language. Modeler appropriateness is primarily a means to achieve semantic quality.
  54. 54. Main quality types
Comprehensibility appropriateness relates the language to the social actor interpretation. The goal is that the participants in the modeling effort using the language understand all the possible statements of the language. Comprehensibility appropriateness is primarily a means to achieve empirical and pragmatic quality.
Tool appropriateness relates the language to the technical audience interpretations. For tool interpretation, it is especially important that the language lends itself to automatic reasoning. This requires formality (i.e. both formal syntax and semantics being operational and/or logical), but formality is not necessarily enough, since the reasoning must also be efficient to be of practical use. This is covered by what we term analyzability (to exploit any mathematical semantics) and executability (to exploit any operational semantics). Different aspects of tool appropriateness are means to achieve syntactic, semantic and pragmatic quality (through formal syntax, mathematical semantics, and operational semantics).
Organizational appropriateness relates the language to standards and other organizational needs within the organizational context of modeling. These are means to support organizational quality.
  55. 55. Metamodels 55
  56. 56. Shanks et al. composite model
[Figure: composite model combining a theory-based part and a practice-based part. Elements: Domain, Quality type, Means, Language, Goal, Property, Prqa, Model, Activity, Audience, Weighting, Quality factor, Rating, Evaluation method.]
  57. 57. Metamodels – Arab/French
Mehmood, Chefri et al. 2009, based on goals, questions, metrics
[Figure: metamodel with elements Quality goal, Q. Dimension, Q. Attribute, Q. Metric, Model element, Transformation step, Transformation rule.]
  58. 58. Metamodel instantiation
Quality goal:       Ease of change
Dimension:          Complexity; Maintainability
Quality attribute:  Simplicity, Structural complexity (Complexity); Modularity, Understandability, Modifiability (Maintainability)
Quality metric:     # of associations; # of dependencies
Transformation:     Merge entities; Divide the model
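The goal / dimension / attribute / metric chain of this metamodel can be written down as a small data structure. The sketch below is an illustration only: the class and field names are hypothetical, and attaching the two metrics to Structural complexity is an assumption (the slide does not show which attribute they measure).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityAttribute:
    name: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class QualityDimension:
    name: str
    attributes: List[QualityAttribute] = field(default_factory=list)


@dataclass
class QualityGoal:
    name: str
    dimensions: List[QualityDimension] = field(default_factory=list)

    def all_metrics(self) -> List[str]:
        """Collect every metric reachable from this goal, in declaration order."""
        return [m for d in self.dimensions for a in d.attributes for m in a.metrics]


# Instantiation taken from the slide; metric placement is an assumption.
ease_of_change = QualityGoal(
    "Ease of change",
    [
        QualityDimension(
            "Complexity",
            [
                QualityAttribute("Simplicity"),
                QualityAttribute(
                    "Structural complexity",
                    ["# of associations", "# of dependencies"],
                ),
            ],
        ),
        QualityDimension(
            "Maintainability",
            [
                QualityAttribute("Modularity"),
                QualityAttribute("Understandability"),
                QualityAttribute("Modifiability"),
            ],
        ),
    ],
)

print(ease_of_change.all_metrics())
```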
  59. 59. Metamodels – Vassiliadis et al.
For DWs
[Figure: metamodel with elements Quality goal, Q. Dimension, Factor, Improvement process, Interaction, Q. Metric, Measurem. method, Measurem. value, Date, Transformation, Information System object (Data o., Process o., Model o.).]
  60. 60. Comparison
[Figure: the two metamodels side by side.
 Vassiliadis: Quality goal, Q. Dimension, Factor, Improvement process, Interaction, Q. Metric, Measurem. method, Measurem. value, Date, Transformation, Information System object (Data o., Process o., Model o.).
 Mehmood: Quality goal, Q. Dimension, Q. Attribute, Q. Metric, Model element, Transformation step, Transformation rule.]
  61. 61. Schema Quality Dimensions 61
  62. 62. The origins… Batini, Ceri, Navathe 1991
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  63. 63. Batini, Ceri, Navathe 1991
Q. Dimension                  Definition
Completeness / Pertinence     Represents all (only) relevant features of requirements
Correctness - Syntactic       Concepts are properly defined in the schema
Correctness - Semantic        Concepts are used according to their definitions
Minimality                    Every aspect of reqs. appears only once in the schema
Expressiveness                Can be easily understood
Readability                   Diagram respects aesthetic criteria
Self-explanation              Other formalisms and languages not needed
Extensibility                 Easily adapted to changing requirements
Normality                     From theory of normalization
  64. 64. Completeness
Completeness measures the extent to which a conceptual schema includes all the conceptual elements necessary to meet some specified requirements. It is possible that the designer has not included certain characteristics present in the requirements in the schema, e.g., attributes related to an entity Person; in this case, the schema is incomplete.
Example. Requirements: students have a code, a name, a place of birth.
[Figure: entity Student with attributes Code, Name]
  65. 65. Pertinence
Pertinence measures how many unnecessary conceptual elements are included in the conceptual schema. In the case of a schema that is not pertinent, the designer has gone too far in modeling the requirements, and has included too many concepts.
Example. Requirements: students have a code and a name.
[Figure: entity Student with attributes Code, Name, Place_of_Birth]
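The Student examples on the completeness and pertinence slides can be compared mechanically once requirements and schema are both reduced to sets of element names. This is a minimal sketch under that assumption; the ratio definitions and the function names are illustrative choices, not metrics given in the tutorial.

```python
def missing_elements(required, schema):
    """Required elements the schema does not contain (incompleteness)."""
    return required - schema


def unnecessary_elements(required, schema):
    """Schema elements that no requirement asks for (lack of pertinence)."""
    return schema - required


def completeness(required, schema):
    """Fraction of required elements present in the schema (assumed ratio)."""
    return len(required & schema) / len(required)


def pertinence(required, schema):
    """Fraction of schema elements backed by a requirement (assumed ratio)."""
    return len(required & schema) / len(schema)


# Completeness example: place of birth is required but not modeled.
req_a = {"Student.Code", "Student.Name", "Student.Place_of_Birth"}
schema_a = {"Student.Code", "Student.Name"}

# Pertinence example: place of birth is modeled but not required.
req_b = {"Student.Code", "Student.Name"}
schema_b = {"Student.Code", "Student.Name", "Student.Place_of_Birth"}

print(missing_elements(req_a, schema_a), completeness(req_a, schema_a))
print(unnecessary_elements(req_b, schema_b), pertinence(req_b, schema_b))
```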
  66. 66. Correctness - syntactic
Concerns the correct use of the categories of the model in representing requirements.
Example – In the Entity Relationship model we may represent the logical link between persons and their first names using the two entities Person and FirstName and a relationship between them. The schema is not correct wrt the model, since an entity should be used only when the concept has a unique existence in the real world and has an identifier.
[Figure: entity Student linked by relationship "has", with cardinalities (1,n) and (1,1), to entity First Name]
  67. 67. Correctness - semantic
Correctness with respect to requirements concerns the correct representation of the requirements in terms of the model categories.
Example - In an organization each department is headed by exactly one manager and each manager may head exactly one department. If we represent Manager and Department as entities, the relationship between them should be one-to-one; in this case, the schema is correct wrt requirements. If we use a one-to-many relationship, the schema is incorrect.
[Figure: entity Manager (1,n) linked by relationship "heads" to entity Department (1,1)]
  68. 68. Minimality/Redundancy
A schema is minimal if every part of the requirements is represented only once in the schema. In other words, it is not possible to eliminate some element from the schema without compromising the information content.
[Figure: entities Student, Course, Instructor; relationships Attends (Student 1,n; Course 1,n), Teaches (Course; Instructor 1,n), Assigned to (Student 1,n; Instructor 1,n); one cardinality is shown as 1,?]
  69. 69. Expressiveness/Readability
Intuitively, a schema is readable whenever it represents the meaning of the reality represented by the schema in a clear way for its intended use. This simple, qualitative definition is not easy to translate into a more formal one, since the evaluation expressed by the word "clearly" conveys some elements of subjectivity. In models, such as the Entity Relationship model, that provide a graphical representation of the schema, called a diagram, readability concerns both the diagram and the schema itself.
  70. 70. Diagrammatic readability
With regard to the diagrammatic representation, readability can be expressed objectively by a number of aesthetic criteria that human beings adopt in drawing diagrams:
 1.  Crossings between lines should be minimized.
 2.  Graphic symbols should be embedded in a grid.
 3.  Lines should be made of horizontal or vertical segments.
 4.  The number of bends in lines should be minimized.
 5.  The total area of the diagram should be minimized.
 6.  Parents in generalization hierarchies should be positioned at a higher level in the diagram with respect to children.
 7.  The children entities in a generalization hierarchy should be symmetrical with respect to the parent entity.
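Criteria 1, 3 and 4 can be measured directly on a drawing. The sketch below assumes each line of the diagram is given as a polyline, i.e. a list of (x, y) points; that representation and all function names are assumptions made for illustration. Crossings are counted only when two segments of different lines intersect at an interior point.

```python
from itertools import combinations


def _orient(p, q, r):
    """Signed area test: positive if r is left of the directed line p->q,
    negative if right, zero if the three points are collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segments ab and cd properly cross (touching does not count)."""
    return (
        _orient(a, b, c) * _orient(a, b, d) < 0
        and _orient(c, d, a) * _orient(c, d, b) < 0
    )


def segments(polyline):
    """Consecutive point pairs of a polyline."""
    return list(zip(polyline, polyline[1:]))


def count_crossings(edges):
    """Criterion 1: crossings between segments of different lines."""
    total = 0
    for e1, e2 in combinations(edges, 2):
        for a, b in segments(e1):
            for c, d in segments(e2):
                if segments_cross(a, b, c, d):
                    total += 1
    return total


def count_bends(edges):
    """Criterion 4: interior points of each polyline are bends."""
    return sum(max(len(e) - 2, 0) for e in edges)


def count_non_orthogonal(edges):
    """Criterion 3: segments that are neither horizontal nor vertical."""
    return sum(
        1
        for e in edges
        for (x1, y1), (x2, y2) in segments(e)
        if x1 != x2 and y1 != y2
    )


# Hypothetical drawing: two straight lines that cross, one bent line, one diagonal.
edges = [
    [(0, 0), (4, 0)],
    [(2, -1), (2, 1)],
    [(0, 2), (3, 2), (3, 4)],
    [(5, 0), (6, 1)],
]
print(count_crossings(edges), count_bends(edges), count_non_orthogonal(edges))
```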
  71. 71. Unreadable schema
[Figure: ER diagram with a cluttered layout. Concepts: Works, Manages, Head, Employee, Floor, Purchase, Vendor, Located, Born, In, Department, Warehouse, Engineer, Worker, Of, Produces, Acquires, Order, Item, Type, City, Warranty.]
@C.Batini, F. Cabitza, G. Pasi, R. Schettini, 2008
  72. 72. A Readable schema
[Figure: the same ER schema redrawn with a clean layout. Concepts: Floor, Located, Manages, Head, Born, City, Department, Employee, Works, Produces, Vendor, Worker, Engineer, Item, In, Warehouse, Type, Acquires, Order, Of, Purchase, Warranty.]
  73. 73. Is diagrammatic readability objective?
[Figure: the unreadable schema annotated with layout criteria, next to alternative layouts of the same schema]
•  SYNT: Minimize bends
•  SYNT: Minimize crossings
•  SYNT: Use only horizontal …
•  SEM: Place entities in generalizations close together
•  SEM: Place the most important concept in the middle
•  Don’t change at all!
@C.Batini, 2009
  74. 74. But …
… personal experience in China, Beda University, about 1985.
Question to Chinese professors: which one of the two diagrams do you like more?
[Figure: the unreadable and the readable layouts of the schema, side by side]
Answer: definitely the left one, we like asymmetry and movement …
  75. 75. Expressiveness
The second issue addressed by readability is the compactness of schema representation. Among the different conceptual schemas that equivalently represent a certain reality, we prefer the one or the ones that are more compact, because compactness favors readability.
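  76. 76. Transformation that preserves information content and enhances compactness/expressiveness
[Figure: two equivalent schemas with Employee as the generalization of Vendor, Worker and Engineer. In one, each of Vendor, Worker and Engineer has its own Born relationship with City; in the other, a single Born relationship links Employee to City.]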
  77. 77. Normalization
Unnormalized ER schema
   Employee-Project (Employee #, Salary, Project #, Budget, Role)
Normalized ER schema
   Employee (Employee #, Salary)
   Project (Project #, Budget)
   Assigned to (Role), between Employee (1,n) and Project (1,n)
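The effect of this normalization can be seen on sample data. The sketch below assumes the dependencies implied by the normalized schema (Employee # determines Salary, Project # determines Budget, and the pair determines Role); the column names and the sample rows are hypothetical.

```python
# Rows of the unnormalized Employee-Project entity (hypothetical sample data).
rows = [
    {"employee_no": 1, "salary": 3000, "project_no": "P1", "budget": 50000, "role": "analyst"},
    {"employee_no": 1, "salary": 3000, "project_no": "P2", "budget": 20000, "role": "tester"},
    {"employee_no": 2, "salary": 3500, "project_no": "P1", "budget": 50000, "role": "manager"},
]


def normalize(rows):
    """Split Employee-Project rows into Employee, Project and Assigned to."""
    employees = {}    # employee_no -> salary
    projects = {}     # project_no -> budget
    assigned_to = {}  # (employee_no, project_no) -> role
    for r in rows:
        employees[r["employee_no"]] = r["salary"]
        projects[r["project_no"]] = r["budget"]
        assigned_to[(r["employee_no"], r["project_no"])] = r["role"]
    return employees, projects, assigned_to


employees, projects, assigned_to = normalize(rows)
print(employees)
print(projects)
print(assigned_to)
```

In the unnormalized rows, salary 3000 and budget 50000 are each stored twice; after the split every salary and every budget is stored once.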
  78. 78. Scandinavians (1994-
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  80. 80. Main model (schema) quality dimensions
Physical quality
•  Externalization: number of statements on the domain not yet stated in the model / total # of statements
•  Internalizability
   –  Persistence: protection against loss or damage
   –  Availability: usual meaning
Empirical quality deals with readability by the audience; it is expressed in terms of graph aesthetics and graph layout criteria.
Syntactic quality: correspondence between the model (schema) and the language (model), where errors are due to
   –  Syntactic invalidity: words or graphemes that are not part of the language are used
   –  Syntactic incompleteness: the model lacks constructs needed to obey the language’s grammar (e.g. it uses only one cardinality to express minimum and maximum cardinalities)
Semantic quality
   –  (Feasible) Validity: the statements in the model are correct and relevant for the problem
   –  (Feasible) Completeness: the model contains all the statements that would be correct and relevant
Perceived semantic quality: the correspondence between the actor’s interpretation of the model and her current knowledge of the domain
   –  Validity
   –  Completeness
Pragmatic quality: the correspondence between the model and the audience’s interpretation of it
   –  (Feasible) Comprehension: the actors understand the model, or else individual actors understand the part of the model relevant to them
Social quality
   –  Agreement in knowledge
   –  Agreement in model interpretation
Knowledge quality, which is perfect when the audience knows everything about the domain at a given time
   –  Validity
   –  Completeness
  81. 81. Language quality dimensions - 1
May refer (a) to the language or else (b) to the relationship between the language and other issues.
In the first case they may refer to:
   –  the constructs of the language
   –  the external visual representation
For both:
•  Perceptibility: how easy it is for persons to comprehend the language
•  Expressive power: what it is possible to express in the language
•  Expressive economy: how effectively things can be expressed in the language
•  Method/tool potential: how easily the language lends itself to proper method or tool support
•  Reducibility: what features the language provides to deal with large and complex models
  82. 82. Language quality dimensions - 2
Referring to the relationship between the language and other issues:
•  Domain appropriateness: there are no statements in the domain that cannot be expressed in the language
•  Participant knowledge appropriateness: statements in the language models are part of the explicit knowledge of participants
•  Knowledge externalizability appropriateness: there are no statements in the explicit knowledge of the participants that cannot be expressed in the language
•  Comprehensibility appropriateness
•  Technical actor interpretation appropriateness
  83. 83. More of Pragmatic quality•  Social pragmatic quality (to what extent people understand and are able to use the models) and technical pragmatic quality (to what extent tools can be made that interpret the models). 83
  84. 84. Arab French (2002-
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  85. 85. Chefri et al. classification
•  Specification
   –  Legibility
      •  Clarity
      •  Minimality
         –  Non Redundancy
         –  Factorization degree
         –  Aggregation degree
   –  Expressiveness
      •  Concept expressiveness
      •  Schema expressiveness
   –  Simplicity
   –  Correctness
•  Usage
   –  Understandability
      •  Documentation degree
      •  User Vocabulary
      •  Concept independence degree
   –  Completeness
      •  Requirements coverage degree
      •  Cross modeling completeness
•  Implementation
   –  Implementability
   –  Maintainability
      •  Modifiability
      •  Cohesion
      •  Coupling
  86. 86. Definitions – 1
Q. Dimension                      Definition
Clarity                           An aesthetic criterion, based on the graphical arrangement
Minimality                        Every aspect of the requirements appears only once
Min - Non Redundancy              No concept can be canceled without decreasing the information content
Min - Factorization degree        Measures the effectiveness of inheritance hierarchies of the schema
Min - Aggregation degree          Measures the efficient use of aggregate attributes in the schema
Expressiveness                    The schema can be easily understood without additional explanation
Exp – Concept and schema expr.    Compactness
Simplicity                        The schema contains the minimum possible constructs
Correctness (syntactical)         Concepts are properly defined in the schema
Understandability (model)         The ease with which the data model can be interpreted by the user
Understandability (schema)        How much modeling features are made explicit
Und – Documentation degree        Presence of additional documentation for concepts
Und – User vocabulary rate        Users are able to make easy correspondences between schema and reqs.
Und – Concept independ. degree    “Short paths” for semantic interconnections (e.g. A ISA B)
  87. 87. Definitions – 2
Q. Dimension                      Definition
Completeness                      The schema represents all relevant features in the requirements
Comp – Requirements coverage      Correspondence between concepts in the schema and relevant terms in the reqs.
Comp – Cross modeling compl.      Presence in a schema S1 of all concepts in schemas in a set
Implementability                  Amount of effort to implement the schema
Imp - Implementability            Overall semantic distance between concepts in the source model and concepts in the target model
Maintainability                   Ease with which the schema can evolve
Man - Modifiability               # of modifications related to a concept modification, deriving from dependencies
Man - Cohesion                    Existence of clusters with a high # of internal links compared with external links
Man – Coupling                    Existence of clusters with a low # of links between clusters
  88. 88. Chefri et al. classification – metrics (examples)
Specification
•  Legibility
   –  Clarity: # of concepts – number of crossings in the diagram
   –  Minimality
      •  Non Redundancy: (# weighted concepts - # redundant concepts) / total # weighted concepts
      •  Factorization degree
      •  Aggregation degree
•  Expressiveness
   –  Concept expressiveness
   –  Schema expressiveness
•  Simplicity
•  Correctness
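The non-redundancy ratio on this slide is simple to compute once concepts carry weights. The sketch below is one hedged reading of the formula: it assumes that "# redundant concepts" is also weighted, and the concept names, the unit weights and the function name are hypothetical.

```python
def non_redundancy(weights, redundant):
    """(weighted concepts - weighted redundant concepts) / weighted concepts.

    `weights` maps each concept to its weight; `redundant` is the set of
    concepts judged redundant. Both are supplied by the evaluator.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("total weight must be positive")
    redundant_weight = sum(weights[c] for c in redundant)
    return (total - redundant_weight) / total


# Hypothetical schema: four concepts of equal weight, one of them redundant.
weights = {"Student": 1.0, "Course": 1.0, "Attends": 1.0, "Assigned to": 1.0}
score = non_redundancy(weights, {"Assigned to"})
print(score)
```

A score of 1.0 means no redundant concept; lower values mean a larger weighted share of the schema is redundant.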
  89. 89. Metrics for structural complexity
•  # of associations
•  # of dependencies
•  # of aggregations
•  Depth of inheritance tree: longest path from the root of a hierarchy to the leaves
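The depth of inheritance tree is the one metric in this list that needs more than counting. The sketch below assumes the hierarchy is a tree stored as a parent-to-children dictionary and measures depth in edges; the function name and the extra "Senior Engineer" class are hypothetical additions to the Employee hierarchy used elsewhere in the tutorial.

```python
def depth_of_inheritance_tree(children, root):
    """Longest path, counted in edges, from `root` down to any leaf.

    `children` maps a class name to the list of its direct subclasses;
    the hierarchy is assumed to be acyclic.
    """
    kids = children.get(root, [])
    if not kids:
        return 0
    return 1 + max(depth_of_inheritance_tree(children, kid) for kid in kids)


# Employee hierarchy with one hypothetical extra level under Engineer.
hierarchy = {
    "Employee": ["Vendor", "Worker", "Engineer"],
    "Engineer": ["Senior Engineer"],
}
dit = depth_of_inheritance_tree(hierarchy, "Employee")
print(dit)
```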
  90. 90. Moody 1998 -
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments); additional label: Method for]
  91. 91. Moody’s classification
•  Completeness
•  Integrity
•  Flexibility
•  Understandability
•  Correctness
•  Simplicity
•  Implementability
•  Integration: quality of related groups of schemas (see later)
  92. 92. Moody’s classification of quality dimensions and metrics - 1
Dimension           Definition
Completeness        The schema contains all the information required to meet reqs.
Completeness M1     # of items that do not correspond to reqs.
Completeness M2     # of reqs. not represented in the schema
Completeness M3     # of items that inaccurately represent reqs.
Completeness M4     # of inconsistencies in the schema
Integrity           Extent to which the business rules on data are enforced by the schema
Integrity M1        # of business rules not enforced by the schema
Integrity M2        # of integrity constraints in the schema that do not accurately represent business rules
Flexibility         The ease with which the schema can cope with business change
Flexibility M1      # of elements in the schema which are subject to change in the future
Flexibility M2      Estimated cost of changes
Flexibility M3      Strategic importance of change
  93. 93. Moody’s classification of quality dimensions and metrics - 2
Dimension               Definition
Understandability       Ease with which the schema can be understood
Understandability M1    User rating
Understandability M2    Ability of users to interpret the model correctly
Understandability M3    Application developer rating
Correctness             The schema conforms to the rules of the conceptual model
Correctness M1          # of violations to model conventions
Correctness M2          Intra-entity redundancy: number of normal form violations
Correctness M3.a        Inter-entity redundancy: # of redundant concepts in the schema
  94. 94. Moody’s classification of quality dimensions and metrics - 3
Dimension           Definition
Simplicity          The schema contains the minimum possible constructs
Simplicity M1       # of entities
Simplicity M2       # of entities + relationships
Simplicity M3       # of entities + relationships + attributes
Implementability    Ease with which the schema can be implemented within time, budget, technology constraints
Implement M1        Technical risk rating
Implement M2        Schedule risk rating
Implement M3        Development cost estimate
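The three simplicity metrics are plain counts, so they are easy to compute from any machine-readable schema. The sketch below assumes a schema stored as two dictionaries mapping entity and relationship names to their attribute lists, and it assumes that attributes of relationships are counted in M3; both are illustrative choices. The sample schema is the normalized Employee/Project schema from the normalization slide.

```python
def simplicity_metrics(entities, relationships):
    """Return (M1, M2, M3) for a schema.

    M1 = # of entities
    M2 = # of entities + relationships
    M3 = # of entities + relationships + attributes
    """
    n_attributes = sum(len(a) for a in entities.values()) + sum(
        len(a) for a in relationships.values()
    )
    m1 = len(entities)
    m2 = m1 + len(relationships)
    m3 = m2 + n_attributes
    return m1, m2, m3


entities = {
    "Employee": ["Employee #", "Salary"],
    "Project": ["Project #", "Budget"],
}
relationships = {"Assigned to": ["Role"]}

m1, m2, m3 = simplicity_metrics(entities, relationships)
print(m1, m2, m3)
```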
  95. 95. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
Semiotic clarity: there should be a 1:1 correspondence between semantic constructs and graphical symbols
   –  Symbol redundancy
   –  Symbol overload
   –  Symbol excess
   –  Symbol deficit
Perceptual discriminability: different symbols should be clearly distinguishable from each other
   –  Visual distance
   –  Discriminability threshold
Semantic transparency: use visual representations whose appearance suggests their meaning, where symbols can be
   –  Immediate
   –  Semantically opaque
   –  Semantically perverse
   –  Semantically translucent
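The four semiotic clarity anomalies are all deviations from a 1:1 mapping, so they can be detected from a list of (symbol, construct) pairs. The sketch below uses the usual reading of the four terms, which the slide does not spell out: redundancy (a construct has several symbols), overload (a symbol stands for several constructs), excess (a symbol stands for no construct), deficit (a construct has no symbol). The notation, its symbols and all names in the code are hypothetical.

```python
from collections import defaultdict


def semiotic_clarity(symbols, constructs, mapping):
    """Classify deviations from a 1:1 symbol/construct correspondence.

    `mapping` is an iterable of (symbol, construct) pairs.
    """
    by_symbol = defaultdict(set)
    by_construct = defaultdict(set)
    for symbol, construct in mapping:
        by_symbol[symbol].add(construct)
        by_construct[construct].add(symbol)
    return {
        "redundancy": {c for c in constructs if len(by_construct[c]) > 1},
        "overload": {s for s in symbols if len(by_symbol[s]) > 1},
        "excess": {s for s in symbols if not by_symbol[s]},
        "deficit": {c for c in constructs if not by_construct[c]},
    }


# Hypothetical notation with one anomaly of each kind.
symbols = {"rectangle", "rounded rectangle", "diamond", "cloud"}
constructs = {"Entity", "Relationship", "Attribute", "Generalization"}
mapping = [
    ("rectangle", "Entity"),
    ("rounded rectangle", "Entity"),
    ("diamond", "Relationship"),
    ("diamond", "Attribute"),
]

report = semiotic_clarity(symbols, constructs, mapping)
print(report)
```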
  96. 96. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
Complexity management: include explicit mechanisms for dealing with complexity
   –  Modularization
   –  Abstraction
Cognitive integration: include explicit mechanisms to support integration of information from different diagrams
   –  Conceptual integration
   –  Contextualization
   –  Perceptual integration
   –  Wayfinding
  97. 97. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
Visual expressiveness: use the full range and capacities of visual variables
   –  Degree of visual freedom
   –  Saturation
Dual coding: use text to complement graphics
Graphic economy: the number of different graphical symbols should be cognitively manageable
   –  Symbol deficit
Cognitive fit: use different visual dialects for different tasks and audiences
   –  Visual mono/plurilinguism
  98. 98. Moody’s monumental contribution to empirical quality / quality of diagrammatic notations (TSE 2009)
Interactions among principles
•  Semiotic Clarity can affect Graphic Economy either positively or negatively: symbol excess and symbol redundancy increase graphic complexity, while symbol overload and symbol deficit reduce it.
•  Perceptual Discriminability increases Visual Expressiveness as it involves using more visual variables and a wider range of values (a side effect of increasing visual distance); similarly, Visual Expressiveness is one of the primary ways of improving Perceptual Discriminability.
•  Increasing Visual Expressiveness reduces the effects of graphic complexity, while Graphic Economy defines limits on Visual Expressiveness (how much information can be effectively encoded graphically).
•  Increasing the number of symbols (Graphic Economy) makes it more difficult to discriminate between them (Perceptual Discriminability).
•  Perceptual Discriminability, Complexity Management, Semantic Transparency, Graphic Economy, and Dual Coding improve effectiveness for novices, though Semantic Transparency can reduce effectiveness for experts (Cognitive Fit).
•  Semantic Transparency and Visual Expressiveness can make hand drawing more difficult (Cognitive Fit).
  99. 99. Others…
  100. 100. Genero et al. 2005 -
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  101. 101. Genero et al. classification
Maintainability is influenced by the following subcharacteristics:
•  Understandability: the ease with which the conceptual data model can be understood.
•  Legibility: the ease with which the conceptual data model can be read, with respect to certain aesthetic criteria [13].
•  Simplicity: the conceptual data model contains the minimum number of constructions possible.
•  Analysability: the capability of the conceptual data model to be diagnosed for deficiencies or for parts to be modified.
•  Modifiability: the capability of the conceptual data model to enable a specified modification to be implemented.
•  Stability: the capability of the conceptual data model to avoid unexpected effects from modifications.
•  Testability: the capability of the conceptual data model to enable modifications to be validated.
  102. 102. Herden
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments); additional label: Metadata]
  103. 103. Herden classification
•  Correctness
•  Consistency
•  Scope
•  Level of detail
•  Completeness
•  Minimality
•  Ability of integration (see later)
•  Readability
  104. 104. Herden
Dimension                  Definition
(Technical) Correctness    Correctness of concepts w.r.t. reqs.
(Technical) Consistency    Absence of contradiction
Scope                      Comprehensive w.r.t. general user acceptance
Level of detail            Adequacy in detail w.r.t. user acceptance
Completeness               Completeness w.r.t. requirements
Minimality                 Compactness and absence of redundancies
Readability                Completeness of documentation
  105. 105. Metadata in Herden’s classification
•  Description
•  Relevance
•  Measuring
•  Metric
•  Degree of automation
•  Objectivity
  106. 106. Poels et al.
[Figure: metaschema of approaches: Formal framework; Metaschema; Classification (Quality dimension, Quality subdimension, Metrics, Examples, Experiments)]
  107. 107. Poels et al.
Interested in
•  Perceived semantic quality
•  Perceived pragmatic quality
to understand their relationship with
 1.  Perceived ease of use (efficiency)
 2.  Perceived usefulness (effectiveness)
and
 3.  User information satisfaction
  108. 108. Poels et al. classification
Quality #   Definition
PSQ1        The schema represents the business process correctly
PSQ2        The schema is a realistic representation of the business process
PSQ3        The schema contains contradicting elements
PSQ4        The schema contains redundant elements
PSQ5        Elements must be added to faithfully represent the business process
PSQ6        All the elements in the conceptual schema are relevant for the representation of the business process
PSQ7        The schema gives a complete representation of the business process
  109. 109. Poels et al. classification
Quality #   Quality dimension               Definition
PSQ1        Correctness/Validity            The schema represents the business process correctly
PSQ2        Feasible correctness/validity   The schema is a realistic representation of the business process
PSQ3        Coherence                       The schema contains contradicting elements
PSQ4        Non redundancy                  The schema contains redundant elements
PSQ5        ???                             Elements must be added to faithfully represent the business process
PSQ6        Relevance                       All the elements in the conceptual schema are relevant for the representation of the business process
PSQ7        Completeness                    The schema gives a complete representation of the business process
  110. 110. Poels et al. general findings
    [Figure: path model relating Perceived semantic quality, Perceived usefulness, Perceived ease of use and User information satisfaction; coefficients shown on the paths: 0.1, 0.38, 0.58, 0.35, 0.29]
  111. 111. Comparison among proposals
  112. 112. Physical and empirical quality
    Columns (authors): Batini et al. 91; Scand. 94-; Moody 98-; Arab French 02-; Genero et al. 2005; Herden; Poels
    Physical quality
      Externalization: x
      Persistence: x
      Availability: x
    Empirical quality
      Minimality: x x x
      Readability/legibility: x x x x x
      Expressiveness: x x x
      Simplicity/self explanation: x x x x
      Graph aesthetics/readability/Clarity: x x x
      Understandability: X-3 x x
  113. 113. Syntactic and semantic quality
    Columns (authors): Batini et al. 91; Scand. 94-; Moody 98-; Arab French 02-; Genero et al. 2005; Herden; Poels
    Syntactic quality
      Invalidity: x x x x
      Incompleteness: x
    Semantic quality
      Validity/Correctness: x x X-1 x x
      Feasible validity: x x
      Normality: x
      Integrity: X-2 x x
      Completeness: x x X-4 x x
      Level of detail: x
      Scope: x
      Relevance/Pertinence: x x x
      Perceived semantic quality: x
      Analyzability: x
      Testability: x
  114. 114. Pragmatic, knowledge and process quality
    Columns (authors): Batini et al. 91-; Scand. 94-; Moody 98-; Arab French 02-; Genero et al. 05; Herden; Poels
    Pragmatic quality
      Comprehension: x
    Social quality: x
      Agreement in knowledge: x
      Agreement in model interpret.
    Knowledge quality
      Completeness: x
      Validity: x
    Process quality
      Implementability: x
      Stability: x
      Maintainability/Flexibility/Extensibility: x X-3 x
  115. 115. Specific dimensions
  116. 116. Sheldon classification for Inheritance hierarchies
    Viewpoints:
    •  (1) The deeper a class is in the hierarchy, the higher the degree of method inheritance, making it more complex to predict its behavior.
    •  (2) Deeper trees constitute greater design complexity, since more methods and classes are involved.
    •  (3) The deeper a particular class is in the hierarchy, the greater the potential reuse of inherited methods.
  117. 117. Sheldon classification for Inheritance hierarchies
    •  Maintainability
    •  Understandability
    •  Modifiability
  118. 118. Schema and Data Quality together
  119. 119. When a schema is defined, quality can be achieved working both on the schema and on the instance
    (a) Person (ID, Name, Surname, Address)
        1, John, Smith, 113 Sunset Avenue 60601 Chicago
        2, Mark, Bauer, 113 Sunset Avenue 60601 Chicago
        3, Ann, Swenson, 4 Heroes Street Denver
    (b) Person (ID, Name, Surname)
        1, John, Smith
        2, Mark, Bauer
        3, Ann, Swenson
        Address (ID, StreetPrefix, StreetName, Number, City)
        A11, Avenue, Sunset, 113, Chicago
        A12, Street, 4 Heroes, null, Denver
        ResidenceAddress (PersonID, AddressID)
        1, A11
        2, A11
        3, A12
  120. 120. Experimentally investigated by Arab French
    [Figure labels: Quality at schema level; Impact; Quality at data level; Interdependencies]
  121. 121. Improving the quality of schemas
  122. 122. Methods
    •  Origins: achieving normal form → Decomposition techniques
    •  Scandinavian: derived from the framework
    •  Through schema transformations
  123. 123. Derived from framework: Syntactic quality
    •  Error prevention through syntax-directed editors
    •  Error detection through syntactic checks
  124. 124. Derived from framework: Semantic quality
    •  Consistency checking
      –  Based on a logical description
      –  Based on constructivity, namely through properties of the generation process (Langefors et al.)
      –  Use of driving questions to improve completeness
  125. 125. Derived from framework: Pragmatic quality
    •  Audience training
    •  Inspection and walkthroughs
    •  Transformations (see also later)
      –  Rephrasing
      –  Filtering
    •  Translation
      –  Explanation generation
      –  Model execution
    •  Documentation
    •  Prototyping
  126. 126. Derived from framework: Social quality
    •  Integration
      –  Intra project
      –  Inter project
      –  Inter organizational
    •  Integration process
      –  Pre-integration
      –  Viewpoint comparison
      –  Viewpoint conforming
      –  Merging and restructuring
  127. 127. Through schema transformation
  128. 128. The Assenova-Johannesson approach: Dimensions considered
    Dimension: Definition
    Explicitness: Requirements are represented at the schema level, not at instance level
    Size: # of entities + relationships + attributes
    Rule simplicity: A high # of business rules are represented by simple types of constraints
    Rule uniformity: Cardinality constraints are uniform
    Query simplicity: Simple retrieval requirements correspond to simple queries on the schema
    Stability: Small changes in requirements result in small changes in the schema
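The Size dimension is the only one in this list that is defined as a plain count, so it can be sketched directly. The following is a minimal illustration in Python; the `Schema` container, its field names and the two toy schemas are assumptions made for this example and do not come from the original proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical container for a conceptual schema (names only)."""
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)
    attributes: list = field(default_factory=list)


def schema_size(schema: Schema) -> int:
    """Size = # of entities + # of relationships + # of attributes."""
    return (len(schema.entities)
            + len(schema.relationships)
            + len(schema.attributes))


# Toy schemas (assumed for illustration): the second one introduces an
# extra entity and an is-a relationship, so its size grows.
old_schema = Schema(entities=["PERSON"], attributes=["salary"])
new_schema = Schema(entities=["PERSON", "EMPLOYEE"],
                    relationships=["is_a"],
                    attributes=["salary"])

print(schema_size(old_schema), schema_size(new_schema))
```

A larger value counts as worse on this dimension, which is why a transformation that adds an entity is marked with a minus for Size.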
  129. 129. Dimensions and transformations
    Columns: Explicitness; Size; Rule simplicity; Rule uniformity; Query simplicity; Stability
    Partial attributes: - + +
    Non surjective attributes: - + +
    Partial attr. which are total in Union: + + - +
    Non-surj. attributes surjective in Un.: + + - +
    M-N attributes: - + - +
    Lexical attributes: - + - +
    Attributes with fixed ranges: + - = +
    Two non disjoint entities: + - + +
    Non unary “overloaded” attributes: +/- +
  130. 130. Example transformation: Partial attribute
    - The size of the schema increases (-)
    - Introducing the entity EMPLOYEE results in
      - increased rule uniformity (+) (all attributes are total)
      - increased stability (+)
  131. 131. Example of increased stability
    Introducing different categories of employees can be done in the new schema without violating rule simplicity.
    The same cannot be done in the old schema.
    [Figure: Old schema; New schema]
  132. 132. Quality in data integration architectures
  133. 133. The approach of Akoka et al. (2007)
    General statement: In DI architectures, quality of data and quality of schemas have to be considered together
    •  Qualities at schema level
      –  Completeness
      –  Understandability
      –  Minimality
      –  Expressiveness
    •  Qualities at data level
      –  Completeness
        •  Coverage
        •  Density
      –  Uniqueness
      –  Consistency
      –  Freshness
        •  Currency
        •  Timeliness
      –  Accuracy
        •  Semantic
        •  Syntactic
        •  Precision
  134. 134. The approach of De Conseicao et al (2007): Relevant qualities to be evaluated in DI arch.
    Given a DI Architecture defined in terms of
    •  [Data, Local Schemas, Global Schema, Data sources]
    DI Element: IQ Criteria
    Data Sources: Reputation; Verifiability; Availability; Response Time
    Schema: Schema completeness; Minimality; Type Consistency
    Data: Data Completeness; Timeliness; Accuracy
  135. 135. The approach of De Conseicao et al (2007): Relevant qualities to be evaluated in DI arch.
    Given a DI Architecture defined in terms of
    •  [Data, Local Schemas, Global Schema, Data sources]
    Quality dimension: Completeness
      Definition: Coverage of global schema concepts wrt the application domain
      Refers to: Global schema
      Metrics: 1 - (# of incomplete items / # total items)
    Quality dimension: Minimality
      Definition: Extent to which the schema is compactly modeled and without redundancy
      Refers to: Global schema
      Metrics: 1 - (# redundant schema elements / # total items)
      Detailed in terms of: Attrib. in an entity; Attrib. in diff. Ent.; Ent. Redundancy degree; Redundant Relationship; Entity Redund. of a Schema; Relationsh. Red. of a Schema; Schema Minimality
    Quality dimension: Type Consistency
      Definition: Data type uniformity across the schemas
      Refers to: Global schema + Local schemas
      Metrics: 1 - (# of inconsistent schema elements / # total schema elements)
      Detailed in terms of: Data type consistency; Attribute type consistency; Schema data type consistency
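The three metrics above share the same shape, 1 minus the share of flagged items, so one helper covers them. This is a minimal Python sketch; the function name `ratio_metric` and the sample counts are assumptions for illustration, and how items are flagged as incomplete, redundant or inconsistent is outside its scope.

```python
def ratio_metric(flagged: int, total: int) -> float:
    """Return 1 - (flagged / total), the common shape of the three metrics.

    flagged: number of incomplete, redundant or inconsistent items
    total:   total number of items (or schema elements) considered
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= flagged <= total:
        raise ValueError("flagged must be between 0 and total")
    return 1 - flagged / total


# Hypothetical counts for a global schema with 20 elements.
completeness = ratio_metric(flagged=2, total=20)      # 2 incomplete items
minimality = ratio_metric(flagged=5, total=20)        # 5 redundant elements
type_consistency = ratio_metric(flagged=0, total=20)  # no inconsistencies

print(completeness, minimality, type_consistency)
```

All three scores fall in [0, 1], with 1 meaning that no item was flagged.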
  136. 136. The H. Dai et al. approach (2006)
    •  Focus on Column Heterogeneity
    [Figure: four example columns, containing respectively: e-mail, phone n. and socsec n.; many e-mail and few phone numb.; only e-mail addr.; phone numb. and socsec numbers]
    B more heterogeneous than A
    B more heterogeneous than C
    B more heterogeneous than D
  137. 137. The H. Dai et al. approach (2006)
    Focus on Column Heterogeneity
    Heterogeneity dimensions
      –  Number of semantic types resulting in different clusters
      –  Cluster entropy
      –  Probabilistic soft clustering
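To give a feel for the cluster entropy dimension, here is a simplified Python sketch. It assumes each value has already been assigned to exactly one cluster (a hard assignment with hypothetical labels such as "email" and "phone") and computes the Shannon entropy of the cluster sizes; it does not implement the probabilistic soft clustering also listed above.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the distribution of cluster labels.

    labels: one cluster label per value in the column (hard assignment,
    an assumption of this sketch).
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical columns, described by the semantic type of each value.
only_email = ["email"] * 10
mostly_email = ["email"] * 9 + ["phone"]
even_mix = ["email"] * 5 + ["phone"] * 5

for name, column in [("only e-mail", only_email),
                     ("mostly e-mail", mostly_email),
                     ("even mix", even_mix)]:
    print(name, round(cluster_entropy(column), 3))
```

Under this simplification a column with a single type scores 0, and an even mix of two types scores higher than a column dominated by one of them.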
  138. 138. The Moody’s approach: Classification of schemas related by integration
    Quality categ.: Definition
    Integration: Level of consistency of the schema with the rest of the org. data
    Integr. M1: # of data conflicts with the Corporate Schema
    Integr. M1.a: # of entity conflicts
    Integr. M1.b: # of data element conflicts, namely, defs. and domains
    Integr. M1.c: # of naming conflicts (synonyms + homonyms)
    Integr. M2: # of data conflicts with existing systems (ES)
    Integr. M2.a: # of data element conflicts, namely, defs. and domains with ES
    Integr. M2.b: # of key conflicts, namely, defs. and domains with ES
    Integr. M2.c: # of naming conflicts (synonyms + homonyms) with ES
    Integr. M3: # of data elements with duplicate data elem. in ES
    Integr. M4: Rating by representatives of other business areas
  139. 139. The Chai approach: Matchability of schemas
    •  Focus on the evolution of a Data Integration system, and the cost of maintaining the mediated schema S
    •  Quality observed: the matchability of S against a matching tool M, defined as
    •  the average of the accuracies of matching S with future schemas F1, F2, …, Fn (that we assume known at least to some extent) using M
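The definition of matchability as an average of accuracies can be sketched as follows. This is an illustration only: the toy matcher `name_matcher`, the accuracy measure (share of correct pairs among all predicted or expected pairs) and the sample schemas are assumptions made for the example, since the slide does not specify the matching tool or the accuracy formula.

```python
from statistics import fmean


def name_matcher(s, f):
    """Toy matching tool M: pair attributes whose lower-cased names are equal."""
    return {(a, b) for a in s for b in f if a.lower() == b.lower()}


def accuracy(predicted, gold):
    """Assumed accuracy: pairs that are both predicted and correct,
    divided by all pairs that are predicted or correct."""
    union = predicted | gold
    return len(predicted & gold) / len(union) if union else 1.0


def matchability(s, futures, matcher):
    """Average accuracy of matching S with each future schema using M.

    futures: non-empty list of (future_schema, gold_matches) pairs.
    """
    return fmean(accuracy(matcher(s, f), gold) for f, gold in futures)


# Hypothetical mediated schema and two future schemas with their correct matches.
S = ["name", "phone"]
F1 = (["Name", "Phone"], {("name", "Name"), ("phone", "Phone")})
F2 = (["name", "tel"], {("name", "name"), ("phone", "tel")})

print(matchability(S, [F1, F2], name_matcher))
```

With this accuracy measure, a missed match such as phone/tel lowers the score, and so would a spurious or wrong prediction.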
  140. 140. Cases for matching mistakes
    •  Predict a spurious match
    •  Miss a match
    •  Predict a wrong match
    •  Strategy to improve matchability
      –  Change concepts in M using rules that minimize error probability
  141. 141. Batini et al. 2010: Potential information content
  142. 142. The data architecture of a set of databases is the allocation of concepts and tables across the DB data schemas
    Example of change of data architecture due to improving access efficiency
    [Figure: tables Employee (Employee #, Salary), Assigned-to (Employee #, Project #, Role) and Project (Project #, Budget); labels: Distributed DB, Centralized DB]
  143. 143. Data integration technologies
    •  Virtual data integration
    •  Data Warehouses
    •  Application integration
    •  Consolidation
  144. 144. Potential information content
    [Figure: Global schema: Tax payer has Boat, Tax payer declares Income. Sources: one schema with Tax payer has Boat, one schema with Tax payer declares Income.]
    Query: Find CF, Name of Tax Payer that declares <= 30,000 € and has >= 1 Boat
  145. 145. Potential information content
    •  Given a global schema I resulting from the virtual integration of schemas S1, S2, ..., Sn, the potential information content of I is the set of queries that can be performed on I and cannot be performed on S1, S2, ..., Sn.
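The definition above can be read as a set difference, which the following Python sketch illustrates under strong simplifying assumptions: a schema is a set of concept names, a query is the set of concepts it needs, a query "can be performed" on a schema when all its concepts are present, and only a finite, hypothetical query workload is considered. The tax payer names are borrowed from the earlier example.

```python
def answerable(query, schema):
    """A query can be performed on a schema if the schema has every concept it needs."""
    return query <= schema


def potential_information_content(queries, global_schema, sources):
    """Queries answerable on the global schema I but on none of the sources S1..Sn."""
    return [q for q in queries
            if answerable(q, global_schema)
            and not any(answerable(q, s) for s in sources)]


# Hypothetical source schemas and their virtual integration.
S1 = frozenset({"TaxPayer", "has", "Boat"})
S2 = frozenset({"TaxPayer", "declares", "Income"})
I = S1 | S2

# Hypothetical query workload, each query given by the concepts it uses.
q_boats = frozenset({"TaxPayer", "has", "Boat"})
q_income = frozenset({"TaxPayer", "declares", "Income"})
q_low_income_boat_owners = frozenset(
    {"TaxPayer", "declares", "Income", "has", "Boat"})

result = potential_information_content(
    [q_boats, q_income, q_low_income_boat_owners], I, [S1, S2])
print(result == [q_low_income_boat_owners])
```

Only the query that spans both sources is new with respect to S1 and S2, so it is the only one in the potential information content of I for this workload.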
  146. 146. Example
    [Figure: schemas S1 and S2 with entities E1, E2, E3, E4, E5, E6 and queries Q11, Q12, Q21]
  147. 147. Quality of the documentation for large related groups of schemas
  148. 148. Why is integration alone not enough? Hundreds of schemas
  149. 149. Relationships investigated
    •  Integration
    •  Abstraction
    •  Abstraction/Integration
  150. 150. Abstraction
  151. 151. Abstraction
    [Figure: the same schema at several levels of abstraction, with entities Department, Employee, City, Seller, Item, Order, Purchaser, Floor, Engineer, Clerk, Warranty and Warehouse, and relationships "in" and "of"]
  152. 152. Integration + abstraction
