The role of thesauri in
data modeling
Danny Greefhorst
dgreefhorst@archixl.nl
Topics in this presentation
• What is a thesaurus and why is it valuable?
• What does a thesaurus look like?
• How does a thesaurus relate to a data model?
• SKOS as a language for describing thesauri
• Guidelines for good definitions (based on ISO 704)
• Quality control for thesauri
Thesaurus in the Data Management Body of Knowledge
• A thesaurus is a type of controlled vocabulary for content retrieval.
• A controlled vocabulary is a defined list of explicitly allowed terms used to
index, categorize, tag, sort, and retrieve content through browsing and
searching.
• A thesaurus provides information about each term and its relationship to
other terms.
• Relationships are either hierarchical, associative or equivalent.
• Thesauri can be used to:
  • organize unstructured content
  • uncover relationships between content from different media
  • improve website navigation
  • optimize search
Business glossary in the Data Management Body of Knowledge
• A business glossary is a means of sharing a common business vocabulary within the organization.
• A data steward is generally responsible for business glossary content.
• They enhance enterprise knowledge by associating data assets with glossary
terms.
• Business glossaries have the following objectives:
  • enable common understanding of the core business concepts and
  terminology
  • reduce the risk that data will be misused due to inconsistent
  understanding of the business concepts
  • improve the alignment between technology assets (with their
  technical naming conventions) and the business organization
  • maximize search capability and enable access to
  documented institutional knowledge
Link concepts to other objects
Concept
Document/ web content
Application
Business rule
API specification
Database definition
Data model
Dataset
Dashboard/report
A controlled vocabulary is needed to make data FAIR
https://www.go-fair.org/fair-principles/
Concepts and data lineage - what does the data mean?
Regulations such as PERDARR/BCBS239 ask explicitly for a catalogue of
concepts:
• As a precondition, a bank should have a “dictionary” of the concepts used, such that data
is defined consistently across an organization
• A bank should develop an inventory and classification of risk data items which includes a
reference to the concepts used to elaborate the reports.
[Figure: horizontal data lineage (Data → Data → Data → Report) and vertical data lineage (Concepts linked to each Data step)]
Practical template for concepts
Name | Description
Term | A preferred linguistic reference to a concept.
URI | A unique identifier to the concept.
Domain | Domain in which the concept exists.
Definition | The formal definition of the concept.
Source | A reference to the source of the definition.
Informal definition | A simple definition of the concept that is understandable for a broad audience.
Explanation | A further clarification of the concept and the way it is used in the specific context.
Editorial notes | Remarks that are related to decisions made during the description of the concept.
Examples | A short summary or description of example instances of the concept.
Synonyms | Terms that denote almost the same concept.
Exact match | Concepts in another thesaurus that denote the same concept.
Related | Concepts that are related to the concept in another (non-hierarchical) manner.
Broader | Concepts that have a broader meaning than the concept.
Broader partitive | Concepts that represent a whole that the concept is a part of.
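The template above can be sketched as a small data structure. This is only an illustration: the class name, the field names, the example.org URIs and the "Mobility" domain are assumptions made here, and the definitions are borrowed from the car examples used in the definition guidelines.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One thesaurus entry with the fields of the practical template."""
    term: str                                   # preferred linguistic reference
    uri: str                                    # unique identifier
    domain: str                                 # domain in which the concept exists
    definition: str                             # formal definition
    source: Optional[str] = None                # where the definition comes from
    informal_definition: Optional[str] = None
    explanation: Optional[str] = None
    editorial_notes: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    exact_match: List[str] = field(default_factory=list)        # URIs in other thesauri
    related: List[str] = field(default_factory=list)            # URIs, non-hierarchical
    broader: List[str] = field(default_factory=list)            # URIs of broader concepts
    broader_partitive: List[str] = field(default_factory=list)  # URIs of wholes


# Hypothetical URIs and domain, for illustration only.
car = Concept(
    term="Car",
    uri="https://example.org/concepts/car",
    domain="Mobility",
    definition="a motorized vehicle with 4 wheels",
    synonyms=["Automotive"],
)
station_wagon = Concept(
    term="Station wagon",
    uri="https://example.org/concepts/station-wagon",
    domain="Mobility",
    definition="a car with a large cargo area",
    broader=[car.uri],
)
```

Relationships are stored as URIs rather than object references so that a concept can point to entries in another thesaurus.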
Levels of modeling
Level | Type of model | Description
Conceptual | Thesaurus | A collection of concepts and their relationships
Conceptual | Information model | A formal description of a universe of discourse
Logical | Logical data model | A design of a data structure
Physical | Physical data model | A technology-specific representation of data
Semiotic triangle
[Figure: triangle with the corners Symbol, Thought or reference, and Referent; the Symbol stands for the Referent]
Source: Ogden and Richards (1923)
A thesaurus is close to a concept model
A concept model is a model that develops the meaning of
core concepts for a problem domain, defines their collective
structure, and specifies the appropriate vocabulary needed
to communicate about it consistently.
Data models can usually be rather easily derived from
concept models.
Strengths of a concept model:
• Provides a business-friendly way to communicate with
stakeholders about precise meanings and subtle
distinctions.
• Is independent of data design biases and the often
limited business vocabulary coverage of data models.
• Proves highly useful for white-collar, knowledge-rich,
decision-laden business processes.
• Helps ensure that large numbers of business rules and
complex decision tables are free of ambiguity and fit
together cohesively.
Source: Ron Ross: “Business Rule Concepts” and https://www.brcommunity.com/articles.php?id=b779
Linking concepts to a data model – MIM standard
All model elements have a property “Concept”:
Reference to a concept, from a model element, indicating on
which concept, or concepts, the information model element
is based. The reference is in the form of a term or a URI.
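The idea of a "Concept" property on every model element can be sketched as follows. This is not MIM syntax: the class, the "kind" labels, the function name and the example.org URIs are hypothetical, and the check simply reports references (terms or URIs) that cannot be found in a thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelElement:
    name: str
    kind: str                                          # hypothetical label, e.g. "object type"
    concepts: List[str] = field(default_factory=list)  # each reference is a term or a URI


def unresolved_references(elements: List[ModelElement],
                          thesaurus: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per element name, the concept references not found in the thesaurus.

    `thesaurus` maps a concept URI to its preferred term; terms are compared
    case-insensitively, URIs exactly.
    """
    known_uris = set(thesaurus)
    known_terms = {term.lower() for term in thesaurus.values()}
    missing: Dict[str, List[str]] = {}
    for element in elements:
        bad = [ref for ref in element.concepts
               if ref not in known_uris and ref.lower() not in known_terms]
        if bad:
            missing[element.name] = bad
    return missing


thesaurus = {"https://example.org/concepts/car": "Car"}
elements = [
    ModelElement("Vehicle registration", "object type", ["Car"]),
    ModelElement("Licence plate", "attribute", ["https://example.org/concepts/plate"]),
]
print(unresolved_references(elements, thesaurus))
```

A check like this could run whenever the data model changes, so that every element stays traceable to a defined concept.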
SKOS - Simple Knowledge Organization System
• Open standard of the W3C – defined in 2009
• Part of and based on Linked Data standards such as RDF
• Makes every concept findable on the web with a URI
• Offers a model for describing knowledge organisation systems such as thesauri
• Based on general theory and standards about thesauri
• Specifically aimed at publication of concepts on the web
• Simplified model for describing concepts compared to other systems
• Uses RDF and accompanying standards (XML, TTL, JSON-LD)
• Supported by various commercial and open source tools
• Can be combined with the SKOS-THES standard to include part of and instance of
relationships
• Can be combined with the Dublin Core metadata standard
• More information: https://www.w3.org/TR/skos-primer/
Practical template mapped to SKOS
Name | SKOS representation
Term | skos:prefLabel
URI | skos:Concept
Domain | skos:member
Definition | skos:definition
Source | dc:source
Informal definition | rdfs:comment
Explanation | skos:scopeNote
Editorial notes | skos:editorialNote
Examples | skos:example
Synonyms | skos:altLabel
Exact match | skos:exactMatch
Related | skos:related
Broader | skos:broader
Broader partitive | isothes:broaderPartitive
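As a rough sketch of how this mapping could be applied, the function below turns a dictionary of template fields into Turtle (TTL) text using only the Python standard library. The function and field names and the example.org URIs are assumptions. Treating "source" as a literal is also an assumption, and "Domain" is left out because skos:member points from a collection to its members rather than from the concept.

```python
# Template field -> SKOS property, following the mapping table.
LITERAL_PROPERTIES = {
    "term": "skos:prefLabel",
    "definition": "skos:definition",
    "source": "dc:source",  # assumption: source given as plain text
    "informal_definition": "rdfs:comment",
    "explanation": "skos:scopeNote",
    "editorial_notes": "skos:editorialNote",
    "examples": "skos:example",
    "synonyms": "skos:altLabel",
}
RESOURCE_PROPERTIES = {
    "exact_match": "skos:exactMatch",
    "related": "skos:related",
    "broader": "skos:broader",
    "broader_partitive": "isothes:broaderPartitive",
}
PREFIXES = """@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix isothes: <http://purl.org/iso25964/skos-thes#> .
"""


def escape(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(uri: str, fields: dict, lang: str = "en") -> str:
    """Serialize one concept; each field value is a string or a list of strings."""
    lines = [f"<{uri}> a skos:Concept"]
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            if name in LITERAL_PROPERTIES:
                lines.append(f'    {LITERAL_PROPERTIES[name]} "{escape(item)}"@{lang}')
            elif name in RESOURCE_PROPERTIES:
                lines.append(f"    {RESOURCE_PROPERTIES[name]} <{item}>")
            else:
                raise KeyError(f"no SKOS mapping for field {name!r}")
    return " ;\n".join(lines) + " .\n"


print(PREFIXES + to_turtle(
    "https://example.org/concepts/station-wagon",  # hypothetical URI
    {
        "term": "Station wagon",
        "definition": "a car with a large cargo area",
        "broader": ["https://example.org/concepts/car"],
    },
))
```

For production use an RDF library would be the safer choice, since it also validates IRIs and handles all literal escaping rules.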
A SKOS concept
<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:dc="http://purl.org/dc/terms/"
         xmlns:ns0="http://www.eionet.europa.eu/gemet/2004/06/gemet-schema.rdf#">
  <!-- The concept's own URI (rdf:about) is not shown in the source -->
  <skos:Concept>
    <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
    <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:modified>
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
    <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:created>
    <skos:definition xml:lang="en">The degree to which air is polluted; the type and
    maximum concentration of man-produced pollutants that should be permitted in the
    atmosphere.</skos:definition>
  </skos:Concept>
</rdf:RDF>
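To show how such a record can be read, here is a minimal sketch that pulls the preferred labels per language out of a trimmed copy of the example, using Python's standard xml.etree.ElementTree module. The function name and the trimmed sample are assumptions made for illustration. A real application would more likely use an RDF library that understands the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # how ElementTree names xml:lang

# Trimmed copy of the "air quality" example (labels only).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept>
    <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
    <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def pref_labels_by_language(xml_text: str) -> Dict[str, str]:
    """Map each language tag to the text of its skos:prefLabel element."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth of the labels does not matter.
    return {
        element.get(XML_LANG): element.text
        for element in root.iter(f"{{{SKOS_NS}}}prefLabel")
    }


print(pref_labels_by_language(SAMPLE))
```

The language tags are what make a multilingual thesaurus usable: the same concept carries one preferred label per language.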
Guidelines for formulating definitions of concepts (1)
• Connect with general language use and the language in the organization
in terms and definitions
“Car” instead of “Automobile”
• Define terms with a short name (term), in singular and starting with a
capital letter
“Car” instead of “Cars”
• Keep definitions as short as possible; include only distinguishing features
“a motorized vehicle with 4 wheels” instead of “a motorized vehicle with 4 wheels that
can be used for both private and business transport”
• Use intensional definitions where possible; name the distinguishing
features of a concept
“a 4-wheel motorized vehicle” instead of “sedan or station wagon”
Guidelines for formulating definitions of concepts (2)
• Start by defining more general and broader words
Define car first, then define station wagon
• Define a term with a narrower meaning such as “A <broader notion> that…”
A station wagon is “a car with a large cargo area”
• Do not include features of a broader concept in the definition of a concept
Not: a station wagon is “a car with 4 wheels and a large loading space”
• Adopt definitions from official sources where possible and consistent with
proprietary terminology
Do not adopt a definition from a commercial source (such as a supplier)
Guidelines for formulating definitions of concepts (3)
• Define separate terms for all non-common words in definitions
A “wheel” is a common word and needs no definition
• Define not only concepts that lead 1-1 to data elements, but also the relevant
context
Define road and driver in addition to car
• Avoid circular definitions; do not express a concept in terms of itself or its
conjugations and do not allow definitions of concepts to refer to each other
Driving is “moving around with a car” instead of “driving a car”
• Avoid definitions that contain negations; define what something is and not what
something isn't (unless you define opposing concepts)
Not: a car is “a vehicle that is not a truck”
Guidelines for formulating definitions of concepts (4)
• Avoid the term “data” and anything directly related to data in definitions of terms
Not: a car is “four-wheel vehicle data”
• Support the definition with an explanation that indicates how the term is used
within the organization
Explanation for cars: “Cars are only relevant to our organization from the perspective of
parking.”
• Avoid using homonyms whenever possible
Don't: define the term “Car” in two ways
• Only name synonyms that are frequently used and acceptable
“Automotive” as a synonym for “Car”, but not “Motor car”
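A few of these guidelines lend themselves to simple automated checks. The sketch below is a heuristic illustration only, not part of ISO 704: the function name and the warning texts are made up here, and the word matching will miss many cases (for example plural terms or subtle circularity), so it can support a human reviewer but not replace one.

```python
import re
from typing import List


def lint_definition(term: str, definition: str) -> List[str]:
    """Return warnings for a term/definition pair; an empty list means no rule fired."""
    warnings = []
    text = definition.lower()
    words = re.findall(r"[a-z]+", text)

    # Terms start with a capital letter.
    if not term[:1].isupper():
        warnings.append("term does not start with a capital letter")
    # Avoid circular definitions: the term should not appear in its own definition.
    if re.search(rf"\b{re.escape(term.lower())}\b", text):
        warnings.append("definition repeats the term (possibly circular)")
    # Avoid negations: define what something is.
    if "not" in words or "n't" in text or "n’t" in text:
        warnings.append("definition contains a negation")
    # Avoid the word "data" in definitions.
    if "data" in words:
        warnings.append("definition mentions data")
    return warnings


for term, definition in [
    ("Car", "a motorized vehicle with 4 wheels"),
    ("Car", "a vehicle that is not a truck"),
    ("Car", "four-wheel vehicle data"),
    ("Driving", "driving a car"),
]:
    print(term, "|", definition, "|", lint_definition(term, definition))
```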
Quality rules for SKOS thesauri
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
Example quality rules in more detail
• Omitted or Invalid Language Tags: Literals should be tagged consistently with a
language tag.
• Undocumented Concepts: Concepts should include the set of “documentation
properties” as defined in the SKOS Reference.
• Overlapping Labels: No two concepts should have the same preferred lexical label
in a given language when they belong to the same concept scheme.
• Disjoint Labels Violation: skos:prefLabel, skos:altLabel and skos:hiddenLabel
should be pairwise disjoint properties.
• Extra Whitespace in Labels: Labels should not have any leading or trailing
whitespace.
• Orphan Concepts: Concepts should have associative or hierarchical relationships
with other concepts.
https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
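The example rules can be approximated in a few lines of code. The sketch below is a simplified illustration under stated assumptions, not the implementation from the cited paper: the class and function names are invented, all concepts are assumed to belong to one concept scheme, "documented" is reduced to "has a definition", and the ex: identifiers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Label = Tuple[str, Optional[str]]  # (text, language tag or None)


@dataclass
class SkosConcept:
    uri: str
    pref_labels: List[Label] = field(default_factory=list)
    alt_labels: List[Label] = field(default_factory=list)
    definition: Optional[str] = None
    broader: List[str] = field(default_factory=list)  # URIs
    related: List[str] = field(default_factory=list)  # URIs


def check_thesaurus(concepts: List[SkosConcept]) -> List[str]:
    """Return human-readable findings; an empty list means no rule fired."""
    findings: List[str] = []
    seen_pref: Dict[Label, str] = {}

    # A concept counts as linked if it points to, or is pointed to by, another concept.
    linked = set()
    for c in concepts:
        targets = c.broader + c.related
        if targets:
            linked.add(c.uri)
            linked.update(targets)

    for c in concepts:
        for text, lang in c.pref_labels + c.alt_labels:
            if not lang:
                findings.append(f"{c.uri}: label {text!r} has no language tag")
            if text != text.strip():
                findings.append(f"{c.uri}: label {text!r} has extra whitespace")
        if not c.definition:
            findings.append(f"{c.uri}: undocumented concept")
        for text, _ in set(c.pref_labels) & set(c.alt_labels):
            findings.append(f"{c.uri}: {text!r} is both preferred and alternative label")
        for label in c.pref_labels:
            if label in seen_pref:
                findings.append(f"{c.uri}: preferred label {label[0]!r} overlaps with {seen_pref[label]}")
            else:
                seen_pref[label] = c.uri
        if c.uri not in linked:
            findings.append(f"{c.uri}: orphan concept")
    return findings


car = SkosConcept("ex:car", [("Car", "en")], [("Automotive", "en")],
                  "a motorized vehicle with 4 wheels")
wagon = SkosConcept("ex:wagon", [("Station wagon", "en")],
                    definition="a car with a large cargo area", broader=["ex:car"])
stray = SkosConcept("ex:stray", [(" Car", None)])

for finding in check_thesaurus([car, wagon, stray]):
    print(finding)
```

In this example only the third concept produces findings: a missing language tag, extra whitespace, no documentation and no relationships.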
Summary
• A thesaurus gives meaning to words
• Concepts can be linked to all sorts of artefacts, enabling findability
• Data modelling should start at a thesaurus level
• Open and FAIR data requires a controlled vocabulary such as a thesaurus
• SKOS is the de facto standard for thesauri
More information?
ArchiXL thesaurus:
https://begrippen.archixl.nl/archixl/nl/
BegrippenXL thesaurusplatform:
https://www.begrippenxl.nl/en/?clang=nl

Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfEmmanuel Dauda
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理pyhepag
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证dq9vz1isj
 
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagraadet6151
 

Recently uploaded (20)

社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证书成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证书成绩单原版一比一如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证书成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证书成绩单原版一比一
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
如何办理澳洲悉尼大学毕业证(USYD毕业证书)学位证成绩单原版一比一
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
123.docx. .
123.docx.                                 .123.docx.                                 .
123.docx. .
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
 

The Role of Thesauri in Data Modeling

  • 1. The role of thesauri in data modeling Danny Greefhorst dgreefhorst@archixl.nl
  • 2. Topics in this presentation
      • What is a thesaurus and why is it valuable?
      • What does a thesaurus look like?
      • How does a thesaurus relate to a data model?
      • SKOS as a language for describing thesauri
      • Guidelines for good definitions (based on ISO 704)
      • Quality control for thesauri
  • 3. Thesaurus in the Data Management Body of Knowledge
      • A thesaurus is a type of controlled vocabulary for content retrieval.
      • A controlled vocabulary is a defined list of explicitly allowed terms used to index, categorize, tag, sort, and retrieve content through browsing and searching.
      • A thesaurus provides information about each term and its relationship to other terms.
      • Relationships are either hierarchical, associative or equivalent.
      • Thesauri can be used to:
          • organize unstructured content
          • uncover relationships between content from different media
          • improve website navigation
          • optimize search
  • 4. Business glossary in the Data Management Body of Knowledge
      • A business glossary is a means of sharing this vocabulary within the organization.
      • A data steward is generally responsible for business glossary content.
      • They enhance enterprise knowledge by associating data assets with glossary terms.
      • Business glossaries have the following objectives:
          • enable common understanding of the core business concepts and terminology
          • reduce the risk that data will be misused due to inconsistent understanding of the business concepts
          • improve the alignment between technology assets (with their technical naming conventions) and the business organization
          • maximize search capability and enable access to documented institutional knowledge
  • 5. Link concepts to other objects
      [Diagram: a Concept linked to Document/web content, Application, Business rule, API specification, Database definition, Data model, Dataset, and Dashboard/report]
  • 6. A controlled vocabulary is needed to make data FAIR https://www.go-fair.org/fair-principles/
  • 7. Concepts and data lineage - what does the data mean?
      Regulations such as PERDARR/BCBS239 ask explicitly for a catalogue of concepts:
      • As a precondition, a bank should have a “dictionary” of the concepts used, such that data is defined consistently across an organization.
      • A bank should develop an inventory and classification of risk data items which includes a reference to the concepts used to elaborate the reports.
      [Diagram labels: Data, Concepts, Report, Horizontal data lineage, Vertical data lineage]
  • 8.
  • 9. Practical template for concepts
      Name: Description
      • Term: A preferred linguistic reference to a concept.
      • URI: A unique identifier to the concept.
      • Domain: Domain in which the concept exists.
      • Definition: The formal definition of the concept.
      • Source: A reference to the source of the definition.
      • Informal definition: A simple definition of the concept that is understandable for a broad audience.
      • Explanation: A further clarification of the concept and the way it is used in the specific context.
      • Editorial notes: Remarks that are related to decisions made during the description of the concept.
      • Examples: A short summary or description of example instances of the concept.
      • Synonyms: Terms that denote almost the same concept.
      • Exact match: Concepts in another thesaurus that denote the same concept.
      • Related: Concepts that are related to the concept in another (non-hierarchical) manner.
      • Broader: Concepts that have a broader meaning than the concept.
      • Broader partitive: Concepts that represent a whole that the concept is a part of.
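
The template above can be captured as a simple record type. The following is a minimal sketch in Python, not part of the presentation: the class name `Concept`, the attribute names, the example domain "Mobility", and the `https://example.org/...` URIs are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    """One thesaurus entry, with one attribute per row of the template."""
    term: str                      # preferred linguistic reference
    uri: str                       # unique identifier of the concept
    domain: str                    # domain in which the concept exists
    definition: str                # formal definition
    source: Optional[str] = None   # reference to the source of the definition
    informal_definition: Optional[str] = None
    explanation: Optional[str] = None
    editorial_notes: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    exact_match: List[str] = field(default_factory=list)       # URIs in other thesauri
    related: List[str] = field(default_factory=list)           # URIs, non-hierarchical
    broader: List[str] = field(default_factory=list)           # URIs of broader concepts
    broader_partitive: List[str] = field(default_factory=list) # URIs of wholes


# Hypothetical example data, reusing the car examples from the definition guidelines.
car = Concept(
    term="Car",
    uri="https://example.org/concepts/car",
    domain="Mobility",
    definition="A motorized vehicle with 4 wheels",
)
station_wagon = Concept(
    term="Station wagon",
    uri="https://example.org/concepts/station-wagon",
    domain="Mobility",
    definition="A car with a large cargo area",
    broader=[car.uri],
)
```

Only the first four fields are required in this sketch; which fields an organization makes mandatory is a design choice the template itself does not prescribe.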
  • 10. Levels of modeling
      Levels: Conceptual, Logical, Physical
      Type of model: Description
      • Thesaurus: A collection of concepts and their relationships
      • Information model: A formal description of a universe of discourse
      • Logical data model: A design of a data structure
      • Physical data model: A technology-specific representation of data
  • 11. Semiotic triangle
      [Diagram: triangle with the corners Thought or reference, Symbol, and Referent; the Symbol stands for the Referent]
      Source: Ogden and Richards (1923)
  • 12. A thesaurus is close to a concept model
      A concept model is a model that develops the meaning of core concepts for a problem domain, defines their collective structure, and specifies the appropriate vocabulary needed to communicate about it consistently.
      Data models can usually be rather easily derived from concept models.
      Strengths of a concept model:
      • Provides a business-friendly way to communicate with stakeholders about precise meanings and subtle distinctions.
      • Is independent of data design biases and the often limited business vocabulary coverage of data models.
      • Proves highly useful for white-collar, knowledge-rich, decision-laden business processes.
      • Helps ensure that large numbers of business rules and complex decision tables are free of ambiguity and fit together cohesively.
      Source: Ron Ross: “Business Rule Concepts” and https://www.brcommunity.com/articles.php?id=b779
  • 13. Linking concepts to a data model – MIM standard All model elements have a property “Concept”: Reference to a concept, from a model element, indicating on which concept, or concepts, the information model element is based. The reference is in the form of a term or a URI.
  • 14. SKOS - Simple knowledge organisation system
      • Open standard of the W3C, defined in 2009
      • Part of and based on Linked Data standards such as RDF
      • Makes every concept findable on the web with a URI
      • Offers a model for describing knowledge organisation systems such as thesauri
      • Based on general theory and standards about thesauri
      • Specifically aimed at publication of concepts on the web
      • Simplified model for describing concepts compared to other systems
      • Uses RDF and accompanying standards (XML, TTL, JSON-LD)
      • Supported by various commercial and open source tools
      • Can be combined with the SKOS-THES standard to include part-of and instance-of relationships
      • Can be combined with the Dublin Core metadata standard
      • More information: https://www.w3.org/TR/skos-primer/
  • 15. Practical template mapped to SKOS
      Name: SKOS representation
      • Term: skos:prefLabel
      • URI: skos:Concept
      • Domain: skos:member
      • Definition: skos:definition
      • Source: dc:source
      • Informal definition: rdfs:comment
      • Explanation: skos:scopeNote
      • Editorial notes: skos:editorialNote
      • Examples: skos:example
      • Synonyms: skos:altLabel
      • Exact match: skos:exactMatch
      • Related: skos:related
      • Broader: skos:broader
      • Broader partitive: isothes:broaderPartitive
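
To illustrate how this mapping could be applied, the sketch below turns a filled-in template into Turtle using only the Python standard library. It is an illustration under assumptions, not the presenter's tooling: the dictionary keys, the function names `literal` and `to_turtle`, and the `https://example.org/...` URIs are hypothetical. Domain is left out because skos:member is stated on the collection rather than on the concept, and Broader partitive is left out to keep the prefix list short. Literal values are assumed to be single-line strings.

```python
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Template field -> (property from the mapping, True if the value is a URI)
FIELD_TO_PROPERTY = {
    "term": ("skos:prefLabel", False),
    "definition": ("skos:definition", False),
    "source": ("dc:source", False),
    "informal_definition": ("rdfs:comment", False),
    "explanation": ("skos:scopeNote", False),
    "editorial_notes": ("skos:editorialNote", False),
    "examples": ("skos:example", False),
    "synonyms": ("skos:altLabel", False),
    "exact_match": ("skos:exactMatch", True),
    "related": ("skos:related", True),
    "broader": ("skos:broader", True),
}


def literal(text, lang):
    """Format a single-line string as a language-tagged Turtle literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_turtle(concept, lang="en"):
    """Serialize one concept (a dict keyed by template field) to Turtle."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    statements = ["a skos:Concept"]
    for field_name, (prop, is_uri) in FIELD_TO_PROPERTY.items():
        values = concept.get(field_name, [])
        if isinstance(values, str):
            values = [values]  # allow a single value instead of a list
        for value in values:
            obj = f"<{value}>" if is_uri else literal(value, lang)
            statements.append(f"{prop} {obj}")
    body = " ;\n    ".join(statements)
    lines.append(f"<{concept['uri']}>\n    {body} .")
    return "\n".join(lines)


# Hypothetical concept, reusing the station wagon example from the guidelines.
turtle = to_turtle({
    "uri": "https://example.org/concepts/station-wagon",
    "term": "Station wagon",
    "definition": "A car with a large cargo area",
    "broader": ["https://example.org/concepts/car"],
})
print(turtle)
```

In practice an RDF library would handle escaping and serialization more robustly; the point here is only that each template row corresponds to one property on the concept.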
  • 17. A SKOS concept
      <?xml version="1.0" encoding="utf-8" ?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:skos="http://www.w3.org/2004/02/skos/core#"
               xmlns:dc="http://purl.org/dc/terms/"
               xmlns:ns0="http://www.eionet.europa.eu/gemet/2004/06/gemet-schema.rdf#">
        <skos:narrower rdf:resource="http://www.eionet.europa.eu/gemet/concept/15031"/>
        <dc:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:modified>
        <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
        <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
        <dc:created rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2004-09-08T09:59:20+00:00</dc:created>
        <skos:definition xml:lang="en">The degree to which air is polluted; the type and maximum concentration of man-produced pollutants that should be permitted in the atmosphere.</skos:definition>
      </rdf:RDF>
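
As a small illustration of how such a record can be consumed, the sketch below reads the preferred labels per language from a shortened copy of the XML above, using Python's standard `xml.etree.ElementTree` module. The function name `labels_by_language` is an assumption, and the fragment is treated as plain XML rather than processed with an RDF parser.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the slide's example (labels and definition only).
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:prefLabel xml:lang="en">air quality</skos:prefLabel>
  <skos:prefLabel xml:lang="nl">luchtkwaliteit</skos:prefLabel>
  <skos:definition xml:lang="en">The degree to which air is polluted; the type and maximum concentration of man-produced pollutants that should be permitted in the atmosphere.</skos:definition>
</rdf:RDF>"""


def labels_by_language(xml_text):
    """Return {language tag: preferred label} for every skos:prefLabel element."""
    root = ET.fromstring(xml_text)
    return {
        element.get(XML_LANG): element.text
        for element in root.iter(f"{{{SKOS}}}prefLabel")
    }


labels = labels_by_language(RDF_XML)
print(labels)
```

Because each label carries a language tag, the same concept can be shown as "air quality" to English readers and "luchtkwaliteit" to Dutch readers.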
  • 18. Guidelines for formulating definitions of concepts (1)
      • Connect with general language use and the language in the organization in terms and definitions
          “Car” instead of “Automobile”
      • Define terms with a short name (term), in singular and starting with a capital letter
          “Car” instead of “Cars”
      • Keep definitions as short as possible; include only distinguishing features
          “a motorized vehicle with 4 wheels” instead of “a motorized vehicle with 4 wheels that can be used for both private and business transport”
      • Use intensional definitions where possible; name the distinguishing features of a concept
          “a 4-wheel motorized vehicle” instead of “sedan or station wagon”
  • 19. Guidelines for formulating definitions of concepts (2)
      • Start by defining more general and broader words
          Define car first, then define station wagon
      • Define a term with a narrower meaning such as “A <broader notion> that…”
          A station wagon is “a car with a large cargo area”
      • Do not include features of a broader concept in the definition of a concept
          Not: a station wagon is “a car with 4 wheels and a large loading space”
      • Adopt definitions from official sources where possible and consistent with proprietary terminology
          Do not adopt a definition from a commercial source (such as a supplier)
  • 20. Guidelines for formulating definitions of concepts (3)
      • Define separate terms for all non-common words in definitions
          A “wheel” is a common word and needs no definition
      • Define not only concepts that lead 1-1 to data elements, but also the relevant context
          Define road and driver in addition to car
      • Avoid circular definitions; do not express a concept in terms of itself or its conjugations and do not allow definitions of concepts to refer to each other
          Driving is “moving around with a car” instead of “driving a car”
      • Avoid definitions that contain negations; define what something is and not what something isn't (unless you define opposing concepts)
          Not: a car is “a vehicle that is not a truck”
  • 21. Guidelines for formulating definitions of concepts (4)
      • Avoid the term “data” and anything directly related to data in definitions of terms
          Not: a car is “four-wheel vehicle data”
      • Support the definition with an explanation that indicates how the term is used within the organization
          Explanation for cars: “Cars are only relevant to our organization from the perspective of parking.”
      • Avoid using homonyms whenever possible
          Don't: define the term “Car” in two ways
      • Only name synonyms that are frequently used and acceptable
          “Automotive” as a synonym for “Car”, but not “Motor car”
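
A few of the guidelines above are mechanical enough to check automatically. The sketch below is a rough illustration, not part of the presentation and not a substitute for editorial review: the function name `lint_definition`, the warning texts, and the simple word matching are assumptions. It only covers the capital-letter, circularity, negation, and "data" guidelines, and it will miss conjugations and flag legitimate opposing concepts.

```python
import re


def lint_definition(term, definition):
    """Return warnings for a (term, definition) pair based on simple word checks."""
    warnings = []
    text = definition.lower()
    words = re.findall(r"[a-z']+", text)

    # Terms should start with a capital letter.
    if not term[:1].isupper():
        warnings.append("term should start with a capital letter")

    # Circular definition: the term itself appears in its own definition.
    if re.search(rf"\b{re.escape(term.lower())}\b", text):
        warnings.append("definition repeats the term (circular)")

    # Negation: define what something is, not what it is not.
    if "not" in words or any(word.endswith("n't") for word in words):
        warnings.append("definition contains a negation")

    # Definitions should describe the concept, not its data.
    if "data" in words:
        warnings.append("definition mentions data")

    return warnings


# The examples from the guidelines.
print(lint_definition("Car", "a motorized vehicle with 4 wheels"))
print(lint_definition("Car", "a vehicle that is not a truck"))
print(lint_definition("Driving", "driving a car"))
print(lint_definition("Car", "four-wheel vehicle data"))
```

Guidelines such as using intensional definitions or adopting official sources require human judgement and are deliberately not covered.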
  • 22. Quality rules for SKOS thesauri https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
  • 23. Example quality rules in more detail
      • Omitted or Invalid Language Tags: Literals should be tagged consistently with a language tag.
      • Undocumented Concepts: Concepts should include the set of “documentation properties” as defined in the SKOS Reference.
      • Overlapping Labels: No two concepts should have the same preferred lexical label in a given language when they belong to the same concept scheme.
      • Disjoint Labels Violation: skos:prefLabel, skos:altLabel and skos:hiddenLabel should be pairwise disjoint properties.
      • Extra Whitespace in Labels: Labels should not have any leading or trailing whitespace.
      • Orphan Concepts: Concepts should have associative or hierarchical relationships with other concepts.
      https://seco.cs.aalto.fi/publications/2014/suominen-mader-skosquality.pdf
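
Three of these rules can be sketched as simple checks over an in-memory list of concepts. This is an illustration under assumptions, not the tooling described in the cited paper: the dictionary layout (`uri`, `pref_labels`, `broader`, `related`), the function names, and the `https://example.org/...` URIs are hypothetical, and all concepts are assumed to belong to one concept scheme.

```python
from collections import defaultdict


def extra_whitespace(concepts):
    """URIs of concepts with a preferred label that has leading or trailing whitespace."""
    return [
        c["uri"] for c in concepts
        if any(label != label.strip() for label in c["pref_labels"].values())
    ]


def overlapping_labels(concepts):
    """Map (language, label) to the URIs sharing that preferred label, if more than one."""
    seen = defaultdict(list)
    for c in concepts:
        for lang, label in c["pref_labels"].items():
            seen[(lang, label)].append(c["uri"])
    return {key: uris for key, uris in seen.items() if len(uris) > 1}


def orphan_concepts(concepts):
    """URIs of concepts that neither link to nor are linked from another concept."""
    linked = set()
    for c in concepts:
        targets = c.get("broader", []) + c.get("related", [])
        if targets:
            linked.add(c["uri"])
            linked.update(targets)
    return [c["uri"] for c in concepts if c["uri"] not in linked]


BASE = "https://example.org/concepts/"  # hypothetical namespace
concepts = [
    {"uri": BASE + "car", "pref_labels": {"en": "Car"}},
    {"uri": BASE + "station-wagon", "pref_labels": {"en": "Station wagon"},
     "broader": [BASE + "car"]},
    {"uri": BASE + "road", "pref_labels": {"en": "Road "}},     # trailing space
    {"uri": BASE + "automobile", "pref_labels": {"en": "Car"}}, # duplicate label
]

print(extra_whitespace(concepts))
print(overlapping_labels(concepts))
print(orphan_concepts(concepts))
```

Running checks like these on every change to the thesaurus keeps such issues from accumulating unnoticed.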
  • 24. Summary
      • A thesaurus gives meaning to words
      • Concepts can be linked to all sorts of artefacts, enabling findability
      • Data modelling should start at a thesaurus level
      • Open and FAIR data requires a controlled vocabulary such as a thesaurus
      • SKOS is the de facto standard for thesauri
  • 25. More information? ArchiXL thesaurus: https://begrippen.archixl.nl/archixl/nl/ BegrippenXL thesaurusplatform: https://www.begrippenxl.nl/en/?clang=nl