Metadata Quality Assurance Framework
Péter Király <peter.kiraly@gwdg.de>
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Germany
QQML2016
8th International Conference on Qualitative and Quantitative Methods in Libraries
2016-05-24, London
the problem
there are „good” and „bad” metadata records
Typical issues – non-informative field
• Title is not informative
non-informative:
„photograph, framed”,
„group photograph”,
„photograph”
vs
informative:
„Photograph of Sir Dugald Clerk”,
„Photograph of "Puffing Billy"”
Typical issues – Copy & paste cataloging
• Keeping placeholders / templates
Typical issues – Field overuse
• What is the meaning of the field? (overuse)
TextGrid OAI-PMH response
Why is data quality important?
„Fitness for purpose” (QA principle)
no metadata → no access to data → no data usage
more explanation:
Data on the Web Best Practices
W3C Working Draft 19 May 2016
https://www.w3.org/TR/dwbp/
Europeana Data Quality Committee
• Online collaboration
• Use case documents
• Problem catalog
• Tickets
• Discussion forum
• #EuropeanaDataQuality
• Bi-weekly teleconference
• Bi-yearly face-to-face meeting
• Topics
• Usage scenarios
• Metadata profiles
• Schema modification
• Measuring
• Event model
• Proposals for data providers
Research hypothesis
Hypothesis: by measuring structural elements we
can predict metadata record quality
What is it good for?
• improve the metadata
• improve services: good data → functions
• improve metadata schema & documentation
• propagate „good practice”
Domains:
• cultural heritage sector
• research data management and archiving
Research hypothesis
proposed solution
Metadata Quality Assurance Framework
What to measure?
Measurements
• Schema-independent structural features:
existence, cardinality, uniqueness, length,
dictionary entry, data type conformance
• Use case scenarios („fit for purpose”):
requirements of the most important functions
• Problem catalog:
known metadata problems
Discovery scenarios and their metadata requirements
Europeana’s most important functions
1. Basic retrieval with high precision and recall
2. Cross-language recall
3. Entity-based facets
4. Date-based facets
5. Improved language facets
6. Browse by subjects and resource types
7. Browse by agents
8. Browse/Search by Event
9. Entity-based knowledge cards and pages
10. Categorised similar items
11. Spatial search, browse, and map display
12. Entity-based autocompletion
13. Diversification of results
14. Hierarchical search and facets
Credit: the document was initiated by Timothy Hill, Europeana’s search engineer
Discovery scenarios and their metadata requirements – Entity-based facets
Scenario
As a user I want to be able to filter by whether a person is the
subject of a book, or its author, engraver, printer etc.
Metadata analysis
In each case the underlying requirement is that the relevant EDM
fields for objects be populated by identifying URIs rather than free
text. These URIs need to be related, at a minimum, to a label for
each of the supported languages.
Measurement rules
• The relevant field values should be resolvable URIs
• Each URI should have labels in multiple languages
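The two measurement rules above could be approximated offline as follows. This is a minimal sketch, not Europeana's implementation: `looks_like_uri` checks syntax only (actually resolving a URI would need an HTTP request), and the `labels` mapping from language code to label text is a hypothetical structure chosen for illustration.

```python
from urllib.parse import urlparse


def looks_like_uri(value: str) -> bool:
    """Syntactic check only: an http(s) URI that has a host part."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def has_multilingual_labels(labels: dict, minimum: int = 2) -> bool:
    """True if non-empty labels exist in at least `minimum` languages.

    `labels` is assumed to map a language code to a label string.
    """
    return sum(1 for text in labels.values() if text.strip()) >= minimum
```

A free-text value such as a personal name fails the first check, which is the case the scenario wants to detect.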
Discovery scenarios and their metadata requirements – Date-based facets
Scenario
I want to be able to filter my results by a variety of timespans, e.g.:
• Date of creation
• Date of publication
• Date as subject
Metadata analysis
Dates should be fully and consistently normalised to follow the XSD
date-time data types. Dates expressed in styles like “490 avant J.C”
that are inherently language dependent should be avoided as they’re
very difficult to normalise (e.g. this should be represented as
“-0490”^^xsd:gYear).
Measurement rules
• Field values should conform to the XSD date-time data types
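A rough way to test this rule is to match the field value against the lexical forms of a few XSD date types. The sketch below is an assumption-laden simplification: it covers only xsd:gYear, xsd:gYearMonth and xsd:date, ignores timezone suffixes, and does not check calendar validity (for example, it accepts February 31).

```python
import re

# Simplified lexical patterns; timezone suffixes are deliberately not handled.
XSD_PATTERNS = {
    "xsd:gYear": re.compile(r"-?\d{4,}"),
    "xsd:gYearMonth": re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])"),
    "xsd:date": re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}


def xsd_date_type(value: str):
    """Return the name of the first matching XSD type, or None if none matches."""
    for name, pattern in XSD_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None
```

With these patterns the normalised form from the slide is recognised as a year, while the language-dependent original is rejected.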
Problem catalog
Catalog of known metadata problems in Europeana
• Title contents same as description contents
• Systematic use of the same title
• Bad string: "empty" (and variants)
• Shelfmarks and other identifiers in fields
• Creator not an agent name
• Absurd geographical location
• Subject field used as description field
• Unicode U+FFFD (�)
• Very short description field
• ...
Credit: the document was initiated by Timothy Hill, Europeana’s search engineer
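Several of the catalog entries lend themselves to simple string checks. The following sketch covers three of them under stated assumptions: a record is modelled as a dict mapping a field name to a list of strings, the field name `description` is illustrative, and the threshold of 20 characters for a "very short" description is an arbitrary placeholder rather than a value from the catalog.

```python
def find_problems(record: dict, min_description_length: int = 20) -> list:
    """Return the names of the catalog problems detected in one record."""
    problems = []
    all_values = [value for values in record.values() for value in values]

    # Bad string: the literal word "empty" used as a field value.
    if any(value.strip().lower() == "empty" for value in all_values):
        problems.append("bad string: empty")

    # U+FFFD marks a character that was lost in an earlier encoding error.
    if any("\ufffd" in value for value in all_values):
        problems.append("unicode replacement character")

    # Very short description (the threshold is an assumption).
    descriptions = record.get("description", [])
    if any(len(d.strip()) < min_description_length for d in descriptions):
        problems.append("very short description")

    return problems
```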
Problem catalog
Description: Title contents same as description contents
Example: /2023702/35D943DF60D779EC9EF31F5DF...
Motivation: Distorts search weightings
Checking Method: Field comparison
Notes: Record display: creator concatenated onto title
Metadata Scenario: Basic Retrieval
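The "field comparison" checking method for this entry might look like the sketch below. It assumes a record is a dict of field name to list of strings, and it compares values after lower-casing and collapsing whitespace; both the field names and the normalisation are illustrative choices, not the framework's documented behaviour.

```python
import re


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def title_equals_description(record: dict) -> bool:
    """True if any title value reappears (after normalisation) as a description."""
    titles = {normalize(t) for t in record.get("title", [])}
    descriptions = {normalize(d) for d in record.get("description", [])}
    titles.discard("")  # two empty fields are not a meaningful match
    return bool(titles & descriptions)
```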
How to define measurements?
Problem catalog – proposed basis of implementation
Shapes Constraint Language (SHACL)
https://www.w3.org/TR/shacl/
A language for describing and constraining the contents of RDF
graphs. It provides a high-level vocabulary to identify predicates and
their associated cardinalities, datatypes and other constraints.
• sh:equals, sh:notEquals
• sh:hasValue
• sh:in
• sh:lessThan, sh:lessThanOrEquals
• sh:minCount, sh:maxCount
• sh:minLength, sh:maxLength
• sh:pattern
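To show what a few of these constraints express, here is a plain-Python analogue of sh:minCount, sh:maxCount, sh:minLength, sh:maxLength and sh:pattern applied to the values of a single field. It is only an illustration of the semantics: it is not a SHACL engine, it does not operate on RDF graphs, and the function name and parameters are made up for this sketch.

```python
import re


def check_constraints(values: list, min_count=None, max_count=None,
                      min_length=None, max_length=None, pattern=None) -> list:
    """Check one field's values; return the names of the violated constraints."""
    violations = []
    if min_count is not None and len(values) < min_count:
        violations.append("minCount")
    if max_count is not None and len(values) > max_count:
        violations.append("maxCount")
    if min_length is not None and any(len(v) < min_length for v in values):
        violations.append("minLength")
    if max_length is not None and any(len(v) > max_length for v in values):
        violations.append("maxLength")
    # The regular expression may match anywhere in the value unless anchored.
    if pattern is not None and any(re.search(pattern, v) is None for v in values):
        violations.append("pattern")
    return violations
```

A missing mandatory title would be reported as `["minCount"]`, for example with `check_constraints([], min_count=1)`.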
early measurement results
and their visualization
overall view, collection view, record view
Completeness – 40 measurements
Field cardinality – 27 measurements
Uniqueness – 6 measurements
Language specification – 20 measurements
Problem catalog – 3 measurements
etc.
links
measurements
aggregated numbers
completeness
What is the ratio of populated fields in records?
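The ratio in question can be computed per record as the share of schema fields that carry at least one non-blank value. A minimal sketch, assuming a record is a dict of field name to list of strings and that the list of schema fields is supplied by the caller (the field names in the usage note are examples only):

```python
def completeness(record: dict, schema_fields: list) -> float:
    """Share of schema fields that have at least one non-blank value."""
    if not schema_fields:
        return 0.0
    populated = sum(
        1 for field in schema_fields
        if any(value.strip() for value in record.get(field, []))
    )
    return populated / len(schema_fields)
```

For a schema of `dc:title`, `dc:description`, `dcterms:alternative` and `dc:type`, a record that fills only the title and the type gets a score of one half; a description consisting of whitespace does not count as populated.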
Field frequency / main
Alternative title is a rare field
Field frequency per collections / all
no record has alternative title
every record has alternative title
Field frequency per collections / remove no-instances
Field frequency per collections / display only complete collections
cardinality
How many field instances are in the records?
Field cardinality – overview
more field instances than records
number of records
dc:type
Field cardinality – histogram
128 subjects in one record
median is 0, mean is close to 1
link to interesting records
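The numbers behind such a histogram come from counting the instances of one field in every record and summarising the counts. The sketch below uses toy data and an assumed record layout (dict of field name to list of values); it only illustrates how a median of 0 and a mean near 1 can coexist when a few records carry many instances.

```python
from statistics import mean, median


def field_cardinalities(records: list, field: str) -> list:
    """Number of instances of `field` in each record (0 when the field is absent)."""
    return [len(record.get(field, [])) for record in records]


def summarize(counts: list) -> dict:
    """Basic descriptive statistics for a non-empty list of instance counts."""
    return {
        "min": min(counts),
        "max": max(counts),
        "median": median(counts),
        "mean": mean(counts),
    }
```

With three records of which only one has three `dc:subject` values, the counts are `[0, 0, 3]`: the median is 0 while the mean is 1.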
Field cardinality – an outlier
multilinguality
Do we know the language of a field value?
Multilinguality
@resource is a URI
@ = language notation in RDF
no language specification
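The three cases on this slide (a URI reference, a literal with a language tag, a literal without one) can be counted per field as sketched below. The JSON-like layout is an assumption made for illustration: a field instance is either a plain string or a dict that may contain the hypothetical keys `@resource`, `@lang` and `#value`.

```python
from collections import Counter


def classify_instance(instance) -> str:
    """Classify one field instance as a resource, a tagged literal, or untagged."""
    if isinstance(instance, dict):
        if "@resource" in instance:
            return "resource"
        if instance.get("@lang"):
            return "lang:" + instance["@lang"]
    return "no language"


def language_profile(instances: list) -> Counter:
    """Count how often each category occurs among the instances of a field."""
    return Counter(classify_instance(instance) for instance in instances)
```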
Language frequency / barchart
same language,
different encodings
Language frequency / Treemap
has language
specification
has no language
specification
Language frequency / Treemap with resources
has no language
specification
has language
specification
Is a URI
Language frequency / Treemap + interaction + table
hide/display categories
table-like format
uniqueness (entropy)
How unique are the terms in a field?
Entropy – term uniqueness / main
1 means a unique term
0.0000x means a very frequent term
These are cumulative numbers
$\mathrm{entropy}_{\mathrm{cumulative}} = \mathrm{term}_1 + \dots + \mathrm{term}_n$
Entropy – term uniqueness / collection
max is exceptional (=1425 * mean)
unique records
less unique or non-unique records
Entropy – term uniqueness / refining the picture
bulk of records are close to zero
although 25% are between 0.05 and 1.25
Entropy – term uniqueness / field value
Russian text transliterated into the Latin
writing system, not in Cyrillic
Entropy – term uniqueness / terms
explanation of uniqueness score
TF-IDF values come from Apache Solr
term frequency: 1
document freq.: 2
uniqueness score: 0.5
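The example numbers are consistent with a per-term score of term frequency divided by document frequency (1 / 2 = 0.5), summed over the terms of a field value to give the cumulative figure. The sketch below implements that assumed formula on a whitespace-tokenised toy corpus; it does not reproduce how Apache Solr tokenises text or computes its TF-IDF values.

```python
from collections import Counter


def document_frequencies(documents: list) -> Counter:
    """For each term, count the documents (field values) that contain it."""
    df = Counter()
    for text in documents:
        df.update(set(text.lower().split()))
    return df


def cumulative_uniqueness(text: str, df: Counter) -> float:
    """Sum of tf / df over the distinct terms of one field value (assumed formula)."""
    tf = Counter(text.lower().split())
    return sum(count / df[term] for term, count in tf.items() if df[term])
```

In a corpus of the two titles „group photograph” and „photograph”, the term "photograph" occurs in both documents, so the title „photograph” scores 0.5, while „group photograph” scores 1.5 because "group" is unique.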
problem catalog
Does the record have any specific issues?
Problem catalog – Long subject
a record with 265 „long” subject headings
Problem catalog – Long subject – example (not so long...)
Conclusion: we have to refine
the definition of „long”
Problem catalog – same title and description
records in which one title is
identical to the description
... and we have 9 such records
Problem catalog – same title and description – example
completeness sub-dimensions
Are the sub-dimensions (field groups
supporting specific functionalities) complete?
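One way to answer this question is to define, per functionality, the group of fields that supports it and then report which of those fields exist in a record. The groups below are hypothetical placeholders (the real ones would come from the use case documents), and a record is again assumed to be a dict of field name to list of values.

```python
# Hypothetical field groups, for illustration only.
FIELD_GROUPS = {
    "basic retrieval": ["dc:title", "dc:description"],
    "date-based facets": ["dcterms:created", "dcterms:issued"],
    "browse by agents": ["dc:creator", "dc:contributor"],
}


def functionality_matrix(record: dict, groups: dict = FIELD_GROUPS) -> dict:
    """For each functionality, map every supporting field to existing/missing."""
    return {
        name: {field: bool(record.get(field)) for field in fields}
        for name, fields in groups.items()
    }


def group_completeness(record: dict, groups: dict = FIELD_GROUPS) -> dict:
    """Share of existing fields within each functionality's field group."""
    matrix = functionality_matrix(record, groups)
    return {name: sum(cells.values()) / len(cells) for name, cells in matrix.items()}
```

The first function yields the existing/missing cells of a functionality matrix for one record; the second reduces each row to a single score.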
Record view – functionality matrix
existing
missing
functionalities
miscellaneous
Other elements of the record view
Further steps
• Incorporating into Europeana’s ingestion tool
• Processing usage statistics (logs, Google Analytics)
• Human evaluation of metadata quality
• Measuring timeliness (changes of scores over time)
• Machine learning based classification & clustering
• Incorporating into a research data management tool
• Cooperation with other projects
Project principles
• Scalable, ready for big data
• Loose coupling to metadata schemas
• Transparency: open source, open data (CC0)
• Release early, release often
• Getting real [1]
• Collaboration and communication
[1] https://gettingreal.37signals.com/
Architectural overview
Components: OAI-PMH client (PHP), Apache Spark (Java), analysis with Spark (Scala), analysis with R, web interface (PHP, d3.js)
Storage and exchange formats: Hadoop File System, Apache Solr, Apache Cassandra, JSON files, CSV files, image files
Diagram legend: recent workflow, planned workflow
Follow me
• Europeana Data Quality Committee
http://pro.europeana.eu/europeana-tech/data-quality-committee
• research plan and blog: http://pkiraly.github.io
• site: http://144.76.218.178/europeana-qa/
• source code:
  • https://github.com/pkiraly/europeana-qa-spark
  • https://github.com/pkiraly/europeana-qa-r
• @kiru, https://www.linkedin.com/in/peterkiraly

Academic Writing and Research Data ManagementAcademic Writing and Research Data Management
Academic Writing and Research Data Management
 
Data Quality
Data QualityData Quality
Data Quality
 
Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005Paper presentations: UK e-science AHM meeting, 2005
Paper presentations: UK e-science AHM meeting, 2005
 

Metadata Quality Assurance Framework at QQML2016 conference - full version

  • 1. Metadata Quality Assurance Framework Péter Király <peter.kiraly@gwdg.de> Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen, Germany QQML2016 8th International Conference on Qualitative and Quantitative Methods in Libraries 2016-05-24, London
  • 2. The problem: there are „good” and „bad” metadata records
  • 3. Typical issues – non-informative field: the title is not informative. Non-informative: „photograph, framed”, „group photograph”, „photograph”; vs. informative: „Photograph of Sir Dugald Clerk”, „Photograph of "Puffing Billy"”
  • 4. Typical issues – copy & paste cataloging: keeping placeholders / templates
  • 5. Typical issues – field overuse: what is the meaning of the field? (Example: TextGrid OAI-PMH response)
  • 6. Why is data quality important? „Fitness for purpose” (QA principle): no metadata → no access to data → no data usage. More explanation: Data on the Web Best Practices, W3C Working Draft 19 May 2016, https://www.w3.org/TR/dwbp/
  • 7. Europeana Data Quality Committee: online collaboration; use case documents; problem catalog; tickets; discussion forum; #EuropeanaDataQuality; bi-weekly teleconf; bi-yearly face-to-face meeting; topics; usage scenarios; metadata profiles; schema modification; measuring; event model; proposals for data providers
  • 8. Research hypothesis: by measuring structural elements we can predict metadata record quality
  • 9. What is it good for? Improve the metadata; improve services (good data → functions); improve metadata schema & documentation; propagate „good practice”. Domains: cultural heritage sector; research data management and archiving
  • 10. Research hypothesis – proposed solution: Metadata Quality Assurance Framework
  • 11. What to measure?
  • 12. Measurements. Schema-independent structural features: existence, cardinality, uniqueness, length, dictionary entry, data type conformance. Use case scenarios („fit for purpose”): requirements of the most important functions. Problem catalog: known metadata problems
  • 13. Discovery scenarios and their metadata requirements. Europeana’s most important functions: 1. Basic retrieval with high precision and recall 2. Cross-language recall 3. Entity-based facets 4. Date-based facets 5. Improved language facets 6. Browse by subjects and resource types 7. Browse by agents 8. Browse/Search by Event 9. Entity-based knowledge cards and pages 10. Categorised similar items 11. Spatial search, browse, and map display 12. Entity-based autocompletion 13. Diversification of results 14. Hierarchical search and facets. Credit: the document was initiated by Timothy Hill, Europeana’s search engineer
  • 14. Discovery scenarios and their metadata requirements – Entity-based facets. Scenario: As a user I want to be able to filter by whether a person is the subject of a book, or its author, engraver, printer etc. Metadata analysis: In each case the underlying requirement is that the relevant EDM fields for objects be populated by identifying URIs rather than free text. These URIs need to be related, at a minimum, to a label for each of the supported languages. Measurement rules: the relevant field values should be resolvable URIs; each URI should have labels in multiple languages
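The two measurement rules for entity-based facets could be approximated as follows. This is only a sketch under stated assumptions: it checks URI syntax rather than actually resolving the URI over the network, and the function names, the `labels` dictionary shape and the threshold of two languages are illustrative choices that do not come from the slides.

```python
from urllib.parse import urlparse


def looks_like_uri(value):
    """Syntactic check only: an http(s) scheme and a host are present.

    Assumption: real resolvability would need an HTTP request, omitted here.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def has_multilingual_labels(labels, minimum=2):
    """`labels` maps a language code to a label (hypothetical structure)."""
    languages = {lang for lang, text in labels.items() if text.strip()}
    return len(languages) >= minimum


# Free text fails the first rule, an identifying URI passes it.
print(looks_like_uri("Dugald Clerk"))
print(looks_like_uri("http://example.org/agent/1"))
print(has_multilingual_labels({"en": "London", "fr": "Londres"}))
```

A free-text value such as a personal name fails the first check, which is the situation the metadata analysis warns against.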
  • 15. Discovery scenarios and their metadata requirements – Date-based facets. Scenario: I want to be able to filter my results by a variety of timespans, e.g. date of creation, date of publication, date as subject. Metadata analysis: Dates should be fully and consistently normalised to follow the XSD date-time data types. Dates expressed in styles like “490 avant J.C” that are inherently language dependent should be avoided as they’re very difficult to normalise (e.g. this should be represented as “-0490”^^xsd:gYear). Measurement rules: field values should conform to the XSD date-time data types
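A date-conformance check of this kind might look like the sketch below. It is a simplification made for illustration: only three XSD types are covered, time-zone suffixes are ignored, and day-of-month ranges are not validated against the month. The names `XSD_PATTERNS` and `xsd_date_type` are hypothetical.

```python
import re

# Simplified lexical forms of three XSD date types (assumption: no time zones).
XSD_PATTERNS = {
    "xsd:gYear": re.compile(r"-?\d{4,}"),
    "xsd:gYearMonth": re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])"),
    "xsd:date": re.compile(r"-?\d{4,}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"),
}


def xsd_date_type(value):
    """Return the name of the first XSD type the value matches, else None."""
    for name, pattern in XSD_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


for value in ["-0490", "2016-05-24", "490 avant J.C"]:
    print(value, "->", xsd_date_type(value))
```

The language-dependent form from the slide matches none of the patterns, so a record containing it would fail the measurement rule.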
  • 16. Problem catalog: catalog of known metadata problems in Europeana. Title contents same as description contents; systematic use of the same title; bad string: "empty" (and variants); shelfmarks and other identifiers in fields; creator not an agent name; absurd geographical location; subject field used as description field; Unicode U+FFFD (�); very short description field; ... Credit: the document was initiated by Timothy Hill, Europeana’s search engineer
  • 17. Problem catalog entry. Description: Title contents same as description contents; Example: /2023702/35D943DF60D779EC9EF31F5DF...; Motivation: Distorts search weightings; Checking method: Field comparison; Notes: Record display: creator concatenated onto title; Metadata scenario: Basic Retrieval
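The "field comparison" checking method for this catalog entry can be sketched as below. The record is assumed to be a plain dictionary keyed by `dc:title` and `dc:description`, and the whitespace and case normalisation is an illustrative choice rather than something the slides specify.

```python
def normalize(text):
    """Collapse whitespace and ignore case before comparing."""
    return " ".join(text.split()).casefold()


def title_equals_description(record):
    """True when a non-empty title is identical to the description."""
    title = normalize(record.get("dc:title", ""))
    description = normalize(record.get("dc:description", ""))
    return bool(title) and title == description


record = {"dc:title": "Group  photograph", "dc:description": "group photograph"}
print(title_equals_description(record))
```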
  • 18. How to define measurements?
  • 19. Problem catalog – proposed basis of implementation: Shapes Constraint Language (SHACL), https://www.w3.org/TR/shacl/. A language for describing and constraining the contents of RDF graphs. It provides a high-level vocabulary to identify predicates and their associated cardinalities, datatypes and other constraints: sh:equals, sh:notEquals; sh:hasValue; sh:in; sh:lessThan, sh:lessThanOrEquals; sh:minCount, sh:maxCount; sh:minLength, sh:maxLength; sh:pattern
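To give a feel for how a few of these constraints behave, here is a small plain-Python imitation. It is not SHACL and uses no SHACL library: the rule names `min_count`, `max_count`, `min_length`, `max_length` and `pattern` merely mirror sh:minCount, sh:maxCount, sh:minLength, sh:maxLength and sh:pattern, and the `SHAPES` table and record layout (field name mapped to a list of strings) are assumptions made for the example.

```python
import re

# Hypothetical constraint table, loosely modelled on SHACL property shapes.
SHAPES = {
    "dc:title": {"min_count": 1, "min_length": 3},
    "dc:language": {"pattern": r"^[a-z]{2,3}$"},
}


def validate(record, shapes):
    """Return a list of (field, violated rule) pairs."""
    violations = []
    for field, rules in shapes.items():
        values = record.get(field, [])
        if len(values) < rules.get("min_count", 0):
            violations.append((field, "min_count"))
        if "max_count" in rules and len(values) > rules["max_count"]:
            violations.append((field, "max_count"))
        for value in values:
            if len(value) < rules.get("min_length", 0):
                violations.append((field, "min_length"))
            if "max_length" in rules and len(value) > rules["max_length"]:
                violations.append((field, "max_length"))
            if "pattern" in rules and not re.search(rules["pattern"], value):
                violations.append((field, "pattern"))
    return violations


print(validate({"dc:title": ["ab"], "dc:language": ["English"]}, SHAPES))
```

Keeping the rules in a data table rather than in code is the point of the SHACL proposal: new problem-catalog entries become configuration instead of new program logic.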
  • 20. Early measurement results and their visualization
  • 21. Overall view, collection view, record view. Completeness – 40 measurements; field cardinality – 27 measurements; uniqueness – 6 measurements; language specification – 20 measurements; problem catalog – 3 measurements; etc. (figure labels: links, measurements, aggregated numbers)
  • 22. Completeness: what is the ratio of populated fields in records?
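The completeness question can be made concrete with a short sketch. The field list and the record layout (field name mapped to a list of strings) are assumptions for illustration; the framework's actual field sets are not given in the transcript.

```python
def completeness(record, fields):
    """Ratio of the listed fields that hold at least one non-blank value."""
    populated = sum(
        1 for field in fields
        if any(value.strip() for value in record.get(field, []))
    )
    return populated / len(fields)


FIELDS = ["dc:title", "dcterms:alternative", "dc:description", "dc:subject"]
record = {
    "dc:title": ["Photograph of Sir Dugald Clerk"],
    "dc:description": [""],  # present but blank, so it does not count
    "dc:subject": ["portrait"],
}
print(completeness(record, FIELDS))
```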
  • 23. Field frequency / main
  • 24. Field frequency / main: alternative title is a rare field
  • 25. Field frequency per collections / all (chart labels: no record has alternative title; every record has alternative title)
  • 26. Field frequency per collections / remove no-instances
  • 27. Field frequency per collections / display only complete collections
  • 28. Cardinality: how many field instances are in the records?
  • 29. Field cardinality – overview (chart labels: more fields than records; number of records)
  • 30. Field cardinality – overview: dc:type
  • 31. Field cardinality – histogram (chart labels: 128 subjects in one record; median is 0, mean is close to 1; link to interesting records)
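A cardinality measurement of this kind, with the summary statistics quoted on the histogram slide, could be sketched as follows. The three sample records are invented for the example and are not Europeana data; they are chosen so that a single outlier pulls the mean up while the median stays at zero.

```python
from statistics import mean, median


def field_cardinalities(records, field):
    """Number of instances of `field` in each record (0 when absent)."""
    return [len(record.get(field, [])) for record in records]


# Invented sample records: only the last one has subjects.
records = [
    {"dc:title": ["A"]},
    {"dc:title": ["B"]},
    {"dc:title": ["C"], "dc:subject": ["s1", "s2", "s3"]},
]
counts = field_cardinalities(records, "dc:subject")
summary = {"max": max(counts), "median": median(counts), "mean": mean(counts)}
print(counts, summary)
```

Comparing the maximum against the median is a cheap way to find the "interesting records" worth a manual look.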
  • 32. Field cardinality – an outlier
  • 33. Multilinguality: do we know the language of a field value?
  • 34. Multilinguality (figure annotations: @resource is a URI; @ = language notation in RDF; no language specification)
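The three categories on this slide (a URI resource, a value with a language tag, a value without one) suggest a simple classification step before counting language frequencies. The sketch below assumes a JSON-like representation in which a field instance is either a plain string or a dictionary with an `@resource` or `@lang` key; that representation is an assumption for illustration and may not match the project's data format.

```python
from collections import Counter


def language_category(instance):
    """Classify one field instance as 'resource', a language code, or 'no language'."""
    if isinstance(instance, dict):
        if "@resource" in instance:
            return "resource"
        if instance.get("@lang"):
            return instance["@lang"]
    return "no language"


def language_frequency(instances):
    """Count how often each category occurs in a list of field instances."""
    return Counter(language_category(instance) for instance in instances)


instances = [
    {"@lang": "en", "#value": "Photograph"},
    {"@lang": "de", "#value": "Fotografie"},
    {"@resource": "http://example.org/concept/1"},
    "photograph",
]
print(language_frequency(instances))
```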
  • 35. Language frequency / barchart
  • 36. Language frequency / barchart: same language, different encodings
  • 37. Language frequency / Treemap (labels: has language specification; has no language specification)
  • 38. Language frequency / Treemap with resources (labels: has no language specification; has language specification; is a URI)
  • 39. Language frequency / Treemap + interaction + table (labels: hide/display categories; table-like format)
  • 40. Uniqueness (entropy): how unique are the terms in a field?
  • 41. Entropy – term uniqueness / main: 1 means a unique term; 0.0000x means a very frequent term. These are cumulative numbers: entropy_cumulative = term_1 + ... + term_n
  • 42. Entropy – term uniqueness / collection (chart labels: max is exceptional (= 1425 * mean); unique records; not or less unique records)
  • 43. Entropy – term uniqueness / refining the picture: the bulk of records are close to zero, although 25% are between 0.05 and 1.25
  • 44. Entropy – term uniqueness / field value: Russian text in transcribed Latin writing system, not in Cyrillic
  • 45. Entropy – term uniqueness / terms: explanation of uniqueness score. TF-IDF values come from Apache Solr. Term frequency: 1; document freq.: 2; uniqueness score: 0.5
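The single worked example on this slide (term frequency 1, document frequency 2, score 0.5) is consistent with a per-term score of term frequency divided by document frequency, summed over the terms of a field value. The sketch below implements that reading only; it is an inference from the one example and the "1 means a unique term" remark, and the project's actual formula, which draws on Solr's TF-IDF values, may differ. Tokenised documents are passed in as lists of strings, another simplification.

```python
from collections import Counter


def document_frequencies(documents):
    """For each term, the number of documents that contain it."""
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))
    return df


def cumulative_uniqueness(tokens, df):
    """Sum of tf/df over the distinct terms of one field value (assumed formula)."""
    tf = Counter(tokens)
    return sum(tf[term] / df[term] for term in tf)


documents = [["photograph"], ["photograph", "puffing"]]
df = document_frequencies(documents)
# "photograph": tf 1, df 2 -> 0.5; "puffing": tf 1, df 1 -> 1.0
print(cumulative_uniqueness(documents[1], df))
```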
  • 46. Problem catalog: does the record have any specific issues?
  • 47. Problem catalog – Long subject: a record with 265 „long” subject headings
  • 48. Problem catalog – Long subject – example (not so long...). Conclusion: we have to refine the definition of „long”
  • 49. Problem catalog – same title and description: there is one title and description which is the same ... and we have 9 such records
  • 50. Problem catalog – same title and description – example
  • 51. Completeness sub-dimensions: are the sub-dimensions (field groups supporting specific functionalities) complete?
  • 52. Record view – functionality matrix (labels: existing; missing; functionalities)
  • 53. Miscellaneous
  • 54. Other elements of the record view
  • 55. Further steps: incorporating into Europeana’s ingestion tool; process usage statistics (logs, Google Analytics); human evaluation of metadata quality; measuring timeliness (changes of scores over time); machine learning based classification & clustering; incorporating into research data management tool; cooperation with other projects
  • 56. Project principles: scalable, ready for big data; loose coupling to metadata schemas; transparency: open source, open data (CC0); release early, release often; Getting real [1]; collaboration and communication. [1] https://gettingreal.37signals.com/
  • 57. Architectural overview. Components: Apache Spark (Java); OAI-PMH client (PHP); analysis with Spark (Scala); analysis with R; web interface (PHP, d3.js). Storage and exchange: Hadoop File System; Apache Solr; Apache Cassandra; JSON files; CSV files; image files. Legend: recent workflow; planned workflow
  • 58. Follow me. Europeana Data Quality Committee: http://pro.europeana.eu/europeana-tech/data-quality-committee; research plan and blog: http://pkiraly.github.io; site: http://144.76.218.178/europeana-qa/; source codes: https://github.com/pkiraly/europeana-qa-spark, https://github.com/pkiraly/europeana-qa-r; @kiru, https://www.linkedin.com/in/peterkiraly