Metadata Quality Assurance Framework
Part II. – The implementation begins
Péter Király
peter.kiraly@gwdg.de
Göttingen, Geiststraße 10, GWDG meeting room, 20/05/2016
Oberseminar Datenmanagement, Cloud und e-Infrastructure
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
Why is data quality important?
"Fitness for purpose"
no metadata → no access to data → no data usage
more explanation:
Data on the Web Best Practices
W3C Working Draft 17 December 2015
http://www.w3.org/TR/2015/WD-dwbp-20151217/
What is it good for?
• Improve the metadata
• Improve the metadata schema and its documentation
• Propagate "good practice"
• Improve services: "good" data is ranked higher in the search result list
Specifically for GWDG:
• Could be built into current and planned data management / data archiving tools
Project principles
• Full transparency
• Open source, open data (CC0)
• Minimum viable product
• "Release early. Release often. And listen to your customers" (Eric S. Raymond)
• "Eat your own dog food"
• Getting Real: https://gettingreal.37signals.com/
Measurements
• Schema-independent structural features: existence, cardinality, uniqueness
• Use case scenarios ("fit for purpose"): requirements of the most important functions
• Problem catalog: known metadata problems
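
The structural features can be illustrated with a minimal Python sketch. It assumes that each record is a dict mapping field names to lists of string values; this record layout, the field names, and the function name `structural_features` are hypothetical and not taken from the project code. Uniqueness is left out because it needs statistics over the whole corpus rather than a single record.

```python
from typing import Dict, List

Record = Dict[str, List[str]]  # assumed layout: field name -> list of values


def structural_features(record: Record, fields: List[str]) -> Dict[str, Dict[str, int]]:
    """Return existence (0 or 1) and cardinality (value count) per field."""
    result = {}
    for field in fields:
        # Ignore empty or whitespace-only strings: they carry no information.
        values = [v for v in record.get(field, []) if v.strip()]
        result[field] = {"existence": int(bool(values)), "cardinality": len(values)}
    return result


# Hypothetical record with Dublin Core style field names.
record = {"dc:title": ["Map of Göttingen"], "dc:subject": ["maps", "Göttingen", ""]}
features = structural_features(record, ["dc:title", "dc:subject", "dc:creator"])
```

Averaging the existence values over all records of a collection would give a field frequency per collection.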
Europeana Data Quality Committee
• Online collaboration
  • Use case documents
  • Problem catalog
  • Tickets
  • Discussion forum
  • #EuropeanaDataQuality
• Bi-weekly teleconference
• Bi-yearly face-to-face meeting
• Topics
  • Usage scenarios
  • Metadata profiles
  • Schema modification
  • Measuring
  • Event model
Discovery scenarios and their metadata requirements
1. Basic retrieval with high precision and recall
2. Cross-language recall
3. Entity-based facets
4. Date-based facets
5. Improved language facets
6. Browse by subjects and resource types
7. Browse by agents
8. Browse/Search by Event
9. Entity-based knowledge cards and pages
10. Categorised similar items
11. Spatial search, browse, and map display
12. Entity-based autocompletion
13. Diversification of results
14. Hierarchical search and facets
Credit: the document was initiated by Tim Hill, Europeana’s search engineer
Discovery scenarios and their metadata requirements – 3. Entity-based facets
Scenario
As a user, ... I want to be able to filter by whether a person is the
subject of a book, or its author, engraver, printer etc.
Metadata analysis
In each case the underlying requirement is that the relevant EDM
fields for objects be populated by identifying URIs rather than free
text. These URIs need to be related, at a minimum, to a label for
each of the supported languages.
Measurement rules
• The relevant field values should be resolvable URIs
• Each URI should have labels in multiple languages
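
A minimal Python sketch of these two rules follows. It only checks the URI syntactically (truly testing resolvability would need an HTTP request), and it assumes that the labels of an entity are already available as a dict keyed by language code. The function names, the example URI and the list of required languages are hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def looks_like_uri(value: str) -> bool:
    """Syntactic check only: absolute http(s) URI with a host, no network access."""
    value = value.strip()
    if not value or any(ch.isspace() for ch in value):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def missing_label_languages(labels: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required language codes for which no non-empty label exists."""
    return [lang for lang in required if not labels.get(lang, "").strip()]


# Hypothetical entity labels and supported languages.
labels = {"en": "Rembrandt", "nl": "Rembrandt van Rijn"}
missing = missing_label_languages(labels, ["en", "de", "nl"])
```

A free-text creator such as "Rembrandt van Rijn" fails the first check, while the second check reports which supported languages still lack a label.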
Discovery scenarios and their metadata requirements – 4. Date-based facets
Scenario
I want to be able to filter my results by a variety of timespans, e.g.:
• Date of creation
• Date of publication
• Date as subject
Metadata analysis
Dates should be fully and consistently normalised to follow the XSD
date-time data types. Dates expressed in styles like “490 avant J.C”
that are inherently language-dependent should be avoided, as they are
very difficult to normalise (e.g. this should be represented as
“-0490”^^xsd:gYear).
Measurement rules
• Field values should conform to XSD date-time data types
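
Such a rule could be approximated with regular expressions, as in the following Python sketch. The patterns are deliberately simplified (the XSD specification is stricter, for example about month and day ranges), and the function name `xsd_date_type` is hypothetical.

```python
import re

# Simplified lexical patterns; the XSD specification is stricter
# (for example it also limits month and day ranges).
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
XSD_PATTERNS = {
    "xsd:gYear": re.compile(r"-?\d{4,}" + _TZ),
    "xsd:gYearMonth": re.compile(r"-?\d{4,}-\d{2}" + _TZ),
    "xsd:date": re.compile(r"-?\d{4,}-\d{2}-\d{2}" + _TZ),
    "xsd:dateTime": re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ),
}


def xsd_date_type(value: str):
    """Return the name of the first matching XSD type, or None for free text."""
    for name, pattern in XSD_PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return name
    return None
```

With these patterns the normalised value "-0490" is recognised as a year, while the free-text form "490 avant J.C" matches nothing and would be flagged.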
Problem catalog
• Title contents same as description contents
• Systematic use of the same title
• Bad string: "empty" (and variants)
• Shelfmarks and other identifiers in fields
• Creator not an agent name
• Absurd geographical location
• Subject field used as description field
• Unicode U+FFFD (�)
• Very short description field
Credit: the document was initiated by Tim Hill, Europeana’s search engineer
Problem catalog
Description: Title contents same as description contents
Example: /2023702/35D943DF60D779EC9EF31F5DF...
Motivation: Distorts search weightings
Checking method: Field comparison
Notes: Record display: creator concatenated onto title
Metadata scenario: Basic Retrieval
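
The "field comparison" checking method could look like the following Python sketch. The record layout (a dict of field name to list of values), the field names `dc:title` and `dc:description`, and the normalisation step are assumptions made for illustration only.

```python
import unicodedata
from typing import Dict, List


def normalise(text: str) -> str:
    """Assumed normalisation: Unicode NFC, case folding, collapsed whitespace."""
    return " ".join(unicodedata.normalize("NFC", text).casefold().split())


def title_equals_description(record: Dict[str, List[str]],
                             title_field: str = "dc:title",
                             description_field: str = "dc:description") -> bool:
    """True if any title value is identical to any description value."""
    titles = {normalise(v) for v in record.get(title_field, []) if v.strip()}
    descriptions = {normalise(v) for v in record.get(description_field, []) if v.strip()}
    return bool(titles & descriptions)
```

Comparing normalised values rather than raw strings is a design choice: it also catches copies that differ only in case or spacing.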
Problem catalog – proposed basis of implementation
Shapes Constraint Language (SHACL)
https://www.w3.org/TR/shacl/
SHACL (Shapes Constraint Language) is a language for describing
and constraining the contents of RDF graphs. SHACL groups these
descriptions and constraints into "shapes", which specify conditions
that apply at a given RDF node. Shapes provide a high-level
vocabulary to identify predicates and their associated cardinalities,
datatypes and other constraints.
• sh:equals, sh:notEquals
• sh:hasValue
• sh:in
• sh:lessThan, sh:lessThanOrEquals
• sh:minCount, sh:maxCount
• sh:minLength, sh:maxLength
• sh:pattern
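
To show how a few of these constraint types behave, here is a plain Python sketch that mimics sh:minCount, sh:maxCount, sh:minLength, sh:maxLength and sh:pattern on the values of a single field. It is not a SHACL engine and does not operate on RDF graphs; the function name `check_field` and the dict-based constraint format are hypothetical.

```python
import re
from typing import Any, Dict, List


def check_field(values: List[str], constraints: Dict[str, Any]) -> List[str]:
    """Return the names of the violated constraints for one field's values."""
    violations = []
    if "minCount" in constraints and len(values) < constraints["minCount"]:
        violations.append("minCount")
    if "maxCount" in constraints and len(values) > constraints["maxCount"]:
        violations.append("maxCount")
    if "minLength" in constraints and any(len(v) < constraints["minLength"] for v in values):
        violations.append("minLength")
    if "maxLength" in constraints and any(len(v) > constraints["maxLength"] for v in values):
        violations.append("maxLength")
    if "pattern" in constraints:
        # sh:pattern matches anywhere in the value, so use search(), not fullmatch().
        regex = re.compile(constraints["pattern"])
        if any(regex.search(v) is None for v in values):
            violations.append("pattern")
    return violations


# A "very short description" rule expressed as a minimum length (threshold assumed).
short_description = check_field(["empty"], {"minCount": 1, "minLength": 10})
```

Several entries of the problem catalog map onto such constraints, for example a very short description onto a minimum length and a missing field onto a minimum count.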
Field frequency / main
Field frequency per collections / all
Field frequency per collections / >0%
Field frequency per collections / =100%
Field cardinality – overview
Field cardinality – histogram
Field cardinality – an outlier
Multilinguality
@ = language notation in RDF
resource notation
no language
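
One way to count these three categories is sketched below in Python. The input format is an assumption: each value is a dict that carries either a `resource` key (a URI reference) or a `value` key with an optional `lang` key holding the RDF language tag. Neither this format nor the function name comes from the project.

```python
from collections import Counter
from typing import Dict, Iterable


def language_frequency(values: Iterable[Dict[str, str]]) -> Counter:
    """Count values per language tag, plus 'resource' and 'no language' buckets."""
    counts = Counter()
    for value in values:
        if "resource" in value:
            counts["resource"] += 1
        elif value.get("lang"):
            counts["@" + value["lang"]] += 1
        else:
            counts["no language"] += 1
    return counts


# Hypothetical values of one field collected from several records.
values = [
    {"value": "Mona Lisa", "lang": "en"},
    {"value": "La Joconde", "lang": "fr"},
    {"value": "Gioconda"},
    {"resource": "http://example.org/concept/1"},
    {"value": "Portrait", "lang": "en"},
]
freq = language_frequency(values)
```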
Language frequency / barchart
Language frequency / barchart
Language frequency / Treemap
Language frequency / Treemap with resources
Language frequency / Treemap + interaction + table
Entropy – term uniqueness / main
Entropy – term uniqueness / collection
Entropy – term uniqueness / field value
Entropy – term uniqueness / terms
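
The slides do not state the formula behind these scores. As one possible reading, the Python sketch below computes the Shannon entropy of the term distribution within a single field value; both the choice of Shannon entropy and the whitespace tokenisation are assumptions, and the project may calculate uniqueness differently.

```python
import math
from collections import Counter


def term_entropy(text: str) -> float:
    """Shannon entropy (in bits) of the whitespace-separated terms of a value."""
    terms = text.casefold().split()
    if not terms:
        return 0.0
    counts = Counter(terms)
    total = len(terms)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct term.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this reading a value that repeats one term has zero entropy, and the score grows as the terms become more varied.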
Problem catalog – Long subject
Problem catalog – Long subject – example (not so long...)
Conclusion: we have to refine the definition of "long"
Problem catalog – same title and description
Problem catalog – same title and description – example
Record view – functionality matrix
Other elements of the record view
Further steps
• Building completeness measurements into Europeana’s ingestion tool
• Including usage statistics (log files, Google Analytics API)
• Human evaluation of metadata quality
• Measuring timeliness (changes of scores over time)
• Machine learning:
  • Classification/clustering of records
• Statistical relevancy of measurements
• Göttingen use case: proposed SUB project "Shared Print Study"
• Göttingen use case: incorporating into a research data management tool
• Cooperation with other projects
Architectural overview
[Diagram]
Components: OAI-PMH client (PHP); Hadoop File System; Apache Spark (Java); analysis with Spark (Scala); analysis with R; web interface (PHP, d3.js); Apache Solr; Apache Cassandra
Data exchanged between components: JSON files, CSV files, image files
Legend: recent workflow, planned workflow
Articles, reports, presentations
Follow me
• Project plan and blog: http://pkiraly.github.io
• Site: http://144.76.218.178/europeana-qa/
• Software development:
  • https://github.com/pkiraly/europeana-qa-spark: Europeana Metadata Quality Assurance Toolkit
  • https://github.com/pkiraly/europeana-qa-r: Europeana Metadata Quality Assurance Toolkit
• @kiru, https://www.linkedin.com/in/peterkiraly

More Related Content

What's hot

FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
Tom Plasterer
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Tom Plasterer
 
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
Kerstin Forsberg
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
Barry Hardy
 
Metadata mapping
Metadata mappingMetadata mapping
Metadata mappingVlad Vega
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Artificial Intelligence Institute at UofSC
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Tom Plasterer
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Tom Plasterer
 
OSFair2017 Training | FAIR metrics - Starring your data sets
OSFair2017 Training | FAIR metrics - Starring your data setsOSFair2017 Training | FAIR metrics - Starring your data sets
OSFair2017 Training | FAIR metrics - Starring your data sets
Open Science Fair
 
Dev days 2017 questionnaires (brian postlethwaite)
Dev days 2017 questionnaires (brian postlethwaite)Dev days 2017 questionnaires (brian postlethwaite)
Dev days 2017 questionnaires (brian postlethwaite)
DevDays
 
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
Nandana Mihindukulasooriya
 
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library GenerationMegan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
Consortium for the Barcode of Life (CBOL)
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
andrea huang
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
Lucy McKenna
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval GESIS
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
Dr. Haxel Consult
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
inscit2006
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
andrea huang
 

What's hot (20)

FAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to PracticeFAIR Data Knowledge Graphs–from Theory to Practice
FAIR Data Knowledge Graphs–from Theory to Practice
 
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)
 
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
MIE2014: A Framework for Evaluating and Utilizing Medical Terminology Mappings
 
OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...OpenTox - an open community and framework supporting predictive toxicology an...
OpenTox - an open community and framework supporting predictive toxicology an...
 
Metadata mapping
Metadata mappingMetadata mapping
Metadata mapping
 
Metadata crosswalks
Metadata crosswalksMetadata crosswalks
Metadata crosswalks
 
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...Semantics-enhanced Cyberinfrastructure for ICMSE :  Interoperability, Analyti...
Semantics-enhanced Cyberinfrastructure for ICMSE : Interoperability, Analyti...
 
A Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD MarketA Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD Market
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
 
OSFair2017 Training | FAIR metrics - Starring your data sets
OSFair2017 Training | FAIR metrics - Starring your data setsOSFair2017 Training | FAIR metrics - Starring your data sets
OSFair2017 Training | FAIR metrics - Starring your data sets
 
Dev days 2017 questionnaires (brian postlethwaite)
Dev days 2017 questionnaires (brian postlethwaite)Dev days 2017 questionnaires (brian postlethwaite)
Dev days 2017 questionnaires (brian postlethwaite)
 
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
A Framework for Linked Data Quality based on Data Profiling and RDF Shape Ind...
 
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library GenerationMegan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
Megan Milton & Mallory Van Wyngaarden - Managing Barcode Data Library Generation
 
How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?How to clean data less through Linked (Open Data) approach?
How to clean data less through Linked (Open Data) approach?
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...
 
Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval Intra- and interdisciplinary cross-concordances for information retrieval
Intra- and interdisciplinary cross-concordances for information retrieval
 
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
AI-SDV 2020: Combining Knowledge and Machine Learning for the Analysis of Sci...
 
Providing Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case studyProviding Tools for Author Evaluation - A case study
Providing Tools for Author Evaluation - A case study
 
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives TaiwanA Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan
 

Viewers also liked

Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
NORD DDB RIGA
 
A Saga about the Witcher by group2
A Saga about the Witcher by group2A Saga about the Witcher by group2
A Saga about the Witcher by group2Erasmus+
 
From local to global: Romanian cultural values in Europeana through Locloud
From local to global: Romanian cultural values in Europeana through LocloudFrom local to global: Romanian cultural values in Europeana through Locloud
From local to global: Romanian cultural values in Europeana through Locloud
locloud
 
IES GALLICUM
IES GALLICUMIES GALLICUM
Chris hess 2012
Chris hess 2012Chris hess 2012
Chris hess 2012Hesscj
 
007 V領 龐德背心
007 V領 龐德背心007 V領 龐德背心
007 V領 龐德背心
sisy
 
KB domeinaggregator voor publicaties naar DigitaleCollectie.nl
KB domeinaggregator voor publicaties naar DigitaleCollectie.nlKB domeinaggregator voor publicaties naar DigitaleCollectie.nl
KB domeinaggregator voor publicaties naar DigitaleCollectie.nl
Elco van Staveren
 
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
B2B Marketing
 
Small, smaller and smallest: working with small archaeological content provid...
Small, smaller and smallest: working with small archaeological content provid...Small, smaller and smallest: working with small archaeological content provid...
Small, smaller and smallest: working with small archaeological content provid...
locloud
 
Antenna prevention team 4
Antenna prevention team 4Antenna prevention team 4
Antenna prevention team 4Erasmus+
 
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
NORD DDB RIGA
 
iAnnotate 2016 - Demo Pundit web annotator
iAnnotate 2016 - Demo Pundit web annotatoriAnnotate 2016 - Demo Pundit web annotator
iAnnotate 2016 - Demo Pundit web annotator
Net7
 
Europeana Network Association AGM 2016 - 8 November - Federico Milani.
Europeana Network Association AGM 2016 - 8 November - Federico Milani.Europeana Network Association AGM 2016 - 8 November - Federico Milani.
Europeana Network Association AGM 2016 - 8 November - Federico Milani.
Europeana
 
The Lord of the Rings
The Lord of the Rings  The Lord of the Rings
The Lord of the Rings
Fahad Saleem
 
Larutan Elektrolit dan Nonelektrolit
Larutan Elektrolit dan NonelektrolitLarutan Elektrolit dan Nonelektrolit
Larutan Elektrolit dan Nonelektrolit
Puswita Septia Usman
 
I Twi
I TwiI Twi
I Twifeeds
 

Viewers also liked (17)

Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
Čempionu Brokastis #23 / Edgars Lapiņš / "Autentisks mārketings kritiski domā...
 
A Saga about the Witcher by group2
A Saga about the Witcher by group2A Saga about the Witcher by group2
A Saga about the Witcher by group2
 
From local to global: Romanian cultural values in Europeana through Locloud
From local to global: Romanian cultural values in Europeana through LocloudFrom local to global: Romanian cultural values in Europeana through Locloud
From local to global: Romanian cultural values in Europeana through Locloud
 
IES GALLICUM
IES GALLICUMIES GALLICUM
IES GALLICUM
 
Chris hess 2012
Chris hess 2012Chris hess 2012
Chris hess 2012
 
007 V領 龐德背心
007 V領 龐德背心007 V領 龐德背心
007 V領 龐德背心
 
KB domeinaggregator voor publicaties naar DigitaleCollectie.nl
KB domeinaggregator voor publicaties naar DigitaleCollectie.nlKB domeinaggregator voor publicaties naar DigitaleCollectie.nl
KB domeinaggregator voor publicaties naar DigitaleCollectie.nl
 
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
BEST PRACTICE: Value is the key to opening more doors – Dramatically enhance ...
 
Small, smaller and smallest: working with small archaeological content provid...
Small, smaller and smallest: working with small archaeological content provid...Small, smaller and smallest: working with small archaeological content provid...
Small, smaller and smallest: working with small archaeological content provid...
 
Antenna prevention team 4
Antenna prevention team 4Antenna prevention team 4
Antenna prevention team 4
 
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
Čempionu Brokastis #22 / Tatjana Baranovska / "Spriedzes mārketings"
 
iAnnotate 2016 - Demo Pundit web annotator
iAnnotate 2016 - Demo Pundit web annotatoriAnnotate 2016 - Demo Pundit web annotator
iAnnotate 2016 - Demo Pundit web annotator
 
Europeana Network Association AGM 2016 - 8 November - Federico Milani.
Europeana Network Association AGM 2016 - 8 November - Federico Milani.Europeana Network Association AGM 2016 - 8 November - Federico Milani.
Europeana Network Association AGM 2016 - 8 November - Federico Milani.
 
The Lord of the Rings
The Lord of the Rings  The Lord of the Rings
The Lord of the Rings
 
Larutan Elektrolit dan Nonelektrolit
Larutan Elektrolit dan NonelektrolitLarutan Elektrolit dan Nonelektrolit
Larutan Elektrolit dan Nonelektrolit
 
Numbers
NumbersNumbers
Numbers
 
I Twi
I TwiI Twi
I Twi
 

Similar to Metadata Quality Assurance Part II. The implementation begins

Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
Marcia Zeng
 
A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
Michel Dumontier
 
Metadata: Digital Humanties
Metadata: Digital HumantiesMetadata: Digital Humanties
Metadata: Digital Humanties
Matthew Miguez
 
20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf
Miel Vander Sande
 
Knowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents EnvironmentKnowledge Discovery in an Agents Environment
Knowledge Discovery in an Agents Environment
ManjulaPatel
 
Sensor metadata management with SWM (SMWCon fall 2013)
Sensor metadata management with SWM (SMWCon fall 2013)Sensor metadata management with SWM (SMWCon fall 2013)
Sensor metadata management with SWM (SMWCon fall 2013)
jwnoteboom
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
Carole Goble
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the Haystack
Adrian Stevenson
 
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
Chachra, "Improving Discovery Systems Through Post Processing of Harvested Data"
National Information Standards Organization (NISO)
 
Making Inter-operability Visible
Making Inter-operability VisibleMaking Inter-operability Visible
Making Inter-operability Visibleliddy
 
DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
dclsocialmedia
 
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...
DITA, Semantics, Content Management, Dynamic Documents, and Linked Data – A M...
Paul Wlodarczyk
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
Andreas Schreiber
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
Andreas Schreiber
 
Kuliman "Content Profiles & linked documents"
Kuliman "Content Profiles & linked documents"Kuliman "Content Profiles & linked documents"
Kuliman "Content Profiles & linked documents"
National Information Standards Organization (NISO)
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
Carole Goble
 
Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)
Globus
 
Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1Scalable and privacy-preserving data integration - part 1
Scalable and privacy-preserving data integration - part 1
ErhardRahm
 
Dynamic Object-Oriented Requirements System (DOORS)
Dynamic Object-Oriented Requirements System (DOORS)Dynamic Object-Oriented Requirements System (DOORS)
Dynamic Object-Oriented Requirements System (DOORS)David Groff
 

Similar to Metadata Quality Assurance Part II. The implementation begins (20)

Dublin Core In Practice
Dublin Core In PracticeDublin Core In Practice
Dublin Core In Practice
 
A metadata standard for Knowledge Graphs
A metadata standard for Knowledge GraphsA metadata standard for Knowledge Graphs
A metadata standard for Knowledge Graphs
 
Metadata: Digital Humanties
Metadata: Digital HumantiesMetadata: Digital Humanties
Metadata: Digital Humanties
 
20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf20230525_mmc_seminar.pdf
20230525_mmc_seminar.pdf
 
NISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector AdministrationNISO/DCMI Webinar: Metadata for Public Sector Administration
NISO/DCMI Webinar: Metadata for Public Sector Administration
 
Metadata Quality Assurance Part II. The implementation begins

  • 1. Metadata Quality Assurance Framework
       Part II. – The implementation begins
       Péter Király, peter.kiraly@gwdg.de
       Göttingen, Geiststraße 10, GWDG meeting room, 20/05/2016
       Oberseminar Datenmanagement, Cloud und e-Infrastructure
       Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
  • 2. Why is data quality important?
       "Fitness for purpose"
       no metadata → no access to data → no data usage
       More explanation: Data on the Web Best Practices, W3C Working Draft, 17 December 2015, http://www.w3.org/TR/2015/WD-dwbp-20151217/
  • 3. What is it good for?
       • Improve the metadata
       • Improve the metadata schema and its documentation
       • Propagate "good practice"
       • Improve services: "good" data is ranked higher in the search result list
       Specifically for GWDG:
       • Could be built into current and planned data management / data archiving tools
  • 4. Project principles
       • Full transparency
       • Open source, open data (CC0)
       • Minimum viable product
       • "Release early. Release often. And listen to your customers" (Eric S. Raymond)
       • "Eat your own dog food"
       • Getting Real: https://gettingreal.37signals.com/
  • 5. Measurements
       • Schema-independent structural features: existence, cardinality, uniqueness
       • Use case scenarios ("fit for purpose"): requirements of the most important functions
       • Problem catalog: known metadata problems
  • 6. Europeana Data Quality Committee
       • Online collaboration
            • Use case documents
            • Problem catalog
            • Tickets
            • Discussion forum
            • #EuropeanaDataQuality
       • Bi-weekly teleconf
       • Bi-yearly face-to-face meeting
       • Topics
            • Usage scenarios
            • Metadata profiles
            • Schema modification
            • Measuring
            • Event model
  • 7. Discovery scenarios and their metadata requirements
       1. Basic retrieval with high precision and recall
       2. Cross-language recall
       3. Entity-based facets
       4. Date-based facets
       5. Improved language facets
       6. Browse by subjects and resource types
       7. Browse by agents
       8. Browse/Search by Event
       9. Entity-based knowledge cards and pages
       10. Categorised similar items
       11. Spatial search, browse, and map display
       12. Entity-based autocompletion
       13. Diversification of results
       14. Hierarchical search and facets
       Credit: the document was initiated by Tim Hill, Europeana's search engineer
  • 8. Discovery scenarios and their metadata requirements – 3. Entity-based facets
       Scenario: As a user, ... I want to be able to filter by whether a person is the subject of a book, or its author, engraver, printer etc.
       Metadata analysis: In each case the underlying requirement is that the relevant EDM fields for objects be populated by identifying URIs rather than free text. These URIs need to be related, at a minimum, to a label for each of the supported languages.
       Measurement rules:
       • The relevant field values should be resolvable URIs
       • Each URI should have labels in multiple languages
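The two measurement rules for entity-based facets could be approximated as follows. This is a minimal sketch, not the project's implementation: the function names, the `min_languages` threshold and the example.org data are assumptions made for illustration, and the URI check is purely syntactic (it does not dereference the URI, so it cannot prove that the URI is resolvable).

```python
from urllib.parse import urlparse


def looks_like_uri(value):
    """Syntactic check only: an http(s) URI with a host part.

    It does NOT fetch the URI, so "resolvable" is not verified here.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def label_language_count(labels):
    """Count the languages that carry a non-empty label.

    `labels` maps a language code to a label (hypothetical structure).
    """
    return sum(1 for text in labels.values() if text.strip())


def check_entity_field(value, labels, min_languages=2):
    """Apply both rules; min_languages=2 is an assumed threshold."""
    return {
        "is_uri": looks_like_uri(value),
        "multilingual": label_language_count(labels) >= min_languages,
    }


# Hypothetical example data
result = check_entity_field(
    "http://example.org/agent/1",
    {"en": "Example Person", "de": "Beispielperson"},
)
print(result)
```

A free-text value such as a bare personal name fails the first rule, which is the case the scenario wants to detect.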
  • 9. Discovery scenarios and their metadata requirements – 4. Date-based facets
       Scenario: I want to be able to filter my results by a variety of timespans, e.g.:
       • Date of creation
       • Date of publication
       • Date as subject
       Metadata analysis: Dates should be fully and consistently normalised to follow the XSD date-time data types. Dates expressed in styles like "490 avant J.C" that are inherently language dependent should be avoided, as they are very difficult to normalise (e.g. this should be represented as "-0490"^^xsd:gYear).
       Measurement rules:
       • Field values should conform to the XSD date-time data types
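The date rule can be illustrated with a small lexical check. This is a sketch under stated assumptions: the regular expressions below are simplified approximations of the lexical forms of `xsd:gYear`, `xsd:gYearMonth`, `xsd:date` and `xsd:dateTime`, they do not verify calendar validity (a 30th of February would pass), and the function name `xsd_date_type` is hypothetical.

```python
import re

# Simplified building blocks (assumptions, not the full XSD grammar)
_TZ = r"(Z|[+-]\d{2}:\d{2})?"          # optional time zone
_YEAR = r"-?\d{4,}"                    # optional sign, at least four digits
_MONTH = r"(0[1-9]|1[0-2])"
_DAY = r"(0[1-9]|[12]\d|3[01])"

XSD_PATTERNS = {
    "xsd:gYear": re.compile(_YEAR + _TZ),
    "xsd:gYearMonth": re.compile(_YEAR + "-" + _MONTH + _TZ),
    "xsd:date": re.compile(_YEAR + "-" + _MONTH + "-" + _DAY + _TZ),
    "xsd:dateTime": re.compile(
        _YEAR + "-" + _MONTH + "-" + _DAY
        + r"T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ
    ),
}


def xsd_date_type(value):
    """Return the first matching XSD type name, or None if nothing matches."""
    for name, pattern in XSD_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


for sample in ["-0490", "490 avant J.C", "2016-05-20"]:
    print(sample, "->", xsd_date_type(sample))
```

The language-dependent form from the slide matches none of the patterns, while its normalised counterpart is recognised as a year.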
  • 10. Problem catalog
       • Title contents same as description contents
       • Systematic use of the same title
       • Bad string: "empty" (and variants)
       • Shelfmarks and other identifiers in fields
       • Creator not an agent name
       • Absurd geographical location
       • Subject field used as description field
       • Unicode U+FFFD (�)
       • Very short description field
       Credit: the document was initiated by Tim Hill, Europeana's search engineer
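Three of the catalogued problems lend themselves to simple string checks: the "empty" bad string, the Unicode replacement character U+FFFD, and a very short description. The sketch below is illustrative only. The record layout (field name mapped to a list of values), the list of "empty" variants and the length threshold of 10 characters are all assumptions, since the slides do not define them.

```python
REPLACEMENT_CHAR = "\ufffd"
# Assumed variants of the bad string; the real catalog may list others
BAD_STRINGS = {"empty", "(empty)", "[empty]"}


def find_problems(record, min_description_length=10):
    """Return (field, problem) pairs for a record.

    `record` maps a field name to a list of string values (hypothetical
    layout). min_description_length=10 is an assumed threshold.
    """
    problems = []
    for field, values in record.items():
        for value in values:
            if value.strip().lower() in BAD_STRINGS:
                problems.append((field, "bad string"))
            if REPLACEMENT_CHAR in value:
                problems.append((field, "U+FFFD"))
    for value in record.get("description", []):
        if len(value.strip()) < min_description_length:
            problems.append(("description", "very short"))
    return problems


# Hypothetical record
record = {
    "title": ["Empty"],
    "creator": ["M\ufffdller"],
    "description": ["abc"],
}
print(find_problems(record))
```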
  • 11. Problem catalog
       Description: Title contents same as description contents
       Example: /2023702/35D943DF60D779EC9EF31F5DF...
       Motivation: Distorts search weightings
       Checking method: Field comparison
       Notes: Record display: creator concatenated onto title
       Metadata scenario: Basic Retrieval
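The "field comparison" checking method for this catalog entry might look like the following. It is a minimal sketch that assumes a record is a mapping from field names to lists of strings and that values should be compared after case folding and whitespace normalisation; neither assumption comes from the slides.

```python
def normalize(text):
    """Case-fold and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.casefold().split())


def title_equals_description(record):
    """True if any non-empty title value equals any description value."""
    titles = {normalize(v) for v in record.get("title", [])}
    descriptions = {normalize(v) for v in record.get("description", [])}
    titles.discard("")          # empty strings are a separate problem
    descriptions.discard("")
    return bool(titles & descriptions)


# Hypothetical records
same = {"title": ["View of  the Harbour"], "description": ["view of the harbour"]}
different = {"title": ["View of the Harbour"], "description": ["Oil on canvas, 1870"]}
print(title_equals_description(same), title_equals_description(different))
```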
  • 12. Problem catalog – proposed basis of implementation
       Shapes Constraint Language (SHACL), https://www.w3.org/TR/shacl/
       SHACL (Shapes Constraint Language) is a language for describing and constraining the contents of RDF graphs. SHACL groups these descriptions and constraints into "shapes", which specify conditions that apply at a given RDF node. Shapes provide a high-level vocabulary to identify predicates and their associated cardinalities, datatypes and other constraints.
       • sh:equals, sh:notEquals
       • sh:hasValue
       • sh:in
       • sh:lessThan, sh:lessThanOrEquals
       • sh:minCount, sh:maxCount
       • sh:minLength, sh:maxLength
       • sh:pattern
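To show what such constraints express, the sketch below mimics a few of them (`minCount`, `maxCount`, `minLength`, `maxLength`, `pattern`, `in`) in plain Python over a dictionary record. It is not a SHACL engine and does not operate on RDF graphs; the shape layout, the function names and the sample record are hypothetical and only borrow the constraint names from the list above.

```python
import re


def check_field(values, constraint):
    """Return (constraint_name, offending_value) pairs for one field."""
    violations = []
    if len(values) < constraint.get("minCount", 0):
        violations.append(("minCount", None))
    if "maxCount" in constraint and len(values) > constraint["maxCount"]:
        violations.append(("maxCount", None))
    for value in values:
        if len(value) < constraint.get("minLength", 0):
            violations.append(("minLength", value))
        if "maxLength" in constraint and len(value) > constraint["maxLength"]:
            violations.append(("maxLength", value))
        # Like sh:pattern, the regular expression may match anywhere
        if "pattern" in constraint and not re.search(constraint["pattern"], value):
            violations.append(("pattern", value))
        if "in" in constraint and value not in constraint["in"]:
            violations.append(("in", value))
    return violations


def validate(record, shape):
    """Check every field named in the shape; keep only fields with violations."""
    report = {}
    for field, constraint in shape.items():
        violations = check_field(record.get(field, []), constraint)
        if violations:
            report[field] = violations
    return report


# Hypothetical shape and record (illustrative values only)
shape = {
    "title": {"minCount": 1, "minLength": 3},
    "type": {"maxCount": 1, "in": {"TEXT", "IMAGE"}},
    "date": {"pattern": r"^-?\d{4}$"},
}
record = {"title": ["ab"], "type": ["TEXT", "MAP"], "date": ["-0490"]}
print(validate(record, shape))
```

Keeping the constraints as data rather than code is the point of the shape approach: new checks can be added to the problem catalog without touching the validator.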
  • 13. Field frequency / main
  • 14. Field frequency per collections / all
  • 15. Field frequency per collections / >0%
  • 16. Field frequency per collections / =100%
  • 17. Field cardinality – overview
  • 18. Field cardinality – histogram
  • 19. Field cardinality – an outlier
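Field frequency and field cardinality are simple to compute once records are available as field-to-values mappings. The following sketch assumes that layout and defines frequency as the share of records in which a field has at least one value, and mean cardinality as the average number of values among the records that have the field. These definitions and the function name are assumptions for illustration; the slides show only the resulting charts.

```python
from collections import Counter


def field_statistics(records):
    """Compute per-field frequency and mean cardinality.

    frequency        = records containing the field / all records
    mean_cardinality = total values of the field / records containing it
    """
    total = len(records)
    present = Counter()    # number of records where the field occurs
    instances = Counter()  # total number of values of the field
    for record in records:
        for field, values in record.items():
            if values:
                present[field] += 1
                instances[field] += len(values)
    return {
        field: {
            "frequency": present[field] / total,
            "mean_cardinality": instances[field] / present[field],
        }
        for field in present
    }


# Hypothetical records
records = [
    {"title": ["a"], "subject": ["x", "y", "z"]},
    {"title": ["b"], "subject": []},
]
print(field_statistics(records))
```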
  • 20. Multilinguality: @ = language notation in RDF; resource notation; no language
  • 21. Language frequency / barchart
  • 22. Language frequency / barchart
  • 23. Language frequency / Treemap
  • 24. Language frequency / Treemap with resources
  • 25. Language frequency / Treemap + interaction + table
  • 26. Entropy – term uniqueness / main
  • 27. Entropy – term uniqueness / collection
  • 28. Entropy – term uniqueness / field value
  • 29. Entropy – term uniqueness / terms
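The slides do not state how entropy is calculated. As one possible reading, the sketch below computes the Shannon entropy of the term distribution within a single field value, using lowercased whitespace tokens. Both the formula choice and the tokenisation are assumptions, and the project may measure term uniqueness differently.

```python
import math
from collections import Counter


def term_entropy(text):
    """Shannon entropy (in bits) of the term distribution of a field value.

    H = -sum(p * log2(p)) over the relative frequency p of each term.
    A value that repeats one term has entropy 0; more varied values score higher.
    """
    terms = text.lower().split()
    if not terms:
        return 0.0
    counts = Counter(terms)
    total = len(terms)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


for sample in ["photo photo photo photo", "portrait of a woman"]:
    print(repr(sample), term_entropy(sample))
```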
  • 30. Problem catalog – Long subject
  • 31. Problem catalog – Long subject – example (not so long...). Conclusion: we have to refine the definition of "long"
  • 32. Problem catalog – same title and description
  • 33. Problem catalog – same title and description – example
  • 34. Record view – functionality matrix
  • 35. Other elements of the record view
  • 36. Further steps
       • Building completeness measurements into Europeana's ingestion tool
       • Including usage statistics (log files, Google Analytics API)
       • Human evaluation of metadata quality
       • Measuring timeliness (changes of scores over time)
       • Machine learning:
            • Classification/Clustering of records
            • Statistical relevancy of measurements
       • Göttingen use case: proposed SUB project "Shared Print Study"
       • Göttingen use case: incorporating into research data management tool
       • Cooperation with other projects
  • 37. Architectural overview (diagram)
       Components: OAI-PMH client (PHP), Apache Spark (Java), Analysis with Spark (Scala), Analysis with R, Web interface (PHP, d3.js)
       Storage and exchange formats: Hadoop File System, Apache Solr, Apache Cassandra, JSON files, CSV files, image files
       Legend: recent workflow, planned workflow
  • 38. Articles, reports, presentations
  • 39. Follow me
       • Project plan and blog: http://pkiraly.github.io
       • Site: http://144.76.218.178/europeana-qa/
       • Software development:
            • https://github.com/pkiraly/europeana-qa-spark: Europeana Metadata Quality Assurance Toolkit
            • https://github.com/pkiraly/europeana-qa-r: Europeana Metadata Quality Assurance Toolkit
       • @kiru, https://www.linkedin.com/in/peterkiraly