Evaluating Data Quality in Europeana:
Metrics for Multilinguality
Péter Király1, Juliane Stiller2, Valentine Charles3, Werner Bailer4, Nuno Freire5
1 Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
2 Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
3 Europeana Foundation, The Hague
4 Joanneum Research Forschungsgesellschaft mbH, Graz
5 INESC-ID, Lisbon
MTSR 2018 - Track on Cultural Collections and Applications, Limassol, Oct. 24, 2018
Nummertjes by Fabio (CC BY-NC 2.0)
Agenda
1. Europeana
2. Multilingual Information in Europeana’s Metadata
3. Multilinguality as a Facet of Quality Dimensions
4. Results
5. Demo
Europeana - Platform for
Cultural Heritage Material
www.europeana.eu
Europeana - Facts
○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc.
○ Text, images, video, audio, sounds, 3D
○ Over 58 million objects
○ > 50 languages
Multilingual Information in
Europeana’s Metadata
English cultural heritage object:
<dc:language>en</dc:language>
German metadata
Multilinguality on Field Level
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix edm: <http://www.europeana.eu/schemas/edm/> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Literal, literal with language tag
<#record> a ore:Proxy ;
    dc:subject "Ballet", "Opera"@en .

# Europeana dereferencing
<#record> a ore:Proxy ; edm:europeanaProxy true ;
    dc:subject <http://data.europeana.eu/concept/base/264> .
<http://data.europeana.eu/concept/base/264> a skos:Concept ;
    skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru,
        "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
Processes Contributing to Multilinguality
Data from Provider:
○ dc:subject "subject"@en
○ dc:creator <http://vocab.getty.edu/...>
○ dc:type <http://voc.example./…>
○ dc:subject <http://dbpedia.org/aSubjectID>
○ dc:subject "Subject"
Data added by Europeana (dereferencing step):
○ dc:creator: new labels in different languages
○ dc:subject: new labels in different languages
Quantifiable: "term"@language annotation
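The dereferencing step can be sketched in Python as follows. This is only an illustration under stated assumptions, not Europeana's implementation: the `VOCABULARY` table and the `dereference` function are hypothetical names, and a real system would resolve URIs against the linked vocabulary instead of an in-memory dictionary.

```python
# Hypothetical stand-in for a remote vocabulary: URI -> {language: label}.
# The labels are taken from the Ballet concept shown on the earlier slide.
VOCABULARY = {
    "http://data.europeana.eu/concept/base/264": {
        "de": "Ballett",
        "pt": "Balé",
        "lt": "Baletas",
    },
}


def dereference(values, vocabulary=VOCABULARY):
    """Return (label, language) pairs for every value that is a known URI.

    Plain literals and unknown URIs contribute nothing, because only
    resolvable URIs yield new labels.
    """
    labels = []
    for value in values:
        # Sort by language code so the output order is deterministic.
        for language, label in sorted(vocabulary.get(value, {}).items()):
            labels.append((label, language))
    return labels


# Provider data: one plain literal and one URI.
provider_subjects = ["Ballet", "http://data.europeana.eu/concept/base/264"]
added = dereference(provider_subjects)
```

Each returned pair corresponds to one "term"@language annotation, which is the unit the metrics count.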
Quantify Multilinguality of Data to:
○ Establish a sense of the multilingual reach of Europeana, incl.
distribution of languages
○ Identify the impact of different workflows / processes on
multilinguality of data
○ Take measures to improve multilinguality in data
○ Devise strategies for underrepresented languages
What Could be Measured?
○ Number of (distinct) languages in the metadata
○ Number of language-tagged literals
○ Tagged literals per language
○ Existence of language information fields such as dc:language
○ Consistency and conformity of language information
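The first three counts above could be computed roughly as in the Python sketch below. The triple layout `(field, value, language)`, the function name `language_counts`, and the sample literals are assumptions made for illustration only.

```python
from collections import Counter

# Each literal is a (field, value, language) triple; language is None when
# the literal carries no language tag. The sample data is illustrative only.
literals = [
    ("dc:subject", "Ballet", None),
    ("dc:subject", "Opera", "en"),
    ("skos:prefLabel", "Ballett", "de"),
    ("skos:prefLabel", "Balé", "pt"),
    ("skos:prefLabel", "Ballet", "en"),
]


def language_counts(literals):
    """Count distinct languages, tagged literals, and tagged literals per language."""
    per_language = Counter(lang for _, _, lang in literals if lang is not None)
    return {
        "distinct_languages": len(per_language),
        "tagged_literals": sum(per_language.values()),
        "tagged_literals_per_language": dict(per_language),
    }


stats = language_counts(literals)
```

Untagged literals are ignored here on purpose: their language cannot be quantified without language detection.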
Multilinguality as a Facet of
Quality Dimensions
Completeness
○ This dimension:
    ○ expresses the number (fraction) of fields present in a dataset
    ○ identifies non-empty values in a record or (sub-)collection.
○ Multilingual completeness is captured by:
    ○ Presence of value in dc:language
    ○ Share of fields with language tags to overall available fields
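A minimal Python sketch of the two completeness measures follows. It assumes a record is a dictionary mapping field names to lists of `(value, language)` pairs, and it reads "available fields" as fields with at least one non-empty value; both the layout and the function name `multilingual_completeness` are hypothetical.

```python
def multilingual_completeness(record):
    """Return (has_dc_language, share_of_tagged_fields) for one record.

    `record` maps a field name to a list of (value, language) pairs, where
    language is None for untagged values.
    """
    # A field counts as available if it has at least one non-empty value.
    available = {
        field: pairs
        for field, pairs in record.items()
        if any(value for value, _ in pairs)
    }
    # A field counts as tagged if any non-empty value carries a language tag.
    tagged = [
        field
        for field, pairs in available.items()
        if any(value and language for value, language in pairs)
    ]
    has_dc_language = "dc:language" in available
    share = len(tagged) / len(available) if available else 0.0
    return has_dc_language, share


# Illustrative record: dc:creator is present but empty, so it is not counted.
record = {
    "dc:title": [("Schwanensee", "de")],
    "dc:subject": [("Ballet", None), ("Opera", "en")],
    "dc:creator": [("", None)],
    "dc:language": [("de", None)],
}
has_language, tagged_share = multilingual_completeness(record)
```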
Consistency
○ Describes the logical coherence of metadata
○ Assesses variety of language values in the dc:language field:
how many distinct values?
○ Contributes to features like language-based facet
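Counting the distinct notations in dc:language might look like the Python sketch below. The record layout (a dictionary with a list of strings under `"dc:language"`) and the function name `distinct_language_notations` are assumptions for illustration.

```python
from collections import Counter


def distinct_language_notations(records):
    """Count how often each notation appears in dc:language across records."""
    counts = Counter()
    for record in records:
        for value in record.get("dc:language", []):
            # Surrounding whitespace is ignored; case is kept, because
            # "ENG" and "eng" are different notations.
            counts[value.strip()] += 1
    return counts


# Illustrative records; the last one has no dc:language at all.
records = [
    {"dc:language": ["en"]},
    {"dc:language": ["English", "en"]},
    {"dc:language": ["ENG"]},
    {},
]
notations = distinct_language_notations(records)
```

The number of keys in `notations` is the consistency measure: the fewer distinct notations, the easier it is to build a language facet.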
Conformity
○ Describes conformity to a given standard such as ISO 639-2
○ Example: English is expressed as: English, ENG, en, en-uk, …
○ Share of values that comply or do not comply
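The share of compliant values can be sketched in Python as below. The code set is a deliberately tiny subset of ISO 639-2 chosen for the example (a real check would load the full code list), and the comparison is strict, so upper-case or two-letter variants count as non-compliant. The names `ISO_639_2_SUBSET` and `conformity_share` are hypothetical.

```python
# Small illustrative subset of ISO 639-2 codes (English, German, French,
# including both bibliographic and terminological variants). Not the full list.
ISO_639_2_SUBSET = frozenset({"eng", "ger", "deu", "fre", "fra"})


def conformity_share(values, valid_codes=ISO_639_2_SUBSET):
    """Return the fraction of values that are exactly a valid code."""
    if not values:
        return 0.0
    compliant = sum(1 for value in values if value in valid_codes)
    return compliant / len(values)


# The notations for English from the slide, plus the compliant form "eng".
share = conformity_share(["English", "ENG", "en", "en-uk", "eng"])
```

Whether to normalise case before comparing is a design choice; a strict comparison reports what is actually in the data.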
Accessibility
○ Access to information and data across languages
○ Distribution of linguistic information in metadata
○ Quantifying the language tag
○ The more language tags, the higher the multilingual reach
Dimensions, Criteria & Measures
Dimension | Criteria | Measure
Completeness | Presence or absence of values in fields relating to the language of the object or the metadata | Share of multilingual fields to overall fields; presence or absence of dc:language field
Consistency | Variance in language notation | Distinct language notations
Conformity | Compliance to ISO-639-2 | Share of values that comply
Accessibility | Accessibility across languages expressed through language tags | Number of distinct languages; Number of languages/Number of tagged literals; tagged literals per language
Results
Data processing workflow
1. ingestion (★ OAI-PMH ★ Europeana API ★ Hadoop ★ NoSQL) → json
2. measuring (★ Spark ★ Hadoop ★ Java ★ Apache Solr) → csv
3. statistical analysis (★ Spark ★ R) → json, png
4. web interface (★ PHP ★ D3.js ★ highchart.js ★ NoSQL) → html, svg
DEMO
Questions
★ Contact
valentine.charles@europeana.eu
juliane.stiller@ibi.hu-berlin.de
werner.bailer@joanneum.at
peter.kiraly@gwdg.de
nfreire@gmail.com
★ Metadata Quality Assurance Framework
http://144.76.218.178/europeana-qa
★ Europeana Data Quality Committee
https://pro.europeana.eu/project/data-quality-committee
Discussion & Future Work
Discussion
○ Completeness and Consistency are fairly easy to interpret
○ Accessibility measures are harder to interpret, e.g. contextual entities for
broad concepts or common places often have more translations
than lesser-known ones
○ Applying quality dimensions is tricky, e.g. technical accessibility
vs. accessibility across languages
○ No common understanding of quality dimensions
Future Work
○ Embedding into Europeana workflow
○ Evaluation of the metrics
Metadata Multilinguality
+ 40 other languages....

Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)

  • 1. Evaluating Data Quality in Europeana: Metrics for Multilinguality. Authors: Péter Király (1), Juliane Stiller (2), Valentine Charles (3), Werner Bailer (4), Nuno Freire (5). Affiliations: (1) Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen; (2) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin; (3) Europeana Foundation, The Hague; (4) Joanneum Research Forschungsgesellschaft mbH, Graz; (5) INESC-ID, Lisbon. MTSR 2018 - Track on Cultural Collections and Applications, Limassol, Oct. 24, 2018. Image: Nummertjes by Fabio (CC BY-NC 2.0)
  • 2. Agenda 1. Europeana 2. Multilingual Information in Europeana’s Metadata 3. Multilinguality as a Facet of Quality Dimensions 4. Results 5. Demo
  • 3. Europeana - Platform for Cultural Heritage Material www.europeana.eu
  • 4. Europeana - Facts ○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc. ○ Text, images, video, audio, sounds, 3D ○ Over 58 million objects ○ More than 50 languages
  • 6. English cultural heritage object: <dc:language>en</dc:language>
  • 7. English cultural heritage object: <dc:language>en</dc:language> German metadata
  • 8. Multilinguality on Field Level ○ Literal, literal with language tag: <#record> a ore:Proxy ; dc:subject "Ballet", "Opera"@en . ○ Europeana dereferencing: <#record> a ore:Proxy ; edm:europeanaProxy true ; dc:subject <http://data.europeana.eu/concept/base/264> . <http://data.europeana.eu/concept/base/264> a skos:Concept ; skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru, "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
  • 9. Processes Contributing to Multilinguality ○ Data from provider: dc:subject "subject"@en; dc:creator <http://vocab.getty.edu/...>; dc:type <http://voc.example./…>; dc:subject <http://dbpedia.org/aSubjectID>; dc:subject "Subject" ○ Data added by Europeana (dereferencing step): new labels in different languages for dc:creator and dc:subject ○ Quantifiable: "term"@language annotation
  • 10. Quantify Multilinguality of Data to: ○ Establish a sense of the multilingual reach of Europeana, incl. distribution of languages ○ Identify the impact of different workflows / processes on multilinguality of data ○ Take measures to improve multilinguality in data ○ Devise strategies for underrepresented languages
  • 11. What Could be Measured? ○ Number of (distinct) languages in the metadata ○ Number of language-tagged literals ○ Tagged literals per language ○ Existence of language information fields such as dc:language ○ Consistency and conformity of language information
  • 12. Multilinguality as a Facet of Quality Dimensions
  • 13. Completeness ○ This dimension expresses the number (fraction) of fields present in a dataset and identifies non-empty values in a record or (sub-)collection ○ Multilingual completeness is captured by the presence of a value in dc:language and by the share of fields with language tags among all available fields
  • 14. Consistency ○ Describes the logical coherence of metadata ○ Assesses variety of language values in the dc:language field: how many distinct values? ○ Contributes to features like language-based facet
  • 15. Conformity ○ Describes the conformity to a given standard such as ISO-639-2 ○ Example: English is expressed as: English, ENG, en, en-uk, … ○ Share of values that comply or do not comply
  • 16. Accessibility ○ Access to information and data across languages ○ Distribution of linguistic information in metadata ○ Quantifying the language tag ○ The more language tags, the higher the multilingual reach
  • 17. Dimensions, Criteria & Measures
      Dimension | Criteria | Measure
      Completeness | Presence or absence of values in fields relating to the language of the object or the metadata | Share of multilingual fields to overall fields; presence or absence of dc:language field
      Consistency | Variance in language notation | Distinct language notations
      Conformity | Compliance to ISO-639-2 | Share of values that comply
      Accessibility | Accessibility across languages expressed through language tags | Number of distinct languages; number of languages / number of tagged literals; tagged literals per language
  • 20. Data processing workflow ○ Stages: ingestion, measuring, statistical analysis, web interface ○ Tools: OAI-PMH, Europeana API, Hadoop, NoSQL, Spark, Hadoop, Java, Apache Solr, Spark, R, PHP, D3.js, highchart.js, NoSQL ○ Output formats: json; csv; json, png; html, svg
  • 21. DEMO
  • 22. Questions ★ Contact: valentine.charles@europeana.eu, juliane.stiller@ibi.hu-berlin.de, werner.bailer@joanneum.at, peter.kiraly@gwdg.de, nfreire@gmail.com ★ Metadata Quality Assurance Framework: http://144.76.218.178/europeana-qa ★ Europeana Data Quality Committee: https://pro.europeana.eu/project/data-quality-committee
  • 24. Discussion ○ Completeness and Consistency are fairly easy to interpret ○ Accessibility measures are harder to interpret, e.g. contextual entities for broad concepts or well-known places often have more translations than lesser-known ones ○ Applying quality dimensions is tricky, e.g. technical accessibility vs. accessibility across languages ○ There is no common understanding of quality dimensions
  • 25. Future Work ○ Embedding into Europeana workflow ○ Evaluation of the metrics
  • 26. Metadata Multilinguality (+ 40 other languages...)
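
The measures named on slide 17 can be illustrated with a minimal Python sketch. It is not the authors' implementation (the slides name Java and Spark for the measuring step). The record shape, the function names `multilingual_measures` and `language_notation_measures`, and the small `ISO_639_2_SAMPLE` set are all assumptions made for illustration; a real conformity check would load the full ISO 639-2 code list.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical record shape: field name -> list of (value, language tag or None).
Record = Dict[str, List[Tuple[str, Optional[str]]]]

# Illustrative subset of ISO 639-2 codes only; not the full standard.
ISO_639_2_SAMPLE = {"eng", "deu", "ger", "fra", "fre", "nor", "hin", "por"}


def multilingual_measures(record: Record) -> dict:
    """Completeness and accessibility measures for a single record."""
    # Language tags of all tagged literals, across all fields.
    tagged = [lang for values in record.values() for _, lang in values if lang]
    # Fields that carry at least one language-tagged literal.
    tagged_fields = [
        field for field, values in record.items()
        if any(lang for _, lang in values)
    ]
    per_language = Counter(tagged)
    return {
        "has_dc_language": bool(record.get("dc:language")),
        "tagged_field_share": len(tagged_fields) / len(record) if record else 0.0,
        "distinct_languages": len(per_language),
        "tagged_literals": len(tagged),
        "tagged_literals_per_language": dict(per_language),
    }


def language_notation_measures(dc_language_values: List[str]) -> dict:
    """Consistency and conformity measures over dc:language values."""
    # Assumption: conformity is a strict, case-sensitive match against the code set.
    compliant = [v for v in dc_language_values if v in ISO_639_2_SAMPLE]
    total = len(dc_language_values)
    return {
        "distinct_notations": len(set(dc_language_values)),
        "conformity_share": len(compliant) / total if total else 0.0,
    }


if __name__ == "__main__":
    record: Record = {
        "dc:subject": [("Ballet", None), ("Opera", "en")],
        "dc:title": [("Schwanensee", "de"), ("Swan Lake", "en")],
        "dc:language": [("en", None)],
    }
    print(multilingual_measures(record))
    print(language_notation_measures(["English", "ENG", "en", "en-uk", "eng"]))
```

The strict, case-sensitive match is a deliberate simplification. Whether variants such as "ENG" should count as compliant is a policy decision the slides leave open.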
