Data Quality Assessment in Europeana:
Metrics for Multilinguality
Valentine Charles¹, Juliane Stiller², Péter Király³, Werner Bailer⁴, Nuno Freire⁵
¹ Europeana Foundation, The Hague
² Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
³ Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
⁴ Joanneum Research Forschungsgesellschaft mbH, Graz
⁵ INESC-ID, Lisbon
TPDL 2017 (meta)-data quality workshop, Thessaloniki, September 21, 2017
“Measuring tape” by Therese Banström (CC BY-NC 2.0)
Agenda
1. Europeana
2. Multilinguality of Metadata and Functional Requirements
3. Multilinguality as a Facet of Quality Dimensions
4. Measuring Multilingual Metadata Quality
5. First Results
6. Discussion & Future Work
Europeana, Platform for
Cultural Heritage Material
www.europeana.eu
Europeana - Facts
○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc.
○ Text, images, video, audio, sounds, 3D
○ Over 53 million objects
○ > 50 languages
http://statistics.europeana.eu/europeana
Thumbnail
Metadata
Link to Provider
Multilinguality of Metadata &
Functional Requirements
Metadata Multilinguality
+ 40 other languages ...
Multilingual Entities
Quantify Multilinguality of Data to:
○ Establish a sense of the multilingual reach of Europeana, incl.
distribution of languages
○ Identify the impact of different workflows / processes on
multilinguality of data
○ Take measures to improve multilinguality in data
○ Devise strategies for underrepresented languages
What Could be Measured?
○ Number of (distinct) languages in the metadata
○ Number of language-tagged literals
○ Tagged literals per language
○ Existence of language information fields such as dc:language
○ Consistency of language information
Requirement: language annotations / tags!
Multilingual Information
<#record> a ore:Proxy ;
  dc:subject "Ballet", "Opera"@en .
<#record> a ore:Proxy ; edm:europeanaProxy true ;
  dc:subject <http://data.europeana.eu/concept/base/264> .
<http://data.europeana.eu/concept/base/264> a skos:Concept ;
  skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru
  , "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
Europeana Enrichment
Literal, literal with language tag
Processes Contributing to Multilinguality
Data from Provider
○ dc:subject “subject”@en
○ dc:creator <http://vocab.getty.edu/aPersonNumber>
○ dc:type http://vocab.example/domain-spcific
○ dc:subject <http://dbpedia.org/aSubjectID>
○ dc:subject “Subject”
Data added by Europeana: dereferencing step
○ dc:creator: new labels in different languages
○ dc:subject: new labels in different languages
Quantifiable
Functional Requirements for
Multilingual Services
○ Cross-lingual search
○ Language-based facets
○ Entity-based facets
Multilinguality as a Facet of
Quality Dimensions
Completeness
○ Expresses the number (fraction) of fields present in a dataset
○ Identifies non-empty values in a record or (sub-)collection
○ Problem: data model with optional fields
○ Multilingual completeness:
  ○ Does the dc:language field have a value?
  ○ Share of fields with language tags relative to all available fields
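The two multilingual completeness checks can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the record representation (field name mapped to a list of `(text, language tag)` pairs), the function names and the sample record are all assumptions, and a field is counted as "tagged" if at least one of its literals carries a language tag.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: each field maps to a list of (text, language tag or None).
Literal = Tuple[str, Optional[str]]
Record = Dict[str, List[Literal]]


def has_dc_language(record: Record) -> bool:
    """Return True if dc:language is present with at least one non-empty value."""
    return any(text.strip() for text, _ in record.get("dc:language", []))


def tagged_field_share(record: Record) -> float:
    """Share of non-empty fields in which at least one literal has a language tag."""
    present = [values for values in record.values()
               if any(text.strip() for text, _ in values)]
    if not present:
        return 0.0
    tagged = [values for values in present if any(lang for _, lang in values)]
    return len(tagged) / len(present)


# Hypothetical record, not taken from the Europeana dataset.
record: Record = {
    "dc:title": [("Schwanensee", "de"), ("Swan Lake", "en")],
    "dc:subject": [("Ballet", None), ("Opera", "en")],
    "dc:creator": [("Unknown", None)],
    "dc:language": [("de", None)],
}

print(has_dc_language(record))     # True
print(tagged_field_share(record))  # 0.5 (2 of 4 fields carry a language tag)
```

Counting fields rather than individual literals is one possible reading of "share of fields with language tags"; counting literals instead would only change the two list comprehensions.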
Consistency
○ Logical coherence of metadata
○ Variety of language values in the dc:language field
Accessibility
○ Access to information and data across languages
○ Distribution of linguistic information in metadata
○ Quantifying the language tag
○ On record level, collection level, Europeana level or across fields,
e.g. how multilingual is the dc:subject field
Dimensions, Criteria & Measures
○ Completeness
  Criteria: Presence or absence of values in fields relating to the language of the object or the metadata
  Measures: Share of multilingual fields to overall fields; presence or absence of dc:language field
○ Consistency
  Criteria: Variance in language notation
  Measures: Distinct language notations
○ Accessibility
  Criteria: Accessibility across languages expressed through language tags
  Measures (in language tags): Number of distinct languages; Number of languages/Number of tagged literals; Number of tagged literals per language
Measuring Multilingual
Metadata Quality
Implementation
Source code: http://pkiraly.github.io/about/#source-codes
Data source: http://hdl.handle.net/21.11101/0000-0001-781F-7 (Europeana snapshot, December 2015)
Access to the project: http://144.76.218.178/europeana-qa/multilinguality.php?id=all
Data processing workflow
ingestion → measuring → statistical analysis → web interface
○ Ingestion (OAI-PMH, Europeana API, Hadoop, NoSQL), output: json
○ Measuring (Spark, Hadoop, Java, Apache Solr), output: csv
○ Statistical analysis (Spark, R), output: json, png
○ Web interface (PHP, D3.js, highchart.js, NoSQL), output: html, svg
Visualization
First Results
Completeness
○ 904 of 3,548 collections have no value in the dc:language field, i.e. the field is missing.
○ At record level, 58.03% of the records have a dc:language field.
○ Misuse of fields:
  ○ Collections with more than 3 instances of dc:language in their metadata
  ○ Duplication of the language tag
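A record-level check for these findings might look like the following sketch. It assumes each record has already been reduced to its list of dc:language values; the function names, issue labels and sample data are hypothetical, and the threshold of 3 simply mirrors the figure mentioned above.

```python
from collections import Counter
from typing import List


def dc_language_share(records: List[List[str]]) -> float:
    """Fraction of records with at least one non-empty dc:language value."""
    if not records:
        return 0.0
    with_value = sum(1 for values in records if any(v.strip() for v in values))
    return with_value / len(records)


def flag_misuse(values: List[str], max_instances: int = 3) -> List[str]:
    """Return issue labels for one record's dc:language values."""
    issues = []
    if len(values) > max_instances:
        issues.append("too_many_instances")
    counts = Counter(v.strip().lower() for v in values if v.strip())
    if any(n > 1 for n in counts.values()):
        issues.append("duplicate_values")
    return issues


# Hypothetical dc:language values for five records.
records = [["de"], [], ["en", "en"], ["en", "fr", "de", "it"], [""]]

print(dc_language_share(records))         # 0.6
print([flag_misuse(r) for r in records])
```

Aggregating the flags per collection would then show which collections are affected by each kind of misuse.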
Consistency
Total values in the Europeana dataset: 33,070,941
Total values in ISO-639-1: 31,803,048 (96.17%)
Total values non-normalized: 1,267,893 (3.83%)
Error rate of the normalization (approx.): 1 / 212,766
Example: the dc:language values eng, en and en_GB are all normalized to en (ISO-639-1, 2-letter codes)
9,436,280 values needed normalization to ISO-639-1
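The kind of normalization shown in the example (eng, en, en_GB all becoming en) can be approximated as follows. This is a sketch under stated assumptions, not the normalization used for the figures above: the lookup tables are tiny hypothetical excerpts, and a real run would need the complete ISO 639 code lists.

```python
from typing import List, Optional, Tuple

# Hypothetical excerpts; complete ISO 639-1 / 639-2 tables are assumed to exist elsewhere.
ISO_639_1 = {"en", "de", "fr"}
ISO_639_2_TO_1 = {"eng": "en", "ger": "de", "deu": "de", "fre": "fr", "fra": "fr"}


def normalize_language(value: str) -> Optional[str]:
    """Map a raw dc:language value to a 2-letter code, or None if unresolved."""
    # Lowercase, unify the separator, and drop any region subtag (en_GB -> en).
    code = value.strip().lower().replace("_", "-").split("-")[0]
    if code in ISO_639_1:
        return code
    return ISO_639_2_TO_1.get(code)


def summarize(values: List[str]) -> Tuple[int, int]:
    """Return (values resolved to ISO 639-1, values left unresolved)."""
    resolved = sum(1 for v in values if normalize_language(v) is not None)
    return resolved, len(values) - resolved


def share_percent(part: int, total: int) -> float:
    """Percentage rounded to two decimals, as reported on the slide."""
    return round(100 * part / total, 2)


print([normalize_language(v) for v in ["eng", "en", "en_GB"]])  # ['en', 'en', 'en']
print(summarize(["eng", "en", "en_GB", "English", "de"]))       # (4, 1)
print(share_percent(31_803_048, 33_070_941))                    # 96.17
```

Free-text values such as "English" stay unresolved in this sketch; handling them would need an additional name-to-code table.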
Record level - Accessibility
<#record> a ore:Proxy ;
  dc:subject "Ballet", "Opera" .
<#record> a ore:Proxy ; edm:europeanaProxy true ;
  dc:subject <http://data.europeana.eu/concept/base/264>
  , <http://data.europeana.eu/concept/base/247> .
<http://data.europeana.eu/concept/base/264> a skos:Concept ;
  skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru
  , "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
<http://data.europeana.eu/concept/base/247>
  skos:prefLabel "Opera"@no, "ओपेरा (गीतिनाटक)"@hi, "Oper"@de, "Ooppera"@fi
  , "Опера"@be, "Опера"@ru, "Ópera"@pt, "Опера"@bg, "Opera"@lt .
Provider proxy: 0 distinct languages, 0 tagged literals
Europeana proxy: 11 distinct languages, 19 tagged literals, 1.7 literals per language
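The record-level figures for this example (11 distinct languages, 19 tagged literals, 1.7 literals per language) can be reproduced with a short Python sketch. The literals are copied from the Turtle above; the `(text, language tag)` representation and the function name are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Literal = Tuple[str, Optional[str]]


def multilinguality(literals: List[Literal]) -> Tuple[int, int, float]:
    """Return (distinct languages, tagged literals, tagged literals per language)."""
    tags = [lang for _, lang in literals if lang]
    distinct = len(set(tags))
    per_language = len(tags) / distinct if distinct else 0.0
    return distinct, len(tags), per_language


# Provider proxy: plain literals without language tags.
provider = [("Ballet", None), ("Opera", None)]

# Europeana proxy: labels of the two linked concepts (base/264 and base/247).
europeana = [
    ("Ballett", "no"), ("बैले", "hi"), ("Ballett", "de"), ("Балет", "be"),
    ("Балет", "ru"), ("Balé", "pt"), ("Балет", "bg"), ("Baletas", "lt"),
    ("Balet", "hr"), ("Balets", "lv"),
    ("Opera", "no"), ("ओपेरा (गीतिनाटक)", "hi"), ("Oper", "de"), ("Ooppera", "fi"),
    ("Опера", "be"), ("Опера", "ru"), ("Ópera", "pt"), ("Опера", "bg"),
    ("Opera", "lt"),
]

print(multilinguality(provider))  # (0, 0, 0.0)
distinct, tagged, per_language = multilinguality(europeana)
print(distinct, tagged, round(per_language, 1))  # 11 19 1.7
```

The same function can be applied per field, per collection or to the whole dataset by changing which literals are passed in.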
Discussion & Future Work
Discussion
○ Completeness and Consistency are fairly easy to interpret
○ Accessibility measures are harder to interpret, e.g. contextual entities for broad
concepts or common places often have more translations than lesser-known
ones
○ Applying quality dimensions is tricky, e.g. technical accessibility vs.
accessibility across languages
○ No common understanding of quality dimensions
Future Work
○ Conceptualize the quality dimensions for multilinguality
○ Work on implementation of visualizations that are straightforward for
providers
Questions
○ Contact
valentine.charles@europeana.eu
juliane.stiller@ibi.hu-berlin.de
werner.bailer@joanneum.at
peter.kiraly@gwdg.de
nfreire@gmail.com
○ Metadata Quality Assurance Framework
http://144.76.218.178/europeana-qa
○ Europeana Data Quality Committee
https://pro.europeana.eu/project/data-quality-committee
30

More Related Content

Similar to Data Quality Assessment in Europeana: Metrics for Multilinguality

Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityEvaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityJuliane Stiller
 
Stiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataStiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataPéter Király
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Péter Király
 
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Péter Király
 
A portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data caseA portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data caseAntoine Isaac
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Péter Király
 
Europeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap MeetingEuropeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap MeetingAntoine Isaac
 
Linked Open Data Cloud
Linked Open Data CloudLinked Open Data Cloud
Linked Open Data CloudPretaLLOD
 
Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018Antoine Isaac
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana
 
Data quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataData quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataValentine Charles
 
Enriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpediaEnriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpediaAntoine Isaac
 
Connecting archaeology and architecture data
Connecting archaeology and architecture dataConnecting archaeology and architecture data
Connecting archaeology and architecture dataCARARE
 
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example COST Action TD1210
 
Europeana and open data
Europeana and open dataEuropeana and open data
Europeana and open dataRobinaClayphan
 
Juliane Stiller EuropeanaTech 2015
Juliane Stiller EuropeanaTech 2015Juliane Stiller EuropeanaTech 2015
Juliane Stiller EuropeanaTech 2015Europeana
 
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataEnrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataHang Dong
 

Similar to Data Quality Assessment in Europeana: Metrics for Multilinguality (20)

Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for MultilingualityEvaluating Data Quality in Europeana: Metrics for Multilinguality
Evaluating Data Quality in Europeana: Metrics for Multilinguality
 
Stiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of MetadataStiller & Király, Multilinguality of Metadata
Stiller & Király, Multilinguality of Metadata
 
Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)Measuring Metadata Quality in Europeana (ADOCHS 2017)
Measuring Metadata Quality in Europeana (ADOCHS 2017)
 
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
Multilinguality of Metadata. Measuring the Multilingual Degree of Europeana‘s...
 
A portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data caseA portrait of Europeana as a Linked Open Data case
A portrait of Europeana as a Linked Open Data case
 
Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)Improving data quality at Europeana (SWIB 2016)
Improving data quality at Europeana (SWIB 2016)
 
Europeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap MeetingEuropeana @ NISO Bibliographic Roadmap Meeting
Europeana @ NISO Bibliographic Roadmap Meeting
 
Linked Open Data Cloud
Linked Open Data CloudLinked Open Data Cloud
Linked Open Data Cloud
 
Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018Designing a multilingual knowledge graph - DCMI2018
Designing a multilingual knowledge graph - DCMI2018
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
 
Data quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)dataData quality in cultural heritage (meta)data
Data quality in cultural heritage (meta)data
 
Session5 03.george rehm
Session5 03.george rehmSession5 03.george rehm
Session5 03.george rehm
 
Enriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpediaEnriching Cultural Heritage Data with DBpedia
Enriching Cultural Heritage Data with DBpedia
 
Connecting archaeology and architecture data
Connecting archaeology and architecture dataConnecting archaeology and architecture data
Connecting archaeology and architecture data
 
E-ARK: Open Data Mining for Government Archives
E-ARK: Open Data Mining for Government ArchivesE-ARK: Open Data Mining for Government Archives
E-ARK: Open Data Mining for Government Archives
 
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example Valentine Charles: Linking cultural heritage with KOS: the Europeana example
Valentine Charles: Linking cultural heritage with KOS: the Europeana example
 
Europeana and open data
Europeana and open dataEuropeana and open data
Europeana and open data
 
Juliane Stiller EuropeanaTech 2015
Juliane Stiller EuropeanaTech 2015Juliane Stiller EuropeanaTech 2015
Juliane Stiller EuropeanaTech 2015
 
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked DataEnrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
Enrichment of Cross-Lingual Information on Chinese Genealogical Linked Data
 

More from Juliane Stiller

KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieKOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieJuliane Stiller
 
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichKOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichJuliane Stiller
 
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Juliane Stiller
 
Berlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchBerlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchJuliane Stiller
 
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Juliane Stiller
 
Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Juliane Stiller
 
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheZur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheJuliane Stiller
 
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Juliane Stiller
 
The Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesThe Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesJuliane Stiller
 
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...Query Translation for Cross-lingual Search in the Academic Search Engine PubP...
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...Juliane Stiller
 
Have You Hired a Refugee? - Hiring Success 2018 Europe
 Have You Hired a Refugee? - Hiring Success 2018 Europe  Have You Hired a Refugee? - Hiring Success 2018 Europe
Have You Hired a Refugee? - Hiring Success 2018 Europe Juliane Stiller
 
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Juliane Stiller
 
Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Juliane Stiller
 
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaA Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaJuliane Stiller
 

More from Juliane Stiller (14)

KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit FluchtbiografieKOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
KOBV-Forum 2022 - Digitale Inklusion von Menschen mit Fluchtbiografie
 
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im GesundheitsbereichKOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
KOBV-Forum 2022 - Desinformationen im Gesundheitsbereich
 
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
Open Access in Museen. Vorteile der Offenheit und wie Museen mehr Offenheit w...
 
Berlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open ResearchBerlin auf dem Weg zu Open Research
Berlin auf dem Weg zu Open Research
 
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
Transfer informationswissenschaftlicher Fachkompetenz in die Praxis: Erfahrun...
 
Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)Cross-Lingual Bibliographic Search (CLuBS)
Cross-Lingual Bibliographic Search (CLuBS)
 
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der JobsucheZur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
Zur Bedeutung digitaler Kompetenzen von Geflüchteten bei der Jobsuche
 
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
Die Rolle digitaler Kompetenzen bei der Jobsuche: Ergebnisse aus einer Studie...
 
The Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of RefugeesThe Role of Information Literacy for the Integration of Refugees
The Role of Information Literacy for the Integration of Refugees
 
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...Query Translation for Cross-lingual Search in the Academic Search Engine PubP...
Query Translation for Cross-lingual Search in the Academic Search Engine PubP...
 
Have You Hired a Refugee? - Hiring Success 2018 Europe
 Have You Hired a Refugee? - Hiring Success 2018 Europe  Have You Hired a Refugee? - Hiring Success 2018 Europe
Have You Hired a Refugee? - Hiring Success 2018 Europe
 
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
Integrating Refugee Migrants into the Labour Market: the Necessity of Digital...
 
Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03Iconference 2018 stiller trkulja-digital literacy session-27-03
Iconference 2018 stiller trkulja-digital literacy session-27-03
 
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & CriteriaA Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
A Decade of Evaluating Europeana: Constructs, Contexts, Methods & Criteria
 

Recently uploaded

Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsalex933524
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...elinavihriala
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfMichaelSenkow
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .NABLAS株式会社
 
Machine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptxMachine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptxbenishzehra469
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?DOT TECH
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfscitechtalktv
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBAlireza Kamrani
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxStephen266013
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like BitcoinDOT TECH
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictJack Cole
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Calllward7
 
Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIAlejandraGmez176757
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxDilipVasan
 
Investigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesInvestigate & Recover / StarCompliance.io / Crypto_Crimes
Investigate & Recover / StarCompliance.io / Crypto_CrimesStarCompliance.io
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsCEPTES Software Inc
 
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...Innovative Methods in Media and Communication Research by Sebastian Kubitschk...
Innovative Methods in Media and Communication Research by Sebastian Kubitschk...correoyaya
 
Jpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJpolillo Amazon PPC - Bid Optimization Sample
Jpolillo Amazon PPC - Bid Optimization SampleJames Polillo
 

Recently uploaded (20)

Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
2024-05-14 - Tableau User Group - TC24 Hot Topics - Tableau Pulse and Einstei...
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Machine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptxMachine Learning For Career Growth..pptx
Machine Learning For Career Growth..pptx
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Pre-ProductionImproveddsfjgndflghtgg.pptx

Data Quality Assessment in Europeana: Metrics for Multilinguality

  • 1. Data Quality Assessment in Europeana: Metrics for Multilinguality
      Valentine Charles (1), Juliane Stiller (2), Péter Király (3), Werner Bailer (4), Nuno Freire (5)
      (1) Europeana Foundation, The Hague
      (2) Berlin School of Library and Information Science, Humboldt-Universität zu Berlin
      (3) Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen
      (4) Joanneum Research Forschungsgesellschaft mbH, Graz
      (5) INESC-ID, Lisbon
      TPDL 2017 (meta)-data quality workshop, Thessaloniki, September 21, 2017
      Image: “Measuring tape” by Therese Banström (CC BY-NC 2.0)
  • 2. Agenda
      1. Europeana
      2. Multilinguality of Metadata and Functional Requirements
      3. Multilinguality as a Facet of Quality Dimensions
      4. Measuring Multilingual Metadata Quality
      5. First Results
      6. Discussion & Future Work
  • 3. Europeana, Platform for Cultural Heritage Material (www.europeana.eu)
  • 4. Europeana - Facts
      ○ Books, newspapers, letters, paintings, photographs, radio shows, films, etc.
      ○ Text, images, video, audio, sounds, 3D
      ○ Over 53 million objects
      ○ > 50 languages
      Source: http://statistics.europeana.eu/europeana
  • 6. Multilinguality of Metadata & Functional Requirements
  • 7. Metadata Multilinguality (+ 40 other languages ...)
  • 9. Quantify Multilinguality of Data to:
      ○ Establish a sense of the multilingual reach of Europeana, incl. distribution of languages
      ○ Identify the impact of different workflows / processes on multilinguality of data
      ○ Take measures to improve multilinguality in data
      ○ Devise strategies for underrepresented languages
  • 10. What Could be Measured?
      ○ Number of (distinct) languages in the metadata
      ○ Number of language-tagged literals
      ○ Tagged literals per language
      ○ Existence of language information fields such as dc:language
      ○ Consistency of language information
      Requirement: language annotations / tags!
  • 11. Multilingual Information
      # Provider data: "Ballet" is a plain literal, "Opera"@en is a literal with a language tag
      <#record> a ore:Proxy ;
        dc:subject "Ballet", "Opera"@en .
      # Europeana enrichment
      <#record> a ore:Proxy ;
        edm:europeanaProxy true ;
        dc:subject <http://data.europeana.eu/concept/base/264> .
      <http://data.europeana.eu/concept/base/264> a skos:Concept ;
        skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru,
          "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
  • 12. Processes Contributing to Multilinguality
      Data from provider:
        dc:subject "subject"@en
        dc:subject "Subject"
        dc:subject <http://dbpedia.org/aSubjectID>
        dc:creator <http://vocab.getty.edu/aPersonNumber>
        dc:type http://vocab.example/domain-specific
      Data added by Europeana (dereferencing step):
        dc:creator: new labels in different languages
        dc:subject: new labels in different languages
      Quantifiable
  • 13. Functional Requirements for Multilingual Services
      ○ Cross-lingual search
      ○ Language-based facets
      ○ Entity-based facets
  • 14. Multilinguality as a Facet of Quality Dimensions
  • 15. Completeness
      ○ Expresses the number (fraction) of fields present in a dataset
      ○ Identifies non-empty values in a record or (sub-)collection
      ○ Problem: data model with optional fields
      ○ Multilingual completeness:
          ○ Does the field dc:language have a value?
          ○ Share of fields with language tags to overall available fields
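The two multilingual completeness checks on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the record layout (a dict from field name to a list of `(value, language_tag)` pairs), the function names, and the sample record are all assumptions made for the example. "Field with language tags" is read here as "field with at least one tagged value", which is one possible interpretation of the slide.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model: a literal is (value, language_tag); the tag is None when untagged.
Literal = Tuple[str, Optional[str]]
Record = Dict[str, List[Literal]]


def has_language_field(record: Record) -> bool:
    """True if dc:language is present with at least one non-empty value."""
    return any(value.strip() for value, _ in record.get("dc:language", []))


def tagged_field_share(record: Record) -> float:
    """Share of fields that carry at least one language-tagged value."""
    if not record:
        return 0.0
    tagged_fields = sum(
        1 for values in record.values() if any(tag for _, tag in values)
    )
    return tagged_fields / len(record)


# Hypothetical sample record, invented for illustration only.
record: Record = {
    "dc:title": [("Swan Lake", "en"), ("Schwanensee", "de")],
    "dc:subject": [("Ballet", None), ("Opera", "en")],
    "dc:language": [("en", None)],
}

print(has_language_field(record))   # True
print(tagged_field_share(record))   # 2 of 3 fields have a tagged value
```

Because the data model has optional fields, the denominator here is the set of fields actually present in the record rather than all fields the model allows.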
  • 16. Consistency
      ○ Logical coherence of metadata
      ○ Variety of language values in the dc:language field
  • 17. Accessibility
      ○ Access to information and data across languages
      ○ Distribution of linguistic information in metadata
      ○ Quantifying the language tag
      ○ On record level, collection level, Europeana level or across fields, e.g. how multilingual is the dc:subject field
  • 18. Dimensions, Criteria & Measures
      Dimension | Criteria | Measure
      Completeness | Presence or absence of values in fields relating to the language of the object or the metadata | Share of multilingual fields to overall fields; presence or absence of dc:language field
      Consistency | Variance in language notation | Distinct language notations
      Accessibility | Accessibility across languages expressed through language tags | In language tags: number of distinct languages; number of languages / number of tagged literals; number of tagged literals per language
  • 20. Implementation
      Source code: http://pkiraly.github.io/about/#source-codes
      Data source: http://hdl.handle.net/21.11101/0000-0001-781F-7 (Europeana snapshot, December 2015)
      Access to the project: http://144.76.218.178/europeana-qa/multilinguality.php?id=all
  • 21. Data processing workflow
      Stage | Tools | Output
      Ingestion | OAI-PMH, Europeana API, Hadoop, NoSQL | json
      Measuring | Spark, Hadoop, Java, Apache Solr | csv
      Statistical analysis | Spark, R | json, png
      Web interface | PHP, D3.js, highchart.js, NoSQL | html, svg
  • 24. Completeness
      ○ 904 of 3,548 collections have no value in the dc:language field, i.e. the field is missing.
      ○ At record level, 58.03% of the records have a dc:language field.
      ○ Misuse of fields:
          ○ collections with metadata fields that have more than 3 instances of dc:language
          ○ duplication of the language tag
  • 25. Consistency
      Measure | Value
      Total values in the Europeana dataset | 33,070,941
      Total values in ISO-639-1 | 31,803,048 (96.17%)
      Total values non-normalized | 1,267,893 (3.83%)
      Error rate of the normalization (approx.) | 1 / 212,766
      Example: "dc:language: eng", "dc:language en" and "dc:language en_GB" are all normalized to "en" (ISO-639-1, 2-letter codes).
      9,436,280 values needed normalization to ISO-639-1.
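The normalization step behind these figures can be illustrated with a small Python sketch. The slides do not describe the actual normalization rules, so everything below is an assumption: the lookup tables are deliberately tiny and hypothetical, and the function name `normalize_language` is invented for the example. It only shows the idea of mapping notations such as `eng`, `en` and `en_GB` to one ISO 639-1 code and counting what cannot be mapped.

```python
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny lookup tables; a real run would need the
# full ISO 639-1 and ISO 639-2 code lists.
ISO_639_2_TO_1 = {"eng": "en", "deu": "de", "ger": "de", "fra": "fr", "fre": "fr"}
KNOWN_639_1 = {"en", "de", "fr"}


def normalize_language(raw: str) -> Optional[str]:
    """Map a raw dc:language value to a 2-letter code, or None if unknown."""
    # Lower-case, unify the separator, and drop any region subtag (en_GB -> en).
    code = raw.strip().lower().replace("_", "-").split("-")[0]
    if code in KNOWN_639_1:
        return code
    return ISO_639_2_TO_1.get(code)


# Sample values taken from the slide, plus one invented unmappable value.
values = ["eng", "en", "en_GB", "German", "de"]

distinct_notations = len(set(values))                       # consistency measure
normalized = Counter(normalize_language(v) for v in values)

print(distinct_notations)   # 5
print(normalized["en"])     # 3
print(normalized[None])     # 1 value could not be normalized
```

Counting distinct raw notations before normalization gives the "distinct language notations" measure; the `None` bucket corresponds to the non-normalized share.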
  • 26. Record level - Accessibility
      <#record> a ore:Proxy ;
        dc:subject "Ballet", "Opera" .
      <#record> a ore:Proxy ;
        edm:europeanaProxy true ;
        dc:subject <http://data.europeana.eu/concept/base/264> ,
          <http://data.europeana.eu/concept/base/247> .
      <http://data.europeana.eu/concept/base/264> a skos:Concept ;
        skos:prefLabel "Ballett"@no, "बैले"@hi, "Ballett"@de, "Балет"@be, "Балет"@ru,
          "Balé"@pt, "Балет"@bg, "Baletas"@lt, "Balet"@hr, "Balets"@lv .
      <http://data.europeana.eu/concept/base/247>
        skos:prefLabel "Opera"@no, "ओपेरा (गीतिनाटक)"@hi, "Oper"@de, "Ooppera"@fi,
          "Опера"@be, "Опера"@ru, "Ópera"@pt, "Опера"@bg, "Opera"@lt .
      Proxy | Distinct languages | Tagged literals | Literals per language
      Provider | 0 | 0 | n/a
      Europeana | 11 | 19 | 1.7
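The three record-level accessibility measures can be recomputed from the labels on this slide. The following Python sketch is an illustration under stated assumptions, not the project's code: literals are modelled as `(value, language_tag)` pairs, and the function name `multilinguality` is invented for the example. The label data is copied from the slide.

```python
from typing import Dict, List, Optional, Tuple

# Assumed model: a literal is (value, language_tag); the tag is None when untagged.
Literal = Tuple[str, Optional[str]]


def multilinguality(literals: List[Literal]) -> Dict[str, float]:
    """Distinct languages, tagged literals, and tagged literals per language."""
    tags = [tag for _, tag in literals if tag]
    distinct = len(set(tags))
    return {
        "distinct_languages": distinct,
        "tagged_literals": len(tags),
        "literals_per_language": len(tags) / distinct if distinct else 0.0,
    }


# Provider proxy: two plain literals without language tags.
provider_proxy: List[Literal] = [("Ballet", None), ("Opera", None)]

# Europeana proxy: prefLabels of the two linked concepts (base/264 and base/247).
europeana_proxy: List[Literal] = [
    ("Ballett", "no"), ("बैले", "hi"), ("Ballett", "de"), ("Балет", "be"),
    ("Балет", "ru"), ("Balé", "pt"), ("Балет", "bg"), ("Baletas", "lt"),
    ("Balet", "hr"), ("Balets", "lv"),
    ("Opera", "no"), ("ओपेरा (गीतिनाटक)", "hi"), ("Oper", "de"), ("Ooppera", "fi"),
    ("Опера", "be"), ("Опера", "ru"), ("Ópera", "pt"), ("Опера", "bg"),
    ("Opera", "lt"),
]

provider = multilinguality(provider_proxy)
europeana = multilinguality(europeana_proxy)

print(provider["distinct_languages"], provider["tagged_literals"])     # 0 0
print(europeana["distinct_languages"], europeana["tagged_literals"])   # 11 19
print(round(europeana["literals_per_language"], 1))                    # 1.7
```

The same function could be applied per field, per collection, or across the whole dataset by changing which literals are passed in.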
  • 28. Discussion
      ○ Completeness and Consistency are fairly easy to interpret
      ○ Accessibility measures are harder to interpret, e.g. contextual entities for broad concepts or common places often have more translations than lesser-known ones
      ○ Applying quality dimensions is tricky, e.g. technical accessibility vs. accessibility across languages
      ○ No common understanding of quality dimensions
  • 29. Future Work
      ○ Conceptualize the quality dimensions for multilinguality
      ○ Work on implementation of visualizations that are straightforward for providers
  • 30. Questions
      ○ Contact: valentine.charles@europeana.eu, juliane.stiller@ibi.hu-berlin.de, werner.bailer@joanneum.at, peter.kiraly@gwdg.de, nfreire@gmail.com
      ○ Metadata Quality Assurance Framework: http://144.76.218.178/europeana-qa
      ○ Europeana Data Quality Committee: https://pro.europeana.eu/project/data-quality-committee

Editor's Notes

  1. Redo. Populated by edm_language, which gives each collection a static value based on the language of the institution
  2. Screenshot from MQF
  3. Please add your email