Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
EUROPEANA MEETING
UNDER FINLAND’S PRESIDENCY
OF THE COUNCIL OF THE EU
ESPOO, FINLAND
25 October 2019
Andy Neale
Technical Director
Europeana Foundation
Recap of the main conclusions of Day 1
Layers of digital CH system
● Content: metadata and digital CH objects
● Information Access: bridge the gap between language of user input and content
● Interactions: search, browse & explore
● User Interface: show user's preferred language
Juliane
Mismatch between query and
content language
• Mona Lisa 203 results
• Monna Lisa 13 results
• La Gioconda 376 results 
• La Joconde 78 results
Interactions
Roma, Galleria Corsini - La
Gioconda,
Juliane
Challenges
• Missing training data for small languages
• Missing training data for (sub)domains
• Number of language pairs is immense with 50+ languages
• Metadata is too scarce for good translation results
Juliane
Evaluate solution based on goal
○ E.g. for ML retrieval we might not need the perfect fluent
translation
○ Identify the impact of different workflows / processes on
multilinguality of system
○ Translations do not only have an impact on data but also on
retrieval and therefore on user satisfaction
Juliane
Challenges for LT in cultural heritage
● Interface or content (= multilingual in a broad sense)
● Far beyond modern standard language use
● Great variation makes domain adaptation hard
● Variation in place (dialects and languages), time (old Swedish) and
situation (informal-formal)
● Modal variation in collections: (handwritten) text, speech, pictures
● Hard to handle as researchers want to explore a collection as a whole
Rickard
Next steps
● Linked data to describe the collection conceptually and relationally
● Multilingual search methods for handling language variation in place,
time and situation
● Domain-adapted speech-to-text conversion to transcribe recordings
● Crowdsourcing for correcting
● Shared resources for the languages, dialects, domains etc
● Long-term funding for the National Language Bank
● Collaborative projects involving LTists, researchers and data holders
Rickard
Hugo.lv – AI powered language technology portal
Andrejs & Jānis
Conclusions
• The new generation of neural MT strongly improves the quality and applicability of
machine translation, especially for morphologically rich languages
• Domain specific data is crucial for making MT suitable for cultural and other
domains
• Depending on the application, translation needs can be served by selecting
the most efficient approach – pure MT, human review of the MT, or fully
human translation
• We will be happy to share our experience, technologies and tools :)
Andrejs & Jānis
Model for temporal division of benefits
Timeline: Initiation (of a new service), Development, Implementation, Operation and maintenance
● Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified?
● Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified?
● Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated?
Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library
Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231.
Heli
Dasha
Language detection and display (for validation)
Query translated in 24 languages
Dasha
THE NATIONAL LIBRARY OF FINLAND
Thesaurus to ontology
▪ Reconstruction of YSA into machine-readable and multilingual YSO
▪ Trilingual terms for concepts (fin, swe, eng)
▪ YSA and Allärs merged together and translated into English
▪ Concepts are a compromise between Finnish and Swedish as YSA
and Allärs are not completely identical
▪ Links to Library of Congress Subject Headings (LCSH)
▪ Linking to Wikidata underway
▪ YSO just made the list of Europeana dereferenceable vocabularies
that can be enriched in the Europeana portal
Matias
THE NATIONAL LIBRARY OF FINLAND
Annotate in one language, find using another
Matias
THE NATIONAL LIBRARY OF FINLAND
Automated Subject Indexing made easy:
Annif
▪ An open source multilingual automated subject indexing
system using machine learning and our own vocabularies
Matias
Europeana’s Knowledge Graph
Entity
Collection
Hugo
Proposals for indexing and storing translations
● Automated identification of language if needed (only 26.5% of the data
provider’s metadata is language qualified)
● Use translations from multilingual knowledge graph
● Augment the provider metadata with static translation of the fields to English
(to fill metadata values not covered by the knowledge graph)
● Store and index translated metadata for search and display (original metadata
+ languages of the knowledge graph + English)
Hugo
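The indexing proposals above can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_index_document`, the `detect_language` and `translate_to_english` callables, and the shape of the knowledge-graph lookup are hypothetical stand-ins, not Europeana's actual pipeline or the eTranslation API.

```python
from typing import Callable, Dict, Optional


def build_index_document(
    metadata: Dict[str, str],
    language: Optional[str],
    detect_language: Callable[[str], str],
    entity_labels: Dict[str, Dict[str, str]],
    translate_to_english: Callable[[str, str], str],
) -> Dict[str, Dict[str, str]]:
    """Return {field: {language: value}} with original and translated values."""
    document: Dict[str, Dict[str, str]] = {}
    for field, value in metadata.items():
        # 1. Identify the language only when the provider did not qualify it.
        lang = language or detect_language(value)
        variants = {lang: value}
        # 2. Reuse labels from the multilingual knowledge graph when the value
        #    matches an entity, without overwriting the provider's own value.
        for label_lang, label in entity_labels.get(value, {}).items():
            variants.setdefault(label_lang, label)
        # 3. Fill the remaining gap with a static translation into English.
        if "en" not in variants:
            variants["en"] = translate_to_english(value, lang)
        document[field] = variants
    return document


# Hypothetical toy inputs, for illustration only.
toy_graph = {"La Gioconda": {"en": "Mona Lisa", "fr": "La Joconde"}}
toy_mt = {"ritratto": "portrait"}
document = build_index_document(
    {"title": "La Gioconda", "subject": "ritratto"},
    "it",
    detect_language=lambda text: "und",
    entity_labels=toy_graph,
    translate_to_english=lambda text, lang: toy_mt[text],
)
```

In this sketch the stored document keeps the original value next to every translation, so both search and display can fall back to what the provider supplied.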
Proposals for search on object metadata
[Flow diagram] Original query: identify language, translate to English (translated query); suggest entity from the Knowledge Graph, user disambiguates (entity-based query).
Multilingual query: entity-based query OR original query + translated query
Search runs against the multilingual index; results listed as #1: French, #2: Spanish, #3: Polish
Hugo
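A minimal sketch of how the multilingual query in this proposal might be composed. The function name, the `entity:` field prefix and the quoting rules are assumptions made for illustration; the slides do not specify the query syntax of the search engine.

```python
from typing import Optional


def quote(term: str) -> str:
    """Wrap a term in double quotes, escaping any quotes inside it."""
    return '"' + term.replace('"', '\\"') + '"'


def build_multilingual_query(
    original: str,
    translated: Optional[str],
    entity_uri: Optional[str],
) -> str:
    # If the user picked a suggested entity, search on that entity alone.
    if entity_uri:
        return "entity:" + quote(entity_uri)
    # Otherwise combine the original query with its English translation.
    clauses = [quote(original)]
    if translated and translated.lower() != original.lower():
        clauses.append(quote(translated))
    return " OR ".join(clauses)
```

For the query "domov" with the translation "home" this yields `"domov" OR "home"`; a query the MT system leaves unchanged, such as "Pinsk", is sent only once.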
Session 4
CONTENT TRANSLATION
Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain
Tom Vanallemeersch
Machine translation specialist
CrossLang
The art of automating translation
Cultural heritage and translation
● Translation helps to open up cultures
● Rosetta stone was the key to understanding hieroglyphs
Parallel data
● Systems for automated translation (now) act in a similar way
● However, the right stones are required, and many of them ...
Context of this talk
EC project SMART 2016/0103:
● Identification of language technology needs of Digital Service Infrastructures of EC
E.g. Europeana DSI
● Framework: Connecting Europe Facility – Automated Translation
● Contracting authority: DG CNECT (the EC's multilingual enabler)
● Consortium:
Guide to this talk
Machine translation (MT):
● In general
● In a highly multilingual environment: eTranslation (EC)
● For EU cultural heritage
Challenges:
● Domain imbalance
● Language imbalance
● Context demand
● Multimodal sources
Approaches
MT in general
● MT systems are data-driven
🡪 Sentence pairs: They were living there - Ils habitaient là-bas
🡪 Software consisting of a neural network (like many recent AI applications)
● MT is used for various purposes
🡪 Post-editing, gisting, cross-lingual retrieval
MT in general: domain imbalance
● Quality typically improves when increasing training data
● But there are few (accessible) translations in some domains
● The same problem occurs for specific genres (e.g. novels)
and registers (e.g. informal language)
Difference in amount of domain-specific resources
MT in general: domain imbalance
Approach: identify/create domain-specific data
● Select sentence pairs from the vast ParaCrawl Corpus
● Use the ParaCrawl toolkit for multilingual websites, archives
● Select domain-specific parallel corpora from the ELRC-SHARE repository
● Create artificial training data: e.g. apply MT to French in-domain data,
add the English translations to English-French MT system
Difference in amount of domain-specific resources
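The last approach, creating artificial training data, could look roughly like the sketch below. It assumes an existing French-to-English MT system is available as a plain callable (`fr_to_en` is a hypothetical stand-in, not a real API): each in-domain French sentence is machine-translated, and the synthetic English output is paired with the authentic French sentence as extra training data for the English-French system.

```python
from typing import Callable, Iterable, List, Tuple


def make_synthetic_pairs(
    french_sentences: Iterable[str],
    fr_to_en: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (synthetic English source, authentic French target) pairs."""
    pairs: List[Tuple[str, str]] = []
    for french in french_sentences:
        french = french.strip()
        if not french:
            continue  # skip empty lines in the monolingual corpus
        english = fr_to_en(french).strip()
        if english:
            pairs.append((english, french))
    return pairs


# Toy stand-in for an MT system, for illustration only.
toy_mt = {"Ils habitaient là-bas": "They were living there"}
pairs = make_synthetic_pairs(
    ["Ils habitaient là-bas", "  "],
    lambda sentence: toy_mt[sentence],
)
```

Only the source side is machine-generated in this sketch, so the system still learns to produce human-written French.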
eTranslation
MT system for 24 official EU languages + Icelandic and Norwegian (Bokmål)
● 130+ out of 552 language pairs, often from or into English
● Sometimes pivot: Finnish 🡪 English 🡪 Portuguese
● Management: DG Translation (technical), DG CNECT (EU’s MT policy)
● Users: translators of DG Translation, public administrations in the EEA
● Free use
● Confidentiality and security
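The figure of 552 matches the number of ordered pairs among 24 languages (24 × 23). The sketch below illustrates that count and the pivot idea: when no direct engine exists for a pair, the text is routed through English. The `engines` registry and the function names are hypothetical; this is not how eTranslation is actually invoked.

```python
from typing import Callable, Dict, Tuple

Translator = Callable[[str], str]


def count_ordered_pairs(n_languages: int) -> int:
    """Number of (source, target) pairs with source != target."""
    return n_languages * (n_languages - 1)


def translate(
    text: str,
    source: str,
    target: str,
    engines: Dict[Tuple[str, str], Translator],
    pivot: str = "en",
) -> str:
    if source == target:
        return text
    direct = engines.get((source, target))
    if direct is not None:
        return direct(text)
    # No direct engine: go source -> pivot -> target.
    first = engines.get((source, pivot))
    second = engines.get((pivot, target))
    if first is None or second is None:
        raise KeyError(f"no engine or pivot route for {source}->{target}")
    return second(first(text))


# Toy engines, for illustration only.
toy_engines = {
    ("fi", "en"): lambda text: "book",
    ("en", "pt"): lambda text: "livro",
}
pivoted = translate("kirja", "fi", "pt", toy_engines)
```

A pivot route needs far fewer engines than one per pair, at the price of compounding the errors of two translation steps.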
eTranslation
● User interface: snippets, documents
● API: online services, …
● Domain of training data: legal and administrative texts
● Specific MT systems for some organisations
🡪 E.g. Court of Justice (French ⇄ X)
eTranslation: language imbalance
● Resource-rich language pairs (many parallel data), e.g. English-French
● Resource-poor language pairs, e.g. English-Irish, English-Icelandic
🡪 Lower MT quality
Difference in amount of training data for language pairs
eTranslation: language imbalance
Approach: build multilingual models
● Recent research topic in MT
● Translation from many languages into one, from one into many, etc.
● Language pairs that “learn” from each other how to translate (pieces of) words
● Surprising improvements for resource-poor language pairs
eTranslation: language imbalance
Approach: build multilingual models (continued)
● Recent workshop in Luxembourg, organised by CrossLang for DG CNECT
🡪 Moderated by high-profile expert from Facebook
● Google AI group: attempts at creating “universal MT” (102 languages for now)
● Opportunity for scaling up MT
MT for culture
● Post-editing: e.g. static text on websites
● Gisting: e.g. dynamic text like visitors’ comments
● Cross-lingual retrieval: e.g. search for objects having metadata in another language
Potential uses
MT for culture: context demand
Metadata consisting of short text fragments
Title: note, bank
note = “comment” / “money” ?
bank = “financial institution” / “location near river” ?
MT for culture: context demand
Metadata consisting of short text fragments
Title: note, bank
Subject: paper money
note = “comment” / “money” ?
🡪 Dutch: biljet
Approach: make use of the remainder of the metadata
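A toy sketch of this approach: the sense of the ambiguous title word "note" is chosen by counting how many cue words of each sense occur in the other metadata fields. The cue lists and the Dutch rendering "opmerking" for the "comment" sense are assumptions added for illustration; only "biljet" comes from the slide, and a real system would feed the extra context to the MT model rather than use hand-written cues.

```python
# Hypothetical sense inventory for the English word "note".
SENSES = {
    "comment": {"cues": {"remark", "annotation", "letter"}, "nl": "opmerking"},
    "money": {"cues": {"money", "paper", "currency"}, "nl": "biljet"},
}


def translate_note(other_metadata: str) -> str:
    """Pick the Dutch translation of 'note' using the rest of the metadata."""
    words = set(other_metadata.lower().split())
    # Score each sense by its overlap with the context; ties keep the first sense.
    best = max(SENSES, key=lambda sense: len(SENSES[sense]["cues"] & words))
    return SENSES[best]["nl"]
```

With the subject field "paper money" as context, the "money" sense wins and the function returns "biljet".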
MT for culture: context demand
Metadata consisting of short text fragments
Approach: make use of the remainder of the metadata
🡪 Approach is also useful for named entity recognition:
Description: The Utrecht artist De Heem is regarded as one …
Artist: Jan Davidsz de Heem
MT for culture: language imbalance
Little or no parallel data involving “dead” / minority languages
Approach for related languages: use available data + additional techniques
● Minority language + larger language
● Old + new language variant
● Advantage: similar vocabulary, spelling
MT for culture: language imbalance
Little or no parallel data involving “dead” / minority languages
Alternative approach for related languages: train an unsupervised MT system
● Uses monolingual corpora for the two languages
● Identifies similar words and sentences in both languages
● Learns to translate in both directions
MT for culture: multimodal sources
Translation in case of non-textual objects (including non-digitised text)
● Audio material 🡪 Speech recognition
● Scanned documents 🡪 OCR
● Photographs with text 🡪 OCR (?)
● Images without text 🡪 Text describing image
🡪 Imperfect MT input
MT for culture: multimodal sources
Translation in case of non-textual objects (including non-digitised text)
Approach: correct output using metadata before applying MT
OCR: Demer en Capueienen
Metadata: … Capucienen …
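One simple way to sketch this correction step is fuzzy matching of OCR tokens against words found in the metadata, here with `difflib` from the Python standard library. The function name and the similarity cutoff of 0.85 are assumptions for illustration; the slides do not say which matching method was used.

```python
import difflib
from typing import Iterable


def correct_with_metadata(
    ocr_text: str,
    metadata_words: Iterable[str],
    cutoff: float = 0.85,
) -> str:
    """Replace OCR tokens that closely resemble a word from the metadata."""
    vocabulary = list(metadata_words)
    corrected = []
    for token in ocr_text.split():
        # Best metadata candidate above the similarity cutoff, if any.
        match = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


fixed = correct_with_metadata("Demer en Capueienen", ["Capucienen"])
```

"Capueienen" differs from "Capucienen" in a single character, so it clears the cutoff and is replaced, while "Demer" and "en" are left untouched before the text is passed to MT.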
Conclusions
● MT for cultural heritage stretches across many dimensions
Languages, domains, genres, registers, periods, …
● It is a particularly interesting and demanding area for MT
Huge potential of multilingual object metadata, big challenges
● Approaches involve new information sources, refinement of tools and methods
Antoine Isaac
R&D Manager
Europeana Foundation
Case study -
Content translation and search
Aspects of multilingual experience
- Content
A focused view of our
conceptual model of
multilingual approach
First experiments -
Translation of virtual exhibitions
Translation of virtual exhibitions
Pilot: apply eTranslation to assist
manual translation of exhibitions
● Exhibitions from two Generic Services projects:
○ Migration in the Arts and Sciences
○ Rise of Literacy
● 13 people from 11 institutions reviewed
translations from English into 8 languages:
○ Dutch, French, Hungarian, Italian, Lithuanian,
Polish, Portuguese, Slovenian
NB: no German (for which eTranslation has a
"cultural" version)
Translation of virtual exhibitions
Pilot: apply eTranslation to assist
manual translation of exhibitions
● The output is medium to good but does not
translate the carefully crafted narrative text well,
so partners spend a lot of time rewriting
● The quality is still too low to translate exhibitions
sustainably and cost-effectively
Ongoing experiments - content
translation and search
New case study: using translation
in search for text objects
● An important need for
Europeana (cf.
Newspapers,
Transcriptions)
● One that may still work
with less-than-perfect
translations
The strategy for using translation
in cross-lingual search
[Flow diagram] Original query: identify language, translate to English (translated query); align to entity (entity-based query); user validation.
Multilingual query: entity-based query + original query + translated query
Search runs against the multilingual index; search results listed as #1: French, #2: Spanish, #3: Polish
Multilingual search for text objects
A focused view on the general strategy
Usage scenarios
● Input fulltext to multilingual search
● Enter search query in chosen language
● See search results
Outcomes
● Multilingual search would be extended with fulltext English
Caveat: no display/UX considerations at this stage!
Multilingual search for text objects
Proposals - indexing
● Automated identification of text object language if needed
● Static translation of text objects to English
● Index fulltext in both English and source language
Proposals - search
● Automated identification of language of entered query
● Dynamically translate search phrase to English
● Submit query comprising [original search phrase] + [English translation of search phrase]
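The indexing and search proposals can be tried out on a toy in-memory index, as sketched below. Everything here is an assumption made for illustration: `BilingualIndex`, the word-for-word `toy_translate` glossary that stands in for an MT service, and the whitespace tokenisation. Language identification is left out; the toy translator simply passes unknown words through.

```python
from collections import defaultdict
from typing import Callable, Dict, Set

# Toy glossary standing in for an MT service (illustration only).
GLOSSARY = {"brief": "letter", "lettre": "letter", "aus": "from", "de": "from"}


def toy_translate(text: str) -> str:
    return " ".join(GLOSSARY.get(word, word) for word in text.lower().split())


class BilingualIndex:
    def __init__(self, translate_to_english: Callable[[str], str]) -> None:
        self.translate = translate_to_english
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, fulltext: str) -> None:
        # Static translation at indexing time; source and English are both indexed.
        for text in (fulltext, self.translate(fulltext)):
            for token in text.lower().split():
                self.postings[token].add(doc_id)

    def search(self, phrase: str) -> Set[str]:
        # Dynamic translation at query time; original and English terms are OR-ed.
        hits: Set[str] = set()
        for text in (phrase, self.translate(phrase)):
            for token in text.lower().split():
                hits |= self.postings.get(token, set())
        return hits


index = BilingualIndex(toy_translate)
index.add("de-1", "Brief aus Pinsk")
index.add("fr-1", "lettre de Verdun")
```

Searching for "lettre" now also returns the German document, because both the query and the German text meet in the English term "letter".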
Multilingual search for text objects
Validation points
● How successful is automated language detection?
● What is the projected cost of statically translating fulltext to
English?
● Benchmarking of search engine results that compare native
language keyword queries with English keyword queries
What we've done
We have tested our cross-lingual search approach on transcriptions
of World War I objects from Transcribathons hosted by the Enrich
Europeana project. We have used the CEF eTranslation automatic
translation service and have assessed the prototype with a sample
of user queries from the Europeana 1914-1918 thematic collection.
Data acquisition and processing
Text objects (transcriptions)
Original corpus:
● 18,257 transcriptions
● 17 languages
eTranslation failed in only 404 cases:
● Language not supported (Bosnian)
● Long text - can be fixed
Language tag     Transcriptions   Translated to English
de               9300             9151
fr               1669             1659
it               992              973
ro               578              577
nl               455              454
el               364              356
lv               226              226
bs               215              0
cs               90               90
da               90               90
sl               7                7
hu               3                2
es               2                2
pl               2                2
sk               2                2
hr               1                1
TOTAL (non-en)   13996            13592
en               4243             0
TOTAL            18239            13592
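As a quick cross-check of the table, the per-language counts can be used to recompute the number of failed translations and the overall coverage. The numbers are copied from the table above; the variable names are only illustrative.

```python
# (transcriptions, translated to English) per non-English language tag
counts = {
    "de": (9300, 9151), "fr": (1669, 1659), "it": (992, 973),
    "ro": (578, 577), "nl": (455, 454), "el": (364, 356),
    "lv": (226, 226), "bs": (215, 0), "cs": (90, 90), "da": (90, 90),
    "sl": (7, 7), "hu": (3, 2), "es": (2, 2), "pl": (2, 2),
    "sk": (2, 2), "hr": (1, 1),
}

# Failed translations per language and in total.
failed = {lang: total - done for lang, (total, done) in counts.items()}
total_failed = sum(failed.values())

# Share of non-English transcriptions that received an English translation.
total_docs = sum(total for total, _ in counts.values())
total_translated = sum(done for _, done in counts.values())
coverage = total_translated / total_docs
```

The failures add up to the 404 cases quoted on the slide, 215 of them Bosnian, and roughly 97% of the non-English transcriptions were translated.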
Data acquisition and processing
Queries
Original corpus:
● Sample from Google Analytics, first 10
months of 2019
● 91 different queries
● 9 languages
eTranslation worked in all cases
Language tag     Queries   Translated to English
it               29        29
fr               14        14
de               12        12
pl               6         6
es               3         3
nl               2         2
ro               2         2
cs               1         1
TOTAL (non-en)   69        69
en               22        0
TOTAL            91        69
Results: translation brings more results in!
Original query    Language   Translated query   Results (original query)   Results (translated query)   New docs retrieved thanks to translation
domov             cs         home               2                          1529                         1527
Bernhard Stiens   de         Bernhard Stiens    16                         21                           8
cimitero          de         ciemitero          0                          0                            0
eastern front     de         Eastern front      345                        1272                         955
lagazuoi          de         lapiönoi           0                          0                            0
letters           de         letters            25                         1935                         1913
nova vas          de         Nova vas           4                          31                           29
Pinsk             de         Pinsk              1                          1                            0
podgora           de         podgora            1                          7                            6
Rokitno           de         Roitno             0                          0                            0
san elia          de         San elia           40                         49                           16
Talies            de         Talies             0                          2                            2
women             de         women              4                          255                          251
antonio sordi     it         Antonio Deaf       12                         25                           14
Asiago            it         Asiago 1)          4                          2552                         2548
avion             it         Avion              0                          4                            4
bini cima         it         Bini top           3                          837                          835
celle lager       it         lager cells        2                          56                           56
Example
Kriegstagebuch von Peter Arabin
contributed by Sigrid Arabin-Möhrer
CC-BY-SA
https://www.europeana.eu/portal/en/record/2020601/https___1914_1918_europeana_eu_contributions_6461.html
Evaluation
We didn't have time to do a
fine-grained evaluation of the
relevance of results, especially for
accuracy
What price are we ready to pay for such results?
Evaluation 1 - reproducing original results with translations
For each language, we tested the overlap between results without
translation and results with translation, for queries and docs in that language
● 67% of the original results are retrieved after translation.
Extrapolation: if we use translation, we can expect to discover 67% of the records
in other languages that are more likely to be good.
● 49% of the translation-based results are confirmed in the original language.
Extrapolation: we would have to assume that 51% of the results are more likely to be noisy.
This is interesting but we need more evaluation, especially since
● We could do it only for 5 languages (in the others the original queries had 0 results).
● We cannot assess possible beneficial side effects of translation over the monolingual case, such
as matching synonyms.
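The two overlap figures can be read as set ratios over result lists. The sketch below shows one plausible way to compute them for a single query; how the figures were aggregated across queries and languages is not stated in the slides, so the per-query form and the function name are assumptions.

```python
from typing import Optional, Set, Tuple


def overlap_metrics(
    original_hits: Set[str],
    translated_hits: Set[str],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (share of original results recovered, share of translated results confirmed)."""
    shared = original_hits & translated_hits
    # Share of monolingual results that the translation-based search also finds.
    recovered = len(shared) / len(original_hits) if original_hits else None
    # Share of translation-based results that the monolingual search confirms.
    confirmed = len(shared) / len(translated_hits) if translated_hits else None
    return recovered, confirmed


recovered, confirmed = overlap_metrics({"a", "b", "c"}, {"b", "c", "d", "e"})
```

In this toy case two of three original results are recovered and two of four translation-based results are confirmed. Queries with zero original results give `None` for the first ratio, which mirrors why only 5 languages could be evaluated.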
Evaluation 2 - evaluating query translations
Assessing the quality of translations for the 69 non-English queries
Columns: original query (WWI collection) | language | translated query | good translation | bad translation | wrong language | named entity, no transl. applicable | named entity, transl. applicable | [...]
domov cs home 1
Bernhard Stiens de Bernhard Stiens 1
cimitero de ciemitero 1 1
eastern front de Eastern front 1
lagazuoi de lapiönoi 1 1
letters de letters 1
nova vas de Nova vas 1
Pinsk de Pinsk 1
podgora de podgora 1
Rokitno de Roitno 1 1
san elia de San elia 1
Talies de Talies 1
women de women 1
antonio sordi it Antonio Deaf 1 1
Asiago it Asiago 1) 1 1
avion it Avion
bini cima it Bini top 1 1
celle lager it lager cells 1 1 1
cellelager it celager
eastern front it Eastern front 1
fogliano it Fogliano 1
gaudioso matteo it Mr Matteo 1 1
gay flavio it Mr Gay Flavio 1 1
germania it Germany 1 1
Evaluation 2 - evaluating query translations
Winnowing the original set
● In 22 cases the system was given wrong input, like
typos or wrong language (einsenbahn in French?)
● In 4 cases we couldn't guess the user's intention
(avion on the Italian portal)
On the remaining 43 queries
● 37 queries were entities to be left unchanged, e.g.,
Bernhard Stiens (as opposed to Italia).
eTranslation correctly handled 20 of them (54%).
● eTranslation correctly translated 5 of the 6
remaining cases (83%).
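The winnowing arithmetic above can be checked in a few lines. The counts come from the slide; the combined rate over the 43 remaining queries is derived here and is not a figure reported by the authors.

```python
total_queries = 69      # non-English queries in the sample
wrong_input = 22        # typos or wrong language
unclear_intent = 4      # user's intention could not be guessed

remaining = total_queries - wrong_input - unclear_intent

entities = 37           # named entities to be left unchanged
entities_ok = 20        # handled correctly by eTranslation
translatable = remaining - entities
translatable_ok = 5     # correctly translated

entity_rate = entities_ok / entities
translatable_rate = translatable_ok / translatable
# Derived, not reported on the slide: success over all remaining queries.
overall_rate = (entities_ok + translatable_ok) / remaining
```

The numbers are consistent: 43 queries remain, 6 of them translatable, and the two rates round to 54% and 83%. Taken together, about 58% of the remaining queries were handled correctly.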
Frankreich, Avion.- Soldatenfriedhof, Bundesarchiv, CC-BY-SA
http://www.bild.bundesarchiv.de/archives/barchpic/search/_1268685391/
General observation: in our case, we're straight into the long tail of the queries
Future work
● Really evaluate the relevance of cross-lingual search results
● Scale up
● Extend to metadata
● Evaluate the impact of cross-lingual search on search performance
● Better handle named entities
● Better language identification
● Decide if query translation is really the way to go...
The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain
europeana.eu
@EuropeanaEU

More Related Content

What's hot

Multilingualism for Digital Europe
Multilingualism for Digital EuropeMultilingualism for Digital Europe
Multilingualism for Digital EuropeGeorg Rehm
 
AI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeAI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeGeorg Rehm
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeGeorg Rehm
 
META-NET and META-SHARE: Language Technology for Europe
META-NET and META-SHARE: Language Technology for EuropeMETA-NET and META-SHARE: Language Technology for Europe
META-NET and META-SHARE: Language Technology for EuropeGeorg Rehm
 
The Strategic Impact of META-NET on the Regional, National and International ...
The Strategic Impact of META-NET on the Regional, National and International ...The Strategic Impact of META-NET on the Regional, National and International ...
The Strategic Impact of META-NET on the Regional, National and International ...Georg Rehm
 
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)IMPACT Centre of Competence
 
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Georg Rehm
 
The META-NET Language White Paper Series
The META-NET Language White Paper SeriesThe META-NET Language White Paper Series
The META-NET Language White Paper SeriesGeorg Rehm
 
Keynote new convergences between natural language processing and knowledge ...
Keynote   new convergences between natural language processing and knowledge ...Keynote   new convergences between natural language processing and knowledge ...
Keynote new convergences between natural language processing and knowledge ...semanticsconference
 
Digital Humanities @ Net7
Digital Humanities @ Net7Digital Humanities @ Net7
Digital Humanities @ Net7Net7
 
2015-11-18 research seminar
2015-11-18 research seminar2015-11-18 research seminar
2015-11-18 research seminarifi8106tlu
 
Pundit, an Open Source semantic annotation tool for the web
Pundit, an Open Source semantic annotation tool for the webPundit, an Open Source semantic annotation tool for the web
Pundit, an Open Source semantic annotation tool for the webNet7
 
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Georg Rehm
 
Handbook learning cultures
Handbook learning culturesHandbook learning cultures
Handbook learning culturesAndrea Ciantar
 

What's hot (17)

Multilingualism for Digital Europe
Multilingualism for Digital EuropeMultilingualism for Digital Europe
Multilingualism for Digital Europe
 
AI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual EuropeAI for Translation Technologies and Multilingual Europe
AI for Translation Technologies and Multilingual Europe
 
Human Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual EuropeHuman Language Technologies in a Multilingual Europe
Human Language Technologies in a Multilingual Europe
 
META-NET and META-SHARE: Language Technology for Europe
META-NET and META-SHARE: Language Technology for EuropeMETA-NET and META-SHARE: Language Technology for Europe
META-NET and META-SHARE: Language Technology for Europe
 
The Strategic Impact of META-NET on the Regional, National and International ...
The Strategic Impact of META-NET on the Regional, National and International ...The Strategic Impact of META-NET on the Regional, National and International ...
The Strategic Impact of META-NET on the Regional, National and International ...
 
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)
Impact Centre of Competence presentation at CERL 2014 by Tomasz Parkola (PSNC)
 
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
Language Technologies for Big Data – A Strategic Agenda for the Multilingual ...
 
08b final event_experimente
08b final event_experimente08b final event_experimente
08b final event_experimente
 
The META-NET Language White Paper Series
The META-NET Language White Paper SeriesThe META-NET Language White Paper Series
The META-NET Language White Paper Series
 
Keynote new convergences between natural language processing and knowledge ...
Keynote   new convergences between natural language processing and knowledge ...Keynote   new convergences between natural language processing and knowledge ...
Keynote new convergences between natural language processing and knowledge ...
 
Digital Humanities @ Net7
Digital Humanities @ Net7Digital Humanities @ Net7
Digital Humanities @ Net7
 
2015-11-18 research seminar
2015-11-18 research seminar2015-11-18 research seminar
2015-11-18 research seminar
 
Pundit, an Open Source semantic annotation tool for the web
Pundit, an Open Source semantic annotation tool for the webPundit, an Open Source semantic annotation tool for the web
Pundit, an Open Source semantic annotation tool for the web
 
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
Multilingual Europe in late 2016 – A Strategic Research and Innovation Agenda...
 
How African students contribute to Libreoffice 
How African students contribute to Libreoffice How African students contribute to Libreoffice 
How African students contribute to Libreoffice 
 
Handbook learning cultures
Handbook learning culturesHandbook learning cultures
Handbook learning cultures
 
03 isaac dm2-e14-full
03 isaac dm2-e14-full03 isaac dm2-e14-full
03 isaac dm2-e14-full
 

Similar to Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2, 25 october 2019

Towards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and InterpretationTowards a Human Language Project for Multilingual Europe: AI and Interpretation
Towards a Human Language Project for Multilingual Europe: AI and InterpretationGeorg Rehm
 
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)
Evaluating Data Quality in Europeana: Metrics for Multilinguality (MTSR 2018)Péter Király
 
TAUS MT Showcace, MT Applications in the EU Public Sector, Adrejs Vasiljevs, ...
TAUS MT Showcace, MT Applications in the EU Public Sector, Adrejs Vasiljevs, ...TAUS MT Showcace, MT Applications in the EU Public Sector, Adrejs Vasiljevs, ...
TAUS MT Showcace, MT Applications in the EU Public Sector, Adrejs Vasiljevs, ...TAUS - The Language Data Network
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Prompsit Language Engineering
 
Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Apertium: a unique free/open-source MT system for related languages [but not ...
Apertium: a unique free/open-source MT system for related languages [but not ...Gema Ramirez-Sanchez
 
FrameNet development for Latvian
FrameNet development for LatvianFrameNet development for Latvian
FrameNet development for LatvianNormunds Grūzītis
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyDafydd Gibbon
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
KantanFest: Andy Way
KantanFest: Andy WayKantanFest: Andy Way
KantanFest: Andy Waykantanmt
 
TraduXio project - Cosi10
TraduXio project - Cosi10TraduXio project - Cosi10
TraduXio project - Cosi10PhilippeLacour
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2, 25 October 2019

  • 1. EUROPEANA MEETING UNDER FINLAND’S PRESIDENCY OF THE COUNCIL OF THE EU, ESPOO, FINLAND, 25 October 2019. Image: Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
  • 2. Andy Neale, Technical Director, Europeana Foundation: Recap on main conclusions of Day 1
  • 3.
  • 4. Layers of digital CH system: Content (metadata and digital CH objects); Information Access (search, browse & explore); Interactions (bridge the gap between language of user input and content); User Interface (show user's preferred language) Juliane
  • 5. Mismatch between query and content language (Interactions): • Mona Lisa: 203 results • Monna Lisa: 13 results • La Gioconda: 376 results • La Joconde: 78 results. Image: Roma, Galleria Corsini - La Gioconda Juliane
  • 6. Challenges • Missing training data for small languages • Missing training data for (sub)domains • The number of language pairs is immense with 50+ languages • Metadata is too scarce for good translation results Juliane
  • 7. Evaluate solution based on goal ○ E.g. for ML retrieval we might not need a perfectly fluent translation ○ Identify the impact of different workflows / processes on multilinguality of the system ○ Translations have an impact not only on data but also on retrieval, and therefore on user satisfaction Juliane
  • 8. Challenges for LT in cultural heritage ● Interface or content (= multilingual in a broad sense) ● Far beyond modern standard language use ● Great variation makes domain adaptation hard ● Variation in place (dialects and languages), time (old Swedish) and situation (informal-formal) ● Modal variation in collections: (handwritten) text, speech, pictures ● Hard to handle as researchers want to explore a collection as a whole Rickard
  • 9. Next steps ● Linked data to describe the collection conceptually and relationally ● Multilingual search methods for handling language variation in place, time and situation ● Domain-adapted speech-to-text conversion to transcribe recordings ● Crowdsourcing for correcting ● Shared resources for the languages, dialects, domains etc. ● Long-term funding for the National Language Bank ● Collaborative projects involving LTists, researchers and data holders Rickard
  • 10. Hugo.lv – AI powered language technology portal Andrejs & Jānis
  • 11. Conclusions • The new generation of neural MT strongly improves the quality and applicability of machine translation, especially for morphologically rich languages • Domain-specific data is crucial for making MT suitable for cultural and other domains • Depending on the application, translation needs can be served by selecting the most efficient approach: pure MT, human review of the MT, or fully human translation • We will be happy to share our experience, technologies and tools :) Andrejs & Jānis
  • 12. Model for temporal division of benefits. Timeline: Initiation (of a new service), Development, Implementation, Operation and maintenance, Future. Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified? Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified? Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated? Source: Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231. Heli
  • 13. Dasha
  • 14. Dasha
  • 15. Dasha
  • 16. Language detection and display (for validation). Query translated into 24 languages Dasha
  • 17. THE NATIONAL LIBRARY OF FINLAND Thesaurus to ontology ▪ Reconstruction of YSA into machine-readable and multilingual YSO ▪ Trilingual terms for concepts (fin, swe, eng) ▪ YSA and Allärs merged together and translated into English ▪ Concepts are a compromise between Finnish and Swedish as YSA and Allärs are not completely identical ▪ Links to Library of Congress Subject Headings (LCSH) ▪ Linking to Wikidata underway ▪ YSO just made the list of Europeana dereferenceable vocabularies that can be enriched in the Europeana portal Matias
  • 18. Annotate in one language, find using another Matias
  • 19. Automated Subject Indexing made easy: Annif ▪ An open source multilingual automated subject indexing system using machine learning and our own vocabularies Matias
  • 21. Proposals for indexing and storing translations ● Automated identification of language if needed (only 26.5% of the data provider’s metadata is language qualified) ● Use translations from multilingual knowledge graph ● Augment the provider metadata with static translation of the fields to English (to fill metadata values not covered by the knowledge graph) ● Store and index translated metadata for search and display (original metadata + languages of the knowledge graph + English) Hugo
  • 22. Proposals for search on object metadata. Diagram labels: User; Original query; Identify language; Translate to English; Translated query (English); Suggest Entity (Knowledge Graph); User disambiguates; Entity-based query; Search; Multilingual index; #1: French, #2: Spanish, #3: Polish. Multilingual query: entity-based query OR original query + translated query Hugo
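
The multilingual query named on slide 22 (entity-based query OR original query + translated query) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Europeana's implementation: `detect_language`, `translate_to_english` and `suggest_entity` are hypothetical stand-ins supplied by the caller, the `entity:"..."` clause and the quoted-phrase syntax are invented for the example, and the user disambiguation step is left out.

```python
from typing import Callable, Optional


def build_multilingual_query(
    original_query: str,
    detect_language: Callable[[str], str],
    translate_to_english: Callable[[str, str], str],
    suggest_entity: Callable[[str], Optional[str]],
) -> str:
    """Join entity-based, original and translated clauses with OR."""
    clauses = []

    # Entity-based clause: only when the (hypothetical) knowledge-graph
    # lookup returns an entity identifier for the query.
    entity_id = suggest_entity(original_query)
    if entity_id is not None:
        clauses.append(f'entity:"{entity_id}"')

    # The original query is always searched as typed.
    clauses.append(f'"{original_query}"')

    # Translated clause: added only for non-English queries whose
    # translation differs from the original wording.
    language = detect_language(original_query)
    if language != "en":
        translated = translate_to_english(original_query, language)
        if translated.lower() != original_query.lower():
            clauses.append(f'"{translated}"')

    return " OR ".join(clauses)
```

With toy stand-ins that detect French, translate "La Joconde" to "Mona Lisa" and suggest a made-up entity identifier, the function returns three OR-joined clauses, so a single search can reach records described under either name.
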
  • 23. Session 4: CONTENT TRANSLATION. Image: Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819, Biblioteca Digital de Madrid, Spain, Public domain
  • 24. Tom Vanallemeersch, Machine translation specialist, CrossLang: The art of automating translation
  • 25. Cultural heritage and translation ● Translation helps to open up cultures ● The Rosetta Stone was the key to understanding hieroglyphs (parallel data) ● Systems for automated translation (now) act in a similar way ● However, the right stones are required, and many of them ...
  • 26. Context of this talk EC project SMART 2016/0103: ● Identification of language technology needs of Digital Service Infrastructures of EC E.g. Europeana DSI ● Framework: Connecting Europe Facility – Automated Translation ● Contracting authority: DG CNECT (the EC's multilingual enabler) ● Consortium:
  • 27. Guide to this talk Machine translation (MT): ● In general ● In a highly multilingual environment: eTranslation (EC) ● For EU cultural heritage Challenges: ● Domain imbalance ● Language imbalance ● Context demand ● Multimodal sources Approaches
  • 28. MT in general ● MT systems are data-driven 🡪 Sentence pairs: They were living there - Ils habitaient là-bas 🡪 Software consisting of a neural network (like many recent AI applications) ● MT is used for various purposes 🡪 Post-editing, gisting, cross-lingual retrieval
  • 29. MT in general: domain imbalance ● Quality typically improves when increasing training data ● But there are few (accessible) translations in some domains ● The same problem occurs for specific genres (e.g. novels) and registers (e.g. informal language) Difference in amount of domain-specific resources
  • 30. MT in general: domain imbalance Approach: identify/create domain-specific data ● Select sentence pairs from the vast ParaCrawl Corpus ● Use the ParaCrawl toolkit for multilingual websites, archives ● Select domain-specific parallel corpora from the ELRC-SHARE repository ● Create artificial training data: e.g. apply MT to French in-domain data and add the resulting English translations to the training data of the English-French MT system Difference in amount of domain-specific resources
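The last bullet (artificial training data) could look roughly like this in code. It is a sketch under stated assumptions: `make_synthetic_pairs` and the toy French-to-English translator are hypothetical, and a real setup would call an actual MT system and filter its output.

```python
from typing import Callable, Iterable, List, Tuple


def make_synthetic_pairs(
    french_sentences: Iterable[str],
    fr_to_en: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair machine-translated English sources with genuine French targets."""
    pairs = []
    for fr in french_sentences:
        fr = fr.strip()
        if not fr:
            continue  # skip empty lines
        en = fr_to_en(fr).strip()
        if en:
            # Synthetic English source, authentic in-domain French target.
            pairs.append((en, fr))
    return pairs


# Toy translator standing in for a real French-to-English MT system.
def toy_fr_to_en(sentence: str) -> str:
    return {"Ils habitaient là-bas": "They were living there"}.get(sentence, "")


pairs = make_synthetic_pairs(["Ils habitaient là-bas", "  "], toy_fr_to_en)
print(pairs)
```

The target side stays human-written in-domain text, so the English-French system learns to produce domain-appropriate French even though the source side is machine-generated.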
  • 32. eTranslation: MT system for 24 official EU languages + Icelandic and Norwegian (Bokmål) ● 130+ out of 552 language pairs, often from or into English ● Sometimes pivot: Finnish → English → Portuguese ● Management: DG Translation (technical), DG CNECT (EU’s MT policy) ● Users: translators of DG Translation, public administrations in the EEA ● Free use ● Confidentiality and security
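The figure of 552 is consistent with counting ordered pairs among the 24 official EU languages (24 × 23 = 552). The sketch below checks that count and illustrates pivoting through English. The `route` helper and the set of directly supported pairs are hypothetical and do not describe eTranslation's actual configuration.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

# ISO 639-1 codes of the 24 official EU languages.
EU_LANGS = [
    "bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr", "ga", "hr",
    "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro", "sk", "sl", "sv",
]

# Every ordered (source, target) pair with source != target.
DIRECTED_PAIRS = list(permutations(EU_LANGS, 2))


def route(
    src: str, tgt: str, direct: Set[Tuple[str, str]], pivot: str = "en"
) -> Optional[List[str]]:
    """Return the chain of languages to translate through, or None."""
    if (src, tgt) in direct:
        return [src, tgt]
    if (src, pivot) in direct and (pivot, tgt) in direct:
        return [src, pivot, tgt]
    return None


# Hypothetical set of directly supported pairs, for illustration only.
HYPOTHETICAL_DIRECT = {("fi", "en"), ("en", "pt")}

print(len(DIRECTED_PAIRS))
print(route("fi", "pt", HYPOTHETICAL_DIRECT))
```

Pivoting lets a system cover a pair without dedicated training data, at the cost of compounding errors from two translation steps.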
  • 33. eTranslation: MT system for 24 official EU languages + Icelandic and Norwegian (Bokmål) ● User interface: snippets, documents ● API: online services, … ● Domain of training data: legal and administrative texts ● Specific MT systems for some organisations 🡪 E.g. Court of Justice (French ⇄ X)
  • 34. eTranslation: language imbalance ● Resource-rich language pairs (plenty of parallel data), e.g. English-French ● Resource-poor language pairs, e.g. English-Irish, English-Icelandic 🡪 Lower MT quality Difference in amount of training data for language pairs
  • 35. eTranslation: language imbalance Approach: build multilingual models ● Recent research topic in MT ● Translation from many languages into one, from one into many, etc. ● Language pairs that “learn” from each other how to translate (pieces of) words ● Surprising improvements for resource-poor language pairs Difference in amount of training data for language pairs
  • 36. eTranslation: language imbalance Approach: build multilingual models (continued) ● Recent workshop in Luxembourg, organised by CrossLang for DG CNECT 🡪 Moderated by high-profile expert from Facebook ● Google AI group: attempts at creating “universal MT” (102 languages for now) ● Opportunity for scaling up MT Difference in amount of training data for language pairs
  • 38. MT for culture: potential uses ● Post-editing: e.g. static text on websites ● Gisting: e.g. dynamic text like visitors’ comments ● Cross-lingual retrieval: e.g. search for objects having metadata in another language
  • 39. MT for culture: context demand Metadata consisting of short text fragments Title: note, bank ● bank = “financial institution” / “location near river”? ● note = “comment” / “money”?
  • 40. MT for culture: context demand Metadata consisting of short text fragments Title: note, bank Subject: paper money ● note = “comment” / “money”? 🡪 Dutch: biljet Approach: make use of the remainder of the metadata
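A very small sketch of "making use of the remainder of the metadata": pick the sense of an ambiguous title word whose cue words overlap most with the other fields. The sense inventory, cue words and `pick_sense` helper are invented for illustration; a real system would rely on the MT model's context handling or a proper lexical resource.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical sense inventory: sense label -> cue words that suggest it.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "note": {
        "comment": {"remark", "annotation", "comment"},
        "money": {"paper", "money", "currency", "banknote"},
    },
}


def pick_sense(word: str, metadata: Dict[str, str]) -> Optional[str]:
    """Choose the sense whose cue words overlap most with the metadata."""
    context: Set[str] = set()
    for value in metadata.values():
        context.update(re.findall(r"\w+", value.lower()))
    scores = {
        sense: len(cues & context) for sense, cues in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    # With no supporting evidence, stay undecided rather than guess.
    return best if scores[best] > 0 else None


record = {"title": "note, bank", "subject": "paper money"}
print(pick_sense("note", record))
```

With the Subject field present, the "money" sense wins, which is the reading that leads to the Dutch translation "biljet" on the slide.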
  • 41. MT for culture: context demand Metadata consisting of short text fragments Approach: make use of the remainder of the metadata 🡪 Approach is also useful for named entity recognition: Description: The Utrecht artist De Heem is regarded as one … Artist: Jan Davidsz de Heem
  • 42. MT for culture: language imbalance Little or no parallel data involving “dead” / minority languages Approach for related languages: use available data + additional techniques ● Minority language + larger language ● Old + new language variant ● Advantage: similar vocabulary, spelling
  • 43. MT for culture: language imbalance Little or no parallel data involving “dead” / minority languages Alternative approach for related languages: train an unsupervised MT system ● Uses monolingual corpora for the two languages ● Identifies similar words and sentences in both languages ● Learns to translate in both directions
  • 44. MT for culture: multimodal sources Translation in case of non-textual objects (including non-digitised text) ● Audio material → speech recognition ● Scanned documents → OCR ● Photographs with text → OCR (?) ● Images without text → text describing image. In all cases: imperfect MT input
  • 45. MT for culture: multimodal sources Translation in case of non-textual objects (including non-digitised text) Approach: correct output using metadata before applying MT OCR: Demer en Capueienen Metadata: … Capucienen …
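The OCR example on this slide can be approximated with fuzzy string matching from Python's standard library. This is only a sketch of the idea: `correct_with_metadata` and the 0.8 similarity cutoff are assumptions, and the metadata here is reduced to the single term visible on the slide.

```python
import difflib
from typing import List


def correct_with_metadata(
    ocr_text: str, metadata_terms: List[str], cutoff: float = 0.8
) -> str:
    """Replace OCR tokens by a sufficiently similar metadata term, if any."""
    corrected = []
    for token in ocr_text.split():
        # get_close_matches returns at most n candidates scoring >= cutoff.
        match = difflib.get_close_matches(token, metadata_terms, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return " ".join(corrected)


print(correct_with_metadata("Demer en Capueienen", ["Capucienen"]))
```

A high cutoff keeps short words such as "en" from being overwritten, so only near-identical tokens are corrected before the text is passed to MT.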
  • 46. Conclusions ● MT for cultural heritage stretches across many dimensions: languages, domains, genres, registers, periods, … ● It is a particularly interesting and demanding area for MT: huge potential of multilingual object metadata, big challenges ● Approaches involve new information sources, refinement of tools and methods
  • 47. Antoine Isaac, R&D Manager, Europeana Foundation: Case study - Content translation and search
  • 48. Aspects of multilingual experience - Content A focused view of our conceptual model of multilingual approach
  • 49. First experiments - Translation of virtual exhibitions
  • 50. Translation of virtual exhibitions Pilot: apply eTranslation to assist manual translation of exhibitions ● Exhibitions from two Generic Services projects: ○ Migration in the Arts and Sciences ○ Rise of Literacy ● 13 people from 11 institutions reviewed translations from English into 8 languages: ○ Dutch, French, Hungarian, Italian, Lithuanian, Polish, Portuguese, Slovenian NB: no German (for which eTranslation has a "cultural" version)
  • 51. Translation of virtual exhibitions Pilot: apply eTranslation to assist manual translation of exhibitions ● The output is medium to good, but the carefully crafted narrative text does not translate well, so partners spent a lot of time rewriting ● The quality is not yet high enough to translate exhibitions sustainably and cost-effectively
  • 52. Ongoing experiments - content translation and search New case study: using translation in search for text objects ● An important need for Europeana (cf. Newspapers, Transcriptions) ● One that may still work with less-than-perfect translations
  • 53. The strategy for using translation in cross-lingual search [Flow diagram; labels:] Original query; Identify language; Translate to English; Translated query (English); Align to entity; User validation; Entity-based query; Search; Multilingual index; #1: French, #2: Spanish, #3: Polish; Search results. Multilingual query: entity-based query + original query + translated query
  • 54. Multilingual search for text objects A focused view on the general strategy Usage scenarios ● Input fulltext to multilingual search ● Enter search query in chosen language ● See search results ● Multilingual search would be extended with fulltext English Outcomes Caveat: no display/UX considerations at this stage!
  • 55. Multilingual search for text objects Proposals - indexing: ● Automated identification of text object language if needed ● Static translation of text objects to English ● Index fulltext in both English and source language Proposals - search: ● Automated identification of language of entered query ● Dynamically translate search phrase to English ● Submit query comprising [original search phrase] + [English translation of search phrase]
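The indexing and search proposals on this slide can be tried out on a toy in-memory index. The `ToyIndex` class, the document and its translation are invented for illustration; language identification and MT are assumed to have happened already, and a production system would use a real search engine.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


class ToyIndex:
    """Minimal inverted index over source-language and English fulltext."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, source_text: str, english_text: str) -> None:
        # Index fulltext in both the source language and English.
        for token in tokenize(source_text) + tokenize(english_text):
            self.postings[token].add(doc_id)

    def search(self, original_query: str, english_query: str) -> Set[str]:
        # Query = [original search phrase] + [English translation].
        hits: Set[str] = set()
        for token in tokenize(original_query) + tokenize(english_query):
            hits |= self.postings.get(token, set())
        return hits


index = ToyIndex()
index.add("de-1", "Briefe nach Hause", "Letters home")

print(index.search("domov", ""))      # original Czech query only
print(index.search("domov", "home"))  # original + English translation
```

The Czech query only reaches the German document through the shared English layer, which is the effect the proposal aims for.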
  • 56. Multilingual search for text objects Validation points: ● How successful is automated language detection? ● What is the projected cost of statically translating fulltext to English? ● Benchmarking of search engine results that compare native language keyword queries with English keyword queries
  • 57. What we've done We have tested our cross-lingual search approach on transcriptions of World War I objects from Transcribathons hosted by the Enrich Europeana project. We have used the CEF eTranslation automatic translation service and have assessed the prototype with a sample of user queries from the Europeana 1914-1918 thematic collection.
  • 58. Data acquisition and processing: text objects (transcriptions) Original corpus: ● 18,257 transcriptions ● 17 languages eTranslation failed in only 404 cases: ● Language not supported (Bosnian) ● Long text (can be fixed)
      Language tag      Transcriptions   Translated to English
      de                9300             9151
      fr                1669             1659
      it                992              973
      ro                578              577
      nl                455              454
      el                364              356
      lv                226              226
      bs                215              0
      cs                90               90
      da                90               90
      sl                7                7
      hu                3                2
      es                2                2
      pl                2                2
      sk                2                2
      hr                1                1
      TOTAL (non-en)    13996            13592
      en                4243             0
      TOTAL             18239            13592
  • 59. Data acquisition and processing: queries Original corpus: ● Sample from Google Analytics, first 10 months of 2019 ● 91 different queries ● 9 languages eTranslation worked in all cases
      Language tag      Queries   Translated to English
      it                29        29
      fr                14        14
      de                12        12
      pl                6         6
      es                3         3
      nl                2         2
      ro                2         2
      cs                1         1
      TOTAL (non-en)    69        69
      en                22        0
      TOTAL             91        69
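The 404 failures can be reproduced from the per-language counts on the transcription slide. The snippet below is a plain arithmetic check on the published figures; the two derived percentages are computed here and are not stated on the slide.

```python
# (transcriptions, translated to English) per non-English language tag.
COUNTS = {
    "de": (9300, 9151), "fr": (1669, 1659), "it": (992, 973),
    "ro": (578, 577), "nl": (455, 454), "el": (364, 356),
    "lv": (226, 226), "bs": (215, 0), "cs": (90, 90), "da": (90, 90),
    "sl": (7, 7), "hu": (3, 2), "es": (2, 2), "pl": (2, 2),
    "sk": (2, 2), "hr": (1, 1),
}

total = sum(n for n, _ in COUNTS.values())
translated = sum(t for _, t in COUNTS.values())
failures = {lang: n - t for lang, (n, t) in COUNTS.items() if n != t}

print(total, translated, sum(failures.values()))
print(failures)

# Share of non-English transcriptions translated, overall and
# excluding the unsupported language (Bosnian).
overall_rate = round(100 * translated / total, 1)
supported_total = total - COUNTS["bs"][0]
supported_rate = round(100 * translated / supported_total, 1)
print(overall_rate, supported_rate)
```

More than half of the failures (215 of 404) come from Bosnian alone, so the remaining failures are concentrated in a few languages, mostly German.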
  • 60. Results Translation brings more results in!
      Original query    Language  Translated query  Results (original query)  Results (translated query)  New docs retrieved thanks to translation
      domov             cs        home              2                         1529                        1527
      Bernhard Stiens   de        Bernhard Stiens   16                        21                          8
      cimitero          de        ciemitero         0                         0                           0
      eastern front     de        Eastern front     345                       1272                        955
      lagazuoi          de        lapiönoi          0                         0                           0
      letters           de        letters           25                        1935                        1913
      nova vas          de        Nova vas          4                         31                          29
      Pinsk             de        Pinsk             1                         1                           0
      podgora           de        podgora           1                         7                           6
      Rokitno           de        Roitno            0                         0                           0
      san elia          de        San elia          40                        49                          16
      Talies            de        Talies            0                         2                           2
      women             de        women             4                         255                         251
      antonio sordi     it        Antonio Deaf      12                        25                          14
      Asiago            it        Asiago 1)         4                         2552                        2548
      avion             it        Avion             0                         4                           4
      bini cima         it        Bini top          3                         837                         835
      celle lager       it        lager cells       2                         56                          56
  • 61. Example Kriegstagebuch von Peter Arabin contributed by Sigrid Arabin-Möhrer CC-BY-SA https://www.europeana.eu/portal/en/record/2020601/http s___1914_1918_europeana_eu_contributions_6461.html
  • 62. Evaluation We didn't have time to do a fine-grained evaluation of the relevance of results, especially for accuracy. [Results table repeated from slide 60] What price are we ready to pay for such results?
  • 63. Evaluation 1 - reproducing original results with translations For each language, we tested the overlap between results without translation and results with translation, for queries and docs in that language ● 67% of the original results are retrieved after translation. Extrapolation: if we use translation, we can expect to discover 67% of the records in other languages that are more likely to be good. ● 49% of translation-based results are confirmed in the original language. Extrapolation: we would have to assume that 51% of the results are more likely to be noisy. This is interesting, but we need more evaluation, especially since ● We could do it for only 5 languages (in the others the original queries had 0 results). ● We cannot assess possible beneficial side effects of translation over the monolingual case, such as matching synonyms.
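The two percentages on this slide correspond to two overlap ratios between the result set of the original query and that of the translated query. A minimal sketch is given below; the `overlap_rates` helper and the document identifiers are made up and do not reproduce the study's data.

```python
from typing import Optional, Set, Tuple


def overlap_rates(
    original_hits: Set[str], translated_hits: Set[str]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (share of original results retrieved after translation,
    share of translation-based results confirmed by the original query)."""
    common = original_hits & translated_hits
    retained = len(common) / len(original_hits) if original_hits else None
    confirmed = len(common) / len(translated_hits) if translated_hits else None
    return retained, confirmed


# Made-up result sets for a single query.
original = {"doc1", "doc2", "doc3"}
translated = {"doc2", "doc3", "doc4", "doc5"}

retained, confirmed = overlap_rates(original, translated)
print(retained, confirmed)
```

The first ratio behaves like recall against the monolingual results and the second like precision, which is why queries with zero original results (returned here as `None`) cannot contribute to the first figure.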
  • 64. Evaluation 2 - evaluating query translations Assessing the quality of translations for the 69 non-English queries original query (WWI collection) language translated query good translation bad translation wrong language named entity, no transl. applicable named entity, transl. applicable [...] domov cs home 1 Bernhard Stiens de Bernhard Stiens 1 cimitero de ciemitero 1 1 eastern front de Eastern front 1 lagazuoi de lapiönoi 1 1 letters de letters 1 nova vas de Nova vas 1 Pinsk de Pinsk 1 podgora de podgora 1 Rokitno de Roitno 1 1 san elia de San elia 1 Talies de Talies 1 women de women 1 antonio sordi it Antonio Deaf 1 1 Asiago it Asiago 1) 1 1 avion it Avion bini cima it Bini top 1 1 celle lager it lager cells 1 1 1 cellelager it celager eastern front it Eastern front 1 fogliano it Fogliano 1 gaudioso matteo it Mr Matteo 1 1 gay flavio it Mr Gay Flavio 1 1 germania it Germany 1 1
  • 65. Evaluation 2 - evaluating query translations Winnowing the original set ● In 22 cases the system was given wrong input, like typos or wrong language (einsenbahn in French?) ● In 4 cases we couldn't guess the user's intention (avion on the Italian portal) On the remaining 43 queries ● 37 queries were entities to be left unchanged, e.g., Bernhard Stiens (as opposed to Italia). eTranslation correctly handled 20 of them (54%). ● eTranslation correctly translated 5 of the 6 remaining cases (83%). General observation: in our case, we're straight into the long tail of the queries. Image: Frankreich, Avion.- Soldatenfriedhof, Bundesarchiv, CC-BY-SA http://www.bild.bundesarchiv.de/archives/barchpic/search/_1268685391/
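The counts on this slide add up as follows. This is a simple arithmetic check on the published numbers; the combined rate over all 43 usable queries is derived here and is not stated on the slide.

```python
non_english_queries = 69
wrong_input = 22       # typos, wrong language
unclear_intent = 4     # user's intention could not be guessed
usable = non_english_queries - wrong_input - unclear_intent

entities = 37          # named entities to be left unchanged
entities_ok = 20
translatable = usable - entities
translatable_ok = 5

entity_rate = round(100 * entities_ok / entities)
translation_rate = round(100 * translatable_ok / translatable)
combined_rate = round(100 * (entities_ok + translatable_ok) / usable)

print(usable, translatable)
print(entity_rate, translation_rate, combined_rate)
```

Because 37 of the 43 usable queries are named entities, the overall outcome is dominated by how well entities are left untouched rather than by translation quality proper.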
  • 66. Future work ● Really evaluate the relevance of cross-lingual search results ● Scale up ● Extend to metadata ● Evaluate the impact of cross-lingual search on search performance ● Better handle named entities ● Better language identification ● Decide if query translation is really the way to go...
  • 67. The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain europeana.eu @EuropeanaEU