Europeana and multi-lingual access – challenges and possibilities
First, an introduction to Europeana: what we are and what we are not (www.europeana.eu).
Then, the major challenges facing Europeana concerning multi-lingual access, with a sketch of possible solutions:
Challenge: Ontologies and multi-lingual labelling of metadata
Challenge: Query translation
Challenge: Results translation of metadata
Challenge: Localisation of the Europeana portal
Painter: Lucas van Valckenborch
File: http://commons.wikimedia.org/wiki/File:Valckenborch_tower-babel.jpg
Challenge: Ontologies
Currently we use multi-lingual ontologies to create multi-lingual labels and index them for search.
This is probably our main route forward.
However, it is difficult to find ontologies and authority files that cover all Europeana languages (the EU 27 languages).
Operational ontologies in Europeana: DBpedia, GEMET, GeoNames
Other ontologies we are looking at: VIAF, LCSH
We prefer openly licensed resources.
We prefer resources modelled in SKOS.
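The labelling approach described above can be illustrated with a minimal sketch. It assumes a tiny in-memory stand-in for a SKOS-style vocabulary; the concept URIs, the labels, and the function names `build_label_index` and `enrich` are hypothetical and are not Europeana's actual data or code.

```python
from collections import defaultdict

# Hypothetical SKOS-like concepts: URI -> {language code: preferred label}.
# These entries are made up for illustration only.
CONCEPTS = {
    "http://example.org/concept/tower": {"en": "tower", "fr": "tour", "de": "Turm"},
    "http://example.org/concept/painting": {"en": "painting", "fr": "peinture", "de": "Gemälde"},
}


def build_label_index(concepts):
    """Map every label (case-folded, any language) to the concepts that carry it."""
    index = defaultdict(set)
    for uri, labels in concepts.items():
        for label in labels.values():
            index[label.casefold()].add(uri)
    return index


def enrich(record_subjects, concepts, index):
    """Return the labels, in all languages, of every concept matched by a subject term."""
    labels = set()
    for term in record_subjects:
        for uri in index.get(term.casefold(), ()):
            labels.update(concepts[uri].values())
    return sorted(labels)


if __name__ == "__main__":
    label_index = build_label_index(CONCEPTS)
    # A record catalogued only in German picks up French and English labels too.
    print(enrich(["Turm"], CONCEPTS, label_index))
```

Indexing the enriched labels alongside the original metadata is what would let a search in one language find a record that was catalogued in another.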
Challenge: Query translation
Under development
Main efforts are part of EuropeanaConnect Work Packages 1 and 2 (www.europeanaconnect.eu)
Basis is language identification
Named entity recognition
Licensed resources: XEROX, CELI
Open resources:
Language resources registry: http://europeanalabs.eu/wiki/LinguisticResourceRegister
Inventory of vocabularies and language resources: http://europeanalabs.eu/wiki/WP12Vocabularies and http://europeanalabs.eu/wiki/WP2LanguageResources
Google and Bing Translation APIs: very good at translating to and from English
Evaluation of Proprietary vs. Open vs. Google/Bing (morphological/dictionaries)
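One way these pieces could fit together is sketched below: identify the query language, leave named entities untouched, and translate the remaining terms with a bilingual dictionary. This is only an illustration under stated assumptions. The stopword lists, the lexicon, the named-entity set, and the function names are all hypothetical toy stand-ins, not the EuropeanaConnect components or any of the licensed or open resources listed above.

```python
# Hypothetical toy resources; a real system would use proper language resources.
STOPWORDS = {
    "en": {"the", "of", "and", "in"},
    "fr": {"le", "la", "de", "et", "du"},
    "de": {"der", "die", "das", "und", "von"},
}
LEXICON = {("fr", "en"): {"tour": "tower", "peinture": "painting"}}
NAMED_ENTITIES = {"babel"}  # stand-in for a named entity recogniser


def identify_language(query, default="en"):
    """Guess the language by counting stopword hits; fall back to a default."""
    tokens = query.casefold().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


def translate_query(query, target):
    """Translate a query term by term, keeping named entities as they are."""
    source = identify_language(query)
    if source == target:
        return query
    lexicon = LEXICON.get((source, target), {})
    translated = []
    for token in query.split():
        key = token.casefold()
        if key in NAMED_ENTITIES:
            translated.append(token)  # named entities are not translated
        elif key in STOPWORDS[source]:
            continue  # drop function words of the source language
        else:
            translated.append(lexicon.get(key, token))  # unknown terms pass through
    return " ".join(translated)


if __name__ == "__main__":
    print(translate_query("la tour de Babel", "en"))
```

Terms missing from the dictionary pass through unchanged, which is where a fallback to a commercial translation API could be slotted in.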
Challenge: Results translation
Already in production in the Europeana portal
Are commercial APIs the only practical option? They cover numerous languages and have well-documented APIs that are easy to work with.
Problem: they can be shut down, as with the Google Translate API, which will be shut down in December 2011.
Are there open and free alternatives?
Crowdsourced translation is something we are considering.
However, even if successful, it will barely dent our c. 20 million metadata records!
Challenge: Localisation
Currently we rely on volunteers in our own network of Europeana partner institutions.
Problem with scale!
Solution? Larger translation communities, e.g. TranslateWiki
Any questions?
This poster by an unknown artist is courtesy of the Municipal Library of Lyon. The work is in the public domain.
Slides 2-5 are taken from the Europeana Strategic Plan.
Europeana and multi-lingual access – challenges and possibilities
FLaReNet Forum 2011
David Haskiya, Product Developer & Project Coordinator
Europeana, www.europeana.eu
Semantic enrichment and multi-lingual labelling Ontologies
Persons, places, periods, subjects
GEMET, DBpedia, GeoNames, etc.
Query translation Language resources – open and licensed, commercial APIs
Licensed resources, e.g. XEROX or CELI
Open resources, e.g. WordNet
Free but commercial APIs, e.g. Google and Bing
Weighing strengths and weaknesses
Commercial APIs the only realistic solution? Results translation
Commercial translation APIs
Google & Bing (Microsoft)
Crowdsourced metadata translations
But even if successful, we have 20 million records…
Or nothing? Are there open, easy to develop against alternatives?
How do we translate the user interface and editorial texts into 27 languages? Localisation