Europeana meeting under Finland’s Presidency of the Council of the EU - Day 1, 24 October 2019

  1. 1. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 EUROPEANA MEETING UNDER FINLAND’S PRESIDENCY OF THE COUNCIL OF THE EU ESPOO, FINLAND 24 October 2019
  2. 2. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Andy Neale Technical Director Europeana Foundation
  3. 3. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain Contribution to EU GDP by culture and creative sectors: 4.2%. Trade surplus in cultural goods: € 8.7B. Source: New Agenda for Culture
  4. 4. Automotive + Manufacturing + Chemical Industries Cultural + Creative Sector 7.8M 4.4M> Employment
  5. 5. young professionals (15-29 yrs old) 19.1%
  6. 6. The role of Europeana Europeana Party People @ Christmas party, CC BY
  7. 7. We support cultural heritage institutions in their digital transformation
  8. 8. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain 3.700 CHIs across Europe
  9. 9. EUROPEANA COLLECTIONS 58m Cultural heritage records Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsnitteln, Götze, Johann August Ferdinand, 1773-1819 Biblioteca Digital de Madrid Spain, Public domain 2.5bln Information items
  10. 10. 1. Common Tech & data architecture Europeana Data Model + Metis
  11. 11. 2. Common policies & standards Europeana ● Licensing Framework ● Publishing Framework
  12. 12. Statements for works that are not in copyright Statements for works where the copyright status is unclear Statements for works that are in copyright
  13. 13. 3. Websites & APIs Europeana Collections
  14. 14. Programme Europeana Party People @ Christmas party, CC BY
  15. 15. Objectives 1. Stimulate reflection on multilingualism in digital cultural heritage at large using Europeana as a case study; 2. Develop a deeper understanding of the multilingualism problem/opportunity space for digital cultural heritage; 3. Consider what options can be pursued to provoke action at the local level, furthering the multilingual capabilities; 4. Provide input and feedback for the Europeana multilingual strategy.
  16. 16. Sessions 1. Setting the scene 2. User interactions 3. Multilingual metadata 4. Content translation 5. Conclusions and steps for progress
  17. 17. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Juliane Stiller Information Specialist You, We & Digital ‘Multilingual Developments in Digital Cultural Heritage Domain: Problem Space & Solutions’
  18. 18. ● 10 years as a researcher at Humboldt-Universität zu Berlin in Europeana-related projects ● multilinguality, interaction patterns, metadata and its quality, research on search and browse, retrieval, evaluation ● since 2019 consultancy and training in digital literacy @stillinsky
  19. 19. Agenda • Multilinguality: the problem space • Bridging the language gap • Translations • Enrichments • What is left to do?
  20. 20. Multilinguality The Problem Space Christoffel van Sichem: Bouw van de toren van Babel
  21. 21. Layers of digital CH system ● User Interface: show user’s preferred language ● Interactions: search, browse & explore ● Information Access: bridge the gap between language of user input and content ● Content: metadata and digital CH objects
  22. 22. User Interface Challenges: • Translation of static and dynamic pages • Switching languages via text or icons such as flags • Default language • Determine the user’s preferred language through IP address or browser settings
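
The "browser settings" option on this slide usually means reading the `Accept-Language` request header. The following is a minimal sketch of that idea, not Europeana's implementation; the function names and the list of supported interface languages are assumptions made for illustration.

```python
def preferred_languages(accept_language: str) -> list[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(accept_language.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q-value is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    # Highest quality first; header order breaks ties.
    return [tag for _, _, tag in sorted(weighted)]


def pick_interface_language(accept_language: str, supported: list[str],
                            default: str = "en") -> str:
    """Choose the first preferred language that the interface offers."""
    for tag in preferred_languages(accept_language):
        primary = tag.split("-")[0]  # e.g. "pt-br" falls back to "pt"
        if tag in supported:
            return tag
        if primary in supported:
            return primary
    return default  # the "default language" challenge from the slide


# Hypothetical set of interface languages.
SUPPORTED = ["en", "fi", "sv", "pt"]
choice = pick_interface_language("pt-BR,pt;q=0.9,en;q=0.5", SUPPORTED)
```

A header-based guess is only a starting point: the slide's other challenges (explicit language switching, a sensible default) still apply when the header is missing or misleading.
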
  23. 23. Interactions: Search Interactions
  24. 24. Mismatch between query and content language • Mona Lisa 203 results • Monna Lisa 13 results • La Gioconda 376 results • La Joconde 78 results Interactions Roma, Galleria Corsini - La Gioconda,
  25. 25. Interactions: Browse ● Search vs. browse ● (Metadata) text vs. object Interactions
  26. 26. Interactions: Explore. Cater for different information needs in different languages: • Entities • Colors, format • Access & copyrights • Inspiration Interactions
  27. 27. Content & Metadata Image credit: both from Europeana with title „Kinderbuch” from Spielzeugmuseum der Stadt Nürnberg (CC BY-NC-SA) Content
  28. 28. Metadata multilinguality + 40 other languages... Content
  29. 29. Bridging the language gap Translations & Enrichments Bridge by Mark Robinson (CC-BY 2.0)
  30. 30. To bridge the gap between language of user input and content, one can translate 1. Queries 2. Content / Metadata 3. .....
  31. 31. 1) Translating queries Query English Spanish French .... comes with challenges .... Database Information Access
  32. 32. Cultural heritage queries κερκυρα poblet bævre humble østerskovvej espana salamanca academia coleccam documentos estatutos εσκι σεχιρ first war world berlin berliner mauer or wall alphonse mucha
  33. 33. Query heterogeneity & long tail Europeana queries in a month in 2016 442 times: Wolfgang Amadeus Mozart once: full history of ging tsholing in bhutan
  34. 34. Queries in cultural heritage are ● Short ● Heterogeneous ● Focused on entities: 61.96% of the queries contain named entities (NE) (Stiller, Gäde & Petras, 2010) ● Highly ambiguous in language: ○ “culture”, “administration”, “paris”, “madonna” ● Semantically ambiguous: ○ “barber” (composer or hairdresser)
  35. 35. Multilingual academic search ● informational queries from the psychology domain in 4 languages: pubpsych.eu ● Building domain-specific lexical resources and mapping them to queries; entries look like this: ○ wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre ○ wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre ○ Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre ● Translation does not depend on language identification ● Deals well with NE -> no match in lexicon, no translation More info on the project: https://www.clubs-project.eu/en/
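
The lexicon entries on this slide can be read as a source term followed by `lang:translation` fields separated by `|||`. The sketch below parses that layout and translates a query by plain lookup, so no language identification is needed and an unknown term (for example a named entity) is simply left untranslated. Whole-query, case-insensitive matching is an assumption made here; the project may match terms differently.

```python
# The three sample entries shown on the slide.
LEXICON_LINES = [
    "wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre",
    "wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre",
    "Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre",
]


def load_lexicon(lines):
    """Map each lower-cased source term to a {language: translation} dict."""
    lexicon = {}
    for line in lines:
        source, *targets = line.strip().split("|||")
        translations = {}
        for target in targets:
            lang, _, term = target.partition(":")
            translations[lang] = term
        lexicon[source.lower()] = translations
    return lexicon


def translate_query(query, lexicon):
    """Return {language: translation}; empty when the lexicon has no match."""
    return dict(lexicon.get(query.strip().lower(), {}))


lexicon = load_lexicon(LEXICON_LINES)
german_hit = translate_query("Wohlbefinden", lexicon)
entity_miss = translate_query("Wolfgang Amadeus Mozart", lexicon)
```

Because the lookup key is the term itself, the same table serves German and English input without first guessing the query language.
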
  36. 36. 2) Translate the content Query Spanish French German English Content English French German Spanish Content Database
  37. 37. Metadata heterogeneity & sparsity https://www.europeana.eu/portal/en/record/92022/BibliographicResource_1000125938148.html
  38. 38. Challenges • Missing training data for small languages • Missing training data for (sub)domains • Number of language pairs is immense with 50+ languages • Metadata is too sparse for good translation results
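
To make "immense" concrete, the number of language pairs can be counted directly. This is simple combinatorics rather than a figure from the talk; 50 is used as the lower bound the slide gives.

```python
from math import comb


def translation_directions(n_languages: int) -> int:
    """Ordered pairs: every language translated into every other language."""
    return n_languages * (n_languages - 1)


def language_pairs(n_languages: int) -> int:
    """Unordered pairs, if one bidirectional resource serves both directions."""
    return comb(n_languages, 2)


directions_50 = translation_directions(50)  # 50 * 49
pairs_50 = language_pairs(50)               # 50 * 49 / 2
directions_24 = translation_directions(24)  # the 24 official EU languages
```

With 50 languages that is 2450 translation directions (1225 unordered pairs); even the 24 official EU languages alone give 552 directions.
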
  39. 39. Enrichment
  40. 40. Enrich metadata
  41. 41. Enriched entities in Europeana: number of enriched objects, their type and vocabularies ● Locations: GeoNames, 7 million ● Concepts: GEMET and DBpedia, 9.2 million ● Time: Semium Time, 10.2 million ● Agents: DBpedia, 144,000
  42. 42. Semantically incorrect enrichment Polen (Dutch) Polen (Basque)
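
One way to picture the "Polen" problem: the same string is a label in more than one language, so matching on the string alone links a record to the wrong entity. The sketch below restricts label matching to the language of the metadata value when that language is known. The vocabulary, its `example.org` URIs and the reading of the Basque label as "pollen" are assumptions for illustration, not Europeana's enrichment data.

```python
# Hypothetical mini-vocabulary: URI -> {language: label}.
VOCABULARY = {
    "http://example.org/place/poland": {"nl": "Polen", "de": "Polen", "en": "Poland"},
    "http://example.org/concept/pollen": {"eu": "polen", "en": "pollen"},
}


def enrich(value, value_language=None, vocabulary=VOCABULARY):
    """Return URIs whose label equals the value.

    When value_language is given, only labels in that language may match;
    when it is None, every language is tried, which is where wrong links arise.
    """
    needle = value.casefold()
    matches = []
    for uri, labels in vocabulary.items():
        for lang, label in labels.items():
            if label.casefold() != needle:
                continue
            if value_language is None or lang == value_language:
                matches.append(uri)
                break  # one matching label per entity is enough
    return matches


ambiguous = enrich("Polen")          # language unknown: both entities match
dutch_only = enrich("Polen", "nl")   # language known: only the place matches
```

The design choice is to prefer a missing enrichment over a wrong one: a record without a language tag would be flagged as ambiguous rather than linked automatically.
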
  43. 43. What is left to do?
  44. 44. Adapt to queries Entity graphs for exploration • Object • Person • Concept • Period • Location • Event
  45. 45. Evaluate solution based on goal ○ E.g. for ML retrieval we might not need the perfect fluent translation ○ Identify the impact of different workflows / processes on multilinguality of system ○ Translations do not only have an impact on data but also on retrieval and therefore on user satisfaction
  46. 46. Thank you! http://tatecollectives.tumblr.com/tagged/1840s-GIF-Party @stillinsky hello@you-we-digital.com
  47. 47. References • Petras, V., Hill, T., Stiller, J., & Gäde, M. (2017). Europeana – a Search Engine for Digitised Cultural Heritage Material. Datenbank-Spektrum, 1–6. https://doi.org/10.1007/s13222-016-0238-1 • Hill, T. D., Charles, V., Isaac, A., & Stiller, J. (2016). “Searching for Inspiration”: User Needs and Search Architecture in Europeana Collections. ASIS&T 2016 Annual Meeting. • Manguinhas, H. (2016). Europeana Semantic Enrichment Framework. Documentation, Europeana. https://docs.google.com/document/d/1JvjrWMTpMIH7WnuieNqcT0zpJAXUPo6x4uMBj1pEx0Y • Stiller, J. (Ed.) (2016). Best practices for multilingual access. Tech. rep. http://pro.europeana.eu/files/Europeana_Professional/Publications/BestPracticesForMultilingualAccess_whitepaper.pdf • Stiller, J., Gäde, M., & Petras, V. (2013). Multilingual access to digital libraries: The Europeana use case. Information-Wissenschaft Und Praxis, 64, 86–95. • Olensky, M., Stiller, J., & Dröge, E. (2012). Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. In 6th Research Conference, MTSR 2012, Cádiz, Spain, November 28-30, 2012 (pp. 252–263). Berlin: Springer. • Stiller, J., Gäde, M., & Petras, V. (2010). Ambiguity of Queries and the Challenges for Query Language Detection.
  48. 48. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Rickard Domeij Language Planner Language Council of Sweden, Institute of Language and Folklore Multilingualism, technology and language policy
  49. 49. Content ● The LC and the multilingual language policy of Sweden (and EU) ● Multilingually accessible services ● Language technology (LT) and language resources ● National Language Bank ● First experiences in digital humanities and cultural heritage ● Challenges for LT in cultural heritage ● Next steps
  50. 50. Multilingual language policy ● The LC monitors and promotes the languages of Sweden and their use ● Language policy (2005) and Language Act (2009) ● Status and rights to use Swedish and other languages in Sweden ● National minority languages: Sami, Meänkieli, Finnish, Romani, Yiddish ● Swedish Sign Language, Nordic languages, EU languages, immigrant languages ● Public agencies have to reach out to the whole population ● Also good for business
  51. 51. Multilingually accessible services ● Vision: a multilingual society in which all citizens are included with respect to different backgrounds and languages --> digital inclusion ● Access to info and services according to language rights and needs ● Switch between languages and modes according to preferences ● Example: have a web text read aloud in your language ● Essential for people with disabilities but also useful for others = design for all (e.g. subtitling)
  52. 52. LT to make it possible ● Conversions between languages and modes ● Different modes: writing, speech, gestures … ● Multilingualism = multilinguality + multimodality ● LT modules: text-to-speech (TTS), speech-to-text (STT), machine translation (MT) … ● Applications: recitation, dictation, translation … ● Voice translation: STT > MT > TTS
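
The last bullet, STT > MT > TTS, describes a chain in which each module's output feeds the next. The sketch below shows only that composition: the three stage functions are toy stand-ins invented here (bytes decoded as text, a two-word dictionary), not real speech or translation components.

```python
def compose(*stages):
    """Chain stages left to right: the output of one is the input of the next."""
    def pipeline(value):
        for stage in stages:
            value = stage(value)
        return value
    return pipeline


# Toy stand-ins, labelled as such: real systems would call STT, MT and TTS engines.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the "audio" is already its transcript


TOY_DICTIONARY = {"hej": "hello", "världen": "world"}  # hypothetical sv -> en


def machine_translate(text: str) -> str:
    return " ".join(TOY_DICTIONARY.get(word, word) for word in text.lower().split())


def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the bytes are synthesized speech


voice_translation = compose(speech_to_text, machine_translate, text_to_speech)
spoken_english = voice_translation("Hej världen".encode("utf-8"))
```

Keeping the stages separate is what allows the other applications on the slide (recitation, dictation, translation) to reuse the same modules in different combinations; it also means recognition errors propagate into translation.
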
  53. 53. LT to make it possible II ● Problems with quality and trust, especially on unrestricted data ● User and domain adaptation, user interaction ● Ex: respeaking system for subtitling on tv ● Accessibility often means loss of quality, but other gains ● Accessible and usable
  54. 54. Language resources needed ● Data and tools: corpora, markup tools, lexicons, language models … ● Rule-based methods, especially for less resourced languages ● Market forces are not enough ● Stimulate the development of LT and multilingually accessible services by national means (ex: respeaking system for Swedish tv) ● National Language Bank (NLB) to make resources available for R&D An NLB promotes the development of technology, which benefits the languages in Sweden and improves access to information for everyone. Digital agenda for Sweden (2011)
  55. 55. National research infrastructure (2017-00626) funded by the Swedish Research Council by 1,5 mil./year until 2025. Two main types of data: ● Multilingual texts and terms from PAs ● Multimodal cultural heritage collections
  56. 56. First experiences in cultural heritage ● Available voice recognition and MT don’t work! ● Instead try other methods: ○ ”sound browsing” to explore speech recordings acoustically ○ respeaking for transcribing speech ○ transcription of handwritten dialect text in Transkribus ○ time-alignment of existing transcripts to sound in ELAN ○ linking from text to speech data in the archives (see next page) ● Usage-centered, participative design in multidisciplinary teams ● Tilltal project (SAF16-0917:1)
  57. 57. First experiences in cultural heritage ● State-of-the-art voice recognition and MT don’t work! ● Instead try other methods: ○ ”sound browsing” to explore speech recordings acoustically ○ respeaking for transcribing speech ○ transcription of handwritten dialect text in Transkribus ○ time-alignment of existing transcripts to sound in ELAN ○ linking from text to speech data in the archives ● Usage-centered, participative design in multidisciplinary teams ● Tilltal project (SAF16-0917:1)
  58. 58. Challenges for LT in cultural heritage ● Interface or content (= multilingual in a broad sense) ● Far beyond modern standard language use ● Great variation makes domain adaptation hard ● Variation in place (dialects and languages), time (old Swedish) and situation (informal-formal) ● Modal variation in collections: (handwritten) text, speech, pictures ● Hard to handle as researchers want to explore a collection as a whole
  59. 59. Next steps ● Linked data to describe the collection conceptually and relationally ● Multilingual search methods for handling language variation in place, time and situation ● Domain-adapted speech-to-text conversion to transcribe recordings ● Crowdsourcing for correcting ● Shared resources for the languages, dialects, domains etc. ● Long-term funding for the National Language Bank ● Collaborative projects involving LTists, researchers and data holders
  60. 60. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Andrejs Vasiļjevs, Executive Chairman, Tilde; Jānis Ziediņš, Project Manager, Culture Information Systems Centre, Riga. Learnings from the automatic translation projects and how to apply them for the culture and heritage sector
  61. 61. Culture information systems centre Our mission is to assist cultural heritage institutions (archives, libraries, museums) to maintain and make available cultural heritage for future generations through the latest information technology solutions.
  64. 64. Benefit for eGovernment State Gov.lv platform Platform for the provision and management of e-Services Single Public Administration Data Area Municipality IS Other IS State information systems MT platform OpenData
  73. 73. Digitization of the Cultural Heritage Content The National Library of Latvia is implementing a European Regional Development Fund (ERDF) and nationally co-funded project in the field of Latvia's digital cultural heritage, together with project partners – the National Archives of Latvia, the State Inspection for Heritage Protection of Latvia, and the Cultural Information System Centre. The project will further develop the Digital Object Management and Conservation System, develop the Copyright Management and Content Licensing System, publish several Open Datasets, including Related Open Datasets, and develop the Stage of an Integrated Centralized Open System Information Platform.
  76. 76. Translation test (VRVM 176655, http://www.nmkk.lv/Items/ItemViewForm.aspx?id=167748) ● Manual translation: A photomontage postcard with five views of Riga. The central city panorama with the new Pontoon Bridge opened in 1896 and the Mazā Guild building in the right corner. Below these images, the city theatre, Vērmanes Garden and the bridge across the canal by Bastejkalns. ● Hugo.lv translation: A postcard is assembled from five views of Riga - downtown panorama with the new Pontonbridge discovered in 1896, the Little Guild House in the right corner, under these images - City Theatre, Verman Gardens, a bridge over the canal near BastejHill.
  77. 77. AI for breaking language barriers
  78. 78. Enablers of AI ● ML Algorithms ● Computing Power ● Big Data
  79. 79. Based on Tilde Neural MT technologies that won 1st place at WMT 2017-2019, a global competition between the world’s top language technology providers (Best WMT 2017, Best WMT 2018, Best WMT 2019)
  80. 80. Data for MT System Development • Generic MT systems were trained on 52 million parallel sentences • Cultural domain MT systems were customized with an additional 826 000 parallel sentences and 5 million monolingual sentences. Data sources: ● Books: fiction; scientific literature; technical literature (manuals, instructions) ● News and web content: news from popular media (also multilingual media); company press releases; multilingual web site content ● Public sector data: laws, regulations, directives, etc.; documents of internal and external use; press releases; public sector web site data ● Proprietary translation memories: professional and amateur translator produced data; translation memories of translation and localisation service providing companies; translation memories of international organisations
  81. 81. Comparison to Google – Automatic Evaluation
  82. 82. Comparison to Google – Human Evaluation
  83. 83. Usability, productivity and integration Translation add-on for browsers Translation API Plug-ins for CAT tools Translation widget
  84. 84. Hugo.lv – AI powered language technology portal
  86. 86. 1.1 million terms 22 subject fields 164 216 terms in culture domain
  87. 87. EU Council Presidency Translator 2017-2020
  88. 88. EU COUNCIL PRESIDENCY TRANSLATOR
  89. 89. EU PRESIDENCY TRANSLATOR AI-powered Neural MT ● CEF eTranslation MT systems for the 24 official EU languages, enabling translation of full documents and preserving text formatting ● AI-powered custom Neural MT providing superior-quality translation adapted for the Presidency requirements
  90. 90. Web Site – Text Translation
  91. 91. Formatting-Rich Document Translation
  92. 92. Website Translation
  93. 93. BENEFITS FOR ESTONIA, BULGARIA, AUSTRIA • Enables Presidency staff to quickly translate documents • Empowers visiting journalists and delegates to access info in the local language, e.g., press releases, local news sites • Supports staff translators in their work by boosting translation productivity up to 35% • Lowers costs of translation for documents by utilizing post-edited machine translation • Allows public sector organizations to translate content and websites into multiple languages
  94. 94. STATISTICS From September 2017 to October 2019 the EU Council Presidency Translator has processed: ● 32 159 082 words ● 2.83 million sentences ● 1.09 million translation requests ● ~207 books (there are 155 thousand words on average in one Harry Potter book)
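
The "~207 books" figure follows from the other numbers on this slide if the word count is read as 32 159 082 words (about 32 million). A quick check of that arithmetic, using only the slide's own figures:

```python
WORDS_TRANSLATED = 32_159_082   # word count from the slide
SENTENCES = 2_830_000           # "2.83 million sentences"
WORDS_PER_BOOK = 155_000        # the slide's average for one Harry Potter book

# Book equivalent of the translated volume.
books = WORDS_TRANSLATED / WORDS_PER_BOOK

# Average sentence length implied by the two totals.
words_per_sentence = WORDS_TRANSLATED / SENTENCES
```

The division gives a little over 207 books, matching the slide, and an implied average of roughly 11 words per sentence.
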
  96. 96. Conclusions • New generation of Neural MT strongly improves quality and applicability of machine translation, especially for morphology rich languages • Domain specific data is crucial for making MT suitable for cultural and other domains • Depending on the application, translation needs can be served by selecting the most efficient approach – pure MT, human review of the MT, or fully human translation • We will be happy to share our experience, technologies and tools :)
  97. 97. Thank you! Jānis Ziediņš, janis.ziedins@kis.gov.lv Andrejs Vasiļjevs, andrejs@tilde.com
  98. 98. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Heli Kautonen Library Director Finnish Literature Society SKS Design for Diversity
  99. 99. Design for Diversity Heli Kautonen Library Director, Finnish Literature Society (SKS) 24.10.2019 Europeana meeting on multilingualism, Hanaholmen, Finland
  100. 100. 1831 Photo © Gary Wornell, SKS 2019
  101. 101. Image © SKS 2010 Suomalaisen Kirjallisuuden Seura (Finnish Literature Society)
  102. 102. Photo © Gary Wornell, SKS 2019
  103. 103. Photo: Alexandre Caffiaux, Université de Lille, 2018. CC-BY 2.0 Diversity
  104. 104. Diversity Photo: Jackster121212 - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=80077504C
  105. 105. Photo: Heli Kautonen 2017 Design
  106. 106. Universal Design Critical Design Inclusive Design Value-Sensitive Design Photo: Helsinki City Museum, CC-BY 4.0 Source: Finna.fi Photo: newobj Source: Github.com
  107. 107. How might we…? Photo: Heli Kautonen 2016
  108. 108. …measure the value… …now, next year, in the future? Photo: Heli Kautonen 2019
  109. 109. Model for temporal division of benefits. Timeline: initiation (of a new service), development, implementation, operation and maintenance. ● Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified? ● Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified? ● Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated? Source: Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231.
  110. 110. Trust Efficiency Revenue Better quality Learning & competence Self esteem Ease of use Cost savings COMMITMENT Sustainability
  111. 111. ”for + with society” Prof. Linda Doyle Trinity College Dublin Photo: Heli Kautonen 2019
  112. 112. Photo: Heli Kautonen 2019 2031
  113. 113. Questions and comments heli.kautonen@finlit.fi Twitter: @helimuori https://fi.linkedin.com/in/heli-kautonen-38136512
  114. 114. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Dasha Moskalenko Manager Service Design Europeana Foundation Europeana case study UX Design and user testing
  115. 115. Ο Ζητιάνος Φοιτητής, Άγνωστος δημιουργός, 1945,Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
  116. 116. Καντσονίσιμα-Σατιρίσιμα-Ψυθιρίσιμα, Άγνωστος δημιουργός, 1971, Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
  117. 117. Language in Portuguese
  118. 118. ● Language detection and display (for validation) ● Query translated into 24 languages
  119. 119. ● Results displayed based on relevance in all languages ● Results displayed in original language ● Search term highlighted
  120. 120. ● Sort by language available ● Language tag showing item’s original language ● Languages in which item metadata is available
  121. 121. Item’s original language & option for automatic translation
  122. 122. Hands showing the French sign language alphabet, Wellcome Collection, CC BY europeana.eu @EuropeanaEU THANK YOU! Questions & comments are welcome. dasha.moskalenko@europeana.eu
  123. 123. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Matias Frosterus Information Systems Manager with Mikko Lappalainen, Osma Suominen, Satu Niininen National Library of Finland Multilingual linked vocabularies and automatic subject indexing services - National Library's Finto and Annif
  124. 124. THE NATIONAL LIBRARY OF FINLAND Libraries and access
  125. 125. Libraries and access ?
  126. 126. Libraries and access ?
  127. 127. Libraries and access !
  128. 128. Libraries and access !
  129. 129. Libraries and access
  130. 130. The goal ▪ Bringing the library know-how into use for all of the public sector ▪ But better! ▪ Better vocabularies ▪ Publication, use, and integration of those better vocabularies ▪ Automated tools to make it even easier
  131. 131. What is needed? ▪ Modern linked data vocabularies ▪ A way to publish them for everyone to use ▪ A way to integrate them into your systems ▪ A way to make using them less labour-intensive
  132. 132. Vocabularies
  133. 133. Vocabularies ▪ Starting point: General Finnish Thesaurus YSA ▪ Developed in the 1980’s mainly for book indexing ▪ Over 30,000 terms ▪ Monolingual but has a Swedish counterpart Allärs
  134. 134. Thesaurus to ontology ▪ Reconstruction of YSA into machine-readable and multilingual YSO ▪ Trilingual terms for concepts (fin, swe, eng) ▪ YSA and Allärs merged together and translated into English ▪ Concepts are a compromise between Finnish and Swedish as YSA and Allärs are not completely identical ▪ Links to Library of Congress Subject Headings (LCSH) ▪ Linking to Wikidata underway ▪ YSO just made the list of Europeana dereferenceable vocabularies that can be enriched in the Europeana portal
  135. 135. Annotate in one language, find using another
  136. 136. Challenges of multilinguality ▪ Founded on the concepts of the Finnish cultural sphere ▪ Some concepts may not be common outside of that ▪ sandwich cakes, uncles (maternal) ▪ väheneminen = minskning (antal) = decrease (passive) vähentäminen = minskning (aktiv reducering av antal) = decrease (active) ▪ Liikuntalukiot = idrottsgymnasier = general upper secondary schools focusing on sport and exercise
  137. 137. Challenges of multilinguality ▪ Some may result in somewhat awkward terms ▪ rivers = joet = floder, åar och älvar ▪ The original Swedish thesaurus Allärs had three terms that could be used interchangeably
  138. 138. Challenges of multilinguality ▪ Can also affect hierarchy ▪ pesät ⤷ muurahaispesät (literally ant nests) bon ⤷ myrstackar nests ⤷ ant hills ▪ For more information, see http://urn.fi/URN:NBN:fi-fe201705106375 Satu Niininen, Susanna Nykyri, Osma Suominen, (2017) "The future of metadata: open, linked, and multilingual – the YSO case", Journal of Documentation, Vol. 73 Issue: 3, pp.451-465, doi: 10.1108/JD-06-2016-0084.
  139. 139. YSO YSO Upper hierarchy General concepts Specific concepts
  140. 140. YSO YSO Upper hierarchy General concepts Specific concepts
  141. 141. Adapted into use outside the library domain ▪ Extended with domain ontologies ▪ Using the core provided by YSO ▪ Helps interoperability! ▪ Developed by the domain experts in various organizations
  142. 142. Adapted into use outside the library domain ▪ Extended with domain ontologies ▪ Using the core provided by YSO ▪ Helps interoperability! ▪ Developed by the domain experts in various organizations ▪ Over a dozen domain ontologies such as: ▪ AFO - Agriculture - 7 000 concepts ▪ JUHO - Government - 6 300 ▪ KAUNO - Literature - 5 000 ▪ KULO - Cultural research - 1 500 ▪ LIITO - Economics - 3 000 ▪ SOTO - Military - 2 000 ▪ TERO - Health - 6 500 ▪ And others
  143. 143. Domain ontologies all extending YSO in
  144. 144. KOKO ▪ An ”ontology cloud” which combines the domain ontologies and the general ontology into a cohesive whole
  145. 145. KOKO ▪ An ”ontology cloud” which combines the domain ontologies and the general ontology into a cohesive whole
  146. 146. Vocabulary service
  147. 147. National vocabulary and ontology service Finto ▪ A bit of history ▪ FinnONTO-research project (2003-2012) ▪ Built research prototypes of services and started the ontologization process of the various thesauri ▪ The National Library began the Finto project in 2013 funded by the Ministry of Education and Culture and the Ministry of Finance ▪ A national vocabulary and ontology service for the whole public sector
  148. 148. Finto offers
  149. 149. Finto offers Free to use Open licenses
  150. 150. http://finto.fi
  151. 151. Adopted widely in Finland ▪ Finto is used in many organizations in Finland to annotate their various resources, among them ▪ The national broadcasting company Yle ▪ Suomi.fi citizen’s portal to public services ▪ Various public sector content systems ▪ Websites of various ministries ▪ Various museums, archives, and libraries
  152. 152. Skosmos ▪ The heart beating inside Finto ▪ Open source SKOS vocabulary browser ▪ http://skosmos.org ▪ Publication and use of light-weight ontologies, thesauri and classifications ▪ Web interface ▪ REST API ▪ SPARQL endpoint ▪ Community ▪ https://groups.google.com/forum/#!forum/skosmos-users
  153. 153. How does it work? ▪ Make your thesaurus into SKOS
  154. 154. How does it work? ▪ Put it in a SPARQL triple store
  155. 155. How does it work? ▪ Point Skosmos at your SPARQL endpoint
  156. 156. How does it work? ▪ And serve your thesaurus for humans, Linked Data agents, and REST API access
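
The first of the four steps above, turning a thesaurus into SKOS, can be sketched as follows. This is an illustrative sketch only: the dictionary layout, the `example.org` base URI and the function names are assumptions, and the sample terms are the "nests / ant hills" example from the slides. Loading the result into a triple store and configuring Skosmos are not shown.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_skos_turtle(concepts, base_uri):
    """Serialize {id: {"labels": {lang: label}, "broader": [ids]}} as SKOS Turtle."""
    blocks = [SKOS_PREFIX]
    for local_id, concept in concepts.items():
        statements = ["a skos:Concept"]
        for lang, label in concept["labels"].items():
            statements.append(f'skos:prefLabel "{escape_literal(label)}"@{lang}')
        for broader_id in concept.get("broader", []):
            statements.append(f"skos:broader <{base_uri}{broader_id}>")
        subject = f"<{base_uri}{local_id}>"
        blocks.append(subject + " " + " ;\n    ".join(statements) + " .")
    return "\n\n".join(blocks) + "\n"


# Hypothetical two-concept thesaurus with trilingual labels.
THESAURUS = {
    "nests": {"labels": {"fi": "pesät", "sv": "bon", "en": "nests"}},
    "ant-hills": {
        "labels": {"fi": "muurahaispesät", "sv": "myrstackar", "en": "ant hills"},
        "broader": ["nests"],
    },
}
BASE_URI = "http://example.org/thesaurus/"  # placeholder namespace

turtle = to_skos_turtle(THESAURUS, BASE_URI)
```

In practice an RDF library would normally handle serialization; plain string formatting is used here only to keep the sketch dependency-free.
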
  157. 157. Key features ▪ Multilingual browser interface (10 languages) ▪ Autocomplete search ▪ Alphabetical index ▪ Concept hierarchy display ▪ Concept groups (thematic index) ▪ New concepts ▪ REST API for enabling use of vocabularies in other applications ▪ responses usually JSON-LD
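
As a sketch of how another application might use the REST API mentioned above: the code builds a search request and pulls concept URIs and labels out of a JSON-LD style reply. The base URL, the `search` path, the parameter names (`query`, `vocab`, `lang`) and the `results` / `uri` / `prefLabel` fields reflect my understanding of the Skosmos API and should be checked against its documentation; the sample reply is hand-written for illustration, reusing the YSO concept that appears later in this deck.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.finto.fi/rest/v1"  # assumed public Finto endpoint


def search_url(query, vocab="yso", lang="en", base=API_BASE):
    """Build a concept search URL for one vocabulary and label language."""
    return f"{base}/search?" + urlencode({"query": query, "vocab": vocab, "lang": lang})


def extract_concepts(response):
    """Return (uri, prefLabel) pairs from a decoded search response."""
    return [
        (hit["uri"], hit["prefLabel"])
        for hit in response.get("results", [])
        if "uri" in hit and "prefLabel" in hit
    ]


def search(query, **kwargs):
    """Perform the request (needs network access; not called in this sketch)."""
    with urlopen(search_url(query, **kwargs)) as reply:
        return extract_concepts(json.load(reply))


# Hand-written sample in the assumed response shape, not real API output.
SAMPLE_RESPONSE = {
    "results": [
        {"uri": "http://www.yso.fi/onto/yso/p4349", "prefLabel": "medals of honour",
         "lang": "en", "vocab": "yso"},
    ]
}

url = search_url("medals of honour")
concepts = extract_concepts(SAMPLE_RESPONSE)
```

Separating URL building and response parsing from the network call keeps the parts that depend on assumptions easy to adjust once checked against a live installation.
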
  158. 158. Skosmos installations around the world www.loterre.fr/skosmos http://chemskos.com http://vocabularies.unesco.org/ http://aims.fao.org/standards/agrovoc/functionalities/search
  159. 159. Automated subject indexing
  160. 160. Many possible solutions
  161. 161. Some problems YSO KOKO AFO JUHO € £ $
  162. 162. Automated Subject Indexing made easy: Annif ▪ An open source multilingual automated subject indexing system using machine learning and our own vocabularies
  163. 163. Where to get the learning material?
  164. 164. Metadata about 13M documents, many of them tagged with subjects! Hot tub by a lake Andrei Niemimäki CC BY-SA
  165. 165. Hot tub by a lake Andrei Niemimäki CC BY-SA Metadata about 13M documents, many of them tagged with subjects!
  166. 166. Hot tub by a lake Andrei Niemimäki CC BY-SA Metadata about 13M documents, many of them tagged with subjects!
  167. 167. Finna API ▪ All Finna metadata is ▪ YSO and KOKO widely used
  168. 168. Automated Subject Indexing made easy: Annif ▪ Prototype in 2017 ▪ Try it out for yourself at http://annif.org/
  169. 169. Automated Subject Indexing made easy: Annif ▪ Automating our own processes vs. creating generic tools for many contexts
  170. 170. Annif development ▪ Packaging Annif into an easy-to-deploy solution via Docker ▪ Tuning the various algorithms and their hyperparameters powering Annif ▪ Making integration easier through a Finto API
  171. 171. Summary
  172. 172. Summary Interlinked multilingual vocabularies for various domains A national service for publishing and using said vocabularies An automated system for making it easy to produce annotations with said vocabularies
  173. 173. Summary Interlinked multilingual vocabularies for various domains A national service for publishing and using said vocabularies An automated system for making it easy to produce annotations with said vocabularies All the while utilizing library know-how Richer metadata Cross-domain findability and interoperability More efficient workflows New connections, new possibilities
  174. 174. Thank you! matias.frosterus@helsinki.fi finto-posti@helsinki.fi @Fintopalvelu All pictures used under CC0 license unless otherwise noted
  175. 175. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0 Hugo Manguinhas Product Manager API Europeana Foundation Case Study - Translation of object metadata using the Knowledge Graph
  176. 176. Multilingual experience ● Translatable data: Collections (object metadata, text objects), Editorial content, User interface ● Usage scenarios: Search, Browse, Display
  177. 177. Object Metadata ● What is the title of the object? ● Who created or contributed it? ● What topics is the object about? ● What kind of object is it? ● When was it created or published? ● Where was it created, or where is it located? ● ...
  178. 178. KNOWLEDGE GRAPH Bulong Miao, Wellcome Collection, United Kingdom, CC BY
  179. 179. About the Knowledge Graph ● Vast network of data sources made available in the wider Linked Open Data cloud ● Can be linked to and used to bring more contextual information to the items ● Vast and readily available source of controlled translations Part of the Linking Open Data (LOD) Project Cloud Diagram, CC BY-SA.
  180. 180. EDM and the Knowledge Graph We encourage data providers to ● Contribute links to their own vocabularies and publish them as Linked Open Data ● Use available reference vocabularies to describe their content Clavecin, Bartolomeo Cristofori Cite de la Musique, MIMO - Musical Instruments Museums Online|CC BY-NC-SA
  181. 181. ● Available as Linked Open Data and therefore part of the Knowledge Graph ● The rights statements have been translated into Estonian, Finnish, French, German, Polish and Spanish; 7 more translation efforts are ongoing ● Research has shown that official translations of rights information lead to more investment in adopting rs.org and thus to more accurate copyright information
  182. 182. General Finnish Ontology (YSO)
    <skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
      <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
      <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
      <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
      <skos:altLabel xml:lang="sv">ordnar</skos:altLabel>
      <skos:altLabel xml:lang="sv">ordnar (hederstecken)</skos:altLabel>
      <skos:broader rdf:resource="http://www.yso.fi/onto/yso/p1581"/>
      <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4347"/>
      <skos:related rdf:resource="http://www.yso.fi/onto/yso/p4348"/>
      <skos:related rdf:resource="http://www.yso.fi/onto/yso/p11634"/>
      <skos:exactMatch rdf:resource="http://www.yso.fi/onto/koko/p30868"/>
      <skos:exactMatch rdf:resource="http://www.yso.fi/onto/ysa/Y96541"/>
      <skos:exactMatch rdf:resource="http://www.yso.fi/onto/allars/Y23916"/>
    </skos:Concept>
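To show how the translations in such a record can be read programmatically, here is a minimal sketch that extracts the language-tagged preferred labels and the `skos:exactMatch` links from a SKOS concept. The sample is a shortened copy of the YSO concept above, wrapped in an `rdf:RDF` element with namespace declarations so that it parses on its own. The function name and the output structure are illustrative choices.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://www.yso.fi/onto/yso/p4349">
    <skos:prefLabel xml:lang="sv">hederstecken</skos:prefLabel>
    <skos:prefLabel xml:lang="fi">kunniamerkit</skos:prefLabel>
    <skos:prefLabel xml:lang="en">medals of honour</skos:prefLabel>
    <skos:exactMatch rdf:resource="http://www.yso.fi/onto/koko/p30868"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(rdf_xml):
    """Map each concept URI to its preferred labels by language and its exactMatch links."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        uri = concept.get(f"{{{RDF}}}about")
        labels = {
            el.get(XML_LANG): el.text
            for el in concept.findall(f"{{{SKOS}}}prefLabel")
        }
        matches = [
            el.get(f"{{{RDF}}}resource")
            for el in concept.findall(f"{{{SKOS}}}exactMatch")
        ]
        concepts[uri] = {"labels": labels, "exact_matches": matches}
    return concepts


concepts = read_concepts(SAMPLE)
```

A label without an `xml:lang` attribute would end up under the key `None`, which is one simple way to spot vocabularies that are not language tagged.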
  183. 183. Vocabularies used by Data Providers ● Language coverage: 0.36 (topics and subjects) ● Not all vocabularies are properly language tagged!
  184. 184. Europeana’s Knowledge Graph Entity Collection
  185. 185. Entity Collection: benefits ● Allows Europeana to establish links to the Knowledge Graph by means of semantic enrichment of the object metadata ● Harmonizes vocabularies from the multiplicity of data providers into a single point of reference ● Exploits coreference links between vocabularies to increase multilingual coverage
  186. 186. Entity Collection: multilingual coverage ● Language coverage: 13.1 (topics and subjects) ● For persons, coverage drops to 4.8
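The last benefit can be illustrated with a toy sketch: when two vocabularies describe the same concept and are connected by a coreference link (for example `skos:exactMatch`), their labels can be pooled so that the combined entity covers more languages. The URIs, the labels and the one-hop merging rule below are assumptions made for illustration, not Europeana's actual procedure.

```python
def merge_labels(labels_by_uri, coreferences):
    """Add to each concept the labels of its directly coreferent concepts.

    Labels the concept already has take precedence. Only direct (one-hop)
    links are followed in this sketch.
    """
    merged = {}
    for uri, labels in labels_by_uri.items():
        combined = dict(labels)
        for other in coreferences.get(uri, []):
            for lang, label in labels_by_uri.get(other, {}).items():
                combined.setdefault(lang, label)
        merged[uri] = combined
    return merged


# Hypothetical concepts from two vocabularies describing the same topic.
labels_by_uri = {
    "http://example.org/vocab-a/medals": {"fi": "kunniamerkit", "en": "medals of honour"},
    "http://example.org/vocab-b/medals": {"en": "medals", "de": "Orden"},
}
coreferences = {
    "http://example.org/vocab-a/medals": ["http://example.org/vocab-b/medals"],
}

merged = merge_labels(labels_by_uri, coreferences)
```

Here the concept from vocabulary A gains a German label through the link, while its own English label is kept.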
  187. 187. Entity Collection: multilingual improvements
  188. 188. Steps to improve the Knowledge Graph ● Promote alignment efforts between vocabularies used by data providers and complementary vocabularies such as Wikidata ● Promote translation efforts/campaigns to increase multilingual coverage of the Knowledge Graph, prioritising discovery-enabling metadata fields
  189. 189. A FOCUSED VIEW ON THE GENERAL STRATEGY Idrottstävlingar på Eyravallen. "Benke". 27 september 1955.,Örebro Kuriren, Örebro läns museum, Sweden, Public domain
  190. 190. Multilingual search, browse and display Usage scenarios ● Enter search query in chosen language ● See search results and interact with filters in chosen language ● Display object metadata on item page ● Navigate to entities
  191. 191. Proposals for indexing and storing translations ● Automated identification of language if needed (only 26.5% of the data providers’ metadata is language qualified) ● Use translations from multilingual knowledge graph ● Augment the provider metadata with static translation of the fields to English (to fill metadata values not covered by the knowledge graph) ● Store and index translated metadata for search and display (original metadata + languages of the knowledge graph + English)
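The indexing proposals can be sketched as a small function that assembles the stored values for one metadata field. This is only one reading of the bullets: the language detector and the English translator are passed in as hypothetical callables, and the record layout (a dictionary keyed by language code) is an assumption.

```python
def build_index_record(value, declared_lang, kg_labels, detect_language, translate_to_english):
    """Assemble the per-language values to store and index for one metadata field.

    value: the provider's original text
    declared_lang: language tag from the provider, or None if not language qualified
    kg_labels: translations from the knowledge graph, keyed by language code
    detect_language, translate_to_english: hypothetical pluggable services
    """
    # Step 1: identify the language only when the provider did not supply it.
    lang = declared_lang or detect_language(value)
    record = {lang: value}
    # Step 2: add the translations available in the knowledge graph.
    for kg_lang, label in kg_labels.items():
        record.setdefault(kg_lang, label)
    # Step 3: fall back to a static English translation if English is still missing.
    if "en" not in record:
        record["en"] = translate_to_english(value, lang)
    return record


# Stub services standing in for real language detection and translation.
record = build_index_record(
    "kunniamerkit",
    None,
    {"sv": "hederstecken"},
    detect_language=lambda text: "fi",
    translate_to_english=lambda text, lang: "medals of honour",
)
```

The result holds the original value, the knowledge graph languages and English, matching the last bullet above.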
  192. 192. Proposals for search on object metadata ● User enters original query ● Identify language ● Translate to English: translated query (English) ● Suggest entity (Knowledge Graph); user disambiguates: entity-based query ● Multilingual query: entity-based query OR original query + translated query ● Search the multilingual index ● #1: French #2: Spanish #3: Polish
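A minimal sketch of how the multilingual query might be composed follows. It reads the slide's "OR" as a choice between two paths: if the user picked an entity, search by that entity; otherwise search for the original and the English-translated query together. The query syntax and the `entity` field name are hypothetical and not taken from the Europeana API.

```python
def build_multilingual_query(original_query, translated_query=None, entity_uri=None):
    """Compose the query string sent to the multilingual index."""
    if entity_uri:
        # The user disambiguated to a knowledge graph entity: search by entity.
        return f'entity:"{entity_uri}"'
    terms = [original_query]
    # Add the English translation unless it is missing or identical to the original.
    if translated_query and translated_query != original_query:
        terms.append(translated_query)
    return " OR ".join(f'"{term}"' for term in terms)


text_query = build_multilingual_query("médailles", translated_query="medals")
entity_query = build_multilingual_query(
    "médailles", entity_uri="http://example.org/entity/medals"
)
```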
  193. 193. Proposals for display of object metadata ● Request metadata ● In original language or English: obtain metadata (Multilingual Database) ● In other language: obtain metadata (Knowledge Graph); translate from English
  194. 194. MULTILINGUAL EXPERIENCE OUTCOMES ● Users can search and filter in one of 24 official languages ● Item page metadata would display in chosen language if knowledge graph translations were present ● Where chosen language is not supported, display will default to source language and offer option to view in English
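The display behaviour in the last two bullets amounts to a fallback rule, sketched below under simple assumptions: the stored values for a field are keyed by language code, and the return structure with its `offer_english` flag is a hypothetical shape chosen for illustration.

```python
def choose_display(values_by_lang, chosen_lang, source_lang):
    """Pick the value to show for one metadata field on the item page."""
    if chosen_lang in values_by_lang:
        # A translation exists in the user's chosen language.
        return {"lang": chosen_lang, "text": values_by_lang[chosen_lang], "offer_english": False}
    # Otherwise default to the source language and offer English if available.
    return {
        "lang": source_lang,
        "text": values_by_lang[source_lang],
        "offer_english": source_lang != "en" and "en" in values_by_lang,
    }


values = {"fi": "kunniamerkit", "sv": "hederstecken", "en": "medals of honour"}
in_swedish = choose_display(values, chosen_lang="sv", source_lang="fi")
in_polish = choose_display(values, chosen_lang="pl", source_lang="fi")
```

In this toy data there is no Polish value, so the Polish request falls back to the Finnish source text with the option to view it in English.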
  195. 195. Challenges & Open Questions ● How successful is automated language detection? ● Would prioritising static translation of discovery-enabling metadata fields to English be “good enough”? ● How well can we statically translate the remaining metadata fields to English, especially when they contain single words or short phrases? ● Would dynamic translation of metadata (for languages other than English) be good enough?
  196. 196. The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain europeana.eu @EuropeanaEU
