Challenges of Multilingualism


Presentation given at the Visualizing Digital Humanities workshop hosted by the Lorentz Center, University of Leiden, 19-23 June 2017. It focuses on translations because they are how we learn about other cultures and how other cultures learn about us. It summarizes OCLC Research’s exploration of enriching WorldCat records with data extracted from Wikidata, a third-party linked data resource, in order to associate translations with the original work.

Published in: Education

  1. Visualizing Digital Humanities • 21 June 2017. Challenges of Multilingualism. Karen Smith-Yoshimura, Senior Program Officer, OCLC Research
  2. Challenges: • Different writing systems, different transliterations • Metadata not always good enough • Linked data opportunities. Why focus on translations?
  3. Why focus on translations? For Developers: Present information in the preferred language & script of the user. For Academics: Understand information sharing across cultures.
  4. Translations: Leo Tolstoy: 97 languages; Rabindranath Tagore: 93 languages; Homer: 84 languages; Mahatma Gandhi: 52 languages; Isaac Bashevis Singer: 52 languages; Najīb Maḥfūẓ: 47 languages; Cao Xueqin: 27 languages; Murasaki Shikibu: 21 languages. (HangingTogether, 2013-11-12) [By January 2017, person clusters have increased to 45 million]
  5. Ιλιάδα (The Iliad); 紅樓夢 (Dream of the Red Chamber); Война и миръ (War and Peace); ঘরে বাইরে (The Home and the World); સત્યના પ્રયોગો અથવા આત્મકથા (The Story of My Experiments with Truth) [Gandhi autobiography]; 源氏物語 (The Tale of Genji); דער בעל-תשובה (The Penitent); زقاق المدق (Midaq Alley)
  6. René Descartes, 1596-1650 Martin Heidegger, 1889-1976 [24K publications in 24 languages] [26K publications in 17 languages]
  7. Michel Foucault, 1926-1984 Stephen Hawking, 1942- [14K publications in 15 languages] [4K publications in 28 languages]
  8. One VIAF identifier clusters over 40 sources for the same entity. Things, not strings.
  9. WorldCat today: • Resources in nearly all languages • More than 2.5 billion holdings contributed by libraries worldwide • More than half the database is for works not in English. Languages, April 2017 (chart): English, German, French, Spanish, Chinese, Japanese, Italian, Dutch, Russian, Latin, 476 others
  10. UNESCO Translation Database
  11. The UNESCO database has translations of Loti’s Pêcheur d’Islande in only 5 languages
  12. Distinguishing translations into the same language
  13. Language of cataloging (bar chart, axis from 0 to 160,000,000): English, German, French, Dutch, Spanish, Danish, Swedish, Italian, Chinese, Japanese
  14. Language of cataloging and subject headings. Sein und Zeit by Martin Heidegger: Filosofía alemana [@es, Spanish]; Fundamentalontologie. Ontologie. [@de, German]; 哲学思想 [@zh, Chinese]
  15. Leveraging language of cataloging
  16. @fr @en
  17. • No original language • No original title • Chiodi translator, not author
  18. Many languages in WorldCat written in non-Latin scripts: 홍길동전; Ιλιάδα; 源氏物語; زقاق المدق; Война и миръ; דער בעל-תשובה; 紅樓夢; พระอภัยมณี
  19. Wikipedia languages (chart, as of 2017-05-31): English, Cebuano, Swedish, German, Dutch, French, Russian, Italian, Spanish, Waray-Waray, Polish, Vietnamese, Japanese, Portuguese, Chinese, all other 279 languages
  20. Translators
  21. schema:translationOfWork
      Original work (HasTranslation): Title: Sein und Zeit; Language: German; Author: Martin Heidegger; Created: 1927
      Translations (each IsTranslationOf the original):
      • Title: 존재 와 시간; Language: Korean; Translator: 전 양범; Date: 1989
      • Title: 存在와時間; Language: Korean; Translator: 鄭明五, 鄭淳喆; Date: 1972
      • Title: Время и бытие; Language: Russian; Translator: Владимир Вениамович Бибихин; Date: 1993
      • Title: 存在と時間; Language: Japanese; Translator: 細谷貞雄; Date: 1997
      • Title: Being and Time; Language: English; Translator: Joan Stambaugh; Date: 2010
  22. Markup for the semantic web
      # Original Work (in Chinese)
      <> a schema:CreativeWork ;
          schema:creator <> ; # "Gao, Xingjian"
          schema:inLanguage "zh" ;
          schema:name "靈山"@zh-hant .
      # Translated Work (in English)
      <> a schema:CreativeWork ;
          schema:creator <> ; # "Gao, Xingjian"
          schema:translator <> ; # "Lee, Mabel"
          schema:inLanguage "en" ;
          schema:name "Soul Mountain"@en ;
          schema:translationOfWork <> .
  23. Work sets • Series • Editions • Translations • Publishers • Subjects • Classifications • Materials • Library holdings • … Book instances (Courtesy of Shenghui Wang) • Multilingual labels • First publication date • Original language and script of work • First line • Freebase ID • MusicBrainz work ID • … Diagram labels: Multilingual labels; Other identifiers; Common descriptions; Original scripts; Work ID; Richer bibliographic descriptions; Translations; Library related identifiers
  24. Meeting the challenges: • WorldCat has more translations than any other resource. • We can tag data elements from different languages of cataloging. • We can ingest data from other linked data sources to present information in the preferred language and script of the user and to associate translations with the original work.
  25. Together we make breakthroughs possible. Thank you! Karen Smith-Yoshimura @KarenS_Y. Challenges of Multilingualism, Visualizing Digital Humanities • 21 June 2017. ©2017 OCLC. This work is licensed under a Creative Commons Attribution 4.0 International License. Suggested attribution: “This work uses content from “Data Designed for Discovery” © OCLC, used under a Creative Commons Attribution 4.0 International License:”
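
The translation records on slide 21 and the markup on slide 22 can be tied together in a short sketch. The Python below is only an illustration under stated assumptions, not OCLC's implementation: the `Work` class, the `to_turtle` and `translations_by_language` helpers, the `http://example.org/work/` namespace, and the slugs are all hypothetical, and the two-letter language tags are supplied here because the slide gives only language names. Creators and translators are written as plain string literals for brevity, whereas the slide's markup points to identifiers. The titles, translators, and dates come from slide 21; the `schema:` properties are the ones shown on slide 22, plus `schema:dateCreated` and `schema:datePublished`.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical namespace for illustration; not a real OCLC or Wikidata IRI.
BASE = "http://example.org/work/"
PREFIX = "@prefix schema: <http://schema.org/> ."


@dataclass
class Work:
    slug: str                       # hypothetical local identifier
    title: str
    language: str                   # language tag (assumed here, e.g. "de")
    date: str
    author: Optional[str] = None
    translator: Optional[str] = None
    translation_of: Optional["Work"] = None


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a string as a Turtle literal, optionally language-tagged."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_turtle(work: Work) -> str:
    """Serialize one work; translations link back via schema:translationOfWork."""
    source = work.translation_of
    creator = work.author or (source.author if source else None)
    parts = [
        f"<{BASE}{work.slug}> a schema:CreativeWork",
        f"schema:name {literal(work.title, work.language)}",
        f"schema:inLanguage {literal(work.language)}",
    ]
    if creator:
        parts.append(f"schema:creator {literal(creator)}")
    if source is None:
        parts.append(f"schema:dateCreated {literal(work.date)}")
    else:
        if work.translator:
            parts.append(f"schema:translator {literal(work.translator)}")
        parts.append(f"schema:datePublished {literal(work.date)}")
        parts.append(f"schema:translationOfWork <{BASE}{source.slug}>")
    return " ;\n    ".join(parts) + " ."


def translations_by_language(works: List[Work]) -> Dict[str, List[Work]]:
    """Group translations by target language, keeping input order."""
    groups: Dict[str, List[Work]] = defaultdict(list)
    for work in works:
        if work.translation_of is not None:
            groups[work.language].append(work)
    return dict(groups)


# Records from slide 21.
original = Work("sein-und-zeit", "Sein und Zeit", "de", "1927",
                author="Martin Heidegger")
translations = [
    Work("ko-1989", "존재 와 시간", "ko", "1989",
         translator="전 양범", translation_of=original),
    Work("ko-1972", "存在와時間", "ko", "1972",
         translator="鄭明五, 鄭淳喆", translation_of=original),
    Work("ru-1993", "Время и бытие", "ru", "1993",
         translator="Владимир Вениамович Бибихин", translation_of=original),
    Work("ja-1997", "存在と時間", "ja", "1997",
         translator="細谷貞雄", translation_of=original),
    Work("en-2010", "Being and Time", "en", "2010",
         translator="Joan Stambaugh", translation_of=original),
]

if __name__ == "__main__":
    print(PREFIX)
    for work in [original, *translations]:
        print(to_turtle(work), end="\n\n")
    for lang, items in translations_by_language(translations).items():
        print(lang, [(w.translator, w.date) for w in items])
```

Grouping by language shows why slide 12's point matters: the two Korean records share a target language, so only translator and date tell them apart.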