
Multilingualism ifla 2014 08


Three overlapping OCLC projects aim to generate true multilingual displays and to create translation records for sharing via VIAF.



  1. 1. IFLA - Lyon, France 19 August 2014 Multilingualism in WorldCat and VIAF Janifer Gatenby Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine Goetz, Shenghui Wang, Jay Weitz
  2. 2. WorldCat Today • Resources in nearly all languages • Contributed by more than 20,000 libraries worldwide • More than half the database is for works not in English • Languages: English, German, French, Spanish, Chinese, Dutch, Japanese, Russian, Arabic, and 469 others
  3. 3. WorldCat Today • Bibliographic Records – Hybrid records – Parallel records • Clustered at Work level (FRBR)
  4. 4. Existing Architecture [Diagram labels: Bibliographic record, each with its own Authors, Subj, Classif and Holdings; Work cluster; Content cluster; Manifestation cluster]
  5. 5. Complementary Initiatives Work Level Record GLIMIR Manifestation & Content Clusters Multi-lingual Bibliographic Structure
  6. 6. Work Level Record Objective: Create a consolidated metadata summary for the content of a work
  7. 7. Work Level Record http://www.oclc.org/research/activities/workrecs.html Coming Q1 2015
  8. 8. GLIMIR Objective: Create better work presentations
  9. 9. Users like C • The Content Cluster GLIMIR – Enables better work record displays by reducing the number of lines that display for large works – Enables a choice of format and presents the formats that could be acceptable substitutes – Consolidates holdings for identical content • The Manifestation Cluster is important – Consolidates holdings at manifestation level – In the short term allows the record catalogued in the language of the interface to be chosen for display – Reduces apparent duplication – Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records) Cataloguers & scholars like C
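A minimal sketch of how holdings might be consolidated at manifestation level, and how the record catalogued in the interface language might be chosen for display. The record layout, identifiers and library symbols are hypothetical illustrations and are not taken from GLIMIR itself.

```python
from collections import defaultdict

# Hypothetical records:
# (record_id, manifestation_cluster_id, cataloguing_language, holding libraries)
records = [
    ("rec1", "m1", "eng", {"LIB_A", "LIB_B"}),
    ("rec2", "m1", "fre", {"LIB_B", "LIB_C"}),
    ("rec3", "m2", "ger", {"LIB_D"}),
]


def consolidate_holdings(records):
    """Union the holdings of all records that share a manifestation cluster."""
    clusters = defaultdict(set)
    for _record_id, cluster_id, _language, holdings in records:
        clusters[cluster_id] |= holdings
    return dict(clusters)


def pick_record(records, cluster_id, interface_language):
    """Prefer the record catalogued in the interface language,
    otherwise fall back to the first record in the cluster."""
    members = [r for r in records if r[1] == cluster_id]
    for record in members:
        if record[2] == interface_language:
            return record[0]
    return members[0][0] if members else None


consolidated = consolidate_holdings(records)
```

In this sketch a library that holds the same manifestation under two parallel records is counted once, which is the sense in which clustering reduces apparent duplication.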
  10. 10. Manifestation Clustering So far 103 million records processed (about 30%)
  11. 11. Manifestation Cluster Opened
  12. 12. SRU Search: Loti, Pêcheur d’Islande (Work ID 21536567) • Work: 18 records, 148 holdings • Content: 14 records, 143 holdings • Manifestation: 7 records, 115 holdings
  13. 13. Multilingual Bibliographic Structure Project Objective: Improve displays; surface translations
  14. 14. Multilingual Bibliographic Structure Project • Creates true multi-lingual displays – At work and manifestation levels – Using all available data instead of “most appropriate record” – Generates data • Corrects many of the 28 million records coded “und” (undetermined language) • Better control and linking of translations • Input to refinement of work clusters • Smarter data storage
  15. 15. “Most appropriate” questioned • WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond • The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why
  16. 16. Which record is better to present to a German speaker?
  17. 17. Incomplete Swedish Record
  18. 18. Hybrid record
  19. 19. Most appropriate display Build the display from all available data
  20. 20. Multilingual Bibliographic Structure Project • Work-level data, mined from all associated bibliographic records, will be displayed, supplemented with expression- and manifestation-level data as the user drills from the short to the fuller versions of the metadata. • The end-user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records
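The idea of building a display from all available data, rather than from one chosen record, can be sketched roughly as follows. This assumes each record is a mapping of field name to language-tagged values; that layout, the function name and the sample records are hypothetical and do not describe the WorldCat implementation.

```python
def build_work_display(records):
    """Merge fields from every record in a work cluster
    instead of picking a single 'most appropriate' record."""
    display = {}
    for record in records:
        for field_name, tagged_values in record.items():
            merged = display.setdefault(field_name, {})
            for language, value in tagged_values.items():
                # Keep the first value seen for each language.
                merged.setdefault(language, value)
    return display


# Hypothetical records: an incomplete Swedish record and a fuller German one.
swedish_record = {"title": {"swe": "Krig och fred"}}
german_record = {
    "title": {"ger": "Krieg und Frieden"},
    "summary": {"ger": "(German summary text)"},
}

display = build_work_display([swedish_record, german_record])
```

The merged result keeps the Swedish title alongside the German title and summary, so an incomplete record still contributes what it has.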
  21. 21. Proposed new architecture [Diagram labels: Work; Manif (manifestations); Holding; Subj; Classif; Authors; Contents; Translations (Language of work); elements tagged by language: eng, fre, ger, jpn]
  22. 22. Important principles • Language tagging of elements, particularly – Summaries (MARC 21 field 520) – Subject headings • Display in the script preferred by the user if data is available • Improve translated interfaces • Show consolidated holdings as appropriate
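Once elements are language-tagged, choosing what to display for a given user becomes a simple lookup with a fallback. The sketch below assumes a mapping from language code to value and an ordered list of user preferences; both the function name and the data layout are hypothetical.

```python
def choose_value(tagged_values, preferred_languages, default=None):
    """Return the value in the first preferred language that is available,
    otherwise any available value, otherwise the default."""
    for language in preferred_languages:
        if language in tagged_values:
            return tagged_values[language]
    return next(iter(tagged_values.values()), default)


title = {"eng": "War and Peace", "rus": "Война и миръ"}
shown_to_russian_reader = choose_value(title, ["rus", "eng"])
shown_to_japanese_reader = choose_value(title, ["jpn"])
```

The second call falls back to the first available value because no Japanese title exists in the sample data, matching the principle "if data is available".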
  23. 23. Translations Surfacing the “cream”
  24. 24. Great works are translated • The cream of the world’s cultural and knowledge heritage is shared by being translated • WorldCat contains many rich cataloguing records for these translations GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data
  25. 25. Ιλιάδα (The Iliad) • 紅樓夢 (Dream of the Red Chamber) • ঘরে বাইরে (The Home and the World) • زقاق المدق (Midaq Alley) • Война и миръ (War and Peace) • 源氏物語 (The Tale of Genji) • דער בעל-תשובה (The Penitent) • સત્યના પ્રયોગો અથવા આત્મકથા (The Story of My Experiments with Truth) [Gandhi autobiography]
  26. 26. Translations • Leo Tolstoy: 32 languages • Homer: 28 languages • Rabindranath Tagore: 21 languages • Isaac Bashevis Singer: 17 languages • Najīb Maḥfūẓ: 12 languages • Cao Xueqin: 9 languages • Mahatma Gandhi: 7 languages • Murasaki Shikibu: 7 languages
  27. 27. Improving work clustering • Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results – Titles without subtitles – Missing or different forms of uniform title – Inverted title – Different coding of original and translated information • Generated uniform title authority records will overcome most of these differences without needing to edit individual records
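To illustrate why the inconsistencies listed above split work clusters, here is a rough sketch of a title matching key that ignores subtitles, case, diacritics, punctuation and article inversion. It is not OCLC's clustering algorithm, and the slide proposes generated uniform title authority records rather than string normalization; the function name and the English-only article list are assumptions.

```python
import re
import unicodedata

ARTICLES = {"the", "a", "an"}  # assumption: English articles only


def title_key(title):
    """Build a crude matching key from a title string."""
    # Drop a subtitle introduced by a colon.
    main = title.split(":", 1)[0]
    # Strip diacritics by decomposing and removing combining marks.
    decomposed = unicodedata.normalize("NFKD", main)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"\w+", stripped.casefold())
    # Ignore an article at the start ("The Iliad") or the end ("Iliad, The").
    if tokens and tokens[0] in ARTICLES:
        tokens = tokens[1:]
    if tokens and tokens[-1] in ARTICLES:
        tokens = tokens[:-1]
    return " ".join(tokens)


same_work = title_key("The Iliad : a new translation") == title_key("Iliad, The")
```

A key like this cannot relate a translated title to its original, which is one reason authority records that link the titles explicitly are the stronger approach.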
  28. 28. Addition of xR records to VIAF Before After
  29. 29. UNESCO Translation Database
  30. 30. XR VIAF Record VIAF ID for Author Translated title Translator
  31. 31. VIAF Linked Data New Information
  32. 32. Title: 西遊記 • Language: Chinese • Author: 吳承恩 • Created: 1592 • HasTranslation:
      – Title: Journey to the West • Language: English • Translator: Anthony C. Yu • Date: 1977 • IsTranslationOf:
      – Title: Journey to the West • Language: English • Translator: W. J. F. Jenner • Date: 1982-1984 • IsTranslationOf:
      – Title: Tây du ký bình khảo • Language: Vietnamese • Translator: Phan Quân • Date: 1980 • IsTranslationOf:
      – Title: Monkeys Pilgerfahrt • Language: German • Translator: Georgette Boner • Date: 1983 • IsTranslationOf:
      – Title: 西遊記 • Language: Japanese • Translator: 中野美代子 • Date: 1986 • IsTranslationOf:
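The paired HasTranslation / IsTranslationOf links on this slide can be modelled as a two-way relationship. The sketch below uses two entries from the slide as sample data; the class and function names are hypothetical and do not reflect the VIAF record format.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    title: str
    language: str
    author: str
    created: str
    # HasTranslation: filled in by add_translation().
    translations: list = field(default_factory=list)


@dataclass
class Translation:
    title: str
    language: str
    translator: str
    date: str
    # IsTranslationOf: back-link to the original work. Excluded from
    # repr and comparison so the circular reference stays harmless.
    original: Work = field(repr=False, compare=False)


def add_translation(original, title, language, translator, date):
    """Create a translation and register both directions of the link."""
    translation = Translation(title, language, translator, date, original)
    original.translations.append(translation)
    return translation


journey = Work("西遊記", "Chinese", "吳承恩", "1592")
yu = add_translation(journey, "Journey to the West", "English", "Anthony C. Yu", "1977")
boner = add_translation(journey, "Monkeys Pilgerfahrt", "German", "Georgette Boner", "1983")
target_languages = sorted({t.language for t in journey.translations})
```

Creating both directions in one function keeps the two links from drifting apart, which matters once the data is exposed as linked data.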
  33. 33. Markup for the Semantic Web

      # Original Work (in Chinese)
      <http://worldcat.org/entity/work/id/1215997>
          a schema:CreativeWork ;
          schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
          schema:inLanguage "zh" ;
          schema:name "靈山"@zh .

      # Translated Work (in English)
      <http://worldcat.org/entity/work/id/145209748>
          a schema:CreativeWork ;
          schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
          [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
          schema:inLanguage "en" ;
          schema:name "Soul Mountain"@en ;
          [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .
  34. 34. Understanding information sharing across cultures • What percentage of non-English works are translations of English works, and vice-versa? • Which authors are translated the most? • Which works have been translated into the most languages? • Which countries translate the most English works, the most non-English works? • Which countries translate a new work the fastest? Etc. http://www.oclc.org/research/activities/multilingual-bib-structure.html
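Questions such as "which authors are translated the most?" reduce to simple aggregations once translation records exist. A minimal sketch, assuming each record is an (author, work title, translation language) tuple; the sample rows are placeholders, not WorldCat or VIAF data.

```python
from collections import defaultdict


def languages_per_author(translations):
    """Count the distinct translation languages for each author,
    most widely translated first (ties broken by author name)."""
    languages = defaultdict(set)
    for author, _title, language in translations:
        languages[author].add(language)
    counts = [(author, len(langs)) for author, langs in languages.items()]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Placeholder rows for illustration only.
sample = [
    ("Author A", "Title 1", "eng"),
    ("Author A", "Title 1", "fre"),
    ("Author A", "Title 2", "eng"),
    ("Author B", "Title 3", "ger"),
]
ranking = languages_per_author(sample)
```

Grouping by work title instead of author would answer the companion question of which works have been translated into the most languages.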
  35. 35. Where are we now? Clustering • Work clusters done; ongoing refinement • GLIMIR clustering done for all [simple] text; – 103 million records have GLIMIR IDs • Working on collected works Displays • Working on VIAF expression displays • Work level displays in WorldCat.org ++ Data Mining for translations
  36. 36. Janifer Gatenby EMEA Program Manager Metadata Janifer.gatenby@oclc.org oclc.org Explore. Share. Magnify.
