
Multilingual presentation ifla 2013 08-19

Data mining OCLC for translations.
Creating authority records for VIAF.
Remodelling the bibliographic structure to make the best multi-lingual displays from all available data in a work set.

Published in: Technology, Education


  1. The world’s libraries. Connected. Multilingual WorldCat, presented by Janifer Gatenby, IFLA, Singapore, 2013-08-19. Karen Smith-Yoshimura, Eric Childress, Janifer Gatenby, Jean Godby, Richard Greene, Jenny Toves, Diane Vizine-Goetz, Robert Bremer, JD Shipengrover, Gail Thornburg, Jay Weitz
  2. WorldCat Today
     • Resources in nearly all languages
     • Contributed by more than 20,000 libraries worldwide
     • More than half the database is for works not in English
  3. WorldCat Today
     • Bibliographic Records
     • Hybrid records
     • Parallel records
     • Clustered at Work level (FRBR)
  4. Existing Architecture [diagram; labels: Authors, Subj / Classif, Holdings, Bibliographic record, Work cluster, Content cluster, Manifestation cluster]
  5. Complementary Initiatives
     • Work Level Record
     • GLIMIR Manifestation & Content Clusters
     • Multi-lingual Bibliographic Structure
  6. Work Level Record http://www.oclc.org/research/activities/workrecs.html
  7. Work Level Record: Objective
     • Create a landing page summarizing content for a work
  8. GLIMIR
     • The Content Cluster
       • Enables better work record displays by reducing the number of lines that display for large works
       • Enables a choice of format and presents the formats that could be acceptable substitutes
       • Consolidates holdings for identical content
     • The Manifestation Cluster is important
       • Consolidates holdings at manifestation level
       • In the short term allows the record catalogued in the language of the interface to be chosen for display
       • Reduces apparent duplication
       • Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
  9. Multilingual Bibliographic Structure Project
     Creates true multi-lingual displays
       • At work and manifestation levels
       • Using all available data instead of “most appropriate record”
       • Generates data
     Corrects many of the 28 million records coded “und”
     Better control and linking of translations
     Input to refinement of work clusters
     Smarter data storage
  10. “Most appropriate” questioned
     • Worldcat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond
     • The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why
  11. Which record is better to present to a German speaker?
  12. Incomplete Swedish Record
  13. Hybrid record
  14. Most appropriate display
  15. Multilingual Bibliographic Structure Project
     • Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through the short to fuller versions of the metadata.
     End user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records
  16. Proposed new architecture [diagram; labels: Work (eng, fre, ger, jpn); Authors (eng, fre, ger, jpn); Subj / Classif (eng, fre, ger, jpn); Translations (Language of work); Manif eng; Manif fre; Notes; Contents; Holding]
  17. Important principles
     • Language tagging of elements, particularly
       • Summaries (MARC 21 field 520)
       • Subject headings
     • Display in script preferred by the user if data is available
     • Improve translated interfaces
     • Show consolidated holdings as appropriate
  22. Translations
  23. Great works are translated
     • The cream of the world’s cultural and knowledge heritage is shared by being translated
     • WorldCat contains many rich cataloguing records for these translations
     GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data
  24. Translations
     • Inconsistencies causing work clusters to be incomplete & less than optimal search results
       • Titles without subtitles
       • Different forms of uniform title or missing uniform title
       • Inverted title
       • Different coding of original and translated information
     Generated uniform title authority records will overcome most of these differences without needing to edit individual records
  25. Generate uniform title authority records
     • Improve FRBR work groups
     • Made by data mining
     • Contribute to VIAF
     • Diffuse via VIAF as linked data
     • Possibility to create web page / web service
  27. Translation records in VIAF
     • Will enrich VIAF significantly
     • New elements: translated title and translator

     Author     Title                  Expressions in VIAF   Translation count in WorldCat
     Atwood     Blind assassin         8                     31
     Guevara    Notas de viaje         0                     11
     Hawking    Grand design           0                     18
     Lenard     Grosse naturforscher   1                     3
     Loti       Pêcheur d’Islande      1                     31
  28. Diffusion of Translation records
     • Records are freely available to the world from VIAF in
       • MARC-21
       • XML
       • RDF (linked data)
       • Just links in JSON
       • And other formats as introduced
  29. We don’t know now, but soon will
     • # of manifestations as opposed to # of records
     • # of works that have translations
     • Top translated authors and works
     • And more
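
The "Important principles" slide calls for language tagging of elements and for display in the language or script the user prefers when data is available. A minimal sketch of that selection logic follows. The `TaggedValue` class and the `pick_for_user` function are hypothetical names introduced here for illustration, and the sample summaries are invented; none of this is taken from the WorldCat implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class TaggedValue:
    """One element (e.g. a summary or subject heading) with its language tag."""
    lang: str  # MARC-style language code such as "eng", "fre", "ger", "jpn"
    text: str


def pick_for_user(values: List[TaggedValue],
                  preferred_langs: Sequence[str]) -> Optional[TaggedValue]:
    """Return the value in the user's most preferred available language.

    Falls back to the first available value when none of the preferred
    languages is present, and to None when there is no data at all.
    """
    by_lang = {}
    for value in values:
        # Keep the first value seen for each language.
        by_lang.setdefault(value.lang, value)
    for lang in preferred_langs:
        if lang in by_lang:
            return by_lang[lang]
    return values[0] if values else None


# Invented sample data: summaries pooled from several records of one work.
summaries = [
    TaggedValue("eng", "A novel."),
    TaggedValue("ger", "Ein Roman."),
]
print(pick_for_user(summaries, ["ger", "eng"]).text)
```

The fallback to the first available value is one possible design choice; an interface could equally show nothing, or mark the value as being in another language.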

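The "Translations" slides list inconsistencies that keep work clusters incomplete: titles without subtitles, inverted titles, and differing title forms. One way to picture how data mining could bridge them is a normalized matching key, sketched below. The `title_key` and `group_translations` functions, the record fields, and the sample records are all assumptions made for illustration (the translation languages are invented); the article handling covers English only, and this is not presented as the method OCLC used.

```python
import re
import unicodedata
from collections import defaultdict


def title_key(title: str) -> str:
    """Build a rough matching key from an original-language title."""
    # Keep only the main title: drop anything after the first ":", ";" or "/".
    main = re.split(r"\s*[:;/]\s*", title, maxsplit=1)[0]
    # Undo an inverted title such as "Blind assassin, The" (English articles only).
    main = re.sub(r",\s*(the|a|an)\s*$", "", main, flags=re.IGNORECASE)
    # Strip diacritics so "Pêcheur" and "Pecheur" match.
    decomposed = unicodedata.normalize("NFKD", main)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, drop punctuation, and drop a leading English article.
    words = re.findall(r"\w+", no_marks.lower())
    if words and words[0] in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)


def group_translations(records):
    """Map (author, title key) to the set of translation languages found."""
    groups = defaultdict(set)
    for rec in records:
        key = (rec["author"].lower(), title_key(rec["original_title"]))
        groups[key].add(rec["language"])
    return groups


# Invented sample records; field names are hypothetical.
records = [
    {"author": "Atwood", "original_title": "The Blind Assassin : a novel", "language": "fre"},
    {"author": "Atwood", "original_title": "Blind assassin, The", "language": "ger"},
    {"author": "Loti", "original_title": "Pêcheur d’Islande", "language": "eng"},
]
groups = group_translations(records)
for (author, key), langs in sorted(groups.items()):
    print(author, "|", key, "|", sorted(langs))
```

Counting the keys in `groups` gives a number of works that have translations, and the size of each language set gives a per-work translation count, which are the kinds of figures the final slide says will become available.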