Multilingual presentation ifla 2013 08-19


Data mining OCLC for translations.
Creating authority records for VIAF.
Remodelling the bibliographic structure to make the best multi-lingual displays from all available data in a work set.

Published in: Technology, Education

  • There are more than 300 million records in WorldCat today representing holdings of the world’s libraries. There are also another 200 million articles. Of the 300 million, more than half are in languages other than English.
  • The 300 million records are clustered into works. The records may be in just one language of cataloguing or they may mix languages of cataloguing, e.g. subject headings in more than one language. Parallel records may exist, i.e. records in different languages of cataloguing describing the same resource.
  • The existing architecture is bibliographic-record centric. There may be links to author and subject authority records. The work, manifestation and content clusters contain only identifiers, with no metadata at present.
  • Three complementary initiatives are in progress concerning multi-lingualism and WorldCat.
  • The first project – generation of metadata at work level to make better presentations.
  • The GLIMIR project creates clusters of parallel records for the same manifestation (manifestation cluster) and also clusters for the same content, though the form may be different (print, microform, digital). We are in the process of trawling through the database to make these clusters. Only the part in blue will be changed by the multi-lingual bibliographic structure approach.
  • For records coded “und” (“undetermined” language), we expect that we can correct about 7 million by searching for the same title string in other records.
  • Currently when the short list and full displays are created, the system selects the most appropriate record for display. The most appropriate record is determined by its size and the number of associated holdings and it was envisaged to extend this to include the language of cataloguing. But we think we can do much better by questioning the “most appropriate record” concept.
  • Here the first record is catalogued in German but has no significant German content – no subject headings and no notes. There are other richer records that could inform the user better.
  • Hybrid records also mean that significant linguistic information may be buried in other records.
  • This record has subject headings in 3 languages.
  • Build the information to display to users from all available records and show them all relevant holdings. Do not display just one selected record from the work set. Cataloguers too will benefit from this, being able to drill down to actual records where appropriate.
  • This is the theoretical bibliographic structure, which will no longer be bibliographic-record centric but work centric. All information – authors, notes, summaries, subject headings – will be flagged with the language of cataloguing.
  • The work level metadata will be tagged with language for each data element. Instead of always showing the data from the main title in the record, the alternative script fields may be chosen for display, depending on what the system can determine about the user, e.g. from IP address range or an expressed preference. The interface already includes the ability to change the language of display, but the number of fields that change will be enhanced and the tabulation of several displays will be improved. Consolidate holdings from all records that are applicable to a display (work, content, manifestation level).
  • An Iceland Fisherman, original in French displayed in English UI with Translations table.
  • Same record with Chinese interface
  • The Grand Design; English in Italian UI with Translations table.
  • Same in Japanese interface
  • Instead of working directly with lesser quality records to improve the quality in WorldCat, and instead of working with the long tail, we are turning our attention to the most important works and working out ways to use the good records to improve the quality.
  • Translated titles are not always consistent, causing work grouping failure. Sometimes this is caused by titles without subtitles, by different forms of uniform title (i.e. in Gujarati and in English, in several forms), or by inverting the titles, e.g. by placing the name Gandhi before “Autobiography”. Some figures: French, 15 records, 6 work sets; German, 9 records, 9 work sets; Italian, 7 records, 5 work sets; Spanish, 8 records, 4 work sets.
  • VIAF consolidated display – shows the work with an expressions summary.
  • This is the full expressions summary for the work Pêcheur d’Islande.
  • VIAF full display – shows all the expressions, e.g. different translations with the title, the translator and the earliest determined publication date.
  • These records will enrich VIAF. There is no risk of generating bad records. The translation records will include both the translated title and the translator (both are not usually included in existing records in VIAF). Most of the expression records will be new to VIAF.
  • The newly generated records will be available in the formats as distributed by VIAF.
  • Just think: how many of the 150+ million records that are for non-English language works are actually translations of English language records? And how many of the English language records are also translations? It could be as high as 25%. Once we have these records generated, it will open many new possibilities. Also, we know we have 300 million records, but how many real resources do we have? GLIMIR will produce these figures. We are just starting…
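
The correction of “und” records by searching for the same title string in other records, mentioned in the notes above, could look roughly like the sketch below. It is only an illustration, not OCLC's implementation: the record layout (`id`, `title`, `lang`), the function names and the rule of proposing a language only when all matching coded records agree are assumptions made for this example.

```python
from collections import Counter, defaultdict


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so identical title strings compare equal."""
    return " ".join(title.lower().split())


def infer_languages(records):
    """Propose a language for "und" records whose title matches coded records.

    Hypothetical rule: a language is proposed only when every coded record
    sharing the same title string carries the same language code.
    """
    langs_by_title = defaultdict(Counter)
    for rec in records:
        if rec["lang"] != "und":
            langs_by_title[normalize_title(rec["title"])][rec["lang"]] += 1

    proposals = {}
    for rec in records:
        if rec["lang"] == "und":
            seen = langs_by_title.get(normalize_title(rec["title"]))
            if seen and len(seen) == 1:  # unambiguous match only
                proposals[rec["id"]] = next(iter(seen))
    return proposals


# Illustrative records, not taken from WorldCat.
records = [
    {"id": 1, "title": "Pêcheur d'Islande", "lang": "fre"},
    {"id": 2, "title": "pêcheur  d'Islande", "lang": "und"},
    {"id": 3, "title": "An Iceland Fisherman", "lang": "und"},
]
print(infer_languages(records))
```

Record 2 receives a proposal because its title matches a record coded `fre`; record 3 stays undetermined because no coded record shares its title.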

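The work-centric structure described in the notes, where each data element is tagged with its language and the display is built from all records in the work set, might be sketched as follows. This is an assumption-laden illustration only: the field names, the `(language, text)` tuples, the fallback to English and the simple summing of holdings counts are all hypothetical choices, not the actual WorldCat design.

```python
from collections import defaultdict


def build_work_metadata(records):
    """Pool language-tagged elements from every record in a work set."""
    work = {
        "subjects": defaultdict(list),
        "notes": defaultdict(list),
        "holdings": 0,
    }
    for rec in records:
        for field in ("subjects", "notes"):
            for lang, text in rec.get(field, []):
                if text not in work[field][lang]:  # skip verbatim repeats
                    work[field][lang].append(text)
        # Simplification: a real system would consolidate holdings per library.
        work["holdings"] += rec.get("holdings", 0)
    return work


def display(work, preferred, fallback="eng"):
    """Pick elements in the user's language, falling back when none exist."""
    view = {"holdings": work["holdings"]}
    for field in ("subjects", "notes"):
        by_lang = work[field]
        view[field] = by_lang.get(preferred) or by_lang.get(fallback) or []
    return view


# Illustrative work set: one sparse German record, one richer hybrid record.
records = [
    {"id": "a", "holdings": 12, "subjects": [("ger", "Fischerei")], "notes": []},
    {
        "id": "b",
        "holdings": 40,
        "subjects": [("eng", "Fisheries"), ("fre", "Pêche")],
        "notes": [("eng", "Translated from the French.")],
    },
]
view = display(build_work_metadata(records), preferred="ger")
print(view)
```

Here a German-speaking user sees the German subject heading from record `a`, the English note from record `b` as a fallback, and holdings drawn from both records, rather than a single “most appropriate” record.
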
    1. The world’s libraries. Connected. Multilingual WorldCat, presented by Janifer Gatenby, IFLA, Singapore, 2013-08-19. Karen Smith Yoshimura, Eric Childress, Janifer Gatenby, Jean Godby, Richard Greene, Jenny Toves, Diane Vizine Goetz, Robert Bremer, JD Shipengrover, Gail Thornburg, Jay Weitz
    2. WorldCat Today • Resources in nearly all languages • Contributed by more than 20,000 libraries worldwide • More than half the database is for works not in English
    3. WorldCat Today • Bibliographic Records • Hybrid records • Parallel records • Clustered at Work level (FRBR)
    4. Existing Architecture [diagram labels: Bibliographic record; Authors; Subj/Classif; Holdings; Work cluster; Content cluster; Manifestation cluster]
    5. Complementary Initiatives • Work Level Record • GLIMIR Manifestation & Content Clusters • Multi-lingual Bibliographic Structure
    6. Work Level Record
    7. Work Level Record: Objective • Create a landing page summarizing content for a work
    8. GLIMIR • The Content Cluster • Enables better work record displays by reducing the number of lines that display for large works • Enables a choice of format and presents the formats that could be acceptable substitutes • Consolidates holdings for identical content • The Manifestation Cluster is important • Consolidates holdings at manifestation level • In the short term allows the record catalogued in the language of the interface to be chosen for display • Reduces apparent duplication • Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)
    9. Multilingual Bibliographic Structure Project • Creates true multi-lingual displays • At work and manifestation levels • Using all available data instead of “most appropriate record” • Generates data • Corrects many of the 28 million records coded “und” • Better control and linking of translations • Input to refinement of work clusters • Smarter data storage
    10. “Most appropriate” questioned • The system selects the most appropriate record to show to a user as representative of the work in the short result list and beyond • The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why
    11. Which record is better to present to a German speaker?
    12. Incomplete Swedish Record
    13. Hybrid record
    14. Most appropriate display
    15. Multilingual Bibliographic Structure Project • Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through the short to fuller versions of the metadata • End user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records
    16. Proposed new architecture [diagram labels: Work; Authors, Subj/Classif, Notes, Contents and Translations (language of work), each tagged eng, fre, ger, jpn; Manifestations (eng, fre); Holdings]
    17. Important principles • Language tagging of elements, particularly • Summaries (MARC 21 field 520) • Subject headings • Display in script preferred by the user if data is available • Improve translated interfaces • Show consolidated holdings as appropriate
    22. Translations
    23. Great works are translated • The cream of the world’s cultural and knowledge heritage is shared by being translated • WorldCat contains many rich cataloguing records for these translations • GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data
    24. Translations • Inconsistencies causing work clusters to be incomplete & less than optimal search results • Titles without subtitles • Different forms of uniform title or missing uniform title • Inverted title • Different coding of original and translated information • Generated uniform title authority records will overcome most of these differences without needing to edit individual records
    25. Generate uniform title authority records • Improve FRBR work groups • Made by data mining • Contribute to VIAF • Diffuse via VIAF as linked data • Possibility to create web page / web service
    27. Translation records in VIAF • Will enrich VIAF significantly • New elements - translated title and translator
        Author    Title                 Expressions in VIAF   Translation count in WorldCat
        Atwood    Blind assassin        8                     31
        Guevara   Notas de viaje        0                     11
        Hawking   Grand design          0                     18
        Lenard    Grosse naturforscher  1                     3
        Loti      Pêcheur d’Islande     1                     31
    28. Diffusion of Translation records • Records are freely available to the world from VIAF in • MARC-21 • XML • RDF (linked data) • Just links in JSON • And other formats as introduced
    29. We don’t know now, but soon will • # of manifestations as opposed to # of records • # of works that have translations • Top translated authors and works • And more
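
The title inconsistencies listed on the Translations slide (missing subtitles, inverted titles with the author's name placed first) are what break work clustering. The sketch below shows one simple way such variants could be reduced to a common grouping key. It is a hypothetical heuristic for illustration only, not the method used to generate the uniform title authority records; the function names, the subtitle split on `:` and the example titles are all assumptions.

```python
import re
from collections import defaultdict


def title_key(title: str, author_surname: str = "") -> tuple:
    """Build a grouping key that ignores subtitle, word order and author name."""
    main = title.split(":", 1)[0]  # drop anything after the first colon
    tokens = re.findall(r"\w+", main.lower())
    drop = {author_surname.lower()} if author_surname else set()
    return tuple(sorted(t for t in tokens if t not in drop))


def group_titles(titles, author_surname=""):
    """Group title strings that share the same key."""
    groups = defaultdict(list)
    for title in titles:
        groups[title_key(title, author_surname)].append(title)
    return dict(groups)


# Illustrative variants of one translated title, not actual WorldCat records.
variants = [
    "Autobiographie : ou mes expériences de vérité",
    "Autobiographie",
    "Gandhi, autobiographie",
]
groups = group_titles(variants, author_surname="Gandhi")
print(groups)
```

All three variants fall into a single group under this key. A production approach would need far more care, for example with short or generic titles where such an aggressive key would merge unrelated works.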