Cornell20080516

Published in: Education, Technology

  1. 1. Metadata Normalization: A Case Study in Primo -- and -- Linked Open Data in Libraries
  2. 2. Topical Overview <ul><li>Non-OPAC Discovery Systems </li></ul><ul><li>ILS - Discovery System Interoperability </li></ul><ul><li>SFX Optimization </li></ul><ul><li>Metadata Normalization </li></ul><ul><li>Ex Libris’ Primo - A Case Study </li></ul><ul><ul><li>Front end and System Overview </li></ul></ul><ul><ul><li>Primo - System Demo </li></ul></ul><ul><ul><li>Primo - NYU’s Implementation </li></ul></ul>
  3. 3. Topical Overview <ul><li>Primo (Cont) </li></ul><ul><ul><li>Metadata and Data Analysis </li></ul></ul><ul><ul><li>Challenges and Possibilities </li></ul></ul><ul><li>Common Models and Linked Open Data </li></ul><ul><ul><li>An Alternative Approach </li></ul></ul><ul><li>Data Harmonization Benefits </li></ul><ul><ul><li>Authority Control </li></ul></ul><ul><ul><li>Application Profiles </li></ul></ul><ul><ul><li>A possible future for Bibliographic Data </li></ul></ul>
  4. 4. Not an OPAC Replacement <ul><li>Primo, Endeca, Encore, AquaBrowser, LibraryFind, VuFind, WorldCat Local </li></ul><ul><li>Not OPAC Replacements </li></ul><ul><ul><li>More seamless discovery </li></ul></ul><ul><ul><li>“Web 2.0” </li></ul></ul><ul><ul><li>Fewer Clicks - ease of use </li></ul></ul><ul><ul><li>Cross-depository discovery </li></ul></ul>
  5. 5. Encore by III <ul><li>“Encore goes beyond the online-catalog model to provide a better patron experience that leverages library content and patron-contributed information. Key features include: </li></ul><ul><ul><li>Faceted search by multiple parameters </li></ul></ul><ul><ul><li>RightResult™ relevance-ranking </li></ul></ul><ul><ul><li>Real-time holdings and status information </li></ul></ul><ul><ul><li>Suggested links to content related to the user's search” </li></ul></ul><ul><li>http://www.iii.com/encore/main_index2.html </li></ul>
  6. 6. AquaBrowser <ul><li>“Whatever it is, wherever it is, patrons can quickly and easily find it using a single interface for all types and formats of content. Visually represented and faceted search results allow your patrons to search and discover information faster and more effectively. Relevant search results help them find answers fast. Word clouds encourage exploration and discovery. Facets help to quickly focus the results.” </li></ul><ul><li>http://www.aquabrowser.com/products/academic/ </li></ul>
  7. 7. Endeca <ul><li>“Endeca for Libraries is the most effective way for members of the library community to find the book or resource they need and to discover new information they didn't even know the library owned, which drives increased usage of the library's resources, usage of legacy library collections, and re-circulation.” </li></ul><ul><li>http://endeca.com/byIndustry/media/index.html </li></ul>
  8. 8. Primo <ul><li>“Interfacing seamlessly with library applications from Ex Libris and other vendors … management of all types of library resources, regardless of format and location.” </li></ul><ul><li>“Find it all. Find it Easily. Get it” </li></ul><ul><li>http://www.exlibrisgroup.com/category/PrimoOverview </li></ul>
  9. 9. WorldCat Local <ul><li>“A localized version of WorldCat with custom branding and relevancy ranking </li></ul><ul><li>… interoperates with your existing ILS and fulfillment systems … </li></ul><ul><li>Single-search, multilingual interface for all physical and electronic content held locally or in remote locations </li></ul><ul><li>Integrated access to the most appropriate delivery options” </li></ul><ul><li>http://www.oclc.org/worldcatlocal/default.htm </li></ul>
  10. 10. One Problem, Many Solutions <ul><li>Users want a more seamless discovery experience </li></ul><ul><li>Libraries get locked on the 2.0 buzz </li></ul><ul><ul><li>Tagging, Reviews, Recommendations </li></ul></ul><ul><ul><li>Improved Relevancy Ranking </li></ul></ul><ul><li>Other goals may be more important </li></ul><ul><ul><li>Fewer clicks to fulfillment </li></ul></ul><ul><ul><li>Cross-Depository Discovery </li></ul></ul>
  11. 11. More Intuitive Searching <ul><li>Less complicated initial searches </li></ul><ul><li>Less pre-search limiting </li></ul><ul><li>More post-search limits via faceting </li></ul><ul><li>Appropriate Delivery bubbles up </li></ul><ul><li>Trade-offs… </li></ul>
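The shift from pre-search limiting to post-search faceting can be sketched in a few lines. This is only an illustration under assumed data: the records, field names (`format`, `library`), and both helper functions are hypothetical and are not taken from Primo or any of the other products named here.

```python
from collections import Counter

# Hypothetical result set; field names and values are illustrative only.
results = [
    {"title": "Jackson papers", "format": "Archival material", "library": "Tamiment"},
    {"title": "Freedomways", "format": "Journal", "library": "Bobst"},
    {"title": "Freedomways reader", "format": "Book", "library": "Bobst"},
]


def facet_counts(records, field):
    """Count how many records carry each value of one facet field."""
    return Counter(r[field] for r in records if field in r)


def apply_facet(records, field, value):
    """Narrow a result set after the search has already run."""
    return [r for r in records if r.get(field) == value]


library_facet = facet_counts(results, "library")
books_only = apply_facet(results, "format", "Book")
```

The user starts with one broad query, sees the counts per facet value, and narrows afterwards, rather than choosing limits before knowing what the result set contains.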
  12. 12. Not in your OPAC <ul><li>DLF ILS Discovery Interface Task Force </li></ul><ul><li>From the “Berkeley Accords”: </li></ul><ul><ul><li>“participants agreed to support a set of essential functions through open protocols and technologies by deploying specific recommended standards” </li></ul></ul><ul><ul><li>Harvesting, Availability, Linking </li></ul></ul>
  13. 13. Availability and Access <ul><li>Links to full resource at the front </li></ul><ul><li>Carefully considered SFX options </li></ul><ul><li>Record FRBRization and Dedup </li></ul><ul><li>Availability statements in real time (or as close as possible) </li></ul>
  14. 14. Open URL From Primo Aleph Record Aleph Record(s) and/or other data deduped Other Data Source <ul><li>Query: </li></ul><ul><li>Other NYU Cat </li></ul><ul><li>WorldCat </li></ul><ul><li>Link both to ISBN/ISSN </li></ul><ul><li>Search </li></ul><ul><li>Query: </li></ul><ul><li>Aleph (link to Holdings) </li></ul><ul><li>Other NYU Cat </li></ul><ul><li>WorldCat </li></ul><ul><li>Link Others to ISBN/ISSN </li></ul><ul><li>Search </li></ul><ul><li>Query: </li></ul><ul><li>Aleph (link to Holdings) </li></ul><ul><li>Other NYU Cat </li></ul><ul><li>WorldCat </li></ul><ul><li>Link Others to ISBN/ISSN </li></ul><ul><li>Search </li></ul>
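Linking out on ISBN/ISSN usually means handing the link resolver an OpenURL. A minimal sketch of building one for a book follows, using OpenURL 1.0 key/encoded-value pairs. The resolver address is a placeholder (a real SFX instance has its own base URL), and `book_openurl` is a hypothetical helper, not part of Primo or SFX.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute the local link resolver's base URL.
RESOLVER_BASE = "https://resolver.example.edu/sfx"


def book_openurl(isbn, title=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV query for a book."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",
        "rft.isbn": isbn,
    }
    if title:
        params["rft.btitle"] = title
    # urlencode percent-encodes reserved characters in the values.
    return RESOLVER_BASE + "?" + urlencode(params)


link = book_openurl("9780140449136", title="Example title")
```

Which targets the resolver then offers (the local catalog, other catalogs, WorldCat) is configured on the resolver side, which is where the "carefully considered SFX options" come in.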
  15. 15. Primo: A Case Study <ul><li>Normalization Rules </li></ul><ul><li>Delivery templates </li></ul><ul><li>Tight SFX and MetaLib Integration </li></ul><ul><li>“Pipes” for different data sources </li></ul><ul><li>Hourly Availability Checking </li></ul><ul><ul><li>(Real Time in Version 2.0) </li></ul></ul>
  16. 16. Harvesting <ul><li>Different Data Sources </li></ul><ul><li>Different Normalization Rules </li></ul><ul><li>All standardized on Primo Normalized XML (PNX) Record </li></ul><ul><ul><li>Very Flat, sections corresponding to Primo Functionality </li></ul></ul>
  17. 17. The PNX Record <ul><li>Display Section </li></ul><ul><li>Links Section </li></ul><ul><li>Search Section </li></ul><ul><li>Sort Section </li></ul><ul><li>Facets Section </li></ul><ul><li>Dedup Section </li></ul><ul><li>FRBR Section </li></ul><ul><li>Delivery Section </li></ul>
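The "very flat, one section per function" shape can be pictured with a small data structure. This is a sketch only: the eight section names come from the slide, but the class name, the field names inside each section, and the helper are hypothetical and do not reproduce the actual PNX schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PnxLikeRecord:
    """Illustrative stand-in for a PNX-style record: one flat section per function."""
    display: dict = field(default_factory=dict)
    links: dict = field(default_factory=dict)
    search: dict = field(default_factory=dict)
    sort: dict = field(default_factory=dict)
    facets: dict = field(default_factory=dict)
    dedup: dict = field(default_factory=dict)
    frbr: dict = field(default_factory=dict)
    delivery: dict = field(default_factory=dict)


def sections_in_use(record):
    """List the sections that a normalization rule actually populated."""
    return [f.name for f in fields(record) if getattr(record, f.name)]


# Field names inside the sections are invented for the example.
record = PnxLikeRecord(
    display={"title": ["James E. Jackson and Esther Cooper Jackson papers"]},
    search={"title": ["james e jackson and esther cooper jackson papers"]},
    facets={"resource_type": ["archival material"]},
)
```

The same value may appear in several sections in different forms (display string, search string, facet value), which is what makes the record flat and tied to one system's functions.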
  18. 18. Data Sources at NYU <ul><li>BobCat (Aleph) </li></ul><ul><li>MarcIt! </li></ul><ul><li>EAD Records (Archivists Toolkit) </li></ul><ul><li>Preservation Repository </li></ul><ul><li>Faculty Digital Archive (IR) </li></ul><ul><li>Art Images (Luna Insight) </li></ul><ul><li>MetaLib Resources </li></ul><ul><li>Data in SOLR </li></ul><ul><ul><li>Newspaper Index </li></ul></ul><ul><ul><li>Data Sets </li></ul></ul>
  19. 19. Issues and Challenges <ul><li>Managing Deduplication </li></ul><ul><ul><li>Dedup Data only out of box for MARC </li></ul></ul><ul><ul><li>Writing for OAI-PMH sources (EAD) </li></ul></ul><ul><li>Consortial Environment(s) </li></ul><ul><li>Appropriate Delivery Options </li></ul><ul><li>“Interpreting” Metadata </li></ul>
  20. 20. EAD Records <ul><li>Archivists Toolkit </li></ul><ul><ul><li>Previously in Access, Notepad, Excel </li></ul></ul><ul><ul><li>Authority Control (sort of) </li></ul></ul><ul><li>OAI-PMH Overlay </li></ul><ul><li>Multiple layers of Crosswalking </li></ul><ul><li>Deduping </li></ul>
  21. 21. EAD / Aleph Dedup <ul><li>Aleph Title: </li></ul><ul><ul><li>James E. Jackson and Esther Cooper Jackson papers </li></ul></ul><ul><li>EAD Title: </li></ul><ul><ul><li>Guide to the James E. Jackson and Esther Cooper Jackson papers 1917-2004 (Bulk 1937-1992) Tamiment 347 </li></ul></ul>
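The two titles describe the same collection but will never match as strings. One way to make them comparable is to reduce each to a match key before deduplication. The sketch below is an assumption about how such a rule could look; the regular expressions are tailored to this one example and are not Primo's dedup algorithm.

```python
import re


def title_match_key(title):
    """Reduce a title to a comparison key (illustrative rules, not Primo's)."""
    key = title.lower()
    key = re.sub(r"^guide to the\s+", "", key)       # finding-aid prefix
    key = re.sub(r"\(bulk[^)]*\)", " ", key)          # bulk date note
    key = re.sub(r"\b\d{4}(-\d{4})?\b", " ", key)     # years and year ranges
    key = re.sub(r"\btamiment\s+\d+\b", " ", key)     # collection number
    key = re.sub(r"[^a-z0-9 ]", " ", key)             # remaining punctuation
    return " ".join(key.split())                      # collapse whitespace


aleph_title = "James E. Jackson and Esther Cooper Jackson papers"
ead_title = (
    "Guide to the James E. Jackson and Esther Cooper Jackson papers "
    "1917-2004 (Bulk 1937-1992) Tamiment 347"
)
same_collection = title_match_key(aleph_title) == title_match_key(ead_title)
```

Rules this specific have to be written and maintained per source, which is part of why dedup for non-MARC data is listed among the challenges.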
  22. 22. MARC + EAD EAD Record Aleph Record Authority Records MARC Record w/ Auth Data OAI-DC Record w/ FT of EAD EAD PNX Aleph PNX Dedup PNX
  23. 23. Value of Dedup <ul><li>Indexing the Best of Both Worlds </li></ul><ul><li>EAD Records: </li></ul><ul><ul><li>Inventory </li></ul></ul><ul><ul><li>Long Biographical / Historical Notes </li></ul></ul><ul><li>MARC Data: </li></ul><ul><ul><li>Cross References for Access Points </li></ul></ul>
  24. 24. Why is it so hard? <ul><li>Continual Repetition of Effort </li></ul>
  25. 26. A Distinction <ul><li>Metadata Harmonization: </li></ul><ul><ul><li>the “ability to use several different metadata standards in a single software system.” </li></ul></ul><ul><li>Metadata Normalization: </li></ul><ul><ul><li>mapping several different metadata standards to a single schema or structure for use in a single software system. </li></ul></ul>
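Normalization in this sense is a crosswalk per source onto one target schema. A minimal sketch, assuming two simplified source shapes: the crosswalk tables, the field labels such as `245a`, and the `normalize` function are all hypothetical and much cruder than real MARC or Dublin Core handling.

```python
# Hypothetical crosswalks: source field -> target field in one shared schema.
CROSSWALKS = {
    "marc": {"245a": "title", "100a": "creator", "650a": "subject"},
    "dc": {"dc:title": "title", "dc:creator": "creator", "dc:subject": "subject"},
}


def normalize(record, source_format):
    """Map one source record onto the single target schema."""
    mapping = CROSSWALKS[source_format]
    out = {}
    for source_field, values in record.items():
        target = mapping.get(source_field)
        if target is None:
            continue  # fields without a mapping are dropped
        out.setdefault(target, []).extend(values)
    return out


marc_record = {"245a": ["Jackson papers"], "999z": ["local note"]}
dc_record = {"dc:title": ["Jackson papers"], "dc:creator": ["Jackson, James E."]}
from_marc = normalize(marc_record, "marc")
from_dc = normalize(dc_record, "dc")
```

Every new source needs its own crosswalk, and anything the target schema cannot express is lost. Harmonization, by contrast, would keep each record in its own standard and let the system work with all of them.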
  26. 27. MARC + EAD EAD Record Aleph Record Authority Records MARC Record w/ Auth Data OAI-DC Record w/ FT of EAD EAD PNX Aleph PNX Dedup PNX
  27. 28. Linked Open Data <ul><li>Use URIs as names for things </li></ul><ul><li>Use HTTP URIs so that people can look up those names. </li></ul><ul><li>When someone looks up a URI, provide useful information. </li></ul><ul><li>Include links to other URIs, so that they can discover more things. </li></ul><ul><li>http://www.w3.org/DesignIssues/LinkedData.html </li></ul>
  28. 29. Primo is NOT Linked Data <ul><li>List of nearly a dozen sources, some “normalized” more than once </li></ul><ul><li>“Normalized” into another proprietary format, used by one system </li></ul><ul><li>Additional Resources require additional pipes </li></ul>
  29. 30. Linked Library Data <ul><li>Resources get URIs early in lifecycle </li></ul><ul><li>Properties get URIs </li></ul><ul><li>Vocabularies get URIs </li></ul><ul><li>Everything is dereferenceable as to its meaning </li></ul>
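What "resources and properties get URIs" means in practice can be sketched with plain subject/predicate/object triples. This is a toy model under stated assumptions: every URI under `example.org` is hypothetical, the Dublin Core Terms and FOAF property URIs are real vocabularies, and the two helper functions only imitate in memory what dereferencing a URI over HTTP would return.

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical resource and agent URIs, used only for illustration.
PAPERS = "http://example.org/resource/jackson-papers"
AGENT = "http://example.org/agent/jackson-james-e"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (PAPERS, DCTERMS + "title", "James E. Jackson and Esther Cooper Jackson papers"),
    (PAPERS, DCTERMS + "creator", AGENT),
    (AGENT, FOAF + "name", "Jackson, James E."),
}


def describe(uri, graph):
    """Return every (property, value) pair stated about one URI."""
    return {(p, o) for s, p, o in graph if s == uri}


def follow(uri, prop, graph):
    """Follow one property from a resource to the things it links to."""
    return [o for s, p, o in graph if s == uri and p == prop]


creators = follow(PAPERS, DCTERMS + "creator", triples)
creator_facts = describe(creators[0], triples)
```

Because the creator is a URI rather than a string, a second source describing the same agent can link to it instead of being normalized into yet another copy.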
  30. 31. Conclusions <ul><li>DCMI/RDA Work </li></ul><ul><li>NSDL Registry Work </li></ul><ul><li>LC Registry Work </li></ul><ul><li>MODS as RDF (Simile & LC) </li></ul><ul><li>OAI-ORE </li></ul><ul><li>OAI2LOD </li></ul>
  31. 32. Conclusions <ul><li>This stuff is happening </li></ul><ul><li>We need to be playing with it </li></ul><ul><li>We need to be applying lessons from projects like Primo to it </li></ul><ul><li>Library Data is a key component! </li></ul>
  32. 33. … and Library data is extremely complicated
  33. 34. MARC Record Graph <ul><li>Does not include authority data </li></ul><ul><li>Coins new URIs for any non-literal value </li></ul><ul><li>Contains a few minor modeling errors </li></ul><ul><ul><li><modsrdf:Publisher modsrdf:value=&quot;Crowell&quot; rdf:about=&quot;http://simile.mit.edu/2006/01/publisher/Crowell&quot;> </li></ul></ul><ul><ul><li><modsrdf:location> </li></ul></ul><ul><ul><li><modsrdf:Place modsrdf:name=&quot;New York&quot; </li></ul></ul><ul><ul><li>rdf:about=&quot;http://simile.mit.edu/2006/01/place/marccountry/nyu&quot;/> </li></ul></ul><ul><ul><li></modsrdf:location> </li></ul></ul><ul><ul><li></modsrdf:Publisher> </li></ul></ul>
  34. 35. Thanks! <ul><li>Questions? </li></ul><ul><li>[email_address] </li></ul><ul><li>212.998.2479 </li></ul>