Book Discovery in a Mass Digitized Environment

A slightly expanded version of the talk Heather and I gave at the Fall 2007 DLF Forum.

  • Speaker note: Introduce the CDL/UC context (GBS, MSFT, OCA) and the WorldCat Local pilot. In this presentation we will:
    • give a little background on our motivations for looking at book discovery interfaces
    • share our evaluation of the interfaces
    • share the insights we gained from the evaluation
    • speculate about the future of book discovery
    Please hold your questions until the end.

    1. Book Discovery in a Mass Digitized Environment
        Heather Christenson, Mass Digitization Project Manager, CDL
        Steve Toub, Bibliographic Services Strategist, CDL
    2. Motivations
        • An interesting thought experiment: could interfaces to mass digitized collections replace our OPACs?
        • A starting point, and an excuse to get familiar with our mass digitized collections
    3. Research Questions
        • What are the strengths and weaknesses of leading book discovery interfaces?
        • What is the best user experience for book discovery tasks?
        • What’s gained and lost by replacing our (next-generation) catalog entirely with a full-text repository?
    4. Sites we chose to evaluate
        • Best-of-breed next-generation catalogs
        • Best-of-breed non-library book discovery systems
        • Interfaces to mass digitized collections
    5. Methodology
        • Identified and ranked core features for evaluation
        • Attempted to simulate the tasks, query syntax, and attention span of a typical undergraduate
        • Evaluated some features related to discovery and integration that are of interest to librarians
        • Our experience in interface design, and evaluation criteria we have used in the past, have shaped our perspective
        • Not systematic, not comprehensive
    6. Tasks
        • Find known titles, authors
        • Subject searching
        • Winnow results
        • Choose specific edition; compare
        • Evaluate the item
        • Evaluate the digital item
        • Recommendations: more like this
        • Obtain a book for local use
        • Find references to quotes, facts
    7. Ratings used
        • Room to improve ★
        • Below par ★★
        • Getting there ★★★
        • Very good ★★★★
        • Everything you could expect to have ★★★★★
    8. Find known titles, authors
        • Find a known title
          • Search terms: Sierra Club Green Guide
          • Search terms: What Would Jesus Do
          • Search terms: 1984 Orwell
          • Search terms: Sartre Nausea
        • Find that book where David Sedaris tells stories about his life in France
          • Search terms: sedaris france
        • Find recent books by David Sedaris
          • Search terms: david sedaris
    9. Find known titles, authors
        • Spotty coverage; full-text hinders ★
        • Accurate, but hard to select ★★★
        • If target isn’t first, facets help ★★★
        • Target is usually first ★★★★
        • Target is usually first ★★★★
        • Target is usually first ★★★★
        • Great relevance; compact display ★★★★★
    10. Subject searching
        • Find books on peak oil
        • Find a history of plutonium production at the Hanford Atomic Facility
        • Find a biography of John Philip Sousa
    11. Subject searching
        • Poor coverage; full text hurts ★
        • Not great ★★
        • Decent, full text hurts ★★★
        • Lack of combined index hurts ★★★
        • Better than average ★★★★
        • Better than average ★★★★
        • Great relevance ★★★★★
    12. Winnow results
        • To what extent does the site allow narrowing, refining, and sorting of results?
        • Are the methods effective?
        • Are the methods intuitive?
    13. Winnow results
        • No facets or sorting ★
        • No sorting; facets need work ★★
        • On the right track ★★★
        • Facet values are a grab bag ★★★
        • Tags galore (from tag search) ★★★
        • Good ★★★
        • Excels ★★★★
    14. Choose specific edition; compare
        • Find the best critical edition of Hamlet
          • Harold Jenkins’s Arden edition
        • Find the definitive critical edition of Huckleberry Finn
          • UC Press, 2003
        • Find the definitive Elvis Presley biography
        • Find a good biography of John Philip Sousa
        • Find a good book on peak oil
    15. FRBR doesn’t help me compare
    16. Choose specific edition; compare
        • Even if complete, hard to compare ★
        • Hard to choose among editions ★
        • Hard to choose among editions ★
        • Some good, some less so ★★★
        • Decent; facets help somewhat ★★★
        • Decent; compare tool concept is nice ★★★★
        • Decent; number of holdings helps ★★★★
    17. Evaluate the item
        • Do I want to obtain this book?
        • What tools or features does each site offer to help me evaluate its items?
          • Cover art
          • Traditional descriptive metadata
          • Published reviews
          • User-generated reviews and rankings
          • Table of contents, index, book jacket
    18. Evaluate the item
        • Brief records only ★
        • Brief records; attempt at reviews ★
        • A traditional OPAC in this area ★★
        • Little more than a regular OPAC ★★
        • Some machine-generated metadata ★★
        • Active community yields results ★★★★
        • What more would you want? ★★★★★
    19. Evaluate the digital item
        • Full text is not natively online in:
          • LibraryThing, NCSU, U. Washington
        • Copyright status affects levels of access
        • What tools are there on top of the full text to help me evaluate the item?
    20. Experimentation: full-text access
    21. Evaluate the digital item
        • No full text there ★
        • No full text there ★
        • No full text there ★
        • Good ★★★
        • Good ★★★
        • Intuitive navigation ★★★
        • Replicates physical experience ★★★★
    22. Recommendations: more like this
        • Can the system recommend other works similar to this one (in ways other than just hyperlinking subject headings)?
          • Are these recommended works relevant?
        • Examples
          • The Wisdom of Crowds
          • A Confederacy of Dunces
          • Information Architecture for the World Wide Web
          • Jesus Before Christianity
    23. Recommendations: more like this
        • No attempt to recommend ★
        • No attempt to recommend ★
        • No attempt to recommend ★
        • Not much better than nothing ★★
        • OK; not always there! ★★★
        • Many options, composite results ★★★★
        • Many options, high quality ★★★★
    24. Obtain a book for local use
        • How quick and easy is it to obtain a particular book, or portions of the book, in either digital or print form?
          • View online, download, print on demand
          • Borrow, swap, buy
        • How does the interface present availability?
          • Can I limit results to only those items that are available to me?
    25. Obtain a book for local use
        • Buy, buy, buy ★
        • Limited to downloading the full book ★★
        • Find at NCSU, borrow (ILL) ★★★
        • Buy, find in a library, download book ★★★
        • Many variations on download ★★★
        • Find in a library, borrow (ILL) ★★★
        • Buy, find in library, link to swap ★★★
    26. Find references to quotes, facts
        • Quotes
          • Life's but a walking shadow, a poor player / That struts and frets his hour upon the stage / And then is heard no more. It is a tale / Told by an idiot, full of sound and fury, / Signifying nothing.
          • Ol' man river, / Dat ol' man river / He mus' know sumpin' / But don't say nuthin', / He jes' keeps rollin' / He keeps on rollin' along.
        • References to the size of Rhode Island
        • Population of Nepal in 1990
        • When is Tajikistan’s Constitution Day?
    27. Find references to quotes, facts
        • No full text; luck not very likely ★
        • No full-text indexing across more than one book! ★
        • You get lucky occasionally ★★
        • You get lucky occasionally ★★
        • You get lucky occasionally ★★
        • Full-text indexing across books ★★★
        • “Popular passages” is a potpourri ★★★
    28. Linkability
        • Tasks
          • Can I link to a work?
          • Can I link to an expression?
          • Can I link within an item?
          • What identifiers are in use?
        • Results
          • No visible guarantees of persistent URLs
          • No standard for work-level identifiers
          • Some ability to link within an item
    29. LT puts thought into linkability
    30. Clips in Google Book Search
    31. Linkability
        • Opaque identifiers in ugly URLs ★
        • Text strings in URLs (OL vs. IA) ★
        • System ID of underlying ILS ★
        • ISBN option in URL; p= ★★★
        • ISBN option in URL; clips ★★★★
        • ISBN, OCLC No. in URL; loc= ★★★★
        • ISBN option in URL → Work ID ★★★★
    32. API access
        • Tasks
          • Can I develop remote applications that display bib, holdings, and item records?
          • Do I have the ability to perform ad hoc data or text mining operations on the full text?
        • Comments
          • Not a strong point of traditional ILSs
          • ILS-DI work is ongoing; how to give it teeth?
          • Intellectual property issues limit the ability to provide open access to everyone for everything
    33. API access
        • Complete API promised ★★★
        • Complete, documented API ★★★★
        • Complete, documented API ★★★★
        • None announced ★
        • None announced ★
        • xISBN, xISSN; more soon? ★★
        • thingISBN, LT for Libraries ★★
    34. Linking to mass dig from OPACs: no way to batch load yet
        • Vigilante efforts to harvest GBS URLs
          • John Blyberg (then at AADL) blocked in August 2006
          • Tim Spalding (LibraryThing) voluntarily stopped in Sep 2007 after a bookmarklet collected >250,000
          • In both cases, Google communicated interest in a better solution
        • Other cowboy efforts to link to books from OPACs
          • Jackie Wrosch (Eastern Michigan U.) developed JavaScript that polls GBS for an OCLC number
          • Jan Szczepanski (Göteborg U.) has personally selected and cataloged 17,000 eBooks
        • IA exposes all content from each book page
          • Is it possible to download in bulk?
    35. Linking to mass dig from OPACs
        • Formal efforts by individual libraries
          • U. Michigan links to its GBS books in its catalog by loading identifiers into the second call number field of the item record
          • UIUC links to its OCA books by creating a separate bib record for the e-format and loading that into its catalog
          • Anyone else?
        • Formal programs across libraries
          • OCLC’s synchronization program with interested mass digitization programs begins its pilot soon
          • Bowker?
    36. Strengths, weaknesses…
        • Amazon has the most relevant hits; LT is second
          • Results displays in Amazon and LibraryThing are the most useful, though very different
        • A breakthrough ranking algorithm like PageRank isn’t yet available for books
        • You can choose either winnowing or access to full text, but, unfortunately, not both
          • Not all facet implementations are created equal
        • Microsoft, Open Library not yet polished
    37. Strengths, weaknesses…
        • The breadth and depth of LibraryThing tags and community are amazing
          • Especially compared to the relative lack of tags in Amazon, and the paucity of user-generated content in WorldCat and the Internet Archive
        • Ability to compare books isn’t mature
          • An interface that groups editions doesn’t necessarily provide tools to choose among editions
        • Amazon metadata display: broad, dense
        • Full-text displays still relatively immature
    38. Best book discovery experience
        • Amazon and LibraryThing lead the way in user experience for book discovery tasks
          • Proven track records of continuous innovation
        • NCSU, Google, and U. Washington
          • All compete favorably with a traditional OPAC
        • Internet Archive (and the Open Library project) and Microsoft have the most room to grow
          • Hard to compare these to a traditional OPAC
    39. What if we replaced our OPACs?
        • Gains
          • Fast access to full text (of out-of-copyright items)
          • Improved ability to answer questions you can’t answer in an OPAC
        • Losses
          • Using metadata’s power to winnow and evaluate
          • Nice display of multi-volume works (e.g., serials)
        • Instead of replacing the OPAC with GBS, MSFT, or IA
          • Replacing the OPAC with Amazon or LibraryThing might better serve your users today
    40. What to watch as things evolve
        • Non-traditional metadata based on full-text analytics
          • Example: recommendations based on full-text occurrences of Statistically Improbable Phrases
        • Better integration of analog filtering and social networks into online book discovery services
          • Web architecture for identity (OpenID?), attention (APML?), and trust (OpenSocial?) will have an impact
        • Innovations in delivery have the potential to disrupt traditional library delivery services
          • Swapping and print on demand
    41. When book discovery services talk to each other in the background
    42. …who will control the interface?
    43. Barriers to perfect book discovery
        • Economic and political barriers are the most difficult
          • Competition among those with power
            • Google, OCLC, Amazon, Bowker, Ingram
          • Economic incentives to build an open commons
            • Who pays for utilities that benefit all?
            • Especially if the benefits are invisible to library patrons
          • Fear of loss of local control
          • Risk-averse nature of librarians
          • Agreement on which identifiers to use, or on who owns the master lookup database
        • Tech issues are hard, but less of a barrier
          • Equivalent of PageRank for books
          • How to leverage identity, attention, and trust
    44. Questions?
        • heather.christenson@ucop.edu
        • [email_address]
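The "Choose specific edition; compare" slides observe that grouping editions (FRBR-style) does not by itself help a reader choose among them, and that holdings counts were one useful signal. Below is a minimal Python sketch of that idea, not the method of any evaluated site: it clusters edition records under a crude, hypothetical work key (normalized author surname plus title without punctuation or a leading article) and lists the most widely held edition first in each cluster. The `Edition` records, publishers, and holdings counts are made up for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Edition:
    title: str
    author: str      # "Surname, Forename" form is assumed
    publisher: str
    year: int
    holdings: int    # number of holding libraries (toy values below)


def work_key(ed):
    """Hypothetical work-level key: author surname + title minus punctuation and a leading article."""
    title = re.sub(r"[^a-z0-9 ]", "", ed.title.lower()).strip()
    title = re.sub(r"^(the|a|an) ", "", title)
    surname = ed.author.split(",")[0].strip().lower()
    return (surname, title)


def group_by_work(editions):
    """Cluster editions by work key; the most widely held edition comes first in each cluster."""
    works = defaultdict(list)
    for ed in editions:
        works[work_key(ed)].append(ed)
    for cluster in works.values():
        # Higher holdings first; ties broken by earlier year.
        cluster.sort(key=lambda e: (-e.holdings, e.year))
    return dict(works)


# Toy records; publishers and holdings are placeholders, not real data.
EDITIONS = [
    Edition("Hamlet", "Shakespeare, William", "Press A", 1982, 40),
    Edition("Hamlet.", "Shakespeare, William", "Press B", 1923, 12),
    Edition("Adventures of Huckleberry Finn", "Twain, Mark", "Press C", 2003, 25),
    Edition("The Adventures of Huckleberry Finn", "Twain, Mark", "Press D", 1985, 30),
]

for key, cluster in group_by_work(EDITIONS).items():
    print(key, [(e.publisher, e.year, e.holdings) for e in cluster])
```

In this toy data the most widely held edition sorts first regardless of its scholarly standing, which is the gap the slides describe between grouping editions and actually helping someone choose among them.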

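The "Find references to quotes, facts" ratings separate sites that index full text across books from those that do not. A minimal sketch of what cross-book phrase search involves, assuming a simple positional inverted index (not how any of the evaluated services is known to work): each token maps to the positions where it occurs in each book, and a phrase matches a book when its tokens appear at consecutive positions. The book IDs and texts are toy data.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokens, keeping internal apostrophes (e.g. "life's")."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def build_index(books):
    """Positional inverted index: token -> {book_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for book_id, text in books.items():
        for pos, token in enumerate(tokenize(text)):
            index[token][book_id].append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted IDs of books in which the phrase's tokens occur consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Only books containing every term can contain the phrase.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    hits = []
    for book_id in sorted(candidates):
        positions = [set(posting[book_id]) for posting in postings]
        # The phrase starts at `start` if term i occurs at start + i for every i.
        if any(all(start + i in positions[i] for i in range(len(terms)))
               for start in positions[0]):
            hits.append(book_id)
    return hits


# Toy corpus; IDs and texts are illustrative only.
BOOKS = {
    "macbeth": "Life's but a walking shadow, a poor player that struts and frets "
               "his hour upon the stage",
    "almanac": "Rhode Island is the smallest state; the population of Nepal",
    "essay": "A walking tour of the stage door",
}

index = build_index(BOOKS)
print(phrase_search(index, "walking shadow"))  # ['macbeth']
print(phrase_search(index, "the stage"))       # ['essay', 'macbeth']
```

Intersecting the per-term book sets first keeps the position check limited to books that could possibly match; a site that indexes each book separately can only run the position check within one book at a time.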

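Several linkability ratings hinge on accepting an ISBN in the URL, and the API slides list ISBN-based services (xISBN, thingISBN). Because the same book may be cited by its ISBN-10 or its ISBN-13, a link resolver would plausibly normalize ISBNs before any lookup. This sketch validates ISBN-10 and ISBN-13 check digits, converts an ISBN-10 to its 978-prefixed ISBN-13, and then consults `WORK_IDS`, a hypothetical ISBN-to-work-ID table that merely stands in for a lookup service; it does not reproduce the interface of xISBN, thingISBN, or any other real service.

```python
import re


def isbn13_check_digit(first12):
    """ISBN-13 check digit: weights alternate 1, 3; the digit makes the total divisible by 10."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn10(isbn):
    """Valid if the sum of digit * weight (10 down to 1) is divisible by 11; 'X' means 10, last place only."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(isbn))
    return total % 11 == 0


def normalize_isbn(raw):
    """Return the ISBN-13 form of a valid ISBN-10 or ISBN-13, or None if invalid."""
    isbn = re.sub(r"[\s-]", "", raw).upper()
    if len(isbn) == 10:
        if not is_valid_isbn10(isbn):
            return None
        core = "978" + isbn[:9]
        return core + isbn13_check_digit(core)
    if re.fullmatch(r"\d{13}", isbn) and isbn13_check_digit(isbn[:12]) == isbn[12]:
        return isbn
    return None


# Hypothetical ISBN-13 -> work-ID table standing in for a lookup service; values are made up.
WORK_IDS = {"9780306406157": "work-0001"}


def work_id_for(raw_isbn):
    """Normalize first so that ISBN-10 and ISBN-13 links resolve to the same work."""
    isbn13 = normalize_isbn(raw_isbn)
    return WORK_IDS.get(isbn13) if isbn13 else None


print(normalize_isbn("0-306-40615-2"))   # 9780306406157
print(work_id_for("978-0-306-40615-7"))  # work-0001
```

Normalizing to a single form before lookup is what lets an "ISBN option in URL" behave like a stable link even when different catalogs record different ISBN forms for the same edition.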