Digital Library Infrastructure for a Million Books:  Synthesis  (and Opinion) Steve Toub California Digital Library
Describes what library infrastructure is needed for digital humanities use of mass digitized collections. Given at the Million Books Workshop, May 2007.

Published in: Education
  • Transcript of "Digital Library Infrastructure for a Million Books"

    1. Digital Library Infrastructure for a Million Books: Synthesis (and Opinion). Steve Toub, California Digital Library
    2. Having books talk to each other in the background
        • Ability to enable books to talk to each other
          • Too costly to pre-define all relationships a priori
          • Need a system that grows, learns dynamically
        • Requires:
          • Rich markup of high-value texts
          • Ability to expose structure (e.g., microformats) OR address latent structure
            • Including well-managed identifiers within a text
          • Ability to define relationships between hooks in one text and one in another
            • If not RDF/OWL, something conceptually similar
          • Tools, APIs (standards, service registries, later)
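The idea of defining relationships between hooks in different texts, in "something conceptually similar" to RDF, can be sketched as a small store of subject/predicate/object links that grows as new relationships are discovered. This is only an illustration under stated assumptions: the `LinkStore` class, the identifier scheme (`bookA#ch1.p3`), and the predicate names are all hypothetical and do not come from the slides.

```python
class LinkStore:
    """Tiny in-memory store of (subject, predicate, object) links between text hooks.

    Hypothetical sketch: identifiers and predicates are made up for illustration.
    """

    def __init__(self):
        self._by_subject = {}

    def add(self, subject, predicate, obj):
        # Links are added as they are discovered; nothing is pre-defined.
        self._by_subject.setdefault(subject, set()).add((predicate, obj))

    def links_from(self, subject, predicate=None):
        # Return the targets linked from a hook, optionally filtered by predicate.
        pairs = self._by_subject.get(subject, set())
        return sorted(obj for pred, obj in pairs if predicate is None or pred == predicate)


store = LinkStore()
store.add("bookA#ch1.p3", "quotes", "bookB#ch7.p12")
store.add("bookA#ch1.p3", "mentions", "person:example-1")
print(store.links_from("bookA#ch1.p3", "quotes"))
```

A real system would presumably rely on stable, well-managed identifiers inside each text, which is the prerequisite the slide names.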
    3. “How many libraries do we need?”
        • We must operate under the assumption that everything will live at the network level
        • Dividing the heterogeneous million book corpus into sub-corpora, by domain, is necessary to apply tools effectively
        • Unless things change radically, humanities scholars, libraries, and the government don’t have the resources or will to shape how digital library services will exist in an open, interoperable way at the network level
    4. “Explore effectiveness of discovery tools”
        • Listed only as a single bullet point on one of Bruce Robertson’s slides but something near and dear to my heart
        • Example:
          • George Bancroft's History of the United States: Vol. 1, History of the Colonization of the United States
    5. Bancroft’s History of the US: Perseus
    6. Bancroft’s History of the US: MoA
    7. Bancroft’s History of the US: MoA 2
    8. Bancroft’s History of the US: Google
    9. Bancroft’s History of the US: Google 2
    10. Bancroft’s History of the US: Google 3
    11. Bancroft’s History of the US: Google 4
    12. Bancroft’s History of the US: Google 5
    13. Bancroft’s History of the US: IA
    14. Bancroft’s History of the US: IA 2
    15. Bancroft’s History of the US: IA 3
    16. Bancroft’s History of the US: IA 4
    17. Bancroft’s History of the US: Gale
    18. Bancroft’s History of the US: WorldCat Editions
    19. Bancroft’s History of the US: WorldCat Results
    20. Discovery infrastructure at the network level
        • Evangelize metadata/content exposure APIs
          • OAI-PMH, COinS, hCite, RDFa, unAPI, OAI-ORE
        • Ability to aggregate raw data and scrubbed data at the network level (plus scrubbing tools and loaders)
        • An authoritative content registry (i.e., OCLC Registry of Digital Works)
          • Work-level (and expression-level) identifiers
          • Low-barrier workflows for update + expose/harvest
        • Formalized ways to express and expose content relationships: xISBN + thingISBN + ML approaches
          • Related objects (versions, editions, translations, dupes, …)
          • Compound objects (chapters, overlay journals, …)
          • References (annotations, quotations, excerpts, …)
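As one concrete case of the exposure APIs listed above, an OAI-PMH harvest starts with a plain HTTP request carrying a `verb` and a `metadataPrefix`. The sketch below only builds such a `ListRecords` URL; the helper name `list_records_url` and the base URL `https://repository.example.org/oai` are assumptions for illustration, not a real endpoint, and fetching and parsing the XML response is left out.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# The base URL is a placeholder, not a real repository.
print(list_records_url("https://repository.example.org/oai", set_spec="books", from_date="2007-05-01"))
```

The `from` argument is what makes incremental harvesting cheap, which fits the slide's call for low-barrier update and harvest workflows.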
    21. Need more focus on usage, research process as a whole
        • Tools like Piggy Bank, Zotero can capture
        • Count transactions (views, downloads, annotations, citations…) in standardized ways
        • Standardize log data (e.g., COUNTER) and its exchange (e.g., SUSHI)
        • Consolidate transaction data to close feedback loops and improve services
          • Metrics (e.g., MESUR, search analytics)
          • Ranking, recommending, social discovery
          • Matching user terms to: authority files; texts
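Counting transactions in a standardized way and then consolidating them across services might look roughly like the sketch below. The event schema (dictionaries with `item` and `type` keys), the `ALLOWED_TYPES` vocabulary, and the `consolidate` function are assumptions made for illustration; this is not the COUNTER report format or the SUSHI protocol.

```python
from collections import Counter

# Hypothetical shared vocabulary of transaction types, taken from the slide's examples.
ALLOWED_TYPES = {"view", "download", "annotation", "citation"}


def consolidate(event_streams):
    """Merge usage events from several services into per-item, per-type counts."""
    totals = Counter()
    for stream in event_streams:
        for event in stream:
            kind = event["type"].strip().lower()  # normalize labels across services
            if kind in ALLOWED_TYPES:             # ignore types outside the vocabulary
                totals[(event["item"], kind)] += 1
    return totals


service_a = [{"item": "work:1", "type": "View"}, {"item": "work:1", "type": "download"}]
service_b = [{"item": "work:1", "type": "view "}, {"item": "work:2", "type": "login"}]
print(consolidate([service_a, service_b]))
```

Consolidated counts of this kind are the raw material for the metrics, ranking, and recommending services the slide lists.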
    22. Infrastructure for identity/users
        • OpenID is only a hint of what's to come
        • Identity is the foundation for higher-level services: trust, authority, verified claims, reputation, groups/memberships, …
        • How can this emerging ecosystem leverage existing infrastructure?
          • Authority files
          • LDAP servers
          • Peer-review, citations, etc.
    23. Can’t take these for granted
        • Preservation
          • Anyone who has looked at a credible cost breakdown for a well-managed long-term digital preservation repository will understand that “the libraries will always preserve” is NOT a given
        • Intellectual property rights
          • GBS not showing many public domain items
          • Keeping tabs on orphan works
    24. Sustainable business models
        • Crucial to think about economic sustainability and shape realistic economic incentives
        • Why don’t libraries license JSTOR indefinitely so it could be opened up free for all?
        • How to incentivize disciplinary collaborations that go across institutions?
        • What’s our relationship to players at the network level: Google, MSFT, IA, OCLC, Amazon, …
        • Who pays for the common infrastructure?
        • Organizational infrastructure to:
          • Enable free riders? Feed and care for free kittens?
    25. Where to begin: Any consensus?
        • Keep increasing available texts?
        • Compelling applications?
        • Easy to use tools?
        • APIs, service registries?
        • Key software engineering tasks?
        • Marketing and outreach?