1330 mon katrine york
1330 mon katrine york Presentation Transcript

  • 1. HATHITRUST: A Shared Digital Repository
    HathiTrust: Aspiring to Build the Universal Library
    UKSG Annual Conference, March 26-28, 2012
    Jeremy York, Project Librarian, HathiTrust
  • 2. Partnership
    Arizona State University, Baylor University, Boston College, Boston University, California Digital Library, Columbia University, Cornell University, Dartmouth College, Duke University, Emory University, Florida State University, Getty Research Institute, Harvard University Library, Indiana University, Johns Hopkins University, Lafayette College, Library of Congress, Massachusetts Institute of Technology, McGill University, Michigan State University, New York Public Library, New York University, North Carolina Central University, North Carolina State University, Northwestern University, The Ohio State University, The Pennsylvania State University, Princeton University, Purdue University, Stanford University, Texas A&M University, Universidad Complutense de Madrid, University of Arizona, University of Calgary, University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz), The University of Chicago, University of Connecticut, University of Florida, University of Illinois, University of Illinois at Chicago, The University of Iowa, University of Maryland, University of Miami, University of Michigan, University of Minnesota, University of Missouri, University of Nebraska-Lincoln, The University of North Carolina at Chapel Hill, University of Notre Dame, University of Pennsylvania, University of Pittsburgh, University of Utah, University of Virginia, University of Washington, University of Wisconsin-Madison, Utah State University, Washington University, Yale University Library
  • 3. Digital Repository
    • Launched 2008
    • Initial focus on digitized book and journal content
      – 10,109,919 total volumes
      – 5,372,755 book titles
      – 266,540 serial titles
      – 2,802,347 public domain (~28%)
  • 4. The Name
    • The meaning behind the name
      – Hathi (hah-tee): Hindi for elephant
      – Big, strong
      – Never forgets, wise
      – Secure
      – Trustworthy
  • 5. Mission
    • To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
  • 6. HathiTrust
    • Universal Library
    • Common Goal
    • Single Entity, Many Partners
  • 7. Collections and Collaboration
    • Comprehensive collection
      – Preservation…with Access
    • Shared strategies
      – Copyright
      – Collection management, development
      – Preservation
      – Discovery / Use
      – Bibliographic Indeterminacy
      – Efficient user services
    • Public Good
  • 8. Content Distribution [pie chart]
    – "Public Domain": 28% (Public Domain (worldwide) 10%, Public Domain (US) 14%, U.S. Federal Government Documents (worldwide) 4%)
    – Remaining content: 72%
    – Open Access: 0.1%; Creative Commons: 0.01%
  • 9. Content Sources [pie chart of contributing institutions]
    – Michigan 45%, California 33%
    – Each remaining contributor 5% or less: Wisconsin, Cornell, Illinois, Chicago, Columbia, Harvard, Yale, Princeton, Indiana, Minnesota, LC, UNC-Chapel Hill, Virginia, Duke, Northwestern, NCSU, Purdue, Penn State, NYPL, Utah State, Madrid
  • 10. Dates [pie chart: distribution of content by publication date, in bins from 0-1500 through 2000-2009]
  • 11. Language Distribution (1) [pie chart]
    • The top 10 languages make up ~86% of all content
      – English 48%, German 9%, French 7%, Spanish 5%, Chinese 4%, Russian 4%, Japanese 3%, Italian 3%, Latin and Arabic (1-2% each)
      – Remaining languages: 14%
  • 12. Language Distribution (2) [pie chart]
    • The next 40 languages make up ~13% of total content: Ancient Greek, Armenian, Bengali, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Finnish, Greek, Hebrew, Hindi, Hungarian, Indonesian, Korean, Malay, Malayalam, Marathi, Norwegian, Panjabi, Persian, Polish, Portuguese, Romanian, Sanskrit, Serbian, Slovak, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, plus the categories Multiple, Music, Undetermined, and Unknown
  • 13. Preservation with Access
    • Cost-effective preservation and access services
    • Preservation
      – TRAC-certified
      – Robust infrastructure
      – Long-term commitments on digital content facilitate planning and decision-making
  • 14. [Organizational chart]
    • Executive Committee: Budget/Finances, Decision-making
    • Strategic Advisory Board: Guidance on Policy, Planning
    • Collective Work: Working Groups and Committees
      – Operational: Communications, User Support, User Experience
      – Strategic: Collections, Discovery Interface, Full-text Search
    • Distributed work
      – Driven by needs of institutions
      – Leverage across the partnership
      – Projects, Grant Work, Ingest Specifications, PageTurner, Bibliographic Data Management
  • 15. HathiTrust Functional Framework [diagram]
    • Functional areas: Bibliographic Data Management, Enterprise Administration, Repository Administration, Repository Management, Rights Management, Collection Development, Governance, Communication
    • Further areas include Content Ingest, Content Access, User Services, Outreach, Legal, Quality Assurance, and e-Commerce
    • Example activities: copyright determination (record-level) and copyright review (item-level), rightsholder permissions, disaster recovery, PageTurner, Collection Builder, print on demand, large-scale search, APIs, user support (helpdesk), monthly newsletter, Research Center, repository evaluation and audit (e.g., DRAMBORA, TRAC)
  • 16. Constitutional Convention
    • October 2011
    • 52 partners
    • 3-year review overseen by the Strategic Advisory Board (SAB)
    • Ballot proposals
      – Print monograph storage
      – Approval process for development initiatives
      – U.S. Government Documents
      – Fee-for-service content deposit
      – Governance
  • 17. Emerging Governance
    • 12-member Board of Governors
      – 3-member Executive Committee
      – Executive Director
    • 6 seats to founding institutions
      – 2 California, 2 CIC (minus Indiana and Michigan)
      – 1 Indiana, 1 Michigan
    • Voting: March 1 – March 15
    • Announcement of results: March 30
    • Begin work: April 16, 2012
  • 18. Preservation with Access
    • Cost-effective preservation and access services
    • Preservation
      – TRAC-certified
      – Robust infrastructure
      – Long-term commitments on digital content facilitate planning and decision-making
  • 19. Preservation with Access (2)
    • Discovery
      – Bibliographic and full-text search of all materials
      – Extended discovery (ProQuest, EBSCO, OCLC, Ex Libris)
      – Mechanisms for local loading of records
  • 20. Preservation with Access (3)
    • Access and Use
      – Public domain and open access works
      – Full download of materials where possible*
      – Print on demand
      – Collections and APIs
      – Research Center*
      – Lawful uses of in-copyright works*
  • 21. Lawful Uses
    • Access for users who have print disabilities
    • Section 108 uses of materials
    • Access to orphan works
  • 22. Terms of Access
    • Available to students, faculty, and staff of partnering institutions
      – On library premises or authenticated into HathiTrust
    • Partner libraries own a print copy
      – One simultaneous user per print copy owned
    • Users must be on U.S. soil
    • One-page-at-a-time download
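The rule "one simultaneous user per print copy owned" amounts to a per-volume concurrency limit. Below is a minimal Python sketch of that limit alone, assuming a hypothetical `CopyLimitedAccess` class and a hypothetical mapping from volume IDs to the number of print copies a library holds; it does not model authentication, the U.S.-location requirement, or page-at-a-time download.

```python
from collections import defaultdict
from typing import Dict, Set


class CopyLimitedAccess:
    """Hypothetical sketch: at most one concurrent reader per print copy owned."""

    def __init__(self, print_copies: Dict[str, int]) -> None:
        # Number of print copies held, per (hypothetical) volume ID.
        self.print_copies = dict(print_copies)
        # Users currently reading each volume.
        self.active: Dict[str, Set[str]] = defaultdict(set)

    def open(self, volume_id: str, user: str) -> bool:
        """Grant access if an owned copy is free; return False otherwise."""
        readers = self.active[volume_id]
        if user in readers:
            return True  # this user already holds a slot
        if len(readers) >= self.print_copies.get(volume_id, 0):
            return False  # every owned copy is already in use
        readers.add(user)
        return True

    def close(self, volume_id: str, user: str) -> None:
        """Release the user's slot so another reader can open the volume."""
        self.active[volume_id].discard(user)


if __name__ == "__main__":
    access = CopyLimitedAccess({"vol-1": 1})
    print(access.open("vol-1", "alice"))  # True
    print(access.open("vol-1", "bob"))    # False: the single copy is in use
    access.close("vol-1", "alice")
    print(access.open("vol-1", "bob"))    # True
```

In this sketch a volume with no recorded print copy gets zero slots, which mirrors the condition that partner libraries must own a print copy.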
  • 23. How do we facilitate uses?
    • Fundamental issues of
      – Identification
      – Description
      – Rights
  • 24. Approach
    • Collective problems as collective
    • Web of relationships [diagram linking Rights, Records, Digital Volumes, Libraries, Print Volumes]
  • 25. Bibliographic Data
    • Normalization of bibliographic data
      – University of Michigan
    • Efficiency
      – California Digital Library
  • 26. Copyright
    • Bibliographic metadata
    • Automatic and manual rights determination
  • 27. Automatic Rights Determination
    • Conducted on all works at time of ingest and when records are modified
      – Public domain worldwide
        • US works published before 1923, US federal government publications, non-US works published prior to 1872
      – Public domain in the United States
        • Non-US works published prior to 1923
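The automatic rules on this slide reduce to a decision on publication year, place of publication, and whether the work is a US federal document. Here is a minimal Python sketch, assuming a hypothetical `Record` type and hypothetical status labels (these are not HathiTrust's actual rights codes); anything the rules do not settle is left for manual review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical status labels, for illustration only.
PD_WORLDWIDE = "pd_worldwide"
PD_US = "pd_us"
NEEDS_REVIEW = "needs_review"


@dataclass
class Record:
    """Hypothetical subset of a bibliographic record."""
    pub_year: Optional[int]       # None if the date is missing or unparseable
    us_published: bool            # place of publication is the United States
    us_federal_doc: bool = False  # US federal government publication


def determine_rights(rec: Record) -> str:
    """Apply the slide's date/place rules; defer everything else."""
    if rec.us_federal_doc:
        return PD_WORLDWIDE  # US federal government publications
    if rec.pub_year is None:
        return NEEDS_REVIEW
    if rec.us_published:
        # US works published before 1923 are public domain worldwide.
        return PD_WORLDWIDE if rec.pub_year < 1923 else NEEDS_REVIEW
    if rec.pub_year < 1872:
        return PD_WORLDWIDE  # non-US works published prior to 1872
    if rec.pub_year < 1923:
        return PD_US  # non-US works published prior to 1923: PD in the US only
    return NEEDS_REVIEW


print(determine_rights(Record(pub_year=1900, us_published=False)))  # pd_us
```

Because the function depends only on the record, it can simply be re-run whenever a record is modified, matching the slide's "at time of ingest and when records are modified".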
  • 28. Manual Rights Determination
    • IMLS-funded CRMS project
      – US-published works 1923-1963
      – Conformance with formalities
      – Expanding to non-US works
      – Double-blind review with expert review for conflicts
      – Staff at 4 HathiTrust partner institutions (15 will take part in non-US)
      – As of February 2012, ~190,000 reviewed, more than 100,000 opened
    • Rights Holder Permissions
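The "double-blind review with expert review for conflicts" step can be pictured as combining two independent determinations and escalating only when they disagree. A minimal sketch with a hypothetical `resolve_review` helper and made-up status strings; it illustrates the idea and is not the CRMS implementation.

```python
from typing import Optional


def resolve_review(first: str, second: str, expert: Optional[str] = None) -> Optional[str]:
    """Combine two independent (double-blind) determinations for one volume.

    Returns the agreed status when the two reviewers match, the expert's
    status when they conflict and an expert has ruled, or None when the
    conflict is still waiting for expert review.
    """
    if first == second:
        return first
    return expert


print(resolve_review("pd", "pd"))               # pd
print(resolve_review("pd", "ic"))               # None: route to an expert
print(resolve_review("pd", "ic", expert="ic"))  # ic
```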
  • 29. Breakdown of HathiTrust book corpus by publication date
    Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building – 2/2011
  • 30. Breakdown of HathiTrust book corpus by publication date
  • 31. Copyright status of books published pre-1923 and US works published 1923-1963
  • 32. Copyright status of books published pre-1923 and US works published 1923-1963
    – Pre-1872: ~5%
  • 33. Copyright status of books published pre-1923 and US works published 1923-1963
    – Pre-1872: ~5%
    – Public Domain in the US
  • 34. Copyright status of books published pre-1923 and US works published 1923-1963?
    – Pre-1872: ~5%
    – Public Domain in the US
  • 35. Copyright status of books published pre-1923 and US works published 1923-1963
  • 36. Copyright status of books published pre-1923 and US works published 1923-1963
    – In Print?
  • 37. Collection Management, Development
    • Overlap
  • 38. A global change in the library environment
    • Academic print book collection already substantially duplicated in mass digitized book corpus
    • Median duplication: 19% (June 2009); 31% (June 2010)
    [Chart: % of titles in local collection (0-60%) vs. rank in 2008 ARL Investment Index (0-120)]
  • 39. Digitized Books in Shared Repositories
    • ~75% of mass digitized corpus is 'backed up' in one or more shared print repositories
    [Chart, Sep 2009 to Jun 2010: unique titles of mass digitized books in the Hathi digital repository and in shared print repositories; labeled values ~3.5M and ~2.5M titles]
  • 40. Collection Management, Development
    • Overlap
      – More than 50% median overlap with ARL institutions; higher for small liberal arts colleges
    • Pricing model based on print holdings
      – Requires print holdings database
      – Also supports expansion of legal uses, efforts in de-duplication
      – Facilitates individual and collaborative collection development and management operations
    • Print monographs archiving
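The overlap figures on slides 38 and 40 are medians of a per-library percentage: the share of a library's titles that also appear in the digitized corpus. A minimal sketch, assuming titles have already been matched to a shared identifier (the identifiers and library names below are hypothetical; real matching across catalog records is considerably harder):

```python
from statistics import median
from typing import Dict, Set


def overlap_pct(local_titles: Set[str], digitized_titles: Set[str]) -> float:
    """Percentage of a library's titles that are also in the digitized corpus."""
    if not local_titles:
        return 0.0
    return 100.0 * len(local_titles & digitized_titles) / len(local_titles)


def median_overlap(collections: Dict[str, Set[str]], digitized_titles: Set[str]) -> float:
    """Median of the per-library overlap percentages."""
    return median(overlap_pct(t, digitized_titles) for t in collections.values())


digitized = {"t1", "t2", "t3"}
libraries = {
    "Library A": {"t1", "t4"},              # 50.0
    "Library B": {"t1", "t2", "t3", "t4"},  # 75.0
    "Library C": {"t5"},                    # 0.0
}
print(median_overlap(libraries, digitized))  # 50.0
```

Using the median rather than the mean keeps a few very large or very small collections from dominating the summary figure.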
  • 41. Collection Management, Development
    • Discovery (OCLC)
    • Collections Committee
  • 42. Comprehensive Picture
    • “Definitional Issues”
      – Identification, Description, Rights
    • Discovery and Use
      – Finding
      – Relating (APIs and integration)
      – Using (Reading, Computational activities)
    • Collection management, development
    • Preservation infrastructure
      – Digital and Print
      – Relationships
  • 43. Work going forward
    • Definitional elements
    • Print archiving, management
    • Discovery and use
      – Lawful uses
    • Research Center
    • Quality
    • Government documents
    • Beyond books and journals
    • Publishing
    • Transitioning to next phase of partnership
  • 44. How to find out more
    • Web site “About” section
      – http://www.hathitrust.org/about
    • HathiTrust Research Center
      – http://www.hathitrust.org/htrc
    • Twitter
      – http://twitter.com/hathitrust
    • Monthly newsletter
      – http://www.hathitrust.org/updates
      – RSS: http://www.hathitrust.org/updates_rss
    • Contact us: feedback@issues.hathitrust.org
    • Blogs: http://www.hathitrust.org/blogs
      – Large-scale search
      – Perspectives from HathiTrust
  • 45. Thank you very much!