HathiTrust--a GovDocs Repository?
 

Presentation given at "Leveraging Your Strengths: Regional Government Documents Conference" at the Federal Reserve Bank St. Louis on May 4, 2012

  • HathiTrust was launched in 2008 by a 12-university consortium known as the Committee on Institutional Cooperation (CIC), along with the University of California system. It has grown to more than 60 partners, including Columbia, Princeton, Yale, Duke, and Johns Hopkins. Also Missou Unlike other e-book initiatives, HathiTrust treats both PRESERVATION and ACCESS as main focal points, and it has stated its intention to preserve digital volumes over the long term. hāthī (हाथी) (pronounced HAH-tee) is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength. 5,422,301 book titles. 269,168 serial titles. Trustworthy Repository Audit and Certification (TRAC).
  • Much of the current content in HathiTrust was digitized as part of Google Books. Another major source is the Internet Archive. An increasing amount of content comes from digitization by partner libraries. So if most of the content in HT is in Google or IA, how is it different? -- a digital library organized by libraries for libraries and their users -- catalog structure to facilitate access -- uses all the metadata -- search interfaces fine-tuned to fit users' needs -- open access to data and metadata -- key goal: PRESERVATION over the long haul -- locally digitized collections from partners increasingly important -- coordination with other digital library initiatives such as the Digital Public Library of America as well. CLICK ON HATHITRUST TO SHOW CONTENT VISUALIZATIONS. Overlap -- median overlap for ARL libraries is 50%; higher for smaller college libraries (already 50% in May 2011). Jeremy York, "HathiTrust: Aspiring to Build the Universal Library". UKSG Annual Conference, March 26, 2012.
  • According to the report on the HT Constitutional Convention, there are an estimated 300,000 US documents in HathiTrust, which according to the report is 1/5 to 1/3 of all printed documents. Others estimate that gov docs constitute about 4% of all titles in HathiTrust. Malpas, in her report Cloud-sourcing, notes that ~80% of gov docs are in the public domain and thus are, or should be, viewable in their entirety. Gov docs account for a high percentage of public domain materials. Automatic rights determination: conducted on all works at time of ingest and when records are modified. Public domain worldwide: US works published before 1923, US federal government publications, non-US works published prior to 1872. Public domain in the United States: non-US works published prior to 1923.
  • Using the Monthly Catalog of US Government Publications, 1895-1976 (via ProQuest) and the Catalog of Government Publications (1976 onward), Christopher Brown at the University of Denver looked at the percentage of government documents in HathiTrust. Best coverage for the 1970s-1980s; worse coverage for the late 19th century. And of course coverage drops off in the 2000s with GPO's decision to do away with most print documents.
  • Sare, Laura. 2012. “A Comparison of HathiTrust and Google Books Using Federal Publications.” Practical Academic Librarianship: The International Journal of the SLA Academic Division 2. Using a random sample of 1540 federal documents published between 1943 and 1976, Sare looked at number of titles found, full text, search interface, and quality of bibliographic records. OVERALL: Sare found more docs listed in Google Books, but more documents were available as full text in HathiTrust. 1940s -- 385 total. HT found 98 titles; 90 were full text. Google found 181 titles; only 4 were full text, DUE TO GOOGLE'S DECISION TO CONSIDER ANYTHING PUBLISHED AFTER 1923 AS IN-COPYRIGHT MATERIAL BECAUSE IT FALLS WITHIN THE ORPHAN WORKS TIMEFRAME. THOSE NON-FULL-TEXT TITLES FOUND IN GOOGLE BOOKS EITHER HAD “SNIPPET” VIEWS OR “NO PREVIEW” AVAILABLE. --- BETTER BIBLIOGRAPHIC DATA IN HATHITRUST RECORDS, ESPECIALLY FOR SERIALS. MULTIPLE RECORDS AND MULTIPLE LINKS IN WORLDCAT RECORDS FOR GOOGLE BOOKS ESPECIALLY CUMBERSOME. PUTTING ON CATALOGING HAT: TITLE CHANGES NOTED IN HT RECORDS WHEREAS NOT IN GOOGLE BOOKS. Snippet views are useful. The fact that users can see keywords in context is also beneficial.
  • Everyone can view books, journals, and docs in the public domain and read them online. Single pages can be downloaded. In fact, anyone can download multiple pages from a public domain volume, just one page at a time. The one main difference for users from a HathiTrust partner library is that they can download entire volumes of public domain materials. The other difference is that users from HT partners can save sets of records. Users who are not from a partner library can do this too, although it is very, very difficult. --------- NOW LET’S GO INTO HATHITRUST. CLICK ON ICON TO GO TO HT CATALOG. AU: United States. AU: Commerce. KW: Lumber. Problems of the softwood lumber industry : hearings before the Committee on Commerce, United States Senate, Eighty-seventh Congress, second session, on impact of lumber imports... by United States. Congress. Senate. Committee on Commerce. Published 1962. FULL TEXT SEARCH (both public domain and copyright): Cuban, Castro, immigration. Collections: Official gazette of the United States Patent Office; British Foreign Office; NASA Technical reports. UNDER ABOUT: OUR RESEARCH CENTER. AUTHOR SEARCH* DIFFERENT THAN GOOGLE’S N-GRAM; ALSO SEARCHES IN-COPYRIGHT MATERIAL.
  • HathiTrust was launched in 2008 by a 12-university consortium known as the Committee on Institutional Cooperation (CIC), along with the University of California system. It has grown to more than 60 partners, including Columbia, Princeton, Yale, Duke, and Johns Hopkins. Also Missou Content includes Google Books, Internet Archive, and digital collections from partners. hāthī (हाथी) (pronounced HAH-tee) is the Hindi word for elephant, an animal highly regarded for its memory, wisdom, and strength.
  • LOADING RECORDS INTO LOCAL ILS -- AVAILABLE TO ANY LIBRARY, REGARDLESS OF WHETHER IT IS A PARTNER OR NOT
    1. University of Michigan provides an OAI feed of MARC21 and Dublin Core records for public domain items. An OAI Toolkit assists in harvesting records.
    2. Tab-delimited files can be used to retrieve metadata.
    3. Exporting records from WorldCat: public domain materials are not identified; would need to check items individually.
    Non-partner libraries that have done this:
    -- Ball State University
    -- Kent State University (records available via OhioLINK??), end of 2009/beginning of 2010. Link to PowerPoint at end.
    -- University of Colorado at Denver: did this in May 2008 but deleted the records in 2011 when it started using Summon. Article by Jeffrey Beall on this at end.
    **HATHI DOES ASSIST WITH CREATING CUSTOMIZED DATA SETS.
    Issues: -- initial labor -- continual maintenance -- would need to regularly download new files -- can your system handle it? Sluggishness -- duplication
  • CURRENT CHALLENGES WITH HATHITRUST CATALOG: CAN’T LIMIT TO GOV DOCS; CAN’T SEARCH BY SUDOC NUMBER. AND THE ONGOING ISSUE OF AGENCY NAME CHANGES -- ALL IMPEDIMENTS TO USERS FINDING GOV DOCS. ANOTHER THORNY ISSUE IS INACCURATE COPYRIGHT STATUS. SOMETIMES GOV DOCS IN THE PUBLIC DOMAIN ARE NOTED AS “IN COPYRIGHT” AND THUS ARE NOT AVAILABLE IN FULL VIEW. GOOGLE LOCKED DOWN ALL MATERIALS PUBLISHED AFTER 1923 REGARDLESS OF WHETHER THEY ARE GOV DOCS OR NOT.
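
The automatic rights determination rules listed in the notes above can be sketched as a small decision function. This is only an illustration of the rules as stated in the notes, not HathiTrust's actual implementation: the function name, its parameters, and the status labels are hypothetical, and the fallback label for works that match none of the listed rules is an assumption.

```python
def rights_status(year, us_work, us_federal=False):
    """Classify a work using the rules stated in the speaker notes.

    year       -- publication year (int)
    us_work    -- True if published in the US (hypothetical flag)
    us_federal -- True if a US federal government publication (hypothetical flag)
    """
    # US federal government publications: public domain worldwide.
    if us_federal:
        return "public domain worldwide"
    # US works published before 1923: public domain worldwide.
    if us_work and year < 1923:
        return "public domain worldwide"
    # Non-US works published prior to 1872: public domain worldwide.
    if not us_work and year < 1872:
        return "public domain worldwide"
    # Non-US works published prior to 1923: public domain in the US only.
    if not us_work and year < 1923:
        return "public domain in the United States"
    # Assumption: anything else is not opened automatically.
    return "not automatically public domain"


if __name__ == "__main__":
    # The 1962 Senate hearing from the catalog demo is a federal publication.
    print(rights_status(1962, us_work=True, us_federal=True))
```

Under these rules a gov doc should land in full view whatever its date, which is why a post-1923 document marked "in copyright" points to a metadata problem (the federal flag was missed) rather than to the rule itself.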

HathiTrust--a GovDocs Repository? Presentation Transcript

  • HathiTrust--a GovDocs Repository? Brian Vetruba, Catalog Librarian/Germanic Studies Librarian, Washington University in St. Louis, bvetruba@wustl.edu. Leveraging Your Strengths: Regional Government Documents Conference | Federal Reserve Bank St. Louis | May 4, 2012
  • Overview
    • Began in 2008
    • Over 10.2 million volumes
    • Over 2.9 million public domain (PD) volumes (“full view”)
    • Over 60 partners
    hāthī (हाथी) (pronounced HAH-tee) is the Hindi word for elephant.
  • Content: HathiTrust Partners; content by call number, language, date; and more... http://www.hathitrust.org/statistics_visualizations
  • US Gov Docs in HathiTrust
    • Ca. 300,000 = 4% of all titles in HathiTrust
    • 80% of gov docs in HathiTrust in public domain
  • Percentage of Docs in HathiTrust (est.), 1895-2009. Brown (2011)
  • HathiTrust Compared to Google Books
    More titles found in Google, but HathiTrust provides more full text
    Total docs = 385 (1940s)
                  Titles Found   Full-text
    HathiTrust    98             90
    Google        181            4
    HathiTrust better for searching serials
    • One record for all issues of a title
    • Title changes noted
    Sare (2012)
  • Who can do what
    Everyone
    • View PD content
    • Search PD and copyright materials
    • View public collections
    • Download single PD pages
    • Download MARC records
    HathiTrust Partners
    • Download entire volumes of PD materials
    • Create private or public collections
    • Have a voice in the future of HathiTrust
  • Advanced Search in HathiTrust
  • Searching & Discovering Content
    HT Catalog: http://www.hathitrust.org/
    HT WorldCat Local prototype: http://hathitrust.worldcat.org/
    Resource Discovery Tools
  • Searching & Discovering Content
    Loading records into local ILS: http://www.hathitrust.org/data
    Bibliographic and Data APIs: http://www.hathitrust.org/data
    Widgets: http://www.hathitrust.org/widgets
  • Searching & Discovering Content
    Embed links
    • Public collections: http://babel.hathitrust.org/cgi/mb
    • Individual items
    From a LibGuide for a German literature course
  • Final Thoughts
    CHALLENGES
    • Searching/retrieval obstacles (e.g. no SuDoc search)
    • Inaccurate copyright statuses impeding access
    • Inaccurate linkages and bibliographic info
    PROGRESS
    • Commitment to expand and enhance access to gov docs
    • Research study examining how to improve access
    • Coordination with Committee on Institutional Cooperation and others to create a digital corpus of 1+ million print docs
  • Questions about HathiTrust: http://www.hathitrust.org/help | feedback@issues.hathitrust.org | @hathitrust. More info: http://libguides.wustl.edu/hathitrust
  • More info on loading records into ILS
    Kent State Univ.: http://techserv.lib.muohio.edu/ovgtsl11/presentations/Panchyshyn.pptx
    Univ. of Denver: http://www.slideserve.com/holleb/harvesting-hathitrust-documents-a-new-model-for-online-access
    Univ. of Colorado-Denver: Beall, Jeffrey. 2009. “Free Books: Loading Brief MARC Records for Open-Access Books in an Academic Library Catalog.” Cataloging & Classification Quarterly 47 (5) (January 4): 452–463. doi:10.1080/01639370902870215.
  • Bibliography
    Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. Accessed May 2, 2012. http://www.oclc.org/research/publications/library/2011/2011-01.pdf
    York, Jeremy. 2012. HathiTrust: Issues and Challenges in Preserving the Published Record [PowerPoint slides]. Accessed April 30, 2012. http://www.hathitrust.org/documents/HathiTrust-Amigos-201202.pptx
    Brown, Christopher C. 2011. Harvesting HathiTrust Documents: A New Model for Online Access [PowerPoint slides]. Accessed April 30, 2012. http://www.slideserve.com/holleb/harvesting-hathitrust-documents-a-new-model-for-online-access
    York, Jeremy. 2012. “HathiTrust: The Elephant in the Library.” Library Issues: Briefings for Faculty and Administrators 32 (3) (January). Accessed May 2, 2012. http://www.libraryissues.com/sub/LI320003.asp
    Sare, Laura. 2012. “A Comparison of HathiTrust and Google Books Using Federal Publications.” Practical Academic Librarianship: The International Journal of the SLA Academic Division 2 (1): 1–25. Accessed May 2, 2012. http://journals.tdl.org/pal/article/viewFile/5880/5922
  • Thank you! bvetruba@wustl.edu @bvetruba
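
For libraries considering the OAI route to loading records into a local ILS, the harvesting loop can be sketched with the standard OAI-PMH `ListRecords` verb and its `resumptionToken` paging. This is a minimal sketch, not the OAI Toolkit mentioned in the notes: the base URL `https://example.org/oai` is a placeholder, the `marc21` metadata prefix is an assumption about what the feed advertises, and both helper names are hypothetical. Network fetching is left out so the example stays self-contained.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="marc21", resumption_token=None):
    """Build a ListRecords request URL (base_url is a placeholder here)."""
    if resumption_token:
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Made-up response page, used only to exercise the parser.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    ids, token = parse_list_records(SAMPLE_PAGE)
    print(ids, token)
    print(list_records_url("https://example.org/oai", resumption_token=token))
```

A real harvest would fetch each URL, hand the MARC payloads to the ILS loader, and repeat until no token comes back. Rerunning it on a schedule is the "continual maintenance" cost flagged in the notes.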