Harvesting HathiTrust Documents: A New Model for Online Access

Brown, Christopher C. “Harvesting HathiTrust Documents: A New Model for Online Access.” Presentation given at the 2011 Missouri Government Documents Conference, 7 June 2011, Columbia, MO.

Transcript of "Harvesting HathiTrust Documents: A New Model for Online Access"

  1. Harvesting HathiTrust Documents: A New Model for Online Access
     Christopher C. Brown
     University of Denver, Penrose Library
     (303) 871-3404, cbrown@du.edu
     2011 Missouri Government Documents Conference
  2. DR, IR, Digital Texts
     Inbound Harvesting
     Outbound Harvesting
     This presentation will show how Encore harvesting can be used to mitigate a space problem in a library by substituting online access for physical access to the collection. The government documents collection will be the primary focus.
     Note: Encore is the next-generation catalog interface produced by Innovative Interfaces, Inc.
  3. Collection Downsizing?
     Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/2011-01.pdf
  4. About University of Denver
     • Depository since 1909
     • Historically a 70-75% selective
     • Now a 4.8% selective, but we receive 100% of online cataloging
     • Adding URLs to historic documents
     • Currently 100% of our paper documents are in storage
     • We are remodeling our library. Under the remodeling plan, all docs will remain in remote storage.
  5. Partial Solution: Using Encore for Outbound Harvesting
     • All documents off-site
     • Our users are accustomed to using electronic documents
     • Need to divert attention away from physical collection holdings
     • Encore harvesting of HathiTrust can do this
  6. PD = where docs generally live
     HathiTrust Attributes
     From: http://www.hathitrust.org/rights_database
  7. Sampling Method
     I wanted to see how many government documents were in our HathiTrust harvest.
     • Limit to HathiTrust for a given year
     • Examine the first result on each page of 25 results (4% of results) [limitation: Encore only displays the first 1,000 results]
  8. Harvesting Hathi Docs: The Stats

     Date Range   Hathi Totals   Hathi All Pub Domain   Hathi pdus   DU pd Harvest   Docs Sampling   Docs Sampling
                                 (pdus + pd)                                         (est. docs)     (est. %)
     2000-2009         505,682                 14,140          726          13,369          13,340          99.78%
     1990-1999         709,214                 29,163          880          28,164          26,662          94.67%
     1980-1989         723,657                 33,753        1,204          32,321          31,370          97.06%
     1970-1979         631,110                 28,633        2,046          26,189          25,607          97.78%
     1960-1969         546,914                 21,244        1,987          18,991           7,668          40.38%
     1950-1959         281,615                 20,861          863          19,893           3,888          19.54%
     1940-1949         184,755                 17,096          600          16,253           3,771          23.21%
     1930-1939         175,103                 16,237          654          15,317           2,600          16.97%
     1920-1929         175,226                 66,563       27,108          28,854           1,529           5.30%
     1910-1919         175,148                169,923       75,955          61,230           4,124           6.73%
     1900-1909         179,018                153,284       70,900          47,999           2,265           4.72%
     1890-1899         112,295                110,605       50,502          34,742             596           1.72%
     1880-1889          83,950                 82,809       38,928          23,855             699           2.93%
     1870-1879          58,624                 57,826       27,202          17,751             319           1.80%
     1860-1869          50,907                 50,337        2,273          45,790             248           0.54%
     Total           4,593,218                872,474      301,828         430,718         124,686          28.95%

     Statistics as of mid-March 2011. The Docs Sampling columns show the estimated number of docs per date range and the estimated percentage of docs in the harvest for that range.
  9. Malpas: Docs about 3% of Hathi Total and 15% of Public Domain
     • GovDocs: 3% overall
     • GovDocs: 15% of Public Domain
  10. Hathi Docs Usage in Proportion to Docs Distribution
     Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest); 1976 onward data: CGP
  11. % Docs in HathiTrust (est.)
  12. Hathi Docs Links Provide Access to Docs in Storage
  13. Stripped-Out Fields
     • 008 fixed field data
     • 650 subfields other than “a”
     • 500 notes
     • 5xx shipping list info
     • 300 subfields after “a”
     • 086 SuDocs number
  14. Use Stats for HathiTrust?
     Statistics from Google Analytics
     • Statistics cover all HathiTrust records accessed, not just documents
     • Spikes in usage reflect testing by the docs librarian (me), not real users
  15. Harvesting with Summon
  16. Summon Harvesting of HathiTrust
  17. Conclusions
     • Documents content in HathiTrust can provide a suitable surrogate for a limited subset of documents, but not a wholesale replacement.
     • HathiTrust documents can be used as surrogates for selected titles, especially larger serial runs. But it is difficult at this time to isolate those titles.
     • HathiTrust is definitely worth harvesting into local catalogs or other digital repositories.
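
The sampling method on slide 7 and the Docs Sampling columns on slide 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenter's actual procedure: the function names are hypothetical, and the sketch assumes that checking the first result on each 25-result page amounts to a 1-in-25 systematic sample, cut off by Encore's 1,000-result display limit. The harvest and estimate figures used in the checks come from the slide 8 statistics.

```python
PAGE_SIZE = 25          # results per Encore page (slide 7)
DISPLAY_LIMIT = 1_000   # Encore displays only the first 1,000 results (slide 7)


def sampled_positions(total_results):
    """Return the 1-based positions of the first result on each viewable page."""
    viewable = min(total_results, DISPLAY_LIMIT)
    return list(range(1, viewable + 1, PAGE_SIZE))


def estimate_docs(harvest_size, docs_in_sample, sample_size):
    """Scale the share of government documents seen in the sample to the whole harvest."""
    return round(harvest_size * docs_in_sample / sample_size)


def docs_share(estimated_docs, harvest_size):
    """Express estimated documents as a percentage of the harvest (two decimals)."""
    return round(100 * estimated_docs / harvest_size, 2)


if __name__ == "__main__":
    # A result set larger than the display limit yields at most 40 sampled records.
    print(len(sampled_positions(5_000)))
    # Percentages recomputed from the slide 8 figures (2000-2009 row and Total row).
    print(docs_share(13_340, 13_369))
    print(docs_share(124_686, 430_718))
```

Because of the display limit, no more than 40 records are examined for any single search under these assumptions, so the per-range estimates on slide 8 are best read as rough indicators rather than precise counts.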