SlideShare a Scribd company logo
Harvesting HathiTrust Documents: A New
Model for Online Access

Christopher C. Brown

University of Denver, Penrose Library
(303) 871-3404
cbrown@du.edu
2011 Missouri Government Documents Conference
DR, IR,
Digital Texts

Inbound Harvesting
Outbound Harvesting

This presentation will show how Encore harvesting can be
used to mitigate a space problem in a library, substituting
online access for the need for physical access to the
collection. The government documents collection will be the
primary focus.
Note: Encore is the next-generation catalog interface produced by Innovative Interfaces, Inc.
Collection Downsizing?

Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized
Library Environment. Dublin, Ohio: OCLC Research.
http://www.oclc.org/research/publications/library/2011/2011-01.pdf.
About University of Denver
Depository since 1909
Historically a 70-75% selective
Now a 4.8% selective, but receive 100%

of online cataloging
Adding URLs to historic documents
Currently 100% of our paper documents
are in storage
We are remodeling our library. Under the
remodeling plan, all docs will remain in
remote storage.
Partial Solution: Using Encore for
Outbound Harvesting
All documents off-site
Our users are accustomed to using electronic documents
Need to divert attention away from physical collection

holdings
Encore harvesting of Hathi Trust can do this
PD = where docs generally live

Hathi Trust Attributes
From: http://www.hathitrust.org/rights_database
Sampling Method
I wanted to see how many government documents were in

our Hathi Trust harvest
Limit to Hathi Trust for a given year
Examine first result on each page of 25 results (4% of
results) [limitation: Encore only displays first 1,000 results]
Harvesting Hathi Docs: The Stats
Date Range
2000-2009
1990-1999
1980-1989
1970-1979
1960-1969
1950-1959
1940-1949
1930-1939
1920-1929
1910-1919
1900-1909
1890-1899
1880-1889
1870-1879
1860-1869

Hathi Totals
505,682
709,214
723,657
631,110
546,914
281,615
184,755
175,103
175,226
175,148
179,018
112,295
83,950
58,624
50,907
4,593,218

Hathi All Pub
Domain
pdus + pd
14,140
29,163
33,753
28,633
21,244
20,861
17,096
16,237
66,563
169,923
153,284
110,605
82,809
57,826
50,337
872,474

Hathi pdus DU pd Harvest
726
13,369
880
28,164
1,204
32,321
2,046
26,189
1,987
18,991
863
19,893
600
16,253
654
15,317
27,108
28,854
75,955
61,230
70,900
47,999
50,502
34,742
38,928
23,855
27,202
17,751
2,273
45,790
301,828
430,718

Docs Sampling
13,340
99.78%
26,662
94.67%
31,370
97.06%
25,607
97.78%
7,668
40.38%
3,888
19.54%
3,771
23.21%
2,600
16.97%
1,529
5.30%
4,124
6.73%
2,265
4.72%
596
1.72%
699
2.93%
319
1.80%
248
0.54%
124,686
28.95%

Statistics as of mid-March, 2011
The Docs Sampling columns show the estimated numbers of docs per
year and the estimated percentage of docs per year from the Harvest
Malpas: Docs about 3% of Hathi Total
and 15% of Public Domain
GovDocs: 3% overall

GovDocs: 15% of Public Domain
Hathi Docs Usage in Proportion to Docs
Distribution

Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest);1976 onward data: CGP
% Docs in HathiTrust (est.)
Hathi Docs Links Provide Access to
Docs in Storage
Stripped-Out Fields
008 fixed field data

650 subfields other than “a”

500 notes
5xx shipping list info
300 subfields after “a”
086 SuDocs number
Use Stats for Hathi Trust?

Statistics from Google Analytics
•Statistics for all Hathi Trust records accessed, not just documents
•Spikes in usage are docs librarian (my) testing, not real users
Harvesting with Summon
Summon Harvesting of HathiTrust
Conclusions
Documents content in HathiTrust can provide a

suitable surrogate for a limited subset of documents,
but not a wholesale replacement.
HathiTrust documents can be used as surrogates for
selected titles, especially larger serial runs. But it is
difficult at this time to isolate those titles.
HathiTrust is definitely worth harvesting into local
catalogs or other digital repositories.

More Related Content

Similar to Harvesting HathiTrust Documents: A New Model for Online Access

Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
Ross Mounce
 
Hausstein data cite-dara-dasish2014
Hausstein data cite-dara-dasish2014Hausstein data cite-dara-dasish2014
Hausstein data cite-dara-dasish2014
bhausstein
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
Herbert Van de Sompel
 
BHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-AustraliaBHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-Australia
Chris Freeland
 
Bringing Digital Humanities to the wider public: libraries as incubator for D...
Bringing Digital Humanities to the wider public: libraries as incubator for D...Bringing Digital Humanities to the wider public: libraries as incubator for D...
Bringing Digital Humanities to the wider public: libraries as incubator for D...
Martijn Kleppe
 
Collaborating in the Clouds
Collaborating in the CloudsCollaborating in the Clouds
Collaborating in the Clouds
Tom Ipri
 
Data integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientistData integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientist
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Workshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data miningWorkshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data mining
Ross Mounce
 
The Rolex Learning Center at EPFL: a new building for a new vision in collect...
The Rolex Learning Center at EPFL: a new building for a new vision in collect...The Rolex Learning Center at EPFL: a new building for a new vision in collect...
The Rolex Learning Center at EPFL: a new building for a new vision in collect...
Thomas Guignard
 
The Biodiversity Heritage Library
The Biodiversity Heritage LibraryThe Biodiversity Heritage Library
The Biodiversity Heritage Library
Martin Kalfatovic
 
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
National Information Standards Organization (NISO)
 
Apa frascati november 2012
Apa frascati november 2012Apa frascati november 2012
Apa frascati november 2012
pkdoorn
 
What is DataCite?
What is DataCite?What is DataCite?
What is DataCite?
datacite
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research data
Herbert Gruttemeier
 
BHL / EOL technology sit down
BHL / EOL technology sit downBHL / EOL technology sit down
BHL / EOL technology sit down
Chris Freeland
 
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Christoph Lange
 
Ciard Initiative and a Global Infrastructure for Linked Open Data
Ciard Initiative and a Global Infrastructure for Linked Open Data Ciard Initiative and a Global Infrastructure for Linked Open Data
Ciard Initiative and a Global Infrastructure for Linked Open Data
AIMS (Agricultural Information Management Standards)
 
Cornell 2011 05-13
Cornell 2011 05-13Cornell 2011 05-13
Cornell 2011 05-13
Johannes Keizer
 
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
Martin Kalfatovic
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
petermurrayrust
 

Similar to Harvesting HathiTrust Documents: A New Model for Online Access (20)

Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 
Hausstein data cite-dara-dasish2014
Hausstein data cite-dara-dasish2014Hausstein data cite-dara-dasish2014
Hausstein data cite-dara-dasish2014
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
BHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-AustraliaBHL Technologies: Review for BHL-Australia
BHL Technologies: Review for BHL-Australia
 
Bringing Digital Humanities to the wider public: libraries as incubator for D...
Bringing Digital Humanities to the wider public: libraries as incubator for D...Bringing Digital Humanities to the wider public: libraries as incubator for D...
Bringing Digital Humanities to the wider public: libraries as incubator for D...
 
Collaborating in the Clouds
Collaborating in the CloudsCollaborating in the Clouds
Collaborating in the Clouds
 
Data integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientistData integration and building a profile for yourself as an online scientist
Data integration and building a profile for yourself as an online scientist
 
Workshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data miningWorkshop 5: Uptake of, and concepts in text and data mining
Workshop 5: Uptake of, and concepts in text and data mining
 
The Rolex Learning Center at EPFL: a new building for a new vision in collect...
The Rolex Learning Center at EPFL: a new building for a new vision in collect...The Rolex Learning Center at EPFL: a new building for a new vision in collect...
The Rolex Learning Center at EPFL: a new building for a new vision in collect...
 
The Biodiversity Heritage Library
The Biodiversity Heritage LibraryThe Biodiversity Heritage Library
The Biodiversity Heritage Library
 
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
April 23 NISO Virtual Conference: Dealing with the Data Deluge: Successful Te...
 
Apa frascati november 2012
Apa frascati november 2012Apa frascati november 2012
Apa frascati november 2012
 
What is DataCite?
What is DataCite?What is DataCite?
What is DataCite?
 
DataCite - services and support for opening up research data
DataCite - services and support for opening up research dataDataCite - services and support for opening up research data
DataCite - services and support for opening up research data
 
BHL / EOL technology sit down
BHL / EOL technology sit downBHL / EOL technology sit down
BHL / EOL technology sit down
 
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
Interlinking Data and Knowledge in Enterprises, Research and Society with Lin...
 
Ciard Initiative and a Global Infrastructure for Linked Open Data
Ciard Initiative and a Global Infrastructure for Linked Open Data Ciard Initiative and a Global Infrastructure for Linked Open Data
Ciard Initiative and a Global Infrastructure for Linked Open Data
 
Cornell 2011 05-13
Cornell 2011 05-13Cornell 2011 05-13
Cornell 2011 05-13
 
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
Building a Vast Library of Life: The Biodiversity Heritage Library Looks to t...
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 

More from Christopher Brown

Migrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo ExperienceMigrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo Experience
Christopher Brown
 
Downsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your AdministrationDownsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your Administration
Christopher Brown
 
Downsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and IdeasDownsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and Ideas
Christopher Brown
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government Information
Christopher Brown
 
The Darkening of Government Information
The Darkening of Government InformationThe Darkening of Government Information
The Darkening of Government Information
Christopher Brown
 
Collecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government ResourcesCollecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government Resources
Christopher Brown
 
Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Cas...
Outbound Harvesting with Encore as a Library Space-Saving  Strategy : The Cas...Outbound Harvesting with Encore as a Library Space-Saving  Strategy : The Cas...
Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Cas...
Christopher Brown
 
Item Deselection on the Fast Track
Item Deselection on the Fast TrackItem Deselection on the Fast Track
Item Deselection on the Fast Track
Christopher Brown
 
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
Christopher Brown
 
The Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic SettingThe Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic Setting
Christopher Brown
 
The Front Face of the ERM
The Front Face of the ERMThe Front Face of the ERM
The Front Face of the ERM
Christopher Brown
 
Planning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information ConferencePlanning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information Conference
Christopher Brown
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents Fiche
Christopher Brown
 
Summon and the Art of Discovery
Summon and the Art of DiscoverySummon and the Art of Discovery
Summon and the Art of Discovery
Christopher Brown
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
Christopher Brown
 

More from Christopher Brown (15)

Migrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo ExperienceMigrating Government Publications without Going South: Our Alma/Primo Experience
Migrating Government Publications without Going South: Our Alma/Primo Experience
 
Downsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your AdministrationDownsizing Your Depository: Dealing with Mandates from Your Administration
Downsizing Your Depository: Dealing with Mandates from Your Administration
 
Downsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and IdeasDownsizing your Depository: Tools and Ideas
Downsizing your Depository: Tools and Ideas
 
Web-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government InformationWeb-scale Discovery Tools and the Backgrounding of Government Information
Web-scale Discovery Tools and the Backgrounding of Government Information
 
The Darkening of Government Information
The Darkening of Government InformationThe Darkening of Government Information
The Darkening of Government Information
 
Collecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government ResourcesCollecting Usage Statistics for E-Government Resources
Collecting Usage Statistics for E-Government Resources
 
Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Cas...
Outbound Harvesting with Encore as a Library Space-Saving  Strategy : The Cas...Outbound Harvesting with Encore as a Library Space-Saving  Strategy : The Cas...
Outbound Harvesting with Encore as a Library Space-Saving Strategy : The Cas...
 
Item Deselection on the Fast Track
Item Deselection on the Fast TrackItem Deselection on the Fast Track
Item Deselection on the Fast Track
 
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...Going All-Electronic and Keeping Track of It: Clickthrough  Statistics for On...
Going All-Electronic and Keeping Track of It: Clickthrough Statistics for On...
 
The Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic SettingThe Three Googles: How I Teach Google in an Academic Setting
The Three Googles: How I Teach Google in an Academic Setting
 
The Front Face of the ERM
The Front Face of the ERMThe Front Face of the ERM
The Front Face of the ERM
 
Planning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information ConferencePlanning the Six-State Virtual Government Information Conference
Planning the Six-State Virtual Government Information Conference
 
Fiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents FicheFiche Online: A Vision for Digitizing All Documents Fiche
Fiche Online: A Vision for Digitizing All Documents Fiche
 
Summon and the Art of Discovery
Summon and the Art of DiscoverySummon and the Art of Discovery
Summon and the Art of Discovery
 
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
When there is no Vendor: Statistics for Free Clickthroughs via the Online Cat...
 

Recently uploaded

Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
adhitya5119
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
Wahiba Chair Training & Consulting
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
Priyankaranawat4
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Nguyen Thanh Tu Collection
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
Katrina Pritchard
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
Colégio Santa Teresinha
 
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdfIGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
Amin Marwan
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
PsychoTech Services
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Fajar Baskoro
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
Nicholas Montgomery
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
MJDuyan
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
สมใจ จันสุกสี
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 

Recently uploaded (20)

Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience How to Create a More Engaging and Human Online Learning Experience
How to Create a More Engaging and Human Online Learning Experience
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
 
BBR 2024 Summer Sessions Interview Training
BBR  2024 Summer Sessions Interview TrainingBBR  2024 Summer Sessions Interview Training
BBR 2024 Summer Sessions Interview Training
 
MARY JANE WILSON, A “BOA MÃE” .
MARY JANE WILSON, A “BOA MÃE”           .MARY JANE WILSON, A “BOA MÃE”           .
MARY JANE WILSON, A “BOA MÃE” .
 
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdfIGCSE Biology Chapter 14- Reproduction in Plants.pdf
IGCSE Biology Chapter 14- Reproduction in Plants.pdf
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...Gender and Mental Health - Counselling and Family Therapy Applications and In...
Gender and Mental Health - Counselling and Family Therapy Applications and In...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
Pengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptxPengantar Penggunaan Flutter - Dart programming language1.pptx
Pengantar Penggunaan Flutter - Dart programming language1.pptx
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
writing about opinions about Australia the movie
writing about opinions about Australia the moviewriting about opinions about Australia the movie
writing about opinions about Australia the movie
 
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) CurriculumPhilippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
Philippine Edukasyong Pantahanan at Pangkabuhayan (EPP) Curriculum
 
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
คำศัพท์ คำพื้นฐานการอ่าน ภาษาอังกฤษ ระดับชั้น ม.1
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 

Harvesting HathiTrust Documents: A New Model for Online Access

  • 1. Harvesting HathiTrust Documents: A New Model for Online Access Christopher C. Brown University of Denver, Penrose Library (303) 871-3404 cbrown@du.edu 2011 Missouri Government Documents Conference
  • 2. DR, IR, Digital Texts Inbound Harvesting Outbound Harvesting This presentation will show how Encore harvesting can be used to mitigate a space problem in a library, substituting online access for the need for physical access to the collection. The government documents collection will be the primary focus. Note: Encore is the next-generation catalog interface produced by Innovative Interfaces, Inc.
  • 3. Collection Downsizing? Malpas, Constance. 2011. Cloud-sourcing Research Collections: Managing Print in the Mass-digitized Library Environment. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2011/2011-01.pdf.
  • 4. About University of Denver Depository since 1909 Historically a 70-75% selective Now a 4.8% selective, but receive 100% of online cataloging Adding URLs to historic documents Currently 100% of our paper documents are in storage We are remodeling our library. Under the remodeling plan, all docs will remain in remote storage.
  • 5. Partial Solution: Using Encore for Outbound Harvesting All documents off-site Our users are accustomed to using electronic documents Need to divert attention away from physical collection holdings Encore harvesting of Hathi Trust can do this
  • 6. PD = where docs generally live Hathi Trust Attributes From: http://www.hathitrust.org/rights_database
  • 7. Sampling Method I wanted to see how many government documents were in our Hathi Trust harvest Limit to Hathi Trust for a given year Examine first result on each page of 25 results (4% of results) [limitation: Encore only displays first 1,000 results]
  • 8. Harvesting Hathi Docs: The Stats Date Range 2000-2009 1990-1999 1980-1989 1970-1979 1960-1969 1950-1959 1940-1949 1930-1939 1920-1929 1910-1919 1900-1909 1890-1899 1880-1889 1870-1879 1860-1869 Hathi Totals 505,682 709,214 723,657 631,110 546,914 281,615 184,755 175,103 175,226 175,148 179,018 112,295 83,950 58,624 50,907 4,593,218 Hathi All Pub Domain pdus + pd 14,140 29,163 33,753 28,633 21,244 20,861 17,096 16,237 66,563 169,923 153,284 110,605 82,809 57,826 50,337 872,474 Hathi pdus DU pd Harvest 726 13,369 880 28,164 1,204 32,321 2,046 26,189 1,987 18,991 863 19,893 600 16,253 654 15,317 27,108 28,854 75,955 61,230 70,900 47,999 50,502 34,742 38,928 23,855 27,202 17,751 2,273 45,790 301,828 430,718 Docs Sampling 13,340 99.78% 26,662 94.67% 31,370 97.06% 25,607 97.78% 7,668 40.38% 3,888 19.54% 3,771 23.21% 2,600 16.97% 1,529 5.30% 4,124 6.73% 2,265 4.72% 596 1.72% 699 2.93% 319 1.80% 248 0.54% 124,686 28.95% Statistics as of mid-March, 2011 The Docs Sampling columns show the estimated numbers of docs per year and the estimated percentage of docs per year from the Harvest
  • 9. Malpas: Docs about 3% of Hathi Total and 15% of Public Domain GovDocs: 3% overall GovDocs: 15% of Public Domain
  • 10. Hathi Docs Usage in Proportion to Docs Distribution Sources: 1895-1976 data: Monthly Catalog, 1895-1976 (ProQuest);1976 onward data: CGP
  • 11. % Docs in HathiTrust (est.)
  • 12. Hathi Docs Links Provide Access to Docs in Storage
  • 13. Stripped-Out Fields 008 fixed field data 650 subfields other than “a” 500 notes 5xx shipping list info 300 subfields after “a” 086 SuDocs number
  • 14. Use Stats for Hathi Trust? Statistics from Google Analytics •Statistics for all Hathi Trust records accessed, not just documents •Spikes in usage are docs librarian (my) testing, not real users
  • 16. Summon Harvesting of HathiTrust
  • 17. Conclusions Documents content in HathiTrust can provide a suitable surrogate for a limited subset of documents, but not a wholesale replacement. HathiTrust documents can be used as surrogates for selected titles, especially larger serial runs. But it is difficult at this time to isolate those titles. HathiTrust is definitely worth harvesting into local catalogs or other digital repositories.