SlideShare a Scribd company logo
1 of 13
Elephant in the Room:
Scaling Storage for the
HathiTrust Research Center
Robert H. McDonald | @mcdonald
Associate Dean for Library Technologies
Deputy Director Data to Insight Center (D2I)
Indiana University
PASIG 2015
#PASIG2015
UC San Diego
March 12, 2015
@hathitrust
Mission of the HT Research Center
• Research arm of HathiTrust
• Established: July, 2011
• Collaborative center: Indiana University & University
of Illinois
• Mission: Enable researchers world-wide to accomplish
tera-scale text data-mining and analysis
– Develop cyberinfrastructure to enable HPC access to the
HathiTrust Digital Library
– Develop cutting-edge software tools for processing,
analyzing text
– Develop translational tools and data that can be used to
enhance HathiTrust Digital Library services to users
HathiTrust and HTRC
HathiTrust
University
of
Illinois
Indiana
University
HathiTrust
Research
Center
University
of
Michigan
• Board of Governors
• Executive Committee
• Executive Director
Working with HTRC Staff
Advanced Collaborative
Support
Scholarly Commons
Advanced Research
Workshops, tutorials, and
guidance for using HTRC
One-on-one research support
provided through a competitive
awards process
Collaborative research
partnership with HTRC
HathiTrust “Wow” Numbers
• 13,284,163 total volumes
• 6,742,394 book titles
• 352,534 serial titles
• 4,649,457,050 pages
• 595 terabytes
• 157 miles
• 10,793 tons
• 4,979,599 volumes in the public domain
Non-Consumptive Research Paradigm
• No action or set of actions on part of users,
either acting alone or in cooperation with
other users over duration of one or multiple
sessions can result in sufficient information
gathered from collection of copyrighted works
to reassemble pages from collection.
• Definition disallows collusion between users,
or accumulation of material over time.
Differentiates human researcher from proxy
which is not a user. Users are human beings.
DATA LEVELS FOR HTRC
• Derived Factual Data or Supplementary Bibliographic Data; Publicly available in US and
worldwide (bulk download permitted)Level 0
• Page text or Page images that are in public domain in US only and not subject to third
party restrictions; Publicly available to everyone in the US (no bulk download)Level 1A
• Page text or Page images that are in the public domain worldwide and not subject to
third party restrictions; Publicly available to everyone worldwide (no bulk download)Level 1B
• Primary Bibliographic Metadata; Publicly available in US and worldwide (no bulk
download)Level 1C
• Page text or Page images that are in public domain in US only and are subject to third
party restrictions; Publicly available to everyone in the US (no download)Level 2A
• Page text or Page images that are in public domain worldwide and are subject to third
party restrictions; Publicly available to everyone worldwide (no download)Level 2B
• Restricted in-copyright data; may or may not be subject to additional third- party
restrictions (no download)Level 3
Working with HTRC Tools
Get started at: https://htrc2.pti.indiana.edu/
Build Worksets
Execute Algorithms
Visualize Term Frequency
http://sandbox.htrc.illinois.edu/bookworm/
HTRC Architecture
Data API access interface
Portal Access
Direct
programmatic
access (by
programs running
on HTRC machines)
Security (OAuth2)
Audit
Cassandra
cluster
volume store
Solr index
Algorithms
Result Sets
Meandre
Workflows
Registry (WSO2)
Compute resources
Storage resources
Agent
Job
Submission
Collection
building
Collections
Blacklight
Solr Proxy
HTRC Storage Layer Ingest
HTRC Shared IU Systems
Data Systems
• Data Capacitor II
(Lustre)
• NetApp NFS (18 TB)
– GPFS/Data Direct
Networks (28 TB
Compute Systems
• KARST (high-
throughput cluster)
• Big Red 2 (Cray
XE6/XK7)
Current Storage (18 TB > 30 TB)
• MARC – 15 GB
• R – 5 TB
• iPython – 5 TB
• Bookworm – 5 TB
• Public Domain BW – 3 TB
• OCR Data - 5 TB
• OCR Index – 2.3 TB
• Audit Logs – 44 GB
• User created MD – 15 GB
• Blacklight – 20 GB
• IU Pairtree – 2.65. TB
Want More HTRC?
3rd Annual HTRC UnCamp!
March 30-31, 2015 in Ann Arbor, Michigan
DH 2015
June 29-3 July, 2015 in Sydney, Australia
http://www.hathitrust.org/htrc_uncamp2015
Thank You
• This presentation was made possible with content
provided by many HTRC colleagues John Unsworth, J.
Stephen Downie, Beth Plale, Beth Namachchivaya, Dirk
Herr-Hoyman, Milnda Pathirage, Samitha Liyanage,
Miao Chen, Guangchen Ruan, Jiaan Zeng, Loretta Auvil,
Boris Capitanu, and many others…
• The HTRC Non-Consumptive Research Grant was
graciously funded by the Alfred P. Sloan Foundation
• THE HTRC WCSA grant is graciously funded by the
Andrew W. Mellon Foundation.
• HTRC - http://www.hathitrust.org/htrc
• IU D2I Center - http://d2i.indiana.edu/
• UIUC GSLIS - http://www.lis.illinois.edu/

More Related Content

What's hot

Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedJoel Azzopardi
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
Towards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebTowards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebStefan Dietze
 
Scripting User Contributed Interlinking
Scripting User Contributed InterlinkingScripting User Contributed Interlinking
Scripting User Contributed Interlinkingwhalb
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalMauro Dragoni
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinAnja Jentzsch
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation SlidesDuraSpace
 
ORDS, research data network
ORDS, research data networkORDS, research data network
ORDS, research data networkJisc RDM
 
Clipper, research data network
Clipper, research data networkClipper, research data network
Clipper, research data networkJisc RDM
 
Digital Preservation in Production (DPN and DuraCloud Vault)
Digital Preservation in Production (DPN and DuraCloud Vault)Digital Preservation in Production (DPN and DuraCloud Vault)
Digital Preservation in Production (DPN and DuraCloud Vault)DuraSpace
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...Stefan Schmunk
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusGlobus
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...Micah Altman
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportPascal-Nicolas Becker
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataFilip Ilievski
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasIan Foster
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsPeter Haase
 

What's hot (20)

Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joinedKeystone summer school_2015_miguel_antonio_ldcompression_4-joined
Keystone summer school_2015_miguel_antonio_ldcompression_4-joined
 
From Big Data to Fast Data
From Big Data to Fast DataFrom Big Data to Fast Data
From Big Data to Fast Data
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Towards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the WebTowards embedded Markup of Learning Resources on the Web
Towards embedded Markup of Learning Resources on the Web
 
Open data and linked data
Open data and linked dataOpen data and linked data
Open data and linked data
 
Scripting User Contributed Interlinking
Scripting User Contributed InterlinkingScripting User Contributed Interlinking
Scripting User Contributed Interlinking
 
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information RetrievalKeystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
Keystone Summer School 2015: Mauro Dragoni, Ontologies For Information Retrieval
 
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, BerlinDBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
DBpedia Mappings Wiki, SMWCon Fall 2013, Berlin
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides
 
ORDS, research data network
ORDS, research data networkORDS, research data network
ORDS, research data network
 
Clipper, research data network
Clipper, research data networkClipper, research data network
Clipper, research data network
 
Digital Preservation in Production (DPN and DuraCloud Vault)
Digital Preservation in Production (DPN and DuraCloud Vault)Digital Preservation in Production (DPN and DuraCloud Vault)
Digital Preservation in Production (DPN and DuraCloud Vault)
 
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
New tasks, new roles: Libraries in the tension between Digital Humanities, Re...
 
Health Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by GlobusHealth Sciences Research Informatics, Powered by Globus
Health Sciences Research Informatics, Powered by Globus
 
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
WORLDMAP: A SPATIAL INFRASTRUCTURE TO SUPPORT TEACHING AND RESEARCH (BROWN BA...
 
Illuminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data SupportIlluminating DSpace's Linked Data Support
Illuminating DSpace's Linked Data Support
 
Digital libraries
Digital librariesDigital libraries
Digital libraries
 
LOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked DataLOTUS: Adaptive Text Search for Big Linked Data
LOTUS: Adaptive Text Search for Big Linked Data
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Discovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data PortalsDiscovering Related Data Sources in Data Portals
Discovering Related Data Sources in Data Portals
 

Viewers also liked

Moving the Elephant in the Room: Data Migration at Scale
Moving the Elephant in the Room: Data Migration at ScaleMoving the Elephant in the Room: Data Migration at Scale
Moving the Elephant in the Room: Data Migration at ScaleTyrone Hinderson
 
Strategy - The elephant in the room
Strategy - The elephant in the roomStrategy - The elephant in the room
Strategy - The elephant in the roomIIBA UK Chapter
 
Addressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyAddressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyRay Killebrew
 
asteRISK
asteRISKasteRISK
asteRISKkrnmcg
 
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCElephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCMike Lewis
 
Risk: the Elephant in the Room
Risk: the Elephant in the RoomRisk: the Elephant in the Room
Risk: the Elephant in the RoomLast Call Media
 
The elephant in the room. discussion
The elephant in the room. discussionThe elephant in the room. discussion
The elephant in the room. discussionAndrew Gelston
 
The elephant in the room mongo db + hadoop
The elephant in the room  mongo db + hadoopThe elephant in the room  mongo db + hadoop
The elephant in the room mongo db + hadoopiammutex
 
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...ad:tech London, MMS & iMedia
 
CMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomCMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomScott Liewehr
 
RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...Centre for Distance Education
 
The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013Howard Cooke
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the roomJohn Gillis
 
Kanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timeKanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timejsonnevelt
 
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!Mitch Jackson
 

Viewers also liked (20)

Moving the Elephant in the Room: Data Migration at Scale
Moving the Elephant in the Room: Data Migration at ScaleMoving the Elephant in the Room: Data Migration at Scale
Moving the Elephant in the Room: Data Migration at Scale
 
Strategy - The elephant in the room
Strategy - The elephant in the roomStrategy - The elephant in the room
Strategy - The elephant in the room
 
Addressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyAddressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content Strategy
 
Elephant in Room Version 2
Elephant in Room Version 2Elephant in Room Version 2
Elephant in Room Version 2
 
YUI The Elephant In The Room
YUI The Elephant In The RoomYUI The Elephant In The Room
YUI The Elephant In The Room
 
asteRISK
asteRISKasteRISK
asteRISK
 
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOMELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
 
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCElephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
 
Risk: the Elephant in the Room
Risk: the Elephant in the RoomRisk: the Elephant in the Room
Risk: the Elephant in the Room
 
The elephant in the room. discussion
The elephant in the room. discussionThe elephant in the room. discussion
The elephant in the room. discussion
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the room
 
The elephant in the room mongo db + hadoop
The elephant in the room  mongo db + hadoopThe elephant in the room  mongo db + hadoop
The elephant in the room mongo db + hadoop
 
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
 
CMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomCMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the Room
 
G!
G!G!
G!
 
RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...
 
The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the room
 
Kanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timeKanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a time
 
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
 

Similar to Elephant in the Room: Scaling Storage for the HathiTrust Research Center

HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsBeth Plale
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesRobert H. McDonald
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...Ben Blaiszik
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14Robert H. McDonald
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Beth Plale
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 
“Filling the digital preservation gap” an update from the Jisc Research Data ...
“Filling the digital preservation gap”an update from the Jisc Research Data ...“Filling the digital preservation gap”an update from the Jisc Research Data ...
“Filling the digital preservation gap” an update from the Jisc Research Data ...Jenny Mitcham
 
Big_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic backgroundBig_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic backgroundNidhiAhuja30
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...OpenAIRE
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Sarah Anna Stewart
 
Next generation repositories
Next generation repositoriesNext generation repositories
Next generation repositoriesPaul Walk
 
Graham Pryor
Graham PryorGraham Pryor
Graham PryorEduserv
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...petrknoth
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE
 
Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...Artificial Intelligence Institute at UofSC
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with ArchivematicaJenny Mitcham
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationMANENDRASINGH30
 

Similar to Elephant in the Room: Scaling Storage for the HathiTrust Research Center (20)

HathiTrust Research Center Secure Commons
HathiTrust Research Center Secure CommonsHathiTrust Research Center Secure Commons
HathiTrust Research Center Secure Commons
 
JCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening SlidesJCDL 2015 Tutorial Opening Slides
JCDL 2015 Tutorial Opening Slides
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
 
HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14HathiTrust Research Center Data Capsule Overview 09.10.14
HathiTrust Research Center Data Capsule Overview 09.10.14
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014Plale HathiTrust El Colegio de Mexico May2014
Plale HathiTrust El Colegio de Mexico May2014
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
“Filling the digital preservation gap” an update from the Jisc Research Data ...
“Filling the digital preservation gap”an update from the Jisc Research Data ...“Filling the digital preservation gap”an update from the Jisc Research Data ...
“Filling the digital preservation gap” an update from the Jisc Research Data ...
 
Big_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic backgroundBig_data_1674238705.ppt is a basic background
Big_data_1674238705.ppt is a basic background
 
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
 
Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
Next generation repositories
Next generation repositoriesNext generation repositories
Next generation repositories
 
Graham Pryor
Graham PryorGraham Pryor
Graham Pryor
 
Big Data
Big Data Big Data
Big Data
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...Better together: building services for public good on top of content from the...
Better together: building services for public good on top of content from the...
 
OpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open ScienceOpenAIRE: eInfrastructure for Open Science
OpenAIRE: eInfrastructure for Open Science
 
Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...Federated Architecture with Provenance and Access Control to realize Open Dig...
Federated Architecture with Provenance and Access Control to realize Open Dig...
 
"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica"Filling the Digital Preservation Gap" with Archivematica
"Filling the Digital Preservation Gap" with Archivematica
 
Impact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and EducationImpact of Covid-19 on Learning and Education
Impact of Covid-19 on Learning and Education
 

More from Robert H. McDonald

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelRobert H. McDonald
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...Robert H. McDonald
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Robert H. McDonald
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15Robert H. McDonald
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesRobert H. McDonald
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Robert H. McDonald
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesRobert H. McDonald
 
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkRobert H. McDonald
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsRobert H. McDonald
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesRobert H. McDonald
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudRobert H. McDonald
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoRobert H. McDonald
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science Robert H. McDonald
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...Robert H. McDonald
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Robert H. McDonald
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...Robert H. McDonald
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionRobert H. McDonald
 

More from Robert H. McDonald (20)

ER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations PanelER&L The Role of Choice in the Future of Discovery Evaluations Panel
ER&L The Role of Choice in the Future of Discovery Evaluations Panel
 
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
The HathiTrust Research Center: Enabling New Knowledge Through Shared Infras...
 
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
Academic Libraries and Big Data: Trends in Collection, Publication, Preservat...
 
TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15TLT Discussion on "Saving My Stuff" - 06.05.15
TLT Discussion on "Saving My Stuff" - 06.05.15
 
The HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational ServicesThe HathiTrust Research Center: An Overview of Advanced Computational Services
The HathiTrust Research Center: An Overview of Advanced Computational Services
 
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
Creating Sustainable Communities in Open Data Resources: The eagle-i and VIVO...
 
ER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote SlidesER&L 2015 Closing Keynote Slides
ER&L 2015 Closing Keynote Slides
 
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data FrameworkThe HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
 
Owning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your PatronsOwning the Discovery Experience for Your Patrons
Owning the Discovery Experience for Your Patrons
 
Kuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for LibrariesKuali OLE: Enabling Choices for Libraries
Kuali OLE: Enabling Choices for Libraries
 
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to CloudCharleston Seminar Being Earnest with our Collections - Legacy to Cloud
Charleston Seminar Being Earnest with our Collections - Legacy to Cloud
 
The HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and DemoThe HathiTrust Research Center (HTRC): An Overview and Demo
The HathiTrust Research Center (HTRC): An Overview and Demo
 
SCONUL Kuali OLE Briefing
SCONUL Kuali OLE BriefingSCONUL Kuali OLE Briefing
SCONUL Kuali OLE Briefing
 
SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science SEAD Datanet and Sustainability Science
SEAD Datanet and Sustainability Science
 
New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...New Perspectives for Business Intelligence: Library and Research Technologies...
New Perspectives for Business Intelligence: Library and Research Technologies...
 
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
Kuali OLE: Deep Library Collaboration and the Release of a Community-Sourced ...
 
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...GOKb & KB+: An International Partnership to leverage Open Access and Communit...
GOKb & KB+: An International Partnership to leverage Open Access and Communit...
 
Kuali OLE @ LITA Forum 2012
Kuali OLE @ LITA Forum 2012Kuali OLE @ LITA Forum 2012
Kuali OLE @ LITA Forum 2012
 
HathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast VersionHathiTrust Research Center: The Fast Version
HathiTrust Research Center: The Fast Version
 
HTRC Architecture Overview
HTRC Architecture OverviewHTRC Architecture Overview
HTRC Architecture Overview
 

Recently uploaded

What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationAadityaSharma884161
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 

Recently uploaded (20)

Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"Rapple "Scholarly Communications and the Sustainable Development Goals"
Rapple "Scholarly Communications and the Sustainable Development Goals"
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 

Elephant in the Room: Scaling Storage for the HathiTrust Research Center

  • 1. Elephant in the Room: Scaling Storage for the HathiTrust Research Center Robert H. McDonald | @mcdonald Associate Dean for Library Technologies Deputy Director Data to Insight Center (D2I) Indiana University PASIG 2015 #PASIG2015 UC San Diego March 12, 2015 @hathitrust
  • 2. Mission of the HT Research Center • Research arm of HathiTrust • Established: July, 2011 • Collaborative center: Indiana University & University of Illinois • Mission: Enable researchers world-wide to accomplish tera-scale text data-mining and analysis – Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library – Develop cutting-edge software tools for processing, analyzing text – Develop translational tools and data that can be used to enhance HathiTrust Digital Library services to users
  • 4. Working with HTRC Staff Advanced Collaborative Support Scholarly Commons Advanced Research Workshops, tutorials, and guidance for using HTRC One-on-one research support provided through a competitive awards process Collaborative research partnership with HTRC
  • 5. HathiTrust “Wow” Numbers • 13,284,163 total volumes • 6,742,394 book titles • 352,534 serial titles • 4,649,457,050 pages • 595 terabytes • 157 miles • 10,793 tons • 4,979,599 volumes in the public domain
  • 6. Non-Consumptive Research Paradigm • No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection. • Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  • 7. DATA LEVELS FOR HTRC • Derived Factual Data or Supplementary Bibliographic Data; Publicly available in US and worldwide (bulk download permitted)Level 0 • Page text or Page images that are in public domain in US only and not subject to third party restrictions; Publicly available to everyone in the US (no bulk download)Level 1A • Page text or Page images that are in the public domain worldwide and not subject to third party restrictions; Publicly available to everyone worldwide (no bulk download)Level 1B • Primary Bibliographic Metadata; Publicly available in US and worldwide (no bulk download)Level 1C • Page text or Page images that are in public domain in US only and are subject to third party restrictions; Publicly available to everyone in the US (no download)Level 2A • Page text or Page images that are in public domain worldwide and are subject to third party restrictions; Publicly available to everyone worldwide (no download)Level 2B • Restricted in-copyright data; may or may not be subject to additional third- party restrictions (no download)Level 3
  • 8. Working with HTRC Tools Get started at: https://htrc2.pti.indiana.edu/ Build Worksets Execute Algorithms Visualize Term Frequency http://sandbox.htrc.illinois.edu/bookworm/
  • 9. HTRC Architecture Data API access interface Portal Access Direct programmatic access (by programs running on HTRC machines) Security (OAuth2) Audit Cassandra cluster volume store Solr index Algorithms Result Sets Meandre Workflows Registry (WSO2) Compute resources Storage resources Agent Job Submission Collection building Collections Blacklight Solr Proxy
  • 11. HTRC Shared IU Systems Data Systems • Data Capacitor II (Lustre) • NetApp NFS (18 TB) – GPFS/Data Direct Networks (28 TB Compute Systems • KARST (high- throughput cluster) • Big Red 2 (Cray XE6/XK7) Current Storage (18 TB > 30 TB) • MARC – 15 GB • R – 5 TB • iPython – 5 TB • Bookworm – 5 TB • Public Domain BW – 3 TB • OCR Data - 5 TB • OCR Index – 2.3 TB • Audit Logs – 44 GB • User created MD – 15 GB • Blacklight – 20 GB • IU Pairtree – 2.65. TB
  • 12. Want More HTRC? 3rd Annual HTRC UnCamp! March 30-31, 2015 in Ann Arbor, Michigan DH 2015 June 29-3 July, 2015 in Sydney, Australia http://www.hathitrust.org/htrc_uncamp2015
  • 13. Thank You • This presentation was made possible with content provided by many HTRC colleagues John Unsworth, J. Stephen Downie, Beth Plale, Beth Namachchivaya, Dirk Herr-Hoyman, Milnda Pathirage, Samitha Liyanage, Miao Chen, Guangchen Ruan, Jiaan Zeng, Loretta Auvil, Boris Capitanu, and many others… • The HTRC Non-Consumptive Research Grant was graciously funded by the Alfred P. Sloan Foundation • THE HTRC WCSA grant is graciously funded by the Andrew W. Mellon Foundation. • HTRC - http://www.hathitrust.org/htrc • IU D2I Center - http://d2i.indiana.edu/ • UIUC GSLIS - http://www.lis.illinois.edu/

Editor's Notes

  1. 3
  2. Level 0 – Derived Factual Data or Supplementary Bibliographic Data; Publicly available in US and worldwide (bulk download permitted)   Level 1.A – Page text or Page images that are in public domain in US only and not subject to third party restrictions; Publicly available to everyone in the US (no bulk download)   Level 1.B – Page text or Page images that are in the public domain worldwide and not subject to third party restrictions; Publicly available to everyone worldwide (no bulk download)   Level 1.C – Primary Bibliographic Metadata; Publicly available in US and worldwide (no bulk download)   Level 2.A – Page text or Page images that are in public domain in US only and are subject to third party restrictions; Publicly available to everyone in the US (no download)   Level 2.B – Page text or Page images that are in public domain worldwide and are subject to third party restrictions; Publicly available to everyone worldwide (no download)   Level 3 – Restricted in-copyright data; may or may not be subject to additional third- party restrictions (no download)
  3. Registry – agent can deploy any service listed in this digram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -