Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
HathiTrust Research Center     The Fast Version      Robert H. McDonald | @mcdonaldExecutive Committee-HathiTrust Research...
HTRC MissionThe HathiTrust Research Center (HTRC) is a collaborativeresearch center launched jointly by Indiana University...
HTRC Non-Consumptive Research             Paradigm• No action or set of actions on part of users,  either acting alone or ...
HTRC Current Infrastructure• Servers   – 14 production-level quad-core servers (virtual     machines)      • 16 – 32GB of ...
HTRC Architecture Portal Access             Blacklight                                                           Direct   ...
HTRC Next Steps• Phase 2 Begins Jan 2013….• Thanks to
Contact Information• Robert H. McDonald  – Email –    robert@indiana.edu  – Chat – rhmcdonald on    googletalk | skype  – ...
Upcoming SlideShare
Loading in …5
×

HathiTrust Research Center: The Fast Version

538 views

Published on

These are my pecha kucha slides for the Library of Congress Storage meeting held on Sept 21, 2012 in Washington, DC.

Published in: Education, Technology
  • Be the first to comment

  • Be the first to like this

HathiTrust Research Center: The Fast Version

  1. 1. HathiTrust Research Center The Fast Version Robert H. McDonald | @mcdonaldExecutive Committee-HathiTrust Research Center (HTRC) Deputy Director-Data to Insight Center Associate Dean-University Libraries Indiana University
  2. 2. HTRC MissionThe HathiTrust Research Center (HTRC) is a collaborativeresearch center launched jointly by Indiana Universityand the University of Illinois to act as the public-facingresearch arm of the massive HathiTrust Digital Library.The HTRC is mandated to help researchers from aroundthe world surmount the difficulties associated withprocessing and analyzing terascale amounts of digitaltext. Thus, the scholarly developers at HTRC work todevelop cutting-edge software tools andcyberinfrastructure to enable advanced computationalaccess to the growing digital record of human knowledge.HTRC began its efforts July 2011.
  3. 3. HTRC Non-Consumptive Research Paradigm• No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection.• Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
  4. 4. HTRC Current Infrastructure• Servers – 14 production-level quad-core servers (virtual machines) • 16 – 32GB of memory • 250 – 500GB of local disk each – 6-node Cassandra cluster for volume store – Ingest service and secure Data API access point• Storage (IU University Infrastructure) – 13TB of 15,000 RPM SAS disk storage – Increase up to 17TB by end of 2012 – 500TB available in late year 2-year 3
  5. 5. HTRC Architecture Portal Access Blacklight Direct Agent programmatic access (by Job Collection programs runningSubmission building on HTRC machines) Security (OAuth2) Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
  6. 6. HTRC Next Steps• Phase 2 Begins Jan 2013….• Thanks to
  7. 7. Contact Information• Robert H. McDonald – Email – robert@indiana.edu – Chat – rhmcdonald on googletalk | skype – Twitter - @mcdonald – Blog – http://www.rmcdonald.net – Twitter Hashtag: #HTRC12 http://slidesha.re/QCOrIX – Web: http:www.hathitrust.org/htrc

×