HathiTrust Research Center     The Fast Version      Robert H. McDonald | @mcdonaldExecutive Committee-HathiTrust Research...
HTRC MissionThe HathiTrust Research Center (HTRC) is a collaborativeresearch center launched jointly by Indiana University...
HTRC Non-Consumptive Research             Paradigm• No action or set of actions on part of users,  either acting alone or ...
HTRC Current Infrastructure• Servers   – 14 production-level quad-core servers (virtual     machines)      • 16 – 32GB of ...
HTRC Architecture Portal Access             Blacklight                                                           Direct   ...
HTRC Next Steps• Phase 2 Begins Jan 2013….• Thanks to
Contact Information• Robert H. McDonald  – Email –    robert@indiana.edu  – Chat – rhmcdonald on    googletalk | skype  – ...
Upcoming SlideShare
Loading in …5
×

HathiTrust Research Center: The Fast Version

370
-1

Published on

These are my pecha kucha slides for the Library of Congress Storage meeting held on Sept 21, 2012 in Washington, DC.

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
370
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
1
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Registry – agent can deploy any service listed in this digram and can run with the computational resources – Original Plan iis to use XSEDE – not using this on IIS machine but are using ODIN (128 node cluster each core has 4Gb memory and 4 computation cores)– smoketree (D2I server)(24 cores physical 48 loical cores 128 GB memory) – these are not long term just using for now -
  • HathiTrust Research Center: The Fast Version

    1. 1. HathiTrust Research Center The Fast Version Robert H. McDonald | @mcdonaldExecutive Committee-HathiTrust Research Center (HTRC) Deputy Director-Data to Insight Center Associate Dean-University Libraries Indiana University
    2. 2. HTRC MissionThe HathiTrust Research Center (HTRC) is a collaborativeresearch center launched jointly by Indiana Universityand the University of Illinois to act as the public-facingresearch arm of the massive HathiTrust Digital Library.The HTRC is mandated to help researchers from aroundthe world surmount the difficulties associated withprocessing and analyzing terascale amounts of digitaltext. Thus, the scholarly developers at HTRC work todevelop cutting-edge software tools andcyberinfrastructure to enable advanced computationalaccess to the growing digital record of human knowledge.HTRC began its efforts July 2011.
    3. 3. HTRC Non-Consumptive Research Paradigm• No action or set of actions on part of users, either acting alone or in cooperation with other users over duration of one or multiple sessions can result in sufficient information gathered from collection of copyrighted works to reassemble pages from collection.• Definition disallows collusion between users, or accumulation of material over time. Differentiates human researcher from proxy which is not a user. Users are human beings.
    4. 4. HTRC Current Infrastructure• Servers – 14 production-level quad-core servers (virtual machines) • 16 – 32GB of memory • 250 – 500GB of local disk each – 6-node Cassandra cluster for volume store – Ingest service and secure Data API access point• Storage (IU University Infrastructure) – 13TB of 15,000 RPM SAS disk storage – Increase up to 17TB by end of 2012 – 500TB available in late year 2-year 3
    5. 5. HTRC Architecture Portal Access Blacklight Direct Agent programmatic access (by Job Collection programs runningSubmission building on HTRC machines) Security (OAuth2) Data API access interface Solr Proxy Registry (WSO2) Audit Meandre Algorithms Cassandra Workflows cluster volume store Result Sets Collections Solr index Compute resources Storage resources
    6. 6. HTRC Next Steps• Phase 2 Begins Jan 2013….• Thanks to
    7. 7. Contact Information• Robert H. McDonald – Email – robert@indiana.edu – Chat – rhmcdonald on googletalk | skype – Twitter - @mcdonald – Blog – http://www.rmcdonald.net – Twitter Hashtag: #HTRC12 http://slidesha.re/QCOrIX – Web: http:www.hathitrust.org/htrc

    ×