Building a Public Research Center for the HathiTrust Digital Library
This is a presentation by Robert H. McDonald from the panel moderated by Stephen Downie at JCDL 2011, called Big Data! Big Deal?

  • State core team names. Talk about partnership between IU and UIUC
  • Basic History of HathiTrust Digital Library – Digital Public Library of America - LAC
  • Transcript

    • 1. Building a Public Research Center for the HathiTrust Digital Library
      @hathitresearch | @hathitrust
      Robert H. McDonald
      Associate Dean for Library Technologies and Digital Libraries
      Associate Director-Data to Insight Center, Pervasive Technology Institute
      Indiana University
      June 14, 2011
      JCDL 2011: Big Data! Big Deal? Panel
    • 2. HathiTrust Research Center (HTRC) Team
      Indiana University
      Beth Plale – Director
      Robert McDonald – Executive Committee
      University of Illinois
      Scott Poole – Co-Director
      John Unsworth – Executive Committee
    • 3. HathiTrust Digital Library History
      To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
      Launched in October 2008
      University of Michigan
      Indiana University
      Used Google Books Repository at Michigan as Model
      Expanded to include content from
      CIC Member Libraries
      UC System Libraries
      University of Virginia
      Now includes more than 50 partner institutions and more than 8 million volumes
    • 4. Towards a HathiTrust Research Center
      Started in response to proposed Google Settlement - June 2009
      • Specific Funding set aside by Google to build a public research center
      • Worked to identify key stakeholders from HT institutions to collaborate and write RFP
      • Google Settlement in early 2011 did not stop the center
      Developed specific RFP for HathiTrust to solicit proposals – Summer/Fall 2009
      HTRC RFP Working Group
      RFP Released – Winter 2010
    • 7. Our Collaboration
      HTRC is founded as a joint venture between Indiana University and the University of Illinois Urbana-Champaign, aimed at solving the difficult challenges of increasing computational access to the public domain and copyrighted material in HathiTrust.
    • 8. Our Mission
      Phase I: starting April 2011 and running for 18 months
      Phase II: starting Fall 2012 and going for …
      Goal: enable strong computational research and education on a collection that has not been amenable to computational exploration EVER before!
    • 9. Our Goals
      Maintain repository of text mining algorithms and retrieval tools available on-line for human and programmatic discovery. Also register derived data sets, indexes, and versions in registry repository.
      Be a user-driven resource, with an active advisory board, and a community model that allows users to share algorithms and tools.
      Support interoperability across collections and institutions through use of InCommon SAML identity.
    • 10. Our Future
      Support innovation in cyberinfrastructure to deliver optimal access and use of the HathiTrust corpus.
      Implement “Non-consumptive” research: a technical and intellectual challenge
      Identify and host existing data analysis, text mining, and retrieval tools that are of interest to the community.
      Stimulate development of new analytical methods and tools. We hope that the scale of the HTRC will promote new levels of collaboration in tool development.
    • 11. HathiTrust Research Center Today
      HTRC is dedicated to providing computational access to a comprehensive body of published works for scholarship and education.
      Lightweight Organization
      Executive Committee
      Beth Plale, Indiana
      Scott Poole, Illinois
      Robert H. McDonald, Indiana
      John Unsworth, Illinois
      Advisory Board
      HathiTrust Executive Committee Liaison
      Laine Farley, California Digital Library
    • 12. HathiTrust Research Center Today
      $250K in funding for initial 18 month startup
      Creating Themed Collections for early Use Cases
      Astronomy – Victorian Literature - Influenza
      Ingest and Replication Mechanisms Between HT and HTRC
      SOLR indexes
      Data Capsule integration
      Karma integration
      Integration with SEASR/Meandre SOA services at NCSA
      Alignment with Bamboo Technology Project
      Alignment with international Google Books Research Centers
      Establishing long-term non-consumptive research methodologies
    • 13. HTRC Proposed Technical Architecture
      Courtesy IU Data to Insight Center – Beth Plale/Yiming Sun
    • 14. Current SEASR Integration Demo
      Courtesy IU Data to Insight Center – Felix Terkhorn/Yiming Sun
      Tag Cloud Viewer Data Flow
      1. User enters author name or volume title
      2. Query RIS for author name or volume title
      3. Volume ID
      4. Invoke Tag Cloud service with URL
      5. Use URL to retrieve volume
      6. OCR for volume
      7. Tag cloud returned to user
      Components
      Book Search Interface by Author or Title (JS/PHP auto-completer)
      Sample Collection Bibliography Database (converted from MARC to RIS)
      Public-domain OCR Web Access Servlet (a persistent RESTful Web Service)
      Sample Public Domain Collection (organized as pairtree for demo only)
      SEASR Infrastructure and Meandre Workbench (administrator creates tag cloud viewer in advance through SEASR)
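
Two pieces of the data flow above can be sketched in a few lines of Python: mapping a volume ID to a pairtree directory path, and turning a volume's OCR text into the term counts a tag cloud would display. This is only an illustrative sketch, not the demo's actual code. The function names, the stopword list, and the sample text are assumptions made for the example, and the path mapping follows the character-cleaning rules of the Pairtree specification as commonly described rather than anything stated in the slides.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "and", "of", "a", "an", "to", "in", "is", "it", "that"}

# Characters that are hex-encoded (as ^hh) before the ID is split into pairs.
_HEX_ENCODE = set('"*+,<=>?\\^|')


def pairtree_path(identifier):
    """Map an identifier to a pairtree-style path of two-character segments."""
    cleaned = []
    for byte in identifier.encode("utf-8"):
        ch = chr(byte)
        # Hex-encode non-visible/non-ASCII bytes and the reserved characters.
        if byte < 0x21 or byte > 0x7E or ch in _HEX_ENCODE:
            cleaned.append("^%02x" % byte)
        else:
            cleaned.append(ch)
    # Single-character substitutions: "/" -> "=", ":" -> "+", "." -> ","
    text = "".join(cleaned).translate(str.maketrans("/:.", "=+,"))
    # Split into two-character segments; the last one may be a single character.
    return "/".join(text[i:i + 2] for i in range(0, len(text), 2))


def tag_cloud_terms(ocr_text, top_n=5):
    """Return the top_n (term, count) pairs for a tag cloud."""
    words = re.findall(r"[a-z]+", ocr_text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "The comet and the stars. The comet returned; stars faded, the comet stayed."
    print(pairtree_path("ark:/13960/t12345"))
    print(tag_cloud_terms(sample, top_n=2))
```

A real tag cloud service would read the OCR text from the web access servlet rather than from a string, and would probably use a fuller stopword list. The sketch only shows the shape of the computation.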
    • 15. Non-Consumptive Research Track
      No action or set of actions on the part of HathiTrust Research Center users, either acting alone or in cooperation with other users over the duration of one or multiple sessions, can result in sufficient information being gathered from the HathiTrust collection to reassemble pages from the collection.
      Beth Plale (Indiana University)
      Atul Prakash (University of Michigan)
      Geoffrey Fox (Indiana University)
      Robert H. McDonald (Indiana University)
    • 16. HTRC Managed Data-Intensive Compute Resources
      HathiTrust Digital Library Content
      • Access to HT open content indices
      • Access to HT copyrighted indices
      • Auditable secure mechanisms for legally mandated, MOU-based, and fair-use compliance
      Researcher Driven Applications for Use as Services within the Data Capsule
      • Can HTRC provide a services framework for researcher applications to run within the secure data capsule compute resources?
      Secure Data Capsule
      Researcher Access
      Provision access to copyrighted content for research purposes, giving researchers flexible computing resources in a controlled environment
    • 19. HathiTrust Research Center Events
      HTRC Kickoff Event at Digital Humanities Conference 2011
      Stanford University - June 20, 2011
      Working on models for collaborative research
      AHRC/ESRC/IMLS/JISC/NEH/NSF/NWO/SSHRC Digging into Data Round 2
      Working on early advanced user case studies for the HathiTrust Corpus
    • 20. Support and Acknowledgements
      IU UITS Research Technologies
      National Center for Supercomputing Applications
      IU Data to Insight Center
      Illinois Informatics Institute
      Lilly Endowment, Inc.
      The Alfred P. Sloan Foundation
    • 21. For More on HathiTrust Research Center
      See –
      Follow us @hathitresearch on Twitter
      Robert H. McDonald
      @mcdonald on Twitter