This is the presentation on the HTRC given at the Indiana University booth at Supercomputing 2014 by Beth Plale (Co-Director, HTRC) and Robert McDonald (HTRC Executive Management Group).
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
1. The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
pti.iu.edu/sc14
@hathitrust #SC14
Beth Plale | @bplale
Director Data to Insight Center | Indiana University
Robert H. McDonald | @mcdonald
Deputy Director Data to Insight Center | Indiana University
2. Outline
• What is the HTRC?
• Non-Consumptive Research Paradigm
• Current Architecture
• Future Architecture
• Advanced Collaborative Support (RFP)
• HTRC Science on a Sphere
• HTRC @ Events
3. HathiTrust Digital Library
• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.
• http://hathitrust.org
– IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
4. HathiTrust Research Center Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis
– Develop cutting-edge software tools for processing, analyzing text
– Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
• Established: July 2011
• Collaborative center: Indiana University & University of Illinois
6. HTRC Current Users
Projected Use 2019 (chart): Digital Humanities (60), Education (60), Informatics (60), Observers (20)
194 existing user accounts
Lots of user accounts; good starting point.
Improve:
• Increase amount of real work being accomplished as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers to 10%
Project 200 users at any one time of which 90% are doing relevant education/scholarship
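The projection on this slide can be checked with a few lines of arithmetic. The sketch below only restates the chart's numbers; the variable names are illustrative and not part of the original deck.

```python
# Projected concurrent users by category, as read off the slide's chart.
projected = {
    "Digital Humanities": 60,
    "Education": 60,
    "Informatics": 60,
    "Observers": 20,
}

total = sum(projected.values())
observer_share = projected["Observers"] / total
active_share = 1 - observer_share

print(total)                    # 200
print(f"{observer_share:.0%}")  # 10%
print(f"{active_share:.0%}")    # 90%
```

The four categories sum to the 200 projected users, and the 20 observers are the 10% target, leaving 90% doing education or scholarship.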
7. Non-Consumptive Research Paradigm
• No action or set of actions on the part of users, either acting alone or in cooperation with other users, over the duration of one or multiple sessions, can result in sufficient information being gathered from the collection of copyrighted works to reassemble pages from the collection.
• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
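One way to picture the two properties in this definition (no collusion, no accumulation over time) is a single exposure ledger shared by all users and sessions. The sketch below is a toy illustration only: the class `ExposureLedger`, the `max_fraction` threshold, and the page identifiers are assumptions made for this example, and the slides do not describe how HTRC actually enforces the policy.

```python
from collections import defaultdict


class ExposureLedger:
    """Toy ledger of which token positions of each page have been released.

    Hypothetical illustration: exposure is tracked per page across ALL users
    and sessions, so colluding users or repeated sessions cannot accumulate
    enough of a page to reassemble it.
    """

    def __init__(self, max_fraction: float = 0.2):
        # max_fraction is an assumed policy knob, not an HTRC setting.
        self.max_fraction = max_fraction
        self._released = defaultdict(set)  # page_id -> released positions

    def request(self, user: str, page_id: str, positions: set[int],
                page_length: int) -> bool:
        """Grant the release only if the page's global exposure stays low."""
        # `user` is accepted for auditing but deliberately ignored in the
        # decision: the limit is shared by everyone.
        combined = self._released[page_id] | positions
        if len(combined) / page_length > self.max_fraction:
            return False
        self._released[page_id] = combined
        return True


ledger = ExposureLedger(max_fraction=0.2)
print(ledger.request("alice", "vol1:p7", set(range(0, 15)), 100))   # True
print(ledger.request("bob", "vol1:p7", set(range(15, 30)), 100))    # False
print(ledger.request("alice", "vol1:p7", set(range(10, 20)), 100))  # True
```

The second request is refused even though it comes from a different user, because the limit is keyed on the page rather than on the requester. That is the design choice the definition implies.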
8. HTRC
[Diagram: a request enters a complexity-hiding interface that sits in front of "all the complexity"; results come back as spatial plots, statistical plots, and tabular info.]
10. HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting-edge research.
– Leverage data storage and computational infrastructure at Indiana & Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
11. HTRC Organization 2014-18
• HTRC Executive Mgmt
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
12. HTRC Data Capsule
HTRC Data Capsule@IU Team
• Beth Plale (PI)
• Jiaan Zeng
• Guangchen Ruan
HTRC Data Capsule@Michigan Team
• Atul Prakash (PI)
• Alexander Crowell
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM workshop on Scientific cloud computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to
• Samitha Liyanage
• Milinda Pathirage
• Zong Peng
• Earlence Fernandes
• Ajit Aluri
13. HTRC Data Capsule
[Architecture diagram in three tiers: web front end, web service, backend. Labels: User Authentication, Web UI, Web Services, Hypervisor Scripts, Database, Firewall, Audit, Image Store, Volume Store, and Host-1 … Host-N, each running VM-1 … VM-k.]
16. HTRC Science on a Sphere #SC14
1. Texts published per country
2. HathiTrust Member Institutions
3. HT Google analytics
17. HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over the next four years, 2014-18
• 1st RFP call deadline is Jan 8, 2015, 5:00pm Eastern
– RFP - http://www.hathitrust.org/htrc/acs-rfp
• For more info on the Advanced Collaborative Support please contact: htrc.acs.awards@gmail.com
18. HTRC@Events
• DHCS 2014, Oct 22, 2014, Evanston, IL
• SC14 – IU Booth, Nov 17-19, 2014, New Orleans, LA
• CLIR/CNI Workshop on Expanded Access to Collections, Dec. 7, 2014, Washington, DC
• HTRC UnCamp 2015 – March 30-31, 2015, Ann Arbor, MI
19. Thank You
HTRC IU Team
• Beth Plale (PI)
• Robert H. McDonald
• Miao Chen
• Guangchen Ruan
• Zong Peng
• Milinda Pathirage
• Samitha Liyanage
• Leena Unnikrishnan
• Nicholae Cline
HTRC UIUC Team
• J. Stephen Downie (PI)
• Beth Namachchivaya
• Megan Senseney
• Sayan Bhattacharyya
• Colleen Fallaw
• Loretta Auvil
• Boris Capitanu
• Harriet Green
20. More Information on HTRC
• For details http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC, jdownie@Illinois.edu
– Beth Plale, Co-Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Miao Chen, Asst. Director for Outreach HTRC, miaochen@indiana.edu
21. The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
For more on HTRC: http://www.hathitrust.org/htrc
For these slides go to:
Editor's Notes
HTRC hides the complexity of analytics. In this sense it is like Google search, whose simple interface hides the complexity of searching billions of pages. An HTRC interaction returns things such as the spatial relationships of words (and their frequencies), statistical plots, or tabular information.
Shifting the complexity-hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, drawn from SEASR and from other analysis tool suites including Mahout and MapReduce; 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available); and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity-hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it requires levels of interaction with the computation as it is running.
1.) Texts published per country
The data come from the Gender metadata work, used because it includes volume authors and country of publication. The set totals 60K volumes; some records are missing the country field.
2.) HathiTrust Member institutions
Maps the geolocations of the UnCamp 2013 participants.
The text band announces UnCamp '15 and 13M books available for non-consumptive use soon.
3.) HT Google analytics
Shows HT webpage use over time, aggregated by quarter.
A drop around summer 2013 could possibly be caused by the summer break.
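Two of the sphere visualizations described in these notes rest on simple aggregations: counting volumes per country while skipping records with no country, and summing page views by quarter. The sketch below shows both under stated assumptions. The record layout, field names, and sample values are invented for illustration and are not taken from the HTRC data.

```python
from collections import Counter
from datetime import date

# Hypothetical metadata records; the real set has about 60K volumes.
records = [
    {"volume_id": "v1", "country": "United States"},
    {"volume_id": "v2", "country": "Germany"},
    {"volume_id": "v3", "country": ""},   # country left blank
    {"volume_id": "v4", "country": "United States"},
    {"volume_id": "v5"},                   # country field absent
]


def texts_per_country(rows):
    """Count volumes per country; return (counts, number_skipped)."""
    counts = Counter()
    skipped = 0
    for row in rows:
        country = (row.get("country") or "").strip()
        if country:
            counts[country] += 1
        else:
            skipped += 1
    return counts, skipped


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. '2013-Q3'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def views_per_quarter(daily):
    """Sum (date, views) pairs into per-quarter totals, in time order."""
    totals = Counter()
    for day, views in daily:
        totals[quarter_of(day)] += views
    return dict(sorted(totals.items()))


# Hypothetical daily page-view samples.
daily_views = [
    (date(2013, 5, 1), 120),
    (date(2013, 6, 15), 100),
    (date(2013, 7, 4), 40),
    (date(2013, 8, 20), 50),
    (date(2013, 10, 2), 130),
]

counts, skipped = texts_per_country(records)
print(counts.most_common())  # [('United States', 2), ('Germany', 1)]
print(skipped)               # 2
print(views_per_quarter(daily_views))
# {'2013-Q2': 220, '2013-Q3': 90, '2013-Q4': 130}
```

Reporting the number of skipped records alongside the counts keeps the missing country fields visible instead of silently dropping them.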