ESI Supplemental 1   E-research Support Slides


E-Research Support at
Johns Hopkins University & Purdue University



Supplemental Webinar
Wednesday, October 17, 2012
Presented by Sayeed Choudhury & James Mullins


Usage Rights

CC Attribution-ShareAlike License

  • KRISTI: Welcome
  • First we have the goal of making sure that PURR can address the needs of data mgmt planning, and we have populated the site with a variety of tools and resources to help researchers do that—we emphasize this is necessary for all projects, not just NSF ones
  • One of PURR's strengths comes from the HUBzero® platform it is built on: the ability to engage in collaborative work. Purdue researchers can create projects, use them to stage a grant, invite others to participate, create or add files, etc.
  • A key part of PURR is the incentive to submit projects for funding (as universities kind of like getting funding), and we are automating a process whereby once a grant is awarded, researchers automatically acquire more space—the chart may be hard to read but is on the website if you’re curious
  • Since a goal is to make research more transparent and data more available, PURR allows for what we are currently calling “publishing” and archiving data sets and collections. Notice that data has to be submitted and reviewed first; we are building a service in which librarians can approve submissions, much as we do for our IR, after checking that metadata has been added to ensure adequate description for discovery and preservation. We believe that as data sets become collections, libraries will need to apply collection development principles to manage them.
  • We continue to work on the longevity of support for these collections… currently we see them supported on the discovery platform for 10 years, at which time a selection or de-selection decision can be made, which is why we believe librarians must be involved early on, so they can make collection mgmt decisions down the road. We have just received funding to work on this and the preservation environment that will support long term preservation

ESI Supplemental 1 E-research Support Slides Presentation Transcript

  • DuraSpace/ARL/DLF E-Science Institute. E-Research Support at Johns Hopkins University & Purdue University. Supplemental Webinar, Wednesday, October 17, 2012, 1:00-2:30 pm EDT
  • E-Research Support at Johns Hopkins University. Presented by Sayeed Choudhury, Johns Hopkins University, Sheridan Libraries, Associate Dean for Library Digital Programs & Director, Hodson Digital Research & Curation Center
  • Data Conservancy
      • Data Conservancy (DC) is a community that develops solutions for data preservation and sharing to promote cross-disciplinary re-use.
      • DC Service Instance: data centric hardware, software, components, and APIs within an organizational context; installed at Johns Hopkins University and National Snow and Ice Data Center
  • Data Sharing Attributes
      • Feature Extraction Framework that atomizes data into constituent parts for indexing, metadata extraction, etc.
      • Discipline agnostic data model (inspired by PLANETS project)
      • Provenance and Lineage service
      • Spatial, temporal and (soon) taxonomic query capabilities
      • Sustainability through diverse funding from Johns Hopkins University, direct charges to NSF grants, other grants and community development
  • Data Management Layers (table: Layers, Characteristics, Implication for PI, Implication relative to NSF)
      • Curation
          – Characteristics: Adding value throughout life-cycle
          – Implication for PI: Feature Extraction; New query capabilities; Cross-disciplinary
          – Implication relative to NSF: Competitive advantage; New opportunities
      • Preservation
          – Characteristics: Ensuring that data can be fully used and interpreted
          – Implication for PI: Ability to use own data in the future (e.g. 5 yrs); Data sharing
          – Implication relative to NSF: Satisfies NSF needs across directorates
      • Archiving
          – Characteristics: Data protection including fixity, identifiers
          – Implication for PI: Provides identifiers for sharing, references, etc.
          – Implication relative to NSF: Could satisfy most NSF requirements
      • Storage
          – Characteristics: Bits on disk, tape, cloud, etc. Backup and restore
          – Implication for PI: Responsible for: Restore; Sharing; Staffing
          – Implication relative to NSF: Could be enough for now but not near-term future
  • Establishing the JHU DMS
      • May 2010: NSF announces DMP expectations
      • Services incubated and scoped summer/fall 2010
          – Build on Data Conservancy expertise
      • Proposed in January and launched in July 2011
          – Consultative data management planning services to support NSF proposals
          – Post award data management services
      • Assessment of service in March 2012
  • Background work to scope services
      • Review of data management plan best practices and development of questionnaire
      • Piloted data management consultations as cases
      • Short data survey with over 70 JHU researchers
      • Analysis of JHU NSF proposal and award activity
      • Business school capstone project on storage options and costs
      • Review of past data archiving projects and work
  • Proposing data management services
      • Services scoped to support anticipated NSF requirements and to reflect system capabilities
          – Defined time limits, volume of data deposited per project, unencumbered data only for now
      • Prepared budget for services
          – Five year timeframe for costs
          – All costs included: staffing, hardware, overhead, etc.
          – Cost assumptions included: total data archived, complexity of data prep for ingest
  • Developing financial model
    Support secured and financial model established
      • Data management planning for NSF proposals
          – Service directly funded by schools
          – Each school pays percentage according to 3 year average of total NSF proposals submitted
      • Post award data management
          – Fee based service billed through a service center
          – First year fee a percent of total direct costs on grant
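The school-funding rule on this slide (each school pays a percentage according to its 3 year average of NSF proposals submitted) can be illustrated with a small sketch. The slide gives no figures, so the school names, proposal counts, service budget, and the function name `school_shares` below are all hypothetical, and a simple proportional split is assumed.

```python
def school_shares(proposals_by_school, total_cost):
    """Split total_cost across schools in proportion to each school's
    average number of NSF proposals over the years supplied.

    Assumption: the share is strictly proportional to the average;
    the slide does not spell out the exact formula.
    """
    averages = {
        school: sum(counts) / len(counts)
        for school, counts in proposals_by_school.items()
    }
    grand_total = sum(averages.values())
    return {
        school: total_cost * avg / grand_total
        for school, avg in averages.items()
    }


# Hypothetical data: proposals submitted in each of the last three years.
example = {
    "School A": [30, 40, 50],  # three-year average: 40
    "School B": [10, 10, 10],  # three-year average: 10
}
shares = school_shares(example, total_cost=100_000)
```

With these made-up numbers School A carries four fifths of the cost and School B one fifth, since their averages stand in a 40:10 ratio.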
  • JHU Data Management Services team
    Dedicated group (that collaborates with DC and Digital Research and Curation Center)
      • Two data management consultants
      • Senior technical consultant (Part-time)
      • Software developer
      • System administrator (to be hired)
      • Interim manager (Part-time)
  • Service marketing
      • Reach out through all stakeholders
          – Announcements through Deans
          – Work with research projects administration
          – Outreach to department administrators
          – Briefings with library colleagues/departments
          – Presentations to researchers, graduate students
      • More to do… and then repeat!
  • Observations
      • Role of Choudhury as NSF PI within JHU
      • Sheridan Libraries R&D and experience with scientific data
      • Already embedded within research enterprise
      • Specifics will vary by institution but JHU approach can be generalized…
      • …But each institution should consider appropriate role(s) or approach
  • Resources
      • http://dataconservancy.org
      • Alpha release of software - https://dataconservancy.org/software/downloads/
      • http://dmp.data.jhu.edu
      • Reviewer guidelines for data management plans - http://dmp.data.jhu.edu/assistance/grant-reviewers-worksheet-for-data-management-plans/
  • Acknowledgements
      • NSF Award OCI-0830976
      • Sheridan Libraries financial support
      • Johns Hopkins University financial support
      • Data Conservancy colleagues for their exceptional work and patience
  • Questions
  • An overview of Sustaining e-Science Collaboration in an Academic Research Library – the Purdue Experience. James L. Mullins, PhD, Dean of Libraries & Esther Ellis Norton Professor. October 17th, 2012. Libraries
  • What is meant when we say the library has a role in sustaining e-science?
      • Application of library and archival science principles and theory to data management.
      • Collaboration of Libraries with faculty, information technology, research office, and sponsored programs to develop a process and repository to manage and preserve data.
  • I. Background and Development of the Libraries collaboration with e-Science at Purdue – on local and national levels.
      • Local – Conversations with researchers, research office, etc.
      • Local – Principles of library and archival sciences.
      • Local – Restructuring of Libraries.
      • National – NSF Data Management dialogue.
      • Local – Creation of Data Research Scientist.
      • Local – Librarians not able “to service” funded research.
      • Local – Librarians with professorial rank and tenure (start-up package of $40,000+).
      • Local – Distributed Data Curation Center (D2C2)
  • I. Background and Development of the Libraries involvement in e-Science at Purdue – on local and national levels (cont’d).
      • National – IMLS grant to develop Data Curation Profiles.
      • Local – Partnerships of subject liaison librarians and faculty.
      • Local – Re-definition of librarian roles.
      • Local – Collaboration/advising on data management librarian role.
      • National – IMLS grant to develop Data Information Literacy.
      • National – Develop/teach ICPSR data science curriculum.
      • National – IMLS grant to develop Databib-DMPTool collaboration.
      • International – DataCite-Databib collaboration.
      • Local – Society of American Archivists (SAA) workshop.
  • Sustainability – applied expertise of librarians.
      • Must be integrated into role of librarians.
      • New positions must be created (data curation specialists, etc).
      • Priority for new positions must be established with a total view of strategic growth areas (at Purdue data management and information literacy).
      • Salaries partially funded through sponsored research, making funds available for other positions and graduate research assistants.
      • Cluster hires with colleges and schools.
      • Critical role of librarians in research garners additional support from University Administration.
  • Research Collaborations 2012/2013
      • Big Data and Complex System Analytics to Enhance Society’s Resilience w/ Agronomy
      • Human Rights Texts for Digital Research: Archiving and Analyzing Amnesty International’s Historic Urgent Action Bulletins w/ Political Science
      • A Cross-Disciplinary Design Thinking Research Symposium to Catalyze Groundbreaking Research and Practice w/ Engineering Education
      • Establishing a Materials Center for Agriculture, Food and Health w/ Food Science
  • Purdue University Research Repository – PURR
    II. Building a Data Curation Program and Repository
      • Not done independently of librarians’ knowledge & support structure within Libraries
      • In 2006, collaboration built around Purdue’s HUBzero platform in answer to NSF DataNet RFP.
      • 2007 – 2010: Provost informed of impending data management mandate.
      • May 2010: NSF announcement.
      • Summer 2010: Provost and VPR appoint taskforce of faculty researchers, co-chaired by CIO and dean of libraries, to develop “template.” Report written August 2010.
  • PURR
    II. Building a Data Curation Program and Repository (cont’d)
      • 2010
          – Commitment to develop repository jointly by ITaP, OVPR, and Libraries - $90K
          – Working Group created to plan and develop Purdue University Research Repository (PURR).
          – Workshops sponsored by OVPR, conducted by Libraries and ITaP
          – Libraries create resources to support faculty in developing DMPs.
  • PURR
    II. Building a Data Curation Program and Repository (cont’d)
      • 2011/2013
          – Libraries Budget request indicated need for positions to support sustainable data curation.
          – 479 grant proposals to date include PURR in data management plans
          – 36 grants (so far) awarded with PURR as DMP.
          – TRAC certification underway – ISO 16363.
  • PURR
    II. Building a Data Curation Program and Repository (cont’d)
      • What is provided by PURR? Any Purdue faculty, graduate student, or staff can:
          – Create a trial project of 500 MB for three years.
          – External funding project receives 100GB for ten years.
          – Invite collaborators to join from other institutions.
          – Datasets can be published w/o grant: 50MB; with, 10GB.
          – Each project receives to-do lists to manage projects;
          – Wiki area for notes;
          – Micro-blogging interface (similar to Facebook) for discussion among team.
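The allocation tiers on this slide can be read as a small lookup. The sketch below encodes only the figures stated on the slide (500 MB for three years for a trial project, 100 GB for ten years with external funding, and publication limits of 50 MB without a grant or 10 GB with one). The class and function names are hypothetical, decimal units (1 GB = 1000 MB) are assumed, and treating the publication figure as an inclusive upper bound is also an assumption; this is not PURR's actual implementation.

```python
from dataclasses import dataclass

MB_PER_GB = 1000  # assumption: decimal units; the slide does not specify


@dataclass(frozen=True)
class Allocation:
    project_storage_mb: int  # storage granted to the project
    project_years: int       # how long the project space is kept
    publish_limit_mb: int    # maximum size of a published dataset


# Figures taken from the slide.
TRIAL = Allocation(project_storage_mb=500, project_years=3, publish_limit_mb=50)
FUNDED = Allocation(
    project_storage_mb=100 * MB_PER_GB,
    project_years=10,
    publish_limit_mb=10 * MB_PER_GB,
)


def allocation_for(has_external_funding: bool) -> Allocation:
    """Return the tier that applies to a project."""
    return FUNDED if has_external_funding else TRIAL


def can_publish(dataset_mb: float, has_external_funding: bool) -> bool:
    """Check a dataset against the tier's publication limit (assumed inclusive)."""
    return dataset_mb <= allocation_for(has_external_funding).publish_limit_mb
```

For example, under these assumptions a 60 MB dataset could be published from a funded project but not from a trial project.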
  • PURR
    II. Building a Data Curation Program and Repository (cont’d)
      • PURR Digital Preservation Policy approved April, 2012: http://www.lib.purdue.edu/spcol/content/PURRdigitalpreservationpolicy.pdf
      • Working Group report on three year funding requirements
          – One time - $1.2 M – received January 2012.
          – Ongoing costs - $194,000 / year.
      • Ongoing costs: F&A? Charge Back?
  • PURR (diagram): Data Management, Discovery, Preservation
  • OVERVIEW OF PURR (diagram; labels as extracted): Research Collaboration, Data Management, Publishing & Archiving; Researchers; Libraries Data Services; OVPR/SPS; (Reference & Policy, Submission, and Consulting) & Grant Compliance; Preservation; ITaP Infrastructure (HUBzero™)
  • OVERVIEW OF PURR
      • Collaboration of ITaP, Libraries, and OVPR
      • Based on HUBzero, provides a hub for Purdue researchers and their collaborators to use, manage, and share their data
      • Comprehensive resource for supporting research data management (Knowledge Base, tutorials, example plans, boilerplate text, ask questions, etc.)
      • Approximately 1/3 of NSF proposals submitted from Purdue last year included PURR as a component of their data management plans
      • Purdue researchers are not required to use PURR. Other options may be appropriate such as center facilities or disciplinary repositories.
  • WHY USE PURR?
    PURR can be used for…
      • Managing Data
      • Publishing Data
      • Preserving Data and
      • Research Collaboration
  • QUICK START: http://research.hub.purdue.edu
    What can be done right now:
      – Create an account
      – Create a project
          • a default allocation of storage is free, and you can purchase more if you need it
      – Invite collaborators
      – Upload data to project
      – Publish and/or archive datasets with Digital Object Identifiers (DOI)
      – Search, browse, and cite published datasets
  • PURR FUNCTIONS: overview model of PURR functions (the same diagram appears on each of the five step slides)
      • Stages, in order: Data mgmt planning resources; Creating projects, collaborating; IF grant awarded, more space; Data submitted for publishing/archiving; Discovery commitment ends, long term preservation decision
      • Stage labels: PLAN; DEVELOP PROJECT; EXPAND; PUBLISH DATA; DISSEMINATE DATA
      • Data life cycle: Initiate (labelled Create on the Step 1 slide); Research, data generation/collection; Uncurated data; Curation; Discovery & Dissemination; Long term preservation
  • PURR FUNCTIONS, STEP 1: Researchers are guided to PURR for help with data mgmt plans by Pre-Awards, workshops and promotion, and by word-of-mouth
  • PURR FUNCTIONS, STEP 2: Researchers can create projects at any time, invite others to join… the goal is to help facilitate research development
  • PURR FUNCTIONS, STEP 3: Once a grant is awarded, researchers get an increase in space allocation and length of time for project and data
  • PURR FUNCTIONS, STEP 4: To make data sets publicly discoverable and available, there is a submission and “publishing” process
  • PURR FUNCTIONS, STEP 5: PURR policy allows for a specified time for discovery, and then decisions are made regarding long-term preservation
  • WHERE CAN I GO FOR HELP?
      • Overall help: Librarians (link to subject librarians directory or name)
      • Data Services: http://www.lib.purdue.edu/research/dataservices
        Librarians consult on best practices for data formats, metadata, sharing, reuse, archiving, review plans, write letters of support, and collaborate as partners/co-PIs on proposals.
      • Grant preparation: Sponsored Programs Services (SPS)
      • PURR Website: http://research.hub.purdue.edu
  • Retrieval and Citation
      • Establish easier access to scientific research data on the Internet.
      • Increase acceptance of research data as legitimate, citable contributions to the scientific record.
      • Support data archiving that will permit results to be verified and re-purposed for future study.
    http://www.datacite.org/
  • Linking of Dataset to Article
    The DOI system offers an easy way to connect the article with the underlying data:
      • The dataset: Kuhlmann, H et al. (2009): Age models, iron intensity, magnetic susceptibility records and dry bulk density of sediment cores from around the Canary Islands. doi:10.1594/PANGAEA.727522
      • Is supplement to the article: Kuhlmann, Holger; Freudenthal, Tim; Helmke, Peer; Meggers, Helge (2004): Reconstruction of paleoceanography off NW Africa during the last 40,000 years: influence of local and regional factors on sediment accumulation. Marine Geology, 207(1-4), 209-224, doi:10.1016/j.margeo.2004.03.017
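As an illustration of how the two DOIs on this slide become followable links, the sketch below turns a `doi:` string into a URL on the public DOI resolver at https://doi.org/. The helper name `doi_url` is hypothetical, and the code only builds the URLs; it does not contact the resolver or check that the DOIs still resolve.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI proxy


def doi_url(doi: str) -> str:
    """Build a resolver URL from a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    # Percent-encode anything unusual but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# The dataset and article DOIs quoted on the slide.
dataset_link = doi_url("doi:10.1594/PANGAEA.727522")
article_link = doi_url("doi:10.1016/j.margeo.2004.03.017")
```

Citing both links side by side is one simple way to express the "is supplement to" relationship shown on the slide.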
  • In the United States three DataCite Members Provide DOIs for datasets: http://datacite.org/DataCiteUS
  • No one/right way to sustain e-science or data management; each institutional environment will be different and require its own unique collaborations or roles.
  • Thank you. Questions: jmullins@purdue.edu