Presentation to the UM Library Emergent Research Series

Slides for Margaret Hedstrom's presentation on June 23, 2014.

Usage Rights

© All Rights Reserved

  • MH: Revise slide and clarify message.
    One might say that everyone is underserved by today’s data preservation and access infrastructure (DPAI), but interdisciplinary researchers face particular barriers and requirements:
      – Multiple sources: extracts from reference databases, observations, experimental results and model outputs, images, derived data products
      – Multiple file types, data types, data structures, data models
      – Multiple resolutions (spatial, temporal, granularity)
      – Multiple metadata standards and ontologies
      – Local standards and data practices developed on the fly
      – Data are vulnerable to interruptions in organizational arrangements: graduate students finish PhDs and move on, project funding lapses, lab or center funding sunsets
    One might also say that the long tail underutilizes existing DPAI (which is true), but for good reasons.
  • Build from Praveen’s life cycles.
    Mention some of the steps that occur in curation.
    Mention time lag.
  • Support interdisciplinary and data-driven research by enabling access to:
      – Publications
      – Data
      – People (expertise / potential collaborators)
    in novel, innovative ways that continuously anticipate and adapt to changes in technologies and in user needs and expectations.
    Specifically:
      – Accelerate data discovery
      – Support new types of analyses with heterogeneous data
      – Reduce overall costs of curation [rather than shift costs between researchers and repositories]
      – Accelerate the movement of data from researchers into preservation, discovery and access environments
      – Increase the quantity, improve the quality, and enhance the utility of scientific data for reuse
  • Start 2:01 Stop 4:00 ACR
    Start 4:00 Stop 4:53 VIVO
    Start 9:57 Stop 11:21
    11:55 – end VIVO
  • Might move this section on Ingest workflow?
  • Reporting (Extra win for SEAD) and responsive to the community
  •  - less emphasis on features and functionality
     - remove "context" slides (done)
     - matchmaker workflow slide: simplify, and make the multiple dimensions of the matchmaker's decision-making process clearer
     - record a demo of how ingest and matchmaking work: deposit to IDEALS, make the decision points clear through the example of Praveen, and show the embargo in IDEALS visually
     - move DataNet slide to other decks (done)
  • VA - ACR interactions - user or science side of the story
    A researcher at U of Illinois led the data collection effort related to the Lower Mississippi flood.
    The data have been collected and uploaded to ACR. In ACR the data have been organized into collections, processed for easy previewing, and described (tagged and annotated). One subcollection has been marked as “Ready to publish”, i.e., ready for long-term preservation. Praveen wants to preserve the subcollection, but keep it private for 5 years.
    SEAD Virtual Archive queries ACR and finds this subcollection. It packages the subcollection using its BagIt protocol and invokes its matchmaker algorithm to decide where to ingest the subcollection. The Matchmaker queries VIVO and finds that Praveen is from the University of Illinois. VA automatically creates a collection in IDEALS and marks it “embargoed” for 5 years.
    After the collection is ingested, it appears in Virtual Archive and in IDEALS. In Virtual Archive this collection can be found by searching by author, location, keywords and repository. In the future, it will also provide search by data types (e.g., images, geo, video, etc.), instruments (e.g., Lidar, Aviris) and methods (e.g., data models, experiments, etc.)
  • - VA - IR communication - bring out details about solutions for large files (SDA), and explain why the numbers of files are so different for SDA, ScholarWorks and IDEALS (slide 10)
  • After Lunch
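
The matchmaking step described in the notes above (look up the depositor's affiliation, pick the matching institutional repository, apply the requested embargo) can be sketched roughly as follows. This is only an illustration, not SEAD's actual matchmaker: the function and class names, the affiliation table, and the plain dictionary standing in for a VIVO profile lookup are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical affiliation-to-repository table. The repository names come
# from the slides; the mapping itself is an assumption for illustration.
REPOSITORY_BY_AFFILIATION: Dict[str, str] = {
    "University of Illinois": "UIUC IDEALS",
    "Indiana University": "IU ScholarWorks",
}


@dataclass(frozen=True)
class DepositPlan:
    repository: str
    embargo_until: Optional[date]  # None means no embargo


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def match_repository(
    researcher: str,
    profiles: Dict[str, str],  # stand-in for a VIVO lookup: name -> affiliation
    ready_on: date,
    embargo_years: int = 0,
) -> DepositPlan:
    """Choose a repository from the researcher's affiliation and set the embargo."""
    affiliation = profiles[researcher]
    repository = REPOSITORY_BY_AFFILIATION[affiliation]
    embargo_until = add_years(ready_on, embargo_years) if embargo_years > 0 else None
    return DepositPlan(repository, embargo_until)


plan = match_repository(
    "Praveen",
    {"Praveen": "University of Illinois"},
    ready_on=date(2014, 6, 23),
    embargo_years=5,
)
print(plan)
```

A real matchmaker would presumably weigh more than affiliation (the slides mention a large-dataset decision and repository agreements), so the single-key lookup here is a deliberate simplification.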

Presentation Transcript

  • SEAD: Sustainable Environment through Actionable Data
    Margaret Hedstrom
    Professor of Information; Faculty Associate, Institute for Social Research (ICPSR); PI, SEAD
    June 23, 2014
  • Overview
      • What is SEAD?
      • Vision and Rationale
      • Target Audience and User Communities
      • Current Status
      • SEAD, Universities, and Libraries
      • Some Lessons Learned (so far)
      • Plans and Future Engagement
  • What is SEAD?
      • A Cooperative Agreement funded by NSF to develop sustainable cyberinfrastructure for preservation and access to scientific data ($8 million/5 years)
      • A partnership between the universities of Michigan, Indiana and Illinois
      • An emerging set of services for data management, sharing, curation, discovery and preservation for researchers in the “long tail”
      • A case study of data needs in sustainability science
  • SEAD Vision and Rationale
      • Small teams, researchers with short-term projects, and individual scientists (the long tail) are underserved by today’s data preservation and access infrastructure
      • These communities will take advantage of evolving data preservation and access infrastructure if:
          – it supports science objectives and enables new kinds of science
          – it is easy to use
          – collaborators and peers are also using it
      • Sustainability science is a good test case
  • Data Preservation and Access Today
    Diagram labels: Researcher(s) Create and Analyze Data; Researchers Publish Results; ?; Researchers Deposit Data; Libraries Acquire Publications; Repositories Curate Data; Researchers Search for Publications; Researchers Integrate, Create New Data; Researchers Search for Data
  • Data Preservation and Access Today
    Diagram labels: Researcher(s) Create and Analyze Data; Researchers Publish Results; ?; Researchers Deposit Data; Libraries Acquire Publications; Repositories Curate Data; Researchers Search for Publications; Researchers Integrate, Create New Data, and Analyze Data; Researchers Search for Data
  • SEAD Vision
    Diagram labels: Research Question; SEARCH for People, Publications, Data; Collaboration Environment; Discovery and Access Environments; Combine, Integrate, Analyze; Preservation Environments; Share; Improve; Curate Data; Upload/Download Data; SEAD ACR; SEAD Virtual Archive; SEAD Social Network
  • Target Audience / User Communities: Sustainability Scientists
      • Focused on problems that require data, methods, tools, and expertise from multiple disciplines
      • Requires many different types of data about physical, natural, and social phenomena in order to understand interactions between natural and human systems
      • Uses a combination of observational (field) data, experimental data, simulations, and models
      • Conducts research in small to medium-sized labs or centers under the direction of a single PI or a Center Director
  • Target Audience / User Communities: the “Long Tail” of Scientific Research
      • Data discovery is via targeted foraging and word-of-mouth
      • Almost all data are stored locally
      • Minimal local IT support
      • Metadata standards and ontologies, where they do exist, are based on disciplinary norms or local practices
      • Data formats and metadata standards are often controlled by multiple independent third parties (e.g., instrument and application providers)
      • Data are vulnerable to interruptions in organizational arrangements (graduate students finish PhDs and move on; lab or center funding sunsets)
      • No single data set is likely to have great value standing alone, but when aggregated, combined and integrated, data become valuable resources for discovery and innovation
  • Overview
      • Project Start: 10/01/11
      • User Requirements Report: 5/12
      • NCED Repository Ingest: 8/12
      • Prototype Review: 4/22/13
      • SEAD 1.0 Released: 10/13
      • DataONE Member Node: 11/13
      • End User Workshop: 4/11/14
      • 10th User Group: 5/11/14
      • 36-Month Review: 10/14/14
      • Renewal (?) for Years 6-10
  • Summary of Current Status
      • Working Platform
          – SEAD Active Content Repository (ACR)
              • Collaboration / File Sharing Space for Research Projects
              • Staging Area for Data Prior to Publishing or Archiving
          – SEAD Virtual Archive
              • Capability to push data from ACR and/or local research environments to preservation and discovery services (Institutional Repositories/DataONE)
          – SEAD Research Network
              • Researcher-initiated profiles with harvesting of citations, linkage of data-people-publications, reporting
  • SEAD Prototype
  • SEAD, Universities, and Libraries
      • From the researcher’s perspective
          – SEAD is a project workspace that enables data sharing, commenting, secure storage, extraction of metadata, and active/social curation
      • From the university research infrastructure perspective
          – SEAD is a staging area for data curation prior to publication, submission, and preservation
  • Data Set Publishing Workflow
      • Data content used within ACR
      • Researcher Profile Established in VIVO
      • NCED Data Set Ingested to ACR
      • Data Set ready to publish
      • NCED Data Set Ingested to VA
      • DataCite minted DOI attached to finalized Data Set
      • NCED Data Set Deposited with IR
      • DOI Resolution to designated IR
      • NCED Data Set Published to VIVO
  • Data Citation Example - Person
  • Data Citation Example - Dataset
    Screenshot labels: DOI; Authors; Subject areas; Abstract; Geographic focus; Rights information
  • SEAD: Explore Sustainability Research
    Screenshot labels: PEOPLE; ORGANIZATIONS; RESEARCH (DATA + PUBLICATIONS)
  • NCED Publications in VIVO
  • SEAD Virtual Archive
      • Purpose: Long-term preservation and discovery
          – Thin virtualization layer on top of multiple university Institutional Repositories (IRs)
          – Enhances IRs by being sustainability science-aware
      • Team: IU Libraries, UIUC Libraries, and Data To Insight Center at IU
      • Starting point: Data Conservancy code (Johns Hopkins U.)
          – Extended for sustainability science long tail use cases
  • Making Data Sustainable: Use Case
    Diagram labels: Active Curation Repository (ACR); SEAD Virtual Archive; IU ScholarWorks; UIUC IDEALS; Packaged object; Preserve data; Keep private for 5 years; Index data, metadata and relationships
      • Collected data about Lower Mississippi flood
      • Stored in Active Repository
      • Organized as a collection
      • Marked “Ready for publication”
      • Collections visible to team only for 5 years
      • Deposited to repository based on dataset creator affiliation
      • Find by author, location, keywords or repository
  • Ingest Workflow into SEAD VA
    Diagram labels: Preview Data; Upload Data to VA; Run Virus Checking; File Characterization; Mint DOI; Deposit to IR (& cloud); Update DOI target; Index Metadata; Index Scientific Metadata; Large Dataset Decision; Version Data; IR Matchmaker; Index Scientific Metadata; Accept Repository Agreement
    Link to live demo: http://seadva.d2i.indiana.edu:8181/sead-access/#login
  • Successful automatic ingest into UIUC IDEALS repository
  • Communication with IRs
    Datasets deposited into IU SDA, IU ScholarWorks and UIUC IDEALS
  • Some Lessons Learned
      • Some researchers and projects in the “long tail” have sophisticated demands for active data services
      • Supporting analysis of data in SEAD adds complexity and cost
      • Users want some degree of customization of bare-bones file storage and active project space
      • A big gap remains between data producers and the campus/library/archive infrastructures for long-term access and preservation
  • SEAD Priorities and Future Plans
      • Make SEAD more stable and more usable
      • Attract a larger, broader, and more diverse user community
          – Network effects in the long tail
          – Self service
      • Expand repository options
      • Resolve Governance and Sustainability
  • More info
      • www.sead-data.net
      • Give or send email to myersjd@umich.edu for access to the SEAD Demo site
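
The use case slide shows a “Packaged object” moving from the ACR to the Virtual Archive, and the speaker notes say this packaging uses BagIt. As a rough illustration of what a BagIt-style package looks like on disk, the sketch below writes a payload directory, a `bagit.txt` declaration, and a SHA-256 manifest, then re-checks the checksums. It is a minimal sketch under assumptions, not SEAD's packaging code: the function names are hypothetical, the declared version (1.0) may differ from what SEAD used, and only flat file names are handled.

```python
import hashlib
from pathlib import Path
from typing import Dict


def make_bag(bag_dir: Path, files: Dict[str, bytes]) -> Path:
    """Write a minimal BagIt-style bag: data/ payload, bagit.txt, SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)
    manifest_lines = []
    for name, content in sorted(files.items()):
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    # Bag declaration; the version written here is an assumption for the sketch.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest entry."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True
```

A receiving repository can call `verify_bag` after transfer to confirm that no payload file was altered or truncated in transit, which is the main reason to ship a manifest alongside the data.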