Data 2012 -- Presentation by Margaret Hedstrom (Jan 2012)

SEAD presentation by Margaret Hedstrom at the DataNet Interop Meeting in Indianapolis on January 26, 2012.

SEAD’s Goals
• Provide data services that address the needs of researchers in sustainability science
• Integrate these services into a generalizable “Active and Social Curation” infrastructure suited to data in the “long tail”
• Develop capabilities to package and migrate the most valuable datasets to a federated repository infrastructure for long-term preservation

Sustainability Science
(Diagram: Sustainability Science linked to Science, Cooperation, Technology, Policy, Economics, and Poverty & Justice)

Data challenges
• Small and derived data sets
• Heterogeneous data
• Multiple sources of data
• Short-lived data with long-term value
• Value of data grows when combined & integrated

SEAD’s Strategy
• Leverage social media for discovery of data, interest, and expertise
• Move data curation upstream in the data life cycle
• Involve domain scientists in setting priorities for evolution of data and services
• Take advantage of existing infrastructures (Institutional Repositories, ICPSR) for long-term preservation

Active Curation Model
(Diagram elements: Active Curation, Social Media, Workflows, Data Review, Rating, Commenting, Metadata)

SEAD: Leveraging Existing Resources
• Cyberinfrastructure
– IU Data Capacitor/HPC Capabilities
– UIUC/NCSA HPC Capabilities
– Rensselaer CCNI Capabilities
• Repositories
– UM Deep Blue
– IU ScholarWorks
– ICPSR Repository
– UIUC IDEALS

SEAD 18-Month Prototype Targets for Cyberinfrastructure
• Domain Engagement
– Requirements derived from researchers
– Use Cases
• Active and Social Content Curation
– Pilot Active Content Repository, VIVO deployments
– Exemplar services for Data Ingest, Discovery, Reuse, Curation
• CI for Long-term Access
– Data model, protocol design/development
– Pilot Federated Repository infrastructure

SEAD TEAM
University of Michigan: Margaret Hedstrom (UM PI), Ann Zimmerman (Co-PI and Project Manager), George Alter, Bryan Beecher, Charles Severance, Karen Woollams, Jude Yew.
Indiana University: Beth Plale (IU PI), Katy Borner, Robert H. McDonald, Kavitha Chandrasekar, Robert Ping, Stacy Kowalczyk, Robert Light.
University of Illinois: Praveen Kumar (UIUC PI), Rob Kooper, Luigi Marini, Terry McLaren, Zaman Aktaruzzaman.
Rensselaer Polytechnic Institute: Jim Myers (RPI PI), Ram Prasanna Govind Krishnan, Lindsay Todd, Adam Wilson.

Acknowledgments
SEAD is funded by the National Science Foundation under cooperative agreement #OCI0940824

http://sead-data.net

