SEAD Prototype: Data Curation and Preservation for Sustainability Science

A poster presented at ESIP July 2013

Presentation Transcript

The SEAD Prototype: Data Curation and Preservation for Sustainability Science

Beth Plale, Robert H. McDonald, Kavitha Chandrasekar, Inna Kouper, Indiana University, {plale, rhmcdona, kavchand, inkouper}@indiana.edu
Margaret Hedstrom, James Myers, University of Michigan, {hedstrom, myersjd}@umich.edu
Praveen Kumar, Rob Kooper, Luigi Marini, University of Illinois at Urbana-Champaign, {kumar1, kooper, lmarini}@illinois.edu

SEAD Vision and Rationale

  • Serve interdisciplinary and data-driven research in sustainability science
  • Enable access to publications, data and people
  • Support new types of analyses with heterogeneous data
  • Reduce overall cost of data curation and preservation
  • Capture metadata to provide immediate value for users, producers and repositories
  • Increase capabilities for research data re-use

Active Curation, Actionable Data (ACR)

  • Active Content Repository: APIs – Web Services; Role-based Access Control; Data/Metadata Management
  • Branded Public Access
  • Active Project Spaces
  • Individual Data Pages
  • Diagram labels: Data pages; Collection pages; Tag – Search – Map; Project Summary; Geo-Web App; Branded Repository; Android – Desktop Apps

Community Exploration, Research Analytics (VIVO)

  • People / Projects / Publications
  • Data Citations
  • Organizations
  • Visualized Networks and Community Dynamics
  • Diagram labels: People; Projects; Publications; Organizations; Data Citations; Visualizations

SEAD Use Cases (focusing on curation)

  • Ingestion of heterogeneous data types (e.g., images, geo-spatial data, and sensor data) and mapping of semantic relationships among the research data collections as well as semantic annotation and tagging.
  • Support of data discovery through interoperable standards and algorithms, social networking and data publishing.
  • Enhancements of existing data through automated scientific metadata extraction and data visualization plugins.
  • Ingestion of new data sets directly via workbench tools.
  • Curation of data via federated deposit into institutional and disciplinary repositories.

Data Publication, Preservation and Discovery (VA)

  • Policy-Driven Curation
  • Institutional / Cloud / Grid Storage
  • Faceted Search
  • Diagram labels: Curator's Workbench; Ingest Processing; Matchmaking; Faceted Search; Geo-spatial Search

SEAD Prototype (architecture diagram labels, in the order extracted)

  • VIVO; APIs – Web Services; APIs – Joseki – Web Services; Extractors and Indexing; User Management
  • RDF – Tupelo 2 – Medici – Lucene – Geoserver; SPARQL / HTTP; User / Entity Management – Analytics; Jena – RDF; MySQL – Local File System
  • Virtual Archive; SPARQL / HTTP; BagIt; Metadata Extraction; Persistent IDs; Indexing; Archiving
  • Solr Query (XML); Geospatial Query; BagIt; Matchmaker; Conversion; DataONE Member Node
  • MySQL – Local File System – Solr – PostGIS; MySQL – Local File System

Acknowledgements

SEAD is funded by the National Science Foundation under Cooperative Agreement #OCI0940824. SEAD gratefully acknowledges all of the partner participants who have been involved in developing our services framework. These include the research teams from the following organizations: School of Information, University of Michigan; Department of Civil and Environmental Engineering, the National Center for Supercomputing Applications (NCSA) and UIUC Libraries, University of Illinois at Urbana-Champaign; Data to Insight Center, IU Libraries and School of Informatics and Computing, Indiana University; the Interuniversity Consortium for Political and Social Research (ICPSR); the National Center for Earth-Surface Dynamics (NCED); and the Data Conservancy Project, Johns Hopkins University.

Currently, SEAD has implemented core functionality for uploading, annotating, and viewing data, for linking data to researcher profiles, and for packaging this information and transferring it to institutional repositories or archival cloud storage. The curation pipeline to institutional repositories supports both long-term preservation and search and discovery workflows.
The SEAD prototype is currently being tested by ingesting, annotating, and preserving datasets from the National Center for Earth-Surface Dynamics (1.6 terabytes of data containing over 450,000 files), which involves transfer of data and metadata between the SEAD ACR, VIVO and VA components.
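
The poster names BagIt as the packaging format used when data moves toward the Virtual Archive. The following is a minimal sketch of what such a package could look like, not SEAD's actual code. It assumes the BagIt layout of a `bagit.txt` declaration, a `data/` payload directory and a checksum manifest. The function names `make_bag` and `verify_bag`, the SHA-256 algorithm, the declared BagIt version and the sample file name are all hypothetical choices made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into bag_dir/data and write minimal BagIt tag files.

    Assumption: bag_dir/data does not exist yet (shutil.copytree creates it).
    """
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)

    # One manifest line per payload file: "<sha256 digest>  <path relative to bag>"
    manifest_lines = []
    for path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")

    # Bag declaration; the version string is an assumption, not taken from the poster.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue  # skip blank lines (e.g. an empty payload)
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "dataset"
        source.mkdir()
        # Hypothetical sample file standing in for a real dataset.
        (source / "readings.csv").write_text("site,value\nA,1.0\n", encoding="utf-8")

        bag = make_bag(source, Path(tmp) / "bag")
        print("bag is valid:", verify_bag(bag))
```

Carrying checksums inside the package lets the receiving repository confirm that every file arrived intact, which matters at the scale described above (hundreds of thousands of files moving between components).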