1. www.postersession.com
SEAD Member Node is a Multi-Repository node that:
• Serves sustainability science community
• Shares and preserves data via any number of repositories
• Gives focused exposure through DataONE to sustainability science
products
• Advocates for and supports distributed and interoperable data
sharing
SEAD 2.0 Multi-Repository Member Node
Charitha Madurangi, Inna Kouper, Yu Luo, Isuru Suriarachchi,
Kunalan Ratharanjan, and Beth Plale
Indiana University SEAD
Curbee is a lightweight Java publishing pipeline that:
• Accepts Research Objects formed as a single ORE package
• Validates contents of Research Object
• Generates and preserves metadata
• Notifies recommended repository of prepared submission
• Synchronizes metadata with DataONE for discoverability
A recommendation engine that:
• Responds to requests from Project Spaces and independent
clients for recommendations for their Research Object
• Utilizes information from profiles about People, Data, and
Things (Repositories) to recommend the most appropriate
repository for a research object
Matchmaker profiles People, Data, and Things (Repositories):
• People: profiles are one of three kinds: ORCID ID, Google
identity, or Clowder identity (NCSA)
• Data: profiles are in JSON-LD format as defined by SEAD
• Repositories: definitions inspired by profiles used in the
Registry of Research Data Repositories, re3data.org
SEAD 2.1 will support fuzzy reasoning
Plale, B., Kouper, I., Goodwell, A., & Suriarachchi, I. (in press).
Trust Threads: Minimal Provenance for Data Publishing and
Reuse. In C. R. Sugimoto, H. Ekbia, & M. Mattioli (Eds.), Big
Data is Not a Monolith: Policies, Practices and Problems. MIT
Press.
• 2.0 release Jul 2016; 2.1 release Oct 2016
• Long term commitment to research objects published in IU SEAD
Cloud
• Make publishing tools standalone
• Embed SEAD publishing into existing analysis frameworks
• Extend Curbee to handle external submissions
A popular but basic repository of SEAD that is:
• Large scale replicated storage server at Indiana University
• Highly available and ingests large scale objects easily
• Pulls new Research Objects from SEAD
• Uses BagIt to describe the research objects
• Harvests minimal metadata and creates landing page per object
• Stores objects as zipped archives, but exposes the contents from
the landing page
• Assigns DOI
Core principles:
1. Offer curation as early in research process as possible
2. Provide choice to researchers in where to publish their data
3. Avoid duplication of services when partnering with existing and
emerging repositories
4. Support minimal data provenance by tracking and distinguishing
between revised, derived, and replicated datasets
CurBeeSubmissionAPI
Independent
Research Object
Submissions
Komadu
Provenance Store
Data Storage
MongoDB
object profile and
state store
DataONE
web file-space
CurBee
Persist
Library of Micro-services
Validate
Record provenance
CurBee-Service
DataONE MN API
Prepare for DataONE
Generate metadata
Curbee Architecture
Research
Object
Unique ID
Agents
StatesRelationships
Content
• Data creator
• Curator
• Data re-use scientist
• Live
• Curated
• Published
• Aggregates
• Related to
• Describes
• Derived from
• Versioned from
• Files
• Bitstreams
• Pointers
• Annotations