Poster RDAP13: A Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data


Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro
JHU Data Management Services
Johns Hopkins University Sheridan Libraries

A Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data

Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13



Transcript

  • 1. An Example Workflow for Depositing to a Research Data Repository: A Case Study for Archiving Publication Data

Betsy Gunia, David Fearon, Benjamin Brosius, Tim DiLauro | JHU Data Management Services | Johns Hopkins University Sheridan Libraries | datamanagement@jhu.edu

Data
  • Pilot project with two graduating doctoral students
  • Biomedical engineering field; largely image data
  • Data already published, which differs from our usual service model of working with researchers at the beginning of their project

JHU Data Archive
  • Used alpha release of Data Conservancy software [1]
  • Discipline-agnostic; treats data as primary objects
  • A collection of data may have an associated metadata file, structured or unstructured
  • Not yet publicly accessible

Understanding Research
  • Met with students for an initial overview of their research
  • Read publications to map data products and the activities that created them
  • As shown in Fig. 1, the resulting map provided a framework to organize the data and to ensure that all data were included (students could not locate all their data)

Organizing Data
  • Held several in-depth meetings with students
  • Created new folders and subfolders with students present, and moved files to the appropriate locations
  • Discussed data content, instrument(s) used, and file naming conventions used, if any
  • Experimented with directory structures based on publication figures or research methods. Students and advisor decided that organizing by figure was more useful for data reuse
  • Did not rename files due to time constraints and lack of consistency in filenames

Packaging
  • Used BagIt (v. 0.97) and TAR as the packaging format
  • Used MD5 checksums for data (payload) and tag files
  • Created a documentation folder for our unstructured metadata (Fig. 2), which we treated as a tag file and not part of the payload
  • One "bag" per publication

Conclusions
  • Unsurprisingly, it is hard for researchers to recall information about their data after a few years. This pilot project reinforced the importance of working with scientists early in their research, which is our usual service model.
  • Due to time constraints and the limits of student recollection, our metadata creation was limited to folder and file documentation (Fig. 2).
  • Closely reading and mapping the students' research was central to being able to ask them relevant questions about the data.
  • The BagIt specification worked well for packaging.

Future Work
This pilot project began the process of formalizing our archiving processes, but we have much more to do! The Data Conservancy software will gain improved functionality over the coming years, which has implications for how we evolve the archiving process. For example, we currently cannot hide deposited data in the JHU Data Archive; however, researchers may want to transfer data to us before their project is complete and ready for public access. We need to develop rigorous processes for ensuring that we maintain the integrity of the data during the often significant alterations required to archive datasets that are useful to others.

Figure 1. Example of data flow diagram
Figure 2. Example of unstructured metadata: folder and file documentation

[1] http://dataconservancy.org/software/

Copyright © 2013, by JHU Data Management Services
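
The packaging step described on the poster (one bag per publication, MD5 checksums for payload and tag files, a documentation folder kept outside the payload, TAR as the container) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function names, the folder name `documentation`, and the directory layout are assumptions made for illustration, and only the standard library is used. The last function shows one possible way to re-check payload integrity after files have been moved or reorganized, which relates to the integrity concern raised under Future Work.

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag: Path, name: str, files: list[Path]) -> None:
    """Write one '<md5>  <path relative to bag>' line per file."""
    lines = [f"{md5_of(f)}  {f.relative_to(bag).as_posix()}\n" for f in sorted(files)]
    (bag / name).write_text("".join(lines), encoding="utf-8")


def make_bag(payload_src: Path, docs_src: Path, bag: Path) -> Path:
    """Build one bag (assumed layout) and return the path of its TAR file."""
    shutil.copytree(payload_src, bag / "data")        # payload files
    shutil.copytree(docs_src, bag / "documentation")  # treated as tag files
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    payload = [p for p in (bag / "data").rglob("*") if p.is_file()]
    write_manifest(bag, "manifest-md5.txt", payload)
    # Everything outside data/ is a tag file; the tag manifest is written
    # last, so it does not list itself.
    tags = [p for p in bag.rglob("*") if p.is_file() and p not in payload]
    write_manifest(bag, "tagmanifest-md5.txt", tags)
    tar_path = bag.parent / (bag.name + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(bag, arcname=bag.name)
    return tar_path


def failed_checksums(bag: Path, name: str = "manifest-md5.txt") -> list[str]:
    """Return relative paths that are missing or whose MD5 no longer matches."""
    bad = []
    for line in (bag / name).read_text(encoding="utf-8").splitlines():
        expected, rel = line.split(None, 1)
        target = bag / rel
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel)
    return bad
```

A call such as `make_bag(Path("pub1_data"), Path("pub1_docs"), Path("pub1_bag"))` (hypothetical paths) would produce `pub1_bag.tar` next to the bag directory. In practice an established BagIt implementation would likely be preferable to hand-written manifests; the sketch only makes the structure of the bag concrete.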