Using the DMPTool to meet
internal data management needs:
Lessons from the Smithsonian
Stephanie Simms | @stephrsimms
25 February 2016
Data management & sharing mandates
• Journals – PLOS, Nature
• Funders – NSF, NIH, DOE, NASA, etc.
• Institutions – SI
dmptool.org
Smithsonian Institution Dashboard
http://dashboard.si.edu/
Bee All You Can Bee
NMNH Entomology
https://dpo.si.edu/blog/bee-all-you-can-bee
Data management at SI
1. DAMPs for each museum/unit
2. DAMPs for intra- and extramural projects
3. DMPs for grant-funded research projects
through Office of Sponsored Projects
Additional requirements
• Institutional hierarchy
• Formatting templates and DMP content
• Data mining
Lessons
• Living documents
• Assessment potential
• APIs!
• Locales = fluid

International DMP workshop presentation, IDCC, Feb 2016

Editor's Notes

  • #3 SI released its own public access plan last year; it applies to journal articles and data. Mission: to increase and diffuse knowledge throughout the world.
  • #4
    - DMPTool is similar to DMPonline: a separate tool developed to address specific funder requirements in the US.
    - The primary use case is also similar to DMPonline: individual researchers creating DMPs at partner institutions (180, mostly US universities).
    - DMPTool was developed by a team of founding partners: CDL, DCC, DataONE, university libraries, and the Smithsonian (Sherry Lake from UVA is here).
    - V1: 2011. V2: 2014, with added functionality, mostly thanks to SI use cases.
    - The Smithsonian got involved from the outset because it was preparing for institutional requirements and wanted a tool to facilitate that process.
    - Günter: “I don’t believe in paper-pushing exercises.”
    - I thought it would be instructive to share some details about the SI use case. For the moment we’re all fixated on looking outward to funding agencies.
    - SI is looking inward at internal practices, management, and self-assessment, arguably something we should all be doing more of and might be required to do in the future.
  • #5 Important numbers:
    - 19 museums, 9 research centers plus affiliated centers in Puerto Rico and Panama, and the zoo.
    - Mass digitization and 3D modeling program: the goal is to digitize all of the 138.1M objects (2M so far).
    - People numbers are at the bottom; they are required to track all of their data activities.
    The SI dashboard presents interactive analytics and is a great example of transparency and openness. It is also a tool for self-assessment; applications for analytics were a theme in Cliff Lynch’s summary yesterday.
  • #6 “Follow Us on Our 40 day Odyssey to Digitize over 40,000 Bumble Bees from the Smithsonian's National Museum of Natural History's Entomology Dept's Bombus Collection!”
    - Each unit at each museum has produced a DAMP (“digital asset management plan”) for daily tasks (e.g., mass digitization) in a Word template, but this does not scale for project-based DMPs.
    - They run conveyor belts; it is an efficient process that now runs digitization at scale.
  • #7
    - Next you need a project-based DMP for both internally and externally funded projects making use of SI equipment and/or staff. This part hasn’t been rolled out yet because of complexity (U.S. Trust: private donations, endowment, memberships, contracts and business income = hundreds of millions annually).
    - DMPs need to be administered within the hierarchical organizational structure of SI. Every museum would have a DMP coordinator; this person would create statistics and annual reports and needs access to the data. These are living documents.
    - Office of Sponsored Projects: a separate effort for external grants; someone there directs people to DMPTool ($161 million in grants in FY2015).
    Principles from Dir 610:
    1. Digital assets are publicly accessible.
    2. Digital assets are valuable resources, and future sustainability should be considered throughout their lifecycle.
  • #8 DMPTool Dashboard UI:
    - Overview of plans I own or co-own and whether any are pending reviewer comments
    - DMPs for me to review as an admin
    - DMP templates I’ve created for users at my institution, as well as public templates
    Typical use case: describe the SI use cases responsible for the review functionality. Other functionality in the admin interface exists thanks to clear requirements for V2; institutional partners are starting to leverage these features.
  • #9 Again for comparison: the NSF Earth Sciences template, a typical scenario for a funder template, with 5 questions. Obligation: Optional
  • #10 Contrast: the SI project template, with 9 sections and up to 8 questions in each. Obligation: Mandatory, If Applicable
  • #11 Unfortunately, since the V2 release they’ve determined that they need 35 additional enhancements: for the Admin interface, hierarchical structure, formatting of templates and DMP content, and more robust data mining capabilities.
  • #12
    - DAMPs/DMPs are living documents at the SI: day-to-day operations, lifecycle/stewardship of digital materials, reuse!
    - Data mining to make business decisions (e.g., IR, predicting future needs); transparency and openness; the analytics theme.
    - The importance of an API for interoperability with other systems (e.g., researcher profile management, passing info to funders) cannot be overstated.
    - Locales are not so neatly defined (nation, institution, etc.). SI research involves multiple funding streams, languages, countries, and institutions. We have to think in terms of how research is actually done and build systems that support this to truly enable open science.
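
The hierarchical reporting need described in note #7 (a DMP coordinator per museum producing statistics) might be sketched roughly as below. This is only an illustration, not how DMPTool works: the `Plan` record, the `PARENT` table, the plan titles and statuses, and the `rollup` function are all hypothetical names invented for the example. It counts plans per unit and credits each plan to every level above it in the hierarchy.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical DMP record; not a DMPTool data structure."""
    title: str
    unit: str    # unit that owns the plan
    status: str  # illustrative values: "draft", "approved"


# Hypothetical parent links: child unit -> parent unit (None marks the root).
PARENT = {
    "SI": None,
    "NMNH": "SI",
    "NMNH Entomology": "NMNH",
}


def ancestors(unit):
    """Yield the unit itself, then each parent up to the root."""
    while unit is not None:
        yield unit
        unit = PARENT[unit]


def rollup(plans):
    """Count plans per unit, crediting each plan to its unit and every ancestor."""
    counts = Counter()
    for plan in plans:
        for unit in ancestors(plan.unit):
            counts[unit] += 1
    return counts


# Made-up sample plans for illustration only.
plans = [
    Plan("Bumble bee digitization", "NMNH Entomology", "approved"),
    Plan("Collections imaging", "NMNH", "draft"),
]
totals = rollup(plans)
```

With counts rolled up this way, a coordinator at the museum level and an administrator at the institution level could read their totals from the same structure.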