
Active actionable DMPs




Presentation given at the 2nd DPHEP Collaboration Workshop at CERN on 12 March 2017. The talk reflects on machine-actionable DMP (maDMP) use cases developed during an IDCC workshop, and on activities underway or planned by the DCC and UC3 teams. These will be implemented in the DMPRoadmap codebase and piloted in our respective tools, DMPonline and DMPTool.


  1. Active, actionable DMPs Sarah Jones | Digital Curation Centre | Stephanie Simms | California Digital Library | Daniel Mietchen | National Institutes of Health | Tomasz Miksa | TU Wien | #ActiveDMPs
  2. ah Martin public domain
  3. Planning & administration Create, analyse, manage data Publishing & reuse • DMP on periphery • Often done at grant stage and not looked at again • Opportunities to (re)use information being missed • Disconnected & unlinked
  4. Planning & administration Create, analyse, manage data Publishing & reuse Identifiers:
  5. Kevin Dooley/ Flickr CC BY
  6. 47 participants from 16 countries • Funders • Developers • Librarians • Service providers • Researchers Understand research workflows Develop use cases for maDMPs Set priorities for future work postcard-future-tools-and-services- perfect-dmp-world Utopia workshop
  7. Use cases and prioritisation • Interoperability with research systems • Institutional perspective • Repository use cases • Evaluation & monitoring • Utilising PIDs
  8. maDMP priority areas • Common standards and protocols • Funder integration • Share/publish/deposit DMPs • Utilise PIDs for automatic reporting • Capacity planning (institutional & data centre) • Automated compliance checks
  9. The problem of freetext Templates typically ask very broad questions, even when dropdown options are feasible (e.g. metadata standards, file formats, data volumes, repositories, licences…)
  10. Plugins to give structured response
  11. Define a minimum data model • DMPRoadmap themes • Mappings • Common format? From Flickr by Steve Johnson, CC BY 2.0
  12. New text editor Substance Forms:
  13. Repository use cases I. Repository recommender service via II. Text mine to ping repositories when mentioned in a DMP III. Use DMP as metadata to facilitate deposit process IV. Deposit DMPs with data Ainsley Seago. CC BY 4.0
  14. Institutional use cases I. Connect researchers to relevant services & support II. Gather information to forecast demand and do capacity planning III. Embed DMP in research process (domain workflows, ethics, admin systems) William Murphy. CC BY-SA 4.0
  15. Persistent identifiers (PIDs) • Assign DOIs to releases of DMP versions • Leverage other PIDs to populate DMP over time: –Researcher IDs (ORCIDs) –Funder IDs (FundRef) –Grant IDs –Research Resource IDs (RRIDs) • antibodies, organisms, cell lines • Also enables compliance monitoring
  16. ORCID integration
  17. Utilising EC grant IDs in plans • Harvest grant IDs from OpenAIRE API • Provide look up when entering project details • Enables join up of DMP with other outputs
  18. Evaluation & monitoring • Automated compliance checks – did researchers do what they said they would? • Quality or validation checks – closed questions / range of defined options – training and evaluation rubrics – evaluate FAIRness of data and repository…
  19. Next steps: maDMP pilots From Flickr by Allen, CC BY 2.0
  20. Summary Think of DMPs as key elements of a networked data management ecosystem: – connected via a shared vocabulary – actionable by humans and software – versioned – public From Flickr by highwaysengland, CC BY 2.0
  21. Join us for more! Thurs 6th April, 9:30-11:00, @RDA Plenary in BCN, Active DMP IG session

Editor's Notes

  • Thanks to Jamie for invite to speak here today.

    I’m going to reflect on some work that we’ve been doing at DCC in collaboration with our partners at UC3, as well as Daniel Mietchen from NIH and Tomasz Miksa from TU Wien.

    DCC and UC3 have joined up over the last year to converge on a single codebase (DMPRoadmap) to operate both of our tools, DMPonline and DMPTool. These are the two main DMP platforms worldwide and have thousands of users, so they provide a good basis for implementing and road testing active DMPs.

    We’ve all been thinking about how we can make the most of DMPs so information is shared between systems, and the experience of producing DMPs becomes much more valuable and returns benefits to all involved
  • So I want to begin by thinking about the current system and how it is flawed. Most researchers see the DMP as yet another hurdle to overcome, an obstacle in the race to get grant funding.

    In the UK we have a very compliance-heavy landscape, and it’s not always clear to what extent DMPs are being monitored, which encourages researchers to just pay lip service to the requirements.

  • When we think about how DMPs fit into the overall research lifecycle and the various systems and processes being followed, all too often they’re on the periphery, something that is done at application stage and then not looked at again. Although funders and others talk about DMPs being actively updated, there’s little to encourage researchers to actually engage with this

    Fortunately there is a lot of interest in changing and improving things. Funders are aware that they gather a lot of valuable information in DMPs and are missing opportunities to use this. It could help with data discovery for example to promote reuse of the published outputs. Some funders (Wellcome / NSF) are considering pilots in this area

    Another key issue at the moment is that DMPs are disconnected from other systems. A lot more information could be exchanged and reused
  • Where we hope to get to eventually is to connect up DMPs with the wide range of systems researchers are using in the course of their work, so they act as more of a hub. This includes:
    Research Information Management systems like Pure or Symplectic Elements, so administrative data from research offices can be pushed into DMPs and vice versa
    Funder application systems like Je-S and the EC’s Participant Portal. While it’s very difficult to integrate with these, APIs could hopefully exchange some information and make the process more streamlined for researchers
    Lab notebook systems like Jupyter
    Storage and data management platforms like Dataverse and OSF
    Journals like RIO and BMC Research Notes for publishing DMPs
    Repositories (e.g. Zenodo and figshare) to deposit DMPs alongside datasets and other outputs

    Identifiers also play a key role: they let us track assertions about people, organisations, grants and so on, trigger notifications and automate reporting activities. There are many different identifier systems that could be leveraged in DMPs, e.g. FundRef, RRIDs, DOIs, ORCIDs…

    We also need to think about how we should exchange the information across systems, e.g. through APIs and by defining an open format, protocol or standard.
  • Building on these thoughts, I wanted to share a couple of slides that Tomasz Miksa presented at IDCC a few weeks ago. He put forward a vision for DMPs, where all the information you need is in one place and it interacts with various tools, standards and systems.
  • Tomasz has started using the DCC Checklist for a DMP to map across the sections of a DMP and existing tools and standards
  • If you’re doing work on machine-actionable DMPs or think you have something to bring, we encourage you to join up and collaborate with us. We’re engaging in a number of international fora e.g. RDA and FORCE11 to ensure there’s consensus on the way forward. We already see a lot of benefits from pooling our knowledge and bringing together people from different backgrounds and countries, so do join in and help us solve this together.
  • With those thoughts of collaboration in mind, we recently held a workshop at IDCC to gather thoughts on where we should be heading.

    We dubbed this our utopia workshop. We didn’t want people to be constrained by what’s feasible now or in the short-term. We really wanted people to consider what would be optimum for all stakeholders involved.

    We were hugely oversubscribed so couldn’t accommodate everyone at the workshop. We ended up with 47 people in total from 16 different countries so it was a very international mix. We also had representation from various stakeholder groups…

    We asked people to map out their activities and research workflows to see where things connect, to develop specific use cases and set priorities for future work
  • So everyone got very hands-on. You can see here one of the examples of visualising workflows and connections
  • And some of the use cases and prioritisation:

    Interoperability across research systems was the most popular topic, with a few groups discussing connections between different platforms
    A number addressed institutional use cases, considering how universities could support researchers with DMPs
    The repository use cases again considered integrations. You can see in the photo the desired two way exchange of info (DMPs alerting repos to data, and repos pushing info on datasets and DOIs back into DMPs)
    Evaluation and monitoring was a topic of interest, particularly among funders
    And the PID group also generated a lot of tangible starting points
  • We’re writing up full notes from the workshop, which we expect to release to participants in the next week and then publish in RIO Journal as a white paper.

    To give you a sneak preview, a number of priority areas came out across the groups:

    All stakeholders expressed a need for common standards as a foundation to enable information to flow between plans and systems. It’s a top priority to define a minimum data model with a core set of elements for DMPs. It could potentially be based on an existing template structure or DMPRoadmap themes

    Given that funders often drive demand for DMPs, there’s a lot of demand to integrate with their systems on some level. Basic interoperation to support grant submission, monitoring and reporting would help.

    An increasing number of researchers are publishing DMPs and we want to support open practices. Connections with journals or repositories would help here.

    PIDs could be used in several ways to pull in relevant data and also help automate reporting

    Harvesting information on data volumes was critical in an institutional context and for domain data centres so they can do capacity planning exercises and ensure support is costed in

    Some automation of compliance checks is also desired e.g. checking that data has been deposited in the named repository
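
As a rough illustration of the compliance check mentioned above, here is a minimal sketch in Python. It assumes we already hold two structured inputs, neither of which exists in this form today: the repository each plan names per dataset, and a record of where each dataset was actually deposited. The function name and the data are hypothetical.

```python
def check_deposits(planned, deposited):
    """Compare the repository named in a plan with where data was deposited.

    planned:   dict of dataset title -> repository named in the DMP
    deposited: dict of dataset title -> repository where it was deposited
    Both structures are assumptions made for this sketch.
    """
    report = {}
    for title, planned_repo in planned.items():
        actual_repo = deposited.get(title)
        if actual_repo is None:
            report[title] = "not deposited"
        elif actual_repo.lower() == planned_repo.lower():
            report[title] = "compliant"
        else:
            report[title] = f"deposited in {actual_repo}, plan named {planned_repo}"
    return report


planned = {"Survey responses": "Zenodo", "Interview audio": "Dataverse"}
deposited = {"Survey responses": "zenodo"}
print(check_deposits(planned, deposited))
```

A real check would more likely match on persistent identifiers than on titles; titles are used here only to keep the sketch short.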
  • So I now want to dig into some of the issues in more detail and give you a sense of some work that is underway, other things that are planned, and activities we hope to do.

    One of the primary issues we face in terms of machine-actionability is the format that DMPs are in. They are typically free-form text documents as funders and others tend to ask very broad questions, even when a limited range of options could be provided.
  • One thing we plan to do is draw in external resources such as the RDA Metadata Standards Directory and BioSharing. This will have a number of benefits: it will help direct researchers to relevant options, as they’re not always aware of good practice, and it will mean that some more structured responses are provided for future action.

    There are a number of databases, e.g. re3data, and wizards like the EUDAT licensing tool that could be used in this way
  • One of the key priorities coming out of the workshop is to define a minimum data model for DMPs

    The DCC did some work a few years back to analyse requirements and good practice to derive a basic checklist for a DMP and common themes. We’ve since revised these themes in collaboration with UC3 to test that they meet the US and UK landscapes.

    Currently the themes are used to associate questions and guidance within the tool, but we want to explore the potential for basic tagging (e.g. for text mining sections of DMPs)

    We are also working with repositories to test mapping themes to other vocabularies e.g. DDI working group for social science repos

    These themes may also form the basis of a common format
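
To make the idea of a minimum data model a little more concrete, here is a minimal sketch in Python. None of these field names come from DMPRoadmap or from any agreed standard; they are assumptions chosen to echo the structured items mentioned in the talk (file formats, data volumes, repositories, licences, and identifiers for researchers, funders and grants).

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Dataset:
    # One dataset described in the plan; all field names are illustrative.
    title: str
    file_format: str
    volume_gb: float
    repository: str
    licence: str


@dataclass
class MinimalDMP:
    # A hypothetical core set of elements, not an agreed model.
    title: str
    orcid: str       # researcher identifier
    funder_id: str   # e.g. a FundRef identifier
    grant_id: str
    datasets: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialise to JSON so that other systems could consume the plan.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


plan = MinimalDMP(
    title="Example plan",
    orcid="0000-0002-1825-0097",
    funder_id="funder-placeholder",
    grant_id="grant-placeholder",
    datasets=[Dataset("Survey responses", "CSV", 2.5, "Zenodo", "CC BY 4.0")],
)
print(plan.to_json())
```

The point of the sketch is only that closed, typed fields can be exchanged between systems in a way that free-text answers cannot.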
  • We are introducing a new text editor into the DMPRoadmap platform (the codebase from which we will operate DMPonline and DMPTool)

    Substance Forms is a JavaScript library which will provide a simpler, cleaner way to write plans and annotate text with comments

    In future, we hope the editor can also support us to mine the text for expected terms e.g. repository names
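
Mining plan text for expected terms could start very simply. The sketch below is in Python and is not tied to Substance Forms or to the DMPRoadmap code; the list of repository names is a hypothetical stand-in for a registry such as re3data.

```python
import re

# Hypothetical list of names; in practice it might come from a registry.
KNOWN_REPOSITORIES = ["Zenodo", "figshare", "Dataverse", "OSF"]


def find_repositories(text: str) -> list:
    """Return the known repository names mentioned in the text, in list order."""
    found = []
    for name in KNOWN_REPOSITORIES:
        # Whole-word, case-insensitive match, so "zenodo" also counts.
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            found.append(name)
    return found


answer = "We will deposit the survey data in zenodo and share code via OSF."
print(find_repositories(answer))
```

A match like this is what could trigger a notification to the repository named in the plan.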

  • To move on to some of the specific use cases, there are a number related to repos…
  • Institutions were keen to use the DMP process to connect researchers up with relevant services & support. They see the DMP as a good awareness raising and training tool.

    Capacity planning is also a key use case for unis, specifically to ensure costs are written into proposals

    And there’s a desire to link up across the various systems in use, as noted in the introductory slides. Some institutions like Purdue University are keen to be an institutional pilot and map out all the information flows across different research systems, so we can have a picture of a whole university as an example.
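
For the capacity planning use case, once data volumes are captured as structured answers, forecasting demand is a simple aggregation. A minimal Python sketch follows; the input shape and the figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical structured answers harvested from plans:
# (storage service named in the plan, expected volume in GB).
plan_answers = [
    ("institutional storage", 500.0),
    ("Zenodo", 20.0),
    ("institutional storage", 1500.0),
]


def forecast_demand(answers):
    """Sum the expected data volume per storage service."""
    totals = defaultdict(float)
    for service, volume_gb in answers:
        totals[service] += volume_gb
    return dict(totals)


print(forecast_demand(plan_answers))
```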
  • Add IDs to the DMP, forward recognition = living, updatable document

    Make the point that there are many kinds of IDs, can leverage them in various actionable ways

    Can be about assigning DOIs to DMPs and/or using DOIs in DMPs. Goal to enable tracking of how commitments made to funders are followed through into actions to publish/deposit outputs.

    Also note Stephanie’s presentation from PIDapalooza
  • We have done an ORCID integration so users can connect up their DMP profile. We hope to do more with this e.g. login with ORCID and use it to bring in other relevant info
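
One small, self-contained piece of working with ORCIDs is validating an iD before storing it. The final character of an ORCID iD is a check digit computed with ISO 7064 MOD 11-2. The Python sketch below shows that calculation only; it is not the DMPRoadmap integration, and the function names are our own.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check digit of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```

This only confirms that an iD is well formed; confirming that it belongs to the user still requires the authenticated ORCID connection.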
  • Grant IDs can also be pulled into DMPs. For the EC for example, we could use the OpenAIRE API to harvest these and provide a look up when researchers are creating their plan at month 6. This also allows us to connect up the DMP with other outputs to assist in reporting
  • In terms of evaluation and monitoring, funders are particularly interested in automated compliance checks, and also support for plan review. More closed questions or predefined options would help in assessment, as well as basic tools like evaluation rubrics and training to support project officers. The other aspect mentioned in terms of H2020 was an evaluation of the FAIRness of data and repositories, which is a nice handover to Ingrid who’s up next…
  • Our next steps are to develop some of the use cases and test these out in the DMPRoadmap platform. Everyone from the workshop wanted to be a pilot user so we have lots of volunteers. The only limiting factor will be our bandwidth to run them all!
  • Summary of the overall vision

    Quick note about promoting greater openness and public sharing of DMPs:
    DMPTool Public DMPs list
    DMP Collection in RIO Journal
    Depositing DMPs in Zenodo, Dataverse and other IRs
  • Also plan to continue this work at the Barcelona plenary session in April – join us there!