Active, actionable DMPs
Sarah Jones | Digital Curation Centre | firstname.lastname@example.org
Stephanie Simms | California Digital Library | email@example.com
Daniel Mietchen | National Institutes of Health | firstname.lastname@example.org
Tomasz Miksa| TU Wien | email@example.com
ah Martin public domain
Publishing & reuse
• DMP on periphery
• Often done at grant stage and
not looked at again
• Opportunities to (re)use
information being missed
• Disconnected & unlinked
Publishing & reuse
Kevin Dooley/ Flickr CC BY
47 participants from 16 countries
• Service providers
Understand research workflows
Develop use cases for maDMPs
Set priorities for future work
Uses cases and prioritisation
• Interoperability with research systems
• Institutional perspective
• Repository use cases
• Evaluation & monitoring
• Utilising PIDs
maDMP priority areas
• Common standards and protocols
• Funder integration
• Share/publish/deposit DMPs
• Utilise PIDs for automatic reporting
• Capacity planning (institutional & data centre)
• Automated compliance checks
The problem of freetext
Templates typically ask very broad questions, even when
dropdown options are feasible (e.g. metadata standards, file
formats, data volumes, repositories, licences…)
Plugins to give structured response
Define a minimum data model
• DMPRoadmap themes
• Common format?
From Flickr by Steve Johnson, CC BY 2.0
New text editor
Substance Forms: http://substance.io
Repository use cases
I. Repository recommender
service via re3data.org
II. Text mine to ping repositories
when mentioned in a DMP
III. Use DMP as metadata to
facilitate deposit process
IV. Deposit DMPs with data Ainsley Seago. CC BY 4.0
Institutional use cases
I. Connect researchers to
relevant services & support
II. Gather information to
forecast demand and do
III. Embed DMP in research
process (domain workflows,
ethics, admin systems)
William Murphy. CC BY-SA 4.0
Persistent identifiers (PIDs)
• Assign DOIs to releases of DMP versions
• Leverage other PIDs to populate DMP over time:
–Researcher IDs (ORCIDs)
–Funder IDs (FundRef)
–Research Resource IDs (RRIDs)
• antibodies, organisms, cell lines
• Also enables compliance monitoring http://pidapalooza.org
Utilising EC grant IDs in plans
• Harvest grant IDs from OpenAIRE API
• Provide look up when entering project details
• Enables join up of DMP with other outputs
Evaluation & monitoring
• Automated compliance checks
– did researchers do what they said they would?
• Quality or validation checks
– closed questions / range of defined options
– training and evaluation rubrics
– evaluate FAIRness of data and repository…
Next steps: maDMP pilots
From Flickr by Allen, CC BY 2.0
Think of DMPs as key elements of
a networked data management
– connected via a shared
– actionable by humans and
From Flickr by highwaysengland, CC BY 2.0
Join us for more!
Thurs 6th April, 9:30-11:00, @RDA Plenary in BCN, Active DMP IG session
Thanks to Jamie for invite to speak here today.
I’m going to reflect on some work that we’ve been doing at DCC in collaboration with our partners at UC3, as well as Daniel Mietchen from NIH and Tomasz Miksa from TU Wien.
DCC and UC3 have joined up over last year to converge on a single codebase (DMPRoadmap) to operate both of our tools – DMPonline and DMPTool. These are the two main DMP platforms worldwide and have thousands of users so provide a good basis for implementing and road testing active DMPs.
We’ve all been thinking about how we can make the most of DMPs so information is shared between systems, and the experience of producing DMPs becomes much more valuable and returns benefits to all involved
So I want to begin by thinking about the current system and how it is flawed. Most researchers see the DMP as yet another hurdle to overcome, an obstacle in the race to get grant funding.
In the UK we have a very compliance heavy landscape and it’s not always clear to what extent DMPs are being monitored so it encourages researchers to just pay lip service to the requirements
When we think about how DMPs fit into the overall research lifecycle and the various systems and processes being followed, all too often they’re on the periphery, something that is done at application stage and then not looked at again. Although funders and others talk about DMPs being actively updated, there’s little to encourage researchers to actually engage with this
Fortunately there is a lot of interest in changing and improving things. Funders are aware that they gather a lot of valuable information in DMPs and are missing opportunities to use this. It could help with data discovery for example to promote reuse of the published outputs. Some funders (Wellcome / NSF) are considering pilots in this area
Another key issue at the moment is that DMPs are disconnected from other systems. A lot more information could be exchanged and reused
Where we hope to get to eventually is to connect up DMPs with the wide range of systems researchers are using in the course of their work, so they act as more of a hub. This includes:
Research Information Management systems like Pure or Symplectic elements so administrative data from research offices can be pushed into DMPs and vice versa
Funder application systems like Je-S and the EC’s participants portal. While it’s very difficult to integrate with these, APIs could hopefully exchange some information and make the process more streamlined for researchers
Lab notebook systems like Jupyter
Storage and data management platforms like Dataverse and OSF
Journals like RIO and BMC research notes for publishing DMPs
Repositories (e.g. zenodo & figshare) to deposit DMPs alongside datasets and other outputs
Identifiers also play a key role to track assertions about people/organisations/grants etc, to trigger notifications and automate reporting activities. There are many different identifier systems that could be leveraged in DMPs e.g. fundref, RRID, DOIs, ORCIDs…
We also need to think about how we should exchange the information across systems e.g. with APIs, defining open format/protocol/standard
Building on these thoughts, I wanted to share a couple of slides that Tomasz Miksa presented at IDCC a few weeks ago. He put forward a vision for DMPs, where all the information you need is in one place and it interacts with various tools, standards and systems.
Tomasz has started using the DCC Checklist for a DMP to map across the sections of a DMP and existing tools and standards
If you’re doing work on machine-actionable DMPs or think you have something to bring, we encourage you to join up and collaborate with us. We’re engaging in a number of international fora e.g. RDA and FORCE11 to ensure there’s consensus on the way forward. We already see a lot of benefits from pooling our knowledge and bringing together people from different backgrounds and countries, so do join in and help us solve this together.
With those thoughts of collaboration in mind, we recently held a workshop at IDCC to gather thoughts on where we should be heading.
We dubbed this our utopia workshop. We didn’t want people to be constrained by what’s feasible now or in the short-term. We really wanted people to consider what would be optimum for all stakeholders involved.
We were hugely oversubscribed so couldn’t accommodate everyone at the workshop. We ended up with 47 people in total from 16 different countries so it was a very international mix. We also had representation from various stakeholder groups…
We asked people to map out their activities and research workflows to see where things connect, to develop specific use cases and set priorities for future work
So everyone got very hands-on. You can see here one of the examples of visualising workflows and connections
And some of the use cases and prioritisation:
Interoperability across research systems was the most popular topic, with a few groups discussing connections between different platforms
A number addressed institutional use cases, consider how universities could support researchers with DMPs
The repository use cases again considered integrations. You can see in the photo the desired two way exchange of info (DMPs alerting repos to data, and repos pushing info on datasets and DOIs back into DMPs)
Evaluation and monitoring was a topic of interest, particularly among funders
And the PID group also generated a lot of tangible starting points
We’re writing up full notes from the workshop which we expect to release to participants in the next week and then publish in RIOjournal as a white paper.
To give you a sneak preview, a number of priority areas came out across the groups:
All stakeholders expressed a need for common standards as a foundation to enable information to flow between plans and systems. It’s a top priority to define a minimum data model with a core set of elements for DMPs. It could potentially be based on an existing template structure or DMPRoadmap themes
Given that funders often drive demand for DMPs, there’s a lot of demand to integrate with their systems on some level. Basic interoperation to support grant submission, monitoring and reporting would help.
An increasing number of researchers are publishing DMPs and we want to support open practices. Connections with journals or repositories would help here.
PIDs could be used in several ways to pull in relevant data and also help automate reporting
Harvesting information on data volumes was critical in an institutional context and for domain data centres so they can do capacity planning exercises and ensure support is costed in
Some automation of compliance checks is also desired e.g. checking that data has been deposited in the named repository
So I now want to tease into some of the issues in more detail and give you a sense of some work that is underway, others things that are planned or activities we hope to do.
One of the primary issues we face in terms of machine-actionability is the format that DMPs are in. They are typically free-form text documents as funders and others tend to ask very broad questions, even when a limited range of options could be provided.
One things we plan to do is draw in external resources such as the RDA Metadata Standards Directory and Biosharing. This will have a number of benefits: it will help direct researchers to relevant options as they’re not always aware of good practice, and it will mean that some more structured responses are provided for future action.
There are a number of databases e.g. re3data, or wizards like the EUDAT licencing tool that could be used in this way
One of the key priorities coming out of the workshop is to define a minimum data model for DMPs
The DCC did some work a few years back to analyse requirements and good practice to derive a basic checklist for a DMP and common themes. We’ve since revised these themes in collaboration with UC3 to test that they meet the US and UK landscapes.
Currently the themes are used to associate questions and guidance within the tool, but we want to explore the potential for basic tagging (e.g. for text mining sections of DMPs)
We are also working with repositories to test mapping themes to other vocabularies e.g. DDI working group for social science repos
These themes may also form the basis of a common format
We are introducing a new text editor into the DMPRoadmap platform (the codebase from which we will operate DMPonline and DMPTool)
In future, we hope the editor can also support us to mine the text for expected terms e.g. repository names
To move on to some of the specific use cases, there are a number related to repos…
Institutions were keen to use the DMP process to connect researchers up with relevant services & support. They see the DMP as a good awareness raising and training tool.
Capacity planning is also a key use case for unis, specifically to ensure costs are written into proposals
And there’s a desire to link up across various systems in use as noted in the introductory slides. Some institutions like Purdue University are keen to be institutional pilot and map out all the information flows across different research systems so we can have a picture of a whole university as an example.
Add IDs to the DMP, forward recognition = living, updatable document
Make the point that there are many kinds of IDs, can leverage them in various actionable ways
Can be about assigning DOIs to DMPs and/or using DOIs in DMPs. Goal to enable tracking of how commitments made to funders are followed through into actions to publish/deposit outputs.
Also note Stephanie’s presentation from PIDapalooza
We have done an ORCID integration so users can connect up their DMP profile. We hope to do more with this e.g. login with ORCID and use it to bring in other relevant info
Grant IDs can also be pulled into DMPs. For the EC for example, we could use the OpenAIRE API to harvest these and provide a look up when researchers are creating their plan at month 6. This also allows us to connect up the DMP with other outputs to assist in reporting
In terms of evaluation and monitoring, funders are particularly interested in automated compliance checks, and also support for plan review. More closed questions or predefined options would help in assessment, as well as basic tools like evaluation rubrics and training to support project officers. The other aspect mentioned in terms of H2020 was an evaluation of the FAIRness of data and repositories, which is a nice handover to Ingrid who’s up next…
Our next steps are to develop some of the use cases and test these out in the roadmap platform. Everyone from the workshop wanted to be a pilot user so we have lots of volunteers. The only limiting factor will be our bandwidth to run them all!
Summary of the overall vision
Quick note about promoting greater openness and public sharing of DMPs:
DMPTool Public DMPs list
DMP Collection in RIO Journal
Depositing DMPs in Zenodo, Dataverse and other IRs
Also plan to continue this work at the Barcelona plenary session in April – join us there!