Open Research Data Pilot
Open research data and data management
for Horizon 2020 projects
OpenAIRE
Belgium
Emilie Hermans
Project Assistant OpenAIRE, UGent
Can be reused under the CC BY license
Why data management / open data?
1. Prevents data loss
2. Data management to maximize usefulness: organize, make understandable and reusable
3. Fosters creativity and participation of citizens, and increases transparency
4. Get credit: (much!) longer shelf life than interpretation
Footnotes:
1. E.g. Piwowar HA, Vision TJ (2013) Data reuse and the open data citation advantage. PeerJ 1:e175. https://doi.org/10.7717/peerj.175; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308
2. Cartoon: "recycle" | Foster by Patrick Hochstenbach, 2015
The Open Research Data Pilot
Horizon 2020: limited and flexible pilot
• Avoid duplication of research and loss of resources
• Foster Open Science: transparency, effectiveness and greater impact
Open Access to research data
Data Management Planning
Which areas are participating?
• Future and Emerging Technologies
• Research infrastructures (including e-Infrastructures)
• Leadership in enabling and industrial technologies – Information and Communication Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and Biotechnology: ‘nanosafety’ and ‘modelling’ topics
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime and inland water research and the bioeconomy - selected topics in the calls H2020-SFS-2016/2017, H2020-BG-2016/2017, H2020-RUR-2016/2017 and H2020-BB-2016/2017, as specified in the work programme
• Societal Challenge: Climate Action, Environment, Resource Efficiency and Raw materials – except raw materials
• Societal Challenge: Europe in a changing world – inclusive, innovative and reflective Societies
• Science with and for Society
• Cross-cutting activities - focus areas – part Smart and Sustainable Cities.
Projects in other areas can participate on a voluntary basis
• Check Article 29.3 of the Model Grant Agreement
• Costs eligible (Article 6.2.D.3 of the Model Grant Agreement)
Requirements of the Data Pilot
1. Develop a Data Management Plan (DMP)
2. Deposit data in a research data repository
3. Open data: freely used, modified, and shared by anyone for any purpose
4. Provide information, tools and instruments needed to validate results
REASONS FOR OPTING OUT
• Exploitation of results
• Confidentiality
• Protection of personal data
• Would jeopardize the main aim of the action
• No data generated
• Any other legitimate reason
• Complete opt-out: via project amendment
• Complete or partial opt-out: describe the issues in the project DMP
• As open as possible, as closed as necessary
Projects can opt out at any stage.
Develop a DMP
Data management plan (DMP):
• Handling of data during and after the project
• Data well managed in the present and prepared for preservation in the future
Living document: revise and update
Updated, at a minimum, at:
• Initial DMP: within the first 6 months of the project
• Mid-term review
• Final project review
Content of a DMP
Annex 1 (by month 6)
The DMP should address the points below on a dataset-by-dataset basis:
• Data set reference and name
• Data set description
• Standards and metadata
• Data sharing
• Archiving and preservation (including storage and backup)
Annex 2 (mid-term & final review)
Scientific research data should be easily:
• Discoverable
• Accessible
• Assessable and intelligible
• Useable beyond the original purpose for which it was collected
• Interoperable to specific quality standards
Annex I and II of EC guidelines
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
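The dataset-by-dataset points listed for the DMP could be tracked as a small record per dataset. The following is a minimal sketch in Python, not something prescribed by the EC guidelines: the class name `DatasetEntry`, the helper `missing_points`, and all example values are hypothetical and only illustrate the idea of checking that every point has been addressed.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetEntry:
    """One dataset in a DMP, with one field per point listed on the slide."""
    reference_and_name: str
    description: str
    standards_and_metadata: str
    data_sharing: str
    archiving_and_preservation: str


def missing_points(entry: DatasetEntry) -> list[str]:
    """Return the names of the points that are still empty for this dataset."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Hypothetical example values, for illustration only.
survey = DatasetEntry(
    reference_and_name="PROJ-WP2-survey-2016",
    description="Anonymised responses to an online survey",
    standards_and_metadata="CSV files described with DDI",
    data_sharing="",  # not yet decided, so it is reported as missing
    archiving_and_preservation="Deposit in a data repository at the end of the project",
)

print(missing_points(survey))  # ['data_sharing']
```

Because the DMP is a living document, a check like this could be rerun at each update to see which points still need attention.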
How to write a DMP
Online data management tool: dmponline.dcc.ac.uk/
Create and confirm plan
Plan details
Final DMP: DOI
Versions
Share with partners
Guidance
• Guidance based on the EC guidelines
• Guidance and links from the DCC
Export to various formats
Content of a DMP
• Handling of data
• Collecting and processing
• Methodology and standards
• Open access
• Curation and preservation
Annex I and II of EC guidelines
http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
Handling of data
• Storage and backup
• Additional measures?
• During and after the project
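One possible "additional measure" for storage and backup is to verify that a backup copy is identical to the original file. The sketch below, in Python, compares SHA-256 checksums; the function names `sha256_of` and `backup_matches` are hypothetical, and the approach is only an illustration, not a requirement of the pilot.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """Return True when the backup has the same checksum as the original."""
    return sha256_of(original) == sha256_of(backup)
```

Storing the checksums alongside the data would also let others confirm later that a downloaded file is intact.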
Collecting and analysing data
Collecting and processing
• Be clear what data you use: provide links to data sets you used
• Be clear what methods you use: e.g. lab notebook, end-to-end code/scripts for statistics
• Software can help: R, MATLAB, Python…
Data files: standard formats
Methodology and standards
Use data formats that are:
• Open standard
• In an easily re-usable format
• Commonly used by the research community
Examples of preferred format choices:
Text             .odt, .txt, .xml, .html, .rtf
Tabular data     .csv (comma-separated values), .xml, .rdf, SPSS portable
Images           .tif, .jpeg2000, .png, .svg
Structured data  .xml, .rdf
Any standard used in your field
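As an illustration of choosing an open, easily re-usable format for tabular data, the following minimal Python sketch writes records as CSV using only the standard library. The column names, the values and the helper `to_csv_text` are hypothetical.

```python
import csv
import io

# Hypothetical measurements; in practice these would come from your own data.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 19.0},
]


def to_csv_text(records):
    """Serialise a list of dictionaries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

The resulting text can be opened in any spreadsheet program or statistics package, which is the point of preferring it over a proprietary format.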
Create searchable data
Using metadata
• Data about data
• Machine readable
• Consists of a set of attributes
• Helps prevent inappropriate use
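To make "a set of attributes" that is machine readable more concrete, here is a minimal Python sketch that writes a few Dublin Core elements as XML. The wrapper element `metadata`, the helper `dublin_core_record` and all values are assumptions for illustration; a real repository will usually generate or prescribe its own metadata format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(attributes: dict) -> str:
    """Build a small XML record with one Dublin Core element per attribute."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in attributes.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
record = dublin_core_record({
    "title": "Survey responses (hypothetical example)",
    "creator": "Doe, Jane",
    "date": "2016-05-01",
    "rights": "CC BY 4.0",
})
print(record)
```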
Example
DON’T TRY THIS AT HOME
METADATA STANDARDS
Use standards of your domain
Digital Curation Centre
General
• Dublin Core (DC)
• DataCite Metadata Schema
• Metadata Object Description Schema (MODS)
Humanities
• Text Encoding Initiative (TEI)
• Visual Resources Association Core (VRA)
Archives/Repositories
• DataStaR minimum metadata
• DSpace metadata
Social Science
• Data Documentation Initiative (DDI)
Life Sciences
• Darwin Core
• Integrated Taxonomic Information System (ITIS)
Earth Science
• Directory Interchange Format (DIF)
• Standard for the Exchange of Earthquake Data (SEED)
Ecology
• Ecological Metadata Language (EML)
Geographic/Geospatial
• Federal Geographic Data Committee (FGDC)
• ISO 19115
• Geospatial Interoperability Framework (GIF)
Where to deposit data?
Curation and preservation
Research data repository
• Disciplinary data repository
• Institutional data repository
• Zenodo
• Matches data needs
• Directory of data repositories: www.re3data.org
Re3data.org
• Trustworthy digital repository
• Persistent identifier
• Licenses
• Access
What to deposit?
Open Access to research data
Everything needed to validate results presented in scientific publications
• Data
• Metadata
• Tools: documentation, scripts, software, info about statistical analyses…
• Understandable? Add a readme text file
• Other data described in the Data Management Plan
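A readme text file can be written by hand, but a small template helps keep it consistent across datasets. The following Python sketch is only one possible layout; the function `build_readme`, its parameters and the example values are hypothetical.

```python
def build_readme(title, description, files, license_name, contact):
    """Return plain-text readme content describing a deposited dataset."""
    lines = [title, "=" * len(title), "", description, "", "Files", "-----"]
    # One line per deposited file, explaining what it contains.
    for name, explanation in files.items():
        lines.append(f"{name}: {explanation}")
    lines += ["", f"License: {license_name}", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme = build_readme(
    title="Survey dataset",
    description="Anonymised survey responses and the script used to analyse them.",
    files={
        "responses.csv": "one row per respondent",
        "analysis.py": "script that generates the figures and statistics",
    },
    license_name="CC BY 4.0",
    contact="data-contact@example.org",
)
print(readme)
```

The test to apply to the result is the one given in the notes: would someone unfamiliar with the data or the research setup understand what the data is about?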
What to deposit?
Select
• Confidentiality/anonymization
• Regenerating data cheaper than archiving?
• Version control
• Potentially useful to others
Open data
Open access
Apply an open license:
• Keep it simple
• What intellectual property rights exist in the data?
• Apply a suitable ‘open’ license, e.g. Creative Commons
• Data repositories can provide licenses
• Re3data.org
Example
• Open data
• Understandable for humans
• Machine readable metadata
• Tools
• Open license
Support and information?
OpenAIRE - An Open Knowledge & Research Information Infrastructure
Facilitating the Open Access policy of the European Commission
• www.OpenAIRE.eu offers infrastructure, tools, information and a helpdesk system
Zenodo
• For all content types
• With GitHub integration
• Create communities
• Upload, describe, publish
OpenAIRE
www.openaire.eu/search
Link your data to publications or project
OpenAIRE
Training and support material
Information on:
• Open research data pilot
• Creating a data management plan
• Selecting a data repository
Support material: briefing papers, factsheets, webinars, workshops, FAQs, helpdesk
www.openaire.eu/opendatapilot
Open Research Data Pilot
STEP 1: WRITE A DMP
Deliverable at:
• 6 months
• Mid-term review
• Final review
dmponline.dcc.ac.uk
STEP 2: FIND REPOSITORY
Data repositories:
• Discipline/institutional
• www.re3data.org
• Zenodo
Matches data needs
STEP 3: DEPOSIT DATA
• (Open) data
• Metadata
• Other tools
• Standard file formats
• Standard metadata schema
• Open licences
SUPPORT
Supporting infrastructure and information:
• EC guidelines
• OpenAIRE.eu
• www.dcc.ac.uk
Designed by Freepik
Questions!
www.openaire.eu
@openaire_eu
Facebook.com/groups/openaire
https://www.linkedin.com/groups/OpenAIRE3893548
Emilie.Hermans@UGent.be
info@openaccess.be

Webinar: Data management and the Open Research Data Pilot in Horizon 2020


Editor's Notes

  • #2 http://slideplayer.com/slide/6631125/#
  • #3 1. Prevents data loss: 80% of data is lost after 10 years. Data is fragile, and reproducibility is very difficult without data. 2. Maximize usefulness and build much more efficiently on previous work. 3. Fosters creativity and participation of citizens, and increases transparency. 4. Data tend to have a (much!) longer shelf life than interpretation. After accounting for other factors affecting citation rate, we find a robust citation benefit from open data (see footnote 1).
  • #4 Horizon 2020 includes a limited and flexible pilot action on open access to research data, with opt-outs and safeguards. Participating projects must develop a Data Management Plan (DMP) specifying which data will be openly accessible.
  • #5 If your project stems from one of these Horizon 2020 areas, you are automatically part of the pilot. Costs related to data management in Horizon 2020 are eligible for reimbursement during the duration of the project (see Article 6.2.D.3 of the Annotated Model Grant Agreement)
  • #6 Participating projects must: develop a data management plan in the first 6 months of the project and keep it up to date throughout the project; deposit their research data in a suitable research data repository; make sure third parties can freely access, mine, exploit, reproduce and disseminate their data; and make clear what tools will be needed to use the raw data to validate research results, or provide the tools themselves.
  • #8 A data management plan (DMP) is a formal document that outlines how you will handle your data both during your research and after the project is completed. The goal of a data management plan is to consider the many aspects of data management, metadata generation, data preservation, and analysis before the project begins; this ensures that data are well managed in the present and prepared for preservation in the future.
  • #9 - Discoverable: a standard identification mechanism such as a DOI. - Accessible: how easily can the data be accessed, and are there any licenses or embargo periods attached? - Assessable and intelligible: is the data provided in such a way that judgements can be made about its reliability, such as in peer review alongside a scientific paper?
  • #10 DMPonline: a web-based tool to help researchers write a DMP.
  • #11 Includes a template for Horizon 2020 projects, with guidance.
  • #15 Checklist for a Data Management Plan: a list of questions and guidance that researchers may find useful when writing data management plans.
  • #17 Storage of data during the project: what are you going to do with the data during the project? Collecting: how will data be collected? What will you do with the data? E.g. for a survey: will it include a disclaimer about what will happen with the data? Types of data: which standards (formats, metadata) apply, and how will you describe the data? Access policy for your data: it can be open or partly open; do you need to take extra measures to secure your data? Post-project plans: how will you preserve your data?
  • #18 Will the data be stored and backed up appropriately during the research project? For example, on managed university filestores rather than external hard drives. Arrange the backup and storage procedures that are most suited to the partners and the nature of your project.
  • #19 Collecting: how will data be collected? What will you do with the data? E.g. for a survey: will it include a disclaimer about what will happen with the data? Provide links to data sets you used or, if licenses and copyright allow it, upload the original data set as well. Provide end-to-end code/scripts for the generation of figures and statistics. Keep in mind: will someone who is not familiar with the data or the research setup understand what the data is about?
  • #20 Try to make the barriers to viewing your data as low as possible. Use open file formats. Avoid Word, PDF and Excel files. You can use PDF/A for archiving or if the layout matters.
  • #21 Metadata assures accessibility of the data. It is data about data, used to discover and disclose data: resource descriptions. A metadata record consists of a set of attributes or elements necessary to describe the data in question. It is structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information source. Basically, data is read by humans; metadata is read by computers. Metadata helps prevent inappropriate use due to misunderstanding of the research purpose or parameters.
  • #23 Standards-based metadata is generally preferable, but where no appropriate standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
  • #26 Trustworthy Digital repository: either supports a repository standard or is certified
  • #27 Metadata. Other data, including associated metadata, as specified and within the deadlines laid down in the Data Management Plan, that is, according to the individual judgement of each project: for instance, curated data not directly attributable to a publication, or raw data. Documentation: codebooks, lab journals, informed consent forms… required to enable reuse of the data. Will people not involved in the project understand what the data is about and how it has been processed? Readme file: a plain-text file about your data. Keep in mind: will someone who is not familiar with the data or the research setup understand what the data is about?
  • #29 Keep it simple: there is no requirement that every dataset must be made open right now. Starting out by opening up just one dataset, or even one part of a large dataset, is fine; of course, the more datasets you can open up, the better. Open licenses: legally sound licensing. CC0: public domain, waive copyright. CC BY: Attribution. You must give appropriate credit, provide a link to the license, and indicate if changes were made. No additional restrictions.
  • #30 NUP84 proteins. MRC is a standard file format for electron density. The txt file explains the parameters used.