The document provides an overview of the Open Research Data Pilot, the data management plan, and OpenAIRE tools and services to support implementation of FAIR data management plans. It discusses the aims of the Open Research Data Pilot, which Horizon 2020 projects are required to participate in it, and the types of data that must be deposited. It also covers topics like creating a data management plan, selecting a repository, making data FAIR, and OpenAIRE support resources like briefing papers, webinars, and the Zenodo repository.
LIBER Webinar: Turning FAIR Data Into RealityLIBER Europe
These slides relate to a LIBER Webinar given on 23 April 2018. Turning FAIR Data Into Reality — Progress and Plans from the European Commission FAIR Data Expert Group.
In this webinar, Simon Hodson, Executive Director of CODATA and Chair of the FAIR Data Expert Group, and Sarah Jones, Associate Director at the Digital Curation Centre and Rapporteur, reported on the Group’s progress.
20190527_Brecht Wyns & Christophe Bahim _ FAIR data maturity modelOpenAIRE
Presented by Brecht Wyns & Christophe Bahim (RDA)
during the OpenAIRE workshop "Research policy monitoring in the era of Open Science and Big Data" taking place in Ghent, Belgium on May 27th and 28th 2019
Day 1: Monitoring and Infrastructure for Open Science
https://www.openaire.eu/research-policy-monitoring-in-the-era-of-open-science-and-big-data-the-what-indicators-and-the-how-infrastructures
Presented by Helena Cousijn (FREYA)
during the OpenAIRE workshop "Research policy monitoring in the era of Open Science and Big Data" taking place in Ghent, Belgium on May 27th and 28th 2019
Day 1: Monitoring and Infrastructure for Open Science
https://www.openaire.eu/research-policy-monitoring-in-the-era-of-open-science-and-big-data-the-what-indicators-and-the-how-infrastructures
Connecting the dots - e-Infra services for open scienceOpenAIRE
Moving from open access towards services for open science, we present OpenAIRE, OpenMinTeD and OpenUP, three EU projects that build services to facilitate and accelerate open science.
The first workshop of the series "Services to support FAIR data" took place in Prague during the EOSC-hub week (on April 12, 2019).
Speaker: Kostas Repanas (EC DG RTD)
Webinar: Data management and the Open Research Data Pilot in Horizon 2020OpenAccessBelgium
This webinar provides information about strategies for successful Research Data Management, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management.
At the end of the session participants will be able to:
- Understand the basic principles and importance of RDM
- Set clear goals regarding data curation, preservation and sharing
- Comply with the requirements of the Research Data Pilot
- Draft a Data Management Plan
- Identify RDM resources and tools
What I wish I’d known at the start! Lessons learned the hard way when setting up RDM services;
Stephen Grace, London South Bank University, Sarah Jones, DCC; Research Data Network
RDMkit, a Research Data Management Toolkit. Built by the Community for the ...Carole Goble
https://datascience.nih.gov/news/march-data-sharing-and-reuse-seminar 11 March 2022
Starting in 2023, the US National Institutes of Health (NIH) will require institutes and researchers receiving funding to include a Data Management Plan (DMP) in their grant applications, including making their data publicly available. Similar mandates are already in place in Europe; for example, a DMP is mandatory in Horizon Europe projects involving data.
Policy is one thing - practice is quite another. How do we provide the necessary information, guidance and advice for our bioscientists, researchers, data stewards and project managers? There are numerous repositories and standards. Which is best? What are the challenges at each step of the data lifecycle? How should different types of data be handled? What tools are available? Research Data Management advice is often too general to be useful, and specific information is fragmented and hard to find.
ELIXIR, the pan-national European Research Infrastructure for Life Science data, aims to enable research projects to operate “FAIR data first”. ELIXIR supports researchers across their whole RDM lifecycle, navigating the complexity of a data ecosystem that bridges from local cyberinfrastructures to pan-national archives and across bio-domains.
The ELIXIR RDMkit (https://rdmkit.elixir-europe.org) is a toolkit built by the biosciences community, for the biosciences community, to provide the RDM information it needs. It is a framework for advice and best practice for RDM and acts as a hub of RDM information, with links to tool registries, training materials, standards, and databases, and to services that offer deeper knowledge for DMP planning and FAIR-ification practices.
Launched in March 2021, over 120 contributors have provided nearly 100 pages of content and links to more than 300 tools. Content covers the data lifecycle and specialized domains in biology, national considerations and examples of “tool assemblies” developed to support RDM. It has been accessed from over 123 countries, and the top of the access list is … the United States.
The RDMkit is already a recommended resource of the European Commission. The platform, editorial, and contributor methods helped build a specialized sister toolkit for infectious diseases as part of the recently launched BY-COVID project. The toolkit’s platform is the simplest we could manage - built on plain GitHub - and the whole development and contribution approach tailored to be as lightweight and sustainable as possible.
In this talk, Carole and Frederik will present the RDMkit; aims and context, content, community management, how folks can contribute, and our future plans and potential prospects for trans-Atlantic cooperation.
Data policy must be partnered with data practice. Our researchers need to be the best informed in order to meet these new data management and data sharing mandates.
This presentation uses a long-term case study to explore the socio-scientific aspects influencing what data products are created and made available for use. We examine two major satellite remote-sensing product collections from the National Snow and Ice Data Center—one on sea ice extent and another on Greenland ice sheet melt. We examine how the products and their curation have evolved over time in response to environmental events and increasing scientific and public demand over several decades. The products have evolved in conjunction with the needs of a changing and expanding designated user community. These changes in the user community were driven by increased interest in the Arctic partly because of the rapid change in the Arctic as characterized in these data, but also because of the increasing awareness (and controversy) around climate change and its impact.
We find that a data product development cycle supported by a data product team with multiple perspectives is key to mobilizing scientific knowledge to multiple stakeholders. Furthermore, the expertise and approaches to making data open and truly useful must continually adapt to new perceptions, needs, and events. Effective data access is an ongoing process, not a one-time event.
References
Baker, K. S., Duerr, R. E., and Parsons, M. A. (2016) Scientific knowledge mobilization: Co-evolution of data products and designated communities. International Journal of Digital Curation 10 (2): 110-135. https://doi.org/10.2218/ijdc.v10i2.346
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 14, 2016...EUDAT
| www.eudat.eu | 2nd Session: July 14, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
EUDAT & OpenAIRE Webinar: How to write a Data Management Plan - July 7, 2016|...EUDAT
| www.eudat.eu | 1st Session: July 7, 2016.
In this webinar, Sarah Jones (DCC) and Marjan Grootveld (DANS) talked through the aspects that Horizon 2020 requires from a DMP. They discussed examples from real DMPs and also touched upon the Software Management Plan, which for some projects can be a sensible addition.
A presentation given on the Horizon 2020 open data pilot as part of a series of OpenAIRE webinars for Open Access week 2014 - http://www.fosteropenscience.eu/event/openaire-webinars-during-oa-week-2014
The Horizon 2020 Open Data Pilot - OpenAIRE webinar (Oct. 21 2014) by Sarah J...OpenAIRE
Sarah Jones (HATII, Digital Curation Center) will provide more information on the Open Research Data Pilot in H2020: who should participate and how to comply (in collaboration with FOSTER)
Date: Tuesday, October 21 2014
Big Data Europe: SC6 Workshop 3: The European Research Data Landscape: Opport...BigData_Europe
Slides of the keynote at the 3rd Big Data Europe SC6 Workshop co-located at SEMANTiCS2018 in Amsterdam (NL) on: The European Research Data Landscape: Opportunities for CESSDA by Peter Doorn, Director DANS, Chair, Science Europe W.G. on Research Data. Chair, CESSDA ERIC General Assembly
Presentation by Joy Davidson, Digital Curation Centre (UK) at the FOSTER event: Data Management Plan and Social Impact of Research. Universitat Jaume I, 27 May 2016
Presentation by Mireia Alcalá, Information Resources Officer at CSUC, given at the online workshop "Research Data Management & Open Science" organised by IDIBELL on 2 November 2020.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable)Tom Plasterer
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data – and address opportunities and real concerns.
• Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
Webinar: Data management and the Open Research Data Pilot in Horizon 2020 OpenAIRE
This webinar provides information about strategies for successful Research Data Management, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management.
At the end of the session participants will be able to:
- Understand the basic principles and importance of RDM
- Set clear goals regarding data curation, preservation and sharing
- Comply with the requirements of the Research Data Pilot
- Draft a Data Management Plan
- Identify RDM resources and tools
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATTony Ross-Hellauer
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
Research Data Management: An Introductory Webinar from OpenAIRE and EUDATOpenAIRE
OpenAIRE and EUDAT co-present this webinar which aims to introduce researchers and others to the concept of research data management (RDM). As well as presenting the benefits of taking an active approach to research data management – including increased speed and ease of access, efficiency (fund once, reuse many times), and improved quality and transparency of research – the webinar will advise on strategies for successful RDM, resources to help manage data effectively, choosing where to store and deposit data, the EC H2020 Open Data Pilot and the basics of data management, stewardship and archiving.
Webinar recording available: http://www.instantpresenter.com/eifl/EB57D6888147
Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
1. Credits: OpenAIRE team
Sarah Jones, Digital Curation Centre (DCC), UK
Marjan Grootveld, DANS (NL)
Natalia Manola, ATR (GR)
FAIR Data Management: best practices and open issues
Paola Gargiulo, OpenAIRE NOAD/Cineca
OpenAIRE and Eudat services and tools to support FAIR DMP implementation
2. Agenda
• The Open Research Data Pilot
• The data management plan
• OpenAIRE tools and services for the Data Pilot
• EUDAT data services
3. Open Research Data Pilot (2015-2016): aims
To make the research data generated by selected Horizon 2020 projects accessible with as few restrictions as possible, while at the same time protecting sensitive data from inappropriate use.
EC: information already paid for by the public should not be paid for again.
Open data is data that is free to access and reuse.
4. To whom does the Data Pilot apply?
Current situation 2015-2016:
• Researchers funded by Horizon 2020 within 9 specified call areas.
• Opt out and opt in are possible and are being used.
• Call areas: https://www.openaire.eu/opendatapilot
As of 2017:
• European Cloud Initiative to give Europe a global lead in the data-driven economy.
• Open data will become the default option. The pilot will be extended to cover all call areas. Opting out remains possible.
• Press release: http://europa.eu/rapid/press-release_IP-16-1408_en.htm
Daniel Spichtinger (EC) at OpenCon 14-11-15: 3,699 Horizon 2020 signed grant agreements; 149/431 projects in core areas opted out; 409/3,268 projects in other areas opted in.
5. Which research has to participate in the pilot? (2015-2016)
• Future and Emerging Technologies
• Research infrastructures (new: coverage of the whole area)
• Leadership in enabling and industrial technologies – Information and Communication Technologies
• Nanotechnologies, Advanced Materials, Advanced Manufacturing and Processing, and Biotechnology: ‘nanosafety’ and ‘modelling’ topics (new)
• Societal Challenge: Food security, sustainable agriculture and forestry, marine and maritime and inland water research and the bioeconomy – selected topics as specified in the work programme (new)
• Societal Challenge: Climate Action, Environment, Resource Efficiency and Raw Materials – except raw materials
• Societal Challenge: Europe in a changing world – inclusive, innovative and reflective societies
• Science with and for Society
7. Two types of data:
• Data, including metadata, needed to validate the results in scientific publications
• Other data, including metadata, as specified in the Data Management Plan, like raw data
8. The following slides come from the EC’s open access team and provide an overview of the key points. Content from Jean-Francois Dechamp and colleagues.
Mail: RTD-open-access@ec.europa.eu
Web: http://ec.europa.eu/research/openscience/index.cfm
Twitter: @OpenAccessEC
RDA National Event in Italy, 14-15 November 2016
9. Publications
Openly accessible and minable. APCs are eligible costs.
Research data
Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated free of charge for the user.
13. Three top reasons to opt out
Whether a (proposed) project participates in the ORD Pilot or chooses to opt out does not affect the evaluation of that project. Proposals will not be penalised for opting out.
14. Reasons for opting out:
• participation is incompatible with the Horizon 2020 obligation to protect results that can reasonably be expected to be commercially or industrially exploited;
• participation is incompatible with the need for confidentiality in connection with security issues;
• participation is incompatible with rules on protecting personal data;
• the project will not generate / collect any research data; or
• there are other legitimate reasons not to take part in the Pilot.
Note that partial opt out is possible – and preferable to full opt out!
17. FAIR data
• Findable – assign persistent IDs, provide rich metadata, register in a searchable resource...
• Accessible – retrievable by their ID using a standard protocol; metadata remain accessible even if data aren’t...
• Interoperable – use formal, broadly applicable languages, standard vocabularies, qualified references...
• Reusable – rich, accurate metadata, clear licences, provenance, use of community standards...
18. Findable
• Use metadata and specify standards for metadata creation (if any). If there are no standards in your discipline, describe what type of metadata will be created and how
• Search keywords
• Persistent and unique identifiers such as DOIs or other handles
• File and folder naming conventions
• Versioning of the datasets and clear version numbers
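The naming and versioning bullets above can be made concrete in a few lines. This is a minimal sketch; the pattern itself (project, label, date, version) is illustrative and not prescribed by the slides.

```python
import re
from datetime import date

def dataset_filename(project: str, label: str, version: str,
                     created: date, ext: str = "csv") -> str:
    """Build a consistent, sortable dataset file name.

    Pattern (an assumption for illustration):
    <project>_<label>_<YYYY-MM-DD>_v<major.minor>.<ext>
    """
    if not re.fullmatch(r"\d+\.\d+", version):
        raise ValueError("version must look like '1.0'")

    def slug(s: str) -> str:
        # Normalise free-text parts: lowercase, spaces -> hyphens
        return re.sub(r"[^a-z0-9-]", "", s.lower().replace(" ", "-"))

    return f"{slug(project)}_{slug(label)}_{created.isoformat()}_v{version}.{ext}"

print(dataset_filename("OpenAIRE", "survey responses", "1.2", date(2016, 11, 14)))
# openaire_survey-responses_2016-11-14_v1.2.csv
```

Because the date and version are embedded in the name, files sort chronologically and every released version stays distinguishable.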
19. Metadata and documentation
• Metadata and documentation are needed to find and understand research data
• Think about what others would need in order to find, evaluate, understand, and reuse your data
• Get others to check the metadata to improve quality
• Use standards to enable interoperability
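As a sketch of what "use standards" means in practice, here is a minimal descriptive record using the mandatory DataCite property names; the DOI and values are invented for illustration, and a real record would carry more detail.

```python
# A minimal, DataCite-flavoured descriptive record for a dataset.
# Field names follow DataCite's mandatory properties; values are invented.
record = {
    "identifier": "10.5281/zenodo.0000000",   # hypothetical DOI
    "creators": [{"name": "Jones, Sarah"}],
    "title": "Survey responses, cleaned",
    "publisher": "Zenodo",
    "publicationYear": "2016",
    "resourceType": "Dataset",
}

# A simple completeness check, the kind a colleague (or a repository)
# might run when reviewing your metadata.
REQUIRED = {"identifier", "creators", "title",
            "publisher", "publicationYear", "resourceType"}

missing = REQUIRED - record.keys()
print("complete" if not missing else f"missing: {sorted(missing)}")
```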
20. Where to find metadata standards
Metadata Standards Directory: a broad, disciplinary listing of standards and tools, maintained by an RDA group. http://rd-alliance.github.io/metadata-directory
Biosharing: a portal of data standards, databases, and policies for the life, environmental and biomedical sciences. https://biosharing.org
21. Accessible
• Explain which data can’t be shared openly, if any
• Specify how access will be provided in case of restrictions, e.g. through a data committee, a licence, or arranged with the repository
• Will methods or software tools needed to access the data (if any) be included or documented?
• Deposit the data and associated metadata, documentation and code, preferably in certified repositories which support Open Access
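"Retrievable by their ID using a standard protocol" (slide 17) is very literal for DOIs: the doi.org resolver speaks plain HTTPS, and machine-readable metadata can be requested via HTTP content negotiation. The sketch below only composes the request rather than sending it; the DataCite JSON media type is a real content-negotiation option, but treat the details as an assumption to verify against your repository.

```python
def doi_request(doi: str, metadata: bool = False):
    """Compose the URL and headers to dereference a DOI over HTTPS.

    With metadata=True, ask the resolver for a machine-readable
    DataCite JSON record instead of the landing page.
    """
    url = f"https://doi.org/{doi}"
    headers = {"Accept": "application/vnd.datacite.datacite+json"} if metadata else {}
    return url, headers

url, headers = doi_request("10.2218/ijdc.v10i2.346", metadata=True)
print(url)  # https://doi.org/10.2218/ijdc.v10i2.346
```

Any HTTP client can then issue the request; the point is that access goes through one stable identifier, not a repository-specific link that may rot.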
22. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
What to deposit?
a. the data needed to validate the results presented in scientific publications, including the metadata;
b. any other data, including the metadata, as specified in the DMP;
c. plus, for a-b, the documentation and the tools that are needed to validate the results, e.g. specialised software or software code, algorithms and analysis protocols (when possible, these instruments themselves).
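For depositing, Zenodo (the catch-all repository OpenAIRE recommends) exposes a REST API. The sketch below only composes a create-deposition request; actually sending it needs a personal access token, so the token and the metadata values here are placeholders, and the exact field set should be checked against the Zenodo API documentation.

```python
def new_deposition_request(token: str, metadata: dict):
    """Compose (url, query params, JSON body) for creating a Zenodo deposition."""
    url = "https://zenodo.org/api/deposit/depositions"
    params = {"access_token": token}
    body = {"metadata": metadata}
    return url, params, body

url, params, body = new_deposition_request(
    "PLACEHOLDER-TOKEN",  # a real personal access token goes here
    {
        "title": "Survey responses, cleaned",
        "upload_type": "dataset",
        "description": "Data underlying publication X",
        "creators": [{"name": "Jones, Sarah"}],
    },
)
print(url)
```

With a real token, the request would be sent as an HTTP POST with the JSON body, and the response would include the deposition id used for subsequent file uploads and publishing.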
24. Interoperable
• Interoperability applies to data and metadata, to data exchange formats and protocols
• Specify what data and metadata vocabularies, standards or methodologies you will follow to facilitate interoperability
• Use a standard vocabulary to allow inter-disciplinary interoperability, or provide a mapping from your vocabulary to more commonly used ontologies
• Aim for compliance with globally accepted practices
RDM Seminar @ ISERD, Tel Aviv - Oct 1, 2016
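The "mapping from your vocabulary to more commonly used ontologies" bullet can be as simple as a lookup table. In this sketch the local field names are invented; the Dublin Core term IRIs are real.

```python
# Map project-local field names to Dublin Core terms.
LOCAL_TO_DCTERMS = {
    "made_by":   "http://purl.org/dc/terms/creator",
    "name":      "http://purl.org/dc/terms/title",
    "posted_on": "http://purl.org/dc/terms/issued",
}

def to_standard(record: dict) -> dict:
    """Re-key a local record with standard term IRIs, dropping unmapped keys."""
    return {LOCAL_TO_DCTERMS[k]: v for k, v in record.items()
            if k in LOCAL_TO_DCTERMS}

mapped = to_standard({"made_by": "S. Jones", "name": "Survey data", "internal_id": 7})
print(len(mapped))  # 2
```

Publishing such a mapping alongside the data lets other groups interpret your fields without adopting your internal naming.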
25. • Clarify licences early on
• License the data to permit the widest reuse possible
• Specify a data embargo, if needed
• If data re-use by third parties is restricted, explain why
• How long will the data remain reusable?
• Describe data quality assurance processes
Reusable
26. www.dcc.ac.uk/resources/how-guides/license-research-data
License research data openly
DCC guide outlines the pros and cons of
each approach and gives practical advice
on how to implement your licence
CREATIVE COMMONS LIMITATIONS
NC Non-Commercial
What counts as commercial?
ND No Derivatives
Severely restricts use
These clauses are not open licenses
Horizon 2020 Open Access guidelines point to CC0 or CC BY
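The point that NC and ND clauses are not open licenses can be captured mechanically. A small illustrative sketch using SPDX-style license identifiers (the identifier strings are standard SPDX; the helper itself is an assumption made for this example):

```python
def is_open_license(spdx_id: str) -> bool:
    """Return True if a Creative Commons SPDX identifier permits
    unrestricted reuse, i.e. carries no NC or ND clause."""
    clauses = spdx_id.upper().split("-")
    return not ({"NC", "ND"} & set(clauses))

# CC0 and CC BY are the licenses the H2020 OA guidelines point to:
assert is_open_license("CC0-1.0")
assert is_open_license("CC-BY-4.0")
# NC/ND variants restrict reuse and are not open licenses:
assert not is_open_license("CC-BY-NC-4.0")
assert not is_open_license("CC-BY-ND-4.0")
```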
28. What is a data management plan?
A plan written at the start of a project to define:
• how the data will be created
• how it will be documented
• who will access it
• where it will be stored
• who will back it up
• whether (and how) it will be shared & preserved
DMPs are often submitted as part of grant applications, but are useful
whenever researchers are creating data
The DMP is a living document.
You are not required to provide
detailed answers to all the
questions in the first version of
the DMP (due M6)
Explain any selection criteria in the DMP
29. When to submit the DMP
• Note that the Commission does NOT require applicants to submit a
DMP at the proposal stage.
• A DMP is therefore NOT part of the evaluation.
• DMPs are a deliverable
• Note that the Commission requires updates. A DMP is a living or
“active” document.
30. What aspects of RDM should be in a DMP?
What data will be created (format, types, volume...)
Standards and methodologies to be used (incl. metadata)
How ethics and Intellectual Property will be addressed
Plans for data sharing and access
Strategy for long-term preservation Create
Document
Use
Store
Share
Preserve
A DMP is a plan to share!
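A DMP can also be captured in machine-actionable form. The sketch below follows the general shape of the RDA DMP Common Standard (a JSON serialisation with a top-level `dmp` object); the structure is abridged and all field values are invented:

```python
import json

# Abridged, machine-actionable DMP sketch (values invented):
madmp = {
    "dmp": {
        "title": "DMP for project EXAMPLE",
        "created": "2019-06-01",
        "dataset": [
            {
                "title": "Survey responses",
                "type": "dataset",
                "issued": "2020-01-15",
                "distribution": [
                    {"access_url": "https://zenodo.org/record/0000000",
                     "license": [{"license_ref":
                                  "https://creativecommons.org/licenses/by/4.0/"}]}
                ],
            }
        ],
    }
}

serialized = json.dumps(madmp, indent=2)
```

Because the plan is structured data rather than free text, updates at later stages of the project (new datasets, changed access conditions) become diffs rather than rewrites.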
31. What should be deposited?
• The data needed to validate results in scientific publications (minimally!).
• The associated metadata: the dataset’s creator, title, year of publication, repository,
identifier etc.
• Follow a metadata standard in your line of work, or a generic standard, e.g. Dublin Core or DataCite, and be FAIR.
• The repository will assign a persistent ID to the dataset: important for discovering and citing the
data.
• Documentation like code books, lab journals, informed consent forms – domain-
dependent, and important for understanding the data and combining them with other
data sources.
• Software, hardware, tools, syntax queries, machine configurations – domain-
dependent, and important for using the data. (Alternative: information about the
software etc.)
Basically, everything that is needed to replicate a study should be available for others.
Research Data Alliance (RDA): http://rd-alliance.github.io/metadata-directory/standards/
FAIR Guiding Principles for scientific data management: http://www.nature.com/articles/sdata201618
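The minimal metadata listed above (creator, title, year of publication, repository, identifier) maps directly onto the mandatory properties of the DataCite schema. A sketch of such a record as a Python dict; the field names follow the general shape of DataCite metadata and the values are invented:

```python
# DataCite-style record for an invented dataset:
record = {
    "identifier": {"identifier": "10.5281/zenodo.0000000",
                   "identifierType": "DOI"},
    "creators": [{"creatorName": "Jansen, A."}],
    "titles": [{"title": "Sea-surface temperatures 2015"}],
    "publisher": "Zenodo",
    "publicationYear": "2016",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
}

def has_mandatory_fields(rec: dict) -> bool:
    """Check that the core descriptive properties are all present."""
    required = {"identifier", "creators", "titles",
                "publisher", "publicationYear"}
    return required <= rec.keys()
```

A repository will usually generate most of this for you at deposit time; the check above only illustrates what "the associated metadata" minimally means.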
32. Archive the data openly,
unless…
• Confidentiality and security issues can be good reasons not to publish or share all data. Note in the DMP the reasons for not giving access, and deposit that part of the data under a Restricted Access regime.
• When regenerating data would be cheaper than archiving, don’t
archive. Spend time on selecting what data you’ll need and
want to retain. Motivate your criteria in the DMP.
See http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
For selection criteria see https://www.openaire.eu/opendatapilot
Grant Agreement, Art. 29.3, Open Access to research data:
33. A DMP is about ‘keeping’ data
• Storing data ≠ archiving data
• Archived data ≠ findable data
• Findable ≠ accessible
• Accessible ≠ understandable
• Understandable ≠ usable
• a USB stick is not safe
• Figshare is not a Trustworthy Digital Repository
• a persistent identifier is essential but no guarantee for usability
• Data in a proprietary format are not sustainable
34. How much does it cost? Who pays?
• What are the costs for making data FAIR in your project?
• Resources needed for long term preservation
• Check the UK Data Service Costing model
• The High Level Expert Group on the European Open Science Cloud
recommends that “well budgeted data stewardship plans should be
made mandatory and we expect that on average about 5% of
research expenditure should be spent on properly managing and
stewarding data”
• Who pays? How?
UKDS model: http://www.data-archive.ac.uk/create-manage/planning-for-sharing/costing
HLEG report: http://ec.europa.eu/research/openscience/pdf/realising_the_european_open_science_cloud_2016.pdf#view=fit&pagemode=none
38. Human Network e-infrastructure
NOADS: National Open Access Desks
Monitor and foster the adoption of Open
Access policies at the local level
Support researchers in the implementation of the Open Data Pilot
FP7 post grant APCs Pilot
e-infrastructure for monitoring impact of OA
mandates and research projects
OpenAIRE guidelines for metadata exchange
Zenodo Repository for the deposition of research
products
THE POINT OF REFERENCE FOR OPEN ACCESS IN EUROPE
50 Partners: EU countries, data centers, universities, libraries, repositories
Open Access infrastructure
for research in Europe
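The OpenAIRE guidelines for metadata exchange build on the OAI-PMH protocol, which lets the infrastructure harvest records from compliant repositories. A sketch of how a harvester might construct such a request (the endpoint URL is a placeholder, and the request is only built here, not sent):

```python
from urllib.parse import urlencode

def build_listrecords_url(base_url: str, metadata_prefix: str = "oai_dc",
                          set_spec: str = "") -> str:
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Placeholder endpoint; substitute a real repository's OAI-PMH base URL:
url = build_listrecords_url("https://example.org/oai")
# The resulting XML could then be fetched with urllib.request.urlopen
# and parsed with xml.etree.ElementTree.
```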
39. Integrated Scientific Information System
17.3 million unique publications
760+ validated data providers
370K publications linked to projects from 6 funders
28K datasets linked to publications
3.5K links to software repositories
33K organizations
Entities in the graph: Organizations, Projects, Authors, Datasets, Publications, Data Providers, Software, Facilities, Methods, Research Communities
OpenAIRE-Connect
From January 2017
40. OpenAIRE support
materials
Briefing papers, factsheets,
webinars, workshops, FAQs
Information on
• Open Research Data Pilot
• Creating a data
management plan
• Selecting a data repository
• Personal data
Developing guidance to add
to DMPonline
https://www.openaire.eu/opendatapilot
https://www.openaire.eu/support
41. Information at the OpenAIRE website
• Open Research Data Pilot
https://www.openaire.eu/opendatapilot
• What is the pilot? Which H2020 strands are required to participate? What
practical steps should the researcher take?
• Create a Data Management Plan
https://www.openaire.eu/opendatapilot-dmp
• Information about how to create a Data Management Plan. First steps; When to
write and revise your Data Management Plan
• Select a Data Repository
https://www.openaire.eu/opendatapilot-repository
• Information about how to select a repository
• Frequently Asked Questions about the Open Research Data
Pilot
https://www.openaire.eu/support/faq
44. Briefing Paper: RDM
OpenAIRE Research Data Management Briefing Paper
• https://www.openaire.eu/briefpaper-rdm-infonoads
• This extensive briefing paper gives an overview of
Research Data Management with practical sections
about data management planning, and archiving the
research data for reuse.
45. OpenAIRE services
• Researchers
• Zenodo for all types of publications, data and software
• Claiming – linking research results
• Amnesia, an anonymization tool for all
• Data providers – Interoperability Guidelines, validation,…
• Project coordinators – reporting
• Funders and institutions – monitoring
• Research communities – gathering, monitoring all research
DASHBOARDS
46. Zenodo
Multi-disciplinary repository used for the long-tail of research
data
• An OpenAIRE-CERN joint effort
• Multidisciplinary repository accepting
– Multiple data types
– Publications
– Software – link to Github
• Assigns a Digital Object Identifier (DOI); up to 50GB per dataset
• Links funding, publications, data & software
www.zenodo.org
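Zenodo also exposes a REST API, so depositing can be scripted as well as done through the web form. The sketch below prepares (but does not send) a request creating a new deposition; the metadata values are invented and `TOKEN` stands in for a personal access token (see developers.zenodo.org for the authoritative API description):

```python
import json
import urllib.request

# Base URL of Zenodo's deposit API:
API = "https://zenodo.org/api/deposit/depositions"

def new_deposition_request(token: str, metadata: dict) -> urllib.request.Request:
    """Prepare (but do not send) a request creating a new deposition."""
    body = json.dumps({"metadata": metadata}).encode()
    return urllib.request.Request(
        f"{API}?access_token={token}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Metadata values are invented for the example:
req = new_deposition_request("TOKEN", {
    "title": "Survey responses",
    "upload_type": "dataset",
    "description": "Anonymised survey responses.",
    "creators": [{"name": "Jansen, A."}],
})
# urllib.request.urlopen(req) would perform the actual call.
```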
47. What is DMPonline?
• A web-based tool to help researchers write Data
Management and Sharing Plans
• Includes requirements and guidance from funders,
universities and other groups
• Developed by the Digital Curation Centre
48. How to write a DMP
• Template available from https://dmponline.dcc.ac.uk/
• And from a few national DMPonline sites, e.g. in Spain and Belgium
See https://www.openaire.eu/opendatapilot-dmp - Spain: http://pgd.consorciomadrono.es/ - Belgium: pilot and therefore limited to authorised persons
50. DMPonline
A web-based tool to help researchers write DMPs
Includes a template for Horizon 2020
Guidance from EUDAT and OpenAIRE being added
https://dmponline.dcc.ac.uk
52. Deliver the DMP
EC: “Since DMPs are expected to mature during the project, more
developed versions of the plan can be included as additional
deliverables at later stages. (…) New versions of the DMP should be
created whenever important changes to the project occur due to
inclusion of new data sets, changes in consortium policies or external
factors.”
53. Where to find a repository?
More information: https://www.openaire.eu/opendatapilot-repository
Zenodo: http://www.zenodo.org/
54. How to select a repository? (1/2)
Main criteria for choosing a data repository:
• Certification as a ‘Trustworthy Digital Repository’, with an explicit
ambition to keep the data available in the long term.
• Network of trustworthy digital repositories for long-term preservation of the data
after the research is finished.
• Three common certification standards for TDRs:
Data Seal of Approval: http://datasealofapproval.org/en/
nestor seal for DIN 31644: http://www.langzeitarchivierung.de/Subsites/nestor/EN/nestor-Siegel/siegel_node.html
ISO 16363: http://www.iso16363.org/
55. Main criteria for choosing a data repository:
• Certification as a ‘Trustworthy Digital Repository’, with an explicit ambition
to keep the data available in the long term.
• Matches your particular data needs and is FAIR compliant: e.g. certain file
formats; mixture of Open and Restricted Access. So contact the repository
of your choice when writing the first version of your DMP, or earlier.
• Provides guidance on metadata and on how to cite the data that has been
deposited.
• Gives your submitted dataset a persistent and globally unique identifier:
for sustainable citations – both for data and publications – and to link back
to particular researchers and grants.
How to select a repository? (2/2)
https://www.openaire.eu/opendatapilot-repository
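Persistent identifiers only support sustainable citation if they are written consistently. A small sketch that normalises the common ways a DOI is written into the resolvable https://doi.org/ form (the DOI used in the example is invented):

```python
def normalize_doi(doi: str) -> str:
    """Normalise a DOI string to its resolvable https://doi.org/ form."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"https://doi.org/{doi}"

# All of these spellings resolve to the same citation form (DOI invented):
assert normalize_doi("doi:10.5281/zenodo.0000000") == \
       "https://doi.org/10.5281/zenodo.0000000"
assert normalize_doi("https://dx.doi.org/10.5281/zenodo.0000000") == \
       "https://doi.org/10.5281/zenodo.0000000"
```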
57. EUDAT project
https://eudat.eu/
EUDAT offers common data services
to both research communities and
individuals through a network of 35
European organisations.
58. EUDAT offers data
services
EUDAT services are designed, built and implemented based on user
community requirements.
Example communities: Physical Sciences & Engineering; Materials & Analytical Facilities; MAPPER; Biomedical & Medical Sciences
60. e.g. B2DROP – a solution for researchers and scientists to:
• store and exchange data with colleagues and team members, including research data not finalized for publishing
• share data with fine-grained access controls
• synchronize multiple versions of data across different devices
Features:
20GB storage per user
Living objects, so no PIDs
Versioning and offline use
Desktop synchronisation
B2DROP is hosted at the Jülich Supercomputing Centre
Daily backups of all files in B2DROP are taken and kept on tape
b2drop.eudat.eu
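Desktop synchronisation services of this kind typically also expose a WebDAV interface for scripted access. The sketch below prepares (but does not send) a WebDAV upload; the endpoint path and credentials are assumptions made for illustration, so consult the B2DROP documentation for the actual endpoint:

```python
import base64
import urllib.request

# Assumed WebDAV endpoint; verify against the B2DROP documentation:
WEBDAV = "https://b2drop.eudat.eu/remote.php/webdav"

def put_file_request(user, password, remote_name, payload: bytes):
    """Prepare (but do not send) a WebDAV PUT uploading `payload`."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{WEBDAV}/{remote_name}",
        data=payload,
        headers={"Authorization": f"Basic {auth}"},
        method="PUT",
    )

# Invented credentials and file name:
req = put_file_request("user", "secret", "results.csv", b"a,b\n1,2\n")
# urllib.request.urlopen(req) would perform the upload.
```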
61. e.g. B2STAGE - Facilitating communities to:
• move large amounts of data between data stores and high-performance compute resources
• re-ingest computational results back into EUDAT
• deposit large data sets into EUDAT resources for long-term preservation
Features:
high-speed transfer
reliable and light-weight
manages permanent PIDs
eudat.eu/b2stage
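B2STAGE's high-speed transfers run over GridFTP. As an illustration only, such a transfer is often driven with the standard `globus-url-copy` client; the sketch builds a command line without executing it (both endpoint URLs are placeholders, and `-p` requests parallel streams):

```python
def gridftp_copy_cmd(src: str, dst: str, parallel: int = 4) -> list:
    """Build a globus-url-copy command line for a parallel transfer."""
    return ["globus-url-copy", "-p", str(parallel), src, dst]

# Placeholder endpoints; real URLs come from your EUDAT and HPC sites:
cmd = gridftp_copy_cmd("gsiftp://datastore.example.org/data/run1.nc",
                       "gsiftp://hpc.example.org/scratch/run1.nc")
# subprocess.run(cmd) would start the transfer on a machine with the
# Globus toolkit installed.
```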
Basis: 3,699 Horizon 2020 signed grant agreements
Calls in core areas: opt-out 35% (149/431 proposals)
Other areas: voluntary opt-in 13% (409/3,268 proposals)
In multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out
Please take time to read Background information and the guidance in the Annex, because the questions in the template are not all clear on their own.
What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.
Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?
What naming conventions do you follow?
Will search keywords be provided that optimize possibilities for re-use?
Do you provide clear version numbers?
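Clear version numbers in file names are easy to enforce mechanically. A sketch of a convention checker; the `name_vN.ext` pattern is just one possible convention chosen for this example, not an EC requirement:

```python
import re

# One possible convention: lowercase name, underscores, _vN version suffix.
PATTERN = re.compile(r"^[a-z0-9_]+_v\d+\.[a-z0-9]+$")

def follows_convention(filename: str) -> bool:
    """Check a file name against the name_vN.ext convention."""
    return bool(PATTERN.match(filename))

assert follows_convention("survey_responses_v2.csv")
assert not follows_convention("Final version (2).csv")
```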
Metadata is needed to locate and understand the data. When you are deciding what information to capture, think about what others would need in order to find, evaluate, understand, and reuse your data; the EC template also mentions keywords. Also get others to check your metadata to improve the quality and make sure it’s understandable to others. Standards should be used where possible.
To make sure their data can be understood by themselves, their community and others, researchers should create metadata and documentation.
Metadata is basic descriptive information to help identify and understand the structure of the data e.g. title, author...
Documentation provides the wider context. It’s useful to share the methodology / workflow, software and any information needed to understand the data e.g. explanation of abbreviations or acronyms
For Accessibility the Guidance contains more questions:
Which data produced and/or used in the project will be made openly available as the default? If certain datasets cannot be shared (or need to be shared under restrictions), explain why, clearly separating legal and contractual reasons from voluntary restrictions.
Note that in multi-beneficiary projects it is also possible for specific beneficiaries to keep their data closed if relevant provisions are made in the consortium agreement and are in line with the reasons for opting out.
How will the data be made accessible (e.g. by deposition in a repository)? What methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
Where will the data and associated metadata, documentation and code be deposited? Preference should be given to certified repositories which support open access where possible.
Have you explored appropriate arrangements with the identified repository? If there are restrictions on use, how will access be provided? Is there a need for a data access committee? Are there well described conditions for access (i.e. a machine readable license)? How will the identity of the person accessing the data be ascertained?
Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?
What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?
Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?
In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?
How will the data be licensed to permit the widest re-use possible?
When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.
Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.
How long is it intended that the data remains re-usable? Are data quality assurance processes described?
Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach e.g. the limitations of some CC options
The OA guidelines under Horizon 2020 point to CC-0 or CC-BY as a straightforward and effective way to make it possible for others to mine, exploit and reproduce the data. See p11 at: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
Contrary to some other funders, the EC does not require a DMP at the proposal stage.
First of all, what is a Data Management Plan (dmp)?
Essentially, most funders just want evidence at the grant stage that data has been considered – how much will be generated? Usually just a 1-2 page summary covering the expected data to be produced through research along with an idea of what might be shared and how and when it will be shared.
It is important to stress that funders aren’t expecting something carved in stone at this stage. Projects often change quite radically from what is submitted at the proposal stage and this is ok. Researchers just need to be able to provide evidence that they have thought about the data they might be generating and how it will be managed and shared.
In terms of preservation, it is important to remember that not ALL data has to be retained. Selecting what data needs to be kept is something that only the researcher can do. Essentially, he/she will need to retain any data that underpins published findings to allow for validation of results. Additional data that is not required for validation purposes but is deemed to have longer-term value might also be worth keeping.
DMPs are often submitted as part of grant applications, but are useful whenever you’re creating data. Some HEIs are introducing policies that require DMPs for all research undertaken by staff – whether externally funded or not.
[final bullet] Acting on requests from the community, DMPonline will add an ‘export to Zenodo’ feature alongside the other export options. You might want to use this to increase your project’s transparency, share good practices, or maybe because you write your DMP as a (kind of) data paper, which is interesting in its own right. At the moment there are a few H2020 DMPs in Zenodo and figshare.
Web-based tool to help researchers write Data Management and Sharing Plans according to different funder / institutional requirements
There are various templates in DMP Online based on different funder requirements and institutional customisations.
We’re currently enhancing it with practical examples, boilerplate text and tailored support. TEDDINET may wish to develop discipline specific guidance within the tool for future related projects.
You may get the feeling that there is so much to do and to know. It is important to realise that you don’t have to build or buy all data services. Instead, institutes and academic communities should support researchers to find & use what is there already. That holds for the repositories that I mentioned, but also for the services that our sister project EUDAT offers.
Note that these are all ”technical” services. The notion “RDM” has different meanings in EUDAT and OpenAIRE.
B2STAGE was conceived to deal with modern day research challenges. As hardware and research software improve, the scope for research is broadening. Communities now pursue large-scale simulations, for example developing models for climate simulation encompassing the whole of the Earth, as opposed to isolated regions. Scientists simulate not only organs in the human body, but also their interactions. Similarly, earthquake data are now collected and processed for areas as large as entire continents. The common requirement of such research challenges is that they generate and process increasing volumes of data, with typical workflows requiring data to be processed in a distributed fashion, so as to cope with the pace of data generation. For this to be possible, data need to be transferred efficiently to the high-performance or high-throughput computing resources, and this is where B2STAGE comes in.
The service also takes care of depositing the computation output from the HPC facilities to EUDAT. B2STAGE can also be used to deposit the community data into the EUDAT facilities. B2STAGE uses the established gridFTP protocol to ensure high-speed transfer between the sites. Data transfer is reliable and requires very little user interaction. B2STAGE also assigns PIDs to computational output that the user elects to inject back into the EUDAT datacentres.
If you are interested in learning more about EUDAT services, you can contact CINECA.