1) The document discusses supporting research data management at INRA. It outlines key dates in developing INRA's data sharing policy and recommendations from a 2012 report.
2) Researchers at INRA produce various types of data from genetics to social sciences. Current data sharing practices and pain points are discussed for each domain.
3) A proposed service offer is outlined to support data management and sharing. This includes guidance, training, a digital repository leveraging existing repositories, and additional services like DOI minting and vocabularies.
E. Dzalé Yeumo/ IG Agriculture meeting 21 September 2015
Agenda
Introduction
What data do we produce at INRA
Current data management and sharing practices and pain points: the point of view of some researchers
The service offer under construction
Introduction: some key dates
2009: political awareness of the Inra CEO
2012: Report* of the Inra scientific council, "Data management and sharing"
- 9 recommendations; the first: define an Inra policy
2013: Inra data sharing policy
- 11 data management and sharing principles
Implementation
- Domain-specific working groups: inventory, requirements
- Transdisciplinary working groups: state of the art (legal/IP, technical, social issues), proposals
*Gaspin, C., Pontier, D., Colinet, L., Dardel, F., Franc, A., Hologne, O., Le Gall, O., Maurin, N., Perrière, G., Pichot, C., Rodolphe, F. (2012).
Rapport du groupe de travail sur la gestion et le partage des données http://prodinra.inra.fr/record/206746
Current data sharing practices: the point of view of some
scientists
E. Dzalé Yeumo, O. Hologne
16 July 2015
Genetic & genomic
- In general, data is released once the producers have published a paper
- Many data sharing platforms exist at the national and international level
- Many metadata and data format standards exist
- Lots of data is produced in collaboration with public and private partners
Experimentation, observation and simulation
- Raw data may be of as much interest as processed data
- Data sharing rules depend on the nature, granularity, and origin of the data
- Metadata is of paramount importance for the reusability of the released data
- Lots of data is unique and can be captured only once
Social sciences
- A few data sharing platforms exist at the national and international level
- Lots of data are bought and can't be freely released
- Experimental economics data can be freely released most of the time
- Data documentation and statistical disclosure are of paramount importance
- The longitudinal aspect (range or historical aspect) of the data increases the need for long-term preservation
Current data management pain points: the point of view of some
scientists
Genetic & genomic
- Data is increasingly massive: economic model for long-term hosting
- Exchange, transfer and storage of large datasets
- Lack of human resources
- Gaps in semantic coverage
- Many existing public data repositories are congested, with long waiting times for the deposit of datasets
- Risk of conferring an economic advantage to our competitors or to our partners' competitors
Experimentation, observation and simulation
- Data standardization
- Exchange, transfer and storage of large datasets
- Capturing metadata automatically
- Gaps in semantic coverage
- Metadata may be strategic (e.g. protocols, methods)
- Some metadata or data are sensitive (geographic information about epidemiological data or GMO data)
- Exchange of large datasets
Social sciences
- Data archiving: sustainability of the existing platforms
- Statistical disclosure control
- Lots of personal and sensitive data (social and economic survey data): risk of re-identification
- Legal and intellectual property issues are of paramount importance (e.g. data inferred from existing textual data through decision tools, data purchased from third parties)
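The re-identification risk mentioned for survey data can be illustrated with a simple k-anonymity check, one common measure used in statistical disclosure control. This is only a minimal sketch and not a method described in the slides: the field names, the sample records and the helper functions are all hypothetical. It counts how many records share each combination of quasi-identifiers and flags those belonging to groups smaller than k.

```python
from collections import Counter


def _group_sizes(records, quasi_identifiers):
    """Count records per combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group (0 if there are no records).

    A dataset is k-anonymous for these quasi-identifiers when this
    value is at least k.
    """
    sizes = _group_sizes(records, quasi_identifiers)
    return min(sizes.values(), default=0)


def risky_records(records, quasi_identifiers, k):
    """Return the records whose group contains fewer than k records."""
    sizes = _group_sizes(records, quasi_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[q] for q in quasi_identifiers)] < k
    ]


# Invented survey records, for illustration only.
survey = [
    {"age_band": "30-39", "region": "Bretagne", "income": 21000},
    {"age_band": "30-39", "region": "Bretagne", "income": 34000},
    {"age_band": "40-49", "region": "Occitanie", "income": 27000},
]

if __name__ == "__main__":
    qi = ["age_band", "region"]
    print("smallest group:", smallest_group_size(survey, qi))
    print("records at risk for k=2:", risky_records(survey, qi, 2))
```

In this invented sample, the third record is the only one with its combination of age band and region, so it would need coarser categories or suppression before release.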
What service offer to support data management and sharing?
The scientists' wish list
Genetic & genomic
- Provide templates for consortium agreements + identify special cases (biological material, long tradition of partnership, etc.)
- A data portal with access to all the data shared by INRA
- DOIs
- Metadata training
- Volunteering for species-related data repositories at an international level
Experimentation, observation and simulation
- Recognition of all the contributors
- Recognition of data sharing as a first-class skill at the institutional level
- Data papers training
- A data portal with access to all the data shared by INRA
- Harmonization of metadata standards and vocabularies
Social sciences
- Guidance with regard to publishing derived data: when, how?
- Web scraping of data of interest which may be removed from original sources
- DOIs
- A secured data platform that allows reviewers to access data and reproduce findings while respecting legal and IP requirements
- Thematic active data management platforms
Legal, IP, social
Techniques, methods, tools
Awareness, guidance, training
A digital repository
Based on the IT environment provided by the IT department:
Two data centers
The NetApp Storage Virtual Machine technology
Outsource the long term preservation
Leverage the many existing data repositories
Upgrade the existing repositories to trusted repositories in accordance with the Data Seal of Approval and the Defra assessment grids:
http://www.datasealofapproval.org/en/assessment/ and
https://defradigital.blog.gov.uk/2015/02/09/are-you-a-mature-open-data-publisher/
Conform to the OAIS reference model
Cover both active and historical data