Behind the FAIR Brand: Thinkers, Doers and Dreamers
1. Behind the FAIR Brand:
Thinkers, Doers and Dreamers
Susanna-Assunta Sansone
ORCiD: 0000-0001-5306-5690 | Twitter: @SusannaASansone
Beilstein Open Science 2019 Symposium, 15-17 October 2019
Slides: https://www.slideshare.net/SusannaSansone
sansonegroup.eng.ox.ac.uk
Associate Professor, Engineering Science
Associate Director, Oxford e-Research Centre
Principal Investigator and Group Leader
2. A set of principles to enhance the
value of all digital resources
2014
2016
Developed and endorsed by
researchers, service
providers, publishers, funding
agencies, industry partners
5. Everybody needs data that are
• Discoverable by humans and machines
• Retrievable and structured in standard format(s)
• Self-described so that third parties can make sense of it
Better data = better and more efficient science
Datasets, SOPs, Figures, Photos, Workflows, Slides, Codes, Tools, Databases, Algorithms, Documents
6. • A crisis in confidence in research integrity in certain fields
• Human-machine collaboration will be crucial to our future
• Data-related mandates and policies by funders
• The changing world of scholarly publishing
• The need for recognition and credit
Driving factors
7. Depends upon several stakeholders in the research ecosystem
actively playing their parts
• to deliver research infrastructures, tools and standards,
policies, education and training
• to overcome technical, social and cultural challenges
It is not simple and it requires long term investment
Making FAIR a reality
9. €3.3 billion
programme
2014 - 2020
€300 million
programme
2018 - 2020
European
intergovernmental
organisation
23 member
countries and
over 180 research
organisations
Since 2014
1
2
3 Started in 2019
FAIR-enabling EU and USA biomedical infrastructure
programmes and projects, e.g.
Since 2014, several programs:
2014-2017
2017-2018
10. Organization and structure
• Hub and (national) Nodes
• Community-driven and rooted
• Strong focus on interoperability
• SMEs and Industry links
• Cross-nodes funded activities
11. model and related formats
Initiated in
2003
Open source tools and formats to help researchers to:
describe multi-modal experiments
follow community-developed standards
curate, analyze, release, share and publish
12. Nowadays ISA (format and/or tools)
powers over 30 public resources, e.g.,
The ‘curse’ of success:
• Time and (lack of) funds for:
- Maintenance
- Extensions
- Community coordination/training
• Not just about the software
- data curation know-how
14. Funded by
Part of the
ISA-InterMine project
Reproducibility – FAIR at the first mile
From curated, structured metadata to data paper
datascriptor.org
15. Academics from several ELIXIR Nodes, with Janssen, AstraZeneca, Eli Lilly,
GSK, Novartis, Bayer, Boehringer Ingelheim
Define, document and implement a data FAIRification process:
17. Human capital maximization
• Work in squads cross-cutting work packages and partners
• Address questions/issues, rather than perform technical duties
• Prioritization of the work based on pharma's needs
• Three months sprint cycles
FAIRcookbook
19. 1. 2014-2017: 12 centres of excellence
2. 2017-2018: 10 multi-PI teams, forming one consortium around 3 data types/databases
3. Started in 2019: a consortium of 6 teams
23. Building on previous work
• Learn from positive and negative outcomes
• Assessment of what did not work well and why
• NIH centres/officers playing an active role
• Evolving understanding of what a FAIR Data Commons is
24. Data for machines – Use of data at scale
Findable
Accessible
Interoperable
Reusable
• Globally unique, resolvable, and persistent identifiers
- To retrieve and connect data
• Community-defined descriptive metadata
- To enhance discoverability
• Common terminologies
- So that the same term means the same thing
• Detailed provenance
- To contextualize the data and facilitate reproducibility
• Terms of access
- As open as possible, as closed as necessary
• Terms of use
- Clear licences, ideally to enable innovation and reuse
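The six elements listed above can be sketched as a small machine-checkable record. This is an illustrative sketch only: the `DatasetRecord` class, its field names, and the `missing_fair_elements` helper are assumptions made for this example, not part of the FAIR principles or of any tool named in these slides, and the identifier shown is a placeholder.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; field names are illustrative, not a standard."""
    identifier: str = ""                                # globally unique, resolvable, persistent ID
    metadata: dict = field(default_factory=dict)        # community-defined descriptive metadata
    terminology_terms: list = field(default_factory=list)  # terms drawn from common terminologies
    provenance: str = ""                                # how the data were produced
    access_terms: str = ""                              # terms of access
    licence: str = ""                                   # terms of use


def missing_fair_elements(record: DatasetRecord) -> list:
    """Return the names of the elements that are still empty in the record."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    record = DatasetRecord(identifier="doi:10.0000/placeholder", licence="CC-BY-4.0")
    print(missing_fair_elements(record))
```

A check like this only reports whether each element is present; judging whether an identifier actually resolves or a licence is clear enough for reuse still needs human or community review.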
31. Informative and educational resource: curated, inter-linked descriptions of
• COMMUNITY STANDARDS for metadata and identifiers (formats, terminologies, guidelines, identifiers)
• REPOSITORIES, databases and knowledgebases
• DATA POLICIES by journals, funders, and other organizations
32. All records are manually curated
in-house, verified and claimed by the
community behind each resource
Ready for use, implementation, or recommendation
In development
Status uncertain
Deprecated as subsumed or superseded
33. We guide consumers to discover, select and use these
resources with confidence
We help producers to make their resources more visible, more
widely adopted and cited
36. Researchers in academia,
industry, government
Developers and curators
of resources
Journal publishers or
organizations with data
policy
Research data facilitators,
librarians, trainers
Learned societies, unions
and associations
Funders and data
policy makers
A flagship output (and a WG) of the:
Recommended by funders, e.g.:
Core part of implementation networks in:
37. https://doi.org/10.1038/s41587-019-0080-8
Open Access CC-BY
69 authors (adopters, collaborators, users)
representing different stakeholder groups
Analysed the data policies by
journals/publishers, and the standards and
repositories they recommend
Working with journal editors and publishers
38. Discrepancies in recommendations across data policies
• some repositories are named, but very few standards are
• cautious approach due to the wealth of existing resources
Recommendations are often driven by
• the editor’s familiarity with one or more standards, notably
for journals or publishers focusing on specific disciplines
• the engagement with learned societies and researchers
actively supporting and using certain resources
➢ Consensus: FAIRsharing plays a key role in helping editors
to discover and recommend appropriate resources
What we have learned and are doing now
39. “The interactive browser will allow us to discover which databases and standards
are not currently included in our author guidelines, enabling us to regularly
monitor and refine our policies as appropriate, in support of our mission to help
our authors enhance the reproducibility of their work.”
H. Murray, Publishing Editor, F1000Research
40. Collaboration:
Harmonize journals' and publishers' data deposition guidelines
by defining a common set of criteria for repository selection
Document being approved internally by publishers; out before / to be presented at RDA 14th Plenary, Helsinki
Criteria include:
• Access conditions
• Reuse conditions
• Deposition conditions
• Unique ID schema
• User support
• Curation
• …….
41. Increase the number and the clarity of journals' and funders'
data policies by classifying the recommendations these policies contain,
to improve their definition and guidance to researchers
Collaboration:
Workplan – phase 1:
Curate and assess their compliance with the Transparency and Openness Promotion
(TOP) guidelines and display the level in FAIRsharing