Agenda for FAIR Data Meeting
1. AGENDA
13.00 Welcome & goal of the meeting
13.10 Short introductions
Including what each participant would like to learn today
•Bluebee
•Centric
•ICT automatisering
•MedVision
•Ordina
•Ortec
13.45 FAIR technology & Tools
14.30 Q & A
15.00 Wrap up
2. FAIR Data - GO FAIR
Interconnected Data Infrastructure
Towards an Internet of FAIR Data and Services
Luiz Olavo Bonino – luiz.bonino@dtls.nl - June 15, 2017
3. SUMMARY
• FAIR Data overview
• GO FAIR initiative
• DTL’s technology developments
10. THE UNDERLYING PROBLEM…
FRAGMENTATION of…
• data
• sample collections
• image collections
• regulations
• software tools
• research initiatives
• funding
• expertise
• etc.
11. THE DATA PROBLEM IN HEALTH RESEARCH & INNOVATION
Most data do not TALK to each other
Data are lost and/or hard to find
Inhibits scaling of effective knowledge discovery
Inhibits fully effective health care and research & innovation
Research data malpractice (Life Science example):
Only 12% of NIH-funded datasets are deposited in recognized repositories, so over
200,000 ‘invisible’ public datasets cannot be re-used effectively.
Approximately 50% of funded research is not reproducible
12. DATA LOSS IS SIGNIFICANT, DATA GROWTH IS STAGGERING
Computer speed and storage capacity are doubling every 18 months, and this rate is steady
DNA sequence data have been doubling every 6 months over the last 3 years, and this looks set to continue for the rest of this decade
Nature news, 19 December 2013
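The gap between the two doubling rates compounds quickly. A minimal Python sketch, assuming steady exponential growth at the quoted rates over the 3-year window on this slide; `growth_factor` is a hypothetical helper:

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """How many times a quantity multiplies when it doubles every `doubling_months`."""
    return 2 ** (elapsed_months / doubling_months)


window = 36  # the 3-year window quoted for sequence data
compute = growth_factor(window, 18)   # computer speed/storage: doubling every 18 months
sequence = growth_factor(window, 6)   # DNA sequence data: doubling every 6 months

print(f"compute/storage x{compute:.0f}, sequence data x{sequence:.0f}")
print(f"sequence data outgrow capacity {sequence / compute:.0f}-fold")
```

Over three years capacity grows fourfold while sequence data grow 64-fold, which is why storage alone cannot keep pace with the data.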
15. FAIR DATA PRINCIPLES
Findable:
F1. (meta)data are assigned a globally unique and
persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of
the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation;
I2. (meta)data use vocabularies that follow FAIR
principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of
accurate and relevant attributes;
R1.1. (meta)data are released with a clear and
accessible data usage license;
R1.2. (meta)data are associated with detailed
provenance;
R1.3. (meta)data meet domain-relevant community
standards;
https://www.nature.com/articles/sdata201618
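Some of these principles can be checked mechanically on a metadata record. A minimal sketch, assuming a record is a Python dict with the hypothetical keys `identifier`, `describes`, `license` and `provenance`, and approximating F1 by "the identifier is an absolute http(s) IRI":

```python
from urllib.parse import urlparse


def fair_gaps(record: dict) -> list[str]:
    """Return the FAIR principles a (hypothetical) metadata record visibly fails."""
    gaps = []
    # F1: globally unique, persistent identifier (approximated as an http(s) IRI)
    if urlparse(record.get("identifier", "")).scheme not in ("http", "https"):
        gaps.append("F1")
    # F3: metadata explicitly include the identifier of the data they describe
    if not record.get("describes"):
        gaps.append("F3")
    # R1.1: clear and accessible data usage license
    if not record.get("license"):
        gaps.append("R1.1")
    # R1.2: detailed provenance
    if not record.get("provenance"):
        gaps.append("R1.2")
    return gaps


record = {
    "identifier": "https://example.org/metadata/42",
    "describes": "https://example.org/data/42",
    "license": "https://creativecommons.org/licenses/by/4.0/",
}
print(fair_gaps(record))  # provenance is missing, so R1.2 is reported
```

Only machine-checkable proxies are tested here; principles such as F2 (rich metadata) or I2 (FAIR vocabularies) still need human or vocabulary-aware review.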
16. FAIR DATA PRINCIPLES - METADATA
Findable:
F1. metadata are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of
the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. metadata are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. metadata use a formal, accessible, shared, and
broadly applicable language for knowledge
representation;
I2. metadata use vocabularies that follow FAIR
principles;
I3. metadata include qualified references to other
(meta)data;
Reusable:
R1. metadata are richly described with a plurality of
accurate and relevant attributes;
R1.1. metadata are released with a clear and
accessible data usage license;
R1.2. metadata are associated with detailed
provenance;
R1.3. metadata meet domain-relevant community
standards;
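Principle A1 asks that metadata be retrievable by their identifier over a standardized protocol. A minimal sketch, assuming the identifier is an HTTP(S) IRI whose server supports content negotiation for RDF; `https://example.org/metadata/42` is a placeholder and `metadata_request` a hypothetical helper built only on the standard library's `urllib.request`:

```python
import urllib.request


def metadata_request(identifier: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Build an HTTP GET that resolves an identifier and asks for RDF metadata."""
    # Content negotiation: the Accept header states which serialization we want
    return urllib.request.Request(identifier, headers={"Accept": media_type}, method="GET")


req = metadata_request("https://example.org/metadata/42")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# Sending it would be a single call: urllib.request.urlopen(req)
```

HTTP fits A1.1 because it is open, free and universally implementable; where A1.2 applies, credentials would travel as additional request headers.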
17. FAIR DATA PRINCIPLES - DATA
Findable:
F1. data are assigned a globally unique and persistent
identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of
the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. data use a formal, accessible, shared, and broadly
applicable language for knowledge representation;
I2. data use vocabularies that follow FAIR principles;
I3. data include qualified references to other
(meta)data;
Reusable:
R1. data are richly described with a plurality of
accurate and relevant attributes;
R1.1. data are released with a clear and accessible
data usage license;
R1.2. data are associated with detailed provenance;
R1.3. data meet domain-relevant community
standards;
18. FAIR DATA PRINCIPLES - SUPPORTING
INFRASTRUCTURE
Findable:
F1. (meta)data are assigned a globally unique and
persistent identifier;
F2. data are described with rich metadata;
F3. metadata clearly and explicitly include the identifier of
the data it describes;
F4. (meta)data are registered or indexed in a searchable
resource;
Accessible:
A1. (meta)data are retrievable by their identifier using a
standardized communications protocol;
A1.1 the protocol is open, free, and universally
implementable;
A1.2. the protocol allows for an authentication and
authorization procedure, where necessary;
A2. metadata are accessible, even when the data are no
longer available;
Interoperable:
I1. (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation;
I2. (meta)data use vocabularies that follow FAIR
principles;
I3. (meta)data include qualified references to other
(meta)data;
Reusable:
R1. (meta)data are richly described with a plurality of
accurate and relevant attributes;
R1.1. (meta)data are released with a clear and
accessible data usage license;
R1.2. (meta)data are associated with detailed
provenance;
R1.3. (meta)data meet domain-relevant community
standards;
19. DATA STEWARDSHIP: A TWO WAY STREET (1)
[Figure: data stewardship concerns: storage, sustainability, maintenance, license, privacy, security, stewardship, access, standards, ontologies]
If data are not interoperable, the result is a ridiculogram:
cross-data analytics are not instructive
20. DATA STEWARDSHIP: A TWO WAY STREET (2)
If data are interoperable, data analytics provide new knowledge
21. What is the current status in Europe and the USA?
22. EVOLVED RAPIDLY INTO A GLOBAL MOVEMENT
Rapid acceptance and endorsement process
The Lorentz conference
The FAIR website
Research Data Alliance endorsement
DTL flagship project
FORCE11 international partner
Articles accepted in Nature
NIH accepts FAIR compliance in its Life Sciences Commons
DTL director Prof. Barend Mons chairs the EC High Level Expert Group
The Personal Health Train initiative started
EC announces the European Open Science Cloud with FAIR as leading principle
World 2016
23. AND RECENTLY EVEN THE G20 WANTS FAIR!
“We support appropriate efforts to promote open science and facilitate appropriate access to
publicly funded research results on findable, accessible, interoperable and reusable (FAIR)
principles.” (Statement 12)
http://europa.eu/rapid/press-release_STATEMENT-16-2967_en.htm
Let’s GO FAIR
24. EC TAKES ACTION: THE EUROPEAN OPEN SCIENCE CLOUD
Europe acknowledged the problem
Moved for a solution: EOSC
Data Stewardship (DS) for better discovery
Internet of FAIR Data & Services
Training of 500,000 data experts
Financing
€2B for initial phase EOSC
DS market $85B annually
30. THE IMPLEMENTATION NETWORK APPROACH
Implementation Networks can be content based, FAIR data & services based or analytics based, but the strongest nodes combine these aspects.
A. A topical or domain GO FAIR Network has technology, domain expertise and content.
B. A FAIR data modeling and publishing GO FAIR Network has Linked Data or other FAIR experts.
C. A data analytics Network has FAIR-compliant analytics and learning tools and visualization expertise.
[Figure: the three network types combined: A Domain content, B FAIR Data Services, C Analytics]
31. NL-DE CALL FOR ACTION – 30-05-2017
https://www.dtls.nl/germany-netherlands-call-action-european-open-science-cloud/
Joint position paper on implementing EOSC through GO FAIR from the Dutch and German Secretaries of State
“As science becomes increasingly data-driven, making data FAIR will create real added value…”
“Funding providers have to acknowledge the effort related to data management as a prerequisite for FAIR data.”
“Germany and the Netherlands propose to support the GO FAIR initiative as a promising approach towards establishing the EOSC.”
37. BRING YOUR OWN DATA - BYOD
• Goals:
– Learn how to make data linkable “hands-on” with experts
– Create a “telling story” to demonstrate its use
– Make FAIR Data at the source
• Composition:
– Data owners – specialists on given datasets
– Data interoperability experts
– Domain experts
41. BYOD PLANNING
Preparation
Identify Plan
Datasets
Attendees' profile
Output data access
Tentative dates
Tentative venue
Costs
Funds
Coordination
Set date
Invite attendees
Set venue
Catering
Lodging
Financial planning
Publicity
Working document
Preparatory calls
Data hosting
Software hosting
Documentation hosting
42. BYOD PLANNING
Execution
Day One
Introduction
SW, LD, Ontology intro
Use case intro
Workgroups division
Working sessions
WWW/TTTALA
Day Two
Progress report
Working sessions
Groups reports
WWW/TTTALA
Day Three
Data integration
Answer driving question
Explore data
Demo improvement
Final report
WWW/TTTALA
46. FAIRIFICATION PROCESS
• Retrieve original data
• Dataset identification and analysis
• Definition of the semantic model
• Data transformation
• License assignment
• Metadata definition
• FAIR Data resource (data, metadata, license) deployment
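The process steps can be traced end to end on a toy table. A minimal Python sketch, assuming the semantic model is a plain dictionary from column name to predicate IRI and that the output is written as N-Triples strings; `fairify`, `FairDataResource` and every `example.org` IRI are hypothetical. A real FAIRification would use proper ontologies and an RDF library:

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FairDataResource:
    """What the last step deploys: transformed data, metadata and license together."""
    triples: list[str]
    metadata: dict
    license: str


def nt_literal(value: str) -> str:
    """Escape a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def fairify(csv_text: str, base_iri: str, id_column: str, model: dict,
            license_iri: str, title: str) -> FairDataResource:
    # Retrieve the original data and identify its structure (the column names)
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = set(rows[0]) - {id_column} if rows else set()
    # Analysis: every column needs a place in the semantic model (the `model` argument)
    unmapped = columns - set(model)
    if unmapped:
        raise ValueError(f"no semantic mapping for: {sorted(unmapped)}")
    # Data transformation: each cell becomes one statement about the row's subject IRI
    triples = []
    for row in rows:
        subject = f"<{base_iri}{row[id_column]}>"
        for column in sorted(columns):
            triples.append(f"{subject} <{model[column]}> {nt_literal(row[column])} .")
    # License assignment and metadata definition
    metadata = {"title": title, "license": license_iri, "records": len(rows)}
    # Deployment unit: data, metadata and license travel together
    return FairDataResource(triples, metadata, license_iri)


resource = fairify(
    "patient_id,diagnosis\np1,asthma\n",
    base_iri="https://example.org/patient/",
    id_column="patient_id",
    model={"diagnosis": "https://example.org/vocab/hasDiagnosis"},
    license_iri="https://creativecommons.org/licenses/by/4.0/",
    title="Demo cohort",
)
print("\n".join(resource.triples))
```

Refusing to transform unmapped columns keeps the semantic model, not the raw file layout, in charge of what the FAIR output means.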
49. FAIRIFIER
• Transform non-FAIR datasets into FAIR Data
Resources (dataset in FAIR format, license
and metadata)
• Data munging
• Semantic modeling
• License definition
• Metadata definition and extraction
• Data publication
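The FAIRifier's metadata step draws on two sources: facts that can be extracted from the file itself, and fields a person must define, such as title and license. A minimal sketch, assuming CSV input; `extract_metadata` and its plain-string keys are hypothetical and not the FAIRifier's actual interface:

```python
import csv
import io


def extract_metadata(csv_text: str, defined: dict) -> dict:
    """Combine facts extracted from a CSV file with metadata defined by a curator."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    record_count = sum(1 for row in reader if row)  # blank lines yield [] and are skipped
    extracted = {
        "media_type": "text/csv",
        "columns": header,
        "records": record_count,
    }
    # Curator-defined fields (title, license, ...) override anything extracted
    return {**extracted, **defined}


meta = extract_metadata(
    "gene,expression\nBRCA1,2.5\nTP53,1.1\n",
    {"title": "Expression sample", "license": "https://creativecommons.org/licenses/by/4.0/"},
)
print(meta["columns"], meta["records"], meta["license"])
```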
53. FAIRIFICATION - NEW DATASET TYPE
[Figure: new dataset type: the non-FAIR data are submitted, a semantic model & non-FAIR-to-FAIR mapping is stored in the FAIR Data Model Registry, and a FAIR Data Resource is generated]
54. FAIRIFICATION - RECURRING DATASET TYPE
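[Figure: recurring dataset type: the non-FAIR data are submitted, the existing semantic model & non-FAIR-to-FAIR mapping is queried and retrieved from the FAIR Data Model Registry, and a FAIR Data Resource is generated]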
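The difference between a new and a recurring dataset type is whether the semantic model and mapping must be built or can be reused. A minimal sketch, assuming an in-memory dictionary stands in for the FAIR Data Model Registry; `FairDataModelRegistry`, `fairify_submission` and `build_mapping` are hypothetical names:

```python
from typing import Callable, Optional


class FairDataModelRegistry:
    """Hypothetical in-memory registry: dataset type -> non-FAIR-to-FAIR mapping."""

    def __init__(self) -> None:
        self._mappings: dict[str, dict[str, str]] = {}

    def store(self, dataset_type: str, mapping: dict[str, str]) -> None:
        self._mappings[dataset_type] = mapping

    def query(self, dataset_type: str) -> Optional[dict[str, str]]:
        return self._mappings.get(dataset_type)


def fairify_submission(registry: FairDataModelRegistry, dataset_type: str,
                       columns: list[str],
                       build_mapping: Callable[[list[str]], dict[str, str]]) -> dict[str, str]:
    mapping = registry.query(dataset_type)
    if mapping is None:
        # New dataset type: model it once (expert work) and store the result
        mapping = build_mapping(columns)
        registry.store(dataset_type, mapping)
    # Recurring type (or freshly stored one): generation is a mechanical lookup
    return {column: mapping[column] for column in columns}


calls = []


def build_mapping(columns):
    calls.append(list(columns))  # stands in for the manual semantic-modelling step
    return {c: f"https://example.org/vocab/{c}" for c in columns}


registry = FairDataModelRegistry()
first = fairify_submission(registry, "lab-results", ["hb", "glucose"], build_mapping)
second = fairify_submission(registry, "lab-results", ["hb", "glucose"], build_mapping)
print(len(calls))  # 1: the second submission reused the stored mapping
```

The expensive modelling effort is paid once per dataset type; every later submission of that type only queries the registry.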
56. FAIR DATA POINT
A particular class of FAIR Data System that provides access to datasets in a FAIR
manner. The datasets can be external or internal to the FAIR Data Point. Also,
the source data can be a non-FAIR dataset or a FAIR Data Resource. If the
source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR
transformations on the fly.
58. FAIR Data Point metadata
Title
Responsible institution(s)
Contact
FAIR API version
License
…
59. FAIR Data Point metadata
Catalog metadata
Title
Theme taxonomy
Issued date
…
DCAT
60. FAIR Data Point metadata
Catalog 1 metadata
Dataset metadata
Title
Publisher
License
Theme(s)
Version
…
DCAT/HCLS
61. FAIR Data Point metadata
Catalog 1 metadata
Dataset 1 metadata
Distribution metadata
Title
Media type
Download/access URL
License
…
DCAT
62. FAIR Data Point metadata
Catalog metadata
Dataset metadata
Distribution
metadata
Data record metadata
Type
Domain
Range
…
RML
63. FAIR Data Point metadata
Catalog 1 metadata
  Dataset 1 metadata
    Distribution 1.a metadata
      Data record metadata
    Distribution 1.b metadata
  Dataset 2 metadata
    Distribution 2.a metadata
      Data record metadata
    Distribution 2.b metadata
  Dataset 3 metadata
    Distribution 3.a metadata
      Data record metadata
Catalog 2 metadata
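The metadata levels nest as FAIR Data Point, catalog, dataset, distribution and data record. A minimal sketch of the three middle levels in Python, serialized as DCAT Turtle linked by `dcat:dataset` and `dcat:distribution`; the class names, IRIs and chosen fields are illustrative assumptions, and the FDP repository level and the RML data-record level are left out:

```python
from dataclasses import dataclass, field

PREFIXES = [
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
    "@prefix dct: <http://purl.org/dc/terms/> .",
]


@dataclass
class Distribution:
    iri: str
    title: str
    media_type: str     # IANA media-type IRI
    download_url: str


@dataclass
class Dataset:
    iri: str
    title: str
    distributions: list[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    iri: str
    title: str
    datasets: list[Dataset] = field(default_factory=list)


def to_turtle(catalogs: list[Catalog]) -> str:
    """Serialize Catalog -> Dataset -> Distribution as DCAT Turtle.

    Titles are assumed to contain no quotes or newlines (no escaping is done).
    """
    lines = list(PREFIXES)
    for cat in catalogs:
        lines.append(f'<{cat.iri}> a dcat:Catalog ; dct:title "{cat.title}" .')
        for ds in cat.datasets:
            lines.append(f"<{cat.iri}> dcat:dataset <{ds.iri}> .")
            lines.append(f'<{ds.iri}> a dcat:Dataset ; dct:title "{ds.title}" .')
            for dist in ds.distributions:
                lines.append(f"<{ds.iri}> dcat:distribution <{dist.iri}> .")
                lines.append(
                    f'<{dist.iri}> a dcat:Distribution ; dct:title "{dist.title}" ; '
                    f"dcat:mediaType <{dist.media_type}> ; "
                    f"dcat:downloadURL <{dist.download_url}> ."
                )
    return "\n".join(lines)


csv_dist = Distribution(
    "https://example.org/distribution/1a", "CSV dump",
    "https://www.iana.org/assignments/media-types/text/csv",
    "https://example.org/files/1.csv",
)
ds1 = Dataset("https://example.org/dataset/1", "Dataset 1", [csv_dist])
cat1 = Catalog("https://example.org/catalog/1", "Catalog 1", [ds1])
ttl = to_turtle([cat1])
print(ttl)
```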
73. FAIR HACKATHON - GOALS
• Align solutions with FAIR Data Point
specifications.
– Metadata content
– API
– Data
74. FAIR HACKATHON OUTCOME
• FAIR data model for solutions content;
• Architecture of the required
adjustments/extensions;
• Technical specification of the
adjustments/extensions;
• Proof-of-concept of the adjusted solution;
79. OPEN RDF KNOWLEDGE ANNOTATOR (ORKA)
• Allow third-party annotation on existing knowledge bases
• Capture the provenance of the annotator and the original statement
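ORKA's two goals, annotating without altering the source knowledge base and capturing provenance, can be illustrated with named graphs and the W3C PROV-O vocabulary. This is a hypothetical sketch, not ORKA's actual data model: the `Annotation` class, the IRIs, and the use of `prov:wasDerivedFrom` to point at an IRI for the original statement (e.g. a reified statement) are all assumptions:

```python
from dataclasses import dataclass

PROV = "http://www.w3.org/ns/prov#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


@dataclass(frozen=True)
class Annotation:
    graph: str                        # named graph holding the third-party statement
    statement: tuple[str, str, str]   # (subject, predicate, object) IRIs of the annotation
    original: str                     # IRI identifying the annotated statement (assumed)
    annotator: str                    # IRI of the person or agent annotating
    created: str                      # ISO 8601 timestamp

    def to_nquads(self) -> list[str]:
        s, p, o = self.statement
        g = f"<{self.graph}>"
        return [
            # The annotation lives in its own graph; the source knowledge base is untouched
            f"<{s}> <{p}> <{o}> {g} .",
            # Provenance about that graph, stated in the default graph
            f"{g} <{PROV}wasAttributedTo> <{self.annotator}> .",
            f"{g} <{PROV}wasDerivedFrom> <{self.original}> .",
            f'{g} <{PROV}generatedAtTime> "{self.created}"^^<{XSD_DATETIME}> .',
        ]


note = Annotation(
    graph="https://example.org/graph/annotation-1",
    statement=("https://example.org/gene/BRCA1",
               "https://example.org/vocab/associatedWith",
               "https://example.org/disease/breast-cancer"),
    original="https://example.org/statement/17",
    annotator="https://example.org/people/alice",
    created="2017-06-15T13:00:00Z",
)
print("\n".join(note.to_nquads()))
```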
83. FAIRPORT
• A particular class of FAIR Data System that supports data interoperability;
• Supports publication of and access to FAIR data;
• Fosters an ecosystem of applications and services;
• Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;
• Supports citation of datasets and data items;
• Provides metrics for data usage and citation.
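The federated architecture can be sketched as FAIRports that answer a query locally, forward it to interconnected peers, and count local hits as a simple usage metric. Everything here (the `FairPort` class, its `search` method, the example IRIs) is hypothetical and in-memory; real FAIRports would communicate over a network API:

```python
from collections import Counter


class FairPort:
    """Hypothetical in-memory FAIRport: dataset titles keyed by dataset IRI."""

    def __init__(self, name: str, datasets: dict[str, str], peers=()):
        self.name = name
        self.datasets = datasets
        self.peers = list(peers)
        self.usage = Counter()  # hits per dataset IRI, a simple usage metric

    def search(self, keyword: str, _seen=None) -> list[str]:
        """Search locally, then forward to peers; `_seen` prevents query loops."""
        seen = set() if _seen is None else _seen
        if self.name in seen:
            return []
        seen.add(self.name)
        hits = [iri for iri, title in self.datasets.items()
                if keyword.lower() in title.lower()]
        self.usage.update(hits)  # count only this FAIRport's own datasets
        for peer in self.peers:
            hits += peer.search(keyword, seen)
        return hits


nl = FairPort("NL", {"https://nl.example.org/ds/1": "Rare disease registry"})
de = FairPort("DE", {"https://de.example.org/ds/7": "Disease cohort imaging"}, peers=[nl])
nl.peers.append(de)  # federation links run both ways
hits = de.search("disease")
print(hits)
```

The `seen` set matters because interconnected FAIRports form a graph with cycles; without it the NL and DE nodes would forward the query to each other indefinitely.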
86. BENEFITS OF FAIR DATA STEWARDSHIP (1)
If you are a government/funder/policy maker:
Better organized stakeholder community
Fewer resources lost on slack and overhead
Increased ‘Return on Investment’ of public funding
Opportunity to harmonize fragmented legislation in a way that facilitates personalized medicine & health research
Participate in international developments, e.g. the European Open Science Cloud
Improved societal impact:
increased involvement of citizens/patients (digital control of their own data)
Increased economic benefits in the health care sector, including prevention
87. BENEFITS OF FAIR DATA STEWARDSHIP (2)
If you are an institution:
No more short-term point solutions built over and over again
Compliance by design with technical, ELSI and scientific standards
Easier & safer (inter)national exchange
More efficient business operations
Fewer risks
More successes
88. BENEFITS OF FAIR DATA STEWARDSHIP (3)
If you are a citizen:
Better opportunities for active participation
Lower insurance premiums through better prevention
Better privacy: you are in control of your own data!
More benefit from tax and charity money spent
Faster development of new preventive, diagnostic and
therapeutic solutions
89. BENEFITS OF FAIR DATA STEWARDSHIP (4)
If you are a company/entrepreneur:
Your proprietary data made compatible with FAIR public-domain data
Improved data analytics and knowledge predictions
More effective discovery process
Offer current customers better analytics, applications and services
Develop new FAIR applications and services
Provide training/certification services
Save cost and increase revenue
Access to the ‘5% market’: 100B+!
90. JUST AN IMPRESSION OF HOW BIG THIS MARKET IS
€2B for initial phase EOSC
Total EU (28) plus USA: 86.5 B for Data Stewardship (DS) annually
EU (28)
GDP 52,000 B
2.4% of GDP to R&D = 1,248 B
@ 5% = 62 B for DS
USA
GDP 18,000 B
2.73% of GDP to R&D = 491 B
@ 5% = 24.5 B for DS
The Netherlands
GDP 818 B
1.973% of GDP to R&D = 16 B
@ 5% = 800 M for DS
(Source OECD)
€62 billion
€800 million
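The estimate is a chain of two percentages: GDP times the R&D share, times the assumed 5% spent on data stewardship. A minimal Python sketch reproducing it from the slide's own figures; `ds_budget` is a hypothetical helper and amounts are in billions:

```python
# (GDP in billions, share of GDP spent on R&D) as quoted on the slide (source: OECD)
economies = {
    "EU (28)": (52_000, 0.024),
    "USA": (18_000, 0.0273),
    "The Netherlands": (818, 0.01973),
}
DS_SHARE = 0.05  # the slide's assumption: 5% of R&D spending goes to data stewardship


def ds_budget(gdp_billions: float, rd_share: float, ds_share: float = DS_SHARE) -> float:
    """Annual data-stewardship budget in billions: GDP x R&D share x DS share."""
    return gdp_billions * rd_share * ds_share


budgets = {name: ds_budget(*figures) for name, figures in economies.items()}
for name, value in budgets.items():
    print(f"{name}: {value:,.2f} B")
print(f"EU (28) + USA: {budgets['EU (28)'] + budgets['USA']:,.2f} B")
```

Without intermediate rounding the EU + USA total comes to roughly 87 B; the total on the slide sums the rounded per-region figures (62 B and 24.5 B).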
91. FAIR DATA SERVICE PROVIDER BUSINESS ARCHITECTURE
GO FAIR Support Office (+ 7 MS)
GO FAIR expertise broker
Certification
Certified FAIR Data Service provider: Analytics
Certified FAIR Data Service provider: Software development
Certified FAIR Data Service provider: Training/Staffing
Academic (5%) and private sector customers
Etc.
The underlying problem is “fragmentation”. We do have the pieces of the puzzle, but they are spread across a great many parties in this country. You see this splintering in every area named on this slide: data, sample collections, etc. For research to pay off in applications that reach health care, we need facilities that are still spread over many research groups and institutions. Organizing these into a well-oiled machine that helps us implement personalised medicine & health research effectively requires a joint effort. In short, we need national action to build a shared infrastructure for this: Health-RI.
“Prohibitive”: I know what you mean, but this word isn’t quite right. I would suggest that the issues you raise PREVENT, LIMIT OR INHIBIT (ONE OF THESE) the delivery of a fully effective…
Box: where does this data come from? Same comment about the use of “Prohibitive”. Here I would say that it CONSTRAINS OR LIMITS the scaling of…
One point to stress is that data in the functionally interlinkable format serve only to facilitate data integration and interoperability. This doesn’t mean that data in this format should be used for other purposes. Once the datasets are integrated and the scientific questions have been answered, further processing may be needed to transform the selected integrated data into a format that is optimal for the intended analysis, streamlining that analysis.
In summary, in a BYOD we take non-FAIR datasets and, with the expertise of data owners, data experts and domain specialists, produce functionally interlinkable FAIR data by combining the data with the appropriate ontologies. These FAIR datasets can then be integrated more easily, answering questions that could not be answered with the isolated datasets. Such questions tend to be richer and more complex, fostering richer knowledge discovery.