Bio Data World - The promise of FAIR data lakes - The Hyve - 20191204

Kees van Bochove, Founder, The Hyve
Reuse of R&D data and the
promise of FAIR data lakes
@keesvanbochove
BioDataWorld
Basel, 5 Dec 2019
Outline
1. FAIR Data is about people & change
2. The data lake is a passing phase
3. Forget about AI. Data & UX matter.
The Hyve
We advance biology and medical research…
… by building and serving thriving open source communities.
Services
Professional support for
open source software in
biomedical informatics
➢Software development
➢Data engineering
➢Consultancy
➢Hosting / SLAs
Core values
Share
Reuse
Specialize
Office Locations
Utrecht, The Netherlands
Cambridge, MA, United States
Customer Segments
Pharma
Life Sciences
Healthcare
Fast-growing
Started in 2012
40+ people by now
FAIR Data is
about people &
embracing change
Statement #1
@keesvanbochove @TheHyveNL
The roots of FAIR
►Public-private partnership to advance:
►Open Science
► Sustainability & reuse of data
►Workshop in Leiden in 2014
►Towards a Modular Blueprint ‘Floor-plan’ of a safe
and fair Data Stewardship, Trading and Routing
environment, provisionally called the Data
FAIRPORT
https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602
FAIR Workshop at The Hyve in Utrecht, 2018
http://blog.thehyve.nl/blog/highlights-from-pistoia-alliances-fair-workshop
https://www.sciencedirect.com/science/article/pii/S1359644618303039
15 FAIR principles for (meta)data
http://www.nature.com/articles/sdata201618
Findable:
F1. persistent identifier
F2. metadata
F3. metadata - data link
F4. registered or indexed
Accessible:
A1. standardized protocol
A1.1. open, free and universally implementable
A1.2. authentication and authorization
A2. metadata stay accessible
Interoperable:
I1. language for knowledge representation
I2. vocabularies that follow FAIR principles
I3. qualified references to other (meta)data
Reusable:
R1. attributes
R1.1. license
R1.2. provenance
R1.3. community standards
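
As a rough illustration of how some of the principles listed above can become something checkable, the Python sketch below tests a dataset metadata record for a handful of them. The class, the field names (`identifier`, `access_protocol`, `vocabularies`, `license`, `provenance`) and the sample record are assumptions made for this example; they are not taken from the FAIR paper or from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""          # F1: persistent identifier
    title: str = ""               # F2: descriptive metadata
    access_protocol: str = ""     # A1: standardized protocol
    vocabularies: list = field(default_factory=list)  # I2: vocabularies used
    license: str = ""             # R1.1: license
    provenance: str = ""          # R1.2: provenance


def missing_fair_fields(record: DatasetMetadata) -> list:
    """Return the principle labels whose field is empty in the record."""
    checks = {
        "F1": record.identifier,
        "F2": record.title,
        "A1": record.access_protocol,
        "I2": record.vocabularies,
        "R1.1": record.license,
        "R1.2": record.provenance,
    }
    return [label for label, value in checks.items() if not value]


# Placeholder record: the IRI and values are invented for the example.
record = DatasetMetadata(
    identifier="https://example.org/dataset/42",
    title="Example cohort",
    access_protocol="https",
    license="CC-BY-4.0",
)
print(missing_fair_fields(record))
```

A check like this only covers the presence of metadata fields. Whether an identifier is really persistent, or a vocabulary really follows the FAIR principles, still needs human judgment.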
The fundamental change behind FAIR
From data management (scope: project)
to data stewardship (scope: organization)
GO-FAIR Initiative Pillars
Why resilience to change matters
● Domain changes and focus shifts: new data types,
new applications, new scientific paradigms etc.
● Organizational changes: M&A, re-orgs, people
moving roles etc.
● Technology changes: new software and hardware
platforms, analysis methods, automation, ML/AI etc.
FAIR Data is
about people &
embracing change
Statement #1
● From data management
to data stewardship
● Implies cultural, process
and technical change
● Data strategy should be
resilient to change
@keesvanbochove @TheHyveNL
The data lake is a
passing phase
Statement #2
@keesvanbochove @TheHyveNL
Network architectures
Genomics
England
Research Environment
NHS Trusts
Airlock
Research Community
MedMij Personal Health Apps
The classical monolith
Enterprise
Data Warehouse
ETL
ETL
ETL
Business Intelligence
/ Analytics
The modern (?) monolith
Ingest
Self-service
Pipelines
Analytics
Enterprise Data Lake
Ingestion Team
Data Engineering Team
Unification Team
Search Team
Platform API Team
Analytics Team
Architectural division
Axis of
change
Decentralized data management
● IRI / identifier schemes
● Metadata standards
● Provenance standards
CDO
Data Federation
Oncology
Neuroscience
Development
ClinOps
HCS
Omics platforms
Data science
Preclinical
ADME/Tox
Biomarker dev.
RWD
Epidemiology
● Catalog function
● Data standards
● Entities / data sets
Publish
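
A minimal sketch of the idea on this slide, under stated assumptions: domain teams keep their data where it is and publish only a metadata entry, while the central function enforces the agreed identifier scheme and data standards. All class names, the IRI prefix and the storage locations are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata a team publishes; the data itself stays with the team."""
    iri: str        # identifier following the centrally agreed scheme
    team: str       # owning domain team, e.g. "Oncology"
    standard: str   # data standard the dataset conforms to
    location: str   # where the team hosts the data (placeholder values)


class FederatedCatalog:
    """Central function: holds standards and an index, never the data."""

    def __init__(self, iri_prefix: str, allowed_standards: set):
        self.iri_prefix = iri_prefix
        self.allowed_standards = allowed_standards
        self._entries = {}

    def publish(self, entry: CatalogEntry) -> None:
        # Enforce only the centrally managed conventions.
        if not entry.iri.startswith(self.iri_prefix):
            raise ValueError(f"IRI does not follow scheme: {entry.iri}")
        if entry.standard not in self.allowed_standards:
            raise ValueError(f"Unknown data standard: {entry.standard}")
        self._entries[entry.iri] = entry

    def find(self, standard: str) -> list:
        """Return entries conforming to a standard, sorted by IRI."""
        hits = [e for e in self._entries.values() if e.standard == standard]
        return sorted(hits, key=lambda e: e.iri)


catalog = FederatedCatalog("https://data.example.org/", {"OMOP", "CDISC"})
catalog.publish(CatalogEntry(
    "https://data.example.org/rwd/1", "RWD", "OMOP", "s3://rwd-team/ds1"))
catalog.publish(CatalogEntry(
    "https://data.example.org/clinops/7", "ClinOps", "CDISC", "nfs://clinops/ds7"))

for entry in catalog.find("OMOP"):
    print(entry.team, entry.location)
```

The catalog never copies data. A consumer gets a location back and links to the dataset directly, so no central ingestion step is involved.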
Advantages of a decentralized FAIR approach
● More resilient to change: no dependency on large central functions
● Allows for an iterative data strategy operationalization (no ‘big bang’
data lake delivery needed, FAIRification can start today and locally)
● No need to shuffle people around to start a big data lake project:
embed informatics and data experts directly in the research and
development teams
● Centralize only standardization functions, decentralize the rest:
empower teams to do their own data science and informatics
● Embrace usage of external data and collaborations, no need to
‘ingest first’ via a central function, but use & link directly
The data lake is a
passing phase
Statement #2
● Centralization is a
potential bottleneck and
a barrier for change
● The solution is in
decentralization of
storage, applications etc.
● Standards management
and data federation as
central functions
@keesvanbochove @TheHyveNL
Forget about AI.
Data & UX matter.
Statement #3
@keesvanbochove @TheHyveNL
Teams at The Hyve: open source communities
Research Data Management
● FAIR Data Governance consultancy
● Fairspace (meta)data management
Genomics
● Cancer data portal: cBioPortal
● Knowledge base: Open Targets
Health Data Networks
● Data warehouses: tranSMART, i2b2
● Cohort selection: Glowing Bear
● Request Portals: Podium
Real World Data
● Real world evidence: OMOP/OHDSI
● Wearables platform: RADAR-BASE
FAIR Services at The Hyve
● Semantic modelling: creating (meta)data models that allow traversal of
linked data
● Data conformance: choose the right data standard for specific problems,
align with community standards to maximize benefits from the open
science communities and precompetitive collaborations
● Data landscape: create an understanding of existing applications and
data sources in the company and readiness for FAIR
● FAIRification: get started with FAIRifying datasets, defining metadata,
appropriate standards, provenance etc.
● Data catalog: build collaborative environment around data catalog (e.g.
using Fairspace)
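
To make "traversal of linked data" concrete, here is a small sketch in plain Python, assuming data is expressed as subject-predicate-object triples. The `ex:` identifiers and predicates are invented for the example and do not come from any real vocabulary or from Fairspace.

```python
from collections import deque

# Tiny in-memory triple list: (subject, predicate, object).
# All identifiers are made up for illustration.
TRIPLES = [
    ("ex:study1", "ex:hasDataset", "ex:dataset1"),
    ("ex:dataset1", "ex:measures", "ex:gene1"),
    ("ex:dataset1", "ex:derivedFrom", "ex:sampleSet1"),
    ("ex:sampleSet1", "ex:collectedAt", "ex:siteA"),
]


def reachable(start: str, triples: list) -> list:
    """Breadth-first traversal following triples from subject to object."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subj, _pred, obj in triples:
            if subj == node and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen[1:]  # everything reachable, excluding the start node


print(reachable("ex:study1", TRIPLES))
```

In practice a triple store and a query language such as SPARQL would do this work; the loop above only shows why shared identifiers make it possible to walk from a study to its samples and sites.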
Example: OMOP CDM v5 for RWE/RWD
● Observational
healthcare
data
● Fields defined
per domain
● Standardized
Vocabularies
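
The following sketch shows, in a hedged way, what "fields defined per domain" and "standardized vocabularies" mean in practice: a source record is mapped onto a few fields of the OMOP `person` table, and a local code is translated into a standard concept ID. The source field names (`patient_no`, `sex`, `dob`) and the source codes are assumptions for this example; only the target field names and the concept IDs follow the OMOP conventions.

```python
# Assumed source-to-standard mapping for gender values.
# 8507 (MALE) and 8532 (FEMALE) are OMOP standard gender concepts;
# the source codes on the left are invented for this example.
GENDER_MAP = {"M": 8507, "F": 8532}


def to_omop_person(source_row: dict) -> dict:
    """Map a hypothetical source record onto a few OMOP `person` fields."""
    return {
        "person_id": source_row["patient_no"],
        # 0 is the OMOP convention for "no matching concept".
        "gender_concept_id": GENDER_MAP.get(source_row["sex"], 0),
        "year_of_birth": int(source_row["dob"][:4]),
        # Keep the original value so the mapping stays traceable.
        "gender_source_value": source_row["sex"],
    }


row = {"patient_no": 1, "sex": "F", "dob": "1980-05-17"}
print(to_omop_person(row))
```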
cBioPortal: hard to resist value proposition
● 4000+ citations
in literature
● ~20k+ unique
users per
month
● Local instances
deployed in
many pharma
companies
and cancer
centers
Open
Targets
● Integration
of 20+ key
public data
sources for
target
discovery
Forget about AI.
Data & UX matter.
Statement #3
● Decision making by HiPPO (highest paid
person’s opinion) instead of by algorithms
● Make AI developers happy
with relevant FAIR data
● Strong semantics are key,
and standards can help
(e.g. OMOP, CDISC)
● Investments in UX are costly
& you should capitalize on
them (e.g. OpenTargets)
● It’s a great time to build
knowledge graphs!
@keesvanbochove @TheHyveNL
We advance biology and medical
sciences by building and serving
thriving open source communities