Kees van Bochove, Founder, The Hyve
Reuse of R&D data and the
promise of FAIR data lakes
@keesvanbochove
BioDataWorld
Basel, 5 Dec 2019
Outline
1. FAIR Data is about people & change
2. The data lake is a passing phase
3. Forget about AI. Data & UX matter.
The Hyve
We advance biology and medical research…
… by building and serving thriving open source communities.
Services
Professional support for
open source software in
biomedical informatics
➢Software development
➢Data engineering
➢Consultancy
➢Hosting / SLAs
Core values
Share
Reuse
Specialize
Office Locations
Utrecht, The Netherlands
Cambridge, MA, United States
Customer Segments
Pharma
Life Sciences
Healthcare
Fast-growing
Started in 2012
40+ people by now
FAIR Data is
about people &
embracing change
Statement #1
@keesvanbochove @TheHyveNL
The roots of FAIR
►Public-private partnership to advance:
►Open Science
► Sustainability & reuse of data
►Workshop in Leiden in 2014
►Towards a Modular Blueprint ‘Floor-plan’ of a safe
and fair Data Stewardship, Trading and Routing
environment, provisionally called the Data
FAIRPORT
https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602
FAIR Workshop at The Hyve in Utrecht, 2018
http://blog.thehyve.nl/blog/highlights-from-pistoia-alliances-fair-workshop
https://www.sciencedirect.com/science/article/pii/S1359644618303039
15 FAIR principles for (meta)data
http://www.nature.com/articles/sdata201618
Accessible:
A1. standardized protocol
A1.1. open, free and universally implementable
A1.2. authentication and authorization
A2. metadata stay accessible
Reusable:
R1. attributes
R1.1. license
R1.2. provenance
R1.3. community standards
Interoperable:
I1. language for knowledge representation
I2. vocabularies that follow FAIR principles
I3. qualified references to other (meta)data
Findable:
F1. persistent identifier
F2. metadata
F3. metadata - data link
F4. registered or indexed
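As a rough illustration of how some of these principles could be checked on a single dataset description, here is a minimal Python sketch. The DatasetRecord class, its field names and the mapping of fields to principles are assumptions made for this example; they do not come from the FAIR paper or from any existing tool, and only a subset of the 15 principles is covered.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; all field names are illustrative."""
    identifier: str = ""    # F1: persistent identifier (e.g. a DOI or IRI)
    title: str = ""         # F2: descriptive metadata
    data_url: str = ""      # F3: link from the metadata to the data
    indexed_in: list = field(default_factory=list)    # F4: catalogs listing it
    vocabularies: list = field(default_factory=list)  # I2: shared vocabularies
    license: str = ""       # R1.1: usage license
    provenance: str = ""    # R1.2: where the data came from


def missing_fair_fields(record: DatasetRecord) -> list:
    """Return the labels of the principles whose fields are still empty."""
    checks = {
        "F1 persistent identifier": record.identifier,
        "F2 metadata": record.title,
        "F3 metadata - data link": record.data_url,
        "F4 registered or indexed": record.indexed_in,
        "I2 vocabularies": record.vocabularies,
        "R1.1 license": record.license,
        "R1.2 provenance": record.provenance,
    }
    # Empty strings and empty lists both count as "not yet provided".
    return [label for label, value in checks.items() if not value]
```

A check like this only tells whether a field is filled in, not whether its content is any good; judging that remains a data stewardship task.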
The fundamental change behind FAIR
Data management (scope: project)
Data stewardship (scope: organization)
GO-FAIR Initiative Pillars
Why resilience to change matters
● Domain changes and focus shifts: new data types,
new applications, new scientific paradigms etc.
● Organizational changes: M&A, re-orgs, people
moving roles etc.
● Technology changes: new software and hardware
platforms, analysis methods, automation, ML/AI etc.
FAIR Data is
about people &
embracing change
Statement #1
● From data management
to data stewardship
● Implies cultural, process
and technical change
● Data strategy should be
resilient to change
The data lake is a
passing phase
Statement #2
Network architectures
Genomics
England
Research Environment
NHS Trusts
Airlock
Research Community
MedMij Personal Health Apps
The classical monolith
Enterprise
Data Warehouse
ETL
ETL
ETL
Business Intelligence
/ Analytics
The modern (?) monolith
Ingest
Self-service
Pipelines
Analytics
Enterprise Data Lake
Ingestion Team
Data Engineering Team
Unification Team
Search Team
Platform API Team
Analytics Team
Architectural division
Axis of
change
Decentralized data management
● IRI / identifier schemes
● Metadata standards
● Provenance standards
CDO
Data Federation
Oncology
Neuroscience
Development
ClinOps
HCS
Omics platforms
Data science
Preclinical
ADME/Tox
Biomarker dev.
RWD
Epidemiology
● Catalog function
● Data standards
● Entities / data sets
Publish
Advantages of a decentralized FAIR approach
● More resilient to change: no dependency on large central functions
● Allows for an iterative data strategy operationalization (no ‘big bang’
data lake delivery needed, FAIRification can start today and locally)
● No need to shuffle people around to start a big data lake project:
embed informatics and data experts directly in the research and
development teams
● Centralize only standardization functions, decentralize the rest:
empower teams to do their own data science and informatics
● Embrace usage of external data and collaborations, no need to
‘ingest first’ via a central function, but use & link directly
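One way to picture "centralize only standardization functions, decentralize the rest" is a central catalog that holds metadata and enforces the agreed standards, while each team keeps its data where it is. The Python sketch below is a toy model under that assumption; the Catalog class, the required field names and the example IRI prefix are all hypothetical and not taken from any product mentioned in this talk.

```python
# Fields every team must supply when publishing (an assumption for this sketch).
REQUIRED_FIELDS = ("identifier", "title", "owner_team", "location")


class Catalog:
    """Central catalog: stores metadata only, never the data itself."""

    def __init__(self, iri_prefix):
        self.iri_prefix = iri_prefix  # the agreed identifier scheme
        self.entries = {}

    def publish(self, metadata):
        """Register a dataset description after checking the shared standards."""
        missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
        if missing:
            raise ValueError("missing fields: " + ", ".join(missing))
        if not metadata["identifier"].startswith(self.iri_prefix):
            raise ValueError("identifier does not follow the agreed IRI scheme")
        self.entries[metadata["identifier"]] = dict(metadata)

    def search(self, keyword):
        """Return identifiers of datasets whose title contains the keyword."""
        keyword = keyword.lower()
        return sorted(
            identifier
            for identifier, metadata in self.entries.items()
            if keyword in metadata["title"].lower()
        )
```

The "location" field points back to storage owned by the publishing team, so nothing has to be ingested centrally before others can find and link to it.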
The data lake is a
passing phase
Statement #2
● Centralization is a
potential bottleneck and
a barrier for change
● The solution is in
decentralization of
storage, applications etc.
● Standards management
and data federation as
central functions
Forget about AI.
Data & UX matter.
Statement #3
Teams at The Hyve: open source communities
Research Data Management
● FAIR Data Governance consultancy
● Fairspace (meta)data management
Genomics
● Cancer data portal: cBioPortal
● Knowledge base: Open Targets
Health Data Networks
● Data warehouses: tranSMART, i2b2
● Cohort selection: Glowing Bear
● Request Portals: Podium
Real World Data
● Real world evidence: OMOP/OHDSI
● Wearables platform: RADAR-BASE
FAIR Services at The Hyve
● Semantic modelling: creating (meta)data models that allow traversal of
linked data
● Data conformance: choose the right data standard for specific problems,
align with community standards to maximize benefits from the open
science communities and precompetitive collaborations
● Data landscape: create an understanding of existing applications and
data sources in the company and readiness for FAIR
● FAIRification: get started with FAIRifying datasets, defining metadata,
appropriate standards, provenance etc.
● Data catalog: build collaborative environment around data catalog (e.g.
using Fairspace)
Example: OMOP CDM v5 for RWE/RWD
● Observational
healthcare
data
● Fields defined
per domain
● Standardized
Vocabularies
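To make "fields defined per domain" and "standardized vocabularies" a little more concrete, the Python sketch below maps source codes from two systems onto one shared concept and emits a row shaped like a condition record. The column names follow my understanding of the OMOP CDM v5 condition_occurrence table and should be checked against the specification; the concept ids and the mapping table are placeholders invented for this example, not real OMOP vocabulary content.

```python
from datetime import date

# Toy lookup: (source vocabulary, source code) -> standard concept id.
# The ids are placeholders for illustration, NOT real OMOP concept ids.
SOURCE_TO_STANDARD = {
    ("ICD10", "E11"): 1001,
    ("LOCAL", "DM2"): 1001,
}

# By OMOP convention, concept id 0 marks a code that could not be mapped.
UNMAPPED_CONCEPT_ID = 0


def to_condition_occurrence(person_id, vocabulary, code, start):
    """Build one condition row, keeping the original code for traceability."""
    concept_id = SOURCE_TO_STANDARD.get((vocabulary, code), UNMAPPED_CONCEPT_ID)
    return {
        "person_id": person_id,
        "condition_concept_id": concept_id,
        "condition_start_date": start.isoformat(),
        "condition_source_value": code,
    }
```

Because both source codes resolve to the same concept id, a query written once against the standard concept covers records from both systems.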
cBioPortal: hard to resist value proposition
● 4000+ citations
in literature
● ~20k+ unique
users per
month
● Local instances
deployed in
many pharma
companies
and cancer
centers
Open
Targets
● Integration
of 20+ key
public data
sources for
target
discovery
Forget about AI.
Data & UX matter.
Statement #3
● Decision making by HiPPO (highest paid person's opinion)
instead of by algorithms
● Make AI developers happy
with relevant FAIR data
● Strong semantics are key,
and standards can help
(e.g. OMOP, CDISC)
● Investments in UX are costly
& you should capitalize on
them (e.g. OpenTargets)
● It’s a great time to build
knowledge graphs!
We advance biology and medical
sciences by building and serving
thriving open source communities

Bio Data World - The promise of FAIR data lakes - The Hyve - 20191204
