Bio Data World - The promise of FAIR data lakes - The Hyve - 20191204

At the Bio Data World conference in Basel in December 2019, Kees van Bochove, Founder of The Hyve, gave a talk on the reuse of pharma R&D data and on strategies for operationalizing FAIR data at scale.

  1. Reuse of R&D data and the promise of FAIR data lakes
     Kees van Bochove, Founder, The Hyve (@keesvanbochove)
     BioDataWorld Basel, 5 Dec 2019
  2. Outline
     1. FAIR Data is about people & change
     2. The data lake is a passing phase
     3. Forget about AI. Data & UX matter.
  3. The Hyve
     We advance biology and medical research by building and serving thriving open source communities.
     ● Services: professional support for open source software in biomedical informatics (software development, data engineering, consultancy, hosting / SLAs)
     ● Core values: share, reuse, specialize
     ● Office locations: Utrecht, The Netherlands; Cambridge, MA, United States
     ● Customer segments: pharma, life sciences, healthcare
     ● Fast-growing: started in 2012, 40+ people by now
  4. Statement #1: FAIR Data is about people & embracing change (@keesvanbochove @TheHyveNL)
  5. The roots of FAIR
     ► Public-private partnership to advance:
        ► Open Science
        ► Sustainability & reuse of data
     ► Workshop in Leiden in 2014
     ► Towards a Modular Blueprint 'Floor-plan' of a safe and fair Data Stewardship, Trading and Routing environment, provisionally called the Data FAIRPORT
     https://www.lorentzcenter.nl/lc/web/2014/602/info.php3?wsid=602
  6. FAIR Workshop at The Hyve in Utrecht, 2018
     http://blog.thehyve.nl/blog/highlights-from-pistoia-alliances-fair-workshop
     https://www.sciencedirect.com/science/article/pii/S1359644618303039
  7. 15 FAIR principles for (meta)data (http://www.nature.com/articles/sdata201618)
     Findable: F1. persistent identifier; F2. metadata; F3. metadata - data link; F4. registered or indexed
     Accessible: A1. standardized protocol; A1.1. open, free and universally implementable; A1.2. authentication and authorization; A2. metadata stay accessible
     Interoperable: I1. language for knowledge representation; I2. vocabularies that follow FAIR principles; I3. qualified references to other (meta)data
     Reusable: R1. attributes; R1.1. license; R1.2. provenance; R1.3. community standards
  8. The fundamental change behind FAIR: from data management (scope: project) to data stewardship (scope: organization)
  9. GO-FAIR Initiative Pillars
  10. Why resilience to change matters
      ● Domain changes and focus shifts: new data types, new applications, new scientific paradigms etc.
      ● Organizational changes: M&A, re-orgs, people moving roles etc.
      ● Technology changes: new software and hardware platforms, analysis methods, automation, ML/AI etc.
  11. Statement #1: FAIR Data is about people & embracing change
      ● From data management to data stewardship
      ● Implies cultural, process and technical change
      ● Data strategy should be resilient to change
  12. Statement #2: The data lake is a passing phase
  13. Network architectures
  14. Genomics England Research Environment (diagram labels: NHS Trusts, Airlock, Research Community)
  15. MedMij Personal Health Apps
  16. The classical monolith (diagram labels: Enterprise Data Warehouse; ETL, ETL, ETL; Business Intelligence / Analytics)
  17. The modern (?) monolith
      Enterprise Data Lake: Ingest, Self-service, Pipelines, Analytics
      Teams: Ingestion Team, Data Engineering Team, Unification Team, Search Team, Platform API Team, Analytics Team
      Annotations: Architectural division, Axis of change
  18. Decentralized data management
      ● IRI / identifier schemes
      ● Metadata standards
      ● Provenance standards
      Diagram labels: CDO, Data Federation, Publish
      Units shown: Oncology, Neuroscience, Development, ClinOps, HCS, Omics platforms, Data science, Preclinical, ADME/Tox, Biomarker dev., RWD, Epidemiology
      ● Catalog function
      ● Data standards
      ● Entities / data sets
  19. Advantages of a decentralized FAIR approach
      ● More resilient to change: no dependency on large central functions
      ● Allows for an iterative data strategy operationalization (no 'big bang' data lake delivery needed; FAIRification can start today and locally)
      ● No need to shuffle people around to start a big data lake project: embed informatics and data experts directly in the research and development teams
      ● Centralize only standardization functions and decentralize the rest, empowering teams to do their own data science and informatics
      ● Embrace usage of external data and collaborations: no need to 'ingest first' via a central function; use and link directly
  20. Statement #2: The data lake is a passing phase
      ● Centralization is a potential bottleneck and a barrier for change
      ● The solution is in decentralization of storage, applications etc.
      ● Standards management and data federation as central functions
  21. Statement #3: Forget about AI. Data & UX matter.
  22. Teams at The Hyve: open source communities
      Research Data Management
      ● FAIR Data Governance consultancy
      ● Fairspace (meta)data management
      Genomics
      ● Cancer data portal: cBioPortal
      ● Knowledge base: Open Targets
      Health Data Networks
      ● Data warehouses: tranSMART, i2b2
      ● Cohort selection: Glowing Bear
      ● Request Portals: Podium
      Real World Data
      ● Real world evidence: OMOP/OHDSI
      ● Wearables platform: RADAR-BASE
  23. FAIR Services at The Hyve
      ● Semantic modelling: creating (meta)data models that allow traversal of linked data
      ● Data conformance: choosing the right data standard for specific problems and aligning with community standards to maximize benefits from the open science communities and precompetitive collaborations
      ● Data landscape: creating an understanding of existing applications and data sources in the company and of their readiness for FAIR
      ● FAIRification: getting started with FAIRifying datasets, defining metadata, appropriate standards, provenance etc.
      ● Data catalog: building a collaborative environment around a data catalog (e.g. using Fairspace)
  24. Example: OMOP CDM v5 for RWE/RWD
      ● Observational healthcare data
      ● Fields defined per domain
      ● Standardized Vocabularies
  25. cBioPortal: hard to resist value proposition
      ● 4000+ citations in literature
      ● ~20k+ unique users per month
      ● Local instances deployed in many pharma companies and cancer centers
  26. Open Targets
      ● Integration of 20+ key public data sources for target discovery
  27. Statement #3: Forget about AI. Data & UX matter.
      ● Decision making by HiPPO (highest-paid person's opinion) instead of by algorithms
      ● Make AI developers happy with relevant FAIR data
      ● Strong semantics are key, and standards can help (e.g. OMOP, CDISC)
      ● Investments in UX are costly, so you should capitalize on them (e.g. Open Targets)
      ● It's a great time to build knowledge graphs!
  28. We advance biology and medical sciences by building and serving thriving open source communities
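
The decentralized approach in slides 18 to 20 (teams keep their own data and publish metadata to a central catalog that enforces shared standards) can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the names `DatasetRecord`, `Catalog` and `missing_fair_fields`, the choice of fields, and the example IRI and URL are all hypothetical. They are not taken from the talk, from Fairspace, or from any other product mentioned in the slides. The fields loosely mirror a few of the FAIR principles in slide 7 (F1, F2, F3/A1, R1.1, R1.2, R1.3).

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata a team publishes; the data itself stays with the team (hypothetical model)."""
    identifier: str   # F1: persistent identifier, e.g. an IRI
    title: str        # F2: descriptive metadata
    access_url: str   # F3 / A1: where the data can be retrieved over a standard protocol
    license: str      # R1.1: usage license
    provenance: str   # R1.2: where the data came from
    standard: str     # R1.3: community standard the data follows
    owner_team: str   # the decentralized unit that stewards the data


def missing_fair_fields(record: DatasetRecord) -> List[str]:
    """Return the names of metadata fields that are empty, in declaration order."""
    return [name for name, value in asdict(record).items() if not str(value).strip()]


class Catalog:
    """Central catalog function: stores metadata only and enforces the shared minimum."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}

    def publish(self, record: DatasetRecord) -> None:
        """Accept a record from a team, rejecting it if required metadata is missing."""
        problems = missing_fair_fields(record)
        if problems:
            raise ValueError(f"record is missing metadata: {', '.join(problems)}")
        self._records[record.identifier] = record

    def search(self, term: str) -> List[DatasetRecord]:
        """Case-insensitive title search over the published metadata."""
        needle = term.lower()
        return [r for r in self._records.values() if needle in r.title.lower()]


if __name__ == "__main__":
    catalog = Catalog()
    # All values below are made up for illustration.
    catalog.publish(DatasetRecord(
        identifier="https://example.org/dataset/oncology-rwd-001",
        title="Oncology real world data cohort",
        access_url="https://example.org/api/datasets/oncology-rwd-001",
        license="CC-BY-4.0",
        provenance="Exported from a hospital EHR system",
        standard="OMOP CDM v5",
        owner_team="RWD",
    ))
    for hit in catalog.search("oncology"):
        print(hit.identifier, hit.owner_team)
```

In this sketch only the standards check and the index are central. Storage and ownership stay with the publishing team, which is the division of responsibilities argued for in slide 20.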
