How 2019 became the year FAIR landed in biopharmaceutical R&D


At the Pharma IT 2019 conference in London, Kees van Bochove, Founder of The Hyve, gave a talk on how 2019 became the year in which many biopharmaceutical companies had operational programs in place to make data FAIR across the enterprise.

Published in: Science

  1. Kees van Bochove, Founder, The Hyve
     How 2019 became the year FAIR landed in biopharmaceutical R&D
     @keesvanbochove #PharmaTec19
     London, 24 Sep 2019
  2. Outline
     1. FAIR Data is about people
     2. The data lake is a passing phase
     3. Relational data models are back
  3. The Hyve
     We advance biology and medical research by building and serving thriving open source communities.
     Services: professional support for open source software in biomedical informatics
     ➢ Software development
     ➢ Data engineering
     ➢ Consultancy
     ➢ Hosting / SLAs
     Core values: Share, Reuse, Specialize
     Office locations: Utrecht, The Netherlands; Cambridge, MA, United States
     Customer segments: Pharma, Life Sciences, Healthcare
     Fast-growing: started in 2012, 40+ people by now
  4. Statement #1: FAIR Data is about people
     @keesvanbochove @TheHyveNL
  5. The roots of FAIR
     ► Public-private partnership to advance:
       ► Open Science
       ► Sustainability & reuse of data
     ► Workshop in Leiden in 2014
     ► Towards a Modular Blueprint ‘Floor-plan’ of a safe and fair Data Stewardship, Trading and Routing environment, provisionally called the Data FAIRPORT
  6. FAIR Workshop at The Hyve in Utrecht, 2018
  7. GO-FAIR Initiative Pillars
  8. FAIR Data Principles <> People
     ● GO-CHANGE: socio-cultural changes around working together on data: it’s about connecting people to each other’s data
     ● GO-TRAIN: promote awareness of FAIR and teach best practices on how to make your data available to others
     ● GO-BUILD: provide the infrastructure that supports this change
     ● Goes by many names: digital transformation, data-driven, FAIR, silo-breaking etc., but the result is improved (scientific) collaboration
  9. Why resilience to change matters
     ● Domain changes and focus shifts: new data types, applications etc.
     ● Organizational changes: M&A, re-orgs, people moving roles etc.
     ● Technology changes: new software and hardware platforms, analysis methods, automation, ML/AI etc.
  10. Let’s look at one of the 15 principles as an example
      Findable:
      F1. (meta)data are assigned a globally unique and persistent identifier;
      F2. data are described with rich metadata;
      GO-CHANGE
      ● Adapt information processes to systematically acquire, capture and persist metadata
      GO-TRAIN
      ● Work with data and domain experts to define important metadata to capture for all datasets
      GO-BUILD
      ▶ Choose a widely accepted, easy-to-produce machine-readable format for describing metadata (hint: RDFa, JSON-LD etc.)
      ▶ Master metadata management services
      FAIR Maturity Indicators
      ● F2A Structured Metadata
      ● F2B Grounded Metadata
  11. Statement #1: FAIR Data is about people
      ● Connecting people to each other’s data
      ● Changing processes
      ● Supporting change
  12. The classical monolith
      [Diagram labels: Enterprise Data Warehouse; ETL (three times); Business Intelligence / Analytics]
  13. The modern (?) monolith
      [Diagram labels: Ingest Self-service Pipelines Analytics; Enterprise Data Lake]
      [Teams: Ingestion Team, Data Engineering Team, Unification Team, Search Team, Platform API Team, Analytics Team]
      [Annotations: Architectural division; Axis of change]
  14. Network architectures
  15. Decentralized data management
      ● IRI / identifier schemes
      ● Metadata standards
      ● Provenance standards
      [Diagram labels: CDO; Data Federation; Publish]
      [Teams: Oncology, Neuroscience, Development, ClinOps, HCS, Omics platforms, Data science, Preclinical, ADME/Tox, Biomarker dev., RWD, Epidemiology]
      ● Catalog function
      ● Data standards
      ● Entities / data sets
  16. Advantages of a decentralized FAIR approach
      ● More resilient to change: no dependency on large central functions
      ● Allows for an iterative data strategy operationalization (no ‘big bang’ data lake delivery needed, FAIRification can start today and locally)
      ● No need to shuffle people around to start a big data lake project: embed informatics and data experts directly in the research and development teams
      ● Centralize only standardization functions, decentralize the rest: empower teams to do their own data science and informatics
      ● Embrace usage of external data and collaborations: no need to ‘ingest first’ via a central function, but use & link directly
  17. Statement #2: The data lake is a passing phase
      ● Centralization is a potential bottleneck and a barrier for change
      ● The solution is in decentralization of storage, applications etc.
      ● Standards management and data federation as central functions
  18. Teams at The Hyve: open source communities
      Research Data Management
      ● FAIR Data Governance consultancy
      ● Fairspace (meta)data management
      Genomics
      ● Cancer data portal: cBioPortal
      ● Knowledge base: Open Targets
      Health Data Networks
      ● Data warehouses: tranSMART, i2b2
      ● Cohort selection: Glowing Bear
      ● Request Portals: Podium
      Real World Data
      ● Real world evidence: OMOP/OHDSI
      ● Wearables platform: RADAR-BASE
  19. FAIR Services at The Hyve
      ● Semantic modelling: creating (meta)data models that allow traversal of linked data
      ● Data conformance: choose the right data standard for specific problems, align with community standards to maximize benefits from the open science communities and precompetitive collaborations
      ● Data landscape: create an understanding of existing applications and data sources in the company and readiness for FAIR
      ● FAIRification: get started with FAIRifying datasets, defining metadata, appropriate standards, provenance etc.
      ● Data catalog: build a collaborative environment around a data catalog (e.g. using Fairspace)
  20. Example: OMOP CDM v5 for RWE/RWD
      ● Observational healthcare data
      ● Fields defined per domain
      ● Standardized Vocabularies
  21. cBioPortal: hard to resist value proposition
      ● 4000+ citations in literature
      ● ~20k+ unique users per month
      ● Local instances deployed in many pharma companies and cancer centers
  22. Statement #3: Relational data models are back
      ● RDBMS abandoned in favor of NoSQL, ‘schemaless’, ‘we use ElasticSearch’ etc.
      ● But some applications need strong (relational) semantics (e.g. CDISC)
      ● Descriptions can be in relational db (e.g. OMOP), RDF, JSON-LD etc.
      ● Underlying infrastructure doesn’t matter as long as it does not leak abstractions
  23. We advance biology and medical sciences by building and serving thriving open source communities
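
Slide 10 pairs principles F1 and F2 with a GO-BUILD hint to describe metadata in a machine-readable format such as JSON-LD. The following is a minimal sketch of what such a record might look like, not something shown in the talk. The schema.org `Dataset` vocabulary, the `https://example.org/dataset/` base IRI, the function name `make_dataset_metadata`, and the sample values are all assumptions chosen for illustration.

```python
import json
import uuid


def make_dataset_metadata(title, description, keywords,
                          base_iri="https://example.org/dataset/"):
    """Build a JSON-LD style record describing one dataset.

    base_iri is a hypothetical namespace. A UUID makes the identifier
    globally unique (F1); it is only persistent if the organization
    commits to keeping the IRI stable and resolvable.
    """
    identifier = base_iri + str(uuid.uuid4())
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": identifier,          # F1: globally unique identifier
        "identifier": identifier,
        "name": title,              # F2: descriptive metadata
        "description": description,
        "keywords": keywords,
    }


record = make_dataset_metadata(
    title="Example biomarker study",  # illustrative values only
    description="Illustrative dataset description.",
    keywords=["biomarker", "example"],
)
print(json.dumps(record, indent=2))
```

Because the record is plain JSON, any team can produce it with standard tooling, which is the "easy to produce" property the slide asks for.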
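
Slides 15 to 17 argue for keeping only standards and federation central while teams hold their own data. A rough sketch of that split, under stated assumptions: the class names `Catalog` and `DatasetEntry`, the IRI prefix, the team names used as path segments, and the storage locations are all hypothetical, and a real catalog (for example one built on Fairspace) would look different.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    iri: str          # identifier minted under the central IRI scheme
    title: str
    owner_team: str
    location: str     # where the owning team keeps the data


@dataclass
class Catalog:
    """Central function: holds the identifier scheme and metadata only."""
    iri_prefix: str
    entries: dict = field(default_factory=dict)

    def publish(self, team, slug, title, location):
        # The catalog records where data lives; it never copies the data.
        iri = f"{self.iri_prefix}{team}/{slug}"
        if iri in self.entries:
            raise ValueError(f"identifier already in use: {iri}")
        entry = DatasetEntry(iri, title, team, location)
        self.entries[iri] = entry
        return entry

    def search(self, term):
        term = term.lower()
        return [e for e in self.entries.values() if term in e.title.lower()]


catalog = Catalog(iri_prefix="https://example.org/data/")
catalog.publish("oncology", "tumor-panel-1", "Tumor sequencing panel",
                "s3://oncology-bucket/tumor-panel-1")
catalog.publish("clinops", "site-metrics", "Clinical site metrics",
                "postgres://clinops-db/site_metrics")

hits = catalog.search("tumor")
print([(h.iri, h.owner_team, h.location) for h in hits])
```

In this sketch a team can publish without any central ingestion step, and consumers follow `location` to the owning team's storage, which mirrors the "use & link directly" point on slide 16.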
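
Slides 20 and 22 point to OMOP CDM v5 as a relational model with per-domain fields and standardized vocabularies. The sketch below shows the idea with heavily simplified versions of three CDM tables in an in-memory SQLite database. The column subset, the sample patient, and the concept rows are illustrative assumptions and should be checked against the official CDM specification and vocabularies before any real use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified subset of OMOP CDM v5 tables (many columns omitted).
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    domain_id     TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL
);
CREATE TABLE person (
    person_id         INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    year_of_birth     INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id    INTEGER NOT NULL REFERENCES concept(concept_id),
    condition_start_date    TEXT NOT NULL
);
""")

# Illustrative rows; concept ids and names are examples, not verified here.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?)", [
    (8532, "FEMALE", "Gender", "Gender"),
    (201826, "Type 2 diabetes mellitus", "Condition", "SNOMED"),
])
conn.execute("INSERT INTO person VALUES (1, 8532, 1970)")
conn.execute("INSERT INTO condition_occurrence VALUES (1, 1, 201826, '2019-01-15')")

# Coded fields are resolved through the shared vocabulary table.
rows = conn.execute("""
    SELECT p.person_id, c.concept_name
    FROM condition_occurrence AS co
    JOIN person  AS p ON p.person_id  = co.person_id
    JOIN concept AS c ON c.concept_id = co.condition_concept_id
    WHERE c.domain_id = 'Condition'
""").fetchall()
print(rows)
```

Every coded field joins to the same `concept` table, so a query written against one OMOP-conformant source can in principle run against another, which is the kind of strong relational semantics slide 22 refers to.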