
PA webinar on benefits & costs of FAIR implementation in life sciences


Slides from the Pistoia Alliance Debates webinar, in which a panel of experts from technology support providers and the biopharma industry was invited to share their views on the "Benefits and costs of FAIR implementation for the life science industry".

Published in: Data & Analytics

  1. Benefits and costs of FAIR implementation for the life sciences industry. Moderated by: Ian Harrow (Pistoia Alliance). Panelists: James Malone (SciBite), Filip Pattyn (OntoForce), Alexandra Grebe de Barron (Bayer), Drashtti Vasant (Bayer).
  2. This webinar is being recorded.
  3. ©PistoiaAlliance FAIR Guiding Principles at a Glance. Findable: • F1 (meta)data are assigned a globally unique and persistent identifier • F2 data are described with rich metadata • F3 metadata clearly and explicitly include the identifier of the data it describes • F4 (meta)data are registered or indexed in a searchable resource. Accessible: • A1 (meta)data are retrievable by their identifier using a standardized communications protocol • A1.1 the protocol is open, free, and universally implementable • A1.2 the protocol allows for an authentication and authorization procedure, where necessary • A2 metadata are accessible, even when the data are no longer available. Interoperable: • I1 (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation • I2 (meta)data use vocabularies that follow FAIR principles • I3 (meta)data include qualified references to other (meta)data. Reusable: • R1 (meta)data are richly described with a plurality of accurate and relevant attributes • R1.1 (meta)data are released with a clear and accessible data usage license • R1.2 (meta)data are associated with detailed provenance • R1.3 (meta)data meet domain-relevant community standards. Source: Wilkinson MD et al. (2016), The FAIR Guiding Principles for scientific data management and stewardship.
  4. Poll Question 1: Where is your workplace? A) A biopharmaceutical company B) An agriculture or food company C) A technology provider D) An academic institution E) Other
  5. Poll Question 2: How mature is FAIR implementation in your workplace? A) Minimal understanding of FAIR guidelines B) Good understanding but minimal FAIR implementation C) FAIR implementation is well underway D) Mature FAIR implementation in selected areas of my organisation E) Mature and systematic implementation of FAIR across my organisation
  6. Our Expert Panel • James Malone: CTO at SciBite, a semantic technology company. Previously lead ontologist at EMBL-EBI; worked on Open Targets and EBI’s linked data platform. PhD in Machine Learning in Bioinformatics. • Alexandra Grebe de Barron: IT Business Partner for Real World Evidence at Bayer. Works closely with scientists across all functions to make data FAIR for advanced analytics. PhD in Molecular Genetics. • Filip Pattyn: Scientific Lead at ONTOFORCE, a semantic technology company. Previously a consultant at Menapi Informatics; worked on ICT and bioinformatics. PhD in Applied Informatics in Medical Sciences. • Drashtti Vasant: IT Business Partner for Translational Sciences at Bayer. Currently leading a project to enable data integration of pre-clinical studies. Previously worked at the European Bioinformatics Institute and Thomson Reuters. Alexandra and Drashtti are known as the “FAIR Ladies” at Bayer.
  7. Cost-effective FAIR. James Malone (@scibite, @jamesmalone)
  8. More than the sum of its parts
  9. How do we make all of the components actually happen? Where are the pinch points?
  10. The Cost of unFAIR • The cost of not doing FAIR (the cost of lost opportunity) is very high • A May 2018 European Commission cost-benefit report estimated the missed opportunity at more than €10 billion per year • The report suggests that barriers persist: “The fact that the FAIR principles are not common practice yet is due to numerous reasons.” “Despite the significant annual cost…many research performing organisations and infrastructures are still reluctant to apply the FAIR principles and share the datasets because of real or perceived costs, mostly related to time investment and money.”
  11. Across Industries • Life sciences is a good starting point because so much of its data is open • But it is not just a life science problem • The problem persists even within organisations that do not open their data
  12. Technical Debt • There is a lot of historic data with intrinsic value • Q: Is tomorrow’s data always going to be more valuable than today’s? • Automating as much of this as possible seems to be the sweet spot for historic data • Retrospective manual curation is expensive and likely impossible: • much of the metadata is missing • data generators have moved on • commercial technology is no longer supported • These challenges teach us why prospective FAIR is valuable.
  13. Budgeting for Serendipity • Structuring data for reuse should open up possibilities we can’t conceive of today • Ishino et al. (1987) reported repeat sequences accidentally cloned as part of gene sequencing work • Mojica et al. (1993) is often cited as the first publication on CRISPR, but the authors made the connection with Ishino’s work after ‘trawling the literature’ • Value of hypothesis-free plus hypothesis-driven research • Data needs to be ‘broadly reusable’ to increase the opportunity now and in the future
  14. Cost of Representing Biology • “Machine-readable” representations get very complex, very quickly • It is very hard to know future uses up front, so what do we represent?
  15. The EBI RDF Linked Data Platform • Spectrum of semantics (RDF vs. OWL modeling): knowing the future use up front is very hard • FAIR is not simply a ‘rebranding’ of the semantic web (Mons et al., 2017) • What can we justifiably simplify, and what cannot be simplified? • Coordination took real effort (plus other costs to transform and maintain the data) • Significant coordination activity even across just six groups (with the big advantage that UniProt RDF already existed and we had previously worked on Atlas RDF) • It was only achievable on a minimal budget because the data were already well annotated • (That does not mean we shouldn’t try.)
  16. Cost of Culture Change • Curation has always been an underfunded, underappreciated research activity • Most value is seen in producing data, summary analyses and actionable insights • Peer review already has ‘issues’ • Investing in technology is necessary but not sufficient • People require investment • Involve data generators in these conversations
  17. FAIR as a Machine Learning Enabler • Creating and wrangling training data is one of the biggest parts of ML • Labeled training and test sets are a crucial step and need to be generated or obtained • Creating a new data set (e.g. subsetting a few different sets to create a new one) is also expensive • FAIR can help to: 1. get you the data in the first place 2. help you understand how you can use it (i.e. what the license is) 3. perform feature extraction by making features more readily extractable 4. incorporate domain heuristics (e.g. from ontologies used to describe the data)
  18. Cost-effective ways to think about FAIR • Ask the third-party vendors you use whether they support FAIR (and how), from technology providers through to CROs • Agree on your metadata standards across the organization and stick to them • Involve data generators in your discussions(!) • If you or your group are wrangling data for machine learning, think about ‘putting back’ the clean-up you do • Let any license/data usage terms live with the data • If you are developing knowledge graphs, think about the schema you design • For data capture, think about hooking up to existing ontology standards where suitable • Automate annotation where feasible using technology • Relative cost: $ to $$
  19. Increasingly FAIR • Ensure FAIR data is shared across an organization to demonstrate value • Fund public curation in support of FAIR • Use FAIR-compatible metadata in ELNs • Mandate minimum metadata for every experiment (requires automated FAIR metric tests) • Relative cost: $$ to $$$
  20. Each step brings cost and benefit: the objective should be to achieve the resolution you need to make sense of the data
  21. FAIRness as a cost-based measurement: How do we assess the FAIRness of a data source? When is a dataset FAIR enough? Filip Pattyn, PhD
  22. Simple as counting the principles? [Checklist: each of the 15 principles (F1, F2, F3, F4, A1, A1.1, A1.2, A2, I1, I2, I3, R1, R1.1, R1.2, R1.3) is ticked off for Data source 1 and for Data source 2, with a total count for each]
  23. Difficult to compare
  24. How to measure FAIRness? • Measuring FAIRness: a clear definition of what is being measured and why one wants to measure it; a description of what counts as a valid result and how one obtains it, so that the measurement is reproducible • Qualities of a good measurement: able to distinguish differences
  25. What’s the rationale behind FAIR? • (Re-)use data for multiple purposes • What’s the impact for the end user? Who’s the audience? • More FAIRness should mean fewer hurdles to solving a use case
  26. When is a dataset FAIR or FAIR enough? • Propagation of FAIRness: I2. (meta)data use vocabularies that follow FAIR principles
  27. More FAIR means less effort • What effort is needed to make a data source more FAIR, so that one can solve a single use case or multiple use cases? • Effort quantified as a cost: time; human and machine resources • Unit of measure: price ($) • Potential to calculate the return on investment (ROI) of FAIR data: who benefits when a data source is more FAIR? Those who no longer have to make that effort.
  28. More FAIR means less effort [Diagram: a chain of data transformations leading to an application, graphical UI or API, labelled “more FAIR”]
  29. Different types of effort [Diagram: data transformations leading to an application, graphical UI or API, labelled “more FAIR”]
  30. FAIR enough means less effort [Diagram: application, graphical UI, API] • ROI of FAIR-enough data • Data consumers can: solve use cases that couldn’t be solved before; solve use cases with much less effort
  31. FAIR enough to bring value [Chart: cost over time; labels: 1st effort, maintenance, 2nd effort, value from 1st effort, value from 1st & 2nd effort, end users, data scientists]
  32. Food for thought: price vs. time of data transformations • Unit of cost: faster, by a more expensive skilled data scientist; slower, by a less expensive junior data scientist; manual vs. automated [Chart: resources vs. time, ranging from “fast but expensive” to “slow but inexpensive”]
  33. Food for thought: evolution of a data source’s FAIRness [Chart: FAIRness ($) over time; two data generation events, each followed by initial use cases (A, D) and new use cases (B, C, E, F), with technological advancements marked]
  34. FAIRness as a cost-based measurement • Pragmatic, no over-engineering • Use-case- and user-oriented … and dependent, so not fixed • Ratio scale • Calculate the ROI of FAIRness • Consensus units of cost. Hans Constandt, Bérénice Wulbrecht, Kenny Knecht PhD, Paul Vauterin PhD, Filip Pattyn PhD, +32 486 739 129
  35. Benefits and costs of FAIR implementation for the life sciences industry. Pistoia Alliance Debates, May 2019. FAIR ladies: Alexandra Grebe de Barron, Drashtti Vasant
  36. Why FAIR in Pharma [Figure: scientific discovery (chemical structure) and medical care (EHR), linked by AI] Digitalization to overcome the gap towards translational medicine
  37. Not having FAIR research data costs the European economy €10.2 to 26 bn every year, 3 to 9% of all research expenditure (written by PwC: detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1). Impact on research activities: • Time spent on data collection, integration, analysis, registration, publication and indexing • Cost of storage for duplicated data • Licence fees due to lack of open access to FAIR data. Impact on collaboration: • Redundant research • Lack of clarity about licenses and data use conditions • Cross-fertilization. Impact on innovation: • Develop innovative services • Create new business models • Number of patents filed • Use of machine science • Job creation. Allocating 2.5% of R&D expenditure to FAIR implementation would yield a positive ROI.
  38. When are we done with it? Never, as long as we innovate. How much does it cost to make all our R&D data FAIR by 2022? It depends on the use case. Cost of FAIR implementation: • Make legacy data FAIR • Make data generation FAIR • Create awareness, educate, change mindset, incentivise • Set up a FAIR ecosystem
  39. FAIR ecosystem: deliverables of a FAIR data service team, as described in the FAIR action plan • FAIR digital objects: data/metadata, software/code/algorithms, protocols, models, licenses, other research outputs • FAIR components: skills and investment, policies, data management plans (DMPs), persistent identifiers, standards, metrics • FAIR services: curation and stewardship, data lifecycle management, long-term preservation, file format transformation, data protection/security, handover plans for discontinued services
  40. Skills needed to support the implementation of FAIR: the FAIR data service team • Business Analyst (strategic mindset) • Curator/Domain Expert • Service Engineer/Developer • Data/Ontology Engineer • Product Manager • System Architect • Data Steward • Data Scientist
  41. Business Value: Benefits of FAIR Data • Innovation • Better prediction • Reduced trial length • Early market access • Generate insights
  42. PORTIN: Bayer Case Study 1. Game changer within translational data integration. A platform for access to clinical, biomarker and biosample data from Bayer-sponsored interventional and non-interventional clinical trials. Easy access to all available clinical, biomarker and biosample data. All data are semantically integrated within a common repository. Data privacy questions and informed consents are considered appropriately and in context. PORTIN is agnostic to data sources, types, or variety of data owners. It enables scientists to search for patient cohorts within or across studies. Reduced FTEs, additional revenue generated (insights) and savings on hardware costs = ~€3 million p.a. until phase 2 (predicted profit: €350 million after phase 3)
  43. IMI eTox Case Study 2. The eTOX project broke ground in that it enabled pharmaceutical companies to share their data on the toxicity of drug-like compounds for the first time on a large scale. This resulted in the creation of a large database, which can now be mined for further insights, including predictions on whether or not a particular compound is likely to have an adverse effect on patients. Tox study data and in silico models are expected to: • enable a 10% spend reduction for 1% of INDs, and enable better decisions • enable a 10% spend reduction for 10% of target and candidate selection and lead optimization • Overall expected impact in 5 years = ~€82 million [Diagram: IMI impact model relating value of R&D project, direct product outputs, investment into IMI projects, probability of success, development cost, and reach, relevance, reputation]
  44. IMI AETIONOMY Case Study 3. The AETIONOMY consortium chose to seek molecular characteristics of Alzheimer’s disease (AD) and Parkinson’s disease (PD) that might contribute to a ‘taxonomy’ of these conditions, and help our community move towards a precision-medicine approach. The project has developed innovative computational tools to manage and interpret the complex healthcare and research data environment. • Identified groups of patients that differ significantly from each other • New information about both diseases • Insights into new disease models • Evaluate new data mining approaches • Validate new mechanistic disease hypotheses
  45. Summary • Research is a key driver of productivity and economic growth • Redundant research does not contribute to science • Collaboration, especially public-private, is the KEY to successful research output and innovation • The time, storage and license-fee costs that researchers incur to manually read and understand metadata could be reduced to almost zero by FAIR data • A sustainable FAIR ecosystem is the foundation for advanced data analytics and AI • Change in mindset is a battle half won • Data and service providers need to be part of the change • Puts the “patient” in the center
  46. Audience Q&A: please use the Question function in GoToMeeting.
  47. Data for AI Models: The Past, The Present, The Future. Join us for the next Pistoia Alliance AI Center of Excellence webinar, presented by Prof. John Overington, CIO of the Medicines Discovery Catapult, on Thursday June 6th, 11 am EST / 4 pm GMT.
  48. @pistoiaalliance
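Slide 3 (FAIR Guiding Principles at a Glance) is easier to picture with a concrete record. The sketch below assumes JSON-LD with the schema.org `Dataset` vocabulary, which the slide does not prescribe. The identifier, license choice and provenance values are hypothetical placeholders, and the comments map each field to the principle it is meant to support. It is an illustration only: retrieval over a standard protocol (A1) and indexing (F4) would be handled by surrounding infrastructure that is not shown.

```python
import json

# Hypothetical metadata record for one dataset, written as JSON-LD with the
# schema.org vocabulary. Every identifier and value below is a placeholder.
DATASET_ID = "https://example.org/dataset/0001"  # hypothetical persistent identifier

record = {
    "@context": "https://schema.org",  # I1/I2: shared, broadly applicable vocabulary
    "@type": "Dataset",
    "@id": DATASET_ID,                 # F1: globally unique, persistent identifier
    "identifier": DATASET_ID,          # F3: metadata names the data it describes
    "name": "Example assay results",   # F2/R1: rich, descriptive metadata
    "description": "Dose-response measurements for an example compound series.",
    "keywords": ["dose response", "assay"],
    "license": "https://creativecommons.org/licenses/by/4.0/",  # R1.1: usage license
    "creator": {"@type": "Organization", "name": "Example Lab"},  # R1.2: provenance
    "dateCreated": "2019-05-01",       # R1.2: provenance
}

# A standard, machine-readable serialization; for F4 this text would then be
# registered in a searchable index (not shown).
serialized = json.dumps(record, indent=2)
print(serialized)
```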
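Slide 15 notes that the EBI RDF Linked Data Platform depended on significant coordination between groups. The following minimal sketch shows why that coordination pays off. It is written in plain Python rather than with an RDF library, and all URIs and predicate names are hypothetical. Because the two triple sets share a gene URI, a query can walk from protein to gene to tissue with no mapping step. A real platform would use RDF and SPARQL instead.

```python
# Two hypothetical datasets as (subject, predicate, object) triples. Both use
# the same gene URI, so they can be joined without any mapping table.
GENE = "http://example.org/gene/GENE1"        # hypothetical shared identifier
PROTEIN = "http://example.org/protein/PROT1"  # hypothetical protein identifier

protein_triples = [
    (PROTEIN, "encodedBy", GENE),
]
expression_triples = [
    (GENE, "expressedIn", "http://example.org/tissue/liver"),
    (GENE, "expressedIn", "http://example.org/tissue/kidney"),
    ("http://example.org/gene/GENE2", "expressedIn", "http://example.org/tissue/brain"),
]

def tissues_for_protein(protein, proteins, expression):
    """Follow protein -> gene (first dataset) -> tissue (second dataset)."""
    genes = {o for s, p, o in proteins if s == protein and p == "encodedBy"}
    return sorted(o for s, p, o in expression if s in genes and p == "expressedIn")

print(tissues_for_protein(PROTEIN, protein_triples, expression_triples))
# ['http://example.org/tissue/kidney', 'http://example.org/tissue/liver']
```

The join itself is trivial. Arguably the expensive part, as the slide says, is getting several groups to agree on the shared identifiers in the first place.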
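Slide 17 lists several ways FAIR can help machine learning, including getting the data, knowing its license and using ontology annotations. This minimal sketch covers those three points. It assumes a hypothetical catalogue in which each dataset carries a license string and ontology term IDs; `DatasetMeta`, `REUSABLE_LICENSES` and the `EX:` terms are all invented for illustration. Under that assumption, selecting candidate training data becomes a metadata filter rather than a manual search.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMeta:
    """Hypothetical minimal metadata for a candidate training dataset."""
    identifier: str
    license: str
    ontology_terms: set = field(default_factory=set)  # hypothetical term IDs

# Licenses this hypothetical project treats as permitting reuse for ML.
REUSABLE_LICENSES = {"CC-BY-4.0", "CC0-1.0"}

def select_training_sets(catalog, required_term):
    """Keep datasets whose license permits reuse and that are annotated
    with the ontology term the model needs."""
    return [d.identifier for d in catalog
            if d.license in REUSABLE_LICENSES and required_term in d.ontology_terms]

catalog = [
    DatasetMeta("ds-1", "CC-BY-4.0", {"EX:0001"}),
    DatasetMeta("ds-2", "proprietary", {"EX:0001"}),
    DatasetMeta("ds-3", "CC0-1.0", {"EX:0002"}),
]
print(select_training_sets(catalog, "EX:0001"))  # ['ds-1']
```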
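Slide 18 recommends automating annotation where feasible. One simple approach is dictionary-based matching of free text against ontology synonyms, sketched here as an assumption rather than as any vendor's method. The synonym table and `EX:` IDs are hypothetical. Real annotators also handle tokenisation, abbreviations and ambiguity, which this sketch ignores.

```python
import re

# Hypothetical synonym dictionary mapping free-text phrases to ontology IDs.
# In practice the synonyms would be loaded from a published ontology.
SYNONYMS = {
    "alzheimer's disease": "EX:0001",
    "alzheimer disease": "EX:0001",
    "parkinson's disease": "EX:0002",
}

def annotate(text):
    """Return sorted (ontology_id, matched_text) pairs found in text.

    Matching is case-insensitive and restricted to whole words.
    """
    hits = set()
    for synonym, term_id in SYNONYMS.items():
        pattern = r"\b" + re.escape(synonym) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.add((term_id, match.group(0)))
    return sorted(hits)

print(annotate("Cohort with Alzheimer's disease; controls without Parkinson's disease."))
# [('EX:0001', "Alzheimer's disease"), ('EX:0002', "Parkinson's disease")]
```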
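Slide 19 points out that mandating minimum metadata for every experiment requires automated tests. Below is a minimal sketch of such a gate, assuming a hypothetical organisation-wide list of required fields. It only checks that each field is present and non-empty, which is far simpler than published FAIR metrics.

```python
# Hypothetical organisation-wide minimum metadata standard for experiments.
REQUIRED_FIELDS = {"identifier", "title", "creator", "date", "license", "protocol"}

def missing_fields(record):
    """Return required fields that are absent or empty in a metadata record."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))

def passes_minimum_metadata(record):
    """Automated gate: True only if every required field is present and non-empty."""
    return not missing_fields(record)

experiment = {"identifier": "exp-42", "title": "Solubility screen", "creator": "J. Doe",
              "date": "2019-05-14", "license": "", "protocol": None}
print(missing_fields(experiment))           # ['license', 'protocol']
print(passes_minimum_metadata(experiment))  # False
```

Such a check could run when a record is saved (for example, in an ELN), so that incomplete records are flagged at capture time rather than curated retrospectively.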
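Slides 22 and 23 ask whether FAIRness is as simple as counting principles, and conclude that the counts are difficult to compare. This minimal sketch uses two hypothetical checklists to show why. Both data sources tick 8 of the 15 principles, yet they satisfy largely different ones, so an equal total hides very different strengths.

```python
PRINCIPLES = ["F1", "F2", "F3", "F4", "A1", "A1.1", "A1.2", "A2",
              "I1", "I2", "I3", "R1", "R1.1", "R1.2", "R1.3"]

# Hypothetical checklists: which principles each data source satisfies.
source_1 = {"F1", "F2", "F3", "F4", "A1", "A1.1", "A1.2", "A2"}  # findable, accessible
source_2 = {"F1", "F2", "I1", "I2", "I3", "R1", "R1.1", "R1.2"}  # interoperable, reusable

def tick_count(satisfied):
    """Naive FAIRness score: number of principles ticked off."""
    return sum(p in satisfied for p in PRINCIPLES)

print(tick_count(source_1), tick_count(source_2))  # 8 8
# Principles satisfied by one source but not the other:
print(sorted(source_1 ^ source_2))
```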
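Slide 27 proposes quantifying effort as a price (time plus human and machine resources) so that the ROI of FAIR data can be calculated. This minimal sketch makes two assumptions. It uses the conventional ROI formula (savings minus investment, divided by investment), and it assumes savings come from consumers no longer repeating the transformation work. All figures, and the function names `effort_cost` and `fair_roi`, are hypothetical.

```python
def effort_cost(hours, hourly_rate, machine_cost=0.0):
    """Effort expressed as a price: human time plus machine resources."""
    return hours * hourly_rate + machine_cost

def fair_roi(consumers, hours_saved_per_consumer, hourly_rate, fair_investment):
    """ROI of making a source more FAIR, assuming each consumer no longer
    repeats the transformation effort. ROI = (savings - investment) / investment."""
    savings = consumers * effort_cost(hours_saved_per_consumer, hourly_rate)
    return (savings - fair_investment) / fair_investment

# Hypothetical numbers: making the source FAIR costs 150 h at $100/h plus $5,000
# of machine resources; afterwards 10 consumer teams each save 40 h at $100/h.
investment = effort_cost(hours=150, hourly_rate=100, machine_cost=5000)  # 20000
print(fair_roi(consumers=10, hours_saved_per_consumer=40,
               hourly_rate=100, fair_investment=investment))  # 1.0
```

Here the savings total $40,000 against a $20,000 investment, giving an ROI of 1.0. The sketch ignores the recurring maintenance cost shown on slide 31, which would reduce the return over time.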
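Slide 32 contrasts fast but expensive work with slow but inexpensive work. This minimal sketch models that trade-off with hypothetical hours and rates. Given a deadline, it picks the cheapest option that can meet it. The `Option` class and its `fixed_cost` field, which stands in for the up-front cost of automation, are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Option:
    """One hypothetical way to perform the same data transformation."""
    name: str
    hours: float             # elapsed time to deliver
    hourly_rate: float       # cost of the resource per hour
    fixed_cost: float = 0.0  # e.g. building an automated pipeline

    @property
    def cost(self):
        return self.hours * self.hourly_rate + self.fixed_cost

# Hypothetical figures, chosen only to illustrate the trade-off.
options = [
    Option("senior data scientist", hours=20, hourly_rate=150),               # $3,000
    Option("junior data scientist", hours=45, hourly_rate=60),                # $2,700
    Option("automated pipeline", hours=1, hourly_rate=100, fixed_cost=4000),  # $4,100
]

def cheapest_within(options, deadline_hours):
    """Return the cheapest option that finishes within the deadline, or None."""
    feasible = [o for o in options if o.hours <= deadline_hours]
    return min(feasible, key=lambda o: o.cost, default=None)

print(cheapest_within(options, deadline_hours=24).name)   # senior data scientist
print(cheapest_within(options, deadline_hours=100).name)  # junior data scientist
```

With a 24-hour deadline the junior option is too slow, so the senior data scientist is cheapest. With 100 hours the junior option wins. The automated pipeline never wins in this single-run comparison because its fixed cost is not spread over repeated runs.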