
FAIR by Design

Themes and objectives:
To position FAIR as a key enabler to automate and accelerate R&D process workflows
FAIR Implementation within the context of a use case
Grounded in precise outcomes (e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership)
To make data actionable through FAIR interoperability
Speakers:
Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science & AI, AstraZeneca
Erik Schultes, International Science Coordinator, GO-FAIR
Georges Heiter, Founder & CEO, Databiology

Published in: Healthcare

  1. Pistoia Alliance Webinar: FAIR by Design, 14 May 2020, 15:00 to 16:30 BST
  2. This webinar is being recorded
  3. Audience Q&A: please use the questions box
  4. Introduction: Ian Harrow, Project Manager, Pistoia Alliance
  5. Themes and Objectives • To position FAIR as a key enabler to automate and accelerate R&D process workflows • FAIR implementation by design within the context of a use case • Grounded in precise outcomes, e.g. faster and bigger science / more reuse of data to enhance value / increased ability to share data for collaboration and partnership • To make data actionable, especially through FAIR interoperability (©PistoiaAlliance)
  6. Bios. Mathew Woodwark • Head of Data Infrastructure, Tools, Data Science and AI at AstraZeneca • Experienced informatics and information management professional with an established track record of delivery • Combines biological understanding, organisational psychology and technical skills to manage a wide portfolio of complex informatics projects. Erik Schultes • International Science Coordinator for the GO FAIR International Support and Coordination Office in Leiden, The Netherlands • Previously held appointments at Duke University Medical Center and Leiden University Medical Center • Has worked on data-intensive projects in academia and the private sector. Georges Heiter • Founder & CEO of Databiology • Provides biomedical information management and orchestration for the life sciences and healthcare sectors • Enables global distribution of biomedical data, applications and infrastructure.
  7. Agenda (times BST) • 15:00 Introductions & housekeeping (Ian Harrow, Pistoia Alliance) • 15:05 Case Study: AZ’s Science Data Foundation (Mathew Woodwark) • 15:25 FAIR digital objects for automating processes (Erik Schultes) • 15:45 FAIR automation workflows and applications (Georges Heiter) • 16:05 Panel (all speakers; moderator: Ian Harrow) • 16:30 Close
  8. AstraZeneca’s Science Data Foundation: analytics-ready data for machine learning and AI. Mathew Woodwark, Head of Data Infrastructure and Tools, Data Science and AI. Pistoia European Conference, 11 March
  9. AstraZeneca generates and has access to more data than ever before, across the pipeline (target ID, target validation, discovery, pre-clinical, clinical, commercial, post-marketing surveillance) and from genetic & genomic data, patient-centric data, sensors & smart devices, interactive media, healthcare information networks and market data.
  10. A concerted effort is required to shape and govern data, transforming it into a strategic asset: from disconnected internal databases and external sources to data that is FAIR (Findable, Accessible, Interoperable, Reusable). Data domains: ADME, imaging, RWE, in vivo biology, in silico, clinical trial, phenotypic screens, HTS, genomic, pharmacology, toxicology, biomarkers, efficacy, literature, chemistry.
  11. The way we analyse data is changing: connected data allows us to unleash the power of AI. Security/privacy is a key consideration. [Figure: progression from individual data types (today), to connection of data types (<2 years), to algorithmic intelligence (>5 years), across genomics, EHR, sensor/smart devices, market data and interactive media.]
  12. Data Science uses scientific methods, processes and AI algorithms to extract insights from these data. Artificial Intelligence: any process, task or decision where computerised technology may in some way mimic and/or replace human intelligence. Machine Learning: using algorithms to give a computer system the ability to ‘learn for itself’, deriving patterns and rules from the data it is exposed to, as opposed to explicit programming (manual feature extraction). Deep Learning: a type of machine learning mimicking the dense set of interconnections in our brains (automated feature extraction). [Timeline: 1950, 1980, 2010]
  13. AI is a diverse and constantly changing set of disciplines: big data / cognitive computing; robots / automation; sensors / IoT; NLP / NLG / NLU; computer vision / image processing; neural networks / deep learning; statistical / machine learning; chatbots / assistants. AI is any process, task or decision whereby a computerised technology may in some way mimic and/or replace human intelligence.
  14. Opportunities to extract scientific insights using Data Science and Artificial Intelligence (AI) exist across R&D: target identification (less attrition); trial optimization (faster and more efficient); imaging (less time); personalised medicine (the right medications for the right patients); clinical (real-time data, innovative trials). [Figure values: 10%, 30%, 30%; *statistics are for illustrative purposes only.] Machine learning • visual analytics • advanced statistics • neural networks • data exploration • signal processing • natural language processing • mathematical modelling • knowledge representation. Data access • standards • data strategy • training & awareness • partnership management. Deeper and more sophisticated scientific insights in patients, medicines & disease.
  15. Opportunities to extract scientific insights using Data Science and AI exist across R&D: genomics, personalised medicine, disease understanding, drug design & synthesis, imaging, clinical. Deeper and more sophisticated scientific insights in patients, medicines & disease.
  16. Putting it all together: data sources and core systems combined create a data backbone upon which we can leverage AI-based capabilities.
  17. This is the place where data science and AI impact lives. Our Mission in the Data Science & AI team is to collaborate across R&D to drive innovation through data science and AI: improving our understanding of disease and uncovering new targets; transforming R&D processes; speeding the design and delivery of new medicines for patients. Our Vision is that by 2025, data science and AI will have transformed R&D, enabling AZ to accelerate the delivery of the most life-changing medicines to patients.
  18. CONFIDENTIAL. We use a simple framework to drive innovation in Data Science & AI. Control: developing standards, governance & policies, ensuring trust, privacy and security in data. Organise: processing, formatting, profiling, structuring, capturing meaning in, and relationships between, data. Insight: creating tools and techniques to extract value, make decisions, report, analyse and act on data. Learning: investing in education for all, data science communities, job families, studentships, external comms.
  19. A hub provides strong central capability support (data management, standards & policies; tools & platforms; education & awareness), while R&D functions are spokes providing insights and more, across Control, Organise, Insight and Learning.
  20. Science Data Foundation
  21. The challenge: access to high-quality data is our lifeblood, yet today R&D teams cannot rapidly access and exploit it for re-use. We don’t know what we have or where it is; we only use it once; we can’t compare or combine it; we don’t know what’s valid. Data we own today and emerging data sources owned by others include: AZ clinical trial data (23,000 studies) & imported clinical data; biomarker data; anonymized external data; CGR genomic data; medical image data; real-world evidence data; screening and assay data.
  22. Open by default, compliant by design, insights by your deadline. (BioPharmaceuticals R&D. For Internal Use Only)
  23. Science Data Foundation: democratising data with re-use in mind. What it is: ✓ a collaborative programme between Science IT, DS&AI and R&D; ✓ building enduring capabilities for storing and connecting data sources in a compliant way; ✓ a change management programme encouraging data capture and tagging for re-use; ✓ analytics-ready data for ML and AI, plus the tools, processes and compute environments to drive scientific insight. What it isn’t: ✕ a one-time effort; ✕ a clean-up effort across all R&D data.
  24. Science Data Foundation: a common way to manage R&D data. [Diagram. Upstream processing: sources (biomedical research data, drug design data, patient data, omics data, real world data, literature, AZ documents, competitive intelligence), each with its metadata, feed workflows and indexing into the Science Data Foundation, which is anchored by Master Data as a common language and holds analytics data such as SAR data, reaction data and imaging data. Downstream analysis: Biological Insights Knowledge Graph; data catalogue; data selection for AI; AI orchestration; AI algorithms; metrics & rules (marts); reports & dashboards; ‘fact’ discovery (NLP).]
  25. SDF Programme Outline. Vision: all scientific decision-making in AstraZeneca R&D is supported by or improved through the application of data science. Goal: a scalable and enduring scientific data supply-chain is founded, comprising both technology and services, through which data is made ‘analytics-ready’ and accessible to users through a seamless ‘intent to insight’ workflow. Objectives: • build and operate platforms for hosting at least four key analytical data types that make data ‘FAIR’ • data interconnections support cross-domain exploration and analytics • tools and services to support data science workflows are created • data use is compliant by default due to data governance wrappers. Strategy: R&D data operations and IT platforms will be co-created between Data Science and AI, R&D business units and Science IT, to be operated as enduring capabilities with a focus on making data ‘FAIR’.
  26. SDF Strategic Drivers: SDF is a key enabler of AZ’s Growth Through Innovation Strategy. 01 Accelerate efforts in AI, data and digital: SDF’s biggest tangible value contribution will be to accelerate innovative science through direct enablement of data science workflows and programmes designed to introduce data-driven decision-making. 02 Advance our culture: data lies at the heart of scientific workflows. By democratising data through SDF, we will change our culture to one that is more collaborative and truth-seeking, where decisions are data-driven and where we increasingly perform as an enterprise team. 03 Build and adapt capabilities for the future: through creation of an enduring data supply-chain, SDF will increase AZ’s agility to take advantage of new data analysis methodologies and technologies; incorporate and drive value from new data sources; and actively govern and manage data in response to changing ethical and legal requirements.
  27. Science Data Foundation (SDF) Goals: a foundational programme to enable the Growth Through Innovation Strategy. Generate analytics-ready data: create an enduring supply-chain of data of various types, across the discovery and development pipeline, that will drive scalable and efficient data science operations. Seamless intent to insight: create an efficient and seamless experience throughout the chain of activities scientists undertake to carry out data science, from planning projects and obtaining the data they need, through performing analyses with powerful tools, to finally applying new insights systematically and at scale in R&D pipelines. Ensure compliance by default: introduce governing principles, supported by technology, to minimise risk of data misuse by ensuring compliance with internal and external policies. This allows scientists to focus on innovative science, guided through compliant ‘paved paths’ to R&D data.
  28. Relationships Between SDF Goals. Data sources: operational systems, other data platforms, instruments and external sources. Analytics-ready data: this goal will take data from sources, then standardise, integrate and enrich the information, which is then supplied into the intent-to-insight data-usage process. Intent to insight: this process will create a seamless data-usage process that provides a compliant-by-default path to data use and analytics. Compliance by default: this goal will help define or update policies, assert that the policies adhere to external regulations and ensure that the intent-to-insight process applies
  29. SDF Programme Structure. SDF Leadership and SDF Change & Comms, with workstreams spanning sources and workflows (storage, ingestion, curation, exploration, access, analysis). SDF capability-enabling workstreams (cross-data-type): SDF-Core Data Platform, SDF-Data Policy & Governance, SDF-Data Find & Integrate, SDF-Data Science. These define cross-data-type data-management, data-quality and data-usage policies; set up scientific data management platforms (e.g. reference data management, data catalogue); establish governing procedures that apply policies to SDF processes; and provide cross-cutting capabilities so that all data-type workstreams develop against a consistent, supported data foundation. SDF analytical data-type workstreams (ADD, Patient, Omics, Imaging) prepare and process data and metadata for ingestion into the core data platform. In doing so, the data become accessible according to standard policies and access mechanisms alongside other data types, and standard patterns of exploration and data science are enabled, although data-type workstreams must still develop highly data-specific modes of exploration and analysis (e.g. genome browser and ’omic variant analysis for SDF-Omics).
  30. Goal: Generate Analytics-Ready Data. Description: an enduring supply-chain of data of various types, across the discovery and development pipeline, that will drive scalable and efficient data science operations. Key requirements: • data quality and completeness • machine-readable metadata • a hosting environment that can support access by other systems. Target SDF capabilities: Ingest and cataloguing: data availability triggers automated flow into the hosting environment; automated cataloguing of data on ingest is an enabler of findability; data ingestion can be templated so that new data sources have a low barrier to also becoming hosted. Standardise and improve quality: data can be ‘cleaned up’ by standardising key terms and identifiers; data quality can be measured to help ultimate consumers plan their analyses. Curation and enrichment: enrichment of information and metadata can be both automated and expert-driven, creating greater data reusability and thus value. Data hosting: all data and metadata are hosted in an accessible environment so that information discovery and analytics tools can gain systematic (yet secure) access; full track-and-trace and monitoring of a maximally automated process supports content reporting and auditability. Benefits: ✓ reduce the time and effort data scientists spend assembling data into a single place; ✓ reduce costs associated with lost innovation opportunities due to scientific data being unfindable, unusable or inaccessible to analytics toolsets.
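The "standardise key terms" and "measure data quality" steps can be illustrated with a minimal Python sketch, assuming records arrive as plain dictionaries. The synonym table, the `cell_line` field and the list of required fields are hypothetical examples, not SDF's actual vocabulary or schema; in practice the mappings would come from managed reference data.

```python
# Hypothetical controlled vocabulary: lower-cased variants -> preferred label.
# Illustrative only; a real system would load this from reference/master data.
SYNONYMS = {
    "hek293": "HEK-293",
    "hek 293": "HEK-293",
    "hek-293": "HEK-293",
}

# Hypothetical fields a downstream consumer needs before analysis.
REQUIRED_FIELDS = ("sample_id", "cell_line", "assay")


def standardise(record: dict) -> dict:
    """Return a copy of `record` with `cell_line` mapped to its preferred label.

    Unknown terms are kept (trimmed) rather than dropped, so nothing is lost
    silently; they could be routed to expert curation instead.
    """
    out = dict(record)
    term = out.get("cell_line")
    if isinstance(term, str):
        out["cell_line"] = SYNONYMS.get(term.strip().lower(), term.strip())
    return out


def completeness(records: list) -> dict:
    """Fraction of records with a non-empty value for each required field."""
    if not records:
        return {f: 0.0 for f in REQUIRED_FIELDS}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in REQUIRED_FIELDS
    }


if __name__ == "__main__":
    raw = [
        {"sample_id": "S1", "cell_line": "hek 293", "assay": "qPCR"},
        {"sample_id": "S2", "cell_line": "HEK293", "assay": ""},
    ]
    clean = [standardise(r) for r in raw]
    print([r["cell_line"] for r in clean])  # ['HEK-293', 'HEK-293']
    print(completeness(clean))  # {'sample_id': 1.0, 'cell_line': 1.0, 'assay': 0.5}
```

Publishing the completeness figures alongside the data is one way consumers could "plan their analyses" before requesting a dataset.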
  31. Goal: Seamless Intent to Insight. Description: the chain of activities scientists undertake to plan data science projects, obtain the data they need, perform analyses and finally apply new insights systematically and at scale into R&D pipelines. Target SDF capabilities: Ideation using information discovery: a single point of entry to simplify and lower barriers to data reuse; powerful and intuitive information discovery tools, and connection to other experts, to enable scientific ideation. Register intent and make data request: intent, data and analysis requirements are captured and issued electronically to ensure governance with minimal administration effort. Data is provisioned: data are provided to an analysis team in the desired format and analysis environment; data compliance and security are by default; bespoke data products are also supported. Analysis and insights: powerful analysis environments support data science & AI workflows; insights are captured and traceable to requests; QA is triggered for investment decisions and external publications. Application of insight: insights with potential as BAU decision-support processes trigger further creation of productionised data-analytics pipelines. Full track-and-trace and monitoring of a maximally automated process supports audit and process improvement. Benefits: ✓ reducing the time and effort to generate and administer intent-to-insight activities allows greater scale and lower cost of data reuse; ✓ reduce wasted effort associated with scientifically flawed or non-compliant data reuse requests; ✓ increase analytics capabilities to drive innovation; ✓ improve experience and job satisfaction.
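The "full track and trace" requirement over the five intent-to-insight steps can be pictured as a small state machine that only moves forward and logs every transition. This is a minimal Python sketch under that assumption; `Stage`, `DataRequest` and `advance` are hypothetical names, not part of any SDF system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Stage(IntEnum):
    """The five intent-to-insight steps named on the slide, in order."""
    IDEATION = 1
    INTENT_REGISTERED = 2
    DATA_PROVISIONED = 3
    ANALYSIS = 4
    INSIGHT_APPLIED = 5


@dataclass
class DataRequest:
    requester: str
    purpose: str
    stage: Stage = Stage.IDEATION
    # Audit trail: (UTC timestamp, actor, transition) for every step taken.
    audit: list = field(default_factory=list)

    def advance(self, actor: str) -> Stage:
        """Move to the next stage and record who did it and when.

        Stages cannot be skipped, so every request carries a complete trail.
        """
        if self.stage is Stage.INSIGHT_APPLIED:
            raise ValueError("request has already reached the final stage")
        nxt = Stage(self.stage + 1)
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit.append((stamp, actor, f"{self.stage.name} -> {nxt.name}"))
        self.stage = nxt
        return nxt
```

For example, `DataRequest("alice", "target identification").advance("alice")` returns `Stage.INTENT_REGISTERED` and leaves one audit entry. Forbidding skipped stages is a design choice made here so that every insight stays traceable back to its request.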
  32. Goal: Ensure Compliance by Default. Description: governing principles, supported by technology, to minimise risk of data misuse by ensuring compliance with internal and external policies. This allows scientists to focus on innovative science, guided through compliant ‘paved paths’ to R&D data. Target SDF capabilities: Ethical and legal frameworks: frameworks built into systems and processes are fit for innovation purposes, balancing potentially changing restrictions that prevent misuse with enablement of data science. Manage data standards: information that supports ethical and legal data reusability is machine-readable and can be efficiently managed by the Data Office. Securing our data: host systems are secure from cyber attack and only allow users to perform operations such as data access, copy or movement without increasing risk. Training: provide training on processes and systems so that compliant paths to request and access data are known. Compliance monitoring: active compliance monitoring provides early warning of risks associated with data reuse, helping to target training and remediation. Benefits: ✓ reduce the likelihood of fines associated with legal or ethical misuse of data; ✓ reduce the burden on scientists to become compliance experts, giving them more time to focus on science and leading to increased innovation-based revenue generation.
  33. ‘Intent to Insight’: the process experienced by our customers (end of 2020 target). The Data Office provides a single point of entry for gaining access to data; expert support and maximal automation throughout the process ensure efficiency, yet data compliance by default. Ideation & discovery: for me as a scientist, the Data Office provides a single point of entry to begin a process to exploit our data, plus the information and data exploration tools to drive my scientific creativity and ideation. Intent & request: I am able to find and request data online. The Data Office is on hand to advise me on issues of compliance, and also helps to put me in touch with other experts. From the point of creating my request, I can follow the process easily. Data provisioning: whether you are an expert in AI or visual informatics, or have more scientific than IT expertise, the Data Office will help you get your data to the right analysis environment, including cutting-edge cloud environments. The Data Office can get the data to you in the format you need, and to a place where you can perform your analysis in a compliant and secure way, ranging from systematic data flows to bespoke ‘data products’. Analysis: the Data Office helps to ensure the right quality processes are triggered for investment decisions and external use, meaning that we as customers can focus more on the science. Application of insight: when we’ve generated a promising new exploratory model that could be productionised to drive real value, the Data Office will help us ‘productionise’ the data flow alongside our IT colleagues.
  34. Deliverables, benefits and next steps: new target biology; AI-driven lead optimisation; driving re-use of clinical data: 1,000 studies in 2019, 1 million patients in 2020.
  35. FAIR Digital Objects for Automating Processes, 14 May 2020. Erik Schultes, PhD, International Science Coordinator, GO FAIR International Support and Coordination Office, Leiden Center for Data Science. erik.schultes@go-fair.org • https://www.go-fair.org • http://orcid.org/0000-0001-8888-635X
  36. Automating F, A, I and R. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018, doi: 10.1038/sdata.2016.18 (2016).
  37. https://www.go-fair.org/today/fair-digital-framework/
  38. RDA / GEDE: FAIR Digital Objects. Paris, October 28-29, 2019
  39. https://fairdo.org
  41. FAIR Digital Objects (based on Bonino 2019): a minimal open standard linking the FDO components, with ‘everything else’ TBA. 1) GUPRI resolution service; 2) recursive FDO construction.
  42. FAIR Molecule: a FAIR Digital Object for molecular structure. A minimal open standard links the FDO components (‘everything else’ TBA), with a machine-actionable atom-to-atom configuration.
  43. FAIR Molecule: a FAIR Digital Object for molecular structure. A minimal standard* for a machine-actionable** representation of molecular structure*** that can be the basis of organizing other heterogeneous (meta)data****. * Easy to follow, encourages voluntary adoption. ** FAIR. *** Foundational concept in the chemistry domain. **** Knowlet-like clusters of assertions about molecular structure.
  44. Why FAIR Molecules? • The chemical view of the world is ubiquitous (example: biomedicine) • Chemical data is vast and complex • The rate of chemical data production is high and growing • FAIR solutions are welcomed
  45. FAIR Molecule Hackathon, 21 & 22 January 2020, Hamburg. https://osf.io/ft6wn/
  46. Hackathon Agenda, Tuesday January 21 • 13:00 Lunch • 14:00 Welcome / Overview (Erik) • 14:30 Participant introductions: Rajaram / Kees / Luiz (FDO); Yuliia / Alessa / Nicola (Molecular Structure); Robert / Barbara / Stuart (Conceptual Models); Hao / Folkert (DataBiology); Myles / Erik (use cases); John / Robert (Launch Pads) • 16:00 Break • 16:20 Task organization / discussion • 18:00 Pizza dinner • 19:00 Continue as desired • 22:00 ZBW doors close
  52. Hackathon goal: show FAIR interoperation between data & code
  53. FAIR Digital Object Record. [Diagram: a GUPRI (ePIC) resolves to the FDO record; fdof:hasResourceLocation points to the resource, fdo:digitalObjectOfType to the type (MG File), and fdof:hasMetadata / fdof:isMetadataOf link the record and its extensible metadata.] Record and metadata as shown on the slide (Turtle):
        fdo:digitalObjectOfType fdo:MGFile ;
        fdo:locationOfDO <https://hackathon.fair-dtls.surf-hosted.nl/EL/> ;
        datacite:hasIdentifier :identifier ;
        dct:conformsTo <https://hackathon.fair-dtls.surf-hosted.nl/shacl-record.ttl> .

        # metadata section
        #<http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID702> ;      # Ethanol
        #<http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID5280450> ;  # Linoleic acid
        #<http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID5282184> .  # Ethyl linoleate
        :elMetadata :respresents :molecule .
        :molecule :molecularWeight "308.47"^^:gramsPerMol ;
            skos:prefLabel "Ethyl linoleate" ;
            skos:notation "C20H36O2" ;
            :cas "544-35-4" ;
            <http://semanticscience.org/resource/SIO_000212> <http://dx.doi.org/10.1002/anie.201801332> ;  # is referred to by
            :availableAt <https://www.sigmaaldrich.com/catalog/search?term=ethyl+linoleate&interface=All&N=0&mode=match%20partialmax&lang=en&region=US&focus=product> .

        # provenance
        :elMetadata dct:contributor orcid:0000-0002-8042-4131 .
        orcid:0000-0002-8042-4131 a foaf:Person ;
            foaf:name "Myles Axton" ;
            pro:holdsRoleInTime [ a pro:RoleInTime ;
                pro:withRole scoro:investigator-role ;
            ] .
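To make the record's structure concrete, here is a minimal Python sketch of an FDO record and a toy resolver. The class names, the `hdl:EXAMPLE/ethyl-linoleate` identifier and the in-memory registry are hypothetical; a real GUPRI would be resolved by a PID service (the slide names ePIC), not a Python dictionary. The field values are taken from the record above.

```python
from dataclasses import dataclass, field


@dataclass
class FDORecord:
    """Minimal stand-in for an FDO record (fields mirror the slide's triples)."""
    gupri: str                 # globally unique, persistent, resolvable identifier
    do_type: str               # fdo:digitalObjectOfType, e.g. "fdo:MGFile"
    location: str              # fdo:locationOfDO, where the resource bits live
    conforms_to: str           # dct:conformsTo, the SHACL shape for the record
    metadata: dict = field(default_factory=dict)  # extensible metadata


class Resolver:
    """Toy in-memory resolver; a real deployment would use a PID service."""

    def __init__(self) -> None:
        self._records: dict = {}

    def register(self, record: FDORecord) -> None:
        # A persistent identifier must never be silently rebound.
        if record.gupri in self._records:
            raise ValueError(f"{record.gupri} is already registered")
        self._records[record.gupri] = record

    def resolve(self, gupri: str) -> FDORecord:
        try:
            return self._records[gupri]
        except KeyError:
            raise LookupError(f"unknown GUPRI: {gupri}") from None


resolver = Resolver()
resolver.register(FDORecord(
    gupri="hdl:EXAMPLE/ethyl-linoleate",  # hypothetical identifier
    do_type="fdo:MGFile",
    location="https://hackathon.fair-dtls.surf-hosted.nl/EL/",
    conforms_to="https://hackathon.fair-dtls.surf-hosted.nl/shacl-record.ttl",
    metadata={"prefLabel": "Ethyl linoleate", "notation": "C20H36O2", "cas": "544-35-4"},
))
```

The point of the separation is that a machine holding only the GUPRI can learn the object's type and location, and decide what to do with it, before fetching any bytes.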
  55. FAIR Molecule Hackathon: FDO orchestration. FAIR Molecule 1: • GUPRI • FDO Record • Type: Molecular Graph • Extensible Metadata • Resource: molecular structure. FAIR Molecule 2: • GUPRI • FDO Record • Type: .mol • Extensible Metadata • Resource: molecular structure. FDO for scripts: • GUPRI • FDO Record • Type: file conversion script • Extensible Metadata • Resource: Docker image.
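The orchestration idea (a script FDO that turns a Molecular Graph FDO into a .mol FDO) can be sketched as a search over typed objects. This minimal Python version assumes, hypothetically, that each script FDO declares `input_type` and `output_type` in its extensible metadata; the type strings and identifiers are illustrative, and a breadth-first search is used so that multi-step conversion chains are also found.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FDO:
    gupri: str
    do_type: str        # e.g. "MolecularGraph", ".mol", "ConversionScript"
    resource: str       # e.g. a file location or a Docker image reference
    metadata: dict = field(default_factory=dict)


def plan_conversion(registry, from_type, to_type):
    """Return the GUPRIs of the shortest chain of script FDOs turning
    `from_type` into `to_type`, [] if no conversion is needed, or None
    if the registry offers no route.

    Assumes (hypothetically) that each script FDO declares `input_type`
    and `output_type` in its extensible metadata.
    """
    scripts = [f for f in registry if f.do_type == "ConversionScript"]
    queue = deque([(from_type, [])])
    seen = {from_type}
    while queue:
        current, path = queue.popleft()
        if current == to_type:
            return path
        for script in scripts:
            if script.metadata.get("input_type") == current:
                nxt = script.metadata.get("output_type")
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [script.gupri]))
    return None


registry = [
    FDO("hdl:EXAMPLE/mol-1", "MolecularGraph", "https://example.org/molecule-1"),
    FDO("hdl:EXAMPLE/mol-2", ".mol", "https://example.org/molecule-2.mol"),
    FDO("hdl:EXAMPLE/mg2mol", "ConversionScript", "example/mg2mol:latest",
        {"input_type": "MolecularGraph", "output_type": ".mol"}),
]
```

Here `plan_conversion(registry, "MolecularGraph", ".mol")` yields `["hdl:EXAMPLE/mg2mol"]`; an orchestrator would then pull that script's Docker image and run it on the first molecule's resource.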
  56. FAIR Molecule: established knowledge (chemical informatics); real-world observations (lab automation); virtual-world observations (computer simulations). chemify.org
  57. FAIR Molecules as Digital Twins. https://www.manufacturingleadershipcouncil.com/2019/12/02/digital-twins/
  59. FAIR Molecules as Digital Twins. chemify.org
  60. FAIR Molecule: drug candidates for COVID-19
  62. FDO Hackathon. https://docs.google.com/document/d/1rhUeMmdIf7khn5XAgLW0oBpqp81kjtmmxEC6queRl5A/edit?usp=sharing • https://www.go-fair.org/today/FAIR-funder/
  63. Convergence. [Figure: FAIR Implementation Profiles (FIPs) shown as a binary communities × resources convergence matrix (Resources 1 to 8, grouped under F, A, I and R).] Convergence Matrix: http://www.data-intelligence-journal.org/p/47/ • Reusing FIPs: https://osf.io/8sv5f/
  64. Convergence • FIPs are reusable, which drives convergence • FIPs guarantee interoperation • FIPs inform data stewardship plans: FIPs are the DNA of the DMP. Convergence Matrix: http://www.data-intelligence-journal.org/p/47/ • Reusing FIPs: https://osf.io/8sv5f/
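One way to quantify "convergence" in a communities × resources matrix is to compare which resources each community's FIP declares. The Python sketch below uses Jaccard similarity between pairs of FIPs and counts how often each resource is reused. Both the metric and the example FIPs are assumptions chosen for illustration; they are not taken from the convergence matrix on the slide.

```python
from itertools import combinations

# Hypothetical FIPs: for each community, the set of resources it declares.
FIPS = {
    "community_a": {"resource_1", "resource_2", "resource_5"},
    "community_b": {"resource_1", "resource_2", "resource_6"},
    "community_c": {"resource_3", "resource_4"},
}


def jaccard(a: set, b: set) -> float:
    """Shared resources divided by all resources used by either community."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def convergence(fips: dict) -> dict:
    """Pairwise Jaccard similarity between every pair of communities' FIPs."""
    return {(x, y): jaccard(fips[x], fips[y]) for x, y in combinations(sorted(fips), 2)}


def reuse_counts(fips: dict) -> dict:
    """How many communities declare each resource, most reused first."""
    counts: dict = {}
    for resources in fips.values():
        for r in resources:
            counts[r] = counts.get(r, 0) + 1
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
```

With these example FIPs, communities a and b share two of four resources (similarity 0.5), while c shares none with either. Rising similarity over time, or a few resources dominating `reuse_counts`, would be one signal of the convergence the slide describes.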
  65. Convergence. [Figure: communities × resources convergence matrix (Resources 1 to 8, grouped under F, A, I and R).] Pharma industry challenge: develop a common pre-competitive FIP.
  66. Thank you & Questions
  69. 3 communities building FAIR distributed learning platforms: “FAIR Data Trains”. Barbra Magagna, Umweltbundesamt GmbH; Kristina Hettne, CDS University Library. Choice challenge [figure labels: A, F, R, I]
  70. FAIR automation and FAIR applications. Georges Heiter, May 2020
  71. AUTOMATION. Copyright ©2020. All Rights Reserved. Confidential Databiology Ltd.
  72. Humans are manually involved in every step of the research process. The bulk of energy is still spent on finding and preparing data for analysis. Metadata about digital assets and the operations upon them is mostly not being captured, so research is not easily repeatable or automatable.
  73. Data Usage Challenge (making data actionable). [Diagram: Data Source Models 1 to n and Analysis Data Models 1 to n, connected through a Knowledge Network, compared by domain scope and flexibility.] Data source model: narrow scope; specialized; use-case specific and/or proprietary. Knowledge network: multi-domain; broad & standardized; growing/changing. Analysis data model: cross-domain; specialized; use-case specific and/or proprietary.
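A common reason to route through a shared knowledge network is that n source models and m analysis models then need n + m mappings to shared concepts instead of n × m point-to-point mappings. The Python sketch below illustrates that under stated assumptions: the source names (`lims_v1`, `eln_export`), the `concept:` identifiers and the `dose_response` analysis model are all hypothetical.

```python
# Hypothetical mappings from each source model's field names to shared concepts.
SOURCE_MAPPINGS = {
    "lims_v1": {"smpl": "concept:sample_id", "conc_nM": "concept:concentration_nM"},
    "eln_export": {"SampleID": "concept:sample_id",
                   "Concentration (nM)": "concept:concentration_nM"},
}

# Hypothetical mappings from shared concepts to an analysis model's columns.
ANALYSIS_MAPPINGS = {
    "dose_response": {"concept:sample_id": "sample", "concept:concentration_nM": "dose_nM"},
}


def to_concepts(source: str, record: dict) -> dict:
    """Lift a source record into shared concepts; unmapped fields are dropped."""
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def to_analysis(model: str, concepts: dict) -> dict:
    """Project shared concepts onto the columns an analysis model expects."""
    mapping = ANALYSIS_MAPPINGS[model]
    return {mapping[c]: v for c, v in concepts.items() if c in mapping}


def translate(source: str, model: str, record: dict) -> dict:
    """Source model -> knowledge network concepts -> analysis model."""
    return to_analysis(model, to_concepts(source, record))
```

Adding a new source then means writing one mapping to the shared concepts, after which it is immediately usable by every analysis model already mapped.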
  74. Machine-actionable components as the foundation of the knowledge network
  75. What makes an application FAIR? ▪ Findable with a unique ID, and digitally signed ▪ Accessible in an associated permanent registry ▪ Interoperable because it relies on standards ▪ Reusable as self-contained and fully portable ▪ Software integrity and quality ▪ Customizable
  76. CIAO app: software packaged with metadata (code, aux data, metadata). App metadata: name; version; author; description; inputs; outputs; parameters; license; original source; reference data dependencies. The metadata is integrated in the container. Apps are stored and distributed in a repository with a unique ID: https://hub.databiology.net/app-dbio-blast/tags/2.9.1 (docker pull hub.databiology.net/app-dbio/blast:2.9.1)
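The app metadata listed on the slide maps naturally onto a typed record. This minimal Python sketch assumes the field types and the JSON serialisation; `AppMetadata`, `missing`, `image_ref` and `to_json` are hypothetical names and do not describe Databiology's CIAO format. `image_ref` only reproduces the `repository/name:version` pattern visible in the `docker pull` example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AppMetadata:
    """App metadata fields as listed on the slide; the types are assumptions."""
    name: str
    version: str
    author: str = ""
    description: str = ""
    license: str = ""
    original_source: str = ""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    parameters: dict = field(default_factory=dict)
    reference_data: list = field(default_factory=list)  # reference data dependencies

    def missing(self) -> list:
        """Names of fields left empty; a registry could refuse to publish until this is []."""
        return [k for k, v in asdict(self).items() if v in ("", [], {})]

    def image_ref(self, repository: str) -> str:
        """Compose a versioned image reference following the slide's pattern."""
        return f"{repository}/{self.name}:{self.version}"

    def to_json(self) -> str:
        """Serialise the metadata so it can travel inside the container image."""
        return json.dumps(asdict(self), sort_keys=True)
```

For instance, `AppMetadata("blast", "2.9.1").image_ref("hub.databiology.net/app-dbio")` gives the reference used in the `docker pull` command above, while `missing()` still lists the author, license and other fields that would need filling before the app could count as FAIR.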
  77. 77. CIAO app instance ▪ Links the app to an infrastructure and organizational context
  78. 78. CIAO App run – the app instance execution record ▪ The workunit keeps a record of: − App instance used − App status − Inputs and outputs − Execution versions − Execution logs − Infrastructure used ▪ Keeps data provenance
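The workunit's role in keeping data provenance can be sketched as follows, assuming each run records the datasets it read and wrote: walking back from an output through the runs that produced its inputs yields its lineage. The WorkUnit fields mirror the slide's list, but the class, the dataset names and the lineage helper are hypothetical, and the sketch assumes the run graph has no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkUnit:
    """Hypothetical execution record mirroring the slide's list of recorded items."""
    run_id: str
    app_instance: str
    status: str
    inputs: List[str]
    outputs: List[str]
    versions: Dict[str, str] = field(default_factory=dict)
    logs: List[str] = field(default_factory=list)
    infrastructure: str = ""


def producer_of(dataset: str, runs: List[WorkUnit]) -> Optional[WorkUnit]:
    """Return the run that wrote the dataset, or None if it is a raw input."""
    for run in runs:
        if dataset in run.outputs:
            return run
    return None


def lineage(dataset: str, runs: List[WorkUnit]) -> List[str]:
    """Depth-first list of run IDs upstream of the dataset (assumes an acyclic run graph)."""
    run = producer_of(dataset, runs)
    if run is None:
        return []
    trail = [run.run_id]
    for upstream in run.inputs:
        trail.extend(lineage(upstream, runs))
    return trail


runs = [
    WorkUnit("w1", "align-instance", "completed", ["reads.fastq"], ["aligned.bam"]),
    WorkUnit("w2", "call-instance", "completed", ["aligned.bam"], ["variants.vcf"]),
]
print(lineage("variants.vcf", runs))  # ['w2', 'w1']
```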
  79. 79. CIAO apps evolution – progressive layering of metadata to make apps FAIR ▪ CIAO app: Code, Aux, Data, Metadata ▪ CIAO app instance: the CIAO app plus Storages, Compute, Security, Policies ▪ CIAO app run (Workunit): the CIAO app instance plus Inputs, Outputs, Parameters, Policies, Logs, Versions
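One way to picture the progressive layering, assuming each layer is a plain dictionary, is Python's collections.ChainMap: a lookup on the run falls through to the instance and then to the app, so each layer only adds what it owns. In this sketch a run-level key such as policies shadows the instance-level one; a real system might instead merge policies, and all layer contents here are hypothetical.

```python
from collections import ChainMap
from typing import Any, Dict, Optional

# Hypothetical layer contents; keys follow the slide's labels.
app: Dict[str, Any] = {
    "code": "hub.databiology.net/app-dbio/blast:2.9.1",
    "aux": [],
    "data": [],
    "metadata": {"name": "blast", "version": "2.9.1"},
}
instance: Dict[str, Any] = {
    "storages": ["project-bucket"],
    "compute": "cloud-eu",
    "security": "project-x-role",
    "policies": ["consent:research"],
}
run: Dict[str, Any] = {
    "inputs": ["query.fasta"],
    "outputs": ["hits.tsv"],
    "parameters": {"evalue": 1e-5},
    "policies": ["consent:research", "no-export"],
    "logs": [],
    "versions": {"blast": "2.9.1"},
}


def layered_view(*layers: Dict[str, Any]) -> ChainMap:
    """Outermost layer first: lookups fall through run -> instance -> app."""
    return ChainMap(*layers)


def defining_layer(view: ChainMap, key: str) -> Optional[int]:
    """Index of the first (outermost) layer that defines the key, or None."""
    for index, layer in enumerate(view.maps):
        if key in layer:
            return index
    return None


view = layered_view(run, instance, app)
print(view["compute"])                   # cloud-eu (inherited from the instance layer)
print(defining_layer(view, "policies"))  # 0: the run's policies shadow the instance's
print(defining_layer(view, "code"))      # 2: code lives in the app layer
```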
  80. 80. Machine-actionable policies and secrets: policy-based consent management ▪ Policies make use of metadata − Example: define consent tags on studies, datasets and entities − Key operations on data, applications and infrastructure are subject to policy − Trade-off: granularity vs scalability ▪ Stand-alone policy service − The system landscape enforces policies managed in the policy service − e.g. OPA (https://www.openpolicyagent.org/) ▪ Stand-alone secret management − Facilitates security workflows
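A minimal sketch of policy-based consent management, assuming consent tags are stored as dataset metadata and each key operation requires one tag. Decisions are deny-by-default. The tag names, the REQUIRED_CONSENT table and the decide function are hypothetical; in the architecture on this slide the decision would be delegated to a stand-alone policy service such as OPA, whose Rego language and API are not used here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Dataset:
    name: str
    consent_tags: FrozenSet[str]  # consent metadata attached to the dataset


# Hypothetical policy: the consent tag each key operation requires.
REQUIRED_CONSENT: Dict[str, str] = {
    "analyze": "research",
    "share_external": "external-sharing",
    "commercial_use": "commercial",
}


def decide(operation: str, dataset: Dataset) -> bool:
    """Deny by default: unknown operations and missing consent tags are refused."""
    required = REQUIRED_CONSENT.get(operation)
    return required is not None and required in dataset.consent_tags


def permitted(operation: str, datasets: List[Dataset]) -> List[str]:
    """Names of datasets on which the operation is allowed."""
    return [d.name for d in datasets if decide(operation, d)]


cohorts = [
    Dataset("cohort-a", frozenset({"research"})),
    Dataset("cohort-b", frozenset({"research", "external-sharing"})),
]
print(permitted("analyze", cohorts))         # ['cohort-a', 'cohort-b']
print(permitted("share_external", cohorts))  # ['cohort-b']
print(permitted("delete", cohorts))          # []
```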
  81. 81. Composability
  82. 82. Databiology Approach – Intelligent Automation powered by a Knowledge Network that converges Data, Applications, Infrastructure and Organizations ▪ Knowledge Engine and Orchestration Engine (Intelligent Automation) link DATA (Source 1, Source 2, …, Source n), APPLICATIONS (App 1, App 2, …, App n), INFRASTRUCTURE (compute sites) and PEOPLE, ORGS & POLICIES
  83. 83. Knowledge Engine ▪ Data Modeling − Entity Definition / MDS − Terminology Service (Ontologies) − Policy Service ▪ Data Discovery − Search − Ontology Mapping − Collection Management − Federated Search ▪ Data Ingestion − Aggregation: multi-channel (batch/stream), enrichment, validation, origination (lineage/provenance), persistence − Federation: federated data sources, origination (lineage/provenance), caching
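The terminology service and ontology mapping listed under the Knowledge Engine can be illustrated with a small sketch, assuming a synonym table that resolves free-text labels to ontology term IDs: the query and the records are both normalized to term IDs before matching, so differently worded annotations are found together. The EX:000n identifiers and the records are placeholders, not real ontology terms.

```python
from typing import Dict, List, Optional

# Hypothetical terminology service content: synonym -> ontology term ID.
TERMS: Dict[str, str] = {
    "heart attack": "EX:0001",
    "myocardial infarction": "EX:0001",
    "mi": "EX:0001",
    "type 2 diabetes": "EX:0002",
    "t2d": "EX:0002",
}


def normalize(label: str) -> Optional[str]:
    """Map a free-text label to its ontology term ID, if known."""
    return TERMS.get(label.strip().lower())


def search(query: str, records: List[Dict[str, str]]) -> List[str]:
    """Return IDs of records whose 'condition' resolves to the same term as the query."""
    term = normalize(query)
    if term is None:
        return []
    return [r["id"] for r in records if normalize(r["condition"]) == term]


records = [
    {"id": "r1", "condition": "Heart attack"},
    {"id": "r2", "condition": "MI"},
    {"id": "r3", "condition": "T2D"},
]
print(search("myocardial infarction", records))  # ['r1', 'r2']
```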
  84. 84. Orchestration Engine ▪ Secret Management − Secure credential store − Security workflows to provide secrets to orchestration processes ▪ Workunit Management − Inspection (real-time monitoring) − Monitoring and logging − State control (real-time) ▪ Compute Capacity Management − Provisioning/deprovisioning of dynamic capacity (VMs) − Cloud providers − On-premise technologies ▪ Data Orchestration − Data transport (unstructured data) via transfer protocols − Data projections (entity data), covering ingress and egress of entity data ▪ Application Orchestration − Application registry − Application transport − Dynamic proxying of interactive apps to the user's browser
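The slide lists state control under Workunit Management without naming the states. This sketch assumes a simple lifecycle (queued, provisioning, running, then a terminal state), rejects any transition not in the table, and keeps a history that monitoring could inspect. The states, the transition table and the WorkUnitController class are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Set


class State(Enum):
    QUEUED = "queued"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Hypothetical transition table; terminal states have no outgoing transitions.
TRANSITIONS: Dict[State, Set[State]] = {
    State.QUEUED: {State.PROVISIONING, State.CANCELLED},
    State.PROVISIONING: {State.RUNNING, State.FAILED, State.CANCELLED},
    State.RUNNING: {State.COMPLETED, State.FAILED, State.CANCELLED},
    State.COMPLETED: set(),
    State.FAILED: set(),
    State.CANCELLED: set(),
}


class WorkUnitController:
    """Tracks one workunit's state and refuses illegal transitions."""

    def __init__(self) -> None:
        self.state = State.QUEUED
        self.history: List[State] = [State.QUEUED]

    def transition(self, new_state: State) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


unit = WorkUnitController()
unit.transition(State.PROVISIONING)
unit.transition(State.CANCELLED)
print([s.value for s in unit.history])  # ['queued', 'provisioning', 'cancelled']
```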
  85. 85. Example: a contextually aware research assistant delivers intelligent automation ▪ Automatically routes analyses to, and executes them on, the most suitable infrastructure ▪ Automatically extracts insights and feeds them back into the knowledge graph ▪ Suggests analysis apps based on contextual data, including the user's data selection and previous analysis runs
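Routing to "the most suitable infrastructure" could take many forms. As one hypothetical heuristic, this sketch filters compute sites by available capacity and then prefers the site that minimizes the volume of input data to move, breaking ties by spare capacity. Site names, sizes and the scoring rule are assumptions, not the assistant's actual logic.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical compute site description."""
    name: str
    free_cpus: int
    local_datasets: FrozenSet[str]


def bytes_to_move(site: Site, inputs: Dict[str, int]) -> int:
    """Total size of the inputs that are not already stored at the site."""
    return sum(size for name, size in inputs.items() if name not in site.local_datasets)


def route(inputs: Dict[str, int], cpus_needed: int, sites: List[Site]) -> Optional[Site]:
    """Pick a site with enough CPUs, minimizing data movement, then maximizing spare CPUs."""
    candidates = [s for s in sites if s.free_cpus >= cpus_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (bytes_to_move(s, inputs), -s.free_cpus))


sites = [
    Site("cloud-eu", 64, frozenset()),
    Site("onprem-hpc", 32, frozenset({"genomes"})),
]
inputs = {"genomes": 500, "annotations": 2}  # hypothetical sizes in GB
print(route(inputs, 16, sites).name)  # onprem-hpc
print(route(inputs, 48, sites).name)  # cloud-eu
```

Moving the computation to the data, rather than the reverse, is the design choice this heuristic encodes; a fuller router would also weigh cost, policy and queue length.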
  86. 86. Automation will free researchers to focus on higher-level tasks ▪ Let machines take over manual, labor-intensive functions so that researchers can focus on ideation and creativity, for a LOWER COST PER INSIGHT
  87. 87. Intelligent Automation Value ▪ Increased researcher effectiveness and efficiency • Discover more across a federated knowledge network and collaborate securely • Automation and AI let researchers focus on the science instead of the IT • Always use best-in-class analytics tools to get the most insight out of data ▪ Mitigated risk • Automatic audit trail, provenance and reproducibility • Future-proof: no technology-stack lock-in (a side effect of composability) • Lasting data integration and interoperability (knowledge network) ▪ Lower cost per insight • Higher levels of automation → context-aware assistance • Elimination of duplicated effort • Speed to value measured in weeks, not years
  88. 88. Do You Have Any Questions?
  89. 89. Expert Panel
  90. 90. Prepared Questions: 1. What are the different flavours of FAIR implementation and application for the Life Science industry? 2. What are the low-hanging fruit and the likely challenges of FAIR implementation and application by the Life Science industry? 3. What would a common specification for FAIR digital objects look like, and why is this important for the Life Science industry? ▪ Questions from the audience
  91. 91. Get Involved! Join the FAIR Implementation project: Ian Harrow – Ian.harrow@pistoiaalliance.org ▪ Membership: membership@pistoiaalliance.org ▪ General Enquiries: Zahid Tharia – zahid.tharia@pistoiaalliance.org ▪ www.pistoiaalliance.org
  92. 92. Next Webinar Lab of the Future Thurs, May 21st, 2020, 14:30 – 16:00 BST www.pistoiaalliance.org/webinars-2020/
  93. 93. info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org
