Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Automating the process of continuously prioritising
data, updating and deploying AI models in a robotised
lab for drug dis...
Objective: Accelerate drug discovery using
AI and intelligent design of experiments.
• Predict safety concerns (fail early...
Hypothesis
revise
Insight
• Iterative
• Flexible
• Mostly manual
• Slow
Experiments
Analysis and interpretation
Traditiona...
Data-driven science
Stream Processing
Real- T ime
Analytics
Data Results
Data
Models Evaluation
Prediction/
Insight
Hard p...
Intelligent experimentation
Data
Stream Processing
Real- T ime
Analytics
Data Results
Current fact finding
Analyze data in...
Genetic or
chemical
perturbations
Experiments
in multi-
well plates
Imaging Features Hypotheses
Convolutional Neural Netwo...
Automated cell-based experiments
GPU cluster
CPU server
Storage
Automated plate handling
Informatics system
Current setup
...
Robotized lab
images
cell profiles
CPU/GPU/HPC cloud
Models
OMERO
Chemical DB
link
Notebooks
chemicals
Data
Models
Externa...
•Fluorescent LNPs (lipids)
•Fluorescent Cargo (mRNA)
•Fluorescent Product (protein)
No
LNPs
Partial LNP
uptake
LNP uptake ...
Dealing with large scale data
• High volume, high velocity
• Continuously process data, train models, serve models
• Embra...
Designing a flexible and scalable
informatics system
14
Cloud
HPC
GPU cluster
CPU server
Storage
Master
Worker
Worker
Server1 (KVM) Server2 (KVM) GPU-Server
GPU-Worker
Join external nodes
VPN /
IPSec
Develop Other
Maste...
System configuration, IaaC
Master
Worker
Server1 (KVM) Server2 (KVM) GPU-Server
Kubernetes (administered with Rancher)
Dev...
17
Multi-cluster Kubernetes management
• Kubernetes Setup and
admin in the cluster
• High availability.
Kubernetes extras
...
Continuous Integration and Deployment (CICD)
• IaaC
• Cluster config
• Services
Master
Worker
Server1 (KVM) Server2 (KVM) ...
AI/Machine Learning life cycle
•Structured processing steps
•Governance
•Versioning
•Streamline analysis on high-
performa...
DevOps for machine learning
A systematic method of working that accelerates the path from initial data to production AI se...
Logging and monitoring
http://haste.research.it.uu.se/
FAIR data and services
FAIR1
• Findable
• Semantic service discovery (JSON-LD)
• Accessible
• Web UI
• Programmatic API (O...
Examples of what we serve
Site-of-metabolism and reaction types
http://ptp.service.pharmb.io/
https://metpred.service.phar...
Robotized lab
images
cell profiles
CPU/GPU/HPC cloud
Models
OMERO
Chemical DB
link
Notebooks
chemicals
Data
Models
Externa...
Key components and takeaways
• IaaC: Infrastructure is portable, testable and simplifies maintenance
• Containers: Compone...
Implications: Continuous Analytics
• We can handle the continuous data processing from instruments with
robust, resilient ...
• Easy deployment of virtual infrastructures on IaaS
• Containerize tools, orchestrate microservices with
workflow systems...
Research group website: http://pharmb.io
- Thank you -
Funding:
Automating the process of continuously prioritising data, updating and deploying AI models in a robotised lab for drug dis...
Upcoming SlideShare
Loading in …5
×

Automating the process of continuously prioritising data, updating and deploying AI models in a robotised lab for drug discovery

55 views

Published on

Presentation at Data Innovation Summit 2019 in Stockholm, Sweden.

ABSTRACT
Microscopes are capable of producing vast amounts of data, and when used in automated laboratories both the number and size of images present many challenges for storing, categorizing, analyzing, annotating, and transforming the data into actionable information that can used for decision making; either by humans or machines. In this presentation I will describe the informatics system we have established at the Department of Pharmaceutical Biosciences at Uppsala University, which consists of computational hardware (CPUs, GPUs, storage), middleware (Kubernetes), imaging database (OMERO), and workflow system (Pachyderm) to perform online prioritization of new data, as well as the continuous analytics system to automate the process from captured images to continuously updated and deployed AI models. The AI methodologies include Deep Learning models trained on image data, and conventional machine learning models trained on features extracted from images or chemical structures. Due to the microservice architecture the system is scalable and can be expanded using hybrid-architectures with cloud computing resources. The informatics system serves a robotized cell profiling setup with incubators, liquid handling and high-content microscopy. The lab is quite young and is targeting applications primarily in drug screening and toxicity assessment, with the aim to improve research using AI and intelligent design of experiments.

Published in: Science
  • Be the first to comment

  • Be the first to like this

Automating the process of continuously prioritising data, updating and deploying AI models in a robotised lab for drug discovery

  1. 1. Automating the process of continuously prioritising data, updating and deploying AI models in a robotised lab for drug discovery Ola Spjuth <ola.spjuth@farmbio.uu.se> Department of Pharmaceutical Biosciences, Uppsala University www.pharmb.io Lead Scientist AI at
  2. 2. Objective: Accelerate drug discovery using AI and intelligent design of experiments. • Predict safety concerns (fail early) • Explain drug mechanisms • Screen for new drugs
  3. 3. Hypothesis revise Insight • Iterative • Flexible • Mostly manual • Slow Experiments Analysis and interpretation Traditional hypothesis testing • Retrospective analysis • Hopefully predictive • Expensive • Limited for hypothesis testing more Predictive modeling Database Data generation Traditional Processing Stream Processing Data Data Query request response Real- T ime Analytics Data Results ModelPrediction Modeling and prediction
  4. 4. Data-driven science Stream Processing Real- T ime Analytics Data Results Data Models Evaluation Prediction/ Insight Hard problem! Poor accuracy? Hypothesis Hypothesis test generate Generate new data
  5. 5. Intelligent experimentation Data Stream Processing Real- T ime Analytics Data Results Current fact finding Analyze data in motion – before it is storedsk Continuous Analytics Results Intelligent design of experiments Experiments Scientist • What experiments should we do and how? • Can we reduce search space? • How store only interesting data? • Can we replace experiments with predictions? Informatics system
  6. 6. Genetic or chemical perturbations Experiments in multi- well plates Imaging Features Hypotheses Convolutional Neural Network Predictions High-content cell profiling Bray et al. (2016). “Cell Painting, a High-Content Image-Based Assay for Morphological Profiling Using Multiplexed Fluorescent Dyes.” Nature Protocols 11 (9): 1757–74.
  7. 7. Automated cell-based experiments GPU cluster CPU server Storage Automated plate handling Informatics system Current setup Automate more protocols Open source robotized cell lab
  8. 8. Robotized lab images cell profiles CPU/GPU/HPC cloud Models OMERO Chemical DB link Notebooks chemicals Data Models External user Services Public services
  9. 9. •Fluorescent LNPs (lipids) •Fluorescent Cargo (mRNA) •Fluorescent Product (protein) No LNPs Partial LNP uptake LNP uptake and mRNA decoding
  10. 10. Dealing with large scale data • High volume, high velocity • Continuously process data, train models, serve models • Embrace scalable virtual infrastructures (cloud) and microservices (containers) • Intelligently prioritize what to store
  11. 11. Designing a flexible and scalable informatics system 14 Cloud HPC GPU cluster CPU server Storage
  12. 12. Master Worker Worker Server1 (KVM) Server2 (KVM) GPU-Server GPU-Worker Join external nodes VPN / IPSec Develop Other Master Worker Master/ Worker Other Our infrastructure Master Worker Master Worker External users Internal users and devel (administered with Rancher) Fileserver (NFS)
  13. 13. System configuration, IaaC Master Worker Server1 (KVM) Server2 (KVM) GPU-Server Kubernetes (administered with Rancher) Develop Other Master Worker Master/ Worker Other OpenShift Master Worker Master Worker
  14. 14. 17 Multi-cluster Kubernetes management • Kubernetes Setup and admin in the cluster • High availability. Kubernetes extras such as Networking, Helm, Nginx-load- balancer with tested and matched versions
  15. 15. Continuous Integration and Deployment (CICD) • IaaC • Cluster config • Services Master Worker Server1 (KVM) Server2 (KVM) GPU-Server Develop Other Master Worker Master/ Worker Other Master Worker Master Worker
  16. 16. AI/Machine Learning life cycle •Structured processing steps •Governance •Versioning •Streamline analysis on high- performance e-infrastructures •Support reproducible data analysis •Enable large-scale data analysis
  17. 17. DevOps for machine learning A systematic method of working that accelerates the path from initial data to production AI services Continuous Integration & Continuous Delivery Physical Virtual Cloud Data scientist Data Modeling Testing Pre-processing Training Container Testing Model serving
  18. 18. Logging and monitoring
  19. 19. http://haste.research.it.uu.se/
  20. 20. FAIR data and services FAIR1 • Findable • Semantic service discovery (JSON-LD) • Accessible • Web UI • Programmatic API (OpenAPI) • Interoperable • Open API • Semantic annotations (JSON-LD) • Reproducible • Published scientific workflow (Pachyderm) 23 Services • Portable (Containers) • Resilient (Kubernetes) • Scalable (Kubernetes and cloud computing) 1 Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data management and stewardship." Scientific data 3 (2016).
  21. 21. Examples of what we serve Site-of-metabolism and reaction types http://ptp.service.pharmb.io/ https://metpred.service.pharmb.io/draw/ Target (safety) profiles
  22. 22. Robotized lab images cell profiles CPU/GPU/HPC cloud Models OMERO Chemical DB link Notebooks chemicals Data Models External user Services Public services
  23. 23. Key components and takeaways • IaaC: Infrastructure is portable, testable and simplifies maintenance • Containers: Components become portable and scalable • Kubernetes: Resilient container orchestration over multiple nodes • logging and monitoring - profiling • Hybrid cloud - elasticity • Pachyderm: Workflows of containers in Kubernetes with data versioning • CICD: Streamline development and testing • Open Source: Only open source components  DevOps: Software Developers, Data Engineers and Data Scientists working together in same infrastructure
  24. 24. Implications: Continuous Analytics • We can handle the continuous data processing from instruments with robust, resilient data pipelines • We can continuously re-train models as data is updated • We can continuously publish data and models Agile research group of different competencies • Scientists has access to infrastructure • DevOps roles, no dedicated sysadmin / tool devel / data engineer
  25. 25. • Easy deployment of virtual infrastructures on IaaS • Containerize tools, orchestrate microservices with workflow systems on top of Kubernetes and OpenShift Cloudflare kubeadm Terraform kubectl Packer KubeNow https://github.com/kubenow Capuccini et al. On-Demand Virtual Research Environments using Microservices. https://arxiv.org/abs/1805.06180 Novella et al. Container-based bioinformatics with Pachyderm. Bioinformatics. 35, 5, 839-846. (2018). DOI: http://dx.doi.org/10.1093/bioinformatics/bty699 Tools we develop and use • Data pipelining and data versioning layer for Kubernetes https://www.pachyderm.io
  26. 26. Research group website: http://pharmb.io - Thank you - Funding:

×