SlideShare a Scribd company logo
Building an informatics solution to sustain AI-guided cell
profiling with high-content microscopy imaging
Ola Spjuth <ola.spjuth@farmbio.uu.se>
Department of Pharmaceutical Biosciences, Uppsala University
www.pharmb.io
Who are we?
• Academic research group at Uppsala University
• Background in computational pharmacology (AI/ML)
• Good at e-infrastructure, big data (data engineering)
• Setting up an high-content imaging lab for cell profiling
Research group website: http://pharmb.io
Objective: Accelerate drug discovery using
AI and intelligent design of experiments.
• Predict safety concerns
• Explain drug mechanisms
• Screen for new drugs
Hypothesis
revise
Insight
• Iterative
• Flexible
• Mostly manual
• Slow
Experiments
Analysis and interpretation
Traditional hypothesis testing
• Retrospective analysis
• Hopefully predictive
• Expensive
• Limited for hypothesis
testing
more
Predictive modeling
Database
Data generation
Traditional Processing Stream Processing
Data
Data Query
request
response
Real- T ime
Analytics
Data Results
ModelPrediction
Modeling and prediction
Data-driven science
Stream Processing
Real- T ime
Analytics
Data Results
Data
Models Evaluation
Prediction/
Insight
Hard problem!
Poor accuracy?
Hypothesis
Hypothesis
test
generate
Generate new data
Closing the loop:
Intelligent experimentation
Data
Stream Processing
Real- T ime
Analytics
Data Results
Current fact finding
Analyze data in motion – before it is storedsk
Continuous Analytics
Results
Intelligent design of
experiments Experiments
Scientist
• What experiments should we
do and how?
• Can we reduce search space?
• How store only interesting
data?
• Can we replace experiments
with predictions?
Automation Informatics
Genetic or
chemical
perturbations
Experiments
in multi-
well plates
Imaging Features Hypotheses
Convolutional Neural Network
Predictions
Cell painting: HCI with multiplexed dyes
Bray et al. (2016). “Cell Painting, a High-Content Image-Based Assay for Morphological
Profiling Using Multiplexed Fluorescent Dyes.” Nature Protocols 11 (9): 1757–74.
Holographic live cell imaging
• Quantitative phase-contrast microscopy
• Holographic phase-shift imaging
• Label-free, live cell imaging
• Used inside incubator
HoloMonitor system
Main focus area: Drug/chemical profiling
with AI modeling
Explore profiling with AI/ML
• Target identification
• Mechanism-of-Action predictions
• Pathway enrichment
actin disruption
microtubule destabilization
aurora kinase inhibition
DNA replication
Eg5 inhibition
protein degradation
cholesterol lowering
DNA damage
epithelial
kinase inhibition
protein synthesis
microtubule stabilization.
Microscopy
image
Deep Neural Network MoA profile prediction
Cell
treatment
• 2D monolayer, cell lines (U2OS,
MCF-7, A549, RKO, …)
• Integrate HCI with other data
model
Protein degradation Cholesterol-lowering DNA replication
Microtubule stabilizer Actin disruptor Kinase inhibitor
Classify images into biological
mechanisms
Kensert A, Harrison PJ, Spjuth O.
Transfer learning with deep convolutional neural network for classifying cellular morphological changes.
SLAS DISCOVERY: Advancing Life Sciences R&D. 24, 4 (2019)
•Fluorescent LNPs (lipids)
•Fluorescent Cargo (mRNA)
•Fluorescent Product (protein)
No
LNPs
Partial LNP
uptake
LNP uptake and mRNA
decoding
Make predictions
using available
data
External data
Data warehouse
Design new
experimentsAI
Modeling
Publish data and models
Manual wet lab
Hypothesis
Verify using
external
protocol
Automated lab
Carry out
new
experiments
Analysis pipeline
Vision: Intelligent systems for
drug/chemical profiling
Hypothesis
Hypothesis
test
generate
Automating our cell-based lab
Fixed setup (version 1)
• ImageXpress XLS (Molecular Devices)
• Plate robot (Preciseflex)
• Plate incubator (Liconic), barcode reader
• BioMek 4000 liquid handling (Beckman
Coulter)
• Green Button Go lab automation software
(Biosero)
Observations:
• Quick to get up and running
• Suitable for fixed protocols
• Dependent on vendors to
solve problems
• Not easy to expand or
configure for us
Open source setup (under construction)
• HoloMonitor (Phase Holographic Imaging)
• OT-2 liquid handling (OpenTrons)
• Plate robots (under procurement)
• Open source lab automation (to be decided)
• More components… (to be decided)
Our priorities:
• Flexibility to expand/adapt
• Open source or good APIs
• Low cost, serviceable by us
• Configurable by us
Collaborators wanted!
Robotized lab
images
Automating our data processing
ImageDBImage viewer
File system
Metadata Files (images)
https://github.com/pharmbio/imagedb
Cold storage
Hot storage
Online,
intelligent
processing
Cell profilesQC workflows Interestingness models
HASTE CORE and Cell Profiler Pipeline
https://github.com/HASTE-project/cellprofiler-pipeline
Avoid storing
uninteresting data
Robotized lab
Data scientists
Empowering our data scientists
ImageDB
File system
Metadata Files (images)
Models
CPU/GPU/HPC cloud
Notebooks
Data
Models
External
users
Services
Public services
Publish
Managing our software ecosystem
• Scientists require many different software tools
• Difficult and time-consuming to manage dependencies
• Software Containers
• Offers isolation on application level, share operating system
• Portable, fast, smaller than virtual machine images
• Docker
• Microservices
• Decompose functionality into smaller, loosely coupled, on-demand
services
• Improve resilience, agile development
• Easy to scale
• Kubernetes
• manage a cluster of machines running containers
Building pipelines of containers
• A suitable way of using containers are
connecting them into a (scientific)
workflow
• Goal: Reproducible, fault-tolerant,
scalable execution
• Lampa S et al. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. Gigascience. 8, 5 (2019)
• Spjuth O et al. Approaches for containerized scientific workflows in cloud environments with applications in life science. PeerJ Preprints. 6, e27141v1 (2018
• Capuccini M, et al. MaRe: Container-Based Parallel Computing with Data Locality ArXiv. 1808.02318 (2018)
• Novella JA et al. Container-based bioinformatics with Pachyderm. Bioinformatics. 35, 5, 839-846. (2018)
• Lampa S et al. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. Journal of Cheminformatics. 8, 67. (2016)
Dealing with large scale data
• High volume, relatively high velocity
• Continuously process data, train
models, serve models
• Embrace scalable virtual
infrastructures (cloud) and
microservices (containers)
GPU cluster
CPU server
Storage
Cloud
HPC
Online processing
AI modeling life cycle
Model Development
ML studio
ML workflow
automation
Package & Deploy Models Model Serving
Model
management
Model
serving
Monitoring
Explore Data and
Develop Models
Train at scale
Register Model
and Metadata for
Serving
Package and
Publish Run in
operations Monitor
LoggingIntegrate
Data
scientist
Data
Engineer
Data
Engineer
Promote
Model
Ship
Model
www.scaleoutsystems.com
In collaboration with:
Integrate with our other AI services
Site-of-metabolism and reaction types
http://ptp.service.pharmb.io/
https://metpred.service.pharmb.io/draw/
Target (safety) profiles
Implications: Continuous Analytics
• We can handle the continuous data processing from instruments with
robust, resilient data pipelines
• We can continuously re-train models as data is updated
• We can (soon) continuously publish data and models
Data
Traditional Processing Stream Processing
Data Query
request
response
Real-Time
Analytics
Data Results
Continuous Analytics
Results
Intelligent
design of
experiments
Experiments
Scientist
Agile research group of different competencies
• Scientists has access to necessary
infrastructure
• Data stored in structured databases
• DevOps roles, no dedicated sysadmin /
developer
http://haste.research.it.uu.se/
Research group website: http://pharmb.io
- Thank you -
Funding:

More Related Content

What's hot

Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
Ola Spjuth
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
c.titus.brown
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
Pistoia Alliance
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Paul Groth
 
AI is the Future of Drug Discovery
AI is the Future of Drug DiscoveryAI is the Future of Drug Discovery
AI is the Future of Drug Discovery
David Leahy
 
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
NextBio
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dcc.titus.brown
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Keesthehyve
 
Integrative data management for reproducibility of microscopy experiments
Integrative data management for reproducibility of microscopy experimentsIntegrative data management for reproducibility of microscopy experiments
Integrative data management for reproducibility of microscopy experiments
Sheeba Samuel
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
Carole Goble
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
Carole Goble
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
Greg Landrum
 
Fair by design
Fair by designFair by design
Fair by design
Pistoia Alliance
 
Scientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchScientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible research
Peter van Heusden
 
Experimenta
ExperimentaExperimenta
Experimenta
GuttiPavan
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Amit Sheth
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
Neuroscience Information Framework
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
Databricks
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
Databricks
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
Richard Layton
 

What's hot (20)

Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
 
2016 davis-plantbio
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbio
 
MPS webinar master deck
MPS webinar master deckMPS webinar master deck
MPS webinar master deck
 
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge GraphsCombining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
Combining Explicit and Latent Web Semantics for Maintaining Knowledge Graphs
 
AI is the Future of Drug Discovery
AI is the Future of Drug DiscoveryAI is the Future of Drug Discovery
AI is the Future of Drug Discovery
 
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-ConferenceIlya Kupershmidt speaks at the Molecular Medicine Tri-Conference
Ilya Kupershmidt speaks at the Molecular Medicine Tri-Conference
 
2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc2013 nas-ehs-data-integration-dc
2013 nas-ehs-data-integration-dc
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Kees
 
Integrative data management for reproducibility of microscopy experiments
Integrative data management for reproducibility of microscopy experimentsIntegrative data management for reproducibility of microscopy experiments
Integrative data management for reproducibility of microscopy experiments
 
Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how Reproducibility and Scientific Research: why, what, where, when, who, how
Reproducibility and Scientific Research: why, what, where, when, who, how
 
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
ISMB/ECCB 2013 Keynote Goble Results may vary: what is reproducible? why do o...
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Fair by design
Fair by designFair by design
Fair by design
 
Scientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible researchScientific Workflow Systems for accessible, reproducible research
Scientific Workflow Systems for accessible, reproducible research
 
Experimenta
ExperimentaExperimenta
Experimenta
 
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research ...
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
Assessing Drug Safety Using AI
Assessing Drug Safety Using AIAssessing Drug Safety Using AI
Assessing Drug Safety Using AI
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
 
Reproducible research: First steps.
Reproducible research: First steps. Reproducible research: First steps.
Reproducible research: First steps.
 

Similar to Building an informatics solution to sustain AI-guided cell profiling with high-content microscopy imaging

Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
Ola Spjuth
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
European Bioinformatics Institute
 
CV_10/17
CV_10/17CV_10/17
Collins seattle-2014-final
Collins seattle-2014-finalCollins seattle-2014-final
Collins seattle-2014-final
inside-BigData.com
 
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientistsRamil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
GigaScience, BGI Hong Kong
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
Ola Spjuth
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
Philip Cheung
 
Mining Big datasets to create and validate machine learning models
Mining Big datasets to create and validate machine learning modelsMining Big datasets to create and validate machine learning models
Mining Big datasets to create and validate machine learning models
Sean Ekins
 
AI for Science
AI for ScienceAI for Science
AI for Science
inside-BigData.com
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Sean Ekins
 
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
Ashish Sharma
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
Digital transformation of translational medicine
Digital transformation of translational medicineDigital transformation of translational medicine
Digital transformation of translational medicine
Eagle Genomics
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
Russ Altman
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
mikaelhuss
 
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
DataScienceConferenc1
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
Ian Foster
 

Similar to Building an informatics solution to sustain AI-guided cell profiling with high-content microscopy imaging (20)

Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 
Advanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven ResearchAdvanced Bioinformatics for Genomics and BioData Driven Research
Advanced Bioinformatics for Genomics and BioData Driven Research
 
Cv long
Cv longCv long
Cv long
 
CV_10/17
CV_10/17CV_10/17
CV_10/17
 
Collins seattle-2014-final
Collins seattle-2014-finalCollins seattle-2014-final
Collins seattle-2014-final
 
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientistsRamil Mauleon: Galaxy: bioinformatics for rice scientists
Ramil Mauleon: Galaxy: bioinformatics for rice scientists
 
The case for cloud computing in Life Sciences
The case for cloud computing in Life SciencesThe case for cloud computing in Life Sciences
The case for cloud computing in Life Sciences
 
BioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadataBioAssay Express: Creating and exploiting assay metadata
BioAssay Express: Creating and exploiting assay metadata
 
Mining Big datasets to create and validate machine learning models
Mining Big datasets to create and validate machine learning modelsMining Big datasets to create and validate machine learning models
Mining Big datasets to create and validate machine learning models
 
AI for Science
AI for ScienceAI for Science
AI for Science
 
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning ModelsMining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
Mining 'Bigger' Datasets to Create, Validate and Share Machine Learning Models
 
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
Radiomics Data Management, Computation, and Analysis for QIN F2F 2016
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Digital transformation of translational medicine
Digital transformation of translational medicineDigital transformation of translational medicine
Digital transformation of translational medicine
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
[DSC Europe 23][DigiHealth] Vesna Pajic - Machine Learning Techniques for omi...
 
OpenTox Europe 2013
OpenTox Europe 2013OpenTox Europe 2013
OpenTox Europe 2013
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
DR KL CV v5
DR KL CV v5DR KL CV v5
DR KL CV v5
 

More from Ola Spjuth

Automating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AIAutomating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AI
Ola Spjuth
 
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression DatasetsCombining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Ola Spjuth
 
Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...
Ola Spjuth
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
Ola Spjuth
 
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in SwedenStorage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Ola Spjuth
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-Science
Ola Spjuth
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Ola Spjuth
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in science
Ola Spjuth
 
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Ola Spjuth
 
Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...
Ola Spjuth
 
Accessing and scripting CDK from Bioclipse
Accessing and scripting CDK from BioclipseAccessing and scripting CDK from Bioclipse
Accessing and scripting CDK from Bioclipse
Ola Spjuth
 

More from Ola Spjuth (11)

Automating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AIAutomating cell-based screening with open source, robotics and AI
Automating cell-based screening with open source, robotics and AI
 
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression DatasetsCombining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
Combining Prediction Intervals on Multi-Source Non-Disclosed Regression Datasets
 
Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in SwedenStorage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
Storage and Analysis of Sensitive Large-Scale Biomedical Data in Sweden
 
Enabling Translational Medicine with e-Science
Enabling Translational Medicine with e-ScienceEnabling Translational Medicine with e-Science
Enabling Translational Medicine with e-Science
 
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
Analyzing Big Data in Medicine with Virtual Research Environments and Microse...
 
Interoperability and scalability with microservices in science
Interoperability and scalability with microservices in scienceInteroperability and scalability with microservices in science
Interoperability and scalability with microservices in science
 
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
Chemical decision support in toxicology and pharmacology (OpenToxEU 2013)
 
Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...Building a flexible infrastructure with Bioclipse, open source, and federated...
Building a flexible infrastructure with Bioclipse, open source, and federated...
 
Accessing and scripting CDK from Bioclipse
Accessing and scripting CDK from BioclipseAccessing and scripting CDK from Bioclipse
Accessing and scripting CDK from Bioclipse
 

Recently uploaded

DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
alishadewangan1
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
David Osipyan
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
frank0071
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
NoelManyise1
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 

Recently uploaded (20)

DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
nodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptxnodule formation by alisha dewangan.pptx
nodule formation by alisha dewangan.pptx
 
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
3D Hybrid PIC simulation of the plasma expansion (ISSS-14)
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdfMudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
Mudde & Rovira Kaltwasser. - Populism - a very short introduction [2017].pdf
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 

Building an informatics solution to sustain AI-guided cell profiling with high-content microscopy imaging

  • 1. Building an informatics solution to sustain AI-guided cell profiling with high-content microscopy imaging Ola Spjuth <ola.spjuth@farmbio.uu.se> Department of Pharmaceutical Biosciences, Uppsala University www.pharmb.io
  • 2. Who are we? • Academic research group at Uppsala University • Background in computational pharmacology (AI/ML) • Good at e-infrastructure, big data (data engineering) • Setting up an high-content imaging lab for cell profiling Research group website: http://pharmb.io
  • 3. Objective: Accelerate drug discovery using AI and intelligent design of experiments. • Predict safety concerns • Explain drug mechanisms • Screen for new drugs
  • 4. Hypothesis revise Insight • Iterative • Flexible • Mostly manual • Slow Experiments Analysis and interpretation Traditional hypothesis testing • Retrospective analysis • Hopefully predictive • Expensive • Limited for hypothesis testing more Predictive modeling Database Data generation Traditional Processing Stream Processing Data Data Query request response Real- T ime Analytics Data Results ModelPrediction Modeling and prediction
  • 5. Data-driven science Stream Processing Real- T ime Analytics Data Results Data Models Evaluation Prediction/ Insight Hard problem! Poor accuracy? Hypothesis Hypothesis test generate Generate new data
  • 6. Closing the loop: Intelligent experimentation Data Stream Processing Real- T ime Analytics Data Results Current fact finding Analyze data in motion – before it is storedsk Continuous Analytics Results Intelligent design of experiments Experiments Scientist • What experiments should we do and how? • Can we reduce search space? • How store only interesting data? • Can we replace experiments with predictions? Automation Informatics
  • 7. Genetic or chemical perturbations Experiments in multi- well plates Imaging Features Hypotheses Convolutional Neural Network Predictions Cell painting: HCI with multiplexed dyes Bray et al. (2016). “Cell Painting, a High-Content Image-Based Assay for Morphological Profiling Using Multiplexed Fluorescent Dyes.” Nature Protocols 11 (9): 1757–74.
  • 8. Holographic live cell imaging • Quantitative phase-contrast microscopy • Holographic phase-shift imaging • Label-free, live cell imaging • Used inside incubator HoloMonitor system
  • 9. Main focus area: Drug/chemical profiling with AI modeling Explore profiling with AI/ML • Target identification • Mechanism-of-Action predictions • Pathway enrichment actin disruption microtubule destabilization aurora kinase inhibition DNA replication Eg5 inhibition protein degradation cholesterol lowering DNA damage epithelial kinase inhibition protein synthesis microtubule stabilization. Microscopy image Deep Neural Network MoA profile prediction Cell treatment • 2D monolayer, cell lines (U2OS, MCF-7, A549, RKO, …) • Integrate HCI with other data model
  • 10. Protein degradation Cholesterol-lowering DNA replication Microtubule stabilizer Actin disruptor Kinase inhibitor Classify images into biological mechanisms Kensert A, Harrison PJ, Spjuth O. Transfer learning with deep convolutional neural network for classifying cellular morphological changes. SLAS DISCOVERY: Advancing Life Sciences R&D. 24, 4 (2019)
  • 11. •Fluorescent LNPs (lipids) •Fluorescent Cargo (mRNA) •Fluorescent Product (protein) No LNPs Partial LNP uptake LNP uptake and mRNA decoding
  • 12.
  • 13. Make predictions using available data External data Data warehouse Design new experimentsAI Modeling Publish data and models Manual wet lab Hypothesis Verify using external protocol Automated lab Carry out new experiments Analysis pipeline Vision: Intelligent systems for drug/chemical profiling Hypothesis Hypothesis test generate
  • 14. Automating our cell-based lab Fixed setup (version 1) • ImageXpress XLS (Molecular Devices) • Plate robot (Preciseflex) • Plate incubator (Liconic), barcode reader • BioMek 4000 liquid handling (Beckman Coulter) • Green Button Go lab automation software (Biosero) Observations: • Quick to get up and running • Suitable for fixed protocols • Dependent on vendors to solve problems • Not easy to expand or configure for us Open source setup (under construction) • HoloMonitor (Phase Holographic Imaging) • OT-2 liquid handling (OpenTrons) • Plate robots (under procurement) • Open source lab automation (to be decided) • More components… (to be decided) Our priorities: • Flexibility to expand/adapt • Open source or good APIs • Low cost, serviceable by us • Configurable by us Collaborators wanted!
  • 15. Robotized lab images Automating our data processing ImageDBImage viewer File system Metadata Files (images) https://github.com/pharmbio/imagedb Cold storage Hot storage Online, intelligent processing Cell profilesQC workflows Interestingness models HASTE CORE and Cell Profiler Pipeline https://github.com/HASTE-project/cellprofiler-pipeline Avoid storing uninteresting data
  • 16. Robotized lab Data scientists Empowering our data scientists ImageDB File system Metadata Files (images) Models CPU/GPU/HPC cloud Notebooks Data Models External users Services Public services Publish
  • 17. Managing our software ecosystem • Scientists require many different software tools • Difficult and time-consuming to manage dependencies • Software Containers • Offers isolation on application level, share operating system • Portable, fast, smaller than virtual machine images • Docker • Microservices • Decompose functionality into smaller, loosely coupled, on-demand services • Improve resilience, agile development • Easy to scale • Kubernetes • manage a cluster of machines running containers
  • 18. Building pipelines of containers • A suitable way of using containers are connecting them into a (scientific) workflow • Goal: Reproducible, fault-tolerant, scalable execution • Lampa S et al. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. Gigascience. 8, 5 (2019) • Spjuth O et al. Approaches for containerized scientific workflows in cloud environments with applications in life science. PeerJ Preprints. 6, e27141v1 (2018 • Capuccini M, et al. MaRe: Container-Based Parallel Computing with Data Locality ArXiv. 1808.02318 (2018) • Novella JA et al. Container-based bioinformatics with Pachyderm. Bioinformatics. 35, 5, 839-846. (2018) • Lampa S et al. Towards agile large-scale predictive modelling in drug discovery with flow-based programming design principles. Journal of Cheminformatics. 8, 67. (2016)
  • 19. Dealing with large scale data • High volume, relatively high velocity • Continuously process data, train models, serve models • Embrace scalable virtual infrastructures (cloud) and microservices (containers) GPU cluster CPU server Storage Cloud HPC Online processing
  • 20. AI modeling life cycle Model Development ML studio ML workflow automation Package & Deploy Models Model Serving Model management Model serving Monitoring Explore Data and Develop Models Train at scale Register Model and Metadata for Serving Package and Publish Run in operations Monitor LoggingIntegrate Data scientist Data Engineer Data Engineer Promote Model Ship Model www.scaleoutsystems.com In collaboration with:
  • 21. Integrate with our other AI services Site-of-metabolism and reaction types http://ptp.service.pharmb.io/ https://metpred.service.pharmb.io/draw/ Target (safety) profiles
  • 22. Implications: Continuous Analytics • We can handle the continuous data processing from instruments with robust, resilient data pipelines • We can continuously re-train models as data is updated • We can (soon) continuously publish data and models Data Traditional Processing Stream Processing Data Query request response Real-Time Analytics Data Results Continuous Analytics Results Intelligent design of experiments Experiments Scientist Agile research group of different competencies • Scientists has access to necessary infrastructure • Data stored in structured databases • DevOps roles, no dedicated sysadmin / developer
  • 24. Research group website: http://pharmb.io - Thank you - Funding: