Challenges and strategies in bringing AI models to production
David Qixiang Chen, PhD
Co-founder, CTO, Director of AI
(Watakabe et al., 2014)
Biomedical research is impossible without biological products
Reproducibility Crisis
Blame it on the antibodies
(Nature 2015)
[Diagram: research cycle of Observation, Hypothesis, Experiment, Analyze, Theory]
2-6 months wasted per project
2M in wasted funding
50% of products fail
Sources: “Blame it on the Antibodies”, Nature 2015; “Reproducibility: Standardize antibodies used in research”, Nature 2015
Software Ate The World
Top 10 Most Valuable Companies, 2019 Q1 (market capitalization in millions of USD; source: Wikipedia)
1. Microsoft: 904,860
2. Apple: 895,670
3. Amazon: 874,710
4. Alphabet: 818,160
5. Berkshire Hathaway: 493,750
6. Facebook: 475,730
7. Alibaba: 472,940
8. Tencent: 440,980
9. Johnson & Johnson: 372,230
10. ExxonMobil: 342,170
[Chart legend: Software and IT; Consumer and Media; Finance/banking]
What Is “Software”, Anyway?
“Traditional” Computing
■ Deterministic
■ Linear Models
“Artificial Intelligence”
■ Probabilistic
■ Non-linear models
Why Not Biomed?
■ Nature is not deterministic
■ Decisions are not clear-cut
■ Independent from IT and computing
[Diagram: Traditional Computing, Biology, Medical]
Solvable Tasks
[Diagram labels: All Problems; Humans; AI; Trad Compute; AlphaGo; Biology; Medical]
We Are Here
[Diagram: the same problem space, with BenchSci added]
Challenges of ML Engineering
■ Model code organization
■ Data dependency
■ Data/model drift
■ Model co-dependency
The New Wall of Confusion
ML Scientists: “Here’s the BERT on GCN, got accuracy to 99%. Can you deploy it?”
Software Engineers: “What’s a Tensor? Can I npm it?”
The Engineering in ML
■ ML engineering is more than fine-tuning training metrics
■ Run-time efficiency
■ Coding structure for extensibility
■ Deployment scaling
■ Good ML engineers are good software engineers first
Model Code Structure
■ NLP and image tasks often require transforming input data
■ Data transformation at run-time is expensive
■ Model classes should not include this preprocessing logic
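A minimal sketch of this split, assuming a toy bag-of-words task: the hypothetical `TextPreprocessor` turns raw strings into integer ids once (so the result can be computed offline or cached), while the hypothetical `BagOfWordsModel` accepts only numeric ids. Neither class comes from the talk or from any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextPreprocessor:
    """Hypothetical preprocessor: maps raw text to integer token ids.

    Kept outside the model so it can run once (offline or cached)
    instead of on every forward pass.
    """
    vocab: Dict[str, int] = field(default_factory=dict)
    unk_id: int = 0  # id reserved for tokens never seen during fit()

    def fit(self, texts: List[str]) -> "TextPreprocessor":
        # Assign ids 1, 2, 3, ... in first-seen order.
        for text in texts:
            for tok in text.lower().split():
                if tok not in self.vocab:
                    self.vocab[tok] = len(self.vocab) + 1
        return self

    def transform(self, text: str) -> List[int]:
        return [self.vocab.get(tok, self.unk_id) for tok in text.lower().split()]


class BagOfWordsModel:
    """Hypothetical model: consumes token ids only and knows nothing about strings."""

    def __init__(self, weights: Dict[int, float]):
        self.weights = weights

    def forward(self, token_ids: List[int]) -> float:
        # Sum of per-token weights; ids without a weight contribute 0.
        return sum(self.weights.get(i, 0.0) for i in token_ids)
```

Usage: `TextPreprocessor().fit(["good antibody", "bad antibody"]).transform("good antibody")` yields `[1, 2]`, and `BagOfWordsModel({1: 1.0, 2: 0.5}).forward([1, 2])` returns `1.5`. Because the model never sees raw text, the same ids can be reused across training runs and at serving time.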
Use Classes To Encapsulate Models
■ Do use classes to encapsulate model training/prediction and model definitions
■ Separate training and prediction from the model
■ Don’t rely on ad-hoc linear code that does everything within a single file
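One possible class layout, sketched in plain Python with no framework (all names are hypothetical): `LinearModel` holds only the parameters and the forward computation, `Trainer` owns the optimization loop, and `Predictor` wraps a trained model for inference.

```python
from typing import List, Tuple


class LinearModel:
    """Model definition only: parameters plus the forward computation."""

    def __init__(self, w: float = 0.0, b: float = 0.0):
        self.w = w
        self.b = b

    def forward(self, x: float) -> float:
        return self.w * x + self.b


class Trainer:
    """Training logic kept outside the model (hypothetical class)."""

    def __init__(self, model: LinearModel, lr: float = 0.05, epochs: int = 500):
        self.model = model
        self.lr = lr
        self.epochs = epochs

    def fit(self, data: List[Tuple[float, float]]) -> LinearModel:
        n = len(data)
        for _ in range(self.epochs):
            # Full-batch gradient descent on mean squared error.
            grad_w = grad_b = 0.0
            for x, y in data:
                err = self.model.forward(x) - y
                grad_w += 2 * err * x / n
                grad_b += 2 * err / n
            self.model.w -= self.lr * grad_w
            self.model.b -= self.lr * grad_b
        return self.model


class Predictor:
    """Inference wrapper: needs only a trained model, no optimizer or training data."""

    def __init__(self, model: LinearModel):
        self.model = model

    def predict(self, xs: List[float]) -> List[float]:
        return [self.model.forward(x) for x in xs]
```

Training on points from y = 2x + 1 (x = 0, 1, 2, 3) with the defaults recovers w ≈ 2 and b ≈ 1. Swapping the optimizer touches only `Trainer`; the serving path depends on nothing but `LinearModel` and `Predictor`.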
Separate Forward and Loss
■ Separate the model forward computation from the loss calculation
■ The optimizer and loss can change often during R&D
■ The forward function will be reused for inference, so it needs to be as efficient as possible
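A small sketch of keeping the loss out of the forward pass, assuming a one-feature linear model: `forward` is a pure function reused at inference, and each loss is passed in as a (value, gradient with respect to the prediction) pair so it can be swapped during R&D. `MSE`, `MAE`, `train_step`, and `mean_loss` are illustrative names, not a framework API.

```python
from typing import Callable, List, Tuple

# Hypothetical convention: a loss is (value(pred, target), d value / d pred).
Loss = Tuple[Callable[[float, float], float], Callable[[float, float], float]]
Batch = List[Tuple[float, float]]

# Squared error and its derivative with respect to the prediction.
MSE: Loss = (lambda p, y: (p - y) ** 2, lambda p, y: 2 * (p - y))
# Absolute error; its subgradient is the sign of the residual.
MAE: Loss = (lambda p, y: abs(p - y), lambda p, y: float((p > y) - (p < y)))


def forward(params: List[float], x: float) -> float:
    """Pure forward computation; the only piece needed at inference time."""
    w, b = params
    return w * x + b


def train_step(params: List[float], batch: Batch, loss: Loss, lr: float) -> List[float]:
    """One gradient-descent step; the loss is injected rather than hard-coded."""
    _, dloss = loss
    grad_w = grad_b = 0.0
    for x, y in batch:
        g = dloss(forward(params, x), y) / len(batch)
        grad_w += g * x  # chain rule: d pred / d w = x
        grad_b += g      # d pred / d b = 1
    return [params[0] - lr * grad_w, params[1] - lr * grad_b]


def mean_loss(params: List[float], batch: Batch, loss: Loss) -> float:
    value, _ = loss
    return sum(value(forward(params, x), y) for x, y in batch) / len(batch)
```

Only `forward` needs to be optimized and shipped; switching from `MSE` to `MAE` during experimentation does not touch it.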
Separate Batching and Single Compute
■ The model should assume tensor I/O only; do not include batching logic within the model
■ In TensorFlow and PyTorch, the data loader is a separate class that can include preprocessing logic and outputs an input batch
■ The data loader belongs in the training class, not in the model definition
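A framework-free sketch of that division of labor: the hypothetical `SimpleLoader` plays the role of a data loader (for example PyTorch's `torch.utils.data.DataLoader`) by applying preprocessing and yielding batches, `model_forward` only consumes an already-batched list of numeric rows, and `Trainer` is the class that owns the loader. The loader is an illustration, not the PyTorch or TensorFlow API.

```python
import random
from typing import Callable, Generic, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
Batch = List[List[float]]


class SimpleLoader(Generic[T]):
    """Hypothetical loader: preprocesses raw records and yields batches."""

    def __init__(self, records: Sequence[T], preprocess: Callable[[T], List[float]],
                 batch_size: int, shuffle: bool = False, seed: int = 0):
        self.records = list(records)
        self.preprocess = preprocess
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self) -> Iterator[Batch]:
        order = list(range(len(self.records)))
        if self.shuffle:
            self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            # The final batch may be smaller than batch_size.
            idx = order[start:start + self.batch_size]
            yield [self.preprocess(self.records[i]) for i in idx]


def model_forward(batch: Batch, weights: List[float]) -> List[float]:
    """The model sees a numeric batch only; it never parses or batches data."""
    return [sum(w * v for w, v in zip(weights, row)) for row in batch]


class Trainer:
    """Owns the loader; a real trainer would also compute a loss and update weights."""

    def __init__(self, loader: SimpleLoader, weights: List[float]):
        self.loader = loader
        self.weights = weights

    def forward_epoch(self) -> List[float]:
        outputs: List[float] = []
        for batch in self.loader:
            outputs.extend(model_forward(batch, self.weights))
        return outputs
```

With this layout the same `model_forward` serves a single request (a batch of one row) or a large offline batch, and changes to parsing or shuffling never touch the model.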
Data Dependency
■ Source control (Git) tracks stateless logic changes as code
■ ML systems are stateful: their behavior depends on data
[Diagram: ML Code + Training Data → Model Weights; Model Weights + Inference Data → Prediction; Git tracks only the ML Code]
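Because Git versions the code but not the data, one mitigation is to store a fingerprint of the training data next to the weights. A minimal sketch under that assumption: `fingerprint_dataset` hashes a canonical JSON serialization with SHA-256, and `code_version` stands in for whatever identifies the code (for example a Git commit hash supplied by the caller). The record layout and function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Rows = List[Tuple[float, float]]


def fingerprint_dataset(rows: Rows) -> str:
    """SHA-256 of a canonical JSON serialization of the training rows (order-sensitive)."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def make_model_record(weights: Dict[str, float], rows: Rows, code_version: str) -> Dict[str, Any]:
    """Bundle the weights with identifiers for both the code and the data behind them."""
    return {
        "weights": weights,
        "code_version": code_version,  # e.g. a Git commit hash, supplied by the caller
        "training_data_sha256": fingerprint_dataset(rows),
        "n_training_rows": len(rows),
    }


def same_lineage(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """True only if both the code version and the training-data hash match."""
    return (a["code_version"] == b["code_version"]
            and a["training_data_sha256"] == b["training_data_sha256"])
```

The hash is order-sensitive; if row order carries no meaning, sort the rows before fingerprinting so that a reshuffled export is not mistaken for new data.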
Data/Model Drift
“It’s not that I don’t understand, the world changes too fast”
– Cui Jian
■ A model captures the assumptions of its training data
■ If the input changes, the model will break down:
■ 1. Data format contract (e.g. a string instead of a number)
■ 2. Input data distribution (here be dragons)
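A minimal sketch of enforcing the first kind of contract at prediction time, assuming records arrive as dictionaries: the hypothetical `CONTRACT` maps each field to the type seen during training, and `guarded_predict` refuses to call the model when, say, a concentration arrives as the string `"0.5"` instead of a number. Field names are illustrative only.

```python
from typing import Any, Callable, Dict, List

# Hypothetical contract recorded at training time: field name -> expected type.
CONTRACT: Dict[str, type] = {"concentration": float, "species": str}


def contract_violations(record: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List every missing field or type mismatch (an int is not accepted as a float)."""
    problems = []
    for name, expected in contract.items():
        if name not in record:
            problems.append(f"{name}: missing")
        elif not isinstance(record[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(record[name]).__name__}")
    return problems


def guarded_predict(model: Callable[[Dict[str, Any]], float],
                    record: Dict[str, Any],
                    contract: Dict[str, type] = CONTRACT) -> float:
    """Fail loudly on a broken contract instead of silently predicting on bad input."""
    problems = contract_violations(record, contract)
    if problems:
        raise ValueError("input violates data contract: " + "; ".join(problems))
    return model(record)
```

A format check like this catches the first failure mode cheaply; the second (a shifted input distribution) passes type checks and needs statistical monitoring instead.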
But Why?
[Diagram: World, Sensor, Model, and Action, connected by Input & Labels and Prediction]
Data Dependency
■ Input distribution assumptions need to be tracked
■ Metadata should be captured with the model weights
[Diagram: Meta, Distribution, and Monitor components added to the ML Code, Training Data, Model Weights, Inference Data, Prediction pipeline]
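To make the input distribution travel with the weights, a model record can carry per-feature summaries that a monitor compares against live traffic. This sketch assumes numeric features, uses quantile bins fitted on training data, and scores drift with the Population Stability Index (PSI), a common drift statistic chosen here for illustration (the talk does not name a specific metric). The 0.25 alert threshold is a conventional rule of thumb, not a universal value, and all function names are hypothetical.

```python
import bisect
import math
from typing import Any, Dict, List


def _bin_shares(values: List[float], edges: List[float]) -> List[float]:
    """Fraction of values falling in each bin defined by the sorted cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def fit_distribution_meta(values: List[float], n_bins: int = 4) -> Dict[str, Any]:
    """Summarize a training feature: quantile cut points plus expected bin shares."""
    s = sorted(values)
    edges = [s[len(s) * k // n_bins] for k in range(1, n_bins)]
    return {"edges": edges, "expected": _bin_shares(values, edges)}


def psi(meta: Dict[str, Any], live: List[float], eps: float = 1e-6) -> float:
    """Population Stability Index between training (expected) and live (actual) shares."""
    total = 0.0
    for e, a in zip(meta["expected"], _bin_shares(live, meta["edges"])):
        e, a = max(e, eps), max(a, eps)  # guard against log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def drifted_features(record: Dict[str, Any], live: Dict[str, List[float]],
                     threshold: float = 0.25) -> List[str]:
    """Names of features whose live distribution moved past the (assumed) threshold."""
    return [name for name, meta in record["meta"].items()
            if name in live and psi(meta, live[name]) > threshold]
```

In practice the record would be saved as `{"weights": ..., "meta": {"feature_name": fit_distribution_meta(train_values)}}`, so the monitor always compares live inputs against the data the deployed weights actually saw.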
[Diagrams: multiple ML Code / Training Data / Model Weights pipelines, some with Inference Data and Prediction]
Model Co-dependency
ML systems grow together
■ Real-world systems are composites of many ML deployments
■ A single end-to-end model is not realistic
■ Multiple models are intimately linked by data distribution dependencies
■ A change in a top-level output distribution will cause failure cascades
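Because a top-level distribution change propagates to every consumer, it helps to know which deployments sit downstream of a given model. A minimal sketch, assuming the deployment topology is available as a hypothetical `feeds` mapping (model name to the models that consume its output): a breadth-first search returns every model that directly or transitively depends on the changed one and may therefore need re-validation. Model names are illustrative.

```python
from collections import deque
from typing import Dict, List


def downstream_of(feeds: Dict[str, List[str]], changed: str) -> List[str]:
    """Breadth-first search over 'output of A feeds B' edges.

    Returns every model that directly or transitively consumes the changed
    model's output, in BFS order; cycles are tolerated via the seen set.
    """
    seen = {changed}
    order: List[str] = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in feeds.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order
```

For example, with `feeds = {"ner": ["entity_linker", "classifier"], "entity_linker": ["ranker"], "classifier": ["ranker"]}`, a drift alert on `"ner"` flags `entity_linker`, `classifier`, and `ranker`, which is the cascade the slide warns about.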
Observation from Neuroscience
(Kuner and Flor, 2016)
[Diagram: three linked ML Code / Training Data / Model Weights pipelines, the last producing a Prediction]
ML Systems as a Single Entity
[Diagram: Meta, Distribution, Monitor]
New Strategy Is Needed
■ Combine modular system design with awareness of ML system dependencies
■ Current coding practice only solves part of the problem
■ Better tools are needed to track multiple ML systems based on distribution analysis
■ Rethink engineering roles and organization
Conclusion
■ ML data dependency creates challenges at all levels of system engineering
■ ML system reliability is particularly critical in biomedical domains
■ ML deployment is a different beast from ML R&D
■ ML engineers will require a wider range of expertise
Thank You

More Related Content

What's hot

I believe I can fly (Extract London 2015)
I believe I can fly (Extract London 2015)I believe I can fly (Extract London 2015)
I believe I can fly (Extract London 2015)Ignacio Elola Villar
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Edureka!
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Edureka!
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Edureka!
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science Mahesh Kumar CV
 
BioIT Webinar on AI and data methods for drug discovery
BioIT Webinar on AI and data methods for drug discoveryBioIT Webinar on AI and data methods for drug discovery
BioIT Webinar on AI and data methods for drug discoveryFernanda Foertter
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSampath Kumar
 
Sentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainSentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainEdureka!
 
Data Science Lifecycle
Data Science LifecycleData Science Lifecycle
Data Science LifecycleSwapnilDahake2
 
The DevOps Panel - Innotech Austin CD Summit
The DevOps Panel - Innotech Austin CD SummitThe DevOps Panel - Innotech Austin CD Summit
The DevOps Panel - Innotech Austin CD SummitErnest Mueller
 
Open Data, Big Data and Machine Learning
Open Data, Big Data and Machine LearningOpen Data, Big Data and Machine Learning
Open Data, Big Data and Machine LearningSteven Van Vaerenbergh
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?NUS-ISS
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big DataRevolution Analytics
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...QuantUniversity
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With REdureka!
 
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...MLconf
 

What's hot (20)

I believe I can fly (Extract London 2015)
I believe I can fly (Extract London 2015)I believe I can fly (Extract London 2015)
I believe I can fly (Extract London 2015)
 
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
Data Science Tutorial | What is Data Science? | Data Science For Beginners | ...
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...
 
Ml masterclass
Ml masterclassMl masterclass
Ml masterclass
 
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
Data Science For Beginners | Who Is A Data Scientist? | Data Science Tutorial...
 
8 minute intro to data science
8 minute intro to data science 8 minute intro to data science
8 minute intro to data science
 
BioIT Webinar on AI and data methods for drug discovery
BioIT Webinar on AI and data methods for drug discoveryBioIT Webinar on AI and data methods for drug discovery
BioIT Webinar on AI and data methods for drug discovery
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Sentiment Analysis In Retail Domain
Sentiment Analysis In Retail DomainSentiment Analysis In Retail Domain
Sentiment Analysis In Retail Domain
 
Data Science Lifecycle
Data Science LifecycleData Science Lifecycle
Data Science Lifecycle
 
Data science
Data scienceData science
Data science
 
Hands-on Introduction to Machine Learning
Hands-on Introduction to Machine LearningHands-on Introduction to Machine Learning
Hands-on Introduction to Machine Learning
 
The DevOps Panel - Innotech Austin CD Summit
The DevOps Panel - Innotech Austin CD SummitThe DevOps Panel - Innotech Austin CD Summit
The DevOps Panel - Innotech Austin CD Summit
 
Open Data, Big Data and Machine Learning
Open Data, Big Data and Machine LearningOpen Data, Big Data and Machine Learning
Open Data, Big Data and Machine Learning
 
Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?Big Data Analytics: Challenge or Opportunity?
Big Data Analytics: Challenge or Opportunity?
 
Data Science: Not Just For Big Data
Data Science: Not Just For Big DataData Science: Not Just For Big Data
Data Science: Not Just For Big Data
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
 
Linear Regression With R
Linear Regression With RLinear Regression With R
Linear Regression With R
 
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...
Jennifer Marsman, Principal Developer Evangelist, Microsoft at MLconf ATL - 9...
 

Similar to Challenges and strategies in bringing AI models to production

Towards the Industrialization of AI
Towards the Industrialization of AITowards the Industrialization of AI
Towards the Industrialization of AIHui Lei
 
Test-Driven Machine Learning
Test-Driven Machine LearningTest-Driven Machine Learning
Test-Driven Machine LearningC4Media
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Matt Stubbs
 
AI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesAI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesTathagat Varma
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...Daniel Katz
 
EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session Steve Ardire
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceSrishti44
 
Whats Next for Machine Learning
Whats Next for Machine LearningWhats Next for Machine Learning
Whats Next for Machine LearningOgilvy Consulting
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...Edge AI and Vision Alliance
 
Lifesaving AI and Javascript (JSConf Korea 2019)
Lifesaving AI and Javascript (JSConf Korea 2019)Lifesaving AI and Javascript (JSConf Korea 2019)
Lifesaving AI and Javascript (JSConf Korea 2019)Jaeman An
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Debmalya Biswas
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiProfessor Lili Saghafi
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsAkin Osman Kazakci
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014Roger Barga
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurityscoopnewsgroup
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellSri Ambati
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Vishal Sharma
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language ProcessingYunyao Li
 

Similar to Challenges and strategies in bringing AI models to production (20)

Towards the Industrialization of AI
Towards the Industrialization of AITowards the Industrialization of AI
Towards the Industrialization of AI
 
Test-Driven Machine Learning
Test-Driven Machine LearningTest-Driven Machine Learning
Test-Driven Machine Learning
 
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
Big Data LDN 2018: HOW AUTOMATION CAN ACCELERATE THE DELIVERY OF MACHINE LEAR...
 
AI in Business: Opportunities & Challenges
AI in Business: Opportunities & ChallengesAI in Business: Opportunities & Challenges
AI in Business: Opportunities & Challenges
 
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
ICPSR - Complex Systems Models in the Social Sciences - Lecture 6 - Professor...
 
EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Whats Next for Machine Learning
Whats Next for Machine LearningWhats Next for Machine Learning
Whats Next for Machine Learning
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
 
Lifesaving AI and Javascript (JSConf Korea 2019)
Lifesaving AI and Javascript (JSConf Korea 2019)Lifesaving AI and Javascript (JSConf Korea 2019)
Lifesaving AI and Javascript (JSConf Korea 2019)
 
Data Science
Data ScienceData Science
Data Science
 
Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020Ethical AI - Open Compliance Summit 2020
Ethical AI - Open Compliance Summit 2020
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
Data Driven Engineering 2014
Data Driven Engineering 2014Data Driven Engineering 2014
Data Driven Engineering 2014
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurity
 
H2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin LedellH2O World - Intro to Data Science with Erin Ledell
H2O World - Intro to Data Science with Erin Ledell
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing
 
Explainability for Natural Language Processing
Explainability for Natural Language ProcessingExplainability for Natural Language Processing
Explainability for Natural Language Processing
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Challenges and strategies in bringing AI models to production

  • 1. Challenges and strategies in bringing AI models to production David Qixiang Chen, PhD Co-founder, CTO, Director of AI
  • 2. (Watakabe et al, 2014) Biomedical research is impossible without biological products
  • 3. Reproducibility Crisis Blame it on the antibodies (Nature 2015)
  • 4. Observation Hypothesis ExperimentAnalyze Theory 2-6 Months wasted per project 2M Wasted funding“Blame it on the Antibodies”, Nature 2015 “Reproducibility: Standardize antibodies used in research”, Nature 2015 50% of products fail
  • 5. Software Ate The World ■ Software and IT ■ Consumer and Media ■ Finances/banking 904,860 895,670 874,710 818,160 493,750 475,730 472,940 440,980 372,230 342,170 MICROSOFT APPLE AMAZON ALPHABET BERKSHIRE HATHAWAY FACEBOOK ALIBABA TENCENT JOHNSON & JOHNSON EXXONMOBILE Top 10 Most Valuable Companies 2019 Q1 Source:Wikipedia
  • 6. What is “Software” Anyway “Traditional” Computing ■ Deterministic ■ Linear Models “Artificial Intelligence” ■ Probabilistic ■ Non-linear models
  • 7. Why Not Biomed? ■ Nature is not deterministic ■ Decisions are not clear-cut ■ Biomed has developed independently of IT and computing (diagram labels: Traditional Computing, Biology, Medical)
  • 9. We Are Here (diagram labels: All Problems, Humans, AI, Trad Compute, AlphaGo, Biology, Medical, BenchSci)
  • 10. Challenges of ML Engineering ■ Model code organization ■ Data dependency ■ Data/model drift ■ Model co-dependency
  • 11. The New Wall Of Confusion. ML Scientists: “Here’s the BERT on GCN, got accuracy to 99%. Can you deploy it?” Software Engineers: “What’s a Tensor? Can I npm it?”
  • 12. The Engineering in ML ■ ML engineering is more than fine-tuning training metrics ■ Run-time efficiency ■ Coding structure for extensibility ■ Deployment scaling ■ Good ML engineers are good software engineers first
  • 13. Model Code Structure ■ NLP and image tasks often require transforming input data ■ Data transformation at run time is expensive ■ The model class should not include this preprocessing logic.
  • 14. Use Classes To Encapsulate Models ■ Do use classes to encapsulate model definitions and model training/prediction ■ Separate training and prediction from the model ■ Don’t rely on ad-hoc, linear code that does everything within a single file.
  • 15. Separate Forward and Loss ■ Separate the model’s forward computation from the loss calculation ■ The optimizer and loss can change often during R&D ■ The forward function will be reused for inference ■ It therefore needs to be as efficient as possible
  • 16. Separate Batching and Single Compute ■ The model assumes tensor I/O only; do not include batching logic within the model ■ In TensorFlow and PyTorch, the data loader is a separate class that can include preprocessing logic and outputs input batches. ■ This should be included in the training class, not the model definition.
  • 17. Data Dependency ■ Source control (Git) tracks stateless logic changes as code ■ ML systems are stateful: they depend on data (diagram labels: ML Code, Training Data, Model Weights, Git, Inference Data, Prediction)
  • 18. Data/Model Drift. “It’s not that I don’t understand, the world changes too fast” – Cui Jian ■ The model captures the assumptions in its training data ■ If the input changes, the model will break down: ■ 1. Data format contract (e.g., strings instead of numbers) ■ 2. Data input distribution (Here be dragons)
  • 21. Data Dependency ■ Need to track input distribution assumptions ■ Distribution metadata should be captured along with the model weights (diagram labels: Meta, Distribution Monitor, ML Code, Training Data, Model Weights, Inference Data, Prediction)
  • 22. Model Co-dependency (diagram: several linked ML Code / Training Data / Model Weights units between Inference Data and a final Prediction)
  • 23. ML Systems Grow Together ■ Real-world systems are composites of many ML deployments ■ A single end-to-end model is not realistic ■ Multiple models are intimately linked by data distribution dependencies ■ A change in a top-level output distribution will cause cascading failures
  • 25. ML Systems as a Single Entity (diagram: the linked ML Code / Training Data / Model Weights units between Inference Data and Prediction, with a Meta Distribution Monitor)
  • 26. New Strategy Is Needed ■ Combine modular system design with awareness of ML system dependencies ■ Current coding practices solve only part of the problem ■ Better tools are needed to track multiple ML systems based on distribution analysis ■ Rethink engineering roles and organization
  • 27. Conclusion ■ ML data dependency poses challenges at all levels of system engineering ■ ML system reliability is particularly critical in biomedical domains ■ ML deployment is a different beast from ML R&D ■ ML engineers will require a wider range of expertise
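Slide 13 argues that preprocessing should live outside the model class. A minimal plain-Python sketch of that split, assuming a toy whitespace tokenizer; `Vocabulary` and `BagOfWordsScorer` are hypothetical names, not the author's code. The preprocessing step can run once and its output be cached, while the model only ever sees token ids.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vocabulary:
    """Hypothetical preprocessing step: maps raw text to integer token ids.

    It lives outside the model, so it can run once ahead of time and its
    output can be cached instead of recomputed on every prediction.
    """

    token_to_id: Dict[str, int] = field(default_factory=dict)

    def fit(self, texts: List[str]) -> "Vocabulary":
        for text in texts:
            for token in text.lower().split():
                # New tokens get the next free id; known tokens keep theirs.
                self.token_to_id.setdefault(token, len(self.token_to_id))
        return self

    def encode(self, text: str) -> List[int]:
        # Tokens never seen during fit() are dropped in this sketch.
        return [self.token_to_id[t] for t in text.lower().split() if t in self.token_to_id]


class BagOfWordsScorer:
    """Toy model: consumes token ids only and knows nothing about raw text."""

    def __init__(self, weights: List[float]):
        self.weights = weights  # one weight per token id

    def forward(self, token_ids: List[int]) -> float:
        return sum(self.weights[i] for i in token_ids)
```

In use, the corpus would be encoded once (for example `cached = [vocab.encode(t) for t in corpus]`) and only the cached ids passed to `forward`, so the expensive transformation is not repeated at run time.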
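Slide 14 recommends encapsulating the model definition in a class and keeping training and prediction separate from it. The sketch below assumes a one-feature linear regression fitted by plain stochastic gradient descent; `LinearModel`, `Trainer`, and `Predictor` are illustrative names chosen here, not an API from the slides.

```python
from typing import List


class LinearModel:
    """Model definition only: parameters and the forward computation."""

    def __init__(self, n_features: int):
        self.weights = [0.0] * n_features
        self.bias = 0.0

    def forward(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias


class Trainer:
    """Owns the training loop; the model itself holds no training code."""

    def __init__(self, model: LinearModel, learning_rate: float = 0.05):
        self.model = model
        self.learning_rate = learning_rate

    def fit(self, xs: List[List[float]], ys: List[float], epochs: int = 500) -> LinearModel:
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                # Gradient of 0.5 * (prediction - y) ** 2 with respect to each parameter.
                error = self.model.forward(x) - y
                for j, xj in enumerate(x):
                    self.model.weights[j] -= self.learning_rate * error * xj
                self.model.bias -= self.learning_rate * error
        return self.model


class Predictor:
    """Owns inference; reuses the model's forward function and nothing else."""

    def __init__(self, model: LinearModel):
        self.model = model

    def predict(self, xs: List[List[float]]) -> List[float]:
        return [self.model.forward(x) for x in xs]
```

Because the trainer and predictor only depend on `forward`, either can be replaced (a different optimizer, a batched serving path) without touching the model definition.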
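Slide 15 separates the forward computation from the loss. A framework-free sketch, assuming a logistic model: `forward` maps features to a probability and is the only path used at inference, while the losses are standalone functions that the evaluation code plugs in. In PyTorch the same split would typically be an `nn.Module.forward` plus a separate loss such as `torch.nn.BCELoss`; the names below are hypothetical.

```python
import math
from typing import Callable, List


class LogisticModel:
    def __init__(self, weights: List[float], bias: float = 0.0):
        self.weights = list(weights)
        self.bias = bias

    def forward(self, x: List[float]) -> float:
        """Inference path: features -> probability. No labels, no loss."""
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


# Losses live outside the model so they can be swapped freely during R&D.
def binary_cross_entropy(prob: float, label: int, eps: float = 1e-12) -> float:
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def squared_error(prob: float, label: int) -> float:
    return (prob - label) ** 2


def evaluate(model: LogisticModel, xs: List[List[float]], labels: List[int],
             loss_fn: Callable[[float, int], float]) -> float:
    """Training/evaluation code composes forward with whichever loss is chosen."""
    return sum(loss_fn(model.forward(x), y) for x, y in zip(xs, labels)) / len(xs)
```

Switching from `squared_error` to `binary_cross_entropy` changes one argument in the training code and leaves `forward`, and therefore the deployed inference path, untouched.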
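Slide 16 keeps batching out of the model. The sketch below mimics the shape of PyTorch's `torch.utils.data.Dataset` (`__len__`, `__getitem__`) and a `DataLoader`-style batching step in plain Python, so it runs without a framework. The toy features (text length and word count) and all class and function names are assumptions for illustration.

```python
from typing import Iterator, List, Tuple


class TextLengthDataset:
    """Hypothetical dataset: per-example preprocessing, returns one example."""

    def __init__(self, texts: List[str], labels: List[int]):
        self.texts = texts
        self.labels = labels

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, index: int) -> Tuple[List[float], int]:
        text = self.texts[index]
        # Toy features: character count and a naive space-based word count.
        features = [float(len(text)), float(text.count(" ") + 1)]
        return features, self.labels[index]


def batches(dataset: TextLengthDataset,
            batch_size: int) -> Iterator[Tuple[List[List[float]], List[int]]]:
    """Minimal stand-in for a DataLoader: groups single examples into batches."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        items = [dataset[i] for i in range(start, stop)]
        yield [f for f, _ in items], [y for _, y in items]


class ScoringModel:
    """The model sees only a batch of feature rows (a 2-D 'tensor') in and out."""

    def __init__(self, weights: List[float]):
        self.weights = weights

    def forward(self, feature_batch: List[List[float]]) -> List[float]:
        return [sum(w * f for w, f in zip(self.weights, row)) for row in feature_batch]
```

The training class would own the loop over `batches(...)`; the model's `forward` works the same whether it receives a batch of one or a batch of many.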
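Slide 17 points out that Git tracks code but not the data a model's weights depend on. One simple mitigation, sketched here under the assumption that training rows are JSON-serializable dicts, is to store a fingerprint of the training data next to the weights and the code version. `fingerprint_rows` and `build_model_metadata` are hypothetical helpers; dedicated data-versioning tools exist for larger datasets.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint_rows(rows: List[Dict[str, Any]]) -> str:
    """Order-sensitive SHA-256 over a canonical JSON serialization of the rows."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the hash independent of dict key order.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_model_metadata(weights: List[float], training_rows: List[Dict[str, Any]],
                         code_version: str) -> Dict[str, Any]:
    """Bundle weights with the data and code versions that produced them."""
    return {
        "weights": weights,
        "training_data_sha256": fingerprint_rows(training_rows),
        "code_version": code_version,  # e.g. a Git commit hash supplied by the caller
        "n_training_rows": len(training_rows),
    }
```

If a later retraining produces different behavior, comparing the stored fingerprints shows immediately whether the data changed, the code changed, or both.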
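The first failure mode on slide 18, a broken data format contract such as strings arriving where numbers are expected, can be caught before the model runs. A minimal sketch, assuming the contract is expressed as a mapping from field name to expected Python type; the schema format, field names, and `validate_record` are illustrative, not from the slides.

```python
from typing import Any, Dict, List


def validate_record(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of contract violations for one input record (empty if valid)."""
    problems = []
    for name, expected_type in schema.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return problems
```

Rejecting or logging records that fail the contract keeps format changes from silently reaching the model. The second failure mode, a shift in the input distribution, needs a statistical check rather than a type check.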
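Slide 21 calls for capturing input-distribution metadata with the model weights and monitoring it at inference. The slide does not name a drift statistic; this sketch uses the Population Stability Index (PSI) over quantile bins of a single numeric feature as one common choice. The function names and the bin count are assumptions, and a frequently quoted rule of thumb (not a universal threshold) treats PSI above roughly 0.25 as a significant shift.

```python
import bisect
import math
from typing import Dict, List


def _bin_fractions(values: List[float], edges: List[float]) -> List[float]:
    """Fraction of `values` falling in each bin defined by sorted `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def fit_reference(values: List[float], n_bins: int = 4) -> Dict[str, List[float]]:
    """Training time: record quantile bin edges and expected bin fractions.

    The returned dict is the metadata to store alongside the model weights.
    """
    ordered = sorted(values)
    edges = [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]
    return {"edges": edges, "expected": _bin_fractions(values, edges)}


def population_stability_index(reference: Dict[str, List[float]],
                               values: List[float], eps: float = 1e-6) -> float:
    """Inference time: compare incoming values with the stored reference."""
    actual = _bin_fractions(values, reference["edges"])
    psi = 0.0
    for expected, observed in zip(reference["expected"], actual):
        expected, observed = max(expected, eps), max(observed, eps)  # avoid log(0)
        psi += (observed - expected) * math.log(observed / expected)
    return psi
```

A monitor would call `population_stability_index` on each window of inference inputs and raise an alert when the value crosses the chosen threshold.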
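Slide 23 warns that a change in one model's output distribution cascades to every model that consumes it. A small sketch of how a team might find the models that need revalidation: represent which models consume which other models' outputs as a graph and walk it breadth-first. The model names are hypothetical examples.

```python
from collections import deque
from typing import Dict, List, Set


def downstream_models(dependencies: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every model that transitively consumes the output of `changed`.

    `dependencies` maps each model to the list of models whose outputs it consumes.
    """
    # Invert the mapping: for each model, who consumes its output?
    consumers: Dict[str, List[str]] = {}
    for model, inputs in dependencies.items():
        for upstream in inputs:
            consumers.setdefault(upstream, []).append(model)

    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for model in consumers.get(current, []):
            if model not in affected:  # the visited check also guards against cycles
                affected.add(model)
                queue.append(model)
    return affected
```

For example, with a hypothetical `ranker` that consumes a `relation_extractor`, which in turn consumes an `entity_tagger`, retraining the tagger flags both downstream models for revalidation.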
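Slide 25 suggests monitoring the chained models as a single entity. A rough sketch under simplifying assumptions: each stage is reduced to a scalar function, each stores the mean and standard deviation of the inputs it saw in training, and a batch is flagged when its mean lies more than a few standard errors from that reference (a simple z-test used here as a stand-in for a fuller distribution comparison). `Stage` and `run_monitored_pipeline` are hypothetical names.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One deployed model in the pipeline (hypothetical structure)."""
    name: str
    transform: Callable[[float], float]  # the stage's model, reduced to a scalar function
    ref_mean: float   # input-distribution metadata captured at training time
    ref_stdev: float


def run_monitored_pipeline(stages: List[Stage], inputs: List[float],
                           z_threshold: float = 3.0) -> Tuple[List[float], List[str]]:
    """Run all stages as one entity and report which stages see drifted input.

    A stage is flagged when the mean of the batch it receives lies more than
    `z_threshold` standard errors from the mean recorded at training time.
    """
    alerts: List[str] = []
    values = list(inputs)
    for stage in stages:
        standard_error = stage.ref_stdev / math.sqrt(len(values))
        z = abs(statistics.fmean(values) - stage.ref_mean) / standard_error
        if z > z_threshold:
            alerts.append(stage.name)
        values = [stage.transform(v) for v in values]
    return values, alerts
```

Because alerts come back in pipeline order, the first flagged stage points to where the shift entered, and later alerts show how far it cascaded.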

Editor's Notes

  1. Biomedical research, which is the foundation of drug development, is impossible without biological products. Biological products are compounds that scientists in both pharma and academia use to conduct experiments that lead to the development of drugs for life-threatening diseases.