Navigating the ML Pipeline Jungle
with MLflow: Notes from the Field
Thunder Shiviah
thunder@databricks.com
#SAISDS11
Plumbing has been a key focus of modern software engineering, whose landscape is driven by APIs, services, containers, and DevOps, so it may come as a surprise that plumbing is where AI projects tend to fail. But it is precisely because modern software development focuses on decoupled plumbing that we have struggled to handle the rise of AI.

Specifically, companies use AI effectively when they can create end-to-end AI model factories that explicitly account for the coupling between data, models, and code.

In this talk, I will walk through what a model factory is and how MLflow's design supports the creation of end-to-end model factories. I will also share best practices I have observed while helping customers, from startups to Fortune 50 companies, create, productionize, and scale end-to-end ML pipelines, and watching those pipelines produce serious, game-changing business impact.

  1. Navigating the ML Pipeline Jungle with MLflow: Notes from the Field
     Thunder Shiviah
     thunder@databricks.com
     #SAISDS11
  2. Who am I
     ● Databricks Solutions Architect focused on machine learning and deep learning
     ● Previously McKinsey Data Scientist and QuantumBlack Machine Learning Engineer designing and building ML pipelines for Fortune 100 companies
     ● Developed and deployed models across diverse verticals such as healthcare, telecom, finance, and renewable energy
  3. ● Overview of challenges with AI in production
     ● How we’re solving these challenges
     ● Demos
     ● A final word on where AI in production is heading
     ● Q&A
  4. AI is a Game Changing Opportunity
     OPPORTUNITY: Fraud Detection, Genome Sequencing, Recommendation Engine, Predictive Maintenance, …
     LOTS OF NEW DATA: Customer Data, Click Streams, Sensor data (IoT), Video/Speech, …
     DATA SCIENTIST, DATA ENGINEER, BUSINESS
     Machine Learning Requires Collaborative Experimentation on Big Data
  5. Hardest Part of AI isn’t AI, it’s plumbing
     ML Code, Configuration, Data Collection, Data Verification, Feature Extraction, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring
     “Hidden Technical Debt in Machine Learning Systems,” Google, NIPS 2015
     Figure 1: Only a small fraction of real-world ML systems is composed of the ML code, as shown by the small green box in the middle. The required surrounding infrastructure is vast and complex.
  6. ML Lifecycle is Manual, Inconsistent and Disconnected
     Data Prep
     ● Low level integrations for Data and ML
     ● Difficult to track data used for a model
     Build Model
     ● Ad hoc approach to track experiments
     ● Very hard to reproduce experiments
     Deploy Model
     ● Multiple tightly coupled deployment options
     ● Different monitoring approach for each framework
  7. How we’re making AI in production simple
  8. Simplifying the AI pipeline
     ● ML Runtime - Pre-configured ML libraries for CPU and GPU
     ● Pandas vectorized UDFs
     ● Distributed Transfer learning with deep learning pipelines
     ● MLflow
  9. New: Databricks Runtime for ML
     Ready to use clusters with built-in ML Frameworks
     GPU support
  10. Run your native Python code with PySpark, fast, with Vectorized Pandas UDFs
     ● Use Pandas UDFs to convert existing pandas code into performant spark UDFs
     ● Write pyspark dataframes to Pandas fast
  11. Transfer learning with DL pipelines
     ● Use pre-trained neural networks to harness the power of neural nets on smaller data.
     ● Model inference using SparkSQL UDFs
  12. New: Databricks MLflow standardizes ML Lifecycle
     Data Prep (Databricks Delta)
     ● Feed data to Models
     ● Enrich data in experiments
     Build Model (Databricks Runtime for ML, MLflow Project & Tracker)
     ● Track Experiments
     ● Reproduce experiments
     Deploy Model (MLflow Serving)
     ● Integrate with multiple clouds
     ● Manage and monitor models
  13. MLflow Components
     Tracking: Record and query experiments: code, data, config, results
     Projects: Packaging format for reproducible runs on any platform
     Models: General model format that supports diverse deployment tools
  14. Demo
  15. A word about where AI in production is going
  16. Q&A
  17. Thank you!
     thunder@databricks.com
     #SAISDS11
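
The Tracking component on slide 13 is described as recording and querying experiments by code, data, config, and results. The sketch below illustrates that idea using only the Python standard library. It is not MLflow's API: `fingerprint`, `RunRecord`, and `Tracker` are hypothetical names invented for this example, and the data, code snippet, and AUC values are made-up placeholders. The point is the coupling the talk emphasizes: a run is identified by the code, data, and config that produced it.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(payload: bytes) -> str:
    """Short, stable content hash identifying a data or code snapshot."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class RunRecord:
    """One experiment run: the code, data, and config behind its results."""
    code_version: str
    data_version: str
    params: dict
    metrics: dict = field(default_factory=dict)

    def run_id(self) -> str:
        # Identical code + data + config always map to the same id, so a
        # repeated run is recognizable as a reproduction. Metrics are
        # deliberately excluded: they are outputs, not inputs.
        key = json.dumps(
            {"code": self.code_version,
             "data": self.data_version,
             "params": self.params},
            sort_keys=True,
        )
        return fingerprint(key.encode("utf-8"))


class Tracker:
    """In-memory store of runs that can be queried by metric (hypothetical)."""

    def __init__(self):
        self.runs = {}

    def log(self, record: RunRecord) -> str:
        rid = record.run_id()
        self.runs[rid] = record
        return rid

    def best(self, metric: str) -> RunRecord:
        # Highest value wins; runs that never logged the metric are ignored.
        scored = [r for r in self.runs.values() if metric in r.metrics]
        return max(scored, key=lambda r: r.metrics[metric])


# Placeholder inputs, invented purely for illustration.
data = b"age,churned\n34,0\n51,1\n"
code = b"def train(df, max_depth): ..."

tracker = Tracker()
for depth, auc in [(3, 0.81), (5, 0.84)]:
    tracker.log(RunRecord(fingerprint(code), fingerprint(data),
                          {"max_depth": depth}, {"auc": auc}))

print(tracker.best("auc").params)
```

Because the run id is derived from the inputs, changing the data snapshot alone yields a different id even when code and parameters are unchanged, which is one simple way to keep track of the data used for a model.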