SlideShare a Scribd company logo
Overview of a typical machine learning model workflow
Fact #1 : Doing machine learning IS complex
Fact #2 ! Hardest part of AI actually is not AI code...
Machine learning projects main concerns
1- Open source ML ecosystem is crowded : For each phase of ML
process, there is a myriad of tools to choose from ;
2- Tracking : it is difficult to track by hand which parameters, code, and
data went into each experiment to produce a model, especially when
work in teams ;
3- Reproducibility : Without detailed tracking, teams often have
trouble getting the same code to work / achieve same results
5
- First release on june 2018
- Latest version v1.5, released 19 Dec 2019
Introducing MLflow
6
MLflow address machine learning challenges through its
3 main components
7
“MLflow is an open source platform to manage the ML
lifecycle, including experimentation, reproducibility and
deployment. It currently offers three components:
What is ML flow ?
1 2 3
MLflow value proposition:
Track, store, Deploy models from any ML framework
8
ML tracking API
Single API + UI to track for
each experiment :
▸ Parameters
▸ Metrics
▸ Artefacts (training
datasets, …)
Can be used on standalone
script / from a notebook
9
ML flow Tracking UI
10
ML projects
- ML projects define a standard
packaging format to manage data
science code.
- It can be a simple directory / git
repo with code to run.
- The running environment
requirements are defined as a
simple YAML file.
11
ML flow projects sample YAML project
ML Models
- MLflow Models is a
convention for packaging
machine learning models in
multiple formats called
“flavors”. MLflow offers a
variety of tools to help you
deploy different flavors of
models.
- Each MLflow Model is saved
as a directory containing
arbitrary files and an MLmodel
descriptor file that lists the
flavors it can be used in.
12
Example of scikit-learn model
Model serving commands
13
mlflow models serve
Deploys the model as a
local REST API server.
mlflow models build-
docker
packages a REST API
endpoint serving the model
as a docker image.
mlflow models predict
uses the model to generate
a prediction for a local CSV
or JSON file.
Demo Lab
14
E-commerce fraud detection
We have some json profiles representing fictional customers from an ecommerce
company.
(cf. courtesy of RAVELIN : https://github.com/unravelin/code-test-data-science)
The profiles contain information about the customers, their orders, their transactions, what
payment methods they used and whether the customer is fraudulent or not.
Our task :
● Transform the json profiles into feature vectors :
a. Automated feature engineering using featuretools package
● Construct a model to predict if a customer is fraudulent based on their profile.
a. modeling phase using python + scikit-learn
b. Track experiments results using databricks ML flow
15
16
Transform input data
1/Transactions / orders
2/Labels : fraudulent (true/false)
Extract features
Count orders
min/max/avg transaction amount...
Store analytical dataset
Parquet file with :
customerID | features X… | label
Python script to decode each user profile json array into relational
pandas dataframes
Data aggregation can be done using SQL / sparkSQL / pandas dataframe
- In our case will use featurestools package to automate this phase
Baseline classifier
Train a random forest model with
default parameters
Tuned classifier
Using gridsearchCV to tune best
parameters based on cross-
validation results
Base
AUC
Optimized
AUC
MLflow tracking goes here
Model Building & tracking process
Appendix #1 :
Deep feature synthesis used in Feature tools python package to generate
aggregates / apply transformations on relational data
17featurelabs.com
18
Appendix #2 : How random forests works ?
Main parameters to tune :
- Max_depth : # trees
- Nb_estimators : # of trees
- Min_rows: Specify the minimum
number of observations for a leaf
- Col_sample: column sample / tree
- Sample_rate : default to 0.63333
19
Complete code on github :
https://github.com/mmejdoubi/mlflow_fraud_ecom/blob/mast
er/ravelin_fraud_RF_mlflow_v1.ipynb
20
21
22
23
24
THANKS!
Any questions?
Reach me at mejdoubi2005@gmail.com
😉

More Related Content

What's hot

Why more than half of ML models don't make it to production
Why more than half of ML models don't make it to productionWhy more than half of ML models don't make it to production
Why more than half of ML models don't make it to production
cnvrg.io AI OS - Hands-on ML Workshops
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
Rui Quintino
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Manasi Vartak
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
Gianmario Spacagna
 
Ml ops deployment choices
Ml ops   deployment choicesMl ops   deployment choices
Ml ops deployment choices
Avinash Patil
 
How to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with KubernetesHow to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with Kubernetes
cnvrg.io AI OS - Hands-on ML Workshops
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
Build machine learning pipelines from research to production
Build machine learning pipelines from research to productionBuild machine learning pipelines from research to production
Build machine learning pipelines from research to production
cnvrg.io AI OS - Hands-on ML Workshops
 
MATLAB GUI Projects Research Ideas
MATLAB GUI Projects Research IdeasMATLAB GUI Projects Research Ideas
MATLAB GUI Projects Research Ideas
Matlab Simulation
 
Reproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflowReproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflow
Databricks
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
Henrik Skogström
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
Sasha Rosenbaum
 
Dependency inversion using ports and adapters
Dependency inversion using ports and adaptersDependency inversion using ports and adapters
Dependency inversion using ports and adapters
Mahfuzul Haque
 
Matlab Based Projects Research guidance
Matlab Based Projects Research guidanceMatlab Based Projects Research guidance
Matlab Based Projects Research guidance
Matlab Simulation
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
cnvrg.io AI OS - Hands-on ML Workshops
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
Nisha Talagala
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated Training
Databricks
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering Course
Grokking VN
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey
 

What's hot (20)

Why more than half of ML models don't make it to production
Why more than half of ML models don't make it to productionWhy more than half of ML models don't make it to production
Why more than half of ML models don't make it to production
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and PrometheusRobust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
Robust MLOps with Open-Source: ModelDB, Docker, Jenkins, and Prometheus
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning products
 
Ml ops deployment choices
Ml ops   deployment choicesMl ops   deployment choices
Ml ops deployment choices
 
How to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with KubernetesHow to monitor your ML models in production with Kubernetes
How to monitor your ML models in production with Kubernetes
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
Build machine learning pipelines from research to production
Build machine learning pipelines from research to productionBuild machine learning pipelines from research to production
Build machine learning pipelines from research to production
 
MATLAB GUI Projects Research Ideas
MATLAB GUI Projects Research IdeasMATLAB GUI Projects Research Ideas
MATLAB GUI Projects Research Ideas
 
Reproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflowReproducible AI Using PyTorch and MLflow
Reproducible AI Using PyTorch and MLflow
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
 
Dependency inversion using ports and adapters
Dependency inversion using ports and adaptersDependency inversion using ports and adapters
Dependency inversion using ports and adapters
 
Matlab Based Projects Research guidance
Matlab Based Projects Research guidanceMatlab Based Projects Research guidance
Matlab Based Projects Research guidance
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated Training
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Grokking: Data Engineering Course
Grokking: Data Engineering CourseGrokking: Data Engineering Course
Grokking: Data Engineering Course
 
Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2Richard Coffey (x18140785) - Research in Computing CA2
Richard Coffey (x18140785) - Research in Computing CA2
 

Similar to databricks ml flow demonstration using automatic features engineering

Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
Databricks
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
Databricks
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
Márton Kodok
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
Databricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
Eran Stiller
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
Andrzej Michałowski
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
Stavros Kontopoulos
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
Data Science Milan
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
Gülden Bilgütay
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
Liangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
Liangjun Jiang
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
Fabian Hadiji
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
Stepan Pushkarev
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHU
Dmitrii Suslov
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Márton Kodok
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
Márton Kodok
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Paris Data Engineers !
 

Similar to databricks ml flow demonstration using automatic features engineering (20)

Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
mlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecyclemlflow: Accelerating the End-to-End ML lifecycle
mlflow: Accelerating the End-to-End ML lifecycle
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
"Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow""Managing the Complete Machine Learning Lifecycle with MLflow"
"Managing the Complete Machine Learning Lifecycle with MLflow"
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Build 2019 Recap
Build 2019 RecapBuild 2019 Recap
Build 2019 Recap
 
Feature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systemsFeature store: Solving anti-patterns in ML-systems
Feature store: Solving anti-patterns in ML-systems
 
Machine learning at scale challenges and solutions
Machine learning at scale challenges and solutionsMachine learning at scale challenges and solutions
Machine learning at scale challenges and solutions
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
MLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to productionMLOps pipelines using MLFlow - From training to production
MLOps pipelines using MLFlow - From training to production
 
DevOps for DataScience
DevOps for DataScienceDevOps for DataScience
DevOps for DataScience
 
EPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHUEPAM ML/AI Accelerator - ODAHU
EPAM ML/AI Accelerator - ODAHU
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learning
 

Recently uploaded

一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
v7oacc3l
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 

Recently uploaded (20)

一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 

databricks ml flow demonstration using automatic features engineering

  • 1.
  • 2. Overview of a typical machine learning model workflow Fact #1 : Doing machine learning IS complex
  • 3. Fact #2 ! Hardest part of AI actually is not AI code...
  • 4.
  • 5. Machine learning projects main concerns 1- Open source ML ecosystem is crowded : For each phase of ML process, there is a myriad of tools to choose from ; 2- Tracking : it is difficult to track by hand which parameters, code, and data went into each experiment to produce a model, especially when work in teams ; 3- Reproducibility : Without detailed tracking, teams often have trouble getting the same code to work / achieve same results 5
  • 6. - First release on june 2018 - Latest version v1.5, released 19 Dec 2019 Introducing MLflow 6
  • 7. MLflow address machine learning challenges through its 3 main components 7 “MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It currently offers three components: What is ML flow ? 1 2 3
  • 8. MLflow value proposition: Track, store, Deploy models from any ML framework 8
  • 9. ML tracking API Single API + UI to track for each experiment : ▸ Parameters ▸ Metrics ▸ Artefacts (training datasets, …) Can be used on standalone script / from a notebook 9
  • 11. ML projects - ML projects define a standard packaging format to manage data science code. - It can be a simple directory / git repo with code to run. - The running environment requirements are defined as a simple YAML file. 11 ML flow projects sample YAML project
  • 12. ML Models - MLflow Models is a convention for packaging machine learning models in multiple formats called “flavors”. MLflow offers a variety of tools to help you deploy different flavors of models. - Each MLflow Model is saved as a directory containing arbitrary files and an MLmodel descriptor file that lists the flavors it can be used in. 12 Example of scikit-learn model
  • 13. Model serving commands 13 mlflow models serve Deploys the model as a local REST API server. mlflow models build- docker packages a REST API endpoint serving the model as a docker image. mlflow models predict uses the model to generate a prediction for a local CSV or JSON file.
  • 15. E-commerce fraud detection We have some json profiles representing fictional customers from an ecommerce company. (cf. courtesy of RAVELIN : https://github.com/unravelin/code-test-data-science) The profiles contain information about the customers, their orders, their transactions, what payment methods they used and whether the customer is fraudulent or not. Our task : ● Transform the json profiles into feature vectors : a. Automated feature engineering using featuretools package ● Construct a model to predict if a customer is fraudulent based on their profile. a. modeling phase using python + scikit-learn b. Track experiments results using databricks ML flow 15
  • 16. 16 Transform input data 1/Transactions / orders 2/Labels : fraudulent (true/false) Extract features Count orders min/max/avg transaction amount... Store analytical dataset Parquet file with : customerID | features X… | label Python script to decode each user profile json array into relational pandas dataframes Data aggregation can be done using SQL / sparkSQL / pandas dataframe - In our case will use featurestools package to automate this phase Baseline classifier Train a random forest model with default parameters Tuned classifier Using gridsearchCV to tune best parameters based on cross- validation results Base AUC Optimized AUC MLflow tracking goes here Model Building & tracking process
  • 17. Appendix #1 : Deep feature synthesis used in Feature tools python package to generate aggregates / apply transformations on relational data 17featurelabs.com
  • 18. 18 Appendix #2 : How random forests works ? Main parameters to tune : - Max_depth : # trees - Nb_estimators : # of trees - Min_rows: Specify the minimum number of observations for a leaf - Col_sample: column sample / tree - Sample_rate : default to 0.63333
  • 19. 19 Complete code on github : https://github.com/mmejdoubi/mlflow_fraud_ecom/blob/mast er/ravelin_fraud_RF_mlflow_v1.ipynb
  • 20. 20
  • 21. 21
  • 22. 22
  • 23. 23
  • 24. 24 THANKS! Any questions? Reach me at mejdoubi2005@gmail.com 😉