Advanced Model Comparison and Automated Deployment Using MLflow

At T-Mobile, when a new account is opened, fraud checks run both pre- and post-activation. Fraud that slips past these checks tends to fall into first payment default, making it look like an ordinary delinquent new account. The objective of this project was to investigate newly created accounts headed toward delinquency to find that additional fraud.

For the longevity of this project, we wanted an end-to-end automated solution for building and productionizing models that included multiple modeling techniques and hyperparameter tuning.

We wanted to utilize MLflow for model comparison and graduation to production, and Hyperopt for parallel hyperparameter tuning. To achieve this goal, we created multiple machine learning notebooks, each tuning a different model type with its specific parameters. These models were saved into a training MLflow experiment, after which the best-performing model from each notebook was saved to a model comparison MLflow experiment.
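The per-notebook selection step described above can be sketched as follows. This is a simplified, MLflow-free stand-in: run records are plain dictionaries, and `promote_best` stands in for re-logging each winning run to the comparison experiment (in the real pipeline this would use MLflow runs and experiments); the metric name `val_auc` and the parameter values are illustrative assumptions, not taken from the talk.

```python
# Stand-in for promoting the best run of each model notebook into a shared
# "model comparison" experiment (metric and parameter names are made up).

def best_run(runs, metric="val_auc"):
    """Return the run with the highest value for `metric`."""
    return max(runs, key=lambda run: run["metrics"][metric])

def promote_best(training_experiment, comparison_experiment, metric="val_auc"):
    """For each notebook's runs in the training experiment, copy its
    best-performing run into the comparison experiment."""
    for notebook, runs in training_experiment.items():
        winner = best_run(runs, metric)
        comparison_experiment.append({"notebook": notebook, **winner})
    return comparison_experiment

# Example: two model notebooks, each with two tuned candidate runs.
training = {
    "logistic_regression": [
        {"params": {"regParam": 0.1}, "metrics": {"val_auc": 0.81}},
        {"params": {"regParam": 0.01}, "metrics": {"val_auc": 0.84}},
    ],
    "xgboost": [
        {"params": {"max_depth": 4}, "metrics": {"val_auc": 0.88}},
        {"params": {"max_depth": 8}, "metrics": {"val_auc": 0.86}},
    ],
}
comparison = promote_best(training, [])
```

Each notebook contributes exactly one entry to the comparison experiment, regardless of how many hyperparameter trials it ran.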

In this second experiment, the newly built models were compared with each other as well as with the models currently and previously in production. Once the best-performing model was identified, it was saved to the MLflow Model Registry to be graduated to production.
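The champion/challenger logic described here can be sketched in plain Python: new candidates are compared against the production champion on a shared metric, and a challenger only graduates if it beats the incumbent. Model names and the `val_auc` metric are illustrative; in the real pipeline the winner would be saved to the MLflow Model Registry rather than simply returned.

```python
def graduate(champion, challengers, metric="val_auc"):
    """Compare challenger models against the current production champion;
    return the model that should be in production after this cycle."""
    best_challenger = max(challengers, key=lambda m: m["metrics"][metric])
    if champion is None or best_challenger["metrics"][metric] > champion["metrics"][metric]:
        return best_challenger  # challenger wins: graduate it
    return champion             # keep the current production model

# Illustrative champion and challengers (names and scores are made up).
production = {"name": "xgboost_v3", "metrics": {"val_auc": 0.87}}
candidates = [
    {"name": "logreg_v4", "metrics": {"val_auc": 0.84}},
    {"name": "xgboost_v4", "metrics": {"val_auc": 0.89}},
]
winner = graduate(production, candidates)
```

Keeping the previous production models in the comparison guards against a regression: if no new candidate beats the incumbent, nothing changes.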

We executed this multi-notebook solution as a regularly scheduled Azure Data Factory pipeline, making model building and selection a completely hands-off process.

Every data science project has its nuances; the key is to leverage available tools in a customized approach that fits your needs. We hope to give the audience a view into our advanced, custom approach to the MLflow infrastructure and to leveraging these tools through automation.


  1. Advanced Model Comparison and Automated Deployment Using MLFlow. Presenters: Charu Kalra, Sr Data Scientist; Connor McCambridge, Sr Data Scientist
  2. Fraud Insights & Analytics Data Science Team. Charu Kalra: Senior Data Scientist; Master's in Mathematical Finance from Rutgers University; previously a Risk Manager at American Express and a Data Scientist at Commerce Bank; 2+ years of experience with Spark, Databricks, and big data architectures. Connor McCambridge: Senior Data Scientist; Master's in Business Intelligence and Analytics from Rockhurst University; started his data science career as an intern in Sprint's Prepaid Division; 3+ years of experience with Spark, Databricks, and big data architectures. Ted Burbidge: Senior Data Scientist; Master's in Applied Statistics from the University of Kansas; working in telecom since 2000 in roles including Performance Engineering and Application Design; 3+ years of experience with Spark, Databricks, and big data architectures.
  3. Agenda: § Project Vision § Solution Design § Demo § Conclusion
  4. Feedback: your feedback is important to us; don't forget to rate and review the sessions.
  5. Project Vision. Problem statement: multiple fraud checks on new accounts (pre-activation check, post-activation check, delinquent status check); missed fraud falls into delinquent status. Objective: 1. measure the fraud rate, 2. identify missed fraud.
  6. Project Approach: Random Sampling, Machine Learning, Outlier Detection
  7. Automated Process Flow (diagram): Historical Data, Build Model, Daily Data, Production Model, Collect Results, Load Results, Manual Review, Examine Model Performance, Refresh Historical Records.
  8. Data Science Stages. Data Preparation: gather data (training data, test data). Data Transformation: variable creation, aggregations, scaling/standardizing. Model Building: machine learning, outlier detection. Review Results: examine metrics, productionalize model (optional).
  9. Notebook Building (diagram): each model notebook repeats the flow Data Prep, Build Transformer, Build Model (Logistic Regression, Neural Net, or XGBoost), Review Results.
  10. Notebook Framework (diagram): shared Data Prep and Build Transformer steps feed Build Model (Logistic Regression, Neural Net, XGBoost) and Review Results.
  11. Framework Components: § Create & Store Data § Build Uniform Transformer § Train Multiple Models § Hyper-Tune Parameters § Model Selection § Automated Deployment
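The "Build Uniform Transformer" component above can be sketched as a single fit/transform object shared by every model notebook, so all candidate models see identically prepared features. This is a plain-Python stand-in for what would be a fitted feature pipeline in the Databricks notebooks; the feature names and values are made up.

```python
class UniformTransformer:
    """Fit scaling statistics once on training data, then apply the same
    standardization to every dataset the model notebooks consume."""

    def fit(self, rows):
        # Compute per-feature mean and standard deviation from training rows.
        keys = rows[0].keys()
        n = len(rows)
        self.mean = {k: sum(r[k] for r in rows) / n for k in keys}
        self.std = {
            k: (sum((r[k] - self.mean[k]) ** 2 for r in rows) / n) ** 0.5 or 1.0
            for k in keys
        }
        return self

    def transform(self, rows):
        # Standardize each feature: (value - mean) / std.
        return [
            {k: (r[k] - self.mean[k]) / self.std[k] for k in r} for r in rows
        ]

# Illustrative training rows (feature names are hypothetical).
train = [{"amount": 10.0, "age_days": 1.0}, {"amount": 30.0, "age_days": 3.0}]
scaler = UniformTransformer().fit(train)
scaled = scaler.transform(train)
```

Because the transformer is fit once and saved (in the talk's design, to its own registry), every model notebook, and later the daily batch scoring job, applies exactly the same preparation.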
  12. Solution Design (diagram): Data Prep & Transformer feeds ML Model Building and Outlier Model Building; each feeds its Model Compare step and Model Registry (Transformer, ML, and Outlier registries), ending in Daily Batch Scoring across the ML and Outlier experiments.
  13. Create & Store Data: the same solution design diagram, highlighting the data preparation step.
  14. Build Uniform Transformer: the same diagram, highlighting the transformer step and its registry.
  15. Train Multiple Models: the same diagram, highlighting ML and outlier model building.
  16. Hyper-Tuning Parameters: the same diagram, highlighting parameter tuning within model building.
  17. Hyper-Tuning: given the search space {ρ1: (ρ11, ρ12, …, ρ1n), ρ2: (ρ21, ρ22, …, ρ2n), …, ρn: (ρn1, ρn2, …, ρnn)}, fmin() evaluates parameter combinations such as {ρ11, ρ21, …, ρn1}, {ρ12, ρ21, …, ρn1}, …, {ρ1n, ρ2n, …, ρnn} and returns the best model. Flow: Test & Train Data, Transform Data, Define Hyperopt, Build Models, Compare Results, Select Best Model, Build & Save Pipeline.
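The fmin() loop on this slide can be sketched without Hyperopt as a simple random search over a parameter grid. The real notebooks use Hyperopt's fmin with parallel trials; here the search strategy, the parameter space, and the toy loss function are all stand-in assumptions meant only to show the shape of the loop.

```python
import random

def fmin_sketch(objective, space, max_evals=50, seed=0):
    """Randomly sample parameter combinations from `space` (a dict of
    parameter name -> list of candidate values) and keep the lowest loss.
    A stand-in for hyperopt.fmin, which searches the space adaptively."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(max_evals):
        params = {name: rng.choice(values) for name, values in space.items()}
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Toy objective: pretend models near max_depth=6, learning_rate=0.1 are best.
space = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}
objective = lambda p: (p["max_depth"] - 6) ** 2 + abs(p["learning_rate"] - 0.1)
best, loss = fmin_sketch(objective, space, max_evals=100)
```

In the actual pipeline, the objective would train a model on the transformed training data and return a validation loss, and the winning parameters would be used to build and save the final pipeline.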
  18. Model Selection: the same solution design diagram, highlighting the model comparison steps.
  19. Automated Deployment: the same diagram, highlighting graduation to the model registries and daily batch scoring.
  20. Azure Data Factory Implementation: by coding all the notebooks in Azure Databricks, we can use Azure Data Factory to orchestrate the notebook executions.
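The orchestration on this slide could look roughly like the ADF pipeline fragment below: chained Databricks Notebook activities running the notebooks in order. The pipeline name, activity names, notebook paths, and linked service name are illustrative assumptions, not taken from the talk.

```json
{
  "name": "ModelBuildAndSelect",
  "properties": {
    "activities": [
      {
        "name": "DataPrepAndTransformer",
        "type": "DatabricksNotebook",
        "typeProperties": { "notebookPath": "/fraud/data_prep_and_transformer" },
        "linkedServiceName": {
          "referenceName": "AzureDatabricksService",
          "type": "LinkedServiceReference"
        }
      },
      {
        "name": "MLModelBuilding",
        "type": "DatabricksNotebook",
        "dependsOn": [
          { "activity": "DataPrepAndTransformer", "dependencyConditions": ["Succeeded"] }
        ],
        "typeProperties": { "notebookPath": "/fraud/ml_model_building" },
        "linkedServiceName": {
          "referenceName": "AzureDatabricksService",
          "type": "LinkedServiceReference"
        }
      },
      {
        "name": "ModelCompareAndRegister",
        "type": "DatabricksNotebook",
        "dependsOn": [
          { "activity": "MLModelBuilding", "dependencyConditions": ["Succeeded"] }
        ],
        "typeProperties": { "notebookPath": "/fraud/model_compare" },
        "linkedServiceName": {
          "referenceName": "AzureDatabricksService",
          "type": "LinkedServiceReference"
        }
      }
    ]
  }
}
```

A trigger on the pipeline gives the regular schedule mentioned in the abstract, making the whole build-compare-graduate cycle hands-off.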
  21. Demo. Dataset: Worldline and the Machine Learning Group at the Free University of Brussels. (2018). Credit Card Fraud Detection (Version 3) [CSV]. Retrieved from https://www.kaggle.com/mlg-ulb/creditcardfraud
  22. Project Results: § 3x higher fraud rate through Random Sampling § Outlier Detection captured 4x the random sample rate § Machine Learning captured 10x the random sample rate
  23. Key Takeaways: • Leverage available tools in Azure • Customize the solution using Databricks • Manage the complete model lifecycle with MLFlow • Achieve full automation with Azure Data Factory
  24. Thank You
