©2022 Databricks Inc. — All rights reserved
Datathon
Retail – Churn Prediction
7th - 9th June 2022
The team
• DAs:
• Visualization
• Preprocessing
• DE: sys admin
• DS:
• Baseline
• Feature engineering
• Model development
• Captain
• Code review
• Team Data Science Process
The architecture
[Architecture diagram, two stages]
• Data Engineering & Data Sciences
• Preprocessing members_table
• Preprocessing account_transactions
• Preprocessing user_activity
• Machine Learning
• Feature engineering
• Split into train/val/test
• Learning and scoring
• Using the model on test set
• Inference on the predict set
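The "Split into train/val/test" step can be illustrated with a minimal sketch. It assumes the rows fit in memory and that a simple shuffled split is acceptable. The function name, the fractions and the seed are hypothetical and are not taken from the deck.

```python
import random


def split_train_val_test(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows deterministically and cut them into three disjoint sets.

    Hypothetical helper: the deck does not say how the split was done.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Reusing the same seed for every model is one way to keep the splits identical across a benchmark.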
The results
Model | Score on train set | Score on test set | Score on provided test set
XGBoost | 0.96 | 0.91 | 0.82
Decision Tree | 0.9 | 0.9 | 0.75
Logistic Regression | 0.83 | 0.82 | 0.49
• Our goal was to score well on true positives (i.e., a churn predicted as a churn).
Benchmark of different algorithms used, with the same features and same splits.
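The deck does not name the metric behind these scores. As an illustration only, a score focused on true positives could be recall (the share of actual churners that the model flags). The sketch below assumes binary labels where 1 means churn. Both function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # a churn predicted as a churn
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

Recall alone rewards a model that predicts churn for everyone, so it is usually read alongside precision or the false positive count.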
How to scale
Challenge | Ideas to answer the issue
Features are the hardest component to get right | Feature store provides support for feature storage, serving and management for ML scalability
DS models can be slow and choosing the right architecture is essential | GPUs and even TPUs are better choices than CPUs (but at a higher cost) for parallel processing and complex matrix manipulation
Framework versions and dependency problems with ML deployment | Model registry keeps ML models encapsulated and easy to deploy across all environments for easy scalability
A large ML application is never a one-time deal; it requires efficient optimization methodologies to fine-tune or retrain through iterative processes | Bayesian optimization offers a computationally efficient way to scale up ML hyperparameter tuning
Friction between ML and business experts | CI/CD platform to get direct feedback from business stakeholders
Vertical scaling remains the most pragmatic solution (not always the best one), especially for applications that have been in production for many years. However, in a distributed environment, horizontal scaling by adding more machines to your pool of resources can be a better idea.
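The horizontal scaling idea (split the work, then add workers) can be sketched on a single machine. This is only an analogy: the threads stand in for machines, and score_chunk is a hypothetical placeholder for real model inference. None of these names come from the deck.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def score_chunk(chunk):
    """Hypothetical stand-in for scoring one partition with a model."""
    return [x * 2 for x in chunk]


def score_in_parallel(items, n_workers=4):
    """Score every partition on its own worker and stitch results together."""
    chunks = partition(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = list(pool.map(score_chunk, chunks))
    return [y for part in results for y in part]
```

Raising n_workers plays the role of adding machines to the pool. On a real cluster the partitioning and scheduling would be handled by the distributed engine rather than by hand.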
Next steps you recommend
• Capitalize within our company
• AGILE Framework
• Data analysis Methodology
• Statistics based Exploratory Data Analysis
• Business stakeholder feedback
• Data drift
• Engineering
• Improve feature creation, especially for the user and activity table
• Optimize model’s hyperparameters
• Benchmark deep learning
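For the "Data drift" item, one common way to monitor a numeric feature is the Population Stability Index (PSI). The deck does not say which drift measure the team had in mind, so the sketch below is an assumption. The function name, the bin count and the eps floor are hypothetical choices.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a new sample.

    Bins are equal-width over the range of the reference sample.
    Values outside that range are clipped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant data

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = bin_fractions(expected)
    a = bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A PSI of 0 means the two binned distributions are identical, and larger values indicate a stronger shift between training-time and current data.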
What is the value of Azure Databricks for you?
• Three levels of parallelization
• Models benchmark
• Hyperparameters search within algorithms
• Map-reduceable algorithms
• Team Data Science Process Ready (Collaboration)
• One-stop-shop (Easy hands-on complex tasks)
• Parallel by design (Cutting-edge technology)
• PySpark optimized (Done for and by ML practitioners)
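A "map-reduceable" algorithm is one whose result can be assembled from small per-partition summaries. As a minimal sketch in plain Python (not PySpark), the mean and population variance of a column can be computed this way. The function names are hypothetical.

```python
from functools import reduce


def map_partition(values):
    """Map step: summarize one partition as (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))


def combine(a, b):
    """Reduce step: merge two summaries. Associative and commutative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_variance(partitions):
    """Population mean and variance computed from per-partition summaries."""
    n, s, ss = reduce(combine, map(map_partition, partitions), (0, 0.0, 0.0))
    if n == 0:
        raise ValueError("no data")
    mean = s / n
    return mean, ss / n - mean * mean
```

Because combine is associative and commutative, the partitions can be summarized on different workers and merged in any order. The sum-of-squares formula can lose precision on large values, so a production version might prefer a more numerically stable merge.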
Microsoft_Databricks Datathon - Submission Deck TEMPLATE.pptx
