Scaling Machine Learning @ Careem
Who?
Ahmed Kamal
- Tech Lead @ Machine Learning Platform, Careem
- Computer Engineer by training
I blog @ ahmedkamal.me
Find me on Twitter @_akamal_
Presenting the work of my team and other awesome colleagues @ Careem
Agenda
1. Introduction
2. AI @ Careem
3. Motivation behind building ML Platform
4. A Deep dive into Careem MLP
5. Scaling Machine Learning Usage
6. Learnings from the journey
We are the leading technology platform in the Middle East!
Multi-vertical platform of Mobility, Delivery, Payments
- 14 Countries
- 100+ Cities
- 3000 Colleagues
- 33M Customers
- 1.7M Captains
To simplify and improve the lives of people…
...and build an awesome organisation that inspires
Sneak Peek into ML @ Careem
● Enhances customer and captain experience
○ ETAs & Accurate Prices
○ Cancellations & Captain Acceptance
● Platform Integrity
○ Fraud Prevention and Detection
○ Anomaly Detection
● Ensure efficiency of our two-sided marketplace
○ Demand and Supply forecasting
○ Smart Dispatching & Peak
Building an AI ecosystem
Scalable AI Ecosystem
- ML Infra: Scalable Machine Learning Platforms
- Data Warehouses: Well governed, trustworthy and documented data
- Big Data capabilities: Easy and reliable access to large volumes of data
- Know How: AI aware colleagues
ML Workflow - Challenges of ML at scale
[Chart: Expectation (%) vs. Reality (%) per workflow step]
- Formulate the problem
- Data selection and feature engineering
- ML model development
- Model deployment, integration and monitoring
ML Challenges
ML Development Challenges
Prepare Data
- Large amounts of data to process
Train a model
- Training is very costly and takes a long time
- Reproducibility
Transfer to Prod
- Development environment mismatch with production environment
Deploy
- Models need to be in production
  - Low Latency & High Throughput
  - Monitoring & Alerting
  - Fault Tolerance & Auto scaling
Post Deployment Challenges
- Continuously refresh and update deployed models
- Performance monitoring and alerting
- A/B testing between new/old or new/new models
- Which cities are ready for ML?
- Additional 100 models?
- Too many APIs? Integration headache
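To make the A/B testing challenge concrete, here is a minimal sketch of one common way to split traffic between an old and a new model: hash a stable request id so the same booking always lands on the same variant. The slides do not describe how Careem implements this; the function name, variant names and the 90/10 split are assumptions for illustration only.

```python
import hashlib
from typing import Mapping


def assign_variant(request_id: str, traffic_shares: Mapping[str, float]) -> str:
    """Deterministically pick a model variant for a request.

    The same request_id always maps to the same variant, so repeated
    calls for one booking are served by one model.
    """
    if not traffic_shares:
        raise ValueError("at least one variant is required")
    total = sum(traffic_shares.values())
    # Hash the id and scale the first 8 bytes into the range [0, total].
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64 * total
    cumulative = 0.0
    chosen = ""
    for name, share in traffic_shares.items():
        chosen = name
        cumulative += share
        if point < cumulative:
            break
    # If rounding pushed point to the very top, the last variant is returned.
    return chosen


# Hypothetical split: 90% of traffic to the current model, 10% to a candidate.
shares = {"eta_model_v1": 0.9, "eta_model_v2": 0.1}
variant = assign_variant("9fdsaf9as9da9sd9", shares)
```

Hashing instead of random sampling keeps the assignment reproducible, which makes it easier to join logged predictions back to the variant that produced them.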
Careem ML Platform
- Accelerate adoption of AI @ Careem
- Automate the end to end ML experience
Tackling ML Life Cycle
ML lifecycle:
- Formulate the problem
- Select data and feature engineering
- Train and test models
- Deploy model to production
- Monitor and improve
Dataset Generation
Model Training
Model Report
Batch Serving
- Batch predictions generated offline for a target dataset.
- Store predictions in different data-stores.
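The batch serving idea above can be sketched as a small loop that scores a target dataset in chunks and hands each chunk of predictions to one or more stores. This is only an illustration under assumptions: the `batch_predict` helper, the in-memory sinks and the toy model are hypothetical and stand in for whatever job runner and data stores the platform actually uses.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Sink = Callable[[List[Row]], None]


def batch_predict(
    rows: Iterable[Row],
    predict: Callable[[Row], float],
    sinks: List[Sink],
    chunk_size: int = 1000,
) -> int:
    """Score rows in chunks and pass each chunk of predictions to every sink."""
    written = 0
    chunk: List[Row] = []
    for row in rows:
        chunk.append({"uuid": row["uuid"], "prediction": predict(row)})
        if len(chunk) >= chunk_size:
            for sink in sinks:
                sink(chunk)
            written += len(chunk)
            chunk = []
    if chunk:  # flush the final partial chunk
        for sink in sinks:
            sink(chunk)
        written += len(chunk)
    return written


# Two in-memory lists stand in for different data stores (assumption).
warehouse_table: List[Row] = []
cache_store: List[Row] = []
dataset = [{"uuid": f"trip-{i}", "distance_km": float(i)} for i in range(5)]
total_written = batch_predict(
    dataset,
    predict=lambda row: row["distance_km"] * 2.0,  # toy model, not a real ETA model
    sinks=[warehouse_table.extend, cache_store.extend],
    chunk_size=2,
)
```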
Realtime Serving
- Config Based Modular Serving Framework
- Inject Custom Feature Engineering Logic
- Access to external data & A/B testing support
- Prediction and performance logging
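A config-based serving framework with injectable feature engineering and prediction logging could look roughly like the sketch below. None of these names come from the slides: the `feature_step` registry, the config keys and the toy model are assumptions chosen to show the shape of the idea, not Careem's framework.

```python
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("serving")

FeatureFn = Callable[[Dict[str, Any]], Dict[str, Any]]
FEATURE_STEPS: Dict[str, FeatureFn] = {}


def feature_step(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Register custom feature engineering logic under a config-friendly name."""
    def register(fn: FeatureFn) -> FeatureFn:
        FEATURE_STEPS[name] = fn
        return fn
    return register


@feature_step("manhattan_distance")
def manhattan_distance(record: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical feature: coordinate distance between captain and booking.
    out = dict(record)
    out["distance_deg"] = (
        abs(record["captain_lat"] - record["booking_lat"])
        + abs(record["captain_long"] - record["booking_long"])
    )
    return out


def build_handler(config: Dict[str, Any], model: Callable[[Dict[str, Any]], float]):
    """Assemble a request handler from a config and a model callable."""
    steps = [FEATURE_STEPS[name] for name in config["feature_steps"]]

    def handle(records: List[Dict[str, Any]]) -> Dict[str, Any]:
        results = []
        for record in records:
            features = record
            for step in steps:
                features = step(features)
            prediction = model(features)
            # Prediction logging, so performance can be analysed later.
            logger.info("model=%s uuid=%s prediction=%s",
                        config["model_name"], record["uuid"], prediction)
            results.append({"uuid": record["uuid"], "prediction": prediction})
        return {"response": results, "result": "ok"}

    return handle


CONFIG = {"model_name": "eta_demo", "feature_steps": ["manhattan_distance"]}
handler = build_handler(CONFIG, model=lambda f: round(f["distance_deg"] * 100, 2))
reply = handler([{
    "uuid": "9fdsaf9as9da9sd9",
    "captain_lat": 30.0039, "captain_long": 31.1422,
    "booking_lat": 30.0022, "booking_long": 31.1405,
}])
```

Keeping the feature steps behind names in a config means a new model can reuse the same serving code and only swap the configuration.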
One Click Deployment
- Configuration Based Deployment service.
- Latency, integration and API tests
- Auto-Rollout capabilities to smooth the model update experience
Configs → One Click! → Production Level API
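As an illustration of the latency tests mentioned above, a deployment service might time a candidate model on sample payloads and only allow rollout when the 95th percentile stays within a budget. This is a sketch under assumptions: the p95 criterion, the budget values and all function names are hypothetical, since the slides do not say which checks gate a rollout.

```python
import time
from typing import Any, Callable, List, Sequence


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) using integers
    return ordered[rank - 1]


def measure_latencies_ms(predict: Callable[[Any], Any], payloads: Sequence[Any]) -> List[float]:
    """Call the candidate model once per payload and record wall-clock latency."""
    latencies = []
    for payload in payloads:
        start = time.perf_counter()
        predict(payload)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def ready_for_rollout(latencies_ms: Sequence[float], p95_budget_ms: float) -> bool:
    """Gate: allow rollout only if p95 latency is within the budget."""
    return percentile(latencies_ms, 95) <= p95_budget_ms


# Toy candidate model and a hypothetical 50 ms budget.
observed = measure_latencies_ms(lambda payload: sum(payload), [[1, 2, 3]] * 20)
can_roll_out = ready_for_rollout(observed, p95_budget_ms=50.0)
```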
A Glued Dynamic System
Data Generation → Model Training → Model Serving
Now you have an API
URL => eta-service.careem.com/v1/101/eta
Request =>
[{"uuid": "9fdsaf9as9da9sd9", "assignment_time": "2019-03-14 14:09:46", "captain_lat": 30.0039,
"captain_long": 31.1422, "booking_lat": 30.0022, "booking_long": 31.1405}]
Response =>
{
  "response": [
    {"uuid": "9fdsaf9as9da9sd9", "prediction": 2.5}
  ],
  "result": "ok"
}
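A caller of the API shown above would build the request body and unpack the predictions from the response. The sketch below does only that and makes no network call, because the slide does not give the URL scheme or HTTP method. The helper names are hypothetical, and treating any `result` other than `"ok"` as a failure is an assumption.

```python
import json
from typing import Any, Dict, List

# Field names taken from the request example on the slide.
REQUIRED_FIELDS = (
    "uuid", "assignment_time",
    "captain_lat", "captain_long",
    "booking_lat", "booking_long",
)


def build_eta_request(bookings: List[Dict[str, Any]]) -> str:
    """Serialize a list of bookings into the JSON array the API expects."""
    for booking in bookings:
        missing = [field for field in REQUIRED_FIELDS if field not in booking]
        if missing:
            raise ValueError(f"booking is missing fields: {missing}")
    return json.dumps(bookings)


def parse_eta_response(body: str) -> Dict[str, float]:
    """Map each uuid in the response to its prediction."""
    payload = json.loads(body)
    if payload.get("result") != "ok":  # assumption: "ok" is the only success value
        raise RuntimeError(f"unexpected result: {payload.get('result')!r}")
    return {item["uuid"]: item["prediction"] for item in payload["response"]}


request_body = build_eta_request([{
    "uuid": "9fdsaf9as9da9sd9",
    "assignment_time": "2019-03-14 14:09:46",
    "captain_lat": 30.0039, "captain_long": 31.1422,
    "booking_lat": 30.0022, "booking_long": 31.1405,
}])
sample_response = (
    '{"response": [{"uuid": "9fdsaf9as9da9sd9", "prediction": 2.5}], "result": "ok"}'
)
predictions = parse_eta_response(sample_response)
```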
Impact
Infra Cost
- Serverless Job Training: up to 90% saving on training cost
Productivity
- One-Click Serving API Deployment
- Training pipelines and Auto-rollout
- Dataset Generation
- Reduced time needed by a data scientist (DS) from days to minutes per model
- Reduced time needed by a data engineer (DE) from 2 weeks to 12 minutes per use-case
- Model Reports (Visualizations + Metrics over time): saving hours of DS time; reduced analysis and evaluation time from hours to minutes
- More impact than doubling the size of our DS team
We are able to have more models in production and have a much higher compounded impact.
To Summarize
Scaling Machine Learning Usage
From Scaling Infra to Scaling Usage
AI For Everyone
- Auto Machine Learning
- Custom AI Powered Toolings
- Generic Time Series Forecasting
- Supply Forecasting
- Demand Forecasting
- Campaign Management System
- Customer Care
AutoML
- Enables experimenting with ML in a short time.
- Lowers the barrier to using ML for many people.
Building blocks:
- Auto Feature Selection/Engineering
- Auto Hyperparameter Tuning
- Auto Model Selection
- Auto Model Ensembling
- Auto Rollout
- Rich Feature Store
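To show what automatic model selection means in its simplest form, the sketch below scores a few candidate forecasters on a held-out tail of a time series and keeps the one with the lowest mean absolute error. It is a toy illustration, not Careem's AutoML: the three baseline forecasters, the walk-forward evaluation and every name in it are assumptions.

```python
from statistics import mean
from typing import Callable, Dict, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]

# Hypothetical candidate models: each predicts the next value from history.
CANDIDATES: Dict[str, Forecaster] = {
    "last_value": lambda history: history[-1],
    "mean": lambda history: mean(history),
    "moving_average_3": lambda history: mean(history[-3:]),
}


def walk_forward_mae(series: Sequence[float], forecaster: Forecaster, holdout: int) -> float:
    """Mean absolute one-step-ahead error over the last `holdout` points."""
    errors = []
    for i in range(len(series) - holdout, len(series)):
        errors.append(abs(forecaster(series[:i]) - series[i]))
    return mean(errors)


def select_model(
    series: Sequence[float],
    candidates: Dict[str, Forecaster] = CANDIDATES,
    holdout: int = 5,
) -> Tuple[str, float]:
    """Return the name and holdout error of the best-scoring candidate."""
    if holdout < 1 or holdout >= len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    scores = {
        name: walk_forward_mae(series, forecaster, holdout)
        for name, forecaster in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


# Steadily growing toy demand series.
demand = [float(x) for x in range(1, 21)]
best_name, best_mae = select_model(demand)
```

On a steadily increasing series like this one, the most recent value tracks the trend more closely than either average, so the search picks `last_value`.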
Learnings from the journey
- ML development is cross-functional work.
- Heavy investment in automation is the key to scaling ML.
- Design with different user segments in mind.
For Slides : @_akamal8_
Thank you! Danke! Teşekkürler! شکریہ! شكراً!