This document summarizes Ahmed Kamal's presentation on scaling machine learning at Careem. Careem uses machine learning to enhance customer and driver experiences, ensure platform integrity through fraud detection and anomaly detection, and improve demand and supply forecasting. To scale ML usage across Careem's operations, the company built a machine learning platform that automates the end-to-end ML workflow, from problem formulation to model deployment. The platform reduces costs, speeds up development, and has increased the impact of ML more than doubling the data science team would have. Automating tasks like dataset generation, model training, and serving APIs allows more models to be deployed and lets non-experts benefit from ML.
2. Who?
Ahmed Kamal
- Tech Lead @ Machine Learning Platform, Careem
- Computer Engineer by training
I blog @ ahmedkamal.me
Find me on Twitter @_akamal_
Presenting the work of my team and other awesome colleagues @ Careem
6. To simplify and improve the lives of people…
...and build an awesome organisation that inspires
9. Sneak Peek into ML @ Careem
● Enhance Customer and Captain Experience
○ ETAs & Accurate Prices
○ Cancellations & Captain Acceptance
● Platform Integrity
○ Fraud Prevention and Detection
○ Anomaly Detection
● Ensure efficiency of our two-sided marketplace
○ Demand and Supply forecasting
○ Smart Dispatching & Peak
10. Building an AI Ecosystem
Scalable AI Ecosystem:
● ML Infra: Scalable machine learning platforms
● Data Warehouses: Well governed, trustworthy and documented data
● Big Data Capabilities: Easy and reliable access to large volumes of data
● Know How: AI-aware colleagues
11. ML Workflow - Challenges of ML at Scale
[Chart: expectation vs. reality of time spent (%) across the ML workflow stages]
● Formulate the problem
● Data selection and feature engineering
● ML model development
● Model deployment, integration and monitoring
13. ML Development Challenges
● Prepare Data: Large amounts of data to process.
● Train a Model: Training is very costly and takes a long time; reproducibility is hard.
● Transfer to Prod: The development environment mismatches the production environment.
● Deploy: Models need to run in production with:
  - Low Latency & High Throughput
  - Monitoring & Alerting
  - Fault Tolerance & Auto Scaling
19. Post Deployment Challenges
● Which cities are ready for ML?
● Performance monitoring and alerting
● Continuously refresh and update deployed models
● A/B testing between new/old or new/new models
● Too many APIs? Integration headache
● An additional 100 models?
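The A/B-testing challenge above is often solved with deterministic hash-based bucketing, so the same entity always sees the same model variant. A minimal sketch of that idea; the function name and the choice of MD5 are illustrative assumptions, not Careem's implementation:

```python
import hashlib

def assign_variant(entity_id: str, experiment: str, new_share: float = 0.1) -> str:
    """Deterministically bucket an entity (e.g. a booking id) into the
    'new' or 'old' model variant. Hashing (experiment, entity_id) means
    the same id always lands in the same bucket for a given experiment."""
    digest = hashlib.md5(f"{experiment}:{entity_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "new" if bucket < new_share else "old"
```

Because assignment is stateless, any serving instance can compute it without a shared store, which matters once there are hundreds of deployed models.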
21. Tackling the ML Life Cycle
ML lifecycle:
Formulate the problem → Select data and feature engineering → Train and test models → Deploy model to production → Monitor and improve
25. Batch Serving
- Batch predictions generated offline for a target dataset.
- Store predictions in different data-stores.
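The batch-serving pattern above can be sketched as a chunked scoring job that writes predictions out to a store. Everything here is an illustrative assumption, not Careem's pipeline: `run_batch_predictions` is a made-up helper, and a CSV file stands in for a real data warehouse table:

```python
import csv

def run_batch_predictions(model, rows, out_path, chunk_size=1000):
    """Score an offline dataset in chunks and write (uuid, prediction)
    pairs to a CSV acting as the target data-store."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["uuid", "prediction"])
        chunk = []
        for row in rows:
            chunk.append(row)
            if len(chunk) == chunk_size:
                _score_chunk(model, chunk, writer)
                chunk = []
        if chunk:  # flush the final partial chunk
            _score_chunk(model, chunk, writer)

def _score_chunk(model, chunk, writer):
    predictions = model.predict([r["features"] for r in chunk])
    for row, pred in zip(chunk, predictions):
        writer.writerow([row["uuid"], pred])
```

Chunking keeps memory bounded when the target dataset is large, which is the "lots of data to process" challenge from the earlier slide.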
26. Realtime Serving
- Config Based Modular Serving Framework
- Inject Custom Feature Engineering Logic
- Access to external data & A/B testing support
- Prediction and performance logging
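A config-based modular serving framework with injectable feature-engineering logic could look roughly like this. All names (`FEATURE_TRANSFORMS`, `ModelServer`, the `haversine_eta` transform) are assumptions for illustration, not Careem's code:

```python
# Registry of custom feature-engineering functions, selectable by config key.
FEATURE_TRANSFORMS = {}

def feature_transform(name):
    """Decorator that registers a feature-engineering hook under a config key."""
    def wrap(fn):
        FEATURE_TRANSFORMS[name] = fn
        return fn
    return wrap

@feature_transform("eta_deltas")
def eta_features(payload):
    # Injected custom logic: turn raw coordinates into model inputs.
    return [payload["captain_lat"] - payload["booking_lat"],
            payload["captain_long"] - payload["booking_long"]]

class ModelServer:
    """Assembles a serving endpoint from a config: which model to load
    and which registered feature transform to apply."""
    def __init__(self, config, model):
        self.transform = FEATURE_TRANSFORMS[config["feature_transform"]]
        self.model = model

    def predict(self, payload):
        features = self.transform(payload)
        prediction = self.model.predict([features])[0]
        # In a real service, prediction/performance logging would happen here.
        return {"uuid": payload["uuid"], "prediction": prediction}
```

The registry is what makes the framework modular: a data scientist ships only a transform function and a config entry, and the serving shell stays the same across models.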
27. One Click Deployment
- Configuration-based deployment service.
- Latency, integration and API tests.
- Auto-rollout capabilities to smooth out the model update experience.
[Diagram: Configs → Production-Level API. One Click!]
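A deployment config for such a one-click flow might carry the model identity, artifact location, and rollout/test gates, validated before anything ships. The field names and the S3-style path below are hypothetical, chosen only to illustrate the shape:

```python
# Hypothetical one-click deployment config; field names are assumptions.
DEPLOY_CONFIG = {
    "model_name": "eta",
    "model_version": "v1",
    "artifact_path": "s3://models/eta/v1",          # assumed artifact location
    "latency_budget_ms": 50,                        # gate for the latency test
    "rollout": {"strategy": "canary", "initial_traffic": 0.05},
}

REQUIRED_FIELDS = ("model_name", "model_version", "artifact_path")

def validate_config(cfg):
    """Fail fast before deployment if the config is missing required fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in cfg]
    if missing:
        raise ValueError(f"deploy config missing fields: {missing}")
    return True
```

Validating configs up front is what lets "one click" be safe: the deployment service can run the latency and integration tests only after it knows the config is complete.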
28. A Glued Dynamic System
Data Generation → Model Training → Model Serving
29. Now you have an API
URL => eta-service.careem.com/v1/101/eta
Request =>
[{"uuid": "9fdsaf9as9da9sd9", "assignment_time": "2019-03-14 14:09:46", "captain_lat": 30.0039, "captain_long": 31.1422, "booking_lat": 30.0022, "booking_long": 31.1405}]
Response =>
{
  "response": [
    {"uuid": "9fdsaf9as9da9sd9", "prediction": 2.5}
  ],
  "result": "ok"
}
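A client consuming this response shape only needs to check the status flag and map uuids to predictions. A minimal parser following the JSON schema shown on the slide (the function name is illustrative, and the HTTP transport is left out):

```python
import json

def parse_eta_response(body: str) -> dict:
    """Parse the eta-service response body into {uuid: prediction},
    raising if the service did not report result == 'ok'."""
    data = json.loads(body)
    if data.get("result") != "ok":
        raise RuntimeError(f"eta-service returned {data.get('result')!r}")
    return {item["uuid"]: item["prediction"] for item in data["response"]}
```

Because every model deployed through the platform exposes the same envelope, one parser like this serves all of them, which is how the "too many APIs" integration headache is avoided.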
30. Impact
Productivity
- Dataset Generation: Reduced the time needed by a DE from 2 weeks to 12 minutes per use-case.
- One-Click Serving API Deployment, training pipelines and auto-rollout: Reduced the time needed by a DS from days to minutes per model.
- Model Reports (visualizations + metrics over time): Saving hours of DS time; reduced analysis and evaluation time from hours to minutes.
Infra Cost
- Serverless Job Training: Up to 90% saving on training cost.
Overall: More impact than doubling the size of our DS team. We are able to have more models in production, with a much higher compounded impact.
33. From Scaling Infra to Scaling Usage: AI For Everyone
● Auto Machine Learning
● Custom AI Powered Toolings
● Generic Time Series Forecasting
● Supply Forecasting
● Demand Forecasting
● Campaign Management System
● Customer Care
34. AutoML
- Enables experimenting with ML in a short time.
- Lowers the barrier to using ML for lots of people.
Capabilities:
- Auto Feature Selection/Engineering
- Auto Hyperparameter Tuning
- Auto Model Selection
- Auto Model Ensembling
- Auto Rollout
- Rich Feature Store
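The auto model selection and hyperparameter tuning capabilities above boil down to searching a candidate space and keeping the best scorer. A toy sketch of that core loop, under the assumption of a simple exhaustive search (real AutoML systems search far larger spaces with smarter strategies):

```python
def auto_select(candidates, score_fn):
    """Pick the best model from (name, model) pairs by a validation score.

    candidates: iterable of (name, model) pairs, e.g. different model
                families or hyperparameter settings.
    score_fn:   model -> validation score, higher is better.
    """
    best_name, best_model, best_score = None, None, float("-inf")
    for name, model in candidates:
        score = score_fn(model)
        if score > best_score:
            best_name, best_model, best_score = name, model, score
    return best_name, best_model, best_score
```

Wrapping this loop behind a platform API is what lowers the barrier: a non-expert supplies data and a metric, and the system handles the search.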
35. Learnings from the journey
- ML development is cross-functional work.
- Heavy investment in automation is the key to scaling ML.
- Design with different user segments in mind.