Data Science Out of The Box : Case Studies in the Telecommunication by Anand Ranganathan

Anand Ranganathan,
VP of Solutions
Aug 2017
DATA SCIENCE OUT OF THE BOX: Case Studies in the Telecommunications Industry

Telecommunications Service Providers have huge
amounts of data related to customer activity that come to
them in real-time
• Calling, SMS and data usage information
• Purchase and recharge data
• Plan information
• Browsing data (DPI)
• Location information from CDRs, probes or other sources
• Device Data logs
• Call Center logs

But, they face challenges in getting value from this data to
improve customer experience
• Difficult to integrate data about customers from multiple sources into a single view
• Difficult to integrate the insights from the models with other tools
• Difficult to build models
• Difficult to act upon the insights
• Difficult to operationalize the models
• Difficult to gain business value

What telcos would like to do …

Make every interaction with the brand …
… targeted, precise and contextual.
We believe it’s the little things.

Predict which customers will need international roaming in the next day?

Provide personalized, real-time offers to customers whose data pack is predicted to run out in the next 12 hours?

Provide real-time predictive issue resolution to your customers, before they call the call center?

Harnessing data in real time is key to creating a great customer experience …
… Most enterprises, though, have struggled to deploy and get value from analytics, especially real-time analytics.

Does it have to take years to deploy an advanced analytics solution?
Do you really need an army of Data Scientists to create new models every year?
Do you really have to stitch together 10 solutions for a ‘single customer view’ that gets updated once a day?
Why is it still so difficult to create personalized and contextual campaigns?

Our Vision – Easy-to-use real-time analytics in a box
Allow rapid deployment of analytics and reduce time to value through reusable machine learning pipelines that cover common needs in several industries.
Our initial target domains are:
• Telecommunications
• Healthcare
• Banking

Firstly, what is a machine learning pipeline?
Training Pipeline: Training Data → Parsing, Cleaning, Transformations → Feature Extraction → Train Model → Model
Scoring Pipeline: Test Data → Parsing, Cleaning, Transformations → Feature Extraction → Score Model → Predictions
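The training and scoring pipelines share the same parsing and feature-extraction steps and differ only in the last stage. A minimal sketch in plain Python, assuming hypothetical top-up records of the form `subscriber_id,amount` and a deliberately trivial "model" (mean top-up amount per subscriber, with a global-mean fallback). All function names here are illustrative and do not come from the talk.

```python
from collections import defaultdict
from statistics import mean

def parse(raw_lines):
    """Parsing, cleaning, transformations: drop malformed rows, cast types."""
    rows = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # cleaning: skip malformed records
        sub_id, amount = parts
        try:
            rows.append((sub_id, float(amount)))
        except ValueError:
            continue  # cleaning: skip non-numeric amounts
    return rows

def extract_features(rows):
    """Feature extraction: list of top-up amounts per subscriber."""
    feats = defaultdict(list)
    for sub_id, amount in rows:
        feats[sub_id].append(amount)
    return dict(feats)

def train(features):
    """Train model: per-subscriber and global mean amounts (toy model)."""
    all_amounts = [a for amounts in features.values() for a in amounts]
    return {
        "per_subscriber": {s: mean(a) for s, a in features.items()},
        "global": mean(all_amounts),
    }

def score(model, features):
    """Score model: predicted next top-up amount for each subscriber."""
    return {s: model["per_subscriber"].get(s, model["global"]) for s in features}

def training_pipeline(raw):
    return train(extract_features(parse(raw)))

def scoring_pipeline(model, raw):
    return score(model, extract_features(parse(raw)))

# Usage: two malformed rows are dropped during cleaning.
model = training_pipeline(["a,10", "a,20", "b,5", "bad row", "c,x"])
predictions = scoring_pipeline(model, ["a,12", "z,3"])
```

Subscriber `z` was never seen in training, so the sketch falls back to the global mean for it.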

We have 40+ readily deployable ML Pipelines covering
common telco marketing requirements
Machine Learning Real-Time and Offline Predictive Models
Wallet, Purchase & Journey Models
• Predict subscriber’s next top-up amount
• Predict when subscriber might top-up
• Predict if subscriber will buy or renew package
• Predict Package expiry
• Predict if package will expire with high balance
• Prepaid to Postpaid Conversion Propensity
• Churn Propensity
• Next Best Action Model
• Customer Lifetime Value Prediction
Spatio-Temporal Models
• Predict home location, work location, weekend travel locations
• Predict where subscriber will be at given hour & day, e.g. on Fridays at 7 PM
• Determine frequently visited locations (malls, churches, office buildings etc.)
• Mobility Profiling, e.g. frequent traveler, stay-at-home, regular commuter
• Home / Work Location Based Segmentation e.g. Stay-at-home housewife, Traveling Salesman etc.
Anomaly Detection
• Detect anomalies in calling pattern within the network / Cell Site / Location / Subscriber
• Anomaly Detection in SMS/data usage at Network / Cell Site / Location or Subscriber level
• Anomaly Detection in dropped calls / dropped data sessions at Network / Cell Site / Location or Subscriber level
Device Models
• Detect Call Drops & Poor Call Quality from device logs
• Detect Poorly performing device battery
• Detect Anomalous Apps based on GPS, wake-lock etc.
• Determine interests based on App Usage
Communication
• Determine relative preference of SMS, Voice or Data
• Predict best time of day, day of week or location to reach subscriber with offers
• Determine preferred channels of communication
Customer Experience
• Customer Satisfaction Model, based on dropped calls, failed data sessions, poor call quality and device issues
• Predict if customer will call contact center
• Predict why customer may call contact center
Clickstream and Interests
• URL Categorization into rich topic hierarchy
• Long term and short term Interest derivation based on browsing data of communication
• Interest prediction based on location & device type
Social Network
• Determine influencers and social hubs
• Discover close contacts
• Identify common interest communities within the subscriber base
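To make one of the spatio-temporal items concrete, here is one common heuristic for guessing home and work locations from timestamped cell sightings: the most frequent cell at night versus the most frequent cell during weekday working hours. This is only an illustrative sketch, not the model described in the talk. The hour boundaries (22:00 to 06:00 for night, 09:00 to 17:00 on weekdays for work) and the name `infer_home_work` are assumptions.

```python
from collections import Counter
from datetime import datetime

def infer_home_work(events):
    """events: iterable of (datetime, cell_id) sightings for one subscriber."""
    night, office = Counter(), Counter()
    for ts, cell in events:
        if ts.hour >= 22 or ts.hour < 6:
            night[cell] += 1          # assumed "at home" hours
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            office[cell] += 1         # assumed "at work" hours (Mon-Fri)
    home = night.most_common(1)[0][0] if night else None
    work = office.most_common(1)[0][0] if office else None
    return home, work

# Usage with made-up sightings (7 Aug 2017 was a Monday, 12 Aug a Saturday).
events = [
    (datetime(2017, 8, 7, 23, 0), "A"),
    (datetime(2017, 8, 8, 2, 0), "A"),
    (datetime(2017, 8, 8, 10, 0), "B"),
    (datetime(2017, 8, 8, 14, 0), "B"),
    (datetime(2017, 8, 8, 15, 0), "C"),
    (datetime(2017, 8, 12, 11, 0), "D"),  # weekend daytime, ignored
]
home, work = infer_home_work(events)
```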

… used to create dynamic profiles of customers, locations and business or retail outlets
Customers
• Historical: typical home / work locations, recharge patterns, calling network
• Real time: websites visited in last hour, number of dropped calls in past day, recharge prediction in next 6 hours
Locations
• Historical: typical population at location, spend patterns at location, typical mobility profiles
• Real time: anomalous network loads, number of queries for weather, current population
Businesses or retail outlets
• Historical: historical population trends, browsing behaviors, communication patterns
• Real time: number of customers near business now, number of calls to business in last 1 hour, number of visits to competing business

Key principle behind data science out of the box
Build ML pipeline once & Operationalize repeatedly
Building Pipelines – The ART
• Done once on some static representative datasets
• Explore different possible transformations of the data
• Explore different kinds of features
• Explore different models
• Finalize on a certain pipeline for a given problem
Operationalizing The Pipelines – The ENGINEERING
• Repeated for every new deployment
• Create the transformations & features on historical data
• Train initial version of the model & generate initial scores
• Create the transformations on streaming data and update features
• Update scores and models “frequently” based on streaming data

Machine Learning is not a one-off process taking place in a static world
All model-building & scoring activities happen at a certain point in time
Timeline (TIME axis, split at NOW):
• Before NOW: historical data that has been collected so far. Build initial versions of the models, generate scores and create initial profiles based on this data.
• After NOW: streaming data that will come in the future. Update scores in the profiles and refresh models based on this data.

Typical Enterprise Architecture
Separate processing pathways for real-time analytics and long-term historical analytics
Telco Data Sources:
• CDRs
• DPI
• Location
• SMSC
• Billing
Diagram components: ETL, Real-time Streaming Data, Historical Data

Problems with basic pipeline in streaming settings
Training Pipeline: Training Data → Parsing, Cleaning, Transformations → Feature Extraction → Train Model → Model
Scoring Pipeline: Test Data → Parsing, Cleaning, Transformations → Feature Extraction → Score Model → Predictions
• Doesn’t show feature creation & updates on combination of historical & streaming data
• Doesn’t show scoring based on most recent feature values
• Doesn’t show model refresh
Patterns for Machine Learning Pipelines
Patterns are organized along two axes: MODEL BUILDING (Online, Frequent/Periodic, Batch) and SCORING (Online, Frequent/Periodic).
• Update models and predictions on every event. E.g. time-series predictions and anomaly detection for fraud detection.
• Refresh models periodically and score on every event. E.g. top-up prediction with models updated every week.
• Build model one time or infrequently and score on every event. E.g. real-time churn prediction with static model.
• Update models and predictions periodically. E.g. user interest models, hangout predictions and recommendation models.
• Build models and predictions one time or very infrequently. E.g. offline churn prediction scores.
Typical Enterprise Architecture with Unscrambl Brain
Separate processing pathways for real-time analytics and long-term historical analytics
Telco Data Sources:
• CDRs
• DPI
• Location
• SMSC
• Billing
Diagram components: ETL, Real-time Streaming Data, Historical Data
Unscrambl Brain:
• Stream Analytics
• Profile Store
• Aggregate Store

Brain is powered by 3 specialized components
• LevelDB-based time-series aggregate store: recharges, number of dropped calls, number of international calls, … in the past 10 minutes, hour, day, week, month or year
• Redis-based profile store: last known location of customers, predicted home and work locations, …
• Python-based ML pipeline framework: Call Center Call Prediction Model, Preferred Channel Prediction Model, Social Network Models
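The aggregate store answers questions such as "number of dropped calls in the past 10 minutes". A minimal sketch of that idea, assuming fixed one-minute buckets and an in-memory dict standing in for LevelDB. `AggregateStore`, `add` and `total` are hypothetical names for illustration, not Brain's actual API.

```python
from collections import defaultdict

class AggregateStore:
    """In-memory stand-in for a time-series aggregate store (hypothetical API)."""

    BUCKET_SECONDS = 60  # assumption: one-minute buckets

    def __init__(self):
        # (entity, metric) -> {bucket_start_timestamp: summed value}
        self._buckets = defaultdict(lambda: defaultdict(float))

    def add(self, entity, metric, ts, value=1.0):
        """Record an event at integer epoch second `ts`."""
        bucket = ts - ts % self.BUCKET_SECONDS
        self._buckets[(entity, metric)][bucket] += value

    def total(self, entity, metric, now, window_seconds):
        """Sum over buckets starting in [now - window, now].

        Precision is limited to the bucket size, so the window edge
        is approximate by up to one bucket.
        """
        start = now - window_seconds
        series = self._buckets.get((entity, metric), {})
        return sum(v for b, v in series.items() if start <= b <= now)

# Usage: three dropped calls for one subscriber, queried over two windows.
store = AggregateStore()
for ts in (100, 3100, 3590):
    store.add("sub-1", "dropped_calls", ts)

last_10_min = store.total("sub-1", "dropped_calls", now=3600, window_seconds=600)
last_hour = store.total("sub-1", "dropped_calls", now=3600, window_seconds=3600)
```

Keeping pre-summed buckets means a window query touches a bounded number of buckets rather than every raw event, which is the usual reason for this layout.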

Online Learning, Online Scoring
One-Time Initialization of features from Historical Data: Historical Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features
Online Model Building & Scoring on Streaming Data: Streaming Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features → Get Features for one entity → Train & Score Model → Write Predictions
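In this pattern the model is both scored and updated on every event. A minimal sketch, using a per-entity running mean and variance (Welford's algorithm) as a stand-in anomaly model. The choice of model, the plain dicts standing in for the feature and profile stores, and all names are assumptions for illustration.

```python
import math

class OnlineAnomalyModel:
    """Running mean/variance for one entity, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def score(self, x):
        """z-score of x against values seen so far (0.0 if too little data)."""
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0 else (x - self.mean) / std

    def update(self, x):
        """Welford update: incorporate x without storing past values."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

models = {}       # entity -> model state (stand-in for maintained features)
predictions = {}  # entity -> latest anomaly score (stand-in for a profile store)

def initialize(entity, historical_values):
    """One-time initialization from historical data."""
    model = models.setdefault(entity, OnlineAnomalyModel())
    for v in historical_values:
        model.update(v)

def on_event(entity, value):
    """Streaming path: score with the current model, then learn from the event."""
    model = models.setdefault(entity, OnlineAnomalyModel())
    predictions[entity] = model.score(value)
    model.update(value)
    return predictions[entity]

# Usage: made-up dropped-call counts per interval for one cell site.
initialize("cell-42", [10, 11, 9, 10, 11, 9])
spike = on_event("cell-42", 30)
```

Scoring before updating keeps the current event from masking its own anomaly.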

Periodic Learning, Online Scoring
Periodic Model Re-Training: Historical Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features → Get Features for all entities → Train Model → Model
Online Update of Features and Scoring on Streaming Data: Streaming Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features → Get Features for one entity → Score Model → Write Predictions
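Here features are updated and one entity is scored on every event, while the model itself is only refreshed on a schedule using features for all entities. A minimal sketch, assuming a toy model (the population-average dropped-call count) and plain dicts as the feature and prediction stores. The names and the scoring formula are illustrative assumptions.

```python
from statistics import mean

features = {}              # entity -> running dropped-call count (maintained features)
predictions = {}           # entity -> latest score
model = {"baseline": 1.0}  # placeholder until the first training run

def train_model():
    """Periodic re-training: reads features for ALL entities."""
    model["baseline"] = max(mean(list(features.values())), 1e-9)

def on_event(entity, dropped_calls):
    """Online path: update features, then score ONE entity with the current model."""
    features[entity] = features.get(entity, 0) + dropped_calls
    predictions[entity] = features[entity] / model["baseline"]
    return predictions[entity]

# Usage: initialize from historical data, train, stream, retrain, stream.
features.update({"a": 2, "b": 4})
train_model()               # baseline = 3
first = on_event("a", 4)    # a has 6 dropped calls, scored against baseline 3
train_model()               # periodic refresh: baseline = 5
second = on_event("b", 1)   # b has 5 dropped calls, scored against baseline 5
```

In a real deployment `train_model` would be triggered by a scheduler (for example weekly) rather than called inline.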

Periodic Learning & Periodic Scoring
Periodic Model Re-Training & Re-Scoring of all Entities: Historical Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features → Get Features for all entities → Train & Score Model → Model, Write Predictions
Online Update of Features from Streaming Data: Streaming Data → Parsing, Cleaning, Transformations → Feature Extraction → Maintain Features
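In this pattern the streaming path only maintains features, and a periodic batch job retrains the model and rescores every entity. A minimal sketch for a toy interest model, assuming features are per-subscriber URL-category counts and the "model" is the global share of each category. The lift-based scoring and all names are illustrative assumptions.

```python
from collections import Counter, defaultdict

features = defaultdict(Counter)  # subscriber -> Counter of URL-category visits
predictions = {}                 # subscriber -> top interest, rewritten by each batch run

def on_event(subscriber, category):
    """Online path: only maintain features; no scoring happens here."""
    features[subscriber][category] += 1

def batch_train_and_score():
    """Periodic job: fit on all entities, then re-score all entities."""
    totals = Counter()
    for counts in features.values():
        totals.update(counts)
    grand_total = sum(totals.values())
    model = {c: n / grand_total for c, n in totals.items()}  # global category share
    for sub, counts in features.items():
        visits = sum(counts.values())
        # lift: subscriber's share of a category relative to the global share
        lift = {c: (n / visits) / model[c] for c, n in counts.items()}
        predictions[sub] = max(sorted(lift), key=lift.get)
    return model

# Usage: stream some made-up browsing events, then run one batch cycle.
for sub, cat in [("a", "news"), ("a", "news"), ("a", "news"), ("a", "sports"),
                 ("b", "news"), ("b", "news"), ("b", "news"), ("b", "news")]:
    on_event(sub, cat)

before_batch = dict(predictions)  # still empty: nothing is scored online
model = batch_train_and_score()
```

Subscriber `a` browses news most often, but sports is the stronger signal once the global popularity of news is factored out.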

Case Study : Telco in SE Asia
60+ million subscribers
7+ million opt-in subscribers
10+ billion CDRs per day
100+ billion URL records per day
15 Machine Learning pipelines rapidly deployed on Spark and Brain to derive a
variety of profile attributes about subscribers
Able to update models and profiles as frequently as needed
