Predictive churn h20_dsx

Agenda
• Who Am I?
• IBM Data Science Experience
• DSX Features Tour
• Predictive Churn
• What is PC?
• The Lift as the main performance measure
• Telco Dataset Example
• H2O Sparkling Water
• What is H2O?
• What is Apache Spark?
• What is H2O Sparkling Water?
• H2O REST API
• Building a Predictive Churn Model
• Modelling with Spark Scala
• Modelling with H2O Sparkling Water using R
• Modelling with H2O Flow
• Deploying an H2O Model
• Using Play Scala to build a REST endpoint
• Deploying as a Docker Container in Google Cloud
• Deployment with STEAM

Who Am I?
• Ndjido Ardo BAR: Data Scientist @ Davidson Consulting UAE
• Background: Research in Mathematics
• Now: Working @DU Telecom as a Consultant
• Past:
• Worked @AXA (Paris)
• Worked @BearingPoint Hypercube
• Co-Founder of a StartUp (MLouma)
• Worked @Pasteur Institute: Involved in BioStatistical Research
@ndjido

IBM Data Science Experience
Better organise your Data Science projects
Collaborate with your team members
Learn from the community
https://datascience.ibm.com/

Predictive Churn
What is it?
Predictive Churn is a set of methods for forecasting churn,
i.e. identifying the customers most likely to stop using a
given service. It is used mainly by marketing practitioners
for customer retention.
There are two main approaches:
1. Classification-based approach
2. Survival-analysis-based approach

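Predictive Churn
Performance Measures
Confusion matrix (model prediction vs. actual outcome):
• TP: predicted churner, actual churner
• FP: predicted churner, actual non-churner
• FN: predicted non-churner, actual churner
• TN: predicted non-churner, actual non-churner
RECALL = TP / (TP + FN)
PRECISION = TP / (TP + FP)
LIFT = Precision / overall churn rate
Lift: the ratio between the precision among the targeted customers
and the support (the share of churners in the whole population). For
instance, a lift of N on the top 20% of the scored population means
that targeting those customers reaches N times more churners than
picking 20% of the population at random.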
ROC Curve
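The measures on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the talk's own code: the ten-customer toy data, the 0.5 decision threshold, and the helper names `confusion_counts` and `lift_at` are all hypothetical, and lift is taken here as precision in the targeted group divided by the overall churn rate.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FP, FN, TN) for binary labels where 1 = churn, 0 = no churn."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def lift_at(actual, scores, fraction):
    """Lift obtained by targeting the top `fraction` of customers ranked by score."""
    n_targeted = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    targeted = [label for _, label in ranked[:n_targeted]]
    precision = sum(targeted) / n_targeted   # churners among targeted customers
    base_rate = sum(actual) / len(actual)    # churners in the whole population
    return precision / base_rate


# Hypothetical toy data: 10 customers, 3 of whom actually churned.
actual = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.8, 0.1, 0.3, 0.4, 0.35, 0.05, 0.15, 0.25]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, fn, tn = confusion_counts(actual, predicted)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
lift_top_20 = lift_at(actual, scores, 0.20)
```

On this toy data the two highest-scored customers are both churners, so the precision in the top 20% is 1.0 against a base churn rate of 0.3, which is why the lift comes out above 3.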

Predictive Churn
Our Dataset Example
Telco Churn Data Description: 21 Variables (Customer, Plan, Behaviour, other)

H2O Sparkling Water
What is H2O?
H2O is an open-source, fast, scalable Machine Learning platform
with Deep Learning capabilities. It’s production-ready.
Key features (source: H2O.ai):
• Open Source: 100% open source
• Flexible Interface: H2O Flow
• Cloud Integration
• Big Data Ecosystem
• GPU Enablement
• Rapid Model Deployment: highly portable models deployed in Java
(POJO) and Model Object Optimized (MOJO); automated and streamlined
scoring service deployment with REST API
• Scalability and Performance: distributed in-memory computing
platform
• Smart and Fast Algorithms: distributed algorithms; fine-grain
MapReduce

H2O Sparkling Water
What is H2O?
High Level Architecture (source: H2O.ai)
• Data sources: HDFS, S3, NFS, Local, SQL
• Load Data: distributed, in-memory, loss-less compression
• H2O Compute Engine: exploratory & descriptive analysis; feature
engineering & selection; supervised & unsupervised modeling; model
evaluation & selection; predict
• Data & Model Storage
• Production Scoring Environment: model export and data prep export,
each as a Plain Old Java Object
• Your Imagination

H2O Sparkling Water
What is H2O?
Algorithms Overview (source: H2O.ai)
Supervised Learning
Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma,
Poisson and Tweedie
• Naïve Bayes
Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision
trees with increasingly refined approximations
Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks
starting with an input layer followed by multiple layers of
nonlinear transformations
Unsupervised Learning
Clustering
• K-means: Partitions observations into k clusters/groups of the
same spatial size. Automatically detects optimal k
Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated
variables to uncorrelated components
• Generalized Low Rank Models: Extend the idea of PCA to handle
arbitrary data consisting of numerical, Boolean, categorical,
and missing data
Anomaly Detection
• Autoencoders: Find outliers using nonlinear dimensionality
reduction with deep learning

H2O Sparkling Water
What is Spark?
“Apache Spark is a fast and general engine for large-scale data processing.”

H2O Sparkling Water
What is Sparkling Water?
H2O Sparkling Water is a transparent integration of H2O with Apache Spark:
• H2O data structures and algorithms can be used transparently from Spark
• Spark is extended with more sophisticated Machine Learning algorithms
What the combination (Spark + H2O) brings:
• Powerful data preparation features
• NLP algorithms
• Scikit-Learn-like ML pipelines
• Advanced ML algorithms
• Powerful data compression
• Graphical UI (Flow)
• Model export as POJO

H2O Sparkling Water
How does it work with Spark?
(source: H2o.ai)

H2O REST API
Working with data (1/3)
Reading Data into H2O with Python/R/Flow
STEP 1
H2O Import function (source: H2O.ai)
h2o_df = h2o.importFile("path/to/dataset.csv")

H2O REST API
Reading Data from HDFS into H2O with Python/R/Flow
STEP 2 (sub-steps 2.1 to 2.4 in the diagram; source: H2O.ai)
• The H2O import function is called from the client
• The function call sends an HTTP REST API request to H2O; the
request carries the HDFS path of data.csv
• The H2O Cluster initiates a distributed ingest
• The H2O nodes request the data from HDFS

H2O REST API
Reading Data from HDFS into H2O with Python/R/Flow
STEP 3 (sub-steps 3.1 to 3.4 in the diagram; source: H2O.ai)
• HDFS provides the data (data.csv) to the H2O nodes
• The data is held as a distributed H2O Frame in the DKV of the
H2O Cluster
• The cluster returns a pointer to the data in the REST API JSON
response
• The console holds an H2O Frame object made of: Cluster IP,
Cluster Port, Pointer to Data
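The request/response flow of steps 2 and 3 can be illustrated from the client side. The following is a rough sketch under stated assumptions, not the H2O client's actual implementation: the `/3/ImportFiles` and `/3/Frames/...` endpoint paths are assumptions about the H2O-3 REST API that should be checked against its documentation, and the host, port, HDFS path, frame key, and the names `FrameHandle` and `import_request_url` are hypothetical. Nothing is sent over the network; the code only builds the URLs.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class FrameHandle:
    """Client-side pointer to a frame that lives in the cluster's memory.

    Mirrors the three fields on the slide: cluster IP, cluster port,
    and a pointer (key) to the data. The data itself never leaves the cluster.
    """
    cluster_ip: str
    cluster_port: int
    frame_key: str

    def url(self) -> str:
        # Assumed endpoint layout for looking up a frame by its key.
        return f"http://{self.cluster_ip}:{self.cluster_port}/3/Frames/{self.frame_key}"


def import_request_url(cluster_ip: str, cluster_port: int, data_path: str) -> str:
    """Build the import request that carries the data path to the cluster."""
    query = urlencode({"path": data_path})
    # Assumed endpoint name; only the path travels, not the file contents.
    return f"http://{cluster_ip}:{cluster_port}/3/ImportFiles?{query}"


# Hypothetical cluster address, HDFS location, and frame key.
request_url = import_request_url("127.0.0.1", 54321, "hdfs://namenode/data.csv")
handle = FrameHandle("127.0.0.1", 54321, "data.hex")
```

The point of the design is that the client object stays tiny: it holds an address and a key, while ingestion and storage happen on the cluster.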

Building A Predictive Churn Model
Hands-on materials available at: https://github.com/ndjido/Predictive-Churn-Modeling-with-H2O/

Building A Predictive Churn Model
Modelling Pipeline: 3 Approaches (Hands-on)
1. Spark with Scala
2. H2O Sparkling Water with R
3. H2O Flow only

Deploying A Predictive Churn Model

Deploying A Predictive Churn Model
Deployment Pipeline
• Model Building
• Model POJO (export POJO)
• REST API
• Containerised App
local

Deployment Pipeline
POJO Integration in your Play App
H2O GenModel added to your Play App
DEMO TIME

Deployment with H2O Steam
“The Steam AI engine is an end-to-end platform that streamlines the entire process of building
and deploying smart applications. Now data scientists and developers can launch turnkey
compute environments for collaboratively training and deploying predictive models and integrate
those models into real-time smart applications”
Demo

Thank You!
Questions?
@ndjido
