H2O World - Machine Learning at Comcast - Andrew Leamon & Chushi Ren
1. Machine Learning at Comcast
November 10th, 2015
Andrew Leamon – Director
Chushi Ren – Software Engineer / Data Scientist
Engineering Analysis
2. About Comcast
Comcast brings together the best in media and technology. We drive innovation to create the world’s best entertainment and online experiences.
• High Speed Internet
• Video
• IP Telephony
• Home Security / Automation
• Universal Parks
• Media Properties
4. Importance of Live TV
• Average US household watches 3-5 hours of TV per day (Nielsen)
• 3x more than Netflix (BTIG Research, 4/2015)
• 4x more than video on smartphones, tablets, and computers
• 50% of leisure time is spent watching TV!
[Chart: Live TV compared with Netflix and Online Video]
6. Trending on X1 – Predict Popularity 24 Hours in Advance
• Ensemble of Gradient Boosted Decision Trees
• Input: statistics of program ratings, program metadata, channel info, …
[Figure: Number of Signals (0 to 77), with each New Signal marked]
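The slide names the model family but shows no code. The following is a minimal sketch of gradient boosting for regression, assuming squared loss and depth-1 trees (stumps), written in plain Python. It is not the X1 model. The feature rows and popularity targets are hypothetical stand-ins for the rating statistics, program metadata, and channel info listed above. A real system would use a library implementation and, per the slide, an ensemble of several boosted models.

```python
# Toy gradient boosting with regression stumps (squared loss), pure Python.
# Hypothetical: each row is a numeric feature vector for a program and
# y is its (hypothetical) popularity 24 hours later.
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X: List[List[float]], residuals: List[float]) -> Stump:
    """Find the single split that minimizes squared error on the residuals."""
    best_sse, best = None, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best_sse is None or sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    if best is None:  # no valid split: predict the mean residual everywhere
        m = sum(residuals) / len(residuals)
        return (0, float("inf"), m, m)
    return best


def stump_predict(stump: Stump, row: List[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbm(X, y, n_trees=50, learning_rate=0.1):
    """Each stump fits the residuals y - F(x), the negative gradient of squared loss."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, lr, stumps = model
    return base + lr * sum(stump_predict(s, row) for s in stumps)
```

The learning rate (shrinkage) means each stump corrects only part of the remaining error. This is why many small trees are combined rather than one large one.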
7. Real-time Recommendations
• Program recommendations are updated every 20 sec (Spark Streaming)
• For more details and code samples, see our talk at Spark Summit East, March 2015: https://spark-summit.org/east-2015/
Architecture (diagram):
• Batch: User Clustering with KMeans, using User History from HDFS
• Real-time: TopK Trending Programs per Cluster, using Live Tune Activity from Kafka
• Real-time Program recommendations per user
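As a rough illustration of the real-time step, the sketch below counts tune events per cluster, keeps the top K programs, and recommends to each user the trending list of their cluster. It assumes the batch KMeans job has already produced cluster assignments and that they are available as a plain dictionary. The event tuples, ids, and function names are hypothetical. The code processes one micro-batch in memory and does not use the Spark Streaming API.

```python
# Hypothetical sketch: TopK trending programs per user cluster for one micro-batch.
from collections import Counter, defaultdict


def top_k_per_cluster(tune_events, user_cluster, k):
    """tune_events: iterable of (user_id, program_id) pairs from the latest batch.
    user_cluster: user_id -> cluster id, assumed to come from the batch KMeans job."""
    counts = defaultdict(Counter)
    for user, program in tune_events:
        cluster = user_cluster.get(user)
        if cluster is None:
            continue  # user not yet assigned to a cluster by the batch job
        counts[cluster][program] += 1
    return {c: [p for p, _ in cnt.most_common(k)] for c, cnt in counts.items()}


def recommend(user, user_cluster, trending):
    """Recommend the current trending list of the user's cluster (empty if unknown)."""
    return trending.get(user_cluster.get(user), [])
```

Splitting the work this way keeps the expensive clustering in the batch layer. The streaming layer only has to count events, which is cheap enough to recompute every few seconds.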
9. Problem: Avoidable Truck Rolls (ATR)
• Customer calls to report an issue with their service
• Customer service agent goes through ITG to debug the problem with the customer by phone
• When the agent cannot resolve the problem by phone, a truck roll is scheduled
• Examples of avoidable truck rolls:
  • Reset modem
  • Change remote battery
  • Entitlement issue
• Goal
  • Build a predictive model to prevent ATRs
10. ATR Machine Learning Pipeline
Data source → Feature extraction → Feature selection → Model training (Training data) → Model validation (Test data) → Classifier
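To make the stages concrete, here is a minimal end-to-end sketch of a pipeline with the same shape: extract features, split into training and test data, select features, train, and validate. The raw record fields, the correlation-based selector, and the nearest-centroid classifier are all hypothetical stand-ins chosen for brevity. They are not the ATR model or its features.

```python
# Hypothetical sketch of the pipeline stages, pure Python.
# The nearest-centroid classifier stands in for a real model.
import random
from statistics import mean


def extract_features(record):
    """Feature extraction: turn a raw (hypothetical) call record into numeric features."""
    return {
        "modem_resets_30d": float(record["modem_resets_30d"]),
        "signal_errors": float(record["signal_errors"]),
        "call_minutes": float(record["call_minutes"]),
    }


def correlation(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def select_features(rows, labels, n):
    """Feature selection: keep the n features most correlated (in absolute value) with the label."""
    names = list(rows[0])
    ranked = sorted(names, key=lambda f: abs(correlation([r[f] for r in rows], labels)), reverse=True)
    return ranked[:n]


def train(rows, labels, feats):
    """Model training: one centroid per class over the selected features."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        model[cls] = [mean(r[f] for r in members) for f in feats]
    return model


def classify(model, row, feats):
    x = [row[f] for f in feats]
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def run_pipeline(records, labels, n_features=2, test_fraction=0.3, seed=0):
    rows = [extract_features(r) for r in records]
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train_rows = [rows[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    feats = select_features(train_rows, train_y, n_features)  # selected on training data only
    model = train(train_rows, train_y, feats)
    # Model validation: accuracy on the held-out test data.
    accuracy = sum(classify(model, rows[i], feats) == labels[i] for i in test_idx) / len(test_idx)
    return model, feats, accuracy
```

Feature selection runs only on the training split. Choosing features with the test labels would leak information and inflate the validation score.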
11. ATR Challenges
• Skewed data: only a very small portion of truck rolls are avoidable
  • Use the balance classes option in H2O to upsample the minority class
  • Subsemble
• Information leakage: some features are statistics computed over the data, including the current row, which leaks information into the model
  • Hold the current row out when computing the statistic
  • Add random noise
• Operationalize the model
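The sketch below is one possible pure-Python reading of two of these mitigations. The first is random oversampling of the minority class, similar in spirit to what H2O's `balance_classes` training option does. The second is a leave-one-out mean-label statistic with optional Gaussian noise, a common way to "hold the current row out" and "add random noise". The slide does not say which statistic leaked, so treating it as a per-category mean of the label is an assumption. The helper names are hypothetical, and Subsemble is not shown.

```python
# Hypothetical helpers illustrating the class-balancing and leakage mitigations.
import random
from collections import defaultdict


def upsample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def loo_target_mean(categories, labels, noise_std=0.0, seed=0):
    """Encode each row by the mean label of its category, holding the current row out,
    optionally adding Gaussian noise so the model cannot memorize the encoding."""
    rng = random.Random(seed)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, labels):
        sums[c] += y
        counts[c] += 1
    total, n_all = sum(labels), len(labels)
    encoded = []
    for c, y in zip(categories, labels):
        others = counts[c] - 1
        if others > 0:
            value = (sums[c] - y) / others  # hold the current row out
        else:
            # Only row in its category: fall back to the mean label of all other rows.
            value = (total - y) / (n_all - 1) if n_all > 1 else 0.0
        if noise_std > 0:
            value += rng.gauss(0.0, noise_std)
        encoded.append(value)
    return encoded
```

Oversampling should be applied to the training split only. Duplicated minority rows in the test data would make validation look better than it is.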
13. Problem: Customer Experience Metric (CXE Metric)
In a CMTS (Cable Modem Termination System), ports are logically bonded to form a “Service Group” (SG).
Is SG utilization equivalent to customer experience?
14. Why Do We Need a CXE Metric?
A CXE metric helps us:
• Understand customers’ needs
• Prioritize hardware deployment
15. Customer Experience Metric
• Select features correlated with customer experience across different datasets
• Join them and perform cleaning and aggregation
• Cluster to form customer experience groups
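A minimal sketch of these three steps, under stated assumptions, is shown below. Two hypothetical per-record datasets (utilization readings and customer call counts, with made-up field names) are cleaned by dropping records with missing values. Each is aggregated to one row per service group by averaging, then the two are joined on the SG id. Plain k-means (Lloyd's algorithm, initialized with the first k points) then forms the groups. The slide does not say which clustering method or features were used.

```python
# Hypothetical sketch: clean, aggregate, join per service group (SG), then cluster with k-means.
from collections import defaultdict
from statistics import mean


def clean_and_aggregate(records, key="sg"):
    """Drop records with missing values, then average every feature per key."""
    groups = defaultdict(list)
    for rec in records:
        if any(v is None for v in rec.values()):
            continue
        groups[rec[key]].append(rec)
    return {k: {f: mean(r[f] for r in rs) for f in rs[0] if f != key} for k, rs in groups.items()}


def inner_join(*tables):
    """Keep only keys present in every table and merge their feature dicts."""
    keys = set.intersection(*(set(t) for t in tables))
    return {k: {f: v for t in tables for f, v in t[k].items()} for k in sorted(keys)}


def kmeans(points, k, iters=20):
    """Lloyd's algorithm with the first k points as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return assign, centroids
```

The features are not scaled here. With real data, features on very different scales (for example a 0 to 1 ratio and a call count) would normally be standardized before clustering so that no single feature dominates the distance.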
17. The Evolution of Resiliency – Scale It!
System Errors (user experiences an issue) → Customer Contact (effort required) → Agent Manually Fixes (effort required)
System Errors (user experiences an issue) → Machine Learning (intelligent scoring for solution) → Automated / Suggested Fix (issue resolved with lower effort)
• We can reduce effort for customers and for Customer Care by building intelligent systems.
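One way to read "intelligent scoring for solution" feeding an "automated / suggested fix" is sketched below. A model scores each candidate fix. The best fix is applied automatically when its score clears a threshold, and otherwise the ranked list is suggested to an agent. The fix names (taken from the ATR examples), the feature dictionary, the 0.9 threshold, and the toy scoring function are all hypothetical. The toy scorer stands in for a trained model.

```python
# Hypothetical sketch: route a predicted fix to automation or to an agent based on model confidence.
from typing import Any, Callable, Dict, List, Tuple


def route_fix(
    features: Dict[str, Any],
    candidate_fixes: List[str],
    score: Callable[[Dict[str, Any], str], float],
    auto_threshold: float = 0.9,
) -> Tuple[str, List[str]]:
    """Rank fixes by score; automate the top fix only if the model is confident enough."""
    ranked = sorted(candidate_fixes, key=lambda fix: score(features, fix), reverse=True)
    if score(features, ranked[0]) >= auto_threshold:
        return "automated", ranked[:1]
    return "suggested", ranked


def toy_score(features, fix):
    """Stand-in for a trained model: estimated probability that `fix` resolves the issue."""
    if fix == "reset_modem":
        return 0.95 if features.get("modem_offline") else 0.2
    return 0.5
```

The threshold trades off customer effort against the risk of applying the wrong fix automatically. Low-confidence cases still reach an agent, but with a ranked starting point.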
18. Self Healing & Sharing Context
20. Real-time Data + Operationalized Models -> Better Products
“However valuable these PhDs are, the organizations that have been lucky enough to secure these resources are realizing the limitations in human-powered data science: it’s simply not a scalable solution.”
“The commonality across all of these new technologies is that they offer something additional humans cannot provide: the power of scale. Organizations that do not have a strategic initiative to regularly and organically engage with its customers will be at a serious disadvantage. Soon, AI-driven engagement models that interpret data and intuitively interact with clients will be the norm.”
Harvard Business Review: “Data Scientists Don’t Scale”: https://hbr.org/2015/05/data-scientists-dont-scale
21. Challenges in Operation: Getting Data in Real-time
• Various sources of data in different formats
• Enable real-time queries on customer event data
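The two points above can be illustrated with a small sketch. Events from two differently formatted sources (JSON lines and CSV) are normalized into one schema and then indexed by customer so that recent events can be looked up quickly. The source field names (`customerId`, `cust`, `epoch`, and so on) and the in-memory index are hypothetical. The slide does not name the storage or streaming technology used.

```python
# Hypothetical sketch: normalize events from two formats into one schema and index them by customer.
import csv
import io
import json
from collections import defaultdict


def from_json_line(line):
    obj = json.loads(line)
    return {"customer_id": str(obj["customerId"]), "ts": int(obj["timestamp"]), "event": obj["type"]}


def from_csv_text(text):
    return [
        {"customer_id": row["cust"], "ts": int(row["epoch"]), "event": row["event"]}
        for row in csv.DictReader(io.StringIO(text))
    ]


class EventIndex:
    """In-memory stand-in for a real-time event store keyed by customer."""

    def __init__(self):
        self._by_customer = defaultdict(list)

    def add(self, event):
        self._by_customer[event["customer_id"]].append(event)

    def since(self, customer_id, since_ts):
        """Return the customer's events at or after since_ts, oldest first."""
        events = self._by_customer.get(customer_id, [])
        return sorted((e for e in events if e["ts"] >= since_ts), key=lambda e: e["ts"])
```

Normalizing at ingestion time means every downstream query and model sees one schema, regardless of which system produced the event.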
22. Challenges in Operation: Computation in Real-time
• Challenges
  • Handle the heavy computation involved in transforming raw data
  • Respond quickly to a large number of prediction requests
  • Update the model with the latest data
• Potential Solution
  • Spark + Sparkling Water
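As a small illustration of serving predictions while keeping a model current, the sketch below uses a logistic-regression model that answers batches of prediction requests. After each new micro-batch of labeled examples, it is refreshed with stochastic gradient descent on log loss. This is plain Python chosen for clarity. It does not use the Spark or Sparkling Water APIs, and the class and method names are hypothetical.

```python
# Hypothetical sketch: a model that serves predictions and is updated with each new labeled micro-batch.
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogisticModel:
    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def score_requests(self, requests):
        """Answer a batch of prediction requests with the current weights."""
        return [self.predict_proba(x) for x in requests]

    def update(self, batch):
        """One SGD pass over the latest labeled batch; (p - y) is the log-loss gradient w.r.t. the logit."""
        for x, y in batch:
            err = self.predict_proba(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err
```

Incremental updates avoid retraining from scratch on every batch. A periodic full retrain is still a common complement to catch drift that small SGD steps adapt to slowly.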
23. Tools & Infrastructure to Integrate with Actual Products
Data
• Real-time Production
• Schema Management
• Governance
Models
• Versioning
• Operationalization
• Publishing / Deployment
Integration
• Execution at Runtime
• System APIs
• Validation
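To illustrate how the Models and Integration items fit together, the sketch below shows a tiny model registry. Every trained model is registered as a new version (versioning). A version is only published if it passes a validation gate (publishing / deployment, validation). Runtime calls always go to the currently published version (execution at runtime). The class names, the AUC metric, and the threshold are hypothetical choices for illustration.

```python
# Hypothetical sketch: model versioning, a validation gate before publishing, and runtime execution.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    predict: Callable[[List[float]], float]
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)
    live_version: Optional[int] = None

    def register(self, predict, metrics):
        """Versioning: store every candidate model with its validation metrics."""
        v = ModelVersion(len(self.versions) + 1, predict, metrics)
        self.versions.append(v)
        return v.version

    def publish(self, version, min_auc):
        """Publishing: deploy a version only if it passes the (assumed) AUC validation gate."""
        candidate = self.versions[version - 1]
        if candidate.metrics.get("auc", 0.0) < min_auc:
            return False
        self.live_version = version
        return True

    def predict(self, x):
        """Execution at runtime: always serve the currently published version."""
        if self.live_version is None:
            raise RuntimeError("no model has been published")
        return self.versions[self.live_version - 1].predict(x)
```

Keeping every version rather than overwriting the model makes it straightforward to compare candidates and to fall back to an earlier version if a new one misbehaves.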