Better Customer Experience with
Data Science
(just add water)
Bernard Burg
Comcast
bernard_burg@comcast.com
7/19/16 H2O Open Tour 2016, New York 1
XFINITY TV
XFINITY Internet
XFINITY Voice
XFINITY Home
Digital & Other
*Minority interest and/or non-controlling interest.
Slide is not comprehensive of all Comcast NBCUniversal assets
Updated: December 22, 2015
Complex Troubleshooting
• Failure scenario
– Customer orders a Video-on-Demand
– Transaction fails, customer care call initiated
• Consequences
– Unhappy customer: no visibility or opportunity to mitigate issue
– Potentially avoidable phone call
• Numerous potential reasons for failure
– Billing
– Resource unavailable
– Service issue
– Hardware issue (set-top box or router)
– Software issue
– Parental control settings
Analysis
• What brought the customer to this point?
– Call records
– Billing history
– Events generated by hardware
– Upstream outages
– Usage spikes
• What’s the best course of action now?
• How can we predict such issues?
Project Goals
Improve Customer Experience
• Keep our customers informed
• Empower our CARE agents
– Timely, accurate, complete information & context
– Smart recommendations
• Higher first call resolution
Maximize Efficiency
• Customer self service
– Fewer calls & truck rolls
• Self-healing and assisted-healing equipment
Goal of Data Science
Each user's set-top boxes can send 150+
different error codes, at any time:
Goal 1: predict if a user will call
Goal 2: predict why they call
Predicting User Calls
Using Error Model Alone
• Gradient Boosting Machine on the temporal model: 66% accuracy (calls vs. no-calls)
• The algorithm reached a glass ceiling
Using Error + User Behavior Models
• Gradient Boosting Machine on the temporal model + behavior model: 79% accuracy (calls vs. no-calls)
Predicting Why Users Call
A Single Algorithm Predicting 10 Buckets
• Gradient Boosting Machine on the temporal model
• 47% accuracy is not great, but it is about 5 times better than random guessing (10% for 10 equally likely buckets)
Metric | Spark ML | H2O
Accuracy | 42% | 47%
Processing time | 10 minutes | 2 minutes
Memory | Limited size of test | No limit reached
Ease of use | Program DataFrame | UI
Predicting Why Users Call
10 Specialized Algorithms Predicting 10 Buckets
• Very easy to do in Sparkling Water: map the enum to n binary buckets (here, 10 binary buckets)
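The enum-to-binary mapping can be sketched in plain Python. This is only an illustration of the idea, not the Sparkling Water code used in the project; the function name `to_binary_buckets` and the sample call reasons are hypothetical, while the bucket names come from the results table in the slides.

```python
# Bucket names as listed in the slides' results table.
BUCKETS = ["activations", "appointment", "billing", "op-3", "op-4",
           "op-5", "op-6", "op-7", "op-8", "technical"]

def to_binary_buckets(call_reasons, buckets=BUCKETS):
    """Map one enum column to len(buckets) binary columns (one-vs-rest)."""
    return {b: [1 if reason == b else 0 for reason in call_reasons]
            for b in buckets}

# Hypothetical call reasons for four calls.
reasons = ["billing", "technical", "billing", "activations"]
targets = to_binary_buckets(reasons)
print(targets["billing"])    # [1, 0, 1, 0]
print(targets["technical"])  # [0, 1, 0, 0]
```

Each binary column can then serve as the target of one specialized model, which is what "10 specialized algorithms" refers to.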
Predicting Why Users Call
10 Specialized Algorithms Predicting 10 Buckets
• Gradient Boosting Machine on the temporal model
Accuracy | Spark ML | H2O | H2O's gain
Bucket 0: activations | 97% | 99% | 2%
Bucket 1: appointment | 97% | 99% | 2%
Bucket 2: billing | 84% | 86% | 2%
Bucket 3: op-3 | 90% | 93% | 3%
Bucket 4: op-4 | 85% | 90% | 5%
Bucket 5: op-5 | 99% | 99% | 0%
Bucket 6: op-6 | 98% | 100% | 2%
Bucket 7: op-7 | 80% | 82% | 2%
Bucket 8: op-8 | 93% | 97% | 4%
Bucket 9: technical | 66% | 87% | 21%
Average accuracy | 89% | 95% | 6%
Predicting Why Users Call
Looks good but…
Data science: Gradient Boosting Machine
Metric | Spark ML | H2O
Accuracy | ? | 60%
Processing time | 10 × 10 minutes | 11 × 2 minutes
Memory | Limited size of test | No limit reached
Ease of use | Program DataFrame | UI
Why this drop from 95% to 60%?
Predicting Why Users Call
Learning 10 Specialized Algorithms in H2O
Overlapping Buckets
The hope raised by the 95% composite (average) accuracy of the 10
binary algorithms did not materialize: overlapping classes cause
elements to be misclassified, as shown in the ROC (receiver
operating characteristic) charts drawn by H2O.
[Figure: ROC charts, true positive rate vs. false positive rate]
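The gap between a high average binary accuracy and a much lower overall accuracy can be reproduced on a toy example. The sketch below is not the project's data or code: the labels are hypothetical, and it assumes each call ends up with a single predicted bucket, from which the ten yes/no answers are derived. Under that assumption every misclassified call counts as only two errors spread over ten binary scores, so the average stays high.

```python
N_BUCKETS = 10

def multiclass_accuracy(y_true, y_pred):
    """Share of calls whose predicted bucket is the true bucket."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_accuracy(y_true, y_pred, bucket):
    """Accuracy of the yes/no question 'is this call in `bucket`?'."""
    hits = sum((t == bucket) == (p == bucket) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical data: one call per bucket. Buckets 2 and 9 overlap, and
# calls from buckets 4 and 7 are also pulled into them.
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 9, 3, 2, 5, 6, 9, 8, 2]

per_bucket = [binary_accuracy(y_true, y_pred, b) for b in range(N_BUCKETS)]
mean_binary = sum(per_bucket) / N_BUCKETS
overall = multiclass_accuracy(y_true, y_pred)
print(f"mean binary accuracy: {mean_binary:.2f}")  # 0.92
print(f"multiclass accuracy:  {overall:.2f}")      # 0.60
```

In this toy case two overlapping buckets are enough to pull the overall accuracy down to 60% while the average of the binary accuracies stays above 90%.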
Forecasting Improvements with H2O
• Hypothesis case 1: bucket 2 (billing) can be predicted with 100% accuracy
– Replace the estimation by the actual result
• The overall prediction model would then jump to 75% accuracy
Forecasting Improvements
• By fixing one of the problematic buckets:
– The overall prediction model would jump to 75% accuracy
• By fixing both problematic buckets:
– The overall prediction model would jump to 86% accuracy
These simple forecasts are worth gold,
as they let us focus on the essential
(out of thousands of parameters)
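This kind of what-if forecast can be sketched as follows: replace one bucket's estimated score by the known result and recompute the overall accuracy. The slides do not say how the 10 binary models were combined, so picking the highest score (argmax) is an assumption here, and the scores, labels, and helper names (`predict`, `with_oracle`) are hypothetical. Only three buckets are used to keep the example short.

```python
def predict(scores):
    """For each call, pick the bucket whose binary model scores highest."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def with_oracle(scores, y_true, bucket):
    """Replace one bucket's estimated score by the known result (1.0 or 0.0)."""
    fixed = [list(row) for row in scores]
    for row, truth in zip(fixed, y_true):
        row[bucket] = 1.0 if truth == bucket else 0.0
    return fixed

# Hypothetical scores from three binary models (one column per bucket).
y_true = [0, 1, 1, 2, 2, 0]
scores = [
    [0.9, 0.2, 0.1],
    [0.1, 0.6, 0.7],  # bucket 2 fires wrongly
    [0.2, 0.8, 0.3],
    [0.1, 0.7, 0.6],  # bucket 2 scores too low
    [0.6, 0.2, 0.5],  # bucket 2 scores too low
    [0.3, 0.5, 0.1],  # error unrelated to bucket 2
]

baseline = accuracy(y_true, predict(scores))
forecast = accuracy(y_true, predict(with_oracle(scores, y_true, bucket=2)))
print(f"baseline: {baseline:.2f}, with a perfect bucket 2: {forecast:.2f}")
```

Running the same substitution for each bucket in turn shows which bucket would yield the largest gain if fixed, without having to train a better model first.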
Conclusion
The choice to switch to H2O was simple
• Superior results (accuracy)
• Faster algorithms (factor 3)
• Better use of memory
• Accelerated studies because of
– Input UI that allows selecting/deselecting columns
– Very smart output UI (ROC, influential parameters…)
• Stable and reliable algorithms
Room for improvement:
• Sparkling Water interface showed some instabilities
• We worked around it by generating CSV files
