The Social Effect: Predicting Telecom Customer Churn with Call Data
Michael E. Driscoll, Ph.D.
Principal, Founder
February 16, 2010
Social Network Analysis with Telecom Data
The following slides describe an initial project analyzing a North American telecom’s call data on a dedicated analytics platform:
- We describe the analysis of a slice of a telecom’s call history data from several million customers in several major North American markets.
- We demonstrate the performance gain achieved by having a dedicated analytics platform (computation of millions of relationships from tens of billions of events, spanning tens of TB of data, in less than one hour).
- We show that social network influence is a powerful predictor of customer churn: subscribers who experience a Telecom cancellation in their frequent calling network are 2x more likely to cancel themselves.
- We highlight one outbreak of cancellations in a metropolitan call network from May-June 2009.
Challenge: Customer Churn
Acquisition
Attrition
Key Data: Call Detail Records
A slice of several billion call detail records (CDRs) from several million subscribers drawn from three major North American markets, for May-August 2009.
Call Quality Analysis
No Relationship Between Dropped Calls and Customer Churn
No significant correlation was found between inferred dropped calls (defined as consecutive calls to the same number with a gap of < 20 s) and customer churn.
library(ggplot2)
qplot(Status, DroppedCalls, data=CallHistory, geom="boxplot")
The box plot shows log-normalized distributions of dropped-call frequencies (drops per 100 calls placed) for 10k customers, faceted by active and cancelled subscribers.
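The dropped-call definition above can be sketched in base R. This is only an illustration under stated assumptions, not the authors' implementation: the toy `calls` data frame, its column names (`caller`, `callee`, `start`, `end`, all times in seconds), and the helper `flag_inferred_drops` are hypothetical.

```r
# Toy call log (hypothetical columns): one row per call, times in seconds.
calls <- data.frame(
  caller = c("a", "a", "a", "a", "b", "b"),
  callee = c("x", "x", "y", "x", "z", "z"),
  start  = c(0, 70, 200, 400, 0, 500),
  end    = c(60, 120, 260, 450, 100, 560),
  stringsAsFactors = FALSE
)

# Flag a call as an inferred drop when the same caller's next call goes to
# the same number and starts less than max_gap seconds after this call ends.
# Assumes the log has at least two rows.
flag_inferred_drops <- function(calls, max_gap = 20) {
  calls <- calls[order(calls$caller, calls$start), ]
  n <- nrow(calls)
  cur <- seq_len(n - 1)   # every call except the last
  nxt <- cur + 1          # its successor in the sorted log
  gap <- calls$start[nxt] - calls$end[cur]
  redial <- calls$caller[nxt] == calls$caller[cur] &
            calls$callee[nxt] == calls$callee[cur] &
            gap >= 0 & gap < max_gap
  calls$dropped <- c(redial, FALSE)  # the last call has no successor
  calls
}

flagged <- flag_inferred_drops(calls)

# Inferred drops per 100 calls, per caller.
drops_per_100 <- 100 * tapply(flagged$dropped, flagged$caller, mean)
print(drops_per_100)
```

In the toy data only caller "a" redials the same number within 20 s, so one of that caller's four calls is flagged. A per-customer rate like this is the kind of quantity the box plot compares between active and cancelled subscribers.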
What about social networks?
Social Network Analysis
Network is Generated from Call History Data
Call history logs were pulled from the Greenplum warehouse. These were parsed and outgoing numbers were associated with subscription ids. The result is a row of data for every caller-callee connection meeting a low threshold (> 1 call and > 60 s talk-time per month). The majority are between Telecom customers and other carriers (or land-lines).
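The thresholding step described above might look like the following in base R. This is a sketch with invented data: the column names (`caller`, `callee`, `month`, `seconds`) are assumptions, and in the original project this aggregation may well have been done in SQL on the warehouse rather than in R.

```r
# Toy call records (hypothetical columns): one row per outgoing call.
calls <- data.frame(
  caller  = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3"),
  callee  = c("s2", "s2", "s3", "s1", "s1", "s1", "s1", "s1"),
  month   = "2009-05",
  seconds = c(30, 45, 300, 20, 30, 40, 30, 30),
  stringsAsFactors = FALSE
)

# Collapse calls into one row per caller-callee pair per month,
# summing the number of calls and the total talk time.
calls$n_calls <- 1
edges <- aggregate(cbind(n_calls, seconds) ~ caller + callee + month,
                   data = calls, FUN = sum)

# Keep only connections above the low threshold from the slide:
# more than 1 call and more than 60 s of talk time in the month.
edges <- edges[edges$n_calls > 1 & edges$seconds > 60, ]
print(edges)
```

Here the s1-to-s3 pair is dropped because it has a single call, and s2-to-s1 is dropped because its two calls total only 50 s. Both conditions must hold for an edge to survive.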
Our Analytics Workflow
Three steps:
1. Pull from DB
2. Analyze in R
3. Visualize in R + Graphviz
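The slides do not show how the R results were handed to Graphviz. One plausible route, sketched here, is to write the edge list out in Graphviz's DOT language. The helper `edges_to_dot` and the column names `caller` and `callee` are hypothetical.

```r
# Turn a caller -> callee edge list into the text of a Graphviz DOT digraph.
edges_to_dot <- function(edges, graph_name = "calls") {
  edge_lines <- sprintf('  "%s" -> "%s";',
                        as.character(edges$caller),
                        as.character(edges$callee))
  paste(c(sprintf("digraph %s {", graph_name), edge_lines, "}"),
        collapse = "\n")
}

# Toy edge list (hypothetical subscriber ids).
edges <- data.frame(
  caller = c("s1", "s3"),
  callee = c("s2", "s1"),
  stringsAsFactors = FALSE
)

dot_text <- edges_to_dot(edges)
cat(dot_text, "\n")
```

The resulting text could be saved with `writeLines(dot_text, "calls.dot")` and rendered from the command line with `dot -Tpng calls.dot -o calls.png`.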
Our Tool: The R Programming Language
Download R at http://www.r-project.org/
Getting Call Data Into R for Analysis
- from Files
> Calls <- read.csv("CallHistory.csv", header=TRUE)
- from Databases
> library(DBI)
> con <- dbConnect(driver, user=user, password=password, host=host, dbname=dbname)
> Calls <- dbGetQuery(con, "SELECT * FROM call_history")  # dbSendQuery alone returns a result handle, not the rows
- from the Web
> con <- url('http://Telco.com/dump/CallHistory.csv')
> Calls <- read.csv(con, header=TRUE)
- from previous R objects
> load('CallHistory.RData')
Social Network Analysis
Millions of edges analyzed in minutes
Full analysis of a first-order outgoing call network for our slice (~ millions of customers, three months of call history) took less than one hour.
This could be improved by further parallelizing the R code (currently the SQL queries run in parallel on Greenplum, while R runs on the master node).
Results: People Have Small Call Networks (Three)
The median size of a caller’s network is three, while the mean size is five.
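A mean above the median indicates a right-skewed distribution: a few callers with large networks pull the average up. The sketch below shows how both statistics could be computed from an edge list. The data are invented, with sizes picked only so the arithmetic is easy to follow, and the column names are assumptions.

```r
# Toy edge list: five callers with 1, 3, 3, 3 and 15 distinct contacts.
sizes_wanted <- c(1, 3, 3, 3, 15)
edges <- data.frame(
  caller = rep(c("a", "b", "c", "d", "e"), times = sizes_wanted),
  callee = paste0("n", sequence(sizes_wanted)),
  stringsAsFactors = FALSE
)

# Network size = number of distinct numbers each caller is connected to.
network_size <- tapply(edges$callee, edges$caller,
                       function(x) length(unique(x)))

median_size <- median(network_size)
mean_size   <- mean(network_size)
print(c(median = median_size, mean = mean_size))
```

In this toy example the single caller with 15 contacts raises the mean well above the median, which is the same pattern the slide reports.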
Results: Canceling Customers are 7x More Likely to be Linked
Types of Callers (Nodes)
- active (A)
- cancelled (C)
Types of Connections (Edges)
- A-A
- A-C or C-A
- C-C
C-C edges are 7x more likely in call networks than what is expected by chance.
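The slides do not say which null model defines "expected by chance". One common choice, assumed here, is independence between the status of the caller and the status of the callee, as in a chi-squared test: the expected count for each edge type is the product of its row and column totals divided by the number of edges. The subscriber statuses and edges below are invented.

```r
# Toy subscriber statuses: "A" = active, "C" = cancelled (hypothetical ids).
status <- c(s1 = "A", s2 = "A", s3 = "C", s4 = "C", s5 = "A", s6 = "A")

# Toy directed call edges between subscribers.
edges <- data.frame(
  from = c("s1", "s2", "s1", "s5", "s6", "s3", "s4", "s2", "s5", "s6"),
  to   = c("s2", "s1", "s5", "s6", "s5", "s4", "s3", "s3", "s1", "s1"),
  stringsAsFactors = FALSE
)

# Observed counts of each edge type (rows: caller status, cols: callee status).
from_status <- factor(status[edges$from], levels = c("A", "C"))
to_status   <- factor(status[edges$to],   levels = c("A", "C"))
observed <- table(from = from_status, to = to_status)

# Expected counts if caller and callee status were independent.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Ratios above 1 mean an edge type is enriched; below 1, depleted.
enrichment <- observed / expected
print(round(enrichment, 2))
```

In the toy network the C-C cell has 2 observed edges against 0.6 expected, so it is enriched, while the mixed A-C and C-A cells come out below 1.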
Results: A Customer With a Canceller in Their Network Churns at Twice the Rate
Types of Connections (Edges)
- May: C-A
- June: C-C
In essence, we are asking whether being connected to another canceller has any effect on one’s rate of cancellation. It turns out that it does. And if we only look at voluntary port-outs, we see that customers churn at 3x the rate.
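The May-to-June comparison can be framed as a ratio of churn rates: among customers still active at the end of May, compare the June cancellation rate of those who call a May canceller with the rate of those who do not. The sketch below uses invented data and hypothetical column names, and it assumes exposure is defined by outgoing calls. It is not the authors' code and does not reproduce their numbers.

```r
# Toy customer table (hypothetical): s1 cancelled in May.
customers <- data.frame(
  id             = paste0("s", 1:10),
  cancelled_may  = c(TRUE, rep(FALSE, 9)),
  cancelled_june = c(FALSE, TRUE, TRUE, FALSE, FALSE,
                     TRUE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Toy outgoing call edges for May.
edges <- data.frame(
  from = c("s2", "s3", "s4", "s5", "s6", "s7", "s9", "s2"),
  to   = c("s1", "s1", "s1", "s1", "s7", "s8", "s10", "s3"),
  stringsAsFactors = FALSE
)

# A customer is "exposed" if they call someone who cancelled in May.
may_cancellers <- customers$id[customers$cancelled_may]
exposed_ids <- unique(edges$from[edges$to %in% may_cancellers])

# Only customers still active after May can cancel in June.
at_risk <- customers[!customers$cancelled_may, ]
at_risk$exposed <- at_risk$id %in% exposed_ids

# June churn rate for unexposed ("FALSE") and exposed ("TRUE") customers.
rates <- tapply(at_risk$cancelled_june, at_risk$exposed, mean)
relative_rate <- rates[["TRUE"]] / rates[["FALSE"]]
print(rates)
print(relative_rate)
```

Restricting the comparison to customers who were still active in May is what makes this a temporal analysis: the neighbour's cancellation precedes the outcome being measured.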
From Data to Insights to Actions
If we had known two customers’ calling networks…
Could we have prevented four more from leaving?
The Emerging Analytics Stack
- Actions: Apps (Email, Ad Campaigns)
- Insights: Analytics (R, SPSS, SAS, SAP)
- Data: Big Data (HDFS or Parallel RDBMS)
References
- Enhancing Customer Knowledge at Optus. Teradata Case Study (September 2009).
- IBM’s Analytics Tapped to Predict, Prevent Churn. Telephony Online (April 2009).
- The Elements of Statistical Learning. Hastie, Tibshirani, Friedman. Springer (February 2009).
- Study Shows Obesity Can Be Contagious. Gina Kolata, The New York Times (July 25, 2007). [great example of homophily]
Contact
Michael E. Driscoll, Ph.D.
med@dataspora.com
Follow @dataspora on Twitter

Social Network Analysis for Telecoms


Editor's Notes

  • #4 Most telcos lose 1-2% of their customers every month. It’s 7x more expensive to acquire a customer than to retain one.
  • #14 Birds of a feather flock together; cancellers clump together, so do active users. Like vinegar and water, we see enrichment for “like-like” edges in our network, and dilution of “dissimilar” edges (the A-C or C-A). Upshot: people cancellation Question: is this all an artifact of family plans, where a bunch of subscribers quit together? In part yes, but the trends hold up even when we do a temporal analysis.
  • #15 Key take-home point here is that this analysis, looking at the May to June transition, removes
  • #17 The stack is loosely coupled: right tool for the right job. The need for a dedicated analytics RDBMS