Social Network Analysis for Telecoms

A major North American telecom sought to identify factors driving customer churn. We applied social network analysis over several billion call records. We found that customers with a cancellation in their frequent calling network churned at twice the monthly rate.

  1. The Social Effect: Predicting Telecom Customer Churn with Call Data
     Michael E. Driscoll, Ph.D.
     Principal, Founder
     February 16, 2010
  2. Social Network Analysis with Telecom Data
     The following slides describe an initial project analyzing a North American telecom’s call data on a dedicated analytics platform:
     • We describe the analysis of a slice of a telecom’s call history data from several million customers in several major North American markets.
     • We demonstrate the performance gain achieved by having a dedicated analytics platform (computation of millions of relationships from tens of billions of events, spanning tens of TB of data, in less than one hour).
     • We show that social network influence is a powerful predictor of customer churn: subscribers who experience a Telecom cancellation in their frequent calling network are 2x more likely to cancel themselves.
     • We highlight one outbreak of cancellations in a metropolitan call network from May-June 2009.
  3. Challenge: Customer Churn
     Acquisition
     Attrition
  4. Key Data: Call Detail Records
     A slice of several billion call detail records (CDRs) from several million subscribers drawn from three major North American markets, for May-August 2009.
  5. Call Quality Analysis
     No Relationship Between Dropped Calls and Customer Churn
     No significant correlation found between customer churn and inferred dropped calls (defined as consecutive calls to the same number with < 20 s gap).
       library(ggplot2)
       # Box plot of dropped-call frequency by subscriber status
       qplot(Status, DroppedCalls,
             data = CallHistory,
             geom = "boxplot")
     The box plot shows log-normalized distributions of dropped-call frequencies (drops per 100 calls) for 10k customers, faceted by active and cancelled subscribers.
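
The dropped-call definition on this slide can be sketched in a few lines of base R. This is only an illustration on a toy call log: the column names (`caller`, `callee`, `start`, `end`) and the helper `infer_dropped()` are hypothetical and are not taken from the telecom's schema or from the original analysis.

```r
# Toy call log; column names are assumptions for illustration only.
calls <- data.frame(
  caller = c("a", "a", "a", "b", "b"),
  callee = c("x", "x", "y", "z", "z"),
  start  = c(0, 70, 200, 0, 300),     # call start, in seconds
  end    = c(60, 130, 260, 100, 360), # call end, in seconds
  stringsAsFactors = FALSE
)

# Flag a call as "dropped" when the same caller redials the same
# number within max_gap seconds of the call ending.
infer_dropped <- function(calls, max_gap = 20) {
  calls <- calls[order(calls$caller, calls$start), ]
  n <- nrow(calls)
  dropped <- rep(FALSE, n)
  if (n > 1) {
    i <- seq_len(n - 1)
    same_caller <- calls$caller[i] == calls$caller[i + 1]
    same_callee <- calls$callee[i] == calls$callee[i + 1]
    gap <- calls$start[i + 1] - calls$end[i]
    dropped[i] <- same_caller & same_callee & gap >= 0 & gap < max_gap
  }
  calls$dropped <- dropped
  calls
}

flagged <- infer_dropped(calls)
# Drops per 100 calls, per caller
drops_per_100 <- 100 * tapply(flagged$dropped, flagged$caller, mean)
```

In this toy log only caller `a`'s first call is flagged, because it is followed by a redial to the same number 10 s later.
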
  6. What about social networks?
  7. Social Network Analysis
     Network is Generated from Call History Data
     Call history logs were pulled from the Greenplum warehouse. These were parsed, and outgoing numbers were associated with subscription IDs. The result is a row of data for every caller-callee connection meeting a low threshold (> 1 call and > 60 s talk-time per month). The majority of connections are between Telecom customers and other carriers (or land-lines).
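
As a rough sketch of the thresholding step described on this slide, the following base R code collapses a toy call log into one row per caller-callee pair per month and keeps pairs with more than one call and more than 60 s of talk time. The table layout and column names are assumptions, not the warehouse schema.

```r
# Toy call log; column names are assumptions for illustration only.
calls <- data.frame(
  caller   = c("s1", "s1", "s1", "s2", "s2", "s2"),
  callee   = c("s2", "s2", "s3", "s1", "s1", "s1"),
  month    = c("2009-05", "2009-05", "2009-05",
               "2009-05", "2009-05", "2009-06"),
  duration = c(40, 50, 300, 10, 20, 90),  # talk time, in seconds
  stringsAsFactors = FALSE
)
calls$n <- 1  # one row per call, so summing n counts calls

# One row per caller-callee-month with call count and total talk time
edges <- aggregate(cbind(n_calls = n, talk_time = duration) ~
                     caller + callee + month,
                   data = calls, FUN = sum)

# Keep connections above the threshold: > 1 call and > 60 s per month
edges <- edges[edges$n_calls > 1 & edges$talk_time > 60, ]
```

Only the `s1` to `s2` connection in May survives here: the other pairs have either a single call or too little talk time.
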
  8. Our Analytics Workflow
     Three steps: 1. Pull from DB, 2. Analyze in R, 3. Visualize in R + Graphviz
  9. Our Tool: The R Programming Language
     Download R at http://www.r-project.org/
  10. Getting Call Data Into R for Analysis
      From files:
        > Calls <- read.csv("CallHistory.csv", header=TRUE)
      From databases (dbGetQuery returns the rows as a data frame):
        > library(DBI)
        > con <- dbConnect(driver, user=user, password=password, host=host, dbname=dbname)
        > Calls <- dbGetQuery(con, "SELECT * FROM call_history")
      From the web:
        > con <- url('http://Telco.com/dump/CallHistory.csv')
        > Calls <- read.csv(con, header=TRUE)
      From previous R objects:
        > load('CallHistory.RData')
  11. Social Network Analysis
      Millions of edges analyzed in minutes
      Full analysis of a first-order outgoing call network for our slice (~ millions of customers, three months of call history) took less than one hour.
      This could be improved with further parallelization of the R code (currently SQL queries run in parallel on Greenplum, while R runs on the master node).
  12. Results: People Have Small Call Networks (Three)
      The median size of a caller’s network is three, while the mean size is five.
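
Network size here is the number of distinct contacts per caller. A minimal sketch in base R, using a hypothetical edge list whose sizes were chosen by hand to be right-skewed (it is not the telecom's data):

```r
# Hypothetical edge list: callers a, b, c with 1, 3 and 11 contacts.
edges <- data.frame(
  caller = rep(c("a", "b", "c"), times = c(1, 3, 11)),
  callee = paste0("n", c(1, 1:3, 1:11)),
  stringsAsFactors = FALSE
)

# Network size = number of distinct callees per caller
network_size <- tapply(edges$callee, edges$caller,
                       function(x) length(unique(x)))

median_size <- median(network_size)
mean_size   <- mean(network_size)
```

A few callers with large networks pull the mean above the median, which is the pattern the slide reports.
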
  13. Results: Canceling Customers are 7x More Likely to be Linked
      Types of Callers (Nodes)
      • active (A)
      • cancelled (C)
      Types of Connections (Edges)
      • A-A
      • A-C or C-A
      • C-C
      C-C edges are 7x more likely in call networks than what is expected by chance.
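
One way to quantify "more likely than expected by chance" is to compare the observed share of C-C edges with the share expected if cancelled status were scattered at random over the nodes. The slides do not state how the chance baseline was computed, so the random-pair baseline below is an assumption, and the toy network is hypothetical.

```r
# Hypothetical network: 10 subscribers, 2 of them cancelled (C).
status <- setNames(rep(c("A", "C"), times = c(8, 2)), paste0("n", 1:10))

# Toy edge list: a simple chain n1-n2, n2-n3, ..., n9-n10
edges <- data.frame(
  from = paste0("n", 1:9),
  to   = paste0("n", 2:10),
  stringsAsFactors = FALSE
)

s_from <- unname(status[edges$from])
s_to   <- unname(status[edges$to])
edge_type <- ifelse(s_from == "C" & s_to == "C", "C-C",
             ifelse(s_from == "A" & s_to == "A", "A-A", "A-C"))

# Observed share of C-C edges
observed_cc <- mean(edge_type == "C-C")

# Assumed baseline: chance that a random pair of nodes is both cancelled
n_nodes <- length(status)
n_canc  <- sum(status == "C")
expected_cc <- (n_canc * (n_canc - 1)) / (n_nodes * (n_nodes - 1))

enrichment <- observed_cc / expected_cc
```

An enrichment above 1 means cancellers are linked to each other more often than random placement would predict. A permutation test that shuffles the status labels would be a natural alternative baseline.
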
  14. Results: A Customer With a Canceller in Their Network Churns at Twice the Rate
      Types of Connections (Edges)
      • May C-A
      • June C-C
      In essence, we are asking whether being connected to another canceller has any effect on one’s rate of cancellation. It turns out that it does.
      And if we only look at voluntary port-outs, we see that customers churn at 3x the rate.
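
The temporal comparison can be expressed as a ratio of monthly churn rates: among subscribers active in May, compare June cancellation rates for those with and without a May canceller in their network. The sketch below uses made-up counts chosen to give a ratio of 2; the column names `exposed` and `cancelled_june` are hypothetical.

```r
# Hypothetical subscribers who were active in May.
# exposed: had a May canceller in their calling network
# cancelled_june: cancelled during June
subs <- data.frame(
  exposed = rep(c(TRUE, FALSE), times = c(10, 20)),
  cancelled_june = c(rep(c(TRUE, FALSE), times = c(2, 8)),
                     rep(c(TRUE, FALSE), times = c(2, 18)))
)

# June churn rate within each exposure group
rate <- tapply(subs$cancelled_june, subs$exposed, mean)

# Ratio of exposed to unexposed churn rates
rate_ratio <- rate[["TRUE"]] / rate[["FALSE"]]
```

Because exposure is measured in May and cancellation in June, this comparison looks at what happens after a contact cancels, rather than at cancellations that occur together.
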
  15. From Data to Insights to Actions
      If we had known two customers’ calling networks…
      Could we have prevented four more from leaving?
  16. The Emerging Analytics Stack
      • Actions: Apps (Email, Ad Campaigns)
      • Insights: Analytics (R, SPSS, SAS, SAP)
      • Data: Big Data (HDFS or Parallel RDBMS)
  17. References
      • Enhancing Customer Knowledge at Optus, Teradata Case-Study (September 2009).
      • IBM’s Analytics Tapped to Predict, Prevent Churn. Telephony Online (April 2009).
      • The Elements of Statistical Learning, Hastie, Tibshirani, Friedman. Springer (February 2009).
      • Study Shows Obesity Can Be Contagious, Gina Kolata, The New York Times (July 25, 2007) [great example of homophily]
      Contact
      Michael E. Driscoll, Ph.D.
      med@dataspora.com
      Follow @dataspora on Twitter

Editor's Notes

  • Most telcos lose 1-2% of their customers every month. It’s 7x more expensive to acquire a customer than to retain one.
  • Birds of a feather flock together; cancellers clump together, so do active users. Like vinegar and water, we see enrichment for “like-like” edges in our network, and dilution of “dissimilar” edges (the A-C or C-A). Upshot: people cancellation Question: is this all an artifact of family plans, where a bunch of subscribers quit together? In part yes, but the trends hold up even when we do a temporal analysis.
  • Key take-home point here is that this analysis, looking at the May to June transition, removes
  • The stack is loosely coupled: right tool for the right job. The need for a dedicated analytics RDBMS