Predicting Online Community Churners using Gaussian Sequences

MATTHEW ROWE
SCHOOL OF COMPUTING AND COMMUNICATIONS
M.ROWE@LANCASTER.AC.UK | @MROWEBOT
International Conference on Social Informatics 2014
Barcelona, Spain

The Issue of Churn
Churner: a user, or subscriber,
who stops using a service!
Churners = Loss
¨ Social: social capital, expertise, vibrancy
¨ Financial

Predicting Churners
Engineered Static Features
How do churners and non-churners develop?
How can we exploit development information to detect
churners?

Outline
¨ Defining churn: in the context of online communities
¨ User Lifecycle Model
¤ Mining development signals
¤ Churners vs. Non-churners
¨ Prediction Models
¤ Gaussian Sequences
¤ Single/Dual-Gaussian Sequence Model
¨ Experiments
¨ Conclusions + Future Work

Defining Churners
[Figure: post frequency over time (2008 to 2012) with the analysis point marked. Churners posted for the final time in this window.]
[Figure: distribution p(x) of Δ on log-log axes, with mean and median marked; panels: Facebook, SAP, ServerFault, Boards.ie.]
Δ = maximum #days between posts
Split each dataset's users into training (80%) and testing (20%)
(i.e. in-degree, out-degree, clustering coefficient, closeness centrality, etc.), inducing
a J48 decision tree to differentiate between churners and non-churners when
using social network properties formed from the reply-to graph of the online
communities. In this paper, we implement this model as our baseline by engineering
the same features using the same experimental setup. Our approach
differs from existing work by assessing churners' and non-churners' development
signals, and inducing a joint-probability function from such information.
Table 1. Splits of users within the datasets and the churn window duration
Platform      #Churners   #Non-churners   Churn Window
Facebook      1,033       1,199           [04-11-2011, 28-08-2012]
SAP           10,421      7,255           [29-11-2009, 07-09-2010]
ServerFault   12,314      11,144          [13-06-2010, 24-12-2010]
Boards.ie     65,528      6,120,008       [01-01-2005, 13-02-2008]

User Lifecycles
We divide each user's lifetime into k equal activity periods (stages): 1, 2, 3, …, k
Experiment with different settings of k = {5, 10, 20}
For each stage s we model:
¤ In-degree: the actions to user u by other users
¤ Out-degree: the actions by user u to other users
¤ Lexical: the terms used by user u (e.g. term counts: Semantic 17, Web 5)
s = {In-degree, Out-degree, Lexical}
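The stage split above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes that "equal activity periods" means an equal number of posts per stage, and the names `split_into_stages`, `distribution` and the `replied_to` data are hypothetical.

```python
from collections import Counter


def split_into_stages(posts, k):
    """Split a chronologically ordered list of posts into k stages of equal size.

    Assumption: equal activity means equal post counts per stage. Any
    remainder posts at the end are dropped so every stage has the same size.
    """
    size = len(posts) // k
    if size == 0:
        raise ValueError("need at least k posts")
    return [posts[i * size:(i + 1) * size] for i in range(k)]


def distribution(items):
    """Relative-frequency distribution over the observed items (users or terms)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


# Hypothetical out-degree data: the user that each of u's ten posts replied to.
replied_to = ["a", "a", "b", "c", "a", "b", "d", "d", "d", "d"]
stages = split_into_stages(replied_to, k=5)
out_degree = [distribution(stage) for stage in stages]
```

The same two helpers would apply to in-degree (who replied to u) and lexical data (terms used by u), since each is just a list of observed items per stage.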

Mining User Evolution Signals
Applied information theory measures between stages:
¤ Period Entropy
¤ Historical Cross-Entropy
¤ Community Cross-Entropy
Cross-entropy between consecutive stages 1, 2, 3, …, n: $H(P_1, P_2), H(P_2, P_3), \ldots, H(P_{n-1}, P_n)$
Per-user vector: $x = (H(P_1, P_2), H(P_2, P_3), \ldots, H(P_{n-1}, P_n))$
Produces: $\{x_1, x_2, \ldots, x_M\} = X \in \mathbb{R}^{M \times S}$
M users' measures derived from S stages
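As a rough sketch of these measures, the functions below compute Shannon entropy and cross-entropy over stage distributions stored as dictionaries. The `eps` floor for unseen items is an assumption of this example; the slides do not say how zero probabilities were handled at this step.

```python
import math


def entropy(p):
    """Shannon entropy H(P) in bits for a distribution {item: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def cross_entropy(p, q, eps=1e-9):
    """Cross-entropy H(P, Q) = -sum_x P(x) * log2 Q(x).

    Assumption: items missing from Q are floored at eps so the log stays finite.
    """
    return -sum(v * math.log2(max(q.get(x, 0.0), eps))
                for x, v in p.items() if v > 0)


def stage_signal(stage_dists):
    """Vector (H(P1, P2), H(P2, P3), ..., H(Pn-1, Pn)) for one user."""
    return [cross_entropy(p, q) for p, q in zip(stage_dists, stage_dists[1:])]


p1 = {"a": 0.5, "b": 0.5}
p2 = {"a": 0.25, "b": 0.75}
signal = stage_signal([p1, p2, p1])
```

Passing a community-wide distribution as `q` would give a comparison against the community rather than against the user's own adjacent stage.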

User Evolution Signals
[Figure: entropy (H) per lifecycle stage; panels: indegree k = 5, indegree k = 10, outdegree k = 5, outdegree k = 10.]
For each stage:
1. Derive values for each user
2. Derive mean and 95% CI of churners & non-churners
3. Plot curves of each class
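Steps 1 and 2 can be sketched as follows. The interval uses a normal approximation (1.96 standard errors), which is an assumption of this example since the slides do not state how the 95% CI was computed; `class_curves` is a hypothetical helper name.

```python
import math
import statistics


def mean_ci95(values):
    """Mean and 95% confidence bounds via a normal approximation (1.96 * SEM)."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem


def class_curves(signals, labels):
    """Per-class curves of (mean, low, high) for every lifecycle stage.

    signals: one list of stage values per user (all of equal length).
    labels:  1 for churner, 0 for non-churner, aligned with signals.
    """
    curves = {}
    for label in set(labels):
        rows = [x for x, y in zip(signals, labels) if y == label]
        curves[label] = [mean_ci95(column) for column in zip(*rows)]
    return curves


signals = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
labels = [1, 1, 0, 0]
curves = class_curves(signals, labels)
```

Each class needs at least two users per stage, because the sample standard deviation is undefined for a single value.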

Period Variation: Entropy Signals
[Figure grid: rows are user properties (indegree, outdegree, lexical); columns are increasing lifecycle fidelities (k = 5, 10, 20); x-axis: Lifecycle Stage; y-axis: H.]
Fig. 2. Period entropy distribution on ServerFault for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics. The green dashed line shows the non-churners, while the red solid line shows the churners.
Non-Churners:
• Share more connections (greater entropy)
• Invest more and get more out of the community

Community-Comparisons: Cross-Entropy Signals
[Figure grid: rows are user properties (indegree, outdegree, lexical); columns are lifecycle fidelities (k = 5, 10, 20); x-axis: Lifecycle Stage; y-axis: H.]
Fig. 4. Community cross-entropy distribution for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics.
Churners:
• Diverge from community norms, but slower

Prediction Models
How can we exploit development information to
detect churners?

Gaussian Sequences
Assume that a measure m (e.g. in-degree) is normally distributed for a given stage s
Rationale: uncommon overlap between 95% CI bounds
[Figure: period cross-entropy (H) per lifecycle stage; panels: indegree and outdegree for k = 5, 10, 20.]
Fig. 3. Period cross-entropy distribution on ServerFault for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics.
A Gaussian Sequence is a chain of Gaussians for measure m over the k = |S| stages.
Definition 1 (Gaussian Sequence). Let m be a given measurement and s be a given lifecycle stage drawn from the set of lifecycle stages, $s \in S$. Then m is said to be normally distributed on s and defined by $\mathcal{N}(\hat{\mu}_{m,s}, (\hat{\sigma}_{m,s})^2)$, where $\hat{\mu}_{m,s}$ and $\hat{\sigma}_{m,s}$ denote the maximum likelihood estimates of the mean and standard deviation respectively. Then the Gaussian Sequence of m is defined as follows:
$$G_m = \left( \mathcal{N}(\hat{\mu}_{m,1}, (\hat{\sigma}_{m,1})^2),\ \mathcal{N}(\hat{\mu}_{m,2}, (\hat{\sigma}_{m,2})^2),\ \ldots,\ \mathcal{N}(\hat{\mu}_{m,|S|}, (\hat{\sigma}_{m,|S|})^2) \right)$$
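A Gaussian Sequence for one measure and one class of users can be sketched as a list of per-stage (mean, standard deviation) pairs. The function names here are hypothetical, and the input is assumed to be one equal-length vector of stage values per user.

```python
import math


def fit_gaussian_sequence(values_by_user):
    """Maximum likelihood (mean, std) per lifecycle stage for one measure.

    values_by_user: list of per-user vectors, one value per stage.
    """
    sequence = []
    for column in zip(*values_by_user):
        mu = sum(column) / len(column)
        # The MLE of the variance divides by n rather than n - 1.
        variance = sum((v - mu) ** 2 for v in column) / len(column)
        sequence.append((mu, math.sqrt(variance)))
    return sequence


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical in-degree signal for two churners over k = 2 stages.
churner_values = [[0.2, 0.4], [0.4, 0.8]]
g_churn = fit_gaussian_sequence(churner_values)
```

Fitting one such sequence on churners and another on non-churners yields the per-stage Gaussians that the prediction models draw on.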

Gaussian Sequence Prediction Models
Single-Gaussian Sequence Model
Probability of u belonging to the churn Gaussian, in each measure and stage.
Given our formulation of the churn probability in a particular lifecycle stage s and based on measure m, we can derive the joint probability of u churning over the observed sequence of measures ($m \in M$) and lifecycle stages ($s \in S$) as follows; we term this the Single-Gaussian Sequence Model:
$$Q(u \mid \mathbf{b}) = \prod_{m \in M} \prod_{s \in S} \left( \rho + \beta_{m,s}\, \mathcal{N}\!\left( f(u,m,s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \right) \right)$$
Here $\beta_{m,s}$ is a slack variable for influence on the churn probability; its inclusion is necessary because we may have an outlier measure for u and should limit overfitting as a consequence. Note that this variable is indexed by both m and s, as it is specific to both the lifecycle stage and the measure under inspection.
The parameter $\rho$ smooths zero probability values given our joint calculation (set to 0.1). Assuming we have |S| lifecycle stages and |M| measures, the slack variables are stored within a parameter vector $\mathbf{b}$, where $\beta_{m,s} \in \mathbf{b}$.
Dual-Gaussian Sequence Model
Probability of u belonging to the churn Gaussian minus the probability of belonging to the non-churn Gaussian, in each measure and stage:
$$Q(u \mid \mathbf{b}) = \prod_{m \in M} \prod_{s \in S} \left( \rho + \left[ \beta_{m,s}\, \mathcal{N}\!\left( f(u,m,s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \right) - (1 - \beta_{m,s})\, \mathcal{N}\!\left( f(u,m,s) \mid \hat{\mu}^{n}_{m,s}, (\hat{\sigma}^{n}_{m,s})^2 \right) \right] \right)$$
Here $\rho$ acts as a smoother to chain together zero-probability values.
$\beta \in \mathbf{b}$ learnt using (single | dual) stochastic gradient descent. Our objective is to minimise the squared loss between a user's forecasted churn probability and the observed churn label, given that the former is in the closed interval [0, 1] and the latter is from the set {0, 1}; our parameters are L2-regularised to control for overfitting:
$$\underset{\mathbf{b}^{*}}{\arg\min} \sum_{(x_i, y_i) \in D} \left( y_i - Q(u \mid \mathbf{b}) \right)^2 + \lambda \lVert \mathbf{b} \rVert^2 \qquad (12)$$
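The two models and the regularised objective can be sketched as below. This is an illustration under stated assumptions rather than the author's code: the smoother is taken to be additive (rho plus the weighted density), inputs are dictionaries keyed by hypothetical (measure, stage) pairs, and the regularisation weight is named `lam`. No clipping to [0, 1] is applied in this sketch.

```python
import math

RHO = 0.1  # smoothing constant; the slides report rho set to 0.1


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x; sigma must be positive."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def q_single(x, beta, churn, rho=RHO):
    """Product over (measure, stage) of rho + beta * N(x | churn Gaussian)."""
    q = 1.0
    for key, value in x.items():
        q *= rho + beta[key] * normal_pdf(value, *churn[key])
    return q


def q_dual(x, beta, churn, non_churn, rho=RHO):
    """As q_single, but subtracts the weighted non-churn density per factor."""
    q = 1.0
    for key, value in x.items():
        p_churn = normal_pdf(value, *churn[key])
        p_non = normal_pdf(value, *non_churn[key])
        q *= rho + (beta[key] * p_churn - (1.0 - beta[key]) * p_non)
    return q


def objective(data, beta, score, lam):
    """Squared loss over (x, y) pairs plus an L2 penalty on the slack variables."""
    loss = sum((y - score(x, beta)) ** 2 for x, y in data)
    return loss + lam * sum(b * b for b in beta.values())


key = ("indegree", 1)  # hypothetical (measure, stage) index
x = {key: 0.0}
churn = {key: (0.0, 1.0)}      # (mean, std) of the churn Gaussian
non_churn = {key: (0.0, 1.0)}  # (mean, std) of the non-churn Gaussian
```

A stochastic gradient descent loop would repeatedly evaluate this objective on single instances and adjust each entry of `beta` against its gradient.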

Experiments
¨ Aim: to maximise detection of churners
¤ I.e. Area Under ROC Curve
1. Do we perform better than the state of
the art?
2. How do we fare against existing
classifiers?
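For reference, the area under the ROC curve can be computed directly from its ranking interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below uses that pairwise definition; the function name is hypothetical and the quadratic loop is only suitable for small test sets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (churner, non-churner) pairs ranked correctly.

    labels: 1 for churner, 0 for non-churner; ties in score count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one instance of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

A model that gives every user the same score has an AUC of exactly 0.5 under this definition, which corresponds to chance-level ranking.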

Experimental Setup
Table IV. Number of instances within the training and testing datasets used for the experiments across the different lifecycle fidelities. The number of instances decreases as the lifecycle fidelity increases, as we require each user to have posted double the fidelity number of posts.
Fidelity   Facebook          SAP               ServerFault       Boards.ie
           |Train|  |Test|   |Train|  |Test|   |Train|  |Test|   |Train|  |Test|
5          306      72       1,099    302      1,229    338      6,338    1,700
10         204      48       716      205      688      177      4,979    1,334
20         123      27       448      129      375      84       3,635    995
Stage 1: Model Tuning
¤ 10-fold cross-validation to tune model hyperparameters
¤ Update rule for a given training instance with index i: $\beta_j = \beta_j - \eta \nabla_i \beta_j$
¤ Two hyperparameters must be tuned: the learning rate $\eta$ and the weight $\lambda$
¤ Elastic-net regularisation lets us examine the spectrum from solely a lasso penalty ($\alpha = 1$), to solely a ridge penalty ($\alpha = 0$), or somewhere in between ($\alpha = 0.5$). Rather than tuning the linear models using $\alpha$ as a hyperparameter, we used the settings $\alpha = \{0, 0.5, 1\}$ and tuned for each setting.
Stage 2: Model Testing
¤ Apply tuned models to held-out test splits
Baseline 1 (B1-J48): Decision tree with measures as features
Baseline 2 (B2-NB): Social network features (e.g. centrality)
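The tuning stage can be sketched as a plain grid search over cross-validation folds. Everything named here is hypothetical: `evaluate` stands in for whatever routine fits a model on the training fold and returns a validation score such as AUC, and the grid values are placeholders.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, valid_idx) pairs for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, valid in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


def tune(n, grid, evaluate, k=10):
    """Return the setting in grid with the best mean validation score.

    evaluate(params, train_idx, valid_idx) is a caller-supplied callback
    (an assumption of this sketch) that returns a score to maximise.
    """
    def mean_score(params):
        scores = [evaluate(params, tr, va) for tr, va in k_fold_indices(n, k)]
        return sum(scores) / len(scores)

    return max(grid, key=mean_score)


folds = list(k_fold_indices(25, k=10))
# Toy callback that prefers a learning rate of 0.1, for illustration only.
best = tune(20, [0.01, 0.1, 1.0], lambda p, tr, va: -abs(p - 0.1))
```

The selected setting would then be used to fit a model on the full training split before scoring the held-out test split once.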

average number of posts within participated threads, popularity (% of user-authored
posts that receive replies), initialisation (% of threads authored by the
user), and polarity. We first tested the J48 classifier, as used in [2], but found
it to perform poorly; we therefore used the Naive Bayes classifier instead.
Results (ROC)
Table 2. Area under the Receiver Operator Characteristic (ROC) Curve results for the different Gaussian Sequence Models and Learning Procedures
                        Baselines         SGD                 D-SGD
Platform      Fidelity  B1-J48   B2-NB    Single-N   Dual-N   Single-N   Dual-N
Facebook      5         0.559    0.461    0.570      0.472    0.548      0.478
              10        0.531    0.491    0.569      0.554    0.593      0.545
              20        0.478    0.444    0.664      0.500    0.528      0.583
SAP           5         0.594    0.497    0.573      0.527    0.545      0.533
              10        0.533    0.494    0.553      0.503    0.584      0.590
              20        0.478    0.582    0.500      0.500    0.540      0.525
ServerFault   5         0.583    0.530    0.522      0.556    0.583      0.577
              10        0.534    0.546    0.500      0.557    0.569      0.589
              20        0.463    0.530    0.500      0.634    0.486      0.484
Boards.ie     5         0.504    0.611    0.524      0.547    0.526      0.518
              10        0.512    0.593    0.500      0.539    0.501      0.496
              20        0.560    0.553    0.500      0.501    0.500      0.502
Significantly outperform baselines for 2/4 datasets
No clear winner among Gaussian Sequence Models
6.3 Results: Churn Prediction Performance
For the model testing phase of the experiments, we took the best performing hyperparameters

Conclusions + Future Work
¨ Gaussian Sequences allow user measures to be
chained together in a joint probability model
¤ Mined from users’ development trajectories
¤ Examined across various lifecycle fidelities
¨ Exceed performance of baselines for 2/4 datasets
¨ FW1: Churn point prediction + user rankings
¨ FW2: Towards a theory of churner development

Questions?
@mrowebot | m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/
