PREDICTING ONLINE COMMUNITY CHURNERS USING GAUSSIAN SEQUENCES
MATTHEW ROWE 
SCHOOL OF COMPUTING AND COMMUNICATIONS 
M.ROWE@LANCASTER.AC.UK | @MROWEBOT 
International Conference on Social Informatics 2014 
Barcelona, Spain
The Issue of Churn 
Churner: a user, or subscriber, 
who stops using a service! 
Churners = Loss 
Social 
Social capital 
Expertise 
Vibrancy 
Financial
Predicting Churners 
Engineered Static Features 
How do churners and non-churners develop? 
How can we exploit development information to detect 
churners? 
Outline 
¨ Defining churn: in the context of online communities 
¨ User Lifecycle Model 
¤ Mining development signals 
¤ Churners vs. Non-churners 
¨ Prediction Models 
¤ Gaussian Sequences 
¤ Single/Dual-Gaussian Sequence Model 
¨ Experiments 
¨ Conclusions + Future Work 
Defining Churners
[Figure: Posts Frequency over Time (2008 to 2012), with the Analysis Point marked and the window in which churners posted for the final time]
[Figure: log-log plots of p(x) against Δ, with Mean and Median marked; panels: Facebook, SAP, ServerFault, Boards.ie]
Δ = maximum #days between posts
Split each dataset’s users into training (80%) and testing (20%)
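As a rough illustration of the Δ statistic named on this slide, the sketch below computes the maximum number of days between consecutive posts for one user. The function name `max_gap_days` and the sample dates are hypothetical; how Δ is then turned into a churner label is not stated on the slide, so that step is deliberately left out.

```python
from datetime import date


def max_gap_days(post_dates):
    """Return the largest gap, in days, between consecutive posts."""
    ordered = sorted(post_dates)
    if len(ordered) < 2:
        return 0  # a single post has no inter-post gap
    return max((b - a).days for a, b in zip(ordered, ordered[1:]))


# Hypothetical posting history for one user.
posts = [date(2010, 1, 1), date(2010, 1, 4), date(2010, 2, 3), date(2010, 2, 5)]
delta = max_gap_days(posts)
```

Here the longest pause is the one between 4 January and 3 February.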
(i.e. in-degree, out-degree, clustering coefficient, closeness centrality, etc.), inducing
a J48 decision tree to differentiate between churners and non-churners when
using social network properties formed from the reply-to graph of the online
communities. In this paper, we implement this model as our baseline by engineering
the same features using the same experimental setup. Our approach
differs from existing work by assessing churners’ and non-churners’ development
signals, and inducing a joint-probability function from such information.
Table 1. Splits of users within the datasets and the churn window duration
Platform      #Churners   #Non-churners   Churn Window
Facebook      1,033       1,199           [04-11-2011, 28-08-2012]
SAP           10,421      7,255           [29-11-2009, 07-09-2010]
ServerFault   12,314      11,144          [13-06-2010, 24-12-2010]
Boards.ie     65,528      6,120,008       [01-01-2005, 13-02-2008]
User Lifecycles
We divide lifetime into equal activity periods (stages): 1, 2, 3, …, k
Experiment with different settings of k = {5, 10, 20}
For each stage s: s = { In-degree, Out-degree, Lexical }
¤ In-degree: model the actions to user u by other users
¤ Out-degree: model the actions by user u to other users
¤ Lexical: model the terms used by user u (e.g. term counts: Semantic 17, Web 5)
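The lifecycle split described on this slide can be sketched as follows. This is a minimal illustration that assumes "equal activity periods" means stages holding a near-equal number of posts; the function name `split_into_stages` is hypothetical and not taken from the talk.

```python
def split_into_stages(posts, k):
    """Split a chronologically ordered list of posts into k stages of near-equal size."""
    if k <= 0 or len(posts) < k:
        raise ValueError("need at least k posts to form k stages")
    base, extra = divmod(len(posts), k)
    stages, start = [], 0
    for i in range(k):
        # The first `extra` stages absorb one leftover post each.
        size = base + (1 if i < extra else 0)
        stages.append(posts[start:start + size])
        start += size
    return stages


# Twelve placeholder posts (already sorted by time) split at fidelity k = 5.
stages = split_into_stages(list(range(12)), 5)
```

Each stage can then be summarised by its in-degree, out-degree and lexical distributions.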
Mining User Evolution Signals
Applied Information Theory measures between stages 1, 2, 3, …, n
Cross-Entropy between consecutive stages: $H(P_1, P_2), H(P_2, P_3), \ldots, H(P_{n-1}, P_n)$
Per-user vector: $x = \big(H(P_1, P_2), H(P_2, P_3), \ldots, H(P_{n-1}, P_n)\big)$
Produces: $\{x_1, x_2, \ldots, x_M\} = X \in \mathbb{R}^{M \times S}$
M users’ measures derived from S stages
Measures: Period Entropy, Historical Cross-Entropy, Community Cross-Entropy
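A minimal sketch of the stage-to-stage measures, assuming each stage is summarised as a discrete distribution over items (terms, or users interacted with). The base-2 logarithm, the small `eps` guard against zero probabilities, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def distribution(items):
    """Turn a list of observed items into a relative-frequency distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: c / total for item, c in counts.items()}


def entropy(p):
    """Entropy H(P) of one stage's distribution, in bits."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)


def cross_entropy(p, q, eps=1e-9):
    """Cross-entropy H(P, Q) in bits; eps keeps log2 finite when Q lacks an item."""
    return -sum(px * math.log2(q.get(x, 0.0) + eps) for x, px in p.items() if px > 0)


# Hypothetical terms used by one user in three consecutive lifecycle stages.
stage_terms = [["semantic", "web", "web"], ["web", "web", "data"], ["data", "data", "web"]]
dists = [distribution(t) for t in stage_terms]
signal = [cross_entropy(dists[i], dists[i + 1]) for i in range(len(dists) - 1)]
```

The list `signal` plays the role of one user's row of consecutive-stage cross-entropies.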
User Evolution Signals 
[Figure: entropy H against Lifecycle Stage; panels: indegree k = 5, indegree k = 10, outdegree k = 5, outdegree k = 10]
For each stage: 
1. Derive values for each user 
2. Derive mean and 95% CI of churners & non-churners 
3. Plot curves of each class
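The three steps above can be sketched as below for one class of users. The 1.96 multiplier assumes a normal approximation for the 95% interval, which the slide does not specify; `mean_ci95` and `stage_curves` are hypothetical names.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    mean = statistics.fmean(values)
    if len(values) < 2:
        return mean, mean, mean  # no spread can be estimated from one value
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half, mean + half


def stage_curves(rows):
    """rows: one list of per-stage values per user; returns one (mean, lo, hi) per stage."""
    return [mean_ci95(list(col)) for col in zip(*rows)]


# Hypothetical entropy values for three churners over two lifecycle stages.
churner_rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
curve = stage_curves(churner_rows)
```

Running the same function over the non-churner rows gives the second curve to plot against the first.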
Period Variation: Entropy Signals 
[Figure: period entropy H against Lifecycle Stage; panels for each user property (indegree, outdegree, lexical) at increasing lifecycle fidelities (k = 5, 10, 20)]
Fig. 2. Period entropy distribution on ServerFault for different fidelity settings (k) for
users’ lifecycles and different measures of social (indegree and out degree) and lexical
dynamics. The green dashed line shows the non-churners, while the red solid line shows …
Non-Churners: 
• Share more connections (greater entropy) 
• Invest more and get more out of the community
Community-Comparisons: Cross-Entropy Signals
[Figure: community cross-entropy H against Lifecycle Stage; panels: indegree, outdegree and lexical at k = 5, 10, 20]
Fig. 4. Community cross-entropy distribution for different fidelity settings (k) for users’
lifecycles and different measures of social (indegree and out degree) and lexical dynamics.
Churners: 
• Diverge from community norms, but slower
Prediction Models
How can we exploit development information to
detect churners?
Gaussian Sequences
Assume that measure m (e.g. in-degree) is normally distributed for a given stage s
Rationale: uncommon overlap between 95% CI bounds
[Figure: H against Lifecycle Stage; panels: indegree k = 5, indegree k = 10, outdegree k = 5, outdegree k = 10]
Fig. 3. Period cross-entropy distribution on ServerFault for different fidelity settings
(k) for users’ lifecycles and different measures of social (indegree and out degree) and lexical dynamics.
[Figure: Gaussian density f(m) against m]
Gaussian Sequence is a chain of Gaussians for measure m over the k = |S| stages:
… that each correspond to a sequence of Gaussians measured over the k lifecycle stages:
Definition 1 (Gaussian Sequence). Let m be a given measurement, s be a given lifecycle stage drawn from the set of lifecycle stages $s \in S$, then m is said to be normally distributed on s and defined by $\mathcal{N}\big(\hat{\mu}_{m,s}, (\hat{\sigma}_{m,s})^2\big)$, where $\hat{\mu}_{m,s}$ and $\hat{\sigma}_{m,s}$ denote the maximum likelihood estimates of the mean and standard deviation respectively. Then the Gaussian Sequence of m is defined as follows:
\[
G_m = \Big( \mathcal{N}\big(\hat{\mu}_{m,1}, (\hat{\sigma}_{m,1})^2\big),\ \mathcal{N}\big(\hat{\mu}_{m,2}, (\hat{\sigma}_{m,2})^2\big),\ \ldots,\ \mathcal{N}\big(\hat{\mu}_{m,|S|}, (\hat{\sigma}_{m,|S|})^2\big) \Big).
\]
5.1 Single-Gaussian Sequence Model
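As an illustration of Definition 1, the sketch below fits one Gaussian per lifecycle stage for a single measure. It assumes the input is a matrix with one row per user and one column per stage, and it uses the population standard deviation because that is the maximum likelihood estimate; `fit_gaussian_sequence` and `normal_pdf` are hypothetical names.

```python
import math
import statistics


def fit_gaussian_sequence(rows):
    """Return [(mu_hat, sigma_hat), ...], one maximum likelihood pair per stage."""
    return [(statistics.fmean(col), statistics.pstdev(col)) for col in zip(*rows)]


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical in-degree entropies of two churners over three stages.
churner_rows = [[0.5, 0.4, 0.3], [0.7, 0.6, 0.5]]
g_churn = fit_gaussian_sequence(churner_rows)
```

Fitting the same function to the non-churners' rows yields the second sequence used by a dual model.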
Gaussian Sequence Prediction Models
… its domain a given user’s development … as its co-domain, hence: $f : \mathbb{R}^n \to [0, 1]$. … regression; (ii) a dual-Gaussian sequence … We now explain each in turn.

Single-Gaussian Sequence Model
Probability of u belonging to the churn Gaussian, in each measure and stage
… for influence on the churn probability; its inclusion is necessary because we may have an outlier measure for u and should limit over-fitting as a consequence. Note that this variable is indexed by both m and s as it is specific to both the lifecycle stage and the measure under inspection. Given our formulation of the churn probability in a particular lifecycle stage s and based on measure m, we can therefore derive the joint probability of u churning over the observed sequence of measures ($m \in M$) and his lifecycle stages ($s \in S$) as follows; we term this the Single-Gaussian Sequence Model:
\[
Q(u|b) = \prod_{m \in M} \prod_{s \in S} \rho \Big( \beta_{m,s}\, \mathcal{N}\big( f(u, m, s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \big) \Big)
\]
The parameter ρ smooths zero probability values given our joint calculation. Assuming we have |S| lifecycle stages, and |M| measures, then the slack variables are stored within a parameter vector b, where $\beta_{m,s} \in b$.

Dual-Gaussian Sequence Model
Probability of u belonging to the churn Gaussian minus the probability of belonging to the non-churn Gaussian, in each measure and stage
… the logistic regression model to predict … probability over observed measures and lifecycle stages as follows; we term this the Dual-Gaussian Sequence Model:
\[
Q(u|b) = \prod_{m \in M} \prod_{s \in S} \rho \Big[ \beta_{m,s}\, \mathcal{N}\big( f(u, m, s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \big) - (1 - \beta_{m,s})\, \mathcal{N}\big( f(u, m, s) \mid \hat{\mu}^{n}_{m,s}, (\hat{\sigma}^{n}_{m,s})^2 \big) \Big]
\]
ρ smooths zero probabilities (set to 0.1)
Here ρ acts as a smoother to chain together zero-probability values. Now, for this detection model, our objective is to minimise the squared-loss between a user’s forecasted churn probability and the observed churn label, given that the former is in the closed interval [0, 1] and the latter is from the set {0, 1}; our parameters are L2-regularised to control for over-fitting:
\[
\operatorname*{argmin}_{b^{*}} \sum_{(x_i, y_i) \in D} \big( y_i - Q(u|b) \big)^2 + \lambda \lVert b \rVert^2 \qquad (12)
\]
β ∈ b learnt using (single|dual) stochastic gradient descent
… each feature within the linear model … we used the maximum likelihood estimation … derived model is used to predict the churn … When inspecting each different measurement … lifecycle stage 1) for both churners and non-churners, …
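The two joint-probability models and the regularised squared loss can be sketched as follows. Several choices here are assumptions rather than the author's stated method: the slide does not spell out how ρ combines with each factor, so the sketch adds ρ to every factor (a multiplicative constant would not stop a zero density from zeroing the product); the regulariser is taken as λ times the sum of squared β values; and all function and variable names are hypothetical.

```python
import math

RHO = 0.1  # smoothing constant; the slide reports it was set to 0.1


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def q_single(u, beta, churn, rho=RHO):
    """u, beta: {(measure, stage): value}; churn: {(measure, stage): (mu, sigma)}."""
    q = 1.0
    for key, x in u.items():
        # ASSUMPTION: rho is added to each factor as a smoother.
        q *= rho + beta[key] * normal_pdf(x, *churn[key])
    return q


def q_dual(u, beta, churn, non_churn, rho=RHO):
    """Churn density minus non-churn density, weighted by beta, per measure and stage."""
    q = 1.0
    for key, x in u.items():
        b = beta[key]
        diff = b * normal_pdf(x, *churn[key]) - (1.0 - b) * normal_pdf(x, *non_churn[key])
        q *= rho + diff  # ASSUMPTION: same additive smoothing as above
    return q


def objective(data, beta, q_fn, lam):
    """Squared loss between labels and Q, plus an L2 penalty on beta."""
    loss = sum((y - q_fn(u, beta)) ** 2 for u, y in data)
    return loss + lam * sum(b * b for b in beta.values())


# One hypothetical user described by a single (measure, stage) pair.
key = ("indegree", 1)
user = {key: 0.0}
churn_params = {key: (0.0, 1.0)}
non_churn_params = {key: (0.0, 1.0)}
single_value = q_single(user, {key: 1.0}, churn_params)
dual_value = q_dual(user, {key: 0.5}, churn_params, non_churn_params)
cost = objective([(user, 1)], {key: 1.0},
                 lambda u, b: q_single(u, b, churn_params), lam=0.5)
```

Stochastic gradient descent would then adjust each β value to lower `objective`, one training instance at a time.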
Experiments 
¨ Aim: to maximise detection of churners 
¤ I.e. Area Under ROC Curve 
1. Do we perform better than the state of 
the art? 
2. How do we fare against existing 
classifiers? 
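For reference, the area under the ROC curve can be computed directly from its ranking interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch below is a plain pairwise version with a hypothetical function name; it is not the evaluation code used in the experiments.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels (1 = churner) and forecasted churn probabilities.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

A model that gives every user the same score lands at exactly 0.5 under this definition.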
Experimental Setup
Table IV. Number of instances within the training and testing datasets used
for the experiments across the different lifecycle fidelities. The number of
instances decreases as the lifecycle fidelity increases as we require each user
to have posted double the fidelity number of posts.
Fidelity   Facebook          SAP               ServerFault       Boards.ie
           |Train|  |Test|   |Train|  |Test|   |Train|  |Test|   |Train|  |Test|
5          306      72       1,099    302      1,229    338      6,338    1,700
10         204      48       716      205      688      177      4,979    1,334
20         123      27       448      129      375      84       3,635    995
Stage 1: Model Tuning
¤ 10-Fold Cross-Validation to tune model hyperparameters
Stage 2: Model Testing
¤ Apply tuned models to held out test splits
Web Science Conference 2011
Baseline 1 (B1-J48): Decision tree with measures as features
Baseline 2 (B2-NB): Social network features (e.g. centrality)
… instances, deriving the error in prediction and the derivative of each parameter and thus updating accordingly. As a result our update rule is the following, for a given training instance with index $i$: $\beta_j = \beta_j - \eta \nabla_i \beta_j$ … model we have two hyperparameters that must be tuned: the learning rate $\eta$ and weight $\lambda$. The use of elastic-net regularisation means that we can examine the spectrum … using solely a lasso penalty ($\alpha = 1$), or solely a ridge penalty ($\alpha = 0$), or somewhere … ($\alpha = 0.5$). Rather than tuning linear models using … $\alpha$ as a hyperparameter, we adopted a different approach … settings: $\alpha = \{0, 0.5, 1\}$, thereby tuning … a hyperparameter … for each setting. We explain in the following section the model tuning approach that …
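A minimal sketch of the tuning stage, assuming a plain shuffled 10-fold split over training-instance indices and a gradient step of the form "parameter minus learning rate times gradient". The names `k_fold_indices` and `sgd_step`, the fixed seed, and the example numbers are illustrative assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, held_out) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def sgd_step(params, grads, eta):
    """One stochastic gradient descent update with learning rate eta."""
    return [p - eta * g for p, g in zip(params, grads)]


# Split 25 hypothetical training instances into 10 folds.
splits = list(k_fold_indices(25, k=10))
updated = sgd_step([1.0, 2.0], [0.5, -1.0], eta=0.1)
```

Each candidate pair of learning rate and regularisation weight would be scored on the held-out folds, and the best pair carried forward to the test split.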
… average number of posts within participated threads, popularity (% of user authored
posts that receive replies), initialisation (% of threads authored by the
user), and polarity. We first tested the J48 classifier, as used in [2], but found
this to be poor performing, therefore we used the Naive Bayes classifier instead.
Results (ROC) 
Table 2. Area under the Receiver Operator Characteristic (ROC) Curve results for
the different Gaussian Sequence Models and Learning Procedures
                         Baselines         SGD                 D-SGD
Platform      Fidelity   B1-J48   B2-NB    Single-N   Dual-N   Single-N   Dual-N
Facebook      5          0.559    0.461    0.570      0.472    0.548      0.478
              10         0.531    0.491    0.569      0.554    0.593      0.545
              20         0.478    0.444    0.664      0.500    0.528      0.583
SAP           5          0.594    0.497    0.573      0.527    0.545      0.533
              10         0.533    0.494    0.553      0.503    0.584      0.590
              20         0.478    0.582    0.500      0.500    0.540      0.525
ServerFault   5          0.583    0.530    0.522      0.556    0.583      0.577
              10         0.534    0.546    0.500      0.557    0.569      0.589
              20         0.463    0.530    0.500      0.634    0.486      0.484
Boards.ie     5          0.504    0.611    0.524      0.547    0.526      0.518
              10         0.512    0.593    0.500      0.539    0.501      0.496
              20         0.560    0.553    0.500      0.501    0.500      0.502
Significantly outperform baselines for 2/4 datasets 
No clear winner among Gaussian Sequence Models 
6.3 Results: Churn Prediction Performance
For the model testing phase of the experiments, we took the best performing hyperparameters …
Conclusions + Future Work 
¨ Gaussian Sequences allow user measures to be 
chained together in a joint probability model 
¤ Mined from users’ development trajectories 
¤ Examined across various lifecycle fidelities 
¨ Exceed performance of baselines for 2/4 datasets 
¨ FW1: Churn point prediction + user rankings 
¨ FW2: Towards a theory of churner development
Questions? 
@mrowebot | m.rowe@lancaster.ac.uk 
http://www.lancaster.ac.uk/staff/rowem/ 

More Related Content

Viewers also liked

Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataMatthew Rowe
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureMatthew Rowe
 
Information Extraction from the Web - In today's web
Information Extraction from the Web - In today's webInformation Extraction from the Web - In today's web
Information Extraction from the Web - In today's webBenjamin Habegger
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Matthew Rowe
 
Open Calais
Open CalaisOpen Calais
Open Calaisymark
 
Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Anna Lisa Gentile
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...Matthew Rowe
 

Viewers also liked (9)

Data.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked DataData.dcs: Converting Legacy Data into Linked Data
Data.dcs: Converting Legacy Data into Linked Data
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, Future
 
Information Extraction from the Web - In today's web
Information Extraction from the Web - In today's webInformation Extraction from the Web - In today's web
Information Extraction from the Web - In today's web
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...
 
Open Calais
Open CalaisOpen Calais
Open Calais
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
 
Comparing Ontotext KIM and Apache Stanbol
Comparing Ontotext KIM and Apache StanbolComparing Ontotext KIM and Apache Stanbol
Comparing Ontotext KIM and Apache Stanbol
 

Similar to Predicting Online Community Churners using Gaussian Sequences

Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...
Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...
Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...eSAT Publishing House
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HCpearlapplepen
 
Malmquist Total Factor Productivity Index for Modeling Dairy Sector
Malmquist Total Factor Productivity Index for Modeling Dairy SectorMalmquist Total Factor Productivity Index for Modeling Dairy Sector
Malmquist Total Factor Productivity Index for Modeling Dairy SectorARNAB ROY
 
PSO based NNIMC for a Conical Tank Level Process
PSO based NNIMC for a Conical Tank Level ProcessPSO based NNIMC for a Conical Tank Level Process
PSO based NNIMC for a Conical Tank Level ProcessIRJET Journal
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinVincenzo Gulisano
 
Change Impact Analysis for Natural Language Requirements
Change Impact Analysis for Natural Language RequirementsChange Impact Analysis for Natural Language Requirements
Change Impact Analysis for Natural Language RequirementsLionel Briand
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsRup Chowdhury
 
Unit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundUnit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundSebastiano Panichella
 
Ndss 2016 game_bot_final_no_video
Ndss 2016 game_bot_final_no_videoNdss 2016 game_bot_final_no_video
Ndss 2016 game_bot_final_no_videoEun-Jo Lee
 
7 tools for your devops stack
7 tools for your devops stack7 tools for your devops stack
7 tools for your devops stackKris Buytaert
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web applicationVasileiosMezaris
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020NECST Lab @ Politecnico di Milano
 
Introduction to Six Sigma
Introduction to Six SigmaIntroduction to Six Sigma
Introduction to Six SigmaKanriConsulting
 

Similar to Predicting Online Community Churners using Gaussian Sequences (20)

Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...
Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...
Design and implementation of synchronous 4 bit up counter using 180 nm cmos p...
 
Implementing error budgets
Implementing error budgetsImplementing error budgets
Implementing error budgets
 
TQM
TQMTQM
TQM
 
Applying QbD to Biotech Process Validation
Applying QbD to Biotech Process ValidationApplying QbD to Biotech Process Validation
Applying QbD to Biotech Process Validation
 
scical manual fx-250HC
scical manual fx-250HCscical manual fx-250HC
scical manual fx-250HC
 
ppt2.pptx
ppt2.pptxppt2.pptx
ppt2.pptx
 
Malmquist Total Factor Productivity Index for Modeling Dairy Sector
Malmquist Total Factor Productivity Index for Modeling Dairy SectorMalmquist Total Factor Productivity Index for Modeling Dairy Sector
Malmquist Total Factor Productivity Index for Modeling Dairy Sector
 
PSO based NNIMC for a Conical Tank Level Process
PSO based NNIMC for a Conical Tank Level ProcessPSO based NNIMC for a Conical Tank Level Process
PSO based NNIMC for a Conical Tank Level Process
 
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream JoinScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
 
Change Impact Analysis for Natural Language Requirements
Change Impact Analysis for Natural Language RequirementsChange Impact Analysis for Natural Language Requirements
Change Impact Analysis for Natural Language Requirements
 
Lab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer GraphicsLab Practices and Works Documentation / Report on Computer Graphics
Lab Practices and Works Documentation / Report on Computer Graphics
 
Practica1 digi2
Practica1 digi2Practica1 digi2
Practica1 digi2
 
Unit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth RoundUnit Testing Tool Competition-Eighth Round
Unit Testing Tool Competition-Eighth Round
 
.pptx
.pptx.pptx
.pptx
 
Ndss 2016 game_bot_final_no_video
Ndss 2016 game_bot_final_no_videoNdss 2016 game_bot_final_no_video
Ndss 2016 game_bot_final_no_video
 
7 tools for your devops stack
7 tools for your devops stack7 tools for your devops stack
7 tools for your devops stack
 
Video smart cropping web application
Video smart cropping web applicationVideo smart cropping web application
Video smart cropping web application
 
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
A Methodology for Automatic GPU Kernel Optimization - NECSTTechTalk 4/06/2020
 
Introduction to Six Sigma
Introduction to Six SigmaIntroduction to Six Sigma
Introduction to Six Sigma
 
ch06.ppt
ch06.pptch06.ppt
ch06.ppt
 

More from Matthew Rowe

Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMatthew Rowe
 
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Matthew Rowe
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web SystemsMatthew Rowe
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsMatthew Rowe
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research AgendaMatthew Rowe
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social SemanticsMatthew Rowe
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesMatthew Rowe
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsMatthew Rowe
 
Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsMatthew Rowe
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataMatthew Rowe
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeMatthew Rowe
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebMatthew Rowe
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataMatthew Rowe
 
Integrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesIntegrating and Interpreting Social Data from Heterogeneous Sources
Integrating and Interpreting Social Data from Heterogeneous SourcesMatthew Rowe
 
Inferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesInferring Web Citations using Social Data and SPARQL Rules
Inferring Web Citations using Social Data and SPARQL RulesMatthew Rowe
 
The Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyThe Credibility of Digital Identity Information on the Social Web: A User Study
The Credibility of Digital Identity Information on the Social Web: A User StudyMatthew Rowe
 
Harnessing the Social Web: The Science of Identity Disambiguation
Harnessing the Social Web: The Science of Identity DisambiguationHarnessing the Social Web: The Science of Identity Disambiguation
Harnessing the Social Web: The Science of Identity DisambiguationMatthew Rowe
 

More from Matthew Rowe (17)

Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online Communities
 
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Predicting Online Community Churners using Gaussian Sequences

  • 1. PREDICTING ONLINE COMMUNITY CHURNERS USING GAUSSIAN SEQUENCES MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS M.ROWE@LANCASTER.AC.UK | @MROWEBOT International Conference on Social Informatics 2014 Barcelona, Spain
  • 2. The Issue of Churn. Churner: a user, or subscriber, who stops using a service. Churners = Loss, both social (social capital, expertise, vibrancy) and financial.
  • 3. Predicting Churners. Engineered Static Features. How do churners and non-churners develop? How can we exploit development information to detect churners?
  • 4. Outline. Defining churn in the context of online communities; User Lifecycle Model (mining development signals; churners vs. non-churners); Prediction Models (Gaussian Sequences; Single/Dual-Gaussian Sequence Model); Experiments; Conclusions + Future Work.
  • 5. Defining Churners. [Figure: posts frequency over time, 2008 to 2012, marking the analysis point and the window in which churners posted for the final time. Also plotted: the distribution p(x) of Δ, the maximum number of days between posts, on logarithmic axes with the mean and median marked. Datasets: Facebook, SAP, ServerFault, Boards.ie.] Split each dataset's users into training (80%) and testing (20%).
      Paper excerpt: "… (i.e. in-degree, out-degree, clustering coefficient, closeness centrality, etc.), inducing a J48 decision tree to differentiate between churners and non-churners when using social network properties formed from the reply-to graph of the online communities. In this paper, we implement this model as our baseline by engineering the same features using the same experimental setup. Our approach differs from existing work by assessing churners' and non-churners' development signals, and inducing a joint-probability function from such information."
      Table 1. Splits of users within the datasets and the churn window duration
      Platform       #Churners   #Non-churners   Churn Window
      Facebook           1,033           1,199   [04-11-2011, 28-08-2012]
      SAP               10,421           7,255   [29-11-2009, 07-09-2010]
      Server Fault      12,314          11,144   [13-06-2010, 24-12-2010]
      Boards.ie         65,528       6,120,008   [01-01-2005, 13-02-2008]
  • 6. User Lifecycles. We divide each user's lifetime into k equal activity periods (stages 1, 2, 3, …, k) and experiment with different settings of k = {5, 10, 20}. Within each stage s we model three properties, s = {in-degree, out-degree, lexical}: in-degree models the actions to user u by other users, out-degree models the actions by user u to other users, and lexical models the terms used by user u as term counts (e.g. Semantic: 17, Web: 5).
  • 7. Mining User Evolution Signals. Applied information theory measures between stages: period entropy, historical cross-entropy, and community cross-entropy. Cross-entropy is computed between consecutive stages 1, 2, 3, …, n, giving each user a vector $x = (H(P_1, P_2), H(P_2, P_3), \ldots, H(P_{n-1}, P_n))$. This produces $\{x_1, x_2, \ldots, x_M\} = X \in \mathbb{R}^{M \times S}$: M users' measures derived from S stages.
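The cross-entropy signal between consecutive stages can be sketched in Python as follows. This is a minimal illustration rather than the author's implementation: the function names, the base-2 logarithm, and the small additive smoothing constant `eps` (used so that items absent from the later stage do not produce log 0) are all assumptions, and the term counts are toy values.

```python
import math

def to_distribution(counts, support, eps=1e-9):
    """Turn raw counts into a smoothed probability distribution over `support`.
    `eps` is an assumed additive-smoothing constant, not taken from the slides."""
    total = sum(counts.get(item, 0) + eps for item in support)
    return {item: (counts.get(item, 0) + eps) / total for item in support}

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log2 Q(x), measured in bits."""
    return -sum(p[x] * math.log2(q[x]) for x in p)

def stage_signals(stage_counts):
    """Return [H(P1, P2), H(P2, P3), ..., H(Pn-1, Pn)] for one user."""
    support = sorted(set().union(*stage_counts))
    dists = [to_distribution(c, support) for c in stage_counts]
    return [cross_entropy(dists[i], dists[i + 1]) for i in range(len(dists) - 1)]

# Toy term counts for one user over three lifecycle stages (hypothetical data).
stages = [{"semantic": 17, "web": 5}, {"semantic": 10, "web": 12}, {"web": 3}]
signals = stage_signals(stages)
```

A user with n stages yields n - 1 values; stacking these vectors for all users gives the matrix X described on the slide.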
  • 8. User Evolution Signals. For each stage: (1) derive values for each user; (2) derive the mean and 95% CI for churners and non-churners; (3) plot the curve of each class. [Figure: entropy H against lifecycle stage; panels: indegree k = 5, indegree k = 10, outdegree k = 5, outdegree k = 10.]
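The three steps above can be sketched as below. The function names are hypothetical, and the interval uses the normal approximation (mean ± 1.96 standard errors), which is an assumption; the slides do not say how the 95% CI was computed.

```python
import math
import statistics

def mean_ci95(values):
    """Mean and 95% confidence interval under a normal approximation (assumed)."""
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width

def class_curves(signals, labels):
    """signals: one list of per-stage values per user.
    labels: 1 for churner, 0 for non-churner.
    Returns, per class, a list of (mean, lower, upper), one entry per stage."""
    curves = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(signals, labels) if y == cls]
        curves[cls] = [mean_ci95(stage) for stage in zip(*rows)]
    return curves
```

Each class needs at least two users, since the sample standard deviation is undefined for a single value.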
  • 9. Period Variation: Entropy Signals. [Figure: entropy H against lifecycle stage, with panels for each user property (indegree, outdegree, lexical) at increasing lifecycle fidelities (k = 5, 10, 20).] Fig. 2. Period entropy distribution on ServerFault for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics. The green dashed line shows the non-churners, while the red solid line shows … Non-churners: share more connections (greater entropy); invest more and get more out of the community.
  • 10. Community Comparisons: Cross-Entropy Signals. [Figure: cross-entropy H against lifecycle stage, with panels for indegree, outdegree and lexical at k = 5, 10 and 20.] Fig. 4. Community cross-entropy distribution for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics. Churners: diverge from community norms, but slower.
  • 11. Prediction Models. How can we exploit development information to detect churners?
  • 12. Gaussian Sequences. Assume that a measure m (e.g. in-degree) is normally distributed for a given stage s. Rationale: overlap between the 95% CI bounds is uncommon. A Gaussian Sequence is a chain of Gaussians for measure m over the k = |S| stages.
      Definition 1 (Gaussian Sequence). Let m be a given measurement and s a given lifecycle stage drawn from the set of lifecycle stages, $s \in S$. Then m is said to be normally distributed on s and defined by $\mathcal{N}(\hat{\mu}_{m,s}, (\hat{\sigma}_{m,s})^2)$, where $\hat{\mu}_{m,s}$ and $\hat{\sigma}_{m,s}$ denote the maximum likelihood estimates of the mean and standard deviation respectively. The Gaussian Sequence of m is then defined as:
      $$G_m = \left( \mathcal{N}(\hat{\mu}_{m,1}, (\hat{\sigma}_{m,1})^2),\ \mathcal{N}(\hat{\mu}_{m,2}, (\hat{\sigma}_{m,2})^2),\ \ldots,\ \mathcal{N}(\hat{\mu}_{m,|S|}, (\hat{\sigma}_{m,|S|})^2) \right)$$
      [Figure: H against lifecycle stage for indegree and outdegree at k = 5, 10 and 20, plus a plot of the density f(m) against m.] Fig. 3. Period cross-entropy distribution on ServerFault for different fidelity settings (k) for users' lifecycles and different measures of social (indegree and outdegree) and lexical dynamics.
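Fitting a Gaussian Sequence for one measure can be sketched as below. The function name and the toy values are hypothetical; the only thing taken from the definition is that each stage gets the maximum likelihood estimates of the mean and standard deviation, which is why the population standard deviation (`pstdev`, dividing by n) is used rather than the sample one.

```python
import statistics

def fit_gaussian_sequence(values_by_user):
    """values_by_user: one list per user, holding that user's value of the
    measure at each lifecycle stage. Returns [(mu_hat, sigma_hat), ...],
    one maximum likelihood Gaussian per stage."""
    return [
        (statistics.fmean(stage), statistics.pstdev(stage))
        for stage in zip(*values_by_user)
    ]

# Toy in-degree entropy values for three churners over k = 3 stages
# (illustrative numbers only, not taken from the paper).
churner_values = [[0.70, 0.62, 0.55], [0.66, 0.60, 0.49], [0.74, 0.58, 0.52]]
g_m = fit_gaussian_sequence(churner_values)
```

In practice one such sequence would be fitted per measure, and separately for churners and non-churners in the training split.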
  • 13. Gaussian Sequence Prediction Models. The prediction function takes a given user's development signals as its domain and the interval [0, 1] as its co-domain, hence $f : \mathbb{R}^n \to [0, 1]$.
      Each term carries a slack variable $\beta_{m,s}$ that governs its influence on the churn probability. Its inclusion is necessary because we may have an outlier measure for u and should limit over-fitting as a consequence. The variable is indexed by both m and s because it is specific to both the lifecycle stage and the measure under inspection. Assuming |S| lifecycle stages and |M| measures, the slack variables are stored within a parameter vector b, where $\beta_{m,s} \in b$.
      Single-Gaussian Sequence Model: the probability of u belonging to the churn Gaussian, in each measure and stage. Given the churn probability in a particular lifecycle stage s based on measure m, the joint probability of u churning over the observed measures ($m \in M$) and lifecycle stages ($s \in S$) is:
      $$Q(u \mid b) = \prod_{m \in M} \prod_{s \in S} \left( \rho + \beta_{m,s}\, \mathcal{N}\!\left( f(u, m, s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \right) \right)$$
      Dual-Gaussian Sequence Model: the probability of u belonging to the churn Gaussian minus the probability of belonging to the non-churn Gaussian, in each measure and stage:
      $$Q(u \mid b) = \prod_{m \in M} \prod_{s \in S} \left( \rho + \left[ \beta_{m,s}\, \mathcal{N}\!\left( f(u, m, s) \mid \hat{\mu}^{c}_{m,s}, (\hat{\sigma}^{c}_{m,s})^2 \right) - (1 - \beta_{m,s})\, \mathcal{N}\!\left( f(u, m, s) \mid \hat{\mu}^{n}_{m,s}, (\hat{\sigma}^{n}_{m,s})^2 \right) \right] \right)$$
      In both models ρ smooths zero probabilities so that they can be chained together in the joint calculation (set to 0.1).
      $\beta \in b$ is learnt using (single | dual) stochastic gradient descent. The objective is to minimise the squared loss between a user's forecasted churn probability, which is in the closed interval [0, 1], and the observed churn label, which is from the set {0, 1}. The parameters are L2-regularised to control for over-fitting:
      $$\underset{b^{*}}{\operatorname{argmin}} \sum_{(x_i, y_i) \in D} \left( y_i - Q(u \mid b) \right)^2 + \lambda \lVert b \rVert^2 \qquad (12)$$
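The two models and the regularised objective can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: it assumes that f(u, m, s) denotes user u's value of measure m at stage s, that ρ is added to every factor of the product, and that the regularisation weight is λ. All function and variable names are hypothetical.

```python
import math

RHO = 0.1  # smoothing constant; the slide states it is set to 0.1

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def q_single(x, beta, churn, rho=RHO):
    """Single-Gaussian model. x, beta and churn are dicts keyed by
    (measure, stage); churn[key] holds the fitted (mu, sigma) of churners."""
    q = 1.0
    for key, value in x.items():
        q *= rho + beta[key] * normal_pdf(value, *churn[key])
    return q

def q_dual(x, beta, churn, non_churn, rho=RHO):
    """Dual-Gaussian model: churn density minus non-churn density per factor."""
    q = 1.0
    for key, value in x.items():
        p_c = normal_pdf(value, *churn[key])
        p_n = normal_pdf(value, *non_churn[key])
        q *= rho + (beta[key] * p_c - (1.0 - beta[key]) * p_n)
    return q

def objective(scores, labels, beta, lam):
    """Squared loss between scores Q and labels in {0, 1}, plus an L2 penalty."""
    squared_loss = sum((y - q) ** 2 for q, y in zip(scores, labels))
    return squared_loss + lam * sum(b * b for b in beta.values())
```

For example, with a single key, a user value of 0.0, a churn Gaussian of (0.0, 1.0) and β = 1.0, `q_single` evaluates ρ plus the peak of the standard normal density.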
  • 14. Experiments. Aim: to maximise detection of churners, i.e. the Area Under the ROC Curve. (1) Do we perform better than the state of the art? (2) How do we fare against existing classifiers?
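As a reminder of what the evaluation measure captures, the area under the ROC curve equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half. The sketch below computes it directly from that definition; the function name is hypothetical and this quadratic-time version is meant for illustration only.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.
    labels: 1 for churner, 0 for non-churner."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A model that gives every user the same score obtains 0.5 under this definition, which is the chance level.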
  • 15. Experimental Setup. Stage 1, Model Tuning: 10-fold cross-validation to tune model hyperparameters. Stage 2, Model Testing: apply the tuned models to the held-out test splits. Baselines: Baseline 1 (B1-J48), a decision tree, and Baseline 2 (B2-NB); the baseline features include social network features (e.g. centrality). Reference noted on the slide: Web Science Conference 2011.
      Table IV. Number of instances within the training and testing datasets used for the experiments across the different lifecycle fidelities. The number of instances decreases as the lifecycle fidelity increases, as we require each user to have posted double the fidelity number of posts.
      Fidelity   Facebook          SAP               ServerFault       Boards.ie
                 |Train|  |Test|   |Train|  |Test|   |Train|  |Test|   |Train|  |Test|
      5             306      72     1,099     302     1,229     338     6,338   1,700
      10            204      48       716     205       688     177     4,979   1,334
      20            123      27       448     129       375      84     3,635     995
      Paper excerpt: "… instances, deriving the error in prediction and the derivative of each parameter and thus updating accordingly. As a result our update rule is the following, for a given training instance with index i: $\beta_j = \beta_j - \eta \nabla_i \beta_j$. … we have two hyperparameters that must be tuned: the learning rate $\eta$ and the regularisation weight $\lambda$. The use of elastic-net regularisation means that we can examine the spectrum from using solely a lasso penalty ($\alpha = 1$), to solely a ridge penalty ($\alpha = 0$), or somewhere in between ($\alpha = 0.5$). Rather than tuning $\alpha$ as a hyperparameter, we adopted a different approach with fixed settings, $\alpha = \{0, 0.5, 1\}$, thereby tuning $\lambda$ as a hyperparameter for each setting. We explain in the following section the model tuning approach that …"
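One stochastic gradient descent update can be sketched as below. This is a simplified illustration under several assumptions: it uses the single-Gaussian form Q = ∏_j (ρ + β_j N_j) with (measure, stage) pairs flattened into one index j, a squared loss with a plain L2 (ridge) penalty rather than the elastic-net penalty mentioned in the excerpt, and hypothetical names throughout. It also assumes every factor ρ + β_j N_j is non-zero.

```python
import math

def sgd_step(beta, densities, y, eta, lam, rho=0.1):
    """One update of all slack variables for a single training instance.
    beta:      current slack variables, one per (measure, stage) pair
    densities: churn-Gaussian density N_j of the user's value for each pair
    y:         observed label, 1 for churner and 0 for non-churner
    eta, lam:  learning rate and regularisation weight (hyperparameters)"""
    factors = [rho + b * d for b, d in zip(beta, densities)]
    q = math.prod(factors)
    updated = []
    for b, d, factor in zip(beta, densities, factors):
        dq_db = d * q / factor                        # dQ/d(beta_j)
        grad = -2.0 * (y - q) * dq_db + 2.0 * lam * b  # gradient of loss + penalty
        updated.append(b - eta * grad)
    return updated
```

With one factor, β = 0.5, density 1.0, y = 1, η = 0.1 and λ = 0, the prediction moves from 0.6 to 0.68, so the squared error shrinks after the step.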
  • 16. Results (ROC). Significantly outperform baselines for 2/4 datasets. No clear winner among Gaussian Sequence Models.
      Paper excerpt: "… average number of posts within participated threads, popularity (% of user authored posts that receive replies), initialisation (% of threads authored by the user), and polarity. We first tested the J48 classifier, as used in [2], but found this to be poor performing, therefore we used the Naive Bayes classifier instead."
      Table 2. Area under the Receiver Operator Characteristic (ROC) Curve results for the different Gaussian Sequence Models and Learning Procedures
                                Baselines          SGD                  D-SGD
      Platform      Fidelity    B1-J48   B2-NB     Single-N   Dual-N    Single-N   Dual-N
      Facebook      5           0.559    0.461     0.570      0.472     0.548      0.478
                    10          0.531    0.491     0.569      0.554     0.593      0.545
                    20          0.478    0.444     0.664      0.500     0.528      0.583
      SAP           5           0.594    0.497     0.573      0.527     0.545      0.533
                    10          0.533    0.494     0.553      0.503     0.584      0.590
                    20          0.478    0.582     0.500      0.500     0.540      0.525
      ServerFault   5           0.583    0.530     0.522      0.556     0.583      0.577
                    10          0.534    0.546     0.500      0.557     0.569      0.589
                    20          0.463    0.530     0.500      0.634     0.486      0.484
      Boards.ie     5           0.504    0.611     0.524      0.547     0.526      0.518
                    10          0.512    0.593     0.500      0.539     0.501      0.496
                    20          0.560    0.553     0.500      0.501     0.500      0.502
      Paper excerpt: "6.3 Results: Churn Prediction Performance. For the model testing phase of the experiments, we took the best performing hyperparameters …"
  • 17. Conclusions + Future Work. Gaussian Sequences allow user measures to be chained together in a joint probability model: mined from users' development trajectories and examined across various lifecycle fidelities. They exceed the performance of the baselines for 2/4 datasets. FW1: churn point prediction + user rankings. FW2: towards a theory of churner development.
  • 18. Questions? @mrowebot | m.rowe@lancaster.ac.uk http://www.lancaster.ac.uk/staff/rowem/