Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction

Presentation from the International Conference on Data Mining 2013. Dallas, USA.

Transcript

  • 1. MINING USER LIFECYCLES FROM ONLINE COMMUNITY PLATFORMS AND THEIR APPLICATION TO CHURN PREDICTION
    Dr. Matthew Rowe, School of Computing and Communications
    @mrowebot | m.rowe@lancaster.ac.uk
    International Conference on Data Mining 2013, Dallas, USA
  • 2. Identity Development: Offline
    - Development happens through stages
    - Development = conflicts
  • 3. User Development: 'Online'
    - Recently studied in isolated dimensions:
      - Socially (telecoms networks: Miritello et al. 2013): communication networks tend to a capacity
      - Lexically (online communities: Danescu-Niculescu-Mizil et al. 2013): language adapts to the community, before diverging
    - Without analysing development:
      a) relative to earlier signals
      b) relative to the community of interaction
  • 4. Understanding User Development enables…
    1. Recommender Systems
       - Developmental: stage-based user neighbourhoods (e.g. user-kNN)
       - Modelling taste evolution (e.g. biases in MF)
       - Current/future work (more later)
    2. Churn Prediction
       - Forecast churners from development signals
       - Focus of this talk
    [Figures: average rating over time for the MovieLens categories 'Directorial Debut Films' and '1990s Comedy Films'; ROC curves (true positive rate vs. false positive rate) for the Entropy, Period Entropy, Community Entropy, In-degree, Out-degree, Lexical and All feature sets]
  • 5. Outline
    - Datasets: Online Community Platforms
    - Defining User Lifecycles and Properties
    - Mining Lifecycle Trajectories
    - Predicting Churners
    - Findings and Conclusions
    - Future Work
  • 6. Datasets: Online Community Platforms
    1. Facebook 'Open University' Groups
       - Containing discussions about courses and degrees
    2. SAP Community Network
       - Question-answering system for SAP technologies
    3. Server Fault
       - Stack Overflow subsidiary site for server-related issues
    For our examination of user lifecycles we used data collected from Facebook, the SAP Community Network (SAP) and Server Fault, considering only users who had posted more than 40 times within their lifetime on the platform. The Facebook dataset was collected from groups discussing Open University courses, where users talked about their issues with the courses and guidance on studying. The SAP Community Network is a community question-answering system related to SAP technologies, where users post questions and provide answers related to technical issues. Similarly, Server Fault is part of the Stack Overflow question-answering site collection and covers server-related issues. We divided each platform's users into 80%/20% splits for training (and analysis) and testing, using the former to examine user development and the latter for the churn detection experiments.
    Table 1. Statistics of the online community platform datasets.
      Platform       Time Span                  Post Count   User Count
      Facebook       [18-08-2007, 24-01-2013]   118,432      4,745
      SAP            [15-12-2003, 20-07-2011]   427,221      32,926
      Server Fault   [01-08-2008, 31-03-2011]   234,790      33,285
  • 7. User Lifecycles: Derivation
    - Offline lifecycle periods: Primary School → High School → University → Postgrad → Postdoc → Lecturing
    - Lifecycle periods of a potential question-answering system user (conjecture!), from first post to last post: Novice Users → Asking Questions → Asking & Answering Questions → Answering Questions
    - In reality we do not know the labels; however, we can split the lifetime by equal time intervals: 1, 2, 3, …, n
    - Yet users distribute their activity non-uniformly across lifecycles
  • 8. User Lifecycles: Properties
    - We set n = 20
    - Divide the lifetime into equal activity periods: #posts in period 1 = #posts in period 2 = … = #posts in period n
    - Capture period-specific user properties (in period s):
      - In-degree distribution: relative frequency distribution of senders to user u in period s
      - Out-degree distribution: relative frequency distribution of recipients from user u in s
      - Term distribution: relative frequency distribution of terms used by u in s
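A minimal Python sketch of the equal-activity split and the per-period relative frequency distributions described on the slide. The `Post` record, its field names, and spreading any remainder posts over the first periods are illustrative assumptions, not details from the paper; for the in-degree distribution, the posts sent to the user are assumed to be already assigned to period s.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Post:
    # Hypothetical post record: time, who wrote it, who it was addressed to, its terms.
    timestamp: float
    author: str
    recipient: str
    terms: list = field(default_factory=list)

def equal_activity_periods(posts, n=20):
    """Split a user's posts (ordered by time) into n periods with near-equal post counts."""
    posts = sorted(posts, key=lambda p: p.timestamp)
    size, extra = divmod(len(posts), n)
    periods, start = [], 0
    for s in range(n):
        end = start + size + (1 if s < extra else 0)  # first `extra` periods take one more post
        periods.append(posts[start:end])
        start = end
    return periods

def relative_frequencies(items):
    """Relative frequency distribution over the given items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def in_degree_distribution(incoming_posts):
    # Senders of posts addressed to the user within period s.
    return relative_frequencies(p.author for p in incoming_posts)

def out_degree_distribution(period_posts):
    # Recipients of the user's own posts within period s.
    return relative_frequencies(p.recipient for p in period_posts)

def term_distribution(period_posts):
    # Terms used by the user within period s.
    return relative_frequencies(t for p in period_posts for t in p.terms)
```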
  • 9. Analysing Development: Period Entropy
    - Variation in users' properties across periods
    - Computed period entropy for each property
    - Generally stable trends: consistent variance in communication and terms
    [Figure 1: Entropies of lifetime-stage distributions formed from users' in-degrees, out-degrees and lexical terms; panels (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: lifecycle stages (0 to 1); y-axis: distribution entropy; series: Facebook, SAP, Server Fault]
    (Paper excerpt) "…they develop in the community (for SAP and Facebook), however Server Fault users remain relatively stable. This could be due to the relatively minor interaction effects that take place on Server Fault: users largely lurk on the platform to seek answers to questions, and thus do not contribute unless it is necessary (i.e. they feel that their expertise is sufficient to answer a question or that a new question is required); as a result it is likely that users have an implicit understanding of how one should formulate a post and thus the language that should be used."
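Period entropy is, in essence, the Shannon entropy of a user's relative frequency distribution in one lifecycle period. The sketch below assumes base-2 logarithms; the slide does not state the base, so treat that (and the function names) as assumptions.

```python
import math

def entropy(dist, base=2.0):
    """Shannon entropy H(P) = -sum_x P(x) log P(x) of a relative frequency distribution."""
    return -sum(p * math.log(p, base) for p in dist.values() if p > 0)

def period_entropies(period_dists, base=2.0):
    """Entropy trajectory for one user property: one value per lifecycle period."""
    return [entropy(d, base) for d in period_dists]
```

A user who spreads messages evenly over four contacts in a period scores 2 bits; a user who only talks to one contact scores 0.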
  • 10. Analysing Development: Period Cross-Entropy
    - Changes in properties relative to earlier periods
    - Computed the cross-entropy for each property
    - Convergence on prior properties
    - Convergence: lack of communication with new people, or use of new terms
    [Figure 2: Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with previous lifecycle periods; panels (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: lifecycle stages; y-axis: cross entropy; series: Facebook, SAP, Server Fault. We see a consistent reduction in the cross-entropies over time.]
    (Paper excerpt) "…consistently across the platforms, users are contacted by people who have contacted them before and fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where, despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve."
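A sketch of a period cross-entropy trajectory, H(P, Q) = -Σ P(x) log Q(x), comparing a user's distribution in period s with earlier periods. Which earlier period(s) the paper compares against is not recoverable from the slide: this sketch takes the minimum cross-entropy over all earlier periods, which is an assumption. The additive smoothing `eps` is also an assumption, used to keep the value finite when a contact or term never appeared before.

```python
import math

def cross_entropy(p, q, eps=1e-6, base=2.0):
    """H(P, Q) = -sum_x P(x) log Q(x), with Q additively smoothed over the joint support."""
    support = set(p) | set(q)
    z = sum(q.get(x, 0.0) + eps for x in support)  # renormalise the smoothed Q
    return -sum(px * math.log((q.get(x, 0.0) + eps) / z, base)
                for x, px in p.items() if px > 0)

def period_cross_entropies(period_dists, eps=1e-6, base=2.0):
    """For each period s >= 2: the minimum cross-entropy against any earlier period."""
    return [min(cross_entropy(period_dists[s], period_dists[j], eps, base) for j in range(s))
            for s in range(1, len(period_dists))]
```

Low values mean the user keeps talking to the same people (or reusing the same terms); a jump signals novel contacts or vocabulary.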
  • 11. Analysing Development: Community Cross-Entropy
    - Difference in properties relative to the community
    - Computed cross-entropy for each property between user @ [t,t'] and community @ [t,t']
    - Convergence on community properties, then divergence from the community
    - Convergence-divergence: first, adapt to the community; second, separate
    [Figure 3: Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with the community platform over the same time periods; panels (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: lifecycle stages; y-axis: cross entropy; series: Facebook, SAP, Server Fault. We see an increased divergence towards the end of lifecycles.]
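A sketch of community cross-entropy: the user's distribution over [t,t'] is compared with the community's distribution over the same interval. Here the community distribution is assumed to be pooled from all posts made on the platform in [t,t'], and the additive smoothing `eps` is an assumption; the function names are hypothetical.

```python
import math
from collections import Counter

def relative_frequencies(items):
    """Relative frequency distribution over the given items (e.g. terms or senders)."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def cross_entropy(p, q, eps=1e-6, base=2.0):
    """H(P, Q) = -sum_x P(x) log Q(x), with Q additively smoothed (assumption)."""
    support = set(p) | set(q)
    z = sum(q.get(x, 0.0) + eps for x in support)
    return -sum(px * math.log((q.get(x, 0.0) + eps) / z, base)
                for x, px in p.items() if px > 0)

def community_cross_entropy(user_items, community_items, eps=1e-6):
    """Cross-entropy between a user's distribution in [t, t'] and the community's
    pooled distribution over the same interval [t, t']."""
    return cross_entropy(relative_frequencies(user_items),
                         relative_frequencies(community_items), eps)
```

Falling values across a user's periods indicate convergence on the community; rising values late in the lifecycle indicate divergence.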
  • 12. How can we model the evolution of individual users?
    - Solution: mine lifecycle trajectories, i.e. fit a curve for each user's development measure (property and indicator)
    - Properties: in-degree, out-degree, terms
    - Indicators: period entropy, period cross-entropy, community cross-entropy
    - Measures: property and indicator (e.g. in-degree period entropy)
  • 13. Lifecycle Trajectories: Period Entropy
    - Fitted per-user linear regression models
      - Independent variable: lifecycle period [0,1]; dependent variable: entropy
      - >80% of users have R² > 0.4
    - Each user is characterised by the slope (β) of their model, i.e. the rate of change of entropy throughout the lifecycle periods. User-specific entropy models were induced for each platform's users, and the cumulative frequency distributions of the β values were examined for the different user properties and platforms.
    [Figure 4: Cumulative frequency distributions of the linear regression models' β values; panels (a) In-degree, (b) Out-degree, (c) Lexical; x-axis: β; y-axis: F(x); series: Facebook, SAP, Server Fault]
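A minimal ordinary-least-squares sketch of the per-user trajectory fit: lifecycle period as the explanatory variable, entropy as the response, returning the slope β and R². Mapping periods s = 1..20 evenly onto [0, 1] is an assumption based on the figure axes; the helper names are hypothetical.

```python
def lifecycle_positions(n=20):
    """Map lifecycle periods s = 1..n onto [0, 1], the explanatory variable."""
    return [(s - 1) / (n - 1) for s in range(1, n + 1)]

def fit_linear(xs, ys):
    """Ordinary least squares for y = alpha + beta * x; returns (alpha, beta, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = my - beta * mx
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0  # constant series: treat as a perfect fit
    return alpha, beta, r2

def ecdf(values, x):
    """Empirical cumulative frequency F(x): share of users whose beta is <= x."""
    return sum(v <= x for v in values) / len(values)
```

Collecting β across a platform's users and evaluating `ecdf` over a grid of x values gives curves of the kind summarised in Figure 4.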
  • 14. Lifecycle Trajectories: Period Cross-Entropy
    - Earlier: users converged on past behaviour, i.e. previously seen users, terms, relationships, etc.
    - Average proportional change of a feature (a property and development indicator), where f(u_i, [t,t']) is the feature value of user u_i for interval [t,t']:
      $\delta_{u_i} = \frac{1}{|T|-1} \sum_{[t,t'],[t',t''] \in T,\; t<t'<t''} \frac{f(u_i,[t,t']) - f(u_i,[t',t''])}{f(u_i,[t,t'])}$
    - For all tested measures, across the different platforms and user properties, all users had δ > 0, suggesting the suitability of a decaying model
    - Hence fitted an exponential decay model, with lifecycle periods s ∈ {1, 2, …, 20} (so [t_0, t_{0.05}] ≡ s_1):
      $g(u_i, s) = f(u_i, s_1)\, e^{-\lambda s}, \qquad \lambda = 1/\bar{x}$
      where x̄ is the average of the user's feature values
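A sketch of the two quantities on this slide, computed from one user's per-period feature values (e.g. in-degree period cross-entropy). It follows the exponential decay model as read from the slide, g(u_i, s) = f(u_i, s_1)·e^(−λs) with λ = 1/x̄; since that equation is reconstructed from garbled text, treat its exact form as an assumption. Feature values are assumed non-zero so the proportional change is defined.

```python
import math

def average_proportional_change(values):
    """delta_u: mean of (f_prev - f_next) / f_prev over consecutive intervals.
    Positive values mean the feature shrinks (decays) over the lifecycle."""
    pairs = list(zip(values, values[1:]))  # |T| - 1 consecutive interval pairs
    return sum((prev - nxt) / prev for prev, nxt in pairs) / len(pairs)

def decay_rate(values):
    """lambda = 1 / x_bar, the reciprocal of the mean of the user's feature values."""
    return 1.0 / (sum(values) / len(values))

def exponential_decay(f_s1, lam, s):
    """g(u_i, s) = f(u_i, s_1) * exp(-lambda * s)."""
    return f_s1 * math.exp(-lam * s)
```

A series that halves each period, e.g. [4, 2, 1], gives δ = 0.5; a flat series gives δ = 0.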
  • 15. Lifecycle Trajectories: Community Cross-Entropy
    - Identified differences between platforms' and properties' trajectory models:
      - In-degree: convergence-divergence; quadratic regression
      - Out-degree: divergence; linear regression
      - Lexical: Facebook, SAP: quadratic regression; Server Fault: linear regression
    - >73% of users have R² > 0.4
    [Figure 3: Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with the community platform over the same time periods; panels (a) In-degree, (b) Out-degree, (c) Lexical; series: Facebook, SAP, Server Fault. We see an increased divergence towards the end of lifecycles.]
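A stdlib-only sketch of the per-user quadratic regression, y = a0 + a1·x + a2·x², solved through the 3×3 normal equations. Reading the a1, a2 trajectory features as the linear and quadratic coefficients is an assumption; a U-shaped (convergence-then-divergence) curve shows up as a2 > 0.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a0 + a1*x + a2*x**2 via the 3x3 normal equations."""
    # X^T X entries are sums of x^(i+j); X^T y entries are sums of y * x^i.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * 3
    for i in range(2, -1, -1):
        coef[i] = (a[i][3] - sum(a[i][j] * coef[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(coef)  # (a0, a1, a2)
```

For a library alternative, `numpy.polyfit(xs, ys, 2)` returns the same coefficients, highest degree first.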
  • 16. Mining lifecycle trajectories enables users to be categorised by their behaviour… facilitating churn prediction
  • 17. Predicting Churners
    - Binary classification task: is user u a churner?
    - Dataset churners: users who last posted before the final 10% of the dataset's time window (cut-off points: 2012-07-09 for Facebook, 2010-05-11 for SAP, 2010-12-23 for Server Fault)
    - Dataset attributes from trajectory model features: D = {(x_i, y_i)}, where y_i ∈ {0, 1} is the class label of the user and x_i is the user's feature vector
      - Facebook, SAP: 11 features
      - Server Fault: 10 features (a linear regression model is used for its lexical community cross-entropy)
    - Induced a logistic regression model, with coefficients fitted via maximum likelihood estimation; the probability of user u_i churning is:
      $\Pr(Y = 1 \mid x_i) = \frac{1}{1 + e^{-f(x_i)}}$
      where f(x_i) is the linear model whose coefficients (β) define the weight attached to each lifecycle trajectory feature
    Table II. Features used for the churn prediction experiments. The indicators of lifecycle trajectories are used to characterise user evolution along the different user properties.
      Property     Indicator          Model               Feature(s)   Platform
      In-degree    Period Entropy     Linear Regression   β            All
      In-degree    Period Cross-Ent   Exponential Decay   λ            All
      In-degree    Comm' Cross-Ent    Quad' Regress'      a1, a2       All
      Out-degree   Period Entropy     Linear Regression   β            All
      Out-degree   Period Cross-Ent   Exponential Decay   λ            All
      Out-degree   Comm' Cross-Ent    Linear Regression   β            All
      Lexical      Period Entropy     Linear Regression   β            All
      Lexical      Period Cross-Ent   Exponential Decay   λ            All
      Lexical      Comm' Cross-Ent    Quad' Regress'      a1, a2       Fb, SAP
      Lexical      Comm' Cross-Ent    Linear Regression   β            SF
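A stdlib-only sketch of the labelling and model steps on this slide: the cut-off is placed at the start of the final 10% of the dataset's time window, a user is labelled a churner if their last post falls before it, and a logistic regression is fitted by plain batch gradient ascent on the log-likelihood. The gradient-ascent fitter, its learning rate and epoch count are illustrative choices, not the paper's; a library such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math
from datetime import datetime

def churn_cutoff(start, end, final_fraction=0.1):
    """Start of the final `final_fraction` of the dataset's time window."""
    return start + (end - start) * (1 - final_fraction)

def churn_label(last_post, cutoff):
    """y = 1 (churner) if the user's last post falls before the cut-off, else 0."""
    return 1 if last_post < cutoff else 0

def churn_probability(weights, bias, x):
    """Pr(Y = 1 | x) = 1 / (1 + exp(-f(x))), with f(x) = bias + w . x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum-likelihood coefficients via batch gradient ascent on the log-likelihood."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = yi - churn_probability(w, b, xi)  # per-example log-likelihood gradient
            gb += err
            for j, xij in enumerate(xi):
                gw[j] += err * xij
        b += lr * gb / len(X)
        w = [wj + lr * g / len(X) for wj, g in zip(w, gw)]
    return w, b

# Usage with the Facebook time window from Table 1 (the slide reports 2012-07-09).
fb_cutoff = churn_cutoff(datetime(2007, 8, 18), datetime(2013, 1, 24))
```

With the Facebook and Server Fault windows from Table 1, this 90% point lands on the cut-off dates given on the slide (2012-07-09 and 2010-12-23).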
  • 18. Evaluation: Setup
    - User-wise dataset split: 80% training, 20% testing
    - Experiments: isolated user properties, isolated development indicator features, all features together
    - Evaluation measures:
      1. Precision@k (P̄): averaged over k = {1, 5, 10, 20, 50, 100}
      2. Area under the receiver operating characteristic curve (AUC)
    - Baseline: success probability in a single Bernoulli trial, i.e. randomly selecting a churner
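A sketch of the evaluation measures: precision@k averaged over the slide's k values, AUC computed as the probability that a randomly chosen churner is scored above a randomly chosen non-churner (ties count one half), and the Bernoulli baseline as the churner proportion. Dividing by the number of users actually available when k exceeds the test-set size is an assumption; the function names are hypothetical.

```python
def precision_at_k(scores, labels, k):
    """Fraction of churners among the k users ranked most likely to churn."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)

def mean_precision_at_k(scores, labels, ks=(1, 5, 10, 20, 50, 100)):
    """P-bar: precision@k averaged over the listed cut-offs."""
    return sum(precision_at_k(scores, labels, k) for k in ks) / len(ks)

def auc(scores, labels):
    """Probability that a random churner outranks a random non-churner (ties = 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def baseline(labels):
    """Success probability of one Bernoulli trial: picking a churner at random."""
    return sum(labels) / len(labels)
```

A random ranking has an expected AUC of 0.5, and its expected precision@k equals the baseline, which is why both serve as reference points in the results table.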
  • 19. Evaluation: Results
    - Variance in features depending on:
      - Accuracy preference, i.e. precision vs. recall
      - Platform: different detection signals for different communities
    - Full model is never best
    - Best models: Facebook, community cross-entropy (both P̄ and AUC); SAP, lexical (P̄) and in-degree (AUC); Server Fault, lexical (P̄) and period cross-entropy (AUC)
    - For Facebook, the difference between the community cross-entropy model and the Full model was tested using a Mann-Whitney test and found to be significant (at the 5% level)
    - A higher AUC is preferable; the baseline for this measure is 0.5
    Table III. Precision@k (P̄) and area under the receiver operating characteristic curve (AUC) values for Facebook, SAP and Server Fault when testing different: (i) user properties, (ii) development indicators, (iii) all features together.
      Platform       Feature                   P̄       AUC
      Facebook       Entropy                   0.761   0.500
                     Period Cross Entropy      0.624   0.485
                     Community Cross Entropy   0.791   0.617
                     In-degree                 0.648   0.511
                     Out-degree                0.781   0.570
                     Lexical                   0.681   0.557
                     Full                      0.730   0.573
                     Baseline                  0.629   0.500
      SAP            Entropy                   0.434   0.549
                     Period Cross Entropy      0.321   0.568
                     Community Cross Entropy   0.334   0.549
                     In-degree                 0.351   0.592
                     Out-degree                0.250   0.503
                     Lexical                   0.438   0.539
                     Full                      0.363   0.539
                     Baseline                  0.342   0.500
      Server Fault   Entropy                   0.392   0.526
                     Period Cross Entropy      0.300   0.555
                     Community Cross Entropy   0.352   0.538
                     In-degree                 0.232   0.475
                     Out-degree                0.293   0.512
                     Lexical                   0.459   0.546
                     Full                      0.421   0.554
                     Baseline                  0.319   0.500
  • 20. Evaluation: Churner Patterns
    - Reduced quadratic coefficients: churners exhibit steep cross-community curves towards the end of their lifecycles
    - Variance in decay coefficient: degree of communication decays a lot faster for SAP than for Server Fault
    Table IV. Best performing prediction model coefficients for Facebook (community cross-entropy), SAP (in-degree) and Server Fault (period cross-entropy). All features are significant within their respective models (α < 0.05).
      Feature                          Facebook   SAP       Server Fault
      In-degree Entropy                           0.0532
      In-degree Period Cross-Ent                  0.0139    -0.1826
      In-degree Comm' Cross-Ent a1     -0.1057    -0.1878
      In-degree Comm' Cross-Ent a2     -0.0510    -1.5104
      Out-degree Comm' Cross-Ent       0.3173
      Out-degree Period Cross-Ent                           0.0210
      Lexical Period Cross-Ent                              0.3253
      Lexical Comm' Cross-Ent a1       -0.0541
      Lexical Comm' Cross-Ent a2       -0.0557
  • 21. Conclusions
    1. Users communicate with a fixed set of users
       - Similar to findings from (Miritello et al. 2013)
    2. Convergence-divergence effect: users converge on community 'norms' before diverging
       - (Erikson 1959) theorised that younger people are susceptible to social norms
       - (Danescu-Niculescu-Mizil et al. 2013) found users to converge on lexical norms before diverging
    3. Variance in churner signals
       - No common best model was found across platforms
  • 22. Current & Future Work
    1. Regularised Linear Models
       - Achieved ~30% AUC boost with growth and magnitude features
    2. Evolving-Taste Recommender System
       - Used lifecycle model (n = 5) to form category-ratings profiles
       - Assess the relative certainty in profiling from previous information
       - Identified variance in taste evolution across platforms
    [Figure: conditional entropy over lifecycle stages 1 to 5; panels (a) Lens, (b) Tweetings, (c) Amazon]
  • 23. Questions?
    @mrowebot | m.rowe@lancaster.ac.uk
    http://www.lancaster.ac.uk/staff/rowem/