Changing with Time: Modelling and Detecting User Lifecycle Periods in Online Community Platforms
 

Presentation at the International Conference on Social Informatics 2013

    Presentation Transcript

    • Changing with Time: Modelling and Detecting User Lifecycle Periods in Online Community Platforms. Dr. Matthew Rowe, School of Computing and Communications (@mrowebot | m.rowe@lancaster.ac.uk). International Conference on Social Informatics 2013, Kyoto, Japan.
    • Offline Personal Development
      • Offline, we develop in terms of both our interests and social networks.
      • Timeline: Primary School → High School → University → Postgrad → Postdoc → Lecturing
    • User Development in Online Communities
      • Understanding user development enables:
        • Churn prediction (concentration of this paper)
        • Stage-based recommendations (future work)
      • Studied thus far in isolated dimensions:
        • Socially (telecoms networks: Miritello et al. 2013)
        • Lexically (online communities: McAuley & Leskovec 2013)
      • Without considering user development:
        • a) Relative to earlier signals
        • b) Relative to the community of interaction
    • Modelling User Lifecycles
      • Offline lifecycle periods: Primary School → High School → University → Postgrad → Postdoc → Lecturing
      • Lifecycle periods of a potential question-answering system user, from first post to last post (conjecture!): Novice Users → Asking Questions → Asking & Answering Questions → Answering Questions
      • In reality we do not know the labels; however, we can split the lifetime into equal time intervals: 1, 2, 3, …, n
      • Yet users distribute their activity non-uniformly across their lifecycles.
    • Lifecycle Periods and User Properties
      • Divide the lifetime into n equal-activity periods (1, 2, 3, …, n), so that each period contains the same number of posts.
      • Capture period-specific user properties (in period s):
        • In-degree distribution: relative frequency distribution of senders to user u in period s
        • Out-degree distribution: relative frequency distribution of recipients from user u in period s
        • Term distribution: relative frequency distribution of terms used by u in period s
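The equal-activity split and the relative frequency distributions described on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the representation of a post as a list of terms, and the way a remainder is spread over the first periods are all assumptions made for the example.

```python
from collections import Counter


def split_equal_activity(posts, n):
    """Split chronologically ordered posts into n periods of (near-)equal post count."""
    if n <= 0:
        raise ValueError("n must be positive")
    base, extra = divmod(len(posts), n)
    periods, start = [], 0
    for i in range(n):
        # Assumption: any remainder is spread over the first periods.
        size = base + (1 if i < extra else 0)
        periods.append(posts[start:start + size])
        start += size
    return periods


def relative_frequencies(items):
    """Turn a sequence of items (senders, recipients or terms) into a relative frequency distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


# Hypothetical toy data: each post is represented only by the terms it uses.
posts = [["server", "disk"], ["disk"], ["raid"], ["disk", "raid"]]
periods = split_equal_activity(posts, n=2)
term_distributions = [
    relative_frequencies([term for post in period for term in post])
    for period in periods
]
```

The same `relative_frequencies` helper would apply to in-degree and out-degree distributions if the items were the senders or recipients observed in a period.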
    • Datasets: Online Community Platforms
      • Only users who had posted more than 40 times within their lifetime on the platform were considered.
      • 1. Facebook 'Open University' groups: discussions about courses and degrees, where users talked about their issues with the courses and guidance on studying.
      • 2. SAP Community Network: a community question-answering system for SAP technologies, where users post questions and provide answers related to technical issues.
      • 3. Server Fault: a Stack Overflow subsidiary site where users post questions about server-related issues.
      • Each platform's users were divided into 80%/20% splits: the former for training (and the analysis of user development), the latter for the later detection experiments.
      • Table 1. Statistics of the online community platform datasets.

        Platform       Time Span                   Post Count   User Count
        Facebook       [18-08-2007, 24-01-2013]    118,432      4,745
        SAP            [15-12-2003, 20-07-2011]    427,221      32,926
        Server Fault   [01-08-2008, 31-03-2011]    234,790      33,285

      • For each dataset we set the number of lifecycle periods (n) to 20.
      • Defining lifecycle periods: in order to examine how users develop over time we needed some means to segment a user's lifetime (i.e. from the first date at which they post to the date …
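The user filter and the 80%/20% split can be illustrated with a short sketch. The slide does not say how users were assigned to each split, so the seeded random shuffle below is an assumption, as are the function name and the input format (a mapping from user id to post count).

```python
import random


def filter_and_split(post_counts, min_posts=40, train_fraction=0.8, seed=0):
    """Keep users with more than min_posts posts, then split them into train/test lists."""
    eligible = sorted(user for user, count in post_counts.items() if count > min_posts)
    rng = random.Random(seed)  # assumption: a seeded random assignment
    rng.shuffle(eligible)
    cut = int(len(eligible) * train_fraction)
    return eligible[:cut], eligible[cut:]


# Hypothetical post counts per user.
post_counts = {f"u{i}": 41 + i for i in range(10)}
post_counts.update({"low1": 40, "low2": 3})
train_users, test_users = filter_and_split(post_counts)
```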
    • We can assess users' properties in each of these lifecycle periods for:
      • 1. Property changes relative to earlier properties
      • 2. Property changes relative to the online community's properties
    • Assessing User Evolution: Period Cross-Entropy
      • Cross-entropy: the 'uncertainty' of one distribution (Q) relative to another (P).
      • Assuming we have a probability distribution (P) formed from a given lifecycle period ([t, t']), and a probability distribution (Q) from an earlier lifecycle period, we define the cross-entropy between the distributions as follows:

        $H(P, Q) = -\sum_{x} p(x) \log q(x)$

      • Computed between period s and each earlier period, choosing the minimum cross-entropy: the cross-entropy of one probability distribution is computed with respect to each distribution from an earlier lifecycle period, and the distribution that minimises cross-entropy is selected.
      • In the same vein as the earlier entropy analysis, we derived the period cross-entropy for each platform's users throughout their lifecycles and then derived the mean cross-entropy for the 20 lifecycle periods.
      • We observe that for each distribution and each platform, cross-entropies reduce throughout users' lifecycles, suggesting that users do not tend to exhibit behaviour that has not been seen previously.
      • Finding: convergence on prior properties.
      • [Figure: mean period cross-entropy per lifecycle stage for Facebook, SAP and Server Fault. Panels: (a) in-degree, (b) out-degree, (c) lexical. x-axis: Lifecycle Stages; y-axis: Time-period Cross Entropy.]
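The period cross-entropy described above can be sketched in a few lines. This is an illustrative reading rather than the author's code: distributions are assumed to be dictionaries mapping items to probabilities, the natural logarithm is used because the slide does not state a base, and the small floor `eps` for items that an earlier period never saw is an assumption added to keep the logarithm finite.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """H(P, Q) = -sum_x p(x) log q(x), with q(x) floored at eps (an assumption)."""
    return -sum(
        px * math.log(max(q.get(x, 0.0), eps))
        for x, px in p.items()
        if px > 0
    )


def period_cross_entropy(distributions, s):
    """Minimum cross-entropy between period s and any earlier period (requires s >= 1)."""
    current = distributions[s]
    return min(cross_entropy(current, earlier) for earlier in distributions[:s])


# Hypothetical term distributions for three consecutive periods of one user.
distributions = [
    {"disk": 1.0},
    {"disk": 0.5, "raid": 0.5},
    {"disk": 0.5, "raid": 0.5},
]
value = period_cross_entropy(distributions, s=2)
```

In this toy example the third period matches the second exactly, so the minimum is reached against the second period and equals the entropy of that distribution.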
    • Assessing User Evolution: Community Cross-Entropy
      • Computed between the properties of user u in period s and the properties of the community in period s (i.e. the term distribution of the entire community).
      • Findings: convergence on community properties, then divergence from the community.
      • [Figure: mean cross-entropy per lifecycle stage for Facebook, SAP and Server Fault. Panels (a)-(c): period cross-entropy for the in-degree, out-degree and lexical distributions (y-axis: Time-period Cross Entropy). Panels (d)-(f): community cross-entropy for the in-degree, out-degree and lexical distributions (y-axis: Distribution Cross Entropy). x-axis: Lifecycle Stages.]
    • How can we use these developmental signals to detect the lifecycle period of a user?
    • Detecting Lifecycle Periods: Feature Engineering
      • To ease feature definition and model specification, the lifecycle period notation is changed from the interval tuple set ([t, t'] ∈ T) to a set of discrete elements s ∈ S, where S = {1, 2, …, 20}. This integer representation is used for legibility and to provide sufficient training data.
      • The prediction model uses (i) rates and (ii) magnitudes, where each feature is measured for a given lifecycle period.
      • Magnitude features: a given user's measure taken at a given lifecycle period, m(u, s), where the measure for user u is taken at lifecycle period s.
      • m is a measure of a user property (e.g. in-degree distribution) for a developmental function (e.g. community cross-entropy), with m ∈ M. The user properties are the in-degree, out-degree and lexical distributions; the development indicators are entropy, period cross-entropy and community cross-entropy.
      • Motivation: when inspecting the final lifecycle period of users on each platform in terms of their lexical period cross-entropy, we find that this is less than in previous periods and that the rate of decay has levelled off somewhat. Hence the growth rate from one lifecycle period to the next can be used as information for detecting the lifecycle period of the user.
      • Growth feature (rate): the proportionate rate of change in a measure from one lifecycle period to the next:

        $\delta_m(u, s) = \frac{m(u, s+1) - m(u, s)}{m(u, s)}$

        where $\delta_m < 0$ indicates decay, $\delta_m = 0$ indicates no change, and $\delta_m > 0$ indicates growth.
      • The growth rate yields a single growth feature; by looking back over the previous k lifecycle periods we produce k growth features per measure.
      • Dataset formalisation: $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where x denotes the feature vector of a given user and y denotes the class label (lifecycle period) of the user. The feature vector characterises the user up until lifecycle period s and describes how the user has developed before it.
      • For two measures, the feature vector at period s is formed as:

        $x = [\delta_{m_1}(s-1), \ldots, \delta_{m_1}(s-k), \delta_{m_2}(s-1), \ldots, \delta_{m_2}(s-k)]$

        i.e. for both measures ($m_1$ and $m_2$) we include the k growth features from the previous k lifecycle periods.
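The growth-rate features can be sketched as below. This is a minimal illustration under stated assumptions: each measure is represented as a dictionary from lifecycle period to value, the function names are hypothetical, and the sketch does not handle a measure value of zero, which would make the proportionate rate undefined.

```python
def growth_rate(measure, s):
    """Proportionate change (m(s+1) - m(s)) / m(s); negative = decay, positive = growth."""
    return (measure[s + 1] - measure[s]) / measure[s]


def growth_features(measures, s, k):
    """Feature vector for period s: for each measure, the growth rates at s-1, ..., s-k."""
    features = []
    for measure in measures:
        for back in range(1, k + 1):
            features.append(growth_rate(measure, s - back))
    return features


# Hypothetical values of one measure (e.g. a lexical period cross-entropy) per period.
lexical_period_ce = {1: 2.0, 2: 1.0, 3: 1.0, 4: 1.5}
x = growth_features([lexical_period_ce], s=4, k=3)
```

With the slide's setting of k = 5, a user's vector for period s would hold five growth rates per measure, which is consistent with detection starting at period 6.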
    • Detecting Lifecycle Periods: Vector Space Model
      • … with the findings of Danescu et al. [2], where users adapt their language to the community to begin with, before then diverging towards the end.
      • The above analysis unearthed the development that users go through across the three community platforms, based on different properties and development indicators. We now turn to the problem of detecting the lifecycle period that a given user is in, based on their prior development.
      • Goal: induce a surjective function that returns a user's lifecycle period from their n-dimensional growth feature vector. The task is a multi-class classification problem: $f : \mathbb{R}^n \rightarrow S$, where the domain is an n-dimensional feature vector representation of a user's evolution and the co-domain is an integer representation of the lifecycle period ($S = \{6, 7, \ldots, 20\}$).
      • We maintain the splits mentioned above, using the 80% of analysed users for the training set and the held-out 20% of users for the testing split. For the latter we hid the class labels and detected the label using the model below.
      • To perform multi-class classification we use a vector space representation of classes and their boundaries to identify the most similar, or rather proximal, class to an arbitrary user's feature vector:

        $f(x) = \arg\max_{s \in S} \mathrm{sim}(x, s)$

        where sim is derived using a given similarity function, so the class (lifecycle period) that maximises similarity is chosen.
      • The similarity function sim(x, s) is varied through four measures:
        • 1. Cosine similarity: the cosine of the angle between the user's feature vector x and the class centroid vector from the training data, $p_s$.
        • 2. Euclidean distance: the distance between the user's vector and the class centroid; the reciprocal of this distance is taken as the similarity measure, as the reciprocal distance is maximised when the Euclidean distance is minimised.
        • 3. Mahalanobis distance: accounts for the variance in the class distribution from which the centroid vector is derived by including the covariance matrix $\Sigma_s$ of the class s; the similarity is the reciprocal of the Mahalanobis distance $\sqrt{(x - p_s)^{\top} \Sigma_s^{-1} (x - p_s)}$.
        • 4. Spearman rank correlation coefficient (monotonic association).
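The centroid-based detection rule can be sketched as follows, using only the standard library. This is an illustration, not the author's implementation: it covers the cosine and reciprocal-Euclidean similarities only (the Mahalanobis and Spearman variants are omitted), and the function names and toy vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fit_centroids(training):
    """training: iterable of (feature_vector, lifecycle_period) pairs."""
    by_class = {}
    for x, y in training:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}


def cosine_similarity(x, p):
    dot = sum(a * b for a, b in zip(x, p))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in p))
    return dot / norm if norm else 0.0


def reciprocal_euclidean(x, p):
    distance = math.dist(x, p)
    return math.inf if distance == 0 else 1.0 / distance


def detect(x, centroids, sim):
    """f(x) = argmax over classes s of sim(x, centroid of s)."""
    return max(centroids, key=lambda s: sim(x, centroids[s]))


# Hypothetical two-dimensional growth vectors labelled with lifecycle periods 6 and 7.
training = [([1.0, 0.0], 6), ([0.9, 0.1], 6), ([0.0, 1.0], 7), ([0.1, 0.9], 7)]
centroids = fit_centroids(training)
predicted = detect([0.8, 0.2], centroids, cosine_similarity)
```

Passing the similarity function as an argument mirrors the slide's idea of varying sim while keeping the arg-max rule fixed.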
    • Detecting Lifecycle Periods: Evaluation
      • Setup:
        • Set the number of prior lifecycle periods (k) to 5
        • Detected all periods in the closed interval [6, 20]
        • 80% of users for training (analysed above), 20% for testing; the former were used to induce centroids and covariance matrices
        • Varied feature sets (e.g. just in-degree, just cross-entropy, etc.)
      • Accuracy measures:
        • F1: F-measure (harmonic mean of precision and recall, beta = 1)
        • MCC: Matthews (not mine!) Correlation Coefficient
      • Baselines:
        • Random model (implicit in MCC)
        • Naïve Bayes
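The two accuracy measures can be computed from confusion counts as sketched below. The slide does not say how scores were aggregated over the 15 lifecycle-period classes, so the one-vs-rest treatment of a single class here is an assumption, as are the function names and the toy labels.

```python
import math


def one_vs_rest_counts(y_true, y_pred, positive):
    """Confusion counts (tp, fp, fn, tn) treating one lifecycle period as the positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (beta = 1)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0 corresponds to random-level performance."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical true and detected lifecycle periods for four test users.
y_true = [6, 6, 7, 7]
y_pred = [6, 7, 7, 7]
tp, fp, fn, tn = one_vs_rest_counts(y_true, y_pred, positive=7)
```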
    • Detecting Lifecycle Periods: Results
      • Baselines: the naive Bayes baseline compares the vector space model against an existing generative model that is regularly used in multi-class classification tasks; performance of each vector space model against the random model is measured using the Matthews correlation coefficient (MCC).
      • Table 2. F1 scores on the different platforms when detecting the user lifecycle periods using different detection models and feature sets, with the Matthews Correlation Coefficient in parentheses to show improvement over the random model baseline. Key: F1 (MCC).

        Facebook
        Feature Set            Cosine          Euclidean          Mahalanobis     Spearman
        In-degree              0.677 (0.637)   0.757 (0.730)      0.706 (0.659)   0.672 (0.627)
        Out-degree             0.609 (0.582)   0.751 (0.718)      0.703 (0.665)   0.592 (0.553)
        Lexical                0.653 (0.632)   0.757 (0.730)      0.739 (0.700)   0.629 (0.601)
        Entropy                0.674 (0.618)   0.757 (0.730)      0.676 (0.621)   0.654 (0.602)
        Period Cross Entropy   0.650 (0.590)   0.774** (0.746)    0.630 (0.586)   0.647 (0.589)
        Comm' Cross Entropy    0.643 (0.592)   0.760 (0.732)      0.657 (0.610)   0.671 (0.621)
        All                    0.676 (0.614)   0.757 (0.730)      0.659 (0.608)   0.686 (0.633)

        SAP
        Feature Set            Cosine          Euclidean          Mahalanobis     Spearman
        In-degree              0.582 (0.520)   0.665 (0.652)      0.426 (0.376)   0.583 (0.527)
        Out-degree             0.597 (0.571)   0.658 (0.647)      0.600 (0.588)   0.574 (0.541)
        Lexical                0.583 (0.521)   0.665 (0.652)      0.431 (0.378)   0.558 (0.499)
        Entropy                0.522 (0.468)   0.665 (0.652)      0.470 (0.418)   0.532 (0.467)
        Period Cross Entropy   0.643 (0.591)   0.656 (0.651)      0.434 (0.377)   0.640 (0.590)
        Comm' Cross Entropy    0.546 (0.497)   0.708*** (0.677)   0.529 (0.475)   0.520 (0.466)
        All                    0.619 (0.565)   0.665 (0.652)      0.423 (0.364)   0.640 (0.590)

        Server Fault
        Feature Set            Cosine          Euclidean          Mahalanobis     Spearman
        In-degree              0.671 (0.631)   0.748 (0.721)      0.718 (0.664)   0.667 (0.619)
        Out-degree             0.635 (0.613)   0.760 (0.727)      0.732 (0.683)   0.608 (0.580)
        Lexical                0.666 (0.631)   0.748 (0.721)      0.711 (0.663)   0.643 (0.595)
        Entropy                0.669 (0.631)   0.748 (0.721)      0.703 (0.637)   0.654 (0.602)
        Period Cross Entropy   0.701 (0.650)   0.774** (0.747)    0.622 (0.584)   0.702 (0.660)
        Comm' Cross Entropy    0.650 (0.597)   0.738 (0.710)      0.709 (0.647)   0.651 (0.603)
        All                    0.698 (0.637)   0.748 (0.721)      0.706 (0.637)   0.680 (0.632)

        Significance codes: p-value < 0.001 ***, < 0.01 **, < 0.05 *, < 0.1 .
    • Conclusions and Future Work
      • Modelling user development provides signals for detecting lifecycle periods
        • Based on social and lexical dynamics
      • Users' development is influenced by the community:
        • Convergence, then divergence, socially and lexically
      • Users tend not to exhibit unseen behaviour
        • I.e. they reuse terms and maintain existing relationships
      • Future work:
        • Addressing the necessity for knowledge of prior lifecycle stages
        • Stage-based recommendation
    • Questions? @mrowebot | m.rowe@lancaster.ac.uk | http://www.lancaster.ac.uk/staff/rowem/