Transcript of "The Semantic Evolution of Online Communities"
1.
THE SEMANTIC
EVOLUTION OF
ONLINE COMMUNITIES
MATTHEW ROWE1 AND MARKUS STROHMAIER2
1. LANCASTER UNIVERSITY, LANCASTER, UK
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
2. UNIVERSITY OF KOBLENZ AND GESIS, COLOGNE, GERMANY
@MSTROHM | MARKUS.STROHMAIER@GESIS.ORG
World Wide Web Conference 2014
Seoul, South Korea
2.
Studies of Online Community Evolution
The Semantic Evolution of Online Communities
1
¨ Prior work has examined online community
development based on:
¤ Social network properties
n (Gong et al., 2012), (Mislove et al., 2007)
¤ Social group formation
n (Backstrom et al., 2006)
¤ Lexical term usage and uptake
n (Danescu et al., 2013)
Organic fixie Pitchfork, fingerstache
fashion axe 8-bit ethical. Neutra
shabby chic brunch, mustache vegan
twee typewriter dreamcatcher try-hard
organic church-key!
subsequent interpretation, of churne
nities. Churners present a serious
managers and hosts as the leaving of
a detrimental effect on the communi
a question-answering community ca
unanswered queries).
In this section we deﬁne churn
classiﬁcation task and use the previo
tors of lifecycle trajectories to predi
churner or not. As we conﬁne user
the start of their lifecycle to the end
mined from this period to characteri
We deﬁne churners as any user who
before the ﬁnal 10% of the time w
cutoff points are: 2012-07-09 for Fac
SAP, and 2010-12-23 for ServerFaul
following form: D = {(xi, yi)}, whe
label of the user from one of two
while xi denotes an 11-element R-va
either a Facebook or SAP user, and
= {w1,w2,…,wn}
3.
The Semantic Evolution of Online Communities
2
Work has yet to examine the semantic evolution of online
communities
1. Understand how semantic concepts emerge
2. Model the development of semantic structures over time
3. Examine how communities differ from one another in their evolution
4.
Assessing Semantic Evolution
The Semantic Evolution of Online Communities
3
Define: Community Semantics!
Entities, and their classes, discussed within a
community over a given time-period, and the
structure connecting those concepts!
Time
…
5.
Assessing Semantic Evolution
The Semantic Evolution of Online Communities
4
¨ Assessing community semantics enables:
1. Characterisation of semantic evolution dynamics
2. Comparison based on community evolution
3. Forecasting churn rates from evolution signals
Our Contributions
Define: Community Semantics!
Entities, and their classes, discussed within a
community over a given time-period, and the
structure connecting those concepts!
6.
The Semantic Evolution of Online Communities
Approach: Examining Semantic
Evolution5
Retrieve
Posts and
Extract
Entities
Construct
Time-
delimited
Semantic
Graphs
Inspect
Macro
Evolution
Induce
Community-
Specific
Evolution
Models
Apply
Mined
Semantic
Evolution
Dynamics
We have two flavours:
(i) concept graphs
(ii) entity graphs
Applying graph
measures over cumulative
time-sensitive graphs
Inducing logistic
population models for
each community and
graph measure
Using model dynamics as
community motifs to:
(i) inspect communities
(ii) predict churn rate
Experiment Dataset: Boards.ie
All posts during 2005-2008
93 Communities: online forums
7.
Extracting Entities from Online
Community Posts
The Semantic Evolution of Online Communities
6
¨ We are provided with a set of post quadruples:
¨ We constrain content to within a given interval [t’,t’’)
¨ Extract each post’s entities within the time interval using TextRazor
¤ No quota limit, and good prior performance (Derczynski et al., 2013)
¨ Result: time-sensitive community-discussed entities (DBPedia URIs)
Retrieve
Posts and
Extract
Entities
specialised topics of discussion over A. Posts are provided
as a set of quadruples <u, s, t, f> 2 P, where user u posted
message s at time t in forum f. A message (s) is com-
posed of terms that we use to build the semantic models
for individual communities. The information discussed in
a community, and thus its semantics, can change and alter
over time, therefore we constrain a community’s model to
speciﬁc time snapshots - e.g. t0
! t00
where t0
< t00
- for
this we use the following construct that ﬁlters through all
relevant posts’ contents within the allotted time window:
St0
t00
= {s : <u, s, t, f> 2 P, t0
t < t00
} (1)
Information discussed within online communities can be
represented in terms of its semantics, using information from
either the schema-level (i.e. ontological classes and relations
between them) or the data level (i.e. using entities and how
they are related to one another). For the former we consider
concepts to be classes found within the DBPedia Ontology,
that is: the types of entities that users are discussing (e.g.
people, locations, etc.), while for the latter case we con-
Pa
ide
ite
rea
for
{c,
riv
fro
3.2
A
wh
con
fro
as
- w
ing
pro
NITIES WITH SEMANTIC GRAPHS
For our experiments we used data from the Irish commu-
nity message board Boards.ie.2
This is a general-discussion
community message board that includes a set of hierarchi-
cally nested forums (F) in which posts are made - i.e. fo-
rum A can be a parent of forum B, and thus B contains
specialised topics of discussion over A. Posts are provided
as a set of quadruples <u, s, t, f> 2 P, where user u posted
message s at time t in forum f. A message (s) is com-
posed of terms that we use to build the semantic models
for individual communities. The information discussed in
a community, and thus its semantics, can change and alter
over time, therefore we constrain a community’s model to
speciﬁc time snapshots - e.g. t0
! t00
where t0
< t00
- for
this we use the following construct that ﬁlters through all
relevant posts’ contents within the allotted time window:
St0
t00
= {s : <u, s, t, f> 2 P, t0
t < t00
} (1)
In or
the se
that
conne
such
(i.e. w
Path
ident
iterat
reach
forme
{c, p,
rived
from
3.2
An
User u posted content s at time t in forum f
8.
Building Semantic Resource
Graphs
The Semantic Evolution of Online Communities
7
1. Concept graphs
¤ Nodes: Union of rootpaths from each entity’s class to
owl:Thing class
¤ Edges: Between nodes within the DBPedia Ontology
2. Entity graphs
¤ Nodes: Entities provided by time-specific post extractions
¤ Edges: Relations between entity pairs within the DBPedia linked
data graph
Construct
Time-
delimited
Semantic
Graphs
e - i.e. fo-
B contains
e provided
er u posted
s) is com-
tic models
iscussed in
e and alter
s model to
< t00
- for
through all
window:
} (1)
ties can be
mation from
d relations
es and how
we consider
Ontology,
ussing (e.g.
se we con-
(e.g. dbpe-
ents, St0
t00
,
such concepts, and derive a fully connected concept graph
(i.e. with no disconnected components) we extract the Root
Path Graph as follows: From each concept (c 2 RC ) we
identify the parent concept (<c rdfs:subClassOf p>) and
iteratively move up the concept graph until the root node is
reached (owl:Thing), thereby returning a set of nodes that
formed the path from c to the root node: rootpath(c) =
{c, p, ...,owl:Thing}. The graph’s vertices are therefore de-
rived by taking the union of all paths to the root returned
from each concept within the seed set:
Vf =
[
c2RC
rootpath(c) (3)
3.2 Entity Graphs
An entity graph (GE) is a type of semantic resource graph
where the vertices (V ) are entities and the set of edges (E)
connecting these entities are relations between them derived
from the Web of Linked Data. We deﬁne the entity graph
as GE[f, t0
, t00
] = hVf , Ef i, such that GE[f, t0
, t00
] ⇢ Gentity
- where Gentity denotes the DBPedia entity graph contain-
ing relations between entities at the data level. As we are
provided with a collection of time-delimited entities RE for
a given forum, we query DBPedia for links between entity
pairs and add such edges to the graph where such a link
exists:
nd char-
Google+.
cial net-
oo! An-
node ar-
ng times
ed term
on how
lthough
cial net-
rs evolve
MMU-
HS
commu-
scussion
ierarchi-
i.e. fo-
contains
provided
u posted
is com-
resource URIs, is typed according to one or more classes
from the DBPedia ontology.3
Therefore to construct the set
of concepts that are cited within a given forum f over the
allotted time period we retrieve the classes that each en-
tity is a type of and store these in the following set: RC .
From this set we generate a time-dependent concept graph:
GC [f, t0
, t00
] = hVf , Ef i, such that GC [f, t0
, t00
] ⇢ Gtype -
where Gtype denotes the DBPedia type graph formed from
the class structure of the DBPedia ontology. In this con-
text the set of concepts denotes the seed set and is used to
populate the vertices in the graph and then construct edges
between the vertices based on existing links between the
concepts in the DBPedia type graph (Gtype):
Ef = {(ci, cj) : ci, cj 2 Vf , (ci, cj) 2 Etype} (2)
In order to derive the set of vertices we must consider how
the seed set can be used for this process as it is often the case
that the set is comprised of concepts which are not directly
connected to one another in the concept graph. To connect
such concepts, and derive a fully connected concept graph
(i.e. with no disconnected components) we extract the Root
Path Graph as follows: From each concept (c 2 RC ) we
identify the parent concept (<c rdfs:subClassOf p>) and
iteratively move up the concept graph until the root node is
c models
cussed in
and alter
model to
< t00
- for
rough all
window:
(1)
es can be
tion from
relations
and how
e consider
Ontology,
sing (e.g.
we con-
.g. dbpe-
nts, St0
t00
,
a time pe-
t0
t00
using
of entities
returned
reached (owl:Thing), thereby returning a set of nodes that
formed the path from c to the root node: rootpath(c) =
{c, p, ...,owl:Thing}. The graph’s vertices are therefore de-
rived by taking the union of all paths to the root returned
from each concept within the seed set:
Vf =
[
c2RC
rootpath(c) (3)
3.2 Entity Graphs
An entity graph (GE) is a type of semantic resource graph
where the vertices (V ) are entities and the set of edges (E)
connecting these entities are relations between them derived
from the Web of Linked Data. We deﬁne the entity graph
as GE[f, t0
, t00
] = hVf , Ef i, such that GE[f, t0
, t00
] ⇢ Gentity
- where Gentity denotes the DBPedia entity graph contain-
ing relations between entities at the data level. As we are
provided with a collection of time-delimited entities RE for
a given forum, we query DBPedia for links between entity
pairs and add such edges to the graph where such a link
exists:
Ef = {(ri, rj) : ri, rj 2 Vf , (ri, rj) 2 Eentity} (4)
Given this edge construction mechanism we only look for
relations one-hop away in the entity graph, that is: given
9.
Measuring Graph Dynamics
The Semantic Evolution of Online Communities
8
1. Node Count: size of the graph
2. Diameter: breadth of the graph
3. Specialisation Count: class specialisations
4. Graph Entropy: density of the graph
5. Clustering Coefficient: cliquishness of the graph
Inspect
Macro
Evolution
Computation of the measures is described within the paper
10.
The Semantic Evolution of Online Communities
9
Macro Evolution
Cumulative Time Interval
Entropy of the
Concept Graph
Showing the mean graph entropy across all
communities and the 95% Confidence Interval
0 20 40 60 80 100 120
150200250300
Timestep
(a) Node Count
0 20 40 60 80 100 120
4.04.55.05.5
Timestep
H(G)
(b) Graph Entropy
0 20 40 60 80 100
150200250
Timestep
Specialisations
(c) Specialisations
: Concept graphs’ evolution based on node counts, graph entropy and spec
ty of the graph grows as more entities are of a given online community based on
Inspect
Macro
Evolution
11.
Inspect
Macro
Evolution
The Semantic Evolution of Online Communities
10
Macro Evolution: Concept Graphs
0 20 40 60 80 100 120
150200250300
Timestep
|V|
(a) Node Count
0 20 40 60 80 100 120
4.04.55.05.5 Timestep
H(G)
(b) Graph Entropy
0 20 40 60 80 100 120
150200250
Timestep
Specialisations
(c) Specialisations
Figure 1: Concept graphs’ evolution based on node counts, graph entropy and specialisations.
ing that the density of the graph grows as more entities are
added and thus more connections are possible between them.
Summary: We can summarise the following salient ﬁnd-
ings: (i) for concept graphs: node count, specialisation count
and density (graph entropy) tend to converge to limit; (ii)
for entity graphs, the diameter, graph entropy and cluster-
ing coe cient tend to converge to a limit, while the node
count (number of entities) increases linearly; (iii) despite
new entities arriving at a constant linear rate, on average,
of a given online community based on a single graph mea-
sure. We exclude the derivation of the equations from the
paper, but it is su cient to conclude that given |T| time
steps we would have a single equation for each time step
(t 2 T): Rtr 1
+ PtE 1
= 1. We can then solve for the
unknown variables r and E using the QR-decomposition of
a matrix: expressing the lefthand side of the simultaneous
equations as a |T| ⇥ 2 matrix and the righthand side as a
|T|-element vector where each element is 1. We induced
logistic population models for each of the graph measures
Node count, graph entropy, and specialisations converge to a limit
12.
Inspect
Macro
Evolution
The Semantic Evolution of Online Communities
11
Macro Evolution: Entity Graphs
0 20 40 60 80 100 120
050010001500
Timestep
|V|
Linear Model (β=9.932)
(a) Node Count
0 20 40 60 80 100 120
1234567
Timestep
Diameter
(b) Diameter
0 20 40 60 80 100 120
1234
Timestep
H(G)
(c) Graph Entropy
0 20 40 60 80 100 120
0.020.060.100.14
Timestep
CC
(d) Clustering Coe cient
Figure 2: Entity graphs’ evolution based on node counts, diameter, graph entropy, and clustering coe cient.
and entity graphs deﬁned within a single vector for a given
community (f): mf = {m1, m2, . . . , m13}. We began by de-
riving a single matrix M 2 R93⇥13
containing the 93 commu-
nities (rows) under analysis together with their 13 evolution
measures (columns), and performing principal component
analysis over this matrix. The result of this clustering is
shown in Fig. 3 where we have colour-coded the di↵erent
community forums by their hierarchical level in the plat-
form (level 2 = most general, level 4 = most speciﬁc). We
found that several of the more general forums appeared as
dynamics of a given community at time step t to predict
the churn rate of community members at time t + 1. We
deﬁned the churn rate of a community as the proportion of
active users during a given time period (i.e. week segment)
that post for the last time. We used the semantic dynamics
motifs from the prior experiment (as listed within Fig. 3)
and also included graph measures at a given time period:
i.e. graph entropy at time t, specialisation count at time t,
etc. We derived these features for every time step for each
community and derived the response variable as the churn
Diameter, graph entropy, and clustering coefficient converge to a limit
13.
Modelling the Semantic Evolution
of Individual Communities
The Semantic Evolution of Online Communities
12
¨ For each community and measure, induce a logistic
population model:
1. Get the set of measure increase time steps (T)
2. Derive the proportionate growth rate for each step
3. Use the set of proportionate growth rates to solve for
unknown community-specific evolution variables (r & E)
T = {a : a 2 [1, 119], m(G1,a+1) > m(G1,a)}
Deriving the set of change time steps for a given com
allows the proportionate growth rate for a given t
(t 2 T) to be derived: Rt = (Pt+1 Pt)/Pt. Th
is equivalent to the following equation which deﬁ
proportionate growth rate Rt in terms of the comm
growth rate (r) and carrying capacity (E), our u
variables: Rt = r(1 Pt/E). Therefore if we mea
proportionate growth rate over the |T| distinct tim
then we can derive, via simultaneous equations, the
rate of the graph and its carrying capacity, the ve
sures that we can use to characterise the semantic e
Induce
Community-
Specific
Evolution
Models
Proportionate
Growth Rate @ t
Population Growth Rate
Population (measure) size @ t
Carrying Capacity
(Equilibrium)
All communities’ models fit with χ2-test at α<0.01
14.
The Semantic Evolution of Online Communities
13
Semantic Evolution… what can we actually
use it for?
15.
The Semantic Evolution of Online Communities
Application: Community Evolution
Analysis14
Apply
Mined
Semantic
Evolution
Dynamics
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
−60 −40 −20 0 20 40 60
−40−20020
PC1
PC2
7
9
10
11 12
18
19
20
21
22
23
24
25
31
34
37
38
47
52
54
55
56
6064
68
82
86
93
99
105
107
108
109
116
120
124
125
126
127
136
137
151
171
177
227
232
237
246
252
259
264
267
269
271
333
343
346
370
382
388
389
392
410
411 443
446
453
464
468
471
474
475
476
478
481
482
483
490
495
503
506 512
514
518
522
529
532
542544
545
547
554
556
●
●
●
Level 2
Level 3
Level 4
E
EG − CSemantic Evolution Motif:
●
●
●
●
●
● ●
●
●
●
●
● ●
●
●
●●
●
●
●
●
20 40 6022
24
5
31
34
37
54
56
82
93
105
109
177
227
2
264
269
333
392 443
446468
4
3
490
495
6 512
514522
529
542544
554
556
●
●
●
Level 2
Level 3
Level 4 CG − Node Count − Rate
CG − Node Count − Equilbrium
CG − Graph Entropy − Rate
CG − Graph Entropy − Eq
CG − Specialisation Count − Rate
CG − Specialisation − Equilibrium
EG − Node Count − Slope
EG − Diameter − Rate
EG − Diameter − Equilibrium
EG − Entropy − Rate
EG − Entropy − Equilibrium
EG − Clustering Coefficient − Rate
EG − Clustering Coefficient − Equilibrium f7 − After Hours
f227 − Television
f554 − Wanted Motors
10−1
100
101
102
103
he communities based on their semantic motifs (left) where level 4 forums are clustered
lues for the concept graphs (CG) and entity graphs (EG) for the three outlier forums
right).
hat concept and entity graph den-
row linearly (unlike in social net-
erges on a limit, which we charac-
Conference on Advances in Social Networks Analysis
and Mining, 0:724–725, 2012.
[4] Cristian Danescu-Niculescu-Mizil, Robert West, Dan
20 40 60 80 100 120
Timestep
Linear Model (β=9.932)
(a) Node Count
0 20 40 60 80 100 1201234567
Timestep
Diameter
(b) Diameter
0 20 40 60 80 100 1
1234
Timestep
H(G)
(c) Graph Entropy
2: Entity graphs’ evolution based on node counts, diameter, graph entro
ity graphs deﬁned within a single vector for a given
nity (f): mf = {m1, m2, . . . , m13}. We began by de-
single matrix M 2 R93⇥13
containing the 93 commu-
ows) under analysis together with their 13 evolution
es (columns), and performing principal component
over this matrix. The result of this clustering is
dynamics of a given com
the churn rate of commu
deﬁned the churn rate of
active users during a give
that post for the last time
motifs from the prior exp
40 60 80 100 120
Timestep
Linear Model (β=9.932)
Node Count
0 20 40 60 80 100 120
1234
Timestep
Diameter
(b) Diameter
0 20
123
H(G)
(c) G
Entity graphs’ evolution based on node counts, diame
raphs deﬁned within a single vector for a given
(f): mf = {m1, m2, . . . , m13}. We began by de-
le matrix M 2 R93⇥13
containing the 93 commu-
) under analysis together with their 13 evolution
dynami
the chu
deﬁned
active u
17.
Application: Churn Rate Prediction
The Semantic Evolution of Online Communities
16
¨ Task: forecast community churn rate at t+1
¤ Based on semantic evolution up to t
¨ Trained a ridge regression model
¨ Root Mean Square Error: Predicted vs. Actual churn
rate
Apply
Mined
Semantic
Evolution
Dynamics
Training Test
0 120 150
Training
Test
dynamics of a given community at time step t to predict
the churn rate of community members at time t + 1. We
deﬁned the churn rate of a community as the proportion of
active users during a given time period (i.e. week segment)
that post for the last time. We used the semantic dynamics
motifs from the prior experiment (as listed within Fig. 3)
and also included graph measures at a given time period:
i.e. graph entropy at time t, specialisation count at time t,
etc. We derived these features for every time step for each
community and derived the response variable as the churn
rate at the following time step. We then compiled a train-
ing dataset (up to week 120) and a test dataset (from week
120). Each dataset had the following form: D = {(xi, yi)},
where xi contained a 21-element time-delimited feature vec-
tor for a given community and yi was the churn rate of the
community at the following time step. We trained a ridge
regression model ( ) using Dtrain and applied it to Dtest,
testing the performance of: a) just concept graph features,
b) just entity graph features, and c) all features. An autore-
gressive model was used as the baseline - using the churn
rate at time t as a single predictor variable for the churn
rate at time t + 1. Performance was evaluated using the
Root Mean Square Error (RMSE).
Table 1: Root Mean Square Error when predicting
churn rates using: an Autoregressive model (R2
=
0.341) and a Ridge Regression model using Concept
18.
Application: Churn Rate Prediction
The Semantic Evolution of Online Communities
17
sions
: we
te of
that
t the
con-
size.
aphs:
h but
g to-
ount
After
ums,
ty of
eciﬁc
The
and
neral
ntity
gressive model was used as the baseline - using the churn
rate at time t as a single predictor variable for the churn
rate at time t + 1. Performance was evaluated using the
Root Mean Square Error (RMSE).
Table 1: Root Mean Square Error when predicting
churn rates using: an Autoregressive model (R2
=
0.341) and a Ridge Regression model using Concept
Graph, Entity Graph, and all features.
Baseline Concept Graph Entity Graph All Features
7.310 ⇥10 3
5.315 ⇥10 3
5.301 ⇥10 3
4.941 ⇥10 3
Table 1 presents the results from our prediction our exper-
iment. We found that for all tested models (concept graph,
entity graph, all features) we signiﬁcantly outperformed the
baseline - tested using the sign test (↵ = 0.001). Entity
graph features outperform concept graphs but not signiﬁ-
cantly, while our best model is the use of all features together
in a single model. These results empirically demonstrate the
Significant reduction in error (Sign test with α=0.001)
Baseline: Autoregressive model with churn
rate @ t as a predictor for t+1
Apply
Mined
Semantic
Evolution
Dynamics
19.
Findings and Conclusions
The Semantic Evolution of Online Communities
18
¨ Semantic graphs of online communities do not grow linearly:
instead, they evolve to a limit
¤ Unlike in social networks [Lekovec et al., 2008]
¤ Exception: entity graph size
¨ A finite number of topics are discussed within communities
¤ Variation between communities
¨ Our use of logistic population models has enabled:
1. Characterisation of community-specific evolution dynamics
2. Community analysis to inspect how communities evolved
differently
3. Churn prediction based on semantic evolution
21.
Matthew Rowe
@mrowebot
m.rowe@lancaster.ac.uk
http://www.lancaster.ac.uk/staff/rowem/
Markus Strohmaier
@mstrohm
markus.strohmaier@gesis.org
http://markusstrohmaier.info/
Questions?20
The Semantic Evolution of Online Communities
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.
Be the first to comment