Semantic Stability in Social Tagging Streams
WWW 2014
Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman
Ontologies: formal, shared and stable
Folksonomies: not formal, but shared and stable?
[Figure: http://schwarzenegger.com/, snapshots labelled 1970, 1990 and 2010]
How can we measure semantic stability?
How can we compare the semantic stabilization process in
different systems?
What impacts semantic stability?
Measuring Semantic Stability
State of the Art
• Relative tag proportions per resource become stable with an increasing number of tag assignments [Golder and Huberman, 2006]
• The KL divergence of the rank-ordered tag frequency distribution per resource at different time points converges towards zero [Halpin et al., 2007]
• Power-law distributions [Cattuto, 2006]: the scale-invariance property ensures that, regardless of how large the system grows, the shape of the distribution stays the same
Some Limitations
• Existing measures do not allow comparing the semantic stabilization process of different systems
• They prune tag distributions to the top-k tags
– and therefore cannot handle non-conjoint lists of tags
• A random tagging process also produces a “stable” description
– A tag assignment at time point t+1 has less impact on the tag distribution of a resource than a tag assignment at time point t
Example
KL-Divergence
• The KL divergence converges towards zero.
• But a random baseline also converges towards zero if we assume a constant tagging rate.
• We do not always know the top-k tags!
[Figure: KL divergence (0 to 1) vs. number of consecutive tag assignments (0 to 1000)]
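The claim that a random baseline also converges towards zero can be checked with a small simulation. A minimal sketch, assuming tags are drawn uniformly at random from a hypothetical fixed vocabulary of 25 tags at a constant rate, with add-one smoothing so the KL divergence stays finite; the vocabulary size, snapshot interval and smoothing are illustrative choices, not the authors' setup.

```python
import math
import random
from collections import Counter


def smoothed_distribution(counts, vocabulary):
    """Relative tag frequencies with add-one (Laplace) smoothing."""
    total = sum(counts.values()) + len(vocabulary)
    return [(counts[tag] + 1) / total for tag in vocabulary]


def kl_divergence(p, q):
    """KL(P || Q) in nats for two distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def random_baseline_kl(num_assignments=1000, step=50, vocab_size=25, seed=42):
    """KL divergence between consecutive snapshots of a purely random tag stream."""
    rng = random.Random(seed)
    vocabulary = [f"tag{i}" for i in range(vocab_size)]  # hypothetical tags
    counts = Counter()
    previous = None
    divergences = []
    for t in range(1, num_assignments + 1):
        counts[rng.choice(vocabulary)] += 1  # uniform random tag assignment
        if t % step == 0:
            current = smoothed_distribution(counts, vocabulary)
            if previous is not None:
                divergences.append(kl_divergence(current, previous))
            previous = current
    return divergences


kls = random_baseline_kl()
print(f"first: {kls[0]:.4f}, last: {kls[-1]:.4f}")
```

Each new batch of 50 assignments is a shrinking fraction of all assignments, so the divergence between snapshots decays even though no tag is preferred over any other.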
Example
Relative Tag Proportion
[Figure, left: relative tag proportion vs. consecutive tags (user list names), up to 100,000 assignments; legend: bloggers, blogs, business, design, digital, entertainment, internet, it, marketing, mashable, media, my favstar.fm list, news, social, social media, social−media, socialmedia, tech, tech news, techies, technews, technology, tecnologia, twibes−socialmedia, web]
[Figure, right: relative tag proportion vs. consecutive tags (user list names), up to 10,000 assignments; series labelled 1 to 5]
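Relative tag proportions, as plotted in this example, can be computed incrementally from a tag stream. A minimal sketch, assuming the stream is a plain list of tag strings in assignment order; the function name and the example tags are hypothetical.

```python
from collections import Counter


def relative_tag_proportions(stream, tags_of_interest):
    """Return, after each tag assignment, the proportion of each tracked tag."""
    counts = Counter()
    history = []
    for n, tag in enumerate(stream, start=1):
        counts[tag] += 1
        # Proportion of all n assignments so far that used each tracked tag.
        history.append({t: counts[t] / n for t in tags_of_interest})
    return history


stream = ["news", "tech", "news", "media", "news", "tech"]
history = relative_tag_proportions(stream, ["news", "tech"])
print(history[-1])  # {'news': 0.5, 'tech': 0.3333333333333333}
```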
Intuition and Approach
• Stable: some descriptors are more important than others; the ranking of the (top) descriptors remains stable over time
• Less stable: all descriptors are equally important; the ranking of the (top) descriptors changes over time
[Figure: tag distributions P(T) at t_n and t_{n+m} for a stable and a less stable resource]
Requirements
• Rank agreement of the descriptors of a resource over time
• Weighted rank agreement
• Support for non-conjoint lists of descriptors
• Comparison against a random baseline
Rank Biased Overlap (RBO)
[Webber et al., 2010]
• RBO falls in the range [0, 1], where 0 means disjoint and 1 means identical
• p lies between 0 and 1 and determines how steeply the weights decline
• The smaller p, the more top-weighted the metric
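RBO weights the agreement A_d = |S[:d] ∩ T[:d]| / d at each depth d by (1 - p)p^(d-1). A minimal sketch of this sum evaluated to the depth k of the shorter list, assuming tags have already been ranked by frequency (ties broken arbitrarily) and that no list contains duplicates. The truncated sum is a lower bound on the full RBO, so even identical short lists score 1 - p^k; the optional extrapolation (from Webber et al., for equal-length prefixes) assumes the agreement at depth k continues beyond it.

```python
def rank_biased_overlap(s, t, p=0.9, extrapolate=False):
    """Rank Biased Overlap of two rankings, evaluated to depth k = min(len(s), len(t)).

    Truncated form: (1 - p) * sum_{d=1..k} p**(d - 1) * A_d, with
    A_d = |s[:d] & t[:d]| / d. With extrapolate=True, A_k * p**k is added.
    Lists may be non-conjoint but must not contain duplicates.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    k = min(len(s), len(t))
    if k == 0:
        return 0.0
    seen_s, seen_t = set(), set()
    overlap = 0  # |s[:d] & t[:d]|, updated incrementally as d grows
    total = 0.0
    for d in range(1, k + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a joins the overlap if t already ranked it, and vice versa for b.
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        total += p ** (d - 1) * overlap / d
    rbo = (1 - p) * total
    if extrapolate:
        rbo += (overlap / k) * p ** k
    return rbo


print(round(rank_biased_overlap(list("ABCD"), list("CBAD")), 3))  # 0.199
print(round(rank_biased_overlap(list("ABCD"), list("ABCD")), 3))  # 0.344
print(round(rank_biased_overlap(list("ABCD"), list("ABCD"), extrapolate=True), 3))  # 1.0
```

With p = 0.9, the truncated values of about 0.2 for A B C D vs. C B A D and about 0.34 for identical four-item lists would match the RBO_orig values in the tie-correction example if that example used the truncated sum with p = 0.9; this is an inference, not something the slides state.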
Example
[Figure: ranked tag distributions P(T) at t_n and t_{n+m}]
• Overlap at depth 1 = 1
• Overlap at depth 2 = 0.5
• Overlap at depth 3 = 1
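The overlaps in this example can be reproduced by computing the agreement at each depth. A minimal sketch, assuming hypothetical rankings A B C at t_n and A C B at t_{n+m}; these are one pair of rankings consistent with the overlaps of 1, 0.5 and 1, since the actual tags in the figure are not recoverable.

```python
def overlap_at_depths(s, t):
    """Agreement A_d = |top-d(s) & top-d(t)| / d for d = 1..min(len(s), len(t))."""
    k = min(len(s), len(t))
    return [len(set(s[:d]) & set(t[:d])) / d for d in range(1, k + 1)]


ranking_tn = ["A", "B", "C"]   # hypothetical ranking at t_n
ranking_tnm = ["A", "C", "B"]  # hypothetical ranking at t_n+m
print(overlap_at_depths(ranking_tn, ranking_tnm))  # [1.0, 0.5, 1.0]
```

RBO combines these per-depth values as a weighted sum in which depth d receives weight (1 - p)p^(d-1).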
Effect of the Parameter p
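Since depth d contributes with weight (1 - p)p^(d-1), the first d depths together receive (1 - p)(1 + p + ... + p^(d-1)) = 1 - p^d of the total weight. A minimal sketch tabulating this for a few illustrative values of p shows why a smaller p makes the metric more top-weighted; the chosen p values are examples, not the ones used in the study. Note that this is the weight on the per-depth agreement terms; Webber et al. give a separate expression for the weight of the top-d items themselves, which also counts their contribution to deeper agreements.

```python
def weight_of_top_depths(p, d):
    """Share of the total RBO weight assigned to the agreements at depths 1..d."""
    return 1 - p ** d


for p in (0.5, 0.9, 0.98):
    shares = [round(weight_of_top_depths(p, d), 3) for d in (1, 5, 10)]
    print(f"p={p}: depths 1, 1-5 and 1-10 carry {shares} of the weight")
```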
Tie Correction for Rank Biased Overlap
• RBO does not penalize ties
• We want to penalize ties, since they show that users have not agreed on a ranking
• Sum only over those depths which occur in at least one of the two rankings
Example: resources R1 (no ties) and R2 (ties) have the same concordant pairs, (A,D), (B,D) and (C,D)
• R1, no ties (t_n: A B C D; t_{n+m}: C B A D): RBO_orig = 0.2, RBO_mod = 0.2
• R2, ties: RBO_orig = 0.34, RBO_mod = 0.17
[Figure: tag frequency bar charts (tags A, B, C, D) for R1 and R2 at t_n and t_{n+m}]
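The tie correction sums only over depths that occur in at least one of the two rankings. A minimal sketch of that bookkeeping step, assuming tied tags share the smallest rank of their group (standard competition ranking, e.g. 1, 1, 1, 4); it illustrates which depths would be kept, not the authors' full modified RBO, and the tag frequencies below are hypothetical.

```python
def competition_ranks(frequencies):
    """Map each tag to its rank; tied tags share the smallest rank of their group."""
    ordered = sorted(frequencies.items(), key=lambda item: -item[1])
    ranks = {}
    for position, (tag, freq) in enumerate(ordered, start=1):
        previous_tag, previous_freq = ordered[position - 2] if position > 1 else (None, None)
        if freq == previous_freq:
            ranks[tag] = ranks[previous_tag]  # tie: reuse the previous tag's rank
        else:
            ranks[tag] = position
    return ranks


def occurring_depths(ranks_a, ranks_b):
    """Depths that occur as a rank in at least one of the two rankings."""
    return sorted(set(ranks_a.values()) | set(ranks_b.values()))


tied = competition_ranks({"A": 50, "B": 50, "C": 50, "D": 10})
untied = competition_ranks({"A": 80, "B": 60, "C": 40, "D": 10})
print(tied)                            # {'A': 1, 'B': 1, 'C': 1, 'D': 4}
print(occurring_depths(tied, tied))    # [1, 4]
print(occurring_depths(tied, untied))  # [1, 2, 3, 4]
```

When both rankings are heavily tied, few depths remain, so fewer agreement terms contribute to the sum and the score drops, which is the penalizing effect the slide describes.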
Semantic Stabilization on a Resource Level
[Figure: RBO (0 to 1) vs. number of consecutive tag assignments (0 to 4000)]
• Tag distributions of Twitter users become semantically stable between 1k and 2k tag assignments
• The RBO values of random tagging distributions increase more slowly and are significantly lower
Semantic Stabilization on a System Level
• How can we compare the semantic stabilization process in different systems?
• We call a resource description semantically stable after t_{n+m} tag assignments if the RBO value between its tag distributions at t_n and t_{n+m} is greater than or equal to k.
Semantic Stabilization
on a System Level
After 1250 tag assignments, 90% of all resources have a stability above 0.61
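At the system level, each resource's RBO value after a given number of tag assignments is compared against the threshold k. A minimal sketch, assuming the per-resource RBO values are already computed; the function names and the example values are hypothetical, not data from the study. The second function asks the converse question behind a statement such as "90% of all resources have a stability above 0.61": the largest k that a given share of resources reaches.

```python
import math


def share_stable(rbo_values, k):
    """Fraction of resources whose RBO value is greater than or equal to k."""
    return sum(value >= k for value in rbo_values) / len(rbo_values)


def stability_reached_by(rbo_values, share=0.9):
    """Largest k such that at least `share` of the resources have RBO >= k."""
    ordered = sorted(rbo_values, reverse=True)
    # The ceil(share * n)-th highest value is reached by at least that many resources.
    index = math.ceil(share * len(ordered)) - 1
    return ordered[index]


rbo_after_1250 = [0.95, 0.9, 0.88, 0.8, 0.75, 0.7, 0.66, 0.64, 0.62, 0.4]  # hypothetical
print(share_stable(rbo_after_1250, 0.61))    # 0.9
print(stability_reached_by(rbo_after_1250))  # 0.62
```

Repeating this for every number of tag assignments yields a curve per system, which is what makes different systems comparable.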
Empirical Study
Twitter
Medium level of semantic
stability is reached after
1k-2k tag assignments
Empirical Study
Twitter and Delicious
Tag streams in Delicious stabilize faster and reach significantly higher levels of stability than those in Twitter
Empirical Study
Twitter, Delicious and LibraryThing
The same is true for the tag streams of books in LibraryThing
Empirical Study
Random Baseline
Difference between tag and word streams?
What causes semantic stability?
• Simulations based on the epistemic tagging model [Dellschaft and Staab, 2008]
• Use a parameter I as the imitation rate and produce tag distributions for I = 0, 0.1, ..., 1
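A simplified simulation in the spirit of the epistemic model makes the role of I concrete. A minimal sketch, assuming that with probability I a tag is imitated by copying a uniformly drawn earlier assignment (i.e., sampling proportionally to current tag frequencies) and is otherwise drawn from a fixed background-knowledge (BK) distribution, here a hypothetical Zipf-like distribution over 100 tags; this is an approximation for illustration, not Dellschaft and Staab's exact model or the authors' parameterisation.

```python
import random
from collections import Counter


def simulate_tag_stream(imitation_rate, num_assignments=2000, vocab_size=100, seed=0):
    """Generate a tag stream mixing imitation and background knowledge (BK)."""
    rng = random.Random(seed)
    vocabulary = [f"tag{i}" for i in range(vocab_size)]
    # Hypothetical BK: Zipf-like weights, tag i drawn with weight 1 / (i + 1).
    bk_weights = [1 / (i + 1) for i in range(vocab_size)]
    stream = []
    for _ in range(num_assignments):
        if stream and rng.random() < imitation_rate:
            # Imitation: copy a random earlier assignment, i.e. sample
            # proportionally to the current tag frequencies.
            stream.append(rng.choice(stream))
        else:
            stream.append(rng.choices(vocabulary, weights=bk_weights, k=1)[0])
    return stream


for rate in (0.0, 0.5, 1.0):
    stream = simulate_tag_stream(rate)
    print(rate, Counter(stream).most_common(3))
```

At I = 1 every assignment after the first copies an earlier one, so the stream collapses onto a single tag; intermediate values trade off between the BK distribution and this rich-get-richer imitation.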
What causes stability?
Medium levels of semantic stability are reached after 1k to 2k tag assignments
What causes stability?
The same is true if background knowledge (BK) and imitation are combined and BK is dominant
What causes stability?
If imitation and BK are combined and imitation is dominant, higher levels of semantic stability are reached faster
What causes stability?
• A combination of shared background knowledge and imitation behaviour (where imitation is more important) leads to the fastest and highest stabilization.
• Natural language systems show stabilization similar to that of social tagging systems that do not support imitation.
Conclusions & Implications
• An attempt to formalize semantic stability in social streams
• A novel approach to measure and compare the semantic stabilization process in different social streams
Why is that useful?
• Identify social streams (e.g., the tag stream of a URL or the word stream of a hashtag) that are semantically stable
– Extract shared and agreed-upon semantic knowledge from social streams
• Select systems that provide semantically stable streams
References
• D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, HT ’09, pages 359–360, New York, NY, USA, 2009. ACM.
• C. Cattuto. Semiotic dynamics on social tagging communities. The European Physical Journal C – Particles and Fields, 46(2 Supplement):33–37, August 2006.
• A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, Nov. 2009.
• K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT ’08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pages 71–80, New York, NY, USA, 2008. ACM.
• S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006.
• H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, WWW ’07, pages 211–220, New York, NY, USA, 2007. ACM.
• A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. BibSonomy: A social bookmark and publication sharing system. In Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures, pages 87–102, 2006.
• C. T. Kello, G. D. A. Brown, R. Ferrer-i Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010.
• W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28(4):20:1–20:38, Nov. 2010.
Thank you!
Special thanks to my collaborators (2/3 of them are here):
Limitations and Future Work
• RBO measures rank agreement but ignores differences in tag frequencies
• A decay function could weight tag counts
– old tag assignments are less important than new ones
• The number and diversity of users who tag a resource might impact the semantic stabilization process
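The decay idea could be prototyped by weighting each tag assignment by its age before ranking the tags. A minimal sketch, assuming an exponential decay with a hypothetical half-life measured in tag assignments; the slides do not specify the decay function.

```python
from collections import defaultdict


def decayed_tag_weights(stream, half_life=100):
    """Weight each assignment by 0.5 ** (age / half_life); the newest has age 0."""
    weights = defaultdict(float)
    newest = len(stream) - 1
    for position, tag in enumerate(stream):
        age = newest - position
        weights[tag] += 0.5 ** (age / half_life)
    return dict(weights)


def ranking(weights):
    """Tags ordered by decreasing (decayed) weight, e.g. as input to RBO."""
    return sorted(weights, key=weights.get, reverse=True)


stream = ["old"] * 300 + ["new"] * 100
print(ranking(decayed_tag_weights(stream)))  # ['new', 'old']
```

With raw counts "old" (300 assignments) would rank first; with a half-life of 100 assignments the 100 recent "new" assignments outweigh it, and a very large half-life recovers the raw-count ranking.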
Alternatives to RBO
• Unweighted and conjoint measures
– Kendall tau, Spearman rho
• Weighted and conjoint measures
– Weighted Kendall tau
• Unweighted and non-conjoint measures
– Intersection metric
• Weighted and non-conjoint measures
– Cumulative overlap at increasing depths
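Kendall tau illustrates the conjoint requirement: it compares two rankings of the same items by counting concordant and discordant pairs. A minimal sketch of tau-a (no tie handling); applied to the no-ties example A B C D vs. C B A D, it finds the three concordant pairs (A,D), (B,D) and (C,D) plus three discordant pairs, so tau = 0. It rejects non-conjoint lists, the limitation RBO avoids. For tie-aware variants, SciPy's `scipy.stats.kendalltau` is an established implementation.

```python
from itertools import combinations


def kendall_tau_a(ranking_a, ranking_b):
    """Kendall tau-a between two rankings of the same items (no ties)."""
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("Kendall tau requires conjoint rankings of the same items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("at least two items are needed")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # Concordant if both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a(list("ABCD"), list("CBAD")))  # 0.0
```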
Dataset
Categories of Semantically Unstable Resources
• The entity to which a resource refers changes
• The resource (i.e., the website) changes
• The entity/topic to which a resource refers is controversial
– the website refers to a controversial entity/topic on which different viewpoints exist
• External conditions that impact viewpoints on the entity/topic change
– the website remains stable, but the taggers' viewpoint on the entity or topic related to the site changes
Relative Tag Proportion
[Golder and Huberman, 2006]
[Figure: tag distributions at t_n and t_{n+m} for a stable and a less stable resource, followed by the relative tag proportion plots from the earlier example]
KL-Divergence
[Halpin et al., 2007]
• KL divergence between the rank-ordered frequency distributions of the top 25 tags at different time points
[Figure: tag distributions at t_n and t_{n+m} for a stable and a less stable resource]
Power Law
[Cattuto, 2006]
• Is the rank-ordered frequency distribution a power-law distribution?
• Is the frequency y of a tag inversely proportional to its rank r?
[Figure: tag distributions at t_n and t_{n+m}]
• Is it really a power law?
– Very likely yes, according to the maximum likelihood estimator and the Kolmogorov-Smirnov statistic [Clauset et al., 2009]
– Estimate alpha and xmin over some reasonable range
– Compare the power-law fit to the fits of the exponential, the lognormal and the stretched exponential (Weibull) functions; use log-likelihood ratios to indicate which fit is better
– We do not find significant differences between the power-law fit and the lognormal fit
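The first step of the Clauset et al. procedure, estimating the exponent for a given xmin, has a closed-form approximation for discrete data: alpha ≈ 1 + n / Σ ln(x_i / (xmin - 1/2)), summed over the n values x_i ≥ xmin. A minimal sketch of that estimator only; choosing xmin by minimizing the Kolmogorov-Smirnov distance and the likelihood-ratio comparisons with the alternative distributions are not shown, the approximation becomes more accurate as xmin grows, and the tag frequencies below are hypothetical.

```python
import math


def estimate_alpha(frequencies, xmin):
    """Approximate discrete power-law MLE for the exponent alpha (Clauset et al.)."""
    tail = [x for x in frequencies if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    # Each term is positive because x >= xmin > xmin - 0.5.
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1 + len(tail) / log_sum


tag_counts = [120, 64, 40, 33, 20, 18, 11, 9, 7, 6, 5, 3, 2, 2, 1, 1]  # hypothetical
print(f"alpha = {estimate_alpha(tag_counts, xmin=2):.2f}")
```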
RBO
Stabilization going beyond Baseline Stability
Stabilization not going beyond Baseline Stability
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 

WWW2014 Semantic Stability in Social Tagging Streams

  • 1. Semantic Stability in Social Tagging Streams Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman
  • 4. How can we measure semantic stability? How can we compare the semantic stabilization process in different systems? What impacts semantic stability?
  • 5. Measuring Semantic Stability: State of the Art • Relative tag proportions per resource become stable with an increasing number of tag assignments [Golder and Huberman, 2006] • KL divergence of the rank-ordered tag frequency distribution per resource at different time points converges towards zero [Halpin et al., 2007] • Power-law distributions [Cattuto, 2006] – the scale-invariance property ensures that regardless of how large the system grows, the shape of the distribution stays the same
  • 6. Some Limitations • Existing methods do not allow comparing the semantic stabilization process of different systems • They prune tag distributions to the top-k tags – they cannot handle non-conjoint lists of tags • A random tagging process also produces a "stable" description – a tag assignment at time point t+1 has less impact on the tag distribution of a resource than a tag assignment at time point t
  • 7. Example: KL Divergence • KL divergence converges towards zero. • But a random baseline also converges towards zero if we assume a constant tagging rate. • We do not always know the top-k tags! [Figure: KL divergence vs. number of consecutive tag assignments (0 to 1000)]
  • 8. Example: Relative Tag Proportion [Figure, two panels: relative tag proportion vs. number of consecutive tags (user list names); the left panel tracks individual tags such as social media, tech, news, marketing and web, the right panel shows legend entries 1 to 5]
  • 9. Intuition and Approach • Stable: some descriptors are more important than others, and the ranking of the (top) descriptors remains stable over time. • Less stable: all descriptors are equally important, and the ranking of the (top) descriptors changes over time. [Figure: tag distributions P(T) at tn and tn+m for a stable and a less stable resource description]
  • 11. Requirements • Rank agreement of the descriptors of a resource over time • Weighted rank agreement • Support for non-conjoint lists of descriptors • Random baseline
  • 12. Rank-Biased Overlap (RBO) [Webber et al., 2010] • RBO falls in the range [0, 1], where 0 means disjoint and 1 means identical • p lies between 0 and 1 and determines how steeply the weights decline • The smaller p, the more top-weighted the metric
  • 16. Effect of the Parameter p
  • 17. Tie Correction for Rank-Biased Overlap • RBO does not penalize ties • We want to penalize ties since they show that users have not agreed on a ranking • Sum only over those depths which occur in at least one of the two rankings
  • 18. Same concordant pairs: (A,D), (B,D) and (C,D) [Figure: tag frequencies of A, B, C and D at tn and tn+m for two resources, R1 and R2. Without ties, the ranking changes from A, B, C, D to C, B, A, D: RBOorig = 0.2, RBOmod = 0.2. With ties: RBOorig = 0.34, RBOmod = 0.17.]
  • 19. Semantic Stabilization on a Resource Level [Figure: RBO vs. number of consecutive tag assignments (0 to 4000)] • Tag distributions of Twitter users become semantically stable between 1k and 2k tag assignments • The RBO values of random tagging distributions increase more slowly and are significantly lower
  • 20. Semantic Stabilization on a System Level • How can we compare the semantic stabilization process in different systems? • We call a resource description semantically stable after tn+m tag assignments if the RBO value between its tag distributions at tn and tn+m is equal to or greater than k.
  • 21. Semantic Stabilization on a System Level • After 1250 tag assignments, 90% of all resources have a stability above 0.61
  • 22. Empirical Study: Twitter • A medium level of semantic stability is reached after 1k-2k tag assignments
  • 23. Empirical Study: Twitter and Delicious • Tag streams in Delicious stabilize faster and reach significantly higher stability than in Twitter
  • 24. Empirical Study: Twitter, Delicious and LibraryThing • The same is true for tag streams of books in LibraryThing
  • 26. Difference between tag and word streams?
  • 27. What causes semantic stability? • Simulations based on the epistemic tagging model [Dellschaft and Staab, 2008] • Use the parameter I as the imitation rate and produce tag distributions for I = 0, 0.1, ..., 1
  • 28. What causes stability? • Medium levels of semantic stability are reached after 1k-2k tag assignments
  • 29. What causes stability? • The same is true if we combine background knowledge (BK) and imitation and BK is dominant
  • 30. What causes stability? • If imitation and BK are combined and imitation is dominant, higher levels of semantic stability are reached faster
  • 31. What causes stability? • A combination of shared background knowledge and imitation behaviour (where imitation is more important) leads to the fastest and highest stabilization. • Natural language systems show stabilization similar to that of social tagging systems in which no imitation is supported
  • 32. Conclusions & Implications • An attempt to formalize semantic stability in social streams • A novel approach to measure and compare the semantic stabilization process in different social streams. Why is that useful? • Identify social streams (e.g. the tag stream of a URL or the word stream of a hashtag) which are semantically stable – extract shared and agreed-upon semantic knowledge from social streams • Select systems that provide semantically stable streams
  • 33. References • D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, HT '09, pages 359–360, New York, NY, USA, 2009. ACM. • C. Cattuto. Semiotic dynamics on social tagging communities. The European Physical Journal C - Particles and Fields, 46(2 Suppl.):33–37, August 2006. • A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Rev., 51(4):661–703, Nov. 2009. • K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT '08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pages 71–80, New York, NY, USA, 2008. ACM. • S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006. • H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 211–220, New York, NY, USA, 2007. ACM. • A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. BibSonomy: A social bookmark and publication sharing system. In Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures, pages 87–102, 2006. • C. T. Kello, G. D. A. Brown, R. Ferrer-i-Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010. • W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Trans. Inf. Syst., 28(4):20:1–20:38, Nov. 2010.
  • 34. Thank you! Special thanks to my collaborators (2/3 of them are here):
  • 35. Limitations and Future Work • RBO measures rank agreement but ignores differences in the frequencies • A decay function to weight tag counts – old tag assignments are less important than new ones • The number and diversity of users who tag a resource might impact the semantic stabilization process
  • 36. Alternatives to RBO • Unweighted and conjoint measures – Kendall tau, Spearman rho • Weighted and conjoint measures – weighted Kendall tau • Unweighted and non-conjoint measures – intersection metric • Weighted and non-conjoint measures – cumulative overlap at increasing depths
  • 38. Categories of Semantically Unstable Resources • The entity to which a resource refers changes • The resource (e.g. a website) changes • The entity or topic to which a resource refers is controversial – the website refers to a controversial entity or topic on which different viewpoints exist • External conditions which impact viewpoints on the entity or topic change – the website remains stable, but the taggers' viewpoints on the related entity or topic change
  • 39. Relative Tag Proportion [Golder and Huberman, 2006] [Figure: relative tag proportions at tn and tn+m for a stable and a less stable resource]
  • 40. Relative Tag Proportion [Golder and Huberman, 2006] [Figure: the same two panels as slide 8, relative tag proportion vs. number of consecutive tags (user list names)]
  • 41. KL Divergence [Halpin et al., 2007] • KL divergence between the rank-ordered frequency distributions of the top 25 tags at different time points [Figure: tn and tn+m for a stable and a less stable resource]
  • 43. Power Law [Cattuto, 2006] • Is the rank-ordered frequency distribution a power-law distribution? • Is the frequency y of a tag inversely proportional to its rank r? [Figure: distributions at tn and tn+m]
  • 44. Power Law [Cattuto, 2006] • Is it really a power law? – Very likely yes, according to the maximum likelihood estimator and the Kolmogorov-Smirnov statistic [Clauset et al., 2009] – Estimate alpha and xmin over some reasonable range – Compare the power-law fit to the fits of the exponential, the lognormal and the stretched exponential (Weibull) functions, and use the log-likelihood ratios to indicate which fit is better – We do not find significant differences between the power-law fit and the lognormal fit
  • 47. Stabilization not going beyond baseline stability
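Slide 7 notes that KL divergence also converges towards zero for a random tagging process with a constant tagging rate. The Python sketch below illustrates that baseline under assumptions that do not come from the slides: tags are drawn uniformly at random from a hypothetical vocabulary of 25 tags, and KL divergence is computed between the cumulative tag distributions after n and n + 100 assignments (a simplification of the rank-ordered top-25 comparison in [Halpin et al., 2007]). Because counts only grow, every tag seen after n assignments is still present after n + 100, so the divergence stays finite.

```python
import math
import random
from collections import Counter


def kl_divergence(p: Counter, q: Counter) -> float:
    """KL(P || Q) in bits for two tag-count Counters; assumes every tag counted in P also occurs in Q."""
    n_p, n_q = sum(p.values()), sum(q.values())
    return sum(
        (c / n_p) * math.log2((c / n_p) / (q[tag] / n_q))
        for tag, c in p.items()
        if c > 0
    )


def random_baseline_kl(n_total: int = 5000, step: int = 100, vocab_size: int = 25, seed: int = 42):
    """(n, KL) pairs comparing cumulative distributions after n - step and n uniform random assignments."""
    rng = random.Random(seed)
    vocabulary = [f"tag{i}" for i in range(vocab_size)]  # hypothetical tag vocabulary
    counts: Counter = Counter()
    previous = None
    series = []
    for n in range(1, n_total + 1):
        counts[rng.choice(vocabulary)] += 1  # constant tagging rate, no agreement among taggers
        if n % step == 0:
            if previous is not None:
                series.append((n, kl_divergence(previous, counts)))
            previous = counts.copy()
    return series


if __name__ == "__main__":
    for n, kl in random_baseline_kl()[::10]:
        print(f"after {n:5d} assignments: KL = {kl:.5f} bits")
```

Each new batch of 100 assignments is a shrinking fraction of the cumulative counts, so the printed divergence drops towards zero even though the simulated taggers never agree on anything, which is the limitation the slide points out.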
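Slides 12, 20 and 21 can be sketched in Python as follows. `rbo_ext` follows the extrapolated RBO of Webber et al. (2010) for two rankings cut to the same depth D: RBO_ext = (X_D / D) p^D + (1 - p) Σ_{d=1..D} p^(d-1) X_d / d, where X_d is the number of tags shared by the two top-d prefixes, so identical rankings score 1 and disjoint ones 0. `fraction_stable` applies the system-level criterion of slide 20: a resource counts as stable if the RBO between its rankings at tn and tn+m is at least k. The function names, the default p = 0.9 (not the value used in the study) and the alphabetical tie-breaking in `ranking` are illustrative assumptions; in particular, this sketch does not implement the tie correction of slide 17.

```python
from typing import Hashable, Iterable, List, Mapping, Sequence, Tuple


def ranking(counts: Mapping[Hashable, int]) -> List[Hashable]:
    """Order tags by descending frequency; ties are broken by str(tag), an arbitrary assumption."""
    return sorted(counts, key=lambda tag: (-counts[tag], str(tag)))


def rbo_ext(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Extrapolated rank-biased overlap of two rankings, both cut to their common depth D.

    Assumes each ranking lists every tag at most once (no ties, no duplicates).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    depth = min(len(s), len(t))
    if depth == 0:
        return 0.0  # nothing to compare
    seen_s, seen_t = set(), set()
    overlap = 0         # X_d: number of tags shared by the two top-d prefixes
    weighted_sum = 0.0  # running sum of p^(d-1) * X_d / d
    for d in range(1, depth + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            # a joins the overlap if t already listed it; b joins if s already listed it
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        weighted_sum += p ** (d - 1) * overlap / d
    # geometric weights up to depth D, plus the tail extrapolated from the agreement X_D / D
    return (1.0 - p) * weighted_sum + (overlap / depth) * p ** depth


def fraction_stable(
    pairs: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]],
    k: float,
    p: float = 0.9,
) -> float:
    """Share of resources whose rankings at tn and tn+m have RBO >= k (the criterion on slide 20)."""
    scores = [rbo_ext(before, after, p) for before, after in pairs]
    if not scores:
        raise ValueError("no resources given")
    return sum(score >= k for score in scores) / len(scores)


if __name__ == "__main__":
    r_tn = ranking({"social media": 40, "tech": 25, "news": 10, "web": 5})
    r_tnm = ranking({"social media": 90, "news": 50, "tech": 45, "web": 8})
    print(r_tn, r_tnm, round(rbo_ext(r_tn, r_tnm), 3))
    print(fraction_stable([(r_tn, r_tnm), (["a", "b"], ["c", "d"])], k=0.61))
```

Evaluating `fraction_stable` over a grid of thresholds k and time points t would give the kind of surface summarized on slide 21.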
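The concordant pairs listed on slide 18 can be checked with a short Python sketch. It also computes Kendall's tau-a, one of the unweighted, conjoint alternatives listed on slide 36, as (concordant - discordant) / (n(n - 1) / 2). Both functions are illustrative and assume that the two rankings contain exactly the same items and no ties, which is precisely why such conjoint measures do not fit tag rankings in general.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def concordant_pairs(before: Sequence[Hashable], after: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """Pairs (x, y) where x precedes y in both rankings; assumes identical item sets and no ties."""
    pos_after = {item: i for i, item in enumerate(after)}
    # combinations() keeps the order of `before`, so x always precedes y there
    return [(x, y) for x, y in combinations(before, 2) if pos_after[x] < pos_after[y]]


def kendall_tau_a(before: Sequence[Hashable], after: Sequence[Hashable]) -> float:
    """Kendall's tau-a = (concordant - discordant) / total pairs, for conjoint rankings without ties."""
    n = len(before)
    total = n * (n - 1) // 2
    if total == 0:
        raise ValueError("need at least two items")
    concordant = len(concordant_pairs(before, after))
    discordant = total - concordant  # without ties, every pair is one or the other
    return (concordant - discordant) / total


print(concordant_pairs(["A", "B", "C", "D"], ["C", "B", "A", "D"]))  # [('A', 'D'), ('B', 'D'), ('C', 'D')]
print(kendall_tau_a(["A", "B", "C", "D"], ["C", "B", "A", "D"]))     # 0.0
```

Three pairs agree and three are reversed, so tau-a is 0. Tau-a weights a swap at the top of the ranking exactly like one at the bottom, which is why slide 36 lists it as unweighted.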

Editor's Notes

  1. There are many social media applications which allow users to tag and talk… From these distributed user activities… Folksonomies are collaboratively generated, fuzzy categorization schemas. Ontologies are formally defined classification schemas. They are usually created a priori by a group of experts who conceptualize a domain of interest. Ideally, this conceptualization represents the agreed-upon and shared semantic view of the domain of interest. If the domain of interest is huge and constantly changing, the manual construction of ontologies fails. Therefore there has been a lot of research in the Semantic Web (SW)… Since ontologies represent shared and agreed-upon semantics, we need to ask ourselves to what extent the resource descriptions which emerge from social media streams represent shared and stable semantic descriptions of resources.
  2. Better example: resources which are very subjective. Ask 100 people and everyone describes them differently.
  3. Some resources change over time (and by resource I mean everything that can have a URI), and therefore their semantic description also changes. People would have described him as a bodybuilder in the '70s, as an actor in the '90s and as a politician nowadays. Other resources don't change, but people's viewpoints on them may change, or people may have contradicting viewpoints on the same resource. Therefore, again, the descriptions may stabilize but also destabilize over time.
  4. Other resources don't change, and people's viewpoints on them don't change either. Therefore, as people keep tagging them, the description will converge to a stable and shared semantic description. In our work we are interested in measuring semantic stability, comparing the semantic stabilization process in different social media (SM) systems and exploring the factors that might impact semantic stability.
  5. Previous researchers also recognized the need to explore the semantic stability of resource descriptions which emerge when a large number of users tag a resource, since this stability is a prerequisite for learning ontologies from folksonomies. Three methods have been proposed to measure semantic stability. However, as we have shown in our paper, those methods have certain limitations; to overcome them we present a novel approach for measuring semantic stability.
  6. Existing methods have certain limitations. They do not allow… They operate on a resource level.
  7. Their measure converges by definition towards 0 if the number of tag assignments per time bin remains stable over time. Only if the number of tags assigned to a resource varies a lot across time bins can convergence be interpreted as a sign of semantic stability. But a single tag assignment in month j always has more impact on the shape of the distribution than a single tag assignment in month j+1.
  8. If we look at the relative proportions of the top-k tags (as Scott Golder and Bernardo did), we see that…
  9. To summarize: Kello et al. provide a good critical reflection on the informativeness of scaling laws. For example, researchers have shown that even random sequences of characters exhibit Zipf's law. So there is an ongoing discussion about scaling laws, since idiosyncratic ways of producing power laws exist. Further, the question of what produces a power law remains open. Additive summation of components as well as systems dominated by multiplicative interactions are known to produce heavy tails.
  10. Given these limitations, we thought we should come up with an alternative approach for measuring semantic stability in social streams. The intuition behind our approach is the following: for a given resource, we observe after tn and tn+m tag assignments a ranked list of tags which reflects by how many users each tag was assigned to the resource. We consider a resource description semantically stable if… It is unstable if all descriptors are equally important or unimportant. This is a sign of disagreement.
  11. Our intuition of semantic stability incorporates two aspects: implicit consensus and stability. Stability means tolerance to perturbations over time, and implicit consensus means that users agree on the relative importance of tags: some tags are picked much more frequently than others.
  12. From this intuition we can infer some requirements for a new measure.
  13. To operationalize our intuition of semantic stability and meet these requirements, we propose to use a modified version of RBO. RBO measures the agreement between two ranked lists of items. It is based on the cumulative set overlap and can handle non-conjoint rankings. The set overlap at each depth is weighted by a geometric sequence, providing both top-weightedness and convergence.
  14. Consider this resource, for which we observe the following two tag distributions after tn and tn+m tag assignments. If we want to assess the stability of the resource description, we first compare the overlap at depth 1.
  15. And so on. This is the cumulative set overlap. In the cumulative set overlap, the weight of an element depends on the total number of depths D: the element at rank 1 is counted at all D depths and has weight D/D, the element at rank 2 has weight (D-1)/D, and so on. In addition, we use p, which defines how fast the weights decline.
  16. In addition, we have the parameter p, which defines a convergent series of weights (i.e., a series of weights whose sum is bounded). RBO weights the proportional overlap at each depth using this convergent series. Therefore p ensures that the infinite tail of tags does not dominate the finite head, which is important since we have heavy-tailed distributions. The smaller p, the more top-weighted the metric.
  17. RBO does not penalize ties.
  18. We have two resources, R1 and R2. For both resources we observe their tag distributions after tn and tn+m tag assignments. For R1 we have three concordant pairs: A has a higher rank than D after tn and also after tn+m tag assignments, and the same is true for (B,D) and (C,D). The description of R1 contains ties; the description of R2 does not. But we find the same concordant pairs between the two tag distributions of R2.
  19. Here it matters how we assign ranks: either C=1, B=1, A=1 and D=4, or C=1/3, B=1/3, A=1/3 and D=4. The first variant produces the same result as the original measure. The second variant produces 0.24, so it does not really penalize ties either. Alternatively, we could sum only over those depths which occur in the second ranking. Then we would penalize the emergence of ties over time, but not the existence of ties.
  21. Now we have a measure that operationalizes our intuition of semantic stability, and we can use it to explore the stabilization process of individual resources. This figure shows the stabilization of the descriptions of Twitter users. On Twitter people can tag… One can see that between 1k and 2k tag assignments, high or medium levels of stability are reached, depending on which resource we look at. We can also see that a random baseline process does not really stabilize. If we only looked at the shape of the distribution, for example, a random baseline process would also stabilize, since it would produce a flat distribution which would stay flat over time.
  22. We have now seen how we can use our approach on a resource level; however, it is still unclear... To address this problem, we introduced a flexible definition of semantic stability which allows us to compare the semantic stability of resource streams stemming from different social tagging systems.
  23. This definition allows us to explore the semantic stabilization process per system by looking at the proportion of resources that have stabilized according to our parameters k and t. This figure shows the percentage of resources (in this case heavily tagged Twitter users) stabilized at time t with stability threshold k. For example, point P indicates that after 1250 tag assignments, 90% of resources have an RBO value of 0.61 or higher. The contour lines connect points at which the function has a constant value; the values written on the lines give the percentage of stabilized resources f.
  24. We used this approach to compare the semantic stabilization process in different social media systems. First, let's look at tag streams in Twitter, which are streams of user list names.
  25. Let's compare the stabilization process in Twitter with the stabilization process in Delicious. We can see that resource descriptions in Delicious stabilize faster and reach significantly higher levels of semantic stability.
  26. Next we looked at the semantic stability of book descriptions in LibraryThing.
  27. We also added a random baseline, and we can see that it does not stabilize. This shows that our approach is able to differentiate between real semantic stabilization and the stability we observe for random tagging. If we only looked at the shape of the distribution, for example, a random tagging process would also stabilize, since the relatively flat list of tags would stay relatively flat. It is important that…
  28. It is actually surprising that tag streams of Twitter users stabilize that much. We wanted to know whether this is because people TAG other people, or whether we would observe the same stabilization if people just TALKED about other people. We created a dataset of tweets in which a random set of users is mentioned, that is, a stream of tweets in which people talk about users. We used the words in these tweets as descriptors of the person. One can see a stabilization process similar to the one observed when people tag other users. This suggests that a medium level of stability can also be explained by the properties of natural language.
  29. This leads me to our final question… The epistemic tagging model is a generative model which includes both background knowledge (BK) and the influence of previously assigned tags. Since BK is encoded in natural language (NL), we cannot distinguish between NL and BK at this stage.
  30. Klaas and Steffen showed that a mixture of BK and imitation is best for reproducing the shape of the tag frequency distribution. However, they focus on reproducing the shape of the rank-ordered frequency distribution, while we explore the stabilization process over time. Further, previous research considered the sharp drop between ranks 7 and 10 a typical characteristic of tagging streams which distinguishes them from word frequency distributions. However, Bollen and Halpin's work suggests that this might only be caused by the user interface, which suggests up to 10 tags. If no tags are suggested, there is no sharp drop.
  31. First we consider a tagging model where people ONLY rely on their BK. This model reflects the properties of NL since we use Wikipedia as BK.
  32. We then add a bit of imitation. Now people imitate others 30% of the time and use their BK 70% of the time. We do not see any differences.
  33. Next we use a 70% imitation rate and 30% BK. We see faster and higher stabilization.
  34. Finally, we use 100% imitation and observe that in this case no stabilization happens, since people fail to introduce new tags if they don't use their BK at all. Overall, our empirical results as well as our simulation results suggest that…
  35. To sum up, where do we go from here? I have presented a simple method for measuring semantic stability in social streams which is flexible, can easily be adapted to other count or frequency functions, and can be used on both the resource and the system level. I have shown that existing methods have certain limitations and that the notion of semantic stability requires both concepts: stability and implicit consensus. So why should we care about semantic stability? First, because it helps us learn something about the nature of resources on the web. Second, it helps to identify streams which are in a stable phase. Finally, it helps to identify applications with a community which produces stable descriptions.
  36. Empirical results as well as simulation results show that the stabilization process benefits from combining …
  37. Some interesting avenues for future work
  38. The problem with cumulative overlap is that, if we assume a long and potentially infinite tail, the tail will dominate the head. RBO weights the proportional overlap at each depth by a convergent series of weights (i.e., a series of weights whose sum is bounded).
  39. We used this method on different social resource streams. Here you can see a plot for one Twitter user. I use the Twitter dataset here since it was the starting point for this project; we thought the dataset was interesting. One can see that the lines indeed become straight for many sample users. Since I did this work during my internship at HP, I had the chance to discuss the plots with Bernardo, and he said that it looks stable, but less stable than Delicious. So how can we quantify that?
  40. A power law is a functional relationship between two quantities, where one quantity varies as a power of another. The scale-invariance property of power laws makes them interesting, since it suggests that no matter how much the system grows, the shape of the distribution remains the same. The probability of measuring a particular value of some quantity varies (inversely) as a power of that value. Here, the probability of observing a tag with frequency Y varies as a power of its rank: there are only a few very frequent tags but many less frequent ones, so the probability of observing a high-frequency tag is low. The cumulative distribution function (CDF) describes the probability that a random variable X will be found at a value less than or equal to x. The complementary cumulative distribution function (CCDF) asks how often the random variable is above a particular level; such cumulative distributions are sometimes also called rank/frequency distributions. Cumulative distributions with a power-law form are sometimes said to follow Zipf's law or a Pareto distribution, after two early researchers. "Zipf's law" and "Pareto distribution" are effectively synonymous with "power-law distribution". Zipf's law and the Pareto distribution differ from one another in the way the cumulative distribution is plotted: Zipf made his plots with x on the horizontal axis and P(x) on the vertical one; Pareto did it the other way around. This causes much confusion in the literature, but the data depicted in the plots are of course identical.
  41. Empirical power-law distributions hold only approximately or over a limited range.
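Slide 44 and notes 40 and 41 refer to fitting the power-law exponent by maximum likelihood [Clauset et al., 2009]. A minimal Python sketch of the continuous estimator alpha = 1 + n / Σ ln(x_i / x_min), applied to synthetic data with a known exponent; the sample generator, the function names and all parameters are assumptions for illustration. Tag frequencies are discrete, and Clauset et al. give a discrete approximation that uses x_min - 1/2 in place of x_min inside the logarithm. Choosing x_min via the Kolmogorov-Smirnov statistic and comparing against lognormal or exponential fits with log-likelihood ratios, as described on slide 44, is not covered here (the third-party `powerlaw` package implements that workflow).

```python
import math
import random
from typing import List, Sequence, Tuple


def power_law_alpha_mle(values: Sequence[float], x_min: float) -> Tuple[float, float]:
    """Continuous MLE of the power-law exponent for observations >= x_min, with its standard error."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need observations strictly above x_min")
    alpha = 1.0 + len(tail) / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(len(tail))


def sample_power_law(n: int, alpha: float, x_min: float, seed: int = 0) -> List[float]:
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= x_min, by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - U lies in (0, 1], so the base of the power is never zero
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


alpha_hat, std_err = power_law_alpha_mle(sample_power_law(20_000, alpha=2.5, x_min=1.0), x_min=1.0)
print(f"alpha = {alpha_hat:.3f} +/- {std_err:.3f}")
```

As note 41 warns, a good estimate of alpha alone does not show that the data follow a power law; the goodness-of-fit test and the comparison with alternative distributions remain necessary.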
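Slide 27 and notes 29 to 34 describe simulations in which an imitation rate I mixes imitation of earlier tags with background knowledge (BK). The following Python sketch is a deliberately simplified stand-in for the epistemic tagging model of Dellschaft and Staab (2008), not a reimplementation of it: the BK distribution is a hypothetical Zipf-like distribution over an invented vocabulary rather than Wikipedia, and all names and defaults are assumptions. Its output streams could be turned into rankings at tn and tn+m and compared with RBO.

```python
import itertools
import random
from collections import Counter
from typing import List


def simulate_tag_stream(
    n_assignments: int,
    imitation_rate: float,
    vocab_size: int = 1000,
    zipf_exponent: float = 1.0,
    seed: int = 0,
) -> List[str]:
    """Simplified imitation/background-knowledge tagging process (hypothetical, not the original model).

    With probability `imitation_rate`, a simulated user copies an earlier tag assignment of the resource,
    which picks each tag in proportion to its frequency so far. Otherwise the tag is drawn from a Zipf-like
    background-knowledge (BK) distribution over a hypothetical vocabulary w1..wN. The first tag always
    comes from BK because there is nothing to imitate yet.
    """
    rng = random.Random(seed)
    vocabulary = [f"w{rank}" for rank in range(1, vocab_size + 1)]
    cum_weights = list(itertools.accumulate(1.0 / rank ** zipf_exponent for rank in range(1, vocab_size + 1)))
    stream: List[str] = []
    for _ in range(n_assignments):
        if stream and rng.random() < imitation_rate:
            tag = rng.choice(stream)  # imitation of an earlier assignment
        else:
            tag = rng.choices(vocabulary, cum_weights=cum_weights, k=1)[0]  # background knowledge
        stream.append(tag)
    return stream


for rate in (0.0, 0.3, 0.7, 1.0):
    counts = Counter(simulate_tag_stream(2000, rate))
    print(f"I = {rate}: {len(counts)} distinct tags, top 3 = {counts.most_common(3)}")
```

With I = 1, every assignment after the first copies an earlier one, so the stream contains a single tag; this mirrors note 34, where pure imitation fails to introduce new tags.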