WWW2014 Semantic Stability in Social Tagging Streams

  • There are many social media apps which allow users to tag and talk about resources. From these distributed user activities, folksonomies emerge: collaboratively generated, fuzzy categorization schemas. Ontologies, in contrast, are formally defined classification schemas. They are usually constructed beforehand by a group of experts who conceptualize a domain of interest; ideally this conceptualization represents the agreed-upon and shared semantic view on the domain. If the domain of interest is huge and constantly changing, the manual construction of ontologies fails. Therefore there has been a lot of research in the Semantic Web community… Since ontologies represent shared and agreed-upon semantics, we need to ask ourselves to what extent the descriptions of resources which emerge from social media streams represent shared and stable semantic descriptions of resources.
  • A better example: resources which are very subjective. Ask 100 people, and each of them will describe such a resource differently.
  • Some resources change over time (and by resource I mean everything that can have a URI), and therefore their semantic description changes too. People would have described Arnold Schwarzenegger as a bodybuilder in the 1970s, as an actor in the 1990s, and as a politician nowadays.
    Other resources don't change, but people's viewpoints on them may change, or people may hold contradicting viewpoints on the same resource. So again, the descriptions may stabilize but also destabilize over time.
  • Still other resources don't change, and people's viewpoints on them don't change either. Over time, as people keep tagging them, the description converges to a stable and shared semantic description. In our work we are interested in measuring semantic stability, comparing the semantic stabilization process in different social media systems, and exploring the factors that might impact semantic stability.
  • Previous researchers also recognized the need to explore the semantic stability of the resource descriptions which emerge when a large number of users tag a resource, since this stability is a prerequisite for learning ontologies from folksonomies.
    Three methods have been proposed to measure semantic stability. However, as we show in our paper, those methods have certain limitations; to overcome them we present a novel approach for measuring semantic stability.
  • Existing methods have certain limitations. They do not allow comparing the semantic stabilization process of different systems, and they operate on a resource level.
  • Their measure converges towards 0 by definition if the number of tag assignments remains stable over time. Only if the number of tags assigned to a resource varies a lot across time bins can convergence be interpreted as a sign of semantic stability. A single tag assignment in month j always has more impact on the shape of the distribution than a single tag assignment in month j+1.
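As an illustration of this limitation, here is a minimal Python sketch (with a made-up 25-tag vocabulary) showing that the KL divergence between consecutive snapshots of a cumulative tag distribution drifts toward zero even under uniformly random tagging, simply because each additional tag assignment moves the cumulative distribution less than the previous one:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) between two rank-ordered tag frequency distributions."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

rng = np.random.default_rng(42)
counts = np.zeros(25)                 # 25 possible tags, all equally likely
previous = None
for t in range(1, 1001):
    counts[rng.integers(25)] += 1     # one uniformly random tag assignment
    if t % 200 == 0:
        snapshot = np.sort(counts)[::-1]   # rank-ordered frequencies
        if previous is not None:
            print(t, round(kl_divergence(snapshot, previous), 4))
        previous = snapshot.copy()
```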
  • If we look at the relative proportions of the top k tags (as Scott Golder and Bernardo Huberman did), we see that…
  • To summarize: Kello et al. provide a good critical reflection on the informativeness of scaling laws. For example, researchers have shown that random sequences of characters also exhibit Zipf's law. So there is an ongoing discussion about scaling laws, since idiosyncratic ways of producing power laws exist. Further, the question of what produces a power law remains open: additive summation of components as well as systems dominated by multiplicative interactions are known to produce heavy tails.
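The random-sequence observation is easy to reproduce. A minimal sketch, assuming the classic "monkey typing" setup (uniformly random characters with space as word delimiter); the resulting rank-frequency distribution is roughly straight on log-log axes even though the "words" carry no meaning, which is why a power-law shape alone is weak evidence of semantic stability:

```python
import numpy as np
from collections import Counter

# "Monkey typing": uniformly random characters, space as word delimiter.
rng = np.random.default_rng(1)
chars = rng.choice(list("abcde "), size=200_000)
words = "".join(chars).split()
ranked = sorted(Counter(words).values(), reverse=True)

# The rank-frequency curve looks Zipf-like despite total randomness.
for rank in (1, 10, 100, 1000):
    if rank <= len(ranked):
        print(rank, ranked[rank - 1])
```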
  • Given these limitations, we came up with an alternative approach for measuring semantic stability in social streams. The intuition behind our approach is the following: for one given resource we observe, after tn and tn+m tag assignments, a ranked list of tags which reflects how many users assigned each tag to the resource. We consider a resource description semantically stable if the ranking of its (top) descriptors remains stable over time, and unstable if all descriptors are equally important or unimportant, which is a sign of disagreement.
  • Our intuition of semantic stability incorporates two aspects: implicit consensus and stability. Stability means tolerance to perturbations over time; implicit consensus means that users agree on the relative importance of tags, i.e. some tags are picked much more frequently than others.
  • From this intuition we can derive some requirements for a new measure.
  • To operationalize our intuition of semantic stability and meet these requirements, we propose a modified version of Rank-Biased Overlap (RBO). RBO measures the agreement between two ranked lists of items. It is based on cumulative set overlap and can handle non-conjoint rankings. The set overlap at each rank is weighted by a geometric sequence, providing both top-weightedness and convergence.
  • Consider this resource, for which we observe the following two tag distributions after tn and tn+m tag assignments. To assess the stability of the resource description, we first compare the overlap at depth 1.
  • And so on: this is the cumulative set overlap. In cumulative set overlap, the weight of the overlap at depth d depends on the total number of depths D: the element at rank 1 has weight D/D, the element at rank 2 has weight (D-1)/D, and so forth. In addition we use a parameter p, which defines how fast the weights decline.
  • The parameter p defines a convergent series of weights (i.e., a series of weights whose sum is bounded). RBO biases the proportional overlap at each depth by this convergent series. p thus ensures that the infinite tail of tags does not dominate the finite head, which is important since we deal with heavy-tailed distributions.
    The smaller p, the more top-weighted the metric.
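A minimal sketch of plain RBO as defined by Webber et al., truncated at a finite depth (the function and variable names here are mine):

```python
def rbo(ranking_a, ranking_b, p=0.9, max_depth=100):
    """Rank-Biased Overlap (Webber et al., 2010), truncated at max_depth.

    RBO = (1 - p) * sum over depths d of p^(d-1) * A_d, where A_d is the
    proportional overlap of the two rank prefixes at depth d. It handles
    non-conjoint lists, and a smaller p makes the metric more top-weighted.
    """
    seen_a, seen_b, score = set(), set(), 0.0
    for d in range(1, max_depth + 1):
        if d <= len(ranking_a):
            seen_a.add(ranking_a[d - 1])
        if d <= len(ranking_b):
            seen_b.add(ranking_b[d - 1])
        score += p ** (d - 1) * len(seen_a & seen_b) / d   # weighted A_d
    return (1 - p) * score

# Tags ranked by how many users assigned them, after tn and tn+m assignments:
print(rbo(["tech", "news", "media", "web"],
          ["tech", "media", "news", "blogs"], p=0.9))
```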
  • RBO does not penalize ties.
  • Consider two resources R1 and R2. For both resources we observe their tag distributions after tn and tn+m tag assignments. Looking at R1, we find three concordant pairs: A has a higher rank than D after tn and also after tn+m tag assignments, and the same is true for (B,D) and (C,D). The description of R1 contains ties; the description of R2 does not. Yet we find the same concordant pairs between the two tag distributions of R2.
  • Here it makes a difference how we rank ties. If we say C=1, B=1, A=1 and D=4, we produce the same result as the original measure. If instead we say C=1/3, B=1/3, A=1/3 and D=4, we produce 0.24, which still does not really penalize ties. Alternative: we could sum only over those depths which occur in the second ranking. Then we would penalize the emergence of ties over time, but not the existence of ties.
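The following sketch is one possible reading of the tie correction the talk settles on (the slides say: sum only over depths which occur in at least one of the two rankings). Tied tags share the minimal rank of their group, and the agreement term is bounded by assumption so that a tied group cannot push the overlap above 1. This is illustrative, not the authors' exact formula:

```python
def ranks_with_ties(freqs):
    """Map tag -> rank, where equally frequent tags share the minimal
    rank of their tied group (e.g. A=1, B=1, C=1, D=4)."""
    ordered = sorted(freqs.items(), key=lambda kv: -kv[1])
    ranks, current, prev_freq = {}, 0, None
    for position, (tag, freq) in enumerate(ordered, start=1):
        if freq != prev_freq:
            current, prev_freq = position, freq
        ranks[tag] = current
    return ranks

def rbo_tie_corrected(freqs_tn, freqs_tnm, p=0.9):
    """Sum only over depths that occur as a rank in at least one of the
    two tie-aware rankings; skipped depths lower the score, which is the
    intended penalty for ties."""
    r1, r2 = ranks_with_ties(freqs_tn), ranks_with_ties(freqs_tnm)
    score = 0.0
    for d in sorted(set(r1.values()) | set(r2.values())):
        prefix1 = {t for t, r in r1.items() if r <= d}
        prefix2 = {t for t, r in r2.items() if r <= d}
        # Bounded agreement, so a tied group cannot push overlap above 1.
        agreement = len(prefix1 & prefix2) / max(len(prefix1), len(prefix2))
        score += p ** (d - 1) * agreement
    return (1 - p) * score

# Identical rankings WITH ties score lower than identical rankings without:
print(rbo_tie_corrected({"A": 30, "B": 30, "C": 30, "D": 10},
                        {"A": 30, "B": 30, "C": 30, "D": 10}))  # ~0.17
print(rbo_tie_corrected({"A": 40, "B": 30, "C": 20, "D": 10},
                        {"A": 40, "B": 30, "C": 20, "D": 10}))  # ~0.34
```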
  • So now we have a measure that operationalizes our intuition of semantic stability, and we can use it to explore the stabilization process of individual resources. This figure shows the stabilization of the descriptions of Twitter users. On Twitter, people can tag other users via list names. One can see that between 1k and 2k tag assignments, high or medium levels of stability are reached, depending on which resource we look at. We can also see that a random baseline process does not really stabilize. If we only looked at the shape of the distribution, a random baseline process would also appear to stabilize, since it would produce a flat distribution which continues to be flat over time.
  • We have now seen how to use our approach on a resource level; however, it is still unclear how to compare systems. To address this problem we introduce a flexible definition of semantic stability which allows us to compare the semantic stability of different resource streams stemming from different social tagging systems.
  • This definition allows us to explore the semantic stabilization process per system by looking at the proportion of resources that have stabilized according to our parameters k and t. This figure shows the percentage of resources (in this case heavily tagged Twitter users) stabilized at time t with stability threshold k. For example, point P indicates that after 1250 tag assignments, 90% of resources have an RBO value of 0.61 or higher. The contour lines illustrate the curves along which the function has constant values; the values depicted on the lines represent the percentage of stabilization f.
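A minimal sketch of this system-level definition, with hypothetical per-resource RBO values:

```python
def fraction_stabilized(rbo_values, k):
    """Share of resources whose RBO between tn and tn+m is >= k."""
    return sum(v >= k for v in rbo_values) / len(rbo_values)

# Hypothetical RBO values of five resources after 1250 tag assignments:
rbo_after_1250 = [0.93, 0.74, 0.66, 0.61, 0.35]
print(fraction_stabilized(rbo_after_1250, k=0.61))   # -> 0.8
```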
  • We used this approach to compare the semantic stabilization process in different social media systems. First, let's look at tag streams in Twitter, where tag streams are streams of user list names.
  • Let's compare the stabilization process in Twitter with the stabilization process in Delicious. We can see that resource descriptions in Delicious stabilize faster and reach significantly higher levels of semantic stability.
  • Next we looked at the semantic stability of book descriptions in LibraryThing.
  • We also added a random baseline, and we can see that it does not stabilize. This shows that our approach is able to differentiate between real semantic stabilization and the apparent stability we observe for random tagging. If we only looked at the shape of the distribution, a random tagging process would also appear to stabilize, since its relatively flat list of tags would stay relatively flat. This differentiation is important.
  • It is actually surprising that tag streams of Twitter users stabilize that much. We wanted to know whether this is because people TAG other people, or whether we would observe the same stabilization if people just TALKED about other people. We created a dataset of tweets in which a random set of users is mentioned, i.e. a stream of tweets where people talk about users, and used the words in these tweets as descriptors of the mentioned person. One can see a similar stabilization process to when people tag other users. This suggests that a medium level of stability can also be explained by the properties of natural language.
  • This leads me to our final question: what causes semantic stability?
    The epistemic tagging model is a generative model which includes both background knowledge (BK) and the influence of previously assigned tags. Since BK is encoded in natural language, we cannot distinguish between natural language and BK at this stage.
  • Klaas and Steffen showed that a mixture of BK and imitation is best for reproducing the shape of the tag frequency distribution. However, they focus on reproducing the shape of the rank-ordered frequency distribution, while we explore the stabilization process over time. Further, previous research considered the sharp drop between rank 7 and 10 a typical characteristic of tagging streams which distinguishes them from word-frequency distributions. However, Bollen and Halpin's work suggests that this might only be caused by the user interface, which suggests up to 10 tags; if no tags are suggested, there is no sharp drop.
  • First we consider a tagging model where people ONLY rely on their BK. This model reflects the properties of NL since we use Wikipedia as BK.
  • We then add a bit of imitation: now people imitate others 30% of the time and use their BK 70% of the time. We see no differences.
  • Next we use a 70% imitation rate and 30% BK. We see faster and higher stabilization.
  • Finally, we use 100% imitation and observe that in this case no stabilization happens, since people fail to introduce new tags if they don't use their BK at all. Overall, our empirical results as well as our simulation results suggest that combining shared background knowledge with imitation, where imitation dominates, leads to the fastest and highest stabilization.
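An illustrative toy version of such a simulation (this is not Dellschaft and Staab's exact epistemic model; the Zipf-distributed vocabulary merely stands in for Wikipedia-based background knowledge):

```python
import numpy as np

def simulate_tag_stream(n_assignments, imitation_rate, zipf_a=1.5, seed=0):
    """With probability imitation_rate, copy a previously assigned tag
    (chosen proportionally to its frequency); otherwise draw a fresh tag
    from a Zipf-distributed vocabulary standing in for background
    knowledge / natural language."""
    rng = np.random.default_rng(seed)
    stream = []
    for _ in range(n_assignments):
        if stream and rng.random() < imitation_rate:
            stream.append(stream[rng.integers(len(stream))])   # imitation
        else:
            stream.append(int(rng.zipf(zipf_a)))   # background knowledge
    return stream

# I = 1.0 never introduces a second tag; I = 0.0 is pure background
# knowledge; imitation-dominant mixtures lie in between.
for i_rate in (0.0, 0.3, 0.7, 1.0):
    stream = simulate_tag_stream(2000, i_rate)
    print(i_rate, "distinct tags:", len(set(stream)))
```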
  • To sum up: where do we go from here? I have presented a simple method for measuring semantic stability in social streams. It is quite flexible, can easily be adapted to other count/frequency functions, and can be used on both a resource and a system level. I have shown that existing methods have certain limitations and that the notion of semantic stability requires both concepts: stability and implicit consensus.
    So why should we care about semantic stability?
    First, because it helps us learn something about the nature of resources on the web. Second, it helps to identify streams which are in a stable phase. Finally, it helps to identify applications whose community produces stable descriptions.
  • Empirical results as well as simulation results show that the stabilization process benefits from combining …
  • Some interesting avenues for future work
  • The problem with cumulative overlap is that if we assume a long and potentially infinite tail, the tail will dominate the head. RBO biases the proportional overlap at each depth by a convergent series of weights (i.e., a series of weights whose sum is bounded).
  • We used this method on different social resource streams. Here you can see a plot for one Twitter user. I use this Twitter dataset here because it was the starting point for this project; we thought the dataset was interesting. One can see that the lines indeed become straight for many sample users. Since I did this work during my internship at HP, I had the chance to discuss the plots with Bernardo, and he said that it looks stable, but less stable than Delicious. So how can we quantify that?
  • A power law is a functional relationship between two quantities, where one quantity varies as a power of another. The scale-invariance property of power laws makes them interesting, since it suggests that no matter how much the system grows, the shape of the distribution remains the same.
    The probability of measuring a particular value of some quantity varies (inversely) as a power of that value.
    The probability of observing a tag with frequency y varies as a power of its rank: there are only a few very frequent tags but many less frequent tags, so the probability of observing a high-frequency tag is low.


    The cumulative distribution function (CDF), in this context also called the rank-frequency distribution, describes the probability that a random variable X will be found at a value less than or equal to x.
    The complementary cumulative distribution function (CCDF) asks how often the random variable is above a particular level.


    Cumulative distributions are sometimes also called rank/frequency. Cumulative distributions with a power-law form are sometimes said to follow Zipf’s law or a Pareto distribution, after two early researchers.
    “Zipf’s law” and “Pareto distribution” are effectively synonymous with “power-law distribution”.
    Zipf’s law and the Pareto distribution differ from one another in the way the cumulative distribution is plotted—Zipf made his plots with x on the horizontal axis and P(x) on the vertical one; Pareto did it the other way around. This causes much confusion in the literature, but the data depicted in the plots are of course identical.
  • Empirical power-law distributions hold only approximately or over a limited range.
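For completeness, a minimal sketch of the standard checks from Clauset et al.: the continuous-approximation maximum-likelihood estimate of the exponent and the empirical CCDF used for log-log plots (function names and the example frequencies are mine):

```python
import numpy as np

def alpha_mle(frequencies, xmin=1.0):
    """Continuous-approximation MLE of the power-law exponent
    (Clauset et al., 2009): alpha = 1 + n / sum(ln(x_i / xmin))
    over all observations x_i >= xmin."""
    x = np.asarray([f for f in frequencies if f >= xmin], dtype=float)
    return 1.0 + len(x) / np.log(x / xmin).sum()

def empirical_ccdf(frequencies):
    """P(X >= x) for each observed tag frequency; plotting this on
    log-log axes is the usual visual power-law check."""
    x = np.sort(np.asarray(frequencies, dtype=float))
    return x, 1.0 - np.arange(len(x)) / len(x)

# Hypothetical tag frequencies of one resource:
freqs = [812, 344, 199, 120, 87, 44, 23, 12, 9, 5, 3, 2, 2, 1, 1, 1]
print("alpha estimate:", round(alpha_mle(freqs), 2))
```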

    1. Semantic Stability in Social Tagging Streams. Claudia Wagner, Philipp Singer, Markus Strohmaier and Bernardo Huberman
    2. Folksonomies: not formal, but shared and stable? Ontologies: formal, shared and stable
    3. 1970, 1990, 2010: http://schwarzenegger.com/
    4. How can we measure semantic stability? How can we compare the semantic stabilization process in different systems? What impacts semantic stability?
    5. Measuring Semantic Stability: State of the Art • Relative tag proportions per resource become stable with an increasing number of tag assignments [Golder and Huberman, 2006] • KL-divergence of the rank-ordered tag frequency distribution per resource at different time points converges towards zero [Halpin et al., 2007] • Power-law distributions [Cattuto et al., 2006]: the scale-invariance property ensures that regardless of how large the system grows, the shape of the distribution stays the same
    6. Some Limitations • Don't allow comparing the semantic stabilization process of different systems • Prune tag distributions to top-k tags: cannot handle non-conjoint lists of tags • A random tagging process also produces a "stable" description: a tag assignment at timepoint t+1 has less impact on the tag distribution of a resource than a tag at timepoint t
    7. Example: KL-Divergence • KL-divergence converges towards zero. • But the random baseline also converges towards zero if we assume a constant tagging rate. • We do not always know the top k tags! [Figure: KL divergence over the number of consecutive tag assignments]
    8. Example: Relative Tag Proportion [Figure: relative tag proportions over consecutive tags (user list names) for one Twitter user; tags include bloggers, blogs, business, design, digital, entertainment, internet, it, marketing, mashable, media, news, social media, tech, technology, web; a second panel shows the top 5 ranks]
    9. Intuition and Approach • Stable: some descriptors are more important than others, and the ranking of (top) descriptors remains stable over time • Less stable: all descriptors are equally important, and the ranking of (top) descriptors changes over time [Figure: tag probability distributions P(T) at tn and tn+m]
    10. Intuition and Approach (second example) [Figure: another stable vs. less stable pair of tag distributions P(T) at tn and tn+m]
    11. Requirements • Rank agreement of the descriptors of a resource over time • Weighted rank agreement • Non-conjoint lists of descriptors • Random baseline
    12. Rank-Biased Overlap (RBO) [Webber et al., 2010] • RBO falls in the range [0, 1], where 0 means disjoint and 1 means identical • p lies between 0 and 1 and determines how steep the decline in weights is • The smaller p, the more top-weighted the metric
    13. Example: overlap at depth 1 = 1 [Figure: P(T) at tn and tn+m]
    14. Example: overlap at depth 2 = 0.5 [Figure: P(T) at tn and tn+m]
    15. Example: overlap at depth 3 = 1 [Figure: P(T) at tn and tn+m]
    16. Effect of the Parameter p
    17. Tie Correction for Rank-Biased Overlap • RBO does not penalize ties • We want to penalize ties since they show that users have not agreed on a ranking • Sum only over those depths which occur in at least one of the two rankings
    18. Same concordant pairs: (A,D), (B,D) and (C,D). No ties: RBOorig = 0.2, RBOmod = 0.2. Ties: RBOorig = 0.34, RBOmod = 0.17. [Figure: tag frequency distributions of R1 and R2 at tn and tn+m]
    19. Semantic Stabilization on a Resource Level • Tag distributions of Twitter users become semantically stable between 1k and 2k tag assignments • The RBO values of random tagging distributions increase more slowly and are significantly lower [Figure: RBO over the number of consecutive tag assignments]
    20. Semantic Stabilization on a System Level • How can we compare the semantic stabilization process in different systems? • We call a resource description semantically stable after tn+m tag assignments if the RBO value between its tag distribution at point tn and tn+m is equal to or greater than k.
    21. Semantic Stabilization on a System Level: after 1250 tag assignments, 90% of all resources have a stability above 0.61
    22. Empirical Study: Twitter. A medium level of semantic stability is reached after 1k-2k tag assignments
    23. Empirical Study: Twitter and Delicious. Tag streams in Delicious stabilize faster and significantly higher than in Twitter
    24. Empirical Study: Twitter, Delicious and LibraryThing. The same is true for tag streams of books in LibraryThing
    25. Empirical Study: Random Baseline
    26. Difference between tag and word streams?
    27. What causes semantic stability? • Simulations based on the epistemic tagging model [Dellschaft and Staab, 2008] • Use parameter I as the imitation rate and produce tag distributions for I = 0, 0.1, ..., 1
    28. What causes stability? Medium levels of semantic stability are reached after 1k-2k tag assignments
    29. What causes stability? The same is true if we combine BK and imitation and BK is dominant
    30. What causes stability? If imitation and BK are combined and imitation is dominant, higher levels of semantic stability are reached faster
    31. What causes stability? • The combination of shared background knowledge and imitation behaviour (where imitation is more important) leads to the fastest and highest stabilization • Natural-language systems show similar stabilization to social tagging systems where no imitation is supported
    32. Conclusions & Implications • An attempt to formalize semantic stability in social streams • A novel approach to measure and compare the semantic stabilization process in different social streams. Why is that useful? • Identify social streams (e.g. the tag stream of a URL or the word stream of a hashtag) which are semantically stable: extract shared and agreed-upon semantic knowledge from social streams • Select systems that provide semantically stable streams
    33. References • D. Bollen and H. Halpin. The role of tag suggestions in folksonomies. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, HT '09, pages 359–360, New York, NY, USA, 2009. ACM. • C. Cattuto. Semiotic dynamics in online social communities. The European Physical Journal C - Particles and Fields, 46(2) Supplement, pages 33–37, August 2006. • A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, Nov. 2009. • K. Dellschaft and S. Staab. An epistemic dynamic model for tagging systems. In HT '08: Proceedings of the Nineteenth ACM Conference on Hypertext and Hypermedia, pages 71–80, New York, NY, USA, 2008. ACM. • S. Golder and B. A. Huberman. Usage patterns of collaborative tagging systems. Journal of Information Science, 32(2):198–208, April 2006. • H. Halpin, V. Robu, and H. Shepherd. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, pages 211–220, New York, NY, USA, 2007. ACM. • A. Hotho, R. Jäschke, C. Schmitz, and G. Stumme. BibSonomy: A social bookmark and publication sharing system. In Proceedings of the Conceptual Structures Tool Interoperability Workshop at the 14th International Conference on Conceptual Structures, pages 87–102, 2006. • C. T. Kello, G. D. A. Brown, R. Ferrer-i-Cancho, J. G. Holden, K. Linkenkaer-Hansen, T. Rhodes, and G. C. Van Orden. Scaling laws in cognitive sciences. Trends in Cognitive Sciences, 14(5):223–232, May 2010. • W. Webber, A. Moffat, and J. Zobel. A similarity measure for indefinite rankings. ACM Transactions on Information Systems, 28(4):20:1–20:38, Nov. 2010.
    34. Thank you! Special thanks to my collaborators (2/3 of them are here):
    35. Limitations and Future Work • RBO measures ranking but ignores differences in the frequencies • Decay function to weight tag counts: old tag assignments are less important than new ones • The number and diversity of users who tag a resource might impact the semantic stabilization process
    36. Alternatives to RBO • Unweighted and conjoint measures: Kendall tau, Spearman rho • Weighted and conjoint measures: weighted Kendall tau • Unweighted and non-conjoint measures: intersection metric • Weighted and non-conjoint measures: cumulative overlap at increasing depths
    37. Dataset
    38. Categories of Semantically Unstable Resources • The entity to which a resource refers changes • The resource (i.e. website) changes • The entity/topic to which a resource refers is controversial: the website refers to a controversial entity/topic on which different viewpoints exist • External conditions which impact viewpoints on the entity/topic change: the website remains stable but the viewpoints of taggers on the entity or topic related to the site change
    39. Relative Tag Proportion [Golder and Huberman, 2006] [Figure: stable vs. less stable relative tag proportions at tn and tn+m]
    40. Relative Tag Proportion [Golder and Huberman, 2006] [Figure: relative tag proportions over consecutive tags (user list names), as on slide 8]
    41. KL-Divergence [Halpin et al., 2007] • KL divergence between the rank-ordered frequency distributions of the top 25 tags at different time points [Figure: stable vs. less stable distributions at tn and tn+m]
    42. KL-Divergence
    43. Power Law [Cattuto, 2006] • Is the rank-ordered frequency distribution a power-law distribution? • Is the frequency y of a tag inversely proportional to its rank r?
    44. Power Law [Cattuto, 2006] • Is it really a power law? Very likely yes, according to the maximum likelihood estimator and the Kolmogorov-Smirnov statistic [Clauset et al., 2010] • Estimate alpha and xmin over some reasonable range • Compare the power-law fit to the fits of the exponential, lognormal and stretched exponential (Weibull) functions; use the log-likelihood ratios to indicate which fit is better • We do not find significant differences between the power-law fit and the lognormal fit
    45. RBO
    46. Stabilization going beyond Baseline Stability
    47. Stabilization not going beyond Baseline Stability
