Meaning as Collective Use:         Predicting Semantic Hashtag Categories on Twitter                                      ...
To answer the first research question, we used statisti-        grounded and classified into predefined semantic categories.c...
3/4/2012                1 week             4/1/2012                              4/29/2012we used to describe hashtag stre...
friends (norm entropy friend) by the number of stream’s          3.2.3    Lexical Measures:authors they are related with. ...
Since we have two different types of pragmatic features,             the 64 hashtag streams. That means we destroyed the or...
technology                                                                  technology                       sports       ...
1.0                                                                                                                       ...
We believe that pragmatic features can supplement lex-                Proceedings of the 19th International World Wideical...
Upcoming SlideShare
Loading in …5

Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter


Published on

This paper sets out to explore whether data about the us-
age of hashtags on Twitter contains information about their
semantics. Towards that end, we perform initial statisti-
cal hypothesis tests to quantify the association between us-
age patterns and semantics of hashtags. To assess the util-
ity of pragmatic features { which describe how a hashtag
is used over time { for semantic analysis of hashtags, we
conduct various hashtag stream classi cation experiments
and compare their utility with the utility of lexical features.
Our results indicate that pragmatic features indeed contain
valuable information for classifying hashtags into semantic
categories. Although pragmatic features do not outperform
lexical features in our experiments, we argue that pragmatic
features are important and relevant for settings in which tex-
tual information might be sparse or absent (e.g., in social
video streams).

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter

  1. 1. Meaning as Collective Use: Predicting Semantic Hashtag Categories on Twitter Lisa Posch Claudia Wagner Graz University of Technology, JOANNEUM RESEARCH, Knowledge Management Institute for Information and Institute Communication Technologies Inffeldgasse 13, 8010 Graz, Steyrergasse 17, 8010 Graz, Austria Austria Philipp Singer Markus Strohmaier Graz University of Technology, Graz University of Technology, Knowledge Management Knowledge Management Institute Institute Inffeldgasse 13, 8010 Graz, Inffeldgasse 13, 8010 Graz, Austria Austria markus.strohmaier@tugraz.atABSTRACT Thus, new methods and techniques for analyzing the se-This paper sets out to explore whether data about the us- mantics of hashtags are definitely needed.age of hashtags on Twitter contains information about their A simplistic view on Wittgenstein’s work [17] suggestssemantics. Towards that end, we perform initial statisti- that meaning is use. This indicates that the meaning ofcal hypothesis tests to quantify the association between us- a word is not defined by a reference to the object it denotes,age patterns and semantics of hashtags. To assess the util- but by the variety of uses to which the word is put. There-ity of pragmatic features – which describe how a hashtag fore, one can use the narrow, lexical context of a word (i.e.,is used over time – for semantic analysis of hashtags, we its co-occurring words) to approximate its meaning. Ourconduct various hashtag stream classification experiments work builds on this observation, but focuses on the prag-and compare their utility with the utility of lexical features. matics of a word (i.e., how a word, or in our case a hashtag,Our results indicate that pragmatic features indeed contain is used by a large group of users) – rather than its narrow,valuable information for classifying hashtags into semantic lexical context.categories. Although pragmatic features do not outperform The aim of this work is to investigate to what extent prag-lexical features in our experiments, we argue that pragmatic matic characteristics of a hashtag (which capture how a largefeatures are important and relevant for settings in which tex- group of users uses a hashtag) may reveal information abouttual information might be sparse or absent (e.g., in social its semantics. Specifically, our work addresses the followingvideo streams). research questions: • Do different semantic categories of hashtags reveal sub-Categories and Subject Descriptors stantially different usage patterns?H.3.5 [Information Storage and Retrieval]: Online In- • To what extent do pragmatic and lexical properties offormation Services—Web-based services hashtags help to predict the semantic category of a hashtag?Keywords To address these research questions we conducted an em-Twitter; hashtags; social structure; semantics pirical study on a broad range of diverse hashtag streams be- longing to eight different semantic categories (such as tech- nology, sports or idioms) which have been identified in pre-1. INTRODUCTION vious research [12] and have shown to be useful for grouping A hashtag is a string of characters preceded by the hash hashtags. From each of the eight categories, we selected ten(#) character and it is used on platforms like Twitter as sample hashtags at random and collected temporal snap-descriptive label or to build communities around particular shots of messages containing at least one of these hashtagstopics [15]. To outside observers, the meaning of hashtags at three different points in time. To quantify how hashtagsis usually difficult to analyze, as they consist of short, often are used over time, we extended the set of pragmatic streamabbreviated or concatenated concepts (e.g., #MSM2013). measures which we introduced in our previous work [16] and applied them to the hashtag streams in our dataset. TheseCopyright is held by the International World Wide Web Conference pragmatic measures capture not only the social structure ofCommittee (IW3C2). IW3C2 reserves the right to provide a hyperlinkto the author’s site if the Material is used in electronic media. a hashtag at specific points in time, but also the changes inWWW 2013 Companion, May 13–17, 2013, Rio de Janeiro, Brazil. social structure over time.ACM 978-1-4503-2038-2/13/05.
  2. 2. To answer the first research question, we used statisti- grounded and classified into predefined semantic standard tests which allow to quantify the association For example, Noll and Meinel [8] presented a study of thebetween pragmatic characteristics of hashtag streams and characteristics of tags and determined their usefulness fortheir semantic categories. To tackle the second research web page classification [9]. Similar to our work, Overell etquestion, we firstly computed lexical features using a a stan- al. [10] presented an approach which allows classifying tagsdard bag-of-words model with term frequency (TF). Then, into semantic categories. They trained a classifier to classifywe trained several classification models with lexical features Wikipedia articles into semantic categories, mapped Flickronly, pragmatic features only and a combination of both. We tags to Wikipedia articles using anchor texts in Wikipediacompared the performance of different classification models and finally classified Flickr tags into semantic categories byby using standard evaluation measures such as the F1-score using the previously trained classifier. Their results show(which is defined as the harmonic mean of precision and that their ClassTag system increases the coverage of the vo-recall). To get a fair baseline for our classification mod- cabulary by 115% compared to a simple WordNet approachels, we constructed a control dataset by randomly shuffling which classifies Flickr tags by mapping them to WordNetthe category labels of the hashtag streams. That means we via string matching techniques. Unlike our work, they diddestroyed the original relationship between the pragmatic not take into account how tags are used, but learn relationsproperties and the semantic categories of hashtags. between tags and semantic categories via mapping them to Our results show that pragmatic features indeed reveal Wikipedia articles.information about hashtags’ semantics and perform signifi- Pragmatics and semantics of hashtags: On Twitter,cantly better than the baseline. They can therefore be useful users have developed a tagging culture by adding a hashfor the task of semantically annotating social media con- symbol (#) in front of a short keyword. The first introduc-tent. Not surprisingly, our results also show that lexical fea- tion of the usage of hashtags was provided by Chris Messinatures are more suitable than pragmatic features for the task in a blog post [7]. Huang et al. [4] state that this kindof semantically categorizing hashtag streams. However, an of new tagging culture has created a completely new phe-advantage of pragmatic features is that they are language- nomenon, called micro-meme. The difference between suchand text-independent. Pragmatic features can be applied to micro-memes and other social tagging systems is that thetasks where the creation of lexical features is not possible – participation in micro-memes is an a-priori approach, whilesuch as multimedia streams. Also for scenarios where tex- other social tagging systems follow an a-posteriori approach.tual content is available, pragmatic features allow for more This is due to the fact that users are influenced by the ob-flexibility due to their independence of the language used in servation of the usage of micro-meme hashtags adopted bythe corpus. Our results are relevant for social media and other users. The work of [4] suggests that hashtagging insemantic web researchers who are interested in analyzing Twitter is more commonly used to join public discussionsthe semantics of hashtags in textual or non-textual social than to organize content for future retrieval. The role ofstreams (e.g., social video streams). hashtags has also been investigated in [18]. Their study This paper is structured as follows: Section 2 gives an confirms that a hashtag serves both as a tag of content andoverview of related research on analyzing the semantics of a symbol of community membership. Laniado and Mika [6]tags in social bookmarking systems and research on hash- explored to what extent hashtags can be used as strong iden-tagging on Twitter in general. In Section 3 we describe tifiers like URIs are used in the Semantic Web. They mea-our experimental setup, including our datasets, feature en- sured the quality of hashtags as identifiers for the Semanticgineering and evaluation approach. Our results are reported Web, defining several metrics to characterize hashtag usagein Section 4 and further discussed in Section 5. Finally, we on the dimensions of frequency, specificity, consistency, andconclude our work in Section 6. stability over time. Their results indicate that the lexical usage of hashtags can indeed be used to identify hashtags2. RELATED WORK which have the desirable properties of strong identifiers. Un- like our work, their work focuses on lexical usage patterns In the past, a considerable effort has been spent on study- and measures to what extent those patterns contribute toing the semantics of tags (e.g., tags in social bookmarking the differentiation between strong and weak semantic iden-systems), but also hashtags in Twitter have received atten- tifiers (binary classification) while we use usage patterns totion from the research community. classify hashtags into semantic categories. Semantics of tags: On the one hand, researchers ex- Recently, researchers have also started to explore the dif-plored to what extent semantics emerge from folksonomies fusion dynamics of hashtags - i.e., how hashtags spread inby investigating different algorithms for extracting tag net- online communities. For example the work of [15] aims toworks and hierarchies from such systems (see e.g., [1], [3] or predict the exposure of a hashtag in a given time frame[13]). The work of [14] evaluated three state-of-the-art folk- while [12] are interested in the temporal spreading patternssonomy induction algorithms in the context of five social of hashtags.tagging systems. Their results show that those algorithmsspecifically developed to capture intuitions of social taggingsystems outperform traditional hierarchical clustering tech- 3. EXPERIMENTAL SETUPniques. K¨rner et al. [5] investigated how tagging usage pat- o Our experiments are designed to explore to what extentterns influence the quality of the emergent semantics. They pragmatic properties of hashtag streams can be used tofound that ‘verbose’ taggers (describers) are more useful for gauge the semantic category of a hashtag. We are not onlythe emergence of tag semantics than users who use a small interested in the idiosyncrasies of hashtag usage within oneset of tags (categorizers). semantic category but also in the deltas between different On the other hand, researchers investigated to what extent semantic categories. In this section, we first introduce ourtags (and the resources they annotate) can be semantically dataset as well as the pragmatic and lexical measures which
  3. 3. 3/4/2012 1 week 4/1/2012 4/29/2012we used to describe hashtag streams. Then we present the t0 t1 t2methodology and evaluation approach which we used to an-swer our research questions. crawl of crawl of crawl of stream stream stream social social social tweets tweets tweets structure structure structure3.1 Dataset In this work we use data that we acquired from Twit- Figure 1: Timeline of the data collection processter’s API. Romero et al. [12] conducted a user study and aclassification experiment and identified eight broad semantic categories (concretely, we removed the two hashtags #bsbcategories of hashtags: celebrity, games, idiom, movies/TV, and #mj). We also decided to remove inactive hashtagmusic, political, sports and technology. We used a list con- streams (those where less than 300 posts where retrieved)sisting of the 500 hashtags which were used by most users as estimating information theoretic measures is problematicwithin their dataset and which were manually assigned to if only few observations are available [11]. The most commonthe eight categories as a starting point for creating our own solution is to restrict the measurements to situations wheredataset. one has an adequate amount of data. We found four inactive For each category, we chose ten hashtags at random (see hashtags in the category games and seven in the categoryTable 1). We biased our random sample towards active celebrity. The removal of these hashtag streams resulted inhashtag streams by re-sampling hashtags for which we found the complete removal of the category celebrity as it was onlyless than 1000 posts at the beginning of our data collection left with one hashtag stream (#michaeljackson). A possi-(March 4th, 2012). For those categories for which we could ble explanation for the low number of tweets in the hashtagnot find ten hashtags that had more than 1000 posts (i.e., streams for this category is that topics related to celebritiesgames and celebrity), we selected the most active hashtags have a shorter life-span than topics related to other cate-per category (i.e., the hashtags for which we found the most gories. Our final datasets consist of 64 hashtag streams andposts). seven semantic categories which were sufficiently active dur- The dataset consists of three parts, each part representing ing our observation period.a time frame of four weeks. The different time frames ensurethat we can observe the usage of a hashtag over a givenperiod of time. The time frames are independent of each Table 2: Description of the complete datasetother, i.e., the data collected at one time frame does not t0 t1 t2contain any information of the data collected at another time Tweets 94,634 94,984 95,105frame. Authors 53,593 54,099 53,750 At the start of each time frame, we retrieved the most Followers 56,685,755 58,822,119 66,450,378recent tweets in English for each hashtag using Twitter’s Followees 34,025,961 34,263,129 37,674,363 Friends 21,696,134 21,914,947 24,449,705public search API. Afterwards, we retrieved the followers Mean Followers per Author 1,057.71 1,087.31 1,236.29and followees of each user who had authored at least one Mean Followees per Author 634.90 633.34 700.92message in our hashtag stream dataset. Some pragmatic fea- Mean Friends per Author 404.83 405.09 454.88tures capture information about who potentially consumesa hashtag stream (followers) or who potentially informs au-thors of a hashtag stream (followees) and therefore require 3.2 Feature Engineeringthe one-hop neighborhood of hashtag streams’ authors. In In the following, we define the pragmatic and lexical fea-this work, we call users who hold both of these roles (i.e., tures which we designed to capture the different social andhave established a bidirectional link with an author) friends. message based structures of hashtag streams. For our prag-The starting dates of the time frames were March 4th (t0 ), matic features we further differentiate between static prag-April 1st (t1 ) and April 29th, 2012 (t2 ). Table 2 depicts matic features (which capture the social structure of a hash-the number of tweets and relations between users that we tag at a specific point in time) and dynamic pragmatic fea-collected during each time frame. tures (which combine information from several time points). The stream tweets were retrieved on the first day of eachtime frame, fetching tweets that were authored a maximum 3.2.1 Static Pragmatic Measures:of seven days previous to the date of retrieval. During the Entropy Measures are used to measure the random-first week of each time frame, the user IDs of the followers ness of streams’ authors and their followers, followees andand followees were collected. Figure 1 depicts this process. friends. For each hashtag stream, we rank the authors Since we were interested in learning what types of char- by the number of messages they published in that streamacteristics are useful for describing a semantic hashtag cat- (norm entropy author) and we rank the followers (norm -egory, we removed hashtag streams that belong to multiple entropy follower), followees (norm entropy followee) and Table 1: Randomly selected hashtags per category (ordered alphabetically) technology idioms sports political games music celebrity movies blackberry factaboutme f1 climate e3 bsb ashleytisdale avatar ebay followfriday football gaza games eurovision brazilmissesdemi bbcqt facebook dontyouhate golf healthcare gaming lastfm bsb bones flickr iloveitwhen nascar iran mafiawars listeningto michaeljackson chuck google iwish nba mmot mobsterworld mj mj glee iphone nevertrust nhl noh8 mw2 music niley glennbeck microsoft omgfacts redsox obama ps3 musicmonday regis movies photoshop oneofmyfollowers soccer politics spymaster nowplaying teamtaylor supernatural socialmedia rememberwhen sports teaparty uncharted2 paramore tilatequila tv twitter wheniwaslittle yankees tehran wow snsd weloveyoumiley xfactor
  4. 4. friends (norm entropy friend) by the number of stream’s 3.2.3 Lexical Measures:authors they are related with. A high author entropy indi- We use vector-based methods which allow representingcates that the stream is created in a democratic way since all each microblog message as a vector of terms and use termauthors contribute equally much. A high follower entropy frequency (TF ) as weighting schema. In this work lexicaland friend entropy indicate that the followers and friends do measures are always computed for individual time pointsnot focus their attention towards few authors but distribute and are therefore static equally across all authors. A high followee entropy andfriend entropy indicate that the authors do not focus their 3.3 Usage Patterns of Hashtag Categoriesattention on a selected part of their audience. Our first aim is to investigate whether different seman- Overlap Measures describe the overlap between the au- tic categories of hashtags reveal substantially different us-thors and the followers (overlap authorfollower), followees age patterns (such as that they are used and/or consumed(overlap authorfollowee) or friends (overlap authorfriend) of by different sets of users or that they are used for differ-a hashtag stream. If overlap is one, all authors of a stream ent purpose). To compare the pragmatic fingerprints ofare also followers, followees or friends of stream authors. hashtags belonging to different semantic categories and toThis indicates that the stream is consumed and produced by quantify the differences between categories, we conductedthe same users. A high overlap suggests that the community a pairwise Mann-Whitney-Wilcoxon-Test which is a non-around the hashtag is rather closed, while a low overlap indi- parametric statistical hypothesis test for assessing whethercates that the community is more open and that active and one of two samples of independent observations tends to havepassive part of the community do not extensively overlap. larger values than the other. We used a non-parametric test Coverage Measures characterize a hashtag stream via since the Shapiro-Wilk-Test revealed that not all featuresthe nature of its messages. We introduce four coverage are normally distributed, even after applying arcsine trans-measures. The informational coverage measure (informa- formation to ratio measures. Holm-Bonferroni method wastional) indicates how many messages of a stream have an used for adjusting the p-values and counteract the probleminformational purpose - i.e., contain a link. The conversa- of multiple comparisons. For this experiment, we used thetional coverage (conversational) measures the mean number timeframes t0→1 and t1→2 .of messages of a stream that have a conversational purpose- i.e., those messages that are directed to one or several t1→0 t2→1specific users (e.g., through @replies). The retweet cover-age (retweet) measures the percentage of messages whichare retweets. The hashtag coverage (hashtag) measures themean number of hashtags per message in a stream. t0 t1 t23.2.2 Dynamic Pragmatic Measures: To explore how the social structure of a hashtag streamchanges over time, we measure the distance between thetweet-frequency distributions of authors at different time t0→1 t1→2points, and the author-frequency distributions of followers,followees or friends at different time points. The intuitionbehind these features is that certain semantic categories of Figure 2: Illustration of our time frameshashtags may have a fast changing social structure sincenew people start and stop using those types of hashtagsfrequently, while other semantic categories may have a more 3.4 Hashtag Classificationstable community around them which changes less over time. Our second aim is to investigate to what extent pragmatic We use a symmetric variation of the Kullback-Leibler di- and lexical properties of hashtag streams contribute to clas-vergence (DKL ) which represents a natural distance mea- sify them into their semantic categories. That means wesure between two probability distributions (A and B) and aim to classify temporal snapshots of hashtag streams into 2 1is defined as follows: 1 DKL (A||B) + 2 DKL (B||A). The KL their correct semantic categories (to which they were as-divergence is also known as relative entropy or information signed in [12]) just by analyzing how they are used overdivergence. The KL divergence is zero if the two distri- time. We then compare the performance of the pragmati-butions are identical and approaches infinity as they differ cally informed classifier with the performance of a classifiermore and more. We measure the KL divergence for the dis- informed by lexical features within a semantic multiclasstributions of authors (kl authors), followers (kl followers), classification task. We used the timeframes t1→0 and t2→1followees (kl followees) and friends (kl friends). for this experiment in order to avoid including information Figure 2 visualizes the different time frames and their no- from the ‘future’ in our classification.tation. t0 only contains the static features computed from We performed grid search with varying hyperparametersdata collected at t0 . Consequently, t1 and t2 only contain using Support Vector Machine (linear and RBF kernels) andthe static features computed from data collected at t1 or an ensemble method with extremely randomized trees. Sincet2 , respectively. t0→1 includes static features computed on extremely randomized trees are a probabilistic method anddata collected at t0 and the dynamic measures computed perform slightly different in each run, we run them ten timeson data collected at t0 and t1 . t1→0 includes static features and report the average scores. The features were standard-computed on data collected at t1 and the dynamic measures ized by subtracting the mean and scaling to unit variance.computed on data collected at t0 and t1 . t1→2 and t2→1 are We used stratified 6-fold cross-validation (CV) to train anddefined in the same way. test each classification model.
  5. 5. Since we have two different types of pragmatic features, the 64 hashtag streams. That means we destroyed the orig-static and dynamic ones, we trained and tested three sepa- inal relationship between the pragmatic properties and therate classification models which were only informed by prag- semantic categories of hashtags and evaluated the perfor-matic information: mance of a classifier which tries to use pragmatic properties Static Pragmatic Model: We trained and tested this to classify hashtags into their shuffled categories within a 6-classification model with static pragmatic features on data fold cross-validation. We repeated the random shuffling 100collected at t1 using stratified 6-fold CV. The experiment times and used the resulting average F1-score as our base-was repeated on the data collected at t2 . line performance. For the baseline classifier we also used grid Dynamic Pragmatic Model: We trained and tested search to determine the optimal parameters prior to train-the classification model with dynamic pragmatic features on ing. Our baseline classifier tests how well randomly assigneddata collected at t0 and t1 using stratified 6-fold CV. The categories can be identified compared to our real semanticcomputation of our dynamic features requires at least two categories. One needs to note that a simple random guessertime points. We repeated this experiment on data collected baseline would be a weaker baseline than the one describedat t1 and t2 . above and would lead to a performance of 1/7. Combined Pragmatic Model: We combined the static To gain further insights into the impact of individual prop-and dynamic pragmatic features, and trained and tested the erties, we analyzed their information gain (IG) with respectclassification model on the data of t1→0 using stratified 6- to the categories. The information gain measures how accu-fold CV. Again, we repeated the experiment on the data of rately a specific stream property P is able to predict stream’st2→1 . category C and is defined as follows: IG(C, P ) =H(C) − We also performed our classification with a model using H(C | P ) where H denotes the entropy.our lexical features (i.e., TF weighted words). Finally, wetrained and tested a combined classification model usingpragmatic and lexical features, which leads to the follow- 4. RESULTSing classification models: In the following section, we present the results from our Lexical Model: We trained and tested the model on empirical study on usage patterns of different semantic cat-data from t1 using stratified 6-fold CV and repeated the egories of hashtags.experiment on data collected at t2 . Combined Pragmatic and Lexical Model: We trained 4.1 Usage Patterns of Hashtag Categoriesand tested the mixed classifier on the data of t1→0 us- To answer our first research question, we explored to whating stratified 6-fold CV, then repeated this experiment for extent usage patterns of hashtag streams in different seman-t2→1 . A simple concatenation of pragmatic and lexical fea- tic categories are indeed significantly different.tures is not useful, since the vast amount of lexical features Our results indicate that some pragmatic measures arewould overrule the pragmatic features. Therefore, we used indeed significantly different for distinct semantic categories.a stacking method (see [2]) and performed firstly a clas- This indicates that hashtags of certain categories are usedsification using lexical features alone and Leave-One-Out in a very specific way which may allow us to relate thesecross-validation. We used a SVM with linear kernel for hashtags with their semantic categories just by observingthis classification since it worked best for these features. how users use them. Table 3 depicts the measures that showSecondly, we combined the pragmatic features with the re- statistically significant (p < 0.05) differences in both t0→1sulting seven probability features which we got from the and t1→2 . In total, 35 pragmatic category differences wereprevious classification model and which describe how likely found to be statistically significant (with p < 0.05) for t0→1each semantic class is for a certain stream given its words. and 33 for t1→2 . 26 pragmatic category differences were To get a fair baseline for our experiment, we constructed a found to be significant for both t0→1 and t1→2 which suggestscontrol dataset by randomly shuffling the category labels of that results are independent of our choice of time frame.Table 3: Features which showed a statistically significant difference (with p < 0.05) for each pair of categoriesin both t0→1 and t1→2 games idioms movies music political sports idioms informational retweet movies informational music informational political kl followers kl authors kl followers kl followees informational hashtag sports kl followers kl authors kl followers informational technology kl followers kl authors kl friends kl friends overlap authorfollower kl followers overlap authorfriend kl followees kl friends informational retweet hashtag
  6. 6. technology technology sports sports political political music music movies movies idioms idioms games games 0.0 0.2 0.4 0.6 0.8 1.0 6 8 10 12 14 Informational Coverage KL Divergence Authors (a) This figure shows the percentage of mes- (b) This figure shows how much the au- sages of hashtag streams belonging to differ- thors’ tweet-frequency distributions of hash- ent categories that contain at least one link. tag streams of different categories change on average. technology technology sports sports political political music music movies movies idioms idioms games games 1 2 3 4 5 1 2 3 4 5 6 KL Divergence Followers KL Divergence Friends (c) This figure shows how much the follow- (d) This figure shows how much the friends’ ers’ author-frequency distributions of hash- author-frequency distributions of hashtag tag streams of different categories change on streams of different categories change on av- average. erage.Figure 3: Each plot shows the feature distribution of different categories of one of the 4 best pragmaticfeatures for t0→1 . We obtained similar results for t1→2 . Not surprisingly, the category which shows the most spe- The category technology can be distinguished from allcific usage patterns is idioms and therefore the hashtags of other categories except sports, particularly because its fol-this category can be distinguished from all hashtags just by lowers’ and friends’ author-frequency distributions haveanalyzing their pragmatic properties. Hashtag streams of significantly lower symmetric KL divergences than hash-the category idioms exhibit a significantly lower informa- tags in the categories games, idioms, movies and musictional coverage than hashtag streams of all other categories (see Figures 3(c) and 3(d)). This indicates that hashtag(see Figure 3(a)) and a significantly higher symmetric KL di- streams in the category technology have a stable socialvergence for author’s tweet-frequency distributions (see Fig- structure which changes less over time. This is not sur-ure 3(b)). Also the followers’ and friends’ author-frequency prising since this semantic category denotes a topical areadistributions tend to have a higher symmetric KL divergence and users who are interested in such areas may consumefor idioms hashtags than for other hashtags (see Figures 3(c) and provide information on a regular base. It is especiallyand 3(d)). This indicates that the social structure of hashtag interesting to note that the only pragmatic measures whichstreams in the category idioms changes faster than hashtags allows distinguishing political and technological hashtagof other categories. Furthermore, hashtag streams of this streams are the author-follower and author-friend over-category are less informative - i.e., contain significantly less laps since these overlaps are significantly lower for thelinks per message on average. category technology compared to the category political.
  7. 7. 1.0 1.0 Baseline performance Model performance Baseline performance Model performance 0.8 0.8 F1 measure F1 measure 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 Comb. Pragmatic and Lexical Combined Pragmatic Dynamic Pragmatic Lexical Static Pragmatic Comb. Pragmatic and Lexical Combined Pragmatic Dynamic Pragmatic Lexical Static Pragmatic (a) t1→0 (b) t2→1Figure 4: Weighted averaged F1-scores of different classification models trained and tested on t1→0 4(a) andt2→1 4(b) using 6-fold cross-validationThis indicates that the content of hashtag streams of thecategory political is far more likely to be produced and con- Table 4: Top features for two different datasetssumed by the same people than content of technological ranked via Information Gain Rank t1→0 t2→1hashtag streams. Comparing the individual measures reveals that the infor- 1 informational kl followers 2 kl followers informationalmational coverage (six category pairs) and the symmetric 3 kl friends hashtagKL divergences of followers’ author-frequency distributions 4 hashtag kl followees(six category pairs), authors’ tweet-frequency distributions 5 norm entropy friend kl friends(three pairs) and friends’ follower-frequency distributions(three pairs) are the most discriminative measures. Figure 3depicts the distributions of these four measures per category. 4.2.1 Feature Ranking:Other measures that show significant differences in medians In addition to the overall classification performance whichfor both t0→1 and t1→2 are the symmetric KL divergence can be achieved solely based on analyzing the pragmatics ofof followees’ author-frequency distributions (two pairs), the hashtags, we were also interested in gaining insights into theauthor-follower and the author-friend overlap (one pair) as impact of individual pragmatic features. To evaluate the in-well as the retweet and hashtag coverage (two pairs). dividual performance of the features we used information Some measures like the conversational coverage measure gain (with respect to the categories) as a ranking criterion.did not show any significant differences for any of the cat- The ranking was performed on t1→0 and t2→1 with stratifiedegory pairs, for any time frame. This indicates that in all 6-fold cross-validation. Table 4 shows that the top five fea-hashtag streams an equal amount of conversational activities tures (i.e., the pragmatic features which reveal most abouttake place. the semantic of hashtags) are features which capture the temporal dynamics of the social context of a hashtag (i.e.,4.2 Hashtag Classification the temporal follower, followees and friends dynamics) as In order to quantify the value of different pragmatic and well as the informational and hashtag coverage. This indi-lexical properties of hashtag streams for predicting their se- cates that the collective purpose for which a hashtag is usedmantic category, we conducted a hashtag stream classifi- (i.e., if it used to share information rather than for othercation experiment and systematically compared the perfor- purposes) and the social dynamics around a hashtags – i.e.,mance of various classification models trained with different who uses a hashtag for whom – play a key role in under-sets of features. standing its semantics. Figure 4 shows the performance of the best classifier (ex-tremely randomized trees) trained with different sets of fea-tures. One can see from this figure that in general lexical 5. DISCUSSION OF RESULTSfeatures perform better than pragmatic features, but also Although our results show that lexical features work bestthat pragmatic features (both static and dynamic) signifi- within the semantic classification task, those features arecantly outperform a random baseline. This indicates that text and language dependent. Therefore, their applicabilitypragmatic features indeed reveal information about a hash- is limited to settings where text is available. Pragmatic fea-tag’s meaning, even though they do not match the perfor- tures on the other hand rely on usage information which ismance of lexical features in this case. In 4(a) we can see that independent of the type of content which is shared in socialfor t1→0 the combination of lexical and pragmatic features streams and can therefore also be computed for social videoperforms slightly better than using lexical features alone. or image streams.
  8. 8. We believe that pragmatic features can supplement lex- Proceedings of the 19th International World Wideical features if lexical features alone are not sufficient. In Web Conference (WWW 2010), Raleigh, NC, USA,our experiments, we could see that the performance may Apr. 2010. ACM.slightly increase when combining pragmatic and lexical fea- [6] D. Laniado and P. Mika. Making sense of twitter. Intures. However, the effect was not significant. We think P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika,the reason for this is that in our setup lexical features alone L. Zhang, J. Z. Pan, I. Horrocks, and B. Glimm,already achieved good performance. editors, International Semantic Web Conference (1), The classification results coincide with the results of the volume 6496 of Lecture Notes in Computer Science,statistical significance tests. Ranking the properties by in- pages 470–485. Springer, 2010.formation gain showed that the most discriminative proper- [7] C. Messina. Groups for twitter; or a proposal forties (the ones that showed a statistical significance in both twitter tag channels.→1 and t1→2 for the highest amount of category pairs) 2007/08/25/groups-for-twitter-or-a-proposal-found in 4.1 were also the top ranked features (informational for-twitter-tag-channels/, 2007.coverage and the KL divergences). [8] M. G. Noll and C. Meinel. Exploring social annotations for web document classification. In6. CONCLUSIONS AND IMPLICATIONS Proceedings of the 2008 ACM symposium on Applied Our work suggests that the collective usage of hashtags computing, SAC ’08, pages 2315–2320, New York, NY,indeed reveals information about their semantics. How- USA, 2008. ACM.ever, further research is required to explore the relations [9] M. G. Noll and C. Meinel. The metadata triumvirate:between usage information and semantics, especially in do- Social annotations, anchor texts and search queries. Inmains where limited text is available. We hope that our Web Intelligence, pages 640–647. IEEE, 2008.research is a first step into this direction since it shows that [10] S. Overell, B. Sigurbj¨rnsson, and R. van Zwol. ohashtags of different semantic categories are indeed used in Classifying tags using open content resources. Indifferent ways. Proceedings of the Second ACM International Our work has implications for researchers and practition- Conference on Web Search and Data Mining, WSDMers interested in investigating the semantics of social media ’09, pages 64–73, New York, NY, USA, 2009. ACM.content. Social media applications such as Twitter provide [11] S. Panzeri and A. Treves. Analytical estimates ofa huge amount of textual information. Beside the textual limited sampling biases in different informationinformation, also usage information can be obtained from measures. Network: Computation in Neural Systems,these platforms and our work shows how this information 7(1):87–107, 1996.can be exploited for assigning semantic annotations to tex- [12] D. M. Romero, B. Meeder, and J. Kleinberg.tual data streams. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex7. ACKNOWLEDGEMENTS contagion on twitter. In Proceedings of the 20th international conference on World wide web, WWW This work was supported in part by a DOC-fForte fellow- ’11, pages 695–704, New York, NY, USA, 2011. ACM.ship of the Austrian Academy of Science to Claudia Wagner. [13] P. Schmitz. Inducing ontology from flickr tags. InFurthermore, this work is in part funded by the FWF Aus- Proceedings of the Workshop on Collaborative Taggingtrian Science Fund Grant I677 and the Know-Center Graz. at WWW2006, Edinburgh, Scotland, May 2006. [14] M. Strohmaier, D. Helic, D. Benz, C. K¨rner, and o8. REFERENCES R. Kern. Evaluation of folksonomy induction [1] D. Benz, C. K¨rner, A. Hotho, G. Stumme, and o algorithms. Transactions on Intelligent Systems and M. Strohmaier. One tag to bind them all: Measuring Technology, 2012. term abstractness in social metadata. In Proc. of 8th [15] O. Tsur and A. Rappoport. What’s in a hashtag?: Extended Semantic Web Conference ESWC 2011, content based prediction of the spread of ideas in Heraklion, Crete, (May 2011), 2011. microblogging communities. In Proceedings of the fifth [2] T. Hastie, R. Tibshirani, and J. Friedman. The ACM international conference on Web search and Elements of Statistical Learning. Springer Series in data mining, WSDM ’12, pages 643–652, New York, Statistics. Springer New York Inc., New York, NY, NY, USA, 2012. ACM. USA, 2001. [16] C. Wagner and M. Strohmaier. The wisdom in [3] P. Heymann and H. Garcia-Molina. Collaborative tweetonomies: Acquiring latent conceptual structures creation of communal hierarchical taxonomies in social from social awareness streams. In Semantic Search tagging systems. Technical Report 2006-10, Computer Workshop at WWW2010, 2010. Science Department, April 2006. [17] L. Wittgenstein. Philosophical Investigations. [4] J. Huang, K. M. Thornton, and E. N. Efthimiadis. Blackwell Publishers, London, United Kingdom, 1953. Conversational tagging in twitter. In Proceedings of Republished 2001. the 21st ACM conference on Hypertext and [18] L. Yang, T. Sun, M. Zhang, and Q. Mei. We know hypermedia, HT ’10, pages 173–178, New York, NY, what @you #tag: does the dual role affect hashtag USA, 2010. ACM. adoption? In Proceedings of the 21st international [5] C. K¨rner, D. Benz, M. Strohmaier, A. Hotho, and o conference on World Wide Web, WWW ’12, pages G. Stumme. Stop thinking, start tagging - tag 261–270, New York, NY, USA, 2012. ACM. semantics emerge from collaborative verbosity. In