Future Work</li></li></ul><li>Introduction<br />Motivation:<br />We are interested in studying how social networks, together with the textual information associated with entities in the network, can be modeled for insight.<br />Goal:<br />To create a model for annotating entities and links using:<br />The social network<br />Textual content<br />Dataset:<br />Social network of 36 million Twitter users and 450 million tweets<br />Applications:<br />Targeted advertising<br />Friend suggestions<br />etc.<br />
Future Work</li></li></ul><li>Prior Work<br />Large-scale analysis of user behavior on Twitter: “What is Twitter, a Social Network or a News Media?” by Kwak et al.<br />Studies the propagation of information through the network using retweets to determine user influence.<br />“Automatic generation of personalized annotation tags for Twitter users” by Wu et al.<br />Uses TF-IDF weights to assign tags to each user, using textual information alone.<br />
Prior Work: Motivation<br />“Connections between the lines: Augmenting social networks with text” by Chang et al.<br />Using Wikipedia and the Bible, annotated with entities, a network between entities and a topic model are constructed.<br />
Prior Work: Other models<br />“Block-LDA: Jointly modeling entity-annotated text and entity-entity links” by Cohen et al. (protein-protein interaction dataset)<br />Predefined undirected network and text associated with each node<br />
Prior Work: Other models<br />“Topic-link LDA: joint models of topic and author communities” by Liu et al.<br />Corpus of academic publications modeled using a Bayesian hierarchical topic model<br />Finds topics within those papers as well as communities of authors<br />
Stemming, stop word removal and rare word removal</li></li></ul><li>Methodology: Generating annotations<br />A topic is considered relevant to a user if its probability for that user exceeds 0.05<br />Users are annotated with the relevant topics generated by the LDA model<br />For a link between users, we take the intersection of the topics generated for the two users forming the link.<br />We also detect general topics by comparing against the topics generated for randomly selected users from the whole network (not the community)<br />
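The annotation rules above (a 0.05 relevance threshold per user, and the intersection of topics for a link) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `annotate_user` and `annotate_link` and the per-user topic distributions are hypothetical, and the distributions are assumed to come from an already fitted LDA model.

```python
from typing import Dict, Set

THRESHOLD = 0.05  # relevance cutoff stated on the slide


def annotate_user(topic_probs: Dict[int, float],
                  threshold: float = THRESHOLD) -> Set[int]:
    """Return the ids of topics whose probability exceeds the threshold."""
    return {topic for topic, prob in topic_probs.items() if prob > threshold}


def annotate_link(probs_a: Dict[int, float],
                  probs_b: Dict[int, float],
                  threshold: float = THRESHOLD) -> Set[int]:
    """Annotate a link with the topics relevant to both of its endpoints."""
    return annotate_user(probs_a, threshold) & annotate_user(probs_b, threshold)


# Hypothetical topic distributions (topic id -> probability) for two users.
alice = {0: 0.60, 1: 0.30, 2: 0.06, 3: 0.04}
bob = {0: 0.02, 1: 0.50, 2: 0.40, 3: 0.08}

alice_topics = annotate_user(alice)
bob_topics = annotate_user(bob)
link_topics = annotate_link(alice, bob)
```

Using set intersection keeps a link annotation only as specific as what both users share, so a topic that is relevant to just one endpoint never labels the link.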
Methodology: Evaluation<br /><ul><li>Select a single community
Generate the LDA model from all users within that community
Generate topic probabilities using the model for a set of randomly selected users
A classifier (linear SVM) is used to discriminate between users in the community and randomly selected users.
Future Work</li></li></ul><li>Future Work<br />Use Hierarchical Dirichlet Processes to determine the number of topics automatically.<br />Also use the online version of LDA currently being developed by David Blei at Princeton, which would make it possible to generate topic distributions over the whole Twitter dataset.<br />
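The evaluation procedure listed under Methodology: Evaluation (topic probabilities for community users versus randomly selected users, separated by a linear SVM) can be sketched as follows. This is only an illustration under stated assumptions: the topic vectors are made-up values standing in for LDA output, the SVM is a small hinge-loss subgradient trainer written for this sketch rather than the library the authors used, and accuracy is measured on the training data for brevity.

```python
import random
from typing import List, Sequence, Tuple


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 200, lr: float = 0.1,
                     reg: float = 0.01, seed: int = 0) -> Tuple[List[float], float]:
    """Fit w, b by stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            w = [wj * (1.0 - lr * reg) for wj in w]  # L2 shrinkage step
            if y[i] * score < 1.0:  # sample is inside the margin or misclassified
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 (community user) or -1 (random user)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical topic probabilities under the community's LDA model.
community = [[0.80, 0.10, 0.10], [0.70, 0.20, 0.10],
             [0.90, 0.05, 0.05], [0.75, 0.15, 0.10]]
random_users = [[0.10, 0.40, 0.50], [0.20, 0.30, 0.50],
                [0.10, 0.50, 0.40], [0.15, 0.35, 0.50]]

X = community + random_users
y = [1] * len(community) + [-1] * len(random_users)

w, b = train_linear_svm(X, y)
accuracy = sum(predict(w, b, x) == label for x, label in zip(X, y)) / len(X)
```

If the classifier separates the two groups well, the topics learned for the community are distinctive rather than general; a held-out split would be needed for a fair estimate in practice.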