2. What is Twitter?
• 140 character tweet
• Hashtag # before relevant keywords in tweet
• RT means to “re-tweet” or forward a tweet
• @ reference refers to a user’s screen name
3. Why is it different?
• Very short in length
• Written in informal style
• Social
4. What is Twitter, a social network or a
news media? (WWW 2010)
• Following is mostly not reciprocated (not so
“social”)
• Users talk about timely topics
• A few users reach large audience directly
• Most users can reach large audience by word-
of-mouth quickly
6. Analysis 1: Take the people out
• Krishnamurthy et al (2008)
• users were classified by
follower/following counts,
Numbers and ratios
• means and mechanisms of their
engagement
Web (61.7%), mobile/text (7.5%),
software (22.4%)
7. Analysis 2: Content Category
Four meta-categories
• daily chatter
• conversations
• information / URL sharing
• news reporting
8. Analysis 3: measuring user influence
• Indegree, retweets and mentions
• Strong correlation between retweet and
mention
• Most connected != most influential
13. How to detect spam?
• classification
• Content attributes
hashtags, trending topics
replies, mentions, http links
• User behavior attributes
age of user account
• Graph based attribute
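A minimal sketch of how the content attributes above could be turned into features for a spam classifier. The function name, the regular expressions, and the feature names are illustrative assumptions, not the method of any particular paper.

```python
import re

def content_features(tweet):
    """Count simple content attributes of one tweet (hypothetical feature set)."""
    return {
        "hashtags": len(re.findall(r"#\w+", tweet)),       # e.g. #free
        "mentions": len(re.findall(r"@\w+", tweet)),       # e.g. @bob
        "links": len(re.findall(r"https?://\S+", tweet)),  # http(s) links
        "is_reply": tweet.startswith("@"),                 # reply convention
    }

features = content_features("@bob check http://x.co #free #win")
```

These counts would be combined with user behavior attributes (such as account age) and graph-based attributes before training a classifier.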
14. Sentiment analysis
• Supervised classification
• Training data come from twitter, instead of
human labeled
• Happy emotions: “:-)”, “:)”, “=)”, “:D” etc
• Sad emotions: “:-(”, “:(”, “=(”, “;(” etc
• Objective: newspapers and magazines
such as “NY Times”
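A minimal sketch of how training data could be labeled from emoticons instead of by hand, using the emoticon lists on this slide. Skipping tweets with mixed emoticons and stripping the emoticons from the text are assumptions made here, as are the function names.

```python
HAPPY = (":-)", ":)", "=)", ":D")
SAD = (":-(", ":(", "=(", ";(")

def emoticon_label(tweet):
    """Return 'positive', 'negative', or None if there is no clear signal."""
    has_happy = any(e in tweet for e in HAPPY)
    has_sad = any(e in tweet for e in SAD)
    if has_happy == has_sad:
        return None  # no emoticon at all, or both kinds: skip the tweet
    return "positive" if has_happy else "negative"

def strip_emoticons(tweet):
    """Remove the label-bearing emoticons so a classifier cannot just memorize them."""
    for e in HAPPY + SAD:
        tweet = tweet.replace(e, "")
    return " ".join(tweet.split())

def build_training_set(tweets):
    """Turn raw tweets into (text, label) pairs without human annotation."""
    data = []
    for t in tweets:
        label = emoticon_label(t)
        if label is not None:
            data.append((strip_emoticons(t), label))
    return data

data = build_training_set(
    ["great day :)", "so tired :(", "just a headline", "mixed :) :("]
)
```

The resulting pairs can be fed to any supervised classifier; objective examples would come from a separate source such as newspaper accounts.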
19. Recommending content from
information streams
• The filtering problem:
– “I get 1000+ items in my stream daily but only
have time to read 10 of them. Which ones should I
read?”
• The Discovery Problem:
– “There are millions of URLs posted daily on
twitter. Am I missing something important there
outside my own Twitter stream?”
20. Recommending content from
information streams
• Recency of content: only interesting within a
short time after published.
– always a “cold start” situation
• Explicit interaction among users
– Explicitly interact by subscribing or sharing
• User-generated content
– People are content producers as well as
consumers
22. URL Sources
• Considering all URLs was impossible
• FoF : URLs from followee-of-followees
• Popular : URLs that are popular across the whole
of Twitter
23. Topic relevance scores
• Topic profile of URLs
– Use term vectors as profiles
– Built from tweets that have mentioned the URL
• Topic profile of users
– Self-topic: content profile based on what I post
– Followee-Topic: content profile based on what my
followees post
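A minimal sketch of a topic relevance score, assuming raw term-count vectors and cosine similarity between a URL profile and a user profile. The whitespace tokenization and the function names are assumptions for illustration; the slide does not specify the weighting.

```python
import math
from collections import Counter

def term_vector(texts):
    """Build a term-count profile from a list of tweets."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def cosine(a, b):
    """Cosine similarity between two term-count profiles."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# URL profile: built from tweets that mention the URL.
url_profile = term_vector(["new iphone app", "iphone review"])
# Self-topic profile: built from what the user posts.
self_profile = term_vector(["love my iphone"])
relevance = cosine(url_profile, self_profile)
```

A Followee-Topic profile would be built the same way, from the tweets of the user's followees.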
24. Social network scores
• “Popular Vote” among my followees-of-followees
– People “vote” a URL by tweeting it
– Votes are weighted using social network structure
– URLs with more votes in total are assigned higher
score
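A minimal sketch of the voting idea, assuming each person's vote weight is supplied by the caller. How weights are derived from the social network structure is not given on the slide, so `vote_weight` is a hypothetical input and the default weight of 1.0 is an assumption.

```python
from collections import defaultdict

def vote_scores(tweeted_urls, vote_weight):
    """Sum weighted votes per URL; each person votes at most once per URL."""
    scores = defaultdict(float)
    for person, urls in tweeted_urls.items():
        weight = vote_weight.get(person, 1.0)  # assumed default weight
        for url in set(urls):
            scores[url] += weight
    return dict(scores)

tweeted = {"alice": ["u1", "u2"], "bob": ["u1"], "carol": ["u2", "u2"]}
weights = {"alice": 2.0, "bob": 1.0, "carol": 0.5}
scores = vote_scores(tweeted, weights)
```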
26. Recommending twitter users to follow
• Social graph
• Profile user
– User himself
– Followers
– followees
28. The phrase reinforcement algorithm
• Looking for the most commonly occurring
phrases
– Users tend to use similar words when describing a
particular topic
– RT
30. Hybrid TF-IDF summarization
• TF: the document is the entire collection of
posts
• IDF: the document is a single post
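A minimal sketch of the hybrid weighting described above: term frequency is counted over the entire collection, while document frequency treats each post as a document. The log base, the length normalization with a minimum threshold, and the function names are assumptions made for this sketch.

```python
import math
from collections import Counter

def hybrid_tfidf_weights(posts):
    """Word weights: TF over the whole collection, IDF over single posts."""
    tokenized = [p.lower().split() for p in posts]
    all_words = [w for toks in tokenized for w in toks]
    total = len(all_words)
    tf = Counter(all_words)                                   # collection-wide counts
    df = Counter(w for toks in tokenized for w in set(toks))  # posts containing w
    n = len(posts)
    return {w: (tf[w] / total) * math.log2(n / df[w]) for w in tf}

def best_post(posts, min_len=3):
    """Pick the post with the highest length-normalized weight sum."""
    weights = hybrid_tfidf_weights(posts)

    def score(post):
        toks = post.lower().split()
        # min_len keeps very short posts from winning by default (assumed threshold)
        return sum(weights[w] for w in toks) / max(min_len, len(toks))

    return max(posts, key=score)

posts = ["lakers win game", "lakers win title", "lakers lose"]
weights = hybrid_tfidf_weights(posts)
summary = best_post(posts)
```

In this toy collection, "lakers" appears in every post, so its weight is zero and it does not influence which post is chosen.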
32. Content modeling on Twitter
• Surface word features: tf.idf cosine similarity, etc.
• Deeper natural language processing: parsing, parts of speech, coreference, etc.
• Example tweet (THE_REAL_SHAQ): “dats yur mom not me lol”
33. Content modeling on Twitter
• Surface word features: tf.idf cosine similarity, etc.
• Topic models, dimensionality reduction: Latent Dirichlet Allocation (LDA), LSA, etc.
• Supervised classification (#hashtags, emoticons, questions, etc.): Naïve Bayes, SVM, etc.
• Labeled LDA: best model in ranking experiments
34. Content modeling with Labeled LDA
• Discover unlabeled topics
– Parameter K=200 latent topic dimensions
• Model common labels
– 500 - 1000 dimensions for hashtags, emoticons, etc.
• Example word clusters
– obama president american america says country russia pope island
– Smile: :) good day morning thanks have happy hope birthday
– #jobs: #jobs featured manager sales engineer yahoo location senior
– I’m going go out gonna see tonight sleep tomorrow about am night
– im :) can’t wait see one yay!!! cant tomorrow got !! next christmas
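35. Content modeling with Labeled LDA
• Example posts with the topic/label numbers shown above their words (alignment lost in extraction)
– “new muppetblog political commentary link”: 4 1 1 1
– “@kermit heyy wanna catch a movie”: 2 2 2 3 3
– “just ate a cookie #yummy”: 5 5 #yummy #yummy
• Histogram as signature for set of posts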
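A minimal sketch of the “histogram as signature” idea, assuming each post has already been reduced to a list of per-word topic or label ids. The ids below are illustrative; running Labeled LDA itself is not shown.

```python
from collections import Counter

def signature(assignments_per_post):
    """Normalized histogram of topic/label ids over a set of posts."""
    counts = Counter()
    for assignments in assignments_per_post:
        counts.update(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}

# Hypothetical per-word assignments for two posts.
sig = signature([[4, 1, 1, 1], [2, 2, 2, 3, 3]])
```

Two users, or a user and a candidate post, can then be compared by comparing their histograms.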
36. Twitter content by category
• Substance (27%)
– obama president american america says country russia pope island failed honduras
– iphone new phone app mobile apple ipod blackberry touch pro store apps free android an
• Social (23%)
– can make help if someone tell_me them anyone use makes any sense trying explain
– up what's hit pick whats hey set twitter sign give catch when show first wats make
• Style (38%)
– haha lol :) funny :p omg hahaha yeah too yes thats ha wow cool lmao though kinda
– im get dont gonna shit gotta wanna cuz damn ur make cant say cause bout ill mad
• Status (12%)
– am still doing sleep so going tired bed awake supposed hell asleep early sleeping sleepy
– night sleep bed going off tomorrow bye tonight tired goodnight all im time now nite
37. Characterizing Microblogs with Topic
Models
Outline
• Modeling Twitter content with topic models
• Characterizing, recommending and filtering
40. TwitterRank: Finding Topic-sensitive
Influential Twitterers
• Apply LDA to distill topics automatically
• Find topics in the twitterer’s content to
represent her interests
– Twitterer’s content = aggregated tweets
• Twitterers with “following” relationships are
more similar than those without according to
the topics they are interested in
42. Interesting applications
• Personalized and automatic social
summarization of events in video
• Twitter Can Predict the Stock Market
• Predicting elections with Twitter
• Earthquake detection (time, location)
Hashtags are indicated by a # symbol and are combined with keywords to indicate a topic of interest. A hashtag becomes popular when many people use it. Popular topics, known as “trending” topics, appear on the main Twitter page, which can significantly increase the number of tweets containing that topic.
http://www.slideshare.net/haewoon/what-is-twitter-a-social-network-or-a-news-media-3922095
• OSN: we are friends. Twitter: I follow you.
• Media: the means of communication, as radio and television, newspapers, and magazines, that reach or influence people widely
• Only 22.1% of user pairs follow each other (Flickr 68%, Yahoo 84%)
• The majority of topics are headlines
• Twitter user ranking by followers, PageRank, and RT
– Followers, PageRank: actor, musician, show host, sports star, model
– RT: news
• A retweet brings a few hundred additional readers (55% of RTs within 1 hr)
Summary:
• Low reciprocity distinguishes Twitter from OSNs
• Twitter has characteristics of news media:
1. tweets mentioning timely topics
2. plenty of hubs reaching a large public directly
3. fast and wide spread of word-of-mouth
• The follower/followee ratio “matters” more than the raw number of followers
• Following people is a simple way to get followers
TunkRank is an influence ranking tool that helps you identify leading influencers on Twitter. There are two basic ideas:
• The amount of attention you can give is spread out among all those you follow. The more you follow, the less attention you can give each one.
• Your influence depends on the amount of attention your followers can give you.
As a twitterer, your influence does not depend on how many people you follow. However, your usefulness as a follower does. Having higher influence depends on having many followers who follow relatively few people but are followed by many. Followers like that are more likely to read your tweets and act on them (retweeting, clicking links, responding, blogging, etc). Their influence trickles up to you.
Your TunkRank score is a reflection of how much attention your followers can both directly give you and how much attention they bring you from their network of followers.
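A minimal sketch of the two ideas above, using the commonly cited TunkRank recurrence: Influence(X) is the sum, over each follower Y of X, of (1 + p * Influence(Y)) / |Following(Y)|. The retweet probability `p`, the fixed iteration count, and the input format are assumptions of this sketch, not details given in these notes.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iteratively estimate influence.

    following maps each user to the set of users they follow.
    p is the assumed probability that a follower passes a tweet on.
    """
    users = set(following)
    for followees in following.values():
        users.update(followees)
    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        updated = {u: 0.0 for u in users}
        for follower, followees in following.items():
            if not followees:
                continue
            # A follower's attention is split evenly among everyone they follow.
            share = (1.0 + p * influence[follower]) / len(followees)
            for followee in followees:
                updated[followee] += share
        influence = updated
    return influence

# "a" follows only "c"; "b" splits attention between "c" and "d".
ranks = tunkrank({"a": {"c"}, "b": {"c", "d"}})
```

Here "c" ends up with more influence than "d" because it receives the undivided attention of "a" plus half of the attention of "b".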
• External URL
• Letter+number patterns in usernames
• Suggestive keywords (“naked”, “girls”, “webcam”)
• Propagation tree
A context extraction algorithm (such as PCA or SVD) runs over the recent history of the trend and reports the keywords most correlated with it. For example, the keyword ‘NBA’ may usually appear in 5 tweets per minute, yet suddenly exhibit a rate of 100 tweets/min.
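A minimal sketch of flagging the kind of sudden rate jump in the ‘NBA’ example, by comparing the current tweets-per-minute rate with the average of recent history. The threshold `factor` and the floor on the baseline are hypothetical choices, and this simple ratio test stands in for whatever detector a real trend system uses.

```python
def is_bursting(history, current, factor=5.0):
    """Return True if the current rate far exceeds the recent average rate."""
    if not history:
        return False  # no baseline to compare against
    baseline = sum(history) / len(history)
    # Floor the baseline at 1.0 so very rare keywords do not trigger on noise.
    return current > factor * max(baseline, 1.0)

usual_rates = [5, 4, 6, 5]  # tweets per minute mentioning the keyword
burst = is_bursting(usual_rates, 100)
quiet = is_bursting(usual_rates, 7)
```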
• Lots of celebrity names
– lady gaga
• @ and # reduce ambiguity, like advanced query operators
• Hashtag queries particularly popular
– Most popular queries: hashtag 51% of the time
– Least popular queries: hashtag 7% of the time
• Celebrity queries particularly popular
– Most popular queries: celebrity 25% of the time
– Least popular queries: celebrity 4% of the time
• Twitter queries less diverse than Web queries
– Only 1 in 4 unique (vs. 2 in 4 unique)
http://www.slideshare.net/PARCInc/recommending-content-from-social-information-streams
There is no collective filtering