Based on the paper "Understanding and Classifying Image Tweets"
ACM-MM 2013
Disclaimer: I am not an author of this paper. I used it as the basis for my course project proposal.
CS531 presentation
1. Investigating Images Related to Twitter Trending Topics
MUSTAFA ILKER SARAC
20801528
UNDERSTANDING AND CLASSIFYING IMAGE TWEETS
ACM-MM 2013
CS531 - Mustafa Ilker SARAC
1/13/2014
2. Content
• Introduction
• Motivation
• Image-Tweets
• Image and Text Relation
• Visual/Non-Visual Classification
• Experiments
• Initial Results
3. Introduction
• Image-tweets
  ◦ Correlation between a tweet's image and its text
• 50% of all posts are image-tweets
• Image-tweets are retweeted more and survive longer
4. Motivation
• Questions to ask
  ◦ What types of images do users embed?
  ◦ Do the images differ distinctly from images on photo-sharing websites like Flickr?
  ◦ Does the textual content of image-tweets differ from that of text-only posts?
• Contributions
  ◦ Corpus
  ◦ Annotated subset
  ◦ A classifier that distinguishes two subclasses of image-tweets:
    ▪ Visual
    ▪ Non-visual
5. Image-Tweets
• Corpus
  ◦ Text-only tweets and image-tweets from Weibo
  ◦ 7 months in 2012
  ◦ ~57M tweets
  ◦ Manually annotated subset of ~5K
6. Image-Tweets
• Image characteristics
  ◦ Images are post-processed by Weibo
  ◦ 45.1% of the corpus are image-tweets
  ◦ Images vary in quality and topic
    ▪ 70% of the annotated corpus are natural photographs
7. Image-Tweets
• Image-tweets vs. text-only: When? What? Why?
  ◦ More image-tweets are posted during the daytime (When?)
  ◦ LDA applied to a ~1M-tweet subset of the corpus (What?)
    ▪ k=50 latent topics are learned
  ◦ Daily chatter or information sharing (Why?)
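The "When?" comparison can be illustrated with a small sketch that computes, for each hour of the day, the share of posts that carry an image. This is only an illustration under assumptions: the `(timestamp, has_image)` input format, the function name `image_share_by_hour`, and the toy data are hypothetical and not taken from the paper.

```python
from collections import Counter
from datetime import datetime


def image_share_by_hour(tweets):
    """Return {hour: fraction of tweets posted in that hour that carry an image}."""
    totals = Counter()
    with_image = Counter()
    for posted_at, has_image in tweets:
        hour = posted_at.hour
        totals[hour] += 1
        if has_image:
            with_image[hour] += 1
    # Only hours that actually occur in the data appear in the result.
    return {hour: with_image[hour] / totals[hour] for hour in sorted(totals)}


# Hypothetical toy data: (timestamp, has_image) pairs, not from the corpus.
sample = [
    (datetime(2012, 6, 1, 3, 15), False),
    (datetime(2012, 6, 1, 3, 40), False),
    (datetime(2012, 6, 1, 14, 5), True),
    (datetime(2012, 6, 1, 14, 30), True),
    (datetime(2012, 6, 1, 14, 55), False),
]
shares = image_share_by_hour(sample)
```

Comparing the resulting shares between daytime and night-time hours would show whether image-tweets concentrate during the day.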
8. Image and Text Relation
• 99% of image-tweets have text.
  ◦ Status (event, time, location)
  ◦ Logico-semantic
9. Image and Text Relation
• Visually relevant image-tweets
  ◦ At least one noun or verb corresponds to part of the image
• Non-visual image-tweets
  ◦ Image and text have no visual correspondence
  ◦ Hard to distinguish by looking at the images alone
  ◦ May exhibit emotional relevance
10. Visual/Non-Visual Classification
• Dataset construction
  ◦ Crowdsourcing to label a random subset of the image-tweets
    ▪ Visual
    ▪ Non-visual
  ◦ Each image is annotated by 3 different subjects
  ◦ 4811 image-tweets annotated
    ▪ 3206 (2/3) visual
    ▪ 1605 (1/3) non-visual
  ◦ 3 major types of features are used
    ▪ Text
    ▪ Image
    ▪ Context
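Since each image is annotated by three subjects, the three labels have to be merged into one. The slide does not say how this was done; the sketch below assumes simple majority voting, and the names `majority_label` and `annotations` as well as the toy data are hypothetical.

```python
from collections import Counter


def majority_label(votes):
    """Return the label chosen by most annotators, or None on a tie."""
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no clear majority
    return counts[0][0]


# Hypothetical annotations: three votes per image-tweet.
annotations = {
    "tweet_a": ["visual", "visual", "non-visual"],
    "tweet_b": ["non-visual", "non-visual", "non-visual"],
}
labels = {tweet_id: majority_label(votes) for tweet_id, votes in annotations.items()}
```

With three annotators and two classes a tie cannot occur, so every annotated image-tweet receives a label under this assumption.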
11. Visual/Non-Visual Classification
• Text features
  ◦ Binary word features
  ◦ Previously learned topics from LDA
  ◦ Part-of-speech (POS) density features
  ◦ Named entities
  ◦ Microblog-specific features
    ▪ @mentions
    ▪ #hashtags
    ▪ Geolocation
    ▪ URLs
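The microblog-specific features listed above can be sketched as binary indicators extracted with regular expressions. This is a rough illustration, not the paper's extraction code: the patterns are Twitter-style approximations (Weibo wraps hashtags as #topic#, which the hashtag pattern still matches at the opening #), and the function name `microblog_features` and the `has_geolocation` flag are assumptions.

```python
import re

MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def microblog_features(text, has_geolocation=False):
    """Binary indicators for @mentions, #hashtags, URLs and geolocation."""
    # Strip URLs first so that a URL fragment such as "/#top" is not
    # mistaken for a hashtag.
    text_without_urls = URL_RE.sub(" ", text)
    return {
        "has_mention": bool(MENTION_RE.search(text_without_urls)),
        "has_hashtag": bool(HASHTAG_RE.search(text_without_urls)),
        "has_url": bool(URL_RE.search(text)),
        # Geolocation comes from post metadata, not from the text itself.
        "has_geolocation": bool(has_geolocation),
    }


features = microblog_features("@alice look at this #sunset http://t.co/abc")
```

Each indicator can then be appended to the binary word features as one extra dimension.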
12. Visual/Non-Visual Classification
• Image features
  ◦ Face detection
  ◦ SIFT features with a bag-of-visual-words representation
    ▪ LDA applied with k=35
• Context features
  ◦ Retweets
  ◦ Comments
  ◦ Follower ratio
  ◦ Posting time, etc.
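The bag-of-visual-words step can be sketched as follows: each local descriptor is assigned to its nearest codebook entry ("visual word"), and the image is represented by the normalized word counts. This is only an illustration under assumptions: the codebook would typically be learned by clustering descriptors (for example with k-means), real SIFT descriptors are 128-dimensional rather than the 2-D toy vectors used here, and the function names are hypothetical. Descriptor extraction and the LDA step are not shown.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    # An image without descriptors yields an all-zero histogram.
    return [c / total for c in counts] if total else counts


# Toy 2-D example with a two-word codebook (hypothetical values).
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
histogram = bovw_histogram(descriptors, codebook)
```

The unnormalized word counts are the kind of "document" on which a topic model such as LDA can then be trained.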
13. Experiment
• 10-fold cross-validation is performed with Naïve Bayes
• Macro-averaged F1 score is computed
• Baseline uses only words as features
  ◦ F1 = 64.8
• Each feature is added individually to observe its impact
• With all features that had a positive impact combined
  ◦ F1 = 70.5
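The evaluation protocol above can be sketched in two small pieces: splitting the data into 10 folds and computing the macro-averaged F1, which is the unweighted mean of the per-class F1 scores. This is a minimal sketch, not the paper's evaluation code: the classifier itself is omitted, the contiguous fold split assumes the data has already been shuffled, and the function names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


y_true = ["visual", "visual", "visual", "non-visual"]
y_pred = ["visual", "visual", "non-visual", "non-visual"]
score = macro_f1(y_true, y_pred)
folds = list(k_fold_indices(25, k=10))
```

Because both classes count equally in the macro average, the smaller non-visual class (about one third of the annotated data) influences the score as much as the visual class.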
15. Proposed Work
• Re-rank the images of image-tweets returned by Twitter search
• Select good images to represent Trending Topics
• Twitter was scraped and some initial results were obtained using
  ◦ Retweets and Favorites as contextual features
  ◦ SIFT as image features to compare images
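One way the contextual part of the proposed re-ranking could look is sketched below: each search result is scored from its retweet and favorite counts and the list is sorted by that score. The scoring formula is purely an assumption for illustration; the log damping, the weights `w_retweet` and `w_favorite`, and the result dictionary layout are hypothetical and not part of the proposal. The SIFT-based image comparison is not shown.

```python
import math


def engagement_score(retweets, favorites, w_retweet=1.0, w_favorite=0.5):
    """Log-damped weighted sum of the two contextual signals (assumed formula)."""
    return w_retweet * math.log1p(retweets) + w_favorite * math.log1p(favorites)


def rerank(results):
    """Sort search results by descending engagement score."""
    return sorted(
        results,
        key=lambda r: engagement_score(r["retweets"], r["favorites"]),
        reverse=True,
    )


# Hypothetical search results for one trending topic.
results = [
    {"id": "c", "retweets": 1, "favorites": 1},
    {"id": "a", "retweets": 10, "favorites": 0},
    {"id": "b", "retweets": 0, "favorites": 20},
]
ranked_ids = [r["id"] for r in rerank(results)]
```

The logarithm keeps a single viral tweet from dominating the ranking; other monotone transformations would serve the same purpose.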