CS531presentation

Investigating Images Related to Twitter Trending Topics
MUSTAFA ILKER SARAC
20801528

UNDERSTANDING AND CLASSIFYING IMAGE TWEETS
ACM-MM 2013

CS531 - Mustafa Ilker SARAC

1/13/2014

Content

• Introduction
• Motivation
• Image-Tweets
• Image and Text Relation
• Visual/Non-Visual Classification
• Experiments
• Initial Results


Introduction

• Image-tweets
• Correlation between a tweet's image and its text
• 50% of all posts are image-tweets
• Image-tweets are retweeted more and survive longer


Motivation

• Questions to ask
  • What types of images do users embed?
  • Do the images differ distinctly from those on photo-sharing websites like Flickr?
  • Do the textual contents of image-tweets differ from those of text-only posts?
• Contributions
  • Corpus
  • Annotated subset
  • A classifier that distinguishes two subclasses of image-tweets:
    • Visual
    • Non-visual



Image-Tweets

• Corpus
  • Text-only tweets and image-tweets from Weibo
  • 7 months in 2012
  • ~57M tweets
  • Manually annotated ~5K subset


Image-Tweets

• Image Characteristics
  • Images are post-processed by Weibo
  • 45.1% of the corpus are image-tweets
  • Images vary in quality and topic
  • 70% of the annotated corpus are natural photographs


Image-Tweets

• Image-tweets vs. text-only tweets: When? What? Why?
  • More image-tweets are posted during the daytime (When?)
  • LDA applied to a ~1M-tweet subset of the corpus (What?)
    • k=50 latent topics are learned
  • Daily chatter or information sharing (Why?)
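
As a rough illustration of the topic-modeling step, the sketch below fits LDA to a handful of toy documents with collapsed Gibbs sampling. The slides do not say which inference method, priors, or tokenization were used, so the sampler, the `alpha`/`beta` values, the toy documents, and `k=2` (instead of the k=50 above) are all assumptions.

```python
import random

def lda_gibbs(docs, k, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)
    doc_topic = [[0] * k for _ in docs]             # topic counts per document
    topic_word = [[0] * n_words for _ in range(k)]  # word counts per topic
    topic_total = [0] * k                           # token counts per topic
    assign = []                                     # topic of every token
    for d, doc in enumerate(docs):                  # random initialization
        row = []
        for w in doc:
            t = rng.randrange(k)
            row.append(t)
            doc_topic[d][t] += 1
            topic_word[t][index[w]] += 1
            topic_total[t] += 1
        assign.append(row)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi, t = index[w], assign[d][n]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][t] -= 1
                topic_word[t][wi] -= 1
                topic_total[t] -= 1
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wi] + beta)
                    / (topic_total[j] + n_words * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assign[d][n] = t
                doc_topic[d][t] += 1
                topic_word[t][wi] += 1
                topic_total[t] += 1
    return vocab, doc_topic, topic_word

# Hypothetical toy corpus; the real input was a ~1M-tweet subset.
docs = [
    ["food", "lunch", "noodles", "food"],
    ["lunch", "noodles", "dinner"],
    ["match", "goal", "team"],
    ["team", "goal", "match", "goal"],
]
vocab, doc_topic, topic_word = lda_gibbs(docs, k=2)
```

Each row of `doc_topic` gives the per-document topic counts, which is the kind of topic representation a later classification step could reuse as a feature.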


Image and Text Relation

• 99% of image-tweets have text
• Status (event, time, location)
• Logico-semantic


Image and Text Relation

• Visually-relevant image-tweets
  • At least one noun or verb corresponds to part of the image
• Non-visual image-tweets
  • Image and text have no visual correspondence
  • Hard to distinguish by looking at the images alone
  • May exhibit emotional relevance
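
The definition above can be written as a small predicate. This is only an illustration of the labeling rule, not an automatic method from the slides: it assumes the text is already POS-tagged and that a set of depicted concepts is known for the image (for example from a human annotator), and the names `is_visually_relevant`, `tagged_tokens`, and `depicted` are hypothetical.

```python
def is_visually_relevant(tagged_tokens, depicted):
    """True if at least one noun or verb in the text names something in the image.

    tagged_tokens: list of (word, pos_tag) pairs, tags assumed to be "NOUN", "VERB", ...
    depicted: concepts assumed to be visible in the image.
    """
    depicted = {concept.lower() for concept in depicted}
    return any(
        tag in {"NOUN", "VERB"} and word.lower() in depicted
        for word, tag in tagged_tokens
    )

# Visual: the noun "dog" corresponds to part of the image.
visual = is_visually_relevant([("cute", "ADJ"), ("dog", "NOUN")], {"dog", "grass"})
# Non-visual: no noun or verb matches what the image shows.
non_visual = is_visually_relevant([("hate", "VERB"), ("Monday", "NOUN")], {"cat"})
```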


Visual/Non-Visual Classification

• Dataset Construction
  • Crowdsourcing to label a random subset of the image-tweets as
    • Visual
    • Non-visual
  • Each image is annotated by 3 different subjects
  • 4811 image-tweets annotated
    • 3206 (2/3) visual
    • 1605 (1/3) non-visual
• 3 major types of features are used
  • Text
  • Image
  • Context
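
With three annotators per image-tweet, one simple way to obtain a single label is a majority vote. The slides do not state how the three judgments were combined, so the aggregation rule below is an assumption; the counts are the ones reported above.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most annotators (assumed aggregation rule)."""
    return Counter(votes).most_common(1)[0][0]

label = majority_label(["visual", "non-visual", "visual"])

# Class balance of the annotated set, from the reported counts.
n_visual, n_non_visual = 3206, 1605
total = n_visual + n_non_visual
visual_share = n_visual / total          # about 0.666, i.e. roughly 2/3
non_visual_share = n_non_visual / total  # about 0.334, i.e. roughly 1/3
```

With two classes and an odd number of annotators, a tie cannot occur.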




• Text Features
  • Binary word features
  • Previously learned topics from LDA
  • Part-of-speech (POS) density features
  • Named entities
  • Microblog-specific features
    • @mentions
    • #hashtags
    • Geolocation
    • URLs
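
A minimal sketch of two of the text feature groups listed above: binary word features and microblog-specific flags. The regular expressions assume Twitter-style `@name`, `#tag`, and `http(s)://` syntax and simple word tokenization; the feature names and the example vocabulary are hypothetical, and geolocation is omitted because it comes from tweet metadata rather than the text.

```python
import re

MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")

def microblog_features(text):
    """Binary flags for microblog-specific markers found in the text."""
    return {
        "has_mention": bool(MENTION.search(text)),
        "has_hashtag": bool(HASHTAG.search(text)),
        "has_url": bool(URL.search(text)),
    }

def binary_word_features(text, vocabulary):
    """1 if the vocabulary word occurs in the text, else 0 (presence, not counts)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return [1 if word in tokens else 0 for word in vocabulary]

text = "Lunch with @anna at the beach #sunny http://t.co/abc"
flags = microblog_features(text)
words = binary_word_features(text, ["beach", "rain"])
```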




• Image Features
  • Face detection
  • SIFT features with a bag-of-visual-words representation
    • Applied LDA with k=35
• Context Features
  • Retweets
  • Comments
  • Follower ratio
  • Posting time, etc.
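
For the image features, the bag-of-visual-words step can be sketched as follows: each local descriptor is assigned to its nearest codeword, and the image becomes a normalized histogram over the codebook. The 2-D toy descriptors and the two-word codebook are assumptions for illustration; real SIFT descriptors are 128-dimensional and the codebook is usually learned by clustering, details the slides do not give.

```python
def bovw_histogram(descriptors, codebook):
    """Quantize descriptors to their nearest codeword and return a normalized histogram."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(desc, codebook[j])),
        )
        hist[nearest] += 1
    total = sum(hist)
    # An image with no descriptors keeps an all-zero histogram.
    return [count / total for count in hist] if total else hist

codebook = [(0.0, 0.0), (10.0, 10.0)]                            # toy visual words
descriptors = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (11.0, 10.0)]  # toy descriptors
histogram = bovw_histogram(descriptors, codebook)
```

A histogram of this kind is word-count-like data, which is what allows a topic model such as LDA to be applied to images.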


Experiment

• 10-fold cross-validation with Naïve Bayes is performed
• Macro-averaged F1 score is computed
• Baseline uses only words as features
  • F1 = 64.8
• Each feature is added individually to observe its impact
• When all features with a positive impact are combined
  • F1 = 70.5
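
The evaluation protocol can be sketched in a few lines: a 10-fold split of the example indices and the macro-averaged F1, which is the unweighted mean of the per-class F1 scores. The classifier itself is left out; the round-robin fold assignment and the toy labels are assumptions, since the slides do not describe how the folds were drawn.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists; fold i takes every k-th example starting at i."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels: F1(visual) = 0.8, F1(non-visual) = 2/3.
y_true = ["visual", "visual", "visual", "non-visual"]
y_pred = ["visual", "visual", "non-visual", "non-visual"]
score = macro_f1(y_true, y_pred)
splits = list(k_fold_indices(20, k=10))
```

Macro-averaging weights both classes equally, which matters here because the annotated set is about two-thirds visual.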


Proposed Work

• Re-rank the images of image-tweets returned by Twitter search
• Select good images to represent Trending Topics
• Twitter was scraped and some initial results were obtained using
  • Retweets and favorites as contextual features
  • SIFT as the image feature for comparing images
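
A minimal sketch of the contextual part of such a re-ranking, assuming each search result is a record with retweet and favorite counts. The slides give no scoring formula, so the log-damped sum below, the function names, and the field names (`retweets`, `favorites`, `image`) are all hypothetical; the SIFT-based image comparison is not included.

```python
import math

def context_score(tweet):
    """Hypothetical popularity score: log-damped sum of retweets and favorites."""
    return math.log1p(tweet["retweets"]) + math.log1p(tweet["favorites"])

def rerank(tweets):
    """Order image-tweets from highest to lowest contextual score."""
    return sorted(tweets, key=context_score, reverse=True)

results = [
    {"image": "a.jpg", "retweets": 0, "favorites": 1},
    {"image": "b.jpg", "retweets": 50, "favorites": 10},
    {"image": "c.jpg", "retweets": 5, "favorites": 5},
]
ranked = [t["image"] for t in rerank(results)]
```

The logarithm keeps a single heavily retweeted post from dominating the ranking; any monotone damping would serve the same purpose.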


Initial Results


Thank You

QUESTIONS?

