Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy

In today's world, online social media plays a vital role during real-world events, especially crisis events. Social media coverage of events has both positive and negative effects: it can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter in spreading fake images about the disaster during Hurricane Sandy (2012). We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation, and influence patterns of the spread of fake images. Eighty-six percent of the tweets spreading the fake images were retweets; hence, very few were original tweets. Our results showed that the top thirty users out of 10,215 (0.3%) accounted for 90% of the retweets of fake images; network links such as Twitter follower relationships contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images from real images of Hurricane Sandy. The best results were obtained with a Decision Tree classifier, which achieved 97% accuracy in distinguishing fake images from real ones. Tweet-based features were very effective in distinguishing tweets with fake images from those with real images, while user-based features performed very poorly. Our results showed that automated techniques can be used to identify real images from fake images posted on Twitter.
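
The 11% figure for follower relationships can be read as an edge overlap between two graphs. The following is a minimal sketch of that calculation, not the authors' code. It assumes each edge is a directed pair (retweeter, original author) in the retweet graph and (follower, followee) in the follower graph, and that the percentage is taken over the retweet edges. The function name `edge_overlap` and the toy user names are hypothetical; the counts 1,215 and 10,508 are the ones reported on the Results slide.

```python
def edge_overlap(retweet_edges, follower_edges):
    """Return (shared edge count, percent of retweet edges also in the follower graph)."""
    retweet = set(retweet_edges)
    shared = retweet & set(follower_edges)
    pct = 100.0 * len(shared) / len(retweet) if retweet else 0.0
    return len(shared), pct


# Toy data (hypothetical): (retweeter, author) and (follower, followee) pairs.
retweet_edges = [("alice", "bob"), ("carol", "bob"), ("dave", "alice"), ("erin", "carol")]
follower_edges = [("alice", "bob"), ("dave", "bob"), ("erin", "carol"), ("bob", "alice")]

shared, pct = edge_overlap(retweet_edges, follower_edges)
print(shared, pct)

# Same ratio with the counts reported on the Results slide.
reported_pct = 100.0 * 1215 / 10508
print(round(reported_pct, 2))
```

With the reported counts the ratio is about 11.56%, which matches the 11% on the slide if the figure was truncated. That match is why the retweet edge count is assumed to be the denominator here.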

Published in: Technology

  1. Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy
       Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru, Anupam Joshi
  2. Agenda
       • Background
       • Motivation
       • Main contributions
       • Data description
       • Methodology
       • Results
       • Future work
       precog.iiitd.edu.in
  3. Background: Hurricane Sandy
       • Dates: Oct 22-31, 2012
       • Category 3 storm
       • Damages worth $75 billion
       • Coast of NE America
  4. Motivation: FAKE IMAGES
  5. Motivation
  6. Our Contributions
       • Performed in-depth characterization of tweets sharing fake images on Twitter during Hurricane Sandy
         - Tweets containing the fake image URLs were mostly retweets (86%)
         - Just 11% overlap between the retweet and follower graphs of tweets containing fake images
       • Applied classification algorithms to distinguish between tweets containing fake and real images
         - Best accuracy of 97% was achieved using a decision tree classifier with tweet-based features
  7. Methodology (diagram stages): Feature Generation; Data Collection and Filtering; Evaluating Results; Data Characterization; Obtaining Ground Truth; Classification Module
  8. Data Description
       Total tweets          1,782,526
       Total unique users    1,174,266
       Tweets with URLs        622,860
  9. Data Filtering
       • Reputable online resource used to filter fake and real images
         - The Guardian collected and publicly distributed a list of fake and true images shared during Hurricane Sandy
       Tweets with fake images    10,350
       Users with fake images     10,215
       Tweets with real images     5,767
       Users with real images      5,678
       • One of the biggest fake content propagation datasets that have been studied by researchers
  10. Characterization – Fake Image Propagation
       • 86% of tweets spreading the fake images were retweets
       • Top 30 users out of 10,215 users (0.3%) resulted in 90% of the retweets of fake images
  11. Network Analysis
       [Figure: Tweet-retweet graph for the spread of fake images at the 'nth' and '(n+1)th' hour]
  12. Role of Explicit Twitter Network
       • Analyzing role of follower network in fake image propagation
       • We crawled the Twitter network for all users who tweeted the fake image URLs
  13. Algorithm
  14. Results
       Total edges in the retweet network                       10,508
       Total edges in the follower-followee network         10,799,122
       Total edges that exist in both the retweet network
         and the follower-followee network                       1,215
       Percentage overlap                                          11%
  15. Classification
       • 5-fold cross validation
       • User Features [F1]
         - Number of Friends
         - Number of Followers
         - Follower-Friend Ratio
         - Number of times listed
         - User has a URL
         - User is a verified user
         - Age of user account
       • Tweet Features [F2]
         - Length of Tweet
         - Number of Words
         - Contains Question Mark?
         - Contains Exclamation Mark?
         - Number of Question Marks
         - Number of Exclamation Marks
         - Contains Happy Emoticon
         - Contains Sad Emoticon
         - Contains First Order Pronoun
         - Contains Second Order Pronoun
         - Contains Third Order Pronoun
         - Number of uppercase characters
         - Number of negative sentiment words
         - Number of positive sentiment words
         - Number of mentions
         - Number of hashtags
         - Number of URLs
         - Retweet count
  16. Classification Results
                        F1 (user)   F2 (tweet)   F1+F2
       Naïve Bayes      56.32%      91.97%       91.52%
       Decision Tree    53.24%      97.65%       96.65%
       • The best results were obtained with the Decision Tree classifier, which achieved 97% accuracy in distinguishing fake images from real ones.
       • Tweet-based features were very effective in distinguishing tweets with fake images from those with real images, while user-based features performed very poorly.
       • Our results provided a proof of concept that automated techniques can be used to identify real images from fake images posted on Twitter.
  17. Future Work: Fake Charity; Rumor
  18. Some Attractions
  19. Some Attractions
  20. Q&A
  21. Thank You!!!
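
Several of the tweet features [F2] listed on the Classification slide can be computed directly from the tweet text. The sketch below is illustrative only and is not the authors' feature extractor. The function name `tweet_features`, the emoticon lists, and the regular expressions for mentions, hashtags, and URLs are all assumptions. The pronoun, sentiment-word, and retweet-count features are left out because they need word lists or tweet metadata that the slides do not specify.

```python
import re

# Assumed emoticon lists; the slides do not say which emoticons were used.
HAPPY_EMOTICONS = (":)", ":-)", ":D")
SAD_EMOTICONS = (":(", ":-(")


def tweet_features(text):
    """Compute a subset of the text-only tweet features for one tweet."""
    return {
        "length": len(text),
        "num_words": len(text.split()),
        "has_question_mark": "?" in text,
        "has_exclamation_mark": "!" in text,
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_happy_emoticon": any(e in text for e in HAPPY_EMOTICONS),
        "has_sad_emoticon": any(e in text for e in SAD_EMOTICONS),
        "num_uppercase": sum(1 for ch in text if ch.isupper()),
        # Simple patterns, assumed for illustration only.
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }


# Hypothetical example tweet.
features = tweet_features("OMG is this real?! Shark in NJ #Sandy http://t.co/abc @cnn")
for name, value in features.items():
    print(name, value)
```

Each tweet becomes one row of such values. A classifier such as Naïve Bayes or a decision tree could then be trained on those rows under 5-fold cross validation, as the Classification slide describes.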
