
Towards Understanding Crisis Events On Online Social Networks Through Pictures


Extensive research has been conducted to identify, analyze and measure popular topics and public sentiment on Online Social Networks (OSNs) through text, especially during crisis events. However, little work has been done to understand
such events through pictures posted on these networks. Given the potential of visual content for influencing users’ thoughts and emotions, we perform a large-scale analysis to study and compare popular themes and sentiment across images and textual content posted on Facebook during the terror attacks that took place in Paris in 2015. We propose a generalizable and highly automated 3-tier pipeline which utilizes state-of-the-art computer vision techniques to extract high-level human understandable image descriptors.



  1. Towards Understanding Crisis Events On Online Social Networks Through Pictures
     IEEE/ACM Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017
     Prateek Dewan, Anshuman Suri, Varun Bharadhwaj, Aditi Mithal, Ponnurangam Kumaraguru
     Precog@IIITD, Indraprastha Institute of Information Technology – Delhi (IIITD)
  2. Who am I?
     • PhD student at IIIT-Delhi, India (2012 – present)
     • Masters (Information Security), IIIT-Delhi (2010 – 2012)
     • Funded by the Government of India, IIIT-Delhi, IBM, National Internet eXchange of India (NIXI)…
     • Part of Precog@IIITD: Privacy, eCrime, Online Social Networks, Data Science for Security and Privacy
     • Research interests: Privacy and Security in Online Social Media, Web Security, Machine Learning
     • Data Scientist at Apple
  3. An example to start…
  4. The Human Brain: Images versus text
     • The human brain processes images 60,000 times faster than text
  5. “A Picture Is Worth A Thousand Words”
     • Images are the latest way of communicating on OSNs
     • 1.8 billion+ pictures are shared on Online Social Networks every day
     • Images attract much more attention and engagement than text
       • Tweets with images get 18% more clicks and 150% more retweets
       • 93% of the most engaging content on Facebook includes an image
  6. Are we doing enough to "understand" images?
     • Most research analyzing social media content focuses on text
       • Topics are understood via topic modelling on text
       • Sentiment is understood by applying linguistic techniques to textual content
     • Is that enough? Does it capture everything?
     • Studies of images have been limited to small scales
       • A few hundred images manually annotated and analyzed
     • What can be done?
       • Automated image summarization using Deep Learning and Convolutional Neural Networks (CNNs), which scales to large numbers of images
       • Domain transfer learning: using existing knowledge in one domain to understand another
       • Optical Character Recognition
  7. What do we study?
     • Crisis event: the terrorist attacks in Paris, France, in November 2015
     • Images on social networks: Facebook
     • Data collection: Facebook Graph API Search, using #ParisAttacks and #PrayForParis

     Unique posts:             131,548
     Unique users:             106,275
     Posts with images:         75,277
     Total images extracted:    57,748
     Total unique images:       15,123
  8. Methodology
     • 3-tier pipeline for extracting high-level, human-understandable image descriptors
       • Tier 1: Visual themes (Inception-v3)
       • Tier 2: Image sentiment (DeCAF trained on SentiBank)
       • Tier 3: Text embedded in images (Optical Character Recognition)
     • Descriptors are compared against text sentiment (LIWC) and topics (TF), with manual calibration
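The three tiers above run independently on each image and their outputs are merged into one descriptor record. A minimal sketch, assuming the three models are supplied as callables (the function names here are hypothetical stand-ins for Inception-v3, the SentiBank-retrained classifier, and Tesseract):

```python
from typing import Callable, Dict, List

Descriptor = Dict[str, str]

def describe_image(image: bytes,
                   theme_model: Callable[[bytes], str],
                   sentiment_model: Callable[[bytes], str],
                   ocr_engine: Callable[[bytes], str]) -> Descriptor:
    """Run the three tiers on one image and merge their outputs
    into a single human-understandable descriptor record."""
    return {
        "theme": theme_model(image),          # Tier 1: visual theme label
        "sentiment": sentiment_model(image),  # Tier 2: positive / negative
        "embedded_text": ocr_engine(image),   # Tier 3: text found by OCR
    }

def run_pipeline(images: List[bytes], theme_model, sentiment_model,
                 ocr_engine) -> List[Descriptor]:
    """Apply all three tiers to every image in the dataset."""
    return [describe_image(img, theme_model, sentiment_model, ocr_engine)
            for img in images]
```

Because the tiers are independent, any one of them can be swapped out (e.g. a newer theme classifier) without touching the other two.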
  9. Tier I: Visual Themes
     • ImageNet Large Scale Visual Recognition Challenge (ILSVRC), 2012
       • 1.2 million images, 1,000 categories
     • Winner: Google’s Inception-v3 (top-1 error: 17.2%)
       • 48-layer Deep Convolutional Neural Network
  10. Tier I: Visual Themes (contd.)
      • All images labeled using Inception-v3
      • Validation:
        • A random sample of 2,545 images annotated by 3 human annotators
        • 38.87% accuracy (majority voting)
      • Manual calibration:
        • Renamed 7 of the top 30 (most frequently occurring) labels
        • New accuracy: 51.3%
      • Why rename? Example: images labeled "Bolo tie" by Inception-v3 are "PeaceForParis" images in our dataset
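The majority-voting validation above can be sketched as follows: the ground-truth label for each image is the one a strict majority of the three annotators agree on, and accuracy is the fraction of images where the model's label matches it (a sketch; the paper's exact tie-handling is an assumption):

```python
from collections import Counter
from typing import List, Optional

def majority_label(votes: List[str]) -> Optional[str]:
    """Ground truth = the label chosen by a strict majority of
    annotators; None when there is no majority."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def majority_vote_accuracy(predictions: List[str],
                           annotations: List[List[str]]) -> float:
    """Fraction of images whose predicted label matches the
    annotators' majority vote."""
    correct = sum(pred == majority_label(votes)
                  for pred, votes in zip(predictions, annotations))
    return correct / len(predictions)
```

Manual calibration then amounts to a label-renaming map (e.g. "bolo tie" → "PeaceForParis") applied to the predictions before re-running the same accuracy computation.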
  11. Tier II: Image Sentiment
      • Domain transfer learning: Inception-v3’s last layer retrained using SentiBank
      • SentiBank:
        • Images collected from Flickr using Adjective–Noun Pairs (ANPs) as search queries
        • ANPs: "happy dog", "adorable baby", "abandoned house"
        • A weakly labeled dataset of images carrying emotion
      • Final training set: 133,108 negative + 305,100 positive sentiment images
      • 10-fold random subsampling: 69.8% accuracy
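The 10-fold random subsampling used for validation can be sketched with the standard library: in each round a fresh random split is drawn, the model is retrained and scored on the held-out part, and accuracies are averaged. Here `evaluate` is a hypothetical stand-in for retraining the last layer and scoring it, and the 10% hold-out fraction is an assumption (the slide does not state the split ratio):

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

def random_subsampling(dataset: Sequence,
                       evaluate: Callable[[List, List], float],
                       rounds: int = 10,
                       holdout: float = 0.1,
                       seed: int = 0) -> float:
    """Repeatedly draw a random held-out split, score the model on
    it via `evaluate(train, test)`, and average over all rounds."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    n_test = max(1, int(len(dataset) * holdout))
    scores = []
    for _ in range(rounds):
        shuffled = list(dataset)
        rng.shuffle(shuffled)        # fresh random split each round
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(evaluate(train, test))
    return mean(scores)
```

Unlike classic k-fold cross-validation, the random splits may overlap between rounds, which is what makes this "random subsampling" rather than a partition into folds.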
  12. Tier III: Text embedded in images
      • Optical Character Recognition (OCR) with Tesseract (Python)
      • 31,689 images contained text
      • Validation:
        • Manually extracted text from a random sample of 1,000 images
        • Compared with OCR output using string similarity metrics
        • ~62% accuracy
      • Example Tesseract output: "No-one thinks that these people are representative of Christians. So why do so many think that these people are representative of Muslims?"
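The OCR validation step (comparing Tesseract output against the manually extracted text) can be sketched as below. The slide does not name the exact string-similarity metric used, so `difflib.SequenceMatcher` is an assumption here, chosen because it is in the standard library:

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Iterable, Tuple

def similarity(ocr_text: str, ground_truth: str) -> float:
    """Case-insensitive similarity in [0, 1] between the OCR output
    and the manually extracted ground-truth text."""
    return SequenceMatcher(None, ocr_text.lower(),
                           ground_truth.lower()).ratio()

def ocr_accuracy(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average similarity over (ocr_output, manual_text) pairs,
    i.e. the validation score for the OCR tier."""
    return mean(similarity(ocr, truth) for ocr, truth in pairs)
```

Getting the OCR output itself would use a call such as `pytesseract.image_to_string(image)`; only the scoring part is shown here.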
  13. Helix Demo
  14. Findings: Top visual themes

      Label               Count   Description
      Website             12,416  Images of posts, tweets, banners, etc.
      Book jacket *        5,383  Posters, banners, etc.
      Comic book           3,803  Cartoons, animated posters and memes
      Fountain             1,264  Fountain in front of the Louvre museum, other fountains
      Envelope *           1,248  Posters, banners, etc.
      Suit (clothing)      1,246  People wearing suits, at gatherings, etc.
      Stage                1,135  Stages during public speeches, mass gathering events, etc.
      Candle / wax light   1,021  Lit candles and lamps offering support to victims
      Malinois #             995  Police dog that died during the attacks
      Scoreboard #           971  Images of sports stadiums
  15. Poor-quality image content popular on Facebook
  16. Image and post text had different topics
      • Text embedded in images depicted more negative sentiment than user-generated textual content
      [Figure: topics in text embedded in images vs. user-generated text]
  17. Findings
      • Image sentiment was more positive than text sentiment
      [Figure: sentiment value / volume fraction vs. number of hours after the attacks, for post text, image text, and image volume fraction]
  18. Contrasting sentiment in text and image
  19. Contributions
      • Insights into the visual side of content during crisis events on social networks
      • A generalizable methodology/pipeline for analyzing large topical image datasets
  20. Limitations
      • The object-detection technique has limited accuracy
        • Retraining is costly; we prefer manual intervention over retraining
      • The sentiment portrayed by an image can be subjective
      • OCR does not always produce good results, so part of the content is missed
  21. Thank you!