Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

524 views

Published on

The extensive use of social media platforms, especially during disasters, creates unique opportunities for humanitarian organizations to gain situational awareness and launch relief operations accordingly. In addition to the textual content, people post overwhelming amounts of imagery data on social networks within minutes of a disaster hit. Studies point to the importance of this online imagery content for emergency response. Despite recent advances in the computer vision field, automatic processing of the crisis-related social media imagery data remains a challenging task. It is because a majority of which consists of redundant and irrelevant content. In this paper, we present an image processing pipeline that comprises de-duplication and relevancy filtering mechanisms to collect and filter social media image content in real-time during a crisis event. Results obtained from extensive experiments on real-world crisis datasets demonstrate the significance of the proposed pipeline for optimal utilization of both human and machine computing resources.

Published in: Science
  • Hello! Get Your Professional Job-Winning Resume Here - Check our website! https://vk.cc/818RFv
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here

Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

  1. 1. Automa'c Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises Dat Tien Nguyen, Firoj Alam Ferda Ofli, Muhammad Imran Qatar Compu>ng Research Ins>tute @mimran15 @ferdaofli @firojalam04 @ndat
  2. 2. Outline Background Data Image Processing Pipeline Summary and Future Works
  3. 3. Outline Background Data Image Processing Pipeline Summary and Future Works
  4. 4. Ar'ficial Intelligence for Digital Response Background hIp://aidr.qcri.org What about images? Expert/User
  5. 5. “A picture is worth a thousand words.” Background
  6. 6. Example Background Irrelevant Images for the event Relevant Images for the event
  7. 7. Challenges of Social Media Image Processing •  High velocity and high volume – Crowd cannot keep up •  Sandy Hurricane –  20 millions tweets in 5 days –  16 thousands tweets per minute Background
  8. 8. Challenges of Social Media Image Processing •  High velocity and high volume – Crowd cannot keep up •  Sandy Hurricane –  20 millions tweets in 5 days –  16 thousands tweets per minute •  Large por>on of redundant/duplicate and irrelevant content – We should not waste crowd’s >me and effort Background
  9. 9. Task/Goal “is to filter irrelevant and duplicate images, and categorize them into different damage level, which can help in the decision making process” Background
  10. 10. Image Filtering? •  Social media image filtering – Real->me image collec>on – Duplicate or near-duplicate image detec>on – Irrelevant image content detec>on Background
  11. 11. Image Filtering? •  Social media image filtering – Real->me image collec>on – Duplicate or near-duplicate image detec>on – Irrelevant image content detec>on •  Ac'onable informa'on extrac'on – Infrastructure damage assessment – Injured people detec>on – Fine-grained object analysis Background
  12. 12. Automa'c Image Processing Pipeline Background Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac>on - Image downloading from the Web Relevancy Filtering De-duplica>on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage Assessment
  13. 13. Related Works •  Very few: –  Images with on-topic tweets are more relevant (Peters and Joao, 2015) –  Visually relevant and irrelevant tweets (F1 70.5% in binary classifica>on task) (Chen et al., 2013) –  Damage level from aerial Images (Turker et al., 2004; Fernandez et al. 2015) •  Mostly used bag-of-visual-word features Background
  14. 14. Outline Background Data Image Processing Pipeline Summary and Future Works
  15. 15. Data: Previous Disaster Datasets •  Four disaster events Corpus
  16. 16. Data: Damage Assessment Corpus Annota'on: •  Severe damage: –  Substan>al destruc>on of an infrastructure –  Non-usable buildings, roads, bridges for example. •  Mild damage: –  Minor damages with up to 50% •  LiQle-to-no damage Number of labeled images for each dataset and each damage category
  17. 17. Data: Relevancy Detec>on Corpus Annota'on Mapping: •  Images has been selected from the data we annotated for damage assessment –  {severe, mild} à relevant (randomly sampled 3518 images) •  Resulted 7036 Images {relevant, irrelevant} Image annotated as LiQle-to-None ImageNet Object Detec'on Model Object category website, suit, lab coat, envelope, dust jacket, candle, menu, vestment, monitor, street sign, puzzle, television, cash machine, screen irrelevant Most frequently occurring Human Supervision
  18. 18. Outline Background Data Image Processing Pipeline Summary and Future Works
  19. 19. Automa'c Image Processing Pipeline Image Processing Pipeline Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac>on - Image downloading from the Web Relevancy Filtering De-duplica>on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage assessment
  20. 20. Image Collector Image Processing Pipeline Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac'on - Image downloading from the Web Relevancy Filtering De-duplica>on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage assessment
  21. 21. Relevancy Filtering Image Processing Pipeline Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac>on - Image downloading from the Web Relevancy Filtering De-duplica>on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage assessment
  22. 22. Relevancy Filtering (Cont.) Irrelevant images showing cartoons, banners, adver>sements, celebri>es, etc. Image Processing Pipeline
  23. 23. Relevancy Filtering (Cont.) Image Processing Pipeline •  Deep Learning Model: Build a binary classifier –  Transfer learning –  Extracted features from VGG16, a pre-trained convolu>onal neural network, e.g., VGG16 (Simonyan et al., 2014 ) –  Fine tuned the parameters 1x2 Image credit: hIps://blog.heuritech.com
  24. 24. Relevancy Filtering (Cont.) * Simonyan, K. and Zisserman, A. (2014). “Very deep convolu>onal networks for large-scale image recogni>on”. In: arXiv preprint arXiv:1409.1556 Image Processing Pipeline •  Data Split for Experiments: –  Training 60% –  Valida>on 20% –  Test 20% •  Performance
  25. 25. Duplicate Filtering Image Processing Pipeline Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac>on - Image downloading from the Web Relevancy Filtering De-duplica'on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage assessment
  26. 26. Duplicate Filtering Examples of near-duplicate images Image Processing Pipeline
  27. 27. Duplicate Filtering (Cont.) Image Processing Pipeline •  Selected 1100 images for the experiment •  Approach: –  Compute perceptual hash for each image –  Compute hamming distance between a pair of images Two images are duplicate if: distance(Vi, Vj) 10 Vi = pHash(Imagei) Vi ∈ Rn distance(Vi, Vj) = n k=0 |(Vi,k − Vj,k)| Vi = pHash(Imagei) Vi ∈ Rn distance(Vi, Vj) = n k=0 |(Vi,k − Vj,k)| hamming distance(Vi, Vj) = n k=1 (Vi,k ̸= Vj,k) (Lei et al. 2011)
  28. 28. Before/AZer Image Filtering Image Processing Pipeline •  Number of images that remain in our dataset auer each image filtering opera>on
  29. 29. Before/AZer Image Filtering (Cont.) Image Processing Pipeline •  Number of images that remain in our dataset auer each image filtering opera>on ~ 2 % ~ 2 % ~ 50 %
  30. 30. Before/AZer Image Filtering (Cont.) Image Processing Pipeline •  Number of images that remain in our dataset auer each image filtering opera>on ~ 2 % ~ 2 % ~ 50 % ~ 58 % ~ 50 % ~ 30 %
  31. 31. Before/AZer Image Filtering (Cont.) Image Processing Pipeline •  Number of images that remain in our dataset auer each image filtering opera>on ~ 2 % ~ 2 % ~ 50 % ~ 58 % ~ 50 % ~ 30 % Assume tagging an image costs $1, We can save ~$17k, almost saving 62% of the budget!!!
  32. 32. Damage Assessment Image Processing Pipeline Tweet Collector Image Collector - Receives tweets from Tweet collector - Image URL extrac>on - Image downloading from the Web Relevancy Filtering De-duplica>on Filtering Tweets Images Web Image downloading Images Images All images + meta-data pass through this channel Only relevant images + meta-data pass through this channel Only relevant and unique images + meta-data pass through this channel Damage assessment
  33. 33. Damage Assessment •  Three-class classifica>on – Categories: severe, mild & liIle-to-none •  Dis>nc>on between categories is ambiguous. Image Processing Pipeline Descrip'on Severe Mild None All Number of Images 1765 483 3751 6000
  34. 34. Damage Assessment •  Fine-tuning a pre-trained CNN (e.g., VGG16) •  5-fold cross valida>on Image Processing Pipeline
  35. 35. Outline Background Data Image Processing Pipeline Summary and Future Works
  36. 36. Summary •  We propose a real >me image processing pipeline, which filters –  duplicate, near duplicate, and –  irrelevant images and categories –  images into severe, mild and liIle-to-no damage •  Relevancy filter model: AUC: 0.98; F1: 0.98 •  Damage assessment model: AUC: 0.72; F1: 0.67 •  We can save 62% of the whole budget Summary and Future Works 62% images are irrelevant and duplicates D. T. Nguyen, F. Alam, F. Ofli, M. Imran, “Automa>c image filtering on social networks using deep learning and perceptual hashing during crises”, ISCRAM 2017.
  37. 37. Future Work •  Image processing pipeline – Please check: hIp://aidr.qcri.org •  Fine-grained object analysis – Severity of injury, quality of aid/shelter, etc. •  Mul>modal analysis – text + image + metadata Summary and Future Works
  38. 38. Thank you! Please follow us on twiIer @aidr_qcri Researchgate: goo.gl/TvqPpw To get the data: hQp://crisisnlp.qcri.org/ D. T. Nguyen, F. Alam, F. Ofli, M. Imran, “Automa>c image filtering on social networks using deep learning and perceptual hashing during crises”, ISCRAM 2017. Please cite us:

×