Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification

  1. Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification Katerina Andreadou1, Symeon Papadopoulos1, Lazaros Apostolidis1, Anastasia Krithara2 and Yiannis Kompatsiaris1, 1Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI) 2National Centre for Scientific Research ‘Demokritos’ (NCSR ’D’) PAISI 2015, May 19, 2015, Ho Chi Minh City, Vietnam
  2. Can multimedia on the Web be trusted? Real photo captured in April 2011 by WSJ but heavily tweeted during Hurricane Sandy (29 Oct 2012). Tweeted by multiple sources and retweeted multiple times. Original online at: journal-clouds-gathered-but-no-tornado-damage/
  3. The Problem • Everyone can easily publish content on the Web • Content can be easily repurposed and manipulated • News outlets are competing for views and clicks → Pressure to air stories very quickly leaves very little room for verification → Very often, even well-reputed news providers fall for fake news content • Multiple tools and services are available for individual tasks → complex verification process • It is very hard and time-consuming to check the veracity of Web multimedia
  4. Media REVEALr • Developed within the REVEAL project: • Framework for collecting, indexing and browsing multimedia content from the Web and social media • Support for verification: – Near-duplicate detection against an indexed collection – Clustering of social media posts by visual similarity → comparative view of the same incident – Aggregation and visualization of Named Entities around an incident
  5. Related Work • The majority of works have focused on the problem of topic detection and summarization: – TwitInfo (Marcus et al., 2011) – Twittermonitor (Mathioudakis & Koudas, 2010) – Meme detection & prediction (Weng et al., 2014) • Visual memes and clustering – Visual meme tracking (Xie et al., 2011) – Supervised multimodal clustering (Petkos et al., 2012) • Image manipulation tracking – Internet image archaeology (Kennedy & Chang, 2008)
  6. Overview of Media REVEALr [Architecture diagram; components: Media collection; Media pre-processing & feature extraction; Media analysis, mining & indexing; Persistence; Access (API); Visualization, front-end. Processing paths labelled TEXT and VISUAL.]
  7. Named Entity Detection • The brevity and noisy nature of text in social media pose a serious challenge • Employed solution: – Pre-processing: tokenization, user mention resolution, text cleaning – Stanford NER + user mention resolution – Regular expressions to remove special characters and symbols (e.g., #, @, URLs)
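The text-cleaning step above can be illustrated with a minimal sketch. The slides do not give the actual regular expressions, so the three patterns and the function name `clean_tweet` below are assumptions chosen for illustration; "user mention resolution" is approximated here by simply keeping the handle without its "@" marker.

```python
import re

# Hypothetical patterns: the slides only say that URLs, '#', '@' and
# other special symbols are removed before running NER.
URL_RE = re.compile(r"https?://\S+")
MARKER_RE = re.compile(r"[@#](\w+)")        # '@user' or '#tag'
SYMBOL_RE = re.compile(r"[^\w\s.,'-]")      # anything that is not word-like


def clean_tweet(text: str) -> str:
    """Return a tweet with URLs, mention/hashtag markers and symbols removed."""
    text = URL_RE.sub(" ", text)            # drop links entirely
    text = MARKER_RE.sub(r"\1", text)       # keep the word, drop '@' / '#'
    text = SYMBOL_RE.sub(" ", text)         # drop remaining special characters
    return " ".join(text.split())           # collapse repeated whitespace


print(clean_tweet("RT @BBCNews: Flooding in #NewYork! http://t.co/abc"))
# prints "RT BBCNews Flooding in NewYork"
```

Keeping the hashtag word rather than deleting it is a design choice of this sketch: hashtags in news tweets often carry the location or person name that the NER step is supposed to find.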
  8. Visual Indexing • Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem • Based on local descriptors (SURF), aggregation (VLAD), dimensionality reduction (PCA), quantization (PQ) and indexing (IVFADC) • State-of-the-art visual similarity search – High precision/recall – Very efficient and scalable implementation (searches many millions of images in a few msec and keeps the full index in memory using ~1GB per 10M images)
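To make the aggregation step concrete, here is a minimal pure-Python sketch of VLAD: each local descriptor is assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated result is L2-normalized. It is an illustration only, not the system's implementation; the toy 2-D descriptors and centroids are made up (real SURF descriptors have 64 or 128 dimensions), and the PCA, PQ and IVFADC stages are not shown.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector."""
    k, d = len(centroids), len(centroids[0])
    acc = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Index of the nearest centroid (squared Euclidean distance).
        j = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual between descriptor and its centroid.
        for t in range(d):
            acc[j][t] += x[t] - centroids[j][t]
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data: two centroids, three descriptors (assumed values).
toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
print(vlad(toy_descriptors, toy_centroids))
```

The output length is always k × d regardless of how many descriptors an image has, which is what makes the vector suitable for the later PCA and quantization stages.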
  9. Improving NDS Resilience (NDS+) • Often, NDS performance suffers from overlay graphics and fonts • To address this issue, we integrate a descriptor-level classifier that tries to remove the font/graphic descriptors from the VLAD vector
  10. Example: Filtering Out Font Descriptors • Assuming that in most cases the classifier is correct, the resulting VLAD vector is of much higher quality compared to the one without filtering
  11. Classifier Details • Random Forest used as base classifier • Cost-sensitive meta-classifier to penalize misclassification of True Positives • Challenge due to class imbalance (overlay descriptors << useful image content descriptors) – Cost-sensitive meta-classifier performs over-sampling of the minority class to balance the training set • Training set created by collecting images with overlays (e.g., memes) from the Web and manually annotating them (selecting areas with fonts/overlays)
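The class-balancing idea above can be sketched as plain random over-sampling. The slides do not describe how the meta-classifier resamples internally, so this is a generic stand-in rather than the system's method; the function name, the labels "content"/"overlay" and the toy feature values are all assumptions.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)               # fixed seed for reproducibility
    by_label = {}
    for sample, label in zip(samples, labels):
        by_label.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_label.values())
    out_samples, out_labels = [], []
    for label, group in by_label.items():
        # Draw extra samples (with replacement) from the smaller classes only.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy descriptors: four "content" descriptors, one "overlay" descriptor.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["content", "content", "content", "content", "overlay"]
X_bal, y_bal = oversample_minority(X, y)
print(y_bal.count("content"), y_bal.count("overlay"))  # prints "4 4"
```

The balanced set would then be passed to the base classifier; over-sampling should be applied to the training split only, so that duplicated samples do not leak into evaluation data.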
  12. Mining: Clustering and Aggregation • Visual aggregation – DBSCAN on the visual feature representation (PCA-reduced VLAD vectors) – Element (tweet) selected based on the largest number of keywords (expected to result in more information) • Entity aggregation – NER on individual items – Entity categorization (Persons, Locations, Organizations) – Entity ranking based on frequency of occurrence
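The entity aggregation bullets above (categorize, then rank by frequency) can be sketched with the standard library. The input format, a list of (name, type) pairs per item as a NER step might produce, and the type labels are assumptions made for this example.

```python
from collections import Counter, defaultdict


def rank_entities(items):
    """Group entities by type and rank each group by frequency of occurrence."""
    counts = defaultdict(Counter)
    for entities in items:                  # one list of entities per item (tweet)
        for name, entity_type in entities:
            counts[entity_type][name] += 1
    # most_common() sorts by descending count within each category.
    return {etype: counter.most_common() for etype, counter in counts.items()}


# Hypothetical NER output for three tweets.
tweets = [
    [("Boston", "LOCATION"), ("FBI", "ORGANIZATION")],
    [("Boston", "LOCATION")],
    [("Watertown", "LOCATION"), ("FBI", "ORGANIZATION")],
]
ranked = rank_entities(tweets)
print(ranked["LOCATION"])       # prints "[('Boston', 2), ('Watertown', 1)]"
```

Counting each mention once per occurrence is the simplest choice; counting once per item instead would reduce the influence of a single tweet that repeats the same name.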
  13. User Interface: Collections View
  14. User Interface: Items View & Search
  15. User Interface: Clusters View
  16. User Interface: Entities View
  17. Evaluation: NER • Manual annotation of 400 tweets from the SNOW Data Challenge dataset (Papadopoulos et al., 2014) • Measure: Accuracy → an instance is considered correct when both entity and type are correctly identified • Three competing solutions: – Base Stanford NER (S-NER) – S-NER + Extensions/Post-processing (S-NER+) – Ellogon library
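The correctness criterion above (both entity and type must match) can be sketched as follows. The slides do not state what the denominator of the accuracy is, so this example assumes it is the number of manually annotated (gold) entities; the function name and the sample data are hypothetical.

```python
def ner_accuracy(gold, predicted):
    """Fraction of gold (entity, type) pairs reproduced exactly by the system."""
    if not gold:
        return 0.0
    predicted_set = set(predicted)
    # A gold entity counts as correct only if text AND type both match.
    correct = sum(1 for pair in gold if pair in predicted_set)
    return correct / len(gold)


gold = [("Boston", "LOCATION"), ("Obama", "PERSON"), ("FBI", "ORGANIZATION")]
predicted = [("Boston", "LOCATION"), ("Obama", "ORGANIZATION"), ("FBI", "ORGANIZATION")]
print(ner_accuracy(gold, predicted))
```

In this toy case "Obama" is found but typed as an organization, so it is counted as wrong and two of the three gold entities are correct.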
  18. Evaluation: NDS • Benchmark Datasets – Holidays: 1,491 images, 500 queries (Jegou et al., 2008) – Oxford: 5,063 images, 55 queries (Philbin et al., 2008) – Paris: 6,412 images, 55 queries (Philbin et al., 2008) • Accuracy: mean Average Precision (mAP) [Result panels: CLEAN DATASET, NOISY DATASET]
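For reference, mean Average Precision can be computed as in the sketch below: for each query, precision is taken at every rank where a relevant image appears and averaged over the number of relevant images, then the per-query values are averaged. This is the textbook definition only; the benchmark-specific protocols (for example, how the query image itself or ambiguous images are handled) are not modeled, and the image identifiers are made up.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against its relevant set."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank    # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average_precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["a", "x", "b"], {"a", "b"}),   # AP = (1/1 + 2/3) / 2
    (["x", "c"], {"c"}),             # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # prints "0.6667"
```

Relevant images that never appear in the ranked list still count in the denominator, so a system is penalized for missing them.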
  19. Evaluation: NDS • Execution Time (msec) • Example [Images: INDEXED IMAGE, QUERY IMAGE] NDS: #27, NDS+: #1
  20. Use Cases: Real-world Datasets [Datasets shown: sandy, boston, malaysia, ferry]
  21. NDS Use Case (boston)
  22. Clustering Use Case (boston) • Visual clustering enables comparative view and analysis over time (in this case showing increasing confidence in the picture). • When journalists see many similar photos of the same scene, they have more confidence that it is real and not fabricated.
  23. Entity Aggregation Use Case (snow) [Panels: LOCATIONS, PERSONS, ORGANIZATIONS]
  24. Conclusion • Key contributions – Framework and web application offering valuable verification support for Web multimedia – High-quality individual components for NER, NDS, clustering and aggregation • Future Work – Incremental image clustering – Temporal views to explore the evolution of a story – Multimedia forensics toolbox (splice, copy-move detection)
  25. Future Work: Web Multimedia Forensics • Possibility to offer image manipulation detection as a service for arbitrary Web images – Challenges: social media platforms apply additional transformations (scaling, JPEG recompression, etc.), making the problem much more complex
  26. References (1/2) • A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller. Twitinfo: Aggregating and visualizing microblogs for event exploration. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, pages 227-236, New York, NY, USA, 2011. ACM • M. Mathioudakis and N. Koudas. Twittermonitor: Trend detection over the twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 1155-1158, New York, NY, USA, 2010. ACM • G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using multimodal clustering and integrating supervisory signals. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages 23:1-23:8, New York, NY, USA, 2012. ACM • L. Weng, F. Menczer, and Y. Ahn. Predicting successful memes using network and community structure. CoRR, abs/1403.6199, 2014 • L. Xie, A. Natsev, J. R. Kender, M. Hill, and J. R. Smith. Visual memes in social media: Tracking real-world news in youtube videos. In Proceedings of the 19th ACM International Conference on Multimedia, MM '11, pages 53-62, New York, NY, USA, 2011. ACM
  27. References (2/2) • L. Kennedy and S.-F. Chang. Internet image archaeology: Automatically tracing the manipulation history of photographs on the web. In Proceedings of the 16th ACM International Conference on Multimedia, MM '08, pages 349-358, New York, NY, USA, 2008. ACM • H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In Proceedings of the 10th European Conference on Computer Vision: Part I, ECCV '08, pages 304-317, Berlin, Heidelberg, 2008. Springer-Verlag • S. Papadopoulos, D. Corney, and L. M. Aiello. SNOW 2014 Data Challenge: Assessing the performance of news topic detection methods in social media. In Proceedings of the SNOW 2014 Data Challenge Workshop co-located with 23rd International World Wide Web Conference (WWW 2014), Seoul, Korea, April 8, 2014, pages 1-8, 2014 • J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8, June 2008
  28. Thank you! • Resources: Slides: Code: Data: • Get in touch: @sympapadopoulos / @kandreads /