Media REVEALr: A social multimedia monitoring and intelligence system for Web multimedia verification
May 13, 2015
Presentation of Media REVEALr, a framework for mining social and Web multimedia with the goal of supporting verification. Presented at the PAISI workshop, co-located with PAKDD 2015, Ho Chi Minh City, Vietnam.
Media REVEALr: A social multimedia
monitoring and intelligence system for Web
multimedia verification
Katerina Andreadou¹, Symeon Papadopoulos¹, Lazaros Apostolidis¹,
Anastasia Krithara² and Yiannis Kompatsiaris¹
¹ Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
² National Centre for Scientific Research ‘Demokritos’ (NCSR ’D’)
PAISI 2015, May 19, 2015, Ho Chi Minh City, Vietnam
Can multimedia on the Web be trusted?
#2
Real photo, captured in April 2011 by WSJ, but heavily tweeted during Hurricane Sandy (29 Oct 2012)
Tweeted by multiple sources and retweeted multiple times
Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
The Problem
• Everyone can easily publish content on the Web
• Content can be easily repurposed and manipulated
• News outlets are competing for views and clicks
Pressure to air stories quickly leaves very little room for verification. Even well-reputed news providers often fall for fake news content.
• Multiple tools and services are available for individual tasks, which makes for a complex verification process
Checking the veracity of Web multimedia is very hard and time-consuming
#3
Media REVEALr
• Developed within the REVEAL project:
http://revealproject.eu/
• Framework for collecting, indexing and browsing
multimedia content from the Web and social media
• Support for verification:
– Near-duplicate detection against an indexed collection
– Clustering of social media posts by visual similarity, giving a comparative view of the same incident
– Aggregation and visualization of Named Entities around an
incident
#4
Related Work
• Most prior work has focused on the problem of topic detection and summarization:
– TwitInfo (Marcus et al., 2011)
– Twittermonitor (Mathioudakis & Koudas, 2010)
– Meme detection & prediction (Weng et al., 2014)
• Visual memes and clustering
– Visual meme tracking (Xie et al., 2011)
– Supervised multimodal clustering (Petkos et al., 2012)
• Image manipulation tracking
– Internet image archaeology (Kennedy & Chang, 2008)
#5
Overview of Media REVEALr
#6
[Figure: Media REVEALr architecture. Layers shown: media collection; media pre-processing and feature extraction; media analysis, mining and indexing; persistence; access (API); visualization, front-end. Panel labels: TEXT, VISUAL]
Named Entity Detection
• The brevity and noisy nature of text in social media pose a serious challenge
• Employed solution:
– Pre-processing: tokenization, user mention resolution, text
cleaning
– Stanford NER + user mention resolution
– Regular expressions to remove special characters and
symbols (e.g., #, @, URLs, etc.)
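The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the regular expressions, the `clean_tweet` function and the `mention_names` lookup table (handle to display name) are all assumptions made for the example, and the Stanford NER step itself is not shown.

```python
import re

# Hypothetical patterns; the exact expressions used in Media REVEALr are not given in the slides.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
SYMBOL_RE = re.compile(r"[^\w\s.,'-]")  # anything that is not a word char, space or basic punctuation


def clean_tweet(text, mention_names=None):
    """Strip URLs and symbols and resolve @mentions before running NER.

    mention_names maps lower-cased handles to display names (assumed lookup table).
    Unknown handles are kept without the leading '@'.
    """
    mention_names = mention_names or {}
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(
        lambda m: mention_names.get(m.group(1).lower(), m.group(1)), text
    )
    text = text.replace("#", " ")   # keep the hashtag word, drop the symbol
    text = SYMBOL_RE.sub(" ", text)
    return " ".join(text.split())   # whitespace tokenization, then re-join


if __name__ == "__main__":
    names = {"bbcbreaking": "BBC Breaking News"}
    print(clean_tweet("RT @BBCBreaking: Explosion at #Boston marathon! http://t.co/abc", names))
```

Resolving a mention to a display name before tagging gives the NER model an ordinary noun phrase to work with instead of an opaque handle, which is presumably the motivation for that step.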
#7
Visual Indexing
• Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem
• Based on local descriptors (SURF), aggregation
(VLAD), dimensionality reduction (PCA), quantization
(PQ) and indexing (IVFADC)
• State-of-the-art visual similarity search
– High precision/recall
– Very efficient and scalable implementation (searches many
millions of images in a few milliseconds and keeps the full index in
memory at roughly 1 GB per 10M images)
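To make the aggregation step concrete, here is a small pure-Python sketch of VLAD under simplifying assumptions: the local descriptors and the codebook centroids are given as plain lists of floats, and the later PCA, PQ and IVFADC stages are left out. The function name `vlad` and the toy data are illustrative only.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalized VLAD vector.

    Each descriptor is assigned to its nearest centroid, and the residual
    (descriptor minus centroid) is accumulated in that centroid's block.
    """
    k, d = len(centroids), len(centroids[0])
    blocks = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        for j in range(d):
            blocks[nearest][j] += x[j] - centroids[nearest][j]
    flat = [v for block in blocks for v in block]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    toy_centroids = [[0.0, 0.0], [10.0, 10.0]]
    toy_descriptors = [[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]]
    print(vlad(toy_descriptors, toy_centroids))
```

The resulting vector has length k × d regardless of how many descriptors the image produced, which is what makes fixed-size indexing possible. As a rough check on the memory figure, 1 GB for 10M images is about 100 bytes per image, which suggests the index stores compact quantized codes, not raw float vectors.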
#8
Improving NDS Resilience (NDS+)
• Often, NDS performance suffers from overlay
graphics and fonts
• To address this issue, we integrate a descriptor-level
classifier that tries to remove the font/graphic
descriptors from the VLAD vector
#9
Example: Filtering Out Font Descriptors
• Assuming that in most cases the classifier is correct,
the resulting VLAD vector is of much higher quality
compared to the one without filtering
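Where the filtering sits in the pipeline can be shown with a short sketch. The predicate `toy_is_overlay` is a hypothetical stand-in for the trained descriptor-level classifier; its thresholding rule is invented purely so that the example runs and says nothing about how real overlay descriptors look.

```python
def filter_overlay_descriptors(descriptors, is_overlay):
    """Drop every local descriptor that the classifier flags as font/graphic overlay.

    Only the surviving descriptors are passed on to VLAD aggregation.
    """
    return [d for d in descriptors if not is_overlay(d)]


def toy_is_overlay(descriptor, threshold=0.8):
    # Hypothetical rule standing in for the real classifier:
    # flag descriptors whose mean component value exceeds the threshold.
    return sum(descriptor) / len(descriptor) > threshold


if __name__ == "__main__":
    descriptors = [[0.1, 0.2], [0.9, 0.95], [0.3, 0.4]]
    print(filter_overlay_descriptors(descriptors, toy_is_overlay))
```

Because filtering happens before aggregation, a wrongly removed descriptor only costs a little image content, while a wrongly kept overlay descriptor pollutes the whole VLAD vector.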
#10
Classifier Details
• Random Forest used as base classifier
• Cost-sensitive meta-classifier to penalize misclassification of
actual positives (false negatives)
• Challenge due to Class Imbalance (overlay
descriptors << useful image content descriptors)
– Cost Sensitive meta-classifier performs over-sampling of
minority class to balance the training set
• Training set created by collecting images with
overlays (e.g., memes) from the Web and manually
annotating them (selecting areas with fonts/overlays)
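The class-balancing idea can be illustrated with plain random over-sampling. This is a generic sketch, not the meta-classifier's actual resampling procedure; the function name, the fixed seed and the class labels "content" and "overlay" are assumptions made for the example.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Balance a training set by re-drawing minority samples with replacement.

    Every class is padded up to the size of the largest class.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())

    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    xs = list(range(10))
    ys = ["content"] * 8 + ["overlay"] * 2
    bx, by = oversample_minority(xs, ys)
    print(by.count("content"), by.count("overlay"))
```

Over-sampling leaves the majority class untouched, so no useful image-content descriptors are thrown away; the cost is that duplicated minority samples can encourage over-fitting.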
#11
Mining: Clustering and Aggregation
• Visual aggregation
– DBSCAN on the visual feature representation (PCA-reduced VLAD vectors)
– Representative element (tweet) of each cluster selected as the one with the
largest number of keywords (expected to carry more information)
• Entity aggregation
– NER on individual items
– Entity categorization (Persons, Locations, Organizations)
– Entity ranking based on frequency of occurrence
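Both aggregation steps can be sketched in a few lines. The DBSCAN below is a textbook, brute-force version on small Euclidean vectors, and `rank_entities` simply counts (entity, type) pairs; the parameter values, function names and toy data are assumptions, and the system's real implementation and settings are not described in the slides.

```python
import math
from collections import Counter


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbours(i):
        # Brute-force range query; includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


def rank_entities(items):
    """Rank (entity, type) pairs by how often they occur across all items."""
    return Counter(pair for item in items for pair in item).most_common()


if __name__ == "__main__":
    vectors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(vectors, eps=1.5, min_pts=3))
    tagged = [[("Boston", "LOCATION"), ("FBI", "ORGANIZATION")], [("Boston", "LOCATION")]]
    print(rank_entities(tagged))
```

DBSCAN suits this setting because the number of distinct images around an incident is not known in advance, and isolated images are labelled as noise instead of being forced into a cluster.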
#12
Evaluation: NER
• Manual annotation of 400 tweets from the SNOW
Data Challenge dataset (Papadopoulos et al., 2014)
• Measure: accuracy; an instance is considered correct
when both the entity and its type are correctly identified
• Three competing solutions:
– Base Stanford NER (S-NER)
– S-NER + Extensions/Post-processing (S-NER+)
– Ellogon library (http://www.ellogon.org)
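One way to read the accuracy measure is sketched below. It assumes that accuracy is the fraction of manually annotated entity instances for which the system output contains an exact (entity text, type) match; the slides do not spell out the denominator, so this definition, the function name and the toy annotations are assumptions.

```python
def ner_accuracy(gold, predicted):
    """Fraction of gold (entity, type) instances reproduced exactly by the system.

    gold and predicted are parallel lists with one set of (entity, type) pairs per tweet.
    An instance counts as correct only if both the entity text and its type match.
    """
    total = sum(len(g) for g in gold)
    correct = sum(len(set(g) & set(p)) for g, p in zip(gold, predicted))
    return correct / total if total else 0.0


if __name__ == "__main__":
    gold = [{("Boston", "LOCATION"), ("FBI", "ORGANIZATION")}, {("Obama", "PERSON")}]
    pred = [{("Boston", "LOCATION"), ("FBI", "PERSON")}, {("Obama", "PERSON")}]
    print(ner_accuracy(gold, pred))
```

In the toy data the system finds "FBI" but assigns the wrong type, so that instance is counted as incorrect even though the span is right.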
#17
Evaluation: NDS
• Benchmark Datasets
– Holidays: 1,491 images, 500 queries (Jegou et al., 2008)
– Oxford: 5,063 images, 55 queries (Philbin et al., 2008)
– Paris: 6,412 images, 55 queries (Philbin et al., 2008)
• Accuracy: mean Average Precision (mAP)
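For reference, mean Average Precision can be computed as in the sketch below. It uses the common non-interpolated definition of AP; the official evaluation scripts of the benchmarks may differ in details (for example in how they treat ambiguous images), and the function names and toy rankings are illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated AP: mean of precision@rank taken at each relevant hit."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "x", "b"], {"a", "b"}),  # relevant hits at ranks 1 and 3
        (["x", "c"], {"c"}),            # relevant hit at rank 2
    ]
    print(mean_average_precision(runs))
```

For the first toy query AP is (1/1 + 2/3) / 2, for the second it is 1/2, and mAP is their mean. Relevant images that never appear in the ranking still count in the denominator, so missed near-duplicates lower the score.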
#18
[Figure panels: CLEAN DATASET, NOISY DATASET]
Clustering Use Case (Boston)
• Visual clustering enables a comparative view and analysis over
time (in this case showing increasing confidence in the picture).
• When journalists see many similar photos of the same scene,
they have more confidence that it is real and not fabricated.
#22
Conclusion
• Key contributions
– Framework and web application offering valuable
verification support for Web multimedia
– High-quality individual components for NER, NDS,
clustering and aggregation
• Future Work
– Incremental image clustering
– Temporal views to explore evolution of a story
– Multimedia forensics toolbox (splice, copy-move
detection)
#24
Future Work: Web Multimedia Forensics
• Possibility to offer image manipulation detection as a
service for arbitrary Web images
– Challenges: social media platforms apply additional
transformations (scaling, JPEG recompression, etc.), which makes
the problem much more complex
#25
References (1/2)
• A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller.
Twitinfo: Aggregating and visualizing microblogs for event exploration. In
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
CHI '11, pages 227-236, New York, NY, USA, 2011. ACM
• M. Mathioudakis and N. Koudas. Twittermonitor: Trend detection over the twitter
stream. In Proceedings of the 2010 ACM SIGMOD International Conference on
Management of Data, SIGMOD '10, pages 1155-1158, New York, NY, USA, 2010.
ACM
• G. Petkos, S. Papadopoulos, and Y. Kompatsiaris. Social event detection using
multimodal clustering and integrating supervisory signals. In Proceedings of the
2nd ACM International Conference on Multimedia Retrieval, ICMR '12, pages
23:1-23:8, New York, NY, USA, 2012. ACM
• L. Weng, F. Menczer, and Y. Ahn. Predicting successful memes using network and
community structure. CoRR, abs/1403.6199, 2014
• L. Xie, A. Natsev, J. R. Kender, M. Hill, and J. R. Smith. Visual memes in social
media: Tracking real-world news in YouTube videos. In Proceedings of the 19th
ACM International Conference on Multimedia, MM '11, pages 53-62, New York,
NY, USA, 2011. ACM
#26
References (2/2)
• L. Kennedy and S.-F. Chang. Internet image archaeology: Automatically
tracing the manipulation history of photographs on the web. In
Proceedings of the 16th ACM International Conference on Multimedia,
MM '08, pages 349-358, New York, NY, USA, 2008. ACM
• H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak
geometric consistency for large scale image search. In Proceedings of the
10th European Conference on Computer Vision: Part I, ECCV '08, pages
304-317, Berlin, Heidelberg, 2008. Springer-Verlag
• S. Papadopoulos, D. Corney, and L. M. Aiello. SNOW 2014 Data Challenge:
Assessing the performance of news topic detection methods in social
media. In Proceedings of the SNOW 2014 Data Challenge Workshop
co-located with the 23rd International World Wide Web Conference (WWW
2014), Seoul, Korea, April 8, 2014, pages 1-8, 2014.
• J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in
quantization: Improving particular object retrieval in large scale image
databases. In IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2008), pages 1-8, June 2008.
#27