Learning to detect Misleading Content on Twitter
Christina Boididou, Symeon Papadopoulos, Lazaros Apostolidis, Yiannis Kompatsiaris
Information Technologies Institute, CERTH, Thessaloniki, Greece
ACM International Conference on Multimedia Retrieval
June 6-9, Bucharest, Romania
REAL OR FAKE: THE VERIFICATION PROBLEM
FAKE PHOTO
Photoshopped!
REAL OR FAKE: THE VERIFICATION PROBLEM
REAL PHOTO
Captured in Dublin’s Olympia Theatre
BUT
Mislabeled on social media as showing the crowd at the Bataclan theatre just before gunmen began firing.
TYPES OF FAKE
• Reposting of real multimedia content
• Reposting of synthetic
• Digital tampering
• Speculations
A fake is any post (tweet) that shares multimedia content that does not faithfully represent the event that it refers to.
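FRAMEWORK OVERVIEW
[Figure: framework diagram with a training path and a testing path.]
Training: tweet-based features and user-based features are extracted from the Verification Corpus; each feature group trains its own set of classifiers (CL11, CL12, ..., CL1n and CL21, CL22, ..., CL2n), giving two predictive models.
Testing: the same two feature groups are extracted from an incoming tweet; each predictive model outputs a prediction by majority vote over its classifiers.
Fusion: the two predictions are fused into a single label, which is passed to the visualization.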
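The majority-vote step in the framework overview can be illustrated with a minimal sketch. The threshold rules below are hypothetical stand-ins for trained classifiers, and the feature values and thresholds are made up; only the voting logic is the point.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A classifier maps a feature vector to a label ("fake" or "real").
Classifier = Callable[[Sequence[float]], str]


def make_threshold_rule(index: int, threshold: float) -> Classifier:
    """Hypothetical stand-in for a trained classifier: flags a sample as
    fake when one feature exceeds a threshold."""
    def rule(features: Sequence[float]) -> str:
        return "fake" if features[index] > threshold else "real"
    return rule


def majority_vote(bag: List[Classifier], features: Sequence[float]) -> str:
    """Return the label predicted by most classifiers in the bag."""
    votes = Counter(clf(features) for clf in bag)
    return votes.most_common(1)[0][0]


# One bag per feature group (CL11..CL1n and CL21..CL2n in the diagram).
tweet_bag = [make_threshold_rule(0, t) for t in (5.0, 10.0, 15.0)]
user_bag = [make_threshold_rule(0, t) for t in (0.2, 0.5, 0.8)]

tweet_prediction = majority_vote(tweet_bag, [13.0])  # made-up tweet feature
user_prediction = majority_vote(user_bag, [0.1])     # made-up user feature
```

Using an odd number of classifiers per bag avoids ties. In this toy run the two models disagree, which is the case the fusion step has to resolve.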
FEATURE EXTRACTION
TWEET-BASED
Features related to tweets
• Text-based
• Language-specific
• Twitter-specific
• Link-based
USER-BASED
Features related to users
• User-specific
• Link-based
Example tweet-based feature values:
• num of uppercase characters: 13
• num of words: 24
• num of slang words: 1
• contains first order pronoun
• num of retweets: 3
• num of favorites: 13
• num of mentions: 2
• text readability: 73
Example user-based feature:
• Verified?
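A few of the text-based features named on these slides can be computed directly from the tweet text. The sketch below is an illustration only: the function name, the pronoun list, and the tokenization are assumptions, not the authors' implementation.

```python
import re
from typing import Dict, Union

# Assumed English-only pronoun list, for illustration.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def extract_tweet_features(text: str) -> Dict[str, Union[int, bool]]:
    """Compute a handful of simple text-based features for one tweet."""
    words = text.split()
    # Lowercased tokens with surrounding punctuation stripped.
    tokens = {w.strip(".,!?:;\"'()").lower() for w in words}
    return {
        "num_uppercase_chars": sum(1 for ch in text if ch.isupper()),
        "num_words": len(words),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "contains_first_person_pronoun": bool(tokens & FIRST_PERSON),
    }


features = extract_tweet_features("WOW I saw this shark on the highway @cnn @bbc")
```

Features such as retweet and favorite counts or the user's verified status come from tweet and account metadata rather than from the text, so they are left out here.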
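AGREEMENT-BASED RETRAINING
[Figure: flow diagram with a training path and a testing path.]
Training: tweet-based and user-based features from the Verification Corpus train two predictive models.
Testing: both models predict a label for every sample in the testing set.
• Predictions agreed? yes: the sample joins the agreed samples and keeps the agreed prediction.
• Predictions agreed? no: the sample joins the disagreed samples.
The agreed samples are used to train a new predictive model, which produces the predictions for the disagreed samples.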
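The agreement-based retraining flow can be sketched as follows. This is a simplified illustration under stated assumptions: a toy nearest-centroid learner stands in for the real predictive model, each sample has a single feature vector, and the new model is trained on the agreed samples only. The slides do not specify these details.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(samples: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Toy learner (assumption): one mean feature vector per label."""
    grouped: Dict[str, List[Vector]] = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: [sum(col) / len(xs) for col in zip(*xs)]
        for y, xs in grouped.items()
    }


def predict_centroid(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Return the label whose centroid is closest to x."""
    def sq_dist(c: List[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def agreement_based_retraining(
    tweet_preds: List[str], user_preds: List[str], features: List[Vector]
) -> List[str]:
    """Keep predictions the two models agree on; re-predict the rest with a
    model trained on the agreed samples and their predicted labels."""
    n = len(features)
    agreed = [i for i in range(n) if tweet_preds[i] == user_preds[i]]
    disagreed = [i for i in range(n) if tweet_preds[i] != user_preds[i]]
    final = list(tweet_preds)  # agreed samples keep the shared prediction
    # Assumed fallback: with no agreed samples, the tweet-based predictions stay.
    if agreed and disagreed:
        model = train_centroids(
            [features[i] for i in agreed], [final[i] for i in agreed]
        )
        for i in disagreed:
            final[i] = predict_centroid(model, features[i])
    return final


test_features = [[0.0], [1.0], [9.0], [10.0], [8.5]]
tweet_preds = ["real", "real", "fake", "fake", "real"]
user_preds = ["real", "real", "fake", "fake", "fake"]
final_labels = agreement_based_retraining(tweet_preds, user_preds, test_features)
```

The retrained model is fitted on samples from the event being tested, so it can adapt to that event even though no ground-truth labels for it are available.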
VERIFICATION CORPUS
COLLECTION
Set of tweets T collected with a set of keywords K
Tweets contain multimedia content (Image or Video)
GROUND TRUTH
Reputable online resources which debunk images/videos
Publicly available corpus here: https://github.com/MKLab-ITI/image-verification-corpus
193 real images & videos
6,225 real tweets
220 fake images & videos
9,596 fake tweets
17 events
EXPERIMENTAL STUDY
AIM
Evaluate the fake detection accuracy on samples from new events
Accuracy: $a = \frac{N_c}{N}$
EXPERIMENTS
Kind of event-based cross-validation
For each event Ei -> training: 16 remaining events, testing: Ei
Additional split proposed on MediaEval task [1]
Random Forest of 100 trees
[1] Christina Boididou, Katerina Andreadou, Symeon Papadopoulos, Duc-Tien Dang-Nguyen, Giulia Boato, Michael Riegler, and Yiannis Kompatsiaris. 2015. Verifying Multimedia Use at MediaEval 2015. In MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.
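The event-based cross-validation and the accuracy measure can be sketched in a few lines. The tuple layout `(event, features, label)` and the function names are assumptions made for illustration. Training a classifier inside the loop is left out; a 100-tree Random Forest (for example scikit-learn's `RandomForestClassifier(n_estimators=100)`) would be one option, not necessarily the authors' setup.

```python
from typing import Iterator, List, Sequence, Tuple

Sample = Tuple[str, Sequence[float], str]  # (event, features, label)


def leave_one_event_out(
    samples: List[Sample],
) -> Iterator[Tuple[str, List[Sample], List[Sample]]]:
    """Yield (held_out_event, train, test): each event is tested once,
    with all remaining events used for training."""
    events = sorted({event for event, _, _ in samples})
    for held_out in events:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Correctly classified samples divided by all test samples."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


samples = [
    ("e1", [1.0], "fake"), ("e1", [2.0], "real"),
    ("e2", [3.0], "fake"), ("e2", [4.0], "fake"),
    ("e3", [5.0], "real"),
]
splits = list(leave_one_event_out(samples))
```

Because every tweet of the held-out event is excluded from training, the measured accuracy reflects performance on a new event rather than on unseen tweets from an already known one.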
EXPERIMENTAL STUDY
[Figure: chart, "Effect of bagging across the models and the feature groups". Vertical axis: 0 to 100. Groups: Baseline Features, Total Features. Series: Tweet-based model, Tweet-based model (bagging), User-based model, User-based model (bagging).]
Baseline Features: proposed in our previous work
Total Features: Baseline Features + newly proposed ones
EXPERIMENTAL STUDY
[Figure: chart, "Agreement levels and agreed accuracy across the trials". Vertical axis: 0 to 100. Horizontal axis: trials T1 to T18 and Average. Series: Agreement percentage, Agreed accuracy.]
EXPERIMENTAL STUDY
[Figure: two charts, "Agreement levels and agreed, disagreed, overall accuracy across the trials". Vertical axes: 50 to 100. One panel shows trials T1 to T18, the other the average values. Series: Agreed accuracy, Disagreed accuracy, Overall accuracy.]
EXPERIMENTAL STUDY
[Figure: two charts. "Accuracy for most frequent languages": vertical axis 0 to 100; categories English, Spanish, No language, Dutch, French. "Samples distribution per language": the same five categories.]
COMPARISON WITH OTHER METHODS
METHOD            F1-SCORE
MEDIAEVAL 2015
  UoS-ITI         0.830
  MCG-ICT         0.942
  CERTH-UNITN     0.911
MEDIAEVAL 2016
  Linkmedia       0.8246
  MMLAB@DISI      0.8283
  MCG-ICT         0.6761
  VMU             0.9116
Proposed          0.934
[Figure: chart of the F1-scores above. Axis: 0 to 1.]
MCG-ICT (2015) method:
• Approach tailored to the given MediaEval dataset
• Preprocessing step that first groups tweets by their multimedia content
• Difficult to apply in realistic setting
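The comparison reports F1-scores without spelling out the metric. As a reminder, F1 is the harmonic mean of precision and recall. The sketch below computes it for one positive class; treating "fake" as the positive class is an assumption here, and the labels are made up.

```python
from typing import Sequence


def f1_score(predicted: Sequence[str], actual: Sequence[str], positive: str = "fake") -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


example_f1 = f1_score(
    ["fake", "fake", "real", "real"],
    ["fake", "real", "fake", "real"],
)
```

Unlike plain accuracy, F1 ignores true negatives, so it is less flattering when one class dominates the test set.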
TWEET VERIFICATION ASSISTANT
ABOUT
Visualize the verification result
Present the list of extracted features and their values
Compare the values with those from the verification corpus
HOW TO USE
Provide URL or tweet ID
Inspect the features and the verification result (fake/real)
Find the Tweet Verification Assistant here: http://reveal-mklab.iti.gr/reveal/fake/
TWEET VERIFICATION ASSISTANT: EXAMPLE
CHALLENGES AND FUTURE WORK
CHALLENGES
Making the tool usable and easy to understand by non-computer scientists
• Interpretation of Machine Learning outputs is challenging
• Difficult to create an application that journalists could rely on and trust
FUTURE WORK
Test the usefulness of the Verification Assistant when used by journalists/news editors
Extend the framework to other social media
Leverage method output for other verification problems [1]
[1] Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos and Yiannis Kompatsiaris. Web Video Verification using Contextual Cues
Thank you!
Get in touch:
• Christina Boididou: christina.mpoid@gmail.com / @CMpoi
• Symeon Papadopoulos: papadop@iti.gr / @sympap
• Lazaros Apostolidis: laaposto@iti.gr
• Verification Corpus: https://github.com/MKLab-ITI/image-verification-corpus
• Tweet Verification Assistant: http://reveal-mklab.iti.gr/reveal/fake/
Learning to detect Misleading Content on Twitter

  • 1. Learning to detect Misleading Content on Twitter Christina Boididou, Symeon Papadopoulos, Lazaros Apostolidis, Yiannis Kompatsiaris Information Technologies Institute, CERTH, Thessaloniki, Greece ACM International Conference on Multimedia Retrieval June 6-9, Bucharest, Romania
  • 2. REAL OR FAKE: THE VERIFICATION PROBLEM FAKE PHOTO Photoshopped!
  • 3. REAL OR FAKE: THE VERIFICATION PROBLEM REAL PHOTO Captured in Dublin’s Olympia Theatre BUT Mislabeled on social media as showing the crowd at the Bataclan theatre just before gunmen began firing.
  • 4. TYPES OF FAKE Reposting of real multimedia content Reposting of synthetic Digital tampering Speculations fake is any post (tweet) that shares multimedia content that does not faithfully represent the event that it refers to
  • 5. Verification Corpus CL11 CL12 CL1n CL2nCL22CL21 .. .. Tweet FRAMEWORK OVERVIEW Visualization Tweet-based features User-based features Tweet-based features User-based features Predictive model Predictive model Prediction Prediction Label majority vote majority vote Training Testing Fusion
  • 6. FEATURE EXTRACTION TWEET-BASED Features related to tweets • Text-based • Language-specific • Twitter-specific • Link-based USER-BASED Features related to users • User-specific • Link-based num of uppercase characters: 13 num of words: 24 num of slang words: 1 Contains first order pronoun num of retweets: 3 Num of favorites: 13 num of mentions: 2 text readability: 73
  • 7. FEATURE EXTRACTION TWEET-BASED Features related to tweets • Text-based • Language-specific • Twitter-specific • Link-based USER-BASED Features related to users • User-specific • Link-based Verified?
  • 9. VERIFICATION CORPUS COLLECTION Set of tweets T collected with a set of keywords K Tweets contain multimedia content (Image or Video) GROUND TRUTH Reputable online resources which debunk images/videos Publicly available corpus here: https://github.com/MKLab-ITI/image-verification- corpus 193real Images & Videos 6,225real Tweets 220fake Images & Videos 9,596fake Tweets 17events
  • 10. EXPERIMENTAL STUDY AIM Evaluate the fake detection accuracy on samples from new events Accuracy: 𝑎 = 𝑁 𝑐 𝑁 EXPERIMENTS Kind of event-based cross-validation For each event Ei -> training: 16 remaining events, testing: Ei Additional split proposed on MediaEval task [1] Random Forest of 100 trees [1] Christina Boididou, Katerina Andreadou, Symeon Papadopoulos, Duc-Tien Dang-Nguyen, Giulia Boato, Michael Riegler, and Yiannis Kompatsiaris. 2015. Verifying Multimedia Use at MediaEval 2015. In MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany.
  • 11. EXPERIMENTAL STUDY 0 10 20 30 40 50 60 70 80 90 100 Baseline Features Total Features Effect of bagging across the models and the feature groups Tweet-based model Tweet-based model (bagging) User-based model User-based model (bagging) Baseline Features Proposed in our previous work Total Features Baseline Features + Newly proposed ones
  • 12. EXPERIMENTAL STUDY [Bar chart: Agreement levels and agreed accuracy across the trials. X axis: trials T1 to T18 and Average. Y axis: 0 to 100. Series: Agreement percentage, Agreed accuracy.]
  • 13. EXPERIMENTAL STUDY [Two bar charts: Agreement levels and agreed, disagreed, overall accuracy across the trials (X axis: T1 to T18), and Average values. Y axis: 50 to 100. Series: Agreed accuracy, Disagreed accuracy, Overall accuracy.]
  • 14. EXPERIMENTAL STUDY [Two charts: Accuracy for most frequent languages (Y axis: 0 to 100) and Samples distribution per language. Languages: English, Spanish, No language, Dutch, French.]
  • 15. COMPARISON WITH OTHER METHODS F1-score per method. MediaEval 2015: UoS-ITI 0.830; MCG-ICT 0.942; CERTH-UNITN 0.911. MediaEval 2016: Linkmedia 0.8246; MMLAB@DISI 0.8283; MCG-ICT 0.6761; VMU 0.9116. Proposed: 0.934. [Bar chart of the same F1-scores, axis 0 to 1.] MCG-ICT (2015) method: • Approach tailored to the given MediaEval dataset • Preprocessing step that first groups tweets by their multimedia content • Difficult to apply in realistic setting
  • 16. TWEET VERIFICATION ASSISTANT ABOUT Visualize the verification result. Present the list of extracted features and their values. Compare the values with those from the verification corpus. HOW TO USE Provide a URL or tweet ID. Inspect the features and the verification result (fake/real). Find the Tweet Verification Assistant here: http://reveal-mklab.iti.gr/reveal/fake/
  • 18. CHALLENGES AND FUTURE WORK CHALLENGES Making the tool usable and easy to understand by non-computer scientists • Interpretation of Machine Learning outputs is challenging • Difficult to create an application that journalists could rely on and trust FUTURE WORK Test the Verification Assistant usefulness when used by journalists/news editors Extend the framework to other social media Leverage method output for other verification problems [1] [1] Olga Papadopoulou, Markos Zampoglou, Symeon Papadopoulos and Yiannis Kompatsiaris. Web Video Verification using Contextual Cues
  • 19. Thank you! Get in touch: • Christina Boididou: christina.mpoid@gmail.com / @CMpoi • Symeon Papadopoulos: papadop@iti.gr / @sympap • Lazaros Apostolidis: laaposto@iti.gr • Verification Corpus: https://github.com/MKLab-ITI/image-verification-corpus • Tweet Verification Assistant: http://reveal-mklab.iti.gr/reveal/fake/ With the support of:
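
The event-based cross-validation and the accuracy measure from slide 10 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `accuracy` and `leave_one_event_out` and the event identifiers `E1` to `E3` are assumptions made for the example, and the classifier itself (a Random Forest of 100 trees on the slide) is left out so that the sketch needs only the standard library.

```python
from collections import defaultdict


def accuracy(predicted, actual):
    """Accuracy a = N_c / N: correctly classified samples over all samples."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    n_correct = sum(p == y for p, y in zip(predicted, actual))
    return n_correct / len(actual)


def leave_one_event_out(events):
    """Yield (held_out_event, train_indices, test_indices) per distinct event.

    `events` holds one event identifier per tweet, in corpus order.
    Each event is used once as the test set; all other events form
    the training set.
    """
    by_event = defaultdict(list)
    for index, event in enumerate(events):
        by_event[event].append(index)
    for held_out, test_indices in by_event.items():
        train_indices = [
            i
            for event, indices in by_event.items()
            if event != held_out
            for i in indices
        ]
        yield held_out, train_indices, test_indices


# Toy example with three hypothetical events (the corpus on slide 9 has 17).
events = ["E1", "E1", "E2", "E3"]
for held_out, train_idx, test_idx in leave_one_event_out(events):
    print(held_out, train_idx, test_idx)
```

Splitting by event rather than by random tweet keeps every tweet about the held-out event out of training, which is what makes the measured accuracy an estimate for new, unseen events.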

Editor's Notes

  1. In recent years, we have seen a tremendous increase in the use of social media platforms as a means of sharing content. The simplicity of sharing has led to large volumes of news content reaching huge numbers of readers in a short time. Multimedia content in particular can easily go viral, as it is easily consumed and carries entertainment value. Given the speed of the news and the competition among journalists to publish first, the verification of the content is neglected or carried out in a superficial manner. This leads to the online appearance of misleading multimedia content or, for the sake of brevity, fake content. For example, let’s look at this picture: Can you make a guess? Is it real or fake? Even though Sharuman could well attend this meeting, this image was ultimately found to be photoshopped.
  2. Now, let’s have a look at this image. What is your guess now? Here we deal with another type of fake photo. It is a real photo, but it was mislabeled on social media as showing the crowd at the Bataclan theatre just before gunmen started firing.
  3. So, we consider as misleading or fake any Twitter post that shares multimedia content that does not faithfully represent the event that it refers to. This could include reposting of real multimedia content, reposting of synthetic content/artworks, digital tampering/photoshop, or speculations.
  4. In order to deal with the verification problem, we present a robust approach for detecting in real time whether a tweet that shares a multimedia item is fake or real. The proposed framework relies on two independent classification models built on the training data (verification corpus) using different sets of features: tweet-based and user-based features. A bagging technique is used when building the models. We use n subsets of tweets, each including an equal number of samples per class, leading to the creation of n classifiers. The final prediction is the majority vote among the n predictions. At prediction time, an agreement-based retraining technique is employed, which combines the outputs of the two models. The outcome is then visualized to the users, using information from the labelled verification corpus.
  5. The selection of our features was carried out following a thorough study of the way journalists verify content on the web. We have defined two sets of features: the tweet-based features, extracted from the tweet itself, and the user-based features, related to the user who posted it. The link-based features assess the trust of the linked website.
  6. A key novelty in our approach is the agreement-based retraining (ABR) technique (fusion block). We combine the outputs as follows: for each sample, we compare the two predictions and, depending on their agreement, we divide the test set into agreed and disagreed samples. The agreed samples are assigned the agreed label (fake or real), on the assumption that it is correct with high likelihood, and these labels constitute the predictions for the agreed samples. Then, we use a retraining technique. First, we select the more effective of the two independent classifiers based on their performance on the training set with cross-validation. Then we retrain it on the agreed samples together with the initial training samples of the VC and use it to predict labels for the disagreed samples. The goal is to adapt the initial model to the characteristics of the new, unseen event.
  7. Our VC is a publicly available dataset of fake and real tweets. It consists of tweets related to 17 events, comprising in total 220 fake cases (images and videos) and 193 real ones. The tweets were collected using a set of keywords, and the content was debunked using reputable online resources. Only tweets containing one of these multimedia items were included in the dataset, and several manual steps were necessary to arrive at them.
  8. The aim of the conducted experiments was to evaluate the fake detection accuracy on samples from new events. We consider this a very important aspect of a verification framework, as the nature of fake tweets may vary across different events. The employed scheme can be thought of as an event-based cross-validation.
  9. We first assess the contribution of the features to the method’s accuracy. We compare the performance using the baseline and the full set of features. The baseline features are just a subset, namely the features that we used in our previous work. Then, we assess the bagging we applied in our method. We can see that the full set of features and the bagging, in both the tweet-based and the user-based model, led to considerably improved accuracy.
  10. In this graph, we present the agreement level and the accuracy of the classifiers on the agreed set. We note that the higher the agreement level, the higher the achieved accuracy. The last column shows the average percentages across the different trials.
  11. This bar chart shows the agreed accuracy, the disagreed accuracy and finally the overall accuracy across the trials. On the right chart, we can see their average accuracy levels in green, orange and grey respectively. The last columns, in blue, are the performance of each of the models when tested individually on the test set. Comparing these with the overall accuracy, one can see a clear improvement (about 5%).
  12. We also assessed the model on tweets written in different languages, looking at the five most used languages in the corpus. "No language" means that the language was not detected or that there was not much text. Accuracy is stable, independent of the language.
  13. We also compare our model with the methods submitted to the MediaEval 2015 verification task, against their best run. Our proposed method achieves the second best performance, almost equal to that of the best run.
  14. One of the biggest challenges we are facing is making the tool usable and easy to understand by non-computer scientists. Our experience with media experts from Deutsche Welle & AFP (Agence France-Presse) shows that the …
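
The bagging step from note 4 and the agreement-based retraining from note 6 can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, `fit` and `fit_best` stand for any training routine that returns a predict function, the default `n=9` is an arbitrary choice (the notes do not give n), and the selection of the more effective classifier by cross-validation is assumed to have happened before `fit_best` is passed in.

```python
import random
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label (use an odd n to avoid ties)."""
    return Counter(labels).most_common(1)[0][0]


def balanced_subsets(samples, labels, n, rng):
    """Yield n random subsets with an equal number of samples per class."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    size = min(len(xs) for xs in by_class.values())
    for _ in range(n):
        pairs = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, size)]
        yield [x for x, _ in pairs], [y for _, y in pairs]


def bagged_fit(fit, samples, labels, n=9, seed=0):
    """Train n classifiers on balanced subsets; predict by majority vote.

    `fit(xs, ys)` is assumed to return a function mapping a sample to a label.
    """
    rng = random.Random(seed)
    models = [fit(xs, ys) for xs, ys in balanced_subsets(samples, labels, n, rng)]
    return lambda x: majority_vote([model(x) for model in models])


def agreement_based_retraining(fit_best, train_x, train_y, test_x,
                               predict_tweet, predict_user):
    """Fuse two models: keep agreed labels, retrain for the disagreed ones."""
    final = [None] * len(test_x)
    agreed_x, agreed_y, disagreed = [], [], []
    for i, x in enumerate(test_x):
        tweet_label, user_label = predict_tweet(x), predict_user(x)
        if tweet_label == user_label:
            # Agreed label is assumed correct with high likelihood.
            final[i] = tweet_label
            agreed_x.append(x)
            agreed_y.append(tweet_label)
        else:
            disagreed.append(i)
    if disagreed:
        # Retrain on the initial training set plus the agreed test samples,
        # adapting the model to the new event, then label the disagreed ones.
        retrained = fit_best(list(train_x) + agreed_x, list(train_y) + agreed_y)
        for i in disagreed:
            final[i] = retrained(test_x[i])
    return final
```

In this sketch `predict_tweet` and `predict_user` would typically be the outputs of `bagged_fit` on the tweet-based and user-based features respectively; each sample `x` is assumed to carry both feature sets so that either model can read the part it needs.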