SlideShare a Scribd company logo
1 of 11
Download to read offline
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Your Name – Your Company
your@email
www.yourwebsite.com
Stuart E. Middleton
University of Southampton IT Innovation Centre
sem@it-innovation.soton.ac.uk @stuart_e_middle @IT_Innov @RevealEU
www.it-innovation.soton.ac.uk
Extracting Attributed Verification and Debunking Reports from Social Media:
MediaEval-2015 Trust and Credibility Analysis of Image and Video
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Overview
1
 Problem Statement
 Approach
 Results
 Discussion
 Suggestions for Verification Challenge 2016
UoS-ITI Team
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Verification of Images and Videos for Breaking News
2
 Breaking News Timescales
 Minutes not hours - its old news after a couple of hours
 Journalists need to verify copy and get it published before their rivals do
 Journalistic Manual Verification Procedures for User Generated Content (UGC)
 Check content provenance - original post? location? timestamp? similar posts? website? ...
 Check author / source - attributed or author? known (un)reliable? popular? reputation? post history? ...
 Check content credibility - right image metadata? right location? right people? right weather? ...
 Phone the author up - triangulate facts, quiz author to check genuine, get authorization to publish
 Automate the Simpler Verification Steps
 Empowering journalists
 Increases the volume of contextual content that can be considered
 Focus humans on the more complex & subjective cross-checking tasks
 Contact content authors via phone and ask them difficult questions
 Does human behaviour 'look right' in a video?
 Cross-reference buildings / landmarks in image backgrounds to Google StreetView / image databases
 ... see the VerificationHandbook » http://verificationhandbook.com/
Problem Statement
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Attribute evidence to trusted or untrusted sources
3
 Hypothesis
 The 'wisdom of the crowd' is not really wisdom at all when it comes to verifying suspicious content
 It is better to rank evidence according to the most trusted & credible sources like journalists do
 Semi-automated approach
 Manually create a list of trusted sources
 Tweets » NLP » Extract fake & genuine claims & attribution to sources » Evidence
 Evidence » Cross-check all content for image / video » Fake/real decision based on best evidence
 Trustworthiness hierarchy for tweeted claims about images & videos
 Claim = statement that its a fake image / video or its genuine
 Claim authored by trusted source   
 Claim authored by untrusted source   
 Claim attributed to trusted source  
 Claim attributed to untrusted source  
 Unattributed claim 
Approach
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Regex patterns
4
Approach
Named Entity Patterns
@ (NNP|NN)
# (NNP|NN)
(NNP|NN) (NNP|NN)
(NNP|NN)
Attribution Patterns
<NE> *{0,3} <IMAGE> ...
<NE> *{0,2} <RELEASE> *{0,4} <IMAGE> ...
... <IMAGE> *{0,6} <FROM> *{0,1} <NE>
... <FROM> *{0,1} <NE>
... <IMAGE> *{0,1} <NE>
... <RT> <SEP>{0,1} <NE>
Faked Patterns
... *{0,2} <FAKED> ...
... <REAL> ? ...
... <NEGATIVE> *{0,1} <REAL> ...
Genuine Patterns
... <IMAGE> *{0,2} <REAL> ...
... <REAL> *{0,2} <IMAGE> ...
... <IS> *{0,1} <REAL> ...
... <NEGATIVE> *{0,1} <FAKE> ...
e.g.
CNN
BBC News
@bbcnews
e.g.
FBI has released prime suspect photos ...
... pic - BBC News
... image released via CNN
... RT: BBC News
e.g.
... what a fake! ...
... is it real? ...
... thats not real ...
e.g.
... this image is totally genuine ...
... its real ...
Key
<NE> = named entity (e.g. trusted source)
<IMAGE> = image variants(e.g. pic, image, video)
<FROM> = from variants(e.g. via, from, attributed)
<REAL> = real variants (e.g. real, genuine)
<NEGATIVE> = negative variants (e.g. not, isn't)
<RT> = RT variants (e.g. RT, MT)
<SEP> = separator variants (e.g. : - = )
<IS> = is | its | thats
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Fake & Real Tweet Classifier
5
Results
fake classification real classification
P R F1 P R F1
faked & genuine patterns
1.0 0.03 0.06 0.75 0.001 0.003
faked & genuine & attribution patterns
1.0 0.03 0.06 0.43 0.03 0.06
faked & genuine & attribution patterns & cross-check
1.0 0.72 0.83 0.74 0.74 0.74
fake classification real classification
P R F1 P R F1
faked & genuine & attribution patterns & cross-check
1.0 0.04 0.09 0.62 0.23 0.33
Fake & Real Image Classifier
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Fake & Real Tweet Classifier
6
Results
fake classification real classification
P R F1 P R F1
faked & genuine patterns
1.0 0.03 0.06 0.75 0.001 0.003
faked & genuine & attribution patterns
1.0 0.03 0.06 0.43 0.03 0.06
faked & genuine & attribution patterns & cross-check
1.0 0.72 0.83 0.74 0.74 0.74
fake classification real classification
P R F1 P R F1
faked & genuine & attribution patterns & cross-check
1.0 0.04 0.09 0.62 0.23 0.33
Fake & Real Image Classifier
No mistakes classifying
fakes in testset
Low false positives important
for end users like journalists
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Fake & Real Tweet Classifier
7
Results
fake classification real classification
P R F1 P R F1
faked & genuine patterns
1.0 0.03 0.06 0.75 0.001 0.003
faked & genuine & attribution patterns
1.0 0.03 0.06 0.43 0.03 0.06
faked & genuine & attribution patterns & cross-check
1.0 0.72 0.83 0.74 0.74 0.74
fake classification real classification
P R F1 P R F1
faked & genuine & attribution patterns & cross-check
1.0 0.04 0.09 0.62 0.23 0.33
Fake & Real Image Classifier
Performance looks good
when averaged on whole
dataset
Not good for all images though
Better classifying real images
than fake ones
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Application to our journalism use case
8
 Classifying tweets in isolation (fake and real) is of limited value
 High precision (89%+) but low recall (1%)
 Cross-check tweets then ranking by trustworthiness
 No false positives for fake classification using testset
 High precision (94%+) with average recall (43%+) looking across events in devset and testset
 Typically viral images & videos will have 100's of tweets before journalists become aware of them so a
recall of 20% is probably OK in this context
 Image classifiers
 Fake image classifier » High precision (96-100%) but low recall (4-10%)
 Real image classifier » High precision (62-95%) but low recall (19-23%)
 Classification explained in ways journalists understand & therefore trust
 Image X claimed verified by Tweet Y attributing to trusted entity Z
 We can alert journalists to trustworthy reports of verification and/or debunking
 Our approach does not replace manual verification techniques
 Someone still needs to actually verify the content!
Discussion
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium
Focus on image classification not Tweet classification
9
 The long term aim is to classify the images & videos NOT the tweets about them
 Suggestion » Score image classification results as well as tweet classification results
 End users usually wants to know if its real, not if its fake
 Classifying something as fake is usually a means to an end (e.g. to allow filtering)
 Suggestion » Score results for fake classification & real classification
Suggestions for Verification Challenge 2016
Improve the Tweet datasets to avoid bias to a single event
 Suggest using leave one event out cross validation when computing P/R/F1
 Suggest removing tweet repetition
 Some events (e.g. Syrian Boy) contain many duplicate tweets with a different author
 A classifier might only work well on 1 or 2 text styles BUT score highly as they are repeated a lot
 Suggest evenly balancing number of tweets per event type to avoid bias
 Devset - Hurricane Sandy event has about 84% of the tweets
 Testset - Syrian Boy event has about 47% of the tweets
REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium 10
Any questions?
Stuart E. Middleton
University of Southampton IT Innovation Centre
email: sem@it-innovation.soton.ac.uk
web: www.it-innovation.soton.ac.uk
twitter:@stuart_e_middle, @IT_Innov, @RevealEU
Many thanks for your attention!

More Related Content

Viewers also liked

Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...REVEAL - Social Media Verification
 
Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...REVEAL - Social Media Verification
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...SlideShare
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksSlideShare
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShareSlideShare
 

Viewers also liked (7)

News-oriented multimedia search over multiple social networks
News-oriented multimedia search over multiple social networksNews-oriented multimedia search over multiple social networks
News-oriented multimedia search over multiple social networks
 
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
Mediarevealr: A social multimedia monitoring and intelligence system for Web ...
 
Prix Italia 2015 - Verification in Social Newsgathering
Prix Italia 2015 - Verification in Social NewsgatheringPrix Italia 2015 - Verification in Social Newsgathering
Prix Italia 2015 - Verification in Social Newsgathering
 
Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...Geoparsing and Real-time Social Media Analytics - technical and social challe...
Geoparsing and Real-time Social Media Analytics - technical and social challe...
 
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
A Guide to SlideShare Analytics - Excerpts from Hubspot's Step by Step Guide ...
 
How to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & TricksHow to Make Awesome SlideShares: Tips & Tricks
How to Make Awesome SlideShares: Tips & Tricks
 
Getting Started With SlideShare
Getting Started With SlideShareGetting Started With SlideShare
Getting Started With SlideShare
 

Similar to "Extracting Attributed Verification and Debunking Reports from Social Media: MediaEval-2015 Trust and Credibility Analysis of Image and Video"

Edmo research + platforms panel
Edmo research + platforms panel Edmo research + platforms panel
Edmo research + platforms panel Weverify
 
Master Minds on Data Science - Joost de Wit
Master Minds on Data Science - Joost de WitMaster Minds on Data Science - Joost de Wit
Master Minds on Data Science - Joost de WitMedia Perspectives
 
TTO Keynote 08 10 2021
TTO Keynote 08 10 2021TTO Keynote 08 10 2021
TTO Keynote 08 10 2021Weverify
 
Social media as a trustworthy news source: Exploring journalists’ working pra...
Social media as a trustworthy news source: Exploring journalists’ working pra...Social media as a trustworthy news source: Exploring journalists’ working pra...
Social media as a trustworthy news source: Exploring journalists’ working pra...Petter Bae Brandtzæg
 
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s Opinion
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s OpinionIRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s Opinion
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s OpinionIRJET Journal
 
CrowdTV - Crowd-source TV content ratings
CrowdTV - Crowd-source TV content ratingsCrowdTV - Crowd-source TV content ratings
CrowdTV - Crowd-source TV content ratingsHenri Dambreville
 
Data driven video advertising campaigns - JustWatch & Snowplow
Data driven video advertising campaigns - JustWatch & SnowplowData driven video advertising campaigns - JustWatch & Snowplow
Data driven video advertising campaigns - JustWatch & SnowplowGiuseppe Gaviani
 
Context Aggregation and Analysis: A tool for User-Generated Video Verification
Context Aggregation and Analysis: A tool for User-Generated Video VerificationContext Aggregation and Analysis: A tool for User-Generated Video Verification
Context Aggregation and Analysis: A tool for User-Generated Video VerificationOlga Papadopoulou
 
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...Weverify
 
UCA: Data Gathering Techniques. Selection criteria
UCA: Data Gathering Techniques. Selection criteriaUCA: Data Gathering Techniques. Selection criteria
UCA: Data Gathering Techniques. Selection criteriaaukee
 
eMarketer Webinar: The Evolving Online Video Landscape
eMarketer Webinar: The Evolving Online Video LandscapeeMarketer Webinar: The Evolving Online Video Landscape
eMarketer Webinar: The Evolving Online Video LandscapeeMarketer
 
Channel Factory Brand Safety Thesis
Channel Factory Brand Safety ThesisChannel Factory Brand Safety Thesis
Channel Factory Brand Safety ThesisTony Chen
 
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...Catalyx
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
o.vision - NOAH19 London
o.vision - NOAH19 Londono.vision - NOAH19 London
o.vision - NOAH19 LondonNOAH Advisors
 
Online video advertising challenges
Online video advertising  challenges Online video advertising  challenges
Online video advertising challenges Javier Lasa
 
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?Digiday
 
WeVerify at NILC - May 2019.pptx
WeVerify at NILC - May 2019.pptxWeVerify at NILC - May 2019.pptx
WeVerify at NILC - May 2019.pptxWeverify
 

Similar to "Extracting Attributed Verification and Debunking Reports from Social Media: MediaEval-2015 Trust and Credibility Analysis of Image and Video" (20)

Edmo research + platforms panel
Edmo research + platforms panel Edmo research + platforms panel
Edmo research + platforms panel
 
Master Minds on Data Science - Joost de Wit
Master Minds on Data Science - Joost de WitMaster Minds on Data Science - Joost de Wit
Master Minds on Data Science - Joost de Wit
 
TTO Keynote 08 10 2021
TTO Keynote 08 10 2021TTO Keynote 08 10 2021
TTO Keynote 08 10 2021
 
Fake News Analyzer
Fake News AnalyzerFake News Analyzer
Fake News Analyzer
 
Durex - Real Feel
Durex - Real FeelDurex - Real Feel
Durex - Real Feel
 
Social media as a trustworthy news source: Exploring journalists’ working pra...
Social media as a trustworthy news source: Exploring journalists’ working pra...Social media as a trustworthy news source: Exploring journalists’ working pra...
Social media as a trustworthy news source: Exploring journalists’ working pra...
 
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s Opinion
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s OpinionIRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s Opinion
IRJET- Reality Show Analytics for TRP Ratings Based on Viewer’s Opinion
 
CrowdTV - Crowd-source TV content ratings
CrowdTV - Crowd-source TV content ratingsCrowdTV - Crowd-source TV content ratings
CrowdTV - Crowd-source TV content ratings
 
Data driven video advertising campaigns - JustWatch & Snowplow
Data driven video advertising campaigns - JustWatch & SnowplowData driven video advertising campaigns - JustWatch & Snowplow
Data driven video advertising campaigns - JustWatch & Snowplow
 
Context Aggregation and Analysis: A tool for User-Generated Video Verification
Context Aggregation and Analysis: A tool for User-Generated Video VerificationContext Aggregation and Analysis: A tool for User-Generated Video Verification
Context Aggregation and Analysis: A tool for User-Generated Video Verification
 
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...
Context Aggregation and Analysis: A Tool for User- Generated Video Verificati...
 
UCA: Data Gathering Techniques. Selection criteria
UCA: Data Gathering Techniques. Selection criteriaUCA: Data Gathering Techniques. Selection criteria
UCA: Data Gathering Techniques. Selection criteria
 
eMarketer Webinar: The Evolving Online Video Landscape
eMarketer Webinar: The Evolving Online Video LandscapeeMarketer Webinar: The Evolving Online Video Landscape
eMarketer Webinar: The Evolving Online Video Landscape
 
Channel Factory Brand Safety Thesis
Channel Factory Brand Safety ThesisChannel Factory Brand Safety Thesis
Channel Factory Brand Safety Thesis
 
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...
Union Suisse Spring :: Unleashing the power of Public Private Partnerships (P...
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
o.vision - NOAH19 London
o.vision - NOAH19 Londono.vision - NOAH19 London
o.vision - NOAH19 London
 
Online video advertising challenges
Online video advertising  challenges Online video advertising  challenges
Online video advertising challenges
 
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?
10:50AM Jonathan Nash - Safe, Premium Video: Still A Programmatic Myth?
 
WeVerify at NILC - May 2019.pptx
WeVerify at NILC - May 2019.pptxWeVerify at NILC - May 2019.pptx
WeVerify at NILC - May 2019.pptx
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 

"Extracting Attributed Verification and Debunking Reports from Social Media: MediaEval-2015 Trust and Credibility Analysis of Image and Video"

  • 1. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Your Name – Your Company your@email www.yourwebsite.com Stuart E. Middleton University of Southampton IT Innovation Centre sem@it-innovation.soton.ac.uk @stuart_e_middle @IT_Innov @RevealEU www.it-innovation.soton.ac.uk Extracting Attributed Verification and Debunking Reports from Social Media: MediaEval-2015 Trust and Credibility Analysis of Image and Video
  • 2. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Overview 1  Problem Statement  Approach  Results  Discussion  Suggestions for Verification Challenge 2016 UoS-ITI Team
  • 3. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Verification of Images and Videos for Breaking News 2  Breaking News Timescales  Minutes not hours - its old news after a couple of hours  Journalists need to verify copy and get it published before their rivals do  Journalistic Manual Verification Procedures for User Generated Content (UGC)  Check content provenance - original post? location? timestamp? similar posts? website? ...  Check author / source - attributed or author? known (un)reliable? popular? reputation? post history? ...  Check content credibility - right image metadata? right location? right people? right weather? ...  Phone the author up - triangulate facts, quiz author to check genuine, get authorization to publish  Automate the Simpler Verification Steps  Empowering journalists  Increases the volume of contextual content that can be considered  Focus humans on the more complex & subjective cross-checking tasks  Contact content authors via phone and ask them difficult questions  Does human behaviour 'look right' in a video?  Cross-reference buildings / landmarks in image backgrounds to Google StreetView / image databases  ... see the VerificationHandbook » http://verificationhandbook.com/ Problem Statement
  • 4. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Attribute evidence to trusted or untrusted sources 3  Hypothesis  The 'wisdom of the crowd' is not really wisdom at all when it comes to verifying suspicious content  It is better to rank evidence according to the most trusted & credible sources like journalists do  Semi-automated approach  Manually create a list of trusted sources  Tweets » NLP » Extract fake & genuine claims & attribution to sources » Evidence  Evidence » Cross-check all content for image / video » Fake/real decision based on best evidence  Trustworthiness hierarchy for tweeted claims about images & videos  Claim = statement that its a fake image / video or its genuine  Claim authored by trusted source     Claim authored by untrusted source     Claim attributed to trusted source    Claim attributed to untrusted source    Unattributed claim  Approach
  • 5. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Regex patterns 4 Approach Named Entity Patterns @ (NNP|NN) # (NNP|NN) (NNP|NN) (NNP|NN) (NNP|NN) Attribution Patterns <NE> *{0,3} <IMAGE> ... <NE> *{0,2} <RELEASE> *{0,4} <IMAGE> ... ... <IMAGE> *{0,6} <FROM> *{0,1} <NE> ... <FROM> *{0,1} <NE> ... <IMAGE> *{0,1} <NE> ... <RT> <SEP>{0,1} <NE> Faked Patterns ... *{0,2} <FAKED> ... ... <REAL> ? ... ... <NEGATIVE> *{0,1} <REAL> ... Genuine Patterns ... <IMAGE> *{0,2} <REAL> ... ... <REAL> *{0,2} <IMAGE> ... ... <IS> *{0,1} <REAL> ... ... <NEGATIVE> *{0,1} <FAKE> ... e.g. CNN BBC News @bbcnews e.g. FBI has released prime suspect photos ... ... pic - BBC News ... image released via CNN ... RT: BBC News e.g. ... what a fake! ... ... is it real? ... ... thats not real ... e.g. ... this image is totally genuine ... ... its real ... Key <NE> = named entity (e.g. trusted source) <IMAGE> = image variants(e.g. pic, image, video) <FROM> = from variants(e.g. via, from, attributed) <REAL> = real variants (e.g. real, genuine) <NEGATIVE> = negative variants (e.g. not, isn't) <RT> = RT variants (e.g. RT, MT) <SEP> = separator variants (e.g. : - = ) <IS> = is | its | thats
  • 6. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Fake & Real Tweet Classifier 5 Results fake classification real classification P R F1 P R F1 faked & genuine patterns 1.0 0.03 0.06 0.75 0.001 0.003 faked & genuine & attribution patterns 1.0 0.03 0.06 0.43 0.03 0.06 faked & genuine & attribution patterns & cross-check 1.0 0.72 0.83 0.74 0.74 0.74 fake classification real classification P R F1 P R F1 faked & genuine & attribution patterns & cross-check 1.0 0.04 0.09 0.62 0.23 0.33 Fake & Real Image Classifier
  • 7. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Fake & Real Tweet Classifier 6 Results fake classification real classification P R F1 P R F1 faked & genuine patterns 1.0 0.03 0.06 0.75 0.001 0.003 faked & genuine & attribution patterns 1.0 0.03 0.06 0.43 0.03 0.06 faked & genuine & attribution patterns & cross-check 1.0 0.72 0.83 0.74 0.74 0.74 fake classification real classification P R F1 P R F1 faked & genuine & attribution patterns & cross-check 1.0 0.04 0.09 0.62 0.23 0.33 Fake & Real Image Classifier No mistakes classifying fakes in testset Low false positives important for end users like journalists
  • 8. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Fake & Real Tweet Classifier 7 Results fake classification real classification P R F1 P R F1 faked & genuine patterns 1.0 0.03 0.06 0.75 0.001 0.003 faked & genuine & attribution patterns 1.0 0.03 0.06 0.43 0.03 0.06 faked & genuine & attribution patterns & cross-check 1.0 0.72 0.83 0.74 0.74 0.74 fake classification real classification P R F1 P R F1 faked & genuine & attribution patterns & cross-check 1.0 0.04 0.09 0.62 0.23 0.33 Fake & Real Image Classifier Performance looks good when averaged on whole dataset Not good for all images though Better classifying real images than fake ones
  • 9. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Application to our journalism use case 8  Classifying tweets in isolation (fake and real) is of limited value  High precision (89%+) but low recall (1%)  Cross-check tweets then ranking by trustworthiness  No false positives for fake classification using testset  High precision (94%+) with average recall (43%+) looking across events in devset and testset  Typically viral images & videos will have 100's of tweets before journalists become aware of them so a recall of 20% is probably OK in this context  Image classifiers  Fake image classifier » High precision (96-100%) but low recall (4-10%)  Real image classifier » High precision (62-95%) but low recall (19-23%)  Classification explained in ways journalists understand & therefore trust  Image X claimed verified by Tweet Y attributing to trusted entity Z  We can alert journalists to trustworthy reports of verification and/or debunking  Our approach does not replace manual verification techniques  Someone still needs to actually verify the content! Discussion
  • 10. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium Focus on image classification not Tweet classification 9  The long term aim is to classify the images & videos NOT the tweets about them  Suggestion » Score image classification results as well as tweet classification results  End users usually wants to know if its real, not if its fake  Classifying something as fake is usually a means to an end (e.g. to allow filtering)  Suggestion » Score results for fake classification & real classification Suggestions for Verification Challenge 2016 Improve the Tweet datasets to avoid bias to a single event  Suggest using leave one event out cross validation when computing P/R/F1  Suggest removing tweet repetition  Some events (e.g. Syrian Boy) contain many duplicate tweets with a different author  A classifier might only work well on 1 or 2 text styles BUT score highly as they are repeated a lot  Suggest evenly balancing number of tweets per event type to avoid bias  Devset - Hurricane Sandy event has about 84% of the tweets  Testset - Syrian Boy event has about 47% of the tweets
  • 11. REVEAL Project: Co-funded by the EU FP7 Programme Nr.: 610928 www.revealproject.eu © 2015 REVEAL consortium 10 Any questions? Stuart E. Middleton University of Southampton IT Innovation Centre email: sem@it-innovation.soton.ac.uk web: www.it-innovation.soton.ac.uk twitter:@stuart_e_middle, @IT_Innov, @RevealEU Many thanks for your attention!