Logically at the Factify 2: A Multi-Modal Fact Checking System Based on Evidence Retrieval Techniques and Transformer Encoder Architecture
Pim Jordi Verschuuren, Jie Gao, Adelize Van Eeden, Stylianos
Oikonomou, Anil Bandhakavi
Feb, 2023
Table of contents
1 Introduction
2 Dataset and Evaluation
3 Challenges
4 Data analysis
5 Our participating system
6 Results and Experiments
7 Conclusion
Factify 2 Challenge: Introduction - Task
Claim
Text: BREAKING: Meghan, the Duchess of Sussex, is in labor with her first child, Buckingham Palace announces. https://abcn.ws/2GXAYNy
Document
Text: By Lauren Said-Moorhouse, CNN. Buckingham Palace tells CNN that the Duchess of Sussex went into labor in the early hours on Monday morning. … Harry and Meghan married … and announced they were expecting in October when they touched down in Australia for their first overseas tour as a married couple.
Label: SUPPORT_MULTIMODAL
● SUPPORT: doc text supports claim text
● _MULTIMODAL: images similar / about the same scene/entity
Factify 2 Challenge: Dataset and Evaluation
● Provided: images, textual claims, reference documents (doc text and images), and OCR text from the claim/doc images
● Balanced dataset
Metrics
● Weighted-average F1 score
Dataset summary
Category                  Train    Val   Test
Support_Multimodal         7000   1500   1500
Support_Text               7000   1500   1500
Insufficient_Multimodal    7000   1500   1500
Insufficient_Text          7000   1500   1500
Refute                     7000   1500   1500
Total                     35000   7500   7500
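The weighted-average F1 metric named on this slide can be illustrated with a minimal sketch. It computes F1 per category and weights each by that category's share of the true labels; on a balanced dataset like this one, the weights are equal. The function name and the toy labels are illustrative assumptions, not the challenge's official scorer.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives and predicted count for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n_true / total) * f1
    return score


# Toy labels for illustration only (not challenge data).
y_true = ["Refute", "Refute", "Support_Text", "Support_Text"]
y_pred = ["Refute", "Support_Text", "Support_Text", "Support_Text"]
print(round(weighted_f1(y_true, y_pred), 4))
```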
Factify 2 Challenge: Introduction - Challenges
● Combining different modalities such as text, images, video and audio into a single system to accurately detect false information
● Subtle differences between fine-grained veracity categories
● Feature representation from multiple sources in order to identify discrepancies
● Integration of NLP for context understanding and semantics within each data source type, to correctly assess the factuality of content presented by various media outlets
● Scalability to large data volumes, and the modeling complexity of long multimodal sequences (the doc text in this task)
Major technical challenges
Data Analysis - Text Length Distribution
● Claim/doc text and OCR text length distributions from the train set are shown (right)
● Text length distribution varies between the claim and doc text
Observations
● Texts are much shorter and less varied in the “Refute” category, in both claim and doc
● “Support_Multimodal” and “Support_Text” have document text lengths that are on average longer and have a bigger spread
● “Insufficient_Multimodal” and “Insufficient_Text” have document text lengths that are shorter on average
● OCR text length distributions show less variability between claim and doc
● “Refute” has on average longer OCR text lengths
○ Followed by the two text-related categories
Long text sequences challenge
Fig 2 (a). Claim text length dist. Fig 2 (b) Doc text length dist.
Fig 3 (a). Claim OCR length dist. Fig 3 (b). Doc OCR length dist.
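A per-category length analysis of this kind could be computed roughly as follows. This is a sketch under stated assumptions: the slide does not say how length is measured, so whitespace-separated tokens are assumed here, and the field names `label` and `doc_text` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def length_stats_by_label(samples, field):
    """Return {label: (mean, std)} of whitespace-token counts for one text field."""
    lengths = defaultdict(list)
    for sample in samples:
        lengths[sample["label"]].append(len(sample[field].split()))
    return {label: (mean(v), pstdev(v)) for label, v in lengths.items()}


# Toy samples for illustration only (field names are assumptions).
samples = [
    {"label": "Refute", "doc_text": "short text"},
    {"label": "Refute", "doc_text": "another short text here"},
    {"label": "Support_Text", "doc_text": "a much longer reference document " * 5},
]
stats = length_stats_by_label(samples, "doc_text")
for label, (avg, std) in stats.items():
    print(label, avg, std)
```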
Data Analysis - Image Similarity Distribution
● A pre-trained CLIP model is used to encode images
● Similarity between claim and doc images is measured with cosine similarity
Observations
● Image similarity is higher for the two “Multimodal” categories than for the other three categories
○ I.e., it can be leveraged to verify the multimodal entailment categories
● The correlation between image similarity and label is much stronger than in the Factify 1 dataset
Image Matching Challenge
Fig 4 (b). Claim Image and Doc Image Similarity Dist.
Fig 4 (a). Claim Image and Doc Image Similarity Histogram
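The cosine similarity used to compare claim and doc embeddings can be sketched in a few lines. The vectors below are made-up stand-ins; in the analysis on this slide they would be CLIP image embeddings, which this sketch does not compute.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical embeddings, for illustration only.
claim_image_emb = [0.2, 0.1, 0.7]
doc_image_emb = [0.4, 0.2, 1.4]      # same direction as the claim embedding
other_image_emb = [0.7, -0.2, -0.1]  # a different direction

print(cosine_similarity(claim_image_emb, doc_image_emb))
print(cosine_similarity(claim_image_emb, other_image_emb))
```

Because cosine similarity ignores vector magnitude, the first pair scores close to 1 even though the second vector is twice as long.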
Data Analysis - Multimodal Similarity Distribution
● A pre-trained CLIP model is used to encode both text and images
● Similarity of each cross-modal pair is measured with cosine similarity
Observations
● “Support_Multimodal” shows relatively higher pairwise similarity between claim text and evidence (doc) image
● “Insufficient_Text” has the lowest pairwise similarity between claim text and doc image
● No significant correlation with the label among the other multimodal pairs
Cross-Modality matching Challenge
Fig. 5 (b) Claim Text and Doc Image
Similarity Dist.
Fig. 6 (c) Claim Image and Doc Text
Similarity Dist.
Our participating system - overall architecture
● a textual evidence retrieval component
● a transformer based seq2seq cross-modal veracity model
Two-stage evidence based seq2seq veracity detection system
Our participating system - Evidence Retrieval Component
● “multi-qa-mpnet-base-dot-v1” (from
SBERT) to compute embeddings of
claim-doc text pairs (Bi-Encoder)
● Passages are ranked, and the top 𝐾 are selected and concatenated
Textual Evidence Retrieval
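The bi-encoder retrieval step can be sketched as below: claim and passages are embedded independently, passages are scored by dot product with the claim, and the top K are concatenated. This is a minimal illustration, not the authors' implementation. The `embed` argument is a placeholder for the SBERT encoder, `toy_embed` is a bag-of-words stand-in so the example runs without a model, and joining passages in rank order with a space is an assumption.

```python
def retrieve_top_k(claim, passages, embed, k=5):
    """Rank passages by dot product with the claim embedding; join the top k."""
    claim_vec = embed(claim)
    scored = []
    for passage in passages:
        score = sum(a * b for a, b in zip(claim_vec, embed(passage)))
        scored.append((score, passage))
    # Highest score first; ties keep their original document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return " ".join(passage for _, passage in scored[:k])


VOCAB = ["duchess", "labor", "palace", "tour"]


def toy_embed(text):
    """Stand-in encoder (word counts over a tiny vocabulary); NOT the SBERT model."""
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]


passages = [
    "The couple went on tour in Australia",
    "The Palace said the Duchess went into labor",
    "Weather was mild on Monday",
]
evidence = retrieve_top_k("Duchess in labor Palace announces", passages, toy_embed, k=2)
print(evidence)
```

Passing the encoder in as a function keeps the ranking logic independent of the embedding model, so the same routine could be used to compare a QA-trained encoder with a general-purpose one.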
Our participating system - seq2seq cross-modal veracity model
● Embedding layer
○ A pre-trained cross-modal model (i.e. CLIP)
○ A pre-trained text embedding (w2v)
● Cross-modal embeddings of the 6 modality inputs are concatenated (listwise)
● Text/token embeddings of claim and evidence passage pairs are concatenated
● The two concatenated inputs are fed into separate transformer encoders, whose outputs are then concatenated and passed through an MLP classifier
Transformer based seq2seq cross-modality
veracity prediction
Results and Experiments
● Settings
○ With or without evidence selection
○ Varying the length of the evidence doc text, sorted by the evidence retriever
○ Passage ranking at paragraph level versus sentence level
○ Text-to-text alignment with SBERT vs. cross-modal alignment with CLIP
○ Validating whether an SBERT model trained on QA datasets performs better than a general-purpose SBERT model
● Preliminary Results
○ QA-enhanced/fine-tuned models perform better than the general-purpose model
○ Combining SBERT-QA with top-K sentence-level evidence passage retrieval achieves the best performance
○ The best model, "SBERT-QA_sentence_ER_top5", obtains 0.79 weighted avg. F1 after 20 epochs
Validate and optimize the effect of
evidence retrieval settings
Conclusion
Lessons learnt
● Cross-modal pre-trained models (such as CLIP) exhibit great transferability for zero-shot or few-shot scenarios
● Self-attentive modules based on the transformer encoder architecture are a very effective fine-grained alignment technique for learning hidden relationships across modalities
Future work
● Finer-grained cross-modal representations for intra- and inter-modality alignment (sentence words or entities/visual objects)
● More focus should be placed on real-world challenges
○ Cross-modal retrieval
○ Diverse contexts and domains
○ Large, high-quality multimodal fact-checking datasets reflecting real-world scenarios
○ Explainability
Thank You!
www.logically.ai
