A Multimodal Ensemble Model for Detecting Unreliable Information on Vietnamese SNS

Phạm Quang Nhật Minh
AImesoft JSC, Vietnam
minhpham0902@gmail.com

Nguyễn Mạnh Đức Tuân
Toyo University, Japan
ductuan024@gmail.com

7th International Workshop on Vietnamese Language and Speech Processing (VLSP 2020)
December 18, 2020
What is Fake News?
- “Fake news is a news article that is intentionally and verifiably false.” (Shu et al., 2017)
Why Fake News Detection?
- Fake news negatively affects society.
- Fake news spreads like a real virus, especially via social media.
  - https://engineering.stanford.edu/magazine/article/how-fake-news-spreads-real-virus
- Fake news detection helps increase the credibility of information in the media and prevents the spread of fake content.
Why is Multimodality Important?
- In addition to text, images and videos are popular on social media.
  - Visual information is helpful in detecting rumors.
- Other metadata is also useful: number of likes, shares, and retweets, timestamps, etc.
Our Approach
[Figure: model overview. Inputs: text contents, images, metadata features. Components: BERT + CNN, VGG-19, fully connected layer. Output: classification.]
Main Findings
- The proposed attention mechanism for obtaining the image representation is useful.
- Adding residual connections in blocks improves performance.
- The proposed ensemble model improves system accuracy.
Proposed Method in Detail
- Data processing
- Model architecture
- Experiments and results
Data Format
- Each piece of information includes 6 main attributes:
  - The anonymized ID of the owner
  - Text contents
  - Timestamp
  - Number of likes
  - Number of comments
  - Number of shares
- Each post may contain no image, one image, or several images.
Text Pre-processing
- Convert emoticons such as =]] and :( into the Vietnamese sentiment words for "happy" or "sad".
- Convert words and tokens that have been lengthened into their short form.
  - “coool” to “cool”
- Map different terms for COVID-19 to a single term for consistency.
  - “covid”, “ncov”
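The three steps above can be sketched with regular expressions. This is a minimal illustration, not the authors' code: the emoticon map, the Vietnamese words chosen for "happy" and "sad", the list of COVID-19 variants, and the rule "collapse three or more repeated characters to two" are all assumptions made so that the slide's own example (“coool” to “cool”) works.

```python
import re

# Hypothetical emoticon-to-word map. The slides name only "=]]" and ":(" and
# do not give the Vietnamese words that were actually used.
EMOTICON_WORDS = {
    "=]]": "vui",   # "happy" (assumed wording)
    ":(": "buồn",   # "sad" (assumed wording)
}

# Hypothetical list of COVID-19 spellings mapped to one canonical term.
COVID_PATTERN = re.compile(r"\b(?:ncov|covid-19|covid19|covid)\b", re.IGNORECASE)
CANONICAL_COVID = "covid"

# Assumption: a character repeated three or more times is a lengthening.
LENGTHENED = re.compile(r"(.)\1{2,}")


def preprocess(text: str) -> str:
    """Apply emoticon mapping, de-lengthening and COVID term unification."""
    # Replace emoticons first so later rules never see their characters.
    for emoticon, word in EMOTICON_WORDS.items():
        text = text.replace(emoticon, f" {word} ")
    # Collapse runs of 3+ identical characters to exactly two ("coool" -> "cool").
    text = LENGTHENED.sub(r"\1\1", text)
    # Map every known COVID-19 spelling to the canonical term.
    text = COVID_PATTERN.sub(CANONICAL_COVID, text)
    # Normalise whitespace introduced by the replacements.
    return " ".join(text.split())
```

Collapsing to two characters rather than one keeps legitimate double letters intact, at the cost of leaving some lengthened words only partly shortened.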
Data Imputation
- Missing values are filled with mean values.
- For the timestamp, we applied the MICE (Multiple Imputation by Chained Equations) method (Azur et al., 2011).
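Mean imputation for a single numeric column can be sketched as below. The function name and the use of `None` to mark a missing value are assumptions for illustration. The MICE step used for the timestamp is not shown here.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        # With no observed value there is no mean to fall back on.
        raise ValueError("cannot impute a column with no observed values")
    fill = float(mean(observed))
    return [fill if v is None else float(v) for v in values]
```

For the chained-equations step, one option (not necessarily what the authors used) is scikit-learn's `IterativeImputer`, which its documentation describes as inspired by the R MICE package.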
General Model
Given the representations of an image and a text, the model learns which parts of the image should receive more attention.
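The slides do not state the exact form of the attention. As a rough sketch under that caveat, the following uses plain dot-product attention: each image region vector is scored against the text vector, the scores are turned into weights with a softmax, and the image representation is the weighted sum of the regions. The names `attend`, `text_vec`, and `regions` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attend(text_vec: Sequence[float],
           regions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, attended image vector).

    Assumption: dot-product scoring followed by a softmax. The authors'
    mechanism may differ.
    """
    # Score each image region by its dot product with the text vector.
    scores = [sum(t * r for t, r in zip(text_vec, region)) for region in regions]
    # Numerically stable softmax over the region scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of region vectors gives the attended image representation.
    dim = len(regions[0])
    pooled = [sum(w * region[d] for w, region in zip(weights, regions))
              for d in range(dim)]
    return weights, pooled
```

Regions that align with the text receive larger weights, so they contribute more to the pooled image vector.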
Model 1
1D-CNN layers with filter sizes 2, 3, 4, and 5 follow the BERT module, and a fully connected layer with batch normalization follows the 1D-CNN layers.
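To make the role of the filter sizes concrete, here is a deliberately simplified sketch in pure Python: one filter per width slides over a sequence of token vectors, and each filter's outputs are reduced to a single feature. Max-over-time pooling, the single filter per width, and the helper names are assumptions. The BERT encoder, the fully connected layer, and batch normalization are not shown.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def conv1d_valid(tokens: Sequence[Vector], kernel: Sequence[Vector]) -> List[float]:
    """Slide one filter over the token sequence (no padding, stride 1).

    A filter of width k over n tokens yields n - k + 1 outputs.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        # Element-wise product of filter and window, summed over all positions.
        outputs.append(sum(w * x
                           for k_row, t_row in zip(kernel, window)
                           for w, x in zip(k_row, t_row)))
    return outputs


def multi_width_features(tokens: Sequence[Vector],
                         kernels_by_width: Dict[int, Sequence[Vector]]) -> List[float]:
    """Max-pool each filter's outputs over time and concatenate the results.

    Assumes the sequence is at least as long as the widest filter.
    """
    return [max(conv1d_valid(tokens, kernels_by_width[width]))
            for width in sorted(kernels_by_width)]
```

A filter of width k looks at k consecutive tokens at a time, so using widths 2 to 5 lets the model pick up patterns spanning two to five tokens.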
Models 2 & 3
Models 2 and 3 use three additional 1D-CNN layers.
Model 3 also uses residual connections for the additional 1D-CNN layers.
Feature Design (1)
- The timestamp feature is converted into:
  - Day
  - Month
  - Year
  - Hour
  - Weekday
- Text-based features:
  - Number of hashtags
  - Number of URLs
  - Number of characters
  - Number of words
  - Number of question marks
  - Number of exclamation marks
  - A Boolean variable indicating whether the post contains images
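The features listed above could be extracted roughly as follows. The slides do not specify the timestamp encoding or the counting rules, so this sketch assumes Unix epoch seconds interpreted in UTC, whitespace-separated words, and simple regular expressions for hashtags and URLs. All function and key names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Dict


def timestamp_features(unix_seconds: float) -> Dict[str, int]:
    """Split a timestamp into day, month, year, hour and weekday.

    Assumption: the timestamp is in Unix epoch seconds and is read as UTC.
    """
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return {
        "day": moment.day,
        "month": moment.month,
        "year": moment.year,
        "hour": moment.hour,
        "weekday": moment.weekday(),  # Monday is 0, Sunday is 6
    }


def text_features(text: str, num_images: int) -> Dict[str, object]:
    """Count surface features of a post's text."""
    return {
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_image": num_images > 0,
    }
```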
Feature Design (2)
- User-based features:
  - Number of unreliable news items
  - Number of reliable news items
  - Ratio between the two numbers, to indicate the user's sharing behavior
- All of the above features are standardized by subtracting the mean and scaling to unit variance, except for the Boolean feature.
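Standardization of one numeric feature column can be sketched as below. The use of the population standard deviation and the handling of a constant column (mapped to zeros) are assumptions, since the slides give only the general recipe.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]
```

In practice the mean and standard deviation would be computed on the training set and reused for the test set, so that both are scaled the same way.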
Multi-Image Posts
- Some posts contain more than one image.
- Two strategies:
  - Use one image as input.
  - Use multiple images (4 images at most) as input.
Proposed Ensemble Model
- Choose the two best models among the three models.
- Average the probabilities returned by the two models.
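The two steps can be sketched as follows. The slides do not say how "best" is measured, so ranking by a validation score is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def select_top_two(validation_scores: Dict[str, float]) -> List[str]:
    """Return the names of the two models with the highest score.

    Assumption: models are ranked by a validation metric such as AUC.
    """
    ranked = sorted(validation_scores, key=validation_scores.get, reverse=True)
    return ranked[:2]


def ensemble_average(probs_a: Sequence[float], probs_b: Sequence[float]) -> List[float]:
    """Average two models' predicted probabilities, post by post."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same posts")
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
```

Averaging probabilities rather than hard labels keeps the ranking information that ROC AUC depends on.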
Experiments & Results
- Evaluation measure: ROC AUC
- We conducted experiments to evaluate:
  - The effect of pre-trained BERT models
  - Text preprocessing strategies
  - The effectiveness of the attention mechanism
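For reference, ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It is a quadratic-time illustration and not the evaluation script used in the shared task; treating label 1 as the positive (here, unreliable) class is an assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of correctly ordered (positive, negative) pairs."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(positives) * len(negatives))
```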
PhoBERT vs NlpHUST/vibert4news
- bert4news uses syllable-based tokenization.
  - Trained on 20GB of news text
- PhoBERT uses word-level/subword tokenization.
  - Trained on 20GB of text, including Wikipedia and news

Pre-trained model    Result on private test (AUC)
PhoBERT              0.921
bert4news            0.928
Effectiveness of Attention Mechanism
- Using the attention mechanism improved the result (AUC from 0.928 to 0.940).
- Images and texts are correlated.
  - Images and texts of reliable news are often related.
  - Someone may use images that do not relate to the content of the news for click-bait purposes.

Models           Result on private test (AUC)
w/o attention    0.928
attention        0.940
Words in Incorrect vs Correct Form
- “sá.thại” vs “sát hại”
  - Such words carry violent content or are extreme words.
  - The incorrect form can bypass the social media platform's filtering function.
- Keeping the incorrect form is better!
  - It partly reflects the sentiment of the text.
  - Unreliable content tends to use more subjective or extreme words to convey a particular perspective.

Models (PhoBERT)           Result on private test (AUC)
Words in correct form      0.918
Words in incorrect form    0.921
Results
- Results on the private test

Run         Result on private test (AUC)
Model 1     0.939
Model 2     0.919
Model 3     0.940
Ensemble    0.945
Future Work
- Use external data for fake news detection.
- A natural way to judge whether news is fake is to compare it against different information sources to find relevant evidence.
Thank you very much for listening!


A Multimodal Ensemble Model for Detecting Unreliable Information on Vietnamese SNS

  • 1. A Multimodal Ensemble Model for Detecting Unreliable Information on Vietnamese SNS Phạm Quang Nhật Minh AImesoft JSC, Vietnam minhpham0902@gmail.com December 18, 2020 Nguyễn Mạnh Đức Tuân Toyo University, Japan ductuan024@gmail.com 7th International Workshop on Vietnamese Language and Speech Processing (VLSP 2020)
  • 2. What is Fake News? 2 n “Fake news is a news article that is intentionally and veritably false.” (Shu et al., 2017)
  • 3. Why Fake News Detection?
      - Fake news negatively affects society.
      - Fake news spreads like a real virus, especially via social media.
          - https://engineering.stanford.edu/magazine/article/how-fake-news-spreads-real-virus
      - Fake news detection helps increase the credibility of information in the media and prevents the spread of fake content.
  • 4. Why is Multimodality Important?
      - In addition to text, images and videos are popular on social media.
          - Visual information is helpful in detecting rumors.
      - Other metadata is also useful: number of likes, shares, retweets, timestamps, etc.
  • 5. Our Approach
      [Architecture diagram: text contents, images, and metadata features are the inputs; the labeled components are BERT + CNN, VGG 19, and a fully-connected layer, followed by the classification step.]
  • 6. Main Findings
      - The proposed attention mechanism used to obtain the image representation is useful.
      - Adding residual connections in blocks leads to a performance improvement.
      - System accuracy improves with our proposed ensemble model.
  • 7. Proposed Method in Detail
      - Data processing
      - Model architecture
      - Experiments and results
  • 8. Data Format
      - Each piece of information includes 6 main attributes:
          - The anonymized id of the owner
          - Text contents
          - Timestamp
          - Number of likes
          - Number of comments
          - Number of shares
      - Each news item may contain zero or more images.
  • 9. Text pre-processing
      - Convert emojis such as =]] and :( into the Vietnamese sentiment words for "happy" or "sad".
      - Convert lengthened words and tokens into their short form.
          - “coool” to “cool”
      - Map the different terms for COVID-19 to a single term for consistency.
          - “covid”, “ncov”
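The pre-processing steps on slide 9 can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the emoticon table, the list of COVID-19 spellings, the target term "covid", and the rule of squeezing runs of three or more identical characters down to two are all assumptions chosen to reproduce the slide's examples.

```python
import re

# Hypothetical lookup tables: the slides describe the idea, not the actual lists.
EMOTICONS = {"=]]": "vui", ":(": "buồn"}  # "vui" = happy, "buồn" = sad
COVID_TERMS = re.compile(r"\b(?:ncov|covid-19|covid19|sars-cov-2)\b", re.IGNORECASE)
LENGTHENED = re.compile(r"(\w)\1{2,}")  # one character repeated 3+ times


def preprocess(text: str) -> str:
    """Apply the three normalization steps in order."""
    # 1. Replace emoticons with sentiment words.
    for emoticon, word in EMOTICONS.items():
        text = text.replace(emoticon, f" {word} ")
    # 2. Shorten lengthened words (assumed rule: keep two repeats, "coool" -> "cool").
    text = LENGTHENED.sub(r"\1\1", text)
    # 3. Map COVID-19 variants to one term.
    text = COVID_TERMS.sub("covid", text)
    # Collapse the extra whitespace introduced above.
    return " ".join(text.split())
```

For instance, `preprocess("Tin này coool quá =]]")` yields a string with the emoticon replaced and the lengthened word shortened.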
  • 10. Data Imputation
      - Missing values are filled with mean values.
      - For the timestamp, we applied the MICE imputation method (Azur et al., 2011).
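As a small illustration of the mean-filling step on slide 10, the sketch below replaces missing entries in one numeric column with the mean of the observed entries. Representing missing values as `None` and the function name `impute_mean` are assumptions; MICE, which the slide uses for the timestamp, is an iterative multivariate method and is not shown here.

```python
from statistics import mean
from typing import List, Optional, Sequence


def impute_mean(values: Sequence[Optional[float]]) -> List[float]:
    """Fill None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]
```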
  • 11. General Model
      Given the representation of an image and a text, we learn which parts of the image we should give more attention to.
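One common way to realize the idea on slide 11 is text-guided attention over image regions. The sketch below is only an assumption about the general shape of such a mechanism: the slides do not give the scoring function, so a plain dot product between the text vector and each region vector is used here, and `attend` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def attend(text_vec: Sequence[float],
           regions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Weight image regions by their similarity to the text vector.

    Returns the attention weights and the weighted sum of region vectors.
    """
    # Assumed scoring function: dot product between each region and the text.
    scores = [sum(r * t for r, t in zip(region, text_vec)) for region in regions]
    # Numerically stable softmax over the region scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-pooled image representation.
    dim = len(regions[0])
    pooled = [sum(w * region[i] for w, region in zip(weights, regions))
              for i in range(dim)]
    return weights, pooled
```

Regions that align with the text receive larger weights, so the pooled image vector is dominated by the parts of the image most related to the post's text.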
  • 12. Model 1
      1D-CNN layers with filter sizes 2, 3, 4, and 5 follow the BERT module, and a fully connected layer with Batch Normalization follows the 1D-CNN layers.
  • 13. Model 2 & 3
      - Models 2 and 3 use three additional 1D-CNN layers.
      - Model 3 uses residual connections for the additional 1D-CNN layers.
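The difference between Model 2 and Model 3 is the residual connection around the extra convolution layers. The toy sketch below shows the idea on a single channel in plain Python; the zero-padded odd-width kernel, the absence of an activation function, and the function names are simplifying assumptions, not the authors' layer configuration.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """1D cross-correlation with zero padding so the output length equals len(x).

    Assumes an odd kernel width.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def residual_block(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Residual connection: the block's input is added to the layer output."""
    return [xi + yi for xi, yi in zip(x, conv1d_same(x, kernel))]
```

With an all-zero kernel the block reduces to the identity, which is the property that makes stacking extra layers less likely to hurt the signal passed through them.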
  • 14. Feature Design (1)
      - The timestamp feature is converted into:
          - Day
          - Month
          - Year
          - Hour
          - Weekday
      - Text-based features:
          - Number of hashtags
          - Number of URLs
          - Number of characters
          - Number of words
          - Number of question marks
          - Number of exclamation marks
          - A Boolean variable indicating whether the post contains images
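The features on slide 14 could be extracted along these lines. This sketch assumes the timestamp is a Unix epoch in seconds interpreted in UTC, that words are whitespace-separated tokens, and that hashtags and URLs can be found with simple regular expressions; the slides do not specify any of these details, and the function and key names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Dict


def timestamp_features(ts: int) -> Dict[str, int]:
    """Split a Unix timestamp (assumed seconds, UTC) into calendar parts."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "day": dt.day,
        "month": dt.month,
        "year": dt.year,
        "hour": dt.hour,
        "weekday": dt.weekday(),  # Monday = 0 ... Sunday = 6
    }


def text_features(text: str, num_images: int) -> Dict[str, object]:
    """Simple surface counts over the post text."""
    return {
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_image": num_images > 0,
    }
```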
  • 15. Feature Design (2)
      - User-based features:
          - Number of unreliable news items
          - Number of reliable news items
          - Ratio between the two numbers, to indicate the sharing behavior
      - All of the above features are standardized by subtracting the mean and scaling to unit variance, except for the Boolean feature.
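Standardization as described on slide 15 is the usual z-score transform, z = (x - mean) / std. A minimal sketch for one feature column follows; using the population standard deviation and mapping a constant column to zeros are assumptions made here.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Subtract the mean and scale to unit variance (population std)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]
```

In practice the mean and standard deviation would be computed on the training split and reused for the test split, so that no test statistics leak into training.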
  • 16. Multi-Image Posts
      - Some posts contain more than one image.
      - Two strategies:
          - Use one image as input.
          - Use multiple images (4 at most) as input.
  • 17. Proposed Ensemble Model
      - Choose the two best models among the three models.
      - Average the probabilities returned by the two models.
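The ensemble on slide 17 can be sketched as below. How the "two best" models are chosen is not stated on the slide; ranking by a validation score is an assumption, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def ensemble_average(val_score: Dict[str, float],
                     probs: Dict[str, Sequence[float]],
                     top_k: int = 2) -> Tuple[List[str], List[float]]:
    """Pick the top_k models by validation score and average their probabilities."""
    # Assumed selection rule: highest validation score first.
    best = sorted(val_score, key=val_score.get, reverse=True)[:top_k]
    n = len(probs[best[0]])
    averaged = [sum(probs[m][i] for m in best) / len(best) for i in range(n)]
    return best, averaged
```

Each entry of `averaged` is the mean of the selected models' predicted probabilities for one post, which can then be scored directly with ROC AUC.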
  • 18. Experiments & Results
      - Evaluation measure: ROC AUC
      - We conducted experiments to evaluate:
          - The effect of pre-trained BERT models
          - Text pre-processing strategies
          - The effectiveness of the attention mechanism
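ROC AUC, the evaluation measure on slide 18, equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; it is quadratic in the number of examples and meant for illustration only, and treating label 1 as the positive (unreliable) class is an assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the measure depends only on the ranking of the scores, it works on predicted probabilities without choosing a classification threshold.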
  • 19. PhoBERT vs NlpHUST/vibert4news
      - bert4news uses syllable-based tokenization.
          - Trained on 20GB of news texts
      - PhoBERT uses word-level/subword tokenization.
          - Trained on 20GB of texts including Wikipedia and news

      Pre-trained model    Result on private test (AUC)
      PhoBERT              0.921
      bert4news            0.928
  • 20. Effectiveness of Attention Mechanism
      - Using the attention mechanism significantly improved the result.
      - Images and texts are correlated.
          - Images and texts of reliable news are often related.
          - Someone may use images unrelated to the content of the news for click-bait purposes.

      Models           Result on private test (AUC)
      w/o attention    0.928
      attention        0.940
  • 21. Incorrect vs correct form words
      - Example: “sá.thại” (incorrect form) vs “sát hại” (correct form)
          - Such words contain violent content or extreme words.
          - The incorrect form can bypass the social media platform's filtering function.
      - Keeping the incorrect forms is better!
          - They partly reflect the sentiment of the text.
          - Unreliable content tends to use more subjective or extreme words to convey a particular perspective.

      Models (PhoBERT)           Result on private test (AUC)
      Words in correct form      0.918
      Words in incorrect form    0.921
  • 22. Results
      - Results on the private test

      Run         Result on private test (AUC)
      Model 1     0.939
      Model 2     0.919
      Model 3     0.940
      Ensemble    0.945
  • 23. Future work
      - Use external data for fake news detection.
      - The natural way to make a judgement in the fake news detection task is to compare a post against different information sources to find relevant evidence.
  • 24. Thank you very much for listening!