A Multimodal Ensemble Model for Detecting Unreliable Information on Vietnamese SNS

Phạm Quang Nhật Minh, AImesoft JSC, Vietnam (minhpham0902@gmail.com)
Nguyễn Mạnh Đức Tuân, Toyo University, Japan (ductuan024@gmail.com)

7th International Workshop on Vietnamese Language and Speech Processing (VLSP 2020)
December 18, 2020

What is Fake News?
- “Fake news is a news article that is intentionally and verifiably false.” (Shu et al., 2017)

Why Fake News Detection?
- Fake news negatively affects society.
- Fake news spreads like a real virus, especially via social media.
  - https://engineering.stanford.edu/magazine/article/how-fake-news-spreads-real-virus
- Fake news detection helps increase the credibility of information on social media and prevents the spread of fake content.

Why Is Multimodal Information Important?
- In addition to text, images and videos are popular on social media.
  - Visual information is helpful in detecting rumors.
- Other metadata is also useful: numbers of likes, shares, and retweets, timestamps, etc.
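
Our Approach
[Figure: system overview. Inputs are text contents, images, and metadata features. Text is encoded with BERT + CNN, images with VGG19, and metadata features with a fully-connected layer; the combined representation is used for classification.]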

Main Findings
- The proposed attention mechanism for obtaining image representations is useful.
- Adding residual connections to the additional 1D-CNN blocks improves performance.
- The proposed ensemble model improves system performance (ROC AUC).

Proposed Method in Detail
- Data processing
- Model architecture
- Experiments and results

Data Format
- Each news item includes 6 main attributes:
  - The anonymized ID of the owner
  - Text content
  - Timestamp
  - Number of likes
  - Number of comments
  - Number of shares
- Each news item may contain zero, one, or more images.

Text pre-processing
- Convert emoticons such as =]] and :( into the Vietnamese sentiment words for "happy" or "sad".
- Convert lengthened words and tokens to their short form.
  - e.g., “coool” to “cool”
- Normalize different terms for COVID-19 (e.g., “covid”, “ncov”) into a single term for consistency.
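
The three normalization steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `preprocess`, the emoticon table, the Vietnamese words chosen for "happy" and "sad" (vui, buồn), the list of COVID-19 variants, and the canonical token `covid19` are all assumptions. Collapsing three or more repeated letters to two is simply one rule that reproduces the “coool” to “cool” example.

```python
import re

# Illustrative tables (assumptions): the slides only show "=]]", ":(",
# "covid" and "ncov" as examples.
EMOTICON_TO_WORD = {
    "=]]": "vui",   # "happy" in Vietnamese
    ":(": "buồn",   # "sad" in Vietnamese
}
# Matches "covid-19", "covid 19", "covid19", "covid", "ncov" (case-insensitive).
COVID_PATTERN = re.compile(r"\b(?:covid[-\s]?19|covid|ncov)\b", re.IGNORECASE)
COVID_TOKEN = "covid19"  # hypothetical canonical term


def preprocess(text: str) -> str:
    # 1. Replace emoticons with sentiment words, longest emoticons first.
    for emo in sorted(EMOTICON_TO_WORD, key=len, reverse=True):
        text = text.replace(emo, f" {EMOTICON_TO_WORD[emo]} ")
    # 2. Shorten lengthened words: collapse 3+ repeats of a letter to two
    #    ("coool" -> "cool"); digits are excluded so "1000" is untouched.
    text = re.sub(r"([^\W\d_])\1{2,}", r"\1\1", text)
    # 3. Map all COVID-19 variants onto one term.
    text = COVID_PATTERN.sub(COVID_TOKEN, text)
    # Collapse the extra whitespace introduced in step 1.
    return " ".join(text.split())
```

For example, `preprocess("dịch ncov và COVID-19 :(")` yields `"dịch covid19 và covid19 buồn"`. Python's `\w` is Unicode-aware, so Vietnamese letters with diacritics are treated as letters in step 2.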

Data Imputation
- Missing values are filled with mean values.
- For the timestamp, we applied the MICE imputation method (Azur et al., 2011).
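
Mean imputation is simple enough to show directly. The stdlib-only sketch below uses a hypothetical `mean_impute` helper and treats `None` as the missing-value marker (an assumption about the data encoding). It does not implement MICE.

```python
from statistics import mean
from typing import List, Optional


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Fill missing entries (encoded as None, an assumption) with the column mean."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]
```

For the MICE step, one option is scikit-learn's `IterativeImputer`, which models each feature with missing values as a function of the other features in a round-robin fashion and is inspired by the R MICE package; the slides do not say which implementation the authors used.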

General Model
Given the representations of an image and a text, we learn which parts of the image should receive more attention.
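
One common way to "learn which parts of the image to attend to" is text-guided attention over image regions. The sketch below uses scaled dot-product scoring and assumes that the text vector and each region vector (e.g., the spatial positions of a VGG19 convolutional feature map) have already been projected to the same dimension. The slide does not specify the scoring function, so this is an illustrative choice rather than the authors' exact mechanism; the function name is hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def text_guided_attention(text_vec: Vector, regions: List[Vector]) -> Tuple[Vector, Vector]:
    """Attend over image-region vectors using the text vector as the query.

    Returns (attention weights, attention-weighted image representation).
    Scaled dot-product scoring is an illustrative assumption.
    """
    d = len(text_vec)
    scores = [sum(t * r for t, r in zip(text_vec, reg)) / math.sqrt(d) for reg in regions]
    # Numerically stable softmax over regions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of region vectors, dimension by dimension.
    pooled = [sum(w * reg[i] for w, reg in zip(weights, regions)) for i in range(d)]
    return weights, pooled
```

Regions whose features align with the text receive larger weights, so an image unrelated to the text yields a flatter weight distribution; this is one way the attended image representation can carry a text-image relatedness signal.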

Model 1
1D-CNN layers with filter sizes 2, 3, 4, and 5 follow the BERT module, and a fully-connected layer with Batch Normalization follows the 1D-CNN layers.
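
The convolution-over-BERT step of Model 1 can be illustrated with a toy, dependency-free version: one filter per width (2, 3, 4, 5) slid over the sequence of token vectors, followed by ReLU and max-over-time pooling. The ReLU and max pooling are assumptions borrowed from the standard TextCNN design, and the function names are hypothetical. A real model would apply many filters per width to BERT's hidden states and then feed the pooled features to the fully-connected layer with Batch Normalization.

```python
from typing import Dict, List

Matrix = List[List[float]]  # one row per token, one column per hidden unit


def conv1d_relu(seq: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """One filter of width len(kernel) slid over the sequence (no padding), then ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(x * w for x, w in zip(seq[start + offset], kernel[offset]))
        outputs.append(max(0.0, total))
    return outputs


def multi_width_features(seq: Matrix, kernels: Dict[int, Matrix]) -> List[float]:
    """Max-over-time pooled output of one filter per width, in increasing width order."""
    if len(seq) < max(kernels):
        raise ValueError("sequence is shorter than the widest filter")
    return [max(conv1d_relu(seq, kernels[w])) for w in sorted(kernels)]
```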

Models 2 & 3
Models 2 and 3 use three additional 1D-CNN layers; Model 3 also adds residual connections to these additional layers.
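
A residual connection adds a block's input to its output, y = x + f(x), which requires f to preserve the sequence shape (e.g., by "same" padding). The sketch below shows this with a hypothetical toy block, `smooth3`, standing in for a 1D-CNN layer; it illustrates the idea behind Model 3, not its actual layers.

```python
from typing import Callable, List

Seq = List[List[float]]  # tokens x channels


def smooth3(x: Seq) -> Seq:
    """Hypothetical stand-in for a 1D-CNN layer: width-3 per-channel average
    with zero "same" padding, so the output shape equals the input shape."""
    channels = len(x[0])
    zero = [0.0] * channels
    padded = [zero] + x + [zero]
    return [
        [(padded[i][c] + padded[i + 1][c] + padded[i + 2][c]) / 3 for c in range(channels)]
        for i in range(len(x))
    ]


def residual(block: Callable[[Seq], Seq], x: Seq) -> Seq:
    """Residual connection: element-wise x + block(x)."""
    y = block(x)
    if len(y) != len(x) or any(len(a) != len(b) for a, b in zip(x, y)):
        raise ValueError("a residual connection needs matching shapes")
    return [[a + b for a, b in zip(xr, yr)] for xr, yr in zip(x, y)]
```

Because the identity path passes `x` through unchanged, stacking extra layers cannot easily destroy the features the earlier layers already produce, which is a common explanation for why residual connections help deeper stacks.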

Feature Design (1)
- The timestamp feature is converted into:
  - Day
  - Month
  - Year
  - Hour
  - Weekday
- Text-based features:
  - Number of hashtags
  - Number of URLs
  - Number of characters
  - Number of words
  - Number of question marks
  - Number of exclamation marks
  - A Boolean variable indicating whether the post contains images
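
A stdlib-only sketch of the feature extraction above follows. It assumes timestamps are Unix epoch seconds interpreted in UTC (the slides give neither the format nor the time zone; posts from Vietnam might instead be converted to UTC+7). The URL and hashtag regular expressions and the function names are illustrative assumptions.

```python
import re
from datetime import datetime, timezone

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # illustrative pattern
HASHTAG_RE = re.compile(r"#\w+")                # illustrative pattern


def timestamp_features(ts: int) -> dict:
    """Split a Unix timestamp (seconds, interpreted in UTC: an assumption)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"day": dt.day, "month": dt.month, "year": dt.year,
            "hour": dt.hour, "weekday": dt.weekday()}  # weekday: Monday = 0


def text_features(text: str, num_images: int) -> dict:
    """Count-based text features plus the Boolean has-image flag."""
    return {
        "num_hashtags": len(HASHTAG_RE.findall(text)),
        "num_urls": len(URL_RE.findall(text)),
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_image": num_images > 0,
    }
```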

Feature Design (2)
- User-based features:
  - Number of unreliable news items
  - Number of reliable news items
  - Ratio between the two counts, indicating the user's sharing behavior
- All of the above features are standardized by subtracting the mean and scaling to unit variance, except for the Boolean feature.
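
The user-based features and the standardization step might look like the sketch below. The slide does not say which way the ratio is taken, so `unreliable / (reliable + 1)` (with add-one smoothing to avoid division by zero) is an assumption, as is computing the counts from training labels only. Standardization uses the population standard deviation, matching "subtract the mean, scale to unit variance"; the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def user_features(labels_by_user: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Per-user [#unreliable, #reliable, ratio] from training labels (1 = unreliable).

    The ratio direction and the +1 smoothing are assumptions.
    """
    feats = {}
    for user, labels in labels_by_user.items():
        n_unreliable = sum(labels)
        n_reliable = len(labels) - n_unreliable
        feats[user] = [float(n_unreliable), float(n_reliable), n_unreliable / (n_reliable + 1)]
    return feats


def standardize(values: List[float]) -> List[float]:
    """Subtract the mean and scale to unit (population) variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]
```

In practice `standardize` would be applied column by column (e.g., to every post's ratio value), with the mean and standard deviation estimated on the training set and reused on the test set.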

Multi-Image Posts
- Some posts contain more than one image.
- Two strategies:
  - Use one image as input
  - Use multiple images (at most 4) as input
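
The multi-image strategy needs a fixed-size input. The sketch below keeps at most four image feature vectors per post, zero-pads the remaining slots, and returns a mask marking real images. Keeping images in post order, zero padding, and the `select_images` name are assumptions; setting `max_images=1` gives the single-image strategy.

```python
from typing import List, Tuple

MAX_IMAGES = 4  # "4 images at most", from the slide


def select_images(
    image_feats: List[List[float]], dim: int, max_images: int = MAX_IMAGES
) -> Tuple[List[List[float]], List[int]]:
    """Keep the first `max_images` image vectors, zero-pad the rest, return a mask.

    Keeping images in post order and zero padding are assumptions.
    """
    kept = [list(v) for v in image_feats[:max_images]]
    n_pad = max_images - len(kept)
    padded = kept + [[0.0] * dim for _ in range(n_pad)]
    mask = [1] * len(kept) + [0] * n_pad  # 1 = real image, 0 = padding
    return padded, mask
```

The mask lets a downstream attention or pooling step ignore padded slots, so posts with zero images and posts with four images can share one batch shape.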

Proposed Ensemble Model
- Choose the two best models among the three models.
- Average the probabilities returned by the two models.
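
The ensemble rule (keep the two best models, average their probabilities) is short to express. The sketch below ranks models by a per-model AUC dictionary; which data split the authors used to choose the two models is not stated, so the split feeding `auc` is an assumption, and `ensemble_top2` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def ensemble_top2(auc: Dict[str, float], probs: Dict[str, List[float]]) -> Tuple[List[str], List[float]]:
    """Pick the two models with the highest AUC and average their probabilities."""
    best_two = sorted(auc, key=auc.get, reverse=True)[:2]
    first, second = (probs[name] for name in best_two)
    averaged = [(p + q) / 2 for p, q in zip(first, second)]
    return best_two, averaged
```

With the private-test AUCs reported on the Results slide (Model 1: 0.939, Model 2: 0.919, Model 3: 0.940), this rule keeps Model 3 and Model 1.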

Experiments & Results
- Evaluation measure: ROC AUC
- We conducted experiments to evaluate:
  - The effect of pre-trained BERT models
  - Text pre-processing strategies
  - The effectiveness of the attention mechanism
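
ROC AUC equals the probability that a randomly chosen positive item is scored above a randomly chosen negative one, with ties counting one half. The quadratic-time sketch below computes exactly that, assuming label 1 marks unreliable news; the `roc_auc` name is hypothetical. For binary labels it should agree with scikit-learn's `roc_auc_score`, which is the more practical choice for large test sets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

Because AUC depends only on the ranking of scores, it is insensitive to the decision threshold, which suits a task where the positive class may be rare.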

PhoBERT vs NlpHUST/vibert4news
- bert4news uses syllable-based tokenization
  - Trained on 20GB of news texts
- PhoBERT uses word-level/subword tokenization
  - Trained on 20GB of texts including Wikipedia and news

Pre-trained model | Result on private test (AUC)
PhoBERT           | 0.921
bert4news         | 0.928

Effectiveness of Attention Mechanism
- Using the attention mechanism substantially improved the result (AUC 0.928 to 0.940).
- Images and texts are correlated:
  - Images and texts of reliable news are often related.
  - Some users may attach images unrelated to the news content for click-bait purposes.

Model             | Result on private test (AUC)
w/o attention     | 0.928
with attention    | 0.940

Incorrect vs. Correct Word Forms
- “sá.thại” vs. “sát hại” (“to murder”)
  - Incorrectly written forms often carry violent content or extreme words.
  - They can bypass social media filtering functions.
- Keeping the incorrect forms is better:
  - They partly reflect the sentiment of the text.
  - Unreliable content tends to use more subjective or extreme words to convey a particular perspective.

Model (PhoBERT)              | Result on private test (AUC)
Words normalized to correct form | 0.918
Words kept in incorrect form     | 0.921

Results
Results on the private test:

Run      | Result on private test (AUC)
Model 1  | 0.939
Model 2  | 0.919
Model 3  | 0.940
Ensemble | 0.945

Future Work
- Use external data for fake news detection.
- A natural way to judge whether news is fake is to compare it against different information sources and find relevant evidence.

Thank you very much for listening!