A Multimodal Ensemble Model for Detecting Unreliable Information on Vietnamese SNS
Phạm Quang Nhật Minh
AImesoft JSC, Vietnam
minhpham0902@gmail.com
December 18, 2020
Nguyễn Mạnh Đức Tuân
Toyo University, Japan
ductuan024@gmail.com
7th International Workshop on
Vietnamese Language and Speech Processing (VLSP 2020)
2. What is Fake News?
• "Fake news is a news article that is intentionally and verifiably false." (Shu et al., 2017)
3. Why Fake News Detection?
• Fake news negatively affects society
• Fake news spreads like a real virus, especially via social media
  ◦ https://engineering.stanford.edu/magazine/article/how-fake-news-spreads-real-virus
• Fake news detection helps increase the credibility of information on social media and prevents the spread of fake content
4. Why is Multimodality Important?
• In addition to text, images and videos are popular on social media
  ◦ Visual information is helpful in detecting rumors
• Other metadata is also useful: number of likes, shares, retweets, timestamps, etc.
6. Main Findings
• The proposed attention mechanism used to obtain image representations is useful
• Adding residual connections in blocks improves performance
• System accuracy is further improved with our proposed ensemble model
7. Proposed Method in Detail
• Data processing
• Model architecture
• Experiments and results
8. Data Format
• Each piece of information includes 6 main attributes:
  ◦ The anonymized id of the owner
  ◦ Text content
  ◦ Timestamp
  ◦ Number of likes
  ◦ Number of comments
  ◦ Number of shares
• Each news item may contain zero, one, or multiple images
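The six attributes above can be pictured as a simple record; the field names below are illustrative, not the actual corpus schema:

```python
# A hypothetical post record with the six main attributes plus attached
# images (field names are assumptions for illustration only).
post = {
    "user_id": "anon_12345",          # anonymized id of the owner
    "text": "Tin nóng về COVID-19 ...",
    "timestamp": 1588291200,           # Unix time; may be missing
    "num_likes": 120,
    "num_comments": 14,
    "num_shares": 33,
    "images": [],                      # zero or more attached images
}
```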
9. Text Pre-processing
• Convert emojis such as =]] and :( into the Vietnamese sentiment words for "happy" or "sad".
• Convert lengthened words and tokens into their short form.
  ◦ "coool" to "cool"
• Normalize different terms for COVID-19 into one term for consistency.
  ◦ "covid", "ncov"
10. Data Imputation
• Missing values are filled with mean values.
• For the timestamp, we applied the MICE imputation method (Azur et al., 2011)
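Mean imputation is the simpler of the two strategies and can be sketched directly (MICE instead iteratively regresses each feature on the others, e.g. scikit-learn's IterativeImputer, and is not shown here):

```python
def mean_impute(column):
    """Fill missing values (None) in a numeric column with the column mean."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

likes = [10, None, 20, None, 30]
print(mean_impute(likes))  # -> [10, 20.0, 20, 20.0, 30]
```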
11. General Model
Given the representations of an image and a text, we learn which parts of the image should receive more attention
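A minimal sketch of this idea, assuming dot-product attention over image-region vectors guided by a text vector (the exact scoring function used by the authors may differ):

```python
import math

def attend(text_vec, image_regions):
    """Weight image-region vectors by softmaxed dot-product similarity
    to the text vector, then return their weighted sum."""
    scores = [sum(t * r for t, r in zip(text_vec, region))
              for region in image_regions]
    m = max(scores)                         # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(image_regions[0])
    return [sum(w * region[d] for w, region in zip(weights, image_regions))
            for d in range(dim)]
```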
12. Model 1
1D-CNN layers with filter sizes 2, 3, 4, and 5 follow the BERT module; a fully connected layer with Batch Normalization follows the 1D-CNN layers
13. Models 2 & 3
Models 2 & 3 use three additional 1D-CNN layers
Model 3 adds residual connections to the additional 1D-CNN layers
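The residual connection in Model 3 is just "layer output plus input"; a toy illustration (the real layers are 1D-CNNs, here replaced with a stand-in function for clarity):

```python
def residual(block, x):
    """Apply a layer and add the input back (same-shape skip connection)."""
    y = block(x)
    return [a + b for a, b in zip(y, x)]

# Toy stand-in "layer": halve every element. With the skip connection
# the input signal is preserved even when the layer's output is small.
halve = lambda v: [0.5 * e for e in v]
print(residual(halve, [2.0, 4.0]))  # -> [3.0, 6.0]
```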
14. Feature Design (1)
• The timestamp feature is converted into:
  ◦ Day
  ◦ Month
  ◦ Year
  ◦ Hour
  ◦ Weekday
• Text-based features:
  ◦ Number of hashtags
  ◦ Number of URLs
  ◦ Number of characters
  ◦ Number of words
  ◦ Number of question marks
  ◦ Number of exclamation marks
  ◦ A Boolean variable indicating whether the post contains images
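The timestamp decomposition and text-based counts above could be computed as follows; the regexes for hashtags and URLs are simple assumptions:

```python
import datetime
import re

def extract_features(timestamp, text, has_image):
    """Decompose a Unix timestamp and count surface text features."""
    dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    return {
        "day": dt.day, "month": dt.month, "year": dt.year,
        "hour": dt.hour, "weekday": dt.weekday(),   # Monday = 0
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
        "num_chars": len(text),
        "num_words": len(text.split()),
        "num_question_marks": text.count("?"),
        "num_exclamation_marks": text.count("!"),
        "has_image": has_image,
    }
```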
15. Feature Design (2)
• User-based features:
  ◦ Number of unreliable news posts
  ◦ Number of reliable news posts
  ◦ The ratio between the two numbers, indicating the user's sharing behavior
• All the above features are standardized by subtracting the mean and scaling to unit variance, except for the Boolean feature.
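The standardization step (z-scoring, equivalent to scikit-learn's StandardScaler with population variance) is a one-liner per feature; a minimal sketch:

```python
def standardize(values):
    """Subtract the mean and scale to unit variance (z-score)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0   # guard against zero variance
    return [(v - mean) / std for v in values]
```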
16. Multi-Image Posts
• Some posts contain more than one image
• Two strategies:
  ◦ Use a single image as input
  ◦ Use multiple images (at most 4) as input
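The multi-image strategy can be sketched as capping the image list at four; zero-padding to a fixed size is an assumption here, not necessarily the authors' handling of posts with fewer images:

```python
MAX_IMAGES = 4

def select_images(image_vecs, dim):
    """Keep at most MAX_IMAGES image vectors; pad with zero vectors
    so every post yields a fixed-size input."""
    chosen = image_vecs[:MAX_IMAGES]
    padding = [[0.0] * dim for _ in range(MAX_IMAGES - len(chosen))]
    return chosen + padding
```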
17. Proposed Ensemble Model
• Choose the two best of the three models
• Average the probabilities returned by the two models
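The ensemble step above is a plain probability average over the two chosen models:

```python
def ensemble(probs_a, probs_b):
    """Average the per-post probabilities returned by the two best models."""
    return [(a + b) / 2 for a, b in zip(probs_a, probs_b)]
```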
18. Experiments & Results
• Evaluation measure: ROC AUC
• We conducted experiments to evaluate:
  ◦ The effect of pre-trained BERT models
  ◦ Text preprocessing strategies
  ◦ The effectiveness of the attention mechanism
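ROC AUC, the evaluation measure used here, equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one; a minimal pairwise sketch (quadratic in the number of examples, unlike library implementations):

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: fraction of (positive, negative)
    pairs where the positive is scored higher (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
```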
19. PhoBERT vs. NlpHUST/vibert4news
• bert4news uses syllable-based tokenization
  ◦ Trained on 20GB of news texts
• PhoBERT uses word-level/subword tokenization
  ◦ Trained on 20GB of texts including Wikipedia and news

  Pre-trained model | Result on private test (AUC)
  PhoBERT           | 0.921
  bert4news         | 0.928
20. Effectiveness of the Attention Mechanism
• Using the attention mechanism significantly improved the result
• Images and texts are correlated.
  ◦ Images and texts of reliable news are often related
  ◦ Someone may use images unrelated to the content of the news for click-bait purposes

  Model         | Result on private test (AUC)
  w/o attention | 0.928
  attention     | 0.940
21. Incorrect vs. Correct Word Forms
• "sá.thại" vs. "sát hại"
  ◦ Such misspellings appear in violent content or with extreme words.
  ◦ They can bypass social media filtering functions.
• Keeping the incorrect form is better!
  ◦ It partly reflects the sentiment of the text.
  ◦ Unreliable content tends to use more subjective or extreme words to convey a particular perspective.

  Model (PhoBERT)         | Result on private test (AUC)
  Words in correct form   | 0.918
  Words in incorrect form | 0.921
22. Results
• Results on the private test

  Run      | Result on private test (AUC)
  Model 1  | 0.939
  Model 2  | 0.919
  Model 3  | 0.940
  Ensemble | 0.945
23. Future Work
• Use external data for fake news detection
• The natural way to make a judgement in fake news detection is to compare against different information sources to find relevant evidence of fake news.