Ensemble based method for the classification of flooding event using social media data

An ensemble based method for the classification of flooding
event using social media data
Muhammad Hanif, Huzaifa Joozer, Muhammad Atif Tahir, Muhammad Rafi
Flood related multimedia: MediaEval 2020,
December 14-15 2020, Online

Table of Contents
 Introduction
 Approach
 Visual Data Processing
 Text Data Processing
 Ensemble (Visual and Textual Data)
 Results
 Challenges and Future work

Introduction
 Flood-related Multimedia Task at MediaEval 2020
 Goal:
 Analyze Tweets (Visual and Textual data) for classification of flooding
event [1]
 Challenges:
 Imbalance Dataset (4:1 ratio for majority to minority class)

Approach: 1. Visual Data
A. Selection of N input samples
 Data Augmentation: To oversample the minority class, three augmented copies of each
image in minority class are created through Augmentor library [2].
 Stratified Random Sample Selection: Equal-sized random samples are selected from
minority and majority class of the train-set.
 Different samples (N=15) are created, so that same quantity of training models could be
trained.

Approach: 1. Visual Data
B. Parallel training of N Models
 Each training model receives its respective input of Randomly selected input
images
 Each model has utilized Visual Geometry Group (VGG16) [3] model, pre-trained on
hybrid dataset of ImageNet [4] and Places365 [5] dataset.
C. Prediction on Test-data (Majority Voting)
 As N=15 number of models are trained during the training stage, so each of the
model is utilized for the prediction of test-data, hence N number of predictions are
retrieved.
 Majority voting is applied on N predictions to decide final prediction for test-set.

Approach: 2. Textual Data
A. Translation: Translation of each tweet is performed by using googletrans [6].
B. Oversampling: Population of minority class in increased by Oversampling.
C. TF-IDF Calculation: TF-IDF is calculated for the tweets.
D. Training: Different classifiers are tried but Multinomial Naïve Bayes has produced
better results during training, so it was selected for the prediction.
E. Prediction: Trained Classifiers are utilized to perform prediction on test-set.

Approach: 3. Ensemble of Visual & Textual Data
 Average of probabilities
 Final outcome of visual and textual data is retrieved in the form of probabilities
 Average of both out come is calculated.
 Prediction of class
 Class of each instance of test-set is prediction on the basis of average probability

Results
Task Type Prediction on Test-set Average outcome
Run 1: Fusion of text and visual data 27.86% 22.13%
Run 2: Text 36.31% 35.71%
Run 3: Visual 20.76% 13.18%
 F1 evaluation results on Test-set Dataset

Challenges and Future Work
 The research has proposed ensemble based approach which has
combined deep learning and shallow learning based methods, for
the classification of flooding incident by using text and visual data
extracted from social media data.
 In future more data may be added for the effective training of
deep learning model.
 More deep leaning based pre-trained models may be test to
improve the results.

References
1. Stelios Andreadis, Ilias Gialampoukidis, Anastasios Karakostas, Stefanos Vrochidis, Ioannis Kompatsiaris,
Roberto Fiorin, Daniele Norbiato, and Michele Ferri. 2020. The Flood-related Multimedia Task at
MediaEval 2020. (2020)
2. Marcus D Bloice, Christof Stocker, and Andreas Holzinger. 2017. Augmentor: an image augmentation
library for machine learning. arXiv preprint arXiv:1708.04680 (2017)
3. Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556 (2014).
4. Ryan Lagerstrom, Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert Power, Bella Robinson, and Tomasz
Bednarz. 2016. Image classification to support emergency situation awareness. Frontiers in Robotics and
AI 3 (2016), 54.
5. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale
hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition. IEEE,
248–255.
6. Suhun Han. 2020. googletrans 3.0.0. https://pypi.org/project/googletrans/. (2020). Accessed: 2020-11-1

Ensemble based method for the classification of flooding event using social media data

More Related Content

Similar to Ensemble based method for the classification of flooding event using social media data

More from multimediaeval

Recently uploaded

Ensemble based method for the classification of flooding event using social media data