Proposal of a Terrorist Detection Model in Social Networks
Master’s thesis defense
Presented by: Wajdi Khattel on 07.12.2019
2018 / 2019
Before a jury composed of:
● President: Najet AROUS
● Evaluator: Olfa EL MOURALI
● Academic supervisor: Ramzi GUETARI
● Laboratory supervisor: Nour El Houda BEN CHAABENE
Outline
1. Introduction
2. Existing Works
3. Proposed Model
4. Implementation & Results
5. Conclusion & Perspective
1. Introduction
Context
▰The emergence of social networks has made communication easier
▰Uses of social networks differ: friendly vs. harmful
▰Terrorists are among the most dangerous categories of users
▰Detecting these users is important
Problem Statement
▰Terrorists tend to hide their abnormal behavior
▰A normal user could adopt terrorist behavior
▰The socio-cultural definition of a terrorist could change over time
⇒ Time is important
Objective
▰Propose a terrorist detection model
▻Consider changes in a user’s behavior over time
▻Consider changes in the definition of that behavior over time
▰Cover the limitations of existing models
2. Existing Works
Input Format
Anomaly Detection
| Paper | Input Format | Description | Multiple Social Networks | Multiple Input Data Types | User’s Behavior Change | Behavior Definition Change |
|---|---|---|---|---|---|---|
| Lashakry et al., 2019 | Activity | Proposal of a model for user profile creation to monitor users | ✓ | ✓ | ✗ | ✗ |
| Zamanian et al., 2019 | Activity | Proposal of a model for user activity pattern recognition | ✗ | ✓ | ✓ | ✗ |
| Bhattacharjee et al., 2017 | Graph | Proposal of a probabilistic anomaly classifier model | ✗ | ✗ | ✓ | ✓ |
| Chen et al., 2018 | Graph | Proposal of a user profiling framework that can be used to detect anomalous users | ✗ | ✓ | ✗ | ✗ |
Proposed Input Format
Hybrid Input Format:
▰Graph input
▰Activity-based score for each node
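The hybrid input format could be sketched as a graph whose nodes also carry an activity-based score. This is a minimal illustration only: the class name `HybridGraph`, its methods, the user names, and the score values are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HybridGraph:
    """Graph input where every node also carries an activity-based score."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)      # adjacency list
    activity_score: Dict[str, float] = field(default_factory=dict)

    def add_user(self, user: str, score: float) -> None:
        # Register the node and attach its activity-based score.
        self.edges.setdefault(user, set())
        self.activity_score[user] = score

    def connect(self, a: str, b: str) -> None:
        # Undirected link; both users must have been added first.
        self.edges[a].add(b)
        self.edges[b].add(a)

    def neighbour_scores(self, user: str) -> Dict[str, float]:
        # Activity scores of the users directly connected to `user`.
        return {n: self.activity_score[n] for n in self.edges[user]}


# Hypothetical users and scores, for illustration only.
g = HybridGraph()
g.add_user("alice", 0.10)
g.add_user("bob", 0.85)
g.connect("alice", "bob")
```

Keeping the score on the node lets graph-based and activity-based views be read from one structure.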
Terrorism Detection
Alvari et al. (2019)
- Different data collection methods
- Textual-content data features
Chitrakar et al. (2016)
- Advantages of using a Convolutional Neural Network (CNN)
- Advantages of using the transfer learning technique
Kalpakis et al. (2019)
- Advantages of using multidimensional networks
- Social network analysis methodologies
3. Proposed Model
Proposed Model
▰Model Input: Multidimensional Network
▰Three sub-models:
▻Text classification model
▻Image classification model
▻General Information classification model
▰Decision Making
Model Input: Multidimensional Network
▰Nodes: Users
▰Dimensions: Users’ social media content
▰Edges: Connections between users on a given dimension
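One way to picture a multidimensional network is as a set of users plus a separate edge set per dimension. The sketch below is illustrative only: the class `MultidimensionalNetwork`, its methods, and the dimension names `"text"` and `"image"` are assumptions, not the structure used in the thesis.

```python
from collections import defaultdict


class MultidimensionalNetwork:
    """Users as nodes, with one undirected edge set per dimension."""

    def __init__(self):
        self.users = set()
        # dimension name -> set of edges, each edge a frozenset of two users
        self.edges = defaultdict(set)

    def add_edge(self, dimension, user_a, user_b):
        self.users.update((user_a, user_b))
        self.edges[dimension].add(frozenset((user_a, user_b)))

    def neighbours(self, user, dimension):
        # Users connected to `user` on the given dimension only.
        result = set()
        for edge in self.edges.get(dimension, ()):
            if user in edge:
                result |= edge - {user}
        return result

    def dimensions_linking(self, user_a, user_b):
        # Every dimension on which the two users are connected.
        pair = frozenset((user_a, user_b))
        return sorted(d for d, es in self.edges.items() if pair in es)


# Hypothetical users and dimension names.
net = MultidimensionalNetwork()
net.add_edge("text", "u1", "u2")
net.add_edge("image", "u1", "u2")
net.add_edge("text", "u1", "u3")
```

Two users can therefore be linked on one dimension and unrelated on another, which a single flat graph cannot express.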
Text Classification Model (TCM)
▰Input: Textual data
▰Process:
▻Natural Language Processing
▻Word Embedding
▻Machine Learning classification
▰Output: Score
TCM: Natural Language Processing
▰Objective: Enable the machine to understand human language
▰Process:
▻Morphological analysis
▻Syntactic analysis
▻Semantic analysis
TCM: Word Embedding
▰Objective: Represent text numerically while preserving its semantics
▰Process:
▻Term Frequency-Inverse Document Frequency (TF-IDF)
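As a rough illustration of TF-IDF, the sketch below weights each term by its frequency in a document and by how rare it is across documents. It assumes whitespace tokenization and the plain variant tf = count / document length, idf = log(N / df); the slides do not state which variant or library was used, and the function name `tf_idf` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Libraries usually add smoothing and normalisation, so their numbers differ.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample sentences.
docs = ["the attack was planned", "the match was fun"]
vectors = tf_idf(docs)
```

Terms that appear in every document, such as "the" here, get a weight of zero, while terms specific to one document get a positive weight.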
Image Classification Model (ICM)
▰Input: Image data
▰Process:
▻Use a pre-trained convolutional neural network model
▻Add new convolutional layers
▰Output: Score
ICM: CNN Architecture
[Figure: CNN architecture; output classes: Terrorist, Not Terrorist]
General Information Classification Model (GICM)
▰Input: General information data
▰Process:
▻If the data is non-numerical ⇒ encode it
▻Machine Learning classification
▰Output: Score
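The encoding step could look like the following sketch, which assumes one-hot encoding; the slides do not say which encoding was used. The function `one_hot_encode` and the field names `age` and `employment` are hypothetical and are not taken from the PIRUS dataset.

```python
def one_hot_encode(records, categorical_fields):
    """Turn dict records into numeric rows.

    Numeric fields are kept as floats; each categorical field is replaced
    by one 0/1 indicator column per observed category (sorted by name).
    """
    categories = {f: sorted({r[f] for r in records}) for f in categorical_fields}
    numeric_fields = [k for k in records[0] if k not in categorical_fields]
    rows = []
    for r in records:
        row = [float(r[k]) for k in numeric_fields]
        for f in categorical_fields:
            row.extend(1.0 if r[f] == c else 0.0 for c in categories[f])
        rows.append(row)
    return rows


# Hypothetical records, for illustration only.
profiles = [
    {"age": 25, "employment": "student"},
    {"age": 40, "employment": "employed"},
    {"age": 31, "employment": "student"},
]
features = one_hot_encode(profiles, ["employment"])
```

Each row then holds the age followed by the indicator columns for "employed" and "student", so a classifier that expects numeric input can consume it.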
Decision Making
▰Input: Scores of the 3 sub-models
▰Process:
▻Calculate the user score
▻Classify it based on a threshold
▰Output: User category (terrorist or not)
▰TCM, ICM, and GICM each pass a weighted score (S1, S2, S3) to Decision Making:
▻S1 = Score1 * Weight1
▻S2 = Score2 * Weight2
▻S3 = Score3 * Weight3
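A minimal sketch of the decision step follows. The slides give S1, S2, and S3 but not how they are combined, so summing them into the user score is an assumption here, as are the function name `classify_user`, the example scores, the weights, the threshold, and the `>=` comparison.

```python
def classify_user(scores, weights, threshold):
    """Combine sub-model scores into a user score and compare to a threshold.

    Assumption: the user score is the sum of the weighted scores
    S_i = Score_i * Weight_i.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    weighted = [s * w for s, w in zip(scores, weights)]
    user_score = sum(weighted)
    return user_score, user_score >= threshold


# Hypothetical TCM, ICM, and GICM scores with hypothetical weights.
user_score, is_flagged = classify_user(
    scores=(0.9, 0.6, 0.3),
    weights=(0.5, 0.3, 0.2),
    threshold=0.5,
)
```

With these illustrative values the user score is about 0.69, which is above the threshold, so the user would be placed in the positive category.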
Model Workflow
4. Implementation & Results
Data Collection
▰Offline data: Data used for model training
▻Textual data: Tweets from banned Twitter accounts
▻Image data: Images from Google Images
▻General information data: PIRUS dataset
▰Online data: Data used for testing and live usage
▻Facebook Graph API
▻Instagram REST API
▻Twitter REST API
TCM: NLP + Word Embedding
TCM: Training Data
| Label | Number of samples |
|---|---|
| Positive labels | 122619 |
| Negative labels | 181691 |
| Total data | 304310 |
TCM: Classification Model
| Model Name | Accuracy | F1-Score | Training Time |
|---|---|---|---|
| Logistic Regression | 0.9726 | 0.9674 | 39.9 s |
| SVM | 0.9626 | 0.9548 | 6 h 48 min 33 s |
| Neural Network | 0.9774 | 0.9719 | 1 min 11 s |
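For reference, accuracy and F1-score for a binary classifier can be computed as in the sketch below. It assumes label 1 is the positive class and that F1 is reported for that class; the slides do not state the averaging used. The function name and the toy labels are hypothetical and are not the thesis data.

```python
def accuracy_and_f1(y_true, y_pred):
    """Accuracy and positive-class F1 from two equal-length 0/1 label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Toy labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, because accuracy alone can look high even if many positive samples are missed.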
ICM: Training Data
| Label | Number of samples |
|---|---|
| Positive labels | 219 |
| Negative labels | 314 |
| Total data | 533 |
ICM: Data Augmentation
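Data augmentation enlarges a small image set (533 images here) by adding transformed copies of each image. The sketch below shows the idea on a tiny image stored as a nested list; the choice of flips and a 90-degree rotation is an assumption, since the slides do not list the transformations used, and all function names are hypothetical.

```python
def horizontal_flip(image):
    # Mirror each row left to right.
    return [row[::-1] for row in image]


def vertical_flip(image):
    # Reverse the order of the rows (top to bottom), copying each row.
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    # Reverse the rows, then transpose.
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image followed by three transformed copies."""
    return [
        image,
        horizontal_flip(image),
        vertical_flip(image),
        rotate_90_clockwise(image),
    ]


# A 2x2 "image" of pixel values, for illustration only.
image = [[1, 2], [3, 4]]
variants = augment(image)
```

Each source image yields four training samples in this sketch; real pipelines typically apply such transformations randomly during training.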
ICM: Classification Model
(DA: data augmentation, TL: transfer learning)
| Model Name | Accuracy | F1-Score | Training Time |
|---|---|---|---|
| CNN | 0.7631 | 0.7219 | 3 min 50 s |
| CNN + DA | 0.7781 | 0.7463 | 4 min 12 s |
| CNN + TL | 0.8291 | 0.8103 | 8 min 48 s |
| CNN + DA + TL | 0.8571 | 0.8454 | 9 min 23 s |
GICM: Training Data
| Label | Number of samples |
|---|---|
| Positive labels | 114 |
| Negative labels | 126 |
| Total data | 240 |
GICM: Classification Model
| Model Name | Accuracy | F1-Score | Training Time |
|---|---|---|---|
| Logistic Regression | 0.7650 | 0.7873 | 5 s |
| SVM | 0.8300 | 0.8495 | 7 s |
| Neural Network | 0.8173 | 0.8325 | 48.6 s |
Proposed Model
▰Text Classification Model: Neural Network
▰Image Classification Model: CNN + DA + TL
▰General Information Classification Model: SVM
Results
5. Conclusion & Perspective
Conclusion
▰Proof of concept of a terrorist detection model
▰Works with multiple social networks and multiple data types
▰Supports behavior change over time
Perspective
▰Graph analysis
▰Support more data types: video
▰Train on more data
Thank you for your attention!
