How AI is preventing account fraud at web scale

Amir Moghimi
Co-founder & CTO
crossclassify.com
1

Byron Bay data breach victim told to pay Adidas, National Basketball Association $1.2m by US courts
"The charges were cybersquatting, trademark infringement, IP infringement, things I don't know anything about."
ABC North Coast / 25 July 2023
2

3
■ In a survey run by the AIC, 47% of respondents in 2023 had experienced at least one cybercrime in the 12 months prior to the survey.
■ 20% of these cybercrimes were identity crime and misuse. *
■ No surprise, given the Optus, Latitude and Medibank data breaches.
* Australian Institute of Criminology

Running an online service
User interactions: Sign-up, Sign-in
Range of fraud patterns
Accelerate user acquisition and forget about fraud
4

AI-powered fraud detection and prevention system
Understanding Online Accounts
Entity Description
Fraud Detection Methods
Evaluation
5


Detecting important entities
Account, Device, IP
7

Profile Construction
Account, Device, IP
8

More than 65 features for Account
9

More than 70 features for Device
10

More than 40 features for IP
11
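The slides do not list the individual profile features, so the following is only a minimal sketch of what profile construction could look like. The event layout and the feature names (`device_count`, `ip_count`, `account_count`) are assumptions made for illustration; they are not the product's actual feature set.

```python
from collections import defaultdict

# Hypothetical login events: (account_id, device_id, ip_address).
events = [
    ("acc-1", "dev-a", "203.0.113.5"),
    ("acc-1", "dev-a", "203.0.113.5"),
    ("acc-1", "dev-b", "203.0.113.9"),
    ("acc-2", "dev-c", "198.51.100.7"),
    ("acc-3", "dev-c", "198.51.100.7"),
]

def build_profiles(events):
    """Aggregate raw events into one small profile per Account, Device and IP."""
    acc_devices, acc_ips = defaultdict(set), defaultdict(set)
    dev_accounts, ip_accounts = defaultdict(set), defaultdict(set)
    for account, device, ip in events:
        acc_devices[account].add(device)
        acc_ips[account].add(ip)
        dev_accounts[device].add(account)
        ip_accounts[ip].add(account)
    return {
        "account": {
            a: {"device_count": len(acc_devices[a]), "ip_count": len(acc_ips[a])}
            for a in acc_devices
        },
        "device": {d: {"account_count": len(s)} for d, s in dev_accounts.items()},
        "ip": {i: {"account_count": len(s)} for i, s in ip_accounts.items()},
    }

profiles = build_profiles(events)
print(profiles["account"]["acc-1"])
```

Each entity gets its own profile, so the same stream of events can be examined from the account side, the device side and the IP side.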


Deviation from normal behavior
13

Statistical, Supervised, Unsupervised
14

Statistical: Z-score, Box-whisker
Unsupervised: Clustering, Collective Outliers
Supervised: Outlier Score
15


Statistical
[Figure: Box-Whisker and Z-Score plots of Device Count, each marking an outlier]
Outlier: an account with 9 distinct devices
17
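As a rough illustration of the two statistical checks, the sketch below applies a Z-score rule and a box-whisker (IQR) rule to per-account device counts. The sample data, the Z-score threshold of 3 and the 1.5 × IQR whisker length are conventional choices assumed here; the slides do not state the thresholds actually used.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

def box_whisker_outliers(values, k=1.5):
    """Return values outside the whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical distinct-device counts for 20 accounts; one account has 9 devices.
device_counts = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 1, 2, 1, 2, 2, 1, 3, 2, 9]

print(zscore_outliers(device_counts))
print(box_whisker_outliers(device_counts))
```

On this sample both rules flag only the account with 9 devices. On heavily skewed data the two can disagree, because the Z-score is itself pulled by extreme values while the quartiles are not.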

Unsupervised
[Figure: accounts plotted by Dev Count and IP Count]
• Accounts with normal behavior
• One account with near-normal behavior
• One account with more than 20 distinct devices and 10 different IPs (far from normal accounts)
• Account sharing, as in Netflix, Spotify, gaming apps
18
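The slide does not name a clustering algorithm, so this sketch swaps in a simple k-nearest-neighbour distance score to show the idea: accounts that sit far from their neighbours in the (device count, IP count) plane get a high outlier score. The data points and the choice of `k=3` are assumptions for illustration.

```python
import math

def knn_outlier_scores(points, k=3):
    """Score each point by its mean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical (device_count, ip_count) pairs, one per account.
accounts = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (4, 4), (22, 10)]

scores = knn_outlier_scores(accounts)
# Highest score first: the most isolated accounts lead the list.
ranked = sorted(zip(scores, accounts), reverse=True)
for score, account in ranked[:2]:
    print(account, round(score, 2))
```

The account with 22 devices and 10 IPs ranks first by a wide margin, and the near-normal account at (4, 4) ranks second with a much smaller score, which is the kind of separation the plot on the slide describes.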

Supervised
Outlier
Residual = Anomaly Score
19
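One way to read "Residual = Anomaly Score" is: fit a model on normal behavior, predict the expected value for a new observation, and use the gap between observed and predicted as the score. The sketch below assumes a one-variable least-squares line and hypothetical features (device count predicting IP count); the slides do not say which model or target is actually used.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def anomaly_score(x, y, slope, intercept):
    """Absolute residual between the observed y and the model's prediction."""
    return abs(y - (slope * x + intercept))

# Hypothetical normal accounts: device count (x) and IP count (y).
train_x = [1, 2, 3, 4, 5]
train_y = [2, 4, 6, 8, 10]
slope, intercept = fit_line(train_x, train_y)

normal_score = anomaly_score(3, 6.5, slope, intercept)   # close to the fitted line
suspect_score = anomaly_score(3, 20, slope, intercept)   # far from the fitted line
print(normal_score, suspect_score)
```

A threshold on the score then separates ordinary deviations from outliers; where to set it is a tuning decision not covered here.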


Evaluation
Quantitative
Qualitative
21

Quantitative: Precision, Recall, F-measure
Qualitative: Adaptability, Explainability, Scalability
22
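For reference, the three quantitative metrics can be computed from true and predicted labels as in this short sketch. The labels are made up for illustration, with 1 meaning fraud, and the F-measure shown is the balanced F1 variant.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 fraud cases, 6 legitimate ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Precision answers "how many flagged actions were really fraud", recall answers "how much of the fraud was caught", and F1 is their harmonic mean.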

Explainability-Accuracy Trade-Off
[Figure: prediction accuracy plotted against explainability (notional) for today's learning techniques: Neural Nets, Deep Learning, Statistical Models, SVMs, AOGs, Ensemble Methods, Random Forests, Decision Trees, Graphical Models, Bayesian Belief Nets, Markov Models, HBNs, MLNs, SRL, CRFs]
New Approach: create a suite of machine learning techniques that produce more explainable models, while maintaining a high level of learning performance
23

Challenges
Imbalanced Dataset
Fraud Patterns Change/Evolve (Cat & Mouse)
Model Selection + Parameter Tuning
24



Imbalanced Dataset
0.001% of actions are fraud
Make it balanced:
• Data Generation
• Data Augmentation
• Sampling: Over Sampling, Under Sampling
27
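The two sampling options can be sketched in a few lines. This is plain random under-sampling and random over-sampling with replacement on made-up records; the function names and record layout are assumptions, and the slides do not say which sampling variant is used in practice.

```python
import random

def under_sample(majority, minority, n_majority, seed=0):
    """Keep a random subset of the majority class and all of the minority class."""
    rng = random.Random(seed)
    return rng.sample(majority, n_majority) + list(minority)

def over_sample(majority, minority, seed=0):
    """Duplicate random minority records until both classes are the same size."""
    rng = random.Random(seed)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority) + list(minority) + extra

# Hypothetical records: (record_id, label) with label 1 = fraud.
non_fraud = [(i, 0) for i in range(1000)]
fraud = [(i, 1) for i in range(5)]

under = under_sample(non_fraud, fraud, n_majority=5)
over = over_sample(non_fraud, fraud)
print(len(under), len(over))
```

Under-sampling discards most of the legitimate traffic, while over-sampling repeats the few fraud records many times, so either choice changes what the model sees and should be applied to training data only.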


Imbalanced Dataset: Ensemble methods
[Figure: a New Sample is passed to Model 1, Model 2 and Model 3, whose outputs are combined into the Final Prediction]
What is the best combination of methods?
Random Forest and XGBoost are best practices
29
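The Model 1 / Model 2 / Model 3 diagram can be illustrated with a simple majority vote. The three rule-based "models" and their thresholds below are placeholders invented for this sketch; they stand in for trained models and are not Random Forest or XGBoost.

```python
from collections import Counter

# Three hypothetical base models; each returns 1 (fraud) or 0 (not fraud).
def model_1(sample):
    return 1 if sample["device_count"] > 5 else 0

def model_2(sample):
    return 1 if sample["ip_count"] > 5 else 0

def model_3(sample):
    return 1 if sample["failed_logins"] > 3 else 0

def ensemble_predict(sample, models):
    """Final prediction is the label that most base models voted for."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]

models = [model_1, model_2, model_3]
suspicious = {"device_count": 9, "ip_count": 2, "failed_logins": 6}
ordinary = {"device_count": 1, "ip_count": 1, "failed_logins": 0}

print(ensemble_predict(suspicious, models), ensemble_predict(ordinary, models))
```

With an odd number of binary voters there are no ties. Bagging and boosting methods such as Random Forest and XGBoost combine many trees in more refined ways than a plain vote.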


Adaptive AI: Dynamic Clustering
Unsupervised methods adaptively detect fraud with unseen patterns
[Figure: stream; a new inlier cluster forms]
31

[Figure: stream; an outlier appears]
32

[Figure: stream; another outlier appears]
33

[Figure: stream; changing the outlier cluster to an inlier one]
34
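The stream figures can be approximated with a very small online clustering sketch: each arriving point joins the nearest cluster within a radius or starts a new one, and a cluster counts as an inlier cluster only once it has collected enough points. The class name, `radius` and `min_points` are assumptions for illustration, not the algorithm used in the talk.

```python
import math

class StreamClusterer:
    """Toy online clustering: sparse clusters are outliers until they grow."""

    def __init__(self, radius=2.0, min_points=3):
        self.radius = radius
        self.min_points = min_points
        self.clusters = []  # each cluster: {"centroid": tuple, "count": int}

    def update(self, point):
        """Absorb one point from the stream and label it 'inlier' or 'outlier'."""
        nearest, best = None, self.radius
        for cluster in self.clusters:
            d = math.dist(point, cluster["centroid"])
            if d <= best:
                nearest, best = cluster, d
        if nearest is None:
            # No cluster is close enough: start a new (initially outlier) cluster.
            nearest = {"centroid": tuple(point), "count": 0}
            self.clusters.append(nearest)
        n = nearest["count"] + 1
        # Incremental mean keeps the centroid up to date without storing points.
        nearest["centroid"] = tuple(
            c + (p - c) / n for c, p in zip(nearest["centroid"], point)
        )
        nearest["count"] = n
        return "inlier" if n >= self.min_points else "outlier"

clusterer = StreamClusterer()
stream = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels = [clusterer.update(p) for p in stream]
print(labels)
```

The points around (10, 10) are first reported as outliers and then turn into an inlier cluster once a third similar point arrives, mirroring the last figure. The very first points are also labelled outliers only because this toy starts empty; a real system would presumably be seeded with historical data.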

Challenges
Imbalanced Dataset: Make it balanced, Ensemble methods
Fraud Patterns Change/Evolve (Cat & Mouse): Adaptive AI
Model Selection + Parameter Tuning: Trial and Error, Proper Modeling
35

Model Selection + Parameter Tuning
Trial and Error / Experience
Implement the Proper Modeling
At the beginning, interpretability and explainability are more important
As we get more data, more complicated methods with large numbers of parameters come into the picture
36

Account Take Over Modeling Example
37

Dataset (Account Take Over)
Account Login features:
✔ Timestamp, IP address, Country, City, User agent, …
✔ Derived features from existing ones
Original dataset consists of:
✔ 34.1 million login attempts (records)
✔ 3.2 million users
✔ 14 features
✔ Non-fraud records: 32,269,123
✔ Fraud records: 181
38

Challenges
Imbalanced Dataset: Under Sampling
Parameter Tuning: Neural Network Modeling
Explainability: Decision Tree
39

Dataset sampling
✔ Non-fraud records: 32,269,123
✔ Fraud records: 181
✔ Too imbalanced for neural network classifiers
✔ Train and test split: 20% test size
✔ Under sampling for showcasing different model types:
• Under sampled the non-fraud records to 1,000
40
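The sampling step can be sketched as follows. The slide does not say whether under-sampling happens before or after the train/test split, so this sketch assumes under-sampling first and then a per-class 80/20 split; the synthetic records and the function name are also assumptions (a 50,000-row stand-in replaces the 32,269,123 non-fraud records).

```python
import random

def under_sample_and_split(non_fraud, fraud, n_non_fraud=1000, test_size=0.2, seed=0):
    """Under-sample the non-fraud class, then split each class into train and test."""
    rng = random.Random(seed)
    sampled = rng.sample(non_fraud, n_non_fraud)

    def split(rows):
        rows = list(rows)
        rng.shuffle(rows)
        cut = round(len(rows) * test_size)
        return rows[cut:], rows[:cut]  # (train, test)

    train_n, test_n = split(sampled)
    train_f, test_f = split(fraud)
    return train_n + train_f, test_n + test_f

# Synthetic stand-in data: (record_id, label) with label 1 = fraud.
non_fraud = [(i, 0) for i in range(50_000)]
fraud = [(i, 1) for i in range(181)]

train, test = under_sample_and_split(non_fraud, fraud)
print(len(train), len(test))
```

Splitting each class separately keeps roughly the same fraud share in both sets, which matters when only 181 fraud records exist.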

Neural Network
#  Layers architecture               Parameter No.  Recall  F1
1  13 – 64 – 1                       961            4%      6%
2  13 – 128 – 1                      1,921          4%      7%
3  13 – 256 – 128 – 1                36,609         11%     19%
4  13 – 128 – 256 – 256 – 128 – 1    133,633        71%     23%
5  13 – 128 – 512 – 256 – 128 – 1    232,193        14%     20%
41
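The parameter counts in the table are consistent with fully connected layers that each carry a bias term, which is the assumption behind this short check: a layer with n_in inputs and n_out outputs contributes n_in × n_out weights plus n_out biases.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases of a fully connected network with these layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Layer architectures from the table (13 inputs, 1 output).
architectures = [
    [13, 64, 1],
    [13, 128, 1],
    [13, 256, 128, 1],
    [13, 128, 256, 256, 128, 1],
    [13, 128, 512, 256, 128, 1],
]

counts = [dense_param_count(a) for a in architectures]
print(counts)  # [961, 1921, 36609, 133633, 232193]
```

The largest network does not give the best recall in the table, so adding parameters alone does not improve the result on this under-sampled dataset.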
