Machine learning techniques in fraud prevention

Digital Marketing Fraud Prevention
using Machine Learning approach
in real-time

Outline:
1. What is digital fraud?;
2. Some kinds of digital fraud;
3. Principles of digital fraud detection;
4. Fraud Prevention issues and their solutions;
5. Machine learning as a solution;
6. Benefits of factorization machines (FM);
7. FM, how it works;
8. TensorFlow FM samples;
9. Conclusions;

What is digital fraud?
• User activities and market manipulations that
affect legitimate business flows and prevent
customer satisfaction;
• We will consider risks for online products,
publishers, and advertisers;

Main kinds of Digital Fraud:
• Marketing fraud (click fraud, anti-competitive
behavior, wasting of advertising budgets)
• Account/Identity fraud (account takeovers, fake
accounts)
• Payment fraud (banking identity theft, credit card
fraud)
• Other types (email fraud, auction fraud, etc.)

Principles of Digital Fraud prevention:
• Using social authentication and cookies from all
parties;
• Using captcha protection;
• Using proofs of humanity (phone numbers,
emails);
• Tracking device parameters /device fingerprint;
• Tracking visitor behavior in detail;
• Tracking IP address and HTTP protocol
parameters;
• Triggers and score-based evaluation;

Tracking everything
• during the test?
• What if the traffic is nonhomogeneous?

Device/Browser Fingerprint:
• How unique is your browser? See http://panopticlick.eff.org
• Browser info from:
http://browserspy.dk,
https://amiunique.org/fp,
https://browserleaks.com
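
A minimal sketch of the fingerprint idea, assuming the visitor's browser attributes have already been collected into a dictionary. The attribute names and values are hypothetical, and hashing a canonical JSON string is just one simple way to turn such attributes into a stable identifier; it is not the method used by the sites listed here.

```python
import hashlib
import json


def device_fingerprint(attributes: dict) -> str:
    """Return a stable SHA-256 hex digest of the collected attributes."""
    # Sort keys so the same attributes always serialize to the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute set; a real collector would gather many more signals.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "Europe/Berlin",
    "language": "en-US",
}
fingerprint = device_fingerprint(visitor)
```

The same attributes give the same digest regardless of the order in which they were collected, while changing any single attribute gives a different one.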

Triggers and rules:
The main approach is fraud scoring:
$\text{Fraud Score} = f(\omega_0 + \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n)$
where $x_i$ is a trigger event {yes, no}
and $\omega_i$ is the trigger's weight (importance).
Triggers:
• Browser’s User agent change;
• Device Emulation;
• Using Anonymization techniques like TOR;
• Using Proxy;
• Visitor is not a human;
• Out-of-date browser versions;
• Using Do-Not-Track option;
• Time zone and language inconsistency;
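
The scoring rule can be sketched in a few lines. The weights, the bias and the choice of the logistic sigmoid for f are all assumptions made for illustration; the slides do not specify them.

```python
import math

# Hypothetical trigger weights (omega_i); real values would be tuned on data.
WEIGHTS = {
    "user_agent_changed": 1.2,
    "device_emulation": 2.0,
    "tor_or_anonymizer": 1.5,
    "proxy": 0.8,
    "not_human": 3.0,
    "outdated_browser": 0.5,
    "do_not_track": 0.2,
    "timezone_language_mismatch": 1.0,
}
BIAS = -3.0  # omega_0, also an assumed value


def fraud_score(triggers: dict) -> float:
    """Compute f(omega_0 + sum_i omega_i * x_i), with f assumed to be a sigmoid."""
    z = BIAS + sum(
        weight * (1.0 if triggers.get(name, False) else 0.0)
        for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


clean_visit = fraud_score({})  # no trigger fired
suspicious_visit = fraud_score(
    {"proxy": True, "device_emulation": True, "not_human": True}
)
```

With these assumed weights, a visit with no triggers scores close to 0 and a visit that fires several heavy triggers scores close to 1.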

Some issues behind digital fraud detection:
• Visitors do not like to wait to be checked;
• Visitors do not like captchas;
• Visitors can spend a short time on a website;
• Visitors like to be anonymous;
• Fraudsters can be smarter than your fraud
prevention solution;

Can these issues be solved?
• Visitors do not like to wait to be checked;
  Real-time checks (machine learning)
• Visitors do not like captchas;
  Use a captcha once, then keep the visitor's identity (fingerprint)
• Visitors can spend a short time on a website;
  Checking should be fast and accurate (machine learning)
• Visitors like to be anonymous;
  No de-anonymization techniques, proof of humanity only
• Fraudsters can be smarter than any fraud prevention solution;
  This is to be expected; there is no solution here

Machine learning approach
“A computer program is said to learn
from experience E with respect to some
task T and some performance measure
P, if its performance on T, as measured
by P, improves with experience E.”
Tom Mitchell,
Carnegie Mellon University
https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer
In FraudHunt we are looking for a
compromise between the
uncertainty and the ability to
apply existing knowledge about
fraud, taking data from the
visitor’s profile.

Considering fraud prevention as a
mathematical task for classification
Let $X$ be a set of object (data sample) features and
$Y$ a set of object class labels.
An unknown target dependency $y^*: X \to Y$ exists,
whose values are known only on the finite number of examples in the
training dataset $X^m = \{(x_1, y_1), \dots, (x_m, y_m)\}$.
An algorithm $a: X \to Y$ is needed to classify
any arbitrary object (data sample) $x \in X$.
Data → Algorithm → Predictive Model

Fraud Predictor
Applying a predictive model
We are using a classification model $\varphi_i(\omega, x)$, where:
$\omega$ is the classification model parameters vector;
$\varphi_i$ is a classifying function;
$x$ is the visitor's profile attributes vector.
Predictive model $\varphi_i(\omega, x)$ output:
• Is a Fraud
• Is Not a Fraud
• Is Unknown

Classifier requirements:
• High computing speed for real-time predictions;
• High training speed for a model building process;
• The ability to learn using high dimensional data (a
visitor’s profile contains dozens of attributes);
• The ability to scale effectively to large data
volumes with a high degree of sparsity;

The suitable algorithms:
• Naive Bayes classifier;
• Logistic regression;
• Decision trees, random forest, boosted
trees;
• Support vector machine (SVM);
• Factorization machine;
• Deep neural networks;

Factorization machine model
The polynomial regression of the 2nd order with n attributes is
$a(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j>i}^{n} \omega_{i,j} x_i x_j, \quad x \in \mathbb{R}^n$
The model contains $\frac{n(n-1)}{2} + n + 1$ unknown parameters.
In the case of the binarization of categorical variables, n can be quite large.
The factorization machine has the following equation:
$y(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j>i}^{n} \langle v_i, v_j \rangle x_i x_j$
where $\omega_0 \in \mathbb{R}$ and $V \in \mathbb{R}^{n \times k}$;
k determines the size of the factorization.
The number of unknown parameters is $k \cdot n + n + 1$.
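
To give a sense of scale for the two parameter counts in this section, here is a small worked example. The sizes n = 10,000 and k = 10 are illustrative assumptions, not figures from the slides.

```python
def poly2_param_count(n: int) -> int:
    """Parameters of 2nd-order polynomial regression: n(n-1)/2 + n + 1."""
    return n * (n - 1) // 2 + n + 1


def fm_param_count(n: int, k: int) -> int:
    """Parameters of a factorization machine: k*n + n + 1."""
    return k * n + n + 1


n, k = 10_000, 10  # illustrative sizes (assumed)
poly_params = poly2_param_count(n)  # 50,005,001
fm_params = fm_param_count(n, k)    # 110,001
```

For these sizes the factorized model has roughly 450 times fewer parameters to estimate.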

What are the significant advantages of the FM?
1. Allows parameters to be estimated effectively on
highly sparse, high-dimensional data.
2. Has linear learning complexity while still capturing
polynomial (pairwise) feature interactions.
3. It is a generalization of recommender models based on
matrix factorization.
4. Combines the advantages of SVM and factorization models.
Reduces complexity from $O(N^2)$ to $O(kN)$ (credit: [5]).
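
The drop in complexity rests on a reformulation of the pairwise term given in Rendle's factorization machines paper: the sum over all pairs i < j of ⟨v_i, v_j⟩ x_i x_j equals 0.5 · Σ_f [(Σ_i v_if x_i)² − Σ_i v_if² x_i²]. The NumPy sketch below checks this numerically on random data; the function names and the sizes n = 8, k = 3 are assumptions for illustration.

```python
import numpy as np


def fm_predict_naive(x, w0, w, V):
    """Direct evaluation: loops over all pairs i < j."""
    n = len(x)
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pairwise += np.dot(V[i], V[j]) * x[i] * x[j]
    return w0 + np.dot(w, x) + pairwise


def fm_predict_fast(x, w0, w, V):
    """Linear-time form: 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]."""
    xv = V.T @ x                  # shape (k,): sum_i v_if * x_i
    x2v2 = (V ** 2).T @ (x ** 2)  # shape (k,): sum_i v_if^2 * x_i^2
    return w0 + np.dot(w, x) + 0.5 * np.sum(xv ** 2 - x2v2)


rng = np.random.default_rng(0)
n, k = 8, 3  # small illustrative sizes (assumed)
x = rng.random(n)
w0 = 0.1
w = rng.normal(size=n)
V = rng.normal(size=(n, k))

naive = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
```

Both functions return the same prediction up to floating-point error, but the second one touches each entry of V only once.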

FM learning. Data featuring
Training set: $\{(x^{(i)}, y^{(i)}) \mid i = 1, \dots, M\}$, $y^{(i)} \in \{0, 1\}$

Metadata features (columns x1 x2 x3 … xN-1 xN, followed by the label):
Device/Env attributes: Device:Mobile, Device:Desktop, Browser:Chrome, Browser:FF, ScreenRes:Wide
Country/Localization features: Country:USA, Country:UK, Country:Fr, …, Language:En, Language:Fr, Language:…
Traffic features: TSource:Search, TSource:Direct, TSource:Refer
Entry features: EntryDay:Mon, EntryDay:Tue, …, EntryHour
Contextual features: Cont:VarA, Cont:VarB, Cont:VarC, Cont:VarX
Label: isFraud

Observation  Device/Env  Country/Localization  Traffic  Entry    Contextual  isFraud
o(1)         1 0 1 0 0   1 0 0 1 0 0           1 0 0    1 0 12   1 0 0 0     0
o(2)         0 1 0 1 1   1 0 0 1 0 0           1 0 0    0 1 20   0 1 0 0     0
o(3)         0 1 0 0 0   0 1 0 0 0 1           0 1 0    0 0 9    0 0 1 0     1
o(4)         1 0 1 0 0   0 0 1 0 1 0           0 1 0    0 0 8    0 0 0 1     0
o(5)         1 0 1 0 0   0 1 0 0 1 0           0 0 1    0 0 15   0 0 0 1     0
o(6)         0 1 0 1 1   0 1 0 0 0 0           0 0 1    1 0 7    0 0 1 0     0
o(7)         1 0 0 0 0   0 0 1 0 1 0           1 0 0    0 0 10   0 0 1 0     1
o(8)         1 0 1 0 0   1 0 0 1 0 0           1 0 0    1 0 2    1 0 0 0     0
o(M-1)       1 0 0 0 0   1 0 0 0 0 0           1 0 1    0 0 11   0 0 0 1     0
o(M)         0 1 0 1 1   0 0 0 0 0 1           0 1 0    0 0 18   0 0 1 0     0
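
A minimal sketch of how a visitor profile could be turned into such a row: each categorical attribute becomes a set of 0/1 columns named Field:Value (as in Device:Mobile or TSource:Search), and numeric attributes such as EntryHour are appended as they are. The helper names and the two sample profiles are hypothetical.

```python
def build_vocabulary(profiles, categorical_fields):
    """Assign a column index to every distinct 'Field:Value' pair."""
    vocab = {}
    for profile in profiles:
        for field in categorical_fields:
            key = f"{field}:{profile[field]}"
            vocab.setdefault(key, len(vocab))
    return vocab


def encode(profile, vocab, categorical_fields, numeric_fields):
    """Return a dense row: one-hot categorical columns, then numeric columns."""
    row = [0.0] * (len(vocab) + len(numeric_fields))
    for field in categorical_fields:
        idx = vocab.get(f"{field}:{profile[field]}")
        if idx is not None:  # values unseen at vocabulary time stay all-zero
            row[idx] = 1.0
    for offset, field in enumerate(numeric_fields):
        row[len(vocab) + offset] = float(profile[field])
    return row


CATEGORICAL = ["Device", "Browser", "Country", "TSource"]
NUMERIC = ["EntryHour"]

# Hypothetical profiles, for illustration only.
profiles = [
    {"Device": "Mobile", "Browser": "Chrome", "Country": "USA",
     "TSource": "Search", "EntryHour": 12},
    {"Device": "Desktop", "Browser": "FF", "Country": "USA",
     "TSource": "Search", "EntryHour": 20},
]
vocab = build_vocabulary(profiles, CATEGORICAL)
X = [encode(p, vocab, CATEGORICAL, NUMERIC) for p in profiles]
```

With many categorical values most entries of each row are zero, which is why a sparse matrix representation is usually preferred in practice.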

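FM model main learning points
1. SGD [Rendle] is used for learning:
$\operatorname{argmin}_{\omega} \sum_i l(\varphi(\omega, x^{(i)}), y^{(i)}) + \lambda \|\omega\|^2$
The loss function for binary classification:
$l(\omega) = -\ln \sigma(\varphi(\omega, x) \cdot y)$, where $\sigma(\alpha) = \frac{1}{1 + e^{-\alpha}}$
(in this form the label is taken as $y \in \{-1, +1\}$).
2. Model quality is rated through auROC and Log-Loss.
3. Threshold post-processing is applied to the predicted score.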
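Implement FM using TensorFlow
import tensorflow as tf  # tf.train.AdamOptimizer is the TensorFlow 1.x API
from tffm import TFFMRegressor, TFFMClassifier

# Option 1: classifier (e.g. fraud / not fraud) for scipy.sparse input
model = TFFMClassifier(
    order=2,           # 2nd-order (pairwise) interactions
    rank=10,           # size k of the factorization
    optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
    n_epochs=100,
    batch_size=-1,     # -1: the whole training set in one batch
    init_std=0.001,
    input_type='sparse'
)

# Option 2: regressor for dense input (replaces the model above if both are run)
model = TFFMRegressor(
    order=2,
    rank=10,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.1),
    n_epochs=100,
    batch_size=-1,
    init_std=0.001,
    input_type='dense'
)

# X_tr, y_tr: training features and targets; X_te: test features
model.fit(X_tr, y_tr, show_progress=True)
predictions = model.predict(X_te)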

FM and Neural Networks
The neural network architecture of Attentional Factorization Machine
Source: https://github.com/hexiangnan/attentional_factorization_machine

Conclusions
• Factorization machines are a powerful tool for fraud
prediction. It is difficult to say which accuracy levels
should be considered acceptable in the real world, as
models built on real data have an average auROC of 0.7…0.75.
• Improving predictive modeling accuracy requires
additional effort from the analysts, which is
often hard to carry out in practice.
• Using a GPU speeds up the training process by up to 3.5-4 times
compared to a CPU (one instance).

Worth your attention:
1. Rendle, S. 2012. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. 3, 3, Article 57 (May 2012), 22 pages. (https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf)
2. Bayer, I. 2016. fastFM: A Library for Factorization Machines. Journal of Machine Learning Research 17, pp. 1-5.
3. Blondel, M., Ishihata, M., Fujino, A., Ueda, N. 2016. Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms. In: Proc. of ICML 2016.
4. http://www.libfm.org
5. https://www.slideshare.net/SparkSummit/spark-summit-eu-talk-by-nick-pentreath
6. https://www.slideshare.net/BartomiejTwardowski/warsaw-data-science-factorization-machines-introduction
7. http://www.jefkine.com/recsys/2017/03/27/factorization-machines
8. https://getstream.io/blog/factorization-machines-recommendation-systems
9. https://www.tensorflow.org/get_started/get_started
GitHub links for Python:
1. https://github.com/jfloff/pywFM (LibFM based)
2. https://github.com/geffy/tffm (TensorFlow based)
3. https://github.com/kopopt/fast_tffm (TensorFlow based)
4. https://github.com/ibayer/fastFM
5. https://github.com/coreylynch/pyFM
6. https://github.com/scikit-learn-contrib/polylearn
7. https://github.com/comadan/FM_FTRL
8. https://github.com/guestwalk/libffm (C++ implementation of the Kaggle-winning Field-Aware FM)
