Digital Marketing Fraud Prevention
using Machine Learning approach
in real-time
Outline:
1. What is digital fraud?;
2. Some kinds of digital fraud;
3. Principles of digital fraud detection;
4. Fraud Prevention issues and their solutions;
5. Machine learning as a solution;
6. Benefits of Factorization machine;
7. FM, how it works;
8. TensorFlow FM samples;
9. Conclusions;
What is digital fraud?
• User activities and market manipulations that
affect legitimate business flows and prevent
customer satisfaction;
• We will consider risks for online products,
publishers, and advertisers;
Main kinds of Digital Fraud:
• Marketing fraud (click fraud, anti-competitive
behavior, wasted advertising budgets)
• Account/Identity fraud (Account takeovers, fake
accounts)
• Payment fraud (banking identity theft, credit card
fraud)
• Other types (email, auction fraud, etc.)
Principles of Digital Fraud prevention:
• Using social authentication and cookies from all
parties;
• Using captcha protection;
• Using proofs of humanity (phone numbers,
emails);
• Tracking device parameters /device fingerprint;
• Tracking visitor behavior in detail;
• Tracking IP address and HTTP protocol
parameters;
• Trigger- and score-based evaluation;
Are you human?
Tracking everything
• during the test?
• What if the traffic is nonhomogeneous?
Device/Browser Fingerprint:
How unique is your browser? See http://panopticlick.eff.org
Browser info from:
http://browserspy.dk
https://amiunique.org/fp
https://browserleaks.com
Triggers and rules:
The main approach is fraud scoring:
$$\text{Fraud Score} = f(\omega_0 + \omega_1 x_1 + \omega_2 x_2 + \dots + \omega_n x_n)$$
where $x_i$ is a trigger event {yes, no} and
$\omega_i$ is the corresponding trigger's weight (importance).
Triggers:
• Browser’s User agent change;
• Device Emulation;
• Using Anonymization techniques like TOR;
• Using Proxy;
• Visitor is not a human;
• Out-of-date browser versions;
• Using Do-Not-Track option;
• Time zone and language inconsistency;
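The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the slides do not say what $f$ is, so the logistic sigmoid is assumed here, and the trigger names, the weights, and the bias are all hypothetical values.

```python
import math

# Hypothetical trigger weights; the slides give no numeric values.
WEIGHTS = {
    "user_agent_change": 1.5,
    "device_emulation": 2.0,
    "tor": 2.5,
    "proxy": 1.0,
    "not_human": 3.0,
    "outdated_browser": 0.5,
    "do_not_track": 0.2,
    "tz_language_mismatch": 1.0,
}
BIAS = -3.0  # hypothetical w0


def fraud_score(fired, weights=WEIGHTS, bias=BIAS):
    """Return f(w0 + sum_i w_i * x_i), where x_i = 1 if trigger i fired.

    f is assumed to be the logistic sigmoid, so the score lies in (0, 1).
    """
    z = bias + sum(w for name, w in weights.items() if name in fired)
    return 1.0 / (1.0 + math.exp(-z))


# A visitor behind TOR who also failed the humanity check scores higher
# than a visitor who merely uses a proxy.
high = fraud_score({"tor", "not_human"})
low = fraud_score({"proxy"})
```

With hand-set weights like these the rule stays easy to audit, but the weights have to be tuned manually, which is one motivation for learning them from data.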
Some issues behind digital fraud detection:
• Visitors do not like to wait to be checked;
• Visitors do not like captchas;
• Visitors can spend a short time on a website;
• Visitors like to be anonymous;
• Fraudsters can be smarter than your fraud
prevention solution;
Can these issues be solved?
• Visitors do not like to wait to be checked:
  real-time checks (machine learning).
• Visitors do not like captchas:
  show a captcha once, then keep the visitor's identity (fingerprint).
• Visitors can spend a short time on a website:
  checks should be fast and accurate (machine learning).
• Visitors like to be anonymous:
  no de-anonymization techniques, proof of humanity only.
• Fraudsters can be smarter than any fraud prevention solution:
  this is to be expected; there is no general solution.
Machine learning approach
“A computer program is said to learn
from experience E with respect to some
task T and some performance measure
P, if its performance on T, as measured
by P, improves with experience E.”
Tom Mitchell,
Carnegie Mellon University
https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer
In FraudHunt we are looking for a
compromise between the
uncertainty and the ability to
apply existing knowledge about
fraud, taking data from the
visitor’s profile.
Considering fraud prevention as a
mathematical classification task
Let $X$ be the set of object (data sample) feature descriptions and
$Y$ the set of class labels.
There is an unknown target dependency $y^*: X \to Y$
whose values are known only on a finite training dataset
$X^m = \{(x_1, y_1), \dots, (x_m, y_m)\}$.
An algorithm $a: X \to Y$ is needed that can classify
any object (data sample) $x \in X$.
Data → Algorithm → Predictive Model
Fraud Predictor
Applying a predictive model
We are using a classification model $\varphi_i(\omega, x)$, where:
$\omega$: the classification model parameter vector;
$\varphi_i$: a classifying function;
$x$: the visitor's profile attribute vector.
The predictive model $\varphi_i(\omega, x)$ returns one of three outcomes:
Is a Fraud
Is Not a Fraud
Is Unknown
Classifier requirements:
• High computing speed for real-time predictions;
• High training speed for a model building process;
• The ability to learn using high dimensional data (a
visitor’s profile contains dozens of attributes);
• The option to scale effectively in case of big data
volumes with high degree of sparsity;
The suitable algorithms:
• Naive Bayes classifier;
• Logistic regression;
• Decision trees, random forest, boosted
trees;
• Support vector machine (SVM);
• Factorization machine;
• Deep neural networks;
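Factorization machine model
The polynomial regression of the 2nd order with n attributes is
$$a(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j>i}^{n} \omega_{i,j} x_i x_j, \qquad x \in \mathbb{R}^n$$
The model contains $\frac{n(n-1)}{2} + n + 1$ unknown parameters.
In the case of the binarization of categorical variables, n can be quite large.
The factorization machine has the following equation:
$$y(x) = \omega_0 + \sum_{i=1}^{n} \omega_i x_i + \sum_{i=1}^{n} \sum_{j>i}^{n} \langle v_i, v_j \rangle x_i x_j, \qquad \omega_0 \in \mathbb{R}, \quad \mathbf{V} \in \mathbb{R}^{n \times k}$$
where $v_i$ is the i-th row of $\mathbf{V}$.
The number of unknown parameters is $k \cdot n + n + 1$;
k determines the size of the factorization.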
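To make the two parameter counts concrete, here is a small sketch that evaluates both formulas. The sizes n = 10,000 and k = 10 are hypothetical and chosen only to show the order of magnitude.

```python
def poly2_param_count(n):
    """Parameters of 2nd-order polynomial regression: n(n-1)/2 + n + 1."""
    return n * (n - 1) // 2 + n + 1


def fm_param_count(n, k):
    """Parameters of a 2nd-order factorization machine: k*n + n + 1."""
    return k * n + n + 1


n, k = 10_000, 10  # hypothetical sizes after one-hot encoding
print(poly2_param_count(n))  # 50005001
print(fm_param_count(n, k))  # 110001
```

For these sizes the factorized model has roughly 450 times fewer parameters to estimate.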
What are the significant advantages of the FM?
1. Allows effective parameter estimation on
highly sparse, high-dimensional data.
2. Has linear learning complexity while still modeling
polynomial (pairwise) interactions.
3. Generalizes recommender models based on
matrix factorization.
4. Combines the advantages of SVM and factorization models.
Reduces complexity from $O(N^2)$
to $O(kN)$.
Credit: [5]
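The complexity reduction rests on rewriting the pairwise term as $\frac{1}{2}\sum_{f=1}^{k}\big[(\sum_i v_{i,f} x_i)^2 - \sum_i v_{i,f}^2 x_i^2\big]$. The NumPy sketch below compares a direct double loop with this linear-time form. The function names and the random test data are illustrative and not taken from the slides.

```python
import numpy as np


def fm_predict_naive(x, w0, w, V):
    """Direct evaluation of the FM equation: quadratic in n."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same value in O(k * n), using
    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2].
    """
    s = V.T @ x                 # shape (k,): sum_i v_if * x_i
    s2 = (V ** 2).T @ (x ** 2)  # shape (k,): sum_i v_if^2 * x_i^2
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)


rng = np.random.default_rng(0)
n, k = 8, 3
x = rng.integers(0, 2, size=n).astype(float)  # binary, one-hot style input
w0, w, V = 0.1, rng.normal(size=n), rng.normal(size=(n, k))
assert np.isclose(fm_predict_naive(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

On sparse input only the non-zero entries of x contribute to either sum, so the practical cost is proportional to k times the number of non-zero features.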
FM learning. Data features
Training set: $\{(x^{(i)}, y^{(i)}) \mid i = 1, \dots, M\}$, $y^{(i)} \in \{0, 1\}$

Metadata features $x_1, x_2, x_3, \dots, x_{N-1}, x_N$ and the label, in column order (a standalone "…" marks omitted columns):
  Device/Env attributes:         Device:Mobile, Device:Desktop, Browser:Chrome, Browser:FF, ScreenRes:Wide
  Country/Localization features: Country:USA, Country:UK, Country:Fr, …, Language:En, Language:Fr, Language:…
  Traffic features:              TSource:Search, TSource:Direct, TSource:Refer
  Entry features:                EntryDay:Mon, EntryDay:Tue, …, EntryHour
  Contextual features:           Cont:VarA, Cont:VarB, Cont:VarC, Cont:VarX
  Label:                         isFraud

Observation | Device/Env | Country/Localization | Traffic | Entry  | Contextual | isFraud
o(1)        | 1 0 1 0 0  | 1 0 0 1 0 0          | 1 0 0   | 1 0 12 | 1 0 0 0    | 0
o(2)        | 0 1 0 1 1  | 1 0 0 1 0 0          | 1 0 0   | 0 1 20 | 0 1 0 0    | 0
o(3)        | 0 1 0 0 0  | 0 1 0 0 0 1          | 0 1 0   | 0 0 9  | 0 0 1 0    | 1
o(4)        | 1 0 1 0 0  | 0 0 1 0 1 0          | 0 1 0   | 0 0 8  | 0 0 0 1    | 0
o(5)        | 1 0 1 0 0  | 0 1 0 0 1 0          | 0 0 1   | 0 0 15 | 0 0 0 1    | 0
o(6)        | 0 1 0 1 1  | 0 1 0 0 0 0          | 0 0 1   | 1 0 7  | 0 0 1 0    | 0
o(7)        | 1 0 0 0 0  | 0 0 1 0 1 0          | 1 0 0   | 0 0 10 | 0 0 1 0    | 1
o(8)        | 1 0 1 0 0  | 1 0 0 1 0 0          | 1 0 0   | 1 0 2  | 1 0 0 0    | 0
o(M-1)      | 1 0 0 0 0  | 1 0 0 0 0 0          | 1 0 1   | 0 0 11 | 0 0 0 1    | 0
o(M)        | 0 1 0 1 1  | 0 0 0 0 0 1          | 0 1 0   | 0 0 18 | 0 0 1 0    | 0
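A table like this is typically produced by one-hot encoding the categorical fields of a visitor profile and passing numeric fields through unchanged. The pure-Python sketch below shows one way to do that. The field names mirror the column labels on the slide, while the helper functions and the two sample profiles are hypothetical.

```python
def build_index(profiles, categorical):
    """Map every (field, value) pair seen in the data to a column index."""
    index = {}
    for p in profiles:
        for field in categorical:
            key = f"{field}:{p[field]}"
            if key not in index:
                index[key] = len(index)
    return index


def encode(profile, index, categorical, numeric):
    """Return a sparse row as {column: value}; absent columns are zero."""
    row = {}
    for field in categorical:
        col = index.get(f"{field}:{profile[field]}")
        if col is not None:  # categories unseen at indexing time are dropped
            row[col] = 1.0
    # Numeric fields occupy the columns after all one-hot columns.
    for offset, field in enumerate(numeric):
        row[len(index) + offset] = float(profile[field])
    return row


CATEGORICAL = ["Device", "Browser", "Country", "TSource"]
NUMERIC = ["EntryHour"]
profiles = [  # made-up sample profiles
    {"Device": "Mobile", "Browser": "Chrome", "Country": "USA",
     "TSource": "Search", "EntryHour": 12},
    {"Device": "Desktop", "Browser": "FF", "Country": "USA",
     "TSource": "Search", "EntryHour": 20},
]
index = build_index(profiles, CATEGORICAL)
rows = [encode(p, index, CATEGORICAL, NUMERIC) for p in profiles]
```

Storing only the non-zero entries keeps memory use proportional to the number of fields per visitor rather than to the total number of columns.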
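FM model main learning points
1. SGD [Rendle] is used for learning:
$$\underset{\omega}{\operatorname{argmin}} \sum_{i} l\big(\varphi(\omega, x^{(i)}), y^{(i)}\big) + \lambda \lVert \omega \rVert^2$$
The loss function for binary classification:
$$l(\omega) = -\ln \sigma\big(\varphi(\omega, x) \cdot y\big), \quad \text{where} \quad \sigma(\alpha) = \frac{1}{1 + e^{-\alpha}}$$
(this form of the loss assumes the labels are recoded to $y \in \{-1, +1\}$).
2. Model quality is rated through auROC and Log-Loss.
3. Threshold post-processing is applied to the predicted score.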
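The three points above can be illustrated with a short, dependency-free sketch: the logistic loss for a single sample, auROC computed by pairwise ranking, and a three-way threshold that maps a score to the outcomes "Is a Fraud", "Is Not a Fraud", and "Is Unknown". The cut-off values 0.3 and 0.7 and the sample scores are hypothetical, and the loss recodes {0, 1} labels to {-1, +1}, which is an assumption about how the formula is meant to be applied.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def logistic_loss(score, label01):
    """-ln sigma(score * y) with the {0, 1} label recoded to y in {-1, +1}."""
    y = 2 * label01 - 1
    return -math.log(sigmoid(score * y))


def au_roc(scores, labels01):
    """Probability that a random fraud sample outranks a random clean one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels01) if y == 1]
    neg = [s for s, y in zip(scores, labels01) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def decide(prob, low=0.3, high=0.7):
    """Three-way post-processing; low and high are hypothetical cut-offs."""
    if prob >= high:
        return "Is a Fraud"
    if prob <= low:
        return "Is Not a Fraud"
    return "Is Unknown"


scores = [2.0, -1.0, 0.5, -3.0]  # made-up raw model outputs phi(w, x)
labels = [1, 0, 0, 0]
decisions = [decide(sigmoid(s)) for s in scores]
```

The pairwise auROC here is quadratic in the number of samples, which is fine for illustration; a sort-based implementation would be the usual choice on real traffic volumes.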
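Implement FM using TensorFlow
# Requires TensorFlow 1.x (tf.train.AdamOptimizer) and the tffm package.
import tensorflow as tf
from tffm import TFFMRegressor, TFFMClassifier

# Binary fraud classifier on sparse (e.g. one-hot encoded) input.
classifier = TFFMClassifier(
    order=2,         # pairwise (second-order) interactions
    rank=10,         # size k of the factorization
    optimizer=tf.train.AdamOptimizer(learning_rate=0.01),
    n_epochs=100,
    batch_size=-1,   # -1: use the whole training set as one batch
    init_std=0.001,
    input_type='sparse'
)

# Regression variant on dense input.
regressor = TFFMRegressor(
    order=2,
    rank=10,
    optimizer=tf.train.AdamOptimizer(learning_rate=0.1),
    n_epochs=100,
    batch_size=-1,
    init_std=0.001,
    input_type='dense'
)

# X_tr, y_tr and X_te are assumed to be prepared beforehand
# in the format that matches the chosen input_type.
model = classifier   # or: regressor
model.fit(X_tr, y_tr, show_progress=True)
predictions = model.predict(X_te)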
FM and Neural Networks
The neural network architecture of Attentional Factorization Machine
Source: https://github.com/hexiangnan/attentional_factorization_machine
Conclusions
• Factorization machines are a powerful tool for fraud
prediction. It is difficult to say which accuracy figures
should be considered acceptable for the real world, as
models built on real data reach an average auROC of 0.7…0.75.
• Improving the accuracy of predictive modeling requires
additional effort from the analysts, which is
often hard to carry out in practice.
• Using a GPU speeds up training by up to 3.5-4 times
compared to a CPU (one instance).
Worth your attention:
1. Rendle, S. 2012. Factorization machines with libFM. ACM Trans. Intell. Syst. Technol. 3, 3, Article 57,
(May 2012), 22 pages. (https://www.ismll.uni-hildesheim.de/pub/pdfs/Rendle2010FM.pdf )
2. Bayer, I. "fastFM: A Library for Factorization Machines" Journal of Machine Learning Research 17, pp.
1-5 (2016)
3. Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, Naonori Ueda. Polynomial Networks and
Factorization Machines: New Insights and Efficient Training Algorithms. In: Proc. of ICML 2016.
4. http://www.libfm.org
5. https://www.slideshare.net/SparkSummit/spark-summit-eu-talk-by-nick-pentreath
6. https://www.slideshare.net/BartomiejTwardowski/warsaw-data-science-factorization-machines-introduction
7. http://www.jefkine.com/recsys/2017/03/27/factorization-machines
8. https://getstream.io/blog/factorization-machines-recommendation-systems
9. https://www.tensorflow.org/get_started/get_started
GitHub links for python:
1. https://github.com/jfloff/pywFM (LibFM based)
2. https://github.com/geffy/tffm (Tensorflow based)
3. https://github.com/kopopt/fast_tffm (Tensorflow based)
4. https://github.com/ibayer/fastFM
5. https://github.com/coreylynch/pyFM
6. https://github.com/scikit-learn-contrib/polylearn
7. https://github.com/comadan/FM_FTRL
8. https://github.com/guestwalk/libffm (c++ implementation of Kaggle winning Field Aware FM)
Thank you for your attention!
