Fraud Analytics with Machine Learning and Big Data Engineering for Telecom

Fraud Analytics with Machine Learning & Engineering
(FAME) for Telecom using Big Data
Presented by:
Sudarson Roy Pratihar
Pranab Kumar Dash
Subhadip Paul
Amartya Kumar Das
Copyright © 2015 Authors. All rights reserved.

A Quick Intro – Telecom Frauds
• Have you received missed calls from unknown overseas
numbers?
• Have you heard of PBX hacking that leaves corporations
facing huge bills?

Problem Definition
• Telecom operators lose 46.3 billion USD
globally to various types of fraud
• 10% of operators have bad debt due to fraud
• Detection is a cat-and-mouse game: fraud
patterns change to evade the available
data mining techniques
• Raising timely alerts while processing a huge
volume of call records is a challenge
• Alerts with a high false-positive rate increase
operational expenses

Importance to Telecom Industry & Society
• An efficient, self-adaptive detection
mechanism can significantly reduce the loss
due to fraud (about 2.1% of revenue) and
the associated operational cost
• Less “bad money” in the system

Data Source
• More than 1 TB of Call Detail Records
(CDRs) from a reputed wholesale carrier,
used as historical data
• Tested on a few weeks of the carrier's
live CDRs

Analytics Technique
• The basic components of FAME are:
– A self-adaptive machine learning
methodology
– An actionable dashboard that lets the operations
and investigations teams act on alerts;
their feedback is sent to the machine learning
model to adjust its weights
– A high-performance big data platform for
data processing and machine learning
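
The slides do not say how operator feedback adjusts the model weights. One possible reading, shown here only as a minimal sketch, is a multiplicative-weights update over an ensemble: each time an operator confirms or rejects an alert, every model that disagreed with the operator has its weight reduced, and the weights are then renormalized. The model names, the `eta` learning rate, and the `update_weights` helper are all hypothetical.

```python
from typing import Dict


def update_weights(
    weights: Dict[str, float],
    model_votes: Dict[str, bool],
    operator_says_fraud: bool,
    eta: float = 0.2,
) -> Dict[str, float]:
    """Shrink the weight of each model that disagreed with the operator,
    then renormalize so the weights sum to 1 (assumed update rule)."""
    updated = {}
    for name, weight in weights.items():
        agreed = model_votes[name] == operator_says_fraud
        updated[name] = weight if agreed else weight * (1.0 - eta)
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


# Hypothetical models and their votes on one alert reviewed on the dashboard.
weights = {"logistic": 1 / 3, "forest": 1 / 3, "threshold": 1 / 3}
votes = {"logistic": True, "forest": True, "threshold": False}

# The operator confirms fraud, so the dissenting "threshold" model loses weight.
weights = update_weights(weights, votes, operator_says_fraud=True)
```

A small `eta` makes the ensemble adapt slowly but resist noisy feedback; a larger one reacts faster to changing fraud patterns.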

How it detects and adapts …
[Figure: flow diagram with numbered steps 1 to 10. Labels: CDR Feed; Fraud Detection Model Pipeline; Frauds detected; Remaining Data; Novelty Detection Pipeline / Stacking; New Patterns; More frauds; Actionable Dashboards; Pattern validation and tuning workbench; New model addition / Tuning of existing; Operators feedback; Analyst; Operator]

Novelty Detection Pipeline
• Novelty detection is run separately on origin
and destination numbers
• Various contextual anomaly detection methods
are used and their outputs are combined
• Some examples of the algorithms used:
– Box-plot based outlier detection
– Clustering to find clusters with a distinct
centroid
– Mahalanobis distance:
Mdist > ɸ · IQR
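
Two of the rules above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the slide does not say which IQR the Mahalanobis rule refers to, so the sketch takes it to be the interquartile range of the distances themselves; it handles only two features per number; and the function names, `k = 1.5`, and `phi = 3.0` are hypothetical choices.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def quartiles(values: Sequence[float]) -> Tuple[float, float]:
    """Return (Q1, Q3) using linear interpolation between data points."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q3


def boxplot_outliers(values: Sequence[float], k: float = 1.5) -> List[bool]:
    """Flag values outside the whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


def mahalanobis_2d(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Mahalanobis distance of each 2-feature point from the sample mean."""
    n = len(points)
    mx = statistics.fmean(p[0] for p in points)
    my = statistics.fmean(p[1] for p in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        dists.append(math.sqrt(max(d2, 0.0)))
    return dists


def mahalanobis_outliers(
    points: Sequence[Tuple[float, float]], phi: float = 3.0
) -> List[bool]:
    """Flag points with Mdist > phi * IQR (IQR of the distances: an assumption)."""
    dists = mahalanobis_2d(points)
    q1, q3 = quartiles(dists)
    limit = phi * (q3 - q1)
    return [d > limit for d in dists]


# Hypothetical per-number call counts; the last one is far from the rest.
call_counts = [10, 12, 11, 13, 12, 11, 95]
count_flags = boxplot_outliers(call_counts)
```

Because the mean and covariance are estimated from the same data that contains the anomalies, extreme points inflate the covariance and pull their own distances down; a robust estimate would be a natural refinement.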

Novelty Detection – Illustrations

Fraud Detection Pipeline
• Use historical data and flag records with the
“Novelty Detection Pipeline”
• Verify the flagged records and label them
• Build separate models (logistic regression,
random forest, and threshold-based)
for different patterns
• Combine the outputs of the models
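
The slides do not specify how the model outputs are combined. A minimal sketch, assuming a weighted average of per-model fraud scores with an alert cutoff, could look like the following. The feature names (`calls_per_hour`, `intl_ratio`, `avg_duration_sec`), the coefficients, the weights, and the cutoff are all made up for illustration; a trained random forest scorer would plug into `models` the same way as the two stand-ins shown.

```python
import math
from typing import Callable, Dict, Tuple

Record = Dict[str, float]


def logistic_score(rec: Record) -> float:
    """Stand-in for a fitted logistic regression (coefficients are made up)."""
    z = -6.0 + 0.04 * rec["calls_per_hour"] + 3.0 * rec["intl_ratio"]
    return 1.0 / (1.0 + math.exp(-z))


def threshold_score(rec: Record) -> float:
    """Threshold-based rule: a burst of very short calls (limits are made up)."""
    burst = rec["calls_per_hour"] > 100 and rec["avg_duration_sec"] < 10
    return 1.0 if burst else 0.0


def combine(
    rec: Record,
    models: Dict[str, Callable[[Record], float]],
    weights: Dict[str, float],
    cutoff: float = 0.5,
) -> Tuple[float, bool]:
    """Weighted average of model scores; raise an alert at or above the cutoff."""
    total = sum(weights[name] for name in models)
    score = sum(weights[name] * model(rec) for name, model in models.items()) / total
    return score, score >= cutoff


models = {"logistic": logistic_score, "threshold": threshold_score}
weights = {"logistic": 0.5, "threshold": 0.5}

suspicious = {"calls_per_hour": 200.0, "intl_ratio": 0.9, "avg_duration_sec": 4.0}
ordinary = {"calls_per_hour": 5.0, "intl_ratio": 0.1, "avg_duration_sec": 120.0}

suspicious_score, suspicious_alert = combine(suspicious, models, weights)
ordinary_score, ordinary_alert = combine(ordinary, models, weights)
```

Keeping one scorer per fraud pattern means a new pattern can be covered by adding a model to the dictionary rather than retraining everything.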

System Behind Magic …
[Figure: architecture diagram. Labels: CDR feed from telecom system; Integration; Big data platform powered by Hadoop & Spark; Ensemble of self-adaptive algos; Facets; Feedback; Actionable dashboard]

Platform Behind Magic …

Accuracy Results
[Figure: bar chart of true positive, false positive, and accuracy for B-Number and A-Number detection, on a scale from 0 to 1]
• Individual accuracy is shown for
origin and destination
number detection
• The combined mechanism
has <5% false positives
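
For reference, the three quantities named in the chart are usually computed from confusion-matrix counts as below. The slides do not define them, so this assumes the standard rate definitions; the counts in the example are invented for illustration and are not the authors' results.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard rates from confusion-matrix counts."""
    return {
        # Share of actual frauds that were flagged.
        "true_positive_rate": tp / (tp + fn),
        # Share of legitimate numbers that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn),
        # Share of all numbers classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts, for illustration only.
metrics = detection_metrics(tp=90, fp=4, tn=96, fn=10)
```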

What Next …
• Test on different types of telecom fraud
• Extend this industrialized approach to other
areas (such as network intrusion detection)
• Productize as a cloud-based service as well as an
on-premise implementation

Contact Us @
Amartya Kumar Das
amartya_das_2014@cba.isb.edu
https://in.linkedin.com/pub/amartya-das/b/72b/637
Subhadip Paul
Subhadip_paul_2014@cba.isb.edu
https://in.linkedin.com/in/subhadippaul
Pranab Kumar Dash
Pranab_dash_2014@cba.isb.edu
www.linkedin.com/profile/view?id=19155039
Sudarson Roy Pratihar
sudarson_pratihar_2014@cba.isb.edu
www.linkedin.com/in/sudarson
Follow us #FAMETELCO
