A data mining framework for fraud detection in telecom based
on MapReduce
By
Mohammed Fahmi Kharma
May 31, 2011
Table of Contents
Introduction
Background
Related work
Contribution
General Objective
Specific Objectives
Scope of the work
The added value of our work
Methodology
Time table
References
Introduction
In recent years the world has seen rapid growth in modern technology, especially in
telecommunications and the Internet. In parallel with this development, fraud incidents are
increasing dramatically, causing losses estimated at billions of dollars worldwide every year.
According to the Concise Oxford Dictionary, fraud is a wrongful or criminal deception intended
to result in financial or personal gain.
MapReduce is a programming model and an associated implementation for processing and
generating large data sets. Users specify a map function that processes a key/value pair to
generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate
values associated with the same intermediate key [1]. The model uses two operations for
computation, map and reduce, a style that is common in functional programming languages; the
map operation is executed before the reduce operation. Each map operation applies a computation
to a key/value pair and produces one or more key/value pairs that are fed as input to the
reduce step. Each reduce operation receives the list of values that share the same key and
aggregates them into one or more values for that key.
Programs written for the MapReduce framework are automatically parallelized and executed on a
large cluster of machines. The run-time system takes care of the details of partitioning the
input data, scheduling the program's execution across a set of machines, handling machine
failures, and managing the required inter-machine communication. This enables programmers with
no experience in parallel and distributed systems programming to easily utilize the resources of
a large distributed system [1].
Fig.1 - MapReduce overview, Jeffrey Dean and Sanjay Ghemawat [1]
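The map and reduce roles described above can be illustrated with a small single-process sketch.
It is not the distributed implementation of [1]; the helper name run_mapreduce and the word-count
example are assumptions chosen only to show how intermediate key/value pairs are grouped by key
before being reduced.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process imitation of the map, shuffle and reduce phases."""
    groups = defaultdict(list)
    # Map phase: each input (key, value) pair yields intermediate pairs.
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            # Shuffle: collect intermediate values under their key.
            groups[out_key].append(out_value)
    # Reduce phase: merge all values that share an intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words_map(doc_id, text):
    # Emit (word, 1) for every word in the document.
    for word in text.lower().split():
        yield word, 1


def count_words_reduce(word, counts):
    # Sum the partial counts collected for one word.
    return sum(counts)


documents = [("d1", "fraud call fraud"), ("d2", "call detail record")]
word_counts = run_mapreduce(documents, count_words_map, count_words_reduce)
```

In a real cluster the map calls and the reduce calls would run on different machines, and the
grouping step would be carried out by the run-time system rather than by a dictionary in memory.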
Background
The telecommunication sector is an interesting environment because it generates and stores a
huge amount of data, collected through its systems to record and reflect the company's operations
and its subscribers' activity. One source of such data is the call detail record (CDR), which
holds information such as the A-number (calling party), B-number (called party), duration, call
path, and timestamps.
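As an illustration of how CDR fields could feed a data mining step, the sketch below models a
record and aggregates simple per-subscriber figures. The field names, the sample values, and the
chosen aggregates (call count, total duration, number of distinct called parties) are assumptions
made for this example, not the schema of any particular operator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallDetailRecord:
    a_number: str    # calling party
    b_number: str    # called party
    duration_s: int  # call duration in seconds
    call_path: str   # hypothetical route or trunk identifier
    start_time: str  # hypothetical ISO 8601 timestamp kept as text


def summarize_by_caller(records):
    """Return {a_number: (call_count, total_duration_s, distinct_b_numbers)}."""
    calls = defaultdict(int)
    seconds = defaultdict(int)
    callees = defaultdict(set)
    for record in records:
        calls[record.a_number] += 1
        seconds[record.a_number] += record.duration_s
        callees[record.a_number].add(record.b_number)
    return {a: (calls[a], seconds[a], len(callees[a])) for a in calls}


# Made-up sample records, for illustration only.
sample = [
    CallDetailRecord("A-001", "B-100", 60, "trunk-1", "2011-05-01T10:00:00"),
    CallDetailRecord("A-001", "B-200", 120, "trunk-1", "2011-05-01T10:05:00"),
    CallDetailRecord("A-001", "B-100", 30, "trunk-2", "2011-05-01T11:00:00"),
    CallDetailRecord("A-002", "B-300", 45, "trunk-1", "2011-05-01T12:00:00"),
]
profiles = summarize_by_caller(sample)
```

Per-subscriber summaries of this kind are one possible input for the clustering and
classification steps discussed later in the proposal.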
Mieke Jans et al. (2010) presented an overview of how they see the different fraud
classifications and their relations to each other, shown in Figure 2. The most prominent
classification is internal versus external fraud, since all other classifications are situated
within internal fraud. They regard occupational fraud and abuse as an equivalent of internal
fraud. Figure 2 also shows that all the remaining classifications apply only to corporate fraud.
They further classify internal fraud in three ways. The first differentiates between statement
fraud and transaction fraud. The second is based on the occupational level of the fraudulent
employee. The third is a classification concerning fraud against the company [6].
Fig.2 - Fraud Classification Overview, Mert Sanver, Adem Karahoca [2]
Fraud activity can be defined as a dishonest or illegal use of services with the intention of
avoiding service charges. Fraud detection is the name given to the activities that identify
unauthorized usage and prevent losses for mobile network operators [2]. Telecommunication
companies often lose revenue because of customers' fraudulent behavior, and there are different
types of fraud in the telecommunication business [3]. Shawe-Taylor et al. (2000) present six
fraud types: subscription fraud, the manipulation of Private Branch Exchange (PBX) facilities or
dial-through fraud, freephone fraud, premium rate service fraud, handset theft, and roaming
fraud [5].
In the fraud detection process, call detail records are processed to determine whether a fraud
attack has occurred and of which type, for example subscription fraud, premium rate service
fraud, or roaming fraud. In subscription fraud, a fraudster obtains a subscription using fake
personal information in order to register on the network and carry out fraudulent activity with
no intention of paying the bill or fees [2].
Related work
The history of telecom fraud goes back to the early days of telecom companies. These companies
spend a great deal of money to reduce fraudsters' attacks and to protect themselves from
significant losses that could weaken their ability to compete with other operators.
Many studies have addressed fraud detection and prevention; we review some of them here. Hamid
Farvaresh et al. (2011) aimed at identifying customers' subscription fraud in telecom by
combining SOM and K-means techniques in a hybrid approach consisting of preprocessing,
clustering, and classification phases, following the knowledge discovery process. Martin Häger
et al. (2011) studied the application of general outlier detection and classification methods to
the problem of detecting fraudulent behavior in online advertisement metrics. Viaene et al.
(2004) and Viaene et al. (2002) addressed automobile insurance fraud detection by combining the
advantages of boosting with the explanatory power of the weight-of-evidence scoring framework of
AdaBoosted naive Bayes. Brause et al. (1999) and Estévez et al. (2006) used a combination of
neural networks and rules. Mert Sanver et al. (2009) propose the Adaptive Neuro-Fuzzy Inference
System (ANFIS) as a means of efficient fraud detection.
He et al. (1997) apply neural networks: a multi-layer perceptron in the supervised component of
their study and Kohonen's self-organizing maps in the unsupervised part. Fawcett et al. proposed
an adaptive rule-based framework for fraud detection. Rosset et al. state that standard
classification and rule generation were not appropriate for fraud detection. D. Hawkins (1980)
studied data outliers, observations that are more likely to be suspicious than regular, normally
distributed data. R. Rastogi, S. Ramaswamy et al. (2000) extended outlier detection with a
method based on the distance of a point from its k-th nearest neighbor, building on earlier work
on distance-based outliers by R. Ng, E. Knorr et al. (2000).
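The idea of scoring a point by its distance to its k-th nearest neighbor can be sketched in a few
lines. The version below is a brute-force illustration written for this proposal, not the
efficient algorithm of the cited authors; the function names and the sample points are
assumptions.

```python
import math


def kth_neighbor_distance(points, k):
    """For each point, return the Euclidean distance to its k-th nearest neighbor."""
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-th neighbor distance."""
    scores = kth_neighbor_distance(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Four points close together and one far away (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
outliers = top_outliers(points, k=2, n=1)
```

The brute-force search compares every pair of points, so its cost grows quadratically with the
number of records, which is one reason a distributed setting is attractive for large CDR volumes.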
Contribution
The previous sections described various data mining techniques and how they can be used for
fraud detection. In our work we focus on designing and implementing a fraud detection model for
the telecom environment in a different domain, the MapReduce domain; to our knowledge it will be
the first such model. We will implement the model on commodity machines and networks, making it,
as far as we are aware, the first live example of fraud detection using cloud computing. The
model will include an implementation of a data mining algorithm; initially we have selected the
K-means algorithm. We also expect the model to operate in near-online mode to detect and classify
fraud events, which should improve the ability to detect subscription fraud early and result in
a major reduction in revenue losses.
General Objective
The output of our research is the design and implementation of a model that uses data mining to
detect fraud cases targeting the telecom environment, where a huge volume of data has to be
processed. The model will run on a cloud computing infrastructure that we will build using
MapReduce, a popular and powerful framework for large-scale data processing. We will use data
obtained from call detail records (CDRs) in the billing repository, and the result is the subset
of subscribers classified as fraudulent subscriptions, produced in near-online mode. This will
reduce the time needed to detect fraud events and enhance the revenue assurance team's ability
to identify fraudulent cases efficiently.
Specific Objectives
• Collect the required data from a telecom operator.
• Identify the classification parameters required for the data mining process.
• Design a framework for fraud detection based on the MapReduce framework.
• Run the proposed framework on the data collected from the telecom operator and gather the
results, in order to analyze and evaluate our work in terms of performance and classification
of fraud events.
Scope of the work
Our research is concerned with telecommunication fraud, and we will take one telecommunication
operator as a case study. International Data Corporation has identified more than 200 forms of
telecommunication fraud [12]. Throughout our research we will focus on one specific category,
subscription fraud in telecom.
The added value of our work
Our framework will provide the following added value:
• Design and implementation of the first fraud detection model for the telecom environment
based on the MapReduce framework.
• Near-online results, which will enhance the ability to detect subscription fraud events early
and result in a major reduction in revenue losses.
• Increased trust in the telecom operator that uses our system, by sparing the company many
fraudulent attacks.
Methodology
We plan to build an environment for fraud detection and data mining for the telecom sector. Our
framework will be built on top of the MapReduce framework. As mentioned earlier, MapReduce allows
its users to parallelize and execute programs on a large cluster of machines by partitioning the
input data and scheduling the program's execution over a set of machines. Given the large volume
of data generated every day, we selected MapReduce to help us build the distributed environment
for our framework.
Our framework will use fraud detection and data mining algorithms with implementations adapted
to MapReduce, so that they run in parallel on a cluster of machines. We plan to use SunGrid
clusters to build our own distributed environment, as SunGrid is open source and free to use;
alternatively, we can use the infrastructure of a cloud computing vendor such as Amazon or
Google.
Our framework will implement at least one classification algorithm. The algorithm or algorithms
will be used to build a model from a set of training data and to detect subscription fraud
cases; the model is subsequently used to classify new data entering the system. We will try to
implement more than one algorithm in order to compare their results and performance in the
MapReduce environment. The K-means algorithm has initially been selected as the starting point
for our work. We are organizing the work in our thesis as follows:
• Prepare all the research required for our thesis, taking advantage of related work, not
necessarily in telecom but possibly in other fields.
• Design our fraud detection model based on the MapReduce domain.
• Identify and extract the top N factors on which we will build our fraud detection data mining
model, along with any other parameter or rule that can help us detect fraud events.
• Prepare the dataset, clean it (missing values, etc.), and divide the main dataset into a
training dataset and a testing dataset.
• Set up the cloud infrastructure, including the MapReduce framework and SunGrid clusters, to
build our own distributed environment.
• Implement the K-means algorithm, initially selected as the starting point, in our framework.
• Test and validate the data mining results.
• Perform stress tests of our framework against datasets of various volumes and monitor its
behavior.
• Refine our framework and make any necessary adjustments.
• Final review; complete and submit the final report.
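One way the K-means step in the plan above could be expressed in map and reduce terms is sketched
below. It is a single-process illustration under stated assumptions, not the framework itself:
the function names, the sample points, and the starting centroids are invented for the example.
The map step assigns each point to its nearest centroid, and the reduce step averages the points
assigned to each centroid.

```python
import math
from collections import defaultdict


def assign_map(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return nearest, point


def mean_reduce(points):
    """Reduce step: average the points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)
    for point in points:
        cluster, value = assign_map(point, centroids)
        groups[cluster].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [
        mean_reduce(groups[c]) if c in groups else centroids[c]
        for c in range(len(centroids))
    ]


def kmeans(points, centroids, max_iterations=20):
    """Repeat iterations until the centroids stop changing or the limit is reached."""
    for _ in range(max_iterations):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Made-up two-dimensional feature vectors and starting centroids.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Each iteration corresponds to one MapReduce job, so an implementation on a cluster would chain
jobs and pass the updated centroids to the mappers of the next round.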
Time table
Number | Task | Time period
1 | Prepare all the research needed for our thesis, taking advantage of related work, not necessarily in telecom but possibly in other fields | 4 weeks
2 | Design the proposed fraud detection framework based on MapReduce, identifying the parameters required for the data mining process | 2 weeks
3 | Gather specific requirements, if any, especially those related to MapReduce setup, required data, programming language, and supporting technology | 2 weeks
5 | Set up the MapReduce and SunGrid distributed environment | 2 weeks
6 | Implement our detection framework, including coding | 4 weeks
7 | Phase one: full testing and bug fixing | 3 weeks
8 | Optimization and stress testing | 2 weeks
9 | Phase two testing | 2 weeks
10 | Final review; complete and submit | 3 weeks
References
1- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
Google, Inc.
2- Mert Sanver, Adem Karahoca . Fraud Detection Using an Adaptive Neuro-Fuzzy
Inference System in Mobile Telecommunication Networks.
3- Shawe-Taylor, J., Howker, K., Burge, P. Detection of Fraud in Mobile
Telecommunications. Information Security Technical Report 4 (1), 16–28.
4- Shawe-Taylor, J., Howker, K., Gosset, P., Hyland, M., Verrelst, H., Moreau, Y., et al.
Novel techniques for profiling and fraud in mobile telecommunication. In: Lisboa, P.J.G.,
Edisbury, B., Vellido, A. (Eds.), Business Applications of Neural Networks. The
State-of-the-Art of Real World Applications. World Scientific, Singapore, pp. 113–139.
5- Hamid Farvaresh, Mohammad Mehdi Sepehri, 2011. A data mining framework for
detecting subscription fraud in telecommunication.
6- Mieke Jans, Nadine Lybaert and Koen Vanhoof, 2011. Framework for Internal Fraud
Risk Reduction at IT Integrating Business Processes: The IFR² Framework
7- Martin Häger and Torsten Landergren, 2010. Implementing best practices for
fraud detection on an online advertising platform
8- Viaene, S., Derrig, R., Baesens, B. & Dedene, G. (2002). A Comparison of State-of-the-
Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection.
9- Viaene, S., Derrig, R. & Dedene, G. (2004). A Case Study of Applying Boosting Naive
Bayes to Claim Fraud Diagnosis. IEEE Transactions on Knowledge and Data
Engineering.
10- Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Journal of Data Mining and
Knowledge Discovery 1(3).
11- Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G. (1999). Discovery of fraud
rules for telecommunications—challenges and solutions. Proceedings of the Fifth ACM
SIGKDD International.
12- Ojuka Nelson, 2009. Detection of Subscription Fraud in Telecommunications Using
Decision Tree Learning.
13- D. Hawkins, 1980. “Identification of outliers,” Chapman and Hall, Reading, London.
14- R. Ng, E. Knorr and T. Tucakov, “Distance-based outliers,” Algorithms and Applications,
vol. 8, no. 3, pp. 237–253, 2000.
15- R. Rastogi, S. Ramaswamy and S. Kyuseok, “Efficient algorithms for mining outliers
from large data sets,” SIGMOD ’00, 2000.

More Related Content

What's hot

IRJET- Survey on Credit Card Fraud Detection
IRJET- Survey on Credit Card Fraud DetectionIRJET- Survey on Credit Card Fraud Detection
IRJET- Survey on Credit Card Fraud Detection
IRJET Journal
 
Detecting fraud in cellular telephone networks
Detecting fraud in cellular telephone networksDetecting fraud in cellular telephone networks
Detecting fraud in cellular telephone networks
Jamal Meselmani
 
Comprehensive training on bypass sim box fraud
Comprehensive training on bypass sim box fraudComprehensive training on bypass sim box fraud
Comprehensive training on bypass sim box fraud
Massango Junior
 
credit card fraud analysis using predictive modeling python project abstract
credit card fraud analysis using predictive modeling python project abstractcredit card fraud analysis using predictive modeling python project abstract
credit card fraud analysis using predictive modeling python project abstract
Venkat Projects
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detection
Justluk Luk
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
Reveneu frauds and telcos
Reveneu frauds and telcosReveneu frauds and telcos
Reveneu frauds and telcosmrkhanlodhi
 
SAS for Claims Fraud
SAS for Claims FraudSAS for Claims Fraud
SAS for Claims Fraud
stuartdrose
 
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
cVidya Networks
 
Detecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency DataDetecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency Data
ITIIIndustries
 
A Survey of Online Credit Card Fraud Detection using Data Mining Techniques
A Survey of Online Credit Card Fraud Detection using Data Mining TechniquesA Survey of Online Credit Card Fraud Detection using Data Mining Techniques
A Survey of Online Credit Card Fraud Detection using Data Mining Techniques
IJSRD
 
IRJET - Online Credit Card Fraud Detection and Prevention System
IRJET - Online Credit Card Fraud Detection and Prevention SystemIRJET - Online Credit Card Fraud Detection and Prevention System
IRJET - Online Credit Card Fraud Detection and Prevention System
IRJET Journal
 
A Study on Credit Card Fraud Detection using Machine Learning
A Study on Credit Card Fraud Detection using Machine LearningA Study on Credit Card Fraud Detection using Machine Learning
A Study on Credit Card Fraud Detection using Machine Learning
ijtsrd
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurance
stuartdrose
 
Ijigsp v6-n2-6
Ijigsp v6-n2-6Ijigsp v6-n2-6
Ijigsp v6-n2-6
Anita Pal
 
Review on Fraud Detection in Electronic Payment Gateway
Review on Fraud Detection in Electronic Payment GatewayReview on Fraud Detection in Electronic Payment Gateway
Review on Fraud Detection in Electronic Payment Gateway
IRJET Journal
 
Target@ Data Breach2edit
Target@ Data Breach2editTarget@ Data Breach2edit
Target@ Data Breach2editKehinde Adelusi
 
Audit,fraud detection Using Picalo
Audit,fraud detection Using PicaloAudit,fraud detection Using Picalo
Audit,fraud detection Using Picalo
guest4ea866f
 
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
CDGcommerce
 
Online Payment System using Steganography and Visual Cryptography
Online Payment System using Steganography and Visual CryptographyOnline Payment System using Steganography and Visual Cryptography
Online Payment System using Steganography and Visual Cryptography
ijtsrd
 

What's hot (20)

IRJET- Survey on Credit Card Fraud Detection
IRJET- Survey on Credit Card Fraud DetectionIRJET- Survey on Credit Card Fraud Detection
IRJET- Survey on Credit Card Fraud Detection
 
Detecting fraud in cellular telephone networks
Detecting fraud in cellular telephone networksDetecting fraud in cellular telephone networks
Detecting fraud in cellular telephone networks
 
Comprehensive training on bypass sim box fraud
Comprehensive training on bypass sim box fraudComprehensive training on bypass sim box fraud
Comprehensive training on bypass sim box fraud
 
credit card fraud analysis using predictive modeling python project abstract
credit card fraud analysis using predictive modeling python project abstractcredit card fraud analysis using predictive modeling python project abstract
credit card fraud analysis using predictive modeling python project abstract
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detection
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Reveneu frauds and telcos
Reveneu frauds and telcosReveneu frauds and telcos
Reveneu frauds and telcos
 
SAS for Claims Fraud
SAS for Claims FraudSAS for Claims Fraud
SAS for Claims Fraud
 
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
TM Forum Fraud Management Group Activities - Presented at TM Forum's Manageme...
 
Detecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency DataDetecting Fraud Using Transaction Frequency Data
Detecting Fraud Using Transaction Frequency Data
 
A Survey of Online Credit Card Fraud Detection using Data Mining Techniques
A Survey of Online Credit Card Fraud Detection using Data Mining TechniquesA Survey of Online Credit Card Fraud Detection using Data Mining Techniques
A Survey of Online Credit Card Fraud Detection using Data Mining Techniques
 
IRJET - Online Credit Card Fraud Detection and Prevention System
IRJET - Online Credit Card Fraud Detection and Prevention SystemIRJET - Online Credit Card Fraud Detection and Prevention System
IRJET - Online Credit Card Fraud Detection and Prevention System
 
A Study on Credit Card Fraud Detection using Machine Learning
A Study on Credit Card Fraud Detection using Machine LearningA Study on Credit Card Fraud Detection using Machine Learning
A Study on Credit Card Fraud Detection using Machine Learning
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurance
 
Ijigsp v6-n2-6
Ijigsp v6-n2-6Ijigsp v6-n2-6
Ijigsp v6-n2-6
 
Review on Fraud Detection in Electronic Payment Gateway
Review on Fraud Detection in Electronic Payment GatewayReview on Fraud Detection in Electronic Payment Gateway
Review on Fraud Detection in Electronic Payment Gateway
 
Target@ Data Breach2edit
Target@ Data Breach2editTarget@ Data Breach2edit
Target@ Data Breach2edit
 
Audit,fraud detection Using Picalo
Audit,fraud detection Using PicaloAudit,fraud detection Using Picalo
Audit,fraud detection Using Picalo
 
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
Merchant Account Tips: Proven Methods for Reducing Online Credit Card Fraud &...
 
Online Payment System using Steganography and Visual Cryptography
Online Payment System using Steganography and Visual CryptographyOnline Payment System using Steganography and Visual Cryptography
Online Payment System using Steganography and Visual Cryptography
 

Viewers also liked

Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
Gwen (Chen) Shapira
 
Data mining in Telecommunications
Data mining in TelecommunicationsData mining in Telecommunications
Data mining in Telecommunications
Mohsin Nadaf
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
Ritu Sarkar
 
Big Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - KanthakaBig Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - Kanthaka
Pushpalanka Jayawardhana
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
Marc Berman
 
Survey on Credit Card Fraud Detection Using Different Data Mining Techniques
Survey on Credit Card Fraud Detection Using Different Data Mining TechniquesSurvey on Credit Card Fraud Detection Using Different Data Mining Techniques
Survey on Credit Card Fraud Detection Using Different Data Mining Techniques
ijsrd.com
 
Data Mining in telecommunication industry
Data Mining in telecommunication industryData Mining in telecommunication industry
Data Mining in telecommunication industry
pragya ratan
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
Adam Gibson
 
Fraud Detection presentation
Fraud Detection presentationFraud Detection presentation
Fraud Detection presentation
Hernan Huwyler
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Salah Amean
 
ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and Mitigation
Scott Mongeau
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
Gilad Barkan
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
DataWorks Summit/Hadoop Summit
 
PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014
Sri Ambati
 
Detecting Fraud Using Data Mining Techniques
Detecting Fraud Using Data Mining TechniquesDetecting Fraud Using Data Mining Techniques
Detecting Fraud Using Data Mining TechniquesDecosimoCPAs
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
LinkedIn
 

Viewers also liked (18)

Fraud Detection Architecture
Fraud Detection ArchitectureFraud Detection Architecture
Fraud Detection Architecture
 
Data mining in Telecommunications
Data mining in TelecommunicationsData mining in Telecommunications
Data mining in Telecommunications
 
Credit risk scoring model final
Credit risk scoring model finalCredit risk scoring model final
Credit risk scoring model final
 
Big Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - KanthakaBig Data CDR Analyzer - Kanthaka
Big Data CDR Analyzer - Kanthaka
 
The 8 Step Data Mining Process
The 8 Step Data Mining ProcessThe 8 Step Data Mining Process
The 8 Step Data Mining Process
 
Survey on Credit Card Fraud Detection Using Different Data Mining Techniques
Survey on Credit Card Fraud Detection Using Different Data Mining TechniquesSurvey on Credit Card Fraud Detection Using Different Data Mining Techniques
Survey on Credit Card Fraud Detection Using Different Data Mining Techniques
 
Lecture - Data Mining
Lecture - Data MiningLecture - Data Mining
Lecture - Data Mining
 
Data Mining in telecommunication industry
Data Mining in telecommunication industryData Mining in telecommunication industry
Data Mining in telecommunication industry
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Fraud Detection presentation
Fraud Detection presentationFraud Detection presentation
Fraud Detection presentation
 
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic ConceptsData Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
Data Mining:Concepts and Techniques, Chapter 8. Classification: Basic Concepts
 
ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and Mitigation
 
Bayesian Belief Networks for dummies
Bayesian Belief Networks for dummiesBayesian Belief Networks for dummies
Bayesian Belief Networks for dummies
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014
 
Detecting Fraud Using Data Mining Techniques
Detecting Fraud Using Data Mining TechniquesDetecting Fraud Using Data Mining Techniques
Detecting Fraud Using Data Mining Techniques
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017The Top Skills That Can Get You Hired in 2017
The Top Skills That Can Get You Hired in 2017
 

A data mining framework for fraud detection in telecom based on MapReduce (Proposal)

  • 1. A data mining framework for fraud detection in telecom based on MapReduce By Mohammed Fahmi Kharma May 31, 2011
  • 2. Table of Contents: Introduction; Background; Related work; Contribution; General Objective; Specific Objectives; Scope of the work; The added value of our work; Methodology; Time table; References
  • 3. Introduction In recent years, the world has seen rapid growth and expansion in modern technology, especially in telecommunications and the internet. In parallel with this development, fraud events are increasing dramatically, causing major losses estimated at billions of dollars worldwide every year. According to the Concise Oxford Dictionary, fraud is a wrongful or criminal deception intended to result in financial or personal gain. MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key [1]. The MapReduce model uses two operations for computation, map and reduce. The map operation is executed before the reduce operation, a style that is common in functional programming languages. Each map operation applies a computation to a key-value pair, and the result is one or more key-value pairs that are fed as input to the reduce step. Each reduce operation receives a list of key-value pairs that share the same key and reduces these pairs by aggregating them into one or more values for this key. Programs written for the MapReduce framework are automatically parallelized and executed on a large cluster of machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This enables programmers with no experience in parallel and distributed systems programming to easily utilize the resources of a large distributed system.
The main idea behind the MapReduce framework is that users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key [1].
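The map and reduce steps described above can be sketched in a few lines of Python. This is a single-machine illustration only, not the distributed runtime of [1]: the names `map_cdr`, `reduce_durations`, and `run_mapreduce` are hypothetical, and the sample records are made-up (A-number, B-number, duration) tuples.

```python
from collections import defaultdict


def map_cdr(record):
    """Map: emit one (A-number, duration) pair for a call record."""
    a_number, _b_number, duration = record
    yield a_number, duration


def reduce_durations(a_number, durations):
    """Reduce: merge all durations that share the same A-number."""
    return a_number, (len(durations), sum(durations))


def run_mapreduce(records, mapper, reducer):
    """Group intermediate pairs by key, then reduce each group.

    A real MapReduce runtime would partition the records and run
    mappers and reducers on many machines; here everything runs
    in one process to show the data flow only.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


# Made-up sample records: (A-number, B-number, duration in seconds).
cdrs = [("0591", "0592", 60), ("0591", "0593", 120), ("0594", "0591", 30)]
totals = run_mapreduce(cdrs, map_cdr, reduce_durations)
```

In this sketch `totals` maps each calling number to its call count and total duration, the kind of per-subscriber aggregate a fraud model could use as an input feature.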
  • 4. Fig. 1 - MapReduce overview, Jeffrey Dean and Sanjay Ghemawat [1] Background Telecommunications is a particularly interesting environment because it generates and stores a huge amount of data, collected through its systems to record and reflect the company's operations and its subscribers' activity. One such data source is the call detail record (CDR), which holds information such as the A-number, B-number, duration, call path, and timestamps. Mieke Jans et al. (2010) present an overview of how they see the different fraud classifications and their relations to each other, shown in Figure 2. The most prominent classification is internal versus external fraud, since all other classifications are situated within internal fraud.
  • 5. They treat occupational fraud and abuse as equivalent to internal fraud. Figure 2 also shows that all remaining classifications apply only to corporate fraud. They further divide internal fraud using three classifications: first, a distinction between statement fraud and transaction fraud; second, a classification based on the occupation level of the fraudulent employee; and third, a classification for fraud against the company [6]. Fig. 2 - Fraud Classification Overview, Mert Sanver, Adem Karahoca [2] Fraud activity can be defined as a dishonest or illegal use of services with the intention of avoiding service charges. Fraud detection is the set of activities that identify unauthorized usage and prevent losses for mobile network operators [2]. Telecommunication companies often suffer revenue loss from customers' fraudulent behavior. There are different types of fraud in the telecommunication business [3]. Shawe-Taylor et al. (2000) present six different fraud types:
  • 6. subscription fraud, the manipulation of Private Branch Exchange (PBX) facilities or dial-through fraud, free phone fraud, premium rate service fraud, handset theft, and roaming fraud [5]. In the fraud detection process, call detail records are processed to determine whether a fraud attack has occurred and of what type, for example subscription fraud, premium rate service fraud, or roaming fraud. In subscription fraud, a fraudster obtains a subscription using fake personal information in order to register on the network and carry out fraudulent activity with no intention of paying the bill or fees [2]. Related work Telecom fraud is as old as the telecom companies themselves. These companies spend a lot of money to reduce fraudsters' attacks and to stay competitive with other operators by protecting themselves from potentially significant losses that could weaken their position against competitors. Many studies have addressed fraud detection and prevention; we review some of them here. Hamid Farvaresh et al. (2011) aimed to identify customers' subscription fraud in telecom by combining SOM and K-means techniques in a hybrid approach consisting of preprocessing, clustering, and classification phases, following the knowledge discovery process. Martin Häger et al. (2011) applied general outlier detection and classification methods to the problem of detecting fraudulent behavior in online advertisement metrics. Viaene et al. (2002, 2004) addressed automobile insurance fraud detection by combining the advantages of boosting with the explanatory power of the weight-of-evidence scoring framework in an AdaBoosted naive Bayes model. Brause et al. (1999) and Estévez et al. (2006) used a combination of neural networks and rules. Mert Sanver et al. (2009) propose the Adaptive Neuro-Fuzzy Inference System (ANFIS) method as a means of efficient fraud detection.
He et al. (1997) apply neural networks: a multi-layer perceptron network in the supervised component of their study and Kohonen's self-organizing maps in the unsupervised part. Fawcett et al. proposed an adaptive rule-based framework for fraud detection. Roset et al. state that standard classification and rule generation were not appropriate for fraud detection. D. Hawkins (1980) was interested in data outliers, since such data are more likely to be suspicious than regular, normally distributed data. R. Rastogi, S. Ramaswamy et al. (2000) extended the outlier method to rank a point by its distance from its k-th nearest neighbor, building on earlier work on distance-based outlier methods by R. Ng, E. Knorr et al. (2000).
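The distance-based outlier idea mentioned above, scoring each point by the distance to its k-th nearest neighbor and reporting the points with the largest scores, can be sketched as follows. This is a brute-force illustration under stated assumptions rather than the optimized algorithm of the cited work: the function names are hypothetical, the points stand in for made-up per-subscriber feature vectors, and `k` must be smaller than the number of points.

```python
import math


def kth_nn_distance(points, index, k):
    """Distance from points[index] to its k-th nearest neighbor."""
    target = points[index]
    distances = sorted(
        math.dist(target, other)
        for j, other in enumerate(points)
        if j != index
    )
    return distances[k - 1]


def top_outliers(points, k, n):
    """Indices of the n points with the largest k-th neighbor distance."""
    scores = [(kth_nn_distance(points, i, k), i) for i in range(len(points))]
    scores.sort(reverse=True)
    return [i for _, i in scores[:n]]


# Made-up feature vectors; the last one lies far from the rest.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
suspects = top_outliers(features, k=2, n=1)
```

The brute-force scan compares every pair of points, so it is only suitable for small samples; on CDR-scale data the pairwise distance computation is the part that would need to be distributed.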
  • 7. Contribution In the previous sections we mentioned various data mining techniques and how they can be used for fraud detection. In our work, we focus on designing and implementing the first fraud detection model for the telecom environment in a different domain, the MapReduce domain. We will use commodity machines and networks to implement our model, which will be the first live example of fraud detection using cloud computing. Our model will include an implementation of a data mining algorithm; initially we have selected the K-means algorithm. We also expect our model to operate in near-online mode to detect and classify fraud events, which will improve the ability to detect subscription fraud events early and lead to a major reduction in revenue losses. General Objective The output of our research is the design and implementation of a model that uses data mining to detect fraud cases targeting the telecom environment, where a huge volume of data has to be processed. The model is based on a cloud computing infrastructure that we will build using MapReduce, one of the most popular cloud computing frameworks. We will use data obtained from call detail records (CDR) in the billing repository, and the result is the subset of subscribers classified as fraudulent subscriptions, produced in near-online mode. This will help reduce the time needed to detect fraud events and improve the revenue assurance team's ability to identify fraudulent cases efficiently. Specific Objectives ▪ Collect the required data from a telecom operator. ▪ Identify the classification parameters required for the data mining process. ▪ Design a framework for fraud detection based on the MapReduce framework. ▪ Run the proposed framework on the data collected from the telecom operator, and analyze the results to evaluate our work in terms of performance and classification of fraud events.
  • 8. Scope of the work Our research addresses telecommunication fraud, and we will take one telecommunication operator as a case study. International Data Corporation has identified more than 200 forms of telecommunication fraud [12]. Throughout our research we will focus on one specific category, subscription fraud. The added value of our work Our framework provides the following added value: ▪ Design and implementation of the first fraud detection model for the telecom environment based on the MapReduce framework. ▪ Our system will deliver near-online results, which will improve the ability to detect subscription fraud events early and lead to a major reduction in revenue losses. ▪ Increased trust in the telecom operator that uses our system, because the company avoids many fraudulent attacks. Methodology We plan to build an environment for fraud detection and data mining for the telecom sector. Our framework will be built on top of the MapReduce framework. As mentioned earlier, MapReduce allows its users to parallelize and execute programs on a large cluster of machines by partitioning the input data and scheduling the program's execution over a set of machines. Because we are aware of the large volume of data generated every day, we selected MapReduce to help us build the distributed environment for our framework.
  • 9. Our framework will use fraud detection and data mining algorithms whose implementations are adapted to the MapReduce framework, so that they run in parallel on a cluster of machines. We plan to use SunGrid clusters to build our own distributed environment, as SunGrid is open source and free to use. Alternatively, we can use the infrastructure of a cloud computing vendor such as Amazon or Google.
Our framework will implement at least one classification algorithm. The algorithm or algorithms will be used to detect subscription fraud cases and to build a model from a set of training data. This model is subsequently used to classify new data entered into the system. We will try to implement more than one algorithm in order to compare their results and performance in the MapReduce environment. The K-means algorithm has been selected as the starting point of our work.
We are organizing the work in our thesis as follows:
- Prepare all the research required for our thesis, taking advantage of related work that is not necessarily in telecom and may come from other fields.
- Design our fraud detection model based on the MapReduce domain.
- Identify and extract the top N factors on which we will build our fraud detection data mining model, together with any other parameter or rule that can help us detect fraud events.
- Prepare the dataset, perform data cleaning (missing values, etc.), and divide the main dataset into a training dataset and a testing dataset.
- Set up the cloud infrastructure, including the MapReduce framework and SunGrid clusters, to build our own distributed environment.
- Adapt the K-means algorithm, selected as our starting point, to the framework.
- Test and validate the data mining results.
- Perform stress tests of our framework against datasets of various volumes and monitor its behavior.
- Refine our framework as necessary.
- Final review; complete and submit the final report.
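As an illustration of how K-means could be adapted to the map and reduce phases, the following minimal Python sketch expresses one K-means iteration in that style. It runs on a single machine and is only one possible formulation, not the final design of our framework. The function names and the representation of each subscriber as a tuple of numeric features are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: one tuple of numeric features per subscriber.
Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Map: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)),
                  key=lambda i: squared_distance(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(values: List[Tuple[Point, int]]) -> Point:
    """Reduce: average all points assigned to one centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(p[d] for p, _ in values) / count for d in range(dims))


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    """Run one map/shuffle/reduce pass and return the updated centroids."""
    grouped: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centroids)
        grouped[key].append(value)
    # A centroid with no assigned points keeps its previous position.
    return [kmeans_reduce(grouped[i]) if i in grouped else centroids[i]
            for i in range(len(centroids))]
```

The iteration would be repeated until the centroids stop moving or a maximum number of passes is reached. On a cluster, the map calls would run in parallel over partitions of the data, and only the per-centroid groups would be sent to the reducers.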
  • 10. Time table
Number | Tasks | Time Period
1 | Prepare all the research required for our thesis, taking advantage of related work that is not necessarily in telecom and may come from other fields | 4 weeks
2 | Design the proposed fraud detection framework based on the MapReduce framework, and identify the parameters required for the data mining process | 2 weeks
3 | Gather specific requirements, if any, especially those related to MapReduce setup, required data, programming language, and supporting technology | 2 weeks
5 | Set up the MapReduce and SunGrid environment on a distributed environment | 2 weeks
6 | Implement our detection framework, including coding | 4 weeks
7 | Phase one: full testing and bug fixing | 3 weeks
8 | Optimization and stress test | 2 weeks
9 | Phase two testing | 2 weeks
10 | Final review; complete and submit | 3 weeks
References
1- Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Google, Inc.
2- Mert Sanver, Adem Karahoca. Fraud Detection Using an Adaptive Neuro-Fuzzy Inference System in Mobile Telecommunication Networks.
3- Shawe-Taylor, J., Howker, K., Burge, P. Detection of Fraud in Mobile Telecommunications. Information Security Technical Report 4 (1), 16–28.
4- Shawe-Taylor, J., Howker, K., Gosset, P., Hyland, M., Verrelst, H., Moreau, Y., et al. Novel techniques for profiling and fraud in mobile telecommunication. In: Lisboa, P.J.G., Edisbury, B., Vellido, A. (Eds.), Business Applications of Neural Networks. The State-of-the-Art of Real World Applications. World Scientific, Singapore, pp. 113–139.
5- Hamid Farvaresh, Mohammad Mehdi Sepehri, 2011. A data mining framework for detecting subscription fraud in telecommunication.
6- Mieke Jans, Nadine Lybaert and Koen Vanhoof, 2011. Framework for Internal Fraud Risk Reduction at IT Integrating Business Processes: The IFR² Framework.
  • 11.
7- Martin Häger, Torsten Landergren, 2010. Implementing best practices for fraud detection on an online advertising platform.
8- Viaene, S., Derrig, R., Baesens, B. & Dedene, G. (2002). A Comparison of State-of-the-Art Classification Techniques for Expert Automobile Insurance Claim Fraud Detection.
9- Viaene, S., Derrig, R. & Dedene, G. (2004). A Case Study of Applying Boosting Naive Bayes to Claim Fraud Diagnosis. IEEE Transactions on Knowledge and Data Engineering.
10- Fawcett, T. and Provost, F. (1997). Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery 1(3).
11- Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G. (1999). Discovery of fraud rules for telecommunications—challenges and solutions. Proceedings of the Fifth ACM SIGKDD International.
12- Ojuka Nelson, 2009. Detection of Subscription Fraud in Telecommunications Using Decision Tree Learning.
13- D. Hawkins, 1980. "Identification of outliers," Chapman and Hall, Reading, London.
14- R. Ng, E. Knorr and T. Tucakov, "Distance-based outliers: algorithms and applications," vol. 8, no. 3, pp. 237–253, 2000.
15- R. Rastogi, S. Ramaswamy and S. Kyuseok, "Efficient algorithms for mining outliers from large data sets," SIGMOD '00, 2000.