This document describes the use of Naive Bayes classification to detect subscription fraud at a telecom company. Call detail records from three known fraudsters are used to build a model that predicts fraudulent subscribers: 70% of the records serve as training data and the remaining 30% as test data. The trained model is then applied to an audit log of 15 subscribers to predict which of them are most likely to be the known fraudsters Sally, Virginia or Vince, along with the associated probabilities. The document covers the datasets, the tools used (R and several R packages), data preprocessing, model building with 10-fold cross-validation, performance evaluation on the test data, and the analysis of the audit log.
Subscription fraud analytics using classification, by Somdeep Sen
A fictitious telecom company called Bad Idea introduced an unusual rate plan called the Praxis Plan, under which a subscriber may make only one call in each of four time slots: Morning (9 AM-Noon), Afternoon (Noon-4 PM), Evening (4 PM-9 PM) and Night (9 PM-Midnight), i.e. at most four calls per day. Despite the plan's popularity, Bad Idea became the target of subscription fraud by a gang of three fraudsters: Sally, Virginia and Vince. The company eventually terminated their service, and retains their call logs spanning one and a half months.
The company's analytics team has been given two data sets: the Black-List Subscriber Call-Logs and the Audit Log. The Black-List Subscriber Call-Logs data set captures the calling patterns of the three fraudsters, i.e. Sally, Virginia and Vince. Every five days the company runs an audit to check whether the fraudsters have rejoined its network, reviewing the list of subscribers who have called the same people as the three fraudsters in the same time slots. That list is provided in the Audit Log.
Test Data: http://bit.ly/1du9cRs
Training Data: http://bit.ly/1du9AQ1
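The original analysis was carried out in R; as an illustration only, the same idea can be sketched in Python with a hand-rolled categorical Naive Bayes classifier. All records, feature values and names below are invented for the sketch and are not taken from the actual Black-List Subscriber Call-Logs:

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy call records: (time_slot, callee) labelled with the fraudster
# who made the call. The real training set would be 70% of the call-log records.
train = [
    ("Morning", "A", "Sally"), ("Evening", "B", "Sally"),
    ("Morning", "A", "Virginia"), ("Night", "C", "Virginia"),
    ("Afternoon", "B", "Vince"), ("Night", "C", "Vince"),
]

def fit_naive_bayes(rows):
    """Estimate class priors and per-feature value counts (for Laplace-smoothed likelihoods)."""
    priors = Counter(r[-1] for r in rows)          # class -> record count
    likes = defaultdict(Counter)                   # (feature_idx, class) -> value counts
    for *feats, label in rows:
        for i, v in enumerate(feats):
            likes[(i, label)][v] += 1
    vocab = [set(r[i] for r in rows) for i in range(len(rows[0]) - 1)]
    return priors, likes, vocab, len(rows)

def predict(model, feats):
    """Return the most probable class and its normalised posterior probability."""
    priors, likes, vocab, n = model
    scores = {}
    for c, pc in priors.items():
        logp = math.log(pc / n)                    # log prior
        for i, v in enumerate(feats):
            cnt = likes[(i, c)][v]
            # Laplace smoothing: add 1 to the count, vocabulary size to the denominator.
            logp += math.log((cnt + 1) / (pc + len(vocab[i])))
        scores[c] = logp
    total = sum(math.exp(s) for s in scores.values())
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best]) / total

model = fit_naive_bayes(train)
label, prob = predict(model, ("Morning", "A"))
```

In the actual write-up this step corresponds to training the classifier on the 70% split with 10-fold cross-validation and checking accuracy on the held-out 30%.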
2. Introduction
Business Background
Objective
Datasets Description
Tools & Methods Used
Statistical Procedure
Future Direction
3. Telecommunication Fraud, the focus here, is appealing to
fraudsters because calling from a mobile terminal is not bound to any
physical location and it is easy to obtain a subscription.
Fraud hurts the company in four ways: financially, in marketing, in
customer relations and in shareholder perception.
In Subscription Fraud, fraudsters obtain an account with no intention
of paying the bill (theft of service). Thus, at the level of the phone
number, every call made from it is fraudulent, indicating abnormal
usage. The account is typically used for selling calls at cheaper
rates or for intensive personal use.
4. Bad Idea Company Ltd was a target of Subscription Fraud by a
gang of three fraudsters: Sally, Virginia and Vince.
Call logs of the fraudsters spanning one and a half months were
recorded.
An audit is undertaken every 5 days to check whether the above
fraudsters have joined the company network.
The list of subscribers is reviewed to identify those whose calling
patterns match those of the fraudsters.
Note: Praxis Plan –> 4 calls a day: one each in the Morning (9AM–Noon),
Afternoon (Noon–4PM), Evening (4PM–9PM) and Night (9PM–Midnight).
5. Our goal is to create a fraud-management classification model
that is powerful enough to handle the subscription fraud the company
has already encountered and flexible enough to apply to fraud
patterns not yet witnessed.
In this case, the company wants to be highly certain that a flagged
person is a fraudster, backed by a high confidence (probability).
Call detail records, which are generated in real time and available
immediately, can be used to build a robust statistical model.
10. Then, a random sample of 70% of the total instances from the Black List
Callers is selected as Training dataset and the remaining 30% as Test
dataset.
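The 70/30 split described above can be sketched in R. This is an illustrative sketch only: the data frame, column names (`Caller`, `Morning`, …) and the random seed are assumptions standing in for the actual Black-List Subscriber Call-Logs file, which would normally be loaded with `read.csv()`.

```r
# Made-up stand-in for the Black-List Subscriber Call-Logs; in practice
# this data frame would come from read.csv(). All names are assumptions.
set.seed(42)  # reproducible sampling
callers <- data.frame(
  Caller    = rep(c("Sally", "Virginia", "Vince"), each = 20),
  Morning   = sample(LETTERS[1:3], 60, replace = TRUE),
  Afternoon = sample(LETTERS[1:3], 60, replace = TRUE),
  Evening   = sample(LETTERS[1:3], 60, replace = TRUE),
  Night     = sample(LETTERS[1:3], 60, replace = TRUE)
)

# Random 70% of the instances for training, the remaining 30% for testing
train_idx <- sample(seq_len(nrow(callers)), size = round(0.7 * nrow(callers)))
train <- callers[train_idx, ]
test  <- callers[-train_idx, ]
```

Sampling row indices (rather than rows directly) makes it easy to take the complement for the test partition.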
11. Next, the class proportions of the target variable are checked in both
the Training and Test datasets to confirm they remain similar.
The attributes and labels of the Training and Test datasets are then
stored in separate variables (X~ and Y~ respectively) for convenience in
coding.
This is followed by building a Naïve Bayes classifier using 10-fold
cross-validation on the Training dataset (the data is broken into 10 folds
of size n/10; the model is trained on 9 folds and tested on the remaining
fold; the process is repeated 10 times and the mean accuracy is taken).
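One way to reproduce the 10-fold cross-validated Naïve Bayes fit is with the caret package (which wraps klaR's `NaiveBayes`). This is a sketch under the assumption that caret and klaR are installed; the built-in `iris` data stands in for the labelled training partition of the call logs.

```r
library(caret)  # assumes the caret and klaR packages are installed

# iris stands in here for the labelled training partition of the call logs
ctrl  <- trainControl(method = "cv", number = 10)   # 10 folds
model <- train(Species ~ ., data = iris,
               method = "nb", trControl = ctrl)     # Naive Bayes via klaR

mean(model$resample$Accuracy)  # mean accuracy over the 10 folds
```

`model$resample` holds the per-fold accuracy for the selected tuning parameters, so its mean is exactly the cross-validated accuracy the slide refers to.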
12.
13. The generated classification model is applied to the Test data to
predict the target class (the posterior probabilities are also shown in
the bottom half of the output).
14. After that, a confusion matrix is generated, comparing the Naive
Bayes model's predictions against the actual classes of the test
instances, to visualize the classification errors.
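The predict-then-tabulate step can be sketched with the e1071 package (an assumption; the slides' own variable names are not reproduced here), again with `iris` standing in for the call-log data.

```r
library(e1071)  # assumes the e1071 package is installed

set.seed(1)
idx   <- sample(nrow(iris), round(0.7 * nrow(iris)))
train <- iris[idx, ]
test  <- iris[-idx, ]

fit  <- naiveBayes(Species ~ ., data = train)
pred <- predict(fit, test)                    # predicted classes
post <- predict(fit, test, type = "raw")      # posterior probabilities

# Confusion matrix: model predictions versus actual classes
table(Predicted = pred, Actual = test$Species)
```

Off-diagonal cells of the table are exactly the classification errors the slide visualizes.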
15.
16. The AuditLog (CSV) file (the validation dataset) is read into the R
environment as shown below (working directory: My Documents):
17. At this stage, all the required independent attributes are stored in
a separate variable, and the same model is applied to the validation
dataset, this time to predict the probable fraudsters along with their
probabilities.
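Scoring the unlabelled audit rows follows the same pattern. In the sketch below (e1071 assumed installed, a few `iris` rows standing in for audit-log entries), the most probable fraudster and its posterior probability are extracted for each subscriber.

```r
library(e1071)  # assumes the e1071 package is installed

fit   <- naiveBayes(Species ~ ., data = iris)  # stand-in trained model
audit <- iris[c(5, 55, 105), -5]               # pretend audit-log rows (no label)

post <- predict(fit, audit, type = "raw")      # one posterior row per subscriber
data.frame(Predicted   = colnames(post)[max.col(post)],
           Probability = apply(post, 1, max))
```

`type = "raw"` returns the full class-posterior matrix, so each subscriber gets both a predicted identity and the confidence behind it.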
18.
19. From the above, we can infer that Customer X and Customer Z are most
likely Sally (judging by calling pattern) and Customer Y is most likely
Virginia.
20. The same results, with greater accuracy, can be obtained using the
e1071 package and Laplace correction, as shown here.
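The Laplace correction referred to on this slide is the `laplace` argument of e1071's `naiveBayes()`. The tiny categorical data frame below is made up purely to show the effect on the conditional probability tables.

```r
library(e1071)  # assumes the e1071 package is installed

# Made-up categorical stand-in for the call-log attributes
d <- data.frame(
  Caller  = factor(c("Sally", "Sally", "Vince", "Vince")),
  Evening = factor(c("C", "D", "C", "C"))
)

# Without smoothing, P(Evening = "D" | Vince) = 0 would zero out the
# posterior for Vince whenever "D" is observed; laplace = 1 adds one
# pseudo-count per level, so D becomes (0 + 1) / (2 + 2) = 0.25 instead.
fit <- naiveBayes(Caller ~ ., data = d, laplace = 1)
fit$tables$Evening  # smoothed conditional probability table
```

Smoothing matters here because the fraudsters' call logs are short, so rare attribute values are easily absent for a given class in training.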
21.
22. Naïve Bayes Classification in RapidMiner
Black List Callers
Split Validation (70:30)
Audit Log
4 Time Slots