www.artivatic.com contact@artivatic.com
Copyright © Artivatic Data Labs Private Limited
Artivatic AI Labs Capabilities for
Anomaly & Fraud Detection
Version 1.0
Artivatic Technology Team
Ownership
Artivatic Data Labs Private Limited is the owner of this document. Unless otherwise specified, no part of this document may
be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm,
without permission in writing from Artivatic. Similarly, distribution of this document to a third party is also prohibited unless
specific approval is obtained from Artivatic.
Proposed Possible Solution by Artivatic
The Artivatic team studied the problem, considering the required output, needs and processes, to identify
the best solution for anomaly detection on time-series data and for fraud detection in multiple
sectors.
Introduction
Anomaly detection is the identification of items, events or observations which do not
conform to an expected pattern or other items in a dataset. The goal of anomaly detection is to
identify unusual or suspicious cases based on deviation from the norm within data that is
seemingly homogeneous.
Categories of anomaly detection techniques
Three broad categories of anomaly detection techniques exist.
1. Unsupervised anomaly detection techniques detect anomalies in an unlabelled test
data set, under the assumption that the majority of the instances in the data set are
normal, by looking for the instances that fit least with the remainder of the data set.
2. Supervised anomaly detection techniques require a data set that has been labelled
as "normal" and "abnormal" and involve training a classifier (the key difference from
many other statistical classification problems is the inherently unbalanced nature of
outlier detection).
3. Semi-supervised anomaly detection techniques construct a model representing
normal behaviour from a given normal training data set, and then test the
likelihood that a test instance was generated by the learnt model.
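To make the semi-supervised case concrete, here is a minimal Python sketch (not Artivatic's implementation) that learns a 1-D Gaussian model from normal-only training data and flags a test instance when the model is less likely to have generated it than any training instance. The Gaussian model, the threshold rule and the sample figures are all illustrative assumptions.

```python
import math
from statistics import mean, stdev


def fit_normal_model(normal_values):
    """Learn a 1-D Gaussian (mean, std) from training data assumed to be all normal."""
    return mean(normal_values), stdev(normal_values)


def log_likelihood(x, model):
    """Log-density of x under the learnt Gaussian model."""
    mu, sigma = model
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit_threshold(normal_values, model):
    """Assumed rule: the lowest log-likelihood observed on the normal training data."""
    return min(log_likelihood(v, model) for v in normal_values)


def is_anomalous(x, model, threshold):
    """Anomalous if the model is less likely to generate x than any training instance."""
    return log_likelihood(x, model) < threshold


# Illustrative figures only (e.g. normal daily transaction counts, in thousands).
normal_training = [10, 11, 9, 10, 10.5, 9.5]
model = fit_normal_model(normal_training)
threshold = fit_threshold(normal_training, model)
print(is_anomalous(10.2, model, threshold))  # False
print(is_anomalous(15, model, threshold))    # True
```

In practice the threshold would usually be tuned on held-out data rather than taken as the training minimum.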
Techniques
A. Density-based spatial clustering of applications with noise (DBSCAN)
1. Introduction:
a. DBSCAN works on the density of the data points.
b. Each cluster has a considerably higher density of points than the region outside the cluster.
2. Algorithm:
a. Arbitrarily select a point p.
b. Retrieve all points density-reachable from p with respect to the neighbourhood radius (Eps) and MinPts.
c. If p is a core point, a cluster is formed.
d. If p is a border point, no points are density-reachable from p, and DBSCAN visits the
next point of the database.
e. Continue the process until all of the points have been processed.
3. How can we apply it?
a. The Client should provide the sales data per city/zone.
b. We will apply the DBSCAN algorithm to the historical as well as the current data.
c. Sales figures tend to stay within a small range; e.g., the sales of ‘Noodles’ in Bangalore are
approx. 10k every month, so the density of data points near 10k is higher.
d. DBSCAN works on density. If a sales figure is not within the historical range of the data,
we consider it an anomaly. Ex: If in the month of June the sales of noodles go down to 8k,
we flag it as an anomaly.
4. Advantage: No model training or model building is needed.
5. Disadvantage: If the same anomaly occurs multiple times, it is not detected, because the repeated values can form a dense cluster of their own.
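A minimal sketch of the DBSCAN steps above, applied to the noodles example, is given below. It assumes 1-D sales values with absolute difference as the distance; the helper names (`region_query`, `sales_anomalies`) and all figures are hypothetical. The last call reproduces the disadvantage in point 5: a value repeated at least MinPts times forms its own cluster and is no longer reported.

```python
NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (1-D absolute distance), including i."""
    return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN over 1-D values; returns a cluster id per point, NOISE for outliers."""
    labels = [None] * len(points)  # None = not yet visited
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point of a cluster
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = list(neighbours)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


def sales_anomalies(history, current, eps, min_pts):
    """Cluster historical + current sales together; current values labelled NOISE are anomalies."""
    labels = dbscan(history + current, eps, min_pts)
    return [v for v, lab in zip(current, labels[len(history):]) if lab == NOISE]


# Hypothetical monthly noodles sales in Bangalore (around 10k).
history = [10000, 10200, 9800, 10100, 9900, 10050, 9950, 10150]
print(sales_anomalies(history, [8000], eps=300, min_pts=3))   # [8000]
print(sales_anomalies(history, [10080], eps=300, min_pts=3))  # []
print(sales_anomalies(history, [8000, 8000, 8000], eps=300, min_pts=3))  # []: repeated anomaly missed
```

The choice of `eps` and `min_pts` drives the result; in practice they would be tuned per product and zone.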
B. Regression Based Anomaly Detection
1. Introduction
a. Regression analysis is a form of predictive modelling technique that investigates the
relationship between a dependent (target) variable and one or more independent (predictor) variables.
b. This technique is used for forecasting, time-series modelling and finding cause-and-effect
relationships between the variables.
2. Types of Regression
a. Linear Regression
b. Logistic Regression
c. Polynomial Regression
d. Stepwise Regression
3. How can we apply it for the Client?
a. The Client should provide data in time-series format.
b. From the time-series data, we can predict the expected sales for the future.
c. If the sales figure for a particular time is greater than the predicted value + threshold, or less
than the predicted value - threshold, it is detected as an anomaly.
d. We calculate the threshold from the average of the previous prediction error values.
e. The threshold is re-calculated automatically after every training.
4. Advantage: Repeated anomalies are also detected.
5. Disadvantage: Requires creating a prediction model and is slower than DBSCAN.
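The regression-based check can be sketched as follows, assuming a straight-line trend over the time index fitted by ordinary least squares (any of the regression types listed above could be substituted). The multiplier `k` on the average absolute error is a hypothetical tuning parameter; the document only states that the threshold is derived from previous errors. The fit needs at least two history points.

```python
def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t, with t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = s_ty / s_tt
    return y_mean - slope * t_mean, slope


def check_next_value(history, new_value, k=2.0):
    """Flag new_value if it falls outside predicted +/- threshold.

    The threshold is k times the mean absolute error of the fit on history. It is
    recomputed on every call, i.e. after every re-training. k is an assumed knob.
    """
    intercept, slope = fit_line(history)
    errors = [abs(y - (intercept + slope * t)) for t, y in enumerate(history)]
    threshold = k * sum(errors) / len(errors)
    predicted = intercept + slope * len(history)
    return abs(new_value - predicted) > threshold, predicted, threshold


monthly = [100, 103, 105, 108, 110, 113, 115, 118]  # hypothetical monthly sales
flag, predicted, threshold = check_next_value(monthly, 120.5)
print(flag, round(predicted, 2), round(threshold, 2))  # False 120.36 0.48
print(check_next_value(monthly, 90)[0])                # True
```

Because the model follows the trend, a value that recurs but deviates from the trend is still flagged every time, which is the advantage noted in point 4.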
Types of Anomalies
Anomalies can be broadly categorized as:
1. Point anomalies:
a. A single instance of data is anomalous if it's too far off from the rest.
b. Business use case: Detecting credit card fraud based on "amount spent."
Fig 1. Point anomalies
2. Contextual anomalies:
a. An individual data instance is anomalous within a context
b. Requires a notion of context
c. Also referred to as “conditional anomalies”.
d. Business use case: Spending $100 on food every day during the holiday season
is normal, but may be odd otherwise.
Fig 2. Contextual anomalies
3. Collective anomalies:
a. A collection of related data instances is anomalous
b. Requires a relationship among data instances
i. Sequential Data
ii. Spatial Data
iii. Graph Data
c. The individual instances within a collective anomaly are not anomalous by
themselves.
d. Business use case: Someone unexpectedly copying data from a remote machine to a
local host, an anomaly that would be flagged as a potential cyber
attack.
Fig 3. Collective anomalies
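The difference between a point anomaly and a contextual anomaly can be illustrated with the food-spending example. The sketch below uses the modified z-score (median and median absolute deviation, with the commonly used 0.6745 factor and 3.5 cut-off) as a stand-in point-anomaly test; this choice and the sample figures are assumptions, not part of the original proposal. Applied to all amounts together, the test finds nothing; applied separately within each context, it flags $100 spent on a regular day.

```python
from collections import defaultdict
from statistics import median


def point_outliers(values, cutoff=3.5):
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds cutoff."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:  # at least half the values equal the median
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > cutoff]


def contextual_outliers(records, cutoff=3.5):
    """Run the same test separately inside each context; records are (context, value)."""
    by_context = defaultdict(list)
    for i, (context, _value) in enumerate(records):
        by_context[context].append(i)
    flagged = []
    for indices in by_context.values():
        values = [records[i][1] for i in indices]
        flagged += [indices[j] for j in point_outliers(values, cutoff)]
    return sorted(flagged)


# Hypothetical daily food spend in dollars.
records = [("regular", 20), ("regular", 25), ("regular", 22), ("holiday", 95),
           ("holiday", 100), ("holiday", 105), ("regular", 18), ("regular", 24),
           ("holiday", 98), ("holiday", 102), ("regular", 21), ("regular", 100)]
print(point_outliers([v for _, v in records]))  # []: $100 is common overall
print(contextual_outliers(records))             # [11]: $100 on a regular day
```

A median-based score is used here because a plain z-score can miss a single large outlier in small samples.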
Solution Process by Type of Anomaly
1. Point anomalies:
a. Input data points not conforming to historical models are identified as
anomalous.
Ex: The sales value of a particular product is very high compared to the overall
average sales of that product.
b. A large difference in sales between two similar products from the same brand should be
identified as an outlier.
Ex: If the sales of “AXE Dark Temptation” and the sales of “AXE Blast
Deodorant” differ hugely, this can be anomalous.
c. A large gap between the sales figures of competing brands should be identified.
Ex: Sales of one brand of “shampoo” are decreasing compared to the sales of a competing brand
of “shampoo”.
d. Solution: Point anomalies can be found using the AV Hybrid approach for anomaly
detection.
2. Contextual anomalies:
a. A sudden spike (positive or negative) is considered a
contextual anomaly.
Ex: A sudden 25% drop in the sales of a particular product may be considered
an anomaly.
b. A random change, or a value that is not correct according to the context of
the attribute, is also considered an anomaly.
Ex: A sales value of $0 for the product on weekdays.
c. Solution: To find contextual anomalies, we compare the input data with the
latest historical data and check whether there is a sudden drop above the threshold. We also
identify the context and apply the attribute's context to find the anomaly.
3. Collective anomalies:
a. A continuous change in the figures can be considered an anomaly.
Ex: A continuous drop in sales for 4-6 months may be anomalous.
b. A constant value over a long period of time can be an anomaly.
Ex: A sales value of 0 for a whole week.
c. Solution: Compare the input data with the previous dataset. A constant value
can be an anomaly.
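The three solution rules above can be sketched as simple checks in Python. This is a hedged illustration rather than the AV Hybrid implementation: the peer-ratio limit `max_ratio`, the 25% `max_change` (taken from the example in 2a), the run lengths, the product names and all figures are hypothetical.

```python
from datetime import date
from statistics import median


def peer_outliers(sales_by_product, max_ratio=3.0):
    """(1) Point: flag products whose sales differ from the median of their peers by > max_ratio."""
    flagged = []
    for name, sales in sales_by_product.items():
        peers = [s for other, s in sales_by_product.items() if other != name]
        ratio = sales / median(peers)
        if ratio > max_ratio or ratio < 1 / max_ratio:
            flagged.append(name)
    return flagged


def contextual_reasons(day, value, recent, max_change=0.25):
    """(2) Contextual: $0 sales on a weekday, and a sudden change vs. recent history."""
    reasons = []
    if value == 0 and day.weekday() < 5:  # Monday=0 ... Friday=4
        reasons.append("zero sales on a weekday")
    baseline = sum(recent) / len(recent)
    if baseline and abs(value - baseline) / baseline > max_change:
        reasons.append("sudden change vs recent history")
    return reasons


def runs(values, continues_run, min_points):
    """(3) Collective: (start, end) index ranges where continues_run(prev, cur) holds
    across at least min_points consecutive values."""
    found, start = [], 0
    for i in range(1, len(values) + 1):
        if i < len(values) and continues_run(values[i - 1], values[i]):
            continue
        if i - start >= min_points:
            found.append((start, i - 1))
        start = i
    return found


print(peer_outliers({"variant_a": 5000, "variant_b": 400, "variant_c": 4800, "variant_d": 5200}))
# ['variant_b']
recent = [1000, 1050, 980, 1020]
print(contextual_reasons(date(2024, 6, 3), 0, recent))    # Monday: both reasons
print(contextual_reasons(date(2024, 6, 2), 990, recent))  # Sunday, near baseline: []
print(runs([100, 98, 90, 85, 80, 82, 83], lambda a, b: b < a, min_points=5))  # [(0, 4)]
print(runs([0, 0, 0, 0, 0, 0, 0, 12, 15], lambda a, b: b == a, min_points=7))  # [(0, 6)]
```

The same `peer_outliers` check can be run over competing brands (rule 1c) by passing brand-level totals instead of product variants.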
Artivatic’s Approach for the Client’s Anomaly and Fraud Detection
1. A ‘hybrid approach for anomaly detection’ will be developed.
2. The hybrid approach combines the advantages of both DBSCAN-based anomaly detection
and regression-based anomaly detection.
3. Initially, we will apply PCA for dimensionality reduction on the given dataset, which helps
reduce the complexity and runtime of anomaly detection.
4. After PCA, we will apply the regression-based anomaly detection algorithm to the
dataset to get the initial set of anomalies.
5. In regression-based anomaly detection, a data point that crosses the threshold limit is
considered an anomaly.
6. However, in some edge cases, data points showing correct behaviour are identified as anomalous.
For example, if the sales figures are 0 every Sunday, these will be detected as outliers
by the regression-based algorithm.
7. To verify the outliers detected by the regression model, the list of attributes detected as
anomalies will be checked using density-based anomaly detection.
8. The outliers that remain after the density-based check will be labelled as anomalous.
9. To find contextual anomalies, we compare the input data with the latest historical data
and check whether there is a sudden drop above the threshold. We also identify the context and apply
the attribute's context to find the anomaly.
10. To find collective anomalies, the input data will be compared with the previous
dataset; a constant value may indicate an anomaly.
Details of steps 9 and 10 will be defined after further analysis of the given datasets.
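Steps 4 to 8 of the hybrid approach can be sketched end to end as below. This is a simplified illustration under stated assumptions: the series is univariate, so the PCA step (step 3) is omitted; the regression stage is a least-squares trend with a threshold of `k` times the mean absolute error; and the density stage asks whether a candidate would be DBSCAN noise, i.e. whether it lies within `eps` of no core point. The values of `k`, `eps` and `min_pts` and the sample series are hypothetical.

```python
def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t over t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = s_ty / s_tt
    return y_mean - slope * t_mean, slope


def regression_candidates(series, k=2.0):
    """Steps 4-5: indices whose absolute error exceeds k * mean absolute error."""
    intercept, slope = fit_line(series)
    errors = [abs(y - (intercept + slope * t)) for t, y in enumerate(series)]
    threshold = k * sum(errors) / len(errors)
    return [t for t, e in enumerate(errors) if e > threshold]


def is_density_noise(value, series, eps, min_pts):
    """Step 7: True if value lies within eps of no core point, i.e. DBSCAN noise."""
    cores = [p for p in series if sum(abs(p - q) <= eps for q in series) >= min_pts]
    return not any(abs(value - c) <= eps for c in cores)


def hybrid_anomalies(series, k=2.0, eps=5.0, min_pts=3):
    """Step 8: keep only the regression candidates that the density check confirms."""
    candidates = regression_candidates(series, k)
    confirmed = [t for t in candidates if is_density_noise(series[t], series, eps, min_pts)]
    return candidates, confirmed


# Hypothetical daily sales, Monday to Sunday, with 0 every Sunday as in step 6.
week = [100, 102, 98, 101, 99, 100, 0]
series = week * 3
series[10] = 300  # one genuine spike
candidates, anomalies = hybrid_anomalies(series)
print(candidates)  # [6, 10, 13, 20]: every Sunday plus the spike
print(anomalies)   # [10]: recurring Sunday zeros are cleared
```

Running the cheaper regression pass first keeps the quadratic density check limited to a short candidate list.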
System Architecture
Technologies, Languages and Cloud Platform
Technologies we use for architecture, backend and models/algorithms:
Scala, C++, Java, Hadoop, OpenCV, Angular2
Our APIs/SDKs are also available for Scala, Java, JavaScript, PHP, Android and iOS. New
technology wrappers can be built quickly based on the Client’s technology.
Database: Cassandra [we can store data in any other database based on the Client's needs]
For ML processing: Hadoop, Mahout
For NLP tools: Java, C++
Cloud Platform: AWS, Google, Azure, own servers (it can also be installed on the Client's
local/private servers; there are no such restrictions)
The technologies carry little dependency, as we will need some time to create the
required technology-focused software as per the Client's needs; hence there is no technology, cloud, server
or data dependency for Artivatic to integrate.
Estimation for Completion Time for the PoC
Task Name | Task in Details | Approx. Hours Needed (Hrs)
A. Data retrieval from the client | | 60
B. Data cleaning and pre-processing | | 40
C. DBSCAN approach | | 40
D. Regression-based approach | | 40
E. Integration and creating the hybrid approach | | 20
F. Contextual anomalies finding module | | 80
G. Collective anomalies | | 80
H. Dashboard | Depends on whether it needs to be built separately or integrated directly with the Client's existing dashboard | 80
I. Internal testing | | 50
J. Integration with the client and testing with the client | On the Client's servers | 80
Total | | Approx. 475-570 Hrs
Approximate Time = 60-70 days [if a single person works on it]
The duration can be reduced considerably by allocating more resources, and the PoC can be finished within 1 month
if needed by the Client. Timings are flexible.
Disclaimer: This document has been released solely for educational and informational purposes. Artivatic does not make any
representations or warranties whatsoever regarding the quality, reliability, functionality, or compatibility of the products,
solutions, services and technologies mentioned herein. Depending on specific situations, products and solutions may need
customization, and performance and results may vary.

How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 

Fraud detection - Retail, Banking, Finance & FMCG

www.artivatic.com contact@artivatic.com
Copyright © Artivatic Data Labs Private Limited
Artivatic AI Labs Capabilities for
Anomaly & Fraud Detection
Version 1.0
Artivatic Technology Team
Ownership
Artivatic Data Labs Private Limited is the owner of the document. Unless otherwise specified, no part of this document may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from Artivatic. Similarly, distribution of this document to a third party is also prohibited unless specific approval is taken from Artivatic.
Proposed Possible Solution by Artivatic
The Artivatic team studied the problem, considering the required output, needs and processes, to identify the best solution for anomaly detection on time-series data and for fraud detection across multiple sectors.
Introduction
Anomaly detection is the identification of items, events or observations which do not conform to an expected pattern or to other items in a dataset. The goal of anomaly detection is to identify unusual or suspicious cases based on their deviation from the norm within data that is seemingly homogeneous.
Categories of anomaly detection techniques
Three broad categories of anomaly detection techniques exist.
1. Unsupervised anomaly detection techniques detect anomalies in an unlabelled test data set, under the assumption that the majority of the instances in the data set are normal, by looking for the instances that fit least well with the remainder of the data set.
2. Supervised anomaly detection techniques require a data set that has been labelled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of outlier detection).
3. Semi-supervised anomaly detection techniques construct a model representing normal behaviour from a given normal training data set, and then test the likelihood that a test instance was generated by the learnt model.
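To make the semi-supervised idea in point 3 concrete, here is a minimal Python sketch. It assumes the model of normal behaviour is simply a Gaussian summarised by its mean and standard deviation, and treats a large z-score as a low likelihood of being generated by that model. The function names, the 3.0 cutoff and the data are hypothetical.

```python
import statistics

def fit_normal_model(normal_values):
    """Learn a simple model of normal behaviour: mean and sample standard deviation."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def is_anomalous(value, model, z_threshold=3.0):
    """Flag a test instance whose z-score under the learnt model exceeds the threshold
    (a large z-score corresponds to a low Gaussian likelihood)."""
    mean, std = model
    return abs(value - mean) / std > z_threshold

# Hypothetical training set containing only normal observations.
normal_training = [100, 102, 98, 101, 99, 103, 97, 100]
model = fit_normal_model(normal_training)
print([is_anomalous(v, model) for v in [101, 99, 140]])  # [False, False, True]
```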
Techniques
A. Density-based spatial clustering of applications with noise (DBSCAN)
1. Introduction:
a. DBSCAN works on the density of the data points.
b. Each cluster has a considerably higher density of points than the region outside the cluster.
2. Algorithm:
a. Arbitrarily select a point p.
b. Retrieve all points density-reachable from p with respect to the neighbourhood radius (Eps) and MinPts.
c. If p is a core point, a cluster is formed.
d. If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
e. Continue the process until all of the points have been processed.
3. How can we apply it?
a. The Client should provide sales data per city/zone.
b. We will apply the DBSCAN algorithm to the historical as well as the current data.
c. Sales figures tend to stay within a small range, e.g. sales of ‘Noodles’ in Bangalore are approx. 10k every month. Hence the density of data points near 10k is higher.
d. DBSCAN works on density: if the sales figure is not within the historical range of the data, we consider it an anomaly. Ex: If sales of noodles drop to 8k in June, we flag it as an anomaly.
4. Advantage: No need to train or build a model.
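The DBSCAN steps above can be sketched in plain Python for one-dimensional sales figures. This is an illustrative implementation, not Artivatic's code: `eps` plays the role of the neighbourhood radius, `min_pts` is MinPts, the monthly figures are made up to mirror the noodles example, and points labelled -1 (noise) are treated as anomalies.

```python
def region_query(values, i, eps):
    """Indices of all points within distance eps of point i (including i itself)."""
    return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

def dbscan(values, eps, min_pts):
    """Return a cluster label per point; -1 marks noise (candidate anomalies)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(values)
    cluster_id = 0
    for i in range(len(values)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(values, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(values, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels

# Hypothetical monthly noodle sales; the last value is the 8k month.
sales = [10000, 10200, 9800, 10100, 9900, 10050, 8000]
labels = dbscan(sales, eps=300, min_pts=3)
anomalies = [v for v, label in zip(sales, labels) if label == -1]
print(anomalies)  # [8000]
```

Appending two more readings near 8k (for example 8050 and 7950) makes the 8k values dense enough to form their own cluster, so none of them is reported as noise. This reproduces the "same anomaly occurs multiple times" weakness of the density approach.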
5. Disadvantage: If the same anomaly occurs multiple times, it is not detected, because the repeated values form a dense region of their own.
B. Regression Based Anomaly Detection
1. Introduction
a. Regression analysis is a form of predictive modelling technique which investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
b. This technique is used for forecasting, time-series modelling and finding cause-and-effect relationships between variables.
2. Types of Regression
a. Linear Regression
b. Logistic Regression
c. Polynomial Regression
d. Stepwise Regression
3. How can we apply it for the Client?
a. The Client should provide data in time-series format.
b. From the time-series data we can predict the expected sales for future periods.
c. If the sales figure for a particular time is greater than the predicted value + threshold or less than the predicted value - threshold, it is detected as an anomaly.
d. We calculate the threshold from the average of the previous error values.
e. The threshold is re-calculated automatically after every training run.
4. Advantage: Repeated anomalies are also detected.
5. Disadvantage: Requires building a prediction model, and is slower than DBSCAN.
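The regression steps above (predict, then flag values outside predicted ± threshold, with the threshold derived from the average previous error) can be sketched with a linear trend fitted by ordinary least squares. The choice of linear regression from the list, the multiplier `k = 3.0` and the sample figures are assumptions for illustration. Calling `train` again on new data recomputes the threshold, in the spirit of point e.

```python
def fit_linear_trend(ys):
    """Ordinary least-squares fit of y = a + b*t, with t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean, y_mean = (n - 1) / 2, sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
         / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - b * t_mean, b

def train(history, k=3.0):
    """Fit the trend and derive the threshold from the average absolute training error."""
    a, b = fit_linear_trend(history)
    errors = [abs(y - (a + b * t)) for t, y in enumerate(history)]
    threshold = k * sum(errors) / len(errors)
    return a, b, threshold

def is_anomaly(t, actual, model):
    """Anomalous if the actual value lies outside predicted +/- threshold."""
    a, b, threshold = model
    return abs(actual - (a + b * t)) > threshold

history = [101, 109, 121, 129, 141, 149]  # hypothetical monthly sales, t = 0..5
model = train(history)
print([is_anomaly(6, y, model) for y in (160, 130, 190)])  # [False, True, True]
```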
Types of Anomalies
Anomalies can be broadly categorized as:
1. Point anomalies:
a. A single instance of data is anomalous if it is too far off from the rest.
b. Business use case: Detecting credit card fraud based on "amount spent."
Fig1. Point anomalies
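For the "amount spent" use case, one common way to flag a point anomaly without labels is the modified z-score based on the median absolute deviation (MAD), with the frequently used 3.5 cutoff. This is a swapped-in technique chosen for illustration, not one named in this document, and the transaction amounts are hypothetical.

```python
import statistics

def mad_outliers(amounts, cutoff=3.5):
    """Flag amounts whose modified z-score (0.6745 * |x - median| / MAD) exceeds the cutoff."""
    med = statistics.median(amounts)
    mad = statistics.median([abs(x - med) for x in amounts])
    if mad == 0:
        # More than half the values equal the median: flag everything else.
        return [x for x in amounts if x != med]
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > cutoff]

amounts = [20, 35, 25, 40, 30, 22, 28, 900]  # hypothetical card spends
print(mad_outliers(amounts))  # [900]
```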
2. Contextual anomalies:
a. An individual data instance is anomalous within a context.
b. Requires a notion of context.
c. Also referred to as “conditional anomalies”.
d. Business use case: Spending $100 on food every day during the holiday season is normal, but may be odd otherwise.
Fig2. Contextual anomalies
3. Collective anomalies:
a. A collection of related data instances is anomalous.
b. Requires a relationship among data instances:
i. Sequential Data
ii. Spatial Data
iii. Graph Data
c. The individual instances within a collective anomaly are not anomalous by themselves.
d. Business use case: Someone unexpectedly copying data from a remote machine to a local host, an anomaly that would be flagged as a potential cyber attack.
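The holiday example can be sketched by keeping a separate baseline per context, so that the same $100 amount is judged against the statistics of its own context. The context labels, amounts and 3.0 z-score cutoff below are hypothetical.

```python
import statistics
from collections import defaultdict

def fit_context_baselines(history):
    """Group past (context, amount) pairs and learn mean/stdev for each context."""
    by_context = defaultdict(list)
    for context, amount in history:
        by_context[context].append(amount)
    return {c: (statistics.mean(v), statistics.stdev(v)) for c, v in by_context.items()}

def is_contextual_anomaly(context, amount, baselines, z_threshold=3.0):
    """Judge an amount only against the baseline of its own context."""
    mean, std = baselines[context]
    return abs(amount - mean) / std > z_threshold

history = [("holiday", 100), ("holiday", 95), ("holiday", 105), ("holiday", 100),
           ("regular", 30), ("regular", 25), ("regular", 35), ("regular", 30)]
baselines = fit_context_baselines(history)
print(is_contextual_anomaly("holiday", 100, baselines))  # False
print(is_contextual_anomaly("regular", 100, baselines))  # True
```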
Fig 3. Collective anomalies
Types of Anomalies: Solution Process
1. Point anomalies:
a. Input data points not conforming to historical models are identified as anomalous. Ex: The sales value of a particular product is very high compared to the overall average sales of that product.
b. A large difference between two similar products from the same brand should be identified as an outlier. Ex: If the sales of “AXE Dark Temptation” and the sales of “AXE Blast Deodorant” differ hugely, this can be anomalous.
c. A large gap between the sales figures of competing brands should be identified. Ex: Sales of one brand of “shampoo” are decreasing compared to the sales of a competing brand of “shampoo”.
d. Solution: Point anomalies can be found using the AV Hybrid approach for anomaly detection.
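One possible way to implement checks (b) and (c) is to compare the current sales ratio between two related products (or competing brands) with its historical ratio. This is an assumption about how a "large difference" might be measured; the ratio z-score, the cutoff and the weekly figures are illustrative only.

```python
import statistics

def ratio_anomaly(hist_a, hist_b, current_a, current_b, z_threshold=3.0):
    """Flag the current A/B sales ratio if it deviates strongly from the historical ratio."""
    ratios = [a / b for a, b in zip(hist_a, hist_b)]
    mean, std = statistics.mean(ratios), statistics.stdev(ratios)
    current = current_a / current_b
    return abs(current - mean) / std > z_threshold

# Hypothetical weekly sales of two variants from the same brand.
hist_a = [500, 525, 475, 510, 490]
hist_b = [250, 250, 250, 250, 250]
print(ratio_anomaly(hist_a, hist_b, 510, 250))  # False
print(ratio_anomaly(hist_a, hist_b, 300, 250))  # True
```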
2. Contextual anomalies:
a. A sudden spike (positive or negative) is considered a contextual anomaly. Ex: A sudden 25% drop in sales for a particular product is considered an anomaly.
b. A random change, or a value that is not correct according to the context of the attribute, is also considered an anomaly. Ex: The sales value of the product is $0 on weekdays.
c. Solution: To find contextual anomalies, we compare the input data with the latest historical data and check whether there is a sudden drop above the threshold. We also determine the context of the attribute and apply it to find the anomaly.
3. Collective anomalies:
a. A continuous change in the figures can be considered an anomaly. Ex: A continuous drop in sales for 4-6 months may be anomalous.
b. A constant value over a long period of time can be an anomaly. Ex: A sales value of 0 for a whole week.
c. Solution: Compare the input data with the previous dataset. A constant value can be an anomaly.
Artivatic’s Approach for the Client’s Anomaly and Fraud Detection
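The contextual and collective rules above translate into simple checks on a series. In this sketch the 25% threshold comes from the sudden-drop example, while the "recent average" baseline, the 7-day run length for a whole week of zeros and the 4-drop streak are hypothetical parameter choices.

```python
def sudden_change(recent, current, threshold=0.25):
    """Contextual check: flag `current` if it deviates from the recent average
    by at least `threshold` (a fraction, so 0.25 means 25%)."""
    baseline = sum(recent) / len(recent)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / baseline >= threshold

def constant_runs(values, min_length):
    """Collective check: (start, length) of every run where one value repeats
    at least `min_length` times in a row."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = i
    return runs

def continuous_decline(values, min_drops):
    """Collective check: True if there are at least `min_drops` consecutive
    period-on-period drops."""
    streak = 0
    for prev, cur in zip(values, values[1:]):
        streak = streak + 1 if cur < prev else 0
        if streak >= min_drops:
            return True
    return False

print(sudden_change([100, 98, 102, 100], 70))                  # True (a 30% drop)
print(constant_runs([12, 0, 0, 0, 0, 0, 0, 0, 9], min_length=7))  # [(1, 7)]
print(continuous_decline([200, 190, 185, 170, 160, 150, 155], min_drops=4))  # True
```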
1. A ‘Hybrid approach for anomaly detection’ will be developed.
2. The hybrid approach combines the advantages of both DBSCAN-based anomaly detection and regression-based anomaly detection.
3. First, we will apply PCA for dimensionality reduction on the given dataset, which helps reduce the complexity and runtime of anomaly detection.
4. After PCA, we will apply the regression-based anomaly detection algorithm to the dataset to get an initial set of anomalies.
5. In regression-based anomaly detection, if a data point crosses the threshold limit, it is considered an anomaly.
6. However, in some edge cases, correctly behaving data points are identified as anomalous. For example, if sales figures are 0 every Sunday, they will be detected as outliers by the regression-based algorithm.
7. To verify the outliers detected by the regression model, the list of attributes detected as anomalous will be checked using density-based anomaly detection.
8. The outliers that remain after the density-based check will be labelled as anomalous.
9. To find contextual anomalies, we compare the input data with the latest historical data and check whether there is a sudden drop above the threshold. We also determine the context of the attribute and apply it to find the anomaly.
10. To find collective anomalies, the input data will be compared with the previous dataset. A constant value can be an anomaly.
Details of steps 9 and 10 will be defined after further analysis of the given datasets.
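Steps 4 to 8 of the hybrid approach can be sketched on a single daily series. A linear regression proposes candidates whose residual exceeds `k` times the mean absolute residual, and a simplified density check then discards candidates whose value recurs often (such as Sunday zeros). Assumptions: the PCA step is omitted because the toy series is one-dimensional, the density test just counts neighbours within `eps` rather than running full DBSCAN, and all parameters and data are hypothetical.

```python
def fit_linear_trend(ys):
    """Least-squares fit of y = a + b*t with t = 0, 1, 2, ..."""
    n = len(ys)
    t_mean, y_mean = (n - 1) / 2, sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
         / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - b * t_mean, b

def regression_candidates(ys, k=2.0):
    """Steps 4-5: indices whose residual exceeds k times the mean absolute residual."""
    a, b = fit_linear_trend(ys)
    residuals = [y - (a + b * t) for t, y in enumerate(ys)]
    threshold = k * sum(abs(r) for r in residuals) / len(residuals)
    return [t for t, r in enumerate(residuals) if abs(r) > threshold]

def density_confirms(ys, t, eps, min_pts):
    """Step 7 (simplified): keep a candidate only if its value lies in a sparse
    region, i.e. fewer than min_pts values (itself included) within eps."""
    neighbours = sum(1 for y in ys if abs(y - ys[t]) <= eps)
    return neighbours < min_pts

def hybrid_anomalies(ys, k=2.0, eps=5.0, min_pts=3):
    """Step 8: candidates that survive the density check are labelled anomalous."""
    return [t for t in regression_candidates(ys, k) if density_confirms(ys, t, eps, min_pts)]

week = [100, 100, 100, 100, 100, 100, 0]  # Mon..Sat, Sunday sales are always 0
sales = week * 3
sales[16] = 300  # injected spike on a Wednesday
print(regression_candidates(sales))  # [6, 13, 16, 20]
print(hybrid_anomalies(sales))       # [16]
```

The Sunday zeros are flagged by the regression step but rejected by the density step because three of them sit together, which is the edge case described in step 6.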
System Architecture
Technologies, Languages and Cloud Platform
Technologies we use for architecture, backend and models/algorithms: Scala, C++, Java, Hadoop, OpenCV, Angular2
Our APIs/SDKs are also available for Scala, Java, JavaScript, PHP, Android and iOS. New technology wrappers can be built quickly based on the Client’s technology.
Database: Cassandra [We can store data in any other database based on the Client’s needs]
For ML processing: Hadoop, Mahout
For NLP tools: Java, C++
Cloud Platform: AWS, Google, Azure, own servers (it can also be installed on the Client's local/private servers; there are no such restrictions)
The solution has little technology dependency: we may need some time to build the required technology-specific software as per the Client's needs, but there is no technology, cloud, server or data dependency for Artivatic to integrate.
Estimation for Completion Time for the PoC
Task Name | Task in Details | Approx. Hours Needed (Hrs)
A. Data retrieval from client | | 60
B. Data cleaning and pre-processing | | 40
C. DBSCAN approach | | 40
D. Regression-based approach | | 40
E. Integration and creating the hybrid approach | | 20
F. Contextual anomalies finding module | | 80
G. Collective anomalies | | 80
H. Dashboard | Depends on whether it needs to be built separately or integrated directly with the Client’s existing dashboard | 80
I. Internal testing | | 50
J. Integration with the client and testing with the client | On the Client’s servers | 80
Total Approx. Hours | | 475-570
Approximate time: 60-70 days (if a single person works on it)
The duration can be reduced considerably by allocating more effort, and the PoC can be finished within 1 month if needed by the Client. Timings are flexible.
Disclaimer: This document has been released solely for educational and informational purposes. Artivatic does not make any representations or warranties whatsoever regarding the quality, reliability, functionality or compatibility of the products and solutions, services and technologies mentioned herein. Depending on specific situations, products and solutions may need customization, and performance and results may vary.