3. Fraud Detection System – What Is It?
• A Fraud Detection System improves the productivity of the claims (analyst)
department in detecting fraud
– Higher detection with lower human effort
• Some challenges for an insurance agency
– Fraud cases are not labeled and are often unknown, i.e., not self-revealing
– Patterns of fraud change frequently. Old fraud patterns might not continue.
– Cases occur with relative rarity
– Few fraud cases within a small data set (base rate and sample size problems)
• Given the challenges
– Off-the-shelf products might not work effectively. They might yield a one-time
performance gain and level off afterwards
– A hand-crafted solution is needed, one that matures over time to fit a specific
insurance agency’s business lines
• Long-term view of the Fraud Detection System
– Instead of a one-time quick performance gain, Deep Blue proposes a long-term view of
fraud detection in which we continuously label new cases with the help of analysts
and improve the coverage of fraud cases.
4. Our Methodology for the Chartis Context
• Bootstrap the knowledge base of fraud detection
– Work with the existing fraud analyst team/experts to construct criteria for fraud cases
– Anomaly detection, through deep analysis of available data and features, generates a
large number of hypotheses for locating potential fraud cases
• This is done by detecting anomalies across various hierarchies (providers, claimants,
geographies, etc.) and across features within hierarchies
– This leads to a simple system that flags cases for labeling
• Deploy Machine Learning to analyze labeled cases and construct robust fraud
prediction models
– Adapt the algorithms to changing fraud patterns through periodic rebuilding
– Continuously force the fraud prediction models to explore other features (attributes) as
potential leading indicators of fraud. Expand the types of fraud that are uncovered.
• Make continuous effort to improve the quality of fraud detection case data
– In the bootstrapped system, cases flagged for review may not have a prediction (i.e.,
fraud or not, and “case of interest” or not) because of the lack of labeled data
– Active learning uses subsequently labeled cases to enable prediction of (1) fraud and
(2) cases of interest
5. Fraud Detection System (FDS) – Bootstrapping
[Diagram: FDS bootstrapping stage. Labels recoverable from the figure:]
• Fraud Analysts (Human)
• Labeled Cases; Labeled Case database
• Anomaly Detection
– Distribution Analysis
– Feature Analysis
• Expert System (bootstrapping knowledge base)
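The distribution analysis step can be illustrated with a minimal sketch, assuming anomalies are found by comparing one level of a hierarchy (here, providers) against its peers. The function name, field names, z-score test, and threshold are all illustrative assumptions, not the method used in the proposed system.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_anomalous_groups(claims, group_key, value_key, z_threshold=1.5):
    """Return group ids whose average value is an outlier among all groups."""
    values_by_group = defaultdict(list)
    for claim in claims:
        values_by_group[claim[group_key]].append(claim[value_key])

    group_means = {g: mean(v) for g, v in values_by_group.items()}
    if len(group_means) < 2:
        return []  # a z-score needs at least two groups

    overall = mean(list(group_means.values()))
    spread = stdev(list(group_means.values()))
    if spread == 0:
        return []  # all groups look identical, nothing stands out

    return sorted(
        g for g, m in group_means.items()
        if abs(m - overall) / spread > z_threshold
    )


# Hypothetical claims: provider id and claimed amount (made-up values)
claims = [
    {"provider": "P1", "amount": 100},
    {"provider": "P2", "amount": 110},
    {"provider": "P3", "amount": 90},
    {"provider": "P4", "amount": 105},
    {"provider": "P5", "amount": 450},
    {"provider": "P5", "amount": 550},
]
flagged = flag_anomalous_groups(claims, "provider", "amount")
print(flagged)  # ['P5']
```

The same function could be pointed at another hierarchy (claimant, geography) by changing group_key. Flagged groups are only hypotheses for analysts to label, not fraud verdicts.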
6. FDS – Expansion and Adaptation
[Diagram: FDS expansion and adaptation stage. Labels recoverable from the figure:]
• Expert System (decommissioned)
• Fraud Analysts (Human); Cases to Evaluate; Labeled Cases; Labeled Case database
• Anomaly Detection
– Distribution Analysis
– Feature Analysis
• Fraud Detection Engine
– Active Learner (ongoing monitoring)
– ML Predictive Models
– Rankings & Voting
– Adapting Models
• Case Stream
• Engine outcomes
– Fraud and/or Case of Interest
– Not Classifiable
– Not Fraud and Not Case of Interest
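One plausible reading of the diagram is that the engine scores each incoming case, sends cases it cannot classify to the analysts, and stores their labels for later model rebuilding. The sketch below illustrates that loop. The score thresholds, bucket names, and the score_fn / analyst_label_fn callables are hypothetical placeholders, not part of the proposed design.

```python
def route_case(score, low=0.2, high=0.8):
    """Map a model score in [0, 1] to one of the three outcomes in the diagram."""
    if score >= high:
        return "fraud_or_coi"
    if score <= low:
        return "not_fraud_not_coi"
    return "not_classifiable"  # the model is unsure; a human should look


def process_stream(cases, score_fn, analyst_label_fn, labeled_db):
    """Route each case; send unclassifiable ones to analysts and keep their labels."""
    routed = {"fraud_or_coi": [], "not_fraud_not_coi": [], "not_classifiable": []}
    for case in cases:
        bucket = route_case(score_fn(case))
        routed[bucket].append(case)
        if bucket == "not_classifiable":
            # Analyst labels grow the labeled case database used for rebuilding
            labeled_db.append((case, analyst_label_fn(case)))
    return routed


# Hypothetical case stream with precomputed scores (made-up values)
cases = [
    {"id": 1, "score": 0.95},
    {"id": 2, "score": 0.50},
    {"id": 3, "score": 0.05},
]
labeled_db = []
routed = process_stream(
    cases,
    score_fn=lambda c: c["score"],
    analyst_label_fn=lambda c: "NF, COI",  # stand-in for a human decision
    labeled_db=labeled_db,
)
print([c["id"] for c in routed["not_classifiable"]])  # [2]
```

Sending only the uncertain cases to analysts is what keeps human effort low while the labeled database keeps growing.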
7. Key Strengths Of Proposed Design
• Incremental design which produces incremental benefits at each step
• Extremely adaptable to changing patterns of fraud
• Modular in design and highly reusable across business lines
– Only minor customization is needed to adapt to a specific business line
• Low-risk investment approach
– Improves data collection and the knowledge repository around fraud detection
– Develops analytical infrastructure that builds fraud detection capabilities inside
Chartis
• Ability to apply best-of-breed techniques and the latest research advancements in
fraud detection
– Packaged products often lag cutting-edge modeling advancements by a few years
8. Sample Application – Auto Insurance Claim Fraud
[Diagram: the FDS applied to auto insurance claims. Labels recoverable from the figure:]
• Labeled Claims (from Fraud Analysts (Human), stored as Labeled Cases)
1) Bumper, injury, NY, …, F, COI*
2) Side collision, dent, OH, …, NF, NCOI
3) Flooding, radiator, AZ, …, NF, COI
• Input for Model Building
– Some techniques: Logistic Regression, Neural Networks, Decision Trees, Random Forest
– If trees are used, a potential rule:
If ( zip = 10063 && type = bumper && time_of_incident < 6 AM ) => COI
• Anomaly Detection
– Distribution Analysis
– Feature Analysis
• Fraud Detection Engine: Active Learner
– Active clustering
– Fuzzy claims
– Ongoing monitoring
• Case Stream
• Input Data:
(1) Accident Characteristics
(2) Claimant Characteristics
(3) Insured Characteristics
(4) Injury Characteristics
(5) Treatment
• Cases to Evaluate
• Engine outcomes
– Fraud and/or Case of Interest
– Not Classifiable
– Not Fraud and Not Case of Interest
* F: Fraud, NF: No Fraud, COI: Case of Interest, NCOI: No Case of Interest
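The potential tree rule on this slide can be written as a small predicate. This is a sketch under assumed field names: the dictionary keys, the zip code stored as a string, and the sample claims are hypothetical. Only the rule itself comes from the slide.

```python
from datetime import time


def is_case_of_interest(claim):
    """Slide rule: zip = 10063 && type = bumper && time_of_incident < 6 AM => COI."""
    return (
        claim["zip"] == "10063"
        and claim["type"] == "bumper"
        and claim["time_of_incident"] < time(6, 0)
    )


# Hypothetical claims used only to exercise the rule
early_bumper = {"zip": "10063", "type": "bumper", "time_of_incident": time(4, 30)}
late_bumper = {"zip": "10063", "type": "bumper", "time_of_incident": time(14, 0)}

print(is_case_of_interest(early_bumper))  # True
print(is_case_of_interest(late_bumper))   # False
```

In practice such a rule would be read off a trained tree rather than written by hand, and it would change whenever the models are rebuilt.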