1. ADVANCED BUSINESS
ANALYTICS
A study on identifying the factors
influencing Fraudulent Insurance
Claims
BALA GOWTHAM CHANDRASEKARAN- A0148536X
2. Vision Statement
To perform an exploratory data analysis on the
ABIBA automobile insurance transactional data.
To employ dimensionality reduction methods like
Principal Component Analysis (PCA) to reduce
the given input variables to minimal factors.
To determine the factors, their usage and their
reliability to enable the data analytics process for
the given data sets
Team 8 - Assignment 1
3. Factor Analysis
The Data Set contains:
◦ 33 input variables and 15420 sample records
◦ The initial Scree plot is shown below:
Twelve components have an eigenvalue greater than λ = 1.
Hence we start the factor analysis with 12 factors.
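The eigenvalue-greater-than-one rule used above can be sketched as follows. This is a minimal illustration, assuming NumPy is available; the data is synthetic (three hidden drivers, each observed twice with noise) and is not the ABIBA data set, and `kaiser_count` is a hypothetical helper name.

```python
import numpy as np

def kaiser_count(data):
    """Return (eigenvalues sorted descending, number of eigenvalues > 1)."""
    corr = np.corrcoef(data, rowvar=False)      # p x p correlation matrix
    eigvals = np.linalg.eigvalsh(corr)[::-1]    # symmetric matrix, real eigenvalues
    return eigvals, int((eigvals > 1.0).sum())

# Synthetic stand-in data (NOT the ABIBA records): three latent drivers,
# each observed through two noisy copies, giving six correlated columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 3))
data = np.repeat(latent, 2, axis=1) + 0.3 * rng.normal(size=(2000, 6))

eigvals, n_keep = kaiser_count(data)
print(np.round(eigvals, 3), "components retained:", n_keep)
```

The eigenvalues of a correlation matrix always sum to the number of variables, so a component with an eigenvalue above 1 explains more variance than a single standardized input variable.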
4. Factor Analysis
A separate factor analysis was run for
fraudsters (cases with FraudFound = 1)
and for non-fraudsters
(cases with FraudFound = 0).
The sample data was checked for
multicollinearity using the correlation table.
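The split by FraudFound and the correlation check can be sketched in plain Python. The records below are hypothetical, numerically coded examples rather than ABIBA data, and the 0.8 cutoff for flagging a pair is an assumption (the slides do not state which threshold was used).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def high_correlations(rows, columns, threshold=0.8):
    """List column pairs whose |r| exceeds the (assumed) threshold."""
    flagged = []
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            r = pearson([row[a] for row in rows], [row[b] for row in rows])
            if abs(r) > threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged

# Hypothetical, numerically coded records (not taken from the ABIBA file).
records = [
    {"FraudFound": 1, "Month": 1, "MonthClaimed": 1, "AgeOfVehicle": 7},
    {"FraudFound": 1, "Month": 5, "MonthClaimed": 6, "AgeOfVehicle": 3},
    {"FraudFound": 1, "Month": 9, "MonthClaimed": 9, "AgeOfVehicle": 5},
    {"FraudFound": 1, "Month": 12, "MonthClaimed": 12, "AgeOfVehicle": 4},
    {"FraudFound": 0, "Month": 2, "MonthClaimed": 3, "AgeOfVehicle": 6},
    {"FraudFound": 0, "Month": 7, "MonthClaimed": 7, "AgeOfVehicle": 2},
    {"FraudFound": 0, "Month": 10, "MonthClaimed": 11, "AgeOfVehicle": 8},
    {"FraudFound": 0, "Month": 4, "MonthClaimed": 4, "AgeOfVehicle": 5},
]
columns = ["Month", "MonthClaimed", "AgeOfVehicle"]

# Run the same check separately on each group, as in the slides.
fraud = [r for r in records if r["FraudFound"] == 1]
non_fraud = [r for r in records if r["FraudFound"] == 0]
flagged_fraud = high_correlations(fraud, columns)
flagged_non_fraud = high_correlations(non_fraud, columns)
print(flagged_fraud, flagged_non_fraud)
```

Strongly correlated pairs are what allow several input variables to collapse into one factor, which is why the check precedes the factor analysis.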
5. Significant Variables (Fraudsters Vs
Non-Fraudsters)
The order of variables
based on communality:
1. Policy Type
2. Vehicle Category
3. Month
4. Month Claimed
The variables listed above have high
communalities (i.e. > 0.5)
Variable             Communality (Extraction)
PolicyType           .930
VehicleCategory      .930
Month                .919
MonthClaimed         .919
AgeOfVehicle         .871
AgeOfPolicyHolder    .871
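A communality is the sum of a variable's squared loadings over the retained components. The sketch below squares the primary loadings reported in the Rotated Component Matrix of this deck. Because the smaller cross-loadings are not shown there, and the two tables may come from different FraudFound subsets, the results should land close to the communalities above without matching them exactly.

```python
# Primary loadings as reported in the Rotated Component Matrix of this deck.
primary_loading = {
    "VehicleCategory": 0.964,
    "PolicyType": 0.964,
    "Month": 0.959,
    "MonthClaimed": 0.958,
    "AgeOfPolicyHolder": 0.933,
    "AgeOfVehicle": 0.933,
}

def communality(loadings):
    """Communality = sum of squared loadings over the retained components."""
    return sum(value ** 2 for value in loadings)

# Only the primary loading is visible in the deck, so each list has one entry;
# the hidden cross-loadings would each add a small positive amount.
squared_primary = {
    name: round(communality([value]), 3) for name, value in primary_loading.items()
}
for name, value in squared_primary.items():
    print(f"{name:<18} {value:.3f}")
```

For example, 0.964 squared is about 0.929, against a reported communality of .930 for PolicyType.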
6. How ABIBA is benefitted?
The variables mentioned above help ABIBA
find fraudsters by reducing the 33 input
variables to 3 significant factors.
These factors give ABIBA a higher
probability of distinguishing fraudsters from
non-fraudsters.
ABIBA can closely monitor these six input
variables to prevent fraudulent activities in
the company.
7. Model Output Indicating
factors
Rotated Component Matrix (a, b)
                     Component
Variable             1       2       3
VehicleCategory      .964
PolicyType           .964
Month                        .959
MonthClaimed                 .958
AgeOfPolicyHolder                    .933
AgeOfVehicle                         .933
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
b. Only cases for which FraudFound = 0 are used in the analysis phase.
• The Rotated Component Matrix shows how the most significant
variables load on the resulting components
• It is the result of a Varimax rotation, which yields
three components, i.e. three factors
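To illustrate what a Varimax rotation does, here is a minimal sketch using Kaiser's pairwise planar rotations. It assumes NumPy is available and leaves out the Kaiser normalization step that the output above mentions. The loading matrix is hypothetical, and `varimax_pairwise` is an illustrative name, not the routine used to produce the slide output.

```python
import numpy as np

def varimax_pairwise(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax via planar rotations of each pair of columns.

    Kaiser normalization (rescaling rows by communality before rotating)
    is deliberately left out to keep the sketch short.
    """
    rotated = np.array(loadings, dtype=float)
    p, k = rotated.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(k - 1):
            for j in range(i + 1, k):
                x, y = rotated[:, i].copy(), rotated[:, j].copy()
                u, v = x ** 2 - y ** 2, 2 * x * y
                a, b = u.sum(), v.sum()
                num = 2 * (u * v).sum() - 2 * a * b / p
                den = (u ** 2 - v ** 2).sum() - (a ** 2 - b ** 2) / p
                phi = 0.25 * np.arctan2(num, den)   # angle maximizing varimax
                rotated[:, i] = x * np.cos(phi) + y * np.sin(phi)
                rotated[:, j] = -x * np.sin(phi) + y * np.cos(phi)
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:                     # no pair needed rotating
            break
    return rotated

# Hypothetical example: a clean two-factor pattern, blurred by a 30 degree mix.
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
simple = np.array([[0.9, 0.0], [0.9, 0.0], [0.0, 0.9], [0.0, 0.9]])
unrotated = simple @ mix
rotated = varimax_pairwise(unrotated)
print(np.round(rotated, 3))
```

The rotation is orthogonal, so each variable's communality is unchanged; only the way it is split across the components becomes easier to read.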
8. Model Output Indicating
factors
The Scree plots show 3 factors with an eigenvalue above 1 for both the fraudulent and the
non-fraudulent analyses
[Scree plot panels: FraudFound = 1, FraudFound = 0]
9. Model Output Indicating
factors
The cumulative variance is greater than 90% for both the fraudulent and
the non-fraudulent factors
The absolute-value cutoff was set to 0.5.
Total Variance Explained (a)
            Initial Eigenvalues                  Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %   Total   % of Variance  Cumulative %
1           1.946   32.435         32.435        1.946   32.435         32.435         1.927   32.118         32.118
2           1.805   30.086         62.521        1.805   30.086         62.521         1.770   29.500         61.618
3           1.712   28.536         91.057        1.712   28.536         91.057         1.766   29.439         91.057
4            .233    3.884         94.941
5            .230    3.837         98.778
6            .073    1.222        100.000
Extraction Method: Principal Component Analysis.
a. Only cases for which FraudFound = 1 are used in the analysis phase.
Total Variance Explained (a)
            Initial Eigenvalues                  Extraction Sums of Squared Loadings   Rotation Sums of Squared Loadings
Component   Total   % of Variance  Cumulative %  Total   % of Variance  Cumulative %   Total   % of Variance  Cumulative %
1           1.867   31.110         31.110        1.867   31.110         31.110         1.860   31.006         31.006
2           1.859   30.985         62.095        1.859   30.985         62.095         1.837   30.625         61.631
3           1.714   28.571         90.666        1.714   28.571         90.666         1.742   29.035         90.666
4            .259    4.311         94.977
5            .162    2.706         97.684
6            .139    2.316        100.000
Extraction Method: Principal Component Analysis.
a. Only cases for which FraudFound = 0 are used in the analysis phase.
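The percentage columns can be recomputed from the eigenvalues alone, since each eigenvalue is divided by the number of variables (6). The sketch below uses the eigenvalues printed in the two tables. Small differences in the last decimals are expected, because the published eigenvalues are already rounded; `variance_table` is an illustrative helper name.

```python
def variance_table(eigenvalues):
    """Return (eigenvalue, % of variance, cumulative %) for each component."""
    total = len(eigenvalues)  # trace of a correlation matrix = number of variables
    rows, cumulative = [], 0.0
    for value in eigenvalues:
        pct = 100.0 * value / total
        cumulative += pct
        rows.append((value, round(pct, 2), round(cumulative, 2)))
    return rows

fraud_eigs = [1.946, 1.805, 1.712, 0.233, 0.230, 0.073]      # FraudFound = 1
non_fraud_eigs = [1.867, 1.859, 1.714, 0.259, 0.162, 0.139]  # FraudFound = 0

fraud_table = variance_table(fraud_eigs)
non_fraud_table = variance_table(non_fraud_eigs)

for label, eigs, table in (
    ("FraudFound = 1", fraud_eigs, fraud_table),
    ("FraudFound = 0", non_fraud_eigs, non_fraud_table),
):
    retained = sum(1 for e in eigs if e > 1.0)   # eigenvalue-greater-than-one rule
    print(label, "retained:", retained, "cumulative %:", table[retained - 1][2])
```

In both groups exactly three eigenvalues exceed 1, and those three components together account for roughly 91% of the variance of the six variables.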
10. Factors based on Order of
Importance
1. Vehicle Category Vs Policy Type
2. Month Vs Month Claimed
3. Age Of Policy Holder Vs Age Of Vehicle
Rotated Component Matrix (a, b)
                     Component
Variable             1       2       3
VehicleCategory      .964
PolicyType           .964
Month                        .959
MonthClaimed                 .958
AgeOfPolicyHolder                    .933
AgeOfVehicle                         .933
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 4 iterations.
b. Only cases for which FraudFound = 0 are used in the analysis phase.
11. Factors contributing to percentage
of Variance
From slide 9, we infer that the cumulative
variance is greater than 90% (91.057% and
90.666%) for the fraudulent and non-fraudulent
cases respectively
KMO Measure of Sampling Adequacy ≥ 0.600 (reported value: .600)
Significance < 0.05 (reported value: .000)
KMO and Bartlett's Test (a)
Kaiser-Meyer-Olkin Measure of Sampling Adequacy          .600
Bartlett's Test of Sphericity    Approx. Chi-Square      3441.530
                                 df                      15
                                 Sig.                    .000
a. Only cases for which FraudFound = 1 are used in the analysis phase.
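Both statistics in the table can be computed from a correlation matrix. The sketch below assumes NumPy is available and uses synthetic data, not the ABIBA records. It reproduces the degrees of freedom for six variables (6 × 5 / 2 = 15) but does not compute the significance value, which would need a chi-square distribution function from a statistics library.

```python
import numpy as np

def bartlett_sphericity(corr, n_obs):
    """Bartlett's test statistic and degrees of freedom for a correlation matrix."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)            # log of the determinant
    chi_square = -(n_obs - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                         # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)       # mask of off-diagonal cells
    r2 = (corr[off] ** 2).sum()
    q2 = (partial[off] ** 2).sum()
    return r2 / (r2 + q2)

# Synthetic stand-in data (NOT the ABIBA records): six columns in three pairs.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
data = np.repeat(latent, 2, axis=1) + 0.5 * rng.normal(size=(500, 6))
corr = np.corrcoef(data, rowvar=False)

chi_square, dof = bartlett_sphericity(corr, n_obs=data.shape[0])
kmo_value = kmo(corr)
print(round(chi_square, 1), dof, round(kmo_value, 3))
```

A large chi-square relative to its degrees of freedom indicates that the correlation matrix is far from an identity matrix, which is the precondition for factoring it at all.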
12. Reliability of Factors
Reliability Statistics
Factor                                    Cronbach's Alpha   N of Items
Vehicle Category Vs Policy Type           .786               2
Age Of Vehicle Vs Age Of Policy Holder    .853               2
Month Vs Month Claimed                    .909               2
Cronbach's alpha is greater than 0.7 for all three factors,
so all the factors are considered reliable.
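Cronbach's alpha for a factor built from k items compares the sum of the item variances with the variance of the summed score. A minimal sketch using only the Python standard library follows; the item values are hypothetical coded scores, not ABIBA data.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per item."""
    k = len(items)
    totals = [sum(values) for values in zip(*items)]   # per-case summed score
    item_var = sum(variance(values) for values in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical coded values for a two-item factor (not ABIBA data).
item_a = [1, 2, 3, 4, 5, 6]
item_b = [2, 1, 4, 3, 6, 5]
alpha = cronbach_alpha([item_a, item_b])
print(round(alpha, 3))
```

With only two items per factor, as here, alpha depends entirely on how strongly the two items move together: identical items give 1, and unrelated items give a value near 0.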
13. My Factors = ABIBA’s
Success
By employing factor analysis, we’ve reduced the
number of variables that influence fraud to 6, against
the original 33.
This will allow ABIBA to narrow down to the exact variables
to manage, so less cost is involved in spotting
fraud.
These 6 variables can be used to construct a logistic
regression model, or any other model, instead of the 33
given input variables.
The value added for the business is knowing how critical these
6 variables are for predicting the probability that a claim is
fraudulent
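As a rough illustration of the modelling step suggested above, the sketch below fits a logistic regression on six inputs with plain gradient descent. It assumes NumPy is available. The data, the "true" weights and the learning settings are all made up for the example and say nothing about the actual ABIBA variables.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights by batch gradient descent on the average log-loss."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))      # predicted fraud probability
        w -= lr * Xb.T @ (p - y) / len(y)      # gradient step
    return w

def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-add_intercept(X) @ w))

# Synthetic stand-in for six standardized inputs (NOT the ABIBA variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 1.5])   # hypothetical effects
y = (X @ true_w + 0.5 * rng.normal(size=1000) > 0).astype(float)

w = fit_logistic(X, y)
accuracy = ((predict_proba(w, X) > 0.5) == (y == 1)).mean()
print(np.round(w, 2), round(float(accuracy), 3))
```

The fitted weights indicate how strongly each input shifts the predicted probability, which is one way to express how critical each of the six variables is.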
14. My Factors = ABIBA’s
Success
15. My Factors = ABIBA’s
Success
From the composite score of the factors, ABIBA
can find the component contributing the most to
the suspicion of fraud.
For instance, in the previous slide, the factor 1
value is high for customer 1 (around 2.255), so
for that customer the suspicion is attributed to
that component.
Similarly, 1.35 for customer 4 is attributed to
factor 2, and so on.
A negative value indicates that the factor
contributes negatively to identifying the fraudster.
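Reading off the dominant factor per customer can be sketched as below. Only two numbers come from the slides (2.255 for customer 1 on factor 1 and 1.35 for customer 4 on factor 2); all other scores are placeholders invented for the illustration.

```python
# Factor scores per customer. Only two values come from the slides
# (customer 1 / factor 1 = 2.255, customer 4 / factor 2 = 1.35);
# every other number is a made-up placeholder for illustration.
scores = {
    "customer 1": [2.255, 0.10, -0.40],
    "customer 4": [-0.30, 1.35, 0.20],
}

def dominant_factor(row):
    """Return (1-based factor index, score) of the largest factor score."""
    index = max(range(len(row)), key=lambda i: row[i])
    return index + 1, row[index]

dominant = {name: dominant_factor(row) for name, row in scores.items()}
for name, (factor, value) in dominant.items():
    print(f"{name}: factor {factor} (score {value})")
```

Negative scores never win this comparison while a positive one is present, which matches the reading that a negative value counts against the fraud suspicion for that factor.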