K6255 – Knowledge Discovery and Data Mining

                      Statistical Analysis of Caravan Insurance using IBM SPSS

                              Muthu Kumaar Thangavelu (G1101765E)

                                         Muthu1@e.ntu.edu.sg

1. INTRODUCTION:

The data set contains information on customers of an insurance company, including product usage data and socio-demographic data derived from zip area codes, supplied by the Dutch data mining company Sentient Machine Research. Our aim is to identify the customers who are likely to be interested in buying caravan insurance and to build a prediction model from the 86 given variables, which represent the socio-demographic profile, education, insurance interests and income levels of customers.

2. STATISTICAL ANALYSIS

2.1. DATA PREPARATION:

2.1.1. ANALYZING AND CATEGORIZING THE VARIABLES:

We extract and analyze the raw variables with their labels and categorize them based on our understanding of the insurance product and its buyers. The 86 variables are grouped into significant predictors as follows:

CUST_SUB_LIFESTYLE_REFLECTION:

The customer subtype variable MOSTYPE has 41 values, which can be grouped into two broad classes according to age, social class, lifestyle and attitude towards investing or spending:

- Middle and Upper Class, middle aged and senior citizens, high risk cultured liberal investors (2, 3, 4, 5, 8, 9, 12, 13, 15, 23, 25, 27 and 36)

- Distributed age and social class, low risk cultured conservative investors (1, 6, 7, 10, 11, 14, 16, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40 and 41)
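The MOSTYPE grouping above amounts to a simple lookup. The following is a minimal Python sketch of that recode, not the SPSS recode actually used; it assumes MOSTYPE is available as an integer code from 1 to 41, and the names `LIBERAL_SUBTYPES` and `recode_mostype` are hypothetical. The 0/1 coding follows the legend of Fig 1.2.

```python
# Hypothetical sketch: recode MOSTYPE (customer subtype, codes 1-41) into the
# binary CUST_SUB_LIFESTYLE_REFLECTION variable.
# 1 = middle/upper class, liberal investors; 0 = all other subtypes.
LIBERAL_SUBTYPES = frozenset({2, 3, 4, 5, 8, 9, 12, 13, 15, 23, 25, 27, 36})


def recode_mostype(mostype: int) -> int:
    """Return 1 for the liberal-investor group and 0 for the conservative group."""
    if not 1 <= mostype <= 41:
        raise ValueError(f"MOSTYPE must be in 1..41, got {mostype}")
    return 1 if mostype in LIBERAL_SUBTYPES else 0


# Every code from 1 to 41 falls into exactly one of the two groups.
conservative = [code for code in range(1, 42) if recode_mostype(code) == 0]
print(len(LIBERAL_SUBTYPES), len(conservative))  # 13 28
```

Using a set for the smaller group keeps the rule readable and guarantees the two classes cover all 41 codes without overlap.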

CUST_LEVEL_LIFECYCLE:

The average age variable MGEMLEEF has 6 values, which can be grouped into three classes based on family status and age:

- Young, family starters (1)
- Middle aged family men (2, 3, and 4)
- Senior, family men (5, 6)
CUST_MAIN_SPEND_INVEST_ATTITUDE:

The customer main type variable MOSHOOFD can be split into two groups based on customers' attitude towards buying and spending:

- Liberals (1, 2, 5, 6)
- Conservatives (3, 4, 7, 8, 9, 10)
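The lifecycle and attitude groupings can likewise be written as lookup tables. This is a hedged Python sketch, assuming integer codes 1-6 for MGEMLEEF and 1-10 for MOSHOOFD; the group labels and function names are hypothetical, not taken from the SPSS project.

```python
# Hypothetical sketch: recode average age (MGEMLEEF, codes 1-6) and customer
# main type (MOSHOOFD, codes 1-10) into the groups defined above.
LIFECYCLE_GROUPS = {
    1: "young_family_starter",
    2: "middle_aged", 3: "middle_aged", 4: "middle_aged",
    5: "senior", 6: "senior",
}
LIBERAL_MAIN_TYPES = frozenset({1, 2, 5, 6})
CONSERVATIVE_MAIN_TYPES = frozenset({3, 4, 7, 8, 9, 10})


def recode_mgemleef(code: int) -> str:
    """Map an MGEMLEEF code to its lifecycle group (KeyError if out of range)."""
    return LIFECYCLE_GROUPS[code]


def recode_moshoofd(code: int) -> str:
    """Map a MOSHOOFD code to 'liberal' or 'conservative'."""
    if code in LIBERAL_MAIN_TYPES:
        return "liberal"
    if code in CONSERVATIVE_MAIN_TYPES:
        return "conservative"
    raise ValueError(f"MOSHOOFD must be in 1..10, got {code}")


print(recode_mgemleef(3), recode_moshoofd(5))  # middle_aged liberal
```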

CUST_MARITAL_STAT:

MRELGE, MRELSA, MRELOV and MFALLEEN describe a person's relationship status and can be combined into two categories signifying marital status:

- Married (MRELGE)
- Unmarried (MRELSA, MRELOV, MFALLEEN)

CUST_WORK_CATEGORY_PROFILE:

Variables 19 to 24 describe a person's work category profile, which can be of two types:

- High-profile work categories with high income potential (MBERHOOG, MBERZELF, MBERMIDD)
- Low-profile work categories with relatively lower income potential (MBERBOER, MBERARBG, MBERARBO)

CUST_INCOME_LEVEL:

Variables 37 to 41 represent a person's income and can be grouped into three classes:

- Low (MINKM30)
- Middle (MINK3045, MINK4575)
- High (MINK7512, MINK123M)

These are best represented by a single standalone variable depicting the average income (MINKGEM).

CUST_INSURANCE_INTEREST:

Variables 44 to 85, together with variables 35 and 36, describe customers' interest in various insurance policies. These range from much-needed policies for life, health, disability and family or private accidents; through optimal policies for property, small vehicles of individuals (especially where replacing damaged parts costs as much as a new vehicle), company delivery vehicles operated by third-party drivers, and industrial machinery; to the most sophisticated policies, which offer luxury and high safety. Examples of the latter are private third party insurance, where the insurer pays the third party even if the insured is at fault, and car, fire and social security insurance. Hence both the number of policies and the contribution to policies are classified as follows:

- Individuals opting for sophisticated, high-safety insurance policies (WAPART, PERSAUT, BRAND, BYSTAND)
- Firms/individuals opting for much-needed and optimal-safety insurance policies (all others)
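A hedged Python sketch of this split, assuming the policy columns follow the COIL 2000 naming in which a one-letter prefix (P for contribution, A for number of policies) precedes the policy name, e.g. PWAPART and AWAPART. The function name `interest_group` and the group labels are hypothetical; PLEVEN is used only as an example of an "other" policy column under that naming assumption.

```python
# Policy names (without the P/A prefix) treated as sophisticated, high-safety
# policies in this study.
SOPHISTICATED_POLICIES = frozenset({"WAPART", "PERSAUT", "BRAND", "BYSTAND"})


def interest_group(column: str) -> str:
    """Return 'sophisticated' or 'needed_optimal' for a policy column name.

    Assumption: the column name is a one-letter prefix (P = contribution,
    A = number of policies) followed by the policy name, e.g. PWAPART/AWAPART.
    """
    policy = column[1:]  # drop the one-letter prefix
    return "sophisticated" if policy in SOPHISTICATED_POLICIES else "needed_optimal"


for col in ["PWAPART", "APERSAUT", "PBRAND", "PLEVEN"]:
    print(col, interest_group(col))
```

Because the prefix is stripped first, the same rule classifies both the contribution and the number-of-policies columns, matching the text's statement that the classification applies to both.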
2.1.2. MAPPING TARGET VARIABLES AS PREDICTORS OF CARAVAN INSURANCE BUYERS:

These hypotheses are based on the descriptive statistics of the data set (Appendix-1) together with real-world reasoning.

FACTOR 1: AGE
Middle aged people are more likely to get caravan insurance

FACTOR 2: ATTITUDE TOWARDS SPENDING/ BUYING
People with a liberal attitude predicted by Customer Main type are more likely to get caravan
insurance

FACTOR 3: SOCIAL LIFE STYLE REFLECTOR
People who are modern, professional, middle and upper class and liberal investors of their income, as indicated by Customer Sub type, are likely to get caravan insurance.

FACTOR 4: MARITAL STATUS

Married Family Men are more likely to buy caravan insurance

FACTOR 5: WORK CATEGORY PROFILE

Potential income generating high profile work category people are more likely to get the insurance.

FACTOR 6: INCOME LEVEL

Middle-income earners are more likely to get caravan insurance. Here the variable MINKGEM acts as a standalone factor representing a person's average income.

FACTOR 7: INSURANCE INTEREST

Individuals opting highly sophisticated high safety Insurance policies are more likely to buy caravan
insurance

FACTOR 8: PURCHASING POWER CLASS

Individuals who buy, or can afford to buy, high-cost products are more likely to get caravan insurance, since caravan insurance is a luxury rather than a need and is aimed at average- and high-income earners.

FACTOR 9: RENTED HOME RESIDENTS

Residents of rented homes may own a house in their home town, may have settled in a rented home elsewhere for work or family convenience, or may not have enough savings to invest in a home. All of these individuals are more likely to be interested in caravan insurance, as they need a local asset.

FACTOR 10: CAR OWNERSHIP:
People who own a car show buying power, average income and an interest in cars and driving, and may therefore be interested in buying a caravan and its insurance.
People who own more than one car are unusual and are likely to be car enthusiasts looking for the best quality and the latest fashionable models; caravans are unlikely to suit their needs.

2.2. DATA TRANSFORMATION

2.2.1. INDEPENDENCE OF PREDICTOR VARIABLES WITH RESPECT TO PREDICTION PARAMETERS:

The CUSTOMER SUB TYPE variable (MOSTYPE) combines the age factor, the spending/buying attitude and the social lifestyle. Hence it can be used as a standalone factor for predicting potential buyers.

MARRIED PEOPLE are represented by MRELGE, so the remaining relationship-status variables can be ignored.



2.2.2. INTERACTION VARIABLES DEFINITION FOR INDEPENDENT REPRESENTATION OF A COMBINATION:

PURCHASING POWER CLASS * AVERAGE INCOME

Work category, income level and purchasing power class can be combined into one predictor: average-income earners with a high-profile work category who belong to the purchasing power class. This combination is represented by the interaction of the independent variables Average Income (MINKGEM) and Purchasing Power Class (MKOOPKLA).
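For two covariates treated as continuous, an interaction term of this kind is the product of the two values for each customer. A minimal Python sketch, assuming MINKGEM and MKOOPKLA are integer level codes; the `Customer` class, the function name and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    minkgem: int   # average income level code
    mkoopkla: int  # purchasing power class code


def income_by_purchasing_power(c: Customer) -> int:
    """Interaction value for one customer: MINKGEM * MKOOPKLA."""
    return c.minkgem * c.mkoopkla


# Hypothetical customers, not taken from the data set.
customers = [Customer(4, 6), Customer(2, 1), Customer(7, 8)]
print([income_by_purchasing_power(c) for c in customers])  # [24, 2, 56]
```

The product is large only when both codes are high, which is the combined "income plus purchasing power" profile the text describes.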

PWAPART*PBRAND*PBYSTAND*PERSAUT

People who already buy sophisticated insurance policies are the most likely to choose caravan insurance. The interaction (cross product) of the contributions to fire (PBRAND), private third party (PWAPART), social security (PBYSTAND) and car (PPERSAUT) insurance is therefore expected to indicate a high probability of getting caravan insurance.
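When the four contribution variables are categorical, their cross product places each customer in one cell, i.e. one combination of PBRAND, PBYSTAND, PPERSAUT and PWAPART levels, and each cell receives its own coefficient. A hedged Python sketch of that grouping; the rows and helper names are hypothetical.

```python
from collections import Counter

POLICY_COLUMNS = ("PBRAND", "PBYSTAND", "PPERSAUT", "PWAPART")


def cell_key(row: dict) -> tuple:
    """Return the combination of contribution levels for one customer."""
    return tuple(row[col] for col in POLICY_COLUMNS)


# Hypothetical rows (contribution level codes), not taken from the data set.
rows = [
    {"PBRAND": 0, "PBYSTAND": 0, "PPERSAUT": 6, "PWAPART": 2},
    {"PBRAND": 0, "PBYSTAND": 0, "PPERSAUT": 6, "PWAPART": 2},
    {"PBRAND": 3, "PBYSTAND": 0, "PPERSAUT": 0, "PWAPART": 0},
]
cells = Counter(cell_key(r) for r in rows)
print(len(cells), cells.most_common(1))  # 2 [((0, 0, 6, 2), 2)]
```

The number of possible cells grows multiplicatively with the number of levels, so many cells contain few customers; such sparse cells tend to produce very large standard errors, as seen for several interaction rows in Appendix-4.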

2.2.3. DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS:
Almost all variables used in the final model are largely independent of one another, each predicting a different factor behind buying caravan insurance:

CUST_SUB_LIFESTYLE_REFLECTION – Social Lifestyle and Attitude towards Spending/investing
MRELGE – Marital Status
MAUT1 – Single Car Owner
MHHUUR – Rented Home Resident
PBRAND, PPERSAUT, PBYSTAND, PWAPART – Contribution towards different sophisticated and high
safety Insurance policies.

The two variables with a significant correlation are MINKGEM and MKOOPKLA (r = 0.452, Appendix-3). This is logical, since the two populations overlap heavily: the purchasing power class should consist mostly of people with a high or middle average income, who make up most of the MINKGEM variable. These two dimensions can therefore be reduced to a single component, leaving a set of predictors that are close to orthogonal.

Factor analysis (principal component extraction) was carried out, and the extracted component was saved in the data set as a regression-method score.
This new variable, PURCHASING_POWER_CLASS_INK, represents the component of MINKGEM and MKOOPKLA retained by PCA.
The factor analysis results are attached in Appendix-3
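The PCA figures reported in Appendix-3 can be checked by hand: for two standardized variables with correlation r, the first principal component has eigenvalue 1 + r and equal loadings of sqrt((1 + r) / 2). The NumPy sketch below reproduces the loadings, communalities and residual from r = 0.452. Its last step assumes PURCHASING_POWER_CLASS_INK was saved as the regression-method component score, which for a principal component weights the z-scores by loading / eigenvalue; `component_score` is a hypothetical helper.

```python
import numpy as np

# Correlation between MINKGEM and MKOOPKLA, as reported in Appendix-3.
r = 0.452
R = np.array([[1.0, r], [r, 1.0]])

# np.linalg.eigh returns the eigenvalues of a symmetric matrix in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
lam = eigenvalues[-1]                # largest eigenvalue, equal to 1 + r
v = eigenvectors[:, -1]
v = v * np.sign(v.sum())             # the sign of an eigenvector is arbitrary

loadings = np.sqrt(lam) * v          # component matrix (about .852 each)
communalities = loadings ** 2        # reproduced communalities (about .726)
residual = R[0, 1] - loadings[0] * loadings[1]  # observed minus reproduced (about -.274)
explained = lam / R.shape[0]         # share of total variance (about 72.6%)

# Assumption: regression-method score weights for a principal component.
score_weights = loadings / lam


def component_score(z_minkgem: float, z_mkoopkla: float) -> float:
    """Hypothetical helper: component score from standardized MINKGEM and MKOOPKLA."""
    return float(score_weights[0] * z_minkgem + score_weights[1] * z_mkoopkla)


print(np.round(loadings, 3), np.round(communalities, 3), round(float(residual), 3))
print(f"variance explained: {explained:.1%}, score weights: {np.round(score_weights, 3)}")
```

With only two inputs, one component already explains about 72.6% of their combined variance, which supports reducing the pair to a single predictor.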

2.3. DATA ANALYSIS:

2.3.1. APPLYING LOGISTIC REGRESSION: (WITHOUT INCLUDING THE VARIABLE REDUCED BY PCA)

2.3.1.1. CHOSEN VARIABLES REPRESENTING INDEPENDENT FACTORS TO PREDICT THE CARAVAN INSURANCE BUYERS:

The predictor variables are entered in 2 blocks of covariates for the dependent variable CARAVAN (0 = customer will not buy, 1 = customer will buy).

BLOCK 1:
CUST_SUB_LIFESTYLE_REFLECTION (Social Life Style Reflector)
MRELGE (Marital Status)
MAUT1 (Car Ownership factor - Single Car Indicating potential income generation)
MHHUUR (Rented Home Residents - Potential Earning Factor)

BLOCK 2: (INTERACTION VARIABLES)
PBRAND, PBYSTAND, PPERSAUT, PWAPART (Customer Insurance Interest factor on sophisticated and
high Safety policies)
MKOOPKLA, MINKGEM (Purchasing Power Class with Average Income Level factor)

Method: FORWARD LR
Cut Off Value: 0.5
Probability Entry Criteria: 0.05
Probability Exit Criteria: 0.10
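The 0.5 cut-off decides how predicted probabilities become predicted classes. A minimal Python sketch of that step, assuming a case is flagged as a buyer when its probability is at least the cut-off; the function names and the example probabilities are hypothetical.

```python
CUT_OFF = 0.5  # classification cut-off used in this study


def classify(probabilities, cut_off=CUT_OFF):
    """Predict CARAVAN = 1 when the predicted probability is at least the cut-off."""
    return [1 if p >= cut_off else 0 for p in probabilities]


def percent_correct(actual, predicted):
    """Overall percentage of cases classified correctly."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * hits / len(actual)


def buyers_found(actual, predicted):
    """Percentage of actual buyers (CARAVAN = 1) that the model flags."""
    flagged = [p for a, p in zip(actual, predicted) if a == 1]
    return 100.0 * sum(flagged) / len(flagged) if flagged else 0.0


# Hypothetical predicted probabilities and outcomes, for illustration only.
probs = [0.03, 0.12, 0.55, 0.08, 0.21]
actual = [0, 1, 1, 0, 1]
pred = classify(probs)
print(pred, percent_correct(actual, pred), buyers_found(actual, pred))
```

When buyers are rare, few predicted probabilities reach 0.5, so the overall percentage correct can look good while most buyers are missed; lowering the cut-off trades more false positives for more buyers found.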
2.3.1.2. CHOOSING THE CATEGORICAL VARIABLES:

Variables that internally represent categories of customers should be marked as categorical in a logistic regression. In our case:
Contributions to various insurance policies (PWAPART, PPERSAUT, PBRAND and PBYSTAND) represent internal categories such as high, average and low. They are not evenly distributed across their value codes, as seen in Figs 1.3, 1.4, 1.5 and 1.6, and hence they are treated as categorical.
Customer Sub type (CUST_SUB_LIFESTYLE_REFLECTION) represents two main categories: (1) Middle and Upper Class, middle aged and senior citizens, high risk cultured liberal investors, and (0) Distributed age and social class, low risk cultured conservative investors. These values are not evenly spread, as seen in Fig 1.2, so this variable is also treated as categorical.

All other variables are treated as continuous, since each contains values for the single category it stands for: MAUT1 (Owning a Single Car), MRELGE (Married), MHHUUR (Rented Home Residents), MINKGEM (Average Income) and MKOOPKLA (Purchasing Power Class).
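Marking a variable as categorical means the regression replaces it with dummy columns. The Python sketch below shows indicator coding with the last category as reference, assuming SPSS's default contrast (Indicator, reference category Last) was kept; the report does not state this. Under that assumption the column labelled CUST_SUB_LIFESTYLE_REFLECTION(1) would mark category 0, with category 1 as the reference. The function name is hypothetical.

```python
def indicator_coding(value, categories):
    """Indicator (dummy) coding with the last category as the reference.

    For k categories in ascending order, returns k - 1 values: position j is 1
    when `value` equals the j-th category. The last category codes as all zeros.
    """
    ordered = sorted(set(categories))
    if value not in ordered:
        raise ValueError(f"{value!r} is not one of {ordered}")
    return [1 if value == c else 0 for c in ordered[:-1]]


# Binary recoded subtype (0 = conservative, 1 = liberal): one dummy column.
print(indicator_coding(0, [0, 1]), indicator_coding(1, [0, 1]))  # [1] [0]
# A contribution variable with levels 0..8 would need 8 dummy columns.
print(len(indicator_coding(3, range(9))))  # 8
```

This is why a categorical variable with k levels costs k - 1 degrees of freedom, while each continuous variable costs one.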

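Forward selection in block 2 finished after two steps and produced the prediction model, although SPSS reports that estimation stopped at its 20-iteration limit without finding a final solution.
The model summary and predictor equation are described in Appendix-2.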
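The drop in -2 log-likelihood between step 1 and step 2 in Appendix-2 is a likelihood-ratio chi-square for adding MINKGEM * MKOOPKLA, a single product term with 1 degree of freedom. The Python sketch below computes its p-value. This is a check, not a reproduction of the selection step: as documented for SPSS, Forward LR tests entry with the score statistic and removal with the likelihood-ratio statistic, and because estimation hit the iteration limit these -2LL values are approximate. The helper name is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    # A chi-square(1) variable is Z**2 for standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# -2 log-likelihood values from the Appendix-2 model summary.
minus2ll_step1 = 2220.272
minus2ll_step2 = 2210.325

# Adding MINKGEM * MKOOPKLA (one product term, 1 df) lowers -2LL by this much.
lr_statistic = minus2ll_step1 - minus2ll_step2
p_value = chi2_sf_1df(lr_statistic)

print(f"LR chi-square = {lr_statistic:.3f} on 1 df, p = {p_value:.4f}")
print("below the 0.05 entry criterion" if p_value < 0.05 else "not significant at 0.05")
```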

2.3.1.3. GENERATED EQUATION BY LOGISTIC REGRESSION FOR PREDICTING POTENTIAL CARAVAN INSURANCE BUYERS:

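log(p / (1 - p)) = -2.924 + 0.073 (MAUT1) + 0.069 (MRELGE) - 0.018 (MHHUUR)
                   - 0.376 (CUST_SUB_LIFESTYLE_REFLECTION(1)) + 0.016 (MINKGEM by MKOOPKLA)
                   + B(PBRAND by PBYSTAND by PPERSAUT by PWAPART)

where p is the predicted probability that CARAVAN = 1 and B(PBRAND by PBYSTAND by PPERSAUT by PWAPART) is the coefficient of the combination of PBRAND, PBYSTAND, PPERSAUT and PWAPART categories into which the customer falls (Appendix-2).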
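To show how the equation is used, here is a minimal Python sketch that turns the log-odds into a predicted probability. It assumes the equation is on the log-odds scale (as SPSS's B coefficients are) and that CUST_SUB_LIFESTYLE_REFLECTION(1) is supplied as its 0/1 dummy value. The customer values are hypothetical, and `interaction_cell_b` stands in for the customer's PBRAND by PBYSTAND by PPERSAUT by PWAPART coefficient, which would have to be looked up in Appendix-2; it is left at 0 here only as a placeholder.

```python
import math

# Coefficients of the model without the PCA component.
COEFFICIENTS = {
    "MAUT1": 0.073,
    "MRELGE": 0.069,
    "MHHUUR": -0.018,
    "CUST_SUB_LIFESTYLE_REFLECTION(1)": -0.376,  # applied to the 0/1 dummy value
    "MINKGEM_BY_MKOOPKLA": 0.016,                # hypothetical key: product of the two codes
}
CONSTANT = -2.924


def caravan_probability(x: dict, interaction_cell_b: float = 0.0) -> float:
    """Predicted probability of CARAVAN = 1 from the logistic equation."""
    logit = CONSTANT + interaction_cell_b  # placeholder for the cell coefficient
    for name, b in COEFFICIENTS.items():
        logit += b * x[name]
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical customer (level codes), not taken from the data set.
customer = {
    "MAUT1": 6,
    "MRELGE": 7,
    "MHHUUR": 2,
    "CUST_SUB_LIFESTYLE_REFLECTION(1)": 0,
    "MINKGEM_BY_MKOOPKLA": 4 * 6,
}
print(round(caravan_probability(customer), 3))
```

For this customer the log-odds are -2.924 + 0.438 + 0.483 - 0.036 + 0.384 = -1.655, i.e. a probability of about 0.16 before the interaction-cell term is added.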

The Nagelkerke R square of this model is 0.193 (19.3%). Nagelkerke R square is a pseudo-R square measure of model fit, not the model's classification accuracy.
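Both pseudo-R square values reported by SPSS follow from the -2 log-likelihood (deviance) of the null model D0, that of the fitted model D, and the sample size n: Cox & Snell R square = 1 - exp((D - D0) / n), and Nagelkerke R square divides this by its maximum attainable value, 1 - exp(-D0 / n). A Python sketch of the formulas; the function name and the example inputs are hypothetical, since the null-model deviance and n are not listed in this report.

```python
import math


def pseudo_r_squares(null_deviance: float, model_deviance: float, n: int) -> tuple[float, float]:
    """Cox & Snell and Nagelkerke R^2 from -2 log-likelihood values."""
    cox_snell = 1.0 - math.exp((model_deviance - null_deviance) / n)
    max_cox_snell = 1.0 - math.exp(-null_deviance / n)  # upper bound of Cox & Snell
    return cox_snell, cox_snell / max_cox_snell


# Hypothetical inputs, for illustration only.
cs, nk = pseudo_r_squares(null_deviance=100.0, model_deviance=80.0, n=100)
print(round(cs, 4), round(nk, 4))
```

Because Cox & Snell R square cannot reach 1 for a binary outcome, Nagelkerke rescales it; both describe improvement in likelihood over the null model rather than the share of customers classified correctly.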

2.3.2. APPLYING LOGISTIC REGRESSION: (WITH THE VARIABLE REDUCED BY PCA)

With the new component extracted with PCA, PURCHASING_POWER_CLASS_INK, we can apply
logistic regression along with other variables.

Forward selection finished in the first step (again, SPSS reports that estimation stopped at the 20-iteration limit; see Appendix-4).

The prediction model is almost the same as the one above without the PCA component, and is given by the log-odds equation

log(p / (1 - p)) = -2.336 + 0.093 (MAUT1) + 0.069 (MRELGE) - 0.024 (MHHUUR)
                   - 0.345 (CUST_SUB_LIFESTYLE_REFLECTION(1)) + 0.237 (PURCHASING_POWER_CLASS_INK)
                   + B(PBRAND by PBYSTAND by PPERSAUT by PWAPART)

where p is the predicted probability that CARAVAN = 1 and the last term is the coefficient for the customer's combination of PBRAND, PBYSTAND, PPERSAUT and PWAPART categories.
The model fits about as well as the first one, with a Nagelkerke R square of 0.192 (19.2%).
The model summary and predictor equation is described in the Appendix-4.
3. MODEL INSIGHTS AND CONCLUSION:

The initial variables have been studied and classified to reflect socio-demographic, education, lifestyle, income, car and insurance-interest properties relevant to the product type. The significant variables identified by this reasoning were then analyzed using descriptive statistics of the target variables in the data set in IBM SPSS. Dimension reduction, variable recoding and interaction variables were used to obtain accurate and independent predictors, and logistic regression then produced the prediction model.
The model should remain broad in its predictions, with sound real-world reasons for categorizing and recoding variables, so that it holds for most possible cases and avoids OVERFITTING.



Appendix -1

DESCRIPTIVE STATISTICS – CROSS TAB RESULTS

Fig 1.0. Rental Home Residents Caravan Insurance Buying Pattern
Fig 1.1. Purchasing Power Class Caravan Insurance Buying Pattern

Fig 1.2. Social Lifestyle based Caravan Insurance Buying Pattern (RECODED VARIABLE)
1 - Middle and Upper Class, middle aged and senior citizens, high risk cultured liberal investors
0 - Distributed age and social class, low risk cultured conservative investors

Fig 1.3. Third Party Insurance Buyers and Caravan Insurance Buyers
Fig 1.4. Car Insurance Buyers and Caravan Insurance Buyers
Fig 1.5. Fire Insurance Contribution and Caravan Insurance Interest
Fig 1.6. Social Security Insurance vs. Caravan Insurance Buyers
Appendix -2: (Logistic Regression Summary and Last Convergence Results without PCA Component)


                    Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1            2220.272 (a)                    .069                   .189
2            2210.325 (a)                    .070                   .193
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

Predictors and corresponding coefficients in the binary logistic regression (BLOCK 2 - Second Step)

Variables in the Equation (columns: B, S.E., Wald, df, Sig., Exp(B))
[Coefficient rows not recoverable from the source; the cross product continues up to the 4x4 combinations.]

a. Variable(s) entered on step 1: PBRAND * PBYSTAND * PPERSAUT * PWAPART.
b. Variable(s) entered on step 2: MINKGEM * MKOOPKLA.
Appendix -3: PRINCIPAL COMPONENT ANALYSIS

Initial Components (Average Income and Purchasing Power Class) vs. Principal Component Extracted

FACTOR ANALYSIS:

                      Correlation Matrix
                                MINKGEM    MKOOPKLA
Correlation        MINKGEM        1.000        .452
                   MKOOPKLA        .452       1.000
Sig. (1-tailed)    MINKGEM                     .000
                   MKOOPKLA        .000

After Principal Component Analysis:

      Component Matrix (a)
              Component 1
MINKGEM              .852
MKOOPKLA             .852
Extraction Method: Principal Component Analysis.
a. 1 components extracted.

                      Reproduced Correlations
                                        MINKGEM    MKOOPKLA
Reproduced Correlation   MINKGEM       .726 (a)        .726
                         MKOOPKLA          .726    .726 (a)
Residual (b)             MINKGEM                      -.274
                         MKOOPKLA         -.274
Extraction Method: Principal Component Analysis.
a. Reproduced communalities
b. Residuals are computed between observed and reproduced correlations. There are 1 (100.0%) nonredundant residuals with absolute values greater than 0.05.


APPENDIX -4:


After PCA with the Reduced Component – Binary Logistic Regression with other predictor variables

                    Model Summary
Step    -2 Log likelihood    Cox & Snell R Square    Nagelkerke R Square
1            2213.728 (a)                    .070                   .192
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

Variables in the Equation
                                                                         B       S.E.     Wald   df   Sig.  Exp(B)
Step 1(a)  CUST_SUB_LIFESTYLE_REFLECTION(1)                          -.345       .124    7.778    1   .005    .709
           PURCHASING_POWER_CLASS_INK                                 .237       .068   12.009    1   .001   1.268
           MHHUUR                                                    -.024       .024    1.049    1   .306    .976
           MAUT1                                                      .093       .040    5.315    1   .021   1.098
           PBRAND * PBYSTAND * PPERSAUT * PWAPART                                      207.422  112   .000
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(1)    -1.467       .779    3.549    1   .060    .231
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(2)   -18.885   7541.184     .000    1   .998    .000
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(3)    -1.627       .960    2.874    1   .090    .197
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(2) by PWAPART(1)   -19.134  40192.970     .000    1  1.000    .000
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(3) by PWAPART(1)    -3.743      1.257    8.862    1   .003    .024
           PBRAND(1) by PBYSTAND(1) by PPERSAUT(3) by PWAPART(3)     -.218      1.065     .042    1   .838    .804
           ...
           PBRAND(7) by PBYSTAND(4) by PPERSAUT(4) by PWAPART(1)   -19.341  23141.295     .000    1   .999    .000
           PBRAND(8) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(1)   -18.797  28317.506     .000    1   .999    .000
           PBRAND(8) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(3)   -19.114  40192.970     .000    1  1.000    .000
           PBRAND(8) by PBYSTAND(1) by PPERSAUT(4) by PWAPART(1)   -19.252  28290.099     .000    1   .999    .000
           PBRAND(8) by PBYSTAND(1) by PPERSAUT(4) by PWAPART(3)   -18.921  28301.176     .000    1   .999    .000
           PBRAND(8) by PBYSTAND(1) by PPERSAUT(5) by PWAPART(3)   -19.476  40192.970     .000    1  1.000    .000
           Constant                                                 -2.336       .812    8.271    1   .004    .097
a. Variable(s) entered on step 1: PBRAND * PBYSTAND * PPERSAUT * PWAPART.

Semantic web design for www.data.gov.sg - PresentationSemantic web design for www.data.gov.sg - Presentation
Semantic web design for www.data.gov.sg - Presentation
 
Knowledge Management and Risk Management Connection explained with Unilever
Knowledge Management and Risk Management Connection explained with UnileverKnowledge Management and Risk Management Connection explained with Unilever
Knowledge Management and Risk Management Connection explained with Unilever
 
Bp business and information strategy alignment
Bp   business and information strategy alignmentBp   business and information strategy alignment
Bp business and information strategy alignment
 
Unilever's Lipton Risk Management with Business Intelligence
Unilever's Lipton Risk Management with Business IntelligenceUnilever's Lipton Risk Management with Business Intelligence
Unilever's Lipton Risk Management with Business Intelligence
 
Ul lipton-presentation v4
Ul lipton-presentation v4Ul lipton-presentation v4
Ul lipton-presentation v4
 
Information to Intelligence (BI Context)
Information to Intelligence (BI Context)Information to Intelligence (BI Context)
Information to Intelligence (BI Context)
 
Load balancing implementation in wireless networks
Load balancing implementation in wireless networksLoad balancing implementation in wireless networks
Load balancing implementation in wireless networks
 
Human Capital Management
Human Capital ManagementHuman Capital Management
Human Capital Management
 
Buckmann labs KM case study
Buckmann labs KM case studyBuckmann labs KM case study
Buckmann labs KM case study
 
Boeing rocketdyne radical innovation case study
Boeing rocketdyne radical innovation case studyBoeing rocketdyne radical innovation case study
Boeing rocketdyne radical innovation case study
 
Habits that Knowledge workers need to cultivate
Habits that Knowledge workers need to cultivateHabits that Knowledge workers need to cultivate
Habits that Knowledge workers need to cultivate
 
Knowledge process productivity indexing schema
Knowledge process productivity indexing schemaKnowledge process productivity indexing schema
Knowledge process productivity indexing schema
 
Innovation management in fashion industry
Innovation management in fashion industryInnovation management in fashion industry
Innovation management in fashion industry
 
Linked data migrational framework
Linked data migrational frameworkLinked data migrational framework
Linked data migrational framework
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

Caravan insurance data mining statistical analysis

K6255 – Knowledge Discovery and Data Mining

Statistical Analysis of Caravan Insurance using IBM SPSS

Muthu Kumaar Thangavelu (G1101765E)

Muthu1@e.ntu.edu.sg

1. INTRODUCTION:

The data set contains information on the customers of an insurance company. It includes product usage data and socio-demographic data derived from zip area codes, and was supplied by the Dutch data mining company Sentient Machine Research. Our aim is to identify the circle of customers likely to be interested in buying caravan insurance. We also aim to build a prediction model from the given 86 variables, which represent the socio-demographic profile, education, insurance interests and income levels of customers.

2. STATISTICAL ANALYSIS

2.1. DATA PREPARATION:

2.1.1. ANALYZING AND CATEGORIZING THE VARIABLES:

We extract and analyze the raw variables with their labels and categorize them based on our understanding of the insurance product and its buyers. The broad range of 86 variables is classified into significant predictors as follows.

CUST_SUB_LIFESTYLE_REFLECTION:

The customer subtype variable MOSTYPE has 41 values. These can be grouped into two broad classes according to age, social class, lifestyle and attitude towards investing or spending:

- Middle and upper class, middle-aged and senior citizens, high-risk liberal investors (2, 3, 4, 5, 8, 9, 12, 13, 15, 23, 25, 27 and 36)
- Mixed age and social class, low-risk conservative investors (1, 6, 7, 10, 11, 14, 16, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40 and 41)

CUST_LEVEL_LIFECYCLE:

The average age variable MGEMLEEF takes 6 values. These can be grouped into three stages based on family status and age:

- Young family starters (1)
- Middle-aged family men (2, 3 and 4)
- Senior family men (5 and 6)
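Both recodes above are simple lookups. The following is a minimal Python sketch that assumes the raw MOSTYPE and MGEMLEEF codes are available as integers. The function names are hypothetical. The 1/0 coding follows the legend for the recoded variable under Fig 1.2 in Appendix-1.

```python
# Hypothetical helper names; the code lists are copied from the groupings above.
LIBERAL_SUBTYPES = frozenset({2, 3, 4, 5, 8, 9, 12, 13, 15, 23, 25, 27, 36})

LIFECYCLE_STAGE = {
    1: "young family starters",
    2: "middle-aged family men",
    3: "middle-aged family men",
    4: "middle-aged family men",
    5: "senior family men",
    6: "senior family men",
}


def recode_subtype(mostype: int) -> int:
    """MOSTYPE (1-41) -> CUST_SUB_LIFESTYLE_REFLECTION: 1 = liberal, 0 = conservative."""
    if not 1 <= mostype <= 41:
        raise ValueError(f"MOSTYPE must be in 1..41, got {mostype}")
    return 1 if mostype in LIBERAL_SUBTYPES else 0


def recode_lifecycle(mgemleef: int) -> str:
    """MGEMLEEF (1-6) -> CUST_LEVEL_LIFECYCLE stage label."""
    try:
        return LIFECYCLE_STAGE[mgemleef]
    except KeyError:
        raise ValueError(f"MGEMLEEF must be in 1..6, got {mgemleef}") from None


if __name__ == "__main__":
    # 13 of the 41 subtypes fall into the liberal class.
    print(sum(recode_subtype(code) for code in range(1, 42)))
    print(recode_lifecycle(3))
```

Any subtype code outside the 13 listed values falls into the conservative class. As a result, the two lists above only need to be checked for completeness once, and the counts 13 + 28 = 41 confirm this.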
CUST_MAIN_SPEND_INVEST_ATTITUDE:

The customer main type MOSHOOFD can be classified into two groups based on customers' attitude towards buying and spending:

- Liberals (1, 2, 5, 6)
- Conservatives (3, 4, 7, 8, 9, 10)

CUST_MARITAL_STAT:

MRELGE, MRELSA, MRELOV and MFALLEEN describe a person's relationship status. They can be combined into two categories that signify marital status:

- Married (MRELGE)
- Unmarried (MRELSA, MRELOV, MFALLEEN)

CUST_WORK_CATEGORY_PROFILE:

Variables 19 to 24 describe a person's work category, which can be of two types:

- High-profile work categories with high income-generating potential (MBERHOOG, MBERZELF, MBERMIDD)
- Low-profile work categories with relatively lower income-generating potential (MBERBOER, MBERARBG, MBERARBO)

CUST_INCOME_LEVEL:

Variables 37 to 41 represent a person's income and can be grouped into three classes:

- Low (MINKM30)
- Middle (MINK3045, MINK4575)
- High (MINK7512, MINK123M)

These are best represented by a single standalone factor for average income (MINKGEM).

CUST_INSURANCE_INTEREST:

Variables 44 to 85, together with variables 35 and 36, describe customers' interest in various insurance policies. These policies fall into three tiers:

- Much-needed policies for life, health, disability and family/private accidents.
- Optimal policies. These cover property, individuals' small automobiles (especially where replacing damaged parts costs as much as a new vehicle), company delivery vehicles operated by third-party drivers, or industrial machinery.
- The most sophisticated policies, which offer luxury and high safety. Examples include private third-party insurance, where the insurer pays the third party even if the insured is at fault. Car, fire and social security insurance are also treated as forms of luxury or high sophistication.

Hence both the number of policies and the contribution to policies are classified as follows:

- Individuals opting for sophisticated, high-safety insurance policies (WAPART, PERSAUT, BRAND, BYSTAND, counting both the P contribution and A number-of-policies variables)
- Firms/individuals opting for much-needed and optimal-safety insurance policies (all others)
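The groupings above can be written down as plain lookup tables. The following sketch is illustrative only, and the helper names are hypothetical. `insurance_interest` assumes that the policy columns follow the naming pattern used above: a P (contribution) or A (number of policies) prefix followed by the policy stem. As a result, it handles only variables 44 to 85, and variables 35 and 36 would need separate handling.

```python
# Hypothetical helper names; column names are the variable names used in the text.
LIBERAL_MAIN_TYPES = frozenset({1, 2, 5, 6})
CONSERVATIVE_MAIN_TYPES = frozenset({3, 4, 7, 8, 9, 10})

# Raw columns behind each derived factor, as listed above.
FACTOR_COLUMNS = {
    "CUST_MARITAL_STAT": {
        "married": ["MRELGE"],
        "unmarried": ["MRELSA", "MRELOV", "MFALLEEN"],
    },
    "CUST_WORK_CATEGORY_PROFILE": {
        "high_profile": ["MBERHOOG", "MBERZELF", "MBERMIDD"],
        "low_profile": ["MBERBOER", "MBERARBG", "MBERARBO"],
    },
    "CUST_INCOME_LEVEL": {
        "low": ["MINKM30"],
        "middle": ["MINK3045", "MINK4575"],
        "high": ["MINK7512", "MINK123M"],
    },
}

SOPHISTICATED_STEMS = frozenset({"WAPART", "PERSAUT", "BRAND", "BYSTAND"})


def recode_main_type(moshoofd: int) -> str:
    """MOSHOOFD (1-10) -> CUST_MAIN_SPEND_INVEST_ATTITUDE."""
    if moshoofd in LIBERAL_MAIN_TYPES:
        return "liberal"
    if moshoofd in CONSERVATIVE_MAIN_TYPES:
        return "conservative"
    raise ValueError(f"MOSHOOFD must be in 1..10, got {moshoofd}")


def insurance_interest(column: str) -> str:
    """Classify a policy column by its stem; the prefix is P (contribution) or A (number)."""
    if column[:1] not in ("P", "A"):
        raise ValueError(f"not a policy column: {column}")
    return "sophisticated" if column[1:] in SOPHISTICATED_STEMS else "needed_or_optimal"
```

Keeping the groupings as data rather than as scattered conditions makes it easy to check them against the variable list and to reuse them for the cross tabs in Appendix-1.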
2.1.2. MAPPING TARGET VARIABLES AS PREDICTORS OF CARAVAN INSURANCE BUYERS:

These predictions are based on the descriptive statistics of the data set combined with real-world reasoning (Appendix-1).

FACTOR 1: AGE
Middle-aged people are more likely to buy caravan insurance.

FACTOR 2: ATTITUDE TOWARDS SPENDING/BUYING
People with a liberal attitude, as indicated by the customer main type, are more likely to buy caravan insurance.

FACTOR 3: SOCIAL LIFESTYLE REFLECTOR
Some people are modern, professional, middle or upper class, and liberal investors of their income, as indicated by the customer subtype. These people are likely to buy caravan insurance.

FACTOR 4: MARITAL STATUS
Married family men are more likely to buy caravan insurance.

FACTOR 5: WORK CATEGORY PROFILE
People in high-profile work categories with high income-generating potential are more likely to buy the insurance.

FACTOR 6: INCOME LEVEL
Average, middle-scale income earners are more likely to buy caravan insurance. Here the variable MINKGEM acts as a standalone factor representing a person's average income.

FACTOR 7: INSURANCE INTEREST
Individuals who opt for highly sophisticated, high-safety insurance policies are more likely to buy caravan insurance.

FACTOR 8: PURCHASING POWER CLASS
Individuals who purchase, or can afford to buy, high-cost products are more likely to buy caravan insurance. This insurance is not a need but a luxury aimed at average- and high-income earners.

FACTOR 9: RENTED HOME RESIDENTS
Residents of rented homes fall into three groups. Some own a house in their native place. Some have settled elsewhere in a rented home for work or family convenience. Others may not have enough savings to invest in a home. All these individuals are more likely to be interested in caravan insurance, as they need a local asset.

FACTOR 10: CAR OWNERSHIP
People who own a car signal their buying power and average income. A car also shows their interest in cars and driving, so these people may be interested in buying a caravan and its insurance. People who own more than one car are unusual. They are likely car enthusiasts who look for the best quality and the latest fashionable models, and caravans are unlikely to suit their needs.

2.2. DATA TRANSFORMATION

2.2.1. INDEPENDENCE OF DEPENDENT VARIABLES WITH RESPECT TO PREDICTION PARAMETERS:

The CUSTOMER SUBTYPE (MOSTYPE) variable combines the age factor, spending/buying attitude and social lifestyle. Hence it can be used as a standalone factor for predicting potential buyers.

MARRIED PEOPLE are represented by MRELGE, and the remaining variables describing relationship status can be ignored.

2.2.2. INTERACTION VARIABLES DEFINITION FOR INDEPENDENT REPRESENTATION OF A COMBINATION:

PURCHASING POWER CLASS * AVERAGE INCOME
Work category, income level and purchasing power class can be combined into a single profile: average-income earners with a high-profile work category who belong to the purchasing power class. This combination is represented by the interaction of the independent variables average income (MINKGEM) and purchasing power class (MKOOPKLA).

PWAPART*PBRAND*PBYSTAND*PPERSAUT
People who already buy sophisticated insurance policies are the most likely to choose caravan insurance. The interaction (cross product) of the contributions to private third-party, fire, social security and car insurance therefore represents a high probability of buying caravan insurance.

2.2.3. DIMENSION REDUCTION BY PRINCIPAL COMPONENT ANALYSIS:

Almost all variables used in the final model are largely independent of one another. Each predicts a different factor of caravan insurance buying:
- CUST_SUB_LIFESTYLE_REFLECTION – Social lifestyle and attitude towards spending/investing
- MRELGE – Marital status
- MAUT1 – Single car owner
- MHHUUR – Rented home resident
- PBRAND, PPERSAUT, PBYSTAND, PWAPART – Contribution towards different sophisticated, high-safety insurance policies

The two factors with a significant correlation are MINKGEM and MKOOPKLA, since their populations logically overlap to a large degree. The purchasing power class should have a high or middle-scale average income, which makes up most of the MINKGEM variable. These two dimensions can therefore be reduced to a single component, which keeps the final predictors closer to orthogonal (uncorrelated).

A factor analysis with principal component extraction was carried out on the two variables. The single extracted component was saved as a regression-score variable in the data set. This new variable, PURCHASING_POWER_CLASS_INK, represents MINKGEM and MKOOPKLA reduced through PCA. The factor analysis results are attached in Appendix-3.

2.3. DATA ANALYSIS:

2.3.1. APPLYING LOGISTIC REGRESSION (WITHOUT THE VARIABLE REDUCED BY PCA):

2.3.1.1. CHOSEN VARIABLES REPRESENTING INDEPENDENT FACTORS TO PREDICT CARAVAN INSURANCE BUYERS:

The predictor variables are entered in two blocks of covariates for the dependent variable CARAVAN (0 = customer will not buy, 1 = customer will buy).

BLOCK 1:
- CUST_SUB_LIFESTYLE_REFLECTION (social lifestyle reflector)
- MRELGE (marital status)
- MAUT1 (car ownership factor: a single car indicates potential income generation)
- MHHUUR (rented home residents)

BLOCK 2 (INTERACTION VARIABLES):
- PBRAND * PBYSTAND * PPERSAUT * PWAPART (customer interest in sophisticated, high-safety policies)
- MINKGEM * MKOOPKLA (purchasing power class with average income level)

Method: Forward LR
Cut-off value: 0.5
Probability entry criterion: 0.05
Probability exit criterion: 0.10
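The following sketch shows how the two Block 2 interaction terms could be formed outside SPSS. It makes the following assumptions:

- SPSS's default Indicator contrast applies, with the last (highest) observed category of each categorical variable as the reference.
- Only the four-way product is entered, as in Block 2.

SPSS labels levels by position, as (1), (2) and so on, whereas this sketch labels each term by the actual category values. All function names are hypothetical.

```python
from itertools import product


def income_by_purchasing(minkgem: float, mkoopkla: float) -> float:
    """Continuous x continuous interaction (MINKGEM by MKOOPKLA): the plain product."""
    return minkgem * mkoopkla


def non_reference_levels(values):
    """Observed levels of one categorical variable, minus the assumed reference (the highest level)."""
    return sorted(set(values))[:-1]


def four_way_interaction(rows, names=("PBRAND", "PBYSTAND", "PPERSAUT", "PWAPART")):
    """Products of indicator variables: one 0/1 column per combination of non-reference levels.

    Returns (terms, design): terms[j] is the tuple of levels for column j, and
    design[i][j] is 1 only when row i matches every level in terms[j].
    """
    levels = [non_reference_levels([row[name] for row in rows]) for name in names]
    terms = list(product(*levels))
    design = [
        [int(all(row[name] == level for name, level in zip(names, combo))) for combo in terms]
        for row in rows
    ]
    return terms, design


# Hypothetical toy customers (contribution level codes).
toy_rows = [
    {"PBRAND": 0, "PBYSTAND": 0, "PPERSAUT": 0, "PWAPART": 0},
    {"PBRAND": 3, "PBYSTAND": 0, "PPERSAUT": 6, "PWAPART": 2},
    {"PBRAND": 3, "PBYSTAND": 2, "PPERSAUT": 5, "PWAPART": 0},
    {"PBRAND": 0, "PBYSTAND": 0, "PPERSAUT": 5, "PWAPART": 0},
]
terms, design = four_way_interaction(toy_rows)
print(terms)
print(design)
```

Each customer activates at most one column, because each variable takes a single value. Customers whose combination includes a reference category activate none. This is why one fitted coefficient per customer enters the equations in 2.3.1.3 and 2.3.2.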
2.3.1.2. CHOOSING THE CATEGORICAL VARIABLES:

Variables that internally represent categories of users must be marked as categorical in a logistic regression.

In our case, the contributions to the various insurance policies (PWAPART, PPERSAUT, PBRAND and PBYSTAND) represent internal categories such as high, average and low. They are not evenly distributed across their value types, as seen in Fig 1.3, 1.4, 1.5 and 1.6. Hence they are specified as categorical.

The customer subtype (CUST_SUB_LIFESTYLE_REFLECTION) represents two main categories:

- Middle and upper class, middle-aged and senior citizens, high-risk liberal investors
- Mixed age and social class, low-risk conservative investors

These values are not evenly spread, as seen in Fig 1.2, so the variable is also treated as categorical.

All other variables are treated as continuous, each containing values for the single category it stands for: MAUT1 (owning a single car), MRELGE (married), MHHUUR (rented home residents), MINKGEM (average income) and MKOOPKLA (purchasing power class).

Forward selection completed in two steps in block 2 and produced the prediction model. However, SPSS notes that estimation stopped at the 20-iteration limit before a final solution was found. The model summary and predictor equation are described in Appendix-2.

2.3.1.3. GENERATED EQUATION BY LOGISTIC REGRESSION FOR PREDICTING POTENTIAL CARAVAN INSURANCE BUYERS:

log-odds(CARAVAN = 1) = 0.073 (MAUT1) + 0.069 (MRELGE) - 0.018 (MHHUUR) - 0.376 (CUST_SUB_LIFESTYLE_REFLECTION(1)) + 0.016 (MINKGEM by MKOOPKLA) + (PBRAND by PBYSTAND by PPERSAUT by PWAPART) - 2.924

Here (PBRAND by PBYSTAND by PPERSAUT by PWAPART) stands for the coefficient of the customer's combination of the four policy contributions. The Nagelkerke R square of this model is 0.193 (19.3%). This pseudo R square measures goodness of fit, not classification accuracy.

2.3.2. APPLYING LOGISTIC REGRESSION (WITH THE VARIABLE REDUCED BY PCA):

Logistic regression is applied again, this time with the new PCA component, PURCHASING_POWER_CLASS_INK, alongside the other variables. Forward selection completed at the first step. The predictor model is almost the same as the one above without the PCA-reduced component and is given by the equation

log-odds(CARAVAN = 1) = 0.093 (MAUT1) + 0.069 (MRELGE) - 0.024 (MHHUUR) - 0.345 (CUST_SUB_LIFESTYLE_REFLECTION(1)) + 0.237 (PURCHASING_POWER_CLASS_INK) + (PBRAND by PBYSTAND by PPERSAUT by PWAPART) - 2.336

The fit is practically unchanged, with a Nagelkerke R square of 0.192 (19.2%). The model summary and predictor equation are described in Appendix-4.
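The equation in 2.3.2 is on the log-odds scale, so a predicted probability follows from the logistic function. The following is a minimal sketch that assumes the coefficients exactly as printed above. It relies on these assumptions:

- The customer profile is hypothetical.
- `cell_coef` stands for the single coefficient of the customer's PBRAND by PBYSTAND by PPERSAUT by PWAPART combination. For example, it is -1.467 for the first cell listed in Appendix-4, or 0.0 when no estimated term applies.
- A customer counts as a predicted buyer when the probability is strictly greater than the 0.5 cut-off.

```python
import math

# Coefficients from the PCA-based equation in 2.3.2 (log-odds scale).
COEFFICIENTS = {
    "MAUT1": 0.093,
    "MRELGE": 0.069,
    "MHHUUR": -0.024,
    "CUST_SUB_LIFESTYLE_REFLECTION(1)": -0.345,  # SPSS indicator for category (1)
    "PURCHASING_POWER_CLASS_INK": 0.237,
}
INTERCEPT = -2.336
CUTOFF = 0.5  # classification cut-off from 2.3.1.1


def buy_probability(customer: dict, cell_coef: float = 0.0) -> float:
    """P(CARAVAN = 1) for one customer; cell_coef is the coefficient of the
    customer's PBRAND by PBYSTAND by PPERSAUT by PWAPART combination."""
    log_odds = INTERCEPT + cell_coef + sum(
        b * customer[name] for name, b in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict_buyer(customer: dict, cell_coef: float = 0.0) -> int:
    """1 if the probability exceeds the cut-off (assumed strict '>' rule), else 0."""
    return int(buy_probability(customer, cell_coef) > CUTOFF)


# Hypothetical customer profile (level codes as stored in the data set).
example = {
    "MAUT1": 6,
    "MRELGE": 7,
    "MHHUUR": 2,
    "CUST_SUB_LIFESTYLE_REFLECTION(1)": 0,
    "PURCHASING_POWER_CLASS_INK": 0.5,
}
print(round(buy_probability(example), 3))                    # 0.227
print(round(buy_probability(example, cell_coef=-1.467), 3))  # 0.063
print(predict_buyer(example))                                # 0
```

With an intercept of -2.336, a customer needs a strongly positive profile before the probability crosses 0.5. This is consistent with caravan buyers being a small minority of the customer base.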
3. MODEL INSIGHTS AND CONCLUSION:

The initial variables were thoroughly examined and classified. The classification reflects the socio-demographic, education, lifestyle, income, car ownership and insurance-interest properties relevant to the product type. The logically predicted significant variables were then analyzed against the descriptive statistics of the target variable in the data set using IBM SPSS. Dimension reduction, variable recoding and interaction-variable definitions were applied to obtain accurate and independent predictors. Logistic regression then produced the required predictor model.

The model should stay broad in its predictions, with sound real-world reasons for categorizing and recoding the variables. This way it holds for most possible cases and avoids OVERFITTING.

Appendix -1
DESCRIPTIVE STATISTICS – CROSS TAB RESULTS

Fig 1.0. Rental Home Residents Caravan Insurance Buying Pattern
Fig 1.1. Purchasing Power Class Caravan Insurance Buying Pattern

Fig 1.2. Social Lifestyle based Caravan Insurance Buying Pattern (RECODED VARIABLE)
1 – Middle and upper class, middle-aged and senior citizens, high-risk liberal investors
0 – Mixed age and social class, low-risk conservative investors

Fig 1.3. Third Party Insurance Buyers and Caravan Insurance Buyers

Fig 1.4. Car Insurance Buyers and Caravan Insurance Buyers

Fig 1.5. Fire Insurance Contribution and Caravan Insurance Interest

Fig 1.6. Social Security Insurance vs Caravan Insurance Buyers
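Each cross tab in this appendix compares the share of caravan buyers across the categories of one predictor. The following sketch shows that computation on hypothetical toy data (not the report's data), using only the Python standard library. The function name is illustrative.

```python
from collections import Counter


def buying_pattern(categories, caravan):
    """Row percentages of caravan buyers per category (like an SPSS crosstab with row %).

    Returns {category: (number of customers, percentage of buyers)}.
    """
    if len(categories) != len(caravan):
        raise ValueError("inputs must have the same length")
    if any(flag not in (0, 1) for flag in caravan):
        raise ValueError("CARAVAN must be coded 0/1")
    counts = Counter(zip(categories, caravan))
    table = {}
    for category in sorted(set(categories)):
        total = counts[(category, 0)] + counts[(category, 1)]
        table[category] = (total, 100.0 * counts[(category, 1)] / total)
    return table


# Hypothetical toy data: recoded CUST_SUB_LIFESTYLE_REFLECTION vs CARAVAN.
lifestyle = [0, 0, 0, 0, 1, 1, 1, 1, 1]
caravan = [0, 0, 0, 1, 0, 0, 1, 1, 0]
for category, (n, pct) in buying_pattern(lifestyle, caravan).items():
    print(f"category {category}: n={n}, buyers={pct:.1f}%")
```

Row percentages, rather than raw counts, make categories of very different sizes comparable. This matters here because buyers are rare in every category.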
Appendix -2: Logistic Regression Summary and Last Convergence Results without PCA Component

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2220.272a           .069                   .189
2      2210.325a           .070                   .193
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.

Predictors and corresponding coefficients in binary logistic regression (BLOCK 2, second step)
Variables in the Equation: B, S.E., Wald, df, Sig., Exp(B)
. . .
(The cross product continues up to the 4x4 combinations.)
a. Variable(s) entered on step 1: PBRAND * PBYSTAND * PPERSAUT * PWAPART.
b. Variable(s) entered on step 2: MINKGEM * MKOOPKLA.

APPENDIX -3: PRINCIPAL COMPONENT ANALYSIS (FACTOR ANALYSIS)
Initial components (Average Income and Purchasing Power Class) vs. extracted principal component

Correlation Matrix
                              MINKGEM   MKOOPKLA
Correlation       MINKGEM      1.000      .452
                  MKOOPKLA      .452     1.000
Sig. (1-tailed)   MINKGEM                 .000
                  MKOOPKLA      .000
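With only two standardized variables, the principal component solution has a closed form:

- The largest eigenvalue of the correlation matrix is 1 + r.
- Each loading is sqrt((1 + r) / 2).
- The reproduced correlation equals the squared loading.

The following sketch assumes r ≥ 0, as here with r = .452. For the regression-method score coefficient, it uses W = R⁻¹A, which for an eigenvector reduces to A divided by the eigenvalue. The function names are hypothetical.

```python
import math


def two_variable_pca(r: float) -> dict:
    """First principal component of the correlation matrix [[1, r], [r, 1]], for 0 <= r < 1."""
    if not 0.0 <= r < 1.0:
        raise ValueError("this sketch assumes 0 <= r < 1")
    eigenvalue = 1.0 + r                     # largest eigenvalue; eigenvector (1, 1) / sqrt(2)
    loading = math.sqrt(eigenvalue / 2.0)    # eigenvector entry * sqrt(eigenvalue)
    reproduced = loading * loading           # communality = reproduced correlation here
    return {
        "eigenvalue": eigenvalue,
        "variance_explained_pct": 100.0 * eigenvalue / 2.0,
        "loading": loading,
        "communality": reproduced,
        "residual": r - reproduced,          # observed minus reproduced correlation
        "score_coefficient": loading / eigenvalue,
    }


def component_score(z_minkgem: float, z_mkoopkla: float, r: float) -> float:
    """Regression-method component score from the two z-scores (W = A / eigenvalue)."""
    return two_variable_pca(r)["score_coefficient"] * (z_minkgem + z_mkoopkla)


for key, value in two_variable_pca(0.452).items():
    print(f"{key}: {value:.3f}")
```

For r = 0.452, the sketch reproduces the SPSS tables in this appendix: an eigenvalue of 1.452, a loading of 0.852, a communality of 0.726 and a residual of -0.274. The single component also explains 72.6% of the variance of the two variables. PURCHASING_POWER_CLASS_INK is, up to SPSS's standardization, proportional to the sum of the two z-scores.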
After Principal Component Analysis: Component Matrix(a)
                  Component 1
MINKGEM               .852
MKOOPKLA              .852
Extraction Method: Principal Component Analysis.
a. 1 components extracted.

Reproduced Correlations
                                       MINKGEM   MKOOPKLA
Reproduced Correlation   MINKGEM        .726a       .726
                         MKOOPKLA       .726        .726a
Residual(b)              MINKGEM                   -.274
                         MKOOPKLA      -.274
Extraction Method: Principal Component Analysis.
a. Reproduced communalities
b. Residuals are computed between observed and reproduced correlations. There are 1 (100.0%) nonredundant residuals with absolute values greater than 0.05.

APPENDIX -4: After PCA with the Reduced Component: Binary Logistic Regression with Other Predictor Variables

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      2213.728a           .070                   .192
a. Estimation terminated at iteration number 20 because maximum iterations has been reached. Final solution cannot be found.
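The Cox & Snell and Nagelkerke values follow from the model's -2 log likelihood and the log likelihood of the intercept-only model:

- Cox & Snell = 1 - exp(2(LL0 - LL1) / n)
- Nagelkerke = Cox & Snell / (1 - exp(2 LL0 / n))

The report does not state the sample size. The sketch below assumes the CoIL 2000 training file of 5,822 customers with 348 caravan policy holders. These counts reproduce all three model summaries in Appendices 2 and 4 to three decimals.

```python
import math

# Assumption: CoIL 2000 training file sizes (not stated in this report).
N_CUSTOMERS = 5822
N_BUYERS = 348


def null_log_likelihood(n: int, buyers: int) -> float:
    """Log likelihood of the intercept-only model (every customer gets p = buyers / n)."""
    p = buyers / n
    return buyers * math.log(p) + (n - buyers) * math.log(1.0 - p)


def pseudo_r_squares(minus_2ll: float, n: int = N_CUSTOMERS, buyers: int = N_BUYERS):
    """Return (Cox & Snell R square, Nagelkerke R square) for a model's -2 log likelihood."""
    ll0 = null_log_likelihood(n, buyers)
    ll1 = -minus_2ll / 2.0
    cox_snell = 1.0 - math.exp(2.0 * (ll0 - ll1) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll0 / n)  # value Cox & Snell would take for a perfect fit
    return cox_snell, cox_snell / max_cox_snell


for label, minus_2ll in [("Appendix-2, step 1", 2220.272),
                         ("Appendix-2, step 2", 2210.325),
                         ("Appendix-4, step 1", 2213.728)]:
    cs, nk = pseudo_r_squares(minus_2ll)
    print(f"{label}: Cox & Snell {cs:.3f}, Nagelkerke {nk:.3f}")
```

The Nagelkerke value only rescales Cox & Snell by its maximum attainable value, which is about 0.364 here because buyers are rare. For this reason, both figures describe model fit and should not be read as the share of correctly classified customers.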
Variables in the Equation (Step 1a)
Term                                                         B       S.E.     Wald   df   Sig.  Exp(B)
CUST_SUB_LIFESTYLE_REFLECTION(1)                          -.345       .124    7.778    1   .005    .709
PURCHASING_POWER_CLASS_INK                                 .237       .068   12.009    1   .001   1.268
MHHUUR                                                    -.024       .024    1.049    1   .306    .976
MAUT1                                                      .093       .040    5.315    1   .021   1.098
PBRAND * PBYSTAND * PPERSAUT * PWAPART                                      207.422  112   .000
PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(1)    -1.467       .779    3.549    1   .060    .231
PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(2)   -18.885   7541.184     .000    1   .998    .000
PBRAND(1) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(3)    -1.627       .960    2.874    1   .090    .197
PBRAND(1) by PBYSTAND(1) by PPERSAUT(2) by PWAPART(1)   -19.134  40192.970     .000    1  1.000    .000
PBRAND(1) by PBYSTAND(1) by PPERSAUT(3) by PWAPART(1)    -3.743      1.257    8.862    1   .003    .024
PBRAND(1) by PBYSTAND(1) by PPERSAUT(3) by PWAPART(3)     -.218      1.065     .042    1   .838    .804
. . .
PBRAND(7) by PBYSTAND(4) by PPERSAUT(4) by PWAPART(1)   -19.341  23141.295     .000    1   .999    .000
PBRAND(8) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(1)   -18.797  28317.506     .000    1   .999    .000
PBRAND(8) by PBYSTAND(1) by PPERSAUT(1) by PWAPART(3)   -19.114  40192.970     .000    1  1.000    .000
PBRAND(8) by PBYSTAND(1) by PPERSAUT(4) by PWAPART(1)   -19.252  28290.099     .000    1   .999    .000
PBRAND(8) by PBYSTAND(1) by PPERSAUT(4) by PWAPART(3)   -18.921  28301.176     .000    1   .999    .000
PBRAND(8) by PBYSTAND(1) by PPERSAUT(5) by PWAPART(3)   -19.476  40192.970     .000    1  1.000    .000
Constant                                                 -2.336       .812    8.271    1   .004    .097
a. Variable(s) entered on step 1: PBRAND * PBYSTAND * PPERSAUT * PWAPART.
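The Wald, Sig. and Exp(B) columns are related as follows:

- Wald = (B / S.E.)²
- Sig. = P(χ²₁ > Wald)
- Exp(B) = e^B

The sketch below recomputes these for a few rows of the table above. B and S.E. are printed to three decimals, so the recomputed Wald values only approximate SPSS's, and Sig. is therefore computed from the reported Wald.

In this table, some rows have B near -19, standard errors in the thousands, Exp(B) = .000 and Sig. close to 1. This pattern typically indicates interaction cells with few or no buyers, which is consistent with the iteration-limit warning in the model summary.

```python
import math


def wald_p_value(wald: float) -> float:
    """Upper-tail p-value of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(wald / 2.0))


def wald_stats(b: float, se: float):
    """Return (Wald chi-square with df = 1, its p-value, Exp(B)) for one coefficient."""
    wald = (b / se) ** 2
    return wald, wald_p_value(wald), math.exp(b)


# (B, S.E., reported Wald) for some rows of the Appendix-4 table.
rows = {
    "CUST_SUB_LIFESTYLE_REFLECTION(1)": (-0.345, 0.124, 7.778),
    "MHHUUR": (-0.024, 0.024, 1.049),
    "MAUT1": (0.093, 0.040, 5.315),
    "Constant": (-2.336, 0.812, 8.271),
}
for name, (b, se, reported_wald) in rows.items():
    wald, _, exp_b = wald_stats(b, se)
    print(f"{name}: Wald ~ {wald:.2f} (SPSS {reported_wald}), "
          f"Sig. = {wald_p_value(reported_wald):.3f}, Exp(B) = {exp_b:.3f}")
```

Only MHHUUR has Sig. above 0.05 among the rows shown, which matches the .306 in the table.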