# Caravan insurance data mining statistical analysis

Published in: Technology, Economy & Finance

K6255 – Knowledge Discovery and Data Mining
Statistical Analysis of Caravan Insurance using IBM SPSS
Muthu Kumaar Thangavelu (G1101765E)
Muthu1@e.ntu.edu.sg

1. INTRODUCTION:

The data set contains information on customers of an insurance company, including product usage data and socio-demographic data derived from zip area codes, supplied by the Dutch data mining company Sentient Machine Research. Our aim is to identify the group of customers who will be interested in buying caravan insurance and to build a predictive model from the given 86 variables, which represent the socio-demographic, education, insurance interest and income levels of customers.

2. STATISTICAL ANALYSIS

2.1. DATA PREPARATION:

2.1.1. ANALYZING AND CATEGORIZING THE VARIABLES:

We extract and analyze the raw variables with their labels and categorize them based on our understanding of the insurance product and its buyers. We group the broad range of 86 variables into the significant predictors below.

CUST_SUB_LIFESTYLE_REFLECTION:
The customer sub type variable MOSTYPE has 41 value types, which can be categorised under two broad classes relating to age, social class, lifestyle and attitude towards investing or spending:
- Middle and upper class, middle-aged and senior citizens, high-risk cultured liberal investors (8, 9, 12, 13, 23, 25, 36, 2, 3, 4, 5, 15, and 27)
- Distributed age and social class, low-risk cultured conservative investors (1, 6, 7, 10, 11, 14, 16, 17, 18, 19, 20, 21, 22, 24, 26, 28, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41)

CUST_LEVEL_LIFECYCLE:
Average age MGEMLEEF holds 6 types of values, which can be categorised into three groups based on family status and age:
- Young, family starters (1)
- Middle-aged family men (2, 3, and 4)
- Senior family men (5, 6)
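The two recodes above (MOSTYPE into two classes, MGEMLEEF into three groups) can be sketched in Python as follows. This is a minimal illustration and not the SPSS procedure used in the analysis: the function names and output labels are hypothetical, and only the value codes are taken from the lists above.

```python
# Value codes taken from the MOSTYPE class lists above; every other code in
# 1..41 falls into the conservative class.
LIBERAL_SUBTYPES = frozenset({2, 3, 4, 5, 8, 9, 12, 13, 15, 23, 25, 27, 36})


def recode_mostype(code: int) -> str:
    """Map a MOSTYPE code (1-41) to a CUST_SUB_LIFESTYLE_REFLECTION label.

    The label strings are hypothetical, chosen only for this sketch.
    """
    if not 1 <= code <= 41:
        raise ValueError(f"MOSTYPE code out of range: {code}")
    if code in LIBERAL_SUBTYPES:
        return "liberal_investor"
    return "conservative_investor"


def recode_mgemleef(code: int) -> str:
    """Map an MGEMLEEF code (1-6) to a CUST_LEVEL_LIFECYCLE label.

    The label strings are hypothetical, chosen only for this sketch.
    """
    if code == 1:
        return "young_family_starter"
    if code in (2, 3, 4):
        return "middle_aged"
    if code in (5, 6):
        return "senior"
    raise ValueError(f"MGEMLEEF code out of range: {code}")
```

For example, `recode_mostype(8)` returns `"liberal_investor"` and `recode_mgemleef(5)` returns `"senior"`. Rejecting out-of-range codes keeps a mistyped value from silently landing in the conservative class.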
CUST_MAIN_SPEND_INVEST_ATTITUDE:
Customer main type MOSHOOFD can be classified into two groups based on the attitude of customers towards buying and spending:
- Liberals (1, 2, 5, 6)
- Conservatives (3, 4, 7, 8, 9, 10)

CUST_MARITAL_STAT:
MRELGE, MRELSA, MRELOV and MFALLEEN describe the relationship status of a person and can be combined into two categories signifying marital status:
- Married (MRELGE)
- Unmarried (MRELSA, MRELOV, MFALLEEN)

CUST_WORK_CATEGORY_PROFILE:
Variables 19 – 24 describe the work category profile of a person, which can be of two types:
- High-profile work categories with high income potential (MBERHOOG, MBERZELF, MBERMIDD)
- Low-profile work categories with relatively lower income potential (MBERBOER, MBERARBG, MBERARBO)

CUST_INCOME_LEVEL:
Variables 37 to 41 represent the income of a person, which can be grouped into three classes:
- Low (MINKM30)
- Middle (MINK3045, MINK4575)
- High (MINK7512, MINK123M)

These are best represented by a standalone factor depicting the average income (MINKGEM).

CUST_INSURANCE_INTEREST:
Variables 44 to 85, together with 35 and 36, describe the interest of customers in various insurance policies. These range from much-needed policies for life, health, disabilities and family/private accidents, through optimal policies for property, small automobiles of individuals (especially where replacing damaged parts costs as much as buying a new vehicle), company delivery vehicles operated by third-party drivers and industrial machines, to the most sophisticated policies offering luxury and high safety. An example of the last kind is private third party insurance, where the insurer pays the third party even if the insured is at fault. Car, fire and social security insurance also represent forms of luxury or high sophistication.

Hence the classification, for both the number and the contribution of policies held by different customers, is:
- Individuals opting for sophisticated, high-safety insurance policies (WAPART, PERSAUT, BRAND, BYSTAND)
- Firms/individuals opting for much-needed and optimal-safety insurance policies (all others)
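The groupings in this part can be written down as lookup tables. The sketch below is in Python rather than SPSS syntax, and its names (`FACTOR_GROUPS`, `recode_moshoofd`, `group_totals`, `policy_class`) and label strings are hypothetical. Summing the member variables of a group is an assumption made for illustration, since the text does not state how the variables were combined. The leading `P` (contribution) and `A` (number of policies) prefixes are also an assumption, based on the full variable names in the public data dictionary for this data set (for example `PWAPART` and `AWAPART`), which the text abbreviates to stems.

```python
# Variable groupings copied from the text above; group labels are hypothetical.
FACTOR_GROUPS = {
    "CUST_MARITAL_STAT": {
        "married": ["MRELGE"],
        "unmarried": ["MRELSA", "MRELOV", "MFALLEEN"],
    },
    "CUST_WORK_CATEGORY_PROFILE": {
        "high_profile": ["MBERHOOG", "MBERZELF", "MBERMIDD"],
        "low_profile": ["MBERBOER", "MBERARBG", "MBERARBO"],
    },
    "CUST_INCOME_LEVEL": {
        "low": ["MINKM30"],
        "middle": ["MINK3045", "MINK4575"],
        "high": ["MINK7512", "MINK123M"],
    },
}

LIBERAL_MAIN_TYPES = frozenset({1, 2, 5, 6})
SOPHISTICATED_STEMS = frozenset({"WAPART", "PERSAUT", "BRAND", "BYSTAND"})


def recode_moshoofd(code: int) -> str:
    """Map a MOSHOOFD code (1-10) to a liberal/conservative label."""
    if not 1 <= code <= 10:
        raise ValueError(f"MOSHOOFD code out of range: {code}")
    return "liberal" if code in LIBERAL_MAIN_TYPES else "conservative"


def group_totals(record: dict, factor: str) -> dict:
    """Sum a record's values within each group of a factor.

    Summing is an assumption for illustration, not the author's stated method.
    """
    return {
        label: sum(record[name] for name in names)
        for label, names in FACTOR_GROUPS[factor].items()
    }


def policy_class(variable: str) -> str:
    """Classify an insurance variable (44-85, 35, 36) by its policy stem.

    Only pass insurance variables; anything else also lands in "all others".
    """
    # Assumed naming: "P" + stem = contribution, "A" + stem = number of policies.
    if variable[:1] in ("P", "A") and variable[1:] in SOPHISTICATED_STEMS:
        return "sophisticated_high_safety"
    return "needed_or_optimal_safety"
```

For a record such as `{"MRELGE": 7, "MRELSA": 1, "MRELOV": 2, "MFALLEEN": 0}`, `group_totals(record, "CUST_MARITAL_STAT")` gives one total for the married group and one for the unmarried group, which can then be compared or fed to a model as two derived predictors.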
2.1.2. MAPPING TARGET VARIABLES AS PREDICTORS OF CARAVAN INSURANCE BUYERS:

These predictions were made from the descriptive statistics of the data set together with real-world logical themes (Appendix-1).

FACTOR 1: AGE
Middle-aged people are more likely to get caravan insurance.

FACTOR 2: ATTITUDE TOWARDS SPENDING/BUYING
People with a liberal attitude, as predicted by Customer Main type, are more likely to get caravan insurance.

FACTOR 3: SOCIAL LIFE STYLE REFLECTOR
People who are modern, professional, middle and upper class and liberal investors of their income, as predicted by Customer Sub type, are likely to get caravan insurance.

FACTOR 4: MARITAL STATUS
Married family men are more likely to buy caravan insurance.

FACTOR 5: WORK CATEGORY PROFILE
People in high-profile work categories with high income potential are more likely to get the insurance.

FACTOR 6: INCOME LEVEL
Average, middle-scale income generators are more likely to get caravan insurance. Here the variable MINKGEM acts as a standalone factor representing the average income of a person.

FACTOR 7: INSURANCE INTEREST
Individuals opting for highly sophisticated, high-safety insurance policies are more likely to buy caravan insurance.

FACTOR 8: PURCHASING POWER CLASS
Individuals who purchase, or can afford to buy, high-cost products are more likely to buy caravan insurance, since it is not a need but a luxury aimed at average and high income generators.

FACTOR 9: RENTED HOME RESIDENTS
Residents who stay in a rented home might have their own house in their native place, might have settled elsewhere in a rented home for work and family convenience, or might not have enough savings for investing on