INSURANCE ASSIGNMENT
Submitted to:- Prof. Vinod Sir
Age of the policy Holder
In [3]: insurance.age.value_counts()
...: sns.histplot(x='age', data=insurance,color='r')
...: plt.title('Age of Policy Holders',fontsize=15)
...: plt.xlabel('Age',fontsize=15)
Out[3]: Text(0.5, 0, 'Age')
Insights:- Policy holders are most concentrated in the 18 to 27 age range
Gender (Male & Female)
In [5]: insurance.sex.value_counts()
Out[5]:
male 676
female 662
Name: sex, dtype: int64
In [6]: sns.countplot(x='sex',data=insurance)
...: plt.title('No of Males and Female',fontsize=15)
Out[6]: Text(0.5, 1.0, 'No of Males and Female')
Insights:- Male policy holders slightly outnumber female policy holders (676 vs 662)
BMI of Policy Holders
In [3]: insurance.bmi.value_counts()
Out[3]:
32.300 13
28.310 9
30.495 8
30.875 8
31.350 8
..
46.200 1
23.800 1
44.770 1
32.120 1
30.970 1
Name: bmi, Length: 548, dtype:
In [4]: sns.histplot(x='bmi', data=insurance,color='r')
...: plt.title('No of Policy Holders wrt bmi',fontsize=15)
...: plt.xlabel('bmi',fontsize=15)
Insights:- Most policy holders have a BMI of around 30
No of Children in the family of Policy Holders
In [6]: insurance.children.value_counts()
Out[6]:
0 574
1 324
2 240
3 157
4 25
5 18
Name: children, dtype: int64
In [7]: sns.countplot(x='children',data=insurance)
...: plt.title('No of Policy holders wrt to Children',fontsize=15)
Out[7]: Text(0.5, 1.0, 'No of Policy holders wrt to Children')
Insights:- From this plot we can conclude that the largest group of policy holders (574 of 1338) has no children
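The share of each group can be checked directly from the counts in Out[6]. The snippet below is a minimal sketch that re-enters those counts by hand instead of loading the dataset; the names children_counts, children_share and largest_group are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Counts copied from the value_counts() output above (children -> policy holders).
children_counts = pd.Series(
    {0: 574, 1: 324, 2: 240, 3: 157, 4: 25, 5: 18}, name="children"
)

# Share of all policy holders in each group, as a percentage.
children_share = (children_counts / children_counts.sum() * 100).round(1)
print(children_share)

# Index label of the largest group.
largest_group = children_counts.idxmax()
print("Largest group:", largest_group, "children")
```

On the full DataFrame, insurance.children.value_counts(normalize=True) should give the same shares as fractions without re-entering the counts.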
Smoker and Non Smoker (Male & Female)
In [8]: insurance.smoker.value_counts()
Out[8]:
no 1064
yes 274
Name: smoker, dtype: int64
In [9]: sns.countplot(x='smoker',data=insurance)
...: plt.title('Smoker and Non Smoker',fontsize=15)
Out[9]: Text(0.5, 1.0, 'Smoker and Non Smoker')
In [10]: pd.crosstab(insurance.sex , insurance.smoker)
Out[10]:
smoker   no  yes
sex
female  547  115
male    517  159
Insights:- From these outputs, about 80% of policy holders are non-smokers (1064 of 1338). Female smokers make up about 9% of all policy holders and male smokers about 12%
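The percentages quoted here can be reproduced with the normalize argument of pd.crosstab. This is a sketch only: it rebuilds one row per policy holder from the four counts in Out[10] instead of reading the real data, and the names counts, demo, share_of_all and share_within_sex are illustrative.

```python
import pandas as pd

# Cell counts copied from the crosstab above.
counts = {
    ("female", "no"): 547,
    ("female", "yes"): 115,
    ("male", "no"): 517,
    ("male", "yes"): 159,
}

# Rebuild one row per policy holder so crosstab can be applied as in In [10].
rows = [(sex, smoker) for (sex, smoker), n in counts.items() for _ in range(n)]
demo = pd.DataFrame(rows, columns=["sex", "smoker"])

# normalize="all" divides every cell by the grand total (1338).
share_of_all = pd.crosstab(demo.sex, demo.smoker, normalize="all") * 100
print(share_of_all.round(1))

# normalize="index" divides each row by its own total,
# giving the smoking rate within each sex.
share_within_sex = pd.crosstab(demo.sex, demo.smoker, normalize="index") * 100
print(share_within_sex.round(1))
```

The two tables answer different questions: the first gives each group's share of all policy holders, which is what the 9% and 12% figures refer to, while the second gives the proportion of smokers among women and among men separately.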
Regions of Policy Holder
In [11]: insurance.region.value_counts()
Out[11]:
southeast 364
southwest 325
northwest 325
northeast 324
Name: region, dtype: int64
In [12]: sns.countplot(x='region',data=insurance)
...: plt.title('Policy holders from different region of India',fontsize=15)
Out[12]: Text(0.5, 1.0, 'Policy holders from different region of India')
Insights:- The southeast region has the most policy holders (364); the other three regions are almost equal at 324 to 325 each
Insurance charges paid by Policy Holder
In [13]: insurance.charges.value_counts()
Out[13]:
1639.56310 2
16884.92400 1
29330.98315 1
2221.56445 1
19798.05455 1
..
7345.08400 1
26109.32905 1
28287.89766 1
1149.39590 1
29141.36030 1
Name: charges, Length: 1337, dtype: int64
In [14]: sns.histplot(x='charges', data=insurance,color='r')
...: plt.title('Yearly Charges for Insurance among Policy holders',fontsize=15)
...: plt.xlabel('charges',fontsize=15)
Out[14]: Text(0.5, 0, 'charges')
Insights:- Insurance charges are most concentrated between about Rs 1,121 and Rs 8,700
Relationship of Charges with Age, BMI and Children
In [5]: da=insurance[['charges','age','bmi']]
...: sns.pairplot(da,kind='reg')
Out[5]: <seaborn.axisgrid.PairGrid at 0x23ca1412f70>
In [6]: print(da.corr())
          charges       age       bmi
charges  1.000000  0.299008  0.198341
age      0.299008  1.000000  0.109272
bmi      0.198341  0.109272  1.000000
In [7]: sns.heatmap(da.corr(),cmap="YlGnBu", annot=True)
Out[7]: <AxesSubplot:>
In [9]: da=insurance[['charges','age','children']]
...: sns.pairplot(da,kind='reg')
Out[9]: <seaborn.axisgrid.PairGrid at 0x23ca6259250>
In [12]: print(da.corr())
           charges       age  children
charges   1.000000  0.299008  0.067998
age       0.299008  1.000000  0.042469
children  0.067998  0.042469  1.000000
In [13]: sns.heatmap(da.corr(),cmap="YlGnBu", annot=True)
Out[13]: <AxesSubplot:>
Insights:- From these plots we can conclude that charges are more strongly correlated with age (correlation coefficient about 0.30) than with BMI (about 0.20) or number of children (about 0.068). All three correlations are weak to moderate, so none of these variables alone explains the charges
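The numbers printed by da.corr() are Pearson correlation coefficients, which is the default method of DataFrame.corr. The sketch below computes the coefficient by hand and compares it with pandas. The toy values are made up for illustration and are not taken from the insurance data, and pearson_r is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up toy data, only to show that both computations agree.
toy = pd.DataFrame(
    {
        "age": [19, 25, 33, 41, 52, 60],
        "charges": [1800.0, 3200.0, 2900.0, 6100.0, 5400.0, 9800.0],
    }
)

manual = pearson_r(toy["age"], toy["charges"])
from_pandas = toy["age"].corr(toy["charges"])  # Pearson by default
print(manual, from_pandas)
```

The coefficient always lies between -1 and 1. Values near 0 indicate little linear relationship, which is why 0.068 for children is read as almost no linear connection with charges.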
