AIML INTERNSHIP
DEPARTMENT OF INFORMATION TECHNOLOGY, SRKREC
Presented by:
MATCHA MANOJ KUMAR
3/4 - IT
Regd.No: 20B91A12B6
HENOTIC TECHNOLOGY PRIVATE LIMITED
7th July 2022 to 06th September 2022
CONTENTS :
• INTRODUCTION ABOUT DATASET
• OBJECTIVE
• DATA SCIENCE PROJECT LIFE CYCLE
• RESULTS
• CONCLUSIONS
• REFERENCES
TRAVEL INSURANCE
DATASET :
LINK : https://www.kaggle.com/datasets/ersany/travel-insurance
INTRODUCTION ABOUT DATASET :
• Every year, thousands of people travel from one place to another for various reasons. If the
journey is expensive or important, they may buy travel insurance, something travelers often
overlook while planning a vacation or trip.
• Travel insurance covers risks during travel, such as loss of passport, loss of personal belongings
and loss of checked-in baggage. Having these risks covered provides an additional layer of
protection against financial loss.
• The dataset contains the following information: agency, agency type, distribution channel, product
name, duration, destination, net sales, commission, gender, age and claim (target variable).
• Based on the destination, agency type, distribution channel, product name, duration of the journey
and the other fields, we decide whether the traveler is able to claim the insurance or not.
• By applying ML algorithms to the dataset, we aim to predict this outcome efficiently and accurately.
• This dataset contains 11 columns and 48260 observations.
OBJECTIVE :
The main agenda of this project is:
• In the past, people have cheated insurance companies in various ways to claim insurance;
this project was built to help overcome that.
• In this project we study the previous data and decide whether the traveler can
claim their insurance or not.
• We build an appropriate Machine Learning model that helps predict whether
the traveler can claim their insurance or not.
Data Science Project Life Cycle :
1. Data Pre-processing
i. Check for duplicate and low-variance data
ii. Identify and address missing values
iii. Handle outliers
iv. Encode categorical data
v. Feature scaling
vi. As this is an imbalanced dataset, we apply an oversampling technique (RandomOverSampler) to
balance the dataset.
2. Selection of dependent and independent variables
In my dataset, the target variable is claim.
3. Training models
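Step vi above names RandomOverSampler. As a rough illustration of what random oversampling does, the sketch below duplicates randomly chosen minority-class rows until both classes have the same size. It uses only the Python standard library and is not the library implementation; the function name random_oversample, the toy rows and the seed are hypothetical. It also assumes the oversampling is applied to the training split only, so that the test split keeps its natural class balance.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest one.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement from the existing rows of this class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy training split: 8 rows without a claim, 2 rows with a claim.
train_rows = [
    {"duration": 5, "age": 31}, {"duration": 12, "age": 45},
    {"duration": 7, "age": 28}, {"duration": 30, "age": 52},
    {"duration": 3, "age": 36}, {"duration": 14, "age": 40},
    {"duration": 9, "age": 33}, {"duration": 21, "age": 60},
    {"duration": 90, "age": 48}, {"duration": 60, "age": 37},
]
train_labels = ["No"] * 8 + ["Yes"] * 2

balanced_rows, balanced_labels = random_oversample(train_rows, train_labels)
print(Counter(train_labels))
print(Counter(balanced_labels))
```

After the call, both classes contain eight rows; the added "Yes" rows are exact copies of the two original ones, which is why oversampling adds no new information and is best kept away from the evaluation data.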
Results :
Based on the analysis, these are the results generated from the different models:
   Model Name                            True_Positive  False_Negative  False_Positive  True_Negative  Accuracy  Precision  Recall  F1 Score  Specificity  MCC     ROC_AUC_Score  Balanced Accuracy
0  LogisticRegression()                  0              212             3               14263          0.985     0          0       0         1            -0.002  0.499895       0.5
1  DecisionTreeClassifier()              9              203             237             14029          0.97      0.037      0.042   0.039     0.983        0.024   0.51292        0.512
2  RandomForestClassifier()              0              212             0               14266          0.985     0          0       0         1            0       0.5            0.5
3  ExtraTreesClassifier()                0              212             3               14263          0.985     0          0       0         1            -0.002  0.499895       0.5
4  KNeighborsClassifier()                1              211             6               14260          0.985     0.143      0.005   0.009     1            0.023   0.502148       0.502
5  SVC(probability=True)                 0              212             0               14266          0.985     0          0       0         1            0       0.5            0.5
6  BaggingClassifier(n_estimators=100)   0              212             19              14247          0.984     0          0       0         0.999        -0.004  0.499334       0.5
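The metric columns in the table follow from the four confusion-matrix counts. The sketch below recomputes them with the standard formulas for two rows of the table, using only the Python standard library. The helper names safe_div and confusion_metrics are hypothetical, and returning 0 when a denominator is 0 is an assumption chosen to match the zeros shown in the table.

```python
from math import sqrt


def safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is 0."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fn, fp, tn):
    """Derive the table's metrics from the confusion-matrix counts."""
    recall = safe_div(tp, tp + fn)          # true positive rate
    specificity = safe_div(tn, tn + fp)     # true negative rate
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fn + fp + tn),
        "precision": safe_div(tp, tp + fp),
        "recall": recall,
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
        "specificity": specificity,
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Counts copied from the DecisionTreeClassifier() and RandomForestClassifier() rows.
decision_tree = confusion_metrics(tp=9, fn=203, fp=237, tn=14029)
random_forest = confusion_metrics(tp=0, fn=212, fp=0, tn=14266)

for name, metrics in [("decision tree", decision_tree), ("random forest", random_forest)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

The random forest row shows why accuracy alone is misleading on this data: a model that never predicts a claim still reaches about 0.985 accuracy, while its recall, F1 score and MCC are all 0.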
Results :
Considering model Accuracy, F1 score and ROC_AUC_score, the models rank in the following order:
1) Decision Tree Classifier
2) Random Forest Classifier
3) KNeighbors Classifier
CONCLUSIONS :
[Figures: ROC curve for Decision Tree; ROC curve for Logistic Regression]
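The figures on this slide are ROC curves. As a rough illustration of how such a curve and its area are computed, the sketch below sweeps a threshold over the predicted scores and integrates with the trapezoid rule, using only the Python standard library. The function names roc_points and auc and the toy labels and scores are hypothetical and not taken from the project; the code assumes both classes are present. The second example rebuilds hard 0/1 predictions from the DecisionTreeClassifier() counts in the results table to show what the area becomes when labels are used instead of probabilities.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points,
    one per distinct score, from the strictest threshold to the loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy example with predicted probabilities.
toy_auc = auc(roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# Hard 0/1 predictions rebuilt from the decision tree counts:
# TP=9, FN=203 for the 212 claims; FP=237, TN=14029 for the 14266 non-claims.
y_true = [1] * 212 + [0] * 14266
y_pred = [1] * 9 + [0] * 203 + [1] * 237 + [0] * 14029
hard_label_auc = auc(roc_points(y_true, y_pred))

print(toy_auc, round(hard_label_auc, 5))
```

With hard labels the curve has a single interior point, so its area reduces to (recall + specificity) / 2, the balanced accuracy. The ROC_AUC_Score values in the results table agree with that formula; whether they were produced from labels rather than probabilities is an assumption, not something the slides state.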
REFERENCES:
• https://www.kaggle.com/datasets/ersany/travel-insurance