SlideShare a Scribd company logo
1 of 7
Download to read offline
IBM Data Science Capstone: Car Accident
Severity Report
Introduction | Business Understanding
In an effort to reduce the frequency of car collisions in a community, an algorithm must be
developed to predict the severity of an accident given the current weather, road and
visibility conditions. When conditions are bad, this model will alert drivers to remind them
to be more careful.
Data Understanding
Our predictor or target variable will be 'SEVERITYCODE' because it is used measure the
severity of an accident from 0 to 5 within the dataset. Attributes used to weigh the severity
of an accident are 'WEATHER', 'ROADCOND' and 'LIGHTCOND'.
Severity codes are as follows:
• 0: Little to no Probability (Clear Conditions)
• 1: Very Low Probability - Chance or Property Damage
• 2: Low Probability - Chance of Injury
• 3: Mild Probability - Chance of Serious Injury
• 4: High Probability - Chance of Fatality
Extract Dataset & Convert
In its original form, this data is not fit for analysis. For one, there are many columns that we
will not use for this model. Also, most of the features are of type object, when they should
be numerical type.
We must use label encoding to covert the features to our desired data type.
With the new columns, we can now use this data in our analysis and ML models!
Now let's check the data types of the new columns in our dataframe. Moving forward, we
will only use the new columns for our analysis.
Balancing the Dataset
Our target variable SEVERITYCODE is only 42% balanced. In fact, severitycode in class 1 is
nearly three times the size of class 2.
We can fix this by downsampling the majority class.
Perfectly balanced.
Methodology
Our data is now ready to be fed into machine learning models.
We will use the following models:
K-Nearest Neighbor (KNN)
KNN will help us predict the severity code of an outcome by finding the most similar to data
point within k distance.
Decision Tree
A decision tree model gives us a layout of all possible outcomes so we can fully analyze the
concequences of a decision. It context, the decision tree observes all possible outcomes of
different weather conditions.
Logistic Regression
Because our dataset only provides us with two severity code outcomes, our model will only
predict one of those two classes. This makes our data binary, which is perfect to use with
logistic regression.
Let's get started!
Initialization
Define X and y
Normalize the dataset
Train/Test Split
We will use 30% of our data for testing and 70% for training.
Here we will begin our modeling and predictions...
Results & Evaluation Now we will check the accuracy of our models.
Discussion
In the beginning of this notebook, we had categorical data that was of type 'object'. This is
not a data type that we could have fed through an algorithm, so label encoding was used to
created new classes that were of type int8; a numerical data type.
After solving that issue, we were presented with another - imbalanced data. As mentioned
earlier, class 1 was nearly three times larger than class 2. The solution to this was down
sampling the majority class with sklearn's resample tool. We down sampled to match the
minority class exactly with 58188 values each.
Once we analyzed and cleaned the data, it was then fed through three ML models; K-
Nearest Neighbor, Decision Tree and Logistic Regression. Although the first two are ideal
for this project, logistic regression made most sense because of its binary nature.
Evaluation metrics used to test the accuracy of our models were Jaccard index, f-1 score
and logloss for logistic regression. Choosing different k, max depth and hyperparameter C
values helped to improve our accuracy to be the best possible.
Conclusion
Based on historical data from weather conditions pointing to certain classes, we can
conclude that particular weather conditions have a somewhat impact on whether or not
travel could result in property damage (class 1) or injury (class 2).
Thank you for reading!

More Related Content

What's hot

An Elementary Introduction to Kalman Filtering
An Elementary Introduction to Kalman FilteringAn Elementary Introduction to Kalman Filtering
An Elementary Introduction to Kalman FilteringAshish Sharma
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimationData Con LA
 
Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...vivatechijri
 
Predicting house prices_Regression
Predicting house prices_RegressionPredicting house prices_Regression
Predicting house prices_RegressionSruti Jain
 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)Learnbay Datascience
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...csandit
 
KNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing ValuesKNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing Valuesijeei-iaes
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...csandit
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Predictionsriram30691
 
Opportunity Challenge: a comparative study
Opportunity Challenge: a comparative studyOpportunity Challenge: a comparative study
Opportunity Challenge: a comparative studyMatteo Ciprian
 
Heart disease classification
Heart disease classificationHeart disease classification
Heart disease classificationSnehaDey21
 
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Conferenceproceedings
 
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)Prashant Srivastav
 

What's hot (20)

An Elementary Introduction to Kalman Filtering
An Elementary Introduction to Kalman FilteringAn Elementary Introduction to Kalman Filtering
An Elementary Introduction to Kalman Filtering
 
Approaches to online quantile estimation
Approaches to online quantile estimationApproaches to online quantile estimation
Approaches to online quantile estimation
 
Data analysis of weather forecasting
Data analysis of weather forecastingData analysis of weather forecasting
Data analysis of weather forecasting
 
Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...Partial Binomial Distribution method for Generation capacity outage using Spr...
Partial Binomial Distribution method for Generation capacity outage using Spr...
 
Predicting house prices_Regression
Predicting house prices_RegressionPredicting house prices_Regression
Predicting house prices_Regression
 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
 
KNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing ValuesKNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing Values
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
 
KNN Classifier
KNN ClassifierKNN Classifier
KNN Classifier
 
House Sale Price Prediction
House Sale Price PredictionHouse Sale Price Prediction
House Sale Price Prediction
 
Machine learning
Machine learningMachine learning
Machine learning
 
XL-MINER: Data Exploration
XL-MINER: Data ExplorationXL-MINER: Data Exploration
XL-MINER: Data Exploration
 
Opportunity Challenge: a comparative study
Opportunity Challenge: a comparative studyOpportunity Challenge: a comparative study
Opportunity Challenge: a comparative study
 
Heart disease classification
Heart disease classificationHeart disease classification
Heart disease classification
 
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...Approximated Bayes Estimators for the Parameters and Reliability Function of ...
Approximated Bayes Estimators for the Parameters and Reliability Function of ...
 
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)
COMPUTER BASED NUMERICAL AND STATISTICAL TECHNIQUES(CBNST)
 
Ew36913917
Ew36913917Ew36913917
Ew36913917
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 
XL-MINER:Prediction
XL-MINER:PredictionXL-MINER:Prediction
XL-MINER:Prediction
 

Similar to Car Accident Severity Report

DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET Journal
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET Journal
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTIRJET Journal
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityGon-soo Moon
 
Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson ChallengeRaouf KESKES
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdfgadissaassefa
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperJames by CrowdProcess
 
KnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectKnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectMarciano Moreno
 
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...SSA KPI
 
Scaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismScaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismParameswaran Raman
 
AIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfAIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfssuserb4d806
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfAaryanArora10
 
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptx
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptxAnalysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptx
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptxDebrajBanerjee22
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET Journal
 

Similar to Car Accident Severity Report (20)

DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-Severity
 
Final Project Statr 503
Final Project Statr 503Final Project Statr 503
Final Project Statr 503
 
Higgs Boson Challenge
Higgs Boson ChallengeHiggs Boson Challenge
Higgs Boson Challenge
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 
Supervised Learning.pdf
Supervised Learning.pdfSupervised Learning.pdf
Supervised Learning.pdf
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
KnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProjectKnowledgeFromDataAtScaleProject
KnowledgeFromDataAtScaleProject
 
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...Forecasting Default Probabilities  in Emerging Markets and   Dynamical Regula...
Forecasting Default Probabilities in Emerging Markets and Dynamical Regula...
 
Scaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid ParallelismScaling Multinomial Logistic Regression via Hybrid Parallelism
Scaling Multinomial Logistic Regression via Hybrid Parallelism
 
AIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdfAIML2 DNN lab 1 3 1hr (111-1).pdf
AIML2 DNN lab 1 3 1hr (111-1).pdf
 
Sample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdfSample_Subjective_Questions_Answers (1).pdf
Sample_Subjective_Questions_Answers (1).pdf
 
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptx
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptxAnalysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptx
Analysis_of_deep_learning_algorithms_for_diabetic_retinopathy.pptx
 
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
IRJET- Evaluation of Classification Algorithms with Solutions to Class Imbala...
 

Recently uploaded

VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Recently uploaded (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Car Accident Severity Report

  • 1. IBM Data Science Capstone: Car Accident Severity Report Introduction | Business Understanding In an effort to reduce the frequency of car collisions in a community, an algorithm must be developed to predict the severity of an accident given the current weather, road and visibility conditions. When conditions are bad, this model will alert drivers to remind them to be more careful. Data Understanding Our predictor or target variable will be 'SEVERITYCODE' because it is used measure the severity of an accident from 0 to 5 within the dataset. Attributes used to weigh the severity of an accident are 'WEATHER', 'ROADCOND' and 'LIGHTCOND'. Severity codes are as follows: • 0: Little to no Probability (Clear Conditions) • 1: Very Low Probability - Chance or Property Damage • 2: Low Probability - Chance of Injury • 3: Mild Probability - Chance of Serious Injury • 4: High Probability - Chance of Fatality
  • 2. Extract Dataset & Convert In its original form, this data is not fit for analysis. For one, there are many columns that we will not use for this model. Also, most of the features are of type object, when they should be numerical type. We must use label encoding to covert the features to our desired data type. With the new columns, we can now use this data in our analysis and ML models! Now let's check the data types of the new columns in our dataframe. Moving forward, we will only use the new columns for our analysis. Balancing the Dataset Our target variable SEVERITYCODE is only 42% balanced. In fact, severitycode in class 1 is nearly three times the size of class 2. We can fix this by downsampling the majority class. Perfectly balanced.
  • 3. Methodology Our data is now ready to be fed into machine learning models. We will use the following models: K-Nearest Neighbor (KNN) KNN will help us predict the severity code of an outcome by finding the most similar to data point within k distance. Decision Tree A decision tree model gives us a layout of all possible outcomes so we can fully analyze the concequences of a decision. It context, the decision tree observes all possible outcomes of different weather conditions. Logistic Regression Because our dataset only provides us with two severity code outcomes, our model will only predict one of those two classes. This makes our data binary, which is perfect to use with logistic regression. Let's get started! Initialization Define X and y
  • 4. Normalize the dataset Train/Test Split We will use 30% of our data for testing and 70% for training. Here we will begin our modeling and predictions...
  • 5.
  • 6. Results & Evaluation Now we will check the accuracy of our models.
  • 7. Discussion In the beginning of this notebook, we had categorical data that was of type 'object'. This is not a data type that we could have fed through an algorithm, so label encoding was used to created new classes that were of type int8; a numerical data type. After solving that issue, we were presented with another - imbalanced data. As mentioned earlier, class 1 was nearly three times larger than class 2. The solution to this was down sampling the majority class with sklearn's resample tool. We down sampled to match the minority class exactly with 58188 values each. Once we analyzed and cleaned the data, it was then fed through three ML models; K- Nearest Neighbor, Decision Tree and Logistic Regression. Although the first two are ideal for this project, logistic regression made most sense because of its binary nature. Evaluation metrics used to test the accuracy of our models were Jaccard index, f-1 score and logloss for logistic regression. Choosing different k, max depth and hyperparameter C values helped to improve our accuracy to be the best possible. Conclusion Based on historical data from weather conditions pointing to certain classes, we can conclude that particular weather conditions have a somewhat impact on whether or not travel could result in property damage (class 1) or injury (class 2). Thank you for reading!