Machine Learning statistical
model using Transportation data
Introduction
► As the world grows rapidly, so do the number of people and the vehicles we use to move
from one place to another. Transportation plays a vital role in making it
easier for people to travel from one place to another, and every day more
vehicles are produced and bought by people around
the world, whether electric, hydrogen, petrol, diesel or solar powered. This project
focuses on road transportation. Road transport can be classified as
either transporting goods and materials or transporting people. The main
advantage of road transportation is that it allows for door-to-door delivery of
goods and materials while also being a very cost-effective mode of cartage,
loading, and unloading. Road transport is sometimes the only option for
transporting goods and people to and from rural areas that are not served by
rail, water, or air transport. Road transportation has numerous advantages over
other modes of transportation. Road transport requires significantly less
investment than other modes of transportation such as railways and air
transport. Roads are less expensive to build, operate, and maintain than
railways.
Dataset Description
► The dataset was collected from the Kaggle data repository (US Accidents, 2016
- 2021).
► The dataset is in comma-separated values (CSV) format. It consists of 2,845,342 entries,
indexed from 0 to 2845341, and has 47 columns.
► Since the dataset is very large and contains many columns, we
discuss only the important columns here.
1. Severity – Type (int). This column gives the severity of the accident and, importantly, is our
target class for prediction later in the project.
2. Start_time & End_time – Type (object). These show the start and end time of the accident at a
given place. Similarly, we have the latitude and longitude coordinates of the accident location; the dataset
covers accidents that took place in the US.
3. Distance – Length of the road extent affected by the accident.
4. Description – A description of the accident given by fellow drivers who were driving
alongside the accident victims.
5. City, State, County – The specific city, state and county where the accident took
place.
6. Along with these, we also have other columns such as weather, temperature, traffic signal, sunrise_sunset,
railway_line, etc.
Dataset Overview.
Dataset info
Missing Values
Descriptive Analysis
► Here we dive deeper into the dataset to learn more
about it.
► The functions below help us understand the data and
extract information that may help us fill the null values.
1. df.info() -> Information about the dataset, such as the type of each column and
the number of entries present in the dataset.
2. df.describe() -> Descriptive statistics for each column.
Note: the description differs for numerical and categorical columns; by
default we get the description of the numerical columns.
3. df.isnull().sum() -> Count of missing values for each column.
4. df.head() -> Displays the first 5 rows of the dataset; similarly, df.tail() displays the last
5.
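The four calls above can be tried on a small table. This is a minimal sketch that assumes pandas and NumPy are installed; the three columns and their values are invented stand-ins, not rows from the US Accidents data.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical stand-in for the accidents table (values are made up).
df = pd.DataFrame({
    "Severity": [2, 3, 2, 4, 2, 1],
    "Temperature(F)": [55.0, np.nan, 61.5, 70.2, np.nan, 48.9],
    "City": ["Miami", "Houston", None, "Miami", "Dallas", "Miami"],
})

df.info()                                   # prints dtypes and non-null counts
numeric_summary = df.describe()             # numerical columns only, by default
city_summary = df["City"].describe()        # categorical: count, unique, top, freq
missing_counts = df.isnull().sum()          # missing values per column
first_rows = df.head()                      # first 5 rows
last_rows = df.tail()                       # last 5 rows
```

Calling describe() on a single categorical column, as done for City here, is one way to see the categorical form of the summary.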
Top Cities with Highest Number of Accidents
Top States with Highest Number of Accidents
Missing Values Plots
Since temperature has null values in less than 10% of its entries and the values
appear to be normally distributed, it is reasonable to fill the missing entries with
the mean value. Visibility(mi), by contrast, is right skewed, so replacing its null values
with the median is more suitable.
Precipitation(in) and Wind_Speed(mph) have right-skewed distributions, so I used
the mode to fill the null values in these two columns. Humidity(%) has
a left-skewed distribution, but I still used the mode to fill its null values. Filling a
null value from the previous or next adjacent row may not be accurate, as
any two accidents are hardly related.
Also, most of the columns were irrelevant and had more than 60% missing
values, so I decided to drop those features.
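The filling strategy described above can be sketched as follows. This assumes pandas and NumPy; the values and the Mostly_Empty column are hypothetical, chosen only so that each rule has something to act on.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-table; "Mostly_Empty" stands in for a column that is >60% null.
df = pd.DataFrame({
    "Temperature(F)": [60.0, 62.0, np.nan, 58.0, 60.0],
    "Visibility(mi)": [10.0, 10.0, 1.0, np.nan, 9.0],
    "Wind_Speed(mph)": [5.0, 5.0, 12.0, np.nan, 0.0],
    "Mostly_Empty": [np.nan, np.nan, np.nan, np.nan, 1.0],
})

# Drop columns whose share of missing values exceeds 60%.
missing_share = df.isnull().mean()
df = df.drop(columns=missing_share[missing_share > 0.60].index)

# Roughly symmetric column: fill with the mean.
df["Temperature(F)"] = df["Temperature(F)"].fillna(df["Temperature(F)"].mean())
# Skewed column: fill with the median.
df["Visibility(mi)"] = df["Visibility(mi)"].fillna(df["Visibility(mi)"].median())
# Column filled with its most frequent value (mode).
df["Wind_Speed(mph)"] = df["Wind_Speed(mph)"].fillna(df["Wind_Speed(mph)"].mode()[0])
```

mode() returns a Series because several values can tie for most frequent; taking element [0] picks the smallest of them.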
Geographical heatmap of accidents in each state
Predictive Analysis
► Predictive analytics uses mathematical modeling tools to generate predictions
about an unknown fact, characteristic, or event. “It’s about taking the data that
you know exists and building a mathematical model from that data to help you
make predictions about somebody not yet in that data set,” Goulding explains.
► An analyst’s role in predictive analysis is to assemble and organize the data,
identify which type of mathematical model applies to the case at hand, and
then draw the necessary conclusions from the results. They are often also
tasked with communicating those conclusions to stakeholders effectively and
engagingly.
► “The tools we’re using for predictive analytics now have improved and
become much more sophisticated,” Goulding says, explaining that these
advanced models have allowed us to “handle massive amounts of data in ways
we couldn’t before.”
► Examples: Linear Regression, Logistic Regression, Decision Trees, Random
Forest, Support Vector Machines, etc.
Cluster Analysis
► Clustering is the process of dividing a population or set of data points into
groups so that data points in the same group are more similar to other data
points in the same group and dissimilar to data points in other groups. It is
essentially a collection of objects based on their similarity and dissimilarity.
► Cluster analysis itself is not one specific algorithm but the general task to be
solved. It can be achieved by various algorithms that differ significantly in
their understanding of what constitutes a cluster and how to efficiently find
them. Popular notions of clusters include groups with small distances between
cluster members, dense areas of the data space, intervals or particular
statistical distributions.
► Clustering can therefore be formulated as a multi-objective
optimization problem. The appropriate clustering algorithm and parameter
settings (including parameters such as the distance function to use, a density
threshold or the number of expected clusters) depend on the individual data
set and intended use of the results.
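As one concrete instance of the general task, the sketch below runs k-means, a distance-based algorithm, on six made-up 2-D points. It assumes scikit-learn and NumPy are available; k-means and the choice of two clusters are illustrative assumptions, since the slides do not say which clustering algorithm, if any, was applied.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six invented points forming two visibly separate groups.
points = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.1, 7.9], [7.8, 8.2],
])

# n_clusters is the "number of expected clusters" parameter mentioned above.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)   # cluster index assigned to each point
centers = kmeans.cluster_centers_     # one centroid per cluster
```

A density-based method would instead take a density threshold, which is why the appropriate parameters depend on the algorithm and the data set.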
Random Forest
► Random Forest is a supervised machine learning algorithm. This technique can be
used for both regression and classification tasks but generally performs better in
classification tasks. As the name suggests, the Random Forest technique considers
multiple decision trees before giving an output, so it is basically an ensemble of
decision trees.
► This technique is based on the belief that a greater number of trees would converge
to the right decision. For classification, it uses a voting system to decide the
class, whereas for regression it takes the mean of the outputs of all the
decision trees.
► It works well with large datasets with high dimensionality. The random forest
algorithm is an extension of the bagging method, as it utilizes both bagging and
feature randomness to create an uncorrelated forest of decision trees. Feature
randomness, also known as feature bagging or “the random subspace method”,
generates a random subset of features, which ensures low correlation among
decision trees.
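A minimal sketch of bagging plus feature randomness, assuming scikit-learn is installed. The data is synthetic (make_classification), and the parameter values are illustrative choices rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; not the accidents dataset.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = RandomForestClassifier(
    n_estimators=50,       # number of trees that vote
    bootstrap=True,        # bagging: each tree sees a bootstrap sample of rows
    max_features="sqrt",   # feature randomness: random feature subset per split
    random_state=0,
)
model.fit(X_train, y_train)

predictions = model.predict(X_test)        # class chosen by the ensemble of trees
accuracy = model.score(X_test, y_test)     # fraction of correct predictions
```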
Random Forest Results
K-Nearest Neighbors
► The k-nearest neighbor algorithm, also known as KNN or k-NN, is a
non-parametric, supervised learning classifier that uses proximity to classify or
predict the grouping of an individual data point. It can be used for both regression
and classification problems, but it is most commonly used as a classification
algorithm, based on the assumption that similar points can be found close together.
► A majority vote is used to assign a class label in a classification problem; that is, the
label that is most frequently represented around a given data point is used. While
technically this is referred to as "plurality voting," the term "majority vote" is more
commonly used in literature.
► The difference between these terms is that "majority voting" technically requires a
majority of more than 50%, which only works when there are only two options.
When there are multiple classes, say four categories, you don't always need 50% of
the vote to make a decision about a class; you could assign a class label with a vote
of more than 25%.
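The plurality vote can be illustrated with a small pure-Python sketch. The helper knn_predict, the points and the four labels are all hypothetical; it uses Euclidean distance and is not the scikit-learn implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Return the plurality label among the k nearest points and its vote share."""
    # Pair each training point's distance to the query with its label, nearest first.
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    label, votes = Counter(nearest_labels).most_common(1)[0]
    return label, votes / k

# Four classes: "A" wins with 2 of 5 votes, well short of a 50% majority.
points = [(0, 1), (1, 0), (0, 2), (2, 0), (2, 2), (10, 10)]
labels = ["A", "A", "B", "C", "D", "D"]
predicted, share = knn_predict(points, labels, query=(0, 0), k=5)
```

Here the five nearest neighbours carry the labels A, A, B, C and D, so "A" is assigned with a 40% share of the vote.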
KNeighbors Classifier
Variable Selection Method
► Feature or Variable selection methods are used to select specific features from our dataset, which are useful and important
for our model to learn and predict. As a result, feature selection is an important step in the development of a machine
learning model. Its goal is to identify the best set of features for developing a machine learning model.
► Some popular techniques of feature selection in machine learning are:
• Filter methods
• Wrapper methods
• Embedded methods
► Filter methods:
• These methods are generally used during the pre-processing step. They select features from the dataset
independently of any machine learning algorithm.
• Techniques such as: Information gain, Chi-Square, Variance_Threshold, Mean_Absolute_Difference, etc.
► Wrapper methods:
• Wrapper methods, also referred to as greedy algorithms, train the algorithm on a subset of features in an iterative
manner. Based on the conclusions drawn from the previously trained model, features are added or removed.
• Techniques such as: Forward selection, Backward Elimination, Bi-Directional Elimination, etc.
► Embedded methods:
• In embedded methods, the feature selection algorithm is blended in as part of the learning algorithm, which thus has its own
built-in feature selection methods. Embedded methods counter the drawbacks of filter and wrapper methods and merge
their advantages.
• Techniques such as: Regularization, tree-based methods
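As an illustration of a filter method, the sketch below applies a variance threshold, which needs no model at all. It assumes scikit-learn and NumPy; the matrix is invented, with a deliberately constant first column.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Invented feature matrix: the first column never changes, so it carries no signal.
X = np.array([
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 2.5],
    [1.0, 0.0, 4.1],
    [1.0, 1.0, 3.3],
])

# threshold=0.0 removes only zero-variance (constant) features.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
kept = selector.get_support()   # boolean mask: which input columns were kept
```

No target variable or classifier is involved, which is what distinguishes a filter method from wrapper and embedded methods.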
Variable selection using SequentialFeatureSelection
► Sequential feature selection algorithms (SFAs) are a type of greedy search algorithm
used to reduce a d-dimensional feature space to a k-dimensional feature subspace,
where k < d. Feature selection algorithms are designed to automatically select a subset of
features that are most relevant to the problem.
► A wrapper approach, such as sequential feature selection, is especially useful when
embedded feature selection, such as a regularization penalty like LASSO, is not
applicable.
► SFAs, in a nutshell, remove or add features one at a time based on classifier
performance until a feature subset of the desired size k is reached.
► There are basically 4 types of SFAs:
1. Sequential Forward Selection (SFS)
2. Sequential Backward Selection (SBS)
3. Sequential Forward Floating Selection (SFFS)
4. Sequential Backward Floating Selection (SBFS)
► The one we have employed in our project is Sequential Forward Selection from the
mlxtend feature selection library, used to select the best features for the model.
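The forward variant can be written out as a short greedy loop. This is a hand-written sketch using scikit-learn for the classifier and cross-validation, not the mlxtend implementation used in the project; forward_select, the synthetic data and k=3 are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def forward_select(X, y, k, cv=3):
    """Greedily add one feature at a time until k features are selected."""
    selected, history = [], {}
    remaining = list(range(X.shape[1]))
    while len(selected) < k:
        scores = {}
        for j in remaining:
            # Score the current subset extended by candidate feature j.
            clf = DecisionTreeClassifier(random_state=0)
            scores[j] = cross_val_score(clf, X[:, selected + [j]], y, cv=cv).mean()
        best = max(scores, key=scores.get)   # candidate with the best mean accuracy
        selected.append(best)
        remaining.remove(best)
        history[len(selected)] = (tuple(selected), scores[best])
    return selected, history

# Synthetic stand-in data; not the accidents dataset.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
selected, history = forward_select(X, y, k=3)
```

history maps each subset size to the chosen feature indices and their mean cross-validated accuracy, which is the kind of per-step record a sequential selector reports.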
Testing the Model on Variables selected by
algorithm.
Decision Tree
► A decision tree is a decision support tool that uses a tree-like model of decisions and their possible
consequences, including chance event outcomes, resource costs, and utility. It is one way to display
an algorithm that only contains conditional control statements. Decision trees are commonly used
in operations research, specifically in decision analysis, to help identify a strategy most likely to reach
a goal but are also a popular tool in machine learning. A decision tree is a flowchart-like structure in
which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or
tails), each branch represents the outcome of the test, and each leaf node represents a class label
(decision taken after computing all attributes). The paths from root to leaf represent classification
rules. In decision analysis, a decision tree and the closely related influence diagram are used as a
visual and analytical decision support tool, where the expected values (or expected utility) of
competing alternatives are calculated.
► A decision tree consists of three types of nodes:
► Decision nodes – typically represented by squares
► Chance nodes – typically represented by circles
► End nodes – typically represented by triangles
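In the machine learning setting, the root-to-leaf rules can be printed from a fitted tree. A minimal sketch assuming scikit-learn; the two binary features is_night and has_signal and the four rows are hypothetical, with the label depending only on the first feature.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: the label equals the first feature, the second is noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed path from the root to a leaf is one classification rule.
rules = export_text(tree, feature_names=["is_night", "has_signal"])
print(rules)

predicted = tree.predict([[1, 1], [0, 1]])
```

Because only the first feature separates the classes, the tree needs a single test on it and the second feature does not appear in the rules.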
Using a Decision Tree as the classifier, we fitted a sequential feature selector
model to extract the important features from the dataset.
Sfal.subsets_ -> Gives the average accuracy obtained by training the model
with the number of features selected at each step.
Plot of the important features extracted by the Sequential Feature
Selector: the X axis represents the number of features and the Y axis represents
the prediction accuracy obtained by selecting those specific features.
The results are converted into a dataframe in which the 1st column represents the number
of features and the 2nd column represents the accuracy obtained from selecting those
features.
Conclusion
► In this project, we have done a lot of preprocessing and exploratory data
analysis, since the main objective was to get insights from the road
transportation data and perform statistical analysis.
► Data preprocessing was performed by filling in the null values and
dropping irrelevant columns based on how important they are for building
an efficient model, keeping computational cost in mind.
► Predictive models such as the Decision Tree, Random Forest and K-Nearest
Neighbors classification algorithms have been applied to predict the target
variable, i.e. the severity of the accident, using the other independent features.
► A variable selection method, the Sequential Feature Selector, has been
applied to the cleaned data to extract the most important features, and the
Decision Tree model was trained and tested on those features.
About TechieYan Technologies
TechieYan Technologies offers a special platform where you can study all the most
cutting-edge technologies directly from industry professionals and get
certifications. TechieYan collaborates closely with engineering schools,
engineering students, academic institutions, the Indian Army, and businesses.
Project trainings, engineering workshops, internships, and laboratory setup are all
things we provide. We work on projects related to robotics, python, deep learning,
artificial intelligence, IoT, embedded systems, matlab, hfss pcb design, vlsi, and
ieee current projects.
Address: 16-11-16/V/24, Sri Ram Sadan, Moosarambagh, Hyderabad 500036
Phone no: +91 7075575787
Website: https://techieyantechnologies.com

How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
Celine George
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
Kavitha Krishnan
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 

Recently uploaded (20)

The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
The basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptxThe basics of sentences session 5pptx.pptx
The basics of sentences session 5pptx.pptx
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
How to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP ModuleHow to Add Chatter in the odoo 17 ERP Module
How to Add Chatter in the odoo 17 ERP Module
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Assessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptxAssessment and Planning in Educational technology.pptx
Assessment and Planning in Educational technology.pptx
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 

Machine Learning statistical model using Transportation data

  • 1. Machine Learning statistical model using Transportation data
  • 2. Introduction ► As the world's population and the number of vehicles we use keep growing, transportation plays a vital role in making it easier for people to travel from one place to another. Every day more vehicles are produced and bought around the world, whether electric, hydrogen, petrol, diesel, or solar powered. This project focuses on road transportation, which can be classified as either transporting goods and materials or transporting people. The main advantage of road transportation is that it allows for door-to-door delivery of goods and materials while also being a very cost-effective mode of cartage, loading, and unloading. Road transport is sometimes the only option for transporting goods and people to and from rural areas that are not served by rail, water, or air transport. Road transport also requires significantly less investment than other modes of transportation such as railways and air transport: roads are less expensive to build, operate, and maintain than railways.
  • 3. Dataset Description ► The dataset comes from the Kaggle data repository: US Accidents (2016 - 2021). ► The dataset is in comma-separated values (CSV) format. It consists of 2845342 entries, indexed from 0 to 2845341, and has 47 columns. ► Since the dataset is very large and contains many columns, only the important columns are discussed here. 1. Severity – Type(int). The severity of the accident; this is the target class for the predictions made later in the project. 2. Start_time & End_time – Type(object). The start and end time of the accident. Similarly, latitude and longitude coordinates give the location of each accident; all accidents in the dataset took place in the US. 3. Distance – Length of the road extent affected by the accident. 4. Description – A description of the accident given by fellow drivers who were driving alongside the accident victims. 5. City, State, County – The city, state, and county where the accident took place. 6. Along with these, there are other columns such as weather, temperature, traffic signal, sunrise_sunset, railway_line, etc.
  • 7. Descriptive Analysis ► Here we dive deeper into the dataset to learn more about it. ► The functions below help us understand the data and extract information that can guide how the null values are filled. 1. df.info() -> information about the dataset, such as the type of each column and the number of entries present. 2. df.describe() -> descriptive statistics for each column. Note: the summary differs for numerical and categorical columns, and by default only the numerical columns are described. 3. df.isnull().sum() -> count of missing values for each column. 4. df.head() -> displays the first 5 rows of the dataset; similarly, df.tail() displays the last 5.
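The four calls listed above can be tried on a small frame before running them on the full file. The sketch below is only illustrative: the six rows are a hypothetical stand-in for the accidents CSV, not real records, and the column names are assumed to follow the dataset's naming.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the accidents CSV (not real data).
csv_text = """Severity,City,Temperature(F),Visibility(mi)
2,Miami,78.0,10.0
3,Houston,,9.0
2,Miami,81.5,
4,Dallas,65.0,10.0
2,Houston,70.0,7.0
1,Austin,72.0,10.0
"""
df = pd.read_csv(io.StringIO(csv_text))

df.info()                                   # column dtypes and non-null counts
numeric_summary = df.describe()             # numeric columns only by default
full_summary = df.describe(include="all")   # also summarises text columns
null_counts = df.isnull().sum()             # missing values per column
first_rows = df.head()                      # first 5 rows
last_rows = df.tail()                       # last 5 rows

print(null_counts)
```

Passing `include="all"` is one way to get the categorical summary (count, unique, top, freq) alongside the numeric one.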
  • 8. Top Cities with Highest Number of Accidents
  • 9. Top States with Highest Number of Accidents
  • 13. Temperature has less than 10% null values, and its values appear to be normally distributed, so filling the missing entries with the mean is a reasonable choice. Visibility(mi) is right skewed, so replacing its null values with the median is more suitable. Precipitation(in) and Wind_Speed(mph) are also right skewed, and for these two columns the mode was used to fill the null values. Humidity(%) is left skewed, but I still used the mode to fill its null values. Filling a null value from the previous or next adjacent row may not be accurate, since consecutive accidents are rarely related. Also, most of the columns were irrelevant and had more than 60% missing values, so I decided to drop those features.
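The imputation choices described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions: the values are made up, `Mostly_Missing` is a hypothetical column standing in for the features that were dropped, and the 60% cutoff is taken from the slide.

```python
import pandas as pd

# Hypothetical toy frame; None marks a missing value.
df = pd.DataFrame({
    "Temperature(F)": [60.0, 70.0, None, 80.0],
    "Visibility(mi)": [10.0, None, 10.0, 1.0],
    "Wind_Speed(mph)": [5.0, 5.0, None, 20.0],
    "Mostly_Missing": [None, None, None, 1.0],
})

# Drop columns in which more than 60% of the values are missing.
missing_share = df.isnull().mean()
df = df.drop(columns=missing_share[missing_share > 0.60].index)

# Roughly symmetric column: fill with the mean.
df["Temperature(F)"] = df["Temperature(F)"].fillna(df["Temperature(F)"].mean())
# Skewed column: fill with the median.
df["Visibility(mi)"] = df["Visibility(mi)"].fillna(df["Visibility(mi)"].median())
# Mode fill: mode() returns a Series, so take its first entry.
df["Wind_Speed(mph)"] = df["Wind_Speed(mph)"].fillna(df["Wind_Speed(mph)"].mode()[0])

print(df)
```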
  • 14. Geographical heatmap of accidents in each state
  • 15. Predictive Analysis ► Predictive analytics uses mathematical modeling tools to generate predictions about an unknown fact, characteristic, or event. “It’s about taking the data that you know exists and building a mathematical model from that data to help you make predictions about somebody not yet in that data set,” Goulding explains. ► An analyst’s role in predictive analysis is to assemble and organize the data, identify which type of mathematical model applies to the case at hand, and then draw the necessary conclusions from the results. Analysts are often also tasked with communicating those conclusions to stakeholders effectively and engagingly. ► “The tools we’re using for predictive analytics now have improved and become much more sophisticated,” Goulding says, explaining that these advanced models have allowed us to “handle massive amounts of data in ways we couldn’t before.” ► Examples: Linear Regression, Logistic Regression, Decision Trees, Random Forest, Support Vector Machines, etc.
  • 16. Cluster Analysis ► Clustering is the process of dividing a population or set of data points into groups so that data points in the same group are more similar to each other than to data points in other groups. It is essentially a grouping of objects based on their similarity and dissimilarity. ► Cluster analysis itself is not one specific algorithm but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals, or particular statistical distributions. ► Clustering can therefore be formulated as a multi-objective optimization problem. The appropriate clustering algorithm and parameter settings (including parameters such as the distance function to use, a density threshold, or the number of expected clusters) depend on the individual data set and the intended use of the results.
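As one concrete instance of the "small distances between cluster members" notion, the sketch below runs k-means from scikit-learn on six made-up 2-D points. The slides do not say which clustering algorithm was used, so k-means, the coordinates, and the choice of two clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D points forming two well-separated groups.
points = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [5.0, 5.0], [5.2, 4.9], [4.9, 5.3],
])

# The number of expected clusters is a parameter the analyst has to choose.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

print(labels)                  # cluster index assigned to each point
print(model.cluster_centers_)  # one centroid per cluster
```

Which group receives index 0 and which receives index 1 is arbitrary; only the grouping itself is meaningful.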
  • 18. Random Forest ► Random Forest is a supervised machine learning algorithm. This technique can be used for both regression and classification tasks but generally performs better in classification tasks. As the name suggests, the Random Forest technique considers multiple decision trees before giving an output, so it is basically an ensemble of decision trees. ► This technique is based on the belief that a greater number of trees would converge to the right decision. For classification, it uses a voting system to decide the class, whereas in regression it takes the mean of the outputs of all the decision trees. ► It works well with large datasets with high dimensionality. The random forest algorithm is an extension of the bagging method, as it utilizes both bagging and feature randomness to create an uncorrelated forest of decision trees. Feature randomness, also known as feature bagging or “the random subspace method,” generates a random subset of features, which ensures low correlation among decision trees.
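A minimal sketch of the two ingredients named above, bagging and feature randomness, using scikit-learn's `RandomForestClassifier`. The data is synthetic (`make_classification`), and the hyperparameters are illustrative assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the accident features.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    bootstrap=True,       # bagging: each tree sees a bootstrap sample of rows
    max_features="sqrt",  # feature randomness: random feature subset per split
    random_state=0,
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)

print(f"trees: {len(forest.estimators_)}, test accuracy: {accuracy:.2f}")
```

Note that scikit-learn combines the trees by averaging their predicted class probabilities rather than counting one hard vote per tree; the two usually agree, but they are not identical procedures.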
  • 20. K-Nearest Neighbors ► The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier that uses proximity to classify or predict the grouping of an individual data point. It can be used for both regression and classification problems, but it is most commonly used as a classification algorithm, based on the assumption that similar points can be found close together. ► In a classification problem, a class label is assigned by majority vote; that is, the label that is most frequently represented around a given data point is used. While technically this is referred to as "plurality voting," the term "majority vote" is more commonly used in the literature. ► The difference between these terms is that "majority voting" technically requires a majority of more than 50%, which only works when there are only two options. When there are multiple classes, say four categories, you don't always need 50% of the vote to make a decision about a class; you could assign a class label with a vote of more than 25%.
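The plurality-vote rule can be shown with a few lines of plain Python. This is a sketch of the voting idea only, not the classifier used in the project; the helper `knn_predict`, the points, and the four labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the plurality label among the k nearest points and its vote share."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    label, votes = Counter(nearest_labels).most_common(1)[0]
    return label, votes / k


# Hypothetical training points with four possible class labels (1 to 4).
train_points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (5, 5)]
train_labels = [2, 2, 3, 4, 1, 4]

label, share = knn_predict(train_points, train_labels, query=(0.4, 0.4), k=5)
print(label, share)
```

Here the five nearest neighbours carry the labels 2, 2, 3, 4 and 1, so class 2 wins with 40% of the votes: a plurality, but not a majority.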
  • 22. Variable Selection Method ► Feature (or variable) selection methods are used to select the specific features from our dataset that are useful and important for the model to learn and predict. As a result, feature selection is an important step in the development of a machine learning model. Its goal is to identify the best set of features for developing the model. ► Some popular techniques of feature selection in machine learning are: • Filter methods • Wrapper methods • Embedded methods ► Filter methods: • These methods are generally used during the pre-processing step. They select features from the dataset independently of any machine learning algorithm. • Techniques such as: Information gain, Chi-Square, Variance_Threshold, Mean_Absolute_Difference, etc. ► Wrapper methods: • Wrapper methods, also referred to as greedy algorithms, train the algorithm iteratively on a subset of features. Based on the conclusions drawn from the previous round of training, features are added or removed. • Techniques such as: Forward selection, Backward Elimination, Bi-Directional Elimination, etc. ► Embedded methods: • In embedded methods, the feature selection algorithm is blended into the learning algorithm, which thus has its own built-in feature selection. Embedded methods counter the drawbacks of filter and wrapper methods and merge their advantages. • Techniques such as: Regularization, tree-based methods.
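To make the filter category concrete, the sketch below applies scikit-learn's `VarianceThreshold`, one of the techniques listed above, to a tiny made-up matrix. No model is trained at any point, which is what distinguishes a filter method from a wrapper or embedded one. The matrix and the threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical feature matrix: the first column is constant and carries no signal.
X = np.array([
    [1.0, 0.0, 3.1],
    [1.0, 1.0, 2.9],
    [1.0, 0.0, 3.0],
    [1.0, 1.0, 3.2],
])

# Keep only the features whose variance exceeds the threshold.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
kept = selector.get_support()  # boolean mask over the original columns

print(kept)
print(X_reduced.shape)
```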
  • 23. Variable selection using SequentialFeatureSelection ► Sequential feature selection algorithms (SFAs) are a type of greedy search algorithm used to reduce a d-dimensional feature space to a k-dimensional feature subspace, where k < d. Feature selection algorithms are designed to automatically select a subset of features that are most relevant to the problem. ► A wrapper approach, such as sequential feature selection, is especially useful when embedded feature selection, such as a regularization penalty like LASSO, is not applicable. ► SFAs, in a nutshell, remove or add features one at a time based on classifier performance until a feature subset of the desired size k is reached. ► There are basically 4 types of SFAs: 1. Sequential Forward Selection (SFS) 2. Sequential Backward Selection (SBS) 3. Sequential Forward Floating Selection (SFFS) 4. Sequential Backward Floating Selection (SBFS) ► The one employed in our project is Sequential Forward Selection.
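The greedy loop behind Sequential Forward Selection can be written out by hand. The sketch below is not the mlxtend implementation used in the project; it is a hand-rolled illustration built on scikit-learn's `cross_val_score`, with synthetic data and a hypothetical helper named `sequential_forward_selection`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def sequential_forward_selection(estimator, X, y, k, cv=5):
    """Greedily add the feature that most improves mean CV accuracy until k are chosen."""
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    while len(selected) < k:
        scores = {}
        for feature in remaining:
            candidate = selected + [feature]
            scores[feature] = cross_val_score(
                estimator, X[:, candidate], y, cv=cv, scoring="accuracy"
            ).mean()
        best = max(scores, key=scores.get)  # best single addition in this round
        selected.append(best)
        remaining.remove(best)
        history.append((tuple(selected), scores[best]))
    return selected, history


# Synthetic data: 8 features, of which 3 are informative.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=0, random_state=0
)
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
selected, history = sequential_forward_selection(tree, X, y, k=3)

for subset, score in history:
    print(f"{len(subset)} feature(s) {subset}: mean CV accuracy {score:.3f}")
```

Each round fits one model per remaining feature, so the cost grows quickly with the number of columns; that is the usual trade-off of wrapper methods compared with filter methods.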
  • 24. Mlxtend feature selection library for selecting the best features for the model.
  • 25. Testing the model on the variables selected by the algorithm. Decision Tree ► A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal, but they are also a popular tool in machine learning. A decision tree is a flowchart-like structure in which each internal node represents a "test" on an attribute (e.g. whether a coin flip comes up heads or tails), each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). The paths from root to leaf represent classification rules. In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated. ► A decision tree consists of three types of nodes: ► Decision nodes – typically represented by squares ► Chance nodes – typically represented by circles ► End nodes – typically represented by triangles
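The "paths from root to leaf are classification rules" idea can be seen by fitting a small tree and printing it. The sketch below uses scikit-learn's `DecisionTreeClassifier` and `export_text`; the six rows and the feature names `visibility_mi` and `wind_speed_mph` are hypothetical and chosen only so that the rule is easy to read.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical rows: [visibility in miles, wind speed in mph] -> severity class.
X = [[10, 5], [9, 7], [10, 20], [1, 5], [2, 10], [1, 25]]
y = [2, 2, 2, 4, 4, 4]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each root-to-leaf path in the printout is one classification rule.
rules = export_text(tree, feature_names=["visibility_mi", "wind_speed_mph"])
print(rules)

prediction = tree.predict([[8, 12]])
print(prediction)
```

In this toy data low visibility separates the two classes perfectly, so the fitted tree needs only a single test on `visibility_mi`.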
  • 26. Using Decision Tree as a classifier, we have fitted a sequential feature selector model to extract the important features from the dataset.
  • 27. Sfal.subsets_ -> Shows the average accuracy obtained by training the model with the number of features selected at each step.
  • 29. Plot of the important features extracted by the Sequential Feature Selector. The X axis represents the number of features and the Y axis represents the prediction accuracy obtained by selecting those specific features.
  • 30. The results are converted into a dataframe in which the 1st column is the number of features and the 2nd column is the accuracy obtained by selecting those features.
  • 31. Conclusion ► In this project, we did a lot of preprocessing and exploratory data analysis, since the main objective was to get insights from the road transportation data and do statistical analysis. ► Data preprocessing was performed by filling in the null values and dropping irrelevant columns, based on how important they are for building an efficient model while keeping computational cost in mind. ► Predictive models such as the Decision Tree, Random Forest, and K-Nearest Neighbors classification algorithms were applied to predict the target variable, i.e. the Severity of the accident, from the other independent features. ► A variable selection method, the Sequential Feature Selector, was applied to the cleaned data to extract the most important features, and the Decision Tree model was trained and tested on those features.
  • 32. About TechieYan Technologies ► TechieYan Technologies offers a special platform where you can study the most cutting-edge technologies directly from industry professionals and get certifications. TechieYan collaborates closely with engineering schools, engineering students, academic institutions, the Indian Army, and businesses. We provide project trainings, engineering workshops, internships, and laboratory setup. We work on projects related to robotics, Python, deep learning, artificial intelligence, IoT, embedded systems, MATLAB, HFSS, PCB design, VLSI, and current IEEE projects. Address: 16-11-16/V/24, Sri Ram Sadan, Moosarambagh, Hyderabad 500036. Phone no: +91 7075575787. Website: https://techieyantechnologies.com