SAN FRANCISCO CRIME CLASSIFICATION
Sai Praneeth
Project Outline
1. Problem Identification
2. Data Understanding & Cleansing
3. Data Visualization
4. Prediction Methodologies
5. Validation & Scoring
Problem Identification
Current State
• The current crime index of
S.F. is 3 (safer than 3% of
the cities in the US).
• 67.67 annual crimes per
1,000 residents.
• There is no model to
predict crimes based on
location and time.
Future State
• A proper model
predicting crime based
on Date, Time and
Location.
• Help the corrections
department take
appropriate corrective
measures based on our
model.
• What are the different metrics that
influence response?
• Is the data enough to give us a clear
picture of crime committed?
• What kind of model best fits the
data?
Problem Statement
• Given time and location, you must predict the category of
crime that occurred.
• This competition's dataset provides nearly 12 years of crime
reports from across all of San Francisco's neighborhoods.
• It also encourages us to explore the dataset visually.
Data Overview
Timestamp
Category (different crimes)
Description
Resolution
Day of Week
PdDistrict
Address
Longitude & Latitude
Data Cleansing and Manipulation
Cleaning The Data
Check for Missing values
Check for Entry errors
Check for Duplicates
Check for outliers
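Three of the checks above (missing values, duplicates, outliers) could be sketched in plain Python as follows. This is only an illustration, not the workflow used in the project: the field names (`timestamp`, `category`, `lat`, `lon`), the sample rows, and the bounding box used as the outlier rule are all assumptions made for this example.

```python
# Made-up sample rows; the field names are assumptions, not the dataset's real schema.
rows = [
    {"timestamp": "2015-05-13 23:53:00", "category": "THEFT", "lat": 37.77, "lon": -122.42},
    {"timestamp": "2015-05-13 23:53:00", "category": "THEFT", "lat": 37.77, "lon": -122.42},  # duplicate
    {"timestamp": "2015-05-13 23:30:00", "category": "", "lat": 37.80, "lon": -122.42},  # missing value
    {"timestamp": "2015-05-13 22:10:00", "category": "ASSAULT", "lat": 90.0, "lon": -120.5},  # outlier
]

# Assumed rough bounding box for San Francisco, used here as a simple outlier rule.
LAT_RANGE = (37.6, 37.9)
LON_RANGE = (-122.6, -122.3)


def has_missing(row):
    """True if any field is None or an empty string."""
    return any(value is None or value == "" for value in row.values())


def is_outlier(row):
    """True if the coordinates fall outside the assumed bounding box."""
    return not (LAT_RANGE[0] <= row["lat"] <= LAT_RANGE[1]
                and LON_RANGE[0] <= row["lon"] <= LON_RANGE[1])


def clean(rows):
    """Drop rows with missing values, out-of-range coordinates, or exact duplicates."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if has_missing(row) or is_outlier(row) or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


cleaned = clean(rows)
print(len(rows), "rows before cleaning,", len(cleaned), "after")
```

Dropping flagged rows is the simplest policy; imputing or correcting them would be an equally valid choice that the slides do not rule out.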
Manipulating The Data
Timestamp
Address
Longitude
Latitude
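The slides do not say how the timestamp was manipulated. One common approach, sketched here under that caveat, is to split it into separate date and time parts that a tree model can split on. The `YYYY-MM-DD HH:MM:SS` format and the function name `split_timestamp` are assumptions for illustration.

```python
from datetime import datetime


def split_timestamp(ts):
    """Break a 'YYYY-MM-DD HH:MM:SS' string (assumed format) into separate model inputs."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
    }


features = split_timestamp("2015-05-13 23:53:00")
print(features)
```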
Data Visualization
Variables Selection & Data Partition
• Data Partition
▫ 60:40 (training : validation)
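A 60:40 partition can be sketched as a seeded shuffle followed by a slice. The function name `partition` and the seed are assumptions for this example; the slides do not say whether the split was purely random or stratified by crime category.

```python
import random


def partition(rows, train_fraction=0.6, seed=42):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


train, valid = partition(range(100))
print(len(train), len(valid))
```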
Project Diagram
1. Decision Tree (Two-way split)
• This decision tree uses a typical two-way split.
• In the properties panel, the method was changed to assessment and the
assessment measure was changed to decision, because the target being
classified is categorical.
1. Decision Tree (Two-way split)
• Most important variable for split -> Zip code
• Number of leaves in the pruned tree -> 6
• Validation misclassification -> 0.273474
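Assuming the usual definition, the validation misclassification figure is the share of validation rows whose predicted category differs from the actual one. A minimal sketch, with made-up labels:

```python
def misclassification_rate(actual, predicted):
    """Fraction of rows where the predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one row is required")
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Made-up labels for illustration only.
actual = ["THEFT", "ASSAULT", "THEFT", "FRAUD"]
predicted = ["THEFT", "THEFT", "THEFT", "FRAUD"]
rate = misclassification_rate(actual, predicted)
print(rate)
```

Accuracy is simply one minus this rate, so a lower value means a better model.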
2. Decision Tree (Three-way splits)
• This decision tree uses three-way splits.
• In the properties panel we changed the maximum branch to three and
kept the same assessment criteria.
• Validation misclassification fell from 0.273474 (two-way tree) to 0.134316.
2. Decision Tree (Three-way splits)
• Most important variable for split -> Zip code
• Number of leaves in the pruned tree -> 7
• Validation misclassification -> 0.134316
3. Gradient Boosting
• “Gradient boosting is a boosting approach that resamples the data set
several times to generate results that form a weighted average of the
resampled data set. Tree boosting creates a series of decision trees which
together form a single predictive model”
• Here the assessment measure is misclassification.
• The training proportion is 60%.
• Most important variable for split -> PdDistrict
• Validation misclassification -> 0.34221
4. Ensemble Model
• Combination of all four models.
• Validation misclassification of 0.141683
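The slides do not say how the models were combined. One common choice, assumed here purely for illustration, is to average each model's predicted class probabilities and pick the class with the highest average. The probabilities below are made up.

```python
def average_probabilities(model_outputs):
    """Average per-class probabilities across models; return (best label, averages)."""
    classes = model_outputs[0].keys()
    averaged = {
        label: sum(output[label] for output in model_outputs) / len(model_outputs)
        for label in classes
    }
    best = max(averaged, key=averaged.get)
    return best, averaged


# Made-up predictions from three hypothetical models for a single record.
outputs = [
    {"THEFT": 0.6, "ASSAULT": 0.4},
    {"THEFT": 0.3, "ASSAULT": 0.7},
    {"THEFT": 0.7, "ASSAULT": 0.3},
]
label, averaged = average_probabilities(outputs)
print(label, averaged)
```

Averaging lets a confident model outweigh two lukewarm ones, which a simple majority vote would not.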
Model Comparison
• The best model is the three-way decision tree, with a misclassification rate of 0.135668.
• The model improved drastically after latitude and longitude were converted to zip
codes.
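The comparison amounts to picking the model with the lowest validation misclassification. A small sketch, using the rates reported on each model's own slide; the dictionary layout and variable names are assumptions:

```python
# Validation misclassification rates as reported on the individual model slides.
validation_misclassification = {
    "Decision tree (two-way split)": 0.273474,
    "Decision tree (three-way split)": 0.134316,
    "Gradient boosting": 0.34221,
    "Ensemble": 0.141683,
}

# Lowest misclassification wins.
best_model = min(validation_misclassification, key=validation_misclassification.get)

# Full ranking, best first.
ranked = sorted(validation_misclassification.items(), key=lambda item: item[1])
for name, rate in ranked:
    print(f"{name}: {rate:.6f}")
```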
Improving the Model
• Demographics Data Inclusion
• Time Series Analysis
Questions
THANK YOU
