SlideShare a Scribd company logo
A Model to predict Crime Rate based on Education and Economic metrics.
Reuben Hilliard
Kennesaw State University
Purpose Methods
Why do yearly crime rates fluctuate up and down in the US? The exact
reasons are difficult to quantify, but a number of Socio-economic
variables have been identified as the most likely causes. The purpose of
this analysis was to build a model to predict crime rate. It could be
utilized to assist state & local governments in combatting and reducing
crime, using an alternative solution outside the standard law enforcement
or incarceration methodology.
Data was collected from various sources, including The National Crime
Victimization Survey Series,1 Bureau of Labor and Statistics,2 US
Census Bureau,3,4 and National Education Association5. This raw data
was extracted, cleaned and combined to build a master dataset. All the
data collected was from the 2008 Calendar Year or 2007/2008 School
Year. The dataset includes 50 observations, representing the 50 US
States, of which 15 variables were analyzed. Total Population represents
the total number of residents per State. The Population Size Level
divides the States into 3 categories, ‘SPARSE’ (under 2 Mil.),
‘MEDIUM’ (2-10 Mil.) and ‘DENSE’ (over 10 Mil. inhabitants). The
Violent Crime Incidents, which include murder, rape, robbery and assault
and The Property Crime Incidents, which include burglary, larceny and
vehicle theft have corresponding rates of per 100,000 residents, divided
into ‘LO RISK’ (under 2500 incidents) or ‘HI RISK’ (over 2500
incidents) and finally a Combined Crime Rate (Total number of
Incidents). Then the Unemployment Rate per state for 2008 and Median
Household Income represent the economic indicators. Unemployment
Rate Level was created to categorize Unemployment rate for the
following levels, ‘ROBUST’ (Rate less 5%), ‘FUNCTIONAL’ (5-6%)
and ‘INADEQUATE’ (over 6%). Lastly, there is the percent of adults
over the age of 25 who have graduated from High School and the dollar
amount spent per capita on Education from State and Local
Governments, that both represent education indicators.
Analysis of this dataset was performed on a number of variables to
assess what their ultimate effect on the ‘Combined Crime Rate’ per State
would be. All the data gathered was pre-recession, 2008. The goal was to
generate models to predict a crime rate in a given State. The size of the
State’s population was found to have no effect on the Crime Rate. Also,
as Crime Rate was a normalized variable, State Population Density also
did not influence this factor. Ultimately, after isolating the two variables
that had the greatest direct influence on the Combined Crime Rate,
namely ‘Unemployment Rate’ and ‘Percentage of High School
Graduates over 25’, further analysis led to multiple linear regression
analysis. All programming and output was performed using SAS 9.3.
The following is a portion of the analysis.
The histogram in Figure 1.1 and the boxplot in Figure 1.2, demonstrate
that the distribution for Spending on Education per capita is unimodal
and slightly skewed to the right. The median dollar amount spent on
Education is $2769.00, and the Interquartile Range (IQR) is $387.00.
There are five outliers.
The histogram in Figure 2.1 and the boxplot in Figure 2.2, demonstrate
that the distribution for Violent Crime Rate is unimodal and skewed to
the right. The median rate is 348.20 incidents per 100,000 residents and
the IQR is 245.10. There is only a single outlier.
In regards to economic health and specifically, the unemployment rate,
the ordered bar chart in Figure 3 indicates that 22 States have a
ROBUST level, 12 a FUNCTIONAL level and 17 an INADEQUATE
level.
Figure 4 shows the Stacked Bar Chart for Unemployment Rate Level
(URL) by Violent Risk Level (VRL). As the Bar Chart moves from left
to right the HI RISK VRL indicator (shown in red) increases, only
making up 31.8% in the ROBUST category, but 70.59% in the
INADEQUATE category. As the economy declines, unemployment rates
increased and in turn, violent crime rates increased.
Introduction
Analysis
A Model to predict Crime Rate based on Education and Economic metrics.
Reuben Hilliard
Kennesaw State University
Conclusion
References
1. The National Crime Victimization Survey, 2008. Web retrieved 4/12/2014.
https://explore.data.gov/Law-Enforcement-Courts-and-Prisons/National-
Crime-Victimization-Survey-2008-Record-Ty/rfme-ynch
2. Unemployment Rate, 2008. Web retrieved 4/12/2014.
http://www.bls.gov/lau/lastrk08.htm
3. Median Family Income, Wage Census 2008. Web retrieved 4/12/2014.
http://www.census.gov/hhes/www/income/data/statemedian/
4. State Education Levels, 2008. Web retrieved 4/12/2014.
http://www.census.gov/compendia/statab/2011/tables/11s0229.pdf
5. Per Capita Spending on Education, 2007/2008. Web retrieved 4/12/2014.
http://www.nea.org/assets/docs/HE/NEA_Rankings_and_Estimates010711.pdf
These results in this study are important because it gives states the
opportunity to approach crime from a different angle. Using this
model, law-makers can predict their state’s crime rate, but more
importantly focus their attention on decreasing the unemployment
rate and increasing the number of high school graduates.
Ultimately, it would cost states more to house and monitor
criminals in the long term, than to invest in worker retraining
programs for adults or technical preparation programs at local high
schools, giving teenagers the skills they would need to be
successful and productive citizens in our society. Additionally,
altering the variables up or down can also predict where the Crime
Rate would be in the future, allowing for a measure of course
correction on behalf of officials. Further study would include using
the data from the 2012 Census and other sources to refine these
Models for future use. It would be interesting to compare the
Combined Crime Rates for the 2008-2012 periods. During this
time, the US went through the Great Recession, when
unemployment rates skyrocketed and funding to schools were
slashed.Multiple Linear Regression Equation: CCRate = 10907 - 94.32 (HS) + 15329 (URate)
Figure 5:
Regression Model for Crime Rate vs. HS Education
Figure 6:
Regression Model for Crime Rate vs. Unemployment Rate
Table I: Performance of the bottom 10% of States
Results
Figure 5 shows the Regression Model for Combine Crime Rate
(CCRate) vs. Percent of Population over 25 with High School
Education (HS), with the equation of CCRate = 13826 – 118.6
(HS). HS is significant at a 0.05 level, as p = 0.0008. Also, High
School Education Percentage and Crime Rate are negatively
correlated.
Figure 6 shows the Regression Model for Combine Crime Rate
(CCRate) vs. Unemployment Rate (URate), with the equation
of CCRate = 2128.5 + 26867 (URate). URate is significant at a
0.05 level, as p = 0.0073. Also, Unemployment Rate and Crime
Rate are positively correlated.
The correlation coefficient between the two variables, HS and
URate was not strong enough to warrant any multicollinearity
concerns.
Using the indicators above, the dataset was formatted in
decreasing order within SAS. This is displayed in Table I,
‘Performance of the Bottom 10% of States’. Michigan, Rhode
Island, California, Nevada and South Carolina all had the
highest Unemployment Rates and the lowest Percentage of
High School Graduates. Corresponding Risk Levels were High,
indicating that overall, these States also had a combined large
number of Violent and Property Crime incidents per 100,000
inhabitants.
A Model to predict Crime Rate based on Education and Economic metrics.
Reuben Hilliard
Kennesaw State University
Base SAS 9.3 Code
*Contingency table for Unemployment Rate Level by Violent Risk Level;
title "Table 4: Contingency Table for Unemployment Rate Level (URL) by Violent Risk Level (VRL)";
proc sort data=pop1.project5;
by vrl;run;
proc freq data=pop1.project5;
tables url*vrl;run;
*100% Stacked Bar Chart for Unemployment Rate Level by Violent Risk Level";
proc freq data=pop1.project5;
tables url*vrl;
ods output crosstabfreqs=ct;title;
data ct2;
set ct(drop = _type_ _table_ missing ColPercent Percent Frequency);
if not missing (rowpercent);run;title;
proc sgplot data = ct2;
title1 "Figure 4: 100% Stacked Bar Chart for Unemployment Rate Level by Violent Risk
Level";
vbar url/ group = vrl response=rowpercent;
yaxis label= "Percent within Violent Risk Level";run;
*Property Crime Rate and Population Size Level;
proc sort data=pop1.project5;by popsl;
run;
*Regression Models for Combined Crime Rate vs. High School Education(%)and Unemployment Rate;
%macro Regmodel(var1=,var2=);
proc reg data=pop1.project5 plots =(CooksD);
model &var1=&var2/r;
run;
%mend Regmodel;
%Regmodel(var1=ccrate,var2=hs);
%Regmodel(var1=ccrate,var2=urate);
*Sorting performance of bottom 10% of States;
proc sort data=pop1.project5 out=sorted;
by descending urate descending hs;run;
proc print data=sorted (obs=5);
title "Table 5: Performance of Bottom 10% of States by metrics: Unemployment Rate and
Percent High School Education";
var urate hs state ccrate;
ods rtf close;
*Run all formats;
proc format;
VALUE URLformat
1= "ROBUST"
2 = "FUNCTIONAL"
3 = "INADEQUATE"; run; quit;
proc format;
VALUE PopSLformat
1= "SPARSE"
2 = "MEDIUM"
3 = "DENSE"; run; quit;
proc format;
VALUE VRLformat
1= "LO RISK"
2 = "HI RISK"; run; quit;
proc format;
VALUE PRLformat
1= "LO RISK"
2 = "HI RISK"; run; quit;
*Creating new variable: Unemployment Rate Level (URL);
data pop1.project2;
set pop1.project1a;
if urate<0.050 then URL=1;
else if urate >=0.050 & urate <0.060 then URL=2;
else if urate >= 0.060 then URL=3;run;
* Creating a new variable: Population Size Level;
data pop1.project3;
set pop1.project2;
if TPop<2000000 then PopSL=1;
else if TPop >=2000000 & TPop <10000000 then PopSL=2;
else if TPop >= 0.060 then PopSL=3;run;
* Creating a new variable: Violent Rate Level;
data pop1.project4;set pop1.project3;
if VCRate <400 then VRL=1;
else if VCRate >= 400 then VRL=2;run;
* Creating a new variable: Property Rate Level;
data pop1.project5;set pop1.project4;
if PCRate <2500 then PRL=1;
else if PCRate >=2500 then PRL=2;run;
*Macro Program to run Descriptive Stats, Histogram and Boxplot for ANY Variable;
%macro Analyze(I=,BLANK1=);
proc SQL noprint;
select label into :LBlank1
from dictionary.columns
where UPCASE(MEMNAME) = UPCASE("project5") &
UPCASE(NAME) = UPCASE("&BLANK1");
quit;
*Descriptive Statistics;
proc means data=pop1.project5 mean median stddev qrange min max maxdec=2;
title "Table &I:Descrptive Statistics for &LBlank1";
var &Blank1;
*Histogram;
proc sgplot data=pop1.project5;
title "Figure &I..1: Histogram for &LBlank1";
histogram &Blank1/ fillattrs=(color=gold) transparency=0.5;
*Boxplot;
proc sgplot data=pop1.project5;
title "Figure &I..2: Boxplot for &LBlank1";
vbox &Blank1;
run;
%mend Analyze;
%Analyze(I=1,BLANK1=Edu)
%Analyze(I=2,BLANK1=CCRate)
*Create a frequency table for URL;
ods rtf;
proc freq data=pop1.project5;
title 'Table 3: Frequency Table of Unemployment Rate Level for The 50 States';
tables URL;run;
*Create and print the ordered bar chart;
proc sgplot data = pop1.project5;
title 'Figure 3: Bar Chart of Unemployment Rate Level for The 50 States';
vbar URL/ fillattrs=(color=green) transparency=0.5;
xaxis LABEL = 'Unemployment Rate';
run;

More Related Content

What's hot

Final Draft of Illegal Immigrant Paper
Final Draft of Illegal Immigrant PaperFinal Draft of Illegal Immigrant Paper
Final Draft of Illegal Immigrant Paper
Keegan Keegan Pronovost
 
Ev Adoption in the US and Government-backed EV incentives
Ev Adoption in the US and Government-backed EV incentivesEv Adoption in the US and Government-backed EV incentives
Ev Adoption in the US and Government-backed EV incentives
JanelleNtim
 
Contributions of Immigrant Labor to the American Economy - A Different Take
Contributions of Immigrant Labor to the American Economy - A Different TakeContributions of Immigrant Labor to the American Economy - A Different Take
Contributions of Immigrant Labor to the American Economy - A Different Take
Instituto Diáspora Brasil (IDB)
 
Survey | 2019 U.S. Tax Survey
Survey | 2019 U.S. Tax SurveySurvey | 2019 U.S. Tax Survey
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vueltaWonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
Javier Casabal
 
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
Mercer Capital
 
eco620finalpaper
eco620finalpapereco620finalpaper
eco620finalpaper
Johnny Wright
 
Victims of Crime Survey 2015/2016
Victims of Crime Survey 2015/2016Victims of Crime Survey 2015/2016
Victims of Crime Survey 2015/2016
Statistics South Africa
 

What's hot (8)

Final Draft of Illegal Immigrant Paper
Final Draft of Illegal Immigrant PaperFinal Draft of Illegal Immigrant Paper
Final Draft of Illegal Immigrant Paper
 
Ev Adoption in the US and Government-backed EV incentives
Ev Adoption in the US and Government-backed EV incentivesEv Adoption in the US and Government-backed EV incentives
Ev Adoption in the US and Government-backed EV incentives
 
Contributions of Immigrant Labor to the American Economy - A Different Take
Contributions of Immigrant Labor to the American Economy - A Different TakeContributions of Immigrant Labor to the American Economy - A Different Take
Contributions of Immigrant Labor to the American Economy - A Different Take
 
Survey | 2019 U.S. Tax Survey
Survey | 2019 U.S. Tax SurveySurvey | 2019 U.S. Tax Survey
Survey | 2019 U.S. Tax Survey
 
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vueltaWonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
Wonder - Julio - Scioli gana 1era, pero Macri la 2da vuelta
 
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
Mercer Capital's Value Focus: Laboratory Services | Mid-Year 2016
 
eco620finalpaper
eco620finalpapereco620finalpaper
eco620finalpaper
 
Victims of Crime Survey 2015/2016
Victims of Crime Survey 2015/2016Victims of Crime Survey 2015/2016
Victims of Crime Survey 2015/2016
 

Viewers also liked

Multivibrator bistabil
Multivibrator bistabilMultivibrator bistabil
Multivibrator bistabil
RifqiAdy
 
Charles E. Weaver Jr. Resume
Charles E. Weaver Jr. ResumeCharles E. Weaver Jr. Resume
Charles E. Weaver Jr. Resume
Charles Weaver
 
Presentasi
PresentasiPresentasi
Presentasi
RifqiAdy
 
R_DAY_POSTER_F
R_DAY_POSTER_FR_DAY_POSTER_F
R_DAY_POSTER_F
Reuben Hilliard
 
I john dewey (1859 1952) ok (1)
I john dewey (1859 1952) ok (1)I john dewey (1859 1952) ok (1)
I john dewey (1859 1952) ok (1)
Leo Ls
 
SASDay2016
SASDay2016SASDay2016
SASDay2016
Reuben Hilliard
 
SAS Data Mining - Crime Modeling
SAS Data Mining - Crime ModelingSAS Data Mining - Crime Modeling
SAS Data Mining - Crime Modeling
John Michael Croft
 
Final SAS Day 2015 Poster
Final SAS Day 2015 PosterFinal SAS Day 2015 Poster
Final SAS Day 2015 Poster
Reuben Hilliard
 
Gmc portafolio actividad_integradora
Gmc portafolio actividad_integradoraGmc portafolio actividad_integradora
Gmc portafolio actividad_integradora
Georgina Magaña Cervantes
 

Viewers also liked (9)

Multivibrator bistabil
Multivibrator bistabilMultivibrator bistabil
Multivibrator bistabil
 
Charles E. Weaver Jr. Resume
Charles E. Weaver Jr. ResumeCharles E. Weaver Jr. Resume
Charles E. Weaver Jr. Resume
 
Presentasi
PresentasiPresentasi
Presentasi
 
R_DAY_POSTER_F
R_DAY_POSTER_FR_DAY_POSTER_F
R_DAY_POSTER_F
 
I john dewey (1859 1952) ok (1)
I john dewey (1859 1952) ok (1)I john dewey (1859 1952) ok (1)
I john dewey (1859 1952) ok (1)
 
SASDay2016
SASDay2016SASDay2016
SASDay2016
 
SAS Data Mining - Crime Modeling
SAS Data Mining - Crime ModelingSAS Data Mining - Crime Modeling
SAS Data Mining - Crime Modeling
 
Final SAS Day 2015 Poster
Final SAS Day 2015 PosterFinal SAS Day 2015 Poster
Final SAS Day 2015 Poster
 
Gmc portafolio actividad_integradora
Gmc portafolio actividad_integradoraGmc portafolio actividad_integradora
Gmc portafolio actividad_integradora
 

Similar to SAS Analytics Experience 2014

The Factors which Influence National Crime_5A
The Factors which Influence National Crime_5AThe Factors which Influence National Crime_5A
The Factors which Influence National Crime_5A
Tal Fisher
 
Scholarly Paper Rubric (100 Points Possible) Students.docx
Scholarly Paper Rubric (100 Points Possible)     Students.docxScholarly Paper Rubric (100 Points Possible)     Students.docx
Scholarly Paper Rubric (100 Points Possible) Students.docx
kenjordan97598
 
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
inventionjournals
 
How Much Should We Trust China's GDP.pdf
How Much Should We Trust China's GDP.pdfHow Much Should We Trust China's GDP.pdf
How Much Should We Trust China's GDP.pdf
Kapil Khandelwal (KK)
 
Monitoring the impact of the economic crisis on crime final-1
Monitoring the impact of the economic crisis on crime  final-1Monitoring the impact of the economic crisis on crime  final-1
Monitoring the impact of the economic crisis on crime final-1
UN Global Pulse
 
Substance Abuse Mason, Michigan
Substance Abuse Mason, MichiganSubstance Abuse Mason, Michigan
Substance Abuse Mason, Michigan
recoveryrestart2
 
Gini index and income inequality
Gini index and income inequalityGini index and income inequality
Gini index and income inequality
Duanrui Shi
 
corruptionexpenditure_135050_Anaya_Luis
corruptionexpenditure_135050_Anaya_Luis corruptionexpenditure_135050_Anaya_Luis
corruptionexpenditure_135050_Anaya_Luis
Luis Rodrigo Anaya
 
Degree of economic_freedom_and_relationship_to_economic_growth
Degree of economic_freedom_and_relationship_to_economic_growthDegree of economic_freedom_and_relationship_to_economic_growth
Degree of economic_freedom_and_relationship_to_economic_growth
Anochi.com.
 
Fiscal Analysis Brief
Fiscal Analysis BriefFiscal Analysis Brief
Fiscal Analysis Brief
Las Vegas Chamber of Commerce
 
Mapping Business Challenges to Types of Control © 201.docx
Mapping Business Challenges to Types of Control © 201.docxMapping Business Challenges to Types of Control © 201.docx
Mapping Business Challenges to Types of Control © 201.docx
gertrudebellgrove
 
Rohr_CPS_ResearchPaper
Rohr_CPS_ResearchPaperRohr_CPS_ResearchPaper
Rohr_CPS_ResearchPaper
Rebecca Rohr
 
Math ia final
Math ia finalMath ia final
Math ia final
mischas
 
Demographic Assessment ProjectNURS 4404 Community Health .docx
Demographic Assessment ProjectNURS 4404 Community Health .docxDemographic Assessment ProjectNURS 4404 Community Health .docx
Demographic Assessment ProjectNURS 4404 Community Health .docx
simonithomas47935
 
COPS OFFICE PUBS 2011_2014
COPS OFFICE PUBS 2011_2014COPS OFFICE PUBS 2011_2014
COPS OFFICE PUBS 2011_2014
Fletcher Maffett
 
Presentation 1.pptx
Presentation 1.pptxPresentation 1.pptx
Presentation 1.pptx
UrgessaFiromsa
 
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docxTHE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
oreo10
 
Hdi calculation
Hdi calculationHdi calculation
Hdi calculation
Javed Mazher
 
Proyecto tasa de delincuencia
Proyecto tasa de delincuencia Proyecto tasa de delincuencia
Proyecto tasa de delincuencia
ClaribelPariTicona
 
Proyecto tasa de delincuencia subir
Proyecto tasa de delincuencia subirProyecto tasa de delincuencia subir
Proyecto tasa de delincuencia subir
ClaribelPariTicona
 

Similar to SAS Analytics Experience 2014 (20)

The Factors which Influence National Crime_5A
The Factors which Influence National Crime_5AThe Factors which Influence National Crime_5A
The Factors which Influence National Crime_5A
 
Scholarly Paper Rubric (100 Points Possible) Students.docx
Scholarly Paper Rubric (100 Points Possible)     Students.docxScholarly Paper Rubric (100 Points Possible)     Students.docx
Scholarly Paper Rubric (100 Points Possible) Students.docx
 
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
Effects of Socio - Economic Factors on Children Ever Born in India: Applicati...
 
How Much Should We Trust China's GDP.pdf
How Much Should We Trust China's GDP.pdfHow Much Should We Trust China's GDP.pdf
How Much Should We Trust China's GDP.pdf
 
Monitoring the impact of the economic crisis on crime final-1
Monitoring the impact of the economic crisis on crime  final-1Monitoring the impact of the economic crisis on crime  final-1
Monitoring the impact of the economic crisis on crime final-1
 
Substance Abuse Mason, Michigan
Substance Abuse Mason, MichiganSubstance Abuse Mason, Michigan
Substance Abuse Mason, Michigan
 
Gini index and income inequality
Gini index and income inequalityGini index and income inequality
Gini index and income inequality
 
corruptionexpenditure_135050_Anaya_Luis
corruptionexpenditure_135050_Anaya_Luis corruptionexpenditure_135050_Anaya_Luis
corruptionexpenditure_135050_Anaya_Luis
 
Degree of economic_freedom_and_relationship_to_economic_growth
Degree of economic_freedom_and_relationship_to_economic_growthDegree of economic_freedom_and_relationship_to_economic_growth
Degree of economic_freedom_and_relationship_to_economic_growth
 
Fiscal Analysis Brief
Fiscal Analysis BriefFiscal Analysis Brief
Fiscal Analysis Brief
 
Mapping Business Challenges to Types of Control © 201.docx
Mapping Business Challenges to Types of Control © 201.docxMapping Business Challenges to Types of Control © 201.docx
Mapping Business Challenges to Types of Control © 201.docx
 
Rohr_CPS_ResearchPaper
Rohr_CPS_ResearchPaperRohr_CPS_ResearchPaper
Rohr_CPS_ResearchPaper
 
Math ia final
Math ia finalMath ia final
Math ia final
 
Demographic Assessment ProjectNURS 4404 Community Health .docx
Demographic Assessment ProjectNURS 4404 Community Health .docxDemographic Assessment ProjectNURS 4404 Community Health .docx
Demographic Assessment ProjectNURS 4404 Community Health .docx
 
COPS OFFICE PUBS 2011_2014
COPS OFFICE PUBS 2011_2014COPS OFFICE PUBS 2011_2014
COPS OFFICE PUBS 2011_2014
 
Presentation 1.pptx
Presentation 1.pptxPresentation 1.pptx
Presentation 1.pptx
 
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docxTHE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
THE IMPACT OF YOUTH CRIMINAL BEHAVIORON ADULT EARNINGS.docx
 
Hdi calculation
Hdi calculationHdi calculation
Hdi calculation
 
Proyecto tasa de delincuencia
Proyecto tasa de delincuencia Proyecto tasa de delincuencia
Proyecto tasa de delincuencia
 
Proyecto tasa de delincuencia subir
Proyecto tasa de delincuencia subirProyecto tasa de delincuencia subir
Proyecto tasa de delincuencia subir
 

SAS Analytics Experience 2014

  • 1. A Model to predict Crime Rate based on Education and Economic metrics. Reuben Hilliard Kennesaw State University Purpose Methods Why do yearly crime rates fluctuate up and down in the US? The exact reasons are difficult to quantify, but a number of Socio-economic variables have been identified as the most likely causes. The purpose of this analysis was to build a model to predict crime rate. It could be utilized to assist state & local governments in combatting and reducing crime, using an alternative solution outside the standard law enforcement or incarceration methodology. Data was collected from various sources, including The National Crime Victimization Survey Series,1 Bureau of Labor and Statistics,2 US Census Bureau,3,4 and National Education Association5. This raw data was extracted, cleaned and combined to build a master dataset. All the data collected was from the 2008 Calendar Year or 2007/2008 School Year. The dataset includes 50 observations, representing the 50 US States, of which 15 variables were analyzed. Total Population represents the total number of residents per State. The Population Size Level divides the States into 3 categories, ‘SPARSE’ (under 2 Mil.), ‘MEDIUM’ (2-10 Mil.) and ‘DENSE’ (over 10 Mil. inhabitants). The Violent Crime Incidents, which include murder, rape, robbery and assault and The Property Crime Incidents, which include burglary, larceny and vehicle theft have corresponding rates of per 100,000 residents, divided into ‘LO RISK’ (under 2500 incidents) or ‘HI RISK’ (over 2500 incidents) and finally a Combined Crime Rate (Total number of Incidents). Then the Unemployment Rate per state for 2008 and Median Household Income represent the economic indicators. Unemployment Rate Level was created to categorize Unemployment rate for the following levels, ‘ROBUST’ (Rate less 5%), ‘FUNCTIONAL’ (5-6%) and ‘INADEQUATE’ (over 6%). Lastly, there is the percent of adults over the age of 25 who have graduated from High School and the dollar amount spent per capita on Education from State and Local Governments, that both represent education indicators. Analysis of this dataset was performed on a number of variables to assess what their ultimate effect on the ‘Combined Crime Rate’ per State would be. All the data gathered was pre-recession, 2008. The goal was to generate models to predict a crime rate in a given State. The size of the State’s population was found to have no effect on the Crime Rate. Also, as Crime Rate was a normalized variable, State Population Density also did not influence this factor. Ultimately, after isolating the two variables that had the greatest direct influence on the Combined Crime Rate, namely ‘Unemployment Rate’ and ‘Percentage of High School Graduates over 25’, further analysis led to multiple linear regression analysis. All programming and output was performed using SAS 9.3. The following is a portion of the analysis. The histogram in Figure 1.1 and the boxplot in Figure 1.2, demonstrate that the distribution for Spending on Education per capita is unimodal and slightly skewed to the right. The median dollar amount spent on Education is $2769.00, and the Interquartile Range (IQR) is $387.00. There are five outliers. The histogram in Figure 2.1 and the boxplot in Figure 2.2, demonstrate that the distribution for Violent Crime Rate is unimodal and skewed to the right. The median rate is 348.20 incidents per 100,000 residents and the IQR is 245.10. There is only a single outlier. In regards to economic health and specifically, the unemployment rate, the ordered bar chart in Figure 3 indicates that 22 States have a ROBUST level, 12 a FUNCTIONAL level and 17 an INADEQUATE level. Figure 4 shows the Stacked Bar Chart for Unemployment Rate Level (URL) by Violent Risk Level (VRL). As the Bar Chart moves from left to right the HI RISK VRL indicator (shown in red) increases, only making up 31.8% in the ROBUST category, but 70.59% in the INADEQUATE category. As the economy declines, unemployment rates increased and in turn, violent crime rates increased. Introduction Analysis
  • 2. A Model to predict Crime Rate based on Education and Economic metrics. Reuben Hilliard Kennesaw State University Conclusion References 1. The National Crime Victimization Survey, 2008. Web retrieved 4/12/2014. https://explore.data.gov/Law-Enforcement-Courts-and-Prisons/National- Crime-Victimization-Survey-2008-Record-Ty/rfme-ynch 2. Unemployment Rate, 2008. Web retrieved 4/12/2014. http://www.bls.gov/lau/lastrk08.htm 3. Median Family Income, Wage Census 2008. Web retrieved 4/12/2014. http://www.census.gov/hhes/www/income/data/statemedian/ 4. State Education Levels, 2008. Web retrieved 4/12/2014. http://www.census.gov/compendia/statab/2011/tables/11s0229.pdf 5. Per Capita Spending on Education, 2007/2008. Web retrieved 4/12/2014. http://www.nea.org/assets/docs/HE/NEA_Rankings_and_Estimates010711.pdf These results in this study are important because it gives states the opportunity to approach crime from a different angle. Using this model, law-makers can predict their state’s crime rate, but more importantly focus their attention on decreasing the unemployment rate and increasing the number of high school graduates. Ultimately, it would cost states more to house and monitor criminals in the long term, than to invest in worker retraining programs for adults or technical preparation programs at local high schools, giving teenagers the skills they would need to be successful and productive citizens in our society. Additionally, altering the variables up or down can also predict where the Crime Rate would be in the future, allowing for a measure of course correction on behalf of officials. Further study would include using the data from the 2012 Census and other sources to refine these Models for future use. It would be interesting to compare the Combined Crime Rates for the 2008-2012 periods. During this time, the US went through the Great Recession, when unemployment rates skyrocketed and funding to schools were slashed.Multiple Linear Regression Equation: CCRate = 10907 - 94.32 (HS) + 15329 (URate) Figure 5: Regression Model for Crime Rate vs. HS Education Figure 6: Regression Model for Crime Rate vs. Unemployment Rate Table I: Performance of the bottom 10% of States Results Figure 5 shows the Regression Model for Combine Crime Rate (CCRate) vs. Percent of Population over 25 with High School Education (HS), with the equation of CCRate = 13826 – 118.6 (HS). HS is significant at a 0.05 level, as p = 0.0008. Also, High School Education Percentage and Crime Rate are negatively correlated. Figure 6 shows the Regression Model for Combine Crime Rate (CCRate) vs. Unemployment Rate (URate), with the equation of CCRate = 2128.5 + 26867 (URate). URate is significant at a 0.05 level, as p = 0.0073. Also, Unemployment Rate and Crime Rate are positively correlated. The correlation coefficient between the two variables, HS and URate was not strong enough to warrant any multicollinearity concerns. Using the indicators above, the dataset was formatted in decreasing order within SAS. This is displayed in Table I, ‘Performance of the Bottom 10% of States’. Michigan, Rhode Island, California, Nevada and South Carolina all had the highest Unemployment Rates and the lowest Percentage of High School Graduates. Corresponding Risk Levels were High, indicating that overall, these States also had a combined large number of Violent and Property Crime incidents per 100,000 inhabitants.
  • 3. A Model to predict Crime Rate based on Education and Economic metrics. Reuben Hilliard Kennesaw State University Base SAS 9.3 Code *Contingency table for Unemployment Rate Level by Violent Risk Level; title "Table 4: Contingency Table for Unemployment Rate Level (URL) by Violent Risk Level (VRL)"; proc sort data=pop1.project5; by vrl;run; proc freq data=pop1.project5; tables url*vrl;run; *100% Stacked Bar Chart for Unemployment Rate Level by Violent Risk Level"; proc freq data=pop1.project5; tables url*vrl; ods output crosstabfreqs=ct;title; data ct2; set ct(drop = _type_ _table_ missing ColPercent Percent Frequency); if not missing (rowpercent);run;title; proc sgplot data = ct2; title1 "Figure 4: 100% Stacked Bar Chart for Unemployment Rate Level by Violent Risk Level"; vbar url/ group = vrl response=rowpercent; yaxis label= "Percent within Violent Risk Level";run; *Property Crime Rate and Population Size Level; proc sort data=pop1.project5;by popsl; run; *Regression Models for Combined Crime Rate vs. High School Education(%)and Unemployment Rate; %macro Regmodel(var1=,var2=); proc reg data=pop1.project5 plots =(CooksD); model &var1=&var2/r; run; %mend Regmodel; %Regmodel(var1=ccrate,var2=hs); %Regmodel(var1=ccrate,var2=urate); *Sorting performance of bottom 10% of States; proc sort data=pop1.project5 out=sorted; by descending urate descending hs;run; proc print data=sorted (obs=5); title "Table 5: Performance of Bottom 10% of States by metrics: Unemployment Rate and Percent High School Education"; var urate hs state ccrate; ods rtf close; *Run all formats; proc format; VALUE URLformat 1= "ROBUST" 2 = "FUNCTIONAL" 3 = "INADEQUATE"; run; quit; proc format; VALUE PopSLformat 1= "SPARSE" 2 = "MEDIUM" 3 = "DENSE"; run; quit; proc format; VALUE VRLformat 1= "LO RISK" 2 = "HI RISK"; run; quit; proc format; VALUE PRLformat 1= "LO RISK" 2 = "HI RISK"; run; quit; *Creating new variable: Unemployment Rate Level (URL); data pop1.project2; set pop1.project1a; if urate<0.050 then URL=1; else if urate >=0.050 & urate <0.060 then URL=2; else if urate >= 0.060 then URL=3;run; * Creating a new variable: Population Size Level; data pop1.project3; set pop1.project2; if TPop<2000000 then PopSL=1; else if TPop >=2000000 & TPop <10000000 then PopSL=2; else if TPop >= 0.060 then PopSL=3;run; * Creating a new variable: Violent Rate Level; data pop1.project4;set pop1.project3; if VCRate <400 then VRL=1; else if VCRate >= 400 then VRL=2;run; * Creating a new variable: Property Rate Level; data pop1.project5;set pop1.project4; if PCRate <2500 then PRL=1; else if PCRate >=2500 then PRL=2;run; *Macro Program to run Descriptive Stats, Histogram and Boxplot for ANY Variable; %macro Analyze(I=,BLANK1=); proc SQL noprint; select label into :LBlank1 from dictionary.columns where UPCASE(MEMNAME) = UPCASE("project5") & UPCASE(NAME) = UPCASE("&BLANK1"); quit; *Descriptive Statistics; proc means data=pop1.project5 mean median stddev qrange min max maxdec=2; title "Table &I:Descrptive Statistics for &LBlank1"; var &Blank1; *Histogram; proc sgplot data=pop1.project5; title "Figure &I..1: Histogram for &LBlank1"; histogram &Blank1/ fillattrs=(color=gold) transparency=0.5; *Boxplot; proc sgplot data=pop1.project5; title "Figure &I..2: Boxplot for &LBlank1"; vbox &Blank1; run; %mend Analyze; %Analyze(I=1,BLANK1=Edu) %Analyze(I=2,BLANK1=CCRate) *Create a frequency table for URL; ods rtf; proc freq data=pop1.project5; title 'Table 3: Frequency Table of Unemployment Rate Level for The 50 States'; tables URL;run; *Create and print the ordered bar chart; proc sgplot data = pop1.project5; title 'Figure 3: Bar Chart of Unemployment Rate Level for The 50 States'; vbar URL/ fillattrs=(color=green) transparency=0.5; xaxis LABEL = 'Unemployment Rate'; run;