S16_MAP_Using Data Science to Model Relationships Between Educational Levels and Crime Rates
Using Data Science to Model Relationships Between Educational Levels and Crime Rates
MAP Research Proposal
Spring 2016
Ibuki Ogasawara, Jarren Santos, and Peiyun Zhang
A) Topic and Project Description
The shooting of Michael Brown, an unarmed black teenager, in Ferguson, Missouri has prompted
discussion about crime and which initiatives to pursue. The United States Department of Justice
Civil Rights Division (2015) reported clear racial discrimination by the Ferguson, MO police
department. The report found that African Americans "accounted for 85 percent of [the
department's] traffic stops, 90 percent of [the department's] citations, and 93 percent of [the
department's] arrests from 2012 to 2014." Others, such as Vaughan (2014), have reported that unjust
incidents like Ferguson have prompted professionals to revisit urban policy, since we have done
nothing but "repealed the public jobs policy... and reversed civil rights policies." One key area of
discussion focuses on educational policies. For example, Hannah-Jones (2014) states that
“Brown’s tragedy, then, is not limited to his individual potential cut brutally short. His schooling
also reveals a more subtle, ongoing racial injustice: the vast disparity in resources and
expectations for black children in America’s stubbornly segregated educational system.” This
2014 Ferguson incident is the motivation for our proposed MAP for the Spring 2016 semester, which will
use machine learning to model relationships between quality of education and crime rates.
While there is reason to believe that crime rates can be reduced by raising education
levels, researchers have struggled to show a clear relationship. For example, Witte (1997) states
that “...neither years of schooling completed nor receipt of a high school degree has a significant
effect on an individual’s level of criminal activity.” Tauchen et al. (1994) and Witte and
Tauchen (1994) also conclude that there is no connection between education and crime.
However, Moretti (2005) showed that a 1% increase in the high school completion rate of all
men ages 20-60 would save the United States as much as $1.4 billion per year in reduced costs
from crime incurred by victims and society at large. Lochner and Moretti (2001) stated that a
one-year increase in average education levels in a state reduces state-level arrest rates by 11
percent or more. They also explored separate effects of education for different types of crime and
found that an increase in average years of schooling would reduce both property and
violent crime. Lochner and Moretti (2001) and Viscusi (1986) argued that education raises the
opportunity cost of crime and the cost of time spent in prison, which suggests that crime may serve as an
alternative source of income. In addition, Lochner (2004) suggested that “education is negatively
correlated with violent and property crimes even after controlling for a number of important
individual, family, and community characteristics.” According to the most
recent data from the U.S. Bureau of Justice, "56 percent of federal inmates, 67 percent of inmates
in state prisons, and 69 percent of inmates in local jails did not complete high school" (The
Alliance, 2013). Modeling the relationship between education and crime is therefore worthwhile, because the
results could serve as evidence to encourage initiatives that positively affect both education and crime outcomes.
We focus on two key locations: New York, New York and St. Louis, Missouri. Our goal
is to develop models and interactive visualizations demonstrating how education levels in each
district relate to the crime rate. We will create multiple statistical models and visualizations to
explore the possible relationships between crime and education. We will use several years of
data from the NYPD Stop, Question, and Frisk Database, the New York City Department of
Education, the St. Louis Metropolitan Police Department crime files, and the Missouri
Comprehensive Data System provided by the Missouri Department of Elementary and
Secondary Education. This work will build upon the Summer 2015 MAP, Interactive
Visualization and Modeling of Large Datasets, which analyzed and visualized the 2006 crime datasets
in New York City. We plan to update their model by including data from before and after
2006. Additionally, we want to integrate these results with education datasets by school district
to develop new statistical models and visualizations.
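As a rough illustration of the kind of district-level model described above, the following sketch joins per-district incident counts to an education measure and fits a simple least-squares line. It is written in Python only to keep the example self-contained; the project itself plans to use R. All district labels, field names, and numbers are hypothetical and are not drawn from the datasets named in this proposal.

```python
from collections import Counter

# Hypothetical incident records: one dict per reported incident.
incidents = [
    {"district": "D1"}, {"district": "D1"}, {"district": "D1"}, {"district": "D1"},
    {"district": "D2"}, {"district": "D2"}, {"district": "D2"},
    {"district": "D3"}, {"district": "D3"},
    {"district": "D4"},
]

# Hypothetical education and population figures per district.
districts = {
    "D1": {"grad_rate": 60.0, "population": 1000},
    "D2": {"grad_rate": 70.0, "population": 1000},
    "D3": {"grad_rate": 80.0, "population": 1000},
    "D4": {"grad_rate": 90.0, "population": 1000},
}


def crime_rates_per_1000(incidents, districts):
    """Count incidents per district and scale to a rate per 1,000 residents."""
    counts = Counter(rec["district"] for rec in incidents)
    return {
        name: 1000.0 * counts.get(name, 0) / info["population"]
        for name, info in districts.items()
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


rates = crime_rates_per_1000(incidents, districts)
names = sorted(districts)
xs = [districts[n]["grad_rate"] for n in names]  # education measure
ys = [rates[n] for n in names]                   # crime rate per 1,000
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.3f} incidents per 1,000 residents per percentage point")
```

The negative slope here is built into the made-up numbers. With real data, a fitted slope would describe an association only, and the multivariate models proposed here would need to account for other district characteristics.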
This project will allow all three students, Ibuki, Jarren, and Peiyun, to apply the
mathematical, statistical, and computer science concepts learned so far to draw meaningful
interpretations from relevant datasets. In a collaborative effort, each of these students will work to
create statistical models using R statistical software and interactive data visualization tools. In
addition to R, all three students will use elements of data science, machine learning, and
multivariate modeling to ensure that the models and visualizations presented are accurate and
meaningful.
We will focus on the following primary sources of data for this MAP:
• The Stop, Question, and Frisk Report Database was collected and provided by the New
York Police Department. This database includes over 500,000 police interactions with
detailed information about location, time, criminal possession of weaponry, criminal
response, criminal information, officer information, and other details. Entries in this
database include data from 2003 to 2014.
• The SLMPD Crime Files were collected by the St. Louis Metropolitan Police Department
(SLMPD). This database includes basic information about crimes related to the SLMPD
that occurred within the city of St. Louis, with detailed information about location, time,
type of arrest, criminal response, and other details. Entries in this database include data
from 2008 to present.
• The Missouri Comprehensive Data System is provided by the Missouri Department of
Elementary and Secondary Education and allows the public to access education-related
data. This developing data resource has information on educational factors such as
accountability, college and career, early childhood education, state assessment data,
student characteristics, and other details. This database includes recent entries that
extend through 2015.
• The New York City Department of Education releases “Data About Schools” that covers
information about New York City’s student population and schools. This data resource
has information on student and school performance, population and demographics,
management and operations, public surveys, and more. This database includes data from
2005 to present.
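Because these sources cover different years, any joint model for a city is limited to the years its crime and education sources share. The sketch below computes that overlap from the coverage stated in the list above. It is written in Python for illustration only (the project plans to use R). The dictionary keys are hypothetical labels, "present" is assumed to mean 2016 (the year of this proposal), and the Missouri Comprehensive Data System start year is left unknown because it is not stated.

```python
PRESENT = 2016  # assumption: "present" read as the year of this proposal

# (first_year, last_year); None means the year is not stated in the proposal.
coverage = {
    "nypd_sqf": (2003, 2014),
    "nyc_doe": (2005, PRESENT),
    "slmpd": (2008, PRESENT),
    "mcds": (None, 2015),
}


def overlap(*ranges):
    """Return the (start, end) years shared by all ranges, or None if disjoint.

    Unknown bounds (None) are ignored, so the result is a best case.
    """
    starts = [s for s, _ in ranges if s is not None]
    ends = [e for _, e in ranges if e is not None]
    start, end = max(starts), min(ends)
    return (start, end) if start <= end else None


nyc_window = overlap(coverage["nypd_sqf"], coverage["nyc_doe"])
stl_window = overlap(coverage["slmpd"], coverage["mcds"])
print("New York City:", nyc_window)
print("St. Louis:", stl_window)
```

Under these assumptions the shared window is 2005 to 2014 for New York City and 2008 to 2015 for St. Louis; the St. Louis window would narrow if the Missouri education data begin after 2008.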
B) Relationship of this project to your previous studies
Peiyun Zhang ‘17
Peiyun’s related coursework includes MAT209, MAT309, MAT310, and
MAT335 from the Mathematics and Statistics department and CSC151 and CSC161 from the Computer
Science department. Particularly relevant courses include MAT310 and MAT335, which have
given him a basic understanding of both applied and theoretical statistics and have equipped him
with solid statistical and coding techniques that will be very beneficial in analyzing data. Peiyun
has had extensive exposure to R, the analysis tool that will be used in this project.
Although R is a statistical tool, it involves a fair amount of coding. His one-year
computer science background at Grinnell therefore helps him work effectively in R. With all of this
in mind, Peiyun is well prepared in the background knowledge and analytical skills
that such a project requires. He is particularly interested in applied statistics and data
science. He is also on track to develop an independent major in data science, which
combines relevant statistics and computer science courses. Since Peiyun wants to pursue further
data science education at the graduate level, this project will be very instructive and meaningful
for his future education and work.
C) Sources
Hronec, Stefan, Beata Mikusova Merickova, and Jana Hroncova Vicianova. “Social Non-
Economic Effects of Education on The Level of Crime.” The New Educational Review. 38.4
(2014): 43-57. Print.
Laura, Crystal T., and Deborah Lynch. "Teaching Ferguson: Meaningful Classroom Dialogue
About the Michael Brown Case." Black History Bulletin. 78.1 (2014): 6-10. Print.
Lochner, Lance. "Education, Work, and Crime: A Human Capital Approach." National Bureau
of Economic Research. May 2004. http://www.nber.org/papers/w10478.pdf
Lochner, Lance, and Enrico Moretti. “The effect of education on crime: Evidence from prison
inmates, arrests, and self-reports.” No. w8605. National Bureau of Economic Research, (2001).
Missouri Department of Elementary and Secondary Education. Missouri Comprehensive Data
System. Missouri Government. January 2015. Web. URL:
http://mcds.dese.mo.gov/Pages/default.aspx
Moretti, Enrico. "Does education reduce participation in criminal activities." Symposium on “The
Social Costs of Inadequate Education.” Columbia University Teachers College. (2005).
New York City Department of Education. Data about Schools. New York Government.
January 2014. Web. URL: http://schools.nyc.gov/AboutUs/schools/data/default.htm
New York Open Data. Education Datasets. December 2014. Web. URL:
https://data.cityofnewyork.us/
New York Police Department. Stop, Question, and Frisk Database. New York Government.
January 2003. Web. URL:
http://www.nyc.gov/html/nypd/html/analysis_and_planning/stop_question_and_frisk_report.shtml
Sparks, Sarah D. “Hidden Biases Tough for Schools to Erase.” Education Week 16 September
2015: 35.4
St. Louis Metropolitan Police Department. SLMPD Crime data. Web. January 2008. URL:
http://www.slmpd.org/Crimereports.shtml
The Alliance. "Saving Futures, Saving Dollars: The Impact of Education on Crime Reduction and
Earnings." Alliance for Excellent Education. September 2013. Web. URL:
http://all4ed.org/reports-factsheets/saving-futures-saving-dollars-the-impact-of-education-on-crime-reduction-and-earnings-2/
United States Department of Justice Civil Rights Division. "Investigation of the Ferguson Police
Department." Department of Justice. 4 March 2015. Web. URL:
http://www.justice.gov/sites/default/files/opa/press-releases/attachments/2015/03/04/ferguson_police_department_report.pdf
Vaughan, Richard. "How 'Resegregation' was the Spark that Ignited US Riots." TES: Times
Educational Supplement. 5109. (2014): 12-14. Print.
Viscusi, W. Kip. "Market Incentives for Criminal Behavior." in R. Freeman and H. Holzer, eds,
"The Black Youth Employment Crisis." University of Chicago Press, Chicago, 8. (1986)
D) List of graded work and deadlines
E) Budget for any needed materials and/or travel