Fundamentals of Crime Mapping 8
2. Understand the difference between qualitative and quantitative data.
Define and explain levels of measurement, including nominal, ordinal, interval, and ratio.
Understand the difference between discrete and continuous variables.
Understand descriptive statistics, including typical measures of central tendency and dispersion.
Understand inferential statistics, including typical tests of significance and measures of association.
Understand what a regression model is and how it works.
Understand the limitations of statistics and how their improper application can yield misleading results.
Define and explain classification in crime mapping and be able to identify the strengths and weaknesses of each method.
3. Qualitative
◦ Yields narrative-oriented information
Park, Blue, Yes, Tall, Short, etc.
Quantitative
◦ Produces number-oriented information
Key factors, or "variables"
4. Ratio
◦ Highest level
◦ Can be reclassified to any of the other levels
◦ Has a meaningful zero; values run from 0 to +∞
Interval
◦ Precise value of a measure is known and thus can also be ranked
◦ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Ordinal
◦ Nominal data placed in rank order; the order is important
◦ Officer, Sergeant, Lt, Commander, Major, Chief
Nominal
◦ Male, Female
5. Nominal
◦ Dichotomous: Caucasian, Non-Caucasian
◦ Multiple categories: Caucasian, African American, Hispanic, Native American, Asian, Other
Categories must be mutually exclusive and exhaustive
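The two coding requirements above can be checked mechanically when a multi-category variable is collapsed into a dichotomous one. This is a minimal sketch, assuming records arrive as plain strings that match the slide's category labels; `DICHOTOMOUS` and `recode` are hypothetical names. A dictionary gives each category exactly one code (mutually exclusive), and any value the scheme does not list raises an error (the scheme is not exhaustive).

```python
# Hypothetical recoding table built from the slide's categories. Each key maps to
# exactly one code, so the scheme is mutually exclusive by construction.
DICHOTOMOUS = {
    "Caucasian": "Caucasian",
    "African American": "Non-Caucasian",
    "Hispanic": "Non-Caucasian",
    "Native American": "Non-Caucasian",
    "Asian": "Non-Caucasian",
    "Other": "Non-Caucasian",
}


def recode(records: list[str]) -> list[str]:
    """Collapse the multi-category variable into the dichotomous one."""
    # Exhaustiveness check: every observed value must belong to some category.
    unknown = sorted(set(records) - set(DICHOTOMOUS))
    if unknown:
        raise ValueError(f"values not covered by the coding scheme: {unknown}")
    return [DICHOTOMOUS[r] for r in records]


print(recode(["Asian", "Caucasian", "Other"]))
# ['Non-Caucasian', 'Caucasian', 'Non-Caucasian']
```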
7. Traits, concepts, and ideas in criminal justice can be difficult to operationalize, or measure.
Ordinal
◦ Categorical or numerical data that can be ranked, but the precise value is not known
Likert scale example: "I feel safe walking in my neighborhood alone at night"
1 - Strongly agree
2 - Agree
3 - Neutral
4 - Disagree
5 - Strongly disagree
6 - Don't know
Income question example: "What is your annual household income?"
1. Less than $20,000
2. Between $20,000 and $40,000
3. Between $40,001 and $60,000
4. Between $60,001 and $80,000
5. More than $80,000
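Reclassifying an exact income into the ranked answer codes above shows how a higher level of measurement collapses into an ordinal one, and how the precise value is lost. A minimal sketch, assuming incomes are recorded in whole dollars (the slide's ranges leave a gap between $40,000 and $40,001 for fractional amounts); `income_category` is a hypothetical helper.

```python
def income_category(income: int) -> int:
    """Map an exact annual income (whole dollars) to the slide's ordinal codes 1-5."""
    if income < 20_000:
        return 1  # Less than $20,000
    if income <= 40_000:
        return 2  # Between $20,000 and $40,000
    if income <= 60_000:
        return 3  # Between $40,001 and $60,000
    if income <= 80_000:
        return 4  # Between $60,001 and $80,000
    return 5      # More than $80,000


# Two quite different incomes receive the same rank: the exact value is gone.
print(income_category(60_001), income_category(79_500))  # 4 4
```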
9. Validity
◦ A variable accurately reflects the trait or concept it is measuring
Reliability
◦ The measure produces consistent results across people, places, and time
10. Interval
◦ What is your annual household income? $__________________
Ranking is possible and the precise value is known
112 burglaries occurred in beat 32
12. Ratio
◦ Treated the same as interval data
112.23 burglaries occurred on average in beat 32
Can we have .23 of a burglary?
14. Discrete
◦ Variables that cannot be subdivided
The number of persons living in a household is a discrete variable. For example, there cannot be 2.3 persons living in a household. There can be 2, or there can be 3, but not 2.3.
Continuous
◦ Variables that can be subdivided; theoretically, they can be subdivided an infinite number of times
Time, for example: days, hrs, mins, secs, nanosecs, etc.
15. Rates
◦ Violent crimes per 100,000 population
Rate = Violent crimes / (Population / 100,000)
Ratios
◦ Property crimes "per" violent crime
Violent crimes = 10
Property crimes = 300
PC/VC = 300/10 = 30
For every one violent crime, there are 30 property crimes
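Both calculations above are single divisions; the sketch below simply makes the order of operations explicit. The crime counts come from the slide, but the population of 50,000 is a hypothetical value chosen only to illustrate the rate.

```python
def rate_per_100k(crimes: int, population: int) -> float:
    """Crimes per 100,000 residents: crimes / (population / 100,000)."""
    return crimes / (population / 100_000)


def ratio(numerator: int, denominator: int) -> float:
    """How many numerator events occur for every one denominator event."""
    return numerator / denominator


violent, property_crimes = 10, 300  # counts from the slide
population = 50_000                 # hypothetical population, for illustration only

print(rate_per_100k(violent, population))  # 20.0 violent crimes per 100,000
print(ratio(property_crimes, violent))     # 30.0 property crimes per violent crime
```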
16. Percent Change
◦ For comparing time periods
Percent change = ((New - Old) / Old) * 100
2009 property crimes = 2,567
2008 property crimes = 2,655
Percent change = ((2,567 - 2,655) / 2,655) * 100 = -0.033 * 100 = -3.3%
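The percent-change formula translates directly into a small function; this sketch reproduces the slide's 2008 to 2009 property crime example. The guard against an old value of 0 is an added assumption, since the formula divides by the old value.

```python
def percent_change(old: float, new: float) -> float:
    """((New - Old) / Old) * 100; a negative result means a decrease."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100


change = percent_change(old=2655, new=2567)  # 2008 -> 2009 property crimes
print(f"{change:.1f}%")                      # -3.3%
```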
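17. Measures of Central Tendency
Example values: 25, 55, 56, 65, 72, 82, 82, 84, 90, 97
◦ Mean or Average
The sum of the values in a distribution divided by the number of values
◦ Mode
The most often found value in a distribution (82 in the example)
◦ Median
The middle value in a distribution
With an even number of values, average the two middle values: Median = 72 + (82 - 72)/2 = 72 + 5 = 77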
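Python's standard `statistics` module computes all three measures; the sketch below applies it to the example values on this slide.

```python
import statistics

values = [25, 55, 56, 65, 72, 82, 82, 84, 90, 97]  # example values from the slide

print(statistics.mean(values))    # 70.8
print(statistics.median(values))  # 77.0, the average of the middle values 72 and 82
print(statistics.mode(values))    # 82, the only value that appears twice
```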
19. Mean
◦ Should not be used when the distribution is greatly "skewed"
As with most crime data
◦ Use the median instead where it makes sense
[Figure: distributions labeled Positive or Right Skewed, Almost Normal, and Negative or Left Skewed]
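A single extreme value is enough to show why the median is preferred for skewed crime data. The counts below are hypothetical, invented only to mimic one hot-spot beat among otherwise quiet beats.

```python
import statistics

# Hypothetical burglary counts for ten beats (illustrative only); one hot-spot
# beat creates the long right tail typical of crime data.
counts = [0, 1, 1, 2, 2, 3, 3, 4, 5, 40]

print(statistics.mean(counts))    # 6.1, pulled upward by the single extreme beat
print(statistics.median(counts))  # 2.5, closer to what a typical beat experiences
```

Nine of the ten beats fall below the mean here, so reporting the mean as "typical" would overstate the burglary problem in most beats.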
21. Measures of Variance or Dispersion
Example values: 25, 55, 55, 65, 72, 82, 82, 84, 90, 97
◦ Range
The distance between the lowest and highest score
◦ Interquartile range
The distance between the 25th and 75th percentile
1st Quartile = 57.5, 3rd Quartile = 83.5, Interquartile range = 83.5 - 57.5 = 26
◦ Variance
The average squared distance of each score in a distribution from the mean of the distribution
◦ Standard deviation
The square root of the variance; roughly, the typical distance of each score from the mean
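The four dispersion measures can be reproduced with the standard `statistics` module. Quartiles have several conventions; in this sketch `method="inclusive"` (linear interpolation between ranked values) is the one that reproduces the slide's 57.5 and 83.5, and other software may report slightly different quartiles. `pvariance` and `pstdev` divide by n, matching the "average squared distance" definition; the sample versions (`variance`, `stdev`) divide by n - 1 instead.

```python
import statistics

values = [25, 55, 55, 65, 72, 82, 82, 84, 90, 97]  # example values from the slide

data_range = max(values) - min(values)
# Linear interpolation between ranked values; reproduces the slide's quartiles.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
variance = statistics.pvariance(values)  # population variance (divide by n)
std_dev = statistics.pstdev(values)      # square root of the variance

print(data_range)   # 72
print(q1, q3, iqr)  # 57.5 83.5 26.0
print(round(variance, 2), round(std_dev, 2))
```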
25. Sample
A sample is analyzed to "infer" information about the population
◦ Probability theory
The proportion of times any given outcome will occur if the event is repeated many times.
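The long-run definition of probability can be seen by simulation: the share of fair-coin flips that land heads settles toward 0.5 as the number of repetitions grows. A minimal sketch; the coin example and `relative_frequency` helper are illustrative, and the fixed seed only makes the run repeatable.

```python
import random


def relative_frequency(trials: int, seed: int = 42) -> float:
    """Share of simulated fair-coin flips that land heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))  # drifts toward 0.5 as n grows
```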
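27. Histogram
◦ Normal
◦ Skewed
[Figure: three histograms; their summary statistics are listed below]
Average 13.6, Median 10, Mode 1
Average 20, Median 20, Mode 20
Average 26.20, Median 30, Mode 40
In a normal distribution the average, median, and mode coincide; skew pulls them apart.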
28. What variables are available?
What is the overall n?
What is the unit of analysis?
What do I want to know about the variable(s)?
What is the level of measurement of the variable(s)?
Are the variables discrete or continuous?
How many groups will be compared in the analysis?
Am I interested in just describing the data or in drawing inferences from it?
29. Dependent variable
◦ The variable that analysts are trying to explain (in crime mapping, the dependent variable is often some crime measure)
Independent variable
◦ Variables that produce a change in our dependent variable
30. Causal relationship
[Figure: diagrams relating variables X, Y, and Z]
◦ Intervening variable
◦ Antecedent variable
◦ Contingent variable
◦ Multicollinearity
When X, Y, and Z are overlapping measures of the same concept
◦ Spurious relationships
When X and Y have no direct relationship but are both affected by Z
31. Chi-square
T-tests
Z-tests
ANOVA
◦ Essentially, they work by determining whether or not variable distributions or differences between groups or areas would be expected based on random chance
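As one concrete case of the "random chance" logic, a chi-square goodness-of-fit test asks whether counts spread across categories differ from an even spread more than chance alone would produce. The day-of-week burglary counts below are hypothetical; 12.592 is the standard chi-square critical value for 6 degrees of freedom at the 0.05 level.

```python
def chi_square_uniform(observed: list[int]) -> float:
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical burglaries by day of week (Mon..Sun), for illustration only.
burglaries = [20, 18, 22, 25, 30, 40, 35]
stat = chi_square_uniform(burglaries)
CRITICAL_05_DF6 = 12.592  # df = 7 categories - 1 = 6, alpha = 0.05

print(round(stat, 2))          # 14.77
print(stat > CRITICAL_05_DF6)  # True: this uneven spread is unlikely under chance alone
```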
32. Lambda
Gamma
Kendall’s tau statistics
Spearman’s rho
Pearson’s correlation coefficient
◦ To determine the strength and direction of a relationship between two variables
◦ Values between -1 and +1 (lambda, used with nominal data, ranges from 0 to 1 and has no direction)
◦ Inverse/negative or positive relationships possible
[Figure: two scatterplots of Variable 1 against Variable 2]
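Pearson's correlation coefficient is the covariance of two variables scaled by both of their spreads, which is what bounds it between -1 and +1. The sketch writes the formula out by hand; the two short lists are hypothetical values for Variable 1 and Variable 2.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's r: co-variation of x and y divided by the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values for Variable 1 and Variable 2.
var1 = [1, 2, 3, 4, 5]
var2 = [2, 4, 5, 4, 5]
print(round(pearson_r(var1, var2), 3))         # 0.775: positive relationship
print(round(pearson_r([1, 2, 3], [6, 4, 2]), 3))  # -1.0: perfect inverse relationship
```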
33. Spatial Autocorrelation
◦ Moran’s I
A value between 0 and +1 indicates positive spatial autocorrelation (or clustering).
A value between 0 and -1 indicates negative spatial autocorrelation (dispersion); values near 0 (more precisely, near -1/(n - 1) for n areas) indicate a random distribution.
◦ Geary’s C
Values under 1 signify positive spatial autocorrelation
Values over 1 designate negative spatial autocorrelation
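Both statistics compare each area with its neighbors. This is a minimal sketch of the global forms of Moran's I and Geary's C using binary (0/1) contiguity weights; the four-cell grid and its values are hypothetical, and production work would normally use a spatial statistics library (for example, PySAL's esda package) that also reports significance.

```python
def morans_i_and_gearys_c(values, neighbors):
    """Global Moran's I and Geary's C with binary (0/1) spatial weights.

    values    -- attribute value (e.g., crime count) for each area
    neighbors -- neighbors[i] lists the indexes of areas adjacent to area i
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sum_sq = sum(d * d for d in dev)
    w_total = sum(len(nbrs) for nbrs in neighbors)  # number of (i, j) neighbor links

    # Moran's I compares neighbors' deviations from the mean.
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    # Geary's C compares neighbors' values directly with each other.
    sq_diff = sum((values[i] - values[j]) ** 2 for i in range(n) for j in neighbors[i])

    morans_i = (n / w_total) * (cross / sum_sq)
    gearys_c = ((n - 1) * sq_diff) / (2 * w_total * sum_sq)
    return morans_i, gearys_c


# Hypothetical row of four grid cells; each cell touches the cells beside it.
line_neighbors = [[1], [0, 2], [1, 3], [2]]
print(morans_i_and_gearys_c([10, 10, 0, 0], line_neighbors))  # clustered: I > 0, C < 1
print(morans_i_and_gearys_c([10, 0, 10, 0], line_neighbors))  # alternating: I < 0, C > 1
```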
34. Linear relationship
◦ (OLS) Ordinary least squares
$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots$
◦ Units of analysis
Must be the same for every variable in the model
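With a single predictor, the OLS estimates have a closed form: the slope is the co-variation of X and Y divided by the variation of X, and the intercept makes the line pass through the two means. This sketch covers only that one-predictor case (several predictors need matrix methods such as `numpy.linalg.lstsq`); the beat-level data are hypothetical, with X and Y measured on the same unit of analysis.

```python
def ols_simple(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Intercept a and slope b minimizing the sum of squared residuals for Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b


# Hypothetical beats (same unit of analysis for X and Y):
# X = vacant houses per beat, Y = burglaries per beat.
vacant = [2, 4, 6, 8, 10]
burglaries = [5, 9, 10, 15, 16]
a, b = ols_simple(vacant, burglaries)
print(f"Y = {a:.2f} + {b:.2f}X")  # Y = 2.60 + 1.40X
```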
36. Nominal (categories), ordinal, interval, and ratio (quantities) data can be used with different methods
◦ Fills and outlines
[Figure: nominal data example and ratio data example]
37. Category data symbology comes next
It displays data by unique values of a field, or multiple fields
Nominal, ordinal, ratio, or interval data
38. Next comes the quantities symbology method
It uses a number field in the table to display data by classified values
Ratio and interval data
39. Six different ways to classify data, plus a manual method that gives the analyst complete control over the class breaks
40. Equal Interval
Defined Interval
Quantile
Natural Breaks
Geometrical Interval
Standard Deviation
41. Categorical (Qualitative)
◦ Grouping based on some quality
◦ Labels or categories
E.g., Sex = male or female
◦ Nominal or ordinal
Nominal: the order is not important
E.g., Sex = male or female
Ordinal: the order is important
E.g., Rank = Officer, Sergeant, Lieutenant, etc.
◦ Can be binary or non-binary
Binary = only two values (male or female)
Non-binary = more than two values (red, blonde, brunette, etc.)
42. Measurement (Quantitative)
◦ Grouping based on some quantity or value
◦ Always numbers
◦ Discrete or continuous
Discrete = only certain values are possible, and the data could have gaps (1, 2, 3, or 4)
Continuous = any value along some interval (any value between 1 and 4, e.g., 3.24211)
◦ Interval or ratio
In interval data, the interval between values is important (e.g., a temperature of 30 compared to 110 means something)
Ratio data is the best, and the "0" value can be informative (e.g., a grid can have 0 crimes, or any value up to infinity)
44. Equal Interval (ratio, interval)
◦ The range between the classifications is the same
◦ The number of classes desired determines the interval
Take the high value minus the low value and divide by the number of classes; here, each of the 5 classes spans 199.61
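Equal interval classification needs only the minimum, maximum, and number of classes. This is a minimal sketch returning the upper bound of each class; the input values are hypothetical, and pinning the last break to the maximum is an added guard against floating-point drift.

```python
def equal_interval_breaks(values: list[float], classes: int) -> list[float]:
    """Upper bound of each class when the full range is cut into equal-width pieces."""
    low, high = min(values), max(values)
    width = (high - low) / classes  # the slide's 5 classes each span 199.61
    breaks = [low + width * k for k in range(1, classes + 1)]
    breaks[-1] = float(high)  # guard against floating-point drift on the last break
    return breaks


print(equal_interval_breaks([0, 12, 35, 50, 100], classes=5))
# [20.0, 40.0, 60.0, 80.0, 100.0]
```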
45. Defined Interval (ratio, interval)
◦ Similar to the equal interval, but here we define what the interval will be and thus establish the classes
In this case, the interval was set to 150, so the number of classes is determined by the interval
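Defined interval reverses the equal-interval logic: the width is fixed and the class count follows from the data range. A minimal sketch, assuming the first class starts at the data minimum (GIS software may anchor breaks differently); the values are hypothetical and the interval of 150 matches the slide.

```python
import math


def defined_interval_breaks(values: list[float], interval: float) -> list[float]:
    """Class upper bounds every `interval` units from the minimum; the class count follows."""
    low, high = min(values), max(values)
    classes = max(1, math.ceil((high - low) / interval))
    return [low + interval * k for k in range(1, classes + 1)]


# Hypothetical values; a range of 700 at an interval of 150 needs 5 classes.
print(defined_interval_breaks([0, 90, 260, 410, 700], 150))
# [150, 300, 450, 600, 750]
```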
46. Quantile (ratio, interval)
◦ Each class contains an equal number of features, so each class holds the same percentage of the values
Each of the 10 classes has the same number of features, making up 10% of the total records
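Quantile classification sorts the features and cuts them into groups of equal count, so the class widths vary instead. In this sketch each break is the last value in its group; when the count does not divide evenly, group sizes differ by at most one, and tied values can make real-world classes uneven. The data are hypothetical.

```python
import math


def quantile_breaks(values: list[float], classes: int) -> list[float]:
    """Upper bound of each class when features are split into equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    # The last feature of class k sits at rank ceil(n * k / classes).
    return [ordered[math.ceil(n * k / classes) - 1] for k in range(1, classes + 1)]


# Hypothetical values 1..100: 10 classes of 10 features each (10% of records apiece).
print(quantile_breaks(list(range(1, 101)), 10))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```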
47. Natural Breaks (ratio, interval)
◦ Breaks the data where there are natural gaps between values
Use the test exam score example
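GIS software commonly implements natural breaks with Jenks' method, which chooses the breaks that minimize the variation inside each class. This brute-force sketch tries every possible split of the sorted values and keeps the one with the smallest total within-class sum of squares; it is only practical for small data sets (real implementations use faster dynamic-programming versions). The example reuses the values listed under Measures of Central Tendency as stand-in exam scores.

```python
from itertools import combinations


def within_class_ss(group: list[float]) -> float:
    """Sum of squared deviations from the class mean."""
    mean = sum(group) / len(group)
    return sum((v - mean) ** 2 for v in group)


def natural_breaks(values: list[float], classes: int) -> list[float]:
    """Brute-force Jenks-style breaks: minimize total within-class sum of squares."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_bounds = float("inf"), None
    # A cut at position p means a new class starts at ordered[p].
    for cuts in combinations(range(1, n), classes - 1):
        bounds = (0, *cuts, n)
        cost = sum(within_class_ss(ordered[s:e]) for s, e in zip(bounds, bounds[1:]))
        if cost < best_cost:
            best_cost, best_bounds = cost, bounds
    # Report the upper value of each class.
    return [ordered[e - 1] for e in best_bounds[1:]]


scores = [25, 55, 56, 65, 72, 82, 82, 84, 90, 97]
print(natural_breaks(scores, 3))
```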
48. Geometrical Interval (ratio, interval)
◦ This is a classification scheme where the class breaks are based on class intervals that have a geometrical series. This ensures that each class range has approximately the same number of values within each class and that the change between intervals is fairly consistent.
The interval is determined by a geometric equation (large and small changes depending on breaks in the data)
49. Standard Deviation (ratio, interval)
◦ Classes are determined by the mean and standard deviation of values. Can display by 1, ½, or ¼ standard deviations as needed
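Standard deviation classification places breaks at the mean and at fixed steps of the standard deviation above and below it, so each class shows how far an area sits from average. A minimal sketch, assuming breaks centered on the mean and the population standard deviation (GIS software may center classes differently); `fraction` selects the 1, ½, or ¼ step size, and the counts are hypothetical.

```python
import statistics


def std_dev_breaks(values: list[float], fraction: float = 1.0, steps: int = 2) -> list[float]:
    """Breaks at the mean and at +/- 1..steps multiples of (fraction * standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [mean + k * fraction * sd for k in range(-steps, steps + 1)]


crimes_per_grid = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical counts: mean 5, sd 2
print(std_dev_breaks(crimes_per_grid))                # [1.0, 3.0, 5.0, 7.0, 9.0]
print(std_dev_breaks(crimes_per_grid, fraction=0.5))  # [3.0, 4.0, 5.0, 6.0, 7.0]
```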
50. Getting to know your data, and the factors that influence crime, can help analysts create more useful maps and analysis products and support problem solving
Handling data properly will keep you from making incorrect assumptions and coming to unrealistic conclusions
Remember the wheel of science