Prof Ashis Sarkar, W.B.S.E.S.
Managing Editor & Publisher: Indian Journal of Spatial Science
www.indiansss.org
Sampling and Probability in Geography
7-Day Workshop (11 – 17 July, 2017)
On
“Statistical Methods and Techniques of Spatial Analysis”
organized by
Women’s Christian College, Kolkata
in collaboration with
Department of Geography, University of Calcutta
Why “Statistics” in Geography??
1. Geography is rooted in ancient practice, concerned with the
characteristics of places, in particular the interrelations between natural
habitat, economy and society.
2. Its unique identity was given by Eratosthenes (276–194 BC): geo+graphein
= “earth description.” It seeks to answer the questions of — why things are
a) as they are, and
b) where they are.
Thus, it is concerned with the characteristics of places:
1. Location and
2. Attributes at Locations
3. Currently, Geography seeks —
a) to classify phenomena,
b) to compare,
c) to generalize,
d) to ascend from effects to causes, and in doing so,
e) to trace out the laws of nature and to mark their influences upon man,
thereby simply making it 'a description of the world’.
4. Hence, it is a SCIENCE — of argument and reason, and of course, of
cause and effect.
1. It helps us to understand the principles of living with nature….
2. It teaches us to put things in perspective….
3. It helps us to understand, live and exist in the current world peacefully
with others…
4. It helps us to ensure social order, social justice and sustainable smart
living….
5. It builds a bridge between the natural / physical and social sciences..
The spatial order on the earth’s surface can be defined, identified and
analysed (measured, monitored, mapped and modelled) with a scientific
geographical mind in space–time frame.
It forms the philosophical foundation of the discipline of Geography, which is
both inter-disciplinary and multi-disciplinary.
The effects of the Quantitative Revolution (QR, late 1950s–80s) are manifested in the following ways —
1. Statistical techniques remain a key and virtually universal element in
the training of the new breed of quantitative geographers working within the
positivist approach.
2. The RS/GIS revolution renewed interest in Spatial Statistics
(e.g., TFL and TSL).
3. Besides, statistical techniques have also helped Marxist, Structuralist,
Political-Economic and even Behaviouralist geographers.
1. Geographers study —
a) how and why elements differ from place to place, as well as
b) how spatial patterns change through time.
2. Thus, geographers begin with the question ‘where?',
exploring how features are distributed on a physical or
cultural landscape,
and observing the spatial patterns and their trends.
3. Contemporary geographical analysis has shifted from ‘where’
to ‘why’—
a) why a specific spatial pattern exists,
b) what spatial processes have produced such a pattern, and
c) why such processes operate.
4. Only via these 'why' questions do geographers begin to
understand the mechanisms of change, which are, in fact,
infinite in their complexity.
Geography needs it because―
1. These help summarize the findings
of studies (e.g.: total rainfall during a
period in a state),
2. These help in understanding the
phenomenon under study (e.g.:
rainfall is higher in the southern
districts),
3. These help forecast the state of
variables (e.g.: drought is likely to
occur next year),
4. These help evaluate the performance
of certain activities (e.g.: more rainfall
means more rice production),
5. These help decision making (e.g.:
finding out the best location),
6. They also help to establish whether
relationships between the
‘characteristics’ of a set of
observations are genuine or not, and
thus make a valuable contribution to
Geographical knowledge.
Geographers use Statistical
Methods —
1. To describe and summarize
spatial data.
2. To make generalizations
about complex spatial
patterns.
3. To estimate the probability of
outcomes for an event at a
given location.
4. To use sample data to infer
characteristics for a larger set
of geographic data.
5. To determine if the magnitude
or frequency of some
phenomenon differs from one
location to another.
6. To learn whether an actual
spatial pattern matches some
expected pattern.
1. Statistical analysis in Geography
is unique as it concerns ‘spatial’ or
‘geographically referenced’ data
(with co-ordinates).
2. The variety of techniques being
almost infinite, the GDA has to
pick and choose the best suited
one for his specific job.
3. Today user-friendly software
packages are easily available,
e.g., SPSS, Statistica, R, etc
4. Processing of geographical data
involves the ‘application of
suitable’ techniques of spatial
statistics.
5. Its presentation requires the
‘application of the most suited’
cartographic techniques.
6. Its interpretation needs the ‘wisest
use of geographical principles’
leading ultimately to the scientific
geographical explanation.
Spatial data have difficulties associated with
their analysis: boundary delineation,
modifiable areal units, and the level of
spatial aggregation or scale. That is why
methods of statistical analysis vary:
1. A study of per capita income within a city,
if confined to the inner core, shows that
income levels are lower; if the whole city
is taken, the average becomes higher. The
same is true of the way internal boundaries
are determined. Thus, interpretations are valid
only for the particular area and sub-area
configuration of a region.
2. In Census, the available information is
hierarchically arranged. The GDA must
use the same level for comparison.
3. Socio-economic data are available at a
variety of scales, e.g., Ward, Municipality,
Mouza, GP, Block, District, and State:
When these data are aggregated at different scales,
the resulting descriptive statistics may
exhibit variations, either in a systematic,
predictable way or in a more uncertain
fashion (see the sketch below).
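A rough numerical sketch of this scale effect (the ward and municipality figures are hypothetical, not from the presentation): the same observations, summarised at a coarser areal unit, give a similar mean but a much smaller spread.

```python
# A rough sketch (hypothetical numbers) of the scale effect described above:
# the same observations, aggregated to a coarser areal unit, give a similar
# mean but a much smaller spread.
import numpy as np

rng = np.random.default_rng(42)

# 40 hypothetical wards grouped into 8 municipalities of 5 wards each
ward_income = rng.normal(loc=50_000, scale=12_000, size=40)
municipality = np.repeat(np.arange(8), 5)

# Ward-level description
print(f"Ward level:         mean = {ward_income.mean():.0f}, sd = {ward_income.std(ddof=1):.0f}")

# Municipality-level description: average the wards within each unit first
muni_income = np.array([ward_income[municipality == m].mean() for m in range(8)])
print(f"Municipality level: mean = {muni_income.mean():.0f}, sd = {muni_income.std(ddof=1):.0f}")
# The mean barely moves, but the standard deviation shrinks at the coarser
# scale: one face of the modifiable areal unit / scale-of-aggregation problem.
```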
Nature of Geographical Data
Geographical data have two important properties —
1. the reference to geographic space, which means the data are
registered to a geographical co-ordinate system so that data from
different sources can be cross – referenced and integrated, and
2. the representation at geographical scale; data are normally recorded at
relatively small scales and are often generalised and symbolised.
Therefore, geographical data / spatial data / map data are essentially scaled
spatial data. They fall into three categories —
1. the geodetic control network (GCN) {Reference Map},
2. the topographic base (tBase) {Contour Map}, and
3. the geographic overlays (GrO) {Thematic Map}
Geographical data are recorded using four different means —
1. Alphanumeric characters or texts (Document Form: document data),
2. Numerical values (Numeric Form: numeric data),
3. Symbols or signs (Graphical Form: graph/diagram/map data), and
4. Signals (Digital Form: digital data)
Geographical data concern facts about the geographic reality and
geographical facts are always inter-subjective and reliable.
Inter-subjective ⇒ repeated observations of the same phenomena by
different people yield the same factual statement
Reliable ⇒ an observer repeatedly recording the same
phenomena produces the same factual statement
A geographer’s data matrix (GDM) is a matrix of individuals (objects or events)
against attributes (various observations made on the attributes of these
individuals).
Places   Co-ordinates (x, y)   Attributes a1 … aN
1        ..    ..              ..   …   ..
2        ..    ..              ..   …   ..
3        ..    ..              ..   …   ..
…
N        ..    ..              ..   …   ..
1. A single data set typically contains
information on many variables and
many objects.
2. Conventionally, variables are entered
in columns and objects in rows.
3. Thus, each measurement on a row
genuinely concerns the same object
and all data in a single column relate
to the same variable.
4. The entire contents of a column are
normally analysed together such that
the analysis essentially concerns
comparable data (GDM).
Real world features occur in two fundamental forms –
1. Objects (these are discrete and definite: e.g., hills, rivers, forests,
mines, highways, cities, bio-reserves) and
2. Phenomena (these are continuously distributed over a geographical
space: e.g., terrain, slope, rainfall, temperature, pollution level and
other environmental indices).
A spatial object has three essential characteristics –
1. It can be delineated by identifiable boundaries,
2. it can be described by one or more attributes, and
3. it is relevant to some intended application.
Spatial objects may be:
1. “exact objects”, when they have well defined boundaries (e.g.,
landholding, building, etc) and
2. “inexact objects” or “fuzzy entities”, when such boundaries are not well
defined (e.g., landform features and natural resource features).
Graphically, spatial objects are represented as three geometric elements –
points (0D), lines (1D), and polygons (2D).
There are two distinct ways of representing the real world in a geographic
database –
1. the object-based model (it views geographic space as populated by sets of
objects, for which data are obtained both by field surveying and laboratory
analysis. In such databases, attributes are arranged against locations
defined by co-ordinate lists or vector lines. Hence, it is called the “vector
data structure”), and
2. the field-based model (it views geographic space as populated by one or
more phenomena, that are basically represented as “surfaces”; these are
often conceptualised as being made up of a number of spatial data units.
Data for these are normally arranged against locations defined by
elemental tessellation in the form of “square” or “rectangular” cells. Hence,
such data structures are called “raster data models”).
The field-based spatial data can be obtained either
1. Directly from aerial photography, satellite imagery, map scanning and field
measurements at selected or sampled locations (e.g., data for triangulated
irregular networks or TIN) or
2. Indirectly generated applying mathematical functions, e.g., contours and
digital elevation models (DEM).
Geographical data are often collected on the basis of Areal Units—
1. These may be Natural, e.g., plain, plateau, mountains, etc.
2. It can well be Artificial, e.g.,
• Singular units (e.g., individual households or landholding) or
• Collective / Administrative divisions (i.e., regions made up of many
such landholdings).
These units are often arranged into some sort of hierarchical structure—
India = {States} State = {Districts} Districts = {CD Blocks}
CDB = {Gram Panchayats} GP = {Landholdings/Households}
At each level these have different but nested hierarchic identification code that
makes them unique (Census).
In such a perfect set-hierarchy, comparisons can only be made between similar
individuals occupying the same level in the hierarchy:
GP vs. GP, CDB vs. CDB, District vs. District, State vs. State
Thus, GP level inferences may not hold good for the CDB / District / State level.
1. A “high level” to “low level” analysis yields a contextual relationship and
2. A “low level” to “high level” analysis yields an aggregate relationship.
These are unique and require their own particular modes of thought about data
collection, storage, and procedures for manipulation and analysis.
Based on “simple scalar systems” for measuring objects or their attributes,
Stevens (1959) describes four kinds of measurement models or scales with one
or two hybrid variants ―
1. Nominal or Categorical Data: It provides a device for labelling or classifying
objects rather than measuring their attributes. These are qualitative data,
often presented in the form of names. Special “statistical techniques” are
used for this sort of data.
2. Ordinal Data: In this, a set of objects is ordered from the "most" to the
"least", but there is no information regarding the actual values of
the measured attributes.
Thus, it produces poor quality data and is not valid for algebraic operations
(i.e., addition, subtraction, multiplication or division) and requires the use of
non-parametric statistical methods for analysis.
3. Interval Data: In this, one can not only rank-order objects with respect to a
measured attribute but can also specify how far apart the magnitudes
are from each other.
However, it has no natural origin: lack of “absolute zero”.
4. Ratio Data: It incorporates equivalence of ratios and starts from an absolute
zero. It is most precise as it uses the real number system and allows
continuous measurement. Units can be converted directly from one system
to another. It can be divided in various ways, as follows ―
a) Continuous and Discrete data.
b) Directional and Non-directional data.
c) Closed and Open data.
d) Positive and Negative data.
e) Point (place) and Period (time) data.
f) Spatial and Non-spatial/ Attribute data.
Scale of Measurement is very important, because it determines the Techniques
and Measures of —
1. Data Description (…)
2. Correlation and Association between and among Variables (…)
3. Regression and Estimation (….)
4. Comparison and Tests of Significance (…)
5. Randomness, Order and Regularity (…)
6. Forecasting and Decision Making (…)
7. Understanding the Phenomena (…)
8. Cartographic Presentation (…)
1. There is no such thing as an exact measurement.
2. All measurements contain error, the magnitude of which depends on the
instruments used and on the ability of the observer to use them.
3. As the true value is never known, the true error is never determined.
4. The degree of accuracy, or its precision, can only be called a relative
accuracy.
5. The estimated error is shown as a fraction of the measured quantity. Thus
100 ft measured with an estimated error of 1 inch represents a relative
accuracy of 1 / 1200. An error of 1 cm in 100 m = 1 / 10,000.
6. Where readings are taken on a graduated scale to the nearest subdivision,
the maximum error in estimation will be ± 1/2 division.
7. Repeated measurement increases the accuracy by √n, where n = number
of repetitions. (N.B. This cannot be applied indefinitely; see the sketch after this list.)
8. Agreement between repeated measurements does not imply accuracy but
only consistency.
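A small simulation sketch of point 7 (illustrative only; it assumes unbiased readings with a random error of one unit): the spread of the average of n repeated measurements shrinks roughly as 1/√n.

```python
# Simulation of point 7 (assumed measurement model: unbiased readings with
# random error of sd = 1 unit): averaging n repeated measurements narrows the
# spread of the result roughly by a factor of sqrt(n).
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0

for n in (1, 4, 16, 64):
    # 10,000 simulated experiments, each reporting the mean of n noisy readings
    means = rng.normal(true_value, 1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: sd of averaged result = {means.std(ddof=1):.3f} "
          f"(1/sqrt(n) = {1 / np.sqrt(n):.3f})")
```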
To Note…
Statistical Tests / Methods and Measurement Scale

1-Component Case
  Nominal (or any) scale data: frequency (f), Mode (Mo)
  Ordinal, Interval and Ratio data: Median (Me), Percentiles (Px)
  Interval and Ratio data: Mean, Variance
  Ratio data: GM, HM, CV

2-Component Case
  Nominal scale data: Chi-square, Contingency Coefficient
  Ordinal scale data: U Test, Spearman’s rs, Kendall’s τ
  Interval + Ratio scale data: Comparison of Means, Comparison of Variances, Pearson’s r, Linear and Non-Linear Regression

Multi-Component Case
  Interval + Ratio scale data: Multiple Variance Analysis, Co-Variance Analysis, Multiple Correlation, Multiple Regression
Scale of Measurement and Statistical Measures and Tests

Scale System: Nominal
  Defining relation: Equivalence
  Possible transformations: y = f(x), where f(x) is any one-to-one substitution
  Central tendency: Mode
  Dispersion: % in the Mode
  Tests (Non-parametric): Chi-square, Contingency Coefficient, Goodman-Kruskal's Lambda, Phi coefficient

Scale System: Ordinal
  Defining relations: Equivalence; Greater than
  Possible transformations: y = f(x), where f(x) is any increasing monotonic transformation
  Central tendency: Median
  Dispersion: Percentiles
  Tests (Non-parametric): Spearman's Rho, Kendall's Tau, Kolmogorov-Smirnov, Goodman-Kruskal's Gamma, Phi coefficient

Scale System: Interval
  Defining relations: Equivalence; Greater than; Known ratio of any two intervals
  Possible transformations: any linear transformation y = a·x + b (a > 0)
  Central tendency: Mean
  Dispersion: Standard Deviation
  Tests (Parametric and non-parametric): t-test, F-test, Pearson's r, Point Biserial, etc.

Scale System: Ratio
  Defining relations: Equivalence; Greater than; Known ratio of any two intervals; Known ratio of any two scale values
  Possible transformations: y = c·x (c > 0)
  Central tendency: Geometric Mean
  Dispersion: Coefficient of Variation
  Tests (Parametric and non-parametric): t-test, F-test, Pearson's r, Point Biserial, etc.
Mapping Techniques and Measurement Scale

Levels of Measurement and suitable Mapping Techniques:
  Nominal Scale: Chorochromatic Map, Choroschematic Map
  Ordinal Scale: Flowline Map, Ray Map, Qualitative Dot Map
  Interval and Ratio Scale: Isarithmic Maps, Quantitative Dot Map, Choropleth Map, Animated Map, Landform Map, Cartograms, Single Component Map, Bi-component Map, Multi-component Maps
Why Sampling??
Sources of Geographical Information
1. Field Observations
a) Quantitative Measurements
b) Qualitative Observations
2. Archival Sources
a) Areal Archives: Maps, Air-photo, Satellite Image
b) Non-areal Sources: Census Data
3. Theoretical Works
a) Mathematical Models
b) Analogue Models
i. Physical Simulation Models
ii. Monte Carlo Simulation Models
1. Locational Data are collected mainly for non-geographical purposes.
2. Geographers depend on the original accuracy of the Survey.
3. Data are released in ‘bundles’, i.e., administrative areas which are
inconvenient and anachronistic and pose extremely acute problems in
mapping and interpretations.
1. Sampling is the process of selecting units (e.g., people,
organizations) from a population of interest so that by studying
the sample we may fairly generalize our results back to the
population from which they were chosen.
2. It is necessary because we usually cannot gather data from
the entire population, owing to its size or inaccessibility or
to a lack of resources.
3. It is used to draw inferences about the population as a whole.
4. The subset of units that are selected is called a sample.
Sampling
The purpose of geography is ‘...... to provide accurate, orderly, and rational
description and interpretation of the variable character of the earth’s surface’
(Hartshorne, 1959).
This can only be achieved in two particular ways ―
1. first, by increasing the database to an optimum level, and
2. second by taking representative samples.
Principles of Sampling
1. A statistical population (universe) is defined as the total set of measurements
which could be taken from the entity being studied.
2. A geographical population is usually too large for us to have either the time or
money to examine it in full.
3. Therefore, we select from the population some sub-set of individuals for
study. These are called ‘samples’, and the selection procedure, ‘sampling’.
For example, all students of a college may be considered the population, while
the students of a particular class may be taken as a sample.
A sample of entities should faithfully represent its parent population, so that
estimates from the sample are accurate estimates of the corresponding figures
in the whole population. The equivalent actual figure in the population is known
as a parameter.
The fundamental principles of sampling are ―
1. Sample units must be chosen in a systematic and objective manner
2. They must be clearly defined and easily identifiable
3. They must be independent of one another
4. Same units of samples should be used throughout the study
5. Selection process should be based on sound criteria and should avoid
error, bias and distortions.
1. Several inferential statistical tests exist to check whether there is any
significant difference between a sample statistic and the population
parameter. If the difference is negligible, the sample is representative.
2. However, inaccuracy may still arise from errors in measurement,
computation, and data processing.
3. Therefore, the answer to the question of which sampling method to use is:
choose the method that gives the most representative results.
4. The number of items to be included in a sample leads to the concept of the
sampling fraction, which can be estimated using techniques available for
each type of sampling.
Basically, the following methods are there ―
A. Non-probability Sampling: based on researcher’s subjective mind
1. Judgmental or Purposive (when only minimal information about the
parent population is available, or only certain sites are accessible)
2. Convenience or Accessibility (based on the ease of access to the sites
or members of population)
3. Quota (a combination of the above two: first the no. of observations are
decided and then samples are finally identified from these)
4. Volunteer (when certain members of the population / respondents
volunteer information)
B. Probability Sampling: based on probability, i.e., an element of chance exists
as to whether a particular observation is included in a sample.
1. Random: Samples are selected by chance methods or random numbers.
Random numbers can be generated on a calculator or a PC.
2. Systematic: By numbering each subject and then selecting every k-th
subject.
3. Stratified: The population is first divided into groups (strata) according
to characteristics important to the study. Samples are then selected randomly
within each stratum.
4. Cluster: The population is divided into groups (clusters) by some means
(e.g., geographic area). Some clusters are then randomly selected. (A code
sketch of these four designs follows the examples below.)
For Example:
1. geomorphologists collect slope data often based on quota sampling;
2. meteorologists capture weather patterns by sampling at measuring stations,
knowing that conditions between these sample locations will vary
systematically and smoothly.
3. Census data are published in spatially aggregated form and the TFL ensures
that the variance within each aggregate unit is not so large as to render the
aggregated data meaningless.
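A minimal Python sketch of the four probability designs listed above, applied to a hypothetical frame of 100 numbered households; the frame, the two strata and the cluster size are illustrative assumptions, not part of the original text.

```python
# Four probability designs applied to a hypothetical frame of 100 households.
import random

random.seed(1)
frame = list(range(1, 101))                       # household IDs 1..100

# 1. Simple random: chance alone decides which units enter the sample
simple_random = random.sample(frame, k=10)

# 2. Systematic: every k-th unit after a random start
k = 10
start = random.randrange(k)
systematic = frame[start::k]

# 3. Stratified: random selection within each stratum (two assumed strata)
strata = {"urban": frame[:60], "rural": frame[60:]}
stratified = {name: random.sample(units, k=5) for name, units in strata.items()}

# 4. Cluster: randomly pick whole groups (ten assumed clusters of ten units)
clusters = [frame[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = random.sample(clusters, k=2)     # every unit of 2 chosen clusters

print(simple_random, systematic, stratified, cluster_sample, sep="\n")
```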
1. The ability to represent geographical features as points is certainly scale-
dependent:
a school may be a point feature on 1: 50000 scale maps, but not
on a 1: 2500 scale map.
2. In geography, the point distribution is the most basic and fundamental: lines
are simply ordered sets of points, and areas are polygons (closed traverses)
formed by lines. Hence, the sampling unit in most geographic studies is
either a point or a quadrat.
3. The basic purpose of analysis is to identify and distinguish homogeneity vs.
heterogeneity, isotropy vs. anisotropy, and randomness.
There are five types of Spatial Sampling ―
1. Random: By using a grid overlay on a map, a population of co-ordinate
points can be generated, each one of which can be identified by its x and y
– coordinates, and random numbers (from Tables, Calculators, PCs, etc)
can then be used for the selection of random samples.
2. Regular or Gridded: (also called, systematic area sampling planned on
perfectly square, or rectangular or triangular grids) The area is first divided
into uniform grids and samples are selected in a regular manner from each
grid.
3. Clustered: It is usually not planned, but often forced by patchy distribution
of objects.
4. Uniform or Systematic Random: It is initiated by the random selection of
points or quadrats; the remainder of the sample is then selected following a
predetermined plan (e.g., planned by randomization within grid squares).
1. There are two basic designs for this ― one has points aligned in a
checkerboard fashion, while in the other, the points are non-aligned.
2. In the first case, the area is divided into a set of grids all containing the
same number of points. One point is chosen randomly in the first cell;
the corresponding point locations in other cells are then selected as the
remaining sample members.
3. In the second case, the first point is chosen as above. Its x – coordinate
is then held constant for all the remaining grids across the top row and a
point in each of these squares is then chosen by randomly selecting new
y – coordinates. Similarly, for other cells in the first column, the y –
coordinate of the point in the first cell is now held constant, while the x –
coordinate is randomized (a code sketch of this construction follows this list).
5. Traverse: The use of cross-sectional lines and traverses has always been
favoured in geographic work and has been found more efficient for estimating
spatial patterns. It is also common practice where sampling is constrained by
access and exposure, e.g., along contours, rivers, coasts, lakes,
volcanoes, roads, etc.
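A sketch of two of the point designs above on an assumed square study area divided into unit grid cells (grid and cell sizes are illustrative): simple random points, and one stratified systematic unaligned point per cell. The rule used for the interior cells (combining the row's x-offset with the column's y-offset) is the usual completion of the construction described in type 4 and is assumed here.

```python
# Simple random points and stratified systematic unaligned points on a grid.
import random

random.seed(7)
NROWS = NCOLS = 4          # 4 x 4 grid of cells (assumed)
CELL = 1.0                 # cell side length (arbitrary units)

# 1. Simple random spatial sample: any (x, y) within the study area
random_points = [(random.uniform(0, NCOLS * CELL), random.uniform(0, NROWS * CELL))
                 for _ in range(NROWS * NCOLS)]

# 2. Stratified systematic unaligned sample: one point per cell, with the
#    x-offset held constant along each row and the y-offset along each column
x_off = [random.uniform(0, CELL) for _ in range(NROWS)]   # one x-offset per row
y_off = [random.uniform(0, CELL) for _ in range(NCOLS)]   # one y-offset per column
unaligned_points = [(j * CELL + x_off[i], i * CELL + y_off[j])
                    for i in range(NROWS) for j in range(NCOLS)]

print(len(random_points), "random points;", len(unaligned_points), "unaligned points")
```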
As no single technique can satisfy a geographer’s objective, he is
free to make his own sampling design, e.g., quota, multi-phase,
multi-stage, etc (Haggett, 1968), as follows ―
Data Collecting System
  1. Purposive or Hunch Sampling
  2. Controlled Sampling
     a) Random Designs: with purposive stratification; with systematic stratification; nested or hierarchical designs
     b) Systematic Designs: unaligned designs; stratified systematic unaligned designs
  3. Complete Survey
1. Any geographical population certainly has some internal variation, which
may be either random or systematic.
2. However, we may wish to compute statistics relating to just the overall
characteristics of the population.
3. If heterogeneity is apparent at the outset, the whole should be broken into
sub-populations and sampled and analysed separately.
4. In the special case of gradients or trends of variation, this must be defined
by traverses parallel to the gradient.
5. When variation is not so distinctive, all parts of the area need to be explored
equally. Hence, a uniform or gridded design is preferable as random
distributions may be too uneven.
6. As a general rule of thumb, a sample size of 30 is usually sufficient for the
sampling distribution of the means to be closely approximated by a normal
distribution.
Thus, the acquisition of data comprises the following sequence:
define the population → construct a sample frame → select
a sampling design → specify the characteristics to be
measured for each element of the sample → collect the data
Summary
1. There are many methods of sampling when doing research.
2. Simple random sampling is the ideal.
3. But researchers seldom have the luxury of time or money to access the
whole population.
4. Hence, many compromises often have to be made.

Probability Methods
This is the best overall group of methods to use, as the most powerful
statistical analyses can subsequently be applied to the results.
Method: Best when
1. Simple Random Sampling: the whole population is available.
2. Stratified Sampling (random within target groups): there are specific sub-groups to investigate (e.g., demographic groupings).
3. Systematic Sampling (every nth person): a stream of representative people is available (e.g., in the street).
4. Cluster Sampling: population groups are separated and access to all is difficult, e.g., in many distant cities.

Quota Methods
1. For a particular analysis and valid results, you can determine the number of
people you need to sample.
2. In particular, when you are studying a number of groups and the sub-groups
are small, you will need equivalent numbers in each to enable comparable
analysis and conclusions.
Method: Best when
1. Quota Sampling (get only as many as you need): you have access to a wide population, including sub-groups.
2. Proportionate Quota Sampling (in proportion to population sub-groups): you know the population distribution across groups, and normal sampling may not give enough in minority groups.
3. Non-Proportionate Quota Sampling (minimum number from each sub-group): there is likely to be a wide variation in the characteristic within minority groups.
Selective Methods
Sometimes your study leads you to target particular groups.
Method: Best when
1. Purposive Sampling (based on intent): you are studying particular groups.
2. Expert Sampling (seeking ‘experts’): you want expert opinion.
3. Snowball Sampling (ask for recommendations): you seek similar subjects (e.g., young drinkers).
4. Modal Instance Sampling (focus on typical people): the sought 'typical' opinion may get lost in a wider study, and you are able to identify the 'typical' group.
5. Diversity Sampling (deliberately seeking variation): you are specifically seeking differences, e.g., to identify sub-groups or potential conflicts.
Convenience Methods
1. Good sampling is time-consuming and expensive.
2. Not all experimenters have the time or funds to use more accurate
methods.
3. There is a price, of course, in the potential limited validity of results.
Method: Best when
1. Snowball Sampling (ask for recommendations): you are ethically and socially able to ask for and seek similar subjects.
2. Convenience Sampling (use who's available): you cannot proactively seek out subjects.
3. Judgement Sampling (guess a good-enough sample): you are expert and there is no other choice.
Ethnographic Methods
1. When doing field-based observations, it is often impossible to intrude into
the lives of people you are studying.
2. Samples must thus be surreptitious.
3. It may be based more on who is available and willing to participate in any
interviews or studies.
Method: Best when
1. Selective Sampling (gut feel): focus is needed on a particular group, location, subject, etc.
2. Theoretical Sampling (testing a theory): theories are emerging and focused sampling may help clarify them.
3. Convenience Sampling (use who's available): you cannot proactively seek out subjects.
4. Judgement Sampling (guess a good-enough sample): you are expert and there is no other choice.
Estimates from Samples
1) Population Mean
For this, the Standard Error of Mean (SEm ) is first calculated using the
following equation — SEm = s / √n, where, s = sample standard deviation,
and n = size of sample.
The equation has been contrived in such a way that —
1. Population Mean = Sample Mean ± SEm with 0.682 probability
2. Population Mean = Sample Mean ± 2SEm with 0.954 probability
3. Population Mean = Sample Mean ± 3SEm with 0.997 probability
Thus with a certain probability, the range of population mean can be easily
estimated.
Example:
For a sample of size 100, let the Sample Mean = 12.34 and s = 2.56
Therefore, SEm = s / √n = 2.56 / √100 = 0.256
Thus,
Population Mean lies between 12.084 and 12.596 at 0.682 probability
Population Mean lies between 11.828 and 12.852 at 0.954 probability
Population Mean lies between 11.572 and 13.108 at 0.997 probability
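The worked example above, transcribed directly into a short Python sketch (same figures: n = 100, sample mean = 12.34, s = 2.56):

```python
# Confidence ranges for the population mean, as in the example above.
import math

n, sample_mean, s = 100, 12.34, 2.56
se_mean = s / math.sqrt(n)                    # SEm = s / sqrt(n) = 0.256

for k, prob in ((1, 0.682), (2, 0.954), (3, 0.997)):
    lo, hi = sample_mean - k * se_mean, sample_mean + k * se_mean
    print(f"Population mean lies between {lo:.3f} and {hi:.3f} at {prob} probability")
```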
2) Population Standard Deviation
The Standard Error of the Standard Deviation (SES ) is first calculated from the
following Equation —
SES = s / √(2n)
where, s = sample standard deviation, and n = size of the sample.
The equation has been contrived in such a way that —
1. σ= s ± SES with 0.682 probability
2. σ= s ± 2SES with 0.954 probability
3. σ= s ± 3SES with 0.997 probability
Thus with a certain chosen probability, the range of population standard
deviation (σ) can be easily estimated.
Example
For population standard deviation, SES = s / √(2n)= 2.56 / √(2x100)= 0.181
Thus,
Population SD lies between 2.379 and 2.741 at 0.682 probability
Population SD lies between 2.198 and 2.922 at 0.954 probability
Population SD lies between 2.017 and 3.103 at 0.997 probability
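The corresponding sketch for the population standard deviation, again using the figures of the example (s = 2.56, n = 100):

```python
# Confidence ranges for the population standard deviation.
import math

n, s = 100, 2.56
se_sd = s / math.sqrt(2 * n)                  # SEs = s / sqrt(2n) = 0.181

for k, prob in ((1, 0.682), (2, 0.954), (3, 0.997)):
    print(f"Population SD lies between {s - k * se_sd:.3f} and "
          f"{s + k * se_sd:.3f} at {prob} probability")
```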
3) Proportion of Population
The standard error of a percentage is first estimated, as follows —
SE% = √(p.q / n)
where p is the percentage of a sample possessing the particular attribute, q is
the percentage of the sample not possessing that attribute, and n is the number
of individuals in the sample.
The Population % can be easily estimated by using equations, as follows —
1. Population % = Sample % ± SE% with 0.682 probability
2. Population % = Sample % ± 2SE% with 0.954 probability
3. Population % = Sample % ± 3SE% with 0.997 probability
Example
Let, in a household survey, a sample of 100 produce an estimate of 75% for
the percentage of households having a broadband connection (p). Therefore,
the estimated percentage of households not having a broadband connection
is 25% (q = 100 – 75).
SE% = √(p.q / n) = √(75x25 /100) = 4.33
Hence,
Proportion of HH in the Population with a BB connection lies between 70.67%
and 79.33% at 0.682 probability, between 66.34% and 83.66% at 0.954
probability and between 62.01% and 87.99% at 0.997 probability.
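The broadband example in code form (p = 75%, q = 25%, n = 100, exactly as in the worked example):

```python
# Confidence ranges for the population percentage.
import math

p, q, n = 75.0, 25.0, 100
se_pct = math.sqrt(p * q / n)                 # SE% = sqrt(p.q / n) = 4.33

for k, prob in ((1, 0.682), (2, 0.954), (3, 0.997)):
    print(f"Population % lies between {p - k * se_pct:.2f}% and "
          f"{p + k * se_pct:.2f}% at {prob} probability")
```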
4) Sample Size
Sample size (n) can also be estimated from a known s (or p and q) and a
tolerable SEm, SEs or SE%, within the chosen confidence limits, as follows —
n = (s/SEm)² at 0.682, (2s/SEm)² at 0.954, and (3s/SEm)² at 0.997 probability;
n = ½(s/SEs)² at 0.682, ½(2s/SEs)² at 0.954, and ½(3s/SEs)² at 0.997 probability;
n = p·q/(SE%)² at 0.682, 4p·q/(SE%)² at 0.954, and 9p·q/(SE%)² at 0.997 probability.
(Here SEm, SEs and SE% denote the largest standard errors one is prepared to tolerate.)
Example: For s = 3.45 and SEm= 0.25
at 0.682 probability, n = (s/SEm)2 = (3.45/0.25)2 = 190
at 0.954 probability, n = (2s/SEm)2 = (2x3.45/0.25)2= 762
at 0.997 probability, n = (3s/SEm)2 = (3x3.45/0.25)2= 1714
Note:
1. Sometimes it may happen that the computed n turns out larger than the population
itself; this is a major drawback of the method.
2. It should therefore be used with judgement; it is most appropriate in situations
where the population is very large (effectively uncountable).
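A sketch of the sample-size calculation for the mean, reproducing the example above (s = 3.45, tolerable SEm = 0.25); results are rounded to whole observations as in the text.

```python
# Sample size from the mean formula n = (k.s / SEm)^2.
s, se_m = 3.45, 0.25

for k, prob in ((1, 0.682), (2, 0.954), (3, 0.997)):
    n = round((k * s / se_m) ** 2)            # 190, 762, 1714
    print(f"n = {n} at {prob} probability")
```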
Why Probability??
One thing about “future”, of which we can be “certain” is
that it will be “utterly fantastic”
1. Probability is the fundamental building block for inferential statistics.
2. It uses theory to make confidence statements about the
characteristics of populations based on sample information, or to
test hypotheses.
3. It provides a quantitative description of the likely occurrence of a
particular event, expressed on a scale from 0 to 1 (or 0 to
100%).
4. A rare event has a probability of occurrence close to 0, while a very
common event has a probability of occurrence close to 1.
5. We encounter probability statements on an almost daily basis, e.g.,
chance of rain, or chance of a major earthquake, etc
6. It can be obtained in different ways. Some are purely subjective,
based on 'gut feeling' or 'best guess’, while others are either based
on observation or derived from theory.
A common way of obtaining probabilities is from data. The probability of
an event occurring, written as P(E), is defined as the proportion of times
the event occurs in a series of trials.
Example: to assess the probability of rain on a given day in Kolkata
between November and February:
Total No. of Days: 4 months = 120 days
Observed Rainy Days = 88 days
Probability of Rain (November to February): P(rain) = (88 / 120)
= 0.73
Conclusion:
1. The probability of Rain on any day during Winter in Kolkata is 0.73
or 73%, and
2. The probability of a Sunny Day during this period in Kolkata is (1 -
0.73) = 0.27 or 27%
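The same relative-frequency calculation as a two-line sketch, using the figures of the example above:

```python
# Empirical (relative-frequency) probability of rain, November to February.
rainy_days, total_days = 88, 120

p_rain = rainy_days / total_days              # P(rain) = 88/120 = 0.73
p_sunny = 1 - p_rain                          # P(sunny day) = 0.27

print(f"P(rain) = {p_rain:.2f}, P(sunny day) = {p_sunny:.2f}")
```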
Rain, Snow, Monsoon, El Nino, Soil Fertility, Cyclone, Earthquake, Slope
Failure, Lanslide, High Tide, Flood, River Stage, Species Diversity, ……
Crop Production, Industrial Production, Population Growth, Migration, …….
Trend: both temporal and spatial,…etc
Probability Theory and Normal Distribution
Concept of Probability
1. In the post-quantitative-revolution period, the organisation of
geographic elements and spaces over the earth’s surface came to be viewed
as, at least in part, a matter of chance.
2. Geographical objects are true representations of multivariate situations;
hence, the application of probability statistics. To understand this, the basic
concept of the following need to be explored first —
a) Random Experiment — one whose result depends on chance. When a coin
is tossed, either head or tail appears, but the result cannot be
predicted in advance as it depends on chance.
b) Outcome — the result of a random experiment is called an outcome.
c) Event — denotes what occurs in a random experiment. Events are of two types –
1. Elementary (i.e., cannot be decomposed into simpler ones) and
2. Composite (i.e., aggregates of several elementary events).
d) Mutually Exclusive Events — when two or more events cannot occur
simultaneously; such events can occur only one at a time.
e) Exhaustive Events — formed by the complete group of all possible
events of a random experiment. For example, in coin tossing, the two
events ‘head’ and ‘tail’ comprise an exhaustive set.
f) Trial — refers to any particular performance of the random experiment.
g) Cases Favourable to an Event — in dice throwing there are 6 possible
outcomes; of these, 3 cases (1, 3, 5) are favourable to ‘odd number of points’
and 3 cases (2, 4, 6) are favourable to ‘even number of points’.
h) Equally Likely — when all outcomes occur with equal certainty, i.e., none
of the outcomes is expected in preference to another.
[Tree diagram: elementary events for the sexes of three children in a family (1st, 2nd, 3rd child: B or G), e.g., BBB, BBG, BGB, …]
Definition of Probability
(1) Classical Definition
If there are n possible outcomes (that are mutually exclusive, exhaustive and
equally likely) in a random experiment, and m of these are favourable to an
event A, then the probability of the event,
P(A) = m / n
1. When P(A) = 0, the event is impossible, i.e., m = 0; it occurs when none of
the outcomes is favourable to the event (RARE EVENT).
2. When P(A) = 1, the event is certain, i.e., m = n; it occurs when all the
outcomes are favourable to the event (ABSOLUTELY CERTAIN EVENT).
The classical definition fails when the number of possible outcomes is infinitely
large. It has only limited applications in coin tossing, dice-throwing and similar
games.
(2) Empirical Definition
In N trials of a random experiment, if an event occurs f times, its relative
frequency is f/N. If f/N tends to a limiting value p as N → ∞, then p is called
the ‘probability’ of the event. Thus,
p = lim (f/N) as N → ∞.
(3) Axiomatic Definition
a) It introduces ‘probability’ simply as a number associated with each
event, based on certain axioms.
b) Thus, it embraces all situations. The classical theory is simply a special
case of axiomatic theory.
c) Let a random experiment have a ‘finite’ number (n) of possible
‘elementary outcomes’, e1, e2, ……….……. , en.
i. The set S = {e1, e2, ……. , en} is called its ‘sample space’ and its
elements ei are called sample points.
ii. The sets {ei} consisting of single elements are called ‘elementary
events’, while those containing more than one element are called
‘composite events’.
iii. The ‘null set’ Ф is called the impossible event, and
iv. the ‘universal set’ S is called the sure event.
d) Let the real numbers p1, p2, ……. , pn correspond to the elementary events
{e1}, {e2}, ……. , {en} respectively, such that pi ≥ 0 and ∑pi = 1.
e) The numbers pi are called the probabilities assigned to the elementary events {ei}.
f) The probability of any event A (a subset of the sample space) is then given by
the sum of the probabilities associated with the outcomes belonging to that event;
for example, if A = {e1, e2, ……. , ek}, then P(A) = p1 + p2 + …. + pk.
Probability Distribution
The probability distribution of a random variable is a statement that specifies
the set of its possible values together with their respective probabilities.
1) Discrete Probability Distribution
Let a discrete random variable X assume the values x1, x2, ….., xn with
probabilities p1, p2, ……. , pn respectively, where ∑pi = 1. The specification
of the set of values xi together with their probabilities pi (i = 1, 2, 3, …., n)
defines the discrete probability distribution of X.
Mathematically, f(x) = probability that X assumes the value x
= P (X = x)
The function f(x) is called the probability mass function (p.m.f.). It satisfies
two conditions —
a) f(x) ≥ 0 and
b) ∑ f(x) = 1
a) Uniform Distribution — When a discrete random variable assumes n
possible values with equal probabilities, then the probability becomes a
constant value 1/n.
The p.m.f is given by f(x) = 1/n
b) Binomial Distribution —
It is defined by the p.m.f.
f(x) = nCx · p^x · q^(n−x) (x = 0, 1, 2, .., n),
where p and q are positive fractions (p + q = 1).
c) Poisson Distribution —
It is defined by the p.m.f., f(x) = e^(−m) · m^x / x! (x = 0, 1, 2, ..), where m is
the ‘parameter’ of the Poisson distribution (always positive) and e is a mathematical
constant (approximately 2.718) given by the infinite series e = 1 + (1/1!) + (1/2!) + ……
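A minimal sketch of the two mass functions just defined, written from first principles with the standard library; the parameter values in the usage lines are illustrative assumptions only.

```python
# Binomial and Poisson probability mass functions from their definitions.
import math

def binomial_pmf(x, n, p):
    """f(x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """f(x) = e^(-m) * m^x / x!"""
    return math.exp(-m) * m**x / math.factorial(x)

# e.g., probability of exactly 3 rainy days out of 10 when p = 0.3 per day
print(f"Binomial P(X = 3 | n = 10, p = 0.3) = {binomial_pmf(3, 10, 0.3):.4f}")
# e.g., probability of exactly 2 events when the Poisson parameter m = 1.5
print(f"Poisson  P(X = 2 | m = 1.5)         = {poisson_pmf(2, 1.5):.4f}")
```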
(2) Continuous Probability Distribution
Let x be a continuous random variable that can take any value in the
interval (a, b) i.e., a ≤ x ≤ b.
As the number of possible values of x is infinite, probabilities cannot be
assigned to each individual value; they are assigned to intervals. For a
continuous probability distribution, let f(x) be a non-negative function such
that — P(c ≤ x ≤ d) = ∫_c^d f(x) dx
The function f(x) is called the probability density function (p.d.f.), or simply
the density function, of x. It satisfies two conditions —
a) f(x) ≥ 0 and
b) ∫_a^b f(x) dx = 1.
The curve represented by the equation, y = f (x) is known as the probability
curve. Geometrically, the integral of the p.d.f. represents the area under the
probability curve between interval (c, d) in the range (a, b).
a) Uniform Distribution — If the probabilities associated with intervals of
equal length are equal at any part of the range, it is called a uniform
distribution.
It is defined by the p.d.f.
f(x) = 1/(b – a): a ≤ x ≤ b.
It is also called a rectangular distribution, as it looks like a
rectangle of height 1/(b – a) over the range a ≤ x ≤ b.
b) Normal Distribution — It is also called the Gaussian distribution and is defined by
the p.d.f.
f(x) = [1/(σ√(2π))] · e^(−(x − μ)²/(2σ²)), −∞ < x < ∞
where μ = mean, σ = standard deviation, and π and e are mathematical
constants.
1) The probability curve of the normal distribution is known as the normal curve;
it is bell-shaped and symmetrical about the line x = μ, and its two
tails extend indefinitely on either side.
2) The maximum ordinate lies at x = μ and is given by
y = 1/(σ√(2π))
1. In Normal Distributions, mean = median = mode.
2. Fractiles (e.g., quartiles, deciles, etc.) are
equidistant from the mean, i.e.,
quartile deviation = 0.67σ
mean deviation = 0.80σ
3. Skewness = 0, Kurtosis (excess) = 0
4. The points of inflection of the Normal Curve lie at
x = μ ± σ
5. The standard score of x is given by —
z = (x − μ)/σ
It has the p.d.f. p(z) = [1/√(2π)] · e^(−z²/2),
where −∞ < z < ∞
1. In geography, large data sets are likely to follow
the normal distribution.
2. In sampling theory, any statistic based on a large
sample approximately follows a normal distribution,
thereby simplifying the testing of statistical
hypotheses and the identification of confidence
limits for parameters.
Area under Normal Curve
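A short sketch of the standard score and of areas under the normal curve, using only the standard library (statistics.NormalDist); the μ, σ and x values are illustrative assumptions. The ±1σ, ±2σ and ±3σ areas reproduce the 0.682, 0.954 and 0.997 probabilities used in the estimation formulas above.

```python
# Standard score and areas under the normal curve.
from statistics import NormalDist

mu, sigma, x = 50.0, 10.0, 65.0               # assumed illustrative values

z = (x - mu) / sigma                          # standard score z = (x - mu)/sigma
area_below = NormalDist().cdf(z)              # area under the standard curve up to z
within_1sd = NormalDist().cdf(1) - NormalDist().cdf(-1)   # ~ 0.682
within_2sd = NormalDist().cdf(2) - NormalDist().cdf(-2)   # ~ 0.954
within_3sd = NormalDist().cdf(3) - NormalDist().cdf(-3)   # ~ 0.997

print(f"z = {z:.2f}, area below z = {area_below:.3f}")
print(f"area within 1 sd = {within_1sd:.3f}, 2 sd = {within_2sd:.3f}, 3 sd = {within_3sd:.3f}")
```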
List of Further Reading
1. Barber, G. M. (1988): Elementary Statistics for Geographers, The Guilford Press, London
2. Berry, B. J. L. and Marble, D. F. (ed. 1968): Spatial Analysis - a reader in statistical geography,
Englewood Cliff, NJ
3. Clark, W. A. V. and P. Hosking (1986): Statistical Methods for Geographers, John Wiley, NY
4. Cressie, N. A. C. (1993): Statistics for Spatial Data, John Wiley & Sons, NY
5. Ebdon, D. (1977): Statistics in Geography - a practical approach, Basil Blackwell, Oxford
6. Gregory, S. (1963): Statistical Methods and the Geographer, Longman, London
7. Haggett, P., A. W. Cliff and A. Frey (1977): Locational Methods, Vol-I & II, Edward Arnold, London
8. Haggett, P. and R. J. Chorley (1969): Network Analysis in Geography, Edward Arnold, London
9. Hammond, R. and McCullagh, P. S. (1974): Quantitative Techniques in Geography, Claredon
Press, Oxford
10. Harvey, D. H. (1969): Explanation in Geography, Edward Arnold Pub., London
11. Johnston, R. J. (1978): Multivariate Statistical Analysis in Geography, New York: London
12. King, L. J. (1969): Statistical Analysis in Geography, Englewood Cliffs, NJ: Prentice Hall
13. Kitanidis, P. K. (1997): Introduction to Geostatistics, Cambridge University Press
14. Limb, M. and Dwyer, C (ed. 2001): Qualitative Methodologies for Geographers, London: Arnold
15. Lindsay, J. M. (1997): Techniques in Human Geography, Routledge
16. Matthews. M. H. and Foster, I. D. L. (1989) : Geographical Data - sources, presentation and
analysis, OUP
17. O’Brien, L (1992): Introducing Quantitative Geography, Routledge, London
18. Ripley, B. D. (1981): Spatial Statistics, Wiley, NY
19. Robinson, G. (1998): Methods and Techniques in Human Geography, Wiley, NY
20. Rogerson, P. A. (2001): Statistical Methods for Geography, Sage, London
21. Shaw, R. L. and Wheeler, D. (1985): Statistical Techniques in Geographical Analysis, John
Wiley & Sons, NY
22. Streich, T. A. (1986): Geographic Data Processing – an overview, California Univ. Press, Santa
Barbara
23. Taylor, P. J. (1977): Quantitative Methods in Geography – an introduction to spatial analysis,
Houghton Mifflin, Boston
24. Unwin, D. (1981): Introductory Spatial Analysis, New York : Methuen
25. Walford, N. (2002): Geographical Data – characteristics and sources, Wiley, NJ
26. Worthington, B. D. R. and R. Gont (1975): Techniques in Map Analysis, McMillan Ltd, London
27. Wrigley, N. and Bennett, R. J. (ed.1981): Quantitative Geography, Methuen, London
Thank You
Ethics in Statistical Geography
1. Be Honest with Data
Enumeration, Measurement and
Collection
2. Be Wise while selecting the
Statistical Technique(s) for your
intended Purpose
3. Explore the Results, observe the
Geographical Associations and
go for the Statistical Inferences
4. Be Precise and very Simple while
translating the “Language of
Statistics” into the “Language of
Geography”
Looking for a Publication in a Peer
Reviewed Journal?
Visit: www.indiansss.org
Indian Journal of Spatial Science
Contact:
editorijss2012@gmail.com
Geography and Geographers
 
Geography and Cartography
Geography and CartographyGeography and Cartography
Geography and Cartography
 
Global Climate Change - a geographer's sojourn
Global Climate Change - a geographer's sojournGlobal Climate Change - a geographer's sojourn
Global Climate Change - a geographer's sojourn
 
Development, Environment and Sustainabilty–the triumvirate on Geographical Frame
Development, Environment and Sustainabilty–the triumvirate on Geographical FrameDevelopment, Environment and Sustainabilty–the triumvirate on Geographical Frame
Development, Environment and Sustainabilty–the triumvirate on Geographical Frame
 
GEOGRAPHICAL DIMENSIONS OF ‘DEVELOPMENT – ENVIRONMENT INTERRELATION’
GEOGRAPHICAL DIMENSIONS OF ‘DEVELOPMENT – ENVIRONMENT INTERRELATION’GEOGRAPHICAL DIMENSIONS OF ‘DEVELOPMENT – ENVIRONMENT INTERRELATION’
GEOGRAPHICAL DIMENSIONS OF ‘DEVELOPMENT – ENVIRONMENT INTERRELATION’
 
GEOGRAPHY AND MAPS –myth and contemporary realities
GEOGRAPHY AND MAPS –myth and contemporary realitiesGEOGRAPHY AND MAPS –myth and contemporary realities
GEOGRAPHY AND MAPS –myth and contemporary realities
 
CARTOGRAPHY – yesterday, today and tomorrow
CARTOGRAPHY – yesterday, today and tomorrowCARTOGRAPHY – yesterday, today and tomorrow
CARTOGRAPHY – yesterday, today and tomorrow
 
Land Degradation – nature and concerns
Land Degradation – nature and concernsLand Degradation – nature and concerns
Land Degradation – nature and concerns
 
Research Issues and Concerns
Research Issues and ConcernsResearch Issues and Concerns
Research Issues and Concerns
 
MANAGEMENT OF DISASTERS – THE TDRM APPROACH
MANAGEMENT OF DISASTERS – THE TDRM APPROACHMANAGEMENT OF DISASTERS – THE TDRM APPROACH
MANAGEMENT OF DISASTERS – THE TDRM APPROACH
 
The Discipline of Cartography – philosophical basis and modern transformations
The Discipline of Cartography – philosophical basis and modern transformationsThe Discipline of Cartography – philosophical basis and modern transformations
The Discipline of Cartography – philosophical basis and modern transformations
 
Application of Modern Geographical Tools & Techniques in Planning and Develo...
Application  of Modern Geographical Tools & Techniques in Planning and Develo...Application  of Modern Geographical Tools & Techniques in Planning and Develo...
Application of Modern Geographical Tools & Techniques in Planning and Develo...
 
DEVELOPMENT VS ENVIRONMENT IN GEOGRAPHICAL FRAMEWORK
DEVELOPMENT VS ENVIRONMENT IN GEOGRAPHICAL FRAMEWORKDEVELOPMENT VS ENVIRONMENT IN GEOGRAPHICAL FRAMEWORK
DEVELOPMENT VS ENVIRONMENT IN GEOGRAPHICAL FRAMEWORK
 
Information System and Cartographic Abstraction
Information System and Cartographic AbstractionInformation System and Cartographic Abstraction
Information System and Cartographic Abstraction
 
Map Projections ―concepts, classes and usage
Map Projections ―concepts, classes and usage Map Projections ―concepts, classes and usage
Map Projections ―concepts, classes and usage
 
ENVIRONMENTAL DEGRADATION – CONCEPT, CLASSES AND LINKAGES
ENVIRONMENTAL DEGRADATION – CONCEPT, CLASSES AND LINKAGESENVIRONMENTAL DEGRADATION – CONCEPT, CLASSES AND LINKAGES
ENVIRONMENTAL DEGRADATION – CONCEPT, CLASSES AND LINKAGES
 
Resource Analysis from a Geographers’ Perspective
Resource Analysis from a Geographers’ PerspectiveResource Analysis from a Geographers’ Perspective
Resource Analysis from a Geographers’ Perspective
 

Recently uploaded

Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsCharlene Llagas
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 

Recently uploaded (20)

Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Heredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of TraitsHeredity: Inheritance and Variation of Traits
Heredity: Inheritance and Variation of Traits
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Hauz Khas Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 

Statistical Methods Workshop

  • 6. Geography needs it because —
1. These help summarize the findings of studies (e.g., total rainfall during a period in a state),
2. These help in understanding the phenomenon under study (e.g., rainfall is higher in the southern districts),
3. These help forecast the state of variables (e.g., drought is likely to occur next year),
4. These help evaluate the performance of certain activities (e.g., more rainfall means more rice production),
5. These help in decision making (e.g., finding out the best location), and
6. They also help to establish whether relationships between the ‘characteristics’ of a set of observations are genuine or not, and thus make a valuable contribution to geographical knowledge.
Geographers use Statistical Methods —
1. To describe and summarize spatial data.
2. To make generalizations about complex spatial patterns.
3. To estimate the probability of outcomes for an event at a given location.
4. To use sample data to infer characteristics for a larger set of geographic data.
5. To determine if the magnitude or frequency of some phenomenon differs from one location to another.
6. To learn whether an actual spatial pattern matches some expected pattern.
  • 7. 1. Statistical analysis in Geography is unique in that it concerns ‘spatial’ or ‘geographically referenced’ data (data with co-ordinates).
2. The variety of techniques being almost infinite, the geographical data analyst (GDA) has to pick and choose the one best suited to the specific job.
3. Today, user-friendly software packages are easily available, e.g., SPSS, Statistica, R, etc.
4. Processing of geographical data involves the application of suitable techniques of spatial statistics.
5. Its presentation requires the application of the most suitable cartographic techniques.
6. Its interpretation needs the wisest use of geographical principles, leading ultimately to a scientific geographical explanation.
Spatial data have particular difficulties associated with their analysis: boundary delineation, modifiable areal units, and the level of spatial aggregation or scale. That is why the method of statistical analysis varies:
1. A study of per capita income within a city, if confined to the inner core, shows that income levels are lower; but if the whole city is taken, they become higher. The same is true of the delineation of internal boundaries. Thus, interpretations are valid only for the particular area and sub-area configuration of a region.
2. In the Census, the available information is hierarchically arranged. The GDA must use the same level for comparison.
3. Socio-economic data are available at a variety of scales, e.g., Ward, Municipality, Mouza, GP, Block, District, and State. When data are aggregated at different scales, the resulting descriptive statistics may vary, either in a systematic, predictable way, or in a more uncertain fashion.
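To make the aggregation effect concrete, here is a minimal sketch in Python, with invented ward-level income figures (all names and numbers are purely illustrative), showing how the same observations yield different summary statistics for a sub-area and for the whole area:

```python
import statistics

# Hypothetical per capita incomes (thousand rupees) by ward; all values are invented.
ward_income = {
    "inner_core": [42, 45, 40, 38],
    "suburbs":    [75, 80, 68, 72],
}

core = ward_income["inner_core"]
city = core + ward_income["suburbs"]          # the same data, aggregated to the whole city

print("Inner-core mean :", statistics.mean(core))    # lower income level in the sub-area
print("City-wide mean  :", statistics.mean(city))    # higher once the suburbs are included
print("City-wide stdev :", statistics.pstdev(city))  # dispersion also changes with the areal unit
```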
  • 9. Geographical data have two important properties —
1. the reference to geographic space, which means the data are registered to a geographical co-ordinate system so that data from different sources can be cross-referenced and integrated, and
2. the representation at geographical scale; data are normally recorded at relatively small scales and are often generalised and symbolised.
Therefore, geographical data / spatial data / map data are essentially scaled spatial data. They fall into three categories —
1. the geodetic control network (GCN) {Reference Map},
2. the topographic base (tBase) {Contour Map}, and
3. the geographic overlays (GrO) {Thematic Map}.
Geographical data are recorded using four different means —
1. Alphanumeric characters or texts (Document Form: document data),
2. Numerical values (Numeric Form: numeric data),
3. Symbols or signs (Graphical Form: graph/diagram/map data), and
4. Signals (Digital Form: digital data).
  • 10. Geographical data concern facts about the geographic reality, and geographical facts are always inter-subjective and reliable.
Inter-subjective ⇒ repeated observations of the same phenomena by different people yield the same factual statement.
Reliable ⇒ an observer repeatedly recording the same phenomena produces the same factual statement.
A geographer’s data matrix (GDM) is a matrix of individuals (objects or events) against attributes (the various observations made on the attributes of these individuals):

Places   Co-ordinates (x, y)   a1 … aN
1        ..   ..               ..   ..
2        ..   ..               ..   ..
3        ..   ..               ..   ..
…        ..   ..               ..   ..
N        ..   ..               ..   ..

1. A single data set typically contains information on many variables and many objects.
2. Conventionally, variables are entered in columns and objects in rows.
3. Thus, each measurement on a row genuinely concerns the same object, and all data in a single column relate to the same variable.
4. The entire contents of a column are normally analysed together, so that the analysis essentially concerns comparable data (GDM).
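A geographer’s data matrix of this kind can be held directly as a table in software; the following sketch (using the pandas library, with made-up place names, coordinates and attribute values) is one minimal way to do it:

```python
import pandas as pd

# A minimal GDM: rows are places (objects), columns are the coordinates
# plus the observed attributes a1 ... aN. All names and values are illustrative only.
gdm = pd.DataFrame(
    {
        "x": [88.36, 88.42, 88.30],           # longitude
        "y": [22.57, 22.60, 22.49],           # latitude
        "a1_rainfall_mm": [1450, 1390, 1510],
        "a2_population": [120000, 95000, 88000],
    },
    index=["Place 1", "Place 2", "Place 3"],
)

# Each column is analysed as one comparable variable:
print(gdm["a1_rainfall_mm"].mean())
print(gdm.describe())
```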
  • 11. Real-world features occur in two fundamental forms —
1. Objects (these are discrete and definite: e.g., hills, rivers, forests, mines, highways, cities, bio-reserves), and
2. Phenomena (these are continuously distributed over geographical space: e.g., terrain, slope, rainfall, temperature, pollution level and other environmental indices).
A spatial object has three essential characteristics —
1. it can be delineated by identifiable boundaries,
2. it can be described by one or more attributes, and
3. it is relevant to some intended application.
Spatial objects may be:
1. “exact objects”, when they have well-defined boundaries (e.g., landholdings, buildings, etc.), or
2. “inexact objects” or “fuzzy entities”, when such boundaries are not well defined (e.g., landform features and natural resource features).
Graphically, spatial objects are represented by three geometric elements — points (0D), lines (1D), and polygons (2D).
  • 12. There are two distinct ways of representing the real world in a geographic database —
1. the object-based model: it views geographic space as populated by sets of objects, for which data are obtained both by field surveying and laboratory analysis. In such databases, attributes are arranged against locations defined by co-ordinate lists or vector lines. Hence, it is called the “vector data structure”; and
2. the field-based model: it views geographic space as populated by one or more phenomena, which are basically represented as “surfaces”; these are often conceptualised as being made up of a number of spatial data units. Data for these are normally arranged against locations defined by an elemental tessellation in the form of “square” or “rectangular” cells. Hence, such data structures are called “raster data models”.
Field-based spatial data can be obtained either
1. directly, from aerial photography, satellite imagery, map scanning and field measurements at selected or sampled locations (e.g., data for triangulated irregular networks or TIN), or
2. indirectly, generated by applying mathematical functions, e.g., contours and digital elevation models (DEM).
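As a rough illustration of the two data structures just described, the following sketch (plain Python, with invented coordinates and values) contrasts a vector-style object record with a raster-style field grid:

```python
# Object-based (vector) model: discrete objects with coordinate geometry and attributes.
# All geometries and attribute values below are invented for illustration.
well_point   = {"type": "point",   "geometry": (88.36, 22.57), "attributes": {"depth_m": 45}}
road_line    = {"type": "line",    "geometry": [(88.30, 22.50), (88.33, 22.52), (88.36, 22.57)]}
plot_polygon = {"type": "polygon", "geometry": [(88.0, 22.0), (88.1, 22.0), (88.1, 22.1), (88.0, 22.1)]}

# Field-based (raster) model: a continuous phenomenon (e.g., elevation in metres)
# stored as values on a regular tessellation of square cells.
elevation_raster = [
    [12, 14, 15, 13],
    [11, 13, 16, 14],
    [10, 12, 15, 17],
]
cell_size_m = 30  # assumed cell resolution

# The field value at grid cell (row 1, column 2):
print(elevation_raster[1][2])
print(well_point["attributes"]["depth_m"])
```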
  • 13. Geographical data are often collected on the basis of areal units —
1. These may be Natural, e.g., plain, plateau, mountains, etc.
2. They can well be Artificial, e.g.,
• Singular units (e.g., individual households or landholdings), or
• Collective / Administrative divisions (i.e., regions made up of many such landholdings).
These units are often arranged into some sort of hierarchical structure —
India = {States}
State = {Districts}
District = {CD Blocks}
CDB = {Gram Panchayats}
GP = {Landholdings/Households}
At each level these have different but nested hierarchic identification codes that make them unique (Census). In such a perfect set-hierarchy, comparisons can only be made between similar individuals occupying the same level in the hierarchy: GP vs. GP, CDB vs. CDB, District vs. District, State vs. State.
Thus, GP-level inferences may not hold good at the CDB / District / State level.
1. A “high level” to “low level” analysis yields a contextual relationship, and
2. A “low level” to “high level” analysis yields an aggregate relationship.
These are unique, and each requires its own particular mode of thought about data collection, storage, and procedures for manipulation and analysis.
  • 14. Based on “simple scalar systems” for measuring objects or their attributes, Stevens (1959) describes four kinds of measurement models or scales, with one or two hybrid variants —
1. Nominal or Categorical Data: It provides a device for labelling or classifying objects rather than measuring their attributes. These are qualitative data, often presented in the form of names. Special “statistical techniques” are used for this sort of data.
2. Ordinal Data: In this, a set of objects is ordered from the “most” to the “least”, but there is actually no information regarding the value of the measured attributes. Thus, it produces poorer-quality data, is not valid for algebraic operations (i.e., addition, subtraction, multiplication or division), and requires the use of non-parametric statistical methods for analysis.
3. Interval Data: In this, one can not only rank-order objects with respect to a measured attribute but can also specify how far apart the magnitudes are from each other. However, it has no natural origin: it lacks an “absolute zero”.
  • 15. 4. Ratio Data: It incorporates equivalence of ratios and starts from an absolute zero. It is the most precise as it uses the real number system and allows continuous measurement. Units can be converted directly from one system to another.
Data can be divided in various ways, as follows —
a) Continuous and Discrete data,
b) Directional and Non-directional data,
c) Closed and Open data,
d) Positive and Negative data,
e) Point (place) and Period (time) data,
f) Spatial and Non-spatial / Attribute data.
The scale of measurement is very important, because it determines the techniques and measures of —
1. Data Description (…)
2. Correlation and Association between and among Variables (…)
3. Regression and Estimation (…)
4. Comparison and Tests of Significance (…)
5. Randomness, Order and Regularity (…)
6. Forecasting and Decision Making (…)
7. Understanding the Phenomena (…)
8. Cartographic Presentation (…)
  • 16. To note…
1. There is no such thing as an exact measurement.
2. All measurements contain error, the magnitude depending on the instruments used and also on the ability of the observer to use them.
3. As the true value is never known, the true error is never determined.
4. The degree of accuracy, or precision, can only be expressed as a relative accuracy.
5. The estimated error is shown as a fraction of the measured quantity. Thus 100 ft measured with an estimated error of 1 inch represents a relative accuracy of 1 / 1200. An error of 1 cm in 100 m = 1 / 10,000.
6. Where readings are taken on a graduated scale to the nearest subdivision, the maximum error in estimation will be ± 1/2 of a division.
7. Repeated measurement increases the accuracy by √n, where n = number of repetitions. (N.B. This cannot be applied indefinitely.)
8. Agreement between repeated measurements does not imply accuracy, but only consistency.
  • 17. Statistical Tests / Methods and Measurement Scale
1-Component Case —
• Applicable to Nominal (and all higher) data: f, Mo
• Applicable to Ordinal (and higher) data: Me, Px
• Applicable to Interval and Ratio data: Mean, Variance
• Applicable to Ratio data only: GM, HM, CV
2-Component Case —
• Nominal Scale Data: Chi-square, Contingency Coefficient
• Ordinal Scale: U Test, Spearman’s rs, Kendall’s τ
• Interval + Ratio Scale: Comparison of Means, Comparison of Variances, Pearson’s r, Linear and Non-linear Regression
Multi-Component Case —
• Interval + Ratio Scale: Multiple Variance Analysis, Co-variance Analysis, Multiple Correlation, Multiple Regression
  • 18. Scale of Measurement and Statistical Measures and Tests
Nominal —
• Defining relation: equivalence
• Possible transformations: y = f(x), where f(x) is any one-to-one substitution
• Central tendency: Mode; Dispersion: % in the Mode
• Tests (non-parametric): Chi-square, Contingency coefficient, Goodman–Kruskal’s Lambda, Phi coefficient
Ordinal —
• Defining relations: equivalence; greater than
• Possible transformations: y = f(x), where f(x) is any increasing monotonic transformation
• Central tendency: Median; Dispersion: Percentiles
• Tests (non-parametric): Spearman’s Rho, Kendall’s Tau, Kolmogorov–Smirnov, Goodman–Kruskal’s Gamma, Phi coefficient
Interval —
• Defining relations: equivalence; greater than; known ratio of any two intervals
• Possible transformations: any linear transformation y = a·x + b (a > 0)
• Central tendency: Mean; Dispersion: Standard Deviation
• Tests (parametric and non-parametric): t-test, F-test, Pearson’s r, Point Biserial, etc.
Ratio —
• Defining relations: equivalence; greater than; known ratio of any two intervals; known ratio of any two scale values
• Possible transformations: y = c·x (c > 0)
• Central tendency: Geometric Mean; Dispersion: Coefficient of Variation
• Tests (parametric and non-parametric): t-test, F-test, Pearson’s r, Point Biserial, etc.
  • 19. Mapping Techniques and Measurement Scale
• Nominal Scale: Chorochromatic Map, Choroschematic Map
• Ordinal Scale: Flowline Map, Ray Map, Qualitative Dot Map
• Interval and Ratio Scale: Isarithmic Maps, Quantitative Dot Map, Choropleth Map, Animated Map, Landform Map, Cartograms, Single-component Map, Bi-component Map, Multi-component Maps
  • 21. Sources of Geographical Information
1. Field Observations
a) Quantitative Measurements
b) Qualitative Observations
2. Archival Sources
a) Areal Archives: Maps, Air-photos, Satellite Images
b) Non-areal Sources: Census Data
3. Theoretical Works
a) Mathematical Models
b) Analogue Models
i. Physical Simulation Models
ii. Monte Carlo Simulation Models
1. Locational data are collected mainly for non-geographical purposes.
2. Geographers depend on the original accuracy of the survey.
3. Data are released in ‘bundles’, i.e., administrative areas, which are inconvenient and anachronistic and pose extremely acute problems in mapping and interpretation.
  • 22. Sampling
1. Sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that, by studying the sample, we may fairly generalize our results back to the population from which they were chosen.
2. It is necessary because we usually cannot gather data from the entire population, owing to a large or inaccessible population or a lack of resources.
3. It is used to draw inferences about the population as a whole.
4. The subset of units that are selected is called a sample.
The purpose of geography is ‘…to provide accurate, orderly, and rational description and interpretation of the variable character of the earth surface’ (Hartshorne, 1959). This can only be achieved in two particular ways —
1. first, by increasing the database to an optimum level, and
2. second, by taking representative samples.
  • 23. Principles of Sampling
1. A statistical population (universe) is defined as the total set of measurements which could be taken from the entity being studied.
2. A geographical population is usually too large for us to have either the time or the money to examine it fully.
3. Therefore, we select from the population some sub-set of individuals for study. These are called ‘samples’, and the selection procedure, ‘sampling’. For example, all students of a college are considered as a population, but the students of a particular class may be taken as a sample.
A sample of entities should faithfully represent its parent population, so that an estimate from the sample can be an accurate estimate of the corresponding figure in the whole population. The equivalent actual figure in the population is known as a parameter.
The fundamental principles of sampling are —
1. Sample units must be chosen in a systematic and objective manner,
2. They must be clearly defined and easily identifiable,
3. They must be independent of one another,
4. The same units of samples should be used throughout the study, and
5. The selection process should be based on sound criteria and should avoid error, bias and distortion.
  • 24. 1. There are several measures of inferential statistics to test whether any significant difference exists between a sample statistic and the population parameter. If the difference is nil, the sample is representative.
2. However, inaccuracy may still arise from errors in measurement, computation, and data processing.
3. Therefore, the answer to the question of which sampling method to use is: choose the method that gives the most representative results.
4. The number of items to be included in a sample yields the concept of the sampling fraction, which can be estimated using the techniques available for a given type of sampling.
Basically, the following methods exist —
A. Non-probability Sampling: based on the researcher’s subjective judgement
1. Judgmental or Purposive (when only minimal information about the parent population is available, or only certain sites are accessible),
2. Convenience or Accessibility (based on the ease of access to the sites or members of the population),
3. Quota (a combination of the above two: first the number of observations is decided and then the samples are finally identified from these),
4. Volunteer (when certain members of the population / respondents volunteer information).
  • 25. B. Probability Sampling: based on probability, i.e., an element of chance exists as to whether a particular observation is included in the sample.
1. Random: samples are selected by chance methods or random numbers. Random numbers can be generated on a calculator or a PC.
2. Systematic: by numbering each subject and then selecting every kth subject.
3. Stratified: the population is first divided into groups (strata) depending on their importance to the study. Samples are then selected randomly within each stratum.
4. Cluster: the population is divided into groups (clusters) by some means (e.g., geographic area). Some clusters are then randomly selected.
For example:
1. geomorphologists often collect slope data based on quota sampling;
2. meteorologists capture weather patterns by sampling at measuring stations, knowing that conditions between these sample locations will vary systematically and smoothly;
3. Census data are published in spatially aggregated form, and the TFL ensures that the variance within each aggregate unit is not so large as to render the aggregated data meaningless.
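The four probability designs above can be sketched in a few lines of Python; the population, strata and sample sizes below are invented purely for illustration:

```python
import random

random.seed(42)                            # reproducible illustration
population = list(range(1, 501))           # a hypothetical population of 500 numbered units

# 1. Simple random sampling: every unit has an equal chance of selection.
simple_random = random.sample(population, 30)

# 2. Systematic sampling: every k-th unit after a random start.
k = len(population) // 30
start = random.randrange(k)
systematic = population[start::k][:30]

# 3. Stratified sampling: random selection within predefined strata.
strata = {"north": population[:200], "south": population[200:]}
stratified = {name: random.sample(units, 15) for name, units in strata.items()}

# 4. Cluster sampling: randomly select whole groups (e.g., areas), then study them in full.
clusters = [population[i:i + 50] for i in range(0, 500, 50)]
chosen_clusters = random.sample(clusters, 2)

print(simple_random[:5], systematic[:5], len(chosen_clusters))
```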
  • 26. 1. The ability to represent geographical features as points is certainly scale-dependent: a school may be a point feature on a 1:50,000 scale map, but not on a 1:2,500 scale map.
2. In geography, the point distribution is the most basic and fundamental, as lines are simply ordered sets of points and areas are polygons / closed traverses formed by lines. Hence, the sampling unit in most geographic studies is either a point or a quadrat.
3. The basic purpose of analysis is to identify and distinguish homogeneity vs. heterogeneity, isotropy vs. anisotropy, and randomness.
There are five types of spatial sampling —
1. Random: by using a grid overlay on a map, a population of co-ordinate points can be generated, each of which can be identified by its x- and y-coordinates; random numbers (from tables, calculators, PCs, etc.) can then be used for the selection of random samples.
2. Regular or Gridded (also called systematic area sampling, planned on perfectly square, rectangular or triangular grids): the area is first divided into uniform grids and samples are selected in a regular manner from each grid.
3. Clustered: it is usually not planned, but often forced by a patchy distribution of objects.
  • 27. 4. Uniform or Systematic Random: it is initiated by the random selection of points or quadrats; the remainder of the sample is then selected following a predetermined plan (e.g., planned by randomization within grid squares).
1. There are two basic designs for this — one has points aligned in a checkerboard fashion, while in the other the points are non-aligned.
2. In the first case, the area is divided into a set of grids all containing the same number of points. One point is chosen randomly in the first cell; the corresponding point locations in the other cells are then selected as the remaining sample members.
3. In the second case, the first point is chosen as above. Its x-coordinate is then held constant for all the remaining grids across the top row, and a point in each of these squares is chosen by randomly selecting new y-coordinates. Similarly, for the other cells in the first column, the y-coordinate of the point in the first cell is held constant, while the x-coordinate is randomized.
5. Traverse: the use of cross-sectional lines and traverses has always been favoured in geographic approaches and found more efficient in estimating spatial pattern. These are also common practice, especially when constrained by access and exposure, viz., along contours, rivers, coasts, lakes, volcanoes, roads, etc.
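A minimal sketch of spatial point sampling, assuming the study area can be treated as a simple rectangular bounding box (all coordinates invented): it draws purely random points and, for comparison, one random point per grid cell, which is a simplified stand-in for the stratified within-grid randomization described above rather than the full unaligned design.

```python
import random

random.seed(1)

# Study area treated as a bounding box (units are arbitrary map units).
xmin, xmax, ymin, ymax = 0.0, 100.0, 0.0, 100.0

# Random spatial sampling: points drawn anywhere in the area.
random_points = [(random.uniform(xmin, xmax), random.uniform(ymin, ymax)) for _ in range(25)]

# Randomization within grid squares: one random point inside each cell of a 5 x 5 grid,
# so every part of the area is represented while exact positions remain random.
n_rows = n_cols = 5
dx, dy = (xmax - xmin) / n_cols, (ymax - ymin) / n_rows
stratified_points = [
    (xmin + c * dx + random.uniform(0, dx), ymin + r * dy + random.uniform(0, dy))
    for r in range(n_rows) for c in range(n_cols)
]

print(len(random_points), len(stratified_points))
```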
  • 28. As no single technique can satisfy a geographer’s objective, he is free to make his own sampling design, e.g., quota, multi-phase, multi-stage, etc. (Haggett, 1968), as follows —
[Diagram: Data Collecting System]
• Purposive or Hunch Sampling
• Controlled Sampling
  – Random Designs, with purposive stratification → nested or stratified / hierarchical designs
  – Systematic Designs, with systematic unaligned stratification → unaligned designs
• Complete Survey
  • 29. 1. Any geographical population certainly has some internal variation, which may be either random or systematic.
2. However, we may wish to compute statistics relating just to the overall characteristics of the population.
3. If heterogeneity is apparent at the outset, the whole should be broken into sub-populations, and these sampled and analysed separately.
4. In the special case of gradients or trends of variation, these must be defined by traverses parallel to the gradient.
5. When variation is not so distinctive, all parts of the area need to be explored equally. Hence, a uniform or gridded design is preferable, as random distributions may be too uneven.
6. As a general rule of thumb, a sample size of 30 is usually sufficient for the sampling distribution of the mean to be closely approximated by a normal distribution.
Thus, the acquisition of data comprises the following sequence: define the population → construct a sample frame → select a sampling design → specify the characteristics to be measured for each element of the sample → collect the data.
  • 30. Summary
1. There are many methods of sampling when doing research.
2. Simple random sampling is the ideal one.
3. But researchers seldom have the luxury of time or money to access the whole population.
4. Hence, many compromises often have to be made.
Method — Best when:
• Simple Random Sampling — the whole population is available.
• Stratified Sampling (random within target groups) — there are specific sub-groups to investigate (e.g., demographic groupings).
• Systematic Sampling (every nth person) — a stream of representative people is available (e.g., in the street).
• Cluster Sampling — population groups are separated and access to all is difficult, e.g., in many distant cities.
Probability methods are the best overall group of methods to use, as you can subsequently apply the most powerful statistical analyses to the results.
  • 31. Quota Methods
1. For a particular analysis and valid results, you can determine the number of people you need to sample.
2. In particular, when you are studying a number of groups and when sub-groups are small, you will need equivalent numbers to enable equivalent analysis and conclusions.
Method — Best when:
• Quota Sampling (get only as many as you need) — you have access to a wide population, including sub-groups.
• Proportionate Quota Sampling (in proportion to population sub-groups) — you know the population distribution across groups, and normal sampling may not give enough in minority groups.
• Non-proportionate Quota Sampling (minimum number from each sub-group) — there is likely to be a wide variation in the characteristic within minority groups.
  • 32. Selective Methods
Sometimes your study leads you to target particular groups.
Method — Best when:
• Purposive Sampling (based on intent) — you are studying particular groups.
• Expert Sampling (seeking ‘experts’) — you want expert opinion.
• Snowball Sampling (ask for recommendations) — you seek similar subjects (e.g., young drinkers).
• Modal Instance Sampling (focus on typical people) — the sought ‘typical’ opinion may get lost in a wider study, and you are able to identify the ‘typical’ group.
• Diversity Sampling (deliberately seeking variation) — you are specifically seeking differences, e.g., to identify sub-groups or potential conflicts.
  • 33. Convenience Methods
1. Good sampling is time-consuming and expensive.
2. Not all experimenters have the time or funds to use more accurate methods.
3. There is a price, of course, in the potentially limited validity of the results.
Method — Best when:
• Snowball Sampling (ask for recommendations) — you are ethically and socially able to ask and seek similar subjects.
• Convenience Sampling (use who is available) — you cannot proactively seek out subjects.
• Judgement Sampling (guess a good-enough sample) — you are an expert and there is no other choice.
  • 34. Ethnographic Methods
1. When doing field-based observations, it is often impossible to intrude into the lives of the people you are studying.
2. Samples must thus be surreptitious.
3. They may be based more on who is available and willing to participate in any interviews or studies.
Method — Best when:
• Selective Sampling (gut feel) — focus is needed on a particular group, location, subject, etc.
• Theoretical Sampling (testing a theory) — theories are emerging and focused sampling may help clarify them.
• Convenience Sampling (use who is available) — you cannot proactively seek out subjects.
• Judgement Sampling (guess a good-enough sample) — you are an expert and there is no other choice.
  • 35. Estimates from Samples
1) Population Mean
For this, the Standard Error of the Mean (SEm) is first calculated using the following equation —
SEm = s / √n, where s = sample standard deviation and n = size of the sample.
The equation has been contrived in such a way that —
1. Population Mean = Sample Mean ± SEm with 0.682 probability
2. Population Mean = Sample Mean ± 2·SEm with 0.954 probability
3. Population Mean = Sample Mean ± 3·SEm with 0.997 probability
Thus, with a chosen probability, the range of the population mean can easily be estimated.
Example: Let, for a data set of 100, Sample Mean = 12.34 and s = 2.56.
Therefore, SEm = s / √n = 2.56 / √100 = 0.256
Thus, the Population Mean lies between 12.084 and 12.596 at 0.682 probability,
between 11.828 and 12.852 at 0.954 probability, and
between 11.572 and 13.108 at 0.997 probability.
  • 36. 2) Population Standard Deviation
The Standard Error of the Standard Deviation (SEs) is first calculated from the following equation —
SEs = s / √(2n), where s = sample standard deviation and n = size of the sample.
The equation has been contrived in such a way that —
1. σ = s ± SEs with 0.682 probability
2. σ = s ± 2·SEs with 0.954 probability
3. σ = s ± 3·SEs with 0.997 probability
Thus, with a chosen probability, the range of the population standard deviation (σ) can easily be estimated.
Example: For the same data set, SEs = s / √(2n) = 2.56 / √(2×100) = 0.181
Thus, the Population SD lies between 2.379 and 2.741 at 0.682 probability,
between 2.198 and 2.922 at 0.954 probability, and
between 2.017 and 3.103 at 0.997 probability.
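The two standard-error calculations above can be reproduced directly; this short sketch uses the same figures as the worked examples (n = 100, sample mean = 12.34, s = 2.56):

```python
import math

n, mean, s = 100, 12.34, 2.56

se_mean = s / math.sqrt(n)        # standard error of the mean -> 0.256
se_sd   = s / math.sqrt(2 * n)    # standard error of the SD   -> ~0.181

for k, p in [(1, 0.682), (2, 0.954), (3, 0.997)]:
    print(f"Population mean in [{mean - k*se_mean:.3f}, {mean + k*se_mean:.3f}] with probability {p}")
    print(f"Population SD   in [{s - k*se_sd:.3f}, {s + k*se_sd:.3f}] with probability {p}")
```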
  • 37. 3) Proportion of the Population
The standard error of a percentage is first estimated as follows —
SE% = √(p·q / n), where p is the percentage of the sample possessing the particular attribute, q is the percentage of the sample not possessing that attribute, and n is the number of individuals in the sample.
The Population % can then be estimated from the following —
1. Population % = Sample % ± SE% with 0.682 probability
2. Population % = Sample % ± 2·SE% with 0.954 probability
3. Population % = Sample % ± 3·SE% with 0.997 probability
Example: Let, in a household survey, a sample of 100 produce an estimate of 75% for the percentage of households having a broadband connection (p). Therefore, the estimated percentage of households not having a broadband connection is 25% (q = 100 – 75).
SE% = √(p·q / n) = √(75×25 / 100) = 4.33
Hence, the proportion of households in the population with a broadband connection lies between 70.67% and 79.33% at 0.682 probability, between 66.34% and 83.66% at 0.954 probability, and between 62.01% and 87.99% at 0.997 probability.
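The same idea applied to a percentage; this sketch reproduces the household-survey example above (n = 100, p = 75, q = 25):

```python
import math

n, p, q = 100, 75.0, 25.0          # sample size, % with the attribute, % without it

se_pct = math.sqrt(p * q / n)      # -> 4.33

for k, prob in [(1, 0.682), (2, 0.954), (3, 0.997)]:
    low, high = p - k * se_pct, p + k * se_pct
    print(f"Population % with broadband in [{low:.2f}%, {high:.2f}%] with probability {prob}")
```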
  • 38. 4) Sample Size
The sample size (n) can also be estimated, for a known s and a desired SEm, SEs or SE%, within the chosen confidence limits, as follows —
For the mean: n = (s/SEm)² at 0.682, = (2s/SEm)² at 0.954, and = (3s/SEm)² at 0.997 probability.
For the standard deviation: n = s²/(2·SEs²) at 0.682, = (2s)²/(2·SEs²) at 0.954, and = (3s)²/(2·SEs²) at 0.997 probability.
For a percentage: n = p·q/(SE%)² at 0.682, = 4p·q/(SE%)² at 0.954, and = 9p·q/(SE%)² at 0.997 probability.
Example: For s = 3.45 and SEm = 0.25 —
at 0.682 probability, n = (s/SEm)² = (3.45/0.25)² = 190;
at 0.954 probability, n = (2s/SEm)² = (2×3.45/0.25)² = 762;
at 0.997 probability, n = (3s/SEm)² = (3×3.45/0.25)² = 1714.
Note:
1. Sometimes it may so happen that n becomes larger than the population itself; this is the major drawback of the method.
2. It should therefore not be used without judgement, although it is commonly used in situations in which the population is too large to count.
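A short sketch of the sample-size calculation for the mean, reproducing the example above (s = 3.45, desired SEm = 0.25); rounding to the nearest whole unit gives the slide's figures of 190, 762 and 1714:

```python
s, target_se = 3.45, 0.25   # sample SD and the standard error of the mean we are willing to accept

for k, prob in [(1, 0.682), (2, 0.954), (3, 0.997)]:
    n = round((k * s / target_se) ** 2)        # n = (k·s / SEm)²
    print(f"required n ≈ {n} at probability {prob}")   # 190, 762, 1714
```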
  • 40. One thing about the “future” of which we can be “certain” is that it will be “utterly fantastic”.
1. Probability is the fundamental building block of inferential statistics.
2. It uses theory to make confidence statements about the characteristics of populations based on sample information, or to test hypotheses.
3. It provides a quantitative description of the likely occurrence of a particular event, expressed on a scale from 0 to 1 (or 0 to 100%).
4. A rare event has a probability of occurrence close to 0, while a very common event has a probability of occurrence close to 1.
5. We encounter probability statements on an almost daily basis, e.g., the chance of rain, or the chance of a major earthquake, etc.
6. Probabilities can be obtained in different ways. Some are purely subjective, based on ‘gut feeling’ or ‘best guess’, while others are either based on observation or derived from theory.
  • 41. A common way of obtaining probabilities is from data. The probability of an event occurring, written as P(E), is defined as the proportion of times the event occurs in a series of trials.
Example: to assess the probability of rain on a given day in Kolkata between November and February —
Total number of days: 4 months = 120 days
Observed rainy days = 88 days
Probability of rain (November to February): P(rain) = 88 / 120 = 0.73
Conclusion:
1. The probability of rain on any day during winter in Kolkata is 0.73, or 73%, and
2. The probability of a sunny day during this period in Kolkata is (1 – 0.73) = 0.27, or 27%.
Applications: rain, snow, monsoon, El Niño, soil fertility, cyclone, earthquake, slope failure, landslide, high tide, flood, river stage, species diversity, crop production, industrial production, population growth, migration, and trends, both temporal and spatial, etc.
  • 42. Probability Theory and the Normal Distribution
Concept of Probability
1. The post-quantitative-revolution period viewed the organisation of geographic elements over the earth’s surface, and that of spaces, as a matter of chance.
2. Geographical objects are true representations of multivariate situations; hence the application of probability statistics.
To understand this, the following basic concepts need to be explored first —
a) Random Experiment — one whose results depend on chance. When a coin is tossed, either head or tail appears, but the result cannot be predicted in advance as it depends on chance.
b) Outcome — the result of a random experiment is called an outcome.
c) Event — denotes what occurs in a random experiment. Events are of two types: 1. elementary (i.e., cannot be decomposed into simpler ones) and 2. composite (i.e., aggregates of several elementary events).
  • 43. d) Mutually Exclusive Events — when two or more events cannot occur simultaneously; such events can occur only one at a time.
e) Exhaustive Events — formed by the complete group of all possible events of a random experiment. For example, in coin tossing, the two events ‘head’ and ‘tail’ comprise an exhaustive set.
f) Trial — any particular performance of the random experiment.
g) Cases Favourable to an Event — in dice throwing, there are 6 possible outcomes; of these, 3 cases (1, 3, 5) are favourable to ‘odd number of points’ and 3 cases (2, 4, 6) are favourable to ‘even number of points’.
h) Equally Likely — when all outcomes occur with equal certainty, i.e., none of the outcomes is expected in preference to another.
[Tree diagram: elementary events for a family of three children (1st, 2nd, 3rd child each B or G), e.g., BBB, BBG, BGB, …]
  • 44. Definitions of Probability
(1) Classical Definition
If there are n possible outcomes (that are mutually exclusive, exhaustive and equally likely) in a random experiment, and m of these are favourable to an event A, then the probability of the event is P(A) = m / n.
1. When P(A) = 0, the event is impossible, i.e., m = 0; this occurs when none of the outcomes is favourable to the event (RARE EVENT).
2. When P(A) = 1, the event is certain, i.e., m = n; this occurs when all the outcomes are favourable to the event (ABSOLUTELY CERTAIN EVENT).
The classical definition fails when the number of possible outcomes is infinitely large. It has only limited application to coin tossing, dice throwing and similar games.
(2) Empirical Definition
In N trials of a random experiment, if an event occurs f times, its relative frequency is f/N. If this tends to a limiting value p as N → ∞, then p is called the ‘probability’ of the event. Thus, p = lim (f/N) as N → ∞.
  • 45. (3) Axiomatic Definition
a) It introduces ‘probability’ simply as a number associated with each event, based on certain axioms.
b) Thus, it embraces all situations; the classical theory is simply a special case of the axiomatic theory.
c) Let a random experiment have a ‘finite’ number (n) of possible ‘elementary outcomes’, e1, e2, …, en. Then —
i. the set S = {e1, e2, …, en} is called its ‘sample space’ and its elements ei are called sample points;
ii. the sets {ei} consisting of single elements are called ‘elementary events’, while those containing more than one are called ‘composite events’;
iii. the ‘null set’ Ф is called the impossible event; and
iv. the ‘universal set’ is called the sure event.
d) Let the real numbers p1, p2, …, pn correspond to the elementary events {e1}, {e2}, …, {en} respectively, such that pi ≥ 0 and ∑pi = 1.
e) The numbers pi are called the probabilities assigned to the elementary events Ai = {ei}.
f) The probability of any event A (a subset of the sample space) is then given by the sum of the probabilities associated with the outcomes belonging to that event. Hence, P(A) = p1 + p2 + … summed over the outcomes in A.
  • 46. Probability Distribution
The probability distribution of a random variable is a statement that specifies the set of its possible values together with their respective probabilities.
1) Discrete Probability Distribution
Let a discrete random variable X assume the values x1, x2, …, xn with probabilities p1, p2, …, pn respectively, where ∑pi = 1. The specification of the set of values xi together with their probabilities pi (i = 1, 2, 3, …, n) defines the discrete probability distribution of X.
Mathematically, f(x) = probability that X assumes the value x = P(X = x).
The function f(x) is called the probability mass function (p.m.f.). It satisfies two conditions —
a) f(x) ≥ 0, and
b) ∑ f(x) = 1.
  • 47. a) Uniform Distribution — when a discrete random variable assumes n possible values with equal probabilities, the probability becomes a constant value 1/n. The p.m.f. is given by f(x) = 1/n.
b) Binomial Distribution — it is defined by the p.m.f. f(x) = nCx · p^x · q^(n–x) (x = 0, 1, 2, …, n), where p and q are positive fractions (p + q = 1).
c) Poisson Distribution — it is defined by the p.m.f. f(x) = e^(–m) · m^x / x! (x = 0, 1, 2, …), where m is the ‘parameter’ of the Poisson distribution (always positive) and e is a mathematical constant (approximately 2.718) given by the infinite series e = 1 + (1/1!) + (1/2!) + ……
(2) Continuous Probability Distribution
Let x be a continuous random variable that can take any value in the interval (a, b), i.e., a ≤ x ≤ b. As the number of possible values of x is infinite, probabilities cannot be assigned to each individual value; they are assigned to intervals. For a continuous probability distribution, let f(x) be a non-negative function such that —
P(c ≤ x ≤ d) = ∫ from c to d of f(x) dx.
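The binomial and Poisson p.m.f.s above can be evaluated with the Python standard library alone; the trial numbers and parameters below are invented for illustration:

```python
from math import comb, exp, factorial   # math.comb requires Python 3.8+

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a Binomial(n, p) variable: nCx * p**x * q**(n - x)."""
    q = 1.0 - p
    return comb(n, x) * p**x * q**(n - x)

def poisson_pmf(x: int, m: float) -> float:
    """P(X = x) for a Poisson variable with parameter m: e**(-m) * m**x / x!."""
    return exp(-m) * m**x / factorial(x)

# Illustrative values (not taken from the slides): 10 trials with p = 0.3, and a Poisson mean of 2.
print(binomial_pmf(3, n=10, p=0.3))   # ~0.2668
print(poisson_pmf(2, m=2.0))          # ~0.2707
```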
  • 48. The function f(x) is called the probability density function (p.d.f.), or simply the density function, of x. It satisfies two conditions —
a) f(x) ≥ 0, and
b) ∫ from a to b of f(x) dx = 1.
The curve represented by the equation y = f(x) is known as the probability curve. Geometrically, the integral of the p.d.f. represents the area under the probability curve over the interval (c, d) within the range (a, b).
  • 49. a) Uniform Distribution — if the probabilities associated with intervals of equal length are equal in any part of the range, the distribution is called uniform. It is defined by the p.d.f. f(x) = 1/(b – a), a ≤ x ≤ b. It is also called a rectangular distribution, as it looks like a rectangle of height 1/(b – a) over the range a ≤ x ≤ b.
b) Normal Distribution — also called the Gaussian distribution, it is defined by the p.d.f.
f(x) = [1 / (σ√(2π))] · e^(–(x – μ)² / (2σ²)), –∞ < x < ∞,
where μ = mean, σ = standard deviation, and π and e are mathematical constants.
1) The probability curve of the normal distribution is known as the normal curve; it is bell-shaped and symmetrical about the line x = μ, and its two tails extend indefinitely on either side.
2) The maximum ordinate lies at x = μ and is given by y = 1 / (σ√(2π)).
  • 50. 1. In a Normal Distribution, mean = median = mode. 2. Pairs of corresponding fractiles (e.g., the quartiles Q1 and Q3, the deciles D1 and D9) lie equidistant from the mean; quartile deviation ≈ 0.6745σ (≈ 0.67σ) and mean deviation ≈ 0.7979σ (≈ 0.80σ). 3. Skewness = 0 and excess kurtosis = 0 (mesokurtic). 4. The points of inflection of the normal curve lie at x = μ ± σ. 5. The standard score of x is given by z = (x – μ)/σ; it has the p.d.f. p(z) = [1/√(2π)] e^(–z²/2), –∞ < z < ∞. 1. In geography, large data sets often approximate a normal distribution. 2. In sampling theory, many statistics based on large samples approximately follow a normal distribution, thereby simplifying the testing of statistical hypotheses and the construction of confidence limits for parameters. [Figure: Area under the Normal Curve]
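A short sketch of the standard score and of areas under the normal curve, using the standard-normal c.d.f. Φ(z) = ½[1 + erf(z/√2)] from Python's math module; the rainfall mean and standard deviation are hypothetical figures chosen only for illustration.

```python
import math

def phi(z):
    """Standard normal c.d.f.: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical figures: annual rainfall with mean 1200 mm and s.d. 150 mm.
mu, sigma = 1200.0, 150.0
x = 1350.0
z = (x - mu) / sigma                   # standard score
print(z)                               # 1.0

# Area under the normal curve within mu +/- k*sigma (the 68-95-99.7 rule).
for k in (1, 2, 3):
    print(k, phi(k) - phi(-k))         # ~0.683, ~0.954, ~0.997

# Quartile deviation ~ 0.6745 * sigma: the central 50% of values lie within it.
print(phi(0.6745) - phi(-0.6745))      # ~0.50
```

The printed areas reproduce the familiar 68–95–99.7 percentages and the 0.6745σ quartile deviation noted above.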
  • 51. 1. Barber, G. M. (1988): Elementary Statistics for Geographers, The Guilford Press, London 2. Berry, B. J. L. and Marble, D. F. (eds., 1968): Spatial Analysis - a reader in statistical geography, Englewood Cliffs, NJ 3. Clark, W. A. V. and Hosking, P. (1986): Statistical Methods for Geographers, John Wiley, NY 4. Cressie, N. A. C. (1993): Statistics for Spatial Data, John Wiley & Sons, NY 5. Ebdon, D. (1977): Statistics in Geography - a practical approach, Basil Blackwell, Oxford 6. Gregory, S. (1963): Statistical Methods and the Geographer, Longman, London 7. Haggett, P., Cliff, A. D. and Frey, A. (1977): Locational Methods, Vols. I & II, Edward Arnold, London 8. Haggett, P. and Chorley, R. J. (1969): Network Analysis in Geography, Edward Arnold, London 9. Hammond, R. and McCullagh, P. S. (1974): Quantitative Techniques in Geography, Clarendon Press, Oxford 10. Harvey, D. (1969): Explanation in Geography, Edward Arnold, London 11. Johnston, R. J. (1978): Multivariate Statistical Analysis in Geography, Longman, London 12. King, L. J. (1969): Statistical Analysis in Geography, Prentice Hall, Englewood Cliffs, NJ 13. Kitanidis, P. K. (1997): Introduction to Geostatistics, Cambridge University Press List of Further Reading
  • 52. 14. Limb, M. and Dwyer, C. (eds., 2001): Qualitative Methodologies for Geographers, Arnold, London 15. Lindsay, J. M. (1997): Techniques in Human Geography, Routledge 16. Matthews, M. H. and Foster, I. D. L. (1989): Geographical Data - sources, presentation and analysis, OUP 17. O’Brien, L. (1992): Introducing Quantitative Geography, Routledge, London 18. Ripley, B. D. (1981): Spatial Statistics, Wiley, NY 19. Robinson, G. (1998): Methods and Techniques in Human Geography, Wiley, NY 20. Rogerson, P. A. (2001): Statistical Methods for Geography, Sage, London 21. Shaw, R. L. and Wheeler, D. (1985): Statistical Techniques in Geographical Analysis, John Wiley & Sons, NY 22. Streich, T. A. (1986): Geographic Data Processing - an overview, California Univ. Press, Santa Barbara 23. Taylor, P. J. (1977): Quantitative Methods in Geography - an introduction to spatial analysis, Houghton Mifflin, Boston 24. Unwin, D. (1981): Introductory Spatial Analysis, Methuen, New York 25. Walford, N. (2002): Geographical Data - characteristics and sources, Wiley, NJ 26. Worthington, B. D. R. and Gont, R. (1975): Techniques in Map Analysis, Macmillan, London 27. Wrigley, N. and Bennett, R. J. (eds., 1981): Quantitative Geography, Methuen, London List of Further Reading
  • 54. Ethics in Statistical Geography 1. Be Honest with Data Enumeration, Measurement and Collection 2. Be Wise while selecting the Statistical Technique(s) for your intended Purpose 3. Explore the Results, observe the Geographical Associations and then draw the Statistical Inferences 4. Be Precise and very Simple while translating the “Language of Statistics” into the “Language of Geography” Thank You! Looking for a Publication in a Peer-Reviewed Journal? Visit: www.indiansss.org (Indian Journal of Spatial Science); Contact: editorijss2012@gmail.com