Watershed Analysis
Lab 8
Lecture 14 1
Basin
• Creates a raster delineating all drainage
basins.
• All cells in the raster will belong to a basin,
even if that basin is only one cell.
• The drainage basins are delineated within
the analysis window by identifying ridge
lines between basins.
• The input flow direction raster is analyzed
to find all sets of connected cells that
belong to the same drainage basin.
• The drainage basins
are created by locating
the pour points at the
edges of the analysis
window (where water
would pour out of the
raster), as well as
sinks, then identifying
the contributing area
above each pour point.
This results in a raster
of drainage basins.
Lab 8 Data
• The report sheet is on the website with the
instructions.
• Three files will be downloaded:
– 2 from the NY GIS Clearinghouse
– 1 from Cornell University
• The DEM will be obtained from ArcGIS
Online.
Step 13 – DEM (30 arc-second)
Measuring in Arc-Seconds
• Some USGS DEM data is stored in a
format that utilizes three, five, or 30 arc-
seconds of longitude and latitude to
register cell values.
• The geographic reference system treats
the globe as if it were a sphere divided into
360 equal parts called degrees.
• Each degree is subdivided into 60
minutes. Each minute is composed of 60
seconds.
• The ground length of an arc-second of latitude
remains nearly constant, while the ground length
of an arc-second of longitude shrinks in proportion
to the cosine of the latitude as one moves toward
the earth's poles.
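The cosine relationship can be illustrated with a short calculation. This is only a sketch: it assumes a spherical earth with a mean radius of 6,371 km, and the function name and the sample latitudes are hypothetical.

```python
import math

# Assumption: spherical earth with a mean radius of 6,371 km.
EARTH_RADIUS_M = 6_371_000.0

def arcsecond_ground_size(latitude_deg, arcseconds=1.0):
    """Return (north-south, east-west) ground length in meters of an arc of the given size."""
    angle_rad = math.radians(arcseconds / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    # East-west length shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

for lat in (0, 42, 60):
    ns, ew = arcsecond_ground_size(lat, arcseconds=30)
    print(f"latitude {lat:2d}: 30 arc-seconds = {ns:6.1f} m N-S, {ew:6.1f} m E-W")
```

Under these assumptions a 30 arc-second cell is roughly 900 m tall everywhere but only half as wide at 60 degrees latitude as at the equator.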
Processing of DEM
• Raster clip – to the buffered park
boundary.
• Raster projection – from Geographic to
UTM Zone 18N, NAD83.
• Resample – bilinear interpolation.
• Change the cell size – from 30 arc-seconds
to 30 meters.
Raster Geometry and Resampling
• Data must often be resampled when
converting between coordinate systems
or changing the cell size of a raster data
set.
• Common methods:
– Nearest neighbor
– Bilinear interpolation
– Cubic convolution
Resampling: Bilinear Interpolation
• Distance-weighted averaging
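As a rough sketch of the distance-weighted averaging behind bilinear interpolation, the function below blends the four nearest cell values. It assumes non-negative fractional column/row coordinates that fall inside the grid; the grid values and the function name are hypothetical, and this is not the resampling code used by any particular GIS package.

```python
def bilinear(grid, x, y):
    """Interpolate a value at fractional column x and row y of a 2D list of cell values."""
    x0, y0 = int(x), int(y)
    # Clamp the far neighbors so positions on the last row/column still work.
    x1 = min(x0 + 1, len(grid[0]) - 1)
    y1 = min(y0 + 1, len(grid) - 1)
    fx, fy = x - x0, y - y0
    # Blend along the row direction first, then between the two rows.
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

elevations = [[100.0, 110.0],
              [120.0, 140.0]]
print(bilinear(elevations, 0.5, 0.5))   # equal weights on all four cells
print(bilinear(elevations, 0.25, 0.0))  # on the top row, closer to the left cell
```

Nearest neighbor would instead copy the single closest cell value, which is why it is usually preferred for categorical rasters while bilinear interpolation suits continuous surfaces such as elevation.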
Step 17 - DEM
Hydrological Modeling
[Workflow diagram. Node labels: DEM_UTM83, FlowDir, Sinks, Filled, FlowDir2, FlowAcc, Reclass, Net, Source, Watershed, Con, Stream Link]
Flow Direction
• Creates a raster of flow direction from
each cell to its steepest downslope
neighbor.
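A minimal sketch of the steepest-downslope-neighbor idea is shown below. It uses the eight direction codes of an ArcGIS-style flow direction raster (1 = east, doubling clockwise to 128 = northeast). Returning 0 when no neighbor is lower is a simplification made here, not the tool's behavior for sinks and flat areas, and the sample DEM and 30 m cell size are hypothetical.

```python
import math

# D8 codes listed as (row offset, column offset, code); rows increase southward.
D8 = [(0, 1, 1), (1, 1, 2), (1, 0, 4), (1, -1, 8),
      (0, -1, 16), (-1, -1, 32), (-1, 0, 64), (-1, 1, 128)]

def flow_direction(dem, row, col, cell_size=30.0):
    """Return the D8 code of the steepest downslope neighbor, or 0 if none is lower."""
    best_code, best_slope = 0, 0.0
    for dr, dc, code in D8:
        r, c = row + dr, col + dc
        if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
            # Diagonal neighbors are farther away by a factor of sqrt(2).
            distance = cell_size * math.hypot(dr, dc)
            slope = (dem[row][col] - dem[r][c]) / distance
            if slope > best_slope:
                best_code, best_slope = code, slope
    return best_code

dem = [[50, 48, 47],
       [49, 45, 40],
       [48, 44, 38]]
print(flow_direction(dem, 1, 1))
```

Dividing the elevation drop by the distance matters: the southeast neighbor of the center cell has the largest drop, but the east neighbor has the steeper slope.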
flowDir
Sinks
• A sink is a cell or set of spatially connected cells
whose flow direction cannot be assigned one of
the eight valid values in a flow direction raster.
• This can occur when all neighboring cells are
higher than the processing cell or when two cells
flow into each other, creating a two-cell loop.
• To create an accurate representation of flow
direction and, therefore, accumulated flow, it is
best to use a dataset that is free of sinks.
• A digital elevation model (DEM) that has been
processed to remove all sinks is called a
depressionless DEM.
From ArcGIS 10 Desktop Help
High pass filters
Return:
• Small values where values change smoothly
• Large positive values when centered on a
spike
• Large negative values when centered on a
pit
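The three behaviors listed above can be checked with a small example. The 3x3 kernel used here (8 at the center, -1 elsewhere) is a generic zero-sum high-pass kernel chosen for illustration; it is an assumption and not necessarily the kernel used in the lab software.

```python
# Generic zero-sum high-pass kernel (an illustrative assumption).
KERNEL = [[-1, -1, -1],
          [-1,  8, -1],
          [-1, -1, -1]]

def high_pass(grid, row, col):
    """Apply the 3x3 kernel centered on an interior cell (row, col)."""
    return sum(KERNEL[dr + 1][dc + 1] * grid[row + dr][col + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))

flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
spike = [[10, 10, 10], [10, 20, 10], [10, 10, 10]]
pit = [[10, 10, 10], [10, 2, 10], [10, 10, 10]]

for name, grid in (("flat", flat), ("spike", spike), ("pit", pit)):
    print(name, high_pass(grid, 1, 1))
```

A strongly negative response is one simple way to flag candidate pits in a DEM before running a sink analysis.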
Step 18 - Sinks
Fill
• Fills sinks in a surface raster to remove
small imperfections in the data.
• Sinks (and peaks) are often errors due to
the resolution of the data or rounding of
elevations to the nearest integer value.
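To make the idea of sinks and filling concrete, here is a deliberately simplified sketch. It only handles single-cell sinks in the interior of the grid (a cell lower than all eight neighbors) and raises each one to its lowest neighbor. The real Fill tool also handles multi-cell depressions, so treat this as an illustration under those assumptions; the function names and sample grid are hypothetical.

```python
OFFSETS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]

def find_sinks(dem):
    """Return (row, col) of interior cells that are lower than all eight neighbors."""
    sinks = []
    for r in range(1, len(dem) - 1):
        for c in range(1, len(dem[0]) - 1):
            if all(dem[r][c] < dem[r + dr][c + dc] for dr, dc in OFFSETS):
                sinks.append((r, c))
    return sinks

def fill_single_cell_sinks(dem):
    """Return a copy of the DEM with each single-cell sink raised to its lowest neighbor."""
    filled = [row[:] for row in dem]
    for r, c in find_sinks(dem):
        filled[r][c] = min(dem[r + dr][c + dc] for dr, dc in OFFSETS)
    return filled

dem_with_pit = [[5, 5, 5, 5],
                [5, 2, 4, 5],
                [5, 4, 4, 5],
                [5, 5, 5, 5]]
print(find_sinks(dem_with_pit))
print(fill_single_cell_sinks(dem_with_pit))
```

After filling, the former pit sits level with its lowest neighbor, so it no longer traps flow in this simplified model.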
Step 19 - Fill
flowDir2
Flow Accumulation
From ArcGIS 10 Desktop Help
Flow Accumulation
Step 21 before inverting
After inverting
Conditional
• The results of Flow Accumulation can be used to
create a stream network by applying a threshold
value to select cells with a high accumulated flow.
• For example, to create a raster where the value 1
represents the stream network on a background of
NoData, perform a conditional operation with the
Con tool using the following settings:
– Input conditional raster: Flowacc
– Expression: Value > 50000
– Input true raster or constant: 1
From ArcGIS 10 Desktop Help
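The logic of this conditional step can be sketched in plain Python, without the GIS software. Here NoData is represented by None and the flow accumulation raster by a small list of lists; both choices, the function name, and the sample values are assumptions for illustration only.

```python
NODATA = None  # stand-in for the raster NoData value

def con_threshold(flow_acc, threshold=50000, true_value=1):
    """Return a raster with true_value where accumulation exceeds the threshold, else NoData."""
    return [[true_value if (v is not None and v > threshold) else NODATA
             for v in row]
            for row in flow_acc]

flow_acc = [[10, 60000],
            [50000, 50001]]
print(con_threshold(flow_acc))
```

Because the expression is a strict "greater than", a cell with exactly 50,000 contributing cells is not part of the stream network. Lowering the threshold produces a denser network.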
Net50K
Stream Link
• Assigns unique values to
sections of a raster linear
network between intersections.
• Links are the sections of a
stream channel connecting two
successive junctions, a junction
and the outlet, or a junction and
the drainage divide.
From ArcGIS 10 Desktop Help
Watershed 50K
Watershed Clipped to Park
Boundary
Data Quality Issues
Ch. 14
Introduction
• Spatial data and analysis standards are
important because of the range of organizations
producing and using spatial data, and the
amount of data transferred between these
organizations.
• There are several types of standards:
– Data standards
– Interoperability standards
– Analysis standards
– Professional and certification standards
Introduction (continued)
• National and international standards
organizations are important in defining and
maintaining geospatial standards:
– Federal Geographic Data Committee (FGDC) which
focuses on the national spatial data infrastructure
(www.fgdc.gov)
– International Spatial Data Standards Commission
which is a clearing house and gateway for
international standards
– Open Geospatial Consortium (OGC) which is
developing interoperability standards. Web Mapping
Service (WMS) standards are an example.
The Geospatial Competency Model
GIS Professional Certification
URISA is the founding member of the GIS
Certification Institute, the organization that
administers professional certification for the field
and is dedicated to advancing the industry.
The minimum number of points needed to become a certified GIS
Professional as detailed in the three point schedules given below is 150
points. Thus, all applicants are expected to document achievements valued at a
minimum of 150 points. To ensure that applicants have a broad foundation, specific
minimums in each of the three achievement categories must be met or
exceeded. These minimums are as follows:
Education: 30 Points
Experience: 60 Points
Contributions: 8 Points
The additional 52 points can be counted from any of the three categories.
University Certificates
• UMM – undergraduate
• USM undergrad/grad
• UM – graduate
• Penn State – graduate
• University of Denver
• University of Southern California
• George Mason University
Spatial Data Standards
• Data – measurements and observations
• Data quality – a measure of the fitness for
use of data for a particular task (Chrisman,
1994).
• It is the responsibility of the user to ensure
that the data are fit for the task.
• Metadata – data about the data
Spatial Data Standards
• Spatial Data Standards – methods for structuring,
describing and delivering spatially-referenced data.
• Media Standards – the physical form of the data
(CD/download etc).
• Format Standards – specify data file components
and structures. These standards aid in data
transfer.
• Spatial Data Accuracy Standards – document the
quality of the positional and attribute accuracy.
• Document Standards – define how we describe
spatial data.
GIS Is Not Perfect
A GIS cannot perfectly represent the world for many
reasons, including:
• The world is too complex and detailed.
• The data structures or models (raster, vector, or
TIN) used by a GIS to represent the world are not
discriminating or flexible enough.
• We make decisions (how to categorize data, how
to define zones) that are not always fully informed
or justified.
• It is impossible to make a perfect representation
of the world, so uncertainty is inevitable
• Uncertainty degrades the quality of a spatial
representation
Concepts Related to Data
Quality
• Related to individual data sets:
– Errors – flaws in data
– Accuracy – the extent to which an estimated
value approaches the true value.
– Precision – the recorded level of detail of your
data.
– Bias – the systematic variation of the data
from reality.
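The difference between these concepts can be illustrated numerically. The sketch below summarizes bias as the mean signed error and accuracy as the root mean square error (RMSE). Both are common choices but are assumptions here, since the slide does not name specific statistics, and the checkpoint values are made up for illustration.

```python
import math

def bias(measured, truth):
    """Mean signed error: a systematic offset shows up as a value far from zero."""
    return sum(m - t for m, t in zip(measured, truth)) / len(measured)

def rmse(measured, truth):
    """Root mean square error: one common summary of accuracy."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(measured, truth)) / len(measured))

# Hypothetical elevations (meters) at four checkpoints.
surveyed = [100.0, 100.0, 100.0, 100.0]
from_dem = [101.2, 100.9, 101.1, 101.0]

print(round(bias(from_dem, surveyed), 2), round(rmse(from_dem, surveyed), 2))
# Precision concerns how finely a value is recorded, not how correct it is.
print(round(101.2468, 1))
```

In this made-up example the errors are all positive and similar in size, so most of the inaccuracy is bias, which could be removed by a constant correction. Recording more decimal places would add precision without improving accuracy.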
Concepts Related to Data
Quality
• Related to source data:
– Resolution – the smallest feature in the data
set that can be displayed.
– Generalization – simplification of objects in the
real world to produce scale models and maps.
Resolution and generalization of raster datasets
Scale-related generalization
Data Sets Used for Analysis
Must be:
– Complete – spatially and temporally
– Compatible – same scale, units of measure,
measurement level
– Consistent – both within and between data
sets.
– And Applicable for the analysis being
performed.
A Conceptual View of Uncertainty
Real World
Conception
Source Data, Measurements &
Representation
Data Conversion and Analysis
Result
Uncertainty in The Conception of
Geographic Phenomena
Many spatial objects are not well defined or their
definition is to some extent arbitrary, so that people
can reasonably disagree about whether a particular
object is x or not. There are at least four types of
conceptual uncertainty:
– Spatial uncertainty
– Vagueness
– Ambiguity
– Regionalization problems
Spatial uncertainty occurs when objects do not
have a discrete, well defined extent.
• They may have indistinct boundaries.
• They may have impacts that extend beyond
their boundaries.
• They may simply be statistical entities.
• The attributes ascribed to spatial objects may
also be subjective.
Spatial uncertainty
• Vagueness occurs when the criteria that
define an object as x are not explicit or
rigorous.
• For example:
– In a land cover analysis, how many oaks (or
what proportion of oaks) must be found in a
tract of land to qualify it as oak woodland?
– What incidence of crime (or resident
criminals) defines a high crime neighborhood?
Vagueness (obscureness)
Ambiguity
Ambiguity occurs when y is used as a substitute,
or indicator, for x because x is not available.
• The link between direct indicators and the
phenomena for which they substitute is
straightforward and fairly unambiguous (soil
nutrients for crop yield).
• Indirect indicators tend to be more ambiguous and
opaque (wetlands as an indicator of species
diversity).
• Of course, indicators are not simply direct or
indirect; they occupy a continuum. The more
indirect they are, the greater the ambiguity.
• Regional geography is largely founded on the
creation of a mosaic of zones that make it easy
to portray spatial data distributions.
• A uniform zone is defined by the extent of a
common characteristic, such as climate,
landform, or soil type.
• Functional zones are areas that delimit the
extent of influence of a facility or feature—for
example, how far people travel to a shopping
center or the geographic extent of support for a
football team.
• Regionalization problems occur because zones
are artificial.
Regionalization problems
Uncertainty in the measurement of
geographic phenomena
Error occurs in physical measurement of
objects. This error creates further
uncertainty about the true nature of spatial
objects.
– Physical measurement error
– Digitizing error
– Error caused by combining data sets with
different lineages
Physical measurement error
• Instruments and procedures used to make
physical measurements are not perfectly
accurate.
• In addition, the earth is not a perfectly stable
platform from which to make measurements.
Seismic motion, continental drift, and the
wobbling of the earth's axis cause physical
measurements to be inexact (e.g., GPS error and
remote sensing error).
Digitizing Error
A great deal of spatial
data has been
digitized from paper
maps.
Any digitized map requires:
Considerable post-processing
Check for missing features
Connect lines
Remove spurious polygons
Some of these steps can be automated
Error caused by combining data
sets with different lineages
• Data sets produced by different agencies or vendors
may not match because different processes were
used to capture or automate the data.
– For example, buildings in one data set may appear on the
opposite side of the street in another data set.
– Error may also be caused by combining sample and
population data or by using sample estimates that are not
robust at fine scales.
– "Lifestyle" data are derived from shopping surveys and
provide business and service planners with up-to-date
socioeconomic data not found in traditional data sources
like the census. Yet the methods by which lifestyle data
are gathered and aggregated to zones or are compared to
census data may not be scientifically rigorous
Uncertainty in the representation of
geographic phenomena
• Representation is closely related to measurement.
• Representation is not just an input to analysis, but
sometimes also the outcome of it. For this reason, we
consider representation separately from measurement.
– The world is infinitely complex, but computer systems are finite.
– Representation is all about the choices that are made in capturing
knowledge about the world
– Uncertainty in earth model: ellipsoid models, datum, projection
types
– Uncertainty in the raster data model (structure)
– Uncertainty in the vector data model (structure)
• The raster structure partitions space into square cells of
equal size.
• Spatial objects x, y, and z emerge from cell classification, in
which Cell A1 is classified as x, Cell A2 as y, Cell A3 as z,
and so on, until all cells are evaluated.
• A spatial object x can be defined as a set of contiguous cells
classified as x.
• But not all of the area covered by a cell is necessarily x.
• These impure cells are termed mixed pixels or "mixels."
• Because a cell can hold only one value, a mixel must be
classified as if it were all one thing or another. Therefore, the
raster structure may distort the shape of spatial objects.
Uncertainty in the raster data
structure
Raster – The Mixed Pixel Problem
Landcover map –
Two classes, land or
water
Cell A is
straightforward
What category should be
assigned to B, C, or D?
Error in raster
• Because of the distortions due to flattening, cells in a raster can never be
perfectly equal in size on the Earth's surface.
• When information is represented in raster form, all detail about variation within
cells is lost, and instead the cell is given a single value. Common rules for
assigning that value are largest share, central point (e.g., USGS DEM), and
mean value (e.g., remote sensing imagery).
[Figure: example cells containing areas with values 8 and 6, showing the value assigned by the largest share, central point, and mean value rules]
Mean value examples (area-weighted):
8x(1/6)+6x(5/6)=6.33
8x(3/4)+6x(1/4)=7.5
8x(1/7)+6x(6/7)=6.29
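The area-weighted arithmetic above can be reproduced with a few lines of code. In this sketch each cell is described as a list of (value, area fraction) pairs, which is a hypothetical representation chosen for illustration. The central point rule is omitted because it depends on the cell geometry rather than on area fractions.

```python
def largest_share(shares):
    """Return the value that covers the largest fraction of the cell."""
    return max(shares, key=lambda pair: pair[1])[0]

def mean_value(shares):
    """Return the area-weighted mean of the values inside the cell."""
    return sum(value * fraction for value, fraction in shares)

cells = [
    [(8, 1 / 6), (6, 5 / 6)],
    [(8, 3 / 4), (6, 1 / 4)],
    [(8, 1 / 7), (6, 6 / 7)],
]
for shares in cells:
    print(largest_share(shares), round(mean_value(shares), 2))
```

The two rules can disagree noticeably: largest share always returns one of the original values, while the mean can return a value that occurs nowhere in the cell.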
Figure 10.8 Problems with remotely sensed imagery: (left) example of a satellite image
with cloud cover (A), shadows from topography (B), and shadows from cloud cover
(C); (right) an urban area showing a building leaning away from the camera
Source: Ian Bishop (left) and Google UK (right)
• Socioeconomic data—facts about people, houses,
and households—are often best represented as
points.
• For various reasons (to protect privacy, to limit data
volume), data are usually aggregated and reported
at a zonal level, such as census tracts or ZIP
Codes.
• This distorts the data in two ways:
– First, it gives them a spatially inappropriate representation
(polygons instead of points);
– Second, it forces the data into zones whose boundaries
may not respect natural distribution patterns.
Uncertainty in the vector data
structure
Map scale       Ground distance, accuracy, or resolution
                (corresponding to 0.5 mm map distance)
1:1,250         0.625 m
1:2,500         1.25 m
1:5,000         2.5 m
1:10,000        5 m
1:24,000        12 m
1:50,000        25 m
1:100,000       50 m
1:250,000       125 m
1:1,000,000     500 m
1:10,000,000    5 km
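Each row of the table follows from multiplying the 0.5 mm map distance by the scale denominator. A minimal sketch of that conversion, with a hypothetical function name:

```python
def ground_distance_m(scale_denominator, map_distance_mm=0.5):
    """Convert a distance measured on the map (mm) to ground distance (m)."""
    return scale_denominator * map_distance_mm / 1000.0

for denominator in (1_250, 24_000, 250_000, 10_000_000):
    print(f"1:{denominator:,} -> {ground_distance_m(denominator)} m")
```

The same function can be used with a different map distance, for example a digitizing tolerance, to estimate the positional uncertainty that a given source scale implies.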
Map Representation Error
Uncertainty in the data conversion and
analysis of geographic phenomena
Uncertainties in data lead to uncertainties in the results of
analysis. Data conversion and spatial analysis methods
can create further uncertainty:
• Data conversion error
• Georeferencing and resampling
• Projection and datum conversions
• The ecological fallacy
• The modifiable areal unit problem (MAUP)
• Classification errors
Ecological Fallacy Example
MAUP Example
Classification error and
quality check
Selecting
ROIs
Alfalfa
Cotton
Grass
Fallow
Background:
ETM+, 7/15/01
Top image:
IKONOS, Oct, 2000
Classification Result
Confusion Matrix
Rows: class assigned by the classifier. Columns: ground truth.
              Grass  Alfalfa  Cotton  Chili  Fallow (corn)  Total  User accuracy (%)
Grass           110       22       0      0       0           132    83.3
Alfalfa           5      105       0      0       0           110    95.5
Cotton            0        0     945      5       0           950    99.5
Chili             0        0      50     42       0            92    45.7
Fallow            0        0       0      0     484           484   100
Total           115      127     995     47     484          1768
Producer
accuracy (%)   95.6     82.7    95.0   89.4     100
Overall accuracy = (110 + 105 + 945 + 42 + 484) / 1768 = 1686 / 1768 = 95.4%
Kappa index = (1686 - (132x115 + 110x127 + 950x995 + 92x47 + 484x484) / 1768)
              / (1768 - (132x115 + 110x127 + 950x995 + 92x47 + 484x484) / 1768)
            = 92.4% (shown as 89.3% on the original slide)
• Producer accuracy is a measure indicating the probability
that the classifier has labeled an image pixel into Class A
given that the ground truth is Class A.
• User accuracy is a measure indicating the probability that a
pixel is Class A given that the classifier has labeled the
pixel into Class A
• Overall accuracy is total classification accuracy.
• Kappa index (another measure of overall accuracy) is a
more useful index for evaluating accuracy because it
discounts the agreement expected by chance.
– Errors of commission represent pixels that belong to another class
but are labeled as belonging to the class.
– Errors of omission represent pixels that belong to the ground truth
class but that the classification technique has failed to assign
to the proper class.
Bases of Confusion Matrix
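The measures defined above can be computed directly from the cell counts of the confusion matrix. The sketch below assumes that rows hold the class assigned by the classifier and columns hold the ground truth, which is the orientation implied by the user and producer accuracy definitions. The function name is hypothetical, and kappa is computed with the standard chance-agreement formula.

```python
classes = ["Grass", "Alfalfa", "Cotton", "Chili", "Fallow"]
matrix = [
    [110, 22, 0, 0, 0],
    [5, 105, 0, 0, 0],
    [0, 0, 945, 5, 0],
    [0, 0, 50, 42, 0],
    [0, 0, 0, 0, 484],
]

def accuracy_report(matrix):
    """Return (user accuracies, producer accuracies, overall accuracy, kappa)."""
    n = sum(sum(row) for row in matrix)
    size = len(matrix)
    diagonal = sum(matrix[i][i] for i in range(size))
    row_totals = [sum(row) for row in matrix]        # pixels labeled as each class
    col_totals = [sum(col) for col in zip(*matrix)]  # ground truth pixels per class
    user = [matrix[i][i] / row_totals[i] for i in range(size)]
    producer = [matrix[i][i] / col_totals[i] for i in range(size)]
    overall = diagonal / n
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - chance) / (1 - chance)
    return user, producer, overall, kappa

user, producer, overall, kappa = accuracy_report(matrix)
for name, u, p in zip(classes, user, producer):
    print(f"{name:8s} user {u:6.1%}  producer {p:6.1%}")
print(f"overall {overall:.1%}, kappa {kappa:.1%}")
```

One minus the user accuracy of a class is its commission error rate, and one minus the producer accuracy is its omission error rate.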
Finding and Modeling Errors
• Checking for errors
– Visual inspection during data editing and
cleaning.
– Attributes can be checked by using
annotation, line colors and patterns.
– Double digitizing
– Statistical analysis may identify extreme
values of attributes.
Finding and Modeling Errors
• Error modeling
– 1. Epsilon modeling
• Based on a method of line generalization, and
adapted by Blakemore.
• It places an error band around a digitized line,
describing the probable distribution of error.
• Error distribution is subject to debate:
– Normal curve
– Piecewise quartile distribution
– Bimodal
• The epsilon band can be used in analyses to
improve the confidence of the user in the result.
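One simple way to use an epsilon band in an analysis is to test whether a point lies within epsilon of a digitized line, in which case its relationship to that line is uncertain. The sketch below treats the band as a fixed-width buffer, which is a simplification of the error distributions mentioned above. The function names, coordinates, and epsilon value are hypothetical.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_epsilon_band(point, line, epsilon):
    """True if the point lies within epsilon of any segment of the digitized line."""
    return min(distance_to_segment(point, a, b)
               for a, b in zip(line, line[1:])) <= epsilon

boundary = [(0, 0), (10, 0), (10, 10)]
print(within_epsilon_band((5, 1), boundary, epsilon=2))  # close to the first segment
print(within_epsilon_band((5, 5), boundary, epsilon=2))  # well away from the line
```

Points inside the band could be reported as "possibly in" or "possibly out" of a polygon rather than given a definite answer.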
Figure 10.17 Point-in-polygon categories of containment
Source: Blakemore (1984)
Finding and Modeling Errors
• Error modeling
– 2. Monte Carlo simulation – used in
overlays.
• Simulates input data error by adding random noise
to the line coordinates of the map data.
• Each input is assumed to be characterized by an
estimate of positional error.
• This changes the shape of the line.
• The process is repeated multiple times and the
randomized data put through the GIS analyses.
• Output:
– A number
– A map
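The steps above can be sketched for a single line feature, with line length standing in for "the GIS analyses". The positional error is modeled as independent Gaussian noise on each coordinate. That choice, the function names, and the sample coordinates are assumptions for illustration; a real study would use an error model appropriate to the data.

```python
import math
import random
import statistics

def line_length(coords):
    """Sum of straight-line distances between successive vertices."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))

def simulate_lengths(coords, sigma, runs=1000, seed=42):
    """Perturb every vertex with Gaussian noise and recompute the length each run."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    lengths = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in coords]
        lengths.append(line_length(noisy))
    return lengths

road = [(0, 0), (10, 0), (10, 10)]
lengths = simulate_lengths(road, sigma=0.5)
print("original length:", line_length(road))
print("simulated mean:", round(statistics.mean(lengths), 2))
print("simulated std dev:", round(statistics.stdev(lengths), 2))
```

Here the output is "a number" with a spread around it. For an overlay, the same loop would rerun the overlay on each perturbed data set and summarize how often each location changes, which yields "a map" of uncertainty.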
Figure 10.18 Simulating effects of DEM error and algorithm uncertainty on derived
stream networks
Managing GIS Error
• To manage errors we must track and
document them.
• The concepts introduced earlier provide a
checklist of quality indicators:
– Accuracy, Precision, Resolution,
Generalization, Bias, Compatibility,
Completeness and Consistency
• These should be documented for each
data layer.
Managing GIS Error
• Data quality information can be used to
create a data lineage: a record of the data
history that presents essential information
about the development of the data.
• This becomes the metadata.
Living with uncertainty
• Uncertainty is inevitable and easier to find.
• Use metadata to document the uncertainty.
• Use sensitivity analysis to find the impacts of input
uncertainty on output.
• Rely on multiple sources of data.
• Be honest and informative in reporting the results of GIS
analysis.
• The US Federal Geographic Data Committee lists five
components of data quality: attribute accuracy,
positional accuracy, logical consistency, completeness,
and lineage (for details, see www.fgdc.gov).
Basics of FGDC
• Federal Geographic Data Committee
(FGDC) metadata answers the who, what,
where, when, how and why questions of
geospatial data.
• The data structure and elements defined
for FGDC metadata are described fully in
the “Content Standard for Digital
Geospatial Metadata” (CSDGM).
SEVEN SECTIONS OF FGDC
The Federal Geographic Data Committee
(FGDC) Content Standard for Digital Geospatial
Metadata (CSDGM) organizes a metadata
record into seven main sections:
– Identification Information
– Data Quality Information
– Spatial Data Organization Information
– Spatial Reference Information
– Entity and Attribute Information
– Distribution Information
– Metadata Reference Information
Identification Information
• What is the name of the dataset?
• What is the subject or theme of the information included?
• What is the scale of the dataset?
• What are the attributes of the dataset?
• Where is the geographic location of the dataset?
• Who developed the dataset?
• Who provided the source material for the dataset?
• Who will publish the dataset?
• When were the features of the dataset identified?
• How are the features of the dataset depicted?
• Why was the data set created?
• Are there restrictions on accessing or using the data?
• Are external files available that are related to the dataset?
Data Quality Information
• How reliable are the data?
• What are its limitations or inconsistencies?
• What is the positional and attribute accuracy?
• Is the dataset complete?
• Were the consistency and content of the data
verified?
• Where can the sources of the data be located?
• What processes were applied to these sources
and by whom?
Spatial Data Organization
• What spatial data model was used to
encode the spatial data?
• How many and what kind of spatial objects
are included in the dataset?
• Are methods other than coordinates, such
as street addresses, used to encode
locations?
Spatial Reference
• Are coordinate locations encoded using
longitude and latitude?
• What map projection is used?
• What horizontal datum and/or vertical
datum are used?
• What parameters should be used to
convert the data to another coordinate
system?
Entity and Attribute Information
• What geographic information (roads,
houses, elevation, temperature, etc.) is
described?
• How is this information coded?
• What do the codes mean?
• What source was used for defining the
attributes or codes (e.g., the Cowardin
classification)?
Distribution
• From whom can the data be obtained?
• What formats are available?
• What media are available?
• Are the data available online?
• What is the price of the data?
Metadata Reference
• When were the metadata compiled, and
by whom?
• When was the metadata record created?
• Who is the responsible party?
• When were the metadata last updated?

 
SDI Module IV - Data Quality Information.pdf
SDI Module IV - Data Quality Information.pdfSDI Module IV - Data Quality Information.pdf
SDI Module IV - Data Quality Information.pdf
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
heathfieldcps1
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
MateoGardella
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Recently uploaded (20)

SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 

GIS Data Quality

  • 2. Basin • Creates a raster delineating all drainage basins. • All cells in the raster will belong to a basin, even if that basin is only one cell. • The drainage basins are delineated within the analysis window by identifying ridge lines between basins. • The input flow direction raster is analyzed to find all sets of connected cells that belong to the same drainage basin.
  • 3. The drainage basins are created by locating the pour points at the edges of the analysis window (where water would pour out of the raster), as well as sinks, then identifying the contributing area above each pour point. This results in a raster of drainage basins.
  • 4. Lab 8 Data • The report sheet is on the website with the instructions. • Three files will be downloaded: – 2 from the NY GIS Clearinghouse – 1 from Cornell University • The DEM will be obtained from ArcGIS Online.
  • 5. Step 13 – DEM (30 arc-second)
  • 6. Measuring in Arc-Seconds • Some USGS DEM data are stored in a format that uses 3, 5, or 30 arc-seconds of longitude and latitude to register cell values. • The geographic reference system treats the globe as if it were a sphere divided into 360 equal parts called degrees.
  • 7. • Each degree is subdivided into 60 minutes. Each minute is composed of 60 seconds. • The ground length of an arc-second of latitude remains nearly constant, while the ground length of an arc-second of longitude shrinks in proportion to the cosine of the latitude as one moves toward the earth's poles.
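The cosine relationship on slide 7 can be sketched numerically. This is a minimal illustration that assumes a perfectly spherical earth with a mean radius of 6,371 km; the function name `arcsecond_size_m` is hypothetical and does not come from any GIS package.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean spherical radius, not an ellipsoid model


def arcsecond_size_m(lat_deg, arcsec=30.0):
    """Return (north-south, east-west) ground length in metres
    spanned by `arcsec` arc-seconds at the given latitude."""
    angle_rad = math.radians(arcsec / 3600.0)  # 3600 arc-seconds per degree
    north_south = EARTH_RADIUS_M * angle_rad   # nearly constant with latitude
    east_west = north_south * math.cos(math.radians(lat_deg))  # shrinks poleward
    return north_south, east_west


for lat in (0, 43, 60, 80):
    ns, ew = arcsecond_size_m(lat)
    print(f"lat {lat:2d}: N-S {ns:6.1f} m, E-W {ew:6.1f} m")
```

Under these assumptions a 30 arc-second cell is a little over 900 m tall everywhere, while its width halves by 60 degrees of latitude, which is one reason the DEM is projected before cell sizes are treated as uniform.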
  • 8. Processing of DEM • Raster clip – to the buffered park boundary. • Raster projection – from geographic coordinates to UTM Zone 18N, NAD83. • Resample – bilinear interpolation. • Change the cell size – from 30 arc-seconds to 30 meters.
  • 9. Raster Geometry and Resampling • Data must often be resampled when converting between coordinate systems or changing the cell size of a raster data set. • Common methods: – Nearest neighbor – Bilinear interpolation – Cubic convolution
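To make the difference between the first two resampling methods concrete, here is a small sketch that samples a grid at a fractional (row, column) position. The function names and the tiny 2×2 grid are illustrative assumptions; production tools also handle NoData, cell registration, and edge effects, which this sketch ignores.

```python
import math


def nearest_neighbor(grid, row, col):
    """Take the value of the closest cell centre (no new values are created)."""
    r = min(int(math.floor(row + 0.5)), len(grid) - 1)
    c = min(int(math.floor(col + 0.5)), len(grid[0]) - 1)
    return grid[r][c]


def bilinear(grid, row, col):
    """Distance-weighted average of the four surrounding cell centres."""
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1 = min(r0 + 1, len(grid) - 1)
    c1 = min(c0 + 1, len(grid[0]) - 1)
    fr, fc = row - r0, col - c0
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


elevation = [[10.0, 20.0],
             [30.0, 40.0]]
print(nearest_neighbor(elevation, 0.25, 0.75))
print(bilinear(elevation, 0.25, 0.75))
```

Nearest neighbor preserves the original values, so it suits categorical rasters; bilinear interpolation produces smoothed intermediate values, which is why it is the usual choice for continuous surfaces such as a DEM.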
  • 11. Step 17 - DEM
  • 13. Flow Direction • Creates a raster of flow direction from each cell to its steepest downslope neighbor.
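The steepest-downslope rule can be sketched in a few lines. The example below is a simplified eight-direction (D8) illustration, not the ArcGIS implementation: it uses the power-of-two direction codes (1 = east, 2 = southeast, 4 = south, and so on clockwise), assumes square cells of unit size, and marks cells with no downslope neighbor with 0, which is a convention chosen here for illustration only.

```python
import math

# (row offset, column offset) -> direction code, clockwise from east
D8_CODES = {(0, 1): 1, (1, 1): 2, (1, 0): 4, (1, -1): 8,
            (0, -1): 16, (-1, -1): 32, (-1, 0): 64, (-1, 1): 128}


def flow_direction(dem):
    """Return a grid of D8 codes pointing to each cell's steepest downslope neighbor."""
    rows, cols = len(dem), len(dem[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope, best_code = 0.0, 0  # 0 = no downslope neighbor (illustrative)
            for (dr, dc), code in D8_CODES.items():
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # drop divided by distance (1 for orthogonal, sqrt(2) for diagonal)
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope, best_code = slope, code
            out[r][c] = best_code
    return out


dem = [[9, 8, 7],
       [8, 6, 5],
       [7, 5, 3]]
for row in flow_direction(dem):
    print(row)
```

In this small surface everything drains toward the lower-right corner, which is the only cell left with code 0.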
  • 15. Sinks • A sink is a cell or set of spatially connected cells whose flow direction cannot be assigned one of the eight valid values in a flow direction raster. • This can occur when all neighboring cells are higher than the processing cell or when two cells flow into each other, creating a two-cell loop. • To create an accurate representation of flow direction and, therefore, accumulated flow, it is best to use a dataset that is free of sinks. • A digital elevation model (DEM) that has been processed to remove all sinks is called a depressionless DEM. (From ArcGIS 10 Desktop Help)
  • 16. High-pass filters return: • Small values where values change smoothly. • Large positive values when centered on a spike. • Large negative values when centered on a pit.
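The three behaviors listed for high-pass filters can be checked with a 3×3 kernel. The slides do not say which kernel is used, so the one below (8 at the center, -1 elsewhere) is an assumed, commonly used example, and `high_pass` is a hypothetical helper name.

```python
# Assumed 3x3 high-pass kernel; the weights sum to zero, so flat areas give 0
KERNEL = [[-1, -1, -1],
          [-1,  8, -1],
          [-1, -1, -1]]


def high_pass(grid, r, c):
    """Apply the kernel centred on interior cell (r, c)."""
    return sum(KERNEL[dr + 1][dc + 1] * grid[r + dr][c + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1))


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
spike = [[5, 5, 5], [5, 9, 5], [5, 5, 5]]
pit = [[5, 5, 5], [5, 1, 5], [5, 5, 5]]

print(high_pass(flat, 1, 1))   # no local change
print(high_pass(spike, 1, 1))  # large positive value
print(high_pass(pit, 1, 1))    # large negative value
```

A strongly negative response flags a candidate pit, which is how such a filter can help locate sinks before filling.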
  • 19. Step 18 - Sinks
  • 20. Fill • Fills sinks in a surface raster to remove small imperfections in the data. • Sinks (and peaks) are often errors due to the resolution of the data or rounding of elevations to the nearest integer value.
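One way to see what filling does is a priority-flood sketch: start from the raster edge, always process the lowest known cell, and raise any lower interior neighbor to that level. This is a swapped-in textbook technique shown for illustration; it is not claimed to be the algorithm behind the ArcGIS Fill tool, and `fill_sinks` is a hypothetical name.

```python
import heapq


def fill_sinks(dem):
    """Return a copy of `dem` in which every interior depression is raised
    to the elevation of its lowest outlet (priority-flood approach)."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    # Seed the queue with every edge cell: water can always leave the raster there
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (dem[r][c], r, c))
                visited[r][c] = True
    while heap:
        elev, r, c = heapq.heappop(heap)  # lowest cell reached so far
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols \
                        and not visited[nr][nc]:
                    visited[nr][nc] = True
                    # A neighbor lower than the current spill level is a sink cell
                    filled[nr][nc] = max(dem[nr][nc], elev)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


dem = [[5, 5, 5, 5],
       [5, 1, 2, 5],
       [5, 2, 1, 5],
       [5, 5, 4, 5]]
for row in fill_sinks(dem):
    print(row)
```

The four interior cells form a depression whose lowest outlet is the edge cell with elevation 4, so they are all raised to 4 and the surface becomes depressionless.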
  • 21. Step 19 - Fill
  • 23. Flow Accumulation (From ArcGIS 10 Desktop Help)
  • 24. Flow Accumulation – Step 21, before inverting
  • 26. Conditional • The results of Flow Accumulation can be used to create a stream network by applying a threshold value to select cells with a high accumulated flow. • For example, to create a raster where the value 1 represents the stream network on a background of NoData, perform a conditional operation with the Con tool with the following settings: – Input conditional raster: Flowacc – Expression: Value > 50000 – Input true raster or constant: 1 (From ArcGIS 10 Desktop Help)
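The logic of that conditional operation can be sketched without ArcGIS. The snippet below is a plain-Python analogy, not a call to the Con tool: `extract_streams` is a hypothetical name, the flow accumulation values are made up, and `None` stands in for NoData.

```python
def extract_streams(flowacc, threshold=50000):
    """Return 1 where accumulated flow exceeds the threshold, NoData (None) elsewhere."""
    return [[1 if value is not None and value > threshold else None
             for value in row]
            for row in flowacc]


# Made-up flow accumulation values for illustration
flowacc = [[10, 200, 60000],
           [5, 75000, 120],
           [90000, 40, None]]
for row in extract_streams(flowacc):
    print(row)
```

Raising the threshold keeps only the larger channels, while lowering it adds headwater streams, so the value chosen directly controls the density of the derived network.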
  • 28. Stream Link • Assigns unique values to sections of a raster linear network between intersections. • Links are the sections of a stream channel connecting two successive junctions, a junction and the outlet, or a junction and the drainage divide. (From ArcGIS 10 Desktop Help)
  • 30. Watershed Clipped to Park Boundary
  • 31. Data Quality Issues (Ch. 14)
  • 32. Introduction • Spatial data and analysis standards are important because of the range of organizations producing and using spatial data, and the amount of data transferred between these organizations. • There are several types of standards: – Data standards – Interoperability standards – Analysis standards – Professional and certification standards
  • 33. Introduction (continued) • National and international standards organizations are important in defining and maintaining geospatial standards: – Federal Geographic Data Committee (FGDC), which focuses on the national spatial data infrastructure (www.fgdc.gov) – International Spatial Data Standards Commission, which is a clearinghouse and gateway for international standards – Open Geospatial Consortium (OGC), which is developing interoperability standards. Web Mapping Service (WMS) standards are an example.
  • 34. The Geospatial Competency Model
  • 36. GIS Professional Certification • URISA is the founding member of the GIS Certification Institute, the organization that administers professional certification for the field and is dedicated to advancing the industry. • The minimum number of points needed to become a certified GIS Professional, as detailed in the three point schedules, is 150 points. Thus, all applicants are expected to document achievements valued at a minimum of 150 points. • To ensure that applicants have a broad foundation, specific minimums in each of the three achievement categories must be met or exceeded. These minimums are as follows: – Education: 30 points – Experience: 60 points – Contributions: 8 points • The additional 52 points can be counted from any of the three categories.
  • 38. University Certificates • UMM – undergraduate • USM – undergraduate/graduate • UM – graduate • Penn State – graduate • University of Denver • University of Southern California • George Mason University
  • 39. Spatial Data Standards • Data – measurements and observations. • Data quality – a measure of the fitness for use of data for a particular task (Chrisman, 1994). • It is the responsibility of the user to ensure that the data are fit for the task. • Metadata – data about the data.
  • 40. Spatial Data Standards • Spatial Data Standards – methods for structuring, describing and delivering spatially-referenced data. • Media Standards – the physical form of the data (CD, download, etc.). • Format Standards – specify data file components and structures. These standards aid in data transfer. • Spatial Data Accuracy Standards – document the quality of the positional and attribute accuracy. • Document Standards – define how we describe spatial data.
  • 41. GIS Is Not Perfect • A GIS cannot perfectly represent the world for many reasons, including: • The world is too complex and detailed. • The data structures or models (raster, vector, or TIN) used by a GIS to represent the world are not discriminating or flexible enough. • We make decisions (how to categorize data, how to define zones) that are not always fully informed or justified. • It is impossible to make a perfect representation of the world, so uncertainty is inevitable. • Uncertainty degrades the quality of a spatial representation.
  • 42. Concepts Related to Data Quality • Related to individual data sets: – Errors – flaws in data. – Accuracy – the extent to which an estimated value approaches the true value. – Precision – the recorded level of detail of your data. – Bias – the systematic variation of the data from reality.
  • 44. Concepts Related to Data Quality • Related to source data: – Resolution – the smallest feature in the data set that can be displayed. – Generalization – simplification of objects in the real world to produce scale models and maps.
  • 45. Resolution and generalization of raster datasets
  • 47. Data Sets Used for Analysis Must Be: – Complete – spatially and temporally. – Compatible – same scale, units of measure, measurement level. – Consistent – both within and between data sets. – Applicable – to the analysis being performed.
  • 48. A Conceptual View of Uncertainty • Real World → Conception → Source Data, Measurements & Representation → Data Conversion and Analysis → Result
  • 49. Uncertainty in the Conception of Geographic Phenomena • Many spatial objects are not well defined or their definition is to some extent arbitrary, so that people can reasonably disagree about whether a particular object is x or not. • There are at least four types of conceptual uncertainty: – Spatial uncertainty – Vagueness – Ambiguity – Regionalization problems
  • 50. Spatial uncertainty • Spatial uncertainty occurs when objects do not have a discrete, well-defined extent. • They may have indistinct boundaries. • They may have impacts that extend beyond their boundaries. • They may simply be statistical entities. • The attributes ascribed to spatial objects may also be subjective.
  • 51. Vagueness (obscureness) • Vagueness occurs when the criteria that define an object as x are not explicit or rigorous. • For example: – In a land cover analysis, how many oaks (or what proportion of oaks) must be found in a tract of land to qualify it as oak woodland? – What incidence of crime (or resident criminals) defines a high crime neighborhood?
  • 52. Ambiguity • Ambiguity occurs when y is used as a substitute, or indicator, for x because x is not available. • The link between direct indicators and the phenomena for which they substitute is straightforward and fairly unambiguous (soil nutrients for crop yield). • Indirect indicators tend to be more ambiguous and opaque (wetlands as an indicator of species diversity). • Of course, indicators are not simply direct or indirect; they occupy a continuum. The more indirect they are, the greater the ambiguity.
  • 53. Regionalization problems • Regional geography is largely founded on the creation of a mosaic of zones that make it easy to portray spatial data distributions. • A uniform zone is defined by the extent of a common characteristic, such as climate, landform, or soil type. • Functional zones are areas that delimit the extent of influence of a facility or feature, for example, how far people travel to a shopping center or the geographic extent of support for a football team. • Regionalization problems occur because zones are artificial.
  • 54. Uncertainty in the measurement of geographic phenomena • Error occurs in physical measurement of objects. This error creates further uncertainty about the true nature of spatial objects. – Physical measurement error – Digitizing error – Error caused by combining data sets with different lineages
  • 55. Physical measurement error • Instruments and procedures used to make physical measurements are not perfectly accurate. • In addition, the earth is not a perfectly stable platform from which to make measurements. Seismic motion, continental drift, and the wobbling of the earth's axis cause physical measurements to be inexact (for example, GPS error and remote sensing error).
  • 56. Digitizing Error • A great deal of spatial data has been digitized from paper maps. • Any digitized map requires considerable post-processing: – Check for missing features – Connect lines – Remove spurious polygons • Some of these steps can be automated.
  • 57. Error caused by combining data sets with different lineages • Data sets produced by different agencies or vendors may not match because different processes were used to capture or automate the data. – For example, buildings in one data set may appear on the opposite side of the street in another data set. – Error may also be caused by combining sample and population data or by using sample estimates that are not robust at fine scales. – "Lifestyle" data are derived from shopping surveys and provide business and service planners with up-to-date socioeconomic data not found in traditional data sources like the census. Yet the methods by which lifestyle data are gathered and aggregated to zones or are compared to census data may not be scientifically rigorous.
  • 58. Uncertainty in the representation of geographic phenomena • Representation is closely related to measurement. • Representation is not just an input to analysis, but sometimes also the outcome of it. For this reason, we consider representation separately from measurement. – The world is infinitely complex, but computer systems are finite. – Representation is all about the choices that are made in capturing knowledge about the world. – Uncertainty in the earth model: ellipsoid models, datum, projection types. – Uncertainty in the raster data model (structure). – Uncertainty in the vector data model (structure).
  • 59. Uncertainty in the raster data structure • The raster structure partitions space into square cells of equal size. • Spatial objects x, y, and z emerge from cell classification, in which Cell A1 is classified as x, Cell A2 as y, Cell A3 as z, and so on, until all cells are evaluated. • A spatial object x can be defined as a set of contiguous cells classified as x. • But not all the area covered by the cell is x. • These impure cells are termed mixed pixels or "mixels." • Because a cell can hold only one value, a mixel must be classified as if it were all one thing or another. Therefore, the raster structure may distort the shape of spatial objects.
  • 60. Raster – The Mixed Pixel Problem • Landcover map with two classes, land or water. • Cell A is straightforward. • What category should be assigned to B, C, or D?
  • 61. Error in raster • Because of the distortions due to flattening, cells in a raster can never be perfectly equal in size on the Earth's surface. • When information is represented in raster form, all detail about variation within cells is lost, and instead the cell is given a single value. • Three rules are used to assign that value: largest share, central point (e.g., USGS DEM), and mean value (e.g., remote sensing imagery). • Mean-value examples for cells split between the values 8 and 6: – 8 × (1/6) + 6 × (5/6) = 6.33 – 8 × (3/4) + 6 × (1/4) = 7.5 – 8 × (1/7) + 6 × (6/7) = 6.29
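Two of the assignment rules can be written directly from the area shares inside a cell. The sketch below is illustrative only: the function names are hypothetical, each cell is described by (value, area fraction) pairs taken from the mean-value arithmetic on the slide, and the central-point rule is omitted because it depends on the geometry of the figure, which is not recoverable here.

```python
def mean_value(shares):
    """Area-weighted mean of the values inside one cell."""
    return sum(value * fraction for value, fraction in shares)


def largest_share(shares):
    """Value that covers the largest part of the cell."""
    return max(shares, key=lambda pair: pair[1])[0]


cells = [
    [(8, 1 / 6), (6, 5 / 6)],
    [(8, 3 / 4), (6, 1 / 4)],
    [(8, 1 / 7), (6, 6 / 7)],
]
for shares in cells:
    print(round(mean_value(shares), 2), largest_share(shares))
```

The two rules can disagree noticeably for the same cell, which is one source of the representation error described on this slide.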
  • 62. Figure 10.8 Problems with remotely sensed imagery: (left) example of a satellite image with cloud cover (A), shadows from topography (B), and shadows from cloud cover (C); (right) an urban area showing a building leaning away from the camera. Source: Ian Bishop (left) and Google UK (right)
  • 63. Uncertainty in the vector data structure • Socioeconomic data (facts about people, houses, and households) are often best represented as points. • For various reasons (to protect privacy, to limit data volume), data are usually aggregated and reported at a zonal level, such as census tracts or ZIP Codes. • This distorts the data in two ways: – First, it gives them a spatially inappropriate representation (polygons instead of points). – Second, it forces the data into zones whose boundaries may not respect natural distribution patterns.
  • 64. Map Representation Error
      Map scale        Ground distance, accuracy, or resolution (corresponding to 0.5 mm map distance)
      1:1,250          0.625 m
      1:2,500          1.25 m
      1:5,000          2.5 m
      1:10,000         5 m
      1:24,000         12 m
      1:50,000         25 m
      1:100,000        50 m
      1:250,000        125 m
      1:1,000,000      500 m
      1:10,000,000     5 km
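Each row of the table follows from multiplying the 0.5 mm map distance by the scale denominator. A minimal sketch, with `ground_distance_m` as a hypothetical helper name:

```python
def ground_distance_m(scale_denominator, map_distance_mm=0.5):
    """Ground distance in metres represented by a map distance at a given scale."""
    return map_distance_mm * scale_denominator / 1000.0  # mm -> m


for denominator in (1_250, 24_000, 100_000, 10_000_000):
    print(f"1:{denominator:,} -> {ground_distance_m(denominator)} m")
```

The same function shows why positional accuracy statements are scale dependent: a 0.5 mm line width that covers about half a metre at 1:1,250 covers 5 km at 1:10,000,000.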
  • 65. Uncertainty in the data conversion and analysis of geographic phenomena • Uncertainties in data lead to uncertainties in the results of analysis; data conversion and spatial analysis methods can create further uncertainty. • Data conversion error • Georeferencing and resampling • Projection and datum conversions • The ecological fallacy • The modifiable areal unit problem (MAUP) • Classification errors
  • 70. Classification Result • Background: ETM+, 7/15/01 • Top image: IKONOS, Oct. 2000
  • 71. Confusion Matrix (columns: ground truth; rows: classification result; 1686 pixels lie on the diagonal)
      Class                  Grass  Alfalfa  Cotton  Chili  Fallow (corn)  Total  User accuracy (%)
      Grass                    110       22       0      0              0    132   83.3
      Alfalfa                    5      105       0      0              0    110   95.5
      Cotton                     0        0     945      5              0    950   99.5
      Chili                      0        0      50     42              0     92   45.7
      Fallow                     0        0       0      0            484    484  100
      Total                    115      127     995     47            484   1768
      Producer accuracy (%)   95.6     82.7    95.0   89.4            100
      Overall accuracy: $\frac{1686}{1768} = 95.4\%$
      Kappa index, with $S = 132 \times 115 + 110 \times 127 + 950 \times 995 + 92 \times 47 + 484 \times 484$: $\kappa = \dfrac{1686 - S/1768}{1768 - S/1768} = 92.4\%$
  • 72. Basics of the Confusion Matrix • Producer accuracy is the probability that the classifier has labeled an image pixel as Class A given that the ground truth is Class A. • User accuracy is the probability that a pixel is Class A given that the classifier has labeled the pixel as Class A. • Overall accuracy is the total classification accuracy. • The Kappa index (another parameter for overall accuracy) is a more useful index for evaluating accuracy. – Errors of commission represent pixels that belong to another class but are labeled as belonging to the class. – Errors of omission represent pixels that belong to the ground truth class but that the classification technique has failed to classify into the proper class.
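These definitions translate into a few lines of arithmetic. The sketch below uses the counts from the confusion matrix slide (rows as classification result, columns as ground truth); `accuracy_report` is a hypothetical helper, and the Kappa computation follows the usual Cohen's kappa form, in which chance agreement is estimated from the row and column totals.

```python
CLASSES = ["Grass", "Alfalfa", "Cotton", "Chili", "Fallow"]
MATRIX = [
    [110, 22, 0, 0, 0],
    [5, 105, 0, 0, 0],
    [0, 0, 945, 5, 0],
    [0, 0, 50, 42, 0],
    [0, 0, 0, 0, 484],
]


def accuracy_report(matrix):
    """Compute user, producer, and overall accuracy plus the Kappa index."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]                       # classified as
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]  # ground truth
    total = sum(row_totals)
    correct = sum(matrix[i][i] for i in range(n))
    overall = correct / total
    # Agreement expected by chance from the marginal totals
    chance = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return {
        "user": [matrix[i][i] / row_totals[i] for i in range(n)],
        "producer": [matrix[i][i] / col_totals[i] for i in range(n)],
        "overall": overall,
        "kappa": (overall - chance) / (1 - chance),
    }


report = accuracy_report(MATRIX)
for name, user, producer in zip(CLASSES, report["user"], report["producer"]):
    print(f"{name:8s} user {user:6.1%}  producer {producer:6.1%}")
print(f"overall {report['overall']:.1%}  kappa {report['kappa']:.1%}")
```

A class such as Chili shows why both measures matter: most true Chili pixels are found (high producer accuracy), yet more than half of the pixels labeled Chili belong to another class (low user accuracy, a commission error).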
  • 73. Finding and Modeling Errors • Checking for errors: – Visual inspection during data editing and cleaning. – Attributes can be checked by using annotation, line colors and patterns. – Double digitizing. – Statistical analysis may identify extreme values of attributes.
  • 74. Finding and Modeling Errors • Error modeling – 1. Epsilon modeling • Based on a method of line generalization, and adapted by Blakemore. • It places an error band around a digitized line, describing the probable distribution of error. • Error distribution is subject to debate: – Normal curve – Piecewise quartile distribution – Bimodal • The epsilon band can be used in analyses to improve the confidence of the user in the result.
  • 75. Figure 10.17 Point-in-polygon categories of containment. Source: Blakemore (1984)
  • 76. Finding and Modeling Errors • Error modeling – 2. Monte Carlo simulation, used in overlays. • Simulates input data error by adding random noise to the line coordinates of the map data. • Each input is assumed to be characterized by an estimate of positional error. • This changes the shape of the line. • The process is repeated multiple times and the randomized data put through the GIS analyses. • Output: – A number – A map
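The Monte Carlo procedure can be illustrated with a single polygon whose area is the "analysis". This is a minimal sketch under stated assumptions: positional error is modeled as independent Gaussian noise on every vertex coordinate, the standard deviation of 0.1 map units and the 10 × 10 square are made-up inputs, and the function names are hypothetical.

```python
import random
import statistics


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as (x, y) tuples."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area(vertices, sigma, runs=1000, seed=42):
    """Perturb every vertex `runs` times and summarize the resulting areas."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                 for x, y in vertices]
        areas.append(polygon_area(noisy))
    return statistics.mean(areas), statistics.stdev(areas)


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
mean_area, spread = simulate_area(square, sigma=0.1)
print(f"true area {polygon_area(square):.1f}, "
      f"simulated mean {mean_area:.2f}, standard deviation {spread:.2f}")
```

Here the output is "a number" (the spread of the area estimate); repeating the same loop over an overlay and counting how often each cell or polygon falls in a given category would give the "map" form of output mentioned on the slide.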
  • 77. Figure 10.18 Simulating effects of DEM error and algorithm uncertainty on derived stream networks
  • 78. Managing GIS Error • To manage errors we must track and document them. • The concepts introduced earlier (accuracy, precision, resolution, generalization, bias, compatibility, completeness and consistency) provide a checklist of quality indicators. • These should be documented for each data layer.
  • 79. Managing GIS Error • Data quality information can be used to create a data lineage. • A data lineage is a record of the data history that presents essential information about the development of the data. • This becomes the metadata.
  • 80. Living with uncertainty • Uncertainty is inevitable and easier to find. • Use metadata to document the uncertainty. • Use sensitivity analysis to find the impacts of input uncertainty on output. • Rely on multiple sources of data. • Be honest and informative in reporting the results of GIS analysis. • The US Federal Geographic Data Committee lists five components of data quality: attribute accuracy, positional accuracy, logical consistency, completeness, and lineage (for details, see www.fgdc.gov).
  • 81. Basics of FGDC • Federal Geographic Data Committee (FGDC) metadata answers the who, what, where, when, how and why questions of geospatial data. • The data structure and elements defined for FGDC metadata are described fully in the "Content Standard for Digital Geospatial Metadata" (CSDGM).
  • 82. Seven Sections of FGDC • The Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) organizes a metadata record into seven main sections: – Identification Information – Data Quality Information – Spatial Data Organization Information – Spatial Reference Information – Entity and Attribute Information – Distribution Information – Metadata Reference Information
  • 83. Identification Information • What is the name of the dataset? • What is the subject or theme of the information included? • What is the scale of the dataset? • What are the attributes of the dataset? • Where is the geographic location of the dataset? • Who developed the dataset? • Who provided the source material for the dataset? • Who will publish the dataset? • When were the features of the dataset identified? • How are the features of the dataset depicted? • Why was the dataset created? • Are there restrictions on accessing or using the data? • Are external files available that are related to the dataset?
  • 84. Data Quality Information • How reliable are the data? • What are their limitations or inconsistencies? • What is the positional and attribute accuracy? • Is the dataset complete? • Were the consistency and content of the data verified? • Where can the sources of the data be located? • What processes were applied to these sources and by whom?
  • 85. Spatial Data Organization • What spatial data model was used to encode the spatial data? • How many and what kind of spatial objects are included in the dataset? • Are methods other than coordinates, such as street addresses, used to encode locations?
  • 86. Spatial Reference • Are coordinate locations encoded using longitude and latitude? • What map projection is used? • What horizontal datum and/or vertical datum are used? • What parameters should be used to convert the data to another coordinate system?
  • 87. Entity and Attribute Information • What geographic information (roads, houses, elevation, temperature, etc.) is described? • How is this information coded? • What do the codes mean? • What source was used for defining the attributes or codes (e.g., the Cowardin classification)?
  • 88. Distribution • From whom can the data be obtained? • What formats are available? • What media are available? • Are the data available online? • What is the price of the data?
  • 89. Metadata Reference • When were the metadata compiled, and by whom? • When was the metadata record created? • Who is the responsible party? • When were the metadata last updated?