2. Data Editing in GIS
Editing geographic data is the process of
creating, modifying, or deleting features
and related data on layers in a map.
Each layer is connected to a data source
that defines and stores the features; this
is typically a geodatabase feature class or
a feature service.
3. Data Editing in GIS
A Geographic Information System (GIS) represents real-world
conditions with the aid of a computer.
It is a tool for analyzing problems.
This requires data, which may be spatial or non-spatial.
These data may include errors.
Errors may come from the original source or be introduced
during encoding.
Before the data are processed it is essential to identify and eliminate
these errors; otherwise they will contaminate the GIS database.
4. Data Editing in GIS
The pre-processing of GIS data, i.e. data editing, can be grouped into the
following tasks:
Detecting and correcting errors
Reprojection, transformation and generalization
Edge matching and rubber sheeting
5. 1. Detecting and Correcting Errors
Errors affect the quality of GIS data. Once the data are collected
and prepared for visualization and analysis, they must be checked for
errors.
Errors in input data may derive from three main sources.
Errors in the sources of data: These may be errors in the maps used by
surveyors, or printing errors.
Errors resulting from original measurements (while encoding):
These may be scanning errors, digitizing errors, typing errors, etc.
Errors arising through processing (at the time of transfer and
conversion): Transferring data and converting it between different
formats can introduce errors and data loss.
6. Common sources of Errors
Old data sources: The data sources used for a GIS project may be
too old to use. Data collected in the past may not be acceptable for
current projects.
Lack of data: The data for a given area may be incomplete or
entirely lacking. For example, the land-use map for border regions
may not be available.
7. Common sources of Errors
Map scale: The details shown on a map depend on the scale used.
Maps or data at a scale appropriate to the level of detail
required must be used for the project. Using the wrong scale
makes the analysis erroneous.
Observation: High density of observations in an area increases the
reliability of the data. Insufficient observations may not provide
the level of resolution required for adequate spatial analysis as
expected from the project.
9. Errors resulting from original measurements
Positional accuracy: Representing the correct positions of geographic
features on a map depends upon the data being used. Biased field
work, improper digitization and scanning errors result in
inaccuracies in GIS projects.
Content accuracy: Maps must be labeled correctly. Incorrect
labeling can introduce errors which may go unnoticed by the user.
Any omission from the map or spatial database may result in
inaccurate analysis.
10. Errors arising through processing
Numerical errors: Different computers have different capabilities
for mathematical operations. Computer processing errors occur in
rounding off operations and are subject to the inherent limits of
number manipulation by the processor.
Topological errors: Data is subject to variation. Errors such as
dangles, slivers and overlaps are often present in GIS
data layers.
11. Errors arising through processing
Dangle: An arc is said to be a dangling arc if either it is not
connected to another arc properly (undershoot) or is digitized
past its intersection with another arc (overshoot).
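A minimal sketch of how dangles might be flagged, assuming each arc is reduced to its two end nodes: an end node touched by only one arc end is a candidate undershoot or overshoot. The function name `find_dangling_nodes` and the sample coordinates are hypothetical and do not come from any particular GIS package.

```python
from collections import Counter

def find_dangling_nodes(arcs):
    """Return end nodes that are touched by exactly one arc end."""
    degree = Counter()
    for start, end in arcs:
        degree[start] += 1
        degree[end] += 1
    return sorted(node for node, count in degree.items() if count == 1)

# Hypothetical arcs, each given as (from_node, to_node) coordinates.
arcs = [
    ((0, 0), (5, 0)),
    ((5, 0), (5, 5)),
    ((5, 5), (0, 0)),
    ((5, 5), (8, 7)),  # this arc ends at (8, 7), which no other arc touches
]
dangles = find_dangling_nodes(arcs)
print(dangles)
```

In practice some dangles are legitimate (for example a dead-end street), so a list like this is usually reviewed rather than corrected automatically.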
12. Errors arising through processing
Sliver: A gap created between two adjacent
polygons when snapping is not used while creating those
polygons.
These errors can be corrected using the constraints or the rules
which are defined for the layers. Topology rules define the
permissible spatial relationships between features.
13. Errors arising through processing
Digitizing and geocoding:
Many errors arise at the time of digitization, geocoding,
overlaying or rasterizing.
Errors associated with damaged source maps and errors
made while digitizing can be corrected by comparing the original maps
with the digitized versions.
14. Errors arising through processing
Raster data editing is concerned with correcting the
specific contents of raster images rather than their general
geometric characteristics.
The objective of the editing is to produce an image suitable
for raster geoprocessing.
1. Filling holes and gaps: To fill holes and gaps that
appear in the raster image.
15. Errors arising through processing
2. Edge smoothing: To remove or fill single pixel
irregularities in the foreground pixels and background
pixels along lines
16. Errors arising through processing
3. Deskewing: To rotate the image by a small angle so
that it is aligned orthogonally to the x and y axes of the
computer screen
17. Errors arising through processing
4. Filtering: To remove speckles or the random high or
low valued pixels in the image
18. Errors arising through processing
5. Clipping and deleting: To create a subset of an image or
to remove unwanted pixels.
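Of the raster editing operations listed above, filtering is the easiest to illustrate. The sketch below uses a 3×3 median filter, which is one common way to remove speckles; the slides do not say which filter is intended, so this choice is an assumption. Border cells are left unchanged for simplicity.

```python
from statistics import median

def median_filter(grid):
    """Replace each interior cell by the median of its 3x3 neighbourhood."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # border cells keep their original values
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [grid[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = median(window)
    return out

# Hypothetical 5x5 image: uniform background with one high-valued speckle.
image = [[10] * 5 for _ in range(5)]
image[2][2] = 255
cleaned = median_filter(image)
```

Here the isolated value 255 is replaced by the background value, because eight of the nine cells in its neighbourhood hold 10.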
19. Errors arising through processing
Vector data editing is a post-digitizing process that
ensures that the data are free from errors. It checks that:
1. Lines intersect properly without having any undershoots
or overshoots
20. Errors arising through processing
2. Nodes are created at all points where lines intersect.
3. All polygons are closed and each of them contains a label
point.
4. Topology of the layer is built.
21. 2. Reprojection, Transformation and Generalization
Once spatial and attribute data have been encoded and edited,
it is necessary to process data geometrically in order to provide
a common reference.
The data derived from various sources should be converted
to a common projection before they are combined and analyzed.
If they are not reprojected, data derived from a source map using one
projection will not plot at the same location as data derived from
another source using another projection.
Data derived from different sources may also have different
coordinate systems.
They may have different origins, units of measurement and
orientations.
So it is necessary to transform them to a common grid system. This
involves some mathematical calculations.
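As an illustration of the kind of calculation involved, the sketch below applies a simple similarity transformation (scale, rotate, then translate) to one coordinate pair, which covers differences in units, orientation and origin. The function name and the parameter values are hypothetical; real reprojection between map projections is more involved and is normally left to a projection library.

```python
import math

def transform_point(x, y, scale=1.0, angle_deg=0.0, dx=0.0, dy=0.0):
    """Scale, rotate counter-clockwise, then translate one (x, y) pair."""
    a = math.radians(angle_deg)
    xr = scale * (x * math.cos(a) - y * math.sin(a))
    yr = scale * (x * math.sin(a) + y * math.cos(a))
    return xr + dx, yr + dy

# Hypothetical values: double the units, rotate by 90 degrees,
# and shift the origin by (100, 200).
x_new, y_new = transform_point(10.0, 0.0, scale=2.0, angle_deg=90.0,
                               dx=100.0, dy=200.0)
```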
22. 2. Reprojection, Transformation and Generalization
Data may be derived from maps with different scales.
Generalization should be carried out when comparing data of
large and small scales.
It also helps to save time and reduce storage
space.
The simplest method of generalization is to delete points
along a line at a specified interval. However, this will not
preserve the shape of the object.
When we generalize a map, data loss is the main problem, but it is
necessary when comparing maps of different scales.
Alternatively, a compaction technique can be used, which helps
to reduce the storage space without any data loss.
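The simplest generalization method mentioned above can be sketched as follows, assuming a line is a list of (x, y) pairs and that the end points are always kept. The function name `thin_line` and the sample line are hypothetical.

```python
def thin_line(points, keep_every=2):
    """Keep the first point, every keep_every-th point after it, and the last point."""
    if len(points) <= 2:
        return list(points)
    kept = list(points[:-1:keep_every])
    kept.append(points[-1])
    return kept

# Hypothetical zig-zag line with seven vertices.
line = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (5, 1), (6, 0)]
simplified = thin_line(line, keep_every=3)
print(simplified)
```

With `keep_every=3` only three of the seven vertices survive, and several of the peaks and troughs of the zig-zag disappear, which illustrates why this method does not preserve shape.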
24. 3. Edge Matching and Rubber Sheeting
When the study area extends across two or more map sheets,
small differences and mismatches may occur.
Normally each map sheet is digitized separately
and the adjacent sheets are then joined after editing, projection,
transformation and generalization.
This joining process is known as edge matching.
It involves three basic steps, the first of which is resolving
mismatches at sheet boundaries. When the maps are joined, the adjacent
lines and polygons may not meet. These must be corrected to
complete the features and ensure that the data are topologically
correct.
26. TO AN INTEGRATED GIS DATA BASE
We are preparing an integrated GIS
data base using the edited and
reprojected data from various
sources
27. Raster data models
The raster data model appears very simple, but even
raster data can be stored inside a computer in many different
ways.
Let us take the example of a GIS containing two layers: a land
use layer depicting a relatively small number of land uses, each
of which is represented by a land use code number (e.g. 1 =
Urban, 2 = Forest, 3 = Village, etc.) and a transport network
layer (e.g. 0 = None, 1 = Road, 2 = Railway, etc.).
The data could be organized in the computer in any of the
following ways.
28. Raster data models
By location (Grid cell) – This would list the data values for each
of the different layers for the first grid cell, then the second
cell and so on.
By coverage – This would store all the data values for the first
coverage (i.e., land use) as a 2D matrix and then all the data
values for the 2nd coverage.
By binary coverage – This would represent every cell as 1,
indicating the presence of a land use, or 0, indicating its
absence.
By data value
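A small sketch of the first three organizations, using the land use and transport codes from the example above. The 2×3 grids are hypothetical, and treating the binary coverage as one 0/1 layer per land use code is an assumption about what the slide intends.

```python
# Hypothetical 2x3 layers using the codes from the example above.
land_use = [[1, 1, 2],
            [1, 2, 2]]   # 1 = Urban, 2 = Forest
transport = [[0, 1, 0],
             [0, 1, 2]]  # 0 = None, 1 = Road, 2 = Railway

# By coverage: one complete 2D matrix per layer.
by_coverage = {"land_use": land_use, "transport": transport}

# By location: for each grid cell, the values of all layers together.
by_location = [
    [(land_use[r][c], transport[r][c]) for c in range(3)]
    for r in range(2)
]

# By binary coverage (assumed here to mean one 0/1 layer per code):
# 1 where the cell is Forest, 0 elsewhere.
binary_forest = [[1 if value == 2 else 0 for value in row] for row in land_use]
```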
29. Raster data structure
In a simple raster data structure the geographical entities are
stored in a matrix of rectangular cells.
A code is given to each cell which informs users which entity is
present in which cell.
30. Raster data structure
The simplest way of encoding raster data in a computer can be
understood as follows:
(a) Entity model: It represents the whole raster data. Let us
assume that the raster data belongs to an area where land is
surrounded by water. Here a particular entity (land) is shown in
green color and the area where land is not present is shown by
white.
31. Raster data structure
(b) Pixel values: The pixel value for the full image is shown. Cells
having a part of the land are encoded as 1 and others where land
is not present are encoded as 0.
32. Raster data structure
(c) File structure: This demonstrates the method of coding raster
data. The first row of the file indicates that there are 5
rows and 5 columns in the image, and that 1 is the maximum pixel
value. The subsequent rows hold the cell values, each either 0 or 1
(the same as the pixel values).
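The file structure described above can be read with a few lines of code. This is a sketch under the assumption that the values are separated by spaces or commas; the sample file content is hypothetical and only mimics the 5×5 land and water example.

```python
def parse_raster(text):
    """Parse a header line 'rows cols max_value' followed by rows of cell values."""
    lines = text.strip().splitlines()
    rows, cols, max_value = (int(v) for v in lines[0].replace(",", " ").split())
    grid = [[int(v) for v in line.replace(",", " ").split()] for line in lines[1:]]
    if len(grid) != rows or any(len(row) != cols for row in grid):
        raise ValueError("grid size does not match the header")
    return rows, cols, max_value, grid

# Hypothetical file content: 1 = land, 0 = water.
sample = """
5 5 1
0 0 0 0 0
0 1 1 0 0
0 1 1 1 0
0 0 1 1 0
0 0 0 0 0
"""
rows, cols, max_value, grid = parse_raster(sample)
```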
33. Raster data structure
The huge size of the data is a major problem with raster data.
An image consisting of twenty different land-use classes takes the
same storage space as a similar raster map showing the location
of a single forest.
To address this problem many data compaction methods have
been developed which are discussed below:
34. Raster data Compression/ compaction
1. Run-length Encoding
Reduction of data on a row by row basis
Stores a single value for a group of cells rather than storing
values for individual cells
First line represents the dimension of the matrix (5×5) and the
number of entities (1) present.
In second and subsequent lines, the first number in the pair
represents absence (0) or presence (1) of the entity and the
second number indicates the number of cells referenced.
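Run-length encoding of a single row can be sketched as below. Each run is stored as a pair (value, number of cells), matching the description above; the function names and the sample grid are hypothetical.

```python
def rle_encode_row(row):
    """Encode one raster row as a list of (value, run_length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return [tuple(run) for run in runs]

def rle_decode_row(runs):
    """Expand (value, run_length) pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row

# Hypothetical 5x5 binary raster: 1 = entity present, 0 = absent.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
encoded = [rle_encode_row(row) for row in grid]
```

A row that holds a single value collapses to one pair, which is where the saving comes from; rows that alternate values frequently gain little or nothing.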
36. Raster data Compression/ compaction
2. Block Encoding
Data is stored in blocks in the raster matrix.
The entity is subdivided into hierarchical blocks and the blocks
are located using coordinates.
The first cell at top left hand is used as the origin for locating
the blocks
37. Raster data Compression/ compaction
3. Chain coding
Works by defining boundary of the entity i.e. sequence of cells
starting from and returning to the given origin
Direction of travel is specified using numbers. (0 = North, 1 =
East, 2 = South, 3 = West)
The first line tells that the coding started at cell (4, 2) and
there is only one chain.
In the second line the first number in the pair tells the
direction and the second number represents the number of
cells lying in this direction.
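A chain code can be expanded back into the list of boundary cells it describes. The sketch below assumes cells are addressed as (row, column) with rows increasing southwards, which the slides do not specify; the chain itself is hypothetical, chosen so that it starts and ends at cell (4, 2).

```python
# Direction codes from the text: 0 = North, 1 = East, 2 = South, 3 = West,
# expressed as (row step, column step) with rows increasing southwards.
STEPS = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}

def decode_chain(start, chain):
    """Return the cells visited by a chain of (direction, cell_count) pairs."""
    row, col = start
    cells = [(row, col)]
    for direction, count in chain:
        d_row, d_col = STEPS[direction]
        for _ in range(count):
            row, col = row + d_row, col + d_col
            cells.append((row, col))
    return cells

# Hypothetical chain: 2 cells north, 3 east, 2 south, 3 west.
chain = [(0, 2), (1, 3), (2, 2), (3, 3)]
boundary = decode_chain((4, 2), chain)
```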
39. Raster data Compression/ compaction
4. Quad trees
A raster is divided into a hierarchy of quadrants that are
subdivided based on similar value pixels.
The division of the raster stops when a quadrant is made
entirely from cells of the same value.
A quadrant that cannot be subdivided is called a leaf node.
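The subdivision rule can be sketched recursively. This example assumes a square raster whose side is a power of two, and represents a leaf node by its cell value and an internal node by a list of four quadrants in the order NW, NE, SW, SE; both the representation and the sample grid are hypothetical.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Return a leaf value, or a list of four sub-quadrants (NW, NE, SW, SE)."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:
        return first  # leaf node: every cell in the quadrant has the same value
    half = size // 2
    return [
        build_quadtree(grid, row, col, half),                # NW
        build_quadtree(grid, row, col + half, half),         # NE
        build_quadtree(grid, row + half, col, half),         # SW
        build_quadtree(grid, row + half, col + half, half),  # SE
    ]

# Hypothetical 4x4 raster.
grid = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(grid)
print(tree)
```

Only the south-west quadrant needs a second level of subdivision here, so 16 cells are described by seven leaf nodes.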
40. Raster data Compression/ compaction
A satellite or remote sensing image is a raster dataset in which each
cell has a value, and together these values create a layer.
A raster may have a single layer or multiple layers.
In a multi-layer (multi-band) raster, each layer is congruent with
all other layers: they have identical numbers of rows and columns
and the same locations in the plane.
A digital elevation model (DEM) is an example of a single-band
raster dataset, each cell of which contains a single value
representing surface elevation.
41. Raster data Compression/ compaction
A single-layer raster can be represented using:
a. Two colors (binary): The raster is represented as a
binary image with cell values of either 0 or 1, appearing
black and white respectively
42. Raster data Compression/ compaction
b. Grayscale: Typical remote sensing images are recorded in an 8-bit
digital system. A grayscale image is thus represented in 256
shades of gray, ranging from 0 (black) to 255 (white).
However, the human eye cannot distinguish all 256
shades. It can only interpret 8 to 16 shades of gray.
43. Raster data Compression/ compaction
A satellite image can have multiple bands, i.e. the
scene/details are captured at different wavelengths
(Ultraviolet- visible- infrared portions) of the electromagnetic
spectrum.
While creating a map we can choose to display a single band of
data or form a color composite using multiple bands.
A combination of any three of the available bands can be used
to create RGB composites.
These composites present a greater amount of information as
compared to that provided by a single band raster.
44. Raster data Compression/ compaction
Advantages and disadvantages of the raster data model
Advantages:
•Simple data structure
•Compatible with remote sensing or scanned data
•Spatial analysis is easier
•Simulation is easy because each unit has the same size and shape
Disadvantages:
•Cell size determines the resolution at which the data is represented
•Requires a lot of storage space
•Projection transformations are time consuming
•Network linkages are difficult to establish
45. Raster File Formats
BMP – Bitmap graphics in MS Windows applications
TIFF – Tagged Image File Format
GeoTIFF – Geographic Tagged Image File Format
GIF – Graphics Interchange Format
JPEG – Joint Photographic Experts Group
PNG – Portable Network Graphics
GRID – Esri Grid raster format
MrSID – Multiresolution Seamless Image Database
46. Vector data model
The vector data model is closely linked with the discrete object
view.
In vector data model, geographical phenomena are represented in
three different forms: Point, Line and Polygon.
Point – A location depicted by a single set of (x, y) co-ordinates at
the scale of abstraction.
The wells in a village, electricity poles in a town and cities in the
world map are the examples of spatial features described by
points.
Note – A city can be marked as a single point on a world map
but as a polygon on a state map. Scale plays an important role in
deciding the geometry of a geographical feature.
47. Vector data model
Line/Arc – An ordered set of (x, y) coordinate pairs arranged to form
a linear feature.
Roads, railways and telephone cables are examples of
spatial features described by lines.
Polygon – A set of (x, y) coordinate pairs enclosing a
homogeneous area.
Land parcels, agricultural farms and water bodies are
examples of spatial features described by polygons.
48. Vector data structure
Geographic entities encoded using the vector data model are
often called features. Features can be divided into two
classes:
a. Simple features
These are easy to create, store and are rendered on screen very
quickly. They lack connectivity relationships and so are inefficient
for modeling phenomena conceptualized as fields.
49. Vector data structure
b. Topological features
A topology is a mathematical procedure that describes how
features are spatially related and ensures data quality of the
spatial relationships. Topological relationships include the following
three basic elements:
I. Connectivity: Information about linkages among spatial objects
II. Contiguity: Information about neighboring spatial objects
III. Containment: Information about the inclusion of one spatial object
within another spatial object
50. Vector data structure
Connectivity
Arc node topology defines connectivity - arcs are connected to
each other if they share a common node.
This is the basis for many network tracing and path finding
operations.
Arcs represent linear features and the borders of area features.
Every arc has a from-node which is the first vertex in the arc
and a to-node which is the last vertex.
These two nodes define the direction of the arc. Nodes indicate
the endpoints and intersections of arcs. They do not exist
independently and therefore cannot be added or deleted
except by adding and deleting arcs.
51. Vector data structure
Nodes can, however, be used to represent
point features which connect segments of
a linear feature (e.g., intersections
connecting street segments, valves
connecting pipe segments).
[Figure: Arc-node topology, with a node showing an intersection]
52. Vector data structure
Arc-node topology is supported through an arc-node list.
For each arc in the list there is a from node and a to node.
Connected arcs are determined by common node
numbers.
[Figure: Arc-node topology with arc-node list]
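The way connectivity is read from an arc-node list can be sketched as follows. The arc and node numbers are hypothetical, since the original figure is not reproduced here; two arcs are treated as connected when they share a node number.

```python
from collections import defaultdict

# Hypothetical arc-node list: arc id -> (from_node, to_node).
arc_node_list = {1: (10, 11), 2: (11, 12), 3: (11, 13), 4: (14, 15)}

def connected_arcs(arc_nodes):
    """Map each arc to the set of other arcs that share a node with it."""
    arcs_at_node = defaultdict(set)
    for arc, (from_node, to_node) in arc_nodes.items():
        arcs_at_node[from_node].add(arc)
        arcs_at_node[to_node].add(arc)
    neighbours = {arc: set() for arc in arc_nodes}
    for arcs in arcs_at_node.values():
        for arc in arcs:
            neighbours[arc] |= arcs - {arc}
    return neighbours

connectivity = connected_arcs(arc_node_list)
```

Arcs 1, 2 and 3 all meet at node 11 and are therefore mutually connected, while arc 4 shares no node with any other arc.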
53. Vector data structure
Contiguity
Polygon topology defines contiguity. The polygons are said to be
contiguous if they share a common arc. Contiguity allows the
vector data model to determine adjacency.
[Figure: Polygon topology]
54. Vector data structure
The from-node and to-node of an arc indicate its direction, which
helps to determine the polygons on its left and right sides.
Left-right topology refers to the polygons on the left and right
sides of an arc. In the illustration above, polygon B is on the
left and polygon C is on the right of the arc 4.
Polygon A is outside the boundary of the area covered by
polygons B, C and D. It is called the external or universe
polygon, and represents the world outside the study area.
The universe polygon ensures that each arc always has a left
and right side defined.
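Adjacency can be derived directly from a left-right list. In the sketch below only the entry for arc 4 (polygon B on the left, polygon C on the right) comes from the text; the other arcs are hypothetical, and the universe polygon A is skipped so that only neighbours inside the study area are reported.

```python
# Left-right list: arc id -> (left_polygon, right_polygon).
# Arc 4 follows the text; the other entries are hypothetical.
left_right = {
    1: ("A", "B"),
    2: ("A", "C"),
    3: ("A", "D"),
    4: ("B", "C"),
    5: ("C", "D"),
}

def adjacent_polygons(arcs, universe="A"):
    """Map each polygon to the polygons that share an arc with it."""
    adjacency = {}
    for left, right in arcs.values():
        if universe in (left, right) or left == right:
            continue  # ignore the outside world and degenerate arcs
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency

neighbours = adjacent_polygons(left_right)
```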
55. Vector data structure
Containment
Geographic features cover distinguishable areas on the surface of
the earth. An area is represented by one or more boundaries
defining a polygon. Polygons can be simple, or they can be
complex, with a hole or island in the middle. In the illustration,
consider a lake with an island in the middle. The lake
has two boundaries: one which defines its outer edge
and another (the island) which defines its inner edge. An island
defines the inner boundary of a polygon. Polygon D is made
up of arcs 5, 6 and 7. The 0 before the 7 indicates that arc 7
creates an island in the polygon.
57. Vector data structure
Polygons are represented as an ordered list of arcs and not in
terms of X, Y coordinates. This is called Polygon-Arc topology.
Since arcs define the boundary of polygon, arc coordinates are
stored only once, thereby reducing the amount of data and
ensuring no overlap of boundaries of the adjacent polygons.
58. Vector data structure
Simple Features
Point entities: These represent all geographical entities that are
positioned by a single XY coordinate pair. Along with the XY
coordinates, the point must store other information, such as what
the point represents.
Line entities: Linear features made by tracing two or more XY
coordinate pairs.
Simple line: It requires a start and an end point.
Arc: A set of XY coordinate pairs describing a continuous complex
line. The shorter the line segment and the higher the number of
coordinate pairs, the closer the chain approximates a complex
curve.
59. Vector data structure
Simple polygons: Enclosed structures formed by joining a set of
XY coordinate pairs. The structure is simple but it has a few
disadvantages:
Lines between adjacent polygons must be digitized and stored
twice, and improper digitization gives rise to slivers and gaps
They convey no information about neighbors
Creating islands is not possible
60. Vector data structure
Topologic Features
Networks: A network is a topologic feature model which is
defined as a line graph composed of links representing linear
channels of flow and nodes representing their connections. The
topologic relationship between the features is maintained in a
connectivity table. By consulting the connectivity table, it is possible
to trace the information flowing in the network.
61. Vector data structure
Polygons with explicit topological structures : Introducing
explicit topological relationships takes care of islands as well as
neighbors. The topological structures are built either by creating
topological links during data input or using software. The Dual
Independent Map Encoding (DIME) system of the US Bureau of the
Census was one of the first attempts to create topology in
geographic data.
63. Vector data structure
Polygons with explicit topological structures
•Polygons are formed using the lines and their nodes.
•Once formed, polygons are individually identified by a unique
identification number.
•The topological information among the polygons is computed and stored
using the adjacency information (the nodes of a line, and identifiers of the
polygons to the left and right of the line) stored with the lines.
64. Vector data structure
Fully topological polygon network structure
•A fully topological polygon network structure is built using
boundary chains that are digitized in any direction.
•It takes care of islands and lakes and allows automatic checks for
improper polygons.
•Neighborhood searches are fully supported.
•These structures are edited by moving the coordinates of
individual points and nodes, by changing polygon attributes and by
cutting out or adding sections of lines or whole polygons.
•Changing coordinates requires no modification to the topology, but
cutting out or adding lines and polygons requires recalculation of
the topology and rebuilding of the database.
65. Vector data structure
Triangular Irregular Network (TIN)
A TIN represents a surface as contiguous, non-overlapping triangles
created by performing Delaunay triangulation. These triangles
have the property that the circumcircle passing through
the vertices of a triangle contains no other point inside it. A TIN is
created from a set of mass points with x, y and z coordinate
values. This topologic data structure manages information about
the nodes that form each triangle and the neighbors of each
triangle.
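The empty-circumcircle property can be checked with a standard determinant test. The sketch below works on x, y coordinates only (z plays no part in the triangulation) and multiplies by the triangle's orientation so that the vertices may be given in either order; the function name and sample points are hypothetical.

```python
def circumcircle_contains(a, b, c, p):
    """Return True if point p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Positive for counter-clockwise triangles, negative for clockwise ones.
    orientation = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orientation > 0

# Hypothetical triangle whose circumcircle is centred on (2, 2).
a, b, c = (0, 0), (4, 0), (0, 4)
inside = circumcircle_contains(a, b, c, (1, 1))
outside = circumcircle_contains(a, b, c, (5, 5))
```

A triangle belongs to a Delaunay triangulation only if this test is false for every other mass point in the dataset.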
67. Vector data structure
Advantages of Delaunay triangulation
The triangles are as equiangular as possible, thus reducing
potential numerical precision problems created by long skinny
triangles
The triangulation is independent of the order in which the points are
processed
It ensures that any point on the surface is as close as possible to a
node
70. Vector data structure
The TIN model is a vector data model which is stored
using relational attribute tables. A TIN dataset
contains three basic attribute tables:
Arc attribute table, which contains the length, from-node and
to-node of all the edges of all the triangles
Node attribute table, which contains the x, y coordinates and z
(elevation) of the vertices
Polygon attribute table, which contains the areas of the
triangles, the identification numbers of the edges and the
identifiers of the adjacent polygons
71. Vector data structure
Storing data in this manner eliminates redundancy, as all the
vertices and edges are stored only once even if they are used
for more than one triangle.
As TIN stores topological relationships, the datasets can be
applied to vector based geoprocessing such as automatic
contouring, 3D landscape visualization, volumetric design,
surface characterization etc.
72. Vector data model
Advantages and disadvantages of the vector data model
Advantages:
•Data is represented at its original resolution and form without generalization
•Requires less storage space
•Editing is faster and convenient
•Network analysis is fast
•Projection transformations are easier
Disadvantages:
•The location of each vertex has to be stored explicitly
•Overlay based on criteria is difficult
•Spatial analysis is cumbersome
•Simulation is difficult because each unit has a different topological form