Geographical Information System
Dr. Nishant Sinha
– Basics of GIS
– Components of GIS
– GIS Data Models (Raster and Vector)
– GIS Data Types and Metadata
– Various GIS Data formats and GIS Data
– Process of GIS Data Generation/creation to
– WebGIS – WMS, WFS
Spatial is Special
▪ “Everything is related to everything else,
but near things are more related than
distant things.”
Tobler, W. 1970. A computer movie simulating urban growth
in the Detroit region. Economic Geography 46, 234–40
▪ Sometimes called the First Law of
Geography (because it is generally true!).
How do we describe geographical features?
▪ by recognizing two types of data:
– Spatial data which describes location (where)
– Attribute data which specifies characteristics at that location
(what, how much, and when)
How do we represent these digitally in a GIS?
▪ by grouping into layers based on similar characteristics (e.g.,
hydrography, elevation, water lines, sewer lines, grocery sales) and
using either the:
– vector data model
– raster data model
▪ by selecting appropriate data properties for each layer with respect to:
– projection, scale, accuracy, and resolution
How do we incorporate into a computer application system?
▪ by using a relational Data Base Management System (RDBMS)
Representing Geographic Features
Nominal
▪ no inherent ordering
▪ land use types, county names
Ordinal
▪ inherent order
▪ road class; stream class
Nominal and ordinal values are often coded as numbers (e.g., SSN), but arithmetic on them is not meaningful
Interval
▪ known difference between values
▪ no natural zero
▪ can’t say ‘twice as much’
▪ temperature (Celsius or Fahrenheit)
Ratio
▪ natural zero
▪ ratios make sense (e.g. twice as much)
▪ income, age, rainfall
▪ may be expressed as integer [whole number] or
floating point [decimal fraction]
Attribute data tables can contain locational information, such as addresses or a list of X,Y coordinates. ArcView refers to these as event tables.
However, these must be converted to true spatial data (shape file), for example by geocoding, before they can be displayed as a map.
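The conversion of an event table into spatial features can be sketched as follows. This is a minimal illustration for the X,Y case only, not ArcView's own procedure: the table contents, the column names (id, name, x, y) and the function name event_table_to_points are all hypothetical, and the output uses GeoJSON-style dictionaries rather than a shapefile. Geocoding street addresses would additionally need a reference street dataset, which is not shown.

```python
import csv
import io

# Hypothetical event table: attribute rows that also carry X,Y coordinates.
EVENT_TABLE = "id,name,x,y\n1,Clinic A,77.21,28.61\n2,School B,77.10,28.70\n"


def event_table_to_points(text):
    """Turn each table row into a GeoJSON-style point feature."""
    features = []
    for row in csv.DictReader(io.StringIO(text)):
        features.append({
            "type": "Feature",
            # The coordinates become the spatial part of the feature.
            "geometry": {
                "type": "Point",
                "coordinates": [float(row["x"]), float(row["y"])],
            },
            # Everything else stays as attribute data.
            "properties": {"id": int(row["id"]), "name": row["name"]},
        })
    return features


points = event_table_to_points(EVENT_TABLE)
print(points[0]["geometry"])
```

Once each row has a geometry, it can be drawn on a map alongside other layers.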
Attribute data types
Contain tables or feature classes in which:
– rows: entities, records, observations, features:
▪ ‘all’ information about one occurrence of a feature
– columns: attributes, fields, data elements, variables, items (ArcInfo)
▪ one type of information for all features
The key field is an attribute whose values uniquely identify each row
Parcel #   Address        Block   $ Value
8          501 N Hi       1       105,450
9          590 N Hi       2       89,780
36         1001 W. Main   4       101,500
75         1175 W. 1st    12      98,000
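As a rough sketch of how such an attribute table sits in a relational DBMS, the parcel rows above can be loaded into an in-memory SQLite database with the parcel number as the key field. The table and column names (parcels, parcel_no, value_usd) are illustrative choices, not part of any GIS product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# parcel_no is the key field: its values must uniquely identify each row.
conn.execute(
    """CREATE TABLE parcels (
           parcel_no INTEGER PRIMARY KEY,
           address   TEXT,
           block     INTEGER,
           value_usd INTEGER)"""
)
rows = [
    (8, "501 N Hi", 1, 105450),
    (9, "590 N Hi", 2, 89780),
    (36, "1001 W. Main", 4, 101500),
    (75, "1175 W. 1st", 12, 98000),
]
conn.executemany("INSERT INTO parcels VALUES (?, ?, ?, ?)", rows)

# Look up one feature by its key.
address = conn.execute(
    "SELECT address FROM parcels WHERE parcel_no = ?", (36,)
).fetchone()[0]

# Attribute query: parcels valued above $100,000.
high_value = [
    r[0]
    for r in conn.execute(
        "SELECT parcel_no FROM parcels WHERE value_usd > 100000 ORDER BY parcel_no"
    )
]

# A second row with an existing key is rejected by the database.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO parcels VALUES (8, 'duplicate', 1, 0)")
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(address, high_value, duplicate_rejected)
```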
Data Base Management Systems (DBMS)
Geographic Information System
A system that doesn't hold maps or pictures but holds a database
GIS Defined …..
▪ A computer-based system for the manipulation
and analysis of geospatial information in which
there is an automated link between a data object
and its spatial location.
(Free on-line textbook)
Roger F. Tomlinson, CM (born 17
November 1933) is an English
geographer and the primary
originator of modern geographic
information systems (GIS), and
has been acknowledged as
the "father of GIS"
G Information S
▪ Data is a fact or collection of facts
▪ Data that are processed, organized, structured
or presented in a given context to make them
useful are called Information
G Information System
A set of components for:
One example of an Information System:
Microsoft Access database
What is the S in GIS?
▪ 1980s: Geographic Information Systems
– technology for the acquisition and management of spatial information
– software for professional users, e.g. cartographers
– Example: MapInfo
▪ 1990s: Geographic Information Science
– comprehending the underlying conceptual issues of representing data and
processes in space-time
– the science (or theory and concepts) behind the technology
– Example: design spatial data types and operations for querying
▪ 1990s: Geographic Information Studies
– understanding the social, legal and ethical issues associated with the application
of GISy and GISc
▪ 2000s: Geographic Information Services
– Web-sites and service centers for casual users, e.g. travelers
– Service (e.g., GPS, MapQuest) for route planning
Location - What is at………….?
The first of these questions seeks to find out
what exists at a particular location.
A location can be described in many ways,
using, for example place name, post code, or
geographic reference such as longitude/latitude
Condition - Where is it………….?
The second question is the converse
of the first and requires spatial data
Instead of identifying what exists at
a given location, one may wish to
find location(s) where certain
conditions are satisfied (e.g., an
unforested section of at least 2,000
square meters in size, within 100
meters of road, and with soils
suitable for supporting buildings)
Trends - What has changed since…………..?
The third question might involve
both the first two and seeks to find
the differences (e.g. in land use or
elevation) over time.
Patterns - What spatial patterns exist….?
This question is more sophisticated
One might ask this question to
determine whether landslides are
mostly occurring near streams. It might
be just as important to know how many
anomalies there are that do not fit the
pattern and where they are located.
Modelling - What if………..?
"What if…" questions are posed to
determine what happens,
if a new road is added to a network or if
a toxic substance seeps into the local
ground water supply.
Answering this type of question requires
both geographic and other information
(as well as specific models). GIS permits
"What's the average number of people
working with GIS in each location?" is an
aspatial question, the answer to which does not require the
stored value of latitude and longitude; nor
does it describe where the places are in
relation to each other.
"How many people work with GIS in the
major centres of Delhi?" OR "Which centres
lie within 10 km of each other?" OR
"What is the shortest route passing through
all these centres?"
These are spatial questions that can only
be answered using latitude and longitude
data and other information such as the
radius of earth. Geographic Information
Systems can answer such questions.
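A question such as "which centres lie within 10 km of each other?" can be sketched with the haversine great-circle formula, which needs only latitude, longitude and the radius of the earth. The centre names and coordinates below are hypothetical, and the mean radius of 6371 km treats the earth as a sphere, which is an approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pairs_within(centres, limit_km):
    """Return every pair of centres no farther apart than limit_km."""
    names = sorted(centres)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if haversine_km(*centres[p], *centres[q]) <= limit_km
    ]


# Hypothetical centres given as (latitude, longitude) in degrees.
centres = {"A": (28.60, 77.20), "B": (28.65, 77.25), "C": (28.90, 77.60)}
print(pairs_within(centres, 10.0))
```

The shortest route through all centres is a harder problem (a form of the travelling salesman problem) and is not attempted here.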
Storing Geographic Data
One GIS data layer combines both
Geographic Features and their Attributes
Geographic Features indicate “where”
Storing “Everyday” Geographical Objects
▪ The fundamental primitive is the point, a 0-dimensional
(0-D) object that has a position in space but no length.
– home, day-care, health clinics, schools, retail and tobacco outlets,
crimes & graffiti, bus stops, neighborhood anchor institutions,
community assets, resources and risks
▪ A line is a 1-D geographic object having a length and is
composed of two or more 0-D point objects.
– roads, railway, pathways, walking or bus routes, rivers
▪ Areas (Polygons)
▪ A polygon is a geographic object bounded by at least three
1-D line objects or segments with the requirement that
they must start and end at the same location (i.e., node)
– census unit, ZIP code, school district, police precinct, health
service areas, counties, states, provinces, watersheds
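The three primitives can be sketched as plain coordinate structures: a point is one (x, y) pair, a line is a list of two or more points, and a polygon is a ring of points that starts and ends at the same node. The function names below are illustrative, and the coordinates are assumed to be planar (projected), not latitude/longitude.

```python
import math


def line_length(vertices):
    """Length of a 1-D line built from two or more 0-D points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )


def is_closed(ring):
    """A polygon ring needs at least three segments and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def polygon_area(ring):
    """Area of a closed ring using the shoelace formula."""
    if not is_closed(ring):
        raise ValueError("ring must start and end at the same node")
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


bus_stop = (2.0, 3.0)                      # point: position, no length
road = [(0.0, 0.0), (3.0, 4.0)]            # line: has length
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]  # polygon

print(line_length(road), polygon_area(parcel))
```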
Mapping Geographic Data – India States
India Airports (point layer)
India States (polygon layer)
Analyzing Geographic Data
• Query GIS data layers based on
attributes or geography, or both
Which states’ population was more
than 75 million in 2011?
Which are the neighboring states
of Madhya Pradesh?
What is Scale?
▪ Ratio of distance on a map, to equivalent distance on the earth's surface.
– Large scale: large detail, small area covered (1”=200’ or 1:2,400)
– Small scale: small detail, large area covered (1:250,000)
– A given object (e.g. land parcel) appears larger on a large scale map
– Scale can never be constant everywhere on a map
because of map projection
– Scale representation
▪ Verbal (good for interpretation):
“One inch equals one statute mile”
▪ Representative fraction (RF)
(good for measurement)
(smaller fraction = smaller scale:
1:2,000,000 is smaller than 1:2,000)
▪ Scale bar (good if enlarged/reduced)
[scale bar graphic: 0 1 2]
1:2,000 (1”=56 yards; 1cm=20m)
1:62,500 (1.6cm=1km; 1”=.986mi)
1:63,360 (1”=1mile; 1cm=.634km)
1:100,000 (1”=1.58mi; 1cm=1km)
1:500,000 (1”=7.9mi; 1cm=5km)
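The equivalences listed above follow directly from the representative fraction, since one map unit corresponds to the denominator's worth of the same unit on the ground. A small sketch of the arithmetic (the function names are illustrative):

```python
INCHES_PER_MILE = 63360   # 5,280 ft x 12 in
INCHES_PER_YARD = 36
CM_PER_KM = 100000


def miles_per_inch(denominator):
    """Ground miles represented by one map inch at scale 1:denominator."""
    return denominator / INCHES_PER_MILE


def yards_per_inch(denominator):
    """Ground yards represented by one map inch."""
    return denominator / INCHES_PER_YARD


def km_per_cm(denominator):
    """Ground kilometres represented by one map centimetre."""
    return denominator / CM_PER_KM


def cm_per_km(denominator):
    """Map centimetres needed to show one ground kilometre."""
    return CM_PER_KM / denominator


for d in (2000, 62500, 63360, 100000, 500000):
    print(f"1:{d:,}  1 in = {miles_per_inch(d):.3f} mi   1 cm = {km_per_cm(d):.3f} km")
```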
Large versus Small
large: above 1:12,500
medium: 1:13,000 - 1:126,720
small: 1:130,000 - 1:1,000,000
very small: below 1:1,000,000
( really, relative to what’s available for a given area; Maling 1989)
Map sheet examples:
1:24,000: 7.5 minute USGS Quads
(17 by 22 inches; 6 by 8 miles)
1:7,500,000 US wall map
(26 by 16 inches)
1:20,000,000: US 8.5” X 11”
Precision or Resolution
- it’s not the same as scale or accuracy!
Precision: the exactness of measurement or description
▪ the “size” of the “smallest” feature which can be displayed, recognized, or described
▪ Can apply to space, time (e.g. daily versus annual), or attribute (douglas fir v. conifer)
▪ For raster data, it is the size of the pixel (resolution)
– e.g. for NTGISC digital orthos is 1.6ft (half meter)
▪ raster data can be resampled by combining adjacent cells; this decreases resolution but saves storage
– eg 1.6 ft to 3.2 ft (1/4 storage); to 6.4 ft (1/16 storage)
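Resampling by combining adjacent cells can be sketched as block aggregation: each 2 x 2 block of cells becomes one cell, so the cell size doubles and the number of stored values falls to one quarter. Averaging is an assumption that suits continuous values such as elevation; categorical rasters such as land use would need a majority rule instead. The grid values are made up for illustration.

```python
def aggregate(grid, factor):
    """Combine factor x factor blocks of cells into one cell by averaging."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                grid[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Illustrative 4 x 4 raster (16 cells).
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
coarse = aggregate(grid, 2)   # 2 x 2 raster: 4 cells, 1/4 of the storage
print(coarse)
```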
▪ Resolution and scale
– generally, increasing to larger scale allows features to be observed better and requires higher resolution
– but, because of the human eye’s ability to recognize patterns, features in a lower resolution data set can sometimes be
observed better by decreasing the scale (6.4 ft resolution shown at 1:400 rather than 1:200)
▪ Resolution and positional accuracy
– you can see a feature (resolution), but it may not be in the right place (accuracy)
– Higher accuracy generally costs much more to obtain than higher resolution
– Accuracy cannot be greater (but may be much less) than resolution
▪ e.g. if pixel size is one meter, then best accuracy possible is one meter
Accuracy: Rests on at least four legs, not one!
Positional Accuracy (sometimes called Quantitative accuracy)
▪ horizontal accuracy: distance from true location
▪ vertical accuracy: difference from true height
▪ temporal accuracy: difference from actual time and/or date
Attribute Accuracy or Consistency: the validity concept in experimental design/stat. inf.
– a feature is what the GIS/map purports it to be
– a railroad is a railroad, and not a road
Completeness--the reliability concept from experimental design/stat. inf.
– Are all instances of a feature the GIS/map claims to include, in fact, there?
– Partially a function of the criteria for including features: when does a road become a track?
– Simply put, how much data is missing?
Logical Consistency: The presence of contradictory relationships in the database
▪ Data for one country is for 2000, for another it’s for 2001
▪ Data uses different source or estimation technique for different years (again, lineage)
▪ Overshoots and gaps in road networks or parcel polygons
▪ Consists of discrete coordinates to store
the geographic position of features
▪ Points: People or Cities (center)
▪ Lines: Roads or Other Linkages
▪ Vector Data Model
– Geographic features stored as X,Y coordinates
– Each vector layer has an attribute table
– Each feature corresponds to a row in the
attribute table
Data Types: Vector Data
▪ Raster data represents a continuous surface
divided into a regular grid of cells
▪ Often used as background map layer
▪ Points: People or Cities (center)
▪ Roads or Other Linkages
▪ Raster Data Model
– Stores images as rows and columns of numbers,
forming a regular grid structure
– Great for computational analysis or modeling
– Bad for mapping precise locations
Data Types: Raster Data
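The grid structure can be sketched as a mapping between ground coordinates and row/column indices. The sketch assumes a north-up raster whose origin is the upper-left corner, with rows counted downward; the function names and the 10-unit cell size are illustrative. It also shows why a raster is poor at precise locations: every position inside a cell collapses to the same cell.

```python
import math


def xy_to_cell(x, y, origin_x, origin_y, cell_size):
    """Return (row, col) of the cell containing ground position (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((origin_y - y) / cell_size)  # rows increase southward
    return row, col


def cell_center(row, col, origin_x, origin_y, cell_size):
    """Ground coordinates of the centre of a cell."""
    return (
        origin_x + (col + 0.5) * cell_size,
        origin_y - (row + 0.5) * cell_size,
    )


# Two different positions fall in the same 10-unit cell.
cell_a = xy_to_cell(21.0, 79.0, 0.0, 100.0, 10.0)
cell_b = xy_to_cell(28.0, 72.0, 0.0, 100.0, 10.0)
print(cell_a, cell_b, cell_center(*cell_a, 0.0, 100.0, 10.0))
```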
Vector vs Raster
Vector
• Low data volume
• Faster display
• Can also store attributes
• Less pleasing to the eye
• Does not dictate how features
should look in the GIS
Raster
• High data volume
• Slower display
• Has no attribute information
• More pleasing to the eye
• Inherently stores how features
should look in the GIS
▪ Describing the correct location and shape of
features requires a framework for defining
▪ A geographic coordinate system is used to
assign geographic locations to objects.
▪ GIS data layers must have a coordinate
system defined to integrate with other layers
Transforming 3-dimensional space (Earth) onto a 2-dimensional map (GIS)
Mercator, Azimuthal Equidistant, Albers Equal Area Conic,
Lambert Conformal Conic, Robinson
Map Projection is important
▪ Small-scale (large area) maps
– Interested in Comparing shapes, areas, distances, or directions of map features?
– Measurement errors can be quite substantial:
Distance: 3,124.67 miles
Projection: Albers Equal Area
Distance: 2,455.03 miles
Actual distance: 2,451 miles
Data collected may need to be reorganized and checked for
errors, before being used for spatial analysis, or mapping
Error detection and correction may include:
- Compare data with input document
- Check topology of spatial objects
- Check attributes of spatial objects
- Check for missing spatial objects
Data Storage and Editing
Three major types of error:
(1) Entity error (positional error). Entity error can take three different forms:
missing entities, incorrectly placed entities, and disordered entities.
(2) Attribute error. Attribute error occurs in both vector and raster systems.
(3) Entity-attribute agreement error (logical consistency).
Of the three basic types of error found in GIS databases, the last two are the
most difficult to find.
Detecting and Editing Errors of Diff. Types
▪ Negative cases of the following statements will cause errors:
1. All entities that should have been entered are present.
2. No extra entities have been digitized.
3. The entities are in the right place and are of the correct shape and size.
4. All entities that are supposed to be connected to each other are connected.
5. All entities are within the outside boundary identified with registration marks.
▪ A dangling node can be defined as a
single node connected to a single
line entity. Dangling nodes are also
called dangles.
▪ Dangles can result from three
types of error:
(1) Failure to close a polygon
(2) Failure to connect the node to the
object it was supposed to be connected
to (called an undershoot)
(3) Going beyond the entity you were
supposed to connect to (called an
overshoot)
Source of Errors
▪ Dangles can also be a result of incorrect placement of the digitizing
puck, or improper fuzzy tolerance distance setting.
Distance between the left dangle and
the line segment above it is 0.25 mm.
Fuzzy tolerance = 0.1 mm; if you
change it to 0.3 mm, the dangle will
be snapped to the line segment.
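The effect of the fuzzy tolerance setting can be sketched under one common reading of the term: a node closer to a line than the tolerance is snapped onto that line, and a node farther away is left alone. This is an illustrative sketch, not the algorithm of any particular GIS package; the function names and the coordinates (in mm) are assumptions chosen to match the 0.25 mm gap in the example.

```python
import math


def nearest_on_segment(p, a, b):
    """Return (distance, nearest point) from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay), a
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    nearest = (ax + t * dx, ay + t * dy)
    return math.hypot(px - nearest[0], py - nearest[1]), nearest


def snap_if_within(p, a, b, tolerance):
    """Snap p onto segment a-b when it lies within the tolerance."""
    distance, nearest = nearest_on_segment(p, a, b)
    return nearest if distance <= tolerance else p


dangle_end = (5.0, -0.25)            # 0.25 mm below the line
line_a, line_b = (0.0, 0.0), (10.0, 0.0)

print(snap_if_within(dangle_end, line_a, line_b, 0.1))  # stays a dangle
print(snap_if_within(dangle_end, line_a, line_b, 0.3))  # snapped to the line
```

A tolerance that is too large has the opposite risk: features that are genuinely separate get merged.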
▪ Sliver polygons
▪ This occurs when the software uses a
vector model that treats each
polygon as a separate entity. (or
▪ Solution: Use a GIS that does not
require digitizing the same line twice.
▪ Weird polygons
▪ Polygons with missing nodes.
▪ Missing Arcs/segments
Attribute Errors: Raster and Vector
A. Missing row
B. Incorrect or misplaced attributes
Incorrect attribute values are very difficult to detect.
Checklist to Avoid Errors
As a geospatial analyst, you should always approach a project with the
obvious sources of error discussed firmly in your mind. Therefore, when
given a task to perform, and the associated data, the following should act
as a good checklist:
– Is the data current?
– Were the data mapped at the correct scale? Do they have the same
– What is the resolution of the data? Will it support the kinds of analysis
we want to perform?
– Do we have all the data for the project areas, or is there some data
– If we need other data sets, are they available, or will we have trouble
▪ The statement “to err is human” is very applicable to creating spatial data. Humans make a
lot of errors. Typing in the wrong value in a computer is a common mistake that humans
make. However, there are other sources of obvious error besides human error:
– Age: a map is a representation of real-world objects at a given point in time. The reliability of a
dataset typically goes down as it gets older. This is especially true of data that would frequently
change such as housing within a city. Many GIS projects take years to complete, and it is entirely
possible that much of the data collected in the beginning of a project may be out of date by the end
of the project.
– Map Scale: In general, larger scale maps show more detail than smaller scale maps. Also, larger
scale maps tend to have greater accuracy than smaller scale maps, especially maps within the “same
family” such as 1:250,000, 1:100,000 and 1:24,000. GIS will process any of
your data, whether the processing is appropriate or not. Therefore, you can combine data from
different scales rather easily; however, doing so may not be a good idea due to the different
accuracies of the products.
– Data Format: The way we represent data also presents an obvious source of error. For example, a
raster map of landuse represented by 10 meter grid cells will differ significantly from a raster map
of landuse represented by 100 meter grid cells. The following is a grid of landuse values around
Ithaca, New York. You can see the differences in representation between a map with 10 meter grid
cells, 30 meter grid cells, and 100 meter grid cells.
Problems with Age
The following maps show the different land cover types in 1968 and
1995. You can see how the data has changed over nearly 30 years, and why using
older data might present a problem.
Components of Data Quality
▪ Positional Accuracy
▪ Attribute Accuracy
▪ Positional accuracy relates to the coordinate values for the
geographic objects. But even positional accuracy is divided into two
types:
– Absolute accuracy: refers to the actual X,Y coordinates of a geographic object.
If one knows the correct position of the geographic object, one can compare it
with the position represented in the geographic database.
Typically, absolute accuracy is measured as the total difference between the two
positions, or as the difference in the X coordinate and the difference in the Y
coordinate.
– Relative accuracy: refers to the displacement of two or more points on a map (in
both the distance and angle), compared to the displacement of those same
points in the real world.
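The two measures can be sketched as follows. The coordinates are hypothetical planar values, and the function names are illustrative. The example is chosen so that the whole dataset is shifted by the same offset: each point then has an absolute error, yet the relative accuracy between the points is perfect.

```python
import math


def absolute_error(mapped, true):
    """Return (dx, dy, total) between a mapped position and its true position."""
    dx = mapped[0] - true[0]
    dy = mapped[1] - true[1]
    return dx, dy, math.hypot(dx, dy)


def _distance_and_bearing(p, q):
    dx, dy = q[0] - p[0], q[1] - p[1]
    # Bearing in degrees clockwise from north (the +Y axis).
    return math.hypot(dx, dy), math.degrees(math.atan2(dx, dy)) % 360


def relative_error(mapped_a, mapped_b, true_a, true_b):
    """Return (distance difference, angle difference) for a pair of points."""
    map_dist, map_bearing = _distance_and_bearing(mapped_a, mapped_b)
    true_dist, true_bearing = _distance_and_bearing(true_a, true_b)
    # Wrap the angle difference into the range [-180, 180).
    angle_diff = (map_bearing - true_bearing + 180) % 360 - 180
    return map_dist - true_dist, angle_diff


true_a, true_b = (100.0, 200.0), (400.0, 600.0)
# Both mapped points are shifted by the same (3, 4) offset.
mapped_a, mapped_b = (103.0, 204.0), (403.0, 604.0)

print(absolute_error(mapped_a, true_a))
print(relative_error(mapped_a, mapped_b, true_a, true_b))
```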
Errors Associated with Spatial Analysis
▪ Errors in Digitizing a Map
– Source errors
▪ Boundaries drawn on a map have a “thickness”
– 1 mm line
▪ 1.25 m wide on 1:1,250 map
▪ 100m wide on 1:100000
▪ Estimates show that 10% of a 1:24000 soil map may represent the boundary lines
– Digital Representation
▪ Curves are approximated by many vertices
▪ Boundaries are not absolute, but should have a confidence interval
Errors Resulting from Natural Variations from Original
▪ Measurement Error
– Accuracy vs. Precision
▪ Accuracy: extent to which an estimated value approaches the true value
▪ Precision: measure of dispersion of observations about a mean
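These two definitions can be sketched numerically: accuracy as the offset of the mean observation from the true value, and precision as the standard deviation of the observations about their mean. The measurement values are made up for illustration, and the choice of standard deviation as the dispersion measure is an assumption.

```python
import statistics


def accuracy_and_precision(observations, true_value):
    """Return (bias, spread): mean offset from truth and dispersion about the mean."""
    mean = statistics.mean(observations)
    bias = mean - true_value                    # accuracy: closeness to the true value
    spread = statistics.stdev(observations)     # precision: repeatability
    return bias, spread


TRUE_VALUE = 100.0
precise_but_inaccurate = [104.5, 105.0, 105.5]  # tight cluster, wrong place
accurate_but_imprecise = [90.0, 100.0, 110.0]   # right on average, widely scattered

print(accuracy_and_precision(precise_but_inaccurate, TRUE_VALUE))
print(accuracy_and_precision(accurate_but_imprecise, TRUE_VALUE))
```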
Accuracy and Precision
▪ Accuracy is defined as displacement of a
plotted point from its true position in relation
to an established standard, while precision is
the degree of perfection, or repeatability, of a
measurement.
▪ For mapping, accuracy is associated with how close
the mapped position of an object is to its true position.
▪ Precision is then the ability to repeat a
measurement, or how likely you are to return
to the same location time and time again.
▪ The figures to the right illustrate the
differences between accuracy and precision.
▪ Therefore, if there are natural variations in
either the instruments used for
measurement, or the object you are
measuring, the accuracy or precision may be
affected.
Digitizing errors from duplicate lines include slivers and missing labels for
the sliver polygons. Slivers are exaggerated for the purpose of illustration.
Digitizing errors of overshoot (left) and undershoot (right)
Digitizing Errors- Overshoot & Undershoots
Digitizing errors of an unclosed polygon
Digitizing errors-Unclosed Polygon
Pseudo nodes, shown by the diamond symbol, are nodes that are not located at
Digitizing errors- Pseudo Nodes
The from-node and to-node of an arc determine the arc’s direction.
Digitizing error of multiple labels due to unclosed polygons
Digitizing Unclosed Polygon –Multi labels
The dangle length specified by the CLEAN command can remove an overshoot if the
overextension is smaller than the specified length. In this diagram, the overshoot a is removed
and the overshoot b remains.
Removing Dangles - Using Clean Command
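The dangle-length rule can be sketched as a simple filter. This is a simplified illustration, not the actual implementation of the CLEAN command: an arc is treated as dangling when one of its end nodes touches no other arc, and it is dropped only when it is shorter than the chosen dangle length. The arc coordinates and the 0.5 dangle length are hypothetical.

```python
import math
from collections import Counter


def arc_length(arc):
    """Total length of an arc given as a list of (x, y) vertices."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(arc, arc[1:])
    )


def clean_dangles(arcs, dangle_length):
    """Drop dangling arcs shorter than dangle_length; keep everything else."""
    # Count how many arc ends meet at each node.
    degree = Counter()
    for arc in arcs:
        degree[arc[0]] += 1
        degree[arc[-1]] += 1
    kept = []
    for arc in arcs:
        dangling = degree[arc[0]] == 1 or degree[arc[-1]] == 1
        if dangling and arc_length(arc) < dangle_length:
            continue  # short overshoot: removed
        kept.append(arc)
    return kept


overshoot_a = [(5.0, 0.0), (5.0, 0.2)]     # 0.2 long: shorter than the dangle length
overshoot_b = [(10.0, 0.0), (10.0, 1.5)]   # 1.5 long: longer than the dangle length
arcs = [
    [(0.0, 0.0), (5.0, 0.0)],
    [(5.0, 0.0), (10.0, 0.0)],
    overshoot_a,
    overshoot_b,
]
cleaned = clean_dangles(arcs, dangle_length=0.5)
print(len(cleaned))
```

Choosing the dangle length is a trade-off: too small leaves overshoots in place, too large can delete short dead-end features that are real.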
Typical Digitizing Situations
[Figure labels: the ideal case; an overshoot and what to do with it; a third case and what to do]
These slides aggregate material for a better understanding of GIS. I acknowledge the
contribution of all the authors and photographers whose work I drew on
for this presentation.
Dr. Nishant Sinha
Pitney Bowes Software, India