Data 101


Fundamentals of data in a GIS
Overview

 Role of data
 Data structures and schemas
 Metadata
 Linking data
 Issues of confidentiality
Review
90 percent rule


 90% Data Preparation
 10% Mapping
90% of the cost, time and effort will be devoted to data preparation
90% Rule

Data Preparation
 Collecting
 Cleaning
 Validating
 Formatting
 Linking with other data
Mapping
 Map design
 Categorization decisions
 Production
GIS analysis is only as strong as
the data used.
Strategies for strong data

 Accuracy
 Timeliness
 Properly structured
 Properly documented
Data accuracy

 Data should accurately reflect reality
 In GIS there are two types of accuracy to be
  concerned with:
    Spatial accuracy
       Items located correctly
    Attribute accuracy
       Attributes are correct and properly linked to
        geography
Spatial accuracy
[Figure: map showing the plotted point for Hotel Suryaa and its real location]
Spatial Accuracy and Scale



Hotel Suryaa
Attribute Accuracy

 Is the data associated with the location accurate?
 Is it linked to the right geographic entity?
Attribute Accuracy
Timeliness

 Is the data for the
  time period of
  interest?
    Boundaries change
    New features
     created
    Features change
Data Structure

 Proper data structure is necessary in order to
  effectively use data
 Software must know how to read the data, and
  query it.
 The structure of the data is also known as the data
  schema
Data Schema

 For most programs, data will need to be stored in
  a row and column format
 GIS programs expect well-formed data in the
  following schema:
    One record per geographic unit
    Geographic units don’t repeat in records
    Variables are stored in columns
    No blank cells unless data is missing
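The schema rules above can be checked before data is loaded into a GIS. The following is a minimal sketch, assuming the table is held as a list of Python dictionaries; the function name `check_schema` and the sample rows are hypothetical. It flags repeated geographic units and blank cells so they can be reviewed (a blank cell is acceptable only if the data is truly missing).

```python
def check_schema(records, unit_field):
    """Return a list of schema problems found in a list-of-dicts table."""
    problems = []
    seen = set()
    for i, row in enumerate(records):
        unit = row.get(unit_field)
        # Rule: geographic units must not repeat across records.
        if unit in seen:
            problems.append(f"row {i}: geographic unit {unit!r} repeats")
        seen.add(unit)
        # Rule: blank cells are only acceptable when data is missing,
        # so every blank is reported for a human to review.
        for field, value in row.items():
            if value is None or value == "":
                problems.append(f"row {i}: blank cell in {field!r}")
    return problems


# Hypothetical sample: the third row repeats "North" and has a blank cell.
districts = [
    {"District": "North", "Population": 24015},
    {"District": "West", "Population": 31154},
    {"District": "North", "Population": ""},
]

issues = check_schema(districts, "District")
for issue in issues:
    print(issue)
```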
Data Schema
Population                      China        India        United States   Indonesia
Total                           1339724852   1210193422   312417000       237556363
Percent of World’s Population   19.23%       17.37%       4.48%           3.41%
Population Density              140/km2      368/km2      32/km2          121/km2


Poor data schema
•Columns are geographic units
•Variables are rows
Duplicate District Names
Blank Cells
Proper Data Schema
   Columns are variables




    One record per geographic unit
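A table that stores geographic units as columns, like the country population table under "Poor data schema", can be restructured by transposing it. This is a sketch only, assuming the table is held as a dictionary of columns; the function name `to_one_record_per_unit` and the field name `Country` are illustrative choices, not part of the original slides.

```python
# Poor schema: geographic units are columns, variables are rows.
wide = {
    "Variable": ["Total", "Percent of world population", "Density per km2"],
    "China": [1339724852, 19.23, 140],
    "India": [1210193422, 17.37, 368],
    "United States": [312417000, 4.48, 32],
    "Indonesia": [237556363, 3.41, 121],
}


def to_one_record_per_unit(table, variable_field="Variable", unit_name="Country"):
    """Transpose so that each geographic unit becomes one record."""
    variables = table[variable_field]
    records = []
    for unit, values in table.items():
        if unit == variable_field:
            continue  # this column only holds the variable names
        record = {unit_name: unit}
        record.update(zip(variables, values))  # variables become columns
        records.append(record)
    return records


records = to_one_record_per_unit(wide)
for record in records:
    print(record)
```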
Metadata

 Data about data
 Provides information on:
    Source of data
    Who created it
    When it was created
    Coordinate system and datum
    Usage and sharing restrictions
Metadata

 Metadata is especially important with spatial data
  because of issues of:
    Spatial accuracy
    Coordinate systems and datums
    Confidentiality
    Timeliness
Metadata formats

 International standard
    ISO 19115
    Mandatory elements
    Schema for metadata
 Countries may have their own national standards
  that are compatible with the ISO standard but
  provide extra elements
Metadata Example
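As a rough illustration, a metadata record can be kept alongside a dataset and checked for completeness. The sketch below uses only the items listed on the earlier Metadata slide; the field names, the sample values and the `missing_fields` helper are all hypothetical and do not follow the element names of any particular standard.

```python
# Fields taken from the "Metadata" slide; the names are illustrative.
REQUIRED_FIELDS = [
    "source",
    "creator",
    "created",
    "coordinate_system",
    "datum",
    "restrictions",
]


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Hypothetical record: usage restrictions were never filled in.
record = {
    "source": "Hypothetical district health survey",
    "creator": "Example Statistics Office",
    "created": "2012-03-01",
    "coordinate_system": "Geographic (latitude/longitude)",
    "datum": "WGS 84",
    "restrictions": "",
}

missing = missing_fields(record)
print("Missing metadata:", missing)
```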
Data Types

 Text
 Numeric
    Coordinates
 Programs assign variables to be a specific type
  which can affect the way the program handles
  data
Data Types

 Text
   Arithmetic cannot be conducted on values in text
    fields
 Numeric
   Arithmetic permitted
   May require user to declare number of decimal
    places before entering data
      This can be important when storing coordinates
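The effect of field type can be seen in a short sketch. It uses values from elsewhere in this deck (two district population counts and the latitude from the disclosure example). The figure of roughly 111,000 metres per degree of latitude is an approximation used only for illustration.

```python
# Text fields: "+" joins the characters instead of adding the numbers.
text_total = "24015" + "14409"

# Numeric fields: arithmetic works as expected.
numeric_total = int("24015") + int("14409")

# Decimal places matter for coordinates: storing a latitude with only
# two decimal places moves the point.
lat = 28.67171
lat_2dp = round(lat, 2)

# Approximate length of one degree of latitude (assumption, in metres).
METERS_PER_DEGREE_LAT = 111_000
shift_m = abs(lat - lat_2dp) * METERS_PER_DEGREE_LAT

print(text_total, numeric_total)
print(f"Truncating to 2 decimals shifts the point by about {shift_m:.0f} m")
```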
Linking data

 Key field
    The field that contains information common
     between tables
    Tables are linked using the key field
    Cannot link using key fields of two different
     types
District   Population   Male Pop   Female Pop
North      24015        14409      9606
West       31154        16202      14952
South      62442        29972      32470

District   Area (sq km)
North      243
West       310
South      602

District is the key field

District   Population   Male Pop   Female Pop   Area (sq km)
North      24015        14409      9606         243
West       31154        16202      14952        310
South      62442        29972      32470        602
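The link shown in these tables can be sketched in a few lines. This is a minimal illustration, assuming each table is a list of dictionaries; the `link` function is a hypothetical helper, not the method of any specific GIS package.

```python
population = [
    {"District": "North", "Population": 24015, "Male Pop": 14409, "Female Pop": 9606},
    {"District": "West", "Population": 31154, "Male Pop": 16202, "Female Pop": 14952},
    {"District": "South", "Population": 62442, "Male Pop": 29972, "Female Pop": 32470},
]

area = [
    {"District": "North", "Area (sq km)": 243},
    {"District": "West", "Area (sq km)": 310},
    {"District": "South", "Area (sq km)": 602},
]


def link(left, right, key):
    """Join two tables on a shared key field, keeping matched rows only."""
    lookup = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = lookup.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


linked = link(population, area, "District")
for row in linked:
    print(row)
```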
Linking data

 Linking using text fields can be problematic
    Variations in spelling
District       Population   Male Pop   Female Pop
North Kinley   24015        14409      9606
West           31154        16202      14952
South          62442        29972      32470

District    Area (sq km)
N. Kinley   243
West        310
South       602

The two tables have different spellings for the district North Kinley

District   Population   Male Pop   Female Pop   Area (sq km)
West       31154        16202      14952        310
South      62442        29972      32470        602
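One way to catch this problem before linking is to list the key values that appear in only one table. The sketch below assumes each table is reduced to a dictionary keyed by district name; the variable names are illustrative.

```python
population = {"North Kinley": 24015, "West": 31154, "South": 62442}
area = {"N. Kinley": 243, "West": 310, "South": 602}

# Only districts spelled identically in both tables are linked.
linked = {name: (population[name], area[name]) for name in population if name in area}

# Key values found in one table but not the other point to spelling problems.
unmatched_population = sorted(set(population) - set(area))
unmatched_area = sorted(set(area) - set(population))

print("Linked:", sorted(linked))
print("Only in population table:", unmatched_population)
print("Only in area table:", unmatched_area)
```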
Linking data
 Linking using numeric fields is often more reliable
  and less vulnerable to variations and other
  issues
 Countries often use numeric codes for
  administrative units to get around problems with
  spelling variations
 If standardized national codes exist, it is a good
  idea to include them in data
    National Bureau of Statistics or Census often
     manage such codes
District       Dist code   Population   Male Pop   Female Pop
North Kinley   100         24015        14409      9606
West           200         31154        16202      14952
South          300         62442        29972      32470

District    Dist code   Area (sq km)
N. Kinley   100         243
West        200         310
South       300         602

Dist code is the key field

District   Dist Code   Population   Male Pop   Female Pop   Area (sq km)
North      100         24015        14409      9606         243
West       200         31154        16202      14952        310
South      300         62442        29972      32470        602
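Linking on the numeric code avoids the spelling problem, but the key fields must still be the same type. In the sketch below, the area table stores its codes as text; that is an assumption made purely to illustrate the type rule. The `link` helper is hypothetical and keeps the district name from the first table.

```python
population = [
    {"District": "North Kinley", "Dist code": 100, "Population": 24015},
    {"District": "West", "Dist code": 200, "Population": 31154},
    {"District": "South", "Dist code": 300, "Population": 62442},
]

# Assumption for illustration: this table holds the codes as text.
area = [
    {"District": "N. Kinley", "Dist code": "100", "Area (sq km)": 243},
    {"District": "West", "Dist code": "200", "Area (sq km)": 310},
    {"District": "South", "Dist code": "300", "Area (sq km)": 602},
]


def link(left, right, key):
    """Join on key; fields from the left table win when names clash."""
    lookup = {row[key]: row for row in right}
    return [{**lookup[row[key]], **row} for row in left if row[key] in lookup]


# Numeric 100 and text "100" are different values, so nothing links.
mismatched = link(population, area, "Dist code")

# Convert the text codes to numbers, then link again.
area_numeric = [{**row, "Dist code": int(row["Dist code"])} for row in area]
linked = link(population, area_numeric, "Dist code")

print(len(mismatched), "rows linked before converting types")
print(len(linked), "rows linked after converting types")
```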
Advantage of numeric codes

Can manage hierarchy effectively
District   Province   Code
North      Coast      101
North      Mountain   103
North      Savanna    105

North District Code 100
[Map labels: Coast, Savanna, Mountain]
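With hierarchical codes, the parent unit can be derived from the code itself. The sketch below assumes the scheme implied by the example, in which the hundreds digit identifies the district and the remaining digits identify the unit within it. Real national coding schemes vary, so the arithmetic in `district_code` is an assumption.

```python
from collections import defaultdict

# (district, province, code) rows from the slide.
units = [
    ("North", "Coast", 101),
    ("North", "Mountain", 103),
    ("North", "Savanna", 105),
]


def district_code(code):
    """Drop the last two digits to get the parent district code."""
    return (code // 100) * 100


# Group the lower-level units under their parent district code.
by_district = defaultdict(list)
for district, province, code in units:
    by_district[district_code(code)].append(province)

for parent, children in by_district.items():
    print(parent, children)
```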
Linking data key points

 Key fields must be of the same type
 Text fields can be problematic due to spelling
  variations
 Numeric fields are often a more reliable key field
 Unique geography codes, if available in a country,
  are often the best option for making linkages
Data and confidentiality issues

 Important issue when working with spatial data
 Discuss issues of confidentiality and spatial tools
 Present strategies for protecting confidentiality
Confidentiality

 Protecting identity of individuals
 Requirement
    Informed consent agreements
    Ethical research
Overt disclosure


The act of explicitly
making data available
that breaches
confidentiality
commitments.
Deductive Disclosure
 45 year old female
 45 year old female, has 5 children
 45 year old female, has 5 children, works for General Electric in Delhi
 28.67171, 77.21211
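The way each added attribute narrows the field can be sketched by counting how many records still match. The records below are entirely hypothetical and exist only to show the counts shrinking; `matches` is an illustrative helper.

```python
# Hypothetical records, invented only for this illustration.
people = [
    {"age": 45, "sex": "F", "children": 5, "employer": "General Electric"},
    {"age": 45, "sex": "F", "children": 2, "employer": "Other"},
    {"age": 45, "sex": "F", "children": 5, "employer": "Other"},
    {"age": 30, "sex": "M", "children": 0, "employer": "General Electric"},
]


def matches(records, **known):
    """Return the records consistent with everything known so far."""
    return [r for r in records if all(r[k] == v for k, v in known.items())]


step1 = len(matches(people, age=45, sex="F"))
step2 = len(matches(people, age=45, sex="F", children=5))
step3 = len(matches(people, age=45, sex="F", children=5, employer="General Electric"))

# When only one record remains, the individual can be deduced.
print(step1, step2, step3)
```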
Spatial Data

 Overt disclosure
 Makes deductive
  disclosure easier
Geoprivacy

“[an] individual’s right to
prevent disclosure of the
location of one’s home,
workplace, daily activities
or trips.”

Protection of geoprivacy and accuracy
of Spatial Information: How Effective are
Geographical Masks?
Kwan, Casas, Schmitz
Cartographica, Vol 39, #2
Four Principles
 Protection of
  Confidentiality
 Social-Spatial Linkage
 Data Sharing
 Data Preservation

Confidentiality and spatially explicit data:
   Concerns and challenges
VanWey, Rindfuss, Gutmann, Entwisle,
   Balk PNAS, vol. 102, no. 43
1. Protection of Confidentiality

 Fundamental to ethical research
 Information that might lead to physical,
  emotional, financial or other harm
 Protection of information that discloses identity
2. Social-Spatial Linkage

 All human activity takes place on earth
 Understanding that adds context and perspective
 Key to advancement of science
 Essential for understanding the diffusion of
  behaviors
3. Data Sharing

 Essential on both scientific and financial grounds
 Provide access to data for other researchers
 Condition of funders
4. Data Preservation

 Data available in the future
 How long should data be deemed “sensitive”?
 When, if ever, can it be released
Strategies
Random Perturbations

 Random shifting of
  point locations
 Pros: Easy
  (relatively) to do
 Cons: Lose original
  location, introduces
  error
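A random perturbation can be sketched as follows. This is one possible approach, not a prescribed method: each point is moved a random distance (up to a chosen maximum) in a random direction. The conversion from metres to degrees uses a rough constant, and the 500 m limit and the seed are arbitrary choices for the example.

```python
import math
import random

# Approximate metres per degree of latitude (assumption for illustration).
METERS_PER_DEGREE = 111_320


def perturb(lat, lon, max_shift_m, rng):
    """Shift a point by a random distance and bearing."""
    distance = rng.uniform(0, max_shift_m)
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE
    # A degree of longitude shrinks with latitude, so scale by cos(lat).
    dlon = distance * math.sin(bearing) / (METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


rng = random.Random(42)  # fixed seed so the example is repeatable
original = (28.67171, 77.21211)
masked = perturb(original[0], original[1], 500, rng)
print(original, "->", masked)
```

The original location cannot be recovered from the masked point alone, which is both the protection and the cost noted above.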
Affine Transformation
 Change scale
 Rotate
 Shift a set distance
 Combination
 Pros: Easy to do
 Cons: Easy to undo,
  can impact some
  types of analysis
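The combination of scaling, rotating and shifting can be sketched on planar (x, y) coordinates. The parameter values and point coordinates below are arbitrary. The second function shows why this approach is easy to undo: anyone who learns or estimates the parameters can reverse it.

```python
import math


def affine(points, scale, angle_deg, dx, dy):
    """Scale, rotate about the origin, then shift each (x, y) point."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]


def undo_affine(points, scale, angle_deg, dx, dy):
    """Reverse the transformation when the parameters are known."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    restored = []
    for x, y in points:
        x0, y0 = (x - dx) / scale, (y - dy) / scale  # remove shift and scale
        restored.append((c * x0 + s * y0, -s * x0 + c * y0))  # rotate back
    return restored


# Hypothetical planar coordinates in metres.
points = [(100.0, 200.0), (350.0, 50.0), (0.0, 0.0)]
params = dict(scale=1.5, angle_deg=30, dx=1000, dy=-500)

masked = affine(points, **params)
restored = undo_affine(masked, **params)
print(masked)
print(restored)
```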
Aggregate

 Point locations are
  aggregated to
  higher unit of
  analysis
 Pros: Easy to do
 Cons: Requires
  sufficient data
  points, Finer data
  variations will be lost
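Aggregation can be sketched as counting points per higher-level unit and withholding counts that are too small to be safe. The sketch assumes each point already carries the name of the district it falls in; the points, the `aggregate` helper and the threshold of 2 are hypothetical.

```python
from collections import Counter

# Hypothetical point records, already tagged with their district.
points = [
    {"id": 1, "district": "North"},
    {"id": 2, "district": "North"},
    {"id": 3, "district": "North"},
    {"id": 4, "district": "West"},
]


def aggregate(records, min_count):
    """Count points per district; suppress counts below min_count."""
    counts = Counter(r["district"] for r in records)
    return {d: (n if n >= min_count else None) for d, n in counts.items()}


summary = aggregate(points, min_count=2)
print(summary)
```

A district with too few points is reported as `None` rather than as a count, which reflects the requirement for sufficient data points.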
Despatialize

 Remove Coordinate
  System
 Use Euclidean space
 Pros: Simple, keeps
  relative position and
  placement
 Cons: Loses
  contextual data
Nothing
 Do not collect or
  release data
 Cold room or on-site
  analysis only
 Pros: Maintains all of
  the original spatial data
 Cons: Complicated,
  limits data sharing,
  limits social-spatial link
[Figure: Spatial Integrity (Maximum to Minimum) plotted against Disclosure Risk (Maximum Risk to Minimum Risk)]
“Ignoring is unacceptable”

 Can get lost in the excitement about GIS
 Those who collect data must think about the
  confidentiality issues
 Data users must also think about how their
  analysis may increase the risk of deductive
  disclosure.
Key points

 Confidentiality issues arise when spatial context
  is included in data.
 It’s important to protect confidentiality. People
  have an expectation that their identities are
  protected.
 There are strategies that can preserve
  confidentiality, but there is no “one-size-fits-all
  solution”

 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 

Data 101: Fundamentals of Data in GIS

  • 1. Data 101 Fundamentals of data in a GIS
  • 2. Overview  Role of data  Data structures and schemas  Metadata  Linking data  Issues of confidentiality
  • 4. 90 percent rule: 90% Data Preparation, 10% Mapping. 90% of the cost, time and effort will be devoted to data preparation
  • 5. 90% Rule. Data Preparation: Collecting, Cleaning, Validating, Formatting, Linking with other data. Mapping: Map design, Categorization decisions, Production
  • 6. GIS analysis is only as strong as the data used.
  • 7. Strategies for strong data  Accuracy  Timeliness  Properly structured  Properly documented
  • 8. Data accuracy  Data should accurately reflect reality  In GIS there are two types of accuracy to be concerned with:  Spatial accuracy  Items located correctly  Attribute accuracy  Attributes are correct and properly linked to geography
  • 9. Spatial accuracy Real Location Hotel Suryaa
  • 10. Spatial Accuracy and Scale Hotel Suryaa
  • 11. Attribute Accuracy  Is the data associated with the location accurate?  Is it linked to the right geographic entity?
  • 13. Timeliness  Is the data for the time period of interest?  Boundaries change  New features created  Features change
  • 14. Data Structure  Proper data structure is necessary in order to effectively use data  Software must know how to read the data, and query it.  The structure of the data is also known as data schema
  • 15. Data Schema  For most programs, data will need to be stored in a row and column format  GIS programs expect well formed data in the following schema:  One record per geographic unit  Geographic units don’t repeat in records  Variables are stored in columns  No blank cells unless data is missing
  • 16. Data Schema (poor data schema: columns are geographic units, variables are rows)
        Population                      China        India        United States   Indonesia
        Total                           1339724852   1210193422   312417000       237556363
        Percent of World’s Population   19.23%       17.37%       4.48%           3.41%
        Population Density              140/km2      368/km2      32/km2          121/km2
  • 17. Duplicate District Names; Blank Cells
  • 18. Proper Data Schema Columns are variables One record per geographic unit
  • 19. Metadata  Data about data  Provides information on:  Source of data  Who created it  When it was created  Coordinate system and datum  Usage and sharing restrictions
  • 20. Metadata  Metadata is especially important with spatial data because of issues of:  Spatial accuracy  Coordinate systems and datums  Confidentiality  Timeliness
  • 21. Metadata formats  International standard  ISO 19115  Mandatory elements  Schema for metadata  Countries may have their own national standards that are compatible with the ISO standard but provide extra elements
  • 23. Data Types  Text  Numeric  Coordinates  Programs assign variables to be a specific type which can affect the way the program handles data
  • 24. Data Types  Text  Arithmetic can not be conducted on values in text fields  Numeric  Arithmetic permitted  May require user to declare number of decimal places before entering data  This can be important when storing coordinates
  • 25. Linking data  Key field  The field that contains information common between tables  Tables are linked using the key field  Can’t link using key fields that are two different types
  • 26. District is the key field
        First table:
        District   Population   Male Pop   Female Pop
        North      24015        14409      9606
        West       31154        16202      14952
        South      62442        29972      32470
        Second table:
        District   Area (sq km)
        North      243
        West       310
        South      602
        Linked table:
        District   Population   Male Pop   Female Pop   Area (sq km)
        North      24015        14409      9606         243
        West       31154        16202      14952        310
        South      62442        29972      32470        602
  • 27. Linking data  Linking using text fields can be problematic  Variations in spelling
  • 28. The two tables have different spellings for the district North Kinley
        First table:
        District       Population   Male Pop   Female Pop
        North Kinley   24015        14409      9606
        West           31154        16202      14952
        South          62442        29972      32470
        Second table:
        District    Area (sq km)
        N. Kinley   243
        West        310
        South       602
        Linked table:
        District   Population   Male Pop   Female Pop   Area (sq km)
        West       31154        16202      14952        310
        South      62442        29972      32470        602
  • 29. Linking data  Linking using numeric fields is often more reliable and less vulnerable to variations and other issues  Countries often use numeric codes for administrative units to get around problems with spelling variations  If standardized national codes exist, it is a good idea to include them in data  National Bureau of Statistics or Census often manage such codes
  • 30. Dist code is the key field
        First table:
        District       Dist code   Population   Male Pop   Female Pop
        North Kinley   100         24015        14409      9606
        West           200         31154        16202      14952
        South          300         62442        29972      32470
        Second table:
        District    Dist code   Area (sq km)
        N. Kinley   100         243
        West        200         310
        South       300         602
        Linked table:
        District   Dist Code   Population   Male Pop   Female Pop   Area (sq km)
        North      100         24015        14409      9606         243
        West       200         31154        16202      14952        310
        South      300         62442        29972      32470        602
  • 31. Advantage of numeric codes: Can manage hierarchy effectively. North District Code 100 (map labels: Coast, Savanna, Mountain)
        District   Province   Code
        North      Coast      101
        North      Mountain   103
        North      Savanna    105
  • 32. Linking data key points  Key fields must be of the same type  Text fields can be problematic due to spelling variations  Numeric fields are often a more reliable key field  Unique geography codes, if available in a country, are often the best option for making linkages
  • 33. Data and confidentiality issues  Important issue when working with spatial data  Discuss issues of confidentiality and spatial tools  Present strategies for protecting confidentiality
  • 34. Confidentiality  Protecting identity of individuals  Requirement  Informed consent agreements  Ethical research
  • 35. Overt disclosure The act of explicitly making data available that breaches confidentiality commitments.
  • 36. Deductive Disclosure: 45 year old female; Has 5 children; Works for General Electric in Delhi; 28.67171, 77.21211
  • 37. Spatial Data  Overt disclosure  Makes deductive disclosure easier
  • 38. Geoprivacy “[an] individual’s right to prevent disclosure of the location of one’s home, workplace, daily activities or trips.” Protection of geoprivacy and accuracy of Spatial Information: How Effective are Geographical Masks? Kwan, Casas, Schmitz Cartographica, Vol 39, #2
  • 39. Four Principles  Protection of Confidentiality  Social-Spatial Linkage  Data Sharing  Data Preservation Confidentiality and spatially explicit data: Concerns and challenges VanWey, Rindfuss, Gutmann, Entwisle, Balk PNAS, vol. 102, no. 43
  • 40. 1. Protection of Confidentiality  Fundamental to ethical research  Information that might lead to physical, emotional, financial or other harm  Protection of information that discloses identity
  • 41. 2. Social-Spatial Linkage  All human activity takes place on earth  Understanding that adds context and perspective  Key to advancement of science  Essential for understanding the diffusion of behaviors
  • 42. 3. Data Sharing  Essential on both scientific and financial grounds  Provide access to data for other researchers  Condition of funders
  • 43. 4. Data Preservation  Data available in the future  How long should data be deemed “sensitive”?  When, if ever, can it be released
  • 45. Random Perturbations  Random shifting of point locations  Pros: Easy (relatively) to do  Cons: Lose original location, introduces error
  • 46. Affine Transformation  Change scale  Rotate  Shift a set distance  Combination  Pros: Easy to do  Cons: Easy to undo, can impact some types of analysis
  • 47. Aggregate  Point locations are aggregated to higher unit of analysis  Pros: Easy to do  Cons: Requires sufficient data points, Finer data variations will be lost
  • 48. Despatialize  Remove Coordinate System  Use Euclidean space  Pros: Simple, keeps relative position and placement  Cons: Loses contextual data
  • 49. Nothing  Do not collect or release data  Cold room or on-site analysis only  Pros: Maintains all of the original spatial data  Cons: Complicated, limits data sharing, limits social-spatial link
  • 50. Spatial Integrity (Maximum to Minimum) versus Disclosure Risk (Maximum to Minimum)
  • 51. “Ignoring is unacceptable”  Can get lost in the excitement about GIS  Those who collect data must think about the confidentiality issues  Data users must also think about how their analysis may increase the risk of deductive disclosure.
  • 52. Key points  Confidentiality issues arise when spatial context is included in data.  It’s important to protect confidentiality. People have an expectation that their identities are protected.  There are strategies that can preserve confidentiality, but there is no “one-size-fits-all solution”

Editor's Notes

  1. For this presentation we will talk about the role of data in effective use of GIS. We will also cover the proper data structures and schemas for use in GIS, as well as review the notion of metadata. Lastly, we’ll review some important issues concerning linking data and discuss issues of confidentiality.
  2. To review, you will remember that GIS combines software, hardware, procedures, people and data. Each element is important, but use of GIS is easier when the data is well formed and ready to go into GIS.
  3. There is a rule of thumb in GIS work known as the 90% rule. It states that for any GIS activity, 90% of the cost, time and effort will be devoted to data preparation, and 10% to actually producing maps.
  4. This means that before any map can be produced, many tasks need to be completed. For instance, it is necessary to collect, clean, validate and format the data to make sure it is accurate. The data may then need to be linked with other data before it can be used, which may require additional work. For mapping, there is indeed work to be done, but comparatively speaking, much less.
  5. As you can see, data is important in GIS. In fact, GIS analysis is only as strong as the data used.
  6. Data, whether in a GIS or not, should of course be accurate. This means that it reflects reality as much as possible. In GIS there are two types of accuracy to be concerned with: spatial accuracy which refers to whether items are located correctly and attribute accuracy, which refers to the attributes. Here this means that the attributes are correct and are properly linked to geography.
  7. Here is a representation of spatial accuracy. Let’s say you found online a file with latitude and longitude coordinates of hotels in India. You decide you want to create a shapefile with these coordinates. When you then overlay them on images in Google Earth, you see that the points aren’t accurate. Here’s the scene in Google Earth <CLICK TO DISPLAY FIRST ANIMATED ELEMENT> And here is the location of the Hotel Suryaa <CLICK TO DISPLAY NEXT ANIMATED ELEMENT>. This location is inaccurate because the real location of the hotel Suryaa is here. <CLICK TO DISPLAY NEXT ANIMATED ELEMENT>. The point is off by 50 meters or more.
  8. Spatial accuracy can be affected by scale. For instance, here is the same point viewed at a different scale. <CLICK TO DISPLAY ANIMATED ELEMENT> At this scale the point location is still inaccurate, in that it isn’t the exact latitude and longitude for the hotel; however, because the map now covers an area much larger than the error of the point, the effect is less obvious. In fact, if a location was derived using a map at one scale, its accuracy can be assessed by using a map at a larger scale (a map that is more “zoomed in”).
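
The size of a positional error like the one described for the hotel can be estimated as the great-circle distance between the recorded point and a trusted reference point. The following is a minimal Python sketch using the haversine formula; the two coordinate pairs are made-up placeholders (not the real hotel location), and the mean Earth radius is an approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Hypothetical coordinates, for illustration only.
recorded = (28.5900, 77.2500)   # point taken from the downloaded file
reference = (28.5905, 77.2500)  # location confirmed on imagery
error_m = haversine_m(*recorded, *reference)
print(f"Positional error: {error_m:.0f} m")
```

Whether an error of this size matters depends on the scale at which the data will be viewed and analyzed.
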
  9. To illustrate, here is a screen shot from Google Maps. Even though it isn’t a GIS, it does rely on a spatial database in that it has locational information and attributes about the locations. If you zoom into the location of the hotel, you see that <CLICK TO DISPLAY FIRST ANIMATED ELEMENT> instead of saying the building is the Hotel Suryaa, it has the building listed as “Hotel Crowne Plaza”
  10. The example from Google Maps illustrates another consideration for strong data: timeliness. Their database is old and doesn’t reflect that this hotel is now the Suryaa and no longer the Crowne Plaza. The world changes, which means that spatial databases, or any data set, can quickly become out of date, so it is important to be aware of the timeliness of the data. The data doesn’t necessarily have to be the most recent; sometimes there is value in having older files, for instance if you want to track changes over time. However, you as the data user need to be aware of the time frame of the data you use and include information about the time frame of the data you create.
  11. Software, whether it’s a GIS program or not, must know how to read and interpret data files. This means that the data needs to be stored in a standard way that the software expects. The way that the data is stored is known as its structure or, more commonly, its schema.
  12. There has been a standard schema that has evolved over the years for data and it is considered best practice to use this schema generally, whether the data will be used in a GIS or not. This standard schema is as follows: one record per geographic unit, variables are stored in columns and there are no blank cells unless data is missing
  13. Here’s an example of poor data schema. The variables are listed as rows and the columns are the geographic units. It is still a valid way to display data for a table in a publication or presentation, but you would not want to store data using this schema if you wanted to use it in a GIS.
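
A table in this layout can be restructured so that each geographic unit becomes one record and each variable becomes a column. The sketch below does this in plain Python with the figures from the slide; the variable names (`total_population` and so on) are illustrative choices, not names used in the original data.

```python
# Poor schema from the slide: variables are rows, geographic units are columns.
wide = {
    "total_population": {"China": 1339724852, "India": 1210193422,
                         "United States": 312417000, "Indonesia": 237556363},
    "pct_world_population": {"China": 19.23, "India": 17.37,
                             "United States": 4.48, "Indonesia": 3.41},
    "density_per_km2": {"China": 140, "India": 368,
                        "United States": 32, "Indonesia": 121},
}

def to_records(wide_table):
    """Return one dict per geographic unit, with variables as keys (columns)."""
    units = next(iter(wide_table.values())).keys()
    return [
        {"country": unit,
         **{var: values[unit] for var, values in wide_table.items()}}
        for unit in units
    ]

records = to_records(wide)
for row in records:
    print(row)
```

Each printed row now describes exactly one country, which is the shape a GIS program expects when joining attributes to geography.
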
  14. Here is another example of poor data schema. There are several things wrong with this table. First, there are blank cells that don’t represent missing data. The blank cells are supposed to indicate that the value of the last filled cell above is to be repeated. <CLICK TO ADVANCE ANIMATION>. Second, there are duplicate district names. In this made-up country, there are districts with the same name in different provinces. <CLICK TO ADVANCE ANIMATION>. We’ll come back to this problem in a little bit.
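
Both problems can be checked for, and the first one repaired, before the table goes into a GIS. The sketch below is one possible approach in plain Python. The rows are invented for illustration (the slide's actual table is not reproduced here), and pairing province with district to tell repeated names apart is an assumption of this example, not the method from the presentation.

```python
# Hypothetical rows: a blank province means "same as the row above".
rows = [
    {"province": "East", "district": "North", "population": 24015},
    {"province": "",     "district": "South", "population": 62442},
    {"province": "West", "district": "North", "population": 31154},
    {"province": "",     "district": "South", "population": 18200},
]

def fill_down(rows, field):
    """Copy the last non-blank value of `field` into the blank cells below it."""
    filled, last = [], None
    for row in rows:
        value = row[field] or last
        if value is None:
            raise ValueError(f"first row has no value for {field!r}")
        filled.append({**row, field: value})
        last = value
    return filled

clean = fill_down(rows, "province")

# District names repeat across provinces, so pair each district with its
# province to get an identifier that is unique per geographic unit.
keys = [(r["province"], r["district"]) for r in clean]
assert len(keys) == len(set(keys)), "geographic units must not repeat"
```

After this step every cell carries its own value, and a blank cell can safely be read as missing data.
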
  15. Here is a proper data schema for a GIS program. As you can see there is one record per geographic unit. In this case Region. Regions don’t duplicate. Columns contain variables. Each cell contains well formed data.
  16. As I mentioned, proper documentation is a key component of strong data. Including metadata is the best way to document data. Simply put, metadata is data about data. It provides the data user with information about the data such as: <READ SLIDE>
  17. Metadata is especially important with spatial data because of issues of: Spatial accuracy: it’s important for data users to know how the data was collected and whether there are scale issues to consider. Coordinate systems and datums: sometimes it is clear what coordinate system was used, but other times it isn’t. Without metadata the user may not know what coordinate system or datum the data is in, and this may make it difficult to use the data. Confidentiality: spatial data can raise issues concerning confidentiality and privacy. The metadata can make sure data users are aware of these issues and of what restrictions may exist on sharing the data or even presenting maps. Timeliness: this one should be obvious; it is when the data was collected.
  18. Because metadata is so important, the International Organization for Standardization (ISO) has produced an international standard for geographic metadata. The ISO 19115 standard mandates that certain elements be included in the metadata. It also defines the schema, or structure, for metadata. For more information about the ISO 19115 standard, you can visit the ISO web site. It’s important to note that many countries have developed their own national standards for spatial metadata; these national standards should be compatible with the ISO standard. It is important to research any national metadata standards you may want to conform to.
  19. Here is an excerpt from metadata for a file obtained from the UN’s Second Administrative Level Boundary (SALB) site. The actual metadata file is much longer and contains many more elements, but this will give you an example of the type of information that is contained in a metadata file.
  20. Most data programs differentiate between data types and will assign each variable to one type or another. The type a field is assigned can affect the way the program handles its data.
  21. For fields that are defined as text, arithmetic operations such as addition and subtraction are not allowed. For fields that are defined as numeric, arithmetic is permitted. One issue, however, is that many programs require the user to declare the number of decimal places before entering data. This is an important consideration when storing coordinates in a field: if an inadequate number of decimal places is declared, the full coordinate cannot be stored, which can have an impact on accuracy.
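
The effect of the declared decimal places can be estimated with simple arithmetic. One degree of latitude spans roughly 111 km, so rounding a latitude to d decimal places can move a point north or south by up to about half of 111 km divided by 10^d. The short Python sketch below works this out; the 111 km figure is an approximation, and the latitude is the example value shown on the deductive disclosure slide.

```python
METERS_PER_DEGREE_LAT = 111_000  # rough figure; varies slightly with latitude

def max_rounding_error_m(decimal_places):
    """Worst-case north-south shift from rounding a latitude to N decimal places."""
    return 0.5 * 10 ** -decimal_places * METERS_PER_DEGREE_LAT

latitude = 28.67171
for places in (1, 3, 5):
    stored = round(latitude, places)
    print(places, stored, f"up to about {max_rounding_error_m(places):.2f} m")
```

A field declared with one decimal place can therefore misplace a point by several kilometres, while five decimal places keep the rounding error well under a metre.
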
  22. One of the key tasks that a GIS needs to be able to do is linking tables. GIS uses a key field to make the link between tables. A key field is the field that contains information common between the tables. It is important to remember that it is not possible to link tables using key fields that are two different types. In the next slides I’ll illustrate this.
  23. Here are two tables. <ASK GROUP> What is the field that will be the key field? <ANSWER: DISTRICT> <ADVANCE SLIDE TO DISPLAY ANIMATED ELEMENT> It is possible to link these two tables using the common field, District. Just a note to point out that it is no coincidence that a geographic unit is the key field. As we’ve mentioned, geography is the common link between human activity. <ADVANCE SLIDE TO DISPLAY ANIMATED ELEMENT> As you can see there is now a link between the two tables.
  24. One important thing to point out is linking using text fields can be problematic because of variations in spelling.
  25. Here are two tables. Notice that they each have a different spelling for the district North Kinley. <ADVANCE SLIDE TO DISPLAY ANIMATION> <ASK GROUP> What do you think will happen? Will it be possible to join these tables? <ADVANCE SLIDE TO DISPLAY ANIMATION> The answer depends on the software and the settings you select. In many GIS programs, a link will be made only for those records that do match. It’s easy to see that the linked table doesn’t have the complete number of records in this example, but if you had many records, it might be possible to miss this fact. So a good practice is to check the record count after the join to make sure it is correct.
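
The record-count check can be automated. The sketch below links the two tables from the slide on the text field and reports any records that failed to link. It is written in plain Python as an illustration of the idea, not as the behavior of any particular GIS program, and the field names are illustrative.

```python
population = [
    {"district": "North Kinley", "population": 24015},
    {"district": "West", "population": 31154},
    {"district": "South", "population": 62442},
]
area = [
    {"district": "N. Kinley", "area_sq_km": 243},
    {"district": "West", "area_sq_km": 310},
    {"district": "South", "area_sq_km": 602},
]

def inner_join(left, right, key):
    """Link two tables on `key`, keeping only records that match in both."""
    lookup = {row[key]: row for row in right}
    joined = [{**row, **lookup[row[key]]} for row in left if row[key] in lookup]
    unmatched = [row[key] for row in left if row[key] not in lookup]
    return joined, unmatched

joined, unmatched = inner_join(population, area, "district")
if len(joined) != len(population):
    print(f"Only {len(joined)} of {len(population)} records linked; "
          f"unmatched: {unmatched}")
```

The same check also catches a type mismatch: in Python the text key "100" and the numeric key 100 do not compare equal, so no link would be made between them.
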
  26. As you can imagine, there are many different ways text fields can be problematic. Linking using numeric fields is often more reliable since they are less vulnerable to variations. For this reason, countries often use numeric codes to identify administrative units. Often the national bureau of statistics or census bureau manage such codes. If there are standardized national codes, it is a good idea to include them in databases.
  27. So here are two tables with a field for district codes, which were assigned by the national bureau of statistics. If district code is used as the key field <ADVANCE SLIDE TO DISPLAY ANIMATION> then spelling variation in the district field doesn’t matter and the tables can be joined successfully. <ADVANCE SLIDE TO DISPLAY ANIMATION>
  28. Another advantage of numeric codes associated with geography is they can manage geographic hierarchy effectively. So let’s say this is North District. North District is divided into three provinces. <ADVANCE SLIDE TO DISPLAY ANIMATED ELEMENT> Coast province, mountain province and savanna province. North district has a code of 100. Most countries set up their national codes so that hierarchy is included. <ADVANCE SLIDE TO DISPLAY ANIMATED ELEMENT> As you can see from the table, all of the provinces are numbered in the 100’s since they are in North District.
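
When codes are built this way, the parent unit can be derived from the code itself. The sketch below uses the codes from the slide and assumes, as in that example, that the hundreds digit identifies the district; real national coding schemes differ, so the rule would need to be adapted to the scheme in use.

```python
# Codes from the slide: North District is 100; its provinces are 101, 103, 105.
provinces = {"Coast": 101, "Mountain": 103, "Savanna": 105}

def parent_district_code(province_code):
    """Assumes the hundreds digit identifies the district, as in the slide."""
    return (province_code // 100) * 100

for name, code in provinces.items():
    print(name, code, "->", parent_district_code(code))
```

This makes it straightforward to roll province-level data up to districts without a separate lookup table.
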
  29. To review, here are the key points from the discussion on linking data <READ SLIDE>
  30. Now to switch topics slightly. Confidentiality is an important consideration when working with spatial data. During this part of the lecture, we’ll discuss issues of confidentiality and spatial tools as well as present strategies for protecting confidentiality.
  31. So let’s start by talking about confidentiality and what I’m referring to. Put simply, confidentiality is the idea that it is important to protect the identity of individuals. This is a requirement of many informed consent agreements that people sign when we collect data. It’s also a pillar of ethical research.
  32. There are two threats to confidentiality. One is overt disclosure. Overt disclosure is the act of explicitly making data available that breaches confidentiality commitments, such as releasing data files that contain an individual’s name and/or data.
  33. The second way that confidentiality can be breached is through deductive disclosure. That is the process of piecing together multiple pieces of the puzzle until a picture emerges. So for instance, let’s say there was a survey conducted. If you knew that a person was a 45 year old female, that narrows down the list somewhat. [ADVANCE SLIDE] Then if you knew that she has 5 children, that narrows it down even more. [ADVANCE SLIDE] If you know that she works for General Electric in Delhi, that makes it a little easier to potentially identify a person. [ADVANCE SLIDE] If you add a geographic coordinate of where she lives, [ADVANCE SLIDE] it’s almost the same as listing a name.
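
The narrowing-down process can be made concrete by counting how many records remain consistent with what an outsider knows. The sketch below uses entirely hypothetical survey records, with attribute names chosen for illustration, and adds one known attribute at a time.

```python
# Entirely hypothetical survey records, for illustration only.
respondents = [
    {"age": 45, "sex": "F", "children": 5, "employer": "General Electric"},
    {"age": 45, "sex": "F", "children": 5, "employer": "Other"},
    {"age": 45, "sex": "F", "children": 2, "employer": "Other"},
    {"age": 45, "sex": "M", "children": 5, "employer": "General Electric"},
    {"age": 30, "sex": "F", "children": 1, "employer": "Other"},
]

def count_matches(records, **known):
    """Number of records consistent with everything an outsider already knows."""
    return sum(all(r[k] == v for k, v in known.items()) for r in records)

known = {}
for field, value in [("age", 45), ("sex", "F"), ("children", 5),
                     ("employer", "General Electric")]:
    known[field] = value
    print(sorted(known), "->", count_matches(respondents, **known), "candidate(s)")
```

Once the count reaches one, the record is effectively identified even though no name was released, which is why each added attribute, and especially a coordinate, raises the disclosure risk.
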
  34. When you add a spatial component to data it can be an overt disclosure of identifying information. At the very least it makes deductive disclosure easier. So what’s the answer? Should the spatial element be dropped?
  35. There is an emerging recognition that there is a need to explicitly define issues of geoprivacy. Geoprivacy is a term coined to refer to “an individual’s right to prevent disclosure of the location of one’s home, workplace, daily activities or trips”
  36. As people have thought about this issue of geoprivacy, there are 4 principles that have been laid out to guide people: [READ SLIDE] I’ll talk about each of them
  37. The first principle is the basic protection of confidentiality. This protection is fundamental to ethical research. It covers information that might lead to physical, emotional, financial or other harm. It’s important to protect information that discloses identity.
  38. The second key principle that informs the discussion on confidentiality is the importance of preserving the social-spatial linkage. As we’ve mentioned, all human activity takes place on earth. Understanding that adds context and perspective. It’s also a key to the advancement of science.
  39. The third principle is the notion of data sharing. Data sharing means sharing data with other researchers or other important stakeholders. It’s essential on both scientific and financial grounds. It allows the data to have maximal use by letting other researchers use the data. Lastly, there’s a growing trend among funders of data collection efforts that the data be shared either publicly or within the research community.
  40. The last principle is the notion that data should be preserved and be available for future use. This raises the question, how long should the data be deemed “sensitive”? When if ever, can it be released? These are things that should be considered at the beginning of any data collection effort or establishment of a data system. It should be spelled out in advance to respondents or individuals who are providing information/data.
  41. What are the strategies that can be employed to protect data?
  42. The first strategy we’ll talk about is simply shifting the locations randomly. The advantage of this is that it is relatively easy to do. There are plugins for QGIS that will do this. The disadvantage is that you lose the original location and it introduces error.
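
A random shift can be sketched in a few lines of Python. The example below moves a point by a random distance, up to a chosen maximum, in a random direction. It uses a flat-earth approximation of about 111 km per degree, which is reasonable only for small shifts; the function name, the 500 m limit and the fixed seed are assumptions made for illustration, and this is not the algorithm of any specific QGIS plugin.

```python
import math
import random

METERS_PER_DEGREE = 111_000  # rough figure, adequate for small shifts

def perturb(lat, lon, max_shift_m, rng=random):
    """Shift a point by a random distance (up to max_shift_m) in a random direction."""
    # sqrt spreads the shifted points evenly over the disc around the original
    distance = max_shift_m * math.sqrt(rng.random())
    bearing = rng.uniform(0, 2 * math.pi)
    dlat = distance * math.cos(bearing) / METERS_PER_DEGREE
    dlon = distance * math.sin(bearing) / (
        METERS_PER_DEGREE * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

rng = random.Random(42)  # fixed seed only so the example is repeatable
masked = perturb(28.67171, 77.21211, max_shift_m=500, rng=rng)
print(masked)
```

The larger the maximum shift, the better the protection and the larger the error introduced into any later distance or proximity analysis.
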
  43. The second strategy is what’s known as an affine transformation. This is a systematic change to the data: changing the scale, rotating, or shifting a set distance. This is easy to do, but it’s also easy to undo if people know the parameters of the transformation. In some cases, even if the exact parameters aren’t known, it’s still possible to deduce the types of transformation done if a set of points doesn’t match the geography on the ground (say points end up in the ocean or a lake because of the transformation).
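
The "easy to undo" point can be shown directly. The sketch below applies a scale, rotation and shift to a set of points and then reverses it using the same parameters. The coordinates are hypothetical projected values in metres, and the parameter values are arbitrary.

```python
import math

def affine(points, scale, angle_deg, dx, dy):
    """Scale, rotate about the origin, then shift each (x, y) point."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [(scale * (c * x - s * y) + dx, scale * (s * x + c * y) + dy)
            for x, y in points]

def undo_affine(points, scale, angle_deg, dx, dy):
    """Reverse `affine` when its parameters are known."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    restored = []
    for x, y in points:
        u, v = (x - dx) / scale, (y - dy) / scale  # remove shift and scale
        restored.append((c * u + s * v, -s * u + c * v))  # rotate back
    return restored

# Hypothetical projected coordinates in metres.
original = [(1000.0, 2000.0), (1500.0, 2500.0), (1200.0, 1800.0)]
masked = affine(original, scale=1.5, angle_deg=30, dx=250, dy=-400)
recovered = undo_affine(masked, scale=1.5, angle_deg=30, dx=250, dy=-400)
print(recovered)
```

Because the recovered points match the originals up to floating-point rounding, the protection depends entirely on keeping the parameters secret.
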
  44. Another strategy is to just aggregate the data. Say, for instance, you have individual patient data; you can aggregate it to a higher geographic unit to mask individual records. This too is easy to do, but it does require a sufficient number of data points, and finer data variations will be lost.
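
Aggregation with a check for sufficient data points might look like the sketch below. The patient records are hypothetical, and the minimum count of 5 is an arbitrary illustrative threshold, not a recommendation from the presentation; the appropriate value depends on the commitments made to respondents.

```python
from collections import Counter

MIN_COUNT = 5  # hypothetical suppression threshold; set according to your own policy

# Hypothetical patient records, each already assigned to a district.
patient_districts = ["North"] * 12 + ["West"] * 7 + ["South"] * 2

def aggregate(units, min_count=MIN_COUNT):
    """Count points per unit; report None for units with too few points."""
    counts = Counter(units)
    return {unit: (n if n >= min_count else None) for unit, n in counts.items()}

print(aggregate(patient_districts))
```

Units that fall below the threshold are reported without a count, since a very small count can still point to individuals.
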
  45. Another strategy is to despatialize the data. Simply remove the coordinate system that ties the data to the earth. It uses euclidean space instead of geographic space. This is simple, it keeps relative position and placement. On the downside though, you lose contextual data, so it won’t be possible to bring other data that might be helpful to look at (such as road networks or the surrounding landscape).
  46. Lastly there’s always, “do nothing”. You could make the decision to not collect or release the data. Another option would be to set up a cold room or on-site analysis only. This maintains all of the original spatial data. The disadvantage is that schemes like cold-room or on-site analysis can reduce accessibility to data which can limit social-spatial link and can be complicated to implement.
  47. There is no magic answer. It’s a matter of finding the technique that best suits your needs and the commitments made to respondents. It’s possible to think about the issue in terms of spatial integrity and disclosure risks. Making the decision on what approach to take is dependent on where on this spectrum you want to land. You can preserve spatial integrity or you can minimize risk of confidentiality breaches, but you can’t have both.
  48. The article by VanWey that I mentioned earlier has a quote that ignoring the issue is unacceptable. Something that often gets lost in the excitement over GIS is the set of issues around confidentiality. However, those who collect data must think about the confidentiality issues and make sure their informed consent agreements adequately describe the way data will and won’t be used. Data users also have a responsibility to ensure that any extra contextual analysis they do doesn’t increase the risk of deductive disclosure.
  49. Key points are [READ SLIDE] Any questions?