
Introducing the SQUAD Tool: A tool for identifying anomalies in large spatial data sets

This webinar introduced the Spatial Quality and Anomalies Diagnosis (SQUAD) Tool, now compatible with both ArcGIS and QGIS.

Published in: Health & Medicine

  1. SQUAD Tool: Identifying Anomalies to Improve the Quality of Spatial Data Sets. John Spencer, Becky Wilkes, Veronica Escamilla. MEASURE Evaluation, May 31, 2018
  2. The world is a complex place.
  3. Geographic data helps us understand our world a little better.
  4. Growth in spatially referenced datasets
  5. Where are things located, and what do we know about them?
  6. Healthcare
  7. It’s a good idea to know where health facilities are.
  8. Knowing their locations gives a better understanding of demand and need, and it helps prospective patients know what services are available.
  9. Many countries are developing Master Facility Lists.
  10. Facility lists can have thousands of sites, so data quality is important.
  11. [Map labels: "Real location"; "Coordinate places it here"] With poor data quality, facilities can appear to be in places where they aren’t, such as in lakes or oceans…
  12. …or on the other side of the world. [Map labels: "Really it is here"; "Likely not in Denmark"]
  13. When data is wrong, it can lead to confusion that interferes with the adequate provision of services.
  14. What do we mean by data quality?
  15. Spatial: • Is there a coordinate? • Is it in an appropriate place (not in a lake, not outside the country)? • Are coordinates duplicated?
  16. Attribute: • Are there duplicate names? • Missing values? • Out-of-range values? Spatial: • Is there a coordinate? • Is it in an appropriate place (not in a lake, not outside the country)? • Are coordinates duplicated?
  17. Spatial and attribute: there should also be congruity between the two. For example, a site’s coordinate should fall within the administrative unit recorded in its attributes.
  18. Data quality reviews can be a slow and meticulous process.
  19. Even with a team of people, it requires a lot of work.
  20. Can we automate the process to speed it up without compromising its effectiveness as a way to assess quality?
  21. Are there patterns in the data that indicate a data quality problem?
  22. Six anomalies that may indicate a data quality issue: 1. Missing coordinate; 2. Coordinates truncated; 3. Duplicate coordinates; 4. Duplicate facility names; 5. Site is slightly outside its expected administrative unit; 6. Site is far outside its expected administrative unit
  23. Find anomalous data. Investigate those records first.
  24. Spatial Quality and Anomalies Diagnosis (SQUAD) Tool
  25. Anomaly 1: Missing coordinate. Problem: No coordinate for the site, or a coordinate of 0,0. Possible solutions: • Review the GPS log or other records • Recapture the location on the next site visit • Use imagery from ArcGIS, Google Earth, or another source to locate the site and get the coordinate
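A check for anomaly 1 can be sketched in a few lines of Python. This is a minimal illustration, not the SQUAD Tool's own code: the record layout (`id`, `x`, `y` keys) and the function name are hypothetical, and the sketch treats a blank field, a NaN from a spreadsheet import, or the 0,0 placeholder as a missing coordinate.

```python
import math
from typing import Optional


def is_missing_coordinate(x: Optional[float], y: Optional[float]) -> bool:
    """Return True when a site has no usable coordinate (anomaly 1)."""
    # An empty field is the most obvious case of a missing coordinate
    if x is None or y is None:
        return True
    # NaN often appears when a blank spreadsheet cell is read as a number
    if math.isnan(x) or math.isnan(y):
        return True
    # 0,0 is a common placeholder for "no coordinate captured"
    return x == 0 and y == 0


sites = [
    {"id": "A1", "x": 35.43, "y": -6.72},
    {"id": "A2", "x": None, "y": -6.80},
    {"id": "A3", "x": 0.0, "y": 0.0},
]
flags = {s["id"]: int(is_missing_coordinate(s["x"], s["y"])) for s in sites}
print(flags)  # {'A1': 0, 'A2': 1, 'A3': 1}
```

The 1/0 values mirror the "1 = anomaly present" convention the tool uses for its output fields.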
  26. Anomaly 2: Coordinates truncated. Problem: The coordinate is missing significant digits (example: -6.72, 35.43). Possible solutions: • Review the GPS log or other records • Recapture the location on the next site visit • Use imagery from ArcGIS, Google Earth, or another source to locate the site and get the coordinate

      | Coordinate | Approximate precision |
      |---|---|
      | 23.1 | 10 kilometers |
      | 23.12 | 1 kilometer |
      | 23.123 | 100 meters |
      | 23.1234 | 10 meters |
      | 23.12345 | 1 meter |
      | 23.123456 | 10 centimeters |
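The precision figures for anomaly 2 follow from one degree of latitude being roughly 111 km, so each extra decimal place narrows the location about tenfold. The sketch below counts decimal places in the coordinate as stored text, because converting to a float first drops trailing zeros (23.10 becomes 23.1). The 3-decimal cutoff is a hypothetical choice for illustration; the slides do not state the threshold SQUAD uses.

```python
from decimal import Decimal

# Hypothetical threshold: 3 decimal places (~100 m in the slide's precision table).
# The SQUAD Tool's actual cutoff is not stated in these slides.
MIN_DECIMALS = 3
KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def decimal_places(value: str) -> int:
    """Count digits after the decimal point, reading the value as stored text."""
    exponent = Decimal(value.strip()).as_tuple().exponent
    return max(0, -exponent)


def approx_precision_m(places: int) -> float:
    """Rough ground size of one unit in the last decimal place, in meters."""
    return KM_PER_DEGREE * 1000 / 10 ** places


def is_truncated(x: str, y: str, min_decimals: int = MIN_DECIMALS) -> bool:
    """Flag anomaly 2 when either coordinate has fewer decimals than required."""
    return min(decimal_places(x), decimal_places(y)) < min_decimals


print(decimal_places("35.43"), round(approx_precision_m(2)))  # 2 1113
print(is_truncated("35.43", "-6.72"))           # True
print(is_truncated("35.434512", "-6.723390"))   # False
```

The computed value (about 1.1 km for two decimals) matches the order of magnitude in the slide's table, which rounds to powers of ten.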
  27. Anomaly 3: Duplicate coordinates. Problem: Multiple records have identical coordinates. Possible solutions: • Determine whether there are, in fact, two distinct sites at that location. If there aren’t, then review the GPS log or other records, recapture the location on the next site visit, or use imagery from ArcGIS, Google Earth, or another source to locate the site and get the coordinate.
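Anomaly 3 amounts to grouping records by their X/Y pair and flagging every group with more than one member. This is a sketch under assumptions, not SQUAD's implementation: the record keys are hypothetical, matching is exact (as "identical coordinates" suggests), and records with no coordinate are skipped because they already fall under anomaly 1.

```python
from collections import defaultdict


def duplicate_coordinate_ids(sites):
    """Return the IDs of sites that share an identical X/Y pair (anomaly 3)."""
    groups = defaultdict(list)
    for site in sites:
        # Missing coordinates are anomaly 1, so they are not grouped here
        if site["x"] is None or site["y"] is None:
            continue
        groups[(site["x"], site["y"])].append(site["id"])
    return {site_id for ids in groups.values() if len(ids) > 1 for site_id in ids}


sites = [
    {"id": "A1", "x": 35.43, "y": -6.72},
    {"id": "A2", "x": 35.43, "y": -6.72},
    {"id": "A3", "x": 36.01, "y": -7.05},
    {"id": "A4", "x": None, "y": None},
]
print(sorted(duplicate_coordinate_ids(sites)))  # ['A1', 'A2']
```

Rounding coordinates before grouping would also catch near-duplicates, but that is a design choice beyond what the slide describes.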
  28. Anomaly 4: Duplicate facility names. Problem: Multiple records have identical names. Possible solutions: • Determine whether there are, in fact, two distinct sites with the same name: contact the site directly, or review documents or reports to determine the name. [Figure: two records both named "Mercy Clinic"]
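Duplicate names can be found the same way as duplicate coordinates, by grouping on the name. The sketch below also ignores case and extra whitespace, which is an assumption on our part (the slide only says "identical names"); the function names and record keys are hypothetical.

```python
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Collapse whitespace and ignore case so trivial variants still match."""
    return " ".join(name.split()).casefold()


def duplicate_name_ids(sites):
    """Return the IDs of sites whose normalized names occur more than once (anomaly 4)."""
    groups = defaultdict(list)
    for site in sites:
        groups[normalize_name(site["name"])].append(site["id"])
    return {site_id for ids in groups.values() if len(ids) > 1 for site_id in ids}


sites = [
    {"id": "A1", "name": "Mercy Clinic"},
    {"id": "A2", "name": "mercy  clinic "},
    {"id": "A3", "name": "St. Jude Dispensary"},
]
print(sorted(duplicate_name_ids(sites)))  # ['A1', 'A2']
```

As the slide notes, a flagged pair may still be two genuinely distinct facilities, so the result is a list to check, not a list to merge.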
  29. Anomaly 5: Site is outside its expected location but within 2 kilometers. Problem: The site is slightly outside its expected administrative unit. Possible solutions: • Try a different administrative unit boundary file • Review records to validate the administrative unit and GPS coordinate • Locate the site using imagery • Revisit the site [Figure: map of North District and South District with a 1.8 km distance marker; record with Facility Name "Mercy", District "North"]
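Anomalies 5 and 6 both depend on the distance between a site and the boundary of the administrative unit it is listed under, split at the 2 km threshold from the slides. The sketch below is one way to compute that, not the SQUAD Tool's method: it uses a ray-casting point-in-polygon test, then an equirectangular (flat-earth) approximation of distance to the nearest boundary edge, which is adequate only over a few kilometers. It assumes a single polygon ring without holes that does not cross the antimeridian; the district and all function names are hypothetical. A GIS library working in a projected coordinate system would be the more robust choice.

```python
import math

KM_PER_DEGREE = 111.32  # approximate km per degree of latitude
THRESHOLD_KM = 2.0      # the 2 km cutoff separating anomaly 5 from anomaly 6


def point_in_polygon(pt, ring):
    """Ray-casting test; pt is (lon, lat) and ring is a list of (lon, lat) vertices."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % len(ring)]
        # Count boundary edges crossed by a ray running east from pt
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def to_local_km(pt, origin):
    """Equirectangular projection to km around origin; adequate over a few km."""
    lon, lat = pt
    lon0, lat0 = origin
    return ((lon - lon0) * KM_PER_DEGREE * math.cos(math.radians(lat0)),
            (lat - lat0) * KM_PER_DEGREE)


def distance_to_segment(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_unit_km(site, ring):
    """0 if the site lies inside its expected unit, else km to the nearest boundary."""
    if point_in_polygon(site, ring):
        return 0.0
    local = [to_local_km(v, site) for v in ring]  # the site becomes the origin (0, 0)
    return min(distance_to_segment((0.0, 0.0), local[i], local[(i + 1) % len(local)])
               for i in range(len(local)))


def classify_location(site, ring):
    """Label a site as inside its unit, anomaly 5 (<= 2 km out), or anomaly 6."""
    d = distance_to_unit_km(site, ring)
    if d == 0.0:
        return "inside"
    return "anomaly 5" if d <= THRESHOLD_KM else "anomaly 6"


# Hypothetical square district: lon 35.0 to 35.5, lat -7.0 to -6.5
district = [(35.0, -7.0), (35.5, -7.0), (35.5, -6.5), (35.0, -6.5)]
print(classify_location((35.2, -6.8), district))    # inside
print(classify_location((35.515, -6.8), district))  # anomaly 5 (about 1.7 km east)
print(classify_location((35.6, -6.8), district))    # anomaly 6 (about 11 km east)
```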
  30. Anomaly 6: Site is not at all near its expected location. Problem: The site is more than 2 kilometers from its expected location. Possible solutions: • Look for obvious issues in the coordinate (e.g., X/Y transposed, typos) • Review the GPS log, if available • Locate the site using imagery • Revisit the site [Figure: Nairobi General Hospital]
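The first suggestion for anomaly 6, looking for transposed X/Y values or typos, can be partly automated: try a few simple corrections and keep the ones that land inside the country. The sketch below uses a bounding box as a coarse stand-in for the country boundary; the function name, the approximate Kenya box, and the Nairobi-area coordinate are illustrative assumptions, not values from the tool.

```python
def candidate_fixes(x, y, bbox):
    """Try simple corrections for a far-off site (anomaly 6).

    bbox is (min_x, min_y, max_x, max_y) in decimal degrees. Returns each
    correction whose result lands inside the box.
    """
    min_x, min_y, max_x, max_y = bbox
    candidates = {
        "x/y transposed": (y, x),
        "latitude sign flipped": (x, -y),
        "longitude sign flipped": (-x, y),
    }
    return {label: (cx, cy) for label, (cx, cy) in candidates.items()
            if min_x <= cx <= max_x and min_y <= cy <= max_y}


# Illustrative box roughly covering Kenya (approximate values, for demonstration only)
KENYA_BBOX = (33.9, -4.7, 41.9, 5.0)

# A Nairobi-area site entered with X and Y swapped
print(candidate_fixes(-1.29, 36.82, KENYA_BBOX))  # {'x/y transposed': (36.82, -1.29)}
```

A suggested fix is only a lead: the corrected point should still be confirmed against imagery or records before the data set is edited.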
  31. Using the Tool
  32. Prerequisites to use SQUAD. Site location file: • Unique identifier for each site • X/Y coordinate • Name of site • Field with administrative unit* Administrative units file: • Name field for administrative unit* (*Both files should rely on standard names for the administrative unit, using consistent naming conventions and diacritical marks.)
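Because the site file's administrative unit field has to agree with the name field in the boundary file, a spelling or diacritic mismatch would presumably surface as a false location anomaly. A quick pre-check, sketched below under that assumption, lists site unit names with no exact match in the boundary file and suggests a loose match after stripping accents, case, and extra spaces. The function names and place names are illustrative only.

```python
import unicodedata


def fold(name: str) -> str:
    """Remove diacritical marks, case, and extra spaces for a loose comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.split()).casefold()


def unmatched_admin_names(site_names, boundary_names):
    """Map each site unit name with no exact boundary match to a loose match (or None)."""
    exact = set(boundary_names)
    by_fold = {fold(b): b for b in boundary_names}
    return {name: by_fold.get(fold(name))
            for name in sorted(set(site_names) - exact)}


boundary_names = ["Ségou", "Mopti"]
site_names = ["Ségou", "Segou", "mopti ", "Kayes"]
print(unmatched_admin_names(site_names, boundary_names))
# {'Kayes': None, 'Segou': 'Ségou', 'mopti ': 'Mopti'}
```

Names mapped to `None` have no counterpart at all and need manual review; the others can be standardized to the boundary file's spelling before running the tool.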
  33. Load the relevant files and the tool into ArcGIS or QGIS. Load the relevant files: • Load the individual feature class files for the sites and the boundaries. Load the SQUAD Tool: • Add the tool to ArcToolbox, or use the Plugin Manager in QGIS.
  34. Open the tool, provide parameters, and run the tool. The tool will ask you to indicate the relevant files and fields.
  35. Run the tool and review the results. A field is added for each anomaly type: • 1 = anomaly present
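Once each record carries one flag per anomaly type, the advice to "investigate those records first" can be turned into a simple review queue. This sketch assumes the flagged records have been exported as Python dictionaries; the flag field names are hypothetical, since the slides do not list the names the tool actually adds.

```python
# Hypothetical flag field names; the slides do not list the tool's actual field names.
FLAG_FIELDS = ["ANOM1", "ANOM2", "ANOM3", "ANOM4", "ANOM5", "ANOM6"]


def review_queue(records):
    """Keep records with at least one flag, ordered by how many anomalies they have."""
    queue = []
    for rec in records:
        total = sum(rec.get(field, 0) for field in FLAG_FIELDS)
        if total > 0:
            queue.append({**rec, "N_ANOMALIES": total})
    # sorted() is stable, so records with equal counts keep their input order
    return sorted(queue, key=lambda r: r["N_ANOMALIES"], reverse=True)


records = [
    {"id": "A1", "ANOM2": 1, "ANOM3": 1},
    {"id": "A2"},
    {"id": "A3", "ANOM6": 1},
]
for rec in review_queue(records):
    print(rec["id"], rec["N_ANOMALIES"])
# A1 2
# A3 1
```

Ordering by flag count is just one reasonable priority; a team might instead put anomaly 6 first, since a site kilometers from its expected unit is the most likely to be a real error.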
  36. The presence of an anomaly does not automatically indicate an error. Anomalous records do require investigation, though.
  37. The SQUAD Tool is suitable for initial data quality checks in a large spatial data set and for routine quality checks.
  38. Available on MEASURE Evaluation’s website: www.measureevaluation.org/gis
  39. This presentation was produced with the support of the United States Agency for International Development (USAID) under the terms of MEASURE Evaluation cooperative agreement AID-OAA-L-14-00004. MEASURE Evaluation is implemented by the Carolina Population Center, University of North Carolina at Chapel Hill in partnership with ICF International; John Snow, Inc.; Management Sciences for Health; Palladium; and Tulane University. Views expressed are not necessarily those of USAID or the United States government. www.measureevaluation.org
