
A Method for Determining and Improving the Horizontal Accuracy of Geospatial Features



Juan Tobar, Shakir Ahmed, Linda McCafferty, and Carlos Piccirillo
South Florida Water Management District, West Palm Beach, FL, USA

Abstract
Many data sets stewarded by geospatial professionals are spatially correlated derivatives of higher accuracy data sets such as parcels and road networks. This article documents the use of the Buffer-Overlay method of Goodchild and Hunter (1997) to determine and improve the horizontal accuracy of geospatial features. The method relies on a comparison with a representation of higher accuracy and estimates the percentage of the total length of the higher accuracy representation that is within a specified distance of the lower accuracy representation. The method is then extended using topological operators to extract lower accuracy representations and replace them with those of higher accuracy.

Introduction
The South Florida Water Management District (SFWMD) regulates water supply, water quality, groundwater withdrawals, and surface water runoff through the issuance of permits for these activities on specific land parcels. The District's Regulatory GIS consists of approximately 85,000 permits spread over a 16-county jurisdictional area from Orlando to the Keys. The permits are maintained in an SDE database in 18 feature classes based on permit type. About half of these permits (Environmental Resource Permits) never expire, and the other half (Water Use Permits) are valid for 20 years before they need to be renewed. These feature classes are used by engineers, environmental scientists, hydrologists, and compliance staff to make informed decisions during the application review process and post-permit compliance. For these reasons it is important that even the oldest permits are depicted as accurately as possible in the GIS.

The Data - Permits
From 1980 to 1987 (15 years), permits were drawn directly on USGS 1:24,000 topographic quadrangle maps and mylar overlays. From 1987 to 1995 (8 years), the maps had been migrated to CAD, and permits were heads-up digitized using SPOT 10-meter panchromatic and 20-meter multispectral scanner imagery. From 1995 to 1999 (4 years), 1-meter Digital Orthophoto Quarter Quads were used, and by 1999 some permits were being digitized using county parcel data. Today all permits are digitized to parcels, but we have 23 years of data digitized on far less than optimal base maps.
The Data - Parcels
The District uses a contiguous parcel base composed of features from the 16 counties within the District's jurisdiction. The State of Florida's Cadastral Mapping Guidelines recommend that horizontal accuracy meet or exceed the U.S. National Map Accuracy Standards (NMAS). These standards state that at "scales larger than 1:20,000, not more than 10 percent of the points tested shall be in error by more than 1/30 inch, measured on the publication scale." Common scales for cadastral maps range from 1:500 to 1:10,000; assuming that these maps follow NMAS, horizontal positional accuracy at the 90% confidence level will range from ±1.39 to ±27.78 feet (Table 1).

Map Scale | NMAS CMAS (90% confidence level) | NSSDA RMSE(r) | NSSDA Accuracy(r) (95% confidence level)
1:1,200 (1" = 100') | 3.33 ft | 2.20 ft | 3.80 ft
1:2,400 (1" = 200') | 6.67 ft | 4.39 ft | 7.60 ft
1:4,800 (1" = 400') | 13.33 ft | 8.79 ft | 15.21 ft
1:6,000 (1" = 500') | 16.67 ft | 10.98 ft | 19.01 ft
1:12,000 (1" = 1000') | 33.33 ft | 21.97 ft | 38.02 ft
Table 1: Comparison of NMAS, NSSDA Horizontal Accuracy for Parcels

These two data sets are spatially correlated, because permits are based on the same legal boundaries used for parcels; we can therefore use parcels as a control to test the accuracy of our permits. In general, the horizontal accuracy of the parcels can be considered to be an order of magnitude better than that of the permits.

Literature Review
Positional accuracy, or spatial accuracy, refers to the accuracy of a test feature when compared to a control feature. Methods for determining the positional accuracy of points are well established and are usually based on the Euclidean distance between the test point and a control point. The error can be reported as errors in x, y, and z, and descriptive statistics can be generated from these numbers.

Determining the positional accuracy of a line is more complex, since a line is composed of multiple points, each of which may or may not have a matching control point.
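The Table 1 values follow from two fixed relationships: the NMAS tolerance of 1/30 inch at publication scale, and the FGDC NSSDA factors CMAS = 1.5175 × RMSE(r) and Accuracy(r) = 1.7308 × RMSE(r), which assume circular error (RMSEx = RMSEy). The short Python sketch below reproduces the table under that assumption; the function names are illustrative only and are not part of any library.

```python
# NMAS tolerance for maps at scales larger than 1:20,000: 1/30 inch at publication scale.
NMAS_TOLERANCE_INCHES = 1.0 / 30.0
# FGDC NSSDA conversion factors, valid only when RMSE_x equals RMSE_y (circular error).
CMAS_PER_RMSE_R = 1.5175        # CMAS (90%) = 1.5175 * RMSE_r
ACCURACY_R_PER_RMSE_R = 1.7308  # Accuracy_r (95%) = 1.7308 * RMSE_r


def nmas_cmas_feet(scale_denominator: float) -> float:
    """Ground distance in feet of the 1/30-inch NMAS tolerance at a 1:N map scale."""
    ground_inches = NMAS_TOLERANCE_INCHES * scale_denominator
    return ground_inches / 12.0


def nssda_equivalents(scale_denominator: float) -> tuple[float, float, float]:
    """Return (CMAS at 90%, RMSE_r, NSSDA Accuracy_r at 95%), all in feet."""
    cmas = nmas_cmas_feet(scale_denominator)
    rmse_r = cmas / CMAS_PER_RMSE_R
    accuracy_r = ACCURACY_R_PER_RMSE_R * rmse_r
    return cmas, rmse_r, accuracy_r


if __name__ == "__main__":
    for n in (1200, 2400, 4800, 6000, 12000):
        cmas, rmse_r, acc = nssda_equivalents(n)
        print(f"1:{n:,}  CMAS {cmas:.2f} ft  RMSE_r {rmse_r:.2f} ft  Accuracy_r {acc:.2f} ft")
```

Run as a script, it prints the five Table 1 rows to two decimal places; `nmas_cmas_feet(500)` is about 1.389 ft and `nmas_cmas_feet(10000)` about 27.78 ft.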
Additional problems include determining an appropriate search radius and identifying equivalent features to be used for comparison. Atkinson-Gordo and Ariza-Lopez (2002) provide an excellent review of methods for measuring the positional accuracy of linear features.

Methods for measuring the positional accuracy of polygons come from the extension of methods used to measure the positional accuracy of lines. The five primary methods from Atkinson-Gordo and Ariza-Lopez, in brief, are as follows:
Epsilon Band Error methods are based on defining an uncertainty band around a polygon feature. The band width is known as Epsilon, and the wider it is, the greater the uncertainty in the position of a line. The band can be derived by error propagation or by comparing test line segments to a control. The method determines an error band rather than determining or quantifying the accuracy of the line.

Figure 1: Epsilon Bands

The Buffer-Overlay method of Goodchild and Hunter (1997) is based on defining a buffer around a control line of higher accuracy and computing the percentage of the length of the less accurate line that falls within the buffer zone. The width of the buffer is then increased and the percentage computed again. The process is repeated several times, producing a probability distribution.

Figure 2: Buffer-Overlay

The Buffer Overlay Statistics method of Tveite and Langaas (1999) involves buffering, overlay, and generating statistics. First, both the test line (X) and the control line (Q) are buffered to produce buffers XB and QB. An overlay operation is then performed, resulting in four types of areas (Figure 3):
Type 1: Area outside XB and outside QB
Type 2: Area outside XB and inside QB
Type 3: Area inside XB and outside QB
Type 4: Area inside XB and inside QB
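To make the four area types concrete, here is a rough Python sketch that approximates them on a grid instead of performing an exact polygon buffer and overlay: each square cell is classified by whether its center lies within `width` of the test line X and of the control line Q. The function names are hypothetical, the Type 1 area depends on the arbitrary grid extent, and `type4_share` (Type 4 as a fraction of QB) is just one simple summary assumed here, not necessarily the statistic Tveite and Langaas recommend.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Distance from p to the nearest segment of an open polyline."""
    return min(point_segment_distance(p, line[i], line[i + 1])
               for i in range(len(line) - 1))


def bos_areas(test_line, control_line, width, cell=1.0):
    """Approximate the four area types by classifying grid-cell centers."""
    points = list(test_line) + list(control_line)
    x0 = min(x for x, _ in points) - width
    y0 = min(y for _, y in points) - width
    nx = math.ceil((max(x for x, _ in points) + width - x0) / cell)
    ny = math.ceil((max(y for _, y in points) + width - y0) / cell)
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for i in range(nx):
        for j in range(ny):
            center = (x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell)
            in_xb = distance_to_polyline(center, test_line) <= width
            in_qb = distance_to_polyline(center, control_line) <= width
            if in_xb and in_qb:
                counts[4] += 1  # inside XB and inside QB
            elif in_xb:
                counts[3] += 1  # inside XB, outside QB
            elif in_qb:
                counts[2] += 1  # outside XB, inside QB
            else:
                counts[1] += 1  # outside both (depends on grid extent)
    return {k: n * cell * cell for k, n in counts.items()}


def type4_share(areas):
    """Fraction of the control buffer QB that also lies inside XB."""
    qb = areas[2] + areas[4]
    return areas[4] / qb if qb else 0.0
```

Identical lines give a share of 1.0 (no Type 2 or Type 3 area); parallel lines farther apart than twice the buffer width give 0.0.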
A number of different statistics can be generated from the above metrics, but for our purposes the most interesting is Type 4, which will dominate if the test and control polygons are very similar. When the lines are similar in form but differ in position (displacement is present), an estimate of the positional accuracy can be made when Type 4 approaches 50%.

Figure 3: Buffer Overlay Statistics

The Hausdorff Distance method of Abbas, Grussenmeyer and Hunter (1995) is based on calculating the Hausdorff distance between a pair of equivalent lines that have been generalized and normalized using the RMSE and a generalization factor. Two values are computed to evaluate a line: the percentage of agreement (the ratio between the normalized lines and the original lines) and the RMSE for planimetric features (computed from all the normalized lines).

Figure 4: Hausdorff Distance

The Maximum Proportion Standard (MPS) and Maximum Distortion Standard (MDS) method of Veregin (2000) is based on the computation of the uniform distortion (UDD). The UDD is computed from the areas between two lines and the length of the line in the map. A diagram of cumulative frequencies is then built for a given band width at a given level of confidence.
Figure 5: MPS and MDS

The advantages of the Buffer-Overlay method over the other methods discussed are that: (1) it can perform effectively without the need to extract both the test and the control polygon, (2) it does not require matching of points between the two representations, (3) it is relatively insensitive to outlying values, and (4) it is statistically based. Additionally, the algorithm uses common buffering and clipping functions available in all major GIS packages.

The Test Area
To thoroughly test the limits of our procedures for determining and improving horizontal accuracy, we chose to run our test on a subset of the data. Specifically, we extracted the Environmental Resource Permits for Township 44S Range 25E in Lee County, Florida. Lee County was selected because it was an area known to have permits that were highly displaced from their parcel counterparts. All permits that intersected this township and range were extracted into a File Geodatabase consisting of 259 features.

Figure 6: Test Area
Methods
A straightforward method for determining the horizontal accuracy of a polygon feature class is to measure the offset between polygon vertices and parcel vertices and then calculate the Root Mean Square Error (RMSE). To facilitate this activity, a C# program was written that allows staff to create a database of coordinate sample points. The RMSE provides the accuracy of the entire feature class but does not tell us the accuracy of individual permits; hence the need for Buffer-Overlay.

Buffer-Overlay is usually implemented by buffering a control line and quantifying how much of the test line is found within each buffer. This works well with small control data sets such as a shoreline but is not practical when using parcels: it would require buffering each parcel line segment and then checking for an overlapping permit line segment that, in the majority of cases, does not exist. This implementation therefore buffers the permit lines (test) and quantifies how much of the parcel line (control) is found within each buffer. The output is a cumulative probability (CP) curve for each individual permit, where the CP at a given buffer distance is the length of clipped parcel line divided by the perimeter of the permit.

The pseudocode for calculating the initial horizontal accuracy is as follows:

Convert parcel polygons to parcel lines
For each permit
  o Buffer from 0.5 ft to 60 ft at 0.5 ft intervals
    - Clip the parcel lines (control) using the buffer distance
    - Drop dangling nodes (where length = buffer)
    - Calculate the CP
    - If CP ≥ 1, the horizontal accuracy is the buffer distance
    - Else if CP ≤ 0.999 and buffer < 60, go to the next buffer

Clipping produces short and long line-segment dangles as artifacts whose lengths are directly related to the buffer distance used to clip. Short dangles are easily removed by eliminating segments whose length equals the buffer distance.
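A minimal Python sketch of this accuracy-assignment loop follows, written without a GIS. Instead of buffering and clipping, it splits the parcel lines into short pieces and counts a piece as clipped when its midpoint lies within the buffer distance of the permit boundary; CP is the clipped length divided by the permit perimeter. It is an approximation under stated assumptions: it does not drop dangles (so neighboring parcel lines reaching into the buffer would inflate CP), the names `horizontal_accuracy`, `densify` and `step` are hypothetical, and returning `None` stands in for features beyond the 60 ft limit.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def ring_segments(ring):
    """Close a list of vertices into boundary segments."""
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


def densify(segments, step):
    """Yield (midpoint, length) for pieces of at most `step` along each segment."""
    for (ax, ay), (bx, by) in segments:
        length = math.hypot(bx - ax, by - ay)
        n = max(1, math.ceil(length / step))
        for k in range(n):
            f = (k + 0.5) / n
            yield (ax + f * (bx - ax), ay + f * (by - ay)), length / n


def horizontal_accuracy(permit_ring, parcel_segments,
                        max_buffer=60.0, interval=0.5, step=0.1, eps=1e-9):
    """Smallest buffer distance at which CP >= 1, or None if beyond max_buffer."""
    boundary = ring_segments(permit_ring)
    perimeter = sum(math.dist(a, b) for a, b in boundary)
    # Distance from each parcel-line piece to the permit boundary, computed once.
    pieces = [(min(point_segment_distance(mid, a, b) for a, b in boundary), length)
              for mid, length in densify(parcel_segments, step)]
    for i in range(1, int(round(max_buffer / interval)) + 1):
        d = i * interval
        clipped = sum(length for dist, length in pieces if dist <= d + eps)
        if clipped / perimeter >= 1.0 - eps:  # CP >= 1
            return d
    return None
```

For example, a 100 ft square permit shifted 3 ft in x from an identical parcel yields 3.0, while an unshifted permit yields 0.5, the smallest buffer tested.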
In the case of long dangles, the CP reaches 1 before a complete ring can be extracted, which results in a failed polygon build.

This algorithm was run on all polygons in the test area, resulting in 259 curves composed of the individual probability at each buffer distance for each feature. Figure 7 displays the CP curves for a random sample of 21 permits. On this graph the x-axis represents the buffer distance, from 0.5 to 60 feet at 0.5 ft intervals, and the y-axis represents the CP. When a curve reaches 1 or more, the length of clipped parcel line is greater than or equal to the perimeter of the permit line, and the buffer distance used is assigned as the horizontal accuracy of the permit. Curves that never reach 1 are outside our maximum buffer distance of 60 feet.

Figure 7: Cumulative Probability for Individual Permits (y-axis: Cumulative Probability; x-axis: Buffer Distance (ft))

Phase I Correction
Phase I involved converting the extracted parcel line segments into polygons using geospatial tools. This functionality is built into many GIS packages and is best associated with the creation of parcel polygons from metes and bounds entered using Coordinate Geometry. The pseudocode for this is as follows:

For each permit
  o Buffer at the accuracy level previously determined
  o Clip the parcel lines (control) using the test buffer
  o Drop dangling nodes (where length = buffer)
  o Build parcel lines as polygons
    - Compare the area of the polygon to the original permit
    - Only accept if polygon area = permit area (± 0.03 × permit area)
    - Build succeeds/fails

In some cases, long line-segment artifacts are extracted that form closed rings and result in polygons that are significantly larger or smaller in area than the original permit; these can be excluded through the area comparison.
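The area comparison that screens out bad builds can be sketched in a few lines of Python, assuming polygons are given as lists of (x, y) vertices in a projected coordinate system in feet and that "± 0.03 × permit area" means a ±3% tolerance. The shoelace formula is a standard choice for planar ring area; `ring_area` and `accept_rebuilt_polygon` are hypothetical names.

```python
def ring_area(ring):
    """Planar area of a simple polygon ring via the shoelace formula."""
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping back to the first.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def accept_rebuilt_polygon(rebuilt_ring, permit_ring, tolerance=0.03):
    """Accept the rebuilt polygon if its area is within +/- tolerance of the permit area."""
    permit_area = ring_area(permit_ring)
    return abs(ring_area(rebuilt_ring) - permit_area) <= tolerance * permit_area
```

A 100 × 102 ft rebuild of a 100 × 100 ft permit differs by 2% and is accepted; a 100 × 110 ft rebuild differs by 10% and is rejected.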
Once complete, each permit feature will have a CP and an assigned horizontal accuracy. The RMSE will be recalculated to quantify the improvement for the entire feature class.

Phase II Correction
Phase II involved adding arc segments to parcel lines with gaps in order to form a closed ring that could be built into a permit polygon. The pseudocode for this is as follows:

For each permit that failed to build
  o Buffer at the accuracy level previously determined
  o Clip the parcel lines (control) using the test buffer
  o Drop dangling nodes (where length = buffer)
  o For each remaining node
    - Identify the closest node
    - Connect the two nodes with a line segment
  o Build lines as polygons
  o Compare the area of the polygon to the original permit
    - Accept if polygon area = permit area (± 0.03 × permit area)

An improved CP cannot be calculated for these features: the CP is based on the parcel lines, and Phase II adds line segments to permit features exactly where parcel line segments are missing. However, we can recalculate the RMSE to quantify any improvements.

Results
The initial confidence interval on the estimate of RMSE for x and y at 95% probability was calculated using 30 coordinate pairs from the entire test area. The initial values were 20.22 ± 5.74 ft in x and 22.18 ± 7.29 ft in y (Table 2). The RMSE is circular, meaning that the values are similar in x and y, which indicates that there is no systematic error in the data that would produce larger errors in any particular direction.

Definition | Formula | Initial X/Y Dimension Values
Confidence interval on the estimate of RMSEx at 95% probability | RMSEx - 1.96 * SRMSE < exi < RMSEx + 1.96 * SRMSE | 20.22 ± 5.74 = 14.49 to 25.95
Confidence interval on the estimate of RMSEy at 95% probability | RMSEy - 1.96 * SRMSE < eyi < RMSEy + 1.96 * SRMSE | 22.18 ± 7.29 = 14.89 to 29.46
Table 2: Initial Root Mean Square Error (RMSE)
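The per-axis RMSE and the circularity claim can be checked with a short Python sketch. It computes RMSEx, RMSEy and RMSE(r) from paired test and control coordinates and, following the FGDC NSSDA, treats the error as approximately circular when the smaller RMSE is at least 0.6 of the larger, in which case Accuracy(r) at 95% confidence is approximately 2.4477 × 0.5 × (RMSEx + RMSEy). The S_RMSE term in Table 2 is not defined in the text, so this sketch does not attempt the confidence interval; the function names are hypothetical.

```python
import math


def rmse(errors):
    """Root mean square of a sequence of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def horizontal_rmse(test_points, control_points):
    """Return (RMSE_x, RMSE_y, RMSE_r) for paired test/control coordinates."""
    dx = [t[0] - c[0] for t, c in zip(test_points, control_points)]
    dy = [t[1] - c[1] for t, c in zip(test_points, control_points)]
    rmse_x, rmse_y = rmse(dx), rmse(dy)
    return rmse_x, rmse_y, math.hypot(rmse_x, rmse_y)


def nssda_accuracy_r(rmse_x, rmse_y):
    """NSSDA horizontal accuracy at 95% for approximately circular error."""
    ratio = min(rmse_x, rmse_y) / max(rmse_x, rmse_y)
    if ratio < 0.6:
        raise ValueError("error is not approximately circular; this approximation does not apply")
    return 2.4477 * 0.5 * (rmse_x + rmse_y)
```

With the Table 2 estimates, the ratio is 20.22 / 22.18, about 0.91, so the circular approximation applies.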
Figure 8 is a graph of the initial horizontal accuracy distribution from 0 to 60 feet for all 259 permit features. The distribution has two peaks at either extreme, representing a large number of high accuracy features (0.5 feet) and a large number of low accuracy features (beyond 60 feet); in between, the distribution is irregular and contains a significant number of features.

[Figure 8: number of features (#) by Buffer Distance (ft), 0.5 to 61]

Figure 9 is a graph of the cumulative horizontal accuracy distribution. In the best-case scenario this would be a straight line across the y-axis at 259, indicating that all features had accuracies of 0.5 feet. About 10% of the features have accuracies of 0.5 feet, followed by a steady stream of features of various accuracies up to 60 feet (80%); lastly, about 10% of the records were not measured because their accuracy was beyond 60 feet.

[Figure 9: cumulative number of features (#) by Buffer Distance (ft), 0.5 to 61]

Figure 10 is a classified map of the initial horizontal accuracies. Permits in green have accuracies of ≤ 1 foot, yellow from 2 to 59 feet, and red from 60 feet to 999, where 999 represents features beyond our 60-foot buffer distance.
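Distribution and cumulative graphs of this kind can be tallied directly from the per-feature accuracies. The Python sketch below assumes each accuracy is a multiple of the 0.5 ft interval (values off that grid are silently ignored) and that `None` marks a feature beyond 60 ft; `accuracy_distribution` and the "beyond" label are hypothetical.

```python
from collections import Counter


def accuracy_distribution(accuracies, max_buffer=60.0, interval=0.5):
    """Histogram and cumulative counts of per-feature horizontal accuracies.

    `accuracies` holds buffer distances (multiples of `interval`) or None for
    features whose CP never reached 1 within `max_buffer`.
    """
    n_steps = int(round(max_buffer / interval))
    bins = [i * interval for i in range(1, n_steps + 1)]
    counts = Counter(a for a in accuracies if a is not None)
    beyond = sum(1 for a in accuracies if a is None)
    histogram = [(d, counts.get(d, 0)) for d in bins] + [("beyond", beyond)]
    # Running total over the same bins gives the cumulative curve.
    cumulative, running = [], 0
    for label, n in histogram:
        running += n
        cumulative.append((label, running))
    return histogram, cumulative
```

The last cumulative entry always equals the total number of features, which is why a perfect data set would plot as a flat line at 259.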
Phase I Correction was applied once the RMSE for the feature class and the individual feature accuracies had been generated. Phase I correction consisted of buffering features at the previously determined accuracy, using this buffer to clip parcels, and then building higher accuracy replacement polygons. Buffer-Overlay was then used to recalculate the horizontal accuracy for all permits.

Figure 11 is a graph of the initial (red) and Phase I (green) accuracy distributions for all 259 permits. After correction, the number of features with displacements of ≤ 1 foot increased by 105 records, or 40% of the features.

[Figure 11: number of features (#) by Buffer Distance (ft), 0.5 to 61; series: Initial, Phase I Correction]

Figure 12 is a close-up view of the curve for horizontal accuracies between 0 and 30 feet. Here we see that the amplitude of the curve has been reduced and that the Phase I curve (green) runs above the initial conditions (red) for accuracies of ≤ 1 foot and below them for the rest of the curve.

[Figure 12: number of features (#) by Buffer Distance (ft), 0.5 to 28.5; series: Initial, Phase I Correction]
Figure 13 is a graph of the cumulative curve for both the initial (red) and Phase I (green) conditions. Here we see that an additional 105 records now have accuracies of ≤ 1 foot.

[Figure 13: cumulative number of features (#) by Buffer Distance (ft), 0.5 to 61; series: Initial, Phase I Correction]

Table 3 compares the RMSE before and after Phase I, showing a marked reduction in the RMSE estimates, from 20.22 to 15.52 ft in x and from 22.18 to 12.76 ft in y.

Confidence interval on the estimate of RMSE at 95% probability | Initial X/Y Dimension Values | Phase I X/Y Dimension Values
RMSEx - 1.96 * SRMSE < exi < RMSEx + 1.96 * SRMSE | 20.22 ± 5.74 = 14.49 to 25.95 | 15.52 ± 5.78 = 9.75 to 21.30
RMSEy - 1.96 * SRMSE < eyi < RMSEy + 1.96 * SRMSE | 22.18 ± 7.29 = 14.89 to 29.46 | 12.76 ± 4.60 = 8.16 to 17.35
Table 3: RMSE Initial and Phase I Correction

Figure 14 shows two maps depicting the horizontal accuracy before (left) and after Phase I (right).

Figure 14: Before and After Accuracy Classification
In the process of building higher accuracy features in Phase I, some polygons could not be built because the clipped line segments did not form complete rings. Phase II attempts to correct these features by adding line segments at dangling nodes in order to form complete rings. This operation resulted in 15 additional records (6% of the features) being classified as ≤ 1 foot (Figures 15 and 16).

Figure 15: Initial, Phase I, and Phase II Horizontal Accuracy Distribution (number of features (#) by Buffer Distance (ft), 0.5 to 29.5)

Figure 16: Initial, Phase I, and Phase II Cumulative Curves (cumulative number of features (#) by Buffer Distance (ft), 0.5 to 61)
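The Phase II gap-closing step can be sketched in Python as follows. Segments are pairs of (x, y) endpoints, a dangling node is an endpoint touched by exactly one segment, and each dangling node is joined to its nearest other dangling node. Restricting candidates to dangling nodes is an assumption, since the pseudocode only says "the closest node"; the function names are hypothetical, and the rebuilt ring would still go through the ±3% area check.

```python
import math
from collections import Counter


def dangling_nodes(segments):
    """Endpoints that touch exactly one segment (degree-1 nodes)."""
    degree = Counter()
    for a, b in segments:
        degree[a] += 1
        degree[b] += 1
    return [node for node, deg in degree.items() if deg == 1]


def gap_closing_segments(segments):
    """Connect each dangling node to its nearest other dangling node."""
    dangles = dangling_nodes(segments)
    new_links = set()
    for node in dangles:
        others = [n for n in dangles if n != node]
        if not others:
            break
        nearest = min(others, key=lambda n: math.dist(node, n))
        new_links.add(frozenset((node, nearest)))  # a-b and b-a count once
    return sorted(tuple(sorted(link)) for link in new_links)
```

For a square boundary missing the stretch between (0, 40) and (0, 60), the sketch returns the single closing segment ((0, 40), (0, 60)); a closed ring returns no new segments.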
Discussion
Many of the data sets stewarded by geospatial professionals are based on, or directly related to, higher accuracy data sets that could be used to improve horizontal spatial accuracy. In this paper we have demonstrated the use of Buffer-Overlay to determine and improve the accuracy of permits whose boundaries are related to higher accuracy parcel boundaries. The initial accuracy assessment included the RMSE for the feature class, and each feature was then assigned a horizontal spatial accuracy from 0.5 to 60 feet at 0.5-foot intervals. Phase I used these accuracy measures to clip parcel lines and build higher accuracy polygons. The result was an increase of 105 records (40% of the features) with accuracies of ≤ 1 foot. Phase II examined those records that failed to build in Phase I. Line segments were added between node gaps in order to form rings that could be built into polygons. The result was a further 15 records (6% of the features) with accuracies of ≤ 1 foot.

In general, we find that Buffer-Overlay is an effective method for quantifying and improving the accuracy of features where control data exist. Most data stewards would acknowledge having a data set that should be improved but lack the time and money to make such improvements. The cost of improving data using Buffer-Overlay is confined to algorithm development; once the process is automated, the time requirements boil down to CPU cycles, leaving the steward free to focus on the capture and accuracy of new data.

References
[1] Goodchild, M.F., and G.J. Hunter, 1997. A simple positional accuracy measure for linear features, International Journal of Geographical Information Science, 11(3):299-306.
[2] Atkinson, A.D.J., and F. Ariza, 2002. Nuevo enfoque para el análisis de la calidad posicional en cartografía mediante estudios basados en la geometría lineal [A new approach to the analysis of positional quality in cartography through studies based on linear geometry], Proceedings of the XIV International Congress of Engineering Graphics, Santander, Spain.
[3] Tveite, H., and S. Langaas, 1999. An accuracy assessment method for geographical line data sets based on buffering, International Journal of Geographical Information Science, 13(1):27-47.