Land Use types to LBCS Structure Codes...

Automated Processing
1. Means of Transfer
• data.colorado.gov - (Soc...

Data Quality
Two Tracks:
1. Develop criteria and measure quality
...

Address Data Quality
Brainstorm

Data Quality - Status
• Reviewed ISO Geographic information data quality elem...

Data Quality Measurement Concepts
QualityElement
completeness
Qua...

Data Quality - Potential Tests:
FGDC-STD-016-2011 US…Address Data Standard
Qua...

Data Quality Requirements:
NENA 02-014 GIS Data Collection and Maintenance
Qu...

Data Quality Requirements:
NENA 71-501 Synchronizing GIS Databases with MSAG ...

USPS Address Quality Improvement Processes
1. Locatable Address...

USPS Comparisons:
Coding Accuracy Support System (CASS) ...

Data Quality: USPS State
• Street segments with more than one name
• AEC ...

Data Quality Elements
ISO 19157 Geographic information - Data quality defines ...

Data Quality - Sampling Size
Sample Size and Confidence Interval Tutorial
The...

Data Quality - Sampling Size
http://williamgodden.com/samplesizeformula.pdf

Data Quality - Sampling Size
With a confidence interval of 3 percentage point...
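
The sample-size calculation these slides refer to can be sketched as below. This is a minimal illustration of the standard proportion-based formula with an optional finite-population correction, not necessarily the exact computation used in the presentation. The function name `sample_size`, the 95% confidence level (z = 1.96), and the worst-case proportion of 0.5 are assumptions; only the 3-percentage-point interval comes from the slide.

```python
import math


def sample_size(margin_of_error, z_score=1.96, proportion=0.5, population=None):
    """Return the number of records to sample (rounded up).

    margin_of_error: half-width of the confidence interval, e.g. 0.03 for 3 points.
    z_score: 1.96 corresponds to a 95% confidence level (assumed, not from the slide).
    proportion: 0.5 is the most conservative choice (assumed).
    population: if given, apply the finite-population correction.
    """
    n0 = (z_score ** 2) * proportion * (1 - proportion) / (margin_of_error ** 2)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)
```

For example, `sample_size(0.03)` gives the sample needed for a very large dataset, while `sample_size(0.03, population=10000)` shows how the requirement shrinks for a county with only 10,000 address points.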

Data Quality - Sampling Method
1. Randomly select 5 address points
2. Select road segments associated with addres...
3. Select adjacent connected road segments
4. Select the address points associated with ...
5. Repeat steps 3 & 4 until sample size is exc...
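
The five steps above can be sketched in code as follows. This is an illustrative reading of the truncated slide text, not the presenter's implementation. The function name `cluster_sample` and the two input structures are hypothetical: `point_to_segment` maps each address point ID to its road segment ID, and `segment_neighbors` maps a segment ID to the segments connected to it.

```python
import random


def cluster_sample(point_to_segment, segment_neighbors, target_size,
                   seed_count=5, rng=None):
    """Grow a spatially clustered sample of address points along the road network."""
    rng = rng or random.Random()

    # Index address points by the road segment they belong to.
    points_by_segment = {}
    for point, segment in point_to_segment.items():
        points_by_segment.setdefault(segment, set()).add(point)

    # Step 1: randomly select the seed address points.
    all_points = sorted(point_to_segment)
    seeds = rng.sample(all_points, min(seed_count, len(all_points)))
    sampled = set(seeds)

    # Step 2: select the road segments associated with the seeds.
    segments = {point_to_segment[p] for p in seeds}

    # Step 5: repeat steps 3 and 4 until the sample size is exceeded.
    while len(sampled) <= target_size:
        # Step 3: add adjacent connected road segments.
        grown = set(segments)
        for segment in segments:
            grown.update(segment_neighbors.get(segment, ()))
        # Step 4: select the address points on the selected segments.
        new_points = set().union(*(points_by_segment.get(s, set()) for s in grown))
        if grown == segments and new_points <= sampled:
            break  # the network is exhausted; nothing more can be added
        segments = grown
        sampled |= new_points
    return sampled
```

The early exit is an added safeguard (an assumption, not from the slides) so the loop terminates when a disconnected network holds fewer points than the requested sample.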

Data Quality - Sampling Method
[Figure: chart with axis labeled "Address Points", ticks 400 to 1000 ...]

Data Quality - Sampling Method
Issues:
• Sample selection - why start with 5?...

Data Quality Elements
ISO 19157 Geographic information - Data quality defines ...

Data Quality - Completeness
• Omissions - agreed upon by many as the pr...

Data Quality - Positional Accuracy
• Positional accuracy will be measur...

Data Quality - Temporal Accuracy
• What temporal information is being reporte...

Data Quality - Thematic Accuracy
Increasing Value
CompleteStreetNumber
Com...

Data Quality - Logical Consistency
• Fishbones!

Credits
• A very special thanks to:
  - Rick Smajter, City of ...

References

Colorado State Address Dataset Websites
Govern...

Colorado State Address Dataset Websites
Colora...

Colorado State Address Dataset
Nathan Lowry, Colorado OIT
Oct...
2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations in Colorado by Nathan Lowry

ISO 19157 Geographic information - Data quality provides a structure for organizing comprehensive data quality assessment measures. What it doesn't provide is a priority of data quality elements for a specific dataset and jurisdiction. Over the past year, the Colorado Address Data Quality subgroup has developed a prioritized list of data quality measures for addressed locations, in an effort to establish common criteria and a scorecard. These will provide a means to describe the data compiled from multiple jurisdictions with varying origins in an objective manner so users of the data can determine their fitness for use. It also provides feedback for local jurisdictions to increase their level of quality according to their need and discretion.

In addition, the State of Colorado, in coordination with the US Postal Service, the US Census Bureau, and state and local agencies, will begin to provide feedback to local jurisdictions on possible discrepancies in comparison to Master Street Address Guides (MSAGs), the Coding Accuracy Support System (CASS), the Statewide Colorado Voter Registration and Election System (SCORE), the Colorado Motorist Insurance Identification Database (MIDB), and other datasets that contain addresses. These comparisons are particularly helpful in identifying possible omissions but also in confirming and completing georeferenced address data content. This presentation will describe the value of these comparisons and progress in developing and measuring data quality using common criteria and objective measures.

Published in: Technology, Business

Transcript of "2013 GISCO Track, Quality Assessment and Improvement for Addressed Locations in Colorado by Nathan Lowry"

  1. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY. Quality Assessment and Improvement for Addressed Locations in Colorado. GIS in the Rockies, October 9, 2013. Nathan Lowry, Colorado OIT
  2. Abstract: Quality Assessment and Improvement for Addressed Locations in Colorado
     ISO 19157 Geographic information - Data quality provides a structure for organizing comprehensive data quality assessment measures. What it doesn't provide is a priority of data quality elements for a specific dataset and jurisdiction. Over the past year, the Colorado Address Data Quality subgroup has developed a prioritized list of data quality measures for addressed locations, in an effort to establish common criteria and a scorecard. These will provide a means to describe the data compiled from multiple jurisdictions with varying origins in an objective manner so users of the data can determine their fitness for use. It also provides feedback for local jurisdictions to increase their level of quality according to their need and discretion.
     In addition, the State of Colorado in coordination with the US Postal Service, the US Census Bureau, and state and local agencies will begin to provide feedback to local jurisdictions on possible discrepancies in comparison to Master Street Address Guides (MSAGs), the Coding Accuracy Support System (CASS), the Statewide Colorado Voter Registration and Election System (SCORE), the Colorado Motorist Insurance Identification Database (MIDB), and other datasets that contain addresses. These comparisons are particularly helpful in identifying possible omissions but also in confirming and completing georeferenced address data content. This presentation will describe the value of these comparisons and progress in developing and measuring data quality using common criteria and objective measures.
  3. Status of Address Points Received Per County, August 2013
     [Map of Colorado counties with per-county status symbols. Legend: Sharing (Public, State); Counties with Address Points (Not Developed, In Development, Pending Agreement, Pending Receipt, Received)]
  4. What are we trying to map?
     • Most often: buildings
       - Residences and workplaces
     • Sometimes: building complexes
       - High-rises, apartment complexes, campuses
     • Accesses to buildings:
       - Main entrances, service entrances
       - Driveways, access roads
     • Sometimes other structures:
       - Communications, Electrical, Natural Gas, Water, Heating and Cooling, Sanitary Sewer and Storm Drainage utilities, Signage, etc.
     • Sometimes, land only
       - Parcels, park lands, event sites, etc.
     • Occasionally, a location w/o reference to property
       - Traffic incident locations, other abstract locations
  5. Why? Multi-purpose:
     1. Increase accuracy of broadband mapping availability per Census block as portrayed by the Colorado Broadband Data Mapping and Development Program
     2. For administrative accuracy
       - Identify the taxation of, registration of, services provided to Colorado residences correctly
       - Enumerate or estimate the right number of people within a given boundary
         • County, Municipal or Service district boundary, intra-district school population balancing, business service area boundaries, voter precincts, etc.
     3. Assist in the notification, evacuation, and recovery of personnel and property from facilities in response to natural and man-made emergencies
  6. Why now?
     • Social consciousness and value of georeferenced address locations has significantly increased within the last decade (NSGIC, URISA, NENA, etc.)
     • NTIA (Broadband), Census (GSS-I), (NG)9-1-1, USPS, and many other communities with significant interests, esp. State of Colorado
     • Cost is low - Low floor for "getting in the door"…
     • Value is high - Implementation is cross-functional
       - "It is the 'key' for so many other data sets" (Paul Tessar, NCR April meeting)
       - After crime stats, the number two data download for the City and County of Denver
     • And frankly, for a majority, we're already doing it
  7. Common Data Model
     • Why?
       - Allows local and state-wide querying, analysis, and integration …
       - Accommodates information exchanges
         • Hierarchical - City to County, County to Region, Region to State
         • Among neighboring jurisdictions (e.g. County to County, etc.)
       - Allows profiles to provide data in standard forms for specific objectives
         • NENA CLDXF for NG-911
         • USPS Pub-28 for CASS
         • ArcGIS Geocoding (for quality comparisons, etc.)
       - It's more efficient (less work) and assures more quality (less loss)
         Without a common data model: (x inputs) × (y outputs) = z translations
         With a common data model: (x inputs) + (y outputs) = z translations
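
The efficiency argument on this slide is simple arithmetic, sketched below. The function names and the example counts (20 source formats, 3 output profiles) are illustrative assumptions, not figures from the presentation.

```python
def translations_point_to_point(num_inputs: int, num_outputs: int) -> int:
    """Every input format needs its own translation to every output format."""
    return num_inputs * num_outputs


def translations_via_common_model(num_inputs: int, num_outputs: int) -> int:
    """Each input is translated once into the common model,
    and each output is produced once from it."""
    return num_inputs + num_outputs


# Hypothetical example: 20 county source formats, 3 output profiles.
direct = translations_point_to_point(20, 3)
hub = translations_via_common_model(20, 3)
print(f"point-to-point: {direct} translations, common model: {hub} translations")
```

The gap widens as either count grows, which is why the slide ties the common model to both less work and less loss.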
  8. Common Data Model
     • What?
       - An implementation of the United States Thoroughfare, Landmark, and Postal Address Data Standard (FGDC-STD-016-2011)
       - Leans on:
         • National Emergency Number Association (NENA) 02-014 GIS Data Collection and Maintenance, 02-010 Standard Data Formats for 9-1-1 Data Exchange & GIS Mapping
         • NENA draft Civic Location Data EXchange Format (CLDXF) and GIS Data Model for Next Generation (NG)-911
         • Census Optimal Address Data Submission Guidelines
  9. FGDC-STD-016-2011 United States Thoroughfare, Landmark, and Postal Address Data Standard
     Of Greatest Significance:
     1. Everything* is 'fully explicit' (fully spelled-out)
       - No abbreviations allowed; no ambiguity
       *The only exception is two-letter state postal codes (e.g. "CO" = Colorado)
     2. You will express exactly how each address will be parsed
       - Parsing is no longer subject to interpretation
       - The break-down is stored in the data for each record
     3. Each Address must be assigned a Unique Identifier (UID)
       - Multiple representations of the same address can be "tied together" if and only if (iff) addresses are assigned UIDs.
     These are big changes that few have yet implemented
     • Our common data model is designed to accommodate both:
       - your current state and
       - this "to be" state
  10. [Data model diagram; labels in slide order: Address; AddressPoint; Parcel Centroid; Building Centroid; Main Entrance; Driveway Entrance; Centerline Reference; …; Lat/Long; Site; UID; StreetAddress; Landmark; AddressNumber; StreetName; ParsedAddress (Complex); IsMailingAddress (B); HasSubAddress (B); …; AddressRange; Name; …; SubAddress; PointAddressRange; LineAddressRange; …; MailingAddress (only); Place; PO Box; Zipcode; Etc.; AddressArea; Building Footprint; Unit Area; Parcel; …; Name; …; AddressReferenceSystem; AddressVolume; ExtrudedVolume; BIMVolume]
      AddressExchange (Input/Output Table)*
        Lat       Long       UID               StreetAddress                   Parsed Address
        12345.12  -12345.12  080211239 912593  123 Clark Street, Antonito, CO  123 | Clark | Street | Antonito | Conejos | CO
        12348.57  -12346.28  080211239 912593  123 Clark Street, Antonito, CO  123 | Clark | Street | Antonito | Conejos | CO
                                               456 Jones Ave, Antonito, CO
      *In advance of XSD/XML implementation routines as described by FGDC-STD-016-2011 Part 5: Address Data Exchange
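
As a small illustration of how a shared UID ties multiple representations of one address together, the sketch below groups the two sample rows from the AddressExchange table by UID. The dictionary keys and the helper name `group_by_uid` are hypothetical; the values are copied from the slide.

```python
from collections import defaultdict

# Sample rows from the AddressExchange table on the slide.
rows = [
    {"lat": 12345.12, "long": -12345.12, "uid": "080211239 912593",
     "street_address": "123 Clark Street, Antonito, CO"},
    {"lat": 12348.57, "long": -12346.28, "uid": "080211239 912593",
     "street_address": "123 Clark Street, Antonito, CO"},
]


def group_by_uid(records):
    """Collect every (lat, long) location recorded for each address UID."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["uid"]].append((record["lat"], record["long"]))
    return dict(grouped)


locations = group_by_uid(rows)
```

Here both point locations end up under one UID, so a consumer can see that they describe the same address rather than two different ones.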
  11. (no text on slide)
  12. Common Data Model
      Field Name (Data Type, Length): Description
      - PlaceID (Long Integer): Unique identifier assigned to each value in the dPlace domain.
      - AddressUID (Long Integer): AddressID as defined in FGDC-STD-016-2011. Uniquely identifiable integer assigned by each AddressAuthority. Used in this data model to help uniquely identify Addresses.
      - AddressUUID (GUID): A Universally (aka Globally) Unique Identifier, usually a 16-byte binary value. AddressID as defined in FGDC-STD-016-2011. Uniquely identifiable GUID assigned by each AddressAuthority. Used in this data model to help uniquely identify Addresses.
      - LandElementName (Text, 255): The name of a relatively permanent feature of the ... landscape that has recognizable identity within a particular cultural context. Modified from LandmarkName as defined in FGDC-STD-016-2011.
      - NumberPrefix (Text, 15): AddressNumberPrefix as defined in FGDC-STD-016-2011. The portion of the CompleteAddressNumber which precedes the AddressNumber itself.
      - AddressNumber (Long Integer): AddressNumber as defined in FGDC-STD-016-2011. The numeric identifier for a land parcel, house, building, or other location along a thoroughfare or within a com…
      - NumberSuffix (Text, 15): AddressNumberSuffix as defined in FGDC-STD-016-2011. The portion of the CompleteAddressNumber which follows the AddressNumber itself.
      - StreetPreModifier (Text, 20): StreetNamePreModifier as defined in FGDC-STD-016-2011. A word or phrase in a CompleteStreetName that precedes and modifies the StreetName, but is separated from it, ... or is placed outside the StreetName ... [to] sort ... [a] list of street names.
      - StreetPreDirectional (Text, 20): StreetNamePreDirectional as defined in FGDC-STD-016-2011. A word preceding the StreetName that indicates the direction or position of the thoroughfare relative to an arbitrary starting point or line, or the sector where it is located.
      - StreetPreType (Text, 35): StreetNamePreType as defined in FGDC-STD-016-2011. A word or phrase that precedes the StreetName and identifies a type of thoroughfare in a CompleteStreetName.
      - StreetSeparator (Text, 10): SeparatorElement as defined in FGDC-STD-016-2011. A ... [prepositional] phrase ... used as a separator between a StreetPreType and a StreetName, as in "Avenue of the Americas".
      - StreetName (Text, 75): StreetName as defined in FGDC-STD-016-2011. The portion of the CompleteStreetName that identifies the particular thoroughfare ... .
      - StreetPostType (Text, 35): StreetNamePostType as defined in FGDC-STD-016-2011. A word or phrase that follows the StreetName and identifies a type of thoroughfare in a CompleteStreetName.
      - StreetPostDirectional (Text, 20): StreetNamePostDirectional as defined in FGDC-STD-016-2011. A word following the Street Name that indicates the direction or position of the thoroughfare relative to an arbitrary starting point or line, or the sector where it is located.
      - StreetPostModifier (Text, 20): StreetNamePostModifier as defined in FGDC-STD-016-2011. A word or phrase in a Complete Street Name that follows and modifies the StreetName, but is separated from it by a StreetNamePostType or a StreetNamePostDirectional or both.
      - AddressLocDesc (Text, 255): LocationDescription as defined in FGDC-STD-016-2011. A text description providing more detail on how to identify or find the addressed feature.
      - PlaceName (Text, 100): PlaceName as defined in FGDC-STD-016-2011. The name of an area, sector, or development; incorporated municipality ...; county ...; or region within which the address is physically located; or a name…
      - PlaceNameType (Text, 35): PlaceNameType as defined in FGDC-STD-016-2011. The type of Place Name used in an Address.
      - CountyName (Text, 25): The county or county equivalent where the address is physically located as defined in FGDC-STD-016-2011 and the NENA NG9-1-1 US CLDXF Standard. A county (or its equivalent) is the primary legal division of a state or territory.
      - StateName (Text, 30): State names in ANSI INCITS 38:2009. The US states and state equivalents: the fifty US states, the District of Columbia, and all U.S. territories and outlying possessions. A state (or equivalent) is "a primary governmental division of the United States."
      - ZIPCode (Long Integer): Zone Improvement Plan Code. A system of 5-digit codes that identifies the individual Post Office or metropolitan area delivery station associated with an address. See USPS, "Quick Service Guide 800: Glossary of Postal Terms and Abbreviations in the DMM."
      - ZIPCodePlusFour (Short Integer): ZipPlus4 in FGDC-STD-016-2011. A 4-digit extension of the 5-digit Zip Code (preceded by a hyphen) that, in conjunction with the Zip Code, identifies a specific range of USPS delivery addresses. Adapted from USPS, "Quick Service Guide 800: Glossary ... ."
      - Country (Text, 50): CountryName as defined in FGDC-STD-016-2011. The name of the country in which the address is located. A country is "an independent, self-governing, political entity." See ISO 3166-1.
      - ParcelID (Text, 20): Foreign Key. Parcel Identifier as defined in FGDC-STD-016-2011. The primary permanent identifier ... for a parcel that includes the land or feature identified by an address.
      - TransSegID (Text, 30): Foreign Key. AddressTransportationFeatureID as defined in FGDC-STD-016-2011. The unique identifier assigned to the particular feature that represents an address within a transportation base model.
      - Feature (Text, 30): The physical feature of the landscape that is being represented by this geometry.
      - Type (Text, 35): A description of the placement of this geometry on the landscape to represent the physical feature.
      - LBSCStructCode (Short Integer): Land Based Classification Standards (LBCS) Structure Code for the Address as defined by the American Planning Association (APA). See http://www.planning.org/lbcs/standards/
      - NAICSCode (Long Integer): North American Industry Classification System (NAICS) code for the Address (if it is a business) as defined by the Office of Management and Budget (OMB). See http://www.census.gov/eos/www/naics/index.html
      - Longitude (Double): AddressLongitude as defined in FGDC-STD-016-2011. The longitude of the address location, in decimal degrees [using the North American Datum of 1983 (NAD83)].
      - Latitude (Double): AddressLatitude as defined in FGDC-STD-016-2011. The latitude of the address location, in decimal degrees [using the North American Datum of 1983 (NAD83)].
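
One way the text lengths in this table could be enforced when loading source data is sketched below. It is only an illustration: the helper name `length_violations` is hypothetical, records are assumed to be plain dictionaries, and only a handful of the fields listed above are included.

```python
# Maximum text lengths taken from the Common Data Model table (subset).
MAX_LENGTHS = {
    "StreetName": 75,
    "StreetPostType": 35,
    "PlaceName": 100,
    "CountyName": 25,
    "StateName": 30,
}


def length_violations(record: dict) -> list:
    """Return a message for every text field that exceeds its maximum length."""
    problems = []
    for field, limit in MAX_LENGTHS.items():
        value = record.get(field)
        if value is not None and len(value) > limit:
            problems.append(f"{field}: {len(value)} > {limit}")
    return problems
```

A record that passes returns an empty list, so the check can run per row during conversion and the messages can be collected into a per-county report.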
  13. Common Data Model
      Quantity over Quality, Speed over Structure
      • Only one dataset per area processed (e.g. Regional)
      • Retains 60-80% of the attribute data from sources
      • Converted many values to target data types
      • Address numbers and number suffixes separated
      • Domain values not (yet) used or enforced
      • Multiple representations of fields for common records are not represented (only first instance)
      • Not all fields that can be are populated
      • Some source fields may have been misinterpreted
      • Some source fields not identified to any target field
      • Cases (Mixed case instead of UPPER) not standardized
      • Land use domain values not (yet) translated
      • Documentation incomplete
      We've worked further, but the work is not yet complete
  14. Publication
      1. Address Data received at OIT - Dec 2012
      2. Conversion and Loading at OIT - Feb 2013
        • Conversion to Interim Data Model
        • Loaded into state-wide database
      3. Publication as Interim Services - Apr-Aug 2013 (Access Controlled to State Agencies)
        • ArcGIS Server Services: Mapping, Feature Access, Geodata
        • OGC Services: WFS (Mapping and Geodata), WMS, KML Network Links
        • SQL Server, ArcSDE
      4. Publication to Internet (Publicly Accessible Data only)
        • Conversion to Common Data Model in state-wide database
        • Mapping Services, data.colorado.gov, etc.
      5. Address Locator, CASS, etc.
  15. 15. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Source Source Field Name Source Data Type Source Type Target Target Field Name Target Data Type Target Type Target Field Calculation CusterAddressPoints.shp Shape Geometry Point Shapefile AddressPoint Shape LGID APSAID Longitude Lattitude IsPrincipal Feature Type LGID AddressSAID FullAddress Geometry Long Integer Long Integer Double (15*,9) Double (15* 9) Double (15*,9) Text(10) Text(30) Text(35) Long Integer Long Integer Text (255) Point Feature Class = =14001 Assign value (increment) Y Coordinate of Point; GCS: NAD 83 Y Coordinate of Point; GCS: NAD 83 X Coordinate of Point; GCS: NAD 83 ="Yes" Object Table =14001 Assign value (increment) =[LandmarkElement.LandmarkElementName]&" "&[StreetName.ExpressStAdd] ParcelID LGID AddressSAID SubAddressSAID APSAID AddressJoinID LGID AddressSAID LandmarkID LandmarkID LandElementID LandElementName LandEleSequence LGID AddressSAID StreetAddID Text (20) Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Text (255) Byte Long Integer Long Integer Long Integer Long Integer ExpressStAdd IsPrincipal OfficialStatus AddressAuthority IsAnomaly IsMailableAddress StreetAddID StreetNameID NumberAddID AddressNumber NumberSuffix NumberAddID StreetNameID ExpressedStreetName LGID AddressSAID StreetAddID LandmarkID LastLineID CountyName StateName S N Country LastLineID LastLineEleID PlaceName PlaceNameType IsPrincipal LastLineEleSequence LGID AddressSAID StreetAddID LandmarkID L d kID SubAddressSAID LocationDescription LGID AddressSAID SubAddressSAID SubAddressEleID SubAddressType SubAddressIdentifier SubAddEleSequence Text (255) Text (10) Text (100) Text (50) Text (10) Text (10) Long Integer Long Integer Long Integer Long Integer Text (15) Long Integer Long Integer Text(255) Long Integer Long Integer Long Integer Long Integer Long Integer Text (25) Text (30) T (30) Text (50) Long Integer Long Integer 
Text (100) Text (10) Text (10) Byte Long Integer Long Integer Long Integer Long Integer L I t Long Integer Text (255) Long Integer Long Integer Long Integer Long Integer Text (25) Text (25) Byte Address Schedule Text (10) AddressesMayHaveAddressPoints Landmark LandmarkElement BusinessNm Text (50) StreetAddress FullAddr; FullAddr2 Text (100), Text (100) NumberedAddress Address Address Text (8) ( ) Text (8) Roadname; Route Text (65), Text (50) StreetName LastLine LastLineElement Roadname SubAddress SubAddressElement Relationship Class Object Table Object Table Object Table Object Table Object Table Object Table Object Table Object Table Object Table =[Schedule] =14001 =[Address.AddressSAID] =[SubAddress.SubAddressSAID] [SubAddress SubAddressSAID] =[AddressPoint.APSAID] Assign value (increment) =14001 =[Address.AddressSAID] Assign value (increment) =[Landmark.LandmarkID] Assign value (increment) [BusinessNm] =1 =14001 =[Address.AddressSAID] =[Address AddressSAID] Assign value (increment) =[FullAddr] ="Yes" when "FullAddr"; "No" when "FullAddr2" ="Unknown" "Custer County Planning and Zoning" ="Unknown" ="Unknown" =[StreetAddress.StreetAddID] =[StreetName.StreetNameID] Assign value (increment) =[Address] Right([CusterAddressPoints.Address],InStr([CusterAddressPoints.Address],“ “)) =[NumberedAddress.NumberAddID] Assign value (increment) =[Roadname]; = [Route] =14001 =[Address.AddressSAID] =[StreetAddress.StreetAddID] =[Landmark.LandmarkID] Assign value (increment) "Custer" "Colorado" "C l d " "United States" =[LastLine.LastLineID] Assign value (increment) IS NULL; = “Westcliffe” ; = “Silver Cliff” IS NULL; ="Incorporated Municipality" = "Yes" =1 =14001 =[Address.AddressSAID] =[StreetAddress.StreetAddID] =[Landmark.LandmarkID] [L d kL d kID] Assign value (increment) Null =14001 =[Address.AddressSAID] =[SubAddress.SubAddressSAID] Assign value (increment) ="Apartment" =[Apartment] =1
  16. 16. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Colorado State Address Dataset ‐ Common Data Model Crosswalk Source Source Field Name AddressPoints122112.shp Source Data Type Source Type Target AddressPoint Address PARCEL_NUM SCHEDULE_N SCHEDULE N AddressJoinAddressPoints StreetAddress Address NumberedAddress STREETNO STREETNO StreetName STREETDIR STREETNAME STREETSUF LastLine LastLineElement LOCCITY SubAddress SubAddressElement STREETALP URL Sequence Target Field Name PlaceID Target Data Type Long Integer APSAID Longitude Lattitude MetadataID PlaceID AddressSAID FullAddress ParcelID Long Integer Double (15*,9) Double (15*,9) Long Integer Long Integer Long Integer Text (255) Text (20) MetadataID PlaceID AddressSAID SubAddressSAID APSAID AddressJoinID PlaceID AddressSAID StreetAddID ExpressStAdd StreetAddID StreetNameID NumberAddID AddressNumber NumberSuffix Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Text (255) Long Integer Long Integer Long Integer Long Integer Text (15) NumberAddID StreetNameID StreetPreDirectional StreetName StreetPostType PlaceID AddressSAID StreetAddID LastLineID CountyName StateName Country LastLineID LastLineEleID PlaceName IsPrincipal LastLineEleSequence PlaceID AddressSAID StreetAddID LandmarkID MailingID SubAddressSAID PlaceID AddressSAID SubAddressSAID S bAdd SAID SubAddressEleID SubAddressType SubAddressIdentifier SubAddEleSequence Long Integer Long Integer Text (20) Text (75) Text (35) Text (35) Long Integer Long Integer Long Integer Long Integer Text (25) Text (30) Text (50) Long Integer Long Integer Text (100) Text (10) Byte Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer Long Integer L I Long Integer Text (25) Text (25) Byte Target Type Object Table Object Table Target Field Calculation =20  Assign value (increment) X Coordinate of Point; GCS: NAD 83 Y Coordinate of Point; GCS: NAD 83 Null (for now) 
=20  Assign value (increment) =[PARCEL_NUM] Object Table Object Table Object Table Object Table Object Table Object Table Object Table Object Table Null (for now) =20  =[Address.AddressSAID] =[SubAddress.SubAddressSAID] =[AddressPoint.APSAID] Assign value (increment) =20  =[Address.AddressSAID] Assign value (increment) =[Address] =[StreetAddress.StreetAddID] =[StreetName.StreetNameID] Assign value (increment) =[STREETNO] Right([STREETNO],InStr([STREETNO],“ “)) =[NumberedAddress.NumberAddID] Assign value (increment) =[STREETDIR] =[STREETNAME] =[STREETSUF] [STREETSUF] =20  =[Address.AddressSAID] =[StreetAddress.StreetAddID] Assign value (increment) "Eagle" "Colorado" "United States" =[LastLine.LastLineID] Assign value (increment) =[LOCCITY] = "Yes" =1 =20  =[Address.AddressSAID] =[StreetAddress.StreetAddID] =[Landmark.LandmarkID] =[MailAddress.MailingID] Assign value (increment) =20  =[Address.AddressSAID] =[SubAddress.SubAddressSAID] [S bAdd S bAdd SAID] Assign value (increment) "Unit" =[STREETALP] =1
  17. 17. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Custer County
      AddressNumber and AddressSuffix
        If IsNumeric([CusterAddressPoints.Address]) = 0 then
          if [CusterAddressPoints.Address] = '' or [CusterAddressPoints.Address] = '???' then
            [NumberedAddress.AddressNumber] IS NULL
          else if InStr([CusterAddressPoints.Address], " ") = 0 then
            [NumberedAddress.AddressNumber] = CLng(Left([CusterAddressPoints.Address], InStr([CusterAddressPoints.Address], "-") - 1))
            and [NumberedAddress.NumberSuffix] = Mid([CusterAddressPoints.Address], InStr([CusterAddressPoints.Address], "-") + 1)
          else
            [NumberedAddress.AddressNumber] = CLng(Left([CusterAddressPoints.Address], InStr([CusterAddressPoints.Address], " ") - 1))
            and [NumberedAddress.NumberSuffix] = Mid([CusterAddressPoints.Address], InStr([CusterAddressPoints.Address], " ") + 1)
        else
          [NumberedAddress.AddressNumber] = [CusterAddressPoints.Address]
      StreetName and PlaceName
        If [CusterAddressPoints.Roadname] contains "Westcliffe" then
          [StreetName.StreetName] = Left([CusterAddressPoints.Roadname], InStr([CusterAddressPoints.Roadname], "Westcliffe") - 1)
          and [LastLineElement.PlaceName] = "Westcliffe"
        else if [CusterAddressPoints.Roadname] contains "Silver Cliff" then
          [StreetName.StreetName] = Left([CusterAddressPoints.Roadname], InStr([CusterAddressPoints.Roadname], "Silver Cliff") - 1)
          and [LastLineElement.PlaceName] = "Silver Cliff"
        else
          [StreetName.StreetName] = [CusterAddressPoints.Roadname]
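The AddressNumber and suffix rule for Custer County can be sketched in Python as follows. This is an illustrative reading of the slide's logic, not the production script; the function name `split_address_number` and the decision to return the raw text as the suffix when no numeric prefix can be found are assumptions.

```python
def split_address_number(raw):
    """Return (address_number, number_suffix) for one source Address value."""
    text = (raw or "").strip()
    # Empty values and the '???' placeholder carry no address number.
    if text in ("", "???"):
        return None, None
    # Purely numeric values pass through unchanged.
    if text.isdigit():
        return int(text), None
    # Prefer a space as the separator; fall back to a hyphen when there is none.
    sep = " " if " " in text else "-"
    head, found, tail = text.partition(sep)
    if found and head.isdigit():
        return int(head), tail.strip() or None
    # Assumption: anything else is kept whole for manual review.
    return None, text
```

For example, `split_address_number("123-B")` yields `(123, "B")`, while a value such as `"12A"` is left unparsed because the slide's rule only splits on a space or a hyphen.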
  18. 18. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY “Errata”
      Archuleta: Lats and Longs; Side from Boolean to Text (TranSegSide domain)
      Bent: Address Numbers are text (not numeric)
      Chaffee: Address Numbers are text (not numeric); ZIPCodes are text (not numeric)
      Custer: Address Number Suffixes must be separated from Address Numbers
      Eagle: Address Number Suffixes must be separated from Address Numbers
      El Paso ‐ Teller: ZIPCodes are text (not numeric); ZIPCode+4’s are text (not numeric)
      Fremont: Address Numbers are text (not numeric)
      Garfield: Address Numbers are text (not numeric); ZIPCodes are text (not numeric); Some Intersection Addresses
      Grand: Address Numbers are text (not numeric); ZIPCodes are text (not numeric); Some Range Addresses
      Huerfano: Single attribute for expressed street address
      Kit Carson: Address Numbers are text (not numeric); Some address numbers appear to be street names (e.g. 1st 1st Street); ZIPCodes are text (not numeric); Two Range Addresses
      La Plata: AddressUIDs are text (not numeric); Address Numbers are text (not numeric); ZIPCodes are text (not numeric); Complex LocationDescription [TWNRNG_TXT, SECT_TXT, ALQPARTS, BLOCK, LOT, LANDTRACT]
      Logan: Address Numbers are double (not integer); ZIPCodes are text (not numeric); A dozen or so Address Number Suffixes must be separated from Address Numbers; One Range Address
      Mesa: Address Numbers are text (not numeric); ZIPCodes are text (not numeric); 57 ZIPCodes have values of “816XX” or “815XX”
      Moffat: Address Numbers are text (not numeric); ZIPCodes are text (not numeric); Several Range Addresses; One Address Number Suffix must be separated from its Address Number
      Montezuma: Address Numbers are text (not numeric); ZIPCodes are text (not numeric); One Address Number Suffix must be separated from its Address Number
      Park: AddressUIDs are text (not numeric); AddressUIDs are not unique (many unpopulated); AddressUIDs are larger than an integer field – must be 17 characters long; Address Numbers are double (not integer); ZIPCodes are text (not numeric); Many SubAddress Identifiers must be separated from Address Numbers; Many Range Addresses; Many ZipCode values are ‘ ‘
      Pitkin: Address Numbers are double (not integer); ZIPCodes are text (not numeric); About 6 Address Number Suffixes must be separated from Address Numbers; GlobalID is text (not GUID)
      Pueblo: AddressUIDs are text (not numeric) and not unique
      Routt: Community Names; One record with SubAddress length over 25 characters; LandmarkName, FeatureType, and Location all contain a mix of functional descriptions and landmark names. FeatureType should be functional values. LandmarkName should be proper names of facilities. Location should be necessary directions to arrive at the address correctly and successfully.
      San Luis Valley: Address Numbers are text (not integer)
      Summit: Address Numbers are text (not integer); Address Number Suffixes must be separated from Address Numbers
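Many of the errata above are data type mismatches: address numbers stored as text or double, and ZIP codes containing placeholders such as “816XX” or blanks. A small Python sketch of how such values might be screened before loading is shown here. The function names are hypothetical, and returning the ZIP code as a validated five-digit string (leaving any cast to the loader) is an assumption about the target field.

```python
import re

ZIP5 = re.compile(r"\d{5}")


def clean_zip(value):
    """Return a five-digit ZIP string, or None for blanks and placeholders."""
    text = str(value).strip() if value is not None else ""
    return text if ZIP5.fullmatch(text) else None


def clean_address_number(value):
    """Coerce a text or double address number to an integer, or None."""
    if isinstance(value, float):
        # Doubles such as 123.0 are accepted; 123.5 is not a whole number.
        return int(value) if value.is_integer() else None
    text = str(value).strip()
    return int(text) if text.isdigit() else None
```

Values that come back as `None` are candidates for the manual review the errata list implies, for example address numbers like "1st" that appear to be street names.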
  19. 19. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Land Use types to LBCS Structure Codes Land Use types to LBCS Structure Codes Bldg_Type RES PARTIAL EXEMPT SINGLE FAMILY RESIDENTIAL SINGLE FAMILY DETACHED SINGLE FAMILY LOW‐RISE CONDOMINUM DETACHED CONDOMINUM MIXED USE RESIDENTIAL W/COMMERCIAL OR INDUST RES POLITICAL SUB EXEMPT LOW‐RISE CONDOMINIUM RES RELIGIOUS EXEMPT RES CHARITABLE EXEMPT RES STATE EXEMPT DETACTED SINGLE FAMILY DETACHED SINGLE FAMILY Mail dlvry at 1783 pe MIXED USE  RESIDENTIAL / RELIGIOUS EXEMPT MIXED USE ‐ RESIDENTIAL / RELIGIOUS EXEMPT MIXED USE RESIDENTIAL W/ COMMERCIAL OR INDUS RES COUNTY EXEMPT RESIDENTIAL‐AGRICULTURAL ATTACHED TOWNHOUSE ATTACHED CONDOMINUM NON‐CONFORMING RESIDENCE‐SINGLE FAMILY CONDOMINIUM LOW‐RISE APARTMENTS ATTACHED CONDOMINIUM ATTACHED CONDOMINIUM ATTACHED RESIDENTIAL DUPLEX TWO FAMILY RESIDENTIAL MID‐RISE CONDOMINUM HIGH‐RISE CONDOMINUM MOBILE HOME MOBILE HOME PARK MOBILE HOME SITE‐UNIMPROVED MOBILE HOME SITE MOBILE HOME (STRUCTURE ONLY) MOBILE HOME (STRUCTURE ONLY) 2 & 3 FAMILY UNIT APARTMENTS APARTMENT CONVERSION 4‐8 UNIT APARTMENTS MULTI‐UNITS(9 AND UP) CONDOMINUM CONVERSION 2‐4 FAMILY RESIDENTIAL APARTMENT MILITARY HOUSING RESIDENTIAL DORMATORIES/NURSING HOMES RESIDENTIAL SF ASSISTED LIVING RESIDENTIAL DORMNATORIES/NURSING HOMES RES PRIVATE SCHOOL EXEMPT RESIDENTIAL DORMATORIES/ NURSING HOMES HOTEL‐MOTELS Frequency LBCSStructCode LBSCStructDesc 3 1000Residential buildings 2 1100Single‐family buildings 1 1100Single‐family buildings 125848 1110Detached units 15858 1110Detached units 143 1110Detached units 128 1110Detached units 79 1110Detached units 56 1110Detached units 44 1110Detached units 29 1110Detached units 25 1110Detached units 12 1110Detached units 1 1110Detached units 1 1110Detached units 1110Detached units 1 1110Detached units 1 1110Detached units 1 1110Detached units 19400 1120Attached units 7201 1120Attached units 122 1120Attached units 65 1120Attached units 22 1120Attached units 14 1120Attached 
units 1120Attached units 4 1120Attached units 3 1121Duplex structures 2988 1140Townhouses 384 1140Townhouses 2063 1150Manufactured housing 42 1150Manufactured housing 4 1150Manufactured housing 2 1150Manufactured housing 1 1150Manufactured housing 1150M f t dh i 1836 1200Multifamily structures 1544 1200Multifamily structures 997 1200Multifamily structures 612 1200Multifamily structures 42 1200Multifamily structures 12 1200Multifamily structures 1 1200Multifamily structures 1 1200Multifamily structures 149 1310Barracks 57 1320Dormatories 44 1320Dormatories 6 1320Dormatories 3 1320Dormatories 1 1320Dormitories 75 1330Hotels, motels, and tourist courts
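A lookup like the one on this slide is commonly implemented as a dictionary keyed on a normalized land-use label. The Python sketch below is illustrative only. The five label-to-code pairs are inferred from the order of the slide's columns and from the LBCS descriptions, so they should be checked against the original table. The spelling fixes cover variants that appear in the source values (DETACTED, CONDOMINUM), and all names are hypothetical.

```python
import re

# Assumed pairs, inferred from the slide; verify against the source table.
LBCS_BY_LAND_USE = {
    "DETACHED SINGLE FAMILY": 1110,  # Detached units
    "DUPLEX": 1121,                  # Duplex structures
    "MOBILE HOME": 1150,             # Manufactured housing
    "MILITARY HOUSING": 1310,        # Barracks
    "HOTEL-MOTELS": 1330,            # Hotels, motels, and tourist courts
}

# Misspellings seen in the county land-use values.
SPELLING_FIXES = {"DETACTED": "DETACHED", "CONDOMINUM": "CONDOMINIUM"}


def normalize(label):
    """Upper-case, unify hyphens and whitespace, and repair known misspellings."""
    text = label.upper().replace("\u2010", "-")
    text = re.sub(r"\s+", " ", text).strip()
    for bad, good in SPELLING_FIXES.items():
        text = re.sub(rf"\b{bad}\b", good, text)
    return text


def lbcs_code(label):
    """Return the LBCS structure code for a land-use label, or None if unmapped."""
    return LBCS_BY_LAND_USE.get(normalize(label))
```

Normalizing before lookup collapses near-duplicate labels into one key, which is why the slide's frequency column shows several low-count variants of the same building type.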
  20. 20. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Automated Processing
      1. Means of Transfer
         • data.colorado.gov (Socrata w/Mondara)
         • OpenColorado.org (a CKAN implementation)
         • (SSH protocol by October 2013)
           – For those who specifically must control access to data
      2. Conversion to Common Data Model
         – Correlate local address datasets to Common Data Model
         – Conversion via scripting (SQL, VBScript, or Python)
      3. Loading of data into state‐wide database
  21. 21. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality
      Two Tracks:
      1. Develop criteria and measure quality
         • Develop quality measures in relation to ISO standards
         • Draw from measures in standards and practice
      2. Compare for potential corrective actions
         • Master Street Address Guide (MSAG) and ALI
         • US Postal Service Address Quality Improvement DBs
         • Statewide Voter Registration System (SCORE)
         • Motorist Insurance Identification Database (MIIDB)
      Present criteria and comparisons by 4Q CY 2013 Address Working Group meeting
  22. 22. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Address Data Quality Brainstorm
  23. 23. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality ‐ Status
       Reviewed ISO Geographic information data quality elements
       Developed quality measurement concepts related to ISO data quality
        – Added database integrity concepts (e.g. database normalization, referential integrity, etc.) not well addressed in geographic quality standards to data quality spreadsheet
       Itemized:
        – Tests in Chapter 4 Address Data Quality, FGDC‐STD‐016‐2011US…Address Data Standard
        – Requirements for data quality from NENA 02‐014 GIS Data Collection and Maintenance and NENA 71‐501 Synchronizing GIS Databases with MSAG and ALI
       Compared datasets to identify completeness and currency, including:
        – Colorado State Address Dataset vs. Master Street Address Guide (MSAG)
        – Colorado State Address Dataset vs. USPS Coding Accuracy Support System (CASS)
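The dataset comparisons listed above come down to finding addresses present in one source but not the other. A minimal Python sketch using set differences follows. The key of (number, street, place) and the function names are assumptions for illustration; a real comparison against MSAG would work with address ranges rather than individual addresses.

```python
def address_key(number, street, place):
    """Build a case- and whitespace-insensitive comparison key."""
    return (str(number).strip(),
            " ".join(street.upper().split()),
            " ".join(place.upper().split()))


def compare_datasets(state_rows, reference_rows):
    """Split addresses into matched, state-only, and reference-only sets."""
    state = {address_key(*row) for row in state_rows}
    reference = {address_key(*row) for row in reference_rows}
    return {
        "matched": state & reference,
        # Possible commission in the state data, or a gap in the reference.
        "only_in_state": state - reference,
        # Possible omission from the state data.
        "only_in_reference": reference - state,
    }
```

The two difference sets line up with the completeness subelements used in the deck: commission for records that should not be there, omission for records that are missing.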
  24. 24. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality Measurement Concepts D Q li M C QualityElement completeness QualitySubelement commission omission logical consistency MeasurementConcept ComparisonwithImagery(recent) Duplicate features PermittingandLicensing DateofAddition ParcelIDRelationship CorrelationwithMSAG/ALI FieldVerification CorrelationwithMSAG/ALI FieldVerification CheckotherDatabasesinProcess CoordinationwithinProcesses KnowledgefromOthersespExperts ComparisonwithBuildingFootprints ConsistencywithPolicyonAddressAssignment conceptual consistency Database Normalization Entity Integrity Referential integrity  Domain integrity User‐defined integrity constraints domain consistency format consistency topological consistency Definition Notes Building Permits Business Licenses If ParcelID is unknown it is manually placed If ParcelID is unknown‐  it is manually placed Sampling Sampling InsertionwithinAddressRelatedProcesses Parcel vs. Address based software Prior to 2003 in Loveland ‐ inconsistency Third normal form (3NF), and most often  Sequence  Many‐to‐one Every non‐key a ribute field must provid 1 Every table must have a primary key Every field value in a table must exist as a value in another field in the database.  Speci Every element from a relation should respectthe range of values that the element can  Consistency (inconsistency) with assignment e.g. no zipcodes for addresses that don't h AdditionstoDomains domain list doesn't exist domain value that doesn't exist not correct domain duplicate domain values comission missing domain values omission field datatype e.g. only numeric values in a text field like address numbers being numeric field length field precision/scale e.g. 
lat/long values without sufficient significant digits order of fields Logical/PhysicalDataModelComparison Fields in the right FeatureClass FeatureClass in the right FeatureDataset duplicate fields commission must spec missing fields omission in referen respectivespecific want‐to‐have missing fields sensitive fields (not for public consumption) should be related but not contained in the Compliance to industry standards (FGDC, NENA, etc.) e.g. pre‐parsed addresses as per standard(s) e.g. no abbreviations, etc. Comparison of expressed complex elements with composed complex elements AddressNumberFishbonesMeasure LeftRightParity Esp. if generated from geometry.  If not, i Sequence of address assignment The identification of parity inconsistency  Extent of address ranges Inherently topological (with centerline rangeEsp. if gen new address without range upd MSAG Ranges must be equal to or fall within centerline ranges ALI  postitional accuracy absolute accuracy relative accuracy Nathan to write a white‐paper on sampling and positional accuracy (and completeness) measurement gridded data position accuracy temporal accuracy accuracy of a time measurement create date inherited date transaction date date last updated active matched/unmatched inactive proposed/retired "forensic" addressin effective date rollback? parent‐child temporal records temporal consistency
  25. 25. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Potential Tests: FGDC-STD-016-2011US…Address Data Standard QualityElement completeness completeness completeness completeness completeness completeness completeness logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency l i l i t logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency logical consistency g y logical consistency positional accuracy temporal accuracy temporal accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy QualitySubelement commission commission omission omission omission omission omission conceptual consistency conceptual consistency conceptual consistency conceptual consistency conceptual consistency conceptual consistency conceptual consistency format consistency format consistency (relationship) format consistency (relationship) format consistency (relationship) topological consistency topological consistency topological consistency topological consistency t l i l i t topological consistency topological consistency topological consistency topological consistency topological consistency topological consistency topological consistency topological consistency topological consistency p g y domain consistency relative accuracy temporal consistency temporal consistency non‐quantitative attribute correctness non‐quantitative attribute correctness non‐quantitative attribute correctness quantitative attribute accuracy quantitative attribute accuracy quantitative attribute accuracy quantitative attribute accuracy quantitative attribute 
accuracy Reference 4.5.27 4.5.35 4.5.1 451 4.5.29 4.5.7 4.5.37 4.5.13 4.5.6 4.5.8 4.5.15 4.5.20 4.5.22 4.5.23 4.5.24 4.5.14 4.5.12 4.5.25 4.5.33 4.5.5 4.5.5.1 4.5.5.2 4.5.5.3 4553 4.5.5.4 4.5.5.5 4.5.9 4.5.10 4.5.11 4.5.16 4.5.19 4.5.30 4.5.31 4.5.34 4.5.2 4.5.4 4.5.32 4.5.21 4.5.3 4.5.28 4.5.17 4.5.26 4.5.36 4.5.38 4.5.18 MeasureName MeasureDescription RelatedElementUniquenessMeasure This measure checks the uniqueness of the values related to a given element, in either the UniquenessMeasure This measure tests the uniqueness of a simple or complex value. AddressCompletenessMeasure This measure compares the number of addressable objects with the address information  This measure compares the number of addressable objects with the address information RelatedNotNullMeasure This measure checks the completeness of data related to another part of the address.  AddressNumberRangeCompletenessMeasure Check for a low and high value in each Two Number Address Range or Four Number XYCoordinateCompletenessMeasure This measure checks for coordinate pairs with one member missing. The query produces CompleteElementSequenceNumberMeasure This measure requires assembling a complex element in order by Element Sequence AddressNumberParityMeasure Test agreement of the odd/even status of the numeric value of an address number with the AddressNumberRangeParityConsistencyMeasure Test agreement of the odd/even status of the numeric value of low and high address DeliveryAddressTypeSubaddressMeasure This measure checks for null Complete Subaddress values where the Delivery Address LeftRightOddEvenParityMeasure This measure tests the association of odd and even values in each Two Number Address LowHighAddressSequenceMeasure This measure confirms that the value of the low address is less than or equal to the high OfficialStatusAddressAuthorityConsistencyMeasure This measure tests logical agreement of the Official Status with the Address Authority. 
OverlappingRangesMeasure This measure checks the sequence of numbers where one non‐zero Two Number Address DataTypeMeasure This measure uses pattern matching to test for data types. It is common for delimited text CheckAttachedPairsMeasure This measure describes how to check Attached Element attributes set to "attached" for PatternSequenceMeasure This measure tests the sequence of values in each complex element for conformance to SubaddressComponentOrderMeasure This measure tests Subaddress Elements against the component parts in the order AddressNumberFishbonesMeasure This measure generates lines between addressed locations and the corresponding locations Addresses without fishbones This may show an address with a Complete Street Name value that doesn't match anything  Addresses with fishbones that touch other fishbones Address Number values may have been assigned out of order. Another possibility,  Addresses with fishbones that cross centerlines Add ith fi hb th t t li There may be inconsistencies in the Complete Street Name values recorded in the  Th b i i t i i th C l t St t N l d d i th Addresses with long fishbones These may indicate variations in street names that need to be resolved, especially when a  Addresses with suspected bowtie fishbones These frequently indicate address ranges that inappropriately begin with zero (0). AddressRangeDirectionalityMeasure* This measure derives Address Range Directionality values, allowing update to and/or AddressReferenceSystemAxesPointOfBeginningMeasure This measure checks for a common point to describe the intersection of the Address AddressReferenceSystemRulesMeasure Address Reference System layers are essential for both address assignment and quality DuplicateStreetNameMeasure In many Address Reference Systems distantly disconnected street segments with the IntersectionValidityMeasure Check intersection addresses for streets that do not intersect in geometry. 
SegmentDirectionalityConsistencyMeasure Check consistency of street segment directionality, which affects the use of Two Number SpatialDomainMeasure p This measure tests values of some simple elements constrained by domains based on p y TabularDomainMeasure This measure tests each value for a simple element for agreement with the corresponding AddressElevationMeasure This measure checks each elevation in an address point collection against polygons AddressLifecycleStatusDateConsistencyMeasure This measure tests the agreement of the Address Lifecycle Status with the development StartEndDateOrderMeasure Test the logical ordering of the start and end dates. LocationDescriptionFieldCheckMeasure This measure describes checking the location description in the field. AddressLeftRightMeasure This measure checks stored values describing left and right against those found by  RelatedElementValueMeasure This measure checks the logical consistency of data related to another part of the address. ElementSequenceNumberMeasure Element Sequence Number values must begin at 1 and increment by 1. This measure RangeDomainMeasure* This measure tests each Address Number for agreement with ranges. Address Number USNGCoordinateSpatialMeasure This measure tests the agreement between the location of the addressed object and the XYCoordinateSpatialMeasure This measure compares the coordinate location of the addressed object with the FutureDateMeasure This measure produces a list of dates that are in the future. 
TestOn AddressPtCollection; AddressPtCollection; AddressPtCollection AddressPtCollection StCenterlineCollection; AddressPtCollection AddressPtCollection; AddressPtCollection StCenterlineCollection; AddressPtCollection StCenterlineCollection StCenterlineCollection AddressPtCollection StCenterlineCollection AddressPtCollection; AddressPtCollection; AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection Add PtC ll ti AddressPtCollection AddressPtCollection StCenterlineCollection; StCenterlineCollection StCenterlineCollection StCenterlineCollection StCenterlineCollection StCenterlineCollection AddressPtCollection; ; AddressPtCollection; AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection StCenterlineCollection AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection TestAgainst AddressPtCollection; AddressPtCollection; AddressPtCollection AddressPtCollection StCenterlineCollection; AddressPtCollection AddressPtCollection; StCenterlineCollection StCenterlineCollection; AddressPtCollection StCenterlineCollection StCenterlineCollection AddressPtCollection StCenterlineCollection; AddressPtCollection; AddressPtCollection; AddressPtCollection AddressPtCollection StCenterlineCollection Available t StCenterlineCollection StCenterlineCollection StCenterlineCollection StC t li C ll ti StCenterlineCollection StCenterlineCollection StCenterlineCollection; AddressReferenceSystem AddressReferenceSystem StCenterlineCollection StCenterlineCollection StCenterlineCollection AddressPtCollection; ; AddressPtCollection; AddressPtCollection AddressPtCollection AddressPtCollection AddressPtCollection StCenterlineCollection StCenterlineCollection AddressPtCollection StCenterlineCollection AddressPtCollection AddressPtCollection AddressPtCollection
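Several of the measures in this table reduce to simple predicates. The Python sketch below illustrates four of them based only on the one-line descriptions on the slide. It is not the query text from the FGDC standard, and the function names are hypothetical.

```python
from datetime import date


def parity(number):
    """Return 'even' or 'odd' for an address number."""
    return "even" if number % 2 == 0 else "odd"


def low_high_in_sequence(low, high):
    """LowHighAddressSequenceMeasure idea: low must not exceed high."""
    return low <= high


def range_parity_consistent(low, high):
    """AddressNumberRangeParityConsistencyMeasure idea: both ends share parity."""
    return parity(low) == parity(high)


def element_sequence_valid(sequence_numbers):
    """ElementSequenceNumberMeasure idea: values begin at 1 and increment by 1."""
    return sorted(sequence_numbers) == list(range(1, len(sequence_numbers) + 1))


def future_dates(dates, today=None):
    """FutureDateMeasure idea: list the dates that fall after today."""
    today = today or date.today()
    return [d for d in dates if d > today]
```

Passing `today` explicitly keeps the date check reproducible, which helps when the same test run has to be compared across quarterly deliveries.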
  26. 26. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality Requirements: NENA 02-014 GIS Data Collection and Maintenance QualityElement positional accuracy positional accuracy positional accuracy positional accuracy positional accuracy positional accuracy positional accuracy QualitySubelement relative accuracy relative accuracy relative accuracy relative accuracy relative accuracy relative accuracy relative accuracy completeness commission; omission thematic accuracy non‐quantitative attribute correctness temporal accuracy temporal validity Reference 2.1 2.2 2.2 3.1 3.1 3.1 3.1 4 4.1 4.1.1 4.1.1 4.1.2 4.1.3 4.1.4 4.2 logical consistency format consistency 4.3 4.3 thematic accuracy non‐quantitative attribute correctness 4.3 (i) completeness commission 4.3 (ii) logical consistency topological consistency 4.3.1 (iii) logical consistency topological consistency 4.3.2 (iii) thematic accuracy non‐quantitative attribute correctness 4.3.3 (iii) thematic accuracy non‐quantitative attribute correctness 4.3.1 (iv) logical consistency topological consistency 4.3.2 (iv) logical consistency topological consistency 4.3.3 (iv) logical consistency conceptual consistency (referential integ 4.3.4 (i) logical consistency topological consistency 4.3.4 (ii) thematic accuracy non‐quantitative attribute correctness 4.3.4 (iii) logical consistency topological consistency 4.3.4 (iv) logical consistency topological consistency 4.3.5 (v) 5.1 temporal accuracy temporal validity 5.1 temporal accuracy temporal validity 5.1 temporal accuracy temporal validity 5.1 5.1 temporal accuracy temporal validity 5.2 temporal accuracy temporal validity 5.2 5.2 positional accuracy relative accuracy 5.2 (1) thematic accuracy quantitative accuracy temporal accuracy temporal validity 5.2 (2) 5.2 (3) temporal accuracy temporal validity 5.2 (4) temporal accuracy temporal validity 5.2 (5) thematic accuracy non‐quantitative attribute correctness 5.2 (7) temporal accuracy temporal validity 5.2 (8) temporal 
accuracy temporal validity 5.2 (9) thematic accuracy quantitative accuracy 5.2 (10) thematic accuracy non‐quantitative attribute correctness 5.2 (11) metadata 5.2 (12) logical consistency topological consistency 5.2 temporal accuracy temporal validity 5.3 temporal accuracy temporal validity 5.3 completeness commission; omission 5.3 thematic accuracy non‐quantitative attribute correctness temporal accuracy temporal validity 5.3 Requirement RequirementDescription Shall meet NMAS for 1:5000 The overall accuracy of GIS vector data shall meet National Map Accuracy Standards at 1:5000 Source 1:24000 or less Source map data standards are … 1:24,000 or better shall be the standard for GIS vector data Source ortho 1:2400 or less Digital Orthoimagery data or raster data standard shall be 1:2400 or better GPS data collected 10 feet horizontal accuracy at 95% confidence GPS data shall be collected with accuracy of 10 feet (3.048 meters) or less 95% of the time  Minimum 30 positions collected at 1 second intervals For point features it is recommended that a minimum of 30 positions be collected at 1  At least 4 satellites One should always acquire at least 4 satellites. Differentially corrected ALWAYS do differential corrections, either real‐time or post processed. Annual validation(s): … attributes and spatial features of the GIS data shall be validated at a minimum of once a  Validate against Automatic Location Inforamtion (ALI) and the  … compar[e] the GIS data and either the entire ALI Data Base or a data base of daily service  Master Street Address Guide (MSAG) order changes that have “passed edit entry,” Compare GIS data to ALI …ensure … that each ALI address is represented in the GIS data layer. 
… IF ALI data is not  Compare GIS data to MSAG …compare GIS data with MSAG road ranges, road names and Emergency Service Numbers  Compare GIS data to tax assessment information Compare address points and road centerline address ranges with tax assessment  Compare GIS data to utility meter address information …ensure that the GIS data … includes site and street address information that corresponds to  Compare GIS data to building permit issuances … verify address ranges on a GIS centerline dataset … [with] points located on building with  Compare GIS data to orthoimagery or sattelite imagery … for positional accuracy validation Assure that the GIS data includes Sites, Roads, Road Names and At a minimum, the digital mapping system shall include the following GIS data and spatial  Assure that the following audits are performed for each of thes audits with accompanying metadata: Check for valid attribute values  for Sites, Roads, Road Names Assure no duplication of any feature  for Sites, Roads, Road Names Assure parity of addresses is consistent (right side odd, left ev for Sites and Roads Assure line segment connectivity  for Roads (vital for network and routing analysis) Assure road names in MSAG/ALI agree with road names in GIS  for Road Names Assure site addresses match ALI Data Base  for Sites Eliminate overlapping address ranges within the same ESN  for Roads (and perhaps Sites) Assure address ranges includes all ranges in MSAG  for Roads Assure that relationships between ESZs, ESNs, and ESAs are coEmergency Service Zones (ESZs), Emergency Service Numbers (ESNs), and Emergency  Remove empty (null/sliver) polygons Emergency Service Zones (ESZs) Assure ESZ and ESN information matches to MSAG/ALI Emergency Service Zones (ESZs) Assure coincident geometry within ESZ layer Eliminate gap and overlapping polygons in Emergency Service Zones (ESZs) Assure coincident geometry between ESZs and jurisdictional bESZ Boundaries should be joined to jurisdictional boundaries 
where appropriate (e.g. roads, r Consistently apply a program to identify and correct errors … a consistently applied program to identify and correct errors. Provide timely updates to telecommunicators Timeliness of the update of the GIS data is key to maintaining an accurate map data layer …  Provide timely updates to PSAPs The updated GIS data layer shall be provided to the PSAPs in a timely manner. Provide GIS updates within five business days of address receipt … GIS updates be processed as part of the Enhanced 9‐1‐1 GIS data within five business days  Personnel must be qualified and trained to maintain GIS data … personnel [must be] qualified and trained to maintain GIS data [and] must understand the  Update road centerlines as structures are constructed or demolis As structures are constructed or demolished, the road centerline layer must be updated to  Update one‐way and closed roads as they occur Maintenance of the one‐way and closed roads within the map’s road centerline layer  To geocode accurately, maintain in road centerline layers: In order to geocode accurately, the road centerline layer requires maintenance of: coordinate locations coordinate locations name changes name changes new roads new roads changed addressing start/end points changed addressing start/end points the turn‐table (one‐way status) the turn‐table (one‐way status) road classifications for symbology and routing (including overpa road classifications for symbology and routing (including overpass/underpass/no‐turn attrib address range changes address range changes municipality annexations municipality annexations speed limit or impedance field speed limit or impedance field municipal route number field; and municipal route number field; and source data field, source data field. 
Where possible, use road centerlines as boundary lines for ESZs It is recommended that the ESZ boundary be joined to the road centerline where the road for Add or delete address points as buildings are constructed or demAs buildings are constructed or demolished, SITE points need to be added or deleted. Modify address points when sites require a change in address.  Existing sites may require a change of address. Every ALI address is represented in GIS … every address in the ALI Data Base matches to an address in the GIS data layer. (see also 4.1 Receive notices of change of address from: 1) the addressing authority, 2) as a discrepancy between the GIS data and a service order chan
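One of the audits above, eliminating overlapping address ranges within the same ESN, can be sketched as a pairwise check. The Python example below is a simplified illustration: the record layout (`id`, `esn`, `street`, `low`, `high`, `parity`) and the parity codes "O", "E", and "B" (both) are assumptions, and the pairwise comparison is only suitable for small groups of segments.

```python
from itertools import combinations


def ranges_overlap(a, b):
    """Two ranges overlap if they can share a side and their intervals intersect."""
    same_side = a["parity"] == b["parity"] or "B" in (a["parity"], b["parity"])
    return same_side and a["low"] <= b["high"] and b["low"] <= a["high"]


def overlapping_ranges(segments):
    """Return (id, id) pairs of overlapping ranges on the same street and ESN."""
    groups = {}
    for seg in segments:
        groups.setdefault((seg["esn"], seg["street"]), []).append(seg)
    return [(a["id"], b["id"])
            for group in groups.values()
            for a, b in combinations(group, 2)
            if ranges_overlap(a, b)]
```

Grouping by ESN and street first means an odd range and an even range on the same block, or identical ranges in different ESNs, are not reported as conflicts.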
  27. 27. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality Requirements: NENA 71-501 Synchronizing GIS Databases with MSAG and ALI
      QualityElement QualitySubelement: temporal accuracy accuracy of a time measurement thematic accuracy non‐quantitative attribute correctness thematic accuracy non‐quantitative attribute correctness thematic accuracy thematic accuracy thematic accuracy thematic accuracy completeness thematic accuracy completeness completeness non‐quantitative attribute correctness quantitative attribute accuracy non‐quantitative attribute correctness non‐quantitative attribute correctness commission non‐quantitative attribute correctness omission omission thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy thematic accuracy logical consistency non‐quantitative attribute correctness non‐quantitative attribute correctness non‐quantitative attribute correctness quantitative attribute accuracy non‐quantitative attribute correctness non‐quantitative attribute correctness classification correctness classification correctness domain consistency thematic accuracy non‐quantitative attribute correctness completeness commission
      Reference | Requirement | RequirementDescription
      1 | Synchronization should be performed by qualified staff | The synchronization process of the GIS data is most reliably accomplished by qualified, traine
      3 | Develop a process to identify and quickly correct errors | ...develop a process that will consistently identify errors or discrepancies in the data and quic
      3 | Perform an analysis of discrepancies, estimate time, and then correct errors | The amount of time to correct the data and eliminate errors cannot be estimated until an
      3 | Temporal accuracy and speedy updating is essential | All GIS, MSAG, and ALI data must be continuously updated with the newest information and
      3 | Consolidate and standardize (independently) both MSAG and GIS data | an agency specific workflow can be implemented to consolidate and standardize the MSAG
      3 | Compare MSAG and GIS data for accuracy and completeness: | Once the MSAG and GIS databases are standardized, they need to be compared for accuracy
      3 | Prepare and standardize data, make initial corrections, synchronize, prepare d | The … synchronization process … can be broken down into Data Preparation, Data
      3 | GIS and MSAG must match 98% of the time before being used for ERDB or LoST | … a minimum match rate of 98% be set prior to using the GIS data in the Emergency Routing
      3.1 | Standardize and quality review GIS street centerline and MSAG data before co | Standardization and quality control processing must take place on the GIS street centerline
      3.1.1 | Compare MSAG and GIS data and identify: | A ... comparison of ... GIS street centerline data and the MSAG will identify …
      3.1.1 | Different road naming conventions | Different road naming conventions
      3.1.1 | Inaccurate address ranges | Inaccurate address ranges
      3.1.1 | Improper MSAG Community designations | Improper MSAG Community designations
      3.1.1 | Improper Postal Community designations | Improper Postal Community designations
      3.1.1 | Improper Exchange designations | Improper Exchange designations
      3.1.1 | Incorrect ESN assignments | Incorrect ESN assignments
      3.1.1 | Incomplete or missing records | Incomplete or missing records
      3.1.1 | Road segments w/o addressed structures found in GIS but not in MSAG | Roads may be in the GIS that are not in the MSAG because the GIS roads do not have
      3.1.1 | Standardize GIS street centerlines and the MSAG as follows: | Standardization of the GIS road centerline data and the MSAG data should incorporate the
      3.1.1 | Use only the eight cardinal/bi‐cardinal directions and their abbreviations | N, S, E, W, NE, NW, SE, or SW are the only prefix and suffix directional abbreviations which
      3.1.1 | Avoid all punctuation | All punctuation should be avoided.
      3.1.1 | Eliminate special characters | Remove special characters (dash, underscore, apostrophe, quotes or any other special
      3.1.1 | Use only whole numbers in house number fields | Use only whole numbers in the house number fields (fractional house numbers belong in
      3.1.1 | Spell‐out street names as assigned by addressing authorities | Use complete spelling of the legal street name assigned by the addressing authority (e.g.
      3.1.1 | Spell‐out Postal or Community names | Spell out the complete MSAG and Postal Community name.
      3.1.1 | Abbreviate directions only when they are not part of the street name | Prefix directional is only abbreviated when not part of the actual street name (North Dr wou
      3.1.1 | | Post directional abbreviated when they are not the actual street name. (Lone Pine Dr South
      3.1.1 | Only abbreviate USPS Pub 28 Appx C1‐listed suffixes | Standardize street suffix according USPS Publication No. 28 – Appendix C1
      3.1.1 | Standardization of address information is for data interoperability and exchan | … standardization must take place on the 9‐1‐1 databases to ensure interoperability and to
      3.1.1 | Encourage best practice address standardization in address authorities | ...educate the local addressing authorities that standardization will improve quality, lower
      3.1.1 | All MSAG, ALI, and GIS road naming conventions must be consistent | The street naming conventions should be consistent in the GIS street centerline, the MSAG
      3.1.1 | Standardization of address information must occur both with MSAG and GIS | The standardization process should take place in both the MSAG and the GIS databases.
      3.1.1 | Agree to the number of changes that can occur per unit of time | Since the number of changes to the databases may be quite high, all involved parties must
      3.1.1 | Request the MSAG | Request the MSAG from your Data Base Management System provider.
      3.1.1 | Load MSAG into worksheet or database | Load the MSAG into a worksheet or database format, with each field being in a separate colum
      3.1.1 | Save the file | Save the MSAG file (e.g. Initial MSAG).
      3.1.1 | Save a copy of the file | Save another copy of the MSAG under a different name (e.g. Copy of MSAG).
      3.1.1 | Open the copy | Open the copy of the MSAG (e.g. Copy of MSAG).
      3.1.1 | Only remove records from the copy | Do not delete any records out of the original MSAG, only removing certain records from the "C
      3.1.1 | Sort by MSAG Community, delete FX records, and flag unpopulated MSAG C | Sort the data by MSAG COMMUNITY and delete any FX Records in the “Copy of MSAG”. Make n
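The 98% match requirement and the punctuation and special-character rules listed on this slide can be illustrated with a small check. The following is only a sketch under stated assumptions: the function names `normalize` and `match_rate` are hypothetical, street names are compared as plain strings, and real MSAG/GIS synchronization would also compare address ranges, communities, and ESNs.

```python
import re

# Threshold quoted on the slide for using GIS data with ERDB or LoST.
MIN_MATCH_RATE = 0.98


def normalize(street: str) -> str:
    """Uppercase, drop punctuation and special characters, collapse spaces.

    Assumption: this is a simplified stand-in for the standardization rules;
    it does not abbreviate directionals or street suffixes.
    """
    cleaned = re.sub(r"[^A-Za-z0-9 ]", "", street)
    return " ".join(cleaned.upper().split())


def match_rate(msag_streets, gis_streets) -> float:
    """Share of MSAG street names that also appear in the GIS street names."""
    gis = {normalize(s) for s in gis_streets}
    msag = [normalize(s) for s in msag_streets]
    if not msag:
        return 0.0
    matched = sum(1 for s in msag if s in gis)
    return matched / len(msag)


if __name__ == "__main__":
    rate = match_rate(["Main St.", "O'Brien Rd", "Elm Ave"], ["MAIN ST", "OBRIEN RD"])
    print(f"match rate {rate:.2%}, meets threshold: {rate >= MIN_MATCH_RATE}")
```

Normalizing both sides before comparing keeps cosmetic differences (case, periods, apostrophes) from being counted as discrepancies, so the remaining mismatches are the ones that need review.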
  28. 28. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY USPS Address Quality Improvement Processes
      1. Locatable Address Conversion System (LACS)
         – 911 address conversions from rural to street addresses
      2. Coding Accuracy Support System (CASS)
         – Correction or Addition of ZIPCode+4
         – Validation of Postal Place Names and States
         – Street Names, Street Types, and Directionals
           • Conflicts identify that corrections may be needed for either address authorities/9‐1‐1 or postal service, etc.
      3. Delivery Point Validation (DPV)
         – IsMailableAddress = “Yes”
      4. Address Element Correction (AEC)
         – To be determined…
  29. 29. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY USPS Comparisons: Coding Accuracy Support System (CASS)
  30. 30. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality: USPS State
      • Street segments with more than one name
      • AEC II
        – May resolve addresses with multiple Street Names?
        – May identify addresses without mail delivery (e.g. P.O. Boxes)
      • CASS Errorcodes 412, 413, 491 are not clearly understood in all cases
  31. 31. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality Elements
      ISO 19157 Geographic information ‐ Data quality defines comprehensive definitions and testing guidance to measure data quality:
      completeness: presence and absence of features, their attributes and relationships
        • commission: excess data present in a dataset
        • omission: data absent from a dataset
      logical consistency: degree of adherence to logical rules of data structure, attribution and relationships (data structure can be conceptual, logical or physical)
        • conceptual consistency: adherence to rules of the conceptual schema
        • domain consistency: adherence of values to the value domains
        • format consistency: degree to which data is stored in accordance with the physical structure of the dataset
        • topological consistency: correctness of the explicitly encoded topological characteristics of a dataset
      positional accuracy: accuracy of the position of features
        • absolute (or external) accuracy: closeness of reported coordinate values to values accepted as or being true
        • relative (or internal) accuracy: closeness of the relative positions of features in a dataset to their respective relative positions accepted as or being true
        • gridded data position accuracy: closeness of gridded data position values to values accepted as or being true
      temporal accuracy: accuracy of the temporal attributes and temporal relationships of features
        • accuracy of a time measurement: correctness of the temporal references of an item (reporting of error in time measurement)
        • temporal consistency: correctness of ordered events or sequences, if reported
        • temporal validity: validity of data with respect to time
      thematic accuracy: accuracy of quantitative attributes and the correctness of non‐quantitative attributes and of the classifications of features and their relationships
        • classification correctness: comparison of the classes assigned to features or their attributes to a universe of discourse (e.g. ground truth or reference dataset)
        • non‐quantitative attribute correctness: correctness of non‐quantitative attributes
        • quantitative attribute accuracy: accuracy of quantitative attributes
  32. 32. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Size
      Sample Size and Confidence Interval Tutorial
      The confidence interval (commonly referred to as the margin of error or error rate) is the plus-or-minus figure you hear mentioned relative to surveys or opinion polls. For example, if you use a confidence interval of 4 and 47% of your sample picks an answer you can be "sure" that if you had asked the question of the entire relevant population between 43% (47-4) and 51% (47+4) would have picked that answer. Most researchers prefer a confidence interval of less than 4 percentage points.
      The confidence level tells you how sure you can be. Expressed as a percentage, it represents how often the true percentage of the population who would pick an answer lies within the confidence interval. The 95% confidence level means you can be 95% certain; the 99% confidence level means you can be 99% certain. Most researchers use the 95% confidence level.
      When you put the confidence level and the confidence interval together, you can say (for example) that you are 95% sure that the true percentage of the population is between 43% and 51%. The wider the confidence interval (higher margin of error) you are willing to accept, the more certain you can be that the whole population answers would be within that range.
      For example, if you asked a sample of 1000 people in a city which brand of cola they preferred, and 60% said Brand A, you can be very certain that between 40 and 80% (80% confidence interval) of all the people in the city actually do prefer that brand. However, you cannot be so sure that between 59 and 61% (99% confidence interval) of the people in the city prefer the brand.
      http://williamgodden.com/tutorial.pdf
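The relationship between sample size, confidence level, and confidence interval described in the tutorial can be made concrete with the usual normal-approximation formula for a proportion. This is a sketch, not taken from the slides: the function name `margin_of_error` and the z-score table are assumptions introduced for illustration.

```python
import math

# Standard normal z-scores for common confidence levels (assumed lookup table).
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error, in percentage points, for a sample proportion p.

    Uses the normal approximation z * sqrt(p * (1 - p) / n), which assumes
    a simple random sample from a large population.
    """
    z = Z_SCORES[confidence]
    return 100 * z * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    # Cola example from the tutorial: 1000 people, 60% prefer Brand A.
    for level in (0.95, 0.99):
        print(level, round(margin_of_error(0.60, 1000, level), 2))
```

For the cola example this gives roughly 3 percentage points at the 95% level and roughly 4 at the 99% level, which shows the trade-off the tutorial describes: more certainty requires accepting a wider interval.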
  33. 33. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Size http://williamgodden.com/samplesizeformula.pdf
  34. 34. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Size http://williamgodden.com/samplesizeformula.pdf
  35. 35. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Size With a confidence interval of 3 percentage points and a 95% confidence level:
      Columns: PlacePreMod, PlaceName, PlacePostMod, AddressPopulation2012, InFiniteSampleSize2012, FiniteSampleSize2012
      Arapahoe County 209531 1067 1062 Otero County Archuleta County 7805 1067 939 Ouray County 3379 1067 811 Baca County Park County 15414 1067 998 Bent County County Chaffee County Cheyenne County Crowley County Phillips 1067 1028 Prowers County 12822 1067 985 Pueblo County 84672 1067 1054 Rio Grande County 3532 1067 820 Routt County 14170 1067 992 San Juan County County Custer 716 County Clear Creek 1067 28164 City And County of Broomfield 2171 1067 886 1067 916 1067 1063 County Sedgwick County Summit County 20485 1067 1014 Weld 6485 258910 City And County of Denver Dolores 5215 County 100399 1067 1056 County Douglas County 121194 1067 1058 Yuma Eagle County 34736 1067 1035 City of Aspen Garfield County 20269 1067 1014 City of Centennial Gilpin County 3548 1067 821 Grand County 21051 1067 1016 Huerfano County Jackson County Kiowa County Kit Carson La Plata 1067 947 1067 1041 5808 1067 902 215521 1067 1062 County 3290 1067 806 County 30174 1067 1031 Lake County 7702 1067 937 Larimer County 156186 1067 1060 City of Commerce City 24674 1067 1023 City of Fort Collins 80913 1067 1053 City of Grand Junction County Jefferson 8427 42772 74767 1067 1052 22565 1067 1019 254898 1067 1063 Fremont County Regional GIS Authority El Paso‐Teller County Enhanced 911 Authority Las Animas County Emergency Telephone Service Authority 10059 1067 965 San Luis Valley Emergency Telephone Service Authority 18254 1067 1008 Lincoln County 3169 1067 798 Logan County 16600 1067 1003 Denver County 1 School District 272600 1067 1063 Moffat County 6175 1067 910 North Central All‐Hazards Region 1333483 1067 1066 Montezuma County 15819 1067 1000 Southern Ute Indian Reservation 588 1067 379 Morgan County West Region GIS Group 52808 1067 1046
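The infinite and finite sample sizes in this table are consistent with the standard sample size formula for a proportion followed by a finite population correction. The sketch below assumes z = 1.96 (95% confidence level), p = 0.5 (the most conservative proportion), and c = 0.03 (3 percentage points); the function names are hypothetical and the slides do not show the exact computation used.

```python
def infinite_sample_size(z: float = 1.96, p: float = 0.5, c: float = 0.03) -> float:
    """Sample size for a very large population: z^2 * p * (1 - p) / c^2."""
    return z ** 2 * p * (1 - p) / c ** 2


def finite_sample_size(population: int, z: float = 1.96, p: float = 0.5,
                       c: float = 0.03) -> float:
    """Apply the finite population correction: ss / (1 + (ss - 1) / population)."""
    ss = infinite_sample_size(z, p, c)
    return ss / (1 + (ss - 1) / population)


if __name__ == "__main__":
    print(round(infinite_sample_size()))  # infinite-population sample size
    # Address populations taken from rows of the table above.
    for name, population in [("Arapahoe County", 209531),
                             ("Archuleta County", 7805),
                             ("Southern Ute Indian Reservation", 588)]:
        print(name, population, round(finite_sample_size(population)))
```

With these assumptions the results round to 1067 for the infinite case and to 1062, 939, and 379 for the three populations shown, matching the corresponding table rows. The correction matters mostly for small populations; for large ones the finite sample size approaches 1067.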
  36. 36. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method
      1. Randomly select 5 address points
      2. Select road segments associated with address points
      3. Select adjacent connected road segments
      4. Select the address points associated with the selected road segments
      5. Repeat steps 3 & 4 until sample size is exceeded
  37. 37. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method 1. Randomly select 5 address points
  38. 38. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method 2. Select road segments associated with address points
  39. 39. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method 3. Select adjacent connected road segments
  40. 40. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method 4. Select the address points associated with the selected road segments
  41. 41. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method 5. Repeat steps 3 & 4 until sample size is exceeded
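The five sampling steps can be sketched as a small graph expansion. This is an illustrative sketch only: the slides perform these selections in GIS software, and the data structures here (`address_to_road` mapping each address point to one road segment, `road_neighbors` mapping each road segment to its connected segments) and the function name `network_sample` are assumptions.

```python
import random


def network_sample(address_to_road, road_neighbors, target_size,
                   n_seeds=5, rng=None):
    """Grow a sample outward along the road network from random seed addresses."""
    rng = rng or random.Random()

    # Index address points by the road segment they are associated with.
    road_to_addresses = {}
    for address, road in address_to_road.items():
        road_to_addresses.setdefault(road, set()).add(address)

    # Step 1: randomly select the seed address points.
    seeds = rng.sample(sorted(address_to_road), min(n_seeds, len(address_to_road)))
    sample = set(seeds)
    # Step 2: select the road segments associated with those address points.
    roads = {address_to_road[a] for a in seeds}

    # Step 5: repeat steps 3 and 4 until the sample size is exceeded.
    while len(sample) <= target_size:
        # Step 3: add road segments connected to the current selection.
        grown = roads | {n for r in roads for n in road_neighbors.get(r, ())}
        # Step 4: add address points associated with the selected segments.
        new_sample = sample | {a for r in grown for a in road_to_addresses.get(r, ())}
        if grown == roads and new_sample == sample:
            break  # network exhausted before reaching the target size
        roads, sample = grown, new_sample
    return sample, roads


if __name__ == "__main__":
    # Toy network: 10 road segments in a chain, 3 address points per segment.
    a2r = {f"a{i}": f"r{i // 3}" for i in range(30)}
    nbrs = {f"r{i}": {f"r{j}" for j in (i - 1, i + 1) if 0 <= j < 10}
            for i in range(10)}
    picked, used_roads = network_sample(a2r, nbrs, target_size=10,
                                        rng=random.Random(0))
    print(len(picked), len(used_roads))
```

Because each pass adds every address on the newly reached segments, the final sample usually overshoots the target, and the selected points are spatially clustered around the seeds. That clustering is the source of the bias questions raised on the "Issues" slide.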
  42. 42. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method
      [Chart: Address Points and Road Centerlines selected at each of 7 sampling steps; vertical axis 0 to 1000]
      Address Points: 5, 32, 255, 369, 539, 816, 926
      Road Centerlines: 3, 6, 101, 115, 172, 239, 303
  43. 43. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Sampling Method Issues:
      • Sample selection – why start with 5?
        – Constrained for ease of collection
      • Bias – how random must the sample be?
        – Confidence interval is set at 3 percentage points (high)
        – Will a larger sample size mitigate bias (how much)?
        – If so, is the actual confidence interval and confidence level lower?
      • Does the sample capture adequately the structure of the population?
        – Distribution of the population is assumed to be very similar to the distribution in the dataset
        – Distribution of address points correlates more highly to the distribution of road centerlines than random points on a plane
  44. 44. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality Elements (ISO 19157 data quality elements, as listed on slide 31)
  45. 45. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Completeness
      • Omissions ‐ agreed upon by many as the principal threat
      • Field collection will help measure the completeness of the population through stats
      • Comparisons with other data sets will aid in finding and resolving these omissions:
        • Master Street Address Guide (MSAG) and ALI
        • US Postal Service Address Quality Improvement DBs
        • Statewide Voter Registration System (SCORE)
        • Motorist Insurance Identification Database (MIIDB)
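Comparing the address dataset against another source to find omissions (and, conversely, commissions) amounts to a set difference on a shared address key. The sketch below is an assumption-laden illustration: `completeness_report` is a hypothetical name, the keys are assumed to be already standardized address strings, and the reference source (for example an MSAG extract) is treated as correct, which may not hold in practice.

```python
def completeness_report(dataset_keys, reference_keys):
    """Compare address keys in the dataset against a reference source.

    omission:   addresses in the reference that are absent from the dataset
    commission: addresses in the dataset that are not in the reference
    """
    dataset, reference = set(dataset_keys), set(reference_keys)
    omitted = reference - dataset
    committed = dataset - reference
    return {
        "omission": sorted(omitted),
        "commission": sorted(committed),
        "omission_rate": len(omitted) / len(reference) if reference else 0.0,
        "commission_rate": len(committed) / len(dataset) if dataset else 0.0,
    }


if __name__ == "__main__":
    state = ["100 MAIN ST", "102 MAIN ST", "5 ELM AVE"]
    reference = ["100 MAIN ST", "102 MAIN ST", "104 MAIN ST"]
    print(completeness_report(state, reference))
```

Each flagged address is a candidate for review rather than a confirmed error, since a mismatch may come from either source or from differences in how the address was standardized.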
  46. 46. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Positional Accuracy
      • Positional accuracy will be measured from sampling using the National Standard for Spatial Data Accuracy (NSSDA)
      • Primary entrances are assumed to be the “target”
        – Even structure or parcel “centroids” positional accuracy will be relative (does it matter if the point is 20 feet from door if you see the door?)
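The NSSDA reports horizontal accuracy at the 95% confidence level as 1.7308 times the radial root-mean-square error, for the case where the errors in x and y are of similar size. The sketch below shows that calculation; the function name and the coordinate-pair inputs are assumptions, and how the field-collected reference positions (for example primary entrances) are gathered is outside its scope. The NSSDA also calls for at least 20 check points, which this sketch does not enforce.

```python
import math

# NSSDA multiplier for horizontal accuracy at 95% confidence,
# applicable when RMSE in x and RMSE in y are approximately equal.
NSSDA_HORIZONTAL_FACTOR = 1.7308


def nssda_horizontal_accuracy(test_points, reference_points):
    """Return NSSDA horizontal accuracy in the units of the coordinates.

    test_points:      (x, y) positions from the dataset being assessed
    reference_points: (x, y) positions from the higher-accuracy source,
                      in the same order and coordinate system
    """
    if not test_points or len(test_points) != len(reference_points):
        raise ValueError("need equal, non-empty lists of check points")
    squared_errors = [
        (xt - xr) ** 2 + (yt - yr) ** 2
        for (xt, yt), (xr, yr) in zip(test_points, reference_points)
    ]
    rmse_r = math.sqrt(sum(squared_errors) / len(squared_errors))
    return NSSDA_HORIZONTAL_FACTOR * rmse_r


if __name__ == "__main__":
    dataset = [(103.0, 204.0), (503.0, 604.0)]
    surveyed = [(100.0, 200.0), (500.0, 600.0)]
    print(nssda_horizontal_accuracy(dataset, surveyed))
```

In the toy input every point is displaced by 3 units in x and 4 in y, so the radial RMSE is 5 and the reported accuracy is 1.7308 times that.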
  47. 47. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Temporal Accuracy
      • What temporal information is being reported (e.g. attributes or metadata) if at all?
        – Date data provided to State of Colorado
        – Standard dates provided in metadata (CSDGM)
        – Date (and time) information in attributes
      • An inventory of data will be taken
      • A scale for > to < temporal info will be created
  48. 48. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Thematic Accuracy
      Increasing Value (highest score first):
      CompleteStreetNumber | CompleteStreetName | Score
      Complete | Complete | 100
      Correct* | Complete | 80
      Complete | Correct* | 70
      Correct* | Correct* | 60
      Populated | Correct* | 50
      Correct* | Populated | 40
      Populated | Populated | 30
      Null | Populated | 20
      Populated | Null | 10
      Null | Null | 0
      *Correct means that most people can likely correctly deduce the complete street name from the value provided
  49. 49. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Logical Consistency  Fishbones!
  50. 50. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Logical Consistency
  51. 51. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Logical Consistency
  52. 52. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Data Quality – Logical Consistency
  53. 53. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Credits
      A very special thanks to:
        – Rick Smajter, City of Durango
        – Robb Menzies, Denver Public Schools
        – Matt Goetsch, City of Montrose
        – Cindy Jones, Park County
        – Heather Lassner, City of Loveland
        – Bob Bush, Fremont County GIS Authority
        – Mary Kunkel, (formerly of) El Paso-Teller E911
        – Pete Magee, San Luis Valley GIS/GPS Authority
        – Mike Sexson and Kris Schley, State of Colorado Integrated Document Solutions (IDS)
  54. 54. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY References
  55. 55. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Colorado State Address Dataset Websites
      • Governor's Office of Information Technology Colorado Broadband Data and Development Program
      • Colorado State Address Dataset (public site)
  56. 56. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Colorado State Address Dataset Websites
      • Colorado Address Working Group (access controlled)
  57. 57. GOVERNOR’S OFFICE OF INFORMATION TECHNOLOGY Colorado State Address Dataset
      Nathan Lowry, Colorado OIT
      October 9, 2013
      Nathan Lowry, GIS Outreach Coordinator
      State of Colorado, Governor's Office of Information Technology
      601 East 18th Avenue, Suite 220, Desk D‐23, Denver, CO 80203‐1494
      303.764.7801 nathan.lowry@state.co.us, http://www.colorado.gov/oit
      How am I doing? Please contact my manager Jon Gottsegen (Jon.Gottsegen@state.co.us) for comments or questions.