Colorado State Address Dataset 
Automated Processing 
Nathan Lowry, GIS Outreach Coordinator 
State of Colorado 
September 23, 2014
Common Data Model 
● Allows local and statewide querying, analysis, and integration … 
● Accommodates information exchanges 
▪ Hierarchical - City to County, County to Region, Region to State 
▪ Among neighboring jurisdictions (e.g., County to County) 
● Allows profiles to provide data in standard forms for specific objectives 
▪ NENA CLDXF for NG-911 
▪ USPS Pub-28 for CASS 
▪ ArcGIS Geocoding (for quality comparisons, etc.) 
● It is more efficient (less work) and better assures quality (less loss)
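To illustrate "one model, many profiles", here is a minimal sketch, assuming a hypothetical `CommonAddress` record that stores every component fully spelled out. Each profile is then just a rendering function. `to_pub28_profile` abbreviates using a small assumed subset of USPS Publication 28 abbreviations, and `to_full_text_profile` leaves everything spelled out. The field names, the abbreviation subset and the sample record are illustrative, not the state's actual schema.

```python
from dataclasses import dataclass

# Assumed subset of USPS Publication 28 abbreviations (directionals and suffixes).
PUB28_ABBREVIATIONS = {
    "North": "N", "South": "S", "East": "E", "West": "W",
    "Street": "ST", "Avenue": "AVE", "Boulevard": "BLVD",
    "Road": "RD", "Drive": "DR", "Lane": "LN", "Court": "CT",
}


@dataclass(frozen=True)
class CommonAddress:
    """One fully spelled-out record in a hypothetical common data model."""
    number: str
    pre_directional: str
    street_name: str
    post_type: str
    post_directional: str
    city: str
    state: str  # two-letter postal code, the one permitted abbreviation
    zip_code: str


def _abbreviate(word: str) -> str:
    """Swap a spelled-out word for its Pub-28 abbreviation, if one is listed."""
    return PUB28_ABBREVIATIONS.get(word, word).upper()


def to_pub28_profile(addr: CommonAddress) -> tuple[str, str]:
    """Render the record as an upper-case delivery line and last line."""
    parts = (addr.number, _abbreviate(addr.pre_directional), addr.street_name.upper(),
             _abbreviate(addr.post_type), _abbreviate(addr.post_directional))
    delivery_line = " ".join(p for p in parts if p)  # empty components are skipped
    last_line = f"{addr.city.upper()} {addr.state} {addr.zip_code}"
    return delivery_line, last_line


def to_full_text_profile(addr: CommonAddress) -> str:
    """Render the same record fully spelled out, e.g. for display."""
    parts = (addr.number, addr.pre_directional, addr.street_name,
             addr.post_type, addr.post_directional)
    street = " ".join(p for p in parts if p)
    return f"{street}, {addr.city}, {addr.state} {addr.zip_code}"


# Illustrative record only.
sample = CommonAddress("123", "North", "Main", "Street", "", "Denver", "CO", "80202")
```

Because the stored record is never abbreviated, adding another profile (for example, a geocoder input format) means writing one more rendering function rather than re-cleaning the source data.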
FGDC-STD-016-2011 
United States Thoroughfare, Landmark, and Postal Address Data Standard 
Of Greatest Significance: 
1. Everything* is ‘fully explicit’ (fully spelled out) 
No abbreviations allowed; no ambiguity 
*The only exception is two-letter state postal codes (e.g., “CO” = Colorado) 
2. You will express exactly how each address is parsed 
Parsing is no longer subject to interpretation 
The breakdown is stored in the data for each record 
3. Each address must be assigned a Unique Identifier (UID) 
Multiple representations of the same address can be “tied together” if and only if (iff) addresses are assigned UIDs. 
These are big changes that few have yet implemented 
• Our common data model is designed to accommodate both: 
‒ your current state and 
‒ this “to be” state
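The sketch below shows why "fully explicit" values and UIDs work well together. It is a minimal example, assuming a small hypothetical expansion table and a simple registry (`fully explicit form -> UID`) that persists between runs. Variant spellings are normalized, and every variant that normalizes to the same form is tied to one randomly generated UID. Matching on a normalized string is a simplifying assumption, not the standard's prescribed method.

```python
import uuid

# Assumed expansion table; a production table would follow the standard's domains.
EXPANSIONS = {
    "N": "North", "S": "South", "E": "East", "W": "West",
    "ST": "Street", "AVE": "Avenue", "AV": "Avenue", "BLVD": "Boulevard",
    "RD": "Road", "DR": "Drive", "LN": "Lane", "CT": "Court",
}


def fully_explicit(address: str) -> str:
    """Expand known abbreviations and normalize case so variants compare equal."""
    tokens = address.replace(".", "").split()
    # capitalize() keeps ordinals like "14th" intact (title() would give "14Th").
    return " ".join(EXPANSIONS.get(t.upper(), t.capitalize()) for t in tokens)


def tie_to_uids(raw_addresses, registry=None):
    """Map each raw representation to a UID shared by all of its variants.

    `registry` (fully explicit form -> UID) is meant to persist between runs,
    so an address keeps its UID; previously unseen forms get a new random UUID.
    """
    registry = {} if registry is None else registry
    links = {}
    for raw in raw_addresses:
        key = fully_explicit(raw)
        if key not in registry:
            registry[key] = uuid.uuid4()
        links[raw] = registry[key]
    return links


links = tie_to_uids(["123 N Main St", "123 North Main Street",
                     "123 n. main st.", "200 W 14th Ave"])
```

A token table like this one cannot tell "St" meaning Street from "St" meaning Saint (as in "St Vrain"). That ambiguity is exactly why the standard stores each record's parsed breakdown instead of re-deriving it from a string.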
Presuppositions: 
● SQL Server Integration Services (SSIS) 
o Parallel processing - fast translations - True. 
o Most compatible with SQL Server - Irrelevant* 
o Developed by DBAs for DBAs - No, developed by app developers for app developers 
▪ (i.e., normalization tools) - Hah, hah, hah, hah, hah! 
o No additional cost - (This one bore out) 
o I learned French instead of Spanish - (SSIS instead of Python) 
● No parsing 
o I will translate, but it’ll be the locals’ responsibility to pre-parse... - No parsing, no geocoding* 
o In addition, no last lines, no geocoding* 
● 6-8 weeks of processing - 6-8 months of processing
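The original "translate, don't parse" plan amounts to column renaming per jurisdiction, with jurisdictions processed concurrently. Here is a minimal Python sketch of that idea, assuming hypothetical jurisdiction names and local column names in `FIELD_MAPS`. It is not the SSIS package itself. Note that values pass through untouched, so an unparsed local value stays unparsed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical field maps: local column name -> common-model field name.
FIELD_MAPS = {
    "county_a": {"ADDR_NUM": "number", "ST_NAME": "street_name", "ST_TYPE": "post_type"},
    "city_b": {"HOUSENO": "number", "STREET": "street_name", "SUFFIX": "post_type"},
}


def translate(jurisdiction: str, records: list[dict]) -> list[dict]:
    """Rename columns into the common model; values pass through unparsed."""
    field_map = FIELD_MAPS[jurisdiction]
    return [
        {"source": jurisdiction,
         **{common: rec.get(local, "") for local, common in field_map.items()}}
        for rec in records
    ]


def translate_all(batches: dict[str, list[dict]]) -> dict[str, list[dict]]:
    """Translate each jurisdiction's batch concurrently, one task per batch."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(translate, name, recs) for name, recs in batches.items()}
        return {name: future.result() for name, future in futures.items()}


out = translate_all({
    "county_a": [{"ADDR_NUM": "10", "ST_NAME": "Main", "ST_TYPE": "Street"}],
    "city_b": [{"HOUSENO": "5", "STREET": "Elm", "SUFFIX": "Court"}],
})
```

Threads keep the sketch simple. For CPU-bound translation in CPython, `ProcessPoolExecutor` is the usual way to get true parallelism.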
Automating Processes
Colorado State Address Dataset 
Automated and Manual Processes
Automating Processes
Observations 
● SQL Server Integration Services (SSIS) 
○ SSIS is quirky 
○ SSIS Expression Language is Swahili 
○ A modeling canvas may be more effective for design 
○ SSIS can integrate with many other server processes (FTP) 
● Parsing and “Last Lining” will give CO jurisdictions a leg up 
○ The level of effort can be significant 
○ CLDXF Street Naming and Address Numbering Conventions 
● Standards 
○ Jurisdictional pretypes, sequencers - minor tweaks 
○ Subaddress conventions need ... something
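To show the kind of parsing effort involved, here is a minimal rule-based sketch that splits one delivery line into components loosely modeled on CLDXF elements. The lookup tables and component names are assumptions for illustration. Pretypes (e.g., "County Road 12"), number suffixes, and last lining (assigning city, state and ZIP) are not handled.

```python
# Assumed lookup tables; abbreviations and full words both map to the spelled-out form.
DIRECTIONALS = {"N": "North", "S": "South", "E": "East", "W": "West",
                "NORTH": "North", "SOUTH": "South", "EAST": "East", "WEST": "West"}
STREET_TYPES = {"ST": "Street", "STREET": "Street", "AVE": "Avenue", "AVENUE": "Avenue",
                "RD": "Road", "ROAD": "Road", "DR": "Drive", "DRIVE": "Drive",
                "BLVD": "Boulevard", "BOULEVARD": "Boulevard", "CT": "Court", "COURT": "Court"}
SUBADDRESS_TYPES = {"APT": "Apartment", "APARTMENT": "Apartment",
                    "STE": "Suite", "SUITE": "Suite", "UNIT": "Unit"}


def parse_street_line(line: str) -> dict:
    """Split one delivery line into hypothetical CLDXF-like components."""
    tokens = line.replace(",", " ").replace(".", "").split()
    parts = dict.fromkeys(["number", "pre_directional", "street_name", "post_type",
                           "post_directional", "subaddress_type", "subaddress_id"], "")
    # A trailing "<type> <id>" pair such as "Apt 4B" is treated as the subaddress.
    if len(tokens) >= 2 and tokens[-2].upper() in SUBADDRESS_TYPES:
        parts["subaddress_type"] = SUBADDRESS_TYPES[tokens[-2].upper()]
        parts["subaddress_id"] = tokens[-1]
        tokens = tokens[:-2]
    if tokens and tokens[0].isdigit():
        parts["number"] = tokens.pop(0)
    # Peel from the right first (post directional, then type) so "West Street"
    # keeps "West" as the name; each step leaves at least one token for the name.
    if len(tokens) > 1 and tokens[-1].upper() in DIRECTIONALS:
        parts["post_directional"] = DIRECTIONALS[tokens.pop().upper()]
    if len(tokens) > 1 and tokens[-1].upper() in STREET_TYPES:
        parts["post_type"] = STREET_TYPES[tokens.pop().upper()]
    if len(tokens) > 1 and tokens[0].upper() in DIRECTIONALS:
        parts["pre_directional"] = DIRECTIONALS[tokens.pop(0).upper()]
    parts["street_name"] = " ".join(tokens)
    return parts


p = parse_street_line("123 E Colfax Ave Apt 4B")
```

The order of the rules is a design choice. Resolving the right-hand elements first avoids misreading a street named "West" as a directional. Subaddress handling is deliberately minimal because conventions for subaddresses vary the most.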
Opportunities 
● Standards 
○ Improvement via implementation 
○ Coalescence on Subaddresses 
● Common implementations of data models 
○ Reduce the cost of development 
○ Make code sharing useful and possible 
● Common code 
○ Shared parsing tools 
○ Shared applications
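As one example of a small shared tool, here is a sketch of a conformance check for the "fully explicit" rule. It flags abbreviated tokens in component fields and allows the single exception, a two-letter state postal code. The field names and the abbreviation list are assumptions about a common data model, not a published rule set.

```python
# Assumed list of abbreviations that should not survive into fully explicit fields.
KNOWN_ABBREVIATIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW",
                       "ST", "AVE", "RD", "DR", "BLVD", "LN", "CT", "APT", "STE"}
# Hypothetical field names from a common data model.
CHECKED_FIELDS = ("pre_directional", "post_type", "post_directional", "subaddress_type")


def explicitness_issues(record: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    issues = []
    for field in CHECKED_FIELDS:
        value = record.get(field, "")
        if any(tok.upper() in KNOWN_ABBREVIATIONS for tok in value.replace(".", " ").split()):
            issues.append(f"{field}: {value!r} looks abbreviated")
    # The one permitted abbreviation: two-letter state postal codes such as "CO".
    state = record.get("state", "")
    if state and not (len(state) == 2 and state.isalpha() and state.isupper()):
        issues.append(f"state: {state!r} should be a two-letter postal code")
    return issues
```

A shared check like this can run unchanged in each jurisdiction before data is submitted, so every jurisdiction applies the same rules.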
Questions? 
Thank You!