Developing and Implementing
Spatial ETL Processes
with Open Source Tools
Matthew Baker
Denver Public Schools
Dep't of Planning and Analysis
Quick FOSS4G Update
● Replaced ArcSDE with PostGIS
– Dev, QA (Prod TBA...)
● 'Non-GIS' users using QGIS
● Editable PostGIS Views!
● Data-driven cartography
– OSM data with styles saved in PostGIS
● Manager now using QGIS
Database Structure
[Diagram; labels: SQL Server, Dev, QA, Prod, PostGIS, ArcGIS, Enterprise, maps.dpsk12.org]
ETL:
Extract
Transform
Load
ETL Process Needs
● PostGIS read/write
● SQL Server Spatial read/write
● Daily Updating of Tables
● Daily Building of Datasets
● Daily Delivery to DPS Enterprise
Goals of ETL Development
● Break dependency on GUI-based tools
● Overcome 'other' FOSS ETL Tools
– Geokettle
– GDAL
● Avoid commercial ETL tools
– SSIS
– FME
Creating New Tables
● Import Shapefiles to Dev
– QGIS DB Manager
● Import Non-Spatial Tables
– CSVKit - Python via command line
– Read CSV Schema
● Generate SQL ‘Create Table’
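The slide names csvkit for this step. As a rough stand-in that needs only the standard library, the sketch below reads a CSV header and its rows, guesses a type for each column, and prints a 'Create Table' statement. The type rules, the function names, and the sample data are illustrative assumptions and do not reproduce csvkit's own inference logic.

```python
import csv
import io


def guess_type(values):
    """Return a PostgreSQL type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    # Try the narrowest type first and fall back to text.
    for caster, sql_type in ((int, "integer"), (float, "double precision")):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "text"


def create_table_sql(csv_file, table_name):
    """Build a CREATE TABLE statement from a CSV header and its rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(reader)
    columns = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in rows]
        columns.append('    "{}" {}'.format(name, guess_type(col_values)))
    return 'CREATE TABLE "{}" (\n{}\n);'.format(table_name, ",\n".join(columns))


# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO(
    "schnum,school_name,enrollment_pct\n101,East,0.93\n102,West,0.88\n")
ddl = create_table_sql(sample, "schools_import")
print(ddl)
```

The generated statement can then be reviewed by hand before it is run, which is useful because type guesses from a small sample are easy to get wrong.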
PostGIS and Spatial “Text”
select
geom
from dpsdata."Schools_Current"
PostGIS and Spatial “Text”
select
cast(geom as varchar)
from dpsdata."Schools_Current"
PostGIS and Spatial “Text”
select
ST_AsText(geom)
from dpsdata."Schools_Current"
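The three queries differ in what comes back. The first two return the geometry as a hex-encoded binary string (PostGIS's EWKB form), while ST_AsText returns readable well-known text such as POINT(x y). As a sketch of what the hex string holds, the block below packs and then decodes a 2D point with an SRID using only the standard library. The coordinates and SRID 4326 are made-up sample values, and the decoder deliberately handles only this one simple case.

```python
import struct

SRID_FLAG = 0x20000000  # EWKB type flag: an SRID follows the type word
POINT_TYPE = 1


def decode_point_ewkb(hex_text):
    """Decode a hex EWKB 2D point into (srid, x, y); srid is None if absent."""
    raw = bytes.fromhex(hex_text)
    endian = "<" if raw[0] == 1 else ">"  # first byte: 1 = little-endian
    (type_word,) = struct.unpack(endian + "I", raw[1:5])
    if type_word & ~SRID_FLAG != POINT_TYPE:
        raise ValueError("only 2D points are handled in this sketch")
    offset = 5
    srid = None
    if type_word & SRID_FLAG:
        (srid,) = struct.unpack(endian + "I", raw[offset:offset + 4])
        offset += 4
    x, y = struct.unpack(endian + "dd", raw[offset:offset + 16])
    return srid, x, y


# Build a sample value shaped like a query result (made-up point).
sample_hex = struct.pack(
    "<BIIdd", 1, POINT_TYPE | SRID_FLAG, 4326, -104.99, 39.74).hex().upper()
srid, x, y = decode_point_ewkb(sample_hex)
print(sample_hex)
print("SRID=%d;POINT(%s %s)" % (srid, x, y))
```

Because the hex string carries the SRID and coordinates intact, it can be passed between databases as ordinary text, which is what the cast to varchar relies on.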
PostGIS Spatial Transformation
select
ST_Transform(geom, 2877)
from dpsdata."Schools_Current"
select
ST_AsText(ST_Transform(geom, 2877))
from dpsdata."Schools_Current"
Python for Databases
● Pypyodbc
– Pure python implementation of pyodbc
– Connect to databases using ODBC
● MS SQL Server
● Psycopg2
– PostgreSQL adapter (libraries) for Python
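Both libraries take a connection string. A minimal sketch of building one for each is below. The ODBC driver name, server names, and credentials are assumptions for illustration, and the right driver name depends on what is installed on the machine. The connect calls are shown as comments so the block runs without either library.

```python
def odbc_connection_string(server, database, user, password,
                           driver="ODBC Driver 17 for SQL Server"):
    """Build an ODBC connection string for SQL Server (driver name is assumed)."""
    parts = [
        ("DRIVER", "{%s}" % driver),
        ("SERVER", server),
        ("DATABASE", database),
        ("UID", user),
        ("PWD", password),
    ]
    return ";".join("%s=%s" % (key, value) for key, value in parts)


def postgres_dsn(host, dbname, user, password):
    """Build a keyword/value connection string of the kind psycopg2 accepts."""
    return "host=%s dbname=%s user=%s password=%s" % (host, dbname, user, password)


# Hypothetical server, database, and credential values.
mssql = odbc_connection_string("sqlserver01", "gisdata", "etl_user", "secret")
pg = postgres_dsn("FOSS4GLin01", "dpspgisqa", "dpsdata", "secret")
print(mssql)
print(pg)

# With the libraries installed, the strings would be used like this:
#   import pypyodbc, psycopg2
#   conn_mssql = pypyodbc.connect(mssql)
#   conn_pg = psycopg2.connect(pg)
```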
SQL Inserts
insert
into tablename (column1, column2)
values ('value1', 'value2')
SQL Inserts with Parameters
INSERT INTO "Schools_Current"
(school_name, abbreviation, elem,
mid, high, schnum, geom,
classification)
VALUES (?, ?, ?, ?, ?, ?, ?, ?);*
* postgresql syntax (? is the ODBC/pypyodbc placeholder style; psycopg2 uses %s)
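To show why parameters matter, here is a small sketch using Python's built-in sqlite3 module as a stand-in database (it accepts the same ? placeholder style). The table layout follows the column list above with geom held as plain text, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    create table schools_current (
        school_name text, abbreviation text, elem integer, mid integer,
        high integer, schnum integer, geom text, classification text
    )
""")

# One placeholder per column; the driver handles quoting and escaping.
sql = """
    insert into schools_current
        (school_name, abbreviation, elem, mid, high, schnum, geom, classification)
    values (?, ?, ?, ?, ?, ?, ?, ?)
"""

# Invented sample row; the apostrophe would break a hand-built SQL string.
row = ("O'Connell Academy", "OCA", 1, 0, 0, 999, "POINT(0 0)", "Elementary")
cur.execute(sql, row)
conn.commit()

stored = cur.execute("select school_name, geom from schools_current").fetchone()
print(stored)
conn.close()
```

The number of placeholders must match the number of columns, otherwise the driver rejects the statement before any data is written.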
Python ETL Pattern
Connect to databases
Truncate Destination
Insert into Destination
Select from Source
Python ETL Pattern
● Connect to databases
– Source
– Destination
● Set Up Cursors
● Select from Source
– Use SQL Expression (with spatial function)
– Assign data to rows (in memory)
● Truncate Destination (before inserting)
● Insert into Destination
– Create insert statement with parameters
– Iterate through rows (data)
– Assign row values to variables
– Commit data with Insert
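The steps in this pattern can be sketched end to end with two in-memory SQLite databases standing in for the source and destination servers. The function name copy_table, the table layout, and the sample rows are assumptions for illustration. SQLite has no TRUNCATE, so a DELETE stands in for it, and geometry is carried as text.

```python
import sqlite3


def copy_table(source_conn, dest_conn, select_sql, dest_table, columns):
    """Empty dest_table, then load it with the rows returned by select_sql."""
    cur_source = source_conn.cursor()
    cur_dest = dest_conn.cursor()

    # Select from source and hold the rows in memory.
    rows = cur_source.execute(select_sql).fetchall()

    # Empty the destination (DELETE stands in for TRUNCATE here).
    # Table and column names are formatted in, so they must be trusted values.
    cur_dest.execute("delete from {}".format(dest_table))

    # Parameterized insert, one placeholder per column.
    insert_sql = "insert into {} ({}) values ({})".format(
        dest_table, ", ".join(columns), ", ".join("?" for _ in columns))
    for row in rows:
        cur_dest.execute(insert_sql, row)

    # Commit once, so the delete and inserts land together.
    dest_conn.commit()
    return len(rows)


source = sqlite3.connect(":memory:")
source.execute("create table address_master (addressid integer, geom text)")
source.executemany("insert into address_master values (?, ?)",
                   [(1, "POINT(1 1)"), (2, "POINT(2 2)")])

dest = sqlite3.connect(":memory:")
dest.execute("create table address_master (addressid integer, geom text)")
dest.execute("insert into address_master values (99, 'stale row')")

copied = copy_table(source, dest, "select addressid, geom from address_master",
                    "address_master", ["addressid", "geom"])
result = dest.execute(
    "select addressid, geom from address_master order by addressid").fetchall()
print(copied, result)
```

Emptying the table and inserting inside one transaction means a failed run leaves the previous day's data in place rather than an empty table.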
Example: PostGIS to PostGIS
import psycopg2

# Source (Dev) and destination (QA) connections, one cursor each
connSource = psycopg2.connect(
    "host=arcgisdev01 dbname=dpspgisdev user=dpsdata password=***")
curSource = connSource.cursor()
connDest = psycopg2.connect(
    "host=FOSS4GLin01 dbname=dpspgisqa user=dpsdata password=***")
curDest = connDest.cursor()

# Select from source; the cast returns each geometry as hex-encoded text
curSource.execute('''
    select addressid, cast(geom as varchar) from public."Address_Master"
''')

# Parameterized insert; psycopg2 uses %s placeholders
sql = '''
    insert into dqmt.Address_Master (addressid, geom) values (%s, %s)
'''

# Insert every source row, then commit once
rows = curSource.fetchall()
for row in rows:
    data = [row[0], row[1]]
    curDest.execute(sql, data)
connDest.commit()

connSource.close()
connDest.close()
Deployed Processes
● Daily Active Students
– Extract from MSSQL View joining geometry to students
– Deliver to PostGIS and MSSQL
● Refresh Boundaries
– PostGIS Materialized Views
● Geocoding
● Enterprise Delivery
– Schools and Boundaries
– Shared Enrollment Zone Info
– Current Addresses and Boundary Information (spatial join)
Deployment
● Microsoft Windows Server
– Task Scheduler
– (still doesn't run FME / ArcPY scripts)
Ubuntu Server Deployment
● Cron Task Scheduler
0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries_All.py
0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries...
0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py
0 3 * * * python /home/dpspgisqa/scripts/Schools_Projected.py
* * * * * /folder/runThisFile.py
| | | | |
| | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
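Read against the diagram, an entry such as 0 3 * * * runs at 03:00 every day. As a small illustration, the sketch below splits a crontab line into the five named fields plus the command. The field names and the function split_cron_line are chosen here for readability and are not part of cron itself.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Split a crontab entry into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most six pieces; the last is the command
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


entry = "0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py"
schedule, command = split_cron_line(entry)
print(schedule)
print(command)
```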
Other Python Tricks
● Error Handling
– On script fail
● Send Email
● Insert message to database
● Run single SQL Script
– Within 1 database
● Bulk Inserts
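A rough sketch of two of these tricks follows: catching a failed job, writing the message to a log table, and preparing an email, plus a bulk insert with executemany. SQLite stands in for the real databases, and the table names, job names, and email addresses are hypothetical. Sending the email is left as a comment. For PostGIS targets, psycopg2 also offers psycopg2.extras.execute_values for larger batches.

```python
import sqlite3
import traceback
from email.message import EmailMessage


def run_job(job, log_conn, job_name):
    """Run job(); on failure, log the error and return a prepared email."""
    try:
        job()
        return None
    except Exception:
        detail = traceback.format_exc()
        log_conn.execute(
            "insert into etl_log (job_name, message) values (?, ?)",
            (job_name, detail))
        log_conn.commit()
        msg = EmailMessage()
        msg["Subject"] = "ETL job failed: " + job_name
        msg["From"] = "etl@example.org"        # hypothetical address
        msg["To"] = "gis-team@example.org"     # hypothetical address
        msg.set_content(detail)
        # Sending is omitted; smtplib.SMTP(host).send_message(msg) would do it.
        return msg


db = sqlite3.connect(":memory:")
db.execute("create table etl_log (job_name text, message text)")
db.execute("create table target (id integer, label text)")


def good_job():
    # Bulk insert: one call runs the statement for every row in the list.
    db.executemany("insert into target values (?, ?)",
                   [(1, "a"), (2, "b"), (3, "c")])
    db.commit()


def bad_job():
    # Fails on purpose: the table does not exist.
    db.execute("insert into missing_table values (1)")


ok_result = run_job(good_job, db, "load_target")
fail_result = run_job(bad_job, db, "broken_load")
loaded = db.execute("select count(*) from target").fetchone()[0]
logged = db.execute("select job_name from etl_log").fetchall()
print(ok_result, loaded, logged, fail_result["Subject"])
```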
Next Steps
● Implement PostGIS Prod server
– CentOS (new IT staff!!!)
● Document Internally
● Share Externally
– github.com/DPSSpatial
● Web maps
– Internal
– External
THANK YOU!
planning@dpsk12.org
github.com/dpsspatial

2016 FOSS4G Track: Developing and Implementing Spatial ETL Processes with Open Source Tools, by Matthew Baker

  • 1. Developing and Implementing Spatial ETL Processes with Open Source Tools Matthew Baker Denver Public Schools Dep't of Planning and Analysis
  • 2. Quick FOSS4G Update ● Replaced ArcSDE with PostGIS – Dev, QA (Prod TBA...) ● 'Non-GIS' users using QGIS ● Editable PostGIS Views! ● Data-driven cartography – OSM data with styles saved in PostGIS ● Manager now using QGIS
  • 3. SQL Server Dev QA Prod PostGIS ArcGIS Enterprise maps.dpsk12.org Database Structure
  • 5. ETL Process Needs ● PostGIS read/write ● SQL Server Spatial read/write ● Daily Updating of Tables ● Daily Building of Datasets ● Daily Delivery to DPS Enterprise
  • 6. Goals of ETL Development ● Break dependency on GUI-based tools ● Overcome 'other' FOSS ETL Tools – Geokettle – GDAL ● Avoid commercial ETL tools – SSIS – FME
  • 7. Creating New Tables ● Import Shapefiles to Dev – QGIS DB Manager ● Import Non-Spatial Tables – CSVKit - Python via command line – Read CSV Schema ● Generate SQL ‘Create Table’
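
The "Read CSV Schema, Generate SQL 'Create Table'" step on slide 7 is done with CSVKit in the talk. As a rough illustration of the idea only, the sketch below uses the Python standard library rather than CSVKit. The type-inference rules, the function names, and the sample data are all assumptions made up for this example, and real school data would need more care (for instance, identifiers with leading zeros should stay text).

```python
import csv
import io


def infer_type(values):
    """Guess a PostgreSQL column type from string values (deliberately simplified)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for caster, pg_type in ((int, "integer"), (float, "double precision")):
        try:
            for v in non_empty:
                caster(v)
            return pg_type
        except ValueError:
            continue
    return "text"


def create_table_sql(csv_text, table_name):
    """Build a CREATE TABLE statement from the header and rows of a CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_type = infer_type([r[i] for r in body])
        columns.append(f'"{name}" {col_type}')
    return f'CREATE TABLE "{table_name}" (\n  ' + ",\n  ".join(columns) + "\n);"


# Hypothetical sample data, not taken from the talk.
sample = (
    "schnum,school_name,enrollment_pct\n"
    "101,Example Elementary,0.93\n"
    "202,Sample High,0.88\n"
)
ddl = create_table_sql(sample, "schools_import")
print(ddl)
```

The generated statement can then be reviewed and run against the Dev database before any rows are loaded.
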
  • 8.
  • 9. PostGIS and Spatial “Text” select geom from dpsdata."Schools_Current"
  • 10. PostGIS and Spatial “Text” select cast(geom as varchar) from dpsdata."Schools_Current"
  • 11. PostGIS and Spatial “Text” select ST_AsText(geom) from dpsdata."Schools_Current"
  • 12. PostGIS Spatial Transformation select ST_Transform(geom, 2877) from dpsdata."Schools_Current" select ST_AsText(ST_Transform(geom, 2877)) from dpsdata."Schools_Current"
  • 13. Python for Databases ● Pypyodbc – Pure python implementation of pyodbc – Connect to databases using ODBC ● MS SQL Server ● Psycopg2 – PostgreSQL adapter (libraries) for Python
  • 14. SQL Inserts insert into tablename (column1, column2) values ('value1', 'value2')
  • 15. SQL Inserts with Parameters INSERT INTO "Schools_Current" (school_name, abbreviation, elem, mid, high, schnum, geom, classification) VALUES (?, ?, ?, ?, ?, ?, ?, ?);* * postgresql syntax
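
A parameterized insert keeps the SQL text and the values separate, so the driver handles quoting. The sketch below shows this with Python's built-in sqlite3 module, used here only as a runnable stand-in for the real databases. The table layout and the sample row are assumptions. The placeholder marker depends on the driver: sqlite3 and ODBC-based drivers use `?`, while psycopg2 uses `%s`.

```python
import sqlite3

# In-memory SQLite stands in for the real destination database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE schools_current "
    "(school_name TEXT, abbreviation TEXT, schnum INTEGER, geom TEXT)"
)

# One placeholder per column; the values travel separately and are never
# concatenated into the SQL string.
sql = (
    "INSERT INTO schools_current (school_name, abbreviation, schnum, geom) "
    "VALUES (?, ?, ?, ?)"
)
row = ("O'Example School", "OES", 101, "POINT(0 0)")  # hypothetical values
cur.execute(sql, row)
conn.commit()

cur.execute("SELECT school_name, schnum FROM schools_current")
stored = cur.fetchall()
print(stored)
conn.close()
```

The apostrophe in the school name is stored intact without any manual escaping, which is the main practical benefit over building the statement with string formatting.
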
  • 16. Python ETL Pattern Connect to databases Truncate Destination Insert into Destination Select from Source
  • 17. Python ETL Pattern ● Connect to databases – Source – Destination – Set Up Cursors ● Select from Source – Use SQL Expression (with spatial function) – Assign data to rows (in memory) ● Insert into Destination – Create insert statement with parameters – Iterate through rows (data) – Assign row values to variables – Commit data with Insert ● Truncate Destination
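
The pattern on slides 16 and 17 can be sketched as one small function over two DB-API connections. This is a minimal illustration, not the author's script: `run_etl`, the table names, and the sample rows are assumptions, and sqlite3 stands in for PostGIS and SQL Server so the example runs anywhere. SQLite has no TRUNCATE, so the sketch uses DELETE in its place.

```python
import sqlite3


def run_etl(conn_source, conn_dest, select_sql, dest_table, insert_sql):
    """Select from source, empty the destination, insert the rows, commit."""
    cur_source = conn_source.cursor()
    cur_dest = conn_dest.cursor()

    # Read first, so a failed read never leaves the destination empty.
    cur_source.execute(select_sql)
    rows = cur_source.fetchall()

    # Stand-in for TRUNCATE TABLE on PostgreSQL or SQL Server.
    cur_dest.execute(f"DELETE FROM {dest_table}")
    for row in rows:
        cur_dest.execute(insert_sql, row)
    conn_dest.commit()
    return len(rows)


# Hypothetical source with two rows; geometry is carried as text.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE address_master (addressid INTEGER, geom TEXT)")
source.executemany(
    "INSERT INTO address_master VALUES (?, ?)",
    [(1, "POINT(1 1)"), (2, "POINT(2 2)")],
)
source.commit()

# Hypothetical destination holding one stale row from a previous run.
dest = sqlite3.connect(":memory:")
dest.execute("CREATE TABLE address_master (addressid INTEGER, geom TEXT)")
dest.execute("INSERT INTO address_master VALUES (99, 'POINT(9 9)')")
dest.commit()

loaded = run_etl(
    source,
    dest,
    "SELECT addressid, geom FROM address_master",
    "address_master",
    "INSERT INTO address_master (addressid, geom) VALUES (?, ?)",
)
result = dest.execute(
    "SELECT addressid, geom FROM address_master ORDER BY addressid"
).fetchall()
print(loaded, result)
```

Emptying and reloading inside a single transaction means readers of the destination see either the old rows or the new ones, provided the database treats the truncate step as transactional.
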
  • 18. Example: PostGIS to PostGIS
        import psycopg2

        connSource = psycopg2.connect("host=arcgisdev01 dbname=dpspgisdev user=dpsdata password=*** ")
        curSource = connSource.cursor()
        connDest = psycopg2.connect("host=FOSS4GLin01 dbname=dpspgisqa user=dpsdata password=*** ")
        curDest = connDest.cursor()

        curSource.execute('''
            select addressid, cast(geom as varchar)
            from public."Address_Master"
        ''')

        sql = ('''
            insert into dqmt.Address_Master (addressid, geom)
            values (%s, %s)
        ''')

        data = []
        rows = curSource.fetchall()
        for row in rows:
            data = [row[0], row[1]]
            curDest.execute(sql, data)

        connDest.commit()
        connSource.close()
        connDest.close()
  • 19. Deployed Processes ● Daily Active Students – Extract from MSSQL View joining geometry to students – Deliver to PostGIS and MSSQL ● Refresh Boundaries – PostGIS Materialized Views ● Geocoding ● Enterprise Delivery – Schools and Boundaries – Shared Enrollment Zone Info – Current Addresses and Boundary Information (spatial join)
  • 20. Deployment ● Microsoft Windows Server – Task Scheduler – (still doesn't run FME / ArcPY scripts)
  • 21. Ubuntu Server Deployment ● Cron Task Scheduler
        0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries_All.py
        0 3 * * * python /home/dpspgisqa/scripts/SchoolBoundaries...
        0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py
        0 3 * * * python /home/dpspgisqa/scripts/Schools_Projected.py

        * * * * * /folder/runThisFile.py
        | | | | |
        | | | | ----- Day of week (0 - 7) (Sunday=0 or 7)
        | | | ------- Month (1 - 12)
        | | --------- Day of month (1 - 31)
        | ----------- Hour (0 - 23)
        ------------- Minute (0 - 59)
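
Each crontab entry on slide 21 is five schedule fields followed by the command, so `0 3 * * *` means minute 0 of hour 3, every day. The short sketch below splits one of the slide's entries into named fields to make that mapping explicit. The helper name and field labels are assumptions for illustration, and it does not validate or expand the schedule values.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron_line(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most five splits: 5 fields + command
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]


schedule, command = split_cron_line(
    "0 3 * * * python /home/dpspgisqa/scripts/Schools_Current.py"
)
print(schedule)
print(command)
```
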
  • 22. Other Python Tricks ● Error Handling – On script fail ● Send Email ● Insert message to database ● Run single SQL Script – Within 1 database ● Bulk Inserts
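
The error-handling bullets on slide 22 (on script failure, send an email and insert a message into a database) could look roughly like the sketch below. It is an assumption-laden illustration rather than the deployed code: the `etl_log` table, the function names, and the example.org addresses are invented, sqlite3 stands in for the real log database, and the demo collects the email message in a list instead of contacting an SMTP server.

```python
import smtplib
import sqlite3
import traceback
from email.message import EmailMessage


def build_failure_email(script_name, error_text, sender, recipient):
    """Compose a plain-text failure notification."""
    msg = EmailMessage()
    msg["Subject"] = f"ETL failure: {script_name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(error_text)
    return msg


def send_email(msg, smtp_host):
    """Send through an SMTP relay; the host is supplied by the caller."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)


def run_with_logging(script_name, job, log_conn, notify=None):
    """Run job(); on failure, record the traceback and optionally notify."""
    try:
        job()
        return True
    except Exception:
        error_text = traceback.format_exc()
        log_conn.execute(
            "INSERT INTO etl_log (script, message) VALUES (?, ?)",
            (script_name, error_text),
        )
        log_conn.commit()
        if notify is not None:
            notify(script_name, error_text)
        return False


# Demo with a hypothetical log table and a job that always fails.
log_conn = sqlite3.connect(":memory:")
log_conn.execute("CREATE TABLE etl_log (script TEXT, message TEXT)")


def failing_job():
    raise RuntimeError("source database unreachable")


sent = []


def notify(name, text):
    # Build the message but keep it locally instead of sending it.
    sent.append(build_failure_email(name, text, "etl@example.org", "gis@example.org"))


ok = run_with_logging("Schools_Current.py", failing_job, log_conn, notify)
logged = log_conn.execute("SELECT script, message FROM etl_log").fetchall()
print(ok, logged[0][0], sent[0]["Subject"])
```

In a scheduled run, `notify` would call `send_email` with the site's SMTP host. For the bulk-insert bullet, DB-API cursors also provide `executemany(sql, rows)`, which passes a whole list of rows with one call instead of looping in Python.
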
  • 23. Next Steps ● Implement PostGIS Prod server – CentOS (new IT staff!!!) ● Document Internally ● Share Externally – github.com/DPSSpatial ● Web maps – Internal – External