Geospatial ETL with Stetl - GeoPython 2016

633 views

Published on

Explains basics of Stetl, an Open Source spatial ETL framework. More on http://stetl.org. Talk given on GeoPython conference, June 24, 2016, Basel, Switserland.

Published in: Software
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
633
On SlideShare
0
From Embeds
0
Number of Embeds
19
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Geospatial ETL with Stetl - GeoPython 2016

  1. 1. Geospatial Data Processing with Stetl Just van den Broecke GeoPython 2016 Muttenz - Switserland June 24, 2016 www.justobjects.nl www.stetl.org
  2. 2. About Me Independent Open Source Geospatial Professional + Secretary OSGeo Dutch Local Chapter + Member of the Dutch OpenGeoGroep Just van den Broecke just@justobjects.nl www.justobjects.nl
  3. 3. Agenda • Spatial ETL • Stetl • Concepts • Cases • Status • Q & A
  4. 4. Spatial ETL
  5. 5. Data Wrangling
  6. 6. ETL - Extract Transform Load
  7. 7. Spatial ETL GML PostGIS Shapefile WFS CSV SQLite XML GML PostGIS Shapefile WFS CSV SQLite XML
  8. 8. GML PostGIS Shapefile WFS CSV SQLite XML Spatial ETL 
 Example non-standard source
  9. 9. From: https://live.osgeo.org/en/overview/gdal_overview.html
  10. 10. Plenty of Tools… Each tool is powerful by itself but cannot do the entire ETL ogr2ogr Spatial ETL
  11. 11. FOSS ETL - How to Combine Components? = + + ?+ ..
  12. 12. Example - 2011 INSPIRE-FOSS http://inspire.kademo.nl/doc/design-etl.html Nice ideas but hard to scale, deploy and reuse. 
 Need Framework
  13. 13. Solution: Add Python to the Equation =+ + ?( )+ ..
  14. 14. Stetl Solution: Add Python to the Equation =+ +( )+ ..
  15. 15. Stetl = Simple Streaming Spatial Speedy ETL
  16. 16. Stetl Concepts
  17. 17. Process Chain Input Filter OutputFilter Stetl concepts Source Target
  18. 18. Process Chain Input Filter Output gml Filter Stetl concepts
  19. 19. Example: GML to PostGIS GML
 Reader PG
 Output gml Stetl concepts
  20. 20. Example: Data Model Transform OGR Reader XSLT GML
 Writer gml Stetl concepts Simple Features Complex Features or Jinja2 !
  21. 21. Process Chain - How? Input Filters Output Stetl concepts Stetl Config File Instantiate
  22. 22. Example: XML to Shape XML Input XSLT Filter OGR Output
  23. 23. Example: XML to Shape The Source File
  24. 24. Example: XML to Shape XML Input
  25. 25. Example: XML to Shape XML Input XSLT Filter
  26. 26. Example: XML to Shape Prepare XSLT Script
  27. 27. Example: XML to Shape XSLT GML Output
  28. 28. Example: XML to Shape XML Input XSLT Filter OGR Output OGC Simple Features XML DOM
  29. 29. Example: XML to Shape The Stetl Config File Process Chain XML InputXSLT Filter ogr2ogr Output
  30. 30. Running Stetl stetl -c etl.cfg stetl -c etl.cfg [-a <properties>]
  31. 31. Installing Stetl - PyPi Deps •GDAL+Python bindings •lxml (xml proc) •psycopg2 (Postgres) sudo pip install stetl
  32. 32. Installing Stetl - new: Docker https://hub.docker.com/r/ justb4/stetl
  33. 33. Speed: Streaming Input Filter Output gml Stetl concepts
  34. 34. Speed: Going Native Input Filter Output gml ogr2ogr StetlStetl Native C Libs/Progs Calls Stetl concepts
  35. 35. Example Components Input Filters Output Stetl concepts XMLFile XSLT GML ogr2ogr XMLAssembler GDAL/OGR LineStream XMLValidator WFS-T Postgres/PostGIS Jinja2 Postgres/PostGIS deegree* FeatureExtractor deegree* YourInput YourFilter YourOutput
  36. 36. Example: XsltFilter Pythonfrom util import Util, etree from filter import Filter from packet import FORMAT log = Util.get_log("xsltfilter") class XsltFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc) self.xslt_file_path = self.cfg.get('script') self.xslt_file = open(self.xslt_file_path, 'r') # Parse XSLT file only once self.xslt_doc = etree.parse(self.xslt_file) self.xslt_obj = etree.XSLT(self.xslt_doc) self.xslt_file.close() def invoke(self, packet): if packet.data is None: return packet return self.transform(packet) def transform(self, packet): packet.data = self.xslt_obj(packet.data) log.info("XSLT Transform OK") return packet
  37. 37. [etl] chains = input_xml_file|my_filter|output_std [input_xml_file] class = inputs.fileinput.XmlFileInput file_path = input/cities.xml # My custom component [my_filter] class = my.myfilter.MyFilter [output_std] class = outputs.standardoutput.StandardXmlOutput class MyFilter(Filter): # Constructor def __init__(self, configdict, section): Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc) def invoke(self, packet): log.info("CALLING MyFilter OK!!!!") return packet Your Own Components Stetl concepts Step 1- Define Class Step 2- Configure Class
  38. 38. Data Structures Stetl concepts • Components exchange Packets • Packet contains data and status • Data formats, e.g. : xml_line_stream etree_doc
 etree_element (feature) etree_element_array string any . .
  39. 39. Cases
  40. 40. Cases - GML to PostGIS • National Dutch Open Datasets (GML)
 
 http://nlextract.nl ✴ Topography:Top10NL, BGT ✴ Cadastral Parcels (BRK)
 • Ordnance Survey UK ✴ PoC Topography (OS Mastermap)
  41. 41. Cases - IoT/SensorWeb • SOSPilot ✴ Dutch Air Quality Data ✴ publish to PostGIS+SOS (SOS-T) ✴ EU Reporting (Jinja2 Filter) ✴ sospilot.geonovum.nl • Smart Emission ✴ AQ Sensors hosted by citizens ✴ Calibration/Aggregation ✴ publication to SOS and OGC SensorThings ✴ data.smartemission.nl
  42. 42. Project Status - June 24, 2016 • v1.0.9 installable via PyPi or Docker • Documentation on www.stetl.org • Real world transforms done
  43. 43. Project - Planned & in progress • PyWPS integration • GUI 
 - Flask+Celery+Redis? 
 - Node Red?
 - Jupiter Notebook? • More on GitHub
 https://github.com/geopython/stetl
  44. 44. ThankYou ! www.stetl.org github.com/geopython/stetl

×