Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

Presentation given on Sept 21, 2013 at FOSS4G 2013 in Nottingham (UK). Stetl (Streaming ETL) is a lightweight geospatial ETL framework written in Python that integrates transformation tools such as GDAL/OGR, XSLT and PostGIS. Stetl targets ETL cases that involve XML and GML data, such as INSPIRE data harmonization, but it can also handle other transformations, including non-geospatial ones. Stetl is declarative: a configuration file specifies an ETL chain of input, filter and output modules. For speed, Stetl makes native calls to C-level libraries such as libxml2 (via lxml). See more at http://stetl.org
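
The declarative idea described above, a config file naming a chain of input, filter and output modules, can be illustrated with a small standard-library sketch. This is not Stetl's implementation: the component classes (`TextInput`, `UpperCaseFilter`, `ListOutput`), the `REGISTRY` lookup and the dict-based packet are hypothetical stand-ins, and only the `[etl]` / `chains = a|b|c` layout is borrowed from the config shown in the slides.

```python
import configparser

# Config in the style shown in the slides; section and class names are made up.
CONFIG = """
[etl]
chains = input_text|upper_filter|output_list

[input_text]
class = TextInput
text = hello gml

[upper_filter]
class = UpperCaseFilter

[output_list]
class = ListOutput
"""


class Component:
    """Hypothetical base class: each component gets its own config section."""

    def __init__(self, options):
        self.options = options

    def invoke(self, packet):
        return packet


class TextInput(Component):
    def invoke(self, packet):
        # Produce data from the 'text' option of this component's section.
        packet["data"] = self.options["text"]
        return packet


class UpperCaseFilter(Component):
    def invoke(self, packet):
        packet["data"] = packet["data"].upper()
        return packet


class ListOutput(Component):
    def __init__(self, options):
        super().__init__(options)
        self.collected = []

    def invoke(self, packet):
        self.collected.append(packet["data"])
        return packet


# Maps the 'class' value in the config to a Python class (an assumption of this sketch).
REGISTRY = {cls.__name__: cls for cls in (TextInput, UpperCaseFilter, ListOutput)}


def build_chain(config_text):
    """Instantiate one component per name in the '|'-separated chain."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = parser["etl"]["chains"].split("|")
    return [REGISTRY[parser[name]["class"]](dict(parser[name])) for name in names]


def run_chain(chain):
    """Pass a single packet through every component in order."""
    packet = {"data": None}
    for component in chain:
        packet = component.invoke(packet)
    return packet


result = run_chain(build_chain(CONFIG))
print(result["data"])  # HELLO GML
```

Changing the pipeline then means editing the `chains` line or a section in the config rather than the Python code, which is the point of the declarative approach.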

Watch this presentation video recording on FOSSLC: http://www.fosslc.org/drupal/content/taming-rich-gml-stetl-lightweight-python-framework-geospatial-etl

Published in: Technology

Taming Rich GML with Stetl - FOSS4G 2013 Nottingham

  1. Taming Rich GML with Stetl - A lightweight Python Framework for Geospatial ETL. Just van den Broecke, FOSS4G Nottingham 2013, Sept 21, 2013. www.justobjects.nl
  2. About Me: Independent Open Source Geospatial Professional; Secretary OSGeo Dutch Local Chapter; Member of the Dutch OpenGeoGroep. Just van den Broecke, just@justobjects.nl, www.justobjects.nl
  3. We have a Problem
  4. The Rich GML Problem
  5. Rich GML = Complex Mess
  6. INSPIRE; Dutch National Datasets; Germany: AFIS-ALKIS-ATKIS; UK: OS Mastermap; ...
  7. “Semi GML”, e.g. Dutch Addresses & Buildings (BAG): Arbitrary Nesting
  8. The Street Name! A Street Element in an INSPIRE Annex I Address...
  9. Complex Model Transformations
  10. 100+ MB GML Files
  11. (no text on this slide)
  12. Millions of Objects
  13. 10s of Millions of <Elements>
  14. Multiple Transformation Steps
  15. Solution is Spatial ETL
  16. But How?
  17. FOSS ETL - DIY? Maybe
  18. FOSS ETL - High Level
  19. FOSS ETL - Lower Level (e.g. ogr2ogr): each tool is powerful individually but cannot do the entire ETL
  20. FOSS ETL - How to Combine? (diagram: ogr2ogr + other tools = ?)
  21. Example - 2011 INSPIRE-FOSS: http://inspire.kademo.nl/doc/design-etl.html Good ideas but hard to scale and reuse. Need Framework
  22. FOSS ETL - Add Python to Equation (diagram: Python + ogr2ogr + other tools = ?)
  23. (diagram: Python + ogr2ogr + other tools = Stetl)
  24. Stetl = Simple Streaming Spatial Speedy ETL
  25. From Barrels of GML to Maps (diagram: GML1, GML2, Stetl)
  26. (no text on this slide)
  27. Stetl Concepts
  28. Process Chain (Stetl concepts): Source -> Input -> Filter -> Filter -> Output -> Target
  29. Process Chain (Stetl concepts): Input -> Filter -> Filter -> Output, with gml as data
  30. Example: GML to PostGIS (Stetl concepts): gml, Reader, ogr2ogr
  31. Example: INSPIRE Model Transform (Stetl concepts): ogr2ogr, XSLT, Writer (gml); Simple Features to Complex Features
  32. Example: deegree Store (Stetl concepts): ogr2ogr, XSLT, deegree Writer; or via WFS-T
  33. Process Chain - How? (Stetl concepts): Input, Filters, Output
  34. Example: XML to Shape: XML Input -> XSLT Filter -> ogr2ogr Output
  35. Example: XML to Shape - The Source
  36. Example: XML to Shape - XML Input
  37. Example: XML to Shape - XML Input -> XSLT Filter
  38. Example: XML to Shape - Prepare XSLT Script
  39. Example: XML to Shape - XSLT GML Output
  40. Example: XML to Shape - XML Input -> XSLT Filter -> ogr2ogr Output
  41. Example: XML to Shape - The Stetl Config File. Process Chain: XML Input -> XSLT Filter -> ogr2ogr Output
  42. Running Stetl: stetl -c etl.cfg
  43. Result: Shapefile viewed in QGIS
  44. Installing Stetl via PyPI: sudo pip install stetl. Deps: GDAL + Python bindings; lxml (XML processing); psycopg2 (Postgres)
  45. Speed: Streaming (Stetl concepts): Input -> Filter -> Output, with gml as data
  46. Speed: Going Native (Stetl concepts): Input -> Filter -> Output, with gml and ogr2ogr; Stetl calls native C libs/programs
  47. Example Components (Stetl concepts):
      Input: XMLFile, ogr2ogr, LineStream, deegree*, YourInput
      Filters: XSLT, XMLAssembler, XMLValidator, FeatureExtractor, YourFilter
      Output: GMLFile, ogr2ogr, WFS-T, deegree*, YourOutput
  48. Example: XsltFilter (Python)

      from util import Util, etree
      from filter import Filter
      from packet import FORMAT

      log = Util.get_log("xsltfilter")


      class XsltFilter(Filter):
          # Constructor
          def __init__(self, configdict, section):
              Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
              self.xslt_file_path = self.cfg.get('script')
              self.xslt_file = open(self.xslt_file_path, 'r')
              # Parse XSLT file only once
              self.xslt_doc = etree.parse(self.xslt_file)
              self.xslt_obj = etree.XSLT(self.xslt_doc)
              self.xslt_file.close()

          def invoke(self, packet):
              if packet.data is None:
                  return packet
              return self.transform(packet)

          def transform(self, packet):
              packet.data = self.xslt_obj(packet.data)
              log.info("XSLT Transform OK")
              return packet
  49. Your Own Components (Stetl concepts)

      Step 1 - Define Class:

      class MyFilter(Filter):
          # Constructor
          def __init__(self, configdict, section):
              Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

          def invoke(self, packet):
              log.info("CALLING MyFilter OK!!!!")
              return packet

      Step 2 - Config Class:

      [etl]
      chains = input_xml_file|my_filter|output_std

      [input_xml_file]
      class = inputs.fileinput.XmlFileInput
      file_path = input/cities.xml

      # My custom component
      [my_filter]
      class = my.myfilter.MyFilter

      [output_std]
      class = outputs.standardoutput.StandardXmlOutput
  50. Data Structures (Stetl concepts):
      • Components exchange Packets
      • Packet contains data and status
      • Data formats, e.g.: xml_line_stream, etree_doc, etree_element (feature), etree_element_array, string, any, ...
  51. deegree Integration (Stetl concepts):
      • Input: DeegreeBlobstoreInput
      • Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput
  52. Cases - The Netherlands:
      • INSPIRE Download Services: publish to deegree store (WFS); generate GML files (for Atom Feed)
      • National GML Datasets: GML to PostGIS (Top10NL, BGT)
  53. Top10NL Extract - Parameter Substitution

      [etl]
      chains = input_sql_pre|schema_name_filter|output_postgres,
               input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
               input_sql_post|schema_name_filter|output_postgres

      # Pre SQL file inputs to be executed
      [input_sql_pre]
      class = inputs.fileinput.StringFileInput
      file_path = sql/drop-tables.sql,sql/create-schema.sql

      # Post SQL file inputs to be executed
      [input_sql_post]
      class = inputs.fileinput.StringFileInput
      file_path = sql/delete-duplicates.sql

      # Generic filter to substitute Python-format string values like {schema} in string
      [schema_name_filter]
      class = filters.stringfilter.StringSubstitutionFilter
      # format args {schema} is schema name
      format_args = schema:{schema}

      [output_postgres]
      class = outputs.dboutput.PostgresDbOutput
      database = {database}
      host = {host}
      port = {port}
      user = {user}
      password = {password}
      schema = {schema}

      # The source input file(s) from dir and produce gml:featureMember elements
      [input_big_gml_files]
      class = inputs.fileinput.XmlElementStreamerFileInput
      file_path = {gml_files}
      element_tags = featureMember
  54. Top10NL + BAG (Dutch Topo + Buildings)
  55. BGT - Dutch Large Scale Topo
  56. Case: INSPIRE DL Services - Dutch Addresses (diagram labels: Source <GML>, Dutch Addresses + Buildings, NLExtract, Stetl, deegree blobstore, deegree WFS, Stetl, INSPIRE <GML>, INSPIRE Addresses, Atom Feed)
  57. Project Status - Sept 21, 2013:
      • v1.0.4 installable via PyPI
      • Documentation on www.stetl.org
      • Real-world transforms done
      • Seeking feedback, support and contributors
  58. Rich GML Problem Solved?
  59. Thank You! www.stetl.org, github.com/justb4/stetl
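
The slides stress streaming for 100+ MB GML files, and the Top10NL config reads input as a stream of `featureMember` elements. A minimal sketch of that general technique, using only Python's standard `xml.etree.ElementTree.iterparse`, could look as follows. It is an illustration under stated assumptions, not Stetl's `XmlElementStreamerFileInput`: the helper `stream_elements`, the in-memory `SAMPLE` document and the `City`/`name` elements are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Tiny made-up document standing in for a large GML file.
SAMPLE = f"""<?xml version="1.0"?>
<gml:FeatureCollection xmlns:gml="{GML_NS}">
  <gml:featureMember><City><name>Amsterdam</name></City></gml:featureMember>
  <gml:featureMember><City><name>Nottingham</name></City></gml:featureMember>
</gml:FeatureCollection>
""".encode("utf-8")


def stream_elements(source, tag):
    """Yield each element with the given qualified tag, one at a time.

    The file is parsed incrementally, so the consumer sees a feature as
    soon as its closing tag has been read.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Drop the element's children once the consumer has moved on,
            # so processed features do not pile up in memory.
            elem.clear()


names = []
for member in stream_elements(io.BytesIO(SAMPLE), f"{{{GML_NS}}}featureMember"):
    names.append(member.findtext("City/name"))

print(names)  # ['Amsterdam', 'Nottingham']
```

In this sketch the emptied `featureMember` shells stay attached to the root element; a production streamer would also need to detach them, and lxml offers a similar `iterparse` interface for that purpose.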
