Stetl for INSPIRE Data Transformation

  1. INSPIRE Transformation with Stetl: A Lightweight Python Framework for Geospatial ETL. Just van den Broecke, EuroGeographics - KEN Workshop, Paris, Oct 8, 2013. www.justobjects.nl
  2. About Me: Just van den Broecke (just@justobjects.nl, www.justobjects.nl). Independent Open Source Geospatial Professional; Secretary, OSGeo Dutch Local Chapter; Member of the Dutch OpenGeoGroep.
  3. We have a Problem
  4. The Rich GML Problem
  5. Rich GML = Complex Mess
  6. Examples: INSPIRE, Dutch national datasets, Germany: AFIS-ALKIS-ATKIS, UK: OS MasterMap, ...
  7. “Semi GML”, e.g. Dutch Addresses & Buildings (BAG): arbitrary nesting
  8. The Street Name! A street element in an INSPIRE Annex I Address.
  9. Complex Model Transformations
  10. 100+ MB GML Files
  11. Millions of Objects
  12. 10s of Millions of <Elements>
  13. Multiple Transformation Steps
  14. Solution is Spatial ETL
  15. But How? (with FOSS)
  16. FOSS ETL: DIY? Maybe
  17. FOSS ETL - High Level
  18. FOSS ETL - Lower Level: tools such as ogr2ogr are each powerful individually, but none can do the entire ETL on its own.
  19. FOSS ETL - How to Combine? (diagram: ogr2ogr + other tools = ?)
  20. Example: 2011 Kadaster ESDIN project (http://inspire.kademo.nl/doc/design-etl.html). Good ideas, but hard to scale and reuse: a framework is needed.
  21. FOSS ETL: Add Python to the Equation (diagram: Python( ogr2ogr + other tools ) = ?)
  22. Stetl = Python( ogr2ogr + other tools )
  23. Stetl = Simple Streaming Spatial Speedy ETL
  24. From Barrels of GML to Maps (diagram: GML1, GML2 → Stetl)
  25. From Local National Data to INSPIRE Download (DL) Services
      Diagram: source <GML> (Dutch Addresses + Buildings) → NLExtract → Stetl → deegree blobstore → deegree WFS (INSPIRE Addresses); Stetl → INSPIRE <GML> → Atom Feed
  26. Stetl Concepts
  27. Process Chain: Source → Input → Filter(s) → Output → Target
  28. Process Chain: Input → Filter(s) → Output, with data (e.g. GML) flowing between components
  29. Example: GML to PostGIS (GML Reader → ogr2ogr)
  30. Example: INSPIRE Model Transform (ogr2ogr → XSLT → GML Writer; Simple Features → Complex Features)
  31. Example: deegree Store (ogr2ogr → XSLT → deegree Writer, or via WFS-T)
  32. Process Chain - How? Input → Filters → Output
  33. Example: XML to Shape (XML Input → XSLT Filter → ogr2ogr Output)
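The Input → Filters → Output idea can be illustrated with plain Python generators, where each stage lazily consumes the stream produced by the previous one. This is a minimal sketch, not Stetl's implementation: the component functions (`lines_input`, `strip_filter`, `upper_filter`, `list_output`) and `run_chain` are hypothetical stand-ins for real components.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical component shape: a filter maps one stream to another stream.
FilterFn = Callable[[Iterable[str]], Iterator[str]]


def lines_input(lines: List[str]) -> Iterator[str]:
    """Hypothetical Input: emit one record at a time."""
    for line in lines:
        yield line


def strip_filter(stream: Iterable[str]) -> Iterator[str]:
    """Hypothetical Filter: trim surrounding whitespace from each record."""
    for item in stream:
        yield item.strip()


def upper_filter(stream: Iterable[str]) -> Iterator[str]:
    """Hypothetical Filter: upper-case each record."""
    for item in stream:
        yield item.upper()


def list_output(stream: Iterable[str]) -> List[str]:
    """Hypothetical Output: collect results (a real one would write a file or DB)."""
    return list(stream)


def run_chain(source: Iterator[str], filters: List[FilterFn],
              sink: Callable[[Iterable[str]], List[str]]) -> List[str]:
    stream: Iterable[str] = source
    for f in filters:
        stream = f(stream)  # only wraps the stream; nothing runs until the sink pulls
    return sink(stream)


result = run_chain(lines_input(["  <a/> ", "<b/>"]), [strip_filter, upper_filter], list_output)
print(result)  # ['<A/>', '<B/>']
```

Because every stage is a generator, only one record is in flight at a time, which is the property that matters for the large GML files described earlier.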
  34. Example: XML to Shape: The Source
  35. Example: XML to Shape: XML Input
  36. Example: XML to Shape: XML Input → XSLT Filter
  37. Example: XML to Shape: Prepare the XSLT Script
  38. Example: XML to Shape: XSLT GML Output
  39. Example: XML to Shape (XML Input → XSLT Filter → ogr2ogr Output)
  40. Example: XML to Shape: The Stetl Config File (process chain: XML Input → XSLT Filter → ogr2ogr Output)
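The config file wires components together by name: an `[etl]` section lists chains as `|`-separated section names, and each section names its component `class` (the convention visible in the config slides of this deck). The sketch below parses such a file with the standard `configparser` module. The section names, the `cities2gml.xsl` script name, and the XSLT and ogr2ogr class paths are hypothetical placeholders; only `inputs.fileinput.XmlFileInput` and the `script` key appear in this deck.

```python
import configparser

# Hypothetical config in the style of the Stetl examples in this deck.
# Only the [etl]/chains/class layout, the XmlFileInput class and the
# "script" key come from the slides; other names are placeholders.
CONFIG_TEXT = """
[etl]
chains = input_xml_file|transformer_xslt|output_ogr2ogr

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[transformer_xslt]
class = filters.xsltfilter.XsltFilter
script = cities2gml.xsl

[output_ogr2ogr]
class = outputs.ogroutput.Ogr2OgrOutput
"""


def parse_chains(cfg: configparser.ConfigParser) -> list:
    """Return each chain as a list of (section_name, class_path) tuples."""
    chains = []
    # Several chains may be listed, separated by commas (possibly over several lines).
    for chain_str in cfg.get("etl", "chains").split(","):
        names = [n.strip() for n in chain_str.split("|")]
        chains.append([(n, cfg.get(n, "class")) for n in names])
    return chains


cfg = configparser.ConfigParser()
cfg.read_string(CONFIG_TEXT)
for chain in parse_chains(cfg):
    for name, cls in chain:
        print(name, "->", cls)
```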
  41. Running Stetl: stetl -c etl.cfg
  42. Result: Shapefile viewed in QGIS
  43. Installing Stetl via PyPI
      Dependencies:
      • GDAL + Python bindings
      • lxml (XML processing)
      • psycopg2 (Postgres)
      Install: sudo pip install stetl
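Before running `stetl`, it can help to confirm that the three dependencies listed above are importable. This is a small sketch using `importlib.util.find_spec`; it assumes the usual import names (`osgeo` for the GDAL bindings, `lxml`, `psycopg2`) and is not part of Stetl itself.

```python
import importlib.util

# Assumed import names for the dependencies listed on the slide.
DEPENDENCIES = {
    "osgeo": "GDAL + Python bindings",
    "lxml": "lxml (XML processing)",
    "psycopg2": "psycopg2 (Postgres)",
}


def missing_dependencies() -> list:
    """Return descriptions of dependencies whose top-level module cannot be found."""
    return [desc for mod, desc in DEPENDENCIES.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    print("All dependencies found" if not missing else "Missing: " + ", ".join(missing))
```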
  44. Speed: Streaming (Input → Filter → Output; GML is streamed through the chain)
  45. Speed: Going Native (Stetl components call native C libraries/programs such as ogr2ogr)
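Streaming means a 100+ MB GML file is never loaded as one tree: elements are handed on in small batches and discarded once processed. The sketch below shows the general technique with the standard library's `xml.etree.ElementTree.iterparse`, yielding `gml:featureMember` elements in chunks (similar in spirit to the `element_tags = featureMember` setting in the Top10NL config later in this deck). It is an illustration under those assumptions, not Stetl's `XmlElementStreamerFileInput`; the sample data and `chunk_size` are made up.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List

GML_NS = "http://www.opengis.net/gml"


def stream_feature_members(source, tag_name: str = "featureMember",
                           chunk_size: int = 2) -> Iterator[List[ET.Element]]:
    """Yield lists of up to chunk_size complete <gml:featureMember> elements."""
    qname = "{%s}%s" % (GML_NS, tag_name)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    chunk: List[ET.Element] = []
    for event, elem in context:
        if event == "end" and elem.tag == qname:
            chunk.append(elem)  # elem keeps its own children while referenced
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
            root.clear()  # detach processed elements so memory use stays flat
    if chunk:
        yield chunk


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember><City><name>Amsterdam</name></City></gml:featureMember>
  <gml:featureMember><City><name>Rotterdam</name></City></gml:featureMember>
  <gml:featureMember><City><name>Utrecht</name></City></gml:featureMember>
</gml:FeatureCollection>"""

for chunk in stream_feature_members(io.BytesIO(SAMPLE)):
    print([m.findtext("City/name") for m in chunk])
# ['Amsterdam', 'Rotterdam']
# ['Utrecht']
```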
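"Going native" means the heavy lifting (parsing, reprojection, writing Shapefiles or PostGIS) is delegated to compiled GDAL/OGR code rather than done in Python. One simple way to do this is to call the `ogr2ogr` program via `subprocess`, as sketched below. The helper names are hypothetical; `-f` (output format) and `-t_srs` (target SRS) are standard ogr2ogr options, and the destination datasource comes before the source.

```python
import shutil
import subprocess
from typing import List, Optional


def ogr2ogr_command(src: str, dst: str, out_format: str = "ESRI Shapefile",
                    options: Optional[List[str]] = None) -> List[str]:
    """Build: ogr2ogr -f <format> [options...] <dst_datasource> <src_datasource>."""
    return ["ogr2ogr", "-f", out_format, *(options or []), dst, src]


def run_ogr2ogr(cmd: List[str]) -> int:
    """Run the native ogr2ogr program if it is on PATH; return its exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("ogr2ogr not found on PATH (install GDAL)")
    return subprocess.run(cmd, check=False).returncode


cmd = ogr2ogr_command("cities.gml", "cities.shp", options=["-t_srs", "EPSG:4326"])
print(" ".join(cmd))
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with format names such as "ESRI Shapefile".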
  46. Example Components
      Inputs:  XMLFile, ogr2ogr, LineStream, deegree*, YourInput
      Filters: XSLT, XMLAssembler, XMLValidator, FeatureExtractor, YourFilter
      Outputs: GMLFile, ogr2ogr, WFS-T, deegree*, YourOutput
  47. Example: XsltFilter (Python)

      from util import Util, etree
      from filter import Filter
      from packet import FORMAT

      log = Util.get_log("xsltfilter")


      class XsltFilter(Filter):
          """Apply an XSLT script to each etree document passing through the chain."""

          def __init__(self, configdict, section):
              Filter.__init__(self, configdict, section,
                              consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
              self.xslt_file_path = self.cfg.get('script')
              # Parse and compile the XSLT file only once, not once per packet
              self.xslt_doc = etree.parse(self.xslt_file_path)
              self.xslt_obj = etree.XSLT(self.xslt_doc)

          def invoke(self, packet):
              # Pass packets without data (e.g. end of stream) through untouched
              if packet.data is None:
                  return packet
              return self.transform(packet)

          def transform(self, packet):
              packet.data = self.xslt_obj(packet.data)
              log.info("XSLT Transform OK")
              return packet
  48. Your Own Components

      Step 1 - Define Class:

      from util import Util
      from filter import Filter
      from packet import FORMAT

      log = Util.get_log("myfilter")


      class MyFilter(Filter):
          def __init__(self, configdict, section):
              Filter.__init__(self, configdict, section,
                              consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)

          def invoke(self, packet):
              log.info("CALLING MyFilter OK!!!!")
              return packet

      Step 2 - Config Class:

      [etl]
      chains = input_xml_file|my_filter|output_std

      [input_xml_file]
      class = inputs.fileinput.XmlFileInput
      file_path = input/cities.xml

      # My custom component
      [my_filter]
      class = my.myfilter.MyFilter

      [output_std]
      class = outputs.standardoutput.StandardXmlOutput
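The line `class = my.myfilter.MyFilter` implies that the framework turns a dotted path into a Python class at runtime. A minimal sketch of such a lookup with `importlib` follows; it is an assumption about the general mechanism, not Stetl's own loader, and it resolves standard-library classes so it runs without Stetl or `my.myfilter` installed.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_path)  # imports the module on demand
    return getattr(module, class_name)


# Demonstration with standard-library classes instead of Stetl components.
cls = load_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

With this approach, a custom component only has to be importable (e.g. on `PYTHONPATH`) for the config file to refer to it.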
  49. Data Structures
      • Components exchange Packets
      • A Packet contains data and status
      • Data formats, e.g.: xml_line_stream, etree_doc, etree_element (feature), etree_element_array, string, any, ...
  50. deegree Integration
      • Input: DeegreeBlobstoreInput
      • Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput
  51. Cases - The Netherlands
      • INSPIRE Download Services: publish to a deegree store (WFS); generate GML files (for the Atom Feed)
      • National GML Datasets: GML to PostGIS (Top10NL, BGT)
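Components only fit together if the format one produces is one the next consumes; the XsltFilter constructor above declares this with `consumes=` and `produces=`. The sketch below models a packet and a compatibility check between adjacent components. The `Format` enum mirrors the format names on the slide, but the `Packet` fields (`data`, `end_of_stream`) and the `compatible` rule (exact match, or either side accepting `any`) are assumptions for illustration, not Stetl's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Format(Enum):
    """Data formats named on the slide (a subset)."""
    XML_LINE_STREAM = "xml_line_stream"
    ETREE_DOC = "etree_doc"
    ETREE_ELEMENT = "etree_element"
    ETREE_ELEMENT_ARRAY = "etree_element_array"
    STRING = "string"
    ANY = "any"


@dataclass
class Packet:
    """Hypothetical packet: a data payload plus a status flag."""
    data: Any = None
    end_of_stream: bool = False


def compatible(produces: Format, consumes: Format) -> bool:
    """Assumed rule: formats must match, unless either side accepts 'any'."""
    return produces == consumes or Format.ANY in (produces, consumes)


print(compatible(Format.ETREE_DOC, Format.ETREE_DOC))   # True
print(compatible(Format.STRING, Format.ETREE_DOC))      # False
print(compatible(Format.ETREE_ELEMENT, Format.ANY))     # True
```

Checking formats when a chain is built, rather than when the first packet arrives, lets a misconfigured chain fail before any data is read.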
  52. Top10NL Extract: Parameter Substitution

      [etl]
      chains = input_sql_pre|schema_name_filter|output_postgres,
               input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
               input_sql_post|schema_name_filter|output_postgres

      # Pre SQL file inputs to be executed
      [input_sql_pre]
      class = inputs.fileinput.StringFileInput
      file_path = sql/drop-tables.sql,sql/create-schema.sql

      # Post SQL file inputs to be executed
      [input_sql_post]
      class = inputs.fileinput.StringFileInput
      file_path = sql/delete-duplicates.sql

      # Generic filter to substitute Python-format string values like {schema} in a string
      [schema_name_filter]
      class = filters.stringfilter.StringSubstitutionFilter
      # format_args: {schema} is the schema name
      format_args = schema:{schema}

      [output_postgres]
      class = outputs.dboutput.PostgresDbOutput
      database = {database}
      host = {host}
      port = {port}
      user = {user}
      password = {password}
      schema = {schema}

      # Read the source input file(s) from a directory and produce gml:featureMember elements
      [input_big_gml_files]
      class = inputs.fileinput.XmlElementStreamerFileInput
      file_path = {gml_files}
      element_tags = featureMember
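Parameter substitution happens at two levels in the Top10NL config: placeholders such as `{database}` and `{schema}` in the config itself are filled in at run time, and `StringSubstitutionFilter` then uses `format_args` to fill `{schema}` inside the SQL text. The sketch below imitates both levels with `str.format` and `configparser`. The argument values, the helper names, and the space-separated `key:value` syntax for multiple `format_args` are assumptions; how Stetl actually receives the argument values is not shown on the slide.

```python
import configparser

# Reduced, hypothetical fragment of the Top10NL config above.
CONFIG_TEMPLATE = """
[output_postgres]
database = {database}
schema = {schema}

[schema_name_filter]
format_args = schema:{schema}
"""


def substitute_config(template: str, args: dict) -> configparser.ConfigParser:
    """Level 1: fill {name} placeholders in the raw config text, then parse it."""
    cfg = configparser.ConfigParser()
    cfg.read_string(template.format(**args))
    return cfg


def parse_format_args(spec: str) -> dict:
    """Turn 'key:value key2:value2' into a dict (assumed syntax)."""
    return dict(item.split(":", 1) for item in spec.split())


# Hypothetical run-time arguments.
cfg = substitute_config(CONFIG_TEMPLATE, {"database": "top10nl", "schema": "test"})

# Level 2: substitute into SQL text, as the string filter would.
fmt_args = parse_format_args(cfg.get("schema_name_filter", "format_args"))
sql = "DROP SCHEMA IF EXISTS {schema} CASCADE;"
print(sql.format(**fmt_args))  # DROP SCHEMA IF EXISTS test CASCADE;
```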
  53. Top10NL+BAG (Dutch Topo + Buildings)
  54. BGT - Dutch Large Scale Topo
  55. Cases - INSPIRE Transforms
      • Simple: Dutch Admin Borders to AU (Administrative Units)
      • Advanced: Dutch Addresses to AD (Addresses)
  56. INSPIRE - XSLT Structure: reusable XSLT scripts, one set per theme
      • Theme CP: Local CP GML to INSPIRE SpatialDataset; Local CP GML to INSPIRE GML; Generate CP INSPIRE GML
      • Theme AU: Local AU GML to INSPIRE SpatialDataset; Local AU GML to INSPIRE GML; Generate AU INSPIRE GML
      • Theme GN: Local GN GML to INSPIRE SpatialDataset; Local GN GML to INSPIRE GML; Generate GN INSPIRE GML
      The locally specific XSL calls the generic XSL (via an XSLT template call), which is shared by all.
  57. XSLT - 3 Main Steps/Scripts
      1. Generate the Spatial Dataset GML container (specific)
      2. Extract data values from local OGR simple feature data (specific)
      3. Call the XSLT template per theme feature type (generic)
  58. XSLT AU - STEP 1
  59. XSLT AU - STEP 2
  60. XSLT AU - STEP 3
  61. XSLT - REUSE
  62. STETL CONFIG
  63. STETL CONFIG AD
  64. Case: INSPIRE DL Services - Dutch Addresses
      Diagram: source <GML> (Dutch Addresses + Buildings) → NLExtract → Stetl → deegree blobstore → deegree WFS (INSPIRE Addresses); Stetl → INSPIRE <GML> → Atom Feed; other uses (geocoder, etc.)
  65. Project Status - Sept 21, 2013
      • v1.0.4 installable via PyPI
      • Documentation on www.stetl.org
      • Real-world transforms done
      • Seeking feedback, support and contributors
  66. Rich GML Problem Solved?
  67. Thank You! www.stetl.org, github.com/justb4/stetl