Geospatial ETL with Stetl


Stetl (Streaming ETL) is a toolkit for the transformation (ETL: Extract, Transform, Load) of geospatial data. Stetl builds on existing tools such as GDAL/OGR and XSLT. Stetl processing is driven by a configuration (.ini) file. Stetl is written in Python and is particularly suited to processing GML. Several INSPIRE transformations have been performed successfully with Stetl.

This is an introductory presentation given at the OSGeo Bolsena Codesprint on June 4, 2013.

Find more info, downloads and documentation on Stetl at http://stetl.org


  1. Geospatial ETL with Stetl - "Taming Your Rich GML". Just van den Broecke, OSGeo Bolsena Codesprint 2013, Bolsena, Italy, June 4, 2013. www.justobjects.nl
  2. About Me: Independent Open Source Geospatial Professional; Secretary, OSGeo Dutch Local Chapter; Member of the Dutch OpenGeoGroep. Just van den Broecke, just@justobjects.nl, www.justobjects.nl
  3. OSGeo - Bolsena - 2010
  4. BOLSENA 2012
  5. ALL OVER? ("Alles vorbei?") BOLSENA 2012
  6. BOLSENA 2012
  7. We Have a Problem
  8. The Rich GML Problem
  9. Rich GML = Complex Mess
  10. INSPIRE, Dutch national datasets, AFIS-ALKIS-ATKIS...
  11. "Semi GML", e.g. Dutch Addresses & Buildings (BAG)
  12. The Streetname! Application Schema GML, e.g. INSPIRE Addresses
  13. Complex Model Transformations
  14. 100+ MB GML Files
  15. Millions of Objects
  16. 10s of Millions of <Elements>
  17. Multiple Transformation Steps
  18. Solution is Spatial ETL
  19. A.K.A.
  20. Thank You for Your Attention!
  21. But what about... FOSS? ... Stetl?
  22. FOSS ETL - Lower Level: each powerful by itself [tool logos, including ogr2ogr]
  23. FOSS ETL - High Level
  24. FOSS ETL - DIY? (No!)
  25. FOSS ETL - How to Combine? [diagram: ogr2ogr + ... + ... = ?]
  26. Example - 2011 INSPIRE-FOSS (http://inspire.kademo.nl/doc/design-etl.html): good ideas, but hard to scale and reuse. Need a framework.
  27. FOSS ETL - Add Python to the Equation [diagram: ( ogr2ogr + ... + ... ) = ?]
  28. [diagram: ( ogr2ogr + ... + ... ) = Stetl]
  29. Stetl = Simple Streaming Spatial Speedy ETL
  30. Process Chain: Input -> Filter -> Output (gml) (Stetl concepts)
  31. Speed: Streaming: Input -> Filter -> Output (gml)
  32. Speed: Going Native: Input -> Filter -> Output (gml); sETL components call ogr2ogr and native C libs/programs
  33. Example: GML to PostGIS: Reader -> XMLSplitter -> ogr2ogr (gml)
  34. Example: INSPIRE Model Transform: ogr2ogr -> XSLT -> Writer (gml)
  35. Example: deegree Store: ogr2ogr -> XSLT -> deegreeWriter (gml)
  36. Process Chain - How? Input -> Filters -> Output
  37. Example: XML to Shape - The Source
  38. Example: XML to Shape - The XSLT Script
  39. Example: XML to Shape - XSLT Transform to GML
  40. Example: XML to Shape - XMLInput -> XSLTFilter -> ogr2ogrOutput
  41. Example: XML to Shape - The SETL Chain Config File (ProcessChain: Reader -> XSLT -> ogr2ogr)
  42. Example: XsltFilter Python
        from util import Util, etree
        from filter import Filter
        from packet import FORMAT

        log = Util.get_log("xsltfilter")

        class XsltFilter(Filter):
            # Constructor
            def __init__(self, configdict, section):
                Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
                # Path to the XSLT script, from the 'script' option in the config section
                self.xslt_file_path = self.cfg.get('script')
                # Parse the XSLT file only once, at construction time
                self.xslt_doc = etree.parse(self.xslt_file_path)
                self.xslt_obj = etree.XSLT(self.xslt_doc)

            def invoke(self, packet):
                # Nothing to transform: pass the packet through unchanged
                if packet.data is None:
                    return packet
                return self.transform(packet)

            def transform(self, packet):
                # Apply the precompiled stylesheet to the etree document
                packet.data = self.xslt_obj(packet.data)
                log.info("XSLT Transform OK")
                return packet
  43. Example Components
        Inputs:  XMLFile, ogr2gml, LineStream, deegree*, YourInput
        Filters: XSLT, GMLSplitter, XMLValidator, FeatureExtractor, YourFilter
        Outputs: GMLFile, gml2ogr, WFS-T, deegree*, YourOutput
  44. Your Own Components
      Step 1 - Define Class:
        from util import Util
        from filter import Filter
        from packet import FORMAT

        log = Util.get_log("myfilter")

        class MyFilter(Filter):
            # Constructor
            def __init__(self, configdict, section):
                Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc,
                                produces=FORMAT.etree_doc)

            def invoke(self, packet):
                log.info("CALLING MyFilter OK!!!!")
                return packet

      Step 2 - Config Class:
        [etl]
        chains = input_xml_file|my_filter|output_std

        [input_xml_file]
        class = inputs.fileinput.XmlFileInput
        file_path = input/cities.xml

        # My custom component
        [my_filter]
        class = my.myfilter.MyFilter

        [output_std]
        class = outputs.standardoutput.StandardXmlOutput
  45. Data Structures
      - Components exchange Packets
      - A Packet contains data and status
      - Data formats: xml_line_stream, etree_doc, etree_feature_array, xml_doc_as_string, any
  46. deegree Integration
      - Input: DeegreeBlobstoreInput
      - Output: DeegreeBlobstoreOutput, DeegreeFSLoaderOutput, WFSTOutput
  47. Cases
      - INSPIRE Download Services: publish to deegree store (WFS); GML files (for Atom Feed)
      - National GML Datasets: GML to PostGIS (Top10NL, BGT)
  48. Top10NL Extract (chain config)
        [etl]
        chains = input_sql_pre|schema_name_filter|output_postgres,
                 input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
                 input_sql_post|schema_name_filter|output_postgres

        # Pre SQL file inputs to be executed
        [input_sql_pre]
        class = inputs.fileinput.StringFileInput
        file_path = sql/drop-tables.sql,sql/create-schema.sql

        # Post SQL file inputs to be executed
        [input_sql_post]
        class = inputs.fileinput.StringFileInput
        file_path = sql/delete-duplicates.sql

        # Generic filter to substitute Python-format string values like {schema} in a string
        [schema_name_filter]
        class = filters.stringfilter.StringSubstitutionFilter
        # format args: {schema} is the schema name
        format_args = schema:{schema}

        [output_postgres]
        class = outputs.dboutput.PostgresDbOutput
        database = {database}
        host = {host}
        port = {port}
        user = {user}
        password = {password}
        schema = {schema}

        # Read the source input file(s) from dir and produce gml:featureMember elements
        [input_big_gml_files]
        class = inputs.fileinput.XmlElementStreamerFileInput
        file_path = {gml_files}
        element_tags = featureMember
  49. Case: INSPIRE DL Services - Dutch Addresses [diagram: source Dutch Addresses + Buildings <GML> (NLExtract) -> Stetl -> deegree blobstore -> deegree WFS (INSPIRE Addresses); Stetl -> INSPIRE <GML> -> Atom Feed]
  50. Thank You! stetl.org, github.com/justb4/stetl, inspire-foss.org
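The Input -> Filter -> Output process chain of slide 30 can be sketched in plain Python as below. This is a hypothetical illustration of the idea, not Stetl's internal code: the class names `ListInput`, `UpperFilter` and `CollectOutput` and the `end_of_stream` flag are assumptions, while the `invoke(packet)` method name follows the XsltFilter code on slide 42. Only one packet is in flight at a time, which is what keeps a streaming chain's memory use flat.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class Packet:
    """Carries data between components, plus an end-of-stream flag (hypothetical)."""
    data: Any = None
    end_of_stream: bool = False


class ListInput:
    """Hypothetical input: emits one item per call, then flags end of stream."""

    def __init__(self, items: Iterable[str]):
        self._items = iter(items)

    def invoke(self, packet: Packet) -> Packet:
        try:
            packet.data = next(self._items)
        except StopIteration:
            packet.data = None
            packet.end_of_stream = True
        return packet


class UpperFilter:
    """Hypothetical filter: upper-cases string data, passes None through."""

    def invoke(self, packet: Packet) -> Packet:
        if packet.data is not None:
            packet.data = packet.data.upper()
        return packet


class CollectOutput:
    """Hypothetical output: collects the data of every non-empty packet."""

    def __init__(self) -> None:
        self.results: List[str] = []

    def invoke(self, packet: Packet) -> Packet:
        if packet.data is not None:
            self.results.append(packet.data)
        return packet


def run_chain(components) -> None:
    """Push one fresh packet at a time through all components until end of stream."""
    while True:
        packet = Packet()
        for component in components:
            packet = component.invoke(packet)
        if packet.end_of_stream:
            break
```

For example, `run_chain([ListInput(["a", "b"]), UpperFilter(), out])` with `out = CollectOutput()` leaves `["A", "B"]` in `out.results`.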
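Slide 32 ("Going Native") has components delegate heavy lifting to ogr2ogr. A minimal sketch of building and running such a call from Python is shown below; the helper names `build_ogr2ogr_cmd` and `run_ogr2ogr` are hypothetical, while `-f` (output format) and `-overwrite` are standard ogr2ogr options and the argument order (options, destination, source) follows ogr2ogr's usage.

```python
import subprocess
from typing import List, Optional


def build_ogr2ogr_cmd(src: str, dest: str, fmt: str = "ESRI Shapefile",
                      extra_args: Optional[List[str]] = None) -> List[str]:
    """Build an ogr2ogr argument list: options first, then destination, then source."""
    cmd = ["ogr2ogr", "-f", fmt]
    cmd.extend(extra_args or [])
    cmd.extend([dest, src])
    return cmd


def run_ogr2ogr(cmd: List[str]) -> None:
    """Run ogr2ogr as a child process; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Passing an argument list rather than a shell string avoids quoting problems with values such as `PG:dbname=...` connection strings.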
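The XMLSplitter step of slide 33 and the `XmlElementStreamerFileInput` with `element_tags = featureMember` on slide 48 both rely on reading a large GML file element by element instead of loading it whole. The sketch below shows that technique with the standard library's `xml.etree.ElementTree.iterparse`; it is not Stetl's implementation, and the GML namespace default (GML 3.1; GML 3.2 uses `http://www.opengis.net/gml/3.2`) is an assumption you may need to change.

```python
import io
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumption: GML 3.1 namespace


def stream_feature_members(source, local_name="featureMember", ns=GML_NS):
    """Yield each matching element as an XML string, discarding it afterwards."""
    tag = "{%s}%s" % (ns, local_name)
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            root.clear()  # drop processed children so memory stays bounded


if __name__ == "__main__":
    sample = (b'<gml:FeatureCollection xmlns:gml="http://www.opengis.net/gml">'
              b'<gml:featureMember><Building id="b1"/></gml:featureMember>'
              b'</gml:FeatureCollection>')
    for member in stream_feature_members(io.BytesIO(sample)):
        print(member)
```

Each element is serialized before the tree is cleared, so callers can safely keep the yielded strings; this corresponds to the `xml_doc_as_string` format listed on slide 45.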
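The config files on slides 44 and 48 list chains as `|`-separated section names, with several chains separated by commas, and name each component by a dotted `class` path. A hypothetical sketch of how such a file could be read with `configparser` and how a dotted path can be resolved with `importlib` follows; `parse_chains` and `resolve_class` are illustrative names, not Stetl's API.

```python
import configparser
import importlib
from typing import List


def parse_chains(cfg: configparser.ConfigParser) -> List[List[str]]:
    """Split '[etl] chains' into chains (comma-separated) of section names ('|'-separated)."""
    raw = cfg.get("etl", "chains")
    return [[name.strip() for name in chain.split("|")]
            for chain in raw.split(",") if chain.strip()]


def resolve_class(dotted_path: str):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Sample config in the style of slides 44 and 48 (two chains, one continuation line).
CONFIG = """
[etl]
chains = input_xml_file|my_filter|output_std,
         input_sql_pre|schema_name_filter|output_postgres

[my_filter]
class = my.myfilter.MyFilter
"""
```

Calling `cfg.read_string(CONFIG)` on a `configparser.ConfigParser()` and then `parse_chains(cfg)` gives one list of section names per chain. Each section's `class` value could then be passed to `resolve_class`, provided that module is importable.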
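Slide 42 declares what each component `consumes` and `produces`, and slide 45 lists the data formats. The sketch below checks that adjacent components in a chain agree on the format they exchange. It is a hypothetical illustration: the function names and the rule that `any` matches every format are assumptions, not Stetl's documented behaviour.

```python
from typing import List, Tuple

# Data format names from slide 45; treating "any" as a wildcard is an assumption.
KNOWN_FORMATS = {"xml_line_stream", "etree_doc", "etree_feature_array",
                 "xml_doc_as_string", "any"}


def formats_match(produced: str, consumed: str) -> bool:
    """True if a producer's output format can feed a consumer's input format."""
    for fmt in (produced, consumed):
        if fmt not in KNOWN_FORMATS:
            raise ValueError("unknown data format: %s" % fmt)
    return produced == consumed or "any" in (produced, consumed)


def check_chain(components: List[Tuple[str, str, str]]) -> List[str]:
    """components: (name, consumes, produces) in chain order; returns mismatch messages."""
    problems = []
    for (up_name, _, produces), (down_name, consumes, _) in zip(components, components[1:]):
        if not formats_match(produces, consumes):
            problems.append("%s produces %s but %s consumes %s"
                            % (up_name, produces, down_name, consumes))
    return problems
```

Running a check like this once, before any data flows, reports a miswired chain such as a line-stream input feeding an XSLT filter that expects `etree_doc`.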
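The `schema_name_filter` on slide 48 substitutes Python-format placeholders such as `{schema}` in SQL text, configured through `format_args = schema:{schema}`. A minimal sketch of that idea using `str.format` is shown below. The function names are hypothetical, and the assumption that multiple `key:value` pairs are separated by whitespace is only that: an assumption.

```python
from typing import Dict


def parse_format_args(spec: str) -> Dict[str, str]:
    """Parse whitespace-separated 'key:value' pairs, e.g. 'schema:top10nl'."""
    args = {}
    for item in spec.split():
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError("expected key:value, got %r" % item)
        args[key] = value
    return args


def substitute(text: str, spec: str) -> str:
    """Replace {key} placeholders in text using Python str.format."""
    return text.format(**parse_format_args(spec))
```

Because `str.format` is used, any literal braces in the SQL must be doubled (`{{` and `}}`), otherwise formatting raises an error.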
