GeoKettle: A powerful open
 source spatial ETL tool

               FOSS4G 2010

               Dr. Thierry Badard, CTO
               Spatialytics inc.
               Quebec, Canada
               tbadard@spatialytics.com




                                   Barcelona, Spain – Sept 9th, 2010
What is GeoKettle?
●   It is part of the geospatial BI software stack initially developed
    by the GeoSOA research group at Laval University in Quebec:
    –   GeoKettle
    –   GeoMondrian
    –   SOLAPLayers
●   These tools are now developed and supported by Spatialytics
    –   http://www.spatialytics.org (open source community)
    –   http://www.spatialytics.com (professional support, training)
●   OK but … what is geospatial BI? ;-)
As you probably know …
●   Business Intelligence applications are usually used to
    better understand historical, current and future
    aspects of business operations in a company.
●   The applications typically offer ways to mine
    database- and spreadsheet-centric data, and
    produce graphical, table-based and other types of
    analytics regarding business operations.
●   They support the decision process and allow more
    informed decisions to be made!
Data visualization to support decision …
As you probably know …
●   BI applications rely on an architecture with robust components
    and applications:
    –   ETL tools & data warehousing (DW)
    –   On-line Analytical Processing (OLAP) servers and clients
    –   Reporting tools & dashboards
    –   Data mining
So, an ETL tool is …
●   A type of software used to populate databases or data
    warehouses from heterogeneous data sources.
●   ETL stands for:
    –   Extract – Extract data from data sources
    –   Transform – Transform the data in order to correct errors, perform
        data cleansing, change the data structure, make the data comply
        with defined standards, etc.
    –   Load – Load transformed data into the target DBMS
●   An ETL tool should manage the insertion of new data and the
    updating of existing data.
●   Should be able to perform transformations from:
    –   An OLTP system to another OLTP system
    –   An OLTP system to an analytical data warehouse
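The three stages above can be sketched in a few lines of Python. This is only an illustration of the Extract / Transform / Load idea, not how GeoKettle works internally: the CSV content, the `city` table and all values are made up for the example, and SQLite's `INSERT OR REPLACE` stands in for "insert new data, update existing data".

```python
import csv
import io
import sqlite3

# Hypothetical source data (made-up values) with inconsistent formatting.
SOURCE_CSV = """id,city,population
1, quebec ,1000
2,Barcelona,2000
3,montreal,not available
"""

def extract(text):
    """Extract: read raw rows from the (in-memory) CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise names, drop rows with invalid numbers."""
    cleaned = []
    for row in rows:
        try:
            population = int(row["population"])
        except ValueError:
            continue  # data cleansing: skip rows that cannot be repaired
        cleaned.append((int(row["id"]), row["city"].strip().title(), population))
    return cleaned

def load(conn, rows):
    """Load: insert new rows and overwrite rows whose key already exists."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS city ("
        "id INTEGER PRIMARY KEY, name TEXT, population INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO city VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, [(1, "Old name", 0)])            # pre-existing row that gets updated
load(conn, transform(extract(SOURCE_CSV)))  # the actual ETL run
result = conn.execute(
    "SELECT id, name, population FROM city ORDER BY id"
).fetchall()
print(result)
```

Row 1 is updated, row 2 is inserted, and row 3 is rejected during the transform stage because its population is not a number.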
Why use an ETL tool?
●   Automation of complex and repetitive data
    processing without writing any specific code
●   Conversion between various data formats
●   Migration of data from one DBMS to another
●   Feeding data into various DBMSs
●   Population of analytical data warehouses for
    decision support purposes
●   etc.
GeoKettle
●   GeoKettle is a "spatially-enabled" version of Pentaho Data
    Integration (Kettle)
●   Kettle is a metadata-driven ETL with direct execution of
    transformations
    –   No intermediate code generation!
●   Support of several DBMS and file formats
    –   DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL
        Server, ... (total of 37)
    –   Read/write support of various data file formats: text, Excel, Access,
        DBF, XML, ...
●   Numerous transformation steps
●   Support for methods of updating data warehouses (DW)
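To make "metadata-driven with direct execution" more concrete, here is a minimal Python sketch. It is an assumption-laden toy, not Kettle's engine: the step types (`rename`, `uppercase`, `filter`), their settings and the sample rows are all hypothetical. The point is only that the transformation is described as data and interpreted directly, with no code generated in between.

```python
# Each step type turns its settings (metadata) into a row-level function.
def make_rename(settings):
    old, new = settings["from"], settings["to"]
    def step(row):
        row = dict(row)
        row[new] = row.pop(old)
        return row
    return step

def make_uppercase(settings):
    field = settings["field"]
    def step(row):
        return {**row, field: row[field].upper()}
    return step

def make_filter(settings):
    field, minimum = settings["field"], settings["min"]
    def step(row):
        return row if row[field] >= minimum else None  # None drops the row
    return step

STEP_TYPES = {
    "rename": make_rename,
    "uppercase": make_uppercase,
    "filter": make_filter,
}

# The transformation itself is pure metadata (hypothetical example).
TRANSFORMATION = [
    {"type": "rename", "from": "nm", "to": "name"},
    {"type": "uppercase", "field": "name"},
    {"type": "filter", "field": "area_km2", "min": 100},
]

def run(transformation, rows):
    """Interpret the metadata directly: build the steps, then stream rows."""
    steps = [STEP_TYPES[meta["type"]](meta) for meta in transformation]
    for row in rows:
        for step in steps:
            row = step(row)
            if row is None:
                break          # row was filtered out by this step
        else:
            yield row          # row survived every step

rows_in = [{"nm": "lake a", "area_km2": 250}, {"nm": "pond b", "area_km2": 3}]
rows_out = list(run(TRANSFORMATION, rows_in))
print(rows_out)
```

Changing the behaviour means editing `TRANSFORMATION`, not the interpreter, which is the property the slide highlights.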
GeoKettle
●   GeoKettle provides a true and consistent integration of the spatial
    component
    –   All steps provided by Kettle are able to deal with geospatial data types
    –   Some dedicated geospatial steps have been added
●   First release in May 2008: 2.5.2-20080531
●   Current stable version: 3.2.0-r188-20090706
●   To be released shortly: GeoKettle 2.0 with many new features!
●   Released under LGPL at http://www.geokettle.org
●   Used in different organizations and countries:
    –   Ministries, banks, insurance companies, integrators, …
    –   E.g. GeoETL from Inova is in fact GeoKettle! :-)
●   A growing community of users and developers
GeoKettle
●   Transformations vs. Jobs:
    –   Running in parallel vs. running sequentially
●   All can be stored in a central repository (database)
    –   But each transformation or job can also be saved in a simple XML
        file!
●   Offers different interfaces:
    –   Spoon: GUI for editing transformations and jobs
    –   Pan: command line interface for running transformations
    –   Kitchen: command line interface for running jobs
    –   Carte: Web service for the remote execution of transformations and
        jobs
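The "parallel vs. sequential" distinction can be pictured with a small Python sketch. It rests on an assumption about the execution model (every step of a transformation runs at the same time and rows stream from one step to the next, while job entries run one after another); the function names and the use of threads and queues are illustrative only and are not GeoKettle's API.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream

def run_step(func, inbox, outbox):
    """One transformation step: consume rows, emit transformed rows."""
    while True:
        row = inbox.get()
        if row is DONE:
            outbox.put(DONE)   # pass the end-of-stream marker downstream
            return
        outbox.put(func(row))

def run_transformation(rows, funcs):
    """Start every step at once; rows stream through a chain of queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(funcs)
    ]
    for thread in threads:
        thread.start()
    for row in rows:
        queues[0].put(row)
    queues[0].put(DONE)
    out = []
    while True:
        row = queues[-1].get()
        if row is DONE:
            break
        out.append(row)
    for thread in threads:
        thread.join()
    return out

def run_job(entries):
    """A job: each entry starts only after the previous one has finished."""
    return [entry() for entry in entries]

rows = run_transformation([1, 2, 3], [lambda r: r * 10, lambda r: r + 1])
log = run_job([lambda: "load staging", lambda: "build warehouse"])
print(rows, log)
```

In the transformation, the second step can already process the first row while the first step is still working on later rows; in the job, "build warehouse" cannot start before "load staging" has returned.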
GeoKettle - Spoon
GeoKettle
●   Provides support for:
    –   Handling geometry data types (based on JTS)
    –   Accessing Geometry objects in JavaScript, which allows users to define
        custom transformation steps (“Modified JavaScript Value” step)
    –   Topological predicates (intersects, crosses, etc.) and aggregation operators
        (envelope, union, geometry collection, ...)
    –   SRS definition and transformations
    –   Input / Output with some spatial DBMS
          - Native support for Oracle, PostGIS and MySQL
          - MS SQL Server 2008 and IBM DB2 can be used, but this
            requires some tricks
    –   GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33
        vector data formats and DBMS)
    –   Cartographic preview
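As a rough illustration of what a predicate and an aggregation operator do, the Python sketch below works on bounding boxes only. It is a deliberate simplification and not the JTS API: the `Envelope` class, the helper names and the coordinates are hypothetical. An envelope test is only a first filter, since overlapping envelopes are necessary but not sufficient for two geometries to intersect.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding box of a geometry."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other):
        """Predicate: True when the two boxes share at least one point."""
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y)

    def union(self, other):
        """Smallest envelope containing both envelopes."""
        return Envelope(min(self.min_x, other.min_x), min(self.min_y, other.min_y),
                        max(self.max_x, other.max_x), max(self.max_y, other.max_y))

def envelope_of(coords):
    """Envelope of one geometry given as a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return Envelope(min(xs), min(ys), max(xs), max(ys))

def aggregate_envelope(geometries):
    """Aggregation operator: one envelope for a whole group of geometries."""
    envelopes = [envelope_of(geometry) for geometry in geometries]
    result = envelopes[0]
    for envelope in envelopes[1:]:
        result = result.union(envelope)
    return result

# Made-up square parcels.
parcel_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
parcel_b = [(1, 1), (3, 1), (3, 3), (1, 3)]
parcel_c = [(5, 5), (6, 5), (6, 6), (5, 6)]

a_b = envelope_of(parcel_a).intersects(envelope_of(parcel_b))
a_c = envelope_of(parcel_a).intersects(envelope_of(parcel_c))
extent = aggregate_envelope([parcel_a, parcel_b, parcel_c])
print(a_b, a_c, extent)
```

A predicate is evaluated per pair of rows, whereas an aggregation operator collapses many rows into one geometry, much like `SUM` or `MAX` does for numbers.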
GeoKettle
●   GeoKettle releases are aligned with those of Pentaho Data
    Integration (Kettle).
    –   GeoKettle thus benefits from all new features provided by
        PDI (Kettle).
●   Kettle is natively designed to be deployed in cluster and
    web service environments.
    –   This makes GeoKettle well suited to deployment as a service
        (SaaS) in cloud computing environments such as those
        provided by Amazon EC2.
    –   It then enables scalable, distributed and on-demand processing
        of large and complex volumes of geospatial data in minutes for
        critical applications, without requiring a company to invest in an
        expensive IT infrastructure of servers, networks and software.
GeoKettle – Requirements and install
●   Very simple installation procedure
●   All you need is a Java Runtime Environment
    –   Version 5 or higher
●   Just unzip the binary archive of GeoKettle ...
●   And let's go!
    –   Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows)
●   Need help? Please visit our wiki:
    –   http://wiki.spatialytics.org
GeoKettle



- Demo -
GeoKettle
●   Upcoming features:
    –   Implementation of data matching and conflation steps/jobs in order to
        allow geometric data cleansing and comparison of geospatial datasets
        (result of a Google Summer of Code project; should be available in
        version 2.x)
    –   Read/write support for other DBMS, GIS file formats and services
          - LAS (LiDAR), ...
          - Native support for MS SQL Server 2008, ...
          - WFS-T, Sensor Web (TML, SensorML, SOS, ...), ...
          - GIS metadata and CSW
    –   Implementation of a “Spatial analysis” step with a GUI
    –   Dedicated steps for social media (Twitter, ...), OSM, generalization, ...
    –   Support of the third dimension
    –   Raster support: development in progress of a plugin to integrate all
        capabilities provided by the Sextante library (BeETLe)
Questions?
●   Thanks for your attention and do not hesitate to ask for more
    demos!
●   Contact:
    Dr. Thierry Badard, CTO
    Spatialytics inc.
    Quebec, Canada
    Email: tbadard@spatialytics.com
    Web: http://www.spatialytics.org
          http://www.spatialytics.com
    Twitter: tbadard & spatialytics


                      http://www.geokettle.org      Twitter : geokettle

                      http://www.geo-mondrian.org   Twitter : geomondrian

                      http://www.solaplayers.org    Twitter : solaplayers

GeoKettle: A powerful open source spatial ETL tool

  • 1. GeoKettle: A powerful open source spatial ETL tool FOSS4G 2010 Dr. Thierry Badard, CTO Spatialytics inc. Quebec, Canada tbadard@spatialytics.com Barcelona, Spain – Sept 9th, 2010
  • 2. What is GeoKettle? ● It is part of the geospatial BI software stack developed initially by the GeoSOA research group at Laval University in Quebec … – GeoKettle – GeoMondrian – SOLAPLayers ● But are now developed and supported by Spatialytics – http://www.spatialytics.org (open source community) – http://www.spatialytics.com (professional support, training) ● OK but … what is geospatial BI? ;-)
  • 3. As you probably know … ● Business Intelligence applications are usually used to better understand historical, current and future aspects of business operations in a company. ● The applications typically offer ways to mine database- and spreadsheet-centric data, and produce graphical, table-based and other types of analytics regarding business operations. ● They support the decision process and allow to take more informed decision!
  • 4. Data visualization to support decision …
  • 5. As you probably know … ● Business Intelligence applications are usually used to better understand historical, current and future aspects of business operations in a company. ● The applications typically offer ways to mine database- and spreadsheet-centric data, and produce graphical, table-based and other types of analytics regarding business operations. ● They support the decision process and allow to take more informed decision! ● Rely on an architecture with robust components and applications: βˆ’ ETL tools & data warehousing (DW) βˆ’ On-line Analytical Processing (OLAP) servers and clients βˆ’ Reporting tools & dashboards βˆ’ Data mining
  • 6. So, an ETL tool is … ● A type of software used to populate databases or data warehouses from heterogeneous data sources. ● ETL stands for: βˆ’ Extract – Extract data from data sources βˆ’ Transform – Transformation of data in order to correct errors, make some data cleansing, change the data structure, make them compliant to defined standards, etc. βˆ’ Load – Load transformed data into the target DBMS ● An ETL tool should manage the insertion of new data and the updating of existing data. ● Should be able to perform transformations from : βˆ’ An OLTP system to another OLTP system βˆ’ An OLTP system to an analytical data warehouse
  • 7. Why use an ETL tool? ● Automation of complex and repetitive data processing without writing any custom code ● Conversion between various data formats ● Migration of data from one DBMS to another ● Data feeding into various DBMS ● Population of analytical data warehouses for decision support purposes ● etc.
  • 8. GeoKettle ● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle) ● Kettle is a metadata-driven ETL tool with direct execution of transformations – No intermediate code generation! ● Support for several DBMS and file formats – DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, ... (total of 37) – Read/write support for various data file formats: text, Excel, Access, DBF, XML, ... ● Numerous transformation steps ● Support for methods to update data warehouses (DW)
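The idea of a metadata-driven engine with no intermediate code generation can be sketched as follows. This is only an illustration of the principle, not how Kettle is implemented: the step types, the `TRANSFORMATION` description and the `run` function are hypothetical names. The transformation is plain data, and a small engine interprets it directly, row by row.

```python
def step_uppercase(row, field):
    """Return a copy of the row with one field upper-cased."""
    out = dict(row)
    out[field] = out[field].upper()
    return out


def step_filter_min(row, field, minimum):
    """Keep the row only if the field reaches the minimum; otherwise drop it."""
    return row if row[field] >= minimum else None


# Hypothetical registry mapping a step type name to its implementation.
STEP_TYPES = {"uppercase": step_uppercase, "filter_min": step_filter_min}

# The transformation itself is metadata, not code.
TRANSFORMATION = [
    {"type": "uppercase", "params": {"field": "name"}},
    {"type": "filter_min", "params": {"field": "population", "minimum": 1000}},
]


def run(transformation, rows):
    """Interpret the metadata directly: apply each step to each row in order."""
    for row in rows:
        for step in transformation:
            row = STEP_TYPES[step["type"]](row, **step["params"])
            if row is None:
                break  # the row was filtered out by this step
        else:
            yield row


rows = [{"name": "quebec", "population": 5000}, {"name": "hamlet", "population": 10}]
output = list(run(TRANSFORMATION, rows))
print(output)
```

Changing the behaviour means editing the `TRANSFORMATION` description only; nothing is compiled or generated in between.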
  • 9. GeoKettle ● GeoKettle provides a true and consistent integration of the spatial component – All steps provided by Kettle are able to deal with geospatial data types – Some geospatial dedicated steps have been added ● First release in May 2008: 2.5.2-20080531 ● Current stable version: 3.2.0-r188-20090706 ● To be released shortly: GeoKettle 2.0 with many new features! ● Released under LGPL at http://www.geokettle.org ● Used in different organizations and countries: – Some ministries, bank, insurance, integrators, … – E.g. GeoETL from Inova is in fact GeoKettle! :-) ● A growing community of users and developers
  • 10. GeoKettle ● Transformations vs. Jobs: – Running in parallel vs. running sequentially ● All can be stored in a central repository (database) – But each transformation or job can also be saved in a simple XML file! ● Offers different interfaces: – Spoon: GUI for editing transformations and jobs – Pan: command line interface for running transformations – Kitchen: command line interface for running jobs – Carte: Web service for the remote execution of transformations and jobs
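As a rough sketch of how Pan and Kitchen might be driven from a script, the helper below builds the argument list for either tool. The `-file=` and `-level=` options are the ones documented for Pan and Kitchen in Pentaho Data Integration on UNIX-like systems; the script locations and the `.ktr`/`.kjb` paths are hypothetical, and the helper name `kettle_command` is an assumption made for this example.

```python
import shlex


def kettle_command(script, path, log_level="Basic"):
    """Build the argument list for Pan (transformations) or Kitchen (jobs)."""
    return [script, f"-file={path}", f"-level={log_level}"]


# Hypothetical file locations: a transformation saved as XML (.ktr)
# and a job saved as XML (.kjb).
pan_cmd = kettle_command("./pan.sh", "/data/etl/load_parcels.ktr")
kitchen_cmd = kettle_command("./kitchen.sh", "/data/etl/nightly.kjb", log_level="Minimal")

print(shlex.join(pan_cmd))
print(shlex.join(kitchen_cmd))

# With GeoKettle unzipped in the current directory, the commands could be
# executed with: subprocess.run(pan_cmd, check=True)
```

Keeping the command as a list rather than a single string avoids quoting problems when a path contains spaces.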
  • 12. GeoKettle ● Provides support for: – Handling geometry data types (based on JTS) – Accessing Geometry objects in JavaScript - This allows users to define custom transformation steps (“Modified JavaScript Value” step) – Topological predicates (intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, ...) – SRS definition and transformations – Input / Output with some spatial DBMS - Native support for Oracle, PostGIS and MySQL - MS SQL Server 2008 and IBM DB2 can be used but require some tricks – GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data formats and DBMS) – Cartographic preview
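The kinds of operations listed above can be illustrated with a small pure-Python sketch. It does not use JTS or GeoKettle: it works on bounding boxes only, so `envelopes_intersect` is a coarse stand-in for a true topological predicate, and the coordinates are made up. The SRS example uses the standard spherical Web Mercator formula to convert WGS 84 longitude/latitude to metres. All function names are assumptions chosen for this example.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by spherical Web Mercator


def envelope(points):
    """Aggregate (x, y) points into their bounding box (minx, miny, maxx, maxy)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def envelopes_intersect(a, b):
    """Bounding-box test: True if the two envelopes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_envelopes(envelopes):
    """Aggregate several envelopes into the envelope that covers them all."""
    return (
        min(e[0] for e in envelopes),
        min(e[1] for e in envelopes),
        max(e[2] for e in envelopes),
        max(e[3] for e in envelopes),
    )


def to_web_mercator(lon_deg, lat_deg):
    """SRS transformation: WGS 84 lon/lat in degrees to spherical Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return (x, y)


# Made-up coordinates for illustration.
parcels = envelope([(0, 0), (4, 3)])
roads = envelope([(2, 1), (6, 5)])
remote = envelope([(10, 10), (12, 11)])

print(envelopes_intersect(parcels, roads))   # overlapping boxes
print(envelopes_intersect(parcels, remote))  # disjoint boxes
print(merge_envelopes([parcels, roads]))
print(to_web_mercator(180.0, 0.0))
```

A real predicate such as JTS `intersects` tests the exact geometries; a bounding-box check like this one is typically only a cheap first filter.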
  • 13. GeoKettle ● GeoKettle releases are aligned with those of Pentaho Data Integration (Kettle). – GeoKettle thus benefits from all new features provided by PDI (Kettle). ● Kettle is natively designed to be deployed in cluster and web service environments. – This makes GeoKettle well suited to deployment as a service (SaaS) in cloud computing environments such as those provided by Amazon EC2. – It thus enables the scalable, distributed and on-demand processing of large and complex volumes of geospatial data in minutes for critical applications, without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
  • 14. GeoKettle – Requirements and install ● Very simple installation procedure ● All you need is a Java Runtime Environment – Version 5 or higher ● Just unzip the binary archive of GeoKettle ... ● And let’s go! – Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows) ● Need help? Please visit our wiki: – http://wiki.spatialytics.org
  • 16. GeoKettle ● Upcoming features: – Implementation of data matching and conflation steps/jobs in order to allow geometric data cleansing and comparison of geospatial datasets (results of a Google Summer of Code, should be available in version 2.x) – Read/write support for other DBMS, GIS file formats and services – LAS (LiDAR), ... – Native support for MS SQL Server 2008, ... – WFS-T, Sensor Web (TML, SensorML, SOS, ...), ... – GIS metadata and CSW – Implementation of a “Spatial analysis” step with a GUI – Dedicated steps for social media (Twitter, ...), OSM, generalization, ... – Support of the third dimension – Raster support: development in progress of a plugin to integrate all capabilities provided by the Sextante library (BeETLe)
  • 17. Questions? ● Thanks for your attention and do not hesitate to ask for more demos! ● Contact: Dr. Thierry Badard, CTO Spatialytics inc. Quebec, Canada Email: tbadard@spatialytics.com Web: http://www.spatialytics.org http://www.spatialytics.com Twitter: tbadard & spatialytics http://www.geokettle.org Twitter : geokettle http://www.geo-mondrian.org Twitter : geomondrian http://www.solaplayers.org Twitter : solaplayers