Spatial Data Integrator - Software Presentation and Use Cases

Spatial Data Integrator: software presentation and use cases
National Geographic Community Meeting Day, Ministry of Ecology and Sustainable Development – Ministry of Agriculture, mathieu.rajerison
Summary
- Software presentation
  - General aspects
  - Place of an ETL inside a data infrastructure
  - The different interface elements of SDI
- Demonstration: joining data and managing rejects
  - Configuring the access and creating the schemas
  - Connecting the components inside the workspace
  - Configuring the tMap component
  - Executing the job
- Use cases
  - Scheduling the aggregation of different sources of data
  - Merging layers
  - Chaining the quality checking of layers
  - Migrating data to PostgreSQL/PostGIS
  - Other applications
- Conclusion
  - Some other functionalities
  - Links
1- Software presentation
General aspects
- Open-source ETL (Extract, Transform, Load) software created by Camptocamp
- Based on Talend Open Studio (TOS)
- Adds a spatial layer to TOS through geospatial access and processing components
- Developed in Java: Eclipse environment, uDig elements, GeoTools library, Java Topology Suite, Sextante
Place of an ETL in a data infrastructure [diagram labels: Dashboards, Portal]
The interface elements, the map window: this window is used to visualize geographic data. It is useful for checking the results of a processing step. This window comes from the uDig software.
The tool, the business modeler: the business modeler is used to model the job processes. It allows a wide audience to take part in designing the data flow and to follow the progress of development, without requiring any computer skills. Modelling in this window has no impact on job execution.
The interface elements, the repository metadata tab: the repository contains, among other things, the metadata section, where data access parameters are stored. The image shows the different types of data sources. Note that geographic data sources are not configured in the metadata section (we will see this later in the demo).
The interface elements, the graphical workspace: the main window is where you build your jobs. You pick components and drop them here. There are different types of connections between components, which are not detailed in this presentation.
The interface elements, the components palette: the palette contains the different components; it is a kind of toolbox. Spatial Data Integrator adds the geographic components to it. The palette can be extended through contributions from developers: since the software is open source, you can develop your own components.
The interface elements, the configuration tab: the bottom window is where you configure the behaviour of each component. It also lets you set the execution parameters of your job.
2- Demonstration: how to manage outer joins
Configuring the data access and creating the schemas: the first step consists of configuring access to your data source.
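A schema is the ordered list of typed columns that a component expects to read or write. The sketch below illustrates that idea under stated assumptions: a schema is modelled as typed columns and used to check an incoming row. The class and method names are hypothetical and are not part of the SDI or Talend API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a schema: an ordered list of typed columns (not the SDI/Talend API). */
final class SchemaSketch {
    /** One column: its name, the Java type its values must have, and whether null is allowed. */
    record Column(String name, Class<?> type, boolean nullable) {}

    private final List<Column> columns;

    SchemaSketch(List<Column> columns) {
        this.columns = List.copyOf(columns);
    }

    /** Returns one message per violation; an empty list means the row fits the schema. */
    List<String> validate(Map<String, Object> row) {
        List<String> errors = new ArrayList<>();
        for (Column c : columns) {
            if (!row.containsKey(c.name())) {
                errors.add("missing column " + c.name());
                continue;
            }
            Object value = row.get(c.name());
            if (value == null) {
                if (!c.nullable()) errors.add("null value in " + c.name());
            } else if (!c.type().isInstance(value)) {
                errors.add(c.name() + " expects " + c.type().getSimpleName()
                        + " but got " + value.getClass().getSimpleName());
            }
        }
        return errors;
    }
}
```

Schemas in the tool also hold properties such as length or precision, which this sketch leaves out.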
Connecting the components inside the workspace: you place the components in the workspace and connect them.
Configuring the tMap component: here, the city name links the two tables. Two output flows are generated: one for the inner join results, and one for the rows rejected by the inner join, which make up the outer join part.
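The sketch below shows in plain Java what this tMap configuration does, assuming simple in-memory rows. The record names (`Entry`, `City`, `Joined`, `Result`) are hypothetical, and this is not the code tMap generates. The lookup table is loaded into a map keyed on the city name. Each main-flow row then goes either to the matched output or to the reject output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a lookup join with a reject flow, in the spirit of tMap. */
final class LookupJoin {
    record Entry(String city, String value) {}                 // main flow row (hypothetical)
    record City(String name, String code) {}                   // lookup row (hypothetical)
    record Joined(String city, String value, String code) {}   // inner join output row

    /** The two output flows: rows that found a match, and rows that did not. */
    record Result(List<Joined> matched, List<Entry> rejects) {}

    static Result join(List<Entry> main, List<City> lookup) {
        // Load the lookup table in memory, keyed on the join column (the city name).
        Map<String, City> byName = new HashMap<>();
        for (City c : lookup) byName.putIfAbsent(c.name(), c);

        List<Joined> matched = new ArrayList<>();
        List<Entry> rejects = new ArrayList<>();
        for (Entry e : main) {                                  // process the main flow row by row
            City c = byName.get(e.city());
            if (c != null) {
                matched.add(new Joined(e.city(), e.value(), c.code()));
            } else {
                rejects.add(e);                                 // no match: goes to the reject output
            }
        }
        return new Result(matched, rejects);
    }
}
```

Taken together, `matched` plus `rejects` cover every main-flow row, which is what a left outer join returns. Splitting the rejects out makes the unmatched city names easy to inspect.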
The job execution: the job can now be executed. There are two execution modes:
- statistics mode displays the number of rows passing through each flow
- traces mode displays the content of those rows
Both modes are updated in streaming, while the job runs.
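The sketch below makes the two modes concrete, assuming a flow can be modelled as a Java `Iterator`. The `MonitoredFlow` class is hypothetical and does not describe how the studio implements these modes. Each row is counted as it passes (statistics) and can optionally be handed to a tracer (traces), without buffering the flow.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical sketch: counts rows (statistics) and optionally reports them (traces) while streaming. */
final class MonitoredFlow<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Consumer<T> tracer;   // null when traces mode is off
    private long count;

    MonitoredFlow(Iterator<T> source, Consumer<T> tracer) {
        this.source = source;
        this.tracer = tracer;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public T next() {
        T row = source.next();                   // one row at a time, nothing is buffered
        count++;                                 // statistics: running row count
        if (tracer != null) tracer.accept(row);  // traces: expose the row content
        return row;
    }

    long count() {
        return count;
    }
}
```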
Going further, detecting similarities between rows: here, we use a fuzzy matching component named tFuzzyMatch. It detects similarities between rows coming from two different flows, which helps find the rows of a reference (lookup) table that best match the outer join results.
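tFuzzyMatch can use different matching measures, and Levenshtein distance is one of them. As an illustrative assumption, the sketch below uses only the Levenshtein (edit) distance. For a rejected city name, the hypothetical `closest` method returns the lookup value with the smallest distance, ignoring case.

```java
import java.util.List;

/** Hypothetical sketch of Levenshtein-based matching between a rejected key and a lookup list. */
final class FuzzyMatch {
    /** Edit distance: minimum number of insertions, deletions and substitutions. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;   // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;                                       // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;         // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Returns the lookup value closest to {@code key}; ties keep the first candidate. */
    static String closest(String key, List<String> lookup) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : lookup) {
            int d = levenshtein(key.toLowerCase(), candidate.toLowerCase());
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }
}
```

For example, `closest("Sant-Etienne", List.of("Saint-Etienne", "Lyon", "Nantes"))` returns `"Saint-Etienne"`, which is one edit away. In practice, a maximum distance threshold would keep poor matches out of the results.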
3- Use cases
Scheduling the aggregation of data: a web geographic portal requires data from different sources to be joined periodically. Here, the source is an Access database fed by users, and we associate its entries with the city objects. [Diagram labels: WMS, Access, SHP, BDCARTO, Map Server, Sybase, XML, ..., Client part, SCP, SHP]
Scheduling the aggregation of data, possible schedulers:
- SDI task scheduler
- crontab in a Linux environment
- Windows Task Scheduler
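Whichever scheduler is used, the assumption here is that the job has been exported so it can run outside the studio. With crontab, a daily run at 02:00 corresponds to an entry starting with `0 2 * * *`. The same logic can be sketched in plain Java with a `ScheduledExecutorService`. The `DailySchedule` class and the chosen time are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: run a job once a day at a fixed time, like a daily crontab entry. */
final class DailySchedule {
    /** Time to wait from {@code now} until the next occurrence of {@code runAt}. */
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) next = next.plusDays(1);   // today's slot has already passed
        return Duration.between(now, next);
    }

    /** Schedules {@code job} every 24 hours; the caller must shut the returned executor down. */
    static ScheduledExecutorService scheduleDaily(Runnable job, LocalTime runAt) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long initialDelay = delayUntilNext(LocalDateTime.now(), runAt).toSeconds();
        scheduler.scheduleAtFixedRate(job, initialDelay, TimeUnit.DAYS.toSeconds(1), TimeUnit.SECONDS);
        return scheduler;
    }
}
```

A fixed 24-hour period drifts by an hour across daylight-saving changes, whereas crontab follows wall-clock time. This is one reason to prefer the system scheduler for production runs.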
Merging layers: imagine a data infrastructure where a geographic layer is split into as many files as there are cities, so there is one file per city. The job merges all these files into a single table. [Diagram: SHP1, SHP2, SHP3, SHP4, SHP5 merged into one SHP]
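The merge can be sketched in plain Java, assuming each city file has already been read into a list of attribute maps keyed by city. The `LayerMerge` class and the added `source_city` column are hypothetical. Rows are appended into one list, each one is tagged with the file it came from, and the merge stops if a file does not share the columns of the first one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: append one-file-per-city layers into a single table. */
final class LayerMerge {
    static List<Map<String, Object>> merge(Map<String, List<Map<String, Object>>> layersByCity) {
        List<Map<String, Object>> merged = new ArrayList<>();
        Set<String> expected = null;                        // column set of the first row seen
        for (Map.Entry<String, List<Map<String, Object>>> layer : layersByCity.entrySet()) {
            for (Map<String, Object> row : layer.getValue()) {
                if (expected == null) {
                    expected = new HashSet<>(row.keySet());
                } else if (!expected.equals(row.keySet())) {
                    throw new IllegalArgumentException("schema mismatch in layer " + layer.getKey());
                }
                Map<String, Object> out = new LinkedHashMap<>(row);
                out.put("source_city", layer.getKey());     // remember the original file
                merged.add(out);
            }
        }
        return merged;
    }
}
```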
Merging layers
Chaining the quality control of digitized documents: after digitizing a huge mass of data, we must run a complete check on it. Both the geometry of the objects and their attributes must be verified. This task is very time-consuming with usual mapping software. The controls are:
- checking the table structure
- checking the content
- checking geometric compliance
- comparison to the reference data
Chaining the quality control of digitized documents: with a single click, SDI runs this series of controls (table structure, content, geometric compliance, comparison to the reference data). Reports list the errors related to the geometric compliance of the objects or to their attribute values.
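The sketch below illustrates the idea of chaining controls into one error report and covers the first three controls. Geometry is simplified to a single polygon ring of x,y pairs instead of a real GeoTools/JTS geometry. The class, the checked column and the allowed codes are hypothetical. The area test uses the shoelace formula.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: chain structure, content and geometry checks into one error report. */
final class QualityChain {
    /** A feature: attributes plus one polygon ring given as x,y pairs (simplified geometry). */
    record Feature(String id, Map<String, Object> attributes, double[][] ring) {}

    /** Signed shoelace area of a closed ring (first point repeated at the end). */
    static double area(double[][] ring) {
        double sum = 0;
        for (int i = 0; i + 1 < ring.length; i++) {
            sum += ring[i][0] * ring[i + 1][1] - ring[i + 1][0] * ring[i][1];
        }
        return sum / 2;
    }

    static List<String> check(List<Feature> features, Set<String> requiredColumns,
                              String codeColumn, Set<String> allowedCodes) {
        List<String> report = new ArrayList<>();
        for (Feature f : features) {
            // 1. Table structure: every required column is present.
            for (String col : requiredColumns) {
                if (!f.attributes().containsKey(col)) report.add(f.id() + ": missing column " + col);
            }
            // 2. Content: the code attribute takes one of the allowed values.
            Object code = f.attributes().get(codeColumn);
            if (code != null && !allowedCodes.contains(code)) report.add(f.id() + ": invalid value " + code);
            // 3. Geometric compliance: closed ring, at least 4 points, non-zero area.
            double[][] r = f.ring();
            boolean closed = r.length > 0
                    && r[0][0] == r[r.length - 1][0] && r[0][1] == r[r.length - 1][1];
            if (r.length < 4 || !closed) {
                report.add(f.id() + ": ring not closed or too short");
            } else if (area(r) == 0) {
                report.add(f.id() + ": degenerate polygon");
            }
        }
        return report;
    }
}
```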
Chaining the quality control of digitized documents
Chaining the quality control of digitized documents: job comparing the urban planning project map to the cadastral reference data.
Chaining the quality control of digitized documents, functions used in the tMap joining component:
- row4.the_geom.symDifference(row2.the_geom) (result type: geometry)
- GeometryOperation.GETAREA(row4.the_geom.difference(row2.the_geom)) (result type: float)
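The reasoning behind these two expressions can be written as area identities. Let $A$ stand for `row4.the_geom` and $B$ for `row2.the_geom`. Since $A \setminus B$ and $A \cap B$ partition $A$, and the symmetric difference is the disjoint union of $A \setminus B$ and $B \setminus A$, the following holds:

```latex
\begin{align*}
\operatorname{area}(A \setminus B) &= \operatorname{area}(A) - \operatorname{area}(A \cap B) \\
\operatorname{area}(A \,\triangle\, B) &= \operatorname{area}(A \setminus B) + \operatorname{area}(B \setminus A) \\
  &= \operatorname{area}(A) + \operatorname{area}(B) - 2\,\operatorname{area}(A \cap B)
\end{align*}
```

A zero value for the first quantity means that $A$ lies inside $B$, up to differences of zero area. A zero value for the second means the two geometries coincide, up to the same tolerance. Non-zero values measure how far the digitized object departs from the reference.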
Migrating data into a PostgreSQL/PostGIS database: at a regional scale, we want to pool data and integrate it into a PostgreSQL/PostGIS database management system. [Diagram: folder tree migrated to a relational database system]
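The sketch below illustrates loading features into PostGIS through JDBC. It assumes geometries are passed as WKT and converted server-side with PostGIS's `ST_GeomFromText`. The class, table, column names and SRID in the example are hypothetical, and this is not SDI's own output component. It only builds the connection URL and a parameterized INSERT.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical sketch: JDBC URL and parameterized INSERT for loading features into PostGIS. */
final class PostgisLoad {
    /** PostgreSQL JDBC URL form: jdbc:postgresql://host:port/database. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:postgresql://" + host + ":" + port + "/" + database;
    }

    /**
     * INSERT with one placeholder per attribute; the geometry placeholder receives WKT text.
     * Table and column names are concatenated as-is, so they must be trusted identifiers.
     */
    static String insertSql(String table, List<String> columns, String geomColumn, int srid) {
        StringJoiner names = new StringJoiner(", ");
        StringJoiner values = new StringJoiner(", ");
        for (String c : columns) {
            names.add(c);
            values.add("?");
        }
        names.add(geomColumn);
        values.add("ST_GeomFromText(?, " + srid + ")");
        return "INSERT INTO " + table + " (" + names + ") VALUES (" + values + ")";
    }
}
```

For example, `insertSql("communes", List.of("name", "code"), "the_geom", 2154)` produces `INSERT INTO communes (name, code, the_geom) VALUES (?, ?, ST_GeomFromText(?, 2154))`. Here 2154 is the EPSG code of Lambert-93. The statement would then be run with a `PreparedStatement`, one `setString` per placeholder. This requires the PostgreSQL JDBC driver on the classpath.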
Migrating data into a PostgreSQL/PostGIS database
Other applications
- Mass geometric processing: splitting or slicing objects using the objects of another layer
- Dividing an image into multiple images, each cut along a city boundary and named after that city
- Using Talend with GDAL/OGR: conversion to other formats
- Mass reprojections
- Extending the possibilities with auxiliary Java libraries
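The sketch below illustrates combining a job with GDAL/OGR for mass reprojection. It builds one `ogr2ogr` command per input file, which a Java step could then launch with `ProcessBuilder`. The class name, folder layout and target SRS are hypothetical. The flags used (`-f` for the output driver and `-t_srs` for the target SRS) and the destination-before-source argument order are standard `ogr2ogr` usage.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one ogr2ogr command per input file for a mass reprojection. */
final class Ogr2ogrBatch {
    static List<String> command(Path input, Path outputDir, String targetSrs) {
        Path output = outputDir.resolve(input.getFileName());   // same file name, new folder
        // ogr2ogr takes the destination before the source: ogr2ogr [options] dst src
        return List.of("ogr2ogr", "-f", "ESRI Shapefile", "-t_srs", targetSrs,
                output.toString(), input.toString());
    }

    static List<List<String>> commands(List<Path> inputs, Path outputDir, String targetSrs) {
        List<List<String>> all = new ArrayList<>();
        for (Path in : inputs) all.add(command(in, outputDir, targetSrs));
        return all;
    }
}
```

Each command list can be run with `new ProcessBuilder(cmd).inheritIO().start()`, provided GDAL is installed and `ogr2ogr` is on the `PATH`. Passing arguments as a list avoids shell quoting problems with names such as "ESRI Shapefile".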
Conclusion
- Drastically shortens the delay between data collection and its use
- Enables migrating and consolidating spatial data infrastructures
- Simplifies usually time-consuming tasks
- Avoids errors caused by repeating manual operations, and improves the quality of controls
- A very active community
- New components are on the way
Some other functionalities
- Reads multiple formats, including GPX, WFS and "contemporary" standards: OpenStreetMap, GeoRSS
- Multiple data access methods: SCP, FTP, web services, POP
- Automatic metadata creation: MEF and XML files for GeoNetwork
- Raster processing using Sextante
Conclusion: links
- Learn how to use Talend
  - A general documentation, and one dedicated to the components, covering multiple use cases
- Learn how to use Spatial Data Integrator
  - A wiki
- Meet the community of users
  - The Spatial Data Integrator forum hosted by Talend
