Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks

Slides of the presentation I gave at the OSGIS 2011 conference, June 21-22, 2011, Nottingham, UK.

  • 1. Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks ● Prof. Thierry Badard ● 3rd OSGIS Conference, Nottingham University, Nottingham, UK – June 21-22, 2011
  • 2. Content of the presentation ● Open data / Data deluge ● BI & Geospatial BI (GeoBI) ● Definition ● Architecture ● Offerings ● Open source GeoBI solutions ● Conclusion & outlooks
  • 3. Open data, OSM, social networks, mobility ...
  • 4. Open data, OSM, social networks, mobility ...
  • 5. Open data, OSM, social networks, mobility ...
  • 6. Open data, OSM, social networks, mobility ...
  • 7. Open data, OSM, social networks, mobility ...
  • 8. Open data, OSM, social networks, mobility ...
  • 9. Open data, OSM, social networks, mobility ...
  • 10. Open data, OSM, social networks, mobility ...
  • 11. Are you ready for the data deluge?
  • 12. Nowadays, companies and public bodies … ● Store and face huge amounts of very diverse and heterogeneous data ● Sales ● Stocks ● Socio-economic data ● Surveys ● … ● Impact assessment of their advertising campaigns ● Website visits ● Social networks
  • 13. In a world that changes rapidly ... ● Organizations must make informed, strategic decisions more and more often, and more and more quickly, in order to stay competitive ● So they need to analyze these huge amounts of data rapidly, simply and efficiently ● To do so, they increasingly use BI (Business Intelligence) tools ● These tools produce: ● Summary reports (PDF, Excel, ...) ● Interactive analytical dashboards
  • 14. Reports and dashboards
  • 15. Reports and dashboards
  • 16. Reports and dashboards
  • 17. Reports and dashboards
  • 18. Reports and dashboards
  • 19. Reports and dashboards
  • 20. Location is everywhere! ;-) ● About 80% of all data stored in corporate databases has a spatial component [Franklin, 1992] ● Mailing address ● Postal/ZIP code ● Coordinates ● IP address, phone number, … ● Like time, space (location) is a crucial dimension that should be taken into account when analyzing corporate data!
  • 21. How to take location into account … ● It requires the right tools to observe, handle and explore this type of information in BI tools ... ● Imagine you are HSBC … or a ministry of public safety … or the government of Haiti ... ● Do you think that cross tables or charts are the best media to answer these types of questions?
  • 22. The right medium is obviously a map!
  • 23. The right medium is obviously a map!
  • 24. The right medium is obviously a map!
  • 25. So, GeoBI (Geospatial BI) ... ● Brings: ● The map ... ● and all the tools required to represent, navigate and analyse the geospatial dimension ● Into the very core of the BI tools on the market ● In order to fully take the spatial dimension into account in corporate data analysis and in the decision processes based on these data!
  • 26. BI and geospatial industry ● Industry: Business Intelligence (BI), Geospatial, Web 2.0, GeoBI ● Size: 3 billion $ in software (total of 15 billion $ in IT); 8.8 billion $ in software (increase of 22% in 2008) ● Players
  • 27. Back in 2005 ... ● GeoSOA research team: research in BI, geospatial and mobility ● Purposes: ● Propose ubiquitous and mobile decision-making tools which make use of the geospatial dimension (mobile GeoBI), ● And find ways to deliver business information adapted to the context of the user (location, environment, situation, purpose, ...) ● But, unfortunately: ● No GeoBI tool was available that enabled: – The processing of large volumes of information coming from very diverse and heterogeneous data sources – The rapid delivery of these data to numerous concurrent users in connected/disconnected modes (web services architecture) – The support of standards dedicated to mobility – With key GeoBI capabilities and a rich, comprehensive and extensible API at an affordable cost – Relying on a robust and complete BI architecture in order to fulfill all types of user requirements
  • 28. Classical architecture of a BI infrastructure ● Transactional databases ● Web resources ● XML, flat files, proprietary file formats (Excel spreadsheets, …) ● LDAP ● …
  • 29. The Data Warehouse: the crucial/central part! ● Repository of an organization’s historical data, for analysis purposes. ● Primarily intended for analysts and decision makers. ● Separate from operational (OLTP) systems (source data) – But often stored in a relational DBMS: Oracle, MSSQL, PostgreSQL, MySQL, Ingres, … ● Contents are often presented in a summarized form (e.g. key performance indicators, dashboards, OLAP client applications, reports). – Need to define some metrics/measures
  • 30. The Data Warehouse: the crucial/central part! ● Optimized for: – Large volumes of data (up to terabytes); – Fast response (<10 s) to analytical queries (vs. update speed for transactional DB): – de-normalized data schemas (e.g. star or snowflake schemas), which introduce some redundancy to avoid time-consuming JOIN queries, – all data are stored in the DW across time (no corrections), – summary (aggregate) data at different levels of detail and/or time scales, – (multi)dimensional modeling (one dimension per analysis axis): all data are interrelated according to the analysis axes (OLAP datacube paradigm) ● The focus is thus more on the analysis/correlation of large amounts of data than on retrieving/updating a precise set of data! ● Specific methods are needed to propagate updates into the DW!
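The star schema idea from slides 29 and 30 can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_store, dim_date, fact_sales), the columns and the sample rows are all hypothetical and do not come from the presentation. It shows one fact table holding the measure and one dimension table per analysis axis, queried with an aggregate.

```python
import sqlite3

# Hypothetical star schema: one fact table plus one table per dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, state TEXT, city TEXT);
CREATE TABLE dim_date  (date_id  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (store_id INTEGER, date_id INTEGER, store_sales REAL);
""")
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "CA", "Beverly Hills"), (2, "WA", "Yakima")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?)", [(1, 2002), (2, 2003)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (1, 1, 50.0), (1, 2, 150.0),
                  (2, 1, 80.0), (2, 2, 60.0)])

# Analytical query: total sales per state and year (the measure is summed
# along two analysis axes, store and time).
rows = conn.execute("""
    SELECT s.state, d.year, SUM(f.store_sales)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_date  d ON d.date_id  = f.date_id
    GROUP BY s.state, d.year
    ORDER BY s.state, d.year
""").fetchall()

for state, year, total in rows:
    print(state, year, total)
```

A production data warehouse would typically add precomputed aggregate tables at several levels of detail, as slide 30 mentions; the sketch computes the aggregate on the fly for brevity.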
  • 31. MDX query language ● MDX stands for MultiDimensional eXpressions ● Multidimensional query language ● De facto standard from Microsoft for SQL Server OLAP Services (now Analysis Services) ● Also implemented by other OLAP servers (Essbase, Mondrian) and clients (ProClarity, Excel PivotTables, Cognos, JPivot, …) ● MDX is for OLAP data cubes what SQL is for relational databases ● Looks like a SQL query but relies on a different model (close to the one used in spreadsheets) ● SELECT { [Measures].[Store Sales] } ON COLUMNS, { [Date].[2002], [Date].[2003] } ON ROWS FROM Sales WHERE ( [Store].[USA].[CA] )
  • 32. Results representation ● SELECT { [Product].[All Products].[Drink], [Product].[All Products].[Food] } ON COLUMNS, { [Store].[All Stores].[USA].[WA].[Yakima].[Store 23], [Store].[All Stores].[USA].[CA].[Beverly Hills].[Store 6], [Store].[All Stores].[USA].[OR].[Portland].[Store 11] } ON ROWS FROM Warehouse WHERE ([Time].[1997], [Measures].[Units Shipped]) ● OLAP client software offers: – Alternate representation modes (pie charts, diagrams, etc.) – Different tools to refine queries/explore data – Drill down, roll up, pivot, … – Based on operators provided by MDX
  • 33. Geospatial BI adds maps and spatial analysis! This requires consistently integrating the geospatial component into all parts of the architecture!
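To make the slicing and roll-up/drill-down operations from slides 31 and 32 more concrete, here is a minimal pure-Python sketch. It is not how an OLAP server such as Mondrian evaluates MDX; the fact rows, the `aggregate` helper and the two-level store hierarchy (state, city) are hypothetical and chosen only to mirror the shape of the slide 31 query (sales per year, sliced on one state).

```python
from collections import defaultdict

# Hypothetical fact rows: (year, state, city, store_sales).
FACTS = [
    (2002, "CA", "Beverly Hills", 120.0),
    (2002, "CA", "San Diego", 30.0),
    (2002, "WA", "Yakima", 80.0),
    (2003, "CA", "Beverly Hills", 140.0),
    (2003, "OR", "Portland", 55.0),
]


def aggregate(facts, level, where=None):
    """Sum store_sales per (year, member) at the given level: 'state' or 'city'."""
    index = {"state": 1, "city": 2}[level]
    totals = defaultdict(float)
    for row in facts:
        if where is not None and row[1] != where:
            continue  # slicer: keep a single state, like the MDX WHERE clause
        totals[(row[0], row[index])] += row[3]
    return dict(totals)


by_state = aggregate(FACTS, "state", where="CA")  # rolled up to the state level
by_city = aggregate(FACTS, "city", where="CA")    # drilled down to the city level
print(by_state)
print(by_city)
```

Rolling up and drilling down are the same aggregation run at a coarser or finer level of one dimension, which is why OLAP clients can offer them as simple navigation operators.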
  • 34. Pentaho open source BI software stack ●
  • 35. Pentaho open source BI software stack ● Pentaho: Pentaho Reporting, Kettle, Mondrian, Weka ● + CDF: Community Dashboard Framework ● + Other projects: olap4j, JPivot, Halogen, …
  • 36. Towards an open source geospatial BI stack & integration in various dashboard and reporting tools Spatial S • PostGIS • Oracle Spatial GeoKNIME Spatial
  • 37. An ETL tool is … ● A type of software used to populate databases or data warehouses from heterogeneous data sources. ● ETL stands for: − Extract – Extract data from data sources − Transform – Transform data in order to correct errors, cleanse the data, change the data structure, make the data compliant with defined standards, etc. − Load – Load transformed data into the target DBMS ● An ETL tool should manage the insertion of new data and the updating of existing data. ● It should be able to perform transformations from: − An OLTP system to another OLTP system − An OLTP system to an analytical data warehouse
  • 38. Why use an ETL tool? ● Automation of complex and repetitive data processing without producing any specific code ● Conversion between various data formats ● Migration of data from a DBMS to another ● Data feeding into various DBMS ● Population of analytical data warehouses for decision support purposes ● etc.
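The three ETL phases described on slides 37 and 38 can be sketched in a few lines of standard-library Python. This is a hand-written illustration, not what an ETL tool like Kettle does internally (the point of such tools is precisely to avoid writing this code). The input text, the column names and the `sales` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with inconsistent formatting and a missing value.
RAW = "store;city;sales\n 23 ;yakima;1,200.50\n6;Beverly Hills;980\n11;portland;\n"


def extract(text):
    """Extract: read rows from a semicolon-delimited source."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def transform(rows):
    """Transform: drop incomplete rows, fix types and normalize city names."""
    clean = []
    for row in rows:
        if not row["sales"].strip():
            continue  # no measure value: skip the row
        clean.append((int(row["store"].strip()),
                      row["city"].strip().title(),
                      float(row["sales"].replace(",", ""))))
    return clean


def load(conn, rows):
    """Load: insert new rows and update existing ones (keyed on store)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales "
                 "(store INTEGER PRIMARY KEY, city TEXT, sales REAL)")
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW)))
loaded = conn.execute("SELECT store, city, sales FROM sales ORDER BY store").fetchall()
print(loaded)
```

The `INSERT OR REPLACE` statement stands in for the "insertion of new data and updating of existing data" requirement; real data warehouse loading usually needs more careful update strategies.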
  • 39. GeoKettle ● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle) ● Kettle is a metadata-driven ETL with direct execution of transformations − No intermediate code generation! ● Support of several DBMS and file formats − DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, ... (total of 37) − Read/write support of various data file formats: text, Excel, Access, DBF, XML, ... ● Numerous transformation steps ● Support of methods for the updating of DW
  • 40. GeoKettle ● GeoKettle provides a true and consistent integration of the spatial component − All steps provided by Kettle are able to deal with geospatial data types − Some geospatial dedicated steps have been added ● First release in May 2008: 2.5.2-20080531 ● Current stable version: 3.2.0-r188-20090706 ● To be released shortly: GeoKettle 2.0 with many new features! ● Released under LGPL at ● Used in different organizations and countries: − Some ministries, bank, insurance, integrators, … − E.g. GeoETL from Inova is in fact GeoKettle! :-) ● A growing community of users and developers
  • 41. GeoKettle ● Transformations vs. Jobs: − Running in parallel vs. running sequentially ● All can be stored in a central repository (database) − But each transformation or job could also be saved in a simple XML file! ● Offers different interfaces: − Spoon: GUI for the edition of transformations and jobs − Pan: command line interface for running transformations − Kitchen: command line interface for running jobs − Carte: Web service for the remote execution of transformations and jobs
  • 42. GeoKettle – Spoon
  • 43. GeoKettle ● Version 2.0 provides support for: − Handling geometry data types (based on JTS) − Accessing Geometry objects in JavaScript − It allows the definition of custom transformation steps by the user (“Modified JavaScript Value” step) − Topological predicates (Intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, ...) − SRS definition and transformations − Input / Output with some spatial DBMS - Native support for Oracle, PostGIS and MySQL - MS SQL Server 2008 and SQLite/Spatialite - Ingres and IBM DB2 can be used but it requires some tricks − GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data formats and DBMS) − Sensor Web (SOS) and metadata (CSW) services − Cartographic preview − Spatial analysis and advanced geo-processing
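As a rough idea of what a topological predicate and an aggregation operator compute, here is a minimal pure-Python sketch. The slide states that GeoKettle's geometry handling is based on JTS; the code below does not use JTS or any GeoKettle API, and the function names and sample square are hypothetical. It implements a simple ray-casting point-in-polygon test and a bounding-box (envelope) computation.

```python
def envelope(points):
    """Return the bounding box (min_x, min_y, max_x, max_y) of a set of points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def point_in_polygon(point, ring):
    """Ray-casting test: is the point strictly inside the polygon ring?

    `ring` is a list of (x, y) vertices; the closing edge is implied.
    Points lying exactly on the boundary are not handled specially.
    """
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_in_polygon((2.0, 2.0), square))  # point in the middle of the square
print(point_in_polygon((5.0, 2.0), square))  # point to the right of the square
print(envelope(square))
```

A geometry library handles many cases this sketch ignores (holes, boundary points, multi-geometries, coordinate reference systems), which is one reason to rely on an established library rather than hand-written predicates.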
  • 44. GeoKettle ● GeoKettle releases are aligned with those of Pentaho Data Integration (Kettle), − GeoKettle thus benefits from all new features provided by PDI (Kettle). ● Kettle is natively designed to be deployed in cluster and web service environments. − It makes GeoKettle a perfect software component to be deployed as a service (SaaS) in cloud computing environments such as those provided by Amazon EC2. − It thus enables scalable, distributed and on-demand processing of large and complex volumes of geospatial data in minutes for critical applications, without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
  • 45. GeoKettle – Requirements and install ● Very simple installation procedure ● All you need is a Java Runtime Environment ● Version 5 or higher ● Just unzip the binary archive of GeoKettle ... ● And let’s go! ● Run (UNIX/Linux/Mac) or spoon.bat (Windows) ● Need help? Please visit our wiki: ●
  • 46. GeoKettle - Demo -
  • 47. GeoKettle ● Upcoming features: − Implementation of data matching and conflation steps/jobs in order to allow advanced geometric data cleansing and comparison of geospatial datasets − Read/write support for other DBMS, GIS file formats and services − LAS (LiDAR), ... − WFS-T, Sensor Web (TML, SensorML, SOS-T, ...), ... − Dedicated steps for social media (Twitter, ...), OSM, generalization, ... − Support of the third dimension, (x,y,z, M), linear reference systems, raster ... − Raster support: development in progress of a plugin to integrate all capabilities provided by the Sextante library (BeETLe)
  • 48. GeoMondrian ● GeoMondrian is a "spatially-enabled" version of Pentaho Analysis Services (Mondrian) ● GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS – i.e. a consistent and powerful support for geospatial data. ● Licensed under the EPL ●
  • 49. GeoMondrian ● As far as we know, it is the first implementation of a true Spatial OLAP (SOLAP) Server − And it is an open source project! ;-) ● Provides a consistent integration of spatial objects into the OLAP data cube structure − Instead of fetching them from an external spatial DBMS, web service or GIS file ● Implements a native Geometry data type ● Provides the first spatial extensions to the MDX language − Adds spatial analysis capabilities to analytical queries ● At present, it only supports PostGIS data warehouses − But other DBMS will be supported in the next version!
  • 50. Spatially enabled MDX ● Goal: bring to Mondrian and MDX what SQL spatial extensions do for relational DBMS (i.e. Simple Features for SQL and implementations such as PostGIS). ● Example query: filter spatial dimension members based on distance from a feature − SELECT {[Measures].[Population]} on columns, Filter( {[Unite geographique].[Region economique].members}, ST_Distance([Unite geographique].CurrentMember.Properties("geom"), [Unite geographique].[Province].[Ontario].Properties("geom")) < 2.0 ) on rows FROM [Recensements] WHERE [Temps].[Rencensement 2001 (2001-2003)].[2001]
  • 51. Spatially enabled MDX ● Many more possibilities: − in-line geometry constructors (from WKT) − member filters based on topological predicates (intersects, contains, within, …) − spatial calculated members and measures (e.g. aggregates of spatial features, buffers) − calculations based on scalar attributes derived from spatial features (area, length, distance, …)
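The logic of the distance-based filter on slide 50 can be paraphrased in plain Python. This is a sketch of the idea only, not GeoMondrian's implementation: the member names, the point geometries standing in for the "geom" property, the population values and the `filter_by_distance` helper are all hypothetical, and real members would carry polygons rather than points.

```python
import math

# Hypothetical members of a spatial dimension, each with a point geometry.
regions = {"Region A": (1.0, 1.0), "Region B": (0.5, -1.0), "Region C": (5.0, 5.0)}
# Hypothetical measure values for the same members.
population = {"Region A": 1000, "Region B": 2500, "Region C": 400}
# Hypothetical reference feature (the role played by Ontario in the slide).
reference = (0.0, 0.0)


def filter_by_distance(members, ref, max_dist):
    """Keep the members whose geometry lies closer than max_dist to ref."""
    return [name for name, (x, y) in members.items()
            if math.hypot(x - ref[0], y - ref[1]) < max_dist]


# Rows axis: members within 2.0 units; columns axis: the population measure.
rows = {name: population[name] for name in filter_by_distance(regions, reference, 2.0)}
print(rows)
```

As in the MDX query, the spatial test only selects which members appear on the rows axis; the measure itself is then read from the cube for the selected members.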
  • 52. GeoMondrian - Demo -
  • 53. SOLAPLayers ● SOLAPLayers is a lightweight cartographic component (framework) which enables navigation in geospatial (Spatial OLAP or SOLAP) data cubes, such as those handled by GeoMondrian. ● It aims to be integrated into existing dashboard frameworks in order to produce interactive geo-analytical dashboards. ● Such dashboards help in supporting the decision making process by including the geospatial dimension in the analysis of enterprise data. ● First version stems from a GSoC 2008 project performed under the umbrella of OSGeo. ● Licensed under BSD (client part) and EPL (server part). ●
  • 54. SOLAPLayers v1 ● Version 1 was based on OpenLayers and Dojo ● It allows: − the connection with a Spatial OLAP server such as GeoMondrian, − some basic navigation capabilities in the geospatial data cubes, − and the cartographic representation of some measures as static or dynamic choropleth maps, maps with proportional symbols.
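A choropleth map assigns each spatial member a color class based on its measure value. The slide does not say which classification method SOLAPLayers uses, so the following sketch assumes a simple equal-interval scheme; the function name and the sample values are hypothetical.

```python
def equal_interval_classes(values, n_classes):
    """Map each member to a class index in [0, n_classes - 1] using equal intervals."""
    lo, hi = min(values.values()), max(values.values())
    # Fall back to a width of 1.0 when all values are equal, to avoid dividing by zero.
    width = (hi - lo) / n_classes or 1.0
    return {name: min(int((value - lo) / width), n_classes - 1)
            for name, value in values.items()}


# Hypothetical measure values per region.
classes = equal_interval_classes({"A": 10, "B": 55, "C": 100}, 3)
print(classes)
```

Each class index would then be mapped to a fill color when the regions are drawn; for a dynamic choropleth map the classification is recomputed whenever the user navigates in the data cube.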
  • 55. SOLAPLayers v1
  • 56. SOLAPLayers v1
  • 57. SOLAPLayers v1 ● Version 1 was mostly a proof of concept! ● It has important limitations: − Allows only the cartographic representation (no crosstabs or charts) − Works only for one measure and the spatial dimension! − Offers limited navigation capabilities in the geospatial data cubes − Is able to connect to GeoMondrian only − Extending the framework is difficult due to the lack of flexibility and the poor documentation of Dojo − Integration with other currently used geo-web and dashboard frameworks was difficult − ...
  • 58. SOLAPLayers 2.0 ● So, SOLAPLayers has undergone (and is still undergoing ;-) ) a deep re-engineering! ● Version 2 is fully based on ExtJS/GeoExt (and hence OpenLayers) − It will make its integration with other geo/web and BI/dashboard frameworks easier − It provides some new ExtJS components dedicated to GeoBI! − Based on the philosophy for the development of applications adopted by these geo-web frameworks, it allows an easier creation/maintenance of the produced geo-analytical dashboards! − Like ExtJS, it supports internationalization!
  • 59. SOLAPLayers 2.0 – Architecture ● Client ↔ Server (SOLAPJSON) ● Authentication: built-in or LDAP server ● OLAP4J, MDX ● 1: SOLAP server (native or XML/A)
  • 60. SOLAPLayers 2.0 – Architecture ● 1: SOLAP server (native or XML/A) ● 2: OLAP server (native or XML/A) + geospatial data source (WFS, DBMS, ...)
  • 61. SOLAPLayers 2.0 – Architecture ● 2: Bridge architecture (OLAP server + geospatial data source) • Maximize what is in place in organisations • But, no Geo-MDX capabilities available!
  • 62. SOLAPLayers 2.0 – Architecture ● 1: SOLAP server ● 2: OLAP server + geospatial data source (WFS, DBMS, ...) ● 3: Geospatial data source (WFS, DBMS, ...)
  • 63. SOLAPLayers 2.0 – Architecture ● 3: Geospatial data source (WFS, DBMS, ...) • For simple geo-dashboards • Based on transactional data • Thematic mapping • No Geo-MDX and drill-down or roll-up capabilities!
  • 64. SOLAPLayers 2.0 – Architecture ● Many more connectors to come … – GeoKettle – CDA (BI domain) – Oracle/MSSQL support – ... ● Many more output formats to be available ... – Image (PNG, ...) – KML – ...
  • 65. SOLAPLayers 2.0 – Geo-dashboard made easy! 1 Define the template of the dashboard in an HTML file
  • 66. SOLAPLayers 2.0 – Geo-dashboard made easy! 1 Define the template of the dashboard in an HTML file 2 Define your dashboard components in a JS file and map them to the div in the HTML file
  • 67. SOLAPLayers 2.0 – Geo-dashboard made easy! 3 Enjoy! ;-)
  • 68. SOLAPLayers 2.0
  • 69. SOLAPLayers 2.0
  • 70. SOLAPLayers 2.0
  • 71. SOLAPLayers 2.0
  • 72. SOLAPLayers 2.0
  • 73. SOLAPLayers 2.0 - Demo -
  • 74. GeoKNIME ● Findings ● Poor neighborhoods – Tenderloin – Bayview – Mission ● Wealthy neighborhoods – Central – Southern – Northern – … ● South-east of San Francisco – High crime rate
  • 75. GeoKNIME
  • 76. (figure only)
  • 77. Drugs/narcotics ● Armed violence ● Prostitution
  • 78. Cluster #1 ● Cluster #2 ● Cluster #3
  • 79. Prediction ● KNN configuration node ● Actual values
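Slide 79 refers to a KNN (k-nearest neighbours) node used for prediction. The slides do not show its configuration, so the following is only a generic pure-Python sketch of k-nearest-neighbour classification; the training points, the labels and the `knn_predict` helper are hypothetical and are not taken from the San Francisco crime data shown in the presentation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training set: (x, y) location and an observed crime level.
train = [
    ((0.0, 0.0), "low"), ((0.1, 0.2), "low"), ((0.2, 0.1), "low"),
    ((1.0, 1.0), "high"), ((0.9, 1.1), "high"), ((1.1, 0.9), "high"),
]

print(knn_predict(train, (0.15, 0.15)))  # query near the "low" points
print(knn_predict(train, (0.95, 1.0)))   # query near the "high" points
```

Comparing such predictions with the actual values, as the slide suggests, is the usual way to judge whether the chosen k and features are adequate.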
  • 80. Conclusion & outlooks ● A rich and exciting topic ... ● And this is just an introduction ;-) ● Do not hesitate to ask for more demos! ● R&D is still required in this domain ... ● Open source allows cooperation/collaboration and makes research work easier: students can use the infrastructure to develop their own research without reinventing the wheel ● Based on this infrastructure, some research work on mobile GeoBI has already been carried out: – Web services for the dissemination of geo-analytical data – Methods for real-time updating and integration of data stemming from sensors – Definition of mobile GeoBI context profiles – Implementation of first mobile and context-aware GeoBI clients – … ● Towards active geospatial data warehouses ...