Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks

  • 3,289 views
Uploaded on

Slides of the presentation I gave at the OSGIS 2011 conference, June 21-22, 2011, Nottingham, UK.

Slides of the presentation I gave at the OSGIS 2011 conference, June 21-22, 2011, Nottingham, UK.

More in: Technology , Business
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,289
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
179
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Open Source Geospatial Business Intelligence (GeoBI):Definition, architectures, projects, challenges and outlooks Prof. Thierry Badard 3rd OSGIS Conference Nottingham University, Nottingham, UK – June 21-22, 2011
  • 2. Content of the presentation ● Open data / Data deluge ● BI & Geospatial BI (GeoBI) ● Definition ● Architecture ● Offerings ● Open source GeoBI solutions ● Conclusion & outlooks
  • 3. Open data, OSM, social networks, mobility ...
  • 4. Open data, OSM, social networks, mobility ...
  • 5. Open data, OSM, social networks, mobility ...
  • 6. Open data, OSM, social networks, mobility ...
  • 7. Open data, OSM, social networks, mobility ...
  • 8. Open data, OSM, social networks, mobility ...
  • 9. Open data, OSM, social networks, mobility ...
  • 10. Open data, OSM, social networks, mobility ...
  • 11. Are you ready for the data deluge?
  • 12. Nowadays, companies and public bodies … ● Store and face huge amounts of data, that are very diverse and heterogeneous ● Sales ● Stocks ● Socio-economical data ● Surveys ● … ● Impact assessment of their advertisment campaigns ● Website visits ● Social networks
  • 13. In a world that changes rapidly ... ● Which imposes to take informed and strategic decisions more and more often and rapidly in order to keep competitive ● So, they require to analyze these huge amounts of data rapidly, simply and efficiently ● For that they use more and more some BI (Business Intelligence) tools ● These tools produce: ● Synthesis reports (PDF, Excel, ...) ● Interactive analytical dashboards
  • 14. Reports and dashboards
  • 15. Reports and dashboards
  • 16. Reports and dashboards
  • 17. Reports and dashboards
  • 18. Reports and dashboards
  • 19. Reports and dashboards
  • 20. Location is everywhere! ;-)● About 80% of all data stored in corporate databases has a spatial component [Franklin, 1992] ● Mail address ● Postal/ZIP code ● Coordinates ● IP, phone number, …● As time, space (location) is a crucial dimension that should be taken into account when analyzing corporate data!
  • 21. How to take into account location … ● It requires to get the right tools in order to observe, handle and explore this type of information in BI tools ... ● Let us imagine you are HSBC … or ministry for public safety … or Haiti government ... ● Do you think that cross tables or charts are the best media to answer these type of questions ?
  • 22. The right medium is obviouslyLe bon médium, cest la carte ! a map!
  • 23. The right medium is obviouslyLe bon médium, cest la carte ! a map!
  • 24. The right medium is obviouslyLe bon médium, cest la carte ! a map!
  • 25. So, GeoBI (Geospatial BI) ...● Brings : ● The map ... ● and all the tools required to represent, navigate and analyse the geospatial dimension● In the very core of BI tools of the market● In order to fully take into account the spatial dimension in the corporate data analysis and decision processes based on these data!
  • 26. BI and geospatial industryIndustry Business Geospatial Web Intelligence 2.0 (BI) GeoBI 3 billions $ in software 8,8 billions $ in softwareSize (total of 15 billions $ IT) (increased of 22% in 2008)Players
  • 27. Back in 2005 ...● GeoSOA research team: research in BI, geospatial and mobiliy● Purposes: ● Propose ubiquitous and mobile decision making tools which make use of the geospatial dimension (mobile GeoBI), ● And find ways to deliver a business info adapted to the context of the user (location, environment, situation, pupose, ...)● But, unfortunately: ● No GeoBI tool were available and enabling: – The processing of large volumes of information that come from very diverse and heterogeneous data sources – The rapid delivery of these data to numerous concurrent users in a connected/disconnected modes (web services architecture) – The support of standards dedicated to mobility – With key GeoBI capabilities and with a rich, comprehensive and extensible API at an affordable cost – Relying on a robust and complete BI architecture in order to be able to fulfill all types of user requirements
  • 28. Classical architecture of a BI infrastructure● Transactional databases● Web ressources● XML, flat files, proprietary file formats (Excel spreadsheets, …)● LDAP● …
  • 29. The Data Warehouse: the crucial/central part! ● Repository of an organization’s historical data, for analysis purposes. ● Primarily destined to analysts and decision makers. ● Separate from operational (OLTP) systems (source data)  But often stored in relational DBMS: Oracle, MSSQL, PostgreSQL, MySQL, Ingres, … ● Contents are often presented in a summarized form (e.g. key performance indicators, dashboards, OLAP client applications, reports).  Need to define some metrics/measures
  • 30. The Data Warehouse: the crucial/central part! ● Optimized for:  Large volumes of data (up to terabytes);  Fast response (<10 s) to analytical queries (vs. update speed for transactional DB):  de-normalized data schemas (e.g. star or snowflake schemas),  Introduces some redundancy to avoid time consuming JOIN queries  all data are stored in the DW across time (no corrections),  summary (aggregate) data at different levels of details and/or time scales,  (multi)dimensional modeling (a dimension per analysis axis).  All data are interrelated according to the analysis axes (OLAP datacube paradigm) ● Focus is thus more on the analysis / correlation of large amount of data than on retrieving/updating a precise set of data! ● Specific methods to propagate updates into the DW needed!
  • 31. MDX query language ● MDX stands for MultiDimensional eXpressions ● Multidimensional query language ● De facto standard from Microsoft for SQL Server OLAP Services (now Analysis Services) ● Also implemented by other OLAP servers (Essbase, Mondrian) and clients (Proclarity, Excel PivotTables, Cognos, JPivot, …) ● MDX is for OLAP data cubes what SQL is for relational databases ● Looks like a SQL query but relies on a different model (close to the one used in spreasheets) ● SELECT { [Measures].[Store Sales] } ON COLUMNS, { [Date].[2002], [Date].[2003] } ON ROWS FROM Sales WHERE ( [Store].[USA].[CA] )
  • 32. Results representation ● SELECT { [Product].[All Products].[Drink], [Product].[All Products].[Food] } ON COLUMNS, { [Store].[All Stores].[USA].[WA].[Yakima].[Store 23], [Store].[All Stores].[USA].[CA].[Beverly Hills].[Store 6], [Store].[All Stores].[USA].[OR].[Portland].[Store 11] } ON ROWS FROM Warehouse WHERE ([Time].[1997], [Measures].[Units Shipped])  OLAP client software propose:  Alternate representation modes (pie charts, diagrams, etc.)  Different tools to refine queries/explore data  Drill down, roll up, pivot, …  Based on operators provided by MDX
  • 33. Geospatial BI adds maps and spatial analysis!Require to consistently integrate the geospatial component in all parts of the architecture!
  • 34. Pentaho open source BI software stack ● http://www.pentaho.org
  • 35. Pentaho open source BI software stack ● Pentaho (http://www.pentaho.org) Pentaho Reporting Kettle Mondrian Weka + CDF: Community Dashboard Framework + Other projects: olap4j, JPivot, Halogen, …
  • 36. Towards an open source geospatial BI stack & integration in various dashboard and reporting tools Spatial S • PostGIS • Oracle Spatial GeoKNIME Spatial
  • 37. An ETL tool is … ● A type of software used to populate databases or data warehouses from heterogeneous data sources. ● ETL stands for: − Extract – Extract data from data sources − Transform – Transformation of data in order to correct errors, make some data cleansing, change the data structure, make them compliant to defined standards, etc. − Load – Load transformed data into the target DBMS ● An ETL tool should manage the insertion of new data and the updating of existing data. ● Should be able to perform transformations from : − An OLTP system to another OLTP system − An OLTP system to an analytical data warehouse
  • 38. Why use an ETL tool? ● Automation of complex and repetitive data processing without producing any specific code ● Conversion between various data formats ● Migration of data from a DBMS to another ● Data feeding into various DBMS ● Population of analytical data warehouses for decision support purposes ● etc.
  • 39. GeoKettle ● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle) ● Kettle is a metadata-driven ETL with direct execution of transformations − No intermediate code generation! ● Support of several DBMS and file formats − DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, ... (total of 37) − Read/write support of various data file formats: text, Excel, Access, DBF, XML, ... ● Numerous transformation steps ● Support of methods for the updating of DW
  • 40. GeoKettle ● GeoKettle provides a true and consistent integration of the spatial component − All steps provided by Kettle are able to deal with geospatial data types − Some geospatial dedicated steps have been added ● First release in May 2008: 2.5.2-20080531 ● Current stable version: 3.2.0-r188-20090706 ● To be released shortly: GeoKettle 2.0 with many new features! ● Released under LGPL at http://www.geokettle.org ● Used in different organizations and countries: − Some ministries, bank, insurance, integrators, … − E.g. GeoETL from Inova is in fact GeoKettle! :-) ● A growing community of users and developers
  • 41. GeoKettle ● Transformations vs. Jobs: − Running in parallel vs. running sequentially ● All can be stored in a central repository (database) − But each transformation or job could also be saved in a simple XML file! ● Offers different interfaces: − Spoon: GUI for the edition of transformations and jobs − Pan: command line interface for running transformations − Kitchen: command line interface for running jobs − Carte: Web service for the remote execution of transformations and jobs
  • 42. GeoKettle – Spoon
  • 43. GeoKettle ● Version 2.0 provides support for: − Handling geometry data types (based on JTS) − Accessing Geometry objects in JavaScript − It allows the definition of custom transformation steps by the user (“Modified JavaScript Value” step) − Topological predicates (Intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, ...) − SRS definition and transformations − Input / Output with some spatial DBMS - Native support for Oracle, PostGIS and MySQL - MS SQL Server 2008 and SQLite/Spatialite - Ingres and IBM DB2 can be used but it requires some tricks − GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data formats and DBMS) − Sensor Web (SOS) and metadata (CSW) services − Cartographic preview − Spatial analysis and advanced geo-processing
  • 44. GeoKettle ● GeoKettle releases are aligned with the ones of Pentaho Data Integration (Kettle), − GeoKettle then benefits all new features provided by PDI (Kettle). ● Kettle is natively designed to be deployed in cluster and web service environments. − It makes GeoKettle a perfect software component to be deployed as a service (SaaS) in cloud computing environments as those provided by Amazon EC2. − It enables then the scalable, distributed and on demand processing of large and complex volumes of geospatial data in minutes for critical applications and without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
  • 45. GeoKettle – Requirements and install ● Very simple installation procedure ● All you need is a Java Runtime Environment ● Version 5 or higher ● Just unzip the binary archive of GeoKettle ... ● And let’s go ! ● Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows) ● Need help, please visit our wiki: ● http://wiki.spatialytics.org
  • 46. GeoKettle - Demo -
  • 47. GeoKettle ● Upcoming features: − Implementation of data matching and conflation steps/jobs in order to allow advanced geometric data cleansing and comparison of geospatial datasets − Read/write support for other DBMS, GIS file formats and services − LAS (LiDAR), ... − WFS-T, Sensor Web (TML, SensorML, SOS-T, ...), ... − Dedicated steps for social media (Twitter, ...), OSM, generalization, ... − Support of the third dimension, (x,y,z, M), linear reference systems, raster ... − Raster support: development in progress of a plugin to integrate all capabilities provided by the Sextante library (BeETLe)
  • 48. GeoMondrian ● GeoMondrian is a "spatially-enabled" version of Pentaho Analysis Services (Mondrian) ● GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS – i.e. a consistent and powerful support for geospatial data. ● Licensed under the EPL ● http://www.geo-mondrian.org
  • 49. GeoMondrian ● As far as we know, it is the first implementation of a true Spatial OLAP (SOLAP) Server − And it is an open source project! ;-) ● Provides a consistent integration of spatial objects into the OLAP data cube structure − Instead of fetching them from an external spatial DBMS, web service or a GIS file ● Implements a native Geometry data type ● Provides first spatial extensions to the MDX language − Add spatial analysis capabilities to the analytical queries ● At present, it only supports PostGIS datawarehouses − But other DBMS will be supported in the next version!
  • 50. Spatially enabled MDX ● Goal: bring to Mondrian and MDX what SQL spatial extensions do for relational DBMS (i.e. Simple Features for SQL and implementations such as PostGIS). ● Example query: filter spatial dimension members based on distance from a feature − SELECT {[Measures].[Population]} on columns, Filter( {[Unite geographique].[Region economique].members}, ST_Distance([Unitegeographique].CurrentMember.Properties("geom"), [Unite geographique].[Province].[Ontario].Properties("geom")) < 2.0 ) on rows FROM [Recensements] WHERE [Temps].[Rencensement 2001 (2001-2003)].[2001]
  • 51. Spatially enabled MDX ● Many more possibilities: − in-line geometry constructors (from WKT) − member filters based on topological predicates (intersects, contains, within, …) − spatial calculated members and measures (e.g. aggregates of spatial features, buffers) − calculations based on scalar attributes derived from spatial features (area, length, distance, …)
  • 52. GeoMondrian - Demo -
  • 53. SOLAPLayers ● SOLAPLayers is a lightweight cartographic component (framework) which enables navigation in geospatial (Spatial OLAP or SOLAP) data cubes, such as those handled by GeoMondrian. ● It aims to be integrated into existing dashboard frameworks in order to produce interactive geo-analytical dashboards. ● Such dashboards help in supporting the decision making process by including the geospatial dimension in the analysis of enterprise data. ● First version stems from a GSoC 2008 project performed under the umbrella of OSGeo. ● Licensed under BSD (client part) and EPL (server part). ● http://www.solaplayers.org
  • 54. SOLAPLayers v1 ● Version 1 was based on OpenLayers and Dojo ● It allows: − the connection with a Spatial OLAP server such as GeoMondrian, − some basic navigation capabilities in the geospatial data cubes, − and the cartographic representation of some measures as static or dynamic choropleth maps, maps with proportional symbols.
  • 55. SOLAPLayers v1
  • 56. SOLAPLayers v1
  • 57. SOLAPLayers v1 ● Version 1 was a mostly proof of concept! ● It presents important limitations: − Allows only the cartographic representation (no crosstabs or charts) − Works only for one measure and the spatial dimension ! − Offers limited navigation capabilities in the geospatial data cubes − Is able to connect to GeoMondrian only − Extending the framework is difficult due to the lack of flexibility and the poor documentation of Dojo, − Integration with other currently used geo-web and dashboard frameworks was difficult − ...
  • 58. SOLAPLayers 2.0 ● So, SOLAPLayers has undergone (and is still undergoing ;-) ) a deep re-engineering! ● Version 2 is fully based on ExtJS/GeoExt (and hence OpenLayers) − It will make its integration with other geo/web and BI/dashboard frameworks easier − It provides some new ExtJS components dedicated to GeoBI! − Based on the philosophy for the development of applications adopted by these geo-web frameworks, it allows an easier creation/maintenance of the produced geo-analytical dashboards! − Like ExtJS, it supports internationalization!
  • 59. SOLAPLayers 2.0 – Architecture 1 Built-in or LDAP Server SOLAP Server Authentication Native or XML/A OLAP4J MDX Client Server SOLAPJSON
  • 60. SOLAPLayers 2.0 – Architecture 1 Built-in or LDAP Server SOLAP Server Authentication Native or XML/A OLAP4J MDX OLAP Server Client Native or XML/A Server SOLAPJSON 2 Geospatial data source (WFS, DBMS, ...)
  • 61. SOLAPLayers 2.0 – Architecture 1 Built-in or LDAP Server SOLAP Server Authentication Native or XML/A OLAP4J MDX OLAP Server Client Native or XML/A Server SOLAPJSON 2 Bridge architecture Geospatial data source • Maximize what is in place in organisations • But, no Geo-MDX capabilities available! (WFS, DBMS, ...)
  • 62. SOLAPLayers 2.0 – Architecture 1 Built-in or LDAP Server SOLAP Server Authentication Native or XML/A OLAP4J MDX OLAP Server Client Native or XML/A Server SOLAPJSON 2 3 Geospatial data source (WFS, DBMS, ...) Geospatial data source (WFS, DBMS, ...)
  • 63. SOLAPLayers 2.0 – Architecture 1 Built-in or LDAP Server SOLAP Server Authentication Native or XML/A OLAP4J MDX OLAP Server Client Native or XML/A Server SOLAPJSON 2 • For simple geo-dashboards • Based on transactional data 3 • Thematic mapping Geospatial data source • No Geo-MDX and drill-down (WFS, DBMS, ...) Geospatial data source (WFS, DBMS, ...) or roll-up capabilities!
  • 64. SOLAPLayers 2.0 – Architecture ● Many more connectors to come … – GeoKettle – CDA (BI domain) – Oracle/MSSQL support – ... ● Many more output formats to be available ... – Image (PNG, ...) – KML – ...
  • 65. SOLAPLayers 2.0 – Geo-dashboard made easy! 1 Define the template of the dashboard in a HTML file
  • 66. SOLAPLayers 2.0 – Geo-dashboard made easy! 1 Define the template of the dashboard in a HTML file Define your dashboard components in a JS file 2 and map it to the div in the HTML file
  • 67. SOLAPLayers 2.0 – Geo-dashboard made easy! 3Enjoy! ;-)
  • 68. SOLAPLayers 2.0
  • 69. SOLAPLayers 2.0
  • 70. SOLAPLayers 2.0
  • 71. SOLAPLayers 2.0
  • 72. SOLAPLayers 2.0
  • 73. SOLAPLayers 2.0 - Demo -
  • 74. GeoKNIME ● Constats ● Quartiers pauvres – Tenderloin – Bayview – Mission ● Quartiers riches – Central – Southern – Northern – … ● Sud-Est de San Francisco – Criminalité élevée
  • 75. GeoKNIME
  • 76. 77éo
  • 77. Drogue/NarcotiqueViolence arméeProstitution 78
  • 78. Cluster#1 Cluster#2 Cluster#3
  • 79. Prédiction Nœud de configuration KNNValeurs réelles
  • 80. Conclusion & outlooks ● A rich and passionating topic ... ● And it is just an introduction ;-) ● Do not hesitate to ask for more demos! ● R&D is still required in this domain ... ● Open source allows cooperation/collaboration and makes research work easier: students can use the infrastructure to develop their own research without reinventing the wheel ● Based on this infrastructure, some research works on mobile GeoBI have already been performed: – Web services for the dissemination of geo-analytical data – Methods for real time updating and integration of data stemming from sensors – Definition of mobile GeoBI context profiles – Implementation of first mobile and context aware GeoBI clients –… ● Towards active geospatial datawarehouses ...