The document discusses geospatial business intelligence (GeoBI), including its definition, architectures, open source solutions, and outlook. Specifically, it defines GeoBI as bringing maps and geospatial analysis tools into BI systems to fully analyze spatial dimensions in corporate data. It presents open source GeoBI solutions like GeoKettle and GeoMondrian, describing their roles in extracting, transforming, loading, and analyzing geospatial data in BI systems.
GeoKettle, GeoMondrian and Spatialytics: an open source GeoBI suite - ACSG Section Montréal
With the growing availability of very diverse data, geospatial or not, resulting from the effort and money invested in building numerous repositories and databases in both businesses and government institutions, there is an increasingly pressing need to cross-reference these different data sources to support decision making in a wide range of fields: banking, insurance, environment, infrastructure, health, climate change, and so on. Business Intelligence technologies make it possible to cross-reference such masses of data and explore them quickly and interactively in order to reveal trends or highlight various phenomena, and thus make fully informed decisions to counter their effects, correct them, or improve the observed situation. The open source software stack available and supported at Spatialytics.org was developed by the GeoSOA research team led by Dr. Thierry Badard at Université Laval. The Geo-BI suite consists of GeoKettle (a spatial ETL tool), GeoMondrian (a Spatial OLAP, or SOLAP, server) and SOLAPLayers (a cartographic component for navigating geo-decisional data cubes), and allows the spatial component to be fully taken into account when analyzing these large volumes of information. It thus becomes possible to observe trends, at different levels of detail and over different time periods, through maps that complement the usual means of presenting summary data in decision-support tools (dashboards, reporting tools, etc.), namely cross tabulations and charts. Estimating the spatial distribution of a given phenomenon, or its spatio-temporal evolution, becomes possible and easy.
This presentation is therefore intended as a technology showcase of the various geo-decisional tools available at Spatialytics.org; examples and demonstrations will illustrate how they work, their scope, and their relevance.
Geospatial Business Intelligence made easy with GeoMondrian & SOLAPLayers - Thierry Badard
Slides of the presentation about GeoMondrian and SOLAPLayers I gave during the 1st rendez-vous OSGeo-Quebec (http://rendez-vous-osgeo-qc.org/2010) at Saguenay, Quebec, Canada on June 15-16, 2010.
Spatially enabled open source BI (GeoBI) with GeoKettle, GeoMondrian & SOLAPL... - Thierry Badard
This document discusses spatially enabled open source business intelligence (GeoBI) tools including GeoKettle, GeoMondrian, and SOLAPLayers. GeoKettle adds spatial capabilities to Pentaho Data Integration. GeoMondrian does the same for Pentaho Analysis Services (Mondrian) by integrating spatial objects into OLAP data cubes. SOLAPLayers provides a lightweight framework for building interactive geospatial dashboards using these tools. The document demonstrates the capabilities and architecture of these projects.
Open Source Geospatial Business Intelligence (Geo-BI) - Thierry Badard
The document discusses geospatial business intelligence (Geo-BI). It defines BI as using data analysis to understand business operations and support decision making. Geo-BI adds spatial analysis and maps to allow exploration of spatial relationships in data. About 80% of corporate data has a spatial component, and representing some phenomena on maps can better aid interpretation and decisions. The document introduces open source Geo-BI software from Spatialytics, including Pentaho tools for ETL, OLAP, and reporting integrated with geospatial capabilities.
Bringing Geospatial Business Intelligence to the Enterprise - mkarren
KOREM provides geospatial business intelligence (location intelligence) solutions to help organizations understand how location impacts business operations. They offer consulting, data management, software integration, and training services. Their solutions integrate spatial data and analysis with existing business intelligence tools to provide strategic, operational, and analytic insights. KOREM works across industries with both public and private sector customers to develop customized geospatial business intelligence applications.
Open source Geospatial Business Intelligence in action with GeoMondrian and S... - Thierry Badard
This document introduces GeoMondrian and SOLAPLayers, open source geospatial business intelligence tools. GeoMondrian is a spatially-enabled version of the Pentaho Mondrian OLAP server that integrates spatial data and analysis capabilities into OLAP data cubes. SOLAPLayers is a lightweight cartographic component that enables interactive map-based exploration of geospatial data cubes from servers like GeoMondrian. The document discusses the architecture and capabilities of both tools, demonstrates them, and outlines the roadmap for future development including more advanced SOLAPLayers components for creating geospatial dashboards.
GeoKettle: A powerful open source spatial ETL tool - Thierry Badard
GeoKettle is an open source spatial ETL tool that is part of a geospatial business intelligence software stack. It is based on Pentaho Data Integration and provides consistent integration of spatial data types and capabilities. GeoKettle allows automated extraction, transformation, and loading of data across various sources into data warehouses. It supports spatial operations and can be deployed in cloud environments for scalable geospatial data processing.
The document discusses Mondrian, an open source OLAP server written in Java. It can be used to develop a trajectory data warehouse and interactively analyze large datasets stored in SQL databases without writing SQL. Mondrian uses MDX and XML for querying and cube definition. It provides an OLAP view of relational data and enables fast, on-line analytical processing through aggregation and caching. GeoMondrian extends it with spatial/GIS data types and functions for geographical analysis.
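The core service Mondrian provides, answering analytical questions over a relational star schema without the analyst writing SQL, can be illustrated with a toy roll-up. The schema and figures below are invented for the example:

```python
import sqlite3

# A tiny star schema: one fact table, one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (region_id INTEGER, amount REAL);
    INSERT INTO dim_region VALUES (1, 'Quebec'), (2, 'Ontario');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);
""")

# The roll-up an OLAP engine performs when an MDX query puts a measure on
# columns and a dimension's members on rows: aggregate the facts, grouped
# by the dimension, with results cached for subsequent queries.
rows = conn.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_region d USING (region_id)
    GROUP BY d.name ORDER BY d.name
""").fetchall()
print(rows)  # [('Ontario', 75.0), ('Quebec', 150.0)]
```

GeoMondrian's contribution is to let the dimension members carry geometries as first-class values, so the same roll-up can also answer spatial questions.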
Spatial ETL For Web Services-Based Data Sharing - Safe Software
Spatial ETL is a process that extracts, transforms, and loads spatial data to enable data sharing through web services. It supports various spatial data formats and sources. Spatial ETL can transform and integrate data from multiple sources into a single data model and output format. It then loads and publishes the data to make it available through web services for use in applications and by consumers on demand. Spatial ETL plays a key role in enabling organizations to leverage web services and share their spatial data.
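The "load and publish" half of that pipeline typically means serialising the integrated records into a web-friendly interchange format. A minimal sketch, using GeoJSON as the output format and invented sample records:

```python
import json

# Hypothetical integrated records after the extract/transform stages.
records = [
    {"name": "Station A", "lon": -71.21, "lat": 46.81},
    {"name": "Station B", "lon": -73.57, "lat": 45.50},
]

# Publish: serialise into a GeoJSON FeatureCollection, a common payload
# for web-service-based sharing of vector data (coordinates are [lon, lat]).
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [r["lon"], r["lat"]]},
            "properties": {"name": r["name"]},
        }
        for r in records
    ],
}
payload = json.dumps(feature_collection)
```

A web service then hands `payload` to any consumer on demand, regardless of the formats the source data arrived in.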
Taming the Survey Data "Tower of Babel" - mercatorlem
The document discusses managing diverse survey data from multiple sources in AutoCAD. It recommends using field codes and layers to organize data by type and source. Direct survey measurements are given prominent styles, while indirect data like scans use subdued styles. The goal is to make the nature and quality of data self-evident in CAD files to avoid mixing hazards and ensure clarity for clients.
OpenGeoData Italia 2014 - Marco Fago "Infrastrutture di dati territoriali, IN..." - giovannibiallo
(1) The document discusses GetLOD, an open source solution for publishing geographic data as Linked Open Data. It allows publishing geospatial data and metadata from traditional cartographic web services as open, linkable RDF data.
(2) GetLOD is integrated with Spatial Data Infrastructures through OGC standards and allows publishing geographic open data in RDF and other formats like Shapefile and GML.
(3) The Region Emilia-Romagna uses GetLOD and Moka to organize their SDI and build applications, while also publishing open data through GetLOD's services. SDI and open data infrastructures will interoperate through Moka.
New tools are being developed by the Czech Living Lab WirelessInfo that allow users to easily publish their data and metadata as part of a Spatial Data Infrastructure (SDI). The paper describes the design of a technological infrastructure based on ISO and OGC standards, along with the implementation of a prototype and first experiences with it. The solution is designed as a distributed system that connects metadata about spatial data and services, and it tests the principle of catalogue services at both the national and international level, which could be applied in the UN SDI context. A catalogue portal is one of the independent components of GeoHosting, a complete system for sharing raster and vector spatial data. The catalogue portal supports searching for data sources through structured queries against their metadata records, and it also provides editing functionality for creating and modifying metadata records. The metadata catalogue system conforms to the ISO 19115/19119/19139 standards [1], [2], [3], [4] and supports cascading searches across other standardized catalogue systems. Unlike other initiatives that offer publishing of user content, such as Google technologies or OpenStreetMap, GeoHosting is based fully on the European INSPIRE standards and supports establishing a network of distributed servers.
This document contains the resume of Ramesh, who has over 5 years of experience in SAP EIM technologies like BODS, BODI, Information Steward, and HANA. He has extensive experience designing and implementing ETL processes to extract, transform, and load data from various source systems into data warehouses. Ramesh has worked on multiple projects involving SAP data integration, data governance, and BI reporting.
This document contains the resume of Ramesh, who has over 5 years of experience in SAP EIM technologies including SAP Data Services, SAP Information Steward, and SAP HANA. He has extensive experience designing and implementing ETL processes, data integration projects, data governance solutions, and reporting dashboards. Ramesh has worked on multiple projects for clients in various industries.
The document summarizes a session from the 2011 Esri European User Conference in Madrid on data management and data exchange using geodatabases and interoperability. The session covered an overview of Esri geodatabases, geodatabase workflows and editing, database administration, and approaches for data storage and connection. Geodatabase concepts discussed included features, feature classes, relationships, and complex data types. Versioning, replication, and conflict resolution in a multi-user editing environment were also summarized.
Enterprise geodatabase SQL access and administration - brentpierce
The document provides an overview of accessing and administering an enterprise geodatabase through SQL and Python. It discusses how the geodatabase is based on relational database principles with user data stored in tables and system metadata stored in system tables. It describes how spatial types store geometry data and the benefits of using SQL to access and edit geodatabase content. The document also outlines how Python can be used for geodatabase administration tasks like schema creation, maintenance, and publishing tools.
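The "system metadata stored in system tables" idea is not specific to the geodatabase; every relational database keeps a catalogue of its own schema, and the geodatabase layers its metadata tables on the same principle. A generic illustration using SQLite's catalogue (not the geodatabase's actual system tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A user table, as created by an administrator or a schema-creation script.
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, geom_wkt TEXT)")

# The database's own catalogue describes that table back to us. SQLite
# exposes its catalogue as the sqlite_master system table; an enterprise
# RDBMS exposes equivalent views, and the geodatabase adds its metadata
# tables on top of them.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['parcels']
```

Querying the catalogue with plain SQL like this is exactly the kind of administration task the document suggests scripting, whether from SQL directly or from Python.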
Ozri 2013 Brisbane, Australia - Geodatabase Efficiencies - Walter Simonazzi
The document discusses best practices for geodatabase design, storage, versioning, and performance management. It provides guidance to a new spatial data administrator, Dave, on assessing his organization's existing spatial data needs, creating a data model and governance plan, importing data, and collaborating with the database administrator to optimize database performance over time. Key topics covered include geodatabase design, storage and migration strategies, when and how to implement versioning, using replication to distribute data, and techniques for maintaining optimal performance such as attribute indexing, statistics, and scheduling maintenance tasks.
ESRI Mapping & Charting Solution: ArcGIS 10 Production Mapping - mmarques_esri
This document describes mapping and charting solutions for efficiently producing standardized geospatial data and maps. It discusses workflows for capturing, managing, validating, and disseminating geospatial data and maps. Key components of the solutions include rules-based geodatabases, product libraries for managing map specifications and documents, validation tools like ArcGIS Data Reviewer, and workflow management tools like ArcGIS Workflow Manager. The solutions are designed to streamline production processes and improve data quality for organizations that produce high volumes of maps and geospatial datasets.
The document discusses key characteristics of data warehouses including that they contain historical data derived from transactions for querying, reporting, and analysis. It also compares online transaction processing (OLTP) systems to data warehouses. Additionally, it covers data warehouse architectures, design considerations, logical and physical design, and managing large volumes of data through techniques like partitioning and parallelism. Optimizing input/output performance is also highlighted as critical for data warehouses.
Generating Executable Mappings from RDF Data Cube Data Structure Definitions - Christophe Debruyne
Data processing is increasingly the subject of various internal and external regulations, such as GDPR which has recently come into effect. Instead of assuming that such processes avail of data sources (such as files and relational databases), we approach the problem in a more abstract manner and view these processes as taking datasets as input. These datasets are then created by pulling data from various data sources. Taking a W3C Recommendation for prescribing the structure of and for describing datasets, we investigate an extension of that vocabulary for the generation of executable R2RML mappings. This results in a top-down approach where one prescribes the dataset to be used by a data process and where to find the data, and where that prescription is subsequently used to retrieve the data for the creation of the dataset “just in time”. We argue that this approach to the generation of an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes into account regulations to generate mappings that are compliant. In this paper, we describe how one can obtain an R2RML mapping from a data structure definition in a declarative manner using SPARQL CONSTRUCT queries, and demonstrate it using a running example. Some of the more technical aspects are also described.
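As a rough, hypothetical sketch of that declarative generation step: the `qb:` and `rr:` prefixes below are the real Data Cube and R2RML vocabularies, but the `ex:tableName` annotation and the URI template are invented for illustration and do not come from the paper. A SPARQL CONSTRUCT query of this general shape emits R2RML triples from a data structure definition:

```sparql
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rr: <http://www.w3.org/ns/r2rml#>
PREFIX ex: <http://example.org/ns#>

CONSTRUCT {
  ?dsd a rr:TriplesMap ;
       rr:logicalTable [ rr:tableName ?table ] ;
       rr:subjectMap  [ rr:template "http://example.org/obs/{id}" ] .
}
WHERE {
  # ex:tableName is an invented annotation standing in for the extension
  # vocabulary that records where the underlying data can be found.
  ?dsd a qb:DataStructureDefinition ;
       ex:tableName ?table .
}
```

The constructed graph is itself an R2RML mapping, which an R2RML processor can then execute to materialise the dataset "just in time".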
Reference: Christophe Debruyne, Dave Lewis, Declan O'Sullivan: Generating Executable Mappings from RDF Data Cube Data Structure Definitions. OTM Conferences (2) 2018: 333-350
II-SDV 2012 Open Source Platform & Cloud Platform for Information Analysis - Dr. Haxel Consult
This presentation discusses using the open source Vanilla BI platform to create dashboards, maps, and KPI visualizations hosted in the cloud. It covers the Vanilla technology, using cloud infrastructure like Amazon EC2, accessing data visually, and managing the Vanilla platform in the cloud. Key benefits mentioned are low cost, ease of development and deployment, and high availability using cloud providers like Amazon and Google.
Prague data management meetup 2017-01-23 - Martin Bém
The document discusses the components of a data warehouse, including:
- Data stores such as the data warehouse itself, data marts, operational data stores, and big data platforms.
- Data integration tools for extracting, transforming, and loading data from various sources.
- Access tools for querying, reporting, visualization, and advanced analytics.
- Metadata for technical, business, and transformation documentation.
- Administration and management functions like operations, security, and quality assurance.
- Development tools for modeling, ETL design, and testing.
Geodatabase: The ArcGIS Mechanism for Data Management - Esri South Africa
This presentation covers the content that goes into a geodatabase, the advantages of using geodatabases, and how to manage data and maintain data integrity.
Varadarajan Sourirajan is a data architect with over 16 years of experience seeking a new position. He has extensive experience in data modeling for both online transaction processing and data warehousing applications. Currently he is working on implementing a data warehouse for the treasury line of business at a large bank in the US, drawing on his experience delivering previous data warehouse projects and a proven track record of success.
GeoTrends 2017 report - Geospatial industry trends, forecasts and predictions Geoawesomeness
We have invited executives from top GeoSpatial companies to share their thoughts about the industry trends, forecasts, and predictions for 2017.
http://geoawesomeness.com/topics/geotrends/
Presentation for Sydney Open Source Developers Conference 2008 covering the range of open source geospatial projects available to the modern programmer!
The document discusses improving MDX usability in Mondrian through user defined functions (UDFs). It provides examples of UDFs that allow accessing member properties, parsing dates, performing date calculations, calculating cumulative totals over time periods, aggregating calculated members, and setting default context. The UDFs extend MDX capabilities and make queries more readable and maintainable.
HR is the new Marketing; the future of Employer BrandingTEDxMongKok
Every candidate is a potential customer, and every customer is a potential candidate. It's no secret that candidates and employees are thinking and acting like customers. Job search behaviour has changed. Employee expectations have changed. HR can no longer think like recruiters, but think like marketers. Emma Reynolds shares insights on the changing candidate behaviour in an interactive presentation that will help you analyse the touchpoints in your recruitment experience and their impact on your employer brand.
The document summarizes research conducted by Coca-Cola to understand the relationships between employee opinions, behaviors, and retention and the company's financial performance and reputation. The research found several key relationships: high employee engagement is related to improved financial results like revenue and expenses; employee retention drives higher sales volume and performance; and favorable employee opinions correlate with stronger consumer brand perception and higher market share. The results supported Coca-Cola's focus on employee experience or "People" initiatives to boost business outcomes.
L'Oreal Employer Branding and Employee Value Proposition (EVP)Link Humans
L'Oreal's new EVP launched in 2012 after a big listening exercise internally - based on 3 pillars: A thrilling experience, an environment that will inspire you & a school of excellence. Note: This is not Link Humans work, all L'Oréal - thanks to Dennis de Munck for this information.
The Insiders Guide to Employer Branding - 27 Best Practice InsightsKelly Services
Many of the old tools and strategies for building an authentic, globally relevant employer brand have been discarded, and new ones are taking over. Both the challenges and opportunities have grown almost in tandem, and it’s all happening at break-neck speed.
One thing is clear: employer branding has changed, dramatically.
Our Global Best Practice Xchange (BPX) Roundtable on the subject confirmed it. It was 90 minutes of rigorous discussion with eight seasoned professionals leading the way in employer branding innovation for their organizations. They shared their successes, mistakes and thoughts on their plans for the future.
So, if you are wondering if there’s a better, clearer way to lead your organization and practice through this change, this guide is for you.
- Rohit Kumar is a DW/BI developer with over 5 years of experience developing data warehouse and BI reporting solutions for clients in various domains including banking, finance, insurance, and research.
- He has extensive experience designing and implementing ETL processes using tools like SAP BODS and Talend to extract, transform, and load data from various source systems into data warehouses.
- He also has experience designing BI universes and reports using tools like SAP BO and Microstrategy and providing reporting solutions, training, and support to clients.
The document provides a professional summary and work experience for VenkataSubbaReddy gvenkat9bi@gmail.com. It outlines over 5 years of experience in ETL development using Informatica and data warehousing. Specific skills and responsibilities mentioned include developing mappings and transformations in Informatica to load data from various sources into staging and target databases, designing logical and physical data models, working on projects in the banking and insurance domains, and developing reports in OBIEE. Details of 6 projects are also provided demonstrating experience implementing ETL processes and data warehousing solutions for clients.
The document discusses the role of a business intelligence (BI) developer. It describes the key responsibilities of a BI developer which include meeting with customers to define report requirements, gathering specifications, estimating timelines, understanding the data and technology, managing changes, delivering reports, and troubleshooting issues. The document also provides a brief history of reporting and how it has evolved from mainframes to relational databases to modern BI tools that allow self-service reporting and advanced visualizations. It discusses emerging areas like mobile BI, Hadoop, artificial intelligence, and predicts continued growth and expansion of BI in the future.
This talk will provide overview of big data software engineering and software engineering for big data as the tow fields need integrated. The interplay between the two field of research applications of Data Science and Software Engineering will enhance future perspective for a safe, secure, and sustainable approaches to data science and application of data science for 50 years of software engineering data that exists.
In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting (1) and data analysis (2). Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons.
The document provides an overview of the Dublinked Technology Workshop held on December 15th, 2011. It includes presentations on transportation data, spatial web services, linked data, and semantic data description. Breakout sessions covered topics like data publishing, discovery, web services, and advanced functions. The workshop aimed to address challenges around sharing digital data between organizations and discussed technical requirements and tools to support open government data platforms.
Sowjanya H J_BE with 4 yr exp_ Informaticasowjanya H J
The document provides a curriculum vitae for Sowjanya HJ, a software engineer with 4 years of experience in data warehousing using the Informatica ETL tool. She has worked on multiple projects involving data integration and maintenance for clients like Societe Generale and ALD Automotive. Her roles have included developing mappings, performing testing, and providing documentation and training. She is proficient in technologies like Informatica, Oracle, SQL, and UNIX.
This document provides an agenda and overview for a training on Microstrategy, a business intelligence tool. The agenda includes sessions on Microstrategy's organic architecture, metadata unified model, and hands-on experience with Microstrategy projects, OLAP, and visual insight. The document also discusses Microstrategy's architecture, modules, and development tools. It explains how Microstrategy provides interactive reporting, analysis, dashboards, alerting and other functions through a unified platform that accesses various data sources.
The document discusses the role and responsibilities of a business intelligence (BI) developer. It describes the key steps a BI developer takes when creating a report, including meeting with customers, gathering requirements, estimating timelines, working with data sources, implementing the report, testing and troubleshooting, and closing out the request. The document provides an overview of how reporting has evolved from mainframes to relational databases to modern BI tools and discusses future trends like artificial intelligence and big data.
This document summarizes a summer training seminar on BigData Hadoop that was attended. The training was provided by LinuxWorld Informatics Pvt Ltd, which offers open source and commercial training programs. The attendee learned about Hadoop, MapReduce, single and multi-node clusters, Docker, and Ansible. Big data challenges related to volume, variety, velocity, and veracity of data were also covered. Hadoop and its core components HDFS and MapReduce were explained as solutions for storing and processing large datasets in a distributed manner across commodity hardware. Docker containers were introduced as a lightweight alternative to virtual machines.
Top Big data Analytics tools: Emerging trends and Best practicesSpringPeople
This document discusses top big data analytics tools and emerging trends in big data analytics. It defines big data analytics as examining large data sets to find patterns and business insights. The document then covers several open source and commercial big data analytics tools, including Jaspersoft and Talend for reporting, Skytree for machine learning, Tableau for visualization, and Pentaho and Splunk for reporting. It emphasizes that tool selection is just one part of a big data project and that evaluating business value is also important.
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
This presentation introduces the audience to the DataOps and AIOps practices. It deals with organizational & tech aspects, and provide hints to start you data journey.
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"jstrobl
- Location data from sources like social media, sensors, and smart devices is increasingly important for improving city services, security, and operations in smart cities (paragraph 1, 2)
- Oracle provides tools for managing and analyzing large volumes of spatial and location data using big data technologies like Hadoop and streaming data platforms to enable use cases like predictive analytics (paragraph 3, 4, 5, 19)
- Oracle's spatial capabilities allow for indexing, visualization, and analysis of geospatial vector and raster data at scale, including tools for data preparation, spatial queries, and analyzing streaming location data (paragraph 10, 13, 14, 20)
Bill Hayduk is the founder and CEO of QuerySurge, a software division that provides data integration and analytics solutions, with headquarters in New York; QuerySurge was founded in 1996 and has grown to serve Fortune 1000 customers through partnerships with technology companies and consulting firms. The document discusses the data and analytics marketplace and provides an overview of concepts like data warehousing, ETL, BI, data quality, data testing, big data, Hadoop, and NoSQL.
The document provides a technical summary and experience profile of Nootan Sharma. It summarizes his 8 years of experience in data warehousing and business intelligence projects. It details his expertise in tools like Informatica PowerCenter, Oracle, SQL Server and data quality management. It also lists his past work experience with companies like Capgemini, Birlasoft and Infogain on various BI and data warehousing projects for clients in different sectors.
Top 10 Data analytics tools to look for in 2021Mobcoder
This write-up has surrounded the top 10 tools used by data analysts, architects, scientists, and other professionals. Each tool has some specific feature that makes it an ideal fit for a specific task. So choose wisely depending on your business need, type of data, the volume of information, experience in analytical thinking.
This document provides an introduction to big data and analytics. It discusses definitions of key concepts like business intelligence, data analysis, and big data. It also provides a brief history of analytics, describing how technologies have evolved from early business intelligence systems to today's big data approaches. The document outlines some of the key components of Hadoop, including HDFS and MapReduce, and how it addresses issues like volume, variety and velocity of big data. It also discusses related technologies in the Hadoop ecosystem.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
OpenID AuthZEN Interop Read Out - AuthorizationDavid Brossard
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Monitoring and Managing Anomaly Detection on OpenShift.pdfTosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Webinar: Designing a schema for a Data WarehouseFederico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires gathering information about the business processes that need to be analysed in the first place. These processes must be translated into so-called star schemas, which means, denormalised databases where each table represents a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks
1. Open Source Geospatial Business Intelligence (GeoBI): Definition, architectures, projects, challenges and outlooks
Prof. Thierry Badard
3rd OSGIS Conference
Nottingham University, Nottingham, UK – June 21-22, 2011
2. Content of the presentation
● Open data / Data deluge
● BI & Geospatial BI (GeoBI)
● Definition
● Architecture
● Offerings
● Open source GeoBI solutions
● Conclusion & outlooks
12. Nowadays, companies and public bodies …
● Store and face huge amounts of data that are very diverse and heterogeneous
● Sales
● Stocks
● Socio-economical data
● Surveys
● …
● Impact assessment of their advertisement campaigns
● Website visits
● Social networks
13. In a world that changes rapidly ...
● Which requires them to take informed and strategic decisions more and more often, and rapidly, in order to stay competitive
● So, they need to analyze these huge amounts of data rapidly, simply and efficiently
● For that, they increasingly use BI (Business Intelligence) tools
● These tools produce:
● Synthesis reports (PDF, Excel, ...)
● Interactive analytical dashboards
20. Location is everywhere! ;-)
● About 80% of all data stored in corporate databases has a spatial component [Franklin, 1992]
● Mail address
● Postal/ZIP code
● Coordinates
● IP, phone number, …
● Like time, space (location) is a crucial dimension that should be taken into account when analyzing corporate data!
21. How to take location into account …
● This requires the right tools to observe, handle and explore this type of information in BI tools ...
● Let us imagine you are HSBC … or a ministry for public safety … or the Haiti government ...
● Do you think that cross tables or charts are the best media to answer these types of questions?
22. The right medium is obviously a map!
23. The right medium is obviously a map!
24. The right medium is obviously a map!
25. So, GeoBI (Geospatial BI) ...
● Brings:
● The map ...
● and all the tools required to represent, navigate and analyse the geospatial dimension
● Into the very core of the BI tools on the market
● In order to fully take the spatial dimension into account in the analysis of corporate data and in the decision processes based on these data!
26. BI and geospatial industry (GeoBI sits at the intersection of both)
● Geospatial Web 2.0 industry: $3 billion in software (out of a $15 billion IT total)
● Business Intelligence (BI) industry: $8.8 billion in software (up 22% in 2008)
● Players
27. Back in 2005 ...
● GeoSOA research team: research in BI, geospatial and mobility
● Purposes:
● Propose ubiquitous and mobile decision-making tools which make use of the geospatial dimension (mobile GeoBI),
● And find ways to deliver business information adapted to the context of the user (location, environment, situation, purpose, ...)
● But, unfortunately:
● No GeoBI tool was available that enabled:
– The processing of large volumes of information coming from very diverse and heterogeneous data sources
– The rapid delivery of these data to numerous concurrent users in connected/disconnected modes (web services architecture)
– The support of standards dedicated to mobility
– Key GeoBI capabilities, with a rich, comprehensive and extensible API at an affordable cost
– A robust and complete BI architecture able to fulfill all types of user requirements
28. Classical architecture of a BI infrastructure
● Transactional databases
● Web resources
● XML, flat files, proprietary file formats (Excel spreadsheets, …)
● LDAP
● …
29. The Data Warehouse: the crucial/central part!
● Repository of an organization's historical data, for analysis purposes.
● Primarily intended for analysts and decision makers.
● Separate from operational (OLTP) systems (source data), but often stored in relational DBMS: Oracle, MSSQL, PostgreSQL, MySQL, Ingres, …
● Contents are often presented in a summarized form (e.g. key performance indicators, dashboards, OLAP client applications, reports). This requires defining some metrics/measures.
30. The Data Warehouse: the crucial/central part!
● Optimized for:
– Large volumes of data (up to terabytes);
– Fast responses (< 10 s) to analytical queries (vs. update speed for transactional DBs):
– de-normalized data schemas (e.g. star or snowflake schemas), which introduce some redundancy to avoid time-consuming JOIN queries,
– all data are stored in the DW across time (no corrections),
– summary (aggregate) data at different levels of detail and/or time scales,
– (multi)dimensional modeling (one dimension per analysis axis): all data are interrelated according to the analysis axes (OLAP data cube paradigm).
● The focus is thus more on the analysis/correlation of large amounts of data than on retrieving/updating a precise set of data!
● Specific methods are needed to propagate updates into the DW!
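The star schema idea above can be sketched with a toy example: a hypothetical sales fact table joined to a single store dimension, using SQLite purely for illustration (the table and column names are invented, not taken from any real warehouse):

```python
import sqlite3

# In-memory warehouse with a minimal star schema:
# one fact table referencing one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY,
                        city TEXT, state TEXT, country TEXT);
CREATE TABLE fact_sales (store_id INTEGER REFERENCES dim_store,
                         year INTEGER, units INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO dim_store VALUES (?,?,?,?)",
                 [(1, "Yakima", "WA", "USA"),
                  (2, "Beverly Hills", "CA", "USA")])
conn.executemany("INSERT INTO fact_sales VALUES (?,?,?,?)",
                 [(1, 2002, 10, 150.0), (2, 2002, 20, 300.0),
                  (2, 2003, 5, 80.0)])

# Analytical query: aggregate the measure along the store
# dimension at the "state" level -- a roll-up, in OLAP terms.
rows = conn.execute("""
SELECT d.state, f.year, SUM(f.amount)
FROM fact_sales f JOIN dim_store d USING (store_id)
GROUP BY d.state, f.year
ORDER BY d.state, f.year
""").fetchall()
print(rows)  # [('CA', 2002, 300.0), ('CA', 2003, 80.0), ('WA', 2002, 150.0)]
```

A single JOIN suffices because the schema is de-normalized: every analysis axis hangs directly off the fact table.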
31. MDX query language
● MDX stands for MultiDimensional eXpressions
● Multidimensional query language
● De facto standard from Microsoft for SQL Server OLAP Services (now Analysis Services)
● Also implemented by other OLAP servers (Essbase, Mondrian) and clients (ProClarity, Excel PivotTables, Cognos, JPivot, …)
● MDX is for OLAP data cubes what SQL is for relational databases
● Looks like a SQL query but relies on a different model (close to the one used in spreadsheets)
● SELECT
  { [Measures].[Store Sales] } ON COLUMNS,
  { [Date].[2002], [Date].[2003] } ON ROWS
  FROM Sales
  WHERE ( [Store].[USA].[CA] )
32. Results representation
● Example:
SELECT
  { [Product].[All Products].[Drink],
    [Product].[All Products].[Food] } ON COLUMNS,
  { [Store].[All Stores].[USA].[WA].[Yakima].[Store 23],
    [Store].[All Stores].[USA].[CA].[Beverly Hills].[Store 6],
    [Store].[All Stores].[USA].[OR].[Portland].[Store 11] } ON ROWS
FROM Warehouse
WHERE ([Time].[1997], [Measures].[Units Shipped])
● OLAP client software offers:
− alternate representation modes (pie charts, diagrams, etc.),
− different tools to refine queries / explore data: drill down, roll up, pivot, … based on operators provided by MDX.
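The two axes of an MDX query map directly onto a crosstab: the member set ON COLUMNS gives the column headers, the set ON ROWS gives the row headers, and each cell holds the measure value for that (row, column) tuple. A minimal sketch of that layout step, with invented cube data:

```python
# How an OLAP client lays out an MDX result as a crosstab: one member
# set on COLUMNS, one on ROWS, each cell holding the measure value for
# the corresponding tuple. Member names and values are invented.
cube = {  # (store, product) -> units shipped
    ("Store 23", "Drink"): 5, ("Store 23", "Food"): 12,
    ("Store 6",  "Drink"): 8, ("Store 6",  "Food"): 20,
}
columns = ["Drink", "Food"]        # ON COLUMNS axis
rows    = ["Store 23", "Store 6"]  # ON ROWS axis

# Build the crosstab, defaulting to 0 for empty cells.
crosstab = {r: {c: cube.get((r, c), 0) for c in columns} for r in rows}
print(crosstab["Store 6"]["Food"])  # 20
```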
33. Geospatial BI adds maps and spatial analysis!
This requires consistently integrating the geospatial component into all parts of the architecture!
35. Pentaho open source BI software stack
● Pentaho (http://www.pentaho.org) components: Pentaho Reporting, Kettle (data integration), Mondrian (OLAP), Weka (data mining)
● + CDF: Community Dashboard Framework
● + Other projects: olap4j, JPivot, Halogen, …
36. Towards an open source geospatial BI stack
[Architecture diagram: spatially-enabled counterparts of the Pentaho stack (GeoKettle, GeoMondrian, GeoKNIME), spatial DBMS back ends (PostGIS, Oracle Spatial), and integration in various dashboard and reporting tools]
37. An ETL tool is …
● A type of software used to populate databases or data warehouses from heterogeneous data sources.
● ETL stands for:
− Extract – extract data from the data sources
− Transform – transform the data in order to correct errors, perform data cleansing, change the data structure, make it compliant with defined standards, etc.
− Load – load the transformed data into the target DBMS
● An ETL tool should manage both the insertion of new data and the updating of existing data.
● It should be able to perform transformations from:
− an OLTP system to another OLTP system,
− an OLTP system to an analytical data warehouse.
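The three phases can be sketched end to end in a few lines. This is a deliberately minimal illustration with invented sample data, not how GeoKettle itself is implemented:

```python
import csv, io, sqlite3

# Minimal Extract-Transform-Load sketch: read a CSV source, clean and
# standardize it, load it into a target database. Data are invented.
source = io.StringIO("id,city,pop\n1, montreal ,1704694\n2,Quebec City,531902\n")

# Extract: pull raw records from the source.
records = list(csv.DictReader(source))

# Transform: trim whitespace, standardize casing, cast types.
clean = [(int(r["id"]), r["city"].strip().title(), int(r["pop"]))
         for r in records]

# Load: insert the transformed rows into the target DBMS.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, pop INTEGER)")
con.executemany("INSERT INTO city VALUES (?, ?, ?)", clean)
loaded = con.execute("SELECT name FROM city ORDER BY id").fetchall()
print(loaded)  # [('Montreal',), ('Quebec City',)]
```

A tool like GeoKettle expresses the same chain as configurable steps (metadata) instead of hand-written code, which is precisely the point of the next slide.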
38. Why use an ETL tool?
● Automation of complex and repetitive data processing without writing any ad hoc code
● Conversion between various data formats
● Migration of data from one DBMS to another
● Feeding data into various DBMS
● Population of analytical data warehouses for decision support purposes
● etc.
39. GeoKettle
● GeoKettle is a "spatially-enabled" version of Pentaho Data Integration (Kettle)
● Kettle is a metadata-driven ETL tool with direct execution of transformations
− No intermediate code generation!
● Support for several DBMS and file formats
− DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL Server, … (37 in total)
− Read/write support for various data file formats: text, Excel, Access, DBF, XML, …
● Numerous transformation steps
● Support for methods for updating the DW
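In the simplest case, "updating the DW" means an insert-or-update (upsert) against a dimension table: new members are inserted, changed members are overwritten (a "type 1" slowly changing dimension). A sketch using SQLite's long-standing `INSERT OR REPLACE`; table and column names are invented:

```python
import sqlite3

# Upsert into a dimension table: insert new members, overwrite changed
# ones (type 1 slowly changing dimension). Names/data are illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, state TEXT)")
con.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "CA"), (2, "WA")])

incoming = [(2, "OR"),   # existing member whose attribute changed
            (3, "NY")]   # brand-new member
# INSERT OR REPLACE overwrites the row when the primary key already exists.
con.executemany("INSERT OR REPLACE INTO dim_store VALUES (?, ?)", incoming)

stores = dict(con.execute("SELECT store_id, state FROM dim_store"))
print(stores)  # {1: 'CA', 2: 'OR', 3: 'NY'}
```

Real DW update strategies also cover history-preserving variants (e.g. versioned dimension rows), which is why ETL tools ship dedicated steps for them.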
40. GeoKettle
● GeoKettle provides a true and consistent integration of the spatial component
− All steps provided by Kettle are able to deal with geospatial data types
− Some dedicated geospatial steps have been added
● First release in May 2008: 2.5.2-20080531
● Current stable version: 3.2.0-r188-20090706
● To be released shortly: GeoKettle 2.0, with many new features!
● Released under the LGPL at http://www.geokettle.org
● Used in different organizations and countries:
− some ministries, banks, insurance companies, integrators, …
− e.g. GeoETL from Inova is in fact GeoKettle! :-)
● A growing community of users and developers
41. GeoKettle
● Transformations vs. jobs:
− run in parallel vs. run sequentially
● Both can be stored in a central repository (database)
− But each transformation or job can also be saved in a simple XML file!
● Offers different interfaces:
− Spoon: GUI for editing transformations and jobs
− Pan: command-line interface for running transformations
− Kitchen: command-line interface for running jobs
− Carte: web service for the remote execution of transformations and jobs
43. GeoKettle
● Version 2.0 provides support for:
− geometry data types (based on JTS)
− accessing Geometry objects in JavaScript, which lets users define custom transformation steps ("Modified JavaScript Value" step)
− topological predicates (intersects, crosses, etc.) and aggregation operators (envelope, union, geometry collection, …)
− SRS definition and transformations
− input/output with some spatial DBMS:
- native support for Oracle, PostGIS and MySQL
- MS SQL Server 2008 and SQLite/SpatiaLite
- Ingres and IBM DB2 can be used, but require some tricks
− GIS file input/output: Shapefile, GML 3, KML 2.2 and OGR support (~33 vector data formats and DBMS)
− Sensor Web (SOS) and metadata (CSW) services
− cartographic preview
− spatial analysis and advanced geo-processing
44. GeoKettle
● GeoKettle releases are aligned with those of Pentaho Data Integration (Kettle),
− so GeoKettle benefits from all the new features provided by PDI (Kettle).
● Kettle is natively designed to be deployed in cluster and web service environments.
− This makes GeoKettle a perfect software component to deploy as a service (SaaS) in cloud computing environments such as those provided by Amazon EC2.
− It thus enables scalable, distributed, on-demand processing of large and complex volumes of geospatial data in minutes, for critical applications, without requiring a company to invest in an expensive IT infrastructure of servers, networks and software.
45. GeoKettle – Requirements and install
● Very simple installation procedure
● All you need is a Java Runtime Environment, version 5 or higher
● Just unzip the binary archive of GeoKettle … and go!
● Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows)
● If you need help, please visit our wiki: http://wiki.spatialytics.org
47. GeoKettle
● Upcoming features:
− implementation of data matching and conflation steps/jobs, in order to allow advanced geometric data cleansing and comparison of geospatial datasets
− read/write support for other DBMS, GIS file formats and services:
- LAS (LiDAR), …
- WFS-T, Sensor Web (TML, SensorML, SOS-T, …), …
− dedicated steps for social media (Twitter, …), OSM, generalization, …
− support for the third dimension (x, y, z, M), linear reference systems, raster, …
− raster support: a plugin is under development to integrate all capabilities provided by the Sextante library (BeETLe)
48. GeoMondrian
● GeoMondrian is a "spatially-enabled" version of Pentaho Analysis Services (Mondrian)
● GeoMondrian brings to the Mondrian OLAP server what PostGIS brings to the PostgreSQL DBMS
− i.e. consistent and powerful support for geospatial data.
● Licensed under the EPL
● http://www.geo-mondrian.org
49. GeoMondrian
● As far as we know, it is the first implementation of a true Spatial OLAP (SOLAP) server
− And it is an open source project! ;-)
● Provides a consistent integration of spatial objects into the OLAP data cube structure
− instead of fetching them from an external spatial DBMS, web service or GIS file
● Implements a native Geometry data type
● Provides the first spatial extensions to the MDX language
− adding spatial analysis capabilities to analytical queries
● At present, it only supports PostGIS data warehouses
− but other DBMS will be supported in the next version!
50. Spatially enabled MDX
● Goal: bring to Mondrian and MDX what SQL spatial extensions do for relational DBMS (i.e. Simple Features for SQL and implementations such as PostGIS).
● Example query: filter spatial dimension members based on their distance from a feature
SELECT
  {[Measures].[Population]} ON COLUMNS,
  Filter(
    {[Unite geographique].[Region economique].members},
    ST_Distance(
      [Unite geographique].CurrentMember.Properties("geom"),
      [Unite geographique].[Province].[Ontario].Properties("geom")) < 2.0
  ) ON ROWS
FROM [Recensements]
WHERE [Temps].[Recensement 2001 (2001-2003)].[2001]
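The semantics of that Filter/ST_Distance combination can be emulated in plain Python: keep only the dimension members whose geometry lies within 2.0 units of the reference feature. Geometries are reduced here to invented point coordinates for the sake of the sketch:

```python
from math import hypot

# Emulation of Filter(members, ST_Distance(geom, ref) < 2.0): member
# names and coordinates are invented; real Geo-MDX works on full
# geometries, not points.
members = {  # economic region -> representative (x, y) coordinate
    "Region A": (0.5, 0.5),
    "Region B": (1.2, 1.0),
    "Region C": (5.0, 4.0),
}
reference = (0.0, 0.0)  # stand-in for the Ontario geometry

def st_distance(a, b):
    # Euclidean distance, like ST_Distance on planar data.
    return hypot(a[0] - b[0], a[1] - b[1])

on_rows = sorted(m for m, geom in members.items()
                 if st_distance(geom, reference) < 2.0)
print(on_rows)  # ['Region A', 'Region B']
```

GeoMondrian evaluates the predicate inside the cube, so the filtered member set then drives both the crosstab rows and the map display.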
51. Spatially enabled MDX
● Many more possibilities:
− in-line geometry constructors (from WKT)
− member filters based on topological predicates (intersects, contains, within, …)
− spatial calculated members and measures (e.g. aggregates of spatial features, buffers)
− calculations based on scalar attributes derived from spatial features (area, length, distance, …)
53. SOLAPLayers
● SOLAPLayers is a lightweight cartographic component (framework) which enables navigation in geospatial (Spatial OLAP or SOLAP) data cubes, such as those handled by GeoMondrian.
● It aims to be integrated into existing dashboard frameworks in order to produce interactive geo-analytical dashboards.
● Such dashboards help support the decision-making process by including the geospatial dimension in the analysis of enterprise data.
● The first version stems from a GSoC 2008 project performed under the umbrella of OSGeo.
● Licensed under the BSD licence (client part) and the EPL (server part).
● http://www.solaplayers.org
54. SOLAPLayers v1
● Version 1 was based on OpenLayers and Dojo
● It allows:
− connecting to a Spatial OLAP server such as GeoMondrian,
− basic navigation in the geospatial data cubes,
− and the cartographic representation of some measures as static or dynamic choropleth maps, or maps with proportional symbols.
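Behind a choropleth map sits a simple classification step: each measure value is binned into a class, and each class is mapped to a color. A sketch using equal-interval classification; values and colors are invented:

```python
# Equal-interval choropleth classification: bin each measure value into
# one of len(colors) classes, light to dark. Data/colors are invented.
measure = {"A": 10.0, "B": 42.0, "C": 25.0, "D": 60.0}
colors = ["#ffffcc", "#a1dab4", "#41b6c4", "#225ea8"]  # light -> dark

lo, hi = min(measure.values()), max(measure.values())
width = (hi - lo) / len(colors)

def classify(v):
    # The maximum value would fall just past the last class; clamp it.
    return min(int((v - lo) / width), len(colors) - 1)

styled = {k: colors[classify(v)] for k, v in measure.items()}
print(styled["D"], styled["A"])  # #225ea8 #ffffcc
```

Other classification schemes (quantiles, natural breaks) follow the same pattern with a different binning function; the client then just styles each map feature with its class color.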
57. SOLAPLayers v1
● Version 1 was mostly a proof of concept!
● It has important limitations:
− allows only cartographic representations (no crosstabs or charts),
− works only for one measure and the spatial dimension,
− offers limited navigation capabilities in the geospatial data cubes,
− is only able to connect to GeoMondrian,
− extending the framework is difficult due to the lack of flexibility and the poor documentation of Dojo,
− integration with other currently used geo-web and dashboard frameworks was difficult,
− …
58. SOLAPLayers 2.0
● So, SOLAPLayers has undergone (and is still undergoing ;-) ) a deep re-engineering!
● Version 2 is fully based on ExtJS/GeoExt (and hence OpenLayers)
− This makes its integration with other geo-web and BI/dashboard frameworks easier
− It provides some new ExtJS components dedicated to GeoBI!
− Following the application development philosophy adopted by these geo-web frameworks, it allows easier creation/maintenance of the produced geo-analytical dashboards!
− Like ExtJS, it supports internationalization!
59. SOLAPLayers 2.0 – Architecture
[Architecture diagram, elaborated over slides 59–63: the client talks to the SOLAPLayers server through a SOLAPJSON protocol, with built-in or LDAP authentication. The server reaches data through three kinds of connectors:
1. a SOLAP server (e.g. GeoMondrian), queried in MDX through OLAP4J (native or XML/A) — full Geo-MDX capabilities;
2. a conventional OLAP server through the same OLAP4J bridge, paired with a separate geospatial data source (WFS, DBMS, …) — a bridge architecture that maximizes what is already in place in organisations, but with no Geo-MDX capabilities;
3. a geospatial data source (WFS, DBMS, …) queried directly — for simple geo-dashboards based on transactional data and thematic mapping, but with no Geo-MDX and no drill-down or roll-up capabilities.]
64. SOLAPLayers 2.0 – Architecture
● Many more connectors to come …
− GeoKettle
− CDA (BI domain)
− Oracle / MS SQL Server support
− …
● Many more output formats to be available …
− image (PNG, …)
− KML
− …
65. SOLAPLayers 2.0 – Geo-dashboard made easy!
1. Define the template of the dashboard in an HTML file.
66. SOLAPLayers 2.0 – Geo-dashboard made easy!
2. Define your dashboard components in a JS file and map them to the divs in the HTML file.
80. Prediction
[Screenshot: a KNN configuration node producing predictions, compared against the actual values]
81. Conclusion & outlooks
● A rich and fascinating topic …
● And this was just an introduction ;-)
● Do not hesitate to ask for more demos!
● R&D is still required in this domain …
● Open source allows cooperation/collaboration and makes research work easier: students can use the infrastructure to develop their own research without reinventing the wheel
● Based on this infrastructure, some research work on mobile GeoBI has already been performed:
− web services for the dissemination of geo-analytical data
− methods for real-time updating and integration of data stemming from sensors
− definition of mobile GeoBI context profiles
− implementation of the first mobile and context-aware GeoBI clients
− …
● Towards active geospatial data warehouses …