a Trajectory Data Warehouse using
    Master Project Introduction
    Simone Campora
    Advisors: Laura Spinsanti, Jose Antonio Fernandes de Macedo, Stefano Spaccapietra
Choosing an OLAP Platform
    Which architecture should we use to develop a Trajectory Data Warehouse?
    Several candidates:
        Oracle OLAP
        Microsoft SQL Server BI Workbench
        SAS® OLAP Server
        Pentaho Mondrian
    Mondrian stands out from the crowd in several respects…
What is Mondrian?
Who is Mondrian?
    A step forward to Data Warehouse integration
    Mondrian is an OLAP server written in Java.
    It lets you interactively analyze very large datasets stored in SQL databases without writing SQL.
It uses MDX as its query language, as well as XML for Analysis (XMLA)
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
          <Command>
            <Statement>SELECT Measures.MEMBERS ON COLUMNS FROM Sales</Statement>
          </Command>
          <Properties>
            <PropertyList>
              <DataSourceInfo/>
              <Catalog>FoodMart</Catalog>
              <Format>Multidimensional</Format>
              <AxisFormat>TupleFormat</AxisFormat>
            </PropertyList>
          </Properties>
        </Execute>
      </soap:Body>
    </soap:Envelope>
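As a rough illustration of how a client could wrap an MDX statement in the XMLA Execute envelope shown on this slide, here is a minimal Java sketch. The class name `XmlaExecuteRequest` and its methods are hypothetical helpers invented for this example, not part of Mondrian or of any XMLA library. The sketch only builds the request text; actually sending it to a server is left out.

```java
// Hypothetical helper (not a Mondrian API): builds the text of an XMLA Execute request.
final class XmlaExecuteRequest {
    private XmlaExecuteRequest() {}

    /** Escapes the characters that may not appear literally in XML text content. */
    static String escapeXml(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps an MDX statement and a catalog name in a SOAP envelope. */
    static String build(String mdx, String catalog) {
        return String.join("\n",
            "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\">",
            "  <soap:Body>",
            "    <Execute xmlns=\"urn:schemas-microsoft-com:xml-analysis\">",
            "      <Command>",
            "        <Statement>" + escapeXml(mdx) + "</Statement>",
            "      </Command>",
            "      <Properties>",
            "        <PropertyList>",
            "          <DataSourceInfo/>",
            "          <Catalog>" + escapeXml(catalog) + "</Catalog>",
            "          <Format>Multidimensional</Format>",
            "          <AxisFormat>TupleFormat</AxisFormat>",
            "        </PropertyList>",
            "      </Properties>",
            "    </Execute>",
            "  </soap:Body>",
            "</soap:Envelope>");
    }

    public static void main(String[] args) {
        // Same statement and catalog as on the slide.
        System.out.println(build("SELECT Measures.MEMBERS ON COLUMNS FROM Sales", "FoodMart"));
    }
}
```

Escaping matters because MDX statements often contain `<` or `&`, for example in a `Filter` condition, and these would otherwise break the XML.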
More on MDX
    MDX stands for MultiDimensional eXpressions.
    It is a de facto standard from Microsoft, introduced for SQL Server OLAP Services (now Analysis Services).
    MDX is for OLAP data cubes what SQL is for relational databases.
    It looks like a SQL query but relies on a different model (close to the one used in spreadsheets):
        SELECT { [Measures].[Store Sales] } ON COLUMNS,
               { [Date].[2002], [Date].[2003] } ON ROWS
        FROM Sales
        WHERE ( [Store].[USA].[CA] )
XML Cube Definition
    Mondrian uses XML "schemas" to define the cubes, like:
        <Cube name="Sales">
          <Table name="sales">
            <AggName name="agg_1">
              <AggFactCount column="row count"/>
              <AggMeasure name="[Measures].[Unit Sales]" column="sum units"/>
              <AggMeasure name="[Measures].[Min Units]" column="min units"/>
              <AggMeasure name="[Measures].[Max Units]" column="max units"/>
              <AggMeasure name="[Measures].[Dollar Sales]" column="sum dollars"/>
              <AggLevel name="[Time].[Year]" column="year"/>
              <AggLevel name="[Time].[Quarter]" column="quarter"/>
              <AggLevel name="[Product].[Mfrid]" column="mfrid"/>
              <AggLevel name="[Product].[Brand]" column="brand"/>
              <AggLevel name="[Product].[Prodid]" column="prodid"/>
            </AggName>
          </Table>
          <!-- Rest of the cube definition -->
        </Cube>
It is the OLAP version of JDBC
    It is considered to be for OLAP what the JDBC API is for relational databases.
    Using a similar Java syntax, it is possible to query the OLAP server from any Java application:
        import mondrian.olap.*;
        import java.io.PrintWriter;

        // Connect to the FoodMart catalog using Mondrian's own DriverManager.
        Connection connection = DriverManager.getConnection(
            "Provider=mondrian;" +
            "Jdbc=jdbc:odbc:MondrianFoodMart;" +
            "Catalog=/WEB-INF/FoodMart.xml;",
            null,
            false);
        // Parse the MDX text into a query object, then execute it.
        Query query = connection.parseQuery(
            "SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} on columns," +
            " {[Product].children} on rows " +
            "FROM [Sales] " +
            "WHERE ([Time].[1997].[Q1], [Store].[CA].[San Francisco])");
        Result result = connection.execute(query);
        // Print the result; PrintWriter buffers its output, so flush it at the end.
        PrintWriter out = new PrintWriter(System.out);
        result.print(out);
        out.flush();
Mondrian is used for…
    "Dimensional" exploration of data
    Translating MultiDimensional eXpressions (MDX) into Structured Query Language (SQL) to answer dimensional queries
    High-speed queries through the use of aggregate tables in the RDBMS
    Advanced calculations using the calculation expressions of the MDX language
Key Features
    On-Line Analytical Processing (OLAP) cubes
        Automated aggregation
        Speed-of-thought response times
    Open architecture
        100% Java J2EE
        Supports any JDBC data source
        MDX and XML/A (i.e. SOAP)
    Analysis viewers
        Enable ad-hoc, interactive data exploration
        Ability to slice-and-dice, drill-down, and pivot
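To make the MDX-to-SQL step more concrete, the following Java sketch builds the kind of `GROUP BY` statement a dimensional query ultimately boils down to. It is only an illustration under simplifying assumptions: the class `SqlGenerationSketch`, the method `rollupSql` and the table and column names are invented for this example, and the SQL that Mondrian actually generates is considerably more elaborate.

```java
import java.util.List;

// Illustrative only: not Mondrian's SQL generator.
final class SqlGenerationSketch {
    private SqlGenerationSketch() {}

    /**
     * Builds a SQL statement that sums one measure column, grouped by the
     * columns of the requested dimension levels.
     */
    static String rollupSql(String factTable, List<String> levelColumns, String measureColumn) {
        if (levelColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one level column is required");
        }
        String levels = String.join(", ", levelColumns);
        return "SELECT " + levels + ", SUM(" + measureColumn + ") AS total"
            + " FROM " + factTable
            + " GROUP BY " + levels;
    }

    public static void main(String[] args) {
        // Hypothetical fact table and columns.
        System.out.println(rollupSql("sales", List.of("year", "brand"), "units"));
    }
}
```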
Mondrian’s Architecture
Architecture
    The database provides: data storage; SQL query execution; heavy-duty sorting, correlation, aggregation
    Mondrian provides: a dimensional view of data; MDX parsing; SQL generation; caching; higher-level calculations; aggregate awareness
    [Diagram: Mondrian cube on top of the RDBMS]
    Supported RDBMS: Apache Derby, Firebird, hsqldb, IBM DB2, Infobright, Informix, Ingres, Interbase, LucidDB, Microsoft Access, Microsoft SQL Server, MySQL, Netezza, Oracle, PostgreSQL, Sybase, Teradata
Architecture
    Open standards (Java, XML, MDX, XML/A, SQL)
    Cross-platform (Windows and Unix/Linux)
    J2EE architecture: server clustering, fault tolerance
    Data sources: JDBC, JNDI
    [Diagram labels: Viewers (JPivot servlet, XML/A servlet, Microsoft Excel via Spreadsheet Services); J2EE Application Server / Web Server hosting Mondrian and its cubes; Cube Schema XML (file or RDBMS repository); RDBMS accessed via JDBC]
Strengths
    Database-independent applications (it operates on a JVM)
    Open source (Eclipse License)
    Standards such as MDX, XMLA, JDBC
    Relevant installed base (DivX, iStockPhoto, Sun, Mozilla, MySQL…)
    Widely recognized inside the open source community
RDBMS Design
    Mondrian does not store data on disk: it only reads data from the DBMS and copies it into its cache.
    This limits Mondrian's performance when it is applied to a huge dataset.
    The limit can be overcome by using an "aggregate table".
    Designed DB schema…
Aggregation Tables
    [Figure: the plain fact table]
    [Figure: the same data under a specific aggregation]
Tools and Caching
    You don't need to do any processing to populate special data structures before you start running OLAP queries.
    This makes Mondrian an excellent choice for "real-time OLAP": running multi-dimensional queries on a database which is constantly changing.
    There are specific APIs and tools that can be customized to handle aggregate table creation and cache updating.
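The relationship between a plain fact table and an aggregate table can be sketched in a few lines of Java. This is a toy, in-memory illustration with made-up data and hypothetical names (`AggregateTableSketch`, `FactRow`, `AggRow`, `rollUp`); in practice the aggregate table lives in the RDBMS. The aggregate keeps a summed measure and a fact count per group, much like the "sum units" and "row count" columns in the cube definition slide.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: an aggregate table is a pre-computed rollup of the fact table.
final class AggregateTableSketch {
    /** One row of a hypothetical fact table. */
    static final class FactRow {
        final int year;
        final String quarter;
        final String brand;
        final int units;

        FactRow(int year, String quarter, String brand, int units) {
            this.year = year;
            this.quarter = quarter;
            this.brand = brand;
            this.units = units;
        }
    }

    /** One row of the aggregate table: summed measure plus number of fact rows. */
    static final class AggRow {
        final int sumUnits;
        final int rowCount;

        AggRow(int sumUnits, int rowCount) {
            this.sumUnits = sumUnits;
            this.rowCount = rowCount;
        }
    }

    /** Rolls the fact rows up to the (year, brand) level, dropping the quarter. */
    static Map<String, AggRow> rollUp(List<FactRow> facts) {
        Map<String, AggRow> aggregate = new LinkedHashMap<>();
        for (FactRow fact : facts) {
            String key = fact.year + "/" + fact.brand;
            aggregate.merge(key, new AggRow(fact.units, 1),
                (a, b) -> new AggRow(a.sumUnits + b.sumUnits, a.rowCount + b.rowCount));
        }
        return aggregate;
    }

    public static void main(String[] args) {
        List<FactRow> facts = List.of(
            new FactRow(1997, "Q1", "Acme", 10),
            new FactRow(1997, "Q2", "Acme", 20),
            new FactRow(1998, "Q1", "Acme", 5));
        rollUp(facts).forEach((key, row) ->
            System.out.println(key + " sum_units=" + row.sumUnits + " row_count=" + row.rowCount));
    }
}
```

A query at the year/brand level can then read one aggregate row per group instead of scanning every fact row.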
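The trade-off between caching and a constantly changing database can be illustrated with a small Java sketch. Everything here is hypothetical (the class `CellCacheSketch`, its methods, and the function standing in for the database); it does not reproduce Mondrian's cache-control API, only the general idea that cached cells go stale until the cache is flushed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a read-through cell cache with an explicit flush.
final class CellCacheSketch {
    private final Map<String, Double> cache = new HashMap<>();
    private final Function<String, Double> database; // stands in for a SQL round trip
    private int databaseHits = 0;

    CellCacheSketch(Function<String, Double> database) {
        this.database = database;
    }

    /** Returns the cached cell value, loading it from the database on a miss. */
    double get(String cellKey) {
        Double cached = cache.get(cellKey);
        if (cached != null) {
            return cached;
        }
        databaseHits++;
        double value = database.apply(cellKey);
        cache.put(cellKey, value);
        return value;
    }

    /** Drops all cached cells so that the next read sees fresh data. */
    void flush() {
        cache.clear();
    }

    /** Number of reads that actually reached the database. */
    int databaseHits() {
        return databaseHits;
    }
}
```

After the underlying data changes, reads keep returning the old value until `flush()` is called, which is why a real-time setup needs some mechanism for cache updating.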
Geo Mondrian
Geo Mondrian
    GeoMondrian is a "spatially-enabled" version of the Mondrian OLAP server.
    It implements a Spatial OLAP (SOLAP) server and is, so far, the first implementation of such a server.
    It is currently unreleased and is developed by Thierry Badard and Etienne Dubé from Laval University, Canada.
Geo Mondrian…
    It adds a Geometry data type to Mondrian, enabling vector geometries (points, lines, polygons) to be stored natively within the data cubes instead of being fetched from an external spatial DBMS, web service or GIS file.
    New MDX functions add spatial analysis capabilities to analytical queries.
Example Query
    Example query: filter spatial dimension members based on their distance from a feature
        SELECT { [Measures].[Population] } on columns,
               Filter(
                   { [Unite geographique].[Region economique].members },
                   ST_Distance(
                       [Unite geographique].CurrentMember.Properties("geom"),
                       [Unite geographique].[Province].[Ontario].Properties("geom")
                   ) < 2.0
               ) on rows
        FROM [Recensements]
        WHERE [Temps].[Rencensement 2001 (2001-2003)].[2001]
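What the `Filter(..., ST_Distance(...) < 2.0)` expression does can be mimicked in plain Java. The sketch below is a simplification under stated assumptions: geometries are reduced to single points, distance is plain Euclidean distance rather than a JTS computation, and the class, method names and coordinates are all invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a spatial filter: points and Euclidean distance only.
final class SpatialFilterSketch {
    private SpatialFilterSketch() {}

    /** Euclidean distance between two points given as {x, y}. */
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /** Keeps the members whose geometry lies strictly closer than maxDistance to the reference. */
    static List<String> membersWithin(Map<String, double[]> members, double[] reference, double maxDistance) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, double[]> member : members.entrySet()) {
            if (distance(member.getValue(), reference) < maxDistance) {
                kept.add(member.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up coordinates for two hypothetical regions.
        Map<String, double[]> regions = new LinkedHashMap<>();
        regions.put("Region A", new double[] {1.0, 0.0});
        regions.put("Region B", new double[] {3.0, 4.0});
        System.out.println(membersWithin(regions, new double[] {0.0, 0.0}, 2.0));
    }
}
```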
Features
    Geometry objects are handled using the JTS library (Open GIS Consortium standard): http://www.vividsolutions.com/jts
    For the moment, only PostgreSQL with the PostGIS spatial extension is supported as a data source for Geometry values.
Conclusions
    Open source solution for BI applications
    Active community developing both projects: Mondrian and GeoMondrian
    Database-independent alternative for BI
Thanks for your attention
    Questions?

Mondrian - Geo Mondrian
