An Introduction to OLAP and Data Warehouse
Ashish Awasthi

Overview
- What is a data warehouse (DWH)?
- What is OLAP?
- Different flavors of OLAP
- Questions

OLAP: 3-Tier DSS
- Data Warehouse (database layer): stores atomic data in an industry-standard data warehouse.
- OLAP Engine (application logic layer): generates SQL execution plans in the OLAP engine to obtain OLAP functionality.
- Decision Support Client (presentation layer): obtains multidimensional reports from the DSS client.

What is OLAP?
- OLAP: On-Line Analytical Processing
- An approach to answering analytical queries that are multidimensional in nature
- Typical applications:
  - Sales reports
  - Business reports for marketing
- Databases for OLAP employ a multidimensional model

What is OLAP? (cont.)
- Output is typically displayed as a matrix
  - Rows are the dimension members
  - Columns are the measures (the values)
- Indexes commonly used
  - Bitmap indexes
  - Join indexes
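
The matrix layout and the bitmap index mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fact rows, the region names and the function names are hypothetical, and a real OLAP engine stores bitmaps in a compressed form rather than as Python integers.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, units_sold, revenue).
FACTS = [
    ("East", 10, 100.0),
    ("West", 5, 60.0),
    ("East", 3, 30.0),
    ("North", 7, 70.0),
]

def report_matrix(facts):
    """Aggregate the measures per dimension member.

    Each key is a row of the report (a region); each tuple holds the
    columns of the report (units_sold, revenue).
    """
    totals = defaultdict(lambda: [0, 0.0])
    for region, units, revenue in facts:
        totals[region][0] += units
        totals[region][1] += revenue
    return {region: tuple(values) for region, values in sorted(totals.items())}

def bitmap_index(facts):
    """Build one bitmap per distinct region, stored as a Python int.

    Bit i of a bitmap is set when fact row i has that region.
    """
    index = defaultdict(int)
    for row_number, (region, _units, _revenue) in enumerate(facts):
        index[region] |= 1 << row_number
    return dict(index)

def rows_matching(bitmap):
    """Decode a bitmap back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

matrix = report_matrix(FACTS)
index = bitmap_index(FACTS)
# "East OR North" is answered with a single bitwise OR of two bitmaps.
east_or_north = rows_matching(index["East"] | index["North"])
```

The point of the bitmap form is that predicates on low-cardinality dimensions combine with cheap bitwise operations before any fact row is read.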

OLAP Servers
- High-capacity data manipulation engines designed to support and operate on multidimensional data structures.
- Two approaches for information retrieval:
  - Physically stage the processed multidimensional information. This gives rapid response times and is the preferred choice.
  - Populate the data structures in real time from relational or other databases.

Different Flavors - MOLAP
- MOLAP: Multidimensional Online Analytical Processing
  - Pre-computes and stores information in the form of a cube
  - Uses optimized multidimensional array storage rather than a relational database
  - The way each dimension is aggregated is defined in advance
- Advantages
  - Fast query performance due to optimized storage, multidimensional indexing and caching
  - Automated computation of higher-level aggregates of the data
  - Very compact for low-dimension data sets
- Disadvantages
  - The processing step (data load) can be quite lengthy, especially on large data volumes
  - Introduces data redundancy
  - Some MOLAP query tools have difficulty querying models whose dimensions have very high cardinality (e.g. millions of members)
- Products/Vendors
  - Microsoft Analysis Services
  - Essbase
  - MIS Alea
  - Palo (open source)
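
To make "pre-computes and stores information in the form of a cube" concrete, here is a minimal Python sketch. It is not how any particular MOLAP product stores data; the dimension names, the sample facts and the `ALL` marker are assumptions made for illustration. Every combination of dimensions is aggregated up front, so a query becomes a single lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows.
DIMENSIONS = ("year", "region", "product")
FACTS = [
    {"year": 2002, "region": "CA", "product": "Food", "sales": 100},
    {"year": 2002, "region": "WA", "product": "Food", "sales": 50},
    {"year": 2003, "region": "CA", "product": "Drink", "sales": 70},
]
ALL = "*"  # marks a dimension that has been rolled up

def build_cube(facts, dimensions, measure):
    """Pre-compute the measure total for every subset of dimensions."""
    cube = defaultdict(int)
    for fact in facts:
        for size in range(len(dimensions) + 1):
            for kept in combinations(dimensions, size):
                # Keep the member for chosen dimensions, roll up the rest.
                key = tuple(fact[d] if d in kept else ALL for d in dimensions)
                cube[key] += fact[measure]
    return dict(cube)

cube = build_cube(FACTS, DIMENSIONS, "sales")

grand_total = cube[(ALL, ALL, ALL)]
sales_2002 = cube[(2002, ALL, ALL)]
sales_ca = cube[(ALL, "CA", ALL)]
```

The sketch also shows both disadvantages listed above: the load step touches every fact once per dimension subset, and the cube holds more cells than there are fact rows.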

Different Flavors - ROLAP
- ROLAP: Relational Online Analytical Processing
  - Works directly with a relational database
  - Does not require pre-computation and storage of information
  - Uses additional tables (summaries or aggregations) that summarize data for the desired combinations of dimensions
- Advantages
  - More scalable in handling large data volumes, especially with dimensions of high cardinality
  - Data load is much faster than with MOLAP
  - Data is stored in relational format and can be accessed by any standard SQL query tool
- Disadvantages
  - The loading of aggregate tables must be managed by a custom ETL tool
  - If aggregate tables are not created, query performance suffers
  - Relies on a general-purpose database for indexing and caching; special MOLAP indexing techniques are not available
- Products/Vendors
  - Microsoft Analysis Services
  - MicroStrategy
  - Oracle BI
  - Business Objects
  - Mondrian (open source)
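
The role of aggregate tables in ROLAP can be shown with an in-memory SQLite database. This is a sketch under assumptions: the table names `sales_fact` and `agg_sales_by_year_region`, the sample rows and the helper function are all hypothetical, and the "ETL step" is reduced to a single CREATE TABLE ... AS SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (year INTEGER, region TEXT, amount REAL);
    INSERT INTO sales_fact VALUES
        (2002, 'CA', 100.0), (2002, 'CA', 25.0),
        (2002, 'WA', 50.0), (2003, 'CA', 70.0);

    -- Aggregate table: in practice loaded and refreshed by the ETL process.
    CREATE TABLE agg_sales_by_year_region AS
        SELECT year, region, SUM(amount) AS amount, COUNT(*) AS fact_count
        FROM sales_fact
        GROUP BY year, region;
""")

def sales_by_year(connection, use_aggregate):
    """Return (year, total) rows, reading either the facts or the aggregate."""
    table = "agg_sales_by_year_region" if use_aggregate else "sales_fact"
    query = f"SELECT year, SUM(amount) FROM {table} GROUP BY year ORDER BY year"
    return connection.execute(query).fetchall()

from_facts = sales_by_year(conn, use_aggregate=False)
from_aggregate = sales_by_year(conn, use_aggregate=True)
aggregate_row_count = conn.execute(
    "SELECT COUNT(*) FROM agg_sales_by_year_region"
).fetchone()[0]
```

Both calls return the same totals, but the second one scans fewer rows. On a real warehouse that difference is what keeps ROLAP queries responsive, and it is also why stale or missing aggregate tables hurt.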

Industry Trends - HOLAP
- HOLAP (Hybrid OLAP) keeps part of the data in a MOLAP store and another part in a ROLAP store
- Vertical partitioning model
  - Stores aggregations in MOLAP for fast query performance
  - Stores detailed data in ROLAP to optimize cube processing time
- Horizontal partitioning model
  - Stores some slice of the data, usually the more recent one (i.e. sliced by the Time dimension), in MOLAP and older data in ROLAP
- Products/Vendors
  - Microsoft Analysis Services
  - MicroStrategy
  - SAP AG BI Accelerator
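
The horizontal partitioning model can be pictured as a router that sends each time slice to the appropriate store. The following Python sketch is purely illustrative: the cut-off year, the two stand-in stores and the function names are assumptions, not the behavior of any listed product.

```python
# Hypothetical cut-off: years from 2003 onward live in the MOLAP store.
MOLAP_FROM_YEAR = 2003

# Stand-in for a MOLAP store: pre-aggregated totals per year.
molap_store = {2003: 70.0, 2004: 120.0}

# Stand-in for a ROLAP store: detail rows of (year, amount).
rolap_rows = [(2001, 40.0), (2002, 100.0), (2002, 50.0)]

def sales_for_year(year):
    """Return (total, store_used) for one year."""
    if year >= MOLAP_FROM_YEAR:
        # Recent slice: a single lookup in the pre-computed store.
        return molap_store.get(year, 0.0), "MOLAP"
    # Older slice: aggregate the detail rows at query time.
    total = sum(amount for row_year, amount in rolap_rows if row_year == year)
    return total, "ROLAP"

def sales_for_range(first_year, last_year):
    """Total across a range of years that may span both stores."""
    return sum(sales_for_year(y)[0] for y in range(first_year, last_year + 1))

recent = sales_for_year(2003)
older = sales_for_year(2002)
spanning = sales_for_range(2002, 2003)
```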

Multidimensional Database Architecture
[Diagram components: Source Databases, ETL, RDBMS, MDDB, Metadata Repository, Local Metadata, Data Modeling Tool, Warehouse Admin Tool, Data Access Tools]

APIs and Query Languages
- MDX (Multidimensional Expressions)
  - A query language for OLAP
  - A Microsoft-owned specification, not yet an open standard
  - Adopted by various vendors and tools: Microsoft Reporting Services, SAP, NCR, Business Objects, Crystal Reports, Cognos, Microsoft Excel, etc.
- mdXML
  - Part of the XML for Analysis standard released by the XML for Analysis Council in 2001
- olap4j
  - An open Java API for building OLAP applications; the OLAP parallel to JDBC

MDX - Query Example
- The SELECT clause sets the query axes: the Store Sales member of the Measures dimension on columns, and the 2002 and 2003 members of the Date dimension on rows.
- The FROM clause indicates that the data source is the Sales cube.
- The WHERE clause defines the "slicer axis" as the California member of the Store dimension.

    SELECT
      { [Measures].[Store Sales] } ON COLUMNS,
      { [Date].[2002], [Date].[2003] } ON ROWS
    FROM Sales
    WHERE ( [Store].[USA].[CA] )
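
What this MDX query asks for can be mimicked over a tiny in-memory data set. The Python sketch below is not an MDX engine; the fact rows, the field names and the `evaluate` function are hypothetical and only mirror the three clauses: measures on columns, date members on rows, and a slicer that filters every cell.

```python
# Hypothetical fact rows standing in for the Sales cube.
FACTS = [
    {"year": 2002, "country": "USA", "state": "CA", "store_sales": 100.0},
    {"year": 2002, "country": "USA", "state": "WA", "store_sales": 50.0},
    {"year": 2003, "country": "USA", "state": "CA", "store_sales": 70.0},
    {"year": 2003, "country": "USA", "state": "CA", "store_sales": 30.0},
    {"year": 2004, "country": "USA", "state": "CA", "store_sales": 999.0},
]

def evaluate(facts, columns, rows, slicer):
    """Build a rows-by-columns grid, applying the slicer to every cell."""
    grid = {}
    for row_member in rows:
        grid[row_member] = {}
        for measure in columns:
            grid[row_member][measure] = sum(
                fact[measure]
                for fact in facts
                if fact["year"] == row_member
                and all(fact[key] == value for key, value in slicer.items())
            )
    return grid

result = evaluate(
    FACTS,
    columns=["store_sales"],                  # ON COLUMNS
    rows=[2002, 2003],                        # ON ROWS
    slicer={"country": "USA", "state": "CA"}  # WHERE
)
```

Note that the 2004 row and the WA row never reach the result: the first is not on the row axis and the second is excluded by the slicer.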

XMLA - XML for Analysis
- The industry standard for data access in analytical systems (OLAP, data mining)
- Based on industry standards such as XML, SOAP and HTTP
- XMLA providers
  - Microsoft Analysis Services 2005
  - Hyperion Essbase 7
  - Microsoft XMLA SDK
  - Mondrian

    <soap:Envelope>
      <soap:Body>
        <Execute xmlns="urn:schemas-microsoft-com:xml-analysis">
          <Command>
            <Statement>SELECT Measures.MEMBERS ON COLUMNS FROM Sales</Statement>
          </Command>
          <Properties>
            <PropertyList>
              <DataSourceInfo/>
              <Catalog>FoodMart</Catalog>
              <Format>Multidimensional</Format>
              <AxisFormat>TupleFormat</AxisFormat>
            </PropertyList>
          </Properties>
        </Execute>
      </soap:Body>
    </soap:Envelope>
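
Because XMLA is plain XML over SOAP and HTTP, a request like the one above can be assembled with any XML library. The sketch below uses Python's standard `xml.etree.ElementTree` and only builds the envelope; sending it would need a provider endpoint URL, which the slides do not give. The function name is hypothetical, and the SOAP 1.1 envelope namespace is an assumption, since the slide omits the `xmlns:soap` declaration.

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace (not declared on the slide).
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Namespace taken from the Execute element on the slide.
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

def build_execute_request(statement, catalog):
    """Return an XMLA Execute request as a string of XML."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")

    # The MDX statement to run.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = statement

    # Properties controlling the catalog and the result format.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    property_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(property_list, f"{{{XMLA_NS}}}DataSourceInfo")
    for name, value in (
        ("Catalog", catalog),
        ("Format", "Multidimensional"),
        ("AxisFormat", "TupleFormat"),
    ):
        ET.SubElement(property_list, f"{{{XMLA_NS}}}{name}").text = value

    return ET.tostring(envelope, encoding="unicode")

request_xml = build_execute_request(
    "SELECT Measures.MEMBERS ON COLUMNS FROM Sales", "FoodMart"
)
```

The serializer may choose its own prefix for the XMLA namespace instead of a default `xmlns`; the resulting document is equivalent as far as an XML parser is concerned.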

Top OLAP Vendors
- MOLAP: Hyperion (Arbor Essbase), Oracle Express
- ROLAP: Informix MetaCube, MicroStrategy DSS Agent
- HOLAP: Microsoft Analysis Services, MicroStrategy DSS Agent, SAP AG

Microsoft Analysis Services
- Supports MOLAP, ROLAP and HOLAP
- Uses MDX as its query language
- Partition storage modes
  - MOLAP: both fact data and aggregations are processed, stored and indexed using a special format optimized for multidimensional data
  - ROLAP: both fact data and aggregations remain in the relational data source, eliminating the need for special processing
  - HOLAP: uses the relational data source to store the fact data, but pre-processes aggregations and indexes and stores these in a special format optimized for multidimensional data

Microsoft Analysis Services (cont.)
- Dimension storage modes
  - MOLAP: dimension attributes and hierarchies are processed and stored in the special format
  - ROLAP: dimension attributes are not processed and remain in the relational data source
- Partitions dimensioned by ROLAP dimensions must be in ROLAP mode as well

Microsoft Analysis Services APIs
- Querying
  - XML for Analysis: can be used from any platform and any language, which makes it a good fit for us
  - OLE DB (COM-based) and ADO.NET: suitable for applications on the Windows platform

Mondrian
- OLAP server written in Java
- "ROLAP" architecture
- Works with all popular open-source and proprietary databases (good news!)
- View data "dimensionally", e.g. sales by region, by channel, by time period
- Navigate and explore
  - Ad hoc analysis
  - "Drill down" from year to quarter
  - Pivot
  - Select specific members for analysis
- Web-based or Excel front ends

Mondrian (cont.)
- High-performance, interactive analysis of large or small volumes of information
- "Dimensional" exploration of data, for example analyzing sales by product line, by region, by time period
- Parsing of the Multidimensional Expressions (MDX) language into Structured Query Language (SQL) to retrieve answers to dimensional queries
- High-speed queries through the use of aggregate tables in the RDBMS
- Advanced calculations using the calculation expressions of the MDX language

Mondrian Client Access API
- olap4j: an open Java API for building OLAP applications
- olap4j is to multidimensional data what JDBC is to relational data
- An OLAP application written in Java for one server (say Mondrian) can easily be switched to another (say Microsoft Analysis Services, accessed via XML for Analysis)

Mondrian Architecture
