Multidimensional
 Data Analysis
   with Ruby
   Raimonds Simanovskis
Abstract
We have a lot of data in our databases but quite often users don't get the full benefit
of these data as they don't have good tools how to analyze these data. SQL
language is good for doing ad-hoc queries but it becomes very complicated when you
need to make more complex analytical queries to get summary results. And also new
NoSQL databases are focusing more on effective processing of detailed records and
not on analytical processing.

There is a range of OLAP (On-Line Analytical Processing) databases and engines
that are focused on making easier multi-dimensional analysis of your data at different
summary levels. One of most-popular open-source OLAP engines is Mondrian
(mondrian.pentaho.com) which can be put in front of your relational SQL database
but it provides MDX multi-dimensional query language which is much more suited for
analytical purposes.

mondrian-olap gem (soon to be released) integrates Mondian OLAP engine using
JRuby Java integration and provides Ruby DSL for creating OLAP schemas on top of
relational database schemas and provides MDX query language or ActiveRecord/
Arel-like query language for making analytical queries. It will be presented how to use
it for new or existing Ruby on Rails applications and how to do much easier data
analysis compared to standard ActiveRecord queries.
Example slides
SQL query like this

SELECT SUM(sales.unit_sales) unit_sales_sum,
       SUM(sales.store_sales) store_sales_sum
    FROM sales
      LEFT JOIN product ON sales.product_id = product.product_id
      LEFT JOIN product_class
           ON product.product_class_id = product_class.product_class_id
      LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
      LEFT JOIN customer ON sales.customer_id = customer.customer_id
    WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
      AND customer.country = 'USA' AND customer.state_province = 'CA'
    GROUP BY product_class.product_family
Could be written in
           MDX like this

SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
           [Product].children ON ROWS
     FROM [Sales]
     WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
Or in Ruby like this

olap.from('Sales').
columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
rows('[Product].children').
where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')




      Get sales amount and number of units of all
      products sold in California during Q1 of 2011
More complex
                      queries
olap.from('Sales').
with_member('[Measures].[ProfitPct]').
  as('Val((Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales])',
  :format_string => 'Percent').
columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
  top_count(50, '[Measures].[Store Sales]')
where('[Time].[2011].[Q1]')




             Get sales amount and profit % of top 50
             products cross-joined with USA and Canada
             country sales during Q1 of 2011
OLAP schema
Dimensional model:
 cubes
 dimensions (hierarchies & levels)
 measures, calculated measures


                   Mapping


Relational model:
 fact tables, dimension tables
 joined by foreign keys
OLAP schema
                       definition
schema = Mondrian::OLAP::Schema.new
schema.define do
  cube 'Sales' do
    table 'sales'
    dimension 'Gender', :foreign_key => 'customer_id' do
      hierarchy :has_all => true, :primary_key => 'customer_id' do
        table 'customer'
        level 'Gender', :column => 'gender', :unique_members => true
      end
    end
    dimension 'Time', :foreign_key => 'time_id' do
      hierarchy :has_all => false, :primary_key => 'time_id' do
        table 'time_by_day'
        level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
        level 'Quarter', :column => 'quarter', :unique_members => false
        level 'Month',:column => 'month_of_year',:type => 'Numeric',:unique_members => false
      end
    end
    measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
    measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
  end
end

Multidimensional Data Analysis with Ruby (sample)

  • 1.
    Multidimensional Data Analysis with Ruby Raimonds Simanovskis
  • 2.
    Abstract We have alot of data in our databases but quite often users don't get the full benefit of these data as they don't have good tools how to analyze these data. SQL language is good for doing ad-hoc queries but it becomes very complicated when you need to make more complex analytical queries to get summary results. And also new NoSQL databases are focusing more on effective processing of detailed records and not on analytical processing. There is a range of OLAP (On-Line Analytical Processing) databases and engines that are focused on making easier multi-dimensional analysis of your data at different summary levels. One of most-popular open-source OLAP engines is Mondrian (mondrian.pentaho.com) which can be put in front of your relational SQL database but it provides MDX multi-dimensional query language which is much more suited for analytical purposes. mondrian-olap gem (soon to be released) integrates Mondian OLAP engine using JRuby Java integration and provides Ruby DSL for creating OLAP schemas on top of relational database schemas and provides MDX query language or ActiveRecord/ Arel-like query language for making analytical queries. It will be presented how to use it for new or existing Ruby on Rails applications and how to do much easier data analysis compared to standard ActiveRecord queries.
  • 3.
  • 4.
    SQL query likethis SELECT SUM(sales.unit_sales) unit_sales_sum, SUM(sales.store_sales) store_sales_sum FROM sales LEFT JOIN product ON sales.product_id = product.product_id LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id LEFT JOIN customer ON sales.customer_id = customer.customer_id WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1' AND customer.country = 'USA' AND customer.state_province = 'CA' GROUP BY product_class.product_family
  • 5.
    Could be writtenin MDX like this SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS, [Product].children ON ROWS FROM [Sales] WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
  • 6.
    Or in Rubylike this olap.from('Sales'). columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]'). rows('[Product].children'). where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]') Get sales amount and number of units of all products sold in California during Q1 of 2011
  • 7.
    More complex queries olap.from('Sales'). with_member('[Measures].[ProfitPct]'). as('Val((Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales])', :format_string => 'Percent'). columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]'). rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]'). top_count(50, '[Measures].[Store Sales]') where('[Time].[2011].[Q1]') Get sales amount and profit % of top 50 products cross-joined with USA and Canada country sales during Q1 of 2011
  • 8.
    OLAP schema Dimensional model: cubes dimensions (hierarchies & levels) measures, calculated measures Mapping Relational model: fact tables, dimension tables joined by foreign keys
  • 9.
    OLAP schema definition schema = Mondrian::OLAP::Schema.new schema.define do cube 'Sales' do table 'sales' dimension 'Gender', :foreign_key => 'customer_id' do hierarchy :has_all => true, :primary_key => 'customer_id' do table 'customer' level 'Gender', :column => 'gender', :unique_members => true end end dimension 'Time', :foreign_key => 'time_id' do hierarchy :has_all => false, :primary_key => 'time_id' do table 'time_by_day' level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true level 'Quarter', :column => 'quarter', :unique_members => false level 'Month',:column => 'month_of_year',:type => 'Numeric',:unique_members => false end end measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum' measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum' end end