Multidimensional Data Analysis with Ruby
Raimonds Simanovskis
Multidimensional Data Analysis with Ruby (sample)

Published in: Technology

Transcript of "Multidimensional Data Analysis with Ruby (sample)"

  1. Multidimensional Data Analysis with Ruby
     Raimonds Simanovskis
  2. Abstract

     We have a lot of data in our databases, but quite often users don't get the full benefit of this data because they don't have good tools for analyzing it. SQL is good for ad-hoc queries, but it becomes very complicated when you need to write more complex analytical queries to get summary results. New NoSQL databases also focus more on efficient processing of detailed records than on analytical processing.

     There is a range of OLAP (On-Line Analytical Processing) databases and engines that focus on making multidimensional analysis of your data at different summary levels easier. One of the most popular open-source OLAP engines is Mondrian (mondrian.pentaho.com), which can be put in front of your relational SQL database while providing the MDX multidimensional query language, which is much better suited to analytical purposes.

     I have developed the mondrian-olap gem (soon to be released), which integrates the Mondrian OLAP engine using JRuby's Java integration. It provides a Ruby DSL for creating OLAP schemas on top of relational database schemas, and either the MDX query language or an ActiveRecord/Arel-like query language for making analytical queries. The talk will show how to use it in new or existing Ruby on Rails applications and how it makes data analysis much easier than standard ActiveRecord queries.
  3. Example slides
  4. SQL query like this

       SELECT SUM(sales.unit_sales) unit_sales_sum,
              SUM(sales.store_sales) store_sales_sum
         FROM sales
         LEFT JOIN product ON sales.product_id = product.product_id
         LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
         LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
         LEFT JOIN customer ON sales.customer_id = customer.customer_id
        WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
          AND customer.country = 'USA' AND customer.state_province = 'CA'
        GROUP BY product_class.product_family
  5. Could be written in MDX like this

       SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
              [Product].children ON ROWS
         FROM [Sales]
        WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
  6. Or in Ruby like this

       olap.from('Sales').
         columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
         rows('[Product].children').
         where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')

     Get sales amount and number of units of all products sold in California during Q1 of 2011
  7. More complex queries

       olap.from('Sales').
         with_member('[Measures].[ProfitPct]').
           as('Val((Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales])',
              :format_string => 'Percent').
         columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
         rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
           top_count(50, '[Measures].[Store Sales]').
         where('[Time].[2011].[Q1]')

     Get sales amount and profit % of top 50 products cross-joined with USA and Canada country sales during Q1 of 2011
  8. OLAP schema definition

       schema = Mondrian::OLAP::Schema.new
       schema.define do
         cube 'Sales' do
           table 'sales'
           dimension 'Gender', :foreign_key => 'customer_id' do
             hierarchy :has_all => true, :primary_key => 'customer_id' do
               table 'customer'
               level 'Gender', :column => 'gender', :unique_members => true
             end
           end
           dimension 'Time', :foreign_key => 'time_id' do
             hierarchy :has_all => false, :primary_key => 'time_id' do
               table 'time_by_day'
               level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
               level 'Quarter', :column => 'quarter', :unique_members => false
               level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
             end
           end
           measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
           measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
         end
       end
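
Slides 5 and 6 show the same query written once as MDX and once as chained Ruby calls. The sketch below illustrates, in plain Ruby, how a chained builder of that shape could be turned into an MDX string. It is not the mondrian-olap implementation: the class name ToyMdxQuery and the method to_mdx are hypothetical names invented for this illustration, and the sketch only assembles text without connecting to Mondrian or running a query.

```ruby
# Toy illustration only: ToyMdxQuery and to_mdx are hypothetical names,
# not part of the mondrian-olap gem. The class just assembles an MDX string.
class ToyMdxQuery
  def initialize(cube)
    @cube = cube
    @columns = []
    @rows = []
    @where = []
  end

  # Each axis method stores its member expressions and returns self,
  # which is what makes the calls chainable.
  def columns(*members)
    @columns = members
    self
  end

  def rows(*members)
    @rows = members
    self
  end

  def where(*members)
    @where = members
    self
  end

  # Build the MDX text; the WHERE tuple is omitted when no slicer was given.
  def to_mdx
    mdx = "SELECT #{axis(@columns)} ON COLUMNS, #{axis(@rows)} ON ROWS FROM [#{@cube}]"
    mdx += " WHERE (#{@where.join(', ')})" unless @where.empty?
    mdx
  end

  private

  # A single expression is emitted as-is; several members are wrapped in an MDX set.
  def axis(members)
    members.size == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

query = ToyMdxQuery.new('Sales')
  .columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]')
  .rows('[Product].children')
  .where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')

puts query.to_mdx
```

With the arguments from slide 6, the printed string has the same content as the MDX on slide 5, written on a single line. Returning self from each method is the design choice that allows the ActiveRecord/Arel-like chaining style mentioned in the abstract.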
