RailsWayCon: Multidimensional Data Analysis with JRuby



  1. Raimonds Simanovskis: Multidimensional Data Analysis with JRuby
  2. Raimonds Simanovskis @rsim .com
  3. Relational data model
  4. SQL is good for detailed data queries
     Get all sales transactions in USA, California:
       SELECT customers.fullname, products.product_name,
              sales.sales_date, sales.unit_sales, sales.store_sales
       FROM sales
       LEFT JOIN products ON sales.product_id =
       LEFT JOIN customers ON sales.customer_id =
       WHERE = 'USA' AND customers.state_province = 'CA'
  5. SQL becomes complex for analytical queries
     Get total sales in USA, California in Q1, 2011 by main product groups:
       SELECT product_class.product_family,
              SUM(sales.unit_sales) unit_sales_sum,
              SUM(sales.store_sales) store_sales_sum
       FROM sales
       LEFT JOIN product ON sales.product_id = product.product_id
       LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
       LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
       LEFT JOIN customer ON sales.customer_id = customer.customer_id
       WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
         AND = 'USA' AND customer.state_province = 'CA'
       GROUP BY product_class.product_family
  6. If SQL is not good, then we need NoSQL!
  7. Maybe write a distributed map-reduce function?
  8. Multidimensional Data Model: multidimensional cubes, dimensions, hierarchies and levels, measures
  9. OLAP technologies: On-Line Analytical Processing
  10. Commercial vendors: Oracle Essbase, SAP BusinessObjects, Oracle OLAP, Cognos, Analysis Services
  11. MDX query language
      Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:
        SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
               [Product].children ON ROWS
        FROM [Sales]
        WHERE ( [Time].[2011].[Q1], [Customers].[USA].[CA] )
  12.
  13. (R)OLAP schema
      Dimensional model: cubes, dimensions (hierarchies & levels), measures, calculated measures
      Mapping
      Relational model: fact tables, dimension tables joined by foreign keys
  14. OLAP schema definition
      schema = Mondrian::OLAP::Schema.define do
        cube 'Sales' do
          table 'sales'
          dimension 'Gender', :foreign_key => 'customer_id' do
            hierarchy :has_all => true, :primary_key => 'customer_id' do
              table 'customer'
              level 'Gender', :column => 'gender', :unique_members => true
            end
          end
          dimension 'Time', :foreign_key => 'time_id' do
            hierarchy :has_all => false, :primary_key => 'time_id' do
              table 'time_by_day'
              level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
              level 'Quarter', :column => 'quarter', :unique_members => false
              level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
            end
          end
          measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
          measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
        end
      end
  15. Query Builder in Ruby
      Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:
        olap.from('Sales').
          columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
          rows('[Product].children').
          where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
          execute
  16. Also more complex queries
      Get sales amount and profit % of top 50 products sold in USA and Canada during Q1, 2011:
        olap.from('Sales').
          with_member('[Measures].[ProfitPct]').
            as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
               :format_string => 'Percent').
          columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
          rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
            top_count(50, '[Measures].[Store Sales]').
          where('[Time].[2011].[Q1]').
          execute
  17. Demo
  18. Used in
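
The multidimensional model from slide 8 (measures summed along dimension levels, restricted by a slice) can be illustrated in plain Ruby without an OLAP engine. The sketch below is only an illustration: the `FACTS` rows, their values, and the `aggregate` helper are all hypothetical and not part of the talk or of any library. It answers the same question as the analytical query on slides 5 and 11, total unit sales and store sales in USA, California in Q1, 2011 by product family.

```ruby
# Hypothetical, denormalized fact rows. Column names loosely follow the
# slides; the values are invented for illustration only.
FACTS = [
  { product_family: 'Food',  country: 'USA', state: 'CA', year: 2011, quarter: 'Q1', unit_sales: 10, store_sales: 25.0 },
  { product_family: 'Food',  country: 'USA', state: 'CA', year: 2011, quarter: 'Q1', unit_sales: 5,  store_sales: 15.5 },
  { product_family: 'Drink', country: 'USA', state: 'CA', year: 2011, quarter: 'Q1', unit_sales: 3,  store_sales: 9.0 },
  { product_family: 'Drink', country: 'USA', state: 'WA', year: 2011, quarter: 'Q1', unit_sales: 7,  store_sales: 21.0 },
  { product_family: 'Food',  country: 'USA', state: 'CA', year: 2011, quarter: 'Q2', unit_sales: 4,  store_sales: 12.0 }
].freeze

# Sum the given measures over the facts that match `slice`,
# grouped by a single dimension level (`by`).
def aggregate(facts, by:, measures:, slice: {})
  selected = facts.select { |fact| slice.all? { |level, value| fact[level] == value } }
  selected.group_by { |fact| fact[by] }.transform_values do |rows|
    measures.map { |measure| [measure, rows.sum { |row| row[measure] }] }.to_h
  end
end

result = aggregate(
  FACTS,
  by: :product_family,
  measures: [:unit_sales, :store_sales],
  slice: { country: 'USA', state: 'CA', year: 2011, quarter: 'Q1' }
)

result.each { |family, totals| puts "#{family}: #{totals}" }
```

An OLAP engine does the same kind of slicing and aggregation, but against the relational fact and dimension tables described on slide 13 rather than an in-memory array.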
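
To make the link between the Ruby query builder on slide 15 and the MDX on slide 11 concrete, here is a minimal sketch of how a chained builder could assemble an MDX string. The `MdxSketch` class and its `to_mdx` method are hypothetical names invented for this example; this is not the mondrian-olap implementation, and it covers only `columns`, `rows`, and `where`.

```ruby
# Hypothetical chained builder that renders a small subset of MDX.
# It is an illustration only, not the mondrian-olap API.
class MdxSketch
  def initialize(cube)
    @cube = cube
    @columns = []
    @rows = []
    @where = []
  end

  # Each setter stores its members and returns self so calls can be chained.
  def columns(*members)
    @columns = members
    self
  end

  def rows(*members)
    @rows = members
    self
  end

  def where(*members)
    @where = members
    self
  end

  def to_mdx
    mdx = "SELECT #{axis(@columns)} ON COLUMNS, #{axis(@rows)} ON ROWS FROM [#{@cube}]"
    mdx += " WHERE (#{@where.join(', ')})" unless @where.empty?
    mdx
  end

  private

  # A single set expression is used as-is; several members are wrapped in { }.
  def axis(members)
    members.size == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

mdx = MdxSketch.new('Sales').
  columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
  rows('[Product].children').
  where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
  to_mdx

puts mdx
```

Apart from whitespace, the generated string matches the MDX statement on slide 11. Returning `self` from each setter is what allows the trailing-dot chaining style used on the slides.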