
RailsWayCon: Multidimensional Data Analysis with JRuby



Presentation at RailsWayCon 2011 conference



  1. Raimonds Simanovskis: Multidimensional Data Analysis with JRuby
  2. Raimonds Simanovskis @rsim .com
  3. Relational data model
  4. SQL is good for detailed data queries. Get all sales transactions in USA, California:

          SELECT customers.fullname, products.product_name,
                 sales.sales_date, sales.unit_sales, sales.store_sales
          FROM sales
            LEFT JOIN products ON sales.product_id =
            LEFT JOIN customers ON sales.customer_id = customers.id
          WHERE = 'USA' AND customers.state_province = 'CA'
  5. SQL becomes complex for analytical queries. Get total sales in USA, California in Q1, 2011 by main product groups:

          SELECT product_class.product_family,
                 SUM(sales.unit_sales) unit_sales_sum,
                 SUM(sales.store_sales) store_sales_sum
          FROM sales
            LEFT JOIN product ON sales.product_id = product.product_id
            LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
            LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
            LEFT JOIN customer ON sales.customer_id = customer.customer_id
          WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
            AND = 'USA' AND customer.state_province = 'CA'
          GROUP BY product_class.product_family
  6. If SQL is not good then we need NoSQL!
  7. Maybe write a distributed map reduce function?
  8. Multidimensional data model: multidimensional cubes, dimensions, hierarchies and levels, measures
  9. OLAP technologies: On-Line Analytical Processing
  10. Commercial vendors: Oracle Essbase, SAP BusinessObjects, Oracle OLAP, Cognos, Analysis Services
  11. MDX query language. Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:

          SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
                 [Product].children ON ROWS
          FROM [Sales]
          WHERE ( [Time].[2011].[Q1], [Customers].[USA].[CA] )
  12. (slide without text)
  13. (R)OLAP schema: a mapping between the dimensional model (cubes; dimensions with hierarchies and levels; measures and calculated measures) and the relational model (fact tables and dimension tables joined by foreign keys)
  14. OLAP schema definition:

          schema = Mondrian::OLAP::Schema.define do
            cube 'Sales' do
              table 'sales'
              dimension 'Gender', :foreign_key => 'customer_id' do
                hierarchy :has_all => true, :primary_key => 'customer_id' do
                  table 'customer'
                  level 'Gender', :column => 'gender', :unique_members => true
                end
              end
              dimension 'Time', :foreign_key => 'time_id' do
                hierarchy :has_all => false, :primary_key => 'time_id' do
                  table 'time_by_day'
                  level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
                  level 'Quarter', :column => 'quarter', :unique_members => false
                  level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
                end
              end
              measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
              measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
            end
          end
  15. Query builder in Ruby. Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:

          olap.from('Sales').
            columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
            rows('[Product].children').
            where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
            execute
  16. Also more complex queries. Get sales amount and profit % of top 50 products sold in USA and Canada during Q1, 2011:

          olap.from('Sales').
            with_member('[Measures].[ProfitPct]').
              as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
                 :format_string => 'Percent').
            columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
            rows('[Product].children').
              crossjoin('[Customers].[Canada]', '[Customers].[USA]').
              top_count(50, '[Measures].[Store Sales]').
            where('[Time].[2011].[Q1]').
            execute
  17. Demo
  18. Used in
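
The multidimensional terms used on the slides (dimensions, levels, measures, slicing with WHERE, and a calculated measure such as ProfitPct) can be illustrated with a small in-memory sketch in plain Ruby. This is not the mondrian-olap API and not how Mondrian evaluates MDX. The `Fact` struct, the sample rows, and the `slice` and `roll_up` helpers are all hypothetical names made up for illustration, and the numbers are invented.

```ruby
# Hypothetical in-memory illustration of cube concepts.
# NOT the mondrian-olap API; all names and data below are made up.

# One row of a fact table, already joined to its dimension attributes.
Fact = Struct.new(:year, :quarter, :country, :state, :product_family,
                  :unit_sales, :store_sales, :store_cost)

FACTS = [
  Fact.new(2011, "Q1", "USA",    "CA", "Food",  10, 25.0, 10.0),
  Fact.new(2011, "Q1", "USA",    "CA", "Drink",  4, 12.0,  6.0),
  Fact.new(2011, "Q1", "USA",    "CA", "Food",   6, 15.0,  6.0),
  Fact.new(2011, "Q1", "USA",    "WA", "Food",   3,  9.0,  4.0),
  Fact.new(2011, "Q2", "USA",    "CA", "Drink",  5, 14.0,  7.0),
  Fact.new(2011, "Q1", "Canada", "BC", "Food",   2,  5.0,  2.0)
]

# Slice: keep only facts whose dimension members match every condition
# (the role played by the WHERE clause in the MDX example).
def slice(facts, conditions)
  facts.select { |f| conditions.all? { |dim, member| f[dim] == member } }
end

# Roll up: group facts by one level and sum each measure within the group
# (the role played by the 'sum' aggregator in the schema definition).
def roll_up(facts, level, measures)
  facts.group_by { |f| f[level] }.transform_values do |group|
    measures.to_h { |m| [m, group.sum { |f| f[m] }] }
  end
end

# Q1 2011, USA, California, broken down by product family.
cells = roll_up(
  slice(FACTS, year: 2011, quarter: "Q1", country: "USA", state: "CA"),
  :product_family,
  [:unit_sales, :store_sales, :store_cost]
)

# Calculated measure, derived from aggregated measures rather than stored.
cells.each_value do |c|
  c[:profit_pct] = (c[:store_sales] - c[:store_cost]) / c[:store_sales]
end

# Rough analogue of top_count: the single best family by store sales.
top = cells.max_by { |_family, c| c[:store_sales] }

cells.each { |family, c| puts "#{family}: #{c.inspect}" }
puts "Top family: #{top.first}"
```

The sketch computes the calculated measure after aggregation, which mirrors how ProfitPct is defined over `[Store Sales]` and `[Store Cost]` totals instead of per-row values. A real OLAP engine would also navigate hierarchies (for example Year, Quarter, Month) and push the aggregation down to SQL, which this example does not attempt.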