RailsWayCon: Multidimensional Data Analysis with JRuby

Presentation at RailsWayCon 2011 conference

Transcript of "RailsWayCon: Multidimensional Data Analysis with JRuby"

  1. Raimonds Simanovskis: Multidimensional Data Analysis with JRuby
  2. Raimonds Simanovskis github.com/rsim @rsim .com
  3. Relational data model
  4. SQL is good for detailed data queries
     Get all sales transactions in USA, California:

         SELECT customers.fullname, products.product_name,
                sales.sales_date, sales.unit_sales, sales.store_sales
         FROM sales
           LEFT JOIN products ON sales.product_id = products.id
           LEFT JOIN customers ON sales.customer_id = customers.id
         WHERE customers.country = 'USA'
           AND customers.state_province = 'CA'
  5. SQL becomes complex for analytical queries
     Get total sales in USA, California in Q1, 2011 by main product groups:

         SELECT product_class.product_family,
                SUM(sales.unit_sales) unit_sales_sum,
                SUM(sales.store_sales) store_sales_sum
         FROM sales
           LEFT JOIN product ON sales.product_id = product.product_id
           LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
           LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
           LEFT JOIN customer ON sales.customer_id = customer.customer_id
         WHERE time_by_day.the_year = 2011 AND time_by_day.quarter = 'Q1'
           AND customer.country = 'USA' AND customer.state_province = 'CA'
         GROUP BY product_class.product_family
  6. If SQL is not good then we need NoSQL!
  7. Maybe write a distributed map reduce function? http://browsertoolkit.com/fault-tolerance.png
  8. Multidimensional Data Model: multidimensional cubes, dimensions, hierarchies and levels, measures
  9. OLAP technologies: On-Line Analytical Processing
  10. Commercial vendors: Oracle Essbase, SAP BusinessObjects, Oracle OLAP, Cognos, Analysis Services
  11. MDX query language
      Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:

          SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
                 [Product].children ON ROWS
          FROM [Sales]
          WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
  12. http://github.com/rsim/mondrian-olap
  13. (R)OLAP schema
      Dimensional model: cubes, dimensions (hierarchies & levels), measures, calculated measures
      Mapping
      Relational model: fact tables, dimension tables joined by foreign keys
  14. OLAP schema definition

          schema = Mondrian::OLAP::Schema.define do
            cube 'Sales' do
              table 'sales'
              dimension 'Gender', :foreign_key => 'customer_id' do
                hierarchy :has_all => true, :primary_key => 'customer_id' do
                  table 'customer'
                  level 'Gender', :column => 'gender', :unique_members => true
                end
              end
              dimension 'Time', :foreign_key => 'time_id' do
                hierarchy :has_all => false, :primary_key => 'time_id' do
                  table 'time_by_day'
                  level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
                  level 'Quarter', :column => 'quarter', :unique_members => false
                  level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
                end
              end
              measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
              measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
            end
          end
  15. Query Builder in Ruby
      Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups:

          olap.from('Sales').
            columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
            rows('[Product].children').
            where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
            execute
  16. Also more complex queries
      Get sales amount and profit % of top 50 products sold in USA and Canada during Q1, 2011:

          olap.from('Sales').
            with_member('[Measures].[ProfitPct]').
              as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
                 :format_string => 'Percent').
            columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
            rows('[Product].children').crossjoin('[Customers].[Canada]', '[Customers].[USA]').
              top_count(50, '[Measures].[Store Sales]').
            where('[Time].[2011].[Q1]').
            execute
  17. Demo
  18. Used in eazybi.com
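
The Ruby query builder on slide 15 and the MDX on slide 11 describe the same query. As a rough illustration of how chained Ruby calls can be collected and rendered as MDX text, here is a minimal sketch. The class `SketchMdxQuery` and its methods are hypothetical names made up for this note. This is not how mondrian-olap is implemented, and unlike the real library it does not connect to Mondrian or execute anything; it only builds a string.

```ruby
# Hypothetical illustration only: NOT the mondrian-olap implementation.
# It shows how chained builder calls can be stored and rendered as MDX text.
class SketchMdxQuery
  def initialize(cube)
    @cube = cube
    @columns = []
    @rows = []
    @where = []
  end

  # Each builder method stores its arguments and returns self,
  # which is what makes the calls chainable.
  def columns(*members)
    @columns = members
    self
  end

  def rows(*members)
    @rows = members
    self
  end

  def where(*members)
    @where = members
    self
  end

  # Render the collected parts as one MDX statement.
  # The WHERE (slicer) clause is emitted only when members were given.
  def to_mdx
    mdx = "SELECT #{set(@columns)} ON COLUMNS, #{set(@rows)} ON ROWS FROM [#{@cube}]"
    mdx += " WHERE (#{@where.join(', ')})" unless @where.empty?
    mdx
  end

  private

  # Wrap the expressions for one axis in braces to form an MDX set.
  def set(members)
    "{#{members.join(', ')}}"
  end
end

query = SketchMdxQuery.new('Sales')
  .columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]')
  .rows('[Product].children')
  .where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')

puts query.to_mdx
```

Always wrapping each axis in braces keeps the generated text a valid MDX set whether one or several expressions are passed, at the cost of an extra pair of braces around `[Product].children` compared with slide 11.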