Successfully reported this slideshow.

Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

43

Share

Loading in …3
×
1 of 75
1 of 75

Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

43

Share

Description

Euro python 2012 talk. Video available: http://youtu.be/WSHD029BAls

Transcript

  1. 1. Cubes light-weight OLAP Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
  2. 2. source github.com/Stiivi/cubes documentation packages.python.org/cubes/
  3. 3. Overview ■ purpose ■ analytical modelling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend
  4. 4. analytical data modelling lightweight
  5. 5. http://tendre.sme.sk
  6. 6. aggregation browsing slicing and dicing
  7. 7. modelling reporting aggregation browsing
  8. 8. Architecture
  9. 9. ✂ model browser http backends server
  10. 10. Logical Model multidimensional, analytical
  11. 11. business/analyst’s point of view
  12. 12. transactions analysis OLTP OLAP application (operational) data analytical data
  13. 13. Model { “name” = “My Model” “description” = .... “cubes” = [...] “dimensions” = [...] } cubes dimensions measures levels, attributes, hierarchy
  14. 14. Facts measurable fact fact data cell most detailed information
  15. 15. location type time dimensions
  16. 16. Dimension ■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts
  17. 17. Hierarchy 2010 May 1st levels
  18. 18. Dimension ■ levels and attributes “dimensions” = [ { ■ hierarchy* “name”:”date”, “levels”: ... ■ key attributes }, “hierarchy”: ... ... ■ label attributes ] *partial support for multiple hierarchies
  19. 19. label attribute key attribute for links to slices
  20. 20. Cube “cubes” = [ { “name”:”contracts”, “dimensions”: [ “date”, “category” ] “measures”: [ ■ dimensions { “name”: “amount”, “label”: “Contract Amount”, ■ measures } “aggregations”: [“sum”] ] }, ... ] *partial support for multiple hierarchies
  21. 21. "attributes": [ { "name":"group", "label": "Group code" localizable }, { "name":"group_label", model and attributes "label": "Group", "locales": ["en", "sk"] } ]
  22. 22. Aggregation Browser ∑
  23. 23. ∑ measures
  24. 24. get more details
  25. 25. Aggregation Browser SQL Snowflake SQL Denormalized Some HTTP Data MongoDB Browser Browser Browser Service Browser ? “batteries” that are included
  26. 26. Browser Workspace logical model + data
  27. 27. Cell
  28. 28. context of interest cell
  29. 29. cell
  30. 30. Path [45,2] [2012, 6] list of level keys
  31. 31. 1 load_model("model.json") Application ∑ 3 model.cube("sales") 4 workspace.browser(cube) cubes Aggregation Browser backend 2 create_workspace("sql", model, url="sqlite:///data.sqlite")
  32. 32. summary drill-down
  33. 33. browser.aggregate(o cell) summary
  34. 34. browser.aggregate(o cell, . drilldown=[9 "sector"]) drill-down
  35. 35. for row in result.drilldown: row["amount_sum"] row[q label_attribute] row[k key]
  36. 36. received_amount_sum measure aggregation record_count
  37. 37. browser.facts(o cell) browser.values(o cell, 9 dimension) browser.cell_details(o cell)
  38. 38. ✂ Slicing and Dicing ✂
  39. 39. ✂ ✂ April 2012 constructi on work construction work in april 2012 type supplier date
  40. 40. cut types ✂ point set range [[2010,10], from=[2010,10] [2010] [2010,12]] to=[2010,12]
  41. 41. Implicit Hierarchy drilldown
  42. 42. whole cube o cell = Cell(cube) browser.aggregate(o cell) Total browser.aggregate(o cell, drilldown=[9 “date”]) 2006 2007 2008 2009 2010 ✂ cut = PointCut(9 “date”, [2010]) o cell = o cell.slice(✂ cut) browser.aggregate(o cell, drilldown=[9 “date”]) Jan Feb Mar Apr March April May ...
  43. 43. Drill-down Level . drilldown = [9 "date"] implicit: next from o cell . drilldown = {9 "date": "month"} explicit
  44. 44. Cross Table experimental interface
  45. 45. 2009 2010 Assets Due from Banks 3044 1803 Assets Investments 41012 36012 Assets Loans Outstanding 103657 118104 Assets Nonnegotiable 1202 1123 Assets Other Assets 2247 3071 Assets Other Receivables 984 811 Assets Receivables 176 171 Assets Securities 33 289 Equity Capital Stock 11491 11492 Equity Deferred Amounts 359 313 Equity Other -1683 -3043 Equity Retained Earnings 29870 28793 Liabilities Borrowings 110040 128577 Liabilities Derivative Liabilities 115642 110418 Liabilities Other 57 8 Liabilities Other Liabilities 7321 5454 Liabilities Sold or Lent 2323 998
  46. 46. rows = ["item.category", "item.subcategory"] columns = ["year"] measures = ["amount_sum"] table = result.cross_table( rows, columns, measures )
  47. 47. Slicer The HTTP OLAP Server ✂
  48. 48. Application HTTP JSON Slicer ∑ Aggregation Browser
  49. 49. GET /model GET /aggregate GET /values GET /report
  50. 50. w logical model configuration data $ slicer serve slicer.ini
  51. 51. [server] backend: sql log_level: info [model] path: model.json locales: en,sk [workspace] url: postgres://localhost/database schema: datamart fact_prefix: ft_ dimension_prefix: dm_ w
  52. 52. ∑ amount GET /aggregate
  53. 53. GET aggregate { "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 } }
  54. 54. ∑ amount ✂ GET /aggregate?cut=date:2010
  55. 55. GET aggregate?cut=year:2010 { "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 } }
  56. 56. GET aggregate?drilldown=year { "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 } }
  57. 57. GET report Content-Type: application/json list of cuts { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { list of "by_segment": { named queries "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date":"year"} } } }
  58. 58. SQL Backend What data it works with?
  59. 59. ★ or ❄
  60. 60. ★ dimensions fact table
  61. 61. ❄ fact table dimensions
  62. 62. Aggregation Browser Browsing Context Snowflake Denormalized or Mapper Mapper denormalized view snowflake ❄
  63. 63. logical physical ❄
  64. 64. SQL Features ■ does not require DB write access ■ denormalisation ■ denormalised browsing, indexing ■ simple date datatype dimension ■ extraction of date parts during mapping ■ multiple schema support
  65. 65. Slicer command-line tool
  66. 66. ■ model validation slicer model validate model.json ■ model translation slicer model translate model.json translation.json ■ workspace testing slicer test config.ini ■ denormalization slicer denormalize --materialize --index config.ini
  67. 67. Future
  68. 68. ■ formatters for visualisation libraries ■ JavaScript library* help needed ■ backends ■ derived measures *http://github.com/Stiivi/cubes-js
  69. 69. Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes open Slicer HTTP APIs http://github.com/Stiivi/cubes/wiki
  70. 70. stay light Nutrition Facts Serving Size 1 cube Amount Per Serving % Daily Value Total Fat 0g 0% Saturated Fat 0g Trans Fat 0g
  71. 71. Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples: github.com/Stiivi/cubes-examples
  72. 72. Backup
  73. 73. Transactions Reporting multidimensional object–relational modelling modelling ORM mapping logical model (and mapping) database connection browser database engine workspace
  74. 74. Limitations ■ one cut per dimension in a cell ■ logical conjunction of cuts (cut1 AND cut2 AND cut3 ...) ■ dimension-only selection ■ one - default hierarchy ■ some internals are ready for multiple

Editor's Notes

  • OLAP and Logical Model, Architecture, Slicing and Dicing, HTTP Server, SQL Backend\n\n
  • \n
  • \n
  • Q: Who is familiar with OLAP?\n
  • quick setup and reporting\ndoes not cover everything (intentionally)\n
  • example application - public procurements of slovakia\n
  • quick setup and reporting\ndoes not cover everything (intentionally)\n
  • will talk about modelling first, then reporting, then going to mix\n
  • how it looks like and what it does?\n
  • FIXME: add slicer tool here\n
  • not going into details, but just to align terminology and define context\n
  • not so rare we see creating reports directly from what is available, instead of starting with business needs and tryig to find a way how to derive it from what is available\n
  • different approach to data use, different needs\nwhile in apps you are focusing on transactions - trans data/oltp, in reporting you are focusing on analysis -> analytical data\nlogically separate (does not have to be physically separate)\n
  • \n
  • \n
  • \n
  • CONTEXT: where did the sale happened? who signed the contract?\nFILTER: how much was spent for construction work?\nAGGREGATION SCOPE: what was the revenue by country?\n\nused for ordering or sorting\ndefine master-detail relationships\n
  • \n
  • \n
  • provides metadata to easily create apps\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • what the browser does?\n
  • aggregating measures\n
  • \n
  • aggregation browser has to have concrete backend implementation\n
  • + bunch of other stuff\n
  • context\n
  • before I will talk about aggregation browser, I have to introduce a cell\n
  • \n
  • \n
  • our filter/selection defines the cell\nthis is kind of multidimensional “breadcrumbs”\n
  • path - taken from file system terminology for easier understanding\nthose are keys\nnote that displayed is level label, not a key\n
  • ... let’s put it into a picture\n
  • \n
  • “aggregation result” was created according to usual report look\n
  • FIXME: add picture\n
  • you can specify multiple dimensions and explicit level to be drilled down (for example “month” level of a date dimension)\n
  • it provides list of records, which are represented as dictionaries \nyou have to find out which one is level attribute or the key\n\n
  • no need to find the context of dimension of interest\nif not sufficient, one can still fall-back to the manual method\n
  • \n
  • facts – get details\nvalues - can be used to create selection boxes, also level can be specified\ncell_details is used for creating the multidimensional breadcrumbs mentioned before - it contains data to humanly describe current context of interest\nordering and pagination is supported\n
  • what was that “cell” thing?\n
  • \n
  • also show hierarchy\n
  • \n
  • \n
  • same drilldown, different cell\n
  • implicit: raises error if current level is the last one\nexample: you are exploring year 2010 (cell) and would like to see split by year (higher level)\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • just to name a few...\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • powered by sqlalchemy\n
  • powered by great abstraction framework\nconstruction of SQL statements\n
  • \n
  • \n
  • \n
  • denormalized\n
  • thanks to new browser and browsing context it is possible to transparently switch between original snowflake and generated denormalized view (which can be materialized and indexed based on dimension level keys)\n
  • in which table and which column is the attribute?\n
  • \n
  • \n
  • \n
  • \n
  • if someone would like to contribute with his skills, he is more than welcome and I will help\n
  • so if you have OS app, like Django that more users use, you can publish reporting model for others.\nput your cube in the Wiki\n
  • \n
  • MIT license\n
  • \n
  • \n
  • \n
  • Description

    Euro python 2012 talk. Video available: http://youtu.be/WSHD029BAls

    Transcript

    1. 1. Cubes light-weight OLAP Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
    2. 2. source github.com/Stiivi/cubes documentation packages.python.org/cubes/
    3. 3. Overview ■ purpose ■ analytical modelling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend
    4. 4. analytical data modelling lightweight
    5. 5. http://tendre.sme.sk
    6. 6. aggregation browsing slicing and dicing
    7. 7. modelling reporting aggregation browsing
    8. 8. Architecture
    9. 9. ✂ model browser http backends server
    10. 10. Logical Model multidimensional, analytical
    11. 11. business/analyst’s point of view
    12. 12. transactions analysis OLTP OLAP application (operational) data analytical data
    13. 13. Model { “name” = “My Model” “description” = .... “cubes” = [...] “dimensions” = [...] } cubes dimensions measures levels, attributes, hierarchy
    14. 14. Facts measurable fact fact data cell most detailed information
    15. 15. location type time dimensions
    16. 16. Dimension ■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts
    17. 17. Hierarchy 2010 May 1st levels
    18. 18. Dimension ■ levels and attributes “dimensions” = [ { ■ hierarchy* “name”:”date”, “levels”: ... ■ key attributes }, “hierarchy”: ... ... ■ label attributes ] *partial support for multiple hierarchies
    19. 19. label attribute key attribute for links to slices
    20. 20. Cube “cubes” = [ { “name”:”contracts”, “dimensions”: [ “date”, “category” ] “measures”: [ ■ dimensions { “name”: “amount”, “label”: “Contract Amount”, ■ measures } “aggregations”: [“sum”] ] }, ... ] *partial support for multiple hierarchies
    21. 21. "attributes": [ { "name":"group", "label": "Group code" localizable }, { "name":"group_label", model and attributes "label": "Group", "locales": ["en", "sk"] } ]
    22. 22. Aggregation Browser ∑
    23. 23. ∑ measures
    24. 24. get more details
    25. 25. Aggregation Browser SQL Snowflake SQL Denormalized Some HTTP Data MongoDB Browser Browser Browser Service Browser ? “batteries” that are included
    26. 26. Browser Workspace logical model + data
    27. 27. Cell
    28. 28. context of interest cell
    29. 29. cell
    30. 30. Path [45,2] [2012, 6] list of level keys
    31. 31. 1 load_model("model.json") Application ∑ 3 model.cube("sales") 4 workspace.browser(cube) cubes Aggregation Browser backend 2 create_workspace("sql", model, url="sqlite:///data.sqlite")
    32. 32. summary drill-down
    33. 33. browser.aggregate(o cell) summary
    34. 34. browser.aggregate(o cell, . drilldown=[9 "sector"]) drill-down
    35. 35. for row in result.drilldown: row["amount_sum"] row[q label_attribute] row[k key]
    36. 36. received_amount_sum measure aggregation record_count
    37. 37. browser.facts(o cell) browser.values(o cell, 9 dimension) browser.cell_details(o cell)
    38. 38. ✂ Slicing and Dicing ✂
    39. 39. ✂ ✂ April 2012 constructi on work construction work in april 2012 type supplier date
    40. 40. cut types ✂ point set range [[2010,10], from=[2010,10] [2010] [2010,12]] to=[2010,12]
    41. 41. Implicit Hierarchy drilldown
    42. 42. whole cube o cell = Cell(cube) browser.aggregate(o cell) Total browser.aggregate(o cell, drilldown=[9 “date”]) 2006 2007 2008 2009 2010 ✂ cut = PointCut(9 “date”, [2010]) o cell = o cell.slice(✂ cut) browser.aggregate(o cell, drilldown=[9 “date”]) Jan Feb Mar Apr March April May ...
    43. 43. Drill-down Level . drilldown = [9 "date"] implicit: next from o cell . drilldown = {9 "date": "month"} explicit
    44. 44. Cross Table experimental interface
    45. 45. 2009 2010 Assets Due from Banks 3044 1803 Assets Investments 41012 36012 Assets Loans Outstanding 103657 118104 Assets Nonnegotiable 1202 1123 Assets Other Assets 2247 3071 Assets Other Receivables 984 811 Assets Receivables 176 171 Assets Securities 33 289 Equity Capital Stock 11491 11492 Equity Deferred Amounts 359 313 Equity Other -1683 -3043 Equity Retained Earnings 29870 28793 Liabilities Borrowings 110040 128577 Liabilities Derivative Liabilities 115642 110418 Liabilities Other 57 8 Liabilities Other Liabilities 7321 5454 Liabilities Sold or Lent 2323 998
    46. 46. rows = ["item.category", "item.subcategory"] columns = ["year"] measures = ["amount_sum"] table = result.cross_table( rows, columns, measures )
    47. 47. Slicer The HTTP OLAP Server ✂
    48. 48. Application HTTP JSON Slicer ∑ Aggregation Browser
    49. 49. GET /model GET /aggregate GET /values GET /report
    50. 50. w logical model configuration data $ slicer serve slicer.ini
    51. 51. [server] backend: sql log_level: info [model] path: model.json locales: en,sk [workspace] url: postgres://localhost/database schema: datamart fact_prefix: ft_ dimension_prefix: dm_ w
    52. 52. ∑ amount GET /aggregate
    53. 53. GET aggregate { "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 } }
    54. 54. ∑ amount ✂ GET /aggregate?cut=date:2010
    55. 55. GET aggregate?cut=year:2010 { "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 } }
    56. 56. GET aggregate?drilldown=year { "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 } }
    57. 57. GET report Content-Type: application/json list of cuts { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { list of "by_segment": { named queries "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date":"year"} } } }
    58. 58. SQL Backend What data it works with?
    59. 59. ★ or ❄
    60. 60. ★ dimensions fact table
    61. 61. ❄ fact table dimensions
    62. 62. Aggregation Browser Browsing Context Snowflake Denormalized or Mapper Mapper denormalized view snowflake ❄
    63. 63. logical physical ❄
    64. 64. SQL Features ■ does not require DB write access ■ denormalisation ■ denormalised browsing, indexing ■ simple date datatype dimension ■ extraction of date parts during mapping ■ multiple schema support
    65. 65. Slicer command-line tool
    66. 66. ■ model validation slicer model validate model.json ■ model translation slicer model translate model.json translation.json ■ workspace testing slicer test config.ini ■ denormalization slicer denormalize --materialize --index config.ini
    67. 67. Future
    68. 68. ■ formatters for visualisation libraries ■ JavaScript library* help needed ■ backends ■ derived measures *http://github.com/Stiivi/cubes-js
    69. 69. Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes open Slicer HTTP APIs http://github.com/Stiivi/cubes/wiki
    70. 70. stay light Nutrition Facts Serving Size 1 cube Amount Per Serving % Daily Value Total Fat 0g 0% Saturated Fat 0g Trans Fat 0g
    71. 71. Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples: github.com/Stiivi/cubes-examples
    72. 72. Backup
    73. 73. Transactions Reporting multidimensional object–relational modelling modelling ORM mapping logical model (and mapping) database connection browser database engine workspace
    74. 74. Limitations ■ one cut per dimension in a cell ■ logical conjunction of cuts (cut1 AND cut2 AND cut3 ...) ■ dimension-only selection ■ one - default hierarchy ■ some internals are ready for multiple

    Editor's Notes

  • OLAP and Logical Model, Architecture, Slicing and Dicing, HTTP Server, SQL Backend\n\n
  • \n
  • \n
  • Q: Who is familiar with OLAP?\n
  • quick setup and reporting\ndoes not cover everything (intentionally)\n
  • example application - public procurements of slovakia\n
  • quick setup and reporting\ndoes not cover everything (intentionally)\n
  • will talk about modelling first, then reporting, then going to mix\n
  • how it looks like and what it does?\n
  • FIXME: add slicer tool here\n
  • not going into details, but just to align terminology and define context\n
  • not so rare we see creating reports directly from what is available, instead of starting with business needs and tryig to find a way how to derive it from what is available\n
  • different approach to data use, different needs\nwhile in apps you are focusing on transactions - trans data/oltp, in reporting you are focusing on analysis -> analytical data\nlogically separate (does not have to be physically separate)\n
  • \n
  • \n
  • \n
  • CONTEXT: where did the sale happened? who signed the contract?\nFILTER: how much was spent for construction work?\nAGGREGATION SCOPE: what was the revenue by country?\n\nused for ordering or sorting\ndefine master-detail relationships\n
  • \n
  • \n
  • provides metadata to easily create apps\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • what the browser does?\n
  • aggregating measures\n
  • \n
  • aggregation browser has to have concrete backend implementation\n
  • + bunch of other stuff\n
  • context\n
  • before I will talk about aggregation browser, I have to introduce a cell\n
  • \n
  • \n
  • our filter/selection defines the cell\nthis is kind of multidimensional “breadcrumbs”\n
  • path - taken from file system terminology for easier understanding\nthose are keys\nnote that displayed is level label, not a key\n
  • ... let’s put it into a picture\n
  • \n
  • “aggregation result” was created according to usual report look\n
  • FIXME: add picture\n
  • you can specify multiple dimensions and explicit level to be drilled down (for example “month” level of a date dimension)\n
  • it provides list of records, which are represented as dictionaries \nyou have to find out which one is level attribute or the key\n\n
  • no need to find the context of dimension of interest\nif not sufficient, one can still fall-back to the manual method\n
  • \n
  • facts – get details\nvalues - can be used to create selection boxes, also level can be specified\ncell_details is used for creating the multidimensional breadcrumbs mentioned before - it contains data to humanly describe current context of interest\nordering and pagination is supported\n
  • what was that “cell” thing?\n
  • \n
  • also show hierarchy\n
  • \n
  • \n
  • same drilldown, different cell\n
  • implicit: raises error if current level is the last one\nexample: you are exploring year 2010 (cell) and would like to see split by year (higher level)\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • just to name a few...\n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • powered by sqlalchemy\n
  • powered by great abstraction framework\nconstruction of SQL statements\n
  • \n
  • \n
  • \n
  • denormalized\n
  • thanks to new browser and browsing context it is possible to transparently switch between original snowflake and generated denormalized view (which can be materialized and indexed based on dimension level keys)\n
  • in which table and which column is the attribute?\n
  • \n
  • \n
  • \n
  • \n
  • if someone would like to contribute with his skills, he is more than welcome and I will help\n
  • so if you have OS app, like Django that more users use, you can publish reporting model for others.\nput your cube in the Wiki\n
  • \n
  • MIT license\n
  • \n
  • \n
  • \n
  • More Related Content

    Related Books

    Free with a 30 day trial from Scribd

    See all

    ×