Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Cubes                   light-weight OLAPStefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
source  github.com/Stiivi/cubes         documentationpackages.python.org/cubes/
Overview■   purpose■   analytical modelling and OLAP■   slicing and dicing■   OLAP server■   SQL backend
analytical data modelling        lightweight
http://tendre.sme.sk
aggregation browsing     slicing and dicing
modelling   reporting            aggregation browsing
Architecture
✂ model     browser             httpbackends   server
Logical Model multidimensional, analytical
business/analyst’s  point of view
transactions                 analysis         OLTP                        OLAPapplication (operational) data   analytical ...
Model           {               “name” = “My Model”               “description” = ....               “cubes” = [...]      ...
Facts                  measurable      fact                    fact data cellmost detailed information
locationtype              time           dimensions
Dimension■ provide context for facts■ used to filter queries or reports■ control scope of aggregation of  facts
Hierarchy     2010 May 1st        levels
Dimension■   levels and attributes          “dimensions” = [                                     {■   hierarchy*          ...
label attribute   key attribute                  for links to slices
Cube               “cubes” = [                 {                    “name”:”contracts”,                    “dimensions”: [...
"attributes": [                           {                             "name":"group",                             "label...
Aggregation  Browser    ∑
∑ measures
get more details
Aggregation                               BrowserSQL Snowflake   SQL Denormalized                               Some HTTP D...
Browser Workspacelogical model                +   data
Cell
context of interestcell
cell
Path              [45,2][2012, 6]                       list of level keys
1   load_model("model.json")           Application                  ∑                                 3   model.cube("sale...
summarydrill-down
browser.aggregate(o cell)                            summary
browser.aggregate(o cell,                  . drilldown=[9 "sector"])                         drill-down
for row in result.drilldown:              row["amount_sum"]row[q label_attribute]            row[k key]
received_amount_summeasure      aggregation           record_count
browser.facts(o cell)browser.values(o cell, 9 dimension)browser.cell_details(o cell)
✂    Slicing and Dicing✂
✂✂               April 2012constructi on work                       construction work in                                  ...
cut types✂point         set           range           [[2010,10],   from=[2010,10][2010]            [2010,12]]   to=[2010,...
Implicit Hierarchy       drilldown
whole cube                                          o cell = Cell(cube)                                          browser.a...
Drill-down Level. drilldown = [9 "date"]                implicit: next from o cell. drilldown = {9 "date": "month"}       ...
Cross Table experimental interface
2009     2010     Assets           Due from Banks     3044     1803     Assets              Investments    41012    36012 ...
rows = ["item.category",        "item.subcategory"]columns = ["year"]measures = ["amount_sum"]table = result.cross_table( ...
SlicerThe HTTP OLAP Server      ✂
ApplicationHTTP                         JSON             Slicer                   ∑       Aggregation Browser
GET /modelGET /aggregateGET /valuesGET /report
w logical model       configuration   data$ slicer serve slicer.ini
[server]backend: sqllog_level: info[model]path: model.jsonlocales: en,sk[workspace]url: postgres://localhost/databaseschem...
∑      amountGET /aggregate
GET aggregate{    "cell": [],    "drilldown": [],    "summary": {        "record_count": 62,        "amount_sum": 1116860 ...
∑         amount✂GET /aggregate?cut=date:2010
GET aggregate?cut=year:2010{    "cell": [        {            "path": ["2010"],            "type": "point",            "di...
GET aggregate?drilldown=year{     "cell": [],     "total_cell_count": 2,     "drilldown": [         {             "record_...
GET report                     Content-Type: application/jsonlist of cuts         {                         "cell": [     ...
SQL Backend What data it works with?
★   or         ❄
★dimensions   fact table
❄             fact tabledimensions
Aggregation Browser                     Browsing Context               Snowflake            Denormalized                   ...
logical              physical          ❄
SQL Features■ does not require DB write access■ denormalisation ■   denormalised browsing, indexing■ simple date datatype ...
Slicercommand-line tool
■ model validation  slicer model validate model.json■ model translation  slicer model translate model.json translation.jso...
Future
■ formatters for visualisation libraries■ JavaScript library*             help needed■ backends■ derived measures         ...
Open Data■ shared repository of models■ shared repository of dimensions■ public cubes   open Slicer HTTP APIs             ...
stay light Nutrition Facts Serving Size 1 cube Amount Per Serving                       % Daily Value Total Fat 0g        ...
Thank You              source:    github.com/Stiivi/cubes           documentation:  packages.python.org/cubes/            ...
Backup
Transactions                 Reporting                              multidimensionalobject–relational modelling           ...
Limitations■ one cut per dimension in a cell ■   logical conjunction of cuts (cut1 AND cut2 AND cut3 ...)■ dimension-only ...
Cubes - Lightweight Python OLAP (EuroPython 2012 talk)
Upcoming SlideShare
Loading in …5
×

Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

13,248 views

Published on

Euro python 2012 talk. Video available: http://youtu.be/WSHD029BAls

Published in: Technology

Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

  1. 1. Cubes light-weight OLAPStefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
  2. 2. source github.com/Stiivi/cubes documentationpackages.python.org/cubes/
  3. 3. Overview■ purpose■ analytical modelling and OLAP■ slicing and dicing■ OLAP server■ SQL backend
  4. 4. analytical data modelling lightweight
  5. 5. http://tendre.sme.sk
  6. 6. aggregation browsing slicing and dicing
  7. 7. modelling reporting aggregation browsing
  8. 8. Architecture
  9. 9. ✂ model browser httpbackends server
  10. 10. Logical Model multidimensional, analytical
  11. 11. business/analyst’s point of view
  12. 12. transactions analysis OLTP OLAPapplication (operational) data analytical data
  13. 13. Model { “name” = “My Model” “description” = .... “cubes” = [...] “dimensions” = [...] }cubes dimensionsmeasures levels, attributes, hierarchy
  14. 14. Facts measurable fact fact data cellmost detailed information
  15. 15. locationtype time dimensions
  16. 16. Dimension■ provide context for facts■ used to filter queries or reports■ control scope of aggregation of facts
  17. 17. Hierarchy 2010 May 1st levels
  18. 18. Dimension■ levels and attributes “dimensions” = [ {■ hierarchy* “name”:”date”, “levels”: ...■ key attributes }, “hierarchy”: ... ...■ label attributes ] *partial support for multiple hierarchies
  19. 19. label attribute key attribute for links to slices
  20. 20. Cube “cubes” = [ { “name”:”contracts”, “dimensions”: [ “date”, “category” ] “measures”: [■ dimensions { “name”: “amount”, “label”: “Contract Amount”,■ measures } “aggregations”: [“sum”] ] }, ... ] *partial support for multiple hierarchies
  21. 21. "attributes": [ { "name":"group", "label": "Group code"localizable }, { "name":"group_label",model and attributes "label": "Group", "locales": ["en", "sk"] } ]
  22. 22. Aggregation Browser ∑
  23. 23. ∑ measures
  24. 24. get more details
  25. 25. Aggregation BrowserSQL Snowflake SQL Denormalized Some HTTP Data MongoDB Browser Browser Browser Service Browser ? “batteries” that are included
  26. 26. Browser Workspacelogical model + data
  27. 27. Cell
  28. 28. context of interestcell
  29. 29. cell
  30. 30. Path [45,2][2012, 6] list of level keys
  31. 31. 1 load_model("model.json") Application ∑ 3 model.cube("sales") 4 workspace.browser(cube) cubes Aggregation Browser backend2 create_workspace("sql", model, url="sqlite:///data.sqlite")
  32. 32. summarydrill-down
  33. 33. browser.aggregate(o cell) summary
  34. 34. browser.aggregate(o cell, . drilldown=[9 "sector"]) drill-down
  35. 35. for row in result.drilldown: row["amount_sum"]row[q label_attribute] row[k key]
  36. 36. received_amount_summeasure aggregation record_count
  37. 37. browser.facts(o cell)browser.values(o cell, 9 dimension)browser.cell_details(o cell)
  38. 38. ✂ Slicing and Dicing✂
  39. 39. ✂✂ April 2012constructi on work construction work in april 2012 type supplier date
  40. 40. cut types✂point set range [[2010,10], from=[2010,10][2010] [2010,12]] to=[2010,12]
  41. 41. Implicit Hierarchy drilldown
  42. 42. whole cube o cell = Cell(cube) browser.aggregate(o cell) Total browser.aggregate(o cell, drilldown=[9 “date”])2006 2007 2008 2009 2010 ✂ cut = PointCut(9 “date”, [2010]) o cell = o cell.slice(✂ cut) browser.aggregate(o cell, drilldown=[9 “date”])Jan Feb Mar Apr March April May ...
  43. 43. Drill-down Level. drilldown = [9 "date"] implicit: next from o cell. drilldown = {9 "date": "month"} explicit
  44. 44. Cross Table experimental interface
  45. 45. 2009 2010 Assets Due from Banks 3044 1803 Assets Investments 41012 36012 Assets Loans Outstanding 103657 118104 Assets Nonnegotiable 1202 1123 Assets Other Assets 2247 3071 Assets Other Receivables 984 811 Assets Receivables 176 171 Assets Securities 33 289 Equity Capital Stock 11491 11492 Equity Deferred Amounts 359 313 Equity Other -1683 -3043 Equity Retained Earnings 29870 28793Liabilities Borrowings 110040 128577Liabilities Derivative Liabilities 115642 110418Liabilities Other 57 8Liabilities Other Liabilities 7321 5454Liabilities Sold or Lent 2323 998
  46. 46. rows = ["item.category", "item.subcategory"]columns = ["year"]measures = ["amount_sum"]table = result.cross_table( rows, columns, measures )
  47. 47. SlicerThe HTTP OLAP Server ✂
  48. 48. ApplicationHTTP JSON Slicer ∑ Aggregation Browser
  49. 49. GET /modelGET /aggregateGET /valuesGET /report
  50. 50. w logical model configuration data$ slicer serve slicer.ini
  51. 51. [server]backend: sqllog_level: info[model]path: model.jsonlocales: en,sk[workspace]url: postgres://localhost/databaseschema: datamartfact_prefix: ft_dimension_prefix: dm_ w
  52. 52. ∑ amountGET /aggregate
  53. 53. GET aggregate{ "cell": [], "drilldown": [], "summary": { "record_count": 62, "amount_sum": 1116860 }}
  54. 54. ∑ amount✂GET /aggregate?cut=date:2010
  55. 55. GET aggregate?cut=year:2010{ "cell": [ { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 } ], "drilldown": [], "summary": { "record_count": 31, "amount_sum": 566020 }}
  56. 56. GET aggregate?drilldown=year{ "cell": [], "total_cell_count": 2, "drilldown": [ { "record_count": 31, "amount_sum": 550840, "year": 2009 }, { "record_count": 31, "amount_sum": 566020, "year": 2010 } ], "summary": { "record_count": 62, "amount_sum": 1116860 }}
  57. 57. GET report Content-Type: application/jsonlist of cuts { "cell": [ { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] } ], "queries": { list of "by_segment": { named queries "query": "aggregate", "drilldown": ["segment"] }, "by_year": { "query": "aggregate", "drilldown": {"date":"year"} } } }
  58. 58. SQL Backend What data it works with?
  59. 59. ★ or ❄
  60. 60. ★dimensions fact table
  61. 61. ❄ fact tabledimensions
  62. 62. Aggregation Browser Browsing Context Snowflake Denormalized or Mapper Mapperdenormalized viewsnowflake ❄
  63. 63. logical physical ❄
  64. 64. SQL Features■ does not require DB write access■ denormalisation ■ denormalised browsing, indexing■ simple date datatype dimension ■ extraction of date parts during mapping■ multiple schema support
  65. 65. Slicercommand-line tool
  66. 66. ■ model validation slicer model validate model.json■ model translation slicer model translate model.json translation.json■ workspace testing slicer test config.ini■ denormalization slicer denormalize --materialize --index config.ini
  67. 67. Future
  68. 68. ■ formatters for visualisation libraries■ JavaScript library* help needed■ backends■ derived measures *http://github.com/Stiivi/cubes-js
  69. 69. Open Data■ shared repository of models■ shared repository of dimensions■ public cubes open Slicer HTTP APIs http://github.com/Stiivi/cubes/wiki
  70. 70. stay light Nutrition Facts Serving Size 1 cube Amount Per Serving % Daily Value Total Fat 0g 0% Saturated Fat 0g Trans Fat 0g
  71. 71. Thank You source: github.com/Stiivi/cubes documentation: packages.python.org/cubes/ examples:github.com/Stiivi/cubes-examples
  72. 72. Backup
  73. 73. Transactions Reporting multidimensionalobject–relational modelling modelling ORM mapping logical model (and mapping) database connection browser database engine workspace
  74. 74. Limitations■ one cut per dimension in a cell ■ logical conjunction of cuts (cut1 AND cut2 AND cut3 ...)■ dimension-only selection■ one - default hierarchy ■ some internals are ready for multiple

×