Cubes - Lightweight Python OLAP (EuroPython 2012 talk)

EuroPython 2012 talk. Video available: http://youtu.be/WSHD029BAls

Published in: Technology

  • OLAP and Logical Model, Architecture, Slicing and Dicing, HTTP Server, SQL Backend
  • Q: Who is familiar with OLAP?
  • quick setup and reporting
    does not cover everything (intentionally)
  • example application: public procurements of Slovakia
  • will talk about modelling first, then reporting, then mix the two
  • what does it look like and what does it do?
  • FIXME: add slicer tool here
  • not going into details, just aligning terminology and defining context
  • it is not rare to see reports created directly from what is available, instead of starting with business needs and trying to find a way to derive them from what is available
  • different approach to data use, different needs
    in apps you focus on transactions (transactional data, OLTP); in reporting you focus on analysis (analytical data)
    logically separate (does not have to be physically separate)
  • CONTEXT: where did the sale happen? who signed the contract?
    FILTER: how much was spent on construction work?
    AGGREGATION SCOPE: what was the revenue by country?
    used for ordering or sorting
    define master-detail relationships
  • provides metadata to easily create apps
  • what does the browser do?
  • aggregating measures
  • aggregation browser has to have a concrete backend implementation
  • + bunch of other stuff
  • context
  • before I talk about the aggregation browser, I have to introduce a cell
  • our filter/selection defines the cell
    this is a kind of multidimensional “breadcrumbs”
  • path: taken from file system terminology for easier understanding
    those are keys
    note that what is displayed is the level label, not a key
  • ... let’s put it into a picture
  • “aggregation result” was designed to match the usual look of a report
  • FIXME: add picture
  • you can specify multiple dimensions and an explicit level to be drilled down to (for example the “month” level of a date dimension)
  • it provides a list of records, which are represented as dictionaries
    you have to find out which one is the level attribute or the key
  • no need to find the context of the dimension of interest
    if not sufficient, one can still fall back to the manual method
  • facts: get details
    values: can be used to create selection boxes; a level can also be specified
    cell_details: used for creating the multidimensional breadcrumbs mentioned before; it contains data for a human-readable description of the current context of interest
    ordering and pagination are supported
  • what was that “cell” thing?
  • also show hierarchy
  • same drilldown, different cell
  • implicit: raises error if current level is the last one
    example: you are exploring year 2010 (cell) and would like to see split by year (higher level)
  • just to name a few...
  • powered by SQLAlchemy
  • powered by a great abstraction framework
    construction of SQL statements
  • denormalized
  • thanks to the new browser and browsing context, it is possible to transparently switch between the original snowflake and a generated denormalized view (which can be materialized and indexed based on dimension level keys)
  • in which table and which column is the attribute?
  • if someone would like to contribute their skills, they are more than welcome and I will help
  • so if you have an open-source app (for example one built on Django) that many users use, you can publish a reporting model for others
    put your cube in the Wiki
  • MIT license

    1. Cubes: light-weight OLAP ■ Stefan Urbanek ■ @Stiivi ■ stefan.urbanek@gmail.com ■ July 2012
    2. source: github.com/Stiivi/cubes ■ documentation: packages.python.org/cubes/
    3. Overview ■ purpose ■ analytical modelling and OLAP ■ slicing and dicing ■ OLAP server ■ SQL backend
    4. analytical data modelling, lightweight
    5. http://tendre.sme.sk
    6. aggregation browsing, slicing and dicing
    7. modelling, reporting, aggregation browsing
    8. Architecture
    9. ✂ model ■ browser ■ backends ■ HTTP server
    10. Logical Model: multidimensional, analytical
    11. business/analyst’s point of view
    12. transactions: OLTP, application (operational) data ■ analysis: OLAP, analytical data
    13. Model ■ cubes: measures ■ dimensions: levels, attributes, hierarchy
        { "name": "My Model", "description": ...., "cubes": [...], "dimensions": [...] }
    14. Facts ■ measurable fact ■ fact data cell ■ most detailed information
    15. dimensions: location, type, time
    16. Dimension ■ provide context for facts ■ used to filter queries or reports ■ control scope of aggregation of facts
    17. Hierarchy: 2010, May, 1st (levels)
    18. Dimension ■ levels and attributes ■ hierarchy* ■ key attributes ■ label attributes
        "dimensions": [ { "name": "date", "levels": ..., "hierarchy": ... }, ... ]
        *partial support for multiple hierarchies
    19. label attribute ■ key attribute (for links to slices)
    20. Cube ■ dimensions ■ measures
        "cubes": [
            {
                "name": "contracts",
                "dimensions": [ "date", "category" ],
                "measures": [
                    { "name": "amount", "label": "Contract Amount", "aggregations": ["sum"] }
                ]
            },
            ...
        ]
        *partial support for multiple hierarchies
    21. localizable model and attributes
        "attributes": [
            { "name": "group", "label": "Group code" },
            { "name": "group_label", "label": "Group", "locales": ["en", "sk"] }
        ]
    22. Aggregation Browser ∑
    23. ∑ measures
    24. get more details
    25. Aggregation Browser ■ SQL Snowflake Browser ■ SQL Denormalized Browser ■ MongoDB Browser ■ Some HTTP Data Service Browser ■ ? ■ “batteries” that are included
    26. Browser ■ Workspace ■ logical model + data
    27. Cell
    28. cell: context of interest
    29. cell
    30. Path: list of level keys, e.g. [45, 2], [2012, 6]
    31. Application ■ cubes ■ Aggregation Browser ∑ ■ backend
        1. load_model("model.json")
        2. create_workspace("sql", model, url="sqlite:///data.sqlite")
        3. model.cube("sales")
        4. workspace.browser(cube)
    32. summary ■ drill-down
    33. summary: browser.aggregate(cell)
    34. drill-down: browser.aggregate(cell, drilldown=["sector"])
    35. for row in result.drilldown: row["amount_sum"], row[label_attribute], row[key]
    36. received_amount_sum (measure + aggregation) ■ record_count
    37. browser.facts(cell) ■ browser.values(cell, dimension) ■ browser.cell_details(cell)
    38. ✂ Slicing and Dicing ✂
    39. ✂ April 2012 ■ ✂ construction work ■ construction work in April 2012 ■ dimensions: type, supplier, date
    40. cut types ✂ ■ point: [2010] ■ set: [[2010,10], [2010,12]] ■ range: from=[2010,10] to=[2010,12]
    41. Implicit Hierarchy drilldown
    42. whole cube
        cell = Cell(cube)
        browser.aggregate(cell)
            Total
        browser.aggregate(cell, drilldown=["date"])
            2006 2007 2008 2009 2010
        ✂ cut = PointCut("date", [2010])
        cell = cell.slice(cut)
        browser.aggregate(cell, drilldown=["date"])
            Jan Feb Mar Apr March April May ...
    43. Drill-down Level ■ drilldown = ["date"] (implicit: next from cell) ■ drilldown = {"date": "month"} (explicit)
    44. Cross Table: experimental interface
    45.                                         2009    2010
        Assets       Due from Banks             3044    1803
        Assets       Investments               41012   36012
        Assets       Loans Outstanding        103657  118104
        Assets       Nonnegotiable              1202    1123
        Assets       Other Assets               2247    3071
        Assets       Other Receivables           984     811
        Assets       Receivables                 176     171
        Assets       Securities                   33     289
        Equity       Capital Stock             11491   11492
        Equity       Deferred Amounts            359     313
        Equity       Other                     -1683   -3043
        Equity       Retained Earnings         29870   28793
        Liabilities  Borrowings               110040  128577
        Liabilities  Derivative Liabilities   115642  110418
        Liabilities  Other                        57       8
        Liabilities  Other Liabilities          7321    5454
        Liabilities  Sold or Lent               2323     998
    46. rows = ["item.category", "item.subcategory"]
        columns = ["year"]
        measures = ["amount_sum"]
        table = result.cross_table(rows, columns, measures)
    47. Slicer: The HTTP OLAP Server ✂
    48. Application ■ HTTP, JSON ■ Slicer ■ Aggregation Browser ∑
    49. GET /model ■ GET /aggregate ■ GET /values ■ GET /report
    50. logical model ■ configuration ■ data
        $ slicer serve slicer.ini
    51. [server]
        backend: sql
        log_level: info
        [model]
        path: model.json
        locales: en,sk
        [workspace]
        url: postgres://localhost/database
        schema: datamart
        fact_prefix: ft_
        dimension_prefix: dm_
    52. ∑ amount ■ GET /aggregate
    53. GET aggregate
        {
            "cell": [],
            "drilldown": [],
            "summary": { "record_count": 62, "amount_sum": 1116860 }
        }
    54. ∑ amount ✂ ■ GET /aggregate?cut=date:2010
    55. GET aggregate?cut=year:2010
        {
            "cell": [
                { "path": ["2010"], "type": "point", "dimension": "year", "level_depth": 1 }
            ],
            "drilldown": [],
            "summary": { "record_count": 31, "amount_sum": 566020 }
        }
    56. GET aggregate?drilldown=year
        {
            "cell": [],
            "total_cell_count": 2,
            "drilldown": [
                { "record_count": 31, "amount_sum": 550840, "year": 2009 },
                { "record_count": 31, "amount_sum": 566020, "year": 2010 }
            ],
            "summary": { "record_count": 62, "amount_sum": 1116860 }
        }
    57. GET report ■ Content-Type: application/json ■ "cell": list of cuts ■ "queries": list of named queries
        {
            "cell": [
                { "dimension": "date", "type": "range", "from": [2009], "to": [2011,6] }
            ],
            "queries": {
                "by_segment": { "query": "aggregate", "drilldown": ["segment"] },
                "by_year": { "query": "aggregate", "drilldown": {"date":"year"} }
            }
        }
    58. SQL Backend: What data does it work with?
    59. ★ or ❄
    60. ★ dimensions ■ fact table
    61. ❄ fact table ■ dimensions
    62. Aggregation Browser ■ Browsing Context ■ Snowflake Mapper or Denormalized Mapper ■ snowflake ❄ ■ denormalized view
    63. logical ■ physical ❄
    64. SQL Features ■ does not require DB write access ■ denormalisation: denormalised browsing, indexing ■ simple date datatype dimension: extraction of date parts during mapping ■ multiple schema support
    65. Slicer command-line tool
    66. ■ model validation: slicer model validate model.json
        ■ model translation: slicer model translate model.json translation.json
        ■ workspace testing: slicer test config.ini
        ■ denormalization: slicer denormalize --materialize --index config.ini
    67. Future
    68. ■ formatters for visualisation libraries ■ JavaScript library* ■ backends ■ derived measures ■ help needed ■ *http://github.com/Stiivi/cubes-js
    69. Open Data ■ shared repository of models ■ shared repository of dimensions ■ public cubes ■ open Slicer HTTP APIs ■ http://github.com/Stiivi/cubes/wiki
    70. stay light ■ Nutrition Facts ■ Serving Size 1 cube ■ Amount Per Serving, % Daily Value ■ Total Fat 0g 0% ■ Saturated Fat 0g ■ Trans Fat 0g
    71. Thank You ■ source: github.com/Stiivi/cubes ■ documentation: packages.python.org/cubes/ ■ examples: github.com/Stiivi/cubes-examples
    72. Backup
    73. Transactions: object–relational modelling ■ ORM mapping ■ database connection ■ database engine
        Reporting: multidimensional modelling ■ logical model (and mapping) ■ browser ■ workspace
    74. Limitations ■ one cut per dimension in a cell ■ logical conjunction of cuts (cut1 AND cut2 AND cut3 ...) ■ dimension-only selection ■ one (default) hierarchy ■ some internals are ready for multiple
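
The cut types on slide 40 (point, set, range), the paths on slide 30 and the "logical conjunction of cuts" rule on slide 74 can be illustrated with a small standalone sketch. It does not use the cubes library: the fact records, the amounts and the helper names (point_cut, set_cut, range_cut, aggregate) are hypothetical, and only the result keys record_count and amount_sum follow the naming shown on the slides.

```python
from collections import defaultdict

# Hypothetical facts: each dimension value is a path (list of level keys).
FACTS = [
    {"date": [2009, 11], "type": ["construction"], "amount": 100},
    {"date": [2010, 10], "type": ["construction"], "amount": 250},
    {"date": [2010, 11], "type": ["services"], "amount": 50},
    {"date": [2010, 12], "type": ["construction"], "amount": 75},
]


def point_cut(dimension, path):
    """Select facts whose path in `dimension` starts with `path`."""
    return lambda fact: fact[dimension][:len(path)] == path


def set_cut(dimension, paths):
    """Select facts matching any of several point paths."""
    return lambda fact: any(fact[dimension][:len(p)] == p for p in paths)


def range_cut(dimension, from_path, to_path):
    """Select facts between two paths, both ends inclusive."""
    def matches(fact):
        value = fact[dimension]
        # Compare only as many levels as each boundary path specifies.
        return (value[:len(from_path)] >= from_path
                and value[:len(to_path)] <= to_path)
    return matches


def aggregate(facts, cuts=(), drilldown=None, depth=1):
    """Aggregate facts inside the cell defined by `cuts` (cut1 AND cut2 ...)."""
    selected = [f for f in facts if all(cut(f) for cut in cuts)]
    summary = {
        "record_count": len(selected),
        "amount_sum": sum(f["amount"] for f in selected),
    }
    rows = defaultdict(lambda: {"record_count": 0, "amount_sum": 0})
    if drilldown is not None:
        for fact in selected:
            # Group by the path prefix at the requested drill-down depth.
            key = tuple(fact[drilldown][:depth])
            rows[key]["record_count"] += 1
            rows[key]["amount_sum"] += fact["amount"]
    return summary, dict(rows)


# Cell: construction work in 2010, drilled down to the month level.
cell = [point_cut("date", [2010]), point_cut("type", ["construction"])]
summary, by_month = aggregate(FACTS, cell, drilldown="date", depth=2)
print(summary)
print(by_month)
```

An empty list of cuts corresponds to the whole cube; every cut added to the list narrows the cell further, which is why the sketch combines them with all().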

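A client of the Slicer server could build the /aggregate requests from slides 52 to 56 and read the reply roughly as follows. This is a minimal sketch: the host and port in BASE_URL and the helper aggregate_url are assumptions, no request is actually sent, and the response text is copied from slide 56.

```python
import json
from urllib.parse import urlencode

# Assumption: address of a locally running Slicer server.
BASE_URL = "http://localhost:5000"


def aggregate_url(cut=None, drilldown=None):
    """Build a GET /aggregate URL with optional cut and drilldown parameters."""
    params = {}
    if cut is not None:
        params["cut"] = cut
    if drilldown is not None:
        params["drilldown"] = drilldown
    query = urlencode(params, safe=":")  # keep "date:2010" readable
    return BASE_URL + "/aggregate" + ("?" + query if query else "")


# Response body as shown on slide 56 for /aggregate?drilldown=year.
RESPONSE = """
{
  "cell": [],
  "total_cell_count": 2,
  "drilldown": [
    {"record_count": 31, "amount_sum": 550840, "year": 2009},
    {"record_count": 31, "amount_sum": 566020, "year": 2010}
  ],
  "summary": {"record_count": 62, "amount_sum": 1116860}
}
"""

result = json.loads(RESPONSE)
# One drill-down row per year; collect the summed amount for each.
amount_by_year = {row["year"]: row["amount_sum"] for row in result["drilldown"]}

print(aggregate_url(cut="date:2010"))
print(aggregate_url(drilldown="year"))
print(amount_by_year)
```

The per-year amounts in the drill-down rows add up to the amount_sum in the summary, which is a cheap consistency check a client can run on any drill-down reply.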