Cubes 1.0 Overview 
light data warehouse and conceptual modelling 
Štefan Urbánek, @Stiivi 
stefan.urbanek@gmail.com November 2014
understanding 
through metadata
(diagram: model, data, metadata ❄, reporting apps / modules; logical vs. physical)
Categorical Data 
OLAP 
(online analytical processing) 
lightweight framework for 
conceptual modelling and 
analytics
Original Cubes 
before 1.0
Workspace: 1 × store, 1 × model, in one process or server
We needed more!
Models: file, database, API
multiple model parts, different sources
Stores: Postgres, Mongo, API
multiple data sources, heterogeneous
Cubes 1.0
Python ≥ 3.4
also works with Python ≥ 2.7 for the “two” series
■ analytical workspace 
■ model providers 
■ new and improved backends 
■ better extensibility 
■ authorisation
Analytical 
Workspace
(diagram: model providers (Static Model Provider, API Model Provider) supply the cubes sales, churn, activations and events; stores: BI Data (Postgres), BI Data 2 (Mongo), Events (API))
(diagram: workspace with a Static Model Provider loading the models crm, sales and events; cubes sales, churn, activations, events; stores BI Data (Postgres) and BI Data 2 (Mongo))
[workspace] 
models_path: /var/lib/cubes/models 
[models] 
crm: crm.cubesmodel 
sales: sales.cubesmodel 
events: events.cubesmodel 
[store crm] 
type: sql 
url: postgresql://localhost/crm 
[store events] 
type: mongo 
host: localhost 
collection: events
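The workspace configuration is INI-style, so Python's standard configparser can read it unchanged (":" is one of its default key/value delimiters). The sketch below only parses the file and resolves model file names against models_path; it does not reproduce how the Cubes workspace itself loads models or opens stores, and the helper name read_workspace_config is hypothetical.

```python
import configparser
import posixpath

CONFIG = """
[workspace]
models_path: /var/lib/cubes/models

[models]
crm: crm.cubesmodel
sales: sales.cubesmodel
events: events.cubesmodel

[store crm]
type: sql
url: postgresql://localhost/crm

[store events]
type: mongo
host: localhost
collection: events
"""


def read_workspace_config(text):
    """Return (models, stores) parsed from slicer-style INI text.

    Hypothetical helper: models maps a model name to its file path,
    stores maps a store name to its options.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # Model files are resolved relative to [workspace] models_path.
    base = parser.get("workspace", "models_path", fallback="")
    models = {name: posixpath.join(base, filename)
              for name, filename in parser.items("models")}

    # Every "[store <name>]" section describes one named data store.
    stores = {}
    for section in parser.sections():
        if section.startswith("store "):
            stores[section[len("store "):]] = dict(parser.items(section))
    return models, stores


models, stores = read_workspace_config(CONFIG)
```

Keeping each store in its own named section is what lets one workspace mix an SQL store and a Mongo store side by side.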
BYOB 
bring your own backend 
Slicer
Backend
Browser, Store, Provider
(diagram: on the logical side, the Provider creates the model (cubes, dimensions) and the Browser aggregates (Σ) using the model; on the physical side, the Store connects to the physical data store (database or API) and provides backend objects)
Model Provider
■ metadata on the fly
■ local or external source
■ might be linked to a store
(diagram: provides model, cubes, dimensions)
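A minimal sketch of the provider idea, under stated assumptions: the class names, the bind() method and the fake event store are hypothetical and do not reproduce the Cubes ModelProvider API. A static provider serves metadata from a local dictionary, while the event provider is linked to a store and derives one cube per event at call time, i.e. metadata on the fly.

```python
from typing import Dict, List, Optional


class FakeEventStore:
    """Stands in for an external API store; it only lists event names."""

    def __init__(self, event_names: List[str]) -> None:
        self._event_names = list(event_names)

    def event_names(self) -> List[str]:
        return list(self._event_names)


class ProviderSketch:
    """Hypothetical base class: lists cubes and returns cube metadata."""

    requires_store = False

    def __init__(self) -> None:
        self.store: Optional[FakeEventStore] = None

    def bind(self, store: FakeEventStore) -> None:
        self.store = store

    def list_cubes(self) -> List[str]:
        raise NotImplementedError

    def cube(self, name: str) -> Dict:
        raise NotImplementedError


class StaticProviderSketch(ProviderSketch):
    """Serves metadata from a local dictionary (e.g. a model file)."""

    def __init__(self, metadata: Dict) -> None:
        super().__init__()
        self._cubes = {c["name"]: c for c in metadata.get("cubes", [])}

    def list_cubes(self) -> List[str]:
        return sorted(self._cubes)

    def cube(self, name: str) -> Dict:
        return self._cubes[name]  # KeyError for unknown cubes


class EventProviderSketch(ProviderSketch):
    """Builds cube metadata on the fly from whatever the store reports."""

    requires_store = True

    def list_cubes(self) -> List[str]:
        if self.store is None:
            raise RuntimeError("provider is not bound to a store")
        return sorted(self.store.event_names())

    def cube(self, name: str) -> Dict:
        if name not in self.list_cubes():
            raise KeyError(name)
        # One cube per event; the dimension list is an assumption.
        return {"name": name, "dimensions": ["time"]}
```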
Backend           Model      Dimensions      Cubes / Facts
SQL               required   column (table)  table
MongoDB           required   key/attribute   collection
Mixpanel          automatic  property        event
Google Analytics  automatic  dimension       metric
Slicer            automatic  dimension       cube
Model Improvements
Model 
■ measures → aggregates 
■ more front-end metadata: cube categories, dimension role and cardinality
■ customised dimension linking
{
  "measures": [
    {
      "name": "amount",
      "label": "Sales Amount"
    },
    {
      "name": "vat",
      "label": "VAT"
    }
  ],
  "aggregates": [
    {
      "name": "total_sales",
      "label": "Total Sales Amount",
      "measure": "amount",
      "function": "sum"
    },
    {
      "name": "total_vat",
      "label": "Total VAT",
      "measure": "vat",
      "function": "sum"
    },
    {
      "name": "item_count",
      "label": "Item Count",
      "function": "count"
    }
  ]
}
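To illustrate how named aggregates decouple the report-facing name from the measure and function, here is a sketch that turns the aggregate definitions above into SQL select expressions. It assumes the fact table column is named after the measure and uses a hand-written function-to-SQL table; it is not the Cubes SQL query builder.

```python
MODEL = {
    "measures": [
        {"name": "amount", "label": "Sales Amount"},
        {"name": "vat", "label": "VAT"},
    ],
    "aggregates": [
        {"name": "total_sales", "label": "Total Sales Amount",
         "measure": "amount", "function": "sum"},
        {"name": "total_vat", "label": "Total VAT",
         "measure": "vat", "function": "sum"},
        {"name": "item_count", "label": "Item Count", "function": "count"},
    ],
}

# Hypothetical mapping from aggregate function names to SQL templates.
# "{}" is replaced by the measure's column; count uses COUNT(*).
SQL_TEMPLATES = {
    "sum": "SUM({})",
    "min": "MIN({})",
    "max": "MAX({})",
    "avg": "AVG({})",
    "count": "COUNT(*)",
    "count_nonempty": "COUNT({})",
    "count_distinct": "COUNT(DISTINCT {})",
}


def select_expressions(model):
    """Return one 'EXPR AS name' string per aggregate in the model."""
    measures = {m["name"] for m in model.get("measures", [])}
    expressions = []
    for aggregate in model["aggregates"]:
        template = SQL_TEMPLATES[aggregate["function"]]
        measure = aggregate.get("measure")
        if "{}" in template:
            if measure not in measures:
                raise ValueError(f"{aggregate['name']}: unknown measure {measure!r}")
            expression = template.format(measure)
        else:
            expression = template
        expressions.append(f"{expression} AS {aggregate['name']}")
    return expressions
```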
Aggregates 
■ custom name
■ can refer to other aggregates (post-aggregation calculations)
■ functions are backend-specific
SQL aggregations: sum, count, count_nonempty, count_distinct, min, max, avg, stddev, variance, …
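Because functions are backend-specific, each backend effectively carries its own table of supported aggregate names. The following hypothetical in-memory table mirrors several of the SQL names listed above; skipping None the way SQL skips NULL, and using sample statistics (as PostgreSQL's stddev() and variance() do), are assumptions of this sketch.

```python
import statistics


def _present(values):
    """Drop None values, the way SQL aggregates skip NULLs."""
    return [v for v in values if v is not None]


# Hypothetical in-memory "backend": each backend would supply its own table.
PYTHON_AGGREGATES = {
    "count": len,  # counts every row, like COUNT(*)
    "count_nonempty": lambda vs: len(_present(vs)),
    "count_distinct": lambda vs: len(set(_present(vs))),
    "sum": lambda vs: sum(_present(vs)),
    "min": lambda vs: min(_present(vs)),
    "max": lambda vs: max(_present(vs)),
    "avg": lambda vs: statistics.mean(_present(vs)),
    # Sample statistics, like PostgreSQL stddev() / variance().
    "stddev": lambda vs: statistics.stdev(_present(vs)),
    "variance": lambda vs: statistics.variance(_present(vs)),
}


def aggregate(function_name, values):
    """Apply a named aggregate, or fail if this backend lacks it."""
    try:
        function = PYTHON_AGGREGATES[function_name]
    except KeyError:
        raise ValueError(
            f"aggregate function {function_name!r} is not supported by this backend"
        ) from None
    return function(values)
```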
Contextual Dimensions 
{ 
"measures": [ … ], 
"dimensions": [ 
{"name": "date", "hierarchies": ["ym", "yqm"]}, 
{"name": "date", "alias": "contract_date"} 
], 
… 
} 
customisable linking properties: alias, hierarchies, exclude_hierarchies, default_hierarchy_name, cardinality, nonadditive
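A sketch of what contextual linking means: one shared date dimension is seen differently by each cube through its link properties. The hierarchy names ymd, ym and yqm on the master dimension, and the fallback rule for the default hierarchy, are assumptions for illustration; only a subset of the linking properties is handled.

```python
DATE_DIMENSION = {
    "name": "date",
    "hierarchies": ["ymd", "ym", "yqm"],  # hypothetical hierarchy names
    "default_hierarchy_name": "ymd",
}


def link_dimension(master, link):
    """Return the cube-specific view of a shared dimension."""
    hierarchies = list(master["hierarchies"])
    if "hierarchies" in link:  # keep only the listed ones, in link order
        hierarchies = [h for h in link["hierarchies"] if h in hierarchies]
    excluded = set(link.get("exclude_hierarchies", []))
    hierarchies = [h for h in hierarchies if h not in excluded]

    default = link.get("default_hierarchy_name", master.get("default_hierarchy_name"))
    if default not in hierarchies:  # assumed fallback: first remaining hierarchy
        default = hierarchies[0] if hierarchies else None

    return {
        "name": link.get("alias", link["name"]),  # name used inside the cube
        "dimension": master["name"],               # shared definition
        "hierarchies": hierarchies,
        "default_hierarchy_name": default,
    }
```

The same master definition can then back both the date and the contract_date dimensions of a single cube.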
Dimension Roles 
■ dimension.role: time
■ level.role: year, month, day, …
hint for reporting applications or backends
Cardinality 
overload precautions 
dimension.cardinality 
level.cardinality 
tiny < low < medium < high 
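One way a reporting application might use the cardinality hint as an overload precaution is sketched below: refuse an unpaginated drilldown into a level whose cardinality is above a threshold. The threshold, the pagination rule and the function name are hypothetical; the slide only defines the ordering tiny < low < medium < high.

```python
CARDINALITY_ORDER = ["tiny", "low", "medium", "high"]


def check_drilldown(level_name, cardinality, page_size=None, max_unpaged="medium"):
    """Hypothetical guard: high-cardinality levels require pagination."""
    rank = CARDINALITY_ORDER.index(cardinality)  # ValueError for unknown values
    if rank > CARDINALITY_ORDER.index(max_unpaged) and page_size is None:
        raise ValueError(
            f"level {level_name!r} has {cardinality} cardinality; use pagination"
        )
    return True
```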
Browser 
Browser 
■ uses logical model 
■ implements aggregation 
■ builds queries 
■ retrieves data 
(diagram: on the logical side, the Browser aggregates (Σ) using the model; on the physical side, the Store connects to the physical data store (database or API))
Browser Methods 
■ features() 
■ aggregate(cell, drilldown,…) 
■ members(cell, dimension, …) 
■ facts(cell, …) 
■ fact(id) 
■ cell_details(cell, drilldown, …)
Split Cell 
■ aggregate(split=cell)
■ generated dimension __within_split__ with the values False and True
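An in-memory sketch of what a split does to an aggregation: every drilldown row is further divided by whether its facts fall inside the split cell, reported as the generated __within_split__ dimension. Here the split cell is approximated by a plain predicate and the facts are made up; a real backend would evaluate the cell's cuts instead.

```python
from collections import defaultdict


def aggregate_with_split(facts, drilldown, measure, within_split):
    """Group facts by a drilldown attribute and by membership in a split cell.

    within_split is a predicate standing in for the split cell.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [record_count, measure_sum]
    for fact in facts:
        key = (fact[drilldown], bool(within_split(fact)))
        totals[key][0] += 1
        totals[key][1] += fact[measure]
    return [
        {
            drilldown: value,
            "__within_split__": inside,
            "record_count": count,
            f"{measure}_sum": total,
        }
        for (value, inside), (count, total) in sorted(totals.items())
    ]
```

For example, splitting by status:1 amounts to passing lambda fact: fact["status"] == 1.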
Post-aggregation 
“statutils” 
■ computed on the aggregation result, in Python
■ moving averages, deviation, variance: wma, sma, sms, smstd, smsrd, smsvar
■ aggregate property: window_size
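A sketch of two of these post-aggregation functions applied to a sequence of aggregated values with a window_size. This is not the “statutils” code; treating the first rows as shorter windows and using linear weights 1..n for wma are assumptions.

```python
def _windows(values, window_size):
    """Yield the trailing window ending at each position (shorter at the start)."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    for i in range(len(values)):
        yield values[max(0, i - window_size + 1): i + 1]


def sma(values, window_size):
    """Simple moving average."""
    return [sum(window) / len(window) for window in _windows(values, window_size)]


def wma(values, window_size):
    """Weighted moving average; the newest value gets the largest weight."""
    result = []
    for window in _windows(values, window_size):
        weights = range(1, len(window) + 1)
        result.append(sum(w * v for w, v in zip(weights, window)) / sum(weights))
    return result
```

Since these run on the aggregation result, they only see the values of the returned drilldown rows, so their order (e.g. by month) matters.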
Store
Store 
■ provides database or API connection 
■ might provide a model 
■ slicer tool actions (future): validation, schema, optimization, ...
(diagram: on the logical side, the Browser uses the Store; on the physical side, the Store connects to the physical data store (database or API))
SQL Backend 
also known as ROLAP 
or SQL query generator
SQL Overview 
■ new query builder 
■ join optimisation 
■ support for outer-joins 
■ support for “split” dimension 
■ new aggregate functions
❄ fact table
join optimisation
(diagram: joins between facts (master) and date (detail) using the join methods match, detail and master)
"joins": [
  {
    "master": "fact_contracts.contract_date_id",
    "detail": "dim_date.id",
    "method": "detail"
  }
]
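A sketch of both ideas on this slide: join optimisation (emit only the joins a query needs, including intermediate snowflake tables) and join methods. The method-to-SQL mapping (match as INNER JOIN, master as LEFT OUTER JOIN, detail as RIGHT OUTER JOIN) is this sketch's reading of the three methods, not code taken from the Cubes query builder, and the snowflake table names are hypothetical.

```python
# Assumed meaning of the join methods: which side must have matching keys.
JOIN_TYPES = {
    "match": "INNER JOIN",        # keys must match on both sides
    "master": "LEFT OUTER JOIN",  # keep every master (fact) row
    "detail": "RIGHT OUTER JOIN", # keep every detail row, e.g. all dates
}


def _table(ref):
    """'table.column' -> 'table'."""
    return ref.split(".", 1)[0]


def required_tables(joins, tables):
    """Add every intermediate table needed to reach the requested ones."""
    parent = {_table(j["detail"]): _table(j["master"]) for j in joins}
    needed = set()
    for table in tables:
        while table in parent and table not in needed:
            needed.add(table)
            table = parent[table]
    return needed


def join_clauses(joins, tables):
    """Return JOIN clauses for the requested tables only, in model order."""
    needed = required_tables(joins, tables)
    return [
        f"{JOIN_TYPES[j.get('method', 'match')]} {_table(j['detail'])} "
        f"ON {j['master']} = {j['detail']}"
        for j in joins
        if _table(j["detail"]) in needed
    ]
```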
Authentication and 
Authorisation
{
  "lidia": {
    "allowed_cubes": ["sales"],
    "cube_restrictions": {
      "sales": ["store:3"]
    }
  },
  "martin": {
    "allowed_cubes": ["sales"],
    "cube_restrictions": {
      "sales": ["store:5"]
    }
  }
}
[workspace] 
authorization: simple 
[authorization] 
rights_file: access_rights.json 
Authorizer
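A sketch of how the rights file above can be applied: a user may query only the cubes listed in allowed_cubes, and the cube_restrictions are appended as extra cuts so each user sees only their own slice. The function name and the PermissionError behaviour are assumptions; this is not the code of the Cubes simple authorizer.

```python
import json

RIGHTS = json.loads("""
{
    "lidia": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:3"]}},
    "martin": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:5"]}}
}
""")


def authorized_cuts(rights, user, cube, requested_cuts=()):
    """Return the cuts to apply for user on cube, or raise PermissionError."""
    entry = rights.get(user)
    if entry is None or cube not in entry.get("allowed_cubes", []):
        raise PermissionError(f"{user!r} may not access cube {cube!r}")
    restrictions = entry.get("cube_restrictions", {}).get(cube, [])
    # Restriction cuts are always added on top of what the user asked for.
    return list(requested_cuts) + list(restrictions)
```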
Slicer 
server 
Model Queries 
■ GET /cubes 
overview of cubes from all providers 
■ GET /cube/sales/model 
detailed cube model with described dimensions
Browser Queries 
■ GET /cube/name/aggregate 
■ GET /cube/name/members/dim 
■ GET /cube/name/facts 
■ GET /cube/name/fact 
■ GET /cube/name/cell
Aggregate 
GET /cube/sales/aggregate?cut=date:2010&drilldown=date|region&split=status:1&page=10&page_size=100
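From a client, the same aggregate request can be built with the standard library, which percent-encodes the ":" and "|" separators. The server address is a hypothetical example, and the sketch assumes the server decodes standard percent-encoding, as WSGI servers generally do.

```python
from urllib.parse import urlencode


def aggregate_url(base_url, cube, **params):
    """Build a Slicer aggregate URL; values are percent-encoded by urlencode."""
    return f"{base_url.rstrip('/')}/cube/{cube}/aggregate?{urlencode(params)}"


url = aggregate_url(
    "http://localhost:5000",  # hypothetical server address
    "sales",
    cut="date:2010",
    drilldown="date|region",
    split="status:1",
    page=10,
    page_size=100,
)
```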
{
  "cell": [],
  "total_cell_count": 2,
  "drilldown": [
    {
      "record_count": 31,
      "amount_sum": 550840,
      "date.year": 2009
    },
    {
      "record_count": 31,
      "amount_sum": 566020,
      "date.year": 2010
    }
  ],
  "summary": {
    "record_count": 62,
    "amount_sum": 1116860
  }
}
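A small client-side check on a reply like the one above: for additive aggregates such as sums and counts, the drilldown cells should add up to the summary (550840 + 566020 = 1116860, and 31 + 31 = 62). The helper name is hypothetical, and the check does not hold for non-additive aggregates such as averages.

```python
import json

REPLY = json.loads("""
{
    "cell": [],
    "total_cell_count": 2,
    "drilldown": [
        {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
        {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
    ],
    "summary": {"record_count": 62, "amount_sum": 1116860}
}
""")


def summary_matches_drilldown(reply, aggregate):
    """True if the drilldown cells add up to the summary (additive aggregates only)."""
    total = sum(row[aggregate] for row in reply["drilldown"])
    return total == reply["summary"][aggregate]


years = [row["date.year"] for row in REPLY["drilldown"]]
```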
Special Characters 
“category:10-24” 
→ “10-24” 
“city:Nové Mesto nad Váhom” 
→ “Nové Mesto nad Váhom”
Relative Time 
uses dimension roles and Calendar 
date:yesterday 
date:90daysago-today 
expiration_date:lastmonth-next2months
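A sketch of resolving the relative time expressions above against a reference date. It covers only the token forms shown on the slide; resolving month tokens to the first day of the month is an assumption, and this is not the Cubes Calendar implementation.

```python
import re
from datetime import date, timedelta


def _shift_months(day, months):
    """First day of the month that is `months` away from day's month."""
    index = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(index, 12)
    return date(year, month_index + 1, 1)


def relative_date(token, today):
    """Resolve a hypothetical subset of relative time tokens."""
    if token == "today":
        return today
    if token == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"(\d+)daysago", token)
    if match:
        return today - timedelta(days=int(match.group(1)))
    match = re.fullmatch(r"(last|next)(\d*)months?", token)
    if match:
        count = int(match.group(2) or 1)
        return _shift_months(today, -count if match.group(1) == "last" else count)
    raise ValueError(f"unsupported relative time token: {token!r}")


def relative_range(text, today):
    """Parse 'from-to' (or a single token) into a pair of dates."""
    start, _, end = text.partition("-")
    return relative_date(start, today), relative_date(end or start, today)
```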
Output Format
format=csv
format=json
format=json_lines*
*for facts and members
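A sketch of the difference between the csv and json_lines outputs for a list of fact records: json_lines writes one JSON object per line, which suits streaming long fact lists. The record fields are made up, and the Slicer's exact CSV dialect and header handling are not given on the slide.

```python
import csv
import io
import json


def to_csv(records, fields):
    """Header row followed by one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json_lines(records):
    """One JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)
```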
Deployment 
reporting for your app or stand-alone
■ public HTML & JS application → HTTP request / JSON reply → Slicer server → store, model
■ public HTML & JS application → HTTP request / JSON reply → Slicer Flask app (WSGI) → store, model
■ public HTML from Django, Flask, … → Cubes Python API → store, model
■ public HTML from a Flask application with the Slicer Blueprint → store, model
■ public HTML from a web application (PHP, RoR, Django) → internal HTTP request / JSON reply → Slicer server → store, model
Front-ends 
generic ad-hoc reporting
Slicer
Cubes Viewer
Jose Juan Montes, jjmontesl/cubesviewer
checkgermany.de*
Front-end by: Felix Ebert (@femeb)
Data by: Friedrich Lindenberg (@pudo)
*Cubes 0.10.2
Summary & Future
Summary 
■ heterogeneous pluggable environment
■ easier to extend 
■ better SQL query generator
Not Mentioned 
■ localisation 
■ namespaces 
■ calendar 
■ query logging
Incubated 
■ non-additive properties 
■ periods-to-date 
■ modeler app 
■ cubes.js
Future 
■ arithmetic expressions 
■ SQL improvements 
■ improved API for custom browsers 
■ cubes.js
Nutrition Facts
Serving Size 1 cube
Amount Per Serving / % Daily Value
Total Fat 0g: 0%
  Saturated Fat 0g
  Trans Fat 0g
Total Carbohydrate 0g: 0%
  Dietary Fiber 0g
  Sugars 0g
Want to contribute? 
#TODO, #FIXME, Issue # 
https://github.com/DataBrewery/cubes/issues
Credits
Thanks for 1.0 
Robin Thomas 
Ryan Berlew 
Jose Juan Montes 
Squarespace 
and all contributors on Github
Thank You
@Stiivi
github.com/DataBrewery/cubes 
cubes.databrewery.org

More Related Content

What's hot

Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Julian Hyde
 
Apache Spark Side of Funnels
Apache Spark Side of FunnelsApache Spark Side of Funnels
Apache Spark Side of Funnels
Databricks
 
Data centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad UlrecheData centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad Ulreche
Spark Summit
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
Michael Rys
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
Stamatis Zampetakis
 
Core Data Migrations and A Better Option
Core Data Migrations and A Better OptionCore Data Migrations and A Better Option
Core Data Migrations and A Better Option
Priya Rajagopal
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
Michael Rys
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
Michael Rys
 
Apache Hive
Apache HiveApache Hive
Apache Hive
Abhishek Gautam
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
Michael Rys
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
Julian Hyde
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
ArangoDB Database
 
Faites évoluer votre accès aux données avec MongoDB Stitch
Faites évoluer votre accès aux données avec MongoDB StitchFaites évoluer votre accès aux données avec MongoDB Stitch
Faites évoluer votre accès aux données avec MongoDB Stitch
MongoDB
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
Chester Chen
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
Namgee Lee
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
Sriskandarajah Suhothayan
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
Sigmoid
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
Michael Rys
 

What's hot (20)

Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
Planning with Polyalgebra: Bringing Together Relational, Complex and Machine ...
 
Apache Spark Side of Funnels
Apache Spark Side of FunnelsApache Spark Side of Funnels
Apache Spark Side of Funnels
 
Data centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad UlrecheData centric Metaprogramming by Vlad Ulreche
Data centric Metaprogramming by Vlad Ulreche
 
Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)Using C# with U-SQL (SQLBits 2016)
Using C# with U-SQL (SQLBits 2016)
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Core Data Migrations and A Better Option
Core Data Migrations and A Better OptionCore Data Migrations and A Better Option
Core Data Migrations and A Better Option
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Apache Hive
Apache HiveApache Hive
Apache Hive
 
U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)U-SQL Reading & Writing Files (SQLBits 2016)
U-SQL Reading & Writing Files (SQLBits 2016)
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)
 
Drill / SQL / Optiq
Drill / SQL / OptiqDrill / SQL / Optiq
Drill / SQL / Optiq
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Faites évoluer votre accès aux données avec MongoDB Stitch
Faites évoluer votre accès aux données avec MongoDB StitchFaites évoluer votre accès aux données avec MongoDB Stitch
Faites évoluer votre accès aux données avec MongoDB Stitch
 
Be A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data PipelineBe A Hero: Transforming GoPro Analytics Data Pipeline
Be A Hero: Transforming GoPro Analytics Data Pipeline
 
SparkSQL and Dataframe
SparkSQL and DataframeSparkSQL and Dataframe
SparkSQL and Dataframe
 
Siddhi - cloud-native stream processor
Siddhi - cloud-native stream processorSiddhi - cloud-native stream processor
Siddhi - cloud-native stream processor
 
Spark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. JyotiskaSpark Dataframe - Mr. Jyotiska
Spark Dataframe - Mr. Jyotiska
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
 

Viewers also liked

New york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introductionNew york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introduction
Stefan Urbanek
 
Knowledge Management Lecture 3: Cycle
Knowledge Management Lecture 3: CycleKnowledge Management Lecture 3: Cycle
Knowledge Management Lecture 3: Cycle
Stefan Urbanek
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
Growth Intelligence
 
Knowledge Management Lecture 1: definition, history and presence
Knowledge Management Lecture 1: definition, history and presenceKnowledge Management Lecture 1: definition, history and presence
Knowledge Management Lecture 1: definition, history and presence
Stefan Urbanek
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentation
kreaume
 
Knowledge Management Lecture 4: Models
Knowledge Management Lecture 4: ModelsKnowledge Management Lecture 4: Models
Knowledge Management Lecture 4: Models
Stefan Urbanek
 
Knowledge management
Knowledge managementKnowledge management
Knowledge management
Sehar Abbas
 

Viewers also liked (7)

New york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introductionNew york data brewery meetup #1 – introduction
New york data brewery meetup #1 – introduction
 
Knowledge Management Lecture 3: Cycle
Knowledge Management Lecture 3: CycleKnowledge Management Lecture 3: Cycle
Knowledge Management Lecture 3: Cycle
 
A Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with LuigiA Beginner's Guide to Building Data Pipelines with Luigi
A Beginner's Guide to Building Data Pipelines with Luigi
 
Knowledge Management Lecture 1: definition, history and presence
Knowledge Management Lecture 1: definition, history and presenceKnowledge Management Lecture 1: definition, history and presence
Knowledge Management Lecture 1: definition, history and presence
 
Knowledge Management Presentation
Knowledge Management PresentationKnowledge Management Presentation
Knowledge Management Presentation
 
Knowledge Management Lecture 4: Models
Knowledge Management Lecture 4: ModelsKnowledge Management Lecture 4: Models
Knowledge Management Lecture 4: Models
 
Knowledge management
Knowledge managementKnowledge management
Knowledge management
 

Similar to Cubes 1.0 Overview

Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
Deepak Chandramouli
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
Ido Green
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
Ruhani Arora
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
Márton Kodok
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
Jim Dowling
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
Amazon Web Services
 
Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
yalisassoon
 
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, BarcelonaReal-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
Dobo Radichkov
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
GeeksLab Odessa
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
Amit Juneja
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
Mark Kromer
 
Snowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your businessSnowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your business
Giuseppe Gaviani
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Grega Kespret
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
Yu Ishikawa
 
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
Noriaki Tatsumi
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
Márton Kodok
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data Warehouse
FeatureByte
 
Bw training 1 intro dw
Bw training   1 intro dwBw training   1 intro dw
Bw training 1 intro dw
Joseph Tham
 
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
[WSO2Con Asia 2018] Patterns for Building Streaming Apps[WSO2Con Asia 2018] Patterns for Building Streaming Apps
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
WSO2
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 

Similar to Cubes 1.0 Overview (20)

Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
Azure Stream Analytics : Analyse Data in Motion
Azure Stream Analytics  : Analyse Data in MotionAzure Stream Analytics  : Analyse Data in Motion
Azure Stream Analytics : Analyse Data in Motion
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
 
Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
Snowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your businessSnowplow: evolve your analytics stack with your business
Snowplow: evolve your analytics stack with your business
 
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, BarcelonaReal-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
Real-time serverless analytics at Shedd – OLX data summit, Mar 2018, Barcelona
 
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
Java/Scala Lab: Борис Трофимов - Обжигающая Big Data.
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 
Snowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your businessSnowplow - Evolve your analytics stack with your business
Snowplow - Evolve your analytics stack with your business
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
 
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
GraphQL Summit 2019 - Configuration Driven Data as a Service Gateway with Gra...
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Simplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data WarehouseSimplify Feature Engineering in Your Data Warehouse
Simplify Feature Engineering in Your Data Warehouse
 
Bw training 1 intro dw
Bw training   1 intro dwBw training   1 intro dw
Bw training 1 intro dw
 
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
[WSO2Con Asia 2018] Patterns for Building Streaming Apps[WSO2Con Asia 2018] Patterns for Building Streaming Apps
[WSO2Con Asia 2018] Patterns for Building Streaming Apps
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
 

More from Stefan Urbanek

StepTalk Introduction
StepTalk IntroductionStepTalk Introduction
StepTalk Introduction
Stefan Urbanek
 
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Stefan Urbanek
 
Sepro - introduction
Sepro - introductionSepro - introduction
Sepro - introduction
Stefan Urbanek
 
Dallas Data Brewery Meetup #2: Data Quality Perception
Dallas Data Brewery Meetup #2: Data Quality PerceptionDallas Data Brewery Meetup #2: Data Quality Perception
Dallas Data Brewery Meetup #2: Data Quality Perception
Stefan Urbanek
 
Dallas Data Brewery - introduction
Dallas Data Brewery - introductionDallas Data Brewery - introduction
Dallas Data Brewery - introduction
Stefan Urbanek
 
Knowledge Management Lecture 2: Individuals, communities and organizations
Knowledge Management Lecture 2: Individuals, communities and organizationsKnowledge Management Lecture 2: Individuals, communities and organizations
Knowledge Management Lecture 2: Individuals, communities and organizations
Stefan Urbanek
 
Open spending as-is 2011-06
Open spending   as-is 2011-06Open spending   as-is 2011-06
Open spending as-is 2011-06
Stefan Urbanek
 
Cubes - Lightweight OLAP Framework
Cubes - Lightweight OLAP FrameworkCubes - Lightweight OLAP Framework
Cubes - Lightweight OLAP Framework
Stefan Urbanek
 
Open Data Decentralisation
Open Data DecentralisationOpen Data Decentralisation
Open Data Decentralisation
Stefan Urbanek
 
Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)
Stefan Urbanek
 
Knowledge Management Introduction
Knowledge Management IntroductionKnowledge Management Introduction
Knowledge Management Introduction
Stefan Urbanek
 

More from Stefan Urbanek (11)

StepTalk Introduction
StepTalk IntroductionStepTalk Introduction
StepTalk Introduction
 
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
 
Sepro - introduction
Sepro - introductionSepro - introduction
Sepro - introduction
 
Dallas Data Brewery Meetup #2: Data Quality Perception
Dallas Data Brewery Meetup #2: Data Quality PerceptionDallas Data Brewery Meetup #2: Data Quality Perception
Dallas Data Brewery Meetup #2: Data Quality Perception
 
Dallas Data Brewery - introduction
Dallas Data Brewery - introductionDallas Data Brewery - introduction
Dallas Data Brewery - introduction
 
Knowledge Management Lecture 2: Individuals, communities and organizations
Knowledge Management Lecture 2: Individuals, communities and organizationsKnowledge Management Lecture 2: Individuals, communities and organizations
Knowledge Management Lecture 2: Individuals, communities and organizations
 
Open spending as-is 2011-06
Open spending   as-is 2011-06Open spending   as-is 2011-06
Open spending as-is 2011-06
 
Cubes - Lightweight OLAP Framework
Cubes - Lightweight OLAP FrameworkCubes - Lightweight OLAP Framework
Cubes - Lightweight OLAP Framework
 
Open Data Decentralisation
Open Data DecentralisationOpen Data Decentralisation
Open Data Decentralisation
 
Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)Data Cleansing introduction (for BigClean Prague 2011)
Data Cleansing introduction (for BigClean Prague 2011)
 
Knowledge Management Introduction
Knowledge Management IntroductionKnowledge Management Introduction
Knowledge Management Introduction
 

Cubes 1.0 Overview

  • 1. Cubes 1.0 Overview light data warehouse and conceptual modelling Štefan Urbánek, @Stiivi stefan.urbanek@gmail.com November 2014
  • 3. model data reporting apps / modules metadata
  • 6. OLAP (online analytical processing) lightweight framework for conceptual modelling and analytics
  • 8. Workspace 1 × 1 × store process or server | model
  • 10. Models Stores file Σ database API Postgres Mongo API multiple model parts, different sources multiple data sources, heterogenous
  • 12. Python ≥ 3.4 works with ≥ 2.7 too for the “two” series
  • 13. ■ analytical workspace ■ model providers ■ new and improved backends ■ better extensibility ■ authorisation
  • 15. Analytical workspace: model providers (Static Model Provider, API Model Provider) supply the cubes sales, churn, activations and events; the stores are BI Data (Postgres), BI Data 2 (Mongo) and Events (API)
  • 16. Workspace diagram (model providers, cubes, stores): Static Model Provider; cubes sales, churn, activations, events; stores BI Data (Postgres) and BI Data 2 (Mongo); models crm, sales, events. Configuration:
      [workspace]
      models_path: /var/lib/cubes/models
      [models]
      crm: crm.cubesmodel
      sales: sales.cubesmodel
      events: events.cubesmodel
      [store crm]
      type: sql
      url: postgresql://localhost/crm
      [store events]
      type: mongo
      host: localhost
      collection: events
  • 17. BYOB: bring your own backend (Slicer)
  • 19. Backend components: Browser, Store, Provider
  • 20. Logical vs. physical: the Provider creates the model (cubes, dimensions), the Store connects to the physical data store (database or API), and the Browser aggregates (Σ) using the model and backend objects
  • 21. Model Provider: model, cubes, dimensions
  • 23. Model Provider ■ metadata on-the-fly ■ local or external source ■ might be linked to a store (provides: model, cubes, dimensions)
  • 24. Backends and their model objects:
      Backend       | SQL            | MongoDB       | Mixpanel  | Google Analytics | Slicer
      Model         | required       | required      | automatic | automatic        | automatic
      Cubes / Facts | table          | collection    | event     | metric           | cube
      Dimensions    | column (table) | key/attribute | property  | dimension        | dimension
  • 26. Model ■ measures → aggregates ■ more front-end metadata: cube categories, dimension role and cardinality ■ customised dimension linking
  • 28. Measures and aggregates:
      {
        "measures": [
          {"name": "amount", "label": "Sales Amount"},
          {"name": "vat", "label": "VAT"}
        ],
        "aggregates": [
          {"name": "total_sales", "label": "Total Sales Amount", "measure": "amount", "function": "sum"},
          {"name": "total_vat", "label": "Total VAT", "measure": "vat", "function": "sum"},
          {"name": "item_count", "label": "Item Count", "function": "count"}
        ]
      }
  • 29. Aggregates ■ custom name ■ can refer to other aggregates (post-aggregation calculations) ■ functions are backend-specific; SQL aggregations: sum, count, count_nonempty, count_distinct, min, max, avg, stddev, variance, …
  • 30. Contextual Dimensions:
      {
        "measures": [ … ],
        "dimensions": [
          {"name": "date", "hierarchies": ["ym", "yqm"]},
          {"name": "date", "alias": "contract_date"}
        ],
        …
      }
      Customisable linking properties: alias, hierarchies, exclude_hierarchies, default_hierarchy_name, cardinality, nonadditive
  • 31. Dimension Roles: dimension.role (e.g. time) and level.role (year, month, day, …) are hints for reporting applications or backends
  • 32. Cardinality: overload precautions; dimension.cardinality and level.cardinality take the values tiny < low < medium < high
  • 34. Browser ■ uses logical model ■ implements aggregation ■ builds queries ■ retrieves data (diagram: the Browser aggregates (Σ) over the logical model; the Store connects to the physical data store, a database or API)
  • 35. Browser Methods ■ features() ■ aggregate(cell, drilldown,…) ■ members(cell, dimension, …) ■ facts(cell, …) ■ fact(id) ■ cell_details(cell, drilldown, …)
  • 36. Split Cell: aggregate(split=cell) adds a generated dimension __within_split__ with the values True and False
  • 37. Post-aggregation “statutils” ■ computed on the aggregation result in Python ■ moving averages, deviation, variance: wma, sma, sms, smstd, smsrd, smsvar ■ aggregate property: window_size
  • 38. Store
  • 39. Store ■ provides database or API connection ■ might provide a model ■ slicer tool actions (future): validation, schema, optimization, ... (diagram: the Store connects to the physical data store, a database or API, on behalf of the Browser)
  • 40. SQL Backend, also known as ROLAP or the SQL query generator
  • 41. SQL Overview ■ new query builder ■ join optimisation ■ support for outer joins ■ support for “split” dimension ■ new aggregate functions
  • 42. ❄ fact table join optimisation
  • 43. Join methods (diagram): facts (master) joined to date (detail) with the method match, detail or master. Example:
      "joins": [
        {"master": "fact_contracts.contract_date_id", "detail": "dim_date.id", "method": "detail"}
      ]
  • 47. Access rights:
      {
        "lidia": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:3"]}},
        "martin": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:5"]}}
      }
  • 48. Authorizer configuration:
      [workspace]
      authorization: simple
      [authorization]
      rights_file: access_rights.json
  • 51. Model Queries ■ GET /cubes: overview of cubes from all providers ■ GET /cube/sales/model: detailed cube model with described dimensions
  • 52. Browser Queries ■ GET /cube/name/aggregate ■ GET /cube/name/members/dim ■ GET /cube/name/facts ■ GET /cube/name/fact ■ GET /cube/name/cell
  • 53. Aggregate: GET /cube/sales/aggregate?cut=date:2010&drilldown=date|region&split=status:1&page=10&page_size=100
  • 54. Aggregate reply:
      {
        "cell": [],
        "total_cell_count": 2,
        "drilldown": [
          {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
          {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
        ],
        "summary": {"record_count": 62, "amount_sum": 1116860}
      }
  • 55. Special Characters: “category:10-24” → “10-24”, “city:Nové Mesto nad Váhom” → “Nové Mesto nad Váhom”
  • 56. Relative Time: uses dimension roles and the Calendar; examples: date:yesterday, date:90daysago-today, expiration_date:lastmonth-next2months
  • 57. Output Format: format=csv, format=json, format=json_lines* (*for facts and members)
  • 58. Deployment: reporting for your app or stand-alone
  • 59. Deployment options (diagrams): ■ public HTML & JS application → HTTP request → Slicer server (model, store) → JSON reply ■ public HTML & JS application → HTTP request → Slicer Flask app via WSGI (model, store) → JSON reply ■ public HTML from Django, Flask, … using the Cubes Python API (model, store) ■ public HTML from a Flask web application with the Slicer Blueprint (model, store) ■ public HTML from a web application (PHP, RoR, Django) that sends HTTP requests to an internal Slicer server (model, store) and receives JSON replies
  • 67. Cubes Viewer by Jose Juan Montes (jjmontesl/cubesviewer)
  • 68. checkgermany.de*: front-end by Felix Ebert (@femeb), data by Friedrich Lindenberg (@pudo). *Cubes 0.10.2
  • 70. Summary ■ heterogeneous pluggable environment ■ easier to extend ■ better SQL query generator
  • 71. Not Mentioned ■ localisation ■ namespaces ■ calendar ■ query logging
  • 72. Incubated ■ non-additive properties ■ periods-to-date ■ modeler app ■ cubes.js
  • 73. Future ■ arithmetic expressions ■ SQL improvements ■ improved API for custom browsers ■ cubes.js
  • 74. Nutrition Facts: serving size 1 cube; total fat 0g (saturated fat 0g, trans fat 0g); total carbohydrate 0g (dietary fiber 0g, sugars 0g); 0% daily value
  • 75. Want to contribute? #TODO, #FIXME, Issue # https://github.com/DataBrewery/cubes/issues
  • 77. Thanks for 1.0: Robin Thomas, Ryan Berlew, Jose Juan Montes, Squarespace and all contributors on GitHub
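Slide 16's workspace configuration maps each `[models]` entry to a model file and each `[store …]` section to a data store. A minimal sketch of that mapping using Python's standard `configparser`; it only parses the file to show its shape and is not how Cubes itself loads a workspace. The helper `describe_workspace` is hypothetical.

```python
import configparser

# The workspace configuration from slide 16, inlined so the sketch is self-contained.
CONFIG = """
[workspace]
models_path: /var/lib/cubes/models

[models]
crm: crm.cubesmodel
sales: sales.cubesmodel
events: events.cubesmodel

[store crm]
type: sql
url: postgresql://localhost/crm

[store events]
type: mongo
host: localhost
collection: events
"""


def describe_workspace(text):
    """Hypothetical helper (not part of Cubes): return (models, stores)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    base = parser.get("workspace", "models_path", fallback="")
    # Every key in [models] names a model file, resolved against models_path.
    models = {name: f"{base}/{path}" for name, path in parser.items("models")}
    # Sections named "store <name>" each describe one data store.
    stores = {
        section.split(" ", 1)[1]: dict(parser.items(section))
        for section in parser.sections()
        if section.startswith("store ")
    }
    return models, stores


if __name__ == "__main__":
    models, stores = describe_workspace(CONFIG)
    for name, path in models.items():
        print("model", name, "->", path)
    for name, options in stores.items():
        print("store", name, "->", options["type"])
```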
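Slide 28 separates measures (raw fact values) from aggregates (named computations over them). A small in-memory sketch that evaluates aggregate definitions shaped like that fragment over a few fact rows; the rows and the helper `evaluate_aggregates` are illustrative assumptions, since in Cubes the backend (for example, SQL) does this work.

```python
# Aggregate definitions shaped like the model fragment on slide 28.
AGGREGATES = [
    {"name": "total_sales", "measure": "amount", "function": "sum"},
    {"name": "total_vat", "measure": "vat", "function": "sum"},
    {"name": "item_count", "function": "count"},
]

# Hypothetical fact rows; each row carries both measures.
FACTS = [
    {"amount": 100, "vat": 20},
    {"amount": 250, "vat": 50},
    {"amount": 50, "vat": 10},
]


def evaluate_aggregates(aggregates, facts):
    """Hypothetical helper: compute each aggregate (sum or count only)."""
    result = {}
    for agg in aggregates:
        if agg["function"] == "count":
            # count needs no measure: it counts fact rows
            result[agg["name"]] = len(facts)
        elif agg["function"] == "sum":
            result[agg["name"]] = sum(row[agg["measure"]] for row in facts)
        else:
            raise ValueError(f"unsupported function: {agg['function']}")
    return result


print(evaluate_aggregates(AGGREGATES, FACTS))
# {'total_sales': 400, 'total_vat': 80, 'item_count': 3}
```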
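The counting functions listed on slide 29 are easy to confuse. The sketch below assumes the usual SQL semantics: `count` counts rows, `count_nonempty` counts non-NULL values (like `COUNT(column)`), and `count_distinct` counts distinct non-NULL values (like `COUNT(DISTINCT column)`). Python's `None` stands in for SQL NULL; the function table is illustrative, not the Cubes backend code.

```python
import statistics


def _values(column):
    """Drop None, which plays the role of SQL NULL here."""
    return [v for v in column if v is not None]


# Illustrative Python equivalents of some SQL aggregates (non-empty input assumed).
FUNCTIONS = {
    "count": lambda col: len(col),                         # COUNT(*): all rows, NULLs included
    "count_nonempty": lambda col: len(_values(col)),       # COUNT(col)
    "count_distinct": lambda col: len(set(_values(col))),  # COUNT(DISTINCT col)
    "sum": lambda col: sum(_values(col)),
    "min": lambda col: min(_values(col)),
    "max": lambda col: max(_values(col)),
    "avg": lambda col: statistics.mean(_values(col)),
}

customer_ids = [7, 7, None, 3, 5, None]

for name in ("count", "count_nonempty", "count_distinct"):
    print(name, FUNCTIONS[name](customer_ids))
# count 6
# count_nonempty 4
# count_distinct 3
```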
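Because the cardinality values on slide 32 are ordered, a front end can compare them to decide when to be careful. A hypothetical sketch: the `Cardinality` enum mirrors the ordering tiny < low < medium < high, while the rule "require a cut before drilling down into a high-cardinality level" and the function `safe_to_drill_down` are assumptions for illustration, not Cubes behaviour.

```python
from enum import IntEnum


class Cardinality(IntEnum):
    """Ordered cardinality hints: tiny < low < medium < high."""
    TINY = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4


def safe_to_drill_down(level_cardinality, has_cut, threshold=Cardinality.HIGH):
    """Hypothetical precaution: allow drilling into levels at or above
    `threshold` only when the query is already restricted by a cut."""
    card = Cardinality[level_cardinality.upper()]
    return card < threshold or has_cut


print(safe_to_drill_down("low", has_cut=False))   # True
print(safe_to_drill_down("high", has_cut=False))  # False
print(safe_to_drill_down("high", has_cut=True))   # True
```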
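Slide 35 lists the browser methods; the central one is `aggregate(cell, drilldown, …)`. The toy class below is a hypothetical in-memory stand-in, not the Cubes browser: a cell is modelled as a plain dict of point cuts, and the result holds a summary plus drilldown rows, loosely following the JSON reply on slide 54.

```python
from collections import defaultdict


class ToyBrowser:
    """Hypothetical in-memory browser illustrating aggregate(cell, drilldown)."""

    def __init__(self, facts, measure):
        self.facts = facts
        self.measure = measure

    def _in_cell(self, row, cell):
        # A cell is a dict of point cuts, e.g. {"year": 2010}.
        return all(row[dim] == value for dim, value in cell.items())

    def aggregate(self, cell=None, drilldown=None):
        rows = [r for r in self.facts if self._in_cell(r, cell or {})]
        key = f"{self.measure}_sum"
        summary = {"record_count": len(rows), key: sum(r[self.measure] for r in rows)}
        result = {"summary": summary, "drilldown": []}
        if drilldown:
            # Group the rows of the cell by members of the drilldown dimension.
            groups = defaultdict(list)
            for r in rows:
                groups[r[drilldown]].append(r)
            for member in sorted(groups):
                members = groups[member]
                result["drilldown"].append({
                    drilldown: member,
                    "record_count": len(members),
                    key: sum(r[self.measure] for r in members),
                })
        return result


facts = [
    {"year": 2009, "region": "east", "amount": 10},
    {"year": 2010, "region": "east", "amount": 20},
    {"year": 2010, "region": "west", "amount": 5},
]
browser = ToyBrowser(facts, "amount")
print(browser.aggregate(cell={"year": 2010}, drilldown="region"))
```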
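Slide 36 describes `split` as a generated boolean dimension recording whether each fact lies inside the split cell. A minimal stand-alone sketch of that idea, with cells again modelled as hypothetical dicts of point cuts; the helper `split_drilldown` is illustrative, not the Cubes implementation.

```python
def split_drilldown(facts, split_cell, measure):
    """Hypothetical helper: group facts by a generated __within_split__ flag."""
    groups = {True: [], False: []}
    for row in facts:
        # A fact is inside the split cell if it matches every point cut.
        inside = all(row[dim] == value for dim, value in split_cell.items())
        groups[inside].append(row)
    return [
        {
            "__within_split__": flag,
            "record_count": len(rows),
            f"{measure}_sum": sum(row[measure] for row in rows),
        }
        for flag, rows in groups.items()
    ]


facts = [
    {"status": 1, "amount": 10},
    {"status": 0, "amount": 20},
    {"status": 1, "amount": 5},
]
for row in split_drilldown(facts, {"status": 1}, "amount"):
    print(row)
```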
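The moving-window functions on slide 37 run on an already aggregated series, for example one value per month from a drilldown. A sketch of two of them, assuming the textbook definitions: `sma` is the mean of the last `window_size` values, and `wma` weights the newest value by `window_size`, the previous one by `window_size - 1`, and so on. How Cubes treats the first, incomplete windows is not stated on the slide; this sketch simply uses the values available so far.

```python
def sma(values, window_size):
    """Simple moving average over the last `window_size` values
    (partial windows at the start of the series)."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        result.append(sum(window) / len(window))
    return result


def wma(values, window_size):
    """Weighted moving average: weights 1..n, newest value weighted most."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - window_size + 1): i + 1]
        weights = range(1, len(window) + 1)
        total = sum(w * v for w, v in zip(weights, window))
        result.append(total / sum(weights))
    return result


monthly_sales = [10, 20, 30, 40]
print(sma(monthly_sales, 3))  # [10.0, 15.0, 20.0, 30.0]
print(wma(monthly_sales, 3))
```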
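The three join methods on slide 43 differ in which rows survive the join. A sketch using Python's built-in `sqlite3`, assuming `match` corresponds to an inner join, `master` keeps every master (fact) row like a `LEFT OUTER JOIN`, and `detail` keeps every detail (date) row like a `RIGHT OUTER JOIN`, as in the slide's join from `fact_contracts` to `dim_date`. The `detail` query is written as a `LEFT JOIN` with the tables swapped because older SQLite versions lack `RIGHT JOIN`; the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE fact_contracts (id INTEGER PRIMARY KEY, contract_date_id INTEGER, amount INTEGER);
    INSERT INTO dim_date VALUES (1, '2014-11-01'), (2, '2014-11-02'), (3, '2014-11-03');
    -- contract 3 references a date that is missing from dim_date
    INSERT INTO fact_contracts VALUES (1, 1, 100), (2, 1, 50), (3, 9, 70);
""")

QUERIES = {
    # match (assumed inner join): only facts with a matching date
    "match": """SELECT f.id, d.day FROM fact_contracts f
                JOIN dim_date d ON f.contract_date_id = d.id""",
    # master (assumed LEFT OUTER JOIN): every fact row, day may be NULL
    "master": """SELECT f.id, d.day FROM fact_contracts f
                 LEFT JOIN dim_date d ON f.contract_date_id = d.id""",
    # detail (assumed RIGHT OUTER JOIN, written as a swapped LEFT JOIN):
    # every date row, fact id may be NULL
    "detail": """SELECT f.id, d.day FROM dim_date d
                 LEFT JOIN fact_contracts f ON f.contract_date_id = d.id""",
}

for method, sql in QUERIES.items():
    rows = conn.execute(sql).fetchall()
    print(method, len(rows), rows)
```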
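Slides 47 and 48 pair a simple authorizer with a JSON rights file that lists, per user, the allowed cubes and the cuts forced onto their queries. A rough sketch of how such a file could be read and checked; the `authorize` function and its return value are hypothetical and only illustrate the data, they are not the Cubes authorizer.

```python
import json

# The rights from slide 47, inlined so the sketch is self-contained.
RIGHTS = json.loads("""
{
  "lidia":  {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:3"]}},
  "martin": {"allowed_cubes": ["sales"], "cube_restrictions": {"sales": ["store:5"]}}
}
""")


def authorize(rights, user, cube):
    """Hypothetical check: return the cuts forced on `user` for `cube`,
    or raise PermissionError if the cube is not allowed."""
    entry = rights.get(user)
    if entry is None or cube not in entry.get("allowed_cubes", []):
        raise PermissionError(f"{user} may not access cube {cube}")
    return entry.get("cube_restrictions", {}).get(cube, [])


print(authorize(RIGHTS, "lidia", "sales"))  # ['store:3']
```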
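The aggregate request on slide 53 can be assembled programmatically. A sketch with `urllib.parse.urlencode`; the server address `http://localhost:5000` and the helper `aggregate_url` are assumptions for illustration, and `:` and `|` are left unescaped only for readability, since their percent-encoded forms decode to the same query.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"  # assumed Slicer server address (hypothetical)


def aggregate_url(cube, **params):
    """Hypothetical helper: build an aggregate query URL for `cube`."""
    query = urlencode(params, safe=":|")
    return f"{BASE_URL}/cube/{cube}/aggregate?{query}"


url = aggregate_url(
    "sales",
    cut="date:2010",
    drilldown="date|region",
    split="status:1",
    page=10,
    page_size=100,
)
print(url)
# http://localhost:5000/cube/sales/aggregate?cut=date:2010&drilldown=date|region&split=status:1&page=10&page_size=100
```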
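A client of slide 54's reply typically reads its `summary` and `drilldown` parts. This sketch parses that reply with the standard `json` module and checks that the drilldown rows add up to the summary, which is expected for additive aggregates such as counts and sums; it is illustrative client code, not part of Cubes.

```python
import json

# The aggregate reply from slide 54.
reply = json.loads("""
{
  "cell": [],
  "total_cell_count": 2,
  "drilldown": [
    {"record_count": 31, "amount_sum": 550840, "date.year": 2009},
    {"record_count": 31, "amount_sum": 566020, "date.year": 2010}
  ],
  "summary": {"record_count": 62, "amount_sum": 1116860}
}
""")

# Index the drilldown rows by their year member.
by_year = {row["date.year"]: row["amount_sum"] for row in reply["drilldown"]}
print(by_year)  # {2009: 550840, 2010: 566020}

# Additive aggregates of the drilldown rows should add up to the summary.
for key in ("record_count", "amount_sum"):
    total = sum(row[key] for row in reply["drilldown"])
    assert total == reply["summary"][key], key
```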
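Slide 56's relative time expressions are resolved against the current date. The sketch below handles only the day-based forms shown there (`today`, `yesterday`, `<n>daysago`) and ranges joined by `-`; month-based forms such as `lastmonth` and `next2months` are left out because their exact boundaries depend on the Calendar settings. The parser and its function names are hypothetical, not the Cubes Calendar API.

```python
import re
from datetime import date, timedelta


def resolve_point(token, today):
    """Hypothetical: resolve one relative-time token to a date (day-based forms only)."""
    if token == "today":
        return today
    if token == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"(\d+)daysago", token)
    if match:
        return today - timedelta(days=int(match.group(1)))
    raise ValueError(f"unsupported relative time: {token}")


def resolve_range(expression, today=None):
    """Hypothetical: resolve 'a-b' to (start, end), or a single token to (d, d)."""
    today = today or date.today()
    parts = expression.split("-")
    if len(parts) == 1:
        point = resolve_point(parts[0], today)
        return point, point
    start, end = parts
    return resolve_point(start, today), resolve_point(end, today)


ref = date(2014, 11, 20)
print(resolve_range("yesterday", ref))
print(resolve_range("90daysago-today", ref))
```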
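Slide 57's line-oriented formats let a client process facts and members record by record. A sketch of what the same records look like as CSV and as JSON Lines (one JSON object per line), using only the standard library; the field names and records are made up, and the Slicer server's actual output may differ, for example in header naming.

```python
import csv
import io
import json

# Hypothetical fact records.
facts = [
    {"id": 1, "date.year": 2009, "amount": 100},
    {"id": 2, "date.year": 2010, "amount": 250},
]


def to_csv(records):
    """Header row from the first record's keys, then one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json_lines(records):
    """One self-contained JSON document per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


print(to_csv(facts))
print(to_json_lines(facts))
```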