DOLAP@EDBT/ICDT 2023
The Whys and Wherefores of Cubes
Matteo Francia (1), Stefano Rizzi (1), Patrick Marcel (2)
(1) DISI, University of Bologna, Italy; (2) LIFAT, University of Tours, France
DOLAP 2023: 25th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
Intentional Analytics Model
Context: Intentional Analytics Model (IAM) [1]
- Facilitate OLAP analysis of multidimensional cubes
- Escape from query answers as plain tables
Express high-level intentions, not queries
- Describe, Assess, Explain, etc.
Get cubes enhanced with insights
- Apply (mining/ML) models to data
- Return interesting insights
Explain: finding interesting relationships in cube facts
- Data exploration: automatically extract meaningful relationships from facts
- Validation of user beliefs: check whether known relationships hold
- E.g., in agriculture, the quantity of potassium is correlated with the quality of kiwifruit. Do the facts confirm this belief?
[1] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's:
An intentional analytics model to reinvent OLAP. Inf. Syst. 85: 68-91 (2019)
Classical OLAP
Case study:
- Given the cube of Sales
- Explain monthly revenue against cost and quantity
If we had to do this in plain OLAP
- Query the cube, get a plain table
- Manually identify interesting patterns
But…
- What if we have thousands of cells?
- What if we have many measures?
- Can we have an effective representation?
select month, sum(quantity), sum(cost), sum(revenue)
from sales_ft join date_dt on (…)
group by month
[Figure: schema of the SALES cube. Hierarchies: product, type, category; customer, gender; store, city, country; date, month, year. Measures: quantity, revenue, cost.]
month cost quantity revenue
125 10 12 125
132 20 14 150
12 30 10 60
15 40 5 15
50 50 9 50
Intentional OLAP: Explain
`Explain` intention:
with cube explain m [ for P ] by l1,…,ln [ against m1, ..., mr ]
“Explained” measure: m
Selection predicate: P (consider all facts if omitted)
Group-by set: l1,…,ln (at least one level)
Measures: m1, ..., mr (compute against all measures if omitted)
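To make the syntax above concrete, here is a minimal sketch of how such an intention string could be split into its parts. The class `ExplainIntention`, the function `parse_explain`, and the regular expression are illustrative assumptions, not the parser used by the authors; the predicate is kept as an uninterpreted string.

```python
import re
from typing import List, NamedTuple, Optional


class ExplainIntention(NamedTuple):
    """Hypothetical container for the parts of an explain intention."""
    cube: str
    measure: str              # the "explained" measure m
    predicate: Optional[str]  # None means "consider all facts"
    group_by: List[str]       # at least one level
    against: List[str]        # empty means "all other measures"


# with cube explain m [ for P ] by l1,...,ln [ against m1,...,mr ]
_PATTERN = re.compile(
    r"^\s*with\s+(?P<cube>\w+)\s+explain\s+(?P<m>\w+)"
    r"(?:\s+for\s+(?P<pred>.+?))?"
    r"\s+by\s+(?P<levels>\w+(?:\s*,\s*\w+)*)"
    r"(?:\s+against\s+(?P<against>\w+(?:\s*,\s*\w+)*))?\s*$",
    re.IGNORECASE,
)


def _split(names: Optional[str]) -> List[str]:
    # Turn "a, b,c" into ["a", "b", "c"]; a missing clause gives [].
    return [n.strip() for n in names.split(",")] if names else []


def parse_explain(text: str) -> ExplainIntention:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid explain intention: {text!r}")
    return ExplainIntention(
        cube=match["cube"],
        measure=match["m"],
        predicate=match["pred"],
        group_by=_split(match["levels"]),
        against=_split(match["against"]),
    )


intention = parse_explain("with sales explain revenue by month")
print(intention)
```

A predicate that itself contains the word "by" would confuse this simple pattern; a real grammar would be needed for that case.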
Semantics translates into an execution plan
i. Execute the query for the given cube, measures, predicate, and group-by set
ii. Apply models that explain relationships through components
iii. Rank components by interestingness
iv. Return an effective visualization
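Step i of the plan amounts to a group-by aggregation. The sketch below shows one possible way to do it in plain Python over facts held as dictionaries; the function `aggregate` and the sample facts are made up for illustration, and a real system would more likely push this step to the database.

```python
from collections import defaultdict


def aggregate(facts, group_by, measures):
    """Sum each measure over the facts that share the same group-by coordinates."""
    totals = defaultdict(lambda: dict.fromkeys(measures, 0.0))
    for fact in facts:
        key = tuple(fact[level] for level in group_by)
        for m in measures:
            totals[key][m] += fact[m]
    return dict(totals)


# Illustrative facts only; these values do not come from the slides.
facts = [
    {"month": "2023-01", "quantity": 2, "revenue": 10.0},
    {"month": "2023-01", "quantity": 3, "revenue": 15.0},
    {"month": "2023-02", "quantity": 1, "revenue": 4.0},
]
cells = aggregate(facts, group_by=["month"], measures=["quantity", "revenue"])
print(cells)
```

Each resulting cell is one point for the models applied in step ii.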
with sales explain revenue by month
[Figure: analytic dashboard for this intention, showing revenue against quantity (R² = 0.9901) next to the query result table]
Model
Models are “types” of relationships hiding in the cube facts
- Are made of components, each being a specific relationship…
- … computed on levels/members/measures
As a proof of concept, we restrict ourselves to
- A single model: polynomial regression
- Each component is a polynomial relationship between a pair of measures (univariate regression)
- The dependent variable (e.g., revenue) is modeled as a polynomial of degree d in the independent variable (e.g., quantity)
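A single component of this kind can be sketched with NumPy's least-squares polynomial fit. The helper `fit_component` and the data are assumptions for illustration; the slides only state that the implementation relies on numpy and scikit-learn, not which calls it uses.

```python
import numpy as np


def fit_component(x, y, degree):
    """Fit y as a polynomial of the given degree in x by least squares."""
    coeffs = np.polyfit(x, y, deg=degree)  # coefficients, highest power first
    return np.poly1d(coeffs)               # callable polynomial


# Made-up measures: revenue depends linearly on quantity.
quantity = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
revenue = 3.0 * quantity + 2.0

alpha = fit_component(quantity, revenue, degree=1)
print(alpha.coeffs)  # slope and intercept of the fitted line
```

The returned `np.poly1d` object can be evaluated on new values of the independent variable, for example `alpha(6.0)`.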
[Figure: model “polynomial regression” for `with sales explain revenue by month`, with two components: (revenue, quantity) with R² = 0.9901 and (revenue, cost) with R² = 0.6524]
Components
Each component is a polynomial relationship $\alpha_d(\cdot)$ between a pair of measures
- How to choose the “best” polynomial and avoid overfitting?
- E.g., consider $revenue = \alpha_d(cost)$
We need an error function that weights the degree $d$:
$$\frac{\sum_{fact \in facts} \left(\alpha_d(fact.m_i) - fact.m\right)^2}{|facts| - d - 1}$$
where $m$ is the explained measure and $m_i$ is the measure it is explained against
- $\alpha_d(\cdot)$ is the polynomial of degree $d$ fitted with the ordinary least squares (OLS) method
- The error is computed against a test set containing 30% of the facts
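The error function can be sketched as the sum of squared residuals on the test facts divided by the number of test facts minus d minus 1. Two points are assumptions here: that the fact count in the denominator refers to the test set, and that the 70/30 split is random. The names `degree_weighted_error` and `split_facts` are hypothetical.

```python
import numpy as np


def split_facts(x, y, test_fraction=0.3, seed=0):
    """Randomly hold out a fraction of the facts as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = int(round(test_fraction * len(x)))
    test, train = order[:n_test], order[n_test:]
    return x[train], y[train], x[test], y[test]


def degree_weighted_error(poly, x_test, y_test, degree):
    """Squared error on the test facts, penalized by the polynomial degree."""
    n = len(x_test)
    if n - degree - 1 <= 0:
        raise ValueError("need more than degree + 1 test facts")
    residuals = poly(x_test) - y_test
    return float(np.sum(residuals ** 2) / (n - degree - 1))


# Made-up data with an exact linear relationship.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
x_train, y_train, x_test, y_test = split_facts(x, y)
line = np.poly1d(np.polyfit(x_train, y_train, 1))
print(degree_weighted_error(line, x_test, y_test, degree=1))
```

Because the denominator shrinks as d grows, two polynomials with the same squared error are ordered in favor of the lower degree.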
[Figure: two fits of the same facts. Too simple: high error, low polynomial degree. Too complex: lower error, higher degree.]
Computing components
Start with d=0 and fit the polynomial
Iterate:
- Increase the degree…
- … until we find a minimum of the error
To ensure training on “sufficient” facts
- Apply the one-in-ten rule of thumb (about ten facts per estimated coefficient)
[Figure: polynomials fitted for d=1, d=2, and d=3; the iteration stops at d=2]
This could be a local minimum, but we prefer to return a simpler model
• $y = \alpha_2(x) = a + bx + cx^2$
• $y' = \alpha_4(x) = a + bx + \dots + ex^4$
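The iteration described above can be sketched as a loop that raises the degree until the test error stops decreasing, then returns the last, simpler polynomial. Several details are assumptions rather than the authors' implementation: the function name `select_degree`, the small tolerance `tol` that guards against floating-point noise, and the reading of the rule of thumb as a cap of roughly one coefficient per ten training facts.

```python
import numpy as np


def select_degree(x_train, y_train, x_test, y_test, max_degree=None, tol=1e-12):
    """Return (degree, error, polynomial) at the first minimum of the error."""
    if max_degree is None:
        # Assumed reading of the rule of thumb: ten training facts per coefficient.
        max_degree = len(x_train) // 10 - 1
    # Keep the error denominator (test facts - d - 1) positive.
    max_degree = min(max_degree, len(x_test) - 2)
    if max_degree < 0:
        raise ValueError("too few facts to fit even a constant")

    best = None
    for d in range(max_degree + 1):  # start with d = 0
        poly = np.poly1d(np.polyfit(x_train, y_train, d))
        residuals = poly(x_test) - y_test
        err = float(np.sum(residuals ** 2) / (len(x_test) - d - 1))
        if best is not None and err >= best[1] - tol:
            break  # the error stopped decreasing: keep the simpler model
        best = (d, err, poly)
    return best


# Made-up facts that follow an exact quadratic relationship.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 100)
y = 1.0 + 2.0 * x + 0.5 * x ** 2
order = rng.permutation(len(x))
test, train = order[:30], order[30:]

degree, error, poly = select_degree(x[train], y[train], x[test], y[test])
print(degree, error)
```

Stopping at the first minimum means a better fit at a higher degree can be missed, which is the trade-off the slide accepts in exchange for a simpler model.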
Interestingness
GOAL: given components, return the most interesting one
Interestingness: how much of the variation in the dependent variable is predictable from the independent variable
- This is encoded by the coefficient of determination R²
- The better the model, the closer the value of R² is to 1
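Ranking by interestingness can be sketched by computing R² for each component and sorting in descending order. The helpers `r_squared` and `rank_components` and the sample cells are illustrative assumptions. For brevity the sketch fixes the degree to 1 and scores each polynomial on the facts it was fitted on, whereas in the approach the degree is chosen per component.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rank_components(cells, explained, against, degree=1):
    """Fit one polynomial per measure and sort components by R², best first."""
    y = np.asarray(cells[explained], dtype=float)
    ranked = []
    for measure in against:
        x = np.asarray(cells[measure], dtype=float)
        poly = np.poly1d(np.polyfit(x, y, degree))
        ranked.append((measure, r_squared(y, poly(x))))
    return sorted(ranked, key=lambda component: component[1], reverse=True)


# Made-up cells: revenue is exactly 10 * quantity and only loosely tied to cost.
cells = {
    "revenue": [10, 20, 30, 40, 50],
    "quantity": [1, 2, 3, 4, 5],
    "cost": [3, 1, 4, 1, 5],
}
ranking = rank_components(cells, explained="revenue", against=["quantity", "cost"])
print(ranking)
```

The first entry of `ranking` is the component that would be returned as the most interesting one. Note that `r_squared` is undefined when the explained measure is constant, since the total sum of squares is then zero.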
[Figure: for `with sales explain revenue by month`, the components (revenue, quantity) with R² = 0.9901 and (revenue, cost) with R² = 0.6524; the former is returned together with the query result]
Visualization
Matteo Francia, Matteo Golfarelli, Stefano Rizzi. Describing and Assessing Cubes Through Intentional Analytics. EDBT 2023 (demo)
Notebook-like interface
Evaluation
(a) Computing the results on ~90K facts (Foodmart dataset) takes 0.5 seconds
(b) Computing on 10^6 facts (synthetic dataset) scales linearly w.r.t. the number of measures in the cube
Implemented in Python with the NumPy and scikit-learn libraries
- The tests were run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM
https://github.com/big-unibo/explain
Discussion
Overall, this paper is not about:
- (Polynomial) Regression optimization
- “Yet Another” explainability approach
We propose a modular framework where approaches to aggregate data explanation can be plugged
- Regression: return relationships between a dependent variable and one or more independent variables [4]
- Data lineage: which database tuple(s) caused a given query output? [1]
- Intervention: an input is a cause to an output if a change affects the output [2, 3]
The added value is in the IAM paradigm and augmented analytics
- Data scientists can express high-level intentions…
- … and the system (automatically) selects the most interesting explanations
- … coupled with data and visualization
[1] Alexandra Meliou et al. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. VLDB
[2] Sudeepa Roy et al. 2014. A formal approach to finding explanations for database queries. SIGMOD
[3] Zhengjie Miao et al. 2019. LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers. VLDB
[4] Fotis Savva et al. 2018. Explaining Aggregates for Exploratory Analytics. BigData.
https://xkcd.com/605/
Conclusion & research directions
We have given a proof-of-concept for explain intentions
- Syntax is flexible enough to suit users who wish to verify a specific hypothesis they made
- Intention processing takes a few seconds even on very large query results
- Performance is in line with the interactivity requirements of OLAP sessions
Future research directions
- Explain relationships between a measure and two or more other measures (e.g., multivariate regression)
- Evaluate the effectiveness of the approach through experiments with real users
- Generalize the definition of model to cope with additional model types from the literature
- Experiment with other interestingness metrics
- Conciseness: large explanations will probably be hard to understand
- Interpretability: the suitability of an explanation will depend on the target users
- Actionability: explanations should point to actionable suggestions
Questions?
Thank you.

More Related Content

Similar to [DOLAP2023] The Whys and Wherefores of Cubes

Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
Sanjeev Deshmukh
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018
Sam Clifford
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
Lorenzo Cesaretti
 
E05312426
E05312426E05312426
E05312426
IOSR-JEN
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
NikunjaParida1
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Osman Ali
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
Paolo Missier
 
Customer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclustCustomer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclust
Jim Porzak
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparison
Alain Chabrier
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET Journal
 
Introduction to operations research
Introduction to operations researchIntroduction to operations research
Introduction to operations research
Dr. Abdulfatah Salem
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
Christopher Wilson
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
David Raj Kanthi
 
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docxMISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
altheaboyer
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET Journal
 
vtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdfvtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdf
LPSChandana
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
Jill Lyons
 
Practical Challenges ML Workflows
Practical Challenges ML WorkflowsPractical Challenges ML Workflows
Practical Challenges ML Workflows
Jenny Midwinter
 

Similar to [DOLAP2023] The Whys and Wherefores of Cubes (20)

Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
E05312426
E05312426E05312426
E05312426
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Customer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclustCustomer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclust
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparison
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
 
Introduction to operations research
Introduction to operations researchIntroduction to operations research
Introduction to operations research
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docxMISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
 
vtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdfvtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdf
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
 
Practical Challenges ML Workflows
Practical Challenges ML WorkflowsPractical Challenges ML Workflows
Practical Challenges ML Workflows
 

More from University of Bologna

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
University of Bologna
 
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
University of Bologna
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
University of Bologna
 
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
University of Bologna
 
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
University of Bologna
 
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
University of Bologna
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP
University of Bologna
 
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
University of Bologna
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
University of Bologna
 

More from University of Bologna (9)

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
 
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
 
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
 
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
 
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP
 
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
 

Recently uploaded

一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
xclpvhuk
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 

Recently uploaded (20)

一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
一比一原版(Unimelb毕业证书)墨尔本大学毕业证如何办理
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 

[DOLAP2023] The Whys and Wherefores of Cubes

  • 1. DOLAP@EDBT/ICDT 2023 The Whys and Wherefores of Cubes Matteo Francia1, Stefano Rizzi1, Patrick Marcel2 1DISI, University of Bologna, Italy 2LIFAT, University of Tours, France DOLAP 2023: 25th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
  • 2. DOLAP@EDBT/ICDT 2023 Intentional Analytics Model Context: Intentional Analytics Model (IAM) [1] - Facilitate OLAP analysis of multidimensional cubes - Escape from query answers as plain tables Express high-level intentions, not queries - Describe, Assess, Explain, etc. Get cubes enhanced with insights - Apply (mining/ML) models to data - Return interesting insights Explain: finding interesting relationships in cube facts - Data exploration: automatically extracts meaningful relationships from facts - Validating user’s belief: check if known relationships hold - In agriculture, the quantity of potassium is correlated with the quality of Kiwifruits. Do facts confirm this belief? Matteo Francia – University of Bologna 2 [1] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's: An intentional analytics model to reinvent OLAP. Inf. Syst. 85: 68-91 (2019)
  • 3. DOLAP@EDBT/ICDT 2023 Classical OLAP Case study: - Given the cube of Sales - Explain monthly revenue against cost and quantity If we had to do this in plain OLAP - Query the cube, get a plain table - Manually identify interesting patterns But… - What if we have thousands of cells? - What if we have many measures? - Can we have an effective representation? Matteo Francia – University of Bologna 3 select month, sum(quantity), sum(cost), sum(revenue) from sales_ft join date_dt on (…) group by month product type category customer gender store city country date month year quantity revenue cost SALES month cost quantity revenue 125 10 12 125 132 20 14 150 12 30 10 60 15 40 5 15 50 50 9 50
  • 4. DOLAP@EDBT/ICDT 2023 Intentional OLAP: Explain `Explain` intention: with cube explain m [ for P ] by l1,…,ln [ against m1, ..., mr ] “Explained” measure: m Selection predicate: P (consider all facts if omitted) Group-by set: l1,…,ln (at least one level) Measures: m1, ..., mr (compute against all measures if omitted) Semantics translates into an execution plan i. Execute query for given cube, measures, predicate, group-by set ii. Apply models explaining relationships through components iii. Rank components by interestingness iv. Return effective visualization Matteo Francia – University of Bologna 4 with sales explain revenue by month Analytic dashboard R² = 0.9901 revenue quantity month cost quantity revenue 125 10 12 125 132 20 14 150 12 30 10 60 15 40 5 15 50 50 9 50
  • 5. DOLAP@EDBT/ICDT 2023 Model Models are “types” of relationships hiding in the cube facts - Are made of components, each being a specific relationship… - … computed on levels/members/measures To give a proof-of-concept, we restrict to consider - A single model: polynomial regression - Each component is a polynomial relationship between a pair of measures (univariate regression) - The dependent variable revenue is modeled as an dth degree polynomial in the independent variable (e.g., quantity) Matteo Francia – University of Bologna 5 R² = 0.9901 revenue quantity R² = 0.6524 revenue cost Model: Polynomial regression A component (revenue, quantity) Another component (revenue, cost) with sales explain revenue by month
  • 6. DOLAP@EDBT/ICDT 2023 Components Each component is a polynomial relationship αd ( ) between a pair of measures - How to choose the “best” polynomial and avoid overfitting? - E.g., consider revenue = αd (𝑐𝑜𝑠𝑡) We need an error function weighting the degree (d): fact αd fact.m −fact.m 2 facts −d −1 - αd ( ) is the polynomial with degree d fitted with OrdinaryLeastSquares method - The error is computed against a test set containing 30% of the facts Matteo Francia – University of Bologna 6 Too simple (high error, low polynomial degree) Too complex (lower error, higher degree)
  • 7. Computing components
    Start with d=0 and fit the polynomial
  • 8. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error
    To ensure training on “sufficient” facts
    - Apply the one-to-ten rule of thumb
    [Figure: fitted polynomials for d=1, d=2, d=3]
  • 9. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error
    [Figure: fitted polynomial for d=2]
  • 10. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error (here, d=2)
    This could be a local minimum, but we prefer to return a simpler model
    - y = α2(x) = a + bx + cx²
    - y’ = α4(x) = a + bx + … + ex⁴
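The iteration described on the preceding slides can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation (which the slides say relies on numpy and scikit-learn): the least squares fit is written in plain Python, the 70/30 split is deterministic rather than random, and the one-to-ten rule is interpreted as allowing at most one polynomial degree per ten training facts. All function names are hypothetical.

```python
def fit(xs, ys, d):
    """Ordinary least squares fit of a degree-d polynomial (normal equations)."""
    n = d + 1
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        b[col], b[p] = b[p], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs  # [a0, a1, ..., ad]


def error(coeffs, xs, ys):
    """Squared residuals divided by |facts| - d - 1."""
    d = len(coeffs) - 1
    sse = sum(
        (sum(a * x ** k for k, a in enumerate(coeffs)) - y) ** 2
        for x, y in zip(xs, ys)
    )
    return sse / (len(xs) - d - 1)


def best_component(xs, ys):
    """Increase the degree from d=0 and stop at the first minimum of the error."""
    # Assumption: deterministic split, 3 facts out of every 10 go to the test set.
    is_test = [i % 10 in (1, 4, 7) for i in range(len(xs))]
    train_x = [x for x, t in zip(xs, is_test) if not t]
    train_y = [y for y, t in zip(ys, is_test) if not t]
    test_x = [x for x, t in zip(xs, is_test) if t]
    test_y = [y for y, t in zip(ys, is_test) if t]
    max_d = len(train_x) // 10  # assumed reading of the one-to-ten rule

    best_d, best_coeffs = 0, fit(train_x, train_y, 0)
    best_err = error(best_coeffs, test_x, test_y)
    for d in range(1, max_d + 1):
        if len(test_x) - d - 1 <= 0:
            break  # not enough test facts to evaluate this degree
        coeffs = fit(train_x, train_y, d)
        err = error(coeffs, test_x, test_y)
        if err >= best_err:
            break  # first (possibly local) minimum: keep the simpler model
        best_d, best_coeffs, best_err = d, coeffs, err
    return best_d, best_coeffs, best_err


xs = [-1 + 2 * i / 59 for i in range(60)]
ys = [1 + 2 * x + 3 * x * x for x in xs]  # synthetic quadratic relationship
degree, coeffs, err = best_component(xs, ys)
print(degree, [round(c, 6) for c in coeffs[:3]])
```

Stopping at the first increase of the error mirrors the preference for a simpler model stated on the slide, even though a lower error might exist at a higher degree.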
  • 11. Interestingness
    GOAL: given the components, return the most interesting one
    Interestingness: how well the variation in the dependent variable can be predicted from the independent variable
    - This is encoded by the coefficient of determination R²
    - The better the model, the closer R² is to 1
    Example: with sales explain revenue by month
    - Model: polynomial regression
    - Component (revenue, quantity): R² = 0.9901
    - Component (revenue, cost): R² = 0.6524
    [Figure: analytic dashboard showing the query result table (month, cost, quantity, revenue) next to the plot of revenue against quantity, R² = 0.9901]
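As a rough illustration of the ranking step, the sketch below computes the coefficient of determination and orders components by it. The function names and the tuple layout are assumptions made for this example. The two R² values are the ones reported on the slide.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical (SS_tot > 0).
    """
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_components(components):
    """Sort (dependent, independent, r2) tuples, most interesting first."""
    return sorted(components, key=lambda c: c[2], reverse=True)


components = [
    ("revenue", "cost", 0.6524),
    ("revenue", "quantity", 0.9901),
]
for dependent, independent, r2 in rank_components(components):
    print(f"({dependent}, {independent}): R² = {r2}")
```

A perfect prediction gives R² = 1, while always predicting the mean of the observed values gives R² = 0, which is why values closer to 1 are ranked as more interesting.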
  • 12. Visualization
    Notebook-like interface
    Matteo Francia, Matteo Golfarelli, Stefano Rizzi. Describing and Assessing Cubes Through Intentional Analytics. EDBT 2023 (demo)
  • 13. Evaluation
    (a) Computing the results on ~90K facts (Foodmart dataset) takes 0.5 seconds
    (b) Computing on 10⁶ facts (synthetic dataset) scales linearly with the number of measures in the cube
    Implemented in Python with the numpy and scikit-learn libraries
    - The tests were run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM
    https://github.com/big-unibo/explain
  • 14. Discussion
    Overall, this paper is not about:
    - (Polynomial) regression optimization
    - “Yet another” explainability approach
    We propose a modular framework into which approaches to explaining aggregate data can be plugged
    - Regression: return relationships between a dependent variable and one or more independent variables [4]
    - Data lineage: which database tuple(s) caused a given output of the query? [1]
    - Intervention: an input is a cause of an output if changing it affects the output [2, 3]
    The added value lies in the IAM paradigm and augmented analytics
    - Data scientists can express high-level intentions…
    - … and the system (automatically) selects the most interesting explanations
    - … coupled with data and visualization
    [1] Alexandra Meliou et al. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. VLDB
    [2] Sudeepa Roy et al. 2014. A formal approach to finding explanations for database queries. SIGMOD
    [3] Zhengjie Miao et al. 2019. LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers. VLDB
    [4] Fotis Savva et al. 2018. Explaining Aggregates for Exploratory Analytics. BigData
    https://xkcd.com/605/
  • 15. Conclusion & research directions
    We have given a proof of concept for explain intentions
    - The syntax is flexible enough to suit users who wish to verify a specific hypothesis they have made
    - Intention processing takes a few seconds even on very large query results
    - Performance is in line with the interactivity requirements of OLAP sessions
    Future research directions
    - Explain relationships between a measure and two or more other measures (e.g., multivariate regression)
    - Evaluate the effectiveness of the approach by testing it with real users
    - Generalize the definition of model to cope with additional model types from the literature
    - Experiment with other interestingness metrics
      - Conciseness: large explanations will probably be hard to understand
      - Interpretability: the suitability of an explanation will depend on the target users
      - Actionability: explanations should point to actionable suggestions
  • 16. Questions? Thank you.

Editor's Notes

  1. The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes by stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data.
  2. The mean squared error (MSE) is the average squared difference between the observed and predicted values. When a model has no error, the MSE equals zero.