DOLAP@EDBT/ICDT 2023
The Whys and Wherefores of Cubes
Matteo Francia¹, Stefano Rizzi¹, Patrick Marcel²
¹DISI, University of Bologna, Italy; ²LIFAT, University of Tours, France
DOLAP 2023: 25th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data
Intentional Analytics Model
Context: Intentional Analytics Model (IAM) [1]
- Facilitate OLAP analysis of multidimensional cubes
- Escape from query answers as plain tables
Express high-level intentions, not queries
- Describe, Assess, Explain, etc.
Get cubes enhanced with insights
- Apply (mining/ML) models to data
- Return interesting insights
Explain: finding interesting relationships in cube facts
- Data exploration: automatically extract meaningful relationships from facts
- Validating a user’s belief: check whether known relationships hold
- E.g., in agriculture, the quantity of potassium is correlated with the quality of kiwifruit. Do facts confirm this belief?
Matteo Francia – University of Bologna 2
[1] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's:
An intentional analytics model to reinvent OLAP. Inf. Syst. 85: 68-91 (2019)
Classical OLAP
Case study:
- Given the cube of Sales
- Explain monthly revenue against cost and quantity
If we had to do this in plain OLAP
- Query the cube, get a plain table
- Manually identify interesting patterns
But…
- What if we have thousands of cells?
- What if we have many measures?
- Can we have an effective representation?
select month, sum(quantity), sum(cost), sum(revenue)
from sales_ft join date_dt on (…)
group by month
[Figure: schema of the SALES cube. Hierarchies: product → type → category; customer → gender; store → city → country; date → month → year. Measures: quantity, revenue, cost.]
month cost quantity revenue
125 10 12 125
132 20 14 150
12 30 10 60
15 40 5 15
50 50 9 50
Intentional OLAP: Explain
`Explain` intention:
with cube explain m [ for P ] by l1,…,ln [ against m1, ..., mr ]
“Explained” measure: m
Selection predicate: P (consider all facts if omitted)
Group-by set: l1,…,ln (at least one level)
Measures: m1, ..., mr (compute against all measures if omitted)
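As a rough illustration of how the clauses fit together, the following Python sketch parses an `explain` statement into its parts. The `ExplainIntention` class, the `parse_explain` function, and the regular expression are hypothetical names made up for this example, not the authors' parser; the predicate is kept as an opaque string, and a predicate that itself contains the word "by" would confuse this simple pattern.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical pattern for:
#   with <cube> explain <m> [for <P>] by <l1,...,ln> [against <m1,...,mr>]
_PATTERN = re.compile(
    r"^\s*with\s+(?P<cube>\w+)\s+explain\s+(?P<measure>\w+)"
    r"(?:\s+for\s+(?P<predicate>.+?))?"
    r"\s+by\s+(?P<levels>\w+(?:\s*,\s*\w+)*)"
    r"(?:\s+against\s+(?P<against>\w+(?:\s*,\s*\w+)*))?\s*$",
    re.IGNORECASE | re.DOTALL,
)


@dataclass
class ExplainIntention:
    cube: str
    measure: str                      # the "explained" measure m
    levels: List[str]                 # group-by set, at least one level
    predicate: Optional[str] = None   # None means "consider all facts"
    against: List[str] = field(default_factory=list)  # empty means "all measures"


def _split(csv: str) -> List[str]:
    """Split a comma-separated clause into trimmed names."""
    return [item.strip() for item in csv.split(",")]


def parse_explain(statement: str) -> ExplainIntention:
    """Parse an explain statement, raising ValueError if it does not match."""
    match = _PATTERN.match(statement)
    if match is None:
        raise ValueError(f"not a valid explain intention: {statement!r}")
    against = match.group("against")
    return ExplainIntention(
        cube=match.group("cube"),
        measure=match.group("measure"),
        levels=_split(match.group("levels")),
        predicate=match.group("predicate"),
        against=_split(against) if against else [],
    )


intention = parse_explain("with sales explain revenue by month")
```

Here the optional clauses are absent, so `intention.predicate` is `None` and `intention.against` is empty, which mirrors the defaults stated above.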
Semantics translates into an execution plan
i. Execute the query for the given cube, measures, predicate, and group-by set
ii. Apply models explaining relationships through components
iii. Rank components by interestingness
iv. Return an effective visualization
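Step i of the plan can be sketched in plain Python as a filtered group-by aggregation. This is only an illustration under stated assumptions: `run_query` and the `Fact` alias are hypothetical names, the facts are made up, and measures are aggregated with a sum, as in the SQL query of the case study.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Fact = Dict[str, Any]  # one fact with its dimension levels and measures


def run_query(
    facts: Iterable[Fact],
    group_by: List[str],
    measures: List[str],
    predicate: Optional[Callable[[Fact], bool]] = None,
) -> Dict[Tuple, Dict[str, float]]:
    """Sum each measure per group-by coordinate, skipping facts that fail the predicate."""
    cells: Dict[Tuple, Dict[str, float]] = defaultdict(
        lambda: dict.fromkeys(measures, 0.0)
    )
    for fact in facts:
        if predicate is not None and not predicate(fact):
            continue
        key = tuple(fact[level] for level in group_by)
        for m in measures:
            cells[key][m] += fact[m]
    return dict(cells)


# Made-up facts, for illustration only.
facts = [
    {"month": "2023-01", "quantity": 2, "cost": 5.0, "revenue": 9.0},
    {"month": "2023-01", "quantity": 1, "cost": 3.0, "revenue": 4.0},
    {"month": "2023-02", "quantity": 4, "cost": 8.0, "revenue": 20.0},
]
result = run_query(facts, group_by=["month"], measures=["quantity", "cost", "revenue"])
```

Each cell of `result` then supplies one (independent, dependent) point per pair of measures to the models applied in step ii.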
[Figure: for `with sales explain revenue by month`, the analytic dashboard shows the query result together with a plot of revenue against quantity (R² = 0.9901).]
Model
Models are “types” of relationships hiding in the cube facts
- Are made of components, each being a specific relationship…
- … computed on levels/members/measures
As a proof of concept, we restrict ourselves to
- A single model: polynomial regression
- Each component is a polynomial relationship between a pair of measures (univariate regression)
- The dependent variable (e.g., revenue) is modeled as a d-th degree polynomial in the independent variable (e.g., quantity)
[Figure: the model "Polynomial regression" for `with sales explain revenue by month`, with two components: (revenue, quantity), R² = 0.9901, and (revenue, cost), R² = 0.6524.]
Components
Each component is a polynomial relationship α_d(·) between a pair of measures
- How to choose the “best” polynomial and avoid overfitting?
- E.g., consider revenue = α_d(cost)
We need an error function weighting the degree (d):
$$\mathrm{err}(\alpha_d) = \frac{\sum_{\mathit{fact} \in \mathit{facts}} \left(\alpha_d(\mathit{fact}.m_i) - \mathit{fact}.m\right)^2}{|\mathit{facts}| - d - 1}$$
- α_d(·) is the polynomial of degree d fitted with the ordinary least squares method; m is the explained measure and m_i the independent one
- The error is computed against a test set containing 30% of the facts
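To make the degree-weighted error concrete, here is a minimal sketch that assumes the error is the sum of squared residuals divided by (number of facts − d − 1). The function name `penalized_error` and the sample values are made up for illustration.

```python
from typing import Sequence


def penalized_error(actual: Sequence[float], predicted: Sequence[float], degree: int) -> float:
    """Sum of squared residuals divided by (n - degree - 1)."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    dof = n - degree - 1
    if dof <= 0:
        raise ValueError("need more than degree + 1 facts")
    rss = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    return rss / dof


# Made-up test-set values: identical residuals cost more at a higher degree.
actual = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
predicted = [11.0, 12.0, 14.0, 19.0, 25.0, 30.0]
err_d1 = penalized_error(actual, predicted, degree=1)  # 3 / (6 - 1 - 1)
err_d3 = penalized_error(actual, predicted, degree=3)  # 3 / (6 - 3 - 1)
```

With the same residuals, the degree-3 polynomial gets twice the error of the degree-1 polynomial, so a higher degree has to reduce the residuals noticeably before it is preferred.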
[Figure: two candidate fits. Too simple: high error, low polynomial degree. Too complex: lower error, higher degree.]
Computing components
Start with d=0 and fit the polynomial
Iterate:
- Increase the degree…
- … until we find a minimum of the error
To ensure training on “sufficient” facts
- Apply the one-in-ten rule of thumb (roughly one model parameter per ten training facts)
[Figure: polynomial fits for d=1, d=2, and d=3]
The search stops at d=2
This could be a local minimum, but we prefer to return a simpler model
• y = α_2(x) = a + bx + cx^2
• y’ = α_4(x) = a + bx + … + ex^4
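The iterative search can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' code: `select_degree` is a hypothetical name, `np.polyfit` stands in for the ordinary least squares fit, the 30% test split is random, the one-in-ten rule is read as "degree at most one tenth of the training facts", and the data are synthetic.

```python
import numpy as np


def select_degree(x, y, test_fraction=0.3, seed=0):
    """Raise the degree from 0 until the penalized test error stops decreasing."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    n_test = max(1, int(round(test_fraction * len(x))))
    test, train = order[:n_test], order[n_test:]
    # Assumed reading of the one-in-ten rule: cap the degree by the training size.
    max_degree = len(train) // 10

    best = None  # (error, degree, coefficients)
    for d in range(max_degree + 1):
        if n_test - d - 1 <= 0:
            break  # the penalized error is undefined for this degree
        coeffs = np.polyfit(x[train], y[train], deg=d)  # least squares fit
        residuals = np.polyval(coeffs, x[test]) - y[test]
        error = float(np.sum(residuals ** 2) / (n_test - d - 1))
        if best is not None and error >= best[0]:
            break  # first (possibly local) minimum: keep the simpler model
        best = (error, d, coeffs)
    if best is None:
        raise ValueError("not enough facts to fit even a constant")
    return best[1], best[2], best[0]


# Synthetic data: a noisy quadratic relationship between two measures.
noise = np.random.default_rng(42).normal(0.0, 1.0, size=200)
quantity = np.linspace(0.0, 10.0, 200)
revenue = 1.0 + 2.0 * quantity + 3.0 * quantity ** 2 + noise
degree, coeffs, error = select_degree(quantity, revenue)
```

Stopping at the first increase of the error is what makes the search return the simpler model even when a higher degree might later reach a lower error.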
Interestingness
GOAL: given components, return the most interesting one
Interestingness: how much of the variation in the dependent variable is predictable from the independent variable
- This is encoded by the coefficient of determination R²
- The better the model, the closer R² is to 1
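Ranking by interestingness can be sketched in a few lines, using the standard definition R² = 1 − SS_res / SS_tot. The function names `r_squared` and `rank_components` and all numbers below are made up for illustration; the predictions are assumed to come from already fitted components.

```python
from typing import Dict, List, Sequence, Tuple


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    if ss_tot == 0:
        raise ValueError("R² is undefined when the dependent variable is constant")
    return 1.0 - ss_res / ss_tot


def rank_components(
    actual: Sequence[float], predictions: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Score each component by R² and sort from most to least interesting."""
    scored = [(name, r_squared(actual, pred)) for name, pred in predictions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up values: revenue as predicted by two hypothetical components.
revenue = [10.0, 20.0, 30.0, 40.0]
predictions = {
    "quantity": [11.0, 19.0, 31.0, 39.0],
    "cost": [20.0, 20.0, 30.0, 30.0],
}
ranking = rank_components(revenue, predictions)
```

The first entry of `ranking` is the component returned to the user; here it is the one built on quantity, whose predictions stay closest to the actual revenue.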
[Figure: of the two components for `with sales explain revenue by month`, (revenue, quantity) with R² = 0.9901 ranks above (revenue, cost) with R² = 0.6524 and is returned together with the query result.]
Visualization
Matteo Francia, Matteo Golfarelli, Stefano Rizzi. Describing and Assessing Cubes Through Intentional Analytics. EDBT 2023 (demo)
Notebook-like interface
Evaluation
(a) Computing the results on ~90K facts (Foodmart dataset) takes 0.5 seconds
(b) Computing on 10^6 facts (synthetic dataset) scales linearly with respect to the number of measures in the cube
Implemented in Python with the NumPy and scikit-learn libraries
- The tests were run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM
https://github.com/big-unibo/explain
Discussion
Overall, this paper is not about:
- (Polynomial) Regression optimization
- “Yet Another” explainability approach
We propose a modular framework into which approaches to aggregate data explanation can be plugged
- Regression: return relationships between a dependent variable and one or more independent variables [4]
- Data lineage: which database tuple(s) caused a given query output? [1]
- Intervention: an input is a cause to an output if a change affects the output [2, 3]
The added value is in the IAM paradigm and augmented analytics
- Data scientists can express high-level intentions…
- … and the system (automatically) selects the most interesting explanations
- … coupled with data and visualization
[1] Alexandra Meliou et al. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. VLDB
[2] Sudeepa Roy et al. 2014. A formal approach to finding explanations for database queries. SIGMOD
[3] Zhengjie Miao et al. 2019. LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers. VLDB
[4] Fotis Savva et al. 2018. Explaining Aggregates for Exploratory Analytics. BigData.
https://xkcd.com/605/
Conclusion & research directions
We have given a proof-of-concept for explain intentions
- Syntax is flexible enough to suit users who wish to verify a specific hypothesis they made
- Intention processing takes a few seconds even on very large query results
- Performance is in line with the interactivity requirements of OLAP sessions
Future research directions
- Explain relationships between a measure and two or more other measures (e.g., multivariate regression)
- Evaluate the effectiveness of the approach by testing it with real users
- Generalize the definition of model to cope with additional model types from the literature
- Experiment with other interestingness metrics
- Conciseness: large explanations are likely to be hard to understand
- Interpretability: the suitability of an explanation will depend on the target users
- Actionability: explanations should point to actionable suggestions
Questions?
Thank you.

More Related Content

Similar to [DOLAP2023] The Whys and Wherefores of Cubes

Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
Sanjeev Deshmukh
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018
Sam Clifford
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
Lorenzo Cesaretti
 
E05312426
E05312426E05312426
E05312426
IOSR-JEN
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
NikunjaParida1
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Osman Ali
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
Paolo Missier
 
Customer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclustCustomer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclust
Jim Porzak
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparison
Alain Chabrier
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET Journal
 
Introduction to operations research
Introduction to operations researchIntroduction to operations research
Introduction to operations research
Dr. Abdulfatah Salem
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
Shanmugasundaram M
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Andreas Metzger
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
Christopher Wilson
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
David Raj Kanthi
 
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docxMISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
altheaboyer
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET Journal
 
vtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdfvtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdf
LPSChandana
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
Jill Lyons
 
Practical Challenges ML Workflows
Practical Challenges ML WorkflowsPractical Challenges ML Workflows
Practical Challenges ML Workflows
Jenny Midwinter
 

Similar to [DOLAP2023] The Whys and Wherefores of Cubes (20)

Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)Sgd teaching-consulting-10-jan-2009 (1)
Sgd teaching-consulting-10-jan-2009 (1)
 
Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018Classes without Dependencies - UseR 2018
Classes without Dependencies - UseR 2018
 
Analysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approachAnalysis of Educational Robotics activities using a machine learning approach
Analysis of Educational Robotics activities using a machine learning approach
 
E05312426
E05312426E05312426
E05312426
 
BDS_QA.pdf
BDS_QA.pdfBDS_QA.pdf
BDS_QA.pdf
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Customer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclustCustomer Segmentation with R - Deep Dive into flexclust
Customer Segmentation with R - Deep Dive into flexclust
 
Machine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparisonMachine Learning vs Decision Optimization comparison
Machine Learning vs Decision Optimization comparison
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
 
Introduction to operations research
Introduction to operations researchIntroduction to operations research
Introduction to operations research
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive ServicesFeature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
Feature Model-Guided Online Reinforcement Learning for Self-Adaptive Services
 
2015 03-28-eb-final
2015 03-28-eb-final2015 03-28-eb-final
2015 03-28-eb-final
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docxMISY 3331 Advanced Database ConceptsAssignment 3Dr.  Sotirios .docx
MISY 3331 Advanced Database ConceptsAssignment 3Dr. Sotirios .docx
 
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative FilteringIRJET- Boosting Response Aware Model-Based Collaborative Filtering
IRJET- Boosting Response Aware Model-Based Collaborative Filtering
 
vtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdfvtu data structures lab manual bcs304 pdf
vtu data structures lab manual bcs304 pdf
 
Dad (Data Analysis And Design)
Dad (Data Analysis And Design)Dad (Data Analysis And Design)
Dad (Data Analysis And Design)
 
Practical Challenges ML Workflows
Practical Challenges ML WorkflowsPractical Challenges ML Workflows
Practical Challenges ML Workflows
 

More from University of Bologna

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
University of Bologna
 
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
University of Bologna
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
University of Bologna
 
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
University of Bologna
 
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
University of Bologna
 
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
University of Bologna
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP
University of Bologna
 
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
University of Bologna
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
University of Bologna
 

More from University of Bologna (9)

Data models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analyticsData models in precision agriculture: from IoT to big data analytics
Data models in precision agriculture: from IoT to big data analytics
 
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
[EDBT2023] Describing and Assessing Cubes Through Intentional Analytics (demo...
 
[DataPlat2023] Opening
[DataPlat2023] Opening[DataPlat2023] Opening
[DataPlat2023] Opening
 
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
[PhDThesis2021] - Augmenting the knowledge pyramid with unconventional data a...
 
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
[EDBT2021] Conversational OLAP in Action (Best Demo Award EDBT2021)
 
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
[SEBD2020] OLAP Querying of Document Stores in the Presence of Schema Variety
 
[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP[DOLAP2020] Towards Conversational OLAP
[DOLAP2020] Towards Conversational OLAP
 
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
[MIPRO2019] Map-Matching on Big Data: a Distributed and Efficient Algorithm w...
 
[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence[DOLAP2019] Augmented Business Intelligence
[DOLAP2019] Augmented Business Intelligence
 

Recently uploaded

VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
satpalsheravatmumbai
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
sheetal singh$A17
 
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdfCMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
IndranilDasgupta19
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
revolutionary575
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
kuldeepsharmaks8120
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
kinni singh$A17
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
NABLAS株式会社
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
uapta
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
sukaniyasunnu
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
Uwe Korn
 
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
sheetal singh$A17
 
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdfFINAL PROJECT WORK PORTFOLIO MANAGEMENT (2)  hhh (1) (2) (5) (1) (1).pdf
FINAL PROJECT WORK PORTFOLIO MANAGEMENT (2) hhh (1) (2) (5) (1) (1).pdf
bala krishna
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
huseindihon
 
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
gargjiya84
 
DataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptxDataScienceConcept_Kanchana_Weerasinghe.pptx
DataScienceConcept_Kanchana_Weerasinghe.pptx
Kanchana Weerasinghe
 
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDeliveryBDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
BDSM Girls Call Mumbai 👀 9820252231 👀 Cash Payment With Room DeliveryDelivery
erynsouthern
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
BaraDaniel1
 
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
Verified Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servic...
revolutionary575
 
Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...Experience, Excellence & Commitment are the characteristics that describe Fla...
Experience, Excellence & Commitment are the characteristics that describe Fla...
kittycrispy617
 

Recently uploaded (20)

VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
VIP Kanpur Girls Call Kanpur 0X0000000X Doorstep High-Profile Girl Service Ca...
 
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
Exclusive Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service...
 
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdfCMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
CMO MRM_May 2024 WITH BREAKDOWN AND IMPROVEMENTDATA.pdf
 
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
Celebrity Girls Call Andheri 9930245274 Unlimited Short Providing Girls Servi...
 
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...🚂🚘 Premium Girls Call Guwahati  🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
🚂🚘 Premium Girls Call Guwahati 🛵🚡000XX00000 💃 Choose Best And Top Girl Servi...
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
 
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
社内勉強会資料_TransNeXt: Robust Foveal Visual Perception for Vision Transformers
 
DU degree offer diploma Transcript
DU degree offer diploma TranscriptDU degree offer diploma Transcript
DU degree offer diploma Transcript
 
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
 
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...

[DOLAP2023] The Whys and Wherefores of Cubes

  • 1. DOLAP@EDBT/ICDT 2023. The Whys and Wherefores of Cubes. Matteo Francia (1), Stefano Rizzi (1), Patrick Marcel (2). (1) DISI, University of Bologna, Italy; (2) LIFAT, University of Tours, France. DOLAP 2023: 25th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data.
  • 2. Intentional Analytics Model
    Context: Intentional Analytics Model (IAM) [1]
    - Facilitate OLAP analysis of multidimensional cubes
    - Escape from query answers as plain tables
    Express high-level intentions, not queries
    - Describe, Assess, Explain, etc.
    Get cubes enhanced with insights
    - Apply (mining/ML) models to data
    - Return interesting insights
    Explain: finding interesting relationships in cube facts
    - Data exploration: automatically extract meaningful relationships from facts
    - Validating the user's belief: check whether known relationships hold
    - Example: in agriculture, the quantity of potassium is correlated with the quality of kiwifruit. Do the facts confirm this belief?
    Matteo Francia – University of Bologna
    [1] Panos Vassiliadis, Patrick Marcel, Stefano Rizzi: Beyond roll-up's and drill-down's: An intentional analytics model to reinvent OLAP. Inf. Syst. 85: 68-91 (2019)
  • 3. Classical OLAP
    Case study:
    - Given the cube of Sales
    - Explain monthly revenue against cost and quantity
    If we had to do this in plain OLAP:
    - Query the cube, get a plain table
    - Manually identify interesting patterns
    But…
    - What if we have thousands of cells?
    - What if we have many measures?
    - Can we have an effective representation?
    Query: select month, sum(quantity), sum(cost), sum(revenue) from sales_ft join date_dt on (…) group by month
    [Figure: SALES cube schema. Dimension levels: product, type, category; customer, gender; store, city, country; date, month, year. Measures: quantity, revenue, cost.]
    Query result (as extracted from the slide):
    month  cost  quantity  revenue
    125    10    12        125
    132    20    14        150
    12     30    10        60
    15     40    5         15
    50     50    9         50
  • 4. Intentional OLAP: Explain
    `Explain` intention: with cube explain m [ for P ] by l1, …, ln [ against m1, ..., mr ]
    - "Explained" measure: m
    - Selection predicate: P (all facts are considered if omitted)
    - Group-by set: l1, …, ln (at least one level)
    - Measures: m1, ..., mr (the explanation is computed against all measures if omitted)
    The semantics translates into an execution plan:
    i. Execute the query for the given cube, measures, predicate, and group-by set
    ii. Apply models that explain relationships through components
    iii. Rank components by interestingness
    iv. Return an effective visualization
    Example: with sales explain revenue by month
    [Figure: analytic dashboard showing the query result table and a plot of revenue against quantity, R² = 0.9901]
  • 5. Model
    Models are "types" of relationships hiding in the cube facts
    - They are made of components, each being a specific relationship…
    - … computed on levels/members/measures
    As a proof of concept, we restrict ourselves to
    - A single model: polynomial regression
    - Each component is a polynomial relationship between a pair of measures (univariate regression)
    - The dependent variable (revenue) is modeled as a polynomial of degree d in the independent variable (e.g., quantity)
    Example: with sales explain revenue by month
    [Figure: model "Polynomial regression" with two components: (revenue, quantity), R² = 0.9901, and (revenue, cost), R² = 0.6524]
  • 6. Components
    Each component is a polynomial relationship α_d( ) between a pair of measures
    - How to choose the "best" polynomial and avoid overfitting?
    - E.g., consider revenue = α_d(cost)
    We need an error function that weights the degree d:
    $err(d) = \frac{\sum_{fact \in facts} \left( \alpha_d(fact.m_i) - fact.m \right)^2}{|facts| - d - 1}$
    where m is the explained measure (revenue in the example) and m_i is the independent measure (cost in the example)
    - α_d( ) is the polynomial of degree d fitted with the Ordinary Least Squares method
    - The error is computed against a test set containing 30% of the facts
    [Figure: two fits. Too simple: high error, low polynomial degree. Too complex: lower error, higher degree.]
  • 7. Computing components
    Start with d=0 and fit the polynomial
  • 8. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error
    To ensure training on "sufficient" facts
    - Apply the one-to-ten rule of thumb
    [Figure: fitted polynomials for d=1, d=2, d=3]
  • 9. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error
    [Figure: fitted polynomial for d=2]
  • 10. Computing components
    Iterate:
    - Increase the degree…
    - … until we find a minimum of the error
    The minimum found at d=2 could be a local minimum, but we prefer to return a simpler model:
    - $y = \alpha_2(x) = a + bx + cx^2$
    - $y' = \alpha_4(x) = a + bx + \dots + ex^4$
  • 11. Interestingness
    GOAL: given the components, return the most interesting one
    Interestingness: how well the variation in the dependent variable can be predicted from the independent variable
    - This is encoded by the coefficient of determination R²
    - The better the model, the closer the value of R² is to 1
    Example: with sales explain revenue by month
    [Figure: model "Polynomial regression" with components (revenue, quantity), R² = 0.9901, and (revenue, cost), R² = 0.6524; the dashboard shows the query result table with the (revenue, quantity) component]
  • 12. Visualization
    Notebook-like interface
    Matteo Francia, Matteo Golfarelli, Stefano Rizzi. Describing and Assessing Cubes Through Intentional Analytics. EDBT 2023 (demo)
  • 13. Evaluation
    (a) Computing the results on ~90K facts (Foodmart dataset) takes 0.5 seconds
    (b) Computing on 10^6 facts (synthetic dataset) scales linearly with respect to the number of measures in the cube
    Implemented in Python with the numpy and scikit-learn libraries
    - The tests were run on an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz with 8GB RAM
    https://github.com/big-unibo/explain
  • 14. Discussion
    Overall, this paper is not about:
    - (Polynomial) regression optimization
    - "Yet another" explainability approach
    We propose a modular framework into which approaches to aggregate data explanation can be plugged
    - Regression: return relationships between a dependent variable and one or more independent variables [4]
    - Data lineage: which database tuple(s) caused that output of the query? [1]
    - Intervention: an input is a cause of an output if changing it affects the output [2, 3]
    The added value is in the IAM paradigm and augmented analytics
    - Data scientists can express high-level intentions…
    - … and the system (automatically) selects the most interesting explanations
    - … coupled with data and visualization
    [1] Alexandra Meliou et al. 2010. The Complexity of Causality and Responsibility for Query Answers and non-Answers. VLDB
    [2] Sudeepa Roy et al. 2014. A formal approach to finding explanations for database queries. SIGMOD
    [3] Zhengjie Miao et al. 2019. LensXPlain: Visualizing and Explaining Contributing Subsets for Aggregate Query Answers. VLDB
    [4] Fotis Savva et al. 2018. Explaining Aggregates for Exploratory Analytics. BigData
    https://xkcd.com/605/
  • 15. Conclusion & research directions
    We have given a proof of concept for explain intentions
    - The syntax is flexible enough to suit users who wish to verify a specific hypothesis they have made
    - Intention processing takes a few seconds even on very large query results
    - Performance is in line with the interactivity requirements of OLAP sessions
    Future research directions
    - Explain relationships between a measure and two or more other measures (e.g., multivariate regression)
    - Evaluate the effectiveness of the approach by testing it with real users
    - Generalize the definition of model to cope with additional model types from the literature
    - Experiment with other interestingness metrics
      - Conciseness: large explanations will probably be hard to understand
      - Interpretability: the suitability of an explanation will depend on the target users
      - Actionability: explanations should point to actionable suggestions
  • 16. Questions? Thank you.
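
The component computation and ranking described in slides 6 to 11 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which is available at the repository linked in slide 13): `np.polyfit` stands in for the Ordinary Least Squares fit, the function names `penalized_error`, `fit_component`, `r_squared` and `explain` are hypothetical, the "one-to-ten" rule is read here as requiring about ten training facts per polynomial coefficient, and the data is synthetic.

```python
import numpy as np


def penalized_error(y_true, y_pred, degree):
    """Sum of squared residuals divided by (n - d - 1), as on slide 6."""
    n = len(y_true)
    return float(np.sum((y_pred - y_true) ** 2) / (n - degree - 1))


def fit_component(x, y, max_degree=5, test_fraction=0.3, seed=0):
    """Increase the degree from 0 and stop at the first minimum of the error.

    Returns (degree, coefficients, error), or None if there are too few facts.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(x))
    n_test = int(round(test_fraction * len(x)))  # 30% of the facts form the test set
    test, train = idx[:n_test], idx[n_test:]
    best = None
    for d in range(max_degree + 1):
        # Assumed reading of the one-to-ten rule: ten training facts per coefficient.
        if len(train) < 10 * (d + 1) or n_test - d - 1 <= 0:
            break
        coeffs = np.polyfit(x[train], y[train], d)  # least-squares polynomial fit
        err = penalized_error(y[test], np.polyval(coeffs, x[test]), d)
        if best is not None and err >= best[2]:
            break  # the error stopped decreasing: keep the simpler model
        best = (d, coeffs, err)
    return best


def r_squared(x, y, coeffs):
    """Coefficient of determination of the fitted polynomial over all facts."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    residual = np.sum((y - np.polyval(coeffs, x)) ** 2)
    total = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - residual / total)


def explain(measures, target):
    """Build one component per (target, other measure) pair, ranked by R²."""
    components = []
    for name, values in measures.items():
        if name == target:
            continue
        fit = fit_component(values, measures[target])
        if fit is None:
            continue
        degree, coeffs, _ = fit
        components.append((name, degree, r_squared(values, measures[target], coeffs)))
    return sorted(components, key=lambda c: c[2], reverse=True)


# Synthetic facts (illustrative only): revenue is quadratic in quantity,
# while cost is unrelated noise.
rng = np.random.default_rng(42)
quantity = np.linspace(0.0, 10.0, 200)
revenue = 2.0 + 3.0 * quantity + 0.5 * quantity ** 2 + rng.normal(0.0, 1.0, 200)
cost = rng.normal(50.0, 5.0, 200)
ranking = explain({"revenue": revenue, "quantity": quantity, "cost": cost}, "revenue")
for name, degree, r2 in ranking:
    print(f"(revenue, {name}): degree={degree}, R2={r2:.4f}")
```

Stopping at the first minimum mirrors the preference stated in slide 10 for a simpler model, even though a lower error might exist at a higher degree.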

Editor's Notes

  1. The Intentional Analytics Model (IAM) has been envisioned as a way to tightly couple OLAP and analytics by (i) letting users explore multidimensional cubes by stating their intentions, and (ii) returning multidimensional data coupled with knowledge insights in the form of annotations of subsets of data.
  2. The mean squared error (MSE) is the average squared difference between the observed and predicted values. When a model has no error, the MSE equals zero.
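
As a small worked example of the second note, the MSE can be computed directly with numpy. The helper name `mean_squared_error` is defined locally for illustration, and the observed values are illustrative numbers, not results from the paper.

```python
import numpy as np


def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))


observed = [125.0, 150.0, 60.0, 15.0, 50.0]  # illustrative values only

# A model that predicts every value exactly has no error.
perfect = mean_squared_error(observed, observed)

# A model that overshoots every value by 10 has squared differences of 100 each.
off_by_ten = mean_squared_error(observed, [v + 10.0 for v in observed])

print(perfect, off_by_ten)
```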