DATA CUBE COMPUTATION
Why is data cube computation needed?
• To retrieve information from the data cube as efficiently as possible.
• Precomputed aggregates let queries on the cube run fast.
Cube Materialization (Precomputation)
The main data cube materialization options are:
1. Full cube
2. Iceberg cube
3. Closed cube
4. Shell cube
The Full Cube
• The multiway array aggregation method computes the full data cube using a multidimensional array as its basic data structure:
1. Partition the array into chunks.
2. Compute aggregates by visiting (i.e., accessing the values at) cube cells.

Advantage:
Queries run on the cube will be very fast.
Disadvantage:
The precomputed cube requires a lot of memory: ignoring concept hierarchies, a cube with n dimensions has 2^n cuboids.
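To make the cost of full materialization concrete, here is a minimal sketch that computes every cuboid of a tiny cube. It is a naive one-scan-per-cuboid approach, not the chunked multiway array aggregation named above. The fact table, the dimension names, and the function full_cube are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact table: (branch, day, item, sales).
ROWS = [
    ("B1", "Mon", "pen", 10),
    ("B1", "Mon", "ink", 5),
    ("B1", "Tue", "pen", 7),
    ("B2", "Mon", "pen", 3),
]
DIMS = ("branch", "day", "item")


def full_cube(rows, dims):
    """Return {cuboid dimensions: {cell key: summed sales}} for all 2^n cuboids."""
    cube = {}
    for k in range(len(dims) + 1):
        for idxs in combinations(range(len(dims)), k):
            cells = defaultdict(int)
            for row in rows:
                # Project the row onto the dimensions of this cuboid.
                key = tuple(row[i] for i in idxs)
                cells[key] += row[-1]
            cube[tuple(dims[i] for i in idxs)] = dict(cells)
    return cube


cube = full_cube(ROWS, DIMS)
print(len(cube))                    # number of cuboids
print(cube[("branch",)])            # sales by branch
print(cube[()])                     # apex cuboid: grand total
```

With three dimensions the sketch already produces eight cuboids, and each added dimension doubles that number.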
An Iceberg Cube
• Contains only those cells of the data cube that meet an aggregate condition, such as a minimum support threshold.
• It is called an iceberg cube because it contains only some of the cells of the full cube, like the tip of an iceberg.
• Its purpose is to identify and compute only those values that will most likely be required for decision support queries.
• The aggregate condition specifies which cube values are more meaningful and should therefore be stored.
• This is one solution to the trade-off between computing and storing data cubes.
Advantage:
Only those cells that will most likely be used for decision support queries are precomputed.
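The aggregate condition can be sketched as a filter on cell counts. The example below assumes count as the measure and a hypothetical threshold min_sup; the rows and the function iceberg_cube are illustrative only. It counts every cell and then filters, so it shows what an iceberg cube contains rather than an efficient way to compute one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical base tuples: (branch, day, item).
ROWS = [
    ("B1", "Mon", "pen"),
    ("B1", "Mon", "ink"),
    ("B1", "Tue", "pen"),
    ("B2", "Mon", "pen"),
]


def iceberg_cube(rows, n_dims, min_sup):
    """Keep only the cells whose tuple count is at least min_sup."""
    kept = {}
    for k in range(n_dims + 1):
        for idxs in combinations(range(n_dims), k):
            # "*" marks a dimension that is aggregated away in this cuboid.
            counts = Counter(
                tuple(row[i] if i in idxs else "*" for i in range(n_dims))
                for row in rows
            )
            kept.update({cell: c for cell, c in counts.items() if c >= min_sup})
    return kept


iceberg = iceberg_cube(ROWS, 3, min_sup=2)
for cell, count in sorted(iceberg.items()):
    print(cell, count)
```

Here the full cube has 22 non-empty cells, but only 7 of them reach the threshold and are stored.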
A Closed Cube
A closed cube is a data cube consisting of only closed cells. A cell is closed if no descendant (i.e., more specialized) cell has the same measure value.
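As a small illustration of closed cells, the sketch below uses count as the measure and treats a cell as closed when no more specialized cell has the same count. The data and the helper names (all_cells, is_descendant, closed_cells) are hypothetical, and the quadratic comparison is meant for clarity, not efficiency.

```python
from collections import Counter
from itertools import combinations

# Hypothetical base tuples over dimensions (A, B, C).
ROWS = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b1", "c1")]


def all_cells(rows, n_dims):
    """Count the tuples falling into every cell of every cuboid."""
    counts = Counter()
    for k in range(n_dims + 1):
        for idxs in combinations(range(n_dims), k):
            for row in rows:
                cell = tuple(row[i] if i in idxs else "*" for i in range(n_dims))
                counts[cell] += 1
    return counts


def is_descendant(d, c):
    """True if d specializes c: same values wherever c is not '*', and d != c."""
    return d != c and all(cv == "*" or cv == dv for cv, dv in zip(c, d))


def closed_cells(counts):
    """Keep the cells that have no descendant with the same count."""
    return {
        c: n
        for c, n in counts.items()
        if not any(is_descendant(d, c) and m == n for d, m in counts.items())
    }


closed = closed_cells(all_cells(ROWS, 3))
for cell, count in sorted(closed.items()):
    print(cell, count)
```

In this data every tuple has B = b1, so the apex cell (*, *, *) has the same count as (*, b1, *) and is not closed; storing only the closed cells loses no count information.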

Shell Cube
We can choose to precompute only the cube shell (the cuboids that involve a small number of dimensions), or only portions or fragments of it, based on cuboids of interest.
General strategies for data cube computation
1. Sorting, hashing, and grouping
2. Simultaneous aggregation and caching of intermediate results
3. Aggregation from the smallest child when multiple child cuboids exist
4. Apriori pruning, which can be exploited to compute iceberg cubes efficiently
1. Sorting, hashing, and grouping
These operations facilitate aggregation, i.e., computation of the cells that share the same set of dimension values.

Their costs can also be shared across cuboids:
o shared-sorts: sharing sorting costs across multiple cuboids
o shared-partitions: sharing partitioning costs across multiple cuboids

Example:
To compute total sales by branch, day, and item, it is more efficient to sort tuples or cells by branch, then by day, and then group them by item name.
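The sort-then-group idea, and the shared-sort variant, can be sketched as follows. The sales rows and the function aggregate_sorted are hypothetical. The sketch relies on the fact that rows sorted by (branch, day, item) are also sorted by every prefix of that key, so one sort serves three cuboids.

```python
from itertools import groupby

# Hypothetical fact table: (branch, day, item, sales).
SALES = [
    ("B2", "Mon", "pen", 3),
    ("B1", "Tue", "pen", 7),
    ("B1", "Mon", "ink", 5),
    ("B1", "Mon", "pen", 10),
    ("B1", "Mon", "pen", 4),
]


def aggregate_sorted(sorted_rows, n_key):
    """Sum the sales of each run of rows sharing the first n_key columns."""
    return {
        key: sum(row[-1] for row in run)
        for key, run in groupby(sorted_rows, key=lambda r: r[:n_key])
    }


# One sort by branch, then day, then item.
sorted_rows = sorted(SALES, key=lambda r: r[:3])

by_branch_day_item = aggregate_sorted(sorted_rows, 3)
# Shared sort: the same ordering is reused for the two prefix cuboids.
by_branch_day = aggregate_sorted(sorted_rows, 2)
by_branch = aggregate_sorted(sorted_rows, 1)

print(by_branch_day_item)
print(by_branch_day)
print(by_branch)
```

A cuboid such as (day, item) is not a prefix of the sort key, so it would need its own sort or a hash-based grouping.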
2. Simultaneous aggregation and caching of intermediate results
Reduce expensive disk I/O operations by computing higher-level group-bys from previously computed lower-level group-bys rather than from the base fact table.
This also enables:
o amortized-scans: computing as many cuboids as possible at the same time to reduce disk reads

Example:
To compute sales by branch, we can use the intermediate results derived from the computation of a lower-level cuboid, such as sales by branch and day.
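A minimal sketch of this reuse, assuming the (branch, day) cuboid is already cached in memory and the measure is a sum. The cached values and the function roll_up are hypothetical.

```python
from collections import defaultdict

# Hypothetical cached lower-level cuboid: total sales by (branch, day).
SALES_BY_BRANCH_DAY = {
    ("B1", "Mon"): 19,
    ("B1", "Tue"): 7,
    ("B2", "Mon"): 3,
}


def roll_up(cuboid, keep):
    """Aggregate a cached cuboid upward, keeping only the key positions in keep."""
    out = defaultdict(int)
    for key, total in cuboid.items():
        out[tuple(key[i] for i in keep)] += total
    return dict(out)


# Sales by branch come from the cached cuboid, not from the fact table.
sales_by_branch = roll_up(SALES_BY_BRANCH_DAY, keep=(0,))
# The grand total is in turn derived from sales by branch.
grand_total = roll_up(sales_by_branch, keep=())

print(sales_by_branch)
print(grand_total)
```

This works directly for distributive measures such as sum and count; a measure such as average would need the cached sum and count rather than the cached averages.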
3. Aggregation from the smallest child
If a parent cuboid has more than one child, it is usually most efficient to compute it from the smallest previously computed child cuboid.
Example:
To compute a sales cuboid, C{branch}, when there exist two previously computed cuboids, C{branch,year} and C{branch,item}, it is more efficient to compute C{branch} from the former than from the latter if there are many more distinct items than distinct years.
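The choice of child can be sketched by comparing cell counts. The two child cuboids below are hypothetical, as are the functions smallest_child and compute_parent; cuboid size is measured simply as the number of stored cells.

```python
# Hypothetical previously computed child cuboids of C{branch}, with summed sales.
CHILDREN = {
    ("branch", "year"): {
        ("B1", 2023): 40, ("B1", 2024): 60, ("B2", 2024): 25,
    },
    ("branch", "item"): {
        ("B1", "pen"): 30, ("B1", "ink"): 20, ("B1", "pad"): 50,
        ("B2", "pen"): 15, ("B2", "pad"): 10,
    },
}


def smallest_child(children):
    """Pick the child cuboid with the fewest cells."""
    return min(children, key=lambda dims: len(children[dims]))


def compute_parent(children, parent_dims):
    """Aggregate the smallest child up to the parent's dimensions."""
    dims = smallest_child(children)
    positions = [dims.index(d) for d in parent_dims]
    parent = {}
    for key, total in children[dims].items():
        pkey = tuple(key[p] for p in positions)
        parent[pkey] = parent.get(pkey, 0) + total
    return dims, parent


used, c_branch = compute_parent(CHILDREN, ("branch",))
print(used)       # which child was scanned
print(c_branch)   # sales by branch
```

Either child gives the same C{branch}; scanning C{branch,year} simply touches 3 cells instead of 5.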
4. The Apriori pruning method can be exploited to compute iceberg cubes efficiently
The Apriori property, in the context of data cubes, states the following:
If a given cell does not satisfy minimum support, then no descendant (i.e., more specialized or detailed version) of the cell will satisfy minimum support either.
This property can be used to substantially reduce the computation of iceberg cubes.
Example:
Notice that because cell (a2, b2) is empty, it can be effectively discarded in subsequent computations, based on the Apriori property.
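One way to apply the property is a bottom-up recursion that stops expanding a cell as soon as its count falls below minimum support. The sketch below is in the spirit of bottom-up iceberg algorithms but is only an illustration under stated assumptions: count is the measure, and the data, the threshold, and the function buc are hypothetical.

```python
from collections import defaultdict

# Hypothetical base tuples over dimensions (A, B).
ROWS = [("a1", "b1"), ("a1", "b1"), ("a1", "b2"), ("a2", "b1")]


def buc(rows, n_dims, min_sup, start=0, cell=None, out=None):
    """Collect the cells with count >= min_sup, pruning with the Apriori property."""
    if cell is None:
        cell = ("*",) * n_dims
        out = {}
    if len(rows) < min_sup:
        # Apriori pruning: no descendant of this cell can reach min_sup.
        return out
    out[cell] = len(rows)
    for d in range(start, n_dims):
        # Partition the current rows on dimension d and recurse into each part.
        parts = defaultdict(list)
        for row in rows:
            parts[row[d]].append(row)
        for value, part in parts.items():
            child = cell[:d] + (value,) + cell[d + 1:]
            buc(part, n_dims, min_sup, d + 1, child, out)
    return out


iceberg = buc(ROWS, n_dims=2, min_sup=2)
for cell, count in sorted(iceberg.items()):
    print(cell, count)
```

With min_sup = 2, the cell (a2, *) has a count of 1, so the recursion never partitions it on B and its descendants, such as (a2, b1), are never examined.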
