SlideShare a Scribd company logo
1 of 19
Download to read offline
COURSE NAME:
DATA WAREHOUSING & DATA MINING
LECTURE 8
TOPICS TO BE COVERED:
 Complex aggregation at multiple granularities
DATA GENERALIZATION
 Data generalization is a process that abstracts a large set
of task-relevant data in a database from a relatively low
conceptual level to higher conceptual levels.
 Data generalization approaches include data cube–based
data aggregation and attribute oriented induction.
 From a data analysis point of view, data generalization is a
form of descriptive data mining. Descriptive data mining
describes data in a concise and summarative manner and
presents interesting general properties of the data.
EFFICIENT METHODS FOR DATA CUBE
COMPUTATION
 A data cube consists of a lattice of cuboids. Each cuboid
corresponds to a different degree of summarization of
the given multidimensional data.
 Data cube computation is an essential task in data
warehouse implementation. The precomputation of all or
part of a data cube can greatly reduce the response time
and enhance the performance of on-line analytical
processing.
CUBE MATERIALIZATION
• Full Cube
• Iceberg Cube
• Closed Cube
• Shell Cube
FULL CUBE
 Full Cube:-all the cells of all of the cuboids for a
given data cube. Thus, precomputation of the full
cube can require huge and often excessive
amounts of memory.
 Full materialization refers to the computation of all
of the cuboids in a data cube lattice.
ICEBERG CUBE
• Iceberg Cube:-partially materialized cubes are
known as iceberg cubes. The minimum threshold is
called the minimum support threshold, or minimum
support(min sup)
• Partial materialization refers to the selective
computation of a subset of the cuboid cells in the
lattice. Iceberg cubes and shell fragments are
examples of partial materialization.
• An iceberg cube is a data cube that stores only
those cube cells whose aggregate value (e.g., count)
is above some minimum support threshold.
ICEBERG CUBE(EXAMPLE)
 An iceberg cube can be specified with an
SQL query, as shown in the following
example.
compute cube sales iceberg as
select month, city, customer group, count(*)
from salesInfo
cube by month, city, customer group
having count(*) >= min sup
CLOSED CUBE
• To systematically compress a data cube, we need closed cell.
• A closed cube is a data cube consisting of only closed cells.
• A cell, c, is a closed cell if there exists no cell, d, such that d is a
specialization (descendant) of cell c (i.e, where d is obtained by
replacing a in c with a non- value), and d has the same measure
value as c.
• For example, the three cells derived above are the three closed
cells of the data cube for the data set: {(a1, a2, a3, - - - -, a100) :
10, (a1, a2, b3, - - - -, b100) : 10}.They form the lattice of a closed
cube as shown in Figure. Other nonclosed cells can be derived
from their corresponding closed cells in this lattice. For example,
“(a1, * ,*, - - - - , *) : 20” can be derived from “(a1, a2, - - - -,* ) :
20” because the former is a generalized nonclosed cell of the
latter. Similarly, we have “(a1, a2, b3, *,- - - -,* ) : 10”.
CLOSED CELL
SHELL CUBE
 Another strategy for partial materialization is to
precompute only the cuboids involving a small number
of dimensions, These cuboids form a cube shell for the
corresponding data cube
 For shell fragments of a data cube, only some cuboids
involving a small number of dimensions are computed.
Queries on additional ombinations of the dimensions
can be computed on the fly.
OPTIMIZATION TECHNIQUES FOR EFFICIENT
COMPUTATION OF DATA CUBES
 Following are general optimization techniques for the efficient computation of
data cubes.
1. Sorting, hashing, and grouping.
2. Simultaneous aggregation and caching intermediate
results.
3. Aggregation from the smallest child, when there exist
multiple child cuboids.
4. The Apriori pruning method can be explored to compute
iceberg cubes efficiently.
DATA CUBE COMPUTATION METHODS
1. MultiWay array aggregation for full data computation.
2. BUC :Computing iceberg cubes by exploring ordering and
sorting for efficient top-down computation.
3. Star-Cubing for integration of top-down and bottom-up
computation using a star-tree structure.
4. High-dimensional OLAP by precomputing only the
partitioned shell fragments.
MULTIWAY ARRAY AGGREGATION FOR
FULL DATA COMPUTATION
 The Multiway Array Aggregation (or simply MultiWay) method
computes a full data cube by using a multidimensional array as its basic
data structure. It is a typical MOLAP approach that uses direct array
addressing, where dimension values are accessed via the position or
index of their corresponding array locations. Hence, MultiWay cannot
perform any value-based reordering as an optimization technique.
 A different approach is developed for the array-based cube construction,
as follows:
 Partition the array into chunks
 Compute aggregates by visiting cube cells
PARTITION THE ARRAY INTO CHUNKS
 Partition the array into chunks: A chunk is a subcube that is small
enough to fit into the memory available for cube computation. Chunking
is a method for dividing an n-dimensional array into small n-dimensional
chunks, where each chunk is stored as an object on disk. The chunks
are compressed so as to remove wasted space resulting from empty
array cells (i.e., cells that do not contain any valid data, whose cell count
is zero). For instance, “chunkID + offset” can be used as a cell
addressing mechanism to compress a sparse array structure and when
searching for cells within a chunk. Such a compression technique is
powerful enough to handle sparse cubes, both on disk and in memory.
Multi-way Array Aggregation for Cube
Computation
17
MULTI-WAY ARRAY AGGREGATION FOR
CUBE COMPUTATION
 Partition arrays into chunks (a small subcube which fits in memory).
 Compressed sparse array addressing: (chunk_id, offset)
 Compute aggregates in “multiway” by visiting cube cells in the order
which minimizes the # of times to visit each cell, and reduces memory
access and storage cost.
A
B
29 30 31 32
1 2 3 4
5
9
13 14 15 16
64
63
62
61
48
47
46
45
a1
a0
c3
c2
c1
c 0
b3
b2
b1
b0
a2 a3
C
B
44
28 56
40
24 52
36
20
60
18
MULTI-WAY ARRAY AGGREGATION
FOR CUBE COMPUTATION
A
B
29 30 31 32
1 2 3 4
5
9
13 14 15 16
64
63
62
61
48
47
46
45
a1
a0
c3
c2
c1
c 0
b3
b2
b1
b0
a2 a3
C
44
28 56
40
24 52
36
20
60
B
19
MULTI-WAY ARRAY AGGREGATION
FOR CUBE COMPUTATION
A
B
29 30 31 32
1 2 3 4
5
9
13 14 15 16
64
63
62
61
48
47
46
45
a1
a0
c3
c2
c1
c 0
b3
b2
b1
b0
a2 a3
C
44
28 56
40
24 52
36
20
60
B

More Related Content

Similar to Lecture 8 is for best and you should read

CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfSowmyaJyothi3
 
Data Warehouse Implementation
Data Warehouse ImplementationData Warehouse Implementation
Data Warehouse Implementationomayva
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxSaiPragnaKancheti
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxSaiPragnaKancheti
 
New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
 
Lecture8 clustering
Lecture8 clusteringLecture8 clustering
Lecture8 clusteringsidsingh680
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.pptImXaib
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.pptRajeshT305412
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniquesIJDKP
 

Similar to Lecture 8 is for best and you should read (20)

datacub
datacubdatacub
datacub
 
Datacube
DatacubeDatacube
Datacube
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
Data Warehouse Implementation
Data Warehouse ImplementationData Warehouse Implementation
Data Warehouse Implementation
 
Birch1
Birch1Birch1
Birch1
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
K- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptxK- means clustering method based Data Mining of Network Shared Resources .pptx
K- means clustering method based Data Mining of Network Shared Resources .pptx
 
New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...New proximity estimate for incremental update of non uniformly distributed cl...
New proximity estimate for incremental update of non uniformly distributed cl...
 
Chapter 5.pdf
Chapter 5.pdfChapter 5.pdf
Chapter 5.pdf
 
Open Source Datawarehouse
Open Source DatawarehouseOpen Source Datawarehouse
Open Source Datawarehouse
 
Lecture8 clustering
Lecture8 clusteringLecture8 clustering
Lecture8 clustering
 
mod 2.pdf
mod 2.pdfmod 2.pdf
mod 2.pdf
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
Cassndra (4).pptx
Cassndra (4).pptxCassndra (4).pptx
Cassndra (4).pptx
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
lecture12-clustering.ppt
lecture12-clustering.pptlecture12-clustering.ppt
lecture12-clustering.ppt
 
clustering.pptx
clustering.pptxclustering.pptx
clustering.pptx
 
Column store databases approaches and optimization techniques
Column store databases  approaches and optimization techniquesColumn store databases  approaches and optimization techniques
Column store databases approaches and optimization techniques
 

Recently uploaded

PORTAFOLIO 2024_ ANASTASIYA KUDINOVA
PORTAFOLIO   2024_  ANASTASIYA  KUDINOVAPORTAFOLIO   2024_  ANASTASIYA  KUDINOVA
PORTAFOLIO 2024_ ANASTASIYA KUDINOVAAnastasiya Kudinova
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxmapanig881
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricksabhishekparmar618
 
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`dajasot375
 
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...Amil baba
 
306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social MediaD SSS
 
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxIntroduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxnewslab143
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree ttt fff
 
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130Suhani Kapoor
 
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证nhjeo1gg
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造kbdhl05e
 
shot list for my tv series two steps back
shot list for my tv series two steps backshot list for my tv series two steps back
shot list for my tv series two steps back17lcow074
 
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiVIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Architecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfArchitecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfSumit Lathwal
 
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一Fi sss
 
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一z xss
 
ARt app | UX Case Study
ARt app | UX Case StudyARt app | UX Case Study
ARt app | UX Case StudySophia Viganò
 
Design Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryDesign Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryWilliamVickery6
 

Recently uploaded (20)

PORTAFOLIO 2024_ ANASTASIYA KUDINOVA
PORTAFOLIO   2024_  ANASTASIYA  KUDINOVAPORTAFOLIO   2024_  ANASTASIYA  KUDINOVA
PORTAFOLIO 2024_ ANASTASIYA KUDINOVA
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptx
 
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 nightCheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 night
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricks
 
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`
Abu Dhabi Call Girls O58993O4O2 Call Girls in Abu Dhabi`
 
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...
NO1 Famous Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi Add...
 
306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media
 
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptxIntroduction-to-Canva-and-Graphic-Design-Basics.pptx
Introduction-to-Canva-and-Graphic-Design-Basics.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
 
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证
在线办理ohio毕业证俄亥俄大学毕业证成绩单留信学历认证
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造
 
shot list for my tv series two steps back
shot list for my tv series two steps backshot list for my tv series two steps back
shot list for my tv series two steps back
 
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiVIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
 
Architecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdfArchitecture case study India Habitat Centre, Delhi.pdf
Architecture case study India Habitat Centre, Delhi.pdf
 
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
(办理学位证)埃迪斯科文大学毕业证成绩单原版一比一
 
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk GurgaonCheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Iffco Chowk Gurgaon
 
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
办理(UC毕业证书)查尔斯顿大学毕业证成绩单原版一比一
 
ARt app | UX Case Study
ARt app | UX Case StudyARt app | UX Case Study
ARt app | UX Case Study
 
Design Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William VickeryDesign Portfolio - 2024 - William Vickery
Design Portfolio - 2024 - William Vickery
 

Lecture 8 is for best and you should read

  • 2. LECTURE 8 TOPICS TO BE COVERED:  Complex aggregation at multiple granularities
  • 3. DATA GENERALIZATION  Data generalization is a process that abstracts a large set of task-relevant data in a database from a relatively low conceptual level to higher conceptual levels.  Data generalization approaches include data cube–based data aggregation and attribute oriented induction.  From a data analysis point of view, data generalization is a form of descriptive data mining. Descriptive data mining describes data in a concise and summarative manner and presents interesting general properties of the data.
  • 4. EFFICIENT METHODS FOR DATA CUBE COMPUTATION  A data cube consists of a lattice of cuboids. Each cuboid corresponds to a different degree of summarization of the given multidimensional data.  Data cube computation is an essential task in data warehouse implementation. The precomputation of all or part of a data cube can greatly reduce the response time and enhance the performance of on-line analytical processing.
  • 5. CUBE MATERIALIZATION • Full Cube • Iceberg Cube • Closed Cube • Shell Cube
  • 6. FULL CUBE  Full Cube:-all the cells of all of the cuboids for a given data cube. Thus, precomputation of the full cube can require huge and often excessive amounts of memory.  Full materialization refers to the computation of all of the cuboids in a data cube lattice.
  • 7. ICEBERG CUBE • Iceberg Cube:-partially materialized cubes are known as iceberg cubes. The minimum threshold is called the minimum support threshold, or minimum support(min sup) • Partial materialization refers to the selective computation of a subset of the cuboid cells in the lattice. Iceberg cubes and shell fragments are examples of partial materialization. • An iceberg cube is a data cube that stores only those cube cells whose aggregate value (e.g., count) is above some minimum support threshold.
  • 8. ICEBERG CUBE(EXAMPLE)  An iceberg cube can be specified with an SQL query, as shown in the following example. compute cube sales iceberg as select month, city, customer group, count(*) from salesInfo cube by month, city, customer group having count(*) >= min sup
  • 9. CLOSED CUBE • To systematically compress a data cube, we need closed cell. • A closed cube is a data cube consisting of only closed cells. • A cell, c, is a closed cell if there exists no cell, d, such that d is a specialization (descendant) of cell c (i.e, where d is obtained by replacing a in c with a non- value), and d has the same measure value as c. • For example, the three cells derived above are the three closed cells of the data cube for the data set: {(a1, a2, a3, - - - -, a100) : 10, (a1, a2, b3, - - - -, b100) : 10}.They form the lattice of a closed cube as shown in Figure. Other nonclosed cells can be derived from their corresponding closed cells in this lattice. For example, “(a1, * ,*, - - - - , *) : 20” can be derived from “(a1, a2, - - - -,* ) : 20” because the former is a generalized nonclosed cell of the latter. Similarly, we have “(a1, a2, b3, *,- - - -,* ) : 10”.
  • 11. SHELL CUBE  Another strategy for partial materialization is to precompute only the cuboids involving a small number of dimensions, These cuboids form a cube shell for the corresponding data cube  For shell fragments of a data cube, only some cuboids involving a small number of dimensions are computed. Queries on additional ombinations of the dimensions can be computed on the fly.
  • 12. OPTIMIZATION TECHNIQUES FOR EFFICIENT COMPUTATION OF DATA CUBES  Following are general optimization techniques for the efficient computation of data cubes. 1. Sorting, hashing, and grouping. 2. Simultaneous aggregation and caching intermediate results. 3. Aggregation from the smallest child, when there exist multiple child cuboids. 4. The Apriori pruning method can be explored to compute iceberg cubes efficiently.
  • 13. DATA CUBE COMPUTATION METHODS 1. MultiWay array aggregation for full data computation. 2. BUC :Computing iceberg cubes by exploring ordering and sorting for efficient top-down computation. 3. Star-Cubing for integration of top-down and bottom-up computation using a star-tree structure. 4. High-dimensional OLAP by precomputing only the partitioned shell fragments.
  • 14. MULTIWAY ARRAY AGGREGATION FOR FULL DATA COMPUTATION  The Multiway Array Aggregation (or simply MultiWay) method computes a full data cube by using a multidimensional array as its basic data structure. It is a typical MOLAP approach that uses direct array addressing, where dimension values are accessed via the position or index of their corresponding array locations. Hence, MultiWay cannot perform any value-based reordering as an optimization technique.  A different approach is developed for the array-based cube construction, as follows:  Partition the array into chunks  Compute aggregates by visiting cube cells
  • 15. PARTITION THE ARRAY INTO CHUNKS  Partition the array into chunks: A chunk is a subcube that is small enough to fit into the memory available for cube computation. Chunking is a method for dividing an n-dimensional array into small n-dimensional chunks, where each chunk is stored as an object on disk. The chunks are compressed so as to remove wasted space resulting from empty array cells (i.e., cells that do not contain any valid data, whose cell count is zero). For instance, “chunkID + offset” can be used as a cell addressing mechanism to compress a sparse array structure and when searching for cells within a chunk. Such a compression technique is powerful enough to handle sparse cubes, both on disk and in memory.
  • 16. Multi-way Array Aggregation for Cube Computation
  • 17. 17 MULTI-WAY ARRAY AGGREGATION FOR CUBE COMPUTATION  Partition arrays into chunks (a small subcube which fits in memory).  Compressed sparse array addressing: (chunk_id, offset)  Compute aggregates in “multiway” by visiting cube cells in the order which minimizes the # of times to visit each cell, and reduces memory access and storage cost. A B 29 30 31 32 1 2 3 4 5 9 13 14 15 16 64 63 62 61 48 47 46 45 a1 a0 c3 c2 c1 c 0 b3 b2 b1 b0 a2 a3 C B 44 28 56 40 24 52 36 20 60
  • 18. 18 MULTI-WAY ARRAY AGGREGATION FOR CUBE COMPUTATION A B 29 30 31 32 1 2 3 4 5 9 13 14 15 16 64 63 62 61 48 47 46 45 a1 a0 c3 c2 c1 c 0 b3 b2 b1 b0 a2 a3 C 44 28 56 40 24 52 36 20 60 B
  • 19. 19 MULTI-WAY ARRAY AGGREGATION FOR CUBE COMPUTATION A B 29 30 31 32 1 2 3 4 5 9 13 14 15 16 64 63 62 61 48 47 46 45 a1 a0 c3 c2 c1 c 0 b3 b2 b1 b0 a2 a3 C 44 28 56 40 24 52 36 20 60 B