
A Hybrid Memory Data Cube Approach for High Dimension Relations

H-Frag is a method for data cube computation that extends the Frag-Cubing approach, enabling the computation of massive data cubes by using external memory rather than relying on main memory alone.

Published in: Data & Analytics


  1. 1. H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations. Rodrigo Rocha Silva (Doctoral Student); Prof. Dr. Celso Massaki Hirata (Advisor); Prof. Dr. Joubert de Castro Lima (Co-Advisor). Aeronautics Institute of Technology (ITA), Electronic Engineering and Computer Science Division (EEC/I), Department of Computer Science, Brazil.
  2. 2. Monday, April 27, 2015. H-Frag: A Hybrid Memory Data Cube Approach for High Dimension Relations. 17th International Conference on Enterprise Information Systems - Rodrigo Rocha Silva. What is H-Frag? H-Frag is a method for data cube computation that extends the Frag-Cubing approach, enabling the computation of massive data cubes by using external memory rather than relying on main memory alone.
  3. 3. Topics: Motivation; Data Cube; Frag-Cubing; H-Frag approach; Experiments; Results; Conclusions.
  4. 4. Motivation: Users need to view data in a tangible way. Reports, cross tables and dashboards are the tools most commonly used for visualizing data.
  5. 5. Motivation: Approaches that use inverted indexes, such as Frag-Cubing, are considered efficient in terms of runtime and main memory usage for massive data cube computation and querying. Approaches that use main memory only are limited when the data cube size exceeds the main memory capacity.
  6. 6. Data Cube: A data cube allows the materialization of all or some of the cells (tuples) of a cube, which is represented by measures and dimensions. Its runtime and storage space grow exponentially when the number of dimensions increases linearly: for an input with d dimensions, the output has size 2^d.
  7. 7. Data Cube: A dimension may contain a hierarchical relation between two or more members; that is, the individual members of a dimension may be hierarchically related to each other. (Figure labels: Subjects, Department, Year; Hour, Day, Year.)
  8. 8. Data Cube: Construction of a complete data cube is an exponential problem.

      Base relation R (11 tuples):

      A   B   C   COUNT
      a1  b1  c1  1
      a3  b3  c2  1
      a2  b3  c2  1
      a3  b1  c1  1
      a2  b1  c1  1
      a2  b2  c2  1
      a1  b1  c2  1
      a2  b2  c1  1
      a3  b1  c2  1
      a1  b3  c2  1
      a2  b1  c2  1

      Full 3-D cube (38 aggregations, including the 11 base tuples):

      A   B   C   COUNT
      *   *   *   11
      a1  *   *   3
      a2  *   *   5
      a3  *   *   3
      *   b1  *   6
      *   b2  *   2
      *   b3  *   3
      *   *   c1  4
      *   *   c2  7
      a1  b1  *   2
      a1  b3  *   1
      a2  b1  *   2
      a2  b2  *   2
      a2  b3  *   1
      a3  b1  *   2
      a3  b3  *   1
      a1  *   c1  1
      a1  *   c2  2
      a2  *   c1  2
      a2  *   c2  3
      a3  *   c1  1
      a3  *   c2  2
      *   b1  c1  3
      *   b1  c2  3
      *   b2  c1  1
      *   b2  c2  1
      *   b3  c2  3
      a1  b1  c1  1
      a3  b3  c2  1
      a2  b3  c2  1
      a3  b1  c1  1
      a2  b1  c1  1
      a2  b2  c2  1
      a1  b1  c2  1
      a2  b2  c1  1
      a3  b1  c2  1
      a1  b3  c2  1
      a2  b1  c2  1
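The full cube on slide 8 can be reproduced with a short sketch. This is only an illustration in Python (the slides do not give code, and the function name `full_cube` is hypothetical): every tuple contributes to each of the 2^d cells obtained by keeping or generalizing each of its d attribute values.

```python
from collections import Counter
from itertools import product

# Base relation R from slide 8 (11 tuples, each with COUNT = 1).
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]

def full_cube(rows):
    """Count every non-empty cell of the full cube; '*' means 'all values'."""
    cube = Counter()
    d = len(rows[0])
    for row in rows:
        # Each tuple feeds 2^d cells: keep or generalize each attribute.
        for mask in product((False, True), repeat=d):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube))               # number of non-empty cells
print(cube[("*", "*", "*")])   # apex cell
print(cube[("a2", "*", "c2")])
```

The naive enumeration makes the exponential cost visible: the inner loop runs 2^d times per tuple, which is what fragment-based approaches try to avoid for large d.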
  9. 9. Related Work: Frag-Cubing Approach
      • Splits the data vertically;
      • Reduces a high-dimensional cube into cuboids of lower dimension;
      • Offers tradeoffs between the data cube computation runtime and the pre-processing of aggregations.
      (Figure: the dimensions A, B, C, D, E, F, … are split into fragments, giving Cube ABC and Cube DEF.)
      Source: Han and Kamber, Data Mining: Concepts and Techniques.
  10. 10. Related Work: Frag-Cubing Example. For a 5-dimension relation, two shell fragments can be built: (A, B, C) and (D, E).

      tid  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a1  b2  c1  d2  e1
      3    a1  b2  c1  d1  e2
      4    a2  b1  c1  d1  e2
      5    a2  b1  c1  d1  e3

      Source: Han and Kamber, Data Mining: Concepts and Techniques.
  11. 11. Related Work: Frag-Cubing 1-D Inverted Indexes. Build a traditional inverted index or RID list.

      Attribute Value  TID List   List Size
      a1               1 2 3      3
      a2               4 5        2
      b1               1 4 5      3
      b2               2 3        2
      c1               1 2 3 4 5  5
      d1               1 3 4 5    4
      d2               2          1
      e1               1 2        2
      e2               3 4        2
      e3               5          1

      Source: Han and Kamber, Data Mining: Concepts and Techniques.
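A minimal sketch of how the 1-D inverted index on slide 11 could be built from the relation on slide 10. The function name `build_inverted_index` is hypothetical, and the sketch assumes attribute values are unique across dimensions, as they are in this example.

```python
from collections import defaultdict

# The 5-tuple relation from slide 10: tid -> values of A..E.
RELATION = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(relation):
    """Map each attribute value to the sorted list of tids containing it."""
    index = defaultdict(list)
    for tid in sorted(relation):
        for value in relation[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(RELATION)
for value in sorted(index):
    # Prints: attribute value, TID list, list size.
    print(value, index[value], len(index[value]))
```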
  12. 12. Related Work: Frag-Cubing Approach. The 1-D inverted indexes are generalized to multi-dimensional ones in the data cube sense. All cuboids of the fragment cubes ABC and DE are computed while the inverted indexes are retained. For example, shell fragment cube ABC contains 7 cuboids: A, B, C; AB, AC, BC; ABC.

      Cell      Intersection           TID List  List Size
      (a1, b1)  {1, 2, 3} ∩ {1, 4, 5}  {1}       1
      (a1, b2)  {1, 2, 3} ∩ {2, 3}     {2, 3}    2
      (a2, b1)  {4, 5} ∩ {1, 4, 5}     {4, 5}    2
      (a2, b2)  {4, 5} ∩ {2, 3}        {}        0

      This completes the offline computation stage. Frag-Cubing computes only the cuboids of each fragment during the data cube computation.
      Source: Han and Kamber, Data Mining: Concepts and Techniques.
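The intersection table on slide 12 can be sketched as follows. This is an illustrative Python fragment, not code from the slides; `intersect` and `cuboid_2d` are hypothetical helper names.

```python
from itertools import product

# 1-D inverted indexes for dimensions A and B (values from slide 11).
A = {"a1": [1, 2, 3], "a2": [4, 5]}
B = {"b1": [1, 4, 5], "b2": [2, 3]}

def intersect(left, right):
    """Intersect two sorted tid-lists, keeping the result sorted."""
    right_set = set(right)
    return [tid for tid in left if tid in right_set]

def cuboid_2d(dim_x, dim_y):
    """Build a 2-D cuboid: one tid-list per pair of attribute values."""
    return {
        (x, y): intersect(dim_x[x], dim_y[y])
        for x, y in product(dim_x, dim_y)
    }

ab = cuboid_2d(A, B)
for cell, tids in ab.items():
    print(cell, tids, len(tids))
```

Empty intersections such as (a2, b2) are kept here only to mirror the table; an implementation would probably drop them to save space.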
  13. 13. Related Work: Frag-Cubing Measure Table. If measures other than count are present, they are stored in an ID_measure table, separate from the shell fragments.

      tid  count  sum
      1    5      70
      2    3      10
      3    8      20
      4    5      40
      5    2      30

      Source: Han and Kamber, Data Mining: Concepts and Techniques.
  14. 14. Related Work: Frag-Cubing Query (online computation). Once the data cube has been computed into fragments, the query process follows these steps:
      • Divide the query into fragments;
      • Fetch the corresponding tid-list for each fragment from the fragment cube;
      • Intersect the tid-lists from each fragment to construct an instantiated base table;
      • Compute the data cube from the base table with any cubing algorithm.
      Source: Han and Kamber, Data Mining: Concepts and Techniques.
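The first three query steps can be sketched for a simple point query. This is a hedged illustration: the dictionary layout, the names `FRAGMENT_ABC`, `FRAGMENT_DE` and `answer_point_query`, and the choice of summing both measure columns are assumptions; the tid-lists and measures come from slides 11 to 13. The final step (running a cubing algorithm over the instantiated base table) is not shown.

```python
# Tid-lists as they might be fetched from two fragment cubes, and the
# ID_measure table from slide 13 (tid -> (count, sum)).
FRAGMENT_ABC = {("a2",): [4, 5], ("a1", "b2"): [2, 3]}
FRAGMENT_DE = {("e2",): [3, 4], ("d1",): [1, 3, 4, 5]}
MEASURES = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}

def answer_point_query(tid_lists, measures):
    """Intersect the per-fragment tid-lists, then aggregate the measures."""
    tids = set(tid_lists[0])
    for other in tid_lists[1:]:
        tids &= set(other)
    total_count = sum(measures[t][0] for t in tids)
    total_sum = sum(measures[t][1] for t in tids)
    return sorted(tids), total_count, total_sum

# Query (a2, *, *, *, e2): one tid-list per fragment.
result = answer_point_query(
    [FRAGMENT_ABC[("a2",)], FRAGMENT_DE[("e2",)]], MEASURES
)
print(result)
```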
  15. 15. H-Frag Approach
      • Implements a set of tuple identifiers per dimension attribute, similar to Frag-Cubing;
      • Allows larger cubes by using external memory to store some of the computed cubes, rather than relying on main memory only.
      The main challenge of using external memory is defining the criteria that select which fragments of the cube should stay in main memory. H-Frag selects the fragments to be stored in main memory according to the attribute value frequencies and the dimension cardinalities.
  16. 16. H-Frag Architecture
  17. 17. H-Frag Computation: First, the computation component scans the Entry Relation completely to obtain the frequency of each attribute value in each dimension. Then the average frequency is obtained, and attribute values with frequencies lower than the average are marked to be stored in external memory.
  18. 18. H-Frag Computation: Example

      tid  A   B   C   M1     M2
      1    a1  b1  c1  1.5    1
      2    a2  b2  c2  2.5    1
      3    a2  b2  c2  2      3
      4    a3  b3  c2  78.5   2
      5    a1  b1  c1  100    5
      6    a2  b1  c2  102.5  4
      7    a3  b1  c1  100    2
      8    a1  b3  c2  22.5   3
      9    a1  b3  c2  13.89  8

      The frequency of each attribute value is: fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
  19. 19. H-Frag Computation: First Step. The average frequency is 3 in dimensions A and B, and 4.5 in dimension C (let's consider 4):

      fa1=4, fa2=3, fa3=2  ->  (4+3+2)/3 = 3
      fb1=4, fb2=2, fb3=3  ->  (4+2+3)/3 = 3
      fc1=3, fc2=6         ->  (3+6)/2 = 4.5

      The attribute values a3, b2, b3 and c1 are marked to be stored in external memory because they are below the average. Note that b3 has frequency 3, which equals the average of dimension B rather than falling below it.
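The first scan can be sketched as below. Two things here are assumptions rather than statements from the slides: the function name `mark_for_external_memory`, and the use of "less than or equal to the average" as the marking rule. The slides say "lower than the average", but the worked example on the later slides also places values whose frequency equals the average (a2 and b3) in external memory, so the sketch follows the example.

```python
from collections import Counter

# Example relation from slide 18: tid -> (A, B, C).
RELATION = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}

def mark_for_external_memory(relation):
    """First scan: per-dimension averages and the values marked for disk."""
    n_dims = len(next(iter(relation.values())))
    marked, averages = set(), []
    for dim in range(n_dims):
        freq = Counter(row[dim] for row in relation.values())
        average = sum(freq.values()) / len(freq)
        averages.append(average)
        # Assumption: values AT the average are marked too (see lead-in).
        marked |= {value for value, f in freq.items() if f <= average}
    return averages, marked

averages, marked = mark_for_external_memory(RELATION)
print(averages)
print(sorted(marked))
```

With a strict comparison instead, only a3, b2 and c1 would be marked, which matches neither list in the slides exactly; the rule is worth confirming against the paper.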
  20. 20. H-Frag Computation: Second Step. The computation component scans the Entry Relation a second time to select the attribute values to be stored in external memory. Each selected attribute value is stored in external memory together with its tid-list. H-Frag splits the Entry Relation into complementary portions defined by the user, with several tuples in each portion. A single attribute value can have several complementary tid-lists in external memory, since RAM can fill up.
  21. 21. H-Frag Computation: Second Step. To avoid storing attribute values with a low number of tids in external memory, H-Frag defines an occurrence percentage for each attribute value inside a portion. Entry Relation, with a portion size of 4:

      1  a1  b1  c1  d1  e1  f1  g1  h1
      2  a1  b2  c2  d1  e2  f2  g2  h2
      3  a3  b8  c3  d3  e4  f5  g6  h7
      4  a5  b6  c5  d5  e4  f5  g5  h6
      5  a9  b9  c9  d9  e9  f9  g9  h9
      6  a9  b4  c4  d4  e3  f4  g4  h4
      7  a5  b7  c7  d7  e7  f7  g7  h7
      8  a7  b7  c7  d7  e7  f7  g7  h7

      First portion (tids 1 to 4): the tid-lists of a1 and d1 are stored, then those of e4 and f5.
      Second portion (tids 5 to 8): the tid-list of a9 is stored, and so on.

      Each attribute value that occurs in at least 50% of the processed tuples, relative to the total number of tuples in the portion, has its tid-list stored in external memory. Each portion should fit entirely in main memory.
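One possible reading of the portion rule on slide 21, sketched in Python. The function name `tid_lists_to_store` and its parameters are hypothetical, and the sketch only reports which tid-lists qualify in each portion; it does not model what happens to the values that stay below the threshold.

```python
from collections import defaultdict

# Entry relation from slide 21: tid -> values of dimensions A..H.
ENTRY = {
    1: "a1 b1 c1 d1 e1 f1 g1 h1", 2: "a1 b2 c2 d1 e2 f2 g2 h2",
    3: "a3 b8 c3 d3 e4 f5 g6 h7", 4: "a5 b6 c5 d5 e4 f5 g5 h6",
    5: "a9 b9 c9 d9 e9 f9 g9 h9", 6: "a9 b4 c4 d4 e3 f4 g4 h4",
    7: "a5 b7 c7 d7 e7 f7 g7 h7", 8: "a7 b7 c7 d7 e7 f7 g7 h7",
}

def tid_lists_to_store(entry, portion_size=4, min_share=0.5):
    """Per portion, pick the values present in >= min_share of its tuples."""
    tids = sorted(entry)
    stored = []
    for start in range(0, len(tids), portion_size):
        portion = tids[start:start + portion_size]
        lists = defaultdict(list)
        for tid in portion:
            for value in entry[tid].split():
                lists[value].append(tid)
        stored.append({
            value: tid_list for value, tid_list in lists.items()
            if len(tid_list) >= min_share * len(portion)
        })
    return stored

first, second = tid_lists_to_store(ENTRY)
print(first)
print(second)
```

For the first portion this yields the tid-lists of a1, d1, e4 and f5, as on the slide; for the second it yields a9 plus the values shared by tuples 7 and 8.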
  22. 22. H-Frag Computation: Second Step. If 80% of the available working memory is in use, all the tid-lists of the processed attribute values and all measure values are stored in external memory. (Figure: working memory, 20%; a1 = {1, .. 4}, a2 = {2, .. 8}, b1 = {1, .. 3}, c1 = {3, .. 7}, c2 = {2, .. 4}; all stored.) This way, H-Frag eliminates the problem that arises when many attribute values are below 50% of a portion, which can happen in relations with high cardinality and low skew.
  23. 23. H-Frag Computation: Measure Values. The measure values are grouped by portion, and each group of measure values is identified by a tid interval (range). This way, H-Frag generates only a few files with the measure values. For example, when a portion of 10 tuples whose initial tid is 1 is processed, a file with measure values identified as 1_10 is generated.
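The grouping of measure values by tid range can be sketched as follows, using the measures of the example relation on slide 18. The function name `group_measures` and the in-memory dictionary standing in for files are assumptions made for illustration.

```python
# Measure values (M1, M2) per tid, from the example relation on slide 18.
MEASURES = {
    1: (1.5, 1), 2: (2.5, 1), 3: (2, 3), 4: (78.5, 2), 5: (100, 5),
    6: (102.5, 4), 7: (100, 2), 8: (22.5, 3), 9: (13.89, 8),
}

def group_measures(measures, portion_size):
    """Group measure rows by portion, keyed by '<first tid>_<last tid>'."""
    tids = sorted(measures)
    groups = {}
    for start in range(0, len(tids), portion_size):
        portion = tids[start:start + portion_size]
        group_id = f"{portion[0]}_{portion[-1]}"
        groups[group_id] = {tid: measures[tid] for tid in portion}
    return groups

groups = group_measures(MEASURES, portion_size=3)
print(list(groups))
```

Finding the measures of a given tid then only requires locating the group whose range contains it, rather than scanning every file.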
  24. 24. H-Frag Computation: Third Step. The Entry Relation is scanned a third time. The result is a map with the most frequent attribute values of the relation and their tid-lists. This map is kept in main memory.
  25. 25. Example of the computing process, using the relation from slide 18. Recall that the frequencies of the attribute values are: a1=4, a2=3, a3=2, b1=4, b2=2, b3=3, c1=3 and c2=6.
  26. 26. Example of the computing process: a tid-sublist is stored each time the attribute value is associated with 50% or more of the tids of the defined portions.

      Attribute values in external memory:

      Attribute Value  Tids
      a2               2, 3
      a2               6
      a3               4, 7
      b2               2, 3
      b3               4, 8
      b3               9
      c1               1, 5, 7
  27. 27. Example of the computing process: in this example, let's consider a portion size of 2. The attribute value a2, whose frequency is 3, has one sublist stored with tids 2 and 3 and another sublist with tid 6.
  28. 28. Example of the computing process: in this example, let's consider a portion size of 2. The attribute value c1, whose frequency is 3, has only one tid-list stored in external memory, with three tids.
  29. 29. Example of the computing process: the most frequent attribute values are kept in main memory.

      Frequent attribute values in main memory:

      Attribute Value  Tids
      a1               1, 5, 8, 9
      b1               1, 5, 6, 7
      c2               2, 3, 4, 6, 8, 9

      Frequencies of the attribute values: fa1=4, fa2=3, fa3=2, fb1=4, fb2=2, fb3=3, fc1=3 and fc2=6.
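A sketch of the third scan, which builds the main-memory map for the values that were not sent to external memory. The set `EXTERNAL` is copied from the worked example; the function name `main_memory_map` is hypothetical.

```python
from collections import defaultdict

# Example relation from slide 18: tid -> (A, B, C).
RELATION = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}
# Values the worked example shows in external memory.
EXTERNAL = {"a2", "a3", "b2", "b3", "c1"}

def main_memory_map(relation, external_values):
    """Third scan: tid-lists of the values that stay in main memory."""
    in_memory = defaultdict(list)
    for tid in sorted(relation):
        for value in relation[tid]:
            if value not in external_values:
                in_memory[value].append(tid)
    return dict(in_memory)

memory_map = main_memory_map(RELATION, EXTERNAL)
print(memory_map)
```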
  31. 31. Example of the computing process: the Group ID identifies the tid range of the processed tuples.

      Measure values relation in external memory:

      tid  M1     M2  Group ID
      1    1.5    1   1_3
      2    2.5    1   1_3
      3    2      3   1_3
      4    78.5   2   4_6
      5    100    5   4_6
      6    102.5  4   4_6
      7    100    2   7_9
      8    22.5   3   7_9
      9    13.89  8   7_9
  32. 32. H-Frag Update: uses the same algorithm as the H-Frag computation.
  33. 33. H-Frag Update: New Tuples. Records in main memory are updated with the new tids, or are replaced by attribute values that have become more frequent. New records are created in external memory.

      New tuples appended to the relation:

      tid  A   B   C   M1   M2
      10   a4  b4  c4  3    7
      11   a3  b3  c1  4.7  12
      12   a1  b1  c2  5.5  6

      Main memory after the update:

      Attribute Value  Tids
      a1               1, 5, 8, 9, 12
      b1               1, 5, 6, 7, 12
      c2               2, 3, 4, 6, 8, 9, 12

      External memory after the update:

      Attribute Value  Tids
      a2               2, 3
      a2               6
      a3               4, 7
      a4               10
      a3               11
      b2               2, 3
      b3               4, 8
      b3               9
      b3               11
      b4               10
      c1               1, 5, 7
      c1               11
      c4               10
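The append part of the update on slide 33 can be sketched as below. The function name `apply_update` is hypothetical, and the sketch deliberately leaves out the case where a value becomes frequent enough to replace a record in main memory.

```python
# Main-memory map before the update and the new tuples from slide 33.
MAIN_MEMORY = {
    "a1": [1, 5, 8, 9],
    "b1": [1, 5, 6, 7],
    "c2": [2, 3, 4, 6, 8, 9],
}
NEW_TUPLES = {
    10: ("a4", "b4", "c4"),
    11: ("a3", "b3", "c1"),
    12: ("a1", "b1", "c2"),
}

def apply_update(main_memory, new_tuples):
    """Append new tids in RAM; collect new external records for the rest."""
    external_records = []
    for tid in sorted(new_tuples):
        for value in new_tuples[tid]:
            if value in main_memory:
                main_memory[value].append(tid)
            else:
                # One (value, tid-sublist) record per occurrence here; a
                # real implementation would presumably batch per portion.
                external_records.append((value, [tid]))
    return external_records

records = apply_update(MAIN_MEMORY, NEW_TUPLES)
print(MAIN_MEMORY)
print(records)
```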
  34. 34. H-Frag Update: attribute values are merged. Suppose that attribute values a2 and a3 are merged into attribute value a9, so that a2 + a3 = a9 : {2, 3, 6, 4, 7}. The attribute value a9 then has the highest frequency and replaces attribute value a1 in main memory. Therefore, a1 is stored in external memory.

      Main memory after the merge:

      Attribute Value  Tids
      a9               2, 3, 6, 4, 7
      b1               1, 5, 6, 7
      c2               2, 3, 4, 6, 8, 9

      External memory after the merge:

      Attribute Value  Tids
      a1               1, 5, 8, 9
      b2               2, 3
      b3               4, 8
      b3               9
      c1               1, 5, 7
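A sketch of the merge on slide 34, restricted to dimension A. The function name `merge_and_rebalance` is hypothetical, and keeping exactly one value per dimension in main memory is an assumption taken from this small example, not a rule stated in the slides.

```python
# State of dimension A before the merge (tid-sublists joined per value).
main_memory = {"a1": [1, 5, 8, 9]}
external = {"a2": [2, 3, 6], "a3": [4, 7]}

def merge_and_rebalance(main_memory, external, parts, merged_name):
    """Merge `parts` into one value, then keep the most frequent in RAM."""
    merged = []
    for value in parts:
        source = main_memory if value in main_memory else external
        merged.extend(source.pop(value))
    candidates = {**main_memory, **external, merged_name: merged}
    # Assumption: one value per dimension stays in RAM, as in the example.
    top = max(candidates, key=lambda v: len(candidates[v]))
    new_main = {top: candidates.pop(top)}
    return new_main, candidates

new_main, new_external = merge_and_rebalance(
    main_memory, external, ["a2", "a3"], "a9"
)
print(new_main)
print(new_external)
```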
  35. 35. H-Frag Update: new dimensions and measures. The computing algorithm processes only the new dimensions and measures (here, dimension D and measure M3).

      tid  A   B   C   D   M1     M2  M3
      1    a1  b1  c1  d1  1.5    1   6
      2    a2  b2  c2  d1  2.5    1   5.66
      3    a2  b2  c2  d1  2      3   78.98
      4    a3  b3  c2  d1  78.5   2   2.98
      5    a1  b1  c1  d3  100    5   1.65
      6    a2  b1  c2  d2  102.5  4   2.69
      7    a3  b1  c1  d1  100    2   6.87
      8    a1  b3  c2  d3  22.5   3   98.999
      9    a1  b3  c2  d2  13.89  8   78.995

      Frequent attribute values in main memory:

      Attribute Value  Tids
      a1               1, 5, 8, 9
      b1               1, 5, 6, 7
      c2               2, 3, 4, 6, 8, 9
      d1               1, 2, 3, 4, 7

      Attribute values in external memory:

      Attribute Value  Tids
      a2               2, 3
      a2               6
      a3               4, 7
      b2               2, 3
      b3               4, 8
      b3               9
      c1               1, 5, 7
      d2               6, 9
      d3               5, 8

      The measure values relation in external memory holds the columns M1, M2 and M3 for tids 1 to 9, with the values shown in the relation above.
  36. 36. H-Frag Range and Inquire Query: The H-Frag approach enables point queries and range queries: rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn)). It also allows inquire queries such as sub-cube and distinct: iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn)).
  37. 37. H-Frag Range and Inquire Query: To achieve better performance, H-Frag orders sub-cube queries so that the queries generating fewer intersections always run first. The result of a query Q is qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
  38. 38. H-Frag Query: For each type of query, H-Frag checks whether each attribute value is stored in external memory when fetching the tid-lists of the attribute values that meet the user's query. An intersection of those lists is then performed, which produces the final result of the query.
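The lookup-then-intersect flow can be sketched with the worked example's data. Everything here is illustrative: `EXTERNAL` is an in-memory stand-in for the on-disk store, the function names are hypothetical, and intersecting the shortest tid-list first is a common heuristic chosen for the sketch rather than something the slides specify.

```python
MAIN_MEMORY = {
    "a1": [1, 5, 8, 9], "b1": [1, 5, 6, 7], "c2": [2, 3, 4, 6, 8, 9],
}
# Stand-in for the on-disk store: value -> list of tid-sublists.
EXTERNAL = {
    "a2": [[2, 3], [6]], "a3": [[4, 7]], "b2": [[2, 3]],
    "b3": [[4, 8], [9]], "c1": [[1, 5, 7]],
}

def fetch_tid_list(value):
    """Return the full tid-list, from RAM or by joining external sublists."""
    if value in MAIN_MEMORY:
        return MAIN_MEMORY[value]
    return sorted(tid for sublist in EXTERNAL[value] for tid in sublist)

def point_query(values):
    """Intersect tid-lists, shortest first to keep intermediates small."""
    lists = sorted((fetch_tid_list(v) for v in values), key=len)
    result = set(lists[0])
    for tids in lists[1:]:
        result &= set(tids)
    return sorted(result)

print(point_query(["a2", "c2"]))
print(point_query(["a1", "b3", "c2"]))
```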
  39. 39. H-Frag Query: example. A query such as q = {?, ?, c2}, with two inquire operators, would be executed in SQL as follows:

      SELECT a, '*', 'c2', COUNT(a) FROM TABLE WHERE c = 'c2' GROUP BY 1, 2, 3
      UNION
      SELECT '*', b, 'c2', COUNT(b) FROM TABLE WHERE c = 'c2' GROUP BY 1, 2, 3
      UNION
      SELECT a, b, 'c2', COUNT(*) FROM TABLE WHERE c = 'c2' GROUP BY 1, 2, 3;
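The three group-bys of the SQL statement can be mimicked in a few lines of Python. Applying the query to the example relation from slide 18 is an assumption made for illustration (the slide does not say which table the SQL runs against), and the function name `subcube` is hypothetical.

```python
from collections import Counter

# Example relation from slide 18: tid -> (A, B, C).
RELATION = {
    1: ("a1", "b1", "c1"), 2: ("a2", "b2", "c2"), 3: ("a2", "b2", "c2"),
    4: ("a3", "b3", "c2"), 5: ("a1", "b1", "c1"), 6: ("a2", "b1", "c2"),
    7: ("a3", "b1", "c1"), 8: ("a1", "b3", "c2"), 9: ("a1", "b3", "c2"),
}

def subcube(relation, c_value):
    """Counts for (A,*,c), (*,B,c) and (A,B,c) over tuples with C = c."""
    result = Counter()
    for a, b, c in relation.values():
        if c != c_value:
            continue
        result[(a, "*", c)] += 1   # first SELECT: group by A
        result[("*", b, c)] += 1   # second SELECT: group by B
        result[(a, b, c)] += 1     # third SELECT: group by A and B
    return result

cells = subcube(RELATION, "c2")
for cell in sorted(cells):
    print(cell, cells[cell])
```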
  40. 40. Experiments
      • We compared the H-Frag computation and query algorithms against the Frag-Cubing algorithm used in [Li et al. 2004];
      • The H-Frag algorithms were coded in 64-bit Java;
      • Frag-Cubing is a free and open source C++ application (http://illimine.cs.uiuc.edu/);
      • The synthetic base relations were created with the data generator provided by the IlliMine project;
      • The IlliMine project is an open-source project that provides various approaches for data mining and machine learning;
      • The Frag-Cubing approach is part of the IlliMine project;
      • We ran the algorithms on two six-core Intel Xeon processors at 2.4 GHz per core, with 12 MB cache and 128 GB of DDR3 1333 MHz RAM;
      • The system runs 64-bit Windows Server 2008, High Performance version.
  41. 41. Results: Performance Evaluation of Point Queries. Response time per query over 100 trials, with T = 10^7, C = 10^4, D = 30 and S = 0. On average, point queries were answered 3 times slower by the H-Frag approach than by the Frag-Cubing approach.
  42. 42. Results: Performance Evaluation of Inquire Operators. Response time of queries with inquire operators, with T = 10^7, C = 10^4, D = 30 and S = 0. Queries with two inquire operators were answered by the H-Frag approach about 2.5 times slower than by the Frag-Cubing approach. The Frag-Cubing approach could not run queries with three inquire operators because of memory overflow.
  43. 43. Results: relations with different numbers of dimensions were computed, with T = 10^7, C = 10^4, D = 30 and S = 0. The runtime was linear in both approaches. On average, the hybrid memory usage caused the H-Frag approach to consume 3 times less main memory.
  44. 44. Results: Massive Data Cube. One relation with T = 10^9 tuples was computed by the H-Frag approach. This experiment took 64 hours and consumed 126 GB of RAM. Queries with five range operators, ten point operators and one inquire operator were answered in less than 35 seconds. Data cubes with a high number of tuples could not be computed by the Frag-Cubing approach using main memory only. This was demonstrated by trying to compute a base relation with 200 million tuples and 60 dimensions.
  45. 45. Conclusions
      • H-Frag has linear runtime and memory consumption, similar to Frag-Cubing;
      • When compared to Frag-Cubing, H-Frag is faster at answering sub-cube queries;
      • It introduces a different cube representation with fewer empty cells than Frag-Cubing;
      • Frag-Cubing cannot answer queries with two sub-cube operators in a data cube with 200 million tuples, C = 10^4, D = 30 and S = 0;
      • We had scenarios where the Frag-Cubing approach failed to compute the data cube due to lack of main memory.
  46. 46. Conclusions: interesting research directions to further extend H-Frag:
      • First, H-Frag must be evaluated with holistic measures;
      • Top-k queries are part of our interest, since the inverted index is also useful for this type of problem;
      • Multicore and multicomputer versions of H-Frag must be implemented.
  47. 47. Acknowledgements: Thank you very much. E-mail: rrochas@gmail.com
