qCube: Efficient integration of range query operators over a high dimension data cube
Rodrigo Rocha Silva, Doctorate Student
Present a new cube approach, designed for high dimension range queries. Our cube approach, named Query Cube (qCube), implements Equal, Not Equal, Greater or Less than, Some, Between and Similar range query operators and Distinct, Sub-cube and Top-k Similar inquire query operators

Transcript of "qCube: Efficient integration of range query operators over a high dimension data cube"

  1. qCube: Efficient integration of range query operators over a high dimension data cube
     Rodrigo Rocha Silva, Doctorate Student
     Prof. Dr. Celso Massaki Hirata, Advisor
     Prof. Dr. Joubert de Castro Lima, Co-Advisor
     ITA (Instituto Tecnológico de Aeronáutica), Electronic Engineering and Computer Science Division (EEC/I), Department of Computer Science, Brazil
  2. Goal
     Present a new cube approach designed for high-dimension range queries. The approach, named Query Cube (qCube), implements the Equal, Not Equal, Greater or Less than, Some, Between and Similar range query operators and the Distinct, Sub-cube and Top-k Similar inquire query operators.
     Slide footer: Wednesday, October 02, 2012, 28º Simpósio Brasileiro de Banco de Dados, Rodrigo Rocha Silva
  3. Topics
     – Motivation
     – Data Cube
     – Related Work
     – Query Cube (qCube)
     – Experiments
     – Results
     – Conclusions
  4. Motivation
     Users need to view data in a tangible way, such as reports, cross tables and histograms.
  5. Motivation
     • Suppose that a decision-making process requires the following information:
       “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries”
       “The average temperatures above 30 degrees Celsius on the weekends of leap years in the last 200 years.”
  6. Data Cube
     A data cube, introduced by Gray et al. (1996), is a generalization of the group-by operator over all possible combinations of dimensions, with aggregates at various granularities.
  7. Data Cube
     A data cube has exponential complexity with respect to the number of dimensions.
     For an input with d dimensions, the output has 2^d cuboids (one group-by per subset of the dimensions).
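The exponential growth can be made concrete with a short sketch. The Python below is illustrative only (the helper name `cuboids` is an assumption, not something from the slides): it enumerates every subset of a list of dimensions, each subset being one group-by, and shows that d dimensions give 2^d cuboids.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions; each subset is one group-by (cuboid)."""
    dims = list(dimensions)
    return [combo for k in range(len(dims) + 1) for combo in combinations(dims, k)]

for d in (3, 5, 10):
    names = [f"D{i}" for i in range(d)]
    # The number of cuboids doubles with every extra dimension.
    print(d, len(cuboids(names)), 2 ** d)
```

The empty subset corresponds to the apex cuboid, where every dimension is aggregated away.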
  8. Data Cube
     • Hierarchies
     [Figure: hierarchy diagram; node labels include Year, Discipline, Day, Department and Hour.]
  9. Data Cube
     Base relation R (11 tuples):
       A   B   C
       a1  b1  c1
       a3  b3  c2
       a2  b3  c2
       a3  b1  c1
       a2  b1  c1
       a2  b2  c2
       a1  b1  c2
       a2  b2  c1
       a3  b1  c2
       a1  b3  c2
       a2  b1  c2
     Full 3-D cube (38 tuples; each cell is written as (A, B, C) followed by its COUNT), grouped by cuboid:
       apex: (*, *, *) 11
       A:    (a1, *, *) 3; (a2, *, *) 5; (a3, *, *) 3
       B:    (*, b1, *) 6; (*, b2, *) 2; (*, b3, *) 3
       C:    (*, *, c1) 4; (*, *, c2) 7
       AB:   (a1, b1, *) 2; (a1, b3, *) 1; (a2, b1, *) 2; (a2, b2, *) 2; (a2, b3, *) 1; (a3, b1, *) 2; (a3, b3, *) 1
       AC:   (a1, *, c1) 1; (a1, *, c2) 2; (a2, *, c1) 2; (a2, *, c2) 3; (a3, *, c1) 1; (a3, *, c2) 2
       BC:   (*, b1, c1) 3; (*, b1, c2) 3; (*, b2, c1) 1; (*, b2, c2) 1; (*, b3, c2) 3
       ABC:  the 11 tuples of R, each with COUNT 1
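As a check on this example, the following sketch computes the full cube of R with COUNT as the aggregate. It is a naive illustration, not an algorithm from the slides; the function name `full_cube` is an assumption. Each tuple contributes to every cell obtained by keeping or generalizing each of its dimension values.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide (11 tuples over dimensions A, B, C).
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]

def full_cube(rows):
    """COUNT for every non-empty cell of every cuboid; '*' means 'all values'."""
    cube = Counter()
    for row in rows:
        # Keep or generalize each dimension: 2^d cells per tuple.
        for mask in product((True, False), repeat=len(row)):
            cell = tuple(v if keep else "*" for v, keep in zip(row, mask))
            cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube))               # number of non-empty cells in the full cube
print(cube[("*", "*", "*")])   # COUNT of the apex cell
```

The cell count and the apex COUNT agree with the 38 tuples and the value 11 shown on the slide.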
  10. Related Work – Frag-Cubing Approach
     • Partitions the data vertically
     • Reduces a high-dimensional cube into a set of lower-dimensional cubes
     • Lossless reduction
     • Offers tradeoffs between the amount of pre-processing and the speed of online computation
     From Han and Kamber, Data Mining: Concepts and Techniques
  11. Related Work – Frag-Cubing Example
     • Let the cube aggregation function be count
       tid  A   B   C   D   E
       1    a1  b1  c1  d1  e1
       2    a1  b2  c1  d2  e1
       3    a1  b2  c1  d1  e2
       4    a2  b1  c1  d1  e2
       5    a2  b1  c1  d1  e3
     • Divide the 5 dimensions into 2 shell fragments:
       – (A, B, C) and (D, E)
     From Han and Kamber, Data Mining: Concepts and Techniques
  12. Related Work – Frag-Cubing 1-D Inverted Indices
     • Build a traditional inverted index (RID list)
       Attribute Value  TID List       List Size
       a1               1, 2, 3        3
       a2               4, 5           2
       b1               1, 4, 5        3
       b2               2, 3           2
       c1               1, 2, 3, 4, 5  5
       d1               1, 3, 4, 5     4
       d2               2              1
       e1               1, 2           2
       e2               3, 4           2
       e3               5              1
     From Han and Kamber, Data Mining: Concepts and Techniques
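The inverted index above can be rebuilt from the five-tuple example relation with a few lines of Python. This is a minimal sketch; the names `TABLE` and `build_inverted_index` are illustrative, and it relies on the fact that the example's attribute values (a1, b1, ...) are unique across dimensions.

```python
from collections import defaultdict

# Example relation from the slides: tid -> values of dimensions A..E.
TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(table):
    """Map each attribute value to the sorted list of tids that contain it."""
    index = defaultdict(list)
    for tid in sorted(table):
        for value in table[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(TABLE)
for value, tids in sorted(index.items()):
    print(value, tids, len(tids))   # attribute value, TID list, list size
```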
  13. Related Work – Frag-Cubing Approach
     • Generalize the 1-D inverted indices to multi-dimensional ones in the data cube sense
     • Compute all cuboids for data cubes ABC and DE while retaining the inverted indices
     • For example, shell fragment cube ABC contains 7 cuboids:
       – A, B, C
       – AB, AC, BC
       – ABC
     • This completes the offline computation stage
       Cell   Intersection         TID List  List Size
       a1 b1  {1, 2, 3} ∩ {1, 4, 5}  1         1
       a1 b2  {1, 2, 3} ∩ {2, 3}     2, 3      2
       a2 b1  {4, 5} ∩ {1, 4, 5}     4, 5      2
       a2 b2  {4, 5} ∩ {2, 3}        ∅         0
     From Han and Kamber, Data Mining: Concepts and Techniques
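The offline stage for one shell fragment can be sketched as follows. The code is an illustration under stated assumptions, not the Frag-Cubing implementation: `fragment_cube` and `ONE_D` are hypothetical names, the 1-D TID lists are copied from the slides, and empty cells such as (a2, b2) are simply dropped instead of being stored with size 0.

```python
from itertools import combinations, product

# 1-D inverted indices of fragment (A, B, C), taken from the slides.
ONE_D = {
    "A": {"a1": {1, 2, 3}, "a2": {4, 5}},
    "B": {"b1": {1, 4, 5}, "b2": {2, 3}},
    "C": {"c1": {1, 2, 3, 4, 5}},
}

def fragment_cube(one_d):
    """Map each cuboid (tuple of dimensions) to {cell: TID set}; empty cells are dropped."""
    dims = sorted(one_d)
    cube = {}
    for k in range(1, len(dims) + 1):
        for cuboid in combinations(dims, k):
            cells = {}
            # Every combination of one value per dimension is a candidate cell.
            for values in product(*(one_d[d] for d in cuboid)):
                tids = set.intersection(*(one_d[d][v] for d, v in zip(cuboid, values)))
                if tids:
                    cells[values] = tids
            cube[cuboid] = cells
    return cube

cube = fragment_cube(ONE_D)
for cuboid, cells in cube.items():
    print("".join(cuboid), {cell: sorted(t) for cell, t in cells.items()})
```

Fragment (A, B, C) yields the 7 cuboids listed on the slide, and the AB cuboid reproduces the intersections in the table.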
  14. Related Work – Frag-Cubing Measure Table
     • If measures other than count are present, store them in an ID_measure table separate from the shell fragments
       tid  count  sum
       1    5      70
       2    3      10
       3    8      20
       4    5      40
       5    2      30
     From Han and Kamber, Data Mining: Concepts and Techniques
  15. Related Work – Frag-Cubing Query
     • Given the fragment cubes, process a query as follows:
       1. Divide the query into fragments, the same as the shell fragments
       2. Fetch the corresponding TID list for each fragment from the fragment cube
       3. Intersect the TID lists from each fragment to construct the instantiated base table
       4. Compute the data cube using the base table with any cubing algorithm
     From Han and Kamber, Data Mining: Concepts and Techniques
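Steps 2 and 3 of this procedure, followed by a lookup in the ID_measure table, can be sketched in Python. The TID lists and measure values are copied from the slides; the function name `answer` and the choice of query (a1 in fragment ABC, d1 in fragment DE) are assumptions made for illustration. The final cubing step is reduced here to summing the stored measures of the matching tuples.

```python
# TID lists fetched from the two fragment cubes (values from the slides).
FRAGMENT_ABC = {("a1",): {1, 2, 3}, ("a2",): {4, 5}}
FRAGMENT_DE = {("d1",): {1, 3, 4, 5}, ("d2",): {2}}

# ID_measure table from the slides: tid -> (count, sum).
ID_MEASURE = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}

def answer(tid_lists):
    """Intersect one TID list per fragment, then aggregate the stored measures."""
    tids = set.intersection(*tid_lists)
    count = sum(ID_MEASURE[t][0] for t in tids)
    total = sum(ID_MEASURE[t][1] for t in tids)
    return sorted(tids), count, total

# Query: A = a1 and D = d1, all other dimensions aggregated.
result = answer([FRAGMENT_ABC[("a1",)], FRAGMENT_DE[("d1",)]])
print(result)
```

Only tuples 1 and 3 satisfy both conditions, so the aggregates are taken over their two rows of the measure table.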
  16. Related Work – Frag-Cubing Approach
     [Figure: dimensions A, B, C, D, E, F, G, H, I, J, K, L, M, N, …; labels: Base Table, Online Computation.]
     From Han and Kamber, Data Mining: Concepts and Techniques
  17. qCube Approach
     qCube keeps a set of tuple identifiers (TIDs) for each attribute value of each dimension, similar to Frag-Cubing. It can therefore answer point queries by intersecting TID sets, and range queries by combining unions and intersections of TID sets, regardless of the type of measure function.
     Frag-Cubing implements only point queries and some inquire queries. There is no Frag-Cubing solution for queries like:
     “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries”
  18. qCube Approach
     Implements the range query operators:
     • Equal;
     • Not Equal;
     • Greater or Less than;
     • Some;
     • Between and Similar.
     Also implements the inquire query operators:
     • Distinct;
     • Sub-cube;
     • Top-k Similar.
     All operators work over a high-dimension data cube.
  19. qCube Architecture
  20. qCube Computation
     Base relation:
       TID  A   B   C   D   E
       1    a1  b1  c1  d1  e1
       2    a2  b2  c2  d2  e2
       3    a1  b1  c1  d1  e1
       4    a3  b3  c3  d2  e2
       5    a1  b1  c4  d1  e2
       6    a5  b5  c5  d2  e2
     Measures (one function per measure column):
       tid  M1 (Variance)  M2 (Count)  M3 (Average)  M4 (Skewness)  M5 (Standard deviation)
       1    2.56           1           10            1              877686769698
       2    3.14           1           20            0              7986676867.99
       3    2.45           1           10            1              -7878789.8777
       4    6.7            1           11            1              -99974333.23
       5    9              1           3             1              100045.655
       6    1              1           1             1              1
     Inverted index:
       Attribute Value  TID List
       a1               1, 3, 5
       a2               2
       a3               4
       a5               6
       b1               1, 3, 5
       b2               2
       b3               4
       b5               6
       c1               1, 3
       c2               2
       c3               4
       c4               5
       c5               6
       d1               1, 3, 5
       d2               2, 4, 6
       e1               1, 3
       e2               2, 4, 5, 6
  21. qCube Update
     Updates use the same algorithm as qCube Computation.
  22. qCube Update
     Base relation before the update:
       TID  A   B   C   D   E
       1    a1  b1  c1  d1  e1
       2    a2  b2  c2  d2  e2
       3    a1  b1  c1  d1  e1
       4    a3  b3  c3  d2  e2
       5    a1  b1  c4  d1  e2
       6    a5  b5  c5  d2  e2
     Update tuples (tid 5 is modified; tids 7, 8 and 9 are new; dimension F is new):
       tid  A   B   C   D   E   F
       5    a3  b1  c4  d1  e2
       7    a3  b2  c3  d3  e3
       8    a2  b3  c4  d3  e2  f1
       9    a5  b5  c5  d1  e1  f2
     Inverted index after the update:
       Attribute Value  TID List
       a1               1, 3
       a2               2, 8
       a3               4, 5, 7
       a5               6, 9
       b1               1, 3, 5
       b2               2, 7
       b3               4, 8
       b5               6, 9
       c1               1, 3
       c2               2
       c3               4, 7
       c4               5, 8
       c5               6, 9
       d1               1, 3, 5, 9
       d2               2, 4, 6
       d3               7, 8
       e1               1, 3, 9
       e2               2, 4, 5, 6, 8
       e3               7
       f1               8
       f2               9
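The update example can be reproduced with a small sketch in which one routine serves both the initial computation and later updates, in the spirit of the previous slide. Everything named here (`QCubeIndex`, `load`, `parse`) is hypothetical, and the way a modified tuple is handled (its tid is first removed from the lists of its old values) is an assumption of this sketch, since the slides show only the resulting index.

```python
from collections import defaultdict

class QCubeIndex:
    """Hypothetical sketch: one TID set per (dimension, value) pair."""

    def __init__(self):
        self.rows = {}                 # tid -> {dimension: value}
        self.tids = defaultdict(set)   # (dimension, value) -> set of tids

    def load(self, batch):
        """Insert new tuples or overwrite existing ones."""
        for tid, row in batch.items():
            # If the tid already exists, remove it from its old TID sets.
            for dim, value in self.rows.get(tid, {}).items():
                self.tids[(dim, value)].discard(tid)
            self.rows[tid] = dict(row)
            for dim, value in row.items():
                self.tids[(dim, value)].add(tid)

def parse(batch):
    """'a1 b1' -> {'A': 'a1', 'B': 'b1'}; the dimension is the value's first letter."""
    return {tid: {v[0].upper(): v for v in text.split()} for tid, text in batch.items()}

INITIAL = parse({
    1: "a1 b1 c1 d1 e1", 2: "a2 b2 c2 d2 e2", 3: "a1 b1 c1 d1 e1",
    4: "a3 b3 c3 d2 e2", 5: "a1 b1 c4 d1 e2", 6: "a5 b5 c5 d2 e2",
})
UPDATE = parse({
    5: "a3 b1 c4 d1 e2", 7: "a3 b2 c3 d3 e3",
    8: "a2 b3 c4 d3 e2 f1", 9: "a5 b5 c5 d1 e1 f2",
})

cube = QCubeIndex()
cube.load(INITIAL)   # initial computation
cube.load(UPDATE)    # update with the same routine
for key in sorted(cube.tids):
    print(key, sorted(cube.tids[key]))
```

Keeping the base rows alongside the index is a design choice of this sketch: it makes overwriting tid 5 possible without scanning every TID set.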
  23. qCube Query
     pQ = a1:*:*:*:e1
       Attribute Value  TID List
       a1               1, 3
       a2               2, 8
       a3               4, 5, 7
       a5               6, 9
       b1               1, 3, 5
       b2               2, 7
       b3               4, 8
       b5               6, 9
       c1               1, 3
       c2               2
       c3               4, 7
       c4               5, 8
       c5               6, 9
       d1               1, 3, 5, 9
       d2               2, 4, 6
       d3               7, 8
       e1               1, 3, 9
       e2               2, 4, 5, 6, 8
       e3               7
       f1               8
       f2               9
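A point query such as pQ can be answered by intersecting the TID sets of its instantiated values. The sketch below is a minimal illustration: `point_query` is a hypothetical name, the colon-separated pattern is parsed naively, and only the index entries needed for the examples are copied from the slide.

```python
# Part of the inverted index shown on the slide (attribute value -> TID set).
INDEX = {
    "a1": {1, 3}, "a2": {2, 8}, "a5": {6, 9},
    "d1": {1, 3, 5, 9}, "d2": {2, 4, 6}, "d3": {7, 8},
    "e1": {1, 3, 9}, "e2": {2, 4, 5, 6, 8}, "e3": {7},
}

def point_query(pattern, index):
    """Intersect the TID sets of every instantiated value in 'v1:*:...:vn'."""
    sets = [index.get(value, set()) for value in pattern.split(":") if value != "*"]
    if not sets:
        raise ValueError("the pattern instantiates no dimension")
    return set.intersection(*sets)

print(sorted(point_query("a1:*:*:*:e1", INDEX)))
print(sorted(point_query("a5:*:*:d1:e1", INDEX)))
```

For pQ the result is the intersection of the lists of a1 and e1; the measures of those tuples would then be read from the measure table.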
  24. qCube Range and Inquire Query
     rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn))
     iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn))
     qCube rearranges the sub-queries of Q in order to improve query response times.
     As a result of Q we have qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
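The "unions plus intersections" idea behind the range operators can be sketched as follows. This is not qCube's code: the operator table, the function names and the sample index are all made up for illustration, fv1 … fvn is read here as the operator's filter values, "between" is assumed to be inclusive, and "similar" is left out because the slides do not define it.

```python
def union_where(dim_index, predicate):
    """Union of the TID sets of every attribute value that satisfies the predicate."""
    result = set()
    for value, tids in dim_index.items():
        if predicate(value):
            result |= tids
    return result

# Each operator turns its filter values into a predicate over attribute values.
RANGE_OPERATORS = {
    "greater than": lambda x: (lambda v: v > x),
    "less than": lambda x: (lambda v: v < x),
    "between": lambda lo, hi: (lambda v: lo <= v <= hi),   # inclusive by assumption
    "some": lambda *xs: (lambda v: v in xs),
    "different": lambda x: (lambda v: v != x),
}

def range_query(index, conditions):
    """conditions: list of (dimension, operator name, filter values)."""
    partial = [
        union_where(index[dim], RANGE_OPERATORS[op](*fvs))
        for dim, op, fvs in conditions
    ]
    # One union per condition, then an intersection across conditions.
    return set.intersection(*partial)

# Made-up index for illustration: dimension -> {attribute value: TID set}.
INDEX = {
    "month": {1: {1, 2}, 2: {3}, 3: {4, 5}, 7: {6}},
    "age": {22: {1}, 27: {2, 4}, 35: {5, 6}, 51: {3}},
}
hits = range_query(INDEX, [
    ("month", "some", (1, 3, 5, 7, 11)),
    ("age", "between", (25, 40)),
])
print(sorted(hits))
```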
  25. qCube Query - example
  26. qCube Query - example
     “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries”
     In Q, the point queries are (sex = women, paperType = journal, year = 2012). The range queries (month = (1, 3, 5, 7, 11), age <> 25-40) are also sorted according to their cardinalities. In Q, there is one inquire query (country = distinct).
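One plausible reading of "sorted according to their cardinalities" is that sub-queries with the smallest TID sets are intersected first, so that intermediate results stay small. The sketch below illustrates that reading only. The TID sets are made up, `evaluate` is a hypothetical name, and sorting point and range sub-queries together is a simplification, since the slide does not say whether they are ordered jointly or separately.

```python
def evaluate(subqueries):
    """subqueries: list of (label, TID set). Intersect the smallest sets first
    and stop early once the running result is empty."""
    if not subqueries:
        return [], set()
    ordered = sorted(subqueries, key=lambda sq: len(sq[1]))
    result = set(ordered[0][1])
    for _, tids in ordered[1:]:
        if not result:
            break
        result &= tids
    return [label for label, _ in ordered], result

# Made-up TID sets for the sub-queries of Q; real sets would come from the index.
Q = [
    ("sex = women", {1, 2, 3, 4, 5, 6, 7, 8}),
    ("paperType = journal", {2, 3, 5, 8, 9}),
    ("year = 2012", {3, 5, 8}),
    ("month in (1, 3, 5, 7, 11)", {1, 3, 5, 8, 9, 10}),
    ("age between 25 and 40", {3, 4, 5, 6, 8, 9, 10}),
]
order, tids = evaluate(Q)
print(order)
print(sorted(tids))
```

The inquire operator (country = distinct) would then be applied to the surviving TIDs, for example by grouping them by country before computing the variance.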
  27. qCube Query - example
  28. Experiments
     • We tested the qCube Computation and Query algorithms against the Frag-Cubing algorithm used in [Li et al. 2004];
     • The qCube algorithms were coded in 64-bit Java;
     • Frag-Cubing is a free and open-source C++ application (http://illimine.cs.uiuc.edu/);
     • The synthetic base relations were created using the data generator provided by the IlliMine project;
     • The IlliMine project is an open-source project that provides various approaches for data mining and machine learning;
     • The Frag-Cubing approach is part of the IlliMine project;
     • We ran the algorithms on a machine with two six-core Intel Xeon processors at 2.4 GHz per core, 12 MB of cache and 128 GB of DDR3 1333 MHz RAM;
     • The system runs 64-bit Windows Server 2008, High Performance version.
  29. Results - Performance Evaluation of Point Queries and Skewed Relations
     [Figure: response time per query over 100 trials; T=10^7, C=5000, D=30, S=0.]
     [Figure: response time per query over 100 trials; T=10^7, C=5000, D=30, S=2.5.]
  30. Results - Performance Evaluation of Range Query Operators and Skewed Relations
     [Figure: response time of queries with one infrequent point operator; T=10^7, C=5000, D=30, S=2.5.]
  31. Results - Performance Evaluation of Inquire Operators and Skewed Relations
     [Figure: response time of queries with inquire operators; T=10^7, C=5000, D=30, S=2.5.]
  32. Results - Runtime and Memory Consumption
  33. Conclusions
     • qCube has linear runtime and memory consumption, similar to Frag-Cubing;
     • It implements the Not Equal, Greater or Less than, Some, Between and Similar range query operators and the Distinct, Sub-cube and Top-k Similar inquire query operators;
     • When compared with Frag-Cubing, qCube is faster at answering point queries and inquire queries with sub-cube operators;
     • It introduces a different cube representation with fewer empty cells than Frag-Cubing;
     • Frag-Cubing cannot answer queries with two sub-cube operators in a data cube with 10^7 tuples, C=5000, D=30 and S=2.5.
  34. Conclusions
     Interesting research directions to further extend qCube:
     • First, qCube must be evaluated with holistic measures. Update and computation experiments with many holistic measures are a hard problem;
     • TID lists can become huge, so memory consumption and intersection costs can become impractical. An efficient way to partition TID lists with fast data retrieval is therefore needed;
     • Multicore and multicomputer versions of qCube must be implemented;
     • qCube must be improved to answer top-k queries combined with range, point and inquire queries;
     • Experiments with high-dimensional text cubes must be run to evaluate qCube, especially its computation of text measures.
  35. Acknowledgements