# DaWaK'07


1. Mining Top-K Multidimensional Gradients
   - Ronnie Alves, Orlando Belo and Joel Ribeiro
   - Department of Informatics, School of Engineering, University of Minho, Portugal
   - 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2007), 3-7 September 2007, Regensburg, Germany
3. Gradients (Section: Introduction)
   - Consider a telecom fact table T:
     - Phone (e.g., p1, p2, p3)
     - DateTime (e.g., 10/02/2006 10am, 10/03/2006 2pm, 10/04/2006 4pm)
     - Origin (e.g., Porto, Braga, Lisbon)
     - Destination (e.g., Porto, Lisbon, Regensburg)
     - TypeCall (e.g., Local, International)
     - Cost
     - Duration
   - Typical cubegrade "how" query (Imielinski et al., DMKD'02, vol. 6): How is the average call duration affected by age, origin and weekday in cubes with at least 1000 customers where the average call duration is between 300s and 720s?
   - Example answer: it goes up (75%) for middle-aged people in the Porto area on Monday.
4. Gradients
   - Cubegrade operations on a cell (A=a1, B=b1, C=c1):
     - roll-up(C): (A=a1, B=b1)
     - drill-down(D=d1): (A=a1, B=b1, C=c1, D=d1)
     - mutate(C=c2): (A=a1, B=b1, C=c2)
   - Even when considering only iceberg cells, these operations may still generate a very large number of cell pairs.
   - Constrained gradients (Dong et al TKDM’02, vol.16): mining gradients with three kinds of constraints, (a) significance, (b) probe and (c) gradient, using a LiveSet-driven strategy.
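The three cubegrade operations can be sketched as small functions on a cell. This is only an illustration, not code from the talk: it assumes a cell is a Python dict mapping dimension names to values, with a missing dimension standing for the wildcard `*`. The function names `roll_up`, `drill_down` and `mutate` are chosen here to mirror the slide.

```python
from typing import Dict

# Assumption: a cell is a dict {dimension: value}; an absent dimension means "*".
Cell = Dict[str, str]


def roll_up(cell: Cell, dim: str) -> Cell:
    """Generalize the cell by dropping one of its dimensions."""
    if dim not in cell:
        raise KeyError(f"cell has no dimension {dim!r} to roll up")
    return {d: v for d, v in cell.items() if d != dim}


def drill_down(cell: Cell, dim: str, value: str) -> Cell:
    """Specialize the cell by fixing a value on a new dimension."""
    if dim in cell:
        raise ValueError(f"dimension {dim!r} is already fixed; use mutate")
    return {**cell, dim: value}


def mutate(cell: Cell, dim: str, value: str) -> Cell:
    """Move to a sibling cell by changing the value of one existing dimension."""
    if dim not in cell:
        raise KeyError(f"cell has no dimension {dim!r} to mutate")
    if cell[dim] == value:
        raise ValueError("mutation must change the value")
    return {**cell, dim: value}


base = {"A": "a1", "B": "b1", "C": "c1"}
print(roll_up(base, "C"))           # ancestor of base
print(drill_down(base, "D", "d1"))  # descendant of base
print(mutate(base, "C", "c2"))      # sibling of base
```

Each call returns a new dict, so the starting cell stays available for comparing it with its ancestor, descendant and sibling.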
5. Gradients
   - Back to table T (Phone, DateTime, Origin, Destination, TypeCall, Cost, Duration)
   - Top-K gradient query (Alves et al., DaWaK'07): Find the Top-K situations with the largest changes in the average duration of calls originating in the Porto area during the week.
   - Approach: find the maximum gradient regions (MGRs) in the cube, i.e. the regions most promising for mining Top-K gradient cells.
6. What's New with Top-K Gradients
   - An effort to enrich multidimensional data analysis
     - A new rank-aware materialization that handles gradients
   - Top-K query processing based on ranked gradient cells
     - Partitioning based on a gradient ascent approach
     - DFS traversal order according to the spreading factors of aggregating functions (GR tree)
     - High-dimensional cubing (TopKgr-Cube)
7. Gradient Regions (Section: Top-K Gradients)
   - [Figure: different shapes of the aggregating functions countXY(), sumXY() and avgXY(), labeled convex and non-convex, with a gradient region (GR) marked]
   - Avg() is an algebraic function, and its spreading factor is unpredictable with respect to the distribution of its values.
   - There are also sets of GRs to look for.
8. Gradient Regions
   - How to select gradient regions:
     - GR1: rectangles with all bins, e.g. GR1 = {[4,7]:[6,5]}
     - GR2: rectangles with minimum bin and maximum bin, e.g. GR2 = {[3,5]:[4,4]}
   - [Figure: regions GR1 and GR2]
   - We expect GRs with the largest aggregating values to provide cells with higher gradients.
9. Definitions
   - Terms illustrated on the slide: base table, closed cell, maximal cell, maximal probe cell, matchable cells.
   - Gradient cell: a cell cg is said to be a gradient cell of a probe cell cp when they are matchable cells and their delta change, given by Δg(cg, cp) ≡ (g(cg, cp) ≥ Δ), is true, where Δ is a constant value and g is a gradient function.
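The gradient-cell definition can be read as a predicate on two cells. The sketch below rests on assumptions that the transcript does not state: cells are dicts of fixed dimensions, two cells are treated as matchable when one is an ancestor of the other or they are siblings differing in exactly one value, and g is taken to be the ratio of the two average measures. The measure values are made up for illustration.

```python
def is_ancestor(a, b):
    """True if a generalizes b: a fixes fewer dimensions, all with b's values."""
    return len(a) < len(b) and all(b.get(d) == v for d, v in a.items())


def is_sibling(a, b):
    """True if a and b fix the same dimensions and differ in exactly one value."""
    return a.keys() == b.keys() and sum(a[d] != b[d] for d in a) == 1


def matchable(cg, cp):
    # Assumption: matchable = ancestor, descendant or sibling relationship.
    return is_ancestor(cg, cp) or is_ancestor(cp, cg) or is_sibling(cg, cp)


def ratio_gradient(measure_g, measure_p):
    # Assumption: g is the ratio between the two cells' measures.
    return measure_g / measure_p


def is_gradient_cell(cg, cp, measure, threshold, g=ratio_gradient):
    """cg is a gradient cell of probe cp if they match and g(cg, cp) >= threshold."""
    return matchable(cg, cp) and g(measure(cg), measure(cp)) >= threshold


# Hypothetical average call durations (seconds) per cell.
probe = {"Origin": "Porto"}
monday = {"Origin": "Porto", "Weekday": "Mon"}
braga = {"Origin": "Braga"}
unrelated = {"Origin": "Braga", "Weekday": "Mon"}
avg = {
    frozenset(probe.items()): 300.0,
    frozenset(monday.items()): 525.0,
    frozenset(braga.items()): 310.0,
}


def avg_duration(cell):
    return avg[frozenset(cell.items())]


print(is_gradient_cell(monday, probe, avg_duration, threshold=1.5))
print(is_gradient_cell(braga, probe, avg_duration, threshold=1.5))
print(is_gradient_cell(unrelated, probe, avg_duration, threshold=1.5))
```

Only the first pair qualifies: the drill-down cell is matchable and its ratio to the probe (1.75) clears the threshold, while the sibling changes too little and the last cell is not matchable at all.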
10. Gradient Ascent Approach
   - Equation 1
   - Equation 2
   - When evaluating a GR, we first search for its maximal probe cells, i.e. the cells with the highest aggregating values in it, and then calculate their gradients against all possible matchable cells.
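The evaluation order described above (maximal probe cells first, then gradients against their matchable cells) might look like the following sketch. It is not the authors' algorithm: the gradient is assumed to be the ratio of the probe's aggregate to the other cell's aggregate, the matchable links are passed in as a plain dict, and the aggregate values are invented. The cell names are borrowed from the worked example later in the deck.

```python
import heapq


def top_k_gradients(region, matchable_links, k, n_probes=1):
    """Rank (gradient, cell, probe) triples inside one gradient region.

    region          -- dict: cell name -> aggregate value (hypothetical numbers)
    matchable_links -- dict: probe name -> names of cells matchable with it
    """
    # Maximal probe cells: the cells with the highest aggregate values.
    probes = heapq.nlargest(n_probes, region, key=region.get)
    pairs = []
    for probe in probes:
        for cell in matchable_links.get(probe, []):
            if cell != probe and region[cell] > 0:
                # Assumption: gradient = probe aggregate / cell aggregate.
                pairs.append((region[probe] / region[cell], cell, probe))
    # Keep only the k largest gradients.
    return heapq.nlargest(k, pairs)


region = {"{x1,y3,*}": 8.0, "{x1,y2,*}": 2.0, "{x1,*,*}": 4.0, "{x1,y1,*}": 5.0}
links = {"{x1,y3,*}": ["{x1,y2,*}", "{x1,*,*}", "{x1,y1,*}"]}

for gradient, cell, probe in top_k_gradients(region, links, k=2):
    print(f"{cell} -> {probe}: {gradient}")
```

Restricting the probes to the highest aggregates keeps the number of evaluated pairs small, which is the point of ascending towards the maximum of a region before computing gradients.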
11. Gradient-based Cubing
   - Intuition: given a ranking gradient function,
     - Quickly locate the most promising gradient regions
     - Evaluate the dimensions' spreading factors
     - Retrieve Top-K gradient cells effectively
   - Approach: cubing
     - Step 1: Build the inverted index and value-list indices from the base table
     - Step 2: Prune non-valid iceberg cells
     - Step 3: Calculate spreading factors and create the GR tree following these factors
     - Step 4: Prune non-valid GRs
       - Evaluates Sf(GR_i) >= min_sf
   - Approach: Top-K
     - Step 5: Partition by projecting GRs (High >> Low)
       - Bin boundaries [min, max] are returned for each partition
     - Step 6: Prune non-valid Top-K regions
     - Step 7: Select maximal probe cells
     - Step 8: Calculate all Top-K gradients
12. Cubing
   - X, Y, Z: selected dimensions
   - Structures built from the base table: value list, inverted index, spreading factors
   - Cuboid cell Ci = {x1, y3, *}: intersecting the tid lists {1,4} ∩ {4} gives {4}, so the aggregating function yields Count(Ci) = 1
   - High-dimensional cubes are assembled from low-dimensional ones, following the Frag-Cubing ideas (Li et al., VLDB'04)
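The tid-intersection step can be illustrated with a small inverted index. The four-row base table below is hypothetical; it is chosen only so that x1 maps to tids {1, 4} and y3 to {4}, matching the numbers on the slide. The helper names `build_inverted_index` and `cell_tids` are likewise illustrative and do not come from Frag-Cubing.

```python
from collections import defaultdict


def build_inverted_index(rows, dims):
    """Map each (dimension, value) to the set of tuple ids (tids) holding it."""
    index = {d: defaultdict(set) for d in dims}
    for tid, row in enumerate(rows, start=1):  # tids start at 1, as on the slide
        for d in dims:
            index[d][row[d]].add(tid)
    return index


def cell_tids(index, cell, n_rows):
    """Intersect the tid lists of every fixed (non-'*') dimension of the cell."""
    tids = set(range(1, n_rows + 1))
    for dim, value in cell.items():
        if value != "*":
            tids &= index[dim].get(value, set())
    return tids


# Hypothetical base table over the dimensions X, Y, Z.
rows = [
    {"X": "x1", "Y": "y1", "Z": "z1"},  # tid 1
    {"X": "x2", "Y": "y2", "Z": "z2"},  # tid 2
    {"X": "x2", "Y": "y1", "Z": "z1"},  # tid 3
    {"X": "x1", "Y": "y3", "Z": "z2"},  # tid 4
]
index = build_inverted_index(rows, dims=["X", "Y", "Z"])

cell = {"X": "x1", "Y": "y3", "Z": "*"}
tids = cell_tids(index, cell, n_rows=len(rows))
print(sorted(index["X"]["x1"]), sorted(index["Y"]["y3"]), sorted(tids), len(tids))
```

Because any cell's tid set is obtained by intersecting per-dimension lists, a count-based iceberg test only needs `len(tids)`; other measures would additionally need the measure values stored per tid.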
13. Set Enumeration Tree (step 1)
   - Find local gradients
   - The lattice is formed by projecting GR[x1] >> GR[y2] >> GR[z2]
   - Min_sf > 0.25: valid GR
   - [Figure: set enumeration tree showing the gradient region, Top-K sets, aggregated values (Agg_value) and probe cells]
14. Step 2: projecting probe cells, GR[x1] >> GR[y3]
   - Bin x1 = [1,4]
   - Min_avg > 2.7: valid Top-K GR
   - [Figure: Top-K sets and matchable links]
15. Step 3: projecting probe cells, GR[x1] >> GR[z1]
   - Top-3 = {i, L, j}:
     - {x1,y2,*} -> {x1,y3,*}
     - {x1,*,z3} -> {x1,*,z1}
     - {x1,*,*} -> {x1,y3,*}
   - [Figure: Top-K sets and matchable links]
   - That's it!