
ICDE2014 Session 14 Data Warehousing


  1. Session 14: Data Warehousing. Presenter: Takuma Wakamori (NTT). [ICDE2014 Study Session]
  2. Papers covered:
     1. Distributed Interactive Cube Exploration. N. Kamat, P. Jayachandran, K. Tunga, A. Nandi (Ohio State University)
     2. A Tunable Compression Framework for Bitmap Indices. G. Guzun, G. Canahuate (The University of Iowa), D. Chiu (Washington State University), J. Sawin (University of St. Thomas)
     3. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs. Z. Wang, Q. Fan, H. Wang, K.-L. Tan (National University of Singapore), D. Agrawal, A. E. Abbadi (University of California)
  3. Distributed Interactive Cube Exploration
     - Goal
       - Interactive, ad-hoc exploration of large-scale data (cubes)
     - Contributions
       - DICE, a distributed system that searches one billion tuples in a few seconds
       - A cost-based framework that combines speculative query execution with online data sampling to improve the response time of cube exploration in a distributed environment
       - A faceted cube exploration model that treats consecutive queries as a query session, in order to limit the search space of speculative queries
     - Example facet query from the slide:
         SELECT rack, AVG(iops) FROM events
         WHERE datacenter = "EU" AND hour = 6
         GROUP BY rack;
     - (Figure: the cuboid lattice over <hour, iops, rack>, e.g. <*, *, *>, <month, *, *>, <*, *, zone>; materializing the full cube up front is too costly.)
  4. Distributed Interactive Cube Exploration
     - Prioritize neighboring dimensions and materialize them speculatively
     - Restrict cube exploration to four traversal types (parent, child, sibling, pivot) to bound the number of speculative queries
     - Execute speculative queries in a distributed fashion under a user-specified sampling rate
     - Speculative rewrites of the current query (the comments show the variants):
         SELECT rack, AVG(iops) FROM events
         WHERE datacenter = "EU"   /* datacenter = "*" */  /* rack = 12 */
           AND hour = 6
         GROUP BY rack;            /* GROUP BY datacenter; */
     - (Paper excerpt shown on the slide: a facet is a set of groups from one cube region, aggregated along a single grouping dimension with the remaining dimensions bound. A session is a sequence of facet traversals: parent (rollup), child (drilldown), sibling (change the value of one bound dimension), and pivot (change the grouping dimension). Fig. 2: Faceted Cube Exploration Traversals. Fig. 1: DICE approach: UI actions become facet-traversal queries at a tunable sampling rate; the master federates each query over distributed slaves, which execute and cache speculative results; the master aggregates the partial results and error bounds. A sketch of the traversal enumeration follows below.)
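     To make the four traversal types concrete, here is a minimal, illustrative sketch of how the candidate next facets could be enumerated from the current facet. The Facet type, the domains table, and the simplifications (no dimension hierarchies, so a rollup just drops one bound predicate, and a pivot binds the old grouping dimension to an arbitrary value) are assumptions of this sketch, not the paper's implementation:

       from dataclasses import dataclass

       @dataclass(frozen=True)
       class Facet:
           group_dim: str   # dimension the facet aggregates along
           bound: tuple     # tuple of (dimension, value) predicates

       def traversals(facet, domains):
           """Enumerate speculative parent/child/sibling/pivot facets for `facet`.
           `domains` maps each dimension to its (sampled) value domain."""
           bound = dict(facet.bound)
           cands = []
           for dim, val in bound.items():
               rest = tuple(sorted((d, v) for d, v in bound.items() if d != dim))
               # parent (rollup): drop one bound predicate, widening the region
               cands.append(("parent", Facet(facet.group_dim, rest)))
               # sibling: same region shape, different value on one bound dimension
               for v in domains[dim]:
                   if v != val:
                       cands.append(("sibling", Facet(facet.group_dim, rest + ((dim, v),))))
               # pivot: group by a currently bound dimension; the old grouping
               # dimension gets bound to an arbitrary value (simplification)
               cands.append(("pivot", Facet(dim, rest + ((facet.group_dim, domains[facet.group_dim][0]),))))
           # child (drilldown): bind the grouping dimension and group by an unbound one
           unbound = [d for d in domains if d not in bound and d != facet.group_dim]
           for v in domains[facet.group_dim]:
               for new_group in unbound:
                   cands.append(("child", Facet(new_group, tuple(sorted(bound.items())) + ((facet.group_dim, v),))))
           return cands

       # The slide's query: GROUP BY rack WHERE datacenter = 'EU' AND hour = 6
       domains = {"rack": [1, 2], "datacenter": ["EU", "US"], "hour": [6, 7], "zone": ["z1"]}
       current = Facet("rack", (("datacenter", "EU"), ("hour", 6)))
       for kind, f in traversals(current, domains):
           print(kind, f.group_dim, f.bound)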
  5. Distributed Interactive Cube Exploration
     - Speculative queries are executed in priority order at the user-specified sampling rate (R), most promising first
     - To reduce the number of concurrently executed queries, multiple speculative queries in a session are unified:
       - Moderately high cardinality: fold multiple WHERE predicates into a single GROUP BY
       - High cardinality: split the dimension's domain into multiple ranges
     - Speculative queries are prioritized by integer programming; priority depends on the query probability and on the accuracy gain, which depends on the sampling rate R:
         MAXIMIZE    sum over q in Q of  Prob(q) * AccuracyGain(S_R) * x_q
         SUBJECT TO  sum over q in Q of  Time(q) * x_q  <=  totalSpecTime
         WHERE       x_q in {0, 1}
       Prob(q) is the probability that query q will be issued, estimated from the query logs.
     - Estimated gain in accuracy from one more unit of sampling rate (Eq. 1):
         AccuracyGain(R_curr) = c * ( 1/sqrt(R_curr) - 1/sqrt(R_curr + 1) )
       where R_curr is the current sampling rate and c is the constant from the paper's proportionality heuristic.
     - (Paper excerpt shown on the slide: each ad-hoc query is rewritten and federated to the slave nodes; speculative queries run until the next ad-hoc query arrives; if that query, or a unified query covering it, is already cached, its result is materialized from the cache. A greedy sketch of the prioritization follows below.)
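     As mentioned above, here is a small illustrative sketch of the selection problem, solved greedily by benefit-per-unit-time rather than as an exact integer program; the query records (prob, time, rate) are hypothetical placeholders:

       from math import sqrt

       def accuracy_gain(r_curr, c=1.0):
           """Eq. 1: estimated accuracy gain from raising the sampling rate by one unit."""
           return c * (1.0 / sqrt(r_curr) - 1.0 / sqrt(r_curr + 1))

       def pick_speculative(queries, total_spec_time):
           """Greedy 0/1-knapsack relaxation of the integer program above:
           maximize sum(prob * accuracy_gain) subject to sum(time) <= totalSpecTime.
           Each query carries 'prob' (from query logs), 'time' (estimated cost),
           and 'rate' (sampling rate already cached for it)."""
           ranked = sorted(queries,
                           key=lambda q: q["prob"] * accuracy_gain(q["rate"]) / q["time"],
                           reverse=True)
           chosen, budget = [], total_spec_time
           for q in ranked:
               if q["time"] <= budget:
                   chosen.append(q)
                   budget -= q["time"]
           return chosen

       # Hypothetical candidates produced by the facet traversals of the previous slide.
       candidates = [
           {"id": "parent-facet",  "prob": 0.4, "time": 200, "rate": 1},
           {"id": "sibling-facet", "prob": 0.3, "time": 120, "rate": 2},
           {"id": "pivot-facet",   "prob": 0.2, "time": 300, "rate": 1},
       ]
       print([q["id"] for q in pick_speculative(candidates, total_spec_time=350)])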
  6. Distributed Interactive Cube Exploration
     - Results
       - Small cluster: 15 Xeon slave nodes (4 GiB RAM each)
       - Cloud cluster: 50 Amazon EC2 slave nodes (7 GiB memory, 8 cores)
       - DICE is much faster than running without speculative execution, and its latencies show little variance
     - Figures from the paper shown on the slide: Fig. 3 and Fig. 4 (varying dataset size on CLUSTERSMALL and CLUSTERCLOUD, comparing NOSPEC, Random, Uniform, DICE, and Perfect), Fig. 5 (accuracy over a workload on a 1-billion-row dataset), Fig. 9 (individual latencies for an anecdotal query session). Adding WHERE dimensions first lowers and then raises latency, adding slave nodes lowers latency for all algorithms, and DICE reaches cache hit rates at or near 1.0, close to ALGOPERFECT.
     - (*) Presentation slides: https://speakerdeck.com/arnabdotorg/dice-distributed-interactive-cube-exploration
  7. A Tunable Compression Framework for Bitmap Indices
     - Background
       - Conventional bitmap-index compression encodes bit vectors at a fixed encoding length (the machine word length) to minimize decompression overhead; queries can then be processed without decompressing
       - Shorter encoding lengths compress better but take longer to decode
     - Goal
       - Bitmap-index compression that achieves both a high compression ratio and a short decode time by using variable encoding lengths
     - Contributions
       - Variable Aligned Length (VAL), a framework that compresses bit vectors with multiple encoding methods and variable segment lengths
       - WAH implemented on top of VAL (VAL-WAH)
       - A tuning parameter λ for the space/time trade-off; compared with 64-bit WAH, VAL-WAH compresses 3.4x better with about 3% query overhead, and compared with 32-bit WAH it compresses 1.8x better and is 30% faster
     - Background figures, quoted from [K. Wu et al., Optimizing Bitmap Indices with Efficient Compression, 2006]:
       - Fig. 1: a sample bitmap index; each column b0..b3 is a bitmap marking which rows take attribute value 0..3, so "X < 2" becomes the bitwise operation b0 OR b1
       - Fig. 2: a WAH-compressed bitmap; 128 input bits (1*1, 20*0, 3*1, 79*0, 25*1) are cut into 31-bit groups and encoded as four WAH words (hex 40000380, 80000002, 001FFFFF, 0000000F)
     - Word-Aligned Hybrid (WAH) encoding uses only two word types: a literal word (most significant bit 0) whose lower w-1 bits hold raw bit values, and a fill word (most significant bit 1) whose second most significant bit is the fill value and whose remaining bits store the fill length, always a multiple of w-1 bits (a sketch of this encoding follows below)
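     A minimal, illustrative sketch of WAH-style encoding following the description above (w = 32, so 31-bit segments). It is a simplification, not the paper's implementation, but it reproduces the four words of the Fig. 2 example:

       W = 32                     # machine word size
       SEG = W - 1                # bits per encoded segment (31 for 32-bit WAH)

       def wah_encode(bits):
           """Encode a list of 0/1 values into WAH words.
           Literal word: MSB 0, lower 31 bits hold the raw segment.
           Fill word:    MSB 1, next bit is the fill value, remaining bits the
                         run length counted in 31-bit segments."""
           words, i = [], 0
           while i < len(bits):
               seg = bits[i:i + SEG]
               if len(seg) == SEG and len(set(seg)) == 1:   # 0-fill or 1-fill segment
                   run = 1
                   while i + (run + 1) * SEG <= len(bits) and set(bits[i + run * SEG:i + (run + 1) * SEG]) == set(seg):
                       run += 1
                   words.append((1 << (W - 1)) | (seg[0] << (W - 2)) | run)
                   i += run * SEG
               elif len(seg) < SEG:                         # trailing "active" word: leftover bits, right-justified
                   words.append(int("".join(map(str, seg)), 2))
                   i += len(seg)
               else:                                        # literal segment
                   lit = 0
                   for j, b in enumerate(seg):
                       lit |= b << (SEG - 1 - j)
                   words.append(lit)
                   i += SEG
           return words

       # Fig. 2 example: 1*1, 20*0, 3*1, 79*0, 25*1 -> 40000380 80000002 001FFFFF 0000000F
       bits = [1] + [0] * 20 + [1] * 3 + [0] * 79 + [1] * 25
       print([format(w, "08X") for w in wah_encode(bits)])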
  8. A Tunable Compression Framework for Bitmap Indices
     - VAL lets WAH extensions (Enhanced WAH (EWAH), Concise, PLWAH) and variable-segment-length schemes (VLC, Partitioned WAH (PWAH)) coexist, and selects the best-performing one for each bit vector
     - An efficient query-processing algorithm over bitmaps with variable segment lengths, preserving segment alignment; it can query two bit vectors encoded with different segment lengths
     - Fig. 1 (from the paper): the VAL system framework, an Encoder (bitmap characterization, encoding selector, segment length selector) and a Query Engine, parameterized by the word size w, the alignment factor b, and the tuning parameter λ
     - Motivation from the paper: tuple reordering (lexicographic order, Gray codes, Hamming-distance-based) lengthens runs for the first few bit vectors but degrades into short runs or noise for later columns, so one fixed segment length cannot compress all columns well; this motivates choosing a segment length per bit vector
  9. A Tunable Compression Framework for Bitmap Indices
     - Unified bitmap processing
       - With small modifications, the core WAH query algorithm extends to PLWAH, Concise, Compax, and EWAH
       - Decoder state is kept in an activeWord (the most significant bit of val distinguishes literal from fill words; fillBit and runLength are only meaningful for fills):

           typedef struct {
               /* holds encoded word value */
               word_t val;
               /* fill-specific vars */
               bool fillBit;
               int runLength;
           } activeWord;

       - ANDing two bit vectors X and Y proceeds block by block: fill AND fill emits a fill covering the overlapping run length, literal AND literal emits the bitwise AND of the two literals, and literal AND fill emits the literal ANDed with the fill value while decrementing the fill's runLength
     - Segment length selection
       - Legal Segment Lengths (Eq. 1): LS = { 2^i * (b - 1) | 0 <= i <= log2(w) - log2(b) }; for a 64-bit word (w = 64) and alignment factor b = 16, LS = {15, 30, 60}, so larger segment lengths are always multiples of smaller ones and segments never straddle two physical words
       - A tuning parameter λ (0 <= λ <= 1) controls the trade-off: λ near 0 favors the best compression, λ near 1 favors query speed. The Segment Length Selector (Eq. 2) starts from s_c = argmin over s_i in LS of size(B, s_i) (Eq. 3), the length giving the smallest compressed size for bit vector B, and accepts a larger legal segment length only when its compressed size stays within a λ-dependent tolerance of size(B, s_c)
       - A header byte is prepended to each compressed bit vector; its high four bits identify the chosen segment length and its low four bits the encoding method used, e.g. WAH or PLWAH
     - Segment-length conversion
       - Fig. 4 (from the paper): converting a 64-bit VAL compressed bit vector from s = 30 down to s = 15 by splitting each active block with left shifts; the activeWord is assigned only when it is a fill
       - The query engine converts on the fly, so two bit vectors encoded with different segment lengths can be ANDed directly; a sketch of the segment-length rule and selector follows below
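     A small sketch of the legal-segment-length rule (Eq. 1) and a λ-style selector. The exact tolerance expression of Eq. 2 is not fully legible here, so the acceptance test below is a simplified stand-in that only reproduces the qualitative behavior (λ near 0 keeps the most compressed length, λ near 1 admits larger, faster lengths):

       from math import log2

       def legal_segment_lengths(w=64, b=16):
           """Eq. 1: LS = { 2^i * (b - 1) | 0 <= i <= log2(w) - log2(b) }."""
           return [2 ** i * (b - 1) for i in range(int(log2(w) - log2(b)) + 1)]

       def select_segment_length(sizes, lam):
           """Simplified stand-in for the Segment Length Selector (Eq. 2).
           `sizes` maps each legal segment length s to size(B, s), the compressed
           size of bit vector B under s.  s_c is the length with the smallest
           size (Eq. 3); larger (faster) lengths are accepted while their size
           overhead stays within a lambda-dependent tolerance."""
           ls = sorted(sizes)                         # legal segment lengths, ascending
           s_c = min(ls, key=lambda s: sizes[s])      # Eq. 3: argmin size(B, s)
           best = s_c
           for i, s in enumerate(l for l in ls if l > s_c):
               if sizes[s] <= sizes[s_c] * (1 + lam) ** (i + 1):
                   best = s                           # accept a larger, faster segment length
           return best

       print(legal_segment_lengths(64, 16))           # [15, 30, 60]
       # Hypothetical compressed sizes (in words) for one bit vector:
       sizes = {15: 100, 30: 120, 60: 180}
       print(select_segment_length(sizes, lam=0.0))   # 15: prioritize compression
       print(select_segment_length(sizes, lam=1.0))   # 60: prioritize query speed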
  10. A Tunable Compression Framework for Bitmap Indices
     - Results
       - VAL-WAH (λ = 0.2) achieves both a high compression ratio and a short query time on real data (KDD Cup 1999 and Berkeley Earth Surface Temperature), compared with PLWAH32, EWAH32/64, and WAH32/64; EWAH's raw query times benefit partly from an implementation difference (it does not store bit vectors in ArrayLists)
       - The tuning parameter λ sets the compression-ratio/query-time trade-off: as λ grows from 0 to 1, compression worsens and query time improves; at λ = 0.0 over 70% of columns use the smallest segment length (s = 15), dropping to a little over 20% at λ = 0.8
     - Setup: w = 64 and alignment factor b = 16, so λ chooses among s in {15, 30, 60}; synthetic data uses uniform and zipf distributions, and each query ANDs the bitmaps of two randomly selected columns
     - To compare space and time together, the paper combines the two ratios into a single gain metric based on their harmonic mean: gain = 1/HM = (query ratio + compression ratio) / (2 * query ratio * compression ratio)
     - Figures shown on the slide: Fig. 5 (effect of λ on compression ratio and query time, sorted synthetic zipf-2 data), Fig. 8 (compression and query-time ratio comparison on sorted real data)
  11. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs
     - Goal
       - OLAP analysis over attributed graphs
     - Contributions
       - Pagrol, a parallel graph OLAP system, and the Hyper Graph Cube model, which extends attributed graphs into a decision-support setting
       - Techniques that optimize graph-cube computation on MapReduce: self-contained join, cuboid batching, and cost-based batch processing
       - An efficient cube materialization scheme, MRGraph-Cubing, the first graph-cubing approach for large-scale attributed graphs
     - Attributed graph (Definition 1 of the paper): G = (V, E, Av, Ae), where V is the vertex set, E ⊆ V x V the edge set, Av = (Av1, ..., Avn) the vertex-specific attributes (each vertex u carries a tuple Av(u)), and Ae = (Ae1, ..., Aem) the edge-specific attributes (each edge e carries a tuple Ae(e)); vertices model objects and edges model relationships
     - Fig. 1 (from the paper): a running example of an attributed graph with vertex attributes Gender, Nation, Profession and edge attributes Date, Type, Strength (a small sketch of the model follows below)
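     A tiny illustrative sketch of the attributed-graph structure and of one vertex-aggregate cuboid (grouping vertices on a subset of vertex dimensions with COUNT as the measure). The attribute names follow the running example; the Python representation itself is an assumption of this sketch:

       from collections import Counter
       from dataclasses import dataclass, field

       @dataclass
       class AttributedGraph:
           """G = (V, E, Av, Ae): vertices and edges both carry attribute tuples."""
           vertex_attrs: dict = field(default_factory=dict)  # vertex id -> {attr: value}
           edge_attrs: dict = field(default_factory=dict)    # (u, v)    -> {attr: value}

       def vertex_cuboid(g, dims):
           """Aggregate vertices on the given V-Dims with COUNT as the measure."""
           return Counter(tuple(attrs[d] for d in dims) for attrs in g.vertex_attrs.values())

       g = AttributedGraph(
           vertex_attrs={1: {"Gender": "male",   "Nation": "USA",   "Profession": "Teacher"},
                         2: {"Gender": "female", "Nation": "USA",   "Profession": "Doctor"},
                         3: {"Gender": "male",   "Nation": "China", "Profession": "Teacher"}},
           edge_attrs={(1, 2): {"Date": "2011", "Type": "Friend",    "Strength": 3},
                       (2, 3): {"Date": "2012", "Type": "Colleague", "Strength": 1}},
       )
       print(vertex_cuboid(g, ("Gender",)))            # cuboid <Gender, *, *>
       print(vertex_cuboid(g, ("Gender", "Nation")))   # cuboid <Gender, Nation, *>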
  12. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs
     - Hyper Graph Cube model
       - Dimensions split into Vertex Dimensions (V-Dims) and Edge Dimensions (E-Dims); cuboids therefore form a V-Agg lattice, an E-Agg lattice, and their combination (VE-Agg), the expanded whole lattice
       - Fig. 2 (from the paper): the Hyper Graph Cube lattice and aggregate cuboids; example cuboids on <Gender, Type> (condensed vertices male/female, edge types Friend/Family/Colleague) and on <Nation, Date> (USA, SG, China), with vertex dimensions A: Gender, B: Nation, C: Profession and edge dimensions E: Date, F: Type, G: Strength
     - OLAP operations over the Hyper Graph Cube
       - Roll-Up / Drill-Down
       - Vertex and edge dimensions can be hierarchical:
         - Location: city -> state -> country -> all
         - Time: day -> week -> month -> year -> all
       - Rolling up moves through combinations such as <city, week> -> <state, month> -> <country, year> (authors' presentation, p. 15); a roll-up sketch follows below
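     Roll-up along a dimension hierarchy simply re-maps each dimension value to a coarser level before aggregating again; a small sketch under an assumed Location hierarchy (city -> state -> country), with hypothetical mapping tables:

       from collections import Counter

       # Assumed hierarchy tables for the Location dimension.
       CITY_TO_STATE = {"LA": "CA", "SFO": "CA", "NYC": "NY"}
       STATE_TO_COUNTRY = {"CA": "USA", "NY": "USA"}

       def roll_up(cuboid, mapping):
           """Re-aggregate a COUNT cuboid keyed by one dimension to a coarser level."""
           out = Counter()
           for value, count in cuboid.items():
               out[mapping.get(value, "*")] += count
           return out

       city_counts = Counter({"LA": 120, "SFO": 80, "NYC": 200})   # cuboid on <city>
       state_counts = roll_up(city_counts, CITY_TO_STATE)          # roll up to <state>
       country_counts = roll_up(state_counts, STATE_TO_COUNTRY)    # roll up to <country>
       print(state_counts, country_counts)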
  13. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs
     - Self-Contained Join
       - Avoids writing the joined vertex/edge data to the DFS and re-reading it (a naive join would output roughly d * |V| + |E| records for average degree d)
       - Blk-Gen job
         - Reads the vertex and edge attribute tables in the map phase
         - In the reduce phase, emits a series of self-contained data blocks, each written as one file (Fig. 3: a header segment with V_Num, E_Num, E_StartPos, followed by a node segment and an edge segment), so a vertex is shared by the edges in its block instead of being replicated; each self-contained file then feeds exactly one mapper of the cubing job, which performs the join map-side
     - Cuboid Batching
       - Without batching, ABC, AB, and A are computed independently, producing large intermediate data and repeated sorting/shuffling overhead
       - With batching, records are projected via the descendant cuboid (ABC) but partitioned via the ancestor cuboid (A, the Partition Cuboid of the batch); MapReduce's sort then delivers the group-by keys of every cuboid in the batch in sorted order for free, and ancestor cuboids reuse the descendant's shuffled data (authors' presentation, p. 21; Fig. 4: the generated batches)
       - Combine criterion (Criterion 2): among the VE-Agg cuboids, any two cuboids placed in the same batch must be in an ancestor/descendant relationship, plus an additional condition on their shared V-Dims
     - Batch Processing
       - Batches are executed according to prefix order:
         - ABC -> A, AB, ABC
         - EFG -> E, EF, EFG
         - ABC x EFG -> { <A,E>, <A,EF>, <A,EFG>, <AB,E>, <AB,EF>, <AB,EFG>, <ABC,E>, <ABC,EF>, <ABC,EFG> }
     - Cost-Based Execution Plan Optimization
     - (A simulated sketch of batched cuboid computation follows below.)
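     A sketch of the batching idea: each record is emitted once, keyed by the most detailed cuboid in a batch (ABC), partitioned by the batch's ancestor cuboid (A), and every cuboid of the batch is aggregated in one sorted pass. The MapReduce machinery is simulated with plain Python, and the names are illustrative, not the paper's implementation:

       from collections import defaultdict
       from itertools import groupby

       DIMS = ("A", "B", "C")
       BATCH = [DIMS[:k] for k in range(1, len(DIMS) + 1)]   # prefix order: A, AB, ABC

       def map_phase(records):
           """Emit each record once, keyed by the most detailed cuboid (ABC)."""
           for rec in records:
               yield tuple(rec[d] for d in DIMS), 1

       def reduce_phase(pairs):
           """Partition by the Partition Cuboid (A), then aggregate every cuboid
           of the batch from the same sorted stream."""
           cuboids = {cub: defaultdict(int) for cub in BATCH}
           shuffle = sorted(pairs)                            # MR sorts by key "for free"
           for _, group in groupby(shuffle, key=lambda kv: kv[0][:1]):   # partition by A
               for key, count in group:
                   for cub in BATCH:                          # project the key onto each cuboid
                       cuboids[cub][key[:len(cub)]] += count
           return cuboids

       records = [{"A": "m", "B": "USA", "C": "Doctor"},
                  {"A": "m", "B": "USA", "C": "Engineer"},
                  {"A": "f", "B": "SG",  "C": "Doctor"}]
       for cub, agg in reduce_phase(map_phase(records)).items():
           print("<" + ",".join(cub) + ">", dict(agg))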
  14. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs
     - MRGraph-Cubing (cube materialization)
       - Naive approach: compute each cuboid independently
       - MRGraph-Cubing instead:
         - Combines the cuboids into batches
         - Generates a batch execution plan
         - Places the batches into bags
         - Runs each bag as one MapReduce job
       - (Pipeline as on the authors' presentation, p. 20: Cuboids -> Batches -> Bags -> MR jobs)
  15. Pagrol: Parallel Graph OLAP over Large-scale Attributed Graphs
     - Results
       - Fig. 8: evaluation of the self-contained join and of batch processing (impact of graph degree, V-Dim size, and combine ratio); with batching, cube computation is about 2.5x and 4x faster than the single-cuboid-per-job and all-cuboids-in-one-job baselines, and the gain grows with the combine ratio
       - Fig. 9: evaluation of the plan optimizer and of scalability (batch execution plans, graph size from 120M to 750M edges, cluster size from 16 to 128 nodes); execution time roughly halves as the cluster doubles from 16 to 64 nodes
       - Overall, MRGraph-Cubing yields roughly a 2x to 4.5x speedup
       - Case study on Facebook data: a cuboid on {Affiliation, TotalFriends} surfaces the top communities, and the aggregate graph on <Region, Type> reveals cross-city colleague and schoolmate relationships among LA, Orange City, SFO, and Silicon Valley
     - Setup: synthetic graphs with 3 V-Dims and 3 E-Dims (about 7 bytes each), COUNT as the measure for both vertex and edge aggregation
     - (*) Presentation slides: http://www.comp.nus.edu.sg/~wangzk/document/Pagrol_ICDE14_Slides.pdf
