Presented at the 26th International Symposium on String Processing and Information Retrieval (SPIRE 2019)

- 1. SPIRE 2019: 26th International Symposium on String Processing and Information Retrieval. Fast Identification of Heavy Hitters by Cached and Packed Group Testing. Hiroki Arimura (Hokkaido University, Japan), Takeaki Uno (NII, Japan), Yusaku Kaneta (Rakuten Mobile, Inc., Japan)
- 2. The φ-Heavy Hitters Problem [Cormode, Muthukrishnan, ACM TODS 2005]
  - Goal: tracking φ-heavy hitters in a dynamic multiset S of elements.
  - φ-heavy hitter: an element of the universe U = [0..n) whose frequency is more than φ|S|.
  - Challenges: large alphabet, output sensitivity, high-speed operation.
  - The (ε-approximate) φ-Heavy Hitters Problem in the turnstile model:
    - Input: a data stream of pairs (x_i, Δ_i) ∈ U × {±1} and real numbers ε, φ in [0, 1).
    - Task: maintain frequency information of elements to support
      - QUERY(): return a set R ⊆ U that (1) includes all φ-heavy hitters and (2) excludes every element whose frequency is at most (φ − ε)N, where N = Σ_i Δ_i.
      - INSERT(x)/DELETE(x): increment/decrement the frequency N_x of element x.
  - Model of computation: the standard w-bit word RAM.
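
To make the problem statement concrete, here is a minimal exact baseline in Python. It is not the sketch from the talk: it stores every frequency explicitly, so it uses space linear in the number of distinct elements. The class name `ExactHeavyHitters` and its methods are illustrative assumptions, not names from the slides.

```python
from collections import Counter


class ExactHeavyHitters:
    """Exact (non-sketch) reference for the phi-heavy hitters problem."""

    def __init__(self, phi):
        self.phi = phi
        self.freq = Counter()  # N_x for every element x currently in S
        self.total = 0         # N = sum of all deltas seen so far

    def update(self, x, delta):
        """Process one stream pair (x, delta) with delta in {+1, -1}."""
        self.freq[x] += delta
        self.total += delta
        if self.freq[x] == 0:
            del self.freq[x]

    def query(self):
        """Return every x whose frequency is more than phi * N."""
        threshold = self.phi * self.total
        return {x for x, c in self.freq.items() if c > threshold}


hh = ExactHeavyHitters(phi=0.3)
for x in [7, 7, 7, 3, 3, 9, 7, 5, 3, 1]:
    hh.update(x, +1)          # INSERT(x)
hh.update(3, -1)              # DELETE(3): a turnstile update
print(hh.total, hh.query())   # N = 9; only element 7 (frequency 4) exceeds 0.3 * 9
```

An ε-approximate structure is allowed to return extra elements, but only those whose frequency is above (φ − ε)N.
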
- 3. Large Universes in Mobile Networks
  - The operation time of existing practical methods depends on log |U| = log n, which is large in practice:
    - Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]
    - Hierarchical Count-Min Sketch [Cormode, Muthukrishnan, LATIN 2005]
  - Examples of universe U and the resulting log |U| = log n:

    | Universe U | IPv4 | IPv6 |
    |---|---|---|
    | IP addresses | 32 | 128 |
    | Pairs of IP addresses | 64 | 256 |
    | Five tuples (source/destination IP/port + protocol) | 104 | 296 |

  - Q. Can we eliminate the dependency on log n from the operation time?
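
The key widths on this slide follow from simple arithmetic, assuming 16-bit port numbers and an 8-bit protocol field (these field widths are not stated on the slide). A small Python check, with the helper name `key_bits` chosen only for illustration:

```python
PORT_BITS = 16      # assumed width of a TCP/UDP port number
PROTOCOL_BITS = 8   # assumed width of the protocol field


def key_bits(ip_bits):
    """Return log|U| in bits for the three universes on the slide."""
    return {
        "ip": ip_bits,
        "ip_pair": 2 * ip_bits,
        "five_tuple": 2 * ip_bits + 2 * PORT_BITS + PROTOCOL_BITS,
    }


print(key_bits(32))    # IPv4: 32, 64, 104
print(key_bits(128))   # IPv6: 128, 256, 296
```
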
- 4. Main results
  - Comparison with CGT: Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]:

    | | This study | CGT | CGT with parameter b |
    |---|---|---|---|
    | Update | O(r) amortized | O(log(n) r) | O(log_b(n) r) |
    | Query | O(r^2/ε) | O((log(n) + r) r/ε) | O((b log_b(n) + r) r/ε) |
    | Space | O(log(n) r/ε) | O(log(n) r/ε) | O(b log_b(n) r/ε) |

  - n: size of universe; δ: failure probability; r = log(1/(δφ)); b: any integer in [2..n].
  - Key technique: Packed Bidirectional Counter Arrays.
  - Our paper also proposes a "cached candidate technique" for improving CGT for arbitrary updates.
  - Model of computation: the standard w-bit word RAM.
- 5. Related Work: Packed Counters
  - Setting: maintaining an array of m = O(w) counters on the w-bit word RAM.
  - Textbook solution for a single counter [Mehlhorn & Sanders, 2008]: Ops = inc/test or dec/test; O(1) space; O(m) amortized time.
  - Nested counters [Grabowski & Fredriksson, IPL 2008]: Ops = inc/test; O(m) space; O(1) amortized time.
  - Trit counters [Bille & Thorup, SODA 2010]: Ops = dec/reset/test; O(m) space; O(1) amortized time.
  - Bidirectional counters [this talk]: Ops = inc/dec/test; O(m) space; O(1) time for inc/dec (amortized) and test.
  - Naïve bidirectional counters: O(m) space; O(m) time for all operations.
  - "test" is one of ispositive (C[i] > 0), iszero (C[i] = 0), or isnegative (C[i] < 0).
- 6. How to improve CGT using Packed Bidirectional Counter Arrays?
- 7. CGT: A Practical Data Structure
  - Combinatorial Group Testing [CM, ACM TODS 2005] consists of:
    1. A three-dimensional counter array C[1..r, 1..d, 0..m].
    2. A set of universal hash functions h_1, ..., h_r: U → [1..d].
  - Parameters: r = log(1/(δφ)), d = 2/ε, m = 1 + lg n.
  - It reports all φ-heavy hitters with probability at least 1 − δ for a specified δ.
  - Idea: random partition of U into d = 2/ε subsets via each hash function h_i.
    - A φ-heavy hitter x can be identified from each C[i, h_i(x), 0..m] with probability at least 1/2.
    - Setting r = log(1/(δφ)) yields the desired failure probability δ of missing any φ-heavy hitter.
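
As a rough illustration of how the three parameters translate into array sizes, the following Python sketch evaluates r, d, and m for given inputs. Two assumptions are mine rather than the slide's: logarithms are taken base 2, and fractional values are rounded up to integers. The function name `cgt_dimensions` is hypothetical.

```python
import math


def cgt_dimensions(n, eps, phi, delta):
    """Return (r, d, m) for the counter array C[1..r, 1..d, 0..m]."""
    r = math.ceil(math.log2(1.0 / (delta * phi)))  # number of hash functions
    d = math.ceil(2.0 / eps)                       # groups per hash function
    m = 1 + math.ceil(math.log2(n))                # per-group counters beyond C[.., .., 0]
    return r, d, m


# Example with power-of-two inputs so that every logarithm is exact.
r, d, m = cgt_dimensions(n=2**32, eps=1 / 256, phi=1 / 256, delta=1 / 16)
print(r, d, m)              # 12 512 33
print(r * d * (m + 1))      # total number of counters in C
```
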
- 8. CGT: A Practical Data Structure
  - Combinatorial Group Testing [CM, ACM TODS 2005] reports all φ-heavy hitters with probability at least 1 − δ for a specified δ, using:
    1. A three-dimensional counter array C[1..r, 1..d, 0..m].
    2. A set of universal hash functions h_1, ..., h_r: U → [1..d].
  - CGT reduces both QUERY and UPDATE to three basic operations on a bidirectional counter array C[1..m]:
    - INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m].
    - DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m].
    - ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false).
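
The three basic operations can be sketched naively in Python as follows, taking O(m) time each. The slide does not fix a bit-numbering convention, so this sketch assumes that bit(x, 1) is the least significant bit and, accordingly, lets counter i contribute 2^(i−1) to the result of ISPOSITIVE so that the returned word uses the same positions as x.

```python
def bit(x, i):
    """Bit i of x, with i = 1 denoting the least significant bit (assumed convention)."""
    return (x >> (i - 1)) & 1


def increment(C, x):
    """C[i] += bit(x, i) for every i in [1..m]; C[0] is unused padding."""
    for i in range(1, len(C)):
        C[i] += bit(x, i)


def decrement(C, x):
    """C[i] -= bit(x, i) for every i in [1..m]."""
    for i in range(1, len(C)):
        C[i] -= bit(x, i)


def ispositive(C):
    """Word whose bit i is set exactly when C[i] > 0."""
    return sum(1 << (i - 1) for i in range(1, len(C)) if C[i] > 0)


m = 4
C = [0] * (m + 1)
increment(C, 0b1010)
increment(C, 0b0110)
decrement(C, 0b0100)
print(C[1:], bin(ispositive(C)))   # [0, 2, 0, 1] 0b1010
```
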
- 9. CGT: A Practical Data Structure (Combinatorial Group Testing [CM, ACM TODS 2005])
  - UPDATE(x, Δ): O(log(n) r) time

        Add Δ to N
        for i in [1..r] do:
            Add Δ to C[i, h_i(x), 0]
            y ← x if Δ > 0, otherwise y ← ~x
            INCREMENT(C[i, h_i(x), 1..m], y)
            DECREMENT(C[i, h_i(x), 1..m], ~y)

  - QUERY(): O((log(n) + r) r/ε) time

        for i in [1..r] do:
            for j in [1..d] do:
                // C[i, j, k] here corresponds to 2C[i, j, k] − C[i, j, 0] in the original CGT
                x ← ISPOSITIVE(C[i, j, 1..m])
                if min_i C[i, h_i(x), 0] > φN then:
                    report x as a φ-heavy hitter

  - INCREMENT, DECREMENT, and ISPOSITIVE are as defined on slide 8.
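
The UPDATE and QUERY procedures above can be sketched in plain Python as follows. This is an unpacked illustration with O(m) work per group, not the authors' implementation. Assumptions: the pair "INCREMENT with x, DECREMENT with ~x" is folded into one signed addition per bit; the hash family (a·x + b mod p) mod d with the Mersenne prime p = 2^61 − 1 is my choice; indices are 0-based; and the class name `CGTSketch` is hypothetical.

```python
import math
import random


class CGTSketch:
    """Unpacked CGT-style sketch with signed bit counters (illustrative only)."""

    P = (1 << 61) - 1  # prime modulus of the assumed universal hash family

    def __init__(self, key_bits, eps, r, seed=0):
        self.m = key_bits
        self.r = r
        self.d = math.ceil(2 / eps)
        self.N = 0
        rng = random.Random(seed)
        self.ab = [(rng.randrange(1, self.P), rng.randrange(self.P)) for _ in range(r)]
        # C[i][j][0] is the group total; C[i][j][k] is the signed counter for bit k.
        self.C = [[[0] * (self.m + 1) for _ in range(self.d)] for _ in range(r)]

    def h(self, i, x):
        a, b = self.ab[i]
        return ((a * x + b) % self.P) % self.d

    def update(self, x, delta):
        """UPDATE(x, delta) with delta in {+1, -1}."""
        self.N += delta
        for i in range(self.r):
            cell = self.C[i][self.h(i, x)]
            cell[0] += delta
            for k in range(1, self.m + 1):
                # +delta where bit k of x is 1, -delta where it is 0
                cell[k] += delta if (x >> (k - 1)) & 1 else -delta

    def query(self, phi):
        """QUERY(): decode one candidate per group and verify it in every row."""
        threshold = phi * self.N
        found = set()
        for i in range(self.r):
            for j in range(self.d):
                cell = self.C[i][j]
                x = sum(1 << (k - 1) for k in range(1, self.m + 1) if cell[k] > 0)
                if min(self.C[t][self.h(t, x)][0] for t in range(self.r)) > threshold:
                    found.add(x)
        return found


sketch = CGTSketch(key_bits=16, eps=0.25, r=3)
for _ in range(60):
    sketch.update(42, +1)
for y in range(1000, 1040):
    sketch.update(y, +1)
sketch.update(1039, -1)            # a deletion
result = sketch.query(phi=0.5)
print(42 in result)                # True: element 42 holds 60 of the 99 items
```

In this example element 42 outweighs all other elements combined, so every bit counter in its group takes the sign dictated by 42 regardless of how the hash functions distribute the rest.
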
- 10. CGT: A Practical Data Structure (UPDATE and QUERY pseudocode repeated from slide 9). Q. Can we implement INCREMENT/DECREMENT/ISPOSITIVE in o(m) time?
- 11. Packed Bidirectional Counter Arrays
  - Basic idea: exploiting the word-level parallelism of the w-bit word RAM.
    - Redundant binary representation of C[1..m] using digits {0, ±1, ±2}.
    - The corresponding k-th digits of C[1..m] are packed into O(1) words.
    - The packed k-th digits of C[1..m] are updated in O(1) time, once every 2^k updates.
  - Resulting bounds, using O(m) space (compact!) for m = O(w):
    - INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m]. O(1) amortized time.
    - DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m]. O(1) amortized time.
    - ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false). O(1) time.
- 12. Packed Bidirectional Counter Arrays
  - Basic idea: as on slide 11 (word-level parallelism, redundant binary digits {0, ±1, ±2}, k-th digits packed into O(1) words and updated once every 2^k updates).
  - Naïve bidirectional counters C[1], ..., C[m]: m × w = O(w^2) bits; O(w) time to access.
  - Packed bidirectional counter array C[1], ..., C[m]: m × O(1) = O(w) bits; O(1) time to access.
  - [Figure: memory layouts of the naïve and the packed arrays, with a label of w digits.]
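
The packing idea, storing the k-th digits of all m counters together in one word, can be illustrated with ordinary bit-sliced counters. The Python sketch below is a deliberate simplification and not the structure from the talk: it handles only non-negative counters with plain binary digits and ripple carries, whereas the talk uses redundant digits {0, ±1, ±2} and a fixed carry schedule. Python integers stand in for w-bit words, and the class name `BitSlicedCounters` is hypothetical.

```python
class BitSlicedCounters:
    """m non-negative counters stored as bit planes.

    planes[k] is an m-bit integer whose bit i is the k-th binary digit
    of counter i, so one word operation touches all m counters at once.
    """

    def __init__(self, m):
        self.m = m
        self.planes = []

    def increment(self, x):
        """Add bit i of x to counter i, for all i simultaneously."""
        carry = x & ((1 << self.m) - 1)
        k = 0
        while carry:
            if k == len(self.planes):
                self.planes.append(0)
            # A half adder applied to every counter in parallel.
            self.planes[k], carry = self.planes[k] ^ carry, self.planes[k] & carry
            k += 1

    def value(self, i):
        """Read counter i (0-based) by collecting its digit from each plane."""
        return sum(((plane >> i) & 1) << k for k, plane in enumerate(self.planes))


counters = BitSlicedCounters(m=4)
for x in [0b1011, 0b0011, 0b1001, 0b0001, 0b0010]:
    counters.increment(x)
print([counters.value(i) for i in range(4)])   # [4, 3, 0, 2]
```

Each loop iteration costs one XOR and one AND on a whole word, independent of m, which is the source of the word-level parallelism.
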
- 13. Packed Bidirectional Counter Arrays
  - Packed redundant binary counters using digits {0, ±1, ±2}.
  - Fixed-schedule carry propagation in O(1) amortized time [GF, IPL 2008][BT, SODA 2010].
  - Packed orders of magnitudes for detecting sign inversion.
  - The t-th update: (1) propagate carry bits; (2) fix orders of magnitudes.
  - level(t) = min{i | t mod 2^i = 0}. The k-th digits are updated once every 2^k updates and never overflow.
  - [Figure: update schedule showing level(t) for t = 1, ..., 9; the legend distinguishes digits in {0, ±1} from digits in {0, ±1, ±2}.]
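
To see why a fixed schedule keeps redundant digits bounded, the following Python sketch simulates a single, unpacked counter. It is a simplified model under my own assumptions, not the construction from the paper: level(t) is read as the number of trailing zero bits of t, the t-th update normalizes digits 0 through level(t), and the "orders of magnitudes" used for sign detection are omitted. The class name `RedundantCounter` is hypothetical.

```python
def level(t):
    """Number of trailing zero bits of t >= 1 (assumed reading of level(t))."""
    return (t & -t).bit_length() - 1


class RedundantCounter:
    """Signed counter with value sum(digits[k] * 2**k) and lazy carries."""

    def __init__(self, num_digits=64):
        self.digits = [0] * num_digits
        self.t = 0         # number of updates so far
        self.touches = 0   # number of digit normalizations performed

    def add(self, delta):
        """Apply delta in {+1, -1}, then normalize digits 0..level(t)."""
        self.t += 1
        self.digits[0] += delta
        for k in range(level(self.t) + 1):
            self.touches += 1
            if self.digits[k] >= 2:
                self.digits[k] -= 2
                self.digits[k + 1] += 1
            elif self.digits[k] <= -2:
                self.digits[k] += 2
                self.digits[k + 1] -= 1

    def value(self):
        return sum(d << k for k, d in enumerate(self.digits))


c = RedundantCounter()
for delta in [+1] * 5 + [-1] * 2:
    c.add(delta)
print(c.value())                          # 3
print(max(abs(d) for d in c.digits))      # stays within the digit range {0, ±1, ±2}
```

Digit k is normalized only at updates t that are multiples of 2^k, so the total number of normalizations over T updates is less than 2T, which gives O(1) amortized work per update in this model.
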
- 14. Lemma (Packed Bidirectional Counters): There exists an O(m)-space data structure for representing an array C[1..m] of m bidirectional counters that supports INCREMENT/DECREMENT in O(1) amortized time and ISPOSITIVE in O(1) time on the standard w-bit word RAM with w ≥ m.
- 15. Theorem: Plugging packed bidirectional counters into CGT, we obtain the following. There exists an O(lg(n) r/ε)-space randomized data structure for solving the ε-approximate φ-heavy hitters problem with INSERT/DELETE in O(r) amortized time and QUERY in O(r^2/ε) time, with probability at least 1 − δ, on the standard w-bit word RAM with w ≥ lg n. Here, n is the universe size, δ is a failure probability, and r = lg(1/(δφ)).
- 16. Experiments: Setup
  - Data: 14 datasets of 10 M integers.
    - Universe: [0, 2^64).
    - Zipf distribution with skewness z in {0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0}.
    - Threshold φ (= ε) in {0.0001, 0.0005, 0.001, 0.005, 0.01}.
  - Methods:
    - Ours [this work]: our proposed method with #rounds r = 4.
    - CGT(b) [CM, TODS 2005]: Combinatorial Group Testing with branching factor b in {2, 16}.
    - CMH(b) [CM, LATIN 2005]: Hierarchical Count-Min Sketch with branching factor b in {2, 16}.
    - CGT(b) and CMH(b) were configured as in [Cormode, Hadjieleftheriou, PVLDB 2008].
  - Hardware: MacBook Pro with Intel® Core™ i7-8559 (2.7 GHz) and 16 GB main memory.
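
The slide does not say how the Zipf datasets were generated. As one possible way to produce a comparable stream, the Python sketch below samples ranks from a Zipf distribution truncated to a finite support, which also works for skewness z ≤ 1. The function name `zipf_stream`, the `support` parameter, and the seed handling are assumptions of this sketch; mapping ranks to identifiers in [0, 2^64) is left out.

```python
import random
from itertools import accumulate


def zipf_stream(length, z, support, seed=0):
    """Sample `length` ranks in 1..support with P(rank = k) proportional to 1 / k**z."""
    rng = random.Random(seed)
    cumulative = list(accumulate(1.0 / (k ** z) for k in range(1, support + 1)))
    return rng.choices(range(1, support + 1), cum_weights=cumulative, k=length)


stream = zipf_stream(length=20000, z=1.2, support=1000)
print(len(stream), stream.count(1), stream.count(2))
```

Larger z concentrates more of the stream on the first few ranks, which is the regime where few elements exceed the threshold φ by a wide margin.
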
- 17. Experiments: Precision
  - Ours achieved competitive precision for skewness z ≥ 1.4.
    - Ours output more false positives than the others for skewness z < 1.4.
    - For z < 1.4, ours should have used a larger ε to suppress false positives.
    - The recall of every method was 100%.
  - [Figure: precision (%) versus skewness z from 0.8 to 2.0, one panel per φ in {0.0001, 0.0005, 0.001, 0.005, 0.01}; series: Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), CMH(16) [CM, LATIN 2005].]
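
The slides do not spell out how precision and recall were computed. Assuming the standard definitions (precision is the fraction of reported elements that are true φ-heavy hitters, recall is the fraction of true φ-heavy hitters that were reported), they can be evaluated as in this Python sketch; the function name `precision_recall` is illustrative.

```python
def precision_recall(reported, truth):
    """Return (precision, recall) of a reported set against the true heavy hitters."""
    reported, truth = set(reported), set(truth)
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# One false positive (element 4), no misses.
print(precision_recall(reported={1, 2, 3, 4}, truth={1, 2, 3}))   # (0.75, 1.0)
```
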
- 18. Experiments: Update time
  - Ours achieved update throughput competitive with CMH(16).
    - CMH(16) achieved the best and most stable update throughput.
    - CGT(16) depended heavily on φ, even though it has no such dependence in theory.
    - CGT(2) and CMH(2) were not competitive.
  - Note: the median of 5 measured times is reported.
  - [Figure: update throughput (updates/msec, 0 to 30000) versus skewness z from 0.8 to 2.0, one panel per φ in {0.0001, 0.0005, 0.001, 0.005, 0.01}; series: Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), CMH(16) [CM, LATIN 2005].]
- 19. Experiments: Query time
  - Ours achieved the best query throughput except for φ = 0.0001.
    - Note: ε = φ and r = O(1) in our experiments.
    - The CGT family (including ours) must examine Θ(1/φ) candidate heavy hitters.
    - The CMH family is output sensitive: it is fast if the number of heavy hitters is less than 1/φ.
  - [Figure: query throughput (queries/msec) versus skewness z from 0.8 to 2.0, one panel per φ in {0.0001, 0.0005, 0.001, 0.005, 0.01}, each with its own vertical scale; series: Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), CMH(16) [CM, LATIN 2005].]
- 20. Conclusion
  - The φ-Heavy Hitters Problem in the strict turnstile model. We improved CGT [CM, ACM TODS 2005] in
    - update time: from O(log(n) r) to amortized O(r);
    - query time: from O((log(n) + r) r/ε) to O(r^2/ε);
    - using the same O(log(n) r/ε) space for a universe of size n and r = log(1/(δφ)).
  - Packed Bidirectional Counter Array:
    - Extension of [GF, IPL 2008] and [BT, SODA 2010] to bidirectional counters.
    - Ops = inc/dec/test: O(1) amortized inc/dec and O(1) test in compact space.
  - Future work: extension of our method to arbitrary updates.