Fast Identification of Heavy Hitters by Cached and Packed Group Testing

Presented at the 26th International Symposium on String Processing and Information Retrieval (SPIRE 2019)

  1. SPIRE 2019: 26th International Symposium on String Processing and Information Retrieval. Fast Identification of Heavy Hitters by Cached and Packed Group Testing. Hiroki Arimura (Hokkaido University, Japan), Takeaki Uno (NII, Japan), Yusaku Kaneta (Rakuten Mobile, Inc., Japan).
  2. The φ-Heavy Hitters Problem [Cormode, Muthukrishnan, ACM TODS 2005]
     § Tracking φ-heavy hitters in a dynamic multiset S of elements.
     • φ-heavy hitter: an element of the universe U = [0..n) whose frequency is more than φ|S|.
     • Challenges: large alphabet, output sensitivity, high-speed operation.
     The (ε-approximate) φ-Heavy Hitters Problem in the turnstile model:
     Input: a data stream of pairs (x_i, Δ_i) ∈ U × {±1} and real numbers ε, φ in [0, 1).
     Task: maintain frequency information of elements to support
     • QUERY(): return a set R ⊆ U such that R (1) includes all φ-heavy hitters and (2) includes no element whose frequency is at most (φ − ε)N, where N = Σ_i Δ_i.
     • INSERT(x)/DELETE(x): increment/decrement the frequency N_x of element x.
     Model of computation: the standard w-bit word RAM.
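As a point of reference for the problem statement, the following is a minimal exact (non-sketch) baseline that stores one counter per distinct element. It only illustrates the definition of a φ-heavy hitter in the turnstile model; the function name `exact_heavy_hitters` and the toy stream are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter


def exact_heavy_hitters(stream, phi):
    """Return all elements whose net frequency exceeds phi * N (exact, not a sketch)."""
    freq = Counter()
    total = 0  # N = sum of all deltas seen so far
    for x, delta in stream:
        freq[x] += delta
        total += delta
    return {x for x, count in freq.items() if count > phi * total}


# Turnstile stream: element 3 is inserted and later deleted.
stream = [(7, +1), (7, +1), (3, +1), (7, +1), (3, -1), (5, +1)]
print(exact_heavy_hitters(stream, phi=0.5))  # {7}
```

This baseline needs space proportional to the number of distinct elements, which is what sketch-based structures such as CGT are designed to avoid.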
  3. Large Universes in Mobile Networks
     The operation time of existing practical methods depends on log |U| = log n, which is large in practice:
     • Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]
     • Hierarchical Count-Min Sketch [Cormode, Muthukrishnan, LATIN 2005]
     Examples of universe U and the value of log |U| = log n:
       Universe U                                            IPv4   IPv6
       IP addresses                                            32    128
       Pairs of IP addresses                                   64    256
       Five tuples (source/destination IP/port + protocol)    104    296
     Q. Can we eliminate the dependency on log n from the operation time?
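The five-tuple entries in the table can be reproduced with simple arithmetic, assuming 16-bit port numbers and an 8-bit protocol field. The helper name `five_tuple_bits` is hypothetical.

```python
def five_tuple_bits(ip_bits, port_bits=16, protocol_bits=8):
    """Bits in (source IP, destination IP, source port, destination port, protocol)."""
    return 2 * ip_bits + 2 * port_bits + protocol_bits


for name, ip_bits in (("IPv4", 32), ("IPv6", 128)):
    # single address, pair of addresses, five tuple
    print(name, ip_bits, 2 * ip_bits, five_tuple_bits(ip_bits))
```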
  4. Main results
     Key technique: Packed Bidirectional Counter Arrays.
     Our paper also proposes a "cached candidate technique" for improving CGT for arbitrary updates.
     CGT: Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]
                This study        CGT                     CGT (with parameter b)
       Update   O(r) amortized    O(log(n) r)             O(log_b(n) r)
       Query    O(r^2/ε)          O((log(n) + r) r/ε)     O((b log_b(n) + r) r/ε)
       Space    O(log(n) r/ε)     O(log(n) r/ε)           O(b log_b(n) r/ε)
     n: size of the universe; δ: failure probability; r = log(1/(δφ)); b: any integer in [2..n].
     Model of computation: the standard w-bit word RAM.
  5. Related Work: Packed Counters
     Maintaining an array of m = O(w) counters on the w-bit word RAM.
     § Textbook solution for a single counter [Mehlhorn & Sanders, 2008]:
     • Ops = inc/test or dec/test: O(1) space; O(m) amortized time.
     § Nested counters [Grabowski & Fredriksson, IPL 2008]:
     • Ops = inc/test: O(m) space; O(1) amortized time.
     § Trit counters [Bille & Thorup, SODA 2010]:
     • Ops = dec/reset/test: O(m) space; O(1) amortized time.
     § Bidirectional counters [This talk]:
     • Ops = inc/dec/test: O(m) space; O(1) time for inc/dec (amortized) and test.
     • Naïve bidirectional counters: O(m) space; O(m) time for all operations.
     "test" means ispositive (C[i] > 0), iszero (C[i] = 0), or isnegative (C[i] < 0).
  6. How to improve CGT using Packed Bidirectional Counter Arrays?
  7. CGT: A Practical Data Structure
     Combinatorial Group Testing [CM, ACM TODS 2005] consists of:
     (1) a three-dimensional counter array C[1..r, 1..d, 0..m];
     (2) a set of universal hash functions h_1, ..., h_r: U → [1..d],
     where r = log(1/(δφ)), d = 2/ε, and m = 1 + lg n.
     § Reports all φ-heavy hitters with probability at least 1 − δ for a specified δ.
     § Idea: random partition of U into d = 2/ε subsets via each hash function h_i.
     • A φ-heavy hitter x can be identified from each C[i, h_i(x), 0..m] with probability at least 1/2.
     • Setting r = log(1/(δφ)) results in the desired failure probability δ of missing any φ-heavy hitter.
  8. CGT: A Practical Data Structure
     CGT reduces both QUERY and UPDATE to three basic operations on a bidirectional counter array C[1..m]:
     • INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m].
     • DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m].
     • ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false).
  9. CGT: A Practical Data Structure
     Combinatorial Group Testing [CM, ACM TODS 2005]
     UPDATE(x, Δ): O(log(n) r) time
       Add Δ to N
       for i in [1..r] do:
         Add Δ to C[i, h_i(x), 0]
         y ← x if Δ > 0, otherwise y ← ~x      // the hash is always taken on the original x
         INCREMENT(C[i, h_i(x), 1..m], y)
         DECREMENT(C[i, h_i(x), 1..m], ~y)
     QUERY(): O((log(n) + r) r/ε) time
       for i in [1..r] do:
         for j in [1..d] do:
           // the stored bidirectional counter equals 2C[i, j, k] − C[i, j, 0],
           // where C[i, j, k] is the plain count of elements in group (i, j) whose bit k is 1
           x ← ISPOSITIVE(C[i, j, 1..m])
           if min over i' in [1..r] of C[i', h_i'(x), 0] > φN then:
             report x as a φ-heavy hitter
     INCREMENT, DECREMENT, and ISPOSITIVE are the three basic operations on a bidirectional counter array C[1..m].
  10. CGT: A Practical Data Structure
      Q. Can we implement INCREMENT/DECREMENT/ISPOSITIVE in o(m) time?
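The UPDATE and QUERY procedures of CGT can be sketched with one plain signed integer per counter, that is, without any packing. This is an illustrative reference sketch, not the authors' implementation: the class name `NaiveCGT`, the hash family ((a·x + b) mod p) mod d with p = 2^31 − 1, the 0-based indices, and the toy parameters are all assumptions.

```python
import random


class NaiveCGT:
    """Unpacked CGT sketch: every counter is an ordinary Python integer."""

    def __init__(self, bits, r, d, seed=0):
        rng = random.Random(seed)
        self.bits, self.r, self.d = bits, r, d
        self.p = (1 << 31) - 1  # prime modulus; assumes the universe size 2**bits is below p
        self.hashes = [(rng.randrange(1, self.p), rng.randrange(self.p)) for _ in range(r)]
        self.total = [[0] * d for _ in range(r)]  # plays the role of C[i, j, 0]
        self.bitc = [[[0] * bits for _ in range(d)] for _ in range(r)]  # signed bit counters
        self.n_items = 0  # N

    def _h(self, i, x):
        a, b = self.hashes[i]
        return ((a * x + b) % self.p) % self.d

    def update(self, x, delta):
        """Apply (x, delta) with delta in {+1, -1}; costs O(bits * r) time."""
        self.n_items += delta
        for i in range(self.r):
            j = self._h(i, x)
            self.total[i][j] += delta
            row = self.bitc[i][j]
            for k in range(self.bits):
                # +delta where bit k of x is 1, -delta where it is 0
                row[k] += delta if (x >> k) & 1 else -delta

    def query(self, phi):
        """Decode one candidate per group and keep those passing the min-count check."""
        threshold = phi * self.n_items
        found = set()
        for i in range(self.r):
            for j in range(self.d):
                x = sum(1 << k for k in range(self.bits) if self.bitc[i][j][k] > 0)
                if min(self.total[t][self._h(t, x)] for t in range(self.r)) > threshold:
                    found.add(x)
        return found


cgt = NaiveCGT(bits=16, r=4, d=8)
for _ in range(60):
    cgt.update(12345, +1)
for x in range(1000, 1040):
    cgt.update(x, +1)
print(12345 in cgt.query(phi=0.5))  # True
```

In this example element 12345 carries 60 of the 100 units of weight, so in every group it falls into it outweighs all other elements combined and each bit counter takes the sign of its bit. The inner loop over `bits` in `update` is the log n factor that packed counters are meant to remove.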
  11. Packed Bidirectional Counter Arrays
      § Basic idea: exploiting the word-level parallelism of the w-bit word RAM.
      • Redundant binary representation of C[1..m] using digits {0, ±1, ±2}.
      • The corresponding k-th digits of C[1..m] are packed into O(1) words.
      • The packed k-th digits of C[1..m] are updated in O(1) time, once every 2^k updates.
      Resulting bounds, using O(m) space (compact) for m = O(w):
      • INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m]. O(1) amortized time.
      • DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m]. O(1) amortized time.
      • ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false). O(1) time.
  12. Packed Bidirectional Counter Arrays
      • Naïve bidirectional counters C[1], ..., C[i], ..., C[m]: m × w = O(w^2) bits; O(w) time to access.
      • Packed bidirectional counter array: for each of the w digit positions, the digits of C[1], ..., C[i], ..., C[m] occupy m × O(1) = O(w) bits; O(1) time to access.
  13. Packed Bidirectional Counter Arrays
      • Packed redundant binary counters using digits {0, ±1, ±2}.
      • Packed orders of magnitude for detecting sign inversion.
      • Fixed-schedule carry propagation in O(1) amortized time [GF, IPL 2008] [BT, SODA 2010].
      • The t-th update: (1) propagate carry bits; (2) fix orders of magnitude.
      • level(t) = min{i | t mod 2^i = 0}. The k-th digits are updated once every 2^k updates and never overflow.
      [Figure: packed digit layout and the level(t) schedule over updates t = 0, 1, 2, ...]
  14. Lemma (Packed Bidirectional Counters)
      There exists an O(m)-space data structure for representing an array C[1..m] of m bidirectional counters that supports
      § INCREMENT/DECREMENT in O(1) amortized time and
      § ISPOSITIVE in O(1) time
      on the standard w-bit word RAM with w ≥ m.
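The O(1) amortized bound rests on the fixed update schedule. The sketch below only counts how many digit levels such a schedule touches; it does not implement the packed counters themselves. It assumes that level(t) is the number of trailing zero bits of t (the ruler function) and that the t-th update touches levels 0 through level(t); both are readings of the slide made for illustration and are not confirmed by it.

```python
def level(t):
    """Number of trailing zero bits of t >= 1 (the ruler function)."""
    return (t & -t).bit_length() - 1


print([level(t) for t in range(1, 9)])  # [0, 1, 0, 2, 0, 1, 0, 3]

# Assumed schedule: update t touches digit levels 0..level(t),
# so level k is touched exactly when 2**k divides t.
T = 1 << 10
touched = sum(level(t) + 1 for t in range(1, T + 1))
print(touched, 2 * T)  # 2047 2048
```

Under these assumptions, level k is touched once every 2^k updates, and the total number of touched levels over T updates stays below 2T. If each touched level costs O(1) word operations, the cost per update is O(1) amortized.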
  15. Theorem
      § Plugging packed bidirectional counters into CGT, we obtain:
      There exists an O(lg(n) r/ε)-space randomized data structure for solving the ε-approximate φ-heavy hitters problem with
      § INSERT/DELETE in O(r) amortized time and
      § QUERY in O(r^2/ε) time
      with probability at least 1 − δ on the standard w-bit word RAM with w ≥ lg n.
      Here, n is the universe size, δ is the failure probability, and r = lg(1/(δφ)).
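To give a feel for the sizes involved, the parameters r = lg(1/(δφ)), d = 2/ε, and m = 1 + lg n can be evaluated for concrete inputs. The sketch assumes base-2 logarithms and rounding up to integers; the function name `cgt_dimensions` and the sample values of ε, φ, and δ are hypothetical.

```python
import math
from fractions import Fraction


def cgt_dimensions(n, eps, phi, delta):
    """Return (r, d, m) for universe size n, rounding up (rounding is an assumption)."""
    r = math.ceil(math.log2(float(1 / (delta * phi))))
    d = math.ceil(2 / eps)
    m = 1 + (n - 1).bit_length()  # (n - 1).bit_length() equals ceil(lg n) for n >= 1
    return r, d, m


r, d, m = cgt_dimensions(
    n=1 << 64, eps=Fraction(1, 1000), phi=Fraction(1, 1000), delta=Fraction(1, 100)
)
print(r, d, m, r * d * (m + 1))  # 17 2000 65 2244000
```

The last number is the count of logical counters in C[1..r, 1..d, 0..m]. Exact fractions are used for ε, φ, and δ so that the ceilings are not affected by floating-point rounding.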
  16. Experiments: Setup
      § Data: 14 datasets of 10 M integers.
      • Universe: [0, 2^64).
      • Zipf distribution of skewness z in { 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 }.
      • Threshold φ (= ε) in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }.
      § Methods:
      • Ours [This work]: our proposed method with #rounds r = 4.
      • CGT(b) [CM, TODS 2005]: Combinatorial Group Testing with branching factor b in { 2, 16 }.
      • CMH(b) [CM, LATIN 2005]: Hierarchical Count-Min Sketch with branching factor b in { 2, 16 }.
      CGT(b) and CMH(b) were configured as in [Cormode, Hadjieleftheriou, PVLDB 2008].
      § Hardware:
      • MacBook Pro with Intel® Core™ i7-8559 (2.7GHz) and 16GB main memory.
  17. Experiments: Precision
      § Ours achieved competitive precision for skewness z ≥ 1.4.
      • Ours output more false positives than the others for skewness z < 1.4.
      • For z < 1.4, ours should have used a larger ε to suppress false positives.
      • The recall of all methods was 100%.
      [Figure: precision (%) versus skewness (0.8 to 2.0), one panel per φ in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
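Precision and recall in this setting compare the reported set against the exact set of φ-heavy hitters. A minimal sketch of that comparison follows; the function name `precision_recall`, the convention for empty sets, and the toy sets are assumptions, not the authors' evaluation code.

```python
def precision_recall(reported, truth):
    """Precision and recall of a reported set against the exact heavy hitters."""
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# One false positive (element 4) and no missed heavy hitter.
print(precision_recall({1, 2, 3, 4}, {1, 2, 3}))  # (0.75, 1.0)
```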
  18. Experiments: Update time
      § Ours achieved update throughput competitive with CMH(16).
      • CMH(16) achieved the best and most stable update throughput.
      • CGT(16) depended heavily on φ, although it has no such dependence in theory.
      • CGT(2) and CMH(2) were not competitive.
      Note: the median of 5 measured times is reported.
      [Figure: updates/msec versus skewness (0.8 to 2.0), one panel per φ in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
  19. Experiments: Query time
      § Ours achieved the best query throughput except for φ = 0.0001.
      • Note: ε = φ and r = O(1) in our experiments.
      • The CGT family (including ours) must examine Θ(1/φ) candidate heavy hitters.
      • The CMH family is output sensitive: it is fast if the number of heavy hitters is less than 1/φ.
      [Figure: queries/msec versus skewness (0.8 to 2.0), one panel per φ in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
  20. Conclusion
      § The φ-Heavy Hitters Problem in the strict turnstile model. We improved CGT [CM, ACM TODS 2005] in
      • Update time: from O(log(n) r) to amortized O(r)
      • Query time: from O((log(n) + r) r/ε) to O(r^2/ε)
      using the same O(log(n) r/ε) space for a universe of size n and r = log(1/(δφ)).
      § Packed Bidirectional Counter Array:
      • Extension of [GF, IPL 2008] and [BT, SODA 2010] to bidirectional counters.
      • Ops = inc/dec/test: O(1) amortized inc/dec and O(1) test in compact space.
      § Future work
      • Extension of our method to arbitrary updates.
