Fast Identification of Heavy Hitters by Cached and Packed Group Testing

Rakuten Group, Inc.
SPIRE 2019: 26th International Symposium on String Processing and Information Retrieval
Fast Identification of Heavy Hitters
by Cached and Packed Group Testing
Hiroki Arimura
Hokkaido University
Japan
Takeaki Uno
NII
Japan
Yusaku Kaneta
Rakuten Mobile, Inc.
Japan
2
The φ-Heavy Hitters Problem
[Cormode, Muthukrishnan, ACM TODS 2005]
§Tracking φ-heavy hitters in a dynamic multiset S of elements.
• φ-heavy hitter: an element of the universe U = [0..n) whose frequency exceeds φ|S|.
• Challenges: large alphabet, output sensitivity, high-speed operation
Input: a data stream of pairs (x_i, Δ_i) ∈ U × {±1} and real numbers ε, φ in [0, 1).
Task: Maintain frequency information about the elements to support:
• QUERY(): return a set R ⊆ U that contains (1) all φ-heavy hitters and
(2) no element whose frequency is at most (φ − ε)N, where N = Σ_i Δ_i.
• INSERT(x)/DELETE(x): increment/decrement the frequency N_x of element x.
The (ε-approximate) φ-Heavy Hitters Problem in the turnstile model
Model of computation: The standard w-bit word RAM
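As a point of reference for the problem statement above, the following is a minimal exact baseline, not the method of this talk. It stores one counter per distinct element, which is exactly the cost that sketches such as CGT avoid. The function name `exact_heavy_hitters` and the sample stream are illustrative assumptions.

```python
from collections import Counter

def exact_heavy_hitters(stream, phi):
    """Return the exact set of phi-heavy hitters of a turnstile stream.

    stream: iterable of (x, delta) pairs with delta in {+1, -1}.
    An element x is reported if its frequency N_x exceeds phi * N,
    where N is the sum of all deltas.
    """
    freq = Counter()
    total = 0  # N = sum of all deltas seen so far
    for x, delta in stream:
        freq[x] += delta
        total += delta
    return {x for x, f in freq.items() if f > phi * total}

# Hypothetical stream: element 3 is inserted once and then deleted.
stream = [(7, +1), (7, +1), (3, +1), (7, +1), (3, -1), (5, +1)]
heavy = exact_heavy_hitters(stream, phi=0.5)
```

Here N = 4 and only element 7 (frequency 3) exceeds φN = 2.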
3
Large Universes in Mobile Networks
The operation time of existing practical methods depends on
log |U| = log n (Large in practice!)
Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]
Hierarchical Count-Min Sketch [Cormode, Muthukrishnan, LATIN 2005]
Examples of universe U                                 log |U| = log n
                                                       IPv4    IPv6
IP addresses                                           32      128
Pairs of IP addresses                                  64      256
Five tuples (source/destination IP/Port + Protocol)    104     296
Q. Can we eliminate dependency on log n from operation time?
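The key widths in the table above can be checked with a few lines of arithmetic. This sketch assumes the usual field sizes (16-bit ports and an 8-bit protocol number); the names `IP_BITS`, `PORT_BITS`, `PROTOCOL_BITS`, and `key_bits` are illustrative.

```python
# Field widths in bits (assumed standard sizes: 16-bit ports, 8-bit protocol).
IP_BITS = {"IPv4": 32, "IPv6": 128}
PORT_BITS = 16
PROTOCOL_BITS = 8

def key_bits(version):
    """Return log n for each example universe, given the IP version."""
    ip = IP_BITS[version]
    return {
        "IP addresses": ip,
        "Pairs of IP addresses": 2 * ip,
        # two addresses + two ports + one protocol field
        "Five tuples": 2 * ip + 2 * PORT_BITS + PROTOCOL_BITS,
    }

widths = {version: key_bits(version) for version in IP_BITS}
```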
4
Main results
Key technique: Packed Bidirectional Counter Arrays
Our paper also proposes a "cached candidate" technique for improving CGT for arbitrary updates.

          This study        CGT                  CGT (general b)
Update    O(r) amortized    O(log(n)r)           O(log_b(n)r)
Query     O(r^2/ε)          O((log(n)+r)r/ε)     O((b log_b(n)+r)r/ε)
Space     O(log(n)r/ε)      O(log(n)r/ε)         O(b log_b(n)r/ε)

CGT: Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005]
n: size of universe; δ: failure probability; r = log(1/(δφ)); b: any integer in [2..n]
Model of computation: The standard w-bit word RAM
5
Related Work: Packed Counters
Maintaining an array of m = O(w) counters on the w-bit word RAM.
§Textbook solution for a single counter [Mehlhorn & Sanders, 2008]:
• Ops = inc/test or dec/test: O(1) space; O(m) amortized time.
§Nested counters [Grabowski & Fredriksson, IPL 2008]:
• Ops = inc/test: O(m) space; O(1) amortized time
§Trit counters [Bille & Thorup, SODA 2010]
• Ops = dec/reset/test: O(m) space; O(1) amortized time.
§Bidirectional counters [This talk]
• Ops = inc/dec/test: O(m) space; O(1) time for inc/dec (amortized) and test.
• Naïve bidirectional counters: O(m) space; O(m) time for all operations.
"test": ispositive (C[i] > 0), iszero (C[i] = 0), or isnegative (C[i] < 0)
6
How to improve CGT using
Packed Bidirectional Counter Arrays?
7
CGT: A Practical Data Structure
§Reports all φ-heavy hitters with probability at least 1 – δ for a specified δ.
§Idea: Random partition of U into d = 2/ε subsets via each hash function h_i.
• A φ-heavy hitter x can be identified from each C[i, h_i(x), 0..m] with probability at least 1/2.
• Setting r = log(1/(δφ)) yields the desired failure probability δ of missing any φ-heavy hitter.
Combinatorial Group Testing [CM, ACM TODS 2005]
1. Three-dimensional counter array: C[1..r, 1..d, 0..m]
2. A set of universal hash functions: h_1, ..., h_r: U → [1..d]
r = log(1/(δφ))
d = 2/ε
m = 1 + lg n
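To make the three dimensions concrete, the sketch below computes r, d, and m for given parameters. Rounding up to integers and using base-2 logarithms are assumptions made here; the slide states only the formulas. The function name `cgt_parameters` and the sample values are illustrative.

```python
import math

def cgt_parameters(n, phi, eps, delta):
    """Dimensions of the CGT counter array C[1..r, 1..d, 0..m].

    Assumption: logs are base 2 and non-integer values are rounded up.
    """
    r = math.ceil(math.log2(1.0 / (delta * phi)))  # number of hash functions
    d = math.ceil(2.0 / eps)                       # groups per hash function
    m = 1 + math.ceil(math.log2(n))                # counters per group
    return r, d, m

# Hypothetical setting: 32-bit keys, phi = eps = 2^-7, delta = 2^-3.
r, d, m = cgt_parameters(n=2**32, phi=2**-7, eps=2**-7, delta=2**-3)
total_counters = r * d * m
```

With these values the array has r = 10 rows, d = 256 groups per row, and m = 33 counters per group.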
8
CGT: A Practical Data Structure
§Reports all φ-heavy hitters with probability at least 1 – δ for a specified δ.
Combinatorial Group Testing [CM, ACM TODS 2005]
CGT reduces both QUERY and UPDATE to three basic operations on a bidirectional counter array C[1..m]:
INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m].
DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m].
ISPOSITIVE(C): Return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false).
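The three basic operations can be written directly as loops over the m counters. This is the naive O(m)-time version, not the packed structure of this talk. As an assumption, bit positions are 0-based here, so counter i corresponds to bit 2^i of x.

```python
def increment(C, x):
    """C[i] += bit(x, i) for every counter i (naive, O(m) time)."""
    for i in range(len(C)):
        C[i] += (x >> i) & 1

def decrement(C, x):
    """C[i] -= bit(x, i) for every counter i (naive, O(m) time)."""
    for i in range(len(C)):
        C[i] -= (x >> i) & 1

def ispositive(C):
    """Return the integer z whose bit i is 1 exactly when C[i] > 0."""
    return sum(1 << i for i, c in enumerate(C) if c > 0)

m = 8
C = [0] * m
increment(C, 0b10110010)
increment(C, 0b10110010)
increment(C, 0b00000111)
decrement(C, 0b00000111)  # cancels the previous increment
z = ispositive(C)
```

After these calls, z recovers the bit pattern 0b10110010 of the element that remains in the counters.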
9
CGT: A Practical Data Structure
Combinatorial Group Testing [CM, ACM TODS 2005]
UPDATE(x, Δ): O(log(n)r) time
Add Δ to N
for i in [1..r] do:
    Add Δ to C[i, h_i(x), 0]
    if Δ < 0 then: x ← ~x
    INCREMENT(C[i, h_i(x), 1..m], x)
    DECREMENT(C[i, h_i(x), 1..m], ~x)
QUERY(): O((log(n)+r)r/ε) time
for i in [1..r] do:
    for j in [1..d] do:
        // C[i, j, k] 2C[i, j, k] – C[i, j, 0]
        x ← ISPOSITIVE(C[i, j, 1..m])
        if min_i C[i, h_i(x), 0] > φN then:
            report x as a φ-heavy hitter
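The UPDATE and QUERY procedures above can be sketched with naive (unpacked) counters as follows. The class name `CGT`, the hash family ((a·x + b) mod p) mod d with p = 2^61 − 1, and 0-based indexing are assumptions made for this illustration. For Δ = −1 the sketch applies the reverse of an insertion to the bucket of x, which is one reading of the slide's step "x ← ~x".

```python
import random

class CGT:
    """Naive Combinatorial Group Testing sketch with O(m) work per counter row."""

    def __init__(self, r, d, m, seed=0):
        self.r, self.d, self.m = r, d, m
        self.N = 0
        # C[i][j][0] is the group total; C[i][j][1..m] are the bit counters.
        self.C = [[[0] * (m + 1) for _ in range(d)] for _ in range(r)]
        rng = random.Random(seed)
        self.P = (1 << 61) - 1  # a Mersenne prime, assumed larger than any key
        self.ab = [(rng.randrange(1, self.P), rng.randrange(self.P))
                   for _ in range(r)]

    def h(self, i, x):
        a, b = self.ab[i]
        return ((a * x + b) % self.P) % self.d

    def update(self, x, delta):
        self.N += delta
        for i in range(self.r):
            cell = self.C[i][self.h(i, x)]
            cell[0] += delta
            for k in range(self.m):
                # INCREMENT by x and DECREMENT by ~x (reversed when delta < 0)
                cell[k + 1] += delta if (x >> k) & 1 else -delta

    def query(self, phi):
        found = set()
        for i in range(self.r):
            for j in range(self.d):
                cell = self.C[i][j]
                # ISPOSITIVE: decode the candidate from the signs of the counters
                x = sum(1 << k for k in range(self.m) if cell[k + 1] > 0)
                if min(self.C[t][self.h(t, x)][0]
                       for t in range(self.r)) > phi * self.N:
                    found.add(x)
        return found

sketch = CGT(r=4, d=16, m=8)
for _ in range(60):
    sketch.update(42, +1)
for x in range(40):
    sketch.update(x, +1)
for _ in range(10):
    sketch.update(7, +1)
for _ in range(10):
    sketch.update(7, -1)
hh = sketch.query(phi=0.5)
```

In this stream element 42 accounts for 60 of N = 100 items, so it dominates every group it falls into and is always decoded; false positives remain possible, as with any sketch.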
10
CGT: A Practical Data Structure
Combinatorial Group Testing [CM, ACM TODS 2005]
Q. Can we implement
INCREMENT/DECREMENT/ISPOSITIVE in o(m) time?
11
Packed Bidirectional Counter Arrays
§Basic idea: Exploiting word-level parallelism of the w-bit word RAM
• Redundant binary representation of C[1..m] using digits {0, ±1, ±2}.
• The corresponding k-th digits of C[1..m] are packed into O(1) words.
• The packed k-th digits of C[1..m] are updated in O(1) time, once every 2^k updates.
INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m]. O(1) amortized time.
DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m]. O(1) amortized time.
ISPOSITIVE(C): Return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false). O(1) time.
All using O(m) space (compact!) for m = O(w).
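The packing idea (the k-th digits of all m counters share one word and are updated together) can be illustrated with a simplified bit-sliced counter array. This is a swapped-in, simpler technique and not the authors' structure: it uses plain binary digits instead of redundant digits {0, ±1, ±2}, supports increments only, ripples carries instead of following a fixed schedule, and its `ispositive` scans all planes. Python integers stand in for w-bit words, and the class name `BitSlicedCounters` is illustrative.

```python
class BitSlicedCounters:
    """m unsigned counters stored 'vertically': plane k holds bit k of every counter."""

    def __init__(self, m, bits):
        self.mask = (1 << m) - 1
        self.planes = [0] * bits  # counters wrap modulo 2**bits in this sketch

    def increment(self, x):
        """Add bit(x, i) to counter i for all i at once, using word operations."""
        carry = x & self.mask
        for k in range(len(self.planes)):
            if carry == 0:
                break
            # Half adder applied to all m counters in parallel.
            self.planes[k], carry = self.planes[k] ^ carry, self.planes[k] & carry

    def ispositive(self):
        """Bit i of the result is 1 exactly when counter i is nonzero."""
        z = 0
        for plane in self.planes:
            z |= plane
        return z

    def value(self, i):
        """Read back counter i (for checking only)."""
        return sum(((p >> i) & 1) << k for k, p in enumerate(self.planes))

counters = BitSlicedCounters(m=8, bits=4)
for _ in range(3):
    counters.increment(0b101)
counters.increment(0b001)
z = counters.ispositive()
```

Each loop iteration touches one word regardless of m, which is the word-level parallelism the slide refers to; the redundant digits and fixed schedule are what bound the number of iterations per update in the actual structure.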
12
Packed Bidirectional Counter Arrays
[Figure: Naïve bidirectional counters store C[1], ..., C[i], ..., C[m] as m words of w digits each, i.e. m × w = O(w^2) bits, with O(w) time to access. The packed bidirectional counter array stores each digit position of C[1..m] in m × O(1) = O(w) bits, with O(1) time to access.]
13
Packed Bidirectional Counter Arrays
Packed redundant binary counters using digits {0, ±1, ±2}
Fixed-schedule carry propagation in O(1) amortized time [GF, IPL 2008] [BT, SODA 2010]
Packed orders of magnitude for detecting sign inversion
The t-th update: 1. Propagate carry bits. 2. Fix orders of magnitude.
level(t) = min{i | t mod 2^i = 0}
The k-th digits are updated once every 2^k updates. Never overflow.
[Figure: digit values (in {0, ±1} and in {0, ±1, ±2}) at digit positions 0, 1, 2, ..., shown together with level(t) for updates t = 1, 2, ..., 8.]
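The fixed schedule can be illustrated with a small sketch. Read literally, the minimum over i with t mod 2^i = 0 is always 0 (take i = 0), so this sketch assumes the intended quantity is the largest such i, the so-called ruler function. That reading is consistent with the k-th digits being touched once every 2^k updates. The function name `level` follows the slide; everything else is illustrative.

```python
def level(t):
    """Largest i such that 2**i divides t, for t >= 1 (assumed reading of level(t))."""
    # t & -t isolates the lowest set bit of t.
    return (t & -t).bit_length() - 1

levels = [level(t) for t in range(1, 9)]

# How often is digit position k scheduled during the first T updates?
T = 1 << 10
touched = [sum(1 for t in range(1, T + 1) if level(t) >= k) for k in range(4)]
```

Here `levels` is [0, 1, 0, 2, 0, 1, 0, 3], and position k is reached in T / 2^k of the first T updates, which sums to less than 2T over all positions and gives the O(1) amortized bound.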
14
Lemma (Packed Bidirectional Counters)
There exists an O(m)-space data structure for representing
an array C[1..m] of m bidirectional counters supporting
§INCREMENT/DECREMENT in O(1) amortized time
§ISPOSITIVE in O(1) time
on the standard w-bit word RAM with w ≥ m.
15
Theorem
§Plugging packed bidirectional counters into CGT, we obtain:
There exists an O(lg(n)r/ε)-space randomized data structure
for solving the ε-approximate φ-heavy hitters problem with
§INSERT/DELETE in O(r) amortized time
§QUERY in O(r^2/ε) time with probability at least 1 - δ
on the standard w-bit word RAM with w ≥ lg n. Here, n is
the universe size, δ is a failure probability, and r = lg(1/(δφ)).
16
Experiments: Setup
§Data: 14 datasets of 10 M integers
• Universe: [0, 2^64).
• Zipf distribution of skewness z in { 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 }.
• Threshold φ (= ε) in { 0.0001, 0.0005, 0.001, 0.005, 0.01}.
§Methods:
• Ours [This work]: Our proposed method with #rounds r = 4.
• CGT(b) [CM, TODS 2005]: Combinatorial Group Testing with branching factor b in { 2, 16 }.
• CMH(b) [CM, LATIN 2005]: Hierarchical Count-Min Sketch with branching factor b in { 2, 16 }.
CGT(b) and CMH(b) were configured as in [Cormode, Hadjieleftheriou, PVLDB 2008].
§Hardware:
• MacBook Pro with Intel® Core™ i7-8559 (2.7GHz) and 16GB main memory.
17
Experiments: Precision
§Ours achieved competitive precision for skewness z ≥ 1.4.
• Ours output more false positives than the others for skewness z < 1.4.
• For z < 1.4, ours should have used a larger ε to suppress false positives.
• Recalls of all methods were 100%.
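For reference, precision and recall of a reported set against the true heavy hitters can be computed as below. This is the standard definition and is assumed to match the evaluation; treating an empty set as a score of 1.0 is a convention chosen here, and the sample sets are hypothetical.

```python
def precision_recall(reported, truth):
    """Precision and recall of a reported set against the true heavy hitters."""
    reported, truth = set(reported), set(truth)
    true_positives = len(reported & truth)
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall

# Hypothetical output with one false positive (element 9) and no misses.
p, r = precision_recall(reported={1, 2, 3, 9}, truth={1, 2, 3})
```

This case gives precision 0.75 with recall 1.0, the pattern described above: every heavy hitter is found, and extra reports lower precision only.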
[Figure: precision (%) versus skewness z (0.8 to 2.0), one panel per threshold φ = 0.0001, 0.0005, 0.001, 0.005, and 0.01. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
18
Experiments: Update time
§Ours achieved update throughput competitive with CMH(16).
• CMH(16) achieved the best and most stable update throughput.
• CGT(16) depended heavily on φ in practice, although its theoretical update time does not.
• CGT(2) and CMH(2) were not competitive.
[Figure: update throughput (updates/msec) versus skewness z (0.8 to 2.0), one panel per threshold φ = 0.0001, 0.0005, 0.001, 0.005, and 0.01. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
Note: The median of 5 measured times is reported.
19
Experiments: Query time
§Ours achieved the best query throughput except for φ = 0.0001.
• Note: ε = φ and r = O(1) in our experiments.
• The CGT family (including ours) must examine Θ(1/φ) candidate heavy hitters.
• The CMH family is output sensitive: it is fast if the number of heavy hitters is less than 1/φ.
[Figure: query throughput (queries/msec) versus skewness z (0.8 to 2.0), one panel per threshold φ = 0.0001, 0.0005, 0.001, 0.005, and 0.01. Methods: Ours, CGT(2) and CGT(16) [CM, TODS 2005], CMH(2) and CMH(16) [CM, LATIN 2005].]
20
Conclusion
§The φ-Heavy Hitters Problem in the strict turnstile model.
We improved CGT [CM, ACM TODS 2005] in
• Update time: from O(log(n)r) to amortized O(r)
• Query time: from O((log(n)+r)r/ε) to O(r^2/ε)
using the same O(log(n)r/ε) space for a universe of size n and r = log(1/(δφ)).
§Packed Bidirectional Counter Array:
• Extension of [GF, IPL 2008] and [BT, SODA 2010] to bidirectional counters.
• Ops = inc/dec/test: O(1) amortized inc/dec and O(1) test in compact space.
§Future work
• Extension of our method to arbitrary updates.
Fast Identification of Heavy Hitters by Cached and Packed Group Testing

  • 1. SPIRE2019:26thInternationalSymposiumonStringProcessingandInformationRetrieval Fast Identification of Heavy Hitters by Cached and Packed Group Testing Hiroki Arimura Hokkaido University Japan Takeaki Uno NII Japan Yusaku Kaneta Rakuten Mobile, Inc. Japan
  • 2. 2 The φ-Heavy Hitters Problem [Cormode, Muthukrishnan, ACM TODS 2005] §Tracking φ-heavy hitters in a dynamic multiset S of elements. • φ-heavy hitter: element in universe U = [0..n) with its frequency more than φ|S|. • Challenges: large alphabet, output sensitivity, high-speed operation Input: data stream of pairs (xi, Δi) ∈ U × {±1} and real numbers ε, φ in [0, 1). Task: Maintain frequency information of elements for supporting • QUERY(): return a set R ⊆ U such that R includes (1) all φ-heavy hitters and (2) no others whose frequency is no more than (φ − ε)N for N = Σi Δi. • INSERT(x)/DELETE(x): increment/decrement the frequency Nx of element x. The (ε-approximate) φ-Heavy Hitters Problem in the turnstile model Model of computation: The standard w-bit word RAM
  • 3. 3 Large Universes in Mobile Networks The operation time of existing practical methods depends on log |U| = log n (Large in practice!) Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2005] Hierarchical Count-Min Sketch [Cormode, Muthukrishnan, LATIN 2005] IPv4 IPv6 Examples of universe U log |U| = log n IP addresses 32 128 Pairs of IP addresses 64 256 Five tuples (source/destination IP/Port + Protocol) 104 296 Q. Can we eliminate dependency on log n from operation time?
  • 4. 4 Main results Key technique: Packed Bidirectional Counter Arrays Our paper also proposes "cached candidate technique” for improving CGT for arbitrary updates. This study CGT: Combinatorial Group Testing [Cormode, Muthukrishnan, ACM TODS 2015] Update O(r) amortized O(log(n)r) O(logb(n)r) Query O(r2/ε) O((log(n)+r)r/ε) O((blogb(n)+r)r/ε) Space O(log(n)r/ε) O(log(n)r/ε) O(blogb(n)r/ε) n: size of universe δ: failure probability r = log(1/(δφ)) b: any integer in [2..n] Model of computation: The standard w-bit word RAM
  • 5. 5 Related Work: Packed Counters Maintaining an array ofm = O(w) counters on the w-bit word RAM. §Textbook solution for a single counter [Mehlhorn & Sanders, 2008]: • Ops = inc/test or dec/test: O(1) space; O(m) amortized time. §Nested counters [Grabowski & Fredriksson, IPL 2008]: • Ops = inc/test: O(m) space; O(1) amortized time §Trit counters [Bille & Thorup, SODA 2010] • Ops = dec/reset/test: O(m) space; O(1) amortized time. §Bidirectional counters [This talk] • Ops = inc/dec/test: O(m) space; O(1) time for inc/dec (amortized) and test. • Naïve bidirectional counters: O(m) space; O(m) time for all operations. "test": ispositive (C[i] > 0), iszero (C[i] = 0), or isnegative (C[i] < 0)
  • 6. How can we improve CGT using Packed Bidirectional Counter Arrays?
  • 7. CGT: A Practical Data Structure
    Combinatorial Group Testing [CM, ACM TODS 2005]
      1. Three-dimensional counter array: C[1..r, 1..d, 0..m]
      2. A set of universal hash functions: h_1, ..., h_r: U → [1..d]
      where r = log(1/(δφ)), d = 2/ε, and m = 1 + lg n.
    § Reports all φ-heavy hitters with probability at least 1 − δ for a specified δ.
    § Idea: random partition of U into d = 2/ε subsets via each hash function h_i.
      • A φ-heavy hitter x can be identified from each C[i, h_i(x), 0..m] with probability at least 1/2.
      • Setting r = log(1/(δφ)) results in the desired failure probability δ of missing any φ-heavy hitter.
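As a small worked example, the dimensions r, d, and m can be computed from n, φ, ε, and δ. The slide does not state the logarithm base or the rounding, so this sketch assumes base-2 logarithms rounded up; the function name `cgt_parameters` is hypothetical.

```python
import math


def cgt_parameters(n, phi, eps, delta):
    """Return (r, d, m) for universe size n, following r = log(1/(delta*phi)),
    d = 2/eps, m = 1 + lg n. Base-2 logs and ceilings are assumptions."""
    r = math.ceil(math.log2(1.0 / (delta * phi)))
    d = math.ceil(2.0 / eps)
    m = 1 + (n - 1).bit_length()  # (n - 1).bit_length() == ceil(lg n) for n >= 1
    return r, d, m


# Example: 32-bit universe, phi = 1/8, eps = 1/4, delta = 1/4.
r, d, m = cgt_parameters(2 ** 32, phi=0.125, eps=0.25, delta=0.25)
```

With these inputs, 1/(δφ) = 32, so r = 5 rounds, d = 8 buckets per round, and m = 33.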
  • 8. CGT: A Practical Data Structure
    Combinatorial Group Testing [CM, ACM TODS 2005]
      1. Three-dimensional counter array: C[1..r, 1..d, 0..m]
      2. A set of universal hash functions: h_1, ..., h_r: U → [1..d]
    § Reports all φ-heavy hitters with probability at least 1 − δ for a specified δ.
    CGT reduces both QUERY and UPDATE to three basic operations on a bidirectional counter array C[1..m]:
      • INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m].
      • DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m].
      • ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false).
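The three basic operations can be sketched directly as a naive counter array in which every operation touches all m counters. This is the O(m)-time baseline, not the packed structure of the talk; the class name `NaiveCounterArray` is hypothetical, and bit positions are 0-based here rather than 1-based as on the slide.

```python
class NaiveCounterArray:
    """m signed counters; every operation scans all m positions."""

    def __init__(self, m):
        self.m = m
        self.c = [0] * m

    def increment(self, x):
        # C[i] += bit(x, i) for every bit position i
        for i in range(self.m):
            self.c[i] += (x >> i) & 1

    def decrement(self, x):
        # C[i] -= bit(x, i) for every bit position i
        for i in range(self.m):
            self.c[i] -= (x >> i) & 1

    def ispositive(self):
        # z = sum over i of [C[i] > 0] * 2^i
        z = 0
        for i, value in enumerate(self.c):
            if value > 0:
                z |= 1 << i
        return z


a = NaiveCounterArray(4)
a.increment(0b1010)
a.increment(0b0110)
a.decrement(0b0100)
z = a.ispositive()  # counters are [0, 2, 0, 1], so bits 1 and 3 are set
```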
  • 9. CGT: A Practical Data Structure
    Combinatorial Group Testing [CM, ACM TODS 2005]
    UPDATE(x, Δ): O(log(n) r) time
      Add Δ to N
      for i in [1..r] do:
        Add Δ to C[i, h_i(x), 0]
        if Δ < 0 then: x ← ~x
        INCREMENT(C[i, h_i(x), 1..m], x)
        DECREMENT(C[i, h_i(x), 1..m], ~x)
    QUERY(): O((log(n) + r) r/ε) time
      for i in [1..r] do:
        for j in [1..d] do:
          // C[i, j, k] 2C[i, j, k] − C[i, j, 0]
          x ← ISPOSITIVE(C[i, j, 1..m])
          if min_i C[i, h_i(x), 0] > φN then:
            report x as a φ-heavy hitter
    Basic operations:
      • INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m].
      • DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m].
      • ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false).
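The UPDATE and QUERY pseudocode can be turned into a small runnable sketch with naive counters. Several details are assumptions not given on the slide: the hash family (a random linear function modulo the Mersenne prime 2^61 − 1, so keys must be below 2^61), 0-based indices, and the class name `SimpleCGT`. Each bit counter stores (number of items with that bit set) minus (number with it clear), which is what incrementing by x and decrementing by ~x produces.

```python
import random


class SimpleCGT:
    """Naive-counter sketch of CGT; each update costs O(m * r) time."""

    PRIME = (1 << 61) - 1  # assumed hash modulus; keys must be < 2^61

    def __init__(self, r, d, m, seed=0):
        rng = random.Random(seed)
        self.r, self.d, self.m = r, d, m
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(r)]
        self.total = 0                                  # N
        self.count = [[0] * d for _ in range(r)]        # C[i, j, 0]
        self.bits = [[[0] * m for _ in range(d)] for _ in range(r)]  # C[i, j, 1..m]

    def _hash(self, i, x):
        a, b = self.coeffs[i]
        return ((a * x + b) % self.PRIME) % self.d

    def update(self, x, delta):
        self.total += delta
        for i in range(self.r):
            j = self._hash(i, x)
            self.count[i][j] += delta
            row = self.bits[i][j]
            for k in range(self.m):
                # +delta where bit k of x is 1, -delta where it is 0
                row[k] += delta if (x >> k) & 1 else -delta

    def query(self, phi):
        threshold = phi * self.total
        result = set()
        for i in range(self.r):
            for j in range(self.d):
                # ISPOSITIVE: decode the bitwise majority element of bucket (i, j)
                x = sum(1 << k for k in range(self.m) if self.bits[i][j][k] > 0)
                if min(self.count[t][self._hash(t, x)] for t in range(self.r)) > threshold:
                    result.add(x)
        return result


sketch = SimpleCGT(r=4, d=8, m=16)
for _ in range(600):
    sketch.update(12345, +1)      # one element with frequency 600
for y in range(400):
    sketch.update(y, +1)          # 400 distinct light elements
found = sketch.query(phi=0.5)     # threshold is 0.5 * 1000 = 500
```

In this example the heavy element outweighs everything else in its bucket in every round, so each of its bits wins the majority and it is decoded exactly.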
  • 10. CGT: A Practical Data Structure
    Combinatorial Group Testing [CM, ACM TODS 2005]
    UPDATE(x, Δ): O(log(n) r) time. QUERY(): O((log(n) + r) r/ε) time.
    Q. Can we implement INCREMENT/DECREMENT/ISPOSITIVE in o(m) time?
  • 11. Packed Bidirectional Counter Arrays
    § Basic idea: exploiting the word-level parallelism of the w-bit word RAM.
      • Redundant binary representation of C[1..m] using digits {0, ±1, ±2}.
      • The corresponding k-th digits of C[1..m] are packed into O(1) words.
      • The packed k-th digits of C[1..m] are updated in O(1) time, once every 2^k updates.
    Resulting bounds, using O(m) space (compact!) for m = O(w):
      • INCREMENT(C, x): C[i] = C[i] + bit(x, i) for every i in [1..m]. O(1) amortized time.
      • DECREMENT(C, x): C[i] = C[i] − bit(x, i) for every i in [1..m]. O(1) amortized time.
      • ISPOSITIVE(C): return z = Σ_i [C[i] > 0] · 2^i, where [X] is 1 (resp. 0) if X is true (resp. false). O(1) time.
  • 12. Packed Bidirectional Counter Arrays
    § Basic idea: exploiting the word-level parallelism of the w-bit word RAM.
      • Redundant binary representation of C[1..m] using digits {0, ±1, ±2}.
      • The corresponding k-th digits of C[1..m] are packed into O(1) words.
      • The packed k-th digits of C[1..m] are updated in O(1) time, once every 2^k updates.
    [Figure: memory layout of the two representations, each counter having w digits.]
      • Naïve bidirectional counters C[1], ..., C[i], ..., C[m]: m × w = O(w^2) bits; O(w) time to access.
      • Packed bidirectional counter array C[1], ..., C[i], ..., C[m]: m × O(1) = O(w) bits; O(1) time to access.
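The idea of packing the k-th digits of all counters into one word can be illustrated with plain bit-sliced counters. This is a deliberately simpler, swapped-in technique: it handles unsigned increments only, uses ordinary binary digits rather than the redundant digits {0, ±1, ±2}, and propagates carries eagerly, so it does not achieve the O(1) amortized bound of the talk. The class name `BitSlicedCounters` is hypothetical.

```python
class BitSlicedCounters:
    """m unsigned counters stored as bit planes: planes[k] holds bit k of
    every counter, so one word operation touches all m counters at once."""

    def __init__(self, m):
        self.m = m
        self.planes = []

    def increment(self, x):
        # Add bit(x, i) to counter i for all i simultaneously (ripple carry).
        carry = x & ((1 << self.m) - 1)
        k = 0
        while carry:
            if k == len(self.planes):
                self.planes.append(0)
            # Half adder applied to all m counters in parallel.
            self.planes[k], carry = self.planes[k] ^ carry, self.planes[k] & carry
            k += 1

    def value(self, i):
        # Reassemble counter i from its bits across the planes.
        return sum(((p >> i) & 1) << k for k, p in enumerate(self.planes))

    def isnonzero(self):
        # Bit i of the result is 1 iff counter i > 0; one OR per plane.
        z = 0
        for p in self.planes:
            z |= p
        return z


c = BitSlicedCounters(4)
for _ in range(3):
    c.increment(0b1010)
c.increment(0b0110)
```

The carry loop here may run through many planes on a single update, which is the cost that a fixed carry-propagation schedule over redundant digits is designed to avoid.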
  • 13. Packed Bidirectional Counter Arrays
    § Packed redundant binary counters using digits {0, ±1, ±2}.
    § Packed orders of magnitude for detecting sign inversion.
    § Fixed-schedule carry propagation in O(1) amortized time [GF, IPL 2008][BT, SODA 2010].
    The t-th update, with level(t) = max{i | t mod 2^i = 0}:
      1. Propagate carry bits.
      2. Fix orders of magnitude.
    The k-th digits are updated once every 2^k updates. Never overflow.
    [Figure: timeline of updates t = 0, 1, 2, ... with the corresponding level(t); legend: digits in {0, ±1} and digits in {0, ±1, ±2}.]
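The fixed schedule can be sketched as follows, assuming level(t) is meant as the largest i such that 2^i divides t (the number of trailing zero bits of t). Under that assumption, digit positions up to level(t) are visited at update t, and the total work over T updates stays below 2T. The function name `level` follows the slide; `total_digit_visits` is hypothetical.

```python
def level(t):
    """Largest i such that t is divisible by 2**i, for t >= 1."""
    return (t & -t).bit_length() - 1


def total_digit_visits(num_updates):
    """Count digit positions visited if update t touches digits 0..level(t)."""
    return sum(level(t) + 1 for t in range(1, num_updates + 1))


schedule = [level(t) for t in range(1, 9)]
visits = total_digit_visits(1024)
```

Digit k is visited only at updates that are multiples of 2^k, which gives the geometric sum behind the O(1) amortized cost per update.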
  • 14. Lemma (Packed Bidirectional Counters)
    There exists an O(m)-space data structure for representing an array C[1..m] of m bidirectional counters supporting
    § INCREMENT/DECREMENT in O(1) amortized time
    § ISPOSITIVE in O(1) time
    on the standard w-bit word RAM with w ≥ m.
  • 15. Theorem
    § Plugging packed bidirectional counters into CGT, we obtain:
    There exists an O(lg(n) r/ε)-space randomized data structure for solving the ε-approximate φ-heavy hitters problem with
    § INSERT/DELETE in O(r) amortized time
    § QUERY in O(r^2/ε) time
    with probability at least 1 − δ on the standard w-bit word RAM with w ≥ lg n.
    Here, n is the universe size, δ is a failure probability, and r = lg(1/(δφ)).
  • 16. Experiments: Setup
    § Data: 14 datasets of 10 M integers
      • Universe: [0, 2^64).
      • Zipf distribution of skewness z in { 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0 }.
      • Threshold φ (= ε) in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }.
    § Methods:
      • Ours [This work]: our proposed method with #rounds r = 4.
      • CGT(b) [CM, TODS 2005]: Combinatorial Group Testing with branching factor b in { 2, 16 }.
      • CMH(b) [CM, LATIN 2005]: Hierarchical Count-Min Sketch with branching factor b in { 2, 16 }.
      CGT(b) and CMH(b) were configured as in [Cormode, Hadjieleftheriou, PVLDB 2008].
    § Hardware:
      • MacBook Pro with Intel® Core™ i7-8559 (2.7GHz) and 16GB main memory.
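A stream of this kind could be generated along the following lines. The slides do not say how the datasets were produced, so everything here is an assumption: a finite number of ranks, weights proportional to 1/k^z, and a random 64-bit identifier per rank. The function name `zipf_stream` is hypothetical.

```python
import random
from collections import Counter


def zipf_stream(num_items, num_ranks, z, seed=0):
    """Draw num_items values from a Zipf-like distribution of skewness z over
    num_ranks distinct 64-bit identifiers. Returns (stream, ids)."""
    rng = random.Random(seed)
    # Assumed mapping: rank k (1-based) gets a random identifier in [0, 2^64).
    ids = [rng.getrandbits(64) for _ in range(num_ranks)]
    weights = [1.0 / (k ** z) for k in range(1, num_ranks + 1)]
    return rng.choices(ids, weights=weights, k=num_items), ids


stream, ids = zipf_stream(num_items=10_000, num_ranks=1_000, z=2.0)
most_common_id, most_common_count = Counter(stream).most_common(1)[0]
```

Sampling from explicit weights also works for z ≤ 1, where a Zipf law over an unbounded support cannot be normalized.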
  • 17. Experiments: Precision
    § Ours achieved competitive precision for skewness z ≥ 1.4.
      • Ours output more false positives than the others for skewness z < 1.4.
      • For z < 1.4, ours should have used a larger ε to suppress false positives.
      • The recall of all methods was 100%.
    [Figure: precision (%) versus skewness (0.8 to 2.0), one panel per threshold in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }, for Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), and CMH(16) [CM, LATIN 2005].]
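For reference, precision and recall of a reported set against the true φ-heavy hitters can be computed as below. This is the standard definition, given as a sketch; the slides do not show the evaluation code, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(reported, truth):
    """Precision = TP / |reported|, recall = TP / |truth|.
    An empty denominator is treated as a perfect score (an assumption)."""
    reported, truth = set(reported), set(truth)
    true_positives = len(reported & truth)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Two true heavy hitters, both reported, plus two false positives.
p, r = precision_recall(reported={1, 2, 3, 4}, truth={1, 2})
```

A recall of 100% with lower precision, as in this example, corresponds to the situation described for small skewness: no heavy hitter is missed, but extra elements are reported.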
  • 18. Experiments: Update time
    § Ours achieved update throughput competitive with CMH(16).
      • CMH(16) achieved the best and most stable update throughput.
      • CGT(16) depended heavily on φ, even though it does not in theory.
      • CGT(2) and CMH(2) were not competitive.
    Note: the median of 5 measured times is reported.
    [Figure: updates/msec versus skewness (0.8 to 2.0), one panel per threshold in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }, for Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), and CMH(16) [CM, LATIN 2005].]
  • 19. Experiments: Query time
    § Ours achieved the best query throughput except for φ = 0.0001.
      • Note: ε = φ and r = O(1) in our experiments.
      • The CGT family (including ours) must examine Θ(1/φ) candidate heavy hitters.
      • The CMH family is output sensitive: it is fast if the number of heavy hitters is less than 1/φ.
    [Figure: queries/msec versus skewness (0.8 to 2.0), one panel per threshold in { 0.0001, 0.0005, 0.001, 0.005, 0.01 }, for Ours, CGT(2), CGT(16) [CM, TODS 2005], CMH(2), and CMH(16) [CM, LATIN 2005].]
  • 20. Conclusion
    § The φ-Heavy Hitters Problem in the strict turnstile model. We improved CGT [CM, ACM TODS 2005] in
      • Update time: from O(log(n) r) to amortized O(r)
      • Query time: from O((log(n) + r) r/ε) to O(r^2/ε)
      using the same O(log(n) r/ε) space for a universe of size n and r = log(1/(δφ)).
    § Packed Bidirectional Counter Array:
      • Extension of [GF, IPL 2008] and [BT, SODA 2010] to bidirectional counters.
      • Ops = inc/dec/test: O(1) amortized inc/dec and O(1) test in compact space.
    § Future work
      • Extension of our method to arbitrary updates.