SlideShare a Scribd company logo
1 of 13
Download to read offline
CMPSCI 711: More Advanced Algorithms
Section 1-3: Count-Min Sketch and Applications
Andrew McGregor
Last Compiled: April 29, 2012
1/13
Point Queries etc.
Stream: m elements from universe [n] = {1, 2, . . . , n}, e.g.,
x1, x2, . . . , xm = 3, 5, 103, 17, 5, 4, . . . , 1
and let fi be the frequency of i in the stream.
Problems:
Point Query: For i ∈ [n], estimate fi
Range Query: For i, j ∈ [n], estimate fi + fi+1 + . . . + fj
Quantile Query: For φ ∈ [0, 1] find j with f1 + . . . + fj ≈ φm
Heavy Hitter Problem: For φ ∈ [0, 1], find all i with fi ≥ φm.
2/13
Count-Min Sketch
Let H1, . . . , Hd : [n] → [w] be 2-wise independent functions.
As we observe the stream, we maintain d · w counters where
ci,j = number of elements e in the stream with Hi (e) = j
For any x, ci,Hi (x) is an over-estimate for fx and so,
fx ≤ ˜fx = min(c1,H1(x), . . . , cd,Hd (x))
If w = 2/ and d = log2 δ−1
then,
P fx ≤ ˜fx ≤ fx + m ≥ 1 − δ .
3/13
Count-Min Sketch Analysis (a)
Define random variables Z1, . . . , Zk such that ci,Hi (x) = fx + Zi , i.e.,
Zi =
y=x:Hi (y)=Hi (x)
fy
Define Xi,y = 1 if Hi (y) = Hi (x) and 0 otherwise. Then,
Zi =
y=x
fy Xi,y
By 2-wise independence,
E [Zi ] =
y=x
fy E [Xi,y ] =
y=x
fy P [Hi (y) = Hi (x)] ≤ m/w
By Markov inequality,
P [Zi ≥ m] ≤ 1/(w ) = 1/2
4/13
Count-Min Sketch Analysis (b)
Since each Zi is independent
P [Zi ≥ m for all 1 ≤ i ≤ d] ≤ (1/2)d
= δ
Therefore, with probability 1 − δ there exists an j such that
Zj ≤ m
Therefore,
˜fx = min(c1,H1(x), . . . , cj,Hj (x), . . . , cd,Hd (x))
= min(fx + Z1, . . . , fx + Zj , . . . , fx + Zd ) ≤ fx + m
Theorem
We can find an estimate ˜fx for fx that satisfies,
fx ≤ ˜fx ≤ fx + m
with probability 1 − δ while only using O( −1
log δ−1
) memory.
5/13
Outline
Applications: Range Queries etc.
Variants
6/13
Dyadic Intervals
Define lg n partitions of [n]
I0 = {1, 2, 3, 4, 5, 6, 7, 8, . . .}
I1 = {{1, 2}, {3, 4}, {5, 6}, {7, 8}, . . .}
I2 = {{1, 2, 3, 4}, {5, 6, 7, 8}, . . .}
I3 = {{1, 2, 3, 4, 5, 6, 7, 8}, . . .}
...
...
...
Ilg n = {{1, 2, 3, 4, 5, 6, 7, 8, . . . , n}}
Exercise: Any interval [i, j] can be written as the union of ≤ 2 lg n of
the above intervals. E.g., for n = 256,
[48, 107] = [48, 48]∪[49, 64]∪[65, 96]∪[97, 104]∪[105, 106]∪[107, 107]
Call such a decomposition, the canonical decomposition.
7/13
Range Queries and Quantiles
Range Query: For 1 ≤ i ≤ j ≤ n, estimate f[i,j] = fi + fi+1 + . . . + fj
Approximate Median: Find j such that
f1 + . . . + fj ≥ m/2 − m and
f1 + . . . + fj−1 ≤ m/2 + m
Can approximate median via binary search of range queries.
Algorithm:
1. Construct lg n Count-Min sketches, one for each Ii such that for any
I ∈ Ii we have an estimate ˜fI for fI such that
P fI ≤ ˜fI ≤ fI + m ≥ 1 − δ .
2. To estimate [i, j], let I1 ∪ I2 ∪ . . . ∪ Ik be canonical decomposition. Set
˜f[i,j] = ˜fI1 + . . . + ˜fIk
3. Hence, P f[i,j] ≤ ˜f[i,j] ≤ 2 m lg n ≥ 1 − 2δ lg n.
8/13
Heavy Hitters
Heavy Hitter Problem: For 0 < < φ < 1, find a set of elements S
including all i with fi ≥ φm but no elements j with fj ≤ (φ − )m.
Algorithm:
Consider a binary tree whose leaves are [n] and associate internal
nodes with intervals corresponding to descendent leaves.
Compute Count-Min sketches for each Ii .
Going level-by-level from root, mark children I of marked nodes if
˜fI ≥ φm
Return all marked leaves.
Can find heavy-hitters in O(φ−1
log n) steps of post-processing.
9/13
Outline
Applications: Range Queries etc.
Variants
10/13
CR-Precis: Count-Min with deterministic Hash functions
Define t functions Hi (x) = x mod pi where pi is i-th prime number.
Maintain ci,j as before.
Define z1, . . . , zt such that ci,Hi (x) = fx + zi , i.e.,
zi =
y=x:Hi (y)=Hi (x)
fy
Claim: For any y = x, Hj (y) = Hj (x) for at most lg n primes pj .
Therefore i zt = m lg n and hence,
˜fx = min(c1,H1(x), . . . , ct,Ht (x)) = min(fx +z1, . . . , fx +zt) = fx +
m lg n
t
Setting t = (lg n)/ suffices for fx ≤ ˜fx ≤ fx + m.
Requires keeping tpt = O( −2
polylog n) counters.
11/13
Count-Sketch: Count-Min with a Twist
In addition to Hi : [n] → [w], use hash functions ri : [n] → {−1, 1}.
Compute ci,j = x:Hi (x)=j ri (x)fx .
Estimate ˆfx = median(r1(x)c1,H1(x), . . . , rd (x)cd,H1(x))
Analysis:
Lemma: E ri (x)ci,Hi (x) = fx
Lemma: V ri (x)ci,Hi (x) ≤ F2/w
Chebychev: For w = 3/ 2
,
P |fx − ri (x)ci,Hi (x)| ≥
√
F2 ≤
F2
2wF2
= 1/3
Chernoff: With d = O(log δ−1
) hash functions,
P |fx − ˆfx | ≥
√
F2 ≤ 1 − δ
12/13
Count-Sketch Analysis
Fix x and i. Let Xy = I[H(x) = H(y)] and so
r(x)cH(x) =
y
r(x)r(y)fy Xy
Expectation:
E r(x)cH(x) = E

fx +
y=x
r(x)r(y)fy Xy

 = fx
Variance:
V r(x)cH(x) ≤ E (
y
r(x)r(y)fy Xy )2
= E


y
f 2
y X2
y +
y=z
fy fz r(y)r(z)Xy Xz


= F2/w
13/13

More Related Content

What's hot

MLIP - Chapter 4 - Image classification and CNNs
MLIP - Chapter 4 - Image classification and CNNsMLIP - Chapter 4 - Image classification and CNNs
MLIP - Chapter 4 - Image classification and CNNsCharles Deledalle
 
Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Khor SoonHin
 
Code Verification Elections
Code Verification ElectionsCode Verification Elections
Code Verification ElectionsAnthi Orfanou
 
Gradient Descent
Gradient DescentGradient Descent
Gradient DescentJinho Choi
 
Mathematics for machine learning calculus formulasheet
Mathematics for machine learning calculus formulasheetMathematics for machine learning calculus formulasheet
Mathematics for machine learning calculus formulasheetNishant Upadhyay
 
Intermediate Macroeconomics
Intermediate MacroeconomicsIntermediate Macroeconomics
Intermediate MacroeconomicsLaurel Ayuyao
 
Program implementation and testing
Program implementation and testingProgram implementation and testing
Program implementation and testingabukky52
 
8.7 numerical integration
8.7 numerical integration8.7 numerical integration
8.7 numerical integrationdicosmo178
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk TomasSri Ambati
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingRakuten Group, Inc.
 
Numerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleNumerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleAlex_5991
 
딥러닝 교육 자료 #2
딥러닝 교육 자료 #2딥러닝 교육 자료 #2
딥러닝 교육 자료 #2Ashal aka JOKER
 
1 5 graphs of functions
1 5 graphs of functions1 5 graphs of functions
1 5 graphs of functionsAlaina Wright
 
型理論 なんて自分には関係ないと思っているあなたへ
型理論 なんて自分には関係ないと思っているあなたへ型理論 なんて自分には関係ないと思っているあなたへ
型理論 なんて自分には関係ないと思っているあなたへYusuke Matsushita
 
The Uncertain Enterprise
The Uncertain EnterpriseThe Uncertain Enterprise
The Uncertain EnterpriseClarkTony
 

What's hot (20)

Codecomparaison
CodecomparaisonCodecomparaison
Codecomparaison
 
Ada boosting2
Ada boosting2Ada boosting2
Ada boosting2
 
MLIP - Chapter 4 - Image classification and CNNs
MLIP - Chapter 4 - Image classification and CNNsMLIP - Chapter 4 - Image classification and CNNs
MLIP - Chapter 4 - Image classification and CNNs
 
1573 measuring arclength
1573 measuring arclength1573 measuring arclength
1573 measuring arclength
 
Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2
 
Code Verification Elections
Code Verification ElectionsCode Verification Elections
Code Verification Elections
 
Gradient Descent
Gradient DescentGradient Descent
Gradient Descent
 
Mathematics for machine learning calculus formulasheet
Mathematics for machine learning calculus formulasheetMathematics for machine learning calculus formulasheet
Mathematics for machine learning calculus formulasheet
 
Intermediate Macroeconomics
Intermediate MacroeconomicsIntermediate Macroeconomics
Intermediate Macroeconomics
 
Program implementation and testing
Program implementation and testingProgram implementation and testing
Program implementation and testing
 
8.7 numerical integration
8.7 numerical integration8.7 numerical integration
8.7 numerical integration
 
Glm talk Tomas
Glm talk TomasGlm talk Tomas
Glm talk Tomas
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
 
Procedural Art
Procedural ArtProcedural Art
Procedural Art
 
Numerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_RuleNumerical_Methods_Simpson_Rule
Numerical_Methods_Simpson_Rule
 
Calculus
CalculusCalculus
Calculus
 
딥러닝 교육 자료 #2
딥러닝 교육 자료 #2딥러닝 교육 자료 #2
딥러닝 교육 자료 #2
 
1 5 graphs of functions
1 5 graphs of functions1 5 graphs of functions
1 5 graphs of functions
 
型理論 なんて自分には関係ないと思っているあなたへ
型理論 なんて自分には関係ないと思っているあなたへ型理論 なんて自分には関係ないと思っているあなたへ
型理論 なんて自分には関係ないと思っているあなたへ
 
The Uncertain Enterprise
The Uncertain EnterpriseThe Uncertain Enterprise
The Uncertain Enterprise
 

Viewers also liked

Arutyunyan Gayane. Big data and socialmedia
Arutyunyan Gayane. Big data and socialmedia Arutyunyan Gayane. Big data and socialmedia
Arutyunyan Gayane. Big data and socialmedia Atner Yegorov
 
презентация шупашкар
презентация шупашкарпрезентация шупашкар
презентация шупашкарAtner Yegorov
 
сквер дк салют
сквер дк салютсквер дк салют
сквер дк салютAtner Yegorov
 
Госдоклад о состоянии здоровья населения Чувашии 2013
Госдоклад о состоянии здоровья населения Чувашии 2013Госдоклад о состоянии здоровья населения Чувашии 2013
Госдоклад о состоянии здоровья населения Чувашии 2013Atner Yegorov
 
Sara Seager - Lecture3 - MIT
Sara Seager - Lecture3 - MITSara Seager - Lecture3 - MIT
Sara Seager - Lecture3 - MITAtner Yegorov
 
Dmitriev Социология больших данных
Dmitriev Социология больших данныхDmitriev Социология больших данных
Dmitriev Социология больших данныхAtner Yegorov
 

Viewers also liked (8)

Lec 2-2
Lec 2-2Lec 2-2
Lec 2-2
 
Arutyunyan Gayane. Big data and socialmedia
Arutyunyan Gayane. Big data and socialmedia Arutyunyan Gayane. Big data and socialmedia
Arutyunyan Gayane. Big data and socialmedia
 
презентация шупашкар
презентация шупашкарпрезентация шупашкар
презентация шупашкар
 
сквер дк салют
сквер дк салютсквер дк салют
сквер дк салют
 
Госдоклад о состоянии здоровья населения Чувашии 2013
Госдоклад о состоянии здоровья населения Чувашии 2013Госдоклад о состоянии здоровья населения Чувашии 2013
Госдоклад о состоянии здоровья населения Чувашии 2013
 
Sara Seager - Lecture3 - MIT
Sara Seager - Lecture3 - MITSara Seager - Lecture3 - MIT
Sara Seager - Lecture3 - MIT
 
Otchet
OtchetOtchet
Otchet
 
Dmitriev Социология больших данных
Dmitriev Социология больших данныхDmitriev Социология больших данных
Dmitriev Социология больших данных
 

Similar to CMPSCI 711: Count-Min Sketch for Frequency Estimation in Data Streams

Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted ParaproductsVjekoslavKovac1
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsJagadeeswaran Rathinavel
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimizationhelalmohammad2
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusMatthew Leingang
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputszukun
 
Dynamic1
Dynamic1Dynamic1
Dynamic1MyAlome
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector MachinesEdgar Marca
 
SURF 2012 Final Report(1)
SURF 2012 Final Report(1)SURF 2012 Final Report(1)
SURF 2012 Final Report(1)Eric Zhang
 
Efficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsEfficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsAlexander Litvinenko
 
From moments to sparse representations, a geometric, algebraic and algorithmi...
From moments to sparse representations, a geometric, algebraic and algorithmi...From moments to sparse representations, a geometric, algebraic and algorithmi...
From moments to sparse representations, a geometric, algebraic and algorithmi...BernardMourrain
 
1.1_The_Definite_Integral.pdf odjoqwddoio
1.1_The_Definite_Integral.pdf odjoqwddoio1.1_The_Definite_Integral.pdf odjoqwddoio
1.1_The_Definite_Integral.pdf odjoqwddoioNoorYassinHJamel
 

Similar to CMPSCI 711: Count-Min Sketch for Frequency Estimation in Data Streams (20)

Multilinear Twisted Paraproducts
Multilinear Twisted ParaproductsMultilinear Twisted Paraproducts
Multilinear Twisted Paraproducts
 
Imc2017 day2-solutions
Imc2017 day2-solutionsImc2017 day2-solutions
Imc2017 day2-solutions
 
SIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithmsSIAM - Minisymposium on Guaranteed numerical algorithms
SIAM - Minisymposium on Guaranteed numerical algorithms
 
Derivative free optimization
Derivative free optimizationDerivative free optimization
Derivative free optimization
 
Evaluating definite integrals
Evaluating definite integralsEvaluating definite integrals
Evaluating definite integrals
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
Lesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of CalculusLesson 28: The Fundamental Theorem of Calculus
Lesson 28: The Fundamental Theorem of Calculus
 
gbapplfinal.pdf
gbapplfinal.pdfgbapplfinal.pdf
gbapplfinal.pdf
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Teknik Simulasi
Teknik SimulasiTeknik Simulasi
Teknik Simulasi
 
Lecture5 kernel svm
Lecture5 kernel svmLecture5 kernel svm
Lecture5 kernel svm
 
Icros2021 handout
Icros2021 handoutIcros2021 handout
Icros2021 handout
 
Machine learning of structured outputs
Machine learning of structured outputsMachine learning of structured outputs
Machine learning of structured outputs
 
Dynamic1
Dynamic1Dynamic1
Dynamic1
 
Kernels and Support Vector Machines
Kernels and Support Vector  MachinesKernels and Support Vector  Machines
Kernels and Support Vector Machines
 
SURF 2012 Final Report(1)
SURF 2012 Final Report(1)SURF 2012 Final Report(1)
SURF 2012 Final Report(1)
 
Efficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formatsEfficient Analysis of high-dimensional data in tensor formats
Efficient Analysis of high-dimensional data in tensor formats
 
Lec 4-slides
Lec 4-slidesLec 4-slides
Lec 4-slides
 
From moments to sparse representations, a geometric, algebraic and algorithmi...
From moments to sparse representations, a geometric, algebraic and algorithmi...From moments to sparse representations, a geometric, algebraic and algorithmi...
From moments to sparse representations, a geometric, algebraic and algorithmi...
 
1.1_The_Definite_Integral.pdf odjoqwddoio
1.1_The_Definite_Integral.pdf odjoqwddoio1.1_The_Definite_Integral.pdf odjoqwddoio
1.1_The_Definite_Integral.pdf odjoqwddoio
 

More from Atner Yegorov

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Atner Yegorov
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Atner Yegorov
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Atner Yegorov
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Atner Yegorov
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015Atner Yegorov
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг ЧувашииAtner Yegorov
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыAtner Yegorov
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Atner Yegorov
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанAtner Yegorov
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, ТатарстанAtner Yegorov
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистикиAtner Yegorov
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИAtner Yegorov
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Atner Yegorov
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Atner Yegorov
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициативаAtner Yegorov
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ Atner Yegorov
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизацииAtner Yegorov
 
Город Innopolis
Город InnopolisГород Innopolis
Город InnopolisAtner Yegorov
 
презентация1
презентация1презентация1
презентация1Atner Yegorov
 

More from Atner Yegorov (20)

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг Чувашии
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум Чебоксары
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, Татарстан
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, Татарстан
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистики
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИ
 
Relap
RelapRelap
Relap
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициатива
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизации
 
Город Innopolis
Город InnopolisГород Innopolis
Город Innopolis
 
презентация1
презентация1презентация1
презентация1
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

CMPSCI 711: Count-Min Sketch for Frequency Estimation in Data Streams

  • 1. CMPSCI 711: More Advanced Algorithms Section 1-3: Count-Min Sketch and Applications Andrew McGregor Last Compiled: April 29, 2012 1/13
  • 2. Point Queries etc. Stream: m elements from universe [n] = {1, 2, . . . , n}, e.g., x1, x2, . . . , xm = 3, 5, 103, 17, 5, 4, . . . , 1 and let fi be the frequency of i in the stream. Problems: Point Query: For i ∈ [n], estimate fi Range Query: For i, j ∈ [n], estimate fi + fi+1 + . . . + fj Quantile Query: For φ ∈ [0, 1] find j with f1 + . . . + fj ≈ φm Heavy Hitter Problem: For φ ∈ [0, 1], find all i with fi ≥ φm. 2/13
  • 3. Count-Min Sketch Let H1, . . . , Hd : [n] → [w] be 2-wise independent functions. As we observe the stream, we maintain d · w counters where ci,j = number of elements e in the stream with Hi (e) = j For any x, ci,Hi (x) is an over-estimate for fx and so, fx ≤ ˜fx = min(c1,H1(x), . . . , cd,Hd (x)) If w = 2/ and d = log2 δ−1 then, P fx ≤ ˜fx ≤ fx + m ≥ 1 − δ . 3/13
  • 4. Count-Min Sketch Analysis (a) Define random variables Z1, . . . , Zk such that ci,Hi (x) = fx + Zi , i.e., Zi = y=x:Hi (y)=Hi (x) fy Define Xi,y = 1 if Hi (y) = Hi (x) and 0 otherwise. Then, Zi = y=x fy Xi,y By 2-wise independence, E [Zi ] = y=x fy E [Xi,y ] = y=x fy P [Hi (y) = Hi (x)] ≤ m/w By Markov inequality, P [Zi ≥ m] ≤ 1/(w ) = 1/2 4/13
  • 5. Count-Min Sketch Analysis (b) Since each Zi is independent P [Zi ≥ m for all 1 ≤ i ≤ d] ≤ (1/2)d = δ Therefore, with probability 1 − δ there exists an j such that Zj ≤ m Therefore, ˜fx = min(c1,H1(x), . . . , cj,Hj (x), . . . , cd,Hd (x)) = min(fx + Z1, . . . , fx + Zj , . . . , fx + Zd ) ≤ fx + m Theorem We can find an estimate ˜fx for fx that satisfies, fx ≤ ˜fx ≤ fx + m with probability 1 − δ while only using O( −1 log δ−1 ) memory. 5/13
  • 7. Dyadic Intervals Define lg n partitions of [n] I0 = {1, 2, 3, 4, 5, 6, 7, 8, . . .} I1 = {{1, 2}, {3, 4}, {5, 6}, {7, 8}, . . .} I2 = {{1, 2, 3, 4}, {5, 6, 7, 8}, . . .} I3 = {{1, 2, 3, 4, 5, 6, 7, 8}, . . .} ... ... ... Ilg n = {{1, 2, 3, 4, 5, 6, 7, 8, . . . , n}} Exercise: Any interval [i, j] can be written as the union of ≤ 2 lg n of the above intervals. E.g., for n = 256, [48, 107] = [48, 48]∪[49, 64]∪[65, 96]∪[97, 104]∪[105, 106]∪[107, 107] Call such a decomposition, the canonical decomposition. 7/13
  • 8. Range Queries and Quantiles Range Query: For 1 ≤ i ≤ j ≤ n, estimate f[i,j] = fi + fi+1 + . . . + fj Approximate Median: Find j such that f1 + . . . + fj ≥ m/2 − m and f1 + . . . + fj−1 ≤ m/2 + m Can approximate median via binary search of range queries. Algorithm: 1. Construct lg n Count-Min sketches, one for each Ii such that for any I ∈ Ii we have an estimate ˜fI for fI such that P fI ≤ ˜fI ≤ fI + m ≥ 1 − δ . 2. To estimate [i, j], let I1 ∪ I2 ∪ . . . ∪ Ik be canonical decomposition. Set ˜f[i,j] = ˜fI1 + . . . + ˜fIk 3. Hence, P f[i,j] ≤ ˜f[i,j] ≤ 2 m lg n ≥ 1 − 2δ lg n. 8/13
  • 9. Heavy Hitters Heavy Hitter Problem: For 0 < < φ < 1, find a set of elements S including all i with fi ≥ φm but no elements j with fj ≤ (φ − )m. Algorithm: Consider a binary tree whose leaves are [n] and associate internal nodes with intervals corresponding to descendent leaves. Compute Count-Min sketches for each Ii . Going level-by-level from root, mark children I of marked nodes if ˜fI ≥ φm Return all marked leaves. Can find heavy-hitters in O(φ−1 log n) steps of post-processing. 9/13
  • 10. Outline Applications: Range Queries etc. Variants 10/13
  • 11. CR-Precis: Count-Min with deterministic Hash functions Define t functions Hi (x) = x mod pi where pi is i-th prime number. Maintain ci,j as before. Define z1, . . . , zt such that ci,Hi (x) = fx + zi , i.e., zi = y=x:Hi (y)=Hi (x) fy Claim: For any y = x, Hj (y) = Hj (x) for at most lg n primes pj . Therefore i zt = m lg n and hence, ˜fx = min(c1,H1(x), . . . , ct,Ht (x)) = min(fx +z1, . . . , fx +zt) = fx + m lg n t Setting t = (lg n)/ suffices for fx ≤ ˜fx ≤ fx + m. Requires keeping tpt = O( −2 polylog n) counters. 11/13
  • 12. Count-Sketch: Count-Min with a Twist In addition to Hi : [n] → [w], use hash functions ri : [n] → {−1, 1}. Compute ci,j = x:Hi (x)=j ri (x)fx . Estimate ˆfx = median(r1(x)c1,H1(x), . . . , rd (x)cd,H1(x)) Analysis: Lemma: E ri (x)ci,Hi (x) = fx Lemma: V ri (x)ci,Hi (x) ≤ F2/w Chebychev: For w = 3/ 2 , P |fx − ri (x)ci,Hi (x)| ≥ √ F2 ≤ F2 2wF2 = 1/3 Chernoff: With d = O(log δ−1 ) hash functions, P |fx − ˆfx | ≥ √ F2 ≤ 1 − δ 12/13
  • 13. Count-Sketch Analysis Fix x and i. Let Xy = I[H(x) = H(y)] and so r(x)cH(x) = y r(x)r(y)fy Xy Expectation: E r(x)cH(x) = E  fx + y=x r(x)r(y)fy Xy   = fx Variance: V r(x)cH(x) ≤ E ( y r(x)r(y)fy Xy )2 = E   y f 2 y X2 y + y=z fy fz r(y)r(z)Xy Xz   = F2/w 13/13