SlideShare a Scribd company logo
1 of 3
Download to read offline
6.897: CSCI8980 Algorithmic Techniques for Big Data September 19, 2013
Lecture 4
Dr. Barna Saha Scribe: Barna Saha
Overview
In the previous lecture, we saw that using count-min sketch we can solve a variety of problems
related to frequency estimates such as point query, range query, heavy-hitters etc. where the error
in the estimate is in terms of the l1 norm of the stream. Can we obtain frequency estimates where
error will be in terms of l2 norm of the stream ? We will see one such algorithm Count-Sketch as
part of an exercise. Today, we consider the following problem given a stream S of size m where
elements are coming from domain [1, n] and have unknown frequencies f1, f2, ..., fn, what is the
second frequency moment F2 of the stream ? Here F2 is defined as F2 = n
i=1 f2
i . We give an
elegant solution based on sketches from [1] that requires logarithmic space and update time.
1 Estimating F2
Let H = {h : [n] → {+1, −1}} be a family of four-wise independent hash functions (we have seen
previously how to construct such families). We initialize t counters Z1, Z2, ...Zt to 0 and maintain
Zj = Zj + ahj(i) on arrival of (i, a) for j = 1, ..., t = c
2 , where c is some constant to be fixed later.
We return Y = 1
t
t
j=1 Z2
j as the estimate of F2 of the stream.
We first show that Y is an unbiased estimator of F2, that is E Y = F2(S). Then we compute
Var Y and apply Chebyshev inequality to bound the deviation of Y from its expectation that is
F2.
Lemma 1. E Y = F2
Proof.
E Y = E
1
t
t
j=1
Z2
j =
1
t
t
i=1
E Z2
j .
Now, Zj = i hj(i)fi. Hence
E Z2
j = E (
i
hj(i)fi)2
=
i
E (hj(i))2
f2
i + 2
i<k
E hj(i) E hj(k) fifk
by linearity of expectation and independence of hj(i) and hj(k)
=
i
f2
i since E hj(i) = 0 for all i and E (hj(i))2 = 1 for all i
= F2
1
Therefore,
E Y =
1
t
t
i=1
E Z2
j = F2.
Lemma 2. Var Y ≤
4F2
2
t
Proof.
Var Y = Var
1
t
t
j=1
Z2
j =
1
t2
t
i=1
Var Z2
j .
In the above we obtained the second inequality by noting that Z2
j random variables are all pair-wise
independent. We now calculate Var Z2
j which is E Z4
j − (E Z2
j )2. Note that
E Z4
j = E (
i
hj(i)fi)4
=
a≤b≤c≤d
fafbfcfdE fafbfcfd .
Now note that if either of the following conditions hold a < b < c < d or exactly three among
a, b, c, d are equal, then those terms contribute 0. Hence
E Z4
j = E (
i
hj(i)fi)4
=
i
E (hj(i))4
f4
i +
4
2
i<k
E (hj(i))2
E (hj(k))2
f2
i f2
k
=
i
E (hj(i))4
f4
i + 6
i<k
f2
i f2
k
On the otherhand,
(E Z2
j )2
=
i
E (hj(i))4
f4
i + 2
i<k
f2
i f2
k
Hence
Var Z2
j = 4
i<k
f2
i f2
k ≤ 4 max
i
f2
i
i
f2
i ≤ 4F2
2
Therefore,
Var Y =
1
t2
t
i=1
Var Z2
j ≤
4F2
t
.
Lemma 3. Pr |Y − E Y | > F2 ≤ 1
3 where t ≥ 12
2 .
Proof. By Chebyshev Inequality
Pr |Y − E Y | > F2 ≤
Var Y
2F2
2
≤
4F2
2
t 2F2
2
≤
1
3
So, we have an estimate Y for F2 which guarantees an absolute error at most F2 with probability
at least 2
3 ? Can we boost this probability to (1 − δ) for any δ > 0 ? To do so, we apply a generic
technique, boosting by median.
2
1.1 Boosting by Median
We keep s = O(log 1δ) independent estimates Y1, Y2, ..., Ys. We then arrange these values in non-
increasing order and return the s/2 -th estimate, that is the median of Y1, Y2, ..., Ys. Let without
loss of generality assume, Y1 ≤ Y2 ≤ ... ≤ Ys. And for simplicity assume, s is even. First consider
the upper tail (the lower tail is similar). If Ys/2 > (1 + )F2 then all of Ys/2+1, Ys/2+2, ..., Ys must
be higher than F2(1 + ).
Define an indicator random variable Xi, which is 1 if Yi > (1 + )F2 and 0 otherwise. From Lemma
3 Pr Xi = 1 ≤ 1
3. Hence if we denote by X, the number of estimates that return value more than
(1 + )F2, then X = s
i=1 Xi and E X ≤ s
3.
We now apply the Chernoff’s bound to obtain
Pr Ys/2 > (1 + )F2 = Pr X >
s
2
= Pr X > E X (1 +
1
3
) ≤ e− s
3
1
9
1
3
Setting s = C ln 1
δ where C is a large enough constant, the above probability becomes less than
δ/2.
Similarly, we have
Pr Ys/2 < (1 − )F2 <
δ
2
.
Therefore, by union bound
Pr |Ys/2 − F2| > F2 < δ.
Finally, we have the following theorem
Theorem 4. There is a randomized algorithm for estimating F2 within error (1± ) with probability
at least (1 − δ) that takes space O( 1
2 log 1
δ ) and update time O( 1
2 log 1
δ ).
References
[1] Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Fre-
quency Moments. J. Comput. Syst. Sci. 58(1): 137-147 (1999)
3

More Related Content

What's hot

formulation of first order linear and nonlinear 2nd order differential equation
formulation of first order linear and nonlinear 2nd order differential equationformulation of first order linear and nonlinear 2nd order differential equation
formulation of first order linear and nonlinear 2nd order differential equationMahaswari Jogia
 
Normal ditribution
Normal ditributionNormal ditribution
Normal ditributionsialkot123
 
On the Secrecy of RSA Private Keys
On the Secrecy of RSA Private KeysOn the Secrecy of RSA Private Keys
On the Secrecy of RSA Private KeysDharmalingam Ganesan
 
Higher Differential Equation
Higher Differential Equation Higher Differential Equation
Higher Differential Equation Abdul Hannan
 
Linear differential equation
Linear differential equationLinear differential equation
Linear differential equationPratik Sudra
 
Automobile 3rd sem aem ppt.2016
Automobile 3rd sem aem ppt.2016Automobile 3rd sem aem ppt.2016
Automobile 3rd sem aem ppt.2016kalpeshvaghdodiya
 
Differential Equations Lecture: Non-Homogeneous Linear Differential Equations
Differential Equations Lecture: Non-Homogeneous Linear Differential EquationsDifferential Equations Lecture: Non-Homogeneous Linear Differential Equations
Differential Equations Lecture: Non-Homogeneous Linear Differential Equationsbullardcr
 
Implicit differentiation
Implicit differentiationImplicit differentiation
Implicit differentiationSporsho
 
Independence Complexes
Independence ComplexesIndependence Complexes
Independence ComplexesRickard Fors
 
Security of RSA and Integer Factorization
Security of RSA and Integer FactorizationSecurity of RSA and Integer Factorization
Security of RSA and Integer FactorizationDharmalingam Ganesan
 

What's hot (20)

Ch03 1
Ch03 1Ch03 1
Ch03 1
 
formulation of first order linear and nonlinear 2nd order differential equation
formulation of first order linear and nonlinear 2nd order differential equationformulation of first order linear and nonlinear 2nd order differential equation
formulation of first order linear and nonlinear 2nd order differential equation
 
1508.07756v1
1508.07756v11508.07756v1
1508.07756v1
 
Normal ditribution
Normal ditributionNormal ditribution
Normal ditribution
 
First order ordinary differential equations and applications
First order ordinary differential equations and applicationsFirst order ordinary differential equations and applications
First order ordinary differential equations and applications
 
On the Secrecy of RSA Private Keys
On the Secrecy of RSA Private KeysOn the Secrecy of RSA Private Keys
On the Secrecy of RSA Private Keys
 
Higher Differential Equation
Higher Differential Equation Higher Differential Equation
Higher Differential Equation
 
Ch04 2
Ch04 2Ch04 2
Ch04 2
 
Linear differential equation
Linear differential equationLinear differential equation
Linear differential equation
 
Ch03 5
Ch03 5Ch03 5
Ch03 5
 
Automobile 3rd sem aem ppt.2016
Automobile 3rd sem aem ppt.2016Automobile 3rd sem aem ppt.2016
Automobile 3rd sem aem ppt.2016
 
Sect4 4
Sect4 4Sect4 4
Sect4 4
 
Differential Equations Lecture: Non-Homogeneous Linear Differential Equations
Differential Equations Lecture: Non-Homogeneous Linear Differential EquationsDifferential Equations Lecture: Non-Homogeneous Linear Differential Equations
Differential Equations Lecture: Non-Homogeneous Linear Differential Equations
 
Implicit differentiation
Implicit differentiationImplicit differentiation
Implicit differentiation
 
160280102042 c3 aem
160280102042 c3 aem160280102042 c3 aem
160280102042 c3 aem
 
Active Attacks on DH Key Exchange
Active Attacks on DH Key ExchangeActive Attacks on DH Key Exchange
Active Attacks on DH Key Exchange
 
RSA without Integrity Checks
RSA without Integrity ChecksRSA without Integrity Checks
RSA without Integrity Checks
 
Independence Complexes
Independence ComplexesIndependence Complexes
Independence Complexes
 
Analysis of Shared RSA Modulus
Analysis of Shared RSA ModulusAnalysis of Shared RSA Modulus
Analysis of Shared RSA Modulus
 
Security of RSA and Integer Factorization
Security of RSA and Integer FactorizationSecurity of RSA and Integer Factorization
Security of RSA and Integer Factorization
 

Viewers also liked

бойко. 21ноября круглый стол
бойко. 21ноября круглый столбойко. 21ноября круглый стол
бойко. 21ноября круглый столAtner Yegorov
 
социокультурный портрет Чувашской Республики 2013
социокультурный портрет  Чувашской Республики 2013социокультурный портрет  Чувашской Республики 2013
социокультурный портрет Чувашской Республики 2013Atner Yegorov
 
о реализации РЦП "БДД Чувашии 2006-2012"
о реализации РЦП "БДД Чувашии 2006-2012"о реализации РЦП "БДД Чувашии 2006-2012"
о реализации РЦП "БДД Чувашии 2006-2012"Atner Yegorov
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAtner Yegorov
 
кку тракторные заводы
кку тракторные заводыкку тракторные заводы
кку тракторные заводыAtner Yegorov
 
презентация инвест предложения_чр_январь_2012_табаков
презентация инвест предложения_чр_январь_2012_табаковпрезентация инвест предложения_чр_январь_2012_табаков
презентация инвест предложения_чр_январь_2012_табаковAtner Yegorov
 
Prezentazia kostroma
Prezentazia kostromaPrezentazia kostroma
Prezentazia kostromaAtner Yegorov
 
птицефабрика
птицефабрикаптицефабрика
птицефабрикаAtner Yegorov
 
Efimov goroda upravlenie_razvitiem_gorodov_2010
Efimov goroda upravlenie_razvitiem_gorodov_2010Efimov goroda upravlenie_razvitiem_gorodov_2010
Efimov goroda upravlenie_razvitiem_gorodov_2010Atner Yegorov
 
Prezentaciya rab glave_17_fevralya
Prezentaciya rab glave_17_fevralyaPrezentaciya rab glave_17_fevralya
Prezentaciya rab glave_17_fevralyaAtner Yegorov
 
презентация топинамбур
презентация топинамбурпрезентация топинамбур
презентация топинамбурAtner Yegorov
 

Viewers also liked (19)

бойко. 21ноября круглый стол
бойко. 21ноября круглый столбойко. 21ноября круглый стол
бойко. 21ноября круглый стол
 
социокультурный портрет Чувашской Республики 2013
социокультурный портрет  Чувашской Республики 2013социокультурный портрет  Чувашской Республики 2013
социокультурный портрет Чувашской Республики 2013
 
о реализации РЦП "БДД Чувашии 2006-2012"
о реализации РЦП "БДД Чувашии 2006-2012"о реализации РЦП "БДД Чувашии 2006-2012"
о реализации РЦП "БДД Чувашии 2006-2012"
 
Lecture5
Lecture5Lecture5
Lecture5
 
чебоксары
чебоксарычебоксары
чебоксары
 
Lec12
Lec12Lec12
Lec12
 
Algorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysisAlgorithmic techniques-for-big-data-analysis
Algorithmic techniques-for-big-data-analysis
 
кку тракторные заводы
кку тракторные заводыкку тракторные заводы
кку тракторные заводы
 
презентация инвест предложения_чр_январь_2012_табаков
презентация инвест предложения_чр_январь_2012_табаковпрезентация инвест предложения_чр_январь_2012_табаков
презентация инвест предложения_чр_январь_2012_табаков
 
ВСМ-2
ВСМ-2ВСМ-2
ВСМ-2
 
Prezentazia ples
Prezentazia plesPrezentazia ples
Prezentazia ples
 
Prezentazia kostroma
Prezentazia kostromaPrezentazia kostroma
Prezentazia kostroma
 
зао чэаз
зао чэаззао чэаз
зао чэаз
 
птицефабрика
птицефабрикаптицефабрика
птицефабрика
 
Efimov goroda upravlenie_razvitiem_gorodov_2010
Efimov goroda upravlenie_razvitiem_gorodov_2010Efimov goroda upravlenie_razvitiem_gorodov_2010
Efimov goroda upravlenie_razvitiem_gorodov_2010
 
It gl-gor2013
It gl-gor2013It gl-gor2013
It gl-gor2013
 
It 2012 3
It 2012 3It 2012 3
It 2012 3
 
Prezentaciya rab glave_17_fevralya
Prezentaciya rab glave_17_fevralyaPrezentaciya rab glave_17_fevralya
Prezentaciya rab glave_17_fevralya
 
презентация топинамбур
презентация топинамбурпрезентация топинамбур
презентация топинамбур
 

Similar to Estimating F2 Frequency Moment with Log Space and Time

Similar to Estimating F2 Frequency Moment with Log Space and Time (20)

Problems and solutions_4
Problems and solutions_4Problems and solutions_4
Problems and solutions_4
 
Solution set 3
Solution set 3Solution set 3
Solution set 3
 
Computational techniques
Computational techniquesComputational techniques
Computational techniques
 
separation_of_var_examples.pdf
separation_of_var_examples.pdfseparation_of_var_examples.pdf
separation_of_var_examples.pdf
 
Cs jog
Cs jogCs jog
Cs jog
 
presentazione
presentazionepresentazione
presentazione
 
Lec 4-slides
Lec 4-slidesLec 4-slides
Lec 4-slides
 
μ-RANGE, λ-RANGE OF OPERATORS ON A HILBERT SPACE
μ-RANGE, λ-RANGE OF OPERATORS ON A HILBERT SPACEμ-RANGE, λ-RANGE OF OPERATORS ON A HILBERT SPACE
μ-RANGE, λ-RANGE OF OPERATORS ON A HILBERT SPACE
 
Summer Proj.
Summer Proj.Summer Proj.
Summer Proj.
 
Lecture notes
Lecture notes Lecture notes
Lecture notes
 
Section2 stochastic
Section2 stochasticSection2 stochastic
Section2 stochastic
 
2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf2nd-year-Math-full-Book-PB.pdf
2nd-year-Math-full-Book-PB.pdf
 
2018-G12-Math-E.pdf
2018-G12-Math-E.pdf2018-G12-Math-E.pdf
2018-G12-Math-E.pdf
 
Imc2016 day2-solutions
Imc2016 day2-solutionsImc2016 day2-solutions
Imc2016 day2-solutions
 
Jacobians new
Jacobians newJacobians new
Jacobians new
 
Maths 12
Maths 12Maths 12
Maths 12
 
QFP k=2 paper write-up
QFP k=2 paper write-upQFP k=2 paper write-up
QFP k=2 paper write-up
 
Ichimura 1993: Semiparametric Least Squares (non-technical)
Ichimura 1993: Semiparametric Least Squares (non-technical)Ichimura 1993: Semiparametric Least Squares (non-technical)
Ichimura 1993: Semiparametric Least Squares (non-technical)
 
final19
final19final19
final19
 
LDP.pdf
LDP.pdfLDP.pdf
LDP.pdf
 

More from Atner Yegorov

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Atner Yegorov
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Atner Yegorov
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Atner Yegorov
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Atner Yegorov
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015Atner Yegorov
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг ЧувашииAtner Yegorov
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыAtner Yegorov
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Atner Yegorov
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанAtner Yegorov
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, ТатарстанAtner Yegorov
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистикиAtner Yegorov
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИAtner Yegorov
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Atner Yegorov
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Atner Yegorov
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициативаAtner Yegorov
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ Atner Yegorov
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизацииAtner Yegorov
 
Город Innopolis
Город InnopolisГород Innopolis
Город InnopolisAtner Yegorov
 
презентация1
презентация1презентация1
презентация1Atner Yegorov
 

More from Atner Yegorov (20)

Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015Здравоохранение Чувашии 2015
Здравоохранение Чувашии 2015
 
Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015Чувашия. Итоги за 5 лет. 2015
Чувашия. Итоги за 5 лет. 2015
 
Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015Инвестиционная политика Чувашской Республики 2015
Инвестиционная политика Чувашской Республики 2015
 
Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015Господдержка малого и среднего бизнеса в Чувашии 2015
Господдержка малого и среднего бизнеса в Чувашии 2015
 
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
ИННОВАЦИИ В ПРОМЫШЛЕННОСТИ ЧУВАШИИ. 2015
 
Брендинг Чувашии
Брендинг ЧувашииБрендинг Чувашии
Брендинг Чувашии
 
VIII Экономический форум Чебоксары
VIII Экономический форум ЧебоксарыVIII Экономический форум Чебоксары
VIII Экономический форум Чебоксары
 
Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015Рейтин инвестклимата в регионах 2015
Рейтин инвестклимата в регионах 2015
 
Аттракционы тематического парка, Татарстан
Аттракционы тематического парка, ТатарстанАттракционы тематического парка, Татарстан
Аттракционы тематического парка, Татарстан
 
Тематический парк, Татарстан
Тематический парк, ТатарстанТематический парк, Татарстан
Тематический парк, Татарстан
 
Будущее журналистики
Будущее журналистикиБудущее журналистики
Будущее журналистики
 
Будущее нишевых СМИ
Будущее нишевых СМИБудущее нишевых СМИ
Будущее нишевых СМИ
 
Relap
RelapRelap
Relap
 
Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах Будущее онлайн СМИ в регионах
Будущее онлайн СМИ в регионах
 
Война форматов. Как люди читают новости
Война форматов. Как люди читают новости Война форматов. Как люди читают новости
Война форматов. Как люди читают новости
 
Национальная технологическая инициатива
Национальная технологическая инициативаНациональная технологическая инициатива
Национальная технологическая инициатива
 
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ  ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
О РАЗРАБОТКЕ И РЕАЛИЗАЦИИ НАЦИОНАЛЬНОЙ ТЕХНОЛОГИЧЕСКОЙ ИНИЦИАТИВЫ
 
Участники заседания по модернизации
Участники заседания по модернизацииУчастники заседания по модернизации
Участники заседания по модернизации
 
Город Innopolis
Город InnopolisГород Innopolis
Город Innopolis
 
презентация1
презентация1презентация1
презентация1
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Estimating F2 Frequency Moment with Log Space and Time

  • 1. 6.897: CSCI8980 Algorithmic Techniques for Big Data September 19, 2013 Lecture 4 Dr. Barna Saha Scribe: Barna Saha Overview In the previous lecture, we saw that using count-min sketch we can solve a variety of problems related to frequency estimates such as point query, range query, heavy-hitters etc. where the error in the estimate is in terms of the l1 norm of the stream. Can we obtain frequency estimates where error will be in terms of l2 norm of the stream ? We will see one such algorithm Count-Sketch as part of an exercise. Today, we consider the following problem given a stream S of size m where elements are coming from domain [1, n] and have unknown frequencies f1, f2, ..., fn, what is the second frequency moment F2 of the stream ? Here F2 is defined as F2 = n i=1 f2 i . We give an elegant solution based on sketches from [1] that requires logarithmic space and update time. 1 Estimating F2 Let H = {h : [n] → {+1, −1}} be a family of four-wise independent hash functions (we have seen previously how to construct such families). We initialize t counters Z1, Z2, ...Zt to 0 and maintain Zj = Zj + ahj(i) on arrival of (i, a) for j = 1, ..., t = c 2 , where c is some constant to be fixed later. We return Y = 1 t t j=1 Z2 j as the estimate of F2 of the stream. We first show that Y is an unbiased estimator of F2, that is E Y = F2(S). Then we compute Var Y and apply Chebyshev inequality to bound the deviation of Y from its expectation that is F2. Lemma 1. E Y = F2 Proof. E Y = E 1 t t j=1 Z2 j = 1 t t i=1 E Z2 j . Now, Zj = i hj(i)fi. Hence E Z2 j = E ( i hj(i)fi)2 = i E (hj(i))2 f2 i + 2 i<k E hj(i) E hj(k) fifk by linearity of expectation and independence of hj(i) and hj(k) = i f2 i since E hj(i) = 0 for all i and E (hj(i))2 = 1 for all i = F2 1
  • 2. Therefore, E Y = 1 t t i=1 E Z2 j = F2. Lemma 2. Var Y ≤ 4F2 2 t Proof. Var Y = Var 1 t t j=1 Z2 j = 1 t2 t i=1 Var Z2 j . In the above we obtained the second inequality by noting that Z2 j random variables are all pair-wise independent. We now calculate Var Z2 j which is E Z4 j − (E Z2 j )2. Note that E Z4 j = E ( i hj(i)fi)4 = a≤b≤c≤d fafbfcfdE fafbfcfd . Now note that if either of the following conditions hold a < b < c < d or exactly three among a, b, c, d are equal, then those terms contribute 0. Hence E Z4 j = E ( i hj(i)fi)4 = i E (hj(i))4 f4 i + 4 2 i<k E (hj(i))2 E (hj(k))2 f2 i f2 k = i E (hj(i))4 f4 i + 6 i<k f2 i f2 k On the otherhand, (E Z2 j )2 = i E (hj(i))4 f4 i + 2 i<k f2 i f2 k Hence Var Z2 j = 4 i<k f2 i f2 k ≤ 4 max i f2 i i f2 i ≤ 4F2 2 Therefore, Var Y = 1 t2 t i=1 Var Z2 j ≤ 4F2 t . Lemma 3. Pr |Y − E Y | > F2 ≤ 1 3 where t ≥ 12 2 . Proof. By Chebyshev Inequality Pr |Y − E Y | > F2 ≤ Var Y 2F2 2 ≤ 4F2 2 t 2F2 2 ≤ 1 3 So, we have an estimate Y for F2 which guarantees an absolute error at most F2 with probability at least 2 3 ? Can we boost this probability to (1 − δ) for any δ > 0 ? To do so, we apply a generic technique, boosting by median. 2
  • 3. 1.1 Boosting by Median We keep s = O(log 1δ) independent estimates Y1, Y2, ..., Ys. We then arrange these values in non- increasing order and return the s/2 -th estimate, that is the median of Y1, Y2, ..., Ys. Let without loss of generality assume, Y1 ≤ Y2 ≤ ... ≤ Ys. And for simplicity assume, s is even. First consider the upper tail (the lower tail is similar). If Ys/2 > (1 + )F2 then all of Ys/2+1, Ys/2+2, ..., Ys must be higher than F2(1 + ). Define an indicator random variable Xi, which is 1 if Yi > (1 + )F2 and 0 otherwise. From Lemma 3 Pr Xi = 1 ≤ 1 3. Hence if we denote by X, the number of estimates that return value more than (1 + )F2, then X = s i=1 Xi and E X ≤ s 3. We now apply the Chernoff’s bound to obtain Pr Ys/2 > (1 + )F2 = Pr X > s 2 = Pr X > E X (1 + 1 3 ) ≤ e− s 3 1 9 1 3 Setting s = C ln 1 δ where C is a large enough constant, the above probability becomes less than δ/2. Similarly, we have Pr Ys/2 < (1 − )F2 < δ 2 . Therefore, by union bound Pr |Ys/2 − F2| > F2 < δ. Finally, we have the following theorem Theorem 4. There is a randomized algorithm for estimating F2 within error (1± ) with probability at least (1 − δ) that takes space O( 1 2 log 1 δ ) and update time O( 1 2 log 1 δ ). References [1] Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Fre- quency Moments. J. Comput. Syst. Sci. 58(1): 137-147 (1999) 3