06775257

948 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 9, NO. 6, JUNE 2014
A New Sketch Method for Measuring Host
Connection Degree Distribution
Pinghui Wang, Xiaohong Guan, Fellow, IEEE, Junzhou Zhao, Jing Tao, and Tao Qin, Student Member, IEEE
Abstract—The host connection degree distribution (HCDD) is
an important metric for network security monitoring. However,
it is difficult to accurately obtain the HCDD in real time for
high-speed links with a massive amount of traffic data. In this
paper, we propose a new sketch method to build a probabilistic
traffic summary of a host’s flows using a uniform Flajolet–Martin
sketch combined with a small bitmap. To study its performance
in comparison with previous sampling and sketch methods, we
present a general model that encompasses all these methods.
With this model, we compute the Cramér–Rao lower bounds
and the variances of HCDD estimations. The theoretic analysis
and numerical experimental results show that our sketch method
is six times more accurate than state-of-the-art methods with the
same memory usage.
Index Terms—Network monitoring, traffic analysis.
I. INTRODUCTION
ALOT of research efforts have been made on measuring
traffic volume including heavy hitter detection [1]–[5],
heavy change detection [6]–[8], flow size distribution estima-
tion [9]–[15], and traffic entropy estimation [16], [17]. Besides
these statistics of traffic volumes, the host connection degree
is also an important indicator for network monitoring, where
a host’s connection degree is defined as the number of distinct
hosts that the host connects to. For example, super hosts
with a large number of connections are usually caused by
network anomalies [18]–[21]. Moreover, Nychis et al. [22]
observe that the host connection degree distribution (HCDD)
is not consistent with other distributions, such as the host
traffic volume distribution, and more likely to reflect network
anomalies. In fact the HCDD has been applied to applications
Manuscript received February 8, 2013; revised December 8, 2013 and
February 27, 2014; accepted March 6, 2014. Date of publication March
19, 2014; date of current version April 30, 2014. This work was supported
in part by Xi’an Jiaotong University, in part by the National Natural Sci-
ence Foundation of China under Grant 61103240, Grant 61103241, Grant
61221063, and Grant 91118005, and in part by the Application Foundation
Research Program of SuZhou under Grant SYG201311. The associate editor
coordinating the review of this manuscript and approving it for publication was
Prof. Wanlei Zhou.
P. Wang is with the Noah’s Ark Laboratory, Huawei Technologies,
Hong Kong (e-mail: wang.pinghui@huawei.com).
X. Guan is with the MOE Key Laboratory for Intelligent Networks and
Network Security, Xi’an Jiaotong University, Xi’an 710049, China, and also
with the Center for Intelligent and Networked Systems, Tsinghua National
Laboratory for Information Science and Technology, Tsinghua University,
Beijing 100084, China (e-mail: xhguan@xjtu.edu.cn).
J. Zhao, J. Tao, and T. Qin are with the MOE Key Laboratory for
Intelligent Networks and Network Security, Xi’an Jiaotong University,
Xi’an 710049, China (e-mail: jzzhao@sei.xjtu.edu.cn; jtao@sei.xjtu.edu.cn;
tqin@sei.xjtu.edu.cn).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2014.2312544
for security monitoring, such as anomaly detection [23]–[28]
and host profiling [29].
In this paper, we aim to measure the HCDD in realtime
for high speed links. Measuring traffic statistics has to be
implemented with very fast and expensive SRAM for realtime
applications at routers, since DRAM does not support the
required high access rates. It is also critical for future soft
defined networks. For example, Yu et al. [30] propose a traffic
measurement architecture to deploy existing sketch methods
on routers in software defined networks. To obtain the total
number of flows generated by a host, one needs to build a table
that keeps track of existing flows to avoid duplicating flow
records for packets from the same flow, where a flow refers
to the set of packets with the same source and destination.
Clearly it is not practical to obtain the connection degrees
of all hosts by building per-host tables for high speed links
with a large number of hosts and flows, since SRAM is an
expensive and scarce resource on routers. Therefore, it is
very important to design memory- and computation-efficient
methods for computing the HCDD of large networks.
Sampling technique provides a direct and efficient way to
analyze and process massive data, since only a relatively small
subset of network traffic needs to be stored and processed.
Similar to estimate flow size distribution [9], [11], one can
estimate the HCDD using sampled traffic. Another is sketches
as in [31]–[35] to independently build a probabilistic summary
of flows per host using a small and fast memory, and then the
connection degree of a host can be estimated individually with
its associated sketch. To accurately estimate the connection
degree of a host with thousands of connections, these sketch
methods need to process an array of thousands of bits. Since
the connection degree of a host is not known in advance, the
total memory use is very large for a high speed link with
millions of hosts, and it limits the practical applications of
the sketch methods in [31]–[35]. To solve this problem, the
sketch methods in [18], [36], and [37] measure host connection
degrees by building a very compact digest of monitored traffic
for all hosts, where the bitmaps used for estimating host
connection degrees are not generated independently. However
they fail to compute the HCDD, since the correlations of
the bitmaps are too complex to analyze. Chen et al. [38]
proposed a sketch method to measure the HCDD by building a
continuous uniform Flajolet-Martin sketch (CUFMS) for each
host. The method uses a CUFMS to record the minimal order
statistic of the hash values of a host’s flows, where the hash
value of a flow is a random value selected from the range (0, 1)
at random. The CUFMS of a host with more flows is smaller
with high probability, so the magnitude of a host’s CUFMS
1556-6013 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

WANG et al.: NEW SKETCH METHOD FOR MEASURING HCDD 949
provides a measure for its connection degree. Based on this
property, the method in [38] estimate the HCDD using the
maximum likelihood estimation method. However, we cannot
easily determine whether a host has a large connection degree
from its CUFMS, since a small CUFMS might also be
generated by most hosts with small connection degrees. Thus,
the estimation accuracy of the CUFMS method in [38] is a
serious issue.
To address this problem, we present a joint sketch (JS)
method. JS uses a discrete uniform Flajolet-Martin sketch
(DUFMS) combined with a small bitmap [32] to build a
compact digest of a host’s flows. DUFMS is the actual
application version of the CUFMS, and the bitmap of a host is
defined as a bit array, where each bit is initialized with zero.
For each flow of a host, a bit is randomly selected from the
host’s bitmap, and set as one. As a probabilistic summary, the
number of ones in a host’s bitmap reflects the host’s connection
degree. JS is more efficient and accurate than the CUFMS
method because JS uses small bitmaps to build very accurate
sketches for most hosts, since most hosts have several flows
and only a small fraction of hosts have thousands of flows.
JS also builds a more effective traffic digest than the CUFMS
method for hosts with a large number of flows, because all
bits in the bitmap of a host with a large number of flows
will be set as one with high probability due to the “coupon
collector’s problem” [39], and then the bitmap reflects that the
host’s connection degree is not smaller than L ln L, where L
is the size of the bitmap. To study the performance of JS
in comparison with state-of-the-art methods, we propose a
general model that encompasses all these methods. With this
model, we compute the Cramér-Rao lower bounds and the
variances of HCDD estimations. The theoretic analysis and
numerical experimental results show that our method JS is
significantly more accurate than state-of-the-art methods with
the same memory usage. Meanwhile experiments based on real
and simulated anomaly traffic show that JS is also efficient to
measure the change of the HCDD, which is another important
indicator for detecting network anomalies.
This paper is organized as follows. The problem is formu-
lated in Section II. In Section III we briefly discuss previous
methods and our sketch method for measuring the HCDD, and
introduce a general model that encompasses all these HCDD
measurement methods. Section IV presents an efficient method
to compute the maximum likelihood estimation of the HCDD.
Then we apply the information theory to study and analyze
the confidence interval of HCDD estimates in Section V. The
performance evaluation and testing results are presented in
Section VI. Section VII summarizes related work. Concluding
remarks then follow.
II. PROBLEM FORMULATION
Let P = p1, p2, . . . be an input packet stream arriving
sequentially at a network monitor. Here pt = (st , dt) refers to
the t-th packet, where st and dt are its source and destination
respectively. We define the out-degree of a source s as the
number of distinct destinations s connects to, and the in-
degree of a destination d as the number of distinct sources
TABLE I
TABLE OF NOTATIONS
d connects to. Denote by n the number of sources, and ni the
number of sources with out-degree i, i = {1, . . . , W}. In this
paper we focus on compute the source out-degree distribution
θ = (θ1, . . . , θW )T, where θi = ni
n is the fraction of sources
with out-degree i and W is the maximum out-degree. Note
that the algorithms presented in the following sections can
also be applied to estimate the in-degree distribution ϑ =
(ϑ1, . . . , ϑV )T, where ϑi is the fraction of destinations with in-
degree i and V is the maximum in-degree. For the convenience
of reading, we summarize notations used throughout this paper
in Table I.
III. SAMPLING AND SKETCH METHODS
In this section, we first summarize previous methods and our
sketch method for measuring the HCDD of high speed links.
Then, we present a general model to alleviate the analysis and
the comparison of these methods discussed in the later two
sections.
A. Measurement Methods
In this subsection, we first briefly discuss the state-of-
the-art methods: flow sampling method (FS) [10], and dis-
crete Flajolet-Martin sketch (DUFMS) [38]. To address their
shortcomings, then we propose a new sketch method: joint
sketch (JS).
1) FS: FS is one of the most popular sampling methods for
network traffic measurement. FS retains each flow indepen-
dently with probability p, where 0 < p < 1, and otherwise
drops it with probability q = 1 − p. Finally, θ is estimated
based on sampled flows, which is discussed in the later section.

2) DUFMS: DUFMS can be viewed as an actual imple-
mentation of the continuous Flajolet-Martin sketch (CUFMS)
proposed in [38], which is a variant of a Flajolet-Martin
sketch [31]. DUFMS uses a random variable Zs to build a
rough sketch for flows of a source s. Let Zs initialized with H.
For an incoming packet (s, d), Zs is updated as
Zs = min{Zs, η(s||d)},
where η is a uniform hash function with the range {1, . . . , H}
and the flow label s||d is the concatenation of the source s and
the destination d. In a later subsection we show that the larger
the out-degree of s, the smaller Zs is with high probability.
Thus, the magnitude of Zs reflects the out-degree of s. Based
on this property, θ is estimated based on the sketches of
sampled sources, which is discussed later.
3) JS: To obtain an accurate estimate of θ, [40] shows that
FS requires to set a large value of p. It makes FS memory
extensive. For CUFMS, we cannot easily determine whether
a source s has a large out-degree from its zs, since the small
value of zs might also be generated by s with a small out-
degree. Therefore the estimation accuracy of CUFMS is a
serious issue. To solve this problem, we propose a new sketch
method JS. JS uses a DUFMS Zs combined with a bitmap
Bs to build a probability traffic summary for a source s. The
bitmap Bs is defined as a bit array Bs[l], 1 ≤ l ≤ L, and
each bit Bs[l] is initialized with zero. For an incoming packet
(s, d), we randomly select a bit from Bs and set it as one.
That is, we update Bs as
Bs[ζ(s||d)] = 1,
where ζ(s||d) is a uniform hash function with the
range {1, . . ., L}. Meanwhile we update Zs as Zs =
min(Zs, η(s||d)), where Zs is initialized with H and η is a
uniform hash function with range {1, . . ., H}.
We observe that most sources’ out-degrees are small and
only a small fraction of sources have large out-degrees. Thus,
JS using a small bitmap builds very accurate sketches for most
sources. For a source with a large out-degree, all bits in its
bitmap will be set as one with high probability due to the
“coupon collector’s problem” [39], and then its bitmap reflects
that its out-degree is not smaller than L ln L. Therefore, JS
can also build a more effective traffic digest than DUFMS for
sources with a large out-degree.
B. General Model
Next we present a general model shown in Fig. 1, which is
used for analyzing the above sampling and sketch methods dis-
cussed in the later two sections. Denote by X = {x1, . . . , xW }
the source state space. We define the source state of a source
with out-degree i (1 ≤ i ≤ W) as xi. Our goal is to estimate
the out-degree distribution θ, so all sources can be considered
mutually independent and the state of each source can be
treated as a random sample selected from X according to
the probability distribution θ. For a source s, we denote by
X ∈ X its source state and Y its system response. For example,
Y represents the number of sampled flows of s for FS,
Zs for DUFMS, and the pair of Zs and Bs for JS. Denote
Fig. 1. General model of measurement methods.
by Y = {y1, . . . , yM} the observation state space, i.e., the
set of observable system responses. Note that the system
response Y might be unobservable, that is Y /∈ Y. For example,
Y is unobserved when there exists no sampled flow of s for
FS. We define sources with unobserved system responses as
unobserved sources. Denote by ui = P(Y /∈ Y | X = xi)
the probability of unobserving a source given that its source
state is xi, 1 ≤ i ≤ W. For a source randomly selected from
the monitored network, the probability of unobserving it is
uθ = P(Y /∈ Y) = W
i=1 uiθi. Let G be the number of
observed sources. Then the likelihood function of G is
f (G = g | θ) =
n
g
(1 − uθ )g
u
n−g
θ
=
n
g
1 −
n
i=1
uiθi
g n
i=1
uiθi
n−g
(1)
where 0 ≤ g ≤ n.
Denote by aji = P(Y = yj | X = xi, Y ∈ Y) (1 ≤ i ≤ W,
1 ≤ j ≤ M) the probability of an observation’s state is yi
given its associated source is observed and the source state
is xi. We can easily find that M
j=1 aji = 1, i = 1, . . . , W.
Let ξ = (ξ1, . . . , ξM )T, where ξj = P(Y = yj | Y ∈ Y)
(1 ≤ j ≤ M) is the probability of obtaining an observation
with state yj. Then ξ and θ have the following relationship
ξj =
W
i=1
P(Y = yj | X = xi, Y ∈ Y)P(X = xi) =
W
i=1
ajiθi.
Or, in matrix notation,
ξ = Aθ, (2)
where A = [aji], j = 1, . . . , M, and i = 1, . . ., W. Thus, the
likelihood function of an observation Y = yj is
f (Y = yj | θ) = ξj , j = 1, . . . , M, (3)
and the likelihood function of G observations {Yl}G
l=1 is
f (Y1, . . . , YG | θ) = f (G | θ)
G
l=1
f (Yl | θ). (4)
From (1), (3), (4), we build a maximum likelihood estimator
of θ, which is discussed in details later in Section IV.
C. Model Specification
Next, we specify the general model for the three methods
in Section III-A respectively.

1) FS: Denote by Os the number of sampled flows for a
source s. Then we have
P(Os = j | Os = i) =
i
j
p j
qi− j
, 0 ≤ j ≤ i,
where Os is the out-degree of s. s is observed if and only
if at least one of its flows is sampled, which happens with
probability P(Os > 0) = 1 − qOs . Corresponding to the
general model, we have ξ = (ξ1, . . . , ξW )T and A = [aji],
j, i = 1, . . . , W, where aji (1 ≤ j ≤ i ≤ W) is the probability
of observing j flows of a source with out-degree i, that is
aji = P(Os = j | Os > 0, Os = i) =
i
j p j qi− j
1 − qi
.
Let aji = 0 for i < j. We can easily find that the probability
of unobserving a source with out-degree i is qi. Therefore, we
have ui = qi, and
uθ =
W
i=1
qi
θi. (5)
Thus, the probability that the number of sampled sources
equals g (0 ≤ g ≤ n) is
f (G = g | θ) =
n
g
1 −
W
i=1
qi
θi
g W
i=1
qi
θi
n−g
.
2) DUFMS: We can easily find that
P(Zs ≥ j | Os = i) =
⎧
⎨
⎩
1 − j−1
H
i
, 1 ≤ j ≤ H
0, otherwise.
Then, the probability that Zs equals j is
aji = P(Zs = j | Os = i)
= P(Zs ≥ j | Os = i) − P(Zs ≥ j + 1 | Os = i)
= 1 −
j − 1
H
i
− 1 −
j
H
i
. (6)
When the out-degree of s is large, (6) indicates that its
associated Zs is small with high probability. Therefore the
magnitude of Zs provides a measure for the out-degree of
s. Corresponding to the general model, we have A = [aji],
j = 1, . . . , H, i = 1, . . . , W, and ξ = (ξ1, . . . , ξH )T.
3) JS: Denote by Us the number of ones in Bs. Next we
compute the probability P(Us = j), 1 ≤ j ≤ L. For j distinct
bits, there exist j!S(i, j) ways to partition i flows into these
bits given that each bit has at least one flow, where S(i, j),
1 ≤ j ≤ i, the Stirling number of the second kind [41], is
computed as
S(i, j) =
1
j!
j−1
k=0
(−1)k j
k
( j − k)i
, 1 ≤ j ≤ i.
Meanwhile, since there exist L
j ways to select j distinct bits
from Bs, we have
P(Us = j | Os = i) =
L
j j!S(i, j)
Li
=
L
j
j−1
k=0
(−1)k j
k
j − k
L
i
. (7)
Let j1 = 1, . . . , min{L, W}, and j2 = 1, . . . , H. Denote
j = ( j1 − 1)H + j2. Corresponding to the general model, we
define the observation state of s as Y = yj when Us = j1 and
Zs = j2. Then we have ξ = (ξ1, . . . , ξM )T and A = [aji], j =
1, . . . , M, i = 1, . . . , W, where M = H min{L, W}, and
aji is computed as
aji = P(Us = j1, Zs = j2 | Os = i)
= P(Us = j1 | Os = i)P(Zs = j2 | Os = i).
From (6) and (7), we have
aji =
L
j1
j1−1
k=0
(−1)k j1
k
j1 − k
L
i
× 1 −
j2 − 1
H
i
− 1 −
j2
H
i
.
Let aji = 0 for j1 > i.
IV. HCDD ESTIMATION METHOD
In this section, we present our maximum likelihood estima-
tion (MLE) method to estimate the source out-degree distrib-
ution θ for the sampling and sketch methods in Section III-A.
For the general model, denote by gj (1 ≤ j ≤ M) the
number of observations with state yj, and g = M
j=1 gi the
number of observations. Let g = (g1, . . . , gM)T. From (4),
the log-likelihood function of observations with respect to the
parameter θ is computed as
L(θ) = log f (G = g | θ) +
M
j=1
gj log f (Y = yj | θ).
The MLE of θ is defined as ˆθ = arg maxθ L(θ). However it
is hard to find a closed form solution to this optimization. To
solve this problem, in what follows we study the complete-
data likelihood function of observations with respect to θ.
Denote by f ji (1 ≤ j ≤ M, 1 ≤ i ≤ W) the number of
observations with state yj generated by sources with state xi,
and f0i the number of unobserved sources with state xi. Then
the complete-data log-likelihood function is
Lc(θ) =
W
i=1
⎛
⎝ f0i log(uiθi) +
M
j=1
fji log (1 − ui)ajiθi
⎞
⎠ .
(8)
To solve the optimization of ˆθ = arg maxθ Lc(θ) with the
constrains W
i=1 θi = 1 and 0 ≤ θi ≤ 1, the MLE of θ is
computed as
ˆθi =
M
j=0 f ji
W
i=1
M
j=0 f ji
.
Clearly f ji (0 ≤ j ≤ M, 1 ≤ i ≤ W) are not directly
available. To solve this problem, we adopt a standard
method, expectation maximization (EM) algorithm [42].
Formally, EM begins with a guess of θ(0), and is an iterative
method which alternates between performing an expectation
step (E-step) and a maximization step (M-step). The k-th
(k ≥ 0) iteration of these two steps is shown as follows:

E-step: Compute Q(θ, θ(k)), the expectation of the
log-likelihood evaluated using the current estimate θ(k)
of θ, as
Q(θ, θ(k)
) =
W
i=1
Eθ(k) [ f0i | g] log(uiθi)
+
W
i=1
M
j=1
Eθ(k) [ f ji | g] log (1 − ui )ajiθi .
which replaces f ji by their expectations E[ f ji] given θ(k)
in (8). Denote by pi | j (1 ≤ j ≤ M, 1 ≤ i ≤ W) the
probability of an observation with state yj is generated by
a source with state xi. We have
pi | j = P X = xi | Y = yj, θ(k)
=
P Y = yj | X = xi, Y ∈ Y, θ(k) P X = xi | θ(k)
P Y = yj | Y ∈ Y, θ(k)
=
ajiθ(k)
i
W
l=1 ajlθ
(k)
l
.
Thus,
Eθ(k) [ f ji | g] = gj pi | j , 1 ≤ j ≤ M, 1 ≤ i ≤ W.
Denote by pi | 0 the probability that a system response is
generated by a source with state xi given that the system
response is unobservable. Then we have
pi | 0 = P X = xi | Y /∈ Y, θ(k)
=
ui θ(k)
i
W
l=1 ulθ
(k)
l
.
Denote by g0 = n − g the number of unobserved sources. If
n is known in advance, we have
Eθ(k) [ f0i | g] = g0 pi | 0.
Otherwise, since
Eθ(k) [g0 | g] = Eθ(k) [n]
W
i=1
uiθ
(k)
i
= Eθ(k) g0 + g | g
W
i=1
uiθ
(k)
i ,
we have
Eθ(k) [g0 | g] =
g W
i=1 ui θ
(k)
i
1 − W
i=1 uiθ
(k)
i
.
Thus,
Eθ(k) [ f0i | g] = Eθ(k) [g0 | g]pi | 0 =
gpi | 0
W
i=1 ui θ
(k)
i
1 − W
i=1 uiθ(k)
i
.
M-step: Compute θ(k+1) as the following equations that is
the value of θ maximizing the expected log-likelihood found
on the E-step
θ
(k+1)
i =
M
j=0 gj pi | j
W
i=1
M
j=0 gj pi | j
, i = 1, . . ., W.
where g0 = n − g if n is known in advance, otherwise g0 =
g W
i=1 ui θ
(k)
i
1− W
i=1 ui θ
(k)
i
. We repeat these two steps multiple times until
θ(k) and θ(k+1) are close enough to each other.
V. ERROR ANALYSIS OF HCDD ESTIMATES
It is hard to directly analyze the variance of the MLE in
Section IV. To solve this problem, in this section we compute
the Cramér-Rao lower bound (CRLB) of θ for the methods
in Section III-A. The CRLB of θ provides a confidence
interval of estimates given by the MLE, since the MLE is an
asymptotically efficient unbiased estimator of θ, and its mean
square error asymptotically approaches the CRLB of θ [43].
A. CRLB of General Model
To compute the CRLB of θ, in what follows we first
compute the Fisher information of observations in respect
to θ. Then we obtain the CRLB of θ assuming that θ is
unconstrained, i.e., θi ∈ R, i = 1, . . . , W, and W
i=1 θi can
assume any value. However, θ is a probability distribution
(constrained), which should decrease the estimation error and
consequently the CRLB of θ. Thus, later we see how to extend
this result to the case where θ has the constraints: W
i=1 θi = 1
and 0 < θi < 1, i = 1, . . . , W.
The following closely follows the exposition in [44], which
estimates the flow size distribution from sampled packets. For
an observed source, its associated observation is a random
variable distributed in the observation space Y = {y1, . . . , yM}
according to the probability distribution ξ = (ξ1, . . . , ξM )T.
The unconstrained Fisher information of observations {Yl}G
l=1
is a M × M matrix J = [Jik], i, k = 1, . . . , W, where
Jik E
∂ ln f (Y1, . . . , YG | θ)
∂θi
∂ ln f (Y1, . . . , YG | θ)
∂θk
(9)
Since Yl is obtained independently, the likelihood function
f (Y1, . . . , YG | θ) is computed as (4) and then we have
Theorem 1: J = [Jik], i, k = 1, . . . , W, is computed as
Jik = E
∂ ln f (G | θ)
∂θi
∂ ln f (G | θ)
∂θk
+E[G]E
∂ ln f (Y | θ)
∂θi
∂ ln f (Y | θ)
∂θk
. (10)
Proof. From (4), we have
ln f (Y1, . . . , YG | θ) = ln f (G | θ) +
G
l=1
ln f (Yl | θ). (11)
Meanwhile,
E
∂ ln f (G | θ)
∂θi
= E
∂ f (G | θ)
∂θi
1
f (G | θ)
=
n
g=0
∂ f (G = g | θ)
∂θi
=
∂
∂θi
n
g=0
f (G = g | θ)
= 0. (12)

Similarly,
E
∂ ln f (Yl | θ)
∂θi
= 0, l = 1, . . . , G. (13)
From (9), (11), (12), and (13), we have (10).
The Fisher information matrix provides a lower bound on
the accuracy of estimators. Let ˆθ = ( ˆθ1, . . . , ˆθW )T be an
unbiased estimator of θ. The Cramér-Rao Theorem states
that the mean square error of any unbiased estimator ˆθj is
lower bounded by J−1 = [(J−1)ik], i, k = 1, . . . , W, the
inverse of the Fisher information matrix J, provided some
weak regularity conditions [45, Ch. 2], i.e.,
E[ ˆθj − θj )2
] ≥ (J−1
)j j, 1 ≤ j ≤ W. (14)
The lower bound in (14) is known as the CRLB. Moreover,
constraints on the estimated parameters provide information to
the estimator and can increase the Fisher information content
of samples. In [46], they find that only the equality constraint
W
i=1 θi = 1 actually results in information gain. Denote
matrix I = [Iik], i, k = 1, . . . , W. Let
I = J−1
−
J−1ooT J−1
oT J−1o
, (15)
where o = (1, . . ., 1)T is a vector with size W. Then we have
E[ ˆθj − θj)2
] ≥ Ij j, 1 ≤ j ≤ W.
B. Simplified Formulation of CRLB
Next, we simplify the CRLB formulations for the three
methods in Section III-A.
1) DUFMS and JS: DUFMS and JS sample sources with
the same probability, and then build sketches for sampled
sources. Thus, f (G = g | θ) is uncorrelated with θ, i.e.,
∂ ln f (G = g | θ)
∂θi
= 0.
From (2) and (3), we have
∂ ln f (Y = yj | θ)
∂θi
=
aji
ξj
.
From Theorem 1, then we have
Jik = n(1 − uθ )
M
j=1
ajiajk
ξj
.
Or, in matrix form,
J = n(1 − uθ )AT
A,
where is a M×M diagonal matrix whose element at location
( j, j) is 1
ξj
, 1 ≤ j ≤ M. Denote ˆJ = AT A. Then we have
J = n(1 − uθ ) ˆJ.
Therefore, J−1 is computed as
J−1
=
ˆJ−1
n(1 − uθ )
.
Moreover, we have ˆJ−1ooT ˆJ−1 = θθT, and oT ˆJ−1o = 1
from [47]. Thus, we simplify (15) as
I =
ˆJ−1 − θθT
n(1 − uθ )
. (16)
2) FS: We have the following equation from (1)
∂ ln f (G = g | θ)
∂θi
=
(n(1 − uθ ) − g)ui
uθ (1 − uθ )
.
Thus, we have
E
∂ ln f (G | θ)
∂θi
∂ ln f (G | θ)
∂θk
=
n
g=0
(n(1 − uθ ) − g)2uiuk
(uθ (1 − uθ ))2
f (G = g | θ)
=
nuiuk
uθ (1 − uθ )
.
From Theorem 1, then we have
Jik =
nuiuk
uθ (1 − uθ )
+ n(1 − uθ )
M
j=1
ajiajk
ξj
.
Or, in matrix form,
J =
nuuT
uθ (1 − uθ )
+ n(1 − uθ ) ˆJ.
When uθ (1 − uθ )2 + uT ˆJ−1u = 0, J−1 is computed as the
following equation by the matrix inversion lemma [48]
J−1
=
1
n(1 − uθ )
ˆJ−1
−
ˆJ−1uuT ˆJ−1
uθ (1 − uθ )2 + uT ˆJ−1u
.
VI. EXPERIMENTS
In this section, we first conduct experiments to study the
CRLB of JS in comparison with state-of-the-art methods. Then
we compute the errors of HCDD estimates given by the MLE.
Finally we show results of measuring HCDDs’ changes for
network anomaly detection.
A. Datasets
We evaluate our method based on two public packet header
traces [49]. These two traces are obtained from the actual
network traffic over the backbones of CERNET (China Edu-
cation and Research Network) as gathered at the Northwest
Regional Center. Trace 1 is collected at a 10 Gbps link
by using TCPDUMP for about ten minutes. It consists of
2.1 × 108 packet headers, and includes 2.1 × 105 sources and
8.2 × 105 flows. Trace 2 is collected at an egress router with
a bandwidth of 1.5 Gbps for about eleven hours. It consists
of 5.8 × 108 packet headers, and includes 1.7 × 106 sources
and 7.0 × 106 flows. To evaluate our method for detecting
abnormal traffic, we also conduct experiments based on the
witty worm dataset [50].

TABLE II
SPACE COMPLEXITIES OF FS, DUFMS, AND JS
TABLE III
COMPUTATIONAL COMPLEXITIES OF FS, DUFMS, AND JS FOR
UPDATING A SAMPLED PACKET (RUNNING TIME IS CALCULATED BY
PERFORMING EXPERIMENTS ON A 2.39 GHZ INTEL PROCESSOR)
B. Comparison Model
In this subsection we give a model to compare FS,
DUFMS, and JS. From (5), we find that n(1 − W
i=1 qiθi)
sources are expected to be sampled for FS with sampling
probability p. To store the information of sampled flows, we
have two different schemes: the computation efficient scheme
FS_C and the memory efficient scheme FS_M. FS_C directly
adopts a flow table to store the labels of sampled flows.
Therefore, on average its memory use is 64npmθ bits. FS_M
allocates a destination table for a sampled source. Then it
needs 32 + 32t bits for a source with t flows sampled, and
its average memory use is 32n(pmθ + 1 − W
i=1 qiθi) bits.
Therefore, FS_M is more memory efficient at the cost of
more computations. Denote by nD and nJ the numbers of
sources sampled by DUFMS and JS respectively. Then the
space complexities of FS, DUFMS, and JS are shown in
Table II. Table III shows the computational complexities of
FS_C, FS_M, DUFMS, and JS. For a sampled packet, we use
a hash table to determine whether its associated flow appeared
previously for FS_C and FS_M. Similar, hash tables are used
to store data summaries of sampled sources for DUFMS and
JS. The average computational complexities are all O(1) for
these four methods. We conduct experiments on a 2.39 GHz
Intel processor running Windows XP. We observe that a
sampled packet is processed very efficiently, which is less than
one microsecond for all these four methods. In the following
experiments FS, DUFMS, and JS are compared under the
same memory usage, that is achieved by setting p, nD, and nJ .
C. Accuracy of Estimating the HCDD
In this subsection, we compute the CRLB and the error of
estimates given by the MLE to evaluate the performance of
JS in comparison with FS and DUFMS respectively.
1) CRLB: We first compute the CRLBs of FS and DUMFS.
Let θ be the source out-degree distribution of trace 1 truncated
with W = 10. Fig. 2 shows the CRLB of FS for different p
(i.e., the probability of sampling a flow), where the total
number of monitored sources is set as n = 106. We can see
that the CRLB greatly decreases with p. Section V shows that
the CRLB decreases linearly with 1/n. It indicates that CRLB
Fig. 2. (Trace 1) CRLB of FS for different p.
Fig. 3. (Trace 1) SCRLB of DUFMS for different H.
Fig. 4. (Trace 1) SCRLB of JS for different L.
of FS is still very large for p = 0.1, even when there exists
n = 109 sources monitored. We define ˆJ−1 − θθT in (16)
as the CRLB provided by a single source for DUFMS and
JS. Denote SCRLB(θk) = ( ˆJ−1 − θθT)kk as its k-th diagonal
element. Fig. 3 shows the SCRLB of DUFMS for different H.
We observe that the SCRLB decreases with H, and the rate
of decrease drops sharply as H increases. It indicates that the
error introduced by discretization is ignorable for the actual
implementation of CUFMS in [38]. We can easily find that
DUMFS requires to sample more than 1010 sources and build
their sketches to reduce its CRLB to 0.1. In summary, the
above results indicate that FS and DUFMS need to collect
most traffic to obtain an accurate estimation of θ.
Fig. 4 shows the SCRLB of JS with different L (i.e.,
the size of a bitmap) in comparison with DUMFS, where

Fig. 5. SCRLB of DUFMS for different θ.
Fig. 6. SCRLB of JS for different θ.
Fig. 7. The CRLB of JS in comparison with the MSE of MLE estimates.
H = 40. We observe that JS outperforms DUFMS, and its
SCRLB decreases sharply with L. Deﬁne the truncated Pareto
distribution as θk = α
σkα+1 , k = 1, . . . , W, where α > 0 and
σ = W
k=1
α
kα+1 . Figs. 5 and 6 show the SCRLBs of JS and
DUFMS for different θ, where L = 6 and H = 40. We
observe that the SCRLB of JS decreases as the fraction of
sources with small out-degrees increases, and it varies a lot
for different distributions. This is because JS builds accurate
sketches using small bitmaps for sources with small out-
degrees.
Fig. 7 shows the mean square error (MSE) of MLE estimates
compared to its respective CRLB for JS, where the MSE
is calculated based on 1,000 MLE estimates. We can see
Fig. 8. Compared CRLBs for different methods with the same memory
usage. (a) Trace 1. (b) Pareto, α = 0.1. (c) Pareto, α = 1. (d) Pareto, α = 2.
that the CRLB and the MSE of MLE estimates are almost
indistinguishable for JS with a large number of sampled
sources (≥ 2×105). For JS with a smaller number of sampled
sources (e.g., 2 × 104), the MSE of the MLE estimates is
fairly close to the CRLB. It indicates that the error of MLE
estimates is accurately bounded by the CRLB. We omit the
similar results of FS and DUFMS. Based on this property,
then we compare the performances of JS, FS, and DUFMS
based on their CRLBs. We select FS_M scheme for FS, and
set p = 0.1, L = 6, and H = 40 for FS, JS and DUFMS.
Fig. 8 shows the CRLBs of JS, FS, and DUFMS with the
same memory usage. We observe that JS is almost two orders
of magnitude more accurate than DUFMS and FS.
2) Estimation Error of the MLE: Next we compute the error
of HCDD estimates given by the MLE method in IV. Our
experiments are conducted on trafﬁc trace 1 truncated with
W = 500. Denote E[ ˆθk] as the average of estimates ˆθk of the

Fig. 9. (Trace 1) Compared errors of estimates given by the MLE for different methods with the same memory usage. (a) E[ ˆθk], p = 0.01, 66 kilobytes.
(b) E[ ˆθk], p = 0.1, 660 kilobytes. (c) E[ ˆθk ], p = 0.2, 1.3 megabytes. (d) NRMSE( ˆθk), p = 0.01, 66 kilobytes. (e) NRMSE( ˆθk), p = 0.1, 660 kilobytes.
(f) NRMSE( ˆθk), p = 0.2, 1.3 megabytes.
out-degree distribution. Let
NRMSE( ˆθk) = E[( ˆθk − θk)2]/θk, k = 1, 2, . . . ,
be a metric that measures the relative error of estimates ˆθk in
respect to its true value θk. In our experiments, we average the
estimates and calculate their NRMSEs over 1,000 runs. Fig. 9
shows the NRMSE of JS in comparison with DUFMS and FS
with the same memory usage (66 kilobytes, 660 kilobytes, and
1.3 megabytes), where H = 500 and L = 16. We can see that
JS outperforms DUFMS and FS for most out-degrees. FS is
highly biased for small p. JS is almost one order of magnitude
smaller than FS, and nearly six times smaller than DUFMS
for out-degrees smaller than 10. For all these three methods,
E[ ˆθk] is highly biased and NRMSE( ˆθk) is very large for out-
degrees larger than 100. It indicates that it is hard to estimate
the tail of the out-degree distribution accurately. To address
this issue, super host detection methods [18]–[21] can be used
to improve the estimation accuracy.
Moreover, we also compare JS with the state-of-the-art
method CUFMS in [38]. CUFMS can only be used to measure
the degree distribution in log2 scale, that is λi = 2i −1
k=2i−1 θi,
i = 1, 2, . . .. Thus, we divide out-degrees into bins [2i−1, 2i),
i = 1, 2, . . ., and compare performances of JS and CUFMS
for measuring the out-degree histogram λ = (λ1, λ2, . . . , ).
Fig. 10 shows the NRMSE of ˆλi , where we set H = 500,
L = 16 for JS. We can see that JS is more accurate than
CUFMS for bins with small out-degrees. Meanwhile, JS and
CUFMS both exhibit large errors for bins with large out-
degrees. To address this issue, we use the virtual connection
Fig. 10. (Trace 1) NRMSE(ˆλi ) compared results of JS and CUFMS.
degree sketch (VCDS) method in [37] for monitoring sources
with large out-degrees. VCDS uses a bit array to build a very
compact data summary of flows for estimating host connection
degrees, and is efficient for detecting hosts with large degrees.
For each source, we generate three virtual bitmaps consisting
of bits randomly selected from the shared bit array, and finally
its out-degree is estimated based on its virtual bitmaps. We
set the sizes of the shared bit array and the virtual bitmap
as 106 and 103 for VCDS respectively. Using JS combined
with VCDS, we compute λ1 to λ7 based on the sketches (i.e.,
DUFMSes and small bitmaps) of sources whose out-degrees
estimated by VCDS are not larger than 128. The other λi ,
i ≥ 8, are estimated based on high out-degree sources detected
by VCDS. In comparison with CUFMS and JS, Fig. 10 shows
that λi with large out-degrees is accurately measured by using
JS combined with VCDS at the cost of more memory usage.

Fig. 11. (Trace 2) Statistics of θ(t) and γ (t). Denote by T the number of
intervals, i.e., t ∈ {1, . . . , T}. The functions used in this figure are defined as
Mean(θ
(t)
k ) = 1
T
T
j=1 θ
( j)
k , Max(θ
(t)
k ) = maxj=1,...,T θ
( j)
k , Min(θ
(t)
k ) =
minj=1,...,T θ
( j)
k , Std(θ
(t)
k ) = 1
T −1
T
j=1(θ
( j)
k − Mean(θ
(t)
k ))2. (a) Mean
of θ
(t)
k . (b) Variance of θ
(t)
k . (c) Mean of γ
(t)
i . (d) Variance of γ
(t)
i .
D. Accuracy of Estimating the Change of the HCDD
In this subsection, we apply JS to measure the change
of the HCDD, which is an important indicator for network
monitoring [51], [52]. We conduct our experiments on traffic
trace 2 and the witty worm dataset [50]. We split traffic into
non-overlapping intervals of equal length. Let θ(t) be the
out-degree distribution for the t-th interval. The difference
between two adjacent θ(t) and θ(t−1) is usually measured using
Kullback-Leibler(KL) divergence, which is defined as
DK L(θ(t)
||θ(t−1)
) =
W
i=1
θ
(t)
i log
θ
(t)
i
θ(t−1)
i
.
As shown in Fig. 11, θ(t)
i of trace 2 is large and varies a lot for
small out-degrees, where the interval is set as thirty minutes.
It indicates that the accuracy of the KL divergence of ˆθ(t) and
ˆθ(t−1) is highly determined by the errors of ˆθ
(t)
i and ˆθ
(t−1)
i for
small out-degrees. For simplicity, we divide out-degrees into
the twenty bins {1}, . . ., {16}, {17, . . ., 32}, {33, . . ., 64},
{65, . . ., 128}, {129, . . ., W}, and use the following
histogram γ (t)
γ
(t)
i =
⎧
⎪⎨
⎪⎩
θ(t)
i , 1 ≤ i ≤ 16
2i−12
k=2i−13+1 θ(t)
k , 17 ≤ i ≤ 19
W
k=129 θ
(t)
k , i = 20
as the monitored traffic feature. Fig. 12 shows the KL diver-
gence of γ (t) and γ (t−1), which is defined as
DK L(γ (t)
||γ (t−1)
) =
20
i=1
γ
(t)
i log
γ
(t)
i
γ (t−1)
i
.
We can see that all KL divergences are smaller than 0.001
except two peaks caused by the two-hour midday break.
Fig. 12. (Trace 2) KL divergence of γ (t) and γ (t−1).
Fig. 13. KL divergence of abnormal γ and normal γ (t) when the i-th bin
of histogram γ (t) is abnormal.
Fig. 14. (Witty worm trace) KL divergences of γ (t) and γ (t−1).
Anomalies usually change γi a lot. For example, there exists
γi with change larger than 0.2 when a network is hit by the
Witty worm [38]. In what follows we suppose that ten percent
of hosts with out-degree 1 are controlled by botnets, and their
out-degrees are shifted into bin i ≥ 2 when attacks such
as DDoS are performing. The KL divergence of the original
γ (t) and the abnormal γ is shown as Fig. 13, where the
interval index t is selected at random. In comparison with
the results in Fig. 12, it shows that the KL divergence is
much larger when the network is abnormal. Meanwhile we
can see that our method JS is able to accurately measure KL
divergences.
Fig. 14 shows results for the Witty worm trace, where the
interval is set as one minute. When t = 11 (i.e., 11-th minute
of the trace), a large number of hosts have been affected

Fig. 15. (Witty worm trace) Results of CUSUM.
by the Witty worm. It results in a high KL divergence of
γ (11) and γ (10). Similar to Fig. 13 and 14 also shows that
JS is efficient to measure HCDD changes. To detect network
anomalies, then we can apply sequential analysis techniques
such as CUSUM [53] to KL divergence estimates. CUSUM
works as follows: It computes and monitors a metric r(t)
defined as
r(t)
= max{r(t−1)
+ DK L(γ (t)
||γ (t−1)
) − μ(t)
}, t ≥ 2,
where r(1) = 0 and μ(t) = 1
t−1
t
j=2 DK L(γ ( j)||γ ( j−1)) is
the average of DK L(γ ( j)||γ ( j−1)), 2 ≤ j ≤ t. An alarm
is reported when r(t) is larger than a predefined threshold.
Fig. 15 shows results of CUSUM for the Witty worm trace.
The threshold is defined as 3δ, where δ is the standard
deviation of KL divergences obtained from normal traffic. We
can see that the Witty worm is accurately detected by our
method.
VII. RELATED WORK
Previous work focuses on designing sampling and data
streaming methods for estimating the flow size distribu-
tion [9]–[15]. Performances of these methods are well studied
based on information theory in [44], [47], and [54]. Compared
to obtain the size of a flow by maintaining a counter, one
needs to build a hash table that keeps track of existing flows
to avoid duplicating flow records for packets from the same
flow to obtain the number of flows generated by a host. Thus,
the problem of estimating the HCDD solved in this paper
is more complex and difficult. Zhao et al. [18] proposed a
data streaming method to measure the host connection degree.
The method consists of a two-dimensional bit array and can
be viewed as a variant of a Bloom filter [55]. It randomly
selects several columns from the bit array for each host.
A host’ associated columns can be viewed as direct bitmaps
as proposed in [32]. All bits in the bit array are initialized
with zero. For each packet incoming, a random bit in each
of its host’s associated columns is set as one. Finally hosts’
connection degrees are estimated based on information of their
associated bitmaps. Moreover, [36], [37] proposed more mem-
ory efficient methods for generating host bitmaps. However
these methods fail to estimate the HCDD, since hosts’ bitmaps
are not generated independently and their correlations are too
complex to analyze. Chen et al. [38] proposed a data streaming
algorithm to measure the HCDD over high speed links. They
built a very compact data summary of host connection degrees
by using a group of counters. For each host s, a counter
Zs is used to record the minimal order statistic of its flows’
hash values, where the hash value of a flow is selected from
the range (0, 1) at random. The magnitude of Zs reflects
the connection degree of s, since the larger the connection
degree of s, the smaller Zs is with high probability. Based
on this data summary, Chen et al. [38] develop a method
to obtain the maximum likelihood estimation of the HCDD.
However, we cannot easily determine whether a host s has a
large connection degree from zs, since the small value of zs
might also be generated by s with a small connection degree.
Therefore the estimation accuracy of the CUFMS method is a
serious issue.
VIII. CONCLUSION
By analyzing and comparing the CRLB and the variance
of HCDD estimates, we observe that our method JS is sig-
nificantly more accurate than the state-of-the-art methods FS
and CUFMS with the same memory usage. Experiments based
on real and simulated anomaly traffic show that JS is also
efficient to measure the change of the HCDD, which is another
important indicator for detecting network anomalies.
ACKNOWLEDGMENT
The authors would like to thank the editor and anonymous
reviewers for their constructive comments and suggestions
that greatly contributed to improving the quality of the
paper.
REFERENCES
[1] C. Estan and G. Varghese, “New directions in traffic measurement and
accounting,” in Proc. ACM SIGCOMM, Aug. 2002, pp. 323–336.
[2] C. Estan and G. Varghese, “Automatically inferring patterns of resource
consumption in network traffic,” in Proc. ACM SIGCOMM, Aug. 2003,
pp. 137–148.
[3] G. Cormode, S. Muthukrishnan, and D. Srivastava, “Finding hierar-
chical heavy hitters in data streams,” in Proc. VLDB, Sep. 2003,
pp. 464–475.
[4] G. Cormode, S. Muthukrishnan, and D. Srivastava, “Diamond in the
rough: Finding hierarchical heavy hitters in multi-dimensional data,” in
Proc. ACM SIGMOD, Jun. 2004, pp. 155–166.
[5] Y. Zhang, S. Singh, S. Sen, N. G. Duffield, and C. Lund, “On-
line identification of hierarchical heavy hitters: Algorithms, evalua-
tion, and application,” in Proc. ACM SIGCOMM IMC, Oct. 2004,
pp. 101–114.
[6] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based
change detection: Methods, evaluation, and applications,” in Proc. ACM
SIGCOMM IMC, Oct. 2003, pp. 234–247.
[7] G. Cormode and S. Muthukrishnan, “What’s hot and what’s not: Track-
ing most frequent items dynamically,” in Proc. ACM PODC, Jul. 2003,
pp. 296–306.
[8] R. Schweller et al., “Reversible sketches: Enabling monitoring and
analysis over high-speed data streams,” IEEE/ACM Trans. Netw., vol. 15,
no. 5, pp. 1059–1072, Oct. 2007.
[9] N. Duffield, C. Lund, and M. Thorup, “Estimating flow distributions
from sampled flow statistics,” in Proc. ACM SIGCOMM, Aug. 2003,
pp. 325–336.
[10] N. Hohn and D. Veitch, “Inverting sampled traffic,” in Proc. ACM

[11] L. Yang and G. Michailidis, “Sample based estimation of network
traffic flow characteristics,” in Proc. IEEE INFOCOM, May 2007,
pp. 1775–1783.
[12] A. Kumar, M. Sung, J. Xu, and J. Wang, “Data streaming algorithms
for efficient and accurate estimation of flow size distribution,” in Proc.
ACM SIGMETRICS, Jun. 2004, pp. 177–188.
[13] B. Ribeiro, T. Ye, and D. Towsley, “A resource-minimalist flow size
histogram estimator,” in Proc. ACM SIGCOMM IMC, Oct. 2008,
pp. 285–290.
[14] A. Kumar, M. Sung, J. Xu, and E. W. Zegura, “A data streaming
algorithm for estimating subpopulation flow size distribution,” in Proc.
ACM SIGMETRICS, Jun. 2005, pp. 61–72.
[15] A. Kumar and J. Xu, “Sketch guided sampling–using online estimates
of flow size for adaptive data collection,” in Proc. IEEE INFOCOM,
Apr. 2006, pp. 1–11.
[16] A. Lall, V. Sekar, M. Ogihara, J. Xu, and H. Zhang, “Data streaming
algorithms for estimating entropy of network traffic,” in Proc. ACM
SIGMETRICS, Jun. 2006, pp. 145–156.
[17] H. Zhao, A. Lall, O. Spatscheck, J. Wang, and J. Xu, “A data streaming
algorithm for estimating entropies of OD flows,” in Proc. ACM SIG-
COMM IMC, Oct. 2007, pp. 279–290.
[18] Q. Zhao, A. Kumar, and J. Xu, “Joint data streaming and sampling
techniques for detection of super sources and destinations,” in Proc.
ACM SIGCOMM IMC, Oct. 2005, pp. 77–90.
[19] S. Venkataraman, D. Song, P. B. Gibbons, and A. Blum, “New stream-
ing algorithms for fast detection of superspreaders,” in Proc. NDSS,
Feb. 2005, pp. 149–166.
[20] J. Cao, Y. Jin, A. Chen, T. Bu, and Z. Zhang, “Identifying high cardinal-
ity Internet hosts,” in Proc. IEEE INFOCOM, Apr. 2009, pp. 810–818.
[21] T. Li, S. Chen, W. Luo, and M. Zhang, “Scan detection in high-
speed networks based on optimal dynamic bit sharing,” in Proc. IEEE
INFOCOM, Apr. 2011, pp. 3200–3208.
[22] G. Nychis, V. Sekar, D. G. Andersen, H. Kim, and H. Zhang, “An
empirical evaluation of entropy-based traffic anomaly detection,” in
Proc. ACM SIGCOMM IMC, Oct. 2008, pp. 151–156.
[23] W. Lee and D. Xiang, “Information-theoretic measures for anom-
aly detection,” in Proc. IEEE Symp. Security Privacy, May 2001,
pp. 130–143.
[24] L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred, “Statistical
approaches to DDoS attack detection and response,” in Proc. DARPA Inf.
Survivability Conf. Exposit., Apr. 2003, pp. 303–314.
[25] A. Lakhina, M. Crovella, and C. Diot, “Mining anomalies using
traffic feature distributions,” in Proc. ACM SIGCOMM, Aug. 2005,
pp. 217–228.
[26] A. Wagner and B. Plattner, “Entropy based worm and anomaly detection
in fast IP networks,” in Proc. 14th IEEE Int. Workshops Enabling
Technol., Infrastruct. Collaborative Enterprise, Jun. 2005, pp. 172–177.
[27] P. Wang, X. Guan, T. Qin, and Q. Huang, “A data streaming
method for monitoring host connection degrees of high-speed links,”
IEEE Trans. Inf. Forensics Security, vol. 6, no. 3, pp. 1086–1098,
Sep. 2011.
[28] P. Wang, X. Guan, D. Towsley, and J. Tao, “Virtual indexing based meth-
ods for estimating node connection degrees,” Comput. Netw., vol. 56,
no. 12, pp. 2773–2787, 2012.
[29] K. Xu, Z.-L. Zhang, and S. Bhattacharyya, “Profiling Internet backbone
traffic: Behavior models and applications,” in Proc. ACM SIGCOMM,
Aug. 2005, pp. 169–180.
[30] M. Yu, L. Jose, and R. Miao, “Software defined traffic measurement with
opensketch,” in Proc. 10th USENIX Conf. Netw. Syst. Des. Implement.,
Berkeley, CA, USA, 2013, pp. 29–42.
[31] P. Flajolet and N. G. Martin, “Probabilistic counting algorithms for data
base applications,” J. Comput. Syst. Sci., vol. 31, no. 2, pp. 182–209,
Sep. 1985.
[32] K. Whang, B. T. Vander-zanden, and H. M. Taylor, “A linear-time
probabilistic counting algorithm for database applications,” IEEE Trans.
Database Syst., vol. 15, no. 2, pp. 208–229, Jun. 1990.
[33] M. Durand and P. Flajolet, “Loglog counting of large cardinalities,” in
Proc. Eur. Symp. Algorithms, Sep. 2003, pp. 605–617.
[34] C. Estan, G. Varghese, and M. Fisk, “Bitmap algorithms for counting
active flows on high speed links,” in Proc. ACM SIGCOMM IMC,
Oct. 2003, pp. 182–209.
[35] A. Chen and J. Cao, “Distinct counting with a self-learning bitmap,” in
Proc. IEEE ICDE, Mar. 2009, pp. 1171–1174.
[36] M. Yoon, T. Li, S. Chen, and J. K. Peir, “Fit a spread estimator in small
memory,” in Proc. IEEE INFOCOM, Apr. 2009, pp. 504–512.
[37] P. Wang, X. Guan, W. Gong, and D. Towsley, “A new virtual index-
ing method for measuring host connection degrees,” in Proc. IEEE
INFOCOM Mini-Conf., Apr. 2011, pp. 156–160.
[38] A. Chen, L. Li, and J. Cao, “Tracking cardinality distributions in network
traffic,” in Proc. IEEE INFOCOM, Apr. 2009, pp. 819–827.
[39] T. Motwani and P. Raghavan, Randomized Algorithms. New York, NY,
USA: Cambridge Univ. Press, 1995.
[40] F. Murai, B. F. Ribeiro, D. Towsley, and P. Wang, “On set size
distribution estimation and the characterization of large networks via
sampling,” IEEE J. Sel. Areas Commun., vol. 31, no. 6, pp. 1017–1025,
Jun. 2013.
[41] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions
with Formulas, Graphs, and Mathematical Tables. Washington, DC,
USA: NBS, 1972.
[42] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood
from incomplete data via the EM algorithm,” J. R. Statist. Soc., Ser. B
(Methodol.), vol. 39, no. 1, pp. 1–38, 1977.
[43] P. J. Bickel and K. A. Doksum, Mathematical Statistics: Basic Ideas and
Selected Topics, vol. 1, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-
Hall, 2000.
[44] B. Ribeiro, D. Towsley, T. Ye, and J. Bolot, “Fisher information of
sampled packets: An application to flow size estimation,” in Proc. ACM
[45] H. L. van Trees, Detection, Estimation and Modulation Theory, Part 1.
New York, NY, USA: Wiley, 2001.
[46] J. D. Gorman and A. O. Hero, “Lower bounds for parametric estimation
with constraints,” IEEE Trans. Inf. Theory, vol. 26, no. 6, pp. 1285–1301,
Nov. 1990.
[47] P. Tune and D. Veitch, “Towards optimal sampling for flow size
estimation,” in Proc. ACM SIGCOMM IMC, Oct. 2008, pp. 243–256.
[48] F. Zhang, Matrix Theory: Basic Results and Techniques. New York, NY,
USA: Springer-Verlag, 1999.
[49] (2009, Sep. 1). Virtual Index Methods: Public Data Set [Online].
Available: http://nskeylab.xjtu.edu.cn/people/phwang/data/
[50] (2013, Dec. 1). The CAIDA Witty Worm Dataset. [Online]. Available:
http://www.caida.org/data/passive/witty_worm_dataset.xml
[51] Y. Gu, A. McCallum, and D. Towsley, “Detecting anomalies in network
traffic using maximum entropy estimation,” in Proc. ACM SIGCOMM
IMC, Oct. 2005, pp. 345–350.
[52] M. P. Stoecklin, J.-Y. L. Boudec, and A. Kind, “A two-layered anom-
aly detection technique based on multi-modal flow behavior mod-
els,” in Proc. 9th Int. Conf. Passive Active Netw. Meas., Apr. 2008,
pp. 212–221.
[53] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41,
nos. 1–2, pp. 100–115, Jun. 1954.
[54] P. Tune and D. Veitch, “Fisher information in flow size distribution
estimation,” in Proc. IEEE INFOCOM, Apr. 2011, pp. 2105–2113.
[55] B. H. Bloom, “Space/time trade-offs in hash coding with allowable
errors,” Commun. ACM, vol. 13, no. 7, pp. 422–426, Jul. 1970.
Pinghui Wang received the B.S. degree in infor-
mation engineering and the Ph.D. degree in auto-
matic control from Xi’an Jiaotong University, Xi’an,
China, in 2006 and 2012, respectively. In 2012, he
was a Post-Doctoral Researcher with the Department
of Computer Science and Engineering, Chinese Uni-
versity of Hong Kong, and the School of Computer
Science, McGill University, Montreal, QC, Canada,
from 2012 to 2013. He is currently a Researcher
with Noah’s Ark Laboratory, Huawei Technologies,
Hong Kong. His research interests include Internet
traffic measurement and modeling, traffic classification, abnormal detection,
and online social network measurement.

Xiaohong Guan (S’89–M’93–SM’94–F’07) re-
ceived the B.S. and M.S. degrees in automatic con-
trol from Tsinghua University, Beijing, China, and
the Ph.D. degree in electrical engineering from the
University of Connecticut, Storrs, CT, USA, in 1982,
1985, and 1993, respectively.
From 1993 to 1995, he was a Consulting Engi-
neer at PG&E. From 1985 to 1988, he was with
the Systems Engineering Institute, Xi’an Jiaotong
University, Xi’an, China, and the Division of En-
gineering and Applied Science, Harvard University,
Cambridge, MA, USA, from 1999 to 2000. Since 1995, he has been with the
Systems Engineering Institute, Xi’an Jiaotong University, and was a Cheung
Kong Professor of Systems Engineering in 1999 and the Dean of the School
of Electronic and Information Engineering in 2008. Since 2001, he has been
the Director of the Center for Intelligent and Networked Systems, Tsinghua
University, and served as the Head of the Department of Automation from
2003 to 2008. He is an Editor of the IEEE TRANSACTIONS ON POWER
SYSTEMS and an Associate Editor of Automata. His research interests include
allocation and scheduling of complex networked resources, network security,
and sensor networks.
Junzhou Zhao received the B.S. and M.S. degrees
in information engineering from Xi’an Jiaotong Uni-
versity, Xi’an, China, in 2008 and 2010, respectively,
where he is currently pursuing the Ph.D. degree
with the Systems Engineering Institute and SKLMS
Laboratory under the supervision of Prof. Xiaohong
Guan. His research interests include Internet traffic
measurement and modeling, traffic classification,
abnormal detection, and online social network mea-
surement.
Jing Tao received the B.S. and M.S. degrees in
automation engineering from Xi’an Jiaotong Univer-
sity, Xi’an, China, in 2001 and 2006, respectively,
where he is currently a Teacher and pursuing an on-
the-job Ph.D. degree with the Systems Engineering
Institute and SKLMS Laboratory under the supervi-
sion of Prof. Xiaohong Guan. His research interests
include Internet traffic measurement and modeling,
traffic classification, abnormal detection, and botnet.
Tao Qin (S’08) received the B.S. and Ph.D. degrees
in information engineering from Xi’an Jiaotong Uni-
versity, Xi’an, China, in 2004 and 2010, respec-
tively, where he is currently an Assistant Professor
with the Department of Computer Science. His re-
search interests include Internet traffic measurement
and modeling, traffic classification, and abnormal
detection.

06775257

Recommended

Recommended

More Related Content

What's hot

What's hot (18)

Similar to 06775257

Similar to 06775257 (20)

Recently uploaded

Recently uploaded (20)

06775257