382 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012 TABLE I SUMMARY OF NOTATION AND TERMINOLOGY GFig. 1. Example of a distributed attack. Let us assume that the gateway router F Fhas only two ﬁlters, 1 and 2, available to block malicious trafﬁc and protect V Fthe victim . It uses 1 to block a single malicious source address (A) and 2Fto block the entire source preﬁxa:b:c:3, which contains three malicious sources FSource IP Addresses and Preﬁxes: Every IPv4 address is abut also one legitimate source (B). Therefore, the selection of ﬁlter 2 tradesoff collateral damage (blocking B) for reduction in the number of ﬁlters (from 32-b sequence. We use standard IP/mask notation, i.e., we write F Fthree to one). We note that both ﬁlters, 1 and 2, are ACLs installed at the G to indicate a preﬁx of length bits, where and can takesame router . (a) Actual network. (b) Hierarchy of source IP addresses andpreﬁxes. values and , respectively. For brevity, when the meaning is obvious from the context, we simply write to indicate preﬁx . We write toformulate and solve ﬁve practical source-address ﬁltering prob- indicate that address is within the 2 addresses coveredlems, depending on the attack scenario and the operator’s policy by preﬁx .and constraints. Our contributions are twofold. On the theoret- Blacklists and Whitelists: A blacklist is a set of uniqueical side, ﬁlter selection optimization leads to novel variations source IP addresses that send bad (undesired) trafﬁc to theof the multidimensional knapsack problem. We exploit the spe- victim. Similarly, a whitelist is a set of unique source IPcial structure of each problem and design optimal and compu- addresses that send good (legitimate) trafﬁc to the victim. An http://ieeexploreprojects.blogspot.comtationally efﬁcient algorithms. On the practical side, we provide address may belong either to a blacklist (in which case we calla set of cost-efﬁcient algorithms that can be used both by op- it a “bad” address) or a to whitelist (in which case we call iterators to block undesired trafﬁc and by router manufacturers a “good” address), but not to both. We use and toto optimize the use of TCAM and eventually the cost of routers. indicate the number of addresses in and , respectively.We use logs from Dshield.org to demonstrate that optimally For brevity, we also use for the number of addressesselecting which source preﬁxes to ﬁlter brings signiﬁcant bene- in the blacklist, which is the size of the most important input toﬁts compared to nonoptimized ﬁltering or to generic clustering our problem.algorithms. Each address in a blacklist or a whitelist is assigned a The outline of the rest of the paper is as follows. In Sec- weight , indicating its importance. If is a bad address,tion II, we formulate the general framework for optimal source we assign it a negative weight , which indicates thepreﬁx ﬁltering. In Section III, we study ﬁve speciﬁc problems beneﬁt from blocking ; if is a good address, we assign itthat correspond to different attack scenarios and operator a positive weight , which indicates the damage frompolicies: blocking all addresses in a blacklist (BLOCK-ALL), blocking . The higher the absolute value of the weight, theblocking some addresses in a blacklist (BLOCK-SOME), higher the beneﬁt or damage and thus the preference to blockblocking all/some addresses in a time-varying blacklist the address or not. The weight can have a different inter-(TIME-VARYING BLOCK-ALL/SOME), blocking ﬂows pretation depending on the ﬁltering problem. For instance, itduring a DDoS ﬂooding attack to meet bandwidth constraints can represent the amount of bad/good trafﬁc originating from(FLOODING), and distributed ﬁltering across several routers the corresponding source address, or it can express policy: De-during ﬂooding (DIST-FLOODING). For each problem, we pending on the amount of money gained/lost by the ISP whendesign an optimal, yet computationally efﬁcient, algorithm blocking source address , an ISP operator can assign largeto solve it. In Section IV, we use data from Dshield.org positive weights to its important customers that should never be to evaluate the performance of our algorithms in realistic blocked, or large negative weights to the worst attack sourcesattack scenarios, and we demonstrate that they bring signiﬁcant that must deﬁnitely be blocked.beneﬁt in practice. Section V discusses related work and puts Creating blacklists and whitelists (i.e., identifying bad andour work in perspective. Section VI concludes the paper. good addresses and assigning appropriate weights to them) is a difﬁcult problem on its own right, but orthogonal to this work. We assume that the blacklist is provided by another module II. PROBLEM FORMULATION AND FRAMEWORK (e.g., an intrusion detection system or historical data) as input to our problem. The sources of legitimate trafﬁc are also assumedA. Terminology and Notation known—e.g., Web servers or ISPs typically keep historical data Table I summarizes our terminology and notation. and know their customers. If it is not explicitly given, we take
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 383a conservative approach and deﬁne the whitelist to includevariables corresponding to all possible preﬁxes and will beall addresses that are not in . omitted from now on for brevity. Filters: We focus on ﬁltering of source address preﬁxes. In Eq. (1)–(4) provide the general framework for ﬁlter-selec-our context, a ﬁlter is an ACL rule that speciﬁes that all packets tion optimization. Different ﬁltering problems can be writtenwith a source IP address in preﬁx should be blocked. is as special cases, possibly with additional constraints. As wethe maximum number of available ﬁlters, and it is given as input discuss in Section V, these are all multidimensional knapsackto our problem. Filter optimization is meaningful only when problems , which are, in general, NP-hard. The speciﬁcs is much smaller than the size of the blacklist . of each problem dramatically affect the complexity, which canOtherwise, the optimal would be to block every single bad ad- vary from linear to NP-hard.dress. is indeed the case in practice due to the size In this paper, we formulate ﬁve practical ﬁltering problemsand cost of the TCAM, as mentioned in Section I. and develop optimal, yet computationally efﬁcient, algorithms The decision variable is 1 if a ﬁlter is to solve them. Here, we summarize the rationale behind eachassigned to block preﬁx , or 0 otherwise. A ﬁlter problem and outline our main results; the exact formulation andblocks all 2 addresses in that range. Hence, detailed solution is provided in Section III. expresses the beneﬁt from ﬁlter , whereas BLOCK-ALL: Suppose a network operator has a blacklist expresses the collateral damage it of size , a whitelist , and a weight assigned to each ad-causes. An effective ﬁlter should have a large beneﬁt anddress that indicates the amount of trafﬁc originating from thatlow collateral damage . address. The total number of available ﬁlters is . The ﬁrst Filtering Beneﬁt and Collateral Damage: We deﬁne the practical goal the operator may have is to install a set of ﬁl-ﬁltering beneﬁt as , i.e., the sum ters that block all bad trafﬁc so as to minimize the amount ofof the weights of the bad addresses whose trafﬁc is blocked. good trafﬁc that is blocked. We design an optimal algorithm thatWe deﬁne the collateral damage of a ﬁltering solution as solves this problem at the lowest achievable complexity (lin- , i.e., the sum of the weights of early increasing with ).the good addresses whose trafﬁc is blocked. BLOCK-SOME: A blacklist and a whitelist are given as be- fore, but the operator is now willing to block only some, insteadB. Rationale and Overview of Filtering Problems of all, bad trafﬁc, so as to decrease the amount of good trafﬁc blocked at the expense of leaving some bad trafﬁc unblocked. Given a set of bad and a set of good source addresses ( and The goal now is to block only those preﬁxes that have the highest ), a measure of their importance (the address weights ), impact and do not contain sources that generate a lot of good http://ieeexploreprojects.blogspot.comand a resource budget ( plus, possibly, other resources, trafﬁc, so as to minimize the total cost in (1). We design an op-depending on the particular problem), the goal is to select timal, lowest-complexity (linearly increasing with ) algorithmwhich source preﬁxes to ﬁlter so as to minimize the impact of for this problem, as well.bad trafﬁc and can be accommodated with the given resource TIME-VARYING BLOCK-ALL/SOME: Bad addresses maybudget. Different variations of the problem can be formulated, change over time : New sources may send malicious trafﬁcdepending on the attack scenario and the victim network’s and, conversely, previously active sources may disappearpolicies and constraints: The network operator may want to (e.g., when their vulnerabilities are patched). One way toblock all bad addresses or tolerate to leave some unblocked; the solve the dynamic versions of BLOCK-ALL (SOME) is toattack may be of low rate or a ﬂooding attack; ﬁlters may be run the algorithms we propose for the static versions for theinstalled at one or several routers. At the core of each ﬁltering blacklist/whitelist pair at each time slot. However, given thatproblem lies the following optimization: subsequent blacklists typically exhibit signiﬁcant overlap , it may be more efﬁcient to exploit this temporal correlation (1) and incrementally update the ﬁltering rules. We show that is it possible to update the optimal solution, as new IPs are inserteds.t. (2) in or removed from the blacklist, in time. FLOODING: In a ﬂooding attack, such as the one shown in Fig. 1, a large number of compromised hosts send trafﬁc to the (3) victim and exhaust the victim’s access bandwidth. In this case, our framework can be used to select the ﬁltering rules that mini- (4) mize the amount of good trafﬁc that is blocked while meeting the access bandwidth constraint—in particular, the total bandwidth Eq. (1) expresses the objective to minimize the total cost of consumed by the unblocked trafﬁc should not exceed the band-bad trafﬁc, which consists of two parts: the collateral damage width of the ﬂooded link, e.g., link G–V in Fig. 1. We prove that(the terms with ) and the cost of leaving bad trafﬁc un- this problem is NP-hard, and we design a pseudo-polynomialblocked (the terms with ). We use notation to algorithm that solves it optimally, with complexity that growsdenote summation over all possible preﬁxes : , linearly with the blacklist and whitelist size, i.e., . . Eq. (2) expresses the constraint on the number DIST-FLOODING: All the above problems aim at installingof ﬁlters. Eq. (3) states that overlapping ﬁlters are mutually ex- ﬁlters at a single router. However, a network operator may useclusive, i.e., each bad address can be blocked at most once, oth- the ﬁltering resources collaboratively across several routers toerwise ﬁltering resources are wasted. Eq. (4) lists the decision better defend against an attack. Distributed ﬁltering may also
384 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012be enabled by the cooperation across several ISPs against acommon enemy. The question in both cases is not only whichpreﬁxes to block, but also at which router to install the ﬁlters.We study the practical problem of distributed ﬁltering against aﬂooding attack. We prove that the problem can be decomposedinto several FLOODING problems, which can be solved in adistributed way. III. FILTERING PROBLEMS AND ALGORITHMS In this section, we provide the detailed formulation of eachproblem and present the algorithm that solves it. We start by Fig. 2. Example of LCP-tree(BL). Consider a blacklist consisting of the following nine bad addresses: BL =deﬁning the data structure that we use to represent the problem f10:0:0:1; 10:0:0:3; 10:0:0:4; 10:0:0:5; 10:0:0:7; 10:0:0:8; 10:0:0:10;and to develop our algorithms. g 10:0:0:11; 10:0:0:12 . All remaining addresses are considered good. Each leaf represents one address in the BL. Each intermediate node represents theA. Data Structure for Representing Filtering Solutions longest common preﬁx p covering all bad addresses in that subtree. At each intermediate node p, we also show the collateral damage (i.e., number of good Deﬁnition 1 (LCP Tree): Given a set of addresses , we addresses blocked) when we ﬁlter preﬁx p instead of ﬁltering each of its children. For instance, if we use two ﬁlters to block bad addresses 10.0.0.5/32deﬁne the Longest Common Preﬁx tree of , denoted by and 10.0.0.7/32, the collateral damage is 0; if, instead, we use one ﬁlter toLCP-tree , as the binary tree with the following properties: block preﬁx 10.0.0.4/30, we also block good address 10.0.0.6/32, i.e., we1) each leaf represents a different address in , and there is a cause collateral damage 1.leaf for each address in ; 2) each intermediate (nonleaf) noderepresents the longest common preﬁx between the preﬁxesrepresented by its two children. damage. Second, (3) becomes (7), which enforces the constraint The LCP-tree can be constructed from the complete bi- that every bad address should be blocked by exactly one ﬁlter,nary tree (with root leaves at level 32 corresponding to all ad- as opposed to at most one ﬁlter in (3)dresses , and intermediate nodes at level corresponding to all preﬁxes of length ) by removing (5)the branches that do not have addresses in , and then by re- http://ieeexploreprojects.blogspot.commoving nodes with a single child. It is a variation of the binary s.t. (6)(or unibit) trie , but does not have nodes with a single child.The LCP-tree offers an intuitive way to represent sets of pre- (7)ﬁxes that can block the addresses in set : Each node in the LCPtree represents a preﬁx that can be blocked, hence we can rep-resent a ﬁltering solution as the pruned version of the LCP tree, Characterizing an Optimal Solution: Our algorithm startswhose leaves are all and only the blocked preﬁxes. from LCP-tree and outputs a pruned version of that LCP Example 1: For instance, consider the LCP tree depicted in tree. Hence, we start by proving that an optimal solution toFig. 2, whose leaves correspond to bad addresses that we want to BLOCK-ALL can indeed be represented as a pruned version ofblock. One (expensive) solution is to use one ﬁlter to block each that LCP tree.bad address; thus the LCP tree is not pruned, and its leaves cor- Proposition 3.1: An optimal solution to BLOCK-ALL can berespond to the ﬁlters. Another feasible solution is to use three ﬁl- represented as a pruned subtree of LCP-tree with the sameters and block trafﬁc from preﬁxes 0/1, 8/2, and 12/4; this can be root as LCP-tree , up to leaves, and each nonleaf noderepresented by the pruned version of the LCP tree that includes having exactly two children.the aforementioned preﬁxes as leaves. Yet another (rather rad- Proof: We prove that, for each feasible solution toical) solution is to ﬁlter a single preﬁx (0/0) to block all trafﬁc; BLOCK-ALL , there exists another feasible solution that:this can be represented by the pruned version of the LCP tree 1) can be represented as a pruned subtree of LCP-tree asthat includes only its root. described in the proposition; and 2) whose collateral damage is Complexity: Given a list of addresses , we can build smaller or equal to ’s. This is sufﬁcient to prove the proposi-the LCP-tree by performing insertions in a Patricia tion, since an optimal solution is also a feasible one.trie . To insert a string of bits, we need at most Any ﬁltering solution can be represented as a prunedcomparisons. Thus, the worst-case complexity is , subtree of the full binary tree of all IP addresses [LCP-where (bits) is the length of a 32-b IPv4 address. tree ] with the same root and leaves cor- responding to the ﬁltered preﬁxes. is a feasible solution toB. BLOCK-ALL BLOCK-ALL, therefore uses up to ﬁlters, i.e., its tree Problem Statement: Given: a blacklist , a whitelist , has up to leaves. Indeed, if this was not the case, (6)and the number of available ﬁlters ; select ﬁlters that block would be violated and would not be a feasible solution.all bad trafﬁc and minimize collateral damage. Let us assume that the tree representing includes a preﬁx Formulation: We formulate this problem by making two ad- that is not in LCP-tree . There are three possible cases.justments to the general framework of (1)–(4). First, (1) be- 1) includes no bad addresses. In this case, we can simplycomes (5), which expresses the goal to minimize the collateral remove from ’s tree (i.e., unblock ).
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 385 2) Only one of ’s children includes bad addresses. In this case, we can replace with the child node. Algorithm 1: Algorithm for Solving BLOCK-ALL 3) Both of ’s children contain bad addresses. In this case, is already the longest common preﬁx of all bad addresses 1: build LCP-tree in , thus already on the LCP-tree . 2: for all leaf nodes doClearly, each of these operations transforms feasible solution , 3:which is assumed not to be on LCP-tree , into another fea- 4:sible solution with smaller or equal collateral damage but 5: end foron the LCP-Tree . We can repeat this process for all pre- 6:ﬁxes that are in ’s tree but not in LCP-tree , until we 7: while docreate a feasible solution that includes only preﬁxes from 8: for all node such that doLCP-tree and has smaller or equal collateral damage. 9: The only element missing to prove the proposition is to show 10:that, in the pruned LCP subtree that represents , each nonleaf 11:node has exactly two children. We show this by contradiction:Suppose there exists a nonleaf node in our pruned LCP subtree 12:that has exactly one child. This can only result from pruning 13: end forout one child of a node in the LCP tree. This means that all the 14:bad addresses (leaves) in the subtree of this child node remain 15: end whileunﬁltered, which violates (7), but this is a contradiction because 16: return , is a feasible solution. Algorithm: Algorithm 1, which solves BLOCK-ALL, con- the two children nodes (preﬁxes) of in the LCP-tree .sists of two steps. First, we build the LCP tree from the input Finding the optimal allocation of ﬁlters to block allblacklist . Second, in a bottom–up fashion, we compute addresses contained in (possibly all IP space) is equivalent , i.e., the minimum collateral damage needed to to ﬁnding the optimal allocation of ﬁlters to block allblock all bad addresses in the subtree of preﬁx using at most addresses in , and preﬁxes for bad addresses in , ﬁlters. Following a dynamic programming (DP) approach, such that . This is because preﬁxes andwe can ﬁnd the optimal allocation of ﬁlters in the subtree rooted jointly contain all bad addresses. Moreover, each of andat preﬁx by ﬁnding a value and assigning http://ieeexploreprojects.blogspot.com address. Thus, at least one ﬁlter must ﬁlters to contains at least one badthe left subtree and to the right subtree, so as to minimize be assigned to each of them. If , i.e., there is onlycollateral damage. The fact that BLOCK-ALL needs to ﬁlter one ﬁlter available, the only feasible solution is to select asall bad addresses (leaves in the LCP tree) implies that at least the preﬁx to ﬁlter out. The same argument recursively applies toone ﬁlter must be assigned to the left and right subtree, i.e., descendant nodes, until either we reach a leaf node, or we have . In other words, for every pair of sibling only one ﬁlter available. In these cases, the problem is triviallynodes, (left) and (right), with common parent node , the solved by (9).following recursive equation holds: Complexity: The LCP-tree is a binary tree with leaves. Therefore, it has intermediate nodes (pre- (8) ﬁxes). Computing (8) for every node and for every value involves solving subproblems, one for every pair with complexity .with boundary conditions for leaf and intermediate nodes in (8) requires only the optimal solution at the sibling nodes, . Thus, proceeding from the leaves to the (9) root, we can compute the optimal solution in . (10) In practice, the complexity is even lower since we do not need to compute for all values , but onlyOnce we compute for all preﬁxes in the LCP tree, we for , where is thesimply read the value of the optimal solution, . number of the leaves in preﬁx in the LCP tree. Moreover, weWe also use auxiliary variables to keep track of the set only need to compute entries for every preﬁx , s.t. weof preﬁxes used in the optimal solution. In lines 4 and 10 of cover all addresses in , which may requireAlgorithm 1, is initialized to the single preﬁx used. In for long preﬁxes in the LCP-tree.line 12, after computing the new cost, the corresponding set of Finally, we observe that the asymptotic complexity ispreﬁxes is updated: . since and does not depend Theorem 3.2: Algorithm 1 computes the optimal solution on but only on the TCAM size. Thus, the time complexityof problem BLOCK-ALL: the preﬁxes that are contained in increases linearly with the number of bad addresses . Thisset are the optimal for (5)–(7). is within a constant factor of the lowest achievable complexity Proof: Recall that denotes the value of the op- since we need to read all bad addresses at least once.timal solution of BLOCK-ALL with ﬁlters (i.e., the min- Although the above is a worst-case analysis, we conﬁrmed inimum collateral damage), while denotes the set of simulation that the computation time in practice is very closeﬁlters selected in the optimal solution. Let and denote to that.
386 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012C. BLOCK-SOME Complexity: The analysis of Algorithm 1 applies to this al- Problem Statement: Given a blacklist , a whitelist , gorithm as well. The complexity is the same, i.e., linearly in-and the number of available ﬁlters , the goal is to select creasing with .ﬁlters so as to minimize the total cost of the attack. BLOCK-ALL Versus BLOCK-SOME: There is an interesting Formulation: This is precisely the problem described by connection between the two problems. The latter can be re-(1)–(4), but put slightly rephrased to better compare it to garded as an automatic way to select the best subset fromBLOCK-ALL. There are two differences from BLOCK-ALL. and run BLOCK-ALL only on that subset. If the absolute valueFirst, the goal is to minimize the total cost of the attack, which of weights of bad addresses are signiﬁcantly larger than theinvolves both collateral damage and the ﬁltering beneﬁt weights of the good addresses, then BLOCK-SOME degener- , which is expressed by (11). Second, (13) states that every ates to BLOCK-ALL.bad address must be ﬁltered by at most one preﬁx, which meansthat it may or may not be ﬁltered D. TIME-VARYING BLOCK-ALL(SOME) (11) We now consider the case when the blacklist and whitelist change over time, and we seek to incrementally update the ﬁltered preﬁxes that are affected by the change. More precisely, s.t. (12) consider a sequence of blacklists and of whitelists at times , respectively. (13) Problem Statement: Given: 1) a blacklist and whitelist, and ; 2) the number of available ﬁlters ; 3) the corresponding solution to BLOCK-ALL(SOME), de- Characterizing an Optimal Solution: As with BLOCK-ALL, noted ; and 4) another blacklist and whitelist, andour algorithm starts from LCP-tree and outputs a pruned ; obtain the solution to BLOCK-ALL(SOME) for theversion of that LCP tree. The only difference is that some bad second blacklist/whitelist, denoted .addresses may now remain unﬁltered. In the pruned LCP sub- Algorithm: Consider, for the moment, that the whitelist re-tree that represents our solution, this means that there may exist mains the same and assume the focus is on the changes in theintermediate (nonleaf) nodes with a single child. blacklist. Proposition 3.3: An optimal solution to BLOCK-SOME can 1) Addition: First, consider that the two blacklists differ only http://ieeexploreprojects.blogspot.combe represented as a pruned subtree of LCP-tree with the in a single new bad address, which does not appear in ,same root as LCP-tree and up to leaves. but appears in . There are two cases, depending on whether Proof: In Proposition 3.1, we proved that any solution of the new bad address belongs to a preﬁx that is already ﬁltered(5) and (6) can be reduced to a (pruned) subtree of the LCP tree in . If it is, no further action is needed, and .with at most leaves. Moreover, the constraint expressed by Otherwise, we modify the LCP tree that represents to also(13), which imposes the use of nonoverlapping preﬁxes, is auto- include the new bad address, as illustrated in Fig. 3. The keymatically imposed considering the of the pruned subtree point is that we only need to add one new intermediate nodeas the selected ﬁlter. This proves that any feasible solution of to the LCP tree (the gray node in Fig. 3), corresponding to theBLOCK-SOME can be represented as a pruned subtree of the longest common preﬁx between the new bad address and itsLCP tree with at most leaves, and thus, so can an optimal closest bad address that is already in the LCP tree. The optimalsolution. allocation of ﬁlters to the subtree rooted at preﬁx depends Algorithm: The algorithm that solves BLOCK-SOME is sim- only on how these ﬁlters are allocated to the children of .ilar to Algorithm 1 in that it relies on the LCP tree and a dynamic Hence, when we add a new node to the LCP tree, we need toprogramming (DP) approach. The main difference is that not all recompute the optimal ﬁlter allocation [i.e., recomputebad addresses need to be ﬁltered. Hence, at each step, we can and , according to (8)] for all and only the ancestorsassign ﬁlters to the left and/or right subtree, whereas in of the new node, all the way up to the root node.line 11 of Algorithm 1 we had , now we have 2) Deletion: Then, assume that two blacklists differ in one . We can recursively compute the optimal solu- deleted bad address, which appears in but not in .tion as before In this case, we modify the LCP tree that represents to remove the leaf node that corresponds to that address as well as (14) its parent node (since that node does not have two children any more), and we recompute the optimal ﬁlter allocation for all andwith boundary conditions only the node’s ancestors. 3) Adjustment: Finally, suppose that the two blacklists differ (15) in one address, which appears in both blacklists but with dif- (16) ferent weights, or that the two blacklists are the same, while the two whitelists differ in one address (it either appears in one of (17) the two whitelists or it appears in both whitelists but with dif- ferent weights). In all of these cases, we do not need to add or re-where is an intermediate node (preﬁx) and is a leaf node move any nodes from the LCP tree, but we do need to adjust thein the LCP-tree. collateral damage or ﬁltering beneﬁt associated with one node,
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 387 gets blocked, ) should ﬁt within the link capacity , so as to avoid congestion and packet loss (18) s.t. (19) (20) (21)Fig. 3. Example of BLOCK-ALL and TIME-VARYING Characterizing an Optimal Solution: We represent the op-BLOCK-ALL. Consider a blacklist of 10 bad IP addresses timal solution as a pruned subtree of an LCP-tree. However,BL = f10:0:0:3; 10:0:0:10; 10:0:0:15; 10:0:0:17; 10:0:0:22; 10:0:0:31; we start with the full binary tree of all bad and good addresses10:0:0:32; 10:0:0:33; 10:0:0:57; 10:0:0:58g. The table next to eachnode p shows the minimum cost z (F ) computed by the DP algorithm LCP-tree . Moreover, to handle the constraint infor BLOCK-ALL for F = 1; . . . number of leaves in subtree. The optimal (20), each node corresponding to preﬁx is assigned an addi-solution to BLOCK-ALL consists of the four preﬁxes highlighted in black. tional cost, , indicating the total amount of trafﬁc sent by ,When a new address, e.g., 10.0.0.37, is added to the blacklist, a leaf nodeis added to the tree and TIME-VARYING needs to update all and only the .ancestor nodes in LCP-tree(BL), indicated by the dashed lines, according to Proposition 3.4: An optimal solution of FLOODING can be(8). Moreover, a new node is created to denote the longest common preﬁx represented as the leaves of a pruned subtree of LCP-treebetween 10.0.0.37 and 10.0.0.32 (or 10.0.0.33). Note that all other nodescorresponding to the longest common preﬁxes between 10.0.0.37 and other , with the same root, up to leaves, and the total costaddresses in BL are already in the LCP tree. The new optimal solution consists of the leaves .of the four preﬁxes indicated by the dashed circles. Proof: Similarly to Proposition 3.1, we prove that for every feasible solution to FLOODING , there exists another feasible solution , which: 1) can be represented as a pruned subtree http://ieeexploreprojects.blogspot.com as described in the proposition; andhence recompute the optimal ﬁlter allocation for all and only of LCP-treethat node’s ancestors. 2) whose collateral damage is smaller or equal to ’s. This is 4) Multiple Addresses: If the two successive time instances sufﬁcient to prove the proposition since an optimal solution isdiffer in multiple addresses, we repeat the procedures described also a feasible one.above as needed, i.e., we perform one node addition for each Any ﬁltering solution can be represented as a pruned subtreenew bad address, one deletion for each removed bad address, of LCP-tree with the same root and leavesand up to one adjustment for each other difference. corresponding to the ﬁltered preﬁxes. is a feasible solution Complexity: Since the LCP tree is a complete binary tree, to FLOODING, therefore: ’s tree has up to leaves, oth-any leaf node has at most ancestors, so inserting a erwise (19) would be violated; the total cost of ’s leaves isnew bad address (or removing one) requires , otherwise (20) would be violated.operations. Hence, deriving from as described above Suppose that includes a preﬁx that is not inis asymptotically better than computing it from scratch using LCP-tree . We can construct a better feasibleAlgorithm 1 if and only if the number of different addresses solution , which can be represented as a pruned subtreebetween the two time instances is less than . of LCP-tree : has the same root, up to leaves, and total cost of the leaves . There are three possibilities.E. FLOODING 1) includes neither bad nor good addresses. In this case, we can simply remove from , i.e., unblock . Problem Statement: Given: 1) a blacklist and a 2) Only one of ’s children includes bad or good addresses.whitelist , where the absolute weight of each bad and In this case, we can replace with the child that containsgood address is equal to the amount of trafﬁc it generates; the bad addresses.2) the number of available ﬁlters ; and 3) a constraint on 3) Both of ’s children include bad or good addresses. In thisthe victim’s link capacity (bandwidth) ; select ﬁlters so as to case, is already a longest common preﬁx, and we do notminimize collateral damage and make the total trafﬁc ﬁt within need to do anything.the victim’s link capacity. Clearly, each of these operations transforms feasible solu- Formulation: To formulate this problem, we need to make tion into another feasible solution with smaller or equaltwo adjustments to the general framework of (1)–(4). First, (1) collateral damage while still preserving the capacity constraint.becomes (18), which expresses the goal to minimize collateral This is because the transformations ﬁlter the same amountdamage. Second, we add a new constraint (20), which speciﬁes of trafﬁc, just using the longest preﬁx possible to do so. Wethat the total trafﬁc that remains unblocked after ﬁltering (which can repeat this process for all preﬁxes that are in but not inis the total trafﬁc, , minus the trafﬁc that LCP-tree , until we create a feasible solution
388 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012that includes only preﬁxes from LCP-tree and has only the number of ﬁlters allocated to that subtree, but alsosmaller or equal collateral damage. the corresponding amount of capacity (i.e., the amount of the Theorem 3.5: FLOODING [i.e., (18)–(21)] is NP-hard. victim’s capacity consumed by the unﬁltered trafﬁc coming Proof: To prove that FLOODING is -hard, we con- from the corresponding preﬁx). We can recursively computesider the knapsack problem with a cardinality constraint the optimal solution bottom–up as before (29) (22) (23) where is the minimum collateral damage of preﬁx when allocating ﬁlters and capacity to that preﬁx. s.t. (24) Complexity: Our DP approach computes entries for every node in LCP-tree . Moreover, the compu- tation of a single entry, given the entries of descendant nodes,which is known to be -hard , and we show that it reduces require operations, (29). We can leverage again theto FLOODING. To do this, we put FLOODING in a slightly observation that we do not need to compute entries fordifferent form by making two changes. all nodes in the LCP tree: At a node , it is sufﬁcient to com- First, we change the inequality in (19) to an equality. Any pute (29) only forfeasible solution to FLOODING that uses ﬁlters and . Therefore, thecan be transformed to another feasible solution with exactly optimal solution to FLOODING, , can be com- ﬁlters, without increasing collateral damage. In fact, given puted in time. This is increasing linearlya feasible solution that uses ﬁlters, as long as with the number of addresses in and is polynomial , it is always possible to remove a ﬁlter from a preﬁx in . The overall complexity is pseudo-polynomial becauseand add two ﬁlters to the two preﬁxes corresponding to ’s chil- cannot be polynomially bounded in the input size. In the evalu-dren in LCP-tree . The solution constructed this way ation section, we present a heuristic algorithm that operates inuses ﬁlters, blocks all addresses blocked in , and has a increments of . Finally, we note that and thuscost that is less or equal to ’s. does not appear in the asymptotic complexity. Second, we deﬁne variables , , BLOCK-SOME Versus FLOODING: There is an interestingand and use them to rewrite FLOODING connection between the two problems. To see that, consider the http://ieeexploreprojects.blogspot.com partial Lagrangian relaxation of (18)–(21) (25) s.t. (26) (30) (27) s.t. (31) (28) (32) For a given instance of the problem deﬁned by (22) and (23), For every ﬁxed , (30)–(32) are equivalent to (11)–(13) forwe construct an equivalent instance of the problem deﬁned by a speciﬁc assignments of weights . This shows that dual fea-(25)–(28) by introducing the following mapping. For sible solutions of FLOODING are instances of BLOCK-SOME : , . For all other preﬁxes that for a particular assignment of weights. The dual problem, in theare not addresses in the blacklist or whitelist: variable , aims exactly at tuning the Lagrangian multiplier to . Moreover, we assign and . With ﬁnd the best assignment of weights.this assignment, a solution to the problem deﬁned by (22) can F. DISTRIBUTED-FLOODINGbe obtained by solving FLOODING, then taking the values ofvariables that are blocked. Problem Statement: Consider a victim that connects to Algorithm: Given the hardness of the problem, we do not the Internet through its ISP and is ﬂooded by a set of attackerslook for a polynomial-time algorithm. We design a pseudo-poly- listed in a blacklist , as in Fig. 1(a). To reach the victim,nomial-time algorithm that optimally solves FLOODING. Its attack trafﬁc passed through one or more ISP routers. Letcomplexity is linearly with the number of good and bad ad- be the set of unique such routers. Let each router havedresses and with the magnitude of . capacity on the downstream link (toward ) and a limited Our algorithm is similar to the one that solves number of ﬁlters . The volume of good/bad trafﬁc throughBLOCK-SOME, i.e., it relies on an LCP tree and a DP every router is assumed known. Our goal is to allocate ﬁlters onapproach. However, we now use the LCP tree of all the bad some or all routers, in a distributed way, so as to minimize theand good addresses. Moreover, when we compute the optimal total collateral damage and avoid congestion on all links of theﬁlter allocation for each subtree, we now need to consider not ISP network.
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 389 Formulation: Let the variables indicate The dual problem iswhether or not ﬁlter is used at router . Then, the dis-tributed ﬁltering problem can be stated as (41) (33) where is the optimal solution of (38)–(40) for a given . Given the prices , every subproblem (38)–(40) can be s.t. (34) solved independently and optimally by router using (29). Problem (41) can be solved using a projected subgradient (35) method . In particular, we use the following update rule to compute shadow prices at each iteration: (36) Characterizing an Optimal Solution: Given the sets , , , and , at each router, we have the following. Proposition 3.6: There exists an optimal solution of where is the step size. The interpretation of the update rule isDIST-FLOODING that can be represented as a set of quite intuitive: For every that is ﬁltered with multiple ﬁlters,different pruned subtrees of the LCP-tree , each the corresponding shadow price is augmented proportion-corresponding to a feasible solution of FLOODING for the ally to the number of times it is blocked. Increasing the pricessame input, and s.t. every subtree leaf is not a node of another has in turn the effect of forcing the router to try to unblock thesubtree. corresponding . The price is increased until a single ﬁlter is Proof: Feasible solutions of DIST-FLOODING allocate used to block that .ﬁlters on different routers s.t. (34) and (35) are satisﬁed indepen- Note, however, that since is an integer variable, ,dently at every router. In the LCP tree, this means having the dual problem is not always guaranteed to converge to asubtrees, one for every router, each having at most leaves primal feasible solution .and their associated blocked trafﬁc , where is Distributed Versus Centralized Solution: When the number http://ieeexploreprojects.blogspot.comthe total incoming trafﬁc at router . Each subtree can be thought of source attackers is too large to be blocked at a single router,as a feasible solution of a FLOODING problem. Eq. (36) en- we need to combine resources such as the overall network ca-sures that the same address is not ﬁltered multiple times at dif- pacity and the number of ﬁlters, available at multiple routers toferent routers, to avoid waste of ﬁlters. In the LCP-tree, this reduce the overall collateral damage. The optimal allocation oftranslates into every leaf appearing at most in one subtree. ﬁlters, in that case, can be found by solving (33)–(36) provides Algorithm: Constraint (36), which imposes that different the optimal allocation of ﬁlters in that case.routers do not block the same preﬁxes, prevents us from a The solution can be found either in a centralized or in a dis-direct decomposition of the problem. To decouple the problem, tributed way. In a centralized approach, a single node solvesconsider the following partial Lagrangian relaxation: (33)–(36) and distributes the optimal ﬁlter allocation to the all routers involved. This reduces the communication overhead be- tween routers, however the computation burden is put all on the node that solves both the master problem and different sub- problems. This can become infeasible in practice due to the large number of attackers. An alternative approach is to have each router solve its own (37) subproblem (38)–(40) and a single node (e.g., the victim’s gateway or a dedicated node) to solve the master problem (41). The communication overhead of this scheme is limited. Atwhere is the Lagrangian multiplier (price) for the constraint every iteration of the subgradient, the shadow prices ’s arein (36), and is the price associated with broadcast to all routers. For 100 000 of bad IPs, if we encodepreﬁx . With this relaxation, both the objective function and the value of variables using 2 B, each router need to sendthe other constraints immediately decompose in indepen- about 200 kB of data at each iteration. Given the ’s, thedent subproblems, one per router routers solve independently a subproblem each and return the computed to the node in charge of the master problem. In (38) general, let denote the total number of active IP sources, and the communication overhead is per router per iteration. Our approach only requires communication between s.t. (39) the node solving the master problem and the nodes solving the subproblems. It does not entail communication between (40) subproblems, which would incur a signiﬁcant communication overhead.
390 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012 IV. SIMULATION RESULTS In this section, we evaluate our algorithms using real logsof malicious trafﬁc from Dshield.org. We demonstrate thatour algorithms bring signiﬁcant beneﬁt compared to nonopti-mized ﬁlter selection or to generic clustering algorithms in awide range of scenarios. The reason is the well-known fact thatsources of malicious trafﬁc exhibit spatial and temporal clus-tering –, which is exploited by our algorithms. Indeed,clustering in a blacklist allows to use a small number of ﬁlters toblock preﬁxes with high density of malicious IPs at low collat-eral damage. Furthermore, it has also been observed that goodand bad addresses are typically not colocated, which allows fordistinguishing between good and bad trafﬁc , , , andin our case for efﬁcient ﬁltering of the preﬁxes with most mali- Fig. 4. Evaluation of BLOCK-ALL in Scenarios I (High Clustering blacklist)cious sources. and II (Low Clustering blacklist). We plot the collateral damage (CD) (normal- N ized over the number of malicious sources ) versus number of ﬁlters F . WeA. Simulation Setup k compare Algorithm 1 to -means clustering (we simulated 50 runs of Lloyd’s algorithm ). We used 61-day logs from Dshield.org —a repos-itory of ﬁrewall and intrusion detection logs collected. Thedataset consists of 758 698 491 attack reports, from 32 950 391 common algorithm to solve the problem, known as Lloyd’sdifferent IP sources belonging to about 600 contributing orga- algorithm , which uses an iterative reﬁnement technique.nizations. Each report includes a timestamp, the contributor BLOCK-ALL: We ran Algorithm 1 in these Scenarios I andID, and the information for the ﬂow that raised the alarm, II, and we show the results in Fig. 4. We made the following ob-including the (malicious) source IP and the (victim) destination servations. First, the optimal algorithm performs signiﬁcantlyIP. Looking at the attack sources in the logs, we veriﬁed that better than a generic clustering algorithm that does not exploitmalicious sources are clustered in a few preﬁxes, rather than the structure of IP preﬁxes. In particular, it reduces the collat-uniformly distributed over the IP space, consistently with what eral damage (CD) by up to 85% compared to -means when http://ieeexploreprojects.blogspot.comwas observed before, e.g., in –. run on the same (high-clustering) blacklist. Second, the degree In our simulations, we considered a blacklist to be the set of clustering in a blacklist matters: The CD is lowest (highest) inof sources attacking a particular organization (victim) during a the blacklist with highest (lowest) degree of clustering, respec-single day-period. The degree of clustering varied signiﬁcantly tively. Results obtained for different victims and days were sim-in the blacklists of different victims and across different days. ilar and lied in between the two extremes. A few thousands ofThe higher the clustering, the more beneﬁt we expect from our ﬁlters were sufﬁcient to signiﬁcantly reduce collateral damageapproach. We also simulated the whitelist by generating good in all cases.IP addresses according to the multifractal distribution in  BLOCK-SOME: In Fig. 5, we focus on Scenario II, i.e., theon routable preﬁxes. We performed the simulations on a Linux Low Clustering blacklist, which is the least favorable input formachine with a 2.4-GHz processor with 2 GB RAM. our algorithm and has the highest CD (shown in dashed line in Fig. 4). Compared to BLOCK-ALL, which blocks all badB. Simulation of BLOCK-ALL and BLOCK-SOME IPs, BLOCK-SOME allows the operator to trade off lower CD for some unﬁltered bad IPs by appropriately tuning the weights Simulation Scenarios I and II: In Fig. 4, we consider two assigned to good and bad addresses. For simplicity,example blacklists corresponding to two different victims, each in Fig. 5, we assign the same weight to all good addresses,attacked by a large number of malicious IPs in a single day. and the same weight to all bad addresses. Fig. 5 shows resultsIn order to demonstrate the range of beneﬁt of our approach, for two different values of . (However, we note thatwe chose the blacklists with the highest and the lowest degree our framework has the ﬂexibility to assign different weights toof source clustering observed in the entire data set, referred each individual IP address.)to as “High Clustering” (Scenario I) and “Low Clustering” In Fig. 5(a), CD is always smaller than the corresponding(Scenario II), respectively. The degree of clustering of a black- CD in Fig. 4; they become equal only when we block all badlist is captured by the entropy associated with the distribution IPs. In Fig. 5(b), we observe that BLOCK-SOME reduces theof the IP addresses . Intuitively, a blacklist with low entropy CD by 60% compared to BLOCK-ALL while leaving unﬁltered(i.e., high clustering) contains IP addresses in a few preﬁxes only 10% of bad IPs and using only a few hundreds a ﬁlters. Inand thus is easier to block. Fig. 5(c), the total cost of the attack (i.e., the weighted sum of We compare Algorithm 1 to a -means, which is gen- bad and good trafﬁc blocked) decreases as increases. Theeral yet preﬁx-agnostic. -means is a well-known clustering interaction between these two competing factors is complex andproblem : The goal is to partition all observations (in our strongly depends on the input blacklist and whitelist. In the datacontext, IP addresses) in clusters such that each address we analyzed, we observed that CD tends to ﬁrst increase andbelongs to the cluster with the nearest mean. We use the most then decrease with , while the number of unﬁltered bad
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 391Fig. 5. Evaluation of BLOCK-SOME for Scenario II (Low Clustering black- Wlist). Three metrics are considered: (a) collateral damage (CD), (b) number ofunﬁltered bad IPs (UBIP), (c) and total cost CD + 1 UBIP . The operator w ;w W w =wexpresses preference for UBIP versus CD by tuning the weights . Weconsidered a high (2 ) and a low (2 ) value for = .IPs tends to decrease The ratio captures the effort2 madeby BLOCK-SOME to block all bad IPs and become similar toBLOCK-ALL.C. Simulation of FLOODING and DIST-FLOODING http://ieeexploreprojects.blogspot.comFLOODING in Scenario III. (a) CD=N versus F . We show the normalized collateral damage CD=N as a function of the Fig. 6. Optimal Solution of Simulation Scenario III: We consider a Web server under a number of available ﬁlters F when C is ﬁxed, C = C . (b) CD=N versus C . We show CD=N as a function of the available capacity CDDOS attack. We assume that the server has a typical access when F is ﬁxed, F = Fbandwidth of Mb/s and can handle 10 000 connec- .tions per second (a typical capability of a Web server that han-dles light content). We assume that each good (bad) source gen-erates the same amount of good (bad) trafﬁc. We also assume ﬂows, which is a common practice in DDOS attacks .that ﬁlters are available (consistently with the Since, in a typical DDOS attack, bad sources outnumber thediscussion in footnote 1), and we vary . Be- good sources, uniform rate-limiting disproportionately penal-fore the attack, 5000 good sources are picked from  and izes the good sources. While this solution is always applicableutilize 10% of the capacity. During the attack, the total bad and requires only one rate limiter, more ﬁlters (ACLs) cantrafﬁc is Gb/s and is generated by a typical black- drastically reduce the collateral damage. We simulated onelist (141 763 bad source IPs), based on Dshield logs of a ran- of the algorithms in , referred to as SAPF positive, whichdomly chosen victim for a randomly chosen day.3 selects preﬁxes to allow and denies trafﬁc generated by FLOODING—Optimal: Fig. 6(a) and (b) shows the collateral all IP addresses outside those preﬁxes.4 It uses clustering todamage of the optimal solution of FLOODING, for Scenario III, ﬁnd initial IP addresses to allow. Then, it greedilyas a function of the number of available ﬁlters and of the shortens the length of the allowed preﬁxes, thus allowingbottleneck capacity , respectively. more trafﬁc, until the available capacity is completely used. We simulate two baselines, namely uniform rate-limiting This heuristic solves the FLOODING problem, but does notand source address preﬁx ﬁltering (SAPF) , and we guarantee an optimal solution. In our simulations, the optimalcompare them to our approach. The uniform rate-limiting solution found by our algorithm, for the same number of ACLs,approach drops the same fraction of trafﬁc of all incoming reduces the collateral damage by about a factor of 2. Another observation is that, because of its heuristic nature, SAPF does 2Since w =w > 1, bad IPs are more important. WhenF we picked a ratio not necessarily decrease monotonically with , consistently W is high, the algorithm ﬁrst tries to cover small clusters or single bad IPs.In the case of high , this happens around 10 000 ﬁlters. CD remains almost 4In , there are three heuristics proposed for generating ACL lists: a pos-constant in this phase, at the end of which all bad IPs are ﬁltered [as in Fig. itive algorithm (which speciﬁes preﬁxes to allow and blocks all other trafﬁc),5(b)]. In the ﬁnal phase, the algorithm releases single good IPs, which are less a negative (which speciﬁes preﬁxes to block and accepts all other trafﬁc), andimportant, and all bad IPs are blocked similarly to BLOCK-ALL. a mixed (with possibly overlapping preﬁxes). In this paper, we compare our 3However, because this problem is NP-hard, we do not simulate the entire optimal algorithm for the FLOODING problem only against the positive algo-IP space, but the range [188.8.131.52, 184.108.40.206], which is known to account for the rithm. The positive algorithm was found, in , to perform the best (i.e., have F ;C ;wlargest amount of malicious trafﬁc, e.g., see . We also scale all parameters by the lowest collateral damage for the same number of ACLs), followed closely F Ca factor of 8, to maintain a constant ratio between the number by the mixed algorithm. The negative algorithm performed the worse; this isof IPs and , and the total ﬂow generated and . why we do not compare against it, although it is more similar to our approach.
392 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012with , which might result in inefﬁcient use of the availableﬁlters. In summary, compared to , our work ﬁnds theoptimal solution at the lowest possible complexity (linear in theinput size) and for a wider range of problems (including but notlimited to FLOODING). We also observe that varying the number of ﬁlters or the avail-able capacity has a different impact on the collateral damage.While the collateral damage decreases drastically as the numberof ﬁlters increases, when we increase the available capacity, weobserve two trends. First, as capacity increases, the optimal so- Fig. 7. Heuristic solution of FLOODING in Scenario IV. An approximate so-lution allows trafﬁc from good sources that do not belong in lution (higher CD) is obtained by solving only the subproblems z (f; n1C )preﬁxes with many malicious sources. This causes a linear de- for n 2 and C = n1C C , n = 1; 2 . . .. The coarser the capacity increments 1C , the fewer subproblems we need to solve, but at the cost ofcrease with slope equal to the amount of trafﬁc generated by higher collateral damage. In this scenario, increasing 1C signiﬁcantly reducesgood sources. For even larger , good IPs located in the same the computational time by three orders of magnitude, while the percentage ofpreﬁx as malicious sources are released. This trend depends on good trafﬁc that is blocked (CD %) is only increased by a factor 3.the speciﬁc clustering of good and bad IPs considered as well ason the amount of trafﬁc generated by both good and bad sources.When the value of is too low, increasing does notyield any beneﬁt. Most of the improvement is obtained whenboth types of resources (number of ﬁlters and capacity) increase. FLOODING—Heuristic: The beneﬁt of the optimal solutionof FLOODING comes at high computational cost due to the in-trinsic hardness of the FLOODING problem. To address thisissue, we design a heuristic for solving FLOODING, whichcan be tuned to achieve the desired tradeoff between collateraldamage and computational time. In particular, instead of solvingall subproblems for all possible values ofand , we consider discrete increments of capacity , with step size . If http://ieeexploreprojects.blogspot.com , the ﬁnestgranularity of is considered, and the problem is optimallysolved. If , we may get a suboptimal solu- Fig. 8. Distributed ﬂooding. This topology exempliﬁes the part of a potentiallytion, but we reduce the computation cost, as fewer iterations are larger ISP topology involved in routing and blocking trafﬁc toward victim V .required to solve the DP. The edge routers receive all incoming, malicious and legitimate, trafﬁc toward Simulation Scenario IV: We consider again a DDOS at- victim V and route it through shortest paths with ties broken randomly. Any of the traversed routers (indicated with circles) can be used to deploy ACLs andtack launched by 61 229 different bad source IPs, based block the malicious trafﬁc.on the Dshield.org logs. The available capacity Mb/s. Before the attack, the legitimate trafﬁcconsumes Mb/s. During the attack, the total optimal ﬂooding for a single router, but now we assume thatbad trafﬁc generated is Gb/s. This scenario is the trafﬁc reaches the victim through its ISP, as in Fig. 8.more challenging than scenario III because there is less unused We use a subgradient descent method to solve the dualcapacity before the attack and more malicious trafﬁc during the problem in (41). In Fig. 9, we show the convergence of theﬂooding attack. method for two different step sizes: 0.05 and 0.01. We also In Fig. 7, we show the percentage of good trafﬁc that is compare against the “no coordination” case, when routersblocked by the heuristic versus the time required to obtain a do not coordinate but act independently to block malicioussolution for Scenario IV. As we can see in Fig. 7, the optimal trafﬁc; this corresponds to the ﬁrst iteration of the subgradientsolution of FLOODING requires about 1 day of method. In the next iterations, routers coordinate, through thecomputation and has a CD that is only 6% of the total good exchange of shadow prices , and avoid the redundant overlaptrafﬁc. Larger values of allow to dramatically reduce the of preﬁxes at multiple locations. This reduces the collateralcomputational time by about three orders of magnitude while damage signiﬁcantly, i.e., by 50%.the CD is only increased by a factor 2–3. This asymmetry was Increasing the number of routers has two effects. On onealso the case in other Dshield logs we simulated. This can hand, it increases the total number of ﬁlters available to blockbe very useful in practice: An operator may decide to use an source IPs and thus can potentially reduce the overall collateralapproximation of the optimal ﬁltering policy to immediately damage. On the other hand, deploying more routers increasescope with incoming bad trafﬁc, and then successively reﬁne the the communication overhead required to coordinate them.allocation of ﬁlters to further reduce the collateral damage if In Fig. 10, we study the tradeoff between reduced collateralthe attack persists. damage and increased communication overhead as the number DISTRIBUTED-FLOODING: We simulated the scenario of routers increases. For simplicity, we assume that each routerwhere an ISP utilizes multiple routers to collaboratively block has the same number of ﬁlters, i.e., . Asmalicious trafﬁc. We consider the same scenario (III) as for the we can see in Fig. 10, increasing the number of routers provides
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 393 on which ﬁlters to pick. Therefore, they are complementary but orthogonal to this work. We rely on an intrusion detection system or on historical data to distinguish good from bad trafﬁc and to provide us with a blacklist. Detection of malicious trafﬁc is an important problem, but out of the scope of this paper. The sources of legitimate trafﬁc are also assumed known and used for assessing the collat- eral damage—e.g., Web servers or ISPs typically keep historical data and know their important customers. We also consider ad- dresses in the blacklist to not be spoofed. A practical deployment scenario is that of a single network under the same administrative authority, such as an ISP or a campus network. The operator can use our algorithms to install ﬁlters at a single edge router or at several routers in order toFig. 9. Evaluation of DIST-FLOODING in Scenario III. Results are shown optimize the use of its resources and to defend against an attackfor the distributed algorithm and two different values for the step size (0.01 in a cost-efﬁcient way. Our distributed algorithm may also beand 0.05) of the subgradient method. The dashed line shows the case of “NoCoordination,” i.e., when each router acts independently. useful, not only for a routers within the same ISP, but also, in the future, when different ISPs start cooperating against common enemies . The problems studied in this paper are also related to ﬁrewall conﬁguration to protect public-access networks. Unlike routers where TCAM puts a hard limit on the number of ACLs, there is no hard limit on the number of ﬁrewall rules in software. How- ever, there is still an incentive to minimize their number and thus any associated performance penalty . There is a body of work on ﬁrewall rule management and (mis)conﬁguration , which aims at detecting anomalies, while we focus on resource allocation. http://ieeexploreprojects.blogspot.com Several measurement studies have Measurement Studies:Fig. 10. Distributed ﬂooding: CD=N and communication overhead versus demonstrated that malicious sources exhibit spatial and tem-jRj. On the x-axis is the number of routers jRj. On the left y -axis is thenormalized collateral damage CD=N . On the right y -axis is the overall poral clustering –, . In order to deal with dynamiccommunication overhead (MB/iteration) required to coordinate routers. As malicious IP addresses , IP preﬁxes rather than individual IPCD=N decreases faster than the communication overhead increases, there is a addresses are typically considered. The clustering, in combina-sweet spot (three routers, for this input blacklist) where we beneﬁt from largereduction of CD=N while incurring a modest increase in the communication tion with the fact that the distribution of addresses as well asoverhead required. other statistical characteristics differ for good and bad trafﬁc, have been exploited in the past for detection and mitigation of malicious trafﬁc, such as, e.g., spam ,  or DDOS .a signiﬁcant beneﬁt in terms of reducing the collateral damage, In this paper, we exploit these characteristics for efﬁcientwhile the communication overhead increases only linearly preﬁx-based ﬁltering of malicious trafﬁc.with the overall number of ﬁlters available, as discussed in Preﬁx Selection: The work in  studied source preﬁx ﬁl-Section III-F. Depending on the bandwidth available and on the tering for classiﬁcation and blocking of DDOS trafﬁc, whichtolerable level of collateral damage, a network administrator is closely related to our FLOODING problem. The selection ofcan decide how many routers should be deployed to ﬁlter the preﬁxes in  was done heuristically, thus leading to large col-attack sources. Our work provides the framework to do so in a lateral damage. In contrast, we tackle analytically the optimalprincipled way. source preﬁx selection so as to minimize collateral damage. Fur- thermore, we provide a more general framework for formulating V. RELATED WORK and optimally solving a family of related problems, including Bigger Picture: Dealing with malicious trafﬁc is a hard but not limited to FLOODING.problem that requires the cooperation of several components, The work in  is related to our TIME-VARYINGincluding detection and mitigation techniques, as well as problem: It designed and analyzed an online learning algorithmarchitectural aspects. In this paper, we optimize the use of for tracking malicious IP preﬁxes based on a stream of labeledﬁltering—a mechanism that already exists on the Internet today data. The goal was detection, i.e., classifying a preﬁx as mali-and is a necessary building block of any bigger solution. More cious, depending on the ratio of malicious to legitimate trafﬁc itspeciﬁcally, we focus on the optimal selection of which preﬁxes generates and subject to a constraint on the number of preﬁxes.to block. The ﬁltering rules can be then propagated by ﬁltering In contrast: 1) we identify precisely (not approximately) theprotocols – and ideally installed on routers as close to IP preﬁxes with the highest concentration of malicious trafﬁc;the attack sources as possible. We note, however, that such pro- 2) we follow a different formulation (dynamic programmingtocols typically assume the ability to ﬁlter trafﬁc at arbitrarily inspired by knapsack problems); 3) we use the results of detec-ﬁne granularity and focus on where to place the ﬁlters rather tion as input to our ﬁltering problem.
394 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012 An earlier body of literature focused on identifying IP pre- algorithms over Dshield.org logs and demonstrate thatﬁxes with signiﬁcant amount of network trafﬁc, typically re- they bring signiﬁcant beneﬁt compared to nonoptimized ﬁlterferred to as hierarchical heavy hitters: –. However, it did selection or to generic clustering algorithms. A key insightnot consider the interaction between legitimate the malicious behind that beneﬁt is that our algorithms exploit the spatial andtrafﬁc within the same preﬁx, which is the core tradeoff studied temporal clustering exhibited by sources of malicious trafﬁc.in this paper. Relation to Knapsack Problems: Filter selection belongs to ACKNOWLEDGMENTthe family of multidimensional knapsack problems (dKP) . The authors are grateful to P. Barford and M. Blodgett withThe general dKP problem is well known to be NP-hard. The the University of Wisconsin–Madison for making the 6-monthmost relevant variation is the knapsack with cardinality con- Dshield.org logs available to them. They would also like tostraint (1.5KP) , , which has constraints, one of thank M. Faloutsos for insightful discussions.them being a limit on the number of items: , . The 1.5KP problem is also NP-hard. These classic problems do not consider correlation between REFERENCESitems. However, in ﬁltering, the selection of an item (preﬁx)  “Understanding ACL on Catalyst 6500 series switches,” Cisco Systems, San Jose, CA, 2003 [Online]. Avail-voids the possibility to select other items (all overlapping able: http://www.cisco.com/en/US/products/hw/switches/preﬁxes). dKP problems with correlation between items have ps708/products_white_paper09186a00800c9470.shtmlbeen studied in  and , where the items were partitioned  “Protecting your core: Infrastructure protection accessinto classes and up to one item per class was picked. In our control lists,” Cisco Systems, San Jose, CA, 2008 [On- line]. Available: http://www.cisco.com/en/US/tech/tk648/case, a class is the set of all preﬁxes covering a certain address. tk361/technologies_white_paper09186a00801a1a55.shtmlEach item (preﬁx) can belong simultaneously to any number of  M. Collins, T. Shimeall, S. Faber, J. Janies, R. Weaver, M. De Shon, andclasses, from one class (/32 address) to all classes (/0 preﬁx). J. Kadane, “Using uncleanliness to predict future botnet addresses,” in Proc. ACM Internet Meas. Conf., San Diego, CA, Oct. 2007, pp.To the best of our knowledge, we are the ﬁrst to tackle a case 93–104.where the classes are not a partition of the set of items.  Z. Chen, C. Ji, and P. Barford, “Spatial-temporal characteristics of A continuous relaxation does not help either. Allowing internet malicious sources,” in Proc. IEEE INFOCOM Mini-Conf., Phoenix, AZ, May 2008, pp. 2306–2314.to be fractional corresponds to rate-limiting of preﬁx . This  Z. Mao, V. Sekar, O. Spatscheck, J. Van Der Merwe, and R. Vasudevan,has no advantage neither from a practical (rate limiters are more “Analyzing large DDoS attacks using multiple data sources,” in Proc.expensive than ACLs because, in addition to looking up packets ACM SIGCOMM Workshop Large-Scale Attack Defense, Pisa, Italy, http://ieeexploreprojects.blogspot.com N. Feamster, “Understanding the network-levelin TCAM, they also require rate and computation on the fast Sep. 2006, pp. 161–168.  A. Ramachandran andpath) nor from a theoretical point of view (the continuous 1.5KP behavior of spammers,” in Proc. ACM SIGCOMM, Pisa, Italy, Sep.is still NP-hard .) 2006, pp. 291–302. In summary, the special structure of the preﬁx ﬁltering  S. Venkataraman, S. Sen, O. Spatscheck, P. Haffner, and D. Song, “Ex- ploiting network structure for proactive spam mitigation,” presented atproblem, i.e., the hierarchy and overlap of candidate preﬁxes, the USENIX Security Symp., Boston, MA, Aug. 2007.leads to novel variations of dKP that could not be solved by  Y. Xie, F. Yu, K. Achan, E. Gillum, M. Goldszmidt, and T. Wobber,directly applying existing methods in the KP literature. “How dynamic are IP addresses?,” in Proc. ACM SIGCOMM, Kyoto, Japan, Aug. 2007, pp. 301–312. Our Prior Work: This paper builds on our conference  J. Zhang, P. Porras, and J. Ullrich, “Highly predictive blacklisting,”paper in . New contributions in this paper include: the presented at the USENIX Security Symp., San Jose, CA, Jul. 2008.formulation and optimal solution of the time-varying version  “Dshield: Cooperative network security community—Internet secu- rity,” Dshield.org [Online]. Available: http://www.dshield.orgof the ﬁltering problem; an extended evaluation section that  H. Kellerer, U. Pferschy, and D. Pisinger, Knapsack Problems. Newsimulates all ﬁltering problems over Dshield.org logs, York: Springer-Verlag, 2004.including FLOODING and DIST-FLOODING, which were not  G. Varghese, Network Algorithmics: An Interdisciplinary Approach to Designing Fast Networked Devices. San Mateo, CA: Morgan Kauf-evaluated in ; and additional proofs, complexity analysis, mann, 2005.and simulation results.  S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, In , we also studied optimal range-based ﬁltering, where U.K.: Cambridge Univ. Press, 2004.  T. Kanungo, D. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A.malicious source addresses were aggregated into continuous Wu, “An efﬁcient k-means clustering algorithm: Analysis and imple-ranges (of numbers in the IP address space ), instead mentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp.of preﬁxes, and we developed greedy solutions. Unfortunately, 881–892, Jul. 2002.ranges are not implementable in ACLs. Furthermore, it is well  G. Pack, J. Yoon, E. Collins, and C. Estan, “On ﬁltering of DDoS at- tacks based on source address preﬁxes,” presented at the SecureComm,known that ranges cannot be efﬁciently approximated by a Istanbul, Turkey, Sep. 2006.combination of preﬁxes .  E. Kohler, J. Li, V. Paxson, and S. Shenker, “Observed structure of addresses in IP trafﬁc,” in Proc. ACM SIGCOMM Workshop Internet VI. CONCLUSION Meas., Marseille, France, Nov. 2002, pp. 253–266.  Z. Chen and C. Ji, “Measuring network-aware worm spreading ability,” In this paper, we introduce a framework for optimal source in Proc. IEEE INFOCOM, Anchorage, AK, May 2007, pp. 116–124.preﬁx-based ﬁltering. The framework is rooted at the theory  R. Xu and D. Wunsch, II, “Survey of clustering algorithms,” IEEE Trans. Neural Netw., vol. 16, no. 3, pp. 645–678, May 2005.of the knapsack problem and provides a novel extension to  R. Mahajan, S. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S.it. Within it, we formulate ﬁve practical problems, presented Schenker, “Controlling high bandwidth aggregates in the network,”in increasing order of complexity. For each problem, we de- Comput. Commun. Rev., vol. 32, pp. 62–73, Jul. 2002.  K. Argyraki and D. Cheriton, “Scalable network-layer defense againstsigned optimal algorithms that are also low-complexity (linear internet bandwidth-ﬂooding attacks,” IEEE/ACM Trans. Netw., vol. 17,or pseudo-polynomial in the input size). We simulate our no. 4, pp. 1284–1297, Aug. 2009.
SOLDO et al.: OPTIMAL SOURCE-BASED FILTERING OF MALICIOUS TRAFFIC 395  X. Liu, X. Yang, and Y. Lu, “To ﬁlter or to authorize: Network-layer Fabio Soldo (S’10) received the B.S. degree in DoS defense against multimillion-node botnets,” in Proc. ACM SIG- mathematics from Politecnico di Torino, Turin, Italy, COMM, Seattle, WA, Aug. 2008, pp. 195–206. in 2004, and the M.S. degree in mathematical engi-  “High performance packet classiﬁcation,” HiPAC.org [Online]. Avail- neering from Politecnico di Torino and Politecnico able: http://www.hipac.org/performance_tests/results.html di Milano, Milan, Italy, in 2006, and is currently  E. Al-Shaer and H. Hamed, “Firewall policy advisor,” DePaul Uni- pursuing the Ph.D. degree networked systems at the versity, Chicago, IL, 2005 [Online]. Available: http://www.mnlab. University of California, Irvine. cs.depaul.edu/projects/SPA He has had internships with Telefonica Research,  S. Venkataraman, A. Blum, D. Song, S. Sen, and O. Spatscheck, Barcelona, Spain; Docomo Euro-Labs, Munich, Ger- “Tracking dynamic sources of malicious activity at internet-scale,” many; and Google, Mountain View, CA. His research presented at the NIPS Whistler, BC, Canada, Dec. 2009. interests are in the areas of design and analysis of net-  Y. Zhang, S. Singh, S. Sen, N. Dufﬁeld, and C. Lund, “Online identi- work algorithms and network protocols and defense mechanisms against mali- ﬁcation of hierarchical heavy hitters: Algorithms, evaluation, and ap- cious trafﬁc. plications,” in Proc. ACM SIGCOMM Internet Meas. Conf., Taormina, Italy, Oct. 2004, pp. 101–114.  C. Estan, S. Savage, and G. Varghese, “Automatically inferring patterns of resource consumption in network trafﬁc,” in Proc. ACM SIGCOMM, Katerina Argyraki received the undergraduate de- Karlsruhe, Germany, Aug. 2003, pp. 137–148. gree in electrical and computer engineering from the  G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, “Diamond Aristotle University, Thessaloniki, Greece, in 1999, in the rough: Finding hierarchical heavy hitters in multi-dimensional and the Ph.D. degree in electrical engineering from data,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, Paris, France, Stanford University, Stanford, CA, in 2007. Jun. 2004, pp. 155–166. She is a Researcher with the Operating Systems  P. Gilmore and R. Gomory, “A linear programming approach to the Group, School of Computer and Communication Sci- cutting stock problem—Part II,” Oper. Res., vol. 11, no. 6, pp. 863–888, ences, École Polytechnique Fédédrale de Lausanne 1963. (EPFL), Vaud, Switzerland. She works on network  A. Caprara, H. Kellerer, U. Pferschy, and D. Pisinger, “Approximation architectures and protocols with a focus on denial-of- algorithms for knapsack problems with cardinality constraints,” Eur. J. service defenses and accountability. Oper. Res., vol. 123, no. 2, pp. 333–345, 2000.  V. Li and G. Curry, “Solving multidimensional knapsack problems with generalized upper bound constraints using critical event tabu search,” Comput. Oper. Res., vol. 32, no. 4, pp. 825–848, 2005.  A. Bagchi, N. Bhattacharyya, and N. Chakravarti, “LP relaxation of the Athina Markopoulou (S’98–M’02) received the two dimensional knapsack problem with box and GUB constraints,” Diploma degree in electrical and computer engi- Eur. J. Oper. Res., vol. 89, no. 3, pp. 609–617, 1996. neering from the National Technical University  I. de Farias, Jr and G. Nemhauser, “A polyhedral study of the cardi- of Athens, Athens, Greece, in 1996, and the M.S. nality constrained knapsack problem,” Math. Program., vol. 96, no. 3, and Ph.D. degrees in electrical engineering from pp. 439–467, 2003. Stanford University, Stanford, CA, in 1998 and http://ieeexploreprojects.blogspot.com respectively.  F. Soldo, A. Markopoulou, and K. Argyraki, “Optimal ﬁltering of 2003, source address preﬁxes: Models and algorithms,” in Proc. IEEE She is an Assistant Professor with the Electrical INFOCOM, Rio de Janeiro, Brazil, Apr. 2009, pp. 2446–2454. Engineering and Computer Science Department,  F. Soldo, K. El Defrawy, A. Markopoulou, B. Krishnamurthy, and J. University of California, Irvine. She has been a Van Der Merwe, “Filtering sources of unwanted trafﬁc,” presented at Postdoctoral Fellow with Sprint Labs, Burlingame, the Inf. Theory Appl. Workshop, San Diego, CA, Jan. 2008. CA, in 2003 and with Stanford University from 2004 to 2005. She was a Member of the Technical Staff with Arastra, Inc., Palo Alto, CA, in 2005. Her research interests include network coding, network measurements and security, and online social networks. Dr. Markopoulou received the NSF CAREER Award in 2008.