Optimal Source-Based Filtering of Malicious Traffic



IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012, p. 381

Optimal Source-Based Filtering of Malicious Traffic

Fabio Soldo, Student Member, IEEE, Katerina Argyraki, and Athina Markopoulou, Member, IEEE

Abstract—In this paper, we consider the problem of blocking malicious traffic on the Internet via source-based filtering. In particular, we consider filtering via access control lists (ACLs): These are already available at the routers today, but are a scarce resource because they are stored in the expensive ternary content addressable memory (TCAM). Aggregation (by filtering source prefixes instead of individual IP addresses) helps reduce the number of filters, but comes also at the cost of blocking legitimate traffic originating from the filtered prefixes. We show how to optimally choose which source prefixes to filter for a variety of realistic attack scenarios and operators' policies. In each scenario, we design optimal, yet computationally efficient, algorithms. Using logs from Dshield.org, we evaluate the algorithms and demonstrate that they bring significant benefit in practice.

Index Terms—Clustering algorithms, filtering, Internet, network security.

I. INTRODUCTION

HOW CAN we protect our network infrastructure from malicious traffic, such as scanning, malicious code propagation, spam, and distributed denial-of-service (DDoS) attacks? These activities cause problems on a regular basis, ranging from simple annoyance to severe financial, operational, and political damage to companies, organizations, and critical infrastructure. In recent years, they have increased in volume, sophistication, and automation, largely enabled by botnets, which are used as the platform for launching these attacks.

Protecting a victim (host or network) from malicious traffic is a hard problem that requires the coordination of several complementary components, including nontechnical (e.g., business and legal) and technical solutions (at the application and/or network level). Filtering support from the network is a fundamental building block in this effort. For example, an Internet service provider (ISP) may use filtering in response to an ongoing DDoS attack to block the DDoS traffic before it reaches its clients. Another ISP may want to proactively identify and block traffic carrying malicious code before it reaches and compromises vulnerable hosts in the first place. In either case, filtering is a necessary operation that must be performed within the network.

Filtering capabilities are already available at routers today via access control lists (ACLs). ACLs enable a router to match a packet header against predefined rules and take predefined actions on the matching packets [1], and they are currently used for enforcing a variety of policies, including infrastructure protection [2]. For the purpose of blocking malicious traffic, a filter is a simple ACL rule that denies access to a source IP address or prefix. To keep up with the high forwarding rates of modern routers, filtering is implemented in hardware: ACLs are typically stored in ternary content addressable memory (TCAM), which allows for parallel access and reduces the number of lookups per forwarded packet. However, TCAM is more expensive and consumes more space and power than conventional memory. The size and cost of TCAM put a limit on the number of filters, and this is not expected to change in the near future.¹ With thousands or tens of thousands of filters per path, an ISP alone cannot hope to block the currently witnessed attacks, not to mention attacks from multimillion-node botnets expected in the near future.

Consider the example shown in Fig. 1(a): An attacker commands a large number of compromised hosts to send traffic to a victim V (say a Web server), thus exhausting the resources of V and preventing it from serving its legitimate clients. The ISP of V tries to protect its client by blocking the attack at the gateway router G. Ideally, G should install one separate filter to block traffic from each attack source. However, there are typically fewer filters than attack sources, hence aggregation is used, i.e., a single filter (ACL) is used to block an entire source address prefix. This has the desired effect of reducing the number of filters necessary to block all attack traffic, but also the undesired effect of blocking legitimate traffic originating from the blocked prefixes (we will call the damage that results from blocking legitimate traffic "collateral damage"). Therefore, filter selection can be viewed as an optimization problem that tries to block as many attack sources with as little collateral damage as possible, given a limited number of filters. Furthermore, several measurement studies have demonstrated that malicious sources exhibit temporal and spatial clustering [3]–[9], a feature that can be exploited by prefix-based filtering.

Manuscript received June 03, 2010; revised April 14, 2011; accepted June 14, 2011; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor T. Wolf. Date of publication July 25, 2011; date of current version April 12, 2012. This work was supported by the NSF CyberTrust Award 0831530 and the Swiss National Science Foundation under an Ambizione Grant. F. Soldo and A. Markopoulou are with the Department of Electrical Engineering and Computer Science, University of California, Irvine, Irvine, CA 92697 USA (e-mail: fsoldo@uci.edu; athina@uci.edu). K. Argyraki is with the School of Computer Science and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Vaud 1025, Switzerland (e-mail: katerina.argyraki@epfl.ch). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNET.2011.2161615. 1063-6692/$26.00 © 2011 IEEE

¹A router linecard or supervisor-engine card typically supports a single TCAM chip with tens of thousands of entries. For example, the Cisco Catalyst 4500, a midrange switch, provides a 64 000-entry TCAM to be shared among all its interfaces (48–384). The Cisco 12000, a high-end router used at the Internet core, provides 20 000 entries that operate at line speed per linecard (up to 4-Gb Ethernet interfaces). The Catalyst 6500 switch can fit 16 K–32 K patterns and 2 K–4 K masks in the TCAM. Depending on how an ISP connects to its clients, each individual client can typically use only part of these ACLs, i.e., a few hundred to a few thousand filters.

In this paper, we formulate a general framework for studying source prefix filtering as a resource allocation problem. To the best of our knowledge, optimal filter selection has not been explored so far, as most related work on filtering has focused on protocol and architectural aspects. Within this framework, we formulate and solve five practical source-address filtering problems, depending on the attack scenario and the operator's policy and constraints.
Our contributions are twofold. On the theoretical side, filter selection optimization leads to novel variations of the multidimensional knapsack problem. We exploit the special structure of each problem and design optimal and computationally efficient algorithms. On the practical side, we provide a set of cost-efficient algorithms that can be used both by operators to block undesired traffic and by router manufacturers to optimize the use of TCAM and eventually the cost of routers. We use logs from Dshield.org to demonstrate that optimally selecting which source prefixes to filter brings significant benefits compared to nonoptimized filtering or to generic clustering algorithms.

The outline of the rest of the paper is as follows. In Section II, we formulate the general framework for optimal source prefix filtering. In Section III, we study five specific problems that correspond to different attack scenarios and operator policies: blocking all addresses in a blacklist (BLOCK-ALL), blocking some addresses in a blacklist (BLOCK-SOME), blocking all/some addresses in a time-varying blacklist (TIME-VARYING BLOCK-ALL/SOME), blocking flows during a DDoS flooding attack to meet bandwidth constraints (FLOODING), and distributed filtering across several routers during flooding (DIST-FLOODING). For each problem, we design an optimal, yet computationally efficient, algorithm to solve it. In Section IV, we use data from Dshield.org [10] to evaluate the performance of our algorithms in realistic attack scenarios, and we demonstrate that they bring significant benefit in practice. Section V discusses related work and puts our work in perspective. Section VI concludes the paper.

Fig. 1. Example of a distributed attack. Let us assume that the gateway router G has only two filters, F1 and F2, available to block malicious traffic and protect the victim V. It uses F1 to block a single malicious source address (A) and F2 to block an entire source prefix, which contains three malicious sources but also one legitimate source (B). Therefore, the selection of filter F2 trades off collateral damage (blocking B) for reduction in the number of filters (from three to one). We note that both filters, F1 and F2, are ACLs installed at the same router G. (a) Actual network. (b) Hierarchy of source IP addresses and prefixes.

TABLE I: SUMMARY OF NOTATION AND TERMINOLOGY

II. PROBLEM FORMULATION AND FRAMEWORK

A. Terminology and Notation

Table I summarizes our terminology and notation.

Source IP Addresses and Prefixes: Every IPv4 address is a 32-b sequence. We use standard IP/mask notation, i.e., we write a/l to indicate a prefix of length l bits, where a and l can take values a = 0, 1, ..., 2^32 − 1 and l = 0, 1, ..., 32, respectively. For brevity, when the meaning is obvious from the context, we simply write p to indicate prefix a/l. We write a ∈ p to indicate that address a is within the 2^(32−l) addresses covered by prefix p.

Blacklists and Whitelists: A blacklist BL is a set of unique source IP addresses that send bad (undesired) traffic to the victim. Similarly, a whitelist WL is a set of unique source IP addresses that send good (legitimate) traffic to the victim. An address may belong either to a blacklist (in which case we call it a "bad" address) or to a whitelist (in which case we call it a "good" address), but not to both. We use |BL| and |WL| to indicate the number of addresses in BL and WL, respectively. For brevity, we also use N = |BL| for the number of addresses in the blacklist, which is the size of the most important input to our problem.

Each address a in a blacklist or a whitelist is assigned a weight w_a, indicating its importance. If a is a bad address, we assign it a negative weight w_a < 0, which indicates the benefit from blocking a; if a is a good address, we assign it a positive weight w_a > 0, which indicates the damage from blocking a. The higher the absolute value of the weight, the higher the benefit or damage, and thus the preference to block the address or not. The weight w_a can have a different interpretation depending on the filtering problem. For instance, it can represent the amount of bad/good traffic originating from the corresponding source address, or it can express policy: Depending on the amount of money gained/lost by the ISP when blocking source address a, an ISP operator can assign large positive weights to its important customers that should never be blocked, or large negative weights to the worst attack sources that must definitely be blocked.

Creating blacklists and whitelists (i.e., identifying bad and good addresses and assigning appropriate weights to them) is a difficult problem in its own right, but orthogonal to this work. We assume that the blacklist is provided by another module (e.g., an intrusion detection system or historical data) as input to our problem. The sources of legitimate traffic are also assumed known, e.g., Web servers or ISPs typically keep historical data and know their customers. If the whitelist is not explicitly given, we take a conservative approach and define the whitelist WL to include all addresses that are not in BL.

Filters: We focus on filtering of source address prefixes. In our context, a filter is an ACL rule that specifies that all packets with a source IP address in prefix p should be blocked. F_max is the maximum number of available filters, and it is given as input to our problem. Filter optimization is meaningful only when F_max is much smaller than the size of the blacklist N; otherwise, the optimal solution would be to block every single bad address. F_max ≪ N is indeed the case in practice due to the size and cost of the TCAM, as mentioned in Section I.

The decision variable x_p is 1 if a filter is assigned to block prefix p, or 0 otherwise. A filter p = a/l blocks all 2^(32−l) addresses in that range. Hence, the total weight of the bad addresses in p expresses the benefit from filter p, whereas the total weight of the good addresses in p expresses the collateral damage it causes. An effective filter should have a large benefit and low collateral damage.

Filtering Benefit and Collateral Damage: We define the filtering benefit of a solution as the sum of the absolute weights of the bad addresses whose traffic is blocked. We define the collateral damage of a filtering solution as the sum of the weights of the good addresses whose traffic is blocked.

B. Rationale and Overview of Filtering Problems

Given a set of bad and a set of good source addresses (BL and WL), a measure of their importance (the address weights w_a), and a resource budget (F_max plus, possibly, other resources, depending on the particular problem), the goal is to select which source prefixes to filter so as to minimize the impact of bad traffic while staying within the given resource budget. Different variations of the problem can be formulated, depending on the attack scenario and the victim network's policies and constraints: The network operator may want to block all bad addresses or tolerate leaving some unblocked; the attack may be of low rate or a flooding attack; filters may be installed at one or several routers. At the core of each filtering problem lies the following optimization:

min Σ_{a ∈ WL} w_a Σ_{p: a ∈ p} x_p + Σ_{a ∈ BL} |w_a| (1 − Σ_{p: a ∈ p} x_p)    (1)

s.t. Σ_p x_p ≤ F_max    (2)

Σ_{p: a ∈ p} x_p ≤ 1, for every a ∈ BL    (3)

x_p ∈ {0, 1}, for every prefix p    (4)

Eq. (1) expresses the objective to minimize the total cost of bad traffic, which consists of two parts: the collateral damage (the terms with w_a > 0) and the cost of leaving bad traffic unblocked (the terms with w_a < 0). We use Σ_p to denote summation over all possible prefixes p, and Σ_{p: a ∈ p} to denote summation over all prefixes that cover address a. Eq. (2) expresses the constraint on the number of filters. Eq. (3) states that overlapping filters are mutually exclusive, i.e., each bad address can be blocked at most once, otherwise filtering resources are wasted. Eq. (4) lists the decision variables x_p, corresponding to all possible prefixes; this constraint applies throughout and will be omitted from now on for brevity.

Eqs. (1)–(4) provide the general framework for filter-selection optimization. Different filtering problems can be written as special cases, possibly with additional constraints. As we discuss in Section V, these are all multidimensional knapsack problems [11], which are, in general, NP-hard. The specifics of each problem dramatically affect the complexity, which can vary from linear to NP-hard.

In this paper, we formulate five practical filtering problems and develop optimal, yet computationally efficient, algorithms to solve them. Here, we summarize the rationale behind each problem and outline our main results; the exact formulation and detailed solution is provided in Section III.

BLOCK-ALL: Suppose a network operator has a blacklist BL of size N, a whitelist WL, and a weight assigned to each address that indicates the amount of traffic originating from that address. The total number of available filters is F_max. The first practical goal the operator may have is to install a set of filters that block all bad traffic so as to minimize the amount of good traffic that is blocked. We design an optimal algorithm that solves this problem at the lowest achievable complexity (linearly increasing with N).

BLOCK-SOME: A blacklist and a whitelist are given as before, but the operator is now willing to block only some, instead of all, bad traffic, so as to decrease the amount of good traffic blocked at the expense of leaving some bad traffic unblocked. The goal now is to block only those prefixes that have the highest impact and do not contain sources that generate a lot of good traffic, so as to minimize the total cost in (1). We design an optimal, lowest-complexity (linearly increasing with N) algorithm for this problem, as well.

TIME-VARYING BLOCK-ALL/SOME: Bad addresses may change over time [4]: New sources may send malicious traffic and, conversely, previously active sources may disappear (e.g., when their vulnerabilities are patched). One way to solve the dynamic versions of BLOCK-ALL(SOME) is to run the algorithms we propose for the static versions on the blacklist/whitelist pair at each time slot. However, given that subsequent blacklists typically exhibit significant overlap [4], it may be more efficient to exploit this temporal correlation and incrementally update the filtering rules. We show that it is possible to efficiently update the optimal solution as new IPs are inserted in or removed from the blacklist.

FLOODING: In a flooding attack, such as the one shown in Fig. 1, a large number of compromised hosts send traffic to the victim and exhaust the victim's access bandwidth. In this case, our framework can be used to select the filtering rules that minimize the amount of good traffic that is blocked while meeting the access bandwidth constraint: in particular, the total bandwidth consumed by the unblocked traffic should not exceed the bandwidth of the flooded link, e.g., link G–V in Fig. 1. We prove that this problem is NP-hard, and we design a pseudo-polynomial algorithm that solves it optimally, with complexity that grows linearly with the blacklist and whitelist size.

DIST-FLOODING: All the above problems aim at installing filters at a single router. However, a network operator may use the filtering resources collaboratively across several routers to better defend against an attack. Distributed filtering may also be enabled by the cooperation across several ISPs against a common enemy. The question in both cases is not only which prefixes to block, but also at which router to install the filters. We study the practical problem of distributed filtering against a flooding attack. We prove that the problem can be decomposed into several FLOODING problems, which can be solved in a distributed way.
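To make the cost structure in (1) concrete, here is a small, self-contained sketch. It is ours, not the paper's: the helper names `covers` and `total_cost`, and the `(base, length)` prefix encoding, are assumptions. Given per-address weights (negative for bad, positive for good, following the paper's convention), it evaluates the total cost of a candidate filter set: the weight of good addresses that get blocked (collateral damage) plus the absolute weight of bad addresses left unblocked.

```python
def covers(prefix, addr):
    """True if the 32-bit address `addr` lies inside `prefix` = (base, length)."""
    base, length = prefix
    return (addr >> (32 - length)) == (base >> (32 - length))

def total_cost(filters, weights):
    """Objective (1): collateral damage plus the cost of unblocked bad traffic.

    `filters` is a list of (base, length) prefixes; `weights` maps each
    address to w_a (w_a < 0 for bad addresses, w_a > 0 for good ones).
    """
    cost = 0
    for addr, w in weights.items():
        blocked = any(covers(p, addr) for p in filters)
        if blocked and w > 0:
            cost += w        # good address blocked: collateral damage
        elif not blocked and w < 0:
            cost += -w       # bad address left unblocked: |w_a| cost
    return cost

# Example: bad 10.0.0.1 and 10.0.0.2 (weights -5, -3), good 10.0.0.3 (+2).
# Blocking 10.0.0.0/30 blocks all three, so the only cost is the
# collateral damage of the good address.
weights = {0x0A000001: -5, 0x0A000002: -3, 0x0A000003: 2}
print(total_cost([(0x0A000000, 30)], weights))  # collateral damage 2
```

Constraint (3), the mutual exclusivity of overlapping filters, is not enforced here; this sketch only scores a candidate solution, which is the inner loop any search over prefix sets would need.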
III. FILTERING PROBLEMS AND ALGORITHMS

In this section, we provide the detailed formulation of each problem and present the algorithm that solves it. We start by defining the data structure that we use to represent the problem and to develop our algorithms.

A. Data Structure for Representing Filtering Solutions

Definition 1 (LCP Tree): Given a set of addresses A, we define the Longest Common Prefix tree of A, denoted by LCP-tree(A), as the binary tree with the following properties: 1) each leaf represents a different address in A, and there is a leaf for each address in A; 2) each intermediate (nonleaf) node represents the longest common prefix between the prefixes represented by its two children.

Fig. 2. Example of LCP-tree(BL). Consider a blacklist consisting of the following nine bad addresses: BL = {10.0.0.1, 10.0.0.3, 10.0.0.4, 10.0.0.5, 10.0.0.7, 10.0.0.8, 10.0.0.10, 10.0.0.11, 10.0.0.12}. All remaining addresses are considered good. Each leaf represents one address in the BL. Each intermediate node represents the longest common prefix p covering all bad addresses in that subtree. At each intermediate node p, we also show the collateral damage (i.e., the number of good addresses blocked) when we filter prefix p instead of filtering each of its children. For instance, if we use two filters to block two sibling bad addresses, the collateral damage is 0; if, instead, we use one filter to block their common prefix, we also block a good address, i.e., we cause collateral damage 1.

The LCP-tree(A) can be constructed from the complete binary tree (with root 0/0, leaves at level 32 corresponding to all addresses a ∈ A, and intermediate nodes at level l corresponding to all prefixes of length l) by removing the branches that do not have addresses in A, and then by removing nodes with a single child. It is a variation of the binary (or unibit) trie [12], but does not have nodes with a single child. The LCP-tree offers an intuitive way to represent sets of prefixes that can block the addresses in set A: Each node in the LCP tree represents a prefix that can be blocked, hence we can represent a filtering solution as a pruned version of the LCP tree, whose leaves are all and only the blocked prefixes.

Example 1: For instance, consider the LCP tree depicted in Fig. 2, whose leaves correspond to bad addresses that we want to block. One (expensive) solution is to use one filter to block each bad address; thus the LCP tree is not pruned, and its leaves correspond to the filters. Another feasible solution is to use three filters and block traffic from prefixes 0/1, 8/2, and 12/4; this can be represented by the pruned version of the LCP tree that includes the aforementioned prefixes as leaves. Yet another (rather radical) solution is to filter a single prefix (0/0) to block all traffic; this can be represented by the pruned version of the LCP tree that includes only its root.

Complexity: Given a list of addresses A, we can build the LCP-tree(A) by performing |A| insertions in a Patricia trie [12]. To insert a string of w bits, we need at most w comparisons. Thus, the worst-case complexity is O(w|A|), where w = 32 (bits) is the length of an IPv4 address.

B. BLOCK-ALL

Problem Statement: Given: a blacklist BL, a whitelist WL, and the number of available filters F_max; select filters that block all bad traffic and minimize collateral damage.

Formulation: We formulate this problem by making two adjustments to the general framework of (1)–(4). First, (1) becomes (5), which expresses the goal to minimize the collateral damage. Second, (3) becomes (7), which enforces the constraint that every bad address should be blocked by exactly one filter, as opposed to at most one filter in (3):

min Σ_{a ∈ WL} w_a Σ_{p: a ∈ p} x_p    (5)

s.t. Σ_p x_p ≤ F_max    (6)

Σ_{p: a ∈ p} x_p = 1, for every a ∈ BL    (7)

Characterizing an Optimal Solution: Our algorithm starts from LCP-tree(BL) and outputs a pruned version of that LCP tree. Hence, we start by proving that an optimal solution to BLOCK-ALL can indeed be represented as a pruned version of that LCP tree.

Proposition 3.1: An optimal solution to BLOCK-ALL can be represented as a pruned subtree of LCP-tree(BL) with the same root as LCP-tree(BL), up to F_max leaves, and each nonleaf node having exactly two children.

Proof: We prove that, for each feasible solution S to BLOCK-ALL, there exists another feasible solution S′ that: 1) can be represented as a pruned subtree of LCP-tree(BL) as described in the proposition; and 2) whose collateral damage is smaller than or equal to S's. This is sufficient to prove the proposition, since an optimal solution is also a feasible one.

Any filtering solution S can be represented as a pruned subtree of the full binary tree of all IP addresses with the same root and leaves corresponding to the filtered prefixes. S is a feasible solution to BLOCK-ALL, therefore S uses up to F_max filters, i.e., its tree has up to F_max leaves. Indeed, if this were not the case, (6) would be violated and S would not be a feasible solution.

Let us assume that the tree representing S includes a prefix p that is not in LCP-tree(BL). There are three possible cases.

1) p includes no bad addresses. In this case, we can simply remove p from S's tree (i.e., unblock p).
2) Only one of p's children includes bad addresses. In this case, we can replace p with the child node.

3) Both of p's children contain bad addresses. In this case, p is already the longest common prefix of all bad addresses in p, thus already on the LCP-tree(BL).

Clearly, each of these operations transforms feasible solution S, which is assumed not to be on LCP-tree(BL), into another feasible solution with smaller or equal collateral damage but on the LCP-tree(BL). We can repeat this process for all prefixes that are in S's tree but not in LCP-tree(BL), until we create a feasible solution S′ that includes only prefixes from LCP-tree(BL) and has smaller or equal collateral damage.

The only element missing to prove the proposition is to show that, in the pruned LCP subtree that represents S′, each nonleaf node has exactly two children. We show this by contradiction: Suppose there exists a nonleaf node in our pruned LCP subtree that has exactly one child. This can only result from pruning out one child of a node in the LCP tree. This means that all the bad addresses (leaves) in the subtree of this child node remain unfiltered, which violates (7); but this is a contradiction because S′ is a feasible solution.

Algorithm: Algorithm 1, which solves BLOCK-ALL, consists of two steps. First, we build the LCP tree from the input blacklist BL. Second, in a bottom-up fashion, we compute z_p(F), i.e., the minimum collateral damage needed to block all bad addresses in the subtree of prefix p using at most F filters. Following a dynamic programming (DP) approach, we can find the optimal allocation of filters in the subtree rooted at prefix p by finding a value F_l and assigning F_l filters to the left subtree and F − F_l filters to the right subtree, so as to minimize collateral damage. The fact that BLOCK-ALL needs to filter all bad addresses (leaves in the LCP tree) implies that at least one filter must be assigned to the left and right subtree, i.e., F_l ≥ 1 and F − F_l ≥ 1. In other words, for every pair of sibling nodes, p_l (left) and p_r (right), with common parent node p, the following recursive equation holds:

z_p(F) = min over 1 ≤ F_l ≤ F − 1 of [ z_{p_l}(F_l) + z_{p_r}(F − F_l) ]    (8)

with boundary conditions for leaf and intermediate nodes in LCP-tree(BL):

z_p(F) = 0, for every leaf p and F ≥ 1    (9)

z_p(1) = Σ_{a ∈ WL: a ∈ p} w_a, for every intermediate node p    (10)

Algorithm 1: Algorithm for Solving BLOCK-ALL

1: build LCP-tree(BL)
2: for all leaf nodes v do
3:    z_v(F) ← 0, for F = 1, ..., F_max
4:    S_v(F) ← {v}
5: end for
6: mark all leaf nodes as processed
7: while the root is not processed do
8:    for all nodes p whose children p_l, p_r are both processed do
9:       z_p(1) ← Σ_{a ∈ WL: a ∈ p} w_a
10:      S_p(1) ← {p}
11:      z_p(F) ← min over 1 ≤ F_l ≤ F − 1 of [ z_{p_l}(F_l) + z_{p_r}(F − F_l) ], for F = 2, ..., F_max
12:      S_p(F) ← S_{p_l}(F_l*) ∪ S_{p_r}(F − F_l*), where F_l* attains the minimum above
13:      mark p as processed
14:   end for
15: end while
16: return z_root(F_max), S_root(F_max)

Once we compute z_p(F) for all prefixes p in the LCP tree, we simply read the value of the optimal solution, z_root(F_max). We also use auxiliary variables S_p(F) to keep track of the set of prefixes used in the optimal solution. In lines 4 and 10 of Algorithm 1, S_p(F) is initialized to the single prefix used. In line 12, after computing the new cost, the corresponding set of prefixes is updated: S_p(F) = S_{p_l}(F_l*) ∪ S_{p_r}(F − F_l*).

Theorem 3.2: Algorithm 1 computes the optimal solution of problem BLOCK-ALL: the prefixes that are contained in set S_root(F_max) are optimal for (5)–(7).

Proof: Recall that z_p(F) denotes the value of the optimal solution of BLOCK-ALL with F filters (i.e., the minimum collateral damage), while S_p(F) denotes the set of filters selected in the optimal solution. Let p_l and p_r denote the two children nodes (prefixes) of p in the LCP-tree(BL). Finding the optimal allocation of F filters to block all addresses contained in p (possibly all IP space) is equivalent to finding the optimal allocation of F_l ≥ 1 filters to block all addresses in p_l and F − F_l ≥ 1 filters for the bad addresses in p_r, such that the total collateral damage is minimized. This is because prefixes p_l and p_r jointly contain all bad addresses in p. Moreover, each of p_l and p_r contains at least one bad address. Thus, at least one filter must be assigned to each of them. If F = 1, i.e., there is only one filter available, the only feasible solution is to select p as the prefix to filter out. The same argument recursively applies to the descendant nodes, until either we reach a leaf node or we have only one filter available. In these cases, the problem is trivially solved by the boundary conditions (9) and (10).

Complexity: The LCP-tree(BL) is a binary tree with N leaves. Therefore, it has N − 1 intermediate nodes (prefixes). Computing (8) for every node p and for every value F = 1, ..., F_max involves solving O(N F_max) subproblems, one for every pair (p, F), each with complexity O(F_max). Computing (8) requires only the optimal solution at the sibling nodes. Thus, proceeding from the leaves to the root, we can compute the optimal solution in O(N F_max²).

In practice, the complexity is even lower since we do not need to compute z_p(F) for all values F = 1, ..., F_max, but only for F = 1, ..., min(F_max, N_p), where N_p is the number of the leaves in prefix p in the LCP tree. Moreover, we only need to compute enough entries for every prefix p to cover all addresses in BL, which may require fewer entries for long prefixes in the LCP-tree.

Finally, we observe that the asymptotic complexity is O(N), since F_max ≪ N and F_max does not depend on N but only on the TCAM size. Thus, the time complexity increases linearly with the number of bad addresses N. This is within a constant factor of the lowest achievable complexity, since we need to read all bad addresses at least once. Although the above is a worst-case analysis, we confirmed in simulation that the computation time in practice is very close to that.
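To make the recursion (8)–(10) concrete, the sketch below is our illustration, not the authors' code; the names `Node`, `build_lcp_tree`, and `block_all` are assumptions. It builds the LCP tree of a blacklist and evaluates the BLOCK-ALL DP by memoized recursion, which visits the same (p, F) subproblems as the bottom-up pass of Algorithm 1.

```python
class Node:
    """LCP-tree node; `prefix` is (base, length), leaves have no children."""
    def __init__(self, prefix, left=None, right=None):
        self.prefix, self.left, self.right = prefix, left, right

def lcp(a, b):
    """Length of the longest common prefix of two 32-bit addresses."""
    l = 32
    while a != b:
        a, b, l = a >> 1, b >> 1, l - 1
    return l

def build_lcp_tree(blacklist):
    """Build the LCP tree of a set of distinct 32-bit bad addresses."""
    addrs = sorted(set(blacklist))

    def rec(lo, hi):                       # subtree over addrs[lo:hi]
        if hi - lo == 1:
            return Node((addrs[lo], 32))   # leaf = a single bad address
        l = lcp(addrs[lo], addrs[hi - 1])  # common prefix of the whole range
        base = (addrs[lo] >> (32 - l)) << (32 - l)
        bit = 1 << (31 - l)                # first bit on which the range splits
        mid = next(i for i in range(lo, hi) if addrs[i] & bit)
        return Node((base, l), rec(lo, mid), rec(mid, hi))

    return rec(0, len(addrs))

def block_all(root, good_weights, f_max):
    """Minimum collateral damage to block every bad address under `root`
    with at most `f_max` filters, plus the selected prefixes."""
    def collateral(node):                  # total good weight inside the prefix
        base, l = node.prefix
        return sum(w for a, w in good_weights.items()
                   if base <= a < base + (1 << (32 - l)))

    memo = {}
    def z(node, f):
        key = (id(node), f)
        if key in memo:
            return memo[key]
        if node.left is None:              # (9): a leaf costs nothing to block
            res = (0, (node.prefix,))
        elif f == 1:                       # (10): one filter blocks the prefix
            res = (collateral(node), (node.prefix,))
        else:                              # (8): split f between the children
            res = min((z(node.left, fl)[0] + z(node.right, f - fl)[0],
                       z(node.left, fl)[1] + z(node.right, f - fl)[1])
                      for fl in range(1, f))
        memo[key] = res
        return res

    return z(root, f_max)
```

For example, on the two-address blacklist {10.0.0.1, 10.0.0.3} with one good address 10.0.0.2, a single filter is forced to block the covering 10.0.0.0/30 (collateral damage 1), while two filters block the two bad addresses exactly (collateral damage 0). Note that the linear scan in `collateral` and the tie-breaking `min` over (cost, prefixes) tuples keep the sketch short rather than matching the paper's O(N F_max²) bound exactly.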
386 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 20, NO. 2, APRIL 2012

C. BLOCK-SOME

Problem Statement: Given a blacklist BL, a whitelist WL, and the number of available filters F_max, the goal is to select filters so as to minimize the total cost of the attack.

Formulation: This is precisely the problem described by (1)–(4), but slightly rephrased to better compare it to BLOCK-ALL. There are two differences from BLOCK-ALL. First, the goal is to minimize the total cost of the attack, which involves both the collateral damage and the filtering benefit; this is expressed by (11). Second, (13) states that every bad address must be filtered by at most one prefix, which means that it may or may not be filtered:

(11)–(13)

Characterizing an Optimal Solution: As with BLOCK-ALL, our algorithm starts from LCP-tree(BL) and outputs a pruned version of that LCP tree. The only difference is that some bad addresses may now remain unfiltered. In the pruned LCP subtree that represents our solution, this means that there may exist intermediate (nonleaf) nodes with a single child.

Proposition 3.3: An optimal solution to BLOCK-SOME can be represented as a pruned subtree of LCP-tree(BL), with the same root as LCP-tree(BL) and up to F_max leaves.

Proof: In Proposition 3.1, we proved that any solution of (5) and (6) can be reduced to a (pruned) subtree of the LCP tree with at most F_max leaves. Moreover, the constraint expressed by (13), which imposes the use of nonoverlapping prefixes, is automatically imposed by considering the leaves of the pruned subtree as the selected filters. This proves that any feasible solution of BLOCK-SOME can be represented as a pruned subtree of the LCP tree with at most F_max leaves, and thus, so can an optimal solution.

Algorithm: The algorithm that solves BLOCK-SOME is similar to Algorithm 1 in that it relies on the LCP tree and a dynamic programming (DP) approach. The main difference is that not all bad addresses need to be filtered. Hence, at each step, we can assign filters to the left and/or right subtree: whereas in line 11 of Algorithm 1 we had F_1 + F_2 = F, now we have F_1 + F_2 <= F. We can recursively compute the optimal solution as before

(14)

with boundary conditions

(15)–(17)

where p is an intermediate node (prefix) and a is a leaf node in the LCP-tree.

Complexity: The analysis of Algorithm 1 applies to this algorithm as well. The complexity is the same, i.e., linearly increasing with N.

BLOCK-ALL Versus BLOCK-SOME: There is an interesting connection between the two problems. The latter can be regarded as an automatic way to select the best subset of BL and run BLOCK-ALL only on that subset. If the absolute values of the weights of the bad addresses are significantly larger than the weights of the good addresses, then BLOCK-SOME degenerates to BLOCK-ALL.

D. TIME-VARYING BLOCK-ALL(SOME)

We now consider the case when the blacklist and whitelist change over time, and we seek to incrementally update the filtered prefixes that are affected by the change. More precisely, consider a sequence of blacklists and of whitelists at times t_1 and t_2, respectively.

Problem Statement: Given: 1) a blacklist and whitelist, BL_1 and WL_1; 2) the number of available filters F_max; 3) the corresponding solution to BLOCK-ALL(SOME), denoted S_1; and 4) another blacklist and whitelist, BL_2 and WL_2; obtain the solution to BLOCK-ALL(SOME) for the second blacklist/whitelist, denoted S_2.

Algorithm: Consider, for the moment, that the whitelist remains the same, and assume the focus is on the changes in the blacklist.

1) Addition: First, consider that the two blacklists differ only in a single new bad address, which does not appear in BL_1 but appears in BL_2. There are two cases, depending on whether the new bad address belongs to a prefix that is already filtered in S_1. If it is, no further action is needed, and S_2 = S_1. Otherwise, we modify the LCP tree that represents S_1 to also include the new bad address, as illustrated in Fig. 3. The key point is that we only need to add one new intermediate node to the LCP tree (the gray node in Fig. 3), corresponding to the longest common prefix between the new bad address and its closest bad address that is already in the LCP tree. The optimal allocation of filters to the subtree rooted at prefix p depends only on how these filters are allocated to the children of p. Hence, when we add a new node to the LCP tree, we need to recompute the optimal filter allocation [i.e., recompute the DP entries according to (8)] for all and only the ancestors of the new node, all the way up to the root node.

2) Deletion: Then, assume that the two blacklists differ in one deleted bad address, which appears in BL_1 but not in BL_2. In this case, we modify the LCP tree that represents S_1 to remove the leaf node that corresponds to that address as well as its parent node (since that node does not have two children any more), and we recompute the optimal filter allocation for all and only the node's ancestors.

3) Adjustment: Finally, suppose that the two blacklists differ in one address, which appears in both blacklists but with different weights, or that the two blacklists are the same, while the two whitelists differ in one address (it either appears in one of the two whitelists or it appears in both whitelists but with different weights). In all of these cases, we do not need to add or remove any nodes from the LCP tree, but we do need to adjust the collateral damage or filtering benefit associated with one node, and hence we recompute the optimal filter allocation for all and only that node's ancestors.
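To make the BLOCK-SOME recursion concrete, here is a minimal, self-contained sketch of the dynamic program over an LCP tree. This is our illustration, not the paper's Algorithm 1: the `Node` fields and the toy weights are simplifying assumptions, with each node carrying the whitelist weight (collateral damage if its prefix is filtered) and blacklist weight (filtering benefit if filtered).

```python
class Node:
    """A node of the LCP tree: a prefix annotated with the whitelist
    weight (collateral damage if filtered) and the blacklist weight
    (filtering benefit if filtered) of the addresses it covers."""
    def __init__(self, good_w, bad_w, left=None, right=None):
        self.good_w = good_w
        self.bad_w = bad_w
        self.left = left      # internal LCP nodes have two children
        self.right = right

def min_cost(node, filters):
    """Minimum total attack cost for the subtree rooted at `node`
    using at most `filters` filters.  Leaving a bad address unfiltered
    is allowed (BLOCK-SOME) and simply forgoes its benefit."""
    if filters == 0:
        return 0.0                       # nothing filtered below here
    covering = node.good_w - node.bad_w  # one filter on the whole prefix
    if node.left is None:                # a single address (leaf)
        return min(0.0, covering)
    best = covering
    # Split the budget between the two subtrees; because min_cost(., 0)
    # is 0, trying all splits with F1 + F2 = F also covers F1 + F2 <= F.
    for f1 in range(filters + 1):
        best = min(best,
                   min_cost(node.left, f1) +
                   min_cost(node.right, filters - f1))
    return best
```

On a toy tree with one dense bad prefix and one prefix where bad addresses are colocated with heavy good traffic, one filter covers the dense bad prefix; a second filter is spent on a single bad address, while the bad addresses that would drag in good traffic stay unfiltered.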
4) Multiple Addresses: If the two successive time instances differ in multiple addresses, we repeat the procedures described above as needed, i.e., we perform one node addition for each new bad address, one deletion for each removed bad address, and up to one adjustment for each other difference.

Complexity: Since the LCP tree is a complete binary tree over 32-bit addresses, any leaf node has at most 32 ancestors, so inserting a new bad address (or removing one) requires recomputing the DP entries of at most 32 nodes. Hence, deriving S_2 from S_1 as described above is asymptotically better than computing it from scratch using Algorithm 1 if and only if the number of addresses that differ between the two time instances is less than N/32.

Fig. 3. Example of BLOCK-ALL and TIME-VARYING BLOCK-ALL. Consider a blacklist of 10 bad IP addresses BL = {10.0.0.3, 10.0.0.10, 10.0.0.15, 10.0.0.17, 10.0.0.22, 10.0.0.31, 10.0.0.32, 10.0.0.33, 10.0.0.57, 10.0.0.58}. The table next to each node p shows the minimum cost z_p(F) computed by the DP algorithm for BLOCK-ALL for F = 1, ..., number of leaves in the subtree. The optimal solution to BLOCK-ALL consists of the four prefixes highlighted in black. When a new address is added to the blacklist, a leaf node is added to the tree and TIME-VARYING needs to update all and only the ancestor nodes in LCP-tree(BL), indicated by the dashed lines, according to (8). Moreover, a new node is created to denote the longest common prefix between the new address and its closest address already in the tree. Note that all other nodes corresponding to the longest common prefixes between the new address and other addresses in BL are already in the LCP tree. The new optimal solution consists of the four prefixes indicated by the dashed circles.

E. FLOODING

Problem Statement: Given: 1) a blacklist BL and a whitelist WL, where the absolute weight of each bad and good address is equal to the amount of traffic it generates; 2) the number of available filters F_max; and 3) a constraint on the victim's link capacity (bandwidth) C; select filters so as to minimize collateral damage and make the total traffic fit within the victim's link capacity.

Formulation: To formulate this problem, we need to make two adjustments to the general framework of (1)–(4). First, (1) becomes (18), which expresses the goal to minimize collateral damage. Second, we add a new constraint (20), which specifies that the total traffic that remains unblocked after filtering (i.e., the total traffic minus the traffic that gets blocked) should fit within the link capacity C, so as to avoid congestion and packet loss:

(18)–(21)

Characterizing an Optimal Solution: We represent the optimal solution as a pruned subtree of an LCP-tree. However, we start with the full binary tree of all bad and good addresses, LCP-tree(BL ∪ WL). Moreover, to handle the constraint in (20), each node corresponding to prefix p is assigned an additional cost, indicating the total amount of traffic sent by p.

Proposition 3.4: An optimal solution of FLOODING can be represented as the leaves of a pruned subtree of LCP-tree(BL ∪ WL), with the same root, up to F_max leaves, and the total cost of the leaves within the capacity constraint.

Proof: Similarly to Proposition 3.1, we prove that for every feasible solution S to FLOODING, there exists another feasible solution S', which: 1) can be represented as a pruned subtree of LCP-tree(BL ∪ WL), as described in the proposition; and 2) whose collateral damage is smaller than or equal to S's. This is sufficient to prove the proposition, since an optimal solution is also a feasible one.

Any filtering solution can be represented as a pruned subtree of the full binary tree, with the same root and leaves corresponding to the filtered prefixes. S is a feasible solution to FLOODING; therefore, S's tree has up to F_max leaves, otherwise (19) would be violated, and the total cost of S's leaves satisfies the capacity constraint, otherwise (20) would be violated.

Suppose that S includes a prefix p that is not in LCP-tree(BL ∪ WL). We can construct a better feasible solution S', which can be represented as a pruned subtree of LCP-tree(BL ∪ WL): S' has the same root, up to F_max leaves, and leaves whose total cost satisfies the capacity constraint. There are three possibilities.

1) p includes neither bad nor good addresses. In this case, we can simply remove p from S, i.e., unblock p.
2) Only one of p's children includes bad or good addresses. In this case, we can replace p with the child that contains the addresses.
3) Both of p's children include bad or good addresses. In this case, p is already a longest common prefix, and we do not need to do anything.

Clearly, each of these operations transforms the feasible solution S into another feasible solution with smaller or equal collateral damage while still preserving the capacity constraint. This is because the transformations filter the same amount of traffic, just using the longest prefix possible to do so. We can repeat this process for all prefixes that are in S but not in LCP-tree(BL ∪ WL), until we create a feasible solution that includes only prefixes from LCP-tree(BL ∪ WL) and has smaller or equal collateral damage.
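The incremental update of Fig. 3 hinges on one primitive: computing the longest common prefix between a new address and its closest address already in the tree. A short sketch using Python's standard ipaddress module (the function name is ours):

```python
import ipaddress

def lcp_prefix(addr_a: str, addr_b: str) -> ipaddress.IPv4Network:
    """Longest common prefix of two IPv4 addresses -- the intermediate
    node that the TIME-VARYING update inserts when a new bad address
    joins the LCP tree."""
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    plen = 32
    # Shorten the prefix until both addresses share it.
    while (a >> (32 - plen)) != (b >> (32 - plen)):
        plen -= 1
    network = (a >> (32 - plen)) << (32 - plen)
    return ipaddress.IPv4Network((network, plen))
```

For example, for 10.0.0.57 and 10.0.0.58, two neighboring addresses from the Fig. 3 blacklist, the inserted node is the prefix 10.0.0.56/30.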
Theorem 3.5: FLOODING [i.e., (18)–(21)] is NP-hard.

Proof: To prove that FLOODING is NP-hard, we consider the knapsack problem with a cardinality constraint,

(22)–(24)

which is known to be NP-hard [11], and we show that it reduces to FLOODING. To do this, we put FLOODING in a slightly different form by making two changes.

First, we change the inequality in (19) to an equality. Any feasible solution to FLOODING that uses fewer than F_max filters can be transformed to another feasible solution with exactly F_max filters, without increasing collateral damage. In fact, given a feasible solution S that uses F < F_max filters, it is always possible to remove a filter from a prefix p and add two filters to the two prefixes corresponding to p's children in LCP-tree(BL ∪ WL). The solution constructed this way uses F + 1 filters, blocks all addresses blocked in S, and has a cost that is less than or equal to S's.

Second, we define auxiliary variables and use them to rewrite FLOODING as

(25)–(28)

For a given instance of the problem defined by (22)–(24), we construct an equivalent instance of the problem defined by (25)–(28) by mapping the knapsack items to the addresses in the blacklist and whitelist and assigning appropriate weights to all other prefixes. With this assignment, a solution to the problem defined by (22) can be obtained by solving FLOODING and then reading off which addresses are blocked.

Algorithm: Given the hardness of the problem, we do not look for a polynomial-time algorithm. We design a pseudo-polynomial-time algorithm that optimally solves FLOODING. Its complexity is linear in the number of good and bad addresses and in the magnitude of C.

Our algorithm is similar to the one that solves BLOCK-SOME, i.e., it relies on an LCP tree and a DP approach. However, we now use the LCP tree of all the bad and good addresses. Moreover, when we compute the optimal filter allocation for each subtree, we now need to consider not only the number of filters allocated to that subtree, but also the corresponding amount of capacity (i.e., the amount of the victim's capacity consumed by the unfiltered traffic coming from the corresponding prefix). We can recursively compute the optimal solution bottom-up as before

(29)

where z_p(f, c) is the minimum collateral damage of prefix p when allocating f filters and capacity c to that prefix.

Complexity: Our DP approach computes entries for every node in LCP-tree(BL ∪ WL), and the computation of a single entry, given the entries of the descendant nodes, requires additional operations per (29). We can leverage again the observation that we do not need to compute entries for all nodes in the LCP tree: At a node p, it is sufficient to compute (29) only for the filter and capacity values that the subtree can actually use. Therefore, the optimal solution to FLOODING can be computed in time that increases linearly with the number of addresses in BL ∪ WL and polynomially in C. The overall complexity is pseudo-polynomial because C cannot be polynomially bounded in the input size. In the evaluation section, we present a heuristic algorithm that operates in increments of ΔC. Finally, we note that F_max does not appear in the asymptotic complexity.

BLOCK-SOME Versus FLOODING: There is an interesting connection between the two problems. To see it, consider the partial Lagrangian relaxation of (18)–(21):

(30)–(32)

For every fixed value of the Lagrangian multiplier, (30)–(32) are equivalent to (11)–(13) for a specific assignment of weights. This shows that dual feasible solutions of FLOODING are instances of BLOCK-SOME for a particular assignment of weights. The dual problem aims exactly at tuning the Lagrangian multiplier to find the best assignment of weights.

F. DISTRIBUTED-FLOODING

Problem Statement: Consider a victim V that connects to the Internet through its ISP and is flooded by a set of attackers listed in a blacklist BL, as in Fig. 1(a). To reach the victim, attack traffic passes through one or more ISP routers. Let R be the set of unique such routers. Let each router r have capacity C_r on the downstream link (toward V) and a limited number of filters F_r. The volume of good/bad traffic through every router is assumed known. Our goal is to allocate filters on some or all routers, in a distributed way, so as to minimize the total collateral damage and avoid congestion on all links of the ISP network.
Formulation: Let the variables x_{p,r} indicate whether or not a filter for prefix p is used at router r. Then, the distributed filtering problem can be stated as

(33)–(36)

Characterizing an Optimal Solution: Given the sets BL and WL and the resources F_r and C_r at each router, we have the following.

Proposition 3.6: There exists an optimal solution of DIST-FLOODING that can be represented as a set of |R| different pruned subtrees of LCP-tree(BL ∪ WL), each corresponding to a feasible solution of FLOODING for the same input, and s.t. every subtree leaf is not a node of another subtree.

Proof: Feasible solutions of DIST-FLOODING allocate filters on different routers s.t. (34) and (35) are satisfied independently at every router. In the LCP tree, this means having |R| subtrees, one for every router r, each having at most F_r leaves, with the associated blocked traffic bounded by the total incoming traffic at router r. Each subtree can be thought of as a feasible solution of a FLOODING problem. Eq. (36) ensures that the same address is not filtered multiple times at different routers, to avoid waste of filters. In the LCP tree, this translates into every leaf appearing in at most one subtree.

Algorithm: Constraint (36), which imposes that different routers do not block the same prefixes, prevents us from a direct decomposition of the problem. To decouple the problem, consider the following partial Lagrangian relaxation:

(37)

where λ is the vector of Lagrangian multipliers (prices) for the constraints in (36), and λ_p is the price associated with prefix p. With this relaxation, both the objective function and the other constraints immediately decompose into |R| independent subproblems, one per router:

(38)–(40)

The dual problem is

(41)

where the inner minimization is the optimal value of (38)–(40) for a given λ. Given the prices λ_p, every subproblem (38)–(40) can be solved independently and optimally by router r using (29). Problem (41) can be solved using a projected subgradient method [11]. In particular, we use the following update rule to compute the shadow prices at each iteration:

λ_p ← [ λ_p + α ( Σ_{r∈R} x_{p,r} − 1 ) ]^+

where α is the step size. The interpretation of the update rule is quite intuitive: For every prefix p that is filtered with multiple filters, the corresponding shadow price λ_p is augmented proportionally to the number of times it is blocked. Increasing the price has, in turn, the effect of forcing the routers to try to unblock the corresponding prefix. The price is increased until a single filter is used to block that prefix.

Note, however, that since x_{p,r} is an integer variable, the dual problem is not always guaranteed to converge to a primal feasible solution [13].

Distributed Versus Centralized Solution: When the number of attack sources is too large to be blocked at a single router, we need to combine the resources available at multiple routers, such as the overall network capacity and the number of filters, to reduce the overall collateral damage. Solving (33)–(36) provides the optimal allocation of filters in that case.

The solution can be found either in a centralized or in a distributed way. In a centralized approach, a single node solves (33)–(36) and distributes the optimal filter allocation to all the routers involved. This reduces the communication overhead between routers; however, the computational burden is placed entirely on the node that solves both the master problem and the different subproblems. This can become infeasible in practice due to the large number of attackers.

An alternative approach is to have each router solve its own subproblem (38)–(40) and a single node (e.g., the victim's gateway or a dedicated node) solve the master problem (41). The communication overhead of this scheme is limited. At every iteration of the subgradient method, the shadow prices λ_p are broadcast to all routers. For 100 000 bad IPs, if we encode the value of each variable using 2 B, each router needs to send about 200 kB of data at each iteration. Given the λ_p's, the routers independently solve one subproblem each and return the computed x_{p,r} to the node in charge of the master problem. In general, if n denotes the total number of active IP sources, the communication overhead is O(n) per router per iteration. Our approach only requires communication between the node solving the master problem and the nodes solving the subproblems. It does not entail communication between subproblems, which would incur a significant communication overhead.
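The price update described above can be written compactly. The following is a sketch under our own naming (not the paper's notation), where the relaxed constraint is that each prefix be filtered by at most one router:

```python
def update_prices(prices, times_blocked, step):
    """One projected-subgradient step on the dual problem.
    `times_blocked[p]` is the number of routers currently filtering
    prefix p (the sum over r of x_{p,r}); the subgradient of the
    relaxed constraint is times_blocked[p] - 1, and max(0, ...)
    projects the price back onto the nonnegative orthant."""
    return {p: max(0.0, lam + step * (times_blocked[p] - 1))
            for p, lam in prices.items()}
```

A prefix blocked at three routers sees its price rise, pushing routers to unblock it at the next iteration; a prefix blocked nowhere sees its price decay toward zero.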
IV. SIMULATION RESULTS

In this section, we evaluate our algorithms using real logs of malicious traffic from Dshield.org. We demonstrate that our algorithms bring significant benefit compared to nonoptimized filter selection or to generic clustering algorithms in a wide range of scenarios. The reason is the well-known fact that sources of malicious traffic exhibit spatial and temporal clustering [3]–[9], which is exploited by our algorithms. Indeed, clustering in a blacklist allows the use of a small number of filters to block prefixes with a high density of malicious IPs at low collateral damage. Furthermore, it has also been observed that good and bad addresses are typically not colocated, which allows for distinguishing between good and bad traffic [6], [7], [15], and in our case for efficient filtering of the prefixes with the most malicious sources.

A. Simulation Setup

We used 61-day logs from Dshield.org [10]—a repository of firewall and intrusion detection logs. The dataset consists of 758 698 491 attack reports, from 32 950 391 different IP sources belonging to about 600 contributing organizations. Each report includes a timestamp, the contributor ID, and the information for the flow that raised the alarm, including the (malicious) source IP and the (victim) destination IP. Looking at the attack sources in the logs, we verified that malicious sources are clustered in a few prefixes, rather than uniformly distributed over the IP space, consistently with what was observed before, e.g., in [3]–[7].

In our simulations, we considered a blacklist to be the set of sources attacking a particular organization (victim) during a single-day period. The degree of clustering varied significantly in the blacklists of different victims and across different days. The higher the clustering, the more benefit we expect from our approach. We also simulated the whitelist by generating good IP addresses according to the multifractal distribution in [16] on routable prefixes. We performed the simulations on a Linux machine with a 2.4-GHz processor and 2 GB RAM.

B. Simulation of BLOCK-ALL and BLOCK-SOME

Simulation Scenarios I and II: In Fig. 4, we consider two example blacklists corresponding to two different victims, each attacked by a large number of malicious IPs in a single day. In order to demonstrate the range of benefit of our approach, we chose the blacklists with the highest and the lowest degree of source clustering observed in the entire data set, referred to as "High Clustering" (Scenario I) and "Low Clustering" (Scenario II), respectively. The degree of clustering of a blacklist is captured by the entropy associated with the distribution of the IP addresses [17]. Intuitively, a blacklist with low entropy (i.e., high clustering) contains IP addresses in a few prefixes and thus is easier to block.

We compare Algorithm 1 to k-means, which is general yet prefix-agnostic. k-means is a well-known clustering problem [18]: The goal is to partition all observations (in our context, IP addresses) into k clusters such that each address belongs to the cluster with the nearest mean. We use the most common algorithm to solve the problem, known as Lloyd's algorithm [14], which uses an iterative refinement technique.

Fig. 4. Evaluation of BLOCK-ALL in Scenarios I (High Clustering blacklist) and II (Low Clustering blacklist). We plot the collateral damage (CD) (normalized over the number of malicious sources N) versus the number of filters F. We compare Algorithm 1 to k-means clustering (we simulated 50 runs of Lloyd's algorithm [14]).

BLOCK-ALL: We ran Algorithm 1 in Scenarios I and II, and we show the results in Fig. 4. We made the following observations. First, the optimal algorithm performs significantly better than a generic clustering algorithm that does not exploit the structure of IP prefixes. In particular, it reduces the collateral damage (CD) by up to 85% compared to k-means when run on the same (high-clustering) blacklist. Second, the degree of clustering in a blacklist matters: The CD is lowest (highest) in the blacklist with the highest (lowest) degree of clustering, respectively. Results obtained for different victims and days were similar and lay in between the two extremes. A few thousands of filters were sufficient to significantly reduce collateral damage in all cases.

BLOCK-SOME: In Fig. 5, we focus on Scenario II, i.e., the Low Clustering blacklist, which is the least favorable input for our algorithm and has the highest CD (shown in dashed line in Fig. 4). Compared to BLOCK-ALL, which blocks all bad IPs, BLOCK-SOME allows the operator to trade off lower CD for some unfiltered bad IPs by appropriately tuning the weights assigned to good and bad addresses. For simplicity, in Fig. 5, we assign the same weight w_g to all good addresses and the same weight w_b to all bad addresses. Fig. 5 shows results for two different values of w_b/w_g. (However, we note that our framework has the flexibility to assign a different weight to each individual IP address.)

In Fig. 5(a), CD is always smaller than the corresponding CD in Fig. 4; they become equal only when we block all bad IPs. In Fig. 5(b), we observe that BLOCK-SOME reduces the CD by 60% compared to BLOCK-ALL while leaving unfiltered only 10% of bad IPs and using only a few hundred filters. In Fig. 5(c), the total cost of the attack (i.e., the weighted sum of bad and good traffic blocked) decreases as F increases. The interaction between these two competing factors is complex and strongly depends on the input blacklist and whitelist. In the data we analyzed, we observed that CD tends to first increase and then decrease with F, while the number of unfiltered bad IPs tends to decrease.
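The clustering measure used to rank the blacklists above is the entropy of the distribution of their IP addresses. One concrete way to compute such a measure is sketched below; bucketing addresses at /8 granularity is our illustrative choice, and [17] defines the measure precisely.

```python
import math
from collections import Counter

def prefix_entropy(blacklist, prefix_len=8):
    """Empirical entropy (in bits) of a blacklist's distribution over
    prefixes of length `prefix_len` (addresses given as 32-bit ints).
    Low entropy = high clustering, i.e., a blacklist that is easier
    to cover with few filters."""
    buckets = Counter(ip >> (32 - prefix_len) for ip in blacklist)
    total = len(blacklist)
    return -sum((count / total) * math.log2(count / total)
                for count in buckets.values())
```

A blacklist confined to a single /8 has entropy 0 (maximally clustered), while one spread evenly over two /8s has entropy 1 bit.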
Fig. 5. Evaluation of BLOCK-SOME for Scenario II (Low Clustering blacklist). Three metrics are considered: (a) collateral damage (CD), (b) number of unfiltered bad IPs (UBIP), and (c) the total cost, i.e., a weighted combination of CD and UBIP. The operator expresses preference for UBIP versus CD by tuning the weights w_b, w_g. We considered a high and a low value for w_b/w_g.

The ratio w_b/w_g captures the effort made by BLOCK-SOME to block all bad IPs and become similar to BLOCK-ALL (see footnote 2).

C. Simulation of FLOODING and DIST-FLOODING

Simulation Scenario III: We consider a Web server under a DDOS attack. We assume that the server has a typical access bandwidth (on the order of Mb/s) and can handle 10 000 connections per second (a typical capability of a Web server that handles light content). We assume that each good (bad) source generates the same amount of good (bad) traffic. We also assume that F_max filters are available (consistently with the discussion in footnote 1), and we vary C. Before the attack, 5000 good sources are picked from [16] and utilize 10% of the capacity. During the attack, the total bad traffic is on the order of Gb/s and is generated by a typical blacklist (141 763 bad source IPs), based on Dshield logs of a randomly chosen victim for a randomly chosen day (see footnote 3).

FLOODING—Optimal: Fig. 6(a) and (b) shows the collateral damage of the optimal solution of FLOODING, for Scenario III, as a function of the number of available filters F and of the bottleneck capacity C, respectively.

Fig. 6. Optimal solution of FLOODING in Scenario III. (a) CD/N versus F: the normalized collateral damage CD/N as a function of the number of available filters F when C is fixed. (b) CD/N versus C: CD/N as a function of the available capacity C when F is fixed.

We simulate two baselines, namely uniform rate-limiting and source address prefix filtering (SAPF) [15], and we compare them to our approach. The uniform rate-limiting approach drops the same fraction of traffic of all incoming flows, which is a common practice against DDOS attacks [19]. Since, in a typical DDOS attack, bad sources outnumber the good sources, uniform rate-limiting disproportionately penalizes the good sources. While this solution is always applicable and requires only one rate limiter, more filters (ACLs) can drastically reduce the collateral damage. We simulated one of the algorithms in [15], referred to as SAPF positive, which selects prefixes to allow and denies traffic generated by all IP addresses outside those prefixes (see footnote 4). It uses clustering to find initial IP addresses to allow. Then, it greedily shortens the length of the allowed prefixes, thus allowing more traffic, until the available capacity is completely used. This heuristic solves the FLOODING problem, but does not guarantee an optimal solution. In our simulations, the optimal solution found by our algorithm, for the same number of ACLs, reduces the collateral damage by about a factor of 2. Another observation is that, because of its heuristic nature, the CD of SAPF does not necessarily decrease monotonically with F, consistently with [15], which might result in inefficient use of the available filters.

Footnote 2: Since w_b/w_g > 1, bad IPs are more important. When the ratio w_b/w_g is high, the algorithm first tries to cover small clusters or single bad IPs; in this case, this happens around 10 000 filters. CD remains almost constant in this phase, at the end of which all bad IPs are filtered [as in Fig. 5(b)]. In the final phase, the algorithm releases single good IPs, which are less important, and all bad IPs are blocked similarly to BLOCK-ALL.

Footnote 3: However, because this problem is NP-hard, we do not simulate the entire IP space, but a restricted range, which is known to account for the largest amount of malicious traffic, e.g., see [6]. We also scale all parameters (F, C, w) by a factor of 8, to maintain a constant ratio between the number of IPs and F, C, and between the total flow generated and C.

Footnote 4: In [15], there are three heuristics proposed for generating ACL lists: a positive algorithm (which specifies prefixes to allow and blocks all other traffic), a negative one (which specifies prefixes to block and accepts all other traffic), and a mixed one (with possibly overlapping prefixes). In this paper, we compare our optimal algorithm for the FLOODING problem only against the positive algorithm. The positive algorithm was found, in [15], to perform the best (i.e., have the lowest collateral damage for the same number of ACLs), followed closely by the mixed algorithm. The negative algorithm performed the worst; this is why we do not compare against it, although it is more similar to our approach.
In summary, compared to [15], our work finds the optimal solution at the lowest possible complexity (linear in the input size) and for a wider range of problems (including but not limited to FLOODING).

We also observe that varying the number of filters or the available capacity has a different impact on the collateral damage. While the collateral damage decreases drastically as the number of filters increases, when we increase the available capacity, we observe two trends. First, as capacity increases, the optimal solution allows traffic from good sources that do not belong in prefixes with many malicious sources. This causes a linear decrease with slope equal to the amount of traffic generated by good sources. For even larger C, good IPs located in the same prefix as malicious sources are released. This trend depends on the specific clustering of good and bad IPs considered, as well as on the amount of traffic generated by both good and bad sources. When the value of F is too low, increasing C does not yield any benefit. Most of the improvement is obtained when both types of resources (number of filters and capacity) increase.

FLOODING—Heuristic: The benefit of the optimal solution of FLOODING comes at high computational cost due to the intrinsic hardness of the FLOODING problem. To address this issue, we design a heuristic for solving FLOODING, which can be tuned to achieve the desired tradeoff between collateral damage and computational time. In particular, instead of solving all subproblems z_p(f, c) for all possible values of f and c, we consider discrete increments of capacity with step size ΔC. If the finest granularity of C is considered, the problem is optimally solved. If ΔC is coarser, we may get a suboptimal solution, but we reduce the computation cost, as fewer iterations are required to solve the DP.

Fig. 7. Heuristic solution of FLOODING in Scenario IV. An approximate solution (higher CD) is obtained by solving only the subproblems z_p(f, nΔC) for capacity values that are multiples of ΔC (nΔC ≤ C, n = 1, 2, ...). The coarser the capacity increments ΔC, the fewer subproblems we need to solve, but at the cost of higher collateral damage. In this scenario, increasing ΔC significantly reduces the computational time by three orders of magnitude, while the percentage of good traffic that is blocked (CD %) is only increased by a factor of 3.

Simulation Scenario IV: We consider again a DDOS attack, launched by 61 229 different bad source IPs, based on the Dshield.org logs. The available capacity is on the order of Mb/s; before the attack, the legitimate traffic already consumes part of it; during the attack, the total bad traffic generated is on the order of Gb/s. This scenario is more challenging than Scenario III because there is less unused capacity before the attack and more malicious traffic during the flooding attack.

In Fig. 7, we show the percentage of good traffic that is blocked by the heuristic versus the time required to obtain a solution for Scenario IV. As we can see in Fig. 7, the optimal solution of FLOODING requires about 1 day of computation and has a CD that is only 6% of the total good traffic. Larger values of ΔC allow us to dramatically reduce the computational time, by about three orders of magnitude, while the CD is only increased by a factor of 2–3. This asymmetry was also the case in other Dshield logs we simulated. This can be very useful in practice: An operator may decide to use an approximation of the optimal filtering policy to immediately cope with incoming bad traffic, and then successively refine the allocation of filters to further reduce the collateral damage if the attack persists.

DISTRIBUTED-FLOODING: We simulated the scenario where an ISP utilizes multiple routers to collaboratively block malicious traffic. We consider the same scenario (III) as for the optimal flooding at a single router, but now we assume that the traffic reaches the victim through its ISP, as in Fig. 8.

Fig. 8. Distributed flooding. This topology exemplifies the part of a potentially larger ISP topology involved in routing and blocking traffic toward victim V. The edge routers receive all incoming, malicious and legitimate, traffic toward victim V and route it through shortest paths, with ties broken randomly. Any of the traversed routers (indicated with circles) can be used to deploy ACLs and block the malicious traffic.

We use a subgradient descent method to solve the dual problem in (41). In Fig. 9, we show the convergence of the method for two different step sizes: 0.05 and 0.01. We also compare against the "no coordination" case, when routers do not coordinate but act independently to block malicious traffic; this corresponds to the first iteration of the subgradient method. In the next iterations, routers coordinate, through the exchange of the shadow prices λ_p, and avoid the redundant overlap of prefixes at multiple locations. This reduces the collateral damage significantly, i.e., by 50%.

Increasing the number of routers has two effects. On one hand, it increases the total number of filters available to block source IPs and thus can potentially reduce the overall collateral damage. On the other hand, deploying more routers increases the communication overhead required to coordinate them. In Fig. 10, we study the tradeoff between reduced collateral damage and increased communication overhead as the number of routers increases. For simplicity, we assume that each router has the same number of filters. As we can see in Fig. 10, increasing the number of routers provides a significant benefit in terms of reducing the collateral damage, while the communication overhead increases only linearly with the overall number of filters available, as discussed in Section III-F.
Fig. 9. Evaluation of DIST-FLOODING in Scenario III. Results are shown for the distributed algorithm and two different values of the step size (0.01 and 0.05) of the subgradient method. The dashed line shows the case of "No Coordination," i.e., when each router acts independently.

Fig. 10. Distributed flooding: CD/N and communication overhead versus |R|. On the x-axis is the number of routers |R|. On the left y-axis is the normalized collateral damage CD/N. On the right y-axis is the overall communication overhead (MB/iteration) required to coordinate the routers. As CD/N decreases faster than the communication overhead increases, there is a sweet spot (three routers, for this input blacklist) where we benefit from a large reduction of CD/N while incurring a modest increase in the communication overhead required.

Depending on the bandwidth available and on the tolerable level of collateral damage, a network administrator can decide how many routers should be deployed to filter the attack sources. Our work provides the framework to do so in a principled way.

on which filters to pick. Therefore, they are complementary but orthogonal to this work. We rely on an intrusion detection system or on historical data to distinguish good from bad traffic and to provide us with a blacklist. Detection of malicious traffic is an important problem, but out of the scope of this paper. The sources of legitimate traffic are also assumed known and used for assessing the collateral damage—e.g., Web servers or ISPs typically keep historical data and know their important customers. We also consider addresses in the blacklist not to be spoofed.

A practical deployment scenario is that of a single network under the same administrative authority, such as an ISP or a campus network. The operator can use our algorithms to install filters at a single edge router or at several routers in order to optimize the use of its resources and to defend against an attack in a cost-efficient way. Our distributed algorithm may also be useful, not only for routers within the same ISP, but also, in the future, when different ISPs start cooperating against common enemies [20].

The problems studied in this paper are also related to firewall configuration to protect public-access networks. Unlike routers, where TCAM puts a hard limit on the number of ACLs, there is no hard limit on the number of firewall rules in software. However, there is still an incentive to minimize their number and thus any associated performance penalty [22]. There is a body of work on firewall rule management and (mis)configuration [23], which aims at detecting anomalies, while we focus on resource allocation.

Measurement Studies: Several measurement studies have demonstrated that malicious sources exhibit spatial and temporal clustering [3]–[7], [9]. In order to deal with dynamic malicious IP addresses [8], IP prefixes rather than individual IP addresses are typically considered. The clustering, in combination with the fact that the distribution of addresses as well as other statistical characteristics differ for good and bad traffic, has been exploited in the past for detection and mitigation of malicious traffic, such as, e.g., spam [6], [7] or DDOS [15]. In this paper, we exploit these characteristics for efficient prefix-based filtering of malicious traffic.

Prefix Selection: The work in [15] studied source prefix filtering for classification and blocking of DDOS traffic, which is closely related to our FLOODING problem. The selection of prefixes in [15] was done heuristically, thus leading to large collateral damage. In contrast, we tackle analytically the optimal
source prefix selection so as to minimize collateral damage. Fur- thermore, we provide a more general framework for formulating V. RELATED WORK and optimally solving a family of related problems, including Bigger Picture: Dealing with malicious traffic is a hard but not limited to FLOODING.problem that requires the cooperation of several components, The work in [24] is related to our TIME-VARYINGincluding detection and mitigation techniques, as well as problem: It designed and analyzed an online learning algorithmarchitectural aspects. In this paper, we optimize the use of for tracking malicious IP prefixes based on a stream of labeledfiltering—a mechanism that already exists on the Internet today data. The goal was detection, i.e., classifying a prefix as mali-and is a necessary building block of any bigger solution. More cious, depending on the ratio of malicious to legitimate traffic itspecifically, we focus on the optimal selection of which prefixes generates and subject to a constraint on the number of prefixes.to block. The filtering rules can be then propagated by filtering In contrast: 1) we identify precisely (not approximately) theprotocols [19]–[21] and ideally installed on routers as close to IP prefixes with the highest concentration of malicious traffic;the attack sources as possible. We note, however, that such pro- 2) we follow a different formulation (dynamic programmingtocols typically assume the ability to filter traffic at arbitrarily inspired by knapsack problems); 3) we use the results of detec-fine granularity and focus on where to place the filters rather tion as input to our filtering problem.
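The "Prefix Selection" discussion above mentions that the optimal prefixes are found via a dynamic program inspired by knapsack problems, run over the hierarchy of source prefixes. The sketch below is ours, not the paper's code, and is heavily simplified: a toy 4-bit address space stands in for IPv4, the blacklist/whitelist and their weights are made up, and only the single-router FLOODING objective is modeled (block every bad source with at most k prefix filters while minimizing the collateral damage, i.e., the good traffic caught by the same filters). The names `solve` and `under` are illustrative.

```python
from functools import lru_cache

BITS = 4  # toy 4-bit address space standing in for 32-bit IPv4

# Hypothetical input: blacklisted sources, and legitimate sources with
# traffic weights (the collateral damage of blocking each one).
bad = {0b0000, 0b0001, 0b0011, 0b1100}
good = {0b0010: 5, 0b0111: 2, 0b1101: 1}

INF = float("inf")

def under(prefix, plen, addrs):
    """Addresses in `addrs` covered by the length-`plen` prefix `prefix`."""
    shift = BITS - plen
    return [a for a in addrs if (a >> shift) == prefix]

@lru_cache(maxsize=None)  # caching assumes `bad`/`good` stay fixed
def solve(prefix, plen, k):
    """Minimum collateral damage needed to block every bad address under
    (prefix, plen) using at most k prefix filters."""
    if not under(prefix, plen, bad):
        return 0            # no bad traffic below this trie node
    if k == 0:
        return INF          # bad traffic remains, but no filters left
    # Option 1: spend one filter on this whole prefix;
    # the collateral damage is all the good traffic it also covers.
    best = sum(good[g] for g in under(prefix, plen, good))
    # Option 2: recurse into the two child prefixes, splitting the budget.
    if plen < BITS:
        left, right = prefix << 1, (prefix << 1) | 1
        for kl in range(k + 1):
            best = min(best,
                       solve(left, plen + 1, kl) + solve(right, plen + 1, k - kl))
    return best

# On this toy input, 3 filters block all bad sources with zero collateral
# damage, while a single filter must cover the whole address space.
```

The split loop is what makes this knapsack-like: the filter budget k is divided between the two subtrees in every possible way, and memoization on (prefix, plen, k) keeps the recursion polynomial in the trie size and budget.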
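The DIST-FLOODING evaluation earlier in this section attributes the roughly 50% reduction in collateral damage to routers avoiding redundant overlap of filtered prefixes. The toy example below is ours and deliberately sidesteps the paper's actual subgradient/shadow-price machinery; it assumes every router sees the same ranked list of harmful prefixes (all names and numbers are invented) and only illustrates why uncoordinated routers waste filters on duplicates while a coordinated partition does not.

```python
# Hypothetical ranked list of candidate filters: (source prefix, bad traffic
# it would block), sorted by decreasing harm.
ranked = [("10.0.0.0/8", 90), ("172.16.0.0/12", 60), ("192.168.0.0/16", 40),
          ("198.51.100.0/24", 30), ("203.0.113.0/24", 20), ("192.0.2.0/24", 10)]
harm = dict(ranked)

def blocked_bad(prefixes):
    """Total bad traffic stopped; duplicate filters add nothing."""
    return sum(harm[p] for p in set(prefixes))

def no_coordination(n_routers, per_router):
    """Each router independently installs the same top-k filters."""
    picks = []
    for _ in range(n_routers):
        picks += [p for p, _ in ranked[:per_router]]
    return picks

def coordinated(n_routers, per_router):
    """Routers partition the list, so no prefix is filtered twice."""
    return [p for p, _ in ranked[:n_routers * per_router]]

# 3 routers with 2 filters each:
# uncoordinated -> 150 units of bad traffic blocked (top two prefixes, thrice)
# coordinated   -> 250 units (all six prefixes, each filtered once)
```

In the paper the de-duplication is not hard-coded like this: it emerges from the iterative exchange of shadow prices, which penalize prefixes already filtered elsewhere until routers converge on a non-overlapping allocation.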