Network-Wide Heavy-Hitter Detection
with Commodity Switches
Presented by Ajay Kharat (2019H1030011G), BITS Pilani
Paper by Rob Harrison, Qizhe Cai, Arpit Gupta, and Jennifer Rexford
Motivation
• Network operators often need to identify outliers in network traffic, to detect
attacks or diagnose performance problems.
• To detect such problems, network operators perform heavy-hitter detection on
flows.
• Traditionally, heavy-hitter detection was done by analysing sampled packets or
by examining packet flows at a central collector.
• Prior work focused on detecting heavy hitters on a single switch, but we often
need to track network-wide heavy hitters.
• When detecting heavy hitters network-wide, the goal is to reduce the
communication overhead while maintaining accuracy.
Problem Statement: Sampled Central Collection
[Figure: switches sample packets at rate 1/x and forward them to a central collector, which maintains per-flow counts]
• Sampled central collection is inaccurate at low sampling rates and short intervals.
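The inaccuracy of sampled collection can be seen in a toy simulation (my own illustration, not the paper's experiment; the flow size and sampling rate are arbitrary choices):

```python
# Toy simulation of sampled central collection: each packet is forwarded
# to the collector with probability 1/x, and the collector scales the
# sampled count back up by x to estimate the true flow size.
import random

def sampled_estimate(true_count, x, rng):
    """Estimate a flow's size from packets sampled at rate 1/x."""
    sampled = sum(1 for _ in range(true_count) if rng.random() < 1 / x)
    return sampled * x

rng = random.Random(0)
true_count = 1000
errors = [abs(sampled_estimate(true_count, 100, rng) - true_count) / true_count
          for _ in range(200)]
# At a 1-in-100 rate only ~10 packets of this flow are seen per interval,
# so individual estimates routinely miss by tens of percent.
assert max(errors) > 0.1
```

Shorter intervals shrink the true count per interval, which makes the same effect worse.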
Problem Statement: CMY Approach
[Figure: n switches report to a single coordinator, which holds the global threshold T]
• Local flow-count threshold in round j: t_j = T / (2^j · n), i.e., the threshold halves every round.
• Example with n = 4 switches and global threshold T = 64: the thresholds for rounds 1–4 are 8, 4, 2, 1.
• Upper bound of O(n · log(T/n)) on the communication overhead between
the n observers and a single coordinator.
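The geometric threshold schedule can be sketched in a few lines (a toy illustration of the halving rule, not the CMY authors' code):

```python
# Sketch of the CMY per-round local threshold t_j = T / (2^j * n).
# The threshold halves each round until it reaches 1, so there are at
# most log2(T/n) rounds, giving O(n log(T/n)) total communication.

def round_thresholds(T, n):
    """Yield the local flow-count threshold for rounds j = 1, 2, ..."""
    j = 1
    while True:
        t = T / (2 ** j * n)
        if t < 1:
            break
        yield int(t)
        j += 1

print(list(round_thresholds(64, 4)))  # [8, 4, 2, 1], matching the example
```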
Approach and Solution: Exploiting Spatial Locality
[Figure: data-center topology with border switches between the Internet and the servers; traffic enters either at external ingress switches (from the Internet) or at internal ingress switches (from the servers)]
Approach and Solution: Adaptive Local Thresholds
[Figure: n switches s1 … sn report flow counts c1 … cn to the coordinator]
• Each switch starts with a uniform local flow-count threshold t = T/n.
• Thresholds then adapt to where each flow is heavy: each switch's share of a flow's traffic is tracked with an exponentially weighted moving average, and switch i's threshold becomes t_i = T · EWMA_i / ∑n(EWMA).
• Flow estimate: since a switch can hold back up to t − 1 unreported counts, the coordinator's estimate undercounts the true global count by at most n·(t − 1), which is accounted for when comparing against T.
• From one round to the next, the threshold increases where the flow is heavy and decreases where it is not.
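The reallocation step can be sketched as follows (my own minimal illustration; the smoothing factor `alpha` and the per-round bookkeeping are assumptions, not the paper's exact prototype):

```python
# Sketch: reallocate a global threshold T across n sites in proportion to
# each site's EWMA of observed traffic for a flow, so sites that see more
# of the flow get a larger local threshold t_i = T * EWMA_i / sum(EWMA).

def update_ewma(prev, observed, alpha=0.5):
    """Exponentially weighted moving average of per-round counts."""
    return alpha * observed + (1 - alpha) * prev

def local_thresholds(T, ewmas):
    total = sum(ewmas)
    if total == 0:
        return [T / len(ewmas)] * len(ewmas)   # fall back to uniform T/n
    return [T * e / total for e in ewmas]

T = 64
counts = [30, 2, 0, 0]                             # site 1 sees most of this flow
ewmas = [update_ewma(T / 4, c) for c in counts]    # start from the uniform share
thresholds = local_thresholds(T, ewmas)
assert abs(sum(thresholds) - T) < 1e-9             # thresholds still sum to T
assert thresholds[0] > thresholds[1]               # heavy site gets more headroom
```

Because the thresholds always sum to T, the total undercount bound shrinks at sites that rarely see the flow without loosening the global guarantee.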
Evaluation of the Solution
• To quantify the reduction in communication overhead, we can adjust the
following parameters:
  • Affinity probability (p)
  • Global threshold (TG)
Evaluation of the Solution
  • Number of sites (n)
• Evaluating the solution shows about a 70% reduction in communication
overhead compared to the CMY approach.
Related Work
• Frequent and Top-k Item Detection:
  • Other approaches aim to reduce space: they use compact data structures to
detect heavy hitters on a single switch.
  • The current approach instead aims to reduce communication overhead.
• Distributed Detection:
  • Some approaches stop at design considerations, whereas the current work
builds a prototype using adaptive local thresholds.
  • Other approaches reduce communication overhead but ignore the impact of
the key distribution in distributed data streams.
  • The current approach exploits spatial locality to improve on those previous
results.
Future Work
The approach can be improved in at least three ways:
(1) Reducing the amount of state switches must store in the data plane
(2) Supporting distinct counts
(3) Scaling to a large number of sites
Memory-Efficient Heavy Hitters:
• To reduce space requirements, the data plane could maintain a count-min
sketch to estimate the counts for all keys, and then store counts and
thresholds only for keys whose counts exceed some minimum size that would
qualify them as potential heavy hitters.
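The count-min idea can be sketched in plain Python (a textbook toy with hypothetical width/depth parameters, not a P4 data-plane implementation):

```python
# Toy count-min sketch: d rows of w counters, each row indexed by an
# independent hash; the estimate is the minimum over rows, so collisions
# can only make it overcount, never undercount.
import hashlib

class CountMin:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        h = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, key, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))

cm = CountMin()
for _ in range(100):
    cm.add("10.0.0.1")
assert cm.estimate("10.0.0.1") >= 100   # never undercounts
```

A switch would then keep exact per-key state only for keys whose sketch estimate already exceeds the qualifying minimum.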
Heavy Distinct Counts:
• An approximate algorithm known as HyperLogLog can compute distinct
counts (e.g., the number of unique sources contacting a given destination).
• The algorithm takes advantage of randomization to approximate distinct
counts in a distributed fashion, producing sketches that a central
coordinator can merge.
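The mergeability that makes HyperLogLog attractive here can be shown with a stripped-down register sketch (my own illustration with a toy register count; the cardinality-estimation formula is omitted):

```python
# Why HyperLogLog merges cleanly: a sketch is an array of registers
# holding the maximum "rank" (lowest-set-bit position of the hash) seen
# per bucket, and the sketch of a union of streams is just the
# element-wise max of the per-site registers.
import hashlib

M = 64  # number of registers; a toy choice

def _hash(item):
    return int(hashlib.sha256(str(item).encode()).hexdigest(), 16)

def add(registers, item):
    h = _hash(item)
    bucket, rest = h % M, h // M
    rank = 1
    while rest % 2 == 0 and rank < 64:   # position of the lowest set bit
        rest //= 2
        rank += 1
    registers[bucket] = max(registers[bucket], rank)

def merge(a, b):
    return [max(x, y) for x, y in zip(a, b)]

site1, site2, union = [0] * M, [0] * M, [0] * M
for src in range(500):            # sources seen at site 1
    add(site1, src); add(union, src)
for src in range(250, 750):       # overlapping sources seen at site 2
    add(site2, src); add(union, src)
# The coordinator's merged sketch equals the sketch of the combined traffic,
# so duplicates across sites are not double-counted.
assert merge(site1, site2) == union
```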
