Effective multimodel anomaly detection using cooperative negotiation

Many computer protection tools incorporate learning techniques that build mathematical models to capture the characteristics of a system's activity and then check whether the live activity fits the learned models. This approach, referred to as anomaly detection, has enjoyed immense popularity because of its effectiveness at recognizing unknown attacks, under the assumption that attacks cause observable deviations in the protected system. Typically, instead of building a single complex model, smaller partial models are constructed, each capturing different features of the monitored activity. Such a multimodel paradigm raises the non-trivial issue of combining the outputs of the partial models to decide whether or not the activity contains signs of an attack. Various mechanisms can be chosen, ranging from a simple weighted average to Bayesian networks or more sophisticated strategies. In this paper we show how the choice of aggregation function influences detection accuracy. To mitigate this issue we propose a radically different approach: rather than treating the aggregation as a calculation, we formulate it as a decision problem, implemented through cooperative negotiation between autonomous agents. We validate the approach on a publicly available, realistic dataset and show that it improves detection accuracy with respect to a system that uses elementary aggregation mechanisms.



  1. 1. Effective multimodel anomaly detection using cooperative negotiation Alberto Volpatto, Federico Maggi, Stefano Zanero DEI, Politecnico di Milano
  2. 2. Anomaly detection. Example: detecting malicious HTTP messages
  3. 3. Anomaly detection. Example: detecting malicious HTTP messages. [Diagram: a bad guy sends a malicious HTTP request (GET /login/id/<script>..</script>) to a dynamic web page; the resulting malicious HTTP response (<script>iInfectPCs();</script>) redirects an unlucky client to the bad guy's page at www.iSpreadMalware.org, whose malicious HTTP response attacks a 3rd-party plugin.]
  4. 4. Anomaly detection. Modeling non-malicious messages to find malicious ones. [Diagram: clients sending millions of good HTTP messages to a webserver.]
  5. 5. Anomaly detection. Modeling non-malicious messages to find malicious ones. [Diagram: clients sending millions of good HTTP messages to a webserver.]
  6. 6. Learning phase. [Diagram: models of good messages M1, M2, M3, ..., Mn built from client-to-webserver traffic.]
  7. 7. Learning phase. [Diagram: models of good messages M1, M2, M3, ..., Mn built from client-to-webserver traffic.]
  8. 8. Learning phase. [Diagram: models of good messages M1, M2, M3, ..., Mn built from client-to-webserver traffic.]
  9. 9. Learning phase. Example of models: parameter string length, numeric range, probabilistic grammar of strings, string character distribution. Example request: GET /page?uid=u44&p=14&do=delete [Diagram: client-to-webserver traffic feeding the models M1, M2, M3, ..., Mn.]
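To make these partial models concrete, here is a minimal Python sketch of a parameter string length model; the class name, the mean/variance summary, and the scoring formula are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of one partial model (parameter string length), assuming a
# simple mean/variance summary. It learns the lengths of benign parameter values
# and returns a partial anomaly value in [0, 1].
import math

class StringLengthModel:
    def __init__(self):
        self.lengths = []

    def learn(self, value: str) -> None:
        self.lengths.append(len(value))

    def anomaly(self, value: str) -> float:
        mean = sum(self.lengths) / len(self.lengths)
        var = sum((l - mean) ** 2 for l in self.lengths) / len(self.lengths)
        std = math.sqrt(var) or 1.0          # avoid division by zero
        z = abs(len(value) - mean) / std
        return 1.0 - 1.0 / (1.0 + z * z)     # far from the observed mean -> close to 1

model = StringLengthModel()
for benign in ["u44", "u43", "u512"]:                   # benign values of the "uid" parameter
    model.learn(benign)
print(model.anomaly("u45"))                             # low partial anomaly value
print(model.anomaly("<script>alert(1)</script>"))       # high partial anomaly value
```

Each partial model of this kind emits only a partial anomaly value; deciding what to do with n such values is the aggregation issue discussed in the following slides.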
  10. 10. Learning phase. Models of good sessions. Example request: GET /page?uid=u43&p=10&do=add [Diagram: sequences of session components C1, C2, C3, C7, C10, ... associated with the models M1, M2, ..., Mn.]
  11. 11. Learning phase. Example request: GET /page?uid=s10&do=add [Diagram: sequences of session components C1, C2, C3, C5, C7, C10, ... associated with the models M1, M2, ..., Mn.]
  12. 12. Detection phase. Detection of bad messages. Example request: GET /page?do=<script>MaliciousCode(); [Diagram: the incoming request is checked against the models M1, M2, M3, ..., Mn.]
  13. 13. Issue
  14. 14. Anomaly value aggregation • Not always well formalized (exceptions exist) • Each model gives a partial anomaly value • Issue: combining partial anomaly values is not trivial • simple average (too simple) • weighted average (how to set weights?) • Bayesian networks (how to tune them easily?) • etc. (ample literature on the subject)
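As a concrete illustration of why the aggregation function matters, the sketch below compares a simple and a weighted average on made-up partial anomaly values; the numbers, weights, and alert threshold are arbitrary assumptions.

```python
# Made-up partial anomaly values: model 1 flags the request, the others do not.
partial = [0.95, 0.10, 0.15, 0.20]
weights = [0.90, 0.20, 0.20, 0.20]   # trust levels -- choosing them is exactly the open issue

simple   = sum(partial) / len(partial)
weighted = sum(p * w for p, w in zip(partial, weights)) / sum(weights)

threshold = 0.5                      # arbitrary alert threshold
print(f"simple average:   {simple:.2f} -> {'alert' if simple > threshold else 'pass'}")
print(f"weighted average: {weighted:.2f} -> {'alert' if weighted > threshold else 'pass'}")
```

The same request is passed or flagged depending purely on how the partial values are combined, and the weighted variant only works if the weights are chosen well, which is exactly the question the negotiation approach addresses.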
  15. 15. Proposed approach
  16. 16. Proposed approach • Models treated as autonomous agents
  17. 17. Proposed approach • Models treated as autonomous agents • Overall anomaly value negotiated iteratively through a mediator
  18. 18. New detection phase. [Diagram: an HTTP request q travels from the client toward the webserver; the partial models (i.e., agents) M1, M2, ..., Mn each inspect it and interact with a mediator.]
  19. 19. Partial anomaly values. [Diagram: at iteration t, each agent Mi sends its partial anomaly value p_i^t for the request q to the mediator.]
  20. 20. Anomaly value. [Diagram: the mediator sends the current anomaly value a^t back to every agent.]
  21. 21. Partial anomaly values. [Diagram: at iteration t + 1, each agent replies with an updated partial anomaly value p_i^{t+1}.]
  22. 22. . . . (the exchange repeats)
  23. 23. Until agreement. All partial anomaly values are equal: p_1^t = p_2^t = · · · = p_n^t. The overall anomaly value is then selected: a = a^t.
  24. 24. Negotiation function: p_i^{t+1} = F_i(p_i^t, a^t) = p_i^t + α_i (a^t − p_i^t), where p_i^{t+1} is the partial anomaly value of agent i at the next iteration, a^t is the anomaly value at the current iteration, and α_i is the agreement coefficient of each agent.
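As a worked single step of this update (with illustrative numbers, not values from the paper): if an agent's current offer is p_i^t = 0.9, the mediator's current anomaly value is a^t = 0.5, and the agent's agreement coefficient is α_i = 0.25, then p_i^{t+1} = 0.9 + 0.25 · (0.5 − 0.9) = 0.8, i.e., the agent concedes a quarter of the gap between its offer and the mediator's value.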
  25. 25. Agreement coefficient: α_i = f_α(w_i) = 1 / (1 + e^{h (w_i − k)}), where the weights w_i are the trust levels of each agent (i.e., partial model) and h, k are tuning parameters.
  26. 26. Agreement coefficient α_i = 1 / (1 + e^{h (w_i − k)}) • values close to one: change the i-th offer • values close to zero: preserve the i-th offer
  27. 27. Agreement function: a^t = f(p_i^t, w_i) = Σ_i p_i^t w_i / Σ_i w_i. At each iteration the agreement is, e.g., the weighted average of the partial anomaly values p_i^t across all the agents, with the weights w_i being the trust levels of each agent (i.e., partial model).
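Putting slides 24–27 together, the following is a minimal Python sketch of the whole negotiation loop; the function names, the default h and k, the stopping tolerance, the iteration cap, and the example inputs are assumptions made for illustration, not values from the paper.

```python
import math

def agreement_coefficient(w, h=10.0, k=0.5):
    """alpha_i = 1 / (1 + e^{h (w_i - k)}): high trust -> small alpha -> preserve the offer."""
    return 1.0 / (1.0 + math.exp(h * (w - k)))

def negotiate(partial, trust, h=10.0, k=0.5, tol=1e-6, max_iter=1000):
    """Iterate the offers p_i until the agents (approximately) agree; return the final a."""
    alpha = [agreement_coefficient(w, h, k) for w in trust]
    p = list(partial)                                                # initial offers p_i^0
    a = sum(pi * wi for pi, wi in zip(p, trust)) / sum(trust)        # agreement a^0
    for _ in range(max_iter):
        if max(p) - min(p) < tol:                                    # p_1^t ~= ... ~= p_n^t
            break
        p = [pi + ai * (a - pi) for pi, ai in zip(p, alpha)]         # p_i^{t+1}
        a = sum(pi * wi for pi, wi in zip(p, trust)) / sum(trust)    # a^{t+1} (weighted average)
    return a

# Example: one highly trusted model flags the request, three low-trust models do not.
print(negotiate(partial=[0.95, 0.10, 0.15, 0.20], trust=[0.90, 0.20, 0.20, 0.20]))
```

Because α_i decreases with the trust level, the highly trusted agent barely moves while the low-trust agents converge toward it, so the final anomaly value ends up close to the trusted agent's 0.95 rather than the 0.35 that a plain average of these offers would give.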
  28. 28. Modification of the learning phase • A trust level for each model is needed • Typically, modern tools compute it during learning [Criscione et al., EC2ND 2010] • Examples: standard deviation of string length model, kurtosis measure of integer models
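A hedged sketch of how a trust level could be derived from such a statistic; the mapping from the length spread to [0, 1] is an assumption chosen for illustration, not the formula used in [Criscione et al., EC2ND 2010].

```python
import statistics

def string_length_trust(observed_values):
    """Map the spread of observed parameter lengths to a trust level in [0, 1]:
    the narrower the length distribution, the more the model can be trusted."""
    lengths = [len(v) for v in observed_values]
    mean = statistics.mean(lengths)
    stdev = statistics.pstdev(lengths)
    return 1.0 / (1.0 + stdev / (mean or 1.0))   # illustrative mapping, not the paper's

print(string_length_trust(["u44", "u43", "s10"]))        # tight lengths -> high trust
print(string_length_trust(["a", "hello world", "xy"]))   # spread-out lengths -> lower trust
```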
  29. 29. Trust levels constant over time: w_i^t = w_i^{t+1} = · · · = w_i • Two-way communication between agents and mediator is not needed • The mediator would just receive the initial offers (i.e., partial anomaly values) and run the iterative algorithm
  30. 30. Optimized learning • Monitor the trust level of each model using a simple sliding window W • When the j-th learning sample comes in, compute δ_W(j) := max_{j∈W} w_i^j − min_{j∈W} w_i^j • Stop learning when “stability” is reached, i.e., δ_W(j) ≤ ε
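A minimal sketch of this sliding-window stopping rule; the window size, the threshold ε, and the synthetic trust-level stream are illustrative assumptions.

```python
from collections import deque

def learn_until_stable(trust_stream, window=50, eps=0.01):
    """Consume per-sample trust levels w_i^j and stop once the spread
    delta_W(j) = max(W) - min(W) over the last `window` samples is <= eps."""
    W = deque(maxlen=window)
    for j, w in enumerate(trust_stream):
        W.append(w)
        if len(W) == window and max(W) - min(W) <= eps:
            return j          # index of the sample at which learning can stop
    return None               # trust level never stabilized on this stream

# Example: a trust level that oscillates early on, then settles around 0.8.
stream = [0.5 + 0.3 * (1 - 1 / (j + 1)) for j in range(500)]
print(learn_until_stable(stream))
```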
  31. 31. Stop learning when trust is “stable” [plot: the trust level of a model settling as training samples arrive]
  32. 32. Evaluation • Background traffic: clean UCSB iCTF 2008 data (22,051 HTTP request-response pairs: 14,961 attack-free training samples, 7,090 attack-free testing samples) • Attack data: instances of the most popular real-world attacks (SQL injections, JavaScript injections, command injections), inserted with random mutations
  33. 33. Impact of modifications [plot: DR versus FPR for the weighted average, cooperative negotiation, and cooperative negotiation with optimized learning]
  34. 34. Limitations (1), i.e., future work 1. strict convergence not formally proven • but the parameters influence detection quality only minimally, and predictably • Mitigation: choose h and k so as to guarantee convergence
  35. 35. Convergence in practice [plot: iterations to convergence versus k, for h = 2.5 to 25.0]
  36. 36. Convergence in practice [plot: iterations to convergence versus k, for h = 2.5 to 25.0]
  37. 37. Convergence in practice [plot: iterations to convergence versus k, for h = 2.5 to 25.0]
  38. 38. h, k versus FPR [plot: FPR versus k, for h = 2.5, 5.0, 7.5, 10.0]
  39. 39. h, k versus DR [plot: DR versus k, for h = 2.5, 5.0, 7.5, 10.0]
  40. 40. Limitations (2), i.e., future work 2. trust level independent of the input • possible concept drift is not taken into account • Mitigation: already addressed by other approaches [Maggi et al., RAID 2009]
  41. 41. Limitations (3), i.e., future work 3. relax the cooperative assumption • important in the case of distributed detection • agents may cheat because: • their training base is outdated (mitigation: [Robertson et al., NDSS 2010]) • intruders have taken over a detector
  42. 42. Conclusions • very simple to implement • distributed detection is “embedded” in the model • the meaning of the weights is now well defined • easy to generalize to more complex schemes • within the scope of our preliminary evaluation, it works against real-world attacks
  43. 43. Questions? Alberto Volpatto, Federico Maggi, Stefano Zanero fmaggi@elet.polimi.it http://home.dei.polimi.it/fmaggi/ http://www.vplab.elet.polimi.it
