The Heatmap
 - Why is Security Visualization so Hard?

This presentation explores why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. It asks how to visualize a billion events. To do so, the presentation dives deeply into heatmaps (matrices) as an example of a simple type of visualization. While heatmaps are very simple, they are incredibly versatile and help us think about the problem of security visualization. They illustrate how data mining and user experience design help get a handle on the security visualization challenges, enabling us to gain deep insight for a number of security use cases.

Published in: Internet

  1. 1. The Heatmap - Why is Security Visualization so Hard?
     Raffael Marty, CEO
     Area41, Zurich, Switzerland, June 2, 2014
  2. 2. Security. Analytics. Insight.2 Heatmaps
  3. 3. I am Raffy - I do Viz! IBM Research
  4. 4. The (New) Threat Landscape
     APT 1, Unit 61398 (61398部队)
     Attacks have changed:
     • Targeted
     • Objectives beyond monetization
     • Low and Slow
     • Multiple access vectors
     • Remotely controlled
     Motivations have changed:
     • Nation state sponsored
     • Political, economic, and military advantage
     • Monetization / Crimeware
     • Religion
     • Hacktivism
     Security approaches failed due to:
     • Reliance on past knowledge / signatures
     • Systems are too rigid (e.g., schema)
     • Poor scalability
     • Limited knowledge exchange
  5. 5. How Compromises Are Detected
     Mandiant M-Trends Report, 2014 Threat Report
     • Attackers in networks before detection: 229 days
     • Average time to resolve a cyber attack: 27 days
     • Successful attacks per company per week: 1.4
     • Average cost per company per year: $7.2M
  6. 6. Our Security Goals
     • Find Intruders and ‘New Attacks’
     • Discover Exposure Early
     • Communicate Findings
  7. 7. Visualize Me Lots (>1TB) of Data
     SecViz is Hard!
  8. 8. Visualize 1TB of Data - What Graph?
     [Figure: several graphs. Category labels: drop, reject, NONE, ctl, accept; DNS Update Failed, Log In, IP Fragments, Max Flows Initiated, Packet Flood, UDP Flood, Aggressive Aging, Bootp Renew, Log Out, Release, NACK, Conflict, DNS Update Successful, DNS record not deleted, DNS Update Request, Port Flood. Scale marks: 1, 10000, 100000000]
     How much information does each of the graphs convey?
  9. 9. The Heatmap
     Matrix A, where a_ij are integer values mapped to a color scale.
     [Figure: matrix with rows and columns; example cell a_ij = 42; color scale 1, 10, 20, 30, 40, 50, 60, 70, 80, >90]
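The color-scale legend on this slide (1, 10, 20, ..., 80, >90) can be read as a list of bucket boundaries. The following is a minimal Python sketch of that reading, not the presenter's implementation. It assumes that a value of 0 means an empty cell and that each legend entry is the lower bound of its bucket; `BUCKET_BOUNDS` and `color_bucket` are hypothetical names.

```python
from bisect import bisect_right

# Lower bounds of the color buckets, read off the slide's legend.
# Assumption: ">90" means 91 and above, and 0 means "no color".
BUCKET_BOUNDS = [1, 10, 20, 30, 40, 50, 60, 70, 80, 91]


def color_bucket(value):
    """Return 0 for an empty cell, else the 1-based index of the color bucket."""
    return bisect_right(BUCKET_BOUNDS, value)


def to_buckets(matrix):
    """Map every integer a_ij of matrix A to its color bucket."""
    return [[color_bucket(v) for v in row] for row in matrix]
```

With these assumptions the example cell value 42 lands in the fifth bucket (the one starting at 40), and anything from 91 upward lands in the last one.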
  10. 10. Mapping Data to a Heatmap
     • rows = source ip
     • columns = time
     • values = how often was <row_item> seen
  11. 11. Mapping Log Records to Heatmaps
     May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)
     • rows: root, ram, peg, sue
     • columns: time bins of width ∆t
     • each record that falls into a cell updates it with f() = +1
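The mapping from a log record to a heatmap cell can be sketched in Python as follows. This is an illustration under stated assumptions, not the presenter's code: the row is taken to be the invoking user (the name after "by"), the year is assumed to be 2014 because syslog timestamps omit it, the time bin ∆t defaults to one hour, and `LINE_RE` and `add_record` are hypothetical names.

```python
import re
from collections import Counter
from datetime import datetime

# Matches sudo "session opened" records like the one on the slide.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ sudo: .*"
    r"session opened for user (?P<target>\S+) by (?P<user>\w+)\("
)


def add_record(heatmap, line, bin_seconds=3600, year=2014):
    """Apply f() = +1 to the (user, time bin) cell that the record falls into."""
    m = LINE_RE.match(line)
    if m is None:
        return  # not a sudo session record; ignore it
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
    # Time bins are counted from the start of the assumed year.
    seconds = int((ts - datetime(year, 1, 1)).total_seconds())
    heatmap[(m["user"], seconds // bin_seconds)] += 1


heatmap = Counter()
add_record(
    heatmap,
    "May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): "
    "session opened for user root by ram(uid=0)",
)
```

A sparse `Counter` keyed by (row, column) is used here instead of a dense matrix because most cells of a security heatmap stay empty.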
  14. 14. Why Heatmaps?
     • Scales well to a lot of data (can aggregate ad infinitum)
     • Shows more information than a bar chart
     • Flexible ‘measure’ mapping
        • frequency count
        • sum(variable) [avg(), stddev(), …]
        • distinct count(variable)
     • BUT information content is limited!
     • Aggregates too highly in time and potentially value dimensions
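The three measures named on this slide (frequency count, sum, distinct count) differ only in how the values that fall into one cell are reduced. A minimal Python sketch, assuming records arrive as (row, column, value) tuples; the function name `aggregate` and the measure keys are hypothetical:

```python
from collections import defaultdict


def aggregate(records, measure):
    """Reduce (row, column, value) records to one number per heatmap cell."""
    cells = defaultdict(list)
    for row, col, value in records:
        cells[(row, col)].append(value)
    reducers = {
        "count": len,                             # frequency count
        "sum": sum,                               # sum(variable)
        "distinct": lambda vs: len(set(vs)),      # distinct count(variable)
    }
    reduce_fn = reducers[measure]
    return {cell: reduce_fn(values) for cell, values in cells.items()}
```

Collecting all values per cell keeps the sketch short; for large data each measure would more likely be maintained incrementally.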
  16. 16. Data Visualization Workflow: Overview -> Zoom / Filter -> Details on Demand
  17. 17. Data Visualization Workflow - Overview
     Heatmap
     • Can pack millions of records (although highly aggregated)
     • Allows for zoom-in to expose detail
     • By itself exposes patterns
     • Great ‘navigation’ tool to drill into different, ‘non-scalable’ visualizations
     • No other visualization possesses these properties
  18. 18. HeatMap Challenges - Display: 1. Labels
     • 1000s of rows
     • <1px per label
  19. 19. HeatMap Challenges - Display: 2. Mouse-Over
     • What information to show?
        • Position - x/y coordinates
        • Original records
     • Query backend for each position?
  20. 20. HeatMap Challenges - Display: 3. Sorting
     • Random
     • Alphabetically
     • Based on values
     • Similarity
        • What algorithm?
        • What distance metric?
     • Leverage third data field / context?
     [Figure: heatmap of users in random row order and with rows clustered]
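The slide leaves the similarity algorithm and the distance metric open. As one possible answer, and only as an illustration, the Python sketch below orders rows with a greedy nearest-neighbour pass over Euclidean distances. Both choices are assumptions, and `seriate` is a hypothetical name.

```python
import math


def seriate(rows):
    """Greedy nearest-neighbour ordering of heatmap rows.

    rows maps a row label to its list of cell values. Starting from the
    first label, repeatedly append the unplaced row that is closest
    (Euclidean distance) to the row placed last.
    """
    labels = list(rows)
    if not labels:
        return []
    order = [labels[0]]
    remaining = set(labels[1:])
    while remaining:
        last = rows[order[-1]]
        # sorted() makes ties deterministic.
        nearest = min(sorted(remaining), key=lambda lbl: math.dist(last, rows[lbl]))
        order.append(nearest)
        remaining.remove(nearest)
    return order
```

This runs in quadratic time in the number of rows, so for thousands of rows a clustering step or a sample would be needed first.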
  21. 21. HeatMap Challenges - Display: 4. Overplotting
     • How to summarize multiple rows in one pixel?
        • Sum?
     • Overplot x and y axes?
     • Undo overplot on zoom?
     [Figure: 1 row -> 1 pixel; n rows -> 1 pixel; 1 row -> m pixels; a brace marked ∑]
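For the "n rows -> 1 pixel" case, summing is the option the slide itself raises. A minimal Python sketch of that option, assuming the heatmap is a list of equally long rows; `collapse_rows` is a hypothetical name:

```python
def collapse_rows(matrix, pixel_rows):
    """Sum groups of consecutive rows so the matrix fits into pixel_rows rows."""
    n = len(matrix)
    if n <= pixel_rows:
        # Enough pixels for every row: nothing to overplot.
        return [list(row) for row in matrix]
    collapsed = []
    for p in range(pixel_rows):
        start = p * n // pixel_rows
        end = (p + 1) * n // pixel_rows
        group = matrix[start:end]
        # Column-wise sum over the rows that share this pixel row.
        collapsed.append([sum(column) for column in zip(*group)])
    return collapsed
```

Undoing the overplot on zoom then amounts to calling the function again on the selected row range with the same pixel budget.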
  22. 22. HeatMap Challenges - Interaction: 1. Time Selection
     • Take screen resolution into account (you have 1000 pixels and you query 1005 seconds?)
     • Choose start AND end time?
     • Communicate to user what data is available?
     [Figure: time selector with start time and end time]
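The 1000-pixel, 1005-second example can be worked through with a little arithmetic. The Python sketch below is one way to do it, assuming whole-second bins and that the bin width is widened until the columns fit on screen; `plan_time_bins` is a hypothetical name.

```python
import math


def plan_time_bins(start, end, pixels):
    """Pick a whole-second bin width so the columns fit into the pixels.

    start and end are epoch seconds. Returns (bin_seconds, columns).
    """
    span = end - start
    bin_seconds = max(1, math.ceil(span / pixels))
    columns = math.ceil(span / bin_seconds)
    return bin_seconds, columns
```

For the slide's example, `plan_time_bins(0, 1005, 1000)` returns `(2, 503)`: one-second bins would need 1005 columns, so the width doubles and roughly half of the screen stays unused. Snapping the end time to 1000 seconds instead would keep one-second bins, which is the kind of trade-off the slide points at.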
  23. 23. HeatMap Challenges - Interaction: 2. Zoom and Pan
     • Re-query for more detail?
  24. 24. HeatMap Challenges - Interaction: 3. Color Scales / Ranges
     • discrete
     • continuous
     • different colors
     • multiple anchors
  25. 25. HeatMap Challenges - Interaction: 4. Exposure - Mapping data to color
     [Figure: histogram of values against frequency; dark colors are underutilized]
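When a few cells hold very large values, a linear mapping squeezes almost all cells into one end of the color scale. The slides do not say which remedy was used; logarithmic scaling is one common choice and is shown here purely as an assumption. `linear_level` and `log_level` are hypothetical names.

```python
import math


def linear_level(value, max_value, levels=10):
    """Linear mapping of a cell value to a color level 0..levels-1."""
    if max_value <= 0:
        return 0
    return min(levels - 1, int(value / max_value * levels))


def log_level(value, max_value, levels=10):
    """Logarithmic mapping; spreads small values over more color levels."""
    if value <= 0 or max_value <= 0:
        return 0
    fraction = math.log1p(value) / math.log1p(max_value)
    return min(levels - 1, int(fraction * levels))


values = [1, 2, 5, 10, 1000]
linear = [linear_level(v, 1000) for v in values]  # [0, 0, 0, 0, 9]
logged = [log_level(v, 1000) for v in values]     # [1, 1, 2, 3, 9]
```

With the linear mapping, levels 1 through 8 are never used for this data; the logarithmic mapping separates the small values while keeping the outlier at the top.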
  26. 26. HeatMap Challenges - Interaction: 5. Pivot
     Rows pivot from destinationAddress to sourceAddress WHERE destinationAddress = 81.223.6.41
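The pivot shown here (select one destinationAddress, then redraw the heatmap with sourceAddress rows) can be sketched in Python as a filter followed by a re-aggregation. The event layout is an assumption: each event is a dict with the field names from the slide plus a hypothetical `time` key in epoch seconds, and `pivot` is a hypothetical name.

```python
from collections import Counter


def pivot(events, where_field, where_value, row_field, bin_seconds=60):
    """Count events per (row_field value, time bin) among events that match the filter."""
    cells = Counter()
    for event in events:
        if event[where_field] == where_value:
            cells[(event[row_field], event["time"] // bin_seconds)] += 1
    return cells


events = [
    {"time": 0, "sourceAddress": "10.0.0.1", "destinationAddress": "81.223.6.41"},
    {"time": 30, "sourceAddress": "10.0.0.1", "destinationAddress": "81.223.6.41"},
    {"time": 61, "sourceAddress": "10.0.0.2", "destinationAddress": "81.223.6.41"},
    {"time": 5, "sourceAddress": "10.0.0.3", "destinationAddress": "10.0.0.4"},
]
by_source = pivot(events, "destinationAddress", "81.223.6.41", "sourceAddress")
```

In a real deployment the filter would be pushed down to the backend as the WHERE clause rather than applied in application code.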
  28. 28. HeatMap Challenges - Backend
     Different backend technologies (big data):
     • Key-value store
     • Search engine
     • GraphDB
     • RDBMS
     • Columnar - can answer analytical questions
     • Hadoop (Map Reduce) - good for operations on ALL data
     Other things to consider:
     • Caching
     • Joins
  29. 29. What’s the HeatMap Not Good At
     • Showing relationships -> link graphs
     • Showing multiple dimensions and their inter-relatedness -> parallel coordinates
  30. 30. Heatmaps Are Good Starting Points … BUT
     Overview -> Zoom / Filter -> Details on Demand
  31. 31. Leverage Data Mining to Summarize Data
     Overview -> Zoom / Filter -> Details on Demand
     Overview:
     • Leverage data mining (clustering) to create an overview
     • Summarizing dozens of dimensions into a two-dimensional overview
  32. 32. Self Organizing Maps
     • Clustering based on a single data dimension
        • for example “attackers”
     • It’s hard to
        • engineer the right features
        • avoid over-learning
        • interpret the clusters
     [Figure: self-organizing map with 3 clusters, labeled 1, 2, 3]
  33. 33. Examples
  34. 34. Vincent
     This heatmap shows behavior over time.
     In this case, we see activity per user. We can see that ‘vincent’ is visually different from all of the other users. He shows up very lightly over the entire time period. This seems to be something to look into.
     We were able to find this purely visually, without understanding the data.
  35. 35. Firewall Heatmap
  36. 36. Showing Activity per Destination Address
  37. 37. Changing Color Exposure
  38. 38. Zoom In
  39. 39. Pivot to Source Address
  40. 40. Seriate
  41. 41. Expanding Detail
     [Figure labels: source, destination port, source port]
  42. 42. Intra-Role Anomaly - Random Order
     • rows = users
     • columns = time
     • values = dc(machines)
  43. 43. Intra-Role Anomaly - With Seriation
  44. 44. Intra-Role Anomaly - Sorted by User Role
     Roles: Administrator, Sales, Development, Finance
     Annotation: Admin???
  46. 46. Firewall Data
     • Millions of rows
     • High-cardinality fields
     • Where to start analysis?
        • Formulate some hypotheses
        • Informs visualization process and data preparation
     • Our hypothesis and assumption
        • Machines that get passed and blocked might be of interest
        • Low-frequency sources are not interesting

     firewall data       data type     cardinality   distribution
     source ip           ipv4          10-10^6       depends
     dest ip             ipv4          10-10^6       depends
     source port         int           65535         depends
     dest port           int           65535         highly skewed
     bytes in/out        int           -             skewed
     action              bool / int    3             -
     direction / iface   bool / str    small         -
  47. 47. Visual Mapping
     • rows = source (10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4)
     • columns = time bins of width ∆t (aggregation)
     • color mapping: block & pass, block, pass
  48. 48. Low-Frequency Behavior
     [Figure: two heatmaps of source ip rows with sum <= 10, one outbound and one inbound; 36k rows]
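The sum <= 10 cut-off that defines low-frequency sources can be applied as a simple split over per-row totals. A minimal Python sketch, assuming each source row is a list of per-bin counts; `split_by_frequency` is a hypothetical name:

```python
def split_by_frequency(row_counts, threshold=10):
    """Split heatmap rows into low-frequency and high-frequency sources.

    row_counts maps a source IP to its per-time-bin counts. A row is
    low-frequency when its total is at most the threshold.
    """
    low = {ip: counts for ip, counts in row_counts.items() if sum(counts) <= threshold}
    high = {ip: counts for ip, counts in row_counts.items() if sum(counts) > threshold}
    return low, high
```

Plotting the two groups separately keeps the many sparse rows from crowding out the few busy ones.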
  49. 49. Outbound Blocks - What’s That?
     Oct 25 11:56:14.123128 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 3660196221:3660197653(1432) ack 906644 win 32936 (DF)
     Oct 25 11:57:18.140007 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
     Oct 25 11:58:22.156195 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)
     Oct 25 11:59:26.170915 rule 238/0(match): block in on xl1: 212.254.110.98.80 > 195.186.129.127.1338: . 0:1432(1432) ack 1 win 32936 (DF)

     less pflog.txt | grep xl1 | grep "rule 238" | sed -e 's/\(Oct .. ..\):..:..\........*/\1/' | uniq -c
       6 Oct 25 03
       8 Oct 25 05
       3 Oct 25 06
      25 Oct 25 07
       9 Oct 25 08
     117 Oct 25 09
     127 Oct 25 10
     169 Oct 25 11
     178 Oct 25 12
     158 Oct 25 13
     187 Oct 25 14
     354 Oct 25 15
     111 Oct 25 16
     104 Oct 25 17
      33 Oct 25 18
      17 Oct 25 19

     A clear increase in rule 238 traffic
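The same per-hour count can be sketched in Python, which is convenient when the result feeds a heatmap directly. This is an illustrative equivalent of the shell pipeline, not part of the original slides; `HOUR_RE` and `hourly_counts` are hypothetical names, and the timestamp format is assumed to match the records shown.

```python
import re
from collections import Counter

# Captures "Mon DD HH" from timestamps such as "Oct 25 11:56:14.123128 ".
HOUR_RE = re.compile(r"^(\w{3} +\d+ \d{2}):\d{2}:\d{2}\.\d+ ")


def hourly_counts(lines, interface="xl1", rule="rule 238"):
    """Count records per hour that mention both the interface and the rule."""
    counts = Counter()
    for line in lines:
        # Plain substring tests, like the two grep stages.
        if interface in line and rule in line:
            m = HOUR_RE.match(line)
            if m:
                counts[m.group(1)] += 1
    return counts
```

Unlike `uniq -c`, which only merges adjacent lines, the counter groups by hour regardless of order; for a chronologically sorted log both give the same totals.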
  50. 50. High Frequency Sources Over Time
     [Figure: heatmap of sources with sum > 10; 672 rows; color mapping: block & pass, block, pass]
  51. 51. High Frequency Traffic Split Up
     [Figure: inbound and outbound heatmaps. Row labels: 192.168.0.201, 195.141.69.42, 195.141.69.43, 195.141.69.44, 195.141.69.45, 195.141.69.46, 212.254.110.100, 212.254.110.101, 212.254.110.107, 212.254.110.108, 212.254.110.109, 212.254.110.110, 212.254.110.98, 212.254.110.99, 62.245.245.139]
  52. 52. Outbound Traffic - Some Questions To Ask
     • What happened mid-way through?
     • Why is anything outbound blocked?
     • What are the top and bottom machines doing?
     • Did we get a new machine into the network?
     • Did some machines go away?
     Row label: 195.141.69.42
  53. 53. 195.141.69.42 - Interactions
     [Figure labels: action, port, dest]
  54. 54. Zooming in on Top Rows
     [Figure: heatmap rows 212.254.110.100 through 212.254.110.127, 212.254.110.66, and 212.254.110.96 through 212.254.110.99]
     • Hardly any pass-block
     • Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)
     • 212.254.110.102: Oct 16 13:14:05.627835 rule 0/0(match): pass in on xl0: 66.220.17.151.80 > 212.254.110.102.1977: S 1841864015:1841864019(4) ack 1308753921 win 16384 (DF)
     • SYN ACK for real Web traffic passed
  56. 56. This Guy Sure Keeps Busy
     212.254.144.40 [Figure label: dest port]
  57. 57. Recap
     • Attackers are very successful
     • Data could reveal adversaries
     • We have a big data analytics problem
     • We need the right analytics and visualizations
     • Security visualization is hard
     • Data visualization workflow is a promising approach
     • Heatmaps are great for overviews
     • We need a set of heuristics and workflows
  58. 58. raffael.marty@pixlcloud.com
