The extent and impact of recent security breaches show that current approaches are just not working. But what can we do to protect our business? We have been advocating monitoring for a long time as a way to detect subtle, advanced attacks. However, products have failed to deliver on this promise. Current solutions scale in neither data volume nor analytical insight. In this presentation we will explore why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. We will explore the question of how to visualize a billion events, and look at a number of security visualization examples that illustrate the problem and some possible solutions. These examples will also help illustrate how data mining and user experience design help us get a handle on the security visualization challenges, enabling us to gain deep insight for a number of security use cases.
2. Visualization - Heatmaps
Security. Analytics. Insight.
3. Visualization - Graphs
4. I am Raffy - I do Viz!
IBM Research
5. How Compromises Are Detected
• 229 days: attackers in networks before detection
• 27 days: average time to resolve a cyber attack
• 1.4: successful attacks per company per week
• $7.2M: average cost per company per year
Mandiant M-Trends 2014 Threat Report
6. Our Security Goals
Find Intruders and ‘New Attacks’
Discover Exposure Early
Communicate Findings
7. Why Visualization?
the stats ...
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
the data...
9. Visualize Me Lots (>1TB) of Data
SecViz is Hard!
12. It’s Hard - Understanding Data
• We don’t understand the data / logs
• Single log entry:
Mar 16 08:09:48 kernel: [0.00000] Normal 1048576 -> 1048576
• Absence of logs? Logging configuration?
• Collection of logs
• Understanding context (setup, business processes)
• Is this normal?
2011-07-22 20:34:51 282 ce6de14af68ce198 - - - OBSERVED "unavailable" http://www.surfjunky.com/members/sj-a.php?r=44864 200 TCP_NC_MISS GET text/html http www.surfjunky.com 80 /members/sj-a.php ?r=66556 php "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.65 Safari/534.24" 82.137.200.42 1395 663 -
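A first step toward understanding a record like the proxy entry above is simply to split it into fields. The following is a minimal Python sketch, assuming the record is whitespace-separated with double-quoted strings. It deliberately assigns no field names, because the log format definition is not given here; the function name `tokenize` is made up for this illustration.

```python
import shlex

# The proxy record from the slide, joined into a single line.
RECORD = (
    '2011-07-22 20:34:51 282 ce6de14af68ce198 - - - OBSERVED "unavailable" '
    'http://www.surfjunky.com/members/sj-a.php?r=44864 200 TCP_NC_MISS GET '
    'text/html http www.surfjunky.com 80 /members/sj-a.php ?r=66556 php '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.24 (KHTML, like Gecko) '
    'Chrome/11.0.696.65 Safari/534.24" 82.137.200.42 1395 663 -'
)


def tokenize(record: str) -> list[str]:
    """Split on whitespace while keeping double-quoted strings together."""
    return shlex.split(record)


fields = tokenize(RECORD)
# Print position and value so each field can be looked up in the
# (unknown here) format definition of the proxy.
for position, value in enumerate(fields):
    print(position, value)
```

Even with the record split cleanly, the meaning of most positions still has to come from the product documentation and from context, which is the point of the slide.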
13. It’s Hard - The Right Data
Situational Awareness
Security Monitoring
Data Exfiltration
‣ DNS traffic
Fraud
‣ HTTP header sequences
‣ Application logs
‣ DB logs
‣ context feeds!
‣ Application logs
‣ DLP
‣ Proxies
Phishing et al.
‣ email logs
‣ Are we focusing on the right data sources?
‣ Everyone focuses on
‣ Traffic flows
‣ IDS data
Zero Days
Botnet / Malware infections
14. It’s Hard - Mapping the Data
Oct 13 20:00:05.680894 rule 57/0(match): pass in on xl1: 195.141.69.45.1030 > 217.12.4.104.53: 7040 [1au] A? mx1.mail.yahoo.com. (47) (DF)
1. Understand all elements
2. Which fields are important?
3. Do we need more context?
4. What do we want to see?
- Time-behavior?
- Relationships?
5. How much data do we have? What graph will scale to that?
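Steps 1 and 2 above can be sketched in Python for the firewall record shown on this slide. This is a minimal illustration, not a complete parser: the regular expression and the field names (`timestamp`, `rule`, `action`, and so on) are assumptions chosen for this example, and only IPv4 records of this exact shape are handled.

```python
import re

# Assumed layout, derived only from the sample line on the slide:
# <timestamp> rule <id>(match): <action> <direction> on <interface>: src.port > dst.port: <rest>
PF_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:.]+)\s+"
    r"rule\s+(?P<rule>\S+?)\(match\):\s+"
    r"(?P<action>pass|block)\s+(?P<direction>in|out)\s+on\s+(?P<interface>\w+):\s+"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+)\.(?P<src_port>\d+)\s+>\s+"
    r"(?P<dst_ip>\d+\.\d+\.\d+\.\d+)\.(?P<dst_port>\d+):\s*"
    r"(?P<rest>.*)$"
)


def parse_pf_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = PF_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["src_port"] = int(record["src_port"])
    record["dst_port"] = int(record["dst_port"])
    return record


LINE = (
    "Oct 13 20:00:05.680894 rule 57/0(match): pass in on xl1: "
    "195.141.69.45.1030 > 217.12.4.104.53: 7040 [1au] A? mx1.mail.yahoo.com. (47) (DF)"
)
record = parse_pf_line(LINE)
print(record["action"], record["src_ip"], record["dst_ip"], record["dst_port"])
```

Once the fields are named, deciding which of them matter (step 2) becomes a question of choosing dictionary keys rather than re-reading raw text.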
15. Visualize 1TB of Data - What Graph?
[Figure: several chart types drawn from the same event data. Category labels include drop, reject, NONE, ctl, accept and event types such as DNS Update Failed, Log In, IP Fragments, Packet Flood, UDP Flood, and Port Flood; the count axis runs from 1 to 100000000.]
How much information does each of the graphs convey?
16. It Is Hard - IP Addresses
FOCUS
Info-Viz =
Sec-Viz =
18. Data Visualization Workflow
Overview -> Zoom / Filter -> Details on Demand
21. Overview - The Heatmap
Matrix A, where the entries a_ij are integer values mapped to a color scale.
Color scale for a_ij: 1, 10, 20, 30, 40, 50, 60, 70, 80, >90
[Figure: heatmap grid with labeled rows and columns; example cell value 42]
24. Mapping Log Records to Heatmaps
May 5 23:57:50 pixl-ram sudo: pam_unix(sudo:session): session opened for user root by ram(uid=0)
[Figure: heatmap with users (root, ram, peg, sue) as rows and time as columns; t is a time bin of width Δ, and each log record adds +1 to the cell for its user and time bin]
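The mapping from log records to heatmap cells can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: events are already parsed into (timestamp, user) pairs, the year 2014 and all events other than the 23:57:50 one are invented for the example, and the row is chosen to be the invoking user (`ram` in the record above), which is one possible choice.

```python
from collections import Counter
from datetime import datetime, timedelta


def heatmap_counts(events, start, bin_width):
    """Return {(user, bin_index): count}; every event adds +1 to its cell."""
    cells = Counter()
    for timestamp, user in events:
        bin_index = (timestamp - start) // bin_width  # integer time bin
        cells[(user, bin_index)] += 1
    return cells


def to_matrix(cells, users, n_bins):
    """Dense matrix: one row per user, one column per time bin."""
    return [[cells.get((user, b), 0) for b in range(n_bins)] for user in users]


# Hypothetical events; only the 23:57:50 timestamp comes from the slide.
start = datetime(2014, 5, 5, 23, 0, 0)
events = [
    (datetime(2014, 5, 5, 23, 57, 50), "ram"),
    (datetime(2014, 5, 5, 23, 58, 10), "ram"),
    (datetime(2014, 5, 5, 23, 12, 0), "peg"),
    (datetime(2014, 5, 5, 23, 40, 0), "sue"),
]
users = ["root", "ram", "peg", "sue"]
cells = heatmap_counts(events, start, timedelta(minutes=15))
matrix = to_matrix(cells, users, n_bins=4)
for user, row in zip(users, matrix):
    print(user, row)
```

The bin width Δ is the main knob: a wider bin aggregates more records per cell and keeps the matrix small, at the cost of time resolution.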
26. Why Heatmaps?
• Scales well to a lot of data (can aggregate ad infinitum)
• Shows more information than a bar chart
• Flexible ‘measure’ mapping
  • frequency count
  • sum(variable) [avg(), stddev(), …]
  • distinct count(variable)
• BUT information content is limited!
• Aggregates too coarsely in the time dimension and potentially in the value dimension
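The flexible measure mapping can be illustrated with a short Python sketch in which the same grouping step feeds different measures. The record layout (user, time bin, value) and the sample values are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(records, measure):
    """Group values by (user, time_bin) cell and apply a measure to each group."""
    groups = defaultdict(list)
    for user, time_bin, value in records:
        groups[(user, time_bin)].append(value)
    return {cell: measure(values) for cell, values in groups.items()}


def distinct_count(values):
    """Number of different values seen in a cell."""
    return len(set(values))


# Hypothetical records: (user, time bin, bytes transferred).
records = [
    ("ram", 0, 200),
    ("ram", 0, 200),
    ("ram", 0, 50),
    ("peg", 1, 10),
]

frequency = aggregate(records, len)             # frequency count
total = aggregate(records, sum)                 # sum(variable)
average = aggregate(records, mean)              # avg(variable)
distinct = aggregate(records, distinct_count)   # distinct count(variable)
print(frequency[("ram", 0)], total[("ram", 0)], average[("ram", 0)], distinct[("ram", 0)])
```

Swapping the measure changes what a dark cell means, so the legend should always state which measure is shown.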
27. HeatMap Challenges - Sorting
• Random
• Alphabetically
• Based on values
• Similarity
  • What algorithm?
  • What distance metric?
• Leverage third data field / context?
[Figure: two heatmaps with users as rows, one with random row order and one with rows clustered]
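As one possible answer to the algorithm and distance-metric questions, the Python sketch below orders rows with a greedy nearest-neighbor walk using Euclidean distance. This is an assumption chosen for illustration, not the clustering used for the figure; the user names and row values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_rows_by_similarity(rows, distance=euclidean):
    """rows: {label: [values]}. Returns labels so that similar rows end up adjacent."""
    remaining = dict(rows)
    current = min(remaining)  # deterministic start: alphabetically first label
    order = [current]
    vector = remaining.pop(current)
    while remaining:
        # Pick the closest unplaced row; ties are broken by label.
        current = min(
            remaining,
            key=lambda label: (distance(vector, remaining[label]), label),
        )
        order.append(current)
        vector = remaining.pop(current)
    return order


# Hypothetical activity counts per time bin.
rows = {
    "ann": [9, 9, 0, 0],
    "bob": [0, 0, 8, 9],
    "cat": [8, 9, 1, 0],
    "dan": [0, 1, 9, 9],
}
row_order = order_rows_by_similarity(rows)
print(row_order)
```

A different distance function can be passed in through the `distance` parameter, which makes it easy to compare how the choice of metric changes the picture.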
28. What’s the HeatMap Not Good At?
• Showing relationships -> link graphs
• Showing multiple dimensions and their inter-relatedness -> parallel coordinates
29. Graphs
[Figure: link graph connecting SourceIP and DestIP nodes; color = Port]
30. Graphs To Show Relationships
31. Some Graph Challenges
• How to map data to a graph
• Don’t scale to a few hundred (or thousand) nodes
• What layout algorithm to choose?
• Node placement should be semantically motivated
• Graph metrics don’t mean anything in security (centrality, etc.)
• Analytics needs
  • interactive features
  • linked views
• Analytics is not a linear process
[Figure: example link-graph node configurations that place fields such as user, URL, sourceport, destIP, sourceIP, action, and destPort in the source / event / destination roles]
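The first challenge, mapping data to a graph, can be sketched in Python as a choice of which record fields play the source, event, and destination roles. The function name `build_edges`, the record layout, and the sample values are assumptions made up for this illustration.

```python
from collections import Counter


def build_edges(records, source, event, destination):
    """Count source->event and event->destination edges for the chosen fields.

    Node names are prefixed with the field name so that equal values in
    different fields (for example a port and a count) stay separate nodes.
    """
    edges = Counter()
    for record in records:
        s = f"{source}={record[source]}"
        e = f"{event}={record[event]}"
        d = f"{destination}={record[destination]}"
        edges[(s, e)] += 1
        edges[(e, d)] += 1
    return edges


# Hypothetical firewall records.
records = [
    {"sourceIP": "10.0.0.1", "action": "pass", "destPort": 53},
    {"sourceIP": "10.0.0.1", "action": "pass", "destPort": 53},
    {"sourceIP": "10.0.0.2", "action": "block", "destPort": 80},
]
edges = build_edges(records, "sourceIP", "action", "destPort")
for (head, tail), weight in sorted(edges.items()):
    print(head, "->", tail, weight)
```

Calling the same function with a different field triple yields a different graph from the same data, which is why the choice of mapping deserves as much thought as the layout.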
32. Backend Challenges
Different backend technologies (big data)
• Key-value store
• Search engine
• GraphDB
• RDBMS
• Columnar - can answer analytical questions
• Hadoop (Map Reduce)
• good for operations on ALL data
Other things to consider:
• Caching
• Joins
34. Vincent
This heatmap shows behavior over time. In this case, we see activity per user. We can see that ‘vincent’ is visually different from all of the other users. He shows up very lightly over the entire time period. This seems to be something to look into. We were able to find this purely visually, without understanding the data.
35. Attribution
Authentication Events: users over time
Who is behind these scans?
Challenges
• Finding meaningful patterns
Graph credit: Tye Wells
42. Intra-Role Anomaly - Sorted by User Role
Administrator
Admin???
Sales
Development
Finance
45. Graphs - A Story
This looks interesting
• What is it?
• Green -> Port 53
• Only port 53?
• What IPs?
• What’s the time behavior?
The graph doesn’t answer these questions.
46. Graphs - A Story
• Adding a port histogram
• Select DNS traffic and see if other ports light up.
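The selection step described above can be sketched in Python: pick the sources that produce DNS traffic, then recompute the port histogram for just those sources. The flow records, field names, and function names are hypothetical and exist only for this illustration.

```python
from collections import Counter


def sources_using_port(flows, port):
    """Set of sources with at least one flow to the given port."""
    return {flow["source"] for flow in flows if flow["port"] == port}


def port_histogram(flows, selected_sources=None):
    """Count flows per port, optionally restricted to the selected sources."""
    counts = Counter()
    for flow in flows:
        if selected_sources is None or flow["source"] in selected_sources:
            counts[flow["port"]] += 1
    return counts


# Hypothetical flows.
flows = [
    {"source": "10.0.0.1", "port": 53},
    {"source": "10.0.0.1", "port": 53},
    {"source": "10.0.0.1", "port": 25},
    {"source": "10.0.0.2", "port": 80},
    {"source": "10.0.0.3", "port": 53},
]
dns_sources = sources_using_port(flows, 53)
linked = port_histogram(flows, dns_sources)
# Ports other than 53 that "light up" for the DNS-producing sources.
other_ports = sorted(port for port in linked if port != 53)
print(dns_sources, dict(linked), other_ports)
```

In an interactive tool the selection would come from brushing the graph or histogram; here it is a plain function call.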
47. DNS Traffic - A Closer Look
Linked Views
- Histograms for Source, Port (Source), Destination
- Parallel coordinates (||-coord)
55. Firewall Time Behavior
[Figure: heatmap with sources (10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.4) as rows and time as columns; t is a time bin of width Δ used for aggregation; color mapping: pass, block, block & pass]
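The color mapping can be sketched in Python as a classification of each (source, time bin) cell by the set of firewall actions seen in it. This is a minimal illustration that assumes events are already parsed into (epoch seconds, source, action) tuples and that the only actions are `pass` and `block`; the sample events are hypothetical.

```python
from collections import defaultdict


def classify_cells(events, bin_width):
    """Return {(source, bin_index): label} with labels pass, block, or block & pass."""
    seen = defaultdict(set)
    for timestamp, source, action in events:
        seen[(source, int(timestamp // bin_width))].add(action)
    labels = {}
    for cell, actions in seen.items():
        if len(actions) > 1:
            labels[cell] = "block & pass"  # both actions occurred in this bin
        else:
            labels[cell] = next(iter(actions))
    return labels


# Hypothetical events: (epoch seconds, source, action).
events = [
    (0, "10.0.0.1", "pass"),
    (30, "10.0.0.1", "block"),
    (70, "10.0.0.1", "pass"),
    (10, "10.0.0.2", "block"),
]
labels = classify_cells(events, bin_width=60)
for cell in sorted(labels):
    print(cell, labels[cell])
```

Cells that never appear in the result had no traffic in that bin and would stay uncolored.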
56. High Frequency Sources Over Time
[Figure: heatmap of high-frequency sources over time; color mapping: pass, block, block & pass]
57. High Frequency Traffic Split Up
Panels: inbound, outbound
192.168.0.201
195.141.69.42
195.141.69.43
195.141.69.44
195.141.69.45
195.141.69.46
212.254.110.100
212.254.110.101
212.254.110.107
212.254.110.108
212.254.110.109
212.254.110.110
212.254.110.98
212.254.110.99
62.245.245.139
59. Outbound Traffic - Some Questions To Ask
• What happened mid-way through?
• Why is anything outbound blocked?
• What are the top and bottom machines doing?
• Did we get a new machine into the network?
• Some machines went away?
195.141.69.42
60. 195.141.69.42 - Interactions
action
port
dest
64. Inbound - Zooming in on Top Rows
rows 0,300
rows 200,260
66. Zooming in on Top Rows
212.254.110.100
212.254.110.101
212.254.110.102
212.254.110.103
212.254.110.104
212.254.110.105
212.254.110.106
212.254.110.107
212.254.110.108
212.254.110.109
212.254.110.110
212.254.110.111
212.254.110.112
212.254.110.113
212.254.110.114
212.254.110.115
212.254.110.116
212.254.110.117
212.254.110.118
212.254.110.119
212.254.110.120
212.254.110.121
212.254.110.122
212.254.110.123
212.254.110.124
212.254.110.125
212.254.110.126
212.254.110.127
212.254.110.66
212.254.110.96
212.254.110.97
212.254.110.98
212.254.110.99
• Hardly any pass-block
67. Zooming in on Top Rows
Oct 22 14:20:08.351202 rule 237/0(match): block in on xl0: 66.220.17.151.80 > 212.254.110.103.1881: S 1451746674:1451746678(4) ack 1137377281 win 16384 (DF)
ao.lop.com: 66.220.17.151 - Spyware Gang (LOP)
http://www.freedomlist.com/forum/viewtopic.php?t=15724
70. Zooming in on Top Rows
212.254.110.102
Oct 16 13:14:05.627835 rule 0/0(match): pass in on xl0: 66.220.17.151.80 > 212.254.110.102.1977: S 1841864015:1841864019(4) ack 1308753921 win 16384 (DF)
pass in log quick on $ext from any to $honey
72. This Guy Sure Keeps Busy
212.254.144.40
dest port
73. Recap
• Attackers are very successful
• Data can reveal adversaries
• We have a big data analytics problem
• We need the right analytics and visualizations
• Security visualization is hard
• Data visualization workflow is a promising approach
• Analytics is not a linear process