Raffael Marty gave a presentation on big data visualization. He discussed using visualization to discover patterns in large datasets and presenting security information on dashboards. Effective dashboards provide context, highlight important comparisons and metrics, and use aesthetically pleasing designs. Integration with security information management systems requires parsing and formatting data and providing interfaces for querying and analysis. Marty is working on tools for big data analytics, custom visualization workflows, and hunting for anomalies. He invited attendees to join an online community for discussing security visualization.
2. Security. Analytics. Insight.2
• Visualization
• Design Principles
• Dashboards
• SOC Dashboard
• Data Discovery and Exploration
• Data Requirements for Visualization
• Big Data Lake
Overview
10. Security. Analytics. Insight.10
• Objective: Find attackers in the network moving laterally
• Defines data needed (NetFlow, sFlow, …)
• Maybe restrict to a network segment
• Audience: security analyst, risk team, …
• Informs how to visualize / present data
For Example - Lateral Movement
Recon --> Weaponize --> Deliver --> Exploit --> Install --> C2 --> Act
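The lateral-movement objective above can be sketched as a small fan-out count: restrict flow records to one network segment and count how many distinct internal hosts each source contacts. This is only an illustration, not the presenter's method; the segment `10.8.0.0/16`, the tuple layout, and the function name `internal_fanout` are assumptions.

```python
from collections import defaultdict
from ipaddress import ip_address, ip_network

# Hypothetical segment to restrict the analysis to (an assumption for this sketch).
SEGMENT = ip_network("10.8.0.0/16")

def internal_fanout(flows, segment=SEGMENT):
    """Count distinct in-segment destinations contacted by each in-segment source.

    `flows` is an iterable of (src_ip, dst_ip, dst_port) tuples, for example
    parsed from NetFlow or sFlow exports.
    """
    peers = defaultdict(set)
    for src, dst, _port in flows:
        # Keep only traffic that stays inside the segment (east-west traffic).
        if ip_address(src) in segment and ip_address(dst) in segment:
            peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items()}

flows = [
    ("10.8.24.80", "10.8.50.85", 445),
    ("10.8.24.80", "10.8.48.128", 445),
    ("10.8.24.80", "10.8.48.128", 3389),  # same peer again, counted once
    ("10.8.50.85", "8.8.8.8", 53),        # leaves the segment, ignored
    ("10.9.79.6", "10.8.24.80", 22),      # source outside the segment, ignored
]
fanout = internal_fanout(flows)
```

A source with an unusually high fan-out is a candidate for closer inspection; what counts as "unusual" depends on the environment and the audience.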
11. Security. Analytics. Insight.11
• Show comparisons, contrasts, differences
• Show causality, mechanism, explanation, systematic structure.
• Show multivariate data; that is, show more than 1 or 2 variables.
by Edward Tufte
Principles of Analytic Design
16. Security. Analytics. Insight.16
Additional information about objects, such as:
• machine
  • roles
  • criticality
  • location
  • owner
  • …
• user
  • roles
  • office location
  • …
Add Context
source destination
machine and user context
machine role
user role
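A minimal sketch of adding context to an event: look up the source and destination machines and the user, then attach role and criticality. The lookup tables, field names, and the function `add_context` are hypothetical; in practice the context might come from an asset inventory, AD / LDAP, or HR data.

```python
# Hypothetical context tables (assumptions for this sketch).
MACHINE_CONTEXT = {
    "10.8.24.80": {"role": "workstation", "criticality": "low"},
    "10.8.48.128": {"role": "database", "criticality": "high"},
}
USER_CONTEXT = {"alice": {"role": "finance"}}

def add_context(event):
    """Return a copy of `event` with machine and user context attached."""
    enriched = dict(event)
    for side in ("source", "destination"):
        machine = MACHINE_CONTEXT.get(event.get(side), {})
        enriched[f"{side}_role"] = machine.get("role", "unknown")
        enriched[f"{side}_criticality"] = machine.get("criticality", "unknown")
    user = USER_CONTEXT.get(event.get("user"), {})
    enriched["user_role"] = user.get("role", "unknown")
    return enriched

event = {"source": "10.8.24.80", "destination": "10.8.48.128", "user": "alice"}
enriched = add_context(event)
```

With the roles attached, a workstation talking to a high-criticality database reads very differently from the bare pair of IP addresses.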
21. Security. Analytics. Insight.21
• Audience, audience, audience!
• Comprehensive Information (enough context)
• Highlight important data
• Use graphics when appropriate
• Good choice of graphics and design
• Aesthetically pleasing
• Enough information to decide if action is necessary
• No scrolling
• Real-time vs. batch? (Refresh-rates)
• Clear organization
Dashboard Design Principles
24. Security. Analytics. Insight.24
• Disappears too quickly
• Analysts' focus is on their own screens
• SOC dashboard just distracts
• Detailed information not legible
• Put the detailed dashboards on the analysts' screens!
Dashboards For Discovery
25. Security. Analytics. Insight.25
• Provide analyst with context
• “What else is going on in the environment right now?”
• Bring Into Focus
• Turn something benign into something interesting
• Disprove
• Turn something interesting into something benign
Use SOC Dashboard For Context
Environment informs detection policies
27. Security. Analytics. Insight.27
• News feed summary (FS-ISAC feeds, mailing lists, threat feeds)
• Monitoring Twitter or IRC for certain activity / keywords
• Volumes or metrics (e.g., #firewall blocks, #IDS alerts, #failed transactions)
• Top N metrics:
  • Top 10 suspicious users
  • Top 10 servers connecting outbound
What To Put on Screens
Provide context to individual security alerts
http://raffy.ch/blog/2015/01/15/dashboards-in-the-security-opartions-center-soc/
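A Top N metric such as "top 10 servers connecting outbound" could be computed along these lines. The event layout and the helper name `top_n` are assumptions made for illustration.

```python
from collections import Counter

def top_n(events, field, n=10):
    """Return the n most frequent values of `field` as (value, count) pairs."""
    counts = Counter(e[field] for e in events if field in e)
    return counts.most_common(n)

# Hypothetical outbound connection events.
outbound = [
    {"server": "10.8.48.128"},
    {"server": "10.8.48.128"},
    {"server": "10.8.48.128"},
    {"server": "10.8.50.85"},
    {"server": "10.8.50.85"},
    {"server": "10.8.24.80"},
]
top_servers = top_n(outbound, "server", n=2)
```

The same helper works for users, signatures, or destination ports by changing the `field` argument.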
30. Security. Analytics. Insight.30
Information Visualization Mantra
Overview --> Zoom / Filter --> Details on Demand
Principle by Ben Shneiderman
• summary / aggregation
• data mining
• signal detection (IDS, behavioral, etc.)
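The three steps of the mantra can be sketched on a list of alert records: aggregate for the overview, filter to zoom in, then return a single raw record for details. The record fields and function names below are assumptions, not part of the original slides.

```python
from collections import Counter

# Hypothetical alert records; the field names are assumptions for this sketch.
alerts = [
    {"hour": 9, "src": "10.8.24.80", "signature": "port scan"},
    {"hour": 9, "src": "10.8.24.80", "signature": "port scan"},
    {"hour": 9, "src": "10.9.79.6", "signature": "failed login"},
    {"hour": 14, "src": "10.8.50.85", "signature": "failed login"},
]

def overview(records, field):
    """Summary / aggregation: count records per value of `field`."""
    return Counter(r[field] for r in records)

def zoom_filter(records, **criteria):
    """Keep only records matching every given field=value pair."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]

def details(records, index):
    """Details on demand: return one raw record."""
    return records[index]

per_hour = overview(alerts, "hour")   # overview
busy = zoom_filter(alerts, hour=9)    # zoom / filter on the busiest hour
by_source = overview(busy, "src")     # overview of the filtered subset
first = details(busy, 0)              # details on demand
```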
31. Security. Analytics. Insight.31
• Access to data
• Parsed data and data context
• Data architecture for central data access and fast queries
• Application of data mining (how?, what?, scalable, …)
• Visualization tools that support
• Complex visual types (parallel coordinates, treemaps, heat maps, link graphs)
• Linked views
• Data mining (clustering, …)
• Collaboration, information sharing
• Visual analytics workflow
Visualization Challenges
33. Security. Analytics. Insight.33
• One central location to store all cyber security data
• “Data collected only once and third party software leveraging it”
• Scalability and interoperability
• More than deploying an off-the-shelf product from a vendor
• Data use influences both data formats and technologies to store the data
• search, analytics, relationships, and distributed processing
• correlation, and statistical summarization
• What to do with Context? Enrich or join?
• Hard problems:
• Parsing: can you re-parse? Common naming scheme!
• Data store capabilities (search, analytics, distributed processing, etc.)
• Access to data: SQL (even in Hadoop context), how can products access the data?
The Big Data Lake
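The "enrich or join?" question can be illustrated with a tiny in-memory SQLite database: either join events with a context table at query time, or copy the context into each event at ingest. The table and column names are assumptions for this sketch, and SQLite stands in for whatever SQL layer the data lake exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (ts TEXT, src_ip TEXT, dst_ip TEXT);
    CREATE TABLE assets (ip TEXT PRIMARY KEY, role TEXT);
    INSERT INTO events VALUES ('2015-01-15T10:00:00', '10.8.24.80', '10.8.48.128');
    INSERT INTO assets VALUES ('10.8.48.128', 'database');
""")

# Option 1: join at query time. Context lives in its own table.
joined = conn.execute("""
    SELECT e.ts, e.src_ip, e.dst_ip, a.role
    FROM events e LEFT JOIN assets a ON a.ip = e.dst_ip
""").fetchall()

# Option 2: enrich at ingest. The role is copied into the event row.
conn.execute("ALTER TABLE events ADD COLUMN dst_role TEXT")
conn.execute("""
    UPDATE events
    SET dst_role = (SELECT role FROM assets WHERE ip = events.dst_ip)
""")
enriched = conn.execute(
    "SELECT ts, src_ip, dst_ip, dst_role FROM events"
).fetchall()
```

Both options return the same row here. They diverge once the context changes: a join reflects the current context, while an enriched event keeps the context as it was when the event was stored, at the cost of extra storage.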
34. Security. Analytics. Insight.34
Federated Data Access
SIEM
dispatcher
SIEM connector
SIEM console
Prod A
AD / LDAP
HR
…
IDS
FW
Prod B
DBs
Data Lake
Caveats:
• Dispatcher?
• Standard access to dispatcher / products enabled
• Data lake technology?
SNMP
35. Security. Analytics. Insight.35
Multiple Data Stores
raw logs
key-value
structured
real-time processing
(un)-structured data
context
SQL
storage
stats
index
queue
distributed processing
access
graph
Caveat:
• Need multiple types of data stores
36. Security. Analytics. Insight.36
Technologies (Example)
raw logs
key-value (Cassandra)
columnar (parquet)
real-time processing (Spark)
(un)-structured data
context
SQL (Impala, SparkSQL)
HDFS
aggregates
index (ES)
queue (Kafka)
distributed processing (Spark)
access
graph (GraphX)
Caveat:
• No out-of-the-box solution available
37. Security. Analytics. Insight.37
SIEM Integration - Log Management First
SIEM
columnar or search engine or log management
processing
SIEM connector
raw logs
SIEM console
SQL or search interface
processing
filtering
HDFS
e.g., PIG parsing
38. Security. Analytics. Insight.38
Simple SIEM Integration
raw, csv, json
flume
log data
SQL (Impala, with SerDe)
HDFS
SIEM connector
SIEM
Requirement:
• SIEM connector to forward text-based data to Flume.
SQL interface
Tableau, etc.
SIEM console
39. Security. Analytics. Insight.39
SIEM Integration - Advanced
SIEM
columnar (parquet)
processing
syslog data
SQL (Impala, SparkSQL)
HDFS
index (ES)
queue (Kafka)
access
other data sources
SIEM connector
raw logs
SIEM console
SQL and search interface
Tableau, Kibana, etc.
requires parsing and formatting in a SIEM readable format (e.g., CEF)
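Formatting a parsed event as a CEF line might look like the sketch below. It follows the general CEF layout (a pipe-delimited header of version, device vendor, device product, device version, event class ID, name, and severity, followed by key=value extensions). The vendor and product names and the helper `to_cef` are hypothetical, and the escaping covers only the common cases.

```python
def _escape_header(value):
    # In header fields, backslashes and pipes must be escaped.
    return str(value).replace("\\", "\\\\").replace("|", "\\|")

def _escape_extension(value):
    # In extension values, backslashes, equals signs, and newlines must be escaped.
    return str(value).replace("\\", "\\\\").replace("=", "\\=").replace("\n", "\\n")

def to_cef(vendor, product, version, class_id, name, severity, extension):
    """Build one CEF line from header fields and a dict of extension keys."""
    fields = ("CEF:0", vendor, product, version, class_id, name, severity)
    header = "|".join(_escape_header(f) for f in fields)
    ext = " ".join(f"{k}={_escape_extension(v)}" for k, v in extension.items())
    return f"{header}|{ext}"

line = to_cef(
    "ExampleVendor", "ExampleFW", "1.0", "100", "Connection blocked", 5,
    {"src": "10.8.24.80", "dst": "8.8.8.8", "dpt": 53},
)
```

The extension keys `src`, `dst`, and `dpt` are standard CEF names for source address, destination address, and destination port; any other fields would need to be mapped to the CEF dictionary or to custom keys.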
40. Security. Analytics. Insight.40
What I am Working On
[Tool screenshot. Menu labels: Data Stores, Analytics, Forensics, Models, Admin. Visible content: source --> destination IP pairs (e.g., 10.9.79.109 --> 3.16.204.150 and several 10.x hosts --> 192.168.148.193), the values 80, 53, 8.8.8.8, and 127.0.0.1, and panels labeled Anomalies, Decomposition (Data, Seasonal, Trend), and Anomaly Details. Further labels: “Hunt”, Explain, Visual Search.]
• Big data backend
• Own visualization engine (Web-based)
• Visualization workflows
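The Anomalies and Decomposition (Data, Seasonal, Trend) labels on this slide suggest a time-series decomposition. The slides do not say which method the tool uses, so the following is only a sketch of a classical additive decomposition with a simple residual threshold; the function names, the odd period, and the 3-sigma threshold are all assumptions.

```python
from statistics import mean, pstdev

def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + residual.

    `period` is assumed to be odd so a plain centered moving average works.
    """
    half = period // 2
    n = len(series)
    # Trend: centered moving average; undefined (None) near both ends.
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])
    # Seasonal: average detrended value for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(series[i] - t)
    cycle = [mean(b) for b in buckets]
    seasonal = [cycle[i % period] for i in range(n)]
    # Residual: what is left after removing trend and seasonal parts.
    residual = [None if t is None else series[i] - t - seasonal[i]
                for i, t in enumerate(trend)]
    return trend, seasonal, residual

def anomalies(residual, threshold=3.0):
    """Indices whose residual exceeds `threshold` standard deviations."""
    values = [r for r in residual if r is not None]
    sigma = pstdev(values)
    return [i for i, r in enumerate(residual)
            if r is not None and sigma > 0 and abs(r) > threshold * sigma]

# Hypothetical daily event counts with a weekly pattern and one injected spike.
series = [10, 12, 11, 13, 12, 5, 4] * 6
series[17] += 40
trend, seasonal, residual = decompose(series, period=7)
spikes = anomalies(residual)
```

Flagging on the residual rather than the raw data keeps the regular weekly dip from being reported as an anomaly.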
41. Security. Analytics. Insight.41
BlackHat Workshop
Visual Analytics - Delivering Actionable Security Intelligence
August 1-6 2015, Las Vegas, USA
big data | analytics | visualization