Raffael Marty, CEO
Big Data Visualization
London
February, 2015
Security. Analytics. Insight.2
• Visualization
• Design Principles
• Dashboards
• SOC Dashboard
• Data Discovery and Exploration
• Data Requirements for Visualization
• Big Data Lake
Overview
Security. Analytics. Insight.3
I am Raffy - I do Viz!
IBM Research
4
Visualization
Security. Analytics. Insight.5
Why Visualization?
the stats ...
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
the data...
Security. Analytics. Insight.6
Why Visualization?
http://en.wikipedia.org/wiki/Anscombe%27s_quartet
Human analyst:
• patterndetection
• remembers context
• fantasticintuition
• canpredict
Security. Analytics. Insight.7
Visualization To …
Present / Communicate Discover / Explore
Design Principles
Security. Analytics. Insight.9
Choosing Visualizations
Objective AudienceData
Security. Analytics. Insight.10
• Objective: Find attackers in the network moving laterally
• Defines data needed (netflow, sflow, …)
• maybe restrict to a network segment
• Audience: security analyst, risk team, …
• Informs how to visualize / present data
For Example - Lateral Movement
Recon Weaponize Deliver Exploit Install C2 Act
Security. Analytics. Insight.11
• Show  comparisons, contrasts,
differences
• Show  causality, mechanism,
explanation, systematic structure.
• Show  multivariate data; that is,
show more than 1 or 2 variables.
by Edward Tufte
Principals of Analytic Design
Security. Analytics. Insight.12
Show Context
42
Security. Analytics. Insight.
42
is just a number
and means nothing without
context
13
Show Context
Security. Analytics. Insight.15
Use Numbers To Highlight Most Important Parts of Data
Numbers
Summaries
Security. Analytics. Insight.16
Additional information about
objects, such as:
• machine
• roles
• criticality
• location
• owner
• …
• user
• roles
• office location
• …
Add Context
source destination
machine and 

user context
machine role
user role
Security. Analytics. Insight.17
Traffic Flow Analysis With Context
Security. Analytics. Insight.18
http://www.scifiinterfaces.com/
• Black background
• Blue or green colors
• Glow
Aesthetics Matter
Security. Analytics. Insight.19
B O R I N G
Security. Analytics. Insight.20
Sexier
Security. Analytics. Insight.21
• Audience, audience, audience!
• Comprehensive Information (enough context)
• Highlight important data
• Use graphics when appropriate
• Good choice of graphics and design
• Aesthetically pleasing
• Enough information to decide if action is necessary
• No scrolling
• Real-time vs. batch? (Refresh-rates)
• Clear organization
Dashboard Design Principles
22
SOC Dashboards
Security. Analytics. Insight.23
Mostly Blank
Security. Analytics. Insight.24
• Disappears too quickly
• Analysts focus is on their own screens
• SOC dashboard just distracts
• Detailed information not legible
• Put the detailed dashboards on the analysts screens!
Dashboards For Discovery
Security. Analytics. Insight.25
• Provide analyst with context
• “What else is going on in the environment right now?”
• Bring Into Focus
• Turn something benign into something interesting
• Disprove
• Turn something interesting into something benign
Use SOC Dashboard For Context
Environment informs detection policies
Security. Analytics. Insight.26
Show Comparisons
Current Measure
week prior
Security. Analytics. Insight.27
• News feed summary (FS ISAC feeds, mailinglists, threat feeds)
• Monitoring twitter or IRC for certain activity / keywords
• Volumes or metrics (e.g., #firewall blocks, #IDS alerts, #failed transactions)
• Top N metrics:
• Top 10 suspicious users
• Top 10 servers connecting outbound
What To Put on Screens
Provide context to individual security alerts
http://raffy.ch/blog/2015/01/15/dashboards-in-the-security-opartions-center-soc/
28
Data Discovery &
Exploration
Security. Analytics. Insight.29
Visualize Me Lots (>1TB) of Data
Security. Analytics. Insight.30
Information Visualization Mantra
Overview Zoom / Filter Details on Demand
Principle by Ben Shneiderman
• summary / aggregation
• data mining
• signal detection (IDS, behavioral, etc.)
Security. Analytics. Insight.31
• Access to data
• Parsed data and data context
• Data architecture for central data access and fast queries
• Application of data mining (how?, what?, scalable, …)
• Visualization tools that support
• Complex visual types (||-coordinates, treemaps, 

heat maps, link graphs)
• Linked views
• Data mining (clustering, …)
• Collaboration, information sharing
• Visual analytics workflow
Visualization Challenges
Big Data Lake
Security. Analytics. Insight.33
• One central location to store all cyber security data
• “Data collected only once and third party software leveraging it”
• Scalability and interoperability
• More than deploying an off the shelf product from a vendor
• Data use influences both data formats and technologies to store the data
• search, analytics, relationships, and distributed processing
• correlation, and statistical summarization
• What to do with Context? Enrich or join?
• Hard problems:
• Parsing: can you re-parse? Common naming scheme!
• Data store capabilities (search, analytics, distributed processing, etc.)
• Access to data: SQL (even in Hadoop context), how can products access the data?
The Big Data Lake
Security. Analytics. Insight.34
Federated Data Access
SIEM
dispatcher
SIEM 

connector
SIEM console
Prod A
AD / LDAP
HR
…
IDS
FW
Prod B
DBs
Data Lake
Caveats:
• Dispatcher?
• Standard access to dispatcher /

products enabled
• Data lake technology?
SNMP
Security. Analytics. Insight.35
Multiple Data Stores
raw logs
key-value
structured
real-time

processing
(un)-structured data
context
SQL
s
t
o
r
a
g
e
stats
index
queue
distributed

processing
a
c
c
e
s
s
graph
Caveat:
• Need multiple types of 

data stores
Security. Analytics. Insight.36
Technologies (Example)
raw logs
key-value
(Cassandra)
columnar
(parquet)
real-time

processing
(Spark)
(un)-structured data
context
SQL
(Impala,
SparkSQL)
H
D
F
S
aggregates
index
(ES)
queue
(Kafka)
distributed

processing
(Spark)
a
c
c
e
s
s
graph
(GraphX)
Caveat:
• No out of the box
solution available
Security. Analytics. Insight.37
SIEM Integration - Log Management First
SIEM
columnar
or
search engine

or
log management
processing
SIEM 

connector
raw logs
SIEM console
SQL or search

interface
processing
filtering
H
D
F
S
e.g., PIG parsing
Security. Analytics. Insight.38
Simple SIEM Integration
raw, csv, json
flume
log data
SQL
(Impala,
with SerDe)
H
D
F
S
SIEM 

connector
SIEM
Requirement:
• SIEM connector to forward text-
based data to Flume.
SQL interface
Tableau, etc.
SIEM console
Security. Analytics. Insight.39
SIEM Integration - Advanced
SIEM
columnar
(parquet)
processing
syslog data
SQL
(Impala,
SparkSQL)
H
D
F
S
index
(ES)
queue
(Kafka)
a
c
c
e
s
s
other data
sources
SIEM 

connector
raw logs
SIEM console
SQL and search 

interface
Tableau, Kibana, etc.
requires parsing and
formatting in a SIEM
readable format (e.g., CEF)
Security. Analytics. Insight.40
What I am Working On
Data Stores Analytics Forensics Models Admin
10.9.79.109 --> 3.16.204.150
10.8.24.80 --> 192.168.148.193
10.8.50.85 --> 192.168.148.193
10.8.48.128 --> 192.168.148.193
10.9.79.6 --> 192.168.148.193
10.9.79.6
10.8.48.128
80
53
8.8.8.8
127.0.0.1
Anomalies
Decomposition
Data
Seasonal
Trend
Anomaly Details
“Hunt” ExplainVisual Search
• Big data backend
• Own visualization engine (Web-based)
• Visualization workflows
Security. Analytics. Insight.41
BlackHat Workshop
Visual Analytics -
Delivering Actionable Security
Intelligence
August 1-6 2015, Las Vegas, USA
big data | analytics | visualization
Security. Analytics. Insight.42
http://secviz.org
List: secviz.org/mailinglist
Twitter: @secviz
Share, discuss, challenge, and learn about security visualization.
Security Visualization Community
Security. Analytics. Insight.
raffael.marty@pixlcloud.com
http://slideshare.net/zrlram
http://secviz.org and @secviz
Further resources:

Big Data Visualization

  • 1.
    Raffael Marty, CEO BigData Visualization London February, 2015
  • 2.
    Security. Analytics. Insight.2 •Visualization • Design Principles • Dashboards • SOC Dashboard • Data Discovery and Exploration • Data Requirements for Visualization • Big Data Lake Overview
  • 3.
    Security. Analytics. Insight.3 Iam Raffy - I do Viz! IBM Research
  • 4.
  • 5.
    Security. Analytics. Insight.5 WhyVisualization? the stats ... http://en.wikipedia.org/wiki/Anscombe%27s_quartet the data...
  • 6.
    Security. Analytics. Insight.6 WhyVisualization? http://en.wikipedia.org/wiki/Anscombe%27s_quartet Human analyst: • patterndetection • remembers context • fantasticintuition • canpredict
  • 7.
    Security. Analytics. Insight.7 VisualizationTo … Present / Communicate Discover / Explore
  • 8.
  • 9.
    Security. Analytics. Insight.9 ChoosingVisualizations Objective AudienceData
  • 10.
    Security. Analytics. Insight.10 •Objective: Find attackers in the network moving laterally • Defines data needed (netflow, sflow, …) • maybe restrict to a network segment • Audience: security analyst, risk team, … • Informs how to visualize / present data For Example - Lateral Movement Recon Weaponize Deliver Exploit Install C2 Act
  • 11.
    Security. Analytics. Insight.11 •Show  comparisons, contrasts, differences • Show  causality, mechanism, explanation, systematic structure. • Show  multivariate data; that is, show more than 1 or 2 variables. by Edward Tufte Principals of Analytic Design
  • 12.
  • 13.
    Security. Analytics. Insight. 42 isjust a number and means nothing without context 13 Show Context
  • 15.
    Security. Analytics. Insight.15 UseNumbers To Highlight Most Important Parts of Data Numbers Summaries
  • 16.
    Security. Analytics. Insight.16 Additionalinformation about objects, such as: • machine • roles • criticality • location • owner • … • user • roles • office location • … Add Context source destination machine and 
 user context machine role user role
  • 17.
    Security. Analytics. Insight.17 TrafficFlow Analysis With Context
  • 18.
    Security. Analytics. Insight.18 http://www.scifiinterfaces.com/ •Black background • Blue or green colors • Glow Aesthetics Matter
  • 19.
  • 20.
  • 21.
    Security. Analytics. Insight.21 •Audience, audience, audience! • Comprehensive Information (enough context) • Highlight important data • Use graphics when appropriate • Good choice of graphics and design • Aesthetically pleasing • Enough information to decide if action is necessary • No scrolling • Real-time vs. batch? (Refresh-rates) • Clear organization Dashboard Design Principles
  • 22.
  • 23.
  • 24.
    Security. Analytics. Insight.24 •Disappears too quickly • Analysts focus is on their own screens • SOC dashboard just distracts • Detailed information not legible • Put the detailed dashboards on the analysts screens! Dashboards For Discovery
  • 25.
    Security. Analytics. Insight.25 •Provide analyst with context • “What else is going on in the environment right now?” • Bring Into Focus • Turn something benign into something interesting • Disprove • Turn something interesting into something benign Use SOC Dashboard For Context Environment informs detection policies
  • 26.
    Security. Analytics. Insight.26 ShowComparisons Current Measure week prior
  • 27.
    Security. Analytics. Insight.27 •News feed summary (FS ISAC feeds, mailinglists, threat feeds) • Monitoring twitter or IRC for certain activity / keywords • Volumes or metrics (e.g., #firewall blocks, #IDS alerts, #failed transactions) • Top N metrics: • Top 10 suspicious users • Top 10 servers connecting outbound What To Put on Screens Provide context to individual security alerts http://raffy.ch/blog/2015/01/15/dashboards-in-the-security-opartions-center-soc/
  • 28.
  • 29.
  • 30.
    Security. Analytics. Insight.30 InformationVisualization Mantra Overview Zoom / Filter Details on Demand Principle by Ben Shneiderman • summary / aggregation • data mining • signal detection (IDS, behavioral, etc.)
  • 31.
    Security. Analytics. Insight.31 •Access to data • Parsed data and data context • Data architecture for central data access and fast queries • Application of data mining (how?, what?, scalable, …) • Visualization tools that support • Complex visual types (||-coordinates, treemaps, 
 heat maps, link graphs) • Linked views • Data mining (clustering, …) • Collaboration, information sharing • Visual analytics workflow Visualization Challenges
  • 32.
  • 33.
    Security. Analytics. Insight.33 •One central location to store all cyber security data • “Data collected only once and third party software leveraging it” • Scalability and interoperability • More than deploying an off the shelf product from a vendor • Data use influences both data formats and technologies to store the data • search, analytics, relationships, and distributed processing • correlation, and statistical summarization • What to do with Context? Enrich or join? • Hard problems: • Parsing: can you re-parse? Common naming scheme! • Data store capabilities (search, analytics, distributed processing, etc.) • Access to data: SQL (even in Hadoop context), how can products access the data? The Big Data Lake
  • 34.
    Security. Analytics. Insight.34 FederatedData Access SIEM dispatcher SIEM 
 connector SIEM console Prod A AD / LDAP HR … IDS FW Prod B DBs Data Lake Caveats: • Dispatcher? • Standard access to dispatcher /
 products enabled • Data lake technology? SNMP
  • 35.
    Security. Analytics. Insight.35 MultipleData Stores raw logs key-value structured real-time
 processing (un)-structured data context SQL s t o r a g e stats index queue distributed
 processing a c c e s s graph Caveat: • Need multiple types of 
 data stores
  • 36.
    Security. Analytics. Insight.36 Technologies(Example) raw logs key-value (Cassandra) columnar (parquet) real-time
 processing (Spark) (un)-structured data context SQL (Impala, SparkSQL) H D F S aggregates index (ES) queue (Kafka) distributed
 processing (Spark) a c c e s s graph (GraphX) Caveat: • No out of the box solution available
  • 37.
    Security. Analytics. Insight.37 SIEMIntegration - Log Management First SIEM columnar or search engine
 or log management processing SIEM 
 connector raw logs SIEM console SQL or search
 interface processing filtering H D F S e.g., PIG parsing
  • 38.
    Security. Analytics. Insight.38 SimpleSIEM Integration raw, csv, json flume log data SQL (Impala, with SerDe) H D F S SIEM 
 connector SIEM Requirement: • SIEM connector to forward text- based data to Flume. SQL interface Tableau, etc. SIEM console
  • 39.
    Security. Analytics. Insight.39 SIEMIntegration - Advanced SIEM columnar (parquet) processing syslog data SQL (Impala, SparkSQL) H D F S index (ES) queue (Kafka) a c c e s s other data sources SIEM 
 connector raw logs SIEM console SQL and search 
 interface Tableau, Kibana, etc. requires parsing and formatting in a SIEM readable format (e.g., CEF)
  • 40.
    Security. Analytics. Insight.40 WhatI am Working On Data Stores Analytics Forensics Models Admin 10.9.79.109 --> 3.16.204.150 10.8.24.80 --> 192.168.148.193 10.8.50.85 --> 192.168.148.193 10.8.48.128 --> 192.168.148.193 10.9.79.6 --> 192.168.148.193 10.9.79.6 10.8.48.128 80 53 8.8.8.8 127.0.0.1 Anomalies Decomposition Data Seasonal Trend Anomaly Details “Hunt” ExplainVisual Search • Big data backend • Own visualization engine (Web-based) • Visualization workflows
  • 41.
    Security. Analytics. Insight.41 BlackHatWorkshop Visual Analytics - Delivering Actionable Security Intelligence August 1-6 2015, Las Vegas, USA big data | analytics | visualization
  • 42.
    Security. Analytics. Insight.42 http://secviz.org List:secviz.org/mailinglist Twitter: @secviz Share, discuss, challenge, and learn about security visualization. Security Visualization Community
  • 43.