In this two-part presentation we will explore log analysis and log visualization. We will look at the history of log analysis: where log analysis stands today, what tools are available to process logs, what is working today, and, more importantly, what is not working in log analysis. What will the future bring? Do our current approaches hold up under future requirements? We will discuss a number of issues and try to figure out how we can address them.
By looking at various log analysis challenges, we will explore how visualization can help address a number of them, keeping in mind that log visualization is not just a science, but also an art. We will apply a security lens to look at a number of use-cases in the area of security visualization. From there we will discuss what else is needed in the area of visualization, where the challenges lie, and where we should continue putting our research and development efforts.
Cloud computing has changed the way businesses operate, the way businesses make money, and the way businesses have to protect their assets and information. More and more software applications are moving into the cloud. People are running their proxies in the cloud, and soon you will be collecting your logs in the cloud. You shouldn't have to deal with log collection and log management. You should be able to focus your time on getting value out of the logs: log analysis and visualization.
In this presentation we will explore how we can leverage the cloud to build security visualization tools. We will discuss some common visualization libraries and have a look at how they can be deployed to solve security problems. We will see how easy it is to quickly stand up such an application. To close the presentation, we will look at a number of security visualization examples that show how security data benefits from visual representations. For example, how can network traffic, firewall data, or IDS data be visualized effectively?
DAVIX - Data Analysis and Visualization Linux - Raffael Marty
DAVIX, a live CD for data analysis and visualization, brings the most important free tools for data processing and visualization to your desk. There is no hassle with installing an operating system or struggle to build the necessary tools to get started with visualization. You can completely dedicate your time to data analysis.
Workshop: Big Data Visualization for Security - Raffael Marty
Big Data is the latest hype in the security industry. We will take a closer look at what big data is made up of: Hadoop, Spark, ElasticSearch, Hive, MongoDB, etc. We will learn how to best manage security data in a small Hadoop cluster for different types of use-cases. Along the way, we will encounter a number of big-data open source tools, such as LogStash and Moloch, that help with managing log files and packet captures.
As a second topic we will look at visualization and how we can leverage visualization to learn more about our data. In the hands-on part, we will use some of the big data tools, as well as a number of visualization tools to actively investigate a sample data set.
Managing Your Black Friday Logs (Voxxed Luxembourg) - David Pilato
Monitoring a complex application is not an easy task, but with the right tools it is not rocket science either. However, peak periods such as "Black Friday" sales or the Christmas season can push your application to the limits of what it can handle, or worse, crash it. Because the system is under heavy load, it generates even more logs, which can in turn overwhelm your monitoring system.
In this session, I will cover best practices for using the Elastic Stack to centralize and monitor your logs. I will also share a few tips and tricks to help you sail through your Black Fridays!
We will cover:
* Monitoring architectures
* Finding the optimal size for the _bulk API
* Distributing the load
* Index and shard sizing
* Optimizing disk I/O
You will leave the session with: best practices for building a monitoring system with the Elastic Stack, and advanced tuning to optimize ingestion and search performance.
Managing Your Black Friday Logs (NDC Oslo) - David Pilato
Monitoring an entire application is not a simple task, but with the right tools it is not a hard task either. However, events like Black Friday can push your application to the limit, and even cause crashes. As the system is stressed, it generates a lot more logs, which may crash the monitoring system as well. In this talk I will walk through best practices for using the Elastic Stack to centralize and monitor your logs. I will also share some tricks to help you with the huge increase of traffic typical of Black Friday.
Topics include:
* monitoring architectures
* optimal bulk size
* distributing the load
* index and shard size
* optimizing disk IO
Takeaway: best practices for building a monitoring system with the Elastic Stack, and advanced tuning to increase event ingestion performance.
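As a rough illustration of the "optimal bulk size" point above, the batching logic can be sketched like this. This is a minimal sketch, not code from the talk: the helper name and the 5 MB default are illustrative assumptions (the right batch size has to be found experimentally per cluster).

```python
import json

def chunk_for_bulk(docs, max_bytes=5 * 1024 * 1024):
    """Group documents into _bulk request bodies capped at max_bytes.

    Each document becomes an action line plus a source line, the
    newline-delimited JSON shape the Elasticsearch _bulk API expects.
    """
    batch, batch_size, batches = [], 0, []
    for doc in docs:
        lines = json.dumps({"index": {}}) + "\n" + json.dumps(doc) + "\n"
        size = len(lines.encode("utf-8"))
        # Flush the current batch before it would exceed the cap.
        if batch and batch_size + size > max_bytes:
            batches.append("".join(batch))
            batch, batch_size = [], 0
        batch.append(lines)
        batch_size += size
    if batch:
        batches.append("".join(batch))
    return batches

# Each returned string can be POSTed to /<index>/_bulk as-is;
# tuning means measuring ingestion throughput at different caps.
```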
An introduction to Big Data and how FIWARE manages it through different approaches: the differences between the Apache Flink and Spark approaches, an introduction to the FIWARE connectors for managing NGSI context information, and a brief introduction to machine learning with FIWARE technology.
Hermes: Free the Data! Distributed Computing with MongoDB - MongoDB
Moving data throughout an organization is an art form. Whether mastering the art of ETL or building micro services, we are often left with either business logic embedded where it doesn't belong or monolithic apps that do too much. In this talk, we will show you how we built a persisted messaging bus to ‘Free the Data’ from the apps, making it available across the organization without having to write custom ETL code. This in turn makes it possible for business apps to be standalone, testable and more reliable. We will discuss the basic architecture and how it works, go through some code samples (server side and client side), and present some statistics and visualizations.
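The core idea described above, an append-only, persisted message log where each consumer tracks its own read position so business apps stay decoupled, can be sketched in a few lines. This is a toy stand-in, not the talk's actual code: the class and method names are invented, and a plain Python list stands in for what would be a MongoDB capped collection read with a tailable cursor.

```python
class MessageBus:
    """Minimal persisted-log bus: producers append, and each consumer
    keeps its own offset, so no message is ever 'consumed away' and
    new consumers can replay the full stream."""

    def __init__(self):
        self.log = []       # append-only message log (stand-in for MongoDB)
        self.offsets = {}   # consumer name -> next index to read

    def publish(self, message):
        self.log.append(message)

    def poll(self, consumer, max_messages=10):
        """Return up to max_messages unread messages for this consumer."""
        start = self.offsets.get(consumer, 0)
        batch = self.log[start:start + max_messages]
        self.offsets[consumer] = start + len(batch)
        return batch

bus = MessageBus()
bus.publish({"event": "order_created", "id": 1})
bus.publish({"event": "order_paid", "id": 1})

# Two independent consumers each see the full stream.
billing = bus.poll("billing")
audit = bus.poll("audit")
```

The design point is that the bus, not bespoke ETL code, owns data movement: any team can attach a new consumer without touching the producing application.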
In this training session, two leading security experts review how adversaries use DNS to achieve their mission, how to use DNS data as a starting point for launching an investigation, the data science behind automated detection of DNS-based malicious techniques and how DNS tunneling and DGA machine learning algorithms work.
Watch the presentation with audio here: http://info.sqrrl.com/leveraging-dns-for-proactive-investigations
Adam Fuchs' presentation slides on what's next in the evolution of BigTable implementations (transactions, indexing, etc.) and what these advances could mean for the massive database that gave rise to Google.
Our journey with druid - from initial research to full production scale - Itai Yaffe
Here at the Nielsen Marketing Cloud we use druid.io (http://druid.io/) as one of our main data stores, both for simple counts and for approximate count-distinct (DataSketches).
It’s been more than a year since we started using it, ingesting billions of events each day into multiple druid clusters for different use-cases.
In this meet-up, we will share our journey, the challenges we had, the way we overcame them (at least most of them), and the steps we took to optimize the process around Druid to keep the solution cost effective.
Before diving into Druid, we will briefly present our data pipeline architecture, starting from the front-end serving system, deployed in a number of geo-locations, to a centralized Kafka cluster in the cloud, and give some examples of the different processes that consume from Kafka and feed our different data sources.
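The approximate count-distinct mentioned above relies on the DataSketches library. The underlying idea, estimating cardinality from the smallest hash values observed, can be illustrated with a tiny K-Minimum-Values sketch. This is a teaching toy under my own assumptions, not the Theta sketch Druid actually uses:

```python
import hashlib

def kmv_distinct_estimate(items, k=256):
    """K-Minimum-Values cardinality estimate.

    Hash every item to a uniform value in [0, 1) and keep the k
    smallest distinct hashes. If the k-th smallest is h_k, then
    roughly k-1 distinct items landed in [0, h_k), so the total
    distinct count is approximately (k - 1) / h_k.
    """
    hashes = set()
    for item in items:
        digest = hashlib.md5(str(item).encode()).hexdigest()
        hashes.add(int(digest, 16) / 2.0**128)
    smallest = sorted(hashes)[:k]
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct items: count is exact
    return int((k - 1) / smallest[-1])
```

Unlike an exact distinct count, the sketch needs only k stored values regardless of stream size, which is what makes it practical at billions of events per day.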
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Berlin 2017 - Demi Ben-Ari
Once you start working with distributed Big Data systems, you start discovering a whole bunch of problems you won’t find in monolithic systems.
All of a sudden, monitoring all of the components becomes a big data problem in itself.
In the talk we'll cover the aspects you should take into consideration when monitoring a distributed system built with tools like:
Web Services, Apache Spark, Cassandra, MongoDB, Amazon Web Services.
Beyond the tools, what should you monitor about the actual data that flows through the system?
And we'll cover the simplest solution, built from your day-to-day open source tools; the surprising thing is that it doesn't come from an ops guy.
Deep Learning in Security—An Empirical Example in User and Entity Behavior An... - Databricks
Recently, deep learning has delivered groundbreaking advances in many industries. In this presentation, Dr. Wang will share empirical experiences of applying deep learning to solving some specific security problems with real-world customer attack detection examples. He will also discuss the challenges and guidelines for successfully deploying deep learning, or general machine learning, in broader security.
This session will feature two deep learning examples. The first example is a user-behavior anomaly detection solution using a Convolutional Neural Network (CNN). Since CNNs are most effective at image processing, Dr. Wang will introduce an innovative way to encode a user’s daily behavior into multi-channel images. He will also share experimental comparison results of CNN hyperparameter tuning. The second example is a stateful user risk scoring system using Long Short-Term Memory (LSTM). Most modern attacks happen in a multi-stage fashion, i.e., infection -> command & control -> lateral movement -> data infiltration -> data exfiltration. In this case, the company uses LSTM to monitor the temporal state transition of each user over these stages.
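The encoding step in the first example, turning a user's day of activity into an image-like grid a CNN can consume, can be sketched roughly as follows. The 24-hour layout, the event categories, and the per-channel normalization are illustrative assumptions on my part, not the scheme from the talk:

```python
def behavior_to_grid(events, categories=("login", "file_access", "network")):
    """Encode one user-day of events as a multi-channel 'image':
    one channel per event category, 24 cells per channel (one per
    hour), each cell holding a normalized event count. Stacks of
    such grids can then be fed to a CNN like image tensors."""
    grid = [[0.0] * 24 for _ in categories]
    for hour, category in events:
        if category in categories:
            grid[categories.index(category)][hour] += 1.0
    # Normalize each channel to [0, 1] so heavy users don't dominate.
    for channel in grid:
        peak = max(channel)
        if peak > 0:
            for h in range(24):
                channel[h] /= peak
    return grid

# A user who logs in at 9am and reads files through the morning:
day = [(9, "login"), (9, "file_access"), (10, "file_access"),
       (11, "file_access"), (11, "file_access")]
image = behavior_to_grid(day)
```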
Splunk, SIEMs, and Big Data - The Undercroft - November 2019 - Jonathan Singer
Guild members, join us on Thursday, November 14th at 6pm for our class on Splunk. Our Analyze Guild Master Jonathan Singer will be covering centralized logging, SIEM, big data, and much more.
This chapter is devoted to log mining, or log knowledge discovery: a different type of log analysis that does not rely on knowing what to look for. This takes the “high art” of log analysis to the next level by breaking the dependence on lists of strings or patterns to look for in the logs.
Information security lecture No. 2 at NSU.
The second lecture, which covers all the principles of information security, the life cycle of an information protection system, and the business structure of the protected asset, and explains the essence of a "security policy" document and its implementation in real business.
Troubleshooting common oslo.messaging and RabbitMQ issues - Michael Klishin
This talk focuses on troubleshooting of common oslo.messaging and RabbitMQ issues in OpenStack environments. Co-presented at the OpenStack Summit Austin in April 2016.
The presentation will describe methods for discovering interesting and actionable patterns in log files for security management without specifically knowing what you are looking for. This approach is different from "classic" log analysis and allows gaining insight into insider attacks and other advanced intrusions, which are extremely hard to discover with other methods. Specifically, I will demonstrate how data mining can be used as a source of ideas for designing future log analysis techniques that will help uncover coming threats. An important part of the presentation will be a demonstration of how the above methods worked in a real-life environment.
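A minimal version of that idea, surfacing log lines whose message template is rare without specifying any pattern up front, might look like this. The templating regexes and the rarity threshold are assumptions for illustration, not the methods from the presentation:

```python
import re
from collections import Counter

def rare_log_templates(lines, threshold=0.05):
    """Mine logs without a target pattern: collapse variable parts
    (IPs, hex values, numbers) into placeholders, count each
    resulting template, and flag templates that make up less than
    the threshold fraction of all lines."""
    def template(line):
        line = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<IP>", line)
        line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
        line = re.sub(r"\b\d+\b", "<NUM>", line)
        return line
    counts = Counter(template(l) for l in lines)
    total = sum(counts.values())
    return [t for t, c in counts.items() if c / total < threshold]

logs = ["Accepted password for alice from 10.0.0.5 port 51122"] * 40 \
     + ["Accepted password for bob from 10.0.0.9 port 40211"] * 58 \
     + ["Failed su attempt for root from 10.0.0.66 port 31337"] * 2
suspicious = rare_log_templates(logs)
```

The point matches the abstract: nothing told the function to look for failed su attempts; they surfaced purely because their shape is rare relative to everything else.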
How to Troubleshoot OpenStack Without Losing Sleep - Sadique Puthen
The complex architecture and design of OpenStack, and the difficulties encountered while troubleshooting, amplify the effort of debugging a problem in an OpenStack environment. This can give administrators and support associates sleepless nights if OpenStack native and supporting components are not configured properly and tuned for optimum performance, especially with large deployments that involve high availability and load balancing.
Final Year Projects Computer Science (Information Security) - 2015 - Syed Ubaid Ali Jafri
Final-year project ideas for computer science students. These projects help students enhance their expertise in the area of information security and understand its core concepts.
Are you curious about KNIME Software?
Do you know the difference between KNIME Analytics Platform and KNIME Server?
Which data sources can KNIME connect to?
Can you run an R script from within a KNIME workflow? A Python script? Which other integrations are available?
How can KNIME help with ETL, data preparation, and general data manipulation? Which machine learning algorithms can KNIME offer?
This webinar answers all of these questions! There’s also information about connecting to big data clusters and how you can run all or part of your analysis on a big data platform. It also covers everything you need to know about Microsoft Azure and Amazon AWS.
SnapLogic provides a data integration platform that takes integration to another level by combining the power of dynamic programming languages with standard web interfaces to solve today's most pressing problems in application integration. SnapLogic has an intuitive visual designer that runs in your browser and connects to a highly scalable web-based integration server that you can run on premise or in the cloud.
Apache Drill [1] is a distributed system for interactive analysis of large-scale datasets, inspired by Google’s Dremel technology. It is a design goal to scale to 10,000 servers or more and to be able to process Petabytes of data and trillions of records in seconds. Since its inception in mid 2012, Apache Drill has gained widespread interest in the community. In this talk we focus on how Apache Drill enables interactive analysis and query at scale. First we walk through typical use cases and then delve into Drill's architecture, the data flow and query languages as well as data sources supported.
[1] http://incubator.apache.org/drill/
Playing in the Same Sandbox: MySQL and Oracle - lynnferrante
This SCaLE Linux presentation from January 2012, "Playing in the Same Sandbox: MySQL and Oracle", describes current and upcoming integrations between MySQL and other Oracle products like Oracle Database Firewall, Audit Vault, Secure Backup, GoldenGate, My Oracle Support, and MySQL Enterprise Monitor.
PayPal datalake journey | teradata - edge of next | san diego | 2017 october ... - Deepak Chandramouli
PayPal Data Lake Journey | 2017-Oct | San Diego | Teradata Edge of Next
Gimel [http://www.gimel.io] is a Big Data Processing Library, open sourced by PayPal.
https://www.youtube.com/watch?v=52PdNno_9cU&t=3s
Gimel empowers analysts, scientists, data engineers alike to access a variety of Big Data / Traditional Data Stores - with just SQL or a single line of code (Unified Data API).
This is possible via the Catalog of Technical properties abstracted from users, along with a rich collection of Data Store Connectors available in Gimel Library.
A Catalog provider can be Hive or User Supplied (runtime) or UDC.
In addition, PayPal recently open sourced UDC [Unified Data Catalog], which can host and serve the Technical Metadata of the Data Stores & Objects. Visit http://www.unifieddatacatalog.io to experience it first hand.
Splunk Conf2010: Corporate Express presents Splunk with SAP - Splunk
Corporate Express is in the midst of a business transformation program called “next-gen”, rolling out SAP across New Zealand and Australia. This session will detail how they’re using Splunk to plan for capacity and performance requirements, and how they’re combining multiple charts, graphs, tables and views from disparate systems into a single pane in Splunk. Learn more here: http://www.splunk.com/view/splunk-at-corporate-express/SP-CAAAFNR
How to protect, detect, and respond to your threats.
This is an MSP-centric talk exploring how to detect, protect against, and respond to cyber security threats. We first walk through the cyber defense matrix, explore what security intelligence needs to be, and emphasize the concepts with two BlackCat case studies.
Extended Detection and Response (XDR) - An Overhyped Product Category With Ulti... - Raffael Marty
Extended Detection and Response, or XDR for short, is one of the acronyms that are increasingly used by cybersecurity vendors to explain their approach to solving the cyber security problem. We have been spending trillions of dollars on approaches to secure our systems and data, with what success? Cybersecurity is still one of the biggest and most challenging areas that companies, small and large, are dealing with. XDR is another approach driven by security vendors to solve this problem. The challenge is that every vendor defines XDR slightly differently and makes it fit their own “challenge du jour” for marketing and selling their products.
In this presentation we will demystify the XDR acronym and put a working model behind it. Together, we will explore why XDR is a fabulous concept, but also discover that it’s nothing revolutionarily new. With an MSP lens, we will explore what the XDR benefits are for small and medium businesses and what it means to the security strategy of both MSPs and their clients. The audience will leave with a clear understanding of what XDR is, how the technology matters to them, and how XDR will ultimately help them secure their customers and enable trusted commerce.
Blog post: http://raffy.ch/blog - Video: https://youtu.be/nk5uz0VZrxM
In this video we talk about the world of security data, or log data. In the first section, we dive into a bit of a history lesson around log management, SIEM, and big data in security. We then shift to the present to discuss some of the challenges that we face today with managing all of that data, and also discuss some of the trends in the security analytics space. In the third section, we focus on the future. What does tomorrow hold in the SIEM / security data space? What are some of the key features we will see, and how does this matter to the users of these approaches?
Cyber Security Beyond 2020 – Will We Learn From Our Mistakes? - Raffael Marty
The cyber security industry has spent trillions of dollars to keep external attackers at bay. To what effect? We still don't see an end to the cat and mouse game between attackers and the security industry; zero day attacks, new vulnerabilities, ever increasingly sophisticated attacks, etc. We need a paradigm shift in security. A shift away from traditional threat intelligence and indicators of compromise (IOCs). We need to look at understanding behaviors. Those of devices and those of humans.
What are the security approaches and trends that will make an actual difference in protecting our critical data and intellectual property; not just from external attackers, but also from malicious insiders? We will explore topics from the 'all solving' artificial intelligence to risk-based security. We will look at what is happening within the security industry itself, where startups are placing their bets, and how human factors will play an increasingly important role in security, along with all of the potential challenges that will create.
Artificial Intelligence – Time Bomb or The Promised Land? - Raffael Marty
Companies have AI projects. Security products use AI to keep attackers out and insiders at bay. But what is this "AI" that everyone talks about? In this talk we will explore what artificial intelligence in cyber security is, where the limitations and dangers are, and in what areas we should invest more in AI. We will talk about some of the recent failures of AI in security and invite a conversation about how we verify artificially intelligent systems to understand how much trust we can place in them.
Alongside the AI conversation, we will discover that we need to make a shift in our traditional approach to cyber security. We need to augment our reactive approaches of studying adversary behaviors to understanding behaviors of users and machines to inform a risk-driven approach to security that prevents even zero day attacks.
In this presentation I explore the topic of artificial intelligence in cyber security: what AI is, and how we get to real intelligence in a cyber context. I outline some of the dangers of the way we are using algorithms (AI, ML) today and what that leads to. We then explore how we can add real intelligence, through expert knowledge, to the problem of finding attackers and anomalies in our applications and networks.
Presented at AI 4 Cyber in NYC on April 30, 2019
AI & ML in Cyber Security - Why Algorithms are Dangerous - Raffael Marty
Link to the video of the presentation: https://www.youtube.com/watch?v=WG1k-Xh1TqM
Every single security company is talking in some way or another about how they are applying machine learning. Companies go out of their way to make sure they mention machine learning and not statistics when they explain how they work. Recently, that's not enough anymore either. As a security company you have to claim artificial intelligence to be even part of the conversation.
Guess what. It's all baloney. We have entered a state in cyber security that is, in fact, dangerous. We are blindly relying on algorithms to do the right thing. We are letting deep learning algorithms detect anomalies in our data without having a clue what that algorithm just did. In academia, they call this the lack of explainability and verifiability. But rather than building systems with actual security knowledge, companies are using algorithms that nobody understands and in turn discover wrong insights.
In this talk, I will show the limitations of machine learning, outline the issues of explainability, and show where deep learning should never be applied. I will show examples of how the blind application of algorithms (including deep learning) actually leads to wrong results. Algorithms are dangerous. We need to revert back to experts and invest in systems that learn from, and absorb the knowledge, of experts.
Delivering Security Insights with Data Analytics and Visualization
Raffael Marty
It's an interesting exercise to look back to the year 2000 to see how we approached cyber security. We had just started to realize that data might be a useful currency, but for the most part, security pursued preventative avenues, such as firewalls, intrusion prevention systems, and anti-virus. With the advent of log management and security information and event management (SIEM) solutions, we started to gather gigabytes of sensor data and correlate data from different sensors to compensate for their weaknesses and amplify their strengths. But fundamentally, such solutions didn't scale well and struggled to deliver real security insight.
Today, cybersecurity wouldn't work anymore without large scale data analytics and machine learning approaches, especially in the realm of malware classification and threat intelligence. Nonetheless, we are still just scratching the surface and learning where the real challenges are in data analytics for security.
This talk will go on a journey of big data in cybersecurity, exploring where big data has been and where it must go to make a true difference. We will look at the potential of data mining, machine learning, and artificial intelligence, as well as the boundaries of these approaches. We will also look at both the shortcomings and potential of data visualization and the human computer interface. It is critical that today's systems take into account the human expert and, most importantly, provide the right data.
AI & ML in Cyber Security - Welcome Back to 1999 - Security Hasn't Changed
Raffael Marty
The year is 2017. Cyber security has been a discipline for many years, and thousands of security companies offer solutions to deter and block malicious actors in order to keep our businesses operating and our data confidential. But fundamentally, cyber security has not changed during the last two decades. We are still running Snort and Bro. Firewalls are fundamentally still the same. People get hacked because of poor passwords, and we collect logs that we don't know what to do with. In this talk I will paint a slightly provocative and dark picture of security. Fundamentally, nothing has really changed. We'll have a look at machine learning and artificial intelligence and see how those techniques are used today. Do they have the potential to change anything? How will the future look with those technologies? I will show some practical examples of machine learning and argue that simpler approaches generally win. Maybe we find some hope in visualization? Or maybe augmented reality? We still have a ways to go.
Ensuring the security of a company's data and infrastructure has largely become a data analytics challenge. It is about finding and understanding patterns and behaviors that are indicative of malicious activities or deviations from the norm. Data, analytics, and visualization are used to gain insights and discover those malicious activities. These three components play off of each other, but also have their inherent challenges. A few examples will be given to explore and illustrate some of these challenges.
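The "deviations from the norm" idea can be made concrete with a minimal statistical baseline. The sketch below flags hours whose event counts stray far from the mean; the hourly login counts and the two-sigma threshold are illustrative assumptions, not data from the talk.

```python
from statistics import mean, stdev

def anomalies(counts, k=2.0):
    """Return indices of counts deviating more than k standard deviations from the mean."""
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if sigma > 0 and abs(c - mu) > k * sigma]

# Hypothetical hourly login counts; the spike at index 5 stands out.
logins = [12, 15, 11, 14, 13, 160, 12, 14]
print(anomalies(logins))  # -> [5]
```

Real detection systems need far more nuance (seasonality, per-entity baselines, expert-tuned thresholds), which is exactly where the challenges discussed above come in.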
Creating Your Own Threat Intel Through Hunting & Visualization
Raffael Marty
The security industry is talking a lot about threat intelligence: external information that a company can leverage to understand where potential threats are knocking on the door and might have already penetrated the network boundary. Conversations with many CERTs have shown that we have to stop relying on knowledge of how attacks have been conducted in the past and start 'hunting' for signs of compromise and anomalies in our own environments.
In this presentation we explore how the decade-old field of security visualization has evolved. We show how we have applied advanced analytics and visualization to create our own threat intelligence and investigate lateral movement in a Fortune 50 company.
Visualization. Data science. No machine learning. But pretty pictures.
What is internal threat intelligence? Check out http://www.darkreading.com/analytics/creating-your-own-threat-intel-through-hunting-and-visualization/a/d-id/1321225
The extent and impact of recent security breaches show that current security approaches are just not working. But what can we do to protect our businesses? We have been advocating monitoring for a long time as a way to detect the subtle, advanced attacks that still make it through our defenses. However, products have failed to deliver on this promise.
Current solutions scale in neither data volume nor analytical insight. In this presentation we will explore what security monitoring is. Specifically, we are going to explore the question of how to visualize a billion log records. A number of security visualization examples will illustrate some of the challenges of big data visualization. They will also show how data mining and user experience design help us get a handle on the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
An overview of some methods and principles for big data visualization. The presentation briefly touches on dashboards and some of their cyber security uses. The big data lake is also briefly discussed in the context of a cyber security big data setup.
The Heatmap - Why is Security Visualization so Hard?
Raffael Marty
The extent and impact of recent security breaches show that current approaches are just not working. But what can we do to protect our businesses? We have been advocating monitoring for a long time as a way to detect subtle, advanced attacks. However, products have failed to deliver on this promise. Current solutions scale in neither data volume nor analytical insight. In this presentation we will explore why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. We are going to explore the question of how to visualize a billion events, and we will look at a number of security visualization examples that illustrate the problem and some possible solutions. These examples will also show how data mining and user experience design help us get a handle on the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
Vision is a human’s dominant sense. It is the communication channel with the highest bandwidth into the human brain. Security tools and applications need to make better use of information visualization to enhance human computer interactions and information exchange.
In this talk we will explore a few basic principles of information visualization to see how they apply to cyber security. We will explore both visualization as a data presentation, as well as a data discovery tool. We will address questions like: What makes for effective visualizations? What are some core principles to follow when designing a dashboard? How do you go about visually exploring a terabyte of data? And what role do big data and data mining play in security visualization?
The presentation is filled with visualizations of security data to help translate the theoretical concepts into tangible applications.
The Heatmap - Why is Security Visualization so Hard?
Raffael Marty
This presentation explores why it is so hard to come up with a security monitoring (or shall we call it security intelligence) approach that helps find sophisticated attackers in all the data collected. It explores the question of how to visualize a billion events. To do so, the presentation dives deeply into heatmaps - matrices - as an example of a simple type of visualization. While heatmaps are very simple, they are incredibly versatile and help us think about the problem of security visualization. They help illustrate how data mining and user experience design help get a handle on the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
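To make the heatmap-as-matrix idea tangible, the sketch below aggregates events into a source-by-hour count matrix and renders it as a coarse text heatmap. The (source IP, hour) tuples are invented for the example; the talk itself uses real security data.

```python
from collections import Counter

# Hypothetical (source_ip, hour) tuples, e.g. parsed from firewall logs.
events = [("10.0.0.1", 2), ("10.0.0.1", 2), ("10.0.0.2", 2),
          ("10.0.0.1", 3), ("10.0.0.2", 14), ("10.0.0.2", 14),
          ("10.0.0.2", 14), ("10.0.0.1", 23)]

counts = Counter(events)          # each matrix cell = events per (source, hour)
sources = sorted({s for s, _ in events})
shades = " .:#"                   # denser character = more events in the cell

# One row per source, one column per hour of the day.
for src in sources:
    row = "".join(shades[min(counts[(src, h)], 3)] for h in range(24))
    print(f"{src:>10} |{row}|")
```

The aggregation step is where data mining does the heavy lifting: a billion raw events collapse into a small matrix of counts, and the eye can then pick out hot cells and unusual row patterns at a glance.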
22. Logging as a Service (LaaS)
• Economically advantageous - think about TCO
• Pay as you go
• Elastic infrastructure scales with your needs
• No installation needed
• No setup costs / time for logging solution
• Open platform with RESTful APIs
Logging as a Service 22
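As a sketch of the "open platform with RESTful APIs" bullet, the snippet below packages a structured log event as a JSON POST to a Loggly-style HTTP collection endpoint. The token and endpoint URL are placeholders, not real service details.

```python
import json
import urllib.request

# Placeholder values - a real LaaS account supplies its own token and endpoint.
TOKEN = "YOUR-CUSTOMER-TOKEN"
ENDPOINT = f"https://logs.example.com/inputs/{TOKEN}/tag/http/"

def build_log_request(event: dict) -> urllib.request.Request:
    """Package a structured log event as a JSON POST for a RESTful logging API."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT, data=body,
        headers={"Content-Type": "application/json"}, method="POST")

req = build_log_request({"severity": "warning", "service": "proxy",
                         "message": "upstream timeout"})
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req)  # uncomment to actually ship the event
```

Because the transport is plain HTTPS plus JSON, any agent, proxy, or script can feed the service - which is what makes the pay-as-you-go, no-installation model work.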
23. Loggly
[Architecture diagram: data sources (syslog, mobile clients) feed Loggly through data-collection proxies and an API; distributed indexers and search machines handle indexing and processing, backed by a distributed data store and a log archive; consumers access the data through the Loggly user interface, UI extensions, and the data-access API.]