Big Data is the latest hype in the security industry. We will take a closer look at what makes up big data: Hadoop, Spark, ElasticSearch, Hive, MongoDB, etc. We will learn how to best manage security data in a small Hadoop cluster for different types of use-cases. Along the way, we will encounter a number of open source big-data tools, such as LogStash and Moloch, that help with managing log files and packet captures.
As a second topic we will look at visualization and how we can leverage visualization to learn more about our data. In the hands-on part, we will use some of the big data tools, as well as a number of visualization tools to actively investigate a sample data set.
Security. Analytics. Insight.
I am Raffy - I do Viz!
IBM Research
Agenda
Introduction
Data Sources
DAVIX
Log Data Processing
• Big Data Ecosystem
• Security Big Data Tools
• Managing Security Data
• Visualizing Big Data
Hadoop Ecosystem
• Mahout: machine learning
• Hive: data warehouse, HiveQL query language
• Pig: programming language (Pig Latin)
• HBase: big data store; random, real-time read/write access; auto sharding
• Map Reduce
• Impala: interactive SQL queries
• HDFS: distributed file system, data redundancy, fault-tolerance, append only, namenode / datanode architecture
• Zookeeper: centralized “brain”
• Sentry
• Storm
Berkeley Data Analysis Stack (BDAS)
https://amplab.cs.berkeley.edu/software/
SparkSQL
http://elasticsearch.org
Elastic Search
• Schema free & document oriented
• Simple HTTP interface
• indexes JSON documents
• Queries, aggregations, highlighting, etc.
• Distributed - super easy to add nodes
• Real-time indexing
• Based on Lucene
• Replication
• Partitioning / sharding
• how an index is assigned to nodes
• Snapshots
Up and running in 10 minutes!!
Elastic Search - Admin Interface
Storing and Indexing Logs
Raw log:
Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2
Unparsed:
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"}
Parsed (through grok in LogStash):
{"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2",
 "time": "Aug 2 13:29:58", "host": "pixl-ram", "process": "sshd", "pid": 1631}
-> structured search: time > "Aug 1 2014"
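The step from raw log line to parsed document can be sketched in a few lines of Python. This is only an illustration of the idea: the regular expression is a simplified assumption written for this one syslog format, not the grok pattern LogStash actually uses, and the function name `parse_syslog` is hypothetical.

```python
import json
import re

# Simplified syslog prefix pattern (an assumption for illustration only;
# LogStash's grok patterns are more thorough).
SYSLOG_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "   # e.g. "Aug 2 13:29:58"
    r"(?P<host>\S+) "                                   # e.g. "pixl-ram"
    r"(?P<process>[^\[\s:]+)(?:\[(?P<pid>\d+)\])?: "    # e.g. "sshd[1631]: "
)


def parse_syslog(line):
    """Return a document with the raw text plus any fields we could extract."""
    doc = {"text": line}
    match = SYSLOG_RE.match(line)
    if match:
        doc["time"] = match.group("time")
        doc["host"] = match.group("host")
        doc["process"] = match.group("process")
        if match.group("pid") is not None:
            doc["pid"] = int(match.group("pid"))
    return doc


raw = ("Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram "
       "from 192.168.30.1 port 49864 ssh2")
doc = parse_syslog(raw)
print(json.dumps(doc, indent=2))
```

Lines that do not match keep only the "text" field, which mirrors the unparsed case above: they remain searchable as free text but not by field.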
Grok
• Instead of re-writing regexes
• Ships with about 100 patterns
• Patterns you don't have to write yourself
• It is easy to add new patterns
HOSTNAME \b(?:[0-9A-Za-z].......
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]…
IPORHOST (?:%{HOSTNAME}|%{IP})
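How named patterns compose can be sketched roughly as follows. The `%{NAME}` and `%{NAME:field}` reference syntax is grok's; the pattern bodies below are shortened stand-ins chosen for this example (the shipped HOSTNAME and IP patterns are longer), and the `expand` helper is hypothetical, not part of LogStash.

```python
import re

# Shortened stand-ins for grok patterns (assumptions; the real ones are longer).
PATTERNS = {
    "HOSTNAME": r"\b[0-9A-Za-z][0-9A-Za-z.-]*\b",
    "IP": r"(?<![0-9])(?:\d{1,3}\.){3}\d{1,3}(?![0-9])",
    "IPORHOST": r"(?:%{HOSTNAME}|%{IP})",
}

# Matches %{NAME} and %{NAME:field} references.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def expand(expression):
    """Recursively replace pattern references with plain regex source."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        body = expand(PATTERNS[name])
        # A field name turns the pattern into a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, expression)


regex = re.compile(expand(r"from %{IPORHOST:src} port (?P<port>\d+)"))
m = regex.search("Accepted publickey for ram from 192.168.30.1 port 49864 ssh2")
print(m.group("src"), m.group("port"))
```

The point of the design is reuse: IPORHOST is defined once in terms of HOSTNAME and IP, so a new log format only needs a short expression built from existing names.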
ElasticSearch on Grokked Data
• Automatic schema inference
• Assigns analyzers (prefix indexing, etc.)
• Field properties:
  • “store” [field and document level]
  • “index”:
    • “analyzed”: tokenized, analyzed
    • “not_analyzed”: indexed as is
    • “no”: no indexing
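To make the field properties concrete, here is a minimal sketch of an explicit mapping built as a Python dictionary. It assumes the ElasticSearch 1.x-era mapping syntax that these slides describe (string fields with "index" set to "analyzed", "not_analyzed", or "no"); the type name `syslog` and the choice of fields are hypothetical.

```python
import json

# Hypothetical mapping for parsed syslog documents, ES 1.x-style syntax.
mapping = {
    "syslog": {  # hypothetical type name
        "properties": {
            # Free text: tokenized and analyzed for full-text search.
            "text": {"type": "string", "index": "analyzed"},
            # Exact-match fields: indexed as is, so "pixl-ram" stays one term.
            "host": {"type": "string", "index": "not_analyzed"},
            "process": {"type": "string", "index": "not_analyzed"},
            "pid": {"type": "integer"},
            # Kept for retrieval only: stored, but not searchable.
            "comment": {"type": "string", "index": "no", "store": True},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Without such a mapping, ElasticSearch infers one automatically, which typically means string fields are analyzed and a hostname like "pixl-ram" is split into separate tokens.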
Grok Patterns
Pattern database located in:
/opt/logstash/patterns
Debug Grok rules:
http://grokdebug.herokuapp.com/
LogStash UI - Kibana
Running ElasticSearch
• Block POST / PUT / DELETE to ES instance
• Older versions:
  script.disable_dynamic: true
  action.destructive_requires_name: true
• Use aliases to allow only certain users access to certain indexes
• Use iptables to block ports (9200, 9300, …)
• Performance tuning:
  • https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
Running LogStash
For debugging:
logstash -e 'input { … } … output { … }'
Other command line parameters:
-w <number of cores>
--debug

input {
  stdin {
    type => "stdin-type"
  }

  file {
    type => "syslog-ng"
    path => [ "/var/log/*.log", "/var/log/messages" ]
  }
}

output {
  stdout { }
  elasticsearch {
    embedded => false
    host => "192.168.0.23"
    cluster => "logstash-cluster"
    node_name => "logstash"
    protocol => "node"   # act as an ES node, not as an unknown client
  }
}
Running Kibana
Authentication not built in
Use nginx as a proxy
For example:
https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf
Moloch
Open source, large scale IPv4 packet capturing, indexing and database system powered by Elasticsearch.
Web interface for PCAP browsing, searching, reporting, and exporting PCAPs.
https://github.com/aol/moloch
Moloch – Components
• Capture
  • Sniffs the network interface
  • Parses the traffic and creates the Session Profile Information (aka SPI-Data)
  • Writes the packets to disk
• Database
  • Elasticsearch is used for storing and searching through the SPI-Data
• Viewer
  • A web interface that allows for GUI and API access from remote hosts
Moloch – Capture – SPI-Data Types
• Moloch parses various protocols to create SPI-Data:
  • IP
  • HTTP
  • DNS
    • IP Address
    • Hostname
  • IRC
    • Channel Names
  • SSH
    • Client Name
    • Public Key
  • SSL/TLS
    • Certificate elements of various types (common names, serial, etc.)
• This is not an all-inclusive list
Moloch - A Couple of Additions
• Web APIs
  • Access meta information
  • Grab PCAPs
• Indexing PCAP files:
  ${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]
PacketPig
• Analyze PCAP files using Apache Pig
• Number of scripts made available
  • e.g., running Snort on the PCAPs
https://github.com/bigsnarfdude/packetpig
pig -x local \
  -f pig/examples/binning.pig \
  -param pcap=data/web.pcap
Security Onion
• Bro IDS, your choice of Snort or Suricata, Sguil analyst console, ELSA, Squert, Snorby and capME web interfaces
• All set up to work with each other out of the box
http://securityonion.blogspot.com/
pixlcloud | turning data into actionable insights copyright (c) 2014
Data Type and Use
• What data do you have?
  • PCAP
  • Flows
  • Context (e.g., threat feeds)
  • “Text” logs
PCAP in HDFS or HBase
Row or columnar, fixed schema?
Unstructured in ElasticSearch, enrich on ingestion?
ES or relational
• What’s your use-case?
  • Search: Index -> Elastic Search
  • Analytics: Columnar, SQL enabled
  • Forensics on PCAP: Moloch? Or extract metadata and store PCAP in HDFS/HBase
OpenSOC
Principles of Analytic Design
• Show comparisons, contrasts, differences.
• Show causality, mechanism, explanation, systematic structure.
• Show multivariate data; that is, show more than 1 or 2 variables.
by Edward Tufte
Add Context
Additional information about objects, such as:
• machine
  • roles
  • criticality
  • location
  • owner
  • …
• user
  • roles
  • office location
  • …
[Figure: source and destination annotated with machine and user context, e.g., machine role]
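Adding context can be as simple as a lookup at ingestion or query time. The sketch below is a minimal illustration under stated assumptions: the `MACHINE_CONTEXT` table, its values, and the `add_context` helper are all hypothetical, and the addresses come from documentation ranges.

```python
# Hypothetical machine context; in practice this might come from an asset
# inventory or CMDB export.
MACHINE_CONTEXT = {
    "192.0.2.10": {"role": "web server", "criticality": "high",
                   "location": "DMZ", "owner": "web team"},
    "192.0.2.53": {"role": "DNS server", "criticality": "medium",
                   "location": "datacenter", "owner": "network team"},
}


def add_context(event, machine_context):
    """Return a copy of the event with context fields for both endpoints."""
    enriched = dict(event)
    for side in ("source", "destination"):
        context = machine_context.get(event.get(side), {})
        for key, value in context.items():
            # Prefix each attribute, e.g. "source_role", "destination_owner".
            enriched[f"{side}_{key}"] = value
    return enriched


event = {"source": "192.0.2.10", "destination": "198.51.100.7", "dst_port": 443}
enriched = add_context(event, MACHINE_CONTEXT)
print(enriched)
```

Endpoints without an entry are left as they are, so the enriched records can later be filtered or colored by role or criticality wherever that context exists. User context could be joined the same way from a second table.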
Visualize Me Lots (>1TB) of Data
SecViz is Hard!
Data Visualization Workflow
Overview -> Zoom / Filter -> Details on Demand
Principle by Ben Shneiderman
Backend Support
This visualization process requires:
• Low latency, scalable backend (columnar, distributed data store)
• Efficient client-server communications and caching
• Assistance of data mining to
• Reduce overall data to look at
• Highlight relationships, patterns, and outliers
• Assist analyst in focussing on ‘important’ areas
Processing Pipeline
1. Get data into ElasticSearch
Parse data first, then store in ES
2. Get data out of ES (query)
Get into data format for visualization tool (e.g., CSV)
3. Visualize in the visualization tool
Potentially translate CSV into other format (e.g., DOT, GDF)
Process the data (aggregation, enhancement, etc)
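The translation and aggregation step can be sketched as below: CSV rows are collapsed into weighted edges and written as GDF, a text format Gephi can import. This is a minimal sketch, assuming the first two CSV columns are source and destination; the function name `csv_to_gdf` and the sample rows are hypothetical.

```python
import csv
import io


def csv_to_gdf(csv_text):
    """Convert 'source,destination[,...]' rows into a GDF graph description."""
    nodes = {}   # dict used as an ordered set of node names
    edges = {}   # (source, destination) -> number of rows seen
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source, destination = row[0].strip(), row[1].strip()
        nodes[source] = None
        nodes[destination] = None
        edges[(source, destination)] = edges.get((source, destination), 0) + 1

    lines = ["nodedef>name VARCHAR"]
    lines.extend(nodes)
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,weight DOUBLE")
    lines.extend(f"{s},{d},{w}" for (s, d), w in edges.items())
    return "\n".join(lines) + "\n"


sample = (
    "10.0.0.1,10.0.0.2,80\n"
    "10.0.0.1,10.0.0.2,443\n"
    "10.0.0.3,10.0.0.2,22\n"
)
gdf = csv_to_gdf(sample)
print(gdf)
```

Counting repeated pairs into a weight is one simple form of the aggregation mentioned above: the graph stays small even when the same two hosts appear in many log lines.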
LogStash Setup - Exercise
1. Check out /home/davix/ue14
   logstash-syslog.conf [read, understand!]
2. Run logstash and index data:
   sudo /opt/logstash/bin/logstash -f logstash-syslog.conf
   head -10 firewall | nc localhost 5000   # send data
3. Check what’s in LogStash:
   sudo /etc/init.d/logstash-web start
   open http://localhost:9292   # kibana
4. Use script to extract data
   read_es.py [check out the script]
   update the script to output a (src_ip, dst_ip, dst_port) tuple
5. Convert the CSV output to a GDF file to then load into Gephi
   OR create a TM3 file for the treemap tool
curl 'http://localhost:9200/_all/_search?q=ACCEPTED'
curl 'http://localhost:9200/twitter/_search?q=user:kimchy'
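For step 4, the following sketch shows one way to turn a search response into (src_ip, dst_ip, dst_port) CSV rows. It is not the workshop's read_es.py. It assumes the standard response layout (hits.hits[]._source) and assumes the parsed documents carry fields named `src_ip`, `dst_ip`, and `dst_port`; the sample response is made up so the example runs without a cluster.

```python
import csv
import io


def hits_to_csv(response, fields=("src_ip", "dst_ip", "dst_port")):
    """Write one CSV row per hit that contains all requested fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        if all(field in source for field in fields):
            writer.writerow([source[field] for field in fields])
    return out.getvalue()


# Made-up response in the shape a search request returns.
sample_response = {
    "hits": {
        "total": 2,
        "hits": [
            {"_source": {"src_ip": "192.0.2.10", "dst_ip": "198.51.100.7",
                         "dst_port": 443, "action": "ACCEPTED"}},
            {"_source": {"message": "unparsed line without address fields"}},
        ],
    }
}

csv_text = hits_to_csv(sample_response)
print(csv_text, end="")
```

Against a live instance, the response could be fetched with `json.load(urllib.request.urlopen(url))` using one of the search URLs above; hits that were never parsed into fields are skipped rather than written as partial rows.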
BlackHat Europe - Workshop
VISUAL ANALYTICS DELIVERING ACTIONABLE SECURITY INTELLIGENCE
October 14, 15 - Amsterdam
Security Visualization Community
Share, discuss, challenge, and learn about security visualization.
• http://secviz.org
• List: secviz.org/mailinglist
• Twitter: @secviz
pixlcloud | turning data into actionable insights copyright (c) 2013