Workshop: Big Data Visualization for Security

Big Data is the latest hype in the security industry. We will take a closer look at what big data comprises: Hadoop, Spark, ElasticSearch, Hive, MongoDB, etc. We will learn how best to manage security data in a small Hadoop cluster for different types of use-cases. Along the way, we will encounter a number of open source big-data tools, such as LogStash and Moloch, that help with managing log files and packet captures.
As a second topic, we will look at visualization and how we can use it to learn more about our data. In the hands-on part, we will use some of the big data tools, as well as a number of visualization tools, to actively investigate a sample data set.

  1. Big Data Visualization for Security • UE14 - Romania, September 2014 • Raffael Marty, CEO
  2. Security. Analytics. Insight. • I am Raffy - I do Viz! • IBM Research
  3. Agenda: Introduction, Data Sources, DAVIX, Log Data Processing • Big Data Ecosystem • Security Big Data Tools • Managing Security Data • Visualizing Big Data
  4. http://www.bigdatalandscape.com/
  5. Big Data - The Three V's: Velocity, Volume, Variety
  6. The Big Data Ecosystem
  7. Hadoop Ecosystem • Mahout: machine learning • Hive: data warehouse, HiveQL query language • Pig: programming language (Pig Latin) • HBase: big data store; random, real-time read/write access; auto sharding • MapReduce • Impala: interactive SQL queries • HDFS: distributed file system, data redundancy, fault tolerance, append only, namenode / datanode architecture • Zookeeper: centralized "brain" • Sentry • Storm
  8. Berkeley Data Analysis Stack (BDAS) • SparkSQL • https://amplab.cs.berkeley.edu/software/
  9. Elastic Search (http://elasticsearch.org) • Schema free & document oriented • Simple HTTP interface • Indexes JSON documents • Queries, aggregations, highlighting, etc. • Distributed - super easy to add nodes • Real-time indexing • Based on Lucene • Replication • Partitioning / sharding: how an index is assigned to nodes • Snapshots • Up and running in 10 minutes!!
  10. Elastic Search - Admin Interface
  11. Big Data Security Tools
  12. ELK Stack • Elastic Search • LogStash • Kibana
  13. LogStash (http://logstash.net/) • input -> filter -> output • http://www.elasticsearch.org/overview/logstash
  14. logstash (http://logstash.net/) • input: files, syslog, email, tcp socket, Flume, AMQP, STOMP, Beanstalk, redis, twitter, HTTP • filter: timestamp parsing, anonymize, drop events, parse fields (grok), multiline joins • output: ElasticSearch, Graylog2/GELF, MongoDB, Nagios, TCP, syslog, WebSockets, AMQP, STOMP, beanstalk, redis • messaging formats: avro, msgpack, thrift, xml, protobuf, csv
  15. Storing and Indexing Logs
      Raw log: Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2
      Non parsed: {"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2"}
      Parsed (through grok in LogStash): {"text": "Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram from 192.168.30.1 port 49864 ssh2", "time": "Aug 2 13:29:58", "host": "pixl-ram", "process": "sshd", "pid": 1631}
      -> structured search: time > "Aug 1 2014"
  16. Grok • Instead of re-writing regexes • Ships with about 100 patterns • Patterns you don't have to write yourself • It is easy to add new patterns
      HOSTNAME \b(?:[0-9A-Za-z].......
      IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]…
      IPORHOST (?:%{HOSTNAME}|%{IP})
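The idea behind grok can be sketched in a few lines of Python: named, reusable regex fragments are referenced as %{NAME:field} and expanded into one regular expression. This is only an illustration under stated assumptions, not LogStash's implementation. The pattern bodies are loose stand-ins for the shipped patterns, and `grok_to_regex` and `parse` are hypothetical helper names.

```python
import re

# Simplified stand-ins for grok's pattern library (assumption: the real
# patterns shipped with LogStash are stricter than these).
PATTERNS = {
    "SYSLOGTIMESTAMP": r"[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}",
    "HOSTNAME": r"[0-9A-Za-z][0-9A-Za-z.-]*",
    "PROG": r"[\w./-]+",
    "INT": r"\d+",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field} references inside an expression.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(expression):
    """Expand %{NAME} / %{NAME:field} references into a compiled regex."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # Named references become named groups, others stay non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return re.compile(REFERENCE.sub(expand, expression))


def parse(line, compiled):
    """Return the raw text plus any fields the pattern extracted."""
    doc = {"text": line}
    match = compiled.match(line)
    if match:
        doc.update(match.groupdict())
        if "pid" in doc:
            doc["pid"] = int(doc["pid"])
    return doc


SYSLOG = grok_to_regex(
    r"%{SYSLOGTIMESTAMP:time} %{HOSTNAME:host} "
    r"%{PROG:process}\[%{INT:pid}\]: %{GREEDYDATA:message}"
)

line = ("Aug 2 13:29:58 pixl-ram sshd[1631]: Accepted publickey for ram "
        "from 192.168.30.1 port 49864 ssh2")
doc = parse(line, SYSLOG)
print(doc["time"], doc["host"], doc["process"], doc["pid"])
```

Lines that do not match fall back to the unparsed form, a document with only the "text" field, which mirrors the non-parsed case on the Storing and Indexing Logs slide.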
  17. ElasticSearch on Grokked Data • Automatic schema inference • Assigns analyzers (prefix indexing, etc.) • Field properties: "store" [field and document level]; "index": "analyzed" (tokenized, analyzed), "not_analyzed" (indexed as is), "no" (no indexing)
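As a rough illustration of the field properties above, an explicit mapping for grokked syslog data might look like the following. This sketch assumes the Elasticsearch 1.x mapping syntax that the "analyzed" / "not_analyzed" values belong to. The type name "syslog" and the choice of fields are assumptions, not taken from the workshop material.

```python
import json

# Sketch of an explicit mapping in the Elasticsearch 1.x style.
# The type name "syslog" and the field selection are assumptions.
mapping = {
    "mappings": {
        "syslog": {
            "properties": {
                # Full-text field: tokenized and analyzed for search.
                "text": {"type": "string", "index": "analyzed"},
                # Exact-match fields: indexed as is, good for aggregations.
                "host": {"type": "string", "index": "not_analyzed"},
                "process": {"type": "string", "index": "not_analyzed"},
                "pid": {"type": "integer"},
            }
        }
    }
}

body = json.dumps(mapping, indent=2)
print(body)
```

In that generation of Elasticsearch, a body like this could be supplied when creating an index. Later versions replaced the string type with separate text and keyword types, so the sketch would need adapting there.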
  18. Grok Patterns • Pattern database located in: /opt/logstash/patterns • Debug Grok rules: http://grokdebug.herokuapp.com/
  19. LogStash UI - Kibana
  20. Running ElasticSearch • Block POST / PUT / DELETE to ES instance • Older versions: script.disable_dynamic: true • action.destructive_requires_name: true • Use aliases to allow only certain users access to certain indexes • Use iptables to block ports (9200, 9300, …) • Performance tuning: https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
  21. Running LogStash
      For debugging: logstash -e 'input { … } … output { … }'
      Other command line parameters: -w <number of cores>, --debug
      input {
        stdin { type => "stdin-type" }
        file {
          type => "syslog-ng"
          path => [ "/var/log/*.log", "/var/log/messages" ]
        }
      }
      output {
        stdout { }
        elasticsearch {
          embedded => false
          host => "192.168.0.23"
          cluster => "logstash-cluster"
          node_name => "logstash"
          protocol => "node"
        }
      }
      protocol => "node": act as an ES node, not as an unknown client
  22. Running Kibana • Authentication not built in • Use nginx as a proxy • For example: https://github.com/elasticsearch/kibana/blob/master/sample/nginx.conf
  23. Moloch • Open source, large scale IPv4 packet capturing, indexing and database system powered by Elastic Search • Web interface for browsing, searching, reporting on, and exporting PCAPs • https://github.com/aol/moloch
  24. Moloch - Components • Capture: sniffs the network interface, parses the traffic and creates the Session Profile Information (aka SPI-Data), writes the packets to disk • Database: Elasticsearch is used for storing and searching through the SPI-Data • Viewer: a web interface that allows for GUI and API access from remote hosts
  25. Moloch - Capture - SPI-Data Types • Moloch parses various protocols to create SPI-Data: IP • HTTP • DNS (IP address, hostname) • IRC (channel names) • SSH (client name, public key) • SSL/TLS (certificate elements of various types: common names, serial, etc.) • This is not an all-inclusive list
  26. Moloch - A Couple of Additions • Web APIs: access meta information, grab PCAPs • Indexing PCAP files:
      ${moloch_dir}/bin/moloch-capture -c [config_file] -r [pcap_file]
  27. PacketPig • Analyze PCAP files using Apache Pig • Number of scripts made available, e.g., running SNORT on the PCAPs • https://github.com/bigsnarfdude/packetpig
      pig -x local -f pig/examples/binning.pig -param pcap=data/web.pcap
  28. Security Onion • Bro IDS, your choice of Snort or Suricata, Sguil analyst console, ELSA, Squert, Snorby and capME web interfaces • All set up to work with each other out of the box • http://securityonion.blogspot.com/ • pixlcloud | turning data into actionable insights, copyright (c) 2014
  29. Storing Security Data
  30. Data Type and Use • What data do you have? PCAP, flows, context (e.g., threat feeds), "text" logs • Storage questions: PCAP in HDFS or HBase? Row or columnar, fixed schema? Unstructured in ElasticSearch, enrich on ingestion? ES or relational? • What's your use-case? Search (index -> Elastic Search), analytics (columnar, SQL enabled), forensics on PCAP (Moloch? Or extract meta data and store PCAP in HDFS/HBase)
  31. OpenSOC
  32. Visualization • Raffael . Marty @ pixlcloud . com
  33. Visualization To … • Present / Communicate • Discover / Explore
  34. Show Context • 42
  35. Show Context • 42 is just a number and means nothing without context
  36. Use Numbers To Highlight Most Important Parts of Data • Numbers • Summaries
  37. Visualization Creates Context • Visualization Puts Numbers (Data) in Context!
  38. Principles of Analytic Design (by Edward Tufte) • Show comparisons, contrasts, differences • Show causality, mechanism, explanation, systematic structure • Show multivariate data; that is, show more than 1 or 2 variables
  39. Add Context • Additional information about objects, such as: • machine: roles, criticality, location, owner, … • user: roles, office location, … • (figure: source and destination, annotated with machine and user context and machine role)
  40. Traffic Flow Analysis With Context
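One way to read "add context" for traffic flows is as a join between flow records and an asset inventory before the data reaches the visualization tool. The sketch below assumes a hypothetical in-memory inventory keyed by IP address. The field names (`src_ip`, `dst_ip`, `role`, `criticality`, `owner`) and the `add_context` helper are illustrative, not from the slides.

```python
# Hypothetical asset inventory keyed by IP address.
ASSETS = {
    "10.0.0.2": {"role": "web server", "criticality": "high", "owner": "ops"},
    "10.0.0.7": {"role": "workstation", "criticality": "low", "owner": "hr"},
}

# Fallback context for addresses that are not in the inventory.
UNKNOWN = {"role": "unknown", "criticality": "unknown", "owner": "unknown"}


def add_context(flow, assets=ASSETS):
    """Return a copy of the flow with machine context for both endpoints."""
    enriched = dict(flow)
    for side in ("src", "dst"):
        info = assets.get(flow[f"{side}_ip"], UNKNOWN)
        for key, value in info.items():
            enriched[f"{side}_{key}"] = value
    return enriched


flow = {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.2", "dst_port": 80}
enriched = add_context(flow)
print(enriched["src_role"], "->", enriched["dst_role"])
```

With the role and criticality attached to each record, a graph or treemap can color or group nodes by those attributes instead of by bare addresses.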
  41. Visualize Me Lots (>1TB) of Data • SecViz is Hard!
  42. Data Visualization Workflow • Overview • Zoom / Filter • Details on Demand • Principle by Ben Shneiderman
  43. Backend Support • This visualization process requires: • Low latency, scalable backend (columnar, distributed data store) • Efficient client-server communications and caching • Assistance of data mining to: reduce overall data to look at; highlight relationships, patterns, and outliers; assist the analyst in focussing on 'important' areas
  44. Visualization Tools
  45. Mondrian • Graphs: histogram, box plots, scatterplot, mosaic plots, parallel coordinates, ... • Linking, brushing, … • Reads CSV files • http://www.theusrus.de/Mondrian/
  46. Treemap 4.1 (www.cs.umd.edu/hcil/treemap) • TM3 input files:
      Source    Port     Destination  Action
      STRING    INTEGER  STRING       STRING
      10.0.0.2  80       23.2.1.2     failed
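A CSV export can be turned into the layout shown on the slide with a short script. The sketch below assumes that a TM3 file is tab-separated, with one row of column names, one row of column types, and then the data rows. It does not handle any hierarchy columns, and `csv_to_tm3` is a hypothetical helper name.

```python
import csv
import io


def csv_to_tm3(csv_text, names, types):
    """Convert headerless CSV text to TM3-style text: two tab-separated
    header rows (column names, then column types) followed by data rows."""
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    lines = ["\t".join(names), "\t".join(types)]
    for row in csv.reader(io.StringIO(csv_text)):
        if row:  # skip blank lines
            lines.append("\t".join(field.strip() for field in row))
    return "\n".join(lines) + "\n"


tm3 = csv_to_tm3(
    "10.0.0.2,80,23.2.1.2,failed\n",
    names=["Source", "Port", "Destination", "Action"],
    types=["STRING", "INTEGER", "STRING", "STRING"],
)
print(tm3, end="")
```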
  47. Gephi (http://gephi.org) • Gephi UI: interactive link graphs, multiple layout algorithms, reads CSV, DOT, GDF, etc., graph metrics • Gephi Toolkit: APIs • Gephi Plugins • Gephi 'Platform': adding JavaFX components
  48. Visually Finding Insight in Gephi • 1. Loading Data
  49. Visually Finding Insight in Gephi • 2. Run Layout Algorithm (Force Atlas 2)
  50. Visually Finding Insight in Gephi • 3. Use Degree as color and size of nodes
  51. Visually Finding Insight in Gephi • 6. Use Preview and export Graph
  52. AfterGlow - Creating DOT/GDF Files From CSV
      CSV File -> Parser -> Grapher -> Graph Language File
      CSV input:
        aaelenes,Printing Resume
        abbe,Information Encryption
        aanna,Patent Access
        aatharuy,Ping
      Graph language output:
        digraph structs {
          graph [label="AfterGlow 1.5.8", fontsize=8];
          node [shape=ellipse, style=filled, fontsize=10, width=1, height=1, fixedsize=true];
          edge [len=1.6];
          "aaelenes" -> "Printing Resume" ;
          "abbe" -> "Information Encryption" ;
          "aanna" -> "Patent Access" ;
          "aatharuv" -> "Ping" ;
        }
      Command: cat file | ./afterglow -c simple.properties -t | neato -Tgif -o test.gif
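The core of the CSV-to-DOT translation is small enough to sketch in Python. This is not AfterGlow itself, which adds property-driven coloring, clustering, and filtering. It only turns two-column CSV rows into directed edges, and `csv_to_dot` and `quote` are hypothetical helper names.

```python
import csv
import io


def quote(value):
    """Wrap a node name in double quotes, escaping embedded quotes."""
    return '"' + value.replace('"', '\\"') + '"'


def csv_to_dot(csv_text, name="structs"):
    """Turn source,target CSV rows into a DOT digraph with one edge per row."""
    lines = [f"digraph {name} {{"]
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or incomplete rows
        source, target = row[0].strip(), row[1].strip()
        lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


dot = csv_to_dot("aaelenes,Printing Resume\nabbe,Information Encryption\n")
print(dot, end="")
```

The resulting text can be written to a file and rendered with a Graphviz layout tool such as neato, as in the command on the slide.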
  53. Hands On
  54. Processing Pipeline • 1. Get data into ElasticSearch: parse data first, then store in ES • 2. Get data out of ES (query): get into data format for visualization tool (e.g., CSV) • 3. Visualize in the visualization tool: potentially translate CSV into other format (e.g., DOT, GDF); process the data (aggregation, enhancement, etc.)
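Step 2 of the pipeline, getting data out of ES and into CSV, could look roughly like the sketch below. It assumes an Elasticsearch instance on localhost:9200 and documents that carry `src_ip`, `dst_ip`, and `dst_port` fields. Those field names, and the `search` and `hits_to_csv` helpers, are assumptions for illustration and not the workshop's own script.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

FIELDS = ("src_ip", "dst_ip", "dst_port")  # assumed field names


def search(query, base_url="http://localhost:9200", size=100):
    """Run a URI search against all indexes and return the decoded JSON."""
    params = urllib.parse.urlencode({"q": query, "size": size})
    url = f"{base_url}/_all/_search?{params}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def hits_to_csv(response, fields=FIELDS):
    """Write one CSV row per hit that contains all requested fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        if all(field in source for field in fields):
            writer.writerow([source[field] for field in fields])
    return out.getvalue()


# Hand-written sample in the shape of a search response, so the
# conversion can be tried without a running cluster.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_source": {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.2", "dst_port": 80}},
            {"_source": {"message": "unparsed line"}},
        ],
    }
}
print(hits_to_csv(sample), end="")
```

Against a live instance, `hits_to_csv(search("ACCEPTED"))` would produce the CSV for the matching events. Hits that were never parsed into fields are skipped rather than written as partial rows.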
  55. LogStash Setup - Exercise
      1. Check out /home/davix/ue14: logstash-syslog.conf [read, understand!]
      2. Run logstash and index data:
         sudo /opt/logstash/bin/logstash -f logstash-syslog.conf
         head -10 firewall | nc localhost 5000   # send data
      3. Check what's in LogStash:
         sudo /etc/init.d/logstash-web start
         open http://localhost:9292   # kibana
      4. Use script to extract data: read_es.py [check out the script]; update the script to output a (src_ip, dst_ip, dst_port) tuple
      5. Convert the CSV output to a GDF file to then load into Gephi OR create a TM3 file for the treemap tool
      Queries:
         curl 'http://localhost:9200/_all/_search?q=ACCEPTED'
         curl 'http://localhost:9200/twitter/_search?q=user:kimchy'
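For step 5, a minimal CSV-to-GDF conversion might look like this. It assumes three-column input in the (src_ip, dst_ip, dst_port) order from step 4 and uses the basic GDF layout of a node section (`nodedef>`) followed by an edge section (`edgedef>`). The `port` edge attribute and the `csv_to_gdf` name are illustrative choices.

```python
import csv
import io


def csv_to_gdf(csv_text):
    """Convert src_ip,dst_ip,dst_port rows into GDF text for Gephi."""
    nodes = []    # unique node names in first-seen order
    seen = set()
    edges = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # skip blank or incomplete rows
        src, dst, port = (field.strip() for field in row[:3])
        for node in (src, dst):
            if node not in seen:
                seen.add(node)
                nodes.append(node)
        edges.append((src, dst, port))
    lines = ["nodedef>name VARCHAR"]
    lines += nodes
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR,port INTEGER")
    lines += [",".join(edge) for edge in edges]
    return "\n".join(lines) + "\n"


gdf = csv_to_gdf("10.0.0.7,10.0.0.2,80\n10.0.0.8,10.0.0.2,22\n")
print(gdf, end="")
```

Saved with a .gdf extension, the output should be loadable through Gephi's file import, after which a layout can be run and node degree mapped to color and size.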
  56. BlackHat Europe - Workshop • Visual Analytics: Delivering Actionable Security Intelligence • October 14, 15 - Amsterdam
  57. Security Visualization Community • Share, discuss, challenge, and learn about security visualization • http://secviz.org • List: secviz.org/mailinglist • Twitter: @secviz
  58. info@pixlcloud.com