Building a Log Analysis Pipeline

Quick internal presentation on work we've been doing to deploy an ELK stack for our security analysis needs.

Published in: Technology

Transcript

  • 1. Building a Log Analysis Pipeline: A Brief Tour
  • 2. Problem
    Limited visibility into the environment
    SIEM solutions inadequate for risk management purposes
    Requests for extracts difficult or impossible to provide
    Unable to connect together different data sources
  • 3. Requirements
    Cheap
      ◦ Budget + Labor ≈ 0
      ◦ Hobby project
    Scalable
      ◦ SIEM data in the TB range
      ◦ Need to have historical data
      ◦ Decoupled from logging infrastructure
    Performance
      ◦ Batch processing is okay
      ◦ …but batches can’t be too slow
      ◦ Need near real-time exploration options
    Confidentiality
      ◦ This is security data. Let’s not create more problems than solutions.
  • 4. Resources
    SIEM does a good job with log aggregation
      ◦ Stores raw syslog events
    Easy access to raw events on the SIEM
    Data is relatively large, but not BIG
  • 5. A Plan Is Born
    “I have a cunning plan!” – S. Baldrick, Blackadder
  • 6. Early Approaches
    METHOD 1 – MONGODB
      ◦ Python regexp to create JSON
      ◦ Load to MongoDB
      ◦ Run Mongo MapReduce
    Worked – but slow. Required AWS for sufficient memory to run MapReduce flows
    METHOD 2 – PURE PYTHON
      ◦ Python regexp to create CSV
      ◦ Pull off to Analysis Workspace
      ◦ Python MapReduce in shell
    Worked – but limited and rigid
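The regexp-to-JSON step and the shell-style MapReduce from the early approaches might look roughly like the sketch below. The log layout, the field names (`action`, `src`, `dst`, `dport`), and the sample lines are all assumptions made for illustration; the slides do not show the actual SIEM format or the original scripts.

```python
import json
import re
from collections import Counter

# Hypothetical firewall syslog layout, assumed for illustration only.
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"action=(?P<action>\w+)\ssrc=(?P<src>[\d.]+)\sdst=(?P<dst>[\d.]+)\s"
    r"dport=(?P<dport>\d+)$"
)


def parse_line(line):
    """Return the event as a dict, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def count_by(events, field):
    """Map each event to its value for `field`, then reduce by counting."""
    return Counter(event[field] for event in events)


# Made-up sample lines in the assumed layout.
SAMPLE = [
    "Jan  5 10:15:01 fw01 action=block src=10.0.0.5 dst=192.0.2.10 dport=22",
    "Jan  5 10:15:02 fw01 action=pass src=10.0.0.6 dst=192.0.2.10 dport=443",
    "Jan  5 10:15:03 fw01 action=block src=10.0.0.5 dst=192.0.2.11 dport=23",
    "not a firewall line",
]

events = [e for e in map(parse_line, SAMPLE) if e is not None]
json_lines = [json.dumps(e, sort_keys=True) for e in events]  # one JSON doc per event
blocks_by_src = count_by((e for e in events if e["action"] == "block"), "src")
```

Keeping every parsed field in the JSON document, rather than only the ones a single report needs, is what lets later questions (blocks vs. passes, ports) be answered without re-parsing the raw logs.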
  • 7. Premature Data Truncation Leads to Poor Results
    Lose ability to query context
    Additional queries not possible without custom redesign
      ◦ Blocks vs. Passes
      ◦ Port information
    Querying peer node relations, etc. not practical
  • 8. Unleash the ELK!
  • 9. Elasticsearch
      ◦ Full text search engine based on Apache Lucene
      ◦ Incredibly fast and flexible query DSL
      ◦ Built for distributed search (horizontal scale) from the ground up
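To give a feel for the query DSL, here is a minimal sketch that builds a search body as a plain Python dict and serializes it to JSON. It uses the `bool`/`filter` form found in recent Elasticsearch releases, which may differ from the syntax of the version behind these slides. The field names (`action`, `src`, `dport`, `@timestamp`) and the helper name `blocked_traffic_query` are assumptions, not taken from the presentation.

```python
import json


def blocked_traffic_query(src_ip, since="now-24h", top_n=10):
    """Build a search body: blocked events from one source, grouped by port.

    All field names here are hypothetical and depend on how events are indexed.
    """
    return {
        "size": 0,  # return only the aggregation, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"action": "block"}},
                    {"term": {"src": src_ip}},
                    {"range": {"@timestamp": {"gte": since}}},
                ]
            }
        },
        "aggs": {
            "top_ports": {"terms": {"field": "dport", "size": top_n}}
        },
    }


body = blocked_traffic_query("10.0.0.5")
print(json.dumps(body, indent=2))
```

The resulting JSON would be sent as the body of a search request; no client library or live cluster is needed to inspect it.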
  • 10. Logstash
    Open source log intake and processor
    Easy-to-use pattern matching
      ◦ No more opaque regexes!
    Terrific metadata enrichment
    Scores of plugins
      ◦ Inputs, outputs, filters, codecs
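The idea behind Logstash's pattern matching (grok) is that named, reusable patterns are composed with a `%{NAME:field}` syntax instead of writing one long regular expression. The sketch below imitates that idea in Python purely for illustration: it is not Logstash code, the three patterns are simplified stand-ins for the real grok library, and the log line is made up.

```python
import re

# Simplified stand-ins for named grok patterns (not the real definitions).
PATTERNS = {
    "WORD": r"\w+",
    "INT": r"\d+",
    "IP": r"\d{1,3}(?:\.\d{1,3}){3}",
}

_TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def compile_grok(expression):
    """Expand each %{NAME:field} token into a named regex group."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        return "(?P<{}>{})".format(field, PATTERNS[name])
    return re.compile(_TOKEN.sub(expand, expression))


matcher = compile_grok(
    r"action=%{WORD:action} src=%{IP:src} dst=%{IP:dst} dport=%{INT:dport}"
)
fields = matcher.search(
    "fw01 action=block src=10.0.0.5 dst=192.0.2.10 dport=22"
).groupdict()
```

The expression reads almost like the log line itself, which is the readability gain the slide refers to.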
  • 11. Kibana
      ◦ Lightweight HTML5 interface to Elasticsearch for logs
      ◦ Not a full SIEM replacement
      ◦ Targeting the Splunk market
  • 12. Infrastructure
    On SIEM
      ◦ Python for creating extracts
      ◦ Bash for tarring up raw logs
    Transport
      ◦ SCP from SIEM to Windows file share
      ◦ USB from Windows file share
      ◦ Sneakernet to analysis workspace
    On Analysis Workspace
      ◦ Vagrant
      ◦ Chef
  • 13. Demo
  • 14. Pieces Involved
  • 15. Next Steps – Infrastructure
    Complete provisioning scripts for Hadoop & AWS
    Transfer raw GZ files to encrypted S3 bucket
      ◦ Allow AWS EMR extract jobs to run
    Process via Logstash into Elasticsearch
      ◦ Elasticsearch for short-term exploration
      ◦ Archive structured data to S3
    Set up Elasticsearch-Hadoop connector
    Use AWS EMR to do ad hoc extracts off of structured S3 buckets
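The slides route raw GZ files through Logstash into Elasticsearch. As a rough stand-in for that step, the sketch below reads gzipped log lines and formats them as the newline-delimited JSON that Elasticsearch's bulk endpoint expects (an action line followed by a document line). The index name `firewall-logs`, the `message` field, and the in-memory sample data are assumptions; nothing is sent over the network.

```python
import gzip
import io
import json


def iter_gz_lines(fileobj):
    """Yield non-empty text lines from a gzip-compressed file object."""
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield line


def to_bulk_ndjson(events, index_name):
    """Format events as bulk-API NDJSON: one action line per document line."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event, sort_keys=True))
    # The bulk body must end with a newline; an empty batch yields "".
    return "\n".join(lines) + "\n" if lines else ""


# In-memory stand-in for a raw GZ log file (made-up content).
raw = "a=1\nb=2\n".encode("utf-8")
buffer = io.BytesIO(gzip.compress(raw))

events = [{"message": line} for line in iter_gz_lines(buffer)]
payload = to_bulk_ndjson(events, "firewall-logs")
```

In a real pipeline the `message` field would be replaced by the parsed, structured event, and the same structured output could also be archived for later batch extracts.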
  • 16. Next Steps – Data Products
    Full MaxMind integration
      ◦ Accuracy & detail
    Reputation
      ◦ REN-ISAC integration
    Graph exploration
      ◦ Who else talked to whom
      ◦ Clustering
    Future
      ◦ Proxy logs
      ◦ DNS logs
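The graph exploration item ("who else talked to whom") can be sketched with nothing more than an adjacency map built from source/destination pairs. The flows below are made up, and connected components are used here only as the simplest possible stand-in for clustering; the slides do not say which clustering method was planned.

```python
from collections import defaultdict, deque


def build_graph(flows):
    """Build an undirected peer graph from (src, dst) pairs."""
    graph = defaultdict(set)
    for src, dst in flows:
        graph[src].add(dst)
        graph[dst].add(src)
    return graph


def clusters(graph):
    """Group hosts into connected components using breadth-first search."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        component = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for peer in graph[node]:
                if peer not in seen:
                    seen.add(peer)
                    queue.append(peer)
        result.append(component)
    return result


# Made-up flows: two hosts share a destination, a third pair is isolated.
flows = [
    ("10.0.0.5", "192.0.2.10"),
    ("10.0.0.6", "192.0.2.10"),
    ("10.0.0.7", "198.51.100.4"),
]
graph = build_graph(flows)
who_else = graph["192.0.2.10"]  # every host that talked to this address
groups = clusters(graph)
```

This kind of peer query is exactly what the truncated early extracts could not support, since it needs both endpoints of every event.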
  • 17. Thanks
    Google Groups
    IRC #logstash, #chef, #vagrant, #elasticsearch
    Seattle Search and Machine Learning Meetup
    Seattle Chef Meetup
    Hortonworks Sandbox
    The Phoenix Project
    Data-Driven Security
    AlienVault
    …and more!
