
FIM and System Call Auditing at Scale in a Large Container Deployment


This talk shows how, in a large container deployment, the speaker gained insight into security events (file events on sensitive files, system call audits, a user-level activity trail, network activity, and so on) by customizing and plumbing together a stack of open source tools built on Linux's inotify and kernel audit components, and by aggregating those events centrally in Elasticsearch.
(Source: RSA Conference USA 2018)



  1. #RSAC SESSION ID: CSV-R14. FIM and System Call Auditing at Scale in a Large Container Deployment. Ravi Honnavalli, Staff Engineer, Walmart. Twitter handle: @ravi_honnavalli
  2. #RSAC Disclaimer. NOTE: All content discussed here is from self-learning and is not related to my work at Walmart.
  3. #RSAC Ever-increasing amount of logs
  4. #RSAC Overwhelming number of choices. Too many options: static rules or ML? Which event source? Agent vs. agentless?
  5. #RSAC Flood of OSS tools: Elasticsearch, osquery, journald, ElastAlert, TensorFlow, unstructured datastores, fluentd
  6. #RSAC GOAL: Demystifying the choices we have. Understand the types of event sources: classify event sources, understand each event source type, evaluate open source stacks. Build our own stack based on the insight needed: understand the insights we are looking for, build a stack based on the event classification, customize existing open source tools if needed, build adaptors/tools that join the whole chain. Make an informed decision: the stacks discussed in this presentation are by no means the only stacks available.
  7. #RSAC Quick poll. Who uses audit logging to detect anomalies? How many implement it only to meet compliance? Who takes it further with machine learning techniques?
  8. #RSAC Possibilities of tools evaluated. Kernel: kauditd, inotify. User-space audit tools: auditd, go-audit, go-audit-container, osquery. Data shippers: Logstash, Filebeat, Fluentd. Sink: file, syslog. Deployment tools: Chef, Puppet, Ansible. Fleet manager: Zentral, Kolide, Doorman, hand-crafted tool. Unstructured data stores: Elasticsearch, MongoDB. Graphing and reporting: Kibana, Grafana, ElastAlert, custom tools based on query DSL/Lucene. Data preprocessing: custom tool to build a training set. ML: TensorFlow (layer depth, optimization algorithm, learning rate, gradient descent mechanisms).
  9. #RSAC Classification of event sources. Event-based: syscall, inotify. Scheduled query: agent.
  10. #RSAC Security insight based on event source type. syscall: looking for specific outliers among mostly normal dynamic events, e.g. identifying outliers, or constantly monitoring for a specific malicious system call along with other criteria (uid, etc.). inotify: safeguarding specific sensitive files or areas of the file system by watching for CREATE/ACCESS/MODIFY/DELETE events on specific files. agent: scheduled queries for static information such as OS patch level, vulnerable kernel modules, misconfiguration.
  12. #RSAC Why syscall? System calls are the fundamental transit points between user land and the kernel. Every process makes system calls, disclosing information about its activity. Several user-space tools ship the audit information (auditd, go-audit, go-audit-container). They can provide deep insight when aggregated and drilled down. They are an ideal candidate for building a machine learning training set, as the volume of data is huge.
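Since every record the audit subsystem emits is a flat list of key=value fields, a small parser is the first building block of any aggregation pipeline like the one described here. A minimal sketch (the sample record and its field values are illustrative, not taken from the talk):

```python
import re

def parse_audit_record(line):
    """Split a raw audit record into a dict of key=value fields.
    Quoted values (e.g. comm="bash") are unquoted; everything stays a string."""
    fields = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    return {key: value.strip('"') for key, value in fields}

record = parse_audit_record(
    'type=SYSCALL msg=audit(1538333805.595:2194): arch=c000003e '
    'syscall=59 success=yes exit=0 uid=1000 comm="bash" exe="/usr/bin/bash"'
)
# record["syscall"] == "59" (execve on x86_64); record["comm"] == "bash"
```

From here the dict can be enriched and shipped to the sink of choice.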
  13. #RSAC Audit component (diagram). A user-space application crosses the syscall interface into kernel land; kauditd emits audit records over a netlink socket back to a user-space reporting daemon.
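kauditd only emits records for rules that have been loaded; as an illustration (the watched paths and key names are example choices, not the speaker's rules), an audit.rules fragment might watch a sensitive file and trace a syscall:

```
# watch writes and attribute changes on /etc/passwd, tagged "identity"
-w /etc/passwd -p wa -k identity
# log every execve on 64-bit syscalls, tagged "exec_trace"
-a always,exit -F arch=b64 -S execve -k exec_trace
```

The same rules can be loaded live with `auditctl` for experimentation before baking them into the deployment tooling.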
  14. #RSAC Audit log to gain insights at scale (diagram). In kernel land, kauditd sits behind the syscall interface; go-audit-container receives records from the netlink socket in user land and feeds a pre-process sink into Elasticsearch, with Grafana and Kibana for visualization, alerts to email, PagerDuty, and Slack, and data fed to TensorFlow.
  15. #RSAC Demo
  16. #RSAC INOTIFY
  17. #RSAC Inotify component (diagram). A user-space application calls inotify_add_watch to register paths in the kernel's watch list; the inotify component queues inotify_event structures in an event queue for the application to read.
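The inotify calls shown in the diagram have no Python stdlib wrapper, but they can be exercised directly through ctypes. A minimal Linux-only sketch (the directory and file name are made up for the demo):

```python
import ctypes, os, struct, tempfile

libc = ctypes.CDLL("libc.so.6", use_errno=True)
IN_CREATE, IN_MODIFY, IN_DELETE = 0x100, 0x002, 0x200

fd = libc.inotify_init1(0)                       # event-queue file descriptor
watch_dir = tempfile.mkdtemp()
wd = libc.inotify_add_watch(fd, watch_dir.encode(),
                            IN_CREATE | IN_MODIFY | IN_DELETE)

# trigger an event the watch list will catch
open(os.path.join(watch_dir, "secret.conf"), "w").close()

# struct inotify_event { int wd; uint32_t mask, cookie, len; char name[]; }
buf = os.read(fd, 4096)
ev_wd, mask, cookie, name_len = struct.unpack_from("iIII", buf)
name = buf[16:16 + name_len].rstrip(b"\0").decode()
# mask has IN_CREATE set and name == "secret.conf"
os.close(fd)
```

Tools like osquery wrap exactly this mechanism, which is why registering too many watches shows up as kernel-side memory cost rather than agent CPU.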
  18. #RSAC Why inotify? Lower CPU consumption on average. Some details are missing from the reports. It gives another parallel stack, a new and exciting one to explore with osquery.
  19. #RSAC Demo
  20. #RSAC inotify-based stack for FIM (diagram). osquery registers a watch with the kernel inotify component and is notified of events; events flow through a pre-process sink into Elasticsearch, with Grafana and Kibana for visualization, alerts to email, PagerDuty, and Slack, and data fed to TensorFlow.
  21. #RSAC AGENTS
  22. #RSAC osquery (diagram). A fleet manager issues distributed queries to osquery agents running on each OS instance.
  23. #RSAC osquery stack to get insights at scale (diagram). osquery on each OS, with an extension plugin, feeds a pre-process sink into Elasticsearch; Grafana and Kibana provide graphing, alerts go to email, PagerDuty, and Slack, and data is fed to TensorFlow.
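osquery's file integrity monitoring is driven by configuration rather than code: enable file events, list the paths to watch, and schedule a query against the file_events table. An illustrative config fragment (the path set and interval are arbitrary example values):

```json
{
  "options": { "enable_file_events": "true" },
  "file_paths": {
    "etc": ["/etc/%%"]
  },
  "schedule": {
    "fim": {
      "query": "SELECT target_path, action, time FROM file_events;",
      "interval": 300
    }
  }
}
```

A fleet manager such as Zentral or Kolide can push this config to every agent and collect the scheduled query results centrally.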
  24. #RSAC Demo of tracing a ‘Dirty COW’ exploit
  25. #RSAC Learning from using fixed queries in Kibana, Grafana, and custom tools. Robust rules need a lot of queries. Fixed queries are good, but they only go so far: any small variation on a rule becomes a false negative. DNN-based machine learning improves our ability to detect anomalies.
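The fixed-query approach is typically wired up as ElastAlert rules over the audit index. One hedged example (the index name, field names, query string, and webhook are placeholders, not the speaker's configuration):

```yaml
name: execve-by-root
type: frequency
index: auditlog-*
num_events: 1
timeframe:
  minutes: 5
filter:
  - query:
      query_string:
        query: 'data.syscall: "execve" AND data.uid: "0"'
alert:
  - "slack"
slack_webhook_url: "https://hooks.slack.com/services/EXAMPLE"
```

The brittleness the slide describes is visible here: an attacker who trips a slightly different field value never matches the query_string, which is the false-negative problem ML is meant to soften.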
  27. #RSAC High-level differences. Unsupervised: Elasticsearch ML, anomaly detection on time-series data. Supervised: pre-processink-osquery, with explicit labelling and pre-processing, and explicit classification of disparate data.
  28. #RSAC Elasticsearch ML: detecting outliers (slide shows a picture; credit not captured in this transcript)
  29. #RSAC Demo of Elasticsearch ML
  30. #RSAC Use case: classifying disparate data. Classify data from different event sources into a broad RED/YELLOW/GREEN scheme to get a big-picture view of the organization's security posture.
  31. #RSAC Supervised learning: building a training set is key. Pipeline: Elasticsearch, pre-processink-osquery, labelling, TensorFlow.
  32. #RSAC Pre-processink-osquery stages. Stage 0: query one probe at a time from ES, label it RED/YELLOW/GREEN, write to stage_0.csv. Stage 1: merge into the existing stage_1.csv. Manual labelling: at this stage a human administrator can manually label events that were not labelled, or were labelled incorrectly, by the automated rules. Stage 2: transform into numeric values, producing the final training data set.
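The stage-0 labelling and stage-2 numeric transform can be sketched as follows; the rules and feature choices here are hypothetical stand-ins for the talk's automated rules:

```python
# Severity labels mapped to numeric classes for the final training set
SEVERITY = {"GREEN": 0, "YELLOW": 1, "RED": 2}

def label_event(event):
    """Stage 0: apply simple automated rules to one probe queried from ES."""
    if event.get("syscall") == "execve" and event.get("uid") == "0":
        return "RED"          # root spawning processes: always review
    if event.get("success") == "no":
        return "YELLOW"       # failed call: suspicious but not conclusive
    return "GREEN"

def to_numeric(event, label):
    """Stage 2: turn a labelled event into a numeric training row."""
    return [int(event.get("uid", -1)),
            int(event.get("exit", 0)),
            SEVERITY[label]]

event = {"uid": "0", "syscall": "execve", "exit": "0"}
row = to_numeric(event, label_event(event))
# row == [0, 0, 2]
```

Stage 1's merge and the manual-labelling pass then operate on rows of exactly this shape before they reach TensorFlow.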
  33. #RSAC ML choices if you are building your own solution: activation function, learning rate, depth of the network, batch size vs. iterations, optimizers.
  34. #RSAC Results of our experiment: ReLU for hidden layers and softmax for the output activation; epochs vs. batch size vs. iterations; Adam optimizer; the lower the learning rate, the better.
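For reference, the two activations the experiment settled on (ReLU in the hidden layers, softmax on the output) are small enough to state inline; a plain-Python sketch:

```python
import math

def relu(xs):
    """Hidden-layer activation: clamp negatives to zero."""
    return [max(0.0, x) for x in xs]

def softmax(logits):
    """Output activation: turn logits into a probability distribution."""
    m = max(logits)                        # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# probs sums to 1.0 and the largest logit gets the largest probability,
# which is why softmax suits the RED/YELLOW/GREEN classification head
```

Softmax fits the final layer because the RED/YELLOW/GREEN classes are mutually exclusive, so the output should be a single probability distribution.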
  35. #RSAC Lessons learnt. With system call auditing, inotify, and OS-level querying agents as foundations, combined with the ability to aggregate at scale in Elasticsearch, we can achieve very deep security insight into the production environment. The anomaly detection discussed in this talk is just scratching the surface; given that system calls are fundamental, the possibilities are enormous. With static thresholds and reporting configuration it is very easy to miss security insights; DNN-based machine learning helps in getting intelligent insights. We evaluated the different stack choices on parameters like platform support, CPU utilization, and memory and disk-space footprint.
  36. #RSAC Apply. Start with a simple file integrity monitoring implementation using the audit log. Observe the load of FIM events on the infrastructure. Grow the solution into more detailed monitoring. Try applying ML on top of the rule-based and manual labels. Think of possibilities beyond what is discussed here today.
  37. #RSAC AuditNG suite
  38. #RSAC Questions? You can also reach out later. Twitter handle: @ravi_honnavalli. LinkedIn: honnavalli-0535163/