FIM AND SYSTEM CALL AUDITING AT SCALE IN A LARGE CONTAINER DEPLOYMENT
SESSION ID: CSV-R14
#RSAC
Ravi Honnavalli
Staff Engineer, Walmart
Twitter handle: @ravi_honnavalli

Disclaimer
NOTE: All content discussed here comes from self-learning and is not related to the work I do at Walmart.

Ever-increasing amount of logs

Overwhelming number of choices
Too many options!!
• Static rules?
• ML?
• Event source?
• Agent vs agentless

Flood of OSS tools
• Elasticsearch
• osquery
• journald
• ElastAlert
• TensorFlow
• Unstructured datastores
• fluentd

GOAL: Demystifying the choices we have
Understanding types of event sources
• Classify event sources
• Understand event source type
• Evaluate open source stacks
Build our own stack based on insight needed
• Understand the insights we are looking for
• Build a stack based on the event classification
• If needed, customize existing open source tools
• Build adaptors / tools that join the whole chain
Make an informed decision
• The stacks discussed in this presentation are by no means the only stacks available

Quick poll
• Use audit logging to detect anomalies?
• How many implement it only to meet compliance?
• Take it further to use machine learning techniques?

Possibilities of tools evaluated
Kernel
• kauditd
• inotify
User space tools
• auditd
• go-audit
• go-audit-container
• osquery
Data shippers
• Logstash
• Filebeat
• Fluentd
Sink
• File
• Syslog
Deployment tools
• Chef
• Puppet
• Ansible
Fleet manager
• Zentral
• Kolide
• Doorman
• Hand-crafted tool
Unstructured data stores
• Elasticsearch
• MongoDB
Graphing and reporting
• Kibana
• Grafana
• ElastAlert
• Custom tools based on query DSL/Lucene
Data preprocessing
• Custom tool to build a training set
ML
• TensorFlow
• Layer depth
• Optimization algorithm
• Learning rate
• Gradient descent mechanisms

Classification of event sources
Event sources
• Event based: syscall, inotify
• Scheduled query: agent

Security insight based on event source type
syscall: Looking for specific outliers among mostly normal dynamic events
• Like identifying outliers
• Monitoring constantly for a specific malicious system call along with other criteria (uid, etc.)
inotify: Safeguarding specific sensitive files / areas in the file system
• Watch for CREATE/ACCESS/MODIFY/DELETE events on specific files
agent: Scheduled activities for static information
• OS patch level queries, vulnerable kernel modules, misconfiguration
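
The "specific system call along with other criteria" idea can be sketched as a small predicate over already-parsed events. This is a minimal illustration, not a rule from the talk: the field names `syscall` and `uid`, the watched call `ptrace`, and the trusted uid `0` are all assumptions chosen for the example.

```python
def is_suspicious(event, watched_syscalls, trusted_uids):
    """Flag an event when a watched syscall is made by an untrusted uid.

    `event` is assumed to be a dict of string fields such as
    {"syscall": "ptrace", "uid": "1000"}; the keys are hypothetical.
    """
    return (
        event.get("syscall") in watched_syscalls
        and event.get("uid") not in trusted_uids
    )


# Hypothetical policy: ptrace is only expected from root.
WATCHED = {"ptrace"}
TRUSTED = {"0"}

events = [
    {"syscall": "ptrace", "uid": "1000"},
    {"syscall": "ptrace", "uid": "0"},
    {"syscall": "open", "uid": "1000"},
]
flagged = [e for e in events if is_suspicious(e, WATCHED, TRUSTED)]
```

Combining the syscall with a second field keeps the rule narrow, which matters when most events are normal.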

SYSTEM CALLS

Why syscall?
• Fundamental transit points between user land and kernel
• Every process makes system calls, disclosing information about its activity
• Several user space tools send audit information (auditd, go-audit, go-audit-container)
• Can provide deep insight when aggregated and drilled down
• Ideal candidate for building a machine learning training set, as the volume of data is huge

Audit component
User land
• User space application
• Reporting daemon
Kernel land
• Syscall interface
• Kauditd
Flow: User space application -> Syscall interface -> Kauditd -> Netlink socket -> Reporting daemon
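
What the reporting daemon ultimately hands downstream is a stream of key=value records. The sketch below parses one such record into a dict. The sample line follows the general layout of Linux audit SYSCALL records, but its values are made up, and the parser is a simplification that ignores multi-record events and hex-encoded fields.

```python
import re

# Illustrative record; the values are invented for this example.
SAMPLE = (
    'type=SYSCALL msg=audit(1364481363.243:24287): arch=c000003e '
    'syscall=2 success=no exit=-13 uid=1000 comm="cat" exe="/bin/cat"'
)

_HEADER = re.compile(r'msg=audit\((?P<ts>\d+\.\d+):(?P<serial>\d+)\)')
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_audit_line(line):
    """Split one audit record into a flat dict of fields."""
    record = {}
    header = _HEADER.search(line)
    if header:
        record["timestamp"] = float(header.group("ts"))
        record["serial"] = int(header.group("serial"))
    for key, value in _FIELD.findall(line):
        if key == "msg":
            continue  # already captured as timestamp and serial
        record[key] = value.strip('"')
    return record


parsed = parse_audit_line(SAMPLE)
```

A flat dict like this is convenient to ship as a JSON document to a data store.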

Audit log to gain insights at scale
Kernel: Syscall interface, Kauditd, Netlink socket
User land: User space application, go-audit-container
Storage and dashboards: Elasticsearch, Kibana, Grafana
Machine learning: Preprocessink, TensorFlow
Alerting: Email, PagerDuty, Slack
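
One way parsed audit events could be handed to Elasticsearch is as a bulk request body, which is newline-delimited JSON with an action line before each document. The sketch below only builds that body as a string. The index name `audit-events` is a hypothetical choice, and how go-audit-container actually ships its output is not shown in the slides.

```python
import json


def to_bulk_body(events, index="audit-events"):
    """Build a newline-delimited bulk body: one action line per document."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event, sort_keys=True))
    if not lines:
        return ""
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


body = to_bulk_body([{"syscall": "59", "uid": "0"}])
```

The resulting string would be sent to the `_bulk` endpoint with the `application/x-ndjson` content type.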

Demo

INOTIFY

Inotify component
User land: User space application
Kernel land: Inotify component
• inotify_add_watch adds entries to the watch list
• The event queue holds inotify_event { } records for the application to read

Why inotify?
• Lower CPU consumption on average
• Missing details in the reports
• Another parallel stack, a new and exciting one to explore with osquery

Demo

inotify based stack for FIM
Kernel land: Inotify component
User land: osquery (registers watches, receives event notifications)
Storage and dashboards: Elasticsearch, Kibana, Grafana
Machine learning: Preprocessink, TensorFlow
Alerting: Email, PagerDuty, Slack

AGENTS

osquery
• Fleet Manager issues a distributed query
• One osquery agent runs on each OS (four hosts shown in the diagram)
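
For the scheduled, static-information side of agents (OS patch level, kernel modules, watched files), an osquery configuration could look roughly like the one generated below. This is a sketch: the query names, intervals, and the `/etc/%%` watch path are illustrative choices, while `schedule`, `file_paths`, and the `kernel_modules`, `os_version`, and `file_events` tables are standard osquery features.

```python
import json


def build_osquery_config(interval_seconds=3600):
    """Return an osquery-style config dict with a few scheduled queries."""
    return {
        "schedule": {
            "kernel_modules": {
                "query": "SELECT name, status FROM kernel_modules;",
                "interval": interval_seconds,
            },
            "os_version": {
                "query": "SELECT name, version FROM os_version;",
                "interval": interval_seconds,
            },
            "file_events": {
                "query": "SELECT target_path, action, time FROM file_events;",
                "interval": 300,
            },
        },
        # Paths whose changes should populate file_events (illustrative).
        "file_paths": {"etc": ["/etc/%%"]},
    }


config_json = json.dumps(build_osquery_config(600), indent=2)
```

A fleet manager would typically distribute a configuration like this, or send the same SQL ad hoc as a distributed query.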

osquery stack to get insights at scale
Host: OS, osquery, Extension plugin
Storage and dashboards: Elasticsearch, Kibana, Grafana
Machine learning: Preprocessink, TensorFlow
Alerting: Email, PagerDuty, Slack

Demo of tracing a ‘Dirty COW’ exploit

Learning from using fixed queries in Kibana, Grafana and custom tools
• Robust rules need a lot of queries
• Fixed queries are good but only go so far
• Any small variation from the rules is a false negative
• Using DNN-based machine learning helps improve our ability to detect anomalies

MACHINE LEARNING

High level differences
Unsupervised: Elasticsearch ML
• Anomaly detection
• Time series data
Supervised: Preprocessink-osquery
• Explicit labelling and pre-processing
• Explicit data classification on disparate info

Elasticsearch ML: Detecting outliers
Picture credit: https://unsplash.com/@ripato
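
To make "detecting outliers in time series data" concrete, here is a deliberately simple stand-in: a z-score test over per-interval event counts. This is not how Elasticsearch ML models data; it is only an assumed, minimal illustration of unsupervised outlier flagging, and the counts are invented.

```python
from statistics import mean, pstdev


def find_outliers(counts, threshold=3.0):
    """Return indexes whose count deviates from the mean by > threshold sigmas."""
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # perfectly flat series, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Invented syscall counts per minute; the last bucket is a burst.
counts = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 10, 200]
outliers = find_outliers(counts)
```

No labels are needed here, which is the practical difference from the supervised path.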

Demo of Elasticsearch ML

Use case: Classifying disparate data
• Classifying data from different event sources
• Broad classification into RED/YELLOW/GREEN
• Classifying to get a big picture of the security posture of the organization

Supervised learning: Building a training set is key
Elasticsearch -> Preprocessink-osquery -> Labelling -> TensorFlow

Preprocessink-osquery stages
Stage 0
• Query one probe at a time from ES
• Label into RED/YELLOW/GREEN
• Write to stage_0.csv
Stage 1
• Merge into existing stage_1.csv
Manual labeling
• At this stage a human administrator can manually label events that were not labeled, or that were incorrectly labeled, by the automated rules
Stage 2
• Transform into numeric values, which will be the final training data set
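
The stages above can be sketched in a few functions. This is not the Preprocessink-osquery code: the column names `probe` and `action`, the labeling rule, and the integer encoding are assumptions made for illustration, and the Elasticsearch query and Stage 1 merge are left out.

```python
import csv
import io

LABEL_TO_INT = {"GREEN": 0, "YELLOW": 1, "RED": 2}


def stage0_label(rows, rule):
    """Stage 0: attach a RED/YELLOW/GREEN label to each row using a rule."""
    return [dict(row, label=rule(row)) for row in rows]


def write_csv(rows, handle):
    """Write labeled rows as CSV (stage_0.csv in the talk's pipeline)."""
    writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)


def stage2_numeric(rows, categorical):
    """Stage 2: map categorical columns and labels to integer codes."""
    vocab = {col: {} for col in categorical}
    encoded_rows = []
    for row in rows:
        encoded = []
        for col in categorical:
            codes = vocab[col]
            encoded.append(codes.setdefault(row[col], len(codes)))
        encoded.append(LABEL_TO_INT[row["label"]])
        encoded_rows.append(encoded)
    return encoded_rows


# Hypothetical events and rule: deletions are RED, everything else GREEN.
rows = [
    {"probe": "file_events", "action": "UPDATED"},
    {"probe": "file_events", "action": "DELETED"},
    {"probe": "kernel_modules", "action": "UPDATED"},
]
labeled = stage0_label(rows, lambda r: "RED" if r["action"] == "DELETED" else "GREEN")
stage0_file = io.StringIO()
write_csv(labeled, stage0_file)
training_set = stage2_numeric(labeled, ["probe", "action"])
```

Keeping the labeled CSV as a separate artifact is what makes the manual relabeling step possible before the numeric transform.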

ML choices if you are building your own solution
• Activation function
• Learning rate
• Depth of the network
• Batch size vs iterations
• Optimizers

Results of our experiment
• ReLU for hidden layers and softmax for the output activation
• Epoch vs batch size vs iterations
• Adam optimizer
• The lower the learning rate, the better
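
As a rough illustration of two of these choices, the sketch below runs a forward pass with a ReLU hidden layer and a softmax output over three classes, and computes how many iterations one epoch takes for a given batch size. It is plain Python rather than the TensorFlow model from the talk, and the weights and sizes are arbitrary example values.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one weight vector per unit."""
    return [
        sum(x * w for x, w in zip(inputs, unit)) + b
        for unit, b in zip(weights, biases)
    ]


def forward(features, hidden_w, hidden_b, out_w, out_b):
    hidden = relu(dense(features, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


def iterations_per_epoch(num_samples, batch_size):
    """One epoch is a full pass over the data, split into batches."""
    return math.ceil(num_samples / batch_size)


# Arbitrary example weights: two inputs, two hidden units, three classes.
probs = forward(
    [1.0, -2.0],
    [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0],
)
steps = iterations_per_epoch(1000, 32)
```

The three softmax outputs can be read as probabilities for the GREEN/YELLOW/RED classes, in whatever order the labels were encoded.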

Lessons learnt
• With system call auditing, inotify, and OS-level querying agents as foundations, combined with the ability to aggregate at scale in Elasticsearch, we can achieve very deep security insights into the production environment.
• With the anomaly detection discussed in this talk, we are just scratching the surface. Given that system calls are fundamental, the possibilities are enormous.
• With static thresholds and reporting configuration, it is very easy to miss security insights.
• DNN-based machine learning helps in getting intelligent insights.
• We looked at different choices of stack based on parameters like platform support, CPU utilization, and memory and disk space footprint.

Apply
• Start with a simple file integrity monitoring (FIM) implementation using the audit log. Observe the load of FIM events on the infrastructure.
• Grow the solution to more detailed monitoring.
• Try applying ML based on the rule-based and manual labels.
• Think of possibilities outside of what is discussed here today.

AuditNG suite
https://github.com/auditNG/preprocessink-osquery
https://github.com/auditNG/go-audit-container

Questions?
You can also reach out later:
Twitter handle: @ravi_Honnavalli
LinkedIn: https://www.linkedin.com/in/ravi-honnavalli-0535163/
