Recently, deep learning has driven ground-breaking advances in many industries by delivering human-like understanding of difficult cognition problems. We will share our empirical experience of applying deep learning to real-world security challenges, together with lessons learned and suggestions.
1. Examples
We will explain our innovative User & Entity Behavior Analytics (UEBA) solution, which includes two deep learning examples: (1) user and entity behavior anomaly detection using a Convolutional Neural Network (CNN), and (2) stateful user risk scoring using Long Short-Term Memory (LSTM) to detect slow-gestating, multi-stage targeted attacks. We will also share several real-life use cases of successfully detecting compromised users and malicious insiders in large enterprises.
2. Infrastructure
The production data processing and analytics workflow is built with Spark, Spark Streaming and TensorFlow. We will share our experience of managing and tuning distributed TensorFlow and Spark on a small to medium-sized cluster in both SaaS and on-premises deployments. This includes how to manage and split resources between Spark and TensorFlow, and how to split and tune workloads between parameter servers and worker servers in TensorFlow.
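As a rough illustration of the parameter server / worker split mentioned above, the sketch below builds the cluster-and-task description that TensorFlow's distributed tooling conventionally reads from the `TF_CONFIG` environment variable. The talk does not say how the cluster was actually described, so the host names, port, node counts and the helper `build_tf_config` are all hypothetical.

```python
import json


def build_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Return a JSON string in the TF_CONFIG layout: a cluster map plus this node's task."""
    cluster = {"ps": list(ps_hosts), "worker": list(worker_hosts)}
    if task_type not in cluster:
        raise ValueError("task_type must be 'ps' or 'worker'")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task_index is out of range for this task type")
    task = {"type": task_type, "index": task_index}
    return json.dumps({"cluster": cluster, "task": task}, sort_keys=True)


# Hypothetical small cluster: one parameter server and three workers.
PS_HOSTS = ["node1.example.internal:2222"]
WORKER_HOSTS = [
    "node2.example.internal:2222",
    "node3.example.internal:2222",
    "node4.example.internal:2222",
]

# Each node would export its own variant of this string as TF_CONFIG.
print(build_tf_config(PS_HOSTS, WORKER_HOSTS, "worker", 1))
```

Changing the ratio of `ps` to `worker` entries is one knob for the workload split; the right ratio depends on model size and network bandwidth and has to be measured on the cluster in question.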
3. Challenges and Guidance
Finally, we will discuss the challenges that make applying deep learning (or ML in general) to security harder than in most consumer industries, e.g., the lack of large volumes of high-quality labeled data, the need for interpretable models, the need for fast detection, and the high cost of inaccurate detections.
Human intelligence (including knowledge of both enterprise business context and security heuristics) is a very precious resource for covering these gaps. Any effective security ML solution therefore has to integrate human and machine intelligence well.
To achieve this partnership, we offer several suggestions based on our experience so far, e.g., mixing complex and simple models, reinforcement learning based on human feedback, and pairing probabilistic ML results with deterministic forensic data.
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggestions
1. Deep Learning in Security:
Examples, Infrastructure, Challenges and Suggestions
Jisheng Wang, Shirley Wu
June 13, 2017
2. 2
• Jisheng Wang, Senior Director of Data Science, CTO Office, Aruba / HPE
  • Over 12 years of experience: Machine Learning + Big Data => Security
  • Ph.D. @ Penn State, Chief Scientist @ Niara, Tech Lead @ Cisco
• Shirley Wu, Data Architect, Aruba / HPE
  • Architect, big data infrastructure @ Niara
  • Manager, big data and analytics @ Nice Systems
• Niara, a Hewlett Packard Enterprise company
  • Re-invented enterprise security via User and Entity Behavior Analytics (UEBA)
  • Acquired by Aruba, a Hewlett Packard Enterprise company, in February 2017
US, NIARA, ARUBA / HPE
3. 3
USER & ENTITY BEHAVIOR ANALYTICS (UEBA)
UEBA SECURITY
why this matters
USE CASES
how to detect malicious insiders
INFRASTRUCTURE
how to build big data infrastructure
CHALLENGES
how to build an enterprise solution
YOU
ARE
HERE
4. 4
PROBLEM THE SECURITY GAP
PREVENTION & DETECTION (US $B)
SECURITY SPEND
# BREACHES
DATA BREACHES
5. 5
PROBLEM CAUSE OF THE GAP
ATTACKERS ARE QUICKLY INNOVATING & ADAPTING
BATTLEFIELD: WITH IOT AND CLOUD, SECURITY IS BORDERLESS
6. 6
PROBLEM ADDRESSING THE CAUSE
ATTACKERS ARE QUICKLY INNOVATING & ADAPTING
DEEP LEARNING: SOLUTIONS MUST BE RESPONSIVE TO CHANGES
7. 7
PROBLEM ADDRESSING THE CAUSE
BATTLEFIELD: WITH IOT AND CLOUD, SECURITY IS BORDERLESS
INSIDER BEHAVIOR: LOOK AT BEHAVIOR CHANGE OF INSIDE USERS AND MACHINES
8. 8
USER & ENTITY BEHAVIOR ANALYTICS (UEBA)
MACHINE LEARNING DRIVEN BEHAVIOR ANALYTICS IS A NEW WAY TO COMBAT ATTACKERS
1. Machine driven, not only human driven
2. Detect compromised users, not only attackers
3. Post-infection detection, not only prevention
9. 9
REAL WORLD NEWSWORTHY EXAMPLES
COMPROMISED
40 million credit cards were stolen from Target’s servers
STOLEN CREDENTIALS
NEGLIGENT
DDoS attack from 10M+ hacked home devices took down major websites
ALL USED THE SAME PASSWORD
MALICIOUS
Edward Snowden stole more than 1.7 million classified documents
INTENDED TO LEAK INFORMATION
11. 11
REAL WORLD ATTACKS CAUGHT BY NIARA
SCANNING ATTACK
scan servers in the data center to find vulnerable targets
DETECTED WITH AD LOGS
EXFILTRATION OF DATA
upload a large file to a cloud server hosted in a new country never accessed before
DETECTED WITH WEB PROXY LOGS
DATA DOWNLOAD
download data from an internal document repository, which is not typical for the host
DETECTED WITH NETWORK TRAFFIC
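To make the exfiltration example concrete, here is a minimal sketch of a per-user baseline that flags a large upload to a country the user has never contacted before. It is not Niara's detector: the class `UploadBaseline`, the 100 MB threshold and the field names are assumptions chosen for illustration, and a production system would learn the baseline from web proxy logs rather than hard-code a threshold.

```python
from collections import defaultdict

# Assumed threshold for a "large" upload; a real deployment would tune or learn this.
LARGE_UPLOAD_BYTES = 100 * 1024 * 1024


class UploadBaseline:
    """Tracks which destination countries each user has uploaded to (hypothetical helper)."""

    def __init__(self):
        self._seen_countries = defaultdict(set)

    def observe(self, user, country, bytes_out):
        """Return True if this upload looks anomalous, then fold it into the baseline."""
        seen = self._seen_countries[user]
        has_history = bool(seen)          # no alert until the user has some baseline
        is_new_country = country not in seen
        seen.add(country)
        return has_history and is_new_country and bytes_out >= LARGE_UPLOAD_BYTES


baseline = UploadBaseline()
baseline.observe("alice", "US", 2_000_000)             # builds the baseline
print(baseline.observe("alice", "BR", 500_000_000))    # large upload, new country
```

Requiring some history before alerting is a deliberate choice here: without it, every user's first upload would be flagged.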
19. 19
ENTITY SCORING RECURRENT NEURAL NETWORK (RNN)
Input Events: Risk Scores
t1, PHISHING EMAIL INFECTION: 25
t2, SUSPICIOUS C&C DNS TUNNEL: 48
t3, ABNORMAL SERVER ACCESS: 76
t4, LARGE DATA UPLOAD TO NEW COUNTRY: 92
20. 20
ENTITY SCORING RECURRENT NEURAL NETWORK (RNN)
Input Layer (200 x 1)
Input Events, one hot encoding
t1, PHISHING EMAIL INFECTION: [1, 0, 0, 0]
t2, SUSPICIOUS C&C DNS TUNNEL: [0, 1, 0, 0]
t3, ABNORMAL SERVER ACCESS: [0, 0, 1, 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY: [0, 0, 0, 1]
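The one hot encoding on this slide can be sketched as follows. The vector length of 200 comes from the "Input Layer (200 x 1)" label; reading it as one slot per event type is an assumption, and the event names and index assignments in `EVENT_INDEX` are hypothetical.

```python
# Assumed meaning of "Input Layer (200 x 1)": one slot per known event type.
VOCAB_SIZE = 200

# Hypothetical index assignment for the four events shown on the slide.
EVENT_INDEX = {
    "phishing_email_infection": 0,
    "suspicious_cnc_dns_tunnel": 1,
    "abnormal_server_access": 2,
    "large_upload_new_country": 3,
}


def one_hot(event_type, size=VOCAB_SIZE):
    """Return a vector of zeros with a single 1.0 at the event type's index."""
    vector = [0.0] * size
    vector[EVENT_INDEX[event_type]] = 1.0
    return vector


encoded = one_hot("abnormal_server_access")
print(encoded[:4])
```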
21. 21
ENTITY SCORING RECURRENT NEURAL NETWORK (RNN)
Input Layer (200 x 1)
Input Events, one hot time-decayed encoding
t1, PHISHING EMAIL INFECTION: [f(t1), 0, 0, 0]
t2, SUSPICIOUS C&C DNS TUNNEL: [0, f(t2-t1), 0, 0]
t3, ABNORMAL SERVER ACCESS: [0, 0, f(t3-t2), 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY: [0, 0, 0, f(t4-t3)]
22. 22
ENTITY SCORING RECURRENT NEURAL NETWORK (RNN)
Input Layer (200 x 1)
Input Events, one hot time-decayed encoding
t1, PHISHING EMAIL INFECTION: [0.6, 0, 0, 0]
t2, SUSPICIOUS C&C DNS TUNNEL: [0, 0.8, 0, 0]
t3, ABNORMAL SERVER ACCESS: [0, 0, 0.9, 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY: [0, 0, 0, 0.5]
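The time-decayed variant replaces the 1 in each one hot vector with a value f of the gap since the previous event. The slides do not define f, so the sketch below assumes an exponential decay with a hypothetical 24-hour time constant, and treats the start of the sequence as time 0 so that the first event gets f(t1). It will not reproduce the 0.6 / 0.8 / 0.9 / 0.5 values shown above, which come from the authors' own (unspecified) function.

```python
import math


def decay(gap_hours, tau_hours=24.0):
    """Assumed form of f: exponential decay in the time gap (tau is hypothetical)."""
    return math.exp(-gap_hours / tau_hours)


def time_decayed_one_hot(events, size=200):
    """Encode (timestamp_hours, event_index) pairs, sorted by time, as decayed one hot vectors."""
    vectors = []
    previous_time = 0.0  # sequence start, so the first event is weighted by f(t1)
    for timestamp, index in events:
        vector = [0.0] * size
        vector[index] = decay(timestamp - previous_time)
        vectors.append(vector)
        previous_time = timestamp
    return vectors


# Four events with hypothetical timestamps (hours since the sequence started).
sequence = time_decayed_one_hot([(2.0, 0), (10.0, 1), (11.0, 2), (60.0, 3)])
print([round(max(v), 3) for v in sequence])
```

With this choice of f, events that follow closely on the previous one carry more weight than events separated by a long quiet period.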
23. 23
ENTITY SCORING RECURRENT NEURAL NETWORK (RNN)
Input Events, one hot time-decayed encoding
t1, PHISHING EMAIL INFECTION: [f(t1), 0, 0, 0]
t2, SUSPICIOUS C&C DNS TUNNEL: [0, f(t2-t1), 0, 0]
t3, ABNORMAL SERVER ACCESS: [0, 0, f(t3-t2), 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY: [0, 0, 0, f(t4-t3)]
Long Short-Term Memory (LSTM)
Input Layer (200 x 1)
Hidden Layer (64 x 1)
Output Layer (64 x 1)
Score Layer (100 x 1)
Risk Scores: 25, 48, 76, 92
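The data flow on this slide can be sketched as a single LSTM cell followed by a score layer. Only the layer sizes (200, 64, 64, 100) come from the slide. Everything else is an assumption: the hidden and output layers are read as the LSTM cell state and output, the 100-unit score layer is read as a softmax over score buckets 0 to 99 whose expected value is the risk score, biases are omitted, and the weights are random rather than trained, so the printed scores are meaningless beyond showing one score per event.

```python
import math
import random

# Sizes from the slide; what each layer holds is an assumption (see the text above).
INPUT_SIZE, HIDDEN_SIZE, SCORE_BUCKETS = 200, 64, 100


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMScorer:
    """Untrained LSTM cell plus a softmax score layer (illustrative only, no biases)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate, applied to the concatenation [x, h_prev].
        self.gates = {
            name: random_matrix(HIDDEN_SIZE, INPUT_SIZE + HIDDEN_SIZE, rng)
            for name in ("input", "forget", "output", "candidate")
        }
        self.score_weights = random_matrix(SCORE_BUCKETS, HIDDEN_SIZE, rng)

    def step(self, x, h_prev, c_prev):
        """Standard LSTM update: gate the old cell state, add the gated candidate."""
        z = list(x) + list(h_prev)
        i = [sigmoid(a) for a in matvec(self.gates["input"], z)]
        f = [sigmoid(a) for a in matvec(self.gates["forget"], z)]
        o = [sigmoid(a) for a in matvec(self.gates["output"], z)]
        g = [math.tanh(a) for a in matvec(self.gates["candidate"], z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def risk_score(self, h):
        """Softmax over the score buckets, reduced to its expected bucket value."""
        logits = matvec(self.score_weights, h)
        top = max(logits)
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        return sum(bucket * e / total for bucket, e in enumerate(exps))

    def run(self, sequence):
        """Feed events in order; the state carried in h and c makes the score stateful."""
        h = [0.0] * HIDDEN_SIZE
        c = [0.0] * HIDDEN_SIZE
        scores = []
        for x in sequence:
            h, c = self.step(x, h, c)
            scores.append(self.risk_score(h))
        return scores


def encoded_event(index, value):
    vector = [0.0] * INPUT_SIZE
    vector[index] = value
    return vector


# The four time-decayed inputs from the previous slide.
sequence = [encoded_event(0, 0.6), encoded_event(1, 0.8),
            encoded_event(2, 0.9), encoded_event(3, 0.5)]
scores = TinyLSTMScorer().run(sequence)
print([round(s, 1) for s in scores])
```

Because `h` and `c` persist across calls to `step`, each score depends on the whole event history rather than on the latest event alone, which is the property the slide relies on for multi-stage attacks.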
29. 29
DATA PIPELINE BIG DATA ECOSYSTEM
HDFS
Parquet
HBase
Elasticsearch
Packets
Logs
Anomaly Detection
Baseline Profiling
Risk Scoring
Data Pre-processing
Real-Time Detection
Cross-Source Correlation
31. 31
LOCAL CONTEXT HUMAN + MACHINE INTELLIGENCE
Models
Alerts
Reinforcement Learning
Local Context
Input Data
Continuous Learning
User Feedback
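One simple way to picture the feedback loop on this slide is a per-alert-type weight that analyst verdicts nudge up or down. This is a deliberately small stand-in, not a full reinforcement learning algorithm and not the authors' method: the class `FeedbackWeights`, the learning rate and the alert type names are hypothetical.

```python
class FeedbackWeights:
    """Per-alert-type weight moved toward 1.0 on confirmed alerts and 0.0 on dismissals."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.weights = {}

    def record(self, alert_type, confirmed):
        """Fold one analyst verdict into the weight for this alert type."""
        weight = self.weights.get(alert_type, 1.0)  # unseen alert types start at full weight
        target = 1.0 if confirmed else 0.0
        weight += self.learning_rate * (target - weight)
        self.weights[alert_type] = weight
        return weight

    def adjusted_score(self, alert_type, model_score):
        """Scale the model's risk score by what analysts have said about this alert type."""
        return model_score * self.weights.get(alert_type, 1.0)


feedback = FeedbackWeights()
feedback.record("new_country_upload", confirmed=False)   # analyst marks a false positive
feedback.record("new_country_upload", confirmed=False)
print(feedback.adjusted_score("new_country_upload", 80.0))
```

Keeping the weights per deployment is one way local context can enter: an alert type that is routinely benign at one customer is damped there without changing the shared model.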
32. 32
TRAINING DATA GLOBAL + LOCAL INTELLIGENCE
Global Security Intelligence
in the cloud
Local Security Intelligence
Individual customer deployments
CLASSIFIER FEEDBACK