Deep Learning in Security:
Examples, Infrastructure, Challenges and Suggestions
Jisheng Wang, Shirley Wu
June 13, 2017
2
➢ Jisheng Wang, Senior Director of Data Science, CTO Office, Aruba / HPE
  • Over 12 years of experience: Machine Learning + Big Data => Security
  • Ph.D. @ Penn State, Chief Scientist @ Niara, Tech Lead @ Cisco
➢ Shirley Wu, Data Architect, Aruba / HPE
  • Architect, big data infrastructure @ Niara
  • Manager, big data and analytics @ Nice Systems
➢ Niara, a Hewlett Packard Enterprise company
  • Re-invented enterprise security via User and Entity Behavior Analytics (UEBA)
  • Acquired by Aruba, a Hewlett Packard Enterprise company, in February 2017
US, NIARA, ARUBA / HPE
3
USER & ENTITY BEHAVIOR ANALYTICS (UEBA)
• UEBA SECURITY: why this matters (YOU ARE HERE)
• USE CASES: how to detect malicious insiders
• INFRASTRUCTURE: how to build big data infrastructure
• CHALLENGES: how to build an enterprise solution
4
PROBLEM: THE SECURITY GAP
[Chart: SECURITY SPEND, prevention & detection (US $B)]
[Chart: DATA BREACHES, # breaches]
5
PROBLEM: CAUSE OF THE GAP
ATTACKERS: ARE QUICKLY INNOVATING & ADAPTING
BATTLEFIELD: WITH IOT AND CLOUD, SECURITY IS BORDERLESS
6
PROBLEM: ADDRESSING THE CAUSE
ATTACKERS: ARE QUICKLY INNOVATING & ADAPTING
DEEP LEARNING: SOLUTIONS MUST BE RESPONSIVE TO CHANGES
7
PROBLEM: ADDRESSING THE CAUSE
BATTLEFIELD: WITH IOT AND CLOUD, SECURITY IS BORDERLESS
INSIDER BEHAVIOR: LOOK AT BEHAVIOR CHANGE OF INSIDE USERS AND MACHINES
8
USER & ENTITY BEHAVIOR ANALYTICS (UEBA)
MACHINE LEARNING DRIVEN BEHAVIOR ANALYTICS IS A NEW WAY TO COMBAT ATTACKERS
1. Machine driven, not only human driven
2. Detect compromised users, not only attackers
3. Post-infection detection, not only prevention
9
REAL WORLD NEWSWORTHY EXAMPLES
COMPROMISED: 40 million credit cards were stolen from Target’s servers (STOLEN CREDENTIALS)
NEGLIGENT: DDoS attack from 10M+ hacked home devices took down major websites (ALL USED THE SAME PASSWORD)
MALICIOUS: Edward Snowden stole more than 1.7 million classified documents (INTENDED TO LEAK INFORMATION)
10
USER & ENTITY BEHAVIOR ANALYTICS
USE CASES: how to detect malicious insiders (YOU ARE HERE)
11
REAL WORLD ATTACKS CAUGHT BY NIARA
SCANNING ATTACK: scan servers in the data center to find vulnerable targets (DETECTED WITH AD LOGS)
EXFILTRATION OF DATA: upload a large file to a cloud server hosted in a new country never accessed before (DETECTED WITH WEB PROXY LOGS)
DATA DOWNLOAD: download data from an internal document repository, which is not typical for the host (DETECTED WITH NETWORK TRAFFIC)
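The exfiltration case above (a large upload to a country the user has never accessed) can be illustrated with a very small rule-based check. This is only a sketch and not Niara's detection logic: the record layout `(user, country, bytes_uploaded)`, the function name `find_new_country_uploads`, and the 100 MB cutoff are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed cutoff for a "large" upload; the slides do not give a threshold.
LARGE_UPLOAD_BYTES = 100 * 1024 * 1024


def find_new_country_uploads(history, new_events, min_bytes=LARGE_UPLOAD_BYTES):
    """Flag large uploads to a country the user has never reached before.

    Both arguments are iterables of (user, country, bytes_uploaded) tuples,
    a hypothetical shape for records parsed from web proxy logs.
    """
    seen = defaultdict(set)
    for user, country, _ in history:
        seen[user].add(country)

    alerts = []
    for user, country, nbytes in new_events:
        if country not in seen[user] and nbytes >= min_bytes:
            alerts.append((user, country, nbytes))
        seen[user].add(country)  # the per-user baseline keeps learning
    return alerts


history = [("alice", "US", 5_000), ("alice", "DE", 12_000), ("bob", "US", 700)]
new_events = [
    ("alice", "US", 900_000_000),  # large, but a familiar country
    ("alice", "BR", 250_000_000),  # large and never seen before: alert
    ("bob", "FR", 2_000),          # new country, but a tiny upload
]
alerts = find_new_country_uploads(history, new_events)
print(alerts)
```

A fixed rule like this is brittle, which is part of the motivation for the learned behavior models on the following slides.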
12
BEHAVIOR ENCODING: USERS
[Figure panels: User 1, User 2]
13
BEHAVIOR ENCODING: USER VS MACHINE
[Figure panels: User, Machine]
14
ANOMALY DETECTION: CONVOLUTIONAL NEURAL NETWORK (CNN)
Feature Extraction:
Behavior Image (24x60x9)
-> 8x20 Convolution -> Feature Maps (24x60x40)
-> 2x2 Pooling -> Feature Maps (12x30x40)
-> 4x10 Convolution -> Feature Maps (12x30x80)
-> 2x2 Pooling -> Feature Maps (6x15x80)
Classification:
-> Fully Connected -> 1024 Nodes
-> Fully Connected with Dropout -> Output Layer (User Labels)
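The layer sizes on this slide can be checked with a little shape arithmetic. The sketch below assumes stride-1 convolutions with "same" padding (the only setting consistent with 24x60 being preserved across each convolution) and non-overlapping 2x2 pooling. The helper names `conv_same` and `pool` are made up for this example, and the parameter counts assume one bias per filter or node, which the slide does not state.

```python
def conv_same(shape, kernel, filters):
    """Output shape and parameter count of a stride-1, 'same'-padded convolution."""
    h, w, c = shape
    kh, kw = kernel
    params = kh * kw * c * filters + filters  # weights plus one bias per filter
    return (h, w, filters), params


def pool(shape, size=2):
    """Output shape of non-overlapping size x size pooling."""
    h, w, c = shape
    return (h // size, w // size, c)


shape = (24, 60, 9)  # behavior image from the slide
shapes = [shape]

shape, conv1_params = conv_same(shape, (8, 20), 40)
shapes.append(shape)
shape = pool(shape)
shapes.append(shape)
shape, conv2_params = conv_same(shape, (4, 10), 80)
shapes.append(shape)
shape = pool(shape)
shapes.append(shape)

flat = shape[0] * shape[1] * shape[2]  # inputs to the first fully connected layer
fc_params = flat * 1024 + 1024         # weights plus biases for 1024 nodes

for s in shapes:
    print(s)
print(flat, conv1_params, conv2_params, fc_params)
```

Under these assumptions almost all of the weights sit in the first fully connected layer, which is one reason dropout is usually applied there.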
15
BEHAVIOR ANOMALY USER | EXFILTRATION
User – Before Compromise User – Post Compromise
16
BEHAVIOR ANOMALY IOT DEVICE | DATA DOWNLOAD
Dropcam – Before Compromise Dropcam – Post Compromise
17
BEHAVIOR ANALYTICS: MULTI-DIMENSIONAL
Behavioral Analytics
• Internal Resource Access: finance servers
• Authentication: AD logins
• Remote Access: VPN logins
• External Activity: C&C, personal email
• SaaS Activity: Office 365, Box
• Cloud IaaS: AWS, Azure
• Physical Access: badge logs
• Exfiltration: DLP, email
18
ENTITY SCORING TEMPORAL SEQUENCE TRACKING
19
ENTITY SCORING: RECURRENT NEURAL NETWORK (RNN)
Input Events -> Risk Scores
t1, PHISHING EMAIL INFECTION -> 25
t2, SUSPICIOUS C&C DNS TUNNEL -> 48
t3, ABNORMAL SERVER ACCESS -> 76
t4, LARGE DATA UPLOAD TO NEW COUNTRY -> 92
20
ENTITY SCORING: RECURRENT NEURAL NETWORK (RNN)
Input Events -> Input Layer (200 x 1), one-hot encoding
t1, PHISHING EMAIL INFECTION -> [1 0 0 0]
t2, SUSPICIOUS C&C DNS TUNNEL -> [0 1 0 0]
t3, ABNORMAL SERVER ACCESS -> [0 0 1 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY -> [0 0 0 1]
21
ENTITY SCORING: RECURRENT NEURAL NETWORK (RNN)
Input Events -> Input Layer (200 x 1), one-hot time-decayed encoding
t1, PHISHING EMAIL INFECTION -> [f(t1) 0 0 0]
t2, SUSPICIOUS C&C DNS TUNNEL -> [0 f(t2-t1) 0 0]
t3, ABNORMAL SERVER ACCESS -> [0 0 f(t3-t2) 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY -> [0 0 0 f(t4-t3)]
22
ENTITY SCORING: RECURRENT NEURAL NETWORK (RNN)
Input Events -> Input Layer (200 x 1), one-hot time-decayed encoding
t1, PHISHING EMAIL INFECTION -> [0.6 0 0 0]
t2, SUSPICIOUS C&C DNS TUNNEL -> [0 0.8 0 0]
t3, ABNORMAL SERVER ACCESS -> [0 0 0.9 0]
t4, LARGE DATA UPLOAD TO NEW COUNTRY -> [0 0 0 0.5]
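The time-decayed one-hot encoding can be sketched as follows. The slides do not define f, so the exponential decay used here (and its time constant `tau`) is purely an assumption, as are the event timestamps and the mapping of event types to indices 0 to 3. The vector length of 200 matches the input layer on the slide.

```python
import math

NUM_EVENT_TYPES = 200  # matches the 200 x 1 input layer on the slide


def decay(dt, tau=3600.0):
    """Hypothetical f(dt): the weight shrinks as the gap since the last event grows."""
    return math.exp(-dt / tau)


def encode(events, num_types=NUM_EVENT_TYPES):
    """Turn [(timestamp_seconds, event_type_index), ...] into time-decayed one-hot vectors.

    The first event is weighted by f(t1); each later event by f(t_k - t_{k-1}).
    """
    vectors = []
    prev_t = 0.0
    for t, idx in events:
        vec = [0.0] * num_types
        vec[idx] = decay(t - prev_t)  # a plain one-hot encoding would put 1.0 here
        vectors.append(vec)
        prev_t = t
    return vectors


# Four events in the order shown on the slide; timestamps are invented.
events = [(600.0, 0), (1500.0, 1), (1800.0, 2), (9000.0, 3)]
vectors = encode(events)
for (t, idx), vec in zip(events, vectors):
    print(idx, round(vec[idx], 3))
```

With this choice of f, events that follow each other closely get weights near 1, while an event after a long quiet period gets a small weight; other decay functions would fit the slide equally well.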
23
ENTITY SCORING: RECURRENT NEURAL NETWORK (RNN)
Long Short-Term Memory (LSTM)
Input Layer (200 x 1) -> Hidden Layer (64 x 1) -> Output Layer (64 x 1) -> Score Layer (100 x 1)
Input Events -> one-hot time-decayed encoding -> Risk Scores
t1, PHISHING EMAIL INFECTION -> [f(t1) 0 0 0] -> 25
t2, SUSPICIOUS C&C DNS TUNNEL -> [0 f(t2-t1) 0 0] -> 48
t3, ABNORMAL SERVER ACCESS -> [0 0 f(t3-t2) 0] -> 76
t4, LARGE DATA UPLOAD TO NEW COUNTRY -> [0 0 0 f(t4-t3)] -> 92
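To make the data flow through this network concrete, here is a minimal, untrained LSTM forward pass in plain Python with the layer sizes from the slide (200 inputs, 64 hidden units, 100 score units). Several things are assumptions rather than details from the talk: the weights are random, biases are omitted, the 64-unit hidden state doubles as the output layer, and the 100-unit score layer is read as a softmax over risk levels 0 to 99 whose expected value is reported as the risk score.

```python
import math
import random

N_IN, N_HID, N_SCORE = 200, 64, 100  # layer sizes from the slide


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinyLSTM:
    """Untrained LSTM cell plus a softmax score layer (random weights, no biases)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate, acting on [input ; previous hidden state].
        self.gates = {g: rand_matrix(N_HID, N_IN + N_HID, rng) for g in "ifog"}
        self.score = rand_matrix(N_SCORE, N_HID, rng)

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.gates["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.gates["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.gates["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.gates["g"], z)]  # candidate state
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def risk_scores(self, sequence):
        h, c = [0.0] * N_HID, [0.0] * N_HID
        scores = []
        for x in sequence:
            h, c = self.step(x, h, c)
            logits = matvec(self.score, h)
            top = max(logits)
            exps = [math.exp(v - top) for v in logits]
            total = sum(exps)
            # Expected risk level under the softmax over levels 0..99.
            scores.append(sum(k * e for k, e in enumerate(exps)) / total)
        return scores


def one_hot(idx, value=1.0):
    vec = [0.0] * N_IN
    vec[idx] = value
    return vec


# The four time-decayed inputs from the earlier slide.
sequence = [one_hot(0, 0.6), one_hot(1, 0.8), one_hot(2, 0.9), one_hot(3, 0.5)]
scores = TinyLSTM().risk_scores(sequence)
print([round(s, 1) for s in scores])
```

Because the weights are random, the printed scores carry no meaning. Only after training on labeled event sequences would such a model be expected to produce an escalating pattern like the 25, 48, 76, 92 shown on the slide.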
24
USER & ENTITY BEHAVIOR ANALYTICS
INFRASTRUCTURE: how to build big data infrastructure (YOU ARE HERE)
25
DATA PIPELINE: ARCHITECTURE
Packets, Logs
Streaming ETL: Data Pre-processing, Real-Time Detection, Cross-Source Correlation
Batch Analytics: Baseline Profiling, Anomaly Detection, Risk Scoring
HDFS, Parquet, HBase, Elasticsearch
26
DEPLOYMENT OPTIONS: ON-PREMISES & CLOUD
On Premises | Private Cloud | Public Cloud
27
DEPLOYMENT STRATEGIES: DISTRIBUTED TENSORFLOW
Model
Worker (Chief), Worker, Worker
Parameter Server, Parameter Server
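The worker / parameter server split drawn on this slide can be illustrated without TensorFlow. The toy simulation below is not TensorFlow code: it uses a single in-process `ParameterServer` class (a name made up for this sketch), three workers that take turns instead of running in parallel, and a two-weight linear model on synthetic data. It only shows the pull, compute, push cycle of data-parallel training.

```python
import random


class ParameterServer:
    """Holds the shared model weights and applies gradients pushed by workers."""

    def __init__(self, n_weights, lr=0.05):
        self.weights = [0.0] * n_weights
        self.lr = lr

    def pull(self):
        return list(self.weights)

    def push(self, grads):
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]


def worker_gradient(weights, shard):
    """Mean-squared-error gradient of y = w0 + w1 * x on one worker's data shard."""
    g0 = g1 = 0.0
    for x, y in shard:
        err = weights[0] + weights[1] * x - y
        g0 += 2 * err / len(shard)
        g1 += 2 * err * x / len(shard)
    return [g0, g1]


def mse(weights, data):
    return sum((weights[0] + weights[1] * x - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
# Synthetic data generated from y = 1 + 2x, so the ideal weights are [1, 2].
data = [(x, 1.0 + 2.0 * x) for x in (rng.uniform(-1, 1) for _ in range(300))]
shards = [data[i::3] for i in range(3)]  # one shard per worker

ps = ParameterServer(n_weights=2)
loss_before = mse(ps.pull(), data)
for step in range(200):
    for shard in shards:  # workers take turns here; real workers run concurrently
        grads = worker_gradient(ps.pull(), shard)
        ps.push(grads)
loss_after = mse(ps.pull(), data)
print(loss_before, loss_after, ps.pull())
```

The slide shows two parameter servers; splitting the weights across several servers, and the chief worker's extra duties such as initialization and checkpointing, are left out of this sketch.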
28
DEPLOYMENT STRATEGIES: TENSORFLOW ON SPARK
Model
Edge Node: Spark Driver
Spark Executor: Parameter
Spark Executor: Parameter
Spark Executor: Worker
Spark Executor: Worker (C)
Spark Executor: Worker
29
DATA PIPELINE: BIG DATA ECOSYSTEM
Packets, Logs
Data Pre-processing, Real-Time Detection, Cross-Source Correlation
Baseline Profiling, Anomaly Detection, Risk Scoring
HDFS, Parquet, HBase, Elasticsearch
30
USER & ENTITY BEHAVIOR ANALYTICS
CHALLENGES: how to build an enterprise solution (YOU ARE HERE)
31
LOCAL CONTEXT: HUMAN + MACHINE INTELLIGENCE
Input Data
Models
Alerts
User Feedback
Local Context
Reinforcement Learning
Continuous Learning
32
TRAINING DATA: GLOBAL + LOCAL INTELLIGENCE
Global Security Intelligence: in the cloud
Local Security Intelligence: individual customer deployments
CLASSIFIER FEEDBACK
33
USER & ENTITY BEHAVIOR ANALYTICS
• UEBA SECURITY: why this matters
• USE CASES: how to detect malicious insiders
• INFRASTRUCTURE: how to build big data infrastructure
• CHALLENGES: how to build an enterprise solution
Thank You
