Learn from Satish Dandu, Michael Balint, and Joshua Patterson on how to accelerate anomaly detection and inferencing by using deep learning and GPU data pipelines.
4. 4
DATA PLATFORM-AS-A-SERVICE
• Handles 1M events/second
• Auto-scales the cluster
automatically
SCALE
• Offers HA with no data-loss
• Always-on architecture
• Data replication
HIGH AVAILABILITY
• Data platform security has
been implemented with
VPCs in AWS
• Dashboard access using
NVIDIA LDAP
SECURITY
• Log-to-analytics
• Kibana, JDBC access
• Accessing data using BI tools
SELF SERVICE
10. 10
ANOMALY DETECTION USING DEEP LEARNING
Data
Platform
AI Framework
(Keras +
TensorFlow)
NGC/NGN
GPU
Cluster
NGC/NGN
GPU
ClusterGPU Cloud
Anomaly
Detection
Top Features
Automated Alerts & Dashboards
Early Detection
Self Service
Better accuracy & less noise
11. 11
Raw Dataset
Feature Learning
Algorithm: Recurrent
Neural Network (RNN),
Autoencoders (AE)
Unsupervised Learning:
Multivariate-Gaussian
Supervised Learning:
Logistic Regression
Anomalies: Email alerts,
Dashboards
Time X1 X2
Time X1 X2 X’ X’’
Time X1 X2 X’ X’’ Y
1
0
Anomaly Post-
processing: Univariate
Analysis
Time X1 X2 Y Anomaly
Description
1 X1
0
Anomaly Detection
Feedback
from user
ANOMALY DETECTION FRAMEWORK
12. 12
ANOMALY DETECTION BENEFITS WITH DEEP LEARNING
Top Features
Automated Alerts &
Dashboards
Early Detection
Self Service
Better accuracy & less
noise
With DL Without DL
13. 13
ANOMALY DETECTION TRAINING
Ø Evolution
Ø CPU vs GPU
Ø Learnings :
Ø Manual feature extraction does not scale
Ø Dataset preparation is the long pole
Ø Training on CPU takes longer than data collection rate
V0:
Manual
Feature
Creation
V1:
Automatic
Feature
Creation
using DL
(Theano)
V2: Multi-
GPU support
+ TensorFlow
Serving
(Keras +
TensorFlow)
14. 14
ANOMALY DETECTION INFERENCING
1. Aggregated
Data output for
live data
2. Deep learning
Model
3. AD platform uses
trained model to
inference
anomalies
4. Sends
automated alerts
with dashboards
5. Users can label
alerts
6. Model gets
updated during
next training cycle
Goal : Detect/Inference
any anomalies in near-real
time based on user activity
based on trained models
http://www.sci-news.com/othersciences/linguistics/science-mystery-words-gullivers-
travels-03135.html
15. 15
V1: DATA PREP FOR INFERENCING
Ø Use Case: Detecting anomalies with user’s activity
Ø Inferencing flow from 10k feet
Ø Started with python scripts for windowed aggregation
Live
Streaming
Data
AD Platform
ETL
aggregations
for inferencing
73
103
154
0
50
100
150
200
10 MINS 30 MINS 60 MINS
Python Script Performance
Ø Learnings: Hard to scale for near real time.
AD platform runs inferencing every 3 mins
as we are impacted by speed of data
processing
Live Streaming
Data
AD Platform
16. 16
V2: IMPROVING DATA PREP PERFORMANCE
Ø V2: To improve performance, we started using Presto
with data on S3 in JSON format
Ø Live data will be streamed from Kafka to S3. We use
Presto for our data warehousing needs
Ø Presto is an open-source distributed SQL query engine
optimized for low-latency, ad-hoc analysis of data*
20
4
25
6
30
8
0
5
10
15
20
25
30
35
PRESTO ON JSON PRESTO ON PARQUET
10 mins 30 mins 60 mins
Live Streaming
Data
AD Platform
Ø Learnings: Presto with Parquet has best
performance but we need to batch data at 30
secs interval. So it’s not completely real time
18. 18
GPU ACCELERATION
Accelerate the pipeline, not just deep learning!
Ø GPUs for deep learning = proven
Ø Where else and how else can we use
GPU acceleration?
Ø Dashboards
Ø Accelerating data pipeline
Ø Stream processing
Ø Building better models faster
Ø First: GPU databases
Data Ingestion
Data Processing
Visualization
Model Training
Inferencing
24. 24
DASHBOARD PERFORMANCE
MapD Immerse vs Elastic Kibana
0
100
200
300
1 6 11 16 21 26 31
MapD Immerse (DGX)
MapD Immerse (P2)
Elastic Kibana
x
< 9s
< 12s
Days of Data
TimetoFullyLoad(seconds)
25. 25
Kratos Artificial IntelligenceV3: Data Prep using GPU acceleration
• V3: Explored GPU databases like MapD to improve the performance for querying on streaming live data
• MapD offers constant query response times
• MapD has some SQL limitations. We use Presto as an interface & built a “MapD-> Presto” connector for full ANSI Sql features
Live Streaming
Data
AD Platform
20
4
0.1 1.2
25
6
0.1 1.2
30
8
0.1 1.2
0
5
10
15
20
25
30
35
PRESTO ON JSON PRESTO ON PARQUET MAPD PRESTO + MAPD
GPU Database Performance
10 mins 30 mins 60 mins
ExecutionTime(seconds)
27. 27
EXPAND GPU USAGE
More Data, Less Hardware
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
2008 2010 2012 2014 2016 2017
Peak Double Precision
NVIDIA GPU x86 CPU
TFLOPS
Scaling up and out with GPU
29. 29
GOAI ECOSYSTEM
End To End Data Science
A Python open-source just-in-time
optimizing compiler that uses LLVM to
produce native machine instructions
Dask is a flexible parallel computing
library for analytic computing with dynamic
task scheduling and big data collections.
30. 30
GOAI ECOSYSTEM
End To End Data Science
A CUDA library for graph-processing designed
specifically for the GPU. It uses a high-level,
bulk-synchronous, data-centric abstraction
focused on vertex or edge operations.
Graph Analytics & Visualization
SIEM attack escalation Dropbox external sharing logs
31. 31
BETTER DATA PIPELINES
User Defined Functions at Scale
https://github.com/gpuopenanalytics
libgdf, pygdf, dask_gdf