GPU-Accelerating A Deep Learning Anomaly Detection Platform

2
AGENDA
Ø Data Platform-as–a-Service
Ø Multi-Tenancy
Ø Anomaly Detection Platform
Ø Training & Inferencing Evolution
Ø Performance & Learnings
Ø GPU Acceleration
Ø Dash boarding use case & impetus for GPU-
acceleration
Ø Performance & Learnings
Ø Future & GOAI

4
DATA PLATFORM-AS-A-SERVICE
• Handles 1M events/second
• Auto-scales the cluster
automatically
SCALE
• Offers HA with no data-loss
• Always-on architecture
• Data replication
HIGH AVAILABILITY
• Data platform security has
been implemented with
VPCs in AWS
• Dashboard access using
NVIDIA LDAP
SECURITY
• Log-to-analytics
• Kibana, JDBC access
• Accessing data using BI tools
SELF SERVICE

9
DATA PAAS
Anomaly
Detection
(AD) PaaS
*Images created from quickmeme.com

10
ANOMALY DETECTION USING DEEP LEARNING
Data
Platform
AI Framework
(Keras +
TensorFlow)
NGC/NGN
GPU
Cluster
NGC/NGN
GPU
ClusterGPU Cloud
Anomaly
Detection
Top Features
Automated Alerts & Dashboards
Early Detection
Self Service
Better accuracy & less noise

11
Raw Dataset
Feature Learning
Algorithm: Recurrent
Neural Network (RNN),
Autoencoders (AE)
Unsupervised Learning:
Multivariate-Gaussian
Supervised Learning:
Logistic Regression
Anomalies: Email alerts,
Dashboards
Time X1 X2
Time X1 X2 X’ X’’
Time X1 X2 X’ X’’ Y
1
0
Anomaly Post-
processing: Univariate
Analysis
Time X1 X2 Y Anomaly
Description
1 X1
0
Anomaly Detection
Feedback
from user
ANOMALY DETECTION FRAMEWORK

12
ANOMALY DETECTION BENEFITS WITH DEEP LEARNING
Top Features
Automated Alerts &
Dashboards
Early Detection
Self Service
Better accuracy & less
noise
With DL Without DL

13
ANOMALY DETECTION TRAINING
Ø Evolution
Ø CPU vs GPU
Ø Learnings :
Ø Manual feature extraction does not scale
Ø Dataset preparation is the long pole
Ø Training on CPU takes longer than data collection rate
V0:
Manual
Feature
Creation
V1:
Automatic
Feature
Creation
using DL
(Theano)
V2: Multi-
GPU support
+ TensorFlow
Serving
(Keras +
TensorFlow)

14
ANOMALY DETECTION INFERENCING
1. Aggregated
Data output for
live data
2. Deep learning
Model
3. AD platform uses
trained model to
inference
anomalies
4. Sends
automated alerts
with dashboards
5. Users can label
alerts
6. Model gets
updated during
next training cycle
Goal : Detect/Inference
any anomalies in near-real
time based on user activity
based on trained models
http://www.sci-news.com/othersciences/linguistics/science-mystery-words-gullivers-
travels-03135.html

15
V1: DATA PREP FOR INFERENCING
Ø Use Case: Detecting anomalies with user’s activity
Ø Inferencing flow from 10k feet
Ø Started with python scripts for windowed aggregation
Live
Streaming
Data
AD Platform
ETL
aggregations
for inferencing
73
103
154
0
50
100
150
200
10 MINS 30 MINS 60 MINS
Python Script Performance
Ø Learnings: Hard to scale for near real time.
AD platform runs inferencing every 3 mins
as we are impacted by speed of data
processing
Live Streaming
Data
AD Platform

16
V2: IMPROVING DATA PREP PERFORMANCE
Ø V2: To improve performance, we started using Presto
with data on S3 in JSON format
Ø Live data will be streamed from Kafka to S3. We use
Presto for our data warehousing needs
Ø Presto is an open-source distributed SQL query engine
optimized for low-latency, ad-hoc analysis of data*
20
4
25
6
30
8
0
5
10
15
20
25
30
35
PRESTO ON JSON PRESTO ON PARQUET
10 mins 30 mins 60 mins
Live Streaming
Data
AD Platform
Ø Learnings: Presto with Parquet has best
performance but we need to batch data at 30
secs interval. So it’s not completely real time

18
GPU ACCELERATION
Accelerate the pipeline, not just deep learning!
Ø GPUs for deep learning = proven
Ø Where else and how else can we use
GPU acceleration?
Ø Dashboards
Ø Accelerating data pipeline
Ø Stream processing
Ø Building better models faster
Ø First: GPU databases
Data Ingestion
Data Processing
Visualization
Model Training
Inferencing

19
21 30
1560
80 99
1250
150
269
2250
372
696
2970
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
MapD DGX-1 MapD 4 x P100 Redshift 6-node Spark 11-node
Query 1 Query 2 Query 3 Query 4
1+ BILLION TAXI RIDES BENCHMARK
TimeinMilliseconds
Source: MapD Benchmarks on DGX from internal NVIDIA testing following guidelines of
Mark Litwintschik’s blogs: Redshift, 6-node ds2.8xlarge cluster & Spark 2.1, 11 x m3.xlarge cluster w/ HDFS @marklit82
10190 8134 19624 85942

20
MAPD + IMMERSE VS ELASTIC + KIBANA
Elastic + Kibana
• Fantastic for complex search
• Scales easily (up to a point)
• Indexing consumes more storage (~4-6x)
• Kibana for KPI dashboarding?
MapD Core
• Very fast OLAP queries
• JIT LLVM query compiler
• GPUs for compute
• CPUs for parse + ingest
• Limited join support (for now)
• Concurrency?
Immerse
• c3/d3 + crossfilter = nice dashboards
• Backend rendering

22
ARCHITECTURE
V2 (with MapD)

23
MAPD VS KIBANA
Dashboards Comparison + Performance Test Method

24
DASHBOARD PERFORMANCE
MapD Immerse vs Elastic Kibana
0
100
200
300
1 6 11 16 21 26 31
MapD Immerse (DGX)
MapD Immerse (P2)
Elastic Kibana
x
< 9s
< 12s
Days of Data
TimetoFullyLoad(seconds)

25
Kratos Artificial IntelligenceV3: Data Prep using GPU acceleration
• V3: Explored GPU databases like MapD to improve the performance for querying on streaming live data
• MapD offers constant query response times
• MapD has some SQL limitations. We use Presto as an interface & built a “MapD-> Presto” connector for full ANSI Sql features
Live Streaming
Data
AD Platform
20
4
0.1 1.2
25
6
0.1 1.2
30
8
0.1 1.2
0
5
10
15
20
25
30
35
PRESTO ON JSON PRESTO ON PARQUET MAPD PRESTO + MAPD
GPU Database Performance
10 mins 30 mins 60 mins
ExecutionTime(seconds)

27
EXPAND GPU USAGE
More Data, Less Hardware
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
2008 2010 2012 2014 2016 2017
Peak Double Precision
NVIDIA GPU x86 CPU
TFLOPS
Scaling up and out with GPU

28
EXPAND GPU USAGE
Internal Logs to Cyber Security
Cyber Security Analytics Platform

29
GOAI ECOSYSTEM
End To End Data Science
A Python open-source just-in-time
optimizing compiler that uses LLVM to
produce native machine instructions
Dask is a flexible parallel computing
library for analytic computing with dynamic
task scheduling and big data collections.

30
GOAI ECOSYSTEM
End To End Data Science
A CUDA library for graph-processing designed
specifically for the GPU. It uses a high-level,
bulk-synchronous, data-centric abstraction
focused on vertex or edge operations.
Graph Analytics & Visualization
SIEM attack escalation Dropbox external sharing logs

31
BETTER DATA PIPELINES
User Defined Functions at Scale
https://github.com/gpuopenanalytics
libgdf, pygdf, dask_gdf

32
HIVE to BlazingDB

33
More Models
nvGRAPH
https://github.com/h2oai/h2o4gpu
# edges = E * 2^S ~34M

34
JOIN THE REVOLUTION
Everyone Can Help!
APACHE ARROW APACHE PARQUET GPU Open Analytics
Initiative
https://arrow.apache.org/
@ApacheArrow
https://parquet.apache.org/
@ApacheParquet
http://gpuopenanalytics.com/
@Gpuoai
Integrations, feedback, documentation support, pull requests, new issues, or donations welcomed!

35
THANK
YOU!
NVIDIA Team
Ed Clune
Ayush Jaiswal
Keith Kraus
Rohit Kulkarni
Jarod Maupin
Nixs Nataraja
Rohan Somvanshi
Michael Wendt

GPU-Accelerating A Deep Learning Anomaly Detection Platform

GPU-Accelerating A Deep Learning Anomaly Detection Platform

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to GPU-Accelerating A Deep Learning Anomaly Detection Platform

Similar to GPU-Accelerating A Deep Learning Anomaly Detection Platform (20)

More from NVIDIA

More from NVIDIA (20)

Recently uploaded

Recently uploaded (20)

GPU-Accelerating A Deep Learning Anomaly Detection Platform