HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS
Mark Brooks - Principal System Engineer @ Kinetica
May 09, 2017
The Challenge:
How to maintain analytic performance while dealing with:
• Larger data volumes
• Streaming data with minimal end-to-end latency
• Ad-hoc drill down (you can't pre-aggregate everything)
Architectural	and	Design	Approaches
1. One	database	to	rule	them	all
2. SQL	on	Hadoop	(or	directly	on	the	Data	Lake)
3. Data	Lake	+	NoSQL	+	Spark	+	Search	+	Cache	+…
4. Lambda	Architecture
5. Kappa	Architecture
6. Next	generation	hardware	acceleration
One	Database	To	Rule	Them	All
SQL	on	a	Data	Lake
Credit:		https://www.slideshare.net/Bigdatapump/sql-on-hadoop-49494494
Hadoop	+	NoSQL	+	Search	+	Memory	Cache	+…
Credit:		Matt	Turck - https://www.slideshare.net/mjft01/big-data-landscape-matt-turck-may-2014
Lambda	Architecture
Credit:	 Nathan	Marz http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
James	Kinley					http://jameskinley.tumblr.com/tagged/Lambda
Kappa	Architecture
Credit:			Jay	Kreps			https://www.oreilly.com/ideas/questioning-the-lambda-architecture					
"Stream processing systems already have a notion of parallelism; why not just handle reprocessing by increasing the parallelism and replaying history very, very fast?"
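Kreps' replay idea can be reduced to a few lines: derived state is just a function of the immutable event log, so reprocessing means running the (possibly updated) logic over the log again. A minimal sketch with illustrative event and function names, not any real streaming framework:

```python
# Sketch of the Kappa idea: state is rebuilt by replaying the immutable
# event log through the (possibly updated) stream-processing logic.

def replay(log, process, initial_state):
    """Rebuild derived state by running every event back through `process`."""
    state = initial_state
    for event in log:
        state = process(state, event)
    return state

# v1 logic: count all events per user
def count_v1(state, event):
    state[event["user"]] = state.get(event["user"], 0) + 1
    return state

# v2 logic (a "reprocessing" deploy): count only purchases
def count_v2(state, event):
    if event["type"] == "purchase":
        state[event["user"]] = state.get(event["user"], 0) + 1
    return state

log = [
    {"user": "a", "type": "view"},
    {"user": "a", "type": "purchase"},
    {"user": "b", "type": "purchase"},
]

v1 = replay(log, count_v1, {})   # {'a': 2, 'b': 1}
v2 = replay(log, count_v2, {})   # {'a': 1, 'b': 1}
```

Deploying new logic means starting a second replay job, pointing it at offset zero of the log, and cutting over once it catches up — no separate batch layer required.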
Next	Generation	Hardware	Acceleration
Consider	a	system	with	these	characteristics:
• Horizontally	Scalable
• Low	end-to-end	latency
• Powerful	enough	to	not	require	pre-aggregation
This	is	now	possible…
GPU	Accelerated	Compute
DATA WAREHOUSE
RDBMS & data warehouse technologies enable organizations to store and analyze growing volumes of data on high-performance machines, but at high cost.
DISTRIBUTED STORAGE
Hadoop and MapReduce enable distributed storage and processing across multiple machines. Storing massive volumes of data becomes more affordable, but performance is slow.
AFFORDABLE MEMORY
Affordable memory allows for faster data reads and writes. HANA, MemSQL, & Exadata provide faster analytics.
GPU ACCELERATED COMPUTE
GPU cores bulk-process tasks in parallel — far more efficient for many data-intensive tasks than CPUs, which process those tasks sequentially.
Timeline: 1990s–2000s → 2005… → 2010… → 2017…. At scale, processing becomes the bottleneck.
Kinetica:	Core
ANALYTICS	DATABASE	ACCELERATED	BY	GPUs
[Diagram: Kinetica — a GPU-accelerated, columnar in-memory database running on commodity hardware with GPUs; an HTTP head node fronts data sharded across memory (A1–C4) and persisted to disk]
• Columnar in-memory database
• Data is presented much like a traditional RDBMS: rows and columns
• Data held in memory; persisted to disk
• Interact with Kinetica through its native REST API, Java, Python, JavaScript, Node.js, C++, SQL, etc., as well as through various connectors
• Native GIS & IP address object support
• VERY FAST: ideal for OLAP workloads
• Typical hardware setup: 256GB–1TB memory with 2–4 GPUs per node
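The columnar layout is what makes the OLAP claim work: an aggregate touches only the columns it names, stored contiguously — exactly the access pattern GPUs scan well. A toy model of the contrast, not Kinetica's actual storage engine:

```python
# Illustrative contrast between row and columnar layouts for an
# OLAP-style aggregate. Toy model only, not Kinetica's storage engine.

# Row layout: each record is one dict; a scan drags every field along.
rows = [
    {"id": 1, "region": "east", "sales": 100.0},
    {"id": 2, "region": "west", "sales": 250.0},
    {"id": 3, "region": "east", "sales": 175.0},
]

# Columnar layout: one contiguous list per column.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100.0, 250.0, 175.0],
}

# SUM(sales): a columnar engine scans only the single column it needs.
total = sum(columns["sales"])                        # 525.0

# SUM(sales) GROUP BY region, over the same two columns.
by_region = {}
for region, sales in zip(columns["region"], columns["sales"]):
    by_region[region] = by_region.get(region, 0.0) + sales
# {'east': 275.0, 'west': 250.0}
```

On a GPU the per-column scan above becomes a data-parallel reduction over a contiguous array, which is where the aggregation speedups cited later in the deck come from.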
Multi-Head	Ingest	and	Scale-Out	Architecture
[Diagram: on-demand scale-out — multiple commodity nodes with GPUs, each with its own HTTP head node, columnar in-memory store (A1–C4), and disk, all accepting multi-head ingest in parallel]
Real-Time Data Handlers for Structured & Unstructured Data
APIs:
• REST API
• Java API
• JavaScript API
• Node.js API
• Python API
• C++ API
OPEN SOURCE INTEGRATION:
• Apache NiFi
• Apache Kafka
• Apache Spark
• Apache Storm
GEOSPATIAL CAPABILITIES:
• Geometric Objects
• Tracks
• Geospatial Endpoints
• WMS
• WKT
VISUALIZATION via ODBC/JDBC
On-Demand Scale
[Diagram: Kinetica cluster scaling out on demand — additional commodity nodes with GPUs, each with an HTTP head node, columnar in-memory store, and disk]
OTHER INTEGRATION:
• Message Queues
• ETL Tools
• Streaming Tools
Parallel	Ingest	Provides	High	Performance	Streaming
[Diagram: parallel ingest spread across multiple nodes, each 1TB memory / 2 GPUs]
Each node of the system can share the task of data ingest, providing greater and faster throughput. Ingest can be made faster simply by adding more nodes.
No compute is used on ingest!
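From the client side, multi-head ingest amounts to spreading record batches across every ingest-capable node rather than funneling them through one head node. A minimal sketch with hypothetical node addresses and batching — not Kinetica's actual client API:

```python
# Sketch of multi-head ingest: spread fixed-size batches round-robin
# across all ingest-capable nodes. Node addresses are hypothetical.
from itertools import cycle

def multi_head_ingest(records, nodes, batch_size=2):
    """Assign each batch to the next node in round-robin order."""
    sent = {node: [] for node in nodes}
    targets = cycle(nodes)
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        sent[next(targets)].append(batch)   # in reality: an HTTP insert call
    return sent

nodes = ["node1:9191", "node2:9191", "node3:9191"]
records = list(range(12))                   # 12 records -> 6 batches
placement = multi_head_ingest(records, nodes)

# Each of the 3 nodes receives 2 of the 6 batches; adding a node
# immediately raises aggregate ingest throughput.
assert all(len(batches) == 2 for batches in placement.values())
```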
Speed	Layer	for	the	Data	Lake
• Parallel ingestion of events
• Kinetica is the speed layer, with real-time analytic capabilities
• HDFS for archival store
• Much looser coupling than the traditional Lambda architecture
• Batch-mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from the data lake
[Diagram: events flow from message brokers (e.g. Amazon Kinesis) and stream processing through Kinetica connectors via parallel ingestion; Kinetica serves put/get/scan and on-the-fly complex analytics to analysts, mobile users, dashboards & applications, and alerting systems; HDFS / AWS S3 / GCS / Azure Data Lake provides the archival store]
Real-Time,	Advanced	Analytics,	Speed	Layer	for	Teradata	or	Oracle
• Parallel ingestion of events
• Lambda-type architecture for Teradata or Oracle
• Kinetica is the speed layer, with near-real-time analytic capabilities
• Converge machine learning, streaming, and location analytics with fast query and analytics, using Kinetica together with the RDBMS
[Diagram: data in motion and at rest — the data warehouse / transactional systems and stream / ETL processing (e.g. Amazon Kinesis) feed Kinetica connectors into the fast, GPU-accelerated in-memory database, converging ML, AI, and streaming; results serve analysts, mobile users, dashboards & applications, and alerting systems]
Advanced	In-Database	Analytics
1. User-defined functions (UDFs) can receive table data, do arbitrary computations, and save output to a separate table, all in a distributed manner.
2. UDFs have direct access to CUDA APIs — enabling compute-to-grid analytics for logic deployed within Kinetica.
3. Works with custom or packaged code. Opens the way for machine learning / artificial intelligence libraries such as TensorFlow, BIDMach, Caffe, and Torch to work on data directly within Kinetica.
4. Available now with C++ & Java bindings.
[Diagram: orchestration layer with UDFs — on each of n Kinetica servers (physical or virtual), a proc server hosts UDF_A, UDF_B, … UDF_n with access to local table data, the GPU, and CUDA libraries; UDFs are exposed from RESTful endpoints (e.g. /exec/proc/UDF_A/) and results are returned to an output table for further analysis]
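The orchestration pattern above can be sketched in a few lines: the same UDF runs against each shard's local data, and the per-shard outputs are unioned into an output table. Purely conceptual Python — real Kinetica UDFs use the C++ or Java proc bindings:

```python
# Conceptual sketch of distributed UDF execution: one function, applied
# per shard against local data, results unioned into an output table.
# Names are illustrative; real Kinetica UDFs use C++/Java bindings.

def run_udf(shards, udf):
    """Apply `udf` to each shard's local rows; union per-shard outputs."""
    output_table = []
    for shard_data in shards:              # in reality: parallel, per node
        output_table.extend(udf(shard_data))
    return output_table

# A toy UDF: mean-center a numeric column, computed per shard.
def mean_center(rows):
    mean = sum(rows) / len(rows)
    return [round(x - mean, 3) for x in rows]

shards = [[1.0, 2.0, 3.0], [10.0, 20.0]]   # data as distributed over nodes
result = run_udf(shards, mean_center)      # [-1.0, 0.0, 1.0, -5.0, 5.0]
```

The key property is that the computation moves to the data: each invocation only ever sees the rows already resident on its node (and, in Kinetica's case, already in GPU-accessible memory).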
Kinetica	Architecture
[Diagram: Kinetica architecture — streaming data and ERP / CRM / transactional data enter via ETL / stream processing and parallel ingest into an on-demand scale-out cluster (1TB memory / 2 GPU cards per node); in-database processing runs custom logic, UDFs, and ML libraries such as BIDMach; SQL, native APIs, geospatial WMS, and custom connectors serve BI dashboards, BI / GIS apps, custom geospatial apps, and Kinetica 'Reveal']
AI	&	BI	on	One	GPU-Accelerated	Database					
[Diagram: one high-performance analytics database hosting UDFs; business users reach it through ODBC / JDBC, the native REST API, and WMS for business intelligence, custom applications, and high-fidelity geospatial; data scientists / developers use the pipeline for machine learning & deep learning, GPU-accelerated data science, and predictive models (e.g. risk management, sales volume, fraud) via BIDMach and SQL]
50-100x Faster on Queries with Large Datasets
• A large retailer tested complex SQL queries on 3 years of retail data (150bn rows)
• 10-node Kinetica cluster against a 30TB+ cluster from the next-best alternative
• The GPU is able to perform many instructions in parallel, yielding huge performance gains on aggregations, GROUP BYs, joins, etc.
• Kinetica sustained ingest of 1.3bn objects/minute with 70 attributes per row
WHEN	COMPARED	TO	LEADING	IN-MEMORY	ALTERNATIVES
[Chart: relative query times for SUM (Q1), GROUP BY (Q5), and SELECT (Q10) — Kinetica vs. a leading in-memory DB]
More	Details
Distributed	Geospatial	Pipeline
NATIVE VISUALIZATION IS DESIGNED FOR FAST-MOVING, LOCATION-BASED DATA
Native	Geospatial	Object	Types
• Points,	Shapes,	Tracks,	Labels
Native	Geospatial	Functions
• Filters	(by	area,	by	series,	by	geometry,	etc.)
• Aggregation	(histograms)		
• Geofencing triggers
• Video	generation	(based	on	dates/times)
Generate	Map	Overlay	Imagery	(via	WMS)
• Rasterize	points
• Style	based	on	attributes	(class-break)
• Heat	maps
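A geofencing trigger reduces to a point-in-polygon test on each incoming position, firing an alert on entry. A ray-casting sketch, illustrative only — not Kinetica's native geospatial filter implementation:

```python
# Geofencing as point-in-polygon: cast a ray to the right of (x, y)
# and count edge crossings; an odd count means the point is inside.

def point_in_polygon(x, y, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):              # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                    # crossing is to the right
                inside = not inside
    return inside

# A square geofence and two incoming GPS fixes.
fence = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
assert point_in_polygon(2.0, 2.0, fence) is True    # inside -> fire trigger
assert point_in_polygon(5.0, 2.0, fence) is False   # outside -> no alert
```

At Kinetica scale this test runs as a data-parallel filter over the tracked objects' latest positions, which is why it can be evaluated continuously against streaming fixes.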
Full-Text Search
Kinetica includes powerful text search functionality, including:
• Exact Phrases
• Boolean – AND / OR
• Wildcards
• Grouping
• Fuzzy Search (Damerau-Levenshtein optimal string alignment algorithm)
• N-Gram Term Proximity Search
• Term Boosting Relevance Prioritization
Example queries:
"Rain Tire"~5
"Union Tranquility"~10
[100 TO 200]
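The fuzzy-search distance named above — Damerau-Levenshtein in its optimal string alignment (OSA) form — counts insertions, deletions, substitutions, and adjacent transpositions. A reference sketch; Kinetica's own implementation is internal to its search engine:

```python
# Damerau-Levenshtein, optimal string alignment (OSA) variant:
# classic DP table plus one extra case for adjacent transpositions.

def osa_distance(a, b):
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

assert osa_distance("tier", "tire") == 1   # one adjacent transposition
assert osa_distance("rain", "rains") == 1  # one insertion
```

Counting a swapped adjacent pair as one edit instead of two is what makes this distance forgiving of the most common typo class in search queries.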
INTELLIGENCE: US Army - INSCOM
The US Army's in-memory computational engine for any data with a geospatial or temporal attribute, part of a major joint cloud initiative within the Intelligence Community (IC ITE).
Intel analysts are able to conduct near-real-time analytics, fusing SIGINT, ISR, and GEOINT streaming big-data feeds and visualizing them in a web browser.
For the first time, military analysts are able to query and visualize billions to trillions of near-real-time objects in a production environment.
Major executive military and congressional visibility.
US Army INSCOM shift from Oracle to GPUdb (1 GPUdb server vs. 42 servers with Oracle 10gR2, 2011):
• Query time: 92 minutes (Oracle Spatial) → 20 ms (GPUdb)
• 42x lower space
• 28x lower cost
• 38x lower power cost
CASE STUDY: LOCATION-BASED ANALYTICS
LOGISTICS:	Workforce	optimization	
DISTRIBUTED	ANALYSIS
USPS’	parallel	cluster	is	able	to	serve	up	to	15,000	
simultaneous	sessions,	providing	the	service’s	managers	
and	analysts	with	the	capability	to	instantly	analyze	
their	areas	of	responsibility	via	dashboards.
AT	SCALE
With	200,000	USPS	devices	emitting	location	once	
every	minute,	that	amounts	to	more	than	a	quarter	
billion	events	captured	and	analyzed	daily…	tracked	on	
10	nodes.
USPS is the single largest logistics entity in the country, moving more individual items in four hours than UPS, FedEx, and DHL combined move all year.
LOGISTICS	&	FLEET	MANAGEMENT	
Kinetica enables agile tracking of shipments, helping store managers track inventory and arrival times.
• Visibility and tracking of deliveries & trucks for store managers
• ETA & notifications – estimated time of delivery, notifications, and custom location-based alerting
• Route optimization based on truck size and on whether cargo is perishable or contains hazardous materials
LARGE
RETAILER
RISK	MANAGEMENT
A large financial institution moves counterparty risk analysis from overnight to real time.
• Data is collected by an xVA library, which computes risk metrics for each trade
• Risk computations are becoming more complex and computationally heavy; xVA analysis needs to project years into the future
• Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors, and management
MULTINATIONAL
BANK
CASE	STUDY	:	ADVANCED		IN-DATABASE	ANALYTICS
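The batch-to-streaming shift in this case study, in miniature: instead of recomputing exposure over all trades overnight, keep a running aggregate that is updated as each trade event arrives. Entirely illustrative — real xVA metrics are far more involved than a summed exposure:

```python
# Batch vs. streaming computation of a (toy) exposure aggregate.
# Illustrative only; real xVA risk metrics are far more complex.

def batch_exposure(trades):
    """Overnight style: full recompute over every trade."""
    return sum(t["exposure"] for t in trades)

class StreamingExposure:
    """Streaming style: O(1) incremental update per trade event."""
    def __init__(self):
        self.total = 0.0

    def on_trade(self, trade):
        self.total += trade["exposure"]
        return self.total          # current value, queryable at any time

trades = [{"exposure": 1_000.0}, {"exposure": -250.0}, {"exposure": 500.0}]

live = StreamingExposure()
for t in trades:
    live.on_trade(t)               # monitors see a fresh number per event

assert live.total == batch_exposure(trades)   # same answer, no nightly batch
```

The point is that the streaming view is continuously queryable by traders, auditors, and management, rather than becoming available once per night.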
Scale	Out	on	Industry	Standard	Hardware
Kinetica typically results in 1/10 the hardware cost of standard in-memory databases.
Runs on industry-standard servers: 512GB memory with GPUs (e.g. NVIDIA K80).
IN THE CLOUD WITH: [cloud provider logos]
CERTIFIED ON PREMISE WITH: [hardware partner logos]
COMING SOON: [logos]
Stop	by	Booth	#431	and	
Get	Your	Free	T-shirt
www.kinetica.com