Mate Radalj's presentation on how to operationalize machine learning using GPU-accelerated, in-database analytics, given at the Bay Area GPU-Accelerated Computing Meetup on October 19, 2017. Presentation includes use cases and links to demos.
2. Why a GPU Database?
• Leverage Innovations in CPU and GPU
technology
• Big Data
• Traditional Analytics
• Emerging AI/ML/Deep Learning
Computing
• Real-Time Ingestion
• Geospatial and Temporal
• Single Hardware Platform
• Simplified Software Stack
3
3. AI/ML/Deep Learning Lifecycle
•Create, extract, transform, and
process big data: batch and streams
• Apply ML to data.
•Model pre-processing
•Model execution
•Model post-processing
•Within an ecosystem of general
analytics
•Supporting a range of human and
machine consumers
9
5. 5
Typical AI Process: High Latency, Rigid, Complex HW and SW Stack
SPECIALIZED AI/ DATA
SCIENCE TOOLS
SUBSET
DATA SCIENTISTSBUSINESS USERS
EXTRACT
EXTRACTING DATA FOR AI IS
EXPENSIVE AND SLOW
ENTERPRISES
STRUGGLE TO
MAKE AI MODELS
AVAILABLE TO
BUSINESS
???
• MapReduce
• Spark
• Cassandra
• SQL Databases
• DFS
• CPU Compute Nodes
• GPU Compute Nodes
Proliferation of Hardware & Software Components
6. Kinetica: A More Ideal AI Process
6
Monte Carlo Risk
Custom Function 2
Custom Function 3
API EXPOSES CUSTOM
FUNCTIONS WHICH CAN BE
MADE AVAILABLE TO BUSINESS
USERS
BUSINESS USERS
DATA SCIENTISTS
UDFs
Single Hardware Platform
• Analytics
• AI/ML/Deep Learning
• Power of in-memory SQL
• Integrated CPU/GPU
• Bomb with Streams
7. Current Inefficient Use of Python
7
python
• Interpreted
• Single threaded
• Clean, transform
• Flow: for each member
• Pre-process
• Model execute
• Post-process
=
8. Optimized SQL and Python UDF with Kinetica
8
=
SQL
UDF
python
SQL
• Pre-process
• Binary executable code
• Superior optimization
• declarative SQL
• Model execute
• Only essential imperative model code
• Not relational set processing
• Post-process
• Binary executable code
• Superior optimization
• Declarative SQL
9. Various
ETL/ELT
Head
Node
Worker
1
KINETICA: 10 Node Cluster
Worker
9
Fact and dimensions tables for various Use Cases
Billions of rows
Massive Stream
Ingestion
Massive Fast
Analytics
Apache Tomcat Applications Servers
• Spring Endpoint oriented architecture
• Horizontal elastic scaling
Full Model Pipeline 1
Various
ETL/ELT
Full Model Pipeline N
Prompts
Project
Overall technology Architecture
9
Fast Streaming
Projects
Fast Analytics
Projects
11. MNIST: Simple Image Processing Use Case
11
A Parametric Model: Python Using TensorFlow
Model Training
• Set of image files stored in Kinetica Database Table
• Grey: 1 channel, 2D
• Color: 3 channel, 3D
• Python UDF in Kinetica using TensorFlow
• Convert each image in Kinetica table to flattened 1D Array and insert into data
frame
• Or use raw image format and insert into data frame.
• Call TensorFlow: convert data frame to tensors
• N layer Neural Network
• Runs on GPU
• Output = table TFModel
• coefficients
Model Serving
• Python UDF in Kinetica using TensorFlow
• Input = table TFModel table.
• Output = table mnist_inference_out
Model Analytics
• SQL!!! Predict images of numbers: 0, 1, 2, 4, 4, 5, 6, 7, 8, 9
13. Demo’s
13
UDF in Kinetica
1. Write the UDF: pythjon, java, c/c++, and javascript.
2. Register the UDF
3. Invoke the UDF
https://bitbucket.org/gisfederal/gsk-imagerecognition/src
https://bitbucket.org/gisfederal/gsk-imagerecognition/src
14. Amit Vij | CEO | Kinetica 14
Bringing it All Together with Geospatial
15. 15
IoT Data Challenges and Geospatial
EXPLOSION OF DATA
Structured and unstructured
Devices, Sensors
Industrial IoT
REAL-TIME DEMANDS
Current Technology:
I/O Bound
Compute Bound
EXISTING SOLUTIONS NOT WORKING
Too Complex
Batch Processing
Duct taping 5-10 technologies
16. 16
Accelerated Geospatial with Kinetica | Fast, Scalable, Flexible
Solution
• Full data provisioning
• Scale and speed
• Flexibility
• Simplicity
Bonus
• Converge AI and BI
• Streaming Analytics
17. Kinetica Database | Geospatial 101
17
Geospatial Objects
z
Points
Lines
Polygons
Tracks
Labels
Spatial Operations
Accelerated Spatial Operations
SQL Expression & API Support
Spatial Queries, Filters & Joins
Geospatial Event Triggers
Geospatial Visualization
Server-side Rendering Vector data via
WMS
Complex Symbology Support
Several Built-in Geospatial Renderers
1
2
3
19. Kinetica Machine Learning Use Cases
.
19
OLAP
Performance,
Scalability,
Stability
Geospatial
Processing &
Visualization
API for GPU
Powered Data
& Compute
Orchestration
• Activity Based
Intelligence
• Oil & Gas
• Drilling
optimization
• Logistics
• Last Mile
• Fleet Management
Full Data Science
Model Pipeline
ML / AI
Augmentation
Geospatial
Fast Ingest, Fast
Streaming, and
Fast Analytics
• Supply Chain Management
• Replenishment: Real-
time mass streaming
ingest and analytics
• Integrated planning:
Massively concurrent,
high throughput analytics
20. INTELLIGENCE | US Army - INSCOM
Oracle Spatial
(92 Minutes)
42x Lower Space
28x Lower Cost
38x Lower Power Cost
U.S Army INSCOM Shift from Oracle to GPUdb
GPUdb
(20ms)
1 GPUdb server vs 42 servers with Oracle 10gR2 (2011)
NEW CAPABILITIES DELIVERED
• Intel analysts can do real-time geospatial
analytics on over 200 streaming data feeds
• Military analysts are able to query and visualize
billions to trillions of near real-time objects
SOLUTION OVERVIEW
• US Army’s in-memory computational engine for
geospatial and temporal data.
• Queries down from 92 minutes to less than 1
second
• Replaced 42 Oracle 10gR2 servers with a
single Kinetica server – 42x lower space, 28x
lower cost, 38X lower power cost
20
21. LOGISTICS | Workforce optimization
NEW CAPABILITIES DELIVERED
• Real-time delivery and pickup notifications, shipment
routing, just-in-time supplies
• Real-time route optimization - route planning, rerouting
• Geospatial analytics to uncover overlapping coverage
areas, uncovered areas, and distribution bottlenecks
SOLUTION OVERVIEW
• Collect, process, and analyze 200,000 messages per
minute for real-time streaming analytics. 15,000 daily
sessions with 5 9’s uptime
21
22. PIPE LINE & WELL RESEARCH | Location-based analytics
22
NEW CAPABILITIES DELIVERED
• Geospatial visualization and analytics of massive number of
wells, pipelines by land ownership, region etc.
• Custom visualizations and charts for data-driven insights
• Embedded solution with seamless Node.js integration, GPU
acceleration
SOLUTION OVERVIEW
• Kinetica running in RSEG’s Amazon Web Services VPC
deployment
23. Automotive | Connected Car Analytics with Machine Learning
23
Activity-Based Intelligence
• Behavioral analytics
• Ex: Lost driver alerts
• GPS data + real-time route tracking
• Rule-based analytics
• Ex: Dangerous driving
• Rainy weather + condition of road + MPH
• Complicated pattern recognition
• Ex: Tailgating
• Collecting speed and brake data
• Model trained to classify driving patterns
Intelligence
Streaming Data
Actuation loop
27. Contact:
/ kinetica.com
/ Email: info@kinetica.com
Thank You!
/www.nvidia.com/analytics
/www.nvidia.com/dgx1
/Email: dgxanalytics@nvidia.com
Save the Date
December 7th, Kinetica HQ, San Francisco
Holiday Networking and ”Housewarming” Event for new Kinetica SF HQ