HOW TO ACHIEVE REAL-TIME ANALYTICS ON A DATA LAKE USING GPUS
Mark Brooks - Principal System Engineer @ Kinetica
May 09, 2017
The Challenge:
How to maintain analytic performance while dealing with:
• Larger data volumes
• Streaming data with minimal end-to-end latency
• Ad-hoc drill down (you can't pre-aggregate everything)
Architectural	and	Design	Approaches
1. One	database	to	rule	them	all
2. SQL	on	Hadoop	(or	directly	on	the	Data	Lake)
3. Data	Lake	+	NoSQL	+	Spark	+	Search	+	Cache	+…
4. Lambda	Architecture
5. Kappa	Architecture
6. Next	generation	hardware	acceleration
One	Database	To	Rule	Them	All
SQL	on	a	Data	Lake
Credit:		https://www.slideshare.net/Bigdatapump/sql-on-hadoop-49494494
Hadoop	+	NoSQL	+	Search	+	Memory	Cache	+…
Credit:		Matt	Turck - https://www.slideshare.net/mjft01/big-data-landscape-matt-turck-may-2014
Lambda	Architecture
Credit:	 Nathan	Marz http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
James	Kinley					http://jameskinley.tumblr.com/tagged/Lambda
Kappa	Architecture
Credit:			Jay	Kreps			https://www.oreilly.com/ideas/questioning-the-lambda-architecture					
"Stream processing systems already have a notion of parallelism; why not just handle reprocessing by increasing the parallelism and replaying history very, very fast?"
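Kreps' replay idea can be reduced to a few lines: derived state is just a function of the immutable event log, so reprocessing means running the (possibly updated) logic over the log again. A minimal sketch with illustrative event and function names, not any real streaming framework:

```python
# Sketch of the Kappa idea: state is rebuilt by replaying the immutable
# event log through the (possibly updated) stream-processing logic.

def replay(log, process, initial_state):
    """Rebuild derived state by running every event back through `process`."""
    state = initial_state
    for event in log:
        state = process(state, event)
    return state

# v1 logic: count all events per user
def count_v1(state, event):
    state[event["user"]] = state.get(event["user"], 0) + 1
    return state

# v2 logic (a "reprocessing" deploy): count only purchases
def count_v2(state, event):
    if event["type"] == "purchase":
        state[event["user"]] = state.get(event["user"], 0) + 1
    return state

log = [
    {"user": "a", "type": "view"},
    {"user": "a", "type": "purchase"},
    {"user": "b", "type": "purchase"},
]

v1 = replay(log, count_v1, {})   # {'a': 2, 'b': 1}
v2 = replay(log, count_v2, {})   # {'a': 1, 'b': 1}
```

Deploying new logic means starting a second replay job, pointing it at offset zero of the log, and cutting over once it catches up — no separate batch layer required.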
Next	Generation	Hardware	Acceleration
Consider	a	system	with	these	characteristics:
• Horizontally	Scalable
• Low	end-to-end	latency
• Powerful	enough	to	not	require	pre-aggregation
This	is	now	possible…
GPU	Accelerated	Compute
DATA WAREHOUSE
RDBMS & data warehouse technologies enable organizations to store and analyze growing volumes of data on high-performance machines, but at high cost.
DISTRIBUTED STORAGE
Hadoop and MapReduce enable distributed storage and processing across multiple machines. Storing massive volumes of data becomes more affordable, but performance is slow.
AFFORDABLE MEMORY
Affordable memory allows for faster data reads and writes. HANA, MemSQL, & Exadata provide faster analytics.
GPU ACCELERATED COMPUTE
GPU cores bulk-process tasks in parallel — far more efficient for many data-intensive tasks than CPUs, which process those tasks sequentially.
Timeline: 1990s–2000s → 2005… → 2010… → 2017…. At scale, processing becomes the bottleneck.
Kinetica:	Core
ANALYTICS	DATABASE	ACCELERATED	BY	GPUs
[Diagram: Kinetica — a GPU-accelerated, columnar in-memory database running on commodity hardware with GPUs; an HTTP head node fronts data sharded across memory (A1–C4) and persisted to disk]
• Columnar in-memory database
• Data is presented much like a traditional RDBMS: rows and columns
• Data held in memory; persisted to disk
• Interact with Kinetica through its native REST API, Java, Python, JavaScript, Node.js, C++, SQL, etc., as well as through various connectors
• Native GIS & IP address object support
• VERY FAST: ideal for OLAP workloads
• Typical hardware setup: 256GB–1TB memory with 2–4 GPUs per node
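The columnar layout is what makes the OLAP claim work: an aggregate touches only the columns it names, stored contiguously — exactly the access pattern GPUs scan well. A toy model of the contrast, not Kinetica's actual storage engine:

```python
# Illustrative contrast between row and columnar layouts for an
# OLAP-style aggregate. Toy model only, not Kinetica's storage engine.

# Row layout: each record is one dict; a scan drags every field along.
rows = [
    {"id": 1, "region": "east", "sales": 100.0},
    {"id": 2, "region": "west", "sales": 250.0},
    {"id": 3, "region": "east", "sales": 175.0},
]

# Columnar layout: one contiguous list per column.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [100.0, 250.0, 175.0],
}

# SUM(sales): a columnar engine scans only the single column it needs.
total = sum(columns["sales"])                        # 525.0

# SUM(sales) GROUP BY region, over the same two columns.
by_region = {}
for region, sales in zip(columns["region"], columns["sales"]):
    by_region[region] = by_region.get(region, 0.0) + sales
# {'east': 275.0, 'west': 250.0}
```

On a GPU the per-column scan above becomes a data-parallel reduction over a contiguous array, which is where the aggregation speedups cited later in the deck come from.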
Multi-Head	Ingest	and	Scale-Out	Architecture
[Diagram: on-demand scale-out — multiple commodity nodes with GPUs, each with its own HTTP head node, columnar in-memory store (A1–C4), and disk, all accepting multi-head ingest in parallel]
Real-Time Data Handlers for Structured & Unstructured Data
APIs:
• REST API
• Java API
• JavaScript API
• Node.js API
• Python API
• C++ API
OPEN SOURCE INTEGRATION:
• Apache NiFi
• Apache Kafka
• Apache Spark
• Apache Storm
GEOSPATIAL CAPABILITIES:
• Geometric Objects
• Tracks
• Geospatial Endpoints
• WMS
• WKT
VISUALIZATION via ODBC/JDBC
On-Demand Scale
[Diagram: Kinetica cluster scaling out on demand — additional commodity nodes with GPUs, each with an HTTP head node, columnar in-memory store, and disk]
OTHER INTEGRATION:
• Message Queues
• ETL Tools
• Streaming Tools
Parallel	Ingest	Provides	High	Performance	Streaming
[Diagram: parallel ingest spread across multiple nodes, each 1TB memory / 2 GPUs]
Each node of the system can share the task of data ingest, providing greater and faster throughput. Ingest can be made faster simply by adding more nodes.
No compute is used on ingest!
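From the client side, multi-head ingest amounts to spreading record batches across every ingest-capable node rather than funneling them through one head node. A minimal sketch with hypothetical node addresses and batching — not Kinetica's actual client API:

```python
# Sketch of multi-head ingest: spread fixed-size batches round-robin
# across all ingest-capable nodes. Node addresses are hypothetical.
from itertools import cycle

def multi_head_ingest(records, nodes, batch_size=2):
    """Assign each batch to the next node in round-robin order."""
    sent = {node: [] for node in nodes}
    targets = cycle(nodes)
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        sent[next(targets)].append(batch)   # in reality: an HTTP insert call
    return sent

nodes = ["node1:9191", "node2:9191", "node3:9191"]
records = list(range(12))                   # 12 records -> 6 batches
placement = multi_head_ingest(records, nodes)

# Each of the 3 nodes receives 2 of the 6 batches; adding a node
# immediately raises aggregate ingest throughput.
assert all(len(batches) == 2 for batches in placement.values())
```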
Speed	Layer	for	the	Data	Lake
• Parallel ingestion of events
• Kinetica is the speed layer, with real-time analytic capabilities
• HDFS for archival store
• Much looser coupling than the traditional Lambda architecture
• Batch-mode Spark or MR jobs can push data to Kinetica as needed for fast query on data loaded from the data lake
[Diagram: events flow from message brokers (e.g. Amazon Kinesis) and stream processing through Kinetica connectors via parallel ingestion; Kinetica serves put/get/scan and on-the-fly complex analytics to analysts, mobile users, dashboards & applications, and alerting systems; HDFS / AWS S3 / GCS / Azure Data Lake provides the archival store]
Real-Time,	Advanced	Analytics,	Speed	Layer	for	Teradata	or	Oracle
• Parallel ingestion of events
• Lambda-type architecture for Teradata or Oracle
• Kinetica is the speed layer, with near-real-time analytic capabilities
• Converge machine learning, streaming, and location analytics with fast query and analytics, using Kinetica together with the RDBMS
[Diagram: data in motion and at rest — the data warehouse / transactional systems and stream / ETL processing (e.g. Amazon Kinesis) feed Kinetica connectors into the fast, GPU-accelerated in-memory database, converging ML, AI, and streaming; results serve analysts, mobile users, dashboards & applications, and alerting systems]
Advanced	In-Database	Analytics
1. User-defined functions (UDFs) can receive table data, do arbitrary computations, and save output to a separate table, all in a distributed manner.
2. UDFs have direct access to CUDA APIs — enabling compute-to-grid analytics for logic deployed within Kinetica.
3. Works with custom or packaged code. Opens the way for machine learning / artificial intelligence libraries such as TensorFlow, BIDMach, Caffe, and Torch to work on data directly within Kinetica.
4. Available now with C++ & Java bindings.
[Diagram: orchestration layer with UDFs — on each of n Kinetica servers (physical or virtual), a proc server hosts UDF_A, UDF_B, … UDF_n with access to local table data, the GPU, and CUDA libraries; UDFs are exposed from RESTful endpoints (e.g. /exec/proc/UDF_A/) and results are returned to an output table for further analysis]
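The orchestration pattern above can be sketched in a few lines: the same UDF runs against each shard's local data, and the per-shard outputs are unioned into an output table. Purely conceptual Python — real Kinetica UDFs use the C++ or Java proc bindings:

```python
# Conceptual sketch of distributed UDF execution: one function, applied
# per shard against local data, results unioned into an output table.
# Names are illustrative; real Kinetica UDFs use C++/Java bindings.

def run_udf(shards, udf):
    """Apply `udf` to each shard's local rows; union per-shard outputs."""
    output_table = []
    for shard_data in shards:              # in reality: parallel, per node
        output_table.extend(udf(shard_data))
    return output_table

# A toy UDF: mean-center a numeric column, computed per shard.
def mean_center(rows):
    mean = sum(rows) / len(rows)
    return [round(x - mean, 3) for x in rows]

shards = [[1.0, 2.0, 3.0], [10.0, 20.0]]   # data as distributed over nodes
result = run_udf(shards, mean_center)      # [-1.0, 0.0, 1.0, -5.0, 5.0]
```

The key property is that the computation moves to the data: each invocation only ever sees the rows already resident on its node (and, in Kinetica's case, already in GPU-accessible memory).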
Kinetica	Architecture
[Diagram: Kinetica architecture — streaming data and ERP / CRM / transactional data enter via ETL / stream processing and parallel ingest into an on-demand scale-out cluster (1TB memory / 2 GPU cards per node); in-database processing runs custom logic, UDFs, and ML libraries such as BIDMach; SQL, native APIs, geospatial WMS, and custom connectors serve BI dashboards, BI / GIS apps, custom geospatial apps, and Kinetica 'Reveal']
AI	&	BI	on	One	GPU-Accelerated	Database					
[Diagram: one high-performance analytics database hosting UDFs; business users reach it through ODBC / JDBC, the native REST API, and WMS for business intelligence, custom applications, and high-fidelity geospatial; data scientists / developers use the pipeline for machine learning & deep learning, GPU-accelerated data science, and predictive models (e.g. risk management, sales volume, fraud) via BIDMach and SQL]
50-100x Faster on Queries with Large Datasets
• A large retailer tested complex SQL queries on 3 years of retail data (150bn rows)
• 10-node Kinetica cluster against a 30TB+ cluster from the next-best alternative
• The GPU is able to perform many instructions in parallel, yielding huge performance gains on aggregations, GROUP BYs, joins, etc.
• Kinetica sustained ingest of 1.3bn objects/minute with 70 attributes per row
WHEN	COMPARED	TO	LEADING	IN-MEMORY	ALTERNATIVES
[Chart: relative query times for SUM (Q1), GROUP BY (Q5), and SELECT (Q10) — Kinetica vs. a leading in-memory DB]
More	Details
Distributed	Geospatial	Pipeline
NATIVE VISUALIZATION IS DESIGNED FOR FAST-MOVING, LOCATION-BASED DATA
Native	Geospatial	Object	Types
• Points,	Shapes,	Tracks,	Labels
Native	Geospatial	Functions
• Filters	(by	area,	by	series,	by	geometry,	etc.)
• Aggregation	(histograms)		
• Geofencing triggers
• Video	generation	(based	on	dates/times)
Generate	Map	Overlay	Imagery	(via	WMS)
• Rasterize	points
• Style	based	on	attributes	(class-break)
• Heat	maps
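A geofencing trigger reduces to a point-in-polygon test on each incoming position, firing an alert on entry. A ray-casting sketch, illustrative only — not Kinetica's native geospatial filter implementation:

```python
# Geofencing as point-in-polygon: cast a ray to the right of (x, y)
# and count edge crossings; an odd count means the point is inside.

def point_in_polygon(x, y, polygon):
    """Ray-casting test. `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):              # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                    # crossing is to the right
                inside = not inside
    return inside

# A square geofence and two incoming GPS fixes.
fence = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
assert point_in_polygon(2.0, 2.0, fence) is True    # inside -> fire trigger
assert point_in_polygon(5.0, 2.0, fence) is False   # outside -> no alert
```

At Kinetica scale this test runs as a data-parallel filter over the tracked objects' latest positions, which is why it can be evaluated continuously against streaming fixes.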
Full-Text Search
Kinetica includes powerful text search functionality, including:
• Exact Phrases
• Boolean – AND / OR
• Wildcards
• Grouping
• Fuzzy Search (Damerau-Levenshtein optimal string alignment algorithm)
• N-Gram Term Proximity Search
• Term Boosting Relevance Prioritization
Example queries:
"Rain Tire"~5
"Union Tranquility"~10
[100 TO 200]
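The fuzzy-search distance named above — Damerau-Levenshtein in its optimal string alignment (OSA) form — counts insertions, deletions, substitutions, and adjacent transpositions. A reference sketch; Kinetica's own implementation is internal to its search engine:

```python
# Damerau-Levenshtein, optimal string alignment (OSA) variant:
# classic DP table plus one extra case for adjacent transpositions.

def osa_distance(a, b):
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

assert osa_distance("tier", "tire") == 1   # one adjacent transposition
assert osa_distance("rain", "rains") == 1  # one insertion
```

Counting a swapped adjacent pair as one edit instead of two is what makes this distance forgiving of the most common typo class in search queries.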
INTELLIGENCE: US Army - INSCOM
The US Army's in-memory computational engine for any data with a geospatial or temporal attribute, part of a major joint cloud initiative within the Intelligence Community (IC ITE).
Intel analysts are able to conduct near-real-time analytics, fusing SIGINT, ISR, and GEOINT streaming big-data feeds and visualizing them in a web browser.
For the first time, military analysts are able to query and visualize billions to trillions of near-real-time objects in a production environment.
Major executive military and congressional visibility.
US Army INSCOM shift from Oracle to GPUdb (1 GPUdb server vs. 42 servers with Oracle 10gR2, 2011):
• Query time: 92 minutes (Oracle Spatial) → 20 ms (GPUdb)
• 42x lower space
• 28x lower cost
• 38x lower power cost
CASE STUDY: LOCATION-BASED ANALYTICS
LOGISTICS:	Workforce	optimization	
DISTRIBUTED	ANALYSIS
USPS’	parallel	cluster	is	able	to	serve	up	to	15,000	
simultaneous	sessions,	providing	the	service’s	managers	
and	analysts	with	the	capability	to	instantly	analyze	
their	areas	of	responsibility	via	dashboards.
AT	SCALE
With	200,000	USPS	devices	emitting	location	once	
every	minute,	that	amounts	to	more	than	a	quarter	
billion	events	captured	and	analyzed	daily…	tracked	on	
10	nodes.
USPS is the single largest logistics entity in the country, moving more individual items in four hours than UPS, FedEx, and DHL combined move all year.
LOGISTICS	&	FLEET	MANAGEMENT	
Kinetica enables agile tracking of shipments, helping store managers track inventory and arrival times.
• Visibility and tracking of deliveries & trucks for store managers
• ETA & notifications – estimated time of delivery, notifications, and custom location-based alerting
• Route optimization based on truck size and on whether cargo is perishable or contains hazardous materials
LARGE
RETAILER
RISK	MANAGEMENT
A large financial institution moves counterparty risk analysis from overnight to real time.
• Data is collected by an xVA library, which computes risk metrics for each trade
• Risk computations are becoming more complex and computationally heavy; xVA analysis needs to project years into the future
• Kinetica enables banks to move from batch/overnight analysis to a streaming/real-time system for flexible real-time monitoring by traders, auditors, and management
MULTINATIONAL
BANK
CASE	STUDY	:	ADVANCED		IN-DATABASE	ANALYTICS
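The batch-to-streaming shift in this case study, in miniature: instead of recomputing exposure over all trades overnight, keep a running aggregate that is updated as each trade event arrives. Entirely illustrative — real xVA metrics are far more involved than a summed exposure:

```python
# Batch vs. streaming computation of a (toy) exposure aggregate.
# Illustrative only; real xVA risk metrics are far more complex.

def batch_exposure(trades):
    """Overnight style: full recompute over every trade."""
    return sum(t["exposure"] for t in trades)

class StreamingExposure:
    """Streaming style: O(1) incremental update per trade event."""
    def __init__(self):
        self.total = 0.0

    def on_trade(self, trade):
        self.total += trade["exposure"]
        return self.total          # current value, queryable at any time

trades = [{"exposure": 1_000.0}, {"exposure": -250.0}, {"exposure": 500.0}]

live = StreamingExposure()
for t in trades:
    live.on_trade(t)               # monitors see a fresh number per event

assert live.total == batch_exposure(trades)   # same answer, no nightly batch
```

The point is that the streaming view is continuously queryable by traders, auditors, and management, rather than becoming available once per night.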
Scale	Out	on	Industry	Standard	Hardware
Kinetica typically results in 1/10 the hardware cost of standard in-memory databases.
Runs on industry-standard servers: 512GB memory with GPUs (e.g. NVIDIA K80).
IN THE CLOUD WITH: [cloud provider logos]
CERTIFIED ON PREMISE WITH: [hardware partner logos]
COMING SOON: [logos]
Stop	by	Booth	#431	and	
Get	Your	Free	T-shirt
www.kinetica.com