© Cloudera, Inc. All rights reserved.
Intro to Apache Kudu
Hadoop storage for fast analytics on fast data

Shravan (Sean) Pabba | Systems Engineer, Cloudera | @skpabba

Apache Kudu
Storage for fast (low latency) analytics on fast (high throughput) data
•  Simplifies the architecture for building analytic applications on changing data
•  Optimized for fast analytic performance
•  Natively integrated with the Hadoop ecosystem of components
[Diagram: Kudu in the Hadoop platform stack. Data integration & storage layer: HDFS (filesystem), HBase (NoSQL), Kudu (columnar store); ingest via Sqoop, Flume, Kafka. Security: Sentry. Resource management: YARN. Unified data services: batch (Spark, Hive, Pig), stream (Spark), SQL (Impala), search (Solr), model (Spark), online (HBase), grouped as data engineering, data discovery & analytics, and data apps.]

Why Kudu?

Previous Hadoop storage landscape
HDFS (GFS) excels at:
•  Batch ingest only (e.g. hourly)
•  Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
•  Efficiently finding and writing individual rows
•  Making data mutable

Gaps exist when these properties are needed simultaneously

Kudu design goals
•  High throughput for big scans
   Goal: Within 2x of Parquet
•  Low latency for short accesses
   Goal: 1 ms read/write on SSD
•  Database-like semantics
   Initially, single-row atomicity
•  Relational data model
   •  SQL queries should be natural and easy
   •  Include NoSQL-style scan, insert, and update APIs

Changing hardware landscape
•  Spinning disk -> solid state storage
   •  NAND Flash: Up to 450k read and 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
   •  Intel Optane/3D XPoint memory (1000x faster than Flash, cheaper than RAM)
•  RAM is cheaper and more abundant:
   •  64 -> 128 -> 256 GB over the last few years
•  Takeaway: The next performance bottleneck is CPU, and current storage systems weren’t designed with CPU efficiency in mind

Apache Kudu: Scalable and fast structured storage
Scalable
•  Tested up to 400+ nodes (~3 PB cluster)
•  Designed to scale to 1000s of nodes and tens of PBs
Fast
•  Millions of read/write operations per second across the cluster
•  Multiple GB/second read throughput per node
Tables
•  Represents data in structured tables like a normal database
•  Individual record-level access to 100+ billion row tables

Storing records in Kudu tables
•  A Kudu table has a SQL-like schema
   •  And a finite number of columns (unlike HBase/Cassandra)
   •  Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
   •  Some subset of columns makes up a possibly-composite primary key
   •  Fast ALTER TABLE
•  Java, Python, and C++ NoSQL-style APIs
   •  Insert(), Update(), Delete(), Scan()
•  SQL via integrations with Impala and Spark
•  Community work in progress / experimental: Drill, Hive
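To make the data model concrete, here is a toy in-memory sketch of a table with a fixed column set, a composite primary key, and insert/update/delete/scan operations. ToyTable and its method names are hypothetical and exist only for illustration; this is not the Kudu client API, and the "value" column is an assumed example column.

```python
# Toy in-memory model of a table with a composite primary key.
# ToyTable is hypothetical; it is NOT the Kudu client API.

class ToyTable:
    def __init__(self, columns, key_columns):
        self.columns = list(columns)          # fixed, finite set of columns
        self.key_columns = list(key_columns)  # subset forming the primary key
        self.rows = {}                        # primary-key tuple -> row dict

    def _key(self, row):
        return tuple(row[c] for c in self.key_columns)

    def insert(self, row):
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the table's columns")
        key = self._key(row)
        if key in self.rows:
            raise KeyError("duplicate primary key")
        self.rows[key] = dict(row)

    def update(self, row):
        key = self._key(row)
        if key not in self.rows:
            raise KeyError("no such row")
        self.rows[key].update(row)

    def delete(self, key):
        del self.rows[tuple(key)]

    def scan(self, projection, predicate=lambda r: True):
        # Return the projected columns of matching rows, in primary-key order.
        return [tuple(self.rows[k][c] for c in projection)
                for k in sorted(self.rows) if predicate(self.rows[k])]


metrics = ToyTable(["host", "metric", "timestamp", "value"],
                   key_columns=["host", "metric", "timestamp"])
metrics.insert({"host": "a", "metric": "cpu", "timestamp": 1, "value": 0.5})
metrics.insert({"host": "a", "metric": "cpu", "timestamp": 2, "value": 0.7})
metrics.update({"host": "a", "metric": "cpu", "timestamp": 1, "value": 0.6})
metrics.delete(("a", "cpu", 2))
result = metrics.scan(["timestamp", "value"])
print(result)
```

The point of the sketch is that every row is addressed by its primary key, so single-row writes and lookups are direct, while a scan can project only the columns it needs.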

Use cases

Kudu use cases
Kudu is best for use cases requiring:
•  Simultaneous combination of sequential and random reads and writes
•  Minimal to zero data latencies

Time series
•  Examples: Streaming market data; fraud detection & prevention; network monitoring
•  Workload: Inserts, updates, scans, lookups

Online reporting / data warehousing
•  Example: Operational data store (ODS)
•  Workload: Inserts, updates, scans, lookups

“Traditional” real-time analytics in Hadoop
Fraud detection in the real world = storage complexity
Considerations:
•  How do I handle failure during this process?
•  How often do I reorganize incoming streaming data into a format appropriate for reporting?
•  When reporting, how do I see data that has not yet been reorganized?
•  How do I ensure that important jobs aren’t interrupted by maintenance?
[Diagram: storage in HDFS. Incoming data from Kafka lands in HBase as a new partition; the most recent partition and historical data are held as Parquet files; a reporting request reads across all of them. A check, "Have we accumulated enough data?", triggers the step "Reorganize HBase file into Parquet":
•  Wait for running operations to complete
•  Define new Impala partition referencing the newly written Parquet file]

Real-time analytics in Hadoop with Kudu
Improvements:
•  One system to operate
•  No cron jobs or background processes
•  Handle late arrivals or data corrections with ease
•  New data available immediately for analytics or operations
[Diagram: storage in Kudu. Incoming data (e.g. Kafka) is written to Kudu, which holds both historical and real-time data and serves reporting requests.]

Large Cable Company - Old Architecture
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113

Challenges
•  Rebuilding entire datasets or partitions, by regenerating compressed CSV files and loading them into HDFS to keep data current, took several hours or days.
•  Rebuild operations consumed cluster capacity, limiting availability to other teams in a shared cluster.
•  There was no way to update a single row in the dataset without recreating the table or using a slower, more complicated integration with HBase.
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113

Large Cable Company - New Architecture
•  Stores tune events in Kudu. Any data fixes are made directly in Kudu.
•  Stores metadata directly in Kudu. Any data fixes are made directly in Kudu.
•  Spark Streaming updates Kudu in real time to support quick analytics.
•  A Spark job reads the raw events, sessionizes them, and updates Kudu.
•  BI tools like Zoomdata work directly with Impala or Kudu to enable analytics.
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113

Large Cable Company - New Architecture
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113

Kudu+Impala vs MPP DWH
Commonalities
✓ Fast analytic queries via SQL, including most commonly used modern features
✓ Ability to insert, update, and delete data
Differences
✓ Faster streaming inserts
✓ Improved Hadoop integration
   •  JOIN between HDFS + Kudu tables, run on same cluster
   •  Spark, Flume, other integrations
✗ Slower batch inserts
✗ No transactional data loading, multi-row transactions, or indexing

How it works
Replication and fault tolerance

Tables, tablets, tablet servers and masters
•  Each table is horizontally partitioned into tablets
   •  Range or hash partitioning
   •  PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
•  Each tablet has N replicas (3 or 5) with Raft consensus
   •  Automatic fault tolerance
   •  MTTR: ~5 seconds
•  Tablet servers host tablets on local disk drives
•  Master services metadata operations
   •  Create/drop tables and tablets
   •  Locate tablets
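Raft-style consensus accepts a write once a majority of a tablet's replicas acknowledge it, so the replica counts above translate directly into how many replica failures a tablet can survive. The sketch below works through that arithmetic; the function names are illustrative and are not part of any Kudu API.

```python
def raft_majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be lost while a majority still remains."""
    return replicas - raft_majority(replicas)


# Replica counts mentioned on the slide: 3 or 5.
summary = {n: (raft_majority(n), tolerated_failures(n)) for n in (3, 5)}
print(summary)
```

With 3 replicas a tablet stays available after losing 1 replica; with 5 replicas it tolerates 2.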

How it works
Columnar storage

Columnar storage
Each column is stored as its own contiguous set of values:
Tweet_id    {25059873, 22309487, 23059861, 23010982}
User_name   {newsycbot, RideImpala, fastly, llvmorg}
Created_at  {1442865158, 1442828307, 1442865156, 1442865155}
text        {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}

Columnar storage
SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
Only read 1 column

Column   Tweet_id   User_name   Created_at   text
Size     1GB        2GB         1GB          200GB
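The saving from reading a single column can be worked out from the sizes on the slide. The sketch below assumes the four sizes are listed in column order (Tweet_id, User_name, Created_at, text) and that a row-oriented layout would have to read every column; gb_scanned is a hypothetical helper.

```python
# Column sizes from the slide, assuming they are listed in column order.
column_sizes_gb = {"tweet_id": 1, "user_name": 2, "created_at": 1, "text": 200}


def gb_scanned(columns_needed, sizes, columnar=True):
    """GB read from disk: only the needed columns if columnar, else all of them."""
    if columnar:
        return sum(sizes[c] for c in columns_needed)
    return sum(sizes.values())


needed = ["user_name"]  # the query filters on user_name and counts rows
columnar_gb = gb_scanned(needed, column_sizes_gb, columnar=True)
row_store_gb = gb_scanned(needed, column_sizes_gb, columnar=False)
print(columnar_gb, row_store_gb)
```

Under these assumptions the query reads 2 GB instead of 204 GB, because the large text column is never touched.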

Columnar compression
Created_at     Diff(created_at)
1442825158     n/a
1442826100     942
1442827994     1894
1442828527     533
64 bits each   11 bits each

•  Many columns can compress to a few bits per row!
•  Especially:
   •  Timestamps
   •  Time series values
   •  Low-cardinality strings
•  Massive space savings and throughput increase!
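The table above can be reproduced with a few lines of code. This is a simplified illustration of the delta idea using the slide's timestamps, not Kudu's actual on-disk encoding; delta_encode and delta_decode are hypothetical helpers, and the input is assumed to be sorted so that all differences are non-negative.

```python
timestamps = [1442825158, 1442826100, 1442827994, 1442828527]


def delta_encode(values):
    """Return the first value and the list of successive differences."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original values by accumulating the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


first, deltas = delta_encode(timestamps)
# Bits needed to store the largest difference as an unsigned integer.
bits_per_delta = max(d.bit_length() for d in deltas)
print(deltas, bits_per_delta)
```

The largest difference, 1894, fits in 11 bits, which is where the "11 bits each" figure comes from, compared with 64 bits for each raw timestamp.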

Representing time series in Kudu

What is time series?
Data that can be usefully partitioned and queried based on time

Examples:
•  Web user activity data (view and click data, tweets, likes)
•  Machine metrics (CPU utilization, free memory, requests/sec)
•  Patient data (blood pressure readings, weight changes over time)
•  Financial data (stock transactions, price fluctuations)

Kudu & time series data
Real-time data ingestion + fast scans =
Ideal platform for storing and querying time series data

•  Support for many column encodings and compression schemes
   •  Encodings: Plain, dictionary, bitshuffle, run length, prefix
   •  Compression: LZ4, Snappy, zlib
•  Kudu supports a flexible range of partitioning schemes
   •  Partition by time range, hash, or both
•  Parallelizable scans
•  Scale-out storage system

Partitioning by time range + series hash

Partitioning by time range + series hash (inserts)
Inserts are spread among all partitions of the time range

Partitioning by time range + series hash (scans)
Big scans (across time intervals) can be parallelized across partitions
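The combined scheme can be sketched as a function from a row to a (time range, hash bucket) pair. The bucket count, the one-day range width, the series names, and the use of CRC32 as the hash are all assumptions chosen for illustration; Kudu's real hash function and partition layout differ.

```python
import zlib
from collections import Counter

HASH_BUCKETS = 4        # assumed number of hash buckets per time range
RANGE_SECONDS = 86400   # assumed range width: one day


def partition_for(series: str, timestamp: int):
    """Map a row to (time-range index, hash bucket of the series name)."""
    time_range = timestamp // RANGE_SECONDS
    bucket = zlib.crc32(series.encode()) % HASH_BUCKETS  # stand-in hash, not Kudu's
    return time_range, bucket


# Inserts arriving "now" for many series share one time range
# but are spread over that range's hash buckets.
now = 1442865158
inserts = Counter(partition_for(f"host{i}.cpu", now) for i in range(1000))

# A scan of one series over three days touches one partition per day,
# so the three pieces can be read in parallel.
scan_parts = {partition_for("host7.cpu", now - d * RANGE_SECONDS) for d in range(3)}
print(len(inserts), sorted(scan_parts))
```

Hashing on the series spreads current writes across several tablets instead of a single "latest" tablet, while the time-range component keeps old data grouped so it can be scanned or dropped by range.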

Dynamic partition management
•  Allows for dropping and adding partitions on live tables
•  Efficiently remove ranges of (typically old) data using an admin tool

Integrations

Impala integration
•  CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS
   AS SELECT … FROM …
•  INSERT / UPDATE / DELETE
•  Optimizations: predicate pushdown, scan locality, scan parallelism
•  More optimizations on the way
•  Not an Impala user? The community is working on other integrations (Hive, Drill, Presto, etc.)

Spark DataSource integration
// Import the Kudu datasource (the package is org.apache.kudu as of Kudu 1.0;
// earlier releases used org.kududb)
import org.apache.kudu.spark.kudu._
val kuduDataFrame = sqlContext.read.options(
  Map("kudu.master" -> "master.address.example.com", "kudu.table" -> "my_table_name")).kudu
// Then query using the Spark API, or register a temporary table and use Spark SQL.
// The filter condition is passed as a SQL expression string.
kuduDataFrame.select("id").filter("id >= 5").show()
// (prints the selection to the console)
// Register kuduDataFrame as a temporary table for Spark SQL
kuduDataFrame.registerTempTable("kudu_table")
// Select from the temporary table
sqlContext.sql("select id from kudu_table where id >= 5").show()
// (prints the SQL results to the console)

MapReduce integration
•  Multi-framework cluster (MR + HDFS + Kudu on the same disks)
•  KuduTableInputFormat / KuduTableOutputFormat
•  Support for pushing down predicates, column projections, etc.
•  Lots of Kudu integration / correctness testing done via MapReduce

Flume integration
•  Basic Flume sink, similar to the Flume HBaseSink
•  Write a simple EventProducer plugin to transform from your event format to Kudu Insert objects
•  Then deploy with a Flume config file like the following:

agent.sinks.kudu.type = org.apache.kudu.flume.sink.KuduSink
agent.sinks.kudu.masterAddresses = kudu01.example.com
agent.sinks.kudu.tableName = my-table
agent.sinks.kudu.producer = MyEventProducer

Performance

TPC-H (analytics benchmark)
•  75-server cluster
   •  12 (spinning) disks each, enough RAM to fit dataset
•  TPC-H Scale Factor 100 (100 GB)
•  Example SQL query (via Impala):
   SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
   FROM customer, orders, lineitem, supplier, nation, region
   WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey
     AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey
     AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
     AND r_name = 'ASIA'
     AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01'
   GROUP BY n_name ORDER BY revenue DESC;

TPC-H results: Kudu vs Parquet
•  Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data

TPC-H results: Kudu vs other NoSQL storage
Apache Phoenix: OLTP SQL engine built on HBase
•  10-node cluster (9 workers, 1 master)
•  TPC-H LINEITEM table only (6B rows)

What about NoSQL-style random access? (YCSB)
•  YCSB 0.5.0-snapshot
•  10-node cluster (9 workers, 1 master)
•  100M-row data set
•  10M operations per workload

Getting started with Kudu

Getting started as a user
•  On the web: kudu.apache.org
•  User mailing list: user@kudu.apache.org
•  Slack chat channel (see web site)

•  Quickstart VM
   •  Easiest way to get started
   •  Impala and Kudu in an easy-to-install VM
•  CSD and Parcels
   •  For installation on a Cloudera Manager-managed cluster

Getting started as a developer
•  Source code: github.com/apache/kudu
   •  All commits go here first
•  Code reviews: gerrit.cloudera.org
   •  All code reviews are public
•  Public JIRA: issues.apache.org/jira/browse/KUDU
   •  Includes bugs going back to 2013
•  Developer mailing list: dev@kudu.apache.org

•  Apache 2.0 license open source and an ASF project
•  Contributions welcome and encouraged!

Project status
•  First open source beta released in September 2015.
•  Kudu 1.0.0 was released in September 2016.
•  Kudu 1.3.1 was released last week.
•  Kerberos authentication, TLS encryption, and coarse-grained (cluster-level) authorization
•  Many production customers
•  Users testing up to 400+ nodes so far.
•  Kudu is a top-level project (TLP) at the Apache Software Foundation
•  Community-driven open source process.

Apache Kudu Community

kudu.apache.org
@ApacheKudu
