1	©	Cloudera,	Inc.	All	rights	reserved.	
Intro	to	Apache	Kudu		
Hadoop storage for fast analytics on fast data
Shravan	(Sean)	Pabba		|	Systems	Engineer,	Cloudera	|	@skpabba
2	©	Cloudera,	Inc.	All	rights	reserved.	
Apache	Kudu	
Storage	for	fast	(low	latency)	analy=cs	on	fast	(high	throughput)	data	
•  Simplifies	the	architecture	for	building	
analy=c	applica=ons	on	changing	data	
	
•  Op=mized	for	fast	analy=c	performance	
	
•  Na=vely	integrated	with	the	Hadoop	
ecosystem	of	components	
[Diagram: Kudu in the Hadoop platform stack. Data integration & storage: HDFS (filesystem), HBase (NoSQL), Kudu (columnar store), with ingest via Sqoop, Flume, Kafka. Unified data services: security (Sentry) and resource management (YARN). Engines: batch (Spark, Hive, Pig), stream (Spark), SQL (Impala), search (Solr), model (Spark), online (HBase), grouped under data engineering, data discovery & analytics, and data apps.]
Why	Kudu?
Previous Hadoop storage landscape
HDFS (GFS) excels at:
•  Batch ingest only (e.g. hourly)
•  Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
•  Efficiently finding and writing individual rows
•  Making data mutable

Gaps exist when these properties are needed simultaneously
Kudu design goals
•  High throughput for big scans
Goal: Within 2x of Parquet

•  Low latency for short accesses
Goal: 1ms read/write on SSD

•  Database-like semantics
Initially, single-row atomicity

•  Relational data model
•  SQL queries should be natural and easy
•  Include NoSQL-style scan, insert, and update APIs
Changing hardware landscape
•  Spinning disk -> solid state storage
•  NAND Flash: Up to 450k read and 250k write IOPS, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping
•  Intel Optane/3D XPoint memory (1000x faster than Flash, cheaper than RAM)

•  RAM is cheaper and more abundant:
•  64->128->256GB over last few years

•  Takeaway: The next performance bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind
Apache Kudu: Scalable and fast structured storage
Scalable
•  Tested up to 400+ nodes (~3PB cluster)
•  Designed to scale to 1000s of nodes and tens of PBs
Fast
•  Millions of read/write operations per second across cluster
•  Multiple GB/second read throughput per node
Tables
•  Represents data in structured tables like a normal database
•  Individual record-level access to 100+ billion row tables
Storing records in Kudu tables
•  A Kudu table has a SQL-like schema
•  And a finite number of columns (unlike HBase/Cassandra)
•  Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
•  Some subset of columns makes up a possibly-composite primary key
•  Fast ALTER TABLE
•  Java, Python, and C++ NoSQL-style APIs
•  Insert(), Update(), Delete(), Scan()
•  SQL via integrations with Impala and Spark
•  Community work in progress / experimental: Drill, Hive
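The table model above (fixed schema, composite primary key, insert/update/delete/scan operations) can be illustrated with a small in-memory sketch. This is not the Kudu client API: `ToyTable` and its method names are hypothetical and only mimic the semantics the slide lists.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple


class ToyTable:
    """Hypothetical in-memory stand-in for a table with a fixed schema
    and a possibly-composite primary key. Not the real Kudu client."""

    def __init__(self, columns: List[str], primary_key: List[str]) -> None:
        missing = [c for c in primary_key if c not in columns]
        if missing:
            raise ValueError(f"primary key columns not in schema: {missing}")
        self.columns = columns
        self.primary_key = primary_key
        self._rows: Dict[Tuple, dict] = {}

    def _key(self, row: dict) -> Tuple:
        # The primary key is the tuple of key-column values, in schema order.
        return tuple(row[c] for c in self.primary_key)

    def insert(self, row: dict) -> None:
        # Unlike a schemaless store, every row must match the declared columns.
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the schema's columns")
        key = self._key(row)
        if key in self._rows:
            raise KeyError(f"duplicate primary key: {key}")
        self._rows[key] = dict(row)

    def update(self, row: dict) -> None:
        if not set(row) <= set(self.columns):
            raise ValueError("unknown column in update")
        key = self._key(row)
        if key not in self._rows:
            raise KeyError(f"no row with primary key: {key}")
        self._rows[key].update(row)

    def delete(self, key_values: dict) -> None:
        del self._rows[self._key(key_values)]

    def scan(
        self,
        columns: Optional[List[str]] = None,
        predicate: Optional[Callable[[dict], bool]] = None,
    ) -> Iterator[Tuple]:
        # Rows come back in primary-key order, projected to the requested columns.
        cols = columns or self.columns
        for key in sorted(self._rows):
            row = self._rows[key]
            if predicate is None or predicate(row):
                yield tuple(row[c] for c in cols)


metrics = ToyTable(["host", "metric", "timestamp", "value"],
                   ["host", "metric", "timestamp"])
metrics.insert({"host": "web01", "metric": "cpu", "timestamp": 1442865155, "value": 0.42})
metrics.insert({"host": "web01", "metric": "cpu", "timestamp": 1442865156, "value": 0.50})
metrics.update({"host": "web01", "metric": "cpu", "timestamp": 1442865156, "value": 0.55})
metrics.delete({"host": "web01", "metric": "cpu", "timestamp": 1442865155})
print(list(metrics.scan(columns=["timestamp", "value"])))
```

The sketch keeps rows in a dictionary keyed by the primary-key tuple, which is why a second insert with the same key is rejected and why update and delete need only the key columns to locate a row.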
Use	cases
Kudu use cases
Kudu is best for use cases requiring:
• Simultaneous combination of sequential and random reads and writes
• Minimal to zero data latencies

Time series
• Examples: Streaming market data; fraud detection & prevention; network monitoring
• Workload: Inserts, updates, scans, lookups

Online reporting / data warehousing
• Example: Operational data store (ODS)
• Workload: Inserts, updates, scans, lookups
"Traditional" real-time analytics in Hadoop
Fraud detection in the real world = storage complexity
Considerations:
•  How do I handle failure during this process?
•  How often do I reorganize incoming streaming data into a format appropriate for reporting?
•  When reporting, how do I see data that has not yet been reorganized?
•  How do I ensure that important jobs aren't interrupted by maintenance?
[Diagram: Storage in HDFS. Incoming data from Kafka lands in HBase (new partition / most recent partition). Decision: "Have we accumulated enough data?" If so, reorganize the HBase file into a Parquet file: wait for running operations to complete, then define a new Impala partition referencing the newly written Parquet file. Reporting requests read the most recent partition together with historical data.]
Real-time analytics in Hadoop with Kudu
Improvements:
•  One system to operate
•  No cron jobs or background processes
•  Handle late arrivals or data corrections with ease
•  New data available immediately for analytics or operations
[Diagram: Storage in Kudu. Incoming data (e.g. Kafka) is written to Kudu, which holds historical and real-time data; reporting requests read from Kudu.]
Large	Cable	Company	-	Old	Architecture	
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113
Challenges
• Rebuilding entire datasets or partitions, by regenerating compressed CSV files and loading them into HDFS to keep data current, took several hours or days.
• Rebuild operations consumed cluster capacity, limiting availability to other teams in a shared cluster.
• No way to update a single row in the dataset without recreating the table or using a slower, more complicated integration with HBase.
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113
Large Cable Company - New Architecture
• Stores Tune Events into Kudu. Any data fixes are made directly in Kudu.
• Stores Metadata directly into Kudu. Any data fixes are made directly in Kudu.
• Spark Streaming updates Kudu on a real-time basis to support quick analytics.
• Spark Job reads the raw events, sessionizes and updates Kudu.
• BI tools like Zoomdata directly work with Impala or Kudu to enable analytics.
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113
Large	Cable	Company	-	New	Architecture	
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113
Kudu+Impala vs MPP DWH
Commonalities
✓ Fast analytic queries via SQL, including most commonly used modern features
✓ Ability to insert, update, and delete data
Differences
✓ Faster streaming inserts
✓ Improved Hadoop integration
  • JOIN between HDFS + Kudu tables, run on same cluster
  • Spark, Flume, other integrations
✗ Slower batch inserts
✗ No transactional data loading, multi-row transactions, or indexing
How it works
Replication and fault tolerance
Tables, tablets, tablet servers and masters
•  Each table is horizontally partitioned into tablets
•  Range or hash partitioning
• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
•  Each tablet has N replicas (3 or 5) with Raft consensus
•  Automatic fault tolerance
•  MTTR: ~5 seconds
•  Tablet servers host tablets on local disk drives
•  Master services metadata operations
•  Create/drop tables and tablets
•  Locate tablets
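As a rough illustration of the two ideas above, the sketch below maps a hash-partition column value to one of N buckets and computes how many replica failures a majority-based protocol such as Raft can tolerate. CRC32 is only a stand-in hash chosen for this example (the slides do not say which hash function Kudu uses), and both function names are hypothetical.

```python
import zlib


def bucket_for(key_value, num_buckets: int) -> int:
    """Map a hash-partition column value to a bucket (one tablet per bucket).

    CRC32 is an assumption made for this sketch, not Kudu's actual hash.
    """
    digest = zlib.crc32(repr(key_value).encode("utf-8"))
    return digest % num_buckets


def tolerated_failures(num_replicas: int) -> int:
    """Replica failures survivable while a majority is still reachable."""
    majority = num_replicas // 2 + 1
    return num_replicas - majority


# Rows hashed on timestamp into 100 buckets, as in the DISTRIBUTE BY example.
for ts in (1442865155, 1442865156, 1442865158):
    print(ts, "-> bucket", bucket_for(ts, 100))

print("3 replicas tolerate", tolerated_failures(3), "failure")
print("5 replicas tolerate", tolerated_failures(5), "failures")
```

The same key always lands in the same bucket, which is what lets a client route a write to the right tablet once it knows the tablet locations.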
How	it	works	
Columnar	storage
Columnar storage
Tweet_id (1GB): {25059873, 22309487, 23059861, 23010982}
User_name (2GB): {newsycbot, RideImpala, fastly, llvmorg}
Created_at (1GB): {1442865158, 1442828307, 1442865156, 1442865155}
text (200GB): {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}
SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
Only read 1 column
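A minimal sketch of why the query above reads only one column: with each column stored separately, a filter on user_name never touches the other three. The per-column sizes are taken from the slide, on the assumption that they are listed in the same order as the columns; the function names are hypothetical.

```python
# Each column is stored as its own array (values from the slide).
columns = {
    "tweet_id": [25059873, 22309487, 23059861, 23010982],
    "user_name": ["newsycbot", "RideImpala", "fastly", "llvmorg"],
    "created_at": [1442865158, 1442828307, 1442865156, 1442865155],
    "text": ["Visual exp…", "Introducing ..", "Missing July…", "LLVM 3.7…."],
}

# On-disk size per column in GB (assumed to match the column order on the slide).
column_size_gb = {"tweet_id": 1, "user_name": 2, "created_at": 1, "text": 200}


def count_where(cols, column, value):
    """COUNT(*) with an equality filter; returns the count and the columns read."""
    touched = {column}
    matches = sum(1 for v in cols[column] if v == value)
    return matches, touched


def fraction_scanned(touched, sizes):
    """Share of the table's bytes that the touched columns account for."""
    return sum(sizes[c] for c in touched) / sum(sizes.values())


matches, touched = count_where(columns, "user_name", "newsycbot")
print(matches, sorted(touched))
print(f"{fraction_scanned(touched, column_size_gb):.2%} of the table scanned")
```

Under these sizes the query reads 2GB out of 204GB, so roughly 1% of the data; a row-oriented layout would have to read past the 200GB text column as well.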
Columnar compression
Created_at: {1442825158, 1442826100, 1442827994, 1442828527}

Created_at      Diff(created_at)
1442825158      n/a
1442826100      942
1442827994      1894
1442828527      533
64 bits each    11 bits each

•  Many columns can compress to a few bits per row!
•  Especially:
•  Timestamps
•  Time series values
•  Low-cardinality strings

•  Massive space savings and throughput increase!
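The arithmetic behind the "11 bits each" figure can be checked with a short delta-encoding sketch. It assumes non-decreasing values (so every difference is non-negative) and is only an illustration of the idea, not Kudu's actual encoder; the function names are hypothetical.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value and store each later value as a difference."""
    if not values:
        raise ValueError("need at least one value")
    return values[0], [b - a for a, b in zip(values, values[1:])]


def delta_decode(first: int, deltas: List[int]) -> List[int]:
    """Rebuild the original values by running-summing the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def bits_needed(deltas: List[int]) -> int:
    """Bits required to store the largest (non-negative) difference."""
    return max((d.bit_length() for d in deltas), default=0)


created_at = [1442825158, 1442826100, 1442827994, 1442828527]
first, deltas = delta_encode(created_at)
print(deltas)               # [942, 1894, 533]
print(bits_needed(deltas))  # 11
```

The largest difference, 1894, lies between 2^10 and 2^11, so 11 bits per value suffice instead of the 64 bits of a raw timestamp.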
Representing time series in Kudu
What is time series?
Data that can be usefully partitioned and queried based on time

Examples:
•  Web user activity data (view and click data, tweets, likes)
•  Machine metrics (CPU utilization, free memory, requests/sec)
•  Patient data (blood pressure readings, weight changes over time)
•  Financial data (stock transactions, price fluctuations)
Kudu & time series data
Real-time data ingestion + fast scans = ideal platform for storing and querying time series data

•  Support for many column encodings and compression schemes
•  Encodings: Plain, dictionary, bitshuffle, run length, prefix
•  Compression: LZ4, Snappy, zlib
•  Kudu supports a flexible range of partitioning schemes
•  Partition by time range, hash, or both
•  Parallelizable scans
•  Scale-out storage system
Partitioning by time range + series hash
Partitioning by time range + series hash (inserts)
Inserts are spread among all partitions of the time range
Partitioning by time range + series hash (scans)
Big scans (across time intervals) can be parallelized across partitions
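The combined scheme can be sketched as a function from (series, timestamp) to a (time range, hash bucket) pair. The one-day range width, the CRC32 hash, and the function names are all assumptions made for this illustration; the slides do not specify them.

```python
import zlib
from typing import List, Tuple

DAY = 86400  # assumed width of one range partition, in seconds


def partition_for(series: str, timestamp: int, num_hash_buckets: int) -> Tuple[int, int]:
    """Return (time range index, hash bucket) for one row."""
    time_range = timestamp // DAY
    bucket = zlib.crc32(series.encode("utf-8")) % num_hash_buckets
    return time_range, bucket


def partitions_for_scan(start_ts: int, end_ts: int, num_hash_buckets: int) -> List[Tuple[int, int]]:
    """Partitions overlapping [start_ts, end_ts) when scanning every series."""
    first = start_ts // DAY
    last = (end_ts - 1) // DAY
    return [(r, b) for r in range(first, last + 1) for b in range(num_hash_buckets)]


# Inserts arriving "now" share a time range; the series hash picks the bucket.
now = 16700 * DAY + 3600
for series in ("web01.cpu", "web02.cpu", "db01.free_memory"):
    print(series, partition_for(series, now, 4))

# A two-day scan over all series touches 2 ranges x 4 buckets = 8 partitions,
# each of which can be read in parallel.
print(len(partitions_for_scan(16699 * DAY, 16701 * DAY, 4)))
```

Hashing on the series keeps current writes from piling onto a single partition, while the time range keeps a scan over an interval confined to the partitions that cover it.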
Dynamic partition management
•  Allows for dropping and adding partitions on live tables
•  Efficiently remove ranges of (typically old) data using an admin tool
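To show why dropping a range partition is cheaper than deleting its rows one by one, here is a toy model in which each range partition is a separate container. `RangePartitionedTable` and its methods are hypothetical names for this sketch and do not correspond to Kudu's admin tooling or client API.

```python
from typing import Dict, List, Tuple


class RangePartitionedTable:
    """Toy table whose rows live in per-range containers keyed by [lower, upper)."""

    def __init__(self) -> None:
        self._partitions: Dict[Tuple[int, int], List[Tuple[int, str]]] = {}

    def add_range_partition(self, lower: int, upper: int) -> None:
        # New ranges may be added at any time but must not overlap existing ones.
        for lo, hi in self._partitions:
            if lower < hi and lo < upper:
                raise ValueError(f"range [{lower}, {upper}) overlaps [{lo}, {hi})")
        self._partitions[(lower, upper)] = []

    def drop_range_partition(self, lower: int, upper: int) -> int:
        # Dropping discards the whole container at once; no per-row deletes.
        return len(self._partitions.pop((lower, upper)))

    def insert(self, timestamp: int, payload: str) -> None:
        for (lo, hi), rows in self._partitions.items():
            if lo <= timestamp < hi:
                rows.append((timestamp, payload))
                return
        raise KeyError(f"no range partition covers timestamp {timestamp}")

    def row_count(self) -> int:
        return sum(len(rows) for rows in self._partitions.values())


table = RangePartitionedTable()
table.add_range_partition(0, 100)     # old data
table.add_range_partition(100, 200)   # current data
table.insert(5, "old event")
table.insert(150, "recent event")
dropped = table.drop_range_partition(0, 100)
print(dropped, table.row_count())
```

In this sketch, an insert whose timestamp falls outside every existing range is rejected, so adding the next range ahead of time is part of managing the table.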
Integrations
Impala integration
• CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS
AS SELECT … FROM …
• INSERT / UPDATE / DELETE
• Optimizations: predicate pushdown, scan locality, scan parallelism
• More optimizations on the way
• Not an Impala user? Community working on other integrations (Hive, Drill, Presto, etc)
Spark DataSource integration
// Import the Kudu datasource (Kudu 1.x packages live under org.apache.kudu; older beta releases used org.kududb)
import org.apache.kudu.spark.kudu._
val kuduDataFrame = sqlContext.read.options(
  Map("kudu.master" -> "master.address.example.com", "kudu.table" -> "my_table_name")).kudu
// Then query using the Spark API, or register a temporary table and use Spark SQL
// (the filter condition is passed as a SQL expression string)
kuduDataFrame.select("id").filter("id >= 5").show()
// (prints the selection to the console)
// Register kuduDataFrame as a temporary table for spark-sql
kuduDataFrame.registerTempTable("kudu_table")
// Select from the dataframe
sqlContext.sql("select id from kudu_table where id >= 5").show()
// (prints the sql results to the console)
MapReduce integration
• Multi-framework cluster (MR + HDFS + Kudu on the same disks)
• KuduTableInputFormat / KuduTableOutputFormat
• Support for pushing down predicates, column projections, etc.
• Lots of Kudu integration / correctness testing done via MapReduce
Flume integration
• Basic Flume sink, similar to the Flume HBaseSink
• Write a simple EventProducer plugin to transform from your event format to Kudu Insert objects
• Then deploy with a Flume config file like the following:

agent.sinks.kudu.type = org.apache.kudu.flume.sink.KuduSink
agent.sinks.kudu.masterAddresses = kudu01.example.com
agent.sinks.kudu.tableName = my-table
agent.sinks.kudu.producer = MyEventProducer
Performance
TPC-H (analytics benchmark)
•  75 server cluster
•  12 (spinning) disks each, enough RAM to fit dataset
•  TPC-H Scale Factor 100 (100GB)
•  Example SQL query (via Impala):
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey
  AND l_orderkey = o_orderkey
  AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey
  AND s_nationkey = n_nationkey
  AND n_regionkey = r_regionkey
  AND r_name = 'ASIA'
  AND o_orderdate >= date '1994-01-01'
  AND o_orderdate < '1995-01-01'
GROUP BY n_name
ORDER BY revenue DESC;
TPC-H	results:	Kudu	vs	Parquet	
•  Kudu	outperforms	Parquet	by	31%	(geometric	mean)	for	RAM-resident	data
TPC-H	results:	Kudu	vs	other	NoSQL	storage	
Apache	Phoenix:	OLTP	SQL	engine	built	on	HBase	
•  10	node	cluster	(9	workers,	1	master)	
•  TPC-H	LINEITEM	table	only	(6B	rows)
What about NoSQL-style random access? (YCSB)
•  YCSB 0.5.0-snapshot
•  10 node cluster (9 workers, 1 master)
•  100M row data set
•  10M operations each workload
Getting started with Kudu
Getting started as a user
•  On the web: kudu.apache.org
•  User mailing list: user@kudu.apache.org
•  Slack chat channel (see web site)

•  Quickstart VM
•  Easiest way to get started
•  Impala and Kudu in an easy-to-install VM
•  CSD and Parcels
•  For installation on a Cloudera Manager-managed cluster
Getting started as a developer
•  Source code: github.com/apache/kudu
•  All commits go here first
•  Code reviews: gerrit.cloudera.org
•  All code reviews are public
•  Public JIRA: issues.apache.org/jira/browse/KUDU
•  Includes bugs going back to 2013
•  Developer mailing list: dev@kudu.apache.org

•  Apache 2.0 license open source and an ASF project
•  Contributions welcome and encouraged!
Project status
•  First open source beta released in September 2015.
•  Kudu 1.0.0 version released in September 2016.
•  Kudu 1.3.1 version was released last week.
•  Kerberos authentication, TLS encryption, and coarse-grained (cluster-level) authorization
•  Many production customers
•  Users testing up to 400+ nodes so far.
•  Kudu is a top-level project (TLP) at the Apache Software Foundation
•  Community-driven open source process.
Apache	Kudu	Community
kudu.apache.org	
@ApacheKudu

More Related Content

What's hot

Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduJeremy Beard
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data PlatformRakuten Group, Inc.
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Cloudera, Inc.
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architectureMartinStrycek
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Dataconomy Media
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Hadoop / Spark Conference Japan
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache KuduAndriy Zabavskyy
 
Apache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesApache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesNacho García Fernández
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduGrant Henke
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataOfir Manor
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataRyan Bosshart
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataCloudera, Inc.
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Datamichaelguia
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopCloudera, Inc.
 

What's hot (20)

Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
Introducing Kudu
Introducing KuduIntroducing Kudu
Introducing Kudu
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Exponea - Kafka and Hadoop as components of architecture
Exponea  - Kafka and Hadoop as components of architectureExponea  - Kafka and Hadoop as components of architecture
Exponea - Kafka and Hadoop as components of architecture
 
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
Introduction to Kudu: Hadoop Storage for Fast Analytics on Fast Data - Rüdige...
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Apache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architecturesApache Flink & Kudu: a connector to develop Kappa architectures
Apache Flink & Kudu: a connector to develop Kappa architectures
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
Enabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache KuduEnabling the Active Data Warehouse with Apache Kudu
Enabling the Active Data Warehouse with Apache Kudu
 
Interactive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroDataInteractive SQL-on-Hadoop and JethroData
Interactive SQL-on-Hadoop and JethroData
 
Kudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast DataKudu - Fast Analytics on Fast Data
Kudu - Fast Analytics on Fast Data
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 
Kudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast DataKudu: Fast Analytics on Fast Data
Kudu: Fast Analytics on Fast Data
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Impala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for HadoopImpala 2.0 - The Best Analytic Database for Hadoop
Impala 2.0 - The Best Analytic Database for Hadoop
 

Similar to Introduction to Apache Kudu

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016StampedeCon
 
sql on hadoop
sql on hadoop sql on hadoop
sql on hadoop Jianwei Li
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014cdmaxime
 
Apache Spark: Usage and Roadmap in Hadoop
Apache Spark: Usage and Roadmap in HadoopApache Spark: Usage and Roadmap in Hadoop
Apache Spark: Usage and Roadmap in HadoopCloudera Japan
 
Big Data Processing with Hadoop-MapReduce in Cloud Systems
Big Data Processing with Hadoop-MapReduce in Cloud SystemsBig Data Processing with Hadoop-MapReduce in Cloud Systems
Big Data Processing with Hadoop-MapReduce in Cloud SystemsIntellipaat
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHortonworks
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache HadoopHortonworks
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellKhalid Imran
 

Similar to Introduction to Apache Kudu (20)

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
sql on hadoop
sql on hadoop sql on hadoop
sql on hadoop
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Apache Spark: Usage and Roadmap in Hadoop
Apache Spark: Usage and Roadmap in HadoopApache Spark: Usage and Roadmap in Hadoop
Apache Spark: Usage and Roadmap in Hadoop
 
Hortonworks.bdb
Hortonworks.bdbHortonworks.bdb
Hortonworks.bdb
 
Big Data Processing with Hadoop-MapReduce in Cloud Systems
Big Data Processing with Hadoop-MapReduce in Cloud SystemsBig Data Processing with Hadoop-MapReduce in Cloud Systems
Big Data Processing with Hadoop-MapReduce in Cloud Systems
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
How to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDBHow to Use Apache Zeppelin with HWX HDB
How to Use Apache Zeppelin with HWX HDB
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
Predictive Analytics and Machine Learning…with SAS and Apache HadoopPredictive Analytics and Machine Learning…with SAS and Apache Hadoop
Predictive Analytics and Machine Learning …with SAS and Apache Hadoop
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Spark_Part 1
Spark_Part 1Spark_Part 1
Spark_Part 1
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Hadoop white papers
Hadoop white papersHadoop white papers
Hadoop white papers
 
963
963963
963
 
Future of-hadoop-analytics
Future of-hadoop-analyticsFuture of-hadoop-analytics
Future of-hadoop-analytics
 
Big Data Technology Stack : Nutshell
Big Data Technology Stack : NutshellBig Data Technology Stack : Nutshell
Big Data Technology Stack : Nutshell
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
Introduction to Apache Kudu