Introduction to Apache Kudu

© Cloudera, Inc. All rights reserved.
Intro to Apache Kudu
Hadoop storage for fast analytics on fast data

Shravan (Sean) Pabba | Systems Engineer, Cloudera | @skpabba

Apache Kudu
Storage for fast (low latency) analytics on fast (high throughput) data
•  Simplifies the architecture for building analytic applications on changing data

•  Optimized for fast analytic performance

•  Natively integrated with the Hadoop ecosystem of components
[Diagram: the Hadoop stack. Storage layer: HDFS (filesystem), HBase (NoSQL), and Kudu (columnar store). Data integration and storage: ingest via Sqoop, Flume, Kafka. Unified data services: security (Sentry) and resource management (YARN). Workloads: batch (Spark, Hive, Pig), stream (Spark), SQL (Impala), search (Solr), model (Spark), online (HBase), serving data engineering, data discovery and analytics, and data apps.]

Why Kudu?

Previous Hadoop storage landscape
HDFS (GFS) excels at:
•  Batch ingest only (e.g. hourly)
•  Efficiently scanning large amounts of data (analytics)
HBase (BigTable) excels at:
•  Efficiently finding and writing individual rows
•  Making data mutable

Gaps exist when these properties are needed simultaneously

Kudu design goals

•  High throughput for big scans
Goal: Within 2x of Parquet

•  Low latency for short accesses
Goal: 1ms read/write on SSD

•  Database-like semantics
Initially, single-row atomicity

•  Relational data model
•  SQL queries should be natural and easy
•  Include NoSQL-style scan, insert, and update APIs

Changing hardware landscape
•  Spinning disk -> solid state storage
•  NAND Flash: Up to 450k read and 250k write IOPS, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping
•  Intel Optane/3D XPoint memory (1000x faster than Flash, cheaper than RAM)

•  RAM is cheaper and more abundant:
•  64->128->256GB over last few years

•  Takeaway: The next performance bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind

Apache Kudu: Scalable and fast structured storage
Scalable
•  Tested up to 400+ nodes (~3PB cluster)
•  Designed to scale to 1000s of nodes and tens of PBs
Fast
•  Millions of read/write operations per second across cluster
•  Multiple GB/second read throughput per node
Tables
•  Represents data in structured tables like a normal database
•  Individual record-level access to 100+ billion row tables

Storing records in Kudu tables
•  A Kudu table has a SQL-like schema
•  And a finite number of columns (unlike HBase/Cassandra)
•  Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
•  Some subset of columns makes up a possibly-composite primary key
•  Fast ALTER TABLE
•  Java, Python, and C++ NoSQL-style APIs
•  Insert(), Update(), Delete(), Scan()
•  SQL via integrations with Impala and Spark
•  Community work in progress / experimental: Drill, Hive

Use cases

Kudu use cases
Kudu is best for use cases requiring:
• Simultaneous combination of sequential and random reads and writes
• Minimal to zero data latencies

Time series
• Examples: Streaming market data; fraud detection & prevention; network monitoring
• Workload: Inserts, updates, scans, lookups

Online reporting / data warehousing
• Example: Operational data store (ODS)
• Workload: Inserts, updates, scans, lookups

“Traditional” real-time analytics in Hadoop
Fraud detection in the real world = storage complexity
Considerations:
•  How do I handle failure during this process?
•  How often do I reorganize incoming streaming data into a format appropriate for reporting?
•  When reporting, how do I see data that has not yet been reorganized?
•  How do I ensure that important jobs aren't interrupted by maintenance?
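[Diagram: Incoming data from Kafka lands in HBase as the new partition. A check asks "Have we accumulated enough data?"; if so, the HBase file is reorganized into a Parquet file:
•  Wait for running operations to complete
•  Define new Impala partition referencing the newly written Parquet file
The most recent partition and the historical data are Parquet files. A reporting request reads across the partitions. Storage in HDFS.]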

Real-time analytics in Hadoop with Kudu
Improvements:
•  One system to operate
•  No cron jobs or background processes
•  Handle late arrivals or data corrections with ease
•  New data available immediately for analytics or operations
[Diagram: Incoming data (e.g. Kafka) is written to Kudu, which holds both historical and real-time data and serves reporting requests directly. Storage in Kudu.]

Large Cable Company - Old Architecture
Source: https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56113

Challenges
• Rebuilding entire datasets or partitions, by regenerating compressed CSV files and loading them into HDFS to keep data current, took several hours or days.
• Rebuild operations consumed cluster capacity, limiting availability to other teams in a shared cluster.
• No way to update a single row in the dataset without recreating the table or using a slower, complicated integration with HBase.

Large Cable Company - New Architecture
• Stores Tune Events in Kudu. Any data fixes are made directly in Kudu.
• Stores Metadata directly in Kudu. Any data fixes are made directly in Kudu.
• Spark Streaming updates Kudu in real time to support quick analytics.
• A Spark job reads the raw events, sessionizes them, and updates Kudu.
• BI tools like Zoomdata work directly with Impala or Kudu to enable analytics.


Kudu+Impala vs MPP DWH
Commonalities
✓ Fast analytic queries via SQL, including most commonly used modern features
✓ Ability to insert, update, and delete data
Differences
✓ Faster streaming inserts
✓ Improved Hadoop integration
• JOIN between HDFS + Kudu tables, run on same cluster
• Spark, Flume, other integrations
✗ Slower batch inserts
✗ No transactional data loading, multi-row transactions, or indexing

How it works
Replication and fault tolerance

Tables, tablets, tablet servers and masters
•  Each table is horizontally partitioned into tablets
•  Range or hash partitioning
• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
•  Each tablet has N replicas (3 or 5) with Raft consensus
•  Automatic fault tolerance
•  MTTR: ~5 seconds
•  Tablet servers host tablets on local disk drives
•  Master services metadata operations
•  Create/drop tables and tablets
•  Locate tablets

How it works
Columnar storage

Columnar storage
Tweet_id: {25059873, 22309487, 23059861, 23010982}
User_name: {newsycbot, RideImpala, fastly, llvmorg}
Created_at: {1442865158, 1442828307, 1442865156, 1442865155}
text: {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}

Columnar storage
SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
Only read 1 column
Column sizes: Tweet_id 1GB, User_name 2GB, Created_at 1GB, text 200GB
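
A minimal Python sketch of why the query above touches only one column. It assumes a toy in-memory layout with one list per column, using the sample values from the slide; this illustrates the idea and is not how Kudu stores data on disk. The helper name count_where_equals is hypothetical.

```python
# Column-oriented layout: each column's values are stored together.
columns = {
    "tweet_id": [25059873, 22309487, 23059861, 23010982],
    "user_name": ["newsycbot", "RideImpala", "fastly", "llvmorg"],
    "created_at": [1442865158, 1442828307, 1442865156, 1442865155],
    "text": ["Visual exp…", "Introducing ..", "Missing July…", "LLVM 3.7…."],
}


def count_where_equals(table, column, value):
    """Count matching rows while reading only the named column."""
    return sum(1 for v in table[column] if v == value)


# Equivalent of: SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
matches = count_where_equals(columns, "user_name", "newsycbot")
print(matches)
```

The large text column is never read, which is the saving the slide points at.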

Columnar compression
Created_at: {1442825158, 1442826100, 1442827994, 1442828527}

Created_at      Diff(created_at)
1442825158      n/a
1442826100      942
1442827994      1894
1442828527      533
64 bits each    11 bits each

•  Many columns can compress to a few bits per row!
•  Especially:
•  Timestamps
•  Time series values
•  Low-cardinality strings

•  Massive space savings and throughput increase!
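
The "11 bits each" figure can be reproduced with a small delta-encoding sketch. This is an illustration of the idea on the slide's sample values, assuming non-decreasing timestamps; it is not Kudu's actual encoder, and the function names are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each value as a difference from its predecessor."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original values from the first value and the differences."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


created_at = [1442825158, 1442826100, 1442827994, 1442828527]
first, deltas = delta_encode(created_at)

# Bits needed to hold the largest difference (assumes non-negative deltas).
bits_per_delta = max(d.bit_length() for d in deltas)
print(deltas, bits_per_delta)
```

The largest difference, 1894, fits in 11 bits, compared with 64 bits for each raw timestamp.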

Representing time series in Kudu

What is time series?
Data that can be usefully partitioned and queried based on time

Examples:
•  Web user activity data (view and click data, tweets, likes)
•  Machine metrics (CPU utilization, free memory, requests/sec)
•  Patient data (blood pressure readings, weight changes over time)
•  Financial data (stock transactions, price fluctuations)

Kudu & time series data
Real-time data ingestion + fast scans =
Ideal platform for storing and querying time series data

•  Support for many column encodings and compression schemes
•  Encodings: Plain, dictionary, bitshuffle, Run Length, Prefix
•  Compression: LZ4, gzip, bzip2
•  Kudu supports a flexible range of partitioning schemes
•  Partition by time range, hash, or both
•  Parallelizable scans
•  Scale-out storage system

Partitioning by time range + series hash

Partitioning by time range + series hash (inserts)
Inserts are spread among all partitions of the time range

Partitioning by time range + series hash (scans)
Big scans (across time intervals) can be parallelized across partitions
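
The combined scheme can be sketched as a function that maps a row to a (time range, hash bucket) pair. Everything here is an assumption for illustration: the range boundaries, the bucket count, and the use of CRC32 as the hash are hypothetical and do not reflect Kudu's internal hash function or API.

```python
import zlib
from bisect import bisect_right

# Hypothetical layout: two split points give three time ranges,
# and each range is divided into four hash buckets on the series name.
RANGE_BOUNDS = [1442800000, 1442900000]
HASH_BUCKETS = 4


def partition_for(series: str, timestamp: int):
    """Return (range index, hash bucket) for one row."""
    range_idx = bisect_right(RANGE_BOUNDS, timestamp)
    bucket = zlib.crc32(series.encode("utf-8")) % HASH_BUCKETS
    return range_idx, bucket


# Rows arriving at about the same time share a range but can land in
# different buckets, depending on the series name.
for name in ("cpu", "mem", "disk", "net"):
    print(name, partition_for(name, 1442850000))
```

Inserts for the current time range are therefore spread over several partitions by series, while a scan over a long time interval can read the partitions of each range in parallel.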

Dynamic partition management
•  Allows for dropping and adding partitions on live tables
•  Efficiently remove ranges of (typically old) data using an admin tool

Integrations

Impala integration
• CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS AS SELECT … FROM …
• INSERT / UPDATE / DELETE
• Optimizations: predicate pushdown, scan locality, scan parallelism
• More optimizations on the way
• Not an Impala user? Community working on other integrations (Hive, Drill, Presto, etc)

Spark DataSource integration
// Import the Kudu datasource (Kudu 1.0+ renamed this package to org.apache.kudu.spark.kudu)
import org.kududb.spark.kudu._
val kuduDataFrame = sqlContext.read.options(
  Map("kudu.master" -> "master.address.example.com", "kudu.table" -> "my_table_name")).kudu
// Then query using the Spark API, or register a temporary table and use Spark SQL.
// The filter condition is passed as a SQL expression string.
kuduDataFrame.select("id").filter("id >= 5").show()
// (prints the selection to the console)
// Register kuduDataFrame as a temporary table for Spark SQL
kuduDataFrame.registerTempTable("kudu_table")
// Select from the temporary table
sqlContext.sql("select id from kudu_table where id >= 5").show()
// (prints the SQL results to the console)

MapReduce integration
•  Multi-framework cluster (MR + HDFS + Kudu on the same disks)
•  KuduTableInputFormat / KuduTableOutputFormat
•  Support for pushing down predicates, column projections, etc.
•  Lots of Kudu integration / correctness testing done via MapReduce

Flume integration
• Basic Flume sink, similar to the Flume HBaseSink
• Write a simple EventProducer plugin to transform from your event format to Kudu Insert objects
• Then deploy with a Flume config file like the following:

agent.sinks.kudu.type = org.kududb.flume.sink.KuduSink
agent.sinks.kudu.masterAddresses = kudu01.example.com
agent.sinks.kudu.tableName = my-table
agent.sinks.kudu.producer = MyEventProducer

Performance

TPC-H (analytics benchmark)
•  75 server cluster
•  12 (spinning) disks each, enough RAM to fit dataset
•  TPC-H Scale Factor 100 (100GB)
•  Example SQL query (via Impala):
SELECT n_name, sum(l_extendedprice * (1 - l_discount)) AS revenue
FROM customer, orders, lineitem, supplier, nation, region
WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey AND l_suppkey = s_suppkey
  AND c_nationkey = s_nationkey AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
  AND r_name = 'ASIA' AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01'
GROUP BY n_name ORDER BY revenue DESC;

TPC-H results: Kudu vs Parquet
•  Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data

TPC-H results: Kudu vs other NoSQL storage
Apache Phoenix: OLTP SQL engine built on HBase
•  10 node cluster (9 workers, 1 master)
•  TPC-H LINEITEM table only (6B rows)

What about NoSQL-style random access? (YCSB)

•  YCSB 0.5.0-snapshot
•  10 node cluster (9 workers, 1 master)
•  100M row data set
•  10M operations each workload

Getting started with Kudu

Getting started as a user
•  On the web: kudu.apache.org
•  User mailing list: user@kudu.apache.org
•  Slack chat channel (see web site)

•  Quickstart VM
•  Easiest way to get started
•  Impala and Kudu in an easy-to-install VM
•  CSD and Parcels
•  For installation on a Cloudera Manager-managed cluster

Getting started as a developer
•  Source code: github.com/apache/kudu
• All commits go here first
•  Code reviews: gerrit.cloudera.org
• All code reviews are public
•  Public JIRA: issues.apache.org/jira/browse/KUDU
•  Includes bugs going back to 2013
•  Developer mailing list: dev@kudu.apache.org

•  Apache 2.0 license open source and an ASF project
•  Contributions welcome and encouraged!

Project status
•  First open source beta released in September 2015.
•  Kudu 1.0.0 released in September 2016.
•  Kudu 1.3.1 was released last week.
•  Kerberos authentication, TLS encryption, and coarse-grained (cluster-level) authorization
•  Many production customers
•  Users testing up to 400+ nodes so far.
•  Kudu is a top-level project (TLP) at the Apache Software Foundation
•  Community-driven open source process.

Apache Kudu Community

kudu.apache.org
@ApacheKudu
