SlideShare a Scribd company logo
1© Cloudera, Inc. All rights reserved.
Introducing Apache Kudu
(incubating)
Mladen Kovacevic | Solutions Architect
May 2016
Adapted from Jeremy Beard’s original presentation in New York, Dec 2015
2© Cloudera, Inc. All rights reserved.
Presenter
• Mladen Kovacevic
• Solutions Architect at Cloudera (Toronto based)
• 2+ yrs in Big Data
• 3+ yrs building workload optimized systems, exploiting hardware features to
improve software performance and capability (compute, storage, networking)
• 6+ yrs engine development enterprise RDBMS
mladen@cloudera.com
3© Cloudera, Inc. All rights reserved.
Current storage landscape in Hadoop
HDFS (GFS) excels at:
• Scanning large amounts of data
• Accumulating data with high
throughput
HBase (BigTable) excels at:
• Efficiently finding and writing
individual rows
• Making data mutable
Gaps exist when these properties
are needed simultaneously
4© Cloudera, Inc. All rights reserved.
Managing the gap (today)
Code Complexity
• Manage flow and sync of data between HDFS and HBase
Monitoring and Security
• Managing consistent backups, security policies, monitoring and more is hard
Performance
• Significant lag between arrival of HBase data “staging” and time when data is
available for analytics.
5© Cloudera, Inc. All rights reserved.
Hardware landscape is changing
rethink design!
• Spinning disk (HDD) -> solid state storage (SSD)
• NAND flash: Up to 450k read 250k write IOPS, about 2GB/sec read and 1.5GB/sec
write throughput, at a price of less than $3/GB and dropping
• 3D XPoint memory – persistent storage (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
• 64->128->256GB over last few years
• Takeaway 1: The next bottleneck is CPU, and current storage heavy applications weren’t
designed with CPU efficiency in mind
• Takeaway 2: Column stores are feasible for random access
6© Cloudera, Inc. All rights reserved.
Apache Kudu (Incubating)
Storage for Fast Analytics on Fast Data
• New updatable column store for Hadoop
• Simplifies the architecture for building analytic
applications on changing data
• Designed for fast analytic performance
• Natively integrated with Hadoop
• Donated as incubating project at
Apache Software Foundation (November
17, 2015)
• Beta v0.8.0 now available
STRUCTURED
Sqoop
UNSTRUCTURED
Kafka, Flume
PROCESS, ANALYZE, SERVE
UNIFIED SERVICES
RESOURCE MANAGEMENT
YARN
SECURITY
Sentry, RecordService
FILESYSTEM
HDFS
RELATIONAL
Kudu
NoSQL
HBase
STORE
INTEGRATE
BATCH
Spark, Hive, Pig
MapReduce
STREAM
Spark
SQL
Impala
SEARCH
Solr
SDK
Kite
7© Cloudera, Inc. All rights reserved.
• High throughput for big scans (columnar storage and
replication)
Goal: Within 2x of Parquet
• Low-latency for short accesses (primary key indexes
and quorum design)
Goal: 1ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model (tables/schemas)
• Easily integrate with SQL engines (Impala, Drill, Hive)
• “NoSQL” style scan/insert/update (Java, C++, Python
clients)
Kudu design goals
8© Cloudera, Inc. All rights reserved.
What Kudu is not
• Not a SQL interface
• Just the storage layer
• “BYOSQL” – Bring-your-own SQL
• Not a file system
• Data must have tabular structure
• Not an application that runs on HDFS
• An alternative, native Hadoop storage engine
• Not a replacement for HDFS or HBase
• Select the right storage for the right use case
• Cloudera will continue to support and invest in all three
9© Cloudera, Inc. All rights reserved.
What Kudu is
STORAGE SYSTEM
for
TABLES
of
STRUCTURED DATA
No SQL engine, nor query planner, nor
optimizer.
Storage system that lets you get at
your data quickly (random/scan).
Highly scalable, with ability to provide
higher level systems info to exploit its
abilities (ie. locality).
Data logically presented to clients as
rows and finite set of columns
Stored in columnar format
Tables are partitioned across tablets
managed by tablet servers.
Tablets are replicated managed by Raft
consensus.
Strongly typed columns
Finite number of columns
Add/remove columns solely by ALTER
TABLE statements
10© Cloudera, Inc. All rights reserved.
About Kudu
• Apache-licensed open source software
• Structured data model
• Basic construct: tables
• Tables broken down into tablets (roughly equivalent to partitions)
• Architecture supports geographically disparate, active/active systems
• Not the initial design goal
11© Cloudera, Inc. All rights reserved.
Kudu data model
• Tables have a RDBMS-like schema
• Finite number of columns (unlike HBase/Cassandra)
• Types: BOOL, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
• Some subset of columns make up a primary key
• Fast random reads/writes by primary key
• No secondary indexes (yet)
• Columnar layout on disk – Parquet-like
• Lazy materialization
• Encoding and compression options
11
12© Cloudera, Inc. All rights reserved.
Table partitioning
• Hash bucketing
• Distribute records by hash of partition column(s)
• N buckets leads to N tablets
• Range partitioning
• Distribute records by ranges of the partition column(s)
• N split keys leads to N tablets
• Can be a mix for different columns of the primary key
13© Cloudera, Inc. All rights reserved.
Kudu tables are partitioned into “tablets” (regions)
Partitioning based on
primary key (PK)
• Native support for
range partitioning
and/or hash
partitioning (salting)
• Hash example:
PRIMARY KEY
(tweet_id) DISTRIBUTE
BY HASH(tweet_id)
INTO 100 BUCKETS
14© Cloudera, Inc. All rights reserved.
Each tablet is fault tolerant via Raft consensus
• A single Tablet Server can
host many tablets
• Metadata is stored on just
another tablet, but only
Master Server processes
host that tablet
• Raft consensus:
• Strong consistency
• Leader election on failure
• Replication factor 3 or 5 is
typical
15© Cloudera, Inc. All rights reserved.
Consistency model
• Consistency and replication enforced by Raft consensus (similar to Paxos)
• Replication by operation not data
• Single-row transactions now
• Multi-row transactions later
• Geo-distributed replicas will be possible under strict time synchronization
• Techniques drawn from Google Spanner and others
16© Cloudera, Inc. All rights reserved.
Kudu interfaces
• NoSQL-style APIs
• Insert(), Update(), Delete(), Scan()
• Java, C++, limited Python
• Integrations with MapReduce, Spark, and Impala
• No direct access to underlying Kudu tablet files
• Beta does not have authentication, authorization, encryption
17© Cloudera, Inc. All rights reserved.
Impala integration
• Opens up Kudu to JDBC/ODBC clients
• Intuitive way to get data into Kudu
• INSERT INTO kudu_table SELECT * FROM src_table;
• Additional commands
• UPDATE
• DELETE
• Efficient INSERT VALUES
• Runs on the Kudu C++ client
18© Cloudera, Inc. All rights reserved.
Performance characteristics
Very CPU efficient
• Written in modern C++, uses specialized CPU instructions (SIMD), JIT
compilation with LLVM
Latency dependent on storage hardware capabilities
• Expect sub-millisecond response on SSDs and upcoming technologies
No garbage collection allows very large memory footprint with no pauses
Bloom filters reduce the need for many disk accesses
19© Cloudera, Inc. All rights reserved.
Operating Kudu
• Easiest through Cloudera Manager integration
• Separate parcel for now
• Kudu is always compacting
• No minor vs. major compaction
• No compaction latency spikes
• Web UI is full of metrics and logs
20© Cloudera, Inc. All rights reserved.
Cluster layout
• One or multiple masters for failover
• Only one in current beta
• Low CPU and memory impact
• One tablet server per worker node
• Can share disks with HDFS
• One SSD per worker node just for Kudu WAL can speed up writes
• No dependencies on other Hadoop ecosystem components
• But interfacing components like Impala or Spark do
21© Cloudera, Inc. All rights reserved.
Kudu: Cluster metadata management
• Replicated master
• Acts as a tablet directory
• Acts as a catalog (which tables exist, etc)
• Acts as a load balancer (tracks TS liveness, re-replicates under-replicated
tablets)
• Caches all metadata in RAM for high performance
• Under heavy load, 99.99% response times still in microseconds (650us)
• Client configured with master addresses
• Client asks master for tablet locations as needed and caches them locally
22© Cloudera, Inc. All rights reserved.
Real-time analytics in Hadoop today
Merging in new data = storage complexity
Downsides:
● Multiple storage layers
● Latest data is hidden
● Files are messy
● Complex to do updates
without breaking running
queriesNew Partition
Most Recent Partition
Historic Data
HBase
Parquet
File
Have we
accumulated
enough data?
Reorganize
HBase file
into Parquet
• Wait for running operations to complete
• Define new Impala partition referencing
the newly written Parquet file
Incoming Data
(Messaging
System)
Reporting
Request
HDFS + Impala
23© Cloudera, Inc. All rights reserved.
Real-time analytics in Hadoop with Kudu
Improvements:
● One system to operate
● No schedules or
background processes
● Handle late arrivals or data
corrections with ease
● New data available
immediately for analytics
or operations
Historical and Real-time
Data
Incoming Data
(Messaging
System)
Reporting
Request
Kudu + Impala
24© Cloudera, Inc. All rights reserved.
Kudu for data warehousing
• Near real time data visibility
• BI tools can display events that happened seconds earlier
• Excellent for star schemas
• Fast scans of deep fact tables
• Efficient wide fact tables
• Simplified updates of slowly changing dimensions
25© Cloudera, Inc. All rights reserved.
Near real time data warehousing on Kudu
Files
RDBMS
Streams
K
A
F
K
A
K
U
D
U
IMPALA
HUE
BI tools
User
FLUME
SPARK
STREAMING
Simple
Complex
26© Cloudera, Inc. All rights reserved.
Resources
Join the community
http://getkudu.io
kudu-user@googlegroups.com
Download the beta
cloudera.com/downloads
Read the whitepaper
getkudu.io/kudu.pdf
27© Cloudera, Inc. All rights reserved.
Demo
• Load a standard HDFS Parquet table
• Create a Kudu table via Impala
• INSERT/UPDATE/DELETE statement with Impala
• Spark insert into Kudu table sample job
• Reading from Parquet files as input
• Map task to insert each row into Kudu
28© Cloudera, Inc. All rights reserved.
Creating a Kudu table - Impala
Kudu Storage handler
Table name in Impala
does NOT match table
name in Kudu. Kudu is
its own storage layer.
Kudu Master hostname and port
A primary key is mandatory
29© Cloudera, Inc. All rights reserved.
Spark (Scala) code – Java APIDataFrame Row
Kudu Master hostname and port
Kudu table name
Create a client, session and table object
Extract values from the row, strong types
Create an insert object and row
Set the values by type, column name and column valule
Perform the actual insert
Cleanup
30© Cloudera, Inc. All rights reserved.
Kudu with Spark DataFrames
Tie in Kudu with Spark
Map of the Kudu table name, and Kudu master
Table name to refer to within Spark
SparkSQL statement
31© Cloudera, Inc. All rights reserved.
Kudu code examples and docs
https://github.com/cloudera/kudu-examples
http://www.cloudera.com/documentation/betas/kudu/0-7-
0/topics/kudu_development.html
http://getkudu.io/docs/developing.html
32© Cloudera, Inc. All rights reserved.
Thank you
mladen@cloudera.com

More Related Content

What's hot

February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
Mike Percy
 
Apache kudu
Apache kuduApache kudu
Apache kudu
Asim Jalis
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
Todd Lipcon
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
Rakuten Group, Inc.
 
Kudu demo
Kudu demoKudu demo
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Data Con LA
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
Caserta
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Cloudera, Inc.
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
Shravan (Sean) Pabba
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Jeremy Beard
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Data Con LA
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
Chris George
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
markgrover
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Cloudera, Inc.
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
markgrover
 
SFHUG Kudu Talk
SFHUG Kudu TalkSFHUG Kudu Talk
SFHUG Kudu Talk
Felicia Haggarty
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
DataWorks Summit
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Databricks
 

What's hot (20)

February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
 
Intro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application MeetupIntro to Apache Kudu (short) - Big Data Application Meetup
Intro to Apache Kudu (short) - Big Data Application Meetup
 
Apache kudu
Apache kuduApache kudu
Apache kudu
 
A brave new world in mutable big data relational storage (Strata NYC 2017)
A brave new world in mutable big data  relational storage (Strata NYC 2017)A brave new world in mutable big data  relational storage (Strata NYC 2017)
A brave new world in mutable big data relational storage (Strata NYC 2017)
 
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platformcloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
cloudera Apache Kudu Updatable Analytical Storage for Modern Data Platform
 
Kudu demo
Kudu demoKudu demo
Kudu demo
 
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
Big Data Day LA 2016/ Big Data Track - How To Use Impala and Kudu To Optimize...
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
Apache Kudu (Incubating): New Hadoop Storage for Fast Analytics on Fast Data ...
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and KuduBuilding Effective Near-Real-Time Analytics with Spark Streaming and Kudu
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast DataKudu: New Hadoop Storage for Fast Analytics on Fast Data
Kudu: New Hadoop Storage for Fast Analytics on Fast Data
 
Architecting Applications with Hadoop
Architecting Applications with HadoopArchitecting Applications with Hadoop
Architecting Applications with Hadoop
 
SFHUG Kudu Talk
SFHUG Kudu TalkSFHUG Kudu Talk
SFHUG Kudu Talk
 
Low latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache KuduLow latency high throughput streaming using Apache Apex and Apache Kudu
Low latency high throughput streaming using Apache Apex and Apache Kudu
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 

Viewers also liked

DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
Hakka Labs
 
Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016
Josh Elser
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Josh Elser
 
2016 Resume Current JB
2016 Resume Current JB2016 Resume Current JB
2016 Resume Current JBJanet Begg
 
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...Amazon Web Services
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
DataWorks Summit/Hadoop Summit
 
Amazon.com Corporate IT apps Migration to AWS
Amazon.com Corporate IT apps Migration to AWSAmazon.com Corporate IT apps Migration to AWS
Amazon.com Corporate IT apps Migration to AWSAmazon Web Services
 
Bases de datos en la nube con AWS
Bases de datos en la nube con AWSBases de datos en la nube con AWS
Bases de datos en la nube con AWS
Amazon Web Services LATAM
 
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
Amazon Web Services Korea
 
AWS Services Overview
AWS Services OverviewAWS Services Overview
AWS Services Overview
Amazon Web Services LATAM
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
Amazon Web Services
 
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
Amazon Web Services Korea
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
Amazon Web Services
 
Hive: A Cloud Story
Hive: A Cloud StoryHive: A Cloud Story
Hive: A Cloud Story
Amazon Web Services
 
Telenor Connexion
Telenor Connexion Telenor Connexion
Telenor Connexion
Amazon Web Services
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
Amazon Web Services
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
Amazon Web Services
 
Partnering with AWS
Partnering with AWSPartnering with AWS
Partnering with AWS
Amazon Web Services
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
Amazon Web Services
 
10 Tips for Connecting with Your Audience
10 Tips for Connecting with Your Audience10 Tips for Connecting with Your Audience
10 Tips for Connecting with Your Audience
SketchBubble
 

Viewers also liked (20)

DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016Apache Phoenix Query Server PhoenixCon2016
Apache Phoenix Query Server PhoenixCon2016
 
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data WarehouseApache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse
 
2016 Resume Current JB
2016 Resume Current JB2016 Resume Current JB
2016 Resume Current JB
 
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...
AWS Summit 2011: Closing Keynote : The Story of Amazon.com's Move to the AWS ...
 
The Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache FlinkThe Stream Processor as a Database Apache Flink
The Stream Processor as a Database Apache Flink
 
Amazon.com Corporate IT apps Migration to AWS
Amazon.com Corporate IT apps Migration to AWSAmazon.com Corporate IT apps Migration to AWS
Amazon.com Corporate IT apps Migration to AWS
 
Bases de datos en la nube con AWS
Bases de datos en la nube con AWSBases de datos en la nube con AWS
Bases de datos en la nube con AWS
 
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
AWS CLOUD 2017 - AWS 코어팀과 함께하는 고객 성공 전략 (황인철 상무 & 박성훈 테크니컬 어카운트 매니저 & 김소희 컨설턴트)
 
AWS Services Overview
AWS Services OverviewAWS Services Overview
AWS Services Overview
 
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)AWS re:Invent 2016: AWS Database State of the Union (DAT320)
AWS re:Invent 2016: AWS Database State of the Union (DAT320)
 
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
Amazon 인공 지능(AI) 서비스 및 AWS 기반 딥러닝 활용 방법 - 윤석찬 (AWS, 테크에반젤리스트)
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Hive: A Cloud Story
Hive: A Cloud StoryHive: A Cloud Story
Hive: A Cloud Story
 
Telenor Connexion
Telenor Connexion Telenor Connexion
Telenor Connexion
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Introduction to Amazon DynamoDB
Introduction to Amazon DynamoDBIntroduction to Amazon DynamoDB
Introduction to Amazon DynamoDB
 
Partnering with AWS
Partnering with AWSPartnering with AWS
Partnering with AWS
 
Deep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDBDeep Dive on Amazon DynamoDB
Deep Dive on Amazon DynamoDB
 
10 Tips for Connecting with Your Audience
10 Tips for Connecting with Your Audience10 Tips for Connecting with Your Audience
10 Tips for Connecting with Your Audience
 

Similar to Introducing Apache Kudu (Incubating) - Montreal HUG May 2016

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
StampedeCon
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
Swiss Big Data User Group
 
Kudu Cloudera Meetup Paris
Kudu Cloudera Meetup ParisKudu Cloudera Meetup Paris
Kudu Cloudera Meetup Paris
نهاد مبارك
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Timothy Spann
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
Felicia Haggarty
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
cdmaxime
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
Spark Summit
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
SUSE Italy
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
Cloudera, Inc.
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
Colin Charles
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Fwdays
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
cdmaxime
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Cloudera, Inc.
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
MaharajothiP
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Hadoop / Spark Conference Japan
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Databricks
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
Cloudera, Inc.
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
Felicia Haggarty
 

Similar to Introducing Apache Kudu (Incubating) - Montreal HUG May 2016 (20)

Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016Introduction to Kudu - StampedeCon 2016
Introduction to Kudu - StampedeCon 2016
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Kudu Cloudera Meetup Paris
Kudu Cloudera Meetup ParisKudu Cloudera Meetup Paris
Kudu Cloudera Meetup Paris
 
Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)Cloudera Operational DB (Apache HBase & Apache Phoenix)
Cloudera Operational DB (Apache HBase & Apache Phoenix)
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Spark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike PercySpark Summit EU talk by Mike Percy
Spark Summit EU talk by Mike Percy
 
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFGestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMF
 
Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)Bay Area Impala User Group Meetup (Sept 16 2014)
Bay Area Impala User Group Meetup (Sept 16 2014)
 
MySQL in the Hosted Cloud
MySQL in the Hosted CloudMySQL in the Hosted Cloud
MySQL in the Hosted Cloud
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014Cloudera Impala - San Diego Big Data Meetup August 13th 2014
Cloudera Impala - San Diego Big Data Meetup August 13th 2014
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 Improving Apache Spark by Taking Advantage of Disaggregated Architecture Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Impala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris TsirogiannisImpala tech-talk by Dimitris Tsirogiannis
Impala tech-talk by Dimitris Tsirogiannis
 

Recently uploaded

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Introducing Apache Kudu (Incubating) - Montreal HUG May 2016

  • 1. 1© Cloudera, Inc. All rights reserved. Introducing Apache Kudu (incubating) Mladen Kovacevic | Solutions Architect May 2016 Adapted from Jeremy Beard’s original presentation in New York, Dec 2015
  • 2. 2© Cloudera, Inc. All rights reserved. Presenter • Mladen Kovacevic • Solutions Architect at Cloudera (Toronto based) • 2+ yrs in Big Data • 3+ yrs building workload optimized systems, exploiting hardware features to improve software performance and capability (compute, storage, networking) • 6+ yrs engine development enterprise RDBMS mladen@cloudera.com
  • 3. 3© Cloudera, Inc. All rights reserved. Current storage landscape in Hadoop HDFS (GFS) excels at: • Scanning large amounts of data • Accumulating data with high throughput HBase (BigTable) excels at: • Efficiently finding and writing individual rows • Making data mutable Gaps exist when these properties are needed simultaneously
  • 4. 4© Cloudera, Inc. All rights reserved. Managing the gap (today) Code Complexity • Manage flow and sync of data between HDFS and HBase Monitoring and Security • Managing consistent backups, security policies, monitoring and more is hard Performance • Significant lag between arrival of HBase data “staging” and time when data is available for analytics.
  • 5. 5© Cloudera, Inc. All rights reserved. Hardware landscape is changing rethink design! • Spinning disk (HDD) -> solid state storage (SSD) • NAND flash: Up to 450k read 250k write IOPS, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping • 3D XPoint memory – persistent storage (1000x faster than NAND, cheaper than RAM) • RAM is cheaper and more abundant: • 64->128->256GB over last few years • Takeaway 1: The next bottleneck is CPU, and current storage heavy applications weren’t designed with CPU efficiency in mind • Takeaway 2: Column stores are feasible for random access
  • 6. 6© Cloudera, Inc. All rights reserved. Apache Kudu (Incubating) Storage for Fast Analytics on Fast Data • New updatable column store for Hadoop • Simplifies the architecture for building analytic applications on changing data • Designed for fast analytic performance • Natively integrated with Hadoop • Donated as incubating project at Apache Software Foundation (November 17, 2015) • Beta v0.8.0 now available STRUCTURED Sqoop UNSTRUCTURED Kafka, Flume PROCESS, ANALYZE, SERVE UNIFIED SERVICES RESOURCE MANAGEMENT YARN SECURITY Sentry, RecordService FILESYSTEM HDFS RELATIONAL Kudu NoSQL HBase STORE INTEGRATE BATCH Spark, Hive, Pig MapReduce STREAM Spark SQL Impala SEARCH Solr SDK Kite
  • 7. 7© Cloudera, Inc. All rights reserved. • High throughput for big scans (columnar storage and replication) Goal: Within 2x of Parquet • Low-latency for short accesses (primary key indexes and quorum design) Goal: 1ms read/write on SSD • Database-like semantics (initially single-row ACID) • Relational data model (tables/schemas) • Easily integrate with SQL engines (Impala, Drill, Hive) • “NoSQL” style scan/insert/update (Java, C++, Python clients) Kudu design goals
  • 8. 8© Cloudera, Inc. All rights reserved. What Kudu is not • Not a SQL interface • Just the storage layer • “BYOSQL” – Bring-your-own SQL • Not a file system • Data must have tabular structure • Not an application that runs on HDFS • An alternative, native Hadoop storage engine • Not a replacement for HDFS or HBase • Select the right storage for the right use case • Cloudera will continue to support and invest in all three
  • 9. 9© Cloudera, Inc. All rights reserved. What Kudu is STORAGE SYSTEM for TABLES of STRUCTURED DATA No SQL engine, nor query planner, nor optimizer. Storage system that lets you get at your data quickly (random/scan). Highly scalable, with ability to provide higher level systems info to exploit its abilities (ie. locality). Data logically presented to clients as rows and finite set of columns Stored in columnar format Tables are partitioned across tablets managed by tablet servers. Tablets are replicated managed by Raft consensus. Strongly typed columns Finite number of columns Add/remove columns solely by ALTER TABLE statements
  • 10. 10© Cloudera, Inc. All rights reserved. About Kudu • Apache-licensed open source software • Structured data model • Basic construct: tables • Tables broken down into tablets (roughly equivalent to partitions) • Architecture supports geographically disparate, active/active systems • Not the initial design goal
  • 11. 11© Cloudera, Inc. All rights reserved. Kudu data model • Tables have a RDBMS-like schema • Finite number of columns (unlike HBase/Cassandra) • Types: BOOL, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP • Some subset of columns make up a primary key • Fast random reads/writes by primary key • No secondary indexes (yet) • Columnar layout on disk – Parquet-like • Lazy materialization • Encoding and compression options 11
  • 12. 12© Cloudera, Inc. All rights reserved. Table partitioning • Hash bucketing • Distribute records by hash of partition column(s) • N buckets leads to N tablets • Range partitioning • Distribute records by ranges of the partition column(s) • N split keys leads to N tablets • Can be a mix for different columns of the primary key
  • 13. 13© Cloudera, Inc. All rights reserved. Kudu tables are partitioned into “tablets” (regions) Partitioning based on primary key (PK) • Native support for range partitioning and/or hash partitioning (salting) • Hash example: PRIMARY KEY (tweet_id) DISTRIBUTE BY HASH(tweet_id) INTO 100 BUCKETS
  • 14. 14© Cloudera, Inc. All rights reserved. Each tablet is fault tolerant via Raft consensus • A single Tablet Server can host many tablets • Metadata is stored on just another tablet, but only Master Server processes host that tablet • Raft consensus: • Strong consistency • Leader election on failure • Replication factor 3 or 5 is typical
  • 15. 15© Cloudera, Inc. All rights reserved. Consistency model • Consistency and replication enforced by Raft consensus (similar to Paxos) • Replication by operation not data • Single-row transactions now • Multi-row transactions later • Geo-distributed replicas will be possible under strict time synchronization • Techniques drawn from Google Spanner and others
  • 16. 16© Cloudera, Inc. All rights reserved. Kudu interfaces • NoSQL-style APIs • Insert(), Update(), Delete(), Scan() • Java, C++, limited Python • Integrations with MapReduce, Spark, and Impala • No direct access to underlying Kudu tablet files • Beta does not have authentication, authorization, encryption
  • 17. 17© Cloudera, Inc. All rights reserved. Impala integration • Opens up Kudu to JDBC/ODBC clients • Intuitive way to get data into Kudu • INSERT INTO kudu_table SELECT * FROM src_table; • Additional commands • UPDATE • DELETE • Efficient INSERT VALUES • Runs on the Kudu C++ client
  • 18. 18© Cloudera, Inc. All rights reserved. Performance characteristics Very CPU efficient • Written in modern C++, uses specialized CPU instructions (SIMD), JIT compilation with LLVM Latency dependent on storage hardware capabilities • Expect sub-millisecond response on SSDs and upcoming technologies No garbage collection allows very large memory footprint with no pauses Bloom filters reduce the need for many disk accesses
  • 19. 19© Cloudera, Inc. All rights reserved. Operating Kudu • Easiest through Cloudera Manager integration • Separate parcel for now • Kudu is always compacting • No minor vs. major compaction • No compaction latency spikes • Web UI is full of metrics and logs
  • 20. 20© Cloudera, Inc. All rights reserved. Cluster layout • One or multiple masters for failover • Only one in current beta • Low CPU and memory impact • One tablet server per worker node • Can share disks with HDFS • One SSD per worker node just for Kudu WAL can speed up writes • No dependencies on other Hadoop ecosystem components • But interfacing components like Impala or Spark do
  • 21. 21© Cloudera, Inc. All rights reserved. Kudu: Cluster metadata management • Replicated master • Acts as a tablet directory • Acts as a catalog (which tables exist, etc) • Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets) • Caches all metadata in RAM for high performance • Under heavy load, 99.99% response times still in microseconds (650us) • Client configured with master addresses • Client asks master for tablet locations as needed and caches them locally
  • 22. 22© Cloudera, Inc. All rights reserved. Real-time analytics in Hadoop today Merging in new data = storage complexity Downsides: ● Multiple storage layers ● Latest data is hidden ● Files are messy ● Complex to do updates without breaking running queriesNew Partition Most Recent Partition Historic Data HBase Parquet File Have we accumulated enough data? Reorganize HBase file into Parquet • Wait for running operations to complete • Define new Impala partition referencing the newly written Parquet file Incoming Data (Messaging System) Reporting Request HDFS + Impala
  • 23. 23© Cloudera, Inc. All rights reserved. Real-time analytics in Hadoop with Kudu Improvements: ● One system to operate ● No schedules or background processes ● Handle late arrivals or data corrections with ease ● New data available immediately for analytics or operations Historical and Real-time Data Incoming Data (Messaging System) Reporting Request Kudu + Impala
  • 24. 24© Cloudera, Inc. All rights reserved. Kudu for data warehousing • Near real time data visibility • BI tools can display events that happened seconds earlier • Excellent for star schemas • Fast scans of deep fact tables • Efficient wide fact tables • Simplified updates of slowly changing dimensions
  • 25. 25© Cloudera, Inc. All rights reserved. Near real time data warehousing on Kudu Files RDBMS Streams K A F K A K U D U IMPALA HUE BI tools User FLUME SPARK STREAMING Simple Complex
  • 26. 26© Cloudera, Inc. All rights reserved. Resources Join the community http://getkudu.io kudu-user@googlegroups.com Download the beta cloudera.com/downloads Read the whitepaper getkudu.io/kudu.pdf
  • 27. 27© Cloudera, Inc. All rights reserved. Demo • Load a standard HDFS Parquet table • Create a Kudu table via Impala • INSERT/UPDATE/DELETE statement with Impala • Spark insert into Kudu table sample job • Reading from Parquet files as input • Map task to insert each row into Kudu
  • 28. 28© Cloudera, Inc. All rights reserved. Creating a Kudu table - Impala Kudu Storage handler Table name in Impala does NOT match table name in Kudu. Kudu is its own storage layer. Kudu Master hostname and port A primary key is mandatory
  • 29. 29© Cloudera, Inc. All rights reserved. Spark (Scala) code – Java APIDataFrame Row Kudu Master hostname and port Kudu table name Create a client, session and table object Extract values from the row, strong types Create an insert object and row Set the values by type, column name and column valule Perform the actual insert Cleanup
  • 30. 30© Cloudera, Inc. All rights reserved. Kudu with Spark DataFrames Tie in Kudu with Spark Map of the Kudu table name, and Kudu master Table name to refer to within Spark SparkSQL statement
  • 31. 31© Cloudera, Inc. All rights reserved. Kudu code examples and docs https://github.com/cloudera/kudu-examples http://www.cloudera.com/documentation/betas/kudu/0-7- 0/topics/kudu_development.html http://getkudu.io/docs/developing.html
  • 32. 32© Cloudera, Inc. All rights reserved. Thank you mladen@cloudera.com

Editor's Notes

  1. Copy and paste text of the above slide: CREATE TABLE kudu_table ( cityid STRING , event_dtm STRING , temperature DOUBLE ) TBLPROPERTIES ( 'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler' , 'kudu.table_name' = 'table_in_kudu' , 'kudu.master_addresses' = 'mladen-kudu-1.vpc.cloudera.com:7051' , 'kudu.key_columns' = 'cityid,event_dtm' );
  2. Copy and paste text of the above slide: CREATE TABLE kudu_table ( cityid STRING , event_dtm STRING , temperature DOUBLE ) TBLPROPERTIES ( 'storage_handler' = 'com.cloudera.kudu.hive.KuduStorageHandler' , 'kudu.table_name' = 'table_in_kudu' , 'kudu.master_addresses' = 'mladen-kudu-1.vpc.cloudera.com:7051' , 'kudu.key_columns' = 'cityid,event_dtm' );