APACHE - KUDU
WELCOME TO DEMO SESSION TO WORK WITH APACHE - KUDU
Part of defining NDX Strategic Architecture
Version: 0.2 – Status: draft, Date: 9/28/2016
Author: Ravi Kumar Itha & ZTV Team, Reviewers: Manjunatha Prabhu, Felix Shulman, Garry Steedman, Mara Preotescu
Participating teams: Nielsen, Kogentix, and Cloudera
Agenda
• Kudu – Overview
• Kudu – High level
• Design Goals of Kudu
• Kudu – Architecture
• Kudu – Tablet Storage
• Kudu – Hadoop Integration
• Kudu Implementation in Buffer Load and Rawdata Load
• Inserting the data
• Reading the data
• Deleting
• Dropping a table
Kudu at a high level
• It is an open source storage engine that supports low-latency random access together with efficient analytical access patterns.
• It distributes data using horizontal partitioning and replicates each partition, providing low mean-time-to-recovery and low tail latencies.
• It is designed within the context of the Hadoop ecosystem and supports integration with Cloudera Impala, Apache Spark, and MapReduce.
Feature: Description
Tables and Schemas:
• Kudu is a storage system for tables of structured data
• A Kudu cluster may have any number of tables
• Each table has a well-defined schema consisting of a finite number of columns
Unlike most relational databases:
• Kudu does not currently offer secondary indexes or uniqueness constraints other than the primary key
• Currently, Kudu requires that every table has a primary key defined, though we anticipate that a future version will add automatic generation of surrogate keys
Write operations:
• Insert, Update, Upsert and Delete
Read operations:
• Kudu offers a Scan operation to retrieve data from a table. On a scan, any number of predicates can be provided to filter the results
• In addition to applying predicates, the user may specify a projection for a scan
API:
• Kudu provides APIs for callers to determine the mapping of data ranges to particular servers, to aid distributed execution frameworks such as Spark, MapReduce, or Impala
Consistency Model:
• Snapshot consistency
• External consistency
Timestamps:
• Kudu does not allow the user to manually set the timestamp of a write operation
• Kudu does allow the user to specify a timestamp for a read operation, which makes it possible to run point-in-time queries in the past
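The scan behaviour described above (any number of predicates plus a projection) can be sketched with a small in-memory model. This is an illustration only, not the Kudu client API: `ScanSketch`, `Predicate` and the sample rows are hypothetical, and the column names are borrowed from the Scala example later in this deck.

```scala
// Toy in-memory model of a Kudu-style scan; NOT the Kudu client API.
object ScanSketch {
  type Row = Map[String, Any]

  // A predicate tests the value of one column.
  final case class Predicate(column: String, test: Any => Boolean)

  // Keep rows that satisfy every predicate, then keep only the projected columns.
  def scan(rows: Seq[Row], predicates: Seq[Predicate], projection: Seq[String]): Seq[Row] =
    rows
      .filter(row => predicates.forall(p => row.get(p.column).exists(p.test)))
      .map(row => projection.flatMap(c => row.get(c).map(c -> _)).toMap)

  // Hypothetical sample data.
  val table: Seq[Row] = Seq(
    Map("nc_periodid" -> 2014, "ac_nshopid" -> "S1", "ac_lbatchtype" -> "RAW"),
    Map("nc_periodid" -> 2015, "ac_nshopid" -> "S2", "ac_lbatchtype" -> "BUFFER"),
    Map("nc_periodid" -> 2015, "ac_nshopid" -> "S3", "ac_lbatchtype" -> "RAW")
  )
}
```

Calling `ScanSketch.scan` with two predicates (period 2015 and batch type RAW) and the projection `Seq("ac_nshopid")` returns only the shop id of the matching row, which mirrors how predicates narrow the rows and the projection narrows the columns.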
Design Goals of Kudu
• Strong performance for both scan and random access to help customers simplify complex hybrid architectures
• High CPU efficiency in order to maximize the return on investment that our customers are making in modern processors
• High IO efficiency in order to leverage modern persistent storage
• The ability to update data in place, to avoid extraneous processing and data movement
• The ability to support active-active replicated clusters that span multiple data centers in geographically distant locations
Kudu – Architecture
• Reference Material:
Feature: Description
Cluster Roles:
• Kudu relies on a single Master server*, responsible for metadata
• An arbitrary number of Tablet Servers, responsible for data
Partitioning:
• Tables in Kudu are horizontally partitioned. Like BigTable, Kudu calls these horizontal partitions tablets
• Any row maps to exactly one tablet based on the value of its primary key, which ensures that random access operations such as inserts or updates affect only a single tablet
• For large tables, the recommendation is to have 10-100 tablets per machine. Each tablet can be tens of gigabytes
• Kudu supports a flexible array of partitioning schemes
• A partition schema is made up of zero or more hash-partitioning rules followed by an optional range-partitioning rule:
  • A hash-partitioning rule consists of a subset of the primary key columns and a number of buckets
  • A range-partitioning rule consists of an ordered subset of the primary key columns
Replication:
• Kudu replicates all of its table data across multiple machines, typically 3 or 5
The Kudu Master:
• Acts as a catalog manager
• Acts as a cluster coordinator
• Acts as a tablet directory
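As a rough illustration of how one hash-partitioning rule combined with one range-partitioning rule assigns every row to exactly one tablet, consider the sketch below. It is a simplified model under stated assumptions: `PartitionSketch` and `Key` are hypothetical names, `String.hashCode` stands in for whatever hash function Kudu actually uses, and range lower bounds are treated as inclusive.

```scala
// Simplified model of hash + range partitioning; not Kudu's implementation.
object PartitionSketch {
  // Hypothetical two-column primary key.
  final case class Key(periodId: Int, shopId: String)

  // Hash rule: a subset of the key columns (here shopId) and a bucket count.
  // Assumption: String.hashCode is only a stand-in for Kudu's real hash function.
  def hashBucket(key: Key, buckets: Int): Int =
    Math.floorMod(key.shopId.hashCode, buckets)

  // Range rule on periodId: the split points divide the key space into
  // splits.length + 1 contiguous ranges, with inclusive lower bounds.
  def rangeIndex(key: Key, splits: Seq[Int]): Int =
    splits.count(_ <= key.periodId)

  // A tablet is identified by the pair (hash bucket, range index).
  def tabletFor(key: Key, buckets: Int, splits: Seq[Int]): (Int, Int) =
    (hashBucket(key, buckets), rangeIndex(key, splits))
}
```

With 4 buckets and the split points 2012 to 2015 this layout has 4 x 5 = 20 tablets, and a given key always lands in the same one, which is what lets a random-access write touch a single tablet.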
Kudu – Tablet Storage
Feature: Description
Objectives behind the design:
• Fast columnar scans
• Low-latency random updates
• Consistency of performance
RowSets:
• Tablets in Kudu are themselves subdivided into smaller units called RowSets
• There are two types of RowSets: MemRowSets and DiskRowSets
• MemRowSets: RowSets that exist in memory
• DiskRowSets: RowSets that exist in a combination of disk and memory
Other features that make Kudu perform better in data read / write and data management:
Kudu implements the processes below efficiently by using techniques such as immutable B-tree indexes, LRU (Least Recently Used) page caches, Bloom filters, MVCC (Multi-Version Concurrency Control), and column encoding.
• INSERT path
• Read path
• Lazy Materialization
• Delta Compaction
• RowSet Compaction
• Scheduling maintenance
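The relationship between MemRowSets, DiskRowSets and the INSERT and read paths can be pictured with a heavily simplified model. Everything here is an assumption made for illustration: `TabletSketch` is a hypothetical name, keys are plain integers, an exact key set stands in for a Bloom filter (a real Bloom filter can return false positives), and deltas, compaction and MVCC are left out.

```scala
import scala.collection.immutable.TreeMap

// Heavily simplified tablet model; not Kudu's storage engine.
object TabletSketch {
  // Immutable flushed unit: sorted rows plus a key filter.
  final case class DiskRowSet(rows: TreeMap[Int, String]) {
    // Assumption: an exact key set stands in for a Bloom filter.
    private val keyFilter: Set[Int] = rows.keySet
    def mightContain(key: Int): Boolean = keyFilter.contains(key)
  }

  final case class Tablet(mem: TreeMap[Int, String], disk: Vector[DiskRowSet]) {
    // INSERT path: reject a duplicate primary key, otherwise write to the MemRowSet.
    // The filter is consulted first so most DiskRowSets are skipped cheaply.
    def insert(key: Int, value: String): Either[String, Tablet] =
      if (mem.contains(key) || disk.exists(d => d.mightContain(key) && d.rows.contains(key)))
        Left(s"key $key already present")
      else
        Right(copy(mem = mem.updated(key, value)))

    // Flush: the MemRowSet becomes a new DiskRowSet and an empty MemRowSet takes over.
    def flush: Tablet =
      if (mem.isEmpty) this
      else Tablet(TreeMap.empty[Int, String], disk :+ DiskRowSet(mem))

    // Read path: check memory first, then the DiskRowSets that might hold the key.
    def get(key: Int): Option[String] =
      mem.get(key).orElse(
        disk.collectFirst { case d if d.mightContain(key) && d.rows.contains(key) => d.rows(key) })
  }

  val empty: Tablet = Tablet(TreeMap.empty[Int, String], Vector.empty)
}
```

Starting from `TabletSketch.empty`, inserting a key, flushing, and inserting the same key again yields a `Left`, because the duplicate check has to cover flushed data as well as the in-memory rows.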
Kudu – Hadoop Integration
Feature: Description
MapReduce and Spark:
• Bindings allow MapReduce jobs to read input from or write output to Kudu tables
• A small glue layer binds Kudu tables to higher-level Spark concepts such as DataFrames and Spark SQL tables
• It has native support for several key features:
  • Locality
  • Columnar projection
  • Predicate pushdown
Impala:
• Kudu is also deeply integrated with Cloudera Impala
• SQL operations are supported through the integration with Impala
• The Impala integration includes several key features:
  • Locality
  • Predicate pushdown
  • DDL extensions
  • DML extensions
WHAT DATA TYPES DOES KUDU SUPPORT?
• Boolean
• 8-bit signed integer
• 16-bit signed integer
• 32-bit signed integer
• 64-bit signed integer
• Timestamp
• 32-bit floating point
• 64-bit floating point
• String
• Binary
ENCODING TYPES
Column type: Encoding
• Integer, Timestamp: plain, bitshuffle, run length
• Float: plain, bitshuffle
• Bool: plain, dictionary, run length
• String, Binary: plain, prefix, dictionary
Bitshuffle-encoded results are compressed with LZ4.
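Two of the encodings listed above are easy to picture with a short sketch. The code below is a generic illustration of run-length and dictionary encoding, not Kudu's on-disk format; `EncodingSketch` and its method names are hypothetical.

```scala
// Generic illustrations of two column encodings; not Kudu's on-disk format.
object EncodingSketch {
  // Run-length encoding: collapse consecutive repeats into (value, count) pairs.
  def runLength[A](values: Seq[A]): Vector[(A, Int)] =
    values.foldLeft(Vector.empty[(A, Int)]) { (acc, v) =>
      acc.lastOption match {
        case Some((last, n)) if last == v => acc.init :+ (last -> (n + 1))
        case _                            => acc :+ (v -> 1)
      }
    }

  // Dictionary encoding: store each distinct value once and replace every
  // value by its index in that dictionary.
  def dictionary[A](values: Seq[A]): (Vector[A], Vector[Int]) = {
    val dict = values.distinct.toVector
    val index = dict.zipWithIndex.toMap
    (dict, values.map(index).toVector)
  }
}
```

Run-length encoding pays off when a column has long runs of the same value, while dictionary encoding pays off when a column has few distinct values.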
HOW TO CREATE A TABLE IN KUDU?
• You can use Impala
• You can use Scala
CREATING TABLE USING IMPALA
CREATE TABLE <table_name> (columns)
PRIMARY KEY (c1,c2)
DISTRIBUTE BY RANGE (column)
RANGE BOUND ((2011), (2016))
SPLIT ROWS ((2012), (2013), (2014), (2015));
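To see what the bound and split rows in the statement above imply, the sketch below derives the resulting range partitions. It rests on an assumption that the slide does not state: the bound is read as the overall half-open interval [2011, 2016) and each split row starts a new partition. `SplitRowsSketch` is a hypothetical helper, not part of Impala or Kudu.

```scala
// Hypothetical helper: derive half-open ranges [lower, upper) from a bound and split rows.
object SplitRowsSketch {
  def ranges(lower: Int, upper: Int, splits: Seq[Int]): Seq[(Int, Int)] = {
    // Assumption: every split row is the inclusive lower bound of a new partition.
    val edges = (lower +: splits.sorted) :+ upper
    edges.zip(edges.tail)
  }
}
```

Under that reading, `SplitRowsSketch.ranges(2011, 2016, Seq(2012, 2013, 2014, 2015))` yields five one-year partitions, from (2011, 2012) through (2015, 2016).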
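CREATING TABLE USING SCALA
• Create the KuduContext
• Define the schema and table options, then create the table:
import java.util.ArrayList
import org.apache.kudu.{ColumnSchema, Schema, Type}
import org.apache.kudu.ColumnSchema.ColumnSchemaBuilder
import org.apache.kudu.client.CreateTableOptions
// kuduClient, tableName and numberOfBuckets are assumed to be defined earlier
val columnList = new ArrayList[ColumnSchema]()
columnList.add(new ColumnSchemaBuilder("nc_periodid", Type.INT32).key(true).build())
columnList.add(new ColumnSchemaBuilder("ac_nshopid", Type.STRING).key(true).build())
columnList.add(new ColumnSchemaBuilder("ac_lbatchtype", Type.STRING).key(false).build())
val schema = new Schema(columnList)
// Hash-partition on nc_periodid and keep 3 replicas of each tablet
val distributionList = new ArrayList[String]()
distributionList.add("nc_periodid")
val cto = new CreateTableOptions()
cto.addHashPartitions(distributionList, numberOfBuckets)
cto.setNumReplicas(3)
kuduClient.createTable(tableName, schema, cto)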
OPERATIONS ON A TABLE USING SCALA
• kuduContext.insertRows(DF, table)
• kuduContext.upsertRows(DF, table)
• kuduContext.updateRows(DF, table)
• kuduContext.tableExists(table)
• kuduContext.deleteTable(table)
• kuduContext.deleteRows(DF, table)
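The row-level calls above differ mainly in how they treat a primary key that already exists or is missing. The sketch below models those semantics on an in-memory map. It is an illustration of the usual insert / update / upsert / delete contract at the storage level, not the KuduContext implementation; `WriteOpsSketch` and the error strings are hypothetical.

```scala
// In-memory model of primary-key write semantics; not the KuduContext API.
object WriteOpsSketch {
  // Primary key -> remaining columns (simplified to one String).
  type Table = Map[Int, String]

  // Insert fails if the key is already present.
  def insert(t: Table, k: Int, v: String): Either[String, Table] =
    if (t.contains(k)) Left("key already present") else Right(t + (k -> v))

  // Update fails if the key is missing.
  def update(t: Table, k: Int, v: String): Either[String, Table] =
    if (t.contains(k)) Right(t + (k -> v)) else Left("key not found")

  // Upsert inserts or overwrites, so it always succeeds.
  def upsert(t: Table, k: Int, v: String): Table = t + (k -> v)

  // Delete fails if the key is missing.
  def delete(t: Table, k: Int): Either[String, Table] =
    if (t.contains(k)) Right(t - k) else Left("key not found")
}
```

This is why upsert is often the convenient choice for reloading data: it does not need to know whether a row was written by an earlier run.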
READING DATA INTO A DF FROM A TABLE
• import org.apache.kudu.spark.kudu._ (provides the .kudu reader method)
• val df = sqlContext.read.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table")).kudu.where(condition)
• Currently, column filters and BETWEEN conditions are supported