Are you Kudu-ing me?!

Przemek Maciolek
VP of R&D at Collective Sense
These folks must all be wrong, mustn’t they?
uuid         first_name  last_name  dob
ee-c6-47-2c  John        Connor     Feb 28th, 1985
84-ee-ff-d5  Sarah       Connor     May 11th, 1965
57-4f-d9-d8  Kyle        Reese      Mar 1st, 2002
SELECT MIN(dob) FROM characters WHERE last_name='Connor'
uuid:        ee-c6-47-2c | 84-ee-ff-d5 | 57-4f-d9-d8
last_name:   Connor | Connor | Reese
first_name:  John | Sarah | Kyle
dob:         Feb 28th, 1985 | May 11th, 1965 | Mar 1st, 2002
SELECT MIN(dob) FROM characters WHERE last_name='Connor'
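To make the row-versus-column difference concrete for the query above, here is a toy Python sketch (not Kudu code) that counts how many field values each layout has to touch. It assumes `dob` is stored as a real date rather than the display strings shown on the slide.

```python
from datetime import date

# Toy data from the slide; dob stored as real dates so MIN() is meaningful.
rows = [
    {"uuid": "ee-c6-47-2c", "first_name": "John", "last_name": "Connor", "dob": date(1985, 2, 28)},
    {"uuid": "84-ee-ff-d5", "first_name": "Sarah", "last_name": "Connor", "dob": date(1965, 5, 11)},
    {"uuid": "57-4f-d9-d8", "first_name": "Kyle", "last_name": "Reese", "dob": date(2002, 3, 1)},
]


def min_dob_row_store(rows):
    """Row layout: each record is read whole, so every field is touched."""
    touched = 0
    best = None
    for row in rows:
        touched += len(row)  # the whole record is pulled in, needed or not
        if row["last_name"] == "Connor":
            best = row["dob"] if best is None else min(best, row["dob"])
    return best, touched


def to_columns(rows):
    """Column layout: one list per column, values kept in the same row order."""
    return {name: [r[name] for r in rows] for name in rows[0]}


def min_dob_column_store(columns):
    """Only last_name and dob are read; uuid and first_name are never touched."""
    names, dobs = columns["last_name"], columns["dob"]
    touched = len(names) + len(dobs)
    matches = [d for n, d in zip(names, dobs) if n == "Connor"]
    return (min(matches) if matches else None), touched
```

Both functions return the same answer (Sarah Connor's date of birth), but the row layout touches 12 values while the columnar one touches 6; the gap grows with the number of columns the query ignores.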
What’s the problem with Apache Parquet, then?
Ever implemented a Lambda Architecture?
last_name  first_name  movie         actor                  actor_age
Connor     John        Terminator 2  Edward Furlong         14
Connor     John        Terminator 2  Michael Edwards        47
Connor     Sarah       Terminator    Linda Hamilton         28
Connor     Sarah       Terminator 2  Linda Hamilton         35
Reese      Kyle        Terminator 2  Michael Biehn          35
T-800                  Terminator    Arnold Schwarzenegger  37
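CREATE TABLE `characters` (
  last_name  STRING,
  first_name STRING,
  movie      STRING,
  actor      STRING,
  actor_age  INT
)
-- Spread rows over 4 tablets by a hash of (last_name, first_name)
DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS
-- Composite primary key; the two hash columns are its prefix
TBLPROPERTIES (
  'kudu.key_columns' = 'last_name, first_name, movie, actor'
)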
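The `DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS` clause can be pictured with a short Python sketch. The hash here (SHA-256 over length-prefixed values) and the helper name `bucket_for` are assumptions chosen for determinism; Kudu uses its own internal hash, so real bucket numbers will differ. What carries over is the property that every row with the same (last_name, first_name) lands in the same bucket, whatever its movie or actor.

```python
import hashlib

NUM_BUCKETS = 4  # mirrors INTO 4 BUCKETS in the DDL


def bucket_for(last_name: str, first_name: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hypothetical mapping of the hash-partition columns to a bucket number."""
    # Length-prefix each value so ("ab", "c") and ("a", "bc") encode differently.
    encoded = "".join(f"{len(v)}:{v}" for v in (last_name, first_name)).encode("utf-8")
    # SHA-256 is used only as a stable, well-mixed hash; Kudu's hash function differs.
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


characters = [
    ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
    ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
    ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
    ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ("T-800", "", "Terminator", "Arnold Schwarzenegger", 37),
]

# Group rows by bucket; movie and actor play no part in placement.
tablets = {}
for row in characters:
    tablets.setdefault(bucket_for(row[0], row[1]), []).append(row)
```

With only four distinct name pairs and four buckets, some buckets may end up empty and two pairs may share one; hashing guarantees grouping by key, not one key per bucket.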
Kudu’s partitioning sits somewhere between BigTable/HBase range partitioning and Cassandra’s hash partitioning.
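For contrast, the two schemes the slide names can be sketched side by side in Python. The split points `["D", "M", "T"]` are hypothetical and the hash is an arbitrary stand-in; neither reflects how BigTable, HBase, Cassandra or Kudu are configured in practice.

```python
import bisect
import hashlib


def range_partition(key: str, split_points: list[str]) -> int:
    """BigTable/HBase style: tablet i holds keys between split_points[i-1] and split_points[i]."""
    return bisect.bisect_right(split_points, key)


def hash_partition(key: str, num_buckets: int) -> int:
    """Cassandra style: placement depends only on a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


splits = ["D", "M", "T"]  # hypothetical split points giving 4 range tablets
names = ["Connor", "Reese", "T-800"]
```

Range partitioning keeps neighbouring keys together, so a scan over a key range touches few tablets, but sequential keys can pile onto one hot tablet. Hash partitioning spreads writes evenly, at the cost of sending any range scan to every bucket.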
Tablet 1
  last_name:   Connor | Connor | Reese
  first_name:  John | John | Kyle
  movie:       Terminator 2 | Terminator 2 | Terminator 2
  actor:       Edward Furlong | Michael Edwards | Michael Biehn
  actor_age:   14 | 47 | 35
Tablet 2
  last_name:   Connor | Connor
  first_name:  Sarah | Sarah
  movie:       Terminator | Terminator 2
  actor:       Linda Hamilton | Linda Hamilton
  actor_age:   28 | 35
Tablet 3
  last_name:   T-800
  first_name:  (empty)
  movie:       Terminator
  actor:       Arnold Schwarzenegger
  actor_age:   37
INSERT INTO
  characters (last_name, first_name, movie, actor, actor_age)
VALUES
  ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)
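The INSERT above lands in the tablet that already holds John Connor's rows, and the new row slots in by primary-key order. This toy Python sketch (not Kudu's implementation; `SortedTablet` is a hypothetical name) keeps rows sorted by the composite key (last_name, first_name, movie, actor) and rejects duplicate keys. In Kudu itself the new row first goes into an in-memory structure rather than being spliced into on-disk data.

```python
import bisect


def primary_key(row):
    """Composite key from the DDL: (last_name, first_name, movie, actor)."""
    return row[:4]


class SortedTablet:
    """Toy tablet that keeps rows ordered by primary key and rejects duplicate keys."""

    def __init__(self, rows=()):
        self.rows = sorted(rows, key=primary_key)
        self.keys = [primary_key(r) for r in self.rows]

    def insert(self, row):
        key = primary_key(row)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            raise KeyError(f"duplicate primary key: {key}")
        self.keys.insert(i, key)
        self.rows.insert(i, row)
        return i  # position the row was inserted at


tablet = SortedTablet([
    ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
    ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
    ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
])
position = tablet.insert(("Connor", "John", "Terminator Genisys", "Jason Clarke", 36))
```

Because "Terminator 2" sorts before "Terminator Genisys" ('2' precedes 'G'), the new row ends up third, between the two existing John Connor rows and Kyle Reese.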
Tablet 1
  last_name:   Connor | Connor | Connor | Reese
  first_name:  John | John | John | Kyle
  movie:       Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
  actor:       Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
  actor_age:   14 | 47 | 36 | 35
Tablet 2
  last_name:   Connor | Connor
  first_name:  Sarah | Sarah
  movie:       Terminator | Terminator 2
  actor:       Linda Hamilton | Linda Hamilton
  actor_age:   28 | 35
Tablet 3
  last_name:   T-800
  first_name:  (empty)
  movie:       Terminator
  actor:       Arnold Schwarzenegger
  actor_age:   37
Delta: the newly inserted row is held apart from the tablet’s existing base data.
SELECT MAX(actor_age) FROM characters WHERE last_name='Connor'
MPP FTW: the tablets are scanned in parallel and their partial results are combined.
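Since the hash key is (last_name, first_name), a filter on last_name alone cannot narrow the query to one bucket: Connor rows live in more than one tablet. The scatter/gather shape of the MPP plan can be sketched in Python, with threads standing in for tablet servers. This is only an illustration of the idea, not how Impala or Kudu are implemented; `scan_tablet` and `parallel_max_age` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Tablet contents after the INSERT: (last_name, first_name, movie, actor, actor_age).
TABLETS = [
    [
        ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
        ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
        ("Connor", "John", "Terminator Genisys", "Jason Clarke", 36),
        ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ],
    [
        ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
        ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ],
    [("T-800", "", "Terminator", "Arnold Schwarzenegger", 37)],
]


def scan_tablet(rows, predicate):
    """Per-tablet fragment: filter locally and return a partial MAX, or None if nothing matched."""
    ages = [row[4] for row in rows if predicate(row)]
    return max(ages) if ages else None


def parallel_max_age(tablets, predicate):
    """Coordinator: fan out one scan per tablet, then merge the partial maxima."""
    with ThreadPoolExecutor(max_workers=len(tablets)) as pool:
        partials = list(pool.map(lambda rows: scan_tablet(rows, predicate), tablets))
    present = [p for p in partials if p is not None]
    return max(present) if present else None
```

For example, `parallel_max_age(TABLETS, lambda r: r[0] == "Connor")` merges the partial maxima 47 and 35 (the third tablet contributes nothing) into 47. Only a single number per tablet travels back to the coordinator.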
SELECT MAX(actor_age) FROM characters WHERE movie='Terminator 2'
Bloom filters FTW
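A Bloom filter answers "definitely not here" or "maybe here", which is enough to skip work safely. Kudu's own Bloom filters are built over primary keys in each DiskRowSet; the per-tablet filter on `movie` below is a hypothetical illustration of the general skip-if-absent idea, with an arbitrary size (1024 bits, 3 hashes) and SHA-256 as the hash.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array held in a Python int."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3):
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        """False: definitely absent. True: possibly present (false positives allowed)."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical per-tablet filters over the movie column, using the slide's data.
tablet_movies = [
    ["Terminator 2", "Terminator 2", "Terminator Genisys", "Terminator 2"],
    ["Terminator", "Terminator 2"],
    ["Terminator"],
]
filters = []
for movies in tablet_movies:
    bf = BloomFilter()
    for movie in movies:
        bf.add(movie)
    filters.append(bf)

# Only tablets whose filter answers "maybe" need scanning for movie='Terminator 2'.
to_scan = [i for i, bf in enumerate(filters) if bf.might_contain("Terminator 2")]
```

The first two tablets are always scanned; the T-800 tablet is skipped unless its filter produces a (rare) false positive, in which case it is merely scanned for nothing, never answered wrongly.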
[Diagram: a Master leader with Master replicas, and Tablet Servers 1 to 3, each hosting several tablet replicas, some of them marked Leader.]
Typically 10-100 tablets per machine.
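The diagram's point, that each tablet is replicated across tablet servers with one replica acting as leader, can be sketched in Python. The round-robin placement, the server names `ts1`..`ts3` and the function `place_replicas` are all hypothetical; the slides do not describe how the Master actually chooses placements.

```python
from collections import Counter


def place_replicas(num_tablets: int, servers: list[str], replication_factor: int = 3):
    """Assign each tablet's replicas to distinct servers; the first replica is the leader.

    Hypothetical round-robin policy, for illustration only.
    """
    if replication_factor > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for t in range(num_tablets):
        # Consecutive servers (mod len(servers)) are always distinct for r < len(servers).
        replicas = [servers[(t + r) % len(servers)] for r in range(replication_factor)]
        placement[f"tablet-{t}"] = {"leader": replicas[0], "followers": replicas[1:]}
    return placement


servers = ["ts1", "ts2", "ts3"]
placement = place_replicas(4, servers)
leaders_per_server = Counter(p["leader"] for p in placement.values())
```

Rotating the starting server spreads leadership across machines (here ts1 leads two tablets, ts2 and ts3 one each), so no single server carries every tablet's leader role.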
• MemRowSet (Col A, Col B, …)
  In-memory concurrent B-tree; keeps all recently-inserted rows.
• DiskRowSet (Col A, Col B, …, [Delta store]), two of them shown
  Base data: each column separately written in a single contiguous block of data.
  Delta store: deltas organized by rows (until compaction happens).
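The layout above can be modelled in a deliberately simplified Python sketch (not Kudu's code). Assumptions: a single-column key (`columns[0]`), a dict standing in for the concurrent B-tree, in-place updates to MemRowSet rows, and an explicit `compact()` call; real Kudu does its flushing and compaction in the background with considerably more machinery.

```python
class DiskRowSet:
    """Flushed rows: columnar base data plus a delta store keyed by row offset."""

    def __init__(self, rows, columns):
        self.columns = columns
        key_col = columns[0]
        ordered = sorted(rows, key=lambda r: r[key_col])
        # Base data: each column kept as its own contiguous list, in key order.
        self.base = {c: [r[c] for r in ordered] for c in columns}
        self.index = {k: i for i, k in enumerate(self.base[key_col])}
        # Delta store: changes organized by row offset until compaction.
        self.deltas = {}

    def update(self, key, column, value):
        self.deltas.setdefault(self.index[key], {})[column] = value

    def read(self, key):
        i = self.index[key]
        row = {c: self.base[c][i] for c in self.columns}
        row.update(self.deltas.get(i, {}))  # deltas override base values
        return row

    def compact(self):
        """Fold the deltas into the base columns, then empty the delta store."""
        for i, changes in self.deltas.items():
            for c, v in changes.items():
                self.base[c][i] = v
        self.deltas.clear()


class Tablet:
    """Toy tablet: one MemRowSet plus a list of DiskRowSets; columns[0] is the key."""

    def __init__(self, columns):
        self.columns = columns
        self.memrowset = {}  # stand-in for the in-memory concurrent B-tree
        self.diskrowsets = []

    def insert(self, row):
        key = row[self.columns[0]]
        if key in self.memrowset or any(key in d.index for d in self.diskrowsets):
            raise KeyError(f"duplicate key: {key}")
        self.memrowset[key] = dict(row)

    def flush(self):
        """Write the MemRowSet out as a new DiskRowSet and start an empty one."""
        if self.memrowset:
            self.diskrowsets.append(DiskRowSet(list(self.memrowset.values()), self.columns))
            self.memrowset = {}

    def _find(self, key):
        for d in self.diskrowsets:
            if key in d.index:
                return d
        raise KeyError(key)

    def update(self, key, column, value):
        if key in self.memrowset:
            self.memrowset[key][column] = value  # simplification: edit the in-memory row
        else:
            self._find(key).update(key, column, value)

    def read(self, key):
        if key in self.memrowset:
            return dict(self.memrowset[key])
        return self._find(key).read(key)
```

A typical sequence: `insert` puts a row in the MemRowSet, `flush` turns it into columnar base data, a later `update` is recorded only as a delta, and `compact` folds that delta back into the base columns.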
Long story short:
- 30% faster than Parquet 1.0 (TPC-H)
- 16-187 times faster than Phoenix or HBase (TPC-H again)
- hundreds of thousands of rows inserted per second on a single tablet server
TPC-H test, scale factor 100, RF 3
- 75 nodes, each: 64 GB RAM, 12 spinning disks, 2x 6-core Xeon
- Storage expansion of 62 GB of data (post-replication, compactions done):
  - 570 GB in HBase (9.2x)
  - 227 GB in Kudu (3.7x)
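A quick check of the expansion arithmetic. The per-replica figure is an extra derived number, not one reported in the source: it assumes both footprints contain exactly three copies of the data (RF 3).

```python
RAW_GB = 62
STORED_GB = {"HBase": 570, "Kudu": 227}  # post-replication, after compaction
REPLICATION_FACTOR = 3  # assumption: both systems store exactly 3 copies


def expansion(stored_gb: float, raw_gb: float = RAW_GB) -> float:
    """Total on-disk size relative to the input data."""
    return stored_gb / raw_gb


def per_replica_expansion(stored_gb: float, raw_gb: float = RAW_GB,
                          rf: int = REPLICATION_FACTOR) -> float:
    """Expansion of a single copy, i.e. total expansion divided by the replication factor."""
    return stored_gb / rf / raw_gb


for system, gb in STORED_GB.items():
    print(f"{system}: {expansion(gb):.1f}x total, {per_replica_expansion(gb):.2f}x per replica")
```

Under that assumption a single HBase copy is roughly 3x the input, while a single Kudu copy is only about 1.2x, which is where the columnar encoding pays off.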
http://getkudu.io/kudu.pdf
http://getkudu.io/
http://getkudu.io/faq.html
pmm@collective-sense.com