Are you Kudu-ing me?!

What is Kudu? What data model does it use? Why might it be better than Apache Parquet and NoSQL databases? Why are column-oriented databases important?

  1. These folks must all be wrong, mustn’t they?
  2. uuid         first_name  last_name  dob
     ee-c6-47-2c  John        Connor     Feb 28th, 1985
     84-ee-ff-d5  Sarah       Connor     May 11th, 1965
     57-4f-d9-d8  Kyle        Reese      Mar 1st, 2002

     SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'
  3. uuid:       ee-c6-47-2c | 84-ee-ff-d5 | 57-4f-d9-d8
     last_name:  Connor | Connor | Reese
     first_name: John | Sarah | Kyle
     dob:        Feb 28th, 1985 | May 11th, 1965 | Mar 1st, 2002

     SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'
  4. What’s the problem with Apache Parquet then?
  5. Ever implemented a Lambda Architecture?
  6. last_name  first_name  movie         actor                  actor_age
     Connor     John        Terminator 2  Edward Furlong         14
     Connor     John        Terminator 2  Michael Edwards        47
     Connor     Sarah       Terminator    Linda Hamilton         28
     Connor     Sarah       Terminator 2  Linda Hamilton         35
     Reese      Kyle        Terminator 2  Michael Biehn          35
     T-800                  Terminator    Arnold Schwarzenegger  37

     CREATE TABLE `characters` (
       last_name STRING,
       first_name STRING,
       movie STRING,
       actor STRING,
       actor_age INT
     )
     DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS
     TBLPROPERTIES (
       'kudu.key_columns' = 'last_name, first_name, movie, actor'
     )
  9. last_name  first_name  movie         actor                  actor_age
     Connor     John        Terminator 2  Edward Furlong         14
     Connor     John        Terminator 2  Michael Edwards        47
     Connor     Sarah       Terminator    Linda Hamilton         28
     Connor     Sarah       Terminator 2  Linda Hamilton         35
     Reese      Kyle        Terminator 2  Michael Biehn          35
     T-800                  Terminator    Arnold Schwarzenegger  37
  10. last_name  first_name  movie         actor                  actor_age
      Connor     John        Terminator 2  Edward Furlong         14
      Connor     John        Terminator 2  Michael Edwards        47
      Connor     Sarah       Terminator    Linda Hamilton         28
      Connor     Sarah       Terminator 2  Linda Hamilton         35
      Reese      Kyle        Terminator 2  Michael Biehn          35
      T-800                  Terminator    Arnold Schwarzenegger  37

      Somewhere between BigTable/HBase range partitioning and Cassandra’s hash partitioning.
  11. last_name:  Connor | Connor | Reese
      first_name: John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Michael Biehn
      actor_age:  14 | 47 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37
  12. last_name:  Connor | Connor | Reese
      first_name: John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Michael Biehn
      actor_age:  14 | 47 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      INSERT INTO characters (last_name, first_name, movie, actor, actor_age)
      VALUES ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)
  13. last_name:  Connor | Connor | Connor | Reese
      first_name: John | John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
      actor_age:  14 | 47 | 36 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      INSERT INTO characters (last_name, first_name, movie, actor, actor_age)
      VALUES ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)

      Delta
  14. last_name:  Connor | Connor | Connor | Reese
      first_name: John | John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
      actor_age:  14 | 47 | 36 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      SELECT MAX(actor_age) FROM characters WHERE last_name = 'Connor'
  15. last_name:  Connor | Connor | Connor | Reese
      first_name: John | John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
      actor_age:  14 | 47 | 36 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      SELECT MAX(actor_age) FROM characters WHERE last_name = 'Connor'

      MPP FTW
  16. last_name:  Connor | Connor | Connor | Reese
      first_name: John | John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
      actor_age:  14 | 47 | 36 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      SELECT MAX(actor_age) FROM characters WHERE movie = 'Terminator 2'
  17. last_name:  Connor | Connor | Connor | Reese
      first_name: John | John | John | Kyle
      movie:      Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
      actor:      Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
      actor_age:  14 | 47 | 36 | 35

      last_name:  Connor | Connor
      first_name: Sarah | Sarah
      movie:      Terminator | Terminator 2
      actor:      Linda Hamilton | Linda Hamilton
      actor_age:  28 | 35

      last_name:  T-800
      first_name:
      movie:      Terminator
      actor:      Arnold Schwarzenegger
      actor_age:  37

      SELECT MAX(actor_age) FROM characters WHERE movie = 'Terminator 2'

      Bloom filters FTW
  18. [Diagram: Tablet Server 1, Tablet Server 2, Master]
  19. [Diagram: Master, Master replica, Tablet Server 1, Tablet Server 2, Tablet Server 3; the label "Leader" appears four times]
  20. [Diagram: Master, Master replica, Tablet Server 1, Tablet Server 2, Tablet Server 3; the label "Leader" appears four times]
      Typically 10-100 tablets per machine.
  21. MemRowSet (Col A, Col B, …)
      - In-memory concurrent B-tree
      - Keeps all recently inserted rows
      DiskRowSet (Col A, Col B, …, [Delta store]), shown twice
      - Base data: each column written separately in a single contiguous block of data
      - Deltas: organized by rows (until compaction happens)
  22. Long story short:
      - 30% faster than Parquet 1.0 (TPC-H)
      - 16-187 times faster than Phoenix or HBase (TPC-H again)
      - hundreds of thousands of rows inserted per second on a single tablet server
  23. TPC-H test, scale factor 100, RF 3
      - 75 nodes, each: 64 GB RAM, 12 spinning disks, 2x 6-core Xeon
      - Expansion of 62 GB of data (post-replication, compactions done):
        - 570 GB in HBase (9.2x)
        - 227 GB in Kudu (3.7x)
      http://getkudu.io/kudu.pdf
  24. http://getkudu.io/
      http://getkudu.io/faq.html
  25. pmm@collective-sense.com
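
The contrast between the row layout on slide 2 and the column layout on slide 3 can be sketched in a few lines of Python. This is only an illustration, not how Kudu or Parquet store data: the function names and the "fields read" counter are assumptions made up for this sketch, and the data is the three-row example from the slides. It shows why a query that filters on one column and aggregates another touches less data when each column is stored contiguously.

```python
from datetime import date

# Row-oriented layout (slide 2): each record is stored as one unit.
rows = [
    ("ee-c6-47-2c", "John", "Connor", date(1985, 2, 28)),
    ("84-ee-ff-d5", "Sarah", "Connor", date(1965, 5, 11)),
    ("57-4f-d9-d8", "Kyle", "Reese", date(2002, 3, 1)),
]

# Column-oriented layout (slide 3): each column is stored contiguously.
columns = {
    "uuid": [r[0] for r in rows],
    "first_name": [r[1] for r in rows],
    "last_name": [r[2] for r in rows],
    "dob": [r[3] for r in rows],
}


def min_dob_row_store(rows, last_name):
    """SELECT MIN(dob) ... WHERE last_name = ?, reading whole records.

    Returns (result, fields_read). fields_read is a hypothetical cost
    counter: a row store is modeled as fetching every field of each record.
    """
    fields_read = 0
    best = None
    for record in rows:
        fields_read += len(record)
        if record[2] == last_name and (best is None or record[3] < best):
            best = record[3]
    return best, fields_read


def min_dob_column_store(columns, last_name):
    """Same query, reading only the last_name and dob columns."""
    fields_read = 0
    matching = []
    # Pass 1: scan only the filter column.
    for i, value in enumerate(columns["last_name"]):
        fields_read += 1
        if value == last_name:
            matching.append(i)
    # Pass 2: read dob only at the matching positions.
    best = None
    for i in matching:
        fields_read += 1
        dob = columns["dob"][i]
        if best is None or dob < best:
            best = dob
    return best, fields_read


if __name__ == "__main__":
    print(min_dob_row_store(rows, "Connor"))
    print(min_dob_column_store(columns, "Connor"))
```

Both functions return the same date of birth; only the number of fields touched differs, and the gap grows with the number of columns the query does not need.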
