
User Defined Partitioning on PlazmaDB


  1. T R E A S U R E D A T A. USER DEFINED PARTITIONING: A New Partitioning Strategy Accelerating CDP Workload. Kai Sasaki, Software Engineer at Treasure Data.
  2. ABOUT ME. Kai Sasaki (@Lewuathe). Software Engineer at Treasure Data since 2015, working in the Query Engine Team (managing Hive and Presto in Treasure Data). Contributor to Hadoop, Spark, and Presto.
  3. TOPICS. PlazmaDB: the metadata storage for all log data in Treasure Data. It supports import, export, INSERT INTO, CREATE TABLE, DELETE, etc. on top of the PostgreSQL transaction mechanism. Time Index Partitioning: partitioning log data by the time the log was generated, which is recorded in the "time" column in Treasure Data. It lets us skip reading unnecessary partitions. User Defined Partitioning (New!): in addition to the "time" column, any column can be used as a partitioning key, giving a more flexible partitioning strategy that fits the CDP workload.
  4. OVERVIEW OF QUERY ENGINE IN TD
  5. PRESTO IN TREASURE DATA. Multiple clusters with 50-60 workers each, running Presto 0.188. Stats at the end of 2017: 4.3+ million queries / month, 400 trillion records / month, 6+ PB / month.
  6. HIVE AND PRESTO ON PLAZMADB. Data flows from Bulk Import, Fluentd, and the Mobile SDK into PlazmaDB; Presto and Hive (SQL, CDP) read from PlazmaDB, backed by Amazon S3.
  7. PLAZMADB. Partition metadata records (multi-column indexes, 1-hour partitioning; data stored under s3://plazma-partitions/…):

        id       data_set_id  first_index_key  last_index_key  record_count  path
     P1 3065124  187250       1412323028       1412385139      109           abcdefg-1234567-abcdefg-1234567
     P2 3065125  187250       1412323030       1412324030      209           abcdefg-1234567-abcdefg-9182841
     P3 3065126  187250       1412327028       1412328028      31            abcdefg-1234567-abcdefg-5818231
     P4 3065127  187250       1412325011      1412326001       102           abcdefg-1234567-abcdefg-7271828
     P5 3065128  281254       1412324214      1412325210       987           abcdefg-1234567-abcdefg-6717284
     P6 3065129  281254       1412325123      1412329800       541           abcdefg-1234567-abcdefg-5717274
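The metadata records above make time-index pruning cheap: a query's time range is compared against each partition's first/last index keys. A minimal sketch of the idea, with illustrative values and names (not the real PlazmaDB schema or API):

```python
# Illustrative 1-hour partitions; first/last_index_key are the min/max
# "time" values covered by each partition (hypothetical values).
partitions = [
    {"id": "P1", "first_index_key": 1412320800, "last_index_key": 1412324399},
    {"id": "P2", "first_index_key": 1412324400, "last_index_key": 1412327999},
    {"id": "P3", "first_index_key": 1412328000, "last_index_key": 1412331599},
]

def prune_by_time(parts, t_from, t_to):
    # Keep only partitions whose index range overlaps the query's time range.
    return [p for p in parts
            if p["first_index_key"] <= t_to and p["last_index_key"] >= t_from]

# WHERE time BETWEEN 1412325000 AND 1412326000 reads only P2.
hits = prune_by_time(partitions, 1412325000, 1412326000)
```

Note that a predicate on any other column (say, user_id) yields no time range, so no partition can be excluded; that is the problem the next slides describe.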
  8. PLAZMADB. PlazmaDB manages a Realtime Storage and an Archive Storage, both on Amazon S3. A MapReduce job runs periodically to keep the 1-hour time-indexed partitioning.
  9. PROBLEM. Time index partitioning is efficient only when a "time" value is specified; a predicate on any other column causes a full scan, which can degrade performance. Also, the number of records in a partition depends heavily on the table type and user usage. For example, SELECT COUNT(1) FROM table WHERE user_id = 1; must scan every partition:

        id       data_set_id  first_index_key  last_index_key  record_count  path
     P1 3065124  100          1412323028       1412385139      1             abcdefg-1234567-abcdefg-1234567
     P2 3065125  100          1412323030       1412324030      1             abcdefg-1234567-abcdefg-9182841
     P3 3065126  100          1412327028       1412328028      1             abcdefg-1234567-abcdefg-5818231
     P4 3065127  200          1412325011      1412326001       101021        abcdefg-1234567-abcdefg-7271828
  10. USER DEFINED PARTITIONING
  11. USER DEFINED PARTITIONING. Users can specify a partitioning strategy based on their usage, using a partitioning key column and a max time range. [Diagram: 1-hour time partitions along the time axis, each split into buckets by column c1 with values v1, v2, v3.]
  12. USER DEFINED PARTITIONING. With buckets on c1 inside each 1-hour time partition, a query with WHERE c1 = 'v1' AND time = … reads only the v1 bucket within the matching time range.
  13. USER DEFINED PARTITIONING. The same pruning applies across multiple time ranges: WHERE c1 = 'v1' AND time = … touches only the v1 buckets in the relevant 1-hour partitions.
  14. USER DEFINED PARTITIONING workflow: (1) Set the user defined configuration: the number of buckets, the hash function, and the partitioning key. (2) CREATE TABLE via Presto or Hive. (3) Insert data, partitioned by the configured partitioning key. (4) Read the data from the UDP table, which is now visible via Presto and Hive.
  15. USER DEFINED CONFIGURATION. We need to set the columns used as the partitioning key and the number of partitions; this is a custom per-table configuration set by each user:

        user_table_id  columns                                       bucket_count  partition_function
     T1 141849         [["o_orderkey","long"]]                       32            hash
     T2 141850         [["user_id","long"]]                          32            hash
     T3 141910         [["item_id","long"]]                          16            hash
     T4 151242         [["region_id","long"],["device_id","long"]]   256           hash
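A hypothetical sketch (not the real Plazma implementation) of how such a configuration row could drive bucket assignment; Python's built-in hash stands in for the real "hash" partition_function:

```python
import json

# Configuration shaped like row T2 above: partitioning key columns as a
# JSON list of [name, type] pairs, plus bucket count and hash function name.
config = {
    "columns": json.loads('[["user_id", "long"]]'),
    "bucket_count": 32,
    "partition_function": "hash",
}

def bucket_for(row, config):
    # Build the key tuple from the configured columns, then hash it
    # into one of bucket_count buckets.
    key = tuple(row[name] for name, _type in config["columns"])
    return hash(key) % config["bucket_count"]
```

Rows with the same partitioning key value always land in the same bucket, which is what makes bucket pruning and colocated joins possible later.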
  16. CREATE UDP TABLE VIA PRESTO. Presto and Hive support CREATE TABLE / INSERT INTO on a UDP table:

     CREATE TABLE udp_customer
     WITH (
       bucketed_on = array['customer_id'],
       bucket_count = 128
     )
     AS SELECT * FROM normal_customer;
  17. CREATE UDP TABLE VIA PRESTO. Override ConnectorPageSink to write MPC1 files based on the user defined partitioning key. [Diagram: a PlazmaPageSink passes each Page to a PartitionedMPCWriter, which routes rows to per-bucket writers (b1, b2, b3, …) built from TimeRangeMPCWriters (1h ranges) and BufferedMPCWriters.]
  18. CREATE UDP TABLE VIA PRESTO. [Diagram repeated: PlazmaPageSink → PartitionedMPCWriter → TimeRangeMPCWriters → BufferedMPCWriters for each incoming Page.]
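The writer hierarchy in the two slides above can be sketched roughly as follows; this is a hypothetical simplification in which each (bucket, 1-hour range) pair gets its own buffered writer, with Python's hash standing in for the real bucket function and a plain list standing in for an MPC1 file:

```python
from collections import defaultdict

BUCKET_COUNT = 4

class BufferedWriter:
    """Stand-in for a BufferedMPCWriter: buffers rows for one MPC1 file."""
    def __init__(self):
        self.rows = []
    def append(self, row):
        self.rows.append(row)

# One writer per (bucket, hour) pair, created lazily as rows arrive.
writers = defaultdict(BufferedWriter)

def sink_page(page):
    # Route every row of an incoming Page to the writer for its
    # bucket (by partitioning key) and 1-hour time range.
    for row in page:
        bucket = hash((row["user_id"],)) % BUCKET_COUNT
        hour = row["time"] // 3600
        writers[(bucket, hour)].append(row)

sink_page([
    {"user_id": 1, "time": 1412323028},
    {"user_id": 2, "time": 1412323030},
    {"user_id": 1, "time": 1412327028},  # same user, different hour
])
```

Each resulting buffer then corresponds to one MPC1 file holding a single bucket's data within a single 1-hour range.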
  19. CREATE UDP TABLE VIA PRESTO. A new bucket_number column is added to the partition record in PlazmaDB:

        id       data_set_id  first_index_key  last_index_key  record_count  path                             bucket_number
     P1 3065124  187250       1412323028       1412385139      109           abcdefg-1234567-abcdefg-1234567  1
     P2 3065125  187250       1412323030       1412324030      209           abcdefg-1234567-abcdefg-9182841  2
     P3 3065126  187250       1412327028       1412328028      31            abcdefg-1234567-abcdefg-5818231  3
     P4 3065127  187250       1412325011      1412326001       102           abcdefg-1234567-abcdefg-7271828  2
     P5 3065128  281254       1412324214      1412325210       987           abcdefg-1234567-abcdefg-6717284  16
     P6 3065129  281254       1412325123      1412329800       541           abcdefg-1234567-abcdefg-5717274  14
  20. READ DATA FROM UDP TABLE. (1) Override the Presto connector for the data source: Presto provides a plugin mechanism to connect any data source flexibly; the connector supplies the metadata, the location of the real data source, and UDFs. (2) Receive the constraint as a TupleDomain: the TupleDomain is created from the query plan and passed through the TableLayout, which is available in ConnectorSplitManager. (3) Decide the target bucket from the constraint: the constraint specifies the range to be read from the table, and ConnectorSplitManager asks PlazmaDB for the partitions in the target bucket. ConnectorSplitManager#getSplits then returns the data source splits to be read by the Presto cluster.
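The split pruning described above might be sketched like this; a hypothetical simplification in which the TupleDomain constraint is reduced to a dict of point predicates and Python's hash stands in for the real bucket function:

```python
BUCKET_COUNT = 16

# Partition records as stored in PlazmaDB, each tagged with its bucket.
partitions = [
    {"path": "p1", "bucket_number": 3},
    {"path": "p2", "bucket_number": 7},
    {"path": "p3", "bucket_number": 3},
]

def get_splits(constraint):
    # If the constraint pins the partitioning key to a value, compute the
    # target bucket and fetch only that bucket's partitions from metadata.
    if "user_id" in constraint:
        target = hash((constraint["user_id"],)) % BUCKET_COUNT
        return [p for p in partitions if p["bucket_number"] == target]
    # No predicate on the partitioning key: buckets cannot be pruned.
    return list(partitions)

splits = get_splits({"user_id": 42})
```

This mirrors the "WHERE bucket_number IN (…)" metadata query on the next slide: the engine reads only the splits whose bucket can possibly contain matching rows.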
  21. READ DATA FROM UDP TABLE. [Diagram: the SQL constraint flows through the TableLayout as a Map<ColumnHandle, Domain> into the SplitManager, which queries PlazmaDB with "… WHERE bucket_number IN (…) …" and distributes the resulting splits to PageSources.]
  22. PERFORMANCE
  23. PERFORMANCE COMPARISON. [Bar chart: elapsed time (sec) of SQLs on TPC-H (scale factor = 1000) for count1_filter, groupby, and hashjoin, comparing NORMAL vs UDP tables; plotted values (sec): 87.279, 36.569, 1.04, 266.71, 69.374, 19.478.]
  24. COLOCATED JOIN. [Diagram: in a distributed join, matching left/right rows (l1/r1, l2/r2, l3/r3) must be shuffled across workers; in a colocated join, both sides are bucketed the same way within each time range, so each pair is joined locally.]
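A minimal sketch of the colocated-join idea, assuming both tables are bucketed on the join key with the same function and bucket count; each bucket pair can then be joined independently, with no network shuffle (illustrative code, not the engine's implementation):

```python
BUCKETS = 4
bucket = lambda k: hash((k,)) % BUCKETS  # same bucket function for both sides

left  = [(1, "l1"), (2, "l2"), (3, "l3")]
right = [(1, "r1"), (2, "r2"), (3, "r3")]

def bucketize(rows):
    # Route each (key, value) row to its bucket.
    out = {b: [] for b in range(BUCKETS)}
    for k, v in rows:
        out[bucket(k)].append((k, v))
    return out

def colocated_join(lb, rb):
    # Join each bucket pair independently; because both sides used the
    # same bucket function, matching keys are guaranteed to be colocated.
    result = []
    for b in range(BUCKETS):
        rmap = dict(rb[b])
        result += [(k, lv, rmap[k]) for k, lv in lb[b] if k in rmap]
    return result

joined = colocated_join(bucketize(left), bucketize(right))
```

In the engine this means each worker joins only its own buckets, which is where the hashjoin speedup in the benchmark comes from.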
  25. PERFORMANCE COMPARISON. [Bar chart: elapsed time (sec), 0-80 sec scale, of SQLs on TPC-H (scale factor = 1000) for between, mod_predicate, and count_distinct, comparing NORMAL vs UDP tables.]
  26. USER DEFINED PARTITIONING. [Diagram: two layouts of 1-hour time partitions with c1 buckets v1, v2, v3, each shown under a query with only WHERE time = ….]
  27. FUTURE WORKS. Maintaining an efficient partitioning structure. Developing a Stella job to rearrange the partitioning schema flexibly by using Presto resources. Supporting UDP tables in various kinds of pipelines (streaming import, etc.). Documentation.
  28. T R E A S U R E D A T A
