Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Automatic Clustering at Snowflake

18 views

Published on

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2OKHJt9.

Prasanna Rajaperumal presents Snowflake’s clustering capabilities, including their algorithm for incremental maintenance of approximate clustering of partitioned tables, as well as their infrastructure to perform such maintenance automatically. He also covers some real-world problems they run into and their solutions. Filmed at qconlondon.com.

Prasanna Rajaperumal is a senior engineer at Snowflake, working on Snowflake Databases' Query Engine. Before Snowflake, he worked on building the next generation Data infrastructure at Uber. Over the last decade, he has been building data systems that scale in Cloudera, Cisco and few other companies before that.

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Automatic Clustering at Snowflake

  1. 1. © 2019 Snowflake Computing Inc. All Rights Reserved AUTOMATIC CLUSTERING PRASANNA RAJAPERUMAL I MARCH 2019
  2. 2. InfoQ.com: News & Community Site Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ snowflake-automatic-clustering/ • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon London www.qconlondon.com
  4. 4. © 2018 Snowflake Computing Inc. All Rights Reserved SNOWFLAKE 2 Our vision Allow our customers to access all their data in one place so they can make actionable decisions anytime, anywhere, with any number of users. Our solution Next-generation data warehouse built from the ground up for the cloud to address today’s data and analytics challenges. Delivered as a service Built for the cloud SQL Data Warehouse • Memory Management • Degree of parallelism • Failure recovery • Scale Clusters …. • Clustering Automatic
  5. 5. ARCHITECTURE Shared disk architecture Data stored in S3 Metadata stored separately Compute on-demand
  6. 6. TABLE DATA Partitioned horizontally into files (!) Columnar (Hybrid) storage (PAX) Column values grouped Compressed Header contains index of column start Name Age Country Sam 30 USA Trevor 35 Canada Anna 38 England Raj 40 India Row Oriented StorageColumn Oriented StorageHybrid Columnar Storage
  7. 7. TABLE DATA Immutable File is a unit of Update Every change is files deleted and files added Concurrency Sized to be few 10’s of MB at the most 10’s of millions of files Sam 30 USA Trevor 35 Canada Sam 30 USA Trevor 45 Canada File 1: Update table T set age = 45 where name = “Trevor” File 2:
  8. 8. METADATA Stats on every column for every file Zone maps (Netezza) Min/Max, #distinct, #nulls etc Improve Query performance Pruning: Reduce scan File1 Id: 3, 5, 9 date: 4/1, 4/2, 4/3 File2 Id: 6, 1, 10 date: 4/4, 4/3, 4/4 File3 Id: 7, 9, 10 date: 4/5, 4/5, 4/5 File4 Id: 1, 9, 10 date: 4/5, 4/6, 4/6 File Name Col Min Max File1 Id date 3 4/1 9 4/3 File2 Id date 1 4/3 10 4/4 File3 Id date 7 4/5 10 4/5 File4 Id date 1 4/5 10 4/6
  9. 9. INDEX COMPARISON Why not B-Trees? Uni-dimensional Index Maintenance IO Cost on large queries source
  10. 10. © 2019 Snowflake Computing Inc. All Rights Reserved DEFAULT CLUSTERING 8 Partitioned on ingestion • Based on physical property: size • Clustered based on load time • Not optimized for other dimensions 01/01/2018 02/01/2018 03/01/2018 04/01/2018 05/01/2018 06/01/2018 07/01/2018 08/01/2018 09/01/2018 10/01/2018 11/01/2018 12/01/2018 Pruned Partitions Query: Sept-Dec Select … from orders where date between ’09-01-2018’ and ’12-01-2018’ Select … from orders where product = ‘IPhone’
  11. 11. © 2019 Snowflake Computing Inc. All Rights Reserved PARTITIONING COMPARISON 9 Hash based • Not efficient for range scans Range based • Strict boundaries • Range Splitting when it gets too big Snowflake • Key not used for data distribution • Overlaps in ranges are okay • Achieve good pruning with zone maps (min/max) Partition 1 Partition 2 Partition 3 Hash (key) Partition 1 Partition 2 Partition 3 < Key < 0 - 25 26 - 50 51 - 75 Partition 1 Partition 2 Partition 3 < Column < 0 - 30 20 - 60 50 - 75
  12. 12. © 2019 Snowflake Computing Inc. All Rights Reserved DEFAULT CLUSTERED TABLE 10 Micro-partition 1 Micro-partition 2 Micro-partition 3 Micro-partition 4 Name Country Rows 1-6 Rows 7-12 Rows 13-18 Rows 19-24 2 4 3 2 3 2 3 2 4 5 1 5 2 4 2 2 5 3 1 4 5 5 3 2 A C C B A C Z B C X A A X Z Y B X A C Z Y B X Z ID Name Country Date UK SP DE DE FR SP DE UK NL FR NL FR FR NL SP SP DE UK FR NL SP SP DE UK 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/3 11/3 11/3 11/2 11/2 11/2 11/3 11/3 11/4 11/3 11/4 11/4 11/5 11/5 11/5 Query Select count(*) from T where id = 2 and date = ’11/2’; Natural ordering by date 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/3 11/3 11/3 11/2 11/2 11/2 11/3 11/3 11/4 11/3 11/4 11/4 11/5 11/5 11/5 Date ID 2 4 3 2 4 3 3 2 4 5 1 5 2 4 2 2 5 3 1 4 5 5 3 2 Scans 3 partitions
  13. 13. © 2019 Snowflake Computing Inc. All Rights Reserved EXPLICIT CLUSTERING AUTOMATIC CLUSTERING Automatically reorganizes table storage from natural order to Clustering keys order Clustering Keys Identify interesting expressions Should be order preserving
  14. 14. © 2019 Snowflake Computing Inc. All Rights Reserved PROBLEM DEFINTION CONTINIOUS SORTING AT PETABYTE SCALE
  15. 15. © 2019 Snowflake Computing Inc. All Rights Reserved EXPLICITLY CLUSTERED TABLE 13 Micro-partition 1 Micro-partition 2 Micro-partition 3 Micro-partition 4 Name Country Rows 1-6 Rows 7-12 Rows 13-18 Rows 19-24 2 4 3 2 3 2 3 2 4 5 1 5 2 4 2 2 5 3 1 4 5 5 3 2 A C C B A C Z B C X A A X Z Y B X A C Z Y B X Z ID Name Country Date UK SP DE DE FR SP DE UK NL FR NL FR FR NL SP SP DE UK FR NL SP SP DE UK 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/3 11/3 11/3 11/2 11/2 11/2 11/3 11/3 11/4 11/3 11/4 11/4 11/5 11/5 11/5 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/2 11/3 11/3 11/3 11/3 11/3 11/3 11/4 11/4 11/4 11/5 11/5 11/5 Date ID 2 2 2 2 2 2 3 3 3 4 4 4 5 5 5 1 2 1 3 4 5 5 3 2 Clustering keys (date, id) Scans only 1 partition Query Select count(*) from T where id = 2 and date = ’11/2’;
  16. 16. © 2019 Snowflake Computing Inc. All Rights Reserved WHY EXPLICIT CLUSTERING? GOOD CLUSTERING BETTER PRUNING QUERY PERFORMANCE JOIN OPTIMIZATIONS BETTER REDUCTION
  17. 17. © 2019 Snowflake Computing Inc. All Rights Reserved CHALLENGES: KEEPING UP WITH CHANGES 15 05/01/2018, [0, 100] New File [0, 10) [10, 20) [20, 30) [30, 40) [40, 50) [50, 60) [60, 70) [70, 80) [80, 90) [90, 100) [0, 10) [10, 20) [20, 30) [30, 40) [40, 50) [50, 60) [60, 70) [70, 80) [80, 90) [90, 100) NEW DATA ADDED PARTITION REWRITE Original Immutable Files All Files Updated
  18. 18. © 2019 Snowflake Computing Inc. All Rights Reserved APPROACHES TO KEEP UP 16 Batch re-cluster Re-cluster inline with the changes Extremely expensive Block other changes Huge variation in Query performance Not practical on PB tables Full table re-write High write amplification Bound by re-cluster speed Not practical for TB/PB tables
  19. 19. © 2019 Snowflake Computing Inc. All Rights Reserved REQUIREMENTS 17 Actual sort order is not a requirement Goal – Good pruning with min/max Overlap of value ranges are acceptable Background Service Find incremental work to improve query performance Should not interfere with other changes Can change clustering keys
  20. 20. © 2019 Snowflake Computing Inc. All Rights Reserved PROBLEM DEFINTION CONTINIOUS SORTING AT PETABYTE SCALE Approximate
  21. 21. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING METRICS - WIDTH 19 Clustering value range Width of a partition Width of the line connecting the min and max value in the clustering values domain range 1 100 Sam 18 USA Trevor 78 Canada Anna 30 England Raj 35 India File1 : Wide File File2 : Narrow File Clustering key = AGE 18 78 File1 Width = 60 File 2 Width = 5 30 35
  22. 22. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING METRICS - DEPTH 20 Clustering value range Depth at a single value (or range) Number of partitions overlapping at a certain value in the clustering key domain value range 1 100 Sam 18 USA Trevor 78 Canada Anna 30 England Raj 35 India File1 File2 Clustering key = AGE 18 78 30 35 Query: 32 Depth: 3 27 100 Query: 70 Depth: 2 Query: 20 Depth: 1
  23. 23. © 2019 Snowflake Computing Inc. All Rights Reserved GOAL Reduce Worst Clustering Depth below an acceptable threshold to get Predictable Query Performance
  24. 24. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING SCENARIO A B C D E F G H I J K L Clustering Value Range A B B C D D E F G H I J J K L A C E H K depth=2 depth=2 width=11 Start State width=4 width=5 width=4
  25. 25. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING SCENARIO A B C D E F G H I J K L A B B C D D E F G H I J J K L A C E H K depth=3 depth=4 width=11 Iteration 1: Pick 3 files to recluster width=4 width=5 width=4 C F B E H E I D L K width=8 width=9 Clustering Value Range
  26. 26. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING SCENARIO A B C D E F G H I J K L A B B C D D E F G H I J J K L depth=2 depth=2 Improved Clustering state Iteration 2: width=4 width=5 width=4 E E E F H H H I K L A B C C D width=4 width=4 width=5 Clustering Value Range
  27. 27. © 2019 Snowflake Computing Inc. All Rights Reserved INSIGHT Reduce Worst Depth by Reducing overlapping (reduce width)
  28. 28. © 2019 Snowflake Computing Inc. All Rights Reserved 28 FILE SELECTION: INTUITION Clustering Value Range ClusteringDepth
  29. 29. © 2019 Snowflake Computing Inc. All Rights Reserved FILE SELECTION: INTUITION 29 Peak1 Clustering Value Range ClusteringDepth Target Depth Peak2
  30. 30. © 2019 Snowflake Computing Inc. All Rights Reserved AFTER RECLUSTERING 30 Partitions selected for clustering Clustering Value Range ClusteringDepth
  31. 31. © 2019 Snowflake Computing Inc. All Rights Reserved ALGORITHM OUTLINE 31 Re-cluster Execution Optimistic Commit Partition Batches CHANGES Partition Selection Stop Keep re-clustering until table is well clustered enough New Changes triggers partition selection Decouple Partition Selection from Execution Divide work into batches
  32. 32. © 2019 Snowflake Computing Inc. All Rights Reserved FILE SELECTION ALGORITHM INTERNALS Intuition: Work on the peaks • Sort the micro-partition endpoints • First pass: Compute peak ranges and calculate the number of overlapping micro-partitions • Stabbing count array • Second pass: For the peak ranges – compute list of partitions ordered by depth • MinMaxPriorityQueue
  33. 33. © 2019 Snowflake Computing Inc. All Rights Reserved CLUSTERING LEVELS 33 Clustering Key Domain LEVEL N… Reduces clustering width with levels More partitions with shorter range of data LEVEL 2: LEVEL 3: LEVEL 0: LEVEL 1:
  34. 34. © 2019 Snowflake Computing Inc. All Rights Reserved REVIEW Wake up when data arrives Split the table files into levels (clustering state) Run clustering partition selection algorithm on each level Queue up “recluster task” for execution Scale up and execute re-cluster tasks • Sized just right – optimized not to spill • Non-blocking – optimistic commit If clustering depth is not “good enough” repeat Else sleep
  35. 35. © 2019 Snowflake Computing Inc. All Rights Reserved EFFECT ON QUERY PERFORMANCE 35 Time Average Query Response time Background Recluster tasks Files Added T0 T1 T2 T3 T4 T5
  36. 36. © 2019 Snowflake Computing Inc. All Rights Reserved FUTURE WORK 36 Multi Dimension Keys • “Fake wide partition” problem • High cardinality column renders subsequent columns useless • Avoid excessive churn Manual cluster keys Partition selection based on usage Other linearization functions • Z-order, Grey-order, Hilbert curves Analyze workload to pick best clustering keys automatically Partition selection Phase 2
  37. 37. © 2019 Snowflake Computing Inc. All Rights Reserved JOIN US! One of the “Best Places to Work” every year Where else can you compete with Amazon, Google and Microsoft? ☺ Ambitious Projects Smart People Immediate Impact Fun Culture
  38. 38. THANK YOU © 2018 Snowflake Computing Inc. All Rights Reserved
  39. 39. © 2019 Snowflake Computing Inc. All Rights Reserved Jiaqi Yan
  40. 40. © 2019 Snowflake Computing Inc. All Rights Reserved “FAKE WIDE PARTITIONS” 40 A B C D 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5Actual Min/Max Column Metadata Min/Max A-1 A-3 A-A 1-3 A-5 B-1 A-B 1-5 B-2 B-4 B-B 2-4 B-4 C-2 B-C 2-4 C-3 C-4 C-C 3-4 C-5 D-2 C-D 2-5
  41. 41. © 2019 Snowflake Computing Inc. All Rights Reserved CASE STUDY 41 > 1PB table, trickle load every 3 minutes Provides dashboard service to customers Sub-second query time SLA Query by application IDs Huge skews cross applications Cluster by (app_id, time, name)
  42. 42. © 2019 Snowflake Computing Inc. All Rights Reserved DASHBOARD 42
  43. 43. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ snowflake-automatic-clustering/

×