
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019



In April 2019, we went on an excursion to the USA and presented selected publications of the TU Berlin DIMA and DFKI IAM research groups. This slide set contains the four teaser talks we presented on the tour:

1) Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes
2) Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors
3) Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
4) Andreas Kunft: BlockJoin: Efficient Matrix Partitioning through Joins

Published in: Science


  1. Database Research at TU Berlin, Database Systems and Information Management Group (DIMA) of Volker Markl. Today's talks:
     • Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes. ACM Symposium on Cloud Computing (SoCC), 2017.
     • Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. The VLDB Journal, 27(6), 2018.
     • Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Proceedings of the VLDB Endowment (PVLDB), 2017.
     • Andreas Kunft: BlockJoin: Efficient Matrix Partitioning Through Joins. Proceedings of the VLDB Endowment (PVLDB), 2017.
  2. Optimized On-Demand Data Streaming from Sensor Nodes (Traub et al., SoCC '17). Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. ACM Symposium on Cloud Computing (SoCC), 2017.
  7. The Sensor Cloud: billions of sensor nodes form a sensor cloud and provide data streams to analysis systems for real-time insights.
  10. The Sensor Cloud: Problems
     • Streaming all data from billions of sensors to all applications at maximal frequency is impossible.
     • Increasing data rates require expensive system scale-out.
  11. The Sensor Cloud: Solutions. Tailor data streams to the demand of applications:
     • User-Defined Sampling Functions (UDSFs): provide an abstraction to define the data demand of applications.
     • Read-Time Optimization: optimize communication costs while maintaining result accuracy.
     • Multi-Query / Multi-User Optimization: share sensor reads and data transfer among users and queries.
  12. Architecture Overview
  17. Sensor Read Scheduling
  18. User-Defined Sampling Functions: the input is the sensor read time and value; the output is the next sensor read request.
  20. User-Defined Sampling Functions: enable adaptive sampling techniques that reduce data transmission, e.g., AdaM [Trihinas '15], FAST [Fan '14], L-SIP [Gaura '13].
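The UDSF contract on these slides (input: read time and value; output: next read request) can be illustrated with a small C sketch. Everything beyond that contract is an assumption: the udsf_state struct, the function name udsf_next_read, and the halve/double interval rule are hypothetical and only loosely in the spirit of adaptive sampling. This is not the paper's API, and it is not AdaM, FAST, or L-SIP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical UDSF state; every field name and constant is an assumption. */
typedef struct {
    int64_t min_interval_ms; /* fastest allowed sampling interval */
    int64_t max_interval_ms; /* slowest allowed sampling interval */
    double threshold;        /* value change treated as significant */
    int64_t interval_ms;     /* current sampling interval */
    double last_value;       /* value of the previous read */
    int has_last;            /* 0 until the first read was seen */
} udsf_state;

/* Input: sensor read time and value. Output: time of the next read request.
 * Assumed rule: halve the interval after a significant change, double it
 * otherwise, and clamp it to [min_interval_ms, max_interval_ms]. */
int64_t udsf_next_read(udsf_state *s, int64_t read_time_ms, double value) {
    assert(s->min_interval_ms > 0 && s->min_interval_ms <= s->max_interval_ms);
    if (s->has_last) {
        double delta = value - s->last_value;
        if (delta < 0) delta = -delta;
        s->interval_ms = delta > s->threshold ? s->interval_ms / 2
                                              : s->interval_ms * 2;
        if (s->interval_ms < s->min_interval_ms) s->interval_ms = s->min_interval_ms;
        if (s->interval_ms > s->max_interval_ms) s->interval_ms = s->max_interval_ms;
    }
    s->last_value = value;
    s->has_last = 1;
    return read_time_ms + s->interval_ms;
}
```

With a 100 to 1600 ms range, a threshold of 0.5, and a 400 ms start interval, a first read at t=0 schedules the next read at 400 ms; a small change then stretches the interval to 800 ms, and a large change shrinks it back to 400 ms. A stable signal is therefore read less often, which is where the savings in data transmission come from.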
  21. Sensor Read Fusion
  23. Sensor Read Fusion
     1) Minimize sensor reads and data transfer: latest possible read time.
     2) Optimize sensor read times: check the paper for all details on the read-time optimizer.
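One way to see why the latest possible read time minimizes reads: assume, hypothetically, that each query's read request tolerates any read time in a window [earliest, latest]. Reading as late as the earliest pending deadline allows lets a single physical read serve every request whose window has already opened. The C sketch below is the classic greedy interval-stabbing algorithm under that assumption; the read_request type and fuse_reads function are illustrative names, not the paper's read-time optimizer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical read request: the query accepts any read time in [earliest, latest]. */
typedef struct {
    long earliest;
    long latest;
} read_request;

static int by_latest(const void *a, const void *b) {
    const read_request *x = a;
    const read_request *y = b;
    return (x->latest > y->latest) - (x->latest < y->latest);
}

/* Fuses read requests into few physical sensor reads (greedy interval stabbing).
 * Requests are visited by increasing deadline; an unserved request triggers a read
 * at its latest possible time, which also serves every later request whose window
 * has already opened. Sorts reqs in place, writes read times to out (room for n),
 * and returns the number of reads. */
size_t fuse_reads(read_request *reqs, size_t n, long *out) {
    qsort(reqs, n, sizeof *reqs, by_latest);
    size_t reads = 0;
    for (size_t i = 0; i < n; i++) {
        /* served if the last read time lies inside this request's window */
        if (reads == 0 || reqs[i].earliest > out[reads - 1]) {
            out[reads++] = reqs[i].latest;
        }
    }
    return reads;
}
```

For the windows [0, 10], [5, 12], [11, 20], and [15, 18], this schedules two reads (at 10 and 18) instead of four. Reading earlier than necessary would close off requests that could otherwise have shared the read.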
  24. Read Execution
  26. Local Filtering
     • Enable adaptive filtering in combination with adaptive sampling.
     • Enable model-driven data acquisition.
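Local filtering lets the sensor node drop reads before they are transmitted. As one simple example of such a filter (an assumption, not the filter used in the paper), a dead-band filter transmits a value only when it deviates from the last transmitted value by more than a tolerance. The deadband_filter struct and should_transmit function below are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node-local filter state (names are illustrative). */
typedef struct {
    double epsilon;   /* tolerated deviation before a value must be sent */
    double last_sent; /* last value transmitted to the stream processor */
    bool has_sent;    /* false until the first transmission */
} deadband_filter;

/* Returns true if the read must be transmitted, false if it can be dropped
 * locally because the receiver's last known value is still within epsilon. */
bool should_transmit(deadband_filter *f, double value) {
    double diff = value - f->last_sent;
    if (diff < 0) diff = -diff;
    if (!f->has_sent || diff > f->epsilon) {
        f->last_sent = value;
        f->has_sent = true;
        return true;
    }
    return false;
}
```

Combined with an adaptive sampling function, the node both reads less often and sends only the reads that change what the receiver knows.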
  27. Increasing the Number of Concurrent Queries (plot: independent queries vs. on-demand scheduling)
     • On-demand scheduling reduces sensor reads and data transfer by up to 87%.
     • The number of reads and transfers increases sub-linearly with the number of queries.
  28. Further Publications on Data Streams and Sensor Data:
     • Optimized On-Demand Data Streaming from Sensor Nodes. Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. ACM Symposium on Cloud Computing (SoCC), 2017.
     • Efficient Window Aggregation with General Stream Slicing. EDBT 2019.
     • I²: Interactive Real-Time Visualization for Streaming Data. EDBT 2017.
     • Resense: Transparent Record and Replay of Sensor Data in the Internet of Things. EDBT 2019.
  30. Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl. The VLDB Journal, 27(6), 797-822, 2018.
  35. Heterogeneous Processors: CPUs, MICs, GPUs. Goal: enable databases to automatically exploit heterogeneous processors. (S. Breß et al.: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. In The VLDB Journal, 27(6), 797-822, 2018)
  38. Problem and Key Ideas
     • Problem: writing efficient code for different processors is costly and error-prone.
     • Key Idea 1: generate custom code for each query and processor.
     • Key Idea 2: identify efficient code modifications and parameters automatically.
  42. Challenges
     • Intermediate Representation: represent code modifications in the query plan.
     • Variant Optimization: select efficient parameters and code modifications.
     • Code Generation: generate hardware-tailored code.
  47. Hawk Code Generator
     • Alternative Execution Engine: no changes to the SQL parser and optimizer.
     • Multi-Processor Support: execute queries on CPUs, GPUs, and MICs.
     • Automatic Performance Optimization: tunes code and parameters to the processors.
  48. Step 1: Query Segmentation (figure: SQL query)
  52. Step 2: Select Processor-Specific Code Variants (figure: Pipeline Program → Variant Optimizer → Optimized Pipeline Programs)
  56. Step 3: Generate Target Code (figure: Optimized Pipeline Programs → Code Generator → Target Code)
  60. Code Generator Details
  61. Pipeline Program IR: the SQL query SELECT id, age FROM person WHERE age < 25; is represented as a pipeline program.
  66. Pipeline Program IR (2): LOOP(person) → FILTER(age<25) → HASH_PUT(id) → PROJECT(id, age)
  71. Pipeline Program IR: Modifications. The pipeline program LOOP(table) → FILTER(age<25) → HASH_PUT(id) → PROJECT(id, age) exposes the modification dimensions Memory Access Pattern, Predication Mode, Hash Table Implementation, and Parallelization Strategy.
  76. Pipeline Program IR: Modifications (2). Each operator receives a value for its modification: LOOP(table, sequential) → FILTER(age<25, branched) → HASH_PUT(id, linear_probing) → PROJECT(id, age, single-pass)
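The annotation HASH_PUT(id, linear_probing) selects a linear-probing hash table for the build. A minimal C sketch of such an insert into an open-addressing table of integer keys follows; the lp_table layout, the multiplicative hash, and the EMPTY_KEY sentinel are assumptions for illustration, not Hawk's generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMPTY_KEY INT64_MIN /* hypothetical sentinel: assumes no key takes this value */

/* Open-addressing table; capacity must be a power of two. */
typedef struct {
    int64_t *keys;
    size_t capacity;
} lp_table;

static size_t hash_key(int64_t key, size_t capacity) {
    /* multiplicative hashing; any reasonable hash would do for the sketch */
    uint64_t h = (uint64_t)key * 0x9E3779B97F4A7C15ULL;
    return (size_t)(h >> 32) & (capacity - 1);
}

/* Inserts key with linear probing: on a collision, try the next slot (wrapping
 * around). Returns the slot index, the existing slot for a duplicate key, or
 * (size_t)-1 if the table is full. */
size_t hash_put(lp_table *t, int64_t key) {
    assert(t->capacity > 0 && (t->capacity & (t->capacity - 1)) == 0);
    size_t slot = hash_key(key, t->capacity);
    for (size_t probe = 0; probe < t->capacity; probe++) {
        if (t->keys[slot] == EMPTY_KEY) {
            t->keys[slot] = key;
            return slot;
        }
        if (t->keys[slot] == key) return slot;
        slot = (slot + 1) & (t->capacity - 1);
    }
    return (size_t)-1;
}
```

Linear probing keeps colliding keys in adjacent slots, which tends to be cache friendly on CPUs; whether it is the best choice on a given processor is exactly the kind of question the variant optimizer answers empirically.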
  77. Generating Code: Sequential Memory Access
         int thread_id = get_thread_id();
         start = start_idx(thread_id, num_rows);
         end = end_idx(thread_id, num_rows);
         for(tid = start; tid < end; tid += 1){
           if(age[tid] < 25){
             OUTPUT(id[tid], age[tid]);
           }
         }
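The sequential access pattern above gives each thread one contiguous chunk of rows. The runnable C sketch below simulates this by calling the per-thread body once per thread_id. The start_idx/end_idx helpers are hypothetical: unlike the slide's two-argument versions, they take num_threads explicitly, and OUTPUT is replaced by appending to output arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: split num_rows into num_threads contiguous chunks. */
static size_t start_idx(size_t thread_id, size_t num_threads, size_t num_rows) {
    size_t chunk = (num_rows + num_threads - 1) / num_threads;
    size_t start = thread_id * chunk;
    return start < num_rows ? start : num_rows;
}

static size_t end_idx(size_t thread_id, size_t num_threads, size_t num_rows) {
    return start_idx(thread_id + 1, num_threads, num_rows);
}

/* Work of one thread for LOOP(sequential) + FILTER(age < 25): scan a contiguous
 * chunk and append (id, age) of qualifying rows. Returns the number of results. */
size_t filter_sequential(size_t thread_id, size_t num_threads,
                         const int *id, const int *age, size_t num_rows,
                         int *out_id, int *out_age) {
    assert(num_threads > 0);
    size_t n = 0;
    size_t end = end_idx(thread_id, num_threads, num_rows);
    for (size_t tid = start_idx(thread_id, num_threads, num_rows); tid < end; tid++) {
        if (age[tid] < 25) {
            out_id[n] = id[tid];
            out_age[n] = age[tid];
            n++;
        }
    }
    return n;
}
```

With 8 rows and 3 threads, thread 0 scans rows 0 to 2, thread 1 rows 3 to 5, and thread 2 rows 6 and 7. On a CPU this keeps each core streaming through its own cache lines.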
  78. Memory Access Patterns
  79. Pipeline Program IR: Rewrite. LOOP(table, sequential) is rewritten to LOOP(table, coalesced), giving LOOP(table, coalesced) → FILTER(age<25, branched) → HASH_PUT(id, linear_probing) → PROJECT(id, age, single-pass).
  82. Generating Code: Coalesced Memory Access
         int thread_id = get_thread_id();
         int num_threads = get_num_threads();
         for(tid = thread_id; tid < num_rows; tid += num_threads){
           if(age[tid] < 25){
             OUTPUT(id[tid], age[tid]);
           }
         }
       Pipeline programs provide fine-grained control over the generated code.
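The coalesced variant changes only the loop bounds and step: thread t processes rows t, t + num_threads, t + 2·num_threads, and so on. In each loop step, neighbouring threads therefore touch neighbouring rows, which GPUs can typically serve with fewer memory transactions. The C sketch below simulates the threads serially. As in the slide, OUTPUT is replaced here by appending to hypothetical output arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Work of one thread for LOOP(coalesced) + FILTER(age < 25): visit every
 * num_threads-th row starting at thread_id and append qualifying (id, age). */
size_t filter_coalesced(size_t thread_id, size_t num_threads,
                        const int *id, const int *age, size_t num_rows,
                        int *out_id, int *out_age) {
    assert(num_threads > 0); /* a zero stride would never terminate */
    size_t n = 0;
    for (size_t tid = thread_id; tid < num_rows; tid += num_threads) {
        if (age[tid] < 25) {
            out_id[n] = id[tid];
            out_age[n] = age[tid];
            n++;
        }
    }
    return n;
}
```

Over the same 8 rows and 3 threads, thread 0 now visits rows 0, 3, 6; thread 1 rows 1, 4, 7; and thread 2 rows 2, 5. The union of results is identical to the sequential variant; only the order and the memory access pattern differ.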
  83. Performance: Memory Access Patterns
  84. Code Variant Optimization
  87. Terminology
     • Modification: a change to a pipeline program that preserves its semantics but changes the generated code.
     • Variant configuration: provides a value for each supported modification and thereby defines the generated code.
     • Code variant: the compilation result of a pipeline program.
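To make the terminology concrete, the C sketch below models a variant configuration as one enum value per modification dimension and renders the annotated pipeline program a code generator would compile into one code variant. Only "sequential", "coalesced", "branched", and "linear_probing" appear on the slides; the alternative values "predicated" and "cuckoo", and all type and function names, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One enum per modification dimension (values partly hypothetical). */
typedef enum { ACCESS_SEQUENTIAL, ACCESS_COALESCED } memory_access;
typedef enum { PRED_BRANCHED, PRED_SOFTWARE } predication_mode;
typedef enum { HT_LINEAR_PROBING, HT_CUCKOO } hash_table_impl;

/* Variant configuration: one value for each supported modification. */
typedef struct {
    memory_access access;
    predication_mode predication;
    hash_table_impl hash_table;
} variant_config;

static const char *access_name[] = {"sequential", "coalesced"};
static const char *pred_name[] = {"branched", "predicated"};
static const char *ht_name[] = {"linear_probing", "cuckoo"};

/* Annotates the example pipeline with a configuration's values: the input
 * from which a code generator would produce one code variant. */
int annotate_pipeline(const variant_config *c, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
                    "LOOP(person, %s) FILTER(age<25, %s) HASH_PUT(id, %s)",
                    access_name[c->access], pred_name[c->predication],
                    ht_name[c->hash_table]);
}
```

Even these three binary dimensions already span 2 × 2 × 2 = 8 code variants for a single pipeline, which is why exhaustively compiling and timing every variant quickly becomes impractical.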
  91. Variant Optimization
     • Derive an efficient code variant for each processor.
     • Perform an offline calibration phase on a test workload.
     • Explore the impact of each code modification separately.
  92. Variant Optimization: Algorithm (figure: variant space ranging from slow to fast, with an initial variant, Variant 1, and Variant 2)
  112. Search Algorithm
     • Finds an efficient variant with a run time linear in the number of dimensions.
     • Code modifications are not strictly orthogonal (the space is not convex).
     • Perform multiple iterations of the algorithm to find the best code variant.
  116. Optimizing Search Time
     • Early Termination: terminate the search if no faster variant is found during an iteration.
     • Feature Ordering: explore the parameter values of the most critical modifications first.
     • Nested Modifications: only include code modifications that change the code.
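The search described on the last slides (explore one modification at a time, repeat because dimensions interact, stop early when an iteration finds nothing faster, visit critical dimensions first) can be sketched as an iterated coordinate-wise search. The C code below is a sketch under that reading, not Hawk's implementation. In particular, measure_fn stands in for compiling and timing a variant on the calibration workload, and demo_cost is a made-up cost table.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the run time of the code variant for a configuration (hypothetical). */
typedef double (*measure_fn)(const int *config, size_t dims);

/* Coordinate-wise search: for each dimension in the given order (e.g., most
 * critical first), try every value while keeping the other dimensions fixed
 * and keep the fastest. One pass needs sum(cardinality) measurements, i.e.,
 * linear in the number of dimensions. Passes repeat until one finds no faster
 * variant (early termination) or max_iters is reached. Returns the best time
 * and leaves the best configuration in config. */
double search_variant(int *config, const int *cardinality, const size_t *order,
                      size_t dims, int max_iters, measure_fn measure) {
    assert(max_iters > 0);
    double best = measure(config, dims);
    for (int iter = 0; iter < max_iters; iter++) {
        int improved = 0;
        for (size_t k = 0; k < dims; k++) {
            size_t d = order[k];
            int best_value = config[d];
            for (int v = 0; v < cardinality[d]; v++) {
                if (v == best_value) continue; /* already measured */
                config[d] = v;
                double t = measure(config, dims);
                if (t < best) { best = t; best_value = v; improved = 1; }
            }
            config[d] = best_value;
        }
        if (!improved) break; /* early termination */
    }
    return best;
}

/* Made-up run times for two 3-valued dimensions with interacting effects. */
static double demo_cost(const int *config, size_t dims) {
    static const double table[3][3] = {{9, 8, 7}, {6, 5, 1}, {4, 3, 2}};
    (void)dims;
    return table[config[0]][config[1]];
}
```

Starting from configuration {0, 0} with demo_cost, a single pass stops at {2, 2} with time 2, while a second pass finds {1, 2} with time 1. This illustrates why dimensions that are not strictly orthogonal call for multiple iterations, and why a third pass without improvement can end the search.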
  118. Evaluation of Search Time (variant exploration times for SSB Q4.1 at scale factor 1): our strategy outperforms backtracking by up to two orders of magnitude.
  122. Handling Query Dependencies
     • Reuse Variant Configurations: the variant configuration of a processor serves as the starting point for further tuning.
     • Heuristic-Based Rewrites: set a query-dependent modification to another parameter value when we expect a performance improvement.
     • Example, Software Predication: switch FILTER to software predication when the selectivity is 50%.
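The software-predication rewrite replaces the data-dependent branch in FILTER with arithmetic. The C sketch below contrasts the two variants under the usual textbook formulation; function names are illustrative, and a real compiler may itself transform either loop. With unpredictable data, a branch on a predicate that holds for about half of the rows is mispredicted often, and the predicated form avoids that cost at the price of always writing.

```c
#include <assert.h>
#include <stddef.h>

/* Branched FILTER: writes a row index only if the predicate holds. Cheap when
 * the branch is predictable (selectivity close to 0% or 100%). */
size_t filter_branched(const int *age, size_t n, size_t *out_rows) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (age[i] < 25) out_rows[k++] = i;
    }
    return k;
}

/* Software-predicated FILTER: always writes the row index, then advances the
 * output cursor by the predicate result (0 or 1). out_rows needs room for n
 * entries; k <= i holds, so every write stays in bounds. */
size_t filter_predicated(const int *age, size_t n, size_t *out_rows) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out_rows[k] = i;
        k += (size_t)(age[i] < 25);
    }
    return k;
}
```

Both functions return the same qualifying row indices; they differ only in how the CPU executes them, which is why the choice can be made per query from the estimated selectivity.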
  126. Query Compilation Times
     • OpenCL compilation times are on the order of hundreds of milliseconds.
     • Compilation times grow linearly with the number of pipelines in a query.
  130. Evaluation Results
     • Performance differs among variants by up to two orders of magnitude.
     • Hawk reliably identifies efficient code variants for CPUs, GPUs, and MICs.
     • The best code depends on query characteristics.
  135. Conclusion: Hawk, a hardware-tailored code generator • Code Variant Generation: produce custom code variants for each processor • Variant Optimization: no manual tuning for a specific processor • Code: https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ
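The idea of selecting a code variant per processor instead of tuning by hand can be illustrated with a toy search. A minimal sketch, assuming an exhaustive benchmark over two hypothetical variants of one filtered aggregation; `branching_sum`, `predicated_sum`, `VARIANTS`, and `pick_fastest` are illustrative names, not Hawk's code generator or its actual optimization strategy:

```python
import time
from typing import Callable, Dict, List

def branching_sum(values: List[int], threshold: int) -> int:
    """Variant 1: filter with a data-dependent branch."""
    total = 0
    for v in values:
        if v < threshold:
            total += v
    return total

def predicated_sum(values: List[int], threshold: int) -> int:
    """Variant 2: branch-free style, multiply by the predicate result (0 or 1)."""
    return sum(v * (v < threshold) for v in values)

# Hypothetical variant space; a real generator would emit processor-specific code.
VARIANTS: Dict[str, Callable[[List[int], int], int]] = {
    "branching": branching_sum,
    "predicated": predicated_sum,
}

def pick_fastest(values: List[int], threshold: int, repeats: int = 3) -> str:
    """Time every variant on the given input and return the name of the fastest."""
    best_name, best_time = "", float("inf")
    reference = None
    for name, fn in VARIANTS.items():
        start = time.perf_counter()
        for _ in range(repeats):
            result = fn(values, threshold)
        elapsed = time.perf_counter() - start
        # All variants must compute the same answer; only their speed may differ.
        if reference is None:
            reference = result
        assert result == reference, f"variant {name} disagrees"
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name

if __name__ == "__main__":
    print(pick_fastest(list(range(100_000)), threshold=50_000))
```

On real hardware the candidates would be generated kernels for each processor, and the winner can change with the processor and the query, which matches the observation that the best code depends on query characteristics.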
  136. Further Publications on Data Management on Modern Hardware: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors, Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl, VLDB Journal, 27(6), 797-822, 2018 • Pipelined Query Processing in Coprocessor Environments, SIGMOD 2018 • Efficient and Scalable k-Means on GPUs, Datenbank-Spektrum 2018 • Analyzing Efficient Stream Processing on Modern Hardware, PVLDB 2019
  138. GPU-Accelerated Join Selectivity Estimation using KDE Models. Paper: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models, Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl, PVLDB, Volume 10, Issue 13, September 2017
  140. GPU-Accelerated Kernel Density Estimation for Join Selectivity Estimation [Figure: inside the database engine, the query optimizer produces the query plan and exchanges parameters and estimates with a statistical coprocessor] Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models, Martin Kiefer et al. PVLDB, 2017
  147. Background: Kernel Density Estimators [Figure: dataset → sample $S$ → kernels $K_H$ → estimate $\hat{P}_H$] The estimate is the average over the kernel contributions: $\hat{P}_H(\vec{x}) = \frac{1}{|S|} \sum_{i=1}^{|S|} K_H(s_i, \vec{x})$
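The estimator can be evaluated directly from its definition. A minimal sketch, assuming a Gaussian product kernel with a diagonal bandwidth given as one standard deviation $h_k$ per dimension (the slide does not fix the kernel; `gaussian_kernel`, `kde_density`, and `bandwidths` are illustrative names):

```python
import math
from typing import Sequence

def gaussian_kernel(s: Sequence[float], x: Sequence[float],
                    bandwidths: Sequence[float]) -> float:
    """K_H(s, x): product of 1-D Gaussian densities centred at sample point s."""
    value = 1.0
    for s_k, x_k, h_k in zip(s, x, bandwidths):
        z = (x_k - s_k) / h_k
        value *= math.exp(-0.5 * z * z) / (h_k * math.sqrt(2.0 * math.pi))
    return value

def kde_density(sample: Sequence[Sequence[float]], x: Sequence[float],
                bandwidths: Sequence[float]) -> float:
    """P_H(x) = 1/|S| * sum_i K_H(s_i, x): the average kernel contribution."""
    return sum(gaussian_kernel(s, x, bandwidths) for s in sample) / len(sample)

sample = [(1.0, 2.0), (1.5, 2.5), (4.0, 0.5)]
print(kde_density(sample, (1.2, 2.2), bandwidths=(0.5, 0.5)))
```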
  148. Background: Kernel Density Estimators. The selectivity of a region $\Omega$ is the average over the kernel contributions integrated over $\Omega$: $\mathrm{sel}(\Omega) = \frac{1}{|S|} \sum_{i=1}^{|S|} \int_{\Omega} K_H(s_i, \vec{x}) \, d\vec{x}$
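For an axis-aligned range $\Omega$ and a Gaussian product kernel, the integral factorises into one CDF difference per dimension. A minimal sketch under those assumptions (the kernel choice and the names `normal_cdf`, `kde_range_selectivity`, `box` are illustrative, not taken from the paper):

```python
import math
from typing import Sequence, Tuple

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def kde_range_selectivity(sample: Sequence[Sequence[float]],
                          box: Sequence[Tuple[float, float]],
                          bandwidths: Sequence[float]) -> float:
    """sel(Omega) = 1/|S| * sum_i (integral of K_H(s_i, x) over Omega).

    Omega is a box given as one (low, high) pair per dimension; with a Gaussian
    product kernel each point's integral is a product of 1-D CDF differences.
    """
    total = 0.0
    for s in sample:
        contribution = 1.0
        for s_k, (lo, hi), h_k in zip(s, box, bandwidths):
            contribution *= normal_cdf((hi - s_k) / h_k) - normal_cdf((lo - s_k) / h_k)
        total += contribution
    return total / len(sample)

sample = [(1.0, 2.0), (1.5, 2.5), (4.0, 0.5)]
print(kde_range_selectivity(sample, [(0.0, 2.0), (1.5, 3.0)], [0.5, 0.5]))
```

With very small bandwidths each sample point counts as either inside or outside $\Omega$, which amounts to evaluating the query on the sample; larger bandwidths spread each point's mass over its neighbourhood, much like coarser histogram buckets.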
  149. Background: Kernel Density Estimators for Multi-Dimensional Selectivity Estimation [1] [Figure panels: good fit, overfit, underfit] The bandwidth matrix $H$ controls the smoothing applied to the sample. • Range selections over base tables • Bandwidth optimization based on the estimation error • Easy model maintenance [1] Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation, SIGMOD'15
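Bandwidth optimization based on the estimation error can be sketched with a deliberately simple search. As a stand-in, this picks a single 1-D bandwidth from a fixed candidate list by minimizing the squared error over training queries whose true selectivities are computed on the full data; the grid search, the candidates, the data, and all names are assumptions for brevity, not the optimization procedure of [1]:

```python
import math
from typing import List, Sequence, Tuple

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def estimate(sample: Sequence[float], lo: float, hi: float, h: float) -> float:
    """1-D Gaussian KDE estimate of the fraction of values in [lo, hi]."""
    return sum(normal_cdf((hi - s) / h) - normal_cdf((lo - s) / h)
               for s in sample) / len(sample)

def true_selectivity(data: Sequence[float], lo: float, hi: float) -> float:
    """Exact fraction of values in [lo, hi], used as training feedback."""
    return sum(1 for v in data if lo <= v <= hi) / len(data)

def optimize_bandwidth(data: Sequence[float], sample: Sequence[float],
                       queries: List[Tuple[float, float]],
                       candidates: Sequence[float]) -> float:
    """Return the candidate bandwidth with the smallest squared estimation error."""
    def squared_error(h: float) -> float:
        return sum((estimate(sample, lo, hi, h) - true_selectivity(data, lo, hi)) ** 2
                   for lo, hi in queries)
    return min(candidates, key=squared_error)

data = [i / 10 for i in range(1000)]      # hypothetical column values 0.0 .. 99.9
sample = data[::50]                       # every 50th value: 20 sample points
queries = [(0.0, 10.0), (20.0, 35.0), (50.0, 52.0), (70.0, 99.0)]
print(optimize_bandwidth(data, sample, queries, [0.01, 0.1, 1.0, 5.0, 20.0, 100.0]))
```

Too small a bandwidth overfits the sample, too large a one underfits it; the error-driven choice lands in between, mirroring the three panels on the slide.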
  150. The Problem: Multi-Dimensional Join Selectivity Estimation • and generalization to multiple joins • Databases: independence assumption, which is often violated and introduces large errors, potentially leading to bad query plans • Research: various methods (e.g., sampling, sketches) • Our approach: kernel density estimators
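To see why a violated independence assumption hurts, consider two perfectly correlated columns. A minimal sketch on a hypothetical table in which column `b` always equals column `a`; the table and predicates are illustrative:

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]

def selectivity(rows: List[Row], pred: Callable[[Row], bool]) -> float:
    """Fraction of rows that satisfy the predicate."""
    return sum(1 for r in rows if pred(r)) / len(rows)

def c1(r: Row) -> bool:   # predicate on column a
    return r[0] < 10

def c2(r: Row) -> bool:   # predicate on column b
    return r[1] < 10

# Hypothetical, perfectly correlated table: b == a in every row.
rows: List[Row] = [(v, v) for v in range(100)]

true_sel = selectivity(rows, lambda r: c1(r) and c2(r))
independent_est = selectivity(rows, c1) * selectivity(rows, c2)
print(f"true={true_sel:.3f} independent={independent_est:.3f}")
```

The product of the marginal selectivities is ten times too low here, the kind of error the slide links to bad query plans.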
  151. Why KDEs for Join Selectivities? • Multivariate estimator: no independence assumption • Hybrid between samples and histograms: a small bandwidth amounts to evaluating the sample; increasing the bandwidth adds smoothing, similar to increasing bucket sizes • Bandwidth optimization selects a proper bandwidth
  160. The Approach: Join and Base Table Models • Join KDE model ($P$): bandwidth $H$ over a sample from $R_1 \bowtie_{R_1.A_1 = R_2.A_2} R_2$; compute $P(c_1 \wedge c_2)$ • Base table KDE models ($P_1$, $P_2$): bandwidth $H$ over a sample from $R_1$ and over a sample from $R_2$; compute $\sum_{v \in A} P_1(A_1 = v \wedge c_1) \cdot P_2(A_2 = v \wedge c_2)$ • Easy to evaluate, better estimates • Support for base table and join selectivities • Easy to construct and to maintain
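The base-table formula $\sum_{v \in A} P_1(A_1 = v \wedge c_1) \cdot P_2(A_2 = v \wedge c_2)$ can be checked on toy data. A minimal sketch, assuming the probabilities are plain sample frequencies without smoothing (the base table KDE models instead derive them from kernels with bandwidth $H$); names and data are hypothetical. When the samples are the full tables, the result equals the fraction of pairs in $R_1 \times R_2$ that satisfy the join and both predicates:

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, float]  # (join attribute value, predicate attribute value)

def value_probabilities(sample: List[Row], pred: Callable[[float], bool]) -> Dict[int, float]:
    """Empirical P(A = v AND c) for each join value v, taken from the sample."""
    counts = Counter(a for a, x in sample if pred(x))
    return {v: n / len(sample) for v, n in counts.items()}

def join_selectivity(sample1: List[Row], pred1: Callable[[float], bool],
                     sample2: List[Row], pred2: Callable[[float], bool]) -> float:
    """Sum over join values v of P1(A1 = v AND c1) * P2(A2 = v AND c2)."""
    p1 = value_probabilities(sample1, pred1)
    p2 = value_probabilities(sample2, pred2)
    return sum(prob * p2.get(v, 0.0) for v, prob in p1.items())

def below_one(x: float) -> bool:   # hypothetical selection predicate
    return x < 1.0

r1 = [(1, 0.5), (1, 2.0), (2, 0.1), (3, 0.9)]
r2 = [(1, 0.2), (2, 0.3), (2, 5.0), (4, 0.0)]
print(join_selectivity(r1, below_one, r2, below_one))   # 0.125: 2 of 4 x 4 pairs
```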
  164. Table Model: Computation Components. Selectivity: a sum over the cross product of the two samples • Invariant contributions: contribution of each sample point with respect to the selection predicate • Cross contribution: distance function on the join attributes of pairs of sample points
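The three components can be combined in a double loop over both samples. A minimal sketch, assuming 1-D predicate and join attributes, Gaussian kernels for the invariant contributions, and a hypothetical Gaussian-shaped cross contribution normalised to 1 at distance 0. The slide's selectivity formula is not preserved, so the form $\frac{1}{|S_1||S_2|}\sum_i\sum_j p_1^{(i)} p_2^{(j)} k(t_1^{(i)}.A - t_2^{(j)}.A)$ is an assumption, and all names are illustrative:

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (join attribute A, predicate attribute x)

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def invariant_contributions(sample: Sequence[Point], lo: float, hi: float,
                            h: float) -> List[float]:
    """Kernel mass of each point's predicate attribute inside [lo, hi].

    Depends only on the point and the predicate, not on the other sample.
    """
    return [normal_cdf((hi - x) / h) - normal_cdf((lo - x) / h) for _, x in sample]

def cross_contribution(a1: float, a2: float, h_join: float) -> float:
    """Hypothetical distance function on join values: 1 at distance 0, then decaying."""
    d = (a1 - a2) / h_join
    return math.exp(-0.5 * d * d)

def table_model_selectivity(s1: Sequence[Point], s2: Sequence[Point],
                            range1: Tuple[float, float], range2: Tuple[float, float],
                            h_pred: float, h_join: float) -> float:
    """Average of p1 * p2 * cross contribution over the cross product of the samples."""
    p1 = invariant_contributions(s1, range1[0], range1[1], h_pred)
    p2 = invariant_contributions(s2, range2[0], range2[1], h_pred)
    total = 0.0
    for (a1, _), c1 in zip(s1, p1):
        for (a2, _), c2 in zip(s2, p2):
            total += c1 * c2 * cross_contribution(a1, a2, h_join)
    return total / (len(s1) * len(s2))
```

With a tiny join bandwidth only exact key matches contribute, which reduces to the sample-frequency reading of the base-table formula; larger join bandwidths let nearby keys contribute as well.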
  169. Table Model: Sample Pruning [Figure: Sample 1 holds tuples $t_1^{(1)}, \dots, t_1^{(5)}$; compute their contributions $p_1^{(1)}, \dots, p_1^{(5)}$; filter by contribution, which keeps only $t_1^{(1)}$ and $t_1^{(4)}$ with $p_1^{(1)}$ and $p_1^{(4)}$]
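Sample pruning drops points whose contribution to the selection predicate is negligible before the cross product is formed. A minimal sketch, assuming 1-D Gaussian kernels and a hypothetical cut-off `epsilon` (the slide does not state how the filter threshold is chosen); the data mimic the figure, where the first and fourth tuples survive:

```python
import math
from typing import List, Sequence, Tuple

def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prune_sample(sample: Sequence[Tuple[float, float]], lo: float, hi: float,
                 h: float, epsilon: float = 1e-9) -> List[Tuple[float, float, float]]:
    """Compute each point's contribution to the predicate lo <= x <= hi and keep
    only the points whose contribution exceeds epsilon.

    Input points are (join value, predicate value); output adds the contribution.
    """
    kept = []
    for a, x in sample:
        p = normal_cdf((hi - x) / h) - normal_cdf((lo - x) / h)
        if p > epsilon:   # hypothetical cut-off
            kept.append((a, x, p))
    return kept

sample1 = [(1, 0.0), (2, 50.0), (3, 100.0), (4, 1.0), (5, 60.0)]
print([a for a, _, _ in prune_sample(sample1, lo=-1.0, hi=2.0, h=0.5)])   # [1, 4]
```

A positive `epsilon` makes the result approximate in exchange for a smaller cross product.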
  173. Table Model: Cross Pruning [Figure: Sample 1 (tuples $t_1^{(i)}$ with contributions $p_1^{(i)}$) and Sample 2, sorted on the join attribute (tuples $t_2^{(j)}$ with contributions $p_2^{(j)}$); pairs are only considered if $|t_1^{(i)}.A - t_2^{(j)}.A| < \theta$]
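Cross pruning can exploit the sort order of Sample 2: for each tuple of Sample 1, a binary search bounds the window of join values within distance $\theta$. A minimal sketch over bare join-attribute values (names are hypothetical; the function returns candidate index pairs rather than summing contributions):

```python
import bisect
from typing import List, Sequence, Tuple

def candidate_pairs(sample1: Sequence[float], sample2: Sequence[float],
                    theta: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with |sample1[i] - sample2[j]| < theta.

    sample2 is sorted on the join attribute once; for each value of sample1 a
    binary search finds the sample2 values in (a - theta, a + theta), so pairs
    outside that window are never visited.
    """
    order = sorted(range(len(sample2)), key=lambda j: sample2[j])
    keys = [sample2[j] for j in order]
    pairs = []
    for i, a in enumerate(sample1):
        start = bisect.bisect_right(keys, a - theta)   # first key > a - theta
        end = bisect.bisect_left(keys, a + theta)      # first key >= a + theta
        pairs.extend((i, order[k]) for k in range(start, end))
    return pairs

s1 = [1.0, 5.0, 9.0]
s2 = [8.5, 0.8, 5.2, 20.0, 1.3]
print(candidate_pairs(s1, s2, theta=0.6))
```

If the cross contribution is negligible beyond distance $\theta$, only these pairs need to enter the sum.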
  174. Evaluation: Scaling the Model Size (Postgres). Dataset: DMV, Query: Q1U
  175. Evaluation: Scaling the Model Size (Table Sample). Dataset: DMV, Query: Q1U
  176. Evaluation: Scaling the Model Size (Correlated Sample). Dataset: DMV, Query: Q1U
  177. Evaluation: Scaling the Model Size (AGMS Sketch). Dataset: DMV, Query: Q1U
  178. Evaluation: Scaling the Model Size (Join Sample). Dataset: DMV, Query: Q1U
  179. Evaluation: Scaling the Model Size (Join Sample + KDE). Dataset: DMV, Query: Q1U
  180. Evaluation: Scaling the Model Size (Table Sample + KDE). Dataset: DMV, Query: Q1U
  181. Runtime: CPU vs. GPU. Dataset: IMDB, Workload: Q1U, GPU: Tesla V100, CPU: Intel Xeon Gold 5115 • TS+KDE: 4x faster • JS+KDE: 5x faster [Figure: average estimation time in ms (log scale, 0.1 to 100) over sample size relative to base table size (1%, 2%, 4%, 8%, 16%) for TS+KDE and JS+KDE on GPU and CPU]
  182. Conclusion • KDE models for join selectivity estimation: "getting the most out of your sample" • Based on join or base table KDE models • A learning hybrid between histograms and samples • GPU acceleration possible • Experiments, data, and code online: github.com/martinkiefer/join-kde "Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models", PVLDB 2017
  183. Further Publications on GPU-Accelerated Kernel Density Estimation: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models, Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl, Proceedings of the VLDB Endowment, 10(13), 2017 • Demonstrating Transfer-Efficient Sample Maintenance on Graphics Cards, EDBT 2015 • Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation, SIGMOD 2015
  185. BlockJoin: Efficient Matrix Partitioning Through Joins. Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, Volker Markl, PVLDB, Volume 10, Issue 13, September 2017
  186. Introduction: a common pattern in end-to-end machine learning pipelines: 1. Relational operators, e.g., join and filter the input data 2. User-defined functions, e.g., feature transformation and vectorization 3. Linear algebra operators, e.g., model training and cross-validation [Figure: ⋈ → 𝒇 → ML] BlockJoin: Efficient Matrix Partitioning Through Joins, Andreas Kunft et al. PVLDB, 2017
  188. Introduction: Parallel dataflow engines implement • relational operators on row-partitioned datasets • linear algebra operators on block-partitioned matrices >> Pipelines combining both require expensive re-partitioning (shuffle) steps
  190. Standard Workflow [Figure: Products(PK, feature) = {(P1, 1), (P2, 2), (P3, 3)} and Reviews(FK, features) = {(P1, 1, 1, 1), (P2, 2, 2, 2), (P1, 3, 3, 3), (P1, 4, 4, 4)} are joined (⋈) into a row-wise join result {(P1, 1, 1, 1, 1), (P2, 2, 2, 2, 2), (P1, 1, 3, 3, 3), (P1, 1, 4, 4, 4)}; a global row-index 0 to 3 is applied to the join result, which is then split into a block-partitioned matrix]
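The three steps of the standard workflow can be traced on the example tables. A single-process sketch, assuming 2×2 blocks and hypothetical names; in a dataflow engine, step 1 would be a distributed join and step 3 a second shuffle:

```python
from typing import Dict, List, Tuple

# Example tables: Products(PK, feature) and Reviews(FK, three features).
products = [("P1", 1), ("P2", 2), ("P3", 3)]
reviews = [("P1", 1, 1, 1), ("P2", 2, 2, 2), ("P1", 3, 3, 3), ("P1", 4, 4, 4)]

Block = List[List[int]]

def standard_workflow(block_size: int) -> Dict[Tuple[int, int], Block]:
    # 1. Join: materialize the complete row-wise join result.
    pk_feature = dict(products)
    joined = [[pk_feature[fk], *features] for fk, *features in reviews if fk in pk_feature]
    # 2. Apply a sequential global row-index (row r is the r-th joined row).
    indexed = list(enumerate(joined))
    # 3. Re-partition: cut every row into slices and group them into square blocks.
    blocks: Dict[Tuple[int, int], Block] = {}
    for r, row in indexed:
        for c in range(0, len(row), block_size):
            key = (r // block_size, c // block_size)
            blocks.setdefault(key, []).append(row[c:c + block_size])
    return blocks

for key, block in sorted(standard_workflow(block_size=2).items()):
    print(key, block)
```

P3 has no review and therefore does not appear in the join result; every joined row is built in full only to be cut apart again in step 3.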
  192. Standard Workflow: Problems [Figure: the standard workflow, with the distributed join and the re-partitioning step highlighted] The workflow materializes the join result just to apply a sequential row-index: • it shuffles data for row-wise partitioning, only for the rows to be split up immediately afterwards • it puts heavy load on a few machines in case of skewed keys • it forces early matrix block materialization
  193. How Can We Improve? • We propose specialized operators at the intersection of linear and relational algebra • Here, we focus on the efficient creation of block-partitioned results from normalized data
  194. Our Approach [Figure: Products (PK) and Reviews (FK) → Local Join Kernel (local TID-join, prune, apply row-index) → Distributed Fetch Kernel → block-partitioned matrix]
  195. Our Approach: creates block-partitioned results from normalized data. Join kernel: a local TID-join on the driver creates block-index meta-data. 1. The meta-data provides a mapping from TID to row-index for both relations. 2. The row-index is applied independently: no materialization of the join result.
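The join kernel's meta-data can be sketched on the example keys. A minimal sketch, assuming TIDs are list positions, the keys fit on the driver, and row-indexes are assigned in the scan order of the FK relation; all names are hypothetical and this is not the paper's implementation:

```python
from typing import Dict, List, Tuple

Meta = List[Tuple[int, int, int]]  # (row_index, pk_tid, fk_tid)

def tid_join(pk_keys: List[str], fk_keys: List[str]) -> Meta:
    """Join only keys and tuple ids (TIDs), never the feature columns."""
    pk_tid_of: Dict[str, int] = {key: tid for tid, key in enumerate(pk_keys)}
    meta: Meta = []
    for fk_tid, key in enumerate(fk_keys):
        pk_tid = pk_tid_of.get(key)
        if pk_tid is not None:                      # prune tuples without a join partner
            meta.append((len(meta), pk_tid, fk_tid))
    return meta

def row_index_maps(meta: Meta) -> Tuple[Dict[int, List[int]], Dict[int, int]]:
    """Split the meta-data into one TID -> row-index mapping per relation."""
    pk_rows: Dict[int, List[int]] = {}
    fk_rows: Dict[int, int] = {}
    for row, pk_tid, fk_tid in meta:
        pk_rows.setdefault(pk_tid, []).append(row)  # a PK tuple can feed many rows
        fk_rows[fk_tid] = row                       # each FK tuple yields at most one row
    return pk_rows, fk_rows

meta = tid_join(["P1", "P2", "P3"], ["P1", "P2", "P1", "P1"])
pk_rows, fk_rows = row_index_maps(meta)
print(pk_rows)   # {0: [0, 2, 3], 1: [1]}
print(fk_rows)   # {0: 0, 1: 1, 2: 2, 3: 3}
```

With these maps, each relation can send its rows to block row `row // block_size` on its own, so the joined rows are never assembled; P3 (TID 2) is pruned because no review references it.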
  196. Our Approach: creates block-partitioned results from normalized data. Join kernel: a local TID-join on the driver creates block-index meta-data. Fetch kernel: the materialization strategy for matrix blocks depends on the matrix shape: • Late materialization: blocks are materialized on the receiver node, if |PK columns| >> |FK columns| • Early materialization: blocks are materialized on the sender node, if |PK columns| << |FK columns|
  197. Evaluation
  198. PK-FK Join. PK table: 100k rows, scaling columns; FK table: 1M rows, 5k columns. (a) Uniformly distributed FKs: up to 2.5x speedup. (b) Power-law distributed FKs: skew resistant, while the baseline fails.
  200. Recap: BlockJoin is a logically fused operator pipeline. • Separation of matrix index creation and matrix materialization > no materialization of the join result > skew resistant • Cost-based block materialization based on data shape > late materialization > early materialization
  201. Further Publications: BlockJoin: Efficient Matrix Partitioning Through Joins, Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, and Volker Markl, PVLDB 10(13), 2017 • Bridging the gap: towards optimization across linear and relational algebra, BeyondMR 2016 • Implicit Parallelism through Deep Language Embedding, SIGMOD 2015 • ScootR: Scaling R Dataframes on Dataflow Systems, SoCC 2018
