© Kriti Kathuria 2023
Low-Latency Data Access:
The Required Synergy Between CPU, Memory & Disk
Kriti Kathuria
Database Researcher
Kriti Kathuria (she/her)
Database Researcher
■ Conceptualizing Eventual Durability
■ SQL-gen for Incremental View Maintenance
■ Data Engineer in a past life
■ Good mentorship is powerful and fundamental
■ At scale, the insignificant become significant!
Motivation
● At scale, the insignificant become significant!
● A single I/O takes an insignificant amount of time.
● But with GBs of data and thousands of I/O operations, those latencies add up and become significant.
● Thus, at scale, p99 (tail) latency matters.
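One way to see why p99 matters at scale, offered as an illustration rather than something taken from the talk: if a request issues N I/Os and their latencies are assumed to be independent, the chance that at least one of them is slower than the single-I/O p99 is 1 - 0.99^N. The function name below is hypothetical.

```python
def prob_at_least_one_tail_io(num_ios: int, percentile: float = 0.99) -> float:
    """Probability that at least one of num_ios I/Os exceeds the given
    latency percentile, assuming the I/O latencies are independent."""
    if num_ios < 0 or not 0.0 < percentile < 1.0:
        raise ValueError("num_ios must be >= 0 and percentile must be in (0, 1)")
    # P(no I/O in the tail) = percentile ** num_ios; take the complement.
    return 1.0 - percentile ** num_ios


for n in (1, 10, 100, 1000):
    print(f"{n:>5} I/Os: {prob_at_least_one_tail_io(n):.5f}")
```

Under this independence assumption, a request that touches 1,000 I/Os waits on at least one p99-slow I/O with probability above 0.9999, so the per-I/O tail shows up in almost every request.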
Outline
● Motivation
● Existing Techniques
○ Aggregation Processing
○ Vectorization
○ Query Compilation
● Closing Remarks
Matrix Multiplication
● Example from MIT 6.172, Fall 2018, Lecture 1
● 3 matrices: A × B = C
[Figure: A (indices i, k) × B (indices k, j) = C (indices i, j)]
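The MIT 6.172 lecture cited here uses this computation to show, among other optimizations, that the order of the three nested loops changes the memory-access pattern. A minimal sketch of two loop orders (function names are hypothetical): both compute C[i][j] = Σ_k A[i][k] · B[k][j], but in the i-j-k order the innermost loop walks down a column of B, while in the i-k-j order it walks along rows of B and C. In a row-major language such as C, row-wise traversal is stride-1 and usually much more cache-friendly; in pure Python, interpreter overhead dominates, so this sketch shows the access pattern rather than the speedup.

```python
def matmul_ijk(A, B):
    """C = A x B with loops ordered i, j, k: the innermost loop walks
    down column j of B (B[0][j], B[1][j], ...)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj(A, B):
    """Same product with loops ordered i, k, j: the innermost loop walks
    along row k of B and row i of C, which is contiguous memory when the
    matrices are stored row-major (as in C)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = C[i]
        for k in range(m):
            a_ik = A[i][k]
            row_b = B[k]
            for j in range(p):
                row_c[j] += a_ik * row_b[j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_ijk(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
print(matmul_ikj(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```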
Processing aggregate queries in a database
Aggregation during run generation
● TPC-H, scale factor (SF) 1: 6M rows
● Filter: 5M rows fetched from disk
● Output: 4 rows
● TPC-H, SF 1000: 6B rows
● Filter: 5B rows fetched from disk
● Output: 4 rows
Thanh Do, Goetz Graefe, and Jeffrey Naughton. 2023. Efficient
Sorting, Duplicate Removal, Grouping, and Aggregation. ACM Trans.
Database Syst. 47, 4, Article 16 (December 2022), 35 pages.
https://doi.org/10.1145/3568027
[Figure: run generation for sorting → sorted runs → reduction of sorted data]
[Figure: in-memory index; unsorted data on disk]
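A minimal sketch contrasting two run-generation strategies, assuming (key, value) rows with SUM as the aggregate. The dict standing in for the in-memory index, the memory limits, and all function names are hypothetical; the algorithms in the Do, Graefe, and Naughton paper are considerably more refined, and this sketch does not model disk I/O. In the baseline, every row that passes the filter is sorted into some run and duplicates are removed only when the runs are merged; with aggregation during run generation, a row whose key is already in the in-memory index is folded into it immediately.

```python
import heapq
from itertools import groupby
from operator import itemgetter

KEY = itemgetter(0)  # rows are (group_key, value) pairs


def runs_sort_then_reduce(rows, memory_rows):
    """Baseline run generation: sort memory-sized chunks into runs.
    Every input row lands in some run; duplicates survive until the merge."""
    return [sorted(rows[start:start + memory_rows], key=KEY)
            for start in range(0, len(rows), memory_rows)]


def runs_with_early_aggregation(rows, memory_groups):
    """Run generation with aggregation: an in-memory index (a dict here)
    absorbs rows with a known key; a sorted run of partial sums is spilled
    only when a new key arrives and the index is already full."""
    runs, index = [], {}
    for key, value in rows:
        if key not in index and len(index) >= memory_groups:
            runs.append(sorted(index.items()))  # would be written to disk
            index = {}
        index[key] = index.get(key, 0) + value
    if index:
        runs.append(sorted(index.items()))
    return runs


def merge_and_reduce(runs):
    """Merge the sorted runs and sum the values of equal keys."""
    merged = heapq.merge(*runs, key=KEY)
    return [(key, sum(value for _, value in group))
            for key, group in groupby(merged, key=KEY)]


# Hypothetical input: 100,000 (key, 1) rows spread over 4 groups.
rows = [(f"group{i % 4}", 1) for i in range(100_000)]
baseline = runs_sort_then_reduce(rows, memory_rows=10_000)
early = runs_with_early_aggregation(rows, memory_groups=10_000)
print(sum(len(run) for run in baseline))  # 100000 rows written to runs
print(sum(len(run) for run in early))     # 4 rows written to runs
print(merge_and_reduce(early))
```

In this sketch, the volume written to runs depends on the number of groups rather than on the number of input rows, which is why an aggregate with only 4 output groups, like the TPC-H examples in this section, can stay small whether 5M or 5B rows pass the filter.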
Vectorized Query Processing
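Vectorized (vector-at-a-time) execution passes batches of values between operators instead of one tuple per call, so per-call overhead is paid once per batch and the inner loops run over compact arrays that suit CPU caches and SIMD. A minimal sketch of the difference, assuming a single integer column and the query SUM(x) WHERE x > threshold; the batch size and all function names are hypothetical. In pure Python the batching shows the structure, not the speedup.

```python
BATCH_SIZE = 1024  # hypothetical vector size; real engines tune this to fit in CPU cache


def scan_tuples(column):
    """Tuple-at-a-time scan: the consumer pulls one value per call."""
    for value in column:
        yield value


def sum_filtered_tuple_at_a_time(column, threshold):
    """SUM(x) WHERE x > threshold, pulling one tuple at a time."""
    total = 0
    for value in scan_tuples(column):  # one iterator step per row
        if value > threshold:
            total += value
    return total


def scan_batches(column, batch_size=BATCH_SIZE):
    """Vector-at-a-time scan: the consumer pulls a batch of values per call."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def sum_filtered_vectorized(column, threshold):
    """Same query, but each step works on a whole batch at once."""
    total = 0
    for batch in scan_batches(column):  # one iterator step per batch
        # Filter primitive: build a selection vector of qualifying positions.
        selection = [i for i, v in enumerate(batch) if v > threshold]
        # Aggregate primitive: consume only the selected positions.
        total += sum(batch[i] for i in selection)
    return total


data = list(range(10_000))
print(sum_filtered_tuple_at_a_time(data, 5000), sum_filtered_vectorized(data, 5000))
```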
JIT Query Compilation
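JIT query compilation replaces interpretation of a generic operator tree with code generated for one specific query: scan, filter, and aggregation are fused into a single loop with the predicate inlined, and that code is compiled at runtime (HyPer, for example, generates LLVM IR). A minimal sketch, assuming Python's built-in `compile` and `exec` as a stand-in for a real code generator and compiler backend; `compile_filter_sum` and the supported query shape are hypothetical.

```python
ALLOWED_OPS = {"<", "<=", ">", ">=", "==", "!="}


def compile_filter_sum(column: str, op: str, constant: int):
    """Generate and compile a function specialized to
    SELECT SUM(column) FROM table WHERE column <op> constant."""
    # Validate inputs before splicing them into generated source code.
    if not isinstance(column, str):
        raise TypeError("column must be a str")
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op!r}")
    if not isinstance(constant, int):
        raise TypeError("constant must be an int")

    # Scan, filter, and aggregate fused into one loop; the predicate and
    # constant are baked into the code instead of interpreted per row.
    source = (
        "def query(table):\n"
        "    total = 0\n"
        f"    for value in table[{column!r}]:\n"
        f"        if value {op} {constant!r}:\n"
        "            total += value\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["query"], source


query, source = compile_filter_sum("price", ">", 10)
print(source)
print(query({"price": [5, 12, 30, 8]}))  # 42
```

The generated function is compiled once and then reused for every row, which is where a real engine recovers the cost of code generation.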
Kriti Kathuria
linkedin.com/in/kriti-kathuria/
twitter.com/kaykathuria
Thank you! Let’s connect.