Square Root Decomposition
1. Square Root Decomposition
Square Root Decomposition is a technique that competitive programmers commonly
use to optimize range queries. It allows us to perform some common operations
(finding the sum of the elements of a given sub-array, finding the min / max
element, etc.) in O(sqrt(n)) per query, which is much faster than the naive
O(n) approach.
The key idea of this technique is to decompose the original array into smaller
chunks, each of size sqrt(n).
Let's say we have an array of n = 9 elements and we decompose this array into
small chunks of size sqrt(n).
[Figure: the example array, showing its index row and element row for n = 9]
Here, n = 9, so ceil(sqrt(n)) = 3.
This means we are going to decompose this array into 3 chunks, and every chunk
will have a precomputed value of the query we need to perform.
Suppose we are finding the sum of a range between l and r.
Then the decomposed array stores the sum of each block, so that we can build
further results from it.
[Figure: the same array split into Block - 0, Block - 1 and Block - 2, with the
Decomposed Array of block sums]
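The preprocessing step described above can be sketched roughly like this. It is
only a minimal sketch: the names buildBlockSums and chooseBlockSize are made up
for illustration, and it assumes the query is a range sum.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Block size used in the text: ceil(sqrt(n)).
// (Hypothetical helper; std::sqrt works on doubles, which is fine for
// contest-sized n but is an assumption of this sketch.)
int chooseBlockSize(int n) {
    int size = static_cast<int>(std::ceil(std::sqrt(static_cast<double>(n))));
    return size > 0 ? size : 1;
}

// Builds the decomposed array: one precomputed sum per block.
std::vector<long long> buildBlockSums(const std::vector<int>& original, int blockSize) {
    assert(blockSize > 0);
    int n = static_cast<int>(original.size());
    int blockCount = (n + blockSize - 1) / blockSize;  // the last block may be shorter
    std::vector<long long> decompose(blockCount, 0);
    for (int i = 0; i < n; ++i) {
        decompose[i / blockSize] += original[i];  // index i belongs to block i / blockSize
    }
    return decompose;
}
```

Building the decomposed array touches every element once, so the preprocessing
costs O(n).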
2. How do we know which block contains which index?
We can simply divide the index by sqrt(n) using integer division, which in this
case is 3.
So,
0 / 3 = 0 (block - 0)
1 / 3 = 0 (block - 0)
4 / 3 = 1 (block - 1)
8 / 3 = 2 (block - 2) and so on.
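As a small sketch of the index arithmetic above (the helper names blockOf,
blockStart and blockEnd are hypothetical, they are not part of the original
notes):

```cpp
#include <algorithm>
#include <cassert>

// Block that contains a given index: integer division by the block size.
int blockOf(int index, int blockSize) {
    assert(index >= 0 && blockSize > 0);
    return index / blockSize;
}

// First index covered by a block.
int blockStart(int block, int blockSize) {
    return block * blockSize;
}

// Last index covered by a block; the final block is clipped to n - 1
// when n is not a multiple of the block size.
int blockEnd(int block, int blockSize, int n) {
    return std::min((block + 1) * blockSize, n) - 1;
}
```

With a block size of 3, blockOf(4, 3) gives 1 and block 1 spans indices 3 to 5,
which matches the divisions listed above.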
Query range types
[Figure: the array split into Block - 0, Block - 1 and Block - 2, with the
Decomposed Array]
Given range is on block boundaries
l = 0 , r = 5
l = 3 , r = 5
In this type, the range covers whole blocks exactly, and we can answer it by
summing up the completely overlapped blocks.
For example,
l = 0 , r = 5
This query completely overlaps blocks 0 and 1.
So the answer is decompose[0] + decompose[1] = 12.
We get the result in just 2 steps, whereas the naive approach needs at
least 6 for this query.
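A minimal sketch of this boundary-aligned case is shown here. The function name
sumAlignedRange is hypothetical, and the sketch assumes l is the first index of
a block and r is the last index of a block.

```cpp
#include <cassert>
#include <vector>

// Sums whole blocks only, one step per fully covered block.
// Assumes [l, r] starts and ends exactly on block boundaries.
long long sumAlignedRange(const std::vector<long long>& decompose,
                          int blockSize, int l, int r) {
    assert(blockSize > 0 && 0 <= l && l <= r);
    assert(l % blockSize == 0 && (r + 1) % blockSize == 0);
    assert(r / blockSize < static_cast<int>(decompose.size()));
    long long ans = 0;
    for (int b = l / blockSize; b <= r / blockSize; ++b) {
        ans += decompose[b];
    }
    return ans;
}
```

For a decomposed array whose first two block sums are 8 and 4 (values picked
only so that they add up to the 12 quoted in the text), the call
sumAlignedRange(decompose, 3, 0, 5) adds decompose[0] and decompose[1].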
3. Time complexity of this type
In the worst case the range can be
l = 0 , r = n - 1
so we need to sum up sqrt(n) blocks.
As a result the complexity becomes O(sqrt(n)).
Given range is not on block boundaries
[Figure: the array split into Block - 0, Block - 1 and Block - 2, with the
Decomposed Array]
l = 1 , r = 6
l = 4 , r = 7
In this type, we sum the completely overlapped blocks from the decomposed array
and add the remaining tail values from the original array.
For example, l = 1 , r = 6.
In this segment, indices 3 to 5 make up block - 1,
so we can add decompose[1] to our answer.
At the start, indices 1 and 2 are taken from the original array,
so the answer becomes ans = decompose[1] + original[1] + original[2].
At the end, index 6 is taken from the original array.
As the final result,
ans = decompose[1] + original[1] + original[2] + original[6]
ans = 4 + 2 + 1 + 4
ans = 11
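The general case can be sketched as a head loop, a whole-block loop and a tail
loop. This is only a sketch under assumptions: the function name rangeSum is
hypothetical, and any array used to try it out has to be made up, since the
notes only quote a few values (original[1] = 2, original[2] = 1,
original[6] = 4, decompose[1] = 4).

```cpp
#include <cassert>
#include <vector>

// Range sum over [l, r] using the precomputed block sums where possible.
long long rangeSum(const std::vector<int>& original,
                   const std::vector<long long>& decompose,
                   int blockSize, int l, int r) {
    assert(blockSize > 0);
    assert(0 <= l && l <= r && r < static_cast<int>(original.size()));
    long long ans = 0;
    int i = l;
    // Head: single elements until we reach the start of a block.
    while (i <= r && i % blockSize != 0) {
        ans += original[i++];
    }
    // Middle: blocks that lie completely inside [l, r].
    while (i + blockSize - 1 <= r) {
        ans += decompose[i / blockSize];
        i += blockSize;
    }
    // Tail: leftover elements of the last, partially covered block.
    while (i <= r) {
        ans += original[i++];
    }
    return ans;
}
```

For l = 1 , r = 6 with block size 3, the head loop reads indices 1 and 2, the
middle loop adds decompose[1], and the tail loop reads index 6, which is the
same order of steps as in the walkthrough above.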
4. Time complexity of this type
In the worst case the range can be
l = 1 , r = n - 2
This means the number of completely overlapped blocks is sqrt(n) - 2, because
the first and last blocks are partial here.
So to calculate the overlapped part we need
sqrt(n) - 2 ~ sqrt(n) time,
and to calculate each of the first and last partial blocks
we need to go through sqrt(n) - 1 elements from the original array.
So from both sides,
(sqrt(n) - 1) + (sqrt(n) - 1)
So finally,
overlapped blocks + front partial block + back partial block
<= sqrt(n) + sqrt(n) + sqrt(n)
= 3 * sqrt(n) ~ O(sqrt(n)).
Implementation :
//preprocess the decomposed array