Kudo Codefest: Faster data retrieval with SQL query optimization

Faster Data Retrieval with SQL Query Optimization
Andrew Kaligis <andrew@kudo.co.id>
Ajeng Tya Meiranti <ajeng.tya@kudo.co.id>

Kudo uses agents as its primary business model.
PROBLEM
To make Kudo grow, we need to grow our agent network across all provinces.
More agents mean more transactions.
More transactions mean more kinds of data saved in our database.
Millions of rows are spread across hundreds of tables.

How can we search data faster across millions of rows?

How do we keep our performance up while our data keeps growing every day?

Indexing
“Indexing in a database is like the index in a book.”
Consider indexing a column when:
- The column is often used in a WHERE clause or a join condition.
- The column contains a wide range of values.
- The column contains many null values.
- The table is large and most queries retrieve less than 2 to 4% of its rows.
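A minimal sketch of what an index changes, using Python's built-in sqlite3 module as a stand-in for our production database (an assumption; the slides do not name the database engine). The agents table, its data, and the index name idx_agents_city are hypothetical. The city column has 500 distinct values, so each lookup touches 20 of 10,000 rows (0.2%), which fits the criteria above. EXPLAIN QUERY PLAN shows a full table scan before the index exists and an index search afterwards; the exact plan wording differs between SQLite versions.

```python
import sqlite3

# Hypothetical agents table; an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [(f"agent-{i}", f"city-{i % 500}") for i in range(10_000)],
)

QUERY = "SELECT agent_name FROM agents WHERE city = ?"


def plan(sql, params):
    """Return SQLite's query plan as text (last column of each EXPLAIN QUERY PLAN row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY, ("city-42",))  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_agents_city ON agents (city)")  # hypothetical index name
after = plan(QUERY, ("city-42",))   # city is in the WHERE clause, so the index is used

print("before:", before)
print("after: ", after)
```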

Avoid (SELECT * FROM)
Some programmers have a habit of writing "SELECT * FROM my_table".
A query with * returns every column of every row it reads, even columns the application never uses.
Example:
Our table has 6 columns (id, agent_name, address, city, province_id, distributor_id) and 1,000,000 rows.
Each cell contains 0.1 KB of data.
Fetch all columns (SELECT *):
0.1 KB * 6 columns * 1,000,000 rows = 600,000 KB (585.9 MB)
Fetch only the required columns (agent_name & city):
0.1 KB * 2 columns * 1,000,000 rows = 200,000 KB (195.3 MB)

The difference between the two queries is significant: fetching 2 of the 6 columns moves only a third of the data.
So never use * in a query unless you really need every column.
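To see the same effect on a small scale, here is a sketch with Python's sqlite3 and hypothetical data shaped like the 6-column agents table. The payload_bytes helper is hypothetical: it only sums the text length of every returned cell, so it approximates the result size rather than measuring real network traffic.

```python
import sqlite3

# Hypothetical 6-column agents table mirroring the example, with 1,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE agents (
        id INTEGER PRIMARY KEY, agent_name TEXT, address TEXT,
        city TEXT, province_id INTEGER, distributor_id INTEGER)"""
)
conn.executemany(
    "INSERT INTO agents (agent_name, address, city, province_id, distributor_id) "
    "VALUES (?, ?, ?, ?, ?)",
    [(f"agent-{i}", f"Jl. Example No. {i}", "Jakarta", 31, i % 10) for i in range(1000)],
)


def payload_bytes(sql):
    """Rough result size: total length of every returned cell rendered as text."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


all_columns = payload_bytes("SELECT * FROM agents")
needed_columns = payload_bytes("SELECT agent_name, city FROM agents")
print(all_columns, needed_columns, round(needed_columns / all_columns, 2))
```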

Pagination
Query with LIMIT and OFFSET
- Shows data to the end user faster, because only one page of rows is returned.
Case: showing 50 rows per page needs only 0.1 KB * 2 columns * 50 rows = 10 KB. Small, isn't it?
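A minimal pagination sketch with LIMIT and OFFSET, using Python's sqlite3 and a hypothetical agents table; PAGE_SIZE and fetch_page are illustrative names, not from the slides. ORDER BY is included so pages stay stable between requests, since without it the database may return rows in any order.

```python
import sqlite3

PAGE_SIZE = 50  # rows per page, as in the case above

# Hypothetical agents table with 1,000 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agents (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agents (agent_name, city) VALUES (?, ?)",
    [(f"agent-{i}", "Depok") for i in range(1, 1001)],
)


def fetch_page(page):
    """Return one page (1-based), fetching only the 2 columns shown to the user."""
    offset = (page - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT agent_name, city FROM agents ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


rows = fetch_page(3)
print(len(rows), rows[0])  # 50 ('agent-101', 'Depok')
```

One caveat: with OFFSET the database still has to step over the skipped rows, so very deep pages become slower than the first ones.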

Split “joined query”
Total Sales by Main Category
Category
id | name
1 | Fashion
2 | Healthy
3 | Electronic
4 | Others
5 | TV
6 | Tooth Health
7 | Shoes
8 | Toys
Map_Category
id | category_id | main_category_id
1 | 5 | 3
2 | 6 | 2
3 | 7 | 1
4 | 8 | 4
Item_Category
id | item_id | category_id
1 | 8001 | 5
2 | 8002 | 6
3 | 8003 | 7
4 | 8004 | 8
Order
id | item_id | total_sales
1 | 8001 | 3
2 | 8002 | 2
3 | 8003 | 1
4 | 8004 | 4

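Case Query
-- Total sales per main category in one nested query.
-- `order` is a reserved word in SQL, so the table name must be quoted (backticks in MySQL).
SELECT flag_category.main_category_id,
       SUM(o.total_sales) AS total_sales
FROM `order` AS o
LEFT JOIN (
    SELECT item_category.item_id,
           item_category.category_id,
           map_category.main_category_id
    FROM item_category
    LEFT JOIN map_category
        ON item_category.category_id = map_category.category_id
) AS flag_category
    ON o.item_id = flag_category.item_id
GROUP BY flag_category.main_category_id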

Part 1
Category
id | name
1 | Fashion
2 | Healthy
3 | Electronic
4 | Others
5 | TV
6 | Tooth Health
7 | Shoes
8 | Toys
Map_Category
id | category_id | main_category_id
1 | 5 | 3
2 | 6 | 2
3 | 7 | 1
4 | 8 | 4
SELECT category_id, main_category_id
FROM map_category

Part 2
Item_Category
id | item_id | category_id
1 | 8001 | 5
2 | 8002 | 6
3 | 8003 | 7
4 | 8004 | 8
SELECT category_id, item_id
FROM item_category

Part 3
Order
id | item_id | total_sales
1 | 8001 | 3
2 | 8002 | 2
3 | 8003 | 1
4 | 8004 | 4
SELECT item_id, total_sales
FROM `order`
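The split approach can be sketched as follows, with Python's sqlite3 as a stand-in database (an assumption) loaded with the sample rows from the slides. Each part is a simple single-table query, and the join itself happens in application code through dictionary lookups. This is one possible way to assemble the result, not necessarily how Kudo's services do it. Splitting tends to pay off when the small lookup tables (Category, Map_Category) change rarely and can be cached, so only the large order table needs a fresh query.

```python
import sqlite3
from collections import defaultdict

# Sample tables from the slides, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE map_category (id INTEGER PRIMARY KEY, category_id INTEGER, main_category_id INTEGER);
CREATE TABLE item_category (id INTEGER PRIMARY KEY, item_id INTEGER, category_id INTEGER);
CREATE TABLE "order" (id INTEGER PRIMARY KEY, item_id INTEGER, total_sales INTEGER);
INSERT INTO category VALUES (1, 'Fashion'), (2, 'Healthy'), (3, 'Electronic'), (4, 'Others'),
                            (5, 'TV'), (6, 'Tooth Health'), (7, 'Shoes'), (8, 'Toys');
INSERT INTO map_category VALUES (1, 5, 3), (2, 6, 2), (3, 7, 1), (4, 8, 4);
INSERT INTO item_category VALUES (1, 8001, 5), (2, 8002, 6), (3, 8003, 7), (4, 8004, 8);
INSERT INTO "order" VALUES (1, 8001, 3), (2, 8002, 2), (3, 8003, 1), (4, 8004, 4);
""")

# Part 1: sub-category id -> main category id
main_of = dict(conn.execute("SELECT category_id, main_category_id FROM map_category"))
# Part 2: item id -> sub-category id
category_of = dict(conn.execute("SELECT item_id, category_id FROM item_category"))
# Part 3: sales per order row
orders = conn.execute('SELECT item_id, total_sales FROM "order"').fetchall()
# Category names, only needed for display
names = dict(conn.execute("SELECT id, name FROM category"))

# Join in application code: every lookup is a dictionary access.
# A missing mapping yields None, like the NULLs a LEFT JOIN would produce.
total_by_main = defaultdict(int)
for item_id, total_sales in orders:
    main_id = main_of.get(category_of.get(item_id))
    total_by_main[main_id] += total_sales

by_name = {names.get(main_id): total for main_id, total in total_by_main.items()}
print(sorted(by_name.items()))
# [('Electronic', 3), ('Fashion', 1), ('Healthy', 2), ('Others', 4)]
```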

Caching Mechanism
Load data faster without querying the database

- Redis stores data in RAM.
- This makes fetching data faster, because reading data from RAM is faster than reading it from a hard disk.
- Redis is a key-value store.
- We can fetch a specific collection of data using its key.

Caching Mechanism: sample implementation

Denormalization table
- A denormalized table may contain rows with multiple values for an attribute (repeating groups).
More generally:
Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data.
https://en.wikipedia.org/wiki/Denormalization

Still, denormalization brings the danger of update anomalies back to the database.
Therefore, you have to do it deliberately. You should document any denormalization thoroughly.
Shipping
Id | name
1 | TIKI
2 | JNE
Address
Id | name
1 | Jakarta
2 | Depok
Item
Id | name
1 | Shoes
2 | Handphone
Order (denormalized)
Order_id | Order_date | Shipping_name | Address_name | Item_name
12010 | 2016/05/26 | TIKI | Jakarta | Handphone
12011 | 2016/05/26 | TIKI | Depok | Handphone
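A sketch of the example above with Python's sqlite3 (a stand-in database, an assumption). The normalized orders table and the order_denormalized table name are hypothetical. The denormalized copy stores the names redundantly, so reads need no joins; the rename of TIKI to 'TIKI Express' is a made-up change that shows the update anomaly the slide warns about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipping (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item     (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical normalized orders table: only foreign-key ids.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT,
                     shipping_id INTEGER, address_id INTEGER, item_id INTEGER);
INSERT INTO shipping VALUES (1, 'TIKI'), (2, 'JNE');
INSERT INTO address  VALUES (1, 'Jakarta'), (2, 'Depok');
INSERT INTO item     VALUES (1, 'Shoes'), (2, 'Handphone');
INSERT INTO orders VALUES (12010, '2016/05/26', 1, 1, 2), (12011, '2016/05/26', 1, 2, 2);

-- Denormalized copy: names are stored redundantly so reads need no joins.
CREATE TABLE order_denormalized AS
SELECT o.order_id, o.order_date,
       s.name AS shipping_name, a.name AS address_name, i.name AS item_name
FROM orders AS o
JOIN shipping AS s ON s.id = o.shipping_id
JOIN address  AS a ON a.id = o.address_id
JOIN item     AS i ON i.id = o.item_id;
""")

# Read path: one table, no joins.
rows = conn.execute(
    "SELECT order_id, shipping_name, address_name, item_name "
    "FROM order_denormalized ORDER BY order_id"
).fetchall()
print(rows)
# [(12010, 'TIKI', 'Jakarta', 'Handphone'), (12011, 'TIKI', 'Depok', 'Handphone')]

# Update anomaly: renaming a courier in the source table does not touch the copy.
conn.execute("UPDATE shipping SET name = 'TIKI Express' WHERE id = 1")  # hypothetical rename
stale = conn.execute("SELECT DISTINCT shipping_name FROM order_denormalized").fetchall()
print(stale)  # [('TIKI',)] -> the redundant copy must be refreshed explicitly
```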

“The fastest query is the one you never make.”

Andrew Kaligis
andrew@kudo.co.id
Ajeng Tya Meiranti
ajeng.tya@kudo.co.id
