Faster Data Retrieval with
SQL Query Optimization
<andrew@kudo.co.id>
<ajeng.tya@kudo.co.id>
Andrew Kaligis
Ajeng Tya Meiranti
Kudo uses agents as its primary business model
PROBLEM
To grow Kudo, we need to grow our agent base across all provinces.
More agents means more transactions.
Growing transactions means many kinds of data are saved into our database.
Millions of rows are spread across hundreds of tables in our database.
How do we search faster across
millions of rows?
How do we keep our
performance up while our
data keeps growing every
day?
Indexing
“An index in a database is like the index in a book”
Consider an index when:
• The column is often used in a WHERE clause or a join condition.
• The column contains a wide range of values.
• The column contains many null values.
• The table is large and most queries are expected to retrieve less than 2-4% of its rows.
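A minimal sketch of the first guideline, assuming Python's built-in sqlite3 module as a stand-in for the real database (the deck does not name its DBMS). The agent table layout and the index name idx_agent_city are hypothetical. EXPLAIN QUERY PLAN is SQLite's way of reporting whether a query scans the whole table or searches through an index; other databases have their own EXPLAIN variants.

```python
import sqlite3

# In-memory database with a hypothetical agent table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agent (agent_name, city) VALUES (?, ?)",
    [(f"agent_{i}", f"city_{i % 100}") for i in range(10_000)],
)


def query_plan(sql):
    """Return the plan detail strings SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


lookup = "SELECT agent_name FROM agent WHERE city = 'city_7'"

plan_before = query_plan(lookup)  # no index on city yet: full table scan
conn.execute("CREATE INDEX idx_agent_city ON agent (city)")
plan_after = query_plan(lookup)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

The column in the WHERE clause is the one that gets the index, which matches the first bullet above.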
Avoid
(SELECT * FROM)
Some programmers have a habit of writing "SELECT * FROM my_table".
A query with * selects every column during the table scan.
Example:
Our table has 6 columns (id, agent_name, address, city, province_id, distributor_id) and
1,000,000 rows.
Each cell contains 0.1 KB of data.
Fetch all columns:
0.1 KB * 6 columns * 1,000,000 rows
= 600,000 KB
(585.9 MB)
Fetch only the required columns (agent_name & city):
0.1 KB * 2 columns * 1,000,000 rows
= 200,000 KB
(195.3 MB)
Avoid
(SELECT * FROM)
The difference between the two queries is significant.
So never use * in a query unless you really need every column.
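A rough sketch of the comparison, assuming Python's sqlite3 module and a hypothetical agent table that reuses the six column names from the example. The row values are made up, and payload_size is only an illustrative measure (total characters fetched), not a real network or memory measurement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE agent (
           id INTEGER PRIMARY KEY, agent_name TEXT, address TEXT,
           city TEXT, province_id INTEGER, distributor_id INTEGER)"""
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO agent (agent_name, address, city, province_id, distributor_id) "
    "VALUES (?, ?, ?, ?, ?)",
    [(f"agent_{i}", f"street {i}", "Jakarta", 31, 7) for i in range(1000)],
)


def payload_size(sql):
    """Rough size of a result set: total characters of every fetched value."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


all_columns = payload_size("SELECT * FROM agent")
two_columns = payload_size("SELECT agent_name, city FROM agent")

print(all_columns, two_columns)
```

Naming the columns also keeps the query stable when someone later adds a wide column to the table.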
Pagination
Query Limit and Offset
Case:
Showing 50 rows per page needs only 0.1 KB * 2 columns * 50 rows = 10 KB (small, isn't it?)
• Data is retrieved faster and shown to the end user sooner
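The LIMIT and OFFSET idea can be sketched as follows, assuming Python's sqlite3 module, a hypothetical agent table, and a hypothetical helper named fetch_page. The page size of 50 comes from the case above; the 120 sample rows are made up.

```python
import sqlite3

PAGE_SIZE = 50  # rows per page, as in the case above

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agent (agent_name, city) VALUES (?, ?)",
    [(f"agent_{i}", "Jakarta") for i in range(1, 121)],  # 120 made-up rows
)


def fetch_page(page):
    """Return one page of (agent_name, city) rows; pages are numbered from 1."""
    offset = (page - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT agent_name, city FROM agent ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


first_page = fetch_page(1)  # rows 1..50
last_page = fetch_page(3)   # rows 101..120, a partial page
```

The ORDER BY matters: without a fixed order the database is free to return rows differently on each call, so pages could overlap or skip rows.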
Joining Many Tables Is Bad
Split “joined query”
Total Sales By Main Category

Category
id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category
id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

Item_Category
id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

Order
id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4
Split “joined query”
Case Query
-- "order" is a reserved word and must be quoted (MySQL uses backticks: `order`)
SELECT "order".total_sales
FROM "order"
LEFT JOIN (
    -- Attach the main category to every item
    SELECT item_category.item_id,
           item_category.category_id,
           map_category.main_category_id
    FROM item_category
    LEFT JOIN map_category
        ON item_category.category_id = map_category.category_id
    GROUP BY item_category.item_id
) AS flag_category
    ON "order".item_id = flag_category.item_id
Split “joined query”
Part 1

Category
id  name
1   Fashion
2   Healthy
3   Electronic
4   Others
5   TV
6   Tooth Health
7   Shoes
8   Toys

Map_Category
id  category_id  main_category_id
1   5            3
2   6            2
3   7            1
4   8            4

SELECT category_id, main_category_id
FROM map_category
Split “joined query”
Part 2

Item_Category
id  item_id  category_id
1   8001     5
2   8002     6
3   8003     7
4   8004     8

SELECT category_id, item_id
FROM item_category
Split “joined query”
Part 3

Order
id  item_id  total_sales
1   8001     3
2   8002     2
3   8003     1
4   8004     4

-- "order" is a reserved word and must be quoted (MySQL uses backticks: `order`)
SELECT item_id, total_sales
FROM "order"
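One way the three parts might be combined in application code is sketched below. It assumes Python's sqlite3 module and loads the sample rows from the overview slide; the variable names are hypothetical. This is an illustration of the splitting idea, not necessarily how Kudo implemented it.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE map_category (id INTEGER, category_id INTEGER, main_category_id INTEGER);
    CREATE TABLE item_category (id INTEGER, item_id INTEGER, category_id INTEGER);
    CREATE TABLE "order" (id INTEGER, item_id INTEGER, total_sales INTEGER);
    INSERT INTO map_category VALUES (1, 5, 3), (2, 6, 2), (3, 7, 1), (4, 8, 4);
    INSERT INTO item_category VALUES (1, 8001, 5), (2, 8002, 6), (3, 8003, 7), (4, 8004, 8);
    INSERT INTO "order" VALUES (1, 8001, 3), (2, 8002, 2), (3, 8003, 1), (4, 8004, 4);
    """
)

# Part 1: category_id -> main_category_id
main_of_category = dict(
    conn.execute("SELECT category_id, main_category_id FROM map_category")
)

# Part 2: item_id -> category_id
category_of_item = {
    item_id: category_id
    for category_id, item_id in conn.execute(
        "SELECT category_id, item_id FROM item_category"
    )
}

# Part 3: sales per item
sales_rows = conn.execute('SELECT item_id, total_sales FROM "order"').fetchall()

# Combine the three small result sets in application code instead of joining.
sales_by_main_category = defaultdict(int)
for item_id, total_sales in sales_rows:
    main_category = main_of_category.get(category_of_item.get(item_id))
    sales_by_main_category[main_category] += total_sales

print(dict(sales_by_main_category))
```

Each query touches a single table, and the two lookup dictionaries change rarely, so they are natural candidates for caching.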
Caching Mechanism
Load data faster without querying the database server
Caching Mechanism
• Redis uses RAM to store the data
• This makes fetching faster, because processing data in
RAM is faster than on a hard disk
• Redis uses a key-value data structure
• We can fetch a specific collection using a specific key
Caching Mechanism
• Sample implementation
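The original sample implementation slide was an image and is not reproduced here. As a substitute, here is a minimal cache-aside sketch. It does not talk to a real Redis server: InMemoryCache is a hypothetical stand-in exposing only get and set, and the agent table, the key format agents:city:<name>, and the sample names are all assumptions. With a real Redis client, values would also need to be serialized (for example as JSON) before being stored.

```python
import sqlite3


class InMemoryCache:
    """Hypothetical stand-in for a key-value cache such as Redis."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None when the key is missing

    def set(self, key, value):
        self._store[key] = value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO agent (agent_name, city) VALUES (?, ?)",
    [("Budi", "Jakarta"), ("Sari", "Depok"), ("Adi", "Jakarta")],  # made-up rows
)

cache = InMemoryCache()
stats = {"db_queries": 0}  # counts how often the database is actually queried


def agents_in_city(city):
    """Cache-aside read: try the cache first, fall back to the database."""
    key = f"agents:city:{city}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    stats["db_queries"] += 1
    rows = [
        name
        for (name,) in conn.execute(
            "SELECT agent_name FROM agent WHERE city = ? ORDER BY id", (city,)
        )
    ]
    cache.set(key, rows)
    return rows


first = agents_in_city("Jakarta")   # cache miss: runs the query
second = agents_in_city("Jakarta")  # cache hit: no query is made
```

A real deployment also needs an expiry or invalidation rule, so that cached entries do not go stale when the underlying rows change.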
Denormalization table
• A denormalized table contains rows with multiple values for an attribute (repeating groups)
or
Denormalization is the process of attempting to optimize the read
performance of a database by adding redundant data or by
grouping data.
https://en.wikipedia.org/wiki/Denormalization
Denormalization table
Still, denormalization brings the danger of update anomalies back to the database.
Therefore, you have to do it deliberately. You should document any denormalization
thoroughly.
Shipping
Id  name
1   TIKI
2   JNE

Address
Id  name
1   Jakarta
2   Depok

Item
Id  name
1   Shoes
2   Handphone

Denormalized table
Order_id  Order_date  Shipping_name  Address_name  Item_name
12010     2016/05/26  TIKI           Jakarta       Handphone
12011     2016/05/26  TIKI           Depok         Handphone
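A small sketch of the trade-off, assuming Python's sqlite3 module. The lookup rows and the two orders come from the tables above; the normalized order table (order_normalized, with id columns) is an assumption added for comparison, and all table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shipping (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item     (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO shipping VALUES (1, 'TIKI'), (2, 'JNE');
    INSERT INTO address  VALUES (1, 'Jakarta'), (2, 'Depok');
    INSERT INTO item     VALUES (1, 'Shoes'), (2, 'Handphone');

    -- Assumed normalized form: orders reference the lookup tables by id.
    CREATE TABLE order_normalized (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        shipping_id INTEGER, address_id INTEGER, item_id INTEGER);
    INSERT INTO order_normalized VALUES
        (12010, '2016/05/26', 1, 1, 2),
        (12011, '2016/05/26', 1, 2, 2);

    -- Denormalized form: the names are stored redundantly in each order row.
    CREATE TABLE order_denormalized (
        order_id INTEGER PRIMARY KEY, order_date TEXT,
        shipping_name TEXT, address_name TEXT, item_name TEXT);
    INSERT INTO order_denormalized
        SELECT o.order_id, o.order_date, s.name, a.name, i.name
        FROM order_normalized o
        JOIN shipping s ON s.id = o.shipping_id
        JOIN address  a ON a.id = o.address_id
        JOIN item     i ON i.id = o.item_id;
    """
)

# Reading the normalized data needs three joins...
joined = conn.execute(
    """SELECT o.order_id, o.order_date, s.name, a.name, i.name
       FROM order_normalized o
       JOIN shipping s ON s.id = o.shipping_id
       JOIN address  a ON a.id = o.address_id
       JOIN item     i ON i.id = o.item_id
       ORDER BY o.order_id"""
).fetchall()

# ...while the denormalized table answers the same question from one table.
flat = conn.execute(
    """SELECT order_id, order_date, shipping_name, address_name, item_name
       FROM order_denormalized
       ORDER BY order_id"""
).fetchall()
```

The cost is the update anomaly mentioned above: renaming a courier in shipping does not change the copies already stored in order_denormalized.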
“The fastest query is the one you never make”
Andrew Kaligis
andrew@kudo.co.id
Ajeng Tya Meiranti
ajeng.tya@kudo.co.id

Kudo Codefest: Faster Data Retrieval with SQL Query Optimization
