3. Areas Already Looked Into…
• Code Optimization
• SQL Query Tuning
• Keeping the latest data in a separate table from the table holding historic data
4. Areas To Be Looked Into…
• DB Design
– Disk I/O
• Number of Columns in a Table
• Choose right Data Types for Columns
– Tune DB Buffer Size
• To improve Caching
– Standard recommended solution to deal with large data set tables
• Table Partitioning
• Application Design
– Tune JDBC Code and Design
– Apply Application Level Cache
5. How Postgres Stores Data
• Disk Files
– Files under path: /var/lib/pgsql/avvqdb/pgdata/base/16386/
• Page Size
– The size of a page is fixed at 8,192 bytes
– All disk I/O is performed page by page; selecting even a single row
from a table reads at least one full page
– Heap page & Index Page
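To make the 8 KB granularity concrete, here is a rough back-of-the-envelope page count (the page-header and per-tuple overhead constants are approximations; real layouts vary with alignment, NULL bitmaps, and fill factor):

```python
import math

PAGE_SIZE = 8192          # fixed PostgreSQL page size in bytes
PAGE_HEADER = 24          # approximate page header size
TUPLE_OVERHEAD = 28       # approximate per-row header + item pointer

def rows_per_page(row_payload_bytes: int) -> int:
    """Rough estimate of how many rows fit in one heap page."""
    usable = PAGE_SIZE - PAGE_HEADER
    return usable // (row_payload_bytes + TUPLE_OVERHEAD)

def pages_for(total_rows: int, row_payload_bytes: int) -> int:
    """Minimum number of 8 KB pages a table of this shape occupies."""
    return math.ceil(total_rows / rows_per_page(row_payload_bytes))

# The test table from slide 9 (~29.7M rows), assuming ~60-byte rows:
print(pages_for(29_756_342, 60))  # roughly 323K pages, ~2.6 GB of heap
```

Every page that is not already cached costs one physical read, which is why the cache-hit behaviour below matters so much.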
• Heap and Index Cache Hits
– Disk I/O is expensive
– Postgres itself tracks the access patterns of your data and
automatically keeps frequently accessed data in cache
– Caches heap and index pages
– Insertion order matters for effective caching
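Whether the cache is actually being hit can be checked from the pg_statio_user_tables statistics view; a minimal sketch (the ratio arithmetic runs standalone, the SQL string is what you would feed to psql):

```python
# SQL to inspect per-table cache behaviour (run in psql):
HIT_RATIO_SQL = """
SELECT relname,
       heap_blks_read, heap_blks_hit,
       idx_blks_read,  idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read DESC;
"""

def hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Fraction of block requests served from the buffer cache."""
    total = blks_hit + blks_read
    return blks_hit / total if total else 1.0

# Hypothetical counters: 9,000 cache hits vs 1,000 physical reads
print(round(hit_ratio(9_000, 1_000), 2))  # 0.9
```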
6. Minimize Disk I/O
– Normalize Database
• Remove unnecessary columns
– Choose Right Data Types
• Avoid larger data types when the values are small enough to fit
into smaller ones
• This has a direct impact on cache hits
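The cache-hit impact of column widths can be illustrated with the 8 KB page arithmetic from slide 5 (the overhead constants below are approximations): wider types pack fewer rows into each page, so every cached page carries less useful data.

```python
PAGE_SIZE, PAGE_HEADER, TUPLE_OVERHEAD = 8192, 24, 28  # bytes, approximate

def rows_per_page(row_bytes: int) -> int:
    """Rough estimate of rows fitting in one 8 KB heap page."""
    return (PAGE_SIZE - PAGE_HEADER) // (row_bytes + TUPLE_OVERHEAD)

wide   = rows_per_page(10 * 8)  # ten bigint (8-byte) columns
narrow = rows_per_page(10 * 4)  # same data as int (4-byte), if values fit
print(narrow, wide)  # the narrow row packs more rows into each cached page
```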
7. Tune DB Buffer Size
– Adjust the DB Buffer Size
• To improve heap block cache hits
• And to improve index block cache hits
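A commonly cited starting point (an assumption, not a measured recommendation for this workload) is shared_buffers at roughly 25% of RAM, with effective_cache_size telling the planner how much OS page cache it can also count on; for the 6 GB test machine from slide 9 that would look like:

```
# postgresql.conf -- illustrative starting values for a 6 GB host
shared_buffers = 1536MB        # ~25% of RAM; holds heap and index pages
effective_cache_size = 4GB     # planner hint: RAM likely usable as cache
```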
8. Handle Large Data Set Tables
• Table partitioning is the standard solution for large data set tables
• The average number of heap/index blocks you have to navigate to find
a row goes down
• Partitioning also helps the planner choose the right scan type for a query
• There are some maintenance advantages too. You can DROP an individual
partition, to erase all of the data from that range. This is a common
technique for pruning historical data out of a partitioned table, one that
avoids the VACUUM cleanup work that DELETE leaves behind.
• Dynamic partition rules can be set up, which minimizes maintenance
overhead and stays transparent to the application layer
• Tips:
– The choice of partition key column matters
– The number of partitions should not be too large
– Watch for race conditions when two separate transactions insert
concurrently
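The "dynamic partition rules" above presumably route each row to a day-based child table by its epoch timestamp; a minimal standalone sketch of such a routing rule (the testtable_YYYYMMDD naming scheme is hypothetical):

```python
from datetime import datetime, timezone

def child_table(epoch_seconds: int) -> str:
    """Map an epoch timestamp to its (hypothetical) daily child table."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"testtable_{day:%Y%m%d}"

# 1396915200 is 2014-04-08 00:00:00 UTC, the range used in slide 9
print(child_table(1396915200))  # testtable_20140408
```

In inheritance-based partitioning (the pre-PostgreSQL-10 approach implied by the "master table with child tables" setup in slide 9), this logic lives in a BEFORE INSERT trigger on the master table; the race-condition tip applies when two transactions both try to create a missing child table at insert time.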
9. Handle Large Data Set Tables: Performance Improvement Matrix
Original Table: TestTable (Total Records: 29756342)
Master Partition Table: TestTable_Master (with 10 child tables)
OS: RHEL 6.5, CPU: 8 core, RAM: 6 GB
Query No. | Query Received Timestamp Range             | Total Records Found | Total Records in Table
----------|--------------------------------------------|---------------------|-----------------------
1st       | between 1396915200 and 1397001600 (2 days) | 12929330            | 29756342
2nd       | between 1396915200 and 1397001600 (1 day)  | 4320000             | 29757518
3rd       | between 1396915200 and 1397001600 (1 day)  | 4320000             | 29757518
10. Handle Large Data Set Tables: Performance Improvement Matrix
Query | Attempt | On Original Table | On Master Partition Table | On Master Partition Table with Parallel Queries
------|---------|-------------------|---------------------------|------------------------------------------------
1st   | 1st     | 949 sec           | 732 sec                   | 294 sec
1st   | 2nd     | 938 sec           | 549 sec                   | 290 sec
2nd   | 1st     |                   | 367 sec                   | 185 sec
3rd   | 1st     |                   | 457 sec                   | 128 sec
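The "parallel queries" column presumably means querying each child partition concurrently and merging the results in the application; a sketch with a placeholder run_count function standing in for a real JDBC/driver call:

```python
from concurrent.futures import ThreadPoolExecutor

# 10 child tables, matching the setup in slide 9 (names hypothetical)
PARTITIONS = [f"testtable_p{i}" for i in range(10)]

def run_count(table: str) -> int:
    """Placeholder for a real per-partition query such as
    SELECT count(*) FROM <table> WHERE ts BETWEEN ... AND ...;"""
    return 432_000  # stub result; a driver call would go here

with ThreadPoolExecutor(max_workers=len(PARTITIONS)) as pool:
    total = sum(pool.map(run_count, PARTITIONS))
print(total)  # 4320000, matching the "Total Records Found" for query 2
```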
11. Thank You
For any queries please reach out to me at
mchopker@gmail.com