Êàê PostgreSQL ðàáîòàåò ñ äèñêîì 
Èëüÿ Êîñìîäåìüÿíñêèé 
ik@postgresql-consulting.com
Ïëàí 
 Çà÷åì áàçå äèñê? 
 ×òî îñîáåííîãî â ðàáîòå ñ äèñêîì ó PostgreSQL? 
 Ïîòåíöèàëüíûå óçêèå ìåñòà 
 Ìîíèòîðèíã äèñêîâîé íàãðóçêè 
 Âûáîð àïïàðàòíîãî îáåñïå÷åíèÿ äëÿ ñåðâåðà ñ PostgreSQL 
 Íàñòðîéêà
Çà÷åì áàçå äèñê? 
 ×èòàòü ñòðàíèöû ñ äèñêà 
 Ïèñàòü Write Ahead Log (WAL) 
 Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT)
Çà÷åì áàçå äèñê? 
 ×èòàòü ñòðàíèöû ñ äèñêà 
 Ïèñàòü Write Ahead Log (WAL) 
 Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT) 
Ñïåöèôèêà PostgreSQL 
 autovacuum 
 pg_clog 
 tmp, äèñêîâûå ñîðòèðîâêè, õýøèðîâàíèå
Çà÷åì áàçå äèñê? 
 ×èòàòü ñòðàíèöû ñ äèñêà 
 Ïèñàòü Write Ahead Log (WAL) 
 Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT) 
Ñïåöèôèêà PostgreSQL 
 autovacuum 
 pg_clog 
 tmp, äèñêîâûå ñîðòèðîâêè, õýøèðîâàíèå
Æèçíåííûé öèêë ñòðàíèöû â PostgreSQL 
shared_buers 
l 
êýø ÎÑ 
l 
äèñêè
Checkpoint 
Äëÿ ÷åãî íóæåí? 
 ×èñòûåñòðàíèöû ïîäíèìàþòñÿâ shared_buers, åñëè õîòü îäèí tuple 
èçìåíèëñÿ, ñòðàíèöà ïîìå÷àåòñÿ êàê ãðÿçíàÿ 
 COMMIT; íå âîçâðàùàåò óïðàâëåíèÿ, ïîêà ñòðàíèöà íå çàïèñàíà â WAL 
 Âðåìÿ îò âðåìåíè ïðîèñõîäèò CHECKPOINT: ãðÿçíûå ñòðàíèöû èç 
shared_buers ïèøóòñÿ íà äèñê (fsync) 
 Ïðè áîëüøèõ çíà÷åíèÿõ shared_buers âîçíèêàþò ïðîáëåìû
Checkpoint 
Äèàãíîñòèêà 
 Âñïëåñêè íà ìîíèòîðèíãå äèñêîâîé óòèëèçàöèè (iostat -d -x 1, ïîñëåäíÿÿ 
êîëîíêà, %util) 
 pg_stat_bgwriter
pg_stat_bgwriter 
pgbench=# select * from pg_stat_bgwriter ; 
-[ RECORD 1 ]---------+------------------------------ 
checkpoints_timed | 29 
checkpoints_req | 13 
checkpoint_write_time | 206345 
checkpoint_sync_time | 9989 
buffers_checkpoint | 67720 
buffers_clean | 1046 
maxwritten_clean | 0 
buffers_backend | 48142 
buffers_backend_fsync | 0 
buffers_alloc | 30137 
stats_reset | 2014-10-24 17:59:15.812002-04 
postgres=# select pg_stat_reset_shared(’bgwriter’); 
-[ RECORD 1 ]--------+- 
pg_stat_reset_shared |
×òî äåëàòü? 
Hardware: RAID 
 Äåøåâûé RAID êîíòðîëëåð - çëî, óæ ëó÷øå software RAID 
 RAID êîíòðîëëåð äîëæåí áûòü ñ áàòàðåéêîé 
 Ïðîèçâîäèòåëè: megaraid èëè perc - OK, à íàïðèìåð HP èëè ARECA - íå 
âñåãäà 
 Áàòàðåéêà - ïðèñóòñòâóåò è èñïðàâíà 
 cache mode ! write back 
 io mode ! direct 
 Disk Write Cache Mode ! disabled
×òî äåëàòü? 
Hardware: Äèñêè 
 2,5SAS (âïîëíå áûâàåò 15K) - seek â 2-3 ðàçà áûñòðåå ÷åì 3,5 
 Íå âñå SSD îäèíàêîâî ïîëåçíû: enterprise level Intel p3700 vs desktop-level 
Samsung 
 Èñïîëüçîâàòü òîëüêî SSD äëÿ PostgreSQL îçíà÷àåò áûòü ãîòîâûìè ê ðÿäó 
ïðîáëåì 
 RAID 1+0 
 Eñëè íåò õîðîøèõ äèñêîâ èëè êîíòðîëëåðà - synchronous_commit ! o
×òî äåëàòü? 
Ôàéëîâàÿ ñèñòåìà 
 xfs èëè ext4 - ÎÊ, zfs è ïðî÷èå lvm - íå ïðî ïðîèçâîäèòåëüíîñòü 
 barrier=0, noatime
×òî äåëàòü? 
Îïåðàöèîííàÿ ñèñòåìà 
 Ïî óìîë÷àíèþ vm:dirty_ratio = 20 vm:dirty_background_ratio = 10 - ïëîõî 
 Áîëåå ðàçóìíûå çíà÷åíèÿ vm:dirty_background_bytes = 67108864 
vm:dirty_bytes = 536870912 (512Mb êýø íà RAID) 
 Åñëè íåò êýøà íà RAID - ðàçäåëèòü íà 4
×òî äåëàòü? 
postgresql.conf 
 wal_buers (768kB ! 16Mb) 
 checkpoint_segments (3 - ÷åêïîéíò êàæäûå 48Mb ! 256 - 4Gb) 
 checkpoint_timeout = 60 (÷òî ïåðâîå íàñòóïèò) 
 checkpoint_completion_target = 0:9 (äëÿ ðàñïðåäåëåíèÿ ÷åêïîéíòîâ)
Êàê ñåáÿ ïðîâåðèòü? 
pgdev@pg-dev-deb:~$ tt_pg/bin/pg_test_fsync 
5 seconds per test 
O_DIRECT supported on this platform for open_datasync and open_sync. 
Compare file sync methods using one 8kB write: 
(in wal_sync_method preference order, except fdatasync 
is Linux’s default) 
open_datasync 11396.056 ops/sec 88 usecs/op 
fdatasync 11054.894 ops/sec 90 usecs/op 
fsync 10692.608 ops/sec 94 usecs/op 
fsync_writethrough n/a 
open_sync 67.045 ops/sec 14915 usecs/op 
Compare file sync methods using two 8kB writes: 
(in wal_sync_method preference order, except fdatasync
Íåáîëüøàÿ õèòðîñòü: ïóñòü bgwriter ðàáîòàåò 
postgres=# select name, setting, context, max_val, min_val from pg_settings where name ~ ’bgwr’; 
name | setting | context | max_val | min_val 
-------------------------+---------+---------+---------+--------- 
bgwriter_delay | 200 | sighup | 10000 | 10 
bgwriter_lru_maxpages | 100 | sighup | 1000 | 0 
bgwriter_lru_multiplier | 2 | sighup | 10 | 0 
(3 rows)
Ïðî ÷òî íå çàáûòü: autovacuum 
postgres=# select name, setting, context from pg_settings where category ~ ’Autovacuum’; 
name | setting | context 
-------------------------------------+-----------+------------ 
autovacuum | on | sighup 
autovacuum_analyze_scale_factor | 0.02 | sighup 
autovacuum_analyze_threshold | 50 | sighup 
autovacuum_freeze_max_age | 200000000 | postmaster 
autovacuum_max_workers | 3 | postmaster 
autovacuum_multixact_freeze_max_age | 400000000 | postmaster 
autovacuum_naptime | 60 | sighup 
autovacuum_vacuum_cost_delay | 20 | sighup 
autovacuum_vacuum_cost_limit | -1 | sighup 
autovacuum_vacuum_scale_factor | 0.001 | sighup 
autovacuum_vacuum_threshold | 50 | sighup 
(11 rows)
Âîïðîñû? 
ik@postgresql-consulting.com

Как PostgreSQL работает с диском

  • 1.
    Êàê PostgreSQL ðàáîòàåòñ äèñêîì Èëüÿ Êîñìîäåìüÿíñêèé ik@postgresql-consulting.com
  • 2.
    Ïëàí Çà÷åìáàçå äèñê? ×òî îñîáåííîãî â ðàáîòå ñ äèñêîì ó PostgreSQL? Ïîòåíöèàëüíûå óçêèå ìåñòà Ìîíèòîðèíã äèñêîâîé íàãðóçêè Âûáîð àïïàðàòíîãî îáåñïå÷åíèÿ äëÿ ñåðâåðà ñ PostgreSQL Íàñòðîéêà
  • 3.
    Çà÷åì áàçå äèñê? ×èòàòü ñòðàíèöû ñ äèñêà Ïèñàòü Write Ahead Log (WAL) Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT)
  • 4.
    Çà÷åì áàçå äèñê? ×èòàòü ñòðàíèöû ñ äèñêà Ïèñàòü Write Ahead Log (WAL) Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT) Ñïåöèôèêà PostgreSQL autovacuum pg_clog tmp, äèñêîâûå ñîðòèðîâêè, õýøèðîâàíèå
  • 5.
    Çà÷åì áàçå äèñê? ×èòàòü ñòðàíèöû ñ äèñêà Ïèñàòü Write Ahead Log (WAL) Ñèíõðîíèçèðîâàòü WAL ñ õðàíèëèùåì (CHECKPOINT) Ñïåöèôèêà PostgreSQL autovacuum pg_clog tmp, äèñêîâûå ñîðòèðîâêè, õýøèðîâàíèå
  • 6.
    Æèçíåííûé öèêë ñòðàíèöûâ PostgreSQL shared_buers l êýø ÎÑ l äèñêè
  • 7.
    Checkpoint Äëÿ ÷åãîíóæåí? ×èñòûåñòðàíèöû ïîäíèìàþòñÿâ shared_buers, åñëè õîòü îäèí tuple èçìåíèëñÿ, ñòðàíèöà ïîìå÷àåòñÿ êàê ãðÿçíàÿ COMMIT; íå âîçâðàùàåò óïðàâëåíèÿ, ïîêà ñòðàíèöà íå çàïèñàíà â WAL Âðåìÿ îò âðåìåíè ïðîèñõîäèò CHECKPOINT: ãðÿçíûå ñòðàíèöû èç shared_buers ïèøóòñÿ íà äèñê (fsync) Ïðè áîëüøèõ çíà÷åíèÿõ shared_buers âîçíèêàþò ïðîáëåìû
  • 8.
    Checkpoint Äèàãíîñòèêà Âñïëåñêè íà ìîíèòîðèíãå äèñêîâîé óòèëèçàöèè (iostat -d -x 1, ïîñëåäíÿÿ êîëîíêà, %util) pg_stat_bgwriter
  • 9.
    pg_stat_bgwriter pgbench=# select* from pg_stat_bgwriter ; -[ RECORD 1 ]---------+------------------------------ checkpoints_timed | 29 checkpoints_req | 13 checkpoint_write_time | 206345 checkpoint_sync_time | 9989 buffers_checkpoint | 67720 buffers_clean | 1046 maxwritten_clean | 0 buffers_backend | 48142 buffers_backend_fsync | 0 buffers_alloc | 30137 stats_reset | 2014-10-24 17:59:15.812002-04 postgres=# select pg_stat_reset_shared(’bgwriter’); -[ RECORD 1 ]--------+- pg_stat_reset_shared |
  • 10.
    ×òî äåëàòü? Hardware:RAID Äåøåâûé RAID êîíòðîëëåð - çëî, óæ ëó÷øå software RAID RAID êîíòðîëëåð äîëæåí áûòü ñ áàòàðåéêîé Ïðîèçâîäèòåëè: megaraid èëè perc - OK, à íàïðèìåð HP èëè ARECA - íå âñåãäà Áàòàðåéêà - ïðèñóòñòâóåò è èñïðàâíà cache mode ! write back io mode ! direct Disk Write Cache Mode ! disabled
  • 11.
    ×òî äåëàòü? Hardware:Äèñêè 2,5SAS (âïîëíå áûâàåò 15K) - seek â 2-3 ðàçà áûñòðåå ÷åì 3,5 Íå âñå SSD îäèíàêîâî ïîëåçíû: enterprise level Intel p3700 vs desktop-level Samsung Èñïîëüçîâàòü òîëüêî SSD äëÿ PostgreSQL îçíà÷àåò áûòü ãîòîâûìè ê ðÿäó ïðîáëåì RAID 1+0 Eñëè íåò õîðîøèõ äèñêîâ èëè êîíòðîëëåðà - synchronous_commit ! o
  • 12.
    ×òî äåëàòü? Ôàéëîâàÿñèñòåìà xfs èëè ext4 - ÎÊ, zfs è ïðî÷èå lvm - íå ïðî ïðîèçâîäèòåëüíîñòü barrier=0, noatime
  • 13.
    ×òî äåëàòü? Îïåðàöèîííàÿñèñòåìà Ïî óìîë÷àíèþ vm:dirty_ratio = 20 vm:dirty_background_ratio = 10 - ïëîõî Áîëåå ðàçóìíûå çíà÷åíèÿ vm:dirty_background_bytes = 67108864 vm:dirty_bytes = 536870912 (512Mb êýø íà RAID) Åñëè íåò êýøà íà RAID - ðàçäåëèòü íà 4
  • 14.
    ×òî äåëàòü? postgresql.conf wal_buers (768kB ! 16Mb) checkpoint_segments (3 - ÷åêïîéíò êàæäûå 48Mb ! 256 - 4Gb) checkpoint_timeout = 60 (÷òî ïåðâîå íàñòóïèò) checkpoint_completion_target = 0:9 (äëÿ ðàñïðåäåëåíèÿ ÷åêïîéíòîâ)
  • 15.
    Êàê ñåáÿ ïðîâåðèòü? pgdev@pg-dev-deb:~$ tt_pg/bin/pg_test_fsync 5 seconds per test O_DIRECT supported on this platform for open_datasync and open_sync. Compare file sync methods using one 8kB write: (in wal_sync_method preference order, except fdatasync is Linux’s default) open_datasync 11396.056 ops/sec 88 usecs/op fdatasync 11054.894 ops/sec 90 usecs/op fsync 10692.608 ops/sec 94 usecs/op fsync_writethrough n/a open_sync 67.045 ops/sec 14915 usecs/op Compare file sync methods using two 8kB writes: (in wal_sync_method preference order, except fdatasync
  • 16.
    Íåáîëüøàÿ õèòðîñòü: ïóñòübgwriter ðàáîòàåò postgres=# select name, setting, context, max_val, min_val from pg_settings where name ~ ’bgwr’; name | setting | context | max_val | min_val -------------------------+---------+---------+---------+--------- bgwriter_delay | 200 | sighup | 10000 | 10 bgwriter_lru_maxpages | 100 | sighup | 1000 | 0 bgwriter_lru_multiplier | 2 | sighup | 10 | 0 (3 rows)
  • 17.
    Ïðî ÷òî íåçàáûòü: autovacuum postgres=# select name, setting, context from pg_settings where category ~ ’Autovacuum’; name | setting | context -------------------------------------+-----------+------------ autovacuum | on | sighup autovacuum_analyze_scale_factor | 0.02 | sighup autovacuum_analyze_threshold | 50 | sighup autovacuum_freeze_max_age | 200000000 | postmaster autovacuum_max_workers | 3 | postmaster autovacuum_multixact_freeze_max_age | 400000000 | postmaster autovacuum_naptime | 60 | sighup autovacuum_vacuum_cost_delay | 20 | sighup autovacuum_vacuum_cost_limit | -1 | sighup autovacuum_vacuum_scale_factor | 0.001 | sighup autovacuum_vacuum_threshold | 50 | sighup (11 rows)
  • 18.