Wish list from PostgreSQL - Linux Kernel Summit 2009

This explains storage and buffer usage in Postgres and discusses I/O and buffer management in the Linux kernel.

  1. Linux Kernel Summit 2009: Wish list from PostgreSQL
     Itagaki Takahiro, NTT Open Source Software Center
     October 18 - 20, 2009 - Tokyo, Japan
     Released at “Linux Kernel Summit 2009”: http://events.linuxfoundation.org/archive/2009/linux-kernel-summit
  2. Agenda
     Background
       Postgres won’t use Direct I/O!
       Storage and buffer usage in Postgres
     Discussions
       Low priority I/O for background tasks
       Avoid duplicated caching in DB and kernel buffers
  3. Background: Postgres won’t use Direct I/O!
     Our policy is to delegate as much as possible to the kernel and avoid re-implementing the whole block layer in the user space of PostgreSQL.
       This may be the opposite of what commercial DBMS folks require.
     We’d like to keep the I/O layer small.
       The block-layer code is <30K lines (5%) of the Postgres code base (600K lines).
       We won’t use RAW devices, either.
       The layout of files should be managed by the file system.
     Not ideal, but it is a good approach for supporting many platforms with a small number of developers.
       <100 active main developers
       <10 committers
       >10 supported platforms
  4. Background: Storage and buffer usage in Postgres
     Consists of multiple processes.
     Uses the file system and multiple files (one file per 1GB of table, one per 16MB of xlog).
     Mainly uses traditional system calls (lseek, read, write, fsync).
       Starting to use posix_fadvise() in the latest version.
     We depend on the kernel buffer cache and I/O management.
       Do not use synchronous I/O to access data files.
       Do not read ahead by itself; expect read() to do it.
     [Diagram: the postmaster (listener process) fork()s the writer (sync process) and the backends (SQL executor processes). The processes own a shared buffer pool, allocated with shmget(), and their own I/O exclusion control. They issue lseek(), read(), write() (both overwriting and expanding files) and fsync() against data files (1GB each) and xlog files (16MB each) on storage + file system.]
  5. Low priority I/O for background tasks
     PostgreSQL uses some background tasks
       VACUUM – cleans up DELETE’d rows and reclaims the area.
       CHECKPOINT – flushes all modified pages to disk.
     Current behavior in Postgres
       Sleep after every fixed amount of I/O.
       Consumes constant I/O bandwidth regardless of workload.
     Ideal behavior
       Background tasks can use all of the surplus I/O bandwidth as long as they do not affect the service.
     Requirements
       Low-priority I/O should apply to buffered writes and fsync.
       Normal I/O should not wait for low-priority I/O, so fsync should not block lseek, read, or write (both overwrites and extends).

     Is the operation blocked by fsync()?

       operation   on-cache page   off-cache page
       read()      not blocked     blocked
       write()     blocked         blocked
       lseek()     blocked
       pread()     not blocked     sometimes
       pwrite()    sometimes       sometimes
  6. Avoid duplicated caching in DB and kernel buffers
     Both Postgres and the kernel might cache file data because Postgres uses buffered I/O.
       The same blocks might be cached in both the DB buffers and the kernel buffers.
     Approaches to eliminate duplicated caching
       Direct I/O
         Pros: Can eliminate the kernel cache
         Cons: Need to add an I/O manager to Postgres
       mmap
         Pros: Can eliminate the DB cache
         Cons: Hard to implement “Write-Ahead Logging” because mapped blocks could be flushed out at arbitrary times.
       mmap is better, because it avoids reinventing an I/O manager in Postgres.
     Requirements
       Have a control flag that prevents modified blocks from being flushed out. The flag is released when the WAL buffers are written to storage.
         – mlock() is not enough because it cannot prevent flushing.
         madvise( MADV_{ DOFLUSH | DONTFLUSH } ) ?
     [Diagram: the same blocks duplicated in the DB buffers and the kernel buffers, above storage.]
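
The access pattern on slide 4 (1GB table files, buffered lseek() and read(), and a posix_fadvise() hint) can be sketched roughly as below. This is only an illustration in Python, whose os module wraps the same system calls; it is not PostgreSQL's code. The 8KB block size, the file name segment.0, and the helper names locate_block and read_block are assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 8192        # assumption: block size is not stated on the slides
SEGMENT_SIZE = 1 << 30   # one file per 1GB of table, as stated on slide 4


def locate_block(block_no, block_size=BLOCK_SIZE, segment_size=SEGMENT_SIZE):
    """Map a table block number to (segment file index, byte offset in it)."""
    blocks_per_segment = segment_size // block_size
    segment = block_no // blocks_per_segment
    offset = (block_no % blocks_per_segment) * block_size
    return segment, offset


def read_block(fd, offset, block_size=BLOCK_SIZE):
    """Hint the kernel, then read one block with buffered lseek() + read()."""
    if hasattr(os, "posix_fadvise"):  # not available on every platform
        os.posix_fadvise(fd, offset, block_size, os.POSIX_FADV_WILLNEED)
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, block_size)


def demo():
    """Write two blocks to a scratch file and read the second one back."""
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "segment.0"),
                     os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.write(fd, b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE)
            os.fsync(fd)
            _, offset = locate_block(1)
            return read_block(fd, offset)
        finally:
            os.close(fd)
```

Caching and read-ahead are left entirely to the kernel here, which is the division of labour the slide describes.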
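
The "current behavior" on slide 5, sleeping after every fixed amount of I/O, can be sketched as below. The function name throttled_write and its parameters are hypothetical and chosen only for illustration; this is not PostgreSQL's actual throttling code.

```python
import io
import time


def throttled_write(out, chunks, bytes_per_nap, nap_seconds, sleep=time.sleep):
    """Write chunks to out, sleeping once per bytes_per_nap bytes written.

    Returns the number of sleeps taken. The sleep function is injectable
    so the behaviour can be checked without actually waiting.
    """
    written_since_nap = 0
    naps = 0
    for chunk in chunks:
        out.write(chunk)
        written_since_nap += len(chunk)
        if written_since_nap >= bytes_per_nap:
            sleep(nap_seconds)      # fixed pause, whatever the system load is
            naps += 1
            written_since_nap = 0
    return naps


def demo():
    """Five 4-byte chunks with a nap every 8 bytes."""
    out = io.BytesIO()
    naps = throttled_write(out, [b"x" * 4] * 5, bytes_per_nap=8,
                           nap_seconds=0.01, sleep=lambda seconds: None)
    return naps, out.getvalue()
```

Because the pause depends only on the bytes written, the task is capped at roughly bytes_per_nap / nap_seconds even when the disk is idle, and it still competes with foreground I/O when the disk is busy. That is the gap the slide's "ideal behavior" asks the kernel to close.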
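
The Write-Ahead Logging difficulty with mmap on slide 6 can be made concrete with a small sketch. The madvise flags named on the slide are a proposal, so the code does not call them. Instead it shows the ordering that existing calls force: the log record has to be durable before the mapped page is even modified, because the kernel may write a dirty mapped page back at any time. The record layout, the file names, and the helper name update_page_wal_first are assumptions made for the example.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def update_page_wal_first(wal_fd, data_map, page_no, payload):
    """Log the change, make the log durable, and only then dirty the page."""
    record = (page_no.to_bytes(4, "little")
              + len(payload).to_bytes(4, "little")
              + payload)
    os.write(wal_fd, record)
    os.fsync(wal_fd)                 # WAL must reach storage first
    start = page_no * PAGE
    data_map[start:start + len(payload)] = payload
    # From here on the kernel is free to write the dirty page back.
    data_map.flush(start, PAGE)      # msync() of this one page


def demo():
    """Apply one logged update; return the page bytes and the WAL size."""
    with tempfile.TemporaryDirectory() as tmp:
        wal_fd = os.open(os.path.join(tmp, "wal"),
                         os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        data_fd = os.open(os.path.join(tmp, "data"),
                          os.O_RDWR | os.O_CREAT, 0o600)
        try:
            os.ftruncate(data_fd, 2 * PAGE)
            with mmap.mmap(data_fd, 2 * PAGE) as data_map:
                update_page_wal_first(wal_fd, data_map, 1, b"hello")
            page = os.pread(data_fd, 5, PAGE)
            wal_size = os.fstat(wal_fd).st_size
            return page, wal_size
        finally:
            os.close(wal_fd)
            os.close(data_fd)
```

An fsync() of the log before every page modification is expensive. A flag that holds back write-back of selected pages, as the slide requests, would let the log be flushed in batches instead.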
