About
PostgreSQL DBA.
Linux system administrator.
PostgreSQL-Consulting.com:
● 24/7 support.
● Audit, performance optimizations.
● Consulting and Training.
● Monitoring and Emergency.
● Capacity planning.
Slides: https://goo.gl/awmZ2H
Agenda
RDBMS on Linux, why?
Databases and Resources.
OS subsystems.
CPU, Process scheduling, Power saving policies.
Memory, VM, NUMA, Huge pages.
Storage, File Systems, Input/Output.
Other misc.
Why Linux?
Linux is a good choice:
● Active development & Community support.
● A lot of features & Fast implementation.
● Stable & Mature & Durable.
Databases & Resources
CPU: concurrency, query speed, sort/group/hash, ...
Memory: OS page cache, DB buffer pool, local process cache.
Storage: DB data files, transaction log, cold start.
Databases & Resources
CPU: CPU scheduling, NUMA, power saving.
Memory: virtual memory, NUMA, huge pages.
Storage: file systems, storage I/O.
Resources
CPU scheduler.
Virtual memory and NUMA.
Huge pages.
File systems.
Storage IO.
Power saving policy.
Others.
CPU scheduling
The CPU scheduler is responsible for scheduling processes properly:
Sysctl:
● kernel.sched_migration_cost_ns = 5000000 (default: 500000).
● kernel.sched_autogroup_enabled = 0 (default: 1).
http://www.postgresql.org/message-id/50E4AAB1.9040902@optionshouse.com
http://kernelnewbies.org/Linux_2_6_38#head-59575a6aeafa38490226a560ee02de89829a5b20
On Ubuntu, be aware of issues #1055222 (12.04) and #1422016 (14.04):
use the noautogroup kernel boot parameter instead of sysctl.conf.
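A minimal sketch for comparing the running values with the ones suggested above. The helper name check_sysctl and its optional third argument (an alternative procfs root, used only for testing) are illustrative assumptions. Some newer kernels may not expose kernel.sched_migration_cost_ns under /proc/sys at all, which the sketch reports instead of failing.

```shell
# Compare one sysctl key with a suggested value (read-only, no root needed).
# $1 = dotted key, $2 = suggested value, $3 = procfs root (default: /proc/sys)
check_sysctl() {
    key=$1
    want=$2
    root=${3:-/proc/sys}
    # kernel.sched_autogroup_enabled -> kernel/sched_autogroup_enabled
    file="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$file" ]; then
        echo "$key: not available on this kernel"
        return 0
    fi
    have=$(cat "$file")
    if [ "$have" = "$want" ]; then
        echo "$key: ok ($have)"
    else
        echo "$key: $have (suggested: $want)"
    fi
}

check_sysctl kernel.sched_migration_cost_ns 5000000
check_sysctl kernel.sched_autogroup_enabled 0
```

To change a value at runtime, run sysctl -w key=value as root; remember the Ubuntu caveat above for autogroup.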
Virtual Memory
What is it?
Allocator, Caching, Dirty pages and Writeback.
Virtual Memory
Sysctl:
vm.dirty_background_ratio & vm.dirty_ratio: do not use them; set the byte-based limits instead.
vm.dirty_background_bytes & vm.dirty_bytes: size them to the RAID cache; without one, use 64MB/128MB.
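A sketch of the arithmetic behind the byte-based limits. It assumes that 64MB is meant for vm.dirty_background_bytes and 128MB for vm.dirty_bytes, which is one reading of the slide. The helper names are illustrative. When a *_bytes value is written, the kernel reports the matching *_ratio as 0, so the ratios need no separate setting.

```shell
# Convert megabytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print sysctl.conf-style lines.
# $1 = background limit in MB (default 64), $2 = hard limit in MB (default 128)
render_dirty_sysctl() {
    bg_mb=${1:-64}
    fg_mb=${2:-128}
    echo "vm.dirty_background_bytes = $(mb_to_bytes "$bg_mb")"
    echo "vm.dirty_bytes = $(mb_to_bytes "$fg_mb")"
}

render_dirty_sysctl
# prints:
# vm.dirty_background_bytes = 67108864
# vm.dirty_bytes = 134217728
```

With a RAID controller, pass values derived from its cache size instead, for example render_dirty_sysctl 256 512.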
Virtual Memory
Out-of-memory & OOM-Killer
Sysctl: vm.swappiness = 1 (default: 60)
NUMA
Legend: S = socket, C = CPU core, M = memory bank.
NUMA
BIOS: enable memory node interleaving.
Kernel boot: numa=off.
numactl utility.
Sysctl:
● vm.zone_reclaim_mode = 0 (default: 0).
● kernel.numa_balancing = 0 (default: 0).
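A quick way to see whether the NUMA settings above matter on a given host is to count the memory nodes the kernel exposes. This is a sketch; the function name and its optional directory argument are assumptions made for illustration.

```shell
# Count NUMA nodes visible to the kernel.
# $1 = sysfs node directory (default: /sys/devices/system/node)
count_numa_nodes() {
    root=${1:-/sys/devices/system/node}
    n=0
    for d in "$root"/node[0-9]*; do
        # An unmatched glob stays literal and is not a directory.
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

nodes=$(count_numa_nodes)
if [ "$nodes" -gt 1 ]; then
    echo "NUMA host with $nodes nodes: consider interleaving"
else
    echo "single node (or NUMA disabled): nothing to tune"
fi
```

With the numactl utility, numactl --hardware prints the node layout and numactl --interleave=all <command> starts a process with memory interleaved across all nodes.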
Huge Pages
Huge pages vs. Transparent huge pages.
Huge pages are supported by many RDBMS.
Always disable transparent huge pages.
/etc/rc.local:
● echo never > /sys/kernel/mm/transparent_hugepage/enabled
● echo never > /sys/kernel/mm/transparent_hugepage/defrag
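To verify that the setting took effect, read the same sysfs files back: the active value is the one in square brackets. A sketch, with a hypothetical helper name:

```shell
# Print the active value (the bracketed word) of a THP sysfs file.
# $1 = file to read (default: /sys/kernel/mm/transparent_hugepage/enabled)
thp_state() {
    file=${1:-/sys/kernel/mm/transparent_hugepage/enabled}
    if [ ! -r "$file" ]; then
        echo unknown
        return 0
    fi
    # "always [madvise] never" -> "madvise"
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' "$file"
}

echo "THP enabled: $(thp_state)"
echo "THP defrag:  $(thp_state /sys/kernel/mm/transparent_hugepage/defrag)"
```

After the rc.local commands above have run, both lines should report never.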
Filesystems
Ext3 vs Ext4 vs XFS: which is better?
Filesystem Barriers.
Disable Write Cache:
● hdparm -W0 /dev/device
● MegaCli64 -LDSetProp -DisDskCache -Lall -aALL
Hardware RAID + BBU = barrier=0 (disable).
Software RAID = barrier=1 (enable).
Enterprise SSD with Power Loss Protection = barrier=0 (disable).
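A sketch for auditing which mounted file systems currently run without barriers, so that each can be checked against the rules above. The function name is an assumption. Some newer kernels no longer accept these mount options for some file systems, in which case the list is simply empty.

```shell
# List mount points whose options contain nobarrier or barrier=0.
# $1 = mounts table (default: /proc/mounts); fields: dev, mount point, type, options
barrier_disabled_mounts() {
    mounts=${1:-/proc/mounts}
    if [ ! -r "$mounts" ]; then
        return 0
    fi
    awk '$4 ~ /(^|,)(nobarrier|barrier=0)(,|$)/ { print $2 }' "$mounts"
}

barrier_disabled_mounts
```

Every mount point printed should sit on hardware RAID with a BBU or on an SSD with power loss protection.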
Storage IO
SATA/SAS vs SSD.
IO elevators:
● noop: SSD, PCIe SSD, high-end storage.
● deadline: RAID, SATA/SAS.
● cfq: good default.
● none (multi-queue block IO): SSD, PCIe SSD.
# echo 'elevator_name' > /sys/block/<device>/queue/scheduler
kernel boot: elevator=<name>
/sys/block/*/queue/: rotational, rq_affinity, read_ahead_kb
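A read-only sketch that lists each block device with its active elevator and a suggestion based on the rotational flag. The mapping (0 → noop, 1 → deadline) follows the list above in simplified form; the helper names are illustrative, and on multi-queue kernels the available names differ.

```shell
# Print the active scheduler (the bracketed word) from a queue/scheduler file.
active_elevator() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' "$1"
}

# Suggest an elevator from the rotational flag: 0 = SSD, 1 = spinning disk.
suggest_elevator() {
    case $1 in
        0) echo noop ;;
        1) echo deadline ;;
        *) echo unknown ;;
    esac
}

for q in /sys/block/*/queue; do
    [ -r "$q/scheduler" ] || continue
    dev=${q%/queue}
    dev=${dev##*/}
    rot=$(cat "$q/rotational" 2>/dev/null)
    echo "$dev: active=$(active_elevator "$q/scheduler") suggested=$(suggest_elevator "$rot")"
done
```

Treat the suggestion as a starting point for testing, not a rule.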
Power Saving Policy
Drivers: acpi_cpufreq vs. intel_pstate.
scaling_governor:
● /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_governors
● /sys/devices/system/cpu/cpuX/cpufreq/scaling_governor
acpi_cpufreq + performance.
intel_pstate + powersave.
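A sketch that prints the distinct governors in use across all CPUs, to compare against the pairings above. The function name and its optional root argument are assumptions.

```shell
# Print each distinct scaling governor currently in use, one per line.
# $1 = cpu sysfs root (default: /sys/devices/system/cpu)
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    # No output if cpufreq is not available (e.g. inside some virtual machines).
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null | sort -u
}

governor_summary
```

More than one line of output means the CPUs are configured inconsistently.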
Misc: Clocksources
What is a clocksource?
acpi_pm vs. hpet vs. tsc.
/sys/devices/system/clocksource/clocksource0/available_clocksource.
/sys/devices/system/clocksource/clocksource0/current_clocksource.
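A sketch that reads both files in one go; the function name and the optional directory argument are illustrative.

```shell
# Show the current and the available clocksources.
# $1 = clocksource directory (default: /sys/devices/system/clocksource/clocksource0)
clocksource_info() {
    dir=${1:-/sys/devices/system/clocksource/clocksource0}
    echo "current: $(cat "$dir/current_clocksource" 2>/dev/null)"
    echo "available: $(cat "$dir/available_clocksource" 2>/dev/null)"
}

clocksource_info
```

Writing one of the available names into current_clocksource as root switches the source at runtime.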
Summary
Linux is a good choice for RDBMS:
Modern, Universal, Flexible, Stable.
Adapt Linux for your workloads.
Test → Change → Test → Commit/Rollback.
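One way to keep the rollback step cheap is to snapshot the current values before changing anything. A sketch, assuming the keys of interest live under /proc/sys; the function name and the file name before.conf are hypothetical.

```shell
# Save current sysctl values in sysctl.conf format.
# $1 = procfs root (normally /proc/sys), remaining args = dotted keys
snapshot_sysctl() {
    root=$1
    shift
    for key in "$@"; do
        file="$root/$(printf '%s' "$key" | tr . /)"
        if [ -r "$file" ]; then
            printf '%s = %s\n' "$key" "$(cat "$file")"
        fi
    done
}

snapshot_sysctl /proc/sys vm.swappiness vm.zone_reclaim_mode
```

Redirect the output to a file such as before.conf before the change; if the test afterwards shows a regression, sysctl -p before.conf (as root) restores the saved values.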
Questions?
Alexey Lesovsky
lesovsky@pgco.me
PostgreSQL-Consulting.com: Data maintenance at its best
https://postgresql-consulting.com

Alexey Lesovsky, "Linux tuning for databases"
