Five steps perform_2013

  1. Five Steps to PostgreSQL Performance. Josh Berkus, PostgreSQL Project. MelPUG 2013
  2. The five steps: 1 Application Design, 2 Query Tuning, 3 postgresql.conf, 4 OS & Filesystem, 5 Hardware
  3. Step 0: Getting Outfitted
  4. 5 Layer Cake: Application (Queries, Transactions); Middleware (Drivers, Connections, Caching); PostgreSQL (Schema, Config); Operating System (Filesystem, Kernel); Hardware (Storage, RAM/CPU, Network)
  5. 5 Layer Cake (same diagram as slide 4)
  6. Scalability Funnel, top to bottom: Application, Middleware, PostgreSQL, OS, HW
  7. What Flavor is Your DB? [W] ► Web Application (Web) ● DB smaller than RAM ● 90% or more “one-liner” queries
  8. What Flavor is Your DB? [O] ► Online Transaction Processing (OLTP) ● DB slightly larger than RAM to 1TB ● 20-70% small data write queries, some large transactions
  9. What Flavor is Your DB? [D] ► Data Warehousing (DW) ● Large to huge databases (100GB to 100TB) ● Large complex reporting queries ● Large bulk loads of data ● Also called "Decision Support" or "Business Intelligence"
  10. Tips for Good Form ► Engineer for the problems you have ● not for the ones you don't
  11. Tips for Good Form ► A little overallocation is cheaper than downtime ● unless you're an OEM, don't stint a few GB ● resource use will grow over time
  12. Tips for Good Form ► Test, Tune, and Test Again ● you can't measure performance by “it seems fast”
  13. Tips for Good Form ► Most server performance is thresholded ● “slow” usually means “25x slower” ● it's not how fast it is, it's how close you are to capacity
  14. Step 1: Application Design
  15. Schema Design ► Table design ● do not optimize prematurely ▬ normalize your tables and wait for a proven issue to denormalize ▬ Postgres is designed to perform well with normalized tables ● Entity-Attribute-Value tables and other innovative designs tend to perform poorly
  16. Schema Design ► Table design ● consider using natural keys ▬ can cut down on the number of joins you need ● BLOBs can be slow ▬ have to be completely rewritten, compressed ▬ can also be fast, thanks to compression
  17. Schema Design ► Table design ● think of when data needs to be updated, as well as read ▬ sometimes you need to split tables which will be updated at different times ▬ don't trap yourself into updating the same rows multiple times
  18. Schema Design ► Indexing ● index most foreign keys ● index common WHERE criteria ● index common aggregated columns ● learn to use special index types: expressions, full text, partial
  19. Schema Design ► Not Indexing ● indexes cost you on updates, deletes ▬ especially with HOT ● too many indexes can confuse the planner ● don't index: tiny tables, low-cardinality columns
  20. Right indexes? ► pg_stat_user_indexes ● shows indexes not being used ● note that it doesn't record unique index usage ► pg_stat_user_tables ● shows seq scans: index candidates? ● shows heavy update/delete tables: index less
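
The "indexes not being used" check on slide 20 can be sketched as a query against `pg_stat_user_indexes`, wrapped here in Python so the filtering logic runs without a server. The view and its `relname`, `indexrelname`, and `idx_scan` columns are real; the helper name `unused_indexes` and the sample rows are made up for illustration.

```python
# Query one might run in psql; idx_scan counts index scans since stats were last reset.
UNUSED_INDEX_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname, indexrelname;
"""

def unused_indexes(rows):
    """Return (table, index) pairs whose idx_scan is zero.

    `rows` is an iterable of dicts shaped like pg_stat_user_indexes rows.
    """
    return [(r["relname"], r["indexrelname"]) for r in rows if r["idx_scan"] == 0]

# Hypothetical sample data, not output from a real server.
sample = [
    {"relname": "orders", "indexrelname": "orders_pkey", "idx_scan": 9120},
    {"relname": "orders", "indexrelname": "orders_note_idx", "idx_scan": 0},
]
print(unused_indexes(sample))
```

As the slide warns, a zero on a unique index is not proof that the index is unneeded, so treat the result as a list of candidates to investigate.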
  21. Partitioning ► Partition large or growing tables ● historical data ▬ data will be purged ▬ massive deletes are server-killers ● very large tables ▬ anything over 10GB / 10M rows ▬ partition by active/passive
  22. Partitioning ► Application must be partition-compliant ● every query should call the partition key ● pre-create your partitions ▬ do not create them on demand … they will lock
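
One way to follow the "pre-create your partitions" advice is to generate the DDL for the coming months ahead of time and run it from a scheduled job. The sketch below assumes inheritance-style partitioning with CHECK constraints, which was the mechanism available when this talk was given; the table name `events` and column `created_at` are hypothetical.

```python
from datetime import date

def month_starts(first, count):
    """Yield `count` consecutive first-of-month dates starting at `first`."""
    year, month = first.year, first.month
    for _ in range(count):
        yield date(year, month, 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1

def partition_ddl(parent, column, first, count):
    """Build CREATE TABLE statements for `count` monthly child tables."""
    starts = list(month_starts(first, count + 1))
    statements = []
    for lo, hi in zip(starts, starts[1:]):
        child = f"{parent}_{lo:%Y_%m}"
        statements.append(
            f"CREATE TABLE {child} "
            f"(CHECK ({column} >= DATE '{lo}' AND {column} < DATE '{hi}')) "
            f"INHERITS ({parent});"
        )
    return statements

# Hypothetical parent table and partition key.
for stmt in partition_ddl("events", "created_at", date(2013, 11, 1), 3):
    print(stmt)
```

Running the generated statements well before the first row for a month arrives avoids creating tables on demand in the write path.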
  23. Query design ► Do more with each query ● PostgreSQL does well with fewer larger queries ● not as well with many small queries ● avoid doing joins, tree-walking in middleware
  24. Query design ► Do more with each transaction ● batch related writes into large transactions
  25. Query design ► Know the query gotchas (per version) ● Always try rewriting subqueries as joins ● try swapping NOT IN and NOT EXISTS for bad queries ● try to make sure that index/key types match ● avoid unanchored text searches such as ILIKE '%josh%'
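
The "try swapping NOT IN and NOT EXISTS" tip is easier to act on with both forms side by side. The sketch below uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere; the tables and data are hypothetical, and it says nothing about which form PostgreSQL will plan better for a given version. That still has to be measured with EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Form 1: NOT IN with an uncorrelated subquery.
not_in = conn.execute("""
    SELECT name FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name
""").fetchall()

# Form 2: NOT EXISTS with a correlated subquery.
not_exists = conn.execute("""
    SELECT name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY name
""").fetchall()

print(not_in, not_exists)
```

The two forms return the same rows here only because `customer_id` is declared NOT NULL; with NULLs in the subquery, NOT IN behaves differently, so check nullability before swapping.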
  26. But I use ORM! ► ORM != high performance ● ORM is for fast development, not fast databases ● make sure your ORM allows "tweaking" queries ● applications which are pushing the limits of performance probably can't use ORM ▬ but most don't have a problem
  27. It's All About Caching ► Use prepared queries [W, O] ● whenever you have repetitive loops
  28. It's All About Caching ► Cache, cache everywhere [W, O] ● plan caching: on the PostgreSQL server ● parse caching: in some drivers ● data caching: ▬ in the appserver ▬ in memcached/varnish/nginx ▬ in the client (javascript, etc.) ● use as many kinds of caching as you can
  29. It's All About Caching: But … ► think carefully about cache invalidation ● and avoid “cache storms”
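
A common way to soften cache storms, where many entries expire at the same instant and every client hits the database at once, is to add random jitter to each entry's lifetime. The class below is a minimal in-process sketch of that idea and is not taken from the talk; `JitteredTTLCache` and its parameters are illustrative names.

```python
import random
import time

class JitteredTTLCache:
    """Tiny in-process cache whose entries expire at slightly different times."""

    def __init__(self, ttl, jitter, clock=time.monotonic, rng=None):
        self.ttl = ttl          # base lifetime in seconds
        self.jitter = jitter    # maximum extra lifetime in seconds
        self.clock = clock      # callable returning the current time
        self.rng = rng or random.Random()
        self._store = {}        # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = loader()
        expires_at = now + self.ttl + self.rng.uniform(0, self.jitter)
        self._store[key] = (expires_at, value)
        return value

    def invalidate(self, key):
        """Drop an entry, for example right after the underlying row changes."""
        self._store.pop(key, None)

cache = JitteredTTLCache(ttl=60, jitter=10)
profile = cache.get("user:42", lambda: {"name": "hypothetical row"})
cache.invalidate("user:42")  # explicit invalidation when the data is updated
```

The injectable `clock` and `rng` exist only to make the expiry behaviour testable without sleeping.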
  30. Connection Management ► Connections take resources [W, O] ● RAM, CPU ● transaction checking
  31. Connection Management ► Make sure you're only using connections you need [W, O] ● look for “<IDLE>” and “<IDLE> in Transaction” ● log and check for a pattern of connection growth ● make sure that database and appserver timeouts are synchronized
  32. Pooling ► Over 100 connections? You need pooling! (diagram: three webservers connect to a Pool, which connects to PostgreSQL)
  33. Pooling ► New connections are expensive ● use persistent connections or connection pooling software ▬ appservers ▬ pgBouncer ▬ pgPool (sort of) ● set pool size to maximum connections needed
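
The reason pooling helps is that connections are opened once and then reused. The sketch below shows that idea in-process with a fixed-size pool; it is an illustration, not how pgBouncer is implemented, and `SimplePool` and `fake_connect` are hypothetical names standing in for a real driver's connect call.

```python
import queue

class SimplePool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())   # pay the connection cost up front

    def acquire(self, timeout=None):
        """Borrow a connection; blocks (up to timeout) when all are in use."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Hand the connection back so another request can reuse it."""
        self._idle.put(conn)

opened = []

def fake_connect():
    """Stand-in for an expensive database connect call."""
    opened.append(object())
    return opened[-1]

pool = SimplePool(fake_connect, size=2)
for _ in range(10):          # ten requests, but only two connections ever opened
    conn = pool.acquire()
    pool.release(conn)
print(len(opened))
```

Setting `size` to the maximum number of connections actually needed mirrors the last bullet on slide 33.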
  34. Step 2: Query Tuning
  35. Bad Queries (chart: Ranked Query Execution Times; y-axis: execution time, 0 to 5000; x-axis: % ranking, 5 to 100)
  36. Optimize Your Queries in Test ► Before you go to production ● simulate user load on the application ● monitor and fix slow queries ● look for worst procedures
  37. Optimize Your Queries in Test ► Look for “bad queries” ● queries which take too long ● data updates which never complete ● long-running stored procedures ● interfaces issuing too many queries ● queries which block
  38. Finding bad queries ► Log Analysis ● dozens of logging options ● log_min_duration_statement ● pgFouine ● pgBadger
  39. Fixing bad queries ► EXPLAIN ANALYZE ► things to look for: ● bad rowcount estimates ● sequential scans ● high-count loops ● large on-disk sorts
  40. Fixing bad queries ► reading EXPLAIN ANALYZE is an art ● it's an inverted tree ● look for the deepest level at which the problem occurs ► try re-writing complex queries several ways
  41. Query Optimization Cycle: log queries → run pgBadger → EXPLAIN ANALYZE worst queries → troubleshoot worst queries → apply fixes → (repeat)
  42. Query Optimization Cycle (new): check pg_stat_statements → EXPLAIN ANALYZE worst queries → troubleshoot worst queries → apply fixes → (repeat)
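
The "worst queries" step of the cycle is a ranking by total time spent. pgBadger and pg_stat_statements already do this for you; the sketch below only illustrates the ranking idea on a hypothetical list of (normalized query, duration in ms) pairs.

```python
from collections import defaultdict

def worst_queries(entries, top=3):
    """Rank normalized queries by total time spent, highest first.

    `entries` is an iterable of (query_text, duration_ms) pairs.
    Returns a list of (query_text, calls, total_ms) tuples.
    """
    totals = defaultdict(lambda: [0, 0.0])   # query -> [calls, total_ms]
    for query, duration_ms in entries:
        totals[query][0] += 1
        totals[query][1] += duration_ms
    ranked = sorted(totals.items(), key=lambda kv: kv[1][1], reverse=True)
    return [(q, calls, total) for q, (calls, total) in ranked[:top]]

# Hypothetical log entries, already normalized (literals replaced by ?).
log = [
    ("SELECT * FROM orders WHERE id = ?", 2.0),
    ("SELECT * FROM orders WHERE id = ?", 3.0),
    ("SELECT count(*) FROM events", 900.0),
    ("UPDATE carts SET qty = ? WHERE id = ?", 40.0),
]
for row in worst_queries(log, top=2):
    print(row)
```

Ranking by total time rather than by single slowest execution also surfaces cheap queries that are called too often, which matches the "interfaces issuing too many queries" item on slide 37.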
  43. Procedure Optimization Cycle: log queries → run pgFouine → instrument worst functions → find slow operations → apply fixes → (repeat)
  44. Procedure Optimization Cycle (new): check pg_stat_user_functions → instrument worst functions → find slow operations → apply fixes → (repeat)
  45. Step 3: postgresql.conf
  46. max_connections ► As many as you need to use ● web apps: 100 to 300 [W, O] ● analytics: 20 to 40 [D] ► If you need more than 100 regularly, use a connection pooler ● like pgBouncer
  47. shared_buffers ► 1/4 of RAM on a dedicated server [W, O] ● not more than 8GB (test) ● cache_miss statistics can tell you if you need more ► fewer buffers to preserve cache space [D]
  48. Other memory parameters ► work_mem ● non-shared ▬ lower it for many connections [W, O] ▬ raise it for large queries [D] ● watch for signs of misallocation ▬ swapping RAM: too much work_mem ▬ log temp files: not enough work_mem ● probably better to allocate by task/ROLE
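
Because `work_mem` is not shared, and PostgreSQL can use up to that amount for each sort or hash operation, the worst case grows with the number of connections. The sketch below is a rough upper-bound calculation for spotting the "too much work_mem" case; the function names and the sample figures are assumptions for illustration.

```python
def worst_case_work_mem_mb(connections, work_mem_mb, ops_per_query=1):
    """Upper bound on memory if every connection runs sorts/hashes at once."""
    return connections * work_mem_mb * ops_per_query

def fits(connections, work_mem_mb, ram_mb, shared_buffers_mb, ops_per_query=1):
    """True if the worst case still fits in the RAM left after shared_buffers."""
    budget = ram_mb - shared_buffers_mb
    return worst_case_work_mem_mb(connections, work_mem_mb, ops_per_query) <= budget

# Hypothetical 16GB server with 4GB of shared_buffers.
print(fits(connections=200, work_mem_mb=8, ram_mb=16384, shared_buffers_mb=4096))
print(fits(connections=200, work_mem_mb=128, ram_mb=16384, shared_buffers_mb=4096))
```

Not every connection sorts at the same moment, so this bound is pessimistic, but it shows why a value that suits 20 analytics connections can push a 200-connection web workload into swap.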
  49. Other memory parameters ► maintenance_work_mem ● the faster vacuum completes, the better ▬ but watch out for multiple autovacuum workers! ● raise to 256MB to 1GB for large databases ● also used for index creation ▬ raise it for bulk loads
  50. Other memory parameters ► temp_buffers ● max size of temp tables before swapping to disk ● raise if you use lots of temp tables [D] ► wal_buffers ● raise it to 32MB
  51. Commits ► checkpoint_segments ● more if you have the disk: 16, 64, 128 ► synchronous_commit ● response time more important than data integrity? ● set synchronous_commit = off [W] ● lose a finite amount of data in a shutdown
  52. Query tuning ► effective_cache_size ● RAM available for queries ● set it to 3/4 of your available RAM ► default_statistics_target [D] ● raise to 200 to 1000 for large databases ● now defaults to 100 ● setting statistics per column is better
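
The rules of thumb from the shared_buffers and effective_cache_size slides (1/4 of RAM capped at 8GB, and 3/4 of available RAM) reduce to simple arithmetic. The sketch below assumes a dedicated server and treats total RAM as "available RAM", which is a simplification; the function name is illustrative, and the results are starting points to test, not final settings.

```python
def memory_starting_points(ram_gb):
    """Starting values for a dedicated server, following the slides' rules of thumb."""
    shared_gb = min(ram_gb / 4, 8)   # 1/4 of RAM, not more than 8GB
    cache_gb = ram_gb * 3 / 4        # 3/4 of RAM (assumes all RAM is available)
    return {
        "shared_buffers": f"{int(shared_gb * 1024)}MB",
        "effective_cache_size": f"{int(cache_gb * 1024)}MB",
    }

for ram in (16, 64):
    print(ram, memory_starting_points(ram))
```

For data warehouse workloads slide 47 suggests fewer buffers than this, so the 1/4 figure applies mainly to the Web and OLTP cases.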
  53. Query tuning ► effective_io_concurrency ● set to number of disks or channels ● advisory only ● Linux only
  54. A Word About Random Page Cost ► Abused as a “force index use” parameter ► Lower it if the seek/scan ratio of your storage is actually different ● SSD/NAND: 1.0 to 2.0 ● EC2: 1.1 to 2.0 ● High-end SAN: 2.5 to 3.5 ► Never below 1.0
  55. Maintenance ► Autovacuum ● leave it on for any application which gets constant writes [W, O] ● not so good for batch writes: do manual vacuum for bulk loads [D]
  56. Maintenance ► Autovacuum ● have 100s or 1000s of tables? use multiple autovacuum workers (autovacuum_max_workers) ▬ but not more than ½ cores ● large tables? raise autovacuum_vacuum_cost_limit ● you can change settings per table
  57. Step 4: OS & Filesystem
  58. Spread Your Files Around ► Separate the transaction log if possible [O, D] ● pg_xlog directory ● on a dedicated disk/array, performs 10-50% faster ● many WAL options only work if you have a separate drive
  59. Spread Your Files Around (which drive/array holds each partition)
      number of drives/arrays:  1  2  3
      OS/applications:          1  1  1
      transaction log:          1  1  2
      database:                 1  2  3
  60. Spread Your Files Around ► Tablespaces for temp files [D] ● more frequently useful if you do a lot of disk sorts ● Postgres can round-robin multiple temp tablespaces
  61. Linux Tuning ► Filesystems ● Use XFS or Ext4 ▬ btrfs not ready yet, may never work for DB ▬ Ext3 has horrible flushing behavior ● Reduce logging ▬ data=ordered, noatime, nodiratime
  62. Linux Tuning ► OS tuning ● must increase shmmax, shmall in kernel ● use deadline or noop scheduler to speed writes ● disable NUMA memory localization (recent) ● check your kernel version carefully for performance issues!
  63. Linux Tuning ► Turn off the OOM Killer! ● vm.oom-kill = 0 ● vm.overcommit_memory = 2 ● vm.overcommit_ratio = 80
  64. OpenSolaris/Illumos ► Filesystems ● Use ZFS ▬ reduce block size to 8K [W, O] ● turn off full_page_writes ► OS configuration ● no need to configure shared memory ● use packages compiled with Sun compiler
  65. Windows, OSX Tuning ► You're joking, right?
  66. What about The Cloud? ► Configuring for cloud servers is different ● shared resources ● unreliable I/O ● small resource limits ► Also depends on which cloud ● AWS, Rackspace, Joyent, GoGrid… so I can't address it all here.
  67. What about The Cloud? ► Some general advice: ● make sure your database fits in RAM ▬ except on Joyent ● Don't bother with most OS/FS tuning ▬ just some basic FS configuration options ● use synchronous_commit = off if possible
  68. Set up Monitoring! ► Get warning ahead of time ● know about performance problems before they go critical ● set up alerts ▬ 80% of capacity is an emergency! ● set up trending reports ▬ is there a pattern of steady growth?
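
The two monitoring ideas on slide 68, an alert at 80% of capacity and a trend report for steady growth, can be sketched in a few lines. The functions below are illustrative only: the names are made up, the samples are hypothetical, and the trend is a plain least-squares line, which is one simple choice among many.

```python
def over_threshold(used, capacity, threshold=0.80):
    """True once usage reaches the alert threshold (80% by default)."""
    return used / capacity >= threshold

def days_until_full(samples, capacity):
    """Project days until `capacity` from (day, used) samples.

    Fits a least-squares line through the samples and extrapolates from the
    most recent one. Returns None when usage is flat or shrinking.
    """
    n = len(samples)
    mean_day = sum(d for d, _ in samples) / n
    mean_used = sum(u for _, u in samples) / n
    num = sum((d - mean_day) * (u - mean_used) for d, u in samples)
    den = sum((d - mean_day) ** 2 for d, _ in samples)
    if den == 0 or num <= 0:
        return None
    slope = num / den                      # growth per day
    return (capacity - samples[-1][1]) / slope

# Hypothetical disk usage in GB, sampled every ten days.
usage = [(0, 100), (10, 110), (20, 120)]
print(over_threshold(120, 200), days_until_full(usage, 200))
```

The same pair of checks applies to connections, RAM, or I/O utilization as well as disk space.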
  69. Step 5: Hardware
  70. Hardware Basics ► Four basic components: ● CPU ● RAM ● I/O: Disks and disk bandwidth ● Network
  71. Hardware Basics ► Different priorities for different applications ● Web [W]: CPU, Network, RAM, ... I/O ● OLTP [O]: balance all ● DW [D]: I/O, CPU, RAM
  72. Getting Enough CPU ► One Core, One Query ● How many concurrent queries do you need? ● Best performance at 1 core per no more than two concurrent queries ► So if you can up your core count, do ► Also: L1, L2 cache size matters
  73. Getting Enough RAM ► RAM use is "thresholded" ● as long as you have more RAM than you need, even by 5%, the server will be fast ● go even 1% over and things slow down a lot
  74. Getting Enough RAM ► Critical RAM thresholds [W] ● Do you have enough RAM to keep the database in shared_buffers? ▬ RAM 3x to 6x the size of DB
  75. Getting Enough RAM ► Critical RAM thresholds [O] ● Do you have enough RAM to cache the whole database? ▬ RAM 2x to 3x the on-disk size of the database ● Do you have enough RAM to cache the “working set”? ▬ the data which is needed 95% of the time
  76. Getting Enough RAM ► Critical RAM thresholds [D] ● Do you have enough RAM for sorts & aggregates? ▬ What's the largest data set you'll need to work with? ▬ For how many users?
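
The sizing rules on the CPU and RAM slides can be turned into a quick back-of-the-envelope calculator. The sketch below encodes only what the slides state: at most two concurrent queries per core, RAM of 3x to 6x the database size for Web, and 2x to 3x the on-disk size for OLTP. The function and dictionary names are illustrative, and data warehouses are left out because the slides size them by sorts and aggregates rather than by a multiplier.

```python
import math

def cores_needed(concurrent_queries):
    """Minimum cores at no more than two concurrent queries per core."""
    return math.ceil(concurrent_queries / 2)

# (low, high) RAM multipliers relative to database size, from slides 74 and 75.
RAM_MULTIPLIER = {"web": (3, 6), "oltp": (2, 3)}

def ram_range_gb(workload, db_size_gb):
    """Return the (low, high) RAM range in GB for a Web or OLTP database."""
    low, high = RAM_MULTIPLIER[workload]
    return db_size_gb * low, db_size_gb * high

print(cores_needed(24))          # cores for 24 concurrent queries
print(ram_range_gb("web", 4))    # hypothetical 4GB web database
print(ram_range_gb("oltp", 50))  # hypothetical 50GB OLTP database
```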
  77. Other RAM Issues ► Get ECC RAM ● Better to know about bad RAM before it corrupts your data. ► What else will you want RAM for? ● RAMdisk? ● SW RAID? ● Applications?
  78. Getting Enough I/O ► Will your database be I/O Bound? ● many writes: bound by transaction log [O] ● database much larger than RAM: bound by I/O for many/most queries [D]
  79. Getting Enough I/O ► Optimize for the I/O you'll need ● if your DB is terabytes, spend most of your money on disks ● calculate how long it will take to read your entire database from disk ▬ backups ▬ snapshots ● don't forget the transaction log!
  80. I/O Decision Tree (flowchart) ► Decision nodes: lots of writes? fits in RAM? afford good HW RAID? terabytes of data? mostly read? ► Outcomes: mirrored, HW RAID, SW RAID, Storage Device, RAID 5, RAID 1+0
  81. I/O Tips ► RAID ● get battery backup and turn your write cache on ● SAS has 2x the real throughput of SATA ● more spindles = faster database ▬ big disks are generally slow
  82. I/O Tips ► DAS/SAN/NAS ● measure lag time: it can kill response time ● how many channels? ▬ “gigabit” is only 100MB/s ▬ make sure multipath works ● use fiber if you can afford it
  83. I/O Tips: iSCSI = death
  84. SSD ► Very fast seeks [D] ● great for index access on large tables ● up to 20X faster ► Not very fast random writes ● low-end models can be slower than HDD ● most are about 2X speed ► And use server models, not desktop!
  85. NAND (FusionIO): All the advantages of SSD, Plus: ► Very fast writes (5X to 20X) [W, O] ● more concurrency on writes ● MUCH lower latency ► But … very expensive (50X)
  86. Tablespaces for NVRAM ► Have a "hot" and a "cold" tablespace ● current data on "hot" [O, D] ● older/less important data on "cold" ● combine with partitioning ► compromise between speed and size
  87. Network ► Network can be your bottleneck ● lag time ● bandwidth ● oversubscribed switches ● NAS
  88. Network ► Have dedicated connections ● between appserver and database server ● between database server and failover server ● between database and storage
  89. Network ► Data Transfers ● Gigabit is only 100MB/s ● Calculate capacity for data copies, standby, dumps
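
The "calculate capacity" bullet is plain arithmetic once the slide's figure of roughly 100MB/s for gigabit is taken as the usable rate. The sketch below estimates how long a copy or dump takes and whether it fits a maintenance window; the function names, the database sizes, and the window length are hypothetical.

```python
def transfer_hours(size_gb, mb_per_s=100):
    """Hours needed to move `size_gb` at a sustained `mb_per_s` (1GB = 1024MB)."""
    return size_gb * 1024 / mb_per_s / 3600

def fits_window(size_gb, window_hours, mb_per_s=100):
    """True if the transfer completes within the given window."""
    return transfer_hours(size_gb, mb_per_s) <= window_hours

# Hypothetical 500GB and 4TB databases over a gigabit link, 8-hour window.
for size in (500, 4096):
    print(size, round(transfer_hours(size), 2), fits_window(size, 8))
```

The same calculation with the storage array's measured read rate in place of the network rate answers the "how long to read the entire database from disk" question on slide 79.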
  90. The Most Important Hardware Advice: ► Quality matters ● not all CPUs are the same ● not all RAID cards are the same ● not all server systems are the same ● one bad piece of hardware, or bad driver, can destroy your application performance
  91. The Most Important Hardware Advice: ► High-performance databases mean hardware expertise ● the statistics don't tell you everything ● vendors lie ● you will need to research different models and combinations ● read the pgsql-performance mailing list
  92. The Most Important Hardware Advice: ► Make sure you test your hardware before you put your database on it ● “Try before you buy” ● Never trust the vendor or your sysadmins
  93. The Most Important Hardware Advice: ► So Test, Test, Test! ● CPU: PassMark, sysbench, Spec CPU ● RAM: memtest, cachebench, Stream ● I/O: bonnie++, dd, iozone ● Network: bwping, netperf ● DB: pgbench, sysbench
  94. Questions? ► Josh Berkus ● josh@pgexperts.com ● www.pgexperts.com ▬ /presentations.html ● www.databasesoup.com ● irc.freenode.net ▬ #postgresql ► More Advice ● www.postgresql.org/docs ● pgsql-performance mailing list ● planet.postgresql.org. This talk is copyright 2013 Josh Berkus, and is licensed under the Creative Commons Attribution License.
