Managing terabytes

  1. Managing Terabytes
     Problems and solutions with operating large Postgres installations
     Selena Deckelmann, Prime Radiant, @selenamarie
  2. About me.
  3. The Environment
     • 1.6 TB, 1 cluster, Version 8.2
     • 1.1 TB, 1 cluster, Version 8.3
     • 8.4/9.0 dev systems
     • Working toward 9.0 in production (May 2011)
     • pgpool, Redis, RabbitMQ, NFS
  4. Some stats
     • Daily peak: ~3000 commits per second
     • Average writes: 4 MB/s
     • Average reads: 8 MB/s
  5. What’s good
     • Most queries are fast!
     • Benchmarks say we’re pushing the limits of the hardware
     • Developers love working with Postgres
  6. And lots more. But...
  8. The Problems
     1. System resource exhaustion
     2. Everything is slow: Huge catalogs, Backups
     3. Handling VACUUM problems: Bloat, Transaction wraparound
     4. Upgrades: Minor, Major
  9. System Resource Exhaustion
  10. Running out of inodes
      Problem: UFS on Solaris
      “The only way to add more inodes to a UFS filesystem is:
      1. destroy the filesystem and create a new filesystem with a higher inode density
      2. enlarge the filesystem”
      (growfs man page)
  11. Running out of inodes
      Solution 0: Delete files.
      Solution 1: Sharding/bigger filesystem
      Solution 2: xfs
  12. Running out of file descriptors
      Problem: Too many open files by the database.
          $ sudo lsof -p 19121 | wc
               40     355    4151
      Solution: You need a connection pooler.
  13. Running out of file descriptors
      Solution: You need a connection pooler.
      Recommended:
      • pgbouncer (threaded, online upgrade)
      • pgpool-II (failover)
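
A rough upper bound helps show why slide 13 lands on pooling. Each Postgres backend may keep up to `max_files_per_process` files open (the default is 1000), so the worst case grows with the number of connections. The sketch below is only that arithmetic; the connection counts are hypothetical illustrations, not figures from the talk.

```python
# Rough upper bound on file descriptors held by Postgres backends.
# Assumption: every backend reaches max_files_per_process (default 1000),
# which is pessimistic but shows how the bound scales with connections.


def worst_case_open_files(connections: int, max_files_per_process: int = 1000) -> int:
    """Return the upper bound on open files across all backends."""
    if connections < 0 or max_files_per_process < 0:
        raise ValueError("counts must be non-negative")
    return connections * max_files_per_process


if __name__ == "__main__":
    # Hypothetical numbers: 500 direct client connections versus
    # 20 server connections behind a pooler.
    print(worst_case_open_files(500))
    print(worst_case_open_files(20))
```

A pooler keeps the number of server-side backends small and fixed, so the bound stops depending on how many clients connect.
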
  14. Everything is slow.
  15. Huge Catalogs
      409,994 tables
  16. Maintenance problem
      Minor mistake in parent table definitions:
          not null default nextval('important_sequence'::text)
      vs
          not null default nextval('important_sequence'::regclass)
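
With hundreds of thousands of tables, a fix like the one on slide 16 has to be scripted. The following is a minimal sketch, assuming the `::text` form is the one to be replaced and that the stored default looks exactly like `nextval('some_sequence'::text)`. The table and column names in the usage example are hypothetical, and identifiers are not quoted.

```python
import re

# Matches nextval('some_sequence'::text) and captures the quoted sequence name.
_TEXT_DEFAULT = re.compile(r"nextval\(\s*('(?:[^']|'')+')::text\s*\)")


def to_regclass_default(default_expr: str) -> str:
    """Rewrite nextval('seq'::text) as nextval('seq'::regclass)."""
    return _TEXT_DEFAULT.sub(r"nextval(\1::regclass)", default_expr)


def alter_default_sql(table: str, column: str, default_expr: str) -> str:
    """Build one ALTER TABLE statement (identifiers are used unquoted)."""
    fixed = to_regclass_default(default_expr)
    return f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT {fixed};"


if __name__ == "__main__":
    # Hypothetical table and column names.
    print(alter_default_sql("parent_table", "id",
                            "nextval('important_sequence'::text)"))
```

The difference matters because a `regclass` default is bound to the sequence itself, while a `text` default looks the sequence up by name each time it is evaluated.
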
  17. Huge Catalogs
      Problem: Slow scans of catalog data
      Solution: Upgrade to Postgres 8.4 or higher
      But really: Avoid making a cluster with >400k tables.
  18. Stats collection
      9,019,868 total data points for table stats
      4,550,770 total data points for index stats
      Problem: This is slow to write. (128 MB written every second or so)
  19. Stats collection
      Solution: Move the stats file to RAM.
      stats_temp_directory (8.4 or higher)
      There’s a trivial patch for earlier versions.
  20. Stats collection
      Problem: This is slow to read.
  21. Stats collection
      Solution: Supposedly, this is better in 8.4 and higher (fewer writes per minute).
      Still probably not fast.
  22. Backups
      pg_dump takes longer and longer...
  23.    backup    |    duration
      -------------+-----------------
       2009-11-22  | 02:44:36.821475
       2009-11-23  | 02:46:20.003507
       2009-11-24  | 02:47:06.260705
       2009-12-06  | 07:13:04.174964
       2009-12-13  | 05:00:01.082676
       2009-12-20  | 06:24:49.433043
       2009-12-27  | 05:35:20.551477
       2010-01-03  | 07:36:49.651492
       2010-01-10  | 05:55:02.396163
       2010-01-17  | 07:32:33.277559
       2010-01-24  | 06:22:46.522319
       2010-01-31  | 10:48:13.060888
       2010-02-07  | 21:21:47.77618
       2010-02-14  | 14:32:04.638267
       2010-02-21  | 11:34:42.353244
       2010-02-28  | 11:13:02.102345
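
The trend in the table on slide 23 is easier to judge as a ratio. The sketch below copies the durations from the slide, parses them, and compares the slowest run with the first one. The helper names are made up for this example.

```python
from datetime import timedelta

# (backup date, duration) pairs copied from the table on slide 23.
BACKUPS = [
    ("2009-11-22", "02:44:36.821475"), ("2009-11-23", "02:46:20.003507"),
    ("2009-11-24", "02:47:06.260705"), ("2009-12-06", "07:13:04.174964"),
    ("2009-12-13", "05:00:01.082676"), ("2009-12-20", "06:24:49.433043"),
    ("2009-12-27", "05:35:20.551477"), ("2010-01-03", "07:36:49.651492"),
    ("2010-01-10", "05:55:02.396163"), ("2010-01-17", "07:32:33.277559"),
    ("2010-01-24", "06:22:46.522319"), ("2010-01-31", "10:48:13.060888"),
    ("2010-02-07", "21:21:47.77618"), ("2010-02-14", "14:32:04.638267"),
    ("2010-02-21", "11:34:42.353244"), ("2010-02-28", "11:13:02.102345"),
]


def parse_duration(text: str) -> timedelta:
    """Parse an HH:MM:SS.fraction interval as printed by psql."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def slowest(backups):
    """Return the (date, duration) pair with the longest duration."""
    return max(((day, parse_duration(t)) for day, t in backups),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    first = parse_duration(BACKUPS[0][1])
    day, longest = slowest(BACKUPS)
    print(day, longest, f"{longest / first:.1f}x the first run")
```
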
  24. Backups
      Problem: pg_dump is too slow.
      Solutions:
      • Patching pg_dump for SELECT ... LIMIT
      • Crank down shared_buffers
      • Stop using pg_dump for backups
      • 64-bit might help
  25. How not to migrate to a 64-bit system
  26. How not to migrate to a 64-bit system (continued)
      • Install 32-bit Postgres and libraries on a 64-bit system.
      • Install 64-bit Postgres/libs of the same version.
      • Copy “hot backup” from 32-bit sys over to 64-bit sys.
      • Run pg_dump from 64-bit version on 32-bit Postgres.
  27. A single warm standby is not a backup.
      But lots of people use them that way!
  28. Ship WAL from Solaris x86 -> Linux
      It did work!
  29. Handling VACUUM problems
  30. Bloat
      Problem: Lots of dead tuples in tables.
      • Frequent UPDATEs to long tables of log data
      • Frequent DELETEs without a VACUUM
      • A terabyte of dead tuples
  31. Fixing bloat
      Solution: Write custom scripts to clean
      • VACUUM for small things
      • CLUSTER for everything else
      • Considered TRUNCATE
  32. Catalog Bloat
      • Application allowed users to initiate ALTER TABLE.
      • Regular VACUUM couldn’t fix it.
      • VACUUM FULL of the catalog takes 2+ hours.
      • Use of NOTIFY/LISTEN can also cause bloat.
  33. Transaction wraparound avoidance
      Problem: autovacuum set off too frequently
      Watch age(datfrozenxid)
      Solution: Increase autovacuum_freeze_max_age
      (default is 200 million, we increase to one billion)
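
A little arithmetic shows what raising `autovacuum_freeze_max_age` buys. The sketch below estimates how many hours of transaction ID consumption fit under a given limit. It treats the ~3000 commits per second peak from slide 4 as the XID consumption rate, which is a pessimistic assumption: it is a peak, and not every commit necessarily consumes an XID. The function name is made up for this example; `current_age` stands in for a value read from `age(datfrozenxid)`.

```python
# Estimate how long until autovacuum forces an anti-wraparound VACUUM,
# given a steady rate of transaction ID consumption.


def hours_until_forced_vacuum(freeze_max_age: int, xids_per_second: float,
                              current_age: int = 0) -> float:
    """Hours of XID consumption left before freeze_max_age is reached."""
    if xids_per_second <= 0:
        raise ValueError("xids_per_second must be positive")
    remaining = max(freeze_max_age - current_age, 0)
    return remaining / xids_per_second / 3600.0


if __name__ == "__main__":
    PEAK_RATE = 3000  # commits/s from slide 4, treated as XIDs/s (assumption)
    for limit in (200_000_000, 1_000_000_000):
        print(limit, round(hours_until_forced_vacuum(limit, PEAK_RATE), 1))
```

Under these assumptions the default limit is reached in under a day at peak load, while one billion leaves several days between forced vacuums.
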
  34. Upgrades
  35. Minor upgrades
      Problem: Restarting Postgres causes bad application performance.
      • Require a start/stop of database
      • Unexpected CHECKPOINT
      • Cold cache
  36. Minor upgrades
      Solutions:
      • Plan for a CHECKPOINT before shutdown
      • Warm the cache (queries that exercise indexes, maybe table scans)
  37. Major Version upgrades
      Problem: Major upgrades are a PITA.
      • <8.2 - no pg_upgrade :(
      • Time your restores.
      • Document your SLAs.
  38. Major Version upgrades
      Solutions: :(
      • >=8.3 - pg_upgrade
      • Time your restores.
      • Document your SLAs.
  39. Major Version upgrades
      Solutions: :(
      • Write tools to migrate data
      • Shard
      • Trigger-based replication
  40. The Problems
      1. System resource exhaustion
      2. Everything is slow: Huge catalogs, Backups
      3. Handling VACUUM problems: Bloat, Transaction wraparound
      4. Upgrades: Minor, Major
  41. The Solutions
      1. System resource exhaustion
         Choose a better filesystem, Pooling
      2. Everything is slow: Huge catalogs, Backups
         Don’t do that, Monitor & Binary backups
  42. The Solutions
      3. Handling VACUUM problems: Bloat, Transaction wraparound
         Developer education, Monitoring, Cleanup, *_freeze_max_age
      4. Upgrades: Minor, Major
         Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade)
  43. Thanks!
      Managing Terabytes: Problems and solutions with operating large Postgres installations
      Selena Deckelmann, Prime Radiant, @selenamarie