Your SlideShare is downloading. ×
0
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Managing terabytes
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Managing terabytes

1,492

Published on

Published in: Technology, Education
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,492
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
15
Comments
0
Likes
2
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Managing Terabytes Problems and solutions with operating large Postgres installations Selena Deckelmann Prime Radiant @selenamarieSP ogm Ceon Cfo .En Uef2 re 1 0n 1c1e
  • 2. About me. 2 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 3. The Environment • 1.6 TB, 1 cluster,Version 8.2 • 1.1 TB, 1 cluster,Version 8.3 • 8.4/9.0 Dev systems • Working toward 9.0 into prod (May 2011) • pgpool, Redis, RabbitMQ, NFSSP ogm Ceon Cfo .En Uef2 re 3 0n 1c1e
  • 4. Some stats • daily peak: ~3000 commits per second • average writes: 4 MBps • average reads: 8 MBpsSP ogm Ceon Cfo .En Uef2 re 4 0n 1c1e
  • 5. What’s good • Most queries are fast! • Benchmarks say we’re pushing the limits of the hardware • Developers love working with PostgresSP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 6. And lots more. But... 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 7. 1c1e re0n Uef2 .En Cfo Ceon ogmSP
  • 8. The Problems 1. System resource exhaustion 2. Everything is slow: Huge catalogs, Backups 3. Handling VACUUM problems: Bloat, Transaction wraparound 4. Upgrades: Minor, MajorSP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 9. System Resource Exhaustion 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 10. Running out of inodes Problem: UFS on Solaris “The only way to add more inodes to a UFS filesystem is: 1. destroy the filesystem and create a new filesystem with a higher inode density 2. enlarge the filesystem - growfs man page”SP ogm Ceon Cfo .En Uef2 re 10 0n 1c1e
  • 11. Running out of inodes Solution 0: Delete files. Solution 1: Sharding/bigger filesystem Solution 2: xfsSP ogm Ceon Cfo .En Uef2 re 11 0n 1c1e
  • 12. Running out of file descriptors Problem: Too many open files by the database. selena@lulu:~ #508 18:43 :) sudo lsof -p 19121 | wc 40 355 4151 Solution: You need a connection pooler.SP ogm Ceon Cfo .En Uef2 re 12 0n 1c1e
  • 13. Running out of file descriptors Solution: You need a connection pooler. Recommended: pgbouncer (threaded, online upgrade) pgpool-II (failover)SP ogm Ceon Cfo .En Uef2 re 13 0n 1c1e
  • 14. Everything is slow. 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 15. Huge Catalogs 409,994 tablesSP ogm Ceon Cfo .En Uef2 re 15 0n 1c1e
  • 16. Maintenance problem Minor mistake in parent table definitions: not null default nextval(important_sequence::text) vs not null default nextval(important_sequence::regclass)SP ogm Ceon Cfo .En Uef2 re 16 0n 1c1e
  • 17. Huge Catalogs Problem: Slow scans of catalog data Solution: Upgrade to Postgres 8.4 or higher But really: Avoid making a cluster with >400k tables.SP ogm Ceon Cfo .En Uef2 re 17 0n 1c1e
  • 18. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Problem: This is slow to write. (128 MB written every second or so)SP ogm Ceon Cfo .En Uef2 re 18 0n 1c1e
  • 19. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Soution: Move stats file to RAM. stats_temp_directory (8.4 or higher) There’s a trivial patch for earlier versions.SP ogm Ceon Cfo .En Uef2 re 19 0n 1c1e
  • 20. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Problem: This is slow to read.SP ogm Ceon Cfo .En Uef2 re 20 0n 1c1e
  • 21. Stats collection 9,019,868 total data points for table stats 4,550,770 total data points for index stats Solution: Supposedly, this is better in 8.4 and higher. (fewer writes per minute) Still probably not fast.SP ogm Ceon Cfo .En Uef2 re 21 0n 1c1e
  • 22. Backups pg_dump takes longer and longer...SP ogm Ceon Cfo .En Uef2 re 22 0n 1c1e
  • 23.    backup     |   duration -------------------+--------------------  2009­11­22  |  02:44:36.821475   2009­11­23  |  02:46:20.003507  2009­11­24  |  02:47:06.260705  2009­12­06  |  07:13:04.174964  2009­12­13  |  05:00:01.082676  2009­12­20  |  06:24:49.433043  2009­12­27  |  05:35:20.551477  2010­01­03  |  07:36:49.651492  2010­01­10  |  05:55:02.396163  2010­01­17  |  07:32:33.277559  2010­01­24  |  06:22:46.522319  2010­01­31  |  10:48:13.060888  2010­02­07  |  21:21:47.77618  2010­02­14  |  14:32:04.638267  2010­02­21  |  11:34:42.353244  2010­02­28  |  11:13:02.102345SP ogm Ceon Cfo .En Uef2 re 23 0n 1c1e
  • 24. Backups Problem: pg_dump is too slow. Solutions: • patching pg_dump for SELECT ... LIMIT • crank down shared_buffers • Stop using pg_dump for backups • 64-bit might helpSP ogm Ceon Cfo .En Uef2 re 24 0n 1c1e
  • 25. How not to migrate to a 64-bit systemSP ogm Ceon Cfo .En Uef2 re 25 0n 1c1e
  • 26. Title Text Install 32-bit Postgres and libraries on a 64-bit system. Install 64-bit Postgres/libs of the same version. Copy “hot backup” from 32-bit sys over to 64-bit sys. Run pg_dump from 64-bit version on 32-bit Postgres.SP ogm Ceon Cfo .En Uef2 re 26 0n 1c1e
  • 27. A single warm standby is not a backup. But lots of people use them that way!SP ogm Ceon Cfo .En Uef2 re 27 0n 1c1e
  • 28. Ship WAL from Solaris x86 -> Linux It did work!SP ogm Ceon Cfo .En Uef2 re 28 0n 1c1e
  • 29. Handling VACUUM problems 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 30. Bloat Problem: Lots of dead tuples in tables. • Frequent UPDATEs to long tables of log data • Frequent DELETEs without a VACUUM • A terabyte of dead tuplesSP ogm Ceon Cfo .En Uef2 re 30 0n 1c1e
  • 31. Fixing bloat Solution: Write custom scripts to clean • VACUUM for small things • CLUSTER for everything else • Considered TRUNCATESP ogm Ceon Cfo .En Uef2 re 31 0n 1c1e
  • 32. Catalog Bloat Application allowed users to initiate ALTER TABLE. Regular VACUUM couldn’t fix it. VACUUM FULL of the catalog takes 2+ hours. Use of NOTIFY/LISTEN can also cause bloat.SP ogm Ceon Cfo .En Uef2 re 32 0n 1c1e
  • 33. Transaction wraparound avoidance Problem: autovacuum set off too frequently Watch age(datfrozenxid) Solution: Increase autovacuum_freeze_max_age (default is 200 million, we increase to one billion)SP ogm Ceon Cfo .En Uef2 re 33 0n 1c1e
  • 34. Upgrades 1c1e re0n Uef2 .En Cfo Ceon ogm SP
  • 35. Minor upgrades Problem: Restarting Postgres causes bad application performance. • Require a start/stop of database • Unexpected CHECKPOINT • Cold cacheSP ogm Ceon Cfo .En Uef2 re 35 0n 1c1e
  • 36. Minor upgrades Solutions: • Plan for a CHECKPOINT before shutdown • Warm the cache (Queries that exercise indexes, maybe table scans)SP ogm Ceon Cfo .En Uef2 re 36 0n 1c1e
  • 37. Major Version upgrades Problem: Major upgrades are a PITA. • <8.2 - no pg_upgrade :( • Time your restores. • Document your SLAs.SP ogm Ceon Cfo .En Uef2 re 37 0n 1c1e
  • 38. Major Version upgrades Solutions: :( • >=8.3 - pg_upgrade • Time your restores. • Document your SLAs.SP ogm Ceon Cfo .En Uef2 re 38 0n 1c1e
  • 39. Major Version upgrades Solutions: :( • Write tools to migrate data • Shard • Trigger-based replicationSP ogm Ceon Cfo .En Uef2 re 39 0n 1c1e
  • 40. The Problems 1. System resource exhaustion 2. Everything is slow: Huge catalogs, Backups 3. Handling VACUUM problems: Bloat, Transaction wraparound 4. Upgrades: Minor, MajorSP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 41. The Solutions 1. System resource exhaustion Choose a better filesystem, Pooling 2. Everything is slow: Huge catalogs, Backups Don’t do that, Monitor & Binary backupsSP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 42. The Solutions 3. Handling VACUUM problems: Bloat, Transaction wraparound Developer education, Monitoring, Cleanup, *_max_freeze_age 4. Upgrades: Minor, Major Plan, Plan, Plan (CHECKPOINT, warm cache, pg_upgrade)SP ogm Ceon Cfo .En Uef2 re0n 1c1e
  • 43. Thanks! Managing Terabytes Problems and solutions with operating large Postgres installations Selena Deckelmann Prime Radiant @selenamarieSP ogm Ceon Cfo .En Uef2 re 43 0n 1c1e

×