Managing Terabytes
                          Problems and solutions with
                       operating large Postgres installations
                                Selena Deckelmann
                                   Prime Radiant
                                  @selenamarie
About me.



The Environment

                       • 1.6 TB, 1 cluster, version 8.2
                       • 1.1 TB, 1 cluster, version 8.3
                       • 8.4/9.0 dev systems
                       • Working toward 9.0 in production (May 2011)
                       • pgpool, Redis, RabbitMQ, NFS
Some stats

                       • daily peak: ~3000 commits per second
                       • average writes: 4 MBps
                       • average reads: 8 MBps
What’s good

                       • Most queries are fast!
                       • Benchmarks say we’re pushing the limits of
                         the hardware
                       • Developers love working with Postgres
And lots more. But...
The Problems

                  1. System resource exhaustion
                  2. Everything is slow: Huge catalogs, Backups
                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                  4. Upgrades: Minor, Major
System Resource Exhaustion
Running out of inodes

                       Problem: UFS on Solaris
                       “The only way to add more inodes to a UFS
                       filesystem is: 1. destroy the filesystem and create a
                       new filesystem with a higher inode density 2. enlarge
                       the filesystem” (growfs man page)
Running out of inodes

                       Solution 0: Delete files.
                       Solution 1: Sharding/bigger filesystem
                       Solution 2: xfs
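
Inode exhaustion is easier to handle when you see it coming. The following is a minimal monitoring sketch built on Python's os.statvfs, which reports total and free inodes for a mounted filesystem. The function names and the 90% warning threshold are assumptions for illustration and are not from the talk.

```python
import os
from typing import Optional


def inode_fraction_used(total: int, free: int) -> Optional[float]:
    """Return the fraction of inodes in use, or None when the filesystem
    does not report a fixed inode count (total == 0)."""
    if total <= 0:
        return None
    return (total - free) / total


def check_inodes(path: str, warn_at: float = 0.90) -> Optional[bool]:
    """Return True if inode usage on the filesystem holding `path` is at or
    above `warn_at` (an assumed threshold), False if below, None if unknown."""
    st = os.statvfs(path)  # f_files = total inodes, f_ffree = free inodes
    used = inode_fraction_used(st.f_files, st.f_ffree)
    if used is None:
        return None
    return used >= warn_at
```

Pointing check_inodes at the Postgres data directory from a cron job would be one way to use it. With several hundred thousand tables, each with its own files, the inode count can run out long before the disk is full.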
Running out of file descriptors
                       Problem: Too many open files
                       by the database.
                       selena@lulu:~ #508 18:43 :)
                       sudo lsof -p 19121 | wc
                           40     355        4151

                       Solution: You need a connection
                       pooler.
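
In the wc output above, the first number (40) is the line count, which includes the header row that lsof prints. That is for one backend process; the total grows with every connection. A small sketch for counting rows per backend from captured lsof text follows. The sample text is hypothetical, shaped like lsof output, and is not data from the talk.

```python
def count_open_files(lsof_output: str) -> int:
    """Count open-file rows in `lsof -p PID` output, skipping the header row
    and any blank lines."""
    lines = [ln for ln in lsof_output.splitlines() if ln.strip()]
    if lines and lines[0].split()[0] == "COMMAND":
        lines = lines[1:]  # lsof prints a header line starting with COMMAND
    return len(lines)


# Hypothetical sample, shaped like lsof output; not data from the talk.
SAMPLE = """\
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
postgres 19121 postgres  cwd    DIR    8,1     4096 262146 /var/lib/postgresql/8.3/main
postgres 19121 postgres    0r   CHR    1,3      0t0   1029 /dev/null
postgres 19121 postgres    3u   REG    8,1 16777216 262200 /var/lib/postgresql/8.3/main/base/16384/16397
"""

print(count_open_files(SAMPLE))
```

Summing this over all backend PIDs and comparing the result with the system's open-file limit gives a rough picture of how much headroom is left.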
Running out of file descriptors
                       Solution: You need a connection
                       pooler.
                       Recommended:
                       pgbouncer (lightweight, event-driven, online upgrade)
                       pgpool-II (failover)
Everything is slow.
Huge Catalogs


                       409,994 tables
Maintenance problem
                       Minor mistake in parent table definitions:

                       not null default
                       nextval('important_sequence'::text)

                       vs

                       not null default
                       nextval('important_sequence'::regclass)
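
The difference matters because the ::text form looks the sequence up by name each time the default is evaluated, while the ::regclass form stores a reference to the sequence itself, so it survives renames. With hundreds of thousands of tables, finding the affected definitions has to be scripted. Below is a minimal sketch that filters column default expressions for the ::text form. How the expressions are collected from the catalog is left out, and the table and column names are hypothetical.

```python
import re
from typing import Dict, List

# Matches nextval('some_name'::text), the form that is looked up by name.
TEXT_CAST_NEXTVAL = re.compile(r"nextval\(\s*'[^']+'::text\s*\)", re.IGNORECASE)


def find_text_cast_defaults(defaults: Dict[str, str]) -> List[str]:
    """Given {"table.column": default_expression}, return the sorted keys whose
    default calls nextval() with a ::text argument instead of ::regclass."""
    return sorted(k for k, expr in defaults.items() if TEXT_CAST_NEXTVAL.search(expr))


# Hypothetical table and column names; the two expressions are from the slide.
defaults = {
    "parent_a.id": "nextval('important_sequence'::text)",
    "parent_b.id": "nextval('important_sequence'::regclass)",
}
print(find_text_cast_defaults(defaults))
```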
Huge Catalogs
                       Problem: Slow scans of catalog data


                       Solution:
                       Upgrade to Postgres 8.4 or higher


                       But really: Avoid making a cluster with >400k
                       tables.
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Problem: This is slow to write.
                       (128 MB written every second or so)
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Solution: Move stats file to RAM.
                       stats_temp_directory    (8.4 or higher)
                       There’s a trivial patch for earlier versions.
Stats collection

                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Problem: This is slow to read.
Stats collection
                       9,019,868 total data points for table stats
                       4,550,770 total data points for index stats
                       Solution:
                       Supposedly, this is better in 8.4 and higher.
                       (fewer writes per minute)
                       Still probably not fast.
Backups


                       pg_dump takes longer and longer...
 
                           backup          |   duration
                        -------------------+--------------------
                            2009-11-22     |  02:44:36.821475
                            2009-11-23     |  02:46:20.003507
                            2009-11-24     |  02:47:06.260705
                            2009-12-06     |  07:13:04.174964
                            2009-12-13     |  05:00:01.082676
                            2009-12-20     |  06:24:49.433043
                            2009-12-27     |  05:35:20.551477
                            2010-01-03     |  07:36:49.651492
                            2010-01-10     |  05:55:02.396163
                            2010-01-17     |  07:32:33.277559
                            2010-01-24     |  06:22:46.522319
                            2010-01-31     |  10:48:13.060888
                            2010-02-07     |  21:21:47.77618
                            2010-02-14     |  14:32:04.638267
                            2010-02-21     |  11:34:42.353244
                            2010-02-28     |  11:13:02.102345
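
To put a number on "longer and longer", here is a small sketch that converts the durations to hours and compares the slowest run with the first one. It uses a handful of rows copied from the table above; the function names are illustrative.

```python
from typing import List, Tuple


def to_hours(duration: str) -> float:
    """Convert an HH:MM:SS.ffffff interval string into hours."""
    hours, minutes, seconds = duration.split(":")
    return int(hours) + int(minutes) / 60 + float(seconds) / 3600


def slowest(rows: List[Tuple[str, str]]) -> Tuple[str, float]:
    """Return (backup_date, hours) for the longest-running backup."""
    date, duration = max(rows, key=lambda row: to_hours(row[1]))
    return date, to_hours(duration)


# A few rows copied from the table above.
rows = [
    ("2009-11-22", "02:44:36.821475"),
    ("2010-01-31", "10:48:13.060888"),
    ("2010-02-07", "21:21:47.77618"),
    ("2010-02-28", "11:13:02.102345"),
]

first_hours = to_hours(rows[0][1])
worst_date, worst_hours = slowest(rows)
print(f"{worst_date}: {worst_hours:.1f} h, {worst_hours / first_hours:.1f}x the first run")
```

Tracking this ratio over time is a cheap early warning that the backup window is about to be exceeded.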
Backups
                       Problem: pg_dump is too slow.
                       Solutions:
                       • patching pg_dump for SELECT ... LIMIT
                       • crank down shared_buffers
                       • Stop using pg_dump for backups
                       • 64-bit might help
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         24
               0n
                1c1e
How not to migrate to a 64-bit system



                       Install 32-bit Postgres and libraries on a 64-bit system.
                       Install 64-bit Postgres/libs of the same version.
                       Copy “hot backup” from 32-bit sys over to 64-bit sys.
                       Run pg_dump from 64-bit version on 32-bit Postgres.
A single warm standby is not a backup.


                       But lots of people use them that way!
Ship WAL from Solaris x86 -> Linux
                                  It did work!
Handling VACUUM problems
Bloat

                       Problem: Lots of dead tuples in tables.

                       • Frequent UPDATEs to long tables of log
                          data
                       • Frequent DELETEs without a VACUUM
                       • A terabyte of dead tuples
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                           30
               0n
                1c1e
Fixing bloat

                       Solution: Write custom scripts to clean

                       • VACUUM for small things
                       • CLUSTER for everything else
                       • Considered TRUNCATE
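
A cleanup script along these lines could pick the statement per table. This is only a sketch: the 1 GB cutoff for "small things", the table name, and the index name are assumptions. The CLUSTER ... USING syntax shown is accepted from 8.3 onward (8.2 uses CLUSTER indexname ON tablename). Keep in mind that CLUSTER rewrites the table under an exclusive lock.

```python
from typing import NamedTuple, Optional

SMALL_TABLE_BYTES = 1 * 1024**3  # assumed cutoff for "small things"


class Table(NamedTuple):
    name: str                     # assumed to be a trusted, already-quoted identifier
    size_bytes: int
    cluster_index: Optional[str]  # index to rewrite the table by, if any


def cleanup_statement(table: Table) -> str:
    """Return the SQL to run for one bloated table: VACUUM when the table is
    small or has no index to cluster on, CLUSTER otherwise."""
    if table.size_bytes < SMALL_TABLE_BYTES or table.cluster_index is None:
        return f"VACUUM ANALYZE {table.name};"
    return f"CLUSTER {table.name} USING {table.cluster_index};"


# Hypothetical table: 50 GB of log data with a primary-key index.
print(cleanup_statement(Table("event_log", 50 * 1024**3, "event_log_pkey")))
```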
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         31
               0n
                1c1e
Catalog Bloat

                       Application allowed users to initiate ALTER
                       TABLE.

                       Regular VACUUM couldn’t fix it.
                       VACUUM FULL of the catalog takes 2+ hours.
                       Use of NOTIFY/LISTEN can also cause bloat.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                             32
               0n
                1c1e
Transaction wraparound avoidance
                       Problem: anti-wraparound autovacuum is
                       triggered too frequently
                       Watch age(datfrozenxid)
                       Solution:
                       Increase autovacuum_freeze_max_age
                       (default is 200 million; we increased it to one
                       billion)
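
One way to "watch age(datfrozenxid)" is to poll pg_database and compare each age with the configured threshold. The sketch below keeps the query as a string and does the comparison in plain Python. The database names and ages are hypothetical, and running the query is left to whatever client you already use.

```python
from typing import Dict, List

AGE_QUERY = "SELECT datname, age(datfrozenxid) FROM pg_database;"

DEFAULT_FREEZE_MAX_AGE = 200_000_000   # Postgres default
RAISED_FREEZE_MAX_AGE = 1_000_000_000  # value used in the talk


def fraction_of_freeze_max_age(xid_age: int, freeze_max_age: int) -> float:
    """How close a database is to triggering anti-wraparound autovacuum
    (1.0 means the threshold has been reached)."""
    return xid_age / freeze_max_age


def databases_due(ages: Dict[str, int], freeze_max_age: int) -> List[str]:
    """Names of databases whose age(datfrozenxid) has reached the threshold."""
    return sorted(db for db, age in ages.items() if age >= freeze_max_age)


# Hypothetical ages, as rows returned by AGE_QUERY might look.
ages = {"app": 450_000_000, "template1": 120_000_000}
print(databases_due(ages, DEFAULT_FREEZE_MAX_AGE))
print(databases_due(ages, RAISED_FREEZE_MAX_AGE))
```

Raising the threshold means fewer forced vacuums, but it also leaves less headroom before the hard limit of roughly two billion transactions, so the monitoring becomes more important, not less.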
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                            33
               0n
                1c1e
Upgrades
Minor upgrades

                       Problem: Restarting Postgres causes bad
                       application performance.
                       • Requires a stop/start of the database
                         • Unexpected CHECKPOINT
                         • Cold cache
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                          35
               0n
                1c1e
Minor upgrades

                       Solutions:
                         • Plan for a CHECKPOINT before
                            shutdown
                         • Warm the cache (Queries that
                            exercise indexes, maybe table scans)
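
The order of operations is what matters here, so it can help to write the plan down as data before the maintenance window. A minimal sketch follows. The data directory path and the warm-up query are hypothetical, and the warm-up queries are assumed not to contain double quotes.

```python
from typing import List


def restart_plan(data_dir: str, warmup_queries: List[str]) -> List[str]:
    """Return the shell commands for a planned restart, in order:
    explicit CHECKPOINT, fast shutdown, start, then cache-warming queries."""
    plan = [
        'psql -c "CHECKPOINT;"',               # flush dirty buffers while still serving
        f"pg_ctl -D {data_dir} stop -m fast",  # shutdown checkpoint has little left to write
        # (install the new minor-version binaries at this point)
        f"pg_ctl -D {data_dir} -w start",      # -w waits until the server is up
    ]
    plan += [f'psql -c "{q}"' for q in warmup_queries]
    return plan


# Hypothetical path and warm-up query.
plan = restart_plan("/srv/pgdata", ["SELECT count(*) FROM big_table;"])
for cmd in plan:
    print(cmd)
```

Issuing the CHECKPOINT by hand just before the stop keeps the shutdown itself short, which is the part users notice.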
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                          36
               0n
                1c1e
Major Version upgrades

                       Problem: Major upgrades are a PITA.
                         • <8.3 - no pg_upgrade :(
                         • Time your restores.
                         • Document your SLAs.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         37
               0n
                1c1e
Major Version upgrades

                       Solutions: :(
                         • >=8.3 - pg_upgrade
                         • Time your restores.
                         • Document your SLAs.
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                       38
               0n
                1c1e
Major Version upgrades

                       Solutions: :(
                         • Write tools to migrate data
                         • Shard
                         • Trigger-based replication
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re




                                         39
               0n
                1c1e
The Problems

                  1. System resource exhaustion
                  2. Everything is slow: Huge catalogs, Backups
                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                  4. Upgrades: Minor, Major
The Solutions

                  1. System resource exhaustion
                     Choose a better filesystem, Pooling
                  2. Everything is slow: Huge catalogs, Backups
                     Don’t do that, Monitor & Binary
                     backups
SP
 ogm
   Ceon
      Cfo
        .En
          Uef2
             re0n
                1c1e
The Solutions

                  3. Handling VACUUM problems: Bloat,
                     Transaction wraparound
                     Developer education, Monitoring,
                     Cleanup, *_freeze_max_age
                  4. Upgrades: Minor, Major
                     Plan, Plan, Plan
                       (CHECKPOINT, warm cache, pg_upgrade)
Thanks!
                           Managing Terabytes
                          Problems and solutions with
                       operating large Postgres installations
                                Selena Deckelmann
                                   Prime Radiant
                                  @selenamarie
