Grabbing the PostgreSQL Elephant by the Trunk
Harold Giménez
awesomeful.net
@hgimenez
http://www.flickr.com/photos/leeh/180311941/sizes/o/
Elephants are cool
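Installation
• Build from a git clone:
  git clone git://git.postgresql.org/git/postgresql.git && cd postgresql
  # (mirrored at git://github.com/postgres/postgres.git)
  ./configure && make && sudo make install
• Useful configuration options:
  ./configure --prefix=/opt/postgresql-9.0-dev --with-pgport=8200
• port/fink/yum/apt-get
• EnterpriseDB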
Post Installation
• Init the database cluster
  sudo -u postgres pg_ctl -D /usr/local/pgsql/data init
• Start the server
  sudo -u postgres pg_ctl -D /usr/local/pgsql/data start
  (there are SysV init scripts in contrib/start-scripts)
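Post Installation
• Change the postgres password:
  sudo -u postgres psql template1
  ALTER USER postgres WITH PASSWORD 'new_password';
• Create a role
  sudo -u postgres createuser --no-superuser --createdb --no-createrole --login --pwprompt --encrypted -h 127.0.0.1 -p 5432 hgimenez
• pg_hba.conf (host-based authentication)
  # TYPE  DATABASE  USER  CIDR-ADDRESS   METHOD
  local   all       all                  ident
  host    all       all   10.2.0.0/16    md5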
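PostgreSQL reads pg_hba.conf from top to bottom. It uses the first record whose connection type, database, user and client address all match. If no record matches, the connection is refused. The Python sketch below models only that first-match rule, applied to the two records on the slide. It is a toy and not PostgreSQL's parser. It ignores hostssl/hostnossl, keywords such as samerole, `+role` group membership, and hostnames. `HbaRecord` and `auth_method` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class HbaRecord:
    conn_type: str          # "local" (Unix socket) or "host" (TCP/IP)
    database: str           # "all" or a single database name
    user: str               # "all" or a single role name
    cidr: Optional[str]     # client network; None for "local" records
    method: str             # e.g. "ident", "md5"


# The two records from the slide, in file order.
RECORDS = [
    HbaRecord("local", "all", "all", None, "ident"),
    HbaRecord("host", "all", "all", "10.2.0.0/16", "md5"),
]


def auth_method(records, conn_type, database, user, client_ip=None):
    """Return the method of the first matching record, or None if nothing
    matches (the connection would be rejected)."""
    for rec in records:
        if rec.conn_type != conn_type:
            continue
        if rec.database not in ("all", database):
            continue
        if rec.user not in ("all", user):
            continue
        if rec.cidr is not None:
            if client_ip is None:
                continue
            if ipaddress.ip_address(client_ip) not in ipaddress.ip_network(rec.cidr):
                continue
        return rec.method
    return None


if __name__ == "__main__":
    print(auth_method(RECORDS, "local", "postgres", "postgres"))               # ident
    print(auth_method(RECORDS, "host", "db_name", "hgimenez", "10.2.3.4"))     # md5
    print(auth_method(RECORDS, "host", "db_name", "hgimenez", "192.168.1.5"))  # None
```

Record order matters. A broad record placed early shadows every more specific record after it.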
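Post Installation
• Create a “virgin” database
  From the psql prompt:
  create database my_awesome_app_development owner=hgimenez template=template0 encoding='utf8';
  From your shell:
  createdb -U postgres -O db_user -T template0 -E 'utf8' db_name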
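Get acquainted with psql
  psql -d db_name
  \?                      help on backslash commands
  \d                      list tables, views and sequences
  \d users                describe the users table
  \l                      list databases
  \c                      connect to a database (\c db_name)
  \set ECHO_HIDDEN true   show the SQL behind backslash commands
  \timing                 toggle query timing
  \x                      toggle expanded output
  \q                      quit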
Ruby adapters
  gem install postgres-pr
  gem install postgres
  gem install activerecord-jdbcpostgresql-adapter
  gem install pg
  http://www.flickr.com/photos/jurvetson/60685364/sizes/s/
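Rails plugins and Ruby tools
• Constraints, views and other migration helpers:
  http://github.com/alex3t/rails_on_pg
  http://github.com/matthuhiggins/foreigner
  http://agilewebdevelopment.com/plugins/redhillonrails_core
• Full text search (tsearch2):
  http://github.com/pka/acts_as_tsearch
  http://github.com/tenderlove/texticle
• Backups:
  http://github.com/meskyanichi/backup
• The database toolkit:
  http://github.com/jeremyevans/sequel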
database.yml
  development:
    adapter: postgresql
    encoding: unicode
    database: bostonrb_development
    min_messages: warning
<obligatory-MySQL-rant>
Features
Replication in alpha testing
Presentation by Relevant Logic - 2 years ago
http://www.slideshare.net/gisborne/postgres-presentation-presentation
http://twitter.com/depesz_com/status/8645567175
http://twitter.com/jordibunster/status/9204222189
PostgreSQL license
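http://www.postgresql.org/about/licence
“Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.”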
</obligatory-MySQL-rant>
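Features at a glance
• Views
• Query Plan Optimizer
• Triggers
• Procedural Languages
• Rules
• Full text search
• Constraints
• PITR, Warm Standby
• ACID, transactions, MVCC
• Replication
• PostGIS
• GUCS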
• Atomicity
• Consistency
• Isolation (MVCC)
• Durability
Multi Version Concurrency Control
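Here is a toy single-row table in Python that makes the idea concrete. Each row version records the transaction that created it (`xmin`) and the one that replaced it (`xmax`). Those names are borrowed from PostgreSQL's system columns. A snapshot is simply the set of transactions that had committed when a transaction began. Everything else is a simplification assumed for illustration: one row, no aborts, no write-write conflict handling. `ToyMVCCRow` is a hypothetical class, not how PostgreSQL is implemented internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that replaced it, if any


class ToyMVCCRow:
    """A single row kept as a list of versions (toy model, not PostgreSQL internals)."""

    def __init__(self, initial_value):
        self.committed = {0}                          # xid 0 created the first version
        self.versions = [RowVersion(initial_value, xmin=0)]
        self.next_xid = 1

    def begin(self):
        """Start a transaction: return its id and a snapshot of committed ids."""
        xid = self.next_xid
        self.next_xid += 1
        return xid, frozenset(self.committed)

    def _visible(self, v, xid, snapshot):
        created = v.xmin == xid or v.xmin in snapshot
        replaced = v.xmax is not None and (v.xmax == xid or v.xmax in snapshot)
        return created and not replaced

    def read(self, xid, snapshot):
        # Readers take no locks: they pick the version their snapshot can see.
        for v in self.versions:
            if self._visible(v, xid, snapshot):
                return v.value
        return None

    def update(self, xid, snapshot, new_value):
        # Never overwrite in place: mark the visible version as replaced and
        # append a new one, so readers with older snapshots are unaffected.
        for v in self.versions:
            if self._visible(v, xid, snapshot):
                v.xmax = xid
        self.versions.append(RowVersion(new_value, xmin=xid))

    def commit(self, xid):
        self.committed.add(xid)

    def vacuum(self, active_snapshots=()):
        # A version is dead once its replacing transaction has committed and
        # every still-active snapshot already sees that replacement.
        def dead(v):
            return (v.xmax is not None and v.xmax in self.committed
                    and all(v.xmax in s for s in active_snapshots))
        self.versions = [v for v in self.versions if not dead(v)]


if __name__ == "__main__":
    row = ToyMVCCRow("v1")
    reader, reader_snap = row.begin()
    writer, writer_snap = row.begin()
    row.update(writer, writer_snap, "v2")
    print(row.read(reader, reader_snap))   # v1: the reader is never blocked
    print(row.read(writer, writer_snap))   # v2: the writer sees its own change
    row.commit(writer)
    print(row.read(reader, reader_snap))   # v1: same snapshot, same answer
    row.vacuum(active_snapshots=[reader_snap])
    print(len(row.versions))               # 2: the reader may still need v1
    row.vacuum()
    print(len(row.versions))               # 1: nobody can see v1 any more
```

The last two steps show why old versions pile up until VACUUM runs. They also show why a long-running transaction keeps otherwise dead versions alive.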
Grand Unified Configuration Settings
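GUCS
What:
• The set of PostgreSQL’s configuration parameters
• Global or session
• Performance, security, backend admin
How:
• postgresql.conf
• SET (and SHOW)
• select * from pg_settings
• command line switches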
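The split between global and session settings can be pictured with a tiny model. Server-wide values come from postgresql.conf. `SET` in one session overrides a value for that session only, `SHOW` reports the effective value, and `RESET` undoes the override. In SQL that is `SET work_mem = '64MB'; SHOW work_mem; RESET work_mem;`. The Python class below is a hypothetical illustration of that scoping, not how the server stores settings. It ignores other sources such as per-database and per-role defaults and command-line switches.

```python
class ToySession:
    """Toy model of GUC scoping: shared server values plus per-session SETs."""

    def __init__(self, server_settings):
        self.server_settings = server_settings  # shared by all sessions (postgresql.conf)
        self.session_settings = {}              # only this session's SET values

    def set(self, name, value):                 # like: SET name = value;
        self.session_settings[name] = value

    def show(self, name):                       # like: SHOW name;
        return self.session_settings.get(name, self.server_settings[name])

    def reset(self, name):                      # like: RESET name;
        self.session_settings.pop(name, None)


if __name__ == "__main__":
    server = {"work_mem": "32MB"}
    reporting, web = ToySession(server), ToySession(server)
    reporting.set("work_mem", "64MB")   # one big report query needs more sort memory
    print(reporting.show("work_mem"))   # 64MB
    print(web.show("work_mem"))         # 32MB: other sessions are unaffected
    reporting.reset("work_mem")
    print(reporting.show("work_mem"))   # 32MB
```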
shared_buffers
Memory PostgreSQL dedicates to its shared buffer cache
  total RAM / 4
work_mem
Non-shared RAM used by each sort, hash, etc. in a query
  32-64MB
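work_mem is granted per sort or hash step, not per connection, so the worst case multiplies quickly. Here is a back-of-the-envelope check in Python. The helper name is hypothetical. The connection count and the number of concurrent sort/hash steps per query are assumptions you would replace with your own numbers.

```python
def worst_case_work_mem_mb(max_connections, work_mem_mb, sort_hash_steps_per_query):
    """Upper bound on sort/hash memory if every connection runs, at the same
    moment, a query whose sort/hash steps each use the full work_mem."""
    return max_connections * sort_hash_steps_per_query * work_mem_mb


if __name__ == "__main__":
    # Assumed workload: 100 connections, queries with up to 2 sorts/hashes each.
    total = worst_case_work_mem_mb(100, 32, 2)
    print(f"{total} MB")  # 6400 MB, on top of shared_buffers
```

Consider a machine with 8GB of RAM. Those 6400MB of potential sort memory, plus a 2048MB shared_buffers, already exceed physical memory. One option is a modest global value, raised only in the sessions that need more.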
maintenance_work_mem
Memory used for vacuum, bulk data loads, etc.
  256MB-1GB
wal_buffers
Memory available to the Write-Ahead Log
  8MB for SMPs
checkpoint_segments
How many 16MB WAL segments to fill before a checkpoint is forced
  16 to 64
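The manual's WAL configuration chapter applies to releases that use checkpoint_segments. It says pg_xlog normally holds no more than (2 + checkpoint_completion_target) × checkpoint_segments + 1 segment files of 16MB each. The Python estimate below computes the disk cost at both ends of the slide's range. It assumes the default checkpoint_completion_target of 0.5 and no wal_keep_segments. The helper names are hypothetical.

```python
import math


def max_wal_files(checkpoint_segments, checkpoint_completion_target=0.5):
    # (2 + checkpoint_completion_target) * checkpoint_segments + 1, rounded up
    return math.ceil((2 + checkpoint_completion_target) * checkpoint_segments + 1)


def max_wal_disk_mb(checkpoint_segments, checkpoint_completion_target=0.5, segment_mb=16):
    return max_wal_files(checkpoint_segments, checkpoint_completion_target) * segment_mb


if __name__ == "__main__":
    for segments in (16, 64):
        print(f"checkpoint_segments={segments}: {max_wal_files(segments)} files, "
              f"{max_wal_disk_mb(segments)} MB")
    # checkpoint_segments=16: 41 files, 656 MB
    # checkpoint_segments=64: 161 files, 2576 MB
```

Raising checkpoint_segments spreads checkpoints out, but the WAL disk needs room for a few GB at the top of the range.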
synchronous_commit
If on, each commit waits until its WAL is flushed to disk, so in the event of a crash no committed data is lost, guaranteed.
  off
effective_io_concurrency
Only if the OS supports async I/O
  number of disks/channels
effective_cache_size
Estimated RAM available for caching, in shared_buffers plus the filesystem cache
  total RAM * 2/3
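The rules of thumb on the last few slides can be collected into a small Python helper. Given the RAM size, it prints a starting postgresql.conf fragment. Several things are assumptions for illustration: the function names, the RAM-in-MB input, the disk-count parameter, and the fixed picks from each range (32MB work_mem, 256MB maintenance_work_mem, 16 checkpoint_segments). synchronous_commit is left out on purpose. Turning it off trades durability for speed, and that is a decision to make deliberately.

```python
def starting_settings(total_ram_mb, data_disks=1):
    """Starting points from the slides' rules of thumb; measure, then adjust."""
    return {
        "shared_buffers": f"{total_ram_mb // 4}MB",             # total RAM / 4
        "effective_cache_size": f"{total_ram_mb * 2 // 3}MB",  # total RAM * 2/3
        "work_mem": "32MB",                                    # 32-64MB
        "maintenance_work_mem": "256MB",                       # 256MB-1GB
        "wal_buffers": "8MB",                                  # for SMP machines
        "checkpoint_segments": "16",                           # 16 to 64
        "effective_io_concurrency": str(data_disks),           # only with async I/O
    }


def render(settings):
    """Format settings as postgresql.conf lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


if __name__ == "__main__":
    print(render(starting_settings(8192, data_disks=4)))
    # shared_buffers = 2048MB
    # effective_cache_size = 5461MB
    # ... (remaining lines follow the dict order above)
```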
APP PERFORMANCE
• Hardware: Storage, RAM, CPU, Network
• OS: Filesystem, Kernel
• PostgreSQL: Schema, Configuration
• Application: Queries, Transactions
YesSQL
Red → Green → Refactor
SCALING
• Table Partitioning
• Vertical partitioning of data
• Horizontal partitioning (sharding) - PL/Proxy
• PITR, Replication
• Manage number of connections with pgBouncer
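In PostgreSQL of this era, table partitioning is built from inheritance. There is one child table per range, each with a CHECK constraint. A trigger or rule on the parent routes inserts to the right child. With constraint_exclusion, the planner skips children whose CHECK rules them out, but only when the query constrains the partition key. The Python sketch below mimics the naming and pruning logic for hypothetical monthly partitions of an `events` table. The table name, naming scheme and helpers are assumptions, and nothing here talks to a database.

```python
from datetime import date


def partition_for(day, parent="events"):
    """Child table that would hold rows for `day` (hypothetical naming scheme)."""
    return f"{parent}_{day.year:04d}_{day.month:02d}"


def partitions_to_scan(start, end, parent="events"):
    """Children a query with `start <= day <= end` on the partition key must read.
    A query without such a condition would have to read every child."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(partition_for(date(year, month, 1), parent))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


if __name__ == "__main__":
    print(partition_for(date(2010, 3, 15)))   # events_2010_03
    print(partitions_to_scan(date(2009, 12, 20), date(2010, 2, 1)))
    # ['events_2009_12', 'events_2010_01', 'events_2010_02']
```

The same layout makes purging old data a cheap DROP TABLE of one child instead of a large DELETE followed by VACUUM.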
Replication
Tools
http://www.flickr.com/photos/leelefever/2101999317/sizes/l/
Tools
• explain analyze
• pg_stats
• iostat
• pg_stat_user_indexes
• vmstat 5 50, vmstatx
• pg_stat_user_tables
• top
• pg_statio_user_tables
• pgtop
• pg_dump & pg_restore
• dd, bonnie++
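pg_stat_user_indexes counts how many times each index has been scanned (`idx_scan`). That makes it a cheap way to spot indexes that add write overhead without helping reads. The sketch below filters rows shaped like the result of `SELECT relname, indexrelname, idx_scan FROM pg_stat_user_indexes;`. The sample rows are invented and `unused_indexes` is a hypothetical helper. Treat zero scans as a prompt to investigate, not to drop. Indexes backing primary keys or unique constraints do their job without being scanned. The counters also only cover the time since statistics were last reset.

```python
def unused_indexes(rows, min_scans=1):
    """Names of indexes scanned fewer than `min_scans` times, sorted."""
    return sorted(r["indexrelname"] for r in rows if r["idx_scan"] < min_scans)


if __name__ == "__main__":
    # Invented sample rows, shaped like pg_stat_user_indexes output.
    rows = [
        {"relname": "users", "indexrelname": "users_pkey", "idx_scan": 120000},
        {"relname": "users", "indexrelname": "index_users_on_email", "idx_scan": 5400},
        {"relname": "users", "indexrelname": "index_users_on_legacy_id", "idx_scan": 0},
    ]
    print(unused_indexes(rows))  # ['index_users_on_legacy_id']
```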
Learn More
• http://wiki.postgresql.org/
• http://www.postgresql.org/docs/current/static/
• http://pgexperts.com/documents.html
• http://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf
• #postgresql, mailing lists
?
Notes

  • Who the fuck is this clown?
  • 12 ft tall, 5+ tons (12,000 lbs)
    Their trunk has over 100,000 muscle units.
    They eat and walk most of the day - terrible digestive system. Other animals eat their poop.
    One of the big five.
    Intimidating. Aggressive. Nobody messes with them.
    They change their teeth six times in a lifetime. At age 60 they don't grow them again and die of starvation.
    Four toes.

  • An expected feature, it guarantees the reliability of DB transactions.

    A: Transactional operations either succeed completely or fail completely. In Postgres DDL operations are also transactional.
    C: Database goes from one consistent state to the next.
    I: Data that is generated in interim steps of a transaction can never be seen by any other readers/queries/viewers. This is what MVCC is all about.
    D: Guarantees that once the user/client is notified of success, the data is persisted and the transaction will not be undone.
  • MVCC - Multi Version Concurrency Control.

    An alternative to MVCC is read-locking. Every time a query reads from the DB, it locks the rows so that no other statement can change the data, and therefore it is reading “real and consistent” data. However, there can be many users reading from the same table, hundreds of thousands on bigger websites. What if there's a queue of hundreds of reads, and you need to update the table? The update statement must wait until all reads are done before it can run. Additional reads must wait until the update is completed before retrieving new data.

    With MVCC, reads don't lock rows at all: readers never block writers and writers never block readers, so the reads and the update all proceed immediately. (An update still locks the rows it changes, but only against other writers.)

    Postgres keeps multiple versions of a row (the MV in MVCC) every time the data changes; VACUUM cleans up the old versions.

  • http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
    http://www.postgresql.org/docs/current/static/runtime-config-client.html
    http://www.postgresql.org/docs/current/static/runtime-config-resource.html
    http://www.pgcon.org/2008/schedule/events/104.en.html
    http://vimeo.com/7109722
    http://wiki.postgresql.org/wiki/GUCS_Overhaul
    http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html
    http://www.pgexperts.com/documents.html
    Oracle has over 500 config settings.
  • PostgreSQL's dedicated RAM. 2nd level cache (1st level is the OS cache). A good starting point is 1/4 of available RAM. cache_miss statistics can tell you if you need more.

  • Memory limit for per-query operations like sorts and hashes; a complex query can run several of them at once, each allowed this much. If RAM swapping is high, work_mem is too high.
    In general, OS swapping == too much work_mem, while sorts spilling to temporary files on disk == not enough work_mem.
        work_mem = 32MB


  • vacuum and analyze use it. Query expiration. Could be in the 256MB-1GB range for larger DBs.

  • Default is appropriate for one CPU and a small DB. Increase it to maybe 8MB for SMPs. Wimpy default because of Linux shared memory limits.


  • Turn it off, but know the risk: you may lose about .5 seconds' worth of recently committed transactions (the database itself stays consistent). If it's on, the WAL gets written immediately.

  • number of disks or channels. Only if your OS supports async IO. Linux and FreeBSD do.

  • This setting is only a hint to the planner, used to make better cost estimates; it does not allocate any memory.
  • http://vimeo.com/9889075
    So, I've convinced you and you will port the app to Postgres. Here's what to look out for to avoid any pitfalls.

    If any one component fails, the whole system suffers.

    Hardware: I/O is a very common bottleneck. Heavy write loads make the transaction log a bottleneck that can drag the whole process down. If the database is 3x larger than RAM, queries become I/O bound.
    Use SAS, not SATA (SATA is half duplex). RAID 1+0 over RAID 5, especially for writes. The more spindles, the better: many small drives beat fewer big drives. Move the transaction log to a separate disk.

    BEWARE OF CLOUD and IO!

    Low RAM is also common, and running out of memory makes the server hit the disks. Bad. Ideally the entire database fits in RAM, using shared_buffers. If that's not possible, cache as much of the database as possible; in that case, think about how big the working dataset that needs caching is. Think about sorting, aggregates and other operations that may benefit from in-RAM caching.

    CPU: Pick more CPUs over more cores on same processor. L2 cache, speed, and 64 bit.

    OS: Use something that supports direct I/O: Linux, Solaris, FreeBSD.
    * Tablespaces for large tables
    * Tuning Linux:
    * XFS & JFS for database files. Otherwise ext3 if on Red Hat (no XFS support). UFS if on Solaris.
    * Reduce logging (journaling) with the data=writeback, noatime and nodiratime mount options.
    * Kernels older than 2.6.9 must be upgraded.
    * Use the deadline I/O scheduler for write speed.

    Schema:
    * 1NF. Postgres is optimized for normalized lookups. Wait for an actual issue before denormalizing.
    * You may benefit from separating out tables (1-to-1s) based on read/write patterns.
    * Indexes: FKs, WHERE clauses, aggregations. Use expression indexes and partial indexes. Don't over-index. Don't index small tables. Look at pg_stat_user_indexes and pg_stat_user_tables.
    * Partitioning: historical data, very big tables, any big deletes. The app must know how to deal with it, by including the partition key in the WHERE criteria.
    * Query design: do more in each query.

    * Connection pooling: pgPool, pgBouncer.
    * pgFouine: analyze queries. In 8.4 there's pg_stat_statements.
    * EXPLAIN ANALYZE

    Have a mechanism for measuring the changes you make. How much CPU, RAM and swapping is happening before and after tuning? Use top, vmstat, etc.





  • * pgPool is more complex. Replication, distributed query management. pgBouncer is more lightweight.
    * Table Partitioning allows you to manage dead data without overhead, keeping your database healthy. Performance of course.





