• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
Pro PostgreSQL
 

Pro PostgreSQL

on

  • 6,580 views

This is the version of my "Pro PostgreSQL" talk given at OSCon 2007

This is the version of my "Pro PostgreSQL" talk given at OSCon 2007

Statistics

Views

Total Views
6,580
Views on SlideShare
6,567
Embed Views
13

Actions

Likes
6
Downloads
0
Comments
0

1 Embed 13

http://www.slideshare.net 13

Accessibility

Categories

Upload Details

Uploaded via as OpenOffice

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Pro PostgreSQL Pro PostgreSQL Presentation Transcript

    • Pro PostgreSQL Robert Treat omniti.com brighterlamp.org
    • Who Am I? (Why Listen To Me) A-0
      • PostgreSQL User Since 6.5.x
      • DBA of High Traffic / Large PostgreSQL Instances
      • Long Time Contributor to PostgreSQL Project
      • Contribute / Maintain Several Open Source Projects
      • Co-Author Beginning PHP & PostgreSQL 8 (Apress)
    • Outline A-1
      • Installation
      • Upgrading
      • Configuration
      • Routine maintenance
      • Replication / Availability
      • Advanced SQL
      • Query tuning
      • Indexing
      • Tablespaces
      • Partitioning
      http://pgfoundry.org/projects/dbsamples/
    • C-0 Get Off To A Good Start
    • C-1 Get Off To A Good Start Use package management
      • Consistent
      • Standardized
      • Simple
      • Different across systems
      • Upgrades are an issue
      • Trust your packager?
      C-2 Get Off To A Good Start Use package management
      • Different across systems
      • Upgrades are an issue
      • Trust your packager?
      C-3 Get Off To A Good Start Use package management Don't Be Afraid To Roll Your Own
    • C-4 Get Off To A Good Start
      • $PGDATA/pg_log
      • /var/log/pgsql
      • when in doubt... (postgresql.conf)
      Configure Logging Logging is often overlooked, but is the first step toward troubleshooting!
    • C-5 Get Off To A Good Start
      • most systems have different defaults
      • firewalls/ selinux (FATAL)
      • rtfm (pg_hba.conf, grant, revoke)
      Configure Authentication
    • C-6 Get Off To A Good Start
      • TRUST
      • md5
      • IDENT
      Authentication Methods
    • C-7 Get Off To A Good Start
      • trust these more than your own code
      • package dependent
      • use different schemas
      • tsearch2, pgcrypto
      • pgstatstuple, pg_buffercache, pg_freespacemap
      /contrib
    • C-8 Get Off To A Good Start
      • package dependent
      • some are non-core (plruby, plr, plphp)
      • varying functionality
      • varying levels of trust
      • don't be afraid, test!
      procedural languages
    • D-0 Let's Talk About Upgrades
    • D-1 Let's Talk About Upgrades Versioning First Digit ( 7 .4.16 -> 8 .2.0) Second Digit (8.2.4 -> 8.3.0) Third Digit (8.3.0 -> 8.3.1)
    • D-2 Let's Talk About Upgrades Versioning First Digit (7.4.16 -> 8.2.0) Second Digit (8. 2 .4 -> 8. 3 .0) Third Digit (8.3.0 -> 8.3.1)
    • D-3 Let's Talk About Upgrades Versioning First Digit (7.4.16 -> 8.2.0) Second Digit (8.2.4 -> 8.3.0) Third Digit (8.3. 0 -> 8.3. 1 )
    • D-4 Let's Talk About Upgrades Achtung!!
      • Make Backups!
      • Read the Release Notes!
    • D-5 Let's Talk About Upgrades pg_dump/pg_restore
      • simple
      • -Fc is your friend
      • dump with new version of pg_dump
      • pitfalls (time, hdd)
    • D-6 Let's Talk About Upgrades the slony method
      • not simple
      • create slave on new version
      • switchover (switch back?)
      • pitfalls (initial synch, compatibility)
    • D-7 Let's Talk About Upgrades pg_migrator
      • in place upgrades
      • rewrites system catalog info
      • no way to go back (fs snapshots)
      • still new, getting better
      • 8.1 -> 8.2 only (for now)
    • D-8 Let's Talk About Upgrades upgrading older db
      • <= 7.2 is no longer supported (upgrade now!)
      • pg_dump 8.2 has issues with <= 7.2
          • you can upgrade to 7.3 first
          • use adddepends on 7.3 install
      • slony requires 7.3 (or 7.4) or newer
      • pg_migrator (lol)
    • E-0 Figure Your Configure
    • E-1 Figure Your Configure the basics : performance
      • effective_cache_size
      • shared_buffers
      • default_statistics_target
      • sort_mem
      • checkpoint_segments
      • checkpoint_timeout
    • E-2 Figure Your Configure the basics : logging
      • stderr/pg_log vs. syslog/eventlog
      • log_min_error_statement (error!)
      • log_min_duration_statement
      • log_line_prefix (%d, %p, %t)
    • E-5 Figure Your Configure other stuff worth looking at
      • maintenance_work_mem
      • max_prepared_transactions
      • update_process_title
      • max_fsm_pages
    • P-0 Routine Maintenance
    • P-1 Routine Maintenance a word about vacuum
      • reclaim usable space
      • update table stats
      • avoid xid wraparound
    • P-2 Routine Maintenance autovacuum : just do it!
      • autovacuum
      • stats_start_collector
      • stats_row_level
      • pg_autovacuum ?
    • P-3 Routine Maintenance other stuff worth looking at
      • reindexing
      • logfiles
      • backups
      • failover
    • G-0 Availability
    • G-1 Availability what do we mean by availability?
      • not backups (exactly)
      • not replication (necessarily)
      • not clustering (even less so)
    • G-2 Availability what do we mean by availability? if (kablooy) then (ok)
      • not backups (exactly)
      • not replication (necessarily)
      • not clustering (even less so)
    • G-3 Availability pg_dump
      • traditionally used for backups
      • send dump to another server
      • constantly run restore process
      • large time, i/o constraints
    • G-4 Availability pitr
      • create second, standby server
      • ship wal logs to new server
      • less time/io than pg_dump
      • 8.1 -> cold standby
      • 8.2 -> warm standby
      • 8.4 -> hot standby ?
    • G-5 Availability slony
      • asynchronous, master-slave replication
      • controlled switchover, failover
      • low i/o, time constraints
      • other benefits (upgrades, scaling)
    • G-5 Availability bucardo
      • asynchronous, multi-master replication
      • conflict resolution
      • low i/o, time constraints
      • other benefits (upgrades, scaling)
    • G-6 Availability shared disk
      • one copy of PGDATA on shared storage
      • standby takes over akin to db crash
      • shared disk is point of failure (raid)
      • must ensure only one postmaster running
    • G-7 Availability filesystem replication
      • drbd, zfs
      • filesystem mirrored between servers
      • synchronized, ordered writes
      • single disk system?
    • G-8 Availability pgpool
      • dual-master, statement based
      • little caveats (random(),now(),sequences)
      • bigger caveats (security, password, pg_hba)
      • pgpool becomes failure point
    • I-0 Beyond Simple SQL
    • I-1 Beyond Simple SQL generate series pagila=# select * from generate_series(1,3); generate_series ----------------- 1 2 3 (3 rows)
      • behold the power of loops!
    • I-2 Beyond Simple SQL generate series
      • extrapolate many uses
      pagila=# select '2007-05-22 09:00:00'::timestamp - x * '1 hour'::interval as countdown from generate_series(1,10) x; countdown --------------------- 2007-05-22 08:00:00 2007-05-22 07:00:00 2007-05-22 06:00:00 2007-05-22 05:00:00 2007-05-22 04:00:00 2007-05-22 03:00:00 2007-05-22 02:00:00 2007-05-22 01:00:00 2007-05-22 00:00:00 2007-05-21 23:00:00 (10 rows)
    • I-3 Beyond Simple SQL rownum()
    • I-4 Beyond Simple SQL row numbering
    • I-5 Beyond Simple SQL row numbering pagila=# select (select count(*)+1 from customer where customer_id < x.customer_id) as rownum, customer_id, first_name, last_name from customer x limit 5; rownum | customer_id | first_name | last_name --------+-------------+------------+----------- 1 | 1 | MARY | SMITH 2 | 3 | LINDA | WILLIAMS 3 | 4 | BARBARA | JONES 4 | 5 | ELIZABETH | BROWN 5 | 6 | JENNIFER | DAVIS (5 rows)
    • I-6 Beyond Simple SQL row numbering pagila=# select (select count(*)+1 from customer where customer_id < x.customer_id) as rownum, customer_id, first_name, last_name from customer x limit 5; rownum | customer_id | first_name | last_name --------+-------------+------------+----------- 1 | 1 | MARY | SMITH 2 | 3 | LINDA | WILLIAMS 3 | 4 | BARBARA | JONES 4 | 5 | ELIZABETH | BROWN 5 | 6 | JENNIFER | DAVIS (5 rows)
    • I-7 Beyond Simple SQL row numbering pagila=# select (select count(*)+1 from customer where customer_id < x.customer_id) as rownum, customer_id, first_name, last_name from customer x limit 5; rownum | customer_id | first_name | last_name --------+-------------+------------+----------- 1 | 1 | MARY | SMITH 2 | 3 | LINDA | WILLIAMS 3 | 4 | BARBARA | JONES 4 | 5 | ELIZABETH | BROWN 5 | 6 | JENNIFER | DAVIS (5 rows)
      • fill in missing gaps
    • I-8 Beyond Simple SQL row numbering
      • watch out for order by
      pagila=# select (select count(*)+1 from customer where last_name < x.last_name) as rownum, first_name, last_name from customer x order by last_name limit 5; rownum | first_name | last_name --------+------------+----------- 1 | RAFAEL | ABNEY 2 | NATHANIEL | ADAM 3 | KATHLEEN | ADAMS 4 | DIANA | ALEXANDER 5 | GORDON | ALLARD (5 rows)
    • I-9 Beyond Simple SQL rollup() pagila=# select customer_id, amount from payment limit 10; customer_id | amount -------------+-------- 267 | 7.98 267 | 0.00 269 | 3.98 269 | 0.00 274 | 0.99 279 | 4.99 282 | 0.99 284 | 5.98 284 | 0.00 287 | 0.99 (10 rows)
    • I-10 Beyond Simple SQL rollup() pagila=# select customer_id, amount, total from payment JOIN (select customer_id, sum(amount) as total from payment group by customer_id) x using (customer_id) limit 10; customer_id | amount | total -------------+--------+-------- 267 | 7.98 | 159.64 267 | 0.00 | 159.64 269 | 3.98 | 129.70 269 | 0.00 | 129.70 274 | 0.99 | 152.65 279 | 4.99 | 135.69 282 | 0.99 | 103.73 284 | 5.98 | 126.72 284 | 0.00 | 126.72 287 | 0.99 | 115.71 (10 rows)
      • Include totals with given row
    • I-11 Beyond Simple SQL rank() SELECT * FROM (select c1.first_name, c1.last_name, c1.store_id, p1.total, (select 1 + count(*) from customer c2 join (select customer_id, sum(amount) as total from only payment group by customer_id) p2 using (customer_id) where c2.store_id = c1.store_id and p2.total > p1.total) as rank from customer c1 join (select customer_id, sum(amount) as total from only payment group by customer_id) p1 using (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank; first_name | last_name | store_id | total | rank ------------+-----------+----------+--------+------ ELEANOR | HUNT | 1 | 216.54 | 1 CLARA | SHAW | 1 | 195.58 | 2 TOMMY | COLLAZO | 1 | 186.62 | 3 KARL | SEAL | 2 | 221.55 | 1 MARION | SNYDER | 2 | 194.61 | 2 RHONDA | KENNEDY | 2 | 194.61 | 2 (6 rows)
    • I-12 Beyond Simple SQL SELECT * FROM (SELECT c1.first_name, c1.last_name, c1.store_id, p1.total, (SELECT 1 + count(*) FROM customer c2 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p2 USING (customer_id) WHERE c2.store_id = c1.store_id AND p2.total > p1.total ) AS rank FROM customer c1 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p1 USING (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
    • I-13 Beyond Simple SQL SELECT * FROM (SELECT c1.first_name, c1.last_name, c1.store_id, p1.total, (SELECT 1 + count(*) FROM customer c2 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p2 USING (customer_id) WHERE c2.store_id = c1.store_id AND p2.total > p1.total ) AS rank FROM customer c1 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p1 USING (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
    • I-14 Beyond Simple SQL SELECT * FROM (SELECT c1.first_name, c1.last_name, c1.store_id, p1.total, (SELECT 1 + count(*) FROM customer c2 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p2 USING (customer_id) WHERE c2.store_id = c1.store_id AND p2.total > p1.total ) AS rank FROM customer c1 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p1 USING (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
    • I-15 Beyond Simple SQL SELECT * FROM (SELECT c1.first_name, c1.last_name, c1.store_id, p1.total, (SELECT 1 + count(*) FROM customer c2 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p2 USING (customer_id) WHERE c2.store_id = c1.store_id AND p2.total > p1.total ) AS rank FROM customer c1 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p1 USING (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
    • I-16 Beyond Simple SQL SELECT * FROM (SELECT c1.first_name, c1.last_name, c1.store_id, p1.total, (SELECT 1 + count(*) FROM customer c2 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p2 USING (customer_id) WHERE c2.store_id = c1.store_id AND p2.total > p1.total ) AS rank FROM customer c1 JOIN (SELECT customer_id, sum(amount) AS total FROM payment GROUP BY customer_id) p1 USING (customer_id) ) x WHERE x.rank <= 3 ORDER BY x.store_id, x.rank;
    • J-0 Query Your Queries
    • J-1 Query Your Queries finding slow queries: log_min_duration_statement
      • -1, 0 , n
      • superuser only
      • alter user
      LOG: duration: 5005.273 ms statement: select pg_sleep(5);
    • J-2 Query Your Queries finding slow queries: pgfouine / pqa
      • log analyzers
      • command line, generate reports
      • i/o load
      http://pgfouine.projects.postgresql.org/reports.html http://pqa.projects.postgresql.org/example.html
    • J-3 Query Your Queries finding slow queries: pg_stat_all_tables pagila=# d pg_stat_all_tables View &quot;pg_catalog.pg_stat_all_tables&quot; Column | Type | ----------------+-------------------------- relid | oid | schemaname | name | relname | name | seq_scan | bigint | seq_tup_read | bigint | idx_scan | bigint | idx_tup_fetch | bigint | n_tup_ins | bigint | n_tup_upd | bigint | n_tup_del | bigint |
    • J-4 Query Your Queries finding slow queries: pg_stat_all_tables pagila=# d pg_stat_all_tables View &quot;pg_catalog.pg_stat_all_tables&quot; Column | Type | ----------------+-------------------------- relid | oid | schemaname | name | relname | name | seq_scan | bigint | seq_tup_read | bigint | idx_scan | bigint | idx_tup_fetch | bigint | n_tup_ins | bigint | n_tup_upd | bigint | n_tup_del | bigint |
    • J-5 Query Your Queries finding slow queries: pg_stat_all_tables pagila=# d pg_stat_all_tables View &quot;pg_catalog.pg_stat_all_tables&quot; Column | Type | ----------------+-------------------------- relid | oid | schemaname | name | relname | name | seq_scan | bigint | seq_tup_read | bigint | idx_scan | bigint | idx_tup_fetch | bigint | n_tup_ins | bigint | n_tup_upd | bigint | n_tup_del | bigint |
    • J-6 Query Your Queries finding slow queries: pg_stat_all_tables pagila=# d pg_stat_all_tables View &quot;pg_catalog.pg_stat_all_tables&quot; Column | Type | ----------------+-------------------------- relid | oid | schemaname | name | relname | name | seq_scan | bigint | seq_tup_read | bigint | idx_scan | bigint | idx_tup_fetch | bigint | n_tup_ins | bigint | n_tup_upd | bigint | n_tup_del | bigint |
    • J-7 Query Your Queries finding slow queries: pg_stat_all_indexes pagila=# d pg_stat_all_indexes View &quot;pg_catalog.pg_stat_all_indexes&quot; Column | Type | ---------------+--------+ relid | oid | indexrelid | oid | schemaname | name | relname | name | indexrelname | name | idx_scan | bigint | idx_tup_read | bigint | idx_tup_fetch | bigint |
    • J-8 Query Your Queries finding slow queries: pg_stat_all_indexes pagila=# d pg_stat_all_indexes View &quot;pg_catalog.pg_stat_all_indexes&quot; Column | Type | ---------------+--------+ relid | oid | indexrelid | oid | schemaname | name | relname | name | indexrelname | name | idx_scan | bigint | idx_tup_read | bigint | idx_tup_fetch | bigint |
    • J-9 Query Your Queries finding slow queries: pg_statio_all_tables pagila=# d pg_statio_all_tables View &quot;pg_catalog.pg_statio_all_tables&quot; Column | Type | -----------------+--------+ relid | oid | schemaname | name | relname | name | heap_blks_read | bigint | heap_blks_hit | bigint | idx_blks_read | bigint | idx_blks_hit | bigint | toast_blks_read | bigint | toast_blks_hit | bigint | tidx_blks_read | bigint | tidx_blks_hit | bigint |
    • J-9 Query Your Queries finding slow queries: pg_statio_all_tables pagila=# d pg_statio_all_tables View &quot;pg_catalog.pg_statio_all_tables&quot; Column | Type | -----------------+--------+ relid | oid | schemaname | name | relname | name | heap_blks_read | bigint | heap_blks_hit | bigint | idx_blks_read | bigint | idx_blks_hit | bigint | toast_blks_read | bigint | toast_blks_hit | bigint | tidx_blks_read | bigint | tidx_blks_hit | bigint |
    • J-10 Query Your Queries fixing slow queries: explain analyze
      • universal tool
      • good for specific queries
      • “ explain” for large queries
      • could be it's own talk
    • J-11 Query Your Queries fixing slow queries: explain analyze
      • universal tool
      • good for specific queries
      • “ explain” for large queries
      • could be it's own talk
      http://www.postgresql.org/docs/techdocs.38
    • L-0 Indexing Options
    • L-1 Indexing Options indexing (basic)
      • use explain to find large sequential reads
      • use pg_stat_* tables to find numerous reads
      • btree – (gist/gin)
      • enable_indexscan, enable_bitmapscan
      • dual column vs. single column
    • L-2 Indexing Options indexing (partial) create index address_ba_part_idx on address (district) where district = 'Buenos Aires';
      • restrain index to rows that matter
      • can give significant speed improvements
      • where clause of index should match
      • where clause of query
    • L-3 Indexing Options indexing (partial) create index customer_active_part_idx on customer (customer_id) where activebool is true;
      • restrain index to rows that matter
      • can give significant speed improvements
      • where clause of index should match
      • where clause of query
    • L-4 Indexing Options indexing (functional)
      • some people prefer to call these expressional indexes
    • L-5 Indexing Options indexing (expressional) create unique index one_true_email_xidx on customer (lower(email));
      • push expensive functions into your index
      • system sees just WHERE indexedcolumn = 'constant'
      • expression of index should match expression of queries
      • narrow scope, but nice gains
    • L-6 Indexing Options indexing (expressional) create index fullname_xidx on customer ((first_name||' '||last_name));
      • push expensive functions into your index
      • system sees just WHERE indexedcolumn = 'constant'
      • expression of index should match expression of queries
      • narrow scope, but nice gains
    • L-7 Indexing Options full text indexing gist vs. gin
      • old school
      • slower for queries
      • faster insert / update
      • mature
      • new in 8.2
      • faster for queries
      • slower insert / update
      • green
    • N-0 PostgreSQL Tablespaces
    • N-1 PostgreSQL Tablespaces tablespaces?
      • define logical locations for object placement
      • point to locations on disk (uses symlinks)
      • size determined by disk size (not pre-ordained)
      • dedicate per db, split db across multiple tblspc
    • N-2 PostgreSQL Tablespaces tablespaces!
      • split database over separate disks
      • use stat, statio tables to gauge disk access
      • create dedicated storage for workloads
          • disk for read / write
          • disk for read only
          • large, slow disk for archiving
          • disk for indexes
    • Q-0 PostgreSQL Partitioning
    • Q-1 PostgreSQL Partitioning partitioning?
      • as table size grows, it becomes unmanageable
      • use inheritance, rules, constraints to split data
      • queries ignore non-relevant partitions
      • could be it's own talk
    • Q-2 PostgreSQL Partitioning partitioning!
      • as table size grows, it becomes unmanageable
      • use inheritance, rules, constraints to split data
      • queries ignore non-relevant partitions
      • could be it's own talk
      http://www.pgcon.org/2007/schedule/events/41.en.html
    • Q-3 PostgreSQL Partitioning partitioning : key points
      • determine list vs. range
      • use triggers rather than rules
      • partition creation vs. data population
      • automate maintenance
    • K-0 Other Stuff I Should Mention
    • K-1 Other Stuff I Should Mention tsearch2
      • full text search interface
      • /contrib module (integrated in 8.3?)
      • very advanced capabilities (rank, headline)
      • missing some things (wildcard)
      • good performance for db's
      • better performance with problem specific tools
      • make it live in its own schema
    • K-2 Other Stuff I Should Mention pgcrypto
      • cryptography type functions
      • /contrib (export issues)
      • md5, sha1, blowfish, many more
    • K-3 Other Stuff I Should Mention dblink
      • pg -> pg connections
      • /contrib (still under development?)
      • can have performance issues on large queries
      • make it live in its own schema
      • beware security issue
    • K-4 Other Stuff I Should Mention autonomous logging tool
      • persistent logging for postgresql functions
      • built on top of dblink
      • make it live in its own schema
      https://labs.omniti.com/trac/pgsoltools
    • K-4 Other Stuff I Should Mention snapshot pitr clones
      • full read/write copy of pitr slave
      • static snapshot
      • need solaris (zfs zone mojo)
      https://labs.omniti.com/trac/pgsoltools
    • K-4 Other Stuff I Should Mention pgbouncer
      • connection pooling application
      • 100-1 pgb/db connection ratio
      • small community, good results
      https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer
    • K-5 Other Stuff I Should Mention dbi-link
      • heterogeneous connections for postgresql
      • built on plperl / dbi
      • moving toward sql-med support
      • similar performance issues to dblink
      http://pgfoundry.org/projects/dbi-link/
    • K-6 Other Stuff I Should Mention phppgadmin
      • web based gui for postgresql
      • remote administration of multiple servers
      • implements much of postgresql functionality
      • support back to 7.1?
      http://phppgadmin.sourceforge.net/
    • K-7 Other Stuff I Should Mention ;-) my book?
    • K-8 Other Stuff I Should Mention ;-) we're hiring
      • software engineers
      • web programmers
      • ui/graphics
      http://www.omniti.com/people/jobs
    • X-0 El Fin