Grabbing the PostgreSQL Elephant by the Trunk
 

Usage Rights

CC Attribution License

  • Who the fuck is this clown?
  • 12 ft tall, 5+ tons (12,000 lbs)
    Their trunk has over 100,000 muscle units
    They eat and walk most of the day - terrible digestive system. Other animals eat their poop.
    One of the big five.
    Intimidating. Aggressive. Nobody messes with them.
    They change their teeth six times in a lifetime. At age 60 they don’t grow them again and die of starvation.
    Four toes.
  • ACID is an expected feature: it guarantees the reliability of DB transactions.

    A: Transactional operations either succeed completely or fail completely. In Postgres, DDL operations are also transactional.
    C: The database goes from one consistent state to the next.
    I: Data generated in interim steps of a transaction can never be seen by any other readers/queries/viewers. This is what MVCC is all about.
    D: Guarantees that once the user/client is notified of success, the data is persisted and the transaction will not be undone.
  • MVCC - Multi Version Concurrency Control.

    An alternative to MVCC is read-locking. Every time a query reads from the DB, it locks the rows so that no other statement can change the data, and therefore it is reading “real and consistent” data. However, there can be many users reading from the same table. Hundreds of thousands in bigger websites. What if there’s a queue of hundreds of reads, and you need to update the table? The update statement must wait until all reads are done before it can run. Additional reads must wait until the update is completed before retrieving new data.

    With MVCC, reads do not lock rows, and the update does not have to wait for them: readers do not block writers, and writers do not block readers. They all execute immediately.

    Postgres keeps multiple versions (the MV in MVCC) of the data every time the data changes (to clean up older versions, use VACUUM).
  • http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
    http://www.postgresql.org/docs/current/static/runtime-config-client.html
    http://www.postgresql.org/docs/current/static/runtime-config-resource.html
    http://www.pgcon.org/2008/schedule/events/104.en.html
    http://vimeo.com/7109722
    http://wiki.postgresql.org/wiki/GUCS_Overhaul
    http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html
    http://www.pgexperts.com/documents.html
    Oracle has over 500 config settings.
  • shared_buffers: PostgreSQL’s dedicated RAM. It is a 2nd-level cache (the 1st level is the OS cache). A good starting point is 1/4 of available RAM. Cache miss statistics can tell you if you need more.
  • work_mem: memory limit on per-query operations like sort, count, etc. If RAM swapping is high, work_mem is too high.
    In general, OS swapping == too much work_mem, while caching sorts in pg_temp == not enough work_mem.
        work_mem = 32MB
  • maintenance_work_mem: vacuum and analyze use it. Query expiration. Could be in the 256MB-1GB range for larger DBs.
  • wal_buffers: the default is appropriate for one CPU and a small DB. Increase it to maybe 8MB for SMPs. Wimpy default because of the Linux shared memory limits.
  • synchronous_commit: turn it off, but know that there are risks, as you may lose 0.5 seconds’ worth of data. If it’s on, the WAL gets written immediately.
  • effective_io_concurrency: number of disks or channels. Only if your OS supports async IO. Linux and FreeBSD do.
  • effective_cache_size: this setting simply hints the planner, to make better cost estimates.
  • http://vimeo.com/9889075
    So, I’ve convinced you and you will port the app to Postgres. Here’s what to look out for to avoid pitfalls.

    If any one component fails, the whole system suffers.

    Hardware: I/O is a very common bottleneck. Many writes cause the transaction log to bring the process down. If the database is 3x larger than RAM -> I/O bound per query.
    Use SAS, not SATA (SATA is half duplex). RAID 1+0 over RAID 5, especially for writes. The more spindles, the better => many small drives are better than fewer bigger drives. Move the transaction log to a separate disk.

    BEWARE OF CLOUD and IO!

    Low RAM is also common, and capping on memory makes the server hit the disks. Bad. It is ideal to fit the entire database in RAM, using shared_buffers. If that’s not possible, cache as much of the database as possible, in which case think about how big the operational dataset that needs caching is. Think about sorting, aggregates and other operations that may benefit from in-RAM caching.

    CPU: Pick more CPUs over more cores on the same processor. L2 cache, speed, and 64 bit.

    OS: Use something that supports direct I/O - Linux, Solaris, FreeBSD.
    * Tablespaces for large tables
    * Tuning Linux:
    * XFS & JFS for database files. Otherwise ext3 if on Red Hat (no support for XFS). UFS if on Solaris.
    * Reduce logging - data=writeback, noatime, nodiratime.
    * Pre-2.6.9 kernels must upgrade.
    * Use the deadline scheduler for write speed

    Schema:
    * 1NF. Postgres is optimized for normalized lookups. Wait for an actual issue before denormalizing.
    * You may benefit from separating out tables (1-to-1s) based on read/write patterns.
    * Indexes: FKs, where clauses, aggregations. Use expression indexes, partial indexes. Don’t over-index. Don’t index small tables. Look at pg_stat_user_indexes and pg_stat_user_tables.
    * Partitioning: historical data, very big tables. Any big deletes. The app must know how to deal with it, by querying the partition key as part of the WHERE criteria.
    * Query design. Do more on each query.

    * Connection pooling: pgPool, pgBouncer.
    * pgFouine: analyze queries. In 8.4 there’s pg_stat_statements
    * EXPLAIN ANALYZE

    Have a mechanism for measuring changes you make. How much CPU, RAM, swapping is happening before and after tuning? Use top, vmstat, etc.
  • * pgPool is more complex. Replication, distributed query management. pgBouncer is more lightweight.
    * Table Partitioning allows you to manage dead data without overhead, keeping your database healthy. Performance of course.
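The MVCC note above (multiple row versions, reads that take no locks, VACUUM to clean up) can be sketched as a toy in-memory table. This is only an illustration under loud assumptions: `ToyTable`, `RowVersion`, and the integer snapshot are hypothetical names made up here, every write is treated as instantly committed, and none of this reflects PostgreSQL's actual storage format or visibility rules.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RowVersion:
    value: str
    xmin: int                   # id of the transaction that created this version
    xmax: Optional[int] = None  # id of the transaction that superseded it, if any


@dataclass
class ToyTable:
    """Hypothetical toy table; not how PostgreSQL stores rows."""
    versions: Dict[str, List[RowVersion]] = field(default_factory=dict)
    next_xid: int = 1

    def write(self, key: str, value: str) -> int:
        """Add a new version of `key` (assumed committed at once); return its xid."""
        xid = self.next_xid
        self.next_xid += 1
        chain = self.versions.setdefault(key, [])
        if chain:
            chain[-1].xmax = xid  # the old version stays for older snapshots
        chain.append(RowVersion(value, xmin=xid))
        return xid

    def snapshot(self) -> int:
        """A snapshot sees every transaction committed so far."""
        return self.next_xid - 1

    def read(self, key: str, snapshot: int) -> Optional[str]:
        """Return the version visible to `snapshot`, without taking any lock."""
        for version in self.versions.get(key, []):
            created = version.xmin <= snapshot
            superseded = version.xmax is not None and version.xmax <= snapshot
            if created and not superseded:
                return version.value
        return None

    def vacuum(self, oldest_snapshot: int) -> int:
        """Drop versions that no snapshot >= oldest_snapshot can see."""
        removed = 0
        for key, chain in list(self.versions.items()):
            keep = [v for v in chain if v.xmax is None or v.xmax > oldest_snapshot]
            removed += len(chain) - len(keep)
            self.versions[key] = keep
        return removed


table = ToyTable()
table.write("user:1", "alice")
reader_snapshot = table.snapshot()            # a long-running reader starts here
table.write("user:1", "alicia")               # the update does not wait for it
old = table.read("user:1", reader_snapshot)   # reader still sees its own version
new = table.read("user:1", table.snapshot())  # a fresh reader sees the update
removed = table.vacuum(table.snapshot())      # old version is now unreachable
```

The reader that started before the update keeps seeing its own consistent version while the update proceeds, and the dead version only goes away once `vacuum` is told that no snapshot that old remains.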

Presentation Transcript

  • Grabbing the PostgreSQL Elephant by the Trunk
  • Harold Giménez awesomeful.net @hgimenez http://www.flickr.com/photos/leeh/180311941/sizes/o/
  • Elephants are cool
  • Installation
  • Installation
    • Build from a git clone
        git clone git://git.postgresql.org/git/postgresql.git && cd postgresql
        # (mirrored at git://github.com/postgres/postgres.git)
        ./configure && make && sudo make install
    • Useful configuration options:
        ./configure --prefix=/opt/postgresql-9.0-dev --with-pgport=8200
    • port/fink/yum/apt-get
    • EnterpriseDB
  • Post Installation
  • Post Installation
    • Init the database cluster
        sudo -u postgres pg_ctl -D /usr/local/pgsql/data init
    • Start the server
        sudo -u postgres pg_ctl -D /usr/local/pgsql/data start
      (there are SysV init scripts in contrib/start-scripts)
  • Post Installation
    • Change postgres password (connect to template1; template0 does not accept connections by default):
        sudo -u postgres psql template1
        ALTER USER postgres WITH PASSWORD 'new_password';
    • Create role
        sudo -u postgres createuser --no-superuser --createdb --no-createrole --login --pwprompt --encrypted -h 127.0.0.1 -p 5432 hgimenez
    • pg_hba.conf (host based auth)
        type   database  user  cidr-address  method
        local  all       all                 ident
        host   all       all   10.2.0.0/16   md5
  • Post Installation
    • Create a “virgin” database
      From the psql prompt
        create database my_awesome_app_development owner=hgimenez template=template0 encoding='utf8';
      From your shell
        createdb -U postgres -O db_user -T template0 -E 'utf8' db_name
  • Get acquainted with psql
        psql -d db_name
        \?
        \d
        \d users
        \l
        \c
        \set ECHO_HIDDEN true
        \timing
        \x
        \q
  • Ruby adapters
        gem install postgres-pr
        gem install postgres
        gem install activerecord-jdbcpostgresql-adapter
        gem install pg
    http://www.flickr.com/photos/jurvetson/60685364/sizes/s/
  • Rails plugins and Ruby tools
    Constraints, views and other migration helpers:
        http://github.com/alex3t/rails_on_pg
        http://github.com/matthuhiggins/foreigner
        http://agilewebdevelopment.com/plugins/redhillonrails_core
    Full text search (tsearch2):
        http://github.com/pka/acts_as_tsearch
        http://github.com/tenderlove/texticle
    Backups:
        http://github.com/meskyanichi/backup
    The database toolkit:
        http://github.com/jeremyevans/sequel
  • database.yml
        development:
          adapter: postgresql
          encoding: unicode
          database: bostonrb_development
          min_messages: warning
  • <obligatory-MySQL-rant>
  • Features Replication in alpha testing Presentation by Relevant Logic - 2 years ago http://www.slideshare.net/gisborne/postgres-presentation-presentation
  • http://twitter.com/depesz_com/status/8645567175
  • http://twitter.com/jordibunster/status/9204222189
  • PostgreSQL license
    http://www.postgresql.org/about/licence
    “Permission to use, copy, modify, and distribute this software and its documentation for any purpose, without fee, and without a written agreement is hereby granted, provided that the above copyright notice and this paragraph and the following two paragraphs appear in all copies.”
  • </obligatory-MySQL-rant>
  • Features at a glance
  • Features at a glance
    • Views
    • Query Plan Optimizer
    • Triggers
    • Procedural Languages
    • Rules
    • Full text search
    • Constraints
    • PITR, Warm Standby
    • ACID, transactions, MVCC
    • Replication
    • PostGIS
    • GUCS
  • • Atomicity • Consistency • Isolation (MVCC) • Durability
  • Multi Version Concurrency Control
  • Grand Unified Configuration Settings
  • GUCS
    How:
      • postgresql.conf
      • SET (and SHOW)
      • select * from pg_settings
      • command line switches
    What:
      • The set of PostgreSQL’s configuration parameters.
      • Global or session
      • Performance, security, backend admin
  • shared_buffers: memory available to Postgres. Suggested: total RAM / 4
  • work_mem: non-shared RAM used by a query for sorts, hashes, etc. Suggested: 32 - 64MB
  • maintenance_work_mem: memory usage for vacuum, bulk data loads, etc. Suggested: 256MB - 1GB
  • wal_buffers: memory available to the Write-Ahead Log. Suggested: 8MB for SMPs
  • checkpoint_segments: how many 16MB log segments to create before a checkpoint. Suggested: 16 to 64
  • synchronous_commit: if on, the WAL gets written immediately, which means that in the event of a crash no data is lost, guaranteed. Suggested: off
  • effective_io_concurrency: only if the OS supports async IO. Suggested: number of disks/channels
  • effective_cache_size: estimated RAM available for caching in shared_buffers and the filesystem cache. Suggested: total RAM * 2/3
  • APP PERFORMANCE
    • Hardware: Storage, RAM, CPU, Network
    • OS: Filesystem, Kernel
    • PostgreSQL: Schema, Configuration
    • Application: Queries, Transactions
  • YesSQL
  • Red Refactor Green
  • SCALING
    • Table Partitioning
    • Vertical partitioning of data
    • Horizontal partitioning (sharding) - PL/Proxy
    • PITR, Replication
    • Manage number of connections with pgBouncer
  • Replication
  • Tools http://www.flickr.com/photos/leelefever/2101999317/sizes/l/
  • Tools
    • explain analyze
    • pg_stats
    • iostat
    • pg_stat_user_indexes
    • vmstat 5 50, vmstatx
    • pg_stat_user_tables
    • top
    • pg_statio_user_tables
    • pgtop
    • pg_dump & pg_restore
    • dd, bonnie++
  • Learn More
    • http://wiki.postgresql.org/
    • http://www.postgresql.org/docs/current/static/
    • http://pgexperts.com/documents.html
    • http://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf
    • #postgresql, mailing lists
  • ?
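The starting values on the configuration slides above (total RAM / 4 for shared_buffers, total RAM * 2/3 for effective_cache_size, and so on) amount to simple arithmetic, which might be sketched as follows. The helper names `starting_gucs` and `render_conf` are hypothetical, and the fixed values are just the low end of each range on the slides. These are starting points to measure against, not recommendations; the parameter names are those of the 8.4/9.0 era the talk targets, and `checkpoint_segments` in particular was replaced by `max_wal_size` in PostgreSQL 9.5.

```python
def starting_gucs(total_ram_mb: int) -> dict:
    """Starting points from the slides, as postgresql.conf-style values.

    Hypothetical helper: it takes the low end of each suggested range and
    assumes a multi-processor (SMP) machine for wal_buffers.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return {
        "shared_buffers": f"{total_ram_mb // 4}MB",            # total RAM / 4
        "work_mem": "32MB",                                    # 32 - 64MB
        "maintenance_work_mem": "256MB",                       # 256MB - 1GB
        "wal_buffers": "8MB",                                  # 8MB for SMPs
        "checkpoint_segments": "16",                           # 16 to 64
        "effective_cache_size": f"{total_ram_mb * 2 // 3}MB",  # total RAM * 2/3
    }


def render_conf(settings: dict) -> str:
    """Render the settings as lines that could be pasted into postgresql.conf."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


# Example: a server with 8GB of RAM.
conf_text = render_conf(starting_gucs(8192))
print(conf_text)
```

As the notes say, measure before and after: watch swapping and temp-file usage to judge work_mem, and cache miss statistics to judge shared_buffers, rather than trusting the arithmetic alone.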