
The Accidental DBA


  1. 1. The Accidental DBA: a guide for the perplexed. Don't Panic. Josh Berkus, PostgreSQL Experts, Inc. LinuxCon NA 2012
  2. 2. “Jonathan quit. You're in charge of the Postgres servers now.”
  3. 3. covered in this talk ● PostgreSQL: installation, configuration, connections, backup, monitoring, slow queries, schema design ● MySQL: (covered yesterday) ● no time for: replication, indexes, migrations
  4. 4. 9.2rc1 Out Now!
  5. 5. “DevOps” “Cloud”
  6. 6. Y U no DBA? 1. limited budgets 2. shortage of operational staff 3. cheaper OSS databases … you are the DBA now.
  7. 7. Oh My God, We're All Going To Die.
  8. 8. Don't Panic
  9. 9. Don't Panic: Installation
  10. 10. Use Packages! ● version not important? ● use the ones that come with your distro ● Red Hat, CentOS, SciLinux ● Debian, Ubuntu ● SuSE
  11. 11. Use Packages! ● need the latest version? ● alternate packages ● Red Hat: yum.postgresql.org ● Ubuntu: Martin Pitt's backports ● SuSE: build service ● Debian: backports coming soon
  12. 12. create data directory? ● $PGDATA is where the database files live ● most packages create it ● if not, use “initdb” to create it ● pick a suitable location!
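Creating a cluster by hand can be sketched as below; the data directory path is only an example, and the commands assume the PostgreSQL binaries are installed and you are running as the postgres OS user:

```shell
# /var/lib/pgsql/data is a placeholder; pick a location on your database disk
initdb -D /var/lib/pgsql/data

# start the server against that directory, logging to a file
pg_ctl -D /var/lib/pgsql/data -l /var/lib/pgsql/logfile start
```

Distro packages usually wrap these steps in an init script, so prefer `service postgresql start` (or equivalent) when it exists.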
  13. 13. Don't Panic: configuration
  14. 14. use good hardware ● databases use all the hardware ● RAM, CPU, I/O ● disk can be very important – DB larger than RAM – write-heavy database ● the database cannot outperform bad hardware
  15. 15. put the database on its own server (or virtual server)
  16. 16. cloud servers ● cloud server performance sucks ● especially I/O ● make sure you have enough RAM to cache the whole database
  17. 17. Linux configuration 1. turn the OOM killer off 2. set zone_reclaim_mode = 0 3. use XFS or Ext4 for database files 4. increase shmmax, shmall ● so that you can raise shared_buffers ● this is going away with 9.3!
  18. 18. postgresql.conf ● shared_buffers = ¼ of RAM, up to 8GB ● work_mem = ( RAM * 2 ) / max_connections ● maintenance_work_mem = 1/16 of RAM ● effective_cache_size = 75% of RAM ● checkpoint_segments = 32 (small DB) to 128 (large DB, many writes)
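Items 1, 2, and 4 above are kernel settings; a minimal /etc/sysctl.conf sketch follows, where the shared-memory numbers are placeholders that must be sized to your RAM and your intended shared_buffers:

```
# /etc/sysctl.conf fragment -- example values only
vm.overcommit_memory = 2        # disable memory overcommit so the OOM killer leaves Postgres alone
vm.zone_reclaim_mode = 0        # avoid NUMA zone-reclaim stalls
kernel.shmmax = 8589934592      # max shared memory segment, in bytes (here 8GB)
kernel.shmall = 2097152         # total shared memory, in pages
```

Apply with `sysctl -p` and restart PostgreSQL; on 9.3 and later the shmmax/shmall dance is no longer needed.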
  19. 19. Don't Panic: connections & security
  20. 20. network ● local connections: Unix sockets, if possible ● faster than TCP/IP ● other servers: port 5432 ● make sure it's open on the firewall! ● on the cloud? use SSL ● secure your connections ● PITA to set up, though
  21. 21. max_connections “ERROR: connection limit exceeded for non-superusers” ● postgresql.conf ● increase number of connections ● good up to about 50 + 10 x cores ● keep needing to increase it? something wrong with the app
  22. 22. connection pooling ● Java? use J2EE pooling ● everyone else: pgbouncer ● event-based pooler ● separate package ● on the DB server, the app server, or a 3rd “bouncer” server
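A minimal pgbouncer.ini sketch; the database name, addresses, pool size, and auth file path are all placeholders to adapt:

```
; pgbouncer.ini -- minimal sketch, example values only
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432                      ; the app connects here instead of 5432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction                 ; release server connection at COMMIT/ROLLBACK
default_pool_size = 20
```

Transaction pooling gives the best connection reuse, but it breaks session-level features (prepared statements, temp tables); use `pool_mode = session` if your app needs those.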
  23. 23. host-based access “FATAL: no pg_hba.conf entry for host "192.168.0.1", user "chaos", database "chaosLRdb", SSL off” ● pg_hba.conf ● access control list: database/user/host address ● like iptables for Postgres ● change config and reload
  24. 24. security “FATAL: password authentication failed for user "wwwuser"” ● Postgres users & passwords ● CREATE/ALTER USER ● “group” ROLEs ● DB object permissions ● or: use LDAP or PAM
  25. 25. the psql command line
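Example pg_hba.conf entries; the database, user, and network address are placeholders. The first matching line wins, so order matters:

```
# pg_hba.conf fragment -- example values only
# TYPE   DATABASE   USER       ADDRESS           METHOD
local    all        postgres                     peer    # local admin via Unix socket
host     appdb      wwwuser    192.168.0.0/24    md5     # app servers on the LAN
hostssl  all        all        0.0.0.0/0         md5     # everyone else must use SSL
```

After editing, reload rather than restart: `pg_ctl reload`, or `SELECT pg_reload_conf();` from psql.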
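The users-and-permissions bullets can be sketched in SQL; the role names and password here are examples only:

```sql
-- a NOLOGIN role acts as a "group"
CREATE ROLE readers NOLOGIN;

-- a login role for the application, made a member of the group
CREATE ROLE wwwuser LOGIN PASSWORD 'change-me';
GRANT readers TO wwwuser;

-- object permissions are granted to the group, not individual users
GRANT SELECT ON ALL TABLES IN SCHEMA public TO readers;
```

Granting to group roles keeps permissions manageable when staff or app accounts change.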
  26. 26. Don't Panic: slow queries
  27. 27. pg_stat_activity
-[ RECORD 2 ]----+--------------------------------
datid            | 16422
datname          | libdata
procpid          | 46295
usesysid         | 10
usename          | dataentry
application_name | psql
client_addr      | 192.168.101.114
client_port      | 5432
backend_start    | 2012-08-26 15:09:05.233-07
xact_start       | 2012-08-26 15:09:06.113-07
query_start      | 2012-08-26 15:11:53.521-07
waiting          | f
current_query    | <IDLE> in transaction
  28. 28. locks ● write queries can block on other write queries ● as can table schema changes ● queries can wait forever on locks ● look for “<IDLE> in transaction” ● that's a ...
  29. 29. Zombie Transactions Want RAAAAAAAM
  30. 30. killing zombies ● pg_cancel_backend(pid) ● kills running queries with SIGINT ● like CTRL-C ● pg_terminate_backend(pid) ● kills bad connections, idle transactions ● can cause DB to restart
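Finding and killing zombies can be sketched against the 9.1-era catalog columns shown above (9.2 renames procpid to pid and current_query to query); the ten-minute threshold is an arbitrary example:

```sql
-- list transactions that have been idle too long
SELECT procpid, usename, xact_start
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
   AND xact_start < now() - interval '10 minutes';

-- then terminate each offender by pid (46295 is a placeholder)
SELECT pg_terminate_backend(46295);
```

Try pg_cancel_backend() first; reach for pg_terminate_backend() only when the backend is idle or unresponsive.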
  31. 31. EXPLAIN
Nested Loop  (cost=792.00..828.08 rows=1422317 width=99)
  ->  HashAggregate  (cost=792.00..792.00 rows=1 width=4)
        ->  Index Scan using index_player_summaries_on_player_id on player_summaries ps  (cost=0.00..791.80 rows=403 width=4)
              Index Cond: (player_id = 21432312)
  ->  Index Scan using index_player_summaries_on_match_id on player_summaries  (cost=0.00..33.98 rows=600 width=99)
        Index Cond: (match_id = ps.match_id)
  32. 32. EXPLAIN ANALYZE
Nested Loop  (cost=792.00..828.08 rows=1422317 width=99) (actual time=9928.869..20753.723 rows=13470 loops=1)
  ->  HashAggregate  (cost=792.00..792.00 rows=1 width=4) (actual time=9895.105..9897.096 rows=1347 loops=1)
        ->  Index Scan using index_player_summaries_on_player_id on player_summaries ps  (cost=0.00..791.80 rows=403 width=4) (actual time=27.413..9890.887 rows=1347 loops=1)
              Index Cond: (player_id = 21432312)
  ->  Index Scan using index_player_summaries_on_match_id on player_summaries  (cost=0.00..33.98 rows=600 width=99) (actual time=7.375..8.037 rows=10 loops=1347)
        Index Cond: (match_id = ps.match_id)
Total runtime: 20764.371 ms
  33. 33. explain.depesz.com
  34. 34. what to look for ● “seq scan” on large table ● maybe index needed ● cartesian joins ● really bad row estimates ● ANALYZE needed?
  35. 35. Don't Panic: backups
  36. 36.
2012-01-27 18:00:44 MSK FATAL:  invalid page header in block 311757 of relation base/26976/27977
2012-01-27 18:00:44 MSK CONTEXT:  xlog redo insert: rel 1663/26976/27977; tid 311757/44
2012-01-27 18:00:44 MSK LOG:  startup process (PID 392) exited with exit code 1
2012-01-27 18:00:44 MSK LOG:  aborting startup due to startup process failure
  37. 37. pg_dump ● “logical” backup ● portable ● compressed ● works for upgrades ● good for small databases ● use -Fc ● custom binary format
  38. 38. PITR ● “Point-In-Time Recovery” ● “binary” and “continuous” backup ● take snapshot of DB files ● accumulate logfile copies ● good for large databases ● can combine with replication
  39. 39. PITR - PITA ● can be difficult to set up & monitor ● use tools: ● RepMgr ● OmniPITR ● WAL-E (for AWS)
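The diagnosis loop can be sketched with the tables from the earlier plan (player_summaries and its player_id index come from slides 31-32):

```sql
-- run the suspect query under EXPLAIN ANALYZE to compare estimated vs. actual rows
EXPLAIN ANALYZE
SELECT * FROM player_summaries WHERE player_id = 21432312;

-- estimates far off from actuals? refresh the planner statistics
ANALYZE player_summaries;

-- seq scan on a large table with a selective filter? consider an index
CREATE INDEX index_player_summaries_on_player_id
    ON player_summaries (player_id);
```

Re-run the EXPLAIN ANALYZE after each change so you can see whether the plan, and the runtime, actually improved.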
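A pg_dump/pg_restore round trip can be sketched as below; the database names and backup path are placeholders:

```shell
# custom (-Fc) format: compressed, and restorable table-by-table
pg_dump -Fc mydb > /backups/mydb.dump

# restore into a freshly created database
createdb mydb_restored
pg_restore -d mydb_restored /backups/mydb.dump
```

Test your restores regularly; a backup you have never restored is not a backup.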
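The raw ingredients of PITR are a pair of postgresql.conf settings plus a base backup; the archive destination below is a placeholder, and in practice the tools listed above wrap these steps for you:

```
# postgresql.conf fragment -- archiving sketch, example values only
wal_level = archive                     # or hot_standby, if you also want a readable replica
archive_mode = on
archive_command = 'cp %p /archive/%f'   # %p = WAL file path, %f = WAL file name
```

Take the file-level snapshot with `pg_basebackup -D /backups/base` (9.1+); recovery later replays the archived WAL on top of that snapshot up to your chosen point in time.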
  40. 40. Don't Panic: monitoring
  41. 41. use your favorite tool ● ganglia, collectd, Hyperic, OpenNMS, OpenView, whatever ... ● Nagios check_postgres.pl ● broad list of checks ● mine it for queries and techniques
  42. 42. many useful checks ● disk space ● caching RAM ● response time ● connections ● idle transacts ● table growth ● waiting queries ● long queries ● database size ● table bloat ● system load ● replication lag ● XID wraparound ● execution time
  43. 43. activity log ● connections & disconnections ● slow queries ● DB swap usage ● schema changes ● lock waits & deadlocks
  44. 44. pgFouine, pgBadger
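Each bullet above maps to a postgresql.conf logging setting; the thresholds here are illustrative, not recommendations:

```
# postgresql.conf fragment -- activity logging sketch, example values only
log_connections = on
log_disconnections = on
log_min_duration_statement = 1000   # log any query slower than 1 second
log_temp_files = 0                  # log all temp-file ("DB swap") usage
log_lock_waits = on                 # log waits longer than deadlock_timeout
log_statement = 'ddl'               # capture schema changes
```

Log-analysis tools such as pgBadger can then digest this output into reports, provided log_line_prefix includes a timestamp.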
  45. 45. Don't Panic: schema migrations
  46. 46. SQL-DDL is code 1. write migration scripts ● make them idempotent ● write “undo” scripts 2. check them into version control 3. test them on a staging server ● check how long they take 4. deploy on production
  47. 47. Postgres doesn't require downtime for most schema changes.
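An idempotent migration can be sketched as below; the table and column names are hypothetical, and the catalog guard is needed because 9.x has no ADD COLUMN IF NOT EXISTS:

```sql
-- migration: add players.ranking, safe to run more than once
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1 FROM information_schema.columns
        WHERE table_name = 'players' AND column_name = 'ranking'
    ) THEN
        ALTER TABLE players ADD COLUMN ranking integer;
    END IF;
END
$$;

-- matching "undo" script, kept alongside in version control:
-- ALTER TABLE players DROP COLUMN IF EXISTS ranking;
```

Idempotence means a half-applied deploy can simply be re-run instead of hand-repaired.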
  48. 48. Don't Panic: updates & upgrades
  49. 49. major vs. minor ● 9.2 == a major version ● requires an upgrade from 9.1.4 ● contains features not in 9.1 ● requires testing and planned downtime ● 9.1.5 == a minor version ● is a minor “update” from 9.1.4 ● can (and should) be applied immediately
  50. 50. minor updates ● come out ~ every 2 months ● contain only bugfixes ● security hole patches ● data loss prevention ● fix server crashes ● no new or changed features ● occasional documented breakage
  51. 51. update procedure 1. schedule 5-minute downtime 2. download packages 3. shut down postgresql 4. install packages 5. restart postgresql 6. restart application
  52. 52. major upgrades ● come out once per year ● have many new features ● and sometimes break stuff which used to work ● require extensive testing with your application ● require significant downtime to upgrade
  53. 53. upgrade procedures ● dump & reload ● use pg_dump & pg_restore on database ● most reliable way ● “cleans up” database in process ● best with small databases ● can take a long, long time
  54. 54. upgrade procedures ● pg_upgrade ● upgrade “in place” ● much faster ● does not “clean up” database ● sometimes doesn't work
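The six steps above can be sketched for a Debian/Ubuntu-style system; package and service names vary by distro, so treat this as a template:

```shell
# 1-2: downtime is scheduled; fetch the new minor-version packages
sudo apt-get update

# 3-5: stop, install within the same major version, restart
sudo service postgresql stop
sudo apt-get install postgresql-9.1
sudo service postgresql start

# 6: restart the application so it reconnects cleanly
```

Minor updates never change the on-disk format, so no dump/reload is involved; the whole window is package installation plus two restarts.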
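An in-place run can be sketched as follows; all four paths are placeholders for your distro's layout, and pg_upgrade must be run with both old and new servers stopped:

```shell
# sketch: upgrade a 9.1 cluster to 9.2 in place
pg_upgrade \
  -d /var/lib/pgsql/9.1/data \   # old data directory
  -D /var/lib/pgsql/9.2/data \   # new (initdb'd) data directory
  -b /usr/pgsql-9.1/bin \        # old binaries
  -B /usr/pgsql-9.2/bin          # new binaries
```

Run it first with the --check flag against a copy of the cluster; when pg_upgrade "sometimes doesn't work", you want to find out on staging, not during the production window.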
  55. 55. EOL after 5 years
  56. 56. Getting Help ● docs ● postgresql.org/docs/ ● postgresguide.org ● wiki.postgresql.org ● books ● PostgreSQL: Up and Running (O'Reilly) ● PostgreSQL 9.0 High Performance (Packt)
  57. 57. Getting Help ● mailing lists ● postgresql.org/community/lists – pgsql-novice, pgsql-general ● chat ● irc.freenode.net, #postgresql ● web ● dba.stackexchange.com ● planet.postgresql.org
  58. 58. questions? ● Josh Berkus ● josh@pgexperts.com ● PGX: www.pgexperts.com ● Blog: www.databasesoup.com ● Upcoming Events ● Postgres Open: Chicago, Sept 17-19 ● LISA: San Diego, Dec 5-8. Copyright 2012 PostgreSQL Experts Inc. Released under the Creative Commons Attribution License. All images are the property of their respective owners. The Don't Panic slogan and logo is property of the BBC, and the Dilbert image belongs to Scott Adams and is used here as parody.
