Highload++ 2010
Scaling with Postgres
Robert Treat
Monday, October 25, 2010

Scaling with Postgres (Highload++ 2010)

  1. Highload++ 2010. Scaling with Postgres. Robert Treat. Monday, October 25, 2010
  2. Who Am I? ✤ Robert Treat ✤ OmniTI ✤ Design, Development, Database, Ops
  3. Who Am I? ✤ Robert Treat ✤ OmniTI ✤ Design, Development, DATABASE, Ops
  4. Who Am I? ✤ Robert Treat ✤ OmniTI ✤ Design, Development, DATABASE, Ops ✤ Etsy, Allisports, National Geographic, Gilt, etc...
  5. Who Am I? ✤ Robert Treat ✤ Postgres ✤ Web, Advocacy, phpPgAdmin ✤ Major Contributor
  6. Who Am I? ✤ Postgres 6.5 -> 9.1alpha1 ✤ Terabytes of data ✤ Millions of transactions per day ✤ OLTP, ODS, DW ✤ Perl, PHP, Java, Ruby, C#
  7. Who Am I? OBSERVATION == LEARNING (hopefully)
  8. Scalability: It is the ability of a computer application or product (hardware or software) to continue to function well when it (or its context) is changed in size or volume in order to meet a user need.
  9. Scalability: Given ever increasing load
  10. Scalability: NEVER GO DOWN. ALWAYS PERFORM WELL. Given ever increasing load.
  11. Scalability: NEVER GO DOWN. ALWAYS PERFORM WELL. Given ever increasing load (impossible goal, but we’ll try).
  12. Scalability: NEVER GO DOWN. ALWAYS PERFORM WELL. Given ever increasing load. NOTE! data loss is not a goal, but ideally we won’t lose it :-)
  13. It starts with culture...
  14. ✤ Get over schema purity ✤ add column default not null
  15. ✤ Get over schema purity ✤ add column default not null. Good performance comes from good schema design, HOWEVER, perfect relational modeling is NOT THE GOAL.
  16. ✤ Devs must own schema and queries ✤ they design, you refine
  17. ✤ Devs must own schema and queries ✤ they design, you refine. Performance and scalability cannot be managed solely within the database; both require application level knowledge. To achieve this, application developers need to have visibility of the resources they work on.
  18. Gain Visibility
  19. Gain Visibility ✤ Monitoring ✤ Alerts ✤ Trending ✤ Capacity Planning ✤ Performance Tuning
  20. Gain Visibility ✤ Alerts ✤ server: out of disk space, high load, etc... ✤ database: connections, sequences, etc... ✤ business: registrations, revenue, etc... ✤ etc... Tool: check_postgres.pl
  21. Gain Visibility ✤ Trending ✤ server: disk usage, load, etc... ✤ database: connections, sequences, etc... ✤ business: registrations, revenue, etc... ✤ etc... Tools: cacti, mrtg, circonus
  22. Gain Visibility ✤ Capacity Planning ✤ disks, cpu, memory ✤ connections, vacuum, bloat. Simple projections, done regularly, are good enough.
  23. Gain Visibility ✤ Performance tuning ✤ how long do queries take? ✤ how often do they run? Tool: pgfouine
  24. Gain Visibility: COMMITS/PUSHES
  25. Gain Visibility: ALL alerts, graphs, query reports, etc... MUST be available to EVERYONE on the team AT ALL TIMES
  26. Hands on: You can’t succeed without first putting the right culture in place. Once you are on the right path, make sure you have the right technology.
  27. Postgres Versions ✤ MINIMUM: 8.3 ✤ removes xid for read only queries, significant reduction in vacuum activity
  28. Postgres Versions ✤ MINIMUM: 8.3 ✤ removes xid for read only queries, significant reduction in vacuum activity (seriously!)
  29. Postgres Versions ✤ MINIMUM: 8.3 ✤ removes xid for read only queries, significant reduction in vacuum activity ✤ BETTER: 8.4 ✤ revised free space map management leads to more efficient vacuuming
  30. Postgres Versions ✤ MINIMUM: 8.3 ✤ removes xid for read only queries, significant reduction in vacuum activity ✤ BETTER: 8.4 ✤ revised free space map management leads to more efficient vacuuming ✤ WHY NOT? 9.0 ✤ Hot standby / streaming replication couldn’t hurt
  31. Speaking of replication ✤ Common practice for scaling websites ✤ Good for READ based loads ✤ We have used many: ✤ slony, rubyrep, bucardo, 9.0 built-in, mammoth, wrote-our-own
  32. Speaking of replication
  33. Speaking of replication ✤ No favorite system for this, evaluate based on: ✤ avoid solutions that duplicate writes at sql level (imho) ✤ how comfortable am I debugging the system? ✤ do you need automated schema changes? ✤ how much redundancy / complexity do you need? ✤ how does the system handle node failure for N nodes?
  34. So what would you use? (tm) ✤ 2 Nodes, master + standby: Postgres 9.0 ✤ Master + multiple slaves: Slony ✤ Master-Master: Bucardo. All choices subject to change!!
  35. A word about “Sharding” ✤ Distributed computing is hard(er) ✤ we think of things in a singular global state ✤ the more we can work in that model, the better ✤ RDBMSs offer poor solutions for multiple masters ✤ you must manage that complexity on your own
  36. A word about “Sharding” ✤ Splitting systems by service: ✤ separate db for login, forums, sales, etc... ✤ allows for growth ✤ provides simple interface
  37. Pooling ✤ Postgres connections are expensive! ✤ fork new process per connection ✤ keep 1 process open per connection ✤ at 1000+ processes you will notice trouble
  38. Pooling ✤ Postgres connections are expensive! ✤ fork new process per connection ✤ keep 1 process open per connection ✤ at 1000+ processes you will notice trouble ✤ POOLING ✤ JDBC, mod-perl ✤ pgbouncer ftw!
  39. Summary ✤ Schema / Queries should be shared between dev, dba teams! ✤ Monitoring + Visibility! ✤ >= 8.3 Required! ✤ Replication, jump in it! ✤ Use connection pooling!
  40. Thanks! more: @robtreat2, www.xzilla.net ✤ Oleg & Crew ✤ Highload++ ✤ OmniTI ✤ Postgres Community! ✤ You!
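
Slide 22 says that simple projections, done regularly, are good enough for capacity planning. A minimal sketch of such a projection might look like the following: it fits a straight line through disk-usage samples and estimates the days left until the volume is full. The function name `days_until_full`, the sample figures, and the 500 GB capacity are all hypothetical and only illustrate the idea.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days remaining after the last sample until capacity is hit.

    samples: list of (day_number, used_gb) pairs, oldest first.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    xs = [day for day, _ in samples]
    ys = [used for _, used in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    return day_full - xs[-1]


# Hypothetical weekly samples: (day, GB used on the data volume).
weekly_samples = [(0, 400.0), (7, 414.0), (14, 428.0), (21, 442.0)]
print(days_until_full(weekly_samples, capacity_gb=500.0))
```

The same shape of calculation could be applied to the other quantities the slide lists (connections, bloat), assuming they are sampled on a regular schedule.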
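
Slide 23 asks two questions of every query: how long does it take, and how often does it run? The deck names pgfouine for this. The sketch below is not pgfouine's implementation; it is a rough illustration of the same kind of report, assuming log lines in the `duration: ... ms  statement: ...` form that Postgres writes when `log_min_duration_statement` is enabled. The sample log lines and the helper names `normalize` and `summarize` are made up for the example.

```python
import re
from collections import defaultdict

# Matches the duration and statement text in a Postgres log line.
DURATION_RE = re.compile(r"duration: ([\d.]+) ms\s+statement: (.*)")


def normalize(sql):
    """Collapse literals so that similar queries group together."""
    sql = re.sub(r"'[^']*'", "?", sql)    # string literals
    sql = re.sub(r"\b\d+\b", "?", sql)    # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def summarize(lines):
    """Return (query, call_count, total_ms) rows, slowest total first."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = DURATION_RE.search(line)
        if not match:
            continue
        query = normalize(match.group(2))
        stats[query][0] += 1
        stats[query][1] += float(match.group(1))
    rows = [(query, count, total) for query, (count, total) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical log excerpt.
SAMPLE_LOG = [
    "LOG:  duration: 12.5 ms  statement: SELECT * FROM users WHERE id = 42",
    "LOG:  duration: 7.5 ms  statement: SELECT * FROM users WHERE id = 7",
    "LOG:  duration: 3.0 ms  statement: UPDATE carts SET total = 10 WHERE id = 3",
]

for query, count, total_ms in summarize(SAMPLE_LOG):
    print(f"{count:>4} calls  {total_ms:>8.1f} ms total  {query}")
```

Sorting by total time rather than per-call time surfaces cheap queries that run very often, which is the point of asking both questions together.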
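
Slides 37 and 38 argue that Postgres connections are expensive because each one forks and holds a backend process, and that pooling is the fix. The deck recommends pgbouncer, which is an external proxy; the sketch below is only an in-process illustration of the underlying idea, namely that a small fixed set of connections is opened once and then reused. `SimplePool` and `fake_connect` are hypothetical names, and `fake_connect` stands in for a real driver's connect call.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)   # hand the connection back for reuse


opened = []


def fake_connect():
    """Stand-in for a real database connect call (assumption)."""
    conn = object()
    opened.append(conn)
    return conn


pool = SimplePool(fake_connect, size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass   # run queries with conn here

print(len(opened))   # number of connections actually opened
```

With a pool, the number of backend processes is bounded by the pool size rather than by the number of requests, which is what keeps a server away from the 1000+ process range the slide warns about.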
