Scaling our SaaS backend with PostgreSQL
Slides of the talk I gave on 2013-10-28 at the Backend Web Berlin Meetup and on 2013-11-08 at the Developer Conference in Hamburg.

Speaker notes

  • Hi, I’m Oliver, I’m a software developer, currently heading the development team at Bidmanagement GmbH in Berlin.
  • I’m going to talk about PostgreSQL. Not so much about the DBMS itself, but more about how we’re using it as the main datastore in our system.
  • About how we’re running a large PostgreSQL installation in our company, and how we’ve grown our setup.
  • Very popular, billions of dollars: a very important online marketing channel.
  • Google provides a very extensive API
  • The different kinds of data we store can be largely separated into two groups.
  • … And we decided to go with PostgreSQL, because it has been our go-to tool for storing data for many years. Problems from time to time, but we never looked back.
  • But it began much smaller …
  • A straightforward approach; nobody thought of scaling.
  • Pilots were successful, and we started to acquire customers. Soon there were >10 million rows in some tables and query performance lagged (many full table scans). We did not want to scale vertically, because we aspired to much bigger growth (also: expensive).
  • PostgreSQL supports partitioning via inheritance [insert scheme]. Use CHECK constraints to tell the query planner where to look. You cannot insert into the parent table, you must insert into the child table. A lot of effort goes into application logic. We tried it on one table and weren’t convinced by it. (See the sketch below.)
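A minimal sketch of inheritance partitioning as it worked on PostgreSQL 9.x; the table and columns are illustrative, not taken from our schema:

    -- Parent table holding the common definition.
    CREATE TABLE history (
        account_id integer NOT NULL,
        day        date    NOT NULL,
        keyword_id bigint  NOT NULL,
        factor     numeric
    );

    -- One child table per account; the CHECK constraint lets the planner
    -- exclude partitions that cannot match the WHERE clause.
    CREATE TABLE history_account_1 (CHECK (account_id = 1)) INHERITS (history);
    CREATE TABLE history_account_2 (CHECK (account_id = 2)) INHERITS (history);

    -- INSERTs must target a child directly (or be routed by a trigger on
    -- the parent); rows inserted into the parent stay in the parent.
    INSERT INTO history_account_1 VALUES (1, '2013-10-01', 42, 0.8);

    -- With constraint exclusion enabled, this scans only history_account_1.
    SET constraint_exclusion = partition;
    SELECT * FROM history WHERE account_id = 1;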
  • One main DB with non-account-specific data, currently ~1-2 GB. Several machines are dedicated to account databases, with 50-1000 DBs per machine. PostgreSQL 9.0 and 9.3 run on each machine, which allows us to migrate one DB after another.
  • The partitioning scheme allows easy horizontal scaling: just add more machines. But which? The dataset does not fit in RAM, so I/O requirements are high. AWS EC2? We would have to migrate all or most machines due to latency. DB instances run 24/7, which is costly, and EBS performance is limited (GBit Ethernet). (Also: not many cores.) [ec2 / ebs performance numbers vs. physical]
  • Not that much elasticity is required: as a B2B business our growth is more predictable, and expensive backend jobs are batch-processed. One year of an EC2 instance ≅ buying one physical server. We use mid-sized machines with a good price/value ratio.
  • SATA: 600 GB vs 3 TB. EC2: performance and latency unclear, so evaluate to make an informed decision. SSDs: expensive. Reliable? RAID?
  • But when things go awry and data gets deleted …
  • Big cheap HDDs
  • The main DB is still replicated to enable quick failover; here we can’t afford extended downtime.
  • Capacity doubled, cost reduced by 40%. The more servers, the faster the restore; GBit Ethernet on the backup server is the limiting factor.
  • From sequential reads to random reads. Feedback loop: more concurrent queries → longer query runtimes → even more concurrent queries.
  • Webapp queries with humans waiting are quite fast; the problematic queries come from the analysis jobs: frequent full table scans and queries with huge results. We need a way to synchronize queries and control concurrency. We could use a connection pooler, or an external synchronization mechanism, e.g. ZooKeeper.
  • We rewrite the history every day (for various reasons): conversions arrive up to 30 days later, and campaigns are added to optimization. For most accounts it is <1M records, for some 10-100M. We achieve up to 80k inserts/sec; the network is the bottleneck [check this].
  • We use COPY for all bulk inserts, even small bulks. Indexes are dropped and recreated with simple plpgsql functions (see the sketch below). For complete table rewrites, beware that TRUNCATE is not MVCC-safe.
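A sketch of such a drop/recreate helper, assuming plain indexes only (indexes backing constraints such as primary keys need ALTER TABLE instead); the function and table names are illustrative:

    -- Drop a table's indexes and return their definitions, so the caller
    -- can bulk-load with COPY and then recreate them.
    CREATE OR REPLACE FUNCTION drop_indexes(tbl text)
    RETURNS SETOF text AS $$
    DECLARE
        idx record;
    BEGIN
        FOR idx IN
            SELECT indexname, indexdef FROM pg_indexes WHERE tablename = tbl
        LOOP
            RETURN NEXT idx.indexdef;
            EXECUTE 'DROP INDEX ' || quote_ident(idx.indexname);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- Typical sequence:
    --   SELECT drop_indexes('history_account_1');  -- keep the returned DDL
    --   COPY history_account_1 FROM STDIN;         -- bulk-load via the driver
    --   (re-run each saved indexdef to recreate the indexes)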
  • We added a self-service signup: a 2-minute process to add an AdWords account to the system (OAuth → user info → optimization bootstrap). Biggest problem: CREATE DATABASE can take several minutes, depending on the current amount of write activity.
  • We now always keep 10-20 spare databases in stock; we also control the target host for new databases this way. Take care not to have race conditions when applying schema changes. (Sketch below.)
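A sketch of the spare-database pool; the names spare_00042 and template_account are illustrative, and the rename assumes nothing is connected to the spare:

    -- Done ahead of time, because CREATE DATABASE copies a template and
    -- can stall for minutes behind ongoing write activity:
    CREATE DATABASE spare_00042 TEMPLATE template_account;

    -- At signup, claiming a database is a near-instant rename (it fails
    -- if any session is still connected to spare_00042):
    ALTER DATABASE spare_00042 RENAME TO account_4711;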

Transcript

  • 1. SCALING OUR SAAS BACKEND WITH POSTGRESQL OLIVER SEEMANN, BIDMANAGEMENT GMBH BWB MEETUP, 2013-10-28
  • 2. THIS TALK IS ABOUT …
  • 3. THIS TALK IS ABOUT … Gigabytes Terabytes
  • 4. PRODUCTIVITY TOOLS FOR ONLINE MARKETERS Automatic Bid Management for
  • 5. Auctioned Ads “Organic” Search
  • 6. SIGNIFICANT AMOUNTS OF DATA 10,000 Campaigns, 5 Million Keywords, 4 Million Ads per AdWords account
  • 7. SIGNIFICANT AMOUNTS OF DATA Full History for all objects over full lifetime
  • 8. SLOW AND FAST DATA “Slow” / OLAP data for batch-processing jobs “Fast” / OLTP data for human interaction
  • 9. INITIALLY SEPARATE Slow Data Fast Data
  • 10. A LOT OF OVERLAP Slow Data Fast Data
  • 11. THEN ONLY ONE Slow Data Fast Data
  • 12. CURRENTLY 7 machines running PostgreSQL 3 Terabytes Data Thousands of Databases Largest Table: 120GB
  • 13. HOW IT BEGAN Experiment
  • 14. DESIGN BY THE BOOK [ER diagram: Customer, User, UserAccountAccess, Account, Campaign, Adgroup, Keywords and History tables linked by primary/foreign keys; a reconstruction follows below]
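A hedged reconstruction of the diagrammed schema; the tables and key relationships follow the slide, while the column types are assumptions:

    CREATE TABLE customer (
        customer_id serial PRIMARY KEY
    );

    CREATE TABLE "user" (
        user_id     serial PRIMARY KEY,
        customer_id integer REFERENCES customer
    );

    CREATE TABLE account (
        account_id  serial PRIMARY KEY,
        customer_id integer REFERENCES customer
    );

    CREATE TABLE user_account_access (
        user_id    integer REFERENCES "user",
        account_id integer REFERENCES account,
        PRIMARY KEY (user_id, account_id)
    );

    CREATE TABLE campaign (
        campaign_id serial PRIMARY KEY,
        account_id  integer REFERENCES account
    );

    CREATE TABLE adgroup (
        adgroup_id  serial PRIMARY KEY,
        campaign_id integer REFERENCES campaign
    );

    CREATE TABLE keywords (
        adgroup_id integer REFERENCES adgroup,
        keyword_id bigint,
        PRIMARY KEY (adgroup_id, keyword_id)
    );

    CREATE TABLE history (
        day        date,
        adgroup_id integer,
        keyword_id bigint,
        factor     numeric,
        PRIMARY KEY (day, adgroup_id, keyword_id),
        FOREIGN KEY (adgroup_id, keyword_id) REFERENCES keywords
    );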
  • 15. MORE CUSTOMERS – MORE DATA
  • 16. PARTITIONING [Diagram: one table holding the records of Accounts 1-3 interleaved]
  • 17. PARTITIONING [Diagram: the same records split into one partition per account]
  • 18. PARTITION WITH INHERITANCE [Diagram: SELECTs go to the parent table and are routed to the child tables via CHECK constraints; INSERTs must target a child directly]
  • 19. ISOLATE ACCOUNTS One DB Many DBs
  • 20. PARTITIONING VIA DATABASES Excellent horizontal scaling Easy cloning pg_dump/pg_restore Some Overhead No direct references
  • 21. WHY NOT SCHEMAS? More lightweight Full References No easy cloning No schemas inside schemas
  • 22. SETUP [Diagram: the main DB server plus account-DB machines machine-0, machine-1, machine-2]
  • 23. DB HARDWARE Data > RAM ⇒ High I/O EC2?
  • 24. MIGRATION TO EC2 Must migrate all/most machines No PostgreSQL in RDS DB Instances run 24/7 ⇒ costly EBS Performance limited
  • 25. EBS I/O LIMITED [Bar chart, MB/s from 0 to 900: sequential write and read throughput for an AWS instance store, AWS EBS (RAID-0), SSD (RAID-0) and real 15k SAS2 disks (RAID-10)]
  • 26. DEDICATED MACHINES Moderate CPU / RAM Fast Disks Battery-backed caching controller
  • 27. ALTERNATIVE HW Use bigger (and slower) SATA drives Evaluate EC2+EBS in production SSDs
  • 28. HARDWARE FAILS Replication [Diagram: master → slave] for availability and query load balancing
  • 29. REPLICATION [Diagram: account-DB masters master-1 and master-2 replicating to slave-1 and slave-2; the main DB master replicating to a slave]
  • 30. BACKUPS pg_dump compressed Backup Server
  • 31. REPLICATION [Replication diagram repeated]
  • 32. REPLICATION [Replication diagram repeated]
  • 33. REPLICATION [Diagram: the account DBs now run as four masters (master-1 to master-4) without slaves; only the main DB keeps its slave]
  • 34. DISASTER RESTORE concurrent pg_restore Backup Server
  • 35. PERFORMANCE PROBLEMS Too many concurrent full table scans: from 300 MB/s down to 30 MB/s [Feedback loop: more concurrent queries → longer query runtime]
  • 36. DIFFERENT APPS [Diagram: the web app server issues many fast queries, the compute cluster issues few very slow queries]
  • 37. DIFFERENT APPS [Diagram: a semaphore gating the compute cluster’s slow queries] Simple counting semaphore using advisory locks, implemented in the application (see the sketch below)
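A sketch of such a counting semaphore, assuming a fixed slot count; the key space (1000, slot) and the function name are illustrative:

    -- Try to grab one of max_slots advisory-lock slots; returns the slot
    -- number, or NULL if all slots are taken (the caller retries later).
    CREATE OR REPLACE FUNCTION try_acquire_slot(max_slots integer)
    RETURNS integer AS $$
    DECLARE
        slot integer;
    BEGIN
        FOR slot IN 1 .. max_slots LOOP
            -- pg_try_advisory_lock returns immediately instead of
            -- blocking, so every slot can be probed.
            IF pg_try_advisory_lock(1000, slot) THEN
                RETURN slot;  -- at most max_slots sessions hold a slot
            END IF;
        END LOOP;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    -- A compute-cluster job brackets its expensive query with:
    --   SELECT try_acquire_slot(4);            -- remember the slot
    --   ...run the full-table-scan query...
    --   SELECT pg_advisory_unlock(1000, slot); -- release the held slot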
  • 38. BULK INSERTS [Diagram: bulk INSERT throughput of 20k – 80k rows per sec, up to 50M records]
  • 39. BULK INSERT BEST PRACTICE COPY instead of INSERT; drop indexes + recreate; instead of TRUNCATE, COPY into a new table, swap + drop (see the sketch below)
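A sketch of the swap variant for complete table rewrites (unlike TRUNCATE, which is not MVCC-safe, the rename-based swap is fully transactional); the table names are illustrative:

    BEGIN;
    -- Build the replacement next to the live table.
    CREATE TABLE history_new (LIKE history INCLUDING ALL);
    -- COPY history_new FROM STDIN;  -- bulk-load the rewritten data here
    -- Swap the tables atomically and drop the old contents.
    ALTER TABLE history RENAME TO history_old;
    ALTER TABLE history_new RENAME TO history;
    DROP TABLE history_old;
    COMMIT;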
  • 40. SIGNUP PROBLEMS [Diagram: signup calls the Adspert service, whose CREATE DATABASE can take up to 5-10 min]
  • 41. PRE-CREATE DATABASES Create DBs ahead of time New signups rename DBs Periodically create new Fall back to direct create
  • 42. CONCLUDING .. Partitioning into Databases Physical Hardware Check out advisory locks
  • 43. THANKS FOR LISTENING QUESTIONS?