Andy Parsons Pivotal June 2011


  1. 1. War Stories: Operational Fun with PostgreSQL in the Cloud
      Andy Parsons
  2. 2. Who am I?
      • Startup junkie / masochist
      • Deliver stuff that works in startup time
      • Old timer on the NYC startup scene
      • ♥ luxury of choosing tools. And living with them.
  3. 3. Who Aren’t I?
      • DBA
      • Sys Admin
      • PostgreSQL Uber Guru
  4. 4. This Talk is ...
      • Pragmatic, Concrete
      • About My Experiences and Lessons Learned
      • About 3 recent startups built with PostgreSQL
      • Going to focus on Postgres, but leak into overall architecture
  5. 5. Brief History: me + PostgreSQL
      • Digital Railroad (Deadpool 2007)
      • Shrty (acquired by Collecta 2008)
      • Outside.in (acquired by AOL 2011)
      • Bookish (stealth)
  6. 6. The IM You Don’t Want to Get. Ever.
  7. 7. The IM You Don’t Want to Get. Ever. 1:05 am “Site’s down” 1:06 am “U seeing all these alerts?” 1:09 am “What’s it mean- no such device?”
  9. 9. Fallout
      • A lot of the system was down for a short time
      • When it came back up, data was old
      • New data had to be merged with incoming
      • But, incoming pipeline never compromised
  10. 10. Lessons Learned
      • Don’t trust expensive hardware and datacenters
      • Redundant isn’t redundant enough
      • SH** HAPPENS
      • Postgres looks cool!
  11. 11. Shrty
      • Social Network Aggregation
      • Seed capital
      • 2 developers
      • First attempt to run Postgres on EC2
  12. 12. Story
      • 3 guys with an idea and a logo
      • Built in 2 months in RoR and Java
      • Modest traffic, tested up to 100K users
      • Investor pitches
      • “Production”
      • Sold.
  13. 13. Lessons Learned
      • PostgreSQL + EC2 : it works!
      • Cheap!
      • I/O is massively unpredictable
      • Ephemeral storage is ... ephemeral
      • No SLA in the Cloud
  14. 14. outside.in
      • Hyperlocal News
      • Geotag and categorize web pages, blog posts and tweets from hundreds of thousands of sources
      • Organize data into ~85,000 neighborhoods
      • Query for news within 1000 ft. of a user
      • Chose Postgres for PostGIS
      • Powers local on CNN’s homepage and many other sites
      • Now part of AOL’s Patch
  15. 15. Architecture [diagram; labels include RoR, RoR APIs, Mobile APIs, Public API, Scala Svc, Q’ing Svc, Denorm, Indexing, Text Mining, Postgres Master, Postgres Slave (x2), Solr Master, Solr Slave (x2)]
  16. 16. EC2 DB “Hardware”
      • m2.4xlarge = High-Memory Quadruple Extra Large!
      • 68.4 GB RAM
      • High I/O Performance
      • 8 virtual cores
  17. 17. The Cloud Giveth and Taketh
      • Machines vanish (network, switch, power ...)
      • Network availability
      • Multi-tenant machines
      • SAN location
      • OI became a large AWS customer, assigned acct. manager and access to EC2 engineers
      • Email you don’t want to get on a Friday night...
  18. 18. Hello,
      One of your instances in the us-east-1 region is on hardware that requires network related maintenance. Your other instances that are not listed here will not be affected.
      i-3fcdb156
      For the above instance, we recommend migrating to a replacement instance to avoid any downtime. Your replacement instance would not be subject to this maintenance.
      If you leave your instance running, you will lose network connectivity for up to two hours. The maintenance will occur during a 12-hour window starting at 12:00am PST on Monday, February 15, 2010. After the maintenance is complete, network connectivity will be restored to your instance.
      As always, we recommend keeping current backups of data stored on your instance.
      Sincerely,
      The Amazon EC2 Team
  19. 19. Failure is Assured
      • Load balance with health checks (Varnish)
      • Use DNS. Private IPs *do* change
      • Use Puppet (or Chef)
        • Hardened basic image, apply security patches there
        • Puppet bootstraps from there
      • Replace instances before they fail when possible
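As a rough illustration of the health-check and DNS bullets, the sketch below probes each backend by hostname, so the name is resolved again on every check and a changed private IP is picked up. The hostnames, the port, and the `probe` hook are hypothetical. The talk's own setup used Varnish health checks, which this does not reproduce.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    The hostname is resolved on every call rather than cached.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def healthy_backends(
    hosts: Iterable[str],
    port: int,
    probe: Callable[[str, int], bool] = tcp_probe,
) -> List[str]:
    """Keep only the hosts whose probe currently succeeds, in input order."""
    return [host for host in hosts if probe(host, port)]
```

A load balancer or cron job could call `healthy_backends` periodically and route only to the hosts it returns. Passing a custom `probe` makes the selection logic testable without a network.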
  20. 20. Resource Contention
      • Everyone needs data, everyone needs it NOW.
      • Put WAL on a separate disk (log writing bounds write throughput)
      • Keep an eye on iostat - one disk in RAID 0 can ruin your day
      • Backups, buffer cache filling, vacuuming
  21. 21. Connections
      • Managing max_connections
      • PgBouncer = basic conn pooler
        • Session mode - life of connection
        • Tx mode - life of transaction
        • Statement mode - life of single statement
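One way to reason about `max_connections` behind a pooler is a simple connection budget. The sketch below assumes PgBouncer keeps one pool per database/user pair, each allowed up to `default_pool_size` server connections plus an optional `reserve_pool_size`, and that PostgreSQL holds back `superuser_reserved_connections` slots. All numbers in the usage are hypothetical.

```python
def server_connections_needed(
    pools: int, default_pool_size: int, reserve_pool_size: int = 0
) -> int:
    """Worst-case server connections the pooler may open across all pools."""
    return pools * (default_pool_size + reserve_pool_size)


def fits_max_connections(
    max_connections: int,
    superuser_reserved: int,
    pools: int,
    default_pool_size: int,
    reserve_pool_size: int = 0,
) -> bool:
    """True if the pooler's worst case fits in the slots open to ordinary users."""
    available = max_connections - superuser_reserved
    needed = server_connections_needed(pools, default_pool_size, reserve_pool_size)
    return needed <= available
```

For example, four pools of 20 fit under `max_connections = 100` with 3 reserved slots, while five pools of 20 do not. In transaction mode many more clients than this can be attached, because a client holds a server connection only for the life of a transaction.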
  22. 22. Containment Problem
      • Places (points) need to be placed into neighborhoods properly
      • Neighborhood and municipal boundaries are complex
      • Neighborhoods overlap towns - need % intersection
      • Containment projects upward
      • US shape data is messy
  23. 23. Geometry is Slow :(
      • Simplify shapes - if you can
      • Avoid complex Geo queries online (ST_CONTAINS, ST_INTERSECTION, ST_CENTROID)
      • Cache Containment. Geo will never be faster than simple SELECT
      • Eventually... index containment in Lucene
      • PostGIS for generating and updating containment cache only (periodic, offline)
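The containment-cache idea can be sketched in a few lines: do the expensive geometry once, offline, and serve lookups from a plain mapping. The talk used PostGIS for the offline step; to stay self-contained this sketch swaps in a pure-Python ray-casting test as a stand-in for `ST_Contains`. The place IDs, neighborhood names, and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test (stand-in for PostGIS ST_Contains)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_containment_cache(
    places: Dict[int, Point], neighborhoods: Dict[str, Polygon]
) -> Dict[int, List[str]]:
    """Offline step: record which neighborhoods contain each place."""
    return {
        place_id: [name for name, poly in neighborhoods.items() if contains(poly, pt)]
        for place_id, pt in places.items()
    }
```

Online requests then become a dictionary (or simple `SELECT`) lookup by place ID, with no geometry evaluated at request time. Overlapping neighborhoods simply yield more than one name per place.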
  24. 24. Hyperlocal at CNN Scale
      • Strategic investor
      • Initial API impl was CNN homepage!
      • Many MM page views
      • 350 req/s
      • News = sensitive to caching
  25. 25. Replication
      • Done via WAL Shipping
      • Warm standby only in Postgres 8.4
      • Base (hot) backup, then ship and apply WAL
      • Replica - sometimes came out of standby mode (manual procedure to remedy)
      • WAL shipping to multiple slaves:
        • Make some with RAID for emergency promotion to master
        • Make one use a single EBS volume and snapshot that.
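The shipping half of WAL shipping boils down to copying each completed segment somewhere the standby can read it. The sketch below is a minimal, assumed version of such a copy step: it refuses to overwrite an existing segment and writes through a temporary name so a reader never sees a partial file. The function name and directory layout are hypothetical, and this is not the speaker's script.

```python
import shutil
from pathlib import Path


def archive_wal_segment(segment_path: str, archive_dir: str) -> bool:
    """Copy one WAL segment into archive_dir.

    Returns False, without touching anything, if the segment is already archived.
    """
    src = Path(segment_path)
    dest = Path(archive_dir) / src.name
    if dest.exists():
        return False
    # Copy under a temporary name, then rename into place.
    tmp = dest.with_name(dest.name + ".tmp")
    shutil.copyfile(src, tmp)
    tmp.rename(dest)
    return True
```

A small wrapper around this could be wired into PostgreSQL's `archive_command`, where `%p` expands to the segment path. The wrapper would need to exit with status 0 only when the function returns True, since PostgreSQL treats a zero exit as "safely archived".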
  26. 26. Backup
      • Periodic full pg_dump -> S3
      • Lots of I/O pressure
      • Experiments using XFS RAID snapshotting. Don’t do it.
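A periodic dump job mostly needs a predictable command line and a predictable object name. The sketch below only plans the backup: it builds a `pg_dump` invocation in custom format and a timestamped S3 location. The bucket name, key layout, and working directory are assumptions, `now` is expected to be in UTC, and the upload itself is left to whatever S3 client is in use.

```python
from datetime import datetime
from typing import List, Tuple


def plan_backup(
    dbname: str, bucket: str, now: datetime, workdir: str = "/tmp"
) -> Tuple[List[str], str, str]:
    """Return (pg_dump command, local dump path, S3 URL) for one full dump."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    dump_path = f"{workdir}/{dbname}-{stamp}.dump"
    # --format=custom produces a compressed archive readable by pg_restore.
    command = ["pg_dump", "--format=custom", f"--file={dump_path}", dbname]
    s3_url = f"s3://{bucket}/pg_dump/{dbname}/{stamp}.dump"
    return command, dump_path, s3_url
```

The command list can be handed to `subprocess.run(command, check=True)` by a scheduler. Given the I/O pressure noted above, pointing the job at a replica rather than the master is one option worth testing.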
  27. 27. Load Balancing
      • HAProxy
      • ELB for Application Servers - not for internal use!
      • From the horse’s mouth: scales up HAProxy cores with # unique IPs NOT raw traffic.
  28. 28. Linux Buffer Cache
      • Postgres highly dependent on warm OS caches
      • Crazy variances in query times:
        • 10 ms in Staging
        • 5000 ms in Prod
      • Data stampedes
      • Warm up time for db = warming caches
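One blunt way to shorten warm-up is to read the database files once before taking traffic, so their pages land in the OS cache. The sketch below does exactly that and nothing more. It assumes ordinary buffered reads populate the Linux page cache and that the data fits in RAM; it does not touch Postgres's own shared buffers. The directory path in any real use is an assumption.

```python
from pathlib import Path


def warm_os_cache(data_dir: str, chunk_size: int = 1 << 20) -> int:
    """Read every regular file under data_dir once; return the bytes read."""
    total = 0
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            while True:
                chunk = fh.read(chunk_size)
                if not chunk:
                    break
                # The data is discarded; the read itself is what warms the cache.
                total += len(chunk)
    return total
```

On a freshly replaced instance this could run against the cluster's data directory before the node is added back to the load balancer.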
  29. 29. I/O
      • DB performance is a game of maximizing IO, where EC2 is your opponent.
      • Guaranteed IOPs (???)
      • RAID 0 or RAID 10?
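The RAID 0 versus RAID 10 question is partly arithmetic. The sketch below compares the chance of losing the whole array and the usable capacity under a deliberately simple assumption: each volume fails independently with the same probability over some period. Independence may well not hold for EBS volumes, and the probabilities used are hypothetical.

```python
def raid0_loss_probability(p_disk: float, n_disks: int) -> float:
    """RAID 0 (striping): the array is lost if any single disk fails."""
    return 1.0 - (1.0 - p_disk) ** n_disks


def raid10_loss_probability(p_disk: float, n_pairs: int) -> float:
    """RAID 10 (striped mirrors): lost only if both disks of some pair fail."""
    p_pair = p_disk ** 2
    return 1.0 - (1.0 - p_pair) ** n_pairs


def usable_fraction(level: str) -> float:
    """Fraction of raw capacity available for data."""
    return {"raid0": 1.0, "raid10": 0.5}[level]
```

With a per-disk failure probability of 0.1, four striped disks give roughly a 34% chance of losing the array, while the same four disks as two mirrored pairs give roughly 2%, at half the usable capacity.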
  30. 30. EBS Filesystem Tests (all figures in MB/s)
      Filesystem                             Disks  Seq. Reads  Seq. Writes  Random Reads  Random Writes  R/W Mix: Reads  R/W Mix: Writes
      EXT3, 64K stripe                       3      74.7        102.1        1.3           20.4           21.3            25.1
      EXT3, 128K stripe, 2MB readahead       3      (only two values survive in the transcript, 1.6 and 11.3; their columns are unclear)
      XFS, 64K stripe                        3      20.7        107.2        1.7           40.2           13.6            12.5
      XFS, 128K stripe                       3      102.2       106.2        1.5           87.8           41.1            24.6
      XFS, 64K stripe                        4      115.8       135.4        2.0           76.4           41.0            24.6
      XFS, 128K stripe                       4      104.8       103.1        1.8           70.8           49.3            30.3
      XFS, 128K stripe, deadline scheduler   4      105.0       102.8        2.0           70.1           55.1            31.5
  32. 32. Keeping Things Healthy
      • Monitor bloat
      • Vacuum as needed
        • autovacuum may not be enough
        • VACUUM FULL may be too much (locks)
      • Vacuum analyze frequently
      • Use autovacuum but tune carefully
      • PgFouine FTW!
        • Log analysis
        • Slow queries
        • Vacuum analysis
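To make "tune carefully" concrete, the sketch below works through the autovacuum trigger arithmetic: a table is vacuumed once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (stock defaults 50 and 0.2). The dead-tuple ratio helper is an assumed, simple bloat indicator built from the `n_live_tup` and `n_dead_tup` counters in `pg_stat_user_tables`; the table sizes are hypothetical.

```python
def autovacuum_trigger(
    reltuples: float, threshold: int = 50, scale_factor: float = 0.2
) -> float:
    """Dead-tuple count above which autovacuum will vacuum the table."""
    return threshold + scale_factor * reltuples


def dead_tuple_ratio(n_live_tup: int, n_dead_tup: int) -> float:
    """Share of tuples that are dead; 0.0 for an empty table."""
    total = n_live_tup + n_dead_tup
    return n_dead_tup / total if total else 0.0
```

With defaults, a table of one million rows accumulates about 200,050 dead tuples before autovacuum acts. Lowering the scale factor to 0.01 for that table brings the trigger down to about 10,050, which is the kind of per-table tuning the slide hints at.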
  33. 33. More Performance
      • Use stored procedures (and debugger)
      • Query optimizer doesn’t always do what you expect! (separate slide?)
      • Maximize statistics (but beware dynamic SQL)
        ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS <number>
  34. 34. Heinous SQL
      SELECT
        stories.id,
        (SELECT fsa.title
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at ASC
         LIMIT 1) as title,
        (SELECT f.title
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) as "author",
        (SELECT fsa.url
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) as url,
        SUBSTRING(stories.summary FROM 1 FOR 200) AS summary,
        stories.sort_date as published_at,
        (SELECT f.base_url
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) AS author_url,
        (EXISTS (
           SELECT fpa.id
           FROM feed_publication_attachments fpa
           JOIN feed_story_attachments fsa ON fsa.feed_id=fpa.feed_id
           WHERE stories.id = fsa.story_id AND fpa.publication_id=112 AND fpa.owned
        )) AS promoted
      FROM stories
      JOIN blips b
        ON b.story_id = stories.id AND b.location_id=1435491 AND b.publisher_id IN (0,115)
      WHERE
        b.blip_type_id in (1,3) AND -- comment out to run prior query form
        (
          NOT EXISTS (
            SELECT bf.id
            FROM blip_filters bf
            WHERE bf.location_id=1435491 AND bf.story_id = stories.id AND bf.publisher_id=115
          ) AND EXISTS (
            SELECT f.id
            FROM feeds f
            JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
            LEFT OUTER JOIN feed_publication_attachments fpa
              ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
            WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
          ) AND (
            NOT EXISTS(
              SELECT psf.id
              FROM publication_story_filters psf
              WHERE psf.story_id = stories.id AND psf.publication_id=112
            )
  35. 35. [Enlarged excerpt of the query from slide 34; it adds no new SQL and is truncated in the transcript.]
  36. 36. Make Heinous SQL Run Fast!
      • Fast = subsecond
      • Ideally < 250 ms
      • Query planner - feed it stats
      • Sometimes rewrite queries to take advantage of GiST indexes (critical for geo)
  37. 37. Costs [chart: monthly $$ by month, Jan through Dec; series: Reserved, Standard]
  38. 38. Lessons Learned
      • EC2 is still cheap-ish, but not without careful planning!
      • Denormalize into something else (Lucene, Geo Cache)
      • Monitor the crap out of everything
      • Send a synthetic transaction ID through stack
      • Plan on a few failures a week
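One reading of the synthetic-transaction bullet is a monitoring probe that carries a single ID through every layer, so each hop can be found in the logs afterwards. The sketch below assumes that reading. The stage names, the log line format, and the idea of modelling each layer as a callable are all hypothetical.

```python
import uuid
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_synthetic_transaction(
    stages: List[Stage],
    payload: Any,
    trace_id: Optional[str] = None,
    log: Callable[[str], None] = print,
) -> Tuple[str, Any]:
    """Push a probe payload through each stage, logging the same ID at every hop."""
    trace_id = trace_id or uuid.uuid4().hex
    for name, stage in stages:
        payload = stage(payload)
        log(f"trace_id={trace_id} stage={name} status=ok")
    return trace_id, payload
```

If a stage raises, the last logged line shows how far the probe got, which is usually the first thing needed when an alert fires.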
  39. 39.
      • Hybrid Postgres/MongoDB/Lucene Data Stack
      • Postgres 9.0
      • Mongo for social graph and event-logging
      • UUIDs for shared references
      • Hot Standby
      • Streaming Replication
      • VPC and Dedicated Instances ($$)
      • Experimenting with other Clouds for Production Environment
      • Launching late summer - and we’re hiring!
  40. 40. Lessons Learned
      • Invite me back in a few months.
  41. 41. Some Thoughts and Conclusions
      • PostgreSQL is a GREAT choice if you are starting out now, on EC2.
      • The Postgres community is awesome.
      • Organized governing body - who needs it?
      • Let’s see a shrink-wrapped EC2 cloud provider. We’ll be customer #2 :)
  42. 42. Questions?
  43. 43. Thanks!
      Andy Parsons
      andy@bookish.com
      @andyparsons
      http://linkedin.com/in/andyparsons
