Shrty
• Social Network Aggregation
• Seed capital
• 2 developers
• First attempt to run Postgres on EC2
Story
• 3 guys with an idea and a logo
• Built in 2 months in RoR and Java
• Modest traffic, tested up to 100K users
• Investor pitches
• “Production”
• Sold.
Lessons Learned
• PostgreSQL + EC2: it works!
• Cheap!
• I/O is massively unpredictable
• Ephemeral storage is ... ephemeral
• No SLA in the Cloud
outside.in
• Hyperlocal News
• Geotag and categorize web pages, blog posts, and tweets from hundreds of thousands of sources
• Organize data into ~85,000 neighborhoods
• Query for news within 1,000 ft. of a user
• Chose Postgres for PostGIS
• Powers "local" on CNN’s homepage and many other sites
• Now part of AOL’s Patch
EC2 DB “Hardware”
• m2.4xlarge = High-Memory Quadruple Extra Large!
• 68.4 GB RAM
• High I/O performance
• 8 virtual cores
The Cloud Giveth and Taketh
• Machines vanish (network, switch, power ...)
• Network availability
• Multi-tenant machines
• SAN location
• outside.in became a large AWS customer: assigned an account manager and given access to EC2 engineers
• The email you don’t want to get on a Friday night...
Hello,

One of your instances in the us-east-1 region is on hardware that requires network-related maintenance. Your other instances that are not listed here will not be affected.

i-3fcdb156

For the above instance, we recommend migrating to a replacement instance to avoid any downtime. Your replacement instance would not be subject to this maintenance. If you leave your instance running, you will lose network connectivity for up to two hours. The maintenance will occur during a 12-hour window starting at 12:00am PST on Monday, February 15, 2010. After the maintenance is complete, network connectivity will be restored to your instance.

As always, we recommend keeping current backups of data stored on your instance.

Sincerely,
The Amazon EC2 Team
Failure is Assured
• Load balance with health checks (Varnish)
• Use DNS. Private IPs *do* change
• Use Puppet (or Chef)
  • Hardened basic image, apply security patches there
  • Puppet bootstraps from there
• Replace instances before they fail when possible
Resource Contention
• Everyone needs data, and everyone needs it NOW
• Put WAL on a separate disk - log writing bounds write throughput (sketch below)
• Keep an eye on iostat - one slow disk in a RAID 0 stripe can ruin your day
• Backups, buffer cache filling, vacuuming
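A minimal sketch of the WAL-on-its-own-volume idea, assuming an extra EBS volume already mounted at /mnt/wal and a default 8.4 data directory; the paths and init script name are placeholders, not necessarily what outside.in used:

    # Stop Postgres before touching the data directory
    sudo /etc/init.d/postgresql-8.4 stop

    # Move the WAL directory (pg_xlog in 8.x) onto the dedicated volume and symlink it back
    sudo mv /var/lib/postgresql/8.4/main/pg_xlog /mnt/wal/pg_xlog
    sudo ln -s /mnt/wal/pg_xlog /var/lib/postgresql/8.4/main/pg_xlog
    sudo chown -h postgres:postgres /var/lib/postgresql/8.4/main/pg_xlog

    sudo /etc/init.d/postgresql-8.4 start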
Connections
• Managing max_connections
• PgBouncer = basic connection pooler (sketch below)
  • Session mode - pool slot held for the life of the client connection
  • Transaction mode - held for the life of a transaction
  • Statement mode - held for the life of a single statement
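A minimal pgbouncer.ini sketch showing where the pool mode is chosen; the database name, host, and pool sizes are illustrative placeholders, not outside.in's actual settings:

    [databases]
    ; "appdb" is a hypothetical database; point it at the real Postgres host
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; session | transaction | statement - the three modes from the slide
    pool_mode = transaction
    max_client_conn = 500
    default_pool_size = 20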
Containment Problem
• Places (points) need to be placed into neighborhoods properly
• Neighborhood and municipal boundaries are complex
• Neighborhoods overlap towns - need % intersection (sketch below)
• Containment projects upward
• US shape data is messy
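A sketch of the percent-intersection calculation in PostGIS, assuming hypothetical neighborhoods and towns tables that each carry a geom polygon column:

    -- What fraction of each neighborhood falls inside each town it touches?
    SELECT n.id AS neighborhood_id,
           t.id AS town_id,
           ST_Area(ST_Intersection(n.geom, t.geom)) / ST_Area(n.geom) AS pct_in_town
      FROM neighborhoods n
      JOIN towns t ON n.geom && t.geom          -- cheap bounding-box prefilter
     WHERE ST_Intersects(n.geom, t.geom);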
Geometry is Slow :(
• Simplify shapes - if you can
• Avoid complex geo queries online (ST_CONTAINS, ST_INTERSECTION, ST_CENTROID)
• Cache containment - geo will never be faster than a simple SELECT
• Eventually... index containment in Lucene
• Use PostGIS only for generating and updating the containment cache (periodic, offline) - sketch below
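A sketch of the offline containment cache, assuming hypothetical places and neighborhoods tables; the cache table name and refresh strategy are illustrative:

    -- Precomputed point-in-polygon answers, rebuilt periodically offline
    CREATE TABLE place_containments (
        place_id        integer NOT NULL,
        neighborhood_id integer NOT NULL,
        PRIMARY KEY (place_id, neighborhood_id)
    );

    -- Refresh step: the expensive PostGIS work happens here, not at query time
    INSERT INTO place_containments (place_id, neighborhood_id)
    SELECT p.id, n.id
      FROM places p
      JOIN neighborhoods n ON ST_Contains(n.geom, p.geom);

    -- Online reads become a plain index lookup
    SELECT neighborhood_id FROM place_containments WHERE place_id = 42;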
Hyperlocal at CNN Scale
• Strategic investor
• Initial API implementation was the CNN homepage!
• Many MM page views
• 350 req/s
• News = sensitive to caching
Replication
• Done via WAL shipping
• Warm standby only in Postgres 8.4 (no Hot Standby yet)
• Base backup, then ship and apply WAL continuously (sketch below)
• Replicas sometimes fell out of standby mode (manual procedure to remedy)
• WAL shipping to multiple slaves:
  • Make some with RAID for emergency promotion to master
  • Make one use a single EBS volume and snapshot that
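A minimal sketch of 8.4-style warm standby via WAL shipping, assuming a standby host named standby1 and an archive directory of /var/lib/postgresql/wal_archive; host names and paths are placeholders:

    # postgresql.conf on the master: ship each completed WAL segment to the standby
    archive_mode = on
    archive_command = 'rsync -a %p standby1:/var/lib/postgresql/wal_archive/%f'

    # recovery.conf on the standby: replay segments as they arrive
    # (pg_standby ships in contrib; the trigger file ends standby mode when created)
    restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/postgresql/wal_archive %f %p %r'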
Backup
• Periodic full pg_dump -> S3 (sketch below)
• Lots of I/O pressure
• Experiments using XFS RAID snapshotting - don’t do it
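A sketch of the periodic dump-to-S3 step, assuming the s3cmd client and a hypothetical bucket and database name; the slides don't specify the actual tooling:

    #!/bin/bash
    # Nightly logical backup: custom-format dump, then push it to S3
    STAMP=$(date +%Y%m%d)
    pg_dump -Fc -U postgres appdb > /mnt/backups/appdb-$STAMP.dump
    s3cmd put /mnt/backups/appdb-$STAMP.dump s3://example-db-backups/appdb-$STAMP.dump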
Load Balancing
• HAProxy (sketch below)
• ELB for application servers - not for internal use!
• From the horse’s mouth: ELB scales up HAProxy cores with the number of unique client IPs, NOT raw traffic
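A minimal haproxy.cfg sketch for TCP-balancing internal database traffic across two hypothetical PgBouncer hosts; addresses, ports, and names are placeholders:

    listen pgbouncer
        bind 0.0.0.0:6432
        mode tcp
        balance roundrobin
        option tcplog
        server db1 10.0.1.10:6432 check inter 2000 rise 2 fall 3
        server db2 10.0.1.11:6432 check inter 2000 rise 2 fall 3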
Linux Buffer Cache
• Postgres is highly dependent on warm OS caches
• Crazy variance in query times:
  • 10 ms in Staging
  • 5,000 ms in Prod
• Data stampedes
• Warm-up time for the db = time spent warming caches (sketch below)
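A crude cache-warming sketch (Postgres 8.4 has no pg_prewarm): full scans pull the hot relations through the OS and buffer caches before traffic hits. The table choices are illustrative, and the index nudge assumes an index exists on blips.location_id:

    -- Touch the hottest tables so their pages are cached
    SELECT count(*) FROM stories;
    SELECT count(*) FROM blips;

    -- Nudge an important index into cache by forcing an index-driven scan
    SET enable_seqscan = off;
    SELECT count(*) FROM blips WHERE location_id BETWEEN 0 AND 2000000;
    RESET enable_seqscan;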
I/O
• DB performance is a game of maximizing I/O, where EC2 is your opponent
• Guaranteed IOPS (???)
• RAID 0 or RAID 10?
Keeping Things Healthy
• Monitor bloat (sketch below)
• Vacuum as needed
  • autovacuum may not be enough
  • VACUUM FULL may be too much (it takes exclusive locks)
• VACUUM ANALYZE frequently
• Use autovacuum, but tune it carefully
• pgFouine FTW!
  • Log analysis
  • Slow queries
  • Vacuum analysis
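A small sketch of the kind of bloat monitoring and per-table autovacuum tuning this implies; the thresholds and the choice of the stories table are illustrative:

    -- Rough bloat signal: tables with lots of dead tuples relative to live ones
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;

    -- Make autovacuum more aggressive on a heavily updated table (8.4+ reloptions)
    ALTER TABLE stories SET (autovacuum_vacuum_scale_factor = 0.05,
                             autovacuum_analyze_scale_factor = 0.02);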
More Performance
• Use stored procedures (and the debugger)
• The query optimizer doesn’t always do what you expect!
• Maximize statistics (but beware dynamic SQL) - sketch below
  ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS <number>
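A concrete instance of the statistics knob from the slide, using blips.location_id as a stand-in column; the target value is illustrative:

    -- Collect a larger sample for a skewed column, then refresh the planner stats
    ALTER TABLE blips ALTER COLUMN location_id SET STATISTICS 1000;
    ANALYZE blips;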
Heinous SQL

    SELECT
      stories.id,
      (SELECT fsa.title
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
        WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
        ORDER BY fsa.created_at ASC LIMIT 1) AS title,
      (SELECT f.title
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
        WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
        ORDER BY fsa.created_at DESC LIMIT 1) AS "author",
      (SELECT fsa.url
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
        WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
        ORDER BY fsa.created_at DESC LIMIT 1) AS url,
      SUBSTRING(stories.summary FROM 1 FOR 200) AS summary,
      stories.sort_date AS published_at,
      (SELECT f.base_url
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
        WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
        ORDER BY fsa.created_at DESC LIMIT 1) AS author_url,
      (EXISTS (
        SELECT fpa.id
          FROM feed_publication_attachments fpa
          JOIN feed_story_attachments fsa ON fsa.feed_id=fpa.feed_id
         WHERE stories.id = fsa.story_id AND fpa.publication_id=112 AND fpa.owned
      )) AS promoted
    FROM stories
    JOIN blips b
      ON b.story_id = stories.id AND b.location_id=1435491 AND b.publisher_id IN (0,115)
    WHERE
      b.blip_type_id IN (1,3) AND -- comment out to run prior query form
      (
        NOT EXISTS (
          SELECT bf.id
            FROM blip_filters bf
           WHERE bf.location_id=1435491 AND bf.story_id = stories.id AND bf.publisher_id=115
        )
        AND EXISTS (
          SELECT f.id
            FROM feeds f
            JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
            LEFT OUTER JOIN feed_publication_attachments fpa
              ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
           WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
        )
      )
      AND (
        NOT EXISTS (
          SELECT psf.id
            FROM publication_story_filters psf
           WHERE psf.story_id = stories.id AND psf.publication_id=112
        )
      )
Make Heinous SQL Run Fast!
• Fast = subsecond
• Ideally < 250 ms
• Feed the query planner good statistics
• Sometimes rewrite queries to take advantage of GiST indexes (critical for geo) - sketch below
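A sketch of the GiST-index pattern for the geo case, assuming a geom column on a hypothetical places table; the coordinates and the 0.005-degree radius are placeholders. The && bounding-box filter lets the planner use the GiST index before the exact distance test runs:

    -- GiST index so bounding-box operators can be served from the index
    CREATE INDEX places_geom_gist ON places USING gist (geom);

    -- Rewrite: cheap && bounding-box filter first, exact distance check second
    SELECT id
      FROM places
     WHERE geom && ST_Expand(ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326), 0.005)
       AND ST_DWithin(geom, ST_SetSRID(ST_MakePoint(-73.99, 40.73), 4326), 0.005);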
Costs
• Chart: monthly cost ($$), January through December, comparing Reserved vs. Standard instances
Lessons Learned
• EC2 is still cheap-ish, but not without careful planning!
• Denormalize into something else (Lucene, geo cache)
• Monitor the crap out of everything
• Send a synthetic transaction ID through the stack
• Plan on a few failures a week
• Hybrid Postgres/MongoDB/Lucene data stack
• Postgres 9.0
• Mongo for social graph and event logging
• UUIDs for shared references
• Hot Standby
• Streaming Replication
• VPC and Dedicated Instances ($$)
• Experimenting with other clouds for the production environment
• Launching late summer - and we’re hiring!
Lessons Learned
• Invite me back in a few months.
Some Thoughts and Conclusions
• PostgreSQL is a GREAT choice if you are starting out now, on EC2.
• The Postgres community is awesome.
• Organized governing body - who needs it?
• Let’s see a shrink-wrapped EC2 cloud provider. We’ll be customer #2 :)