War Stories: Operational Fun with PostgreSQL in the Cloud
Andy Parsons
Who am I?

• Startup junkie / masochist
• Deliver stuff that works in startup time
• Old timer on the NYC startup scene
• ♥ luxury of choosing tools. And living with
  them.
Who Aren’t I?

• DBA
• Sys Admin
• PostgreSQL Uber Guru
This Talk is ...
• Pragmatic, Concrete
• About My Experiences and Lessons
  Learned
• About 3 recent startups built with
  PostgreSQL
• Going to focus on Postgres, but leak into
  overall architecture
Brief History: me + PostgreSQL
• Digital Railroad (Deadpool 2007)
• Shrty (acquired by Collecta 2008)
• Outside.in (acquired by AOL 2011)
• Bookish (stealth)
The IM You Don’t Want to Get. Ever.
           1:05 am “Site’s down”

     1:06 am “U seeing all these alerts?”

      1:09 am “What’s it mean- no such
                 device?”
Fallout

• A lot of the system was down for a short
  time
• When it came back up, data was old
• New data had to be merged with incoming
• But, incoming pipeline never compromised
Lessons Learned
• Don’t trust expensive hardware and
  datacenters
• Redundant isn’t redundant enough
• SH** HAPPENS
• Postgres looks cool!
Shrty

• Social Network Aggregation
• Seed capital
• 2 developers
• First attempt to run Postgres on EC2
Story
• 3 guys with an idea and a logo
• Built in 2 months in RoR and Java
• Modest traffic, tested up to 100K users
• Investor pitches
• “Production”
• Sold.
Lessons Learned
• PostgreSQL + EC2 : it works!
• Cheap!
• I/O is massively unpredictable
• Ephemeral storage is ... ephemeral
• No SLA in the Cloud
outside.in
•   Hyperlocal News
•   Geotag and categorize web pages, blog posts and
    tweets from hundreds of thousands of sources
•   Organize data into ~85,000 neighborhoods
•   Query for news within 1000 ft. of a user
•   Chose Postgres for PostGIS
•   Powers local on CNN’s homepage and many other sites
•   Now part of AOL’s Patch
Architecture
[Diagram. Components shown: RoR (x2), Mobile, Public API; Scala APIs (x2); Scala Svc (x3); Denorm / Indexing; Text Mining; Q’ing; Postgres Master with 2 Postgres Slaves; Solr Master with 2 Solr Slaves]
EC2 DB “Hardware”

• m2.4xlarge = High-Memory Quadruple
  Extra Large!
• 68.4 GB RAM
• High I/O Performance
• 8 virtual cores
The Cloud Giveth and Taketh
•   Machines vanish (network, switch, power ...)
•   Network availability
•   Multi-tenant machines
•   SAN location
•   OI became a large AWS customer, assigned
    acct. manager and access to EC2 engineers
•   Email you don’t want to get on a Friday night...
Hello,

One of your instances in the us-east-1 region is on hardware that requires network
related maintenance. Your other instances that are not listed here will not be affected.

i-3fcdb156

For the above instance, we recommend migrating to a replacement instance to avoid
any downtime. Your replacement instance would not be subject to this maintenance.

If you leave your instance running, you will lose network connectivity for up to two
hours. The maintenance will occur during a 12-hour window starting at 12:00am
PST on Monday, February 15, 2010. After the maintenance is complete, network
connectivity will be restored to your instance.

As always, we recommend keeping current backups of data stored on your instance.

Sincerely,

The Amazon EC2 Team
Failure is Assured
•   Load balance with health checks (Varnish)
•   Use DNS. Private IPs *do* change
•   Use Puppet (or Chef)
    •   Hardened basic image, apply security patches
        there
    •   Puppet bootstraps from there
•   Replace instances before they fail when possible
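The health-check idea can be sketched in a few lines of Python. This is only an illustration, not the Varnish setup from the talk: the TCP probe stands in for a real health check, and the hostnames in the usage note are hypothetical. Backends are addressed by DNS name, since private IPs change.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Backend = Tuple[str, int]  # (DNS name, port); never a hard-coded private IP


def tcp_alive(backend: Backend, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the backend succeeds within timeout."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


def healthy_backends(
    backends: Iterable[Backend],
    check: Callable[[Backend], bool] = tcp_alive,
) -> List[Backend]:
    """Keep only the backends that pass the health check, preserving order."""
    return [b for b in backends if check(b)]
```

Usage, with made-up names: healthy_backends([("db-a.internal", 5432), ("db-b.internal", 5432)]) returns whichever of the two currently accepts connections.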
Resource Contention
• Everyone needs data, everyone needs it
  NOW.
• PUT WAL on separate disk (log writing
  bounds write throughput)
• Keep an eye on iostat - one disk in RAID 0
  can ruin your day
• Backups, buffer cache filling, vacuuming
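One way to act on the iostat advice is to compare each stripe member against the others. A minimal Python sketch, assuming you already collect a per-device latency figure (for example the await column of iostat -x); the 3x factor and the device names in the usage note are arbitrary assumptions, not values from the talk.

```python
from statistics import median
from typing import Dict, List


def slow_members(await_ms: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return devices whose latency exceeds `factor` times the median of the set."""
    if not await_ms:
        return []
    typical = median(await_ms.values())
    return sorted(dev for dev, ms in await_ms.items() if ms > factor * typical)
```

Usage: slow_members({"sdf": 4.0, "sdg": 5.0, "sdh": 48.0, "sdi": 4.5}) flags sdh, the one volume dragging down the whole RAID 0 set.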
Connections

• Managing max_connections
• PgBouncer = basic connection pooler
 • Session mode - server connection held for the life of the client connection
 • Transaction mode - held for the life of a transaction
 • Statement mode - held for the life of a single statement
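A rough way to see why the pooling mode matters for max_connections: in session mode every connected client pins a server connection, while in transaction mode only transactions that are open at the same moment do. The Python sketch below counts that overlap; the timings in the usage note are invented for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (transaction start, transaction end), any time unit


def peak_open_transactions(transactions: List[Interval]) -> int:
    """Largest number of transactions open at the same moment."""
    events = []
    for start, end in transactions:
        events.append((start, 1))   # transaction begins: needs a server connection
        events.append((end, -1))    # transaction ends: connection returns to the pool
    # Tuples sort by time, then by delta, so at equal timestamps an end (-1)
    # is processed before a start (+1) and the freed connection is reused.
    events.sort()
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak
```

Usage: four clients running one transaction each at (0, 2), (1, 3), (3, 4) and (5, 6) would hold four server connections in session mode, but at most two of those transactions overlap.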
Containment Problem
•   Places (points) need to be placed into
    neighborhoods properly
•   Neighborhood and municipal boundaries are
    complex
•   Neighborhoods overlap towns - need %
    intersection
•   Containment projects upward
•   US shape data is messy
Geometry is Slow :(
•   Simplify shapes - if you can
•   Avoid complex Geo queries online
    (ST_CONTAINS, ST_INTERSECTION,
    ST_CENTROID)
•   Cache Containment. Geo will never be faster than
    simple SELECT
•   Eventually... index containment in Lucene
•   PostGIS for generating and updating containment
    cache only (periodic, offline)
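The "cache containment" pattern splits the work in two: an expensive offline pass that decides which neighborhoods contain each place, and a cheap online lookup. The Python sketch below shows that shape only. In the talk the offline pass is done by PostGIS; here a plain ray-casting test stands in so the example runs on its own, and all names and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def contains(polygon: Polygon, point: Point) -> bool:
    """Ray-casting point-in-polygon test (boundary points get no special handling)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_containment_cache(
    places: Dict[str, Point], hoods: Dict[str, Polygon]
) -> Dict[str, List[str]]:
    """Offline step: for every place, list the neighborhoods that contain it."""
    return {
        place_id: sorted(name for name, poly in hoods.items() if contains(poly, pt))
        for place_id, pt in places.items()
    }
```

Online, serving a request is then a dictionary (or simple SELECT) lookup such as cache["deli"], with no geometry evaluated at query time.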
Hyperlocal at CNN Scale
• Strategic investor
• Initial API impl was CNN homepage!
• Many MM page views
• 350 req/s
• News = sensitive to caching
Replication
•   Done via WAL Shipping
•   Warm standby only in Postgres 8.4
•   Base (hot) backup, then ship and apply WAL
•   Replica - sometimes came out of standby mode (manual
    procedure to remedy)
•   WAL shipping to multiple slaves:
    •   Make some with RAID for emergency promotion to
        master
    •   Make one use a single EBS volume and snapshot that.
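Shipping WAL to several standbys can be sketched as a small copy helper called from archive_command. This is an assumption about how one might do it, not the script used in the talk: the function name, the .partial suffix and the destination directories are all hypothetical. It refuses to overwrite a segment with different contents, and treats an identical existing copy as already archived so a retry after a partial failure can finish.

```python
import filecmp
import shutil
from pathlib import Path
from typing import Iterable


def archive_segment(segment: Path, destinations: Iterable[Path]) -> None:
    """Copy one WAL segment into every destination directory.

    Any exception should be turned into a non-zero exit status by the
    caller, so that Postgres keeps the segment and retries later.
    """
    for dest_dir in destinations:
        target = dest_dir / segment.name
        if target.exists():
            if filecmp.cmp(segment, target, shallow=False):
                continue  # already archived by an earlier attempt
            raise FileExistsError(f"{target} exists with different contents")
        partial = target.with_name(target.name + ".partial")
        shutil.copyfile(segment, partial)
        partial.rename(target)  # rename is atomic within one filesystem
```

A wrapper script would call archive_segment with the %p path that Postgres passes to archive_command, plus one directory per standby.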
Backup

• Periodic full pg_dump -> S3
• Lots of I/O pressure
• Experiments using XFS RAID snapshotting.
  Don’t do it.
Load Balancing
• HAProxy
• ELB for Application Servers - not for
  internal use!
• From the horse’s mouth: scales up
  HAProxy cores with # of unique IPs, NOT
  raw traffic.
Linux Buffer Cache
• Postgres highly dependent on warm OS
  caches
• Crazy variances in query times:
  • 10 ms in Staging
  • 5000 ms in Prod
• Data stampedes
• Warm up time for db = warming caches
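One blunt way to shorten warm-up is to read the hot files once before taking traffic, so the OS cache is populated up front. A minimal Python sketch, assuming you know which files under the data directory matter; choosing them is left to you, and this is not a procedure described in the talk.

```python
from pathlib import Path
from typing import Iterable


def warm_files(paths: Iterable[Path], chunk_size: int = 1 << 20) -> int:
    """Read each file start to finish and return the number of bytes read.

    Reading through the filesystem normally leaves the pages in the OS
    buffer cache; the data itself is thrown away.
    """
    total = 0
    for path in paths:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Keep in mind that this competes for the same I/O as everything else, so it belongs before a replica goes into rotation rather than during peak load.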
I/O

• DB performance is a game of maximizing
  IO, where EC2 is your opponent.
• Guaranteed IOPs (???)
• RAID 0 or RAID 10?
EBS Filesystem Tests
All throughput figures in MB/s. "-" = no value on the slide.

Filesystem                                # of    Seq.    Seq.     Random  Random  R/W Mix:  R/W Mix:
                                          Disks   Reads   Writes   Reads   Writes  Reads     Writes
EXT3, 64K stripe                          3       74.7    102.1    1.3     20.4    21.3      25.1
EXT3, 128K stripe, 2MB readahead buffer   3       -       -        1.6     11.3    -         -
XFS, 64K stripe                           3       20.7    107.2    1.7     40.2    13.6      12.5
XFS, 128K stripe                          3       102.2   106.2    1.5     87.8    41.1      24.6
XFS, 64K stripe                           4       115.8   135.4    2.0     76.4    41.0      24.6
XFS, 128K stripe                          4       104.8   103.1    1.8     70.8    49.3      30.3
XFS, 128K stripe, deadline scheduler      4       105.0   102.8    2.0     70.1    55.1      31.5
Keeping Things Healthy
•   Monitor bloat
•   Vacuum as needed
    •   autovacuum may not be enough
    •   VACUUM FULL may be too much (locks)
•   Vacuum analyze frequently
•   Use autovacuum but tune carefully
•   PgFouine FTW!
    •   Log analysis
    •   Slow queries
    •   Vacuum analysis
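Why autovacuum "may not be enough" on big tables follows from how its trigger is defined: a table is vacuumed once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of tuples). The Python sketch below just evaluates that formula with the stock defaults (50 and 0.2); the table size in the usage note is a made-up example.

```python
def autovacuum_trigger(
    reltuples: float, base_threshold: int = 50, scale_factor: float = 0.2
) -> float:
    """Dead-tuple count above which autovacuum will vacuum the table."""
    return base_threshold + scale_factor * reltuples


def needs_vacuum(
    n_dead_tup: int,
    reltuples: float,
    base_threshold: int = 50,
    scale_factor: float = 0.2,
) -> bool:
    """True if the dead-tuple count has crossed the autovacuum trigger."""
    return n_dead_tup > autovacuum_trigger(reltuples, base_threshold, scale_factor)
```

With the defaults, a 10-million-row table accumulates roughly 2 million dead rows before autovacuum touches it. Lowering the scale factor for large, hot tables is one way to "tune carefully".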
More Performance
• Use stored procedures (and debugger)
• Query optimizer doesn’t always do what
  you expect! (separate slide?)
• Maximize statistics (but beware dynamic
  SQL)
  ALTER TABLE <table> ALTER COLUMN <column> SET
  STATISTICS <number>
Heinous SQL
SELECT
  stories.id,
  (SELECT
     fsa.title
   FROM
     feed_story_attachments fsa
     LEFT OUTER JOIN feed_publication_attachments fpa
       ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
   WHERE
     (fpa.owned=TRUE OR fpa.owned IS NULL) AND
     fsa.story_id=stories.id
   ORDER BY fsa.created_at ASC
   LIMIT 1) as title,
  (SELECT
     f.title
   FROM
     feeds f
     JOIN feed_story_attachments fsa
       ON f.id=fsa.feed_id
     LEFT OUTER JOIN feed_publication_attachments fpa
       ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
   WHERE
     (fpa.owned=TRUE OR fpa.owned IS NULL) AND
     fsa.story_id=stories.id
   ORDER BY fsa.created_at DESC
   LIMIT 1) as "author",
  (SELECT
     fsa.url
   FROM
     feed_story_attachments fsa
     LEFT OUTER JOIN feed_publication_attachments fpa
       ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
   WHERE
     (fpa.owned=TRUE OR fpa.owned IS NULL) AND
     fsa.story_id=stories.id
   ORDER BY fsa.created_at DESC
   LIMIT 1) as url,
  SUBSTRING(stories.summary FROM 1 FOR 200) AS summary,
  stories.sort_date as published_at,
  (SELECT
     f.base_url
   FROM
     feeds f
     JOIN feed_story_attachments fsa
       ON f.id=fsa.feed_id
     LEFT OUTER JOIN feed_publication_attachments fpa
       ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
   WHERE
     (fpa.owned=TRUE OR fpa.owned IS NULL) AND
     fsa.story_id=stories.id
   ORDER BY fsa.created_at DESC
  -- [slide text cut off here]
   WHERE
     (fpa.owned=TRUE OR fpa.owned IS NULL) AND
     fsa.story_id=stories.id
   ORDER BY fsa.created_at DESC
   LIMIT 1) AS author_url,
  (EXISTS (
     SELECT fpa.id
     FROM feed_publication_attachments fpa
       JOIN feed_story_attachments fsa
         ON fsa.feed_id=fpa.feed_id
     WHERE
       stories.id = fsa.story_id AND
       fpa.publication_id=112 AND
       fpa.owned
  )) AS promoted
FROM stories
JOIN blips b
  ON b.story_id = stories.id AND b.location_id=1435491 AND
     b.publisher_id IN (0,115)
WHERE
  b.blip_type_id in (1,3) AND -- comment out to run prior query form
  (
    NOT EXISTS (
      SELECT bf.id
      FROM blip_filters bf
      WHERE
        bf.location_id=1435491 AND
        bf.story_id = stories.id AND
        bf.publisher_id=115
    ) AND EXISTS (
      SELECT
        f.id
      FROM
        feeds f
        JOIN feed_story_attachments fsa
          ON f.id=fsa.feed_id
        LEFT OUTER JOIN feed_publication_attachments fpa
          ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
      WHERE
        (fpa.owned=TRUE OR fpa.owned IS NULL) AND
        fsa.story_id=stories.id
    ) AND (
      NOT EXISTS(
        SELECT psf.id
        FROM publication_story_filters psf
        WHERE
          psf.story_id = stories.id AND
          psf.publication_id=112
      )
  -- [slide text cut off here]
Make Heinous SQL Run Fast!
• Fast = subsecond
• Ideally < 250 ms
• Query planner - feed it stats
• Sometimes rewrite q’s to take advantage of
  GiST indexes (critical for geo)
Costs
[Chart: Monthly $$ by month, Jan through Dec, Reserved vs. Standard]
Lessons Learned
• EC2 is still cheap-ish, but not without careful
  planning!
• Denormalize into something else (Lucene,
  Geo Cache)
• Monitor the crap out of everything
• Send a synthetic transaction ID through stack
• Plan on a few failures a week
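The synthetic transaction idea can be sketched with the standard library alone: generate an ID for a probe request, carry it through each layer, and stamp it on every log line so the probe can be followed end to end. The layer names and log format below are hypothetical; this is a shape to adapt, not the monitoring used at outside.in.

```python
import logging
import uuid
from contextvars import ContextVar
from typing import List

# Holds the ID of the synthetic transaction currently being traced.
current_txn: ContextVar[str] = ContextVar("current_txn", default="-")


class TxnIdFilter(logging.Filter):
    """Stamp every log record with the current synthetic transaction ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.txn_id = current_txn.get()
        return True


class ListHandler(logging.Handler):
    """Collect formatted log lines in memory (stand-in for real log shipping)."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


def make_logger(handler: logging.Handler) -> logging.Logger:
    """Attach the handler so every line it emits starts with the transaction ID."""
    logger = logging.getLogger("stack")
    logger.setLevel(logging.INFO)
    handler.setFormatter(logging.Formatter("%(txn_id)s %(message)s"))
    handler.addFilter(TxnIdFilter())
    logger.addHandler(handler)
    return logger


def run_probe(logger: logging.Logger) -> str:
    """Send one synthetic request through hypothetical API, service and DB layers."""
    txn = f"synthetic-{uuid.uuid4()}"
    token = current_txn.set(txn)
    try:
        logger.info("api: request received")
        logger.info("service: lookup started")
        logger.info("db: query issued")
    finally:
        current_txn.reset(token)
    return txn
```

Searching the collected logs for the returned ID then shows how far the probe got and where it stalled.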
Bookish
•   Hybrid Postgres/MongoDB/Lucene Data Stack
•   Postgres 9.0
•   Mongo for social graph and event-logging
•   UUIDs for shared references
•   Hot Standby
•   Streaming Replication
•   VPC and Dedicated Instances ($$)
•   Experimenting with other Clouds for Production
    Environment
•   Launching late summer - and we’re hiring!
Lessons Learned


• Invite me back in a few months.
Some Thoughts and Conclusions
• PostgreSQL is a GREAT choice if you are
  starting out now, on EC2.
• The Postgres community is awesome.
• Organized governing body: who needs it?
• Let’s see a shrink-wrapped EC2 cloud-
  provider. We’ll be customer #2 :)
Questions?
Thanks!
Andy Parsons
andy@bookish.com
@andyparsons
http://linkedin.com/in/andyparsons

Andy Parsons Pivotal June 2011

  • 1. War Stories: Operational Fun with PostgreSQL in the Cloud Andy Parsons
  • 2. Who am I? • Startup junkie / masochist • Deliver stuff that works in startup time • Old timer on the NYC startup scene • ♥ luxury of choosing tools. And living with them.
  • 3. Who Aren’t I? • DBA • Sys Admin • PostgreSQL Uber Guru
  • 4. This Talk is ... • Pragmatic, Concrete • About My Experiences and Lessons Learned • About 3 recent startups built with PostgreSQL • Going to focus on Postgres, but leak into overall architecture
  • 5. Brief History: me + PostgreSQL • Digital Railroad (Deadpool 2007) • Shrty (acquired by Collecta 2008) • Outside.in (acquired by AOL 2011) • Bookish (stealth)
  • 6. The IM You Don’t Want to Get. Ever.
  • 7. The IM You Don’t Want to Get. Ever. 1:05 am “Site’s down” 1:06 am “U seeing all these alerts?” 1:09 am “What’s it mean- no such device?”
  • 8. The IM You Don’t Want to Get. Ever. 1:05 am “Site’s down” 1:06 am “U seeing all these alerts?” 1:09 am “What’s it mean- no such device?”
  • 9. Fallout • A lot of the system was down for a short time • When it came back up, data was old • New data had to be merged with incoming • But, incoming pipeline never compromised
  • 10. Lessons Learned • Don’t trust expensive hardware and datacenters • Redundant isn’t redundant enough • SH** HAPPENS • Postgres looks cool!
  • 11. Shrty • Social Network Aggregation • Seed capital • 2 developers • First attempt to run Postgres on EC2
  • 12. Story • 3 guys with an idea and a logo • Built in 2 months in RoR and Java • Modest traffic, tested up to 100K users • Investor pitches • “Production” • Sold.
  • 13. Lessons Learned • PostgreSQL + EC2 : it works! • Cheap! • I/O is massively unpredictable • Ephemeral storage is ... ephemeral • No SLA in the Cloud
  • 14. outside.in • Hyperlocal News • Geotag and categorize web pages, blog posts and tweets from hundreds of thousands of sources • Organize data into ~85,000 neighborhoods • Query for news within 1000 ft. of a user • Chose Postgres for PostGIS • Powers local on CNN’s homepage and many other sites • Now part of AOL’s Patch
  • 15. Architecture • [Diagram; box labels: Postgres Master, Postgres Slave (x2), RoR, RoR APIs, Mobile APIs, Public API, Scala Svc (x2), Scala Denorm / Indexing, Scala Q’ing Svc, Scala Text Mining, Solr Master, Solr Slave (x2)]
  • 16. EC2 DB “Hardware” • m2.4xlarge = High-Memory Quadruple Extra Large! • 68.4 GB RAM • High I/O Performance • 8 virtual cores
  • 17. The Cloud Giveth and Taketh • Machines vanish (network, switch, power ...) • Network availability • Multi-tenant machines • SAN location • OI became a large AWS customer, assigned acct. manager and access to EC2 engineers • Email you don’t want to get on a Friday night...
  • 18. Hello, One of your instances in the us-east-1 region is on hardware that requires network related maintenance. Your other instances that are not listed here will not be affected. i-3fcdb156 For the above instance, we recommend migrating to a replacement instance to avoid any downtime. Your replacement instance would not be subject to this maintenance. If you leave your instance running, you will lose network connectivity for up to two hours. The maintenance will occur during a 12-hour window starting at 12:00am PST on Monday, February 15, 2010. After the maintenance is complete, network connectivity will be restored to your instance. As always, we recommend keeping current backups of data stored on your instance. Sincerely, The Amazon EC2 Team
  • 19. Failure is Assured • Load balance with health checks (Varnish) • Use DNS. Private IPs *do* change • Use Puppet (or Chef) • Hardened basic image, apply security patches there • Puppet bootstraps from there • Replace instances before they fail when possible
  • 20. Resource Contention • Everyone needs data, everyone needs it NOW. • Put WAL on a separate disk (log writing bounds write throughput) • Keep an eye on iostat: one slow disk in RAID 0 can ruin your day • Backups, buffer cache filling, vacuuming
  • 21. Connections • Managing max_connections • PgBouncer = basic conn pooler • Session mode - life of connection • Tx mode - life of transaction • Statement mode - life of single statement
  • 22. Containment Problem • Places (points) need to be placed into neighborhoods properly • Neighborhood and municipal boundaries are complex • Neighborhoods overlap towns - need % intersection • Containment projects upward • US shape data is messy
  • 23. Geometry is Slow :( • Simplify shapes - if you can • Avoid complex Geo queries online (ST_CONTAINS, ST_INTERSECTION, ST_CENTROID) • Cache Containment. Geo will never be faster than simple SELECT • Eventually... index containment in Lucene • PostGIS for generating and updating containment cache only (periodic, offline)
  • 24. Hyperlocal at CNN Scale • Strategic investor • Initial API impl was CNN homepage! • Many MM page views • 350 req/s • News = sensitive to caching
  • 25. Replication • Done via WAL Shipping • Warm standby only in Postgres 8.4 • Base (hot) backup, then ship and apply WAL • Replica - sometimes came out of standby mode (manual procedure to remedy) • WAL shipping to multiple slaves: • Make some with RAID for emergency promotion to master • Make one use a single EBS volume and snapshot that.
  • 26. Backup • Periodic full pg_dump -> S3 • Lots of I/O pressure • Experiments using XFS RAID snapshotting. Don’t do it.
  • 27. Load Balancing • HAProxy • ELB for Application Servers - not for internal use! • From the horse’s mouth: scales up HAProxy cores with # of unique IPs, NOT raw traffic.
  • 28. Linux Buffer Cache • Postgres highly dependent on warm OS caches • Crazy variances in query times: • 10 ms in Staging • 5000 ms in Prod • Data stampedes • Warm up time for db = warming caches
  • 29. I/O • DB performance is a game of maximizing IO, where EC2 is your opponent. • Guaranteed IOPs (???) • RAID 0 or RAID 10?
  • 30. EBS Filesystem Tests (all throughput figures in MB/s)
      Filesystem | # of Disks | Seq. Reads | Seq. Writes | Random Reads | Random Writes | R/W Mix: Reads | R/W Mix: Writes
      EXT3, 64K stripe | 3 | 74.7 | 102.1 | 1.3 | 20.4 | 21.3 | 25.1
      EXT3, 128K stripe, 2MB readahead buffer | 3 | only two values survive (1.6, 11.3); their columns are unclear
      XFS, 64K stripe | 3 | 20.7 | 107.2 | 1.7 | 40.2 | 13.6 | 12.5
      XFS, 128K stripe | 3 | 102.2 | 106.2 | 1.5 | 87.8 | 41.1 | 24.6
      XFS, 64K stripe | 4 | 115.8 | 135.4 | 2.0 | 76.4 | 41.0 | 24.6
      XFS, 128K stripe | 4 | 104.8 | 103.1 | 1.8 | 70.8 | 49.3 | 30.3
      XFS, 128K stripe, deadline scheduler | 4 | 105.0 | 102.8 | 2.0 | 70.1 | 55.1 | 31.5
  • 31. EBS Filesystem Tests (same table as slide 30)
  • 32. Keeping Things Healthy • Monitor bloat • Vacuum as needed • autovacuum may not be enough • VACUUM FULL may be too much (locks) • Vacuum analyze frequently • Use autovacuum but tune carefully • PgFouine FTW! • Log analysis • Slow queries • Vacuum analysis
  • 33. More Performance • Use stored procedures (and debugger) • Query optimizer doesn’t always do what you expect! (separate slide?) • Maximize statistics (but beware dynamic SQL) ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS <number>
  • 34. Heinous SQL
      SELECT
        stories.id,
        (SELECT fsa.title
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at ASC
         LIMIT 1) as title,
        (SELECT f.title
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) as "author",
        (SELECT fsa.url
         FROM feed_story_attachments fsa
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) as url,
        SUBSTRING(stories.summary FROM 1 FOR 200) AS summary,
        stories.sort_date as published_at,
        (SELECT f.base_url
         FROM feeds f
         JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
         LEFT OUTER JOIN feed_publication_attachments fpa
           ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
         WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
         ORDER BY fsa.created_at DESC
         LIMIT 1) AS author_url,
        (EXISTS (
          SELECT fpa.id
          FROM feed_publication_attachments fpa
          JOIN feed_story_attachments fsa ON fsa.feed_id=fpa.feed_id
          WHERE stories.id = fsa.story_id AND fpa.publication_id=112 AND fpa.owned
        )) AS promoted
      FROM stories
      JOIN blips b
        ON b.story_id = stories.id AND b.location_id=1435491 AND b.publisher_id IN (0,115)
      WHERE
        b.blip_type_id in (1,3) AND -- comment out to run prior query form
        (
          NOT EXISTS (
            SELECT bf.id
            FROM blip_filters bf
            WHERE bf.location_id=1435491 AND bf.story_id = stories.id AND bf.publisher_id=115
          ) AND EXISTS (
            SELECT f.id
            FROM feeds f
            JOIN feed_story_attachments fsa ON f.id=fsa.feed_id
            LEFT OUTER JOIN feed_publication_attachments fpa
              ON fsa.feed_id=fpa.feed_id AND fpa.publication_id=112
            WHERE (fpa.owned=TRUE OR fpa.owned IS NULL) AND fsa.story_id=stories.id
          ) AND (
            NOT EXISTS(
              SELECT psf.id
              FROM publication_story_filters psf
              WHERE psf.story_id = stories.id AND psf.publication_id=112
            )
  • 35. [Zoomed-in excerpt of the slide 34 query, running from the fsa.url subselect through the blip_filters and feeds EXISTS clauses; the text is cut off mid-statement]
  • 36. Make Heinous SQL Run Fast! • Fast = subsecond • Ideally < 250 ms • Query planner - feed it stats • Sometimes rewrite q’s to take advantage of GiST indexes (critical for geo)
  • 37. Costs • [Chart: monthly $$ by month, Jan through Dec; series: Reserved, Standard]
  • 38. Lessons Learned • EC2 is still cheap-ish, but not without careful planning! • Denormalize into something else (Lucene, Geo Cache) • Monitor the crap out of everything • Send a synthetic transaction ID through stack • Plan on a few failures a week
  • 39. Hybrid Postgres/MongoDB/Lucene Data Stack • Postgres 9.0 • Mongo for social graph and event-logging • UUIDs for shared references • Hot Standby • Streaming Replication • VPC and Dedicated Instances ($$) • Experimenting with other Clouds for Production Environment • Launching late summer - and we’re hiring!
  • 40. Lessons Learned • Invite me back in a few months.
  • 41. Some Thoughts and Conclusions • PostgreSQL is a GREAT choice if you are starting out now, on EC2. • The Postgres community is awesome. • Organized governing body: who needs it? • Let’s see a shrink-wrapped EC2 cloud provider. We’ll be customer #2 :)
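
Slide 23's containment cache can be sketched in a few lines: run the expensive geometry offline, store which neighborhoods contain each place, and answer online requests with a plain lookup. This is a minimal sketch under stated assumptions, not the outside.in implementation. The place and neighborhood names and the square polygons are hypothetical, and a pure-Python ray-casting test stands in for the PostGIS containment query so the example runs on its own.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line (y1 != y2 here).
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_containment_cache(
    places: Dict[str, Point], neighborhoods: Dict[str, Polygon]
) -> Dict[str, List[str]]:
    """Offline step: the slow geometry runs once per place, not once per request."""
    return {
        place_id: [
            name for name, polygon in neighborhoods.items()
            if point_in_polygon(point, polygon)
        ]
        for place_id, point in places.items()
    }


# Hypothetical data: two overlapping square neighborhoods and three places.
NEIGHBORHOODS = {
    "hood_a": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "hood_b": [(3.0, 0.0), (7.0, 0.0), (7.0, 4.0), (3.0, 4.0)],
}
PLACES = {"cafe": (1.0, 1.0), "bakery": (3.5, 2.0), "pier": (10.0, 10.0)}

cache = build_containment_cache(PLACES, NEIGHBORHOODS)

# Online step: a dictionary lookup (in a database, a simple SELECT on the cache table).
print(cache["bakery"])
```

Storing a list per place reflects slide 22's point that neighborhoods overlap, so a place can belong to more than one.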

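The three PgBouncer pooling modes on slide 21 differ only in when a server connection goes back to the pool. The toy model below is not PgBouncer code: the `ClientTrace` type, the function names, and the tick values are made up for illustration. It counts the peak number of server connections that the same client activity needs under each mode.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # [start, end) in arbitrary time ticks


@dataclass
class ClientTrace:
    session: Interval             # connect .. disconnect
    transactions: List[Interval]  # BEGIN .. COMMIT
    statements: List[Interval]    # individual statements


def held_intervals(trace: ClientTrace, mode: str) -> List[Interval]:
    """Intervals during which this client pins a server connection."""
    if mode == "session":
        return [trace.session]       # held for the life of the connection
    if mode == "transaction":
        return trace.transactions    # held for the life of each transaction
    if mode == "statement":
        # Simplification: statements are treated as independent autocommit
        # queries (PgBouncer disallows multi-statement transactions here).
        return trace.statements
    raise ValueError(f"unknown pooling mode: {mode}")


def peak_server_connections(traces: List[ClientTrace], mode: str) -> int:
    """Sweep over acquire/release events and track the maximum overlap."""
    events = []
    for trace in traces:
        for start, end in held_intervals(trace, mode):
            events.append((start, 1))
            events.append((end, -1))
    events.sort()  # at equal ticks, releases (-1) sort before acquires (+1)
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Hypothetical workload: three clients stay connected but are mostly idle.
TRACES = [
    ClientTrace((0, 100), [(0, 10)], [(0, 2), (8, 10)]),
    ClientTrace((0, 100), [(5, 15)], [(5, 7), (13, 15)]),
    ClientTrace((0, 100), [(20, 30)], [(20, 22), (28, 30)]),
]

for mode in ("session", "transaction", "statement"):
    print(mode, peak_server_connections(TRACES, mode))
```

A client that opens a transaction and never closes it pins its server connection in transaction mode just as it would in session mode, which is one way pooling can fail to help.
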
Editor's Notes

  2. Seen Josh Berkus’ 7 Habits of Highly Ineffective Presenters
  5. DRR: photography. Shrty: social network aggregation, innovative at the time. OI: hyperlocal news. Obikosh: stealth.
  6. Going to start with a story. 5 years ago.
  7. But especially on a Friday night. Failure of primary and secondary backup devices on site. Offsite backups were old.
  8. Bad went to worse and worse and worse. Painful recovery process. This is NOT what I mean by “operational fun”. Besides friendships with the engineers, learned a few things.
  9. Digital Railroad. Used by news photographers worldwide. FTP service stayed up: separation of concerns. Happened to be built on Postgres by one of our devs.
  10. And it happens at 1 am on Friday night.
  12. Production wasn’t really production.
  13. No SLA: nature of EC2 scale. Often we would know of failures long before AWS. Support can really only tell you that indeed, you have experienced a failure.
  17. 150 instances.
  21. Each connection has its own working set, so be careful. We got up to 500. Could not use PgBouncer because clients never release conns.
  23. Engineer on team came up with containment cache idea.
  25. Explain WAL. Hot standby in 9.0!
  27. Elastic load balancers. No good for internal use.
  28. Tell ironhide story. Service for iPhone app and a version of API.
  29. RAID 10 will tolerate volume loss. We did not do IO tests.
  30. fio used for testing. Guaranteed IOPS?
  31. Used mdadm to build software RAID 0.
  32. Autovacuum can kick in at inopportune moments (data load, data fixup).
  33. Crank up stats for smarter query plans, but it comes at a price: slower VACUUM ANALYZE, and more query planning time with dynamic SQL.
  36. Query planner is really good. Bruce Momjian said they want the query optimizer to do everything for you. If you have a problem they will get you a patch.
  37. 8 months to realize savings from reservations, given typical flux of reserved and non-reserved instances.
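
The note on slide 37 (about eight months to realize savings from reservations) is a break-even calculation: an upfront fee is paid back by a lower hourly rate, and more slowly when the reserved instance is not running all the time. The sketch below shows the arithmetic only. The function name, the `utilization` parameter, and every price are hypothetical, not AWS rates, and it does not attempt to reproduce the eight-month figure.

```python
import math
from typing import Optional


def break_even_months(
    upfront_fee: float,
    on_demand_hourly: float,
    reserved_hourly: float,
    utilization: float = 1.0,       # fraction of each month the instance actually runs
    hours_per_month: float = 720.0,
) -> Optional[int]:
    """Whole months until the upfront fee is recovered, or None if it never is."""
    monthly_saving = (on_demand_hourly - reserved_hourly) * hours_per_month * utilization
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront_fee / monthly_saving)


# Hypothetical prices: $1200 upfront, $1.00/h on demand, $0.50/h reserved.
print(break_even_months(1200.0, 1.00, 0.50))                   # instance always on
print(break_even_months(1200.0, 1.00, 0.50, utilization=0.5))  # instance on half the time
```

Halving utilization roughly doubles the payback period, which is one reading of the note's "flux" between reserved and non-reserved usage.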