AWS Customer Presentation - HotPads

Matt Corgan, Co-Founder and Director of Technology, HotPads.com talks at AWS Start-Up Event in Washington DC about their use of AWS.

Transcript

  • 1. HotPads.com on AWS: Matthew Corgan, President, May 27, 2009
  • 2. What is HotPads?
    • Real Estate search engine
      • Launched in May 2005 in Washington, DC
        • Used The Planet for hosting until December 2008
      • 9 employees, 6 engineers
      • 800,000 visits/month
      • 4.5 million page-views/month
      • 3.5 million real estate listings updated daily
      • Java and MySQL
  • 3. AWS costs in April
    • EC2 instances : $7,400
    • S3 : $1,500
    • EBS : $500
    • CloudFront : $460
    • EIPs : $8
    • RightScale : $500
      • 3rd-party management console
    • SQS : in development
    • Reserved instances : still evaluating
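The April line items add up to $10,368 for the month. Excluding the RightScale fee, AWS services alone come to $9,868. The minimal Java tally below checks this. The class and method names are illustrative only. SQS and reserved instances are left out because neither had a cost yet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Totals the April cost line items listed on the slide (whole US dollars). */
public class AprilAwsCosts {

    /** Line items in slide order; SQS and reserved instances had no cost yet. */
    static Map<String, Integer> aprilLineItems() {
        Map<String, Integer> items = new LinkedHashMap<>();
        items.put("EC2 instances", 7_400);
        items.put("S3", 1_500);
        items.put("EBS", 500);
        items.put("CloudFront", 460);
        items.put("EIPs", 8);
        items.put("RightScale", 500); // third-party console, billed separately from AWS
        return items;
    }

    /** Sums every line item. */
    static int total(Map<String, Integer> items) {
        int sum = 0;
        for (int amount : items.values()) {
            sum += amount;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> items = aprilLineItems();
        int all = total(items);
        int awsOnly = all - items.get("RightScale");
        System.out.println("Total incl. RightScale: $" + all);     // $10368
        System.out.println("AWS services only:      $" + awsOnly); // $9868
    }
}
```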
  • 4. Site components S HotPads.com Load balancer MapTile Job Messaging Databases L XL L Public S3 CF VA TX CA International Web L MEM EBS L HotPads.com S3 L CF CF CF CF CF CF EBS L EBS L EBS L EBS L EBS XL EBS MEM L Indexing
  • 5. S3 – better for larger objects
    • S3 latency > 10 ms, sometimes > 100 ms
    • Memcached latency < 1 ms
    • $0.15 per GB-month storage
    • $1 per 1mm GETs
    • $1 per 100k PUTs
    • Ex: 67 KB object (600px image)
      • 1 PUT ~= 1 month of storage ~= 1 download (about $0.00001 each)
    • Ex: 6.7 KB object (15px thumbnail)
      • 1 GET ~= 1 month of storage ~= 1 download (about $0.000001 each)
      • Careful! A PUT costs 10x the monthly storage and transfer costs
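A few lines of arithmetic can check the per-object comparisons above. This Java sketch makes several assumptions:

- Decimal units (1 GB = 1,000,000 KB).
- One month of storage and one download per object.
- A transfer price of $0.15/GB, the download rate used on the April usage slide.

The class name is hypothetical.

```java
/** Compares per-object S3 charges using the 2009 list prices quoted on the slide. */
public class S3ObjectCost {
    static final double STORAGE_PER_GB_MONTH = 0.15;  // $ per GB-month
    static final double TRANSFER_PER_GB = 0.15;       // $ per GB downloaded (assumed from the April usage slide)
    static final double GET_PRICE = 1.0 / 1_000_000;  // $1 per million GETs
    static final double PUT_PRICE = 1.0 / 100_000;    // $1 per 100k PUTs
    static final double KB_PER_GB = 1_000_000;        // decimal units, assumed for simplicity

    /** Cost of storing one object of the given size for one month. */
    static double storageCost(double sizeKb) {
        return sizeKb / KB_PER_GB * STORAGE_PER_GB_MONTH;
    }

    /** Cost of downloading the object once. */
    static double transferCost(double sizeKb) {
        return sizeKb / KB_PER_GB * TRANSFER_PER_GB;
    }

    static void report(String label, double sizeKb) {
        System.out.printf("%s: PUT=%.7f GET=%.7f storage/month=%.7f transfer=%.7f%n",
                label, PUT_PRICE, GET_PRICE, storageCost(sizeKb), transferCost(sizeKb));
    }

    public static void main(String[] args) {
        report("67 KB photo", 67);       // PUT, storage and transfer are each about $0.00001
        report("6.7 KB thumbnail", 6.7); // GET, storage and transfer are each about $0.000001; PUT is ~10x
    }
}
```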
  • 6. S3 – April usage
    • Photos
      • 330 GB downloaded @ $.15/GB = $49
      • 55mm GETs @ $1/mm = $55
      • 42mm PUTs @ $1/100k = $420!
    • Database backups
      • 4.4 TB stored @ $.15/GB = $660
        • Probably too many copies stored
    • Maptiles
      • ~$100 for downloads and GETs
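Recomputing the photo and backup line items from the unit prices quoted on the previous slide shows where the money went. Photo PUTs cost more than photo downloads and GETs combined. This minimal Java sketch assumes that 4.4 TB means 4,400 GB, which reproduces the $660 figure. The class name is illustrative.

```java
/** Recomputes April S3 charges from usage volumes and the slide's unit prices. */
public class S3AprilUsage {
    static final double DOWNLOAD_PER_GB = 0.15;      // $ per GB transferred out
    static final double STORAGE_PER_GB_MONTH = 0.15; // $ per GB-month
    static final double PER_MILLION_GETS = 1.0;      // $1 per million GETs
    static final double PER_100K_PUTS = 1.0;         // $1 per 100k PUTs

    static double photoDownloads(double gb) { return gb * DOWNLOAD_PER_GB; }

    static double photoGets(double gets) { return gets / 1_000_000 * PER_MILLION_GETS; }

    static double photoPuts(double puts) { return puts / 100_000 * PER_100K_PUTS; }

    static double backupStorage(double gb) { return gb * STORAGE_PER_GB_MONTH; }

    public static void main(String[] args) {
        System.out.printf("Photo downloads: $%.2f%n", photoDownloads(330));   // $49.50
        System.out.printf("Photo GETs:      $%.2f%n", photoGets(55_000_000)); // $55.00
        System.out.printf("Photo PUTs:      $%.2f%n", photoPuts(42_000_000)); // $420.00
        System.out.printf("DB backups:      $%.2f%n", backupStorage(4_400));  // $660.00
    }
}
```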
  • 7. CloudFront
    • HotPads uses it for:
      • Static files : great
      • Map tiles : ok
      • Photos : toss-up, but we use it anyway
        • Many photos are only viewed once
        • A CloudFront miss has to go back to S3, so a miss may take longer than going to S3 directly
        • Pay for 2 GETs on a miss
        • Maybe pay for 2x the transfer cost (not sure)
        • But, makes frequently viewed listings faster
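The "2 GETs on a miss" trade-off can be framed as billable requests per view, as a function of the edge-cache miss rate. This sketch has deliberate limits:

- It counts only request charges.
- It ignores transfer, which the slide itself is unsure about.
- It treats the per-million request price as a hypothetical input rather than a quoted rate.

```java
/** Request charges per photo view when photos are served through CloudFront from an S3 origin. */
public class CloudFrontMissCost {

    /**
     * Requests billed per view under the slide's rule of thumb: each view is one CloudFront
     * request, and a cache miss adds one S3 GET to fetch the origin copy.
     */
    static double requestsPerView(double missRate) {
        if (missRate < 0.0 || missRate > 1.0) {
            throw new IllegalArgumentException("missRate must be between 0 and 1");
        }
        return 1.0 + missRate;
    }

    /** Extra request charges compared with serving the same views straight from S3 (one GET each). */
    static double extraRequestCost(long views, double missRate, double dollarsPerMillion) {
        return (requestsPerView(missRate) - 1.0) * views / 1_000_000.0 * dollarsPerMillion;
    }

    public static void main(String[] args) {
        // Photos viewed only once: nearly every view misses, so about 2 requests per view.
        System.out.println(requestsPerView(1.0));
        // A frequently viewed listing: most views are served from the edge cache.
        System.out.println(requestsPerView(0.1));
        // Hypothetical volume: 10 million views, 70% miss rate, $1 per million requests.
        System.out.printf("Extra request cost: $%.2f%n", extraRequestCost(10_000_000L, 0.7, 1.0));
    }
}
```

In this model, CloudFront pays off on request charges only when faster repeat views outweigh the extra miss-rate fraction of GETs.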
  • 8. EC2 breakdown
    • EC2 (currently all “memory” instance types)
      • Load balancers, HAProxy, 2 small = $150
      • Web servers, Tomcat, 3-5 large = $1,200
        • Scale out from 11 am to midnight
      • Job servers, Tomcat, 5 large = ~$1,500
      • Index servers, Tomcat, 1 X-large, 1 large = ~$900
      • MySQL masters, 1 X-large, 2 large = ~$1,200
      • MySQL slaves, 1 X-large, 2 large = ~$1,200
      • Messaging server, ActiveMQ, 1 large = ~$300
      • Map tile creation servers, Tilecache, 1 large = ~$300
      • Development/testing/migration servers = ~$600
    • 8 GB Memcached on the permanent web and job servers
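The per-role figures add up to about $7,350 a month. That is in line with the $7,400 EC2 line on the April cost slide, given that several entries are approximate. Below is a small Java tally, with role labels paraphrased from the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sums the approximate monthly EC2 cost per server role listed on the slide. */
public class Ec2Breakdown {

    /** Approximate monthly cost per role, in US dollars, in slide order. */
    static Map<String, Integer> monthlyCosts() {
        Map<String, Integer> roles = new LinkedHashMap<>();
        roles.put("Load balancers (HAProxy, 2 small)", 150);
        roles.put("Web servers (Tomcat, 3-5 large)", 1_200);
        roles.put("Job servers (Tomcat, 5 large)", 1_500);
        roles.put("Index servers (1 X-large, 1 large)", 900);
        roles.put("MySQL masters (1 X-large, 2 large)", 1_200);
        roles.put("MySQL slaves (1 X-large, 2 large)", 1_200);
        roles.put("Messaging (ActiveMQ, 1 large)", 300);
        roles.put("Map tiles (Tilecache, 1 large)", 300);
        roles.put("Development/testing/migration", 600);
        return roles;
    }

    /** Adds up every role's monthly cost. */
    static int total(Map<String, Integer> roles) {
        return roles.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Integer> roles = monthlyCosts();
        roles.forEach((role, cost) -> System.out.printf("%-40s $%,d%n", role, cost));
        System.out.printf("Sum of roles: $%,d (April bill EC2 line: $7,400)%n", total(roles));
    }
}
```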
  • 9. EBS – used for all databases
    • Cons
      • Black box: hard to determine the best usage
      • Adds cost over local drives (but not much)
      • Less bandwidth (not usually important for databases)
    • Pros
      • Lower average latency
      • Especially fast random writes
      • Snapshot backups need only a very short write lock and store only diffs
      • Ability to clone and hibernate databases
      • Redundancy
        • We lost the local disks on a live master database twice
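The snapshot advantage comes from holding a write lock only while the snapshot is being initiated. An EBS snapshot captures the volume as of the request and then completes in the background. The sketch below makes these assumptions:

- MySQL's FLUSH TABLES WITH READ LOCK and UNLOCK TABLES run on a single connection.
- SqlSession and SnapshotClient are hypothetical interfaces standing in for a JDBC connection and an EC2 API client. They are not HotPads' actual code.

A production version would likely also freeze the filesystem and record the replication position. Those steps are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: consistent EBS snapshot of a MySQL data volume with a brief global read lock. */
public class SnapshotBackup {

    /** Hypothetical: runs SQL on one dedicated MySQL connection (the lock belongs to this session). */
    interface SqlSession {
        void execute(String sql) throws Exception;
    }

    /** Hypothetical: requests an EBS snapshot and returns its ID once the request is accepted. */
    interface SnapshotClient {
        String createSnapshot(String volumeId) throws Exception;
    }

    /** Holds the read lock only while the snapshot is requested, then releases it. */
    static String backup(SqlSession db, SnapshotClient ec2, String volumeId) throws Exception {
        db.execute("FLUSH TABLES WITH READ LOCK"); // block writes and flush open tables
        try {
            return ec2.createSnapshot(volumeId);   // point in time is fixed when the request is made
        } finally {
            db.execute("UNLOCK TABLES");           // always release, even if the snapshot call fails
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> calls = new ArrayList<>();
        // Fakes that only record call order; real code would wrap JDBC and an EC2 client.
        SqlSession fakeDb = sql -> calls.add("SQL: " + sql);
        SnapshotClient fakeEc2 = volumeId -> {
            calls.add("CreateSnapshot " + volumeId);
            return "snap-example";
        };
        String snapshotId = backup(fakeDb, fakeEc2, "vol-example");
        calls.forEach(System.out::println);
        System.out.println("Started " + snapshotId);
    }
}
```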
  • 10. Database utilization
    • I/O bound
    • RAIDing multiple volumes didn’t help much
    • Testing multiple drives with 1 schema per drive
  • 11. SimpleDB
    • Pros
      • Stand-alone DB servers are often drastically underutilized and a pain to administer, back up, and restore after failure
      • SimpleDB is schema-less
        • MySQL schema changes are a major problem
    • Cons
      • Binary values have to be encoded by the client and can’t be interpreted by a generic GUI
      • Tied to EC2 for latency reasons
      • Eventual consistency when accessed from different EC2 nodes
      • "Column" (attribute) names may inflate storage size (unconfirmed)
      • Must partition a table (domain) before it hits 10 GB
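SimpleDB stores every value as a string and compares values lexicographically, so the client has to encode everything else itself. The minimal Java sketch below shows two common workarounds: offset zero-padding for integers and Base64 for binary values. It also shows hash-based partitioning across domains to stay under the 10 GB limit. The specific padding scheme, the Base64 choice, the domain prefix, and the shard count are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Client-side encoding helpers for a store that only keeps and compares strings. */
public class SimpleDbEncoding {
    // Shifting by 2^31 maps every int to a non-negative long, so padded strings sort numerically.
    private static final long OFFSET = 1L << 31;
    private static final int WIDTH = 10; // digits in 2^32 - 1 = 4294967295

    /** Offset and zero-pad an int so lexicographic order matches numeric order. */
    static String encodeInt(int value) {
        return String.format("%0" + WIDTH + "d", (long) value + OFFSET);
    }

    static int decodeInt(String encoded) {
        return (int) (Long.parseLong(encoded) - OFFSET);
    }

    /** Binary data must become text before it can be stored as a value. */
    static String encodeBytes(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decodeBytes(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    /** Hypothetical sharding: spread items over a fixed set of domains by item-name hash. */
    static String domainFor(String itemName, String prefix, int shards) {
        return prefix + "_" + Math.floorMod(itemName.hashCode(), shards);
    }

    public static void main(String[] args) {
        System.out.println(encodeInt(-5)); // 2147483643
        System.out.println(encodeInt(3));  // 2147483651
        System.out.println(encodeBytes("thumb".getBytes(StandardCharsets.UTF_8)));
        System.out.println(domainFor("listing-123", "listings", 4));
    }
}
```

Changing the shard count later remaps most items to new domains. The count would therefore need to be chosen with growth in mind.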
  • 12. Reserved Instances
    • Pros
      • Get 1 year for the cost of 6 months
      • Guaranteed to get an instance
        • Yes, we have been denied
    • Cons
      • Tied to a particular instance type
        • Your needs may change
        • Amazon may introduce more appropriate instance types
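Whether a reservation pays off depends on how many hours the instance actually runs. The break-even point solves upfront + reservedHourly * h = onDemandHourly * h for h. The prices in main are hypothetical placeholders, not quotes. Under the slide's "1 year for the cost of 6 months" rule of thumb, the break-even would land at about half the year.

```java
/** Break-even point for a one-year reserved instance versus on-demand pricing. */
public class ReservedBreakEven {
    static final double HOURS_PER_YEAR = 24 * 365; // 8,760

    /** Hours of use per year after which the reservation becomes cheaper than on-demand. */
    static double breakEvenHours(double upfront, double onDemandHourly, double reservedHourly) {
        if (onDemandHourly <= reservedHourly) {
            throw new IllegalArgumentException("reserved hourly rate must be below the on-demand rate");
        }
        // upfront + reservedHourly * h == onDemandHourly * h  =>  h = upfront / (onDemand - reserved)
        return upfront / (onDemandHourly - reservedHourly);
    }

    /** Break-even expressed as a fraction of the year. */
    static double breakEvenUtilization(double upfront, double onDemandHourly, double reservedHourly) {
        return breakEvenHours(upfront, onDemandHourly, reservedHourly) / HOURS_PER_YEAR;
    }

    public static void main(String[] args) {
        double upfront = 325.0, onDemand = 0.10, reserved = 0.03; // hypothetical prices
        System.out.printf("Break-even after %.0f hours (%.0f%% of the year)%n",
                breakEvenHours(upfront, onDemand, reserved),
                100 * breakEvenUtilization(upfront, onDemand, reserved));
    }
}
```

An instance that only runs part of the day, such as a scaled-out web server, may fall below the break-even point.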
  • 13. How does AppEngine compare?
    • Benefits?
      • Low cost, no idle instances sitting around
      • No Linux administration
    • Why don’t we use it?
      • Java deployments limited to 1,000 files
      • Cannot spawn threads
        • Several areas of HotPads use multiple threads for a 10x improvement in request latency
      • Request limit of 30 seconds: no long jobs
      • Our indexes need a big, long-lived heap
    • Amazon lets you innovate more, and that’s our goal.
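The multi-threading point is about fanning independent lookups out in parallel. A request then waits roughly as long as its slowest lookup instead of the sum of all of them. The minimal Java sketch below uses a fixed thread pool. The simulated 50 ms lookup and the listing IDs are hypothetical stand-ins. The real speedup depends on how independent the work is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Runs independent lookups on a thread pool and returns results in input order. */
public class ParallelFanOut {

    static <K, V> List<V> fetchAll(List<K> keys, Function<K, V> lookup, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<V>> futures = new ArrayList<>();
            for (K key : keys) {
                futures.add(pool.submit(() -> lookup.apply(key))); // start every lookup at once
            }
            List<V> results = new ArrayList<>();
            for (Future<V> future : futures) {
                results.add(future.get()); // wait in submission order, preserving input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical lookup taking about 50 ms, e.g. one index or cache query.
        Function<Integer, String> slowLookup = id -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "listing-" + id;
        };
        List<Integer> ids = List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        long start = System.nanoTime();
        List<String> results = fetchAll(ids, slowLookup, ids.size());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Sequential calls would take about 500 ms; in parallel this is typically much closer to 50 ms.
        System.out.println(results.size() + " results in " + elapsedMs + " ms");
    }
}
```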