AWS Customer Presentation - HotPads


Matt Corgan, co-founder and Director of Technology at HotPads, talks about the company's use of AWS at the AWS Start-Up Event in Washington, DC.

  1. HotPads on AWS
     Matthew Corgan, President
     May 27, 2009
  2. What is HotPads?
     • Real estate search engine
     • Launched in May 2005 in Washington, DC
       • Used The Planet for hosting until December 2008
     • 9 employees, 6 engineers
     • 800,000 visits/month
     • 4.5 million page views/month
     • 3.5 million real estate listings updated daily
     • Java and MySQL
  3. AWS costs in April
     • EC2 instances: $7,400
     • S3: $1,500
     • EBS: $500
     • CloudFront: $460
     • Elastic IPs (EIPs): $8
     • RightScale: $500
       • 3rd-party management console
     • SQS: in development
     • Reserved instances: still evaluating
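As a quick check on the April figures, the line items can be totalled in a few lines of Python. The dictionaries only restate the slide's numbers; separating the RightScale fee from the AWS charges (since RightScale is a third-party service) is the only thing this sketch adds.

```python
# April line items from the slide, in US dollars.
aws_costs = {
    "EC2 instances": 7_400,
    "S3": 1_500,
    "EBS": 500,
    "CloudFront": 460,
    "Elastic IPs": 8,
}
# RightScale is a third-party management console, billed separately from AWS.
third_party_costs = {"RightScale": 500}

aws_total = sum(aws_costs.values())
grand_total = aws_total + sum(third_party_costs.values())

# Share of the AWS bill taken by each service, largest first.
for service, cost in sorted(aws_costs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{service:15s} ${cost:>6,}  {cost / aws_total:6.1%}")

print(f"AWS total:   ${aws_total:,}")
print(f"Grand total: ${grand_total:,}")
```

EC2 accounts for about 75% of the AWS charges, which is why the EC2 breakdown later in the deck matters most.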
  4. Site components
     [Architecture diagram: load balancer, web, job, indexing, messaging, map tile, and database servers (S/L/XL instances, some with memcached, databases on EBS); public S3 and CloudFront (CF) edge locations in VA, TX, CA, and international]
  5. S3 – better for larger objects
     • S3 latency > 10 ms, sometimes > 100 ms
     • Memcached latency is below 1 ms
     • $0.15 per GB-month of storage
     • $1 per 1mm GETs
     • $1 per 100k PUTs
     • Ex: 67 KB object (600px image)
       • PUT cost ≈ one month of storage ≈ download cost
     • Ex: 6.7 KB object (15px thumbnail)
       • GET cost ≈ one month of storage ≈ download cost
       • Careful! The PUT cost is 10x the storage and transfer costs
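The "PUT ≈ storage ≈ download" comparisons follow directly from the prices on the slide. This sketch assumes $0.15/GB for download transfer (the rate used on the next slide), treats 1 GB as 1,000,000 KB (decimal units, which is what makes the slide's round numbers line up), and counts one month of storage per object.

```python
# Prices as listed on the slide (2009 S3 pricing as presented).
STORAGE_PER_GB_MONTH = 0.15   # $ per GB-month
TRANSFER_PER_GB = 0.15        # $ per GB downloaded (assumed, as on the next slide)
GET_PRICE = 1.0 / 1_000_000   # $1 per million GETs
PUT_PRICE = 1.0 / 100_000     # $1 per 100k PUTs

KB_PER_GB = 1_000_000         # assumption: decimal units


def per_object_costs(size_kb: float) -> dict:
    """Cost in dollars of each operation on a single object of size_kb."""
    size_gb = size_kb / KB_PER_GB
    return {
        "put": PUT_PRICE,
        "get": GET_PRICE,
        "storage_1_month": size_gb * STORAGE_PER_GB_MONTH,
        "download": size_gb * TRANSFER_PER_GB,
    }


for label, size_kb in [("600px image", 67), ("15px thumbnail", 6.7)]:
    costs = per_object_costs(size_kb)
    ratio = costs["put"] / costs["storage_1_month"]
    print(f"{label} ({size_kb} KB):",
          ", ".join(f"{k}=${v:.7f}" for k, v in costs.items()),
          f"| PUT / storage = {ratio:.1f}x")
```

For the 67 KB image every figure is about $0.00001. For the 6.7 KB thumbnail, storage, download, and GET drop to about $0.000001 each, but the PUT stays at $0.00001, which is the 10x gap the slide warns about.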
  6. S3 – April usage
     • Photos
       • 330 GB downloaded @ $0.15/GB = $49.50
       • 55mm GETs @ $1/mm = $55
       • 42mm PUTs @ $1/100k = $420!
     • Database backups
       • 4.4 TB stored @ $0.15/GB-month = $660
         • Probably too many copies stored
     • Map tiles
       • ~$100 for downloads and GETs
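The April photo and backup figures can be reproduced from the previous slide's rates. The only assumption added here is that 4.4 TB means 4,400 GB (decimal units).

```python
# April S3 usage from the slide, priced with the previous slide's rates.
TRANSFER_PER_GB = 0.15        # $ per GB downloaded
STORAGE_PER_GB_MONTH = 0.15   # $ per GB-month stored
GET_PER_MILLION = 1.00        # $1 per 1mm GETs
PUT_PER_100K = 1.00           # $1 per 100k PUTs

april = {
    "photo downloads (330 GB)": 330 * TRANSFER_PER_GB,
    "photo GETs (55mm)": 55_000_000 / 1_000_000 * GET_PER_MILLION,
    "photo PUTs (42mm)": 42_000_000 / 100_000 * PUT_PER_100K,
    "backup storage (4.4 TB)": 4_400 * STORAGE_PER_GB_MONTH,  # assumes 1 TB = 1,000 GB
}

for item, dollars in april.items():
    print(f"{item:28s} ${dollars:8,.2f}")

# PUTs dominate the photo bill even though there are fewer PUTs than GETs.
put_to_get_cost = april["photo PUTs (42mm)"] / april["photo GETs (55mm)"]
print(f"PUT spend is {put_to_get_cost:.1f}x GET spend")
```

Even with 13mm fewer requests, photo PUTs cost roughly 7.6x as much as photo GETs, because a PUT is priced 10x higher than a GET.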
  7. CloudFront
     • HotPads uses it for:
       • Static files: great
       • Map tiles: OK
       • Photos: toss-up, but we use it anyway
         • Many photos are only viewed once
         • A CloudFront miss has to go back to S3, so a cache miss may take longer than going to S3 directly
         • Pay for 2 GETs on a miss
         • Maybe pay for 2x the transfer cost (not sure)
         • But it makes frequently viewed listings faster
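The photo trade-off can be sketched as two small models: billable requests for a photo viewed n times, and average latency as a function of cache hit ratio. Both are simplifications, not measurements: the sketch assumes that after an edge location's first miss the photo stays cached for all later views, it counts requests only (the slide itself is unsure how transfer is billed on a miss), and it treats a direct S3 fetch as costing the same S3 latency from anywhere. The 20 ms and 100 ms figures in the example are hypothetical.

```python
def billable_requests(views: int, via_cloudfront: bool, edge_locations: int = 1) -> dict:
    """Count billable requests for one photo viewed `views` times.

    Simplifying assumption: each edge location that serves the photo misses
    once, fetches it from S3, and then serves every later view from cache.
    """
    if views <= 0:
        return {"s3_gets": 0, "cloudfront_requests": 0}
    if not via_cloudfront:
        return {"s3_gets": views, "cloudfront_requests": 0}
    misses = min(views, edge_locations)
    return {"s3_gets": misses, "cloudfront_requests": views}


def expected_latency_ms(hit_ratio: float, edge_ms: float, s3_ms: float) -> float:
    """Average latency through CloudFront for a given cache hit ratio.

    A hit is served from the edge; a miss pays the edge hop plus the S3 fetch.
    """
    return hit_ratio * edge_ms + (1 - hit_ratio) * (edge_ms + s3_ms)


if __name__ == "__main__":
    # A photo viewed once: 2 billable requests via CloudFront vs. 1 direct.
    print(billable_requests(1, via_cloudfront=True))
    print(billable_requests(1, via_cloudfront=False))
    # Hypothetical latencies: 20 ms edge hop, 100 ms S3 fetch.
    for h in (0.0, 0.5, 0.9):
        print(f"hit ratio {h:.0%}: {expected_latency_ms(h, 20, 100):.0f} ms vs. 100 ms direct")
```

At a 0% hit ratio (photos viewed only once) CloudFront is strictly slower and doubles the request count; the benefit only appears for photos that are viewed often enough to be served from cache.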
  8. EC2 breakdown
     • EC2 (currently all "memory" instance types)
       • Load balancers, HAProxy, 2 small = $150
       • Web servers, Tomcat, 3-5 large = $1,200
         • Scale out 11am to midnight
       • Job servers, Tomcat, 5 large = ~$1,500
       • Index servers, Tomcat, 1 X-large, 1 large = ~$900
       • MySQL masters, 1 X-large, 2 large = ~$1,200
       • MySQL slaves, 1 X-large, 2 large = ~$1,200
       • Messaging server, ActiveMQ, 1 large = ~$300
       • Map tile creation servers, TileCache, 1 large = ~$300
       • Development/testing/migration servers = ~$600
     • 8 GB memcached on permanent web/job servers
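Summing the per-role figures gives a useful cross-check against the EC2 line on the April cost slide, and the web tier's daily scale-out can be expressed as a simple hour-by-hour schedule. The schedule function below is a hypothetical restatement of "3-5 large, scale out 11am to midnight", not HotPads' actual autoscaling configuration.

```python
# Approximate monthly EC2 cost per role, from the slide (US dollars).
ec2_monthly = {
    "Load balancers (HAProxy, 2 small)": 150,
    "Web servers (Tomcat, 3-5 large)": 1_200,
    "Job servers (Tomcat, 5 large)": 1_500,
    "Index servers (1 X-large, 1 large)": 900,
    "MySQL masters (1 X-large, 2 large)": 1_200,
    "MySQL slaves (1 X-large, 2 large)": 1_200,
    "Messaging (ActiveMQ, 1 large)": 300,
    "Map tiles (TileCache, 1 large)": 300,
    "Dev/test/migration": 600,
}

BASE_WEB_SERVERS = 3
PEAK_WEB_SERVERS = 5
PEAK_START_HOUR, PEAK_END_HOUR = 11, 24   # 11am up to midnight


def web_servers_for_hour(hour: int) -> int:
    """Hypothetical schedule: 3 large web servers, 5 from 11am to midnight."""
    if PEAK_START_HOUR <= hour < PEAK_END_HOUR:
        return PEAK_WEB_SERVERS
    return BASE_WEB_SERVERS


total = sum(ec2_monthly.values())
web_instance_hours_per_day = sum(web_servers_for_hour(h) for h in range(24))
print(f"EC2 total: ~${total:,} per month (slide 3 reports $7,400)")
print(f"Web server instance-hours per day: {web_instance_hours_per_day} "
      f"vs. {PEAK_WEB_SERVERS * 24} if 5 ran all day")
```

The roles add up to about $7,350, consistent with the $7,400 EC2 line. Running the extra two web servers only for the 13 peak hours uses 98 instance-hours a day instead of 120.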
  9. EBS – used for all databases
     • Cons
       • Black box: hard to determine the best usage
       • Adds cost above using local drives (but not much)
       • Less bandwidth (not usually important for databases)
     • Pros
       • Lower average latency
       • Especially fast random writes
       • Snapshot backups allow very short write locks and store only the diffs
       • Ability to clone and hibernate databases
       • Redundancy
         • We have lost the local disks on a live master database twice
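The "very short write locks" point comes from the fact that an EBS snapshot captures the volume as of the moment it is requested, so MySQL only has to be locked while the snapshot call is made; later snapshots of the same volume store only the changed blocks. The sketch below shows that common pattern; it is not HotPads' actual backup script. It assumes the caller passes in a DB-API cursor connected to MySQL (for example from PyMySQL) and a boto3 EC2 client; the function name `snapshot_mysql_volume` is hypothetical. Filesystems such as XFS are usually also frozen around the snapshot call, which is omitted here.

```python
def snapshot_mysql_volume(cursor, ec2, volume_id: str) -> str:
    """Take a consistent EBS snapshot of a MySQL data volume (hypothetical helper).

    `cursor` is a DB-API cursor connected to the MySQL server and `ec2` is a
    boto3 EC2 client; both are supplied by the caller.
    """
    # Flush tables to disk and block writes for this session's lifetime.
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        # Record the binlog position so a slave can be built from this snapshot.
        cursor.execute("SHOW MASTER STATUS")
        status = cursor.fetchone()
        # The snapshot is point-in-time as of this call, so the lock
        # can be released as soon as it returns.
        response = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"mysql backup, master status {status}",
        )
    finally:
        # Always release the lock, even if the snapshot request fails.
        cursor.execute("UNLOCK TABLES")
    return response["SnapshotId"]
```

The lock must be taken and released on the same connection, which is why the sketch reuses one cursor for all three statements.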
  10. Database utilization
      • I/O bound
      • RAIDing multiple volumes didn’t help much
      • Testing multiple drives with 1 schema per drive
  11. SimpleDB
      • Pros
        • Stand-alone DB servers are often drastically underutilized and a pain to administer, back up, and restore after failure
        • SimpleDB is schema-less
          • MySQL schema changes are a major problem
      • Cons
        • Binary stored values can’t be interpreted by a generic GUI and have to be encoded by the client
        • Tied to EC2 for latency reasons
        • Eventual consistency when accessed from different EC2 nodes
        • "Column" (attribute) names may inflate storage size
        • Must partition a table (domain) before it hits 10 GB
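Two of the cons above, client-side encoding of binary values and partitioning before a domain reaches 10 GB, are normally handled in client code. In this minimal sketch, binary values are base64-encoded into strings (SimpleDB stores only string values, up to 1,024 bytes each), and each item is routed to one of several domains by a stable hash of its item name. The shard count and the `listings_` domain prefix are hypothetical.

```python
import base64
import hashlib

NUM_DOMAINS = 4               # hypothetical shard count
DOMAIN_PREFIX = "listings_"   # hypothetical domain naming scheme
MAX_VALUE_BYTES = 1024        # SimpleDB attribute-value size limit


def encode_binary(value: bytes) -> str:
    """Base64-encode binary data so it can be stored as a SimpleDB string value."""
    encoded = base64.b64encode(value).decode("ascii")
    if len(encoded) > MAX_VALUE_BYTES:
        raise ValueError(f"encoded value is {len(encoded)} bytes; limit is {MAX_VALUE_BYTES}")
    return encoded


def decode_binary(value: str) -> bytes:
    """Reverse encode_binary after reading the attribute back."""
    return base64.b64decode(value)


def domain_for(item_name: str) -> str:
    """Pick a domain using a stable hash so an item always maps to the same shard.

    Python's built-in hash() is randomized per process, so MD5 is used instead.
    """
    digest = hashlib.md5(item_name.encode("utf-8")).digest()
    return f"{DOMAIN_PREFIX}{int.from_bytes(digest[:4], 'big') % NUM_DOMAINS}"
```

Because base64 expands data by a factor of 4/3, at most 768 raw bytes fit in one attribute value. Changing `NUM_DOMAINS` later remaps most items, so a scheme like consistent hashing is worth considering if the shard count is expected to grow.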
  12. Reserved Instances
      • Pros
        • Get 1 year for the cost of 6 months
        • Guaranteed to get an instance
          • Yes, we have been denied instances
      • Cons
        • Tied to a particular instance type
          • Your needs may change
          • Amazon may introduce more appropriate instance types
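The "1 year for the cost of 6 months" rule of thumb translates into a break-even utilization. The function takes the upfront fee and both hourly rates as parameters instead of quoting real prices. The example encodes only the slide's ratio, modelling the reservation as a pure upfront fee equal to six months of on-demand time; that modelling choice is an assumption.

```python
HOURS_PER_YEAR = 365 * 24   # 8,760


def reserved_breakeven_hours(upfront: float, reserved_hourly: float,
                             on_demand_hourly: float) -> float:
    """Hours of use per year above which a 1-year reservation is cheaper.

    On-demand cost: on_demand_hourly * h
    Reserved cost:  upfront + reserved_hourly * h
    """
    if on_demand_hourly <= reserved_hourly:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / (on_demand_hourly - reserved_hourly)


# Slide's rule of thumb, modelled as a pure upfront fee (assumption):
# a reserved year costs what 6 months of on-demand time costs.
on_demand = 1.0                              # normalized $/hour
upfront = on_demand * HOURS_PER_YEAR / 2     # 6 months of on-demand
hours = reserved_breakeven_hours(upfront, 0.0, on_demand)
print(f"Break-even: {hours:,.0f} hours/year ({hours / HOURS_PER_YEAR:.0%} utilization)")
```

Under this model a reservation pays off only for instances expected to run more than half the year, such as the always-on database, job, and index servers, which is also where the "tied to a particular instance type" risk bites hardest.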
  13. How does Google App Engine compare?
      • Benefits?
        • Low cost, no idle instances sitting around
        • No Linux administration
      • Why don’t we use it?
        • Java deployments limited to 1,000 files
        • Cannot spawn threads
          • Several areas of HotPads are multi-threaded for a 10x request latency improvement
        • Request limit of 30 seconds: no long jobs
        • Our indexes need a big, long-lived heap
      • Amazon lets you innovate more, and that’s our goal.
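The "cannot spawn threads" limitation matters because fanning out independent backend calls in parallel makes a request wait for the slowest call rather than the sum of all calls. HotPads' own code is Java and isn't shown in the deck; this sketch illustrates the same idea with Python's `concurrent.futures`, using a hypothetical `fetch_shard` function that just sleeps to stand in for an I/O-bound backend call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_shard(shard_id: int, latency_s: float = 0.1) -> str:
    """Hypothetical backend call: sleeps to simulate I/O latency."""
    time.sleep(latency_s)
    return f"results from shard {shard_id}"


def sequential(shards):
    # Total latency is the sum of every call's latency.
    return [fetch_shard(s) for s in shards]


def parallel(shards, max_workers=10):
    # Each call blocks on I/O, so threads overlap the waiting time;
    # pool.map returns results in the same order as the input.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_shard, shards))


if __name__ == "__main__":
    shards = range(10)
    for fn in (sequential, parallel):
        start = time.perf_counter()
        results = fn(shards)
        print(f"{fn.__name__}: {len(results)} results in {time.perf_counter() - start:.2f}s")
```

With ten simulated 100 ms calls, the sequential version takes about a second and the threaded version about 100 ms; the actual ratio depends entirely on how many independent calls a request makes and how long each one waits.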