AWS Customer Presentation - eHarmony


eHarmony's AWS customer presentation, given at the AWS Cloud for the Enterprise event in LA on October 15, 2009.

  1. The Evolution of Hadoop and AWS at eHarmony
  2. About eHarmony
     • Launched in 2000
     • Goal: create compatible matches that lead to happy, long-term relationships
     • Compatibility models based on decades of research and clinical experience in psychology
     • Available in the United States, Canada, Australia and the United Kingdom
  3. (image-only slide, no text)
  4. Over 20 million users + a 320-item questionnaire answered by each user = BIG DATA
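In round numbers, that works out to 20,000,000 users × 320 answers each ≈ 6.4 billion individual questionnaire responses.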
  5. Continuous Improvement on Match Quality
     Requires infrastructure that supports:
     • More user data
     • More complex models
     • Increased growth
  6. Why Not a Traditional Solution?
     • Scaling vertically
     • Complex to build
     • Scaling a constant challenge
     • Long engineering effort
     • Expensive
  7. How Hadoop Solved Our Problem
     • Cuts BIG DATA into small data
     • Horizontal scaling platform
     • Fault tolerance
     • Commodity boxes
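The deck shows no code, but "cuts BIG DATA into small data" is exactly the map/shuffle/reduce split. As a generic illustration (the canonical word count, not eHarmony's actual matching job), a Hadoop job in the Java MapReduce API looks like this: the framework partitions the input across commodity boxes, maps each split independently, then groups every value for a key into one reduce call.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

      // Map phase: each input split is processed independently, in parallel,
      // on whatever commodity box the framework schedules it to.
      public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
            throws IOException, InterruptedException {
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              ctx.write(word, ONE); // emit (word, 1)
            }
          }
        }
      }

      // Reduce phase: all counts for one word arrive together, no matter
      // which machines mapped them; failed tasks are simply re-run.
      public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
            throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          ctx.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class); // local pre-aggregation per box
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }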
  8. How Amazon Solved Our Problem
     • Amazon EC2 & S3 provided an attractive approach
     • Hosted Hadoop framework
     • Cost effective
     • Ability to scale on demand
     • SOLD!
  9. AWS Pricing Model
     • Pay-per-use elastic model
     • Choice of server type
     • Lets you get up and running quickly and cheaply
     • Highly cost-effective alternative to doing it in-house
  10. Performance by Instance Type (chart; y-axis in minutes)
  11. AWS Elastic MapReduce
     • EC2 cluster managed for you behind the scenes
     • Only have to worry about MapReduce
     • Read/write data directly from S3 or HDFS
     • Faster turn-around time to production
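"Read/write data directly from S3" means the job setup can name s3n:// URIs anywhere an HDFS path would go; nothing else in the job changes. A minimal sketch, with invented bucket and prefix names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class S3Paths {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "job with S3 I/O");
        // s3n:// URIs drop in where HDFS paths would go. The bucket and
        // prefixes below are placeholders, not eHarmony's real layout.
        FileInputFormat.addInputPath(job, new Path("s3n://example-bucket/questionnaire-dump/"));
        FileOutputFormat.setOutputPath(job, new Path("s3n://example-bucket/match-output/"));
        // ... set mapper/reducer classes as usual, then job.waitForCompletion(true).
      }
    }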
  12. Elastic MapReduce for eHarmony
     • Vastly simplified our Hadoop processing
     • No need to explicitly allocate, start and shut down EC2 instances
     • No need to explicitly manipulate the master node
     • Cluster control and job management reduced to a single local command
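That single local command was EMR's command-line client. To show what one launch bundles together, here is the equivalent programmatic call through the AWS SDK for Java; the SDK postdates this 2009 talk, and every name, size, and S3 location below is invented for illustration:

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
    import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
    import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
    import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
    import com.amazonaws.services.elasticmapreduce.model.StepConfig;

    public class LaunchJobFlow {
      public static void main(String[] args) {
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(
            new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")); // placeholders

        // One step: run the job jar straight from S3. EMR allocates the
        // cluster, runs the step, and tears everything down on completion
        // or failure -- no hand-managed EC2 instances or master node.
        StepConfig step = new StepConfig()
            .withName("match-model-step") // hypothetical
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(new HadoopJarStepConfig()
                .withJar("s3n://example-bucket/jars/match-job.jar")
                .withArgs("s3n://example-bucket/input/", "s3n://example-bucket/output/"));

        RunJobFlowRequest request = new RunJobFlowRequest()
            .withName("example-job-flow")
            .withLogUri("s3n://example-bucket/logs/")
            .withSteps(step)
            .withInstances(new JobFlowInstancesConfig()
                .withMasterInstanceType("m1.large")
                .withSlaveInstanceType("m1.large")
                .withInstanceCount(10)
                .withHadoopVersion("0.20"));

        RunJobFlowResult result = emr.runJobFlow(request);
        System.out.println("Started job flow: " + result.getJobFlowId());
      }
    }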
  13. Architecture (diagram)
     • Data Warehouse → user data dump uploaded to S3 (Amazon cloud)
     • S3 → input to Hadoop jobs on Elastic MapReduce
     • Hadoop jobs → output written back to S3, then downloaded
     • Downloaded output → updates the key-value store and the Data Warehouse
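The upload and download edges of that diagram are plain S3 transfers. A sketch, again using the (later) AWS SDK for Java with invented bucket, key, and file names:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.GetObjectRequest;

    public class WarehouseTransfer {
      public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(
            new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")); // placeholders

        // Upload edge: push the user-data dump from the warehouse into S3.
        s3.putObject("example-bucket", "input/userdata.tsv", new File("/tmp/userdata.tsv"));

        // ... the Elastic MapReduce job flow runs against s3n://example-bucket/input/ ...

        // Download edge: pull the job output back for the key-value store update.
        s3.getObject(new GetObjectRequest("example-bucket", "output/part-00000"),
                     new File("/tmp/matches-part-00000"));
      }
    }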
  14. Challenges
     • The overall process depends on the success of each stage
     • Assume every stage is unreliable
     • Need to build retry/abort logic to handle failures
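The deck does not show its retry/abort logic; one generic way to realize it is sketched below: run each stage with a bounded number of retries, and abort the whole pipeline once a stage stays failed, since every later stage depends on it. Stage names and attempt counts are hypothetical.

    import java.util.concurrent.Callable;

    public class PipelineRunner {

      // Run one pipeline stage, retrying up to maxAttempts before aborting.
      static <T> T runStage(String name, int maxAttempts, Callable<T> stage) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
          try {
            return stage.call();
          } catch (Exception e) {
            last = new RuntimeException(name + " failed on attempt " + attempt, e);
            // In a real pipeline: log, back off, clean up partial output.
          }
        }
        throw last; // abort: later stages depend on this one
      }

      public static void main(String[] args) {
        // Hypothetical stages mirroring the architecture slide.
        runStage("upload-dump", 3, () -> { /* put dump into S3 */ return null; });
        runStage("emr-job-flow", 2, () -> { /* launch job flow, poll to completion */ return null; });
        runStage("download-output", 3, () -> { /* fetch results from S3 */ return null; });
        runStage("update-kv-store", 3, () -> { /* load results into key-value store */ return null; });
      }
    }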
  15. Total Execution Time (chart)
  16. Lessons Learned
     • EC2/S3/EMR = cost effective
     • Hadoop community support is great
     • Hadoop combined w/ a real-time system = tricky
     • Dev tools are really easy to work with right out of the box
     • Ensuring end-to-end reliability poses the biggest challenges
  17. Looking Ahead
     • More tools to empower business intelligence beyond engineering
     • Hive: empowers both engineers and non-engineers to create analytic jobs on the fly
     • Tools for moving data between a traditional database/data warehouse and a Hadoop cluster
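Hive's appeal for the non-engineers this slide mentions is that an analytic job becomes a query, which compiles to the same MapReduce jobs underneath. A hedged sketch of submitting one from Java over Hive's early (pre-HiveServer2) JDBC driver; the host, table, and columns are invented:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQuery {
      public static void main(String[] args) throws Exception {
        // Driver class from the early Hive JDBC driver; host is a placeholder.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:hive://hive-host:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // An ad-hoc analytic job expressed as a query; table and columns invented.
        ResultSet rs = stmt.executeQuery(
            "SELECT country, COUNT(*) FROM users GROUP BY country");
        while (rs.next()) {
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
      }
    }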
  18. User Satisfaction