eHarmony Customer Presentation

    eHarmony Customer Presentation - Presentation Transcript

    1. The Evolution of Hadoop and AWS at
    2. About eHarmony
      Launched in 2000
      Goal to create compatible matches that lead to happy, long-term relationships
      Compatibility models based on decades of research and clinical experience in psychology
      Available in United States, Canada, Australia and United Kingdom
    3. Over 20 million users
      +
      320-item questionnaire answered by each user
      =
      BIG DATA
    4. Continuous Improvement on Match Quality
      Requires infrastructure that supports:
      More user data
      More complex models
      Increased growth
    5. Why Not a Traditional Solution?
      Scaling vertically
      Complex to build
      Scaling a constant challenge
      Long engineering effort
      Expensive
    6. How Hadoop Solved Our Problem
      Cuts BIG DATA into small data
      Horizontal scaling platform
      Fault tolerance
      Commodity boxes
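As a rough illustration of "cutting big data into small data", the Python sketch below runs the map, shuffle, and reduce phases in a single process. The record layout (user ID, question ID, answer) and every function name are hypothetical, because the slides do not describe eHarmony's actual jobs. A real Hadoop job would run each chunk's map task on a different commodity node.

```python
from collections import defaultdict
from itertools import chain


def split(records, chunk_size):
    """Cut the full dataset into fixed-size chunks ("small data")."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_phase(chunk):
    """Map: emit ((question_id, answer), 1) for every record in one chunk."""
    for _user_id, question_id, answer in chunk:
        yield (question_id, answer), 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts for each (question_id, answer) key."""
    return {key: sum(values) for key, values in groups.items()}


def answer_histogram(records, chunk_size=2):
    """Count how many users gave each answer to each question (hypothetical job)."""
    chunks = split(records, chunk_size)
    # Hadoop would run these map tasks in parallel on separate nodes;
    # this sketch runs them one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


records = [("u1", 1, "A"), ("u2", 1, "B"), ("u3", 1, "A"), ("u1", 2, "C")]
print(answer_histogram(records))
# {(1, 'A'): 2, (1, 'B'): 1, (2, 'C'): 1}
```

Because each chunk is mapped independently, the result does not depend on the chunk size. That independence is what lets the work spread horizontally across more boxes.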
    7. How Amazon Solved Our Problem
      Amazon EC2 & S3 provided an attractive approach
      Hosted Hadoop framework
      Cost effective
      Ability to scale on demand
      SOLD!
    8. AWS Pricing Model
      Pay-per-use elastic model
      Choice of server type
      Lets you get up and running quickly and cheaply
      Highly cost-effective alternative to running it in-house
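To make the pay-per-use point concrete, here is a small Python calculation comparing a batch run that pays only for the instance-hours it uses against a cluster kept up around the clock. The hourly rates are made-up placeholders, not AWS prices or figures from the presentation. Rounding partial hours up to a full hour is an assumption modeled on EC2's hourly billing at the time.

```python
import math

# Hypothetical rates in USD per instance-hour (placeholders, not real AWS prices).
HYPOTHETICAL_RATES = {"small": 0.10, "large": 0.40}


def job_cost(instance_type, instance_count, runtime_minutes, rates=HYPOTHETICAL_RATES):
    """Cost of one run, billing each instance for every started hour (assumption)."""
    billed_hours = math.ceil(runtime_minutes / 60)
    return instance_count * billed_hours * rates[instance_type]


def always_on_cost(instance_type, instance_count, hours, rates=HYPOTHETICAL_RATES):
    """Cost of keeping the same cluster running for a fixed number of hours."""
    return instance_count * hours * rates[instance_type]


# One 90-minute run per day on 10 small instances for 30 days,
# versus keeping those 10 instances running for the whole 30 days.
per_day = job_cost("small", 10, 90)
print(round(30 * per_day, 2), round(always_on_cost("small", 10, 30 * 24), 2))
# 60.0 720.0
```

The size of the gap depends entirely on the placeholder rates and the duty cycle chosen here. The calculation shows the shape of the comparison, not eHarmony's actual costs.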
    9. Performance by Instance Type
      (Chart: performance by instance type, measured in minutes)
    10. AWS Elastic MapReduce
      EC2 cluster managed for you behind the scenes
      Only have to worry about MapReduce
      Read and write data directly in S3 or HDFS
      Faster turn-around time to production
    11. Elastic MapReduce for eHarmony
      Vastly simplified our Hadoop processing
      No need to explicitly allocate, start and shut down EC2 instances
      No need to explicitly manipulate master node
      Cluster control and job management reduced to a single local command
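The slide says the whole cluster lifecycle is reduced to a single local command. As a present-day stand-in (a swapped-in technique, not what the presentation used), the Python sketch below builds the keyword arguments for boto3's EMR `run_job_flow` call. Setting `KeepJobFlowAliveWhenNoSteps` to `False` makes EMR terminate the cluster once the job finishes, so no explicit shutdown is needed. The bucket, JAR path, job name, and release label are hypothetical placeholders. The function only builds the request and makes no AWS calls.

```python
def build_job_flow_request(name, jar_s3_uri, jar_args, log_uri, release_label,
                           instance_type="m5.xlarge", instance_count=3):
    """Keyword arguments for boto3's EMR client run_job_flow (request only, no AWS call)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # EMR terminates the cluster on its own once all steps have finished,
            # so there is no separate shutdown step.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": name + " step",
                # Tear the cluster down if the Hadoop job fails.
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_s3_uri, "Args": list(jar_args)},
            }
        ],
        # Names of EMR's default IAM roles; your account may use different ones.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical job: the JAR reads its input from S3 and writes its output back to S3.
request = build_job_flow_request(
    name="nightly-matching",
    jar_s3_uri="s3://example-bucket/jobs/matching.jar",
    jar_args=["s3://example-bucket/input/", "s3://example-bucket/output/"],
    log_uri="s3://example-bucket/logs/",
    release_label="emr-6.15.0",
)
```

With boto3 installed and AWS credentials configured, `boto3.client("emr").run_job_flow(**request)` would submit the request. Choose a release label that your account and region support.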
    12. Architecture
      Data Warehouse -> user data dump -> upload -> S3 (Amazon Cloud)
      S3 -> input -> Elastic MapReduce (Hadoop jobs) -> output -> S3
      S3 -> download -> update -> key-value store / Data Warehouse
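The data flow in this diagram can be read as a chain of stages, each consuming the previous stage's output. The Python sketch below models that chain with in-memory dictionaries standing in for S3 and the key-value store. The stage names, S3 keys, the "user_id,value" dump format, and the multiply-by-ten placeholder computation are all hypothetical, and no AWS service is contacted.

```python
def build_stages(s3, kv_store):
    """Return the pipeline stages in order; s3 and kv_store are plain dicts (stand-ins)."""

    def dump_user_data(warehouse):
        # Data warehouse -> flat "user_id,value" dump (hypothetical format).
        return [f"{user_id},{value}" for user_id, value in sorted(warehouse.items())]

    def upload(lines):
        s3["input/user_dump.csv"] = lines
        return "input/user_dump.csv"

    def run_hadoop_jobs(input_key):
        # Placeholder computation standing in for the real Hadoop jobs:
        # read input from "S3", write results back to "S3".
        results = []
        for line in s3[input_key]:
            user_id, value = line.split(",")
            results.append(f"{user_id},{int(value) * 10}")
        s3["output/part-00000"] = results
        return "output/part-00000"

    def download(output_key):
        return s3[output_key]

    def update_kv_store(lines):
        for line in lines:
            user_id, result = line.split(",")
            kv_store[user_id] = int(result)
        return kv_store

    return [
        ("dump", dump_user_data),
        ("upload", upload),
        ("hadoop jobs", run_hadoop_jobs),
        ("download", download),
        ("update", update_kv_store),
    ]


def run_pipeline(stages, initial):
    """Run stages in order; each stage receives the previous stage's output."""
    data = initial
    for _name, stage in stages:
        data = stage(data)
    return data


s3_bucket, kv = {}, {}
print(run_pipeline(build_stages(s3_bucket, kv), {"u1": 3, "u2": 5}))
# {'u1': 30, 'u2': 50}
```

Writing the flow as an explicit list makes the stages easy to count, time, and wrap individually. Each stage is a separate point of failure.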
    13. Challenges
      The overall process depends on the success of each stage
      Assume every stage is unreliable
      Need to build retry/abort logic to handle failures
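One way to build the retry/abort logic this slide calls for is to wrap each stage so that failures are retried with exponential backoff, and a stage that keeps failing aborts the whole run. This is a generic Python sketch, not eHarmony's implementation. The names `run_with_retry`, `run_pipeline_reliably`, and `StageAborted`, as well as the default attempt count and delays, are illustrative assumptions. The `sleep` parameter is injectable so the backoff can be tested without actually waiting.

```python
import time


class StageAborted(Exception):
    """Raised when a stage still fails after all retries; aborts the pipeline."""


def run_with_retry(name, stage, data, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run one stage, retrying with exponential backoff; abort after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return stage(data)
        except Exception as exc:  # a real pipeline would retry only transient errors
            if attempt == max_attempts:
                raise StageAborted(f"{name} failed after {attempt} attempts") from exc
            # Wait base_delay, then 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def run_pipeline_reliably(stages, initial, **retry_options):
    """Run (name, stage) pairs in order; the run succeeds only if every stage does."""
    data = initial
    for name, stage in stages:
        data = run_with_retry(name, stage, data, **retry_options)
    return data


calls = {"count": 0}


def flaky_upload(data):
    """Hypothetical stage that fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network error")
    return data


print(run_pipeline_reliably([("upload", flaky_upload)], "dump", sleep=lambda seconds: None))
# dump
```

Raising a dedicated exception on abort lets the caller distinguish "this run gave up" from an unexpected crash, and decide whether to alert someone or leave the previous day's output in place.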
    14. Total Execution Time
    15. Lessons Learned
      EC2/S3/EMR = cost effective
      Hadoop community support is great
      Hadoop combined w/ real-time system = tricky
      Dev tools work easily right out of the box
      Ensuring end-to-end reliability poses the biggest challenge
    16. Looking Ahead
      More tools to empower business intelligence beyond engineering
      Hive
      Helps empower engineers & non-engineers to create analytic jobs on the fly
      Tools for moving data between a traditional database/data warehouse and a Hadoop cluster
    17. User Satisfaction

    AWS customer presentation by eHarmony.com at the AW…

    © All Rights Reserved
