Big Dating at eHarmony



Thod Nguyen's presentation on "Big Dating at eHarmony" at MongoDB World 2014

  • So here’s the agenda for today

    First, I’ll talk about our compatibility matching system – the key to generating all those happy couples and satisfied marriages I talked about before.

    Then I’ll talk to you about the old system, how it was architected, and where we ran into problems.

    Then I’ll talk about the new system – our requirements, the technologies we evaluated and why we selected MongoDB.

    Finally, I’ll discuss some of the lessons we learned during the MongoDB migration and the new use cases we’re considering for MongoDB.
  • eHarmony’s secret sauce is our Compatibility matching system

    It consists of a sophisticated 3-tier process.

    1. Compatibility Matching identifies potentially compatible matches based on each user’s core compatibility, measured across 29 dimensions of psychology and personality traits, AND on user preferences.

    2. Affinity Matching predicts the probability of communication between 2 people; that is, whether these two people will want to connect. Two people can be very compatible because they share beliefs or interests and still not want to connect for other reasons. For example, they could be in completely different age groups, live 3K miles apart, or not be attracted to each other.

    3. Match distribution helps to ensure we deliver the right matches to the right users at the right time – and to as many users as possible across our entire network.

    For the purposes of this talk, I’ll stick mostly to the Compatibility Matching system, which lets us focus on how we use MongoDB.
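The talk describes Affinity Matching only as predicting the probability that two people will communicate. A logistic (sigmoid) model is one common way to turn features into such a probability, and the sketch below assumes that technique. The feature names and weights (`WEIGHTS`, `p_communicate`) are hypothetical. They are not eHarmony's actual affinity model.

```python
import math

# Hypothetical weights for illustration only; not eHarmony's real affinity model.
WEIGHTS = {
    "bias": 1.0,
    "age_gap_years": -0.15,       # larger age gaps lower the odds of contact
    "distance_100_miles": -0.08,  # each 100 miles apart lowers the odds
}

def p_communicate(age_gap_years: float, distance_miles: float) -> float:
    """Logistic estimate of the probability that two users communicate."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["age_gap_years"] * age_gap_years
         + WEIGHTS["distance_100_miles"] * (distance_miles / 100))
    return 1.0 / (1.0 + math.exp(-z))

# Two equally "compatible" pairs can still differ sharply in affinity.
print(round(p_communicate(2, 10), 3))     # close in age, same area
print(round(p_communicate(25, 3000), 3))  # different age groups, 3K miles apart
```

In a real system, weights like these would typically be learned from historical communication data rather than set by hand.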
  • The Compatibility Matching component is a two-step process

  • Traditional search is unidirectional

    To understand why, let’s take a look at Nikki as an example.

    In this scenario, Nikki is in the market for a toaster.

    In a one-way search, all that matters is returning the toasters that meet the criteria Nikki has specified. Whichever toaster she takes home, the poor toaster has no say in the matter.

    But dating is more complex than this, especially when we’re trying to create a meaningful, romantic connection between 2 people.

  • Dating is bidirectional – both people need to want to be with one another

    At eHarmony, we developed a bidirectional system to make sure that user preferences are met both ways, or mutually.

    Take Nikki as an example again. This time she’s not looking for toasters on Amazon. She’s on eHarmony. We also have some other eHarmony users, for example Jeb, Jon and Nick.

    First, we need to consider only those that meet Nikki’s criteria. In this case, that’s only Jeb and Jon.

    For us to have a match, Nikki also needs to meet the criteria specified by Jon or Jeb.

    In this case, that’s only Jon.

    What are these criteria we’re talking about? They are simple things like age, distance, religion, ethnicity and income/education.

    This completes the first part of the matching system.
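The difference between one-way and two-way filtering fits in a few lines of Python. The ages and religion values below are invented so that the result reproduces the Nikki/Jeb/Jon/Nick example. Only age and religion are modeled, and `meets`, `one_way` and `two_way` are hypothetical helper names.

```python
# Hypothetical profiles: "attrs" are a person's own traits; "wants" are what
# that person accepts in a partner. All values are invented for illustration.
people = {
    "Nikki": {"attrs": {"age": 31, "religion": "christian"},
              "wants": {"age": (28, 38), "religion": {"christian", "none"}}},
    "Jeb":   {"attrs": {"age": 33, "religion": "christian"},
              "wants": {"age": (22, 29), "religion": {"christian"}}},
    "Jon":   {"attrs": {"age": 35, "religion": "none"},
              "wants": {"age": (27, 36), "religion": {"christian", "none"}}},
    "Nick":  {"attrs": {"age": 45, "religion": "christian"},
              "wants": {"age": (30, 50), "religion": {"christian"}}},
}

def meets(wants: dict, attrs: dict) -> bool:
    """True if a person's attributes satisfy someone else's criteria."""
    lo, hi = wants["age"]
    return lo <= attrs["age"] <= hi and attrs["religion"] in wants["religion"]

def one_way(seeker: str) -> list:
    """Traditional search: only the seeker's criteria matter."""
    return [name for name, p in people.items()
            if name != seeker and meets(people[seeker]["wants"], p["attrs"])]

def two_way(seeker: str) -> list:
    """Bidirectional: each candidate's criteria must also accept the seeker."""
    me = people[seeker]
    return [name for name in one_way(seeker)
            if meets(people[name]["wants"], me["attrs"])]

print(one_way("Nikki"))  # candidates who meet Nikki's criteria
print(two_way("Nikki"))  # ...who also accept Nikki
```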

  • So why was MongoDB selected?

    It provided the best of both worlds: fast, multi-attribute searches and powerful indexing features, combined with a dynamic, flexible data model.

    Supported Auto-Scaling – Any time we want to handle more load, we just add a shard to our sharded cluster, and if a shard is getting hot, we add another replica to its replica set.

    Built-in sharding – so we can scale out our big data horizontally on commodity machines while still maintaining high-performance throughput.

    Auto-balancing of data across all shards happens automatically and seamlessly, so the client application doesn’t have to worry about the internals of how the data is stored and managed.

    There were also other benefits including:

    Ease of management – This is very important to us because we have a small Ops team managing 1K+ servers and 2K+ other devices in our primary datacenter.

    Open source + commercial entity – It’s open source with good community support, plus enterprise professional support from the MongoDB team.

    In Q1 2013, we deployed 12 MongoDB nodes across 3 shards (3x4).
    In Q2 2014, we grew to 18 nodes across 3 shards (3x6).
    Plus a query caching solution to maximize throughput and performance.
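One way a two-way preference check can become a single MongoDB multi-attribute query is to store each user's own attributes next to their preferences and filter on both. The schema below (`age`, `religion`, `prefs.min_age`, `prefs.max_age`, `prefs.religions`) and the helper `two_way_filter` are assumptions for illustration, not eHarmony's actual data model. The returned dict is an ordinary MongoDB query document that could be passed to a driver call such as PyMongo's `find()`.

```python
def two_way_filter(me: dict) -> dict:
    """Build a MongoDB query document for candidates who match `me` both ways.

    Assumed (hypothetical) profile schema:
      {"age": int, "religion": str,
       "prefs": {"min_age": int, "max_age": int, "religions": [str, ...]}}
    """
    p = me["prefs"]
    return {
        # Direction 1: the candidate's attributes satisfy my preferences.
        "age": {"$gte": p["min_age"], "$lte": p["max_age"]},
        "religion": {"$in": p["religions"]},
        # Direction 2: my attributes satisfy the candidate's stored preferences.
        "prefs.min_age": {"$lte": me["age"]},
        "prefs.max_age": {"$gte": me["age"]},
        # Equality against an array field matches if any element equals the value.
        "prefs.religions": me["religion"],
    }

nikki = {"age": 31, "religion": "christian",
         "prefs": {"min_age": 28, "max_age": 38, "religions": ["christian", "none"]}}
query = two_way_filter(nikki)
print(query)
# With PyMongo, this could be run as: profiles.find(query)
```

Which of these fields to index depends on the workload. Note that a MongoDB compound index can contain at most one array-valued field, so an array like `prefs.religions` needs care when designing indexes.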
  • So what are the tradeoffs when deploying MongoDB?

    1. MongoDB is a schema-free datastore, so the data’s structure (its field names) is repeated in every document in the collection. That requires a lot more storage space, which translates to a larger footprint.

    2. Aggregation queries in MongoDB are very different from traditional SQL aggregation functions. That results in a paradigm shift from a DBA focus to an engineering focus.

    3. Lastly, the initial configuration/migration can be a long, manual process due to a lack of automated tooling on the MongoDB side. I was told that the new version of the MMS dashboard will include automated provisioning and configuration, software upgrades, and point-in-time backup/recovery. This is fantastic news for the Mongo community.
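Every document carries its own field names, so the cost of long names is paid again in every document. The sketch below uses compact JSON length as a rough stand-in for BSON size. This is an approximation, since BSON's encoding differs in detail. The field names are hypothetical, and the example shows how much of a small document can be key overhead.

```python
import json

# Hypothetical profile fields: the same values under long and short keys.
long_keys = {"user_age": 34, "user_religion": "none", "user_zip_code": "90401"}
short_keys = {"a": 34, "r": "none", "z": "90401"}

def encoded_size(doc: dict) -> int:
    """Bytes of compact JSON, used here as a rough proxy for BSON document size."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))

def key_bytes(doc: dict) -> int:
    """Bytes spent on field names alone, which repeat in every document."""
    return sum(len(k.encode("utf-8")) for k in doc)

saved_per_doc = encoded_size(long_keys) - encoded_size(short_keys)
print(saved_per_doc, "bytes saved per document")
print(saved_per_doc * 1_000_000 / 1e6, "MB saved per million documents")
```

The catch is readability, because very short field names make documents harder to understand. Storage-engine compression in later MongoDB versions can also offset part of this overhead.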
  • There were a few key lessons we learned during the MongoDB migration:

    1. Turn on the Firehose
    - When testing or even evaluating, use production data and queries to ensure an apples-to-apples comparison of performance and scalability metrics.

    2. Unleash the Chaos Monkey (LT)
    During your load testing, kill your mongod instances to ensure your cluster and applications continue to function normally.

    3. Involve the MongoDB team from the start, even during the POC
    They provide the best architectural guidance and support on data modeling, indexing strategy and query optimization, and they can help you with your MongoDB production topology, proper monitoring (MMS), and integration with your internal monitoring system.

    4. Select a good shard key so that most of your queries can be isolated to a single shard, and mongos does not have to wait to collect results from all shards.

    5. Run in shadow mode:
    - The matching infrastructure is based on an event-driven Service-Oriented Architecture (SOA) model.
    - That made it easy for us to run 2 CMP clusters off the same distributed messaging system.
    - Messages from real production traffic were replicated to both clusters (Postgres and MongoDB), so we were able to optimize MongoDB (shard key, query indexes) without affecting our production users. Once we certified the solution, we simply switched to the MongoDB-based cluster.

    Tuning - (Shard Key, Query Indices) / Enhance Code /  Increase Mongo Cluster capacity
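A toy model of a range-sharded cluster shows how the shard key affects query routing. The split points, shard names and the `shard_for`/`route` helpers are hypothetical. A real mongos routes using the cluster's chunk metadata and can also target range predicates on a range-based shard key. This simplification treats those predicates as scatter-gather.

```python
import bisect

# Hypothetical chunk boundaries for a cluster range-sharded on "user_id":
# shard0 owns [min, 1M), shard1 owns [1M, 2M), shard2 owns [2M, max).
SPLIT_POINTS = [1_000_000, 2_000_000]
SHARDS = ["shard0", "shard1", "shard2"]

def shard_for(user_id: int) -> str:
    """Return the shard whose half-open range contains this shard-key value."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, user_id)]

def route(query: dict, shard_key: str = "user_id") -> list:
    """Shards a query must visit in this toy model."""
    value = query.get(shard_key)
    if value is not None and not isinstance(value, dict):
        # Equality on the shard key: the query is targeted to exactly one shard.
        return [shard_for(value)]
    # No shard-key equality: scatter to every shard, then gather the results.
    return list(SHARDS)

print(route({"user_id": 1_500_000, "age": 34}))  # targeted query
print(route({"age": 34}))                         # scatter-gather query
```

In the scatter-gather case, the slowest shard sets the query's latency. That is why queries that include the shard key scale better as shards are added.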
  • What’s next to come from eHarmony:

    Our core mission is to make people’s lives better and happier, whether by helping them find the love of their lives across multiple locales and languages or by helping them find the right job.

    Our online dating sites in AU and the UK have been very profitable, so we plan to expand that success to 20 other countries in the next few years, starting with English-speaking countries and moving on to Spanish, French and other languages.

    We’re also working on a new job compatibility vertical, which we call “Careers by eHarmony”, and we plan to launch the beta site in December of this year. We have known for a long time that it’s really hard to make a marriage work if you’re not happy with your job. 65+% of people in America are not happy with their current job, and they could be if they were matched with the right job based on the company’s culture and the personality of the person they will report to, in addition to their skills.

    Here are some potential use cases we may consider MongoDB for:

    1. Real-time geo-location based matching services leveraging MongoDB’s geospatial indexes and queries

    2. We are also exploring MongoDB as one of the datastores for the new Jobs Compatibility vertical
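For the geo-location use case, MongoDB's `$near` operator on a `2dsphere`-indexed GeoJSON field returns documents ordered by distance from a point. The helper below only builds the query document. The field name `location`, the example coordinates and the name `near_filter` are illustrative assumptions.

```python
def near_filter(lng: float, lat: float, max_km: float, field: str = "location") -> dict:
    """Build a MongoDB $near query document for users within max_km of a point.

    Assumes `field` stores GeoJSON Points and has a 2dsphere index, e.g. created
    with PyMongo: collection.create_index([("location", "2dsphere")]).
    """
    return {
        field: {
            "$near": {
                # GeoJSON coordinate order is [longitude, latitude].
                "$geometry": {"type": "Point", "coordinates": [lng, lat]},
                # For GeoJSON points, $maxDistance is expressed in meters.
                "$maxDistance": max_km * 1000,
            }
        }
    }

query = near_filter(-118.49, 34.02, 25)
print(query)
# e.g. profiles.find(query) would return nearby users, nearest first.
```

Because `$near` already orders results by distance, a "closest matches first" feed comes directly out of the query.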
  • Here are some of our major technology investments to help us solve complex engineering problems and provide long-term maintainability, scalability and innovation at eHarmony.

    For example:

    1. We use a lot of Scala as our functional programming language to implement our CMS and Affinity Matching models.

    2. We make heavy use of Hadoop/Hive on top of YARN for data mining and massive data processing, and of R (Revolution) as the programming language for data science and predictive analytics in our machine learning models.

    3. We use Node.js/HTML5/Backbone to implement our public facing eHarmony web applications for both Mobile Web and Desktop.
  • 1. We have lots of open positions right now, so if you’re interested in being part of a great cause and a great culture, and in working on the coolest and most cutting-edge technologies, reach out to me directly on @LinkedIn or apply directly @jobs.

    That would require too much detail given the short time we have, so let’s connect afterward to discuss.
  • Big Dating at eHarmony

    1. Thod Nguyen, Chief Technology Officer: Big Dating at eHarmony
    2. social impact
    3. big dating at scale: 3B+ potential matches daily (~25+ TB of data); 60M+ multi-attribute queries daily, looking across 250+ attributes; 212M+ photos (~15+ TB of data); 4B+ relationship questionnaires (~25+ TB of data)
    4. the big win for product: decreased the processing time to match by 95%, from 2+ weeks to 12 hours, on 3B+ potential matches/day; 30% increase in 2-way communications; 50% increase in paid subs; 60% increase in unique visitors
    5. today: Compatibility Matching System; The Old; The New; Why MongoDB; What’s Next
    6. compatibility matching system®: 1 Compatibility Matching, 2 Affinity Matching, 3 Match Distribution
    7. compatibility matching system (cont’d): 1 Compatibility Matching, 2 Affinity Matching, 3 Match Distribution
    8. traditional search
    9. eharmony matching
    10. compatibility models
    11. compatibility matching process
    12. legacy compatibility match processor (CMP)
    13. legacy compatibility match processor V.2 (CMP)
    14. challenges with existing v2 design
    15. challenges with existing v2 design (contd.)
    16. challenges with existing v2 design (contd.)
    17. challenges with existing v2 design (contd.)
    18. challenges with existing v2 design (contd.)
    19. new data store requirements
    20. why MongoDB?
    21. tradeoffs: No schema = larger footprint; Aggregation queries are different; Initial configuration can be a long, manual process
    22. lessons learned: Turn on the Firehose; Unleash the Chaos Monkey; Engage MongoDB, Inc. early (dev to production); Try to isolate your queries to a shard; Run in shadow mode
    23. what’s next: New matching use cases: Globalization and Localization of eH site, Careers by eHarmony, Internet of Things “Compatible”; New use cases within eHarmony: Real-time geo location based matching service, Careers
    24. technology stack
    25. We’re Hiring
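Two quick checks on the figures in slides 3 and 4 follow. They assume that "2+ weeks" means 14 days and that the 60M+ daily queries are spread evenly over 24 hours. The second assumption is a simplification, because real traffic is bursty. Under these assumptions, the processing-time reduction is at least about 96%, which is consistent with the "95%" reported. The average query rate is roughly 700 per second.

```latex
\[
\frac{14 \times 24\,\text{h} - 12\,\text{h}}{14 \times 24\,\text{h}}
  = \frac{324}{336} \approx 96.4\%
\]
\[
\frac{60 \times 10^{6}\ \text{queries}}{86{,}400\ \text{s}} \approx 694\ \text{queries/s}
\]
```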