Video available at http://youtu.be/1N3YtjXmtNI
Ryan Barker, Software Architect at eHarmony.com, explains the evolution of the matching system behind the largest matching site for singles. The talk gives detailed explanations of past and present online and offline matching systems, including the latest SOA REST-based web services and Hadoop backend hybrid system.
Dating with Models
1. Dating with Models
A How-to Guide for Programmers and Architects
Ryan Barker
2. The eHarmony Difference › How are we different?
• 30+ years as clinical psychologist
and marriage counselor
• Many failing marriages due to
fundamental incompatibility
Can we do better?
3. The fundamental idea
320 Questions
› Personality
› Values
› Attitudes
› Beliefs
Compatibility Matching
7. So let's build it! › Models as a stored procedure (~2001)
8. Problems › Stored procedures are awesome
• Problem #1 – Thousands of users, very few matches. Entire
company is at stake
• Resolution – Line by line debugging of stored procedure finds
an AND that should be an OR
• Problem #2 – Database load increasing
• Resolution – Optimize stored procedure? More hardware?
Rewrite?
• Problem #3 – Order by compatibility does not work
• Resolution – Change stored procedure? Find a way to
introduce models
17. Problems › Better but still suboptimal
• Problem #1 – Suboptimal distribution of matches
• Resolution – Shuffle loop order each day? Introduce an
optimizer!
• Problem #2 – Nightly match run taking 27 hours, heavy
database load
• Resolution – Move to an offline process
• Problem #3 – Java models require testing and new releases.
Groovy models are too slow
• Resolution – Change to configuration based models
24. Problems › The design is never finished
• Problem #1 – More data required
• Resolution – Build services to collect data in real time
• Problem #2 – Bandwidth limitations
• Resolution – Switch to protocol buffers
• Problem #3 – Can’t reprocess people fast enough due to
database load
• Resolution – Switch to key value store backed services
26. Rearchitecture › Service features
• RESTful data oriented design
• Single element
• GET – Return single element
• POST – Update single element
• PUT – Create single element
• DELETE – Delete single element
• Multiple element
• GET – Return list of elements
• Produces/Consumes JSON or Protobuf
• JAX-RS providers transparently convert
between formats
• Accept/ContentType: X-application-protobuf
27. Rearchitecture › Service Client features
• Generic client customized for each service
• Single element
• GET – Return single element
• POST – Update single element
• PUT – Create single element
• DELETE – Delete single element
• Multiple element
• GET – Return list of elements
• BATCH – Scatter gather implementation
• Protocol buffer based by default, falls back to
JSON for older services
• Configurable retries for GET/PUT/DELETE
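The retry and BATCH bullets above can be sketched roughly as follows. This is a minimal illustration, not the eHarmony client: the class name `ServiceClient`, the `transport` callable, and the `max_retries` and `workers` parameters are all assumptions made for the example, and a thread pool is only one possible way to implement scatter gather.

```python
from concurrent.futures import ThreadPoolExecutor

# The methods the slide lists as retryable; POST is deliberately left out.
RETRYABLE_METHODS = {"GET", "PUT", "DELETE"}


class ServiceClient:
    """Hypothetical generic client; `transport` is any callable(method, key, body)."""

    def __init__(self, transport, max_retries=3, workers=4):
        self.transport = transport
        self.max_retries = max_retries
        self.workers = workers

    def request(self, method, key, body=None):
        # Retryable methods get max_retries extra attempts; others get one attempt.
        attempts = self.max_retries + 1 if method in RETRYABLE_METHODS else 1
        last_error = None
        for _ in range(attempts):
            try:
                return self.transport(method, key, body)
            except ConnectionError as err:
                last_error = err
        raise last_error

    def batch_get(self, keys):
        # Scatter: issue one GET per key concurrently.
        # Gather: collect the responses into a single dict keyed by the request key.
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            values = list(pool.map(lambda k: self.request("GET", k), keys))
        return dict(zip(keys, values))
```

Retrying only GET, PUT and DELETE matches the slide; one plausible reason is that repeating those calls is expected to leave the data in the same state, while a repeated POST may not.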
28. Current Day › Matching User Service
Matching User Service is a data aggregation service
that gathers data from various sources, and stores
them in a key value store
•REST + Protocol buffer based
• /user-service/<version>/users/<user-id>
• Supports full and partial updates
• Supports single and batch gets
• 1000+ data attributes,
• ~4KB each uncompressed
•Key: Userid
•Value: UserProto
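The storage model on this slide (key: user id, value: one user record, with full and partial updates and single and batch gets) can be sketched as below. This is an in-memory stand-in under stated assumptions: `UserStore` and its method names are hypothetical, and a plain dict takes the place of both the key value store and the UserProto message.

```python
class UserStore:
    """In-memory stand-in for the key value store: key = user id, value = user record."""

    def __init__(self):
        self._data = {}

    def put(self, user_id, record):
        # Full update: replace the whole stored record.
        self._data[user_id] = dict(record)

    def patch(self, user_id, changes):
        # Partial update: merge only the supplied attributes into the record.
        self._data.setdefault(user_id, {}).update(changes)

    def get(self, user_id):
        # Single get: returns None when the user is not stored.
        return self._data.get(user_id)

    def get_many(self, user_ids):
        # Batch get: ids that are not stored are left out of the result.
        return {uid: self._data[uid] for uid in user_ids if uid in self._data}
```

Keeping one aggregated value per user means a scoring job needs a single lookup per user rather than several database joins.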
32. Current Day › Pairing Service
Pairing Service is a data service that supports a
specialized set of operations
•REST + Protocol buffer based
• GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>
• DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id>
• 4 data attributes per pairing
• 0 to tens of thousands of pairings per user
•Stores: 1 per type
•Key: Userid
•Value: PairingsProto
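The operations listed on this slide can be pictured with a small in-memory sketch. Everything named here is an assumption for illustration: `PairingStores`, its method names, and the `candidate_id` field are hypothetical, and the slide does not say what the four pairing attributes are.

```python
from collections import defaultdict


class PairingStores:
    """Hypothetical stand-in: one store per pairing type, keyed by user id."""

    def __init__(self):
        # pairing type -> {user id -> {candidate id -> pairing record}}
        self._stores = defaultdict(dict)

    def put(self, pairing_type, user_id, pairings):
        # PUT .../pairings/<type>/users/<user-id>: replace the user's pairings.
        self._stores[pairing_type][user_id] = {p["candidate_id"]: p for p in pairings}

    def get(self, pairing_type, user_id):
        # GET .../pairings/<type>/users/<user-id>: zero or more pairings.
        return list(self._stores[pairing_type].get(user_id, {}).values())

    def delete(self, pairing_type, user_id):
        # DELETE .../pairings/<type>/users/<user-id>: drop all of the user's pairings.
        self._stores[pairing_type].pop(user_id, None)

    def delete_candidate(self, pairing_type, user_id, candidate_id):
        # DELETE .../users/<user-id>/candidates/<candidate-id>: drop one pairing.
        self._stores[pairing_type].get(user_id, {}).pop(candidate_id, None)
```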
33. Current Day › Scoring Service
Scoring Service is a stateless calculation
service that supports JSON based models
•REST + Protocol buffer based
• GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score
• POST /scoring-service/<version>/models/<modelname>/score
•Knows how to fetch data from data sources for
some models
•All models slowly being centralized in one place
•Underlying library supports any protobuf or map
•Possible candidate for redesign?
34. Current Day › Model Frameworks 3.0
Model Frameworks 3.0 is the core library
behind all scoring
•JSON based model definitions
•Scala DSL implementation with bytecode
generation
•Supports Protobufs (Message), ResultSet, Maps
•Examples
• "same_religion" : "{user.profile.religion} == {cand.profile.religion}"
• "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
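To make the two example definitions concrete, here is a rough Python sketch of how such JSON model definitions could be evaluated against nested maps. It is not the Scala DSL or its bytecode generation: `compile_model`, `lookup` and `bin_value` are hypothetical names, and the meaning of `bin` (return the index of the bucket a value falls into, given sorted bucket edges) is an assumption, since the slide does not define it.

```python
import json
import re
from bisect import bisect_right

# Matches placeholders such as {user.profile.religion}.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

MODELS = json.loads("""
{
  "same_religion": "{user.profile.religion} == {cand.profile.religion}",
  "bin_age_diff": "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
}
""")


def lookup(context, path):
    # Walk a dotted path through nested maps.
    value = context
    for part in path.split("."):
        value = value[part]
    return value


def bin_value(bins, x):
    # Assumed semantics: index of the bucket x falls into, given sorted edges.
    return bisect_right(bins, x)


def compile_model(expression):
    # Rewrite each {a.b.c} placeholder as a lookup call, then compile once.
    source = PLACEHOLDER.sub(lambda m: f"_get({m.group(1)!r})", expression)
    code = compile(source, "<model>", "eval")

    def score(user, cand, **constants):
        context = {"user": user, "cand": cand}
        env = {"__builtins__": {}, "_get": lambda p: lookup(context, p),
               "bin": bin_value, **constants}
        # eval is acceptable for a sketch with trusted definitions only.
        return eval(code, env)

    return score


COMPILED = {name: compile_model(expr) for name, expr in MODELS.items()}
```

With a user aged 34, a candidate aged 31 and `bins=[-5, 0, 5]`, `COMPILED["bin_age_diff"](user, cand, bins=[-5, 0, 5])` evaluates the difference 3 and returns bucket index 2 under the assumed `bin` semantics. Because models are plain configuration, changing one does not require compiling and releasing new code, which is the problem the earlier "configuration based models" resolution addresses.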
Hello, and thank you. It's a pleasure to be here. Today I am here to talk about what is happening behind the scenes at eHarmony. We were one of the first companies to apply sophisticated technology to the very old concept of matchmaking. eHarmony takes a very different approach from other online dating sites, … search-based. On those sites, you determine your preferences and filter out. That's one valid approach. But eHarmony is different. eHarmony was created to give people a better chance and a better way to find a great long-term relationship. Many of you may know from our old television commercials that eHarmony was founded by Dr. Neil Clark Warren. You may not know that he was a clinical psychologist and marriage counselor in Pasadena, California for more than 30 years. A lot of the couples Dr. Warren counseled were in failing marriages. Over the years, he realized that marriages often fall apart when the people in them are fundamentally incompatible. Dr. Warren believed that the best way to create happier marriages and reduce some of the negative effects of divorce was to give people a better chance of marrying the right person in the first place. That insight led to a lot of questions: What makes some couples more satisfied in their relationships over time than others? Can long-term relationship satisfaction be predicted? If so, can those qualities be used to match single people? Dr. Warren and the founding team at eHarmony began researching those questions by studying several thousand married couples. They discovered that there are common traits that distinguish the most satisfied married couples from others. Thus, in the late 90s, eHarmony was born.
That is compatibility matching. Similarity on dimensions that don't get discussed. When asked: "Are you happy with yourself?" Important, but not a pickup line. That's why RQ. A very good snapshot of personality.
Core traits and vital attributes. Core traits: [CLICK TO BUILD] Vital attributes. Initial eHarmony model.
Here is the initial eHarmony model. Only pairs with a high chance to be very happy together are introduced.
If no click, no communication. Compatibility and chemistry are two very different things. Interests provide something to talk about. His matches have to like him back. Affinity Matching is about
Every match eHarmony makes is compatible. That is from the personality perspective. However, not all matches end up talking to each other. Sometimes the age gap could be too big. Other times the users may live too far apart. There are too many reasons to count. We are trying to deliver as many matches as possible where both users are interested in each other, start communicating and get to know each other.
That leads me to the last piece of our matching process, which we call match distribution. We need to make sure that we're presenting the right matches… to the right users… at the right time… to as many people as possible across our entire network, every day. The network changes every day. Let me illustrate this.
Now we're not doing those joins on disk at all. For each potential match we want to process, we can load the relevant user data for each side on demand from our Voldemort cache. This was loaded with user data by a previous MapReduce step. Now we're joining in RAM, record by record, on demand. At the end of the evaluation we've actually thrown away most of the data we don't need after we've used it. Did I mention this gave us a 10x speedup over conventional Hadoop joins? It's worth repeating: we got an order of magnitude performance improvement by using this technique.
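The join described in this note can be sketched as follows. This is only an illustration of the idea: `score_pairs` is a hypothetical name, a plain dict stands in for the cache (no real Voldemort client API is used), and `score` is any function of the two loaded records.

```python
def score_pairs(pairs, cache, score):
    """Join each (user_id, candidate_id) pair against the cache in memory."""
    for user_id, cand_id in pairs:
        # Load both sides on demand instead of joining large files on disk.
        user = cache.get(user_id)
        cand = cache.get(cand_id)
        if user is None or cand is None:
            continue  # one side is missing from the cache, so skip the pair
        # Only the small result is kept; the loaded records are not retained.
        yield user_id, cand_id, score(user, cand)
```

Because the function is a generator, it holds at most one pair of records at a time, which mirrors the "record by record, on demand" behaviour described above.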
It worked
How does it work for Adam? Matches. Interested in Julia. Break the ice? Pickup lines are no good.
Doing matchmaking well requires an innate understanding of your customers and the sophistication to use that data to deliver a valuable experience. All the advances in computing power and algorithms have recently opened up a lot of new possibilities and applications. I'm happy to talk with any of you further if you have questions about eHarmony or how to apply matchmaking to your own businesses. Thank you.