Dating with Models
A How to Guide for Programmers and Architects
Ryan Barker
The eHarmony Difference › How are we different?
• 30+ years as clinical psychologist and marriage counselor
• Many failing marriages due to fundamental incompatibility
Can we do better?
The fundamental idea
320 Questions
› Personality
› Values
› Attitudes
› Beliefs
Compatibility Matching
Compatibility Matching › Obstreperousness
Compatibility Matching › Romantic
Compatibility Matching › 29 Dimensions®
So let's build it! › Models as a stored procedure (~2001)
Problems › Stored procedures are awesome
• Problem #1 – Thousands of users, very few matches. Entire company is at stake
• Resolution – Line-by-line debugging of stored procedure finds an AND that should be an OR
• Problem #2 – Database load increasing
• Resolution – Optimize stored procedure? More hardware? Rewrite?
• Problem #3 – Order by compatibility does not work
• Resolution – Change stored procedure? Find a way to introduce models
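Problem #1 is easy to reproduce in miniature. The sketch below is purely illustrative: the user tuples, the tolerance, and the predicate are hypothetical and are not the original stored procedure. It only shows how one AND where an OR was intended can collapse the number of matches.

```python
from itertools import combinations

# Hypothetical users: (id, values_score, personality_score). Not real data.
USERS = [
    ("a", 0.9, 0.2),
    ("b", 0.8, 0.3),
    ("c", 0.2, 0.9),
    ("d", 0.3, 0.8),
    ("e", 0.9, 0.9),
]

def close(x, y, tol=0.15):
    """Two scores count as compatible when they differ by at most tol."""
    return abs(x - y) <= tol

def matches(users, combine):
    """Return the pairs whose two closeness checks pass under combine (all or any)."""
    out = []
    for (id1, v1, p1), (id2, v2, p2) in combinations(users, 2):
        if combine([close(v1, v2), close(p1, p2)]):
            out.append((id1, id2))
    return out

buggy = matches(USERS, all)  # AND: both checks must pass
fixed = matches(USERS, any)  # OR: either check may pass

print(len(buggy), len(fixed))
```

With this toy data the AND version keeps 2 pairs and the OR version keeps 6, so a one-word change in the predicate triples the match volume.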
The eHarmony Difference › Compatibility Matching System®
Layers on Top of Compatibility Matching
› 1 Compatibility Matching
› 2 Affinity Matching
› 3 Match Distribution
Affinity Matching
[Figure: labels 61, 21, 3000]
Affinity Matching › Distance
Prob( )
Affinity Matching › Height difference
Prob( ) 4 - 8 in / cm
Affinity Matching › “Attractiveness”
Prob( )
Redesign › Event-based matching with Java/Groovy models
Problems › Better but still suboptimal
• Problem #1 – Suboptimal distribution of matches
• Resolution – Shuffle loop order each day? Introduce an optimizer!
• Problem #2 – Nightly match run taking 27 hours, heavy database load
• Resolution – Move to an offline process
• Problem #3 – Java models require testing and new releases. Groovy models are too slow
• Resolution – Change to configuration-based models
The eHarmony Difference › Compatibility Matching System®
› 1 Compatibility Matching
› 2 Affinity Matching
› 3 Match Distribution
Delivering the right matches at the right time to as many people as possible across the entire network.
Match Distribution › Graph optimization
Prob( | data)
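The slides do not say which optimizer is used. As a rough sketch of the problem shape only, assume every candidate pair carries an estimated probability (the Prob( | data) term) and every user has a daily match capacity. The greedy heuristic below is a stand-in, not eHarmony's method, and all names and numbers are hypothetical.

```python
def distribute(pair_probs, capacity):
    """Greedy capacity-constrained selection of pairs.

    pair_probs: dict mapping (user_a, user_b) -> estimated probability
    capacity:   dict mapping user -> maximum matches that user may receive
    """
    remaining = dict(capacity)
    chosen = []
    # Highest-probability pairs first; ties are broken by pair name.
    for (a, b), _prob in sorted(pair_probs.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining.get(a, 0) > 0 and remaining.get(b, 0) > 0:
            chosen.append((a, b))
            remaining[a] -= 1
            remaining[b] -= 1
    return chosen

pair_probs = {
    ("u1", "u2"): 0.9,
    ("u1", "u3"): 0.8,
    ("u2", "u3"): 0.7,
    ("u3", "u4"): 0.6,
    ("u1", "u4"): 0.5,
}
capacity = {"u1": 2, "u2": 2, "u3": 2, "u4": 1}

result = distribute(pair_probs, capacity)
print(result)
```

In this toy run the greedy pass fills u1, u2, and u3 and leaves u4 with no match at all, which is the kind of uneven distribution that a network-wide optimizer is meant to avoid.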
Match Distribution › Does it work?
Problems › The design is never finished
• Problem #1 – More data required
• Resolution – Build services to collect data in real time
• Problem #2 – Bandwidth limitations
• Resolution – Switch to protocol buffers
• Problem #3 – Can’t reprocess people fast enough due to database load
• Resolution – Switch to key-value-store-backed services
Rearchitecture › Services for everything
Rearchitecture › Service features
• RESTful data-oriented design
  • Single element
    • GET – Return single element
    • POST – Update single element
    • PUT – Create single element
    • DELETE – Delete single element
  • Multiple element
    • GET – Return list of elements
• Produces/Consumes JSON or Protobuf
  • JAX-RS providers transparently convert between formats
  • Accept/Content-Type: X-application-protobuf
Rearchitecture › Service Client features
• Generic client customized for each service
  • Single element
    • GET – Return single element
    • POST – Update single element
    • PUT – Create single element
    • DELETE – Delete single element
  • Multiple element
    • GET – Return list of elements
    • BATCH – Scatter gather implementation
• Protocol buffer based by default, falls back to JSON for older services
• Configurable retries for GET/PUT/DELETE
Current Day › Matching User Service
Matching User Service is a data aggregation service that gathers data from various sources and stores it in a key-value store
• REST + Protocol buffer based
  • /user-service/<version>/users/<user-id>
  • Supports full and partial updates
  • Supports single and batch gets
  • 1000+ data attributes, ~4KB each uncompressed
• Key: Userid
• Value: UserProto
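The difference between full and partial updates can be sketched with an in-memory stand-in for the key-value store. Nested dicts play the role of UserProto here. The class, method names, and sample fields are hypothetical and do not reflect the real service API.

```python
def merge(base, patch):
    """Return a copy of base with patch merged in, recursing into nested dicts."""
    out = dict(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

class UserStore:
    """In-memory stand-in: key is the user id, value is the user record."""

    def __init__(self):
        self._data = {}

    def put(self, user_id, record):
        """Full update: replace the whole record."""
        self._data[user_id] = dict(record)

    def patch(self, user_id, partial):
        """Partial update: change only the supplied fields."""
        self._data[user_id] = merge(self._data.get(user_id, {}), partial)

    def get(self, user_id):
        """Single get."""
        return self._data.get(user_id)

    def batch_get(self, user_ids):
        """Batch get: return the records that exist, keyed by user id."""
        return {u: self._data[u] for u in user_ids if u in self._data}

store = UserStore()
store.put(42, {"profile": {"religion": "none", "city": "LA"}, "age": 30})
store.patch(42, {"profile": {"city": "SF"}})
print(store.get(42))
```

After the patch only `profile.city` has changed; the rest of the record is untouched, which is what makes partial updates attractive when a record holds 1000+ attributes.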
Current Day › Pairing Service
Pairing Service is a data service that supports a specialized set of operations
• REST + Protocol buffer based
  • GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>
  • DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id>
  • 4 data attributes per pairing
  • 0 to tens of thousands of pairings per user
• Stores: 1 per type
• Key: Userid
• Value: PairingsProto
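The operations listed above map naturally onto one store per pairing type. The sketch below is an in-memory illustration only. The type name "matched" and the "score" attribute are hypothetical, since the slides do not name the pairing types or the four attributes.

```python
from collections import defaultdict

class PairingStore:
    """In-memory stand-in: one store per pairing type, keyed by user id."""

    def __init__(self):
        # pairing type -> user id -> {candidate id: attributes}
        self._stores = defaultdict(dict)

    def put(self, ptype, user_id, pairings):
        """PUT .../pairings/<type>/users/<user-id>"""
        self._stores[ptype][user_id] = dict(pairings)

    def get(self, ptype, user_id):
        """GET .../pairings/<type>/users/<user-id>"""
        return self._stores[ptype].get(user_id, {})

    def delete(self, ptype, user_id):
        """DELETE .../pairings/<type>/users/<user-id>"""
        self._stores[ptype].pop(user_id, None)

    def delete_candidate(self, ptype, user_id, candidate_id):
        """DELETE .../users/<user-id>/candidates/<candidate-id>"""
        self._stores[ptype].get(user_id, {}).pop(candidate_id, None)

pairings = PairingStore()
pairings.put("matched", 1, {2: {"score": 0.9}, 3: {"score": 0.4}})
pairings.delete_candidate("matched", 1, 3)
after_candidate_delete = pairings.get("matched", 1)
pairings.delete("matched", 1)
after_user_delete = pairings.get("matched", 1)
print(after_candidate_delete, after_user_delete)
```

Keeping all of a user's pairings under one key means a single read returns the whole list, at the cost of rewriting that value whenever one candidate is removed.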
Current Day › Scoring Service
Scoring Service is a stateless calculation service that supports JSON based models
• REST + Protocol buffer based
  • GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score
  • POST /scoring-service/<version>/models/<modelname>/score
• Knows how to fetch data from data sources for some models
• All models slowly being centralized in one place
• Underlying library supports any protobuf or map
• Possible candidate for redesign?
Current Day › Model Frameworks 3.0
Model Frameworks 3.0 is the core library behind all scoring
• JSON based model definitions
• Scala DSL implementation with bytecode generation
• Supports Protobufs (Message), ResultSet, Maps
• Examples
  • "same_religion" : "{user.profile.religion} == {cand.profile.religion}"
  • "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
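To make the two example definitions concrete, here is a small interpreter for expressions of that shape. It is a sketch under loud assumptions: the real library is a Scala DSL that generates bytecode, whereas this version substitutes `{path}` placeholders and uses Python's `eval`. The semantics of `bin` (index of the interval containing the value) and the bin edges are guesses, and the sample data is invented.

```python
import re
from bisect import bisect_right

PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

def lookup(data, path):
    """Resolve a dotted path such as 'user.profile.religion' in nested dicts."""
    value = data
    for part in path.split("."):
        value = value[part]
    return value

def bin_index(edges, value):
    """ASSUMED meaning of bin(): index of the interval that contains value."""
    return bisect_right(edges, value)

def evaluate(expression, data, constants=None):
    """Evaluate one model expression against a user/candidate pair."""
    names = {"bin": bin_index}
    names.update(constants or {})
    refs = []

    def substitute(match):
        # Replace each {path} with a reference into the refs list.
        refs.append(lookup(data, match.group(1)))
        return "_ref[%d]" % (len(refs) - 1)

    source = PLACEHOLDER.sub(substitute, expression)
    names["_ref"] = refs
    # eval without builtins is tolerable in a sketch, not for untrusted input.
    return eval(source, {"__builtins__": {}}, names)

pair = {
    "user": {"profile": {"religion": "A"}, "calculatedValues": {"age": 34}},
    "cand": {"profile": {"religion": "A"}, "calculatedValues": {"age": 29}},
}
model = {
    "same_religion": "{user.profile.religion} == {cand.profile.religion}",
    "bin_age_diff": "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})",
}

features = {}
for name, expr in model.items():
    features[name] = evaluate(expr, pair, {"bins": [-10, -5, 0, 5, 10]})
print(features)
```

Because the definitions are plain strings, a new feature can be shipped as configuration rather than as a code release, which is the motivation given earlier for moving away from Java and Groovy models.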
Current Day › Offline Matching – Spring Conductor
Current Day › Offline Matching – Hadoop flow
linkedin.com/in/rbarker1

Dating with Models

  • 1. 1 Dating with ModelsDating with Models A How to Guide for Programmers and ArchitectsA How to Guide for Programmers and Architects Ryan BarkerRyan Barker
  • 2. The eHarmony Difference ›The eHarmony Difference › How are we different? • 30+ years as clinical psychologist and marriage counselor • Many failing marriages due to fundamental incompatibility Can we do better?
  • 3. The fundamental idea›The fundamental idea› 320 Questions › Personality › Values › Attitudes › Beliefs Compatibility Matching
  • 4. Compatibility Matching ›Compatibility Matching › Obstreperousness
  • 6. Compatibility Matching ›Compatibility Matching › 29 Dimensions®
  • 7. So lets build it! ›So lets build it! › Models as a stored procedure~2001
  • 8. Problems ›Problems › Stored procedures are awesome • Problem #1 – Thousands of users, very few matches. Entire company is at stake • Resolution – Line by line debugging of stored procedure finds an AND that should be an OR • Problem #2 – Database load increasing • Resolution – Optimize stored procedure? More hardware? Rewrite? • Problem #3 – Order by compatibility does not work • Resolution – Change stored procedure? Find a way to introduce models
  • 9. Match Distribution 3 Compatibility Matching 1 Affinity Matching 2 The eHarmony Difference ›The eHarmony Difference › Compatibility Matching System® Layers on Top of Compatibility Matching
  • 10. 61 21 3000 Affinity Matching ›Affinity Matching ›
  • 12. Affinity Matching › Distance, Prob( )
  • 13. Affinity Matching › Distance
  • 14. Affinity Matching › Height difference, Prob( ), 4-8 in (cm)
  • 15. Affinity Matching › “Attractiveness”, Prob( )
  • 16. Redesign › Event based matching with Java/Groovy models
  • 17. Problems › Better but still suboptimal • Problem #1 – Suboptimal distribution of matches • Resolution – Shuffle loop order each day? Introduce an optimizer! • Problem #2 – Nightly match run taking 27 hours, heavy database load • Resolution – Move to an offline process • Problem #3 – Java models require testing and new releases. Groovy models are too slow • Resolution – Change to configuration based models
  • 18. The eHarmony Difference › Compatibility Matching System® 1. Compatibility Matching 2. Affinity Matching 3. Match Distribution: delivering the right matches at the right time to as many people as possible across the entire network.
  • 19. Match Distribution › Graph optimization, Prob( | data) (figure labels: 2, 21)
  • 20. Match Distribution › Graph optimization, Prob( | data) (figure labels: 2, 2)
  • 21. Match Distribution › Graph optimization, Prob( | data) (figure labels: 2, 2)
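The slides do not say which optimizer is used for match distribution, so the following is only a sketch of the problem shape: each candidate pair carries a probability, each user can receive a limited number of matches per day, and we want a good overall selection. The function name `distribute`, the `capacity` map, and the example scores are all hypothetical, and the greedy rule is a stand-in heuristic, not eHarmony's algorithm.

```python
def distribute(scores, capacity):
    """Greedily pick pairs by descending score while both users have capacity left.

    scores:   {(user, candidate): probability of mutual interest}  (illustrative)
    capacity: {person: maximum matches to deliver today}            (illustrative)
    """
    remaining = dict(capacity)
    chosen = []
    for (a, b), _score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if remaining.get(a, 0) > 0 and remaining.get(b, 0) > 0:
            chosen.append((a, b))
            remaining[a] -= 1
            remaining[b] -= 1
    return chosen


scores = {
    ("adam", "julia"): 0.9,
    ("adam", "kate"): 0.8,
    ("bob", "julia"): 0.7,
    ("bob", "kate"): 0.2,
}
one_each = {"adam": 1, "bob": 1, "julia": 1, "kate": 1}
print(distribute(scores, one_each))
```

With one match per person, the greedy pass picks adam/julia and bob/kate for a total of 1.1, while pairing adam/kate and bob/julia would total 1.5. That gap is the kind of suboptimal distribution that motivates a global optimizer over the whole graph rather than a per-user loop.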
  • 23. Match Distribution › Does it work?
  • 24. Problems › The design is never finished • Problem #1 – More data required • Resolution – Build services to collect data in real time • Problem #2 – Bandwidth limitations • Resolution – Switch to protocol buffers • Problem #3 – Can’t reprocess people fast enough due to database load • Resolution – Switch to key value store backed services
  • 25. Rearchitecture › Services for everything
  • 26. Rearchitecture › Service features • RESTful data oriented design • Single element: GET – Return single element; POST – Update single element; PUT – Create single element; DELETE – Delete single element • Multiple element: GET – Return list of elements • Produces/Consumes JSON or Protobuf • JAX-RS providers transparently convert between formats • Accept/Content-Type: X-application-protobuf
  • 27. Rearchitecture › Service Client features • Generic client customized for each service • Single element: GET – Return single element; POST – Update single element; PUT – Create single element; DELETE – Delete single element • Multiple element: GET – Return list of elements; BATCH – Scatter gather implementation • Protocol buffer based by default, falls back to JSON for older services • Configurable retries for GET/PUT/DELETE
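The BATCH operation and the configurable retries on the client slide can be sketched as below. This is a minimal illustration, assuming a caller-supplied `fetch(key)` function that performs one GET and raises `IOError` on a transient failure; `with_retries`, `batch_get`, and their parameters are hypothetical names, not the real client's API.

```python
from concurrent.futures import ThreadPoolExecutor


def with_retries(fetch, retries):
    """Wrap fetch so a failed call is retried up to `retries` more times."""
    def wrapped(key):
        last_error = None
        for _ in range(retries + 1):
            try:
                return fetch(key)
            except IOError as err:
                last_error = err
        raise last_error
    return wrapped


def batch_get(fetch, keys, retries=2, workers=8):
    """Scatter one GET per key across a thread pool, then gather results by key."""
    call = with_retries(fetch, retries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(call, keys))
    return dict(zip(keys, results))
```

Retrying blindly is only safe for idempotent calls, which is consistent with the slide listing retries for GET/PUT/DELETE and not for POST.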
  • 28. Current Day › Matching User Service. Matching User Service is a data aggregation service that gathers data from various sources and stores it in a key value store • REST + Protocol buffer based • /user-service/<version>/users/<user-id> • Supports full and partial updates • Supports single and batch gets • 1000+ data attributes, ~4KB each uncompressed • Key: Userid • Value: UserProto
  • 29. Current Day › Matching User Service
  • 30. Current Day › Matching User Service
  • 31. Current Day › Matching User Service
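The difference between full and partial updates, and between single and batch gets, can be illustrated with a small in-memory stand-in for the key value store. This is a sketch under stated assumptions: `UserStore` and its method names are hypothetical, plain dicts stand in for `UserProto`, and nothing here reflects the actual service code.

```python
class UserStore:
    """In-memory stand-in for a key-value store keyed by user id."""

    def __init__(self):
        self._records = {}

    def put(self, user_id, attributes):
        """Full update: replace the stored record."""
        self._records[user_id] = dict(attributes)

    def patch(self, user_id, attributes):
        """Partial update: merge the given attributes into the stored record."""
        self._records.setdefault(user_id, {}).update(attributes)

    def get(self, user_id):
        """Single get: return a copy of the record, or None if absent."""
        record = self._records.get(user_id)
        return None if record is None else dict(record)

    def get_many(self, user_ids):
        """Batch get: return records for the ids that exist."""
        return {uid: dict(self._records[uid])
                for uid in user_ids if uid in self._records}
```

A partial update only needs to send the attributes that changed, which matters when a full record holds 1000+ attributes.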
  • 32. Current Day › Pairing Service. Pairing Service is a data service that supports a specialized set of operations • REST + Protocol buffer based • GET/PUT/DELETE /pairings-service/<version>/pairings/<type>/users/<user-id> • DELETE /pairings-service/<version>/pairings/<type>/users/<user-id>/candidates/<candidate-id> • 4 data attributes per pairing • 0 to tens of thousands of pairings per user • Stores: 1 per type • Key: Userid • Value: PairingsProto
  • 33. Current Day › Scoring Service. Scoring Service is a stateless calculation service that supports JSON based models • REST + Protocol buffer based • GET /scoring-service/<version>/users/<user-id>/models/<modelname>/score • POST /scoring-service/<version>/models/<modelname>/score • Knows how to fetch data from data sources for some models • All models slowly being centralized in one place • Underlying library supports any protobuf or map • Possible candidate for redesign?
  • 34. Current Day › Model Frameworks 3.0. Model Frameworks 3.0 is the core library behind all scoring • JSON based model definitions • Scala DSL implementation with bytecode generation • Supports Protobufs (Message), ResultSet, Maps • Examples • "same_religion" : "{user.profile.religion} == {cand.profile.religion}" • "bin_age_diff" : "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})"
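To make the two example definitions concrete, here is a toy interpreter for that expression style. It is not the Scala DSL with bytecode generation that the slide describes; it only shows how `{dotted.path}` placeholders could be resolved against nested maps. The names `evaluate`, `resolve`, and `bin_index` are hypothetical, and the meaning of `bin()` (index of the interval that contains the value) is an assumption, since the slides do not define it.

```python
import re
from bisect import bisect_right

PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def resolve(data, path):
    """Walk a dotted path such as 'user.profile.religion' through nested dicts."""
    for part in path.split("."):
        data = data[part]
    return data


def bin_index(edges, value):
    """Assumed meaning of bin(): index of the interval of `edges` holding value."""
    return bisect_right(edges, value)


def evaluate(expression, data, constants=None):
    """Evaluate one model expression against a user/candidate data map."""
    # Rewrite each {dotted.path} placeholder into a lookup call.
    source = PLACEHOLDER.sub(lambda m: "_get(%r)" % m.group(1), expression)
    names = {"_get": lambda path: resolve(data, path), "bin": bin_index}
    names.update(constants or {})
    # eval keeps the sketch short; builtins are disabled, but this is not a sandbox.
    return eval(source, {"__builtins__": {}}, names)


pair = {
    "user": {"profile": {"religion": "A"}, "calculatedValues": {"age": 34}},
    "cand": {"profile": {"religion": "A"}, "calculatedValues": {"age": 29}},
}
same = evaluate("{user.profile.religion} == {cand.profile.religion}", pair)
age_bin = evaluate(
    "bin(bins, {user.calculatedValues.age} - {cand.calculatedValues.age})",
    pair,
    {"bins": [-5, 0, 5, 10]},
)
```

Keeping models as data like this is what lets them change through configuration instead of a new release, which is the resolution named on the earlier Problems slide.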
  • 35. Current Day › Offline Matching – Spring Conductor
  • 36. Current Day › Offline Matching – Hadoop flow
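Editor's note 9 describes the offline flow as joining in RAM, record by record, by loading each side of a potential match on demand from a cache instead of doing the join on disk. A minimal sketch of that idea follows, assuming a dict-like `store` as a stand-in for the cache and a caller-supplied `score` function; `join_in_memory` and both parameter names are hypothetical.

```python
def join_in_memory(pairs, store, score):
    """Yield (user_id, cand_id, score) by looking both sides up per record."""
    for user_id, cand_id in pairs:
        user = store.get(user_id)
        cand = store.get(cand_id)
        if user is None or cand is None:
            continue  # skip pairs whose data is missing from the store
        # Only the small result tuple is kept; the looked-up records are
        # not retained once the pair has been scored.
        yield user_id, cand_id, score(user, cand)
```

Because this is a generator, only one pair's data is held at a time, which mirrors the note's point about throwing data away once it has been used.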

Editor's Notes

  1. Hello, and thank you. It's a pleasure to be here. Today I am here to talk about what is happening behind the scenes at eHarmony. We were one of the first companies to apply sophisticated technology to the very old concept of matchmaking. eHarmony takes a very different approach from other online dating sites, … search-based. On those sites, you determine your preferences – and filter out. That's one valid approach. But eHarmony is different.
  2. eHarmony was created to give people a better chance and a better way to find a great long-term relationship. Many of you may know from our old television commercials that eHarmony was founded by Dr. Neil Clark Warren. You may not know that he was a clinical psychologist and marriage counselor in Pasadena, California for more than 30 years. A lot of the couples Dr. Warren counseled were in failing marriages. Over the years, he realized that marriages often fall apart when the people in them are fundamentally incompatible. Dr. Warren believed that the best way to create happier marriages and reduce some of the negative effects of divorce was to give people a better chance of marrying the right person in the first place. That insight led to a lot of questions: What makes some couples more satisfied in their relationships over time than others? Can long-term relationship satisfaction be predicted? If so, can those qualities be used to match single people? Dr. Warren and the founding team at eHarmony began researching those questions by studying several thousand married couples. They discovered that there are common traits that distinguish the most satisfied married couples from others. Thus, in the late 90s, eHarmony was born.
  3. That is compatibility matching. Similarity on dims that don't get discussed. When asked: “Are you happy with yourself?” Important, but not a pickup line. That's why RQ. A very good snapshot of personality.
  4. Core traits and vital attrs. Core traits: [CLICK TO BUILD] Vital attributes. Initial eH model.
  5. Here is the initial eHarmony model. Only pairs with a high chance to be very happy together are introduced.
  6. If no click, no comm. Compatibility and chemistry are two very different things. Interests provide something to talk about. His matches have to like him back. Affinity Matching is about
  7. Every match eHarmony makes is compatible. That is from the personality perspective. However, not all matches end up talking to each other. Sometimes the age gap could be too big. Other times the users may live too far apart. There are too many reasons to count. We are trying to deliver as many matches as possible where both users are interested in each other, start communicating and get to know each other.
  8. That leads me to the last piece of our matching process, which we call match distribution. We need to make sure that we're presenting the right matches… to the right users… at the right time… to as many people as possible across our entire network, every day. The network changes every day. Let me illustrate this.
  9. Now we're not doing those joins on disk at all. For each potential match we want to process, we can load relevant user data on demand for each side from our Voldemort cache. This was loaded with user data by a previous MapReduce step. Now we're joining in RAM, record by record, on demand. At the end of the evaluation we've actually thrown away most of the data we don't need after we've used it. Did I mention this gave us a 10x speedup over conventional Hadoop joins? It's worth repeating: we got an order of magnitude performance improvement by using this technique.
  10. It worked
  11. How does it work for Adam? Matches. Interested in Julia. Break the ice? Pickup lines are no good.
  12. Doing matchmaking well requires an innate understanding of your customers and the sophistication to use that data to deliver a valuable experience. All the advances in computing power and algorithms have recently opened up a lot of new possibilities and applications. I'm happy to talk with any of you further if you have questions about eHarmony or how to apply matchmaking to your own businesses. Thank you.