Data Science of Love

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/12jQfPk.

Vaclav Petricek digs up some of the nuggets about romantic interactions hidden in eHarmony's large collection of human relationships. Filmed at qconnewyork.com.

Vaclav Petricek is a Principal Data Scientist at Santa Monica-based eHarmony, where he is responsible for optimization and machine learning in eHarmony's core matchmaking algorithms. He also runs a series of invited ML talks at eHarmony. He was a Visiting Researcher at University College London, where his research spanned recommender systems, social networks, web structure, and online auctions.

Data Science of Love

  1. Wednesday, June 12, 13
  2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/eharmony-hadoop
  3. Presented at QCon New York (www.qconnewyork.com). Purpose of QCon: to empower software development by facilitating the spread of knowledge and innovation. Strategy: practitioner-driven conference designed for YOU: influencers of change and innovation in your teams; speakers and topics driving the evolution and innovation; connecting and catalyzing the influencers and innovators. Highlights: attended by more than 12,000 delegates since 2007; held in 9 cities worldwide.
  4. Data Science of Love, Vaclav Petricek (@petricek)
  5. The eHarmony Difference › Who we are: ~45% Tech
  6. The eHarmony Difference › Who we are: ~15% Customer Care, ~45% Tech
  7. The eHarmony Difference › Who we are: ~15% Customer Care, ~45% Tech, ~10% Marketing
  8-9. The eHarmony Difference › Compatibility Matching System®
  10. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching
  11. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching
  12. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution
  13. The eHarmony Difference
  14-15. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution
  18. 150 questions
  19. 150 questions: Personality, Values, Attributes, Beliefs
  20. Compatibility Matching › Obstreperousness
  21. Compatibility Matching › Romantic
  22-24. CMP (CMP Makes Pairings)
  25. CMP (CMP Makes Pairings): Compatibility Models
  26-27. Compatibility Matching ›
  28. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution
  29. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution. Layers on Top of Compatibility Matching
  30. Affinity Matching ›
  31. Affinity Matching › 61 21
  32-33. Affinity Matching › 61 21 3000
  34. Affinity Matching ›
  35. Affinity Matching › ………
  36. Affinity Matching › Distance: Prob( )
  37. Affinity Matching › Distance
  38. Affinity Matching › Height difference: Prob( ) 4 - 8 in cm
  39. Affinity Matching › “Attractiveness”: Prob( )
  40-42. Affinity Matching › Zoom level
  43-44. Affinity Matching › Food preference: 25%, -1%, -24%, 20%, 13% | 9%, -5%, -27%, 7%, 0% | -12%, -21%, -42%, -19%, -23% | 19%, 0%, -28%, 28%, 10% | 9%, -11%, -35%, 11%, 44%
  50. Affinity Matching › ~40M registered users, ~10^7 matches per day, ~10^3 attributes ... Prob( | data) ? ~10^8 daily Prob( | features)
  51. Affinity Matching › ~40M registered users, ~10^7 matches per day, ~10^3 attributes ... Prob( | data) ? ~10^8 daily Prob( | features). Unsupervised features (LDA, classifiers), Constructed features
  52. 1TB RAM
  53. Maestro: Data. Protocol Buffers, distcp
  54. Modeling: Maestro. UserMatchCommunication, feature expansion, Sparse ML format, models
  55. Modeling: Model parametrizations. Model parameters: features, weights, tree splits, Calibration Spline, DISTANCE:534
  56. Modeling: Model parametrizations. Model parameters: features, weights, tree splits, Calibration Spline, DISTANCE:534, DSL
  57. Modeling: Scala DSL "same_religion":"${user.profile.religion}=={cand.profile.religion}" "cmp_drinking":"cmp(${user.profile.drinking},{cand.profile.drinking})" < "strict_distance_u":"${user.profile.accepted_distance}<={pairing.distance}" 60miles
  58. Production: Spring Conductor. 750M Compressed Protocol Buffers, Map-side joins (TB), Matching User Service, Pairings Browser Service, 1+G Compressed Protocol Buffers, Scorer
  59. Production: FeatureX (expensive features). FeatureX, LSH, NLP, Voldemort backed Service, ?
  60. Production: User Activity Service. User Activity Service, 10K events/s, Matching User Service, ~5ms response, ?, Event Listener
  61. eHarmony & OpenSource: github.com/petricek/datatools, github.com/eHarmony/seeking, github.com/eHarmony/hive, springsource.org/spring-data/hadoop, github.com/JohnLangford/vowpal_wabbit
  62. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution
  63. The eHarmony Difference › Compatibility Matching System®: (1) Compatibility Matching, (2) Affinity Matching, (3) Match Distribution. Delivering the right matches at the right time to as many people as possible across the entire network.
  64-65. Match Distribution › Graph optimization
  66. Match Distribution › Graph optimization: 2 2
  67. Match Distribution › Graph optimization: 2 2 1
  68-69. Match Distribution › Graph optimization: 2 2 1 Prob( | data)
  70-71. Match Distribution › Graph optimization: 2 2 Prob( | data)
  72-73. Resulting Customer Experience › Guided Communication
  74. Resulting Customer Experience › Guided Communication: ? !
  75-76. Resulting Customer Experience › Success!
  77. eHarmony Results › The eHarmony Impact (2005): 90 eHarmony Members Married Every Day
  78. eHarmony Results › The eHarmony Impact (2005, 2007): 236 eHarmony Members Married Every Day
  79. eHarmony Results › The eHarmony Impact (2005, 2007, 2009): 542 eHarmony Members Married Every Day
  80. Proceedings of National Academy of Sciences
  81. Press coverage
  82. eHarmony Results › The eHarmony Impact: Since 2005, about 1/3 of couples who have married in the US have met online (35%). * according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  83. Rates of breakup or divorce (chart, axis 0% to 8.0%; groups: All Online, Offline). * according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  84. eHarmony Results › The eHarmony Impact: The largest number of marriages surveyed who met via online dating had met on eHarmony (25%). * according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  85. Rates of breakup or divorce (chart, axis 0% to 8.0%; groups: eHarmony, All Other Online, Offline). * according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  86. Rates of breakup or divorce (chart, axis 0% to 8.0%; groups: eHarmony, All Other Online, Offline). * according to survey of couples married between 2005-2012 by Harris Interactive for eHarmony. @petricek, linkedin.com/in/petricek, bit.ly/jobateharmony
  87. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/eharmony-hadoop
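
The feature definitions on slide 57 pair a feature name with an expression over `user`, `cand`, and `pairing` fields. A minimal Python sketch of how such named features might be evaluated is given here; it is not eHarmony's Scala DSL. Only the feature names and field paths come from the slide. The record layout, the sample values, and the three-way semantics of `cmp` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def cmp(a: Any, b: Any) -> int:
    """Three-way comparison returning -1, 0, or 1 (assumed meaning of the DSL's cmp)."""
    return (a > b) - (a < b)


# Feature name -> function of (user, cand, pairing).
# The expressions mirror the three definitions shown on the slide.
FEATURES: Dict[str, Callable[[Record, Record, Record], Any]] = {
    "same_religion": lambda u, c, p: u["profile"]["religion"] == c["profile"]["religion"],
    "cmp_drinking": lambda u, c, p: cmp(u["profile"]["drinking"], c["profile"]["drinking"]),
    "strict_distance_u": lambda u, c, p: u["profile"]["accepted_distance"] <= p["distance"],
}


def expand_features(user: Record, cand: Record, pairing: Record) -> Dict[str, float]:
    """Evaluate every named feature for one (user, candidate) pairing."""
    return {name: float(fn(user, cand, pairing)) for name, fn in FEATURES.items()}


# Hypothetical records; the values are invented for the example.
user = {"profile": {"religion": "A", "drinking": 1, "accepted_distance": 60}}
cand = {"profile": {"religion": "A", "drinking": 3, "accepted_distance": 30}}
pairing = {"distance": 75}

features = expand_features(user, cand, pairing)
print(features)
```

Keeping features as named expressions over a fixed set of records means a model file can refer to them by name, which is one plausible reason for using a DSL here.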

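Slides 64 to 71 annotate a match graph with small per-user numbers and a `Prob( | data)` edge label. One possible reading is a capacity-constrained assignment: each user can receive a limited number of matches, and edges with a higher predicted probability are preferred. The sketch below applies a simple greedy heuristic under that reading. The capacities, probabilities, user names, and the greedy strategy itself are assumptions for illustration, not the method described in the talk.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def distribute_matches(edge_prob: Dict[Pair, float],
                       capacity: Dict[str, int]) -> List[Pair]:
    """Pick pairs in descending probability order without exceeding
    either user's remaining capacity."""
    remaining = dict(capacity)  # copy so the caller's dict is not modified
    chosen: List[Pair] = []
    ranked = sorted(edge_prob.items(), key=lambda kv: kv[1], reverse=True)
    for (a, b), _prob in ranked:
        if remaining[a] > 0 and remaining[b] > 0:
            chosen.append((a, b))
            remaining[a] -= 1
            remaining[b] -= 1
    return chosen


# Hypothetical users, capacities, and predicted probabilities.
capacity = {"alice": 2, "bob": 2, "carol": 1, "dave": 1}
edge_prob = {
    ("alice", "bob"): 0.30,
    ("alice", "dave"): 0.25,
    ("carol", "bob"): 0.20,
    ("carol", "dave"): 0.10,
}

matches = distribute_matches(edge_prob, capacity)
print(matches)
```

A greedy pass is easy to reason about but is not guaranteed to maximize total probability across the network; an exact formulation would treat this as a weighted matching problem with capacities.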