Data Science of Love

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/12jQfPk.

Vaclav Petricek digs up some of the nuggets about romantic interactions hidden in eHarmony's large collection of human relationships. Filmed at qconnewyork.com.

Vaclav Petricek is a Principal Data Scientist at Santa Monica-based eHarmony, where he is responsible for optimization and machine learning in eHarmony's core matchmaking algorithms. He also runs a series of invited ML talks at eHarmony. He was a Visiting Researcher at University College London, where his research spanned recommender systems, social networks, web structure and online auctions.


Transcript

  • 1. Wednesday, June 12, 13
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/eharmony-hadoop
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Data Science of Love Vaclav Petricek @petricek
  • 5. The eHarmony Difference › Who we are ~45% Tech
  • 6. The eHarmony Difference › Who we are ~15% Customer Care ~45% Tech
  • 7. The eHarmony Difference › Who we are ~15% Customer Care ~45% Tech ~10% Marketing
  • 8. The eHarmony Difference › Compatibility Matching System®
  • 9. The eHarmony Difference › Compatibility Matching System®
  • 10. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching
  • 11. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching
  • 12. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution
  • 13. The eHarmony Difference
  • 14. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution
  • 15. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution
  • 18. 150 questions
  • 19. 150 questions: Personality, Values, Attributes, Beliefs
  • 20. Compatibility Matching › Obstreperousness
  • 21. Compatibility Matching › Romantic
  • 22. CMP (CMP Makes Pairings)
  • 23. CMP (CMP Makes Pairings)
  • 24. CMP (CMP Makes Pairings)
  • 25. CMP (CMP Makes Pairings) Compatibility Models
  • 26. Compatibility Matching
  • 27. Compatibility Matching
  • 28. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution
  • 29. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution Layers on Top of Compatibility Matching
  • 30. Affinity Matching
  • 31. Affinity Matching › 61 21
  • 32. Affinity Matching › 61 21 3000
  • 33. Affinity Matching › 61 21 3000
  • 34. Affinity Matching
  • 35. Affinity Matching › ………
  • 36. Affinity Matching › Distance Prob( )
  • 37. Affinity Matching › Distance
  • 38. Affinity Matching › Height difference Prob( ) 4 - 8 in cm
  • 39. Affinity Matching › “Attractiveness” Prob( )
  • 40. Affinity Matching › Zoom level
  • 41. Affinity Matching › Zoom level
  • 42. Affinity Matching › Zoom level
  • 43. Affinity Matching › Food preference: 25% -1% -1% -24% 20% 13% 9% -5% -5% -27% 7% 0% 9% -5% -5% -27% 7% 0% -12% -21% -21% -42% -19% -23% 19% 0% 0% -28% 28% 10% 9% -11% -11% -35% 11% 44%
  • 44. Affinity Matching › Food preference: 25% -1% -1% -24% 20% 13% 9% -5% -5% -27% 7% 0% 9% -5% -5% -27% 7% 0% -12% -21% -21% -42% -19% -23% 19% 0% 0% -28% 28% 10% 9% -11% -11% -35% 11% 44%
  • 50. Affinity Matching › ~40M registered users ~10^7 matches per day ~10^3 attributes ... ... Prob( | data) ? ~10^8 daily Prob( | features)
  • 51. Affinity Matching › ~40M registered users ~10^7 matches per day ~10^3 attributes ... ... Prob( | data) ? ~10^8 daily Prob( | features) Unsupervised features (LDA, classifiers) Constructed features
  • 52. 1TB RAM
  • 53. Maestro: Data Protocol Buffers distcp
  • 54. Modeling: Maestro UserMatchCommunication feature expansion Sparse ML format models
  • 55. Modeling: Model parametrizations Model parameters features weights tree splits Calibration Spline DISTANCE:534
  • 56. Modeling: Model parametrizations Model parameters features weights tree splits Calibration Spline DISTANCE:534 DSL
  • 57. Modeling: Scala DSL "same_religion":"${user.profile.religion}=={cand.profile.religion}" "cmp_drinking":"cmp(${user.profile.drinking},{cand.profile.drinking})" < "strict_distance_u":"${user.profile.accepted_distance}<={pairing.distance}" 60 miles
  • 58. Production: Spring Conductor 750M Compressed Protocol Buffers Map-side joins (TB) Matching User Service Pairings Browser Service 1+G Compressed Protocol Buffers Scorer
  • 59. Production: FeatureX (expensive features) FeatureX LSH NLP Voldemort backed Service ?
  • 60. Production: User Activity Service User Activity Service 10K events/s Matching User Service ~5ms response ? Event Listener
  • 61. eHarmony & OpenSource github.com/petricek/datatools github.com/eHarmony/seeking github.com/eHarmony/hive springsource.org/spring-data/hadoop github.com/JohnLangford/vowpal_wabbit
  • 62. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution
  • 63. The eHarmony Difference › Compatibility Matching System® (1) Compatibility Matching (2) Affinity Matching (3) Match Distribution Delivering the right matches at the right time to as many people as possible across the entire network.
  • 64. Match Distribution › Graph optimization
  • 65. Match Distribution › Graph optimization
  • 66. Match Distribution › Graph optimization 2 2
  • 67. Match Distribution › Graph optimization 2 2 1
  • 68. Match Distribution › Graph optimization 2 2 1 Prob( | data)
  • 69. Match Distribution › Graph optimization 2 2 1 Prob( | data)
  • 70. Match Distribution › Graph optimization 2 2 Prob( | data)
  • 71. Match Distribution › Graph optimization 2 2 Prob( | data)
  • 72. Resulting Customer Experience › Guided Communication
  • 73. Resulting Customer Experience › Guided Communication
  • 74. Resulting Customer Experience › Guided Communication ? !
  • 75. Resulting Customer Experience › Success!
  • 76. Resulting Customer Experience › Success!
  • 77. eHarmony Results › The eHarmony Impact: 90 eHarmony Members Married Every Day (2005)
  • 78. eHarmony Results › The eHarmony Impact: 236 eHarmony Members Married Every Day (2007)
  • 79. eHarmony Results › The eHarmony Impact: 542 eHarmony Members Married Every Day (2009)
  • 80. Proceedings of the National Academy of Sciences
  • 81. Press coverage
  • 82. eHarmony Results › The eHarmony Impact: Since 2005, about 1/3 of couples who have married in the US have met online (35%) * according to a survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  • 83. Rates of breakup or divorce (chart; axis 0% to 8.0%; bars: All, Online, Offline) * according to a survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  • 84. eHarmony Results › The eHarmony Impact: The largest number of marriages surveyed who met via online dating had met on eHarmony (25%) * according to a survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  • 85. Rates of breakup or divorce (chart; axis 0% to 8.0%; bars: eHarmony, All Other Online, Offline) * according to a survey of couples married between 2005-2012 by Harris Interactive for eHarmony
  • 86. Rates of breakup or divorce (chart; axis 0% to 8.0%; bars: eHarmony, All Other Online, Offline) * according to a survey of couples married between 2005-2012 by Harris Interactive for eHarmony @petricek linkedin.com/in/petricek bit.ly/jobateharmony
  • 87. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/eharmony-hadoop
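
Editorial sketch for slide 57: the slide shows feature definitions written as named string expressions over a user, a candidate, and their pairing. The deck does not show how these are evaluated, so the following is only a minimal illustration in Python (not the Scala DSL itself). The evaluator, the function names `lookup`, `evaluate` and `expand`, and the sample profile values are all assumptions; only the three expression strings come from the slide, including the mixed `${...}` and `{...}` placeholder spellings.

```python
import re

# Accept both "${user.profile.x}" and "{cand.profile.x}", since the slide shows both.
PLACEHOLDER = re.compile(r"\$?\{([A-Za-z_][\w.]*)\}")

# Expression strings copied from slide 57; everything else here is hypothetical.
FEATURES = {
    "same_religion": "${user.profile.religion}=={cand.profile.religion}",
    "cmp_drinking": "cmp(${user.profile.drinking},{cand.profile.drinking})",
    "strict_distance_u": "${user.profile.accepted_distance}<={pairing.distance}",
}


def lookup(context, dotted_path):
    """Walk a nested dict along a dotted path such as 'user.profile.religion'."""
    value = context
    for key in dotted_path.split("."):
        value = value[key]
    return value


def evaluate(expression, context):
    """Evaluate one expression of the form 'A==B', 'A<=B' or 'cmp(A,B)'."""
    paths = PLACEHOLDER.findall(expression)
    if len(paths) != 2:
        raise ValueError(f"expected two placeholders in {expression!r}")
    left, right = (lookup(context, path) for path in paths)
    # Replace each placeholder with "_" to recognise the operator shape.
    skeleton = PLACEHOLDER.sub("_", expression).replace(" ", "")
    if skeleton == "_==_":
        return int(left == right)
    if skeleton == "_<=_":
        return int(left <= right)
    if skeleton == "cmp(_,_)":
        return (left > right) - (left < right)  # -1, 0 or 1
    raise ValueError(f"unsupported expression {expression!r}")


def expand(features, context):
    """Turn every named expression into a numeric feature value."""
    return {name: evaluate(expr, context) for name, expr in features.items()}


# Made-up example pairing (distances in miles, drinking on an arbitrary 0-3 scale).
example = {
    "user": {"profile": {"religion": "none", "drinking": 2, "accepted_distance": 60}},
    "cand": {"profile": {"religion": "none", "drinking": 1}},
    "pairing": {"distance": 25},
}
expanded = expand(FEATURES, example)
```

Keeping the expressions as data rather than code means new features can be added without touching the evaluator, which may be one reason to use a DSL here; the deck itself does not state the motivation.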
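
Editorial sketch for slides 54 to 55: the deck mentions a "Sparse ML format" and shows a `DISTANCE:534` token, which resembles the `name:value` feature notation used in the text input of Vowpal Wabbit (listed on slide 61). Whether eHarmony's pipeline emitted exactly this format is an assumption. The helper `to_sparse_line` and the feature values other than `DISTANCE:534` are hypothetical.

```python
def to_sparse_line(label, features):
    """Render one example as 'label | name:value ...', omitting zero-valued features.

    This follows the simplest shape of Vowpal Wabbit's text input (one unnamed
    namespace). Feature names must not contain spaces, ':' or '|'.
    """
    parts = []
    for name in sorted(features):
        value = features[name]
        if value == 0:
            continue  # sparse representation: zeros are simply left out
        parts.append(f"{name}:{value:g}")
    return f"{label} | " + " ".join(parts)


# Label 1 stands for "communication happened" in this made-up example.
line = to_sparse_line(1, {"DISTANCE": 534, "same_religion": 1, "cmp_drinking": 0})
```

With on the order of 10^3 attributes per pairing (slide 50), leaving out zero-valued features keeps each training line short.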
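
Editorial sketch for slides 64 to 71: the match distribution slides show a graph whose edges carry a probability and whose nodes carry small numbers (2, 2, 1). Reading those numbers as per-user limits on how many matches may be delivered is an assumption, and the deck does not say which optimization method is used. The greedy heuristic below is a swapped-in stand-in that only illustrates the shape of the problem; it is not guaranteed to find the best overall assignment, and all names and numbers are made up.

```python
def distribute_matches(scored_pairs, capacity):
    """Greedy heuristic: deliver pairs in order of decreasing probability
    while both users still have remaining capacity."""
    remaining = dict(capacity)
    delivered = []
    for user, candidate, prob in sorted(scored_pairs, key=lambda p: p[2], reverse=True):
        if remaining.get(user, 0) > 0 and remaining.get(candidate, 0) > 0:
            delivered.append((user, candidate, prob))
            remaining[user] -= 1
            remaining[candidate] -= 1
    return delivered


# Hypothetical scores, e.g. from a model estimating Prob(communication | data).
pairs = [
    ("a", "x", 0.9), ("b", "x", 0.8), ("a", "y", 0.5),
    ("b", "y", 0.4), ("b", "z", 0.3), ("a", "z", 0.2),
]
caps = {"a": 2, "b": 2, "x": 1, "y": 1, "z": 1}
delivered = distribute_matches(pairs, caps)
```

In this example user `b` loses both of its better candidates to `a`, which shows why a global formulation (for instance a flow or assignment problem) can do better than picking the highest-scoring pairs one at a time.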