
The Recommendation Engine: A Case Study

One of the surest ways to make your data science and machine learning work for you is to start with low-hanging fruit. Recommender systems have proven to be one of the most useful applications of data science to the consumer-facing web since the earliest days of the internet. This talk covers why and how one was built to recommend colleges to high school students, how popularity tables and collaborative filters were applied, and what other approaches were tried and why, sprinkled with some war stories about their successes and failures. Hopefully, after this, you can find ways for your data to work for your users, transparently improving their interaction with your websites, instead of sitting in a back office somewhere helping some executive add graphs to their TPS reports.

Published in: Internet

  1. 1. THE RECOMMENDATION ENGINE: A CASE STUDY TERRY CHAY (@TYCHAY) HEAD OF ENGINEERING, CLARA HEALTH SUNSHINEPHP 2019, MIAMI, FLORIDA 2019-02-09T16:00-17:00 EST HTTPS://JOIND.IN/TALK/76824
  2. 2. WHO AM I? TERRY CHAY
  3. 3. PHP SOFTWARE PHP PROGRAMMER SINCE 2000 DIRECTOR OF ENGINEERING, QIXO 2000-2001 (FIRST TRAVEL SEARCH ENGINE) SENIOR WEB ENGINEER, MYCASA NETWORK 2002-2004 ("INTERNET OF THINGS") SCIENTIST, PLAXO 2004-2006 (ONE OF THE FIRST "VIRAL-TUNED" WEBSITES) SOFTWARE ARCHITECT, TAGGED, 2007-2009 (3RD LARGEST SOCIAL NETWORK IN THE US) PLANET TAKER, AUTOMATTIC 2009-2012 (AKA "WORDPRESS") DIRECTOR OF FEATURES ENGINEERING, WIKIMEDIA FOUNDATION, 2012-2014 (AKA "WIKIPEDIA")
  5. 5. YES, I WAS A PHP PROGRAMMER… SOMEWHERE IN THERE THIS HAPPENED…
  6. 6. AT OSCON ONE DAY… “…AND TERRY STARTED TAKING PICTURES AS TERRY DOES WITH EVERYTHING.” —CAL EVANS, EDITOR OF ZEND DEV ZONE, IN A PRO::PHP WEBCAST ABOUT THE ORIGIN OF PHP CARDS (2005)
  7. 7. AT OSCON ONE DAY… ANDREI ZMIEVSKI: LEAD DEVELOPER PHP 6 WEZ FURLONG: KING OF PECL IF YOU ACTIVATE THIS CARD’S PROFANITY SPECIAL ABILITY, THE AUDIENCE IS FROZEN FOR ONE TURN. ME! THE GEORGE SCHLOSSNAGLE CARD LOOKS A LOT LIKE ZAK GREANT. RARE & POWERFUL!
  8. 8. AT OSCON ONE DAY…
  9. 9. AT ZENDCON ONE DAY…
  10. 10. PHPTERRORIST "CHAY" GUEVARA POWERED BY: THE BLOOD OF YOUNG RUBY DEVELOPERS PHP TERRORIST
  11. 11. ABOUT RUBY ON RAILS THE YEAR BEFORE…
  12. 12. ABOUT RUBY ON RAILS 2006 - YEAR OF THE DOG “IS PHP DOOMED?" “FIRST THEY IGNORE YOU, THEN THEY LAUGH AT YOU, THEN THEY FIGHT YOU, THEN YOU WIN.” —MAHATMA GANDHI “UNLESS YOU’RE RUBY.” — DANNY O'BRIEN, OSCON, 2006 GANDHI STATE DIAGRAM: IGNORE → LAUGH → FIGHT → YOU WIN! RUBY ON RAILS STATE DIAGRAM (YEAR OF THE DOG - 2006): IGNORE → RUBY WINS! THAT'S SOME SERIOUS OPTIMIZATION! (RUBY MUST HAVE A GREAT PORT OF XDEBUG)
  13. 13. ABOUT RUBY ON RAILS 2006 - YEAR OF THE DOG TIOBE LANGUAGE OF THE YEAR GANDHI STATE DIAGRAM: IGNORE → LAUGH → FIGHT → YOU WIN! RUBY ON RAILS STATE DIAGRAM (YEAR OF THE DOG - 2006): IGNORE → RUBY WINS!
  14. 14. ABOUT RUBY ON RAILS 2006 - YEAR OF THE DOG TIOBE LANGUAGE OF THE YEAR 2007 - YEAR OF THE PIG MY BIRTH YEAR 2007 - “IS RUBY THE DOG AND PHP THE DOGFOOD?” 2019 - ALSO YEAR OF PIG GANDHI STATE DIAGRAM: IGNORE → LAUGH → FIGHT → YOU WIN! RUBY ON RAILS STATE DIAGRAM (YEAR OF THE PIG - 2007): IGNORE → RUBY WINS! I LAUGH
  15. 15. TIOBE SOFTWARE RANKING (2002, 2006, 2019) PHP: #5 (LoY), #6, #7; Ruby: #39, #8 (LoY), #16 RUBY ON RAILS STATE DIAGRAM (YEAR OF THE PIG - 2019): IGNORE → RUBY WINS! I LAUGH STILL LAUGHING!
  16. 16. ABOUT FRAMEWORKS “RAILS IS LIKE A ROUNDED RECTANGLE AND PHP IS LIKE A BALL OF NAILS.” — ME (2007 -YEAR OF THE DOG)
  17. 17. “WHEN I SAY THAT PHP IS A BALL OF NAILS, BASICALLY, PHP IS JUST THIS PIECE OF SHIT THAT YOU JUST PUT TOGETHER—PUT ALL THE PARTS TOGETHER—AND YOU THROW IT AGAINST THE WALL AND IT FUCKING STICKS.” — ME (2007 -YEAR OF THE DOG) IN THIS TALK, WHEN I SAY… → YOU CAN DO IT IN…: RUBY, PYTHON, JAVA → PHP; RUBY ON RAILS, DJANGO → LARAVEL, CODEIGNITER, SYMFONY, CAKEPHP, WORDPRESS…; NUMPY, PANDAS → R, MATLAB, <INSERT MATRIX LIBRARY>…
  18. 18. 2018-CURRENT CLARA HEALTH HEAD OF ENGINEERING PATIENT-CENTRIC APPROACH TO CONNECTING PEOPLE TO CLINICAL TRIALS PYTHON/DJANGO
  19. 19. 2016-2018 RAISEME PRINCIPAL ENGINEER HELP STUDENTS EARN MONEY FOR COLLEGE IN THE FORM OF MICRO-SCHOLARSHIPS RUBY ON RAILS
  20. 20. DAVE CTO & CO-FOUNDER RAISEME (BY ANSWERING AN AD ON CRAIGSLIST HIS WIFE FOUND) — HIRED ME BEEN A CONSULTANT SINCE THE LATE 1990'S (BEEN DOING THIS LONGER THAN ME!) BEFORE THAT WAS IN A ROCK BAND (CAN STILL DOWNLOAD HIS MUSIC ON SPOTIFY) QUIT SMOKING BY TAKING UP CROSSFIT DOG NAMED BUFFY (HE IS REALLY INTO BUFFY THE VAMPIRE SLAYER)
  21. 21. CLARKE’S THREE LAWS (NOT IN ORDER)
  22. 22. 1. CLARKE'S FIRST LAW
  23. 23. “WHEN A DISTINGUISHED BUT ELDERLY SCIENTIST STATES THAT SOMETHING IS POSSIBLE, HE IS ALMOST CERTAINLY RIGHT. WHEN HE STATES THAT SOMETHING IS IMPOSSIBLE, HE IS VERY PROBABLY WRONG.” — ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1962)
  24. 24. I THINK IT SHOULD BE POSSIBLE TO RECOMMEND COLLEGES TO STUDENTS. IT'LL BE LIKE COLLEGE SEARCH BUT WITHOUT ANY INPUT FROM THE USER. MY DISTINGUISHED, BUT ELDERLY CTO
  25. 25. …AND I TAKE EXCEPTION TO "ELDERLY." I DO CROSSFIT SO I'M QUITE SPRY. MY DISTINGUISHED, BUT ELDERLY CTO
  26. 26. …AND I TAKE EXCEPTION TO "ELDERLY." I DO CROSSFIT SO I'M QUITE SPRY. MY DISTINGUISHED, BUT ELDERLY CTO IN FACT, YOU SHOULD TRY CROSSFIT TOO, TERRY. IT'D BE IMPOSSIBLE FOR YOU NOT TO LOVE IT LIKE I DO! “VERY PROBABLY WRONG”
  27. 27. WHAT IS A RECOMMENDATION ENGINE?
  28. 28. WHAT IS A RECOMMENDATION ENGINE? A RECOMMENDER SYSTEM OR A RECOMMENDATION SYSTEM (SOMETIMES REPLACING "SYSTEM" WITH A SYNONYM SUCH AS PLATFORM OR ENGINE) IS A SUBCLASS OF INFORMATION FILTERING SYSTEM THAT SEEKS TO PREDICT THE "RATING" OR "PREFERENCE" A USER WOULD GIVE TO AN ITEM. —WIKIPEDIA
  29. 29. WHAT IS A RECOMMENDATION ENGINE? USES ANALYTICAL DATA (AS OPPOSED TO TRANSACTIONAL DATA) FOR A USER-CENTRIC PURPOSE (AS OPPOSED TO A BUSINESS-CENTRIC ONE).
  30. 30. ANALYTICAL DATA BUSINESS-CENTRIC TRANSACTIONAL DATA USER-CENTRIC
  31. 31. ANALYTICAL DATA VS. TRANSACTIONAL DATA, BUSINESS-CENTRIC VS. USER-CENTRIC. TRANSACTIONAL DATA: USER WANTS TO LOG IN. SELECT PASSWORD, USER_ID FROM USERS WHERE EMAIL = 'TYCHAY@PHP.NET' ONLINE TRANSACTION PROCESSOR (OLTP) (E.G. RDBMS): MYSQL/POSTGRESQL • ROW-BASED RELATIONAL STORAGE • PARTITIONING • AWS RDS. ANALYTICAL DATA: HOW MANY USERS DID WE HAVE IN THE LAST MONTH? SELECT COUNT(DISTINCT U.USER_ID) FROM USERS AS U INNER JOIN ACTIVITY AS A ON A.USER_ID = U.USER_ID WHERE A.TIMESTAMP > NOW() - INTERVAL '30 DAYS' ONLINE ANALYTICAL PROCESSOR (OLAP) (E.G. "DATA WAREHOUSE"): PARACCEL (AT TAGGED) • COLUMNAR-STORAGE • MPP • AWS REDSHIFT (AT RAISEME). ETL: EXTRACT, TRANSFORM, LOAD. RECOMMENDATION ENGINE
  32. 32. AWESOME COLLEGE DISCOVERY CREW ACDC? YEAH!!!
  33. 33. IMPROVE STUDENT-COLLEGE MATCHING STUDENTS ARE MORE ACTIVE IF THEY FIND SCHOOLS THEY CARE ABOUT. BUSINESS REVENUE WAS BASED ON APPLICATION RATES OF FOLLOWED COLLEGES
  34. 34. MAKE PRODUCT FLOW INTUITIVE GETTING MORE/BETTER DATA WILL IMPROVE THE MATCHING
  35. 35. “WHEN A DISTINGUISHED BUT ELDERLY SCIENTIST STATES THAT SOMETHING IS POSSIBLE, HE IS ALMOST CERTAINLY RIGHT. WHEN HE STATES THAT SOMETHING IS IMPOSSIBLE, HE IS VERY PROBABLY WRONG.” — ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1962) “I THINK IT SHOULD BE POSSIBLE TO RECOMMEND COLLEGES TO STUDENTS. IT'LL BE LIKE COLLEGE SEARCH BUT WITHOUT ANY INPUT FROM THE USER.” — DAVE, CO-FOUNDER, CTO RAISEME & MY BOSS (2017)
  36. 36. ALMOST CERTAINLY RIGHT AMAZON “CUSTOMERS ALSO BOUGHT” AMAZON “FREQUENTLY BOUGHT TOGETHER” AMAZON “RECOMMENDATION TO YOU…”
  37. 37. ALMOST CERTAINLY RIGHT NETFLIX “BECAUSE YOU WATCHED ORANGE IS THE NEW BLACK” NETFLIX “POPULAR ON NETFLIX” NETFLIX “% MATCH”
  38. 38. RECOMMENDATIONS EVERYWHERE! TWITTER “WHO TO FOLLOW” “TRENDING NOW”, “MOMENTS” SPOTIFY “RECOMMENDED SONGS” “RELATED ARTISTS”, “YOUR DAILY MIXES”, RADIO OKCUPID “DOUBLETAKE” “BROWSE MATCHES”, % MATCH
  39. 39. REQUEST PIPELINE (JAVASCRIPT, RUBY) GET BUCKETS /COLLEGES/RECOMMENDATIONS: FRONT-END ASKS API FOR LIST OF BUCKETS. LIST OF BUCKETS /V1/COLLEGES/DISCOVER: LIKE "MOVIES WITH A STRONG FEMALE LEAD” BUT ARE THINGS LIKE "BECAUSE YOU LIVE IN THE WEST” (STORING "EXTRA" DATA ON SERVER). GET RECOMMENDATIONS FOR BUCKET /V1/COLLEGES/DISCOVER/WEST: RUBY API IS GIVEN BUCKET NAME (E.G. WEST) AND USES "EXTRA" DATA TO CALL GO API FOR RECOMMENDATIONS. RENDER RESULTS IN BUCKET /COLLEGES/RECOMMENDATIONS: SIMILAR TO HOW COLLEGE SEARCH RENDERS SEARCH RESULTS.
  40. 40. THE RECOMMENDATION ENGINE (JAVASCRIPT, RUBY, GOLANG, PYTHON) THIS IS HOW THE REC ENGINE GENERATES RECOMMENDATIONS FOR A BUCKET "WEST". ROR API FOR BUCKET “WEST” /V1/COLLEGES/DISCOVER/WEST: RUBY API IS GIVEN BUCKET NAME. LOOK UP EXTRA DATA /V1/COLLEGES/DISCOVER/WEST: {"CONTENT"=>{"STATES"=>["CA", "AK", "AZ", "CO", "HI", "ID", "MT", "NM", "NV", "OR", "UT", "WA", "WY"]}, "RANKING"=>"STUDENT-STUDENT", "SCHOOL_ID"=>"535432DA385C8D490D000001", "STATE"=>"CA", "USERID"=>"5992010DE87EB427B1ECE416", "ZIPCODE"=>"94121"}. POST EXTRA DATA TO GO API HTTPS://PROD-RECOMMEND.RAISE.ME/API/V1/DISCOVER/: CURL -I -H "CONTENT-TYPE: APPLICATION/JSON" -X POST -D '{"CONTENT":{"STATES":["CA", "AK", "AZ", "CO", "HI", "ID", "MT", "NM", "NV", "OR", "UT", "WA", "WY"]}, "RANKING":"STUDENT-STUDENT", "SCHOOL_ID":"535432DA385C8D490D000001", "STATE":"CA", "USERID":"5992010DE87EB427B1ECE416", "ZIPCODE":"94121"}' HTTPS://PROD-RECOMMEND.RAISE.ME/API/V1/DISCOVER/. EXTRACTS "RANKING" AND UPLOADS RELEVANT DATA TO PYTHON COLLABORATIVE FILTER (OR QUERIES PYTHON PRE-COMPUTED POPULARITY TABLE): RANKING: "STUDENT-STUDENT", STUDENTID: "5992010DE87EB427B1ECE416"
  41. 41. THE RECOMMENDATION ENGINE (JAVASCRIPT, RUBY, GOLANG, PYTHON) PYTHON MAGIC HAPPENS: EXEC /USR/BIN/PYTHON34 STUDENTRANKS.PY 5992010DE87EB427B1ECE416 VIEWS_PLUS_FOLLOWING. IN THIS CASE IT'S A COLLABORATIVE FILTER SO IT QUERIES THE PYTHON MODEL AND GETS BACK A RANKED LIST OF COLLEGE IDS. RUN RESULT THROUGH CONTENT FILTER (RDS): SELECT T1.ID, X.RANK FROM COLLEGE_CONTENT_FILTER AS T1 INNER JOIN ( STUFF FROM PYTHON ) AS X (ID, RANK) ON X.ID = T1.ID WHERE X.ID IS NOT NULL AND SCHOOL_STATE IN ('CA', 'AK', 'AZ', 'CO', 'HI', 'ID', 'MT', 'NM', 'NV', 'OR', 'UT', 'WA', 'WY') ORDER BY X.RANK. RUBY RECEIVES AND DOES EXTRA PROCESSING /V1/COLLEGES/DISCOVER/WEST: E.G. MAPS "5498FFD56E670ECD4E00010C" TO "UNIVERSITY OF SAN DIEGO", ADDS ICON AND FOLLOW STATUS, ETC. RETURN RECOMMENDATIONS TO JAVASCRIPT FOR RENDERING /V1/COLLEGES/DISCOVER/WEST: RUBY API IS GIVEN BUCKET NAME (E.G. WEST)
  42. 42. RUBY ON RAILS ASIDE:WHY THREE LANGUAGES (+ JAVASCRIPT)? RUBY PYTHON DJANGO JAVASCRIPT RUBY GOLANG PYTHON
  43. 43. BAD MARKETING DJANGO
  44. 44. …FIVE YEARS AGO IN MY ONLY SLIGHTLY LESS DISTINGUISHED, BUT STILL SPRY YOOT, I WAS A LITERAL ROCKSTAR SO I TOTALLY PREFER PYTHON ON A PLANE!! 🎸 (IN AN ALTERNATE GIT TIMELINE BRANCH)
  45. 45. DJANGO REINHARDT FRENCH JAZZ GUITARIST (1910-1953) JAZZ IS NOT ROCK AND ROLL!!
  46. 46. MONTY PYTHON (1969) WHAT?! AND PYTHON IS NAMED AFTER MONTY PYTHON AND NOT THE SNAKE? WHAT HAS MONTY PYTHON EVER DONE FOR US? BUFFY COULD HAVE TOTALLY RAMMED A HOLY HAND GRENADE OF ANTIOCH DOWN THE SPANISH INQUISITION’S THROAT AND THEN DRIVEN A STAKE IN THEM JUST TO MAKE SURE! I CHOOSE RUBY ON RAILS
  47. 47. …AND THAT IS WHY I HAD TO SPEND TWO WEEKS LEARNING GOLANG A little more fleshing out here
  48. 48. 2. CLARKE'S THIRD LAW
  49. 49. “ANY SUFFICIENTLY ADVANCED TECHNOLOGY IS INDISTINGUISHABLE FROM MAGIC.” — ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1973)
  50. 50. CLARKE'S THIRD LAW RECOMMENDO COLLEGIA! ANY SUFFICIENTLY ADVANCED TECHNOLOGY IS INDISTINGUISHABLE FROM MAGIC. —ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1973)
  51. 51. ANY SUFFICIENTLY ADVANCED TECHNOLOGY IS INDISTINGUISHABLE FROM MAGIC. —ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1973)
  52. 52. ANY ADVANCED TECHNOLOGY SUFFICIENTLY MAGIC INDISTINGUISHABLE FROM . IS
  53. 53. ADVANCED TECHNOLOGY SUFFICIENTLY MAGIC INDISTINGUISHABLE FROM . IS
  54. 54. ADVANCED TECHNOLOGY SUFFICIENTLY MAGIC INDISTINGUISHABLE FROM ? IS
  55. 55. PYTHON SUFFICIENTLY MAGIC INDISTINGUISHABLE FROM ? IS
  56. 56. IS PYTHON SUFFICIENTLY INDISTINGUISHABLE FROM MAGIC? A NON-TECH FRIEND WHO HAD RECENTLY MOVED TO SAN FRANCISCO… OVERHEARD TWO GUYS TALKING NEXT TO HER “WHAT ARE YOU UP TO?" "OH, I'M TRYING TO LEARN PYTHON." "EXCUSE ME, BUT I BELIEVE IT'S CALLED PARSELTONGUE."
  58. 58. RECOMMENDER SYSTEMS RANKING FILTER Content-based Filter RANKING SELECT T1.ID, X.RANK FROM COLLEGE_CONTENT_FILTER AS T1 INNER JOIN ( STUFF FROM PYTHON ) AS X (ID, RANK) ON X.ID = T1.ID WHERE X.ID IS NOT NULL AND SCHOOL_STATE IN ('CA', 'AK', 'AZ', 'CO', 'HI', 'ID', 'MT', 'NM', 'NV', 'OR', 'UT', 'WA', 'WY') ORDER BY X.RANK
  59. 59. RECOMMENDER SYSTEMS RANKING: Collaborative Filter (CF); RANKING: Content-based Popularity Table; FILTER: Content-based Filter
  60. 60. RECOMMENDER RANKINGS RANKING: Collaborative Filter (CF): STUDENT-STUDENT: RANK COLLEGES BASED ON SIMILAR STUDENTS; COLLEGE-COLLEGE: RANK COLLEGES BASED ON SIMILAR COLLEGES. RANKING: Content-based Popularity Table: STATE-COLLEGE: RANK COLLEGES BASED ON STUDENT'S HOME STATE; HIGH SCHOOL-COLLEGE: RANK COLLEGES BASED ON STUDENT'S HIGH SCHOOL; ZIP-COLLEGE: RANK COLLEGES BASED ON STUDENT'S ZIP CODE
  61. 61. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ?, 3, 4; Student C: 4, 3, 2, 1; Student D: ?, 3, 2, 1; Student E: ?, 1, ?, 1. TABLE IS BASED ON COLLEGE PAGE VIEWS (YOU CAN ADD/USE OTHER DATA LIKE "FOLLOWS"). FIND TWO SMALLER MATRICES THAT APPROXIMATE THIS LARGE MATRIX (AKA MATRIX FACTORIZATION (MF)) (THERE ARE OTHER APPROACHES) USING NUMPY AND PANDAS MODULES IN PYTHON: Student Factors × College Factors
  62. 62. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ?, 3, 4; Student C: 4, 3, 2, 1; Student D: ?, 3, 2, 1; Student E: ?, 1, ?, 1. Student Factors × College Factors
  63. 63. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ?, 3, 2, 1; Student E: ?, 1, ?, 1. Student Factors × College Factors
  64. 64. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ?, 3, 2, 1; Student E: ?, 1, ?, 1. Student Factors × College Factors
  65. 65. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ~4, 3, 2, 1; Student E: ?, 1, ?, 1. Student Factors × College Factors
  66. 66. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ~4, 3, 2, 1; Student E: ?, 1, ?, 1. Student Factors × College Factors
  67. 67. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ~4, 3, 2, 1; Student E: ~1, 1, ~2, 1. Student Factors × College Factors
  68. 68. STUDENT-STUDENT RECOMMENDER Collaborative Filter (CF) RANKING (College W, College X, College Y, College Z) Student A: 1, 2, 3, 4; Student B: 1, ~2, 3, 4; Student C: 4, 3, 2, 1; Student D: ~4, 3, 2, 1; Student E: ~1, 1, ~2, 1. RECOMMEND COLLEGES WITH THE HIGHEST PREDICTED VALUES: STUDENT B: WE RECOMMEND Z, Y, X, THEN W LAST. STUDENT D: WE RECOMMEND W, X, Y, AND Z LAST. STUDENT E: WE RECOMMEND Y, THEN W, X, OR Z. Student Factors × College Factors
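The fill-in-the-blanks step on the slides above can be sketched in a few lines of NumPy. This is a minimal illustration, not the RaiseMe implementation: the talk only says NumPy and pandas were used, so the fitting method here (alternating least squares with a small ridge penalty), the rank of 2, and the names `factorize`, `reg`, and `recommend` are all assumptions made for the example.

```python
import numpy as np

COLLEGES = ["W", "X", "Y", "Z"]

# Page-view table from the slide; np.nan marks a "?" cell (no data yet).
ratings = np.array([
    [1, 2, 3, 4],            # Student A
    [1, np.nan, 3, 4],       # Student B
    [4, 3, 2, 1],            # Student C
    [np.nan, 3, 2, 1],       # Student D
    [np.nan, 1, np.nan, 1],  # Student E
])
known = ~np.isnan(ratings)


def factorize(ratings, known, rank=2, reg=0.01, iterations=500, seed=0):
    """Fit student and college factor matrices to the known cells only.

    Assumed method: alternating least squares. Each pass solves a small
    ridge-regularized least-squares problem per student, then per college.
    """
    rng = np.random.default_rng(seed)
    n_students, n_colleges = ratings.shape
    students = rng.normal(size=(n_students, rank))
    colleges = rng.normal(size=(n_colleges, rank))
    penalty = reg * np.eye(rank)
    for _ in range(iterations):
        for s in range(n_students):
            cols = known[s]
            q = colleges[cols]
            students[s] = np.linalg.solve(q.T @ q + penalty, q.T @ ratings[s, cols])
        for c in range(n_colleges):
            rows = known[:, c]
            p = students[rows]
            colleges[c] = np.linalg.solve(p.T @ p + penalty, p.T @ ratings[rows, c])
    return students, colleges


students, colleges = factorize(ratings, known)
# The product of the two small matrices approximates the full table,
# including the cells that were unknown.
predicted = students @ colleges.T


def recommend(student_row):
    """Return college names ordered from highest to lowest predicted value."""
    order = np.argsort(-predicted[student_row])
    return [COLLEGES[i] for i in order]
```

Calling `recommend(1)` gives the ordering for Student B and `recommend(3)` for Student D. In production the slides describe the fitted model being pickled and served, which this sketch leaves out.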
  69. 69. CF RANKING IMPLEMENTATION Collaborative Filter (CF) RANKING Student Factors × College Factors EC2 (1/HR) AWS REDSHIFT GO API SERVER PICKLED MF MODEL
  70. 70. RECOMMENDER SYSTEMS RANKING: Collaborative Filter (CF); RANKING: Content-based Popularity Table; FILTER: Content-based Filter
  71. 71. STATE-COLLEGE RECOMMENDER RANKING: Content-based Popularity Table (College W, College X, College Y, College Z, State) Student A: 1, 2, 3, 4, CA; Student B: 1, 7, 8, 4, NY; Student C: 4, 3, 2, 1, WA; Student D: 3, 3, 2, 1, CA; Student E: 12, 3, 2, 1, NY; Student F: 8, 5, 9, 3, CA
  72. 72. STATE-COLLEGE RECOMMENDER RANKING: Content-based Popularity Table (College W, College X, College Y, College Z, State) Student A: 1, 2, 3, 4, CA; Student B: 1, 7, 8, 4, NY; Student C: 4, 3, 2, 1, WA; Student D: 3, 3, 2, 1, CA; Student E: 12, 3, 2, 1, NY; Student F: 8, 5, 9, 3, CA
  73. 73. STATE-COLLEGE RECOMMENDER RANKING: Content-based Popularity Table (College W, College X, College Y, College Z, State) Student A: 1, 2, 3, 4, CA; Student D: 3, 3, 2, 1, CA; Student F: 8, 5, 9, 3, CA; Student B: 1, 7, 8, 4, NY; Student E: 12, 3, 2, 1, NY; Student C: 4, 3, 2, 1, WA
  74. 74. STATE-COLLEGE RECOMMENDER RANKING: Content-based Popularity Table (College W, College X, College Y, College Z, State) Students A, D, F: 12, 10, 14, 8, CA; Students B, E: 13, 10, 10, 5, NY; Student C: 4, 3, 2, 1, WA. CA STUDENTS ARE RECOMMENDED… 1. COLLEGE Y 2. COLLEGE W 3. COLLEGE X 4. COLLEGE Z
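The group-by-state-and-sum step on these slides can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the per-student view counts are copied from the slide's table, while `build_popularity_table` and `rank_for_state` are hypothetical names, and ties are broken alphabetically purely so the output is deterministic.

```python
from collections import defaultdict

# (student, home state, page views per college) copied from the slide's table.
VIEWS = [
    ("A", "CA", {"W": 1, "X": 2, "Y": 3, "Z": 4}),
    ("B", "NY", {"W": 1, "X": 7, "Y": 8, "Z": 4}),
    ("C", "WA", {"W": 4, "X": 3, "Y": 2, "Z": 1}),
    ("D", "CA", {"W": 3, "X": 3, "Y": 2, "Z": 1}),
    ("E", "NY", {"W": 12, "X": 3, "Y": 2, "Z": 1}),
    ("F", "CA", {"W": 8, "X": 5, "Y": 9, "Z": 3}),
]


def build_popularity_table(views):
    """Sum page views per (state, college): the pre-computed popularity table."""
    table = defaultdict(lambda: defaultdict(int))
    for _student, state, counts in views:
        for college, n in counts.items():
            table[state][college] += n
    return table


def rank_for_state(table, state):
    """Colleges for one state, most viewed first (ties broken by name)."""
    totals = table[state]
    return sorted(totals, key=lambda college: (-totals[college], college))


popularity = build_popularity_table(VIEWS)
print(rank_for_state(popularity, "CA"))  # ['Y', 'W', 'X', 'Z']
```

Because the table is a plain aggregate, it can be rebuilt on a schedule and looked up at request time without running any model.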
  75. 75. POPULARITY TABLE RANKING IMPLEMENTATION RANKING: Content-based Popularity Table AWS REDSHIFT EC2 (1/DAY) GO API SERVER RDS XPLENTY ETL
  76. 76. RECOMMENDER SYSTEMS RANKING: Collaborative Filter (CF); RANKING: Content-based Popularity Table; FILTER: Content-based Filter
  77. 77. CONTENT-BASED FILTER IMPLEMENTATION GO API SERVER RDS FILTER: Content-based Filter MONGODB FILTER: Content-based Filter XPLENTY ETL SELECT T1.ID, X.RANK FROM COLLEGE_CONTENT_FILTER AS T1 INNER JOIN ( RANKINGS FROM CF ) AS X (ID, RANK) ON X.ID = T1.ID WHERE X.ID IS NOT NULL AND SCHOOL_STATE IN ('CA', 'AK', 'AZ', 'CO', 'HI', 'ID', 'MT', 'NM', 'NV', 'OR', 'UT', 'WA', 'WY') ORDER BY X.RANK AWS REDSHIFT XPLENTY ETL
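The join-then-filter query on this slide can be tried end to end with SQLite from the Python standard library standing in for RDS. This is a sketch under assumptions: the table and column names `college_content_filter` and `school_state` come from the slide, but the college ids, the `filter_ranked` helper, and the use of a `VALUES` common table expression to pass in the ranked list are illustrative choices, not the production Go code.

```python
import sqlite3

WEST = ["CA", "AK", "AZ", "CO", "HI", "ID", "MT", "NM", "NV", "OR", "UT", "WA", "WY"]


def filter_ranked(conn, ranked_ids, states):
    """Keep only ranked colleges located in `states`, preserving rank order."""
    if not ranked_ids or not states:
        return []
    # One "(?, ?)" pair per ranked college: (college id, 1-based position).
    values = ", ".join("(?, ?)" for _ in ranked_ids)
    placeholders = ", ".join("?" for _ in states)
    params = []
    for pos, college_id in enumerate(ranked_ids, start=1):
        params.extend([college_id, pos])
    params.extend(states)
    sql = (
        f"WITH x(id, pos) AS (VALUES {values}) "
        "SELECT t1.id FROM college_content_filter AS t1 "
        "INNER JOIN x ON x.id = t1.id "
        f"WHERE t1.school_state IN ({placeholders}) "
        "ORDER BY x.pos"
    )
    return [row[0] for row in conn.execute(sql, params)]


# Hypothetical content table: college id and the state it is located in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE college_content_filter (id TEXT PRIMARY KEY, school_state TEXT)"
)
conn.executemany(
    "INSERT INTO college_content_filter VALUES (?, ?)",
    [("college-a", "CA"), ("college-b", "NY"), ("college-c", "WA"), ("college-d", "AZ")],
)

# Ranked ids as they might come back from the collaborative filter.
ranked_from_model = ["college-b", "college-c", "college-a", "college-d"]
west_only = filter_ranked(conn, ranked_from_model, WEST)
print(west_only)  # ['college-c', 'college-a', 'college-d']
```

Using bound parameters for both the ranked ids and the state list keeps user-derived values out of the SQL string itself.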
  78. 78. 3. CLARKE'S SECOND LAW
  79. 79. “THE ONLY WAY OF DISCOVERING THE LIMITS OF THE POSSIBLE IS TO VENTURE A LITTLE WAY PAST THEM INTO THE IMPOSSIBLE.” — ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1973)
  81. 81. CLARKE'S SECOND LAW THE ONLY WAY OF DISCOVERING THE LIMITS OF THE POSSIBLE IS TO VENTURE A LITTLE WAY PAST THEM INTO THE IMPOSSIBLE. —ARTHUR C. CLARKE, “HAZARDS OF PROPHECY: THE FAILURE OF IMAGINATION,” PROFILES OF THE FUTURE (1962)
  82. 82. YOUR MISSION SHOULD YOU CHOOSE TO ACCEPT IT… • USE RECOMMENDERS ON YOUR PROJECT! (TO BUILD USER-CENTRIC, INTUITIVE UI) • COLLABORATIVE FILTER • POPULARITY TABLE • CONTENT FILTER • …OR HYBRID COMBINATION OF THE ABOVE OR OTHER CREATIVE APPROACHES
  83. 83. MY OTHER MISSIONS (SEQUELS)… 1. THIS MISSION'S POST CREDITS EASTER EGG 2. AUTOCOMPLETE @ CLARAHEALTH 3. MEETME @TAGGED
  84. 84. COLLEGE-RECOMMENDATION ENGINE MISSION DEBRIEF (2018)
  85. 85. COLLEGE-RECOMMENDATION ENGINE MISSION DEBRIEF (2018) • JULY 2018: RAISEME CLOSED THEIR SERIES B FUNDRAISING ($15M) • AUGUST 2018: THEY FIRED THE CTO/CO-FOUNDER (DAVE) • SEPTEMBER 2018: LATER THEY FIRED THE PRINCIPAL ENGINEER (ME) • …ON THE OTHER HAND, I HEAR THEY'RE HIRING ;-) DISAVOWED DISAVOWED
  86. 86. CLARAHEALTH STARTED OCTOBER 2018 HEAD OF ENGINEERING AUTOCOMPLETE RECOMMENDER IN DECEMBER 2018 FOR CLINICAL TRIALS SEARCH TERMS EXAMPLE OF RECOMMENDER WITHOUT DATA WAREHOUSE OR MATRIX/STATS LIBRARY
  87. 87. CLARAHEALTH PROBLEM: AUTOCOMPLETE MEDICAL TERMS FOR MEDICAL SUBJECT HEADINGS (MESH) TOO SLOW AND INACCURATE: POSTGRESQL TRIGRAM TEXT-MATCHING ONLY; REQUIRES LARGE NESTED QUERY MESH RDS AWS ZAPPA LAMBDA SERVERLESS
  88. 88. CLARAHEALTH 1. BUILD ETL OF POPULARITY TABLES OF MESH TERMS IN ACTUAL STUDIES 2. LOAD ELASTICSEARCH COMPLETION SUGGESTER 3. WEIGHT SUGGEST TERMS BASED ON TERM POPULARITY IN STUDIES TABLE (CAN BE IMPROVED LATER BY CHANGING ETLS TO USE SEARCH POPULARITY INSTEAD OF STUDY POPULARITY) AWS LAMBDA MESH RDS AWS ZAPPA LAMBDA SERVERLESS STUDIES RDS ELASTIC CLOUD
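The three steps above (count term popularity, load a suggester, weight suggestions by popularity) can be sketched in plain Python. This is a stand-in for the idea only: it does not call Elasticsearch, the study data is made up, and `build_term_popularity` and `suggest` are hypothetical names.

```python
from collections import Counter


def build_term_popularity(studies):
    """ETL step: count how many studies mention each MeSH term."""
    counts = Counter()
    for terms in studies:
        counts.update(set(terms))  # count a term once per study
    return counts


def suggest(popularity, prefix, limit=5):
    """Prefix completion, with the most frequently used terms first."""
    prefix = prefix.casefold()
    matches = [term for term in popularity if term.casefold().startswith(prefix)]
    matches.sort(key=lambda term: (-popularity[term], term))
    return matches[:limit]


# Hypothetical studies, each tagged with a list of MeSH terms.
STUDIES = [
    ["Diabetes Mellitus", "Hypertension"],
    ["Diabetes Mellitus", "Diabetic Neuropathies"],
    ["Diabetic Neuropathies", "Diabetes Mellitus"],
    ["Dialysis", "Hypertension"],
]

popularity = build_term_popularity(STUDIES)
print(suggest(popularity, "dia"))
# ['Diabetes Mellitus', 'Diabetic Neuropathies', 'Dialysis']
```

Swapping the input of `build_term_popularity` from study tags to logged search queries would give the "search popularity" variant the slide mentions as a later improvement.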
  89. 89. MEETME @ TAGGED (2008) EXAMPLE OF A NON-TRADITIONAL USE OF A RECOMMENDER SYSTEM
  90. 90. MEETME @ TAGGED (2008) PROBLEM: SITE HAD ONLY 20 MILLION PAGE VIEWS PER DAY WE HAD 40 MILLION USERS! 4 TOP PEOPLE (CEO, CTO, ME, DIR. OF ENGINEERING) EACH WORKING ON OWN PROJECT TO FIX THIS… IN MAY 2007, HUNG OUT WITH HOTORNOT.COM AT FACEBOOK'S FIRST F8 "MEET ME" APP FAILED HARD! I HAD THEORY THAT IT FAILED BECAUSE FACEBOOK API ONLY SHOWS YOU YOUR FRIENDS. “WHO WANTS TO MEET THEIR FRIENDS?”
  91. 91. MEETME @ TAGGED (2008) COULDN'T SHOW FRIENDS OR FOAF. NOT RANDOM: RECOMMEND PEOPLE YOU'D WANT TO MEET. 1. BUILD A NETWORK OF USERS AND THEIR RELATIONSHIPS/SIMILAR DATA (LOCATION, AGE, ETC.) 2. USE MONTE-CARLO RANDOM WALK: 9,11 STEPS ENDING ON A PERSON 3. MAP: DO IT 40 MILLION TIMES 4. REDUCE: SORT, KEEP TOP 1000 RESULTS 5. CACHE IT, RETURN TOP 20 TO PHP 6. CONTENT FILTER THE LIST DOWN AND PASS IT TO WEB CLIENT 7. AS MORE ARE RATED, PULL FROM CACHE OR REPEAT (GOTO 2). PARACCEL ORACLE MEMCACHE
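Steps 2 through 5 above can be sketched as a small Monte Carlo random walk. This is an illustration under assumptions, not Tagged's implementation: the graph is a made-up eight-person network, the map and reduce steps are collapsed into one loop with a counter, the default of 9 steps is one reading of the slide's "9,11 steps", and `recommend_strangers` is a hypothetical name.

```python
import random
from collections import Counter

# Hypothetical undirected network: person -> list of directly connected people.
GRAPH = {
    0: [1, 7],
    1: [0, 2],
    2: [1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4, 6],
    6: [5, 7],
    7: [6, 0],
}


def recommend_strangers(graph, user, walks=1000, steps=9, top_n=20, seed=0):
    """Rank people by how often random walks starting at `user` end on them.

    Friends and friends-of-friends (FOAF) are excluded, as on the slide.
    """
    rng = random.Random(seed)
    friends = set(graph[user])
    foaf = {person for friend in friends for person in graph[friend]}
    excluded = {user} | friends | foaf
    endpoints = Counter()
    for _ in range(walks):            # "map": many independent walks
        node = user
        for _ in range(steps):
            node = rng.choice(graph[node])
        if node not in excluded:
            endpoints[node] += 1
    # "reduce": sort by how often each person was reached, keep the top N
    return [person for person, _count in endpoints.most_common(top_n)]


recs = recommend_strangers(GRAPH, user=0)
```

People who are reachable through many short paths are hit more often, so the result is biased toward nearby strangers rather than being uniformly random. Caching the longer list and handing out the top slice mirrors steps 4 and 5.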
  92. 92. MEETME @ TAGGED (2008) (DON'T EVEN REMEMBER THE OTHER 3 PEOPLE'S PROJECTS.) FIRST WEEK: DOUBLED TRAFFIC (40 MILLION PAGES/DAY) AVERAGE USER WAS RATING 200 PEOPLE/SESSION BY 2009, MEETME WAS DOING 100 MILLION PAGES/DAY SITE WAS 250 MILLION PAGES/DAY, 3RD LARGEST US SOCIAL NETWORK
  93. 93. MEETME @ TAGGED (2008) RATING PROFILE PIC TO MEET STRANGERS SOUND FAMILIAR? 2011 MEETME.COM COPIED IT (TAGGED SUES) 2012 MATCH.COM COPIED THEM BUT APP SWIPE-LEFT/SWIPE-RIGHT (TINDER) 2016 MEETME.COM BUYS TAGGED 2009 I WAS GONE (IPHONE APPS DIDN'T EXIST YET) PLUS, I STOLE THE IDEA FROM HOTORNOT.COM FAILED FACEBOOK APP ANYWAY. I WAS DONE WITH DOING EVIL SHIT.
  94. 94. SUMMARY RECOMMENDER SYSTEMS ARE CERTAINLY POSSIBLE (FIRST LAW) BUSINESS ANALYTICS USED TO BENEFIT THE USER BUSINESS ANALYTICS - MAGIC → TECHNOLOGY (THIRD LAW) DATA WAREHOUSE (MANY USE SQL AS THE QUERY LANGUAGE) ETL: EXTRACT-TRANSFORM-LOAD MATRIX FACTORIZATION RDS, MODELS, ELASTICSEARCH YOU CAN DO 90% IN PHP + SERVICES YOUR MISSION (IMPOSSIBLE) IS TO USE RECOMMENDERS TO BUILD: INTUITIVE UIS (SECOND LAW) 3 EXAMPLES: MEETME, COLLEGE RECOMMENDATIONS, AUTOCOMPLETE … ALSO, FUCK RUBY!
  95. 95. THANK YOU! ⟵ PLEASE RATE THIS TALK! HTTPS://JOIND.IN/TALK/76824 ADD ME! (SAY WE MET AT #SUNPHP19)
