
Mukund Narasimhan, Engineer, Pinterest at MLconf Seattle 2017

Mukund Narasimhan is an engineer at Pinterest, where he works on content modeling and recommendations. Prior to Pinterest, he worked at Google, Facebook, Microsoft, and Intel. He has a Ph.D. in Electrical Engineering from the University of Washington and an M.S. in Mathematics from Louisiana State University.

Abstract

Knowledge at Pinterest:
Pinterest is building the world’s largest catalog of ideas. These ideas are embodied in billions of Pins and well over 100 million Users and Boards which, together, form a complex picture of our users and their interests. The goal of the Knowledge team is to leverage our unstructured data (images, videos, and text), and structured data (user curated data, partner generated content, and engagement signals) to model users and their interests so we can help them discover fresh, diverse, and personalized new ideas. In this talk, we go into some detail on the progress we’ve made so far.

  1. Mukund Narasimhan, May 19, 2017, MLConf 2017. Knowledge at Pinterest
  2. Goal: Help people discover and do the things they love.
  3. Goal: Help people discover and do the things they love. Help users discover personalized new ideas.
  4. 400+ engineers, 2B+ boards, 100B+ Pins, 175M+ monthly active users
  5. Pin: A visual bookmark someone has saved from the internet to a board they've created. (Example: a men's blue jacket from End Clothing, saved by Jacob Hinmon to a "men's style" board.)
  6. Pins are not photos. (Under the hood: Pin vs. image.)
  7. Board: A greater collection of ideas. (Example board: "Monochromatic Normcore Fashion", 48 Pins, "Trying not to die in the cold up in the northwest. Visit anyways.", by Andreas Pihlström.)
  8. Problem: Help users discover personalized new ideas. How can we match ideas to users?
     Solution 1: Explicitly followed list of topics
     • Return ideas based on explicit user interest (home decor, fashion, etc.)
     • The user provides an explicit list of interests
     • The model maps users to a hard-coded list of interests
     • Discover interests from data, and map users to those interests
     Solution 2: Inferred user interests
     • Return ideas based on inferred user interests (no explicit topics)
     Both are important! We will discuss both, but we will go into technical detail for Solution 2 only.
  9. Explicit topics: domain
  10. Explicit topics: sub-domain
  11. Resources: what data is available to us?
     • User curation: Pins are organized into boards
     • User activity: users interact with Pins and boards
     • Content: Pins have associated content
  12. Curation: the Pin-Board-User graph
     • Can be used to discover topics
     • Can also be used as a signal for the inferred interests model
  13. Content: Pin description, board title, URL, web page
     • Can be used to discover topics
     • Can also be used as a signal for the inferred interests model
  14. Activity: log analysis
     • Users search for Pins, save Pins to boards, etc.
     • Can be used to train a topics model
     • Can be used to train an inferred interests model
     • Can be used to model users
  15. Inferred user interests. Goal: return the Pins most relevant to the user.
     Idea:
     • Train embeddings for users (= query) and Pins (= result) in the same embedding space
     • At runtime, find the Pin embeddings that best match the user embedding using an inverted index, and return the top Pins to the user
  16. Inferred user interests: two key technology components
     • Embeddings
     • Inverted index
  17. Inferred user interests: embeddings
     • Dense, lower-dimensional, continuous representations of objects
     • The representation is meaningful and captures semantic similarity: two objects that are close in the embedding space should be semantically similar
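     As a concrete illustration of "close in the embedding space", here is a minimal sketch using NumPy; the vectors and Pin names are made up for illustration and are not Pinterest data.

        import numpy as np

        def cosine_similarity(a, b):
            # Cosine similarity: dot product divided by the product of the
            # vector lengths; values near 1.0 mean "pointing the same way".
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Hypothetical 4-dimensional embeddings for three Pins.
        pin_knitting   = np.array([0.9, 0.1, 0.0, 0.2])
        pin_crochet    = np.array([0.8, 0.2, 0.1, 0.3])
        pin_car_repair = np.array([0.0, 0.9, 0.8, 0.1])

        print(cosine_similarity(pin_knitting, pin_crochet))     # high: semantically similar
        print(cosine_similarity(pin_knitting, pin_car_repair))  # low: semantically different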
  18. Inferred user interests: inverted index
     • Given an embedding, find objects that are similar in the embedding space
     • Problem: it is too costly to search through all potential candidates
     • Solution: use Locality Sensitive Hashing (LSH)
     • Next: how can we index embeddings?
  19. How are embeddings created? Several techniques are available; here we cover Word2Vec.
     • Word2Vec encodes word similarity based on the distributional hypothesis (Harris, 1954): words in similar contexts have similar meanings
     • To train embeddings with Word2Vec, define a container (sentence), item (word), and context (surrounding words)
     • Here, container = Pin, item = content, context = other content, where content = the Pin description, the titles of boards containing the Pin, etc.
     Mapping from the original Word2Vec to inferred user interests:
     • Container: sentence -> Pin
     • Item: word -> content (a word in the content associated with the Pin)
     • Context: surrounding words -> other content (other words in the content associated with the Pin)
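     A minimal sketch of this container/item/context mapping, assuming gensim 4.x is available; each Pin's associated text plays the role of a "sentence", so its words are the items and their neighbours are the context. The Pin texts below are invented for illustration.

        from gensim.models import Word2Vec

        # Hypothetical containers: each Pin's associated text (description,
        # titles of boards containing the Pin, ...) tokenized into words.
        pin_texts = [
            "mid century modern living room decor".split(),
            "scandinavian living room minimalist furniture".split(),
            "sourdough bread starter recipe baking".split(),
            "easy weeknight pasta recipe dinner".split(),
        ]

        # item = word, context = surrounding words within `window`.
        model = Word2Vec(sentences=pin_texts, vector_size=32, window=5,
                         min_count=1, sg=1, epochs=50, seed=0)

        print(model.wv.most_similar("recipe", topn=3))
        print(model.wv["decor"].shape)  # (32,): a dense embedding for "decor"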
  20. How are embeddings trained? Neural network model:
     • Start with a vocabulary V of words
     • The training data consists of instances (v, C(v))
     • v = a term [a word]
     • C(v) = the context for the term [the words co-occurring with v]
     • Define a scoring function over (term, context) pairs
     • Training consists of minimizing the resulting loss
     • The minimizer E is the embedding matrix
     • The embedding matrix maps v [a word] -> its embedding
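     The scoring function and the objective being minimized were shown as formulas on the slide and are missing from this transcript. As a stand-in, a typical Word2Vec (skip-gram) formulation is given below; the talk's exact formulation may differ.

        \[
          s(v, c) = e_v \cdot e'_c,
          \qquad
          \min_{E,\, E'} \; -\sum_{(v,\, C(v))} \sum_{c \in C(v)}
            \log \frac{\exp\!\big(s(v, c)\big)}{\sum_{w \in V} \exp\!\big(s(v, w)\big)}
        \]

     Here $e_v$ is the row of the embedding matrix $E$ for term $v$, $e'_c$ is the context (output) embedding of $c$, and the minimizing $E$ is the embedding matrix that maps each word to its embedding.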
  21. Inverted index: use Locality Sensitive Hashing to create indexable terms
     • Reminder: given an embedding, we want to find objects that are similar in the embedding space
     • Idea: convert embeddings into indexable terms
     • Convert the real-valued embedding vector into a binary vector, e.g. 1101011101110…
     • The probability of two vectors having the same value for bit i is proportional to the cosine similarity between the two vectors
     • This property, where closer objects (in terms of cosine similarity) are more likely to land in the same bucket (the bucket corresponding to the setting of all m bits), is called Locality Sensitive Hashing
  22. Locality Sensitive Hashing: pick the projection vectors once
  23. Locality Sensitive Hashing: for each embedding vector…
  24. Locality Sensitive Hashing: for each projection vector, determine on which side of the hyperplane the embedding vector lands
     • On the same side: set the bit to 1
     • On different sides: set the bit to 0
     Result 1: 110
  25. Locality Sensitive Hashing: do the same with the next embedding vector. Result 1: 110, Result 2: 101
  26. Locality Sensitive Hashing: the two embeddings are far apart in both the embedding space and the bit space. Embeddings: (-1.0, 1.2) and (1.3, -0.7); cosine similarity: 0.05; bits: 110 and 101; 1 out of 3 bits matches
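     A minimal sketch of the hashing steps from slides 22 to 26, assuming NumPy; the projection (hyperplane) vectors are drawn at random here rather than taken from the slides, so the exact bit patterns will differ from the 110 / 101 example above.

        import numpy as np

        rng = np.random.default_rng(0)

        # Pick the projection vectors once (slide 22): one random hyperplane
        # normal per output bit. Here: 2-d embeddings, 3 bits.
        num_bits, dim = 3, 2
        projections = rng.standard_normal((num_bits, dim))

        def lsh_bits(embedding):
            # For each projection vector, record on which side of its
            # hyperplane the embedding vector lands (slides 23-24).
            return (projections @ embedding > 0).astype(int)

        a = np.array([-1.0, 1.2])
        b = np.array([1.3, -0.7])

        bits_a, bits_b = lsh_bits(a), lsh_bits(b)
        print(bits_a, bits_b)
        # Dissimilar embeddings tend to disagree on most bits, so the
        # fraction of matching bits tracks cosine similarity (slide 26).
        print((bits_a == bits_b).mean())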
  27. Create and index terms
     Create terms: pick an optimal number of terms and bits per term.
     We want to be open to diverse candidates:
     • Set a low threshold for the number of terms that must match (at most x terms must match)
     • Create a large number of smaller terms (note that each term must match exactly)
     But we want to maintain a certain level of precision:
     • Set a minimum threshold (at least x terms must match)
     • Set a minimum size for terms
     Example: 1001110001011000 -> 1001-1100-0101-1000
     Index the terms, then retrieve results with a WAND (Weak AND) query
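     A minimal sketch of term creation and candidate retrieval in Python; the bit strings other than the slide's example are made up, and the WAND query is approximated here by simply counting exactly matching terms against a minimum threshold.

        def make_terms(bit_string, bits_per_term=4):
            # Split the LSH bit string into fixed-size terms, tagged by position
            # so that "1001" in slot 0 and "1001" in slot 8 stay distinct.
            return {f"{i}:{bit_string[i:i + bits_per_term]}"
                    for i in range(0, len(bit_string), bits_per_term)}

        # Index side: terms for some candidate Pins (only pin_a uses the
        # slide's example bit string; the others are invented).
        index = {
            "pin_a": make_terms("1001110001011000"),
            "pin_b": make_terms("1001110011111111"),
            "pin_c": make_terms("0110001110100111"),
        }

        def retrieve(query_bits, min_matching_terms=2):
            query_terms = make_terms(query_bits)
            scored = [(len(query_terms & terms), pin) for pin, terms in index.items()]
            # Keep only candidates with enough exactly matching terms.
            return sorted((s, p) for s, p in scored if s >= min_matching_terms)[::-1]

        print(retrieve("1001110001011000"))  # pin_a matches 4 terms, pin_b matches 2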
  28. Inferred user interests (recap). Goal: return the Pins most relevant to the user.
     Idea:
     • Train embeddings for users (= query) and Pins (= result) in the same embedding space
     • At runtime, find the Pin embeddings that best match the user embedding using an inverted index, and return the top Pins to the user
  29. Inferred user interests: Pins are surfaced independently of popularity, so they may be recent or rarely seen. (Examples: a Pin uploaded 12 days ago with 580 impressions; a Pin uploaded 38 days ago with only 140 impressions; a Pin with only 100 impressions.)
  30. Inferred user interests: approximate nearest neighbor search using an inverted index applied to embeddings
     • This is my personal feed, with results that match my interests, e.g. animals, castles, how to draw with color pencils, gardening
     • Some of these were identified using the method described in this talk
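     Tying the pieces together, a high-level sketch of how serving could look under the scheme in these slides; the user embedding here is simply an average of a few Pin embeddings, which is an assumption for illustration rather than Pinterest's actual user model, and in practice the candidates would come from the LSH-term inverted index rather than a full scan.

        import numpy as np

        def user_embedding(engaged_pin_embeddings):
            # Assumption: represent the user by the (renormalized) mean of the
            # embeddings of Pins they recently engaged with.
            v = np.mean(engaged_pin_embeddings, axis=0)
            return v / np.linalg.norm(v)

        def recommend(user_vec, candidate_pins, top_k=3):
            # candidate_pins: {pin_id: embedding}. Rank candidates by dot
            # product with the user embedding and return the top_k ids.
            ranked = sorted(candidate_pins.items(),
                            key=lambda kv: -float(np.dot(user_vec, kv[1])))
            return [pin_id for pin_id, _ in ranked[:top_k]]

        rng = np.random.default_rng(1)
        pins = {f"pin_{i}": rng.standard_normal(8) for i in range(100)}
        user = user_embedding([pins["pin_3"], pins["pin_7"], pins["pin_42"]])
        print(recommend(user, pins))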
  31. We're hiring! sonjaknoll@pinterest.com Thank you!
  32. © Copyright, All Rights Reserved, Pinterest Inc. 2017
