Lei Yang, Senior Engineering Manager, Quora at MLconf NYC - 4/15/16

Sharing and Growing the World’s Knowledge with Machine Learning: At Quora our mission is to “share and grow the world’s knowledge”. To accomplish this, we need to build a complex ecosystem which requires us to understand and solve a variety of problems like content quality, demand, user engagement, personalization, and author reputation. In this talk, we will go over several exciting challenges of applying machine learning to these problems. We will give examples such as our ranking and recommendation approaches, as well as systems and tools we built to support experimentation and integration of machine learning models in the product.

Published in: Technology

  1. 1. Sharing and growing the world's knowledge with machine learning Lei Yang (leiyang@quora.com) April 2016
  2. 2. Our mission “To share and grow the world’s knowledge” ● Millions of questions & answers ● Millions of users ● Thousands of topics ● ...
  3. 3. What we care about: Demand, Quality, Relevance
  4. 4. Data @Quora
  5. 5. [Diagram: Topic, Question, User, Answer, Actions]
  6. 6. Lots of data relations
  7. 7. Complex network propagation effects
  8. 8. Importance of topics & semantics
  9. 9. Machine Learning @Quora
  10. 10. Ranking - Answer ranking What is a good Quora answer? ● Truthful ● Reusable ● Provides explanation ● Well formatted ● ...
  11. 11. Ranking - Answer ranking How are those criteria translated into features? ● Features that relate to the text quality itself ● Interaction features (upvotes/downvotes, clicks, comments…) ● User features (e.g. expertise in topic)
  12. 12. Ranking - Feed Present most interesting stories for a user at a given time ● Interesting = topical relevance + social relevance + timeliness ● Stories = questions + answers ● Personalized learning-to-rank approach ● Relevance-ordered vs time-ordered = big gains in engagement ● Challenges ○ Potentially many candidate stories ○ Real-time ranking ○ Objective function
  13. 13. Ranking - Feed ● Personalized LTR model ● Features ○ Quality of question/answer ○ Topics the user is interested in or knows about ○ Users the user is following ○ What is trending/popular ○ ... ● Different temporal windows ● Multi-stage solution with different “streams”
  14. 14. Recommendations - Topics Recommend new topics for the user to follow, based on ● Topics you already follow ● Users you already follow ● Interactions with questions/answers ● Topic-related features ● ...
  15. 15. Recommendations - Users Recommend new users for the user to follow, based on: ● Users you already follow ● Topics you already follow ● Interactions with users ● User-related features ● ...
  16. 16. Related questions Given interest in a question, what other questions are interesting? ● Not only about similarity, but also “interestingness” ● Features such as: ○ Textual ○ Co-visit ○ Topics ○ … ● Important for logged-out use case
  17. 17. Duplicate questions ● Important issue for Quora ○ Want to make sure knowledge about the same question is not dispersed across duplicate questions ● Binary classifier trained with labelled data ● Features ○ Textual vector space models ○ Usage-based features ○ ...
  18. 18. User expertise inference Infer user’s trustworthiness in relation to a given topic ● We take into account: ○ Answers written on topic ○ Upvotes/downvotes received ○ Endorsements ○ ... ● Trust/expertise propagates through the network ● Useful as input/features in other models
  19. 19. Spam detection and moderation ● Very important for Quora to maintain content quality ● Purely manual approaches do not scale ● Hard to get algorithms 100% right ● ML algorithms detect content/user issues ○ Output of the algorithms feeds manually curated moderation queues
  20. 20. Content creation prediction ● Quora’s algorithms optimize for more than the probability of reading ● Important to predict the probability of a user answering a question ● Some product features rely completely on that prediction ○ E.g. A2A (ask to answer) suggestions
  21. 21. Trending topics Highlight current events that are interesting to the user ● We take into account: ○ Global “Trendiness” ○ Social “Trendiness” ○ User’s interest ○ ... ● Trending topics are a great discovery mechanism
  22. 22. Models & Experimentation
  23. 23. Models ● Logistic Regression ● Elastic Nets ● Gradient Boosted Decision Trees ● Random Forests ● (Deep) Neural Networks ● LambdaMART ● Matrix Factorization ● LDA ● ...
  24. 24. Open source project -- QMF Quora Matrix Factorization https://github.com/quora/qmf ● Currently BPR and WALS ● Multithreaded implementation in C++14
  25. 25. ML platform ● Allow ML Engineers and Data Scientists to collaborate within the same ML framework ● Easy integration with well-known tools and open source libraries ● Offline evaluation and debugging ● User-friendly Python frontend ● High-performance and scalable C++/CUDA backend [Architecture diagram: Redshift, MySQL, S3; Python User Interface; Trainer; Box; Session; CPU, GPU, Disk; ... WALS, BPR]
  26. 26. Experimentation ● Extensive A/B testing, data-driven decision-making ● Separate, orthogonal “layers” for different parts of the system ● Experiment framework showing comparisons for various metrics
  27. 27. Conclusions
  28. 28. Conclusions ● At Quora we have not only Big, but also “rich” data ● Our algorithms need to understand and optimize complex aspects such as quality, interestingness, relevance, or user expertise ● We believe ML will be one of the keys to our success ● We have many interesting problems, and many unsolved challenges
  29. 29. We are hiring! www.quora.com/careers
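
Slide 12 summarizes feed ranking as "Interesting = topical relevance + social relevance + timeliness". The sketch below is only an illustration of that additive idea, not Quora's model: the `Story` fields, the equal weights, and the 24-hour half-life used for timeliness are all assumptions made for this example. The slides describe a personalized learning-to-rank model, which would learn such a combination from data rather than hard-code it.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """A candidate feed story (hypothetical structure for illustration)."""
    story_id: str
    topical_relevance: float  # assumed to be pre-computed, in [0, 1]
    social_relevance: float   # assumed to be pre-computed, in [0, 1]
    age_hours: float          # hours since the story was created


def timeliness(age_hours: float, half_life_hours: float = 24.0) -> float:
    """Exponential decay: 1.0 for a new story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def interest_score(story: Story, w_topic: float = 1.0,
                   w_social: float = 1.0, w_time: float = 1.0) -> float:
    """Weighted sum of the three signals named on the slide.

    The weights are placeholders; a learned ranker would replace them.
    """
    return (w_topic * story.topical_relevance
            + w_social * story.social_relevance
            + w_time * timeliness(story.age_hours))


def rank_feed(stories: list) -> list:
    """Return stories ordered from most to least interesting."""
    return sorted(stories, key=interest_score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Story("old-but-on-topic", 0.9, 0.1, age_hours=240.0),
        Story("fresh-and-social", 0.5, 0.5, age_hours=0.0),
    ]
    for story in rank_feed(candidates):
        print(story.story_id, round(interest_score(story), 3))
```

A hand-weighted sum like this is mainly useful as a baseline to compare a learned ranker against.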

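Slide 26 mentions separate, orthogonal "layers" for experiments on different parts of the system. One common way to get that property, sketched here under stated assumptions, is to hash the user ID together with a per-layer name so that assignments in one layer are statistically independent of assignments in another. The slides do not say how Quora implements layers; the function names, layer names, and bucket count below are hypothetical.

```python
import hashlib


def bucket(user_id: str, layer: str, num_buckets: int = 100) -> int:
    """Map a user to a stable bucket within one experiment layer.

    Salting the hash with the layer name makes bucket assignments in
    different layers effectively independent of each other.
    """
    digest = hashlib.sha256(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def variant(user_id: str, layer: str, treatment_fraction: float = 0.5,
            num_buckets: int = 100) -> str:
    """Assign 'treatment' or 'control' for a user in the given layer."""
    threshold = treatment_fraction * num_buckets
    if bucket(user_id, layer, num_buckets) < threshold:
        return "treatment"
    return "control"


if __name__ == "__main__":
    # Hypothetical layer names; the same user gets an independent
    # assignment in each layer, and the assignment is stable across calls.
    for layer in ("feed_ranking", "related_questions"):
        print(layer, variant("user-42", layer))
```

Because the assignment is a pure function of the user ID and the layer name, no assignment table has to be stored, and an experiment on one layer does not bias the treatment/control split of an experiment on another.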