Machine Learning to Grow the World's Knowledge

How does Quora use Machine Learning to grow the world's knowledge? I talked about this in a presentation I gave at Stitch Fix.

  1. Machine Learning to Grow the World's Knowledge Xavier Amatriain (@xamat) 8/18/2015 Multithreaded Data
  2. Our Mission “To share and grow the world’s knowledge” • Millions of questions & answers • Millions of users • Thousands of topics • ...
  3. Core Product & Quality Our Product Teams Distribution Lookup
  4. What we care about: Quality, Relevance, Demand
  5. Data @Quora
  6. Lots of data relations
  7. Complex network propagation effects
  8. Importance of topics & semantics
  9. Machine Learning @Quora
  10. Ranking - Answer ranking What is a good Quora answer? • truthful • reusable • provides explanation • well formatted • ...
  11. Ranking - Answer ranking How are those dimensions translated into features? • Features that relate to the text quality itself • Interaction features (upvotes/downvotes, clicks, comments…) • User features (e.g. expertise in topic)
  12. Ranking - Feed • Personalized learning-to-rank approach • Goal: Present most interesting stories for a user at a given time • Interesting = topical relevance + social relevance + timeliness • Stories = questions + answers
  13. Ranking - Feed • Features • Quality of question/answer • Topics the user is interested in / knows about • Users the user is following • What is trending/popular • … • Different temporal windows • Multi-stage solution with different “streams”
  14. Recommendations - Topics Goal: Recommend new topics for the user to follow • Based on • Other topics followed • Users followed • User interactions • Topic-related features • ...
  15. Recommendations - Users Goal: Recommend new users to follow • Based on: • Other users followed • Topics followed • User interactions • User-related features • ...
  16. Related Questions • Given interest in question A (source) what other questions will be interesting? • Not only about similarity, but also “interestingness” • Features such as: • Textual • Co-visit • Topics • … • Important for logged-out use case
  17. Duplicate Questions • Important issue for Quora • Want to make sure knowledge is not dispersed across multiple copies of the same question • Solution: binary classifier trained with labelled data • Features • Textual vector space models • Usage-based features • ...
  18. User Trust/Expertise Inference Goal: Infer user’s trustworthiness in relation to a given topic • We take into account: • Answers written on topic • Upvotes/downvotes received • Endorsements • ... • Trust/expertise propagates through the network • Must be taken into account by other algorithms
  19. Trending Topics Goal: Highlight current events that are interesting for the user • We take into account: • Global “Trendiness” • Social “Trendiness” • User’s interest • ... • Trending topics are a great discovery mechanism
  20. Spam Detection/Moderation • Very important for Quora to keep quality of content • Pure manual approaches do not scale • Hard to get algorithms 100% right • ML algorithms detect content/user issues • Output of the algorithms feeds manually curated moderation queues
  21. Content Creation Prediction • Quora’s algorithms not only optimize for probability of reading • Important to predict probability of a user answering a question • Parts of our system completely rely on that prediction • E.g. A2A (ask to answer) suggestions
  22. Models ● Logistic Regression ● Elastic Nets ● Gradient Boosted Decision Trees ● Random Forests ● Neural Networks ● LambdaMART ● Matrix Factorization ● LDA ● ...
  23. Conclusions • At Quora we have not only Big, but also “rich” data • Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise • We believe ML will be one of the keys to our success • We have many interesting problems, and many unsolved challenges

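Slide 17 frames duplicate detection as a binary classifier trained on labelled question pairs, with textual vector-space features among the inputs. The sketch below is a minimal, hypothetical illustration of that setup, not Quora's implementation: it assumes just two pair features (TF-IDF cosine similarity and Jaccard token overlap), a toy labelled set, and logistic regression (one of the models listed on slide 22) fit with plain stochastic gradient descent. Names such as `pair_features` and `duplicate_probability` are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and strip leading/trailing punctuation from whitespace-split tokens.
    tokens = (t.strip("?.,!\"'").lower() for t in text.split())
    return [t for t in tokens if t]


def build_idf(corpus):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    df = Counter(t for doc in corpus for t in set(tokenize(doc)))
    n = len(corpus)
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def vectorize(text, idf):
    # Sparse TF-IDF vector; unseen terms get weight 1.0 (a simplifying assumption).
    return {t: c * idf.get(t, 1.0) for t, c in Counter(tokenize(text)).items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def pair_features(q1, q2, idf):
    # Hypothetical feature set: TF-IDF cosine similarity and Jaccard token overlap.
    s1, s2 = set(tokenize(q1)), set(tokenize(q2))
    jaccard = len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0
    return [cosine(vectorize(q1, idf), vectorize(q2, idf)), jaccard]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs, lr=0.5, epochs=500):
    # Logistic regression fit with stochastic gradient descent on (q1, q2, label) pairs.
    idf = build_idf([q for q1, q2, _ in pairs for q in (q1, q2)])
    X = [pair_features(q1, q2, idf) for q1, q2, _ in pairs]
    y = [label for _, _, label in pairs]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b, idf


def duplicate_probability(model, q1, q2):
    w, b, idf = model
    x = pair_features(q1, q2, idf)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


pairs = [
    ("How do I learn Python?", "How can I learn Python?", 1),
    ("What is the capital of France?", "What's the capital city of France?", 1),
    ("How do I learn Python?", "Why is the sky blue?", 0),
    ("What is the capital of France?", "How do I bake bread?", 0),
]
model = train(pairs)
```

Usage-based features, which slide 17 also mentions, would simply be appended to the list returned by `pair_features`.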
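Slide 18 says trust/expertise propagates through the network. One common way to model that kind of propagation is a personalized PageRank; the sketch below assumes that choice, treats upvotes on a single topic as weighted edges from voter to author, and uses a hypothetical per-user prior (for example, the number of answers written on the topic). It is an illustrative stand-in, not the algorithm from the talk.

```python
def propagate_expertise(upvotes, prior, damping=0.85, iterations=50):
    """Personalized-PageRank-style propagation of expertise within one topic.

    upvotes: voter -> {author: number of upvotes given on this topic}
    prior:   user -> non-negative base evidence (hypothetical, e.g. answers on the topic)
    """
    users = set(prior) | set(upvotes) | {a for targets in upvotes.values() for a in targets}
    total = sum(prior.values()) or 1.0
    base = {u: prior.get(u, 0.0) / total for u in users}
    score = dict(base)
    for _ in range(iterations):
        # Every user keeps a share of their own prior evidence...
        new = {u: (1 - damping) * base[u] for u in users}
        # ...and receives expertise from voters in proportion to the upvotes they gave.
        for voter, targets in upvotes.items():
            out = sum(targets.values())
            if out == 0:
                continue
            for author, count in targets.items():
                new[author] += damping * score[voter] * count / out
        score = new
    return score


scores = propagate_expertise(
    upvotes={"bob": {"alice": 2}, "carol": {"alice": 1, "dave": 1}},
    prior={"alice": 1, "bob": 1, "carol": 1, "dave": 1},
)
print(max(scores, key=scores.get))  # alice
```

Because users who cast no upvotes pass nothing on, the scores do not sum to 1; they are meant only for ranking users within a topic.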
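Slides 12 and 13 describe the feed as a personalized, multi-stage ranking in which interestingness combines topical relevance, social relevance and timeliness. The sketch below assumes a hand-weighted linear score with an exponential time decay (24-hour half-life) and a two-stage pipeline that merges candidate streams before ranking; in a learning-to-rank system the weights would be learned rather than fixed. `Story`, `score` and `rank_feed` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    topics: set
    author: str
    age_hours: float
    quality: float  # hypothetical question/answer quality estimate in [0, 1]


def score(story, user_topics, followed_users, weights=(1.0, 1.0, 1.0), half_life_hours=24.0):
    w_topical, w_social, w_time = weights
    # Topical relevance: share of the story's topics that the user follows.
    topical = len(story.topics & user_topics) / len(story.topics) if story.topics else 0.0
    # Social relevance: 1 if the author is someone the user follows.
    social = 1.0 if story.author in followed_users else 0.0
    # Timeliness: exponential decay that halves every `half_life_hours`.
    timeliness = 0.5 ** (story.age_hours / half_life_hours)
    return story.quality * (w_topical * topical + w_social * social + w_time * timeliness)


def rank_feed(streams, user_topics, followed_users, k=10):
    # Stage 1: merge candidates from several streams, keeping one copy per story id.
    candidates = {}
    for stream in streams:
        for story in stream:
            candidates.setdefault(story.story_id, story)
    # Stage 2: score every candidate and keep the top k.
    ranked = sorted(candidates.values(),
                    key=lambda s: score(s, user_topics, followed_users), reverse=True)
    return ranked[:k]


a = Story("a", {"ml"}, "alice", age_hours=1.0, quality=0.9)
b = Story("b", {"cooking"}, "someone", age_hours=1.0, quality=0.9)
c = Story("c", {"ml"}, "someone", age_hours=200.0, quality=0.9)
feed = rank_feed([[a, c], [b, a]], user_topics={"ml"}, followed_users={"alice"})
print([s.story_id for s in feed])  # ['a', 'c', 'b']
```

Keeping candidate generation separate from scoring lets each stream (topics, followed users, trending) be tuned or replaced without touching the final ranker.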
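Slide 16 lists textual, co-visit and topic features for related questions. A minimal sketch under assumed choices: co-visit strength is the session co-occurrence count normalized by the geometric mean of each question's visit count, topic similarity is Jaccard overlap, and the two are blended with a hypothetical weight `alpha`. Textual features and any separate "interestingness" model are left out.

```python
import math
from collections import Counter
from itertools import combinations


def covisit_counts(sessions):
    # Count how many sessions contain each question, and each unordered pair.
    single, pair = Counter(), Counter()
    for session in sessions:
        questions = set(session)
        single.update(questions)
        for a, b in combinations(sorted(questions), 2):
            pair[(a, b)] += 1
    return single, pair


def related_questions(source, sessions, topics, alpha=0.5, k=5):
    """Rank questions co-visited with `source`.

    topics: question id -> set of topic names; alpha is a hypothetical blend weight.
    """
    single, pair = covisit_counts(sessions)
    src_topics = topics.get(source, set())
    scores = {}
    for (a, b), n in pair.items():
        if source not in (a, b):
            continue
        other = b if a == source else a
        # Co-occurrence normalized by the geometric mean of both visit counts.
        covisit = n / math.sqrt(single[a] * single[b])
        other_topics = topics.get(other, set())
        union = src_topics | other_topics
        topic_sim = len(src_topics & other_topics) / len(union) if union else 0.0
        scores[other] = alpha * covisit + (1 - alpha) * topic_sim
    return sorted(scores, key=scores.get, reverse=True)[:k]


sessions = [["q1", "q2"], ["q1", "q2", "q3"], ["q1", "q4"], ["q2", "q3"]]
topics = {"q1": {"python"}, "q2": {"python"}, "q3": {"cooking"}, "q4": {"python"}}
print(related_questions("q1", sessions, topics))  # ['q2', 'q4', 'q3']
```

Only questions co-visited with the source become candidates here; the logged-out use case on slide 16 would also need candidates drawn from text or topic similarity.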
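Slide 19 combines global trendiness, social trendiness and the user's interest. One simple, assumed way to measure trendiness is the log ratio of a topic's recent activity rate to its long-term rate; the sketch below applies that to all activity (global) and to activity from followed users (social), then adds a hypothetical per-user interest score. The window lengths, the smoothing pseudo-count and the equal weights are all illustrative assumptions.

```python
import math


def trendiness(recent, baseline, recent_hours=24.0, baseline_hours=24.0 * 28, prior=1.0):
    # Log ratio of the recent activity rate to the long-term rate. The same
    # pseudo-rate is added to both, so a topic with no activity scores 0.
    pseudo_rate = prior / recent_hours
    recent_rate = recent / recent_hours + pseudo_rate
    base_rate = baseline / baseline_hours + pseudo_rate
    return math.log(recent_rate / base_rate)


def rank_trending(topic_stats, user_interest, weights=(1.0, 1.0, 1.0), k=3):
    """topic_stats: topic -> (global_recent, global_baseline, social_recent, social_baseline)
    activity counts, where "social" counts only activity by users the reader follows.
    user_interest: topic -> hypothetical interest score in [0, 1]."""
    w_global, w_social, w_interest = weights
    scores = {}
    for topic, (g_recent, g_base, s_recent, s_base) in topic_stats.items():
        scores[topic] = (
            w_global * trendiness(g_recent, g_base)
            + w_social * trendiness(s_recent, s_base)
            + w_interest * user_interest.get(topic, 0.0)
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


stats = {
    "Elections": (500, 2800, 20, 56),  # activity spiking relative to its baseline
    "Python": (100, 2800, 10, 280),    # steady activity
    "Cooking": (50, 1400, 0, 0),       # steady, none from followed users
}
print(rank_trending(stats, {"Python": 0.8, "Cooking": 0.1}))  # ['Elections', 'Python', 'Cooking']
```

Using a log ratio keeps a topic that doubles its usual activity comparable across popular and niche topics, which a raw count difference would not.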