Machine Learning to Grow the World's Knowledge

Cofounder/CTO at Curai
Aug. 24, 2015

Machine Learning to Grow the World's Knowledge

  1. Machine Learning to Grow the World's Knowledge Xavier Amatriain (@xamat) 8/18/2015 Multithreaded Data
  2. Our Mission “To share and grow the world’s knowledge” • Millions of questions & answers • Millions of users • Thousands of topics • ...
  3. Our Product Teams • Core Product & Quality • Distribution • Lookup
  4. What we care about • Demand • Quality • Relevance
  5. Data @Quora
  6. Lots of data relations
  7. Complex network propagation effects
  8. Importance of topics & semantics
  9. Machine Learning @Quora
  10. Ranking - Answer ranking What is a good Quora answer? • truthful • reusable • provides explanation • well formatted • ...
  11. Ranking - Answer ranking How are those dimensions translated into features? • Features that relate to the text quality itself • Interaction features (upvotes/downvotes, clicks, comments…) • User features (e.g. expertise in topic)
  12. Ranking - Feed • Personalized learning-to-rank approach • Goal: Present most interesting stories for a user at a given time • Interesting = topical relevance + social relevance + timeliness • Stories = questions + answers
  13. Ranking - Feed • Features • Quality of question/answer • Topics the user is interested in/knows about • Users the user is following • What is trending/popular • … • Different temporal windows • Multi-stage solution with different “streams”
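The feed definition on slides 12 and 13 (interesting = topical relevance + social relevance + timeliness, over stories drawn from several "streams") can be illustrated with a small two-stage sketch. Everything concrete here is an assumption made for illustration: the `Story` fields, the unit weights, the 24-hour half-life, and the hand-set linear score are hypothetical stand-ins for what the slides describe as a personalized, learned ranking model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    story_id: str
    topical: float    # hypothetical: match with topics the user follows, in [0, 1]
    social: float     # hypothetical: activity from users the user follows, in [0, 1]
    age_hours: float  # time since the story was created or last updated


def timeliness(age_hours, half_life_hours=24.0):
    """Exponential decay: 1.0 for a brand-new story, 0.5 after one half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def score(story, w_topical=1.0, w_social=1.0, w_time=1.0):
    """Hand-set linear stand-in for a learned ranking function (assumption)."""
    return (w_topical * story.topical
            + w_social * story.social
            + w_time * timeliness(story.age_hours))


def rank_feed(streams, k=10):
    """Stage 1: merge candidates from every stream, dropping duplicate stories.
    Stage 2: score the merged pool and return the top-k stories."""
    pool = {}
    for stream in streams:
        for story in stream:
            pool.setdefault(story.story_id, story)
    return sorted(pool.values(), key=score, reverse=True)[:k]
```

Splitting candidate generation (the streams) from scoring keeps the expensive model off most of the corpus, which is one plausible reading of "multi-stage solution" on the slide.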
  14. Recommendations - Topics Goal: Recommend new topics for the user to follow • Based on • Other topics followed • Users followed • User interactions • Topic-related features • ...
  15. Recommendations - Users Goal: Recommend new users to follow • Based on: • Other users followed • Topics followed • User interactions • User-related features • ...
  16. Related Questions • Given interest in question A (source) what other questions will be interesting? • Not only about similarity, but also “interestingness” • Features such as: • Textual • Co-visit • Topics • … • Important for logged-out use case
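The "co-visit" feature on slide 16 can be sketched as a count of how often two questions are viewed in the same session. This is a minimal illustration, not the production feature: the session format (a list of question ids) and the normalization by the geometric mean of visit counts are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations
import math


def covisit_scores(sessions):
    """Return {(a, b): score} with a < b, where score is the co-visit count
    normalized by the geometric mean of the two questions' visit counts."""
    visits = Counter()
    pairs = Counter()
    for session in sessions:
        seen = sorted(set(session))  # count each question once per session
        visits.update(seen)
        pairs.update(combinations(seen, 2))
    return {
        (a, b): n / math.sqrt(visits[a] * visits[b])
        for (a, b), n in pairs.items()
    }


def related(question, scores, k=3):
    """Top-k questions most strongly co-visited with `question`."""
    candidates = []
    for (a, b), s in scores.items():
        if a == question:
            candidates.append((s, b))
        elif b == question:
            candidates.append((s, a))
    # Highest score first; ties broken by question id for determinism.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [q for _, q in candidates[:k]]
```

Normalizing by visit counts keeps very popular questions from dominating every related list; a full system would combine this signal with the textual and topic features the slide also names.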
  17. Duplicate Questions • Important issue for Quora • Want to make sure knowledge is not dispersed across multiple copies of the same question • Solution: binary classifier trained with labelled data • Features • Textual vector space models • Usage-based features • ...
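Slide 17's recipe (a binary classifier over textual vector-space features, trained on labelled pairs) can be sketched in plain Python. This is a toy under stated assumptions: the only feature is bag-of-words cosine similarity, the classifier is logistic regression fitted by batch gradient descent, and the tokenizer, learning rate, and epoch count are illustrative choices. The slides do not say which model or features Quora actually uses for this task.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase alphanumeric tokens with counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def features(q1, q2):
    # One textual feature only; usage-based features would be appended here.
    return [cosine(bow(q1), bow(q2))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, lr=1.0, epochs=5000):
    """Fit logistic regression with batch gradient descent on log loss."""
    X = [features(a, b) for a, b in pairs]
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        bias -= lr * grad_b / len(X)
    return w, bias


def prob_duplicate(model, q1, q2):
    w, bias = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(q1, q2))) + bias)
```

Usage: call `train` with a list of question pairs and 0/1 labels, then `prob_duplicate(model, q1, q2)` on a new pair. Word-overlap cosine misses paraphrases that share few words, which is one reason the slide pairs textual features with usage-based ones.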
  18. User Trust/Expertise Inference Goal: Infer user’s trustworthiness in relation to a given topic • We take into account: • Answers written on topic • Upvotes/downvotes received • Endorsements • ... • Trust/expertise propagates through the network • Must be taken into account by other algorithms
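The statement on slide 18 that trust/expertise "propagates through the network" can be illustrated with a personalized-PageRank-style iteration over an endorsement graph for a single topic. This is a swapped-in, well-known technique used for illustration; the slides do not name the propagation algorithm. The `base` prior (for example derived from answers and net upvotes), the damping factor, and the iteration count are all assumptions.

```python
def propagate_trust(base, endorsements, damping=0.85, iterations=50):
    """base: {user: non-negative prior trust on the topic}.
    endorsements: {endorser: [endorsed users]}.
    Returns {user: trust}; the values sum to 1."""
    users = set(base) | set(endorsements)
    for endorsed in endorsements.values():
        users.update(endorsed)
    total = sum(base.get(u, 0.0) for u in users)
    if total <= 0:
        raise ValueError("at least one user needs a positive base score")
    prior = {u: base.get(u, 0.0) / total for u in users}
    trust = dict(prior)
    for _ in range(iterations):
        # Every user keeps a (1 - damping) share of their own prior.
        nxt = {u: (1 - damping) * prior[u] for u in users}
        for u in users:
            out = endorsements.get(u, [])
            if out:
                # An endorser splits their damped trust among those they endorse.
                share = damping * trust[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Users who endorse nobody return their mass to the prior.
                for v in users:
                    nxt[v] += damping * trust[u] * prior[v]
        trust = nxt
    return trust
```

With equal base scores, a user endorsed by a highly trusted author ends up with more trust than one who is not, which is the kind of network effect the slide says other algorithms must take into account.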
  19. Trending Topics Goal: Highlight current events that are interesting for the user • We take into account: • Global “Trendiness” • Social “Trendiness” • User’s interest • ... • Trending topics are a great discovery mechanism
  20. Spam Detection/Moderation • Very important for Quora to maintain content quality • Purely manual approaches do not scale • Hard to get algorithms 100% right • ML algorithms detect content/user issues • Output of the algorithms feeds manually curated moderation queues
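The hand-off described on slide 20, where model output feeds a manually curated queue instead of acting on its own, might look like the sketch below. The score range, the `review_threshold` value, and the function name are hypothetical; the slides do not describe how the queues are built.

```python
def build_review_queue(scored_items, review_threshold=0.5):
    """scored_items: iterable of (content_id, spam_score), score in [0, 1].
    Returns content ids for human review, most suspicious first.
    Nothing is removed automatically: the model only prioritizes."""
    flagged = [(score, content_id)
               for content_id, score in scored_items
               if score >= review_threshold]
    # Highest score first; ties broken by id so the order is deterministic.
    flagged.sort(key=lambda item: (-item[0], item[1]))
    return [content_id for _, content_id in flagged]
```

Keeping a human in the loop reflects the slide's point that the algorithms are hard to get 100% right, while the ordering lets moderators spend their time on the most likely problems.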
  21. Content Creation Prediction • Quora’s algorithms do not optimize only for the probability of reading • Also important to predict the probability of a user answering a question • Parts of our system rely completely on that prediction • E.g. A2A (ask to answer) suggestions
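To make slide 21 concrete, the sketch below ranks candidate users for an A2A suggestion by a predicted probability of answering. The feature names, coefficients, and bias are invented for illustration; a real system would learn them from logged outcomes, and the slides do not specify the model used.

```python
import math

# Hypothetical, hand-set coefficients (assumption, not learned values).
WEIGHTS = {
    "topic_expertise": 2.0,    # expertise in the question's topic, in [0, 1]
    "past_answer_rate": 3.0,   # fraction of past requests the user answered
    "days_inactive": -0.1,     # days since the user last wrote an answer
}
BIAS = -3.0


def p_answer(features):
    """Logistic estimate of the probability that a user answers."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def suggest_a2a(candidates, k=3):
    """candidates: {user: feature dict}. Returns the k users most likely to answer."""
    ranked = sorted(candidates, key=lambda user: p_answer(candidates[user]),
                    reverse=True)
    return ranked[:k]
```

Ranking by answer probability instead of read probability is the distinction the slide draws: the best person to ask is the one likely to write, not just the one likely to click.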
  22. Models • Logistic Regression • Elastic Nets • Gradient Boosted Decision Trees • Random Forests • Neural Networks • LambdaMART • Matrix Factorization • LDA • ...
  23. Conclusions • At Quora we have not only Big, but also “rich” data • Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise • We believe ML will be one of the keys to our success • We have many interesting problems, and many unsolved challenges