Machine Learning to Grow the World's Knowledge

How does Quora use Machine Learning to grow the world's knowledge? I talked about this in a presentation I gave at Stitch Fix.


  1. Machine Learning to Grow the World's Knowledge • Xavier Amatriain (@xamat) • 8/18/2015 • Multithreaded Data
  2. Our Mission “To share and grow the world’s knowledge” • Millions of questions & answers • Millions of users • Thousands of topics • ...
  3. Our Product Teams • Core Product & Quality • Distribution • Lookup
  4. What we care about • Demand • Quality • Relevance
  5. Data @Quora
  6. Lots of data relations
  7. Complex network propagation effects
  8. Importance of topics & semantics
  9. Machine Learning @Quora
  10. Ranking - Answer ranking What is a good Quora answer? • truthful • reusable • provides explanation • well formatted • ...
  11. Ranking - Answer ranking How are those dimensions translated into features? • Features that relate to the text quality itself • Interaction features (upvotes/downvotes, clicks, comments…) • User features (e.g. expertise in topic)
  12. Ranking - Feed • Personalized learning-to-rank approach • Goal: Present most interesting stories for a user at a given time • Interesting = topical relevance + social relevance + timeliness • Stories = questions + answers
  13. Ranking - Feed • Features • Quality of question/answer • Topics the user is interested in / knows about • Users the user is following • What is trending/popular • … • Different temporal windows • Multi-stage solution with different “streams”
  14. Recommendations - Topics Goal: Recommend new topics for the user to follow • Based on • Other topics followed • Users followed • User interactions • Topic-related features • ...
  15. Recommendations - Users Goal: Recommend new users to follow • Based on: • Other users followed • Topics followed • User interactions • User-related features • ...
  16. Related Questions • Given interest in question A (source) what other questions will be interesting? • Not only about similarity, but also “interestingness” • Features such as: • Textual • Co-visit • Topics • … • Important for logged-out use case
  17. Duplicate Questions • Important issue for Quora • Want to make sure we don’t disperse knowledge about the same question across duplicates • Solution: binary classifier trained with labelled data • Features • Textual vector space models • Usage-based features • ...
  18. User Trust/Expertise Inference Goal: Infer user’s trustworthiness in relation to a given topic • We take into account: • Answers written on topic • Upvotes/downvotes received • Endorsements • ... • Trust/expertise propagates through the network • Must be taken into account by other algorithms
  19. Trending Topics Goal: Highlight current events that are interesting for the user • We take into account: • Global “Trendiness” • Social “Trendiness” • User’s interest • ... • Trending topics are a great discovery mechanism
  20. Spam Detection/Moderation • Very important for Quora to keep quality of content • Pure manual approaches do not scale • Hard to get algorithms 100% right • ML algorithms detect content/user issues • Output of the algorithms feeds manually curated moderation queues
  21. Content Creation Prediction • Quora’s algorithms optimize for more than the probability of reading • Important to predict probability of a user answering a question • Parts of our system completely rely on that prediction • E.g. A2A (ask to answer) suggestions
  22. Models ● Logistic Regression ● Elastic Nets ● Gradient Boosted Decision Trees ● Random Forests ● Neural Networks ● LambdaMART ● Matrix Factorization ● LDA ● ...
  23. Conclusions • At Quora we have not only Big, but also “rich” data • Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise • We believe ML will be one of the keys to our success • We have many interesting problems, and many unsolved challenges
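
Slide 17 describes duplicate detection as a binary classifier over textual vector space and usage-based features. A minimal sketch of that idea follows, under loud assumptions: the single feature (bag-of-words cosine similarity), the hand-rolled logistic regression, the function names, and the toy labelled pairs are all illustrative and not taken from the talk. Usage-based features are omitted.

```python
import math
from collections import Counter

def tokens(text):
    # Lowercase and keep alphanumeric word tokens only.
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()

def cosine(q1, q2):
    # Cosine similarity between the bag-of-words vectors of two questions.
    v1, v2 = Counter(tokens(q1)), Counter(tokens(q2))
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(sum(c * c for c in v2.values()))
    return dot / norm if norm else 0.0

def features(q1, q2):
    # Bias term plus one textual feature (hypothetical feature set).
    return [1.0, cosine(q1, q2)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, labels, lr=1.0, epochs=2000):
    # Plain batch gradient descent on the logistic loss.
    X = [features(a, b) for a, b in pairs]
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict(w, q1, q2):
    # Estimated probability that q1 and q2 are duplicates.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(q1, q2))))

# Toy labelled data, invented for illustration (1 = duplicate, 0 = not).
PAIRS = [
    ("How do I learn Python?", "How do I learn Python quickly?"),
    ("How can I lose weight?", "How can I lose weight fast?"),
    ("How do I learn Python?", "What is the capital of France?"),
    ("Who founded Quora?", "How can I lose weight?"),
]
LABELS = [1, 1, 0, 0]
WEIGHTS = train(PAIRS, LABELS)
```

Calling `predict(WEIGHTS, q1, q2)` returns a probability that can be thresholded. A production system would presumably use far richer text representations and the usage signals the slide mentions.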

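
Slide 12 summarizes feed ranking as "interesting = topical relevance + social relevance + timeliness". The sketch below illustrates that additive decomposition only. It uses a hand-written score rather than the learning-to-rank model the talk refers to, and the `Story` fields, the equal weighting, and the 24-hour half-life are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Story:
    # Hypothetical story record; field names are not from the talk.
    story_id: str
    topics: frozenset
    author: str
    age_hours: float

def score(story, followed_topics, followed_users, half_life_hours=24.0):
    # Topical relevance: share of the story's topics that the user follows.
    topical = len(story.topics & followed_topics) / len(story.topics) if story.topics else 0.0
    # Social relevance: 1 if the user follows the author, else 0.
    social = 1.0 if story.author in followed_users else 0.0
    # Timeliness: exponential decay that halves every half_life_hours.
    timeliness = 0.5 ** (story.age_hours / half_life_hours)
    return topical + social + timeliness

def rank_feed(stories, followed_topics, followed_users):
    # Highest-scoring stories first.
    return sorted(
        stories,
        key=lambda s: score(s, followed_topics, followed_users),
        reverse=True,
    )

stories = [
    Story("c", frozenset({"ml", "cooking"}), "bob", 48.0),
    Story("a", frozenset({"ml"}), "alice", 1.0),
    Story("b", frozenset({"cooking"}), "bob", 1.0),
]
ranked = rank_feed(stories, followed_topics={"ml"}, followed_users={"alice"})
```

In a learned ranker these three signals would be inputs to a trained model (for example LambdaMART, listed on slide 22) rather than summed with fixed weights.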