ML @ Quora
ML Algorithms for Growing the World’s Knowledge
Seattle, 05/01/2015
Xavier Amatriain (@xamat)
About Quora
Our Mission
“To share and grow the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...
Lots of data relations
Complex network propagation effects
Importance of topics & semantics
Demand
What we care about
Quality
Relevance
Machine Learning @Quora
Ranking - Answer ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
Ranking - Answer ranking
How are those dimensions translated into features?
• Features that relate to the text quality itself
• Interaction features (upvotes/downvotes, clicks, comments…)
• User features (e.g. expertise in the topic)
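A minimal sketch of how these three feature families could be combined into one answer score, assuming a logistic model. The text-quality proxy, the smoothed vote ratio, the `author_topic_expertise` field and all weights are hypothetical illustrations, not Quora's features or model. In practice the weights would be learned from labelled answers.

```python
import math
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    upvotes: int
    downvotes: int
    author_topic_expertise: float  # hypothetical user feature in [0, 1]


def text_quality(text: str) -> float:
    # Hypothetical text-quality proxy: longer answers with paragraph breaks score higher, capped at 1.
    words = len(text.split())
    paragraphs = text.count("\n\n") + 1
    return min(1.0, words / 300) * 0.7 + min(1.0, paragraphs / 4) * 0.3


def vote_ratio(up: int, down: int, prior: float = 0.5, strength: float = 10.0) -> float:
    # Smoothed upvote ratio: with few votes it stays close to the prior.
    return (up + prior * strength) / (up + down + strength)


# Hypothetical hand-set weights; a real system would learn them from labelled data.
ANSWER_WEIGHTS = {"bias": -2.0, "text": 2.0, "votes": 2.0, "expertise": 1.5}


def answer_score(a: Answer) -> float:
    z = (ANSWER_WEIGHTS["bias"]
         + ANSWER_WEIGHTS["text"] * text_quality(a.text)
         + ANSWER_WEIGHTS["votes"] * vote_ratio(a.upvotes, a.downvotes)
         + ANSWER_WEIGHTS["expertise"] * a.author_topic_expertise)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link gives a score in (0, 1)


def rank_answers(answers):
    # Highest-scoring answers first.
    return sorted(answers, key=answer_score, reverse=True)
```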
Ranking - Feed
• Personalized learning-to-rank approach
• Goal: present the most interesting stories for a user at a given time
• Interesting = topical relevance + social relevance + timeliness
• Stories = questions + answers
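The “Interesting = …” decomposition can be sketched as a weighted sum of three component scores. Several parts of this are assumptions for illustration:

* the weights
* the 24-hour half-life used for timeliness
* the way topical and social relevance are computed

A learning-to-rank system would learn how to combine such signals rather than fix them by hand.

```python
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    topics: set       # topics attached to the question/answer
    engaged_by: set   # users who upvoted, answered or shared it
    age_hours: float


def topical_relevance(story, user_topic_affinity):
    # Average affinity of the user for the story's topics (hypothetical affinity dict).
    if not story.topics:
        return 0.0
    return sum(user_topic_affinity.get(t, 0.0) for t in story.topics) / len(story.topics)


def social_relevance(story, followed_users):
    # Saturating count: 3 or more followed users engaging counts as fully relevant (assumed cap).
    if not followed_users:
        return 0.0
    return min(1.0, len(story.engaged_by & followed_users) / 3)


def timeliness(story, half_life_hours=24.0):
    # Exponential decay: a story loses half its freshness every half_life_hours.
    return 0.5 ** (story.age_hours / half_life_hours)


def interestingness(story, user_topic_affinity, followed_users, w=(0.5, 0.3, 0.2)):
    # Hypothetical fixed weights for topical, social and timeliness components.
    return (w[0] * topical_relevance(story, user_topic_affinity)
            + w[1] * social_relevance(story, followed_users)
            + w[2] * timeliness(story))
```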
Ranking - Feed
• Features
  • Quality of question/answer
  • Topics the user is interested in or knows about
  • Users the user is following
  • What is trending/popular
  • …
• Different temporal windows
• Multi-stage solution with different “streams”
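One way to read “multi-stage solution with different streams” is as three steps. First, generate candidates from several sources. Next, run a cheap pruning pass. Finally, apply an expensive ranker to the shortlist. This sketch assumes that structure. The stream names and the two scoring callables are hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, List


def multi_stage_rank(
    streams: Dict[str, Iterable[str]],
    cheap_score: Callable[[str], float],
    expensive_score: Callable[[str], float],
    first_stage_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    # Stage 1: merge candidates from all streams, dropping duplicates, keeping first-seen order.
    seen = set()
    candidates = []
    for items in streams.values():
        for item in items:
            if item not in seen:
                seen.add(item)
                candidates.append(item)
    # Stage 2: a cheap scorer prunes the pool to first_stage_k candidates.
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:first_stage_k]
    # Stage 3: the expensive model only scores the shortlist.
    return sorted(shortlist, key=expensive_score, reverse=True)[:final_k]
```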
Recommendations - Topics
Goal: Recommend new topics for the user to follow
• Based on:
  • Other topics followed
  • Users followed
  • User interactions
  • Topic-related features
  • ...
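This sketch illustrates the “other topics followed” signal on its own. It recommends topics that are frequently co-followed with the user's current topics. This item-to-item co-occurrence scheme is an assumed, swapped-in technique, not necessarily what Quora uses. It also ignores the other signals listed.

```python
from collections import Counter
from itertools import combinations


def topic_cooccurrence(follows):
    # follows: user -> set of followed topics. Count how often each ordered topic pair is co-followed.
    co = Counter()
    for topics in follows.values():
        for a, b in combinations(sorted(topics), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1
    return co


def recommend_topics(user, follows, k=3):
    co = topic_cooccurrence(follows)
    mine = follows.get(user, set())
    scores = Counter()
    for (a, b), count in co.items():
        # Score unfollowed topics by how often they are co-followed with topics the user already follows.
        if a in mine and b not in mine:
            scores[b] += count
    # Ties broken alphabetically for deterministic output.
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
```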
Recommendations - Users
Goal: Recommend new users to follow
• Based on:
  • Other users followed
  • Topics followed
  • User interactions
  • User-related features
  • ...
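A similar sketch covers the “other users followed” signal. For each candidate, it counts how many of the people a user follows also follow that candidate (a friends-of-friends heuristic). This is an assumed illustration, not Quora's recommender.

```python
from collections import Counter


def recommend_users(user, following, k=3):
    # following: user -> set of users they follow.
    direct = following.get(user, set())
    scores = Counter()
    for followee in direct:
        for candidate in following.get(followee, set()):
            if candidate != user and candidate not in direct:
                scores[candidate] += 1  # one vote per followee who follows the candidate
    # Most-endorsed candidates first, ties broken alphabetically.
    return [u for u, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
```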
Related Questions
• Given interest in question A (the source), what other questions will be interesting?
• Not only about similarity, but also “interestingness”
• Features such as:
  • Textual
  • Co-visit
  • Topics
  • …
• Important for the logged-out use case
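The co-visit feature can be illustrated with a normalized co-occurrence count over browsing sessions. Questions often viewed in the same session as the source become candidates. Each count is divided by the geometric mean of the two view counts so that globally popular questions don't dominate. The session format and the normalization are assumptions. A real system would combine this with textual and topic features.

```python
import math
from collections import Counter
from itertools import combinations


def covisit_related(sessions, source, k=3):
    # sessions: list of sets of question ids viewed together in one browsing session (assumed format).
    views = Counter()
    pair_counts = Counter()
    for session in sessions:
        for q in session:
            views[q] += 1
        for a, b in combinations(sorted(session), 2):
            pair_counts[(a, b)] += 1
    scores = {}
    for (a, b), n in pair_counts.items():
        if source in (a, b):
            other = b if a == source else a
            # Cosine-normalized co-visit count dampens questions that are popular everywhere.
            scores[other] = n / math.sqrt(views[source] * views[other])
    return [q for q, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
```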
Duplicate Questions
• Important issue for Quora
• Want to make sure knowledge about the same question is not dispersed across duplicates
• Solution: binary classifier trained with labelled data
• Features:
  • Textual vector space models
  • Usage-based features
  • ...
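Here is a toy version of the duplicate classifier. It assumes two simple vector-space features: bag-of-words cosine similarity and token Jaccard overlap. A logistic regression is fit on these by batch gradient descent. The features, the tokenizer, the learning rate and the tiny training set in the test are all hypothetical. Usage-based features are omitted.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between bag-of-words term-frequency vectors.
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def features(q1, q2):
    return [1.0, cosine(q1, q2), jaccard(q1, q2)]  # leading 1.0 is the bias term


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(pairs, labels, lr=0.5, epochs=500):
    # Plain batch gradient descent on the average log-loss.
    w = [0.0, 0.0, 0.0]
    X = [features(a, b) for a, b in pairs]
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def duplicate_probability(w, q1, q2):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(q1, q2))))
```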
User Trust/Expertise Inference
Goal: Infer a user’s trustworthiness in relation to a given topic
• We take into account:
  • Answers written on the topic
  • Upvotes/downvotes received
  • Endorsements
  • ...
• Trust/expertise propagates through the network
• Must be taken into account by other algorithms
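Propagation of trust through the network can be sketched as a topic-specific personalized PageRank. Each user starts from a prior, assumed here to summarize the answers written and votes received on the topic. Trust then flows along endorsement edges. The graph format, damping factor and iteration count are assumptions, not details from the talk.

```python
def propagate_trust(endorsements, prior, damping=0.85, iters=50):
    # endorsements: user -> list of users they endorse/upvote within one topic.
    # prior: user -> non-negative seed trust for this topic (must not sum to zero).
    users = set(prior) | set(endorsements)
    for targets in endorsements.values():
        users |= set(targets)
    total = sum(prior.values())
    p = {u: prior.get(u, 0.0) / total for u in users}  # normalized teleport distribution
    trust = dict(p)
    for _ in range(iters):
        nxt = {u: (1 - damping) * p[u] for u in users}
        for u in users:
            targets = endorsements.get(u, [])
            if targets:
                # Split the user's current trust evenly among the users they endorse.
                share = damping * trust[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Users with no outgoing endorsements redistribute their trust via the prior.
                for v in users:
                    nxt[v] += damping * trust[u] * p[v]
        trust = nxt
    return trust
```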
Trending Topics
Goal: Highlight current events that are interesting for the user
• We take into account:
  • Global “Trendiness”
  • Social “Trendiness”
  • User’s interest
  • ...
• Trending topics are a great discovery mechanism
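This sketch combines the three signals. It assumes “trendiness” is the smoothed ratio of recent activity to the activity expected from a longer baseline window. That ratio is computed once over all users (global) and once over the user's network (social). The smoothing constant, the weights and the `(recent, baseline)` count format are hypothetical.

```python
def trend_ratio(recent, baseline, window_ratio, smoothing=5.0):
    # recent: activity in the recent window; baseline: activity in a longer reference window.
    # window_ratio = recent window length / baseline window length, so expected = baseline * window_ratio.
    expected = baseline * window_ratio
    return (recent + smoothing) / (expected + smoothing)  # > 1 means more activity than usual


def trending_score(topic, global_counts, social_counts, user_interest, window_ratio, w=(0.5, 0.3, 0.2)):
    # global_counts / social_counts: topic -> (recent, baseline) counts (assumed format).
    g_recent, g_base = global_counts.get(topic, (0, 0))
    s_recent, s_base = social_counts.get(topic, (0, 0))
    return (w[0] * trend_ratio(g_recent, g_base, window_ratio)
            + w[1] * trend_ratio(s_recent, s_base, window_ratio)
            + w[2] * user_interest.get(topic, 0.0))
```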
Spam Detection/Moderation
• Very important for Quora to maintain content quality
• Purely manual approaches do not scale
• Hard to get algorithms 100% right
• ML algorithms detect content/user issues
• The output of the algorithms feeds manually curated moderation queues
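The hand-off from classifier to moderators might look like this sketch. Items whose predicted spam probability reaches a threshold go into a review queue ordered from most to least suspicious. The threshold and the `spam_prob` scorer are hypothetical placeholders for whatever model produces the scores.

```python
def build_review_queue(items, spam_prob, threshold=0.5):
    # spam_prob: callable returning a classifier's spam probability for an item (hypothetical).
    scored = [(spam_prob(i), i) for i in items]
    flagged = [(p, i) for p, i in scored if p >= threshold]
    # Most suspicious first, so moderators spend their time where the model is most worried.
    flagged.sort(key=lambda pi: (-pi[0], pi[1]))
    cleared = [i for p, i in scored if p < threshold]
    return [i for _, i in flagged], cleared
```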
Content Creation Prediction
• Quora’s algorithms do not only optimize for the probability of reading
• It is also important to predict the probability of a user answering a question
• Parts of our system rely entirely on that prediction
  • E.g. A2A (ask to answer) suggestions
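This sketch ranks A2A candidates by their predicted probability of answering. It assumes a logistic model over three hypothetical features (topic expertise, historical answer rate, recent activity) with hand-set weights. A real model would be trained on past outcomes.

```python
import math

# Hypothetical weights for P(user answers question); a real model would be trained on past A2A outcomes.
A2A_WEIGHTS = {"bias": -3.0, "topic_expertise": 2.5, "answer_rate": 3.0, "active_recently": 1.0}


def p_answer(features):
    # features: feature name -> value for one (user, question) pair.
    z = A2A_WEIGHTS["bias"] + sum(A2A_WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def a2a_suggestions(candidates, k=3):
    # candidates: user id -> feature dict for this question.
    ranked = sorted(candidates, key=lambda u: p_answer(candidates[u]), reverse=True)
    return ranked[:k]
```

Ranking by P(answer) rather than by expertise alone lets an active, moderately expert user outrank an expert who rarely answers.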
Models
● Logistic Regression
● Elastic Nets
● Gradient Boosted Decision Trees
● Random Forests
● Neural Networks
● LambdaMART
● Matrix Factorization
● LDA
● ...
Conclusions
• At Quora we have not only Big, but also “rich” data
• Our algorithms need to understand and optimize complex aspects such as quality, interestingness, or user expertise
• We believe ML will be one of the keys to our success
• We have many interesting problems, and many unsolved challenges
We’re Hiring!
http://www.quora.com/careers/

MLConf Seattle 2015 - ML@Quora
