
Predicting Answering Behaviour in Online Question Answering Communities


The value of Question Answering (Q&A) communities depends on members of the community finding the questions they are most willing and able to answer. This can be difficult in communities with a high volume of questions. Much previous work has attempted to address this problem by recommending questions similar to those already answered. However, this approach disregards the question selection behaviour of answerers and how it is affected by factors such as question recency and reputation. In this paper, we identify the parameters that correlate with such behaviour by analysing users’ answering patterns in a Q&A community. We then generate a model to predict which question a user is most likely to answer next. We train Learning to Rank (LTR) models to predict question selections using various user, question and thread feature sets. We show that answering behaviour can be predicted with a high level of success, and highlight the particular features that influence users’ question selections.

Published in: Data & Analytics

  1. PREDICTING ANSWERING BEHAVIOUR IN ONLINE QUESTION ANSWERING COMMUNITIES. Grégoire Burel¹, Paul Mulholland¹, Yulan He² and Harith Alani¹. ¹Knowledge Media Institute, The Open University, Milton Keynes, UK. ²School of Engineering & Applied Science, Aston University, UK. HT2015, Middle East Technical University Northern Cyprus Campus, Cyprus, 2015.
  2. OUTLINE -  Answering Behaviour in Question Answering Communities -  Question Answering Communities. -  The Cooking Community. -  Needs and Motivations. -  Contributions. -  Representing and Modelling Question Selection Behaviour -  Matrix Representation of Behaviour and Partially Ordered Sets. -  LTR Models. -  Answering Behaviour Predictors. -  Predicting Answering Behaviour -  Prediction Results. -  Feature Reduction. -  Future Work and Conclusions
  3. Q&A COMMUNITIES “Q&A communities are communities composed of askers and answerers looking for solutions to particular issues.”
  4. Q&A COMMUNITIES A question thread consists of a Question followed by Answer #1, Answer #2, ..., Answer #n.
  6. THE COOKING COMMUNITY -  Cooking (CO): -  A web-based cooking community specialised in culinary issues. -  Mostly focused on factual questions rather than conversational questions. -  Dataset (data up to April 2011): -  3065 Questions -  9820 Answers -  4941 Users -  641 Topics (Tags) http://cooking.stackexchange.com
  7. Q&A COMMUNITIES -  Q&A Community Needs (Rowe et al. 2011, Burel et al. 2012): -  Community Managers: -  Make sure that the community is “happy” (i.e. questions are solved). -  Make sure that the community becomes more knowledgeable over time (users gain expertise and experience). -  Identify and implement features that help users achieve their goals. -  Askers: -  Get answers related to a particular issue. -  Make sure that a community can fulfil their needs before asking a question. -  Answerers: -  Find which questions they can answer. -  Find questions they are willing to answer. -  Find questions that are challenging.
  10. Q&A COMMUNITIES -  Goal: identify how users pick questions to answer.
  11. CONTRIBUTIONS How can answering behaviour be modelled? Can we predict question selection behaviour accurately? -  Introduce a method for representing the question-selection behaviour of individual users in a Q&A community. -  Study the influence of 62 user, question and thread features on answering behaviour and show how combining these features increases the quality of behaviour predictions. -  Investigate the use of Learning to Rank (LTR) models for identifying the most relevant question for a user at any given time. -  Construct multiple models to predict question selections, and compare them against multiple baselines (question recency, topic affinity, and random), achieving high precision gains over the baseline (+93%).
  12. LITERATURE How can answering behaviour be modelled? Can we predict question selection behaviour accurately? -  Most existing research focuses on recommending questions (i.e. question routing) independently of the willingness of users to answer particular questions (Pazzani et al., 2007). -  Some work proposed an approach relatively similar to ours (Liu et al. 2011), but our approach differs in three main ways: -  We use a mixture of dynamically calculated question, thread and user (potential answerer) features. -  We consider all available questions at each contribution time rather than only recently posed questions. -  We identify which features correlate the most with user behaviour.
  14. ANSWERING BEHAVIOUR IN Q&A COMMUNITIES -  Answering process: 1.  Obtain the list of available questions: questions that do not yet have a best answer (open) and that the user has not already answered. 2.  Select a question and answer it.
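The "available questions" filter in step 1 can be sketched as follows (the field names `accepted_answer` and `answerer_ids` are illustrative, not from the paper):

```python
# Keep open questions (no accepted/best answer yet) that the user
# has not already answered. Field names are assumed for illustration.

def available_questions(questions, user_id):
    return [q for q in questions
            if q["accepted_answer"] is None          # still open
            and user_id not in q["answerer_ids"]]    # not answered by user

questions = [
    {"id": "q3", "accepted_answer": None, "answerer_ids": {"u2"}},
    {"id": "q5", "accepted_answer": "a9", "answerer_ids": {"u1"}},
    {"id": "q8", "accepted_answer": None, "answerer_ids": {"u1", "u3"}},
]
print([q["id"] for q in available_questions(questions, "u1")])  # ['q3']
```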
  15. REPRESENTING ANSWERING BEHAVIOUR -  The answering behaviour of a user can be represented using a matrix-like structure where: -  Columns represent answering times (t). -  Rows represent question (q) statuses (Available/Closed/Selected).
  16. REPRESENTING ANSWERING BEHAVIOUR q8 > q3, q5, q7, q11, q12 (Matrix Representation → Decision Graph → Partially Ordered Set)
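The matrix-to-poset idea above can be sketched in code. This is an illustrative reconstruction, not the authors' implementation; the status labels and helper names are assumed:

```python
# Rows are questions, columns are the user's answering times, and each
# cell holds a status: Available, Closed, or Selected. At each time step,
# the selected question outranks every question that was merely available.

AVAILABLE, CLOSED, SELECTED = "A", "C", "S"

matrix = {                      # one column per answering time t0..t2
    "q3": [AVAILABLE, AVAILABLE, AVAILABLE],
    "q5": [AVAILABLE, SELECTED,  CLOSED],
    "q8": [SELECTED,  CLOSED,    CLOSED],
    "q7": [AVAILABLE, AVAILABLE, SELECTED],
}

def partial_orders(matrix):
    """One (selected, [available]) pair per time step."""
    n_steps = len(next(iter(matrix.values())))
    orders = []
    for t in range(n_steps):
        selected = [q for q, col in matrix.items() if col[t] == SELECTED]
        available = sorted(q for q, col in matrix.items()
                           if col[t] == AVAILABLE)
        for s in selected:
            orders.append((s, available))
    return orders

for winner, losers in partial_orders(matrix):
    print(f"{winner} > {', '.join(losers)}")
```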
  20. PREDICTING ANSWERING BEHAVIOUR q8 > q3, q5, q7, q11, q12 -  Answering behaviour prediction is a ranking problem where only one question needs to be selected from a list of available questions: a Learning to Rank (LTR) problem where only one item is relevant.
  22. LTR MODELS -  LTR models are designed to generate a list of ranked items based on derived relevance labels: 1.  Pointwise methods: rank questions by scoring them individually (Ranked Random Forests). 2.  Pairwise methods: rank questions by considering pairs (LambdaRank; Quoc and Le, 2007). 3.  Listwise methods: rank questions by optimising evaluation measures (ListNet; Cao et al., 2007).
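As a minimal illustration of the pointwise family (a toy logistic scorer, not the paper's Ranked Random Forests), the sketch below scores each candidate question independently on a binary selected/not-selected label and ranks by score; the two features and all data are invented:

```python
import math

def train_pointwise(rows, lr=0.1, epochs=200):
    """Logistic-regression scorer trained one question at a time."""
    dim = len(rows[0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in rows:                    # y = 1 if selected, else 0
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                        # gradient of log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w

def rank(w, candidates):
    """Sort (question_id, features) pairs by descending score."""
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(candidates, key=lambda qx: -score(qx[1]))

# Toy features: [recency, asker_reputation]; selected questions were recent.
train = [([0.9, 0.2], 1), ([0.1, 0.8], 0), ([0.8, 0.1], 1), ([0.2, 0.3], 0)]
w = train_pointwise(train)
ranked = rank(w, [("q3", [0.2, 0.5]), ("q8", [0.95, 0.1]), ("q5", [0.5, 0.4])])
print([q for q, _ in ranked])  # ['q8', 'q5', 'q3']
```

Pairwise and listwise methods differ only in the loss: they compare pairs of questions, or whole lists, instead of scoring each question in isolation.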
  23. FEATURES 1.  User Features: –  Represent the current characteristics and reputation of potential answerers (e.g. reputation, number of best answers, …). 2.  Question Features: –  Content-based features (e.g. readability, …) and asker features (similar to user features). 3.  Thread Features: –  Represent the current state of an answering thread. –  Aggregate (i.e. average) the features of all the answers already posted to a question.
  26. FEATURES -  User (17): Number of Answers, Reputation, Answering Success, Number of Posts, Number of Questions, Question Reputation, Answer Reputation, Asking Success, Topic Reputation, Topic Affinity, Average Answer Reputation, Average Question Reputation, Ratio of Successfully Answered Questions, Ratio of Successfully Solved Questions, Average Observer Reputation, Ratio of Reputation for a Potential Question, and Average Topic Reputation. -  Question (23): Asker Features + Question Age, Number of Words, Referral Count, Readability with Gunning Fog Index, Readability with LIX, Cumulative Term Entropy, Question Polarity. -  Thread (22): Average Answerer Features + Average Number of Words, Average Referral Count, Average Readability with Gunning Fog Index, Average Readability with LIX, Average Cumulative Term Entropy, Average Answer Polarity.
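Two of the features named above can be illustrated with plausible formulas (the paper's exact definitions may differ): question age, and topic affinity taken here as the share of a user's past answers that fall on the question's tags:

```python
from datetime import datetime

def question_age_hours(asked_at, now):
    """Hours elapsed since the question was asked."""
    return (now - asked_at).total_seconds() / 3600.0

def topic_affinity(user_tag_counts, question_tags):
    """Share of the user's past answers on the question's tags
    (assumed formula, for illustration only)."""
    total = sum(user_tag_counts.values())
    if total == 0:
        return 0.0
    return sum(user_tag_counts.get(t, 0) for t in question_tags) / total

now = datetime(2011, 4, 1, 12, 0)
print(question_age_hours(datetime(2011, 4, 1, 6, 0), now))   # 6.0
print(topic_affinity({"baking": 6, "sauces": 2, "knives": 2},
                     ["baking", "sauces"]))                  # 0.8
```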
  27. ANSWERING BEHAVIOUR PREDICTION -  Experimental setting: 1.  Sample 100 users out of the 283 users who have answered at least 5 questions. 2.  Compute features and generate partially ordered sets. 3.  Train a model for each user using a chronological 80%-20% training/testing split. 4.  Compare the prediction results using 3 different LTR algorithms: 1) Random Forests; 2) LambdaRank; and 3) ListNet. 5.  Compute MRR and MAP@n for different feature groups and algorithms.
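The chronological split in step 3 can be sketched as follows (field names assumed): train on a user's earliest answering events, test on the most recent ones, never mixing future into past.

```python
# Chronological 80/20 split: sort a user's answering events by time,
# train on the first 80%, test on the remaining 20%.

def chronological_split(events, train_frac=0.8):
    events = sorted(events, key=lambda e: e["time"])
    cut = int(len(events) * train_frac)
    return events[:cut], events[cut:]

events = [{"time": t, "question": f"q{t}"} for t in (5, 1, 4, 2, 3)]
train, test = chronological_split(events)
print([e["question"] for e in train], [e["question"] for e in test])
# ['q1', 'q2', 'q3', 'q4'] ['q5']
```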
  28. ANSWERING BEHAVIOUR PREDICTION -  Mean Reciprocal Rank (MRR) in the context of behaviour prediction: -  The mean, across lists, of the reciprocal rank of the selected question. -  Mean Average Precision (MAP@n) in the context of behaviour prediction: –  The average precision obtained for the selected question when it appears within the top n items of each list.
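Because each ranked list here contains exactly one relevant (selected) question, both measures reduce to simple functions of that question's rank; a small sketch:

```python
# Single-relevant-item case: average precision for a list is 1/rank when
# the selected question appears in the top n, and 0 otherwise.

def mrr(ranks):
    """ranks: 1-based rank of the selected question in each list."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def map_at_n(ranks, n):
    return sum((1.0 / r) if r <= n else 0.0 for r in ranks) / len(ranks)

ranks = [1, 3, 2, 10]              # toy ranks over four test lists
print(round(mrr(ranks), 3))        # 0.483
print(round(map_at_n(ranks, 5), 3))  # 0.458
```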
  29. ANSWERING PREDICTION RESULTS (results table)
  30. ANSWERING PREDICTION RESULTS -  Answering behaviour predictions (MRR 0.446): –  Baseline models: -  Question age correlates better than topic affinity. -  Picked questions tend to be among the 10 most recent questions (MRR = 0.094). –  Feature-type models and complete model: -  Observer features are not relevant, whereas question features are the most useful. -  Random Forests with all the features provides the best results (MRR = 0.446): on average, selected questions are found in the 2nd or 3rd position.
  31. FEATURE RANKING -  Feature ranking: 1.  For each feature, the Information Gain Ratio (IGR), Correlation Feature Selection (CFS) and MRR Feature Drop (ablation method, FD) are computed. 2.  The features are then sorted by their average importance. 3.  The best features are then selected for computing new prediction models, keeping the set with the best MRR.
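The MRR feature-drop (ablation) step can be sketched as follows; here `evaluate` stands in for the full train-and-score pipeline, and the feature names and weights are purely illustrative:

```python
# Retrain without each feature in turn and record how much the score
# falls relative to the full feature set; a large drop means the
# feature mattered.

def feature_drop_ranking(features, evaluate):
    base = evaluate(features)
    drops = {f: base - evaluate([g for g in features if g != f])
             for f in features}
    ranked = sorted(drops, key=drops.get, reverse=True)
    return ranked, drops

# Toy evaluator: pretend MRR is a weighted sum of the features used.
weights = {"question_age": 0.20, "asker_reputation": 0.15,
           "topic_affinity": 0.05}
evaluate = lambda fs: sum(weights[f] for f in fs)

order, drops = feature_drop_ranking(list(weights), evaluate)
print(order)  # ['question_age', 'asker_reputation', 'topic_affinity']
```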
  32. FEATURE RANKING RESULTS (results table)
  34. FEATURE RANKING RESULTS -  Feature impact comparison: –  All feature types matter: question features represent 40% of the top 15 features, thread features 29% and user features 20%. –  The top question features show that: -  Questions with hyperlinks are less likely to attract answerers. -  Questions from reputable users are more likely to be picked, as are questions with fewer answers.
  35. FEATURE RANKING RESULTS -  The top thread features show that: -  Users are more likely to answer when the complexity of existing answers is low and the reputation of existing answerers is low. -  User features are not well ranked and may only be useful for differentiating knowledgeable users from less skilled answerers.
  36. ANSWERING PREDICTION RESULTS (results table)
  38. BEST MODEL RESULTS -  Best model (MRR 0.491): –  The best model is obtained when using FD and 58 of the proposed 62 features, but… -  Almost-best model (MRR 0.441): -  Uses only 15 features and the merged rankings. -  Requires far fewer feature computations.
  39. FUTURE WORK -  Perform a similar analysis on other Q&A communities/users: -  Confirm the results on additional datasets and user samples. -  Balance prediction accuracy and computational complexity for analysing larger communities: -  Relax some assumptions (e.g. limit the analysis to the k most recent questions). -  Reduce the number of features.
  40. CONCLUSIONS -  We observed that answering decisions can be represented using partially ordered sets and predicted using LTR models. -  For the CO community, we observed that: -  Pointwise LTR models can be applied successfully for predicting answering behaviour (MRR = 0.491). -  Only a few features may be enough for predicting answering behaviour (MRR = 0.441 with 15 features).
  41. QUESTIONS? Email: g.burel@open.ac.uk Twitter: @evhart
  42. REFERENCES -  Rowe, M., Alani, H., Angeletou, S., and Burel, G. Report on social, technical and corporate needs in online communities. Tech. Rep. 3.1, ROBUST, 2011. -  Burel, G., He, Y., and Alani, H. Automatic identification of best answers in online enquiry communities. In Proceedings of ESWC 2012, Heraklion, Greece, 2012. -  Liu, Q., and Agichtein, E. Modeling answerer behavior in collaborative question answering systems. In Advances in Information Retrieval. Springer, 2011. -  Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., and Li, H. Learning to rank: From pairwise approach to listwise approach. In Proceedings of the 24th International Conference on Machine Learning, ICML ’07, New York, NY, USA, 2007. ACM. -  Quoc, C., and Le, V. Learning to rank with nonsmooth cost functions. NIPS ’07, 2007. -  Pazzani, M. J., and Billsus, D. Content-based recommendation systems. In The Adaptive Web. Springer, 2007.
