
Player Rating Algorithms for Balancing Human Computation Games: Testing the Effect of Bipartiteness

How can we automatically balance the difficulty of tasks or levels in human computation games and crowdsourcing?



  1. 1. Player Rating Systems for Balancing Human Computation Games: Testing the Effect of Bipartiteness. Seth Cooper, Sebastian Deterding, Theo Tsapakos. DiGRA 2016, August 6, 2016. (CC BY)
  2. 2. <1> the challenge
  3. 3. »flow« [diagram: the flow channel, difficulty vs. skill/time, between frustration and boredom] (Csikszentmihalyi, 1990)
  4. 4. winning odds correlate with retention (Lomas et al., 2013)
  5. 5. human computation games
  6. 6. the problem: 1. scientific tasks are predetermined [diagram: difficulty vs. skill/time]
  7. 7. 2. tasks can't be changed [diagram: difficulty vs. skill/time]
  8. 8. 3. difficulty is unknown in advance [diagram: difficulty vs. skill/time, tasks marked "?"]
  9. 9. 4. solving tasks defeats crowdsourcing [diagram: difficulty vs. skill/time, tasks marked "!"]
  10. 10. … hence tasks are served randomly (Lintott, 2016) [diagram: difficulty vs. skill/time, tasks marked "?"]
  11. 11. hence retention is very poor (Sauermann & Franzoni, 2015)
  12. 12. most leave after balanced tutorials [plot: % players retained over time/levels; idealised tutorial vs. actual tasks]
  13. 13. the challenge: how to sequence tasks without solving them? [diagram: difficulty vs. skill/time, tasks marked "?"]
  14. 14. also applies to: user-generated content
  15. 15. also applies to: crowdsourcing
  16. 16. <2> the approach
  17. 17. multiplayer matchmaking
  18. 18. uses player rating algorithms: Elo (1978), Glicko-2 (2012/13), TrueSkill (2006)
  19. 19. skill = winning odds, updated with each game (Moser, 2010)
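To make that update rule concrete, here is a minimal Elo sketch in Python: the expected score is the modelled probability of winning, and each game moves both ratings toward the observed result. The K-factor of 32 and the 400-point scale are common conventions, not values from the talk.

```python
# Minimal Elo sketch (illustrative defaults; K = 32 and the 400-point scale
# are common conventions, not values taken from the talk).

def expected_score(rating_a: float, rating_b: float) -> float:
    """Modelled probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one game; score_a is 1.0 (A wins), 0.5 (draw), 0.0 (A loses)."""
    e_a = expected_score(rating_a, rating_b)
    return (rating_a + k * (score_a - e_a),
            rating_b + k * ((1.0 - score_a) - (1.0 - e_a)))

# Example: a 1500-rated player beats a 1600-rated opponent and gains ~20 points.
print(elo_update(1500, 1600, 1.0))
```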
  20. 20. remember: winning odds → retention (Lomas et al., 2013)
  21. 21. widely used, effective prediction (Menke, 2016)
  22. 22. our approach: tasks = players; player rating = skill, task rating = difficulty
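A rough sketch of how that tasks-as-players mapping could look in code, reusing the Elo helpers above. The function names (record_attempt, pick_task) and the 50% target odds are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the tasks-as-players mapping (names and target odds are illustrative).
# Reuses expected_score() and elo_update() from the Elo sketch above.

def record_attempt(player, task, solved, ratings, k=32.0):
    """Treat one attempt as a match: the player 'plays against' the task."""
    new_p, new_t = elo_update(ratings[player], ratings[task], 1.0 if solved else 0.0, k)
    ratings[player], ratings[task] = new_p, new_t

def pick_task(player, candidate_tasks, ratings, target_odds=0.5):
    """Serve the task whose predicted win odds for this player are closest to the target."""
    return min(candidate_tasks,
               key=lambda t: abs(expected_score(ratings[player], ratings[t]) - target_odds))

ratings = {"alice": 1500.0, "task_1": 1500.0, "task_2": 1750.0}
record_attempt("alice", "task_1", solved=True, ratings=ratings)
print(pick_task("alice", ["task_2"], ratings))  # only one candidate here
```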
  23. 23. <3> the question
  24. 24. we produce a bipartite graph (Asratian et al., 1998)
  25. 25. we produce a bipartite graph (Asratian et al., 1998) [diagram: players-tasks bipartite graph]
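For illustration, a toy player-task interaction graph built with networkx; because every edge links a player to a task (never player to player), the graph is bipartite. Node names are made up.

```python
import networkx as nx

# Toy player-task interaction graph (node names are made up for illustration).
G = nx.Graph()
G.add_nodes_from(["p1", "p2", "p3"], bipartite=0)        # players
G.add_nodes_from(["t1", "t2", "t3", "t4"], bipartite=1)  # tasks
G.add_edges_from([("p1", "t1"), ("p1", "t2"), ("p2", "t2"), ("p3", "t4")])

print(nx.is_bipartite(G))  # True: edges only ever connect a player to a task
```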
  26. 26. less density, less information flow (Scott, 2012) [diagram: players-tasks graph]
  27. 27. more structural holes (Scott, 2012) [diagram: players-tasks graph]
  28. 28. more unbalanced graphs (Scott, 2012) [diagram: players-tasks graph]
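Continuing the toy graph above, a hedged sketch of how the properties on slides 26-28 could be quantified: bipartite density as a proxy for information flow, and the relative size of the two node sets for balancedness; a much smaller side tends to accumulate high-degree "super vertices" (cf. slide 34).

```python
from networkx.algorithms import bipartite

# Continuing the toy graph G from the previous sketch.
players = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
tasks = set(G) - players

# Density: share of the possible player-task edges that are actually present.
print(bipartite.density(G, players))

# Balancedness: relative size of the two node sets; a much smaller side tends
# to collect high-degree "super vertices".
print(len(players), len(tasks))
print(sorted((d for _, d in G.degree()), reverse=True))
```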
  29. 29. research questions: does a bipartite (user-task rather than player-player) graph negatively affect the prediction accuracy of player rating algorithms? does graph balancedness affect accuracy?
  30. 30. <4> the study
  31. 31. data set 1: predicting chess matches with Elo
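A sketch of the kind of evaluation this implies: replay matches in order, predict the winner from the current ratings before each update, and score accuracy on decisive games. The (player_a, player_b, score_a) record format and the choice to skip draws when scoring are assumptions for illustration, not the paper's exact protocol.

```python
from collections import defaultdict

# Illustrative evaluation loop; reuses expected_score() and elo_update() above.
def evaluate_elo(matches, k=32.0):
    ratings = defaultdict(lambda: 1500.0)
    correct = decided = 0
    for a, b, score_a in matches:
        e_a = expected_score(ratings[a], ratings[b])
        if score_a != 0.5:                          # score accuracy on decisive games only
            correct += (e_a >= 0.5) == (score_a == 1.0)
            decided += 1
        ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score_a, k)
    return correct / decided if decided else 0.0
```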
  32. 32. bipartite training data has no effect
  33. 33. unbalanced bipartite graphs perform better
  34. 34. unbalanced bipartite graphs have super vertices
  35. 35. data set 2: Elo, Glicko-2, TrueSkill on the human computation game Paradox
  36. 36. all rating systems outperform baseline
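The talk does not spell out the baseline here; one simple candidate, for illustration only, is a majority-class predictor that always predicts the more frequent outcome in the data.

```python
# An assumed majority-class baseline (always predict the more frequent outcome);
# the talk does not specify which baseline was actually used.
def majority_baseline_accuracy(matches):
    outcomes = [score for _, _, score in matches if score != 0.5]
    wins = sum(1 for score in outcomes if score == 1.0)
    return max(wins, len(outcomes) - wins) / len(outcomes) if outcomes else 0.0
```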
  37. 37. <5> discussion & outlook
  38. 38. main contributions
     • Identified 4 challenges to difficulty balancing in human computation games, crowdsourcing, and UGC
     • Introduced content sequencing through adapted player rating algorithms as a novel approach
     • Identified bipartiteness of the user-task graph as a potential issue
     • Found that bipartiteness does not affect the prediction accuracy of Elo, Glicko-2, or TrueSkill on chess matches or the human computation game Paradox
     • Found that unbalanced graphs improve prediction accuracy, presumably due to super vertices/players
     • Provided first support that our approach is viable
  39. 39. limitations & future work I
     • Approach requires previous/initial data
       ◦ Use super-users to provide initial data
       ◦ Use "calibration" tasks in tutorials
       ◦ Use mixed-method data to identify skill & difficulty indicators, data & machine learning to validate & extract additional indicators
     • Current algorithms only compute win/loss/draw
       ◦ Graded success measures could improve accuracy and learning speed (see the sketch after this list)
     • Study trained on large data sets (10,000, 37 edges)
       ◦ Testing learning speed of algorithms with current default retention in human computation games
     • Study tested only one human computation game
       ◦ Replication with multiple games
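As flagged in the list above, the current algorithms only score win/loss/draw; since Elo already accepts fractional scores, a graded success measure could be fed in directly. A minimal sketch, assuming a quality value in [0, 1] (the mapping from task performance to that value is not specified in the talk):

```python
# Sketch of feeding a graded success measure into the Elo update; the mapping
# from task performance to a quality value in [0, 1] is an assumption.
def graded_update(player_rating, task_rating, quality, k=32.0):
    score = min(max(quality, 0.0), 1.0)   # clamp to Elo's score range
    return elo_update(player_rating, task_rating, score, k)
```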
  40. 40. limitations & future work II
     • Study didn't test direct effect on retention
       ◦ Follow-up user study
     • Task pool might not contain tasks of best-fitting difficulty (similar to an empty bar in multiplayer games)
       ◦ Procedural content generation to generate training/filler tasks
     • Many human computation tasks don't vary much in difficulty
       ◦ Expand matching approach to other factors like curiosity/variety
  41. 41. sebastian@codingconduct.cc @dingstweets codingconduct.cc Thank you.
