
How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution

App stores, such as Google Play or Apple's App Store, allow users to provide feedback on apps by posting review comments and giving star ratings. These platforms constitute a useful electronic means through which app developers and users can productively exchange information about apps. Previous research showed that user feedback contains usage scenarios, bug reports, and feature requests that can help app developers accomplish software maintenance and evolution tasks. However, for the most popular apps, the large amount of feedback received, its unstructured nature, and its varying quality can make identifying useful user feedback very challenging.

In this paper we present a taxonomy for classifying app reviews into categories relevant to software maintenance and evolution, as well as an approach that merges three techniques: (1) Natural Language Processing, (2) Text Analysis, and (3) Sentiment Analysis, to automatically classify app reviews into the proposed categories. We show that the combined use of these techniques achieves better results (a precision of 75% and a recall of 74%) than each technique used individually (a precision of 70% and a recall of 67%).

  1. How Can I Improve My App? Classifying User Reviews for Software Maintenance and Evolution. Sebastiano Panichella, Andrea Di Sorbo, Emitza Guzman, Corrado Visaggio, Gerardo Canfora, Harald Gall
  2. Outline: (i) Context: App Store Reviews; (ii) Case Study: User Reviews of 7 Popular Apps; (iii) Results: Automatic Classification Performed Using Different Sources of Information
  3. App User Reviews
  4. A Communication Channel Between Developers and Users
  5. Reviews Include Useful Information for Developers [Pagano et al., RE 2013; Chen et al., ICSE 2014; Galvis Carreno et al., ICSE 2013]
  6. A Large Amount of Effort for Developers: many reviews every day; unstructured data
  7. Users Submit Many Reviews Regularly: iOS apps receive on average 23 reviews per day; Facebook for iOS receives more than 4,000 reviews per day [Pagano et al., RE 2013]
  8. The Quality of Reviews Varies [Pagano et al., RE 2013]
  9. Past Work: Chen et al., ICSE 2014. AR-Miner, an approach to help app developers discover the most informative user reviews: (i) text analysis and machine learning to filter out non-informative reviews; (ii) topic analysis to recognize the topics treated in the reviews classified as informative
  10. Identifying Useful Reviews (built up on slides 10-12): i. "The awful button in the page doesn't work" (BUG DESCRIPTION); ii. "A button in the page should be added"
  13. Available Sources for Identifying Useful Reviews: i. "The awful button in the page doesn't work"; ii. "A button in the page should be added". Sources: sentiment, lexicon, structure
  14. Employed Techniques: structure → Natural Language Parsing; sentiment → Sentiment Analysis; lexicon → Text Analysis
  15. Employed Techniques (continued): Natural Language Parsing, Sentiment Analysis, Text Analysis, plus Machine Learning
  16. Case Study. Goal: understand to what extent NLP, Text Analysis, and Sentiment Analysis can help in recognizing informative reviews from a software maintenance and evolution perspective. Quality focus: automatic detection of user reviews containing helpful information for developers. Perspective: guide developers in maintaining and evolving their apps.
  17. Research Questions. RQ1: Are language structure, content, and sentiment information able to identify user reviews that could help developers accomplish software maintenance and evolution tasks? RQ2: Does the combination of language structure, content, and sentiment information produce better results than the individual techniques used in isolation?
  18. Context: all seven apps were in the list of the most popular apps in 2013
  19. (figure-only slide)
  20. Automatic Classification of User Reviews (approach overview, built up incrementally on slides 20-25)
  26. Taxonomy Definition
  27. What Do Developers Look For?
  28. Conversations between Developers: 150 + 150 discussions (11-01-2014 to 01-01-2015)
  29. Sentences Occurring in Developers' Discussions
  30. Topics in App User Reviews [Pagano et al., RE 2013]
  31. Mapping (built up on slides 31-34): (i) taxonomy of discussions contained in informative reviews; (ii) taxonomy of discussions occurring in development communication; (iii) their intersection
  35. The Taxonomy
  36. Examples
  37. Natural Language Processing
  38. Recurrent Linguistic Patterns: "You should add a new button". Dependency parse: subject = You, auxiliary = should, verb = add, direct object = button, determiner = a, attribute = new. Resulting pattern: [someone] should add [something]
  39. NLP Heuristics (examples). Feature request: [someone] should add [something]; [something] needs to be [verb]; [something] is required. Problem discovery: [something] crashes; [something] is unable to [verb]; [something] doesn't work; please adjust/fix [something]. Information seeking: [someone] wants to know [something]; how to [verb]? Information giving: [something] provides/supports [something]; new changes include [something].
  40. NLP Parser: raw text → NLP parser → NLP heuristics → {feature request, problem discovery, information seeking, information giving, OTHERS}
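The deck shows the parse and the heuristic catalogue but no code. As a minimal sketch of how such patterns can be matched automatically, assuming spaCy's dependency parser as a stand-in for whatever parser the authors used, two of the slide-39 heuristics could be checked like this:

```python
# Minimal sketch: match two of the slide-39 heuristics with spaCy.
# spaCy, the rule details, and classify_sentence() are illustrative
# assumptions, not the authors' implementation.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model

def classify_sentence(sentence: str) -> str:
    doc = nlp(sentence)
    for token in doc:
        # "[someone] should add [something]": verb "add" with a "should"
        # auxiliary and a direct object -> feature request
        if token.pos_ == "VERB" and token.lemma_ == "add":
            has_should = any(c.dep_ == "aux" and c.lower_ == "should"
                             for c in token.children)
            has_object = any(c.dep_ == "dobj" for c in token.children)
            if has_should and has_object:
                return "feature request"
        # "[something] crashes" or "[something] doesn't work"
        # -> problem discovery
        if token.lemma_ == "crash" and token.pos_ == "VERB":
            return "problem discovery"
        if token.lemma_ == "work" and any(c.dep_ == "neg"
                                          for c in token.children):
            return "problem discovery"
    return "OTHERS"

print(classify_sentence("You should add a new button"))
# -> feature request
print(classify_sentence("The awful button in the page doesn't work"))
# -> problem discovery
```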
  41. Feature Extraction
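To make slide 41 concrete: a sketch of how the three information sources could be turned into one feature matrix, assuming scikit-learn for TF-IDF, NLTK's VADER for sentiment, and the classify_sentence() matcher sketched above (all stand-ins; the deck does not name the tools used):

```python
# Sketch: combine content (TF-IDF), sentiment (polarity score), and
# structure (which NLP heuristic fired) into one feature matrix.
# Tool choices are assumptions, not the authors' setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.sentiment import SentimentIntensityAnalyzer  # needs vader_lexicon
from scipy.sparse import hstack, csr_matrix

CATEGORIES = ["feature request", "problem discovery",
              "information seeking", "information giving", "OTHERS"]

def extract_features(reviews):
    # Content: TF-IDF over the raw review text
    content = TfidfVectorizer(stop_words="english").fit_transform(reviews)

    # Sentiment: one compound polarity score per review
    sia = SentimentIntensityAnalyzer()
    sentiment = csr_matrix(
        [[sia.polarity_scores(r)["compound"]] for r in reviews])

    # Structure: one-hot encoding of the matched heuristic category
    # (classify_sentence from the previous sketch)
    structure = csr_matrix(
        [[1.0 if classify_sentence(r) == c else 0.0 for c in CATEGORIES]
         for r in reviews])

    # Stack the three sources side by side
    return hstack([content, sentiment, structure])
```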
  42. Learning Phase and Evaluation
  43. Learning and Evaluation: classifying (built up on slides 43-44)
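A sketch of the learning and evaluation step, again with assumed tooling (scikit-learn, with a decision tree as the learner): train on manually labeled reviews and report the precision and recall figures the deck compares.

```python
# Sketch: learning phase and evaluation. X is the feature matrix from
# the extraction sketch; y holds the manually assigned category labels.
# scikit-learn and the decision tree learner are assumptions.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

def train_and_evaluate(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y)
    clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
    pred = clf.predict(X_test)
    return (precision_score(y_test, pred, average="weighted"),
            recall_score(y_test, pred, average="weighted"))
```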
  45. (figure-only slide)
  46. RQ1: Are language structure, content, and sentiment information able to identify useful user reviews for software maintenance and evolution tasks?
  47. Comparison between Structure, Content, and Sentiment Information (results shown incrementally on slides 47-50)
  51. RQ2: Does the combination of language structure, content, and sentiment information produce better results than when used individually?
  52. Results of the Combination of Structure, Content, and Sentiment Information (shown incrementally on slides 52-55)
  56. Is It Possible to Improve the Results?
  57. Varying the Size of the Training Set (slides 57-58)
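The effect of training-set size can be checked with a standard learning curve; a sketch under the same scikit-learn assumption:

```python
# Sketch: re-train on growing fractions of the labeled data and watch
# how the cross-validated score changes with training-set size.
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

def training_size_curve(X, y):
    sizes, _, test_scores = learning_curve(
        DecisionTreeClassifier(random_state=42), X, y,
        train_sizes=np.linspace(0.1, 1.0, 10),
        cv=5, scoring="f1_weighted")
    for n, score in zip(sizes, test_scores.mean(axis=1)):
        print(f"{n:4d} training examples -> mean F1 {score:.2f}")
```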
  59. (figure-only slides, 59-64)
  65. Future Work
  66. (figure-only slide)
