
What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes.


Mobile app developers constantly monitor feedback in user reviews with the goal of improving their mobile apps and better
meeting user expectations. Thus, automated approaches have
been proposed in literature with the aim of reducing the effort
required for analyzing feedback contained in user reviews via
automatic classification/prioritization according to specific
topics. In this paper, we introduce SURF (Summarizer of
User Reviews Feedback), a novel approach to condense the
enormous amount of information that developers of popular
apps have to manage due to user feedback received on a
daily basis. SURF relies on a conceptual model for capturing
user needs useful for developers performing maintenance and
evolution tasks. Then it uses sophisticated summarisation
techniques for summarizing thousands of reviews and generating
an interactive, structured and condensed agenda of
recommended software changes. We performed an end-to-end
evaluation of SURF on user reviews of 17 mobile apps (5 of
them developed by Sony Mobile), involving 23 developers
and researchers in total. Results demonstrate high accuracy
of SURF in summarizing reviews and the usefulness of the
recommended changes. In evaluating our approach we found
that SURF helps developers in better understanding user
needs, substantially reducing the time required by developers
compared to manually analyzing user (change) requests and
planning future software changes.
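
A minimal sketch of how a review sentence labeled under such a conceptual model might be represented and grouped into an agenda. The names `LabeledSentence` and `build_agenda` are hypothetical and not taken from SURF, and the grouping logic is an assumption made for illustration. The intention categories and the example sentence with its labels come from the slide transcript that follows.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Intention(Enum):
    # Intention categories as listed in the slide transcript.
    FEATURE_REQUEST = "Feature Request"
    PROBLEM_DISCOVERY = "Problem Discovery"
    INFORMATION_SEEKING = "Information Seeking"
    INFORMATION_GIVING = "Information Giving"
    OTHER = "Other"


@dataclass(frozen=True)
class LabeledSentence:
    # Hypothetical container: one sentence, one intention, any number of topics.
    text: str
    intention: Intention
    topics: frozenset


def build_agenda(sentences):
    """Group sentences under every (topic, intention) pair they carry."""
    agenda = defaultdict(list)
    for sentence in sentences:
        for topic in sorted(sentence.topics):
            agenda[(topic, sentence.intention)].append(sentence.text)
    return dict(agenda)


# Example sentence and labels taken from slide 11.
example = LabeledSentence(
    text="I love this app but it crashes my whole iPad and it has to restart itself",
    intention=Intention.PROBLEM_DISCOVERY,
    topics=frozenset({"App", "Model"}),
)
agenda = build_agenda([example])
```

Because a sentence can carry several topics, the same sentence appears under each of its topic headings in the resulting agenda.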


  1. 1. What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes. Andrea Di Sorbo, Sebastiano Panichella, Carol V. Alexandru, Junji Shimagaki, Corrado A. Visaggio, Gerardo Canfora, Harald Gall. UNIVERSITÀ DEGLI STUDI DEL SANNIO
  2. 2. OUTLINE Context: Manual vs. Automated Analysis of User Reviews. Proposed Solution: Generating Summaries of User Reviews. Case Study: Assessment of the Summaries Involving 23 Developers. Conclusion and Future Work.
  3. 3. Manual vs. Automated Analysis of User Reviews
  4. 4. Maintenance of Mobile Applications “About one third of app reviews contain useful information for developers” (Pagano et al., RE 2013)
  5. 5. Manual Analysis of Reviews
  6. 6. PAST WORK Chen et al., ICSE 2014: Text Analysis to filter out non-informative reviews; Topic Analysis to recognize the topics treated in the reviews classified as informative
  7. 7. PAST WORK Panichella et al., ICSME 2015: FEATURE REQUEST, PROBLEM DISCOVERY, INFORMATION SEEKING, INFORMATION GIVING, OTHER. Sentiment Analysis + Natural Language Parsing + Text Analysis
  8. 8. The Problem: Feature Requests, Bug Reports
  9. 9. Generating Summaries of User Reviews: SURF (Summarizer of User Review Feedback)
  10. 10. USER REVIEWS MODEL
  11. 11. USER REVIEWS MODEL Sentence: "I love this app but it crashes my whole iPad and it has to restart itself" • User intention: Problem Discovery • Review topics: App, Model. “…The User Reviews Model proposed by the authors is impressive in how it analyzes a review sentence by sentence and is able to characterize a sentence with multiple labels…” (one of the FSE reviewers)
  12. 12. SUMMARIZER OF USER REVIEW FEEDBACK
  13. 13. 1. Data Collection
  14. 14. 2. Intention Classification (machine learning)
  15. 15. 3. Topics Classification
  16. 16. 3. Topics Classification
  17. 17. 3. Topics Classification. SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too. GUI-related dictionary: screen, trajectory, button, white, background, interface, usability, tap, switch, icon, orientation, position, picture, show, list, category, cover, scroll, touch, website, swipe, sensitive, view, roll, side, sort, click, small, colorful, glitch, page, corner, bookmark… P(SENTENCE, GUI) = 5/14 = 0.357
  18. 18. 4. Sentence Scoring. Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types. Intention class scores: Problem Discovery 3.0; Feature Request 3.0; Information Seeking 1.0; Information Giving 1.5; Other 0.5. SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too. IRS_SENTENCE = 3.0
  19. 19. 4. Sentence Scoring. Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types. Obs2) Developers need reasonably useful sentences discussing a specific aspect of an app with respect to other review sentences. SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too. P(SENTENCE, GUI) = 5/14 = 0.357
  20. 20. 4. Sentence Scoring. Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types. Obs2) Developers need reasonably useful sentences discussing a specific aspect of an app with respect to other review sentences. Obs3) Longer sentences are usually more informative than shorter ones. SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too. L_SENTENCE = 80
  21. 21. 4. Sentence Scoring. Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types. Obs2) Developers need reasonably useful sentences discussing a specific aspect of an app with respect to other review sentences. Obs3) Longer sentences are usually more informative than shorter ones. Obs4) Reviews treating frequently discussed features may attract more attention from developers than reviews dealing with features rarely used or discussed by users. SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too. MFWR(SENTENCE, GUI) = 2/14 = 0.143
  22. 22. 5. Summary Generation
  23. 23. Case Study Involving 23 Developers
  24. 24. Case Study Involving 23 Developers: 3439 Reviews
  25. 25. Case Study Involving 23 Developers: 3439 Reviews of 17 Apps
  26. 26. Research Questions. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?
  27. 27. Study Procedure
  28. 28. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS. Experiment II: JAPAN
  29. 29. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS. Experiment II: JAPAN
  30. 30. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS
  31. 31. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS. 1) Summaries for 15 Apps
  32. 32. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS. 1) Summaries for 15 Apps 2) Involving 16 Developers (6 were the original developers)
  33. 33. TWO Experiments. Experiment I: ITALY, SWITZERLAND, NETHERLANDS. 1) Summaries for 15 Apps 2) Involving 16 Developers (6 were the original developers) 3) We assigned an app to each participant.
  34. 34. TWO Experiments. Experiment II: JAPAN
  35. 35. TWO Experiments. Experiment II: JAPAN. 1) Summaries of 2 Apps
  36. 36. TWO Experiments. Experiment II: JAPAN. 1) Summaries of 2 Apps 2) Involving 7 employees from
  37. 37. TWO Experiments. Experiment II: Group 1 (3 subjects), Experiment II-A; Group 2 (4 subjects), Experiment II-B
  38. 38. TWO Experiments. Experiment II: Group 1 (3 subjects), Experiment II-A; Group 2 (4 subjects), Experiment II-B. In both groups, participants classified reviews according to URM.
  39. 39. TWO Experiments. Experiment II: Group 1 (3 subjects), Experiment II-A; Group 2 (4 subjects), Experiment II-B. In both groups, participants classified reviews according to URM and validated the summaries generated by SURF.
  40. 40. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?
  41. 41. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II
  42. 42. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II: 78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE.
  43. 43. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II: 78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE. 82% of participants declared that the most important topics modeled in URM are the App, GUI and Feature or Functionality categories.
  44. 44. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II: 78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE. 82% of participants declared that the most important topics modeled in URM are the App, GUI and Feature or Functionality categories. “I found the classification GUI-BUG, APP-BUG, etc. very useful...”
  45. 45. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II: 78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE. 82% of participants declared that the most important topics modeled in URM are the App, GUI and Feature or Functionality categories. “...in case I'm searching for BUGs, I can just look for the category, instead of reading everything over and over again...” “I found the classification GUI-BUG, APP-BUG, etc. very useful...”
  46. 46. RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers? Experiment I & Experiment II: 78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE. 82% of participants declared that the most important topics modeled in URM are the App, GUI and Feature or Functionality categories. “...in case I'm searching for BUGs, I can just look for the category, instead of reading everything over and over again...” “I found the classification GUI-BUG, APP-BUG, etc. very useful...” SUMMARY: Most participants consider URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.
  47. 47. RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?
  48. 48. RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?
  49. 49. RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs? The validation task performed by the survey participants highlights the very high classification accuracy of SURF, which is 91%.
  50. 50. RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs? The validation task performed by the survey participants highlights the very high classification accuracy of SURF, which is 91%. SURF works reasonably well in summarizing user feedback regarding change requests concerning GUI, APP, and FEATURE improvements, with the sole exception of the maintenance topic “COMPANY”.
  51. 51. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews?
  52. 52. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews? All developers perceive the time-saving capability of SURF to be at least 50%. 94% of participants believe that the time-saving capability of SURF is 75%.
  53. 53. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews? All developers perceive the time-saving capability of SURF to be at least 50%. 94% of participants believe that the time-saving capability of SURF is 75%.
  54. 54. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews? All developers perceive the time-saving capability of SURF to be at least 50%. 94% of participants believe that the time-saving capability of SURF is 75%. SURF helps save more than 50% of the time required by developers for analyzing user feedback and planning software changes.
  55. 55. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews? All developers perceive the time-saving capability of SURF to be at least 50%. 94% of participants believe that the time-saving capability of SURF is 75%. SURF helps save more than 50% of the time required by developers for analyzing user feedback and planning software changes. 66% of the feedback manually extracted by the participants also appears in the summaries automatically generated by SURF.
  56. 56. How do app review summaries generated by SURF impact the time required by developers to analyze user reviews? All developers perceive the time-saving capability of SURF to be at least 50%. 94% of participants believe that the time-saving capability of SURF is 75%. SURF helps save more than 50% of the time required by developers for analyzing user feedback and planning software changes. 66% of the feedback manually extracted by the participants also appears in the summaries automatically generated by SURF. SUMMARY: 1) SURF helps save more than half of the time required by developers for analyzing user feedback and planning software changes. 2) 66% of manually extracted feedback also appears in the automatically generated summaries.
  57. 57. Quality of SURF's Summaries
  58. 58. Quality of SURF's Summaries
  59. 59. Quality of SURF's Summaries
  60. 60. Conclusion 1) URM is a robust and suitable model for representing user needs in meaningful maintenance tasks for developers. 2) SURF helps save more than half of the time required for analyzing user feedback and planning software changes. 3) 66% of manually extracted feedback also appears in the automatically generated summaries. 4) Summaries generated by SURF are reasonably correct, adequate, concise, and expressive.
  61. 61. Thanks for the Attention! Questions? SURF (Summarizer of User Review Feedback)
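
The worked example on slides 17 to 21 can be reproduced with a short sketch. This is an illustration under stated assumptions, not SURF's implementation: the tokenizer, the naive plural-stripping `stem` helper, and the function name `sentence_features` are hypothetical. The slides do not expand the abbreviation MFWR, so the sketch assumes it is the count of the most frequent dictionary term in the sentence divided by the number of words. The slides also do not say how the four values are combined into a final score, so the sketch only computes them separately. The dictionary is the visible portion of the GUI-related word list from slide 17.

```python
from collections import Counter

# Visible part of the GUI-related dictionary from slide 17 (the slide elides the rest).
GUI_TERMS = {
    "screen", "trajectory", "button", "white", "background", "interface",
    "usability", "tap", "switch", "icon", "orientation", "position",
    "picture", "show", "list", "category", "cover", "scroll", "touch",
    "website", "swipe", "sensitive", "view", "roll", "side", "sort",
    "click", "small", "colorful", "glitch", "page", "corner", "bookmark",
}

# Intention class scores from slide 18.
INTENTION_SCORES = {
    "Problem Discovery": 3.0,
    "Feature Request": 3.0,
    "Information Seeking": 1.0,
    "Information Giving": 1.5,
    "Other": 0.5,
}


def tokenize(sentence):
    # Assumption: whitespace tokens, lowercased, surrounding punctuation removed.
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def stem(word):
    # Assumption: naive plural stripping stands in for a real stemmer.
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def sentence_features(sentence, intention, dictionary):
    words = [stem(w) for w in tokenize(sentence)]
    hits = [w for w in words if w in dictionary]
    n = len(words)
    top = Counter(hits).most_common(1)
    top_count = top[0][1] if top else 0
    return {
        "irs": INTENTION_SCORES[intention],               # Obs1
        "topic_probability": len(hits) / n if n else 0.0,  # Obs2
        "length": len(sentence),                           # Obs3, in characters
        "mfwr": top_count / n if n else 0.0,               # Obs4 (assumed definition)
    }


SENTENCE = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too")
features = sentence_features(SENTENCE, "Problem Discovery", GUI_TERMS)
```

Under these assumptions the sketch matches the slide values for this sentence: five of the fourteen words hit the dictionary (5/14), the sentence is 80 characters long, and the most frequent dictionary term, "icon", occurs twice (2/14).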
