ROM: A Requirement Opinions Mining Method Preliminary
Try Based on Software Review Data
Ying Wang
School of Computer Science
Beijing Information and
Technology University
Beijing, China
86-18811076140
2407556211@qq.com
Liwei Zheng
School of Computer Science
Beijing Information and
Technology University
Beijing, China
86-13720020370
zlw@bistu.edu.cn
Ning Li
School of Computer Science
Beijing Information and
Technology University
Beijing, China
86-17610895361
1498625198@qq.com
ABSTRACT
Requirement opinion mining aims to extract, from various data
sources, user opinions that can support the elicitation of software
requirements. However, the data produced by social network systems,
software application platforms, app stores and similar sources is
massive, noisy and non-standard, which makes it difficult to mine
useful requirement opinions. As a result, little work has addressed
software requirements mining based on review data from development
social media or application markets. This paper attempts to provide
knowledge support for establishing user stories in requirements
engineering (RE) through opinion mining and clustering of
large-scale software review data. First, drawing on the requirements
engineering field, we define requirement opinions, functional
requirement opinions and non-functional requirement opinions.
Second, we use a deep learning model to classify the reviews into
functional requirement reviews and non-functional requirement
reviews. Based on the differences between functional and
non-functional data, we define three categories of software function
descriptions and use a sequence labeling method to identify
functional requirements. We then cluster the review data with
K-means over word vectors, and combine TF-IDF and syntactic analysis
to extract, respectively, the aspects and the aspect requirements or
specific requirements of each requirement opinion, thereby realizing
requirement opinion mining on software review data. Finally, we
present a case study based on user review data from the mobile
application service platform 360 Mobile Assistant.
CCS Concepts
• Software and its engineering → Software creation and
management →Designing software → Requirements analysis
Keywords
Review data; requirement opinion mining; clustering
1. INTRODUCTION
Every software project goes through a long process of requirements
acquisition and iterative updating. Traditionally, requirements
acquisition was completed before software development began.
Software products developed in this way often failed to meet the
needs of users, and the projects ended in failure [1]. Agile methods
emerged to overcome these shortcomings. They emphasize frequent
interaction between domain users and development teams, the ability
to respond quickly to changing requirements, and the delivery of
software products that meet user needs within a short period of
time [2]. In agile methods, the user story is the most important way
for domain users to express their needs. A user story gives only a
brief description of intent [3]; the developer and the domain user
must communicate further to arrive at a description of system
functions [4]. Therefore, to express user requirements more
accurately and to support better software development, this paper
aims to mine user requirements from software review data and to
provide requirements assistance to domain users and developers
during user story discussions in agile development.
Researchers at home and abroad have studied online review data
extensively. Most of this work targets news reviews, book reviews,
and reviews of products and services. Such reviews mainly express
positive or negative sentiment about certain kinds of things.
Because of the nature of the reviewed data, most studies extract
aspects and opinions from the reviews to help users discover
valuable information; they do not deliberately mine the needs of
users. In recent years, however, with the rapid development of
mobile apps, review data on mobile apps has grown rapidly and now
provides ample data for researchers. Chen et al. [5] used a pipeline
method to extract opinions: first define a problem set, then
classify the review data according to the problem set, and then
extract topics. Although the method is effective to some degree, the
definition of the problem set strongly affects the final result, and
the method requires a great deal of manual labor.
Drawing on the experience and lessons of previous research, this
paper presents a requirement opinion mining method based on software
review data (ROM). First, combining knowledge from the requirements
engineering field, we define requirement opinions, functional
requirement opinions and non-functional requirement opinions.
Second, a deep learning model classifies the software review data
into functional requirement reviews and non-functional requirement
reviews. Next, based on the differences between functional and
non-functional data, we define three categories of software function
descriptions and use a sequence labeling method to identify
functional requirement opinions. K-means over word vectors clusters
the classified data, and the clustering results are mined with
TF-IDF and syntactic analysis to obtain specific requirement
opinions. Finally, we present a case study on user review data from
360 Mobile Assistant, a large domestic mobile application service
platform.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee. Request permissions from
Permissions@acm.org.
ICMSS 2020, January 17–19, 2020, Wuhan, China
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7641-9/20/01…$15.00
DOI: https://doi.org/10.1145/3380625.3380665
Section 2 discusses related work. Section 3 gives the basic
definitions used by the method and presents the ROM framework.
Section 4 details the requirement opinion mining method. Section 5
analyzes the method through a case study. Section 6 concludes and
outlines future work.
2. RELATED WORK
With the rapid development of social media platforms over the past
decade, many opinion mining methods based on online review data have
emerged. Han et al. [6] comprehensively reviewed research on opinion
mining methods, covering separately the two dimensions of opinion
mining: aspects and opinions. For aspect extraction, Jin et al. [7]
used an HMM model with vocabulary. The method first establishes a
set of words consisting of different vocabularies and their
corresponding parts of speech, which is used to manually mark the
aspects and the opinion content of the review text; the marked text
is then fed to the HMM. Topic models such as pLSA (probabilistic
latent semantic analysis) and LDA (latent Dirichlet allocation) are
also used for aspect extraction. Hu et al. [8] were the first to
treat nouns and noun phrases as explicit aspects. Electronic product
reviews are tagged with parts of speech, an Apriori-based
association rule mining method finds frequently occurring nouns and
noun phrases as candidate aspects, and a pruning algorithm then
filters out incorrect phrases to form the aspect set. For opinion
mining, most statistical models treat aspect-based opinion mining as
a sequence labeling problem [9]. Some researchers build rules from
corpora: they preprocess the corpus, compute word frequency
statistics, derive features and rules from the statistics, and then
use the rules to extract opinion words. To reduce manual labeling,
more and more scholars use syntactic dependencies to extract
opinions.
Methods based on online review mining mainly target news, books and
products, in the hope that users can find valuable opinions in
review data more quickly and accurately and that merchants can
provide better services. In recent years, with the widespread use of
mobile apps, software review data has grown and some related
research has appeared. This research mainly adopts classification or
topic extraction. (1) Classification of user reviews. Panichella et
al. [10] combined text analysis, natural language processing and
sentiment analysis to extract features from reviews, and then used
these features to train machine learning classifiers and obtain the
best classification. Maalej et al. [11] tried a variety of
techniques to process and classify user reviews, and found
experimentally that multiple binary classifiers outperform a single
multiclass classifier. McIlroy et al. [12] focused on the
multi-label nature of reviews, observing that one review may raise
multiple issues. They proposed 14 types of issues, which they
consider independent of the specific application, compared machine
learning classifiers such as Naive Bayes, the J48 decision tree and
the support vector machine, and finally used the support vector
machine for classification. Pagano et al. [13] investigated the
specific content of user reviews on the Apple App Store in 2013 and
classified the content by theme. Whether the method suits the
Chinese application market needs further verification. (2)
Topic-based review analysis. Jiang et al. [14] proposed an
associated LDA model for the opinion mining problem domain and
applied it to users' online reviews. The work above considers only
classification or only topic extraction. Chen et al. [5] combined
the two: classification yields the types of issues raised by the
reviews, and topic mining yields the specific software features
mentioned in them. They proposed RASL, a review analysis method
based on the support vector machine and a topic model.
Although the above work has achieved considerable results, its topic
extraction is strongly affected by the predefined problem set,
manual labeling takes too much time, and users' needs are not mined
carefully. To reduce the amount of manual work, this paper uses a
deep learning model to classify reviews into non-functional and
functional requirements, uses K-means over word vectors to cluster
the key information of requirement opinions, and uses TF-IDF and
syntactic analysis to mine requirement opinions from app review
data. Functional and non-functional requirement opinions are
extracted separately to give the participants of the user story
clearer and more explicit requirements assistance.
3. DEFINITIONS AND THE ROM
FRAMEWORK
This paper combines knowledge from opinion mining and requirements
engineering to define requirement opinions. Kim et al. [15] proposed
defining an opinion from four related perspectives: the aspect, the
opinion holder, the expression and the sentiment. For a certain
subject, the opinion holder expresses a review carrying a certain
emotional attitude, and this is the content of the opinion. We
extend this definition to requirement opinions. For a certain aspect
of a piece of software, a software user expresses a review with an
emotional attitude, or a questioning review, that conveys specific
needs for the software. We call this a requirement opinion; it can
be formalized as the following four-tuple:
Definition 1:
Requirement opinions:=
< Software, aspects, software users, emotional attitudes +
aspects of requirements / specific requirements >
Take WeChat as an example: (1) ‘What is the reason why the position
is not used after the upgrade?’ (2) ‘What version of the particulate
loan is available? My WeChat does not have this function, is it not
qualified enough?’ Both reviews concern the WeChat software. Review
(1) points out that the position function works worse after the
version update than before, so the user rates the position function
of the new version low. Review (2) points out that the user cannot
find the particulate loan function in WeChat, which indicates that
the user has this requirement.
Software requirements are generally divided into two categories:
functional requirements and non-functional requirements [16]. We
define the two separately. Functional requirements have a uniform
definition: the functions of the system, or the behavior of the
system under certain conditions. Following this definition, we
define the functional requirement opinion: what the software is
specifically required to do can be described by a set of
requirements consisting of functions and behavior points, as given
in Definition 2:
Definition 2:
Functional requirements opinions:=
<Function/specific behavior description, requirement
opinion >
Non-functional requirements are oriented toward the overall
attributes of the software. They usually describe the extent to
which the software satisfies certain attributes, and they are
difficult to express in a unified way [16]. Jia et al. [17] divided
non-functional requirements into five categories: ‘performance’,
‘reliability’, ‘availability’, ‘security’ and ‘maintainability’. We
use these five categories to define non-functional requirement
opinions, and we classify review data exhibiting these
characteristics as non-functional data, as defined below:
Definition 3:
Non-functional requirement opinions:=
<Five major categories of non-functional requirements
description, requirement reviews>
The preliminary characteristic words of the five categories are
based on the non-functional requirements of Jia et al. [17] and
their descriptive vocabulary. A vocabulary specific to software
requirements will be developed in our next step. Some of the
characteristic words are shown in Table 1:
Table 1. Some of the characteristic words
Aspect | Characteristic words
performance | performance, high performance, time, response, reaction, delay time, bandwidth, capacity, space, wait, calculation, occupy...
reliability | reliability, reliable, stable, complete, consistent, compatible, effective, correctness, serious, malfunction, failure rate...
availability | availability, easy to learn, easy to use, understandable, operating, attraction, output, productive forces, benefit, experience, interface...
security | security, safety, secret, password, visit, control, access, jurisdiction, identity, verification, invade, firewall, reveal...
maintainability | maintainability, maintain, test, detection, analyze, cohesion, coupling, module, portable, reuse, quality, code...
Based on the above definitions, the framework of the requirement
opinion mining method (ROM) is shown in Figure 1. First, the review
data of various software is crawled from the 360 mobile application
platform; the crawled content is described in Section 5.1. The data
is then classified into functional data and non-functional data.
Aspect-level requirements for each piece of software are extracted
from these two types of data and used to assist user story
discussions for software improvement and software development.
[Figure 1 diagram. Components: review data, classifier model (deep learning/statistic/regulation), classified requirements, functional requirements, non-functional requirements, requirement opinions, software update/development, participants of user story. Arrow labels: train, input, storage, take out, assist.]
Figure 1. The framework of ROM.
4. ROM (REQUIREMENT OPINION
MINING METHOD)
4.1 Classification of Requirement Opinions
Based on Deep Learning Methods
Different classification techniques have been chosen for different
tasks in domestic and foreign research [18]. This paper uses a deep
learning method to classify the data, in two stages: data
preprocessing, and model training and prediction. The data
preprocessing stage is relatively simple; its main job is to label
the data used to train the classification model. Based on the
definitions of functional and non-functional requirements,
non-functional data is labeled 0, functional data is labeled 1, and
useless data is labeled 2. The labeled corpus is divided into
training, validation and test sets in a fixed proportion, and
training can then begin. We train with a deep learning method (the
model structure and parameter details are not discussed here). The
training steps are as follows:
Training steps:
1) Randomly select a training sample $X_i$ from the training set (in
batch mode, select multiple samples at the same time).
2) Forward-propagate $X_i$ under the current parameters $w$ to
obtain the loss value.
3) Following the chain rule, back-propagate to obtain the gradient
$\frac{\partial loss}{\partial w}$.
4) Update the parameters: $w \leftarrow w - \alpha \cdot \frac{\partial loss}{\partial w}$,
where $\alpha$ is the learning rate.
5) Repeat steps 1 to 4 until the loss meets the target or the
specified number of training rounds is reached, then terminate the
training.
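The five training steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not describe its network, so a one-feature logistic model stands in for it, and the names `train`, `predict` and `TOY_SAMPLES`, the learning rate, the stopping threshold and the toy data are all hypothetical.

```python
import math
import random

# Hypothetical toy data: (feature, label) pairs, not taken from the paper.
TOY_SAMPLES = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_loss(w, b, x, y):
    """Cross-entropy loss of one sample under parameters (w, b)."""
    p = min(max(sigmoid(w * x + b), 1e-12), 1.0 - 1e-12)  # clamp so log stays finite
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def train(samples, alpha=0.5, max_rounds=2000, target_loss=0.05, seed=0):
    """Stochastic gradient descent on a stand-in logistic model."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(max_rounds):               # step 5: bounded number of rounds
        x, y = rng.choice(samples)            # step 1: draw a random training sample
        p = sigmoid(w * x + b)                # step 2: forward pass
        grad_w, grad_b = (p - y) * x, p - y   # step 3: chain rule on the cross-entropy loss
        w -= alpha * grad_w                   # step 4: w <- w - alpha * dloss/dw
        b -= alpha * grad_b
        mean_loss = sum(sample_loss(w, b, sx, sy) for sx, sy in samples) / len(samples)
        if mean_loss <= target_loss:          # step 5: stop once the loss meets the target
            break
    return w, b


def predict(w, b, x):
    """Prediction is a forward pass only; no gradients or updates are needed."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```

Calling `train(TOY_SAMPLES)` returns the learned parameters, which `predict` then applies to unseen inputs. A real classifier for the three labels (0, 1, 2) would replace the logistic model with a multi-class network, but the loop structure stays the same.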
During training, the model can be saved as needed, and the saved
model is used for prediction. Prediction with a neural network
follows the same steps as training, except that no back-propagation
or parameter update is needed; we do not elaborate here. In this
paper, the data to be predicted is unlabeled review data. After
prediction, the data is divided into three categories according to
the predicted labels for further mining. The complete supervised
process for classifying requirement opinions is shown in Figure 2.
Figure 2. The Classification flow chart.
In the classification stage, we first manually label raw corpus data
and feed the labeled corpus to the model for training. The saved
model then predicts labels for the unlabeled raw corpus, which
classifies it into functional and non-functional data.
4.2 Extraction of Requirement Opinions
Based on Classified Data
As Definitions 2 and 3 show, functional data mainly consists of
reviews about the functions implemented by the system and its
behavior under certain conditions. Take the review “Why didn't
WeChat care about tips? It is recommended to set a special concern
sound, and the position after the update is not easy to use.” It
mainly addresses specific needs of the user. From such functional
data we obtain functional requirement opinions such as ‘set a
special concern sound’ and ‘the position after the update is not
easy to use’. Non-functional data mainly consists of reviews about
aspects and aspect requirements in terms of performance,
reliability, security and other characteristics. Take the review
“Garbage application, Not only takes up a lot of memory, but also
full of ads”. It expresses the user's emotionally charged needs
regarding the performance and usability of the app. We currently do
not distinguish between emotional attitudes, because in software
development and iterative updating both positive and negative
attitudes indicate user needs. Ideally, we obtain non-functional
requirement opinions such as ‘takes up a lot of memory’ and ‘full of
ads’. Because functional and non-functional data differ greatly, we
use a different method for each, as detailed below.
4.2.1 Method for Extracting Requirement Opinions
based on Functional Data
This paper uses BiLSTM + CRF as the sequence labeling model, defines
three types of functional data descriptions, and identifies and
extracts them. In previous work, we used a sequence labeling method
to define and justify the identification of specific software
functions [19]; we do not go into details here. This paper redefines
software feature categories along another dimension. Given the
characteristics of functional data in software reviews, software
function descriptions can be further divided into three categories:
function loss (FL), function improvement (FI) and function
complement (FC). FL means that the software lacks a function
required by users; FI means that a function already exists but needs
to be improved; FC means that a function already exists and provides
a good user experience. Examples of each category are shown in
Table 2:
Table 2. Categories of functional review data
Category | Examples
Function improvement (FI) | ‘Chat history is out of sync’...
Function complement (FC) | ‘Good, WeChat payment is very convenient’...
Function loss (FL) | ‘It is recommended to set a special care tone.’...
The model consists mainly of a word embedding layer, a Bi-LSTM layer
and an output layer. It identifies the function descriptions in
functional review data. Ideally, we obtain the three types of
functional requirement descriptions and extract the requirements
from each separately. Taking function loss (FL) data as an example,
the model framework is shown in Figure 3:
Figure 3. Model framework.
4.2.2 Method for Extracting Requirement Opinions
based on Non-functional Data
Because non-functional data covers many types of software, and in
order to reduce manual labeling, we use the traditional unsupervised
K-means algorithm combined with TF-IDF and syntactic analysis
(HanLP) to extract requirement opinions. We demonstrate the
algorithm on the classified data, here the non-functional
requirement review data (NFD). The detailed algorithm flow is as
follows.
Algorithm Description:
Input: the collection of texts to be clustered,
$D = \{NFD_1, NFD_2, \ldots, NFD_N\}$, and the number of clusters $K$.
Output: the clustering $\{S_1, S_2, \ldots, S_K\}$.
1. Randomly select K samples in D as the initial mean vectors $m_1, \ldots, m_K$
2. while (the convergence condition of the algorithm is not met):
3.   for i = 1, …, N
4.     for k = 1, …, K
5.       Calculate the distance d from the sample $NFD_i$ to $m_k$:
6.       $d(NFD_i, m_k) = \|NFD_i - m_k\|_2$
7.     Assign the sample $NFD_i$ to the cluster of the nearest mean
       vector, $\arg\min_k \{ d(NFD_i, m_k) \}$
8.   for k = 1, …, K
9.     Update each cluster mean vector:
       $m_k^{new} = \frac{1}{|S_k|} \sum_{NFD_i \in S_k} NFD_i$
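The algorithm description above maps directly onto a short Python sketch. It assumes each review has already been turned into a numeric vector; the function names, the `max_iter` safeguard, the rule of keeping the old mean for an empty cluster, and the toy points are illustrative assumptions, not details from the paper.

```python
import random

# Hypothetical toy vectors standing in for vectorized reviews NFD_1..NFD_N.
TOY_POINTS = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]


def squared_distance(a, b):
    # Squared Euclidean distance; it gives the same nearest mean as the distance itself.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    return [sum(column) / len(points) for column in zip(*points)]


def kmeans(data, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(data, k)]        # line 1: random initial means
    assignment = None
    for _ in range(max_iter):                             # line 2: loop until convergence
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in data                                 # lines 3-7: nearest mean vector
        ]
        if new_assignment == assignment:                  # no sample changed cluster
            break
        assignment = new_assignment
        for j in range(k):                                # lines 8-9: update the means
            members = [p for p, a in zip(data, assignment) if a == j]
            if members:                                   # assumption: keep old mean if empty
                means[j] = mean_vector(members)
    clusters = [[p for p, a in zip(data, assignment) if a == j] for j in range(k)]
    return clusters, means
```

`kmeans(TOY_POINTS, 2)` returns the two groups of nearby points together with their mean vectors. Convergence is tested here by checking that no assignment changed between two passes, which is one common choice for the unspecified convergence condition.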
Two key points are involved in the K-means clustering algorithm. The
first is the similarity measure. The algorithm above uses the
Euclidean distance by default; in a specific application it can be
replaced, for example by cosine similarity. The second is the text
representation. Two text representation methods are commonly used:
the one-hot representation and the word vector representation.
Although the one-hot representation is simple to compute, its
effectiveness falls well short of the word vector representation.
Therefore, this paper represents text with word vectors.
Once these two points are settled, clustering can proceed. First,
the classified text is preprocessed by word segmentation,
part-of-speech tagging and so on, and word vectors are trained with
word2vec. Next, each review text is converted to a vector
representation based on the trained word vectors. Then the
clustering algorithm is called to complete the text clustering,
which yields K clusters. Ideally, these K clusters contain the
aspects and the aspect requirements or specific requirements that we
want to mine.
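As an illustration of the two points above, the sketch below turns a segmented review into one vector and compares vectors by cosine similarity. The paper does not say how word vectors are combined into a text vector, so averaging is an assumption here, and the tiny embedding table is made up for the example rather than trained with word2vec.

```python
import math

# Hypothetical 2-dimensional word vectors; real ones would come from word2vec.
TOY_EMBEDDINGS = {
    "memory": [1.0, 0.0],
    "space": [0.8, 0.2],
    "ads": [0.0, 1.0],
}


def text_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding (assumed pooling)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has a word vector")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With this toy table, a review segmented into `["memory", "space"]` ends up closer to the vector for "memory" than to the vector for "ads". To cluster by cosine rather than Euclidean distance, the distance function of the clustering step would be swapped for `1 - cosine_similarity(a, b)`.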
After the clustering algorithm produces K clusters, we need to mine
the requirement reviews further to obtain fine-grained aspects
(keywords) and aspect requirements or specific requirements. Aspects
(keywords) can be extracted in many ways, for example with TF, LDA,
TextRank or TF-IDF. Here we use TF-IDF. In TF-IDF, TF stands for
term frequency and IDF for inverse document frequency. The term
frequency is the number of times a word appears in the current text.
High-frequency words are assumed to carry more information than
low-frequency words, so the higher the term frequency, the more
important the word. TF is calculated as follows:
$TF_i = N(t_i, d)$ (1)
where $t_i$ is a word, $d$ is a document, and $N(t_i, d)$ is the
number of times the word $t_i$ appears in the document $d$. The
document frequency (DF) is the number of documents in the corpus
that contain a word. The higher the DF of a word, the less useful
information it carries. IDF therefore reflects the importance of a
feature across the entire corpus. It is defined as follows:
$idf_i = \log \frac{N}{df_i}$ (2)
where $df_i$ is the DF of the word $t_i$, and $N$ is the total
number of documents in the corpus (not the count function
$N(t_i, d)$ of Equation (1)). After TF and IDF are calculated, the
two are multiplied to obtain the final TF-IDF value. Intuitively,
TF-IDF holds that the words that best distinguish a text are those
that appear often enough in the current text and rarely in the other
texts of the corpus.
Applying TF-IDF to the clusters requires a simple transformation:
treat all the text in one cluster as a single document, and treat
the set of all clusters as the corpus.
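A minimal sketch of this cluster-level TF-IDF follows, using the raw count of Equation (1) and the logarithmic IDF of Equation (2). Each cluster is assumed to be a list of already segmented reviews; the function names and the tie-breaking rule in `top_keywords` are illustrative choices.

```python
import math
from collections import Counter


def cluster_tf_idf(clusters):
    """Score every term of every cluster, treating each cluster as one document."""
    # TF: raw count of each term over all reviews of a cluster, as in Equation (1).
    documents = [Counter(tok for review in cluster for tok in review) for cluster in clusters]
    n_docs = len(documents)
    # DF: number of clusters containing the term (iterating a Counter yields each key once).
    df = Counter(term for doc in documents for term in doc)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in doc.items()}
        for doc in documents
    ]


def top_keywords(scores, n):
    """Return the n highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:n]]
```

Note that with this IDF a term occurring in every cluster scores exactly zero, so with few clusters many generic words drop out of the keyword list entirely.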
With TF-IDF, we can extract the keywords (aspects) of each cluster.
The nouns, noun phrases, gerunds and similar words in the cluster
serve as the aspects of the user's requirement opinions. We then
mine the review data in the cluster again, using the HanLP tool for
syntactic analysis. We establish rules based on subject-predicate
and verb-object relations, and treat adjectives, verbs, adverb
combinations, verb combinations and the like as the content of
aspect-based requirement opinions carrying emotional attitudes. The
aspect requirements or specific requirements of the requirement
opinion are then extracted according to these rules. In this way we
obtain the requirement opinions we need.
5. CASE STUDY
5.1 Data Crawling and Description
360 Mobile Assistant is an application platform with a large
domestic market share that provides services such as uninstalling,
installing, upgrading and rating mobile applications [5]. We
currently crawl only the review data of the top ten apps in each
category of 360 Mobile Assistant. The reviews of each app are
divided into three levels: good, medium and bad. This paper does not
distinguish between the three levels, because a review may contain
requirements whatever emotion it expresses. We crawled the reviews
of the top ten apps and compiled statistics for each category of
review data. Later work will crawl more data from 360 Mobile
Assistant as needed and combine data from other mobile application
markets into a more comprehensive data set. This accounts for the
fact that the requirement opinions for a given piece of software may
differ between application markets, and so enables more
comprehensive requirement opinion mining. The crawl statistics for
each category of software review data are shown in Table 3.
Table 3. Statistics of each category review data
Application category | Number of reviews
Theme & wallpaper | 16129
Health & care | 5291
Office & business | 15405
Map & travel | 15497
Av audio-visual | 11789
Picture & video | 11337
Education & study | 11124
News & read | 10710
Life & leisure | 9699
Communication social | 8050
Financial management | 14900
To simplify analysis and presentation, we select some review data
for WeChat and QQ, from the communication social category, as
examples. First, we preprocess the crawled data. Following the
definitions of non-functional and functional requirements in
Section 3, we label non-functional data 0, functional data 1 and
useless data 2. The labeled data is shown in Table 4.
Table 4. Data label display
Software name | Review data | Label
WeChat | It is recommended to set a special care tone | 1
WeChat | Take up a lot of memory | 0
WeChat | The most garbage application, full of advertising, memory thief, send a file and various restrictions, not easy to use. | 0
WeChat | Do you dare to let Ma Huateng Ma Yun go bankrupt? | 2
QQ | Garbage, often numbered | 0
QQ | The new version cannot directly collect text. | 1
QQ | I have been playing QQ for five or six years, it is really my youth. | 2
QQ | Take up too much space | 0
5.2 Requirement Opinion Classification
First, the labeled corpus is divided into training, validation and
test sets in the ratio 7:2:1. The resulting data set is used to
train the classification model, and the better-performing model is
saved. Next, we load the trained model, run it on the unlabeled
review corpus, and divide the reviews into three classes. We use
some of the review data mentioned above to show the ideal
classification result in Figure 4.
Figure 4. Classification model ideal rendering.
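The 7:2:1 division of the labeled corpus can be sketched as below. The shuffling, the fixed seed and the function name are assumptions made for the example; the paper states only the ratio.

```python
import random


def split_dataset(items, seed=0):
    """Shuffle a copy of the labeled items and cut it 7:2:1 into train/validation/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)      # assumption: split at random, reproducibly
    n = len(shuffled)
    n_train = n * 7 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]          # the remainder goes to the test set
    return train, validation, test
```

Each item would be a (review text, label) pair with the labels 0, 1 and 2 of Table 4. Because of integer division, any leftover items land in the test set when the corpus size is not a multiple of ten.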
5.3 Requirement Opinion Mining
After classifying the data, we process the functional and
non-functional data separately. For functional data, we divide the
function descriptions into three categories, FL, FI and FC, and
label them manually. A BiLSTM + CRF model is trained and then run on
new data to obtain the functional requirement descriptions of each
category. The expected result is shown in Table 5. For these three
categories of requirement descriptions, we directly extract the
opinions as functional requirements. A next step could be
fine-grained mining of the FC and FI data.
Table 5. The expected result of functional requirement opinions
Category | Description
FL (Function loss) | Input: ‘It is recommended to set a special care tone.’ Output: B-FL M-FL ... E-FL
FI (Function improvement) | Input: ‘Chat history is out of sync’ Output: B-FI M-FI ... E-FI
FC (Function complement) | Input: ‘Good, WeChat payment is very convenient’ Output: B-FC M-FC ... E-FC
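Once the sequence labeling model has tagged a review, the function description has to be read back out of the tag sequence. The sketch below assumes tags of the form B-x, M-x and E-x (begin, middle, end of a description of category x), as the outputs in Table 5 suggest, plus a hypothetical "O" tag for tokens outside any description; the function name and the token list are illustrative.

```python
def extract_spans(tokens, tags):
    """Collect (category, text) pairs for every complete B-x ... E-x run."""
    spans, current, category = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":                                   # a new description starts
            current, category = [token], label
        elif prefix in ("M", "E") and current and label == category:
            current.append(token)
            if prefix == "E":                               # description is complete
                spans.append((category, " ".join(current)))
                current, category = [], None
        else:                                               # "O" or an inconsistent tag
            current, category = [], None
    return spans
```

For example, the tokens `["please", "set", "a", "special", "care", "tone"]` tagged `["O", "B-FL", "M-FL", "M-FL", "M-FL", "E-FL"]` yield one FL description covering the last five tokens. Runs that never reach an E tag are dropped, which is a design choice of this sketch rather than something the paper specifies.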
For non-functional data, we analyze some examples of WeChat data.
First, the requirement opinions are clustered so that each cluster
groups reviews about a certain requirement. Second, TF-IDF extracts
the keywords (aspects) of each cluster. Finally, the aspect
requirements are extracted through syntactic analysis. The ideal
extraction result is shown in Figure 5.
Figure 5. The expected result of non-functional requirement
opinions.
The final expected results are shown in Figure 6. In the social and
communication category, the requirement opinion data for WeChat and
QQ is divided into non-functional requirement opinions and
functional requirement opinions. Classifying software requirement
opinions in this way makes it convenient for participants of the
user story to look up requirements information during software
development and iterative updating.
Figure 6. Partial mining category display.
For the non-functional and functional requirement comments on QQ
and WeChat, partial results of the requirement opinion mining are
shown in Table 6.
Table 6. Partial results of requirement opinion mining
Software | NFROs | FROs
WeChat | Take up a lot of memory; More advertising... | Set a special care tone; Real name certification requires a bank card...
QQ | More frequent ringing; Take up a lot of memory... | Cannot collect text; Chat history is out of sync...
Finally, although the user requirements of each software product
differ, software of the same kind shares common requirements. For
example, the issue of ‘occupied memory’ is mentioned in the user
reviews of both WeChat and QQ. While preserving what is unique to
each product, we compute statistics on the non-functional and
functional requirement opinions to find the users’ points of
attention in the non-functional and functional requirement data
under the social and communication category. The unique and common
requirements are then stored in some form, which makes it more
convenient for the participants of the user story to query the
requirement comments. In this way, the method completes its task of
assisting with requirement information. A simulation of the
specific application process is shown in Figure 7:
Figure 7. Requirement opinion application flow.
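The separation of common and unique requirements can be illustrated with the NFRO entries of Table 6. The sketch below is a simplification: it assumes that opinions have already been normalized to identical strings, so that exact set operations are enough, and the function name split_common_unique is our own.

```python
from typing import Dict, Set, Tuple


def split_common_unique(
    opinions: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return the opinions shared by every app and those specific to each app."""
    common = set.intersection(*opinions.values()) if opinions else set()
    unique = {app: items - common for app, items in opinions.items()}
    return common, unique


# Non-functional requirement opinions taken from Table 6.
nfros = {
    "WeChat": {"Take up a lot of memory", "More advertising"},
    "QQ": {"More frequent ringing", "Take up a lot of memory"},
}
common, unique = split_common_unique(nfros)
```

Real review data would first need a matching step (for example, grouping by cluster or aspect) before such set operations become meaningful.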
6. CONCLUSION AND FUTURE WORK
This paper proposes a requirement opinion mining method based on
software user review data. It aims to find the users' requirement
points for the software and thereby assist the participants of the
user story with requirement information for software improvement or
software development. First, we define requirement opinions,
functional requirement opinions, and non-functional requirement
opinions based on the definitions of opinion mining and
requirements engineering. We then introduce our ROM method. We
first need to obtain enough software reviews, then extract some
opinions about the requirements and label them. The labeled corpus
is used to train the deep learning text classification model, which
then predicts labels for the unlabeled corpus, thereby yielding
three parts of data. Next, targeting the different characteristics
of functional and non-functional data, this paper adopts the
BiLSTM + CRF method for the functional data and a clustering
algorithm based on word vectors, together with TF-IDF and syntactic
analysis, for the non-functional data, in order to mine the
fine-grained requirements and thus the users' requirement opinions.
This paper mainly presents the initial design of the requirement
opinion mining method. The next step will be to implement the
method and to adjust the model and method according to the specific
problems encountered.
7. REFERENCES
[1] Boehm, B. and Turner, R., 2005. Management challenges to
implementing agile processes in traditional development
organizations. IEEE software, 22(5), pp.30-39.
[2] Cao, L. and Ramesh, B., 2008. Agile requirements
engineering practices: An empirical study. IEEE
software, 25(1), pp.60-67.
[3] Cohn, M., 2004. User stories applied: For agile software
development. Addison-Wesley Professional.
[4] Wang, CH., Jin, Z., Zhao, HY., Liu, L., Zhang, W. and Cui,
MY., 2019. Human-assisted elicitation and evolution of user
stories with scenarios. Ruan Jian Xue Bao (Chinese Journal
of Software), 30(10), pp.3186-3205.
[5] Chen, Q., Zhang, L., Jiang, J. and Huang, XY., 2018. Review
Analysis Method Based on Support Vector Machine and
Latent Dirichlet Allocation. Ruan Jian Xue Bao (Chinese
Journal of Software), 30(5), pp.1547-1560.
[6] Han, ZM., Li, MQ., Liu, W., Zhang, MM., Duan, DG. and
Yu, CC., 2017. Survey of Studies on Aspect-Based Opinion
Mining of Internet. Ruan Jian Xue Bao (Chinese Journal of
Software), 29(02), pp.417-441.
[7] Jin, W., Ho, H.H. and Srihari, R.K., 2009, June. A novel
lexicalized HMM-based learning framework for web opinion
mining. In Proceedings of the 26th annual international
conference on machine learning, Citeseer, pp.465-472.
[8] Hu, M. and Liu, B., 2004, August. Mining and summarizing
customer reviews. In Proceedings of the tenth ACM
SIGKDD international conference on Knowledge discovery
and data mining, ACM, pp.168-177.
[9] Sterckx, L., Caragea, C., Demeester, T. and Develder, C.,
2016, November. Supervised keyphrase extraction as
positive unlabeled learning. In Proceedings of the 2016
Conference on Empirical Methods in Natural Language
Processing, pp.1924-1929.
[10] Panichella, S., Di Sorbo, A., Guzman, E., Visaggio, C.A.,
Canfora, G. and Gall, H.C., 2015, September. How can i
improve my app? classifying user reviews for software
maintenance and evolution. In 2015 IEEE international
conference on software maintenance and evolution,
IEEE, pp.281-290.
[11] Maalej, W. and Nabil, H., 2015, August. Bug report, feature
request, or simply praise? on automatically classifying app
reviews. In 2015 IEEE 23rd international requirements
engineering conference, IEEE, pp.116-125.
[12] McIlroy, S., Ali, N., Khalid, H. and Hassan, A.E., 2016.
Analyzing and automatically labelling the types of user
issues that are raised in mobile app reviews. Empirical
Software Engineering, 21(3), pp.1067-1106.
[13] Pagano, D. and Maalej, W., 2013, July. User feedback in the
appstore: An empirical study. In 2013 21st IEEE
international requirements engineering conference,
IEEE, pp.125-134.
[14] Jiang, W., Zhang, L., Dai, Y., Jiang, J. and Wang, G., 2013.
Analyzing helpfulness of online reviews for user
requirements elicitation. Jisuanji Xuebao(Chinese Journal of
Computers), 36(1), pp.119-131.
[15] Kim, S.M. and Hovy, E., 2004, August. Determining the
sentiment of opinions. In Proceedings of the 20th
international conference on Computational Linguistics,
Association for Computational Linguistics, p. 1367.
[16] Luo, XX., Li, ZH. and Zhao, YJ., 2015. Overviews on
software non-functional requirements at home and abroad.
Application Research of Computer, 32(4), pp.972-977.
[17] Jia, YD. and Liu, L., 2019. Recognition and Classification of
Non-functional Requirements in Chinese. Ruan Jian Xue
Bao (Chinese Journal of Software), 30(10), pp.3115-3126.
[18] Nikolai A. K. Steur and Carsten Mueller, 2019.
"Classification of Viral Hemorrhagic Fever Focusing Ebola
and Lassa Fever Using Neural Networks," International
Journal of Machine Learning and Computing vol. 9, pp. 334-
343.
[19] N. Li, L. Zheng, Y. Wang and B. Wang, 2019. "Feature-
Specific Named Entity Recognition in Software
Development Social Content," 2019 IEEE International
Conference on Smart Internet of Things (SmartIoT), pp. 175-
182.
2.pdf

  • 1. ROM: A Requirement Opinions Mining Method Preliminary Try Based on Software Review Data Ying Wang School of Computer Science Beijing Information and Technology University Beijing, China86-18811076140 2407556211@qq.com Liwei Zheng School of Computer Science Beijing Information and Technology University Beijing, China86-13720020370 zlw@bistu.edu.cn Ning Li School of Computer Science Beijing Information and Technology University Beijing, China86-17610895361 1498625198@qq.com ABSTRACT Requirement opinion mining aims to mine user opinions that can be used to help the mining of software requirements from various data sources. However, in the development of social network systems, software application platforms or stores and other data sources, the massive, noisy, non-standard data, makes the mining of effective requirement opinions more difficult. Therefore, there is less work in software requirements mining based on the data of software review in development social media or application market. This paper attempts to provide some knowledge support for requirement user story establishing in RE based on the opinion mining and clustering of massively software review data. First of all, this paper combines the requirements of the requirements engineering field to define the requirement opinions, functional requirement opinions and non-functional requirements opinions. Secondly, using the deep learning model to classify the functional requirement reviews and non-functional requirements reviews included in the reviews; Based on the differences between functional data and non-functional data, this paper defines three categories in the description of software functional data, and chooses to use sequence labeling methods to identify functional requirements. 
Then use the K-means clustering method based on word vector to cluster the review data, and combine TF-IDF and syntactic analysis to extract the aspect and aspect requirements or specific requirements of the requirement opinion respectively, so as to realize the requirement opinion mining of software review data. Finally, this article will give a case study based on the user review data of the mobile phone application service platform 360 mobile assistants. CCS Concepts • Software and its engineering → Software creation and management →Designing software → Requirements analysis Keywords Review data; requirement opinion mining; clustering 1. INTRODUCTION Each software development has a long process of requirement acquisition and an iterative update. In the past, the requirement acquisition was completed before the software development. The software products developed by this method often failed to meet the needs of the users, and the project end with failed [1]. In order to overcome the shortcomings of the traditional methods, the agile method came into being. This method focuses on the frequent interaction between domain users and development teams, the ability is to respond quickly to changes in the requirement, and the development of software products that meet user needs in a short period of time [2]. In this method, the user story is the most important way for domain users to express their needs. This method only provides a brief description of the intent [3], and further needs to be communicated between the developer and the domain user to form a system function description [4]. Therefore, In order to express user requirement more accurately and have a better software development, This paper aim to mine user requirements from software review data, and attempts to provide demand assistance to domain users and developers in the process of software development user story discussions in agile development. 
Researchers at home and abroad have done a lot of research on the online review data. Most of the research work reviews are mainly news reviews, book reviews, and reviews of products and services. These review data mainly express reviews and exchanges with positive or negative emotions on certain kinds of things. Due to restrictions on reviewed data, most studies extract aspects and opinions from the review data and assist users to discover valuable information in reviews, these jobs did not deliberately dig out the needs of users. However, in recent years, with the rapid development of mobile APP software, the reviews data on mobile APP has also emerged in an endless stream, providing sufficient data for many researchers. Chen et al. [5] mainly used a pipeline method to extract opinions. First, define the problem set, then classify the review data according to the problem set, and then extract the topic. Although the method has achieved certain effects, the definition of the problem set has a greater impact on the final result, which requires too much labor time. Under the experience and lessons of previous research work, this paper gives a requirement opinion mining method based on software review data (ROM). Firstly, combined with the knowledge of the requirements engineering field, this paper defines the requirement opinions, functional requirement opinions and non-functional requirements opinions. Secondly, based on the deep learning model, the software review data are classified into functional requirement reviews and non-functional requirement Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. 
To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. ICMSS 2020, January 17–19, 2020, Wuhan, China © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-7641-9/20/01…$15.00 DOI: https://doi.org/10.1145/3380625.3380665 26
  • 2. reviews; Next, based on the differences between functional data and non-functional data, this paper defines three categories on the description of software functional data, and chooses to use sequence labeling methods to identify functional requirements opinion. K-means method based on word vector clusters classified data, and the results of clustering are mined by TF-IDF and syntactic analysis to mine specific requirement opinions. Finally, We have a case study of the user review data of the platform 360 mobile assistants based on the current domestic large mobile application services. The second section of this paper discusses related work; The third section gives some basic definitions involved in this method, and gives the ROM framework; The fourth section details the method of requirement opinion mining; the fifth section analyzes the method of this paper. Section VI summarizes and looks forward to it. 2. RELATED WORK With the rapid development of social media platforms over the past decade, many opinions mining methods based on online review data have been spawned. Han et al. [6] comprehensively reviewed the related research on the method of opinion mining. He separately introduced the two dimensions about the aspects and opinion of opinion mining. In the aspect extraction study, Jin et al. [7] used an HMM model with vocabulary to extract aspects. The method first establishes a set of words consisting of different vocabularies and their corresponding part of speech, so as to manually mark the aspects of the review text and the content of the opinion, and then send them to the HMM. In addition, topic model methods such as pLSA (probabilistic latent semantic analysis) and LDA (latent Dirichlet allocation) are also used in aspect extraction tasks. Hu et al. [8] first thought that nouns and noun phrases were explicit aspects. 
The electronic product review data are tagged with parts of speech, and an association rule mining method based on the Apriori algorithm finds frequently occurring nouns and noun phrases as candidate aspects; wrong phrases are then filtered out by a pruning algorithm to form the aspect set. In opinion extraction, most statistical models treat aspect-based opinion mining as a sequence tagging problem [9]. Some researchers build rules from corpora: they preprocess the corpus, compute word frequency statistics, mine features and rules from the statistics, and then use the rules to extract opinion words. To reduce manual labeling, more and more scholars use syntactic dependencies to extract opinions. Mining methods based on online reviews mainly target news, books and products, in the hope that users can find valuable opinions in review data more quickly and accurately and that merchants can provide better services for users.

In recent years, with the widespread use of mobile apps, more and more software review data have become available, and some research on them has appeared. This research mainly adopts classification or topic extraction:

(1) Classifying user reviews. Panichella et al. [10] combined text analysis, natural language processing and sentiment analysis to extract features from reviews, and then used these features to train machine learning classifiers to obtain the best classification. Maalej et al. [11] tried a variety of techniques to process and classify user reviews, and found experimentally that multiple binary classifiers outperform a single multiclass classifier. McIlroy et al. [12] focused on the multi-label nature of reviews, observing that a single review may raise multiple issues. They proposed 14 issue types and considered them independent of the specific application.
Machine learning classifiers such as Naive Bayes, J48 decision tree and support vector machine were compared, and the support vector machine was finally used for classification. Pagano et al. [13] investigated the content of user reviews on the Apple App Store in 2013 and classified it by theme. Whether these methods suit the Chinese application market needs further verification.

(2) Analyzing reviews with topic models. Jiang et al. [14] proposed an associated LDA model for the opinion mining problem domain and applied it to users' online reviews.

The above work considers only classification or only topic extraction. Chen et al. [5] combine the two: classification yields the types of issues that the reviews point out, and topic mining yields the specific software features mentioned in the reviews. On this basis they proposed RASL, a review analysis method based on support vector machine and topic model. Although this work has achieved good results, its topic extraction is strongly affected by the predefined issue categories, manual labeling takes too much time, and users' needs are not mined in detail. To reduce manual work, this paper uses a deep learning model to separate non-functional from functional requirement reviews, clusters the key information of requirement opinions with K-means based on word vectors, and applies TF-IDF and syntactic analysis to mine requirement opinions from app review data. Functional and non-functional requirement opinions are extracted separately to give the participants of the user story clearer and more explicit requirement assistance.

3. DEFINITIONS AND THE ROM FRAMEWORK

This paper combines knowledge from opinion mining and requirements engineering to define requirement opinions.
For the definition of opinions, Kim et al. [15] proposed defining an opinion from four related perspectives: the aspect, the opinion holder, the expression and the sentiment. For a certain subject, the opinion holder expresses a review carrying a certain emotional attitude, which is the content of the opinion. We extend this definition to requirement opinions. For a certain aspect of a piece of software, a software user expresses reviews that carry an emotional attitude, or questions that imply specific needs. These are called requirement opinions and can be formalized as the following four-tuple:

Definition 1: Requirement opinions := <software, aspects, software users, emotional attitudes + aspect requirements / specific requirements>

For example, for WeChat: (1) ‘What is the reason why the position is not used after the upgrade?’ (2) ‘What version of the particulate loan is available? My WeChat does not have this function, is it not qualified enough?’ In these two reviews of WeChat, (1) points out that the position function is not as good after the version update as before, so the location feature of the new version is rated low; (2) points out that the user could not find the particulate loan function in WeChat, which indicates that the user has this requirement.
Software requirements are generally divided into two categories: functional requirements and non-functional requirements [16]. We define the two separately. Functional requirements have a uniform definition: the functions of the system, or the behavior of the system under certain conditions. Following this definition, what the software is specifically expected to do can be described by a set of requirements consisting of functions and behavior points, which gives Definition 2:

Definition 2: Functional requirement opinions := <function / specific behavior description, requirement opinion>

Non-functional requirements concern the overall attributes of the software. They usually describe the extent to which the software satisfies certain attributes and are difficult to express in a unified way [16]. Jia et al. [17] divided non-functional requirements into five categories: ‘performance’, ‘reliability’, ‘availability’, ‘security’ and ‘maintainability’. We adopt these five categories to define non-functional requirement opinions, and classify review data with these features as non-functional data:

Definition 3: Non-functional requirement opinions := <description of the five categories of non-functional requirements, requirement reviews>

The preliminary characteristic words of the five categories are based on the non-functional requirements of Jia et al. [17] and their description vocabulary. Vocabulary specific to software requirements will be developed in our next step. Some of the characteristic words are shown in Table 1.

Table 1. Part of the characteristic words
Aspect | Characteristic words
performance | performance, high performance, time, response, reaction, delay time, bandwidth, capacity, space, wait, calculation, occupy...
reliability | reliability, reliable, stable, complete, consistent, compatible, effective, correctness, serious, malfunction, failure rate...
availability | availability, easy to learn, easy to use, understandable, operating, attraction, output, productive forces, benefit, experience, interface...
security | security, safety, secret, password, visit, control, access, jurisdiction, identity, verification, invade, firewall, reveal...
maintainability | maintainability, maintain, test, detection, analyze, cohesion, coupling, module, portable, reuse, quality, code...

Based on the above definitions, the framework of the requirement opinion mining method (ROM) is shown in Figure 1. First, review data for various software are crawled from the 360 mobile application platform (for details of the crawled content, see the data description in Section 5). The reviews are then classified into functional data and non-functional data, and aspect-level requirements for the software are extracted from each of the two types of data. The results serve as user story information assistance for software improvement and software development.

[Figure 1 diagram. Components: review data; classifier model (deep learning/statistic/regulation); classified requirements (functional and non-functional); requirement opinions; participants of the user story; software update/development assistance.]
Figure 1. The framework of ROM.

4. ROM (REQUIREMENT OPINION MINING METHOD)

4.1 Classification of Requirement Opinions Based on Deep Learning Methods

The classification techniques selected for different tasks differ across domestic and foreign research [18]. This paper chooses a deep learning method to classify the data, in two stages: data preprocessing, and model training and prediction.

Data preprocessing is relatively simple; its main job is to label the data used to train the classification model. Based on the definitions of functional and non-functional requirements, non-functional data are labeled 0, functional data 1, and useless data 2. After the labeled corpus is divided into training, validation and test sets in a certain proportion, training can be performed. This paper trains with a deep learning method (the model structure and parameter details are not discussed here). The training steps are as follows:

Training steps:
1) Randomly select a training sample $X_i$ from the training set (in batch mode, select multiple samples at the same time).
2) Forward-propagate $X_i$ under the current parameters $w$ to obtain the loss value.
3) Use backpropagation, following the chain rule, to obtain the gradient $\frac{\partial loss}{\partial w}$.
4) Update the parameters: $w \leftarrow w - \alpha \cdot \frac{\partial loss}{\partial w}$, where $\alpha$ is the learning rate.
5) Repeat steps 1 to 4 until the loss meets the target or the specified number of training rounds is reached, then terminate training.
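The five training steps can be illustrated with a minimal sketch. Since the paper does not specify the model structure, the sketch below assumes a single linear softmax layer as a stand-in for the deep model; the function names `train` and `predict`, the learning rate, and the stopping threshold are all illustrative assumptions rather than the authors' settings.

```python
import math
import random


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(samples, labels, num_classes=3, lr=0.5, epochs=200, target_loss=0.05, seed=0):
    """Stochastic gradient descent on a linear softmax classifier (assumed stand-in model)."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # w[c][j] is the weight of feature j for class c (0 = non-functional, 1 = functional, 2 = useless).
    w = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        order = list(range(len(samples)))
        rng.shuffle(order)  # step 1: visit training samples in random order
        total_loss = 0.0
        for i in order:
            x, y = samples[i], labels[i]
            # step 2: forward pass and cross-entropy loss under the current parameters
            probs = softmax([sum(wc[j] * x[j] for j in range(dim)) for wc in w])
            total_loss += -math.log(probs[y])
            # step 3: gradient of the loss, d loss / d w[c][j] = (p_c - 1[c == y]) * x_j
            for c in range(num_classes):
                grad_score = probs[c] - (1.0 if c == y else 0.0)
                for j in range(dim):
                    # step 4: w <- w - alpha * d loss / d w
                    w[c][j] -= lr * grad_score * x[j]
        # step 5: stop once the mean loss meets the target (or when epochs run out)
        if total_loss / len(samples) < target_loss:
            break
    return w


def predict(w, x):
    """Prediction is the forward pass only: no gradient and no parameter update."""
    scores = [sum(wj * xj for wj, xj in zip(wc, x)) for wc in w]
    return max(range(len(w)), key=lambda c: scores[c])
```

In practice the feature vectors would come from the review text (for example, word embeddings) and the linear layer would be replaced by the chosen deep network; the loop structure stays the same.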
During training, the model can be saved as needed, and the saved model is used for prediction. Prediction with a neural network follows the same steps as training, except that no backpropagation or parameter update is needed, so it is not expanded on here. In this paper, the data to be predicted are unlabeled review data. After prediction, the data are divided into three categories according to the predicted labels for further mining. The complete supervised process for classifying requirement opinions is shown in Figure 2.

Figure 2. The classification flow chart.

In the classification stage, we first manually label part of the raw corpus and train the model on the labeled corpus. The saved model then predicts labels for the raw corpus, which separates functional from non-functional data.

4.2 Extraction of Requirement Opinions Based on Classified Data

As shown in Definitions 2 and 3, functional data mainly refers to reviews about the functions implemented by the system and its behavior under certain conditions, which mainly address users' specific needs. For example, from the review “Why didn't WeChat care about tips? It is recommended to set a special concern sound, and the position after the update is not easy to use.” we obtain functional requirement opinions such as ‘set a special concern sound’ and ‘the position after the update is not easy to use’. Non-functional data mainly refers to reviews about aspects and aspect requirements in terms of performance, reliability, security and other characteristics.
For example, the review “Garbage application, Not only takes up a lot of memory, but also full of ads” points out the user's emotional needs regarding the performance and usability of the app. We currently do not distinguish between emotional attitudes: we believe that in software development and iterative updates, both positive and negative attitudes indicate user needs. Ideally, we obtain non-functional requirement opinions such as ‘takes up a lot of memory’ and ‘full of ads’. Because non-functional and functional data differ greatly, we use a different method for each, as detailed below.

4.2.1 Method for Extracting Requirement Opinions Based on Functional Data

This paper uses BiLSTM + CRF as the sequence labeling model, defines three types of functional data descriptions, and identifies and extracts them. In previous work we used a sequence labeling method to define and justify the identification of specific software functions [19], so we do not go into details here. This paper redefines software feature categories along another dimension. Given the characteristics of functional data in software reviews, software function descriptions can be divided into three categories: function loss (FL), function improvement (FI) and function complement (FC). FL means that the software is missing a function that users require; FI means that a function already exists but needs to be improved; FC means that a function already exists and delivers a good user experience. Examples of each category are shown in Table 2.

Table 2. The categories of functional review data
Category | Examples
Function improvement (FI) | ‘Chat history is out of sync’...
Function complement (FC) | ‘Good, WeChat payment is very convenient’...
Function loss (FL) | ‘It is recommended to set a special care tone.’...
The model consists mainly of a word embedding layer, a Bi-LSTM layer and an output layer, and it identifies the descriptions in functional review data. Ideally, we obtain the three types of functional requirement descriptions and extract the requirements from each separately. Taking function loss (FL) data as an example, the model framework is shown in Figure 3.

Figure 3. Model framework.

4.2.2 Method for Extracting Requirement Opinions Based on Non-functional Data

Because non-functional data involve many types of software, and in order to reduce manual labeling, we use the traditional unsupervised algorithm K-means combined with TF-IDF and syntactic analysis (HanLP) to extract requirement opinions. We demonstrate the algorithm on the classified data, here the non-functional requirement review data (NFD). The detailed algorithm flow is as follows.

Algorithm description:
Input: the collection of texts to be clustered, $D = \{NFD_1, NFD_2, \dots, NFD_N\}$, and the number of clusters $K$.
Output: the clustering $\{S_1, S_2, \dots, S_K\}$.
1. Randomly select K samples in D as the initial mean vectors $m_1, \dots, m_K$
2. while the convergence condition is not met:
3.   for i = 1, …, N
4.     for k = 1, …, K
5.       calculate the distance from the sample $NFD_i$ to $m_k$:
6.       $d(NFD_i, m_k) = \|NFD_i - m_k\|_2$
7.     assign the sample $NFD_i$ to the cluster of the nearest mean vector, $\arg\min_k \{d(NFD_i, m_k)\}$
8.   for k = 1, …, K
9.     update each cluster mean vector: $m_k^{new} = \frac{1}{|S_k|} \sum_{NFD_i \in S_k} NFD_i$

Two key points are involved in the K-means clustering algorithm. The first is the similarity measure. The algorithm flow above uses the Euclidean distance as the default measure; in a specific application it can be replaced, for example by cosine similarity. The second is the representation of the text. Two text representations are commonly used: one-hot representation and word vector representation. Although one-hot representation is simple to compute, its effect falls well short of that of the word vector representation. This paper therefore represents the text with word vectors.

Once these two points are settled, the clustering can be performed. First, the classified text is preprocessed by word segmentation, part-of-speech tagging, etc., and word vectors are trained with word2vec. Next, each review text is converted to a vector representation based on the trained word vectors. Then the clustering algorithm is called to complete the text clustering, which yields K clusters. Ideally, these K clusters contain the aspects and the aspect requirements or specific requirements we want to mine.

After the clustering algorithm produces K clusters, the requirement reviews must be mined further to obtain fine-grained aspects (keywords) and aspect requirements or specific requirements. There are many ways to extract aspects (keywords) further, for example TF, LDA, TextRank and TF-IDF.
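The clustering procedure described above can be sketched in plain Python. Two points in this sketch are assumptions, not statements of the authors' implementation: a review is represented by the average of its word vectors (the paper only says the text is converted to a vector based on the trained word vectors), and `word_vectors` is a hypothetical lookup table standing in for trained word2vec output.

```python
import math
import random


def text_vector(tokens, word_vectors, dim):
    """Represent a tokenized review as the average of its known word vectors (assumed scheme)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[j] for v in known) / len(known) for j in range(dim)]


def euclidean(a, b):
    """d(NFD_i, m_k) = ||NFD_i - m_k||_2"""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Return (assignment, means); assignment[i] is the cluster index of vectors[i]."""
    rng = random.Random(seed)
    # line 1: pick K samples as the initial mean vectors
    means = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # lines 3-7: assign every sample to the cluster of its nearest mean vector
        new_assignment = [
            min(range(k), key=lambda c: euclidean(v, means[c])) for v in vectors
        ]
        # line 2: convergence condition used here is "no assignment changed"
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # lines 8-9: recompute each mean as the average of its cluster members
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old mean if a cluster becomes empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means
```

To use cosine similarity instead of the Euclidean distance, only the `euclidean` function needs to be swapped for a cosine distance.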
For keyword extraction we use TF-IDF. In TF-IDF, TF is the term frequency and IDF is the inverse document frequency. The term frequency is the number of times a word appears in the current text. A high-frequency word is assumed to carry more information than a low-frequency word, so the higher the term frequency, the more important the word. TF is calculated as follows:

$TF_i = N(t_i, d)$   (1)

where $t_i$ is a word, $d$ is a document, and $N(t_i, d)$ is the number of times $t_i$ occurs in $d$. The document frequency (DF) is the number of documents in the corpus that contain a word. The higher the DF of a word, the less effective information it carries. IDF therefore reflects the importance of a feature across the entire corpus. It is defined as follows:

$idf_i = \log \frac{N}{df_i}$   (2)

where $idf_i$ is the IDF of the word $t_i$, $df_i$ is its document frequency, and $N$ is the total number of documents in the corpus. TF and IDF are then multiplied to obtain the final TF-IDF value. Intuitively, TF-IDF holds that the words that best distinguish a text are those that occur often in the current text and rarely in the other texts of the corpus.

Applying TF-IDF to clusters requires a basic transformation: all the text in one cluster is treated as a single document, and the set of all clusters serves as the corpus. With TF-IDF we can then extract the keywords (aspects) of each cluster. The nouns, noun phrases, gerunds, etc. in the cluster are used as aspects of the user's requirement opinions. The review data in the cluster are then mined again, using the HanLP tool for syntactic analysis.
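The cluster-level TF-IDF step described above (each cluster is one document, all clusters form the corpus) can be sketched as follows, using formulas (1) and (2) directly. The function names and the toy input format (clusters as lists of already tokenized reviews) are illustrative assumptions; part-of-speech filtering of the keywords is omitted.

```python
import math
from collections import Counter


def cluster_tfidf(clusters):
    """clusters: list of clusters, each a list of tokenized reviews.

    Returns one {word: tf-idf score} dictionary per cluster.
    """
    # Treat each cluster as a single document by merging its reviews into one bag of words.
    docs = [Counter(tok for review in cluster for tok in review) for cluster in clusters]
    n_docs = len(docs)  # N in formula (2): the number of documents, here clusters
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # df_i: number of clusters that contain word t_i
    # tf-idf = N(t_i, d) * log(N / df_i), formulas (1) and (2)
    return [
        {word: tf * math.log(n_docs / df[word]) for word, tf in doc.items()}
        for doc in docs
    ]


def top_keywords(scores, n=3):
    """Return the n highest-scoring words of one cluster (ties broken alphabetically)."""
    return sorted(scores, key=lambda word: (-scores[word], word))[:n]
```

Note that with formula (2) a word that occurs in every cluster gets a score of zero, so generic words such as the app name drop out of the keyword list automatically.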
Rules are established on subject-predicate and verb-object relations. Adjectives, verbs, adverb combinations, verb combinations, etc. are taken as the content of aspect-based requirement opinions with emotional attitudes, so that the aspect requirements or specific requirements in a requirement opinion are extracted according to the rules. In this way we mine the requirement opinions we need.

5. CASE STUDY

5.1 Data Crawling and Description

360 Mobile Assistant is an application platform with a large domestic market share that provides services such as uninstalling, installing, upgrading and rating mobile applications [5]. We currently crawl review data only for the top ten apps in each category on 360 Mobile Assistant. Each app's reviews are divided into three levels: good, middle and bad. This paper does not distinguish the three levels, since a review may contain requirements whatever its sentiment. We compiled statistics on the crawled review data for each category. Later work will crawl further data from 360 Mobile Assistant as needed and combine data from other mobile application markets into a more comprehensive data set, so that requirement opinions that differ between application markets for the same software are not missed and requirement opinion mining becomes more comprehensive. The crawling statistics are shown in Table 3.

Table 3. Statistics of each category of review data
Application category | Number of reviews
Theme & wallpaper | 16129
Health & care | 5291
Office & business | 15405
Map & travel | 15497
Av audio-visual | 11789
Picture & video | 11337
Education & study | 11124
News & read | 10710
Life & leisure | 9699
Communication social | 8050
Financial management | 14900

To facilitate analysis and display, we select part of the review data for WeChat and QQ, in the communication and social category, as examples.
First, we preprocess the crawled data. According to the definitions of non-functional and functional requirements in Section 3, we label non-functional data as 0, functional data as 1, and useless data as 2. The labeled data are shown in Table 4.

Table 4. Data label display
Software name | Review data | Label
WeChat | It is recommended to set a special care tone | 1
WeChat | Take up a lot of memory | 0
WeChat | The most garbage application, full of advertising, memory thief, send a file and various restrictions, not easy to use. | 0
WeChat | Do you dare to let Ma Huateng Ma Yun go bankrupt? | 2
QQ | Garbage, often numbered | 0
QQ | The new version cannot directly collect text. | 1
QQ | I have been playing QQ for five or six years, it is really my youth. | 2
QQ | Take up too much space | 0

5.2 Requirement Opinion Classification

First, the labeled corpus is divided into training, validation and test sets in a 7:2:1 ratio. The split data set is used to train the classification model, and the better-performing model is saved. Next, the trained model is loaded and applied to the unlabeled review corpus, dividing the reviews into three parts. We use some of the review data mentioned above to show the ideal classification effect in Figure 4.

Figure 4. Ideal rendering of the classification model.

5.3 Requirement Opinion Mining

After classification, we process functional and non-functional data separately. For functional data, we divide the function descriptions into three categories (FL, FI and FC) and label them manually. A BiLSTM + CRF model is trained and then used to predict on new data, giving the functional requirement descriptions of each category. The expected result is shown in Table 5. For these three categories of requirement descriptions, we directly extract the opinions as functional requirements; a next step could be fine-grained mining of the FC and FI data.

Table 5. The expected result of functional requirement opinions
Category | Description
FL (Function loss) | Input: ‘It is recommended to set a special care tone.’ Output: B-FL M-FL ... E-FL
FI (Function improvement) | Input: ‘Chat history is out of sync’ Output: B-FI M-FI ... E-FI
FC (Function complement) | Input: ‘Good, WeChat payment is very convenient’ Output: B-FC M-FC ... E-FC

For non-functional data, we analyze some examples from WeChat. First, the requirement opinions are clustered so that each cluster groups reviews about a certain requirement. Second, the keywords (aspects) of each cluster are extracted by TF-IDF. Finally, the aspect requirements are extracted by syntactic analysis. The ideal extraction result is shown in Figure 5.

Figure 5. The expected result of non-functional requirement opinions.

The final expected results are described below. In the social and communication category, the requirement opinion data for WeChat and QQ are divided into non-functional and functional requirement opinions. Classifying software requirement reviews makes it convenient for participants in the user story to look up requirement information during software development and iterative updates, as shown in Figure 6.

Figure 6. Partial mining category display.
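The tag sequences of Table 5 have to be turned back into text before they can serve as functional requirement opinions. The sketch below shows one way to do this, assuming that B-, M- and E- mark the beginning, middle and end of a span whose category follows the hyphen, and that tokens outside any span carry the tag O. The paper does not spell out the tagging scheme, so both the scheme and the function name `decode_spans` are assumptions.

```python
def decode_spans(tokens, tags):
    """Collect (category, text) pairs from a B-/M-/E- tag sequence.

    Assumed scheme: a span starts at B-X, continues through M-X and closes at E-X;
    any other tag (for example O) or a category mismatch discards the open span.
    """
    spans = []
    current, category = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            # start a new span of category `label`
            current, category = [token], label
        elif prefix in ("M", "E") and current and label == category:
            current.append(token)
            if prefix == "E":
                # span is complete: store it and reset
                spans.append((category, " ".join(current)))
                current, category = [], None
        else:
            # outside any span, or an inconsistent tag: drop the partial span
            current, category = [], None
    return spans
```

For example, tagging the last five tokens of ‘It is recommended to set a special care tone’ as B-FL M-FL M-FL M-FL E-FL yields the single function-loss opinion ‘set a special care tone’.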
Based on the review data, the non-functional requirement opinions (NFROs) and functional requirement opinions (FROs) mined for QQ and WeChat are shown in Table 6.

Table 6. Partial results of requirement opinion mining
Software | NFROs | FROs
WeChat | Take up a lot of memory; More advertising... | Set a special care tone; Real name certification requires a bank card...
QQ | More frequent ringing; Take up a lot of memory... | Cannot collect text; Chat history is out of sync...

Finally, although the user requirements of each piece of software differ, software of the same kind shares common requirements. For example, the user reviews of both WeChat and QQ mention the issue of ‘occupied memory’. While preserving what is unique to each piece of software, we compute statistics over the non-functional and functional requirement opinions to find the users' points of attention in the non-functional and functional requirement data of the communication and social category. Uniqueness and commonality are then stored in some form, so that participants of the user story can conveniently query the requirement opinions. In this way the method completes the requirement information assistance. A simulation of the application process is shown in Figure 7.

Figure 7. Requirement opinion application flow.

6. CONCLUSION AND FUTURE WORK

This paper proposes a requirement opinion mining method based on software user review data. It aims to find users' requirement points for the software and to assist the participants of the user story in a requirements project with requirement information for software improvement or software development. First, we define requirement opinions, functional requirement opinions and non-functional requirement opinions based on the definitions used in opinion mining and requirements engineering. Then we introduce our ROM method.
We first collect enough software reviews, extract some opinions about requirements and label them, and use the labeled corpus to train a deep learning text classification model, which then predicts labels for the unlabeled corpus and yields three classes of data. Next, targeting the different characteristics of functional and non-functional data, this paper adopts the BiLSTM + CRF method for the former and a word-vector-based clustering algorithm with TF-IDF and syntactic analysis for the latter to mine fine-grained user requirement opinions. This paper mainly presents the initial design of the requirement opinion mining method. The next step is to implement the method and to adjust the model and method according to the specific problems encountered.

7. REFERENCES
[1] Boehm, B. and Turner, R., 2005. Management challenges to implementing agile processes in traditional development organizations. IEEE Software, 22(5), pp.30-39.
[2] Cao, L. and Ramesh, B., 2008. Agile requirements engineering practices: An empirical study. IEEE Software, 25(1), pp.60-67.
[3] Cohn, M., 2004. User stories applied: For agile software development. Addison-Wesley Professional.
[4] Wang, CH., Jin, Z., Zhao, HY., Liu, L., Zhang, W. and Cui, MY., 2019. Human-assisted elicitation and evolution of user stories with scenarios. Ruan Jian Xue Bao (Chinese Journal of Software), 30(10), pp.3186-3205.
[5] Chen, Q., Zhang, L., Jiang, J. and Huang, XY., 2018. Review Analysis Method Based on Support Vector Machine and Latent Dirichlet Allocation. Ruan Jian Xue Bao (Chinese Journal of Software), 30(5), pp.1547-1560.
[6] Han, ZM., Li, MQ., Liu, W., Zhang, MM., Duan, DG. and Yu, CC., 2017. Survey of Studies on Aspect-Based Opinion Mining of Internet. Ruan Jian Xue Bao (Chinese Journal of Software), 29(02), pp.417-441.
[7] Jin, W., Ho, H.H. and Srihari, R.K., 2009, June. A novel lexicalized HMM-based learning framework for web opinion mining. In Proceedings of the 26th annual international conference on machine learning, Citeseer, pp.465-472.
[8] Hu, M. and Liu, B., 2004, August. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp.168-177.
[9] Sterckx, L., Caragea, C., Demeester, T. and Develder, C., 2016, November. Supervised keyphrase extraction as positive unlabeled learning. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp.1924-1929.
[10] Panichella, S., Di Sorbo, A., Guzman, E., Visaggio, C.A., Canfora, G. and Gall, H.C., 2015, September. How can I improve my app? Classifying user reviews for software maintenance and evolution. In 2015 IEEE international conference on software maintenance and evolution, IEEE, pp.281-290.
[11] Maalej, W. and Nabil, H., 2015, August. Bug report, feature request, or simply praise? On automatically classifying app reviews. In 2015 IEEE 23rd international requirements engineering conference, IEEE, pp.116-125.
[12] McIlroy, S., Ali, N., Khalid, H. and Hassan, A.E., 2016. Analyzing and automatically labelling the types of user issues that are raised in mobile app reviews. Empirical Software Engineering, 21(3), pp.1067-1106.
[13] Pagano, D. and Maalej, W., 2013, July. User feedback in the appstore: An empirical study. In 2013 21st IEEE international requirements engineering conference, IEEE, pp.125-134.
[14] Jiang, W., Zhang, L., Dai, Y., Jiang, J. and Wang, G., 2013. Analyzing helpfulness of online reviews for user requirements elicitation. Jisuanji Xuebao (Chinese Journal of Computers), 36(1), pp.119-131.
[15] Kim, S.M. and Hovy, E., 2004, August. Determining the sentiment of opinions. In Proceedings of the 20th international conference on Computational Linguistics, Association for Computational Linguistics, p.1367.
[16] Luo, XX., Li, ZH. and Zhao, YJ., 2015. Overviews on software non-functional requirements at home and abroad. Application Research of Computer, 32(4), pp.972-977.
[17] Jia, YD. and Liu, L., 2019. Recognition and Classification of Non-functional Requirements in Chinese. Ruan Jian Xue Bao (Chinese Journal of Software), 30(10), pp.3115-3126.
[18] Steur, N.A.K. and Mueller, C., 2019. Classification of Viral Hemorrhagic Fever Focusing Ebola and Lassa Fever Using Neural Networks. International Journal of Machine Learning and Computing, vol. 9, pp.334-343.
[19] Li, N., Zheng, L., Wang, Y. and Wang, B., 2019. Feature-Specific Named Entity Recognition in Software Development Social Content. In 2019 IEEE International Conference on Smart Internet of Things (SmartIoT), pp.175-182.