Ijmet 10 02_050

http://www.iaeme.com/IJMET/index.asp 489 editor@iaeme.com
International Journal of Mechanical Engineering and Technology (IJMET)
Volume 10, Issue 02, February 2019, pp. 489–500, Article ID: IJMET_10_02_050
Available online at http://www.iaeme.com/ijmet/issues.asp?JType=IJMET&VType=10&IType=2
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication Scopus Indexed
MOVIE REVIEW ANALYSIS AND PREDICTION
USING MODIFIED
COLLABORATIVE FILTERING AND
CLUSTERING WITH REGRESSION
J. Sangeetha
Assistant Professor, Cauvery College for Women, Trichy, Tamilnadu, India
Dr. V. Sinthu Janita Prakash
Head, Department of Computer Science,
Cauvery College for Women, Trichy, Tamilnadu, India
ABSTRACT
The Internet has become the most popular browsing environment, which steadily increases the volume of service-oriented data. As the data grow, finding and retrieving the most similar items from a large volume of data becomes more difficult. Various research methods address this problem by clustering large volumes of data. The existing Clustering-based Collaborative Filtering approach (ClubCF) groups similar data together so that the time cost of retrieval is reduced considerably. However, existing methods cannot find similar reviews accurately, and an efficient and accurate recommendation system requires this. The proposed method addresses the problem with a novel technique, Modified Collaborative Filtering and Clustering with Regression (MoCFCR). First, the k-means algorithm clusters similar movie reviewers together, which simplifies the recommendation process. To handle large volumes of data, this work adopts the MapReduce framework, which divides the data into subsets that are assigned to separate nodes with individual key values. After clustering, the clustered outcomes are merged using an inverted index procedure in which the similarity between movies is calculated. Collaborative filtering is then applied to remove movies that are not relevant to the input. Finally, movies are recommended using logistic regression. The proposed method is evaluated in Hadoop, and the results show that it gives better outcomes than existing techniques.

Key words: Clustering, Search retrieval, Semantic similarity, Partitions, map reduce.
Cite this Article: J. Sangeetha and Dr. V. Sinthu Janita Prakash, Movie Review
Analysis and Prediction Using Modified Collaborative Filtering and Clustering with
Regression, International Journal of Mechanical Engineering and Technology 10(2),
2019, pp. 489–500.
http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=10&IType=2
1. INTRODUCTION
Recommender systems (or recommendation systems) are a subclass of information filtering systems that seek to predict the "rating" or "preference" that a user would give to an item [Ricci, F et al., 2011]. Recommender systems have become extremely common in recent years and are used in a variety of areas; popular applications include movies, music, news, books, research articles, search queries, social tags, and products in general. There are also recommender systems for experts, collaborators, jokes, restaurants, garments, financial services, and life insurance.
Recommender systems typically produce a list of recommendations in one of two ways: through collaborative filtering or through content-based filtering. A recommender system helps address the information overload problem by retrieving the information a user wants, based on the preferences and interests of that user or of similar users. Figure 1 shows the phases of a recommendation system [Isinkaye, F. O., et al 2015].
Figure 1 Phases of recommendation system
Collaborative Filtering (CF) is a technique commonly used to build personalized recommendations on the Web. Popular websites that use collaborative filtering include Amazon, Netflix, iTunes, IMDB, LastFM, Delicious and StumbleUpon. In collaborative filtering, algorithms make automatic predictions about a user's interests by compiling preferences from several users [Khorasani, E. S., et al 2016] [Yang, Z., et al 2016].
The main goal of the proposed research is a method that can effectively handle large datasets and find similar items, so that recommendations for end users can be made easily. This work is carried out on a movie dataset, with the goal of recommending movies to users based on their preferences. This is achieved by introducing a novel technique, Modified Collaborative Filtering and Clustering with Regression (MoCFCR).
[Figure 1 panels: Information collection phase, Learning phase, Recommendation phase, Feedback]

The rest of the paper is organized as follows. This section has introduced collaborative filtering systems. Section 2 discusses related research methods in detail, based on their working procedures. Section 3 presents the proposed research technique in detail with suitable examples and explanation. Section 4 evaluates the performance of the proposed technique. Finally, Section 5 concludes the paper based on the simulation outcomes.
2. RELATED WORKS
Big data mining has emerged as an innovative and promising research area for retrieving useful data from huge datasets. It is used in real-time applications such as social site data processing and biomedical applications, which must handle massive data sets that are usually huge, sparse, incomplete, uncertain, complex or dynamic, and that come from multiple autonomous sources.
[Sangeetha, J., & Prakash, V. S. J 2017] survey big data mining techniques, data slicing techniques and clustering techniques, and discuss the advantages and drawbacks of each. [Saraswathi, S., & Sheela, M. I. 2014] compared various clustering algorithms in data mining. Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. Clustering is one of the more complicated tasks in data mining, and it plays a vital role in a broad range of applications such as marketing, surveillance, fraud detection, image processing, document classification and scientific discovery. Many issues in cluster analysis, such as high dataset dimensionality, arbitrary cluster shapes, scalability, input parameters, complexity and noisy data, are still under research.
[Reed, J. W., et al 2004] described and analysed a multi-agent system for clustering large data sets, and compared it with hierarchical agglomerative clustering on a small set of text data. The results show that the agent-based approach can significantly reduce the time required to cluster large data sets. [Menéndez, H. D. 2013] extended Genetic Graph-based Clustering (GGC) with an algorithm that improves memory usage while maintaining the quality of the solution. The new algorithm, called Multi-Objective Genetic Graph-based Clustering (MOGGC), uses an evolutionary approach, introducing a Multi-Objective Genetic Algorithm to manage a reduced version of the Similarity Graph. The experimental validation shows that MOGGC increases memory efficiency while maintaining and improving upon the GGC results on the synthetic and real datasets used in the experiments.
[Skabar, A., & Abdalgader, K. 2013] presented a novel fuzzy clustering algorithm that operates on relational input data, i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data and operates in an Expectation-Maximization framework in which the centrality of an object in the graph is interpreted as a likelihood. [Saâdaoui, F., et al 2015] introduced a set of strategies for handling quantitative and qualitative data simultaneously. The principle of this approach is to project the qualitative variables onto the subspaces spanned by the quantitative ones. An optimal model is then allocated to the resulting PCA-regressed subspaces.
[Prabhu, S. B., & Sophia, S. 2011] gave a concise introduction to the clustering process in wireless sensor networks (WSNs) and surveyed the distributed (adaptive) clustering algorithms used in WSNs, based on metrics such as cluster count, cluster stability, cluster head mobility, cluster head role, clustering objective and cluster head selection. The study concludes with a comparison of a few distributed clustering algorithms in WSNs based on these metrics. [Yamashita, A. et al 2011] proposed an adaptive fusion of user-based and item-based collaborative filtering to improve recommendation accuracy. A unifying approach that uses a constant value as the weight parameter to combine both algorithms is limited: because the optimal weight differs by situation, the algorithm should estimate an appropriate weight dynamically and use it.
In [Pham, M. C., et al 2011], a clustering approach based on the social information of users is proposed to derive recommendations. The authors study this approach in two application scenarios: academic venue recommendation based on collaboration information, and trust-based recommendation. Using data from the DBLP digital library and Epinions, their evaluation shows that the clustering-based CF performs better than traditional CF algorithms. In [Kesemen, O., et al 2016], the fuzzy c-means algorithm was adapted for directional data (FCM4DD). The main benefit of FCM4DD is that it is effectively a distribution-free approach to clustering directional data.
[Wu, J., et al 2013] presented a neighborhood-based collaborative filtering approach to predict unknown QoS values for QoS-based service selection. Compared with existing methods, the approach has three new features: 1) an adjusted-cosine-based similarity calculation that removes the impact of different QoS scales; 2) a data smoothing process that improves prediction accuracy; and 3) a similarity fusion approach that handles the data sparsity problem.
3. MOVIE REVIEW ANALYSIS AND RECOMMENDATION SYSTEM
The main goal of the proposed research is a method that can effectively handle large datasets and find similar items, so that recommendations for end users can be made easily. This work is carried out on a movie dataset, with the goal of recommending movies to users based on their preferences. This is achieved by introducing a novel technique, Modified Collaborative Filtering and Clustering with Regression (MoCFCR). First, the k-means algorithm clusters similar movie reviewers together, which simplifies the recommendation process. To handle large volumes of data, this work adopts the MapReduce framework, which divides the data into subsets that are assigned to separate nodes with individual key values. After clustering, the clustered outcomes are merged using an inverted index procedure in which the similarity between movies is calculated. Collaborative filtering is then applied to remove movies that are not relevant to the input. Finally, movies are recommended using logistic regression. The overall flow of the proposed method is shown in figure 1.
Figure 1 presents the movie review analysis framework. First, a large movie review dataset is collected. To handle this volume of data, the dataset is divided into multiple partitions, which are assigned key values for further processing. The data are then clustered and filtered by similarity to make the recommendation process efficient. Finally, a regression model is applied to produce the recommendations.

[Figure: movie review dataset → Mapper 1, Mapper 2 to Mapper n (each: remove repeated items, k-means clustering) → Reducer 1 to Reducer n (filtering) → similarity finding (with user interest) → regression-based recommendation]
Figure 1 Overall view of the proposed system
3.1. Clustering Using K Means Algorithm
Data clustering is the partitioning of objects into groups (called clusters) such that the similarity between objects of the same group is maximized and the similarity between objects of different groups is minimized. The goal of clustering is to decompose or partition a data set into groups such that both intra-group similarity and inter-group dissimilarity are maximized. Every clustering algorithm is based on some kind of distance measure, which leads to the grouping of related objects. The distance measure determines how similar objects are, and each distance measure defines the degree of resemblance between two objects differently. The K-Means algorithm uses Euclidean distance to measure the distortion between a data object and its cluster centroid, and the Euclidean metric is often sufficient to group similar data instances successfully. K-means clustering is one of the most commonly used and effective methods to classify data because of its simplicity and its ability to handle voluminous data sets.
3.1.1. MapReduce Programming
The MapReduce process has two separate steps, Map and Reduce, each of which operates on sets of (key, value) pairs. Program execution is divided into a Map stage and a Reduce stage, separated by data transfer between the nodes in the cluster. The Mapper function takes data values as input, applies a function to each value in the given dataset, and generates an output set of (key, value) pairs. The framework then sorts the mapper outputs and passes them to the Reducers; this is the data transfer between the Mappers and the Reducers. The values for each key are combined at the node running the Reducer for that key, and the Reduce stage produces another set of (key, value) pairs as the final output. The Reduce stage can start only after all the data from the Map stage have been received. MapReduce requires its input as (key, value) pairs that can be divided, and it is therefore limited to tasks and algorithms that can be expressed with (key, value) pairs.
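To make the (key, value) flow concrete, the sketch below simulates the Map, shuffle/sort and Reduce stages in a single Python process. It is an illustration only, not Hadoop code: `map_reduce`, `rating_mapper` and `average_reducer` are hypothetical names, and the job (average rating per movie) is an assumed example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run a Map -> shuffle/sort -> Reduce job in a single process."""
    # Map stage: every input record yields zero or more (key, value) pairs.
    intermediate = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            intermediate[key].append(value)  # shuffle: group values by key
    # Reduce stage: runs only once all map output has been collected,
    # visiting keys in sorted order as the framework would.
    output = []
    for key in sorted(intermediate):
        output.extend(reducer(key, intermediate[key]))
    return output


# Hypothetical job: average rating per movie from (user, movie, rating) triples.
def rating_mapper(record):
    _user, movie, rating = record
    yield movie, rating


def average_reducer(movie, ratings):
    yield movie, sum(ratings) / len(ratings)


ratings = [(1, "m1", 4), (2, "m1", 2), (1, "m2", 5), (3, "m2", 3), (3, "m1", 3)]
print(map_reduce(ratings, rating_mapper, average_reducer))
# [('m1', 3.0), ('m2', 4.0)]
```

Because a reducer sees all values for one key at once, any computation that can be phrased as a per-key aggregation (counts, sums, means) fits this pattern.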
3.1.2. K-Means Clustering
Clustering is the process of grouping similar objects. Any clustering should exhibit two main properties: low inter-class similarity and high intra-class similarity. Clustering techniques are used to group a large number of things into clusters that share some similarity. Clustering is a way to discover hierarchy and order in large or hard-to-understand datasets, thereby revealing interesting patterns or making the data set easier to comprehend. Cluster analysis is used in many applications, such as image processing and data analysis. K-Means is an unsupervised learning method among the partition-based clustering methods. It classifies a given dataset of n data objects into k clusters, where k is the number of desired clusters. K-means tends to give good results only when the initial partition is close to the final solution. The K-means clustering algorithm follows the steps below.
i) Choose the number of desired clusters, k.
ii) Choose k starting points to be used as initial estimates of the cluster centroids (the initial starting values).
iii) Examine each point in the dataset and assign it to the cluster whose centroid is nearest to it.
iv) When each point has been assigned to a cluster, recalculate the k new centroids.
v) Repeat steps iii and iv until no point changes its cluster assignment, or until a maximum number of passes through the data set has been performed.
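Steps i) to v) can be sketched in plain Python as follows. This is a minimal illustration assuming numeric tuples, Euclidean distance, and initial centroids drawn at random from the data. The function name `kmeans` and its parameters are hypothetical; this is not the paper's implementation.

```python
import math
import random


def kmeans(points, k, max_passes=100, seed=0):
    """Plain k-means (Lloyd's algorithm) following steps i-v."""
    rng = random.Random(seed)
    # Step ii: pick k distinct data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(max_passes):  # step v: cap on passes through the data
        changed = False
        # Step iii: assign each point to the nearest centroid (Euclidean distance).
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            if nearest != assignment[i]:
                assignment[i] = nearest
                changed = True
        if not changed:  # step v: stop when no point changes cluster
            break
        # Step iv: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```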
3.1.3. K-Means Clustering Using MapReduce
The proposed method uses the k-means clustering algorithm in the Hadoop framework to cluster datasets of different dimensionality, and calculates the SSE value for the resulting clusters. The k-means algorithm is one of the most effective algorithms for clustering. The SSE value is used to assess clustering accuracy: a small SSE indicates that the clusters of the given dataset are compact. Implementing the clustering algorithm on the MapReduce framework also allows users to apply it to large datasets.
Sum of Squared Error (SSE): for the k-means clustering implemented in the MapReduce paradigm with Euclidean distance, the quality of the resulting clusters is measured by the SSE:
$$\mathrm{SSE} = \sum_{j=1}^{k} \sum_{(x_i, y_i) \in C_j} \left[ (x_i - x_c)^2 + (y_i - y_c)^2 \right]$$
where C_j is the set of points in cluster j, and
xi--> x coordinate of a point in the cluster.
xc--> x coordinate of the centroid of that cluster.
yi--> y coordinate of a point in the cluster.
yc--> y coordinate of the centroid of that cluster.
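As a worked check of the SSE formula, the hypothetical helper below sums the squared x and y offsets of each point from its cluster centroid. The four points and two centroids are made up for illustration.

```python
def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from 2-D points to their cluster centroid."""
    total = 0.0
    for (xi, yi), c in zip(points, labels):
        xc, yc = centroids[c]
        total += (xi - xc) ** 2 + (yi - yc) ** 2
    return total


points = [(1, 1), (3, 1), (10, 10), (10, 12)]
labels = [0, 0, 1, 1]
centroids = [(2, 1), (10, 11)]
print(sse(points, labels, centroids))  # 4.0 (each point is at distance 1 from its centroid)
```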

Pseudocode - Kmeans Cluster Algorithm
Let n be the number of clusters you want
Let S be the set of feature vectors (|S| is the size of the set)
Let A be the set of associated clusters for each feature vector
Let sim(x,y) be the similarity function
Let c[n] be the vectors for our clusters
Init:
  Let S' = S
  //choose n random vectors to start our clusters
  for i = 1 to n
    j = rand(|S'|)
    c[i] = S'[j]
    S' = S' - {c[i]}  //remove that vector from S' so we can't choose it again
  end
  //assign initial clusters
  for i = 1 to |S|
    A[i] = argmax(j = 1 to n) { sim(S[i], c[j]) }
  end
Run:
  Let change = true
  while change
    //recalculate cluster locations from the current assignments
    for i = 1 to n
      mean = zero vector, count = 0
      for j = 1 to |S|
        if A[j] == i
          mean = mean + S[j]
          count = count + 1
        end
      end
      if count > 0  //an empty cluster keeps its previous vector
        c[i] = mean/count
      end
    end
    change = false  //assume there is no change
    //reassign feature vectors to clusters
    for i = 1 to |S|
      a = argmax(j = 1 to n) { sim(S[i], c[j]) }
      if a != A[i]
        A[i] = a
        change = true  //a vector changed affiliations, so we need to
                       //recompute our cluster vectors and run again
      end
    end
  end
3.2. Collaborative Filtering
Item-based collaborative filtering algorithms have been widely used in many real-world applications, such as at Amazon.com. Item-based CF can be divided into three main steps: computing rating similarities, selecting neighbors, and recommending services.
Compute Rating Similarity: Computing the rating similarity between items is a time-consuming but critical step in item-based CF algorithms. Common rating similarity measures include the Pearson correlation coefficient (PCC) and the cosine similarity between rating vectors. The basic intuition behind PCC is to give a high similarity score to two items that tend to be rated the same way by many users. PCC, the preferred choice in most major systems, has been found to perform better than cosine vector similarity. Therefore, PCC is applied to compute the rating similarity between each pair of services in ClubCF.
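The PCC step can be sketched as below, assuming each item's ratings are stored as a dict from user id to rating. Means are taken over the co-rating users only, which is one common variant (another uses each item's overall mean). Returning 0 when fewer than two users rated both items is an assumption, not part of the original method.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation between two items over the users who rated both.

    ratings_a, ratings_b: dicts mapping user id -> rating.
    """
    common = ratings_a.keys() & ratings_b.keys()
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as "no similarity"
    mean_a = sum(ratings_a[u] for u in common) / len(common)
    mean_b = sum(ratings_b[u] for u in common) / len(common)
    num = sum((ratings_a[u] - mean_a) * (ratings_b[u] - mean_b) for u in common)
    den = (math.sqrt(sum((ratings_a[u] - mean_a) ** 2 for u in common))
           * math.sqrt(sum((ratings_b[u] - mean_b) ** 2 for u in common)))
    return num / den if den else 0.0


movie_a = {"u1": 5, "u2": 3, "u3": 4}
movie_b = {"u1": 4, "u2": 2, "u3": 3, "u4": 1}  # u4 did not rate movie_a
movie_c = {"u1": 1, "u2": 5, "u3": 3}
print(round(pearson(movie_a, movie_b), 6))  # 1.0
print(round(pearson(movie_a, movie_c), 6))  # -1.0
```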
Select Neighbors: Items whose rating similarity with the target service exceeds a threshold 𝛾 are selected as neighbors. The larger 𝛾 is, the fewer neighbors are chosen, but they may be more similar to the target service; thus the coverage of collaborative filtering decreases but the accuracy may increase. Conversely, the smaller 𝛾 is, the more neighbors are chosen, but some of them may be only slightly similar to the target service; thus the coverage increases but the accuracy may decrease. Therefore, 𝛾 should be set to balance accuracy and coverage.
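A sketch of neighbour selection with the threshold 𝛾, followed by a prediction step, is shown below. The similarity-weighted average used by `predict_rating` is a common choice assumed here for illustration. The text does not give ClubCF's exact prediction formula, and the similarity values are hypothetical.

```python
def select_neighbors(similarities, gamma):
    """Keep only the items whose similarity with the target exceeds gamma."""
    return {item: s for item, s in similarities.items() if s > gamma}


def predict_rating(user_ratings, neighbors):
    """Similarity-weighted average of the user's ratings on the neighbour items."""
    rated = [(s, user_ratings[item]) for item, s in neighbors.items()
             if item in user_ratings]
    weight = sum(abs(s) for s, _ in rated)
    if weight == 0:
        return None  # the user rated none of the neighbours
    return sum(s * r for s, r in rated) / weight


# Hypothetical similarities of movies m2..m4 with a target movie m1.
sims = {"m2": 0.9, "m3": 0.4, "m4": 0.8}
user = {"m2": 4, "m3": 2, "m4": 5}
for gamma in (0.3, 0.5, 0.85):
    nbrs = select_neighbors(sims, gamma)
    print(gamma, sorted(nbrs), predict_rating(user, nbrs))
```

Raising gamma from 0.3 to 0.85 shrinks the neighbour set from three movies to one, which is the accuracy/coverage trade-off described above.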
The collaborative filtering algorithm uses T-dimensional vectors, where T is the number of distinct terms found in the search logs. We assign a number N(t) from 1 to T to each distinct term t in the search logs. For each URL u in the search logs, we create a vector U whose value in position N(t) is the number of times a user searched for term t and then clicked on that URL; this value may be zero. We create another vector Q with the same dimension as U, which contains a 1 in position N(t) if term t is in the seed set and 0 otherwise.
We may multiplicatively weight each dimension of the vectors above by a factor log(n/UF), where n is the number of distinct URLs in the search logs and UF is the number of distinct URLs that users clicked on after searching for the term corresponding to that entry. This may be thought of as an Inverse URL Frequency, analogous to the Inverse Document Frequency weighting used in information retrieval algorithms.
We compute $T = \sum_{U} \mathbf{1}(U)\,\cos(U, Q)$, where $\mathbf{1}(U)$ is an indicator vector of the same length as U that contains 1 for every non-zero entry of U and 0 otherwise. The sum runs over the vectors U of every URL in the search logs. The indices of T are then ranked from the largest to the smallest entry, so the corresponding terms are ranked from most similar to least similar. The same collaborative filtering algorithm was applied to terms and URLs in the advertiser database; in this case the term vector consists of all the terms associated with a URL.
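The term-ranking procedure above (count vectors U, seed vector Q, IUF weighting, and T = Σ 1(U) cos(U, Q)) can be sketched in Python as follows. The input format (a list of (term, URL) click events) and the function name `rank_terms` are assumptions. Term indices are 0-based here rather than 1 to T, and the toy click log is invented.

```python
import math
from collections import defaultdict


def rank_terms(clicks, seed_terms):
    """Rank terms by T = sum over URLs of 1(U) * cos(U, Q), with IUF weighting.

    clicks: (term, url) pairs, one per "searched for term, then clicked url" event.
    seed_terms: the seed set that defines the query vector Q.
    """
    terms = sorted({t for t, _ in clicks})
    index = {t: i for i, t in enumerate(terms)}        # N(t), 0-based here
    counts = defaultdict(lambda: [0] * len(terms))     # url -> count vector U
    for term, url in clicks:
        counts[url][index[term]] += 1
    n = len(counts)                                    # distinct URLs
    # Inverse URL Frequency log(n / UF); UF = distinct URLs clicked after the term.
    uf = [sum(1 for vec in counts.values() if vec[i] > 0) for i in range(len(terms))]
    iuf = [math.log(n / f) for f in uf]
    q = [iuf[i] if t in seed_terms else 0.0 for i, t in enumerate(terms)]
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = [0.0] * len(terms)
    for vec in counts.values():
        u = [c * w for c, w in zip(vec, iuf)]
        norm = math.sqrt(sum(x * x for x in u)) * q_norm
        cos = sum(a * b for a, b in zip(u, q)) / norm if norm else 0.0
        for i, c in enumerate(vec):
            if c > 0:                                  # indicator vector 1(U)
                scores[i] += cos
    # Largest entries of T first: most similar to least similar.
    return sorted(terms, key=lambda t: scores[index[t]], reverse=True)


clicks = [("comedy", "u1"), ("funny", "u1"),
          ("comedy", "u2"), ("laugh", "u2"),
          ("horror", "u3"), ("scary", "u3")]
print(rank_terms(clicks, {"comedy"}))
```

With the seed set {"comedy"}, terms that co-occur with "comedy" on clicked URLs ("funny", "laugh") rank above unrelated terms ("horror", "scary").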
3.3. Logistic Regression Based Recommendation
A logistic regression model is built using lexical features and features from search logs. Logistic regression [Kleinbaum, D. G., & Klein, M. 2010] is a widely used generalized linear model for predicting probabilities. In the logistic regression model, the logit of the probability of relevance of a term is modeled as a linear function of the feature values, and the model is trained using maximum likelihood.
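To illustrate how the logit is modelled as a linear function of the features and fitted by maximum likelihood, the sketch below maximises the log-likelihood with plain batch gradient ascent. The learning rate, number of epochs and one-feature toy data are assumptions; production code would normally use a library solver instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Maximum-likelihood fit by batch gradient ascent; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            xb = [1.0] + list(xi)                          # prepend the bias feature
            logit = sum(wj * xj for wj, xj in zip(w, xb))  # logit is linear in the features
            p = sigmoid(logit)
            for j, xj in enumerate(xb):
                grad[j] += (yi - p) * xj                   # gradient of the log-likelihood
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Toy data: one feature, labels 1 = relevant, 0 = not relevant.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 1, 0, 1, 1]
w = fit_logistic(X, y)
print(predict_proba(w, [0]), predict_proba(w, [5]))
```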
To simplify the problem, the focus here is on predicting whether a user will watch a movie based on the other movies they have watched. Thus, for a given input pair (u, m) of user u and movie m, we want to predict whether the user will (1) or will not (0) watch the movie. Because a logistic regression model can predict the probability of the interaction in addition to a binary label, the predicted probabilities can be used to rank the movies for each user and recommend a fixed number of top-ranked movies.
Suppose we have N users, M movies, and K features per movie. For each user, we can define a feature vector with MK entries. When we predict the interaction for a pair (u, m) of user u and movie m, only the features with indices in [mK, (m+1)K) are "on" (have non-zero values); the other (M−1)K features are set to zero. This lets us pack M separate logistic regression models into a single logistic regression model. In our case, for each movie m, we use the interactions with the M−1 other movies as the features. Thus, K = M−1 and our vector has M(M−1) entries. This immediately raises a concern about memory usage: the number of features scales quadratically with the number of movies. In fact, this is one reason why you would not want to train a separate model for each movie in the first place.
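The block layout can be checked with a small example. The hypothetical helper below maps "user watched movie i" to its index inside movie m's block [mK, (m+1)K), with K = M − 1, skipping m itself. The 0-based movie ids are an assumption.

```python
def feature_index(m, i, M):
    """Index of 'user watched movie i' inside movie m's block [m*K, (m+1)*K), K = M - 1.

    Movie m itself is skipped, so the other M - 1 movies map onto offsets 0..M-2.
    """
    if i == m:
        raise ValueError("a movie is not a feature of itself")
    K = M - 1
    offset = i if i < m else i - 1
    return m * K + offset


def feature_vector(m, watched, M):
    """Dense vector with M*(M-1) entries; only movie m's block can be non-zero."""
    x = [0] * (M * (M - 1))
    for i in watched:
        if i != m:
            x[feature_index(m, i, M)] = 1
    return x


print(feature_vector(2, {0, 3}, 4))  # [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]
```

With M = 4 the vector already has 12 entries; with 10,000 movies it would have 10,000 × 9,999 = 99,990,000, which is the quadratic growth that motivates feature hashing.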
Feature hashing addresses this problem. The interactions are very sparse: only a small percentage of the M−1 features for each movie are likely to be on. Instead of a vector of M(M−1) entries for each user-movie pair (u, m), we can create a small vector of 2^B entries, where B is a new parameter of the model. For each movie i ≠ m that the user has watched, we encode the ids m and i as a string s (e.g., 23432_768). Hashing the string gives the index in the feature vector, idx = hash(s) % 2**B, and the count at that index is incremented. We implemented a proof of concept using scikit-learn, with the SGDClassifier using an L2 penalty and log loss. The features for each user-movie pair were hashed with the FeatureHasher extractor. The model was trained in an online fashion, with one batch formed for each user from that user's positive and negative examples. We released the implementation on GitHub under the Apache v2 License. The overall flow of the proposed research method is given in the following algorithm.
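The feature-hashing step described in this subsection can be sketched without dependencies as follows. The text uses scikit-learn's `FeatureHasher` and `SGDClassifier` (L2 penalty, log loss). This sketch instead uses `zlib.crc32` as a stable hash (Python's built-in `hash()` is randomised for strings across runs) and a hand-written SGD update. The values of B, the learning rate and regularisation strength, and the movie ids are all assumptions.

```python
import math
import zlib


def hashed_features(m, watched, B):
    """Feature-hash the (m, i) pairs for every other watched movie i into 2**B slots."""
    x = [0.0] * (2 ** B)
    for i in watched:
        if i != m:
            s = f"{m}_{i}"                               # e.g. "23432_768"
            idx = zlib.crc32(s.encode("utf-8")) % 2 ** B  # stable, unlike built-in hash()
            x[idx] += 1.0                                 # colliding pairs simply add up
    return x


def predict(w, x):
    z = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(w, x, y, lr=0.1, l2=1e-4):
    """One stochastic-gradient step on L2-regularised log loss."""
    p = predict(w, x)
    return [wj - lr * ((p - y) * xj + l2 * wj) for wj, xj in zip(w, x)]


B = 12
watched = {1, 2}                      # hypothetical viewing history of one user
pos = hashed_features(3, watched, B)  # user watched movie 3 -> label 1
neg = hashed_features(4, watched, B)  # user did not watch movie 4 -> label 0
w = [0.0] * (2 ** B)
for _ in range(50):                   # each pass is one small batch: both examples
    w = sgd_step(w, pos, 1)
    w = sgd_step(w, neg, 0)
print(predict(w, pos), predict(w, neg))
```

The vector length stays fixed at 2^B regardless of the number of movies; the price is that unrelated pairs occasionally share a slot.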
Algorithm: Overall flow of the proposed research method
1. Initial
(i) The input data set is split into sub-datasets, which are formed into <Key, Value> lists and passed to the map function.
(ii) Select k points randomly from the dataset as the initial cluster centroids.
2. Mapper
(i) Update the cluster centroids: calculate the distance between each point in the dataset and the k centroids.
(ii) Assign each point to the nearest cluster until all the data have been processed.
(iii) Output the <ai, zj> pair, where ai is the center of cluster zj.
3. Collaborative Filtering
(i) Compute R_sim(st, sj) using the Pearson correlation coefficient if st and sj belong to the same cluster, and compute the enhanced similarity R_sim'(st, sj) by weighting R_sim(st, sj).
(ii) Select the services whose enhanced rating similarity with st exceeds a rating similarity threshold 𝛾, and put them into a neighbour set.
(iii) Model the logit of the probability of relevance of a term as a linear function of the feature values.
(iv) Train the model using maximum likelihood.
(v) Compute the predicted rating of st for an active user. If the predicted rating exceeds a recommending threshold, st is recommended to the active user.
4. Reducer
(i) Read <ai, zj> from the Map stage, collect all the data records, and output the k clusters and their data points.
(ii) Calculate the average of each cluster, which is selected as the new cluster center.
(iii) Compare the new centroids with the original centroids of the same clusters. If the difference is smaller than the threshold, or the number of iterations has reached the maximum, the algorithm stops. Otherwise, the new cluster centroids replace the original centroids, and the algorithm returns to the Map stage and continues until convergence.
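One iteration of the Mapper and Reducer stages above, repeated until the centroids stop moving, can be simulated in a single process as follows. This is a sketch, not Hadoop code: the function names, the fixed initial centroids (instead of random ones) and the threshold value are assumptions.

```python
import math
from collections import defaultdict


def kmeans_mapper(point, centroids):
    """Map: emit <centroid index, point> for the nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))
    return nearest, point


def kmeans_reducer(points):
    """Reduce: the new centre is the mean of the points assigned to it."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def mapreduce_kmeans(points, centroids, threshold=1e-6, max_iter=50):
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        groups = defaultdict(list)               # shuffle: group map output by key
        for p in points:
            key, value = kmeans_mapper(p, centroids)
            groups[key].append(value)
        new = [kmeans_reducer(groups[c]) if groups[c] else centroids[c]
               for c in range(len(centroids))]
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < threshold:                    # centres stopped moving
            break
    return centroids


points = [(1, 1), (1, 3), (9, 9), (9, 11)]
print(mapreduce_kmeans(points, [(0, 0), (10, 10)]))  # [(1.0, 2.0), (9.0, 10.0)]
```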

4. EXPERIMENTAL RESULTS
In this section, the proposed work is evaluated experimentally, and its results are compared with existing clustering approaches. A movie review data set is used for the experimental analysis: the MovieLens data set collected by the GroupLens Research Project at the University of Minnesota. This data set consists of:
* 100,000 ratings (1-5) from 943 users on 1682 movies.
* Each user has rated at least 20 movies.
* Simple demographic info for the users (age, gender, occupation, zip)
The data were collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. The data have been cleaned up: users who had fewer than 20 ratings or did not have complete demographic information were removed from the data set. Detailed descriptions of the data files are given in the README distributed with the data set.
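The ratings can be loaded with a few lines of Python. The sketch assumes the tab-separated `u.data` layout of the MovieLens 100K release (user id, item id, rating, timestamp). The three sample lines are made up, and the path in the comment is hypothetical.

```python
import io
from collections import Counter


def load_ratings(lines):
    """Parse 'user<TAB>item<TAB>rating<TAB>timestamp' lines into integer tuples."""
    ratings = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user, item, rating, ts = line.split("\t")
        ratings.append((int(user), int(item), int(rating), int(ts)))
    return ratings


# Made-up lines in the u.data layout; for the real file use e.g.
# open("ml-100k/u.data") (hypothetical path).
sample = io.StringIO(
    "1\t10\t4\t874965758\n"
    "1\t20\t3\t874965800\n"
    "2\t10\t5\t875071561\n"
)
ratings = load_ratings(sample)
per_user = Counter(u for u, _, _, _ in ratings)
print(len(ratings), len(per_user))  # 3 2
# On the full data set, min(per_user.values()) should be at least 20.
```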
The proposed clustering process is executed on the above data set to assess its performance. The proposed methodology is implemented in Hadoop and evaluated in terms of various performance measures. The existing algorithms compared with the proposed methodology are k-means, k-medoids, density-based clustering and the Clustering-based Collaborative Filtering approach (ClubCF). The performance metrics used for the comparison are:
* Computation time
* Mean absolute error
These performance measures are evaluated for both the proposed and the existing methodologies to analyze the performance improvement. The comparison is discussed in depth in the following subsections.
4.1. Computation Time
Computation time (also called "running time") is the length of time required to perform a computational process. If a computation is represented as a sequence of rule applications, the computation time is proportional to the number of rule applications.
Figure 2. Computation time comparison

In figure 2, computation time is evaluated for the existing and proposed research techniques for varying numbers of clusters. The results show that the proposed technique performs recommendation with lower computation time than the existing techniques across the varying numbers of clusters.
4.2. Mean Absolute Error
In statistics, mean absolute error (MAE) is a measure of the difference between two continuous variables. For n predicted ratings p_i and the corresponding actual ratings r_i, the Mean Absolute Error is given by:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - r_i \rvert$$
MAE can be expressed as the sum of two components: Quantity Disagreement and Allocation Disagreement. Quantity Disagreement is the absolute value of the Mean Error.
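The MAE and its decomposition can be computed directly, as in the sketch below. The predicted and actual ratings are invented for illustration, and the allocation disagreement is obtained as MAE minus quantity disagreement.

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mae_components(predicted, actual):
    """Split MAE into quantity disagreement |mean error| and allocation disagreement."""
    total = mae(predicted, actual)
    quantity = abs(sum(p - a for p, a in zip(predicted, actual)) / len(actual))
    return quantity, total - quantity


predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 3, 4]
print(mae(predicted, actual), mae_components(predicted, actual))  # 0.625 (0.125, 0.5)
```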
Figure 3. Mean absolute error
In figure 3, the mean absolute error of the existing and the proposed research techniques is compared. The comparison shows that the proposed method gives better outcomes than the existing methods.
5. CONCLUSIONS
In this work, the Modified Collaborative Filtering and Clustering with Regression (MoCFCR) method is introduced for accurate movie review classification. First, the k-means algorithm clusters similar movie reviewers together, which simplifies the recommendation process. To handle large volumes of data, this work adopts the MapReduce framework, which divides the data into subsets that are assigned to separate nodes with individual key values. After clustering, the clustered outcomes are merged using an inverted index procedure in which the similarity between movies is calculated. Collaborative filtering is then applied to remove movies that are not relevant to the input. Finally, movies are recommended using logistic regression. The proposed method is evaluated in Hadoop, and the results show that it gives better outcomes than existing techniques. Future work can concentrate on data partitioning, since data with strongly dependent values can lead to inaccurate predictions. Taking the correlation between data items into account in further processing would make the method more effective.

REFERENCES
[1] Ricci, F., Rokach, L., & Shapira, B. (2011). Introduction to recommender systems
handbook. In Recommender systems handbook (pp. 1-35). Springer US.
[2] Isinkaye, F. O., Folajimi, Y. O., & Ojokoh, B. A. (2015). Recommendation systems:
Principles, methods and evaluation. Egyptian Informatics Journal, 16(3), 261-273.
[3] Khorasani, E. S., Zhenge, Z., & Champaign, J. (2016, December). A Markov chain
collaborative filtering model for course enrollment recommendations. In Big Data (Big
Data), 2016 IEEE International Conference on (pp. 3484-3490). IEEE.
[4] Yang, Z., Wu, B., Zheng, K., Wang, X., & Lei, L. (2016). A Survey of Collaborative
Filtering-Based Recommender Systems for Mobile Internet Applications. IEEE Access, 4,
3273-3287.
[5] Sangeetha, J., & Prakash, V. S. J. (2017). A Survey on Big Data Mining Techniques.
International Journal of Computer Science and Information Security, 15(1), 482.
[6] Saraswathi, S., & Sheela, M. I. (2014). A comparative study of various clustering
algorithms in data mining. International Journal of Computer Science and Mobile
Computing, 11(11), 422-428.
[7] Reed, J. W., Potok, T. E., & Patton, R. M. (2004, May). A multi-agent system for
distributed cluster analysis. In Proceedings of Third International Workshop on Software
Engineering for Large-Scale Multi-Agent Systems (SELMAS’04) W16L Workshop-26th
International Conference on Software Engineering (pp. 152-155).
[8] Menéndez, H. D., Barrero, D. F., & Camacho, D. (2013, June). A multi-objective genetic
graph-based clustering algorithm with memory optimization. In Evolutionary
Computation (CEC), 2013 IEEE Congress on (pp. 3174-3181). IEEE.
[9] Skabar, A., & Abdalgader, K. (2013). Clustering sentence-level text using a novel fuzzy
relational clustering algorithm. IEEE transactions on knowledge and data engineering,
25(1), 62-75.
[10] Saâdaoui, F., Bertrand, P. R., Boudet, G., Rouffiac, K., Dutheil, F., & Chamoux, A.
(2015). A dimensionally reduced clustering methodology for heterogeneous occupational
medicine data mining. IEEE transactions on nanobioscience, 14(7), 707-715.
[11] Prabhu, S. B., & Sophia, S. (2011). A survey of adaptive distributed clustering algorithms
for wireless sensor networks. International Journal of Computer Science and Engineering
Survey, 2(4), 165.
[12] Yamashita, A., Kawamura, H., & Suzuki, K. (2011). Adaptive fusion method for user-
based and item-based collaborative filtering. Advances in Complex Systems, 14(02), 133-
149.
[13] Pham, M. C., Cao, Y., Klamma, R., & Jarke, M. (2011). A clustering approach for
collaborative filtering recommendation using social network analysis. J. UCS, 17(4), 583-
604.
[14] Kesemen, O., Tezel, Ö., & Özkul, E. (2016). Fuzzy c-means clustering algorithm for
directional data (FCM4DD). Expert Systems with Applications, 58, 76-82.
[15] Wu, J., Chen, L., Feng, Y., Zheng, Z., Zhou, M. C., & Wu, Z. (2013). Predicting quality
of service for selection by neighborhood-based collaborative filtering. IEEE Transactions
on Systems, Man, and Cybernetics: Systems, 43(2), 428-439.
[16] Kleinbaum, D. G., & Klein, M. (2010). Analysis of matched data using logistic regression.
In Logistic regression (pp. 389-428). Springer New York.
