International Journal of Computational Engineering Research (IJCER) is an international, English-language, monthly online journal. It publishes original research that contributes significantly to scientific knowledge in engineering and technology.
IRJET - Recommendation System using Big Data Mining on Social NetworksIRJET Journal
The document describes a job recommendation system that uses machine learning algorithms and natural language processing on data collected from social networks. It uses naive Bayes, logistic regression, and random forest models to recommend jobs to users based on their profiles. The system tokenizes and lemmatizes words from user input before making recommendations. Evaluation of the models found naive Bayes had the highest training accuracy at 94.56% but lowest testing accuracy at 69.19%, while random forest had the highest testing accuracy at 70%.
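The summary above describes a pipeline of tokenizing user text and classifying it with naive Bayes. As an illustrative sketch only (the training examples, job categories, and smoothing choice below are hypothetical, not taken from the paper), a minimal multinomial naive Bayes job classifier over tokenized profiles might look like:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase and split on non-letters; in the described system,
    # lemmatization would normally follow this step.
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical labeled profiles: (profile text, recommended job category).
TRAIN = [
    ("python pandas machine learning models", "data_scientist"),
    ("statistics regression python analysis", "data_scientist"),
    ("javascript react css frontend design", "web_developer"),
    ("html css javascript web pages", "web_developer"),
]

def train_naive_bayes(examples):
    word_counts = defaultdict(Counter)   # class -> word frequencies
    class_counts = Counter()             # class -> number of examples
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def predict(model, text):
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train_naive_bayes(TRAIN)
```

The gap the paper reports between training accuracy (94.56%) and testing accuracy (69.19%) is typical of naive Bayes overfitting small vocabularies; smoothing, as above, is the usual mitigation.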
CHURN ANALYSIS AND PLAN RECOMMENDATION FOR TELECOM OPERATORSJournal For Research
This document summarizes research on customer churn prediction and retention strategies in the telecommunications industry. It provides an abstract for a research paper on using a hybrid machine learning classifier and rule engine to predict customer churn and recommend retention plans. The document then reviews related work applying techniques like fuzzy logic, ordinal regression, artificial neural networks, and ensemble methods to predict churn. It outlines the problem of predicting churn and proposing recommendation rules. The proposed solution is a hybrid machine learning model to predict churn with high accuracy and explain reasons for churn to help recommend plans to retain customers.
Multidirectional Product Support System for Decision Making In Textile Indust...IOSR Journals
This document proposes a multidirectional rank prediction algorithm (MDRP) for decision making in the textile industry using collaborative filtering methods. MDRP learns asymmetric similarities between users, items, ratings, and sellers simultaneously through matrix factorization to overcome data sparsity and scalability issues. The algorithm was tested on textile datasets and analyzed product and user preferences. Results showed MDRP provided more accurate recommendations than existing similarity learning and collaborative filtering methods. MDRP allows effective decision making for multiple entities with multiple attributes.
IRJET- Customer Relationship and Management SystemIRJET Journal
This document discusses a customer relationship management (CRM) system that uses data mining techniques like classification, clustering, and regression to analyze customer data and improve customer relationships. It specifically proposes using fuzzy c-means clustering, which assigns customers to clusters based on membership values rather than binary classifications. This allows a customer to belong to multiple clusters, addressing limitations of traditional k-means clustering. The paper outlines the fuzzy c-means clustering algorithm and concludes it has advantages over crisp clustering methods for segmenting customer data in a CRM system.
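The key property claimed above, that fuzzy c-means assigns membership degrees rather than hard labels, can be sketched in a few lines. This is a generic 1-D illustration with invented data, not the paper's implementation; the fuzzifier m=2 and deterministic initialization are assumptions:

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Minimal 1-D fuzzy c-means sketch: returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    each row sums to 1, so a customer can partially belong to several clusters.
    """
    lo, hi = min(points), max(points)
    # Deterministic init: spread initial centers across the data range.
    centers = [lo + (hi - lo) * j / (c - 1) for j in range(c)]
    u = [[1.0 / c] * c for _ in points]
    for _ in range(iters):
        # Update memberships from distances to current centers.
        for i, x in enumerate(points):
            dists = [abs(x - v) or 1e-12 for v in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2 / (m - 1)) for dk in dists)
        # Update centers as membership-weighted means.
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers[j] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, u

# Hypothetical 1-D customer feature (e.g. monthly spend) in two groups.
data = [1.0, 1.2, 0.8, 9.0, 9.4, 8.6]
centers, memberships = fuzzy_c_means(data)
```

Unlike k-means, the membership rows expose borderline customers: a point midway between centers gets roughly 0.5 in each cluster instead of being forced into one.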
Prediction of Default Customer in Banking Sector using Artificial Neural Networkrahulmonikasharma
The aim of this article is to present prediction and risk-accuracy analysis of default customers in the banking sector. A neural network is a learning model inspired by biological neurons; it is used for estimation and prediction tasks that can depend on a large number of inputs. A bank customer dataset from the UCI repository is analyzed to extract an informative subset from a large volume of data, which is then split into training and test sets for the neural network. During training, the network iterates over the data until the desired output is reached, and the trained model is then cross-checked against the test data. This paper focuses on predicting default customers using a deep neural network (DNN) algorithm.
Projection pursuit Random Forest using discriminant feature analysis model fo...IJECEIAES
A major and demanding issue in the telecommunications industry is the prediction of customer churn. Churn describes a customer who leaves the current provider for a competitor in search of better service offers. Telco companies typically maintain customer relationship management offices whose main objective is to win back defecting clients, because preserving long-term customers can be far more profitable than acquiring new ones. Researchers and practitioners have therefore paid great attention to building robust churn prediction models, especially for the telecommunication business, proposing numerous machine learning approaches. Many classification approaches have been established, but the most effective in recent times are tree-based methods. The main contribution of this research is to predict churners/non-churners in the telecom sector using a projection pursuit Random Forest (PPForest) that employs discriminant feature analysis as a novel extension of the conventional Random Forest for learning oblique projection pursuit trees (PPtree). The proposed methodology leverages two discriminant analysis methods to calculate the projection index used in constructing the PPtree: the first uses Support Vector Machines (SVM), while the second uses Linear Discriminant Analysis (LDA) to achieve linear splitting of variables during oblique PPtree construction, producing individual classifiers that are more robust and diverse than those of the classical Random Forest. The proposed methods achieve the best performance on measures such as accuracy, hit rate, ROC curve, lift, H-measure, and AUC. Moreover, PPForest based on LDA delivers effective evaluators in the prediction model.
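The LDA-guided oblique split mentioned above projects data onto the Fisher direction w = Sw⁻¹(m₁ − m₂) before thresholding, instead of splitting on a single feature. As a hedged 2-D sketch with invented churner/non-churner points (not the paper's PPForest code), the direction and split threshold can be computed directly:

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def lda_direction(class_a, class_b):
    """Fisher LDA direction w = Sw^-1 (mean_a - mean_b) for 2-D data.

    Projecting onto w yields the 1-D score used for an oblique split,
    analogous to the projection index that guides PPtree construction.
    """
    ma, mb = mean(class_a), mean(class_b)
    # Pooled within-class scatter matrix Sw (2x2), then its inverse.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for data, m in ((class_a, ma), (class_b, mb)):
        for v in data:
            d = [v[0] - m[0], v[1] - m[1]]
            s[0][0] += d[0] * d[0]; s[0][1] += d[0] * d[1]
            s[1][0] += d[1] * d[0]; s[1][1] += d[1] * d[1]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, v):
    return w[0] * v[0] + w[1] * v[1]

# Hypothetical churner / non-churner feature pairs (e.g. usage, tenure).
churners = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (1.1, 2.1)]
loyal = [(3.0, 4.0), (3.2, 3.8), (2.8, 4.2), (3.1, 4.1)]
w = lda_direction(churners, loyal)
# Split threshold at the midpoint of the projected class means.
threshold = (project(w, mean(churners)) + project(w, mean(loyal))) / 2
```

A forest of trees built on such oblique splits sees more diverse decision boundaries than axis-aligned splits, which is the diversity argument the abstract makes.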
A survey of modified support vector machine using particle of swarm optimizat...Editor Jacotech
This document summarizes a research paper that proposes a modified support vector machine (MSVM) classification algorithm using particle swarm optimization (PSO) for data classification in data streams. It discusses how new evolving features and concept drift in data streams can decrease the performance of traditional SVM classifiers. The proposed MSVM-PSO technique uses PSO to optimize feature selection and control the evaluation of new evolving features. PSO works in two phases - dynamic population selection and optimization of new evolved features. The methodology and implementation of MSVM-PSO is explained along with experimental results on three datasets showing it improves classification performance over traditional SVM.
IRJET- An Integrated Recommendation System using Graph Database and QGISIRJET Journal
This document presents a recommendation system that uses a graph database and QGIS to find the shortest path for a user to shop at the nearest mall. It analyzes product reviews stored in the Neo4j graph database to determine which products the user may be interested in. It then uses QGIS to calculate the shortest distance from the user's current location to the nearest mall with the recommended products. The system aims to minimize the time a user spends shopping by providing personalized recommendations and routing them to the closest appropriate location. It discusses how graph databases and hybrid recommendation approaches can be used to integrate different recommendation techniques for improved performance.
On the benefit of logic-based machine learning to learn pairwise comparisonsjournalBEEI
This document describes a study that used a logic-based machine learning approach called APARELL to learn user preferences from pairwise comparisons in a recommender system. APARELL learns description logic rules from examples of users' pairwise item preferences and background knowledge about item properties. It was implemented in a used car recommender system. A user study evaluated the pairwise preference elicitation method compared to a standard list interface. Results showed the pairwise interface was significantly better in recommending items that matched users' preferences.
Abstract: Learning analytics by nature relies on computational information-processing activities intended to extract from raw data interesting aspects that can yield insights into the behavior of learners, the design of learning experiences, and so on. A large variety of computational techniques can be employed, all with interesting properties, but it is the interpretation of their results that really forms the core of the analytics process. As a rising subject, data mining and business intelligence are playing an increasingly important role in decision-support activities in every walk of life. The Variance Rover System (VRS) focuses on large datasets obtained from online web visits, categorizing them into clusters according to similarity, predicting customer behavior, and selecting actions to influence that behavior to benefit the company, so as to make optimized and beneficial business-expansion decisions. Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
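The abstract's clustering step (its "Standard K-means" keyword) can be illustrated with a minimal Lloyd's-algorithm sketch. The session values and deterministic initialization below are invented for illustration; VRS would feed real web-visit feature vectors here:

```python
def kmeans(points, k=2, iters=20):
    """Minimal 1-D standard k-means sketch (Lloyd's algorithm)."""
    # Deterministic init: spread initial centers across the data range.
    lo, hi = min(points), max(points)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical per-session activity scores from web-visit logs.
sessions = [2.0, 2.5, 1.5, 10.0, 10.5, 9.5]
centers, clusters = kmeans(sessions)
```

An "optimized" variant would typically change only the initialization or assignment strategy; the assignment/update loop stays the same.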
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...IRJET Journal
This document presents a tool for preprocessing and visualizing data using machine learning models. It aims to simplify the preprocessing steps for users by performing tasks like data cleaning, transformation, and reduction. The tool takes in a raw dataset, cleans it by removing missing values, outliers, etc. It then allows users to apply machine learning algorithms like linear regression, KNN, random forest for analysis. The processed and predicted data can be visualized. The tool is intended to save time by automating preprocessing and providing visual outputs for analysis using machine learning models on large datasets.
Estimating project development effort using clustered regression approachcsandit
Due to the intangible nature of software, accurate and reliable software effort estimation is a challenge in the software industry. It is unrealistic to expect very accurate estimates of software development effort because of the inherent uncertainty in software development projects and the complex, dynamic interaction of factors that affect development. Heterogeneity exists in software engineering datasets because the data comes from diverse sources; it can be reduced by defining relationships between data values and classifying them into different clusters. This study focuses on how combining clustering and regression techniques can mitigate the loss of predictive efficiency caused by data heterogeneity. A clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression outperforms the other regression techniques used in the analysis.
This document provides an overview of existing multi-attribute decision making (MADM) algorithms for vertical handoff decision making in heterogeneous wireless networks. It describes several popular MADM methods used as the basis for vertical handoff decision algorithms, including simple additive weighting, multiplicative exponential weighting, technique for order preference by similarity to ideal solution, grey relational analysis, and analytic hierarchy process. For each method, it provides an example of a vertical handoff decision algorithm from literature that uses that particular MADM approach. The document concludes that vertical handoff decision algorithms based on MADM methods can help provide seamless connectivity in next generation wireless networks.
This paper aims to build predictive models using the CRISP-DM framework to classify bank customers into predefined classes using a Portuguese marketing campaign dataset. It analyzes bank customer data containing over 45,000 instances and 16 features to classify customers as likely or not likely to subscribe to bank deposits. It uses multilayer perceptron and logistic regression algorithms for modeling. The results show that the multilayer perceptron model with a 70% training split provides the best average performance, accurately classifying customers 70% of the time.
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar as possible to each other and as different as possible from instances in other clusters. This paper focuses on partitional k-means clustering which, thanks to its ease of implementation and high-speed performance on large datasets, remains very popular among clustering algorithms developed over the last thirty years. To address the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named ECPSO. The new algorithm is able to escape local optima and, with high probability, produces the problem's optimal answer. The results show that the proposed algorithm outperforms other clustering algorithms, especially on two indices: clustering precision and clustering quality.
Single view vs. multiple views scatterplotsIJECEIAES
Among all available visualization tools, the scatterplot has been deeply analyzed over the years, and many researchers have investigated how to improve it to face new challenges. The scatterplot is considered one of the most functional visual representations of data, due to its relative simplicity compared to other multivariable visualization techniques. Even so, one of the most significant unsolved challenges in data visualization is effectively displaying datasets with many attributes or dimensions, such as multidimensional or multivariate ones. The focus of this research is to compare the single-view and multiple-views visualization paradigms for displaying multivariable datasets using scatterplots. A multivariable scatterplot was developed as a web application to provide the single-view tool, whereas for the multiple-views visualization the ScatterDice web app was slightly modified and adopted as a traditional, yet interactive, scatterplot matrix. Finally, a taxonomy of tasks for visualization tools was chosen to define the use case and the tests comparing the two paradigms.
Automated Feature Selection and Churn Prediction using Deep Learning ModelsIRJET Journal
This document discusses using deep learning models for churn prediction in the telecommunications industry. It begins with an introduction to churn prediction and feature selection challenges. It then provides an overview of deep learning techniques, including artificial neural networks, convolutional neural networks, and their applications. The document proposes three deep learning architectures for churn prediction and experiments with them on two telecom datasets. The results show deep learning models can achieve performance comparable to traditional models without manual feature engineering.
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an online and print open-access journal that provides rapid monthly publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic, and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published by a rapid process within 20 days after acceptance, and the peer-review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
This document discusses applying machine learning algorithms to three datasets: a housing dataset to predict prices, a banking dataset to predict customer churn, and a credit card dataset for customer segmentation. For housing prices, linear regression, regression trees and gradient boosted trees are applied and evaluated on test data using R2 and RMSE. For customer churn, logistic regression and random forests are used with sampling to address class imbalance, and evaluated using confusion matrix metrics. For credit card data, k-means clustering with PCA is used to segment customers into groups.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by their importance, but there is no consensus on where to cut off the ranked list. It therefore becomes important to identify a threshold value or range for removing redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popular ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that roughly 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) represents the lower limit of the ranker threshold range.
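The threshold rule the abstract reports (keep roughly 33-50% of N ranked features, with log2(N) as the lower limit) is simple to apply in code. The feature names and the rounding choice below are illustrative, not from the study:

```python
import math

def ranker_thresholds(n_features):
    """Threshold range suggested by the study: keep roughly 33-50% of the
    N ranked features, with log2(N) as the lower limit of the range."""
    return {
        "lower": math.log2(n_features),
        "third": n_features / 3,
        "half": n_features / 2,
    }

def select_top(ranked_features, fraction=0.33):
    """Cut a ranked feature list (best-first) at the chosen fraction."""
    keep = max(1, round(len(ranked_features) * fraction))
    return ranked_features[:keep]

# Hypothetical ranked features for a 12-feature dataset.
ranked = [f"f{i}" for i in range(12)]
chosen = select_top(ranked, fraction=0.33)
```

For a 16-feature dataset the rule gives a range from log2(16) = 4 features up to 8, so a practitioner would sweep cut-offs only inside that window rather than over all of N.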
IRJET- Analyzing Voting Results using Influence MatrixIRJET Journal
This document discusses analyzing voting results using an influence matrix. It proposes modeling voting outcomes as the result of an opinion dynamics process in which opinions evolve according to social influence, and formulates the estimation of the maximum a posteriori opinions and influence matrix from voting data. It also notes that influence matrix techniques have been used to solve fluid flow problems numerically. It demonstrates vote prediction and dynamic visualization of results based on the estimated influence matrix. Future work could explore modeling stubborn agents' topic-dependent beliefs instead of treating them as independent.
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...IJCSEA Journal
In this paper we propose an approach, based on Markov chains, to cluster and recommend candidate views for the materialized-view selection algorithm. Our idea is to intervene at regular intervals to filter the candidate views that will be used by an algorithm for selecting materialized views in a real-time data warehouse. The aim is to reduce the complexity and execution cost of online materialized-view selection. Our experimental results show that our solution is very efficient at identifying the most profitable views and improves query response time.
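One plausible reading of the Markov-chain filtering step is: build a transition matrix from the query log over candidate views, then keep the views with the highest long-run visit probability. The sketch below is an illustration under that assumption (the query log and view names are invented, and this is not the paper's implementation):

```python
from collections import defaultdict

def transition_matrix(query_log):
    """Build a row-normalized transition matrix from a sequence of queried views."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(query_log, query_log[1:]):
        counts[a][b] += 1
    matrix = {}
    for a, row in counts.items():
        total = sum(row.values())
        matrix[a] = {b: c / total for b, c in row.items()}
    return matrix

def rank_candidates(matrix, states, steps=50):
    """Approximate the stationary distribution by power iteration; views with
    high long-run visit probability are the profitable candidates."""
    p = {s: 1.0 / len(states) for s in states}
    for _ in range(steps):
        nxt = {s: 0.0 for s in states}
        for a in states:
            row = matrix.get(a, {a: 1.0})  # self-loop if no outgoing edges
            for b, w in row.items():
                nxt[b] += p[a] * w
        p = nxt
    return sorted(states, key=p.get, reverse=True)

# Hypothetical query log over three candidate views.
log = ["v1", "v2", "v1", "v3", "v1", "v2", "v1", "v1"]
states = ["v1", "v2", "v3"]
ranking = rank_candidates(transition_matrix(log), states)
```

Running this periodically over the recent log, as the abstract suggests, keeps the candidate set small before the (more expensive) selection algorithm runs.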
As part of our team's enrollment in the Data Science Super Specialization course under UpX Academy, we submitted several projects for our final assessments; one of them was a Telecom Churn Analysis Model.
The input data was provided by UpX Academy and the language we used was R. As part of the project, our main objectives were:
-> To predict customer churn.
-> To highlight the main variables/factors influencing customer churn.
-> To use various ML algorithms to build prediction models, and to evaluate the accuracy and performance of these models.
-> To find the best model for our business case and provide an executive summary.
To address this business problem, we followed a thorough approach. We did a detailed exploratory data analysis consisting of various box plots, bar plots, etc.
We then built as many classification models as possible that fit our business case (logistic regression / kNN / decision trees / random forest / SVM) and also tried a Cox proportional hazards survival analysis model. For every model, we then tried to boost performance by applying various tuning techniques.
As we are all still learning these concepts and just starting out, please feel free to provide feedback on our work. Any suggestions are most welcome... :)
Thanks!!
Mining Large Streams of User Data for PersonalizedRecommenda.docxARIV4
This document provides an overview of Netflix's approach to personalized recommendations. It discusses lessons learned from the Netflix Prize competition, including the effectiveness of matrix factorization and restricted Boltzmann machine models. It then describes Netflix's process for innovating recommendations called "Consumer Data Science", which uses A/B testing to evaluate new approaches. Finally, it outlines the key components of Netflix's personalized experience, noting that recommendations are integrated throughout the interface and account for differences within households.
Recommendation system using bloom filter in mapreduceIJDKP
Many clients use the Web to discover product details through online reviews written by other customers and by specialists. Recommender systems are an important response to the resulting information overload problem, as they present users with more practical and personalized information. Collaborative filtering (CF) methods are a vital component of recommender systems: they generate high-quality recommendations by leveraging the preferences of communities of similar users, under the assumption that people with similar tastes choose the same items. Conventional collaborative filtering suffers from data sparsity and a lack of scalability, so a new recommender system is needed that handles sparse data and produces high-quality recommendations in a large-scale mobile environment. MapReduce is a programming model widely used for large-scale data analysis; the recommendation mechanism described here for mobile commerce is user-based collaborative filtering implemented in MapReduce, which reduces the scalability problem of conventional CF systems. One essential operation in this analysis is the join, but MapReduce executes joins inefficiently because it processes all records in the datasets even when only a small fraction is relevant to the join. This can be mitigated with the bloomjoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate records. The proposed Bloom-filter algorithm reduces the number of intermediate results and improves join performance.
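The bloomjoin idea above can be sketched outside MapReduce: build a Bloom filter over the small side's join keys and use it to drop non-matching records before the expensive join. Everything here (the `BloomFilter` class, the parameters `m` and `k`, and the toy ratings) is an illustrative assumption, not the paper's implementation.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: k hash probes over an m-slot bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key):
        # Derive k independent positions by salting one hash function.
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # No false negatives; a small false-positive rate is possible.
        return all(self.bits[pos] for pos in self._positions(key))

# Build the filter from the small dataset's join keys, then discard
# non-joining records from the large dataset before the join itself.
ratings = [("u1", "item9"), ("u2", "item3"), ("u3", "item9")]
items_of_interest = ["item9"]

bf = BloomFilter()
for item in items_of_interest:
    bf.add(item)

candidates = [(u, i) for (u, i) in ratings if bf.might_contain(i)]
```

In the MapReduce setting the filter would be built in one job and distributed to the mappers of the join job, so that most irrelevant intermediate records never reach the shuffle phase.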
Mixed Recommendation Algorithm Based on Content, Demographic and Collaborativ...IRJET Journal
The document describes a proposed hybrid recommendation algorithm that incorporates content filtering, collaborative filtering, and demographic filtering. It begins with an overview of recommendation systems and different filtering techniques. Then, it discusses related work incorporating various filtering approaches. The methodology section outlines the original algorithm, which develops user profiles based on browsing history and ratings. It provides recommendations by calculating similarities between user and item profiles. The proposed methodology enhances this by incorporating demographic attributes into user profiles and using fuzzy logic to validate recommendations. It claims this integrated approach can provide more accurate and personalized recommendations.
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...IRJET Journal
This document discusses evaluating and enhancing the efficiency of recommendation systems using big data analytics. It begins with an abstract that outlines recommendation systems, collaborative filtering, and the need for big data analytics due to large datasets. It then discusses specific collaborative filtering techniques like user-based, item-based, and matrix factorization. It describes challenges like scalability that big data analytics can help address. The document evaluates recommendation algorithms using metrics like MAE, RMSE, precision and time taken on movie recommendation datasets. It aims to design an efficient recommendation system using the best techniques.
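The MAE and RMSE metrics mentioned above are straightforward to compute; a minimal sketch, with illustrative toy ratings:

```python
import math

def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
print(mae(actual, predicted))   # 0.5
print(rmse(actual, predicted))  # ~0.612
```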
On the benefit of logic-based machine learning to learn pairwise comparisonsjournalBEEI
This document describes a study that used a logic-based machine learning approach called APARELL to learn user preferences from pairwise comparisons in a recommender system. APARELL learns description logic rules from examples of users' pairwise item preferences and background knowledge about item properties. It was implemented in a used car recommender system. A user study evaluated the pairwise preference elicitation method compared to a standard list interface. Results showed the pairwise interface was significantly better in recommending items that matched users' preferences.
Abstract: Learning analytics by nature relies on computational information-processing activities intended to extract from raw data interesting aspects that yield insights into the behaviour of learners, the design of learning experiences, and so on. A large variety of computational techniques can be employed, all with interesting properties, but it is the interpretation of their results that forms the core of the analytics process. As rising subjects, data mining and business intelligence play an increasingly important role in decision support across every walk of life. The Variance Rover System (VRS) focuses on the large datasets obtained from online web visits, categorizing them into clusters according to similarity, and on predicting customer behaviour and selecting actions to influence that behaviour to the company's benefit, so that optimized decisions about business expansion can be taken. Keywords: Analytics, Business Intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...IRJET Journal
This document presents a tool for preprocessing and visualizing data using machine learning models. It aims to simplify the preprocessing steps for users by performing tasks like data cleaning, transformation, and reduction. The tool takes in a raw dataset, cleans it by removing missing values, outliers, etc. It then allows users to apply machine learning algorithms like linear regression, KNN, random forest for analysis. The processed and predicted data can be visualized. The tool is intended to save time by automating preprocessing and providing visual outputs for analysis using machine learning models on large datasets.
Estimating project development effort using clustered regression approachcsandit
Due to the intangible nature of "software", accurate and reliable software effort estimation is a challenge in the software industry. Very accurate estimates of software development effort cannot be expected because of the inherent uncertainty in software development projects and the complex and dynamic interaction of the factors that impact development. Heterogeneity exists in software engineering datasets because the data come from diverse sources; it can be reduced by defining relationships between the data values and classifying them into different clusters. This study focuses on how the combination of clustering and regression techniques can reduce the loss of predictive efficiency caused by heterogeneity of the data. The clustered approach creates subsets of the data with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression performs better than the other regression techniques used in the analysis.
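The cluster-then-regress idea can be sketched in a few lines. The closed-form one-feature ridge fit, the size-based split standing in for clustering, and the toy project data are all illustrative assumptions, not the paper's setup.

```python
def ridge_1d(xs, ys, lam=1.0):
    """Closed-form ridge regression for one feature, no intercept:
    w = sum(x*y) / (sum(x^2) + lambda)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

# Toy effort data: (size_kloc, effort) pairs from two heterogeneous sources.
projects = [(1, 2.1), (2, 3.9), (3, 6.2),    # source A: ~2 effort units/KLOC
            (10, 55), (12, 61), (15, 74)]    # source B: ~5 effort units/KLOC

# Crude 1-D "clustering": split on project size (a stand-in for k-means).
small = [(x, y) for x, y in projects if x < 5]
large = [(x, y) for x, y in projects if x >= 5]

# Fit one ridge model per (more homogeneous) cluster.
w_small = ridge_1d(*zip(*small))
w_large = ridge_1d(*zip(*large))

def predict(size):
    """Route a new project to its cluster's model."""
    return size * (w_small if size < 5 else w_large)
```

A single model fitted over all six points would be pulled between the two regimes; the per-cluster fits recover each source's slope, which is the homogeneity benefit the abstract describes.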
This document provides an overview of existing multi-attribute decision making (MADM) algorithms for vertical handoff decision making in heterogeneous wireless networks. It describes several popular MADM methods used as the basis for vertical handoff decision algorithms, including simple additive weighting, multiplicative exponential weighting, technique for order preference by similarity to ideal solution, grey relational analysis, and analytic hierarchy process. For each method, it provides an example of a vertical handoff decision algorithm from literature that uses that particular MADM approach. The document concludes that vertical handoff decision algorithms based on MADM methods can help provide seamless connectivity in next generation wireless networks.
This paper aims to build predictive models using the CRISP-DM framework to classify bank customers into predefined classes using a Portuguese marketing campaign dataset. It analyzes bank customer data containing over 45,000 instances and 16 features to classify customers as likely or not likely to subscribe to bank deposits. It uses multilayer perceptron and logistic regression algorithms for modeling. The results show that the multilayer perceptron model with a 70% training split provides the best average performance, accurately classifying customers 70% of the time.
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. Its purpose is to group similar data together, so that instances within a cluster are as similar as possible to each other and as different as possible from instances in other clusters. This paper focuses on k-means partitional clustering, which, thanks to its ease of implementation and high-speed performance on large datasets, remains very popular thirty years after its development. To address the problem of k-means settling in local optima, we propose an extended PSO algorithm named ECPSO. Our new algorithm is able to escape local optima and, with high probability, produces the problem's optimal answer. The experimental results show that the proposed algorithm outperforms other clustering algorithms, especially on two indices: clustering accuracy and clustering quality.
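Lloyd's k-means, the algorithm ECPSO builds on, fits in a dozen lines; the data and initial centroids below are toy values for illustration. The result depends on the initial centroids, which is exactly the local-optimum sensitivity the paper's PSO extension targets.

```python
def kmeans_1d(points, k, centroids, iters=20):
    """Plain Lloyd's k-means on 1-D data: assign each point to its
    nearest centroid, then move each centroid to its cluster's mean."""
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(sorted(kmeans_1d(data, 2, centroids=[0.0, 10.0])))  # -> centroids near [1.0, 9.0]
```

A PSO-style wrapper would treat each candidate set of centroids as a particle and search over initializations instead of committing to one.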
Single view vs. multiple views scatterplotsIJECEIAES
Among all the available visualization tools, the scatterplot has been deeply analyzed through the years, and many researchers have investigated how to improve it to face new challenges. The scatterplot is considered one of the most functional of the visual data representations, due to its relative simplicity compared with other multivariable visualization techniques. Even so, one of the most significant unsolved challenges in data visualization is effectively displaying datasets with many attributes or dimensions, such as multidimensional or multivariate ones. The focus of this research is to compare the single-view and multiple-views visualization paradigms for displaying multivariable datasets using scatterplots. A multivariable scatterplot has been developed as a web application to provide the single-view tool, whereas for the multiple-views visualization the ScatterDice web app has been slightly modified and adopted as a traditional, yet interactive, scatterplot matrix. Finally, a taxonomy of tasks for visualization tools has been chosen to define the use case and the tests comparing the two paradigms.
Automated Feature Selection and Churn Prediction using Deep Learning ModelsIRJET Journal
This document discusses using deep learning models for churn prediction in the telecommunications industry. It begins with an introduction to churn prediction and feature selection challenges. It then provides an overview of deep learning techniques, including artificial neural networks, convolutional neural networks, and their applications. The document proposes three deep learning architectures for churn prediction and experiments with them on two telecom datasets. The results show deep learning models can achieve performance comparable to traditional models without manual feature engineering.
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an online as well as print-version open-access journal that provides rapid monthly publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published by a rapid process within 20 days of acceptance, and the peer review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
This document discusses applying machine learning algorithms to three datasets: a housing dataset to predict prices, a banking dataset to predict customer churn, and a credit card dataset for customer segmentation. For housing prices, linear regression, regression trees and gradient boosted trees are applied and evaluated on test data using R2 and RMSE. For customer churn, logistic regression and random forests are used with sampling to address class imbalance, and evaluated using confusion matrix metrics. For credit card data, k-means clustering with PCA is used to segment customers into groups.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features selected from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by their importance, but there is no consensus on the cut-off number of features; it therefore becomes important to identify a threshold value or range for removing the redundant ones. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popular ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that roughly 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
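The two thresholds from the abstract — the log2(N) lower limit and the 33-50% band — are easy to apply once a ranker has scored the features. The function name and the synthetic importance scores below are illustrative assumptions.

```python
import math

def select_features(importances, frac=None):
    """Keep the top-ranked features. With frac=None, use the log2(N)
    lower-limit heuristic; otherwise keep round(frac * N) features."""
    n = len(importances)
    k = max(1, round(math.log2(n))) if frac is None else max(1, round(frac * n))
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]

# Hypothetical importance scores from some feature-ranking technique.
scores = {f"f{i}": 1.0 / i for i in range(1, 17)}  # 16 features

print(select_features(scores))            # log2(16) = 4 features: ['f1', 'f2', 'f3', 'f4']
print(select_features(scores, frac=0.4))  # 6 of 16 features, inside the 33-50% band
```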
IRJET- Analyzing Voting Results using Influence MatrixIRJET Journal
This document discusses analyzing voting results using an influence matrix. It proposes modeling voting outcomes as the result of an opinion dynamics process in which opinions evolve under social influence, and formulates the estimation of the maximum a posteriori opinions and influence matrix from voting data. The influence matrix technique is also described as a numerical method for solving fluid flow problems. The document demonstrates vote prediction and dynamic visualization of results based on the estimated influence matrix. Future work could explore modeling stubborn agents' topic-dependent beliefs rather than treating topics independently.
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...IJCSEA Journal
In this paper we propose an approach based on Markov chains to cluster and recommend candidate views for the materialized-view selection algorithm. Our idea is to intervene at regular intervals to filter the candidate views that will be used by an algorithm for the selection of materialized views in a real-time data warehouse. The aim is to reduce the complexity and execution cost of online materialized-view selection. Our experimental results show that our solution is very efficient at identifying the most profitable views and at improving query response time.
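As a rough illustration of ranking candidates with a Markov chain (the paper's actual clustering and profit model is richer), one can estimate a transition matrix over candidate views from the query log and rank views by the chain's stationary distribution. All names and numbers below are made up.

```python
# States are candidate views; P[i][j] is the estimated probability that
# the workload queries view j right after view i (rows sum to 1).
views = ["v1", "v2", "v3"]
P = [[0.1, 0.6, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]

dist = [1 / 3, 1 / 3, 1 / 3]            # start from a uniform distribution
for _ in range(100):                    # power iteration toward stationarity
    dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

# Views with higher stationary probability are the more "profitable"
# candidates to hand to the materialized-view selection algorithm.
ranked = sorted(zip(views, dist), key=lambda t: -t[1])
```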
As part of our team's enrollment in the Data Science Super Specialization course at UpX Academy, we submitted several projects for our final assessments; one of them was this Telecom Churn Analysis Model.
The input data was provided by UpX Academy and the language we used is R. The main objectives of the project were:
-> To predict customer churn.
-> To highlight the main variables/factors influencing customer churn.
-> To use various ML algorithms to build prediction models, and to evaluate the accuracy and performance of these models.
-> To find the best model for our business case and provide an executive summary.
To address this business problem, we followed a thorough approach. We did a detailed exploratory data analysis consisting of various box plots, bar plots, etc.
We then built as many classification models as fit our business case (logistic regression / kNN / decision trees / random forest / SVM) and also tried a Cox proportional hazards survival model. For every model, we then tried to boost performance by applying various tuning techniques.
As we are all still learning these concepts, please feel free to provide feedback on our work. Any suggestions are most welcome... :)
Thanks!!
Tourism Based Hybrid Recommendation SystemIRJET Journal
This paper proposes a hybrid tourism recommendation system that combines collaborative filtering, content-based filtering, and aspect-based sentiment analysis to improve accuracy and address cold start problems. The system analyzes user ratings and reviews to predict ratings for other tourism packages. It stores ratings, reviews, and sentiment information in a database to enhance recommendations. Results showed the hybrid approach increased efficiency over conventional methods. Future work could include testing on additional datasets and expanding the system.
Priority Based Prediction Mechanism for Ranking Providers in Federated Cloud ...IJERA Editor
Current trends show that cloud computing has great potential to help hospitals cut costs, a recent report says, along with some help overcoming the challenges hiding in the cloud. Several methods already exist for this, such as broker-based trust architectures. The health-care framework comprises patient, doctor, symptom, and disease. This paper discusses a broker-based architecture for the federated cloud and the Service Measurement Index (SMI) considered for evaluating providers, the construction of a Grade Distribution Table (GDT), ranking providers by comparing prediction weights, optimal service-provider selection, and results compared with existing ranking techniques. We propose two ranking mechanisms to sort providers and select the optimal provider automatically. In the grade-distribution ranking model, each provider is assigned a grade based on the values of its SMI attributes; based on the total grade value, providers fall into the Gold, Silver, or Bronze category. Within each category, quicksort orders the providers and the provider at the top is optimal; if more than one provider ties at the top, a priority-feedback-based decision tree selects the optimal provider for the request. In the second ranking mechanism, a joint probability distribution is used to rank the providers; when two providers have the same score, the priority-feedback-based decision tree again selects the optimal provider for the request.
A Review Study OF Movie Recommendation Using Machine LearningIRJET Journal
This document reviews machine learning techniques for movie recommendation systems. It discusses content-based filtering, collaborative filtering, and hybrid filtering as the most common approaches. Specific machine learning algorithms described for movie recommendations include k-means clustering to group similar users, k-nearest neighbors to find the closest matches to a user's preferences, and support vector machines to classify movies a user may like. The document also reviews challenges like cold start problems and potential solutions explored in previous literature.
Cross Domain Recommender System using Machine Learning and Transferable Knowl...IRJET Journal
This document discusses a proposed cross-domain recommender system that uses machine learning techniques to address the cold start problem. It involves a two-stage process:
1) The TrAdaBoost algorithm is used to select some initial items to recommend to users in the target domain.
2) A nonparametric pairwise clustering algorithm clusters users into groups based on whether they would recommend an item or not.
PageRank is then used to provide additional relevant and unsearched content to users. The goal is to transfer useful knowledge from an auxiliary domain to help address cold start issues in the target domain.
Collaborative filtering is a technique used in recommender systems to predict a user's preferences based on other similar users' preferences. It involves collecting ratings data from users, calculating similarities between users or items, and making recommendations. Common approaches include user-user collaborative filtering, item-item collaborative filtering, and probabilistic matrix factorization. Recommender systems are evaluated both offline using metrics like MAE and RMSE, and through online user testing.
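The user-user variant described above can be sketched in plain Python: compute similarities between users over co-rated items, then predict a rating as a similarity-weighted average of neighbours' ratings. The toy ratings matrix is illustrative.

```python
import math

# user -> {item: rating}; illustrative values only.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2, "m4": 1},
}

def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(u[i] ** 2 for i in common))
           * math.sqrt(sum(v[i] ** 2 for i in common)))
    return num / den

def predict(user, item):
    """Similarity-weighted average of neighbours' ratings for `item`."""
    sims = [(cosine(ratings[user], r), r[item])
            for name, r in ratings.items()
            if name != user and item in r]
    total = sum(s for s, _ in sims)
    return sum(s * rating for s, rating in sims) / total if total else 0.0
```

Here alice's tastes track bob's far more than carol's, so the predicted rating for the movie alice has not seen (`m4`) lands close to bob's rating, which is the core intuition of user-user collaborative filtering.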
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Editor IJAIEM
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats are migrating from the outmoded PC client to the mobile device. Compared with the traditional Windows-Intel alliance on the PC, the Android alliance dominates the mobile Internet, and apps have replaced PC client software as the foremost target of malicious usage. In this paper, to improve the security status of recent mobile apps, we propose a methodology for evaluating mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based approaches, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable items communicating via standard protocols. Accordingly, to prepare for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. We propose an efficient multidimensional fusion algorithm for IoT data based on partitioning; attribute reduction and rule extraction methods are then used to obtain the synthesis results, and the correctness and effectiveness of the algorithm are illustrated by proving a few theorems and by simulation. The paper also introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very hefty, yet quite easy to generate and use; they can be so large that it only makes sense to use them for big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta-classifiers. The results demonstrate that LIME classifiers can significantly increase classification accuracy, performing better than both the base classifiers and the standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble meta-classifiers, Internet of Things, Big data
IRJET- Book Recommendation System using Item Based Collaborative FilteringIRJET Journal
This document describes an item-based collaborative filtering approach for a book recommendation system. It discusses different recommendation system techniques including collaborative filtering, content-based filtering, and hybrid filtering. It then focuses on item-based collaborative filtering, explaining how it calculates item similarities using adjusted cosine similarity and makes predictions using weighted sums. The document tests the approach on the Goodbooks10k dataset and evaluates it using mean absolute error, finding lower error rates with more neighbor items. In conclusion, item-based collaborative filtering is an effective approach for book recommendations.
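The adjusted-cosine similarity and weighted-sum prediction described above can be sketched as follows. The tiny ratings matrix is illustrative (not Goodbooks10k), and the absolute-value denominator in the weighted sum is one common formulation.

```python
import math

# user -> {book: rating}; illustrative values only.
R = {
    "u1": {"b1": 5, "b2": 4, "b3": 1},
    "u2": {"b1": 4, "b2": 5, "b3": 2},
    "u3": {"b1": 2, "b2": 1, "b3": 5},
}

means = {u: sum(r.values()) / len(r) for u, r in R.items()}

def adjusted_cosine(i, j):
    """Adjusted cosine: centre each rating on the *user's* mean rating,
    so differences in individual rating scales cancel out."""
    users = [u for u in R if i in R[u] and j in R[u]]
    num = sum((R[u][i] - means[u]) * (R[u][j] - means[u]) for u in users)
    den = (math.sqrt(sum((R[u][i] - means[u]) ** 2 for u in users))
           * math.sqrt(sum((R[u][j] - means[u]) ** 2 for u in users)))
    return num / den if den else 0.0

def predict(user, item):
    """Weighted sum over the other items the user has already rated."""
    pairs = [(adjusted_cosine(item, j), R[user][j]) for j in R[user] if j != item]
    denom = sum(abs(s) for s, _ in pairs)
    return sum(s * r for s, r in pairs) / denom if denom else 0.0
```

In this toy matrix, `b1` and `b2` attract similar centred ratings (positive similarity) while `b1` and `b3` attract opposite ones (negative similarity), which is the signal the weighted-sum prediction exploits.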
An enhanced kernel weighted collaborative recommended system to alleviate spa...IJECEIAES
User reviews in the form of ratings give an opportunity to judge user interest in the available products and a chance to recommend new, similar items to customers. Personalized recommender techniques play a vital role in this growing e-commerce century in predicting users' interests. The Collaborative Filtering (CF) system is one of the most widely used democratic recommender systems, relying entirely on user ratings to provide recommendations. In this paper, an enhanced collaborative filtering system is proposed using a Kernel Weighted K-means Clustering (KWKC) approach with Radial Basis Functions (RBF) to eliminate the sparsity problem, where the lack of ratings makes accurate recommendation a challenge. The proposed system has two phases of state transition: Connected and Disconnected. In the Connected state the system is in 'recommendation mode', where the active user is given the predicted-recommended items. In the Disconnected state the system is in 'learning mode', where a hybrid learning approach and user clusters are used to define similar-user models. Disconnected-state activities are performed in the hidden layer of the RBF and Connected-state activities in the output layer, while the input layer of the RBF uses the original user ratings. The proposed KWKC smooths the sparse original rating matrix and defines the similar-user clusters. A benchmark comparative study is also made against classical learning and prediction techniques in terms of accuracy and computational time. The experimental setup uses the MovieLens dataset.
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...IJTET Journal
This document describes a proposed algorithm for improving recommendation systems for e-services. It involves the following key steps:
1. Clustering customer transaction histories to group similar purchase patterns and derive customer-based recommendations.
2. Using incremental association rule mining on the transaction data to detect frequently purchased item sets and relationships between items.
3. Developing a fuzzy model to classify customers and provide dynamic recommendations tailored to different customer types. The recommendations will be based on matching customer preferences and purchase histories to specific product sets.
4. The algorithm clusters transactions, mines association rules incrementally as new data is added, and generates recommendations by classifying customers and matching them to relevant product clusters. This provides a personalized and dynamic recommendation service.
IRJET- Analysis of Rating Difference and User InterestIRJET Journal
This document summarizes a research paper that proposes a collaborative filtering recommendation algorithm that incorporates rating differences and user interests. It first adds a rating difference factor to the traditional collaborative filtering algorithm. It then calculates user interests based on item attributes and the similarity between user interests. Recommendations are made by weighting user rating differences and interest similarities. The proposed algorithm is shown to reduce error rates and improve accuracy compared to traditional collaborative filtering.
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS... (IJCSEA Journal)
This document compares collaborative filtering algorithms with various similarity measures for movie recommendations. It summarizes User-based and Item-based collaborative filtering algorithms implemented in the Apache Mahout framework. Various similarity measures used in collaborative filtering are discussed, including Euclidean distance, Log Likelihood Ratio, Pearson correlation, Tanimoto coefficient, Uncentered Cosine, and Spearman correlation. The document concludes that Item-based algorithms typically provide better results than User-based algorithms for movie recommendations.
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEAS... (IJCSEA Journal)
Collaborative Filtering is generally used as a recommender system. There is enormous growth in the amount of data on the web, and recommender systems help users select the products on the web that are most suitable for them. Collaborative filtering systems collect a user's previous information about items such as movies, music, ideas, and so on. For recommending the best item, there are many algorithms based on different approaches; the best known are User-based and Item-based algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms. The aim of this paper is to compare User-based and Item-based Collaborative Filtering algorithms with many different similarity indexes in terms of their accuracy and performance. We provide an approach to determine the best algorithm, i.e. the one that gives the most accurate recommendation, using statistical accuracy metrics. The results compare the User-based and Item-based algorithms on a movie recommendation data set.
This document discusses analyzing a movie recommendation system using the MovieLens dataset. It compares user-based and item-based collaborative filtering approaches. For user-based filtering, it calculates user similarity using cosine similarity and predicts ratings. For item-based filtering, it also uses cosine similarity to find similar items and predicts ratings. It evaluates the performance of both approaches using root mean square deviation and finds that item-based collaborative filtering has lower error compared to user-based filtering.
An Adaptive Framework for Enhancing Recommendation Using Hybrid Technique (ijcsit)
Recommender systems provide useful recommendations to a collection of users for items or products that
might be of concern or interest to them. Several techniques have been proposed for recommendation such
as collaborative filtering, content-based, knowledge-based, and demographic filtering. Each of these
techniques suffers from scalability, data sparsity, and cold-start problems when applied individually
resulting in poor recommendations. This paper proposes an adaptive hybrid recommender system that
combines multiple techniques to achieve some synergy between them. Collaborative filtering and
demographic techniques are combined in a weighted linear formula. Experiments on the
MovieLens dataset confirm that the proposed adaptive hybrid framework overcomes the weaknesses
that arise when traditional recommendation techniques are used individually.
Improving Service Recommendation Method on Map reduce by User Preferences and... (paperpublications3)
Abstract: Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers, services, and the amount of online information have grown rapidly, yielding a big-data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems. Most existing service recommender systems present the same ratings and rankings of services to different users without considering diverse users' preferences, and therefore fail to meet users' personalized requirements. This paper addresses the above challenges by presenting a personalized service recommendation list and recommending the most appropriate services to users effectively. Specifically, keywords are used to indicate users' preferences, and a user-based collaborative filtering algorithm is adopted to generate appropriate recommendations. Keywords: recommender system, user preference, keyword, Big Data, MapReduce, Hadoop.
Title: Improving Service Recommendation Method on Map reduce by User Preferences and Reviews
Author: Dayanand Bhovi, Mr. Ashwin Kumar
ISSN 2350-1022
International Journal of Recent Research in Mathematics Computer Science and Information Technology
Paper Publications
Recommender systems: a novel approach based on singular value decomposition (IJECEIAES)
Due to modern information and communication technologies (ICT), it is increasingly easy to exchange data and access new services through the internet. However, the amount of data and services available increases the difficulty of finding what one needs. In this context, recommender systems represent the most promising solution to the problem of so-called information overload, analyzing users' needs and preferences. Recommender systems (RS) are applied in different sectors with the same goal: to help people make choices based on an analysis of their behavior or of similar users' characteristics or interests. This work presents a different approach for predicting ratings within model-based collaborative filtering, which exploits singular value factorization. In particular, rating forecasts were generated through the characteristics related to users and items, without the support of available ratings. The proposed method is evaluated on the MovieLens100K dataset, achieving 0.766 and 0.951 in terms of mean absolute error and root-mean-square error respectively.
Similar to International Journal of Computational Engineering Research (IJCER) (20)
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefits it brings you. Above all, you certainly want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, e.g. using a person document instead of a mail-in database for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep track. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement immediately
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Full-RAG: A modern architecture for hyper-personalization (Zilliz)
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open-source community.
BIO: Advocate for free software and for standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she has been involved in several events, migrations, and training related to LibreOffice. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not following her passion for computers and Geeko she cultivates her curiosity for astronomy (hence her nickname deneb_alpha).
Removing Uninteresting Bytes in Software Fuzzing (Aftab Hussain)
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
International Journal of Computational Engineering Research (IJCER)
International Journal of Computational Engineering Research || Vol. 03, Issue 9
ISSN 2250-3005 || September 2013 || Page 25
Overview of Recommender System & A Speedy Approach in
Collaborative Filtering Recommendation Algorithms with
MapReduce

¹Prof. Dr. R. Shankar, ²Prof. Ateshkumar Singh, ³Ms. Ila Naresh Patil
¹IES College of Technology, Bhopal, MP, India
²,³M.Tech. CSE, IES College of Technology, Bhopal, MP, India
I. INTRODUCTION
Recommender systems provide an important response to the information-overload problem, as they
present users with more practical and personalized information services. There are three types of
recommender systems: content-based recommender systems, collaborative recommender systems, and
trust-based recommender systems. Recommendation systems generate a ranked list of items in which a
user might be interested. They are useful for approximating the degree to which a specific user will like
a specific product, and even for predicting the helpfulness of controversial reviews [1]. Recommender
systems are a powerful new technology that helps users find items they want to buy from a business,
and they are rapidly becoming a crucial tool in e-commerce on the Web.
In this paper we propose a new method: implementing the user-based Collaborative Filtering
algorithm on a distributed implementation model, the MapReduce model, on the Hadoop platform to
address the sparse-data problem. The MapReduce model is inspired by the map and reduce operations
of the Lisp programming language. Typically, the Map/Reduce framework and the Hadoop Distributed
File System (HDFS) run on the same set of nodes. This configuration allows the framework to
effectively schedule tasks on the nodes where the data is already present, resulting in very high
aggregate bandwidth across the cluster. The Map/Reduce framework consists of a single master
JobTracker and one slave TaskTracker per cluster node. The master is responsible for scheduling the
jobs' component tasks on the slaves, monitoring them, and re-executing failed tasks; the slaves execute
the tasks as directed by the master. This paper is organized as follows: Section 2 reviews related work
on commonly used recommendation strategies, Section 3 explains the MapReduce model on the
Hadoop platform, Section 4 presents the experimental analysis, and Section 5 concludes the paper.
ABSTRACT
Recommender systems suggest to people items or services of interest and have proved to be an
important solution to the information-overload problem. The big problem of collaborative filtering is its
scalability. In order to solve the scalability problem, we can implement the Collaborative Filtering
algorithm on a cloud computing platform using Hadoop's MapReduce. The work given here focuses on
a recommendation mechanism for mobile commerce that combines MapReduce and a user-based CF
algorithm to overcome the scalability problem. MapReduce is a programming model for expressing
distributed computations on massive amounts of data and an execution framework for large-scale data
processing on clusters of commodity servers. It was built on well-known principles in parallel and
distributed processing.
KEYWORDS: recommender system, collaborative filtering, speedup, partitioning, cloud computing,
Hadoop, MapReduce.
II. RELATED WORK
2.1. Classification of Recommender Systems
2.2. Ontology-based Recommender System
In [2], a peer-to-peer (P2P) network based on a decentralized architecture supports the progress of an
ontology-based recommender system; it basically works in dynamically changing, large-scale
environments. In [3] an ontology-based multilayered semantic social network is introduced. This model
works on a set of users having similar interests and their correlation at different semantic levels.
2.3. Collaborative Tagging-based Recommender System
In [4], the collaborative tagging-based recommender allows users, particularly consumers, to freely
attach tags or keywords to data contents. In [5] & [6] a generic model of collaborative tagging is
presented to capture the dynamics behind it. The tag-based system suggests the use of high-quality tags,
by which spam and noise can be avoided.
2.4. Recommendation Methodologies
Basically, there are three methods: content-based, collaborative, and trust-based [7].
2.5. Content-based Strategy
In [8], the Content-Based (CB) method provides suggestions based on items similar to those that the
user has previously purchased or reviewed. It provides recommendations based on the contents of
documents and each user's preferences.
2.6. Collaborative Filtering Strategy
Collaborative filtering (CF) provides personalized recommendations based on knowledge of similar
users in the system. CF is founded on the principle that the finest recommendations for an individual
are given by people who have similar interests. Collaborative filtering identifies users whose choices
are similar to the target user's and then builds predictions based on the scores of those neighbors. The
job in collaborative filtering is to guess the usefulness of a product to a particular user based on a
database of user votes. The CF algorithms predict the ranking of a target item for the target user with
the help of the rankings given by similar users who are known to have rated the item under
consideration [9]. Six collaborative filtering algorithms are evaluated. These algorithms accept values
of an interaction matrix A of order M x N = (aij), where M represents the number of consumers
(C1, C2, C3, ..., CM) and N represents the number of products or services (P1, P2, P3, ..., PN). The
value of aij can be either 0 or 1: aij = 1 means there is a transaction between Ci and Pj (Ci has bought
Pj), and aij = 0 means the absence of a transaction between Ci and Pj. The outcome of each algorithm
is a ranked list of probable products for each consumer.
The user-based CF algorithm
This algorithm generates a list of recommendations of user interest in three steps. In the first step, it
searches for the N users in the database who are most similar to the active user by creating a consumer
similarity matrix WC = (Wcst). A high value of Wcst indicates that consumers X and Y have similar
liking, as they have already bought many similar products. In the second step, it calculates the union of
the items purchased by these users and links a respective weight with every item based on its
significance in the set. Finally, in the third step, it generates the list of recommended items that have
not already been bought by the active user. The resulting matrix has, at the Cth row and Pth column, an
element that combines the scores of the similarities between consumer C and the other consumers who
have purchased the product.
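As a concrete illustration, the three steps can be sketched in plain Python. The interaction matrix, the use of cosine similarity, and the neighborhood size below are illustrative assumptions for a minimal sketch, not choices prescribed by the paper.

```python
from math import sqrt

# Illustrative interaction matrix A: rows = consumers C1..C4,
# columns = products P1..P5 (1 = the consumer bought the product).
A = [
    [1, 1, 0, 0, 1],  # C1
    [1, 1, 0, 0, 0],  # C2
    [0, 0, 1, 1, 0],  # C3
    [1, 0, 1, 1, 0],  # C4
]

def cosine(u, v):
    """Cosine similarity between two row vectors of A."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_user_based(A, active, n_neighbors=2):
    # Step 1: similarity of the active user to every other user
    # (one row of the consumer similarity matrix W_c).
    sims = [(cosine(A[active], A[u]), u) for u in range(len(A)) if u != active]
    sims.sort(reverse=True)
    neighbors = sims[:n_neighbors]
    # Step 2: union of the items bought by the neighbors, each item
    # weighted by the similarity of the neighbor who bought it.
    scores = {}
    for w, u in neighbors:
        for p, bought in enumerate(A[u]):
            if bought:
                scores[p] = scores.get(p, 0.0) + w
    # Step 3: rank only the items the active user has not already bought.
    return sorted((p for p in scores if not A[active][p]),
                  key=lambda p: -scores[p])

print(recommend_user_based(A, active=1))
```

Running this for consumer C2 ranks product P5 first, since C2's closest neighbor C1 bought it.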
2.7. The item-based CF algorithm
This algorithm is based on a principle similar to the user-based one, but it determines product
similarities instead of consumer similarities. It generates a product similarity matrix, Wp = (Wpst),
based on the column vectors of A. A high Wpst shows that products X and Y are similar, as many
consumers have bought both of them. Wp gives the products' probable scores for each consumer: the
resulting matrix contains, at the Cth row and Pth column, an element that combines the scores of the
similarities between product P and the other products that consumer C has purchased. This algorithm
provides higher efficiency and comparable or better recommendation quality than the user-based
algorithm for many data sets [10].
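The item-based variant can be sketched analogously; the binary matrix and the use of cosine similarity over the column vectors of A are again illustrative assumptions.

```python
from math import sqrt

# Illustrative interaction matrix A (rows = consumers, columns = products).
A = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_item_based(A, active):
    cols = [list(c) for c in zip(*A)]  # column vectors of A (one per product)
    n = len(cols)
    # Product similarity matrix W_p = (W_pst) built from the column vectors.
    Wp = [[cosine(cols[i], cols[j]) if i != j else 0.0 for j in range(n)]
          for i in range(n)]
    # Score each unbought product by its similarity to the products the
    # active consumer has already bought, then rank by score.
    scores = {p: sum(Wp[p][q] for q in range(n) if A[active][q])
              for p in range(n) if not A[active][p]}
    return sorted(scores, key=lambda p: -scores[p])

print(recommend_item_based(A, active=1))
```

Note that only the similarity computation changes (columns instead of rows); the ranking step is the same shape as in the user-based case.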
2.8. The dimensionality-reduction algorithm
This algorithm compresses the original interaction matrix and produces recommendations based on the
compressed, less sparse matrix, to ease the sparsity problem. It applies standard singular-value
decomposition (SVD, a matrix factorization technique) to decompose the interaction matrix A into
U · Z · V', where U and V are two orthogonal matrices of size M x R and N x R respectively, and R is
the rank of matrix A. Z is a diagonal matrix of size R x R which carries all the singular values on its
diagonal. SVD has two features useful in recommender systems: it can be used to construct a
low-dimensional image of the customer-product space, and to calculate neighborhoods in that reduced
space [11].
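The decomposition can be sketched with NumPy's SVD routine; the small matrix and the truncation rank k below are illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative binary interaction matrix A (consumers x products).
A = np.array([
    [1.0, 1.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 1.0, 0.0],
])

# Decompose A into U · Z · V'; the vector s carries the singular values
# that form the diagonal of Z.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values: a low-dimensional image of the
# customer-product space that is denser than the sparse original matrix.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Entries of A_k act as smoothed preference scores for recommendation.
print(np.round(A_k, 2))
```

Neighborhoods can then be computed between the rows of the reduced factor `U[:, :k]` instead of the sparse original rows.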
2.9. The generative-model algorithm
This algorithm estimates the appropriate prior and conditional probabilities, and based on the estimated
probabilities it creates a score of product p for consumer c [12]. The approach is a statistical technique
called probabilistic Latent Semantic Analysis, which was initially developed in the context of
information retrieval. The method accomplishes competitive recommendation and prediction
accuracies, is highly scalable, and is extremely flexible.
2.10. The spreading-activation algorithm
This algorithm addresses the sparsity problem by discovering transitive associations between
consumers and products using a bipartite consumer-product transitive graph [13]. It yields high-quality
recommendations when sufficient data is not available. The spreading-activation graph consists of a
series of nodes, both user nodes and item nodes. These nodes are connected by edges, where each
weighted edge represents the rating the item has received from a user: the higher the weight of the
edge, the higher the rating. The item nodes then send "pulses" back to the active user and their
neighbors, thus spreading the activation to the other nodes in the neighborhood of the active user.
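A toy version of the pulse mechanism may look as follows; the rating graph, the decay factor, and the fixed number of pulses are all simplifying assumptions made for the sketch.

```python
# Weighted edges of the bipartite consumer-product graph: (user, item) -> rating.
ratings = {
    ("u1", "i1"): 5, ("u1", "i2"): 3,
    ("u2", "i1"): 4, ("u2", "i3"): 5,
    ("u3", "i2"): 2, ("u3", "i3"): 4, ("u3", "i4"): 5,
}

def spread_activation(ratings, active, decay=0.5):
    activation = {active: 1.0}
    # Three pulses: user -> items -> users -> items, which surfaces
    # transitive associations through the bipartite graph.
    for _ in range(3):
        nxt = {}
        for (u, i), w in ratings.items():
            for src, dst in ((u, i), (i, u)):  # edges carry pulses both ways
                if src in activation:
                    nxt[dst] = nxt.get(dst, 0.0) + decay * w * activation[src]
        activation = nxt
    # Recommend the items that accumulated activation but were not
    # already rated by the active user.
    bought = {i for (u, i) in ratings if u == active}
    scores = {n: a for n, a in activation.items()
              if n.startswith("i") and n not in bought}
    return sorted(scores, key=lambda n: -scores[n])

print(spread_activation(ratings, "u1"))
```

Here u1 never rated i3 or i4, yet i3 ranks first because it is strongly connected to u1 through the shared neighbors u2 and u3.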
2.11. The link-analysis algorithm
In this algorithm, the global structure of the consumer-product graph is used to support collaborative
filtering under sparse data [14]. In this graph, the first set of nodes consists of products, services, and
information items available for consumption; the second set consists of consumers or users. Feedback
and transactions are represented as links connecting nodes between these two sets; this graph is referred
to as the consumer-product graph. Link-analysis algorithms such as HITS (Hypertext Induced Topic
Selection) and PageRank are used for identifying essential web pages in a Web graph. The algorithm
extracts helpful link-structure details from the consumer-product graph to make more effective
recommendations with sparse data.
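The mutual-reinforcement idea behind HITS can be sketched on a small, made-up consumer-product graph: consumer ("hub") and product ("authority") scores are updated in alternation until they stabilize. The graph and iteration count are illustrative.

```python
# Transaction links of a tiny consumer-product graph.
edges = [("c1", "p1"), ("c1", "p2"), ("c2", "p1"), ("c2", "p3"), ("c3", "p3")]

consumers = sorted({c for c, _ in edges})
products = sorted({p for _, p in edges})

hub = {c: 1.0 for c in consumers}   # consumer ("hub") scores
auth = {p: 1.0 for p in products}   # product ("authority") scores

for _ in range(20):
    # Authority of a product: sum of hub scores of consumers who bought it.
    auth = {p: sum(hub[c] for c, q in edges if q == p) for p in products}
    # Hub score of a consumer: sum of authority scores of bought products.
    hub = {c: sum(auth[p] for d, p in edges if d == c) for c in consumers}
    # Normalize each vector so the iteration stays bounded.
    za, zh = sum(auth.values()), sum(hub.values())
    auth = {p: v / za for p, v in auth.items()}
    hub = {c: v / zh for c, v in hub.items()}

print(sorted(auth, key=lambda p: -auth[p]))
```

Products bought by well-connected consumers end up with the highest authority, which is exactly the global link-structure signal the text describes.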
2.12. Trust-Based Strategy
In trust-based recommendation systems, a trust network is used in which users are joined by trust
scores that indicate how much faith they have in each other. The user's trust network is constructed for
generating predictions [15] & [16] in three steps. The first step considers direct trust, which can be
established in two ways: explicitly or implicitly. The second step is the propagation of trust: it is
possible to propagate trust, i.e. to create new relations among users. The third step is predicting ratings:
from the trust network, we can predict what ratings a particular user would give for items.
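The three steps can be illustrated with a toy trust network. The trust values, the single-hop multiplication rule for propagation, and the trust-weighted-average predictor are simplifying assumptions, not the specific schemes of [15] or [16].

```python
# Step 1: direct trust edges, (truster, trustee) -> trust score in [0, 1].
direct_trust = {("alice", "bob"): 0.9, ("bob", "carol"): 0.8}
ratings = {"bob": {"item1": 4}, "carol": {"item1": 2, "item2": 5}}

def propagate(trust):
    """Step 2: derive new relations, e.g. alice->carol = 0.9 * 0.8 = 0.72."""
    extended = dict(trust)
    for (a, b), t1 in trust.items():
        for (c, d), t2 in trust.items():
            if b == c and (a, d) not in extended:
                extended[(a, d)] = t1 * t2
    return extended

def predict(user, item, trust, ratings):
    """Step 3: trust-weighted average of the ratings in the user's network."""
    pairs = [(t, ratings[v][item]) for (u, v), t in trust.items()
             if u == user and v in ratings and item in ratings[v]]
    return sum(t * r for t, r in pairs) / sum(t for t, _ in pairs)

trust = propagate(direct_trust)
print(round(predict("alice", "item1", trust, ratings), 2))
```

Alice's prediction for item1 sits between Bob's rating of 4 and Carol's rating of 2, pulled toward Bob because Alice trusts him more.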
2.13. MapReduce Collaborative Filtering Model
Conventional Collaborative Filtering consumes intensive computing time and computer resources,
especially when the dataset is very large. The MapReduce model is a distributed implementation model
proposed by Google. We introduce the MapReduce model below and describe its working mechanism
on the Hadoop platform. The MapReduce model abstracts the calculation process into two core phases:
Figure 1. Collaborative Filtering Using MapReduce
A. Data segmentation stage
In this phase [17], we separate the user IDs into different files; in these files, each row stores one user
ID. These files serve as the input files of the map phase. The data partitioning should satisfy two
principles. First, the proportion of computing time within the total running time should be as large as
possible; that is, most of the run time should be spent in the computation process rather than in
frequently initializing mappers. Second, the tasks should finish together; that is, each mapper task
should end at roughly the same time.
B. Map stage: The map function, written by the user, takes a set of input key/value pairs and
produces a set of intermediate key/value pairs. At this stage, the Hadoop platform estimates the algorithm's
memory and other resource consumption and specifies, for each DataNode, the number of mappers that can be
initialized. The platform then determines whether to initialize a mapper to process the user-ID files; if enough
resources are available, a new mapper is initialized. The mapper's setup function first builds the rating matrix
between users and items; the mapper then reads the user-ID file line by line, taking the line number as the
input key and the user ID on that line as the value. The next step is to calculate the similarity between this
user and all other users. The final step is to identify the user's nearest neighbours (by similarity value) and
predict the user's ratings on items accordingly. We sort the predicted ratings and store them in a
recommendation list. The user ID and its corresponding recommendation list are output to the reduce phase as
the intermediate key/value pair. Mapping creates a new output list by applying a function to individual
elements of an input list [18].
Figure 2. Mapping
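The map stage described above can be sketched as follows. This is an assumption-laden toy, not the paper's implementation: the ratings matrix, cosine similarity measure, and neighbourhood size `k` are illustrative choices:

```python
# Sketch of the map stage: for each input (line_number, user_id) pair, build
# similarities against a small in-memory ratings matrix, pick the nearest
# neighbours, predict ratings for unseen items, and emit
# (user_id, recommendation_list) as the intermediate key/value pair.
import math

RATINGS = {  # user_id -> {item_id: rating}; toy data, an assumption
    1: {"a": 5, "b": 3},
    2: {"a": 4, "b": 2, "c": 5},
    3: {"b": 4, "c": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    num = sum(u[i] * v[i] for i in common)
    den = (math.sqrt(sum(r * r for r in u.values()))
           * math.sqrt(sum(r * r for r in v.values())))
    return num / den

def map_user(line_no, user_id, k=1):
    """Emit (user_id, recommend_list) for one (line_number, user_id) input."""
    me = RATINGS[user_id]
    neighbours = sorted(
        ((cosine(me, v), uid) for uid, v in RATINGS.items() if uid != user_id),
        reverse=True,
    )[:k]
    preds = {}
    for sim, uid in neighbours:  # score items the user has not rated yet
        for item, rating in RATINGS[uid].items():
            if item not in me:
                preds[item] = preds.get(item, 0.0) + sim * rating
    rec_list = sorted(preds, key=preds.get, reverse=True)
    return (user_id, rec_list)
```

With this toy matrix, `map_user(0, 1)` finds user 2 as the nearest neighbour of user 1 and recommends item "c", the only item user 2 rated that user 1 has not.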
C. Reduce stage
The reduce function, also written by the user, accepts an intermediate key I and a set of values for
that key, and merges these values together to form a possibly smaller set of values. Typically, zero or one
output value is produced per reduce invocation [19]. In the reduce phase, the reducer collects the user IDs and
their corresponding recommendation lists, sorts them by user ID, and outputs them to HDFS; the reducers
themselves are generated by the Hadoop platform. Overall, the computation takes a set of input key/value
pairs and produces a set of output key/value pairs, which the user of the MapReduce library expresses as the
two functions map and reduce. The MapReduce library groups together all intermediate values associated with
the same intermediate key I and passes them to the reduce function. The intermediate values are supplied to
the user's reduce function via an iterator, which allows us to handle lists of values that are too large to fit in
memory.
Figure 3. Reducing
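The reduce stage can be sketched as below. This is an illustrative in-memory simulation, not the authors' Hadoop code; the grouping that the MapReduce library performs is mimicked here with a sort plus `groupby`:

```python
# Sketch of the reduce stage: group all intermediate (user_id, recommend_list)
# pairs by key (as the MapReduce library would), sort by user ID, and merge
# each key's values into a single output list -- typically one output per key.
from itertools import groupby
from operator import itemgetter

def reduce_phase(intermediate):
    """intermediate: iterable of (user_id, recommend_list) pairs."""
    ordered = sorted(intermediate, key=itemgetter(0))  # sort by user ID
    output = []
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        merged = []
        for _, rec_list in group:  # values arrive via an iterator
            for item in rec_list:
                if item not in merged:
                    merged.append(item)
        output.append((user_id, merged))  # would be written to HDFS
    return output
```

Note that `groupby` requires its input to be sorted on the grouping key, which is exactly the sort-by-user-ID step the reduce phase performs before output.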
3.1 MapReduce framework for CF
Here, we present the implementation of the Collaborative Filtering algorithm within the MapReduce
framework. Following [20], when making recommendations we store the user IDs to be processed in text
files, and these files then serve as the input of the map function. The MapReduce framework initializes
mappers to process these user-ID files. Our algorithm can be divided into the stages shown in
Figure 4:
Figure 4. Collaborative Filtering with MapReduce
3.2. Experimental Analysis
We implemented our experiments for the CF algorithm on the Java platform. As explained earlier in
Section 3, the Hadoop cluster was created on five computers; we refer to one of them as the MainNode and
the remaining four as DataSetNodes. Each computer has 4 GB RAM, an Intel(R) Core(TM) i5 CPU running at
2.5 GHz, and the Ubuntu 10.10 operating system. The software used for the experiments comprises the
Hadoop MapReduce framework and Java JDK 1.6; a mobile device (Android 3.0 and above) and a wireless
router are the additional hardware we used. The dataset is derived from the Netflix dataset: it maintains a list
of different movies and more than 10,000 users, and each user assigns his or her own ratings to movies, not
necessarily the same rating. The goal of our CF experiments is to compare the runtime between the standalone
and Hadoop platforms, so we do not focus on accuracy. We take sub-datasets with 100 users, 200 users, 500
users, and 1000 users, and the DataSetNodes are configured as 2-node, 3-node, and 5-node clusters.
For the comparative analysis of the standalone and Hadoop platforms, we consider tavg, the average
running time on the Hadoop platform for the current DataSetNode configuration and dataset, and tsd, the
running time of the standalone platform on the same dataset. The speedup is an important criterion for
measuring the efficiency of our algorithm. The speedup is given by,
Speedup = tsd / tavg
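As a worked example of this measure (the numbers below are illustrative, not measurements from the paper):

```python
# Speedup = standalone running time / average Hadoop running time.
# Toy numbers (assumptions): standalone takes 120 s, the 5-node Hadoop
# cluster averages 30 s on the same dataset.

def speedup(t_sd, t_avg):
    """Speedup of the Hadoop run over the standalone run."""
    return t_sd / t_avg

s = speedup(120.0, 30.0)  # 4.0 on a 5-node cluster; the ideal would be 5.0
```

The gap between the measured 4.0 and the ideal 5.0 reflects the overhead of initializing mappers and shuffling intermediate data, which is also why small datasets show sub-linear speedup below.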
In our CF algorithm the recommendation workload is divided per user, so theoretically, if we
consider N nodes, the speedup should be N; in other words, ideally the speedup should be linearly related to
the number of DataSetNodes. The analytical results graphed in Figure 5 show that as the number of
DataSetNodes increases, the speedup increases approximately linearly. We can also observe from the graph
that for 100 and 200 users the speedup does not increase linearly; this is because the dataset is too small, so
the Hadoop platform is unable to demonstrate its efficiency.
Figure 5. Speedup of CF of MapReduce