The performance of a Collaborative Filtering (CF) method depends on the properties of the User-Item Rating Matrix (URM), and these properties, or Rating Data Characteristics (RDC), change constantly. Recent studies explained the variation in CF performance caused by changes in the URM using six or more RDC. Here, we find that a significant proportion of the variation in the performance of different CF techniques can be attributed to just two RDC: the number of ratings per user, or Information per User (IpU), and the number of ratings per item, or Information per Item (IpI). Moreover, for a square URM, the performance of CF algorithms is quadratic in IpU (or IpI). The findings of this study are based on seven well-established CF methods and three popular public recommender datasets: 1M MovieLens, 25M MovieLens, and the Yahoo! Music Rating dataset.
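The two characteristics are straightforward to compute from a rating matrix. A minimal sketch, using a hypothetical 4x4 matrix where 0 marks a missing rating:

```python
import numpy as np

# Hypothetical 4x4 user-item rating matrix; 0 marks a missing rating.
urm = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 2],
    [1, 0, 0, 5],
    [0, 3, 4, 0],
])

observed = urm > 0
ipu = observed.sum(axis=1).mean()   # average ratings per user (IpU)
ipi = observed.sum(axis=0).mean()   # average ratings per item (IpI)
```

For a square URM the total rating count is shared between the same number of rows and columns, which is why IpU and IpI coincide there.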
SENSITIVITY ANALYSIS OF INFORMATION RETRIEVAL METRICS ijcsit
Average Precision, Recall, and Precision are the main metrics of Information Retrieval (IR) system performance. Using mathematical and empirical analysis, this paper examines the properties of these metrics. Mathematically, it is demonstrated that all of them are very sensitive to relevance judgments, which are not usually very reliable. We show that shifting a relevant document downwards within the ranked list decreases Average Precision. The variation in Average Precision is pronounced in positions 1 to 10, while from the 10th position onwards it is negligible. In addition, we estimate the regularity of changes in Average Precision when an arbitrary number of relevance judgments within the existing ranked list are switched from non-relevant to relevant. Empirically, it is shown that 6 relevant documents at the end of a 20-document list have approximately the same Average Precision value as a single relevant document at the beginning of the list, while Recall and Precision increase linearly regardless of document position. We also show that for a Serbian-to-English human-translated query followed by English-to-Serbian machine translation, relevance judgments change significantly, and therefore all the parameters measuring IR system performance are also subject to change.
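The position sensitivity described above is easy to reproduce with one common definition of Average Precision (mean of the precision values at the ranks of retrieved relevant documents); the lists below are illustrative, not the paper's data:

```python
def average_precision(rels):
    """AP over a ranked list of 0/1 relevance judgments, averaged
    over the relevant documents retrieved."""
    hits, precisions = 0, []
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Shifting the single relevant document down the list lowers AP,
# steeply within the top positions.
top = average_precision([1] + [0] * 19)            # relevant at rank 1
mid = average_precision([0] * 9 + [1] + [0] * 10)  # relevant at rank 10
```

Moving the one relevant document from rank 1 to rank 10 drops AP from 1.0 to 0.1, matching the claim that the metric is most volatile in the top positions.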
The document proposes a Response Aware Probabilistic Matrix Factorization (RAPMF) framework to address limitations in existing collaborative filtering recommendation systems. RAPMF incorporates users' response patterns into probabilistic matrix factorization by modeling responses as a Bernoulli distribution for observed ratings and a step function for unobserved ratings. This allows marginalizing missing responses. The authors also develop a mini-batch implementation of RAPMF to reduce computational costs from O(N×M) to O(B²) for mini-batches of B users and items. Experimental evaluation on synthetic and real-world datasets demonstrates the merits of RAPMF, including improved performance and reduced training time compared to other methods.
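The mini-batch saving can be illustrated without the full model: one step touches only a B x B sub-block of the response matrix rather than all N x M entries. The sketch below is a rough illustration with made-up dimensions and a sigmoid response probability, not the paper's exact likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, B, K = 1000, 800, 32, 5    # users, items, mini-batch size, latent dim
U = rng.normal(scale=0.1, size=(N, K))   # user latent factors
V = rng.normal(scale=0.1, size=(M, K))   # item latent factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One mini-batch step touches B users x B items (O(B^2) entries)
# instead of the full N x M response matrix.
users = rng.choice(N, size=B, replace=False)
items = rng.choice(M, size=B, replace=False)
scores = U[users] @ V[items].T            # B x B predicted scores
response_prob = sigmoid(scores)           # Bernoulli response probabilities
```

With N x M = 800,000 versus B² = 1,024, the per-step cost drops by roughly three orders of magnitude in this toy setting.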
IMAGE QUALITY ASSESSMENT- A SURVEY OF RECENT APPROACHES cscpconf
Image Quality Assessment (IQA) is the process of quantifying degradation in image quality. With the increase in image-based applications, IQA deserves extensive research. In this paper we present popular IQA methods of the three types, namely Full Reference (FR), No Reference (NR), and Reduced Reference (RR), and compare the approaches in terms of the database used, the performance metric, and the methods employed.
CPU HARDWARE CLASSIFICATION AND PERFORMANCE PREDICTION USING NEURAL NETWORKS ... gerogepatton
This document summarizes a study that uses neural networks and statistical learning methods to classify CPU vendors based on estimated performance and predict CPU performance based on hardware components. For vendor classification, classification zones are created based on the highest/lowest estimated performance and frequency of each vendor, to identify vendors that meet a given performance requirement. For performance prediction, neural networks and regression models are used and compared. Experiments show neural networks obtain low error and high correlation, while cubic regression has the highest correlation when more data is used for training than testing.
CPU HARDWARE CLASSIFICATION AND PERFORMANCE PREDICTION USING NEURAL NETWORKS ... ijaia
We propose a set of methods to classify vendors based on estimated CPU performance and predict CPU performance based on hardware components. For vendor classification, we use the highest and lowest estimated performance and frequency of occurrences of each vendor to create classification zones. These zones can be used to identify vendors who manufacture hardware that satisfies a given performance requirement. We use multi-layered neural networks for performance prediction, which account for nonlinearity in performance data. Various neural network architectures are analysed in comparison to linear, quadratic, and cubic regression. Experiments show that neural networks obtain low error and high correlation between predicted and published performance values, while cubic regression produces higher correlation than neural networks when more data is used for training than testing. An analysis of how the neural network architecture affects prediction is also performed. The proposed methods can be used to identify suitable hardware replacements.
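The classification-zone idea can be sketched concretely: a zone per vendor spans its lowest to highest estimated performance, and a requirement is matched against the zones. The vendor names and numbers below are hypothetical, chosen only to illustrate the mechanism:

```python
# Hypothetical per-vendor estimated relative performance values.
vendor_perf = {
    "vendorA": [120, 300, 210],
    "vendorB": [40, 95, 60],
    "vendorC": [180, 520],
}

# A classification "zone" per vendor: [lowest, highest] estimated
# performance, as described in the abstract.
zones = {v: (min(p), max(p)) for v, p in vendor_perf.items()}

def vendors_meeting(requirement):
    """Vendors whose zone reaches the given performance requirement."""
    return [v for v, (lo, hi) in zones.items() if hi >= requirement]

candidates = vendors_meeting(250)
```

Here only vendorA and vendorC manufacture hardware reaching a requirement of 250, so they would be the candidate suppliers.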
This document discusses analyzing a movie recommendation system using the MovieLens dataset. It compares user-based and item-based collaborative filtering approaches. For user-based filtering, it calculates user similarity using cosine similarity and predicts ratings. For item-based filtering, it also uses cosine similarity to find similar items and predicts ratings. It evaluates the performance of both approaches using root mean square deviation and finds that item-based collaborative filtering has lower error compared to user-based filtering.
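The prediction scheme described, cosine similarity between users followed by a similarity-weighted rating average and an RMSE comparison, can be sketched in a few lines. The tiny rating matrix is illustrative only:

```python
import numpy as np

ratings = np.array([        # rows: users, cols: items; 0 = unrated
    [4.0, 0.0, 5.0],
    [5.0, 2.0, 4.0],
    [1.0, 5.0, 0.0],
])

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        s = cosine(ratings[user], ratings[other])
        num += s * ratings[other, item]
        den += abs(s)
    return num / den if den else 0.0

def rmse(pred, truth):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(truth)) ** 2)))

pred = predict_user_based(ratings, 0, 1)
```

The item-based variant swaps the roles of rows and columns: similarities are computed between item columns, and a user's known ratings on similar items are averaged instead.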
IRJET - Enhanced Movie Recommendation Engine using Content Filtering, Collabo... IRJET Journal
This document describes a study that developed an enhanced movie recommendation engine (MRE) using content filtering, collaborative filtering, and popularity filtering. The MRE analyzes movie data from three datasets and makes recommendations based on similarities in movie titles, genres, plots, casts, directors, keywords, vote counts, and vote averages. Evaluation shows the MRE achieves a root mean squared error of 0.873 and mean absolute error of 0.671 when using collaborative filtering, indicating good performance. The MRE provides a more personalized and accurate recommendation system for movies by combining multiple filtering techniques.
Building a new CTL model checker using Web Services infopapers
Florin Stoica, Laura Stoica, Building a new CTL model checker using Web Services, Proceedings of the 21st International Conference on Software, Telecommunications and Computer Networks (SoftCOM 2013), Split-Primosten, Croatia, 18-20 September 2013, pp. 285-289.
DOI=10.1109/SoftCOM.2013.6671858 http://dx.doi.org/10.1109/SoftCOM.2013.6671858
Human Activity Recognition Using Accelerometer Data IRJET Journal
This document discusses human activity recognition using accelerometer data from smartphones. It uses a dataset containing accelerometer readings from 36 participants performing activities like walking, jogging, sitting, and standing. The data is preprocessed and a 2D convolutional neural network is used to classify the activities. The model achieves an accuracy of 96.75% on the test data based on a confusion matrix. Some activities like upstairs and downstairs are confused, likely due to data imbalance during preprocessing. In conclusion, neural networks can effectively perform human activity recognition from smartphone sensor data.
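The reported accuracy and the upstairs/downstairs confusion both fall out of the confusion matrix directly. The matrix below is a made-up example (not the paper's results) showing how accuracy and per-class recall are derived from it:

```python
import numpy as np

# Hypothetical confusion matrix for four activities
# (rows: true class, cols: predicted class).
cm = np.array([
    [50,  0,  1,  0],   # walking
    [ 0, 48,  0,  2],   # jogging
    [ 1,  0, 45,  4],   # upstairs  <- often confused with downstairs
    [ 0,  1,  5, 43],   # downstairs
])

accuracy = np.trace(cm) / cm.sum()            # correct / total
per_class_recall = np.diag(cm) / cm.sum(axis=1)
```

The off-diagonal mass concentrated in the upstairs/downstairs cells is exactly the confusion pattern the abstract attributes to class imbalance.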
Threshold benchmarking for feature ranking techniques journalBEEI
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by importance, but there is no consensus on the cut-off number of features. It is therefore important to identify a threshold value or range for removing redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popularly used ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that ≈ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
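Both thresholds mentioned, the log2(N) lower limit and the 33-50% band, are simple functions of the feature count N. A minimal sketch (the rounding choices are assumptions, not the paper's exact procedure):

```python
import math

def ranker_threshold_range(n_features):
    """Candidate cut-offs for a ranked feature list: log2(N) as the
    lower limit, plus the 33-50% band reported in the study."""
    lower = math.ceil(math.log2(n_features))
    band = (round(0.33 * n_features), round(0.50 * n_features))
    return lower, band

lower, band = ranker_threshold_range(64)
```

For N = 64 this gives a lower limit of 6 features and a band of 21-32 features to keep after ranking.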
This paper aims to create a statistical model to estimate published relative performance (PRP), a metric used to evaluate CPU performance, using a dataset of 209 computers from 1981-1984 that includes PRP and six other attributes. The authors perform exploratory data analysis on the dataset, identify outliers, transform variables to address skewness, and create a preliminary linear regression model of log(PRP) as a function of three log-transformed and three untransformed independent variables. They then check this model for outliers using robust regression and Cook's distances to identify any outliers influencing the model.
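The core modeling step, ordinary least squares on log(PRP) with log-transformed predictors, can be sketched on synthetic data. The coefficients and predictor ranges below are invented for illustration; only the log-linear structure follows the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the CPU dataset: PRP generated from a known
# log-linear relationship in three predictors plus small noise.
n = 209
X = rng.uniform(1.0, 100.0, size=(n, 3))
log_prp = (0.5 * np.log(X[:, 0]) + 0.3 * np.log(X[:, 1])
           + 0.2 * np.log(X[:, 2]) + rng.normal(0, 0.05, n))

# Design matrix: intercept plus log-transformed predictors.
A = np.column_stack([np.ones(n), np.log(X)])
coef, *_ = np.linalg.lstsq(A, log_prp, rcond=None)
```

On real data the residuals from this fit would then be screened with robust regression and Cook's distances, as the summary describes, to flag influential outliers.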
ESTIMATING THE EFFORT OF MOBILE APPLICATION DEVELOPMENTcsandit
The rise of mobile technologies such as smartphones and tablets connected to mobile networks is changing old habits and creating new ways for society to access information and interact with computer systems. Traditional information systems are therefore undergoing a process of adaptation to this new computing context. It is important to note, however, that the characteristics of this new context are different: there are new features and hence new possibilities, as well as restrictions that did not exist before. Systems developed for this environment have different requirements and characteristics than traditional information systems. For this reason, there is a need to reassess current knowledge about the processes of planning and building systems in this new environment. One area in particular that demands such adaptation is software estimation. Estimation processes are generally based on characteristics of the systems, trying to quantify the complexity of implementing them. Hence, the main objective of this paper is to present an effort estimation model for mobile applications and to discuss the applicability of traditional estimation models to the development of systems in the context of mobile computing.
This document presents a method for estimating software change effort at the design level using flow charts. The proposed method involves designing flow charts for the original and modified programs, identifying the differences between the flow charts by counting different symbols and operators, and using a formula to calculate the estimated effort based on weights assigned to different flow chart components. An experiment was conducted on 9 programs where estimated effort values were compared to actual implementation times, finding the trends to be similar. The method provides a way to estimate effort required for software changes early in the development process based on flow chart differences.
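The estimation formula amounts to a weighted sum over the differences in symbol counts between the two flow charts. The weights and symbol taxonomy below are hypothetical placeholders, since the paper's actual values are not given here:

```python
# Hypothetical weights per flow-chart symbol type; the paper's actual
# weights and symbol taxonomy may differ.
WEIGHTS = {"process": 1.0, "decision": 2.0, "io": 1.5, "operator": 0.5}

def estimated_change_effort(original_counts, modified_counts):
    """Weighted sum of the differences in symbol counts between the
    original and modified flow charts."""
    effort = 0.0
    for symbol, weight in WEIGHTS.items():
        diff = abs(modified_counts.get(symbol, 0)
                   - original_counts.get(symbol, 0))
        effort += weight * diff
    return effort

effort = estimated_change_effort({"process": 5, "decision": 2},
                                 {"process": 6, "decision": 3, "io": 1})
```

Because the inputs are design artifacts rather than code, the estimate is available before implementation begins, which is the method's main selling point.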
This document discusses integrating IDEF3 process modeling with queuing network analysis to provide quantitative performance measures without simulation. IDEF3 captures process knowledge visually but provides no metrics. Queuing network analysis can estimate metrics like utilization and wait times but requires a different modeling view. The authors develop a framework to convert IDEF3 models to queuing networks by extracting resource information from activities. A database stores all information to facilitate conversion and analysis. Results from the queuing network analyzer are compared to simulation, finding reasonable accuracy at low system utilization. This integration allows domain experts to obtain performance insights without complex simulation modeling.
Tourism Based Hybrid Recommendation System IRJET Journal
This paper proposes a hybrid tourism recommendation system that combines collaborative filtering, content-based filtering, and aspect-based sentiment analysis to improve accuracy and address cold start problems. The system analyzes user ratings and reviews to predict ratings for other tourism packages. It stores ratings, reviews, and sentiment information in a database to enhance recommendations. Results showed the hybrid approach increased efficiency over conventional methods. Future work could include testing on additional datasets and expanding the system.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Slides from SIGIR 2013 talk. The full paper can be found here: http://staff.science.uva.nl/~mdr/Publications/sigir2013-metrics.pdf
ABSTRACT: In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions.
One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
Testing the performance of the power law process model considering the use of... IJCSEA Journal
Within the class of non-homogeneous Poisson process (NHPP) models, the Power Law Process (PLP) model has received considerable attention in the repairable-systems literature owing to the simplicity of its mathematical computations and the attractive physical interpretation of its parameters. In this article, we investigate a new estimation approach, the regression estimation procedure, for the parametric PLP model. The regression approach, which estimates the unknown parameters of the PLP model through the mean time between failures (TBF) function, is evaluated against the maximum likelihood estimation (MLE) approach. The results from the regression and MLE approaches are compared using three error evaluation criteria in terms of parameter estimation and its precision. The numerical application shows the effectiveness of the regression estimation approach in enhancing the predictive accuracy of the TBF measure.
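A regression estimate for the PLP follows from its mean function E[N(t)] = (t/θ)^β, which is linear in logs: log n = β log t − β log θ. The sketch below simulates a PLP (via the standard time transformation of a unit-rate Poisson process) and recovers β by least squares; the parameter values are assumptions for the demo, and the paper's regression on the TBF function may differ in detail:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, theta = 1.5, 10.0            # true PLP parameters (assumed)

# Simulate failure times: a PLP with mean function (t/theta)^beta is a
# time-transformed unit-rate Poisson process.
s = np.cumsum(rng.exponential(size=200))
t = theta * s ** (1.0 / beta)

# Regression estimate from log n = beta*log t - beta*log theta.
n = np.arange(1, len(t) + 1)
A = np.column_stack([np.ones_like(t), np.log(t)])
intercept, slope = np.linalg.lstsq(A, np.log(n), rcond=None)[0]
beta_hat = slope
theta_hat = np.exp(-intercept / slope)
```

MLE for the PLP has a closed form, so the practical question the article addresses is whether this regression route predicts TBF better, not whether it is easier to compute.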
Visualizing and Forecasting Stocks Using Machine Learning IRJET Journal
This document discusses using machine learning techniques like regression and LSTM models to predict stock market returns. It first provides background on the challenges of predicting the stock market due to its unpredictable nature. It then describes obtaining stock price data from Yahoo Finance to use as the dataset. The document outlines using regression analysis to build a relationship between stock prices and time and using LSTM due to its ability to learn from sequence data. It then reviews related work applying machine learning like neural networks and genetic algorithms to optimize stock prediction. The methodology section provides more detail on preprocessing the dataset and using regression and LSTM models to make predictions and compare results.
Real-time PMU Data Recovery Application Based on Singular Value Decomposition Power System Operation
Phasor measurement units (PMUs) allow for the enhancement of power system monitoring and control applications and they will prove even more crucial in the future, as the grid becomes more decentralized and subject to higher uncertainty. Tools that improve PMU data quality and facilitate data analytics workflows are thus needed. In this work, we leverage a previously described algorithm to develop a python application for PMU data recovery. Because of its intrinsic nature, PMU data can be dimensionally reduced using singular value decomposition (SVD). Moreover, the high spatio-temporal correlation can be leveraged to estimate the value of measurements that are missing due to drop-outs. These observations are at the base of the data recovery application described in this work. Extensive testing is performed to study the performance under different data drop-out scenarios, and the results show very high recovery accuracy. Additionally, the application is designed to take advantage of a high performance PMU data platform called PredictiveGrid™, developed by PingThings
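The recovery idea, exploiting low rank via SVD to fill drop-outs, can be sketched with an iterative SVD imputation on synthetic data. This is a generic hard-impute sketch under assumed dimensions and drop-out rate, not the specific algorithm the application implements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for PMU measurements: a low-rank matrix
# (strong spatio-temporal correlation) with some dropped entries.
U = rng.normal(size=(100, 3))
V = rng.normal(size=(3, 40))
data = U @ V                            # rank-3 "measurement" matrix

mask = rng.random(data.shape) > 0.1     # ~10% drop-outs (False = missing)
filled = np.where(mask, data, 0.0)

# Iterative SVD imputation: truncate to rank r, refill missing entries.
r = 3
est = filled.copy()
for _ in range(50):
    u, s, vt = np.linalg.svd(est, full_matrices=False)
    low_rank = (u[:, :r] * s[:r]) @ vt[:r]
    est = np.where(mask, data, low_rank)   # keep observed, update missing

recovery_err = np.abs(est - data)[~mask].max()
```

Because the underlying matrix is exactly low rank here, the missing entries are recovered almost perfectly, mirroring the high recovery accuracy reported in the drop-out experiments.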
Feature selection is an essential issue in machine learning: it discards unnecessary or redundant features in the dataset. This paper introduces a new feature selection method based on kernel functions, evaluated on 16 real-world datasets from the UCI data repository, with k-means clustering as the classifier using the radial basis function (RBF) and polynomial kernel functions. After sorting the features with the new method, the top 75 percent were examined and evaluated using 10-fold cross-validation, and the accuracy, F1-score, and running time were compared. The experiments show that the performance of the new feature selection method based on the RBF kernel varies with the value of the kernel parameter, unlike the polynomial kernel. Moreover, the RBF-based method has a faster running time than the polynomial kernel. The proposed method also achieves higher accuracy and F1-score, by up to a 40 percent difference on several datasets, compared to commonly used feature selection techniques such as the Fisher score, Chi-square test, and Laplacian score. This method can therefore be considered for feature selection.
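The abstract does not spell out the kernel-based scoring rule, but the two kernels it compares are standard. A minimal sketch of both (the parameter values gamma, degree, and c are assumptions):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def poly_kernel(x, y, degree=2, c=1.0):
    """Polynomial kernel: (x . y + c)^degree."""
    return (np.dot(x, y) + c) ** degree

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
k_rbf = rbf_kernel(x, y)
k_poly = poly_kernel(x, y)
```

The gamma dependence visible in `rbf_kernel` is the kernel parameter whose value, per the experiments, changes the RBF-based selector's behavior, while the polynomial kernel was reported to be insensitive to its parameter.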
Analysis of Nifty 50 index stock market trends using hybrid machine learning ...IJECEIAES
Predicting equities market trends is one of the most challenging tasks for market participants. This study aims to apply machine learning algorithms to aid in accurate Nifty 50 index trend predictions. The paper compares and contrasts four forecasting methods: artificial neural networks (ANN), support vector machines (SVM), naive Bayes (NB), and random forest (RF). Eight technical indicators are used, and a deterministic trend layer translates the indications into trend signals. The principal component analysis (PCA) method is then applied to this deterministic trend signal. This study's main contribution is using the PCA technique to find the essential components from multiple technical indicators affecting stock prices, to reduce data dimensionality and improve model performance. As a result, a PCA-machine learning (ML) hybrid forecasting model is proposed. The experimental findings suggest that the technical factors are well represented as trend signals and that the PCA approach combined with ML models outperforms the comparative models in prediction performance. Utilizing the first three principal components (percentage of explained variance = 80%), experiments on the Nifty 50 index show that a support vector classifier (SVC) with radial basis function (RBF) kernel achieves an accuracy of 0.9968 and F1-score of 0.9969, and the RF model achieves an accuracy of 0.9969 and F1-score of 0.9968. In area under the curve (AUC) performance, SVC (RBF and linear kernels) and RF have AUC scores of 1.
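The dimensionality-reduction step, keeping the fewest principal components that explain 80% of the variance, can be sketched via SVD. The data here is a synthetic stand-in for the eight indicators (correlated columns), not market data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for eight technical indicators: correlated columns so
# that a few principal components explain most of the variance.
base = rng.normal(size=(500, 3))
X = base @ rng.normal(size=(3, 8)) + 0.05 * rng.normal(size=(500, 8))

Xc = X - X.mean(axis=0)                 # center before PCA
_, s, vt = np.linalg.svd(Xc, full_matrices=False)
explained = s ** 2 / np.sum(s ** 2)     # per-component variance ratio

# Keep the fewest components reaching 80% explained variance.
k = int(np.searchsorted(np.cumsum(explained), 0.80) + 1)
scores = Xc @ vt[:k].T                  # reduced features for the ML model
```

The `scores` matrix is what would be fed to the SVC or RF classifier in place of the raw indicators.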
Test sequences for web service composition using cpn model Alexander Decker
The document discusses using a Colored Petri Net (CPN) model to generate test sequences for validating web service compositions. It proposes an algorithm that generates a test suite covering all possible paths without redundancy. The CPN model, previously used to verify the composition design, is reused for test design. The technique was applied to an airline reservation system and test sequences were evaluated against three coverage criteria.
Application Of Extreme Value Theory To Bursts Prediction CSCJournals
Bursts and extreme events in quantities such as connection durations, file sizes, throughput, etc. may produce undesirable consequences in computer networks. Deterioration in the quality of service is a major consequence. Predicting these extreme events and bursts is important: it helps in reserving the right resources for a better quality of service. We apply extreme value theory (EVT) to predict bursts in network traffic, taking a deeper look at its application through EVT-based exploratory data analysis. We found that traffic is naturally divided into two categories, internal and external. The internal traffic follows a generalized extreme value (GEV) model with a negative shape parameter, which corresponds to the Weibull distribution. The external traffic follows a GEV with a positive shape parameter, which is the Frechet distribution. These findings are of great value to the quality of service in data networks, especially when included in service level agreements as traffic descriptor parameters.
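The first step of a GEV analysis is extracting block maxima from the raw measurements; the fitted shape parameter's sign then distinguishes the Weibull (negative) from the Frechet (positive) domain. A minimal sketch with a synthetic traffic series (the block size and distribution are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def block_maxima(series, block_size):
    """Maxima of consecutive blocks -- the samples to which a GEV
    distribution is fitted in extreme value analysis."""
    n = len(series) // block_size
    return series[: n * block_size].reshape(n, block_size).max(axis=1)

# Toy stand-in for per-connection throughput measurements.
traffic = rng.exponential(scale=1.0, size=10_000)
maxima = block_maxima(traffic, block_size=100)
```

In practice `maxima` would then be fitted with a GEV routine (e.g. `scipy.stats.genextreme.fit`), and the estimated shape parameter compared against zero to classify the traffic.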
A HEURISTIC APPROACH FOR WEB-SERVICE DISCOVERY AND SELECTION ijcsit
This document proposes a new heuristic approach for web service discovery and selection using an algorithm inspired by honey bee behavior, the Bees Algorithm. The approach structures service registries by domain to simplify discovery, and uses the Bees Algorithm as an intelligent search method to efficiently find, in the least time, the optimal service matching a client's request and quality-of-service requirements from the relevant registry.
IRJET- Crowd Density Estimation using Novel Feature Descriptor IRJET Journal
This document proposes a new texture feature-based approach for crowd density estimation using Completed Local Binary Pattern (CLBP). The approach divides images into blocks and further divides blocks into cells to compute CLBP features for each cell. A multi-class Support Vector Machine (SVM) classifier is trained to classify each block into one of four crowd density categories (Very Low, Low, Medium, High). Experiments on the PETS 2009 dataset show the proposed CLBP descriptor achieves 95% accuracy and outperforms other texture descriptors for crowd density estimation.
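CLBP extends the basic local binary pattern, which thresholds each pixel's neighbours against the centre and packs the results into a code. The sketch below shows only the basic LBP building block on a single 3x3 patch, not the completed (CLBP) variant or the block/cell pipeline:

```python
import numpy as np

def lbp_code(patch):
    """Basic 3x3 local binary pattern: threshold the 8 neighbours
    against the centre pixel and pack them into one byte."""
    centre = patch[1, 1]
    # clockwise from top-left
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum((1 << i) for i, p in enumerate(neighbours) if p >= centre)

patch = np.array([[5, 9, 1],
                  [3, 4, 7],
                  [2, 6, 8]])
code = lbp_code(patch)
```

A histogram of such codes over each cell forms the texture feature vector that the multi-class SVM classifies into the four density categories.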
ACEP Magazine edition 4th launched on 05.06.2024 Rahul
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
International Conference on NLP, Artificial Intelligence, Machine Learning an... gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
More Related Content
Similar to EXPLAINING THE PERFORMANCE OF COLLABORATIVE FILTERING METHODS WITH OPTIMAL DATA CHARACTERISTICS
Human Activity Recognition Using AccelerometerDataIRJET Journal
This document discusses human activity recognition using accelerometer data from smartphones. It uses a dataset containing accelerometer readings from 36 participants performing activities like walking, jogging, sitting, and standing. The data is preprocessed and a 2D convolutional neural network is used to classify the activities. The model achieves an accuracy of 96.75% on the test data based on a confusion matrix. Some activities like upstairs and downstairs are confused, likely due to data imbalance during preprocessing. In conclusion, neural networks can effectively perform human activity recognition from smartphone sensor data.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features chosen from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank the features by their importance, but there is no consensus on the number of features at which to cut off. Thus, it becomes important to identify a threshold value or range for removing the redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popularly used ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area under the curve analysis shows that ≃ 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we also find that log2(N) as the ranker threshold value represents the lower limit of the range.
This paper aims to create a statistical model to estimate published relative performance (PRP), a metric used to evaluate CPU performance, using a dataset of 209 computers from 1981-1984 that includes PRP and six other attributes. The authors perform exploratory data analysis on the dataset, identify outliers, transform variables to address skewness, and create a preliminary linear regression model of log(PRP) as a function of three log-transformed and three untransformed independent variables. They then check this model for outliers using robust regression and Cook's distances to identify any outliers influencing the model.
ESTIMATING THE EFFORT OF MOBILE APPLICATION DEVELOPMENTcsandit
The rise of the use of mobile technologies such as smartphones and tablets connected to mobile networks is changing old habits and creating new ways for society to access information and interact with computer systems. Thus, traditional information systems are undergoing a process of adaptation to this new computing context. It is important to note, however, that the characteristics of this new context are different: there are new features and hence new possibilities, as well as restrictions that did not exist before. Systems developed for this environment therefore have different requirements and characteristics than traditional information systems, and for this reason the current knowledge about the processes of planning and building systems needs to be reassessed for this new environment. One area in particular that demands such adaptation is software estimation. Estimation processes, in general, are based on characteristics of the systems, trying to quantify the complexity of implementing them. Hence, the main objective of this paper is to present an effort estimation model for mobile applications and to discuss the applicability of traditional estimation models in the context of mobile computing.
This document presents a method for estimating software change effort at the design level using flow charts. The proposed method involves designing flow charts for the original and modified programs, identifying the differences between the flow charts by counting different symbols and operators, and using a formula to calculate the estimated effort based on weights assigned to different flow chart components. An experiment was conducted on 9 programs where estimated effort values were compared to actual implementation times, finding the trends to be similar. The method provides a way to estimate effort required for software changes early in the development process based on flow chart differences.
This document discusses integrating IDEF3 process modeling with queuing network analysis to provide quantitative performance measures without simulation. IDEF3 captures process knowledge visually but provides no metrics. Queuing network analysis can estimate metrics like utilization and wait times but requires a different modeling view. The authors develop a framework to convert IDEF3 models to queuing networks by extracting resource information from activities. A database stores all information to facilitate conversion and analysis. Results from the queuing network analyzer are compared to simulation, finding reasonable accuracy at low system utilization. This integration allows domain experts to obtain performance insights without complex simulation modeling.
Tourism Based Hybrid Recommendation SystemIRJET Journal
This paper proposes a hybrid tourism recommendation system that combines collaborative filtering, content-based filtering, and aspect-based sentiment analysis to improve accuracy and address cold start problems. The system analyzes user ratings and reviews to predict ratings for other tourism packages. It stores ratings, reviews, and sentiment information in a database to enhance recommendations. Results showed the hybrid approach increased efficiency over conventional methods. Future work could include testing on additional datasets and expanding the system.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Slides from SIGIR 2013 talk. The full paper can be found here: http://staff.science.uva.nl/~mdr/Publications/sigir2013-metrics.pdf
ABSTRACT: In recent years many models have been proposed that are aimed at predicting clicks of web search users. In addition, some information retrieval evaluation metrics have been built on top of a user model. In this paper we bring these two directions together and propose a common approach to converting any click model into an evaluation metric. We then put the resulting model-based metrics as well as traditional metrics (like DCG or Precision) into a common evaluation framework and compare them along a number of dimensions.
One of the dimensions we are particularly interested in is the agreement between offline and online experimental outcomes. It is widely believed, especially in an industrial setting, that online A/B-testing and interleaving experiments are generally better at capturing system quality than offline measurements. We show that offline metrics that are based on click models are more strongly correlated with online experimental outcomes than traditional offline metrics, especially in situations when we have incomplete relevance judgements.
Testing the performance of the power law process model considering the use of...IJCSEA Journal
Within the class of non-homogeneous Poisson process (NHPP) models, the Power Law Process (PLP) model has received considerable attention in the repairable systems literature because of the simplicity of its mathematical computations and the attractive physical explanation of its parameters. In this article, we investigate a new estimation approach, the regression estimation procedure, for the parametric PLP model. The regression approach for estimating the unknown parameters of the PLP model through the mean time between failure (TBF) function is evaluated against the maximum likelihood estimation (MLE) approach. The results from the regression and MLE approaches are compared based on three error evaluation criteria in terms of parameter estimation and its precision. The numerical application shows the effectiveness of the regression estimation approach at enhancing the predictive accuracy of the TBF measure.
Visualizing and Forecasting Stocks Using Machine LearningIRJET Journal
This document discusses using machine learning techniques like regression and LSTM models to predict stock market returns. It first provides background on the challenges of predicting the stock market due to its unpredictable nature. It then describes obtaining stock price data from Yahoo Finance to use as the dataset. The document outlines using regression analysis to build a relationship between stock prices and time and using LSTM due to its ability to learn from sequence data. It then reviews related work applying machine learning like neural networks and genetic algorithms to optimize stock prediction. The methodology section provides more detail on preprocessing the dataset and using regression and LSTM models to make predictions and compare results.
Real-time PMU Data Recovery Application Based on Singular Value DecompositionPower System Operation
Phasor measurement units (PMUs) allow for the enhancement of power system monitoring and control applications and they will prove even more crucial in the future, as the grid becomes more decentralized and subject to higher uncertainty. Tools that improve PMU data quality and facilitate data analytics workflows are thus needed. In this work, we leverage a previously described algorithm to develop a python application for PMU data recovery. Because of its intrinsic nature, PMU data can be dimensionally reduced using singular value decomposition (SVD). Moreover, the high spatio-temporal correlation can be leveraged to estimate the value of measurements that are missing due to drop-outs. These observations are at the base of the data recovery application described in this work. Extensive testing is performed to study the performance under different data drop-out scenarios, and the results show very high recovery accuracy. Additionally, the application is designed to take advantage of a high performance PMU data platform called PredictiveGrid™, developed by PingThings
Feature selection is an essential issue in machine learning. It discards the unnecessary or redundant features in the dataset. This paper introduced a new feature selection method based on kernel functions, using 16 real-world datasets from the UCI data repository; k-means clustering was utilized as the classifier with the radial basis function (RBF) and polynomial kernel functions. After sorting the features using the new feature selection, 75 percent of them were examined and evaluated using 10-fold cross-validation, and the accuracy, F1-score, and running time were compared. From the experiments, it was concluded that the performance of the new feature selection based on the RBF kernel function varied according to the value of the kernel parameter, unlike the polynomial kernel function. Moreover, the new feature selection based on RBF has a faster running time compared to the polynomial kernel function. Besides, the proposed method has higher accuracy and F1-score, by up to 40 percent in several datasets, compared to commonly used feature selection techniques such as Fisher score, the Chi-Square test, and Laplacian score. Therefore, this method can be considered for feature selection.
Analysis of Nifty 50 index stock market trends using hybrid machine learning ...IJECEIAES
Predicting equities market trends is one of the most challenging tasks for market participants. This study aims to apply machine learning algorithms to aid in accurate Nifty 50 index trend predictions. The paper compares and contrasts four forecasting methods: artificial neural networks (ANN), support vector machines (SVM), naive bayes (NB), and random forest (RF). In this study, the eight technical indicators are used, and then the deterministic trend layer is used to translate the indications into trend signals. The principal component analysis (PCA) method is then applied to this deterministic trend signal. This study's main influence is using the PCA technique to find the essential components from multiple technical indicators affecting stock prices to reduce data dimensionality and improve model performance. As a result, a PCA-machine learning (ML) hybrid forecasting model was proposed. The experimental findings suggest that the technical factors are signified as trend signals and that the PCA approach combined with ML models outperforms the comparative models in prediction performance. Utilizing the first three principal components (percentage of explained variance=80%), experiments on the Nifty 50 index show that support vector classifier (SVC) with radial basis function (RBF) kernel achieves good accuracy of (0.9968) and F1-score (0.9969), and the RF model achieves an accuracy of (0.9969) and F1-Score (0.9968). In area under the curve (AUC) performance, SVC (RBF and Linear kernels) and RF have AUC scores of 1.
Test sequences for web service composition using cpn modelAlexander Decker
The document discusses using a Colored Petri Net (CPN) model to generate test sequences for validating web service compositions. It proposes an algorithm that generates a test suite covering all possible paths without redundancy. The CPN model, previously used to verify the composition design, is reused for test design. The technique was applied to an airline reservation system and test sequences were evaluated against three coverage criteria.
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3) is a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities. Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because the interconnection of these networks makes them vulnerable to a variety of cyberattacks. To address this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion detection in smart grids, combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). We employed a recent intrusion detection dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to train and test our model. Our experiments show that the CNN-LSTM method detects smart grid intrusions much better than other deep learning classification algorithms, improving accuracy, precision, recall, and F1 score and achieving a high detection accuracy rate of 99.50%.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSIJNSA Journal
The smart irrigation system represents an innovative approach to optimize water usage in agricultural and landscaping practices. The integration of cutting-edge technologies, including sensors, actuators, and data analysis, empowers this system to provide accurate monitoring and control of irrigation processes by leveraging real-time environmental conditions. The main objective of a smart irrigation system is to optimize water efficiency, minimize expenses, and foster the adoption of sustainable water management methods. This paper conducts a systematic risk assessment by exploring the key components/assets and their functionalities in the smart irrigation system. The crucial role of sensors in gathering data on soil moisture, weather patterns, and plant well-being is emphasized in this system. These sensors enable intelligent decision-making in irrigation scheduling and water distribution, leading to enhanced water efficiency and sustainable water management practices. Actuators enable automated control of irrigation devices, ensuring precise and targeted water delivery to plants. Additionally, the paper addresses the potential threat and vulnerabilities associated with smart irrigation systems. It discusses limitations of the system, such as power constraints and computational capabilities, and calculates the potential security risks. The paper suggests possible risk treatment methods for effective secure system operation. In conclusion, the paper emphasizes the significant benefits of implementing smart irrigation systems, including improved water conservation, increased crop yield, and reduced environmental impact. Additionally, based on the security analysis conducted, the paper recommends the implementation of countermeasures and security approaches to address vulnerabilities and ensure the integrity and reliability of the system. 
By incorporating these measures, smart irrigation technology can revolutionize water management practices in agriculture, promoting sustainability, resource efficiency, and safeguarding against potential security threats.
EXPLAINING THE PERFORMANCE OF COLLABORATIVE FILTERING METHODS WITH OPTIMAL DATA CHARACTERISTICS
International Journal on Cybernetics & Informatics (IJCI) Vol. 12, No. 2, April 2023
David C. Wyld et al. (Eds): DBML, CITE, IOTBC, EDUPAN, NLPAI - 2023
pp. 287-295, 2023. IJCI - 2023. DOI: 10.5121/ijci.2023.120221
EXPLAINING THE PERFORMANCE OF COLLABORATIVE FILTERING METHODS WITH OPTIMAL DATA CHARACTERISTICS

Samin Poudel and Marwan Bikdash
Department of Computational Data Science and Engineering, North Carolina A & T University, Greensboro, USA
ABSTRACT
The performance of a Collaborative Filtering (CF) method depends on the properties of the User-Item Rating Matrix (URM), and these properties, or Rating Data Characteristics (RDC), change constantly. Recent studies explained the variation in the performance of CF methods caused by changes in the URM using six or more RDC. Here, we find that a significant proportion of the variation in the performance of different CF techniques can be attributed to only two RDC: the number of ratings per user, or Information per User (IpU), and the number of ratings per item, or Information per Item (IpI). Moreover, for a square URM, the performance of CF algorithms is quadratic in IpU (or IpI). The findings of this study are based on seven well-established CF methods and three popular public recommender datasets: 1M MovieLens, 25M MovieLens, and the Yahoo! Music Rating dataset.
KEYWORDS
Rating Matrix, Recommendation System, Collaborative Filtering, Data Characteristics
1. INTRODUCTION
Collaborative Filtering (CF) recommendation systems are widely used in domains such as e-commerce, retail, social media, and banking. CF methods predict unknown ratings from the known ratings stored in a User-Item Rating Matrix (URM) [1]. Thus, the performance of a CF method relies on the Rating Data Characteristics (RDC) of the URM [2]–[4]. In other words, the properties of the URM can explain the variation in the performance of a CF method [5]–[7]. The RDC mainly reflect the structure of the URM, its rating frequency, and its rating distribution [8], [9]. The RDC of different recommendation domains can differ greatly [10]. Moreover, the RDC of a recommendation dataset change rapidly because of the exponential growth in data driven by the development of the Internet and web-based applications [11].
New CF methods are regularly developed in pursuit of techniques that outperform existing ones, but few works study how the performance of CF techniques varies with RDC. In [12], the authors showed that density impacts the performance of CF algorithms. The authors of [13], [14], and [15] showed that the accuracy of CF techniques improves with an increasing number of users, items, and density.
In [8], the authors studied in detail the impact of six data characteristics on the performance of three CF algorithms. They showed that these six data characteristics can significantly explain the variation in the performance of CF techniques for some datasets, but not for all.
The authors of [5] studied the effect of six data characteristics on the robustness of three CF algorithms, showing that around 80 percent of the variation in the effectiveness of shilling attacks on the three CF models can be explained by the six RDC. In [9], the combined effect of about ten RDC on the accuracy and fairness of several CF recommendation models was studied; the ten RDC explained the variation in performance in terms of accuracy better than in terms of fairness.
The available studies suggest that around six or more data characteristics are required to explain a significant proportion of the variation in the performance of CF models caused by changes in the properties of a recommendation dataset. Authors are thus tempted to use more and more data characteristics so that the variation in CF performance across different User-Item Rating Matrices (URMs) can be explained better. We found no studies that seek the optimal (minimum) number of data characteristics capable of explaining a significant amount of the variation in the performance of CF algorithms as the URM changes. Therefore, in this study we seek the minimum number of RDC that can explain the variation in the performance of seven CF models.
The study here is based on three well-known datasets (1M MovieLens [16], [17], 25M MovieLens
[17], [18], and the Yahoo! Music dataset [19]) and seven well-established CF approaches:
1. Regularized Singular Value Decomposition, denoted SVD [12], [20]
2. SVD with bias terms, denoted SVD_b [21], [22]
3. Non-negative Matrix Factorization (NMF) [23]
4. Slope One [24]
5. Co-Clustering [25]
6. User-based Nearest Neighbor (UNN) [5], [26]
7. Item-based Nearest Neighbor (INN) [5], [26]
2. METHODOLOGY
Let 𝑅 denote the User-Item Rating Matrix (URM), with 𝑚 rows, corresponding to the number of
users, and 𝑛 columns, corresponding to the number of items.
Let 𝑁𝑟 denote the total number of ratings, i.e., nonzero elements, in a rating matrix. If 𝑁𝑟 is
considered the amount of information in a URM, then the number of ratings per user can be
defined as the Information per User (IpU). Mathematically,

IpU = 𝑁𝑟 / 𝑚 … (1)

Similarly, the number of ratings per item can be defined as the Information per Item (IpI).
Mathematically,

IpI = 𝑁𝑟 / 𝑛 … (2)
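Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not
the paper's code; it assumes the URM is stored as a dense NumPy array in which a zero entry
denotes a missing rating.

```python
import numpy as np

def ipu_ipi(R):
    """Compute Information per User (Eq. 1) and Information per Item (Eq. 2)
    for a URM R, where a zero entry denotes a missing rating."""
    Nr = np.count_nonzero(R)   # total number of ratings
    m, n = R.shape             # m users, n items
    return Nr / m, Nr / n

# A tiny 3-user x 4-item URM with 6 ratings:
R = np.array([[5, 0, 3, 0],
              [0, 4, 0, 2],
              [1, 0, 5, 0]])
ipu, ipi = ipu_ipi(R)
print(ipu, ipi)  # 2.0 ratings per user, 1.5 ratings per item
```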
A CF application first splits a rating matrix 𝑅 into training and test data, learns a model
from the training data, and then evaluates the learned model on the test data. In this study, we
implemented the CF algorithms using the SURPRISE Python library [21]. SURPRISE is an
open-source Python module for building and testing recommender systems with explicit rating
data.
There are various predictive accuracy metrics, classification accuracy metrics, ranking accuracy
metrics, and others for evaluating the performance of a CF technique [27]–[29]. No universal RS
exists that outperforms all other methods on every evaluation metric [30], [31]. However, the
most popular and commonly used metric for prediction evaluation of a CF RS is the predictive
accuracy metric Root Mean Squared Error (RMSE) [12], [32]. In this work, we use RMSE as the
error measure of CF estimates and define the performance measure of a CF method as the inverse
of the error measure:

Performance (𝑃) = 1 / RMSE … (3)

The higher the RMSE of a CF method's rating predictions, the lower the performance 𝑃, and vice
versa.
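Eq. (3) can be sketched directly; the held-out ratings and predictions below are hypothetical
values, used only to show the RMSE-to-performance conversion.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Squared Error between held-out ratings and CF predictions."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def performance(actual, predicted):
    """Performance P = 1 / RMSE, per Eq. (3)."""
    return 1.0 / rmse(actual, predicted)

# Hypothetical test-set ratings and CF predictions:
y_true = [4.0, 3.0, 5.0, 2.0]
y_pred = [3.5, 3.0, 4.5, 2.5]
print(rmse(y_true, y_pred))         # ≈ 0.433
print(performance(y_true, y_pred))  # ≈ 2.31
```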
2.1. Overview of the Methodology
Here, we sample around 3000 User-Item Rating Matrices (URMs) randomly from each of 1M MovieLens
[16], [17], 25M MovieLens [17], [18], and the Yahoo! Music dataset [19]. The number of rows
(users) in the subsampled URMs ranges from 300 to 8000, and the number of columns (items)
likewise ranges from 300 to 8000. The applied random sampling procedure is similar to the one
described in [8]. The sampled rating matrices vary widely in most of the URM characteristics
mentioned in [9]. We then apply the seven CF methods to each sampled URM and compute the
performance measure of Eq. (3). After that, we plot the performance of the CF methods versus
Information per User (IpU) at constant Information per Item (IpI), and similarly versus IpI at
constant IpU. Finally, we perform a regression analysis of CF performance versus IpU and IpI and
discuss the results.
The URMs extracted from 1M MovieLens have ratings on a discrete numerical scale from 1.0 to 5.0
in steps of 1.0. The URMs extracted from 25M MovieLens have ratings from 0.5 to 5.0 in steps of
0.5. Those extracted from the Yahoo! Music rating data have a discrete scale from 1.0 to 100.0.
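The random subsampling step can be sketched as drawing user rows and item columns without
replacement. This is only an illustration of the idea (the actual procedure follows [8]); the
matrix sizes here are tiny stand-ins for the paper's 300 to 8000 range.

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample_urm(R, n_users, n_items, rng):
    """Randomly subsample a URM by drawing user rows and item columns
    without replacement (a sketch of the sampling idea in [8])."""
    rows = rng.choice(R.shape[0], size=n_users, replace=False)
    cols = rng.choice(R.shape[1], size=n_items, replace=False)
    return R[np.ix_(rows, cols)]

# Toy full URM (zeros denote missing ratings); the paper samples
# 300 to 8000 users/items, here we use a tiny example:
R_full = rng.integers(0, 6, size=(20, 30))
R_sub = subsample_urm(R_full, n_users=5, n_items=8, rng=rng)
print(R_sub.shape)  # (5, 8)
```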
3. PLOTS OF CF PERFORMANCE VERSUS RATING DATA CHARACTERISTICS
During the analysis of CF performance versus different rating data characteristics, we found
that CF performance is linear in log(IpI) at constant IpU, as shown in Figure 1. Similarly, CF
performance is linear in log(IpU) at constant IpI, as shown in Figure 2. Therefore, we can
define a CF performance regression model as:

𝑃 = 𝑎₀ + 𝑎₁ log(IpU) + 𝑎₂ log(IpI) + 𝑎₃ log(IpU) log(IpI) … (4)
Figure 1. CF performance versus log(IpI) at constant IpU of 58.55
The plots in Figures 1 and 2 suggest that the two RDC, Information per User and Information per
Item, can explain a significant proportion of the variation in the performance of all seven CF
methods. To substantiate this insight, we perform the regression analysis based on Eq. (4) in
the next section.
Figure 2. CF performance versus log(IpU) at constant IpI of 55.55
3.1. Regression Analysis of CF Performance versus Rating Data Characteristics
In this section, we perform the regression analysis of CF performance based on Eq. (4). The
results of the regression analysis, along with the adjusted 𝑅² values that measure the explained
variation, are tabulated in Tables 1, 2, and 3. Adjusted 𝑅² ranges from a minimum of 0 to a
maximum of 1; the higher its value, the greater the capability of the independent variables to
explain the variation in the dependent variable of a regression model. All the coefficients and
constants reported in Tables 1, 2, and 3 had p-values less than 0.01, which makes the regression
analysis results statistically significant.
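The fit of Eq. (4) and its adjusted 𝑅² can be sketched with ordinary least squares. The data and
coefficients below are synthetic assumptions used only to demonstrate the procedure; they are
not the paper's fitted values.

```python
import numpy as np

def fit_eq4(ipu, ipi, p):
    """Fit P = a0 + a1*log(IpU) + a2*log(IpI) + a3*log(IpU)*log(IpI)
    by ordinary least squares; return (coefficients, adjusted R^2)."""
    lu, li = np.log(ipu), np.log(ipi)
    X = np.column_stack([np.ones_like(lu), lu, li, lu * li])
    coef, *_ = np.linalg.lstsq(X, p, rcond=None)
    resid = p - X @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((p - p.mean()) ** 2)
    n, k = len(p), X.shape[1] - 1          # observations, predictors
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return coef, adj_r2

# Synthetic performance data generated from assumed coefficients:
rng = np.random.default_rng(1)
ipu = rng.uniform(10, 200, 500)
ipi = rng.uniform(10, 200, 500)
p = (0.9 + 0.03 * np.log(ipu) + 0.02 * np.log(ipi)
     - 0.004 * np.log(ipu) * np.log(ipi) + rng.normal(0, 0.002, 500))
coef, adj_r2 = fit_eq4(ipu, ipi, p)
print(np.round(coef, 3), round(adj_r2, 2))
```

OLS recovers coefficients close to the assumed values, and the adjusted 𝑅² is high because the
synthetic noise is small.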
Table 1 shows the results of the regression analysis using Eq. (4) on the URMs sampled from the
25M MovieLens data. The results show that IpU and IpI have significant explanatory power (70 to
95 percent) for the variance in the performance of all the CF models except Slope-One and NMF.
Table 1. Regression Analysis based on Eq. (4) using 25M MovieLens data.
CF Method    | log(IpU) coeff. | log(IpI) coeff. | log(IpU)·log(IpI) coeff. | Constant | Adjusted R²
UNN          | 0.0121          | -0.0017         | 0.0019                   | 0.9843   | 0.82
INN          | -0.0902         | -0.0318         | 0.0151                   | 1.33     | 0.89
SVD          | 0.0283          | 0.118           | 0.0040                   | 0.9205   | 0.94
SVD_b        | 0.0149          | -0.0024         | 0.0067                   | 0.9891   | 0.95
CoClustering | 0.0452          | 0.0361          | -0.0044                  | 0.8561   | 0.71
Slope-One    | 0.0390          | 0.0310          | -0.0045                  | 0.9577   | 0.51
NMF          | 0.0558          | 0.0442          | -0.0066                  | 0.8461   | 0.57
Table 2 shows the results of the regression analysis using the model in Eq. (4) on the URMs
sampled from the 1M MovieLens data. The results show that IpU and IpI have significant
explanatory power (70 to 92 percent) for the variance in the performance of all seven CF models.
Table 2. Regression Analysis based on Eq. (4) using 1M MovieLens data.
CF Method    | log(IpU) coeff. | log(IpI) coeff. | log(IpU)·log(IpI) coeff. | Constant | Adjusted R²
UNN          | 0.0341          | 0.0245          | -0.0028                  | 0.8153   | 0.82
INN          | -0.0195         | 0.0072          | 0.0057                   | 0.8961   | 0.84
SVD          | 0.1592          | 0.1288          | -0.0199                  | 0.1831   | 0.92
SVD_b        | -0.0166         | -0.0195         | 0.0088                   | 1.0808   | 0.90
CoClustering | 0.0196          | 0.0056          | 0.0031                   | 0.8889   | 0.87
Slope-One    | 0.0398          | 0.0283          | -0.0038                  | 0.8631   | 0.74
NMF          | 0.0584          | 0.0376          | -0.0042                  | 0.7179   | 0.88
Table 3 shows the results of the regression analysis using the model in Eq. (4) on the URMs
sampled from the Yahoo! Music rating data. The results show that IpU and IpI have significant
explanatory power (70 to 93 percent) for the variance in the performance of all the CF models
except SVD_b.
Table 3. Regression Analysis based on Eq. (4) using Yahoo Music Rating data.
CF Method    | log(IpU) coeff. | log(IpI) coeff. | log(IpU)·log(IpI) coeff. | Constant | Adjusted R²
UNN          | 0.0006          | 0.0007          | 0                        | 0.0316   | 0.90
INN          | -0.0034         | -0.0023         | 0.008                    | 0.0523   | 0.93
SVD          | 0.0023          | -0.0022         | -0.0006                  | 0.0355   | 0.71
SVD_b        | 0.0015          | 0.0015          | -0.004                   | 0.0336   | 0.35
CoClustering | 0.0016          | 0.0018          | -0.0002                  | 0.0303   | 0.75
Slope-One    | 0.0010          | 0.0011          | -0.0001                  | 0.0347   | 0.70
NMF          | 0.0004          | 0.0004          | -0.0001                  | 0.0127   | 0.79
From the regression analysis results, it is clear that the variation in the performance of
different CF models resulting from changes in the URM can be explained significantly by IpU and
IpI alone. The results of this study, which involves only two RDC, are comparable to those of
[8], which used six RDC, for the memory-based CF methods (UNN and INN), and are much better than
the results in [8] for the matrix factorization based methods (SVD, NMF).
3.2. Discussion of CF Performance versus Rating Data Characteristics
The regression analysis and evaluation presented in Section 3.1 establish Eq. (4) as a
statistically significant model for explaining the variation in CF performance. Eq. (4) is
restated below:

𝑃 = 𝑎₀ + 𝑎₁ log(IpU) + 𝑎₂ log(IpI) + 𝑎₃ log(IpU) log(IpI) … (5)

The model in Eq. (5) has significant power to explain the variability in the performance of CF
methods based on only two RDC, as opposed to the work found in the literature, which uses six or
more RDC [5], [8], [9].
The partial derivative of Eq. (5) with respect to IpU gives:

∂𝑃/∂IpU = (1/IpU)(𝑎₁ + 𝑎₃ log(IpI)) = 𝑐₁/IpU … (6)

which shows that the rate of change of CF performance 𝑃 with respect to IpU is inversely
proportional to the information per user IpU at constant IpI. Similarly, the partial derivative
of Eq. (5) with respect to IpI gives:

∂𝑃/∂IpI = (1/IpI)(𝑎₂ + 𝑎₃ log(IpU)) = 𝑐₂/IpI … (7)

which shows that the rate of change of CF performance 𝑃 with respect to IpI is inversely
proportional to the information per item IpI at constant IpU.
For a square URM, IpU = IpI. Then the model in Eq. (5) can be written as

𝑃 = 𝑎₀ + (𝑎₁ + 𝑎₂) log(IpU) + 𝑎₃ (log(IpU))²
  = 𝑎₀ + (𝑎₁ + 𝑎₂) log(IpI) + 𝑎₃ (log(IpI))² … (8)
Hence, from the model in Eq. (8), we can state that the performance of a CF algorithm is
quadratic in log(IpU) (equivalently, log(IpI)) for a square URM.
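The reduction from Eq. (5) to the quadratic form of Eq. (8) can be verified numerically by
setting IpU = IpI and comparing the two expressions. The coefficients below are hypothetical.

```python
import numpy as np

# Hypothetical coefficients for Eq. (5):
a0, a1, a2, a3 = 0.9, 0.03, 0.02, -0.004

def P_eq5(ipu, ipi):
    """The general two-RDC model of Eq. (5)."""
    return a0 + a1 * np.log(ipu) + a2 * np.log(ipi) + a3 * np.log(ipu) * np.log(ipi)

def P_eq8(x):
    """Eq. (8): quadratic in log(IpU) for a square URM (IpU = IpI = x)."""
    lx = np.log(x)
    return a0 + (a1 + a2) * lx + a3 * lx ** 2

x = np.linspace(5, 500, 100)
assert np.allclose(P_eq5(x, x), P_eq8(x))  # the two forms coincide
print("Eq. (5) with IpU = IpI matches the quadratic form of Eq. (8)")
```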
4. CONCLUSION AND FUTURE WORK
In this study, we examined the capability of two Rating Data Characteristics (RDC) to explain
the variation in the performance of CF methods. The two RDC are the number of ratings per user,
or Information per User (IpU), and the number of ratings per item, or Information per Item
(IpI). Our study showed that the performance of the seven CF methods considered varied linearly
with log(IpU) at constant IpI, and likewise linearly with log(IpI) at constant IpU. Extensive
simulation and regression analysis disclosed that a significant proportion of the variation in
the performance of the seven CF methods resulting from changes in the URMs can be explained by
IpU and IpI alone. The results of this study are based on the 1M MovieLens, 25M MovieLens, and
Yahoo! Music rating datasets.
Our study is based on seven well-established CF methods and three public recommendation
datasets. The results have not been tested for machine learning algorithms beyond CF
recommendation. The conclusions of this study are empirical; the theoretical foundation behind
the results still needs to be explored, and the applicability of the results to User-Item Rating
Matrices from other recommender domains remains to be tested.
REFERENCES
[1] A. Pawlicka, M. Pawlicki, R. Kozik, and R. S. Choraś, “A Systematic Review of Recommender
Systems and Their Applications in Cybersecurity,” Sensors (Basel)., vol. 21, no. 15, Aug. 2021, doi:
10.3390/S21155248.
[2] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: A survey of
the state-of-the-art and possible extensions,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 6, pp. 734–
749, Jun. 2005, doi: 10.1109/TKDE.2005.99.
[3] J. L. Herlocker, J. A. Konstan, and J. Riedl, “Explaining collaborative filtering recommendations,”
Proc. ACM Conf. Comput. Support. Coop. Work, pp. 241–250, 2000, doi: 10.1145/358916.358995.
[4] Z. Huang and D. D. Zeng, “Why does collaborative filtering work? transaction-based
recommendation model validation and selection by analyzing bipartite random graphs,” INFORMS J.
Comput., vol. 23, no. 1, pp. 138–152, Dec. 2011, doi: 10.1287/IJOC.1100.0385.
[5] Y. Deldjoo, T. Di Noia, E. Di Sciascio, and F. A. Merra, “How Dataset Characteristics
Affect the Robustness of Collaborative Recommendation Models,” in Proceedings of the 43rd
International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020.
Accessed: Jun. 12, 2021. [Online]. Available:
[6] V. W. Anelli, T. Di Noia, E. Di Sciascio, C. Pomo, and A. Ragone, “On the discriminative power of
Hyper-parameters in Cross-Validation and how to choose them,” RecSys 2019 - 13th ACM Conf.
Recomm. Syst., pp. 447–451, Sep. 2019, doi: 10.1145/3298689.3347010.
[7] S. Poudel and M. Bikdash, “Closed-Form Models of Accuracy Loss due to Subsampling in SVD
Collaborative Filtering,” Big Data Min. Anal., vol. 6, no. 1, pp. 72–84, Mar. 2023, doi:
10.26599/BDMA.2022.9020024.
[8] G. Adomavicius and J. Zhang, “Impact of data characteristics on recommender systems
performance,” ACM Trans. Manag. Inf. Syst., vol. 3, no. 1, pp. 1–17, Apr. 2012, doi:
10.1145/2151163.2151166.
[9] Y. Deldjoo, A. Bellogin, and T. Di Noia, “Explaining recommender systems fairness and accuracy
through the lens of data characteristics,” Inf. Process. Manag., vol. 58, no. 5, p. 102662, Sep. 2021,
doi: 10.1016/J.IPM.2021.102662.
[10] S. Poudel and M. Bikdash, “Optimal Dependence of Performance and Efficiency of Collaborative
Filtering on Random Stratified Subsampling,” Big Data Min. Anal., vol. 5, no. 3, pp. 192–205, Sep.
2022, doi: 10.26599/BDMA.2021.9020032.
[11] R. Chen, Q. Hua, Y. S. Chang, B. Wang, L. Zhang, and X. Kong, “A survey of collaborative filtering-
based recommender systems: from traditional methods to hybrid methods based on social networks,”
IEEE Access, vol. 6, pp. 64301–64320, 2018, doi: 10.1109/ACCESS.2018.2877208.
[12] F. Cacheda, V. Carneiro, D. Fernández, and V. Formoso, “Comparison of collaborative filtering
algorithms: Limitations of current techniques and proposals for scalable, high-performance
recommender systems,” ACM Trans. Web, vol. 5, no. 1, pp. 1–33, Feb. 2011, doi:
10.1145/1921591.1921593.
[13] J. Lee, M. Sun, and G. Lebanon, “A Comparative Study of Collaborative Filtering Algorithms,” May
2012, Accessed: May 10, 2020. [Online]. Available: http://arxiv.org/abs/1205.3193.
[14] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation
algorithms,” in Proceedings of the 10th International Conference on World Wide Web, WWW 2001,
Apr. 2001, pp. 285–295, doi: 10.1145/371920.372071.
[15] V. H. Vegeborn, H. Rahmani, K. Skolan, F. Datavetenskap, and O. Kommunikation, “Comparison
and Improvement Of Collaborative Filtering Algorithms,” 2017.
[16] GroupLens, “MovieLens 1M Dataset | GroupLens,” 2019.
https://grouplens.org/datasets/movielens/1m/ (accessed May 18, 2020).
[17] F. M. Harper and J. A. Konstan, “The movielens datasets: History and context,” ACM Trans. Interact.
Intell. Syst., vol. 5, no. 4, Dec. 2015, doi: 10.1145/2827872.
[18] GroupLens, “MovieLens 25M Dataset,” 2019. https://grouplens.org/datasets/movielens/25m/
(accessed Jun. 13, 2021).
[19] Yahoo! Music Dataset, “Webscope | Yahoo Labs,” 2019.
https://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=1 (accessed Jun. 13, 2021).
[20] J. H. Wilkinson, C. Reinsch, G. H. Golub, and C. Reinsch, “Singular Value Decomposition and Least
Squares Solutions,” in Linear Algebra, Springer Berlin Heidelberg, 1971, pp. 134–151.
[21] N. Hug, “Surprise: A Python library for recommender systems,” J. Open Source Softw., vol. 5, no.
52, p. 2174, Aug. 2020, doi: 10.21105/joss.02174.
[22] N. Hug, “Matrix Factorization-based algorithms — Surprise 1 documentation.”
https://surprise.readthedocs.io/en/stable/matrix_factorization.html (accessed Jan. 04, 2022).
[23] X. Luo, M. Zhou, Y. Xia, and Q. Zhu, “An efficient non-negative matrix-factorization-based
approach to collaborative filtering for recommender systems,” IEEE Trans. Ind. Informatics, vol. 10,
no. 2, pp. 1273–1284, 2014, doi: 10.1109/TII.2014.2308433.
[24] D. Lemire and A. Maclachlan, “Slope One Predictors for Online Rating-Based Collaborative
Filtering,” Feb. 2007, Accessed: May 17, 2020. [Online]. Available: http://arxiv.org/abs/cs/0702144.
[25] T. George and S. Merugu, “A scalable collaborative filtering framework based on co-clustering,”
Proc. - IEEE Int. Conf. Data Mining, ICDM, pp. 4–7, 2005, doi: 10.1109/ICDM.2005.14.
[26] Y. Koren, R. Bell, and C. Volinsky, “Matrix Factorization Techniques for Recommender
Systems,” IEEE Comput., vol. 42, no. 8, pp. 30–37, 2009, Accessed: Mar. 22, 2021. [Online].
Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.8295.
[27] G. Shani and A. Gunawardana, “Evaluating Recommendation Systems,” Recomm. Syst. Handb., pp.
257–297, 2011, doi: 10.1007/978-0-387-85820-3_8.
[28] G. Schröder, M. Thiele, and W. Lehner, “Setting Goals and Choosing Metrics for Recommender
System Evaluations,” in UCERSTI2 workshop at the 5th ACM conference on recommender systems,
Chicago, USA, 2011, vol. 23, p. 53.
[29] S. Poudel, “A Study of Disease Diagnosis Using Machine Learning,” Med. Sci. Forum, vol. 10,
no. 1, p. 8, Feb. 2022, doi: 10.3390/IECH2022-12311.
[30] M. Jalili, S. Ahmadian, M. Izadi, P. Moradi, and M. Salehi, “Evaluating Collaborative Filtering
Recommender Algorithms: A Survey,” IEEE Access, vol. 6, pp. 74003–74024, 2018, doi:
10.1109/ACCESS.2018.2883742.
[31] S. Poudel and M. Bikdash, “Collaborative Filtering system based on multi-level user clustering and
aspect sentiment,” Data Inf. Manag., p. 100021, Nov. 2022, doi: 10.1016/J.DIM.2022.100021.
[32] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)?-
Arguments against avoiding RMSE in the literature,” Geosci. Model Dev, vol. 7, pp. 1247–1250,
2014, doi: 10.5194/gmd-7-1247-2014.
AUTHORS
Samin Poudel received his PhD in Computational Data Science and Engineering from North Carolina
A&T State University in 2022. He completed his undergraduate studies in Nepal. His research
interests include, but are not limited to, data analytics, data mining, machine learning, model
development, and data-driven optimization techniques.
Marwan Bikdash received the MEng and PhD degrees in electrical engineering from Virginia Tech in
1990 and 1993, respectively. He is currently a professor and the chair of the Department of
Computational Data Science and Engineering, North Carolina A&T State University. He teaches and
conducts research in signals and systems, computational intelligence, and modeling and
simulation of systems with applications in health, energy, and engineering. He has authored over
150 journal and conference papers and has supported, advised, and graduated over 50 master's and
PhD students. His projects have been funded by the Jet Propulsion Laboratory, the Defense Threat
Reduction Agency, the Army Research Lab, NASA, the National Science Foundation, the Office of
Naval Research, Boeing Inc., Hewlett Packard, the National Renewable Energy Laboratories, the
Army Construction Engineering Research Laboratory, and others.