This document compares collaborative filtering algorithms with various similarity measures for movie recommendations. It summarizes User-based and Item-based collaborative filtering algorithms implemented in the Apache Mahout framework. Various similarity measures used in collaborative filtering are discussed, including Euclidean distance, Log Likelihood Ratio, Pearson correlation, Tanimoto coefficient, Uncentered Cosine, and Spearman correlation. The document concludes that Item-based algorithms typically provide better results than User-based algorithms for movie recommendations.
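As a rough illustration of four of the similarity measures named above, the following Python sketch uses common textbook formulations. It is not Mahout's implementation; the function names, the `1 / (1 + distance)` conversion from Euclidean distance to a similarity, and the zero-variance fallback are assumptions made for this example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / sqrt(var_a * var_b)


def uncentered_cosine(a, b):
    """Cosine of the angle between two rating vectors, without mean-centering."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def euclidean_similarity(a, b):
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0."""
    distance = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def tanimoto(items_a, items_b):
    """Tanimoto (Jaccard) coefficient on the sets of items two users rated."""
    set_a, set_b = set(items_a), set(items_b)
    union = len(set_a | set_b)
    return len(set_a & set_b) / union if union else 0.0
```

Pearson, uncentered cosine, and Euclidean similarity operate on the ratings two users gave to the same items, while the Tanimoto coefficient ignores rating values and only compares which items were rated.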
Recommender systems make suggestions according
to user preferences. The number of books and other items in a
university-sized library is enormous and larger than ever, so
readers find it extremely difficult to locate their favorite books.
Even when a reader finds a first preferred book, finding
another book similar to it is like finding a nail in the
ocean, because the second preferred book might sit at the far
end of the long tail. A recommender system is therefore often a
requirement in a library, to make finding similar books
easier. Recommender systems have become fundamental
applications in electronic commerce and information retrieval,
providing suggestions that effectively prune large information
spaces so that users are directed toward the items that best
meet their needs and preferences. A variety of techniques have
been suggested for performing recommendation, including the
collaborative technique and three of its methods: Slope
One, used for rating prediction; Pearson's correlation, used for
finding the similarity between users; and item-to-item
similarity. To improve performance, these
methods have sometimes been combined in a hybrid
recommendation technique.
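A minimal sketch of Slope One rating prediction, assuming the commonly described weighted variant (the abstract does not say which variant the paper uses). The `ratings` layout, a dictionary mapping each user to an item-to-rating dictionary, and the function name are hypothetical choices for illustration.

```python
def slope_one_predict(ratings, user, target):
    """Predict `user`'s rating for `target` with weighted Slope One.

    ratings: dict mapping user -> {item: rating} (assumed layout).
    Returns None when no co-rated item pair supports a prediction.
    """
    numerator = 0.0
    denominator = 0
    for other_item, rating in ratings[user].items():
        if other_item == target:
            continue
        # Average difference (target - other_item) over users who rated both.
        diff_sum = 0.0
        count = 0
        for prefs in ratings.values():
            if target in prefs and other_item in prefs:
                diff_sum += prefs[target] - prefs[other_item]
                count += 1
        if count:
            # Weight each per-item estimate by the number of supporting users.
            numerator += (rating + diff_sum / count) * count
            denominator += count
    return numerator / denominator if denominator else None


ratings = {
    "alice": {"A": 5, "B": 3, "C": 2},
    "bob": {"A": 3, "B": 4},
    "carol": {"B": 2, "C": 5},
}
print(slope_one_predict(ratings, "carol", "A"))
```

Each item the user has already rated contributes one estimate of the target rating, shifted by the average rating difference between the two items; estimates backed by more co-rating users count for more.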
Recommendation based on Clustering and Association Rules - IJARIIE JOURNAL
Recommender systems play an important role in filtering and customizing the desired information.
Recommender systems are divided into three categories, i.e. collaborative filtering, content-based filtering, and hybrid
filtering, which are the most widely adopted techniques in recommender systems. The paper mainly
describes the issues of recommendation systems. Its main aim is to recommend suitable items to
the user, which requires better rule extraction; association mining is applied for that purpose.
A clustering method is also applied to group the data by similar
characteristics. The proposed methods try to eliminate certain problems such as sparsity and cold start; to
overcome them, association mining over clustering is used.
Investigation and application of Personalizing Recommender Systems based on A... - Eswar Publications
To aid in the decision-making process, recommender systems use the available data on the items themselves. Personalized recommender systems subsequently use this input data, and convert it to an output in the form of ordered lists or scores of items in which a user might be interested. These lists or scores are the final result the user will be presented with, and their goal is to assist the user in the decision-making process. The application of recommender systems outlined was just a small introduction to the possibilities of the extension. Recommender systems became essential in an information- and decision-overloaded world. They changed the way users make decisions, and helped their creators to increase revenue at the same time.
A Proposal on Social Tagging Systems Using Tensor Reduction and Controlling R... - ijcsa
A Social Tagging System (STS) is a system in which users express their interests by tagging particular items. STSs are associated with Web 2.0 and hold a rich source of information for users and their recommendations. The different types of recommendations they provide are modeled by a 3-order tensor, on which multiway latent semantic analysis and dimensionality reduction are performed using both the Higher Order Singular Value Decomposition (HOSVD) method and the KernelSVD smoothing technique. We now propose a 4-order tensor approach, which we name Tensor Reduction. Here the items that are tagged can be viewed by the users who are recommended the same item and have tagged it. This can improve the efficiency of social tagging recommendations and also control unwanted requests. The results show significant improvements in terms of effectiveness.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Search and communication are the most popular uses of the computer. Not surprisingly, many
people in companies and universities are trying to improve search by coming up with easier and faster ways to
find the right information. Underlying this whole business model is a problem of text classification: classifying
the intention that users express through a query in order to provide relevant results, in terms of both organic search results
and sponsored links. Inspired by stock market machine learning systems, a text classification tool is proposed. It
combines several classic text classification techniques and selects the one that offers the best
results according to an established machine learning criterion.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering Science and Technology, including new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected for publication through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Improved Slicing Algorithm For Greater Utility In Privacy Preserving Data Pub... - Waqas Tariq
Several algorithms and techniques have been proposed in recent years for the publication of sensitive microdata. However, there is a trade-off to be considered between the level of privacy offered and the usefulness of the published data. Recently, slicing was proposed as a novel technique for increasing the utility of an anonymized published dataset by partitioning the dataset vertically and horizontally. This work proposes a novel technique to increase the utility of a sliced dataset even further by allowing overlapped clustering while maintaining the prevention of membership disclosure. It is further shown that using an alternative algorithm to Mondrian increases the efficiency of slicing. This paper shows through workload experiments that these improvements help preserve data utility better than traditional slicing.
An Efficient Annotation of Search Results Based on Feature Ranking Approach f... - Computer Science Journals
With the growing number of web databases, a major part of the deep web is backed by databases. In several search engines, the encoded data in the result pages returned from the web often comes from structured databases, which are referred to as Web databases (WDB).
Vertical intent prediction approach based on Doc2vec and convolutional neural... - IJECEIAES
Vertical selection is the task of selecting the most relevant verticals for a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals; these verticals must also be the ones the user expects to be relevant for his or her particular information need. Most existing works have focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy is still a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user's browsing time and effort. First, it generates a query embedding vector using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, this vector is used as input to a convolutional neural network model, which enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation using various datasets. Our experimental findings show that our system achieves significant accuracy. Further, it makes accurate predictions on new unseen data.
Extraction of Data Using Comparable Entity Mining - iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Data Mining in Multi-Instance and Multi-Represented Objects - ijsrd.com
In multi-instance learning, the training set comprises labeled bags that are composed of unlabeled instances, and the task is to predict the labels of unseen bags. In this part, a web mining problem, i.e. web index recommendation, is investigated from a multi-instance view. In detail, each web index page is regarded as a bag, while each of its linked pages is regarded as an instance. A user favoring an index page means that he or she is interested in at least one page linked by the index
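The bag-level rule described here (a bag counts as positive when at least one of its instances is positive) can be sketched in Python as follows. The per-page interest scores and the `threshold` value are hypothetical; the abstract does not say how individual pages are scored.

```python
def bag_label(instance_scores, threshold=0.5):
    """Label a bag (an index page) positive if any instance (a linked page)
    scores at or above the threshold.

    instance_scores: iterable of per-page interest scores (hypothetical).
    threshold: assumed cut-off for calling a single page interesting.
    """
    return any(score >= threshold for score in instance_scores)


# One index page with three linked pages; a single interesting page is enough.
print(bag_label([0.1, 0.9, 0.2]))
```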
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING - ijcsit
In today's world of the internet, with a whole lot of e-documents such as HTML pages, digital libraries, etc. occupying a considerable amount of cyberspace, organizing these documents has become a practical need. Clustering is an important technique that organizes a large number of objects into smaller coherent groups. This helps in the efficient and effective use of these documents for information retrieval and other NLP tasks. Email is one of the e-documents most frequently used by individuals and organizations. Email categorization is one of the major tasks of email mining. Categorizing emails into different groups helps easy retrieval and maintenance. Like other e-documents, emails can also be classified using clustering algorithms. In this paper a similarity measure called Similarity Measure for Text Processing is suggested for email clustering.
The suggested similarity measure takes into account three situations: a feature appears in both emails, a feature appears in only one email, and a feature appears in neither email. The potency of the suggested similarity measure is analyzed on the Enron email data set to categorize emails. The outcome indicates that the efficiency achieved by the suggested similarity measure is better than that achieved by other measures.
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Prediction of Reaction towards Textual Posts in Social Networks - Mohamed El-Geish
Posting on social networks can be a gratifying or a terrifying experience, depending on the reaction that the post and, by association, its author receive from readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn, and finds a predictor function that can estimate those quantitative social gestures.
International Journal of Computational Engineering Research (IJCER) - ijceronline
International Journal of Computational Engineering Research (IJCER) is an international online journal published monthly in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
Annotation Approach for Document with Recommendation - ijmpict
An enormous number of organizations generate and share textual descriptions of their products, facilities, and activities. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. Although information extraction systems simplify the extraction of structured associations, they are frequently expensive and inaccurate, particularly when working on top of text that does not contain any examples of the targeted structured data. We propose an alternative methodology that simplifies the generation of structured metadata by identifying documents that are likely to contain information of interest; this data will be beneficial for querying the database. Moreover, we design algorithms to extract attribute-value pairs, and similarly devise new mechanisms to map such pairs to manually created schemas. We apply a clustering technique to the item content information to complement the user rating information, which improves the correctness of collaborative similarity and addresses the cold start problem.
An effective search on web log from most popular downloaded content - ijdpsjournal
A Web page recommender system effectively predicts the best related web page to search. While searching
for a word with a search engine, the engine may display some unnecessary links and unrelated data to the user. To avoid
this problem, the conceptual prediction model combines both web usage and domain knowledge. The
proposed conceptual prediction model automatically generates a semantic network of the semantic Web
usage knowledge, which is the integration of domain knowledge and web usage information. Web usage
mining aims to discover interesting and frequent user access patterns from web browsing data. The
discovered knowledge can then be used for many practical web applications such as web
recommendations, adaptive web sites, and personalized web search and surfing.
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015 - Journal For Research
Recommendation systems play an important role on the Internet and are used in many applications. The Internet has brought together many applications, created a global village, and driven the growth of vast amounts of information. This paper presents an overview of the approaches and techniques used in recommendation systems. Recommendation systems are categorized into three classes: collaborative filtering, content-based, and hybrid approaches. This paper classifies collaborative filtering into two types: memory-based and model-based recommendation. The paper elaborates on these approaches and their techniques, along with their limitations. The result of our system provides much better recommendations to users because it enables the users to understand the relation between their emotional states and the recommended movies.
Costomization of recommendation system using collaborative filtering algorith... - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Hybrid Personalized Recommender System Using Modified Fuzzy C-Means Clusterin... - Waqas Tariq
Recommender Systems apply machine learning and data mining techniques for filtering unseen information and can predict whether a user would like a given resource. This paper proposes a novel Modified Fuzzy C-means (MFCM) clustering algorithm, which is used for a Hybrid Personalized Recommender System (MFCMHPRS). The proposed system works in two phases. In the first phase, opinions from the users are collected in the form of a user-item rating matrix. They are clustered offline using MFCM into a predetermined number of clusters and stored in a database for future recommendation. In the second phase, the recommendations are generated online for active users using similarity measures, by choosing the clusters with good quality ratings. We propose a coefficient parameter for the similarity computation when weighting the users' similarity. This helps to further improve the effectiveness and quality of recommendations for the active users. The experimental results using the Iris dataset show that the proposed MFCM performs better than the Fuzzy C-means (FCM) algorithm. The performance of MFCMHPRS is evaluated using the Jester database, available on the website of the University of California, Berkeley, and compared with a fuzzy recommender system (FRS). The results obtained empirically demonstrate that the proposed MFCMHPRS performs better.
Text document clustering and similarity detection are a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on their similarity, documents are grouped into clusters. Several approaches to document similarity calculation have been proposed in existing systems, but these systems are either term based or pattern based, and they suffer from several problems. To address this, the proposed system presents an innovative model for document similarity that applies a back propagation time stamp algorithm (BPTT). It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weight, and BPTT performs the document similarity measurement. Using this approach, documents can be categorized easily, and it helps to reduce problems in the training process. BPTT has been implemented and evaluated on the .NET platform with different datasets.
This paper presents a review & performs a comparative evaluation of few known machine learning
algorithms in terms of their suitability & code performance on any given data set of any size. In this paper,
we describe our Machine Learning ToolBox that we have built using python programming language. The
algorithms used in the toolbox consists of supervised classification algorithms such as Naïve Bayes,
Decision Trees, SVM, K-nearest Neighbors and Neural Network (Backpropagation). The algorithms are
tested on iris and diabetes dataset and are compared on the basis of their accuracy under different
conditions. However using our tool one can apply any of the implemented ML algorithms on any dataset of
any size. The main goal of building a toolbox is to provide users with a platform to test their datasets on
different Machine Learning algorithms and use the accuracy results to determine which algorithms fits the
data best. The toolbox allows the user to choose a dataset of his/her choice either in structured or
unstructured form and then can choose the features he/she wants to use for training the machine We have
given our concluding remarks on the performance of implemented algorithms based on experimental
analysis
“ Vertical Image Search Engine” is IEEE project ppt. The basic working principle of the image search engine can be helpful to you in building up final year project.
Framework for Product Recommandation for Review Datasetrahulmonikasharma
In the social networking era, product reviews have a significant influence on the purchase decisions of customers so the market has recognized this problem The problem with this is that the customers do not know how these systems work which results in trust issues. Therefore a different system is needed that helps customers with their need to process the information in product reviews. There are different approaches and algorithms of data filtering and recommendation .Most existing recommender systems were developed for commercial domains with millions of users. In this paper we have discussed the recommendation system and its related research and implemented different techniques of the recommender system .
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Editor IJAIEM
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
In unpredictable increase in mobile apps, more and more threats migrate from outmoded PC client to mobile device. Compared
with traditional windows Intel alliance in PC, Android alliance dominates in Mobile Internet, the apps replace the PC client
software as the foremost target of hateful usage. In this paper, to improve the confidence status of recent mobile apps, we
propose a methodology to estimate mobile apps based on cloud computing platform and data mining. Compared with
traditional method, such as permission pattern based method, combines the dynamic and static analysis methods to
comprehensively evaluate an Android applications The Internet of Things (IoT) indicates a worldwide network of
interconnected items uniquely addressable, via standard communication protocols. Accordingly, preparing us for the
forthcoming invasion of things, a tool called data fusion can be used to manipulate and manage such data in order to improve
progression efficiency and provide advanced intelligence. In this paper, we propose an efficient multidimensional fusion
algorithm for IoT data based on partitioning. Finally, the attribute reduction and rule extraction methods are used to obtain the
synthesis results. By means of proving a few theorems and simulation, the correctness and effectiveness of this algorithm is
illustrated. This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for
big data. These classifiers are very hefty, but are quite easy to generate and use. They can be so large that it makes sense to use
them only for big data. Our experiments compare LIME classifiers with various vile classifiers and standard ordinary ensemble
Meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of
classifications. LIME classifiers made better than the base classifiers and standard ensemble Meta classifiers.
Keywords: LIME classifiers, ensemble Meta classifiers, Internet of Things, Big data
Similar to COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEASURES FOR MOVIE RECOMMENDATION (20)
COMPARISON OF COLLABORATIVE FILTERING ALGORITHMS WITH VARIOUS SIMILARITY MEASURES FOR MOVIE RECOMMENDATION
International Journal of Computer Science, Engineering and Applications (IJCSEA) Vol.6, No.3, June 2016
DOI: 10.5121/ijcsea.2016.6301
Taner Arsan, Efecan Köksal, Zeki Bozkuş
Department of Computer Engineering, Kadir Has University, Istanbul, Turkey
ABSTRACT
Collaborative Filtering is widely used in recommender systems. The amount of data on the web is
growing enormously, and recommender systems help users select the products on the web that are
most suitable for them. Collaborative filtering systems collect users' previous information about items
such as movies, music, ideas, and so on. There are many algorithms, based on different approaches,
for recommending the best item. The best-known algorithms are User-based and Item-based
algorithms. Experiments show that Item-based algorithms give better results than User-based algorithms.
The aim of this paper is to compare User-based and Item-based Collaborative Filtering Algorithms
under many different similarity measures in terms of accuracy and performance. We provide an
approach to determine the algorithm that gives the most accurate recommendations by using statistical
accuracy metrics. The User-based and Item-based algorithms are compared on a movie recommendation
data set.
KEYWORDS
Collaborative Filtering, Recommendation Systems, User-based Algorithms, Item-based Algorithms
1. INTRODUCTION
Collaborative Filtering (CF) has become the most popular method for reducing information overload.
Collaborative filtering works by building a database of the preferences of users for items. The
approach has had significant success on the Internet, and most big companies use CF. The idea
behind this paper is to select the right information for the right user from the given database.
Automated collaborative filtering systems aim to find users who have the same tastes or
information needs for a specific purpose. To build the database, users share information or
preferences with the system so that the system can make better choices for other users. To
achieve this, users should give their feedback truthfully [1, 2].
In this paper, the database of the Collaborative Filtering System includes the data of users and
movies, as shown in Table 1.
Table 1. A Collaborative Filtering System predicts the missing rate in the User-Item matrix. Here the
prediction is for Nathan's rate for Titanic.
Star Wars Hoop Dreams Contact Titanic
Joe 5 2 5 4
John 2 5 3
Al 2 2 4 2
Nathan 5 1 5 ?
Collaborative filtering algorithms are divided into two kinds of recommender systems:
User-based recommender systems and Item-based recommender systems, as shown in Figure 1 and
Figure 2 respectively.
Figure 1. Collaborative filtering Systems applied in User-based Recommender System.
Figure 2. Collaborative filtering Systems applied in Item-based Recommender System.
2. FRAMEWORK
2.1. Introduction to Apache Mahout
In this paper, Apache Mahout is used as the implementation framework, which allows developers
to build robust and scalable recommenders. It is an open-source machine learning
library. Mahout began in 2008 as a subproject of the Apache Lucene project. Lucene
principally provides content search and information retrieval technologies. When the
collection of data is very large, Mahout aims to be the first-choice library for
collaborative filtering. Mahout is written in the Java programming language. Mahout does not supply a
user interface or an installer. After the coding part, it is the developer's job to build the interfaces
for the algorithm. The Mahout library includes many recommender systems. This study also
discusses how Mahout implements User-based Recommender Systems and Item-based
Recommender Systems [3, 4].
2.1.1. Further Subsections
To prepare the input for this paper, the data sets first need to be converted to a csv
file. This file contains the User ID, the Item ID and the given preference (rate).
IDs in Mahout are always integers, and a larger preference value means a stronger
positive preference. In the MovieLens data sets, these preferences are integers between 1 and 5.
After the data file is converted to a csv file, the first column holds the user ID, the second
column the item ID and the last column the rates [4].
2.1.2 Recommender Input File, Intro Csv
A csv file stores values separated by commas. To make the layout clear, Table 2 shows which
column holds which field [3, 4].

Table 2. Information included in the csv file.
User ID   Movie ID   Rates
1         102        3
2         35         2
2         75         5
91        102        3
101       54         3
101       102        4

The following code shows how u.data can be converted to a csv file in Java:

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    public class MovieDataConvert {
        public static void main(String[] args) throws IOException {
            // try-with-resources closes both streams even if an exception is thrown
            try (BufferedReader br = new BufferedReader(new FileReader("data/u.data"));
                 BufferedWriter bw = new BufferedWriter(new FileWriter("data/movies.csv"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    System.out.println(line); // echo each input line
                    // u.data is tab-separated: user id, item id, rating, timestamp
                    String[] values = line.split("\t", -1);
                    // keep the first three fields and drop the timestamp
                    bw.write(values[0] + "," + values[1] + "," + values[2] + "\n");
                }
            }
        }
    }

FileReader: Creates a new FileReader, given the name of the file to read from.
FileWriter: Constructs a FileWriter object given a file name.
BufferedReader: Creates a buffering character-input stream that uses a default-sized input buffer.
BufferedWriter: Writes text to a character-output stream, buffering characters so as to provide for
the efficient writing of single characters, arrays, and strings.
readLine: Reads a line of text. A line is considered to be terminated by any one of a line feed
('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.
2.1.3 Creating a Recommender
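A short piece of code is given as an example of how to create recommendations for users [3, 4].

    import java.io.File;
    import java.util.List;

    import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
    import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
    import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
    import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
    import org.apache.mahout.cf.taste.recommender.RecommendedItem;
    import org.apache.mahout.cf.taste.recommender.Recommender;
    import org.apache.mahout.cf.taste.similarity.UserSimilarity;

    class UserBasedPearsonCorrelationSimilarity {
        public static void main(String[] args) throws Exception {
            // Load the data file
            DataModel model = new FileDataModel(new File("data/movies.csv"));

            // Create the recommender engine
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(50, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);

            // MovieItemConvert is the authors' helper (not shown here) that maps item IDs to titles;
            // the lookup is not used below, so it is left commented out.
            // HashMap<String, String> getMovieNameById = MovieItemConvert.getGetMovieNameById();

            // For user 2, recommend 5 items
            List<RecommendedItem> recommendations = recommender.recommend(2, 5);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        }
    }

With DataModel the program can reach all preferences, that is, the user and item data and the rates.
With UserSimilarity the program can find how similar two users are.
With UserNeighborhood the program can find the users most similar to the selected user.
With Recommender the program can recommend items to the users.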
2.1.4 Analyzing the Output
When the developer runs the code, the output should look like:
RecommendedItem[Item:106, value:4.96451]
RecommendedItem[Item:205, value:4.36231]
RecommendedItem[Item:100, value:4.26752]
RecommendedItem[Item:12, value:4.13121]
RecommendedItem[Item:502, value:4.01531]
The components of User-based recommendation in Mahout are given in Figure 3, which also
shows how the components interact.
Figure 3. Interaction of components (Application, Recommender, UserSimilarity, UserNeighborhood, DataModel) in Mahout User-based recommendation.
3. SIMILARITY MEASURES
Recommender systems use many similarity metrics that come from machine learning, and these
metrics are important for recommender systems. Each similarity metric is related to vector space
methods, but similarity can be defined in various ways. The metrics can be categorized as
measuring either distance or degree. There are different techniques for computing the
similarity between users. Since each similarity has a different formula, the measures give
different values from each other. Some similarity computation techniques are explained in the
following subsections [5].
In Collaborative Filtering Systems, the common element is establishing the similarity
between users or between items. The Mahout library implements many similarity algorithms and
lets developers plug them into Collaborative Filtering Recommender
Systems in order to identify similar neighborhoods for users or to compute similarities
between items. The similarity algorithms implemented in Mahout are:
1. Euclidean Distance Similarity
2. Log Likelihood Ratio Similarity
3. Pearson Correlation Coefficient Similarity
4. Tanimoto Coefficient Similarity
5. Uncentered Cosine Similarity
6. Spearman Correlation Coefficient Similarity
3.1 Euclidean Distance Similarity
In the code, assigning a EuclideanDistanceSimilarity(model) to the UserSimilarity selects
this method. The method is based on the distance between users.
Each user is treated as a point in a space with one dimension per item, and the coordinates are
the rates the user gave to each item. The metric computes the Euclidean distance d between two such
users. The distance is smaller when the users are more similar, so the method returns 1/(1+d) as the
similarity. The similarity is never negative, and a larger value means that the users are more
similar [3].
The equation is given in (1) as

$$d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2} = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{1}$$
Table 3. Similarities between user 1 and the other users.
          Item 1   Item 2   Item 3   Distance   Similarity to User 1
User 1    5.0      3.0      2.5      0.000      1.000
User 2    2.0      2.5      5.0      3.937      0.203
User 3    2.5      -        -        2.500      0.286
User 4    5.0      -        3.0      0.500      0.667
User 5    4.0      3.0      2.0      1.118      0.472
In the code, assigning a EuclideanDistanceSimilarity(model) to an ItemSimilarity applies the
same method to items.
In contrast to the user-to-user comparison shown in Table 3, this variant compares the rates item
by item rather than user by user. Item similarity tends to give better results because user-based
similarity is affected by the user's mood, and a user's tastes can change over time. Item similarities
are more stable and therefore better suited to precomputation, which speeds up computation at runtime.
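
The conversion from distance to similarity described in this subsection can be sketched in plain Java, independent of Mahout. This is a minimal sketch that follows the 1/(1+d) formula from the text and is not necessarily how Mahout implements it. The class and method names are hypothetical, the rating vectors are assumed values, and a missing rate is represented by NaN and skipped.

```java
class EuclideanSimilaritySketch {
    /** Euclidean distance over the items both users have rated (NaN marks a missing rate). */
    static double distance(double[] x, double[] y) {
        double sumSquares = 0.0;
        for (int i = 0; i < x.length; i++) {
            if (Double.isNaN(x[i]) || Double.isNaN(y[i])) {
                continue; // skip items that are not co-rated
            }
            double diff = x[i] - y[i];
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares);
    }

    /** Maps a distance d >= 0 to a similarity in (0, 1] via 1 / (1 + d). */
    static double similarity(double[] x, double[] y) {
        return 1.0 / (1.0 + distance(x, y));
    }

    public static void main(String[] args) {
        double[] userA = {5.0, 3.0, 2.5}; // assumed rates
        double[] userB = {4.0, 3.0, 2.0}; // assumed rates
        System.out.println("distance   = " + distance(userA, userB));
        System.out.println("similarity = " + similarity(userA, userB));
    }
}
```

For the two assumed vectors the distance is the square root of 1.25, about 1.118, which gives a similarity of about 0.472. Identical vectors give a distance of 0 and therefore the maximum similarity of 1.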
3.2 Log Likelihood Similarity
The Log Likelihood Ratio Similarity is based on the log-likelihood ratio test that Ted Dunning
described in the paper "Accurate Methods for the Statistics of Surprise and Coincidence". Log
Likelihood similarity is similar to Tanimoto similarity, but it is more complex to understand. It is
hard to explain without the mathematics, and it does not take individual preference values into
account. The value expresses how unlikely it is for two users to have so much overlap by chance, and it
is based on the total number of items and the number of items for which each user has a preference.
Two dissimilar users will still have some items in common by chance, but two similar users will
show an overlap that is unlikely to be due to chance. For example, if
two users have 4 preferences in common, but have each entered only 10 preferences into the data
model, they will be considered more similar than two users who have 4 preferences in common
but have each entered over 50 preferences into the data model [3]. Table 4 shows the similarity
between users according to the Log Likelihood Similarity Measurement.
Table 4. Example for Log Likelihood Similarity Measurement.
Item1 Item2 Item3 Item4 Item5 Item6 Item7 Similarity to User 1
User 1 X X X 0.90
User 2 X X X X 0.84
User 3 0.55
User 4 X X X X 0.16
User 5 X X X X X X 0.55
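
Since the text notes that this measure is hard to follow without the mathematics, the following plain-Java sketch shows one common way to compute a log-likelihood ratio from a 2x2 table of counts, using the entropy formulation. The class name, the method names and the example counts are hypothetical, and the final mapping of the ratio to a value below 1 is an assumption of this sketch rather than something stated in the paper.

```java
class LogLikelihoodSketch {
    /** x * ln(x), with the convention 0 * ln(0) = 0. */
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a list of counts: xLogX(total) minus the sum of xLogX(count). */
    static double entropy(long... counts) {
        long sum = 0;
        double parts = 0.0;
        for (long c : counts) {
            parts += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - parts;
    }

    /**
     * Log-likelihood ratio of a 2x2 contingency table:
     * k11 = items both users prefer, k12 = items only the first user prefers,
     * k21 = items only the second user prefers, k22 = items neither prefers.
     */
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // clamp tiny negative values caused by floating-point rounding
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    /** Maps the ratio into [0, 1); this mapping is an assumption of the sketch. */
    static double similarity(long k11, long k12, long k21, long k22) {
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }

    public static void main(String[] args) {
        // assumed counts: 7 items in total, 3 shared, 0 only user A, 1 only user B, 3 neither
        System.out.println(similarity(3, 0, 1, 3));
    }
}
```

When the overlap is exactly what chance would predict, for example when all four counts are equal, the ratio is 0 and so is the similarity. The more surprising the overlap, the closer the similarity gets to 1.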
3.3 Pearson Correlation Similarity
It is used for computing the similarity between two users or items by measuring the tendency of two
series of preferences to move together in a proportional and linear manner. It considers the
preferences on which the users or items overlap. It finds each user's or item's deviations from the
average rate while identifying the linear relationship between two items or users.
$$PC(w,u) = \frac{\sum_{i} (r_{w,i} - \bar{r}_w)(r_{u,i} - \bar{r}_u)}{\sqrt{\sum_{i} (r_{w,i} - \bar{r}_w)^2 \sum_{i} (r_{u,i} - \bar{r}_u)^2}} \tag{2}$$

w and u denote the two users or items for which the coefficient is calculated, i is an item, $r_{w,i}$ and
$r_{u,i}$ are the individual ratings from w and u for i, and $\bar{r}_w$ and $\bar{r}_u$ are the average ratings for user (or
item) w and u [3, 4]. Table 5 shows the Pearson Correlation Similarity of user1 and the others based
on the three items in common.
Table 5. Pearson Correlation Similarity.
          Item1   Item2   Item3   Correlation with user1
User 1    5.0     3.0     2.5     1.000
User 2    2.0     2.5     5.0     -0.764
User 3    2.5     -       -       -
User 4    5.0     -       3.0     1.000
User 5    4.0     3.0     2.0     0.945
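
Equation (2) can be sketched in plain Java as follows. The class and method names are hypothetical and the rating vectors are assumed values. The sketch also assumes that the averages are taken over the co-rated items only and that a missing rate is marked with NaN. When a user has no variation over the co-rated items, for example with a single common item, the coefficient is undefined and NaN is returned.

```java
class PearsonSketch {
    /** True when both users have a rate for item i (NaN marks a missing rate). */
    static boolean coRated(double[] w, double[] u, int i) {
        return !Double.isNaN(w[i]) && !Double.isNaN(u[i]);
    }

    /** Pearson correlation of w and u over their co-rated items. */
    static double correlation(double[] w, double[] u) {
        int n = 0;
        double sumW = 0.0;
        double sumU = 0.0;
        for (int i = 0; i < w.length; i++) {
            if (coRated(w, u, i)) {
                sumW += w[i];
                sumU += u[i];
                n++;
            }
        }
        if (n == 0) {
            return Double.NaN; // nothing in common
        }
        double meanW = sumW / n;
        double meanU = sumU / n;
        double numerator = 0.0;
        double squaresW = 0.0;
        double squaresU = 0.0;
        for (int i = 0; i < w.length; i++) {
            if (coRated(w, u, i)) {
                double dw = w[i] - meanW; // deviation from the average rate
                double du = u[i] - meanU;
                numerator += dw * du;
                squaresW += dw * dw;
                squaresU += du * du;
            }
        }
        double denominator = Math.sqrt(squaresW * squaresU);
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }

    public static void main(String[] args) {
        double[] userA = {5.0, 3.0, 2.5}; // assumed rates
        double[] userB = {2.0, 2.5, 5.0}; // assumed rates
        System.out.println(correlation(userA, userB)); // negative: the tastes run in opposite directions
    }
}
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and NaN signals that the data are insufficient to decide.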
3.4 Tanimoto Coefficient Similarity
As shown in Figure 4 and Table 6, this is a similarity that ignores the preference values, so it
does not focus on the value that the user gave to the item. It only checks whether the user expressed
a preference or not. It is also known as the Jaccard coefficient. It is computed as the number of items
for which both users expressed a preference, divided by the number of items for which either user
expressed a preference. When the users have no preference in common, the result is zero. The
similarity value cannot be greater than one [4]. The equation for Tanimoto Coefficient Similarity is
given in (3):
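$$T(a, b) = \frac{a \cdot b}{|a|^2 + |b|^2 - a \cdot b} = \frac{\sum_{i=1}^{n} a_i b_i}{\sum_{i=1}^{n} a_i^2 + \sum_{i=1}^{n} b_i^2 - \sum_{i=1}^{n} a_i b_i} \tag{3}$$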
Figure 4. The Tanimoto coefficient is the ratio of the intersection, that is, the items for which both users
expressed a preference, to the union of the items preferred by either user.
Table 6. Similarity values between User 1 and the other users, calculated with the Tanimoto
Coefficient similarity.
Item 1 Item 2 Item 3 Item 4 Item 5 Item 6 Item 7 Similarity to User 1
User 1 √ √ √ 1.0
User 2 √ √ √ √ 0.75
User 3 √ √ √ √ 0.17
User 4 √ √ √ √ 0.4
User 5 √ √ √ √ √ √ 0.5
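
Because the Tanimoto coefficient only looks at which items a user has expressed a preference for, it can be sketched with plain Java sets. The class name, the method name and the item ID sets below are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TanimotoSketch {
    /** Size of the intersection divided by the size of the union of two item sets. */
    static double similarity(Set<Integer> a, Set<Integer> b) {
        Set<Integer> intersection = new HashSet<>(a);
        intersection.retainAll(b); // items both users expressed a preference for
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }

    public static void main(String[] args) {
        Set<Integer> userA = new HashSet<>(Arrays.asList(1, 2, 3));    // assumed item IDs
        Set<Integer> userB = new HashSet<>(Arrays.asList(1, 2, 3, 4)); // assumed item IDs
        Set<Integer> userC = new HashSet<>(Arrays.asList(1, 4, 5, 7)); // assumed item IDs
        System.out.println(similarity(userA, userB)); // 3 shared items out of 4 in the union
        System.out.println(similarity(userA, userC)); // 1 shared item out of 6 in the union
    }
}
```

With binary preference vectors, the dot product in equation (3) counts the shared items and each squared norm counts one user's items, so the set form above computes the same value.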
3.5 Uncentered Cosine Similarity
It is a similarity that measures the cosine of the angle between two preference vectors in the
coordinate system. The result ranges from -1 to 1. This similarity does not center the data, that is, it
does not shift each user's preference values so that their mean is 0. Because the preference values
are not adjusted in this way, it is called uncentered cosine similarity. The equation for Uncentered
Cosine Similarity is given in equation (4):
$$\cos(\theta) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}} \tag{4}$$
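
A minimal plain-Java sketch of the cosine computation is given here. The class name, the method name and the example vectors are hypothetical, and the sketch assumes both vectors contain a rate for every item.

```java
class UncenteredCosineSketch {
    /** Cosine of the angle between a and b, computed without subtracting any mean. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double squaresA = 0.0;
        double squaresB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            squaresA += a[i] * a[i];
            squaresB += b[i] * b[i];
        }
        double denominator = Math.sqrt(squaresA) * Math.sqrt(squaresB);
        return denominator == 0.0 ? Double.NaN : dot / denominator; // undefined for a zero vector
    }

    public static void main(String[] args) {
        double[] userA = {5.0, 3.0, 2.0}; // assumed rates
        double[] userB = {4.0, 3.0, 2.0}; // assumed rates
        System.out.println(cosine(userA, userB));
    }
}
```

Vectors that point in the same direction give 1, orthogonal vectors give 0 and opposite vectors give -1. When all rates are positive, as on a 1 to 5 scale, the result cannot be negative, which is one practical effect of not centering the data.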
3.6 Spearman Correlation Similarity
As shown in Table 7, Spearman Correlation Similarity is similar to Pearson Correlation Similarity,
but it uses ranks instead of preference values. For each user, the preference values are ordered
from the least preferred to the most preferred, and each value is then replaced by its rank,
starting from 1. Computing the Pearson correlation on these ranks gives the
Spearman Correlation Similarity. This similarity is better suited to smaller data sets because
computing and storing the ranks takes a long time [5, 6, 7]. The equation for Spearman Correlation
Similarity is given in equation (5):
$$\rho(w, u) = \frac{\sum_{i=1}^{n} \left(rank(w,i) - \overline{rank_w}\right) \cdot \left(rank(u,i) - \overline{rank_u}\right)}{\sigma_w \cdot \sigma_u} \tag{5}$$
Table 7. After changing the values of the preferences into ranks, the results are found by using (5).
Item 101 Item 102 Item 103 Correlation to User 1
User 1 3.0 2.0 1.0 1.0
User 2 1.0 2.0 3.0 -1.0
User 3 1.0 - - -
User 4 2.0 - 1.0 1.0
User 5 3.0 2.0 1.0 1.0
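
The two steps described in this subsection, ranking the preference values and then applying the Pearson correlation to the ranks, can be sketched in plain Java. The class and method names are hypothetical and the input vectors are assumed values. The sketch also assumes that both users rated every item and that no user gave the same rate twice, so tie handling is left out.

```java
import java.util.Arrays;

class SpearmanSketch {
    /** Replaces each value by its rank, 1 for the least preferred; assumes no ties. */
    static double[] ranks(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Arrays.binarySearch(sorted, values[i]) + 1;
        }
        return result;
    }

    /** Plain Pearson correlation of two equally long series. */
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(Double.NaN);
        double meanY = Arrays.stream(y).average().orElse(Double.NaN);
        double numerator = 0.0;
        double squaresX = 0.0;
        double squaresY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            numerator += dx * dy;
            squaresX += dx * dx;
            squaresY += dy * dy;
        }
        double denominator = Math.sqrt(squaresX * squaresY);
        return denominator == 0.0 ? Double.NaN : numerator / denominator;
    }

    /** Spearman correlation: Pearson correlation computed on the ranks. */
    static double correlation(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }

    public static void main(String[] args) {
        double[] userA = {5.0, 3.0, 2.5}; // assumed rates
        double[] userB = {2.0, 2.5, 5.0}; // assumed rates
        System.out.println(Arrays.toString(ranks(userA)));
        System.out.println(correlation(userA, userB)); // opposite orderings
    }
}
```

Because only the ordering survives the ranking step, two users who order the items identically get a correlation of 1 even if their actual rates differ, and exactly opposite orderings give -1.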
4. IMPLEMENTATION
The Item-based Collaborative Filtering Algorithm is chosen for this part of the paper. The
Adjusted Cosine Similarity method is chosen to recommend items to the user.
4.1 Adjusted Cosine Similarity
The difference between User-Based and Item-Based Collaborative Filtering is that the user-based
approach uses the rows of the user-item rating matrix for the similarity measurement, whereas the
item-based approach uses the columns. The basic cosine similarity computation has an important
disadvantage: differences in rating scale between users are ignored. Adjusted Cosine Similarity
overcomes this by subtracting the corresponding user's mean rating from each co-rated pair. The
equation for Adjusted Cosine Similarity is given in equation (6):
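$$Sim(i, j) = \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}{\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^2} \, \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^2}} \qquad (6)$$
This equation gives the similarity between items i and j, where U is the set of users who rated
both items, $R_{u,i}$ is the rating of user u for item i, and $\bar{R}_u$ is the average of the ratings of the
u-th user.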
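A minimal Python sketch of equation (6) follows. The rating data and the function name are hypothetical, and each user's mean is taken over all of that user's ratings, which is an assumption about how the mean is defined.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "u1": {"i": 5.0, "j": 3.0, "k": 1.0},
    "u2": {"i": 4.0, "j": 2.0, "k": 3.0},
    "u3": {"i": 1.0, "j": 5.0},
}

def adjusted_cosine(ratings, item_i, item_j):
    """Adjusted cosine similarity of two items over the users who rated both."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            # Subtract this user's mean rating from both co-rated values.
            mean = sum(user_ratings.values()) / len(user_ratings)
            dev_i = user_ratings[item_i] - mean
            dev_j = user_ratings[item_j] - mean
            num += dev_i * dev_j
            den_i += dev_i * dev_i
            den_j += dev_j * dev_j
    if den_i == 0 or den_j == 0:
        return None  # undefined without variation around the user means
    return num / (math.sqrt(den_i) * math.sqrt(den_j))

print(adjusted_cosine(ratings, "i", "j"))
```

The loop runs over users because, for item-based filtering, each item is a column of the rating matrix and its entries come from different users with different rating scales.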
4.2 Prediction Computation
The most important part of a collaborative filtering system is producing the recommendation. The
similar items found with Adjusted Cosine Similarity are used to predict the target user's ratings,
and items are finally recommended to the user on that basis.
4.2.1 Weighted Sum
This method computes the prediction for item i for user u by summing the ratings given by user u
to the items similar to i. Each rating is weighted by the adjusted cosine similarity $s_{i,j}$ between
items i and j. The Weighted Sum equation is
$$P_{u,i} = \frac{\sum_{j \in N} (s_{i,j} \cdot R_{u,j})}{\sum_{j \in N} |s_{i,j}|} \qquad (7)$$
where N is the set of all items similar to i. In effect, this equation captures how the active user
rates the similar items.
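The weighted sum can be illustrated with the following Python sketch. The similarity values and ratings are hypothetical, and similar items that the user has not rated are simply skipped, which is an assumption of this sketch.

```python
def weighted_sum_prediction(user_ratings, similarities):
    """Predict a rating from the user's ratings of the similar items.

    user_ratings: {item: rating given by the active user}
    similarities: {item: similarity between that item and the target item}
    """
    num = 0.0
    den = 0.0
    for item, sim in similarities.items():
        if item in user_ratings:  # only items the user has actually rated
            num += sim * user_ratings[item]
            den += abs(sim)
    return None if den == 0 else num / den

# Hypothetical data: the user rated j1 and j2 but not j3.
user_ratings = {"j1": 4.0, "j2": 2.0}
similarities = {"j1": 0.8, "j2": 0.2, "j3": 0.5}
prediction = weighted_sum_prediction(user_ratings, similarities)
print(prediction)
```

Dividing by the sum of the absolute similarities keeps the prediction within the range of the ratings that the user has already given.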
4.3 Creating Database
First, the ‘Movierecommender’ database is created. Then the tables have to be created.
The Movies table is added first; it holds the id, title and genres of each movie.
The Rates table is built next; it holds the id, movie, user_ and rates fields. Primary keys are
added on the movie and user_ fields so that conflicting entries are blocked.
Finally, the Users table is added; it holds the user id, name, last name, age, sex and email of
each user. A primary key is added on the email field, so each user can enter the system with his
or her own email address.
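The table layout described above could look as follows. This is a sketch only: the database engine used for the paper is not stated, so SQLite (through Python's sqlite3 module) is an assumption, as are the column types and the lower-case table names.

```python
import sqlite3

def create_schema(conn):
    """Create the three tables described in the text (types are assumptions)."""
    conn.executescript(
        """
        CREATE TABLE movies (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            genres TEXT
        );
        CREATE TABLE users (
            id INTEGER,
            name TEXT,
            last_name TEXT,
            age INTEGER,
            sex TEXT,
            email TEXT PRIMARY KEY
        );
        CREATE TABLE rates (
            id INTEGER,
            movie INTEGER NOT NULL,
            user_ INTEGER NOT NULL,
            rates REAL NOT NULL,
            PRIMARY KEY (movie, user_)
        );
        """
    )

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO rates (id, movie, user_, rates) VALUES (1, 10, 7, 4.0)")
try:
    # A second rating by the same user for the same movie violates the key.
    conn.execute("INSERT INTO rates (id, movie, user_, rates) VALUES (2, 10, 7, 5.0)")
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True
print("duplicate rating blocked:", duplicate_blocked)
```

The composite key on movie and user_ is what blocks a conflicting second rating of the same movie by the same user.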
The flowchart of the proposed algorithm is given in Figure 5, and the flowchart of the Mahout
library used in the algorithm is given in Figure 6.
Figure 5. Flowchart of the Proposed Algorithm.
Figure 6. Flowchart of Mahout Library Used Algorithm.
5. EXPERIMENTAL EVALUATION
5.1 Data Set
In this paper, the algorithms are applied to the MovieLens-100K data set. It contains 100,000
ratings from 943 users on 1682 movies, and every user in the data set has rated at least 20 movies.
There are two types of data file. The u.data file has user id, item id, rating and timestamp
fields. The u.item file contains information about the movies, such as movie id, movie title,
release date and genres. Since the movie ids are the same in both files, we joined the two files
in our experiments.
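Joining the two files on the movie id can be sketched as follows. The sketch assumes that u.data is tab-separated and u.item is pipe-separated, as in the public MovieLens 100K distribution; the sample lines are made up for illustration, and the function names are hypothetical.

```python
# Made-up sample lines in the assumed file formats.
U_DATA_SAMPLE = "1\t10\t4\t881250949\n2\t10\t5\t881250950\n1\t20\t3\t881250951\n"
U_ITEM_SAMPLE = "10|Movie A (1995)|01-Jan-1995\n20|Movie B (1996)|01-Jan-1996\n"

def parse_ratings(text):
    """Parse 'user id, item id, rating, timestamp' rows from u.data text."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        user_id, item_id, rating, timestamp = line.split("\t")
        rows.append((int(user_id), int(item_id), int(rating), int(timestamp)))
    return rows

def parse_titles(text):
    """Map movie id to movie title from u.item text."""
    titles = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("|")
        titles[int(fields[0])] = fields[1]
    return titles

def join_ratings_with_titles(ratings, titles):
    """Attach the movie title to every rating through the shared movie id."""
    return [(user, titles[item], rating)
            for user, item, rating, _ in ratings if item in titles]

joined = join_ratings_with_titles(parse_ratings(U_DATA_SAMPLE),
                                  parse_titles(U_ITEM_SAMPLE))
for row in joined:
    print(row)
```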
5.1.1 Rating Distribution
As shown in Figure 7, the ratings in the MovieLens data set are integers between 1 and 5. The
histogram is provided in the following section.
5.1.2 User and Movie Statistics
In this section, the rating distribution is displayed. The mean of the ratings is 3.52986 and
their standard deviation is 1.125674.
Figure 7. Rating frequency in the Movie Lens data sets.
5.2 Evaluation Metric
After many years of research on collaborative filtering algorithms, researchers have proposed
different evaluation metrics for assessing the quality of predictions. Prediction accuracy
metrics measure how close a prediction is to the real preference. Two prediction accuracy
metrics that researchers commonly use to test their algorithms are Mean Absolute Error (MAE) and
Root Mean Squared Error (RMSE), both of which are also implemented in Mahout. We selected Mean
Absolute Error and Root Mean Squared Error as the evaluation metrics for presenting our
experimental results.
5.2.1 Mean Absolute Error
This evaluation metric measures the accuracy of an algorithm by comparing the predicted values
against the users' actual ratings for the user-item pairs in the test data set. For each
rating-prediction pair, the absolute error is calculated. Summing these absolute errors and
dividing the sum by the total number of rating-prediction pairs gives the Mean Absolute Error. It
is the most commonly used metric and can be interpreted easily. The equation of Mean Absolute
Error is given in equation (8):
$$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n} \qquad (8)$$
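As a small worked example of equation (8), the following Python sketch computes the Mean Absolute Error for four hypothetical rating-prediction pairs. The function name and the numbers are illustrative only.

```python
def mean_absolute_error(predictions, actuals):
    """Average absolute difference between predictions and true ratings."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    total = sum(abs(p - r) for p, r in zip(predictions, actuals))
    return total / len(predictions)

# Hypothetical predictions and true ratings for four user-item pairs.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 3.0]
print(mean_absolute_error(predicted, actual))  # (0.5 + 0 + 1 + 2) / 4 = 0.875
```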
5.2.2 Root Mean Square Error
This is a statistical accuracy metric that differs slightly from Mean Absolute Error. Once the
rating-prediction difference is calculated, it is squared. Summing the squared differences,
dividing the sum by the total number of rating-prediction pairs and taking the square root of the
result gives the Root Mean Square Error. The equation of Root Mean Square Error is given in equation (9):
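$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n}} \qquad (9)$$
Where,
$e_i = p_i - r_i$ is the error of the i-th rating-prediction pair
$p_i$ is the prediction of the i-th pair
$r_i$ is the real or true rating of the i-th pair
n is the number of rating-prediction pairs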
By using these evaluation metrics, the prediction accuracy and efficiency of the collaborative
filtering methods can be calculated and compared. The results therefore show which algorithm
should be used for a given data set.
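The Root Mean Square Error can be illustrated in the same way with a short Python sketch. The function name and the four rating-prediction pairs are hypothetical.

```python
import math

def root_mean_square_error(predictions, actuals):
    """Square root of the average squared rating-prediction difference."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - r) ** 2 for p, r in zip(predictions, actuals))
    return math.sqrt(squared / len(predictions))

# Hypothetical predictions and true ratings for four user-item pairs.
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4.0, 4.0, 1.0, 3.0]
rmse = root_mean_square_error(predicted, actual)
print(rmse)  # square root of (0.25 + 0 + 1 + 4) / 4
```

Because the differences are squared before averaging, a single large error raises this metric more than it raises the Mean Absolute Error.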
5.3 Experimental Procedure
In this section experimental procedures are explained:
5.3.1 Experimental Steps
The data set is divided into training and test portions. In the experiments, training/test
ratios from 0.1 to 0.9 are used in order to calculate and compare the prediction accuracy. For
each similarity measure and collaborative filtering technique, the evaluation has been coded to
find the Mean Absolute Error and the Root Mean Square Error.
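The evaluation loop can be sketched in Python as follows. This is not the Mahout evaluator: the ratings are synthetic, the split is a plain random split over all ratings, and a global-mean predictor stands in for the collaborative filtering algorithm. All of these are assumptions made only to keep the sketch self-contained.

```python
import math
import random

def split_ratings(ratings, training_ratio, seed=0):
    """Shuffle the ratings and split them into training and test portions."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_ratio)
    return shuffled[:cut], shuffled[cut:]

def evaluate(ratings, training_ratio, seed=0):
    """Return (MAE, RMSE) of a global-mean predictor on the test portion."""
    train, test = split_ratings(ratings, training_ratio, seed)
    # Stand-in predictor: every prediction is the mean training rating.
    global_mean = sum(r for _, _, r in train) / len(train)
    errors = [global_mean - r for _, _, r in test]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Synthetic (user, item, rating) triples with ratings between 1 and 5.
ratings = [(u, i, 1 + (u * i) % 5) for u in range(1, 11) for i in range(1, 11)]
for step in range(1, 10):
    ratio = step / 10
    mae, rmse = evaluate(ratings, ratio)
    print(f"ratio={ratio:.1f}  MAE={mae:.3f}  RMSE={rmse:.3f}")
```

Replacing the stand-in predictor with a user-based or item-based recommender gives the kind of comparison across ratios that the experiments describe.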
5.3.2 Experimental Platform
All our experiments were implemented in the Java programming language. All the experiments
were run on a Windows-based PC with an Intel Core i7 processor running at 2.40 GHz and 16 GB
of RAM.
5.3.3 Experiment Results
The experimental results of the User-based and Item-based collaborative filtering techniques for
creating predictions are shown here. Some parameters have to be determined: the neighborhood
size, the training/test ratio and the effect of the different similarity measures. All the
classes that contain an evaluation metric have been run separately. The results have then been
recorded in order to compare them, and histograms have been created from this information.
5.3.4 Experiment Results with Different Neighborhood Size
The size of the neighborhood affects the prediction quality. The sensitivity to the neighborhood
size is determined by changing the number of neighbors. As the number of neighbors increases, the
quality of the prediction also increases.
Figure 8. Mean Absolute Error for all user-based similarities as the neighborhood size changes.
[Plot: "Sensitivity of the Neighborhood Size (Training/Test ratio is 0.8)". x-axis: Number of neighbors (10, 20, 30, 60, 90, 120, 150, 180, 210); y-axis: Mean Absolute Error. Series: User-Based Euclidean Distance, Log Likelihood Distance, Pearson Correlation, Spearman Correlation, Tanimoto Coefficient and Uncentered Cosine Similarity.]
Figure 9. Root Mean Square Error for all user-based similarities as the neighborhood size changes.
[Plot: same title, x-axis values and series as Figure 8; y-axis: Root Mean Square Error.]
5.3.5 Experiment Results with Different Training/Test Ratio
The sensitivity to the Training/Test ratio is determined by changing this ratio. For this
purpose, the Training/Test ratio is varied from 0.1 to 0.9 in steps of 0.1 for all similarity
metrics. The results show that the quality of the prediction increases as the Training/Test ratio
increases. Moreover, User-Based Log Likelihood Distance Similarity and Item-Based Tanimoto
Coefficient Similarity have the lowest Mean Absolute Error and Root Mean Square Error, which
means that they predict better. We picked 0.8 as the optimum value for the following experiments.
The results are given in Figure 10, Figure 11, Figure 12 and Figure 13.
Figure 10. Mean Absolute Error for all user-based similarities as Training/Test Ratio changes.
[Plot: "Sensitivity of the Training/Test Ratio (neighborhood is 30)". x-axis: Training/Test Ratio (0.1 to 0.9); y-axis: Mean Absolute Error. Series: User-Based Euclidean Distance, Log Likelihood Distance, Pearson Correlation, Spearman Correlation, Tanimoto Coefficient and Uncentered Cosine Similarity.]
Figure 11. Root Mean Square Error for all user-based similarities as Training/Test Ratio changes.
[Plot: same title, x-axis and series as Figure 10; y-axis: Root Mean Square Error.]
Figure 12. Mean Absolute Error for all item-based similarities as Training/Test Ratio changes.
[Plot: "Sensitivity of the Training/Test Ratio for Item Based Similarities". x-axis: Training/Test Ratio (0.1 to 0.9); y-axis: Mean Absolute Error. Series: Item-Based Euclidean Distance, Log Likelihood Distance, Pearson Correlation, Uncentered Cosine and Tanimoto Coefficient Similarity.]
Figure 13. Root Mean Square Error for all item-based similarities as Training/Test Ratio changes.
[Plot: same title, x-axis and series as Figure 12; y-axis: Root Mean Square Error.]
5.3.6 Experiment Results with Different CF Algorithms
Since Log Likelihood Distance similarity and Tanimoto Coefficient similarity gave the lowest
Mean Absolute Error and Root Mean Square Error in the previous experiment, these similarities
are picked in this experiment in order to compare the Item-Based and User-Based algorithms. These
similarities are tested with our data set. As shown in Figure 14 and Figure 15, the Mean Absolute
Error and Root Mean Square Error are calculated for different Training/Test ratios.
Figure 14. Impact of Training/Test Ratio on Item-Based and User-Based algorithms by using Mean
Absolute Error.
[Plot: "Item-Based and User-Based Performances". x-axis: Training/Test Ratio (0.1 to 0.9); y-axis: Mean Absolute Error. Series: User-Based Log Likelihood Distance, Item-Based Log Likelihood Distance, User-Based Tanimoto Coefficient and Item-Based Tanimoto Coefficient Similarity.]
Figure 15. Impact of Training/Test Ratio on Item-Based and User-Based algorithms by using Root Mean
Square Error.
[Plot: same title, x-axis and series as Figure 14; y-axis: Root Mean Square Error.]
5.4 Performance Results
In this section, performance results are compared. Performance is measured by recommendation
time. Since each similarity has a different way of recommending items, their recommendation
times differ. The recommendation time shows how fast a recommendation is created. By taking the
average recommendation time, it can be observed which collaborative filtering technique gives
the faster recommendations.
5.4.1 User-Based Similarities in Recommendation Times
In this section, the results for the six User-Based similarities are shown, and their
recommendation times are compared in a histogram. In this experiment, ten movies are recommended
to the selected user with each of the six User-Based similarities, and the recommendation is
repeated ten times, as shown in Table 8.
Table 8. Recommendation times of each repetition and the average recommendation time for the six
User-Based similarities, in milliseconds.
USER BASED SIMILARITIES
REPEAT    EUCLIDEAN   LOG          PEARSON       TANIMOTO      UNCENTERED   SPEARMAN
          DISTANCE    LIKELIHOOD   CORRELATION   COEFFICIENT   COSINE       CORRELATION
1         547         579          515           532           469          532
2         547         610          547           579           531          547
3         547         640          515           578           500          562
4         507         625          548           591           547          531
5         563         640          547           563           516          595
6         516         626          515           562           516          609
7         516         656          531           563           516          578
8         516         656          500           578           500          563
9         547         625          532           547           532          578
10        516         672          532           531           500          594
AVERAGE   532.2       632.9        528.2         562.4         512.7        568.9
Figure 16. Impact of User Based algorithms on recommendation times.
As shown in Table 8 and Figure 16, Log Likelihood similarity gives the slowest recommendations,
while Uncentered Cosine gives the fastest recommendations.
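The averages in Table 8 can be reproduced from the ten measured times with a short Python sketch. The dictionary layout and variable names are illustrative; the numbers are the ones listed in the table.

```python
# Recommendation times in milliseconds, copied from Table 8 (ten repeats each).
times_ms = {
    "Euclidean Distance":   [547, 547, 547, 507, 563, 516, 516, 516, 547, 516],
    "Log Likelihood":       [579, 610, 640, 625, 640, 626, 656, 656, 625, 672],
    "Pearson Correlation":  [515, 547, 515, 548, 547, 515, 531, 500, 532, 532],
    "Tanimoto Coefficient": [532, 579, 578, 591, 563, 562, 563, 578, 547, 531],
    "Uncentered Cosine":    [469, 531, 500, 547, 516, 516, 516, 500, 532, 500],
    "Spearman Correlation": [532, 547, 562, 531, 595, 609, 578, 563, 578, 594],
}

# Average recommendation time per similarity.
averages = {name: sum(values) / len(values) for name, values in times_ms.items()}
fastest = min(averages, key=averages.get)
slowest = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.1f} ms")
print("fastest:", fastest)
print("slowest:", slowest)
```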
5.4.2 Item-Based Similarities in Recommendation Times
In this section, the results for the five Item-Based similarities are shown, and their
recommendation times are compared in a histogram. In this experiment, the two items most similar
to the selected item are displayed, and this is repeated ten times.
Table 9. Recommendation times of each repetition and the average recommendation time for the five
Item-Based similarities, in milliseconds.
ITEM BASED SIMILARITIES
REPEAT    EUCLIDEAN   LOG          PEARSON       TANIMOTO      UNCENTERED
          DISTANCE    LIKELIHOOD   CORRELATION   COEFFICIENT   COSINE
1         500         531          501           500           500
2         500         563          578           485           531
3         536         578          531           537           536
4         516         562          516           500           548
5         532         563          547           500           539
6         562         547          516           516           531
7         563         547          531           524           532
8         534         563          547           516           581
9         547         562          547           516           575
10        547         531          531           522           532
AVERAGE   533.7       554.7        534.5         511.6         540.5
Figure 17. Impact of Item Based algorithms on recommendation times.
As shown in Table 9 and Figure 17, Log Likelihood similarity is the slowest item-based
recommendation algorithm, while Tanimoto Coefficient Similarity is the fastest item-based
recommendation algorithm.
6. CONCLUSION
Recommender systems offer users items that they may wish to buy from a business. These systems
use user databases to create added value for the business: they help users find items that they
would like to buy, and they help the business by generating more sales. Recommender systems are
becoming an essential tool in e-commerce on the Web. New technologies are required to improve
the scalability of recommender systems, which are under pressure from the huge volume of user
data in current databases. Collaborative filtering is a new way of filtering the data that is
selected from a database. Collaborative filtering systems collect users' previous data about
items such as movies, books, music, ideas, feelings and products. There are many algorithms,
based on different approaches, for recommending the best item. What collaborative filtering
systems have in common is that they establish similarity between users and items. Collaborative
filtering algorithms extend to big data sets and also support high-quality recommendations.
In this paper, collaborative filtering algorithms are discussed and the differences between them
are shown. We compare User-based and Item-based algorithms with different similarity measures and
implement them in a movie recommender system. These algorithms can be used with any other data
set in order to recommend items. There is much more work to be done on collaborative filtering
algorithms. Our most important suggestions for improvement are given below.
Since we implemented the algorithms for a movie recommender, they can be used on many movie web
pages to provide recommendations as an option for their users. Furthermore, these algorithms can
also be implemented in other areas, for example in a marketing department that looks at the
previous product tastes of its customers and recommends the best product to them. These
algorithms can also be used in web streaming areas such as music recommendation, in online
bookstores, and so on.
One challenge is that performance is not fast enough on large data sets. Performance
improvements must be made for large data sets in order to recommend items as quickly as
possible.
REFERENCES
[1] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl, “Item-Based Collaborative Filtering
Recommendation Algorithms”, Tenth International World Wide Web Conference (WWW10), May 1-5, 2001,
Hong Kong.
[2] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl, “An Algorithmic Framework
for Performing Collaborative Filtering”, Proceedings of the 22nd Annual International ACM
SIGIR’99 Conference on Research and Development in Information Retrieval, Pages 230-237, New
York, NY, USA.
[3] Keshav R, et al, / (IJCSIT) International Journal of Computer Science and Information Technologies,
Vol. 5 (3), 2014, 4782-4787.
[4] Sean Owen, Robin Anil, Ted Dunning, Ellen Friedman, “Mahout in Action”, ISBN: 978-1935182689,
Manning Publications Co., USA, 2012.
[5] Shuhang Guo, “Analysis and Evaluation of Similarity Metrics in Collaborative Filtering Recommender
System”, Thesis of the Degree Programme in Business Information Technology, Lapland University of
Applied Sciences, Tornio, 2014.
[6] Hiroshi Shimodaira, “Similarity and recommender systems”, School of Informatics, The University of
Edinburgh, 2014.
[7] Peter Casinelli, Advisor: Sergio Alvarez, “Evaluating and Implementing Recommender Systems As
Web Services Using Apache Mahout”, Boston College Honour Thesis, 2014.