50120140505004

245 views

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
245
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
2
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

50120140505004

  1. 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 32 A SPEEDY APPROACH: USER-BASED COLLABORATIVE FILTERING WITH MAPREDUCE Nilay Narlawar, Ila Naresh Patil Department of Computer, MMCOE, Pune, India CSE Department, IES College Bhopal (MP), India ABSTRACT The conventional collaborative filtering system generates high-quality recommendations by influencing the likings of society of similar users but it has drawbacks as sparse data problem & lack of scalability. A new recommender system is required to deal with the sparse data problem & produce high quality recommendations in large scale mobile environment. In this paper, the described algorithm of recommendation mechanism for mobile commerce is user based collaborative filtering using Hadoop’s MapReduce with Bloom Filter on distributed environment like cloud computing which solves scalability problem in conventional CF system. The cloud/distributed computing has advantages as flexibility, high efficiency & helps to solve quality problem of mobile commence recommendation system. Bloom filters used in MapReduce will help to reduce the intermediate results in map phase which in turn speed up the overall process of recommendation. This research shows how MapReduce can be used to parallelize Collaborative Filtering. It also presents the architecture to enhance the join performance using Bloom filters in the MapReduce framework. Keywords: Bloom Filter, Collaborative Filtering, Distributed Environment, MapReduce Algorithm, Recommender System. 1. INTRODUCTION Many clients like to use the Web to discover product details in the form of online reviews. These reviews are given by other clients and specialists. User-given reviews are becoming more prevalent. Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. The suggestions relate to various decision-making processes, such as INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2014): 8.5328 (Calculated by GISI) www.jifactor.com IJCET © I A E M E
  2. 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 33 what items to buy, what music to listen to, or what online news to read. “Item” is the general term used to denote what the system recommends to users. A RS normally focuses on a specific type of item (e.g., CDs, or news) and accordingly its design, its graphical user interface, and the core recommendation technique used to generate the recommendations are all customized to provide useful and effective suggestions for that specific type of item [1]. Recommender systems or recommendation systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches) [2, 3, 4]. The aim of a recommender system is often to "help consumers learn about new products and desirable ones among myriad of choices". [5, 6, 7] 2. WORKING OF USER-BASED COLLABORATIVE FILTERING Step1: 1) Obtain “User History” in rating matrix, which is a table in which row represents user and column represent items. Intersection represents the rating given by user to item. Absence of value represents user has not given rating to item. This problem is referred as sparse scoring, which is handled by replacing the matrix. [1,2] Step2: 1) Calculate the similarity between users. For that, many similarity measure methods are available. One of the famous method is person correlation coefficient which is benchmark for CF. We use cosine similarity measure method given by 2) Finding the nearest neighbors from the similarity calculations. Step3: 1) The algorithm calculates the item rating i.e. generates another rating matrix intern. For that the rating is calculated by a weighted average of the rating by the neighbors Where, γx is the average rating of user x. To reduce the highly intensive computing time and computer resources we purpose new method of CF on Hadoop platform. 3. GENERAL MAPREDUCE OVERIVEW 1) It is a distributed implementation model which is proposed by google.com. The following working described its working on Hadoop platform. 2) The MapReduce model is inspired by the Lisp programming language.
  3. 3. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 34 3) It is divided into two phases • Map Phase • Reduce Phase Map Phase: Map Phase takes a set of key/value pairs and produces a set of key/values pairs. Here, it groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce phase. Reduce Phase: It accept the intermediate key I and a set of values for the key. It merges together these values and produces only one value per reduce invocation. 4) In the Hadoop platform, the default input dataset size of one mapper is less than 64MB file. If the file size is larger than 64MB, the platform would split it into a no. of small files which size less than 64MB automatically. 5) For every i/p file, the Hadoop platform initialize a mapper to deal with it where the file’s line no. as the key and the content of the line as the value. In map phase, the user can define process to deal with the i/p key/value and pass the intermediate key/value to the reduce phase. Finally Hadoop platform would kill the corresponding mapper. 4. PROPOSED RESEARCH ARCHITECTURE Collaborative Filtering Algorithm can be implemented within the MapReduce framework. It is difficult to directly use MapReduce model in computation process of Collaborative Filtering algorithm. The recommendation process for each user is summarized in the Map function i.e. while making recommendation, we save user ID in text files which serves as input to the Map function. The MapReduce framework defines few mappers to handle the user ID files. Fig 1 shows Application Architecture Diagram. Fig. 4.1: Application Architecture Diagram
  4. 4. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 35 5. PROPOSED ALGORITHM The proposed algorithm is divided into three phases: 1) Data Partitioning Phase 2) Map Phase 3) Reduce Phase 5.1 Data Partitioning Phase Here, it separates the UserID into different files, in these files each row store a UserID. These files are as the I/P to the map phase. 5.2 Map Phase The Hadoop platform, initialize a new mapper if the Datanode has enough response to initialize a mapper. The mapper’s setup builds the rating matrix between user and item which are already filtered by local filter. The mapper reads the UserID file by line no. Take the line no. as the i/p key and contents of the line as the values. The local filter of Bloom filter randomly selects 50% users by the random function. In the next step, it computes the similarity between this user and other users. Finally, it identifies the user’s nearest neighbor (by similarity values) and accordingly with equation 2 to calculate his predict rating on items. The Global filter of the Bloom filter works for the accuracy. It compares the two rating matrices and use e.g. threshold value to select the users from them. The algorithm sort the predict rating and store them in recommendation list. The UserID and its corresponding recommendation list as the intermediate key/value, output them to the reduce phase. Fig. 5.2.1: Working Sequence of MapReduce 5.3 Reduce Phase The Hadoop platform would generate some reducers implicitly. The reducers collect the UserID and its corresponding recommendation list, sort them to UserID and then o/p them to the HDFS.
  5. 5. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 36 Working of the algorithm is shown in below diagram: Fig. 5.3.1: User-based CF’s MapReduce- Bloom Filter 6. EXPERIMENTAL ANALYSIS 6.1 Implementation We have implemented our experiments for CF algorithm on Java platform. As explained earlier in, the Hadoop computer-cluster created on five computers. Here, we refer one of the computers as MainNode & remaining four as DataSetNodes. Each computer is having 4 GB RAM & Intel(R)core(TM) i5 CPU with 2.5GHz speed & Operating System Ubuntu 10.10. also the software used for the experiments are Hadoop MapReduce framework, Java JDK 1.6, the Mobile device (Android 3.0 & above), wireless Router are additional hardware we have used. The dataset is created by Netflix data set. The list of different movies is maintained in the dataset and more than 10,000 users. The users will define different ratings for each movie, not necessary the same rating. The role of our CF algorithm is to compare the runtime between standalone & Hadoop platform, so that we don’t focus on accuracy. We take 3 copies of sub-datasets with 100 users, 200 users, 500 users & 1000 users. The DataSetNode is also divided into 2 nodes, 3 nodes, 5 nodes. 6.2 Analysis For the comparative analysis of standalone & Hadoop platform, we have considered average time tavg as the Hadoop platform at current DataSetNode and the data set running time. Here the speedup is an important criterion to measure the efficiency of our algorithm. The speedup is given by, In our CF algorithm the recommendation is based on the division of each user theoretically, if we consider N nodes the speedup should be N, in other words, ideally the speedup should be linearly related to the number of DataSetNode.
  6. 6. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 37 Fig. 6.2.1: Comparative Speedup of MapReduce Vs MapReduceBF on 2 & 3 DataSetNodes In the figure 6.2.1 we have shown the analytical result in graph which implies, the time taken by simple Hadoop’s MapReduce is more than Hadoop’s MapReduce with Bloom Filter by increase number of Movies on Distributed environment. Also from the graph we can say that increase in number of DataSetNodes, the speedup increases linearly. Fig. 6.2.2: Analysis of Speedup Vs 3 DataSetNodes
  7. 7. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 38 From figure 6.2.2 we can observe that, for 100 users, 200 users, 300 users, 400 users, 500 users the speedup is not linearly increase, this is because the data set is too small, thus the Hadoop platform is unable to demonstrate its efficiency[10]. From figure 6.2.2 and 6.2.3 we can observe that, for 100 users, 200 users, 300 users, 400 users, 500 users the speedup is not linearly increase, this is because the data set is too small, thus the Hadoop platform is unable to demonstrate its efficiency [10]. Fig. 6.2.3: Analysis of Speedup Vs 5 DataSetNodes 7. CONCLUSION As the amount of information in e-commerce and mobile commence grows explosively, filtering irrelevant information but finding useful contents and reliable sources has gained more importance. Recommender systems have become a classic tool that interlinks users with information content and sources. However, regardless of its success in many application settings, conventional CF encounters a number of limitations which influence its recommendation accuracy. Bloom filters used in MapReduce will help to reduce the intermediate results in map phase which in turn speed up the overall process of recommendation. This research shows how MapReduce can be used to parallelize Collaborative Filtering. It also presents an architecture to enhance the join performance using Bloom filters in the MapReduce framework. 8. REFERENCES [1] Zhi-Dan Zhao, Ming-Sheng Shang. User-Based Collaborative-Filtering Recommendation Algorithms on Hadoop[C]. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, (2010) 478 – 481.
  8. 8. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 5, Issue 5, May (2014), pp. 32-39 © IAEME 39 [2] Zan Huang, Daniel Zeng and Hsinchun Chen “A Comparison of Collaborative-Filtering Recommendation Algorithms for E-commerce” Intelligent Systems, IEEE, vol 22, no.5, pp.68-78 Sept-Oct, 2007, doi.10.1109/MIS.2007.4338497URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumb er=4338497&isnumber=4338472. [3] Trust and Distrust-Based Recommendations for Controversial Reviews 1541 1672/11/$26.00 © 2011 IEEE INTELLIGENT SYSTEMS. [4] J. Ben Schafer, Joseph Konstan, John Riedl Recommender Systems in E-Commerce Group Lens Research Project Minneapolis, MN 55455 1-612-625-4002. [5] Yaming ZHANG, Haiou LIU, Shiyong LI: A Distributed Collaborative Filtering Recommendation Mechanism for Mobile Commerce Based on Cloud Computing”, Journal of Information & Computational Science 8: 16 (2011) 3883–3891 Available at http://www.joics.com. [6] Zhili Wu1, Xueli Yu2 and Jingyu Sun:” An Improved Trust Metric for Trust-aware Recommender Systems” 2009 First International Workshop on Education Technology and Computer Science. [7] M. Deshpande and G. Karypis, “Item-Based Top-N Recommendation Algorithms,” ACM Trans. Information Systems, vol.22, no.1, 2004, pp. 143–177. [8] Priya Deshpande and Sunayna Giroti, “Priority Based Dynamic Adaptive Checkpointing Strategy in Distributed Environment”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 6, 2013, pp. 378 - 385, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375. [9] Paulo J. G. Lisboa, Huda Naji Nawaf and Wesam S. Bhaya, “Recommendation System Based on Association Rules Applied to Consistent Behavior Over Time”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 4, 2013, pp. 412 - 421, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375. [10] Suresh Kumar RG, S.Saravanan and Soumik Mukherjee, “Recommendations for Implementing Cloud Computing Management Platforms using Open Source”, International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 3, 2012, pp. 83 - 93, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375. [11] Anuj Verma and Kishore Bhamidipati, “A Survey of Memory Based Methods for Collaborative Filtering Based Techniques for Online Recommender Systems”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 2, 2013, pp. 366 - 372, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.

×