
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.


N. A. Pansare et al, Int. Journal of Engineering Research and Applications, ISSN: 2248-9622, Vol. 4, Issue 2 (Version 1), February 2014, pp. 403-406

RESEARCH ARTICLE, OPEN ACCESS

Association Rule Mining in Distributed Environment

Mrs. V. C. Kulloli (Guide)1, O. A. Omble2, V. G. Gadle3, Y. G. Potdar4, N. A. Pansare5
1 Asst. Prof. in IT, PCCOE
2,3,4,5 B.E. I.T., PCCOE

Abstract
Association rule mining is an important technique in data mining. It generates important rules from the data. These rules are called frequent rules, and the overall concept is known as frequent rule mining. Earlier, this technique was implemented on local machines to generate rules. But as the data size grows with the number of transactions, a local machine takes a long time to compute the frequent rules. To reduce this time, local machines were upgraded to higher configurations, for example by expanding RAM or hard disk. In this paper we propose a technique in which the data is divided between machines and each machine computes the frequent rules from the portion of the data assigned to it. This technique is a form of distributed data mining. Existing frameworks such as IDMA and EMADS suffer from communication overhead. The proposed framework attempts to reduce this communication overhead. To provide more security against unauthorized clients, we use the RC4 algorithm for encryption and decryption of the messages passed between clients.

Index Terms: FI-Mining, Association Rule Mining, Distributed Data Mining, Intelligent Agent Based Mining

I. INTRODUCTION
Data mining is a long-established concept in research. With developments in computer science, the roots of data mining have been going deeper and deeper. Many tools are available for extracting information from data. One such technique is association rule mining, introduced by Agrawal et al. [1].
It is widely used for generating frequent pattern sets, and its most popular algorithm is the Apriori algorithm. The problem with association rule mining is that it consumes a large amount of time on large data sets. A large data set also requires a large amount of disk space. Distributing the data is one way in which this problem can be minimized. In a distributed database the data is divided into many parts and each part is stored on a different machine. Each machine then mines the important data from the part of the data set assigned to it. Mining rules in distributed data is known as distributed data mining (DDM), and it saves a large amount of time. Performing association rule mining in a distributed data mining setting is called distributed association rule mining.

Our paper is based entirely on distributed association rule mining and aims to reduce the communication between the distributed machines, that is, the communication overhead. We also attempt to provide secure transmission of messages between different clients by encrypting and decrypting the messages. The algorithm we use for encryption and decryption is RC4 [12]. A client sends a message to another client. The server receives the message and encrypts it. The server then sends the encrypted message to the client that was supposed to receive it. After receiving the message, that client decrypts it into readable form.

II. RELATED WORK
Some tools already exist in the field of distributed data mining. A few of them are described below.

The IDMA [9] architecture provides mobile agent based distributed and incremental association rule mining. The system includes the distributed knowledge discovery management system (KDMS), the knowledge discovery sub-system (sub-KDS), the data mining mobile agent (DMMA) and the local knowledge base (LKB). The KDMS dispatches the mobile agent DMMA to each site.
The mobile agents move to the sub-KDS and execute the data mining task. The local large item sets are obtained, so the local association rules can be derived and the local knowledge base can be refreshed. The set of local large item sets and their support counts are carried back to the KDMS by the mobile agents. When all the mobile agents have come back to the KDMS, the possible minimum and maximum support counts of the potential global item sets can be determined. This system was implemented on IBM Aglet.

The Extendible Multi-Agent Data mining System (EMADS) [8] framework promotes the ideas of high availability and high performance without compromising data or DM algorithm integrity. The framework provides a highly flexible and extendible data mining platform, and the resulting system allows users to build collaborative DM approaches. It has been applied to a number of DM scenarios: meta association rule mining (Meta ARM) and classifier generation.

Our distributed association rule mining framework attempts to integrate global knowledge after the local mining. This raises several research problems: reducing the high communication cost, handling multiple heterogeneous data sources, improving the efficiency of incremental knowledge integration, scalability of the framework, data privacy and security, fault tolerance of EMADS, and efficient data partitioning.

III. PROPOSED FRAMEWORK
This section describes the working of our proposed framework. A client-server system is built in which all clients are registered with the server. The data is divided between all clients. The association rule mining algorithm is present on each client system but not on the server, unlike in the tools mentioned earlier. If a client system requires frequent item sets, that client generates a request and sends it to the server. The server then asks each client to apply its association rule mining algorithm to generate frequent rules. The client systems generate their rules and send them to the server. The rules generated at the client side are called local rules. The server then adds these local rules together to make a global rule, which is called the global frequent item set.

Below is the proposed architecture diagram of our framework.

Figure 1: Client-Server based DDM Process

This global frequent item set is then passed to each client system by the server, and hence each client is aware of all the rules present in the actual data set. Below is the proposed algorithm for our framework.

Algorithm 1: Routine K-Server
function K-Server (minsup, ls)
{
    min_sup = minsup;
    key = genkey();
    GFIL = ø;
    LS = ls;
    visit = true;
    if (visit) then
    {
        AG = MAGen(LS, min_sup, key, GFIL, visit);
        Dispatch(AG,;
    }
    else
    {
        AG = Receive(AG);
        GenAsso(GFIL);
    }
}

Algorithm 2: Routine SA
function SA(ls, ms, k, fil, v)
{
    If (v)
    {
        If (key == k)
        {
            Find FI;
            Update GFIL;
            If (
            {
                v = false;
            }
            Dispatch(AG,;
        }
    }
}

Figure 2: MAD-ARM Model
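The client-server flow of Section III can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `local_counts` and `global_frequent_itemsets`, the `max_size` limit and the sample transactions are all hypothetical, and the paper does not say exactly how the server "adds" the local rules. Here each client is assumed to report raw support counts, which the server sums and compares against the minimum support.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Client side: count every itemset of up to max_size items in the local partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicate items
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return counts


def global_frequent_itemsets(partitions, min_sup, max_size=2):
    """Server side: sum the local counts and keep itemsets whose global support >= min_sup.

    min_sup is a fraction of the total number of transactions (assumption).
    """
    total = sum(len(partition) for partition in partitions)
    merged = Counter()
    for partition in partitions:  # stands in for one request/response per client
        merged.update(local_counts(partition, max_size))
    return {itemset: count / total
            for itemset, count in merged.items()
            if count / total >= min_sup}


# Hypothetical data: two clients, each holding one part of the transactions.
client_1 = [["bread", "milk"], ["bread", "butter"], ["milk"]]
client_2 = [["bread", "milk", "butter"], ["bread", "milk"], ["butter"]]

gfil = global_frequent_itemsets([client_1, client_2], min_sup=0.5)
for itemset, support in sorted(gfil.items()):
    print(itemset, round(support, 2))
```

Summing raw counts keeps the global result exact, at the price of sending counts for item sets that are not frequent locally. Sending only locally frequent item sets, as the text describes, would reduce traffic but may need a second round to confirm the global counts.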
Notation used in the algorithms and in Figure 2:
SA: Stationary Agent (clients)
MA: Mobile Agent (message sent to each client)
LS: Local List (generated by each client)
GFIL: Global Frequent Item set (generated by adding all LS)

IV. SUPPORT AND CONFIDENCE PHENOMENON
Any given association rule has a support level and a confidence level. The support is the percentage of transactions that demonstrate the rule. An itemset is called frequent if its support is equal to or greater than an agreed-upon minimal value, the support threshold. If the antecedent is satisfied in a fraction s of the population, the confidence is the percentage of those cases in which the consequent is also satisfied. In other words, the confidence is the conditional probability that, given X is present in a transaction, Y will also be present. By definition:

Confidence(X => Y) = support(X, Y) / support(X)

For example, if X appears in 40% of the transactions and X and Y appear together in 30% of them, the confidence of X => Y is 0.30 / 0.40 = 0.75.

V. RC4 ALGORITHM
RC4 is recognized as one of the most commonly used stream ciphers in cryptography. It is also known by two other names, ARC4 and ARCFOUR, which stand for Alleged RC4. RC4 was created by Ronald Rivest of RSA Data Security Inc. By design, RC4 is a shared-key stream cipher algorithm, which requires a secure transfer of the shared key. RC4 is used for both encryption and decryption: the data stream is XORed with a generated key stream. It accepts keys of variable length and acts as a pseudorandom number generator. Its output is XORed with the stream of data to produce the encrypted data. Hence, a particular RC4 key should never be reused to encrypt two different data streams.

If Client 1 wants to send an important message to Client 2, Client 1 sends a request to the server stating that it wants to send a secure message to Client 2. The server accepts the request and encrypts the message of Client 1 using the RC4 algorithm. The encrypted message and a key are then sent by the server to Client 2. Client 2 receives the encrypted message from the server and decrypts it into readable form using the RC4 algorithm with the key provided by the server.

VI. CONCLUSION
In this paper we presented an overview of association rule mining in a distributed environment, also called distributed association rule mining, and pointed out the issues in existing systems. Our approach minimizes the communication between the different systems used for association rule mining. We have added secure message transfer between clients using the RC4 algorithm to provide more security against unauthorized clients.

REFERENCES
[1]. R. Agrawal, T. Imielinski, and A. Swami, "Mining Associations between Sets of Items in Massive Databases", Proceedings of the ACM SIGMOD, Washington DC, 1993.
[2]. Jaturon Chattratichat, John Darlington, Moustafa Ghanem, et al., "Large Scale Data Mining: Challenges and Responses", Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, 1997.
[3]. Rakesh Agrawal and John C. Shafer, "Parallel Mining of Association Rules", IEEE Transactions on Knowledge and Data Engineering, 1996.
[4]. Matthias Klusch, Stefano Lodi and Gianluca Moro, "Agent based distributed data mining: The KDEC Scheme".
[5]. A. O. Ogunde, O. Folorunso, A. S. Sodiya and G. O. Ogunieye, "A review of some issues and challenges in current agent based distributed association rule mining", Asian Journal of Information Technology, 2011.
[6]. E. I. Ariwa, M. B. Senousy and M. M. Medhat, "Information and E-business model application for distributed data mining using mobile agents", Proceedings of the international conference WWW/Internet, USA, 2003.
[7]. G. S. Bhamra, A. K. Verma and R. B. Patel, "Agent Enriched Distributed Association Rule Mining: A Review", Springer Verlag Berlin Heidelberg, 2012.
[8]. Kamal Ali Albashiri, Frans Coenen, and Paul Leng, "An investigation into the issues of Multi-Agent Data Mining", Ph.D. Thesis, 2010.
[9]. Yun-Lan Wang, Zeng-Zhi Li and Hai-Ping Zhu, "Mobile Agent Based Distributed and Incremental Techniques for Association Rules", in Proceedings of the Second International Conference on Machine Learning and Cybernetics, 2003.
[10]. U. P. Kulkarni, P. D. Desai, Tanveer Ahmed, J. V. Vadavi and A. R. Yardi, "Mobile Agent Based Distributed Data Mining", ICCIMA, 2007.
[11]. Walid Adly Atteya, Keshav Dahal and M. Alamgir Hossain, "Distributed BitTable multi-agent Association Rules Mining Algorithm", Springer-Verlag, KES 2011, Part I, LNAI 6881.
[12]. Quentin, Galvane, Baptiste, Uzel, "Cryptography-RC4 Algorithm", February 18, 2012.
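As an illustration of the RC4 scheme described in Section V, the following Python sketch generates the key stream and XORs it with the data. It follows the commonly published description of RC4 (key scheduling followed by pseudorandom generation) and is meant to show the mechanism only. The function names `rc4_keystream` and `rc4`, the sample key and the sample message are assumptions made for this example and do not come from the paper.

```python
def rc4_keystream(key: bytes):
    """Yield RC4 key stream bytes for the given key."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 key must be 1 to 256 bytes long")
    # Key-scheduling algorithm: permute the state array using the key.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    # Pseudorandom generation algorithm: emit one key stream byte per step.
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        yield state[(state[i] + state[j]) % 256]


def rc4(key: bytes, data: bytes) -> bytes:
    """XOR data with the key stream; the same call encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, rc4_keystream(key)))


# Hypothetical exchange: the server encrypts, Client 2 decrypts with the same key.
shared_key = b"Key"
ciphertext = rc4(shared_key, b"Plaintext")
recovered = rc4(shared_key, ciphertext)
print(ciphertext.hex(), recovered)
```

Because encryption and decryption are the same XOR operation, Client 2 only needs the key supplied by the server to recover the message. A fresh key is needed for each message stream, in line with the key reuse warning in Section V.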