Coordination issues of multi agent systems in distributed data mining


International Journal of Advanced Research in Engineering and Technology (IJARET), ISSN 0976 - 6480 (Print), ISSN 0976 - 6499 (Online), Volume 4, Issue 3, April 2013, pp. 41-48. © IAEME: www.iaeme.com/ijaret.asp. Journal Impact Factor (2013): 5.8376 (Calculated by GISI), www.jifactor.com

COORDINATION ISSUES OF MULTI-AGENT SYSTEMS IN DISTRIBUTED DATA MINING

Thulasi Bikku, Asst. Professor, Computer Science Department, NRI Institute of Technology, Andhra Pradesh, INDIA.
Prof. N. Sambasiva Rao, Principal, Computer Science Department, Vardhaman College of Engineering, Andhra Pradesh, INDIA.

ABSTRACT

Data mining technology has evolved for extracting knowledge and identifying patterns and trends from large data resources. It normally adopts a data integration method to build a large store known as a data warehouse, which gathers all data into a central repository, and then runs an algorithm against that data to mine useful patterns and evaluate knowledge. However, no single data mining technique has been proven suitable for every domain and data set. Distributed data mining (DDM) originated from the need to mine decentralized data sources. Multi-agent systems (MAS), which draw on Artificial Intelligence (AI), deal with complex applications that require distributed problem solving. In many applications the individual and collective behavior of the agents depends on data observed from scattered data sources. Since multi-agent systems are often distributed, and agents have proactive and reactive features that are very useful for knowledge management systems, combining DDM with MAS suits data-intensive applications. The integration of multi-agent systems and distributed data mining is also known as multi-agent based distributed data mining.

In this paper we briefly discuss the existing approaches and the importance of using agent technology in the domain of knowledge discovery. We propose an approach to distributed data clustering, summarize its agent-oriented implementation, and note the security attacks to which agents may be exposed. The core problem concerns the collaborative work of distributed data resources in the design of a multi-agent system destined for distributed data mining and classification.

Index Terms: Distributed Data Mining, Multi-Agent Systems, Multi Agent Data Mining, Multi-Agent Based Distributed Data Mining.
1. INTRODUCTION

Data Mining (DM) originated from knowledge discovery from databases (KDD). The large variety of DM techniques developed over the past decade includes methods for pattern-based similarity search, cluster analysis, decision-tree based classification, prediction, outlier analysis, generalization taking the data cube or attribute-oriented induction (AOI) approach, and mining of association rules [1]. Distributed data mining (DDM) mines data from data sources regardless of their physical locations. The need for this arises because data produced locally at each site often cannot be transferred across the network: the volume is excessive, which increases cost and raises security issues [2]. Recently, DDM has become a critical component of knowledge-based systems because its decentralized architecture reaches networked sources such as weather databases, financial data portals, and emerging disease information systems. Industrial companies have recognized it as an opportunity for major revenues from applications such as warehousing, process control, medical services, bioinformatics and customer services, where large amounts of data are stored.

Data mining still poses many challenges to the research community. The main challenges are:
1) Data mining has to deal with huge amounts of data located at different physical locations.
2) Data mining is a computationally intensive process involving very large data, i.e. more than petabytes, so it is necessary to partition and distribute the data for parallel processing to achieve acceptable time, cost and space performance.
3) The input data stored for a particular domain changes rapidly because of regular updates [3]. In these cases, knowledge has to be mined fast and efficiently in order to be usable and up to date.

2. MULTI-AGENTS BEHAVIOR

DDM is a complex system focusing on the distribution of data resources over the network as well as the extraction of useful patterns from those data resources [6, 7]. The very core of DDM systems is scalability, as the system configuration may be altered over time. Designing DDM systems therefore involves many software engineering issues, such as reusability, extensibility, efficiency, effectiveness, compatibility, flexibility, scalability, accuracy, privacy, security and robustness. For these reasons, agents' characteristics [4, 5] are useful for DDM systems.

Autonomy: DM agents operate autonomously and are self-determining. Because they have proactive and reactive features, they can deliberately handle access to the data source in agreement with constraints on the required autonomy of the system, data and model. This is in full compliance with the paradigm of cooperative information systems [15].

Scalability: To reduce the workload of the network and the DM application server, DM agents migrate to each of the local data sites in a DDM system, perform mining tasks locally, and then either return with or send relevant pre-selected patterns to their central server for further processing. Agents can perform tasks locally if they have sufficient knowledge and resources, and they can interact with other agents to help complete tasks [16].

3. STRATEGY OF LEARNING

Several systems have been implemented for distributed data mining. These systems can be classified according to their learning strategy into three types: central learning, meta-learning, and hybrid learning.

3.1 Central learning strategy: All the data is gathered at a central data repository and a single data model is built. The data has to move to the central repository so that it can be integrated before sequential DM algorithms are applied [12]. This strategy is used only when the geographical distribution of the data is very small. It is generally very expensive, because transferring data from different sources is costly, but it provides more accurate results [10]. Agent technology is not chosen in this strategy.
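The contrast between central learning and the agent-based local mining described under Scalability in Section 2 can be made concrete with a small sketch. It is not the paper's implementation: the function names (`mine_local`, `merge_counts`, `frequent_items`), the toy transactions, and the choice of simple item counting as the mining task are all assumptions made for illustration. Each hypothetical DM agent returns only a compact summary of its site, and the central server merges the summaries.

```python
from collections import Counter

def mine_local(transactions):
    """Hypothetical DM agent task: count item occurrences at one site.

    Only this compact summary leaves the site, not the raw transactions.
    """
    return Counter(item for t in transactions for item in set(t))

def merge_counts(local_counts):
    """Central server step: add up the summaries returned by the agents."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)
    return total

def frequent_items(counts, min_support):
    """Keep the items seen in at least min_support transactions."""
    return {item for item, n in counts.items() if n >= min_support}

# Toy data for two sites (assumed, for illustration only).
site_a = [["bread", "milk"], ["bread", "eggs"], ["milk"]]
site_b = [["bread"], ["eggs", "milk"], ["bread", "milk"]]

# Agent-based route: mine at each site, ship summaries, merge centrally.
merged = merge_counts([mine_local(site_a), mine_local(site_b)])

# Central learning route: ship all raw data to one repository, then mine.
central = mine_local(site_a + site_b)

print(frequent_items(merged, min_support=3))
print(merged == central)
```

For plain counting the two routes agree exactly, because counts are additive across sites. If the agents instead pre-select patterns locally before sending them, as Section 2 allows, the merged result may only approximate the central one.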
3.2 Meta-learning strategy: It offers a way to mine classifiers from homogeneously distributed data. Meta-learning follows three main steps [11]:
1) It generates base classifiers at each site using a classifier learning algorithm.
2) It collects the base classifiers at a central site and produces meta-level data from a separate validation set and the predictions generated by the base classifiers on it.
3) It generates the final classifier (meta-classifier) from the meta-level data via a combiner or an arbiter.
Copies of the classifier agent exist or are deployed on the nodes of the network being used [13]. Agent technology can be preferred in this strategy.

3.3 Hybrid learning strategy: A technique that combines local and centralized learning for model building [14]. The major criticism of such systems is that it is not always possible to obtain an exact final result, i.e. the global knowledge model obtained may differ from the one obtained by applying the single-model approach (if possible) to the same data. Approximate results are not always a major concern, but it is important to be aware of them.

4. MULTI AGENT-BASED DISTRIBUTED DATA MINING (MADM)

MADM takes data mining as its foundation and enhances it with agent technology [16]; this new data mining technique therefore inherits the powerful properties of agents and, as a result, yields desirable characteristics. In general, constructing an ADDM system concerns three key characteristics: interoperability, dynamic system configuration, and performance aspects, discussed as follows [8]:
1) Interoperability concerns not only the collaboration of agents in the system, but also external interaction with new agents, which should enter the system seamlessly. The architecture of the system must be open and flexible so that it can support this interaction, including the communication protocol, integration policy, and service directory. The system has to follow defined mechanisms for adding and removing agents.
2) The communication protocol covers message encoding, encryption, and transportation of data between agents. The integration policy specifies how the system behaves when an external component, such as an agent or a data site, requests to enter or leave. A negotiation and communication mechanism has to be adopted to allow the envisaged agents to "talk" to one another.
3) Dynamic system configuration, in which the system has to handle a changing configuration, is a challenging issue because of the complexity of the planning and mining algorithms. A mining task may involve several agents and data sources, with each agent equipped with an algorithm and assigned to given data sources. In a distributed environment, tasks can be executed in parallel, which leads to concurrency issues. Quality-of-service control is desired from both the data mining and the system perspective, and it can be derived from both the data mining and the agent fields.

The MADM framework faces a number of issues [23]:
1. Multiple Data Mining Tasks: The MADM framework must provide mechanisms to allow the coordination of multiple data mining tasks. The number and nature of the data mining tasks that the framework must handle is not known in advance and is expected to evolve over time. Consequently the framework should be designed in such a way as to anticipate future tasks.
2. Agent Coordination: The framework must be flexible, since it must accommodate new agents as they are created in the environment and remove agents when they are no longer in use. Careful consideration therefore needs to be directed at the communication mechanisms.
3. Agent Reuse: The framework must promote the opportunistic reuse of agent services by other agents. It has to provide mechanisms by which agents may advertise their capabilities, and ways of finding agents that offer the capabilities they need.
4. Scalability and Efficiency: The scalability of a data mining system refers to the ability of the system to operate effectively, without a substantial or discernible reduction in performance, as the number of data sources increases. Efficiency, on the other hand, refers to the effective use of the available system resources.
5. Portability: A distributed data mining system should be capable of operating across multiple environments with different hardware and software configurations, and be able to combine multiple models with different representations. The framework should be able to operate on any major operating system.
6. Compatibility: Combining multiple models of data mining results has been receiving increasing attention in the data mining research literature. Much of the prior work on combining multiple models assumes that all models originate from the same database or from different databases with an identical schema. This is not always the case, and differences in the type and number of attributes among different data sets are not uncommon. The model computed at a single database depends directly on the format of the underlying data.
7. Adaptivity and Extendibility: Most data mining systems operate in environments that are likely to change, a phenomenon known as concept drift. The MADM framework has to adapt to such changes and keep working effectively. It should also be extendible, providing the means to easily accept and incorporate new data sources and new DM techniques.

An ADDM framework can be generalized into a set of components and viewed as depicted in Fig. 4.1. We may generalize the activities of the architecture into request and response, each of which involves a different set of components. The basic components of an ADDM system are as follows.

Fig. 4.1: Overview of ADDM system.

Data: Data is the base layer of the architecture. In a distributed environment, data can be hosted in various forms, such as transactional databases, online relational databases, data streams, object-relational databases, multimedia databases, web pages, etc., and the purpose of the data may vary.

5. PROPOSED SCHEME

Here we propose a scheme for the coordination of agents in large-scale distributed systems, which is becoming an increasingly challenging task. Continuous involvement of users and administrators is generally limited in large-scale distributed environments. System support is also needed for configuration and reorganization when a system grows or shrinks with the addition or removal of resources. The primary goal of the management of distributed systems is to ensure efficient use of resources and provide timely service to users at an effective computational cost. Most distributed system management techniques still follow the centralized model, based on the client-server model, because it gives accurate results. However, centralization also has problems, such as: 1) it can cause traffic overload, and the processing load at the originating node may degrade its performance; 2) it does not scale as the complexity of the network increases; 3) a fault in the central originating node can leave the system without a manager.
One approach is distributed management, where management tasks are spread across the managed infrastructure and are carried out at the managed resources. The goal is to minimize the network traffic related to management and to speed up management tasks by distributing operations across data resources. The new trend in distributed system management involves using multi-agents to manage the resources of distributed systems. Agents have the capability to travel autonomously (with their execution state and code) among different data repositories to complete their task. The route may be predetermined or chosen dynamically depending on the results at each local data repository. The agents can share a common goal (e.g. an ant colony), or they can pursue their own interests.

5.1 Multi-agent system structure

The multi-agent system structure assumes that each node in the system has a set of agents residing and running on that node [21, 22]. These agent types are the following:

Client agent (CA) sends or receives service requests, initiated by the user, from the system. The CA may receive the request from the local user directly. Otherwise, it receives the request from an exporter agent coming from another node on behalf of a user there.

Service list agent (SLA) holds a list of the resource agents in the system. This agent receives the user-initiated request from the CA and sends it to the resource availability agent. If the reply indicates that the requested resource is local, the service list agent delivers the request to the categorizer agent. Otherwise, it returns the request to the nearby CA.

Resource availability agent (RAA) indicates whether the requested resource is free and available for use. It also indicates whether the requested resource is local or remote. It receives the request from the service list agent and checks the status of the requested data resource by accessing the Management Information Base (MIB). The agent then constructs its reply from the information retrieved about the data resource.

Resource agent (RSA) is responsible for the operation and control of the resource. This agent executes the request on the resource. Each node may have zero or more RSAs.

Router agent (RA) provides the path to the requested resource on the network when remote resources are accessed. Before being dispatched, the exporter agent asks the router agent for the path to the requested resource, and the router agent delivers the routing path to the exporter agent.

Categorizer agent (CZA) allocates a suitable resource agent to perform the user's request. This agent receives its inputs from the service list agent. It then tries to find a suitable nearby free resource agent to perform the requested service.

Exporter agent (EA) is a mobile agent that carries the user request along the path identified by the RA to reach the node that has the required resource. It passes the requested resource id to the RA and then receives the reply. If the router agent has no information about the requested resource, the EA tries to locate the resource in the system based on the user's request.

Two additional mobile agent types exist in the system.

Representative agent (RPA) is a mobile agent that is launched in each sub network. It is responsible for traversing the sub network nodes instead of the exporter agent, doing the task requested by the user and carrying the results back to the exporter agent.

Collector agent (CTA) is a mobile agent that is launched from the last sub network visited by the exporter agent. It is launched when results from that sub network become available. This agent goes through the reversed itinerary of the exporter agent's trip. The CTA collects results from the representative agents and carries them to the source node.

All mobile agents used here are of the interrupt-driven type.
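The hand-offs between the node-local agents described in Section 5.1 can be sketched as plain function calls. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the dictionary standing in for the Management Information Base, the routing table, and the returned message strings are all hypothetical, and each agent is reduced to a single function.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    name: str
    # Stand-in for the MIB: resource name -> True if the resource is free.
    mib: Dict[str, bool] = field(default_factory=dict)
    # Stand-in for the router agent's table: remote resource -> hosting node.
    routes: Dict[str, str] = field(default_factory=dict)

def resource_availability_agent(node: Node, resource: str) -> dict:
    """Report whether the resource is local and whether it is free."""
    local = resource in node.mib
    return {"local": local, "free": local and node.mib[resource]}

def categorizer_agent(node: Node, resource: str) -> str:
    """Allocate the free local resource to the request and mark it busy."""
    node.mib[resource] = False
    return f"{resource} allocated on {node.name}"

def router_agent(node: Node, resource: str) -> Optional[str]:
    """Return the known path to a remote resource, or None if unknown."""
    return node.routes.get(resource)

def exporter_agent(node: Node, resource: str) -> str:
    """Carry the request to a remote node, or search when no path is known."""
    path = router_agent(node, resource)
    if path is None:
        return f"exporter agent searches the system for {resource}"
    return f"exporter agent dispatched to {path} for {resource}"

def service_list_agent(node: Node, resource: str) -> Optional[str]:
    """Consult the availability agent and route the request accordingly."""
    reply = resource_availability_agent(node, resource)
    if reply["local"] and reply["free"]:
        return categorizer_agent(node, resource)
    if reply["local"]:
        return f"{resource} is busy on {node.name}"
    return None  # not local: the request goes back to the client agent

def client_agent(node: Node, resource: str) -> str:
    """Entry point for a user request arriving at this node."""
    answer = service_list_agent(node, resource)
    return answer if answer is not None else exporter_agent(node, resource)

node = Node("n1", mib={"printer": True}, routes={"scanner": "n7"})
for wanted in ["printer", "printer", "scanner", "plotter"]:
    print(client_agent(node, wanted))
```

The four requests exercise the four outcomes in turn: a free local resource, a busy local resource, a remote resource with a known path, and a resource the router agent knows nothing about.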
5.2 System's operation

The activity cycle of the multi-agents residing inside a local data repository is as follows. The client agent (CA) receives service requests either from the user or, if the request comes from another subnet, from an exporter agent (EA). The client agent then asks the service list agent (SLA) whether a resource agent exists that can perform the request. The service list agent checks the availability of the required resource agent by consulting a resource availability agent. The reply of the resource availability agent states whether or not the resource is locally available and whether or not there is a resource agent that can perform the requested service. If the resource availability agent accepts the request, the service list agent asks the categorizer agent to allocate a suitable resource agent to the requested service. Otherwise, the service list agent informs the client agent of the rejection, and the request is passed to the exporter agent if the requested resource is in another subnet. The exporter agent asks the router agent for the path to the required resource agent. Once the path is determined, the exporter agent is dispatched through the network channel to the destination node identified by that path. If the router agent has no information about the location of the required resource agent, the exporter agent searches the distributed system to find the required resource agent and assigns the required task to it.

As shown in Fig. 5.2.2, the exporter agent traverses the sub networks of the distributed system on its trip. At each sub network, a representative agent is launched to traverse the local nodes of that sub network, do the required task, carry the results of that task, and provide them to the client agent. The agents of the social interface described in Fig. 5.2.1 are implemented at each node in the system. There are two approaches to collect results of the required task and send these results back to the source. The results of the local resources are transferred to the originating source and combined to give the requested query output.

Fig. 5.2.2: Overview of network architecture MADM.

So in our proposed scheme, we make effective use of parallel processing of queries through multi-agent system technology.
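The parallel traversal described in Section 5.2 can be illustrated with a short sketch. It makes several assumptions that are not in the paper: worker threads stand in for mobile representative agents, each node is reduced to a list of records, the task is a plain Python callable, and the names `representative_agent`, `exporter_trip` and `combine` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def representative_agent(subnet_nodes, task):
    """Visit every node of one sub network and run the task on its local data."""
    return [task(node_data) for node_data in subnet_nodes]

def exporter_trip(subnets, task):
    """Launch one representative agent per sub network, then collect results.

    The exporter's itinerary is the order of the sub networks. The collector
    step walks that itinerary in reverse, as the collector agent does.
    """
    itinerary = list(subnets)
    with ThreadPoolExecutor() as pool:
        # Representative agents work on their sub networks in parallel.
        pending = {
            name: pool.submit(representative_agent, subnets[name], task)
            for name in itinerary
        }
        collected = {}
        for name in reversed(itinerary):
            collected[name] = pending[name].result()
    return collected

def combine(collected):
    """Source node step: merge per-node results into one query answer."""
    return sum(sum(results) for results in collected.values())

# Toy layout (assumed): two sub networks, each node holding a list of records.
subnets = {
    "subnet1": [[1, 2], [3]],
    "subnet2": [[4, 5, 6]],
}
collected = exporter_trip(subnets, task=len)  # task: count records per node
print(collected)
print(combine(collected))
```

Threads are used here only because they are the simplest way to show sub networks being processed concurrently; a real mobile-agent platform would move code and state between hosts rather than share one process.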
6. CONCLUSIONS

A Distribution Management System (DMS) is a collection of applications designed to analyze, monitor and control the entire distribution network efficiently and reliably. This article describes a new multi-agent system for the management of distributed systems. The system is proposed to optimize the execution of management functions in distributed systems at an effective computational cost. The proposed system can locate, monitor, and manage resources in the system. The new technique in the system allows management tasks to be submitted to the sub networks of the distributed system and executed in parallel. The proposed system uses two mobile agent types: the Representative agent and the Collector agent. The first is used to submit tasks to the sub networks of the distributed system, and the other collects results from these sub networks. The proposed system is compared against traditional management techniques in terms of response time, speedup, and efficiency. The performance results indicate a significant improvement in response time, speedup, efficiency, and scalability compared to traditional techniques. The use of the JVM in the implementation of the proposed system gives it a degree of portability, so that it can be used anywhere. Therefore, it is desirable to use the proposed system in the management of distributed systems. The proposed system is limited to high-speed networks that have a bandwidth of more than 100 Mb/s. Future research will address the security of data, of mobile agents, and of the hosts that receive them in the context of public networks. Mobile agents should be protected against potentially malicious hosts. The hosts should also be protected against malicious actions that may be performed by the mobile code they receive and execute. Effective security must be provided for the multi-agents in the distributed data mining environment.

REFERENCES:

[1] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI) Press/MIT, 1996.
[2] F. Provost. Distributed Data Mining: Scaling Up and Beyond. In Proceedings of the Advances in Distributed and Parallel Knowledge Discovery, MIT/AAAI Press, Cambridge, MA, New York, pages (3-27), 1999.
[3] R. Schollmeier. A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications. In Proceedings of the First International Conference on Peer-to-Peer Computing (P2P), IEEE, 2001.
[4] I. Rudowsky. Intelligent Agents, volume 14, pages (275-290). In Proceedings of the Communications of the Association for Information Systems, Springer, London, England, 2004.
[5] Michael Wooldridge. An Introduction to Multiagent Systems. John Wiley & Sons, Chichester, England, February 2002. ISBN 0 47149691X.
[6] T. Marwala and E. Hurwitz. Multi-Agent Modelling using intelligent agents in a game of Lerpa. eprint arXiv:0706.0280, 2007.
[7] B. van Aardt and T. Marwala. A Study in a Hybrid Centralised-Swarm Agent Community. In Proceedings of the IEEE 3rd International Conference on Computational Cybernetics, Mauritius, pages (169-74), 2005.
[8] A. Symeonidis and P. Mitkas. Agent Intelligence Through Data Mining, volume XXVI, pages (0-206). In Proceedings of the Multi-agent Systems, Artificial Societies, and Simulated Organizations, 2006.
[9] L. Cao, C. Luo, and C. Zhang. Agent-Mining Interaction: An Emerging Area. In Proceedings of the AIS-ADM, LNAI 4476, Springer-Verlag, Berlin, Germany, pages (60-73), 2007.
[10] R. Bose and V. Sugumaran. IDM: An Intelligent Software Agent Based Data Mining Environment. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 1998.
[11] J. Botía, A. Gómez-Skarmeta, M. Valdés, and A. Metala. A meta-learning architecture. In Proceedings of the Computational Intelligence. Theory and Applications, 2206/2001, pages (688-698), 2001.
[12] J. Balter, A. Labarre-Vila, D. Zibelin, and C. Garbay. A Platform Integrating Knowledge and Data Management for EMG Studies. In Proceedings of the Artificial Intelligence in Medicine (AIME), 2101, pages (417-420), 2001.
[13] R. Grossman and A. Turinsky. A framework for finding distributed data mining strategies that are intermediate between centralized strategies and in-place strategies. In Proceedings of the KDD Workshop on Distributed Data Mining, 2000.
[14] Souptik Datta, Kanishka Bhaduri, Chris Giannella, Ran Wolff, and Hillol Kargupta. Distributed Data Mining in Peer-to-Peer Networks. IEEE Internet Computing, vol. 10, no. 4, pp. 18-26, July/Aug. 2006, doi:10.1109/MIC.2006.74.
[15] R. Lakshman Naik, D. Ramesh and B. Manjula. Instances Selection Using Advance Data Mining Techniques. International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 2, 2012, pp. 47-53, ISSN Print: 0976 - 6367, ISSN Online: 0976 - 6375.
[16] M. Karthikeyan, M. Suriya Kumar and S. Karthikeyan. A Literature Review on the Data Mining and Information Security. International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 141-146, ISSN Print: 0976 - 6367, ISSN Online: 0976 - 6375.
[17] R. Manickam, D. Boominath and V. Bhuvaneswari. An Analysis of Data Mining: Past, Present and Future. International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 1-9, ISSN Print: 0976 - 6367, ISSN Online: 0976 - 6375.
