PPT

  1. 1. Towards Using Grid Services for Mining Fuzzy Association Rules. Mihai Gabroveanu, Ion Iancu, Mirel Cosulschi, Nicolae Constantinescu. Faculty of Mathematics and Computer Science, University of Craiova, Romania. {mihaiug, mirelc, nikyc}@central.ucv.ro, i iancu@yahoo.com
  2. 2. Introduction
     - In this paper we show how the Knowledge Grid infrastructure can be used to implement a distributed algorithm for mining fuzzy association rules from distributed databases over a Grid network.
     - [Figure: Grid network + fuzzy mining]
  3. 3. Outline
     - Knowledge Grid services
     - Distributed fuzzy association rules mining
     - Distributed problem definition
     - The distributed algorithm
     - Rules mining implementation over the Grid
     - Conclusion
  4. 4. Knowledge Grid Services-1
     - The Knowledge Grid ([4], [5], [6]) defines an integrating architecture for distributed data mining and knowledge discovery.
     - It uses basic grid services to build specific knowledge services.
     - The Core K-grid layer offers services implemented directly on top of generic grid services.
     - The High level K-grid layer is used to describe, develop and execute distributed knowledge discovery computations.
  5. 5. Knowledge Grid Services-2
     - Knowledge directory service (KDS). This service extends the basic Globus MDS service and is responsible for maintaining a description of all the data and tools used in the Knowledge Grid. It uses metadata stored in a Knowledge Metadata Repository (KMR). The Knowledge Base Repository (KBR) maintains the discovered knowledge. Another important repository is the Knowledge Execution Plan Repository (KEPR), which stores the execution plans of data mining processes.
     - Resource allocation and execution management service (RAEMS). This service is used to find the best mapping between an execution plan and the available resources, with the goal of satisfying the application requirements.
  6. 6. Knowledge Grid Services-2
     - Data Access Service (DAS). This service is responsible for the search and selection (data search services) and for the extraction, transformation and delivery (data extraction service) of the data to be mined.
     - Tools and algorithms access service (TASS). This service is responsible for the search, selection and downloading of data mining tools and algorithms.
     - Execution plan management service (EPMS). This service is a semi-automatic tool that takes the data and programs selected by the user and generates a set of possible plans that meet the requirements and constraints of the user, the data and the algorithms.
     - Results presentation service (RPS). This service specifies how to generate, present and visualize the extracted models.
  7. 7. Distributed fuzzy association rules mining-1: database $DB = \{t_1, \ldots, t_n\}$, set of items $I = \{i_1, \ldots, i_m\}$. Example: $I = \{\text{Age}, \text{Income}, \text{Weight}\}$.
  8. 8. Distributed fuzzy association rules mining-2: For example, for the attribute Weight we can consider the following three fuzzy sets: "thin", "middle" and "fat", so that $F_{\text{Weight}} = \{\text{thin}, \text{middle}, \text{fat}\}$.
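A fuzzy set such as "thin" assigns each weight a membership degree in [0, 1]. The slides do not give the shapes of these sets, so the piecewise-linear shapes and the breakpoints (in kilograms) in this minimal sketch are purely illustrative assumptions, as are the names `ramp_up`, `WEIGHT_SETS` and `fuzzify`.

```python
def ramp_up(x, lo, hi):
    """Rise linearly from 0 at lo to 1 at hi, clamped to [0, 1]."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


# Hypothetical breakpoints in kilograms; they are not taken from the slides.
WEIGHT_SETS = {
    "thin": lambda w: 1.0 - ramp_up(w, 50, 65),
    "middle": lambda w: min(ramp_up(w, 50, 65), 1.0 - ramp_up(w, 80, 95)),
    "fat": lambda w: ramp_up(w, 80, 95),
}


def fuzzify(value, fuzzy_sets):
    """Return the membership degree of value in every fuzzy set."""
    return {name: mu(value) for name, mu in fuzzy_sets.items()}


print(fuzzify(57.5, WEIGHT_SETS))
```

With these assumed breakpoints, a weight halfway between 50 and 65 belongs partly to "thin" and partly to "middle", which is the overlap that fuzzy sets are meant to capture.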
  9. 9. Distributed fuzzy association rules mining-3: $\langle X, F_X \rangle = \langle \{\text{Age}, \text{Income}\}, \{\text{young}, \text{high}\} \rangle$
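  10. 10. Distributed fuzzy association rules mining-4
      - $X = \{\text{Age}, \text{Income}\}$, $Y = \{\text{Weight}\}$, $F_X = \{\text{middle}, \text{high}\}$, $F_Y = \{\text{fat}\}$
      - Rule: $\langle X, F_X \rangle \Rightarrow \langle Y, F_Y \rangle$, that is, $\langle \{\text{Age}, \text{Income}\}, \{\text{middle}, \text{high}\} \rangle \Rightarrow \langle \{\text{Weight}\}, \{\text{fat}\} \rangle$
      - Read as: "If Age is middle and Income is high then Weight is fat".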
  11. 11. Distributed fuzzy association rules mining-4
      - Membership degrees of two transactions in $\langle \{\text{Age}, \text{Income}\}, \{\text{middle}, \text{high}\} \rangle$: $T_1$ has degrees $\{0.5, 1\}$ and $T_2$ has degrees $\{1, 1\}$.
      - The fuzzy support value of the itemset $\langle X, F_X \rangle = \langle \{\text{Age}, \text{Income}\}, \{\text{middle}, \text{high}\} \rangle$ is $\frac{0.5 \cdot 1 + 1 \cdot 1}{2} = \frac{1.5}{2} = 0.75$.
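The support computation on this slide multiplies the membership degrees within each transaction and averages the products over all transactions. A minimal sketch of that arithmetic follows; the record layout (a dict keyed by hypothetical `(attribute, fuzzy_set)` pairs) and the function name `fuzzy_support` are assumptions made for illustration.

```python
from math import prod


def fuzzy_support(records, attributes, fuzzy_sets):
    """Average, over all records, of the product of membership degrees."""
    total = 0.0
    for record in records:
        total += prod(record[(a, f)] for a, f in zip(attributes, fuzzy_sets))
    return total / len(records)


# Membership degrees of the two transactions from the slide.
records = [
    {("Age", "middle"): 0.5, ("Income", "high"): 1.0},  # T1
    {("Age", "middle"): 1.0, ("Income", "high"): 1.0},  # T2
]

support = fuzzy_support(records, ["Age", "Income"], ["middle", "high"])
print(support)  # 0.75
```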
  12. 12. Distributed fuzzy association rules mining-5: An association rule is considered interesting if its support is at least the minimum support threshold (minsup) and its confidence is at least the minimum confidence threshold (minconf). Such a rule is called a strong rule.
  13. 13. Distributed fuzzy association rules mining-6: The problem of sequential mining of fuzzy association rules can be decomposed into two subproblems:
      1. find all large fuzzy itemsets;
      2. generate the fuzzy association rules from the large fuzzy itemsets found.
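The second subproblem can be sketched as follows. The slides do not reproduce the confidence formula, so this sketch assumes the common definition, confidence = support(X ∪ Y) / support(X); the function name `generate_rules` and the data layout are hypothetical. The supports in the example are the ones obtained from the two transactions T1 and T2 of the support example.

```python
from itertools import combinations


def generate_rules(large_itemsets, minconf):
    """large_itemsets maps frozensets of (attribute, fuzzy_set) pairs to supports."""
    rules = []
    for itemset, support in large_itemsets.items():
        # Try every non-empty proper subset as the antecedent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                antecedent_support = large_itemsets.get(antecedent)
                if not antecedent_support:
                    continue  # support unknown or zero: skip
                confidence = support / antecedent_support
                if confidence >= minconf:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules


age_middle = ("Age", "middle")
income_high = ("Income", "high")
large_itemsets = {
    frozenset([age_middle]): 0.75,
    frozenset([income_high]): 1.0,
    frozenset([age_middle, income_high]): 0.75,
}

rules = generate_rules(large_itemsets, minconf=0.8)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "=>", sorted(consequent), confidence)
```

Under the assumed definition, only "Age is middle ⇒ Income is high" passes a minconf of 0.8; the reverse rule has confidence 0.75.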
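  14. 14. Example
      - Membership degrees of the two records, as used in the products below: record 1 has young = 1, old = 0, thin = 0.5, fat = 0; record 2 has young = 0, old = 0.5, thin = 0.5, fat = 1.
      - $\langle \{\text{Age}, \text{Weight}\}, \{\text{young}, \text{thin}\} \rangle \Rightarrow 1 \cdot 0.5 + 0 \cdot 0.5 = 0.5$
      - $\langle \{\text{Age}, \text{Weight}\}, \{\text{young}, \text{fat}\} \rangle \Rightarrow 1 \cdot 0 + 0 \cdot 1 = 0$
      - $\langle \{\text{Age}, \text{Weight}\}, \{\text{old}, \text{thin}\} \rangle \Rightarrow 0 \cdot 0.5 + 0.5 \cdot 0.5 = 0.25$
      - $\langle \{\text{Age}, \text{Weight}\}, \{\text{old}, \text{fat}\} \rangle \Rightarrow 0 \cdot 0 + 0.5 \cdot 1 = 0.5$
      - Itemsets whose support count exceeds minsup are the large fuzzy itemsets.
      - [Table residue: raw values 70, 30, 40, 15 under the columns weight and age]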
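The four products in this example can be enumerated mechanically by taking every combination of one fuzzy set per attribute. The sketch below does that for the two records of the slide; the threshold value `minsup_count` is a hypothetical choice, since the slide does not state one, and the function name `support_counts` is likewise an assumption.

```python
from itertools import product

# Membership degrees read off the products in the example (two records).
records = [
    {"Age": {"young": 1.0, "old": 0.0}, "Weight": {"thin": 0.5, "fat": 0.0}},
    {"Age": {"young": 0.0, "old": 0.5}, "Weight": {"thin": 0.5, "fat": 1.0}},
]


def support_counts(records, attributes):
    """Support count of every combination of one fuzzy set per attribute."""
    counts = {}
    set_names = [sorted(records[0][a]) for a in attributes]
    for combo in product(*set_names):
        count = 0.0
        for record in records:
            degree = 1.0
            for attribute, fuzzy_set in zip(attributes, combo):
                degree *= record[attribute][fuzzy_set]
            count += degree
        counts[combo] = count
    return counts


counts = support_counts(records, ["Age", "Weight"])
minsup_count = 0.4  # hypothetical threshold, not given on the slide
large = {combo for combo, count in counts.items() if count > minsup_count}
print(counts)
print(large)
```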
  15. 15. Distributed problem definition-1
      - Let $DB = \{DB_1, DB_2, \ldots, DB_n\}$ be a distributed database over $n$ sites $S_1, S_2, \ldots, S_n$.
      - [Figure: the partitions $DB_1, DB_2, \ldots, DB_n$]
  16. 16. Distributed problem definition-2
  17. 17. Distributed problem definition-3
  18. 18. Distributed problem definition-4
  19. 19. Distributed problem definition-5: Distributed Mining of Fuzzy Association Rules. Given the set of items $I$, the distributed database $DB = \{DB_1, DB_2, \ldots, DB_n\}$, the fuzzy sets associated with the attributes from $I$, the minimum support threshold (minsup) and the minimum confidence threshold (minconf), extract all global fuzzy association rules:
      1. find all global large fuzzy itemsets;
      2. generate the global fuzzy association rules from the global large fuzzy itemsets found.
  20. 21. Fuzzy Count Distribution Algorithm [Diagram; recoverable labels: local large fuzzy 1-itemsets (one per site), global large candidate 1-itemsets $CA(1)$, globally large fuzzy 1-itemsets $L(1)$, and the candidate generation step $CA(k) = \text{Fuzzy\_Apriori\_Gen}(L(k-1))$.]
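Only the labels of the algorithm diagram survive, so the following is a rough sketch in the general style of count distribution, not the authors' exact algorithm: every site counts the same candidates on its own partition, only the counts are summed, and the next candidates are generated from the globally large itemsets. The bodies of `fuzzy_apriori_gen` and `local_count`, the record layout, and the omission of any pruning based on local large itemsets are all assumptions. The example splits the two records of the earlier Age/Weight example across two sites.

```python
from itertools import combinations


def local_count(partition, itemset):
    """Sum over local records of the product of membership degrees."""
    total = 0.0
    for record in partition:
        degree = 1.0
        for item in itemset:
            degree *= record.get(item, 0.0)
        total += degree
    return total


def fuzzy_apriori_gen(previous_large, k):
    """Join (k-1)-itemsets into k-itemsets, then prune by the subset property."""
    candidates = set()
    for a, b in combinations(previous_large, 2):
        union = a | b
        attributes = {attribute for attribute, _ in union}
        if len(union) != k or len(attributes) != k:
            continue  # wrong size, or two fuzzy sets of the same attribute
        if all(frozenset(s) in previous_large for s in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


def fuzzy_count_distribution(partitions, items, minsup):
    """Return the globally large fuzzy itemsets with their global supports."""
    total_records = sum(len(p) for p in partitions)
    candidates = {frozenset([item]) for item in items}
    large, k = {}, 1
    while candidates:
        # Each site counts locally; only the counts are combined globally.
        counts = {c: sum(local_count(p, c) for p in partitions) for c in candidates}
        current = {c: n / total_records for c, n in counts.items()
                   if n / total_records >= minsup}
        large.update(current)
        k += 1
        candidates = fuzzy_apriori_gen(set(current), k)
    return large


young, old = ("Age", "young"), ("Age", "old")
thin, fat = ("Weight", "thin"), ("Weight", "fat")
site_1 = [{young: 1.0, old: 0.0, thin: 0.5, fat: 0.0}]
site_2 = [{young: 0.0, old: 0.5, thin: 0.5, fat: 1.0}]

large = fuzzy_count_distribution([site_1, site_2], [young, old, thin, fat], minsup=0.25)
for itemset, support in large.items():
    print(sorted(itemset), support)
```

The pruning step is valid for product-based fuzzy support because multiplying by another degree in [0, 1] can never increase a record's contribution, so a superset of a non-large itemset cannot be large.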
  21. 22. Rules mining implementation over the Grid-1 Distributed Rules Mining Scenario
  22. 23. Rules mining implementation over the Grid-2: To present the implementation of this process in a Grid network, we assume that:
      - the database DB is stored on K-grid node NodeA;
      - the tools needed for mining association rules (the partitioner P, the frequent itemsets mining tool and the association rules extractor) are available as multiplatform executables on K-grid node NodeS;
      - the results will be stored in the Knowledge Base Repository (KBR) on NodeU.
  23. 24. Rules mining implementation over the Grid-3
      - Suppose that a Grid User (GU) needs to extract all association rules from database DB using the tools available on K-grid node NodeS.
      - Step 1. The GU starts the search for computational resources for executing the data mining process from the K-grid node NodeU. The KDS (Knowledge Directory Service) is used to locate the computational resources needed to execute the mining process.
  24. 25. Rules mining implementation over the Grid-4
      - Step 2. The GU builds an execution plan for the data mining task, specifying strategies for tool and data movements. The execution plan is constructed using the EPMS (Execution Plan Management Service) and is stored in the local KEPR (Knowledge Execution Plan Repository).
      - Step 3. The GU sends the execution plan to the RAEMS (Resource Allocation and Execution Management Service), which starts the application.
      - Step 4. The GU visualizes and evaluates the result of the computation stored in the KBR by means of the RPS (Results Presentation Service) tools.
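To make the scenario concrete, an execution plan can be pictured as a small record naming the nodes involved and the ordered tasks. The sketch below is purely illustrative: `ExecutionPlan` and `build_plan` are hypothetical names and do not correspond to any actual Knowledge Grid or Globus API. Only the node names, tools and repositories come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionPlan:
    """Hypothetical stand-in for a plan built with the EPMS and kept in the KEPR."""
    data_node: str
    tools_node: str
    results_node: str
    steps: list = field(default_factory=list)


def build_plan(data_node, tools_node, results_node):
    """Assemble the ordered tasks of the distributed rules mining scenario."""
    plan = ExecutionPlan(data_node, tools_node, results_node)
    plan.steps = [
        f"partition DB on {data_node} with the partitioner P from {tools_node}",
        f"mine frequent itemsets with the tool from {tools_node}",
        f"extract association rules with the extractor from {tools_node}",
        f"store the rules in the KBR on {results_node}",
    ]
    return plan


plan = build_plan("NodeA", "NodeS", "NodeU")
for number, step in enumerate(plan.steps, start=1):
    print(number, step)
```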
  25. 26. Conclusion
      - In this article we propose an implementation of a distributed algorithm for mining fuzzy association rules from distributed databases in a Knowledge Grid environment.
      - The proposed algorithm relies heavily on properties of global large fuzzy itemsets and local large fuzzy itemsets to reduce the amount of computation.
  26. 28. Knowledge Grid Services-2
