Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008

ESTABLISHING KNOWLEDGE BASE OF EXPERT SYSTEM WITH ASSOCIATION RULES

DONG-LAI MA(1), WEN-JING ZHANG(2), BIN DONG(3), PENG YANG(2), HAI-XIA LU(2)
(1) Teaching Affair Office, Hebei Software Institute, Baoding 071000, PR China
(2) College of Information Science & Technology, Agricultural University of Hebei, Baoding 071001, PR China
(3) Computer Center, Hebei University Affiliated Hospital, Baoding 071000, PR China
E-MAIL: zwjndjs@hotmail.com, dbin2000@hotmail.com

978-1-4244-2096-4/08/$25.00 ©2008 IEEE

Abstract:
Both expert systems and data mining belong to the field of Artificial Intelligence. Association rule mining is a data mining method whose typical application is market basket analysis in supermarkets. The main task of an expert system is reasoning, while that of association rule mining is to find valuable relationships among data items. By modifying the Apriori algorithm and the rule-generation method, we mine decision rules from a database that can be applied in an expert system, and thereby obtain a method of mining decision rules with association rules.

Keywords:
Data Mining; Association Rule; Apriori; Expert System; Knowledge Base

1. Introduction

An expert system is a knowledge system that takes the computer as a tool and uses expert knowledge and inference to understand and solve problems [1]. It imitates the high-level inferential activity of a human expert, and uses the computer to reason over domain knowledge that conforms to the described model [2]. An expert system is made up of four parts: knowledge base, inference engine, knowledge acquisition, and explanation interface, with the knowledge base and inference engine as its kernel.

Data mining extracts information or patterns that are implicit, previously unknown, and valuable from large databases or data warehouses. Association rule mining is one of the main models of current data mining. It emphasizes the relations among data in different fields, and finds dependency relationships among items that satisfy given thresholds of support and confidence. Mining association rules (knowledge) means searching out all valuable relationships among items in a given database on statistical principles.

2. Introduction to Association Rules

In 1993, Agrawal first proposed mining association rules among items in a customer transaction database [3]. The mining of association rules discovers interesting relevance or correlation relationships among a great number of data items. The first step is mining frequent itemsets with Apriori; association rules are then produced from the frequent itemsets mined [4].

2.1. Mining frequent itemsets: the Apriori algorithm

In 1994, Agrawal proposed an important method of mining association rules in a customer transaction database [5, 6]. Its kernel is Apriori, which is based on a two-phase, level-wise derivation of frequent itemsets. Apriori is the basic algorithm for mining Boolean association rules. It mines frequent itemsets iteratively, producing (k+1)-itemsets from k-itemsets. After scanning the database, the algorithm begins to iterate. In each pass it joins pairs of frequent (k-1)-itemsets, and produces the frequent k-itemsets after pruning with the Apriori property, namely "every subset of a frequent itemset must also be frequent", and with the minimum support threshold. Apriori ends when no further frequent itemsets can be produced.

2.2. Making association rules from frequent itemsets [7, 8, 9]

After all frequent itemsets in the database have been mined, strong association rules are produced which satisfy both the minimum support (min_sup) and the minimum confidence (min_conf). The procedure is as follows:
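The level-wise search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `min_sup` is taken as an absolute count, and the function name is mine.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise frequent-itemset mining: frequent (k-1)-itemsets are
    joined into candidate k-itemsets, pruned by the Apriori property,
    then counted against the database. min_sup is an absolute count."""
    transactions = [frozenset(t) for t in transactions]
    # One scan of the database gives the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {frozenset([i]) for i, c in counts.items() if c >= min_sup}
    all_frequent = set(frequent)
    while frequent:
        k = len(next(iter(frequent)))
        # Join step: unite two k-itemsets that differ in exactly one item.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Prune step: every (k)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        # Count support and keep the candidates meeting min_sup.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_sup}
        all_frequent |= frequent
    return all_frequent
```

On a toy database the loop stops as soon as a level yields no frequent itemsets, exactly as the text describes.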
1. For each frequent itemset L, produce all of its non-null subsets.
2. For each non-null subset S of L, generate the association rule "s → (l - s)" if its confidence is not less than min_conf.

This is the classic algorithm for association rules. Here, we modify this method in order to build up the knowledge base of an expert system.

3. Establishing the knowledge base of the expert system with association rules

The modification to the algorithm involves three aspects: first, the method of establishing items in the database; second, the method of linking items; and third, the method of producing association rules from frequent itemsets.

The following illustrates these three modifications with the example of establishing the knowledge base of an agricultural expert system with association rules.

In the agricultural expert system, there are several disease factors for judging a crop disease: color of the disease spot (black brown, pink, brown), position of the disease spot (leaf, hull), shape of the disease spot (circularity, hemicycle, irregularity), and character of the disease spot (none, slightly caved, caved). From these factors we can get the disease name (Anthracnose, India Anthracnose, Cornu spot disease). The following data come from former experience:

1. black brown spot, leaf disease, circularity, none characters → Anthracnose
2. black brown spot, leaf disease, circularity, none characters → Anthracnose
3. pink spot, hull disease, hemicycle, slightly caved → Anthracnose
4. brown spot, leaf disease, circularity, none characters → India Anthracnose
5. brown spot, leaf disease, circularity, none characters → India Anthracnose
6. brown spot, hull disease, circularity, caved → Cornu spot disease
7. black brown spot, hull disease, irregularity, none characters → Cornu spot disease
8. black brown spot, leaf disease, circularity, none characters → Anthracnose
9. pink spot, hull disease, hemicycle, slightly caved → Anthracnose
10. black brown spot, hull disease, irregularity, none characters → Anthracnose

3.1. Establishing items

The knowledge base of the agricultural expert system is established from these crop disease data. In order to use Apriori, we take the disease factors (disease spot color, disease spot position, disease spot shape, disease spot character, and disease name) as the items of Apriori, and number them 1, 2, 3, 4, 5 in turn. Because each disease factor has several different conditions (we call them factor values), we also number each condition. In this way we take the form "disease factor number.factor value number" to subdivide the items. The result is as follows:

1. disease spot color: 1.1, black brown; 1.2, pink; 1.3, brown
2. disease spot position: 2.1, leaf disease; 2.2, hull disease
3. disease spot shape: 3.1, circularity; 3.2, hemicycle; 3.3, irregularity
4. disease characters: 4.1, none characters; 4.2, slightly caved; 4.3, caved
5. disease name: 5.1, Anthracnose; 5.2, India Anthracnose; 5.3, Cornu spot disease

Then we take the subdivided items of the form "disease factor number.factor value number" as the final items operated on by Apriori. After preparing these items, we begin to mine the frequent itemsets with Apriori. Here, we assume that min_sup and min_conf are both one.

The initial database D built from the experience data is as follows (Table 1):

Table 1: Initial database D

ID    Item
001   1.1, 2.1, 3.1, 4.1, 5.1
002   1.1, 2.1, 3.1, 4.1, 5.1
003   1.2, 2.2, 3.2, 4.2, 5.1
004   1.3, 2.1, 3.1, 4.1, 5.2
005   1.3, 2.1, 3.1, 4.1, 5.2
006   1.3, 2.2, 3.1, 4.3, 5.3
007   1.1, 2.2, 3.3, 4.1, 5.3
008   1.1, 2.1, 3.1, 4.1, 5.1
009   1.2, 2.2, 3.2, 4.2, 5.1
010   1.1, 2.2, 3.3, 4.1, 5.1

3.2. Mining frequent k-itemsets

The Apriori algorithm scans the database to obtain the candidate 1-itemsets; selecting those with support of at least min_sup = 1 gives the frequent 1-itemsets, as follows (Table 2):
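The 1-itemset supports in Table 2 can be reproduced with a single scan of D. This sketch simply hard-codes the ten coded records of Table 1 as Python sets:

```python
from collections import Counter

# Database D from Table 1: one set of coded items per record.
D = [
    {"1.1", "2.1", "3.1", "4.1", "5.1"},  # 001
    {"1.1", "2.1", "3.1", "4.1", "5.1"},  # 002
    {"1.2", "2.2", "3.2", "4.2", "5.1"},  # 003
    {"1.3", "2.1", "3.1", "4.1", "5.2"},  # 004
    {"1.3", "2.1", "3.1", "4.1", "5.2"},  # 005
    {"1.3", "2.2", "3.1", "4.3", "5.3"},  # 006
    {"1.1", "2.2", "3.3", "4.1", "5.3"},  # 007
    {"1.1", "2.1", "3.1", "4.1", "5.1"},  # 008
    {"1.2", "2.2", "3.2", "4.2", "5.1"},  # 009
    {"1.1", "2.2", "3.3", "4.1", "5.1"},  # 010
]

# One scan of D yields the support (as a count) of every candidate 1-item.
support = Counter(item for record in D for item in record)
```

For example, `support["1.1"]` is 5 and `support["4.1"]` is 7, matching Table 2.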
Table 2: frequent 1-itemsets

Item              1.1   1.2   1.3   2.1   2.2   3.1   3.2
Support degree      5     2     3     5     5     6     2

Item              3.3   4.1   4.2   4.3   5.1   5.2   5.3
Support degree      2     7     2     1     6     2     2

The classic Apriori algorithm links two frequent (k-1)-itemsets if their first (k-2) items are the same. For example, linking "1.1, 2.1, 3.1" and "1.1, 2.1, 3.2" gives "1.1, 2.1, 3.1, 3.2". However, such itemsets are not linked while we build up the knowledge base. This is because "3.1" and "3.2" are two different values of one factor (both describe the shape of the disease spot). They exclude each other, so it is impossible for them to occur simultaneously. Therefore, in this case we do not link them together.
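The extra check added to the link step (only the check, not the full join) can be sketched as follows; the factor is read off the part of the item code before the dot, and the function names are illustrative:

```python
def factor(item):
    """The factor number is the part of the item code before the dot,
    e.g. factor("3.1") == "3" (disease spot shape)."""
    return item.split(".")[0]

def can_link(itemset_a, itemset_b):
    """Allow joining two frequent (k-1)-itemsets only if the resulting
    k-itemset never contains two values of the same factor, since values
    such as "3.1" and "3.2" exclude each other."""
    union = set(itemset_a) | set(itemset_b)
    factors = [factor(item) for item in union]
    return len(factors) == len(set(factors))
```

With this check, "1.1, 2.1, 3.1" and "1.1, 2.1, 3.2" are rejected, while "1.1, 2.1, 3.1" and "1.1, 2.1, 4.1" may still be joined.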
Thus, after the linking work and the selection with min_sup = 1, we get the frequent k-itemsets that satisfy min_sup. For the above example of crop diseases, using the modified Apriori algorithm we finally get the frequent 5-itemsets, as follows (Table 3):

Table 3: frequent 5-itemsets

Item                        Support degree
1.1, 2.1, 3.1, 4.1, 5.1          3
1.1, 2.2, 3.3, 4.1, 5.1          1
1.2, 2.2, 3.2, 4.2, 5.1          2
1.3, 2.1, 3.1, 4.1, 5.2          2
1.1, 2.2, 3.3, 4.1, 5.3          1
1.3, 2.2, 3.1, 4.3, 5.3          1

After the frequent 5-itemsets are mined, no new frequent itemsets can be produced, so the algorithm ends. The following job is to make association rules from the frequent itemsets we have got.

3.3. Producing rules

In the classic association algorithm, we find all non-null subsets of a frequent itemset L. For each non-null subset S, we produce an association rule "s → (l - s)" if its confidence is not less than min_conf. In the process of knowledge base establishment, however, what we need is the decision rule, not the association rule. In other words, the focal point is not the associative relationship among the attributes, but the result of their combination. Therefore, we only need to calculate the confidence of the subset that excludes the decision result item. For example, for the frequent itemset "1.2, 2.2, 3.2, 4.2, 5.1" we only need to calculate the confidence of "1.2, 2.2, 3.2, 4.2", that is, s = "1.2, 2.2, 3.2, 4.2" (because 5.1 is the decision result item: the disease name). So, when its confidence is not less than min_conf, we can produce the rule s → (l - s), i.e. "1.2, 2.2, 3.2, 4.2 → 5.1". Finally, translating back through the original attribute numbering, we get the comprehensible rule "pink spot ∧ hull disease ∧ hemicycle ∧ slightly caved → Anthracnose". (This means that if the crop has a pink, hemicycle, slightly caved spot on the hull, the disease might be Anthracnose.) The rules for the above crop disease example are as follows (sup denotes support degree and conf denotes confidence degree):

1. black brown spot, hull disease, irregularity, none characters → Anthracnose (sup: 0.2, conf: 0.5)
2. black brown spot, hull disease, irregularity, none characters → Cornu spot disease (sup: 0.2, conf: 0.5)
3. black brown spot, leaf disease, circularity, none characters → Anthracnose (sup: 0.3, conf: 1)
4. pink spot, hull disease, hemicycle, slightly caved → Anthracnose (sup: 0.2, conf: 1)
5. brown spot, leaf disease, circularity, none characters → India Anthracnose (sup: 0.2, conf: 1)
6. brown spot, hull disease, circularity, caved → Cornu spot disease (sup: 0.1, conf: 1)

Finally, all the rules are stored in the database, and the knowledge base of the expert system is established. Then, according to the rules table in the knowledge base, the system can output the corresponding decision rules after the user inputs some factors of a crop disease.
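The modified rule step, which splits a frequent 5-itemset into its condition items and its decision item (factor 5, the disease name), might be sketched like this; the function name and tuple layout are mine, and support is computed as a fraction of the database size:

```python
def decision_rule(frequent_itemset, transactions, min_conf):
    """Emit the single decision rule "conditions -> disease" from one
    frequent itemset, keeping it only if its confidence reaches min_conf.
    Only the antecedent that excludes the decision item is considered,
    instead of enumerating all non-null subsets."""
    antecedent = frozenset(i for i in frequent_itemset if not i.startswith("5."))
    consequent = frozenset(frequent_itemset) - antecedent
    n = len(transactions)
    sup_l = sum(1 for t in transactions if frozenset(frequent_itemset) <= t) / n
    sup_a = sum(1 for t in transactions if antecedent <= t) / n
    confidence = sup_l / sup_a if sup_a else 0.0
    if confidence >= min_conf:
        return antecedent, consequent, sup_l, confidence
    return None
```

Applied to the frequent itemset "1.2, 2.2, 3.2, 4.2, 5.1" over a database in which the antecedent always co-occurs with 5.1, this yields the rule with confidence 1, corresponding to the "pink spot ∧ hull disease ∧ hemicycle ∧ slightly caved → Anthracnose" rule above.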
4. Conclusions

By modifying the Apriori algorithm in three aspects, namely the method of establishing items, the method of connecting items, and the method of producing rules, this paper established the knowledge base of an expert system and validated the role of association rules in reasoning. Until now, this method has been validated in several fields, such as the hospital expert system. It is therefore a feasible method for establishing the knowledge base of an expert system.

References

[1] Ti-yun Huang. Intelligent Decision Support System. Electronic Industry Publishing Company, Beijing, 2001.
[2] Ahmed K M, El-Makky N M, Taha Y. A note on "Beyond market basket: Generalizing association rules to correlations." SIGKDD Explorations, pp. 46-48, January 2000.
[3] Joseph G, Gary R. The Principle and Program of Expert System. China Machine Press, Beijing, 2001.
[4] Zhu Ming. Data Mining. University of Science and Technology of China Press, Hefei, pp. 115-126, 2002.
[5] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216, May 1993.
[6] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large database. Technical Report FJ9839, IBM Almaden Research Center, San Jose, CA, June 1994.
[7] Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Bocca J B, Jarke M, Zaniolo C, eds. Proc. of the 20th Int'l Conf. on Very Large Data Bases (VLDB '94). Morgan Kaufmann, pp. 487-499, 1994.
[8] Agarwal R, Aggarwal C, Prasad V V V. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.
[9] Aggarwal C, Agarwal R, Prasad V V V. Depth first generation of long patterns. In: The 6th ACM SIGKDD Int'l Conf. on Knowledge Discovery & Data Mining, Boston, MA, USA, 2000.