Establishing knowledge base

Proceedings of the Seventh International Conference on Machine Learning and Cybernetics, Kunming, 12-15 July 2008

ESTABLISHING KNOWLEDGE BASE OF EXPERT SYSTEM WITH
ASSOCIATION RULES
DONG-LAI MA1, WEN-JING ZHANG2, BIN DONG3, PENG YANG2, HAI-XIA LU2
1 Teaching Affair Office, Hebei Software Institute, Baoding 071000, PR China
2 College of Information Science & Technology, Agricultural University of Hebei, Baoding 071001, PR China
3 Computer Center, Hebei University Affiliated Hospital, Baoding 071000, China
E-MAIL: zwjndjs@hotmail.com, dbin2000@hotamail.com

978-1-4244-2096-4/08/$25.00 ©2008 IEEE

Abstract:
Expert systems and data mining both belong to the field of artificial intelligence. Association rule mining is a data mining method whose typical application is the analysis of supermarket shopping baskets. The main task of an expert system is inference, while that of association rule mining is to find valuable relationships among data items. By modifying the Apriori algorithm and the method of generating rules, we mine decision rules from a database that can be applied in an expert system, and thereby obtain a method of mining decision rules with association rules.

Keywords:
Data Mining; Association Rule; Apriori; Expert System; Knowledge Base

1. Introduction

An expert system is a knowledge system that takes the computer as a tool and uses expertise and knowledge inference to understand and solve problems [1]. It imitates the macroscopic reasoning activity of an expert, and uses the computer to reason over domain knowledge that conforms to the described model [2]. An expert system is made up of four parts: knowledge base, inference engine, knowledge acquisition, and explanation interface, with the knowledge base and the inference engine as its kernel.

Data mining means extracting implicit, previously unknown, and valuable information or patterns from a large database or data warehouse. Association rules are one of the main models of current data mining. They emphasize the relations among data in different areas, and find the dependencies among several items that satisfy given thresholds of support degree and confidence degree. Mining association rules (knowledge) means searching out, on statistical principles, all existing valuable relationships among items in a given database.

2. Introduction to Association Rules

In 1993, Agrawal first proposed mining association rules among items in a customer transaction database [3]. Association rule mining discovers interesting associations or correlations among a large number of data items. It first mines frequent itemsets with Apriori, and then produces association rules from the frequent itemsets mined [4].

2.1. Mining frequent itemsets: Apriori algorithm

In 1994, Agrawal put forward an important method of mining association rules in a customer transaction database [5, 6]. Its kernel is Apriori, which is based on a two-phase derivation of frequent itemsets. Apriori is the basic algorithm for mining Boolean association rules. It mines frequent itemsets iteratively, level by level, producing (k+1)-itemsets from k-itemsets. After a first scan of the database, the algorithm begins to iterate. In each pass, it joins pairs of frequent (k-1)-itemsets and obtains the frequent k-itemsets after filtering by the Apriori property, that is, "every subset of a frequent itemset must also be frequent", and by the minimum support degree. Apriori ends when no more frequent itemsets can be produced.

2.2. Making association rules from frequent itemsets [7, 8, 9]

After all frequent itemsets in the database have been mined, strong association rules are produced, which satisfy both the minimum support degree (min_sup) and the minimum confidence degree (min_conf).
The procedure is as follows:
1. For each frequent itemset L, produce all subsets of it.
2. For each non-null subset S of L, generate the association rule "s → (l-s)" if its confidence degree is not less than min_conf.

This is the classic algorithm for association rules. Here, we modify this method in order to build the knowledge database of an expert system.

3. Establishing the knowledge database of the expert system with Association Rule

The algorithm is modified in three aspects: first, the method of establishing the items in the database; second, the method of linking items; and third, the method of producing association rules from frequent itemsets.

The following illustrates these three modifications with the example of establishing the knowledge database of an agricultural expert system with association rules.

In the agricultural expert system, several disease factors are used to judge a crop disease: color of the disease spot (black brown, pink, and brown), position of the disease spot (leaf, hull), shape of the disease spot (circularity, hemicycle, and irregularity), and characteristic of the disease spot (none, slightly caved, caved). With these factors, we can get the disease name (Anthracnose, India Anthracnose, Cornu spot disease). The following data were obtained from former experience:
1. black brown spot, leaf disease, circularity, none characters → Anthracnose
2. black brown spot, leaf disease, circularity, none characters → Anthracnose
3. pink spot, hull disease, hemicycle, slightly caved → Anthracnose
4. brown spot, leaf disease, circularity, none characters → India Anthracnose
5. brown spot, leaf disease, circularity, none characters → India Anthracnose
6. brown spot, hull disease, circularity, caved → Cornu spot disease
7. black brown spot, hull disease, irregularity, none characters → Cornu spot disease
8. black brown spot, leaf disease, circularity, none characters → Anthracnose
9. pink spot, hull disease, hemicycle, slightly caved → Anthracnose
10. black brown spot, hull disease, irregularity, none characters → Anthracnose

3.1. Establish item

The knowledge database of the agricultural expert system is established from these crop disease data. In order to use Apriori, we take the disease factors (disease spot color, disease spot position, disease spot shape, disease characters, and disease name) as the items of Apriori, and number them 1, 2, 3, 4, 5 in turn. Because each disease factor has several different conditions (we call them factor values), we also number each condition. In this way, we use the form "disease factor number.factor value number" to subdivide the items. The result is as follows:
1. disease spot color: 1.1, black brown; 1.2, pink; 1.3, brown
2. disease spot position: 2.1, leaf disease; 2.2, hull disease
3. disease spot shape: 3.1, circularity; 3.2, hemicycle; 3.3, irregularity
4. disease characters: 4.1, none characters; 4.2, slightly caved; 4.3, caved
5. disease name: 5.1, Anthracnose; 5.2, India Anthracnose; 5.3, Cornu spot disease

Then, we take the subdivided item, that is, "disease factor number.factor value number", as the final item on which Apriori operates. After these items are prepared, we begin to mine the frequent itemsets with Apriori. Here, we assume that min_sup and min_conf are both 1.
The initial database D built from the experience data is as follows (Table 1):

Table 1: Initial database D

ID    Item
001   1.1, 2.1, 3.1, 4.1, 5.1
002   1.1, 2.1, 3.1, 4.1, 5.1
003   1.2, 2.2, 3.2, 4.2, 5.1
004   1.3, 2.1, 3.1, 4.1, 5.2
005   1.3, 2.1, 3.1, 4.1, 5.2
006   1.3, 2.2, 3.1, 4.3, 5.3
007   1.1, 2.2, 3.3, 4.1, 5.3
008   1.1, 2.1, 3.1, 4.1, 5.1
009   1.2, 2.2, 3.2, 4.2, 5.1
010   1.1, 2.2, 3.3, 4.1, 5.1

3.2. Mine frequent k-itemsets

Use the Apriori algorithm to scan the database to obtain the candidate 1-itemsets, then filter them by min_sup 1 to get the frequent 1-itemsets, which are listed in Table 2.
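As a minimal sketch of the item encoding of Section 3.1 and of this first scan, the Python fragment below turns the ten experience records into "factor number.value number" items and counts the support of every 1-itemset. The names `FACTORS`, `RECORDS`, `encode`, and `frequent_1_itemsets` are illustrative assumptions and do not come from the paper; min_sup is treated as an absolute count of 1.

```python
from collections import Counter

# Factor values per disease factor, numbered as in Section 3.1 (names are illustrative).
FACTORS = {
    1: ["black brown", "pink", "brown"],                            # spot color
    2: ["leaf", "hull"],                                            # spot position
    3: ["circularity", "hemicycle", "irregularity"],                # spot shape
    4: ["none", "slightly caved", "caved"],                         # spot characters
    5: ["Anthracnose", "India Anthracnose", "Cornu spot disease"],  # disease name
}

# The ten experience records, one value per factor in factor order.
RECORDS = [
    ("black brown", "leaf", "circularity", "none", "Anthracnose"),
    ("black brown", "leaf", "circularity", "none", "Anthracnose"),
    ("pink", "hull", "hemicycle", "slightly caved", "Anthracnose"),
    ("brown", "leaf", "circularity", "none", "India Anthracnose"),
    ("brown", "leaf", "circularity", "none", "India Anthracnose"),
    ("brown", "hull", "circularity", "caved", "Cornu spot disease"),
    ("black brown", "hull", "irregularity", "none", "Cornu spot disease"),
    ("black brown", "leaf", "circularity", "none", "Anthracnose"),
    ("pink", "hull", "hemicycle", "slightly caved", "Anthracnose"),
    ("black brown", "hull", "irregularity", "none", "Anthracnose"),
]


def encode(record):
    """Map one record to items of the form 'factor number.value number'."""
    return tuple(
        f"{factor}.{FACTORS[factor].index(value) + 1}"
        for factor, value in enumerate(record, start=1)
    )


def frequent_1_itemsets(database, min_sup=1):
    """Count each item over all transactions and keep those reaching min_sup."""
    counts = Counter(item for transaction in database for item in transaction)
    return {item: n for item, n in counts.items() if n >= min_sup}


database = [encode(record) for record in RECORDS]
supports = frequent_1_itemsets(database, min_sup=1)

for item in sorted(supports):
    print(item, supports[item])
```

Each encoded record corresponds to one row of the initial database D, and the printed counts are the support degrees of the individual items.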



Table 2: frequent 1-itemsets

Item             1.1  1.2  1.3  2.1  2.2  3.1  3.2
Support degree     5    2    3    5    5    6    2

Item             3.3  4.1  4.2  4.3  5.1  5.2  5.3
Support degree     2    7    2    1    6    2    2

The classic Apriori algorithm links two frequent (k-1)-itemsets if their first (k-2) items are the same. For example, linking "1.1, 2.1, 3.1" and "1.1, 2.1, 3.2" gives "1.1, 2.1, 3.1, 3.2". However, such itemsets are not linked when we build the knowledge database, because "3.1" and "3.2" are two different values of one factor (both describe the shape of the disease spot). They exclude each other, so they cannot occur simultaneously. Therefore, in this case we do not link them. Thus, after the linking step and the selection by min_sup 1, we get the frequent k-itemsets that satisfy min_sup. For the above example of crop diseases, the modified Apriori algorithm finally yields the frequent 5-itemsets, as follows (Table 3):

Table 3: frequent 5-itemsets

Item                      Support degree
1.1, 2.1, 3.1, 4.1, 5.1   3
1.1, 2.2, 3.3, 4.1, 5.1   1
1.2, 2.2, 3.2, 4.2, 5.1   2
1.3, 2.1, 3.1, 4.1, 5.2   2
1.1, 2.2, 3.3, 4.1, 5.3   1
1.3, 2.2, 3.1, 4.3, 5.3   1

After the frequent 5-itemsets have been mined, no new frequent itemsets can be produced, so the algorithm ends. The next job is to make association rules from the frequent itemsets we have obtained.

3.3. Produce rules

In the classic association algorithm, we find all non-null subsets of a frequent itemset L. For each non-null subset S, we produce an association rule "s → (l-s)" if its confidence degree is not less than min_conf.

In the process of establishing the knowledge database, however, what we need is the decision rule, not the association rule. In other words, our focus is not the associations among attributes, but the result that a combination of attributes leads to. Therefore, we only need to calculate the confidence degree of the subset that excludes the decision result. For example, for the frequent itemset "1.2, 2.2, 3.2, 4.2, 5.1", we only need to calculate the confidence degree of "1.2, 2.2, 3.2, 4.2", that is to say, s = "1.2, 2.2, 3.2, 4.2" (because 5.1 is the decision result item: the disease name). When its confidence degree is not less than min_conf, we produce the rule s → (l-s), i.e. "1.2, 2.2, 3.2, 4.2 → 5.1". Finally, mapping the item numbers back to the original attributes, we get the readable rule: "pink spot ∧ hull disease ∧ hemicycle ∧ slightly caved → Anthracnose". (That is, if the crop has a pink, hemicycle, slightly caved spot on the hull, the disease might be Anthracnose.) The rules for the crop disease example above are as follows:

Rule form (sup denotes the support degree and conf the confidence degree):
1. black brown spot, hull disease, irregularity, none characters → Anthracnose
   sup: 0.2, conf: 0.5
2. black brown spot, hull disease, irregularity, none characters → Cornu spot disease
   sup: 0.2, conf: 0.5
3. black brown spot, leaf disease, circularity, none characters → Anthracnose
   sup: 0.3, conf: 1
4. pink spot, hull disease, hemicycle, slightly caved → Anthracnose
   sup: 0.4, conf: 1
5. brown spot, leaf disease, circularity, none characters → India Anthracnose
   sup: 0.2, conf: 1
6. brown spot, hull disease, circularity, caved → Cornu spot disease
   sup: 0.1, conf: 1

Finally, all the rules are stored in the database, and the knowledge database of the expert system is established. Then, according to the rule table in the knowledge database, the system can output the corresponding decision rules after the user inputs some factors of the crop disease.
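The modified linking step and the decision-rule step described in this section can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the `DECISION_FACTOR` constant, and the choice of `min_conf=0.5` in the usage lines are hypothetical, min_sup is an absolute count, and support is computed as the fraction of records that contain the whole itemset, so individual support values need not coincide with those in the rule list above.

```python
from itertools import combinations

# Initial database D (Table 1), one tuple of items per record.
DATABASE = [
    ("1.1", "2.1", "3.1", "4.1", "5.1"),
    ("1.1", "2.1", "3.1", "4.1", "5.1"),
    ("1.2", "2.2", "3.2", "4.2", "5.1"),
    ("1.3", "2.1", "3.1", "4.1", "5.2"),
    ("1.3", "2.1", "3.1", "4.1", "5.2"),
    ("1.3", "2.2", "3.1", "4.3", "5.3"),
    ("1.1", "2.2", "3.3", "4.1", "5.3"),
    ("1.1", "2.1", "3.1", "4.1", "5.1"),
    ("1.2", "2.2", "3.2", "4.2", "5.1"),
    ("1.1", "2.2", "3.3", "4.1", "5.1"),
]
DECISION_FACTOR = "5"  # assumption: factor 5 (disease name) is the decision result


def factor_of(item):
    """Return the factor number of an item such as '3.2'."""
    return item.split(".")[0]


def support_count(itemset, database):
    """Number of records that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for record in database if wanted <= set(record))


def mine_frequent(database, min_sup=1):
    """Level-wise mining; returns the last non-empty level of frequent itemsets."""
    items = sorted({item for record in database for item in record})
    level = [(i,) for i in items if support_count((i,), database) >= min_sup]
    last = level
    while level:
        last = level
        frequent = set(level)
        candidates = []
        for a, b in combinations(level, 2):
            if a[:-1] != b[:-1]:
                continue  # classic join condition: same leading items
            if factor_of(a[-1]) == factor_of(b[-1]):
                continue  # modification: two values of one factor are never linked
            candidate = a + (b[-1],)
            # Apriori property: every (k-1)-subset must itself be frequent.
            if all(s in frequent for s in combinations(candidate, len(candidate) - 1)):
                candidates.append(candidate)
        level = [c for c in candidates if support_count(c, database) >= min_sup]
    return last


def decision_rules(itemsets, database, min_conf):
    """Produce only rules of the form (all condition items) -> (decision item)."""
    rules = []
    for itemset in itemsets:
        result = [i for i in itemset if factor_of(i) == DECISION_FACTOR]
        if len(result) != 1:
            continue
        conditions = tuple(i for i in itemset if factor_of(i) != DECISION_FACTOR)
        count = support_count(itemset, database)
        conf = count / support_count(conditions, database)
        if conf >= min_conf:
            rules.append((conditions, result[0], count / len(database), conf))
    return rules


itemsets = mine_frequent(DATABASE, min_sup=1)
rules = decision_rules(itemsets, DATABASE, min_conf=0.5)
for conditions, result, sup, conf in rules:
    print(", ".join(conditions), "->", result, f"sup: {sup}, conf: {conf}")
```

Skipping the join for two values of the same factor keeps candidates such as "1.1, 2.1, 3.1, 3.2" from ever being generated, and restricting rule generation to the single split between condition items and the decision item avoids enumerating every non-null subset of each frequent itemset.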



4. Conclusions

By modifying the Apriori algorithm in three aspects (the method of establishing items, the method of linking items, and the method of producing rules), this paper established the knowledge base of an expert system and validated the role of association rules in inference. So far, this method has been validated in several fields, such as a hospital expert system. It is therefore a feasible method for establishing the knowledge base of an expert system.

References

[1] Ti-yun Huang. Intelligent decision support system. Electronic industry publishing company, Beijing, 2001.
[2] Ahmed K M, El-Makky N M, Taha Y. A note on "Beyond market basket: Generalizing association rules to correlations." SIGKDD Explorations, pp. 46-48, January 2000.
[3] Joseph G, Gary R. The principle and program of expert system. China Machine Press, Beijing, 2001.
[4] Zhu Ming. Datamining. University of Science and Technology of China Press, Hefei, pp. 115-126, 2002.
[5] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 207-216, May 1993.
[6] R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large database. Technical Report FJ9839, IBM Almaden Research Center, San Jose, CA, Jun. 1994.
[7] Agrawal R, Srikant R. Fast algorithms for mining association rules. In: Bocca JB, Jarke M, Zaniolo C, eds. Proc. of the 20th Int'l Conf. Very Large Data Bases (VLDB'94). Morgan Kaufmann, pp. 487-499, 1994.
[8] Agarwal R, Aggarwal C, Prasad V V V. A tree projection algorithm for generation of frequent itemsets. Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), 2000.
[9] Aggarwal C, Agarwal R, Prasad V V V. Depth first generation of long patterns. In: The 6th ACM SIGKDD Intl. Conf. on Knowledge Discovery & Data Mining, Boston, MA, USA, 2000.
