1) Fuzzy cognitive maps and rough set theory can be used for decision making. Fuzzy cognitive maps use nodes and weighted connections to represent relationships between concepts. Rough set theory uses approximations to deal with vagueness in data.
2) Fuzzy cognitive maps can be constructed by assigning weights to connections between concepts or using linguistic variables. Experts provide input to determine concepts and connections. Individual maps can be combined by averaging weights of common concepts.
3) Fuzzy cognitive maps function similarly to neural networks and can be trained or learned using algorithms like differential Hebbian learning. An inference algorithm is used to update concept values iteratively until reaching a fixed point or limit cycle.
Fuzzy cognitive map and Rough Sets in Decision Making
1. FUZZY COGNITIVE MAPS AND
ROUGH SET THEORY IN
DECISION MAKING
Dr. A.Tamilarasi
Professor/MCA
Kongu Engineering College
Perundurai, Erode 638052
TAMILNADU
2. Outline
Introduction
Fuzzy Cognitive Maps (FCM)
Different Representations of Fuzzy Cognitive Maps
Construction and Learning of FCM
Algorithm
Synthesizing Different FCMs
FCM for Decision Making
Rough Sets
Basic Notations
Rough Sets in Decision Making
3. Introduction
Axelrod (1976) introduced cognitive maps for representing social scientific knowledge and describing the methods used for decision making in social and political systems.
Prof. Bart Kosko, the guru of fuzzy logic, introduced Fuzzy Cognitive Maps in 1986 as a combination of Neural Networks and Fuzzy Logic.
Advantages: easy and fast crisp modelling, and fast computation.
An FCM is a directed graph with concepts such as policies, events, etc. as nodes and causalities as edges. It represents the causal relationships between concepts.
When the nodes of an FCM are fuzzy sets, they are called fuzzy nodes. FCMs with edge weights (causalities) from the set {-1, 0, 1} are called simple FCMs.
How do we develop an FCM for decision support?
6. Consider the nodes / concepts C1, …, Cn of the FCM. Suppose the directed graph is drawn with edge weights e_ij ∈ {0, 1, -1}.
Let the matrix E be defined by E = (e_ij), where e_ij is the weight of the directed edge C_iC_j. E is called the adjacency matrix of the FCM, also known as the connection matrix of the FCM. (The original slide shows the adjacency matrix for Example 1.)
7. If C_i causally increases C_j, then e_ij = +1; if C_i causally decreases C_j, then e_ij = -1; no causality means e_ij = 0.
In general, causality is fuzzy: it admits degrees, expressed by linguistic variables such as some, a lot, a little, usually, more or less, etc.
9. Cognitive Map
Let C1, C2, …, Cn be the nodes of an FCM. A = (a1, a2, …, an), where ai ∈ {0, 1}, is called the instantaneous state vector; it denotes the on-off position of each node at an instant: ai = 0 if Ci is off and ai = 1 if Ci is on, for i = 1, 2, …, n.
Let E1, E2, …, Ep be the adjacency matrices of FCMs with nodes C1, C2, …, Cn; the combined FCM is obtained by adding all the adjacency matrices E1, E2, …, Ep.
A fixed point: if the equilibrium state of the FCM dynamical system is a unique state vector that remains unchanged over successive iterations, it is called a fixed point.
A limit cycle: if the FCM settles down with state vectors repeating in the form A1 → A2 → … → Ai → A1, this equilibrium is called a limit cycle.
10. Representation of Fuzzy Cognitive Maps
Each concept is characterized by a number Ai that represents its value and lies in the interval [0, 1].
The value of each of its inputs (concepts) is multiplied by the respective weight in [-1, 1]; the results are then added and passed through a non-linearity.
The process is the same as that of a common neuron in a Neural Network.
The evolution of an FCM is iterative: the current value of each concept is computed from the previous values of its inputs. After a certain number of iterations, the map may reach equilibrium, converging to a single state or to a finite cycle of states.
A new formulation for calculating the values of the concepts of a Fuzzy Cognitive Map is given below.
11. Representation of FCM contd…
The coefficient k2 represents the proportion of the contribution of the previous value of the concept in the computation of the new value, and k1 expresses the influence of the interconnected concepts in the configuration of the new value of concept Ai. This formulation assumes that a concept links to itself with a weight Wii = k2. If k1 = k2 = 1, the update reduces to the second equation below.
$$A_i^{(t)} = f\Big(k_1 \sum_{j=1,\ j\neq i}^{n} w_{ji}\, A_j^{(t-1)} + k_2\, A_i^{(t-1)}\Big) \qquad (1)$$
With k1 = k2 = 1 this becomes
$$A_i^{(t)} = f\Big(\sum_{j=1,\ j\neq i}^{n} w_{ji}\, A_j^{(t-1)} + A_i^{(t-1)}\Big)$$
12. Fuzzy cognitive maps…
When the nature of the concepts can be negative and their values belong to the interval [-1, 1], we use the threshold function f(x) = tanh(x). Mostly we use the sigmoid function, where
$$f(x) = \frac{1}{1 + e^{-x}}$$
Suppose that an FCM consists of n concepts. It is represented mathematically by a 1 × n state vector A, which gathers the values of the n concepts, and by an n × n weight matrix W. Each element W_ij of the matrix W indicates the value of the weight between concepts C_i and C_j. The matrix diagonal is zero, since it is assumed that no concept causes itself: W_ii = 0.
In matrix form, the two update rules are
$$A^{(t)} = f\big(A^{(t-1)} W\big), \qquad A^{(t)} = f\big(A^{(t-1)} + A^{(t-1)} W\big)$$
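To make the update concrete, here is a minimal Python sketch of one iteration in matrix form; the sigmoid threshold follows the slides, while the 3-concept weight matrix and initial state are made-up illustrative values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fcm_step(A, W, self_feedback=True):
    """One FCM iteration: A(t) = f(A(t-1) + A(t-1) W), or f(A(t-1) W)."""
    z = A @ W
    if self_feedback:
        z = z + A          # each concept links to itself with weight 1
    return sigmoid(z)

# Illustrative 3-concept map (weights are made up for the example).
W = np.array([[0.0, 0.6, 0.0],
              [0.0, 0.0, -0.4],
              [0.5, 0.0, 0.0]])
A = np.array([0.2, 0.5, 0.1])
print(fcm_step(A, W))
```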
13. Methods for constructing Fuzzy Cognitive Maps
Assigning numerical weights
Experts are polled together, and they determine the relevant factors that should stand as nodes of the FCM. So they decide the number of concepts that constitute the map and what characteristic of the system each concept represents.
The individual FCMs must be combined into one collective FCM, so a method to combine the individual maps is needed. A first approach is the summation of the different weight matrices:
$$W = f\Big(\sum_{j=1}^{N} w_j\Big)$$
If there are experts of different credibility on the knowledge of the system, their contributions to constructing the FCM may be multiplied by a nonnegative 'credibility' weight b_j:
$$W = f\Big(\sum_{j=1}^{N} b_j w_j\Big)$$
14. How credibility weights are assigned to every expert:
Step 1: For all N experts, set the credibility weight b_k = 1.
Step 2: For i, j = 1 to n:
Step 3: For each interconnection (C_i to C_j), examine the N weights W_ij^k that each k-th of the N experts has assigned.
Step 4: IF there are weights W_ij^k with different signs and the number of weights with the same sign is less than π·N,
THEN ask the experts to reassign weights for this particular interconnection and go to Step 3;
ELSE take into account the weights of the larger group with the same sign, consider that there are no other weights, and penalize each expert who chose a "wrong"-signed weight with a new credibility weight b_k = μ1·b_k.
Step 5: For the weights with the same sign, find their average value:
$$W_{ij}^{ave} = \frac{f\Big(\sum_{k=1}^{N} b_k\, W_{ij}^{k}\Big)}{N}$$
15. Step 6: IF |W_ij^ave - W_ij^k| ≥ ω1, THEN consider that there is no weight W_ij^k, penalize the k-th expert (b_k = μ2·b_k), and go to Step 5.
Step 7: IF all the n × n interconnections have not been examined, go to Step 2;
ELSE construct the new weight matrix W whose elements are the weights W_ij^ave.
Example: Six experts have constructed six individual FCMs and suggested the following weights for one interconnection from concept C_i to concept C_j: W_ij = [-0.5, 0.6, 0.66, 0.7, 0.65, 0.25]. The 1st expert's weight has the "wrong" sign, so the 1st expert is penalized with credibility weight b_1 = μ1·b_1 and the corresponding weight is dropped, giving W_ij^ave = 0.572. The 6th expert's value 0.25 deviates from this average by more than ω1, so it is excluded from the calculation and the 6th expert is penalized, giving W_ij^ave = 0.652.
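A hedged Python sketch of Steps 4-6 for a single interconnection; the threshold values pi, mu1, mu2 and omega1 are assumptions (the slides leave π, μ1, μ2 and ω1 unspecified), and the outlier pass runs once instead of looping back to Step 5.

```python
import numpy as np

def combine_expert_weights(w, pi=0.5, mu1=0.8, mu2=0.9, omega1=0.3):
    """Combine N experts' weights for one interconnection (Steps 4-6).
    Returns the averaged weight and the updated credibility weights."""
    w = np.asarray(w, dtype=float)
    b = np.ones_like(w)                       # Step 1: all credibilities = 1
    majority = 1.0 if (w > 0).sum() >= (w < 0).sum() else -1.0
    keep = np.sign(w) == majority             # Step 4: keep the larger sign group
    b[~keep] *= mu1                           # penalize "wrong"-signed experts
    avg = w[keep].mean()                      # Step 5: average of same-signed weights
    far = keep & (np.abs(avg - w) >= omega1)  # Step 6: drop outliers, penalize
    b[far] *= mu2
    keep &= ~far                              # (one pass; the slide loops to Step 5)
    return w[keep].mean(), b

avg, b = combine_expert_weights([-0.5, 0.6, 0.66, 0.7, 0.65, 0.25])
print(avg, b)   # avg ≈ 0.652 after dropping -0.5 and 0.25, as on the slide
```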
16. Assigning linguistic variables for FCM weights
Experts are asked to describe the causality among concepts using linguistic notions. Every expert determines the influence of one concept on another as "negative" or "positive" and then describes the grade of influence with a linguistic variable such as "strong", "weak", etc.
17. Linguistic variable inference
The influence of one concept on another is interpreted as a linguistic variable taking values in the universe U = [-1, 1], with term set T(influence) = {negatively very strong, negatively strong, negatively medium, negatively weak, zero, positively weak, positively medium, positively strong, positively very strong}.
M(negatively very strong) = the fuzzy set for "an influence below -75%" with membership function μnvs
M(negatively strong) = the fuzzy set for "an influence close to -75%" with membership function μns
M(negatively medium) = the fuzzy set for "an influence close to -50%" with membership function μnm
M(negatively weak) = the fuzzy set for "an influence close to -25%" with membership function μnw
M(zero) = the fuzzy set for "an influence close to 0" with membership function μz
18. M(positively weak) = the fuzzy set for "an influence close to 25%" with membership function μpw
M(positively medium) = the fuzzy set for "an influence close to 50%" with membership function μpm
M(positively strong) = the fuzzy set for "an influence close to 75%" with membership function μps
M(positively very strong) = the fuzzy set for "an influence above 75%" with membership function μpvs
This methodology has the advantage that experts do not have to assign numerical causality weights; they only describe the degree of causality among concepts.
19. Synthesizing different Fuzzy Cognitive Maps
The unification of distinct FCMs depends on the concepts of the individual FCMs. If there are no common concepts among the different maps, the combined matrix W is block diagonal:
$$W = \begin{pmatrix} W_1 & 0 & \cdots & 0 \\ 0 & W_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & W_K \end{pmatrix}$$
In this case there are K different FCMs with weight matrices W_i, and the dimension of W is n × n, where n equals the total number of distinct concepts in all the FCMs.
20. Example
Assume that there are two Fuzzy Cognitive Maps: F1 with concepts C1, C2, C3 and F2 with concepts C4, C5, C6. The weight matrices for F1 and F2 are
$$W_1 = \begin{pmatrix} 0 & 0 & w_{13} \\ w_{21} & 0 & 0 \\ w_{31} & w_{32} & 0 \end{pmatrix}, \qquad W_2 = \begin{pmatrix} 0 & w_{45} & w_{46} \\ w_{54} & 0 & w_{56} \\ 0 & w_{65} & 0 \end{pmatrix}$$
$$W = \begin{pmatrix} W_1 & 0 \\ 0 & W_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & w_{13} & 0 & 0 & 0 \\ w_{21} & 0 & 0 & 0 & 0 & 0 \\ w_{31} & w_{32} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & w_{45} & w_{46} \\ 0 & 0 & 0 & w_{54} & 0 & w_{56} \\ 0 & 0 & 0 & 0 & w_{65} & 0 \end{pmatrix}$$
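A small sketch of the no-common-concepts case: with numeric stand-ins for W1 and W2 (made-up values, not from the slides), the combined matrix is assembled block-diagonally.

```python
import numpy as np

# Numeric stand-ins for the symbolic W1 and W2 above.
W1 = np.array([[0.0, 0.0, 0.3],
               [0.5, 0.0, 0.0],
               [0.2, -0.4, 0.0]])
W2 = np.array([[0.0, 0.7, 0.1],
               [0.6, 0.0, 0.2],
               [0.0, -0.3, 0.0]])

# With no shared concepts, the combined map is simply block diagonal.
W = np.block([[W1, np.zeros((3, 3))],
              [np.zeros((3, 3)), W2]])
print(W.shape)  # (6, 6)
```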
21. FCMs with common concepts are combined by calculating new weights for the interconnections between the common concepts.
If there is more than one common concept between the Fuzzy Cognitive Maps, two or more weights will be proposed for the same interconnection. In this case, the new weight is taken as the average of the proposed weights.
Assume that there are two Fuzzy Cognitive Maps: F1 with concepts C1, C2, C3 and F2 with concepts C2, C3, C4, C5. The weight matrices for F1 and F2 are
$$W_1 = \begin{pmatrix} 0 & 0 & w_{13} \\ w_{21} & 0 & 0 \\ w_{31} & w_{32} & 0 \end{pmatrix}, \qquad W_2 = \begin{pmatrix} 0 & w_{23} & w_{24} & 0 \\ w_{32} & 0 & w_{34} & w_{35} \\ w_{42} & w_{43} & 0 & 0 \\ w_{52} & w_{53} & w_{54} & 0 \end{pmatrix}$$
22. The augmented Fuzzy Cognitive Map will have five concepts, and its weight matrix is
$$W = \begin{pmatrix} 0 & 0 & w_{13} & 0 & 0 \\ w_{21} & 0 & w_{23} & w_{24} & 0 \\ w_{31} & w_{32}^{ave} & 0 & w_{34} & w_{35} \\ 0 & w_{42} & w_{43} & 0 & 0 \\ 0 & w_{52} & w_{53} & w_{54} & 0 \end{pmatrix}$$
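A sketch of combining maps that do share concepts: each map's weights are placed in a global index space and overlapping proposals are averaged. The helper merge_fcms and all numeric weights are illustrative, not from the slides.

```python
import numpy as np

def merge_fcms(maps, n):
    """Average the weight proposals from several FCMs in a global
    n-concept index space. Each map is (weight_matrix, concept_indices)."""
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    for W, idx in maps:
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                if W[a, b] != 0:
                    total[i, j] += W[a, b]
                    count[i, j] += 1
    return np.divide(total, count, out=np.zeros((n, n)), where=count > 0)

# F1 over concepts C1..C3, F2 over C2..C5 (made-up numeric weights);
# the shared C3 -> C2 entry is averaged.
W1 = np.array([[0, 0, .3], [.5, 0, 0], [.2, .4, 0]])
W2 = np.array([[0, .6, .1, 0], [.2, 0, .7, .3], [.5, .6, 0, 0], [.1, .2, .4, 0]])
W = merge_fcms([(W1, [0, 1, 2]), (W2, [1, 2, 3, 4])], n=5)
print(W[2, 1])  # ≈ 0.3: the average of F1's 0.4 and F2's 0.2
```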
23. Neural network nature of Fuzzy Cognitive Maps
Learning algorithms can be supervised, reinforcement, or unsupervised, and the function to be minimized is commonly referred to as an error function, a cost function, or an objective function.
The simplest networks consist of a set of input vectors x and outputs y connected by a weight matrix w, where w_ij connects x_i to y_j. The learning problem is then to find the values of the weights w that minimize the error function.
Hebb's learning law is usually implemented as
$$w_{ij}(k+1) = w_{ij}(k) + y_j x_i$$
24. Neural network nature of Fuzzy Cognitive Maps
The Differential Hebbian learning rule has been used in the training of a specific type of FCM.
The Differential Hebbian learning law adjusts the weights of the interconnections between concepts. It grows a positive edge between two concepts if they both increase or both decrease, and it grows a negative edge if the values of the concepts move in opposite directions.
Adapting the idea of the differential Hebbian learning rule to the framework of Fuzzy Cognitive Maps, the following rule is proposed to calculate the derivative of the weight between two concepts:
$$w_{ji}' = -w_{ji} + s(A_j^{new})\, s(A_i^{old}) + s'(A_j^{new})\, s'(A_i^{old})$$
where s(·) is a sigmoid-type threshold function.
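A discrete-time Python sketch in the spirit of the differential Hebbian rule above: weights grow when two concepts move together and shrink when they move oppositely. The step form, learning rate, and sign-based coincidence term are assumptions, not the slide's exact continuous-time law.

```python
import numpy as np

def dhl_step(W, A_old, A_new, lr=0.1):
    """One discrete differential-Hebbian-style update (a sketch)."""
    dA = A_new - A_old                           # concept changes
    n = len(A_new)
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            # sign(dA_i * dA_j) > 0: both concepts moved the same way
            W[j, i] += lr * (-W[j, i] + np.sign(dA[i] * dA[j]))
    return W

W = np.zeros((3, 3))
A_old = np.array([0.2, 0.8, 0.4])
A_new = np.array([0.3, 0.6, 0.5])
print(dhl_step(W, A_old, A_new))
```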
25. FCM Inference Algorithm
Step 1: Define the initial vector A that corresponds to the elements-concepts identified from experts' suggestions and the available knowledge.
Step 2: Multiply the initial vector A by the matrix W defined by the experts.
Step 3: Update the resultant vector A at time step k using the threshold function f.
Step 4: Consider this new vector as the initial vector for the next iteration.
Step 5: Repeat Steps 2-4 until the difference between subsequent concept values falls below epsilon (where epsilon is a residual describing the minimum error difference among subsequent concepts).
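The five steps translate into a short loop; a sketch with an assumed sigmoid threshold and made-up weights (the initial vector echoes the A0 given for the tank example later).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fcm_infer(A0, W, eps=1e-5, max_iter=100):
    """Iterate A(t) = f(A(t-1) + A(t-1) W) until the change between
    successive state vectors drops below eps (a fixed point) or the
    iteration budget runs out (a possible limit cycle)."""
    A = np.asarray(A0, dtype=float)
    for _ in range(max_iter):
        A_next = sigmoid(A + A @ W)
        if np.max(np.abs(A_next - A)) < eps:
            return A_next
        A = A_next
    return A  # did not converge to a fixed point

# Made-up weights for a 5-concept map.
W = np.array([[0.0, -0.4, -0.3, 0.0, 0.6],
              [0.5, 0.0, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.0, 0.0, 0.0],
              [-0.6, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.3, 0.0, 0.5, 0.0]])
A0 = [0.1, 0.45, 0.39, 0.04, 0.01]
print(fcm_infer(A0, W))
```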
27. Consider a system with one tank and three valves that affect the amount of liquid in the tank. Valve 1 and valve 2 empty two different kinds of liquid into the tank, where the specific gravity G and the height H of the liquid must satisfy
Gmin ≤ G ≤ Gmax
Hmin ≤ H ≤ Hmax
Concept 1: The amount of liquid that the tank contains; it depends on valve 1, valve 2 and valve 3.
Concept 2: Valve 1's state (closed, open, partially opened).
Concept 3: Valve 2's state (closed, open, partially opened).
Concept 4: Valve 3's state (closed, open, partially opened).
Concept 5: The reading on the specific gravity instrument.
28. The connections between the concepts are:
Event 1: It associates concept 2 (valve 1) with concept 1 (the amount of fluid in the tank): it relates the state of valve 1 with the amount of fluid in the tank.
Event 2: It relates concept 3 (valve 2) with concept 1; valve 2 causes the expansion or not of the amount of fluid in the tank.
Event 3: It associates concept 4 (valve 3) with concept 1; the state of valve 3 causes the decline or not of the amount of fluid in the tank.
Event 4: It associates concept 1 with concept 2; when the height of the fluid in the tank is high, valve 1 (concept 2) needs closing, so that the amount of incoming fluid decreases.
Event 5: It associates concept 1 (tank) with concept 3; when the height of the fluid in the tank is high, closing valve 2 (concept 3) reduces the amount of incoming fluid.
29. Event 6: It associates concept 5 (the specific gravity) with concept 4 (valve 3). When the specific gravity of the liquid in the tank is appropriate, valve 3 is opened and the delivered fluid continues to another process.
Event 7: It shows the impact of concept 1 (tank) on concept 5 (specific gravity). When the amount of fluid in the tank varies, this affects the specific gravity of the fluid.
Event 8: It relates concept 5 (specific gravity) with concept 2 (valve 1): when the specific gravity is low, valve 1 (concept 2) is opened and fluid comes into the tank.
30. A0 = [0.1 0.45 0.39 0.04 0.01]
For this system, three experts were used to construct the Fuzzy Cognitive Map. They jointly determined the concepts of the FCM; then each expert drew the interconnections among the concepts and assigned a weight to each interconnection.
32. Autistic disorder: problem description and construction of the fuzzy cognitive map model
Three experts (a pediatrician, an occupational therapist and a special educator) were used to determine the main concepts of the model as well as the interconnections among the concepts. This process was completed through a questionnaire, which was created for this purpose and proposed to the team of experts.
The linguistic values are obtained from the parents of the autistic infants, and the weights are calculated by the experts individually for each concept.
The input concepts represent the symptoms and signs of autistic spectrum disorder. The input vector consists of various symptoms of autism, which take linguistic values such as (a) certainly not, (b) at times and (c) always.
33. Concepts Description Type of the data
C1 Enjoy being swung Three fuzzy values
C2 Take an interest in other children Three fuzzy values
C3 Climbing on things Three fuzzy values
C4 Enjoy playing Three fuzzy values
C5 Pointing index finger Three fuzzy values
C6 Playing with small toys Three fuzzy values
…. .. ….
C18 Unusual finger movements near his/her face Three fuzzy values
.. …. ..
OUTC1 Autism (High, Probable Autism and No Autism) Three fuzzy values
34. Steps in constructing FCMs for autism disorder prediction
Each expert (psychologist, pediatrician, special educator) describes each interconnection with a linguistic fuzzy rule to assign its weight.
The fuzzy rule for each interconnection has the form: IF a change B occurs in the value of concept Cj, THEN a change D in the value of concept Ci is caused. Infer: the influence from Cj to Ci is E, where B, D and E are fuzzy linguistic variables that the experts use to describe the variation of concept values and the degree of influence from concept Cj to Ci.
For example: IF a large change occurs in C18, THEN a large change in OUTC1 is caused. Infer: the influence of C18 on OUTC1 is high.
35. The inferred fuzzy weights suggested by the three experts are combined using the SUM method.
3. Using the centroid defuzzification method, the weight is calculated as
$$u^* = \frac{\int u\, \mu_c(u)\, du}{\int \mu_c(u)\, du} \qquad (2)$$
4. Each defuzzified linguistic weight u* is transformed into a numerical weight w_j which lies in the range [-1, 1].
5. All the weights are gathered into a weight matrix (W_ji)_{n×n}, where n is the number of concepts.
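A numerical sketch of Eq. (2); the triangular membership function is an arbitrary stand-in for a combined fuzzy influence, not taken from the slides.

```python
import numpy as np

def centroid_defuzzify(mu, lo=-1.0, hi=1.0, n=1001):
    """Centroid defuzzification u* = ∫ u·mu(u) du / ∫ mu(u) du,
    approximated by a Riemann sum on a grid over [lo, hi]."""
    u = np.linspace(lo, hi, n)
    m = mu(u)
    return np.sum(u * m) / np.sum(m)

# Illustrative triangular membership function centred at 0.5
# (a stand-in for a combined "positively medium" influence).
tri = lambda u: np.maximum(0.0, 1.0 - np.abs(u - 0.5) / 0.25)
print(centroid_defuzzify(tri))  # ≈ 0.5 by symmetry
```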
36. Non-linear Hebbian learning (NHL) algorithm
The value of each concept of the FCM is updated through the equation
$$A_i^{(t)} = f\Big(\sum_{j=1,\ j\neq i}^{n} w_{ji}\, A_j^{(t-1)} + A_i^{(t-1)}\Big)$$
whereas the value of each weight is calculated using Eq. (2).
When the NHL algorithm is applied, only the initial non-zero weights suggested by the experts are updated at each iteration step. All the other weights of the weight matrix W_ji remain zero, which is their initial value.
For the non-linear adaptation algorithm, the value of the weight w_ji^k is calculated with the learning rate parameter η_k and the weight decay parameter γ using the following mathematical form:
$$w_{ji}^{k} = \gamma\, w_{ji}^{k-1} + \eta_k\, A_i^{k-1}\big(A_j^{k-1} - \operatorname{sgn}(w_{ji}^{k-1})\, w_{ji}^{k-1} A_i^{k-1}\big)$$
For each k, the weights are adjusted by using the formula
$$\Delta w_{ji} = \eta_k\, A_i^{k-1} A_j^{k-1} - \gamma\, w_{ji}^{k-1} A_i^{k-1}$$
Stopping condition: either condition 1 or condition 2 below has to be satisfied to terminate the iterative process.
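A sketch of one NHL iteration under the reconstruction above; eta and gamma are illustrative values, and the convention W[j, i] = w_ji matches the concept-update sum.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nhl_step(A, W, eta=0.04, gamma=0.98):
    """One NHL iteration sketch: update only the expert-defined non-zero
    weights via w_ji <- gamma*w_ji + eta*A_i*(A_j - sgn(w_ji)*w_ji*A_i),
    then recompute the concept values."""
    n = len(A)
    W_new = W.copy()
    for j in range(n):
        for i in range(n):
            w = W[j, i]
            if w != 0.0:  # zero weights stay zero
                W_new[j, i] = gamma * w + eta * A[i] * (A[j] - np.sign(w) * w * A[i])
    A_new = sigmoid(A + A @ W_new)  # concept update (equation above)
    return A_new, W_new

W = np.array([[0.0, 0.4], [-0.3, 0.0]])
A = np.array([0.5, 0.2])
A, W = nhl_step(A, W)
print(A, W)
```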
37. Two termination conditions are proposed for the NHL algorithm.
The first termination condition is the minimization of the function F1, which uses the decision output concept (DOC) as defined by the experts and a target value Ti representing the desired value (or the mean value of the desired range) of the DOC:
$$F_1 = \big(DOC_i - T_i\big)^2$$
The second termination condition is the minimization of the variation between two subsequent values of the DOC, and it helps to terminate the iterative process of the learning algorithm:
$$F_2 = \big|DOC_i^{(k+1)} - DOC_i^{(k)}\big| < e$$
When a condition is satisfied, stop the process and return the resultant weight matrix.
38. Results
The DOC can be categorized as Definite Autism (DA), Probable Autism (PA) and No Autism (NA), which take the ranges 0.41 ≤ DA ≤ 1.00, 0.26 ≤ PA ≤ 0.40 and 0 ≤ NA ≤ 0.25, respectively.
39. Rough sets
Rough set theory was introduced by Z. Pawlak (1982) as a mathematical tool for data analysis.
Rough sets have many applications in fields such as Knowledge Discovery, feature selection, banking, etc.
Sets derived from imperfect, imprecise, and incomplete data may not be precisely definable.
Rough set theory (RST) can be used as a tool to discover data dependencies and to reduce the number of attributes contained in a dataset using the data alone, requiring no additional information.
Rough set theory is based on the assumption that every object of the universe of discourse has some information (data, knowledge) associated with it.
40. Equivalence Relation and Equivalence Class
A relation on a set X is a subset of X × X.
Let X be a set and let x, y, and z be elements of X. An equivalence relation R on X is a relation on X such that:
Reflexive property: xRx for all x in X.
Symmetric property: if xRy, then yRx.
Transitive property: if xRy and yRz, then xRz.
Let R be an equivalence relation on a set X. For each a ∈ X, we define the equivalence class of a, denoted [a], as the set [a] = {x ∈ X : x R a}. The equivalence classes form a partition of X. This partition (the set of equivalence classes) is sometimes called the quotient set, or the elementary sets, of X by R, and is denoted X / R.
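Since the equivalence relations used later all arise from objects agreeing on some attributes, a tiny sketch of computing the quotient set X / R via a key function may help; the key function standing in for R is an assumption of the example.

```python
from collections import defaultdict

def equivalence_classes(X, key):
    """Partition X into equivalence classes of the relation
    'x R y iff key(x) == key(y)'."""
    classes = defaultdict(set)
    for x in X:
        classes[key(x)].add(x)
    return list(classes.values())

# Example: integers 0..9 under congruence mod 3.
print(equivalence_classes(range(10), key=lambda x: x % 3))
# [{0, 3, 6, 9}, {1, 4, 7}, {2, 5, 8}]
```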
41. Rough Set Theory
Let T = (U, A, C, D) be a decision system, where U is a non-empty finite set called the universe and A is a non-empty finite set of attributes; C and D are subsets of A, the condition and decision attribute subsets respectively.
Each attribute a ∈ A is an information function a: U → V_a, where V_a is called the value set of a.
The elements of U are objects, cases, states, or observations.
The attributes are interpreted as features, variables, characteristics, conditions, etc.
42. Information table: Example 1
Let U = {x1, x2, x3, x4, x5, x6} be the universe set and C = {a1, a2, a3, a4} the condition feature set, with V1 = {good, medium, bad}, V2 = {good, bad}, V3 = {good, bad}, V4 = {good, bad}. The information function gives a1(x1) = good, and so on.
Student a1 a2 a3 a4
x1 good good bad good
x2 Medium bad bad bad
x3 Medium bad bad good
x4 bad bad bad bad
x5 Medium good good bad
x6 good bad good good
43. If, in the set of attributes A, the condition attributes C = {a1, a2, a3} and the decision attribute D = {a4} are distinguished, the data table can be seen as a decision table.
In order to explain the evaluations of the decision attribute in terms of the evaluations of the condition attributes, one can represent the data table as a set of decision rules. Such a representation gives, for example, the following rule:
If the level in Mathematics is good and the level in Physics is good and the level in Literature is bad, then the student is good.
44. R-lower approximation & R-upper approximation
Let X ⊆ U and R ⊆ C, where R is a subset of the condition features. The R-lower approximation of X is the set of all elements of U which can with certainty be classified as elements of X:
$$\underline{R}X = \bigcup\{\, Y \in U/R : Y \subseteq X \,\}$$
The R-lower approximation of X is a subset of X. The R-upper approximation of X contains all elements of U which can possibly be classified as belonging to X, and X is a subset of it:
$$\overline{R}X = \bigcup\{\, Y \in U/R : Y \cap X \neq \emptyset \,\}$$
The R-boundary set of X is defined as
$$BN_R(X) = \overline{R}X - \underline{R}X$$
45. Representation of the approximation sets
If $\underline{R}X = \overline{R}X$, then X is R-definable (the boundary set is empty).
If $\underline{R}X \neq \overline{R}X$, then X is rough with respect to R.
The accuracy of the approximation is
$$\alpha_R(X) = \frac{card(\underline{R}X)}{card(\overline{R}X)}$$
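A sketch of the lower and upper approximations and the accuracy αR, checked against the example on the next slide.

```python
def approximations(partition, X):
    """R-lower and R-upper approximations of a target set X, given the
    partition U/R as a list of equivalence classes (sets)."""
    X = set(X)
    lower = set().union(*([Y for Y in partition if Y <= X] or [set()]))
    upper = set().union(*([Y for Y in partition if Y & X] or [set()]))
    return lower, upper

# Classes X1, X2, X3 and target Y1 from the example below.
partition = [{'x1', 'x3', 'x5'}, {'x2', 'x4'}, {'x6', 'x7', 'x8'}]
Y1 = {'x1', 'x2', 'x4'}
lower, upper = approximations(partition, Y1)
print(lower, upper)             # {x2, x4} and {x1, x2, x3, x4, x5}
print(len(lower) / len(upper))  # accuracy α = 2/5
```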
46. Example
Consider U = {x1, x2, x3, x4, x5, x6, x7, x8} and an equivalence relation R with the equivalence classes X1 = {x1, x3, x5}, X2 = {x2, x4} and X3 = {x6, x7, x8}, which form a partition.
Let the classification C = {Y1, Y2, Y3} be such that Y1 = {x1, x2, x4}, Y2 = {x3, x5, x8}, Y3 = {x6, x7}.
Only Y1 has a non-empty lower approximation, namely $\underline{R}Y_1 = X_2$.
47. information table – Example 2
H M T F
p1 No Yes High Yes
p2 Yes No High Yes
p3 Yes Yes V. High Yes
p4 No Yes Normal No
p5 Yes No High No
p6 No Yes V. High Yes
4.11.2016
The columns of the table are labeled by the attributes Headache (H), Muscle-pain (M), Temperature (T) and Flu (F), and the rows by the patients (objects) p1, p2, p3, p4, p5, p6. Each row of the table can be seen as information about a specific patient. For example, patient p2 is characterized by the attribute-value set {(Headache, yes), (Muscle-pain, no), (Temperature, high), (Flu, yes)}.
48. In the table patients p2, p3 and p5 are indiscernible with respect to
the attribute Headache, patients p3 and p6 are indiscernible with
respect to attributes Muscle-pain and Flu, and patients p2 and p5 are
indiscernible with respect to attributes Headache, Muscle-pain and
Temperature.
Hence, for example, the attribute Headache generates two
elementary sets {p2, p3, p5} and {p1, p4, p6}, whereas the attributes
Headache and Muscle-pain form the following elementary sets: {p1,
p4, p6}, {p2, p5} and {p3}. Similarly one can define elementary sets
generated by any subset of attributes.
Patient p2 has flu, whereas patient p5 does not, and they are
indiscernible with respect to the attributes Headache, Muscle-pain
and Temperature, hence flu cannot be characterized in terms of
attributes Headache, Muscle-pain and Temperature. Hence p2 and
p5 are the boundary-line cases, which cannot be properly classified
in view of the available knowledge.
49. The remaining patients p1, p3 and p6 display symptoms which
enable us to classify them with certainty as having flu, patients
p2 and p5 cannot be excluded as having flu and patient p4 for
sure does not have flu, in view of the displayed symptoms.
Thus the lower approximation of the set of patients having flu
is the set {p1, p3, p6} and the upper approximation of this set
is the set {p1, p2, p3,p5, p6}, whereas the boundary-line cases
are patients
p2 and p5. Similarly p4 does not have flu and p2, p5 cannot be
excluded as having flu, thus the lower
approximation of this concept is the set {p4} whereas - the
upper approximation – is the set {p2, p4, p5} and the boundary
region of the concept “not flu” is the set {p2, p5}, the same as
in the previous case.
50. Information system: Example
Object P1 P2 P3 P4 P5
O1 1 2 0 1 1
O2 1 2 0 1 1
O3 2 0 0 1 0
O4 0 0 1 2 1
O5 2 1 0 2 1
O6 0 0 1 2 2
O7 2 0 0 1 0
O8 0 1 2 2 1
O9 2 1 0 2 2
O10 2 0 0 1 0
When the full set of attributes P = {P1, P2, P3, P4, P5} is considered, we have the following seven equivalence classes:
{O1, O2}, {O3, O7, O10}, {O4}, {O5}, {O6}, {O8}, {O9}
It is apparent that different attribute subset selections will in general lead to different indiscernibility classes. For example, if the attribute subset P = {P1} alone is selected, we obtain the following, much coarser, partition:
{O1, O2}, {O3, O5, O7, O9, O10}, {O4, O6, O8}
51. Consider the target set X = {O1, O2, O3, O4}, and let the attribute subset P = {P1, P2, P3, P4, P5} be the full available set of features. The set X cannot be expressed exactly, because in [x]_P the objects {O3, O7, O10} are indiscernible: there is no way to represent any set X which includes O3 but excludes objects O7 and O10.
However, the target set X can be approximated using only the information contained within P by constructing the P-lower and P-upper approximations of X.
The P-lower approximation, or positive region, is the union of all equivalence classes in [x]_P which are contained in (i.e., are subsets of) the target set; in the example, $\underline{P}X$ = {O1, O2} ∪ {O4}, the union of the two equivalence classes in [x]_P contained in the target set. The lower approximation is the complete set of objects in U / P that can be positively (i.e., unambiguously) classified as belonging to the target set X.
52. Upper approximation and negative region
The P-upper approximation is the union of all equivalence classes in [x]_P which have non-empty intersection with the target set; in the example, $\overline{P}X$ = {O1, O2} ∪ {O4} ∪ {O3, O7, O10}, the union of the three equivalence classes in [x]_P that have non-empty intersection with the target set.
The upper approximation is the complete set of objects in U / P that cannot be positively (i.e., unambiguously) classified as belonging to the complement (X̄) of the target set X. In other words, the upper approximation is the complete set of objects that are possibly members of the target set X.
The set U − $\overline{P}X$ therefore represents the negative region, containing the objects that can be definitely ruled out as members of the target set.
53. Indiscernibility Relation
The indiscernibility relation IND(P) is an equivalence relation. Let a ∈ A and P ⊆ A; the indiscernibility relation IND(P) is defined as
$$IND(P) = \{(x, y) \in U \times U : \text{for all } a \in P,\ a(x) = a(y)\}$$
The indiscernibility relation defines a partition of U. For P ⊆ A, U/IND(P) denotes the family of all equivalence classes of the relation IND(P), called elementary sets. Two other families of equivalence classes, U/IND(C) and U/IND(D), called the condition and decision equivalence classes respectively, can also be defined.
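A sketch computing U/IND(P) for the flu table; the groupings reproduce the elementary sets listed on slide 48.

```python
from collections import defaultdict

def ind_partition(table, P):
    """U/IND(P): group objects that agree on every attribute in P."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in P)].add(obj)
    return list(classes.values())

# The Example-2 flu table (H = Headache, M = Muscle-pain, T = Temperature).
table = {
    'p1': {'H': 'No',  'M': 'Yes', 'T': 'High'},
    'p2': {'H': 'Yes', 'M': 'No',  'T': 'High'},
    'p3': {'H': 'Yes', 'M': 'Yes', 'T': 'V.High'},
    'p4': {'H': 'No',  'M': 'Yes', 'T': 'Normal'},
    'p5': {'H': 'Yes', 'M': 'No',  'T': 'High'},
    'p6': {'H': 'No',  'M': 'Yes', 'T': 'V.High'},
}
print(ind_partition(table, ['H']))       # {p1, p4, p6} and {p2, p3, p5}
print(ind_partition(table, ['H', 'M']))  # {p1, p4, p6}, {p2, p5}, {p3}
```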
54. Let us illustrate the above definitions with Example 2.
Consider the concept "flu", i.e., the set X = {p1, p2, p3, p6}, and the set of attributes B = {Headache, Muscle-pain, Temperature}. Hence $\underline{B}X$ = {p1, p3, p6} and $\overline{B}X$ = {p1, p2, p3, p5, p6}, and for this case we get αB("flu") = 3/5. It means that the concept "flu" can be characterized partially by employing the symptoms Headache, Muscle-pain and Temperature.
Taking only the single symptom B = {Headache}, we get $\underline{B}X$ = ∅, $\overline{B}X$ = U and αB("flu") = 0, which means that the concept "flu" cannot be characterized in terms of the attribute Headache only, i.e., this attribute is not characteristic for flu whatsoever.
However, taking the attribute B = {Temperature}, we get $\underline{B}X$ = {p3, p6}, $\overline{B}X$ = {p1, p2, p3, p5, p6} and αB(X) = 2/5, which means that the single symptom Temperature is less characteristic for flu than the whole set of symptoms, but it also characterizes flu partially.
55. Positive region and Reduct
Positive region: the positive region POS_R(d) of the classification CLASS_T(d) is equal to the union of the lower approximations of all decision classes.
A reduct is a subset of the condition attributes that suffices to classify the decision table; a reduct need not be unique. Reducts are defined as minimal subsets of condition attributes which preserve the positive region defined by the set of all condition attributes, i.e.:
A subset R ⊆ C is a relative reduct iff
1. POS_R(D) = POS_C(D);
2. for every proper subset R' ⊂ R, condition 1 does not hold.
56. Core
The set of all features indispensable in C is denoted CORE(C). We have
$$CORE(C) = \bigcap RED(C)$$
where RED(C) is the set of all reducts of C. Thus the core is the intersection of all reducts of an information system. The core contains no dispensable features, and it can be expanded into reducts.
57. Dependency coefficient
The dependency coefficient is a measure of association; the dependency coefficient between condition attributes A and a decision attribute d is defined by the formula
$$\gamma(A, d) = \frac{Card\big(POS_A(d)\big)}{Card(U)}$$
where Card denotes the cardinality of a set.
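A sketch of γ(C, D) on the flu decision table; POS_C(D) collects the condition classes that fit wholly inside one decision class.

```python
from collections import defaultdict

def ind_partition(table, P):
    """U/IND(P): group objects that agree on every attribute in P."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in P)].add(obj)
    return list(classes.values())

def dependency(table, cond, dec):
    """gamma(C, D) = Card(POS_C(D)) / Card(U)."""
    pos = set()
    decisions = ind_partition(table, dec)
    for Y in ind_partition(table, cond):
        if any(Y <= D for D in decisions):  # Y lies wholly in a decision class
            pos |= Y
    return len(pos) / len(table)

# Flu decision table (Example 2), including the decision attribute F.
table = {
    'p1': {'H': 'No',  'M': 'Yes', 'T': 'High',   'F': 'Yes'},
    'p2': {'H': 'Yes', 'M': 'No',  'T': 'High',   'F': 'Yes'},
    'p3': {'H': 'Yes', 'M': 'Yes', 'T': 'V.High', 'F': 'Yes'},
    'p4': {'H': 'No',  'M': 'Yes', 'T': 'Normal', 'F': 'No'},
    'p5': {'H': 'Yes', 'M': 'No',  'T': 'High',   'F': 'No'},
    'p6': {'H': 'No',  'M': 'Yes', 'T': 'V.High', 'F': 'Yes'},
}
print(dependency(table, ['H', 'M', 'T'], ['F']))  # 4/6: p2, p5 are boundary cases
```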
58. A reduct can be thought of as a sufficient set of features: sufficient, that is, to represent the category structure.
In the example table above, the attribute set {P3, P4, P5} is a reduct: the information system projected onto just these attributes possesses the same equivalence class structure as that expressed by the full attribute set:
{O1, O2}, {O3, O7, O10}, {O4}, {O5}, {O6}, {O8}, {O9}
Attribute set {P3, P4, P5} is a legitimate (minimal) reduct because eliminating any of these attributes causes a collapse of the equivalence-class structure, with the result that $[x]_{RED} \neq [x]_P$.
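A brute-force sketch of reduct search on the 10-object example; practical reduct finding uses smarter algorithms, but for toy tables exhaustive search is fine.

```python
from itertools import combinations

def partition_of(table, attrs):
    """U/IND(attrs), frozen so partitions can be compared for equality."""
    groups = {}
    for obj, row in table.items():
        groups.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return frozenset(frozenset(g) for g in groups.values())

def reducts(table, attrs):
    """Minimal attribute subsets whose indiscernibility partition equals
    that of the full attribute set (brute force)."""
    full = partition_of(table, attrs)
    found = []
    for r in range(1, len(attrs) + 1):
        for sub in combinations(attrs, r):
            if any(set(f) <= set(sub) for f in found):
                continue  # a smaller reduct already covers this subset
            if partition_of(table, sub) == full:
                found.append(sub)
    return found

# The 10-object example table, attributes P1..P5.
rows = [(1,2,0,1,1), (1,2,0,1,1), (2,0,0,1,0), (0,0,1,2,1), (2,1,0,2,1),
        (0,0,1,2,2), (2,0,0,1,0), (0,1,2,2,1), (2,1,0,2,2), (2,0,0,1,0)]
table = {f'O{i+1}': {f'P{j+1}': v for j, v in enumerate(r)}
         for i, r in enumerate(rows)}
print(reducts(table, [f'P{j+1}' for j in range(5)]))
# Prints every minimal reduct; ('P3', 'P4', 'P5') from the slide is among them.
```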
59. For Example 2, there are two relative reducts with respect to flu, {Headache, Temperature} and {Muscle-pain, Temperature}, of the set of condition attributes {Headache, Muscle-pain, Temperature}. That means that either the attribute Headache or the attribute Muscle-pain can be eliminated from the table; consequently, instead of the original table we can use either of the two reduced tables below.
H T F
p1 No High Yes
p2 Yes High Yes
p3 Yes V. High Yes
p4 No Normal No
p5 Yes High No
p6 No V. High Yes
M T F
p1 Yes High Yes
p2 No High Yes
p3 Yes V. High Yes
p4 Yes Normal No
p5 No High No
p6 Yes V. High Yes
60. For the reduced tables, the relative core with respect to the set {Headache, Muscle-pain, Temperature} is {Temperature}. This confirms our previous considerations showing that Temperature is the only symptom that enables at least a partial diagnosis of patients.
61. Rough sets for Decision Making
INFORMATION SYSTEM FOR AN ACCIDENT DATASET
Let B = {A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12, A13, A14, A15} be the set of 15 accidents. The set of condition attributes of the information system is C = {Drunk Driving, Distracted Driving, Over Speed, Night Driving, Health Issue / Stress, Tire Blowouts, Brake Failure Accidents}.
The set of decision attributes of the information system is D = {Accident Occurs, No Accident}.
Decision Parameter (Accident Occurs) = (Number of positive condition attributes) / (Number of objects)
Decision Parameter (No Accident) = (Number of negative condition attributes) / (Number of objects)
63. IND ({Distracted Driving}) = {{A1, A3, A5, A6, A7, A8, A9, A11, A15},
{A2, A4, A10, A12, A13, A14}}
IND ({Over Speed}) = {{A1, A4, A5, A7, A8, A9, A10, A11, A12, A14,
A15}, {A2, A3, A6, A13}}
IND ({Night Driving}) = {{A1, A2, A6, A14, A15}, {A3, A4, A5, A7, A8,
A9, A10, A11, A12, A13}}
The quality coefficients of the upper and lower approximations can be calculated:
αB($\overline{B}X$) = 10/15, for the areas with various attributes that have the possibility of meeting an accident.
αB($\overline{B}X$) = 7/15, for the areas with various attributes that have the possibility of no accident.
64. αB($\underline{B}X$) = 8/15, for the areas with various attributes that certainly meet an accident; that is, 53% of the areas certainly meet an accident.
αB($\underline{B}X$) = 5/15, for the areas with various attributes that certainly do not meet an accident; approximately 33% of the areas certainly do not meet an accident.
Dependency of the Accident Dataset
In the accident dataset we have 8 elements in the lower approximation of the areas that meet an accident and 5 elements in the lower approximation of the areas that do not, so the total number of elements in the lower approximations is 13. The dependency coefficient is then γ(C, D) = 13/15 = 0.86, so D depends partially (to a degree k = 0.86) on C.
65. The rules generated by a reduct are called 'Reduct Rules', and decisions based on these rules are generally more precise and accurate.
The first step towards 'Reduct Rule' generation is the removal of redundancy.
The next step towards the removal of redundancy, or reduction, is to analyze each condition attribute one by one, independently, against the decision attribute.
Finally we get:
Rule 1: If (Drunk Driving = Yes) and (Over Speed = Yes) and (Tire Blowout = Yes), then Accident Possibility = Yes.
Rule 2: If (Drunk Driving = No) and (Over Speed = No) and (Tire Blowout = Yes), then Accident Possibility = No.