1. Fuzzy Logical Databases and
an Efficient Evaluation of a Fuzzy
Equi-Join
Supervised by: Dr. Bassam Hammo
Presented by: Alaa AlZoubi
2. Outline
1. Introduction to fuzzy concepts in databases.
2. Interpretation of fuzzy terms.
3. A new measure for fuzzy equality.
4. A new type of fuzzy equi-join based on the new fuzzy equality.
5. A sort-merge join algorithm, based on a partial order of intervals, for
evaluating the fuzzy equi-join.
6. Experiment results showing a significant improvement in efficiency when
FE indicators are used with the sort-merge join algorithm.
3. ABSTRACT
Many real-world applications, such as business decision
making, medical diagnosis, and criminal justice, have to deal
with information that is uncertain or imprecise.
Classical database models often cannot represent or
manipulate imprecise and uncertain information.
Example:
The age of Tom is "About 32".
Knowledge-base and database systems should directly support
such applications by providing functionality to store and
manipulate ill-known data.
4. Fuzzy logic has been used to extend various
database models.
The purpose of introducing fuzzy logic in databases is to
enhance the classical database models so that
uncertain and imprecise information can be represented
and manipulated.
In recent years, various fuzzy data models and fuzzy
database systems have been proposed.
These models and systems extend relational and object-oriented data models using fuzzy set theory and
possibility theory, so that ill-known data can be represented and queries containing soft
restrictions can be issued.
5. These models can be classified into two
categories:
1. Similarity-based
In a similarity-based model, similarity relationships
are specified for some attributes so that the values of these
attributes can be grouped into similarity classes.
2. Possibility-based
In a possibility-based model, an ill-known data item is
represented by a possibility distribution, which describes the
possibility for each crisp attribute value to be the actual
value of the data item.
6. Among the algebraic operations, the fuzzy join is an important and expensive one, and
its efficient evaluation is more difficult than that of an ordinary join.
There are two reasons for the difficulty:
1. Diverse semantics: in a fuzzy join, two tuples may join even if they do not
completely satisfy the join condition. The extent to which they satisfy the
join condition is usually represented by a satisfaction degree.
2. Lack of fast access paths: the techniques behind most efficient join algorithms in ordinary
relational databases, such as indexing and hashing, do not apply directly to fuzzy relational
databases.
7. Fuzzy concepts in databases
"Fuzzy" information or "fuzzy" data can arise in several ways.
It can be due to:
1. Imprecision in measured data.
2. Subjective judgments
(e.g., a database may contain data describing things such as the quality of
a school or the safety of a neighborhood).
3. The nature of the information required by a user
(e.g., a user may wish to make imprecise queries such as "I want a list of universities that
offer a good graduate program in software engineering and where the cost of
living is low").
Fuzzy approaches have been used to extend database systems in
two main ways:
1. Storing and updating imprecise information (data).
2. Processing imprecise queries.
8. Introducing fuzziness
The table is a relation, the columns are attributes, and the rows are tuples.
Each attribute has a domain.
For example, the domain of the attribute JobType might be the set
{Academic, Industry, Government}.
FirstName   SecondName   JobType      Expertise
Smith       Tom          Academic     AI
Alan        Smith        Industry     Expert Systems
Bill        Alin         Government   Statistics
Cindy       Smith        Government   Robotics
9. Imprecision in attribute values can be introduced using a similarity matrix,
e.g. for the attribute Expertise:
                 Robotics   Statistics   Expert Systems   AI
Robotics         1.0        0.2          0.6              0.6
Statistics       0.2        1.0          0.2              0.2
Expert Systems   0.6        0.2          1.0              0.9
AI               0.6        0.2          0.9              1.0
Matrices of this type can be used to determine the matching degree between an
applicant and a job opening.
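As a minimal sketch of how such a matrix could be used, the following Python snippet stores the Expertise similarity matrix from this slide and looks up the matching degree for a pair of values. The helper name matching_degree and the idea of comparing one applicant value with one required value are assumptions made for illustration only.

```python
# Similarity matrix for the Expertise attribute, copied from the slide.
EXPERTISE = ["Robotics", "Statistics", "Expert Systems", "AI"]
SIMILARITY = [
    [1.0, 0.2, 0.6, 0.6],
    [0.2, 1.0, 0.2, 0.2],
    [0.6, 0.2, 1.0, 0.9],
    [0.6, 0.2, 0.9, 1.0],
]


def matching_degree(applicant_expertise, required_expertise):
    """Hypothetical helper: similarity between two Expertise values."""
    i = EXPERTISE.index(applicant_expertise)
    j = EXPERTISE.index(required_expertise)
    return SIMILARITY[i][j]


print(matching_degree("AI", "Expert Systems"))    # 0.9
print(matching_degree("Statistics", "Robotics"))  # 0.2
```

An applicant whose expertise is AI would therefore match an Expert Systems opening to degree 0.9, while a Statistics applicant would match a Robotics opening only to degree 0.2.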
10. Imprecision in attribute values can also be introduced using a linguistic variable,
e.g. the attribute height could be stored as one of the values (short, average, tall), each of which
could be modeled as a fuzzy set.
Another way in which imprecision may be introduced is to permit partial
membership of a tuple in a relation.
(The degree to which a tuple is a member is stored as a special attribute.)
The following is a relation (or table) storing information about endangered species.
Tuples of this type are sometimes called weighted tuples.
Name          Habitat       …   Membership
Field Mouse   grassland     …   1.0
Wren          rain forest   …   0.4
Black Duck    coastal       …   0.4
11. An example:
With the conventional method, we call a person "TALL" if the height is 7
feet, and a person with a height of 5 feet is "NOT TALL". That is, each
person is either "TALL" or "NOT TALL", encoded in Boolean logic as 1 for "TALL"
and 0 for "NOT TALL".
Fuzzy sets may instead be used to express the degree to which a person is tall:
If S is the set of all people in the universe, a degree of membership in the subset TALL is
assigned to each person in S.
The membership function is based on the person's height:
TALL(x) = 0,                     if Height(x) < 5 feet
TALL(x) = (Height(x) - 5) / 2,   if 5 ≤ Height(x) < 7 feet
TALL(x) = 1,                     if Height(x) ≥ 7 feet
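The TALL membership function can be written directly as a small Python function. This is only an illustrative sketch; the function name tall and the use of feet as a plain float are assumptions.

```python
def tall(height_ft):
    """Membership degree of a person in the fuzzy set TALL (height in feet)."""
    if height_ft < 5:
        return 0.0
    if height_ft < 7:
        return (height_ft - 5) / 2
    return 1.0


for h in (4.5, 5.0, 6.0, 6.5, 7.0):
    print(h, tall(h))
```

A person who is 6 feet tall gets degree 0.5, so the Boolean cut-off is replaced by a gradual transition between 5 and 7 feet.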
13. Imprecise queries:
A user may make an imprecise query on a database. This can be due to the
use of:
1. Imprecise conditions.
"Find all taxpayers who were audited in 2008 and whose income is
low."
2. Imprecise operators.
"Find all countries whose export revenue is about the same as the import
revenue."
3. Imprecise quantifiers.
"Find the companies whose customers are mostly from government agencies."
14. Interpretation of Fuzzy Terms
A fuzzy data item has an uncertain or imprecise value. We associate each fuzzy data item
v with a fuzzy term and a membership function (of a fuzzy set).
The membership function, denoted by µv, maps each crisp value x in the universe
of v to a membership degree µv(x) in [0, 1] to indicate the possibility of v = x.
A membership function can be defined in a number of ways.
Over a numerical universe, a membership function is typically convex (with a
convex curve) and normal (at least one member has degree 1).
We consider membership functions of a trapezoidal shape, and denote them by
MF (a, b, c, d), where the parameters mark the endpoints of the shape.
15. For example, the membership function defining the fuzzy term F2 in the figure
is denoted by MF (20, 30, 40, 50).
If a value v has a membership function defined by MF (a, b, c, d), the interval
[a, d] is called the supporting interval of v.
As special cases,
MF (l, l, u, u) defines an interval [l, u],
MF (v, v, v, v) defines a crisp value v, and
MF (a, b, b, d) is a triangular function.
Over a categorical universe, a membership function is defined by µv = x1/m1
+ x2/m2 + ... + xk/mk, where xi is a value in the universe and mi is the
membership degree of xi. The membership function of a single crisp value v
is µv = v/1.
16. FUZZY RELATIONS
In this section, we briefly describe the representation of data in a fuzzy relational
database.
A data item is crisp if it is certain and precise, and fuzzy otherwise.
A fuzzy (sub)set F of an ordinary set U is characterized by a membership
function µF: U → [0, 1].
The Idea of Fuzzy Sets
Fuzzy sets are functions that map a value, which might be a member of a
set, to a number between zero and one, indicating its actual degree
of membership.
A degree of zero means that the value is not in the set, and a degree of
one means that the value is completely representative of the set.
17. For every (crisp) value x ∈ U, µF(x) is the membership degree of x with
respect to (wrt) F, that is,
µF(x) = 1 if x is a full member,
0 < µF(x) < 1 if x is a partial member, and
µF(x) = 0 if x is not a member of F.
Without loss of generality, x is in F only if µF(x) > 0.
A fuzzy data item v is represented by a possibility distribution restricted by
a fuzzy set F, in the sense that v is a member of F and the possibility for v to
be a member x of F is exactly µF(x).
18. A membership function can be defined in a number of ways:
Over a numerical universe, a membership function is typically convex
(with a convex curve) and normal (at least one member has degree 1).
We use the following generic parameterized function to define such membership functions:
MF(a,b,c,d)(x) =
  0,             if x ≤ a < b, or x < a = b
  (x-a)/(b-a),   if a < x < b
  1,             if b ≤ x ≤ c
  (d-x)/(d-c),   if c < x < d
  0,             if c < d ≤ x, or c = d < x
19. Here the parameters a, b, c, and d are values in the universe satisfying
a ≤ b ≤ c ≤ d. In general, the curve of the generic function is a trapezoid,
as shown in the following figure, but it can also take other shapes.
For example, MF (a, b, b, d) defines a triangular function, since the second and
the third parameters are the same.
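The generic parameterized function can be sketched in Python as a factory that returns the membership function for given parameters. The name mf and the closure-based design are assumptions made for illustration; the case analysis follows the definition of MF(a, b, c, d) given on the previous slide, including the degenerate cases a = b and c = d.

```python
def mf(a, b, c, d):
    """Return the membership function MF(a, b, c, d) as a callable."""
    if not a <= b <= c <= d:
        raise ValueError("parameters must satisfy a <= b <= c <= d")

    def mu(x):
        if x < a or x > d:
            return 0.0                 # outside the supporting interval [a, d]
        if b <= x <= c:
            return 1.0                 # plateau (also covers a = b and c = d)
        if x < b:
            return (x - a) / (b - a)   # rising edge, only reached when a < b
        return (d - x) / (d - c)       # falling edge, only reached when c < d

    return mu


middle_age = mf(30, 35, 45, 50)   # trapezoid
about_32 = mf(30, 32, 32, 34)     # triangle: second and third parameters equal
print(middle_age(40), about_32(31), about_32(34))
```

Here MF(30, 35, 45, 50) and MF(30, 32, 32, 34) are the parameter values used for "Middle age" and "About 32" in the equi-join example later in the slides.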
20. Over a nonnumerical universe, a membership function takes the form
µF = x1/m1 + x2/m2 + ... + xk/mk,
where xi is a value in the universe and mi is the membership degree of xi with
respect to F. In this case, the degenerate membership function of a crisp value
v is µv = v/1.
The universe of an attribute A, denoted by U(A), is the set of crisp values that
may appear in the attribute.
The domain of an attribute A, denoted by D(A), is the set of all (both crisp and
fuzzy) values defined over U(A). A fuzzy relation R with a schema (A1, A2, ..., An)
is a fuzzy set of tuples in D(A1) × ... × D(An).
22. A FUZZY EQUI-JOIN
In this section, we first define a fuzzy equality and then use it to define a
fuzzy equi-join.
The following example shows the need for a fuzzy equi-join.
Example: Consider the relation R shown in the following table and
the query "Find all pairs of persons from R whose ages are equal to a
degree no less than 0.5".
NAME    OCCUPATION   AGE
Smith   Engineer     20
Alan    DBA          31
Bill    Teacher      About 32
Cindy   Lawyer       About 34
Mike    Teacher      Middle age
Tom     Farmer       58
23. Solution:
The query can be answered by a join of R with itself on the AGE attribute with a fuzzy equality comparison.
Since AGE contains fuzzy values, we must determine the degree to which two fuzzy
ages, say About 32 and Middle age, are equal (that is, satisfy the join
condition AGE = AGE),
where About 32 = MF (30, 32, 32, 34),
About 34 = MF (32, 34, 34, 36),
and Middle age = MF (30, 35, 45, 50).
The example makes clear that computing the satisfaction degree of
the fuzzy equality comparison is the key to the meaning of the fuzzy equi-join.
24. In the following, we propose a new measure for the fuzzy equality
comparison based on the similarity of fuzzy values.
Definition: Let D be a set of values. The fuzzy equality on D is a mapping
~=: D × D → [0, 1]
that, for every pair of values v1 = MF(a1, b1, c1, d1) and v2 = MF(a2, b2, c2,
d2) in D, gives
(v1 ~= v2) = ∫ min(μv1(x), μv2(x)) dx / ∫ max(μv1(x), μv2(x)) dx,
where ∫ is over the universe on which the membership functions are defined
and is interpreted as a summation if the universe is discrete.
Intuitively, ∫ min(μv1(x), μv2(x)) dx is the accumulated membership degree of
the intersection, and ∫ max(μv1(x), μv2(x)) dx is that of the union, of the two fuzzy
sets defining v1 and v2.
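The measure can be approximated numerically for trapezoidal membership functions. The sketch below is not the authors' implementation: the function names, the midpoint-rule integration, and the step count are assumptions, and the special handling of two identical crisp values (where both integrals vanish) follows the remark that fuzzy equality reduces to ordinary equality on crisp data.

```python
def mf(a, b, c, d):
    """Membership function MF(a, b, c, d), assuming a <= b <= c <= d."""
    def mu(x):
        if x < a or x > d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)
        return (d - x) / (d - c)
    return mu


def fuzzy_eq(v1, v2, steps=100_000):
    """Approximate (v1 ~= v2) for v = (a, b, c, d) with a midpoint rule."""
    lo, hi = min(v1[0], v2[0]), max(v1[3], v2[3])
    if lo == hi:                      # two identical crisp values
        return 1.0
    mu1, mu2 = mf(*v1), mf(*v2)
    h = (hi - lo) / steps
    inter = union = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        p, q = mu1(x), mu2(x)
        inter += min(p, q)            # accumulates the intersection
        union += max(p, q)            # accumulates the union
    return inter / union if union > 0 else 0.0


about_32 = (30, 32, 32, 34)
about_34 = (32, 34, 34, 36)
print(round(fuzzy_eq(about_32, about_34), 3))   # about 0.143
```

For About 32 and About 34 the two triangles each have area 2 and their intersection has area 0.5, so the exact degree is 0.5 / 3.5 = 1/7. This is below 0.5, so Bill and Cindy would not be paired by the example query.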
25. Definition: A fuzzy equi-join of fuzzy relations R and S on attributes R.A
and S.B with a threshold i ≥ 0,
denoted by
R ⋈ [(R.A ~= S.B) ≥ i] S,
is a fuzzy relation T with the membership function defined by
μT(xy) = min(μR(x), μS(y), μq(xy)),
where x is a tuple in R, y is a tuple in S, and
μq(xy) = x[A] ~= y[B],   if (x[A] ~= y[B]) ≥ i
μq(xy) = 0,              otherwise.
Since this fuzzy equi-join allows the threshold value to be specified, it is very
flexible and can be evaluated more efficiently than existing ones.
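The definition can be illustrated with a naive nested-loop evaluation. This sketch is only meant to show how μT is computed, not how the join should be evaluated efficiently. The tuple layout (name, join value, membership), the sample data, and the helper interval_eq are hypothetical; interval_eq uses the fact that for interval values MF(l, l, u, u) the two integrals reduce to the lengths of the intersection and the union.

```python
def interval_eq(v1, v2):
    """Fuzzy equality of two interval values (l, u), i.e. MF(l, l, u, u)."""
    (l1, u1), (l2, u2) = v1, v2
    inter = max(0.0, min(u1, u2) - max(l1, l2))
    union = (u1 - l1) + (u2 - l2) - inter
    if union == 0:                       # both values are crisp
        return 1.0 if l1 == l2 else 0.0
    return inter / union


def fuzzy_equi_join(R, S, eq, threshold):
    """Nested-loop fuzzy equi-join; tuples are (name, join_value, membership)."""
    T = []
    for x, xa, mu_r in R:
        for y, yb, mu_s in S:
            degree = eq(xa, yb)
            mu_q = degree if degree >= threshold else 0.0
            mu_t = min(mu_r, mu_s, mu_q)
            if mu_t > 0:                 # keep only actual members of T
                T.append((x, y, mu_t))
    return T


# Hypothetical weighted tuples, for illustration only.
R = [("r1", (10, 40), 0.8)]
S = [("s1", (10, 40), 0.6), ("s2", (20, 50), 1.0), ("s3", (32, 36), 1.0)]
print(fuzzy_equi_join(R, S, interval_eq, 0.5))
# [('r1', 's1', 0.6), ('r1', 's2', 0.5)]
```

The pair (r1, s3) is dropped because its equality degree, 4/30, falls below the threshold, while the membership of (r1, s1) is limited by the membership degree 0.6 of s1 in S.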
26. Compared with the existing measures, the new measure seems more natural:
1. It allows the algebraic operations to be composed.
2. The degree is obtained by considering all possible values of both fuzzy data items,
rather than one best possible value of each.
Therefore, it is more intuitive.
3. Fuzzy data can be regarded as the subjective representation of real-world
data as viewed by an observer.
4. Note that for fuzzy data, the satisfaction degree must always be treated as
uncertain.
5. Notice that, for crisp data, the fuzzy equality is the same as the ordinary
equality, that is, it is a "hard" comparison.
27. AN INTERVAL-BASED FUZZY JOIN
ALGORITHM
We now present a Sort-Merge Fuzzy Equi-Join (SMFEJ) algorithm for
evaluating the fuzzy equi-join.
The purpose of SMFEJ is to evaluate the fuzzy equi-join efficiently.
The SMFEJ algorithm assumes that fuzzy join attributes have numeric
universes and that membership functions are defined by the generic
parameterized function.
28. The algorithm has two phases:
Phase 1: sorting phase.
In the sorting phase, both relations R and S are sorted on their join attributes
according to a partial order defined over the attribute values.
Definition: Let x1 = [l1, h1] and x2 = [l2, h2] be two intervals on a set with a total
order. We say that:
1. x1 overlaps x2, denoted by x1 ∩ x2, if l1 < h2 and
l2 < h1.
2. x1 is equivalent to x2, denoted by x1 ≡ x2, if l1 = l2 and
h1 = h2.
3. x1 precedes x2 if l1 < l2, or if l1 = l2 and h1 < h2.
4. x1 precedes or equals x2 if x1 precedes x2 or x1 ≡ x2.
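The four relations translate directly into predicates on (l, h) pairs. In the sketch below the function names are assumptions. Note that "precedes" is the lexicographic order on (l, h), so Python's built-in tuple comparison can serve as the sort order for the sorting phase.

```python
def overlaps(x1, x2):
    """x1 overlaps x2: l1 < h2 and l2 < h1."""
    return x1[0] < x2[1] and x2[0] < x1[1]


def equivalent(x1, x2):
    """x1 is equivalent to x2: same lower and upper bounds."""
    return x1[0] == x2[0] and x1[1] == x2[1]


def precedes(x1, x2):
    """x1 precedes x2: l1 < l2, or l1 = l2 and h1 < h2."""
    return x1[0] < x2[0] or (x1[0] == x2[0] and x1[1] < x2[1])


def precedes_or_equals(x1, x2):
    return precedes(x1, x2) or equivalent(x1, x2)


# Sorting phase on a few sample intervals: tuple order coincides with "precedes".
intervals = [(20, 50), (5, 20), (20, 30), (10, 40)]
print(sorted(intervals))   # [(5, 20), (10, 40), (20, 30), (20, 50)]
```

Because the definition uses strict inequalities, two intervals that only touch at an endpoint, such as (5, 10) and (10, 40), do not overlap.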
29. Phase 2: joining phase.
In the joining phase, each page of R is read once. For each tuple r in R, the S-tuples that may join with r are in the range of r, written rngs(r).
Thus, only the pages containing rngs(r) need to be read into a buffer, and
only the tuples in rngs(r) need to be scanned to see if they actually join with r.
Thus, the time complexity of the algorithm is
O(cost(sorting) + n + m),
where n and m are the sizes of R and S, respectively, in pages, and
cost(sorting) is the time spent on sorting R and S, including both I/O and
CPU time.
Typically, cost(sorting) = n log n + m log m.
30. FUZZY EQUALITY INDICATORS
We now consider how to use the SMFEJ algorithm to evaluate the fuzzy
equi-join efficiently.
For practical reasons, we assume that only a limited buffer space is available to the
algorithm.
Thus, during the joining phase, some pages in rngs(r) for some tuple r
may have to be swapped out of the buffer to make room for other pages,
and then be swapped back in because they are also in the range of the
next R-tuple.
In this case, the key to the efficient evaluation of the fuzzy equi-join is to
determine the appropriate intervals to associate with the fuzzy attribute
values.
31. Example: Assume that R has a tuple r with r[A] = MF(10, 10, 40, 40) and S
contains exactly the tuples s1, ..., s9 with
s1[B] = MF(5, 5, 20, 20),
s2[B] = MF(6, 6, 9, 9),
s3[B] = MF(10, 10, 40, 40),
s4[B] = MF(11, 11, 16, 16),
s5[B] = MF(15, 15, 45, 45),
s6[B] = MF(20, 20, 30, 30),
s7[B] = MF(20, 20, 50, 50),
s8[B] = MF(32, 32, 36, 36),
and s9[B] = MF(35, 35, 60, 60).
Thus, rngs(r) is [s1, ..., s9].
32. With a little calculation, we have:
r[A] ~= si[B]    Value
r[A] ~= s1[B]    0.29
r[A] ~= s2[B]    0
r[A] ~= s3[B]    1
r[A] ~= s4[B]    0.17
r[A] ~= s5[B]    0.71
r[A] ~= s6[B]    0.33
r[A] ~= s7[B]    0.5
r[A] ~= s8[B]    0.13
r[A] ~= s9[B]    0.1
If the join condition is (R.A ~= S.B) ≥ 0.5, only s3, s5, and s7 will join with r. If
the threshold value is raised from 0.5 to 0.9, only s3 will join with r. In both cases,
however, all S-tuples must be scanned.
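The values in the table can be reproduced with a short calculation. All values in this example are intervals MF(l, l, u, u), whose membership is 1 on [l, u] and 0 elsewhere, so the integral of the minimum is the length of the intersection and the integral of the maximum is the length of the union. The function name interval_eq is an assumption made for this sketch.

```python
def interval_eq(v1, v2):
    """Fuzzy equality of two interval values (l, u), i.e. MF(l, l, u, u)."""
    (l1, u1), (l2, u2) = v1, v2
    inter = max(0.0, min(u1, u2) - max(l1, l2))
    union = (u1 - l1) + (u2 - l2) - inter
    if union == 0:                       # both values are crisp
        return 1.0 if l1 == l2 else 0.0
    return inter / union


r = (10, 40)
S = {
    "s1": (5, 20), "s2": (6, 9), "s3": (10, 40),
    "s4": (11, 16), "s5": (15, 45), "s6": (20, 30),
    "s7": (20, 50), "s8": (32, 36), "s9": (35, 60),
}

degrees = {name: interval_eq(r, s) for name, s in S.items()}
for name, degree in degrees.items():
    print(name, round(degree, 2))

joined_05 = [name for name, d in degrees.items() if d >= 0.5]
joined_09 = [name for name, d in degrees.items() if d >= 0.9]
print(joined_05, joined_09)   # ['s3', 's5', 's7'] ['s3']
```

For example, r and s1 share the segment [10, 20] of length 10 while their union [5, 40] has length 35, giving 10/35 ≈ 0.29.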
33. Now, suppose that i = 0.5 and that, based on a method to be discussed later,
we assign the interval [17.5, 32.5]
to r[A] and the intervals
[8.75, 17.25], [6.75, 8.25], [17.5, 32.5], [12.25, 14.75], [22.5, 37.5],
[22.5, 27.5], [27.5, 42.5], [33, 35], and [41.25, 53.75]
to s1[B], ..., s9[B], respectively.
It can be verified that rngs(r) becomes [s3, s5, s6, s7],
that is, it is reduced by more than 50 percent. Thus, the same result can be
obtained by scanning fewer tuples.
However, it is possible that not all tuples in rngs(r) join with r, and the higher the
threshold value is, the more such irrelevant tuples are in rngs(r).
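The reduction of the range can be checked with a small script. The slides do not spell out the definition of rngs(r), so this sketch assumes that it is the contiguous run of S-tuples, in sorted order, from the first to the last tuple whose interval overlaps the interval of r. That reading reproduces both ranges stated in the example. The function names are hypothetical, and the intervals are copied from the slides as given.

```python
def overlaps(x1, x2):
    """Strict overlap test from the partial-order definition."""
    return x1[0] < x2[1] and x2[0] < x1[1]


def range_of(r_interval, s_intervals):
    """Assumed rngs(r): sorted S-tuples from first to last overlap with r."""
    ordered = sorted(s_intervals.items(), key=lambda kv: kv[1])
    hits = [k for k, (_, iv) in enumerate(ordered) if overlaps(r_interval, iv)]
    if not hits:
        return []
    return [name for name, _ in ordered[hits[0]:hits[-1] + 1]]


# Supporting intervals of the original values.
supports = {
    "s1": (5, 20), "s2": (6, 9), "s3": (10, 40),
    "s4": (11, 16), "s5": (15, 45), "s6": (20, 30),
    "s7": (20, 50), "s8": (32, 36), "s9": (35, 60),
}
# Intervals assigned for threshold 0.5, as listed on the slide.
assigned = {
    "s1": (8.75, 17.25), "s2": (6.75, 8.25), "s3": (17.5, 32.5),
    "s4": (12.25, 14.75), "s5": (22.5, 37.5), "s6": (22.5, 27.5),
    "s7": (27.5, 42.5), "s8": (33, 35), "s9": (41.25, 53.75),
}

full_range = range_of((10, 40), supports)
reduced_range = range_of((17.5, 32.5), assigned)
print(len(full_range), sorted(reduced_range))   # 9 ['s3', 's5', 's6', 's7']
```

The reduced range still contains s6, whose equality degree with r is 0.33, which illustrates that a tuple in rngs(r) does not necessarily join with r.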
34. Since every tuple in rngs(r) must be scanned during the join,
the efficiency can be improved by moving as many irrelevant
tuples out of rngs(r) as possible.
This can be achieved if the assignment of intervals to the attribute
values is an appropriate function of the threshold value, so that the sorting will
rearrange the tuples appropriately.
35. Intuitively, if f is an FE indicator over the domain of the join attributes of a fuzzy
equi-join, then assigning intervals to join attribute values using f
guarantees that, after sorting according to the partial order, for every tuple r in R, every
relevant S-tuple is in rngs(r).
However, it does not guarantee that every tuple in rngs(r) joins with r,
unless f is a perfect FE indicator.
If both f and g are FE indicators over the domains of the join attributes, and f is
stronger than g, f will assign smaller intervals to values than g would, and thus may
move more irrelevant tuples out of rngs(r) for every r.
36. EXPERIMENT RESULTS
We study the performance of algorithm SMFEJ using various types of data and FE
indicators.
The performance study is based on a simulation of algorithm SMFEJ on
synthetic data.
The experiments are performed on a Sun SPARCstation 5.
The performance of the algorithm is measured by:
1. The number of pages read from the inner relation, as the I/O
cost.
2. The number of comparisons made, as the CPU cost.
For each pair of R and S tuples, if the values in the join attributes overlap with
each other, two comparisons are recorded: one to determine that
they overlap, and the other to determine whether they really join.
If the two values do not overlap, one comparison is recorded.
38. The algorithm SMFEJ is implemented to take advantage of page buffers.
For each page of relation R, one page of relation S is read at a time,
and all join results that can be obtained from the two pages are obtained
before the next page of relation S is read.
It is straightforward to see that a larger buffer space
will:
1. Reduce the I/O cost.
2. Save more CPU cost than I/O cost.
39. CONCLUSION
In this paper, we:
1. Propose a new fuzzy equality comparison operator with a measure that combines the
possibility measure with the similarity measure.
2. Define a type of fuzzy equi-join based on the new fuzzy equality comparison
operator, which allows threshold values to be associated with individual
predicates of the join condition.
3. Use a sort-merge join algorithm based on a partial order of intervals to
evaluate the fuzzy equi-join.
4. Define FE indicators, which determine appropriate intervals for fuzzy data, and
identify them for data sets with different characteristics.
5. Report experiment results from our preliminary simulation of the algorithm, which show a
significant improvement in efficiency when FE indicators are used in
conjunction with the sort-merge join algorithm.
40. It may be interesting to:
1. Study other types of data correlations.
2. Find efficient join algorithms that can be applied to both numeric and
discrete attributes, which is an important issue.
3. Find new types of fast access paths that handle both crisp and fuzzy data
efficiently, which is a challenging task.