An Assessment of Fuzzy Linear Regression
By
Eric Szegedi
B.S., Liberty University, 1992
Advisor: Andrzej S. Kosinski, Ph. D.
A thesis submitted to the Faculty of the Rollins School of Public Health
Of Emory University in partial fulfillment
Of the requirements for the degree of
Master of Public Health
Department of Biostatistics
1996
An Assessment of Fuzzy Linear Regression
By
Eric Szegedi
Advisor: Andrzej S. Kosinski, Ph. D.
Approved for the Department
Andrzej S. Kosinski
Adviser
Michael Lynn
Committee Member
Accepted:
Vicki Stover Hertzeg
Director, Division of Biostatistics
10 May 1996
Date
In presenting this thesis as a partial fulfillment of the requirements for an advanced
degree from Emory University, I agree that the Library of the University shall make it
available for inspection and circulation in accordance with its regulations governing
materials of this type. I agree that permission to copy from, or to publish, this thesis may
be granted by the professor under whose direction it was written, or, in his absence, by
the Dean of the Rollins School of Public Health when such copying or publication is
solely for scholarly purposes and does not involve potential financial gain. It is
understood that any copying from, or publication of, this thesis which involves potential
financial gain will not be allowed without written permission.
Eric Szegedi
NOTICE TO BORROWERS
Unpublished theses deposited in the Emory University Library must be used only in
accordance with the stipulations prescribed by the author in the preceding statement.
The author of this thesis is:
NAME: Eric Szegedi
ADDRESS: szegman@yahoo.com
The director of this thesis is:
NAME: Andrzej S. Kosinski, Ph.D.
ADDRESS: The Rollins School of Public Health at Emory University, Division of
Biostatistics, 1518 Clifton Rd. NE, Atlanta, GA 30322, USA
Users of this thesis not regularly enrolled as students at Emory University are required to
attest acceptance of the preceding stipulations by signing below. Libraries borrowing this
thesis for the use of their patrons are required to see that each user record here the
information requested.
Name of user Address Date Type of use: (Examination or
copying)
ABSTRACT
The purpose of this thesis is to investigate the usefulness of fuzzy linear regression as
developed by Tanaka, Uejima, and Asai, and then refined by Savic and Pedrycz, 1991. A
comparison of fuzzy linear regression to least-squares linear regression is used to make
the assessment. The comparison is undertaken in two ways. One way is to compare the
spreads, $\vec{c}$, of the fuzzy linear regression estimated coefficients with the standard errors
of least-squares linear regression estimated coefficients. The other way is to compare the
residuals of each type of linear regression. The goal of these two comparisons is for
statisticians to gain a better understanding of fuzzy linear regression as a method for data
analysis. The conclusion of this thesis is that further work needs to be conducted in order
to obtain an interpretable meaning from the fuzzy linear regression parameters. At this
point a clearer understanding cannot be given.
(Note: Thesis is on file at http://www.sph.emory.edu/bios/news/library/szegedi.html)
Contents
1 Introduction
2 Theory of Fuzzy Linear Regression
3 Comparison of Performance
3.1 Comparison of FLR Spread and LSLR Standard Error
3.2 Comparison of Residuals
4 Conclusion and Discussion
A Appendix
B Appendix
C Appendix
References
List of Tables
Table 1: Example of Membership Values for Several
Table 2: Estimated $\gamma_j$'s and $c_j$'s for Restenosis Data Set
Table 3: LSLR Standard Error for Restenosis Data Set
Table 4: Comparison of FLR and LSLR for $\hat{H} = 0.50$
List of Figures
Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set
Eric Szegedi: An Assessment of Fuzzy Linear Regression 8
1 Introduction
In analyzing medical data, the indicators for a disease or outcome, and the relationships
among them, do not always have an exact definition. Measurements also may not be, or
cannot be, precise, and fuzzy sets can give a variable a more precise meaning. For example,
hypertension is defined as a diastolic blood pressure greater than 90 mmHg for a “normal”
person. A more accurate definition may be given using fuzzy sets, which would give a slight
possibility of hypertension to someone with a diastolic blood pressure of 70 mmHg and a
high possibility of hypertension to someone with a diastolic blood pressure of 105 mmHg.
Another example is the relationship between a person’s HDL cholesterol level and their
level of heart disease. Someone with a high HDL cholesterol level will have a high
possibility of heart disease, whereas someone with a low HDL cholesterol level will have a
slight possibility of heart disease.
Fuzzy sets are a way to deal with problems where the source of imprecision is not
random error but “the absence of sharply defined criteria of class membership” [12]. If X
is a space of points or objects, with a generic element of X denoted by x, then a
membership function $\mu_S(x)$ that maps each point in X to a real number in the interval
[0,1] defines a fuzzy set S in X. The value of $\mu_S(x)$ at $x \in X$ is the membership
value of x in S, and the valuation set is the set of possible membership values. A set is
considered fuzzy as long as the valuation set contains values between 0 and 1. In classical
set theory the valuation set comprises only two values, membership or non-membership,
$\mu_S(x) = 1$ or $\mu_S(x) = 0$ respectively. In fuzzy set theory the valuation set
describes the graded membership of an element x in a set S by use of the function
$\mu_S(x)$, which maps x to values between 0 and 1. An element may belong partially to a
set S, and the higher the value of the membership function $\mu_S(x)$, that is, the closer
$\mu_S(x)$ is to 1, the more the element x belongs to the set S. The set S does not have a
clearly defined membership, since S cannot be said simply to contain certain elements or
not. For example, suppose an element $x_1$ has a higher membership value than an element
$x_2$ with a membership of $\mu_S(x_2) = 0.15$. The element $x_1$ is then said to
characterize the set S more than $x_2$, since $x_1$ has a higher membership value than
$x_2$.
The various indicators for a disease or outcome, mentioned earlier, may actually
have a fuzzy definition, and their relationship may be described by a fuzzy function, such
as the fuzzy linear regression described in this thesis. The distribution of the data for
these various indicators may be characterized by a valuation set, or possibility
distribution as a valuation set is sometimes called in the literature [5]. The valuation set
consists of the values of the membership function and is based on the concept of fuzzy
logic. Fuzzy logic was created nearly simultaneously by Zadeh [12] and Klaua [2] in
1965 and has been mostly applied to engineering problems and control systems. Where
classical logic allows only for a value of true or false, denoted 1 and 0 respectively, fuzzy
logic allows for a gradation of values within the interval [0,1], so that a value of partially
true or partially false may be assigned. Fuzzy set theory is based on this concept of fuzzy
logic.
For example, if one were to ask a respondent in a survey how many times they
use cocaine and they only respond with the word “several”, a valuation set for the word
“several” would have to be used. The valuation set may look like the table below [8]. In
this case the valuation set, which comprises the membership values in Table 1, was
created by asking 23 students to “rate the degree of possibility that various integers could
be the number someone has in mind when they say several” [8].
integer membership value
0 0.00
1 0.00
2 0.00
3 0.18
4 0.57
5 0.81
6 0.97
7 0.84
8 0.72
9 0.26
10 0.02
11 0.00
Table 1: Example of Membership Values for Several
The numbers shown in Table 1 are the mean ratings given by the students. The value of
the membership function for x=3 would be $\mu_S(3) = 0.18$. The table shows that the word
“several” refers mainly to the integers between 5 and 8 because of their high membership
values. The most possible value of the word “several” would be 6, since 6 is the number
in the fuzzy set S with the highest membership value.
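The reading of Table 1 can be sketched in a few lines of Python. This is only an illustration: the variable names are hypothetical, and the 0.7 cut-off used to pick out the "high" membership values is an assumption made here, not a threshold taken from the thesis or from [8].

```python
# Mean ratings from Table 1: integer -> membership value for the word "several".
membership_several = {
    0: 0.00, 1: 0.00, 2: 0.00, 3: 0.18, 4: 0.57, 5: 0.81,
    6: 0.97, 7: 0.84, 8: 0.72, 9: 0.26, 10: 0.02, 11: 0.00,
}

# Most possible value: the integer with the highest membership value.
most_possible = max(membership_several, key=membership_several.get)

# Integers whose membership exceeds an assumed cut-off of 0.7.
CUTOFF = 0.7
high_membership = sorted(k for k, v in membership_several.items() if v > CUTOFF)

print(most_possible)     # 6
print(high_membership)   # [5, 6, 7, 8]
```

With this cut-off the selected integers are exactly 5 through 8, matching the range described in the text.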
In this thesis, the question of the usefulness of fuzzy linear regression (FLR) in
statistics is assessed by comparing FLR to least-squares linear regression (LSLR). The
comparison between FLR and LSLR is done in two ways. One such way is to compare
the spreads of FLR estimated coefficients with the standard errors of LSLR estimated
coefficients. The other way is to compare the residuals of FLR with the residuals of
LSLR. The goal is to have a better understanding of the FLR parameters [$(\vec{\gamma}, \vec{c})$,
defined in Section 2], so that statisticians can use FLR as an alternative to LSLR.
2 Theory of Fuzzy Linear Regression
The purpose of fuzzy linear regression is to determine the estimated fuzzy coefficients
that have the minimum membership function for the observed fuzzy set $Y_i$ $(i = 1, \ldots, n)$
in the predicted fuzzy set $Y_i^*$. The fuzzy set $Y_i^*$ used in this thesis is defined by
the linear model $Y_i^* = \Gamma_0 + \Gamma_1 x_{i1} + \Gamma_2 x_{i2} + \cdots + \Gamma_p x_{ip}$ [10],
where the $x_{ij}$ $(j = 1, \ldots, p)$ are covariates given as non-fuzzy input data and
the $\Gamma_j$ $(j = 0, \ldots, p)$ are fuzzy sets. A design matrix for $x_{ij}$ is

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}$$

where $x_i^T$ is the ith row of X. The observed fuzzy set $Y_i$ is characterized by center
and spread $(y_i, e_i)$, such that the observed fuzzy set $Y_i$ contains real numbers in the
interval $[y_i - e_i, y_i + e_i]$. The center $y_i$ and the spread $e_i$ are given as input
from the observed data set. In this thesis $e_i = 0$ for all i. The predicted fuzzy set
$Y_i^*$ is characterized by $(y_i^*, e_i^*)$, where $y_i^* = x_i^T \vec{\gamma}$ and
$e_i^* = x_i^T \vec{c}$, such that the predicted fuzzy set $Y_i^*$ contains real numbers in
the interval $[y_i^* - e_i^*, y_i^* + e_i^*]$. Here $\vec{\gamma}$ and $\vec{c}$ denote the
vectors of center values ($\vec{\gamma} = [\gamma_0, \ldots, \gamma_p]^T$) and spreads
($\vec{c} = [c_0, \ldots, c_p]^T$) for all the fuzzy sets $\Gamma_j$ $(j = 0, \ldots, p)$.
In this thesis the membership function that determines the membership value will
be of the form $L(\cdot) = \max(0, 1 - |\cdot|)$, such that the membership function of $Y_i^*$ is

$$\mu_{Y_i^*}(Y_i) = L\!\left(\frac{y_i - y_i^*}{e_i^* - e_i}\right) =
\begin{cases}
1 - \dfrac{|y_i - y_i^*|}{e_i^* - e_i} & \text{if } e_i^* - e_i > 0 \text{ and } |y_i - y_i^*| \le e_i^* - e_i \\
0 & \text{if } e_i^* - e_i = 0 \text{ and } y_i \ne y_i^*, \text{ or } e_i^* - e_i < |y_i - y_i^*| \\
1 & \text{if } e_i^* - e_i = 0 \text{ and } y_i = y_i^* \\
0 & \text{if } e_i^* - e_i < 0
\end{cases}$$

where $\mu_{Y_i^*}(Y_i)$ is the membership function of $Y_i$ defining the fuzzy set $Y_i^*$.
The center $y_i^*$ is the most possible value of the set $Y_i^*$ because $y_i^*$ has the
highest membership value for $\mu_{Y_i^*}(Y_i)$. The spread $e_i^*$ determines how fuzzy or
precise the set $Y_i^*$ will be. Also, the wider the spread, the closer to 1 (the largest
membership value) the membership value of a given observation becomes.
Let the minimum $\mu_{Y_i^*}(Y_i)$ over all i be H, such that H is “the largest membership
value such that all $y_i$ values having membership of at least [H] inside the fuzzy
[observed] set $Y_i$ have at least [H] membership values inside the fuzzy [estimated] set
$Y_i^*$” [6], that is, $H = \min\{\mu_{Y_i^*}(Y_i),\ i = 1, \ldots, n\}$. The value H is
called the degree of fit for a model and is a value in the interval [0,1]. A value of H=0.7
implies a higher degree of membership of $Y_i$ in $Y_i^*$ than a value of H=0.2.
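The membership function and the degree of fit can be illustrated with a small Python sketch. It assumes the triangular form $L(z) = \max(0, 1 - |z|)$ with the distance between centers divided by the difference in spread between $Y_i^*$ and $Y_i$; the function names and the three data points are hypothetical and are not taken from the Restenosis data.

```python
def triangular_membership(y, e, y_star, e_star):
    """Membership of the observed set (y, e) in the predicted set (y_star, e_star).

    Uses L(z) = max(0, 1 - |z|), with z the distance between the centers
    divided by the difference in spread (predicted minus observed).
    """
    width = e_star - e
    distance = abs(y - y_star)
    if width < 0:
        return 0.0
    if width == 0:
        return 1.0 if distance == 0 else 0.0
    return max(0.0, 1.0 - distance / width)


def degree_of_fit(observed, predicted):
    """H: the minimum membership value over all observations."""
    return min(
        triangular_membership(y, e, y_star, e_star)
        for (y, e), (y_star, e_star) in zip(observed, predicted)
    )


# Hypothetical crisp observations (e_i = 0) and predicted (center, spread) pairs.
observed = [(10.0, 0.0), (12.0, 0.0), (15.0, 0.0)]
predicted = [(10.0, 2.0), (13.0, 2.0), (14.5, 2.0)]

H = degree_of_fit(observed, predicted)
print(H)  # 0.5
```

The individual membership values here are 1.0, 0.5, and 0.75, so the degree of fit is set by the worst-fitting observation.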
The fuzzy sets $\Gamma_j$ $(j = 0, \ldots, p)$ are defined as

$$\mu_{\Gamma_j}(a_j) =
\begin{cases}
1 - \dfrac{|a_j - \gamma_j|}{c_j} & \text{if } |a_j - \gamma_j| \le c_j \text{ and } c_j > 0 \\
0 & \text{if } c_j = 0 \text{ and } a_j \ne \gamma_j, \text{ or } c_j < |a_j - \gamma_j| \\
1 & \text{if } c_j = 0 \text{ and } a_j = \gamma_j
\end{cases}$$

where $\mu_{\Gamma_j}(a_j)$ is the membership function of $a_j$ defining the fuzzy set
$\Gamma_j$. The center of the fuzzy set $\Gamma_j$ is $\gamma_j$, which is the most possible
value of the set $\Gamma_j$ because $\gamma_j$ has the highest membership value for
$\mu_{\Gamma_j}(a_j)$. The spread around the center $\gamma_j$ of the fuzzy set $\Gamma_j$ is
$c_j$, which is the precision of the fuzzy set $\Gamma_j$. A fuzzy set $\Gamma_j$ with
$c_j = 0$ is referred to as a crisp set.
The determination of the FLR residuals will be accomplished by using the
original formulation of the minimization problem by Tanaka et al. [10]. The minimization
problem is formulated as solving the following linear programming problem for $\vec{c}$ and
$\vec{\gamma}$:

$$\min_{\vec{\gamma},\,\vec{c}\,\in\,\mathbb{R}^{p+1}} \ \sum_{i=1}^{n} e_i^*$$
$$\text{subject to}\quad y_i^* + (1 - \hat{H})\,e_i^* \ \ge\ y_i + (1 - \hat{H})\,e_i$$
$$\text{and}\quad y_i^* - (1 - \hat{H})\,e_i^* \ \le\ y_i - (1 - \hat{H})\,e_i$$

where $c_j \ge 0$ (with $e_i^* = x_i^T \vec{c}$) and $\hat{H}$ is the estimated degree of fit.
The minimization problem above has 2n constraints.
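To see why these constraints enforce the degree of fit, it may help to work through the crisp-data case used in this thesis ($e_i = 0$). The following short derivation is a sketch under that assumption, taking the first constraint as a lower bound ($\ge$) on the upper edge of the predicted interval and the second as an upper bound ($\le$) on its lower edge:

```latex
\begin{align*}
y_i^* + (1-\hat{H})\,e_i^* \ge y_i
  &\iff y_i - y_i^* \le (1-\hat{H})\,e_i^*, \\
y_i^* - (1-\hat{H})\,e_i^* \le y_i
  &\iff y_i^* - y_i \le (1-\hat{H})\,e_i^*, \\
\text{so together}\quad |y_i - y_i^*| \le (1-\hat{H})\,e_i^*
  &\iff 1 - \frac{|y_i - y_i^*|}{e_i^*} \ge \hat{H}
  \quad (e_i^* > 0).
\end{align*}
```

In words, each pair of constraints requires observation i to have a membership value of at least $\hat{H}$ in its predicted fuzzy set, which is why n observations give 2n constraints.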
To compare the spreads of FLR with the standard errors of LSLR, this thesis will use
the minimization problem as developed by Tanaka et al. [10] and then refined by Savic
and Pedrycz [7], along with the search method by Moskowitz and Kim [4], to provide the
proper $\vec{\gamma}$ and $\vec{c}$. The reason for the difference in the two analyses is
that in the refinement by Savic and Pedrycz the centers of FLR are the same as the parameter
estimates in LSLR, so the residuals of FLR and LSLR would be identical by definition.
The fuzzy regression minimization problem for the comparison of spreads is
conducted in two steps. The first step in the refinement of the problem by Savic and
Pedrycz is to obtain $\vec{\gamma} = (X^T X)^{-1} X^T Y$, which is the least-squares
estimator. The second step is to solve the following linear programming problem for $\vec{c}$:

$$\min_{\vec{c}\,\in\,\mathbb{R}^{p+1}} \ \sum_{i=1}^{n} e_i^*$$
$$\text{subject to}\quad y_i^* + (1 - \hat{H})\,e_i^* \ \ge\ y_i + (1 - \hat{H})\,e_i$$
$$\text{and}\quad y_i^* - (1 - \hat{H})\,e_i^* \ \le\ y_i - (1 - \hat{H})\,e_i$$

where $c_j \ge 0$ (with $e_i^* = x_i^T \vec{c}$). The minimization problem above also has 2n
constraints. This procedure as refined by Savic and Pedrycz is uniquely defined for
$\vec{\gamma}$ when X is a full rank matrix. The higher the value of $\hat{H}$, the greater
$\vec{c}$ becomes [4]. The best degree of fit is determined by finding the estimated fuzzy
sets $\Gamma_j$, characterized by $(\gamma_j, c_j)$, of $Y_i^*$ which are solutions to the
minimization problem.
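The two-step procedure can be sketched in Python for the smallest interesting case: one covariate plus an intercept, crisp observations ($e_i = 0$), and spreads computed as $e_i^* = c_0 + c_1 |x_i|$ (the absolute value is an assumption that keeps the spread non-negative). Because there are only two unknown spreads, the sketch solves the linear program by enumerating vertices of the feasible region instead of calling an LP library; this is an illustrative substitute and not the solver used in this thesis. All function names and data values are hypothetical.

```python
from itertools import combinations


def least_squares_line(x, y):
    """Step 1: centers (gamma0, gamma1) from ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma1 = sxy / sxx
    return mean_y - gamma1 * mean_x, gamma1


def minimal_spreads(x, y, gamma, h):
    """Step 2: minimise sum_i e_i* with e_i* = c0 + c1*|x_i|, subject to
    (1 - h) * e_i* >= |y_i - y_i*| and c0, c1 >= 0 (crisp data, h < 1).
    """
    g0, g1 = gamma
    # Each constraint is stored as (a0, a1, b), meaning a0*c0 + a1*c1 >= b.
    cons = [(1.0, abs(xi), abs(yi - (g0 + g1 * xi)) / (1.0 - h))
            for xi, yi in zip(x, y)]
    cons += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # c0 >= 0 and c1 >= 0
    cost = (float(len(x)), sum(abs(xi) for xi in x))
    best = None
    for (a0, a1, b), (d0, d1, f) in combinations(cons, 2):
        det = a0 * d1 - a1 * d0
        if abs(det) < 1e-12:
            continue  # parallel boundaries do not intersect in a vertex
        c0 = (b * d1 - a1 * f) / det
        c1 = (a0 * f - b * d0) / det
        if all(p * c0 + q * c1 >= r - 1e-9 for p, q, r in cons):
            total = cost[0] * c0 + cost[1] * c1
            if best is None or total < best[0]:
                best = (total, c0, c1)
    return best  # (objective value, c0, c1)


# Hypothetical data, not from the Restenosis data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

gamma = least_squares_line(x, y)
fit_low = minimal_spreads(x, y, gamma, 0.0)
fit_high = minimal_spreads(x, y, gamma, 0.5)
print(gamma, fit_low[0], fit_high[0])
```

With crisp data the right-hand sides of the constraints scale by $1/(1 - \hat{H})$, so the minimal total spread at $\hat{H} = 0.5$ is twice the total at $\hat{H} = 0$, which is one way to see why the spreads grow as $\hat{H}$ increases.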
Moskowitz and Kim note that “the selection of a proper value of [$\hat{H}$] is important
in fuzzy regression, because it determines the range of the possibility distributions
[valuation sets] of the fuzzy parameters” [4]. Moskowitz and Kim suggest two methods for
determining $\hat{H}$, an analytical method and a search method. I will use the search
method in this thesis since the search method is more advantageous when the amount of
spread is uncertain. The search method is an extension to the second minimization problem
above and allows for the incorporation of the researcher's beliefs regarding the spread of
the valuation set in selecting an $\hat{H}$ value, instead of just guessing an $\hat{H}$
value. The search method is conducted by way of the following algorithm [4]:
1. Initialization
• set the interval of uncertainty ($H_{min} = 0$, $H_{max} = 1$)
• set H to the initial guess $H^*$
• set $\hat{c}_j$ (chosen spread of the selected jth parameter) and set the level of
tolerance $\varepsilon$ (greatest amount of difference wanted between $\hat{c}_j$ and $c_j$)
• obtain $\vec{\gamma} = (X^T X)^{-1} X^T Y$
• choose the type of membership function $\mu_{Y_i^*}(Y_i)$
2. Fuzzy Regression
• determine the fuzzy fitted sets with a degree of fit H and membership function
$\mu_{Y_i^*}(Y_i)$
• calculate for the selected jth parameter the difference
$\Delta_j = \hat{c}_j \text{ (chosen)} - c_j \text{ (from fuzzy regression)}$
3. Termination or Update
• if $|\Delta_j| < \varepsilon$ then set $\hat{H} = H$, $\Gamma_j = (\gamma_j, c_j)$ for all j and stop
• if $\Delta_j \ge \varepsilon$ then set $H_{min} = H$
• if $\Delta_j \le -\varepsilon$ then set $H_{max} = H$
• set $H = H_{min} + (H_{max} - H_{min})/2$ and then go to step 2
This algorithm will provide the proper level $\hat{H}$ and the optimal $c_j$’s under the
membership function $\mu_{Y_i^*}(Y_i)$. The degree of fit $\hat{H}$ will be the membership
value that can be obtained while maintaining the spread of the jth parameter at a specified
level [4].
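The search steps above amount to a bisection on H, which can be sketched as follows. The fuzzy regression step is represented by a caller-supplied function `spread_at(h)` that returns the fitted spread $c_j$ for a given H; that function, its $0.4445/(1 - h)$ form, and the chosen spread of 0.889 are hypothetical stand-ins for an actual linear programming solve, not results from this thesis.

```python
def search_degree_of_fit(spread_at, c_hat, eps=1e-4, h_init=0.3, max_iter=100):
    """Bisection search for the degree of fit H whose fitted spread matches c_hat.

    spread_at : callable returning the fitted spread c_j for a given H
                (assumed to increase with H)
    c_hat     : chosen spread of the selected jth parameter
    eps       : tolerance on the difference between c_hat and c_j
    """
    h_min, h_max = 0.0, 1.0          # interval of uncertainty
    h = h_init                       # initial guess H*
    for _ in range(max_iter):
        c_j = spread_at(h)           # step 2: fuzzy regression at degree of fit H
        delta = c_hat - c_j
        if abs(delta) < eps:         # step 3: termination
            return h, c_j
        if delta >= eps:             # fitted spread too small, so raise H
            h_min = h
        else:                        # fitted spread too large, so lower H
            h_max = h
        h = h_min + (h_max - h_min) / 2.0
    raise RuntimeError("search did not converge within max_iter iterations")


# Hypothetical spread function: the spread grows like 1 / (1 - H).
def spread_at(h):
    return 0.4445 / (1.0 - h)


h_hat, c_fit = search_degree_of_fit(spread_at, c_hat=0.889)
print(round(h_hat, 3), round(c_fit, 3))
```

The direction of each update relies on the spread increasing with H; if a membership function were chosen for which that does not hold, the update rules would need to be revisited.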
Least-squares linear regression has the equation
$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$, where
$\beta_0, \beta_1, \ldots, \beta_p$ are the p+1 regression coefficients that need to be
estimated, $X_1, X_2, \ldots, X_p$ are the p independent variables, and $\varepsilon$ is the
random error. The random error $\varepsilon$ has a mean of 0 and a variance of $\sigma^2$.
Least-squares linear regression chooses as the best-fitting model the one that minimizes the
sum of squares of the distances between the observed responses and those predicted by the
fitted model. The idea is that the better the fit, the smaller the deviations of the observed
values from the predicted values. The least-squares solution then consists of those values
$\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ for which the sum
$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is a minimum.
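For reference, the minimizer can be written in closed form. The sketch below uses matrix notation that is assumed here: X is the n × (p+1) design matrix with a leading column of ones, Y is the vector of observed responses, $\beta$ is the vector of coefficients, and X is taken to have full column rank.

```latex
\begin{align*}
S(\beta) &= \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2 = (Y - X\beta)^T (Y - X\beta), \\
\frac{\partial S}{\partial \beta} &= -2\,X^T (Y - X\beta) = 0
  \;\Longrightarrow\; X^T X\,\hat{\beta} = X^T Y, \\
\hat{\beta} &= (X^T X)^{-1} X^T Y .
\end{align*}
```

This is the same estimator that the Savic and Pedrycz refinement uses for the centers of the fuzzy coefficients.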
Fuzzy linear regression allows for some of the strict assumptions of least-squares
linear regression to be relaxed [10]. A comparison of the two relevant assumptions of
least-squares linear regression (LSLR) with fuzzy linear regression (FLR) is as follows
(LSLR assumptions are taken from a book by Kleinbaum et al. [3]):
• LSLR requires the linearity assumption, which states that the mean value of Y
for each specific combination of $X_1, X_2, \ldots, X_p$ is a linear function of
$X_1, X_2, \ldots, X_p$ (i.e. $\mu_{Y|X_1,\ldots,X_p} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$).
FLR is not strictly linear because its coefficients are the fuzzy sets $\Gamma_j$, which
are defined by triangular membership functions in this thesis.
• LSLR requires a normality assumption, which states that for any fixed
combination of $X_1, X_2, \ldots, X_p$, the variable Y is normally distributed. In LSLR,
deviations between the observed and the estimated values are assumed to be
due to random errors. The variable Y obtains its normal distribution from
these normally distributed random errors. In FLR, the normality
assumption does not apply since the deviations between the observed and the
estimated values are assumed to depend on the vagueness or lack of precision
of the parameters.
The fuzzy linear regression estimates of the coefficients $\Gamma_j$ (where $j = 0, \ldots, p$)
in the equation $Y^* = \Gamma_0 + \Gamma_1 x_1 + \cdots + \Gamma_p x_p$ are determined using a
Fortran program written by Redden [6] and modified for the specific application in this
thesis. The program uses the IMSL subroutine DDLPRS [1] to determine the spreads $c_j$
and/or the centers $\vec{\gamma}$ of the fuzzy parameters $\Gamma_j$. The program is in
Appendix A and limited documentation for DDLPRS is in Appendix B. When comparing the
spreads of FLR with the standard errors of LSLR, the Fortran program needs as input the data
set, the least-squares estimates of the parameters (which are used for $\vec{\gamma}$, the
centers of the fuzzy coefficients $\Gamma_j$), and the estimated spread $c_j$ of one of the
parameters. When comparing residuals, the Fortran program only needs the data set as input.
The solutions for LSLR and the validity of LSLR’s assumptions are found by using SAS.
3 Comparison of Performance
3.1 Comparison of FLR Spread and LSLR Standard Error
The data set will be referred to as the Restenosis data set and comes from a Lovastatin
restenosis trial [11]. The Restenosis data set consists of 404 observations with missing
values for 94 of those observations. There are ten covariates and the dependent variable.
The covariates are the angina pectoris grade III or IV (yes or no), the diameter of stenosis
before angioplasty, the index site, the status of diabetes mellitus (yes or no), the diameter
of stenosis after angioplasty, the presence of systemic hypertension (yes or no), the
presence of intimal tear pre-PTCA (yes or no), the presence of intimal tear post-PTCA
(yes or no), the determination of an eccentric or concentric index site, and the age of the
individual. The dependent variable is the restudy of the diameter of the stenosis. In the
comparisons made between FLR and LSLR for the Restenosis data set, only the 310
observations with no missing values were used. The Restenosis data set met the LSLR
assumptions of linearity and normality, as can be seen in Appendix C.
The comparison of the FLR spreads with the LSLR standard errors is undertaken
in order to have an interpretable meaning for the FLR spreads. What is the meaning of an
observed set $Y_i$ falling within the predicted fuzzy set $Y_i^*$? The solutions for FLR are
given below in Table 2, and the standard errors for LSLR are in Table 3. The $\gamma_j$’s in
Table 2 are the LSLR parameter estimates. Variables 1 through 10 in Table 2 and Table 3 are
the covariates, and variable 0 is the Y-intercept.

 j   $\gamma_j$   $c_j$ ($\hat{H}$=0.019)   $c_j$ ($\hat{H}$=0.188)   $c_j$ ($\hat{H}$=0.500)   $c_j$ ($\hat{H}$=0.705)
 0     23.696      0.00      0.00      0.00      0.00
 1      1.624      0.00      0.00      0.00      0.00
 2      0.264      0.453     0.547     0.889     1.505
 3     -0.175      0.00      0.00      0.00      0.00
 4     -4.358      0.00      0.00      0.00      0.00
 5      0.477      0.343     0.415     0.674     1.141
 6     -2.518      0.00      0.00      0.00      0.00
 7      0.725      0.00      0.00      0.00      0.00
 8     -5.676      0.00      0.00      0.00      0.00
 9     -0.585      0.00      0.00      0.00      0.00
10      0.078      0.338     0.408     0.663     1.123
Table 2: Estimated $\gamma_j$'s and $c_j$'s for Restenosis Data Set

The low values of $\hat{H}$=0.019 and $\hat{H}$=0.188 mean that the FLR estimates do not fit
the data well. The values of $\hat{H}$=0.500 and $\hat{H}$=0.705 mean that the FLR estimates
fit the data fairly well.

 j   $\mathrm{s.e.}_j$   $1.96\,\mathrm{s.e.}_j$
 0     15.452     30.286
 1      2.239      4.388
 2      0.109      0.213
 3      0.184      0.361
 4      3.666      7.186
 5      0.093      0.183
 6      2.259      4.428
 7     10.074     19.745
 8      2.984      5.848
 9      2.263      4.434
10      0.113      0.221
Table 3: LSLR Standard Error for Restenosis Data Set
Only 1.9% of the FLR prediction intervals for $\hat{H}$=0.019 are narrower than the LSLR 95%
level prediction intervals. For $\hat{H}$=0.188, $\hat{H}$=0.500, and $\hat{H}$=0.705, the
FLR prediction intervals are much wider than the LSLR 95% level prediction intervals.
However, the connection between FLR spreads and LSLR standard errors is unclear. The
question is why the FLR prediction intervals for one $\hat{H}$ value, as opposed to the
prediction intervals for other $\hat{H}$ values, have a higher percentage of prediction
intervals that are narrower than the LSLR prediction intervals. There seems to be no
relation between the FLR spreads and the LSLR standard errors that explains why some FLR
prediction intervals are narrower than the LSLR prediction intervals and other FLR
prediction intervals are not. The crisp FLR spreads of 8 of the parameters do not allow for
a good comparison. Tanaka and Ishibuchi [9] have proposed a method with interactive fuzzy
parameters and quadratic membership functions to deal with crisp FLR spreads. This
method by Tanaka and Ishibuchi is not dealt with, though, in this thesis.
3.2 Comparison of Residuals
When comparing the residuals of the Restenosis data for LSLR and FLR, only 9 of the 10
covariates are used. The covariate for the presence of intimal tear post-PTCA was
removed for ease of computation. This comparison is undertaken in order to determine
whether FLR or LSLR has the better estimated coefficients with regards to the observed
data.
LSLR does much better than FLR since the sum of squared residuals for LSLR is
much smaller than the sum of squared residuals for FLR. The sum of squared residuals
for LSLR is 113,682.71, and the sum of squared residuals for FLR is 145,997.95, for a
difference of 32,315.24.
The results of FLR for $\hat{H}$=0.50 are in Table 4. The first line of Table 4 is for the
Y-intercept, and the other lines are for the 9 covariates. For other $\hat{H}$ values, the
$\gamma_j$’s are the same except for $\hat{H}$=0.80, where $\gamma_4 = -1.632$, a minor
difference. With these other $\hat{H}$ values, only the $c_j$’s change. These $c_j$’s are
not relevant to the comparison of the residuals between FLR and LSLR, since only the
$\gamma_j$’s are used in the comparison.
 j   $\gamma_j$   $c_j$    LSLR $\mathrm{param}_j$   $\mathrm{s.e.}_j$
 0      8.181     0.000     23.696     15.452
 1     -4.943     7.468      1.624      2.239
 2      0.546     1.246      0.264      0.109
 3     -1.601     0.000     -0.175      0.184
 4     -7.997     0.000     -4.358      3.666
 5      0.418     0.000      0.477      0.093
 6      1.365     0.000     -2.518      2.259
 7     -2.154     0.000     -5.676      2.984
 8     10.628     0.000     -0.585      2.262
 9     -0.030     0.000      0.078      0.113
Table 4: Comparison of FLR and LSLR for $\hat{H}$ = 0.50
The graphs comparing the residuals to their respective observations and to their
respective predicted values are in Figure 1. FLR residuals seem to have greater variability
around 0 than LSLR residuals. The greater variability of the FLR residuals implies that
FLR predicted values are not as close to the observed values as are LSLR predicted
values.
Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set
(to view contact author at szegman@yahoo.com)
4 Conclusion and Discussion
An attempt to give meaning to the FLR parameters has been presented by comparing the
spreads of FLR estimated coefficients with the standard errors of LSLR estimated
coefficients and by comparing the residuals of FLR with the residuals of LSLR. At this
point no real conclusions can be drawn. LSLR seems to be better than FLR at describing
the Restenosis data set. Also, an interpretation that has meaning to statisticians can be
given to the results from LSLR. The same still cannot be said of FLR. Further work needs
to be conducted in order for statisticians to obtain an interpretable meaning from the FLR
parameters so that FLR can be used as a viable method for data analysis. For instance,
FLR might be usable for small data sets when LSLR cannot meet its
assumptions.
Other considerations need to be made when trying to give meaning to FLR
parameters. The first is a generalization of the model used in this thesis. This
generalized model is considered robust in the presence of outliers, and it is a
model where the bounds of the interval are fuzzy. “The dependent data y are no longer
inside or outside the interval but belong to the interval to certain degrees (membership)”
[5]. This generalized model with fuzzy intervals maximizes
$\bar{\lambda} = \frac{1}{n} \sum_{i=1}^{n} \lambda_i$ such that

$$(1 - \bar{\lambda})\,s_0 - \sum_{i=1}^{n} e_i^* \ \ge\ -d_0 \qquad \text{(objective function)}$$
$$(1 - \lambda_i)\,s_i + y_i^* + e_i^* \ \ge\ y_i \qquad \text{(upper limit)}$$
$$(1 - \lambda_i)\,s_i - y_i^* + e_i^* \ \ge\ -y_i \qquad \text{(lower limit)}$$
$$-\lambda_i \ge -1, \quad \lambda_i \ge 0, \quad \gamma_j \in \mathbb{R}, \quad \text{and} \quad x_{i0} = 1$$

with $y_i^* = \sum_{j=0}^{p} \gamma_j x_{ij}$ and $e_i^* = \sum_{j=0}^{p} c_j x_{ij}$, where
$d_0$ is the desired value of the objective function, $s_i$ is the width of the tolerance
interval of the observed $y_i$, and $\lambda_i$ represents the membership value to which the
solution belongs to the set “good solution” ($\lambda_i$ restricted to [0,1]). A weak
requirement to minimize the spread (a high value of $s_0$ and low values of $s_i$) leads to
a wide interval, whereas a strong requirement to minimize the spread (a low value of $s_0$
and high values of $s_i$) leads to a narrow interval. Peters suggests $d_0 = 0$ to obtain a
model as crisp as possible [5]. This generalized model needs to be investigated to see
whether, and if so by how much, the fuzzy linear regression prediction intervals are better
or narrower than least-squares regression prediction intervals for outliers.
A second consideration that could possibly enhance the results of fuzzy linear
regression would be to vary the membership function used. Other membership functions
include the following: the uniform membership function

$$L(z) = \begin{cases} 1 & \text{if } -1 \le z \le 1 \\ 0 & \text{otherwise,} \end{cases}$$

the asymmetric membership function $L(z) = \max(0, 1 - |z|^p)$, $p > 0$, or the summed
exponential function

$$L(z) = \begin{cases}
a - e^{b|z|} - c\,e^{d|z|} & \text{if } L(z) \text{ is concave, } -1 \le z \le 1 \\
a + e^{-b|z|} + c\,e^{d|z|} & \text{if } L(z) \text{ is convex, } -1 \le z \le 1
\end{cases}$$

where L represents the membership function of a standardized parameter z. The
standardized parameter z is defined as the distance between the observation $Y_i$ and the
center value of the corresponding fuzzy estimate $Y_i^*$, divided by the difference in
spread of $Y_i^*$ and $Y_i$ [4].
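Two of these alternatives are simple enough to sketch directly. The Python functions below are illustrative only; the names `uniform_membership` and `power_membership` are hypothetical labels for the uniform function and for $L(z) = \max(0, 1 - |z|^p)$. The summed exponential function is left out because its constants a, b, c, and d are not specified here.

```python
def uniform_membership(z):
    """L(z) = 1 for -1 <= z <= 1, and 0 otherwise."""
    return 1.0 if -1.0 <= z <= 1.0 else 0.0


def power_membership(z, p):
    """L(z) = max(0, 1 - |z|**p) for p > 0; p = 1 gives the triangular form."""
    if p <= 0:
        raise ValueError("p must be positive")
    return max(0.0, 1.0 - abs(z) ** p)


# Membership of a standardized distance z = 0.5 under each function.
print(uniform_membership(0.5))     # 1.0
print(power_membership(0.5, 1))    # 0.5
print(power_membership(0.5, 2))    # 0.75
```

For the same standardized distance, a larger p yields a higher membership value inside the interval, so the choice of membership function changes the degree of fit obtained for a given spread.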
A Appendix
PARAMETER (NDATA = 310)
PARAMETER (NVARS = 10)
PARAMETER (N = 2 * NVARS)
PARAMETER (M1 = 2 * NDATA)
PARAMETER (ZZ1 = 1)
PARAMETER (ZZ2 = -1)
INTEGER IRTYPE (M1), I, J, JUMP, K
DOUBLE PRECISION X(NDATA, NVARS + 2), DUMMY(NDATA)
DOUBLE PRECISION A(M1, N), CF(N), S, SUM(NVARS)
DOUBLE PRECISION PSOL(N), DSOL(M1), H, UB, LB, DUMMY2(NDATA)
DOUBLE PRECISION XLB(N), XUB(N), BL(M1), BU(M1)
COMMON / WORKSP / RWKSP
REAL RWKSP(15844578)
C IMSL SUBROUTINE FOR WORKSPACE ALLOCATION
CALL IWKIN(15844578)
OPEN (UNIT=1, FILE=’rest.fprogb.dat’,STATUS=’OLD’)
OPEN (UNIT=2, FILE=’s2.dat’,STATUS=’OLD’)
OPEN (UNIT=3, FILE=’s3.dat’, STATUS=’UNKNOWN’)
C READS IN DATA POINTS
DO 10 I=1, NDATA
10 READ(1,*) (X(I,J), J=1,NVARS+2)
H=0.4
DO 15 WHILE (H .LE. 1.0)
DO 16 I=1, 2*NDATA
DO 17 J=1,2*NVARS
A(I,J)=0
BL(J)=0
17 CONTINUE
16 CONTINUE
C CALCULATES A(M1,NVARS) MATRIX AND BL(M1) VECTOR
C A MATRIX CONTAINS COEFFICIENTS OF M1 CONSTRAINTS
C BL VECTOR CONTAINS THE LOWER LIMIT CONSTRAINTS
ZZ3 = 1 – H
JUMP = 1
DO 30 I = 1, NDATA
Eric Szegedi: An Assessment of Fuzzy Linear Regression 26
DO 40 J = 1, NVARS
A(JUMP,J)=ZZ2*X(I,J+1)
A(JUMP+1,J)=ZZ1*X(I,J+1)
A(JUMP, J+NVARS)=ZZ3*ABS(X(I,J+1))
A(JUMP+1, J+NVARS)=ZZ3*ABS(X(I,J+1))
BL(JUMP) = ZZ2 * (X(I,1)+ZZ3*X(I,NVARS+2)
BL(JUMP + 1) = X(I,1)+ZZ3*X(I,NVARS+2)
40 CONTINUE
JUMP = JUMP + 2
30 CONTINUE
C CALCULATES THE COEFFICIENTS FOR THE OBJECTIVE FUNCTION
DO 50 J = 2, NVARS + 1
DO 60 I = 1, NDATA
SUM(J-1) = SUM(J-1) + ABS(X(I,J))
60 CONTINUE
50 CONTINUE
C ASSIGNS COEFFICIENTS TO OBJECTIVE FUNCTION
DO 70 J=1, NVARS
CF(J) = 0
CF(J+NVARS)=SUM(J)
XLB(J) = 1.0E30
XUB(J) = -1.0E30
XLB(J+NVARS)= 0
XUB(J+NVARS)= -1.0E30
70 CONTINUE
C ASSIGNS TYPE OF CONSTRAINT TO VECTOR IRTYPE(M1)
C (i.e. 2 INDICATES .GE. CONSTRAINT)
DO 80 I = 1, M1
IRTYPE(I) = 2
80 CONTINUE
C IMSL LINEAR PROGRAMMING SUBROUTINE
CALL DDLPRS(M1,N,A,M1,BL,BU,CF,IRTYPE,XLB,XUB,S,PSOL,DSOL)
C PRINTS SOLUTION TO LINEAR PROGRAMMING PROBLEM
WRITE(3, 90)
90 FORMAT(4X, ‘GAMMA’, 9X, ‘C’)
DO 100 J =1, NVARS
WRITE(3, 110) PSOL(J),PSOL(J+NVARS)
110 FORMAT(1X, F10.3, 3X, F7.3)
100CONTINUE
Eric Szegedi: An Assessment of Fuzzy Linear Regression 27
      WRITE(3, 120) S
  120 FORMAT(' OBJECTIVE FN = ', F11.3)
      WRITE(3, *) 'H VALUE', H
      H=H+0.2
C     CALCULATES CONFIDENCE INTERVAL
      WRITE(3,*) 'OBSERVATION ',' UPPER BOUND',' LOWER BOUND'
      DO 130 I=1, NDATA
        DUMMY(I)=0
        DUMMY2(I)=0
        DO 140 J=1, NVARS
C     CALCULATES [(GAMMA)^T]*X(i)
          DUMMY(I)=PSOL(J)*X(I,J+1)+DUMMY(I)
C     CALCULATES [C^T]*(Xi); COVARIATE J IS STORED IN X(I,J+1)
          DUMMY2(I)=PSOL(J+NVARS)*X(I,J+1)+DUMMY2(I)
  140   CONTINUE
        IF (DUMMY2(I) .GE. 0.0) THEN
          UB=DUMMY(I)+DUMMY2(I)
          LB=DUMMY(I)-DUMMY2(I)
        END IF
        IF (DUMMY2(I) .LT. 0.0) THEN
          UB=DUMMY(I)-DUMMY2(I)
          LB=DUMMY(I)+DUMMY2(I)
        END IF
        WRITE(3,150) I,UB,LB
  150   FORMAT(I7,F19.3,F13.3)
  130 CONTINUE
   15 CONTINUE
      END
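The interval printed for each observation in the last part of the program can be illustrated with a short sketch. The Python below is not part of the original program. It assumes the spread of the predicted fuzzy set is taken as the sum of c_j |x_ij|, the form used in the constraint rows of the program, and the function name predicted_interval and the numeric values are hypothetical. The Fortran output section is written to take the absolute value of c^T x_i instead, which gives the same interval when all covariates are non-negative.

```python
def predicted_interval(gamma, c, x):
    """Return (lower, upper) of the predicted fuzzy set for one observation.

    gamma: centers of the fuzzy coefficients
    c:     non-negative spreads of the fuzzy coefficients
    x:     covariate row, including the leading 1 for the intercept
    """
    center = sum(g * xj for g, xj in zip(gamma, x))
    # Assumption: the spread uses |x_j|, as in the ABS() terms of the
    # constraint matrix built in the program.
    spread = sum(cj * abs(xj) for cj, xj in zip(c, x))
    return center - spread, center + spread


if __name__ == "__main__":
    gamma = [1.0, 2.0]   # hypothetical centers
    c = [0.5, 0.25]      # hypothetical spreads
    for x in ([1.0, 2.0], [1.0, -4.0]):
        lower, upper = predicted_interval(gamma, c, x)
        print(x, lower, upper)
```

With these hypothetical values the first row gives the interval (4.0, 6.0) and the second gives (-8.5, -5.5); a larger spread vector widens every interval around the same centers.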
B Appendix
DDLPRS: DOUBLE PRECISION
Purpose: Solve a linear programming problem via the revised simplex algorithm.
Usage: CALL DDLPRS(M, NVAR, A, LDA, BL, BU, C, IRTYPE, XLB, XUB, OBJ,
XSOL, DSOL)
Arguments:
M – Number of constraints.
NVAR – Number of variables (p+1).
A – Matrix of dimension M by NVAR containing the coefficients of the M
constraints.
LDA – Leading dimension of A exactly as specified in the dimension statement of the
calling program. LDA must be at least M.
BL – Vector of length M containing the lower limit of the general constraints. If there
is no lower limit on the I-th constraint, then BL(I) is not referenced.
BU – Vector of length M containing the upper limit of the general constraints. If there
is no upper limit on the I-th constraint, then BU(I) is not referenced. If there is
no range constraint, BL and BU can share the same storage location.
C – Vector of length NVAR containing the coefficients of the objective function.
IRTYPE – Vector of length M indicating the type of constraints exclusive of simple
bounds, where IRTYPE(I) = 0, 1, 2, 3 indicate .EQ., .LE., .GE., and range constraints
respectively.
XLB – Vector of length NVAR containing the lower bound on the variables. If there
is no lower bound on a variable, then 1.0E30 should be set as the lower bound.
XUB – Vector of length NVAR containing the upper bound on the variables. If there
is no upper bound on a variable, then -1.0E30 should be set as the upper bound.
OBJ – Value of the objective function (Output).
XSOL – Vector of length NVAR containing the primal solution (Output).
DSOL – Vector of length M containing the dual solution (Output).
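The bound convention for XLB and XUB described above can be illustrated with a small sketch. The Python below is not IMSL code; the helper name to_optional_bounds and the use of None for an absent bound are assumptions made for illustration. The example encodes the bounds set in the program of Appendix A, where the NVARS centers are unbounded and the NVARS spreads are bounded below by 0.

```python
NO_LOWER = 1.0e30    # sentinel described above for "no lower bound"
NO_UPPER = -1.0e30   # sentinel described above for "no upper bound"


def to_optional_bounds(xlb, xub):
    """Translate sentinel-coded bounds into (low, high) pairs.

    None stands for an absent bound (an illustrative convention,
    not part of the IMSL interface).
    """
    pairs = []
    for low, high in zip(xlb, xub):
        pairs.append((None if low == NO_LOWER else low,
                      None if high == NO_UPPER else high))
    return pairs


if __name__ == "__main__":
    nvars = 2  # hypothetical number of fuzzy coefficients
    # Centers (first nvars entries) are free; spreads (last nvars) are >= 0.
    xlb = [NO_LOWER] * nvars + [0.0] * nvars
    xub = [NO_UPPER] * (2 * nvars)
    print(to_optional_bounds(xlb, xub))
```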
DDLPRS is based on Richard Hanson’s routine LPMGB. It uses a revised simplex
method to solve linear programming problems. In this thesis the problem is of the form:
\[
\min_{\vec{c} \in R^{p+1}} \; \vec{c}^{\,T} \sum_{i=1}^{n} x_i
\quad \text{subject to} \quad A\vec{c} \ge \vec{b}, \qquad c_j \ge 0,
\]
where A is the coefficient matrix consisting of p+1 columns and 2n rows such that
\[
A = (1 - H) \begin{bmatrix} X \\ X \end{bmatrix},
\]
X is a design matrix such that
\[
X = \begin{bmatrix}
1 & x_{11} & \cdots & x_{1p} \\
1 & x_{21} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{np}
\end{bmatrix},
\qquad x_i = [1, x_{i1}, \ldots, x_{ip}]^T,
\]
BL is the vector $\vec{b}$ of the lower bounds on the constraints, consisting of 2n rows such that
\[
\vec{b} = \begin{bmatrix}
Y_1 + (1 - H)e_1 - \vec{\gamma}^{\,T} x_1 \\
\vdots \\
Y_n + (1 - H)e_n - \vec{\gamma}^{\,T} x_n \\
-Y_1 + (1 - H)e_1 + \vec{\gamma}^{\,T} x_1 \\
\vdots \\
-Y_n + (1 - H)e_n + \vec{\gamma}^{\,T} x_n
\end{bmatrix},
\]
and $\vec{c}$ is a vector of p+1 rows.
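As a sketch of how the quantities above fit together, the Python below assembles A, the lower-bound vector, and the objective coefficients for given centers, and checks whether a candidate spread vector satisfies the constraints. It does not solve the linear program. The function names, the toy data, and the center vector gamma (chosen for round numbers, not a least-squares estimate) are hypothetical, and the rows of the lower-bound vector are assumed to be stacked in the same order as the two copies of X in A.

```python
def build_spread_lp(X, y, e, gamma, h):
    """Return (A, b, cost) for: minimize cost.c subject to A c >= b, c >= 0.

    X:     design matrix rows, each starting with 1 for the intercept
    y, e:  observed centers and spreads
    gamma: fixed centers of the fuzzy coefficients
    h:     degree of fit H
    """
    n = len(X)
    scale = 1.0 - h
    centers = [sum(g * xj for g, xj in zip(gamma, row)) for row in X]
    # A = (1 - H) * [X; X]: the same scaled rows appear twice.
    A = [[scale * xj for xj in row] for row in X + X]
    first = [y[i] + scale * e[i] - centers[i] for i in range(n)]
    second = [-y[i] + scale * e[i] + centers[i] for i in range(n)]
    b = first + second
    # Objective coefficients: column sums of X (the sum of the x_i).
    cost = [sum(row[j] for row in X) for j in range(len(X[0]))]
    return A, b, cost


def is_feasible(A, b, c, tol=1e-12):
    """Check c >= 0 and A c >= b row by row."""
    if any(cj < 0 for cj in c):
        return False
    return all(sum(a * cj for a, cj in zip(row, c)) >= bi - tol
               for row, bi in zip(A, b))


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]   # toy design matrix
    y = [0.0, 1.5, 2.0]                        # toy responses
    e = [0.0, 0.0, 0.0]                        # crisp observations
    gamma = [0.0, 1.0]                         # hypothetical centers
    A, b, cost = build_spread_lp(X, y, e, gamma, h=0.5)
    for c in ([1.0, 0.0], [0.5, 0.0]):
        objective = sum(cj * kj for cj, kj in zip(c, cost))
        print(c, is_feasible(A, b, c), objective)
```

In this toy case the middle observation lies 0.5 above its center, so with H = 0.5 an intercept spread of 1.0 is just large enough to cover it, while 0.5 is not.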
C Appendix
Predicted Value of Y vs. Residual of Y for Restenosis Data Set
Normal Probability Plot for Restenosis Data Set
To view these plots, contact the author at szegman@yahoo.com.
References
[1] IMSL, Inc., Houston, Texas. IMSL Math/Library, version 1.0 edition, 1989.
[2] D. Klaua. Über einen Ansatz zur mehrwertigen Mengenlehre. Monatsber. Deut. Akad.
Wiss. Berlin, 7:859-876, 1965.
[3] David G. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller. Applied Regression
Analysis and Other Multivariable Methods, 2 ed. PWS-Kent Publishing Company,
Boston, 1988.
[4] Herbert Moskowitz and Kwangjae Kim. On assessing the H value in fuzzy linear
regression. Fuzzy Sets and Systems, 58:303-327, 1993.
[5] Georg Peters. Fuzzy linear regression with fuzzy intervals. Fuzzy Sets and Systems,
63:45-55, 1994.
[6] David T. Redden and William H. Woodall. Properties of certain fuzzy linear
regression methods. Fuzzy Sets and Systems, 64:361-375, 1994.
[7] Dragan A. Savic and Witold Pedrycz. Evaluation of fuzzy linear regression models.
Fuzzy Sets and Systems, 39:51-63, 1991.
[8] Michael Smithson. Fuzzy Set Analysis for Behavioral and Social Sciences.
Springer-Verlag New York, Inc., New York, 1987.
[9] Hideo Tanaka and H. Ishibuchi. Identification of possibilistic linear systems by
quadratic membership functions of fuzzy parameters. Fuzzy Sets and Systems,
41:145-160, 1991.
[10] Hideo Tanaka, Satoru Uejima, and Kiyoji Asai. Linear regression analysis with
fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics, 12(6):903-907,
1982.
[11] William S. Weintraub, Andrzej S. Kosinski, Charles L. Brown III, and Spencer B.
King III. Can restenosis after coronary angioplasty be predicted from clinical
variables? Journal of the American College of Cardiology, 21:6-14, 1993.
[12] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.

  • 1. An Assessment of Fuzzy Linear Regression By Eric Szegedi B.S., Liberty University, 1992 Advisor: Andrzej S. Kosinski, Ph. D. A thesis submitted to the Faculty of the Rollins School of Public Health Of Emory University in partial fulfillment Of the requirements for the degree of Master of Public Health Department of Biostatistics 1996
  • 2. An Assessment of Fuzzy Linear Regression By Eric Szegedi Advisor: Andrzej S. Kosinski, Ph. D. Approved for the Department Andrzej S. Kosinski Adviser Michael Lynn Committee Member Accepted: Vicki Stover Hertzeg Director, Division of Biostatistics 10 May 1996 Date
  • 3. In presenting this thesis as a partial fulfillment of the requirements for an advanced degree from Emory University, I agree that the Library of the University shall make it available for inspection and circulation in accordance with its regulations governing materials of this type. I agree that permission to copy from, or to publish, this thesis may be granted by the professor under whose direction it was written, or, in his absence, by the Dean of the Rollins School of Public Health when such copying or publication is solely for scholarly purposes and does not involve potential financial gain. It is understood that any copying from, or publication of, this thesis which involves potential financial gain will not be allowed without written permission. Eric Szegedi
  • 4. NOTICE TO BORROWERS Unpublished theses deposited in the Emory University Library must be used only in accordance with the stipulations prescribed by the author in the preceding statement. The author of this thesis is: NAME: Eric Szegedi ADDRESS: szegman@yahoo.com The director of this thesis is: NAME: Andrzej S. Kosinski, Ph.D. ADDRESS: The Rollins School of Public Health at Emory University, Division of Biostatistics, 1518 Clifton Rd. NE, Atlanta, GA 30322, USA Users of this thesis not regularly enrolled as students at Emory University are required to attest acceptance of the preceding stipulations by signing below. Libraries borrowing this thesis for the use of their patrons are required to see that each user record here the information requested. Name of user Address Date Type of use: (Examination or copying)
  • 5. ABSTRACT The purpose of this thesis is to investigate the usefulness of fuzzy linear regression as developed by Tanaka, Uejima, and Asai, and then refined by Savic and Pedrycz, 1991. A comparison of fuzzy linear regression to least-squares linear regression is used to make the assessment. The comparison is undertaken in two ways. One way is to compare the spreads, ! r c , of the fuzzy linear regression estimated coefficients with the standard errors or least-squares linear regression estimated coefficients. The other way is to compare the residuals of each type of linear regression. The goal of these two comparisons is for statisticians to gain a better understanding of fuzzy linear regression as a method for data analysis. The conclusion of this thesis is that further work needs to be conducted in order to obtain an interpretable meaning from the fuzzy linear regression parameters. At this point a clearer understanding cannot be given. (Note: Thesis is on file at http://www.sph.emory.edu/bios/news/library/szegedi.html)
  • 6. Contents 1 INTRODUCTION......................................................................................................8 2 THEORY OF FUZZY LINEAR REGRESSION....................................................11 3 COMPARISON OF PERFORMANCE ..................................................................18 3.1 COMPARISON OF FLR SPREAD AND LSLR STANDARD ERROR ..........................18 3.2 COMPARISON OF RESIDUALS ..............................................................................20 4. CONCLUSION AND DISCUSSION......................................................................22 A APPENDIX..............................................................................................................25 B APPENDIX..............................................................................................................28 C APPENDIX..............................................................................................................31 REFERENCES............................................................................................................32
  • 7. List of Tables Table 1: Example of Membership Values for Several____________________ 10 Table 2: Estimated ! " j's and ! c j's for Restenosis Data Set ________________ 19 Table 3: LSLR Standard Error for Restenosis Data Set __________________ 19 Table 4: Comparison of FLR and LSLR for = ! ˆH 0.50____________________ 21 List of Figures Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set 21
  • 8. Eric Szegedi: An Assessment of Fuzzy Linear Regression 8 1 Introduction In analyzing medical data, various indicators for a disease or outcome do not always have an exact definition nor does their relationship. Also measurements may not be precise or cannot be precise and a more precise meaning may be given to a variable using fuzzy sets. For example, hypertension is defined as being greater than 90 mmHg diastolic blood pressure for a “normal” person. A more accurate definition may be given using fuzzy sets which will give a slight possibility of hypertension to someone with 70 mmHg diastolic blood pressure and a high possibility of hypertension to someone with 105 mmHg diastolic blood pressure. Another example is the relationship between a person’s hdl cholesterol level and their level of heart disease. Someone with a high hdl cholesterol level will have a high possibility of heart disease, whereas someone with a low hdl cholesterol level will have a slight possibility of heart disease. Fuzzy sets are a way to deal with problems where the source of imprecision is not random error but “the absence of sharply defined criteria of class membership” [12]. If X is said to be a space of points or objects with an element of X being referred to in general as x, then a membership function ! µS x( ) that maps each point in X to a real number in the interval [0,1] defines a fuzzy set S in X. The value of ! µS x( ) at x ! " X, and is the set of possible membership values. A set is considered fuzzy as long as the valuation set contains values between 0 and 1. In classical set theory the valuation set comprises only two values – membership or non-membership, ! µS x( )=1 or ! µS x( )=0 respectively. In fuzzy set theory this valuation set describes the graded membership of an element x in a
  • 9. Eric Szegedi: An Assessment of Fuzzy Linear Regression 9 set S by use of the function ! µS x( ) mapping x to values between 0 and 1. An element may belong partially to a set S, and the higher the value of the membership function ! µS x( ) or the closer ! µS x( ) is to 1, the more an element x belongs to the set S. The set S does not have a clearly defined membership, since S cannot be said to contain certain elements or not. An ! x1 with a membership of ! µS x2( )=0.15. The element ! x1 is said to characterize the set S more than ! x2 since ! x1 has a higher membership value than ! x2. The various indicators for a disease or outcome, mentioned earlier, may actually have a fuzzy definition, and their relationship may be described by a fuzzy function, such as the fuzzy linear regression described in this thesis. The distribution of the data for these various indicators may be characterized by a valuation set, or possibility distribution as a valuation set is sometimes called in the literature [5]. The valuation set consists of the values of the membership function and is based on the concept of fuzzy logic. Fuzzy logic was created nearly simultaneously by Zadeh [12] and Klaua [2] in 1965 and has been mostly applied to engineering problems and control systems. Where classical logic allows only for a value of true or false, denoted 1 and 0 respectively, fuzzy logic allows for a gradation of values within the interval [0,1], so that a value of partially true or partially false may be assigned. Fuzzy set theory is based on this concept of fuzzy logic. For example, if one were to ask a respondent in a survey how many times they use cocaine and they only respond with the word “several”, a valuation set for the word “several” would have to be used. The valuation set may look like the table below [8]. In this case the valuation set, which comprises the membership values in Table 1, was
  • 10. Eric Szegedi: An Assessment of Fuzzy Linear Regression 10 created by asking 23 students to “rate the degree of possibility that various integers could be the number someone has in mind when they say several” [8]. integer membership value 0 0.00 1 0.00 2 0.00 3 0.18 4 0.57 5 0.81 6 0.97 7 0.84 8 0.72 9 0.26 10 0.02 11 0.00 Table 1: Example of Membership Values for Several The numbers shown in Table 1 are the mean ratings given by the students. The value of the membership function for x=3 would be ! µS 3( )=0.18. The table shows that the word “several” refers mainly to the integers between 5 and 8 because of their high membership values. The most possible value of the word “several” would be 6, since 6 is the number in the fuzzy set S with the highest membership value. In this thesis, the question of the usefulness of fuzzy linear regression (FLR) in statistics is assessed by comparing FLR to least-squares linear regression (LSLR). The comparison between FLR and LSLR is done in two ways. One such way is to compare the spreads of FLR estimated coefficients with the standard errors of LSLR estimated coefficients. The other way is to compare the residuals of FLR with the residuals of LSLR. The goal is to have a better understanding of the FLR parameters [ ! ( r ", r c), defined in section 2], so that statisticians can use FLR as an alternative to LSLR.
  • 11. Eric Szegedi: An Assessment of Fuzzy Linear Regression 11 2 Theory of Fuzzy Linear Regression The purpose of fuzzy linear regression is to determine the estimated fuzzy coefficients that have the minimum membership function for the observed fuzzy set ! Yi = (i =1,...,n) in the predicted fuzzy set ! Yi * . The fuzzy set ! Yi * used in this thesis is defined by the linear model ! Yi * = "0 + "1xi1 + "2xi2 + ... + "p xip [10], where the ! xij j =1,..., p) are covariates given as non-fuzzy input data and ! " j ( j = 0,..., p) are fuzzy sets. A design matrix for ! xij is ! X = 1 x11 K x1p 1 x21 K x2p M M M M 1 xn1 K xnp " # $ $ $ $ % & ' ' ' ' where ! xi T is the ith row of X. The observed fuzzy set ! Yi is characterized by center and spread ! yi,ei( ),such that the observed fuzzy set ! Yi contains real numbers in the interval ! yi " ei,yi + ei[ ]. The center ! yi and the spread ! ei are given as input from the observed data set. In this thesis ! ei =0 for all i. The predicted fuzzy set ! Yi * is characterized by ! yi * ,ei * ( ) where ! yi * = xi T r " and ! ei * = xi T r c , such that the predicted fuzzy set ! Yi * contains real numbers in the interval ! yi * " ei * ,yi * + ei * [ ]. Here ! r " and ! r c denote vectors of center values ! r " = "0,...,"p[ ] T ( ) and spread ! r c = c0,...,cp[ ] T ( ) for all the fuzzy sets ! " j ( j = 0,..., p). In this thesis the membership function that determines the membership value will be of the form ! L •( ) = max 0,1" •( ) such that the membership function of ! Yi * is
  • 12. Eric Szegedi: An Assessment of Fuzzy Linear Regression 12 ! µYi * Yi( ) = L yi " yi * ei " ei * # $ % & ' ( = 1" yi " yi * ei " ei * if ei " ei * > 0 and yi " yi * ) ei " ei * 0 if ei " ei * = 0 and yi * yi * or ei " ei * < yi " yi * 1 if ei " ei * = 0 and yi = yi * 0 if ei " ei * < 0 + , - -- . - - - where ! µYi * Yi( ) is the membership function of ! Yi defining the fuzzy set ! Yi * . The center ! yi * is the most possible value of the set ! Yi * because ! yi * has the highest membership function for ! µYi * Yi( ). The spread ! ei * determines how fuzzy or precise the set ! Yi * will be. Also, the wider the spread the closer to 1 (the largest membership value) the membership function becomes. Let the minimum ! µYi * Yi( ) for all i be H such that H is “the largest membership value such that all ! yi values having membership of at least [H] inside the fuzzy [observed] set ! Yi have at least [H] membership values inside the fuzzy [estimated] set ! Yi * ” [6] ! H = minµYi * (Yi ),i =1,K,n( ). The value H is called the degree of fit for a model and is a value in the interval [0,1]. A value of H=0.7 implies a higher degree of membership of ! Yi in ! Yi than a value of H=0.2. The fuzzy sets ! " j ( j = 0,..., p) are defined as ! µ" j aj( )= 1# aj # $ j c j if aj # $ j % cj and cj > 0 0 if cj = 0 and aj & $ j or cj < aj # $ j 1 if cj = 0 and aj = $ j ' ( ) ) ) * ) ) )
  • 13. Eric Szegedi: An Assessment of Fuzzy Linear Regression 13 where ! µ" j aj( ) is the membership function of ! aj defining the fuzzy set ! " j . The center of the fuzzy set ! " j is ! " j and is the most possible value of the set ! " j because ! " j has the highest membership value for ! µ" j aj( ). The spread around the center ! " j of the fuzzy set ! " j is ! c j and is the precision of the fuzzy set ! " j . A fuzzy set ! " j with ! c j=0 is referred to as a crisp set. The determination of the FLR residuals will be accomplished by using the original formulation of the minimization problem by Tanaka et al. [10]. The minimization problem is formulated as solving the following linear programming problem for ! r c and ! r " : ! r " r c # R p + 1 min ei * i=1 n $ subject to yi * + 1% ˆH( )ei * & yi + 1% ˆH( )ei and yi * % 1% ˆH( )ei * & yi % 1% ˆH( )ei where ! c j " 0 e* = xi T r c( ) and ! ˆH is the estimated degree of fit. The minimization problem above has 2n constraints. To compare the spreads of FLR with standard errors of LSLR this thesis will use the minimization problem, as developed by Tanaka et al. [10] but then refined by Savic and Pedrycz [7], along with the search method by Moskowitz and Kim to provide the proper ! r " and ! r c . The reason for the difference in the two analyses is that in the refinement by Savic and Pedrycz the centers of FLR would be the same as the parameter estimates in LSLR, therefore, the residuals of FLR and LSLR could not be different by definition.
  • 14. Eric Szegedi: An Assessment of Fuzzy Linear Regression 14 The fuzzy regression minimization problem for the second comparison is conducted in two steps. The first step in the refinement of the problem by Savic and Pedrycz is to obtain ! r " = XT X( ) #1 XT Y, which is the least-squares estimator. The second step is to solve the following linear programming problem for ! r c : ! r c " R p + 1 min ei * i=1 n # subject to yi * + 1$ ˆH( )ei * % yi + 1$ ˆH( )ei and yi * $ 1$ ˆH( )ei * % yi $ 1$ ˆH( )ei ! c j " 0 e* = xi T r c( ). The minimization problem above also has 2n constraints. This procedure as refined by Savic and Pedrycz is uniquely defined for ! r " when X is a full rank matrix. As the value of ! ˆH becomes higher, the greater ! r c becomes [4]. The best degree of fit is determined by finding the estimated fuzzy sets ! " j characterized by ! " j ,c j( ) of ! Yi * which are solutions to the minimization problem. Moskowitz and Kim note that “the selection of a proper value of [ ! ˆH] is important in fuzzy regression, because it determines the range of the possibility distributions [valuation sets] of the fuzzy parameters.” Moskowitz and Kim suggest two methods for determining ! ˆH, an analytical method and a search method. I will use the search method in this thesis since the search method is more advantageous when the amount of spread is uncertain. The search method is an extension to the second minimization problem above and allows for the incorporation of the researchers beliefs regarding the spread of the valuation set in selecting an ! ˆH value, instead of just guessing an ! ˆH value. The search method is conducted by way of the following algorithm [4]:
  • 15.
1. Initialization
   • set the interval of uncertainty ($H_{\min} = 0$, $H_{\max} = 1$)
   • set H to the initial guess $H^*$
   • set $\hat{c}_j$ (chosen spread of the selected jth parameter) and set the level of tolerance $\varepsilon$ (greatest difference wanted between $\hat{c}_j$ and $c_j$)
   • obtain $\vec{\gamma} = (X^T X)^{-1} X^T Y$
   • choose the type of membership function $\mu_{Y_i^*}(Y_i)$
2. Fuzzy Regression
   • determine the fuzzy fitted sets with a degree of fit H and membership function $\mu_{Y_i^*}(Y_i)$
   • calculate for the selected jth parameter the difference $\Delta_j = \hat{c}_j \text{ (chosen)} - c_j \text{ (from fuzzy regression)}$
3. Termination or Update
   • if $|\Delta_j| < \varepsilon$ then set $\hat{H} = H$, $\Gamma_j = (\gamma_j, c_j)$ for all j, and stop
   • if $\Delta_j \ge \varepsilon$ then set $H_{\min} = H$
   • if $\Delta_j \le -\varepsilon$ then set $H_{\max} = H$
   • set $H = H_{\min} + (H_{\max} - H_{\min})/2$ and then go to step 2

This algorithm will provide the proper level $\hat{H}$ and the optimal $c_j$'s under the membership function $\mu_{Y_i^*}(Y_i)$. The degree of fit $\hat{H}$ will be the membership value that can be obtained while maintaining the spread of the jth parameter at a specified level [4].
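The three steps amount to a bisection on H. The following is a minimal sketch of that loop, not the thesis's Fortran implementation. The fuzzy regression step is passed in as a function `spread_at(h)` that returns the fitted spread of the selected parameter. Here `toy_spread` is a made-up monotone stand-in for the linear program, and all names and numbers are hypothetical.

```python
from typing import Callable, Tuple


def search_h(
    spread_at: Callable[[float], float],
    c_hat: float,
    eps: float,
    h_init: float = 0.5,
    max_iter: int = 100,
) -> Tuple[float, float]:
    """Bisect on H until the fitted spread is within eps of the chosen c_hat.

    Relies on the fitted spread being non-decreasing in H.
    Returns the pair (H, fitted spread at H).
    """
    h_min, h_max = 0.0, 1.0            # step 1: interval of uncertainty
    h = h_init
    for _ in range(max_iter):
        c_j = spread_at(h)             # step 2: fuzzy regression at this H
        delta = c_hat - c_j
        if abs(delta) < eps:           # step 3: close enough, stop
            return h, c_j
        if delta >= eps:
            h_min = h                  # fitted spread too small: raise H
        else:
            h_max = h                  # fitted spread too large: lower H
        h = h_min + (h_max - h_min) / 2
    raise RuntimeError("no H found within max_iter bisection steps")


def toy_spread(h: float) -> float:
    """Hypothetical stand-in for the LP: a spread that grows with H."""
    return 0.4444 / (1.0 - h)


h_hat, c_fit = search_h(toy_spread, c_hat=1.505, eps=0.001)
print(round(h_hat, 3), round(c_fit, 3))
```

In a real run, `spread_at` would solve the second-step linear program for the given H and return the $c_j$ of the selected parameter. The `max_iter` guard is an addition of this sketch, since the algorithm as stated has no iteration limit.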
  • 16. The least-squares linear regression model has the equation

$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon,$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are the p+1 regression coefficients that need to be estimated, $X_1, X_2, \ldots, X_p$ are the p independent variables, and $\varepsilon$ is the random error. The random error $\varepsilon$ has a mean of 0 and a variance of $\sigma^2$. Least-squares linear regression chooses as the best-fitting model the one that minimizes the sum of squares of the distances between the observed responses and those predicted by the fitted model. The idea is that the better the fit, the smaller the deviations of the observed values from the predicted values. The least-squares solution then consists of those values $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ for which the sum $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ is a minimum.

Fuzzy linear regression allows some of the strict assumptions of least-squares linear regression to be relaxed [10]. A comparison of the two relevant assumptions of least-squares linear regression (LSLR) with fuzzy linear regression (FLR) is as follows (LSLR assumptions are taken from Kleinbaum et al. [3]):
• LSLR requires the linearity assumption, which states that the mean value of Y for each specific combination of $X_1, X_2, \ldots, X_p$ is a linear function of $X_1, X_2, \ldots, X_p$ (i.e., $\mu_{Y \mid X_1, \ldots, X_p} = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$). FLR is not strictly linear because its coefficients are the fuzzy sets $\Gamma_j$, which are defined by triangular membership functions in this thesis.
• LSLR requires a normality assumption, which states that for any fixed combination of $X_1, X_2, \ldots, X_p$, the variable Y is normally distributed. In LSLR, deviations between the observed and the estimated values are assumed to be
  • 17. due to random errors. The variable Y obtains its normal distribution from these normally distributed random errors. In FLR, the normality assumption does not apply, since the deviations between the observed and the estimated values are assumed to depend on the vagueness or lack of precision of the parameters.

The fuzzy linear regression estimates of the coefficients $\Gamma_j$ (where j = 0, …, p) in the equation $Y^* = \Gamma_0 + \Gamma_1 x_1 + \cdots + \Gamma_p x_p$ are determined using a Fortran program written by Redden [6] and modified for the specific application in this thesis. The program uses the IMSL subroutine DDLPRS [1] to determine the spreads $c_j$ and/or the centers $\vec{\gamma}$ of the fuzzy parameters $\Gamma_j$. The program is in Appendix A, and limited documentation for DDLPRS is in Appendix B.

When comparing the spreads of FLR with the standard errors of LSLR, the Fortran program needs three inputs: the data set; the least-squares estimates of the parameters, which are used for $\vec{\gamma}$, the centers of the fuzzy coefficients $\Gamma_j$; and the estimated spread $c_j$ of one of the parameters. When comparing residuals, the Fortran program needs only the data set as input. The solutions for LSLR and the validity of LSLR's assumptions are found by using SAS.
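To make the fitted equation $Y^* = \Gamma_0 + \Gamma_1 x_1 + \cdots + \Gamma_p x_p$ concrete, the sketch below evaluates a fuzzy prediction as a center and a spread, and then the membership value of an observed response. It assumes symmetric triangular membership functions, crisp covariates, center $\vec{\gamma}^T x$ and spread $\vec{c}^T |x|$. The function names and numbers are hypothetical and are not output of the thesis's program.

```python
from typing import Sequence, Tuple


def fuzzy_prediction(gamma: Sequence[float], c: Sequence[float],
                     x: Sequence[float]) -> Tuple[float, float]:
    """Center and spread of the fuzzy output Y* for one covariate row x.

    x is assumed to start with 1 for the intercept term.
    """
    center = sum(g * v for g, v in zip(gamma, x))
    spread = sum(c_j * abs(v) for c_j, v in zip(c, x))
    return center, spread


def triangular_membership(y: float, center: float, spread: float) -> float:
    """Membership of y in a symmetric triangular fuzzy set (center, spread)."""
    if spread == 0.0:
        # A crisp set: only the center itself belongs to it.
        return 1.0 if y == center else 0.0
    return max(0.0, 1.0 - abs(y - center) / spread)


# Hypothetical coefficients: intercept plus one covariate.
gamma = [1.0, 0.5]     # centers
c = [0.0, 0.2]         # spreads (a crisp intercept, a fuzzy slope)
center, spread = fuzzy_prediction(gamma, c, x=[1.0, 4.0])
membership = triangular_membership(3.4, center, spread)
print(center, spread, round(membership, 3))
```

An observed response belongs to the fitted fuzzy set with degree at least H exactly when its membership value is at least H, which is the sense in which H acts as a degree of fit.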
  • 18.
3 Comparison of Performance

3.1 Comparison of FLR Spread and LSLR Standard Error

The data set, referred to here as the Restenosis data set, comes from a Lovastatin restenosis trial [11]. It consists of 404 observations, 94 of which have missing values. There are ten covariates and one dependent variable. The covariates are angina pectoris grade III or IV (yes or no), the diameter of stenosis before angioplasty, the index site, the status of diabetes mellitus (yes or no), the diameter of stenosis after angioplasty, the presence of systemic hypertension (yes or no), the presence of intimal tear pre-PTCA (yes or no), the presence of intimal tear post-PTCA (yes or no), whether the index site is eccentric or concentric, and the age of the individual. The dependent variable is the diameter of the stenosis at restudy. In the comparisons between FLR and LSLR for the Restenosis data set, only the 310 observations with no missing values were used. The Restenosis data set met the LSLR assumptions of linearity and normality, as can be seen in Appendix C.

The comparison of the FLR spreads with the LSLR standard errors is undertaken in order to give the FLR spreads an interpretable meaning. What does it mean for an observed set $Y_i$ to fall within the predicted fuzzy set $Y_i^*$? The solutions for FLR are given below in Table 2, and the standard errors for LSLR are in Table 3. The $\gamma_j$'s in Table 2
  • 19. are LSLR parameters. Variables 1 through 10 in Table 2 and Table 3 are the covariates, and variable 0 is the Y-intercept.

 j    $\gamma_j$    $c_j$ (H=0.019)   $c_j$ (H=0.188)   $c_j$ (H=0.500)   $c_j$ (H=0.705)
 0     23.696       0.00              0.00              0.00              0.00
 1      1.624       0.00              0.00              0.00              0.00
 2      0.264       0.453             0.547             0.889             1.505
 3     -0.175       0.00              0.00              0.00              0.00
 4     -4.358       0.00              0.00              0.00              0.00
 5      0.477       0.343             0.415             0.674             1.141
 6     -2.518       0.00              0.00              0.00              0.00
 7      0.725       0.00              0.00              0.00              0.00
 8     -5.676       0.00              0.00              0.00              0.00
 9     -0.585       0.00              0.00              0.00              0.00
10      0.078       0.338             0.408             0.663             1.123

Table 2: Estimated $\gamma_j$'s and $c_j$'s for Restenosis Data Set

The low values of $\hat{H}$=0.019 and $\hat{H}$=0.188 mean that the FLR estimates do not fit the data well. The values of $\hat{H}$=0.500 and $\hat{H}$=0.705 mean that the FLR estimates fit the data fairly well.

 j    s.e.$_j$    1.96 s.e.$_j$
 0    15.452      30.286
 1     2.239       4.388
 2     0.109       0.213
 3     0.184       0.361
 4     3.666       7.186
 5     0.093       0.183
 6     2.259       4.428
 7    10.074      19.745
 8     2.984       5.848
 9     2.263       4.434
10     0.113       0.221

Table 3: LSLR Standard Error for Restenosis Data Set
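One pattern that can be read off Table 2 may help in interpreting the spreads: for each non-crisp parameter, the product $(1 - \hat{H})\, c_j$ is nearly the same at all four $\hat{H}$ values. This is what the second-step constraints would imply if the observed responses are treated as crisp ($e_i = 0$), which is an assumption of this note rather than something stated in the text. The check below uses only the numbers printed in Table 2; the variable and function names are illustrative.

```python
from typing import Dict, List

# H values and spreads copied from Table 2 (parameters 2, 5 and 10 only;
# all other parameters have zero spread at every H).
H_VALUES: List[float] = [0.019, 0.188, 0.500, 0.705]
SPREADS: Dict[int, List[float]] = {
    2: [0.453, 0.547, 0.889, 1.505],
    5: [0.343, 0.415, 0.674, 1.141],
    10: [0.338, 0.408, 0.663, 1.123],
}


def scaled_spreads(h_values: List[float], spreads: List[float]) -> List[float]:
    """Return (1 - H) * c_j for each tabulated H value."""
    return [(1.0 - h) * c for h, c in zip(h_values, spreads)]


def spread_range(values: List[float]) -> float:
    """Largest minus smallest value, as a measure of how constant they are."""
    return max(values) - min(values)


for j, c_values in SPREADS.items():
    products = scaled_spreads(H_VALUES, c_values)
    print(j, [round(p, 4) for p in products], round(spread_range(products), 4))
```

If the pattern holds, changing $\hat{H}$ in this comparison rescales the three non-zero spreads by a common factor and leaves the eight crisp parameters crisp.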
  • 20. Only 1.9% of the FLR prediction intervals for $\hat{H}$=0.019 are narrower than the LSLR 95% prediction intervals. For $\hat{H}$=0.188, $\hat{H}$=0.500, and $\hat{H}$=0.705, the FLR prediction intervals are much wider than the LSLR 95% prediction intervals. However, the connection between FLR spreads and LSLR standard errors is unclear. The question is why the FLR prediction intervals for one $\hat{H}$ value, as opposed to those for other $\hat{H}$ values, include a higher percentage of intervals that are narrower than the LSLR prediction intervals. There seems to be no relation between the FLR spreads and the LSLR standard errors that explains why some FLR prediction intervals are narrower than the LSLR prediction intervals and others are not. The crisp FLR spreads of 8 of the parameters do not allow for a good comparison. Tanaka and Ishibuchi [9] have proposed a method with interactive fuzzy parameters and quadratic membership functions to deal with crisp FLR spreads. Their method is not dealt with in this thesis, though.

3.2 Comparison of Residuals

When comparing the residuals of the Restenosis data for LSLR and FLR, only 9 of the 10 covariates are used. The covariate for the presence of intimal tear post-PTCA was removed for ease of computation. This comparison is undertaken in order to determine whether FLR or LSLR has the better estimated coefficients with regard to the observed data. LSLR does much better than FLR, since the sum of squared residuals for LSLR is much smaller than the sum of squared residuals for FLR. The sum of squared residuals
  • 21. for LSLR is 113,682.71, and the sum of squared residuals for FLR is 145,997.95, for a difference of 32,315.24.

The results of FLR for $\hat{H}$=0.50 are in Table 4. The first line of Table 4 is for the Y-intercept, and the other lines are for the 9 covariates. For other $\hat{H}$ values, the $\gamma_j$'s are the same, except for $\hat{H}$=0.80 where $\gamma_4$ = -1.632, a minor difference. With these other $\hat{H}$ values, only the $c_j$'s change. These $c_j$'s are not relevant to the comparison of the residuals between FLR and LSLR, since only the $\gamma_j$'s are used in the comparison.

 j    $\gamma_j$    $c_j$     LSLR param$_j$    s.e.$_j$
 0      8.181       0.000      23.696           15.452
 1     -4.943       7.468       1.624            2.239
 2      0.546       1.246       0.264            0.109
 3     -1.601       0.000      -0.175            0.184
 4     -7.997       0.000      -4.358            3.666
 5      0.418       0.000       0.477            0.093
 6      1.365       0.000      -2.518            2.259
 7     -2.154       0.000      -5.676            2.984
 8     10.628       0.000      -0.585            2.262
 9     -0.030       0.000       0.078            0.113

Table 4: Comparison of FLR and LSLR for $\hat{H}$ = 0.50

The graphs comparing the residuals to their respective observations and to their respective predicted values are in Figure 1. FLR residuals seem to have greater variability around 0 than LSLR residuals. The greater variability of the FLR residuals implies that FLR predicted values are not as close to the observed values as are LSLR predicted values.

Figure 1: Comparison of FLR residuals with LSLR residuals for Restenosis Data Set (to view contact author at szegman@yahoo.com)
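The residual comparison comes down to computing a sum of squared residuals twice, once with the LSLR estimates and once with the FLR centers. A minimal sketch of that calculation follows. The toy data and both coefficient vectors are made up for illustration and are not the Restenosis data; only the last line uses the two sums reported above.

```python
from typing import Sequence


def sum_squared_residuals(coefs: Sequence[float],
                          x_rows: Sequence[Sequence[float]],
                          y: Sequence[float]) -> float:
    """Sum over observations of (observed - predicted)^2.

    Each x row is assumed to start with 1 for the intercept, so the
    prediction is the dot product of the row with the coefficients.
    """
    total = 0.0
    for x, y_i in zip(x_rows, y):
        predicted = sum(b * v for b, v in zip(coefs, x))
        total += (y_i - predicted) ** 2
    return total


# Hypothetical toy data: intercept plus one covariate.
x_rows = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 5.0]

least_squares_coefs = [2.0 / 3.0, 1.5]   # least-squares fit to the toy data
other_centers = [1.0, 1.5]               # some other set of centers

ssr_ls = sum_squared_residuals(least_squares_coefs, x_rows, y)
ssr_other = sum_squared_residuals(other_centers, x_rows, y)
print(round(ssr_ls, 4), round(ssr_other, 4))

# The difference between the two sums reported in the text.
reported_difference = 145997.95 - 113682.71
print(round(reported_difference, 2))
```

By construction, no other set of centers can produce a smaller sum of squared residuals than the least-squares estimates on the same data, so this criterion can at best tie for FLR.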
  • 22.
4. Conclusion and Discussion

An attempt to give meaning to the FLR parameters has been presented by comparing the spreads of FLR estimated coefficients with the standard errors of LSLR estimated coefficients and by comparing the residuals of FLR with the residuals of LSLR. At this point no real conclusions can be drawn. LSLR seems to be better than FLR at describing the Restenosis data set. Also, an interpretation that has meaning to statisticians can be given to the results from LSLR. The same still cannot be said of FLR.

Further work needs to be conducted in order for statisticians to obtain an interpretable meaning from the FLR parameters, so that FLR can be used as a viable method for data analysis. For instance, FLR might be usable for small data sets when the assumptions of LSLR cannot be met.

Two further considerations arise when trying to give meaning to FLR parameters. The first is a generalization of the model used in this thesis. This generalized model is considered robust in the presence of outliers, and it is a model in which the bounds of the interval are fuzzy. “The dependent data y are no longer inside or outside the interval but belong to the interval to certain degrees (membership)” [5]. This generalized model with fuzzy intervals maximizes

$$\bar{\lambda} = \frac{1}{n} \sum_{i=1}^{n} \lambda_i$$

such that

$$(1 - \bar{\lambda})\, s_0 - \sum_{i=1}^{n} \sum_{j=1}^{p} c_j |x_{ij}| \ge -d_0 \quad \text{(objective function)}$$
  • 23.
$$(1 - \lambda_i)\, s_i + \sum_{j=0}^{p} \gamma_j x_{ij} + \sum_{j=0}^{p} c_j |x_{ij}| \ge y_i \quad \text{(upper limit)}$$

$$(1 - \lambda_i)\, s_i - \sum_{j=0}^{p} \gamma_j x_{ij} + \sum_{j=0}^{p} c_j |x_{ij}| \ge -y_i \quad \text{(lower limit)}$$

$$-\lambda_i \ge -1, \quad \lambda_i \ge 0, \quad \gamma_j \in \mathbb{R}, \quad \text{and} \quad x_{i0} = 1,$$

where $d_0$ is the desired value of the objective function, $s_i$ is the width of the tolerance interval of the observed $y_i$, and $\lambda_i$ represents the membership value to which the solution belongs to the set “good solution” ($\lambda_i$ restricted to [0,1]). A weak requirement to minimize the spread (a high value of $s_0$ and low values of $s_i$) leads to a wide interval, whereas a strong requirement to minimize the spread (a low value of $s_0$ and high values of $s_i$) leads to a narrow interval. Peters suggests $d_0 = 0$ to obtain a model as crisp as possible [5]. This generalized model needs to be investigated to see whether, and if so by how much, its fuzzy linear regression prediction intervals are narrower than least-squares regression prediction intervals in the presence of outliers.

A second consideration that could possibly enhance the results of fuzzy linear regression would be to vary the membership function used. Other membership functions include the following:

uniform membership function

$$L(z) = \begin{cases} 1 & \text{if } -1 \le z \le 1 \\ 0 & \text{otherwise} \end{cases}$$

asymmetric membership function

$$L(z) = \max\left(0,\, 1 - |z|^p\right), \quad p > 0,$$

or summed exponential function
  • 24.
$$L(z) = \begin{cases} a - e^{b|z|} - c\, e^{d|z|} & \text{if } L(z) \text{ is concave}, \; -1 \le z \le 1 \\ a + e^{-b|z|} + c\, e^{d|z|} & \text{if } L(z) \text{ is convex}, \; -1 \le z \le 1 \end{cases}$$

where L represents the membership function of a standardized parameter z. The standardized parameter z is defined as the distance between the observation $Y_i$ and the center value of the corresponding fuzzy estimate $Y_i^*$, divided by the difference in spread of $Y_i^*$ and $Y_i$ [4].
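Two of the alternative membership functions are simple enough to sketch directly. The code below assumes the power-type function is applied to the absolute value of the standardized parameter, $L(z) = \max(0, 1 - |z|^p)$, so that p = 1 gives the triangular shape used in this thesis. The function names are hypothetical, and the summed exponential form is left out because its constants a, b, c and d are not specified here.

```python
def uniform_membership(z: float) -> float:
    """L(z) = 1 on [-1, 1] and 0 elsewhere."""
    return 1.0 if -1.0 <= z <= 1.0 else 0.0


def power_membership(z: float, p: float) -> float:
    """L(z) = max(0, 1 - |z|^p) for a shape parameter p > 0.

    p = 1 is the triangular function; larger p keeps membership
    high for longer before it drops to 0 at |z| = 1.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return max(0.0, 1.0 - abs(z) ** p)


# Membership of the same standardized distance under different shapes.
z = 0.5
for name, value in [
    ("uniform", uniform_membership(z)),
    ("power, p=1 (triangular)", power_membership(z, 1.0)),
    ("power, p=2", power_membership(z, 2.0)),
]:
    print(name, value)
```

Because the membership value at a given z depends on the shape, the same observation can satisfy a degree of fit H under one membership function and fail it under another, which is why the choice would affect the fitted spreads.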
  • 25. A Appendix

      PARAMETER (NDATA = 310)
      PARAMETER (NVARS = 10)
      PARAMETER (N = 2 * NVARS)
      PARAMETER (M1 = 2 * NDATA)
      PARAMETER (ZZ1 = 1)
      PARAMETER (ZZ2 = -1)
      INTEGER IRTYPE(M1), I, J, JUMP, K
      DOUBLE PRECISION X(NDATA, NVARS + 2), DUMMY(NDATA)
      DOUBLE PRECISION A(M1, N), CF(N), S, SUM(NVARS)
      DOUBLE PRECISION PSOL(N), DSOL(M1), H, UB, LB, DUMMY2(NDATA)
      DOUBLE PRECISION XLB(N), XUB(N), BL(M1), BU(M1)
      COMMON / WORKSP / RWKSP
      REAL RWKSP(15844578)
C     IMSL SUBROUTINE FOR WORKSPACE ALLOCATION
      CALL IWKIN(15844578)
      OPEN (UNIT=1, FILE='rest.fprogb.dat', STATUS='OLD')
      OPEN (UNIT=2, FILE='s2.dat', STATUS='OLD')
      OPEN (UNIT=3, FILE='s3.dat', STATUS='UNKNOWN')
C     READS IN DATA POINTS
      DO 10 I=1, NDATA
   10 READ(1,*) (X(I,J), J=1,NVARS+2)
      H=0.4
      DO 15 WHILE (H .LE. 1.0)
        DO 16 I=1, 2*NDATA
          DO 17 J=1,2*NVARS
            A(I,J)=0
            BL(J)=0
   17     CONTINUE
   16   CONTINUE
C     CALCULATES A(M1,NVARS) MATRIX AND BL(M1) VECTOR
C     A MATRIX CONTAINS COEFFICIENTS OF M1 CONSTRAINTS
C     BL VECTOR CONTAINS THE LOWER LIMIT CONSTRAINTS
        ZZ3 = 1 - H
        JUMP = 1
        DO 30 I = 1, NDATA
  • 26.
          DO 40 J = 1, NVARS
            A(JUMP,J)=ZZ2*X(I,J+1)
            A(JUMP+1,J)=ZZ1*X(I,J+1)
            A(JUMP, J+NVARS)=ZZ3*ABS(X(I,J+1))
            A(JUMP+1, J+NVARS)=ZZ3*ABS(X(I,J+1))
C     LOWER-LIMIT ROW IS -Y(I) + (1-H)*E(I), AS IN APPENDIX B
            BL(JUMP) = ZZ2*X(I,1)+ZZ3*X(I,NVARS+2)
            BL(JUMP + 1) = X(I,1)+ZZ3*X(I,NVARS+2)
   40     CONTINUE
          JUMP = JUMP + 2
   30   CONTINUE
C     CALCULATES THE COEFFICIENTS FOR THE OBJECTIVE FUNCTION
        DO 50 J = 2, NVARS + 1
C     RESET THE RUNNING TOTAL SO IT DOES NOT CARRY OVER BETWEEN H VALUES
          SUM(J-1) = 0
          DO 60 I = 1, NDATA
            SUM(J-1) = SUM(J-1) + ABS(X(I,J))
   60     CONTINUE
   50   CONTINUE
C     ASSIGNS COEFFICIENTS TO OBJECTIVE FUNCTION
        DO 70 J=1, NVARS
          CF(J) = 0
          CF(J+NVARS)=SUM(J)
          XLB(J) = 1.0E30
          XUB(J) = -1.0E30
          XLB(J+NVARS)= 0
          XUB(J+NVARS)= -1.0E30
   70   CONTINUE
C     ASSIGNS TYPE OF CONSTRAINT TO VECTOR IRTYPE(M1)
C     (i.e. 2 INDICATES .GE. CONSTRAINT)
        DO 80 I = 1, M1
          IRTYPE(I) = 2
   80   CONTINUE
C     IMSL LINEAR PROGRAMMING SUBROUTINE
        CALL DDLPRS(M1,N,A,M1,BL,BU,CF,IRTYPE,XLB,XUB,S,PSOL,DSOL)
C     PRINTS SOLUTION TO LINEAR PROGRAMMING PROBLEM
        WRITE(3, 90)
   90   FORMAT(4X, 'GAMMA', 9X, 'C')
        DO 100 J =1, NVARS
          WRITE(3, 110) PSOL(J),PSOL(J+NVARS)
  110     FORMAT(1X, F10.3, 3X, F7.3)
  100   CONTINUE
  • 27.
        WRITE(3, 120) S
  120   FORMAT(' OBJECTIVE FN = ', F11.3)
        WRITE(3, *) 'H VALUE', H
        H=H+0.2
C     CALCULATES CONFIDENCE INTERVAL
        WRITE(3,*) 'OBSERVATION ',' UPPER BOUND',' LOWER BOUND'
        DO 130 I=1, NDATA
          DUMMY(I)=0
          DUMMY2(I)=0
          DO 140 J=1, NVARS
C     CALCULATES [(GAMMA)^T]*X(i)
            DUMMY(I)=PSOL(J)*X(I,J+1)+DUMMY(I)
C     CALCULATES [C^T]*(Xi), USING THE SAME COVARIATE COLUMN J+1
            DUMMY2(I)=PSOL(J+NVARS)*X(I,J+1)+DUMMY2(I)
  140     CONTINUE
          IF (DUMMY2(I) .GE. 0.0) THEN
            UB=DUMMY(I)+DUMMY2(I)
            LB=DUMMY(I)-DUMMY2(I)
          END IF
          IF (DUMMY2(I) .LT. 0.0) THEN
            UB=DUMMY(I)-DUMMY2(I)
            LB=DUMMY(I)+DUMMY2(I)
          END IF
          WRITE(3,150) I,UB,LB
  150     FORMAT(I7,F19.3,F13.3)
  130   CONTINUE
   15 CONTINUE
      END
  • 28. B Appendix

DDLPRS: DOUBLE PRECISION

Purpose: solve a linear programming problem via the revised simplex algorithm.

Usage: CALL DDLPRS(M, NVAR, A, LDA, BL, BU, C, IRTYPE, XLB, XUB, OBJ, XSOL, DSOL)

Arguments:
M – Number of constraints.
NVAR – Number of variables (p+1).
A – Matrix of dimension M by NVAR containing the coefficients of the M constraints.
LDA – Leading dimension of A exactly as specified in the dimension statement of the calling program. LDA must be at least M.
BL – Vector of length M containing the lower limit of the general constraints. If there is no lower limit on the I-th constraint, then BL(I) is not referenced.
BU – Vector of length M containing the upper limit of the general constraints. If there is no upper limit on the I-th constraint, then BU(I) is not referenced. If there is no range constraint, BL and BU can share the same storage location.
C – Vector of length NVAR containing the coefficients of the objective function.
IRTYPE – Vector of length M indicating the type of constraints exclusive of simple bounds, where IRTYPE(I) = 0, 1, 2, 3 indicates .EQ., .LE., .GE., and range constraints, respectively.
  • 29.
XLB – Vector of length NVAR containing the lower bound on the variables. If there is no lower bound on a variable, then 1.0E30 should be set as the lower bound.
XUB – Vector of length NVAR containing the upper bound on the variables. If there is no upper bound on a variable, then -1.0E30 should be set as the upper bound.
OBJ – Value of the objective function (Output).
XSOL – Vector of length NVAR containing the primal solution (Output).
DSOL – Vector of length M containing the dual solution (Output).

DDLPRS is based on Richard Hanson's routine LPMGB. It uses a revised simplex method to solve linear programming problems. In this thesis the problem is of the form:

$$\min_{\vec{c} \,\in\, \mathbb{R}^{p+1}} \; \vec{c}^{\,T} \sum_{i=1}^{n} x_i$$

subject to

$$A\, \vec{c}^{\,T} \ge \vec{b}_l, \qquad c_j \ge 0,$$

where A is the coefficient matrix consisting of p+1 columns and 2n rows such that

$$A = (1 - H) \begin{pmatrix} X \\ X \end{pmatrix},$$

X is a design matrix such that

$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix},$$

$x_i = [1, x_{i1}, \ldots, x_{ip}]^T$, $\vec{b}_l$ is the vector of the lower bounds on the constraints consisting of 2n rows such that
  • 30.
$$\vec{b}_l = \begin{pmatrix} Y_1 + (1 - H)\, e_1 - \vec{\gamma}^{\,T} x_1 \\ -Y_1 + (1 - H)\, e_1 + \vec{\gamma}^{\,T} x_1 \\ \vdots \\ Y_n + (1 - H)\, e_n - \vec{\gamma}^{\,T} x_n \\ -Y_n + (1 - H)\, e_n + \vec{\gamma}^{\,T} x_n \end{pmatrix}$$

and $\vec{c}^{\,T}$ is a vector of p+1 rows.
  • 31. C Appendix

Predicted Value of Y vs. Residual of Y for Restenosis Data Set

Normal Probability Plot for Restenosis Data Set

(To view, contact author at szegman@yahoo.com)
  • 32. References

[1] IMSL, Inc., Houston, Texas. IMSL Math/Library, version 1.0 edition, 1989.
[2] D. Klaua. Über einen Ansatz zur mehrwertigen Mengenlehre. Monatsber. Deut. Akad. Wiss. Berlin, 7:859-876, 1965.
[3] David G. Kleinbaum, Lawrence L. Kupper, and Keith E. Muller. Applied Regression Analysis and Other Multivariable Methods, 2nd ed. PWS-Kent Publishing Company, Boston, 1988.
[4] Herbert Moskowitz and Kwangjae Kim. On assessing the H value in fuzzy linear regression. Fuzzy Sets and Systems, 58:303-327, 1993.
[5] Georg Peters. Fuzzy linear regression with fuzzy intervals. Fuzzy Sets and Systems, 63:45-55, 1994.
[6] David T. Redden and William H. Woodall. Properties of certain fuzzy linear regression methods. Fuzzy Sets and Systems, 64:361-375, 1994.
[7] Dragan A. Savic and Witold Pedrycz. Evaluation of fuzzy linear regression models. Fuzzy Sets and Systems, 39:51-63, 1991.
  • 33.
[8] Michael Smithson. Fuzzy Set Analysis for Behavioral and Social Sciences. Springer-Verlag New York, Inc., NY, 1987.
[9] Hideo Tanaka and H. Ishibuchi. Identification of possibilistic linear systems by quadratic membership functions of fuzzy parameters. Fuzzy Sets and Systems, 41:145-160, 1991.
[10] Hideo Tanaka, Satoru Uejima, and Kiyoji Asai. Linear regression analysis with fuzzy model. IEEE Transactions on Systems, Man, and Cybernetics, 12(6):903-907, 1982.
[11] William S. Weintraub, Andrzej S. Kosinski, Charles L. Brown III, and Spencer B. King III. Can restenosis after coronary angioplasty be predicted from clinical variables? Journal of the American College of Cardiology, 21:6-14, 1993.
[12] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.